[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Elidable terminators
Date: Wed, 17 Apr 91 16:37:28 -0700
From: Jeff Prothero <jsp@milton.u.washington.edu>
jp> = Jeff Prothero
gs> = Guy Steele Jr.
db> = Dave Bowen
gs>Indeed. Moreover, it may even be the case that the language actually
gs>is LR(1), that is, there does exist an LR(1) grammar that specifies
gs>it but such a grammar has not been found yet. Part of my point is
gs>that the attempt to find an unambiguous LR(1) grammar may yield new
gs>insights about the grammatical structure of the language.
<C example deleted>
gs>om a contained statement to its container did not
gs>become clear to me until it was forced upon me by the exercise of
gs>constructing the grammar. We might find lojban constructs also
gs>falling into two general categories: those after which (before
gs>which?) elision is permissible, and all others. The nature of that
gs>dichotomy is worth exploring.
I don't understand this comment at all. (If it were someone less
illustrious than Guy, I'd say something stronger. :)
You are too gracious, which is kind of you. But never accept a proof
by intimidation! (I am about to disagree with you, but don't accept
it without thinking about it!)
Without dragging
out my attempt to make this formal, if it is even still around, the
elidability of a particular ET in a particular utterance depends
entirely on the particular sentential construct following the ET --
can it be mistaken for a continuation of the current construct, or
not? (To make it formal, you have to decide things like how much
lookahead you are going to allow in making this decision...)
So far I agree.
This is a purely contextual computation, and trying to perform it with
context-free-grammars is an abuse of the formalism.
But this I will label "balderdash". Carried to an extreme, this
notion states that *any* word can come at *any* point in a sentence,
because to take an adjacent word into consideration would make the
decision "context sensitive". The term "context-free" has a very
particular technical meaning, namely that in generating a string from
a grammar a production may be applied to any nonterminal at any time
without regard to what is adjacent to that nonterminal. But what we
would *loosely* call "contextual considerations" are enforced by the
structure of individual productions. For if not, how could we
guarantee that a preposition precedes what it governs, or that
ELSE must follow an IF?
Yes, you can do
it -- if construct M in the grammar can be followed by constructs X, Y
and Z, you can split M into three variants M-followed-byX,
M-followed-by-Y, and M-followed-by-Z, and you may be able to show that
in all of these cases, the "elidable" terminator is either clearly
required or clearly forbidden. But the exercise serves no particular
purpose that I can see
It serves precisely the purpose of distinguishing *formally* the cases
in which elision is permitted. Indeed, it may be useful to understand
that Y and Z are the cases that permit elision, whereas X does not.
It's one thing to eyeball a simplfied grammar and say to yourself,
"Well, eliding a terminator here doesn't seem to cause any problems"
and it is quite another to create a complete grammar and have a parser-
generating tool confirm that the resulting grammar is indeed unambiguous
(and perhaps LR(n)).
, and leads to an exponential increase in the
size of the grammar.
Fixing the IF-ELSE problem in C increases its grammar by 5% or less.
If in fact the rules of lojban cause an exponential increase in
complexity, that may be worth knowing, too. But I doubt that it does.
Maybe Guy would like to compute the approximate
number of rules the final grammar would have? ;-)
I'm not sure I know what is the latest thing, or have the latest copy.
But if you will e-mail to me the latest version, I'll take a look at it.
DB> That's certainly one way of dealing with the issue of elibable
DB> terminators. Guy's comments raised the question, at least for me, of
DB> whether it is the best way. Is it possible to create a grammar for
DB> Lojban which has the terminator as an optional production in those places
DB> and only those places where it can safely be elided? Or are we stuck
DB> with the <error> kludge, even if it is safe? If the answer to this first
DB> question is, "We don't know.", then my question is what would be a
DB> reasonable way to investigate this question, given the lack of tools
DB> for hacking on grammars? Sorry for all the quotes, but I wanted to
DB> ensure people understood the basis for my questions.
A reasonable question, Dave, but I think you have a false dichotomy
here. After a decade, things will be clearer... :)
The first approach I tried was the error kludge.
The second and third (PLoP) were variants on topdown backtracking parsing,
which got beat to death by the exponential explosions.
The fourth (FLoP) was a bottom-up backtracker -- i.e., a regular
LALR(1) engine, but with the ambiguities left in the parser tables,
and the stackmachine hacked to backtrack at ambiguities. This gets
most of the speed of a regular LALR(1) parser (since most tokens are
unambiguous, and the shift/reduce can be done with no backtracking
needed) while allowing the non-LALR(1) (and non-ET) parts of the
grammar to be handled without special preparsing hacks.
Elidable terminators are handled by establishing a budget for ETs --
first, try to parse with no ETs inserted, then with only 1, then with
only two, etc. A fair amount of backtracking, but not intolerable,
given that you only have to parse one connected sentence at a time.
Last time I seriously thought this through, this seemed the only
approach that is simple and clearly gives you the parse you want.
Again, I urge you: If you want to tackle this, start by forgetting
YACC, sit down and formally specify what elidable terminators are
supposed to do -- what strings are you adding to the language and what
are their parses? This is not as obvious as it may seem!
Exactly! What I am proposing is that producing a context-free grammar,
or at least attempting to do so, may be a particularly productive
formal method.
When you
have this crystal clear, *then* start thinking about implementations
which will produce the result you have specified, or an acceptable
approximation.
--Guy