Elision, or: Nick rides again in jbonai
Yes, you were being so snappy and irritable that you completely
missed the point of my relatively pointless postscript: you
overlooked the seventh sentence. Go back and look at it again.
^ The elidable terminators make the language unambiguous, but may often be
^ ^^ ????????
^Did you mean "ambiguous"?
No he does not. Whatever do you mean, Guy?
Ah, now I understood what was meant: the modifier "elidable" was meant
purely for identificational purposes rather than for purposes of
characterization. I mistook the thought to be:
If the language didn't have elidable terminators, it would
be ambiguous; the presence of elidable terminators (as opposed
to any other kind) somehow makes the language unambiguous.
This seemed to me to state a falsehood, and I wanted to correct it to:
If the language didn't have elidable terminators, it would
be unambiguous; the presence of elidable terminators (as opposed
to any other kind) somehow makes the language ambiguous.
which has the ring of truth. But the thought apparently intended was actually:
The terminators in question, namely those that happen to be
elidable (that's a useful way for me to identify them to you),
are required in the language to avoid ambiguity, though in
practice they may be omitted in many places.
So the construction in that English sentence was itself ambiguous.
(Do you remember that Saturday Night Live sketch with Ed Asner as
supervisor of a nuclear power plant, about to go on vacation?
His parting words were, "Remember: you can't give the reactor
too much coolant!" Those left behind spent the rest of the sketch
discussing whether he meant that they mustn't exceed a given
threshold or that there was no such threshold.)
^Hm. It seems to me that if the "official" grammar allows such elision
^in practice, then it behooves the language definers to produce a more
^elaborate grammar that takes this into account, if it can be done using
^a context-free grammar. But if the resulting grammar is context-sensitive,
^then allowing such elision may be a bad idea in the first place.
Think again. In JL13, lojbab's YACC has no problem in filling in the
missing terminals.
Success with YACC is a lousy existence proof for having done the job right.
Consider C, the original application for YACC: it has the classic dangling-ELSE
problem:
    if (x > 0)
        if (y > z) y = 3;
    else x = 4;
The dangling-ELSE ambiguity is resolved in practice by using an ambiguous
grammar, as given in Kernighan and Ritchie's book (The C Programming
Language, both editions), and then using a piece of code in the semantic
productions that provides a context-sensitive patch: ELSE goes with the
innermost eligible IF. So it causes the misindented example given above to
be interpreted as
    if (x > 0) {
        if (y > z) y = 3;
        else x = 4;
    }
and not as
    if (x > 0) {
        if (y > z) y = 3;
    }
    else x = 4;
(I must stress that this patch is *not* part of the grammar proper.
It's a piece of C code.)
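
The binding described above can be checked directly with a compiler. What follows is only a minimal sketch, assuming a conforming C compiler; the names run_example and check_inner_binding are invented here for illustration. The indentation deliberately copies the misleading layout of the example, so a compiler may warn about the ambiguous else.

```c
#include <assert.h>

/* Runs the misindented example and returns the final value of x.
   The layout suggests that else belongs to the outer if; the
   compiler pairs it with the inner one regardless. */
int run_example(int x, int y, int z)
{
    if (x > 0)
        if (y > z) y = 3;
    else x = 4;
    return x;
}

/* Each assertion holds only if else pairs with the innermost if. */
void check_inner_binding(void)
{
    assert(run_example(1, 0, 5) == 4);   /* inner test fails, so else runs  */
    assert(run_example(-1, 0, 5) == -1); /* outer test fails, else skipped  */
    assert(run_example(1, 9, 5) == 1);   /* inner test succeeds, x unchanged */
}
```

If else were paired with the outer if instead, the first call would return 1 and the second would return 4.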
From this point of view, the ability to elide the disambiguating
braces in C is exactly analogous to being able to elide terminators
in lojban.
But that is not the only way to deal with the problem. It is not
difficult to eliminate the dangling-ELSE ambiguity explicitly in the
grammar. Instead of writing:
    statement:
        expression ;
        break ;
        continue ;
        { declaration-list/opt statement-list/opt }
        if ( expression ) statement
        if ( expression ) statement else statement
        while ( expression ) statement
        do statement while ( expression ) ;
        ...
you write something like this:
    statement:
        dangling-statement
        non-dangling-statement
    primitive-statement:
        expression ;
        break ;
        continue ;
        { declaration-list/opt statement-list/opt }
        do statement while ( expression ) ;
    non-dangling-statement:
        primitive-statement
        if ( expression ) non-dangling-statement else non-dangling-statement
        while ( expression ) non-dangling-statement
        ...
    dangling-statement:
        if ( expression ) statement
        if ( expression ) non-dangling-statement else dangling-statement
        while ( expression ) dangling-statement
        ...
The idea is that "primitive statements", which do not have trailing embedded
statements, are the base case of non-dangling-ness, whereas an IF statement
without an ELSE is the source of dangling-danger. Statements that have
trailing embedded statements, such as WHILE, have the same danglingosity as
the embedded statement. (This means that such statements must appear in the
grammar in both dangling and non-dangling forms.) Only a non-dangling
statement may appear before an ELSE: this is what eliminates the ambiguity.
Thus, with this grammar it is impossible for my example to be misinterpreted,
and no special-case code is needed to ensure that there is only one parse.
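
One way to test the single-parse claim is to count derivations under both grammars for a short token string. The sketch below is a brute-force counter, not a real parser. It assumes a made-up one-letter token alphabet ('s' for any primitive statement, 'i' for "if ( expression )", 'e' for "else", 'w' for "while ( expression )"), and every function name in it is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Each function returns the number of distinct derivations of the
   tokens t[i..j) from one nonterminal. */

/* Ambiguous grammar:  S -> s | i S | i S e S | w S */
static long amb(const char *t, int i, int j)
{
    long n;
    int k;
    if (i >= j) return 0;
    if (t[i] == 's') return (j - i == 1) ? 1 : 0;
    if (t[i] == 'w') return amb(t, i + 1, j);
    if (t[i] != 'i') return 0;
    n = amb(t, i + 1, j);                      /* if without else */
    for (k = i + 1; k < j; k++)                /* try every else position */
        if (t[k] == 'e')
            n += amb(t, i + 1, k) * amb(t, k + 1, j);
    return n;
}

static long dangling(const char *t, int i, int j);

/* N -> s | i N e N | w N */
static long non_dangling(const char *t, int i, int j)
{
    long n = 0;
    int k;
    if (i >= j) return 0;
    if (t[i] == 's') return (j - i == 1) ? 1 : 0;
    if (t[i] == 'w') return non_dangling(t, i + 1, j);
    if (t[i] != 'i') return 0;
    for (k = i + 1; k < j; k++)
        if (t[k] == 'e')
            n += non_dangling(t, i + 1, k) * non_dangling(t, k + 1, j);
    return n;
}

/* S -> D | N */
static long statement(const char *t, int i, int j)
{
    return dangling(t, i, j) + non_dangling(t, i, j);
}

/* D -> i S | i N e D | w D */
static long dangling(const char *t, int i, int j)
{
    long n;
    int k;
    if (i >= j) return 0;
    if (t[i] == 'w') return dangling(t, i + 1, j);
    if (t[i] != 'i') return 0;
    n = statement(t, i + 1, j);                /* if without else */
    for (k = i + 1; k < j; k++)                /* only N may precede else */
        if (t[k] == 'e')
            n += non_dangling(t, i + 1, k) * dangling(t, k + 1, j);
    return n;
}

long parses_ambiguous(const char *s)
{
    assert(s != NULL);
    return amb(s, 0, (int)strlen(s));
}

long parses_stratified(const char *s)
{
    assert(s != NULL);
    return statement(s, 0, (int)strlen(s));
}
```

Under this encoding the misindented example is the string "iises". parses_ambiguous("iises") gives 2, and parses_stratified("iises") gives 1. The counter is exponential in the worst case, which is harmless for strings this short.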
I am proposing that it is better for some purposes, if feasible, to produce a
more complex grammar for lojban that would eliminate the need for
context-sensitive side-conditions that dictate when elision is permissible.
The C grammar has another bad property that cannot be eliminated with
a more complex grammar: TYPEDEF symbols. It is impossible to determine
the meaning of the statement
printf(x);
without knowing whether or not it has been preceded by
typedef int printf;
if it has, then printf(x); is a declaration of x as an integer,
not a statement at all. This is also taken care of by parse-time
semantic code. (Imagine a variant of lojban in which you could
declare "little words" on the fly!)
typedef at John;
I gave John the office. /* This means "I gave at the office." */
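
The typedef effect can be reproduced in a few lines of standard C. This is only an illustrative sketch: the identifier note stands in for printf so that nothing collides with the standard library, and the names note, as_call, and as_declaration are invented for the example.

```c
#include <assert.h>

static int calls = 0;

/* An ordinary function; the argument is ignored and the call is counted. */
static void note(int v)
{
    (void)v;
    calls++;
}

/* Here `note(x);` is an expression statement that calls the function.
   Returns the running number of calls made so far. */
int as_call(void)
{
    int x = 7;
    note(x);
    return calls;
}

/* Here the same tokens `note(x);` form a declaration, because the
   block-scope typedef hides the function name. */
int as_declaration(void)
{
    typedef int note;   /* note now names a type inside this block */
    note(x);            /* equivalent to: int x; */
    x = 7;
    return x;
}
```

Calling as_call(), then as_declaration(), then as_call() again returns 1, 7, and 2. The call counter does not move during as_declaration, because no function was called there.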
As for what the grammar allows or disallows - my God,
have you actually ever written a lojban sentence?! There is no need
to dot every i and cross every chicken across the road.
I agree. But I am not saying that the sentences generated by the grammar
should be explicit; I am saying the grammar itself should be stated in
an explicit form, rather than having parts of it written in C.
A language in
which I'd have to put in {ku} after every single damn sumti is a language
I would not stick around in. In {le klama le seklama cu klama}, it is
fairly obvious to me (and I don't really think it needs codifying)
Maybe it's obvious to you, but it may not be obvious to a lot of your friends
(some of whom might be computers :-). So here is where we disagree: I feel
*very strongly* that it does deserve codifying in BNF (as opposed to
code-ifying in C!).
that
{le klama} and {le se klama} are two distinct sumtis, and that starting
a new sumti with {le} means the old sumti must have finished (elided {ku}).
I don't know enough about parsing to tell whether the BNF handles such
elisions as well as did the YACC: JC's presentation of what should be
obvious was admittedly handwaving a bit. But people have no problem with
elision, and machines have no major problem with it either, so where's
the problem? However handwaving rule 10 was (and all you've got to do
is read five lines of lojban to realise that it doesn't really matter),
JC's BNF is cool. It's a pleasure to be actually able to check through
a structure's validity in half a minute.
I don't want to belittle the value of partial grammars. But using an
incomplete grammar (such as the one in the back of K&R) doesn't tell you that
a structure is valid. It may tell you that it is *invalid*, and if it does
so you will have found out quickly, which is a useful thing. But conforming
to an incomplete grammar does not guarantee correctness.
The analogy: my "bad" C example will be considered valid by the K&R grammar,
but misparsed. Similarly, if I incorrectly omit a lojban terminator, the
existing grammar may consider the result valid, but it will be misparsed.
For purposes of formal rigor and completeness, an unambiguous context-free
grammar is desirable, even though it may be larger than what is useful for
tutorial purposes.
--Guy Steele