ANNOUNCEMENT: English-Lojban 15000+ entry dictionary draft avail on ftp
Following is the document header for the first published draft of LLG's
official English to Lojban dictionary. It is very incomplete and very
drafty, but contains 15604 entries and takes up almost 2.4 Meg unzipped
and 600K zipped. Both zipped and unzipped versions may be found on the
ftp server: ftp.cs.yale.edu directory pub/lojban/draft/dictionary
The file names are ENGDICT.GIS and ENGDICT.ZIP.
I am also uploading an incremental change to the gismu list. There have
been only a few incidental changes since the upload 2 weeks ago, but I
wanted the posted version to match the one used in the dictionary.
It is in the files GISMU.LIS and LOGDATA.RAW (identical files) in
the pub/lojban directory.
I am also publishing the place structure keyword list developed mostly
by Colin Fine and Nora. This is in file OBLIQUE.KEY in the pub/lojban
directory.
Nick's current draft lujvo list of 3800 words, which is not yet
incorporated or official, is in the pub/lojban/incoming directory, file
JVOSTE3. His list does not include the automatically generated lujvo
based on SE conversions of the gismu list, another 2200 words.
Comments are of course very welcome, though I would appreciate it if
they were clearly labeled as concerning typos, formatting issues, or
technical issues.
I have promised a dictionary for 7 years now. In July at Logfest, I
promised a draft dictionary in September (this year), and for once I
have met such a deadline, though the draft is rather less than I had
hoped to finish before publishing. This dictionary is not complete,
but it is already far better than what I thought could be accomplished
when I started. It is in good enough shape that I think it is the
most usable and complete English-order Lojban word list yet produced,
and so I commend it into the covetous hands (well, computers at least)
of the community.
Thanks go out to all of you who have supported us over the years.
Acknowledgements are due to too many people to list. A few, whose
impact was most felt in the current drive, are named in the header below.
There is a lot more to come in the near future, but this will hopefully
serve to get people going on using the language.
It will also hopefully inspire some of you to donate to the cause.
Our finances could use a boost to break even, and publishing costs have
yet to be budgeted.
.o'acaise'i.uoru'e
Enjoy!
----
lojbab lojbab@access.digex.net
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA 703-385-0273
For the artificial language Loglan/Lojban, see ftp.cs.yale.edu /pub/lojban
or see Lojban WWW Server: http://xiron.pc.helsinki.fi/lojban/
==============================================================================
English/Lojban Dictionary - First draft official publication 26 September 1994
Copyright 1994, The Logical Language Group, Inc.
Bob LeChevalier, President lojbab@access.digex.net
2904 Beau Lane, Fairfax VA 22031-1303 USA 703-385-0273
Posted for comment and reference only. Please do not redistribute
without verifying that this is the latest version available. This
version is far from final form and may contain duplicates, typos, and
more serious errors.
What this document IS:
This is a first draft of what will eventually be a much larger
English/Lojban dictionary file.
It contains 15604 English entries, one entry per line.
It contains all English entries so far derived from the official version
of the gismu list being published simultaneously (8391 entries). This
includes some references to lujvo embedded in the published gismu list.
These entries are marked with a "*" in column 1.
It contains all English entries derived from a place structure keyword
list (6828) prepared by Colin Fine and by Nora and Bob LeChevalier and
being published simultaneously with this document. These are in a
simpler, column-based, more abbreviated form than the gismu list
entries, and stand out strongly in the list due to white space. These
entries have NOT been weeded at all, and may contain useless
information; their purpose is to ensure the most complete possible
coverage of English semantic space, and they thus cannot be properly
weeded until other English-order words have been completely added.
These entries are marked with a "&" in column 1.
It contains a number of entries (140) in the same form as gismu list
entries, but for lujvo derived from abstractions and other simple
transformations of the corresponding gismu, such that the place
structure closely resembles the original gismu. These entries are
marked with a "%" in column 1.
It contains a small number (246) of miscellaneous and cross-reference
entries that propose metaphors for lujvo and point to other English
entries where relevant meanings may be found. These were added ad hoc
where the semantic coverage of the entries was recognized to omit
important meanings of the English keyword. Because they are ad hoc,
they may be erroneous or misleading, but they may meet a user's need
for a meaning when the keyword does not appear in the gismu list text
with the desired semantics. These entries are marked with a "@" in
column 1.
The above lists are merged together and sorted alphabetically. Use
column 1 sorting to separate them if desired.
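Because every entry is one line and its source is identified by the
character in column 1, separating the merged list takes only a few lines
of code. The sketch below is a hypothetical helper, not an LLG tool; it
assumes only what this header states (one entry per line, marker in
column 1) and groups any line without one of the four markers under
"other".

```python
from collections import defaultdict

# Column-1 markers as described in this header.
MARKERS = {
    "*": "gismu list",
    "&": "place structure keyword list",
    "%": "gismu-like lujvo",
    "@": "miscellaneous and cross-reference",
}

def split_by_marker(lines):
    """Group dictionary lines by their column-1 source marker.

    Blank lines are skipped; lines whose first character is not one of
    the four markers are collected under the key "other".
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key = line[0] if line[0] in MARKERS else "other"
        groups[key].append(line)
    return dict(groups)
```

Passing an open file for ENGDICT.GIS returns a dict mapping each marker
to its lines. The file's character encoding is not stated here, so any
encoding you open it with (latin-1, say) is only a guess.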
What this document is NOT:
It is unformatted, with the intent of making it usable on a computer
without an appropriate word processor/database program, but it contains
marks and guides to aid LLG in later formatting. The layout was based on
final formatting needs, and as a result the file is somewhat clumsy
when used only on a computer. Most significantly, the gismu-list-based
entries have lines extending out to around column 900 with no wrapping,
and the Lojban gismu from which the line was derived appears at the
RIGHTMOST end of that line. However, the English word may not be best
translated by that gismu, as when the entry is derived from a lujvo
proposal incorporated in the gismu list. To find these cases, you need
to examine all occurrences of the English keyword in the entry. These
occurrences are marked with the | symbol (which is easier to spot on a
computer screen than the ~ character that will be used in the formatted
document).
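The two lookups described above (the source gismu at the rightmost end
of a * line, and the keyword occurrences flagged with |) might be
sketched as follows. The header does not spell out exactly where the |
sits relative to the keyword, so this sketch assumes, hypothetically,
that it immediately precedes each marked word. The function names are
illustrative only.

```python
import re

# Assumption: "|" immediately precedes each marked occurrence of the keyword.
MARKED_WORD = re.compile(r"\|(\w+)")

def source_gismu(entry):
    """Return the rightmost whitespace-separated token of an entry,
    i.e. the gismu the line was derived from, or None for a blank line."""
    tokens = entry.split()
    return tokens[-1] if tokens else None

def marked_keywords(entry):
    """Return every word flagged with "|" in the entry, in order."""
    return MARKED_WORD.findall(entry)
```

For an invented line such as "*cat   x1 is a |cat of species x2   mlatu"
(not copied from ENGDICT.GIS), source_gismu returns "mlatu" and
marked_keywords returns ["cat"].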
This document is not complete!
- There are several thousand entries to be generated from approximately
6000 lujvo in the files generated by Nick Nicholas, John Cowan, and Bob
LeChevalier. This list may be 5 or more times the length of the
current file. John Cowan's key-word-in-context (KWIC) program, run in
two different ways on the 3000 gismu in the list back in July, produced
raw lists of 20000 and 40000 entries respectively, and we aren't sure
which will be used. Nick Nicholas also has several hundred lujvo, used
in Lojban text and discussion since the start of 1994, that have not
yet been added to his list. Whether they are included will depend on
his available time in the next couple of months.
- There is a large but unknown number of entries to be derived from the
cmavo list. John Cowan's KWIC program has automatically generated 6500
such entries. A random survey indicated that more than half of these
will be deleted before being incorporated in this list.
- There is an unknown number of entries from the gismu list yet to be
generated. These are associated with a small number of words that
occurred more than 20 times in the text of the gismu list as of last
July. The list of such words is given below, with the number of
occurrences in the gismu list. Some (like "is" and "of") will obviously
generate very few or no English entries, while others may result in
several entries. An ! marks words that seem especially likely to
generate a lot of entries. You will find few if any entries in the
gismu-list-derived format (* lines) for these words, but you may find
entries in the @ or & line formats.
1491 is
919 of
865 a
438 in
338 to
305 also
291 the
267 by
224 for
173 from
166 with
126 an
121 at
111 not
110!property
109!quantity
107!material
102 be
99 or
95!aspect
95!standard
91!made
91 on
91!species
86 and
81!body
81!object
79 about
73!part
66!contains
66 may
65!culture
59 as
59!form
56!under
52!reflects
51!conditions
50 that
49!metaphor
48!nationality
46!event
46!source
45!dimension
44!shape
42 are
40!composition
39 including
39!subject
38!language
37!breed
35!location
33!commodity
33!state
32 non
31 this
30 into
30!type
30 which
29!strain
28!function
28!person
27!set
26 but
26!need
26!specific
25!audience
25!frame
25 has
25!necessarily
25!purpose
25!reference
25!surface
24 agentive
24!action
24!locus
23!time
22!among
22!place
22!over
22!system
21!direction
21!activity
21!process
21!force
20!properties
20!objects
20!means
20!point
20!used
20!tool
20!sumti
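The header does not describe how John Cowan's KWIC program works, but
the general key-word-in-context technique, together with the
more-than-20-occurrences cutoff used for the list above, can be sketched
as below. Everything here is a hypothetical illustration: the function
names, the tokenization (lowercased, punctuation trimmed, place variables
such as x1 and hyphenated forms ignored), the context width, and the
simplified definitions in the usage note are assumptions, so the counts
it produces will not necessarily match the list above.

```python
from collections import Counter

PUNCTUATION = ".,;:()[]\"'"

def normalize(raw):
    """Lowercase a token and trim punctuation; return "" for non-words
    such as place variables (x1, x2) or hyphenated forms."""
    word = raw.lower().strip(PUNCTUATION)
    return word if word.isalpha() else ""

def word_counts(definitions):
    """Count each normalized word across all {gismu: definition} texts."""
    counts = Counter()
    for text in definitions.values():
        counts.update(w for w in map(normalize, text.split()) if w)
    return counts

def frequent_words(counts, threshold=20):
    """Words occurring more than `threshold` times, most frequent first."""
    return sorted((w for w, n in counts.items() if n > threshold),
                  key=lambda w: (-counts[w], w))

def kwic(definitions, skip=frozenset(), width=3):
    """Key-word-in-context index over {gismu: definition}.

    Emits one (keyword, context, gismu) triple per word occurrence, with
    up to `width` words of context on each side and the keyword flagged
    with "|". Words in `skip` are not indexed. Result is sorted.
    """
    entries = []
    for gismu, text in definitions.items():
        words = text.split()
        for i, raw in enumerate(words):
            key = normalize(raw)
            if not key or key in skip:
                continue
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            context = f"{left} |{raw} {right}".strip()
            entries.append((key, context, gismu))
    return sorted(entries)
```

With two simplified definitions, {"mlatu": "x1 is a cat of species x2",
"gerku": "x1 is a dog of breed x2"}, a threshold of 1 flags "a", "is"
and "of"; skipping those, the KWIC pass indexes only "breed", "cat",
"dog" and "species", each with a few words of context and its gismu.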
This document is very "drafty". It has not been spell-checked, nor have
duplicates been weeded (and a lot of duplicates are expected in the
file, especially due to overlaps between & lines and * lines).
This document is subject to change, replacement, or removal at any time
without notice. Specifically, ANY change to a gismu place structure at
this point will require some amount of rework, and there are open
technical issues that could affect a number of place structures (though
we hope they will not).