In a previous document, we tried to describe the relation between a word and a set of definitions. The more we consider the problem, the more we understand that a word is a much more complex entity than we initially suspected.
In order to develop our translation engine, we must begin to develop some ideas of the structures within words. We do not need to implement everything right away, but we must always be mindful that future modules will be grafted upon our product.
At this point, our toolbox will only do a word-by-word translation, but will not attempt to address syntax. The proper meaning of a word does not emerge until it is contextualized within a full sentence. Although we will not attempt to address this yet, we must design our software to prepare for future sentence-level translations.
In this document, we will attempt to map out two fundamental structures of words: lexical and grammatical. We are unsure exactly how these two aspects should relate to each other. We do not know how ambitious we should be in attempting to map one onto the other.
Our knowledge of these structures is limited to the languages we know and the grammar books we have on hand. Undoubtedly, we are forgetting to list some structures. Furthermore, we are as yet incapable of defining structures in languages we do not know, such as Chinese or Finnish.
We are further unsure about what precisely we are designing in this document. Is this an Object design map? Is this an XML definition? We have read about XML, but never used it, so we are not too comfortable with it yet. We hope that our programmer friends will help clarify for us the distinction between objects and XML.
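As a tentative illustration of that distinction, the sketch below shows the same word entry twice: once as an object living inside a running program, and once as XML text that could be saved to a file or exchanged between programs. It uses Python and its standard xml.etree.ElementTree module. The class name WordEntry, the helpers to_xml and from_xml, and the element names "word" and "definition" are our own placeholders, not an agreed design.

```python
import xml.etree.ElementTree as ET


class WordEntry:
    """In-memory object: a word plus its list of definitions (hypothetical)."""

    def __init__(self, text, definitions):
        self.text = text
        self.definitions = list(definitions)


def to_xml(entry):
    """Write the object out as XML text (element names are assumptions)."""
    root = ET.Element("word", {"text": entry.text})
    for definition in entry.definitions:
        ET.SubElement(root, "definition").text = definition
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text):
    """Rebuild the object from the XML text produced by to_xml."""
    root = ET.fromstring(xml_text)
    definitions = [node.text for node in root.findall("definition")]
    return WordEntry(root.get("text"), definitions)


entry = WordEntry("quiero", ["desire", "want", "wish"])
xml_text = to_xml(entry)
restored = from_xml(xml_text)
print(xml_text)
```

Seen this way, the two are not competing choices: the object is what the program manipulates, and the XML is one possible way to store or transmit it.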
In our previous document, we sketched out the following one-to-many relationship between words and definitions:
WORD                    DEFINITIONS
----                    ------------
                        desire
quiero ---------------> want
                        wish
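One simple way to picture this one-to-many relationship in code is a mapping from each word to a list of definitions. This is only a sketch in Python; the names DEFINITIONS and lookup are placeholders of our own.

```python
# Hypothetical sketch: each word maps to a list of candidate definitions.
DEFINITIONS = {
    "quiero": ["desire", "want", "wish"],
}


def lookup(word):
    """Return every known definition of a word, or an empty list if unknown."""
    return list(DEFINITIONS.get(word, []))


print(lookup("quiero"))  # ['desire', 'want', 'wish']
```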
We immediately recognized, however, that a word is actually a composite of prefixes, roots, and suffixes.
Rather than considering an entire word at once, perhaps we should consider a word as a set of "grams."
Thus, the Spanish word "damelo" would consist of the following three grams: "da", "me", and "lo".
Each one of these grams would have a one-to-many relationship with definitions:
WORD        INFINITIVE    DEFINITION_LIST
-----       ------------  ----------------
da -------> dar ----->    to give
                          to strike
                          to emit

WORD                DEFINITIONS
----                ------------
                    me
me ---------------> to me

WORD                DEFINITIONS
----                ------------
                    it
lo ---------------> him
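The idea of a word as a set of grams, each with its own list of definitions, might be sketched as follows. This is an illustration under our own assumptions: the class names Gram and Word, the optional infinitive field, and the gloss method are hypothetical, and the split of "damelo" into grams is supplied by hand rather than computed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Gram:
    """One piece of a word, with its own one-to-many list of definitions."""
    text: str
    definitions: List[str]
    infinitive: Optional[str] = None  # only filled in for verb grams


@dataclass
class Word:
    """A word viewed as an ordered set of grams."""
    text: str
    grams: List[Gram] = field(default_factory=list)

    def gloss(self):
        """Word-by-word gloss: each gram paired with all of its definitions."""
        return [(gram.text, gram.definitions) for gram in self.grams]


# The segmentation below is entered by hand; nothing here splits words.
damelo = Word("damelo", [
    Gram("da", ["to give", "to strike", "to emit"], infinitive="dar"),
    Gram("me", ["me", "to me"]),
    Gram("lo", ["it", "him"]),
])

for text, definitions in damelo.gloss():
    print(text, "->", ", ".join(definitions))
```

Keeping the grams in an ordered list preserves their position within the word, which may matter later when sentence-level translation is attempted.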
As we said above, we know that a full grammatical understanding of a word only becomes apparent in a sentence. We also do not know how this grammatical structure maps onto a lexical understanding of a word as a set of grams, nor whether this is an Object design or an XML design.
In any case, we feel it might be useful to start sketching out different grammatical properties of different types of words. This is not a complete list by any means.
Here are some structures basic to nouns. We still need to consider to what degree these structures apply to proper nouns and to pronouns. We have not fully considered how the whole network of pronouns best fits into this diagram.
Here are some structures basic to verbs. We doubt that we have mapped out all of the structures of a verb or even that we have necessarily categorized everything correctly. This is only a preliminary attempt.
We are almost certain that we need to rearrange and further subdivide this category.
We have undoubtedly not considered all the grammatical cases. Although we can and will develop this map, we must always preserve an "Other" category for cases we have not yet considered.
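A minimal sketch of how such an "Other" category could be preserved in code is shown below, using a Python enumeration. The name WordType, its members, and the parse helper are assumptions made for illustration; only nouns and verbs are listed because those are the types discussed so far.

```python
from enum import Enum


class WordType(Enum):
    """Hypothetical list of word types; deliberately incomplete."""
    NOUN = "noun"
    VERB = "verb"
    OTHER = "other"  # catch-all for cases we have not yet considered

    @classmethod
    def parse(cls, label):
        """Map a label to a known type, falling back to OTHER."""
        try:
            return cls(label.strip().lower())
        except ValueError:
            return cls.OTHER


print(WordType.parse("Verb"))
print(WordType.parse("ideophone"))
```

Because unknown labels fall back to OTHER instead of raising an error, new categories can be added later without breaking data that was classified earlier.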