3.2.2 The Grammar and the Lexicon
The corpus generator used for CLASPnet is based on a traditional context-free grammar (Note 12): there is an initial symbol, a set of non-terminal symbols, a set of terminal symbols, and a collection of rewriting rules which turn a non-terminal symbol (the left-hand side) into either further non-terminal symbols or a terminal symbol (the right-hand side). As is customary, the initial symbol is the S of 'sentence'. Figure 3.3 shows a few of the 976 rewriting rules of the grammar (see Appendix A.1 for the full listing):
s -> s1
s -> s1 san s1
san -> and
san -> or
s1 -> np1_s vpp period
vpp -> vpt
vpt -> vt np_o
vpp -> aux vpi
vpi -> vi
np1_s -> detp1_hum
np1_s -> prsubj
detp1_hum -> det adj n
prsubj -> i
prsubj -> he
nl -> pilots
aux -> can
adj -> interesting
nl2 -> cats
det -> the
vt -> chase
vi -> sleep
np_o -> detp_ani
detp_ani -> plu nl2
plu -> the
period -> .
This particular extract of the grammar can generate sentences like: 'I sleep .', 'the interesting pilots can chase the cats .', and 'I sleep . and the pilots sleep .' The complete grammar produces a far greater variety of sentences: e.g. 'in the barn , the limited numbers of happy teachers , that he can not love , shall smile .', 'a mixture of women be being caught by a dog .', 'we can not ask what cat can the wolves like .', 'whom shall the numbers of friendly girls never like near the boat ? and who will never love you near the roof ?', 'she shall promise her to chase a dog or the fish .', 'you ought never to be a nurse .', 'be young !', 'the great collection of young divers shall not wash a huge number of eagles .', 'where will the very hungry expert we chase paint ?', 'the group of eagles shall not tease the pilots or the girls .', 'a pilot do not wash a great teacher in a sled . and he can not flee near the house .', 'I can not inquire whom shall be missed behind a yacht . and the lovely student will inquire if a cow have been the right scientist of the united kingdom .'
A few comments about these sentences are in order:
Despite their numerous defects, the sentences can still be understood without too many problems. (When judging their quality, one should also compare them to the type of sentences which have been used in other connectionist projects: e.g. 'boy who chase boy chase boy', or 'school-girl stirred kool-aid with spoon'.)
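To make the rewriting process concrete, the extract of Figure 3.3 can be turned into a small random generator. The following Python sketch is only an illustration and not the actual CLASPnet corpus generator: the names RULES, expand, generate and all_expansions are hypothetical, rule selection is assumed to be uniformly random, and the `n` of `detp1_hum -> det adj n` is assumed to denote the same category as `nl`.

```python
import random
from itertools import product

# Rules transcribed from the extract in Figure 3.3.
# ASSUMPTION: the `n` in `detp1_hum -> det adj n` is read as `nl`,
# the noun category of the extract that rewrites to 'pilots'.
RULES = {
    "s": [["s1"], ["s1", "san", "s1"]],
    "san": [["and"], ["or"]],
    "s1": [["np1_s", "vpp", "period"]],
    "vpp": [["vpt"], ["aux", "vpi"]],
    "vpt": [["vt", "np_o"]],
    "vpi": [["vi"]],
    "np1_s": [["detp1_hum"], ["prsubj"]],
    "detp1_hum": [["det", "adj", "nl"]],
    "prsubj": [["i"], ["he"]],
    "nl": [["pilots"]],
    "aux": [["can"]],
    "adj": [["interesting"]],
    "nl2": [["cats"]],
    "det": [["the"]],
    "vt": [["chase"]],
    "vi": [["sleep"]],
    "np_o": [["detp_ani"]],
    "detp_ani": [["plu", "nl2"]],
    "plu": [["the"]],
    "period": [["."]],
}


def expand(symbol, rng):
    """Rewrite `symbol` until only terminal symbols remain."""
    if symbol not in RULES:
        # A symbol without a rule is a terminal: nothing left to rewrite.
        return [symbol]
    # ASSUMPTION: every rule of a left-hand side is equally likely.
    rhs = rng.choice(RULES[symbol])
    words = []
    for sym in rhs:
        words.extend(expand(sym, rng))
    return words


def generate(seed=None):
    """Return one random sentence derived from the initial symbol `s`."""
    rng = random.Random(seed)
    return " ".join(expand("s", rng))


def all_expansions(symbol):
    """Enumerate every terminal string derivable from `symbol`.

    This terminates only because the extract contains no recursive rule.
    """
    if symbol not in RULES:
        return [[symbol]]
    results = []
    for rhs in RULES[symbol]:
        parts = [all_expansions(sym) for sym in rhs]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results
```

Calling generate() returns one sentence of the extract, such as 'he can sleep .'. Since the extract contains no recursive rule, all_expansions("s") enumerates its complete, finite set of sentences; the full grammar, which allows embedded clauses, would instead need a limit on sentence length or recursion depth.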
In conclusion, I will give some statistical data about the grammar and the lexicon: in the grammar, there are about 150 rules for nominal phrases, and 400 rules which describe clauses (slightly more than half of these describe active clauses); the lexicon has about 14 adjectives, 5 adverbs, 4 auxiliaries, 9 connectors, 6 determiners, 55 nouns (not counting plurals), 2 negation morphemes, 16 pronouns, and 56 verbs (not counting derived forms).
There have already been a few attempts to generate sentences using connectionist networks rather than phrase-structure grammars (e.g. Kalita & Shastri 1994; Van der Velde 1995). None of these models, however, comes close to generating the various types of clauses and sentences that are used in CLASPnet.
Here is a sample of what can be produced when the sentence length restriction is disabled: 'what child can not hit a number of cats behind the yacht ? and where shall a aggressive cow , who will not see a expert , whom the woman and a woman can never miss , , chase the large groups of happy horses near the roof ?'.