3.3.4.1 An Abundance of Words
In a first experiment I increased the length of the input sentence beyond 20 words, the maximum sentence length the network had seen during training. No major problems arose with sentences of 30 or 40 words, as long as embedded clauses were avoided. The final such sentence contained more than 50 words and read: 'the more the interesting sets of very very happy women ask the large collections of very very sad men to chase the sharks near the barn, the less the group of very very aggressive children can announce to the very very moody expert that she will be the best president for the federal republic of germany.', admittedly not a sentence likely to be uttered in real life any time soon. The network made one clear error while parsing this sentence: when it saw the second 'collections', the high activation value of the Status unit suddenly vanished, as if a subordinate clause had started there (although the ID units continued to show the correct value). The Status unit then remained off until 'the less' was processed; at that moment it correctly jumped back to a value close to 1 and stayed there until the complement 'that'-clause started. The Mood units also correctly showed the entire sentence to be indicative.
The next sentence was designed to test how well the network handles long tail-recursive strings: 'the man be worshiping the woman, who miss the experts, who like the tigers, that devour the cows, which tease the cow, which sleep.'. The word-by-word analysis showed that the individual relative clauses were processed correctly. The activation values of the Voice and Polarity units dropped to medium values at the beginning of each clause before rising again as it became clear that the clause was active and positive. The ID units, however, did not behave as desired: the fourth, fifth and sixth clauses of the sentence were not distinguished from the third. As mentioned before, this is probably because the training corpus does not contain a single sentence with four clauses, let alone five or six.
As the previous sentences had failed to reveal the limits of the network's memory, I tried a more brutal approach: repeating the same word over and over again, in the hope that the network would at some stage lose track of the rest of the clause. However, even with 'it be said by the very very very very very very very very very very very very very very very very very very very very friendly girl that she will never love you.' no such lapse could be provoked. The network did 'forget' that the matrix clause was passive after the fifth instance of 'very', but when the 'that'-clause arrived 15 words later, the ID units changed from 1 to 2, the activation value of the Declarative unit dropped from 0.99 to 0, and the Complement unit went from 0 to 0.99!
What these sentences show is that the network functions by moving from certain regions in a high-dimensional space to other regions on the basis of which words it sees (cf. Elman 1995). The long list of 'very's does not move the network from one state to another -- it just pushes the network closer to the center of the region in which an adjective is expected, followed by a noun, and then a complement-clause. When the adjective finally arrives, the net can jump from that state to the one in which only a noun and a complement-clause are expected. However, as some of the experiments below indicate, the network does not emulate a real Finite State Machine: even when expected words are omitted, it can usually still move to the correct state (skipping the missing one, as it were).
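The state-space picture described above can be illustrated with a toy recurrent update. The sketch below is not the network used in these experiments: the weights are random rather than trained, the two-word vocabulary, the four hidden units and all function names are hypothetical, and the recurrent weights are deliberately rescaled so that the update is a contraction. That last choice is an assumption made only so that convergence is guaranteed. Under these assumptions, feeding the same word repeatedly moves the hidden state by ever smaller amounts towards a single point, whereas a different word produces a clear jump.

```python
import math
import random

VOCAB = ["very", "friendly"]  # hypothetical two-word vocabulary


def one_hot(word):
    """Encode a word as a one-hot vector over VOCAB."""
    return [1.0 if w == word else 0.0 for w in VOCAB]


def make_weights(n_hidden, n_input, seed=0, max_row_sum=0.5):
    """Random (untrained) weights for an Elman-style update.

    Each row of the recurrent matrix W is rescaled so that its absolute
    values sum to max_row_sum < 1. Because tanh is 1-Lipschitz, this makes
    the update a contraction in the sup norm (an assumption of this sketch,
    not a property claimed for the trained network).
    """
    rng = random.Random(seed)
    W = []
    for _ in range(n_hidden):
        row = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        scale = max_row_sum / sum(abs(v) for v in row)
        W.append([v * scale for v in row])
    U = [[rng.uniform(-1.0, 1.0) for _ in range(n_input)]
         for _ in range(n_hidden)]
    return W, U


def step(W, U, h, x):
    """One recurrent update: h' = tanh(W h + U x)."""
    return [
        math.tanh(sum(w * hj for w, hj in zip(w_row, h))
                  + sum(u * xj for u, xj in zip(u_row, x)))
        for w_row, u_row in zip(W, U)
    ]


def sup_dist(a, b):
    """Largest coordinate-wise difference between two state vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


W, U = make_weights(n_hidden=4, n_input=len(VOCAB))
h = [0.0] * 4

# Feed 'very' twenty times and record how far each repetition moves the state.
moves = []
for _ in range(20):
    h_next = step(W, U, h, one_hot("very"))
    moves.append(sup_dist(h_next, h))
    h = h_next

# Now feed a different word and measure the resulting jump.
jump = sup_dist(step(W, U, h, one_hot("friendly")), h)

print("movement after 1st 'very': ", moves[0])
print("movement after 20th 'very':", moves[-1])
print("jump on 'friendly':        ", jump)
```

Each repetition of 'very' at least halves the distance travelled, so the state settles rather than drifting, while the first different word moves it by a much larger amount. A trained network need not be a contraction, so this shows only the qualitative behaviour, not the mechanism actually learned.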