3.3.3.3 Infinity, Voice, and Polarity
The last three output units of CLASPnet deal with three typical clausal properties: infinity, voice, and polarity. The network's overall performance on these units is summarized in Figure 3.14.
The result for the Infinity unit is similar to the ones we have already seen: close to perfect. Although relatively few input patterns in the corpora should switch the Infinity unit on (approx. 2,500 out of 43,000), these patterns are readily recognizable: the -'ing' forms can be spotted by their final morpheme, and the '(in order) to' + infinitive forms are also unambiguous. The task for the -'ing' forms is complicated by the presence of the adjective 'interesting' and by progressive verb forms, both active (as in 'the sharks be catching the fish.') and passive (as in 'a great dog be being pursued by the man.'). In the word-by-word analyses, the network's errors on the Infinity unit appear to be random: every once in a while, a word is misclassified.
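To see why the -'ing' cue is not entirely reliable, the following Python sketch applies a naive surface rule (switch Infinity on for any word ending in 'ing' or directly following 'to') to example sentences from this section. The rule and the function names `naive_infinity_flags` and `flagged_words` are hypothetical illustrations, not part of CLASPnet; the sketch assumes, as the discussion above implies, that 'interesting' and the -'ing' forms inside progressives should not switch the unit on.

```python
from typing import List


def naive_infinity_flags(words: List[str]) -> List[bool]:
    """Hypothetical surface rule: flag a word if it ends in 'ing' or directly
    follows 'to'. This is NOT how CLASPnet works; it only illustrates the cue."""
    flags = []
    for i, word in enumerate(words):
        after_to = i > 0 and words[i - 1] == "to"
        flags.append(word.endswith("ing") or after_to)
    return flags


def flagged_words(sentence: str) -> List[str]:
    """Return the words of a sentence that the naive rule would flag."""
    words = sentence.rstrip(".").split()
    return [w for w, f in zip(words, naive_infinity_flags(words)) if f]


if __name__ == "__main__":
    # Example phrases taken from the text. In the first three, the rule fires
    # on a word that (by assumption) should leave the Infinity unit off.
    for s in ["the very interesting boys do not",
              "the sharks be catching the fish.",
              "a great dog be being pursued by the man.",
              "in order to catch the fish"]:
        print(s, "->", flagged_words(s))
```

The rule catches the genuine '(in order) to' case but also fires on 'interesting', 'catching', and 'being', which is exactly the kind of ambiguity the network has to learn to resolve from context.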
The scores for the other two units are less impressive: still above 65%, but nowhere near the 'normal' 90%. The percentages of missed and spurious errors confirm these poor results.
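The missed and spurious error percentages for a single output unit can be computed along the following lines. This is a minimal sketch under stated assumptions: the 0.5 threshold and the denominators (words whose target is on for missed errors, words on which the unit's output is on for spurious errors) are not specified in the text, and the function name `missed_and_spurious` is hypothetical.

```python
from typing import Sequence, Tuple


def missed_and_spurious(targets: Sequence[int],
                        activations: Sequence[float],
                        threshold: float = 0.5) -> Tuple[float, float]:
    """Return (missed %, spurious %) for one binary output unit.

    ASSUMPTIONS: an activation >= threshold counts as 'on';
    missed %   = misses / number of words whose target is on;
    spurious % = false alarms / number of words on which the output is on.
    """
    outputs = [a >= threshold for a in activations]
    target_on = sum(1 for t in targets if t)
    output_on = sum(outputs)
    # Missed error: the unit should be on but stays off.
    missed = sum(1 for t, o in zip(targets, outputs) if t and not o)
    # Spurious error: the unit comes on although it should stay off.
    spurious = sum(1 for t, o in zip(targets, outputs) if o and not t)
    missed_pct = 100.0 * missed / target_on if target_on else 0.0
    spurious_pct = 100.0 * spurious / output_on if output_on else 0.0
    return missed_pct, spurious_pct


if __name__ == "__main__":
    targets = [1, 1, 0, 0]
    activations = [0.9, 0.3, 0.6, 0.1]
    print(missed_and_spurious(targets, activations))  # (50.0, 50.0)
```

Other denominators (for example, all words whose target is off for spurious errors) are equally plausible; the choice changes the absolute percentages but not the direction of a comparison made with the same definition.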
The cause of these errors becomes clear once one realizes that the network has to classify each and every word as part of a positive or negative and an active or passive clause. For the first few words of almost every clause, this task is plainly impossible: after seeing 'the', 'the very', 'the very interesting', or 'the very interesting boys', even native speakers cannot determine whether the words of this noun phrase belong to an active or a passive clause. Only when the sentence has progressed to 'the very interesting boys do' is there sufficient evidence that the clause is active, and to be completely sure that it has negative polarity, one has to wait for yet another word: 'the very interesting boys do not'.

The desired behavior for a connectionist model, then, is to postpone a final judgment about the voice and polarity of a clause until it has gathered enough evidence (i.e. the presence or absence of certain critical markers) to be sure. This is exactly the kind of behavior CLASPnet tends to exhibit. At the start of each clause, the activation values of the Voice and Polarity units tend to jump to values between 0.2 and 0.8, and they then fluctuate within that range until the voice and polarity of the clause can be ascertained. Comparing the results for these two units on the first three words of each sentence with those on the last three words reveals very large differences (see Table 3.4; the numbers are from the test corpus).
| Unit | First three words: missed errors | First three words: spurious errors | Last three words: missed errors | Last three words: spurious errors |
|---|---|---|---|---|
| Voice | 44% | 70% | 11% | 19% |
| Polarity | 62% | 62% | 23% | 9% |
Table 3.4: A comparison of the performance of the Voice and Polarity units on the first three and on the last three words of each sentence.
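The postponement behavior can be made measurable by locating the word at which a unit commits to a value. The sketch below uses the 0.2 to 0.8 band mentioned above; the function `commitment_point` and the activation trace in the usage example are hypothetical illustrations, not CLASPnet output or data from the test corpus.

```python
from typing import Optional, Sequence


def commitment_point(trace: Sequence[float],
                     low: float = 0.2,
                     high: float = 0.8) -> Optional[int]:
    """Index of the first word from which the activation stays outside the
    uncommitted band [low, high] until the end of the clause, or None if the
    unit is still inside the band at the last word."""
    point = None
    for i, a in enumerate(trace):
        if low <= a <= high:
            point = None   # back inside the band: no commitment yet
        elif point is None:
            point = i      # first word of a run outside the band
    return point


if __name__ == "__main__":
    words = ["the", "very", "interesting", "boys", "do", "not"]
    # HYPOTHETICAL Polarity-unit trace, invented for illustration only.
    trace = [0.5, 0.45, 0.6, 0.55, 0.4, 0.95]
    i = commitment_point(trace)
    print("commits at:", words[i] if i is not None else "never")
```

Averaging such commitment points over a corpus would give one number summarizing how long the network waits, complementing the first-three versus last-three comparison in Table 3.4.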
These results raise the question of whether a neural network could be trained to retroactively update its interpretation of earlier words in a clause. It seems reasonable to assume that humans do so once they find out whether the clause they are reading or hearing is active or passive and positive or negative. Simulating this behavior, however, would require a far more complex architecture: instead of a single output layer signaling the properties of the current word, the network would need perhaps 10 or 20 output layers on which it could show its interpretation of the previous 9 or 19 words in addition to the current one. As new words were presented to the network, one would see the interpretation of an earlier word change as it moved back one output layer at a time. Given some degree of interconnectivity between the hidden layers linked to the output layer for the current word and those linked to the output layers for previous words, it would at least in theory be possible for the network to learn this task. Three practical problems come to mind immediately: first, determining the number of additional output layers (a language like German might need 50 or 60, because different parts of a clause can be separated by a large number of words); second, finding an appropriate modular architecture; and third, gaining access to computers powerful enough to run such a simulation.
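The training targets for such a multi-layer output scheme could be laid out as follows. In this hypothetical sketch, output layer k at time step t is trained on the label of the word presented k steps earlier, so that when 'not' arrives, the layers showing earlier words of the clause can be trained toward its negative polarity. The function `lagged_targets` and the per-word label format are assumptions for illustration, not a description of CLASPnet.

```python
from typing import Any, List, Optional


def lagged_targets(labels: List[Any], n_layers: int) -> List[List[Optional[Any]]]:
    """Return targets[t][k] = labels[t - k] for each time step t and output
    layer k (layer 0 = current word); None where t - k falls before the
    start of the sequence, i.e. that layer has no word to describe yet."""
    return [
        [labels[t - k] if t - k >= 0 else None for k in range(n_layers)]
        for t in range(len(labels))
    ]


if __name__ == "__main__":
    words = ["the", "very", "interesting", "boys", "do", "not"]
    # Clause-level target (assumed format): every word of this clause
    # belongs to a negative clause, whether or not 'not' has been seen yet.
    labels = [(w, "negative") for w in words]
    for t, row in enumerate(lagged_targets(labels, n_layers=6)):
        print(words[t], row)
```

Each word's label shifts one layer further back per time step, mirroring the description above; the number of layers (`n_layers`) is the quantity that, as noted, might have to grow to 50 or 60 for German.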