3.4.8 Analyzing versus Predicting
In the work by Elman (e.g. 1992; see Chapter 2) on which CLASPnet builds, the recurrent network is not asked to analyze the current input pattern on the output layer. Instead, it is trained to predict the properties of the next input pattern. The rationale behind the prediction task is that it makes only limited assumptions about the role of the 'teacher' for the network: in real life, too, there is no omniscient body that gives feedback on the correctness of our analyses. Moreover, there is experimental psycholinguistic evidence that people do indeed anticipate properties of upcoming words. While I agree with the general point, I do not think that the prediction task would have been particularly useful for CLASPnet. Elman's models use much smaller vocabularies and grammars than the ones used here, which makes prediction correspondingly easier: the more options there are for constructing, for example, noun phrases, the stranger it seems to ask a network to predict which one will come next. And while people do anticipate (sequences of) words, they do not tend to do so as explicitly as a neural network like CLASPnet has to. Still, to compare the two tasks, I adapted standard training and test corpora of 3,000 sentences (maximum length of 20 words) so that they could be used with a prediction network. Figure 3.33 compares the results on the test corpus with those of a standard analysis network, although the two networks were not trained on the same training corpus.
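The two tasks differ only in the targets presented on the output layer. The following Python sketch is an illustration, not CLASPnet's actual code: it turns one sentence into input/target pairs for each task. The function names, the one-hot word patterns and the word-class analysis vectors are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def analysis_pairs(inputs: Sequence[Vector],
                   analyses: Sequence[Vector]) -> List[Tuple[Vector, Vector]]:
    """Analysis task: at step t the target describes the CURRENT input pattern."""
    if len(inputs) != len(analyses):
        raise ValueError("expected one analysis pattern per input pattern")
    return list(zip(inputs, analyses))


def prediction_pairs(inputs: Sequence[Vector]) -> List[Tuple[Vector, Vector]]:
    """Prediction task (Elman-style): at step t the target is input t + 1.

    The last word has no successor here; an end-of-sentence pattern could be
    appended to `inputs` if the network should also predict sentence ends.
    """
    return [(inputs[t], inputs[t + 1]) for t in range(len(inputs) - 1)]


# Hypothetical one-hot word patterns for "the dog barks" ...
the, dog, barks = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
# ... and hypothetical word-class analyses; neither is CLASPnet's encoding.
DET, NOUN, VERB = (1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0)

sentence = [the, dog, barks]
ana = analysis_pairs(sentence, [DET, NOUN, VERB])  # ana[1] pairs dog with NOUN
pred = prediction_pairs(sentence)                  # pred[1] pairs dog with barks
```

Note that the prediction corpus yields one pair fewer per sentence than the analysis corpus, and that its targets come for free from the input stream, which is exactly why the prediction task needs no external 'teacher'.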
The figure shows clearly that the analysis task is easier for the network in all respects. However, it is not obvious what one should conclude from this result. It would probably be more appropriate to develop a different scoring mechanism for the prediction network: at present, a prediction output is counted as incorrect if it is not close enough to the desired output for the next pattern; arguably, it should only count as incorrect if it is not a possible continuation of the clause. But determining which sequences are possible and which are not soon becomes a time-consuming task with a context-free grammar of any complexity. Because the relative clause is recursive, it is even impossible to list all possible sequences if one does not limit the length of the sentences. All in all, it seems fair to wait for an in-depth analysis of how the prediction network functions before passing judgment on which of the two tasks is more appropriate (Note 35).
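To make the two scoring criteria concrete, the sketch below contrasts them on a hypothetical toy grammar with a recursive relative clause (not the grammar used for CLASPnet). `strict_correct` stands for the current criterion, under the assumption that 'close enough' means a Euclidean distance of at most a threshold; `lenient_correct` accepts an output that is close to any grammatical continuation. The continuation table is built by enumerating every sentence of at most `max_len` words, and that enumeration only terminates because of the length bound. All names, the 0.5 threshold, the end-of-sentence symbol and the one-hot lexicon are assumptions for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy grammar, NOT CLASPnet's: RC -> that VP and VP -> V NP
# make the relative clause recursive, so the language is infinite.
GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "RC"]],
    "RC": [["that", "VP"]],
    "VP": [["V"], ["V", "NP"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["runs"]],
}
END = "</s>"  # hypothetical end-of-sentence symbol


def sentences(max_len: int) -> Set[Tuple[str, ...]]:
    """All sentences of the toy grammar with at most max_len words."""
    results: Set[Tuple[str, ...]] = set()
    agenda: List[Tuple[str, ...]] = [("S",)]
    while agenda:
        form = agenda.pop()
        # No rule shrinks a form, so anything over the bound can be pruned;
        # without this bound the enumeration would never terminate.
        if len(form) > max_len:
            continue
        idx = next((i for i, sym in enumerate(form) if sym in GRAMMAR), None)
        if idx is None:  # only words left: a complete sentence
            results.add(form)
            continue
        for rhs in GRAMMAR[form[idx]]:  # expand the leftmost non-terminal
            agenda.append(form[:idx] + tuple(rhs) + form[idx + 1:])
    return results


def continuations(max_len: int) -> Dict[Tuple[str, ...], Set[str]]:
    """Map every sentence prefix to the words (or END) that may follow it."""
    table: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for sent in sentences(max_len):
        for t in range(len(sent) + 1):
            table[sent[:t]].add(sent[t] if t < len(sent) else END)
    return dict(table)


def strict_correct(output: Sequence[float], target: Sequence[float],
                   threshold: float = 0.5) -> bool:
    """Current criterion: the output must be close to the actual next pattern."""
    return math.dist(output, target) <= threshold


def lenient_correct(output: Sequence[float], prefix: Sequence[str],
                    table: Dict[Tuple[str, ...], Set[str]],
                    lexicon: Dict[str, Tuple[float, ...]],
                    threshold: float = 0.5) -> bool:
    """Alternative: the output only has to be close to SOME possible continuation."""
    allowed = table.get(tuple(prefix), set())
    return any(math.dist(output, lexicon[w]) <= threshold for w in allowed)


# Hypothetical one-hot lexicon for the toy vocabulary.
WORDS = ["the", "dog", "cat", "that", "sees", "runs", END]
LEXICON = {w: tuple(1.0 if j == i else 0.0 for j in range(len(WORDS)))
           for i, w in enumerate(WORDS)}
table = continuations(max_len=7)
output = LEXICON["cat"]  # the network "predicts" cat after "the" ...
print(strict_correct(output, LEXICON["dog"]))            # ... the corpus says dog: False
print(lenient_correct(output, ["the"], table, LEXICON))  # but cat is grammatical: True
```

Even for this six-word vocabulary, the size of the continuation table depends on `max_len`; with a grammar of realistic size, precomputing it becomes correspondingly costly.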