2.3 NLP -- Making It Work
Papers about natural language and neural network models from the NLP side tend not to belong in Cognitive Science proper: NLP people are interested in models which can be put to use in real-life applications; whether such models are psychologically plausible is usually but a distant concern. One of the practical consequences of their aim is that they have mainly pushed neural network research in two directions: the first is the use of hybrid models, which combine elements from both connectionist and classical traditions (Sun 1996); the second is the use of models using (exclusively) localist representations (for examples see the contributions in Reilly & Sharkey 1992 and Adriaens & Hahn 1994). As none of those networks is of direct relevance to my own work on CLASPnet, I will instead take a look here at recent work by Stan Kwasny and Barry Kalman (1995). Kwasny & Kalman's paper, entitled "Tail-recursive Distributed Representations and Simple Recurrent Networks", contains a description of a new type of network for the analysis of structured syntactic parse trees. They combined the basic idea of RAAM models with a recurrent neural network architecture. (Note 5)
Figure 2.8 shows the type of network Kwasny & Kalman (1995) developed. This Sequential RAAM (SRAAM) network takes a parse symbol (e.g. N, or Det) as input, together with a distributed representation of the stack of parse symbols which the net has seen so far (the stack is empty when the first symbol of a sentence is seen). The hidden layer is used to construct a new distributed representation of the stack, ready for when the next symbol comes along. The desired patterns on the output layer are identical to the input patterns -- the auto-associative part of a RAAM network. Hence, the SRAAM works by learning to encode the input in its first layer of weights, and then to decode the compressed representation again in the second layer of weights. The important point about such networks is that the parse symbol appears to be lost in the distributed representation but actually is not -- contrary to the claim by Fodor & Pylyshyn (1988) that connectionist networks cannot manipulate symbols correctly.
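The data flow just described can be made concrete with a minimal sketch. It is not Kwasny & Kalman's implementation: the symbol inventory, the layer sizes, the sigmoid activation, the all-zero vector standing for the empty stack, and the random (untrained) weights are all assumptions made for illustration. The sketch only shows how the hidden layer becomes the stack representation for the next step, and how the output layer has the same layout as the input layer.

```python
import math
import random

# Hypothetical symbol inventory and stack width; the source does not give them.
SYMBOLS = ["N", "Det", "V", "NP", "VP", "S"]
STACK_SIZE = 4
INPUT_SIZE = len(SYMBOLS) + STACK_SIZE  # symbol units plus stack units


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows, cols, rng):
    """Untrained weights, drawn uniformly from [-1, 1] (an assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def layer(weights, vector):
    """One fully connected layer with sigmoid units (no bias, for brevity)."""
    return [sigmoid(sum(w * v for w, v in zip(row, vector))) for row in weights]


def one_hot(symbol):
    return [1.0 if s == symbol else 0.0 for s in SYMBOLS]


def encode_sequence(symbols, w_enc):
    """Fold a symbol sequence into one distributed stack representation."""
    stack = [0.0] * STACK_SIZE  # assumed encoding of the empty stack
    for symbol in symbols:
        # Input = current symbol + stack so far; hidden layer = new stack.
        stack = layer(w_enc, one_hot(symbol) + stack)
    return stack


def decode_step(stack, w_dec):
    """Map a stack representation back onto the input layout."""
    output = layer(w_dec, stack)
    return output[:len(SYMBOLS)], output[len(SYMBOLS):]


rng = random.Random(0)
W_ENC = random_matrix(STACK_SIZE, INPUT_SIZE, rng)  # first layer of weights
W_DEC = random_matrix(INPUT_SIZE, STACK_SIZE, rng)  # second layer of weights

code = encode_sequence(["Det", "N", "V"], W_ENC)
symbol_part, previous_stack = decode_step(code, W_DEC)
```

In a trained network, `symbol_part` would approximate the one-hot pattern of the last symbol and `previous_stack` the stack as it was before that symbol arrived, so that repeated decoding unwinds the whole sequence. With the random weights used here, the outputs are meaningless; only the shapes and the flow of information are illustrated.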
Kwasny & Kalman (1995) trained the network on 25 syntactic parse trees that a particular grammar could produce (see Figure 2.9 for some examples -- these trees were rewritten to binary branching trees for the actual input), and then tested whether the SRAAM could process all of them correctly. They found only one small structural error, in addition to 4 errors (out of a possible 222) in which an incorrect symbol was output.
In order to find out how the network managed to perform the task, they made use of cluster analysis: they compared the vectors with the activation values of the hidden units for all 25 parse trees, combined the two which were closest to one another into a single vector, and then repeated this procedure until they had obtained a binary branching tree showing how close the different parse trees were to one another. The result can be seen in Figure 2.10.
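The clustering procedure described above can be sketched as follows. The text does not say which distance measure was used or how two vectors were combined, so the Euclidean distance and the unweighted mean below are assumptions, and the four two-dimensional "hidden-layer" vectors are invented purely for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two activation vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(a, b):
    """Combine two vectors into one by unweighted averaging (assumed rule)."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def cluster(labelled_vectors):
    """Repeatedly merge the two closest vectors into a binary branching tree."""
    nodes = [(label, list(vec)) for label, vec in labelled_vectors]
    while len(nodes) > 1:
        # Find the pair of nodes whose vectors are closest to one another.
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = distance(nodes[i][1], nodes[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Replace the pair by a single node holding both subtrees.
        merged = ((nodes[i][0], nodes[j][0]), mean(nodes[i][1], nodes[j][1]))
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]


# Invented activation vectors for four hypothetical parse trees.
hidden = [
    ("tree_a", [0.10, 0.20]),
    ("tree_b", [0.15, 0.25]),
    ("tree_c", [0.90, 0.80]),
    ("tree_d", [0.85, 0.90]),
]
tree = cluster(hidden)
# tree == (("tree_a", "tree_b"), ("tree_c", "tree_d"))
```

The nested tuple plays the role of the binary branching tree in Figure 2.10: parse trees whose hidden-layer vectors lie close together end up under the same branch.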
A close look at the cluster analysis shows that the size of the parse tree seemed to be the most important factor: the trees with a small number of symbols cluster in the bottom part, while the trees for more complex sentences are at the top. In addition to length, the network was clearly also sensitive to the internal structure of the parse trees: it is easy to see in Figure 2.10 that trees cluster more closely together when the left-to-right order of their symbols is more similar.
There are two reasons why Kwasny & Kalman's (1995) model is relevant to CLASPnet. The first one is that it provides independent evidence that neural networks can be trained to become sensitive to the internal structure of complex sentences. (There are limits, though, to the strength of this analogy: CLASPnet uses a more vanilla recurrent architecture instead of an SRAAM, and the corpora used to train CLASPnet were far more complex than the parse trees described above.) The second reason is that no cluster analysis of the hidden layers of CLASPnet has been made -- hence, the analysis presented by Kwasny & Kalman provides a possible glimpse of what a similar analysis would show for CLASPnet.