3.4.1 The Importance of Being Modular
To find out whether the modular architecture of CLASPnet actually benefited the network, I trained a non-modular network on an identical training corpus. The corpus was generated by the context-free grammar and consisted of 4,000 sentences with a maximum length of 20 words. The non-modular network had a single hidden layer (plus recurrent layer) of 90 units at its disposal. This is fewer than the 100 units of the modular network, but because the layers are fully interconnected, the total number of connection weights in the two networks was about equal. This is the relevant comparison, since it is the connections, not the units, that store the knowledge of the network.
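The reasoning about units versus connection weights can be illustrated with a small counting sketch. Everything numeric here except the 17 output units is an assumption made for illustration: the input sizes, the even 50/50 split of the modular hidden units, and the simple input-plus-context-plus-output wiring are hypothetical and are not the actual CLASPnet layout, so the sketch does not reproduce the 90-versus-100 figures. Bias weights are ignored.

```python
def srn_weights(n_in, n_hidden, n_out):
    """Count weights in a simple recurrent network with one hidden layer.

    Assumed wiring (hypothetical): input->hidden, context->hidden
    (the context layer has as many units as the hidden layer),
    and hidden->output, all fully connected. Biases are ignored.
    """
    return n_in * n_hidden + n_hidden * n_hidden + n_hidden * n_out


def modular_weights(modules, n_out):
    """Count weights when each module has its own input and hidden layer.

    `modules` is a list of (n_in, n_hidden) pairs; every module is
    assumed to connect to the same output layer.
    """
    return sum(srn_weights(n_in, n_hidden, n_out) for n_in, n_hidden in modules)


N_OUT = 17            # number of output units, as stated in the text
ORTH = (30, 50)       # hypothetical orthographic module: 30 inputs, 50 hidden
SEM = (30, 50)        # hypothetical semantic module: 30 inputs, 50 hidden

modular_total = modular_weights([ORTH, SEM], N_OUT)
# A single fully interconnected hidden layer with the same 100 units:
merged_total = srn_weights(ORTH[0] + SEM[0], ORTH[1] + SEM[1], N_OUT)

print(modular_total, merged_total)
```

Under these assumed sizes, the merged layer with the same number of units carries many more weights than the modular one, which is why a non-modular network needs fewer hidden units to end up with a comparable number of connections.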
Figure 3.23 compares the results of the two networks after training, for a tolerance value of 0.2. (To keep the number of elements in the figure limited, I have divided the 17 output units into the four groups that were also used above: the Mood units; the ID + Status units; the clausal type units; and the Infinity, Voice and Polarity units.)
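To make the role of the tolerance value concrete, the following is a minimal sketch of how a single output unit might be scored. It rests on assumptions that are not spelled out in this section: targets are taken to be 0 or 1, an output counts as a "missed" error when the target is 1 but the activation falls below 1 minus the tolerance, and an activation above the tolerance for a target of 0 is counted separately (labelled `spurious` here, which is a hypothetical name). The function name and the example activations are likewise invented.

```python
def score_unit(outputs, targets, tolerance=0.2):
    """Return (missed_rate, spurious_rate) for one output unit.

    Assumed definitions (not taken from the source):
      missed   - target is 1 but the output is below 1 - tolerance
      spurious - target is 0 but the output is above tolerance
    """
    on = [o for o, t in zip(outputs, targets) if t == 1]
    off = [o for o, t in zip(outputs, targets) if t == 0]
    missed = sum(1 for o in on if o < 1 - tolerance)
    spurious = sum(1 for o in off if o > tolerance)
    missed_rate = missed / len(on) if on else 0.0
    spurious_rate = spurious / len(off) if off else 0.0
    return missed_rate, spurious_rate


# Invented activations for one unit over four input patterns.
example_outputs = [0.95, 0.60, 0.10, 0.35]
example_targets = [1, 1, 0, 0]

rates = score_unit(example_outputs, example_targets, tolerance=0.2)
print(rates)
```

With a scoring rule of this kind, a comparison between two networks comes down to subtracting their error rates per unit, or per group of units, at the same tolerance value.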
Although the differences for each of the four groups are less than 10%, it is still clear that the non-modular network did not learn the tasks as well as its modular counterpart. For the missed clausal type errors in particular, the difference for individual units is sometimes considerable: 12% for the Order unit, and as much as 23% for the Relative clause unit. From these results, we can conclude that the network benefits from having access to separate hidden layers in which it can look for regular patterns in either the orthographic or the semantic representations. Combining the two is by no means fatal, but it apparently makes it harder for the network to attend only to those features that are relevant for a given input pattern.