3.2.1 Introduction and Motivation
Whether using supervised learning algorithms or not, neural network models need training materials if they are to master a task (Note 10). Once the network has learnt the training corpus as well as it can, its overall performance can be computed. But looking only at how well it has mastered the training corpus would tell but half the story: at least in the case of supervised learning, the network may simply have stored all the precise input-output mappings it was asked to learn, rather than having found the regularities underlying those mappings (Note 11). This is why test corpora play such an important role: a good test corpus is qualitatively similar to the training corpus, but contains input patterns which the network has never seen before. If the net has merely stored literal input-output mappings, it will recognize nothing in the test corpus, and hence perform poorly. If, however, it generalizes well from the patterns it has seen in the training corpus to the new patterns in the test corpus, one can assume that it has become sensitive to the right level of abstraction. Another requirement for both training and test corpora is that they should contain input patterns which are relevant to the output patterns one wants the net to learn: for example, it is hard to imagine that a network which was only shown pictures of butterflies would ever become good at deciding whether these pictures contain the phoneme /y/. The other side of this requirement is that the input should not (literally) contain the desired output -- in such cases the task becomes trivial, and a network which learns the mappings has proven nothing much (cf. Lachter & Bever 1988).
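The memorization pitfall can be made concrete with a small sketch (purely illustrative; the corpora, the XOR-like task and all names here are invented for the demonstration, and have nothing to do with any particular network): a 'model' that stores literal input-output pairs scores perfectly on its training corpus yet fails completely on unseen test patterns.

```python
# Illustrative sketch of rote storage vs. generalization.
# The corpora and the rote_model function are hypothetical examples.

train = {(0, 0): 0, (0, 1): 1, (1, 0): 1}   # training corpus (XOR fragments)
test  = {(1, 1): 0}                          # qualitatively similar, but unseen

def rote_model(pattern):
    """Stores literal input-output mappings; knows nothing beyond them."""
    return train.get(pattern)  # returns None for any pattern it has not seen

def accuracy(model, corpus):
    """Fraction of corpus patterns mapped to the desired output."""
    return sum(model(p) == t for p, t in corpus.items()) / len(corpus)

print(accuracy(rote_model, train))  # perfect on the training corpus: 1.0
print(accuracy(rote_model, test))   # fails on the test corpus: 0.0
```

Only a model that has extracted the regularity behind the mappings, rather than the mappings themselves, could score well on both corpora.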
So, without appropriate corpora nothing much of interest can happen. How does CLASPnet fare in this respect? First, there is the negative side:
In supervised learning (e.g. using backpropagation), the network is given the desired output for each pattern in the training input -- the network can then compare its actual output with the desired one, and change its weights to bring the actual output closer to the desired output. Unsupervised learning algorithms have no access to the desired output, though they are usually given some feedback about their present state: a virtual creature looking for food, for example, would receive feedback about whether it is still hungry or not -- in the supervised setup, it would receive corrective feedback after each step it had taken, telling it whether that step was in the right direction (Jordan & Jacobs 1992; Jordan & Rumelhart 1992). Unsupervised learning is more attractive from a biological point of view, but usually suffers from poorer performance. In addition, it is not always evident what kind of general feedback could be given to a network to replace the right/wrong information of supervised learning.
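The supervised error-correction cycle can be sketched in a few lines (a generic illustration, not any particular model discussed here): a single-layer linear network compares its actual output with the desired output and nudges its weights in the direction that shrinks the difference. The learning rate, pattern and target values are assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_step(W, x, target, lr=0.1):
    """One supervised update: move the actual output toward the desired one."""
    y = W @ x                      # actual output of the network
    error = target - y             # corrective feedback (supervised signal)
    W += lr * np.outer(error, x)   # change weights to reduce that error
    return W, float(np.sum(error ** 2))

W = rng.normal(scale=0.1, size=(2, 3))   # small random initial weights
x = np.array([1.0, 0.5, -1.0])           # one input pattern
target = np.array([0.0, 1.0])            # its desired output

errors = []
for _ in range(50):
    W, e = train_step(W, x, target)
    errors.append(e)

print(errors[0], errors[-1])   # the error shrinks as the weights adapt
```

An unsupervised learner, by contrast, would have no `target` vector to subtract from: it would only receive some global signal (hungry or not), with no indication of which weight changes would improve matters.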
The size of the hidden layer(s) is of paramount importance in this respect: if there are not enough hidden units, the network will be unable to spot any regularities; and if there are too many, it does not need to spot them in order to learn the task. Only when the number of hidden units is 'just about right' -- there are only rules of thumb for choosing this number -- will the network be forced to seek out the interesting regularities as its only means of decreasing the overall error.
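The lower end of this trade-off can be demonstrated with a small backpropagation network on the XOR task (a standard textbook example, not CLASPnet itself; the hidden-layer sizes, learning rate and epoch count are assumptions chosen for the demonstration): one hidden unit cannot represent the regularity at all, so the error stays high, while a larger hidden layer can drive it down.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # input patterns
T = np.array([[0], [1], [1], [0]], dtype=float)               # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(n_hidden, epochs=5000, lr=1.0):
    """Train a 2-n_hidden-1 sigmoid net by batch backpropagation; return MSE."""
    Xb = np.hstack([X, np.ones((4, 1))])            # inputs plus a bias unit
    W1 = rng.normal(scale=0.5, size=(3, n_hidden))
    W2 = rng.normal(scale=0.5, size=(n_hidden + 1, 1))
    for _ in range(epochs):
        H = sigmoid(Xb @ W1)                        # hidden activations
        Hb = np.hstack([H, np.ones((4, 1))])        # hidden plus a bias unit
        Y = sigmoid(Hb @ W2)                        # actual outputs
        dY = (Y - T) * Y * (1 - Y)                  # output-layer delta
        dH = (dY @ W2[:n_hidden].T) * H * (1 - H)   # hidden-layer delta
        W2 -= lr * Hb.T @ dY                        # error-reducing updates
        W1 -= lr * Xb.T @ dH
    return float(np.mean((Y - T) ** 2))

results = {n: train(n) for n in (1, 2, 8)}
print(results)   # one hidden unit cannot represent XOR; more units can
```

The other end of the trade-off (too many hidden units letting the network learn the task without finding the regularity) only shows up with realistic corpora, where the training patterns no longer exhaust the input space as they do in this four-pattern toy problem.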