3.4.7 The Acquisition of Clause Spotting Capacities
So far, the only information on how fast CLASPnet learns has been given in Figure 3.7 of section 3.3.1. The drawback of that figure, however, is that it only shows the average error across all the output units; it does not answer the far more intriguing question of which classification tasks are learnt first during training. To make up for that omission, I ran a new simulation with a corpus of 3,000 sentences (maximum length 20 words) and tested the net on the training corpus after epochs 1, 3, 5, 7, and 10. After 10 cycles, training makes only very slow progress, so at that stage the network is fine-tuning its connection weights rather than acquiring entirely new skills. Figure 3.32 shows how the percentage of missed errors on each output unit evolves over the epochs. The tolerated difference was again 0.2.
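The per-unit measurement behind Figure 3.32 can be sketched as follows. This is a minimal illustration, not the code used in the simulation, and it rests on an assumption: a "missed error" is taken to be a pattern on which a unit's target is 1 but its output deviates from that target by more than the tolerated difference of 0.2, and the percentage is computed over the patterns on which the unit should fire. The function names are hypothetical, and the network itself is not modelled: `outputs_by_epoch` simply stands for the net's outputs on the training corpus at each tested epoch.

```python
from typing import Dict, List, Sequence

TOLERANCE = 0.2  # tolerated difference between output and target (from the text)

def missed_error_rate(targets: Sequence[float], outputs: Sequence[float],
                      tolerance: float = TOLERANCE) -> float:
    """Percentage of patterns with target 1 on which the unit misses.

    Assumption: a missed error is a pattern whose target is 1 but whose
    output differs from that target by more than `tolerance`.
    """
    positives = [(t, o) for t, o in zip(targets, outputs) if t == 1]
    if not positives:
        return 0.0  # the unit never has to fire, so it cannot miss
    misses = sum(1 for t, o in positives if abs(t - o) > tolerance)
    return 100.0 * misses / len(positives)

def per_unit_missed_rates(target_rows: Sequence[Sequence[float]],
                          output_rows: Sequence[Sequence[float]],
                          tolerance: float = TOLERANCE) -> List[float]:
    """Missed-error percentage for each output unit (one column per unit)."""
    n_units = len(target_rows[0])
    return [
        missed_error_rate([row[k] for row in target_rows],
                          [row[k] for row in output_rows], tolerance)
        for k in range(n_units)
    ]

def learning_curve(target_rows: Sequence[Sequence[float]],
                   outputs_by_epoch: Dict[int, Sequence[Sequence[float]]],
                   tolerance: float = TOLERANCE) -> Dict[int, List[float]]:
    """Per-unit missed-error percentages at every tested epoch."""
    return {epoch: per_unit_missed_rates(target_rows, outs, tolerance)
            for epoch, outs in sorted(outputs_by_epoch.items())}

# Toy example with two hypothetical output units and three patterns.
targets = [[1, 0], [0, 1], [1, 0]]
outputs = [[0.9, 0.1], [0.3, 0.5], [0.7, 0.2]]
print(per_unit_missed_rates(targets, outputs))
```

In the toy example the first unit misses one of its two positive patterns (0.7 lies more than 0.2 below 1) and the second unit misses its only positive pattern, so the rates are 50% and 100%. In the actual experiment, `outputs_by_epoch` would hold one set of outputs for each of the epochs 1, 3, 5, 7, and 10.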
There are a few things to keep in mind when studying these figures. First, although I have split the 17 output units into 4 groups, every unit competed for resources with all the other units during training. Hence, a temporary worsening of the performance of the Status unit might be related to an improvement in the performance of the Infinity unit. Second, the backpropagation learning algorithm forces the network to decrease its global error, so at times individual error rates have to go up again for that aim to be achieved. Third, the results presented above show how the network learnt in a single training run: while some of the developments can be expected to be typical for this type of network and this type of input data, others will be the result of the particular position the network occupied in the space of possible solutions when it was randomly initialized.
Some of the more striking developments are probably not accidental. It is fairly easy to see that even after 1 epoch the network has already become sensitive to the statistical properties of the training corpus: all patterns are assigned the Indicative mood and are classified as ID1 and Declarative. While this means that these units have no missed errors, the related units have nothing but missed errors. The network is then forced to abandon this easy strategy, so the missed errors for the dominant units increase while the net starts detecting useful cues for the other units. If we look at what happens after the 10th epoch, we see that the net spends most of the remaining 110 training epochs tuning the clausal types (especially the Complement and Yes/No-Question units) and the difficult Polarity unit.
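Why the early "majority" strategy produces exactly this pattern can be made concrete with a small simulation. The sketch below makes the same assumption as before about what counts as a missed error (target 1, output off by more than 0.2), and the class label `OtherMood` and the 8:2 class distribution are hypothetical toy values, not figures from the training corpus. A net that gives every pattern the most frequent class keeps the majority unit permanently on and every competing unit permanently off.

```python
from collections import Counter
from typing import Dict, List, Sequence

TOLERANCE = 0.2  # tolerated difference between output and target

def missed_rate(targets: Sequence[int], outputs: Sequence[int],
                tolerance: float = TOLERANCE) -> float:
    """Percentage of target-1 patterns whose output is off by more than tolerance."""
    positives = [(t, o) for t, o in zip(targets, outputs) if t == 1]
    if not positives:
        return 0.0
    misses = sum(abs(t - o) > tolerance for t, o in positives)
    return 100.0 * misses / len(positives)

def majority_strategy(labels: List[str], classes: List[str]) -> Dict[str, float]:
    """Missed-error rate per unit for a net that always outputs the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    rates = {}
    for cls in classes:
        targets = [1 if lab == cls else 0 for lab in labels]
        # The majority unit is always on; every other unit is always off.
        outputs = [1 if cls == majority else 0 for _ in labels]
        rates[cls] = missed_rate(targets, outputs)
    return rates

# Hypothetical toy corpus: 8 indicative patterns and 2 of some other mood.
labels = ["Indicative"] * 8 + ["OtherMood"] * 2
print(majority_strategy(labels, ["Indicative", "OtherMood"]))
```

The dominant unit never misses (0%), while the minority unit misses every pattern on which it should fire (100%). Because those minority errors keep the global error high, the network cannot stay in this state, which matches the subsequent rise in missed errors for the dominant units.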