3.4.6.2 Training on Limited Punctuation
The main issue raised by the experiment just described is to what degree the network really requires the punctuation marks to learn the classification tasks. It might be unable to reach good performance without these cues, or it might have used them simply because they were available, without looking for cues that are somewhat harder to find but informationally equivalent. The following experiment therefore trained and tested three nets on corpora of 3,000 sentences (maximum length of 20 words): a no-commas network, a no-final-punctuation-marks network, and a no-punctuation network. The scores in Figure 3.31 are those on the test corpora for these networks; it should be pointed out that the scores on the training corpora were a lot closer to one another. Presumably, the nets learnt more sequences 'by heart' when no sentence-final punctuation marks were available.
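The three reduced corpora can be pictured with a minimal sketch. It assumes that sentences are stored as lists of tokens with punctuation marks as separate tokens; the token sets `COMMAS` and `FINAL_MARKS` and the helper names are hypothetical, since the exact inventory of marks and the corpus format are not given here.

```python
# Assumed punctuation inventory; the actual marks in the corpora may differ.
COMMAS = {","}
FINAL_MARKS = {".", "?", "!"}


def drop_commas(sentence):
    """Return a copy of a tokenized sentence with all commas removed."""
    return [tok for tok in sentence if tok not in COMMAS]


def drop_final(sentence):
    """Return a copy without the sentence-final punctuation mark, if any."""
    if sentence and sentence[-1] in FINAL_MARKS:
        return sentence[:-1]
    return list(sentence)


def make_variants(corpus):
    """Build the three reduced corpora from a list of tokenized sentences."""
    return {
        "no_commas": [drop_commas(s) for s in corpus],
        "no_final": [drop_final(s) for s in corpus],
        "no_punctuation": [drop_final(drop_commas(s)) for s in corpus],
    }


# Toy example sentence (not taken from the original corpora).
corpus = [["the", "dog", ",", "which", "barks", ",", "runs", "."]]
variants = make_variants(corpus)
```

Each network would then be trained and tested on its own variant, so that differences in the scores can be attributed to the missing cues rather than to different sentences.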
What we find is that the scores of the different networks are much closer to one another than in Figure 3.30 above. The main reason for this difference is, of course, that the no-final and the no-punctuation networks perform much better now. Hence, we have evidence that punctuation is not as necessary as the previous experiment seemed to suggest, as long as the network is allowed to become familiar with 'incomplete' sentences during its training phase. An interesting experiment suggested by this conclusion would be to mix punctuated and unpunctuated sentences in a single corpus and compare those results with the ones presented here.
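A mixed corpus of the kind proposed could be built along the following lines. This is only a sketch of an experiment that was not run: the function `mix_corpus`, the `strip_fraction` parameter and the `PUNCTUATION` set are assumptions made for illustration, and sentences are again taken to be lists of tokens.

```python
import random

# Assumed punctuation inventory; the actual marks in the corpora may differ.
PUNCTUATION = {",", ".", "?", "!"}


def mix_corpus(corpus, strip_fraction=0.5, seed=0):
    """Strip punctuation from a random share of the sentences.

    Each sentence is stripped with probability strip_fraction; the others
    are copied unchanged. A fixed seed keeps the mix reproducible.
    """
    rng = random.Random(seed)
    mixed = []
    for sentence in corpus:
        if rng.random() < strip_fraction:
            mixed.append([tok for tok in sentence if tok not in PUNCTUATION])
        else:
            mixed.append(list(sentence))
    return mixed
```

Varying `strip_fraction` between 0 and 1 would show how much exposure to unpunctuated sentences the network needs before its scores approach those reported here.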