3.4.6.1 Deprivation of Punctuation Marks
For this experiment, I processed the first CLASPnet test corpus file to produce three further files: the first lacked all commas, the second lacked the three sentence-final punctuation marks, and the third lacked all four marks. These new corpora were then presented to the trained CLASPnet network. The results are shown in Figure 3.30, as always for a tolerance value of 0.2.
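The corpus preparation described above can be sketched roughly as follows. This is only an illustration, not the script actually used. It assumes that the corpus is plain text with whitespace-separated tokens, that punctuation marks are separate tokens, and that the three sentence-final marks are ".", "?" and "!". The names strip_marks and make_deprived_corpora are hypothetical.

```python
# Minimal sketch of producing the three punctuation-deprived corpora.
# Assumptions (not confirmed by the text): tokens are separated by
# whitespace, punctuation marks are stand-alone tokens, and the three
# sentence-final marks are ".", "?" and "!".

COMMA = {","}
SENTENCE_FINAL = {".", "?", "!"}  # assumed set of sentence-final marks


def strip_marks(text, marks):
    """Return `text` with every token found in `marks` removed."""
    kept_lines = []
    for line in text.splitlines():
        tokens = [tok for tok in line.split() if tok not in marks]
        kept_lines.append(" ".join(tokens))
    return "\n".join(kept_lines)


def make_deprived_corpora(text):
    """Build the three variants: no commas, no sentence-final marks, neither."""
    return {
        "no_commas": strip_marks(text, COMMA),
        "no_sentence_final": strip_marks(text, SENTENCE_FINAL),
        "no_punctuation": strip_marks(text, COMMA | SENTENCE_FINAL),
    }
```

Applied to a punctuated version of a two-sentence input (the placement of the marks in such an input is itself a guess), the "no_punctuation" variant is a single unbroken word sequence of the kind analysed word by word in this section.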
The message from Figure 3.30 is hard to miss: CLASPnet does not perform well on corpora that lack end-of-sentence punctuation marks. While the deprivation of commas seems not to trouble the network too much (except for the Relative clause unit, which has 82% missed errors on the 'No-commas corpus'), the loss of the other three punctuation marks is near-fatal. The same pattern holds for individual sentences, of course: the word-by-word analysis of 'the more the man kiss the woman the less he sleep the woman paint' shows that the two sentences are interpreted as a single unit -- the Mood, ID, Infinity, and Polarity units stay at approximately the same values throughout the entire sequence of words. (The clausal units, however, simply get mightily confused when 'sleep' is seen, with more than three units reaching high activation values.)
As has been mentioned above, it is not certain that these results should be held against CLASPnet. One of the major reasons for the use of punctuation marks in written English is that it makes reading texts easier. Just how much easier is hinted at in Figure 3.30.