3.4.6 The Relevance of Punctuation
A final property of the corpora used to train and test CLASPnet that may arouse suspicion is that they contained many punctuation marks. Because these marks always occur at the beginnings or ends of clauses (commas) or at the ends of sentences (periods, question marks, exclamation marks), they give the network a very useful source of information about when it should reset its output units and start looking for new clues (cf. 3.3.4.5). So what happens when the punctuation marks are removed? I will first present the results of a net that was trained on a corpus with punctuation marks and then confronted with a test corpus from which the punctuation marks had been removed. I will then address the same question by training networks on corpora without punctuation marks.
In a sense, these experiments could be argued to be unnecessary: written English has punctuation marks, just as spoken English has prosodic information. If humans have access to such information, a connectionist model should too. But real people can readjust quite rapidly to written texts without punctuation marks (one need only think of Molly Bloom's interior monologue at the end of Joyce's Ulysses), so psychologically plausible nets should be able to do so too. While CLASPnet makes no claim to psychological plausibility, it does offer the opportunity to get a rough idea of the issues that might arise when such nets are confronted with missing punctuation.
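The corpus manipulation behind the two experiments can be illustrated with a minimal sketch. It assumes that a corpus is a flat list of word tokens in which each punctuation mark is a separate token; the actual CLASPnet input encoding is not described in this section. The names `PUNCTUATION`, `strip_punctuation` and `make_conditions` are hypothetical and serve only as illustration. The sketch also assumes that the networks trained without punctuation are tested without it, which the text does not state explicitly.

```python
# Hypothetical sketch: corpora are assumed to be flat lists of tokens,
# with each punctuation mark stored as a token of its own.

# The four marks named in the text: comma, period, question mark,
# exclamation mark.
PUNCTUATION = {",", ".", "?", "!"}


def strip_punctuation(tokens):
    """Return a copy of the token list with punctuation tokens removed."""
    return [tok for tok in tokens if tok not in PUNCTUATION]


def make_conditions(train_tokens, test_tokens):
    """Build (train, test) corpus pairs for the two experiments.

    Experiment 1: train with punctuation, test without it.
    Experiment 2: train without punctuation; testing without it as well
    is an assumption of this sketch.
    """
    stripped_test = strip_punctuation(test_tokens)
    return {
        "trained_with_punctuation": (list(train_tokens), stripped_test),
        "trained_without_punctuation": (
            strip_punctuation(train_tokens),
            stripped_test,
        ),
    }


if __name__ == "__main__":
    train = ["the", "dog", ",", "who", "barks", ",", "sleeps", "."]
    test = ["does", "the", "cat", "sleep", "?"]
    for name, (tr, te) in make_conditions(train, test).items():
        print(name, tr, te)
```

Keeping both conditions in a single function makes it clear that the two experiments share the same stripped test corpus and differ only in the training material.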