4.1 Summary of Results
I have shown that a modular Elman recurrent neural network can be
trained to analyze complex English sentences which are presented to it
one word at a time: the words were presented on the input layer by means
of a semantic and orthographic representation, whereas the units on the
output layer represented linguistic properties of sentences (mood) and
clauses (status, type, infinity, voice, and polarity). A context-free
grammar was used to generate the training and test corpora of 4,000
sentences (with a maximum length of 20 words). Training was done with
the backpropagation algorithm, until the network gave the correct
response to about 95% of the input patterns. Generalization to other
patterns generated by the context-free grammar was very good, at about
93%.
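The word-by-word processing described above can be illustrated with a minimal sketch of a plain Elman step. This is not the modular network used in the experiments: the class name `ElmanSketch`, the layer sizes, the weight initialisation, the zero initial context, and the sigmoid activations are all assumptions made for illustration, and training with backpropagation is left out.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def make_matrix(rows, cols, rng):
    """Small random weights; the range is an assumption of this sketch."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ElmanSketch:
    """Hypothetical single-module Elman network (untrained)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.w_in = make_matrix(n_hidden, n_in, rng)       # input -> hidden
        self.w_ctx = make_matrix(n_hidden, n_hidden, rng)  # context -> hidden
        self.w_out = make_matrix(n_out, n_hidden, rng)     # hidden -> output
        self.reset()

    def reset(self):
        """Clear the context layer at the start of a new sentence."""
        self.context = [0.0] * self.n_hidden

    def step(self, word_vector):
        """Process one word; the hidden state becomes the next context."""
        from_input = affine(self.w_in, word_vector)
        from_context = affine(self.w_ctx, self.context)
        hidden = [sigmoid(a + b) for a, b in zip(from_input, from_context)]
        self.context = list(hidden)
        return [sigmoid(x) for x in affine(self.w_out, hidden)]


def analyze(network, sentence_vectors):
    """Present a sentence one word at a time and collect the outputs."""
    network.reset()
    return [network.step(vec) for vec in sentence_vectors]
```

Because the context layer carries the previous hidden state forward, presenting the same word vector twice in a row generally yields different outputs, which is what allows the classification of a clause to depend on the words that preceded the current one.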
Some of the more noteworthy conclusions reached along the way were:
- Raising the error tolerance above 0.1 (even as far as 0.5) hardly
improved the performance ratings of the network, which shows that the
network mastered its task very well.
- Mood: Overall performance was more than 95%. There were
noticeably more errors for imperative sentences than for indicative or
interrogative sentences, primarily because of the lower frequency of
imperatives in the training corpus.
- ID + Status: Overall performance was more than 95%, though the
network did not see enough sentences with more than three clauses in the
training corpus to learn to correctly classify fourth (or fifth)
clauses. The matrix clause level became the default for the Status
unit.
- Clausal Type: Overall performance was more than 95%. Most of the
errors were caused by interference from the two most frequent clausal
types (declarative clauses, and WH-Questions) with the classification of
the other types, especially orders and complement clauses.
- Infinity, Voice, Polarity: While the infinity classification was
near perfect, the apparently bad scores for the other two units could be
explained by the fact that the network was asked to classify inputs even
when it had not seen enough evidence to make a sound decision.
- Generalization: While the network had no problems analyzing
sentences of far greater length than it had ever seen in the training
corpus, its performance on multiple embedded relative clauses turned out
to be fairly poor. On the other hand, the network had few problems
with clauses in which words had either been omitted or doubled, and also
processed clauses containing unknown words very well -- effectively
using the clausal context to interpret the new word. As for the natural
language samples, the network was able to analyze most simple sentences
correctly, but the combination of construction types and words which it
did not know led to great confusion.
- Disambiguation: The network had no trouble distinguishing forms
that were associated with more than one clausal type: 'that', for
example, could appear as a determiner in any type of clause, as the
introducer of a complement clause, and as a relative pronoun at the start
of a relative clause.
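The tolerance criterion mentioned in the first conclusion above can be illustrated with a short sketch. It assumes that an output unit counts as wrong when it deviates from its target by more than the tolerance, and that a pattern counts as correct only when every unit is within tolerance; the exact scoring rule used in the experiments may differ. The function names and the activation values are made up for illustration.

```python
def pattern_is_correct(outputs, targets, tolerance=0.1):
    """True if every output unit lies within `tolerance` of its target."""
    return all(abs(o - t) <= tolerance for o, t in zip(outputs, targets))


def score(patterns, tolerance=0.1):
    """Fraction of (outputs, targets) pairs classified correctly."""
    correct = sum(pattern_is_correct(o, t, tolerance) for o, t in patterns)
    return correct / len(patterns)


# Illustrative activations only, not data from the experiments.
patterns = [
    ([0.95, 0.05], [1.0, 0.0]),  # close to target at any tolerance
    ([0.70, 0.20], [1.0, 0.0]),  # correct only with a loose tolerance
    ([0.40, 0.10], [1.0, 0.0]),  # wrong even at tolerance 0.5
]

strict = score(patterns, tolerance=0.1)
loose = score(patterns, tolerance=0.5)
```

If a network's outputs mostly resemble the first pattern, `strict` and `loose` stay close together, which is the situation the first conclusion describes: loosening the tolerance gains little when the outputs are already near their targets.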
The results of the experiments were:
- Modularity: The network benefits from its modular architecture:
a similar network in which orthography and semantics were not initially
kept apart performed clearly worse;
- Training Corpus Size: The score of the network improved as the
training corpus increased in size -- a larger training corpus would likely
have led to even better results;
- Orthography vs Semantics: A network trained with semantic
information is severely affected when this semantic input is taken away.
A network trained only on orthography scores nearly as well as one
trained on both, but the absence of any semantic information still
affects it;
- Perceptual vs Gestalt: Although the combination of Gestalt and
instinctive units was slightly more important to the network than the
perceptual units, the difference was small;
- Orthographic Justification: Presenting the orthographic
representation in both left-justified and right-justified form proved to
have little influence on the performance of the network, as the
left-justified and the right-justified representations each led to
similar results;
- Punctuation: Although punctuation marks (especially the
sentence-final marks) were very important to the trained network, it turned
out that a network which did not have access to such marks during training
could still achieve decent performance;
- Acquisition: The classifications which had to be made most often
(e.g. indicative, declarative) were initially adopted as the defaults,
with less frequent patterns being learnt later;
- Analysis vs Prediction: The prediction task was noticeably more
difficult for the network, but the calculation of what counts as an error
for such a network needs to be revised;