3.2.1 Introduction and Motivation
Whether using supervised learning algorithms or not, neural network
models need training materials if they are to master a task (Note 10). Once the network has
learnt the training corpus as well as it can, its overall performance
can be computed. But looking only at how well it has mastered the
training corpus would tell just half the story: at least in the case of
supervised learning, the network may simply have stored the precise
input-output mappings it was asked to learn, rather than having
found the regularities underlying those mappings (Note 11). This is why test corpora
play such an important role: a good test corpus is qualitatively similar
to the training corpus, but contains input patterns which the network
has never seen before. If the net has stored literal input-output
mappings, it will fail to recognize anything at all in the test corpus,
and hence will perform poorly. However, if it can generalize well from
the patterns it has seen in the training corpus to the new patterns in
the test corpus, then one can assume that it has become sensitive to the
right level of abstraction. Another requirement for both training and
test corpora is that they should contain input patterns which are
relevant to the output patterns one wants the net to learn: for example,
it is hard to imagine that a network which was only shown pictures of
butterflies would ever become good at deciding whether these pictures
contain the phoneme /y/. The other side of this requirement is that one
should take care that the input does not (literally) contain the desired
output -- in such cases, the task becomes trivial and nothing much has
been proven by a network which learns to do the mappings (cf. Lachter
& Bever 1988).
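The requirement that a test corpus resemble the training corpus while sharing no patterns with it can be illustrated with a small sketch. This is not the procedure used for CLASPnet: the helper name split_corpus, the 20% test fraction, and the example sentences are all assumptions made for the illustration.

```python
import random

def split_corpus(sentences, test_fraction=0.2, seed=0):
    """Split a corpus into disjoint training and test sets.

    Hypothetical helper, for illustration only. Duplicates are removed
    first, so that no test pattern can also occur in the training set.
    """
    unique = sorted(set(sentences))          # drop duplicates, fix the order
    rng = random.Random(seed)                # seeded so the split is repeatable
    rng.shuffle(unique)
    n_test = max(1, int(len(unique) * test_fraction))
    return unique[n_test:], unique[:n_test]  # (train, test)

# Invented example sentences, not taken from any CLASPnet corpus.
corpus = [
    "the dog sees the cat",
    "the cat sees the dog",
    "a dog chases a cat",
    "the dog sees the cat",   # duplicate: must not end up in both sets
    "a cat chases the dog",
    "the dog chases a cat",
]
train, test = split_corpus(corpus)
```

Because both sets are drawn from the same pool, they are qualitatively similar; because duplicates are removed before the split, a network that had merely memorized the training sentences would gain nothing on the test set.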
So, without appropriate corpora nothing much of interest can happen. How
does CLASPnet fare in this respect? First, there is the negative
side:
- As has been mentioned before, the network has been trained on corpora
which were generated by a context-free grammar of English. However long this
grammar may be, even a cursory glance at its output suffices to show that
it does not always produce perfect English sentences (see below for
examples).
- Besides the errors in the sentences which it does generate, there is of
course a very large number of normal English sentences which it does not
-- and cannot -- produce, if only because the vocabulary used in the
context-free grammar is extremely small when compared to that found in
most English texts. Consequently, linguists may well wonder
whether the results of CLASPnet can have any implications for
real English.
- Worse still, this particular grammar was written especially
for the CLASPnet project, which means that there has been ample
opportunity to 'tweak' the grammar and the network so as to obtain
more impressive results. By making the grammar and the
program used to generate the sentences publicly available, I want to show
that there is nothing to hide.
It will be self-evident by now that I would have preferred to use
training and test corpora of real, attested English, be it spoken or
written. There are several reasons why I have not done so:
- Real English corpora are not that easy to obtain, because most of them are
not free. Moreover, the existing corpora all use their own
classification schemes: some provide phonological and prosodic
information, others (also) information about phrases, or speaker
changes. To the best of my knowledge, however, no existing corpus
provides a classification scheme for clauses similar to the one I had in
mind.
- Real English sentences are 'rich': they are structurally very complex,
and contain a lot of different words. Hence, manual corpus tagging of
thousands of sentences (but how many exactly?) would have been a difficult
and time-consuming task. Determining appropriate semantic representations
for all the words would also have been a major challenge.
- Generating one's own corpora offers a much greater amount of control
over what the network gets to see: for example, it becomes easy to generate
a corpus with only passive sentences, or with an abundance of relative
clauses. Having the option to produce such customized corpora in a very
short time allows one to conduct many experiments which would not be
possible with a corpus of real English.
- Restricting the input of the network to those aspects of English one is
interested in reduces the number of factors involved in the simulation:
if a network like CLASPnet had failed to learn how to detect
clausal properties in a corpus of real English, it might have been
because there were too many different words (the net failed to recognize
the similarities between them), or too many different syntactic
constructions (none occurred frequently enough to be recognized as a
pattern), or because of the effects of agreement, tense, or aspect, etc.
By starting from a simple corpus, and then adding further factors one
by one, it is possible to investigate which factors influence
the performance of the network, and in which ways.
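The kind of control described in the list above, such as producing a corpus with only passive sentences, can be sketched with a toy generator. Everything here is hypothetical: the grammar is a few invented rules, far smaller than and unrelated to the actual CLASPnet grammar, and the names GRAMMAR, generate, and restrict are assumptions made for the illustration.

```python
import random

# Toy grammar: each nonterminal maps to a list of alternative expansions.
# Hypothetical rules, invented for this sketch.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "RC"]],
    "RC": [["that", "VP"]],
    "VP": [["V", "NP"], ["was", "Vpart", "by", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"], ["boy"]],
    "V": [["sees"], ["chases"]],
    "Vpart": [["seen"], ["chased"]],
}

def generate(symbol, grammar, rng, depth=0, max_depth=4):
    """Expand a symbol into a list of words by random rule choice."""
    if symbol not in grammar:            # terminal: an actual word
        return [symbol]
    options = grammar[symbol]
    # Past the depth limit, always take the first alternative, which
    # keeps relative clauses from nesting without end.
    expansion = options[0] if depth >= max_depth else rng.choice(options)
    words = []
    for part in expansion:
        words.extend(generate(part, grammar, rng, depth + 1, max_depth))
    return words

def restrict(grammar, symbol, keep):
    """Copy the grammar, keeping only the alternatives of `symbol`
    for which keep(alternative) is true."""
    custom = dict(grammar)
    custom[symbol] = [alt for alt in grammar[symbol] if keep(alt)]
    return custom

# A customized corpus: only passive verb phrases are allowed.
passive_only = restrict(GRAMMAR, "VP", lambda alt: alt[0] == "was")
rng = random.Random(1)
corpus = [" ".join(generate("S", passive_only, rng)) for _ in range(5)]
```

Changing one rule set and regenerating takes a fraction of a second, whereas collecting a comparable sample of attested passive sentences would require searching and tagging a real corpus by hand.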
Note 10
In supervised learning (e.g. using backpropagation), the network is
given the desired output for each pattern in the training input -- the
network can then easily compare its real output with the desired one, and
change weights to get the real output closer to the desired output.
Unsupervised learning algorithms do not have access to the desired
output, though in the variant often called reinforcement learning they
are given some feedback about their present
state: e.g. a virtual creature looking for food would receive feedback about
whether it is still hungry or not -- in the supervised setup, it would
receive corrective feedback after each step it had taken, to tell it whether
that step was in the right direction or not (Jordan & Jacobs 1992;
Jordan & Rumelhart 1992). Unsupervised learning is more attractive from
a biological point of view, but usually suffers from poorer performance. In
addition, it is not always evident which kind of general feedback could be
given to a network to replace the right/wrong information of supervised
learning.
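The supervised case described in this note, comparing the real output with the desired one and changing the weights to bring them closer, can be sketched for a single linear unit. This is the simple delta rule rather than full backpropagation through hidden layers; the function name delta_rule_step, the learning rate, and the example values are assumptions made for the illustration.

```python
def delta_rule_step(weights, inputs, target, rate=0.1):
    """One supervised update for a single linear unit (delta rule).

    Hypothetical helper: the unit's real output is compared with the
    desired output, and each weight moves in proportion to the error
    and to its own input.
    """
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    return [w + rate * error * x for w, x in zip(weights, inputs)]

# Invented pattern: one input vector with its desired output.
weights = [0.0, 0.0]
inputs, target = [1.0, 0.5], 1.0
for _ in range(50):
    weights = delta_rule_step(weights, inputs, target)

final_output = sum(w * x for w, x in zip(weights, inputs))
```

With these values, each step shrinks the error by a constant factor, so the real output approaches the desired one over repeated presentations of the pattern.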
Note 11
The size of the hidden layer(s) is of paramount importance in this respect:
if there are not enough hidden units, the network will be unable to spot any
regularities; and if there are too many, it does not need to spot them in
order to learn the task. Only when the number of hidden units is 'just about
right' -- there are only rules of thumb for choosing this number -- will the
network look for the interesting regularities as the only method to decrease
the overall error.
Copyright 1996. All rights reserved.
Ezra Van Everbroeck
Last change: 10 July 1996
http://snow.ucsd.edu/~ezra/msc/321.html