2.2 Language Acquisition -- The Lexicon
Comparing how neural networks learn a certain task with how human babies do has been quite popular in Cognitive Science for some time now. Of particular fame in this respect is the so-called 'U-curve': children who are acquiring the past tense of English verbs tend to first learn the irregular verbs correctly, only to erroneously start regularizing them later on; in the final upward stage of the 'U' the irregular past tenses are again produced correctly (see e.g. Rumelhart & McClelland 1986; Plunkett & Marchman 1991, 1993). Other aspects of language which have been studied are the acquisition of English plural forms (Lee & Gasser 1992), and vocabulary growth (Plunkett et al. 1992).
In this section, however, I will describe the connectionist net presented by Michael Gasser and Linda Smith (1993). They investigated whether a network could model the language data which show that children learn concrete nouns before they learn dimensional adjectives such as big or green. Traditional linguistic accounts of this phenomenon have sought an explanation along the following lines: "All languages make a distinction between arguments, or objects, conveyed by nouns, and predicates, or relations, conveyed by verbs and adjectives. This distinction must be fundamental to the way people view the world. But nouns are in a sense prior to verbs and adjectives because while predicates presuppose arguments, the reverse is not true." (1993: 2). Gasser & Smith argued that such an ontological difference between the two, if it exists at all, is not visible to the language learning child: confronted with sentences like 'this is thelend' or 'this is the lorax', a child would not have the faintest idea that the first is an adjective and the second a noun (the article is of course not a separate entity in the phonetic input). So even if the child were innately predisposed to look for something like entities and something like predicates, she would still not have an easy time finding out which is which. Gasser & Smith (1993) therefore constructed a connectionist network in an attempt to explain the phenomenon along other lines.
Their starting assumption was that concrete nouns of the kind children learn first (e.g. dog, ball, car) diverge from dimensional adjectives (e.g. small, red, dark) in that the former are defined along many more conceptual dimensions than the latter: "noun categories and adjective categories differ markedly in their size, overlap, and number of relevant perceptual properties" (1993: 3). Figure 2.4 illustrates that nouns and adjectives usually vary both in their 'representational span' (i.e. the size of the region which they occupy along a given dimension) and in their 'representational compactness' (i.e. the number of dimensions relevant for defining the concept). The fact that there are many more things which can be called little than there are things which can be called car becomes obvious in Figure 2.5: the different instances of car cluster together in conceptual space, whereas those of little form an infinite hyperplane in the same space. (Note 3) Gasser & Smith hypothesized that this difference in nature might be responsible for the different acquisition rates of nouns and adjectives.
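The contrast between the two kinds of categories can be made concrete with a small sketch. It is only an illustration of the idea, not Gasser & Smith's own formalization: a category is treated as a set of value ranges on the dimensions it constrains, and every dimension is assumed to run from 0 to 1. The dimension names other than size and color, and all the numerical ranges, are hypothetical.

```python
import math
from typing import Dict, Tuple

Range = Tuple[float, float]

# Hypothetical perceptual dimensions, each assumed to be scaled to [0, 1].
DIMENSIONS = ("size", "color", "texture", "shape")

# A noun-like category: a narrow range on every dimension (illustrative values).
car: Dict[str, Range] = {
    "size": (0.4, 0.6),
    "color": (0.4, 0.6),
    "texture": (0.4, 0.6),
    "shape": (0.4, 0.6),
}

# An adjective-like category: one constrained dimension, the rest left free.
little: Dict[str, Range] = {"size": (0.0, 0.3)}


def is_member(point: Dict[str, float], category: Dict[str, Range]) -> bool:
    """True if the point lies inside the range on every constrained dimension."""
    return all(lo <= point[dim] <= hi for dim, (lo, hi) in category.items())


def coverage(category: Dict[str, Range]) -> float:
    """Fraction of the unit space covered; unconstrained dimensions count as 1."""
    return math.prod(hi - lo for lo, hi in category.values())


small_rough_thing = {"size": 0.1, "color": 0.9, "texture": 0.8, "shape": 0.2}
print(is_member(small_rough_thing, little))  # inside the 'little' region
print(is_member(small_rough_thing, car))     # outside the 'car' cluster
print(coverage(car), coverage(little))
```

Under these assumed values the noun-like category covers a far smaller part of the space than the adjective-like one, which is the asymmetry that Figures 2.4 and 2.5 depict.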
Figure 2.6 shows the architecture of the network they used to test the hypothesis. There were two groups of input units: one represented the linguistic context and the other the visual input along four different dimensions (e.g. color and size). The linguistic context could be one of four possibilities, represented in a localist way (Note 4): one unit stood for what is it? and required a noun as a response; the other three units each stood for a question regarding one of the perceptual dimensions (e.g. what color is it?). This representation of the linguistic context thus showed no syntactic difference between nouns and adjectives. The four dimensions of the visual input each consisted of 12 units, which functioned like a thermometer: if the third unit was on, then so were the first and the second, and if the ninth unit was on, then so were units one up to eight. The output layer had 36 units, each of which localistically represented an adjective or a noun. During training, imaginary objects were presented to the network and it was checked how long it took the network to learn the different words (Figure 2.7 shows the difference between an adjective like 'big', for which only one dimension is important, and a noun like 'lorax', for which all four dimensions are important).
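The input coding just described can be sketched as follows. This is a minimal illustration under stated assumptions, not a reconstruction of Gasser & Smith's code: the function names are hypothetical, a dimension value is assumed to be an integer from 1 to 12 giving the highest active unit, and the order in which the groups are concatenated is my own choice. Only the unit counts (four context units, four dimensions of 12 units) come from the description above.

```python
from typing import List

NUM_CONTEXTS = 4          # one 'what is it?' unit plus three dimension questions
NUM_DIMENSIONS = 4        # visual dimensions, e.g. color and size
UNITS_PER_DIMENSION = 12  # thermometer units per visual dimension


def localist(index: int, n_units: int) -> List[int]:
    """Localist coding: exactly one unit is on."""
    if not 0 <= index < n_units:
        raise ValueError("index out of range")
    return [1 if i == index else 0 for i in range(n_units)]


def thermometer(value: int, n_units: int = UNITS_PER_DIMENSION) -> List[int]:
    """Thermometer coding: units 1..value are on, the remaining units are off."""
    if not 1 <= value <= n_units:
        raise ValueError("value out of range")
    return [1] * value + [0] * (n_units - value)


def encode_input(context: int, values: List[int]) -> List[int]:
    """Concatenate the context units and the four thermometer-coded dimensions."""
    if len(values) != NUM_DIMENSIONS:
        raise ValueError("expected one value per visual dimension")
    vector = localist(context, NUM_CONTEXTS)
    for value in values:
        vector.extend(thermometer(value))
    return vector


# Context 0 is assumed to stand for 'what is it?'; the object values are arbitrary.
pattern = encode_input(0, [3, 9, 1, 12])
print(len(pattern))
print(thermometer(3))
```

With this coding, two objects whose values on a dimension are close share most of their active units, so similarity along a dimension is directly visible in the input pattern.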
In a first experiment, it was checked which of the two lexical categories was learned faster when they were presented with equal frequency. Gasser & Smith (1993) found that for the network, as for children, the nouns proved easier to learn. Moreover, the network made the same kind of errors on adjectives as children do: it more often gave an incorrect answer on the right dimension (e.g. green instead of red) than an answer on an irrelevant dimension (e.g. big instead of red). This result supported Gasser & Smith's hypothesis that the difference between nouns and adjectives in language acquisition might be due to their different types of representations. In two other experiments, they investigated the importance of representational span and representational compactness, and found that having values on more dimensions facilitated learning, as did having a smaller range on these dimensions.
In their conclusion, Gasser & Smith (1993) acknowledged that their model did not take into account other differences between nouns and adjectives, such as varying patterns of semantic overlap or the proportion of the representational space they cover as a class (the different adjectives which together describe a dimension can be used to talk about any possible value on that dimension (e.g. there is a continuum from black to white), but there are many possible combinations on different dimensions for which we have no nouns). They emphasized, however, that the importance of the model lay in showing that the traditional explanation for the different acquisition rates of nouns and adjectives may be more than is really needed: the difference between 'arguments' and 'predicates' would not have its origin as much in a fundamental linguistic opposition as in the nature and number of dimensions of the concepts involved.
As far as I know, Gasser & Smith's network has been the only one so far to model, if only roughly, what cognitive linguists would call 'experiential realism' (Lakoff 1987): how the content of linguistic concepts could be drawn from perceptual experiences. If experiential realism can be shown to be an implementable hypothesis, it would offer a possible solution for the Symbol Grounding Problem (i.e. how can concepts have content -- Harnad 1992) and the even more fundamental Symbol Emergence Problem (i.e. where does the content of the concepts come from -- Plunkett et al. 1992). In this manner, experiential realism would be a viable alternative to the strong nativist position of Chomsky (1988) and Piattelli-Palmarini (1989).
For the implementation of the semantic representations used in CLASPnet I have been inspired by the method used by Gasser & Smith: CLASPnet too uses a combination of perceptual features to give the network information about what's happening in its 'virtual environment'. Contrary to the model just presented, however, features from different senses have been combined, and semantic representations have been developed for word classes other than adjectives and nouns (see 3.1.1).