The Hidden Representations
In this penultimate subsection, I will open the black box of the network a little and show the patterns of activation over the 30 hidden units of the orthographic layer as the network is presented with some words -- they provide a glimpse of the daunting task neuroscientists face every day. First, the internal representations of the network for 'I', 'he', 'me' and 'him': each word has been presented five times in a row, always following the sentence 'the women paint.' The larger the square, the more active the unit (see Figure 3.18).
It is easy to see that there is a high degree of similarity between the representations of 'I' and 'he', despite the fact that the two words share no orthographic features. But they are perfectly interchangeable in the corpus, so it makes sense for the network to associate them as much as possible. The lack of an equally strong similarity between 'me' and 'him' is somewhat deceptive: if one also takes into account all the units which are inactive in both representations, then the two are clearly related (Note 26).
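The point about inactive units can be made concrete with two standard similarity measures: the cosine, which is driven only by jointly active units, and the simple matching coefficient, which also credits units that are inactive in both patterns. A minimal sketch, with made-up 30-unit activation vectors (the values are illustrative, not taken from the network):

```python
import numpy as np

# Two hypothetical 30-unit activation patterns: few active units, no overlap.
me = np.zeros(30);  me[[0, 4]] = [0.9, 0.8]
him = np.zeros(30); him[[1, 5]] = [0.7, 0.9]

def cosine(a, b):
    """Similarity driven only by units that are active in both patterns."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def simple_matching(a, b, threshold=0.5):
    """Fraction of units that agree, counting shared INactivity as agreement."""
    return float(np.mean((a > threshold) == (b > threshold)))

print(cosine(me, him))           # 0.0 -- no unit is active in both
print(simple_matching(me, him))  # ~0.87 -- 26 of 30 units agree (all inactive)
```

On the cosine measure the two patterns look unrelated; once shared inactivity counts as agreement, they look very similar -- which is the sense in which 'me' and 'him' are related despite the figure's sparse squares.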
The next figure shows how the network reacts to the presentation of the word 'dude' when it is used in subject, object, and verb position. Recall that 'dude' is unfamiliar to the network, so it has no way of knowing that it is really a noun. So, the figure illustrates the combination of the activation from the orthographic input layer (which remains constant in the three cases) with the pattern stored in the context layer at the time (which stores the expected arrival of a noun, a noun, and a verb, respectively).
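The combination described here can be sketched as a simple recurrent step: the hidden activation is a squashed sum of the word's orthographic input and the current context. Everything below except the 30 hidden units mentioned in the text -- weight values, the input size, the context vectors -- is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N_ORTHO, N_HIDDEN = 12, 30   # 30 hidden units as in the text; input size invented

W_in = rng.normal(scale=0.5, size=(N_HIDDEN, N_ORTHO))    # input-to-hidden weights
W_ctx = rng.normal(scale=0.5, size=(N_HIDDEN, N_HIDDEN))  # context-to-hidden weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden(ortho, context):
    # The hidden pattern blends the (constant) orthographic input with the
    # recurrent context, so sentence position changes the result.
    return sigmoid(W_in @ ortho + W_ctx @ context)

dude = rng.random(N_ORTHO)               # same orthographic code every time
ctx_expect_noun = rng.random(N_HIDDEN)   # stand-in for "a noun is expected"
ctx_expect_verb = rng.random(N_HIDDEN)   # stand-in for "a verb is expected"

h_noun = hidden(dude, ctx_expect_noun)
h_verb = hidden(dude, ctx_expect_verb)
# Identical word, different contexts: the two 30-unit patterns differ.
```

The same mechanism explains why 'dude' in subject and object position yields yet another pair of (more similar) patterns: only the context term changes.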
For the network, apparently, the noun versus verb distinction is primarily represented in this case by the second unit on the third row, with some help from the third unit on the fourth row. The fact that the representations are different illustrates the context-sensitivity of the network, and also how it interprets new input information on the basis of the knowledge which it already possesses (cf. Geeraerts 1985).
If we look at some verbs, we also find differences (see Figure 3.20). The two representations for 'paint' differ in that the first was captured when the word had been presented five times in a row, while the second shows 'paint' in the context of a simple sentence like 'the man paint.' Somewhat counter-intuitively, both patterns resemble the noun 'dude' more than the verb 'dude'. The first representation of 'sleep' has been captured in a simple sentence, 'the woman sleep.', while the second can be found at the end of 'the tiger, that chase the cows, sleep.'
Another interesting case is 'that': the word is used in the corpus with three different meanings. First, it can introduce the complement clause of a verb of saying, as in 'he announce to me that he can sleep.' Second, it can be used as a relative pronoun at the beginning of a relative clause, for example 'the tiger, that chase the cows, sleep.' Third, it is sometimes used as a determiner, as in 'that man sleep.' Somehow, the network has to be able to represent the different contexts on the hidden orthographic layer, so that the large hidden layer further down the line can use this information to activate the correct output units. As Figure 3.21 shows, the three instances do indeed show both similarities and differences. The first representation is that of the complement-clause introducer; the second, that of the determiner; and the third, that of the relative pronoun. In all three representations, at least one unit is fairly active which is totally inactive in the other two.
The last figure shows the four punctuation marks in the corpus. While the period, exclamation mark, and question mark should all tell the network to position itself for the next sentence, the comma only separates clauses and should therefore have less drastic effects: for example, the Mood units should not be reset. One of the units which may be involved in coding this difference is the one at the top right.
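The functional contrast the network has to learn -- nothing here is hard-wired into it -- can be caricatured in a few lines. The reset-to-zero behaviour below is my simplification of the target behaviour, not the network's actual mechanism:

```python
# Caricature of the punctuation contrast: sentence-final marks should wipe
# the carried-over state (including Mood); the comma, which only separates
# clauses, should let the state carry forward.
SENTENCE_FINAL = {".", "!", "?"}

def next_context(token, hidden, context):
    if token in SENTENCE_FINAL:
        return [0.0] * len(context)   # position for a fresh sentence
    return list(hidden)               # keep building on the current state
```

In the trained network this contrast has to be carried by the hidden units themselves, for instance by the unit at the top right of the figure.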
When looking at these representations, one should keep in mind that they are distributed, not localist. So it is entirely unrealistic to expect a single unit to represent, for example, subject position, negative polarity, or part-of-noun-phrase. Insofar as the network uses such concepts, they will be coded over many different units at the same time; conversely, each unit should be expected to play a minor or major role in the coding of many different concepts.
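The distributed-versus-localist contrast can be illustrated with a toy experiment: a hypothetical feature such as 'subject position' is coded as a direction spread over all 30 units, and a linear readout across the whole layer then recovers it far better than any single unit can. All sizes, noise levels, and the feature itself are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_examples = 30, 200

# A hypothetical feature coded as a direction spread over all 30 units.
direction = rng.normal(size=n_units)
direction /= np.linalg.norm(direction)

labels = rng.integers(0, 2, n_examples)     # 1 = subject position, 0 = not
signs = 2.0 * labels - 1.0
patterns = signs[:, None] * 2.0 * direction + rng.normal(size=(n_examples, n_units))

def accuracy(score):
    """Accuracy of a zero-threshold decision on a 1-D score (sign-corrected)."""
    a = float(np.mean((score > 0) == (labels == 1)))
    return max(a, 1.0 - a)

readout_acc = accuracy(patterns @ direction)            # whole-layer readout
unit_accs = [accuracy(patterns[:, i]) for i in range(n_units)]
# The distributed readout beats every individual unit.
```

Each unit carries a little information about the feature, so none of them decodes it well on its own; the pattern over all 30 units decodes it almost perfectly. This is exactly the situation one faces when inspecting figures like the ones above unit by unit.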