Training on Only Orthography or Semantics
The second way of comparing the information stored in the orthographical and semantic representations is to train networks on only one of the two and to compare the results. Starting from a single corpus file of 4,000 sentences (maximum length of 20 words), I again generated a semantics-only corpus file, alongside its orthography-only counterpart. Networks with architectures identical to that of CLASPnet were then trained on these corpora (Note 30). After training, both were tested on an identical test file, and it is these testing results that are shown in Figure 3.26 (Note 31).
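The single-modality corpora described above can be sketched as an ablation of the input patterns. The following is a minimal illustration, not CLASPnet's actual encoding: it assumes each input pattern concatenates an orthographic slot and a semantic slot (the slot widths and function names are my own invention), and produces a single-modality pattern by zeroing out the other slot.

```python
import numpy as np

# Hypothetical slot widths; the real CLASPnet encoding may differ.
ORTHO_DIM = 20   # assumed width of the orthographic part
SEM_DIM = 30     # assumed width of the semantic part

def ablate(pattern, keep="semantics"):
    """Return a copy of the input pattern with one modality zeroed out."""
    p = np.asarray(pattern, dtype=float).copy()
    if keep == "semantics":
        p[:ORTHO_DIM] = 0.0          # erase the orthographic slot
    elif keep == "orthography":
        p[ORTHO_DIM:] = 0.0          # erase the semantic slot
    else:
        raise ValueError("keep must be 'semantics' or 'orthography'")
    return p

# Example: a random full pattern reduced to its semantic half.
full = np.random.rand(ORTHO_DIM + SEM_DIM)
sem_only = ablate(full, keep="semantics")
```

Because the network architecture is left unchanged, the ablated slot simply delivers constant zero activations, which is why (as noted below) the unused part of the network could in principle have been removed to speed up training.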
The leftmost bar in the figure indicates the performance of a network which was trained on both types of input, using the same corpus. From Figure 3.26 it is clear that a network which only has access to orthographic information can still be trained to nearly the same level of performance as the network which has access to both. Only in the group of missed errors for the Infinity, Voice, and Polarity units is the difference greater than 5%. (Actually, for the Voice unit, the performance of the orthography-only network is 2% worse than that of the semantics-only net.) While this may be a comforting thought for syntacticians, Figure 3.26 also illustrates that a network which has only been trained on semantic information can still learn to 'do syntax' -- even in the worst case, that of the clausal types, nearly 60% of the input patterns are still correctly classified (Note 32). As in the previous case, there seems to be cause to believe that local semantic information can be of much use in detecting properties of clauses.
It would have been possible to speed up learning by removing the part of the network which was not receiving any inputs at all.
It is worth pointing out that the semantics-only network generalizes considerably better than the orthography-only network. With the former, the performance measures differ by only one or two percentage points, while the latter produces discrepancies of more than 10% (e.g. 17% for the Complement unit). So whatever it is in the semantic representation that the network exploits for these tasks, it is certainly a reliable indicator (see also 3.4.4 below).
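The generalization contrast described here amounts to comparing the gap between training and testing performance for each network. A small sketch, with hypothetical accuracy figures chosen only so that the gaps match the magnitudes quoted in the text (within two points for the semantics-only net, more than ten for the orthography-only net):

```python
def generalization_gap(train_acc, test_acc):
    """Gap between training and testing accuracy, in percentage points."""
    return abs(train_acc - test_acc)

# Hypothetical illustrative figures, NOT the actual CLASPnet measurements.
sem_gap = generalization_gap(95.0, 94.0)    # semantics-only: small gap
orth_gap = generalization_gap(92.0, 75.0)   # orthography-only: e.g. 17 points
```

A small gap suggests that the cues the semantics-only network relies on are stable across the corpus, rather than idiosyncrasies of the training sentences.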
I still consider this a very intriguing result, but I did not pursue it further in the context of CLASPnet. A possible explanation for the performance of the semantics-only net is that the single hidden layer was large enough to become sensitive to the very small differences between the representations of the different words. If that is indeed the case, then the semantic representation covertly includes a unique identifying pattern for each word, which the net may have used as a replacement for the orthographic representation. At any rate, no two real concepts can be supposed to have identical representations in the human brain, so even if this hypothesis is true, it does not invalidate the method used for CLASPnet.
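The hypothesis above rests on the semantic representations being pairwise distinct, however slightly. One way to check this is to compute the minimum pairwise distance over all representation vectors: if it is nonzero, every word carries a unique pattern that a sufficiently large hidden layer could in principle latch onto. A minimal sketch with toy vectors (the vectors and dimensionality are invented for illustration):

```python
import numpy as np
from itertools import combinations

def min_pairwise_distance(vectors):
    """Smallest Euclidean distance between any two representation vectors."""
    return min(np.linalg.norm(np.asarray(a) - np.asarray(b))
               for a, b in combinations(vectors, 2))

# Three toy 'semantic' vectors that differ only minutely; a nonzero minimum
# distance means each one is a unique identifying pattern.
reps = [[0.2, 0.8, 0.1],
        [0.2, 0.8, 0.15],
        [0.3, 0.7, 0.1]]
```

If the minimum distance were zero, two words would be genuinely indistinguishable on semantics alone, and no hidden layer, however large, could separate them.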