3.4.3 No Syntax without Semantics?
Although it is one of the tenets of Cognitive Linguistics that the study of syntactic phenomena should not be pursued independently of that of semantic phenomena, it remains to be proven that semantic information always contributes something useful to the syntactic properties of a sentence. Because of its bipolar setup, CLASPnet can contribute experimental results to this debate: the orthographic representation can easily be construed as the syntactic part, while the semantic representation is a straightforward implementation of the other part. The classification tasks which the network has had to master seem far closer to pure syntax than to semantics. So, do the semantic representations of the words contribute anything at all to how the network learns the task? It is not inconceivable that the net would learn to ignore the semantic information completely if it found that it was of no use -- in that case, the weights on the links from the hidden semantic layer to the second hidden layer would all stabilize at approximately 0 (Note 28). As a quick look at these links showed that this was not the case, two experiments were carried out: in the first, the network which was trained on both semantic and orthographic representations was tested on a corpus which lacked either the semantic or the orthographic information; in the second, new simulations were run to see how well networks which had only seen one type of input could learn the classification tasks (Note 29).
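The first experiment can be sketched in a few lines: feed a trained network the same pattern once with both poles intact and once with one pole zeroed out, and compare the downstream activations. The sketch below uses random stand-in weights (the real test would of course use the trained CLASPnet weights), and the size of the second hidden layer is an assumption, as the text does not give it here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the text: 60 orthographic and 25 semantic input units,
# hidden layers of 30 and 15 units, both feeding a shared second hidden
# layer.  H2 is an assumed size, not taken from the text.
N_ORTH, N_SEM = 60, 25
H_ORTH, H_SEM, H2 = 30, 15, 20

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Random stand-in weights; a real lesion test would load the trained weights.
W_orth = rng.normal(scale=0.5, size=(N_ORTH, H_ORTH))
W_sem = rng.normal(scale=0.5, size=(N_SEM, H_SEM))
W_orth2 = rng.normal(scale=0.5, size=(H_ORTH, H2))
W_sem2 = rng.normal(scale=0.5, size=(H_SEM, H2))

def second_hidden(orth_in, sem_in):
    """Activation of the shared second hidden layer for one input pattern."""
    h_orth = sigmoid(orth_in @ W_orth)
    h_sem = sigmoid(sem_in @ W_sem)
    return sigmoid(h_orth @ W_orth2 + h_sem @ W_sem2)

orth = rng.random(N_ORTH)
sem = rng.random(N_SEM)

full = second_hidden(orth, sem)
no_sem = second_hidden(orth, np.zeros(N_SEM))   # lesion the semantic pole
no_orth = second_hidden(np.zeros(N_ORTH), sem)  # lesion the orthographic pole

# If a trained net had learned to ignore semantics, lesioning that pole
# would barely change the downstream activations; a large mean difference
# indicates that the semantic pole is actually being used.
print(np.abs(full - no_sem).mean(), np.abs(full - no_orth).mean())
```

With random weights both differences are of course nonzero; the interesting comparison is between the two lesions on the trained network, which is what the first experiment measures via classification performance.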
It is of course also possible that the connection weights would not be close to 0, but that instead the weights on various connections would effectively cancel each other out. Because of the way in which backpropagation forces a network to learn, however, this scenario does not seem as likely.
A factor which has not been investigated is the importance of the number of units in the orthographic and semantic hidden layers. Recall that the former currently has 30 units (for 60 input units), whereas the latter only has 15 (for 25 input units). This discrepancy is indeed likely to bias the network to some extent, but because all the units on the second hidden layer are connected to both the semantic and the orthographic hidden units, the network can overcome the bias by making the weights on the connections from the semantic layer stronger than those on the connections from the orthographic layer. In this way, a single unit from the semantic hidden layer can still override many units from the orthographic hidden layer.
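The compensation argument amounts to simple arithmetic: a unit's net input is a weighted sum over both hidden layers, so a smaller layer with larger weights can dominate a larger layer with smaller ones. The activation levels and weight magnitudes below are made up purely for illustration.

```python
import numpy as np

# A single second-hidden-layer unit receives input from 30 orthographic
# and 15 semantic hidden units.  Illustrative (made-up) values: modest
# weights from the larger orthographic layer, stronger weights from the
# smaller semantic layer.
orth_acts = np.full(30, 0.5)   # 30 orthographic hidden activations
sem_acts = np.full(15, 0.5)    # 15 semantic hidden activations
w_orth = np.full(30, 0.1)      # weak weights from the larger layer
w_sem = np.full(15, -0.3)      # stronger weights from the smaller layer

net = orth_acts @ w_orth + sem_acts @ w_sem
# Orthographic contribution: 30 * 0.5 *  0.1 =  1.5
# Semantic contribution:     15 * 0.5 * -0.3 = -2.25
# Net input is about -0.75: the semantic layer dominates the unit's
# response despite having only half as many units.
print(net)
```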