Artificial neural networks are “brain-like” computer simulations. A network consists of a number of different processing units that send signals to one another through weighted connections. These networks learn to make desired responses to stimuli. A stimulus is a set of features that one presents to a network by activating the network’s input units. When activated, input units send signals through weighted connections to other network processors, eventually producing a pattern of activity in the network’s output units. This output activity is the network’s response to the presented stimulus. Often network responses are patterns of ‘on’ and ‘off’ values that classify a stimulus by naming its category.
The responses of a new network will not be very accurate, because the network has not yet learned the desired stimulus-response relationship. Learning proceeds by giving a network feedback about its responses, and this feedback is used to change the network’s internal structure. Feedback is usually a measure of response error -- the difference between desired activity and actual activity for each output unit. A learning rule uses these errors to adjust all of the connection weights inside the network. The next time the network receives this stimulus, it will produce a more accurate response because of these weight adjustments. By repeatedly presenting a set of training patterns, and by using feedback to adjust network structure, a network can learn to make very sophisticated judgments about stimuli.
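The training loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the networks discussed in this essay: it assumes a single output unit with a logistic activation function, a simple error-times-input learning rule (one form of the delta rule), and a hypothetical training set (logical OR). The function names, learning rate, and number of epochs are all made up for the example.

```python
import math

def logistic(net):
    """Squash a summed input signal into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def respond(weights, bias, stimulus):
    """Input units send signals through weighted connections to one output unit."""
    net = bias + sum(w * x for w, x in zip(weights, stimulus))
    return logistic(net)

def train(patterns, epochs=2000, rate=0.5):
    """Repeatedly present the training patterns and adjust weights from error."""
    weights = [0.0] * len(patterns[0][0])  # a new network knows nothing yet
    bias = 0.0
    for _ in range(epochs):
        for stimulus, desired in patterns:
            actual = respond(weights, bias, stimulus)
            error = desired - actual  # feedback: desired minus actual activity
            # Assumed learning rule: change each weight in proportion to the
            # error and to the signal that the weight carried.
            weights = [w + rate * error * x for w, x in zip(weights, stimulus)]
            bias += rate * error
    return weights, bias

# Hypothetical training set: each stimulus is paired with its desired response.
OR_PATTERNS = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train(OR_PATTERNS)
```

Before training, every stimulus produces the same uninformative response of 0.5. After training, respond(weights, bias, stimulus) falls below 0.5 for the ‘off’ pattern and above 0.5 for the ‘on’ patterns.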
The connectionist revolution that struck cognitive science in the 1980s was a revolt against the ‘symbols + rules’ models that had defined classical cognitive science for many decades. Connectionists argued that artificial neural networks were far more appropriate models of cognition than were logicist models inspired by the operations of the digital computer. The gooey, brain-like innards of networks did not seem to have explicit symbols or rules, and seemed better suited for solving the ill-posed problems that humans are so good at dealing with.
The vanguard of the neural network revolution was something that I like to call ‘gee whiz’ connectionism. Connectionists would take some prototypical problem from the domain of classical cognitive science (usually involving language or logic), and would train a network to deal with it. Then they would claim that, gee whiz, a radical non-classical model of this problem now existed. In the 1980s, everyone assumed that the internal structure of networks was a huge departure from classical models, so the mere creation of a network to solve a classical problem was a sufficient contribution to the literature, as well as a critique of the classical approach.
The problem with ‘gee whiz’ connectionism was that it never validated its core assumption – that the insides of networks were decidedly non-classical – by actually peering inside them to see how they worked. When we started to interpret network structure many years ago, we found that the differences between a network and a classical model were often less distinct than connectionists imagined. We also found that, more often than not, networks had discovered representations for solving problems that were new, exciting, and interesting. Furthermore, these representations were often far cleverer than any that I could think up on my own.
Recent results in my lab reminded me that networks usually discover ingenious solutions to problems. We trained some networks to solve probability problems, and we used mathematics to explore the relationship between network structure and important probability theorems. Our math allowed us to derive equations that define network structure (e.g., connection weights) in terms of the probability rules. These equations all turned out to be loglinear models – equations that involve adding and subtracting natural logarithms of variables. For instance, we might find that the equation for some weight w is the loglinear model ln(a) + ln(b).
We also found in some of our more interesting equations that the variables in our loglinear models looked like important elements in probability theory and in other branches of statistics (e.g., some variables looked like something called the odds ratio). What puzzled us, though, was that the network was taking the natural logarithm of these variables. This made the relationship between network structure and other branches of mathematics harder to define. Why were networks using the logarithms of these variables?
It finally struck me that this was in fact an extremely elegant solution to a mathematical problem faced by any of our probability networks. In many cases, to determine some probability based on different pieces of evidence, one has to multiply other probabilities together. However, the processors in our artificial neural networks cannot multiply or divide – they can only add or subtract the signals that they are receiving. The network’s solution to this conundrum is to do all of its calculations by using logarithms of variables, because in the world of logarithms, adding and subtracting amount to multiplying and dividing the original values. Once the logarithmic calculations are complete, the output units use their activation function to remove the logarithm and return the desired result.
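A small Python sketch shows why this works. It is my own illustration under stated assumptions, not the equations derived in the lab: it assumes that each piece of evidence is independent and can be summarized by a likelihood ratio, that all pieces of evidence are present, and that the output unit uses a logistic activation function. The function names and example numbers are hypothetical.

```python
import math

def posterior_by_multiplying(prior, likelihood_ratios):
    """Combine evidence the ordinary way, by multiplying odds."""
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)  # convert odds back into a probability

def posterior_by_adding_logs(prior, likelihood_ratios):
    """Combine the same evidence using only addition, as a processor must."""
    bias = math.log(prior / (1.0 - prior))  # ln of the prior odds
    weights = [math.log(ratio) for ratio in likelihood_ratios]
    net = bias + sum(weights)  # a sum of logs is the log of the product
    # The logistic function undoes the logarithm:
    # 1 / (1 + exp(-net)) equals odds / (1 + odds) when net = ln(odds).
    return 1.0 / (1.0 + math.exp(-net))

# Hypothetical numbers: a prior of 0.2 and two pieces of evidence.
multiplied = posterior_by_multiplying(0.2, [3.0, 0.5])
added = posterior_by_adding_logs(0.2, [3.0, 0.5])
```

Apart from floating point rounding, the two functions return the same probability. In this sketch the weights are natural logarithms of ratios, which is the same general form as the loglinear equations described earlier.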
In short, our probabilistic networks discovered how to use logarithms to perform multiplication and division. Gee whiz, we would never have discovered this had we not looked at the details of their internal structure.