Artificial
neural networks are “brain-like” computer simulations.  A network consists of a number of different
processing units that send signals to one another through weighted
connections.  These networks learn to make
desired responses to stimuli.  A stimulus
is a set of features that one presents to a network by activating the network’s
input units.  When activated, input units
send signals through weighted connections to other network processors,
eventually producing a pattern of activity in the network’s output units.  This output activity is the network’s response
to the presented stimulus.  Often network
responses are patterns of ‘on’ and ‘off’ values that classify a stimulus by
naming its category.
The
responses of a new network will not be very accurate, because the network has
not yet learned the desired stimulus-response relationship.  Learning proceeds by giving a network
feedback about its responses, which is then used to change its internal structure.  Feedback is usually a measure of response
error -- the difference between desired activity and actual activity for each
output unit.  A learning rule uses these
errors to adjust all of the connection weights inside the network.  The next time the network receives this
stimulus, it will produce a more accurate response because of these weight
adjustments.  By repeatedly presenting a
set of training patterns, and by using feedback to adjust network structure, a
network can learn to make a very sophisticated judgment about stimuli.
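The training loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post does not name a particular learning rule, so the sketch assumes a single output unit with a logistic activation function trained by a gradient-descent delta rule, and it uses logical OR as a made-up training set. The names `train`, `respond`, and `OR_PATTERNS` are hypothetical.

```python
import math

def logistic(net):
    """Squash a net input into an activity between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-net))

def respond(weights, bias, features):
    """Sum the weighted input signals and return the output unit's activity."""
    net = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(net)

def train(patterns, epochs=5000, rate=0.5):
    """Repeatedly present the patterns, using response error to adjust weights."""
    weights = [0.0] * len(patterns[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, desired in patterns:
            actual = respond(weights, bias, features)
            error = desired - actual  # feedback: desired minus actual activity
            # Assumed rule: scale the error by the slope of the logistic function.
            delta = error * actual * (1.0 - actual)
            weights = [w + rate * delta * x for w, x in zip(weights, features)]
            bias += rate * delta
    return weights, bias

# Hypothetical training set: each stimulus is a pair of on/off features,
# and the desired response is 'on' when either feature is on (logical OR).
OR_PATTERNS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train(OR_PATTERNS)
for features, desired in OR_PATTERNS:
    print(features, desired, round(respond(weights, bias, features), 3))
```

Before training, every response is 0.5 because all weights start at zero. After training, the responses should sit close to the desired 'on' and 'off' values.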
The
connectionist revolution that struck cognitive science in the 1980s was a
revolt against the ‘symbols + rules’ models that had defined classical
cognitive science for many decades. 
Connectionists argued that artificial neural networks were far more appropriate
models of cognition than were logicist models inspired by the operations of the
digital computer. The gooey, brain-like innards of networks did not seem to
have explicit symbols or rules, and seemed better suited for solving the
ill-posed problems that humans are so good at dealing with.
The
vanguard of the neural network revolution was something that I like to call
‘gee whiz’ connectionism.  Connectionists
would take some prototypical problem from the domain of classical cognitive
science (usually involving language or logic), and would train a network to
deal with it.  Then they would claim
that, gee whiz, a radical non-classical model of this problem now existed.  In the 1980s, everyone assumed that the
internal structure of networks was a huge departure from classical models, so
the mere creation of a network to solve a classical problem was a sufficient
contribution to the literature, as well as a critique of the classical
approach.
The
problem with ‘gee whiz’ connectionism was that it never validated its core
assumption – that the insides of networks were decidedly non-classical – by
actually peering inside them to see how they worked.  When we started to interpret network
structure many years ago, we found that the differences between a network and a
classical model were often less distinct than connectionists imagined.  We also found that more often than not
networks had discovered representations for solving problems that were new,
exciting, and interesting.  Furthermore,
these representations were often far cleverer than any that I could think up on
my own.
Recent
results in my lab reminded me that networks usually discover ingenious solutions
to problems.  We train some networks to
solve probability problems, and we use math to explore the relationship between
network structure and important probability theorems.  Our math has allowed us to derive equations
to define network structure (e.g. connection weights) in terms of the
probability rules.  These equations all
turned out to be loglinear models – equations that involve adding and
subtracting natural logarithms of variables. 
For instance, we might find that the equation for some weight w is the loglinear model ln(a) + ln(b).
We
also found in some of our more interesting equations that the variables in our
loglinear models looked like important elements in probability theory and in
other branches of statistics (e.g. some variables looked like something called
the odds ratio).  What puzzled us,
though, was that the network was taking the natural logarithm of these variables.  This made the relationship between network
structure and other branches of mathematics harder to define.  Why were networks using the logarithms of
these variables?
It
finally struck me that this was in fact an extremely elegant solution to a
mathematical problem faced by any of our probability networks.  In many cases, to determine some probability
based on different pieces of evidence, one has to multiply other probabilities
together.  However, the processors in our
artificial neural networks cannot multiply or divide – they can only add or
subtract the signals that they are receiving. 
The network’s solution to this conundrum is to do all of its
calculations by using logarithms of variables, because adding and subtracting
logarithms is equivalent to multiplying and dividing the original values.  Once the logarithmic calculations are
complete, the output units use their activation function to remove the
logarithm and return the desired result.
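A small Python sketch may make the trick concrete. It rests on assumptions that are not taken from our actual networks: the output unit is assumed to use the logistic activation function, the evidence is assumed to combine as prior odds multiplied by one likelihood ratio per cue, and all of the numbers are invented for illustration. The function names are hypothetical.

```python
import math

def logistic(net):
    """Logistic activation: turns log odds back into a probability."""
    return 1.0 / (1.0 + math.exp(-net))

def posterior_by_multiplying(prior_odds, likelihood_ratios):
    """The textbook route: multiply the odds together, then convert to a probability."""
    odds = prior_odds
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)

def posterior_by_adding_logs(prior_odds, likelihood_ratios):
    """The network's route: a unit that can only add signals."""
    bias = math.log(prior_odds)                       # assumed bias: ln(prior odds)
    weights = [math.log(r) for r in likelihood_ratios]  # assumed weights: ln(ratio)
    net = bias + sum(weights)  # every cue is present, so each input is 1
    return logistic(net)       # the activation function removes the logarithm

# Invented example values, not results from the lab.
prior_odds = 0.25
likelihood_ratios = [3.0, 0.5, 4.0]

direct = posterior_by_multiplying(prior_odds, likelihood_ratios)
via_logs = posterior_by_adding_logs(prior_odds, likelihood_ratios)
print(direct, via_logs)
```

Both routes give the same probability (0.6 for these numbers, up to floating-point rounding), even though the second one never multiplies or divides its inputs. This works because the logistic of ln(odds) equals odds / (1 + odds).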
In short, our probabilistic networks discovered how to use logarithms to perform multiplication and division. Gee whiz, we would never have discovered this had we not looked at the details of their internal structure.
 