*As described in this previous post, the text below is a draft of one of several "interludes" to be included in a book that I am working on concerned with music and artificial neural networks.*

The networks described up to this point in
the book have used the Gaussian activation function in their output or hidden
units. One reason for this is that using
value units leads to networks that are often easier to interpret, largely
because they are tuned to respond to a very narrow range of net inputs (Berkeley, Dawson, Medler, Schopflocher, & Hornsby, 1995).

Most of connectionist cognitive science,
however, uses networks whose processors compute activity with the logistic
function. Let us take a moment to
consider one such network of integration devices, and to explore its performance
on a keyfinding task.

The logistic activation function has a long
history of being used in the study of populations and in economics (Cramer, 2003). It was first invented and
named by Pierre-François Verhulst in the 19th century as a mathematical model of growth. It was independently rediscovered on more than one occasion in the early 20th century.
In connectionism, the logistic function is
particularly famous for being used as a continuous approximation of the
threshold function; this in turn permitted researchers to use calculus to
derive learning rules for multilayer perceptrons (Rumelhart, Hinton, & Williams,
1986). However, this equation has
other important roles in connectionism as well.
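The function itself, and the simple closed form of its derivative, can be sketched as follows; this is a minimal illustration, not code from the book's simulations.

```python
import math

def logistic(net):
    """Logistic (sigmoid) activation: a smooth, differentiable
    approximation of the all-or-none threshold function."""
    return 1.0 / (1.0 + math.exp(-net))

def logistic_derivative(net):
    """f'(net) = f(net) * (1 - f(net)); this simple closed form is
    what makes calculus-based learning rules easy to derive."""
    f = logistic(net)
    return f * (1.0 - f)

# The function squashes any net input into the range (0, 1):
print(logistic(0.0))   # 0.5, the inflection point
print(logistic(5.0))   # close to 1
print(logistic(-5.0))  # close to 0
```

Because the output is bounded between 0 and 1, it is also well suited to the probabilistic reading discussed next.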

For instance, the logistic equation permits
network responses to be translated into probability theory (McClelland, 1998). As a result, the responses
of a network that has integration devices in its output layer can literally be
interpreted as being conditional probabilities (Dawson & Dupuis, 2012; Dawson, Dupuis,
Spetch, & Kelly, 2009).

From this perspective, training an integration
device network on a keyfinding task is appealing. Imagine that this network has 24 different
output units, one for each possible major and minor key in Western tonal
music. The activity in each of these
output units would indicate probability judgments: each activity would indicate
the probability that some musical event belonged to a particular musical key.

In Chapter 5 we described a network of
value units that was trained on a set of pitch-class patterns that implied
particular musical keys (Handelman & Sigler, 2013). This network’s ability to
judge the musical keys of 152 different Nova Scotian folk songs (Creighton, 1932) was then examined.

Now let us consider a network that deals
with the keys of these folk songs in a much more direct manner – by being
trained to judge the keys of a subset of these songs. After this training, we can then examine the
network’s performance on the songs that were *not* presented to it during learning.
The network to be discussed uses 24 output
units to represent the possible musical keys, 8 hidden units, and 12 input
units that represent pitch-classes. Each
of the 152 folk songs is represented in terms of its use of the 12 possible
pitch-classes, as was described in detail in Section 5.7.1.
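The shape of this architecture can be sketched with a single forward pass; the weights below are random placeholders, not the trained values.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

# Architecture from the text: 12 pitch-class inputs, 8 hidden units,
# and 24 output units (one per major and minor key).
W_hidden = rng.normal(size=(8, 12))   # placeholder weights
W_output = rng.normal(size=(24, 8))   # placeholder weights

def forward(pitch_classes):
    """One forward pass: a 12-element pitch-class profile in,
    24 key activities (read as probabilities) out."""
    hidden = logistic(W_hidden @ pitch_classes)
    return logistic(W_output @ hidden)

# A song's input is its profile of use of the 12 pitch-classes.
profile = rng.random(12)
activities = forward(profile)
print(activities.shape)  # (24,)
```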

A subset of 114 of these songs – 75% of the
Creighton collection – is randomly selected to be used for training
purposes. The multilayer perceptron is
trained on these songs for 10,000 epochs to ensure that overall error is as low
as possible. The desired output for each
input song is the musical key selected for it by the Krumhansl and Schmuckler
keyfinding algorithm (Krumhansl, 1990). After this training, the
total sum of squared error (summed over 114 patterns with 24 different outputs
for each pattern) is only 5.39.
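The training regime amounts to gradient descent on sum-of-squared error through the logistic derivative. Here is a minimal sketch under stated assumptions: the pitch-class profiles and one-hot key targets below are random stand-ins for the 114 training songs and their Krumhansl/Schmuckler assignments, and the epoch count is reduced for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

# Random stand-ins: 114 songs, each a 12-element pitch-class profile
# paired with a one-hot target over the 24 keys.
X = rng.random((114, 12))
targets = np.eye(24)[rng.integers(0, 24, size=114)]

W1 = rng.normal(scale=0.1, size=(12, 8))   # input-to-hidden weights
W2 = rng.normal(scale=0.1, size=(8, 24))   # hidden-to-output weights
lr = 0.01

def sse(W1, W2):
    """Total sum of squared error over all patterns and outputs."""
    out = logistic(logistic(X @ W1) @ W2)
    return float(np.sum((out - targets) ** 2))

before = sse(W1, W2)
for epoch in range(1000):  # the text trains for 10,000 epochs
    hidden = logistic(X @ W1)
    out = logistic(hidden @ W2)
    err = out - targets
    # Backpropagate the SSE gradient through f'(net) = f * (1 - f).
    delta_out = err * out * (1 - out)
    delta_hid = (delta_out @ W2.T) * hidden * (1 - hidden)
    W2 -= lr * hidden.T @ delta_out
    W1 -= lr * X.T @ delta_hid
after = sse(W1, W2)

print(before, after)  # error drops as training proceeds
```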

Next, the remaining 38 folk songs (the 25%
of all of the songs that were randomly selected to *not* be part of network training) are presented to the network to determine whether its learned keyfinding abilities generalize to novel stimuli.
When all of the data for network training
and generalization is obtained, network outputs are considered as
probabilities. Standard methods (Duda, Hart, & Stork, 2001) are now used to convert these probabilities into a keyfinding
judgment for each song. This is done by
finding the output unit that has the maximum activity, and assigning that
output unit’s key to the input song.
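This maximum-activity decision rule is simple to sketch; the key names and activity values below are hypothetical.

```python
# Hypothetical output activities for one song, indexed by key name.
# With integration devices, each activity is read as the probability
# that the song belongs to that key.
activities = {
    "C major": 0.91,
    "F major": 0.64,
    "A minor": 0.08,
    # ...the remaining 21 keys are omitted for brevity
}

def assign_key(activities):
    """Assign the key whose output unit is most active (the standard
    maximum-probability decision rule)."""
    return max(activities, key=activities.get)

print(assign_key(activities))  # "C major"
```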

For the training set of 114 folk songs,
there is a very high degree of correspondence between the judgments made by
this network of integration devices and the judgments made by the
Krumhansl/Schmuckler algorithm. The network
generates the same judgment for 113 of these songs, or over 99% of the training
set. The two only disagree on the key assignment
for the “Crocodile Song”, which the network judges to be in the key of C major,
while the Krumhansl algorithm judges it to be in the key of F major. The second highest activity in the network’s
response for this song is found in the F major output unit, suggesting that the
network’s error is not too radical!

How does the network perform on the 38
songs that were not presented to it during learning? The network agrees with the
Krumhansl/Schmuckler algorithm on 32 of these songs (84% agreement). This, as well as the 99% agreement on the
training set, demonstrates a much stronger agreement between the two approaches
than was evident in Chapter 5.
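The agreement percentages above follow directly from the reported counts:

```python
def agreement(matching, total):
    """Percentage of songs on which the network and the
    Krumhansl/Schmuckler algorithm assign the same key."""
    return 100.0 * matching / total

print(round(agreement(113, 114), 1))  # training set: 99.1
print(round(agreement(32, 38), 1))    # held-out set: 84.2
```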

Is there anything special about the six
songs for which the network and the Krumhansl/Schmuckler algorithm do not agree? It seems that these songs may be difficult to
correctly keyfind, even for the standard algorithm. This suggests that failing to agree on these
particular songs may not be surprising.

To be more precise, keyfinding with the
Krumhansl/Schmuckler algorithm on the Nova Scotian folk songs is accomplished with
the Humdrum software package (Huron, 1999). For each key assignment,
this software package generates a confidence value. When this value is high, the algorithm’s
ability to keyfind is clear, which means that the key selected by the
Krumhansl/Schmuckler algorithm generates a high match, and no other possible
keys generate matches that are nearly that high. As confidence decreases, more than one key is
a possible choice, because several different keys generate similarly valued
matches.

For the 32 songs that receive the same key
from both the network and from the Krumhansl/Schmuckler algorithm, the average
confidence is 54.34%. However, for the 6
songs for which the two disagree, the average confidence is only 14.03%. In other words, when generalizing to new
songs, the network tends to disagree with the Krumhansl/Schmuckler algorithm
only on songs for which this algorithm itself is not confident.
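The comparison above amounts to grouping the held-out songs by agreement and averaging the confidence values within each group; a minimal sketch, with hypothetical per-song confidence numbers standing in for the Humdrum output:

```python
# Hypothetical records for held-out songs (only a few shown): each
# pairs a keyfinding confidence value with whether the network and
# the Krumhansl/Schmuckler algorithm agreed on the key.
records = [
    {"confidence": 61.2, "agree": True},
    {"confidence": 48.9, "agree": True},
    {"confidence": 12.4, "agree": False},
    {"confidence": 15.7, "agree": False},
]

def mean_confidence(records, agree):
    """Average confidence over songs with the given agreement status."""
    vals = [r["confidence"] for r in records if r["agree"] == agree]
    return sum(vals) / len(vals)

print(mean_confidence(records, agree=True))   # average where they agree
print(mean_confidence(records, agree=False))  # average where they disagree
```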

Clearly this approach to using networks of
integration devices for keyfinding demonstrates a great deal of promise. This promise, in turn, suggests further
research questions. How well does
learning about the keys of these folk songs generalize to other musical
stimuli? What is the relationship
between the internal structure of this network and the mechanics of the
Krumhansl/Schmuckler algorithm? How
might the network’s structure (e.g. number of hidden units) be altered to
improve performance? The allure of
studying musical networks is that their successes lead to promising future
research projects!

__References__