Wednesday, April 15, 2015

Keyfinding Logistics

As described in this previous post, the text below is a draft of one of several "interludes" to be included in a book that I am working on about music and artificial neural networks.

The networks described up to this point in the book have used the Gaussian activation function in their output or hidden units.  One reason for this is that using value units leads to networks that are often easier to interpret, largely because they are tuned to respond to a very narrow range of net inputs (Berkeley, Dawson, Medler, Schopflocher, & Hornsby, 1995).

Most of connectionist cognitive science, however, uses networks whose processors compute activity with the logistic function.  Let us take a moment to consider one such network of integration devices, and to explore its performance on a keyfinding task.

The logistic activation function has a long history of being used in the study of populations and in economics (Cramer, 2003).  It was first invented and named by Pierre-François Verhulst in the 19th century as a mathematical model of growth.  It was independently rediscovered on more than one occasion in the early 20th century.

In connectionism, the logistic function is particularly famous for being used as a continuous approximation of the threshold function; this in turn permitted researchers to use calculus to derive learning rules for multilayer perceptrons (Rumelhart, Hinton, & Williams, 1986).  However, this equation has other important roles in connectionism as well.
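To make the role of calculus concrete, here is the standard form of the logistic function and its derivative.  The symbol net (the net input to a unit) is notation assumed here for illustration; it is not taken from the book's own equations.

```latex
f(\mathit{net}) = \frac{1}{1 + e^{-\mathit{net}}},
\qquad
f'(\mathit{net}) = f(\mathit{net})\,\bigl(1 - f(\mathit{net})\bigr)
```

Unlike the threshold function, whose slope is zero everywhere except at the threshold itself, this derivative is defined and nonzero for every net input, which is what a gradient-based learning rule requires.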

For instance, the logistic equation permits network responses to be translated into probability theory (McClelland, 1998).  As a result, the responses of a network that has integration devices in its output layer can literally be interpreted as being conditional probabilities (Dawson & Dupuis, 2012; Dawson, Dupuis, Spetch, & Kelly, 2009).

From this perspective, training an integration device network on a keyfinding task is appealing.  Imagine that this network has 24 different output units, one for each possible major and minor key in Western tonal music.  The activity in each of these output units would indicate probability judgments: each activity would indicate the probability that some musical event belonged to a particular musical key.

In Chapter 5 we described a network of value units that was trained on a set of pitch-class patterns that implied particular musical keys (Handelman & Sigler, 2013).  This network’s ability to judge the musical keys of 152 different Nova Scotian folk songs (Creighton, 1932) was then examined.

Now let us consider a network that deals with the keys of these folk songs in a much more direct manner – by being trained to judge the keys of a subset of these songs.  After this training, we can then examine the network’s performance on the songs that were not presented to it during learning.

The network to be discussed uses 24 output units to represent the possible musical keys, 8 hidden units, and 12 input units that represent pitch-classes.  Each of the 152 folk songs is represented in terms of its use of the 12 possible pitch-classes as was described in detail in Section 5.7.1.
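The architecture just described can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the weights are random rather than trained, the input vector is a made-up pitch-class pattern rather than the Section 5.7.1 encoding, and the function and variable names are hypothetical.

```python
import numpy as np

def logistic(net):
    """Logistic activation: squashes any net input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-net))

def forward(pitch_classes, w_hidden, b_hidden, w_out, b_out):
    """Pass a 12-element input through 8 hidden units to 24 output units."""
    hidden = logistic(w_hidden @ pitch_classes + b_hidden)
    return logistic(w_out @ hidden + b_out)

# Hypothetical untrained weights; the real network learns these values.
rng = np.random.default_rng(0)
w_hidden = rng.uniform(-0.1, 0.1, size=(8, 12))
b_hidden = np.zeros(8)
w_out = rng.uniform(-0.1, 0.1, size=(24, 8))
b_out = np.zeros(24)

# Hypothetical input: a song that uses only the seven pitch-classes of C major.
song = np.zeros(12)
song[[0, 2, 4, 5, 7, 9, 11]] = 1.0

outputs = forward(song, w_hidden, b_hidden, w_out, b_out)
print(outputs.shape)  # one activity per possible key
```

Because every unit is an integration device, each of the 24 output activities falls strictly between 0 and 1.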

A subset of 114 of these songs – 75% of the Creighton collection – is randomly selected to be used for training purposes.  The multilayer perceptron is trained on these songs for 10,000 epochs to ensure that overall error is as low as possible.  The desired output for each input song is the musical key selected for it by the Krumhansl and Schmuckler keyfinding algorithm (Krumhansl, 1990).  After this training, the total sum of squared error (summed over 114 patterns with 24 different outputs for each pattern) is only 5.39.
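The bookkeeping behind this procedure can be sketched as follows.  The random seed, the function names, and the use of a target vector with a 1 for the selected key and 0 elsewhere are assumptions made for illustration; the post does not specify how the desired outputs were encoded.

```python
import numpy as np

N_SONGS, N_TRAIN, N_KEYS = 152, 114, 24

def split_indices(n_songs, n_train, seed=0):
    """Randomly divide song indices into a training set and a held-out set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_songs)
    return order[:n_train], order[n_train:]

def one_hot(key_index, n_keys=N_KEYS):
    """Assumed target encoding: 1 for the selected key, 0 for the other keys."""
    target = np.zeros(n_keys)
    target[key_index] = 1.0
    return target

def sum_squared_error(outputs, targets):
    """Total squared error, summed over all patterns and all output units."""
    diff = np.asarray(targets, dtype=float) - np.asarray(outputs, dtype=float)
    return float(np.sum(diff ** 2))

train_idx, test_idx = split_indices(N_SONGS, N_TRAIN)
print(len(train_idx), len(test_idx))  # 114 training songs, 38 held out
```

With 114 patterns and 24 outputs each, the error total is taken over 2,736 individual output values.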

Next, the remaining 38 folk songs (the 25% of all of the songs that were randomly selected to not be part of network training) are presented to the network to determine whether its learned keyfinding abilities generalize to novel stimuli.

When all of the data for network training and generalization is obtained, network outputs are considered as probabilities.  Standard methods (Duda, Hart, & Stork, 2001) are now used to convert these probabilities into a keyfinding judgment for each song.  This is done by finding the output unit that has the maximum activity, and assigning that output unit’s key to the input song.
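In code, this decision rule is simply an argmax over the 24 output activities.  The ordering of the output units below (12 major keys followed by 12 minor keys) and the example activities are hypothetical; the post does not say how the output units were arranged.

```python
import numpy as np

PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Assumed ordering of output units: 12 major keys, then 12 minor keys.
KEY_NAMES = [f"{p} major" for p in PITCH_NAMES] + [f"{p} minor" for p in PITCH_NAMES]

def judge_key(outputs):
    """Assign the key whose output unit has the maximum activity."""
    return KEY_NAMES[int(np.argmax(outputs))]

def runner_up(outputs):
    """Return the key with the second-highest activity."""
    order = np.argsort(outputs)[::-1]
    return KEY_NAMES[int(order[1])]

# Hypothetical activities: strongest for C major, next strongest for F major.
example = np.full(24, 0.01)
example[0] = 0.60   # C major unit
example[5] = 0.30   # F major unit
print(judge_key(example), "|", runner_up(example))
```

Looking at the runner-up as well as the winner is useful when we want to know how close a disagreement really is.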

For the training set of 114 folk songs, there is a very high degree of correspondence between the judgments made by this network of integration devices and the judgments made by the Krumhansl/Schmuckler algorithm.  The network generates the same judgment for 113 of these songs, or over 99% of the training set.  The two disagree only on the key assignment for the “Crocodile Song”, which the network judges to be in the key of C major, while the Krumhansl/Schmuckler algorithm judges it to be in the key of F major.  The second highest activity in the network’s response for this song is found in the F major output unit, suggesting that the network’s error is not too radical!

How does the network perform on the 38 songs that were not presented to it during learning?  The network agrees with the Krumhansl/Schmuckler algorithm on 32 of these songs (84% agreement).  This, as well as the 99% agreement on the training set, demonstrates a much stronger agreement between the two approaches than was evident in Chapter 5.

Is there anything special about the six songs for which the network and the Krumhansl/Schmuckler algorithm do not agree?  It seems that these songs may be difficult to correctly keyfind, even for the standard algorithm.  This suggests that failing to agree on these particular songs may not be surprising.

To be more precise, using the Krumhansl/Schmuckler algorithm on the Nova Scotian folk songs is accomplished using the HumDrum software package (Huron, 1999).  For each key assignment, this software package generates a confidence value.  When this value is high, the algorithm’s ability to keyfind is clear, which means that the key selected by the Krumhansl/Schmuckler algorithm generates a high match, and no other possible keys generate matches that are nearly that high.  As confidence decreases, more than one key is a possible choice, because several different keys generate similarly valued matches.
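The idea of a "match" can be illustrated with a rough sketch of profile-based keyfinding: correlate a song's pitch-class distribution with a profile for each of the 24 keys, and pick the best correlation.  Several things here are assumptions.  The profile values are the Krumhansl and Kessler ratings as they are commonly cited, the function names are hypothetical, and the margin between the best and second-best match is only a stand-in for confidence; it is not claimed to be the value that HumDrum reports.

```python
import numpy as np

# Key profiles as commonly cited from Krumhansl and Kessler (assumed values).
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def key_correlations(pc_distribution):
    """Correlate a 12-element pitch-class distribution with all 24 key profiles.

    Indices 0-11 are major keys by tonic (0 = C); 12-23 are minor keys.
    """
    pc = np.asarray(pc_distribution, dtype=float)
    scores = []
    for profile in (MAJOR, MINOR):
        for tonic in range(12):
            # Rotate the profile so its first entry lines up with the tonic.
            scores.append(np.corrcoef(pc, np.roll(profile, tonic))[0, 1])
    return np.array(scores)

def best_key_and_margin(pc_distribution):
    """Return the best-matching key index and its lead over the runner-up."""
    r = key_correlations(pc_distribution)
    order = np.argsort(r)[::-1]
    return int(order[0]), float(r[order[0]] - r[order[1]])

# A distribution shaped exactly like the major profile on F (pitch-class 5).
key, margin = best_key_and_margin(np.roll(MAJOR, 5))
print(key, margin > 0)
```

A large margin corresponds to the clear case described above, where one key matches far better than any other; a small margin corresponds to the case where several keys are plausible.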

For the 32 songs that receive the same key from both the network and from the Krumhansl/Schmuckler algorithm, the average confidence is 54.34%.  However, for the 6 songs for which the two disagree, the average confidence is only 14.03%.  In other words, when generalizing to new songs, the network tends to disagree with the Krumhansl/Schmuckler algorithm only on songs for which this algorithm itself is not confident.
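This comparison amounts to splitting the songs by agreement and averaging the confidence within each group.  A minimal sketch is given below; the function name and the four-song example are hypothetical, and the example values are not the Creighton data.

```python
def summarize(network_keys, reference_keys, confidences):
    """Return the agreement rate and the mean confidence for each group."""
    agree, disagree = [], []
    for net_key, ref_key, conf in zip(network_keys, reference_keys, confidences):
        (agree if net_key == ref_key else disagree).append(conf)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    rate = len(agree) / len(network_keys)
    return rate, mean(agree), mean(disagree)

# Hypothetical toy data: four songs, one disagreement on a low-confidence song.
net = ["C major", "G major", "A minor", "F major"]
ref = ["C major", "G major", "A minor", "C major"]
conf = [60.0, 50.0, 40.0, 10.0]
rate, mean_agree, mean_disagree = summarize(net, ref, conf)
print(rate, mean_agree, mean_disagree)
```

Applied to the 38 held-out songs, the same calculation gives the agreement rate of 32/38, or about 84%, reported above.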

Clearly this approach to using networks of integration devices for keyfinding demonstrates a great deal of promise.  This promise, in turn, suggests further research questions.  How well does learning about the keys of these folk songs generalize to other musical stimuli?  What is the relationship between the internal structure of this network and the mechanics of the Krumhansl/Schmuckler algorithm?  How might the network’s structure (e.g. number of hidden units) be altered to improve performance?  The allure of studying musical networks is that their successes lead to promising future research projects!



