Cognition and Reality: April 2015

Tuesday, April 28, 2015

Computing The Fit Of An MDS Solution Using R

If you want to use R² to evaluate the fit of your MDS solution using the statistical programming language R, then read on. Some example code is given below. An R script containing the code is available here: http://www.bcp.psych.ualberta.ca/~mike/BlogStuff/RScripts/MDS-Fit-Eg.R

Recently I have been doing a lot of analysis of artificial neural networks that have been trained on a variety of musical tasks. At many times this analysis has involved computing the distances between different processing units, where these distances are based on connection weights. Then I perform multidimensional scaling (MDS) on the computed distances (Kruskal & Wish, 1978). For the most part I have been performing my MDS analyses in R, using the cmdscale command.

When one performs MDS one typically is doing so to reduce the dimensionality of the distances in order to make the data more understandable. In MDS you decide how many dimensions you want in order to fit the data. One important aspect of this is deciding how many dimensions are needed to give a proper fit to your distance data.

In R cmdscale will generate, if asked, a goodness of fit measure that is based on the eigenvalues of the MDS solution. This goodness of fit measure doesn’t make any sense to me. Being old school, I would basically like to measure goodness of fit by taking the original distances that I analyze and compare them to the new distances that can be determined from my MDS solution. In the old days, this was done by taking the two matrices of distances, stringing them out into two columns, and computing the correlation between the two columns. If you square this correlation, you get R² which measures the proportion of variance in your original data that is accounted for by your MDS solution. You can also compute an F statistic from this information (Pedhazur, 1982). F = (R²/1)/((1- R²)/(N – 2)) where N is the number of rows in your strung out distance matrix. The F can then be evaluated at 1, N – 2 degrees of freedom. You can also use this approach to determine if taking out another dimension accounts for a significant increase in variance accounted for – but see Pedhazur for the details of this approach (in the context of multiple regression).

Unfortunately R does not deliver this old school measure of fit, even though a search of the internet for how to do so reveals that many people seek it. These poor people, probably old school like myself, are simply told to use the eigenvalue-based goodness of fit that cmdscale will deliver if prompted.

I recently rejected this advice, and wrote some R code to compute the “old school” goodness of fit measure. I provide this code below; I hope that someone will find it of use!

#Example of calculating goodness of fit

#for MDS via cmdscale()
#Written by Michael R.W. Dawson, spring of 2015

#Basic idea: Do MDS on distance data with cmdscale. Use the coordinates

#from this solution to recreate the distances. String the original
#distances and the new distances matrices out into two columns.
#Calculate the correlation between these two columns. Square this
#correlation to get proportion of variance accounted for.
#Compute degrees of freedom = df = n -2 where n = length of a strung out matrix
#Compute F = (rsquared/1)/((1 - rsquared)/df)
#Evaluate p value for F at 1,df degrees of freedom

#This example uses the eurodist dataset, already a set of distances

#The distances between 21 European cities, standard dataset in R

eurodist #display the starting distance data

# Perform MDS on the data using cmdscale; k determines the number of dimensions

EuroMDS <- cmdscale(eurodist, eig=TRUE, k=2)

# ---- Code below calculates goodness of fit ----

# Copy coordinates from MDS solution into a new matrix

NewEuroCoords = EuroMDS$points #take coords from cmdscale solution

#Compute a new set of distances from the MDS coordinates delivered by cmdscale

NewEuroDists <- dist(NewEuroCoords, diag=TRUE, upper=TRUE)

#String the two distance matrices out into two columns

#and calculate the correlation between these two columns
r <- cor(c(eurodist), c(NewEuroDists))

#compute and display r squared from this correlation,

#this is variance accounted for

rsquared <- r * r #compute
rsquared #display

#compute F test on this correlation

#degrees of freedom for numerator = 1

#compute degrees of freedom for denominator = N - 2 and display
freedom = NROW(c(NewEuroDists)) - 2 #compute
freedom #display df for denominator

#compute and display F

Fvalue <- rsquared /((1 - rsquared)/freedom) #compute
Fvalue #display

#use pf to determine the p-value of this F at df = 1, freedom

pf(Fvalue, 1, freedom, lower.tail=FALSE)

#End result: goodness of fit measured as ability to

#reconstruct original distances, along with
#a significance test of this ability
#Change k to see fit of a different sized
#solution

References

Kruskal, J. B., & Wish, M. (1978). Multidimensional Scaling. Beverly Hills, CA: Sage Publications.

Pedhazur, E. J. (1982). Multiple Regression In Behavioral Research (Second ed.). New York, NY: Holt, Rinehart and Winston.

Monday, April 27, 2015

Composing With Strange Circles

As described in this previous post, the text below is a draft of one of several "interludes" to be included in a book that I am working on concerned with music and artificial neural networks.

This post provides a working java program for hearing different combinations of strange circles; the program is described below and is available for free as a Java jar file at this location: http://www.bcp.psych.ualberta.ca/~mike/BlogStuff/Circles/StrangeCircles.zip.

Also, this post is a revised version of this previous post on this blog. The text is a bit different; the big difference is that at the bottom of this blog I provide a java program that lets you compose with a variety of strange circles and hear the results. Feel free to download the program and play with it. If you have any difficulties please leave a comment; this is my first attempt at distributing code in this fashion and I would be surprised if I don't make some mistakes. To run the program, download the zip file, unpack it, and double-click on the .jar file's icon. You need to have Java installed on your program for this code to function.

Figure I-8. The first four bars of an atonal piece composed with some strange circles found within musical networks. See text for details.

Atonal music has no discernible musical key or tonal center because all twelve pitch-classes from Western music occur equally often. Arnold Schoenberg invented a method, called the twelve tone technique or dodecaphony, for composing atonal music. His 1923 piece Fünf Klavierstücke, Opus 23 was the first to be composed using this technique.

In dodecaphony one begins a new composition by arranging all twelve pitch-classes in some desired order; this arrangement is called the tone row. The first note from the tone row is then used to begin the new piece. The duration of this note, and whether or not it is repeated, is under the composer’s control. However, once the use of this note is complete, dodecaphony takes control: the twelve tone method prevents the composer from using again until all of the other eleven notes in the tone row have first been used. Their use, naturally, follows the same procedure used for the first note: the composer decides upon duration and repetition, uses the note, and then moves on to the next note in the tone row. The final movement of Schoenberg’s Fünf Klavierstücke, Opus 23 was the first to be composed using a complete (twelve note) tone row in this fashion.

In the preceding chapter we saw that musical pitch-classes could be arranged in a number of different strange circles: for instance, four different circles of major thirds ([C, E, G#], [C#, F, A], [D, F#, A#], and [D#, E, G]) or two different circles of major seconds ([C, D, E, F#, G#, A#] and [C#, D#, F, G, B]).

We also saw that when artificial neural networks are trained to solve problems in harmony, they often use these strange circles to organize pitch-classes into different equivalence classes. For instance, all of the pitch-classes that belong to one circle of major seconds may all be assigned the same connection weight (e.g. to the connection from a pitch-class input unit to a hidden unit).

In a musical network, the connection weight from an input unit to a hidden unit is essentially the ‘name’ that identifies the pitch-class. If all of the pitch-classes belonging to a strange circle are assigned the same connection weight, then they are all being assigned the same ‘name’. This means that the hidden unit is deaf to any differences between members of this subset of pitch-classes. For a hidden unit that uses equivalence classes based on circles of major seconds, there are only two pitch-classes: some ‘name’ x (the weight assigned to C, D, E, F#, G#, and A#) and some other ‘name’ y (the weight assigned to C#, D#, F, G, and B).

Why do networks use strange circle equivalence classes to represent musical structure? One reason is that networks discover that notes that belong to the same strange circle are not typically used together to solve musical problems, such as classifying a musical chord. Instead, the network discovers that combining notes from different strange circles is more successful.

This use of equivalence classes -- combining pitch-classes from different circles, but not from the same circle – suggests an alternative approach to composing atonal music.

Imagine a musical composition constructed from a set of different musical voices. Each of these voices could be derived from a strange circle. The notes sung by this voice are selected by randomly choosing from the set of pitch-classes that belong to the strange circle. For instance, if one voice was associated with a particular circle of major thirds, then one could write its notes by randomly choosing one note at a time from the set [C, E, G#]. To make the voice more musically interesting, one could add a randomly selected rest to the mix by selecting from the set [C, E, G#, R] where R indicates a rest (i.e. no note is to be sung).

If one associated different voices with different strange circles, and composed via random selection as described above, then one would be following the general principle discovered by the network: pitch-classes from different strange circles can occur together, but pitch-classes from the same strange circle cannot.

Furthermore, one could use this method to compose atonal music by wisely choosing which strange circles to use to create different voices. For instance, imagine creating a piece that included four voices, each associated with a different circle of major thirds. This composition would be atonal, in Schoenberg’s sense, because the four circles combine to include all twelve possible pitch-classes. Randomly selecting pitches from each of these circles would produce a composition that did not have a tonal center because each of the twelve pitch-classes would occur equally often when the composition was considered as a whole.

Figure I-8 provides a short score created by using the approach described above. This score includes six staves, one for each voice. Each voice is generated by randomly selecting from one strange circle (and including rests in this sampling procedure). The top two staves, written in quarter notes, are each drawn from a different circle of major seconds. The bottom four staves, written in half notes, are each drawn from a different circle of major thirds.

The score illustrated in Figure I-8 is created by applying two additional musical assumptions. First, while each wheel generated a pitch-class name, I decided how high or low (in terms of octave) each note was positioned. Second, in order to ensure that all notes tended to occur equally often in the score, I sampled the two circles of major seconds twice as frequently relative to the other four strange circles. That is why the upper two staves use notes that are half the duration of those in the bottom four staves.

Figure I-8 provides the first four bars of a longer composition that can be found at this website: http://cognitionandreality.blogspot.ca/2013/03/composing-atonal-music-using-strange.html. At the bottom of this web page one can find links that play some of the voices individually, some combinations of a small number of the voices, and all of the voices played together.

On listening to these samples, one discovers that individual the strange circles are musical, but are not really musically interesting. Music that is more interesting emerges from combining the random outputs of different circles. For instance, I enjoyed the results of pairing the two circles of major seconds together. I was also surprised at the musicality of the full composition. My impression of this piece was that it is a modern, atonal composition. I am no Schoenberg, but I humbly submit that composing music by combining strange circles provides an interesting and alternative method to dodecaphony.

Of course, there are other strange circles that could be incorporated into this approach to composing, such as the three circles of minor thirds or the six circles of tritones. What kinds of atonal pieces can be created when many different strange circles are available?

To answer this question, I created a Java program that uses David Koelle’s music package jFugue (Koelle, 2008). This package lets the programmer define strings of musical notes, and then takes care of playing them. The program that I wrote lets the user choose a composition’s tempo and length with a mouse, and then make a checkmark beside every strange circle to be used in a piece. All fifteen circles in Figure I-9 can be used at once! The user can decide whether or not to include rests, and set the duration and the octave (2 is lowest, 5 is highest) for each set of circles.

A press of the compose button leads to a pause while the various voices are constructed, and then the piece is played through the computer’s speakers. One can easily explore the possibilities of strange circle composing with this program and listening to the sounds that it creates.

This program is available for free as a Java jar file at this location: http://www.bcp.psych.ualberta.ca/~mike/BlogStuff/Circles/StrangeCircles.zip. Save the zip file to your computer, move it to a desired location, and unpack it. You will see a program called StrangeCircles.jar and a lib directory; these two items have to be in the same location on your computer. To run the program from a command line, when you are in the proper location type: java –jar StrangCircles.jar. On a windows machine, the program can also be run simply by double-clicking on the program’s icon after it has been downloaded. The program requires that Java be installed on your program.

Figure I-9. A screenshot of a Java program that randomly selects from various strange circles to compose atonal music. In the figure, one circle of major thirds, one circle of minor thirds, one circle of major seconds, and two circles of tritones have been selected to be used in a four bar composition that includes rests. See text for details.

References

Koelle, D. (2008). The Complete Guide to JFugue: Programming Music in Java: www.jfugue.org.

Sunday, April 19, 2015

Riemannian Tonics and Symmetry

As described in this previous post, the text below is a draft of one of several "interludes" to be included in a book that I am working on concerned with music and artificial neural networks.

Figure I-6. Connection weights from input pitch-classes to two different output units (D and D#) in a network trained to identify Riemann’s roots for major and minor triads.

Hugo Riemann (b. 1849, d. 1919) was one of the most important music theorists (Rehding, 2003). Riemann was strongly influenced by the natural science of music that flourished in the 19^th century (Burnham, 1992; Hui, 2013), such as the prototypical and pioneering work of Helmholtz which was introduced in Chapter 1. Riemann’s goal was to provide the natural laws that governed music; he argued that these laws are rooted in musical harmony (Riemann, 1895). For Riemann, an understanding of the logic of harmony was tantamount to an understanding of the universal structure of music.

In his own analysis of harmony, Riemann was particularly interested in the triad. In modern musical theory a triad is a three note chord that is built upwards from a tonic note. For instance, the A major triad is built upon the tonic A, and includes C# (which is a major third higher than A) and E (which is a minor third higher than C#).

A minor triad can be described as a distortion of a major triad. In general, one creates a minor triad by lowering the middle note of a major triad by one semitone. For instance, to produce the A minor triad, one takes the A major triad [A, C#, E] and lowers the middle note to create the triad [A, C, E]. Another way to consider a minor triad is that it too is built upwards from a tonic, but using different musical intervals.

Riemann conceived triad structure in a different fashion. He proposed an idea known as harmonic dualism. According to harmonic dualism, major and minor triads are constructed from processes that are identical in structure. However, these processes are opposite in direction.

Harmonic dualism conceives of major triads in the same way as modern theory: by building upwards from a tonic with a note that is first a major third higher than the tonic, and then another note a minor third higher than this middle note.

However, harmonic dualism departs from modern theory in its conception of minor triads. Harmonic dualism highlights structural symmetry between minor and major triads. According to Riemann, minor triads are built downwards from the tonic using the same procedure used (in an upwards direction) to create a major triad.

For instance, consider the minor triad [A, C, E]. For Riemann, the tonic of this triad is not A, but is instead E. To create the triad, one first adds a note a major third below the tonic, and then adds a second note a minor third below the middle note. Riemann would not call this triad A minor; instead he would call it ‘under E’ or °E. Furthermore, its symmetric structure decreases its relationship to one major triad (A major) and increases its relationship to another (E major). E major and °E both start from the same tonic, and have identical structure one as moves away from this tonic.

Harmonic dualism assigns different tonics to the minor triads than are assigned by modern music theory. Table I-1 provides each approach’s tonics for the 12 major and 12 minor triads.

Table I-1 provides two qualitatively different theories about the tonic notes of triads. How might the differences between these theories be reflected in network structure? Is modern theory harder or easier for a network to learn than is harmonic dualism?

To explore such questions, perceptrons can be trained to identify the tonics of input triads. This requires 12 input and 12 output units. The input units use pitch-class representation to encode the three component pitch-classes of a triad, and the output units use pitch-class representation to encode the triad’s tonic. The output units are all value units, and each has its µ set to 0, and a learning rate of 0.01 is employed. Two different training sets are used to teach two different types of perceptrons: one that uses the Riemann tonics of the input triads, the other that uses their modern tonics.

Both training sets are learned very quickly: the Riemann tonics require an average of about 73 epochs of training, while the modern tonics are acquired after about 65 epochs of training. This difference is not statistically significant.

Harmonic dualism enforces symmetric processes for constructing major and minor triads. As a result, one constructs °D in exactly the same manner as one constructs D major, except in opposite directions. This symmetry of structure is elegantly evident in the connection weights of a perceptron trained to detect the Riemann tonics.

Consider the left hand side of Figure I-6. It plots the connection weights from each of the 12 pitch-class input units to the output unit that detects the Riemann tonic of D. Note the beautiful mirror symmetry of connection weights as you move either to the left or to the right from the D input weight.

How does the output unit use these weights to detect a Riemann tonic of D? The only positive connection weight is associated with the input pitch-class D, shown in black in the graph. The output unit will only activate when the signal sent through this positive connection weight is cancelled out by two signals sent through negative connection weights. Only two pairs of signals accomplish this: signals from A and F# (which combine with D to represent D major) and signals from A# and G (which combine with D to represent °D). These four connection weights are also shown in black. All of the other connection weights are so extremely negative that they turn this output unit off if any of their associated pitch-classes are present.

Because the symmetric pattern of connection weights emanates outwards from either side of one weight (e.g. from the left or right of the weight for D) there must be one pitch-class that breaks this symmetry. In this case, it is the weight for G# which is shown in white in the figure. Importantly, this outlier weight corresponds to the pitch class that is a tritone away from D. This means that if one was to wrap the x-axis of the graph in a circle (i.e. the circle of minor seconds), the connection weights would be perfectly symmetric across the diameter from D to G# which cuts this circle in half.

Essentially the same pattern of connection weights is found in this perceptron for the other output units. The only difference is that the pattern is shifted to be centered on a different output unit. This is illustrated on the right side of Figure I-6, which illustrates the weights of the connections that feed into the output unit for D#. It provides the identical pattern as shown on the figure’s left, but the pattern has been shifted one pitch-class to the right.

The wonderful symmetry in the Figure I-6 connection weights is completely consistent with the symmetry that is the foundation of harmonic dualism. Such symmetry -- perhaps sadly -- is not found when a perceptron is trained to generate the modern tonics of triads, as can be seen from the weights illustrated in Figure I-7.

Figure I-7. The connection weights from the input units to the output unit representing the tonic D in a perceptron trained to detect modern tonics.

Figure I-7 provides the connection weights that feed into the D output unit of a perceptron trained on modern tonics. Its bars are colored according to the same scheme used in Figure I-6: black bars show pitch-classes involved in turning this output unit on, grey bars show pitch-classes involved in turning this output unit off, and the white bar is associated with the pitch-class a tritone away from D. These weights function in a fashion similar to those of Figure I-6: A signal from D and two other input units produce a net input close enough to zero to turn the output unit on. The combinations are [D, F#, A] for D major and [D, F, A] for D minor. This identical pattern is found for other output units, once again shifted along the x-axis.

The symmetry that is present in Figure I-6, and missing from Figure I-7, is enchanting. There are clearly additional properties to be gleaned from the Figure I-6 weights that involve considering the relationships between pairs of pitch-classes that are assigned the same weight. These relationships, for instance, could be considered in the context of Riemann’s theories about tonal relationships.

In short, training networks to detect regularities defined by opposing musical theories provides a new paradigm – network interpretation – that could be applied to consider their advantages and disadvantages.

References

Wednesday, April 15, 2015

Keyfinding Logistics

As described in this previous post, the text below is a draft of one of several "interludes" to be included in a book that I am working on concerned with music and artificial neural networks.

The networks described up to this point in the book have used the Gaussian activation function in their output or hidden units. One reason for this is that using value units leads to networks that are often easier to interpret, largely because they are tuned to respond to a very narrow range of net inputs (Berkeley, Dawson, Medler, Schopflocher, & Hornsby, 1995).

Most of connectionist cognitive science, however, uses networks whose processors compute activity with the logistic function. Let us take a moment to consider one such network of integration devices, and to explore its performance on a keyfinding task.

The logistic activation function has a long history of being used in the study of populations and in economics (Cramer, 2003). It was first invented and named by Pierre-Franҫois Verhulst in the 19^th century as a mathematical model of growth. It was independently rediscovered on more than one occasion in the early 20^th century.

In connectionism, the logistic function is particularly famous for being used as a continuous approximation of the threshold function; this in turn permitted researchers to use calculus to derive learning rules for multilayer perceptrons (Rumelhart, Hinton, & Williams, 1986). However, this equation has other important roles in connectionism as well.

For instance, the logistic equation permits network responses to be translated into probability theory (McClelland, 1998). As a result, the responses of a network that has integration devices in its output layer can literally be interpreted as being conditional probabilities (Dawson & Dupuis, 2012; Dawson, Dupuis, Spetch, & Kelly, 2009).

From this perspective, training an integration device network on a keyfinding task is appealing. Imagine that this network has 24 different output units, one for each possible major and minor key in Western tonal music. The activity in each of these output units would indicate probability judgments: each activity would indicate the probability that some musical event belonged to a particular musical key.

In Chapter 5 we described a network of value units that was trained on a set of pitch-class patterns that implied particular musical keys (Handelman & Sigler, 2013). This network’s ability to judge the musical keys of 152 different Nova Scotian folk songs (Creighton, 1932) was then examined.

Now let us consider a network that deals with the keys of these folk songs in a much more direct manner – by being trained to judge the keys of a subset of these songs. After this training, we can then examine the network’s performance on the songs that it was not presented during learning.

The network to be discussed uses 24 output units to represent the possible musical keys, 8 hidden units, and 12 input units that represent pitch-classes. Each of the 152 folk songs is represented in terms of their use of the 12 possible pitch-classes as was described in detail in Section 5.7.1.

A subset of 114 of these songs – 75% of the Creighton collection – is randomly selected to be used for training purposes. The multilayer perceptron is trained on these songs for 10,000 epochs to ensure that overall error is as low as possible. The desired output for each input song is the musical key selected for it by the Krumhansl and Schmuckler keyfinding algorithm (Krumhansl, 1990). After this training, the total sum of squared error (summed over 114 patterns with 24 different outputs for each pattern) is only 5.39.

Next, the remaining 38 folk songs (the 25% of all of the songs that were randomly selected to not be part of network training) are presented to the network to determine whether its learned keyfinding abilities generalize to novel stimuli.

When all of the data for network training and generalization is obtained, network outputs are considered as probabilities. Standard methods (Duda, Hart, & Stork, 2001) are now used to convert these probabilities into a keyfinding judgment for each song. This is done by finding the output unit that has the maximum activity, and assigning that output unit’s key to the input song.

For the training set of 114 folk songs, there is a very high degree of correspondence between the judgments made by this network of integration devices and the judgments made by the Krumhansl/Schmuckler algorithm. The network generates the same judgment for 113 of these songs, or over 99% of the training set. The two only disagree on the key assignment for the “Crocodile Song”, which the network judges to be in the key of C major, while the Krumhansl algorithm judges it to be in the key of F major. The second highest activity in the network’s response for this song is found in the F major output unit, suggesting that the network’s error is not too radical!

How does the network perform on the 38 songs that were not presented to it during learning? The network agrees with the Krumhansl/Schmuckler algorithm on 32 of these songs (84% agreement). This, as well as the 99% agreement on the training set, demonstrates a much stronger agreement between the two approaches than was evident in Chapter 5.

Is there anything special about the six songs for which the network and the Krumhansl/Schmuckler algorithm do not agree? It seems that these songs may be difficult to correctly keyfind, even for the standard algorithm. This suggests that failing to agree on these particular songs may not be surprising.

To be more precise, using the Krumhansl/Schmuckler algorithm on the Nova Scotian folk songs is accomplished using the HumDrum software package (Huron, 1999). For each key assignment, this software package generates a confidence value. When this value is high, the algorithm’s ability to keyfind is clear, which means that the key selected by the Krumhansl/Schmuckler algorithm generates a high match, and no other possible keys generate matches that are nearly that high. As confidence decreases, more than one key is a possible choice, because several different keys generate similarly valued matches.

For the 32 songs that receive the same key from both the network and from the Krumhansl/Schmuckler algorithm, the average confidence is 54.34%. However, for the 6 songs for which the two disagree, the average confidence is only 14.03%. In other words, when generalizing to new songs, the network tends to disagree with the Krumhansl/Schmuckler algorithm only on songs for which this algorithm itself is not confident.

Clearly this approach to using networks of integration devices for keyfinding demonstrates a great deal of promise. This promise, in turn, suggests further research questions. How well does learning about the keys of these folk songs generalize to other musical stimuli? What is the relationship between the internal structure of this network and the mechanics of the Krumhansl/Schmuckler algorithm? How might the network’s structure (e.g. number of hidden units) be altered to improve performance? The allure of studying musical networks is that their successes lead to promising future research projects!

References