Figure I-1. Three different types of circles of pitch classes.
In the novel Us Conductors (Michaels, 2014), protagonist Lev Termen reflects that “it was not that I was
careless in my calculations; it was that I was seeking the wrong sum” (p. 161). A parallel situation arose when I first began
to train artificial neural networks on musical problems. My goal was to create a clean example of how to
interpret a trained network to include in a book on connectionist modeling (Dawson, 2004). I felt that it should be
quite straightforward to train a network on a well-defined music problem, and
then be able to pull the theory that defined the problem right out of the
network’s internal structure. I was
wrong.
The task that I investigated was chord
classification. The input units of the
network represented the keys of a small piano.
These inputs were used to present tetrachords – chords built from four
notes – to the network. The network learned
to classify each chord as belonging to one of four types: major, minor,
dominant, or diminished. Traditional
music theory defines each of these tetrachord types in terms of the presence of
particular musical intervals within the chord.
To make the problem more challenging, chords were presented in different musical keys and in different inversions.
In order to learn this problem, the network
required four hidden units. After learning
was completed, the task was to make sense of the network’s internal
structure. In particular, the question
of interest concerned what musical features were being detected by each hidden
unit.
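As a rough illustration (not the original software used in this work), the architecture just described can be sketched as a small feedforward network. The 24-key input size, the logistic activations, and the particular key indices are assumptions made only for the sake of the example.

```python
# A minimal sketch of the tetrachord classification network described above.
# Assumptions (not from the original study): a 24-key "small piano" input
# and logistic (sigmoid) activations.
import torch
import torch.nn as nn

N_KEYS = 24     # assumed number of piano-key input units
N_HIDDEN = 4    # the trained network required four hidden units
N_TYPES = 4     # major, minor, dominant, diminished

class TetrachordNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(N_KEYS, N_HIDDEN)
        self.output = nn.Linear(N_HIDDEN, N_TYPES)

    def forward(self, x):
        h = torch.sigmoid(self.hidden(x))   # hidden unit activities
        return torch.sigmoid(self.output(h))

def encode_tetrachord(keys, n_keys=N_KEYS):
    """Encode four piano-key indices as a binary input vector."""
    x = torch.zeros(n_keys)
    x[list(keys)] = 1.0
    return x

# Example input: a hypothetical four-note chord on keys 0, 4, 7, and 10.
x = encode_tetrachord([0, 4, 7, 10])
untrained = TetrachordNet()
print(untrained(x))   # four outputs, one per chord type (untrained here)
```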
The first attempt to answer this question involved examining the activity produced in each hidden unit by the various input chords. The expectation (from traditional music theory) was that each hidden unit would be sensitive to a musical interval involved in defining tetrachords of a particular type, and that the network would solve the problem by combining these different intervals. However, no evidence supported this expectation.
After a number of false starts, I simply
looked at the weights of the connections from each input unit (i.e. each ‘piano
key’) to each hidden unit. This approach
revealed two very interesting regularities.
First, there was a regular mapping between weights and note names. For instance, different input units that
represented the note A at different octaves had exactly the same connection
weight feeding into the same hidden unit.
In other words, if one views a connection weight as the network’s name for a note, then the network named notes by pitch-class rather than by pitch.
Second, more than one pitch-class was given
the same name – the same connection weight – by the network. For instance, one hidden unit assigned the
same connection weight to the pitch-classes C, D, E, F#, G#, and A#. Another hidden unit assigned the same
connection weight to the pitch-classes C, E, and G#.
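The kind of weight inspection described here can be sketched as follows. The pitch-class names, the assumption that key 0 is a C, and the example weights are all hypothetical.

```python
# A sketch of grouping a hidden unit's incoming weights by pitch class,
# to check whether piano keys an octave apart receive the same weight.
from collections import defaultdict

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def weights_by_pitch_class(weights):
    """weights: one incoming connection weight per piano key,
    with key 0 assumed to be a C. Returns {pitch class: set of weights}."""
    groups = defaultdict(set)
    for key, w in enumerate(weights):
        groups[PITCH_CLASSES[key % 12]].add(round(w, 2))
    return dict(groups)

# Hypothetical weights for a 24-key unit: each octave repeats the same pattern,
# and only two distinct values appear, so several pitch classes share a "name".
example = [0.5, -0.3] * 12
print(weights_by_pitch_class(example))
```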
In considering the various pitch-classes
that were given the same connection weights in the network, it became evident
that the network had discovered a new way of solving the chord classification
task: using what I call strange circles.
Traditional music theory employs circular
representations of pitch-classes. For
example, most music students are exposed to the circle of perfect fifths that
is illustrated on the left of Figure I-1.
In this circle one can find each of the twelve possible pitch-classes;
nearest neighbors in the circle are a perfect fifth (seven semitones)
apart. This circle is not strange;
students use it (for example) to determine how many sharps or flats are required
in a written key signature.
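As a quick illustration of how such a circle can be generated, stepping by seven semitones (mod 12) from any starting note visits all twelve pitch classes before returning home. The note names and the choice of C as a starting point are just for illustration.

```python
# Generate the circle of perfect fifths by repeatedly stepping 7 semitones.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def circle(start, interval):
    """Follow an interval (in semitones) around the twelve pitch classes."""
    visited, pc = [], start
    while pc not in visited:
        visited.append(pc)
        pc = (pc + interval) % 12
    return [PITCH_CLASSES[p] for p in visited]

print(circle(0, 7))
# ['C', 'G', 'D', 'A', 'E', 'B', 'F#', 'C#', 'G#', 'D#', 'A#', 'F']
```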
The tetrachord classification network discovered
that it needed circles of pitch-classes, but based them on musical intervals
other than the perfect fifth. For instance,
one hidden unit named pitch-classes using the two circles of major seconds that
are illustrated in the middle of Figure I-1.
In either of these circles, nearest neighbors are a major second (two
semitones) apart. If a pitch-class
belonged to one circle, it was given one connection weight. If it belonged to the other circle, it was
given a different connection weight. All pitch-classes belonging to the same circle of major seconds therefore shared the same connection weight.
Similarly, another hidden unit named
pitch-classes using the four circles of major thirds that are illustrated on
the right of Figure I-1. In each of these circles, nearest neighbors are a
major third (four semitones) apart. When
these circles were used, pitch-classes that fell into the same circle of major
thirds were given identical connection weights; pitch-classes that belonged to
different circles of major thirds were assigned different connection weights.
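The same stepping idea, sketched below, shows why these intervals yield several disjoint circles rather than one: major seconds partition the twelve pitch classes into two circles of six, and major thirds into four circles of three. The pitch-class names are again only illustrative.

```python
# Partition the twelve pitch classes into "strange circles" for a given interval.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def circles_of(interval):
    """Return the disjoint circles produced by stepping `interval` semitones."""
    remaining, circles = set(range(12)), []
    while remaining:
        pcs, pc = [], min(remaining)
        while pc not in pcs:
            pcs.append(pc)
            pc = (pc + interval) % 12
        remaining -= set(pcs)
        circles.append([PITCH_CLASSES[p] for p in pcs])
    return circles

print(circles_of(2))  # two circles of major seconds: C D E F# G# A# / C# D# F G A B
print(circles_of(4))  # four circles of major thirds: C E G# / C# F A / D F# A# / D# G B
```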
One way in which circles of major seconds
and circles of major thirds are strange circles for this network is that they
define weird equivalence classes for pitch.
Traditional Western tonal music recognizes twelve distinct
pitch-classes. However, a hidden unit
that organizes inputs using circles of major seconds only recognizes two
distinct pitch-classes (one for each circle).
A hidden unit that organizes inputs using circles of major thirds only
recognizes four distinct pitch-classes (one for each circle).
A second way in which these circles are
strange is that they are not typical topics of music theory. Students learn the circle of perfect fifths
because of its utility; students do not learn strange circles, because their
utility is much less evident. This is
not to say that these strange circles are removed from traditional theory,
because this theory defines the musical intervals which in turn define each
circle.
This leads to the third way in which these
circles are strange. How is it possible
to use circles of major seconds and circles of major thirds to represent the
differences between major, minor, dominant, and diminished tetrachords? We will postpone the answer to this question
until Chapter 7, which examines chord classification networks in detail.
For now, the key point is that even when traditional music theory defines the input/output mapping for a task, this theory is not the only means for mapping inputs into outputs. Artificial neural networks are capable of
finding alternative music theories. However,
to find such surprises one must examine the internal structure of trained
networks.
In short, supervised learning of standard musical problems might lead a researcher to expect that a network will solve the problem in a particular way.
However, networks can easily defy such expectations. It is not that a researcher’s expectations
are careless. It is just that a network
can compute different sums.