In the first half of the 20th century, the notion of an artificial neural network composed of many different layers of processors was born (McCulloch & Pitts, 1943). These networks were computationally powerful in principle, but they had to be hand-wired because a learning rule capable of training them had not yet been invented.
The first learning rule for artificial
neural networks was discovered around the time of the cognitive revolution (Rosenblatt, 1958, 1962). However, this rule could
not train networks that contained hidden units.
As a result, it could only train perceptrons, networks
limited to solving linearly separable problems (Minsky & Papert, 1969).
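To make this limitation concrete, here is a minimal Python sketch of Rosenblatt's rule; the variable names and tasks are illustrative, not drawn from the sources above. A perceptron with a single layer of weights readily learns a linearly separable mapping such as logical OR, but no single layer of weights can capture XOR, so training never succeeds.

```python
import numpy as np

# A minimal sketch of Rosenblatt's perceptron learning rule.
def train_perceptron(X, y, epochs=100, lr=0.1):
    w = np.zeros(X.shape[1])  # one layer of weights: no hidden units
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            out = 1.0 if (w @ x + b) > 0 else 0.0
            w += lr * (t - out) * x  # nudge weights toward the target
            b += lr * (t - out)
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(float)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_or = np.array([0.0, 1.0, 1.0, 1.0])   # linearly separable
y_xor = np.array([0.0, 1.0, 1.0, 0.0])  # not linearly separable

w, b = train_perceptron(X, y_or)
assert (predict(X, w, b) == y_or).all()       # OR is learned
w, b = train_perceptron(X, y_xor)
assert not (predict(X, w, b) == y_xor).all()  # XOR never is
```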
The rise of modern connectionism began with
the discovery of supervised learning rules for networks with hidden units (Ackley, Hinton, & Sejnowski, 1985; Amari, 1967; Anderson, 1995; Rumelhart, Hinton, & Williams,
1986; Werbos, 1994). Researchers could now train networks that, in principle, had
enormous computational power.
Networks like the multilayer perceptron became the staple of
connectionist cognitive science.
In the early decades of the 21st
century, some researchers expressed concern about the limitations of the
supervised training of multilayer perceptrons.
While such networks can learn to perform a variety of complicated tasks,
researchers often encounter practical problems in their use. Some have pointed out that the remarkable
power of the human brain arises from its use of a great many layers of
hidden neurons (Bengio, 2009). However, when 20th
century supervised learning rules are used, networks of many layers are
enormously difficult to train, because the error signal that guides learning
weakens as it is propagated backward through each successive layer. The old approaches
to network training thus face practical obstacles that prevent the in-principle
power of multilayer networks from being exploited.
Modern researchers have discovered new
types of learning rules that permit networks with many layers of hidden units
to be trained (Bengio, Courville, & Vincent, 2013; Hinton, 2007; Hinton, Osindero,
& Teh, 2006; Hinton &
Salakhutdinov, 2006; Larochelle, Mandel,
Pascanu, & Bengio, 2012). These new rules, often called deep learning, allow researchers to
train deep belief networks to
accomplish tasks far beyond the capabilities of shallow, late 20th
century networks. Deep learning has produced
networks for natural language understanding, image classification,
and the processing of sound (Hinton, 2007; Hinton et al., 2006; Mohamed, Dahl, & Hinton, 2012; Sarikaya, Hinton, & Deoras, 2014). Daily news reports reveal that deep learning
applications are being employed by companies such as Google, Facebook,
and PayPal, and deep learning rules are now widely available to researchers (Fischer & Igel, 2014; Testolin, Stoianov, De Grazia, &
Zorzi, 2013).
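To give a flavor of these rules, here is a minimal sketch of one weight update of contrastive divergence (CD-1), the procedure at the heart of training the restricted Boltzmann machines that are stacked to form a deep belief network (Hinton et al., 2006; Fischer & Igel, 2014). This is a bare-bones illustration under simplifying assumptions, not the published algorithm in full; the variable names, sizes, and learning rate are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, a, b, lr=0.05):
    """One CD-1 update (in place) for a binary restricted Boltzmann machine.

    v0: batch of binary visible vectors (batch x n_visible)
    W:  weights (n_visible x n_hidden); a, b: visible and hidden biases
    """
    # Upward pass: hidden probabilities and samples given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # One step of reconstruction: sample visibles, then recompute hiddens.
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # Approximate gradient: data correlations minus reconstruction correlations.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)

# Toy usage: random binary "data" with 6 visible and 4 hidden units.
v = (rng.random((10, 6)) < 0.5).astype(float)
W = 0.01 * rng.normal(size=(6, 4))
a, b = np.zeros(6), np.zeros(4)
for _ in range(100):
    cd1_update(v, W, a, b)
```

A deep belief network is then grown greedily: one such machine is trained on the data, its hidden activations become the "data" for a second machine, and so on, layer by layer. This layer-at-a-time strategy is what sidesteps the difficulty of propagating an error signal through many layers at once.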
The networks studied in the current book
are clearly antiquated in comparison to modern deep belief networks. What, then, is the point of using older, less
powerful networks to investigate music?
The primary motivation for exploring music
with older architectures is the frequent disconnect between the technology of
neural networks and the cognitive science of neural networks (Dawson & Shamanski, 1994). The development of
artificial neural networks occurs in many different disciplines, and these
disciplines often have different goals. For instance, deep learning is emerging from
computer science, and current research on it focuses on developing procedures
for training deep networks efficiently (Bengio, 2009). In other words, deep
learning is being developed from a technological perspective; its developers
are concerned with successfully training networks to perform extremely complex
pattern classification tasks.
The cognitive science of deep learning is
lagging far behind its technology. Some
researchers have expressed concern that while deep learning produces networks
that solve problems worthy of human neural processing, these networks do not
themselves provide any insight into the workings of the human brain or the
human mind.
One reason for this is that most deep
learning advances are currently quantitative, not qualitative (Erhan, Courville, & Bengio, 2010). Techniques for interpreting the internal
structure of deep belief networks are in their infancy. If a network cannot be interpreted, then it
likely cannot contribute to cognitive science (McCloskey, 1991). Without interpretation,
deep belief networks are magnificent artifacts, but are neither cognitive nor
biological theories.
Of course, this is not to say that researchers
are not interested in interpreting the internal structure of deep belief
networks (Erhan et al., 2010; Hinton et al., 2006). For instance, in the very
first publication describing a method for deep learning, Hinton et al. (2006)
look into a network’s “mind” by observing the responses of network processors to
various stimuli, in the hope of discovering the abstract features detected
by the hidden layers. However, few
sophisticated techniques for interpreting deep networks exist. Erhan et al. (2010) observe that
researchers typically do no more than visually examine the receptive fields (i.e., the connection
weights) that feed into processors in the first hidden layer of a deep belief
network.
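The kind of inspection Erhan et al. (2010) describe is easy to sketch. Each column of the first weight matrix is the receptive field of one first-layer hidden unit; for image inputs, reshaping a column back to the image dimensions shows what that unit responds to. The code below is a hypothetical illustration (the weight matrix here is random; in practice it would come from a trained network), and it also suggests why the trick stops at the first layer: deeper units connect to hidden units rather than to pixels, so their weights have no direct visual interpretation.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_receptive_fields(W, img_shape, rows=4, cols=4):
    """Display each hidden unit's incoming weights as an image.

    W: (n_pixels x n_hidden) first-layer weight matrix; column j is the
    receptive field of hidden unit j, reshaped to img_shape.
    """
    fig, axes = plt.subplots(rows, cols, figsize=(6, 6))
    for j, ax in enumerate(axes.flat):
        ax.imshow(W[:, j].reshape(img_shape), cmap="gray")
        ax.set_axis_off()
    plt.show()

# Hypothetical stand-in: random weights for 8x8 "images" and 16 hidden units.
# A trained network's matrix would reveal learned features (edges, blobs, etc.).
W = np.random.default_rng(0).normal(size=(64, 16))
plot_receptive_fields(W, (8, 8))
```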
One reason to explore older architectures
in the current book is that many more procedures exist for
interpreting their internal structure.
This in turn makes them more likely to contribute to a cognitive
science of music.
A second reason to focus on older artificial
neural network architectures is the goal of seeking the simplest network that
is required to solve a particular task.
For example, in the next chapter we will see that no hidden units are
required at all to identify the tonic of a scale. If such a simple network can accomplish this
task, then why study that task with a deep belief network? Indeed, though very old architectures like
the perceptron are extraordinarily
simple, they can easily be used to contribute to a variety of topics in modern
cognitive science (Dawson, 2008; Dawson & Dupuis,
2012; Dawson, Dupuis, Spetch, & Kelly,
2009; Dawson, Kelly, Spetch, & Dupuis,
2010).
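As a foretaste, consider the following toy sketch. The input encoding here is a guess made for illustration only, not the representation used in the next chapter: each of the 12 major scales is coded as a 12-element pitch-class vector, and 12 output units (one per candidate tonic) are trained with the perceptron rule. No hidden units are needed, because each scale overlaps its own tonic's template more than it overlaps any other transposition, so the problem is linearly separable.

```python
import numpy as np

# Toy illustration only: the encoding is assumed, not the book's.
# C major as a pitch-class vector over (C, C#, D, ..., B).
MAJOR_C = np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1], dtype=float)
X = np.stack([np.roll(MAJOR_C, k) for k in range(12)])  # all 12 major scales
Y = np.eye(12)  # target: turn on the output unit naming the tonic

W = np.zeros((12, 12))  # direct input-to-output weights: no hidden units
b = np.zeros(12)
for _ in range(500):
    for x, t in zip(X, Y):
        out = ((x @ W + b) > 0).astype(float)
        W += 0.1 * np.outer(x, t - out)  # perceptron rule, per output unit
        b += 0.1 * (t - out)

# Every scale's tonic is identified by this one-layer network.
assert (((X @ W + b) > 0).astype(float) == Y).all()
```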
Of course, the proof of the pudding is in
the eating. Thus, to defend the
claim that older network architectures can contribute to the study of musical cognition, we
must actually demonstrate their utility.
The goal of the remaining chapters in this book is to do exactly that. Can we show that training shallow networks
can provide a deeper understanding of music?
- Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147-169.
- Amari, S. (1967). A theory of adaptive pattern classifiers. IEEE Transactions on Electronic Computers, EC-16(3), 299-307.
- Anderson, J. A. (1995). An Introduction to Neural Networks. Cambridge, Mass.: MIT Press.
- Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1), 1-127.
- Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828.
- Dawson, M. R. W. (2008). Connectionism and classical conditioning. Comparative Cognition and Behavior Reviews, 3 (Monograph), 1-115.
- Dawson, M. R. W., & Dupuis, B. (2012). Equilibria of perceptrons for simple contingency problems. IEEE Transactions on Neural Networks and Learning Systems, 23(8), 1340-1344.
- Dawson, M. R. W., Dupuis, B., Spetch, M. L., & Kelly, D. M. (2009). Simple artificial networks that match probability and exploit and explore when confronting a multiarmed bandit. IEEE Transactions on Neural Networks, 20(8), 1368-1371.
- Dawson, M. R. W., Kelly, D. M., Spetch, M. L., & Dupuis, B. (2010). Using perceptrons to explore the reorientation task. Cognition, 114(2), 207-226.
- Dawson, M. R. W., & Shamanski, K. S. (1994). Connectionism, confusion and cognitive science. Journal of Intelligent Systems, 4, 215-262.
- Erhan, D., Courville, A., & Bengio, Y. (2010). Understanding representations learned in deep architectures (Technical Report 1355). Département d'Informatique et de Recherche Opérationnelle, Université de Montréal.
- Fischer, A., & Igel, C. (2014). Training restricted Boltzmann machines: An introduction. Pattern Recognition, 47(1), 25-39.
- Hinton, G. E. (2007). Learning multiple layers of representation. Trends in Cognitive Sciences, 11(10), 428-434.
- Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527-1554.
- Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.
- Larochelle, H., Mandel, M., Pascanu, R., & Bengio, Y. (2012). Learning algorithms for the classification restricted Boltzmann machine. Journal of Machine Learning Research, 13, 643-669.
- McCloskey, M. (1991). Networks and theories: The place of connectionism in cognitive science. Psychological Science, 2, 387-395.
- McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.
- Minsky, M. L., & Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry (1st ed.). Cambridge, Mass.: MIT Press.
- Mohamed, A., Dahl, G. E., & Hinton, G. E. (2012). Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 14-22.
- Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386-408.
- Rosenblatt, F. (1962). Principles of Neurodynamics. Washington, DC: Spartan Books.
- Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536.
- Sarikaya, R., Hinton, G. E., & Deoras, A. (2014). Application of deep belief networks for natural language understanding. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(4), 778-784.
- Testolin, A., Stoianov, I., De Grazia, M., & Zorzi, M. (2013). Deep unsupervised learning on a desktop PC: a primer for cognitive scientists. Frontiers in Psychology, 4.
- Werbos, P. J. (1994). The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting. New York: Wiley.