Cognition and Reality: Shallow Networks for Deeper Understanding?

As described in this previous post, the text below is a draft of one of several "interludes" to be included in a book that I am working on concerned with music and artificial neural networks.

In the first half of the 20th century, the notion of an artificial neural network composed of many different layers of processors was born (McCulloch & Pitts, 1943). These networks were very powerful, but had to be hand-wired because a learning rule capable of training them had not yet been invented.

The first learning rule for artificial neural networks was discovered around the time of the cognitive revolution (Rosenblatt, 1958, 1962). However, this rule could not train networks that contained hidden units. As a result, it could train only perceptrons, which are networks of limited capability (Minsky & Papert, 1969).

The rise of modern connectionism began with the discovery of supervised learning rules for networks with hidden units (Ackley, Hinton, & Sejnowski, 1985; Amari, 1967; Anderson, 1995; Rumelhart, Hinton, & Williams, 1986; Werbos, 1994). Researchers could now teach networks that had enormous computational power (in principle). Networks like the multilayer perceptron became the staple of connectionist cognitive science.

In the early decades of the 21st century, some researchers expressed concern with the limitations of the supervised training of multilayer perceptrons. While such networks can learn to perform a variety of complicated tasks, researchers often encounter practical problems in their use. Some have pointed out that the incredible power of the human brain arises from its use of many, many different layers of hidden neurons (Bengio, 2009). However, when 20th century supervised learning rules are used, networks of many layers are enormously difficult to train. The old approaches to network training face practical obstacles that prevent the in-principle power of multilayer networks from being exploited.

Modern researchers have discovered new types of learning rules that permit networks with many layers of hidden units to be trained (Bengio, Courville, & Vincent, 2013; Hinton, 2007; Hinton, Osindero, & Teh, 2006; Hinton & Salakhutdinov, 2006; Larochelle, Mandel, Pascanu, & Bengio, 2012). These new rules, often called deep learning, now permit researchers to train deep belief networks to accomplish tasks far beyond the capabilities of shallow, late 20th century networks. Deep learning has produced networks for classification tasks involving natural language, images, and the processing of sound (Hinton, 2007; Hinton et al., 2006; Mohamed, Dahl, & Hinton, 2012; Sarikaya, Hinton, & Deoras, 2014). Daily news reports reveal that deep learning applications are being employed by companies such as Google, Facebook and PayPal; deep learning rules are also widely available (Fischer & Igel, 2014; Testolin, Stoianov, De Grazia, & Zorzi, 2013).

The networks studied in the current book are clearly antiquated in comparison to modern deep belief networks. What is the point of using older, less powerful, networks to investigate music?

The primary motivation for exploring music with older architectures is the frequent disconnect between the technology of neural networks and the cognitive science of neural networks (Dawson & Shamanski, 1994). The development of artificial neural networks occurs in many different disciplines, and these different disciplines often have different goals. For instance, deep learning is emerging from computer science, and current research on it focuses on developing new procedures for accomplishing deep learning efficiently (Bengio, 2009). In other words, deep learning is being developed from a technological perspective; its developers are concerned with successfully training networks to perform extremely complex pattern classification tasks.

The cognitive science of deep learning is lagging far behind its technology. Some researchers have expressed concern that while deep learning produces networks that solve problems worthy of human neural processing, these networks are not themselves providing any insight about the workings of the human brain or the human mind.

One reason for this is that most deep learning advances are currently quantitative, not qualitative (Erhan, Courville, & Bengio, 2010). Techniques for interpreting the internal structure of deep belief networks are in their infancy. If a network cannot be interpreted, then it likely cannot contribute to cognitive science (McCloskey, 1991). Without interpretation, deep belief networks are magnificent artifacts, but are neither cognitive nor biological theories.

Of course, this is not to say that researchers are not interested in interpreting the internal structure of deep belief networks (Erhan et al., 2010; Hinton et al., 2006). For instance, in the very first publication describing a method for deep learning, Hinton et al. (2006) look into a network’s “mind” by observing responses of network processors to various stimuli in the hope of discovering the abstract features that are detected by hidden layers. However, few sophisticated techniques for interpreting deep networks exist. Erhan et al. (2010) observe that researchers typically only visually examine the receptive fields (i.e., the connection weights) that feed into processors in the first hidden layer of a deep belief network.

One reason to explore older architectures in the current book is that many more procedures exist for interpreting their internal structure. This in turn makes them more likely to contribute to a cognitive science of music.

A second reason to focus on older artificial neural network architectures is the goal of seeking the simplest network that is required to solve a particular task. For example, in the next chapter we will see that no hidden units are required at all to identify the tonic of a scale. If such a simple network can accomplish this task, then why would we study the task with a deep belief network? Indeed, though very old architectures like the perceptron are extraordinarily simple, they can easily be used to contribute to a variety of topics in modern cognitive science (Dawson, 2008; Dawson & Dupuis, 2012; Dawson, Dupuis, Spetch, & Kelly, 2009; Dawson, Kelly, Spetch, & Dupuis, 2010).
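To make the claim about hidden units concrete, here is a minimal sketch of a network with no hidden units learning to name the tonic of a scale. It is only an illustration and not the network reported in the next chapter. Everything specific in it is an assumption: the restriction to the twelve major scales, the encoding of each scale as twelve binary pitch-class inputs, the twelve output units (one per candidate tonic), the step activation function, and the function names `major_scale`, `predict`, and `train_perceptron`.

```python
# Illustrative sketch only. The encoding, the restriction to major scales,
# and all function names are assumptions, not the book's actual networks.

MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets of a major scale


def major_scale(tonic):
    """Return a 12-element binary pitch-class vector for a major scale."""
    pitch_classes = {(tonic + step) % 12 for step in MAJOR_STEPS}
    return [1 if pc in pitch_classes else 0 for pc in range(12)]


def predict(weights, biases, pattern):
    """Outputs of a network with no hidden units and step activations."""
    return [
        1 if sum(w * x for w, x in zip(row, pattern)) + b > 0 else 0
        for row, b in zip(weights, biases)
    ]


def train_perceptron(patterns, targets, rate=1.0, max_epochs=20000):
    """Train with the perceptron rule; return (weights, biases, epochs)."""
    n_out, n_in = len(targets[0]), len(patterns[0])
    weights = [[0.0] * n_in for _ in range(n_out)]
    biases = [0.0] * n_out
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for pattern, target in zip(patterns, targets):
            output = predict(weights, biases, pattern)
            for j in range(n_out):
                delta = target[j] - output[j]
                if delta != 0:
                    # Change weights only for output units that erred.
                    errors += 1
                    biases[j] += rate * delta
                    for i in range(n_in):
                        weights[j][i] += rate * delta * pattern[i]
        if errors == 0:
            return weights, biases, epoch
    raise RuntimeError("perceptron did not converge")


# Twelve training patterns: one major scale per tonic, one-hot targets.
patterns = [major_scale(t) for t in range(12)]
targets = [[1 if j == t else 0 for j in range(12)] for t in range(12)]
weights, biases, epochs = train_perceptron(patterns, targets)
print("epochs needed:", epochs)
```

Under these assumptions the task is linearly separable, which is why a perceptron suffices: two different major scales share at most six of their seven pitch classes, so an output unit whose weights simply mark the notes of its own scale sees a net input of 7 for that scale and at most 6 for any other.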

Of course, the proof of the pudding is in the eating. Thus, in order to defend the claim that older network architectures can contribute to musical cognition, we must actually demonstrate their utility. The goal of the remaining chapters in this book is to do exactly that. Can we show that training shallow networks can provide a deeper understanding of music?

Cited Literature:

Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147-169.
Amari, S. (1967). A theory of adaptive pattern classifiers. IEEE Transactions on Electronic Computers, EC-16(3), 299-307.
Anderson, J. A. (1995). An Introduction to Neural Networks. Cambridge, Mass.: MIT Press.
Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1), 1-127.
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828.
Dawson, M. R. W. (2008). Connectionism and classical conditioning. Comparative Cognition and Behavior Reviews, 3 (Monograph), 1-115.
Dawson, M. R. W., & Dupuis, B. (2012). Equilibria of perceptrons for simple contingency problems. IEEE Transactions On Neural Networks And Learning Systems, 23(8), 1340-1344.
Dawson, M. R. W., Dupuis, B., Spetch, M. L., & Kelly, D. M. (2009). Simple artificial networks that match probability and exploit and explore when confronting a multiarmed bandit. IEEE Transactions on Neural Networks, 20(8), 1368-1371.
Dawson, M. R. W., Kelly, D. M., Spetch, M. L., & Dupuis, B. (2010). Using perceptrons to explore the reorientation task. Cognition, 114(2), 207-226.
Dawson, M. R. W., & Shamanski, K. S. (1994). Connectionism, confusion and cognitive science. Journal of Intelligent Systems, 4, 215-262.
Erhan, D., Courville, A., & Bengio, Y. (2010). Understanding Representations Learned in Deep Architectures. Technical Report 1355: Departement d’Informatique et Recherche Operationnelle, Universite de Montreal.
Fischer, A., & Igel, C. (2014). Training restricted Boltzmann machines: An introduction. Pattern Recognition, 47(1), 25-39.
Hinton, G. E. (2007). Learning multiple layers of representation. Trends in Cognitive Sciences, 11(10), 428-434.
Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527-1554.
Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.
Larochelle, H., Mandel, M., Pascanu, R., & Bengio, Y. (2012). Learning algorithms for the classification restricted Boltzmann machine. Journal of Machine Learning Research, 13, 643-669.
McCloskey, M. (1991). Networks and theories: The place of connectionism in cognitive science. Psychological Science, 2, 387-395.
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.
Minsky, M. L., & Papert, S. (1969). Perceptrons: An Introduction To Computational Geometry (1st ed.). Cambridge, Mass.: MIT Press.
Mohamed, A., Dahl, G. E., & Hinton, G. E. (2012). Acoustic modeling using deep belief networks. IEEE Transactions on Audio Speech and Language Processing, 20(1), 14-22.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386-408.
Rosenblatt, F. (1962). Principles Of Neurodynamics. Washington: Spartan Books.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536.
Sarikaya, R., Hinton, G. E., & Deoras, A. (2014). Application of deep belief networks for natural language understanding. IEEE/ACM Transactions on Audio Speech and Language Processing, 22(4), 778-784.
Testolin, A., Stoianov, I., De Grazia, M., & Zorzi, M. (2013). Deep unsupervised learning on a desktop PC: a primer for cognitive scientists. Frontiers in Psychology, 4.
Werbos, P. J. (1994). The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting. New York: Wiley.

Sunday, March 22, 2015
