[This article first appeared in Interface:
Journal of New Music Research,
Vol 21, 1992, pp. 135-148.
It may not be reproduced without permission.]
On the Spectral Analysis of Melody
Nigel Nettheim
The widely known claim of Voss and Clarke (1978) that much music is well modelled by "1/f noise" is critically examined. Some new data are provided for classical music, yielding mainly negative conclusions. In the course of the investigation some of the problems of the statistical analysis of musical data are discussed.
Spectral analysis has long been applied in the study of the timbre of a given musical tone, but only relatively recently did Voss and Clarke (1978) draw attention to the possibility of applying it to several features of the note-to-note progressions in a piece of music. That paper received the endorsement of Gardner (1978) and Mandelbrot (1983), and has since influenced composers of algorithmic music, for example Bolognesi (1983) and Dodge and Bahn (1986), as well as theorists such as Boon et al. (1990). Its claims, in particular that "the frequency fluctuations of music ... have a 1/f spectral density at frequencies down to the inverse of the length of the piece of music" (Voss and Clarke, 1978, p.258), have so far not been challenged, and have been presented further by Voss (1988). In the present paper I first examine these claims and then offer some new data from 18th-19th century music.
Review of the Work of Voss and Clarke
Let us acknowledge the appealing character of the work of Voss and Clarke (hereafter abbreviated to Voss) and its stimulation of interest in a novel view of music. The criticism which follows is intended to be entirely constructive.
Data Acquisition
Voss's musical data were obtained by electronically monitoring a radio signal over a twelve-hour period. The varying amplitude of the signal reflected changes in the loudness of the music, a variable which will however not be pursued here. The varying rate of zero-crossing of the signal was also recorded; as each group of two or more zero-crossings (the number depending on the timbre) corresponds to one cycle of sound pressure, it was assumed that this quantity "roughly follows the melody" (Voss, 1978, p.260), though no evidence was given for the reliability of this method. Considering that a melody is normally accompanied by other tones which will have their own zero-crossings, not necessarily in phase with those of the melody, and considering also that the timbre of the instrument(s) playing the melodies, or of words sung to them, will generally vary within and between pieces, it would be of interest to see experimental support for this method by comparing its results with the printed scores. Until such evidence is provided, I believe the method should be considered suspect. The method also evidently assumes that the highest pitch at each point forms the melody, a somewhat limiting assumption as will be discussed later.
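The concern about accompanying tones can be made concrete with a small synthetic experiment. The Python sketch below is an illustration of our own, not Voss's apparatus: it counts sign changes in a sampled 440 Hz "melody" tone, first alone and then mixed with a louder 110 Hz accompanying tone. The frequencies, amplitudes and sample rate are arbitrary assumptions, and a pure sinusoid (two zero-crossings per cycle) stands in for a real timbre.

```python
import math

def zero_crossing_rate(samples, sample_rate):
    """Number of sign changes per second in a sampled signal."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return crossings * sample_rate / len(samples)

def tone(freqs_amps, sample_rate=44100, seconds=1.0):
    """Sum of sinusoids; freqs_amps is a list of (frequency in Hz, amplitude) pairs."""
    n = int(sample_rate * seconds)
    return [sum(a * math.sin(2 * math.pi * f * i / sample_rate) for f, a in freqs_amps)
            for i in range(n)]

SR = 44100
melody_only = tone([(440.0, 1.0)], SR)
with_bass = tone([(440.0, 1.0), (110.0, 1.5)], SR)   # hypothetical accompaniment

# A pure sinusoid crosses zero twice per cycle, so half the rate estimates its pitch.
est_alone = zero_crossing_rate(melody_only, SR) / 2   # close to 440
est_mixed = zero_crossing_rate(with_bass, SR) / 2     # well below 440: the louder bass governs most crossings
```

With the louder accompaniment the signal can only cross zero near the bass tone's own crossings, so the "pitch" read off the crossing rate says more about the accompaniment than about the melody.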
Length of the Data Run
Voss accumulated data over a twelve-hour stretch, thus including a variety of composers, pieces, movements, and announcers' comments. By analysing the entire sample undivided he took the periods of the spectral components up to the average length of a piece, a step which seems questionable. A single piece is normally the largest unit of artistic significance, excepting possibly the relatively uncommon groups of related "pieces" such as a song cycle. In most cases, once a piece is finished it is considered to have passed beneath the horizon, and a fresh one starts with the time reset to zero. Appending pieces of different lengths in the order chosen by the broadcaster gives rise to correlations whose musical significance is not clear.[1]
Pitch, Rhythm and Melody
We next consider by what method suitable variables may be defined from an encoded melodic sequence. Here a contradiction appears between Voss's method of analysis, on the one hand, and his method of synthesis (stochastic composition), on the other. For the melody analysed from the recorded signal (Voss, 1978, p.260) is evidently the resultant of both pitch-sequence and duration-sequence, which could be represented by a single graph in which the vertical axis is proportional to pitch and the horizontal to duration, as in my Fig. 1. In what follows, I will take this to be the meaning of "melody". But melody has been synthesized (Voss, 1978, p.262) from a separate pitch-sequence and duration-sequence. Similarly in Gardner (1978, pp.24-28), which by acknowledgement is largely the work of Voss, each of the two components is plotted as a separate graph with uniform horizontal increment. Indeed, these graphs have been given the horizontal label "time" in error for "note number", a very different quantity, as the cumulated time is formed not by uniform increments but according to the successive durations which occur in the composition.
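The difference between the two horizontal axes is easily made explicit. In the minimal Python sketch below (a hypothetical four-note fragment, with durations counted in sixteenth-notes), the uniform axis simply counts notes, whereas the time axis is the running total of the preceding durations.

```python
from itertools import accumulate

def onset_times(durations):
    """Cumulated time at which each note begins, the first note starting at time 0."""
    if not durations:
        return []
    return [0] + list(accumulate(durations))[:-1]

durations = [4, 2, 2, 8]                        # hypothetical, in sixteenth-notes
note_numbers = list(range(len(durations)))      # [0, 1, 2, 3]: the uniform axis
print(onset_times(durations))                   # [0, 4, 6, 8]: elapsed time
```

Plotting pitch against `note_numbers` rather than against the onset times silently stretches short notes and compresses long ones.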
To my knowledge, no evidence has so far been provided that pitch and duration considered separately are 1/f processes, and it is hard to see on what basis such a result has been incorporated into the synthesized music referred to. Indeed, as successive pitches occur in general at unequal time intervals, it is not clear what meaning could be attached to a spectrum of pitches arranged uniformly on the time axis. The meaning of a spectrum of durations seems even less clear, for a duration itself involves movement along the time axis.
It is true that the analysis and synthesis of the separated processes are simpler statistical tasks than those of a joint pitch/duration process.[2] But such a separation seems out of place as a model for most composers within the scope of the present paper; its possibility arose, if at all, only after Schoenberg, in developments such as total serial music.[3]
Voss (1978), Gardner (1978) and others have generated pitch/duration sequences stochastically according to assumed separate, independent 1/f processes for pitch and duration and, for comparison, according to white noise and 1/f-squared processes.[4] From what has been said above it is clear that the resultant processes do not have the properties of their two constituents. For example, the resultant of "white noise pitches" and "white noise durations" is by no means "white noise music".[5]
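The last point can be checked numerically. The Python sketch below is our own illustration, with arbitrary pitch and duration ranges and a fixed random seed: it draws independent pitches and independent durations (in sixteenth-notes), forms their resultant by repeating each pitch once per sixteenth-note of its duration, and compares lag-1 autocorrelations.

```python
import random

def lag1_autocorrelation(x):
    """Sample autocorrelation at lag 1, after subtracting the mean."""
    m = sum(x) / len(x)
    y = [v - m for v in x]
    return sum(a * b for a, b in zip(y, y[1:])) / sum(v * v for v in y)

rng = random.Random(1)                  # fixed seed; any seed will do
n_notes = 2000
pitches = [rng.randint(48, 84) for _ in range(n_notes)]   # independent ("white") pitches
durations = [rng.randint(1, 4) for _ in range(n_notes)]   # independent durations, in sixteenths

# Resultant: each pitch repeated once per sixteenth-note of its duration.
music = [p for p, d in zip(pitches, durations) for _ in range(d)]

r_pitches = lag1_autocorrelation(pitches)   # close to 0
r_music = lag1_autocorrelation(music)       # close to 0.6
```

With durations of one to four sixteenths (mean 2.5), about 1.5 of every 2.5 adjacent pairs of samples fall within a single note, so the resultant's lag-1 autocorrelation sits near 0.6 rather than near 0: far from white.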
In any case, listeners found the music derived from 1/f processes preferable, the white processes producing music considered too random and the 1/f-squared processes music considered too highly correlated. However, the highly correlated music derived from 1/f-squared processes would, one may guess, be undervalued when presented without the interest of the appropriate harmony and metre that normally accompany the composed music whose analysis produced the method of synthesis. A mere line of unaccompanied notes might need more variability to sustain interest.[6] The purpose of the music should be considered too: a funeral march might produce higher correlation than a frenzied dance, but should not on that account be underrated as artistic music.
In this section I set aside Voss's work and describe an investigation of the spectrum of melodies encoded from musical scores. The selections listed in Table 1 were chosen. It will be seen that I have considered single movements to have the appropriate scope for the present purpose, by contrast with Voss's twelve-hour run discussed earlier. It would of course be desirable to include more examples; however, not only is the task of data entry and checking fairly time-consuming, but also spectral analysis requires individual attention to each case and does not lend itself entirely to automation.
Table 1. Selections.
Selection | Work |
1. Bach | Prelude in D, Well-tempered Clavier I No. 5. |
2. Mozart | Piano Sonata in Bb, K. 570, 1st movement. |
3. Beethoven | Piano Sonata in D, Op. 2 no. 2, 2nd movement. |
4. Schubert | Piano Sonata in A, D. 664, 2nd movement. |
5. Chopin | Etude in Ab, Op. Posth. No. 3. |
6. Gardner | White music. |
7. Gardner | 1/f music. |
8. Gardner | 1/f-squared music. |
Selection | Pitch | Duration |
1. Bach | Variable, repeated pattern | Smooth |
2. Mozart | Variable | Variable |
3. Beethoven | Smooth | Variable |
4. Schubert | Variable | Variable |
5. Chopin | Smooth | Smooth |
6. White | Very variable | Very variable |
7. 1/f | Intermediate | Intermediate |
8. 1/f-squared | Smooth | Smooth |
My approach to data acquisition has been different from Voss's, discussed earlier: I have encoded the melodies directly from the scores of several examples. This approach, though apparently simple, conceals many problems. Melody may be clearly defined as a single sequence of pitches in cases such as liturgical chant, hymn tunes and folk-song, but by no means always in the case of Western art-music of the 18th-19th centuries, which is the scope of the present paper. Within that scope, there is at present no definition of melody allowing its satisfactory programmed extraction from an encoded score. Features militating against automatic extraction include the following:
(1) A single line played by a single instrument or voice may be formed by movement between two or more melodic or accompanimental strands: the Bach solo violin sonatas furnish clear examples, and others abound. One then has not so much "the melody" as "the intertwined melodies".
(2) Two or more contrapuntal lines may have equal claim as "the melody", as in some of the Bach two-part inventions.
(3) The melodic line may move from one voice to another, possibly with overlap, as in the second subject group of the Mozart selection studied here.
(4) There may be passages of figuration not properly considered as melody, as in the Chopin Etude Op. 25 No. 12, where only one or two notes out of the sixteen in each bar belong to "the melody".
(5) A rest might in some cases best be interpreted as if the previous note were prolonged through it (as it might indeed be by the piano's sustaining pedal), or in other cases as a "zero" pitch where the melody is therefore not comparably defined.
(6) A trilled note is represented by a single note-head on the page, and analytically it may sometimes best be regarded as a single entity; but in performance it is expanded into a number of tones.
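Several of these features defeat the simplest automatic rule, the "highest note" (often called skyline) assumption noted earlier in connection with Voss's method. A minimal Python sketch, assuming notes are given as hypothetical (onset, duration, pitch) triples in integer time steps with MIDI-style pitch numbers:

```python
def skyline(notes, total_steps):
    """Take the highest sounding pitch at each time step as "the melody".

    notes: (onset, duration, pitch) triples in integer time steps.
    Steps where nothing sounds are returned as None (compare feature 5).
    """
    line = [None] * total_steps
    for onset, duration, pitch in notes:
        for t in range(onset, min(onset + duration, total_steps)):
            if line[t] is None or pitch > line[t]:
                line[t] = pitch
    return line

# Hypothetical texture: a rising tune (64, 65, 67) in an inner voice
# beneath a held upper note (72), as when the melody lies below the top line.
notes = [(0, 1, 64), (1, 1, 65), (2, 1, 67), (0, 3, 72)]
print(skyline(notes, 4))   # [72, 72, 72, None]
```

The tune disappears entirely from the extracted line, and the final rest becomes a gap that some later decision (prolongation or "zero" pitch) must fill.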
Given the difference between artistic melody as properly understood and melody as it may be represented in a single sequence, a melodic encoding of a composition which has any of the above properties may be of questionable significance. A non-musical investigator may of course take a more detached view, but one must question what he is then measuring. I proceeded with the encoding nevertheless; of the present selections, the Mozart movement presents some of the problems of extracting the melodic line discussed above, whereas the remaining examples tend to minimize those problems.
Before proceeding with statistical analysis it is wise to consider the underlying assumptions. The extent to which the composition of artistic music is a suitable field for such analysis is debatable. A work of art is individual and, although generally related to a tradition, to some extent establishes its own terms of reference, rather than being a replication or the output of a production line. If the notes in artistic compositions are treated as a statistical phenomenon, the analysis or synthesis of an individual work may conflict with its being appreciated with some autonomy. Here we can do no more than raise this question; see Meyer (1989, pp.57-65).
An underlying assumption of spectral analysis is the statistical stationarity of the given process. This means the constancy over time of the probability structure of the process or, informally, that any one portion and any other non-overlapping portion have equivalent probabilistic properties. This assumption can hardly be made of a musical phrase, which typically follows a controlled course of development from its beginning towards a point of greater intensity and is followed by a cadence or resolution. For a similar reason a movement or piece of music can hardly be considered a stationary process.[7] Still, the shape of a musical unit is partly determined by its harmonic and tonal structure, and these would presumably account in part for the non-stationary component, while melody might in general not depart too far from stationarity. Further exploration of this question would be worthwhile, but for now we will act as if stationarity applied to melody.
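The informal definition above suggests a crude check, which is no formal test of stationarity: split a series into non-overlapping blocks and compare their means and variances. The Python sketch below uses a hypothetical phrase-shaped arch of pitch numbers; the data and the number of blocks are illustrative assumptions.

```python
def block_stats(x, n_blocks):
    """Mean and variance of each of n_blocks equal, non-overlapping blocks.

    Any remainder at the end of x that does not fill a whole block is ignored.
    """
    size = len(x) // n_blocks
    if size == 0:
        raise ValueError("more blocks than data points")
    stats = []
    for b in range(n_blocks):
        seg = x[b * size:(b + 1) * size]
        mean = sum(seg) / size
        var = sum((v - mean) ** 2 for v in seg) / size
        stats.append((mean, var))
    return stats

# A phrase-like arch (rising to a climax, then falling back) versus a
# sequence that merely oscillates about a fixed level.
arch = [60, 62, 64, 65, 67, 69, 71, 72, 71, 69, 67, 65, 64, 62, 60, 59]
oscillation = [60, 64] * 8
print(block_stats(arch, 4))          # block means rise, then fall
print(block_stats(oscillation, 4))   # every block: mean 62.0, variance 4.0
```

Block means that drift systematically, as in the arch, are the kind of departure from stationarity described above for a phrase.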
The first step for each selection was to encode in a computer file what was judged to be the melodic line, in the format of the SCORE commercial music printing program. The encoding was checked both by printing it in musical notation and by playing it on the computer's speaker.
Second, a custom computer program derived three files for each selection, as follows.
(i) The sequence of pitches was expressed in Hertz, with rests converted to a prolonging of the preceding note.[8]
(ii) The corresponding sequence of durations was expressed as integer multiples of a suitable small duration such as a sixteenth-note.
(iii) The resultant of the pitch and duration sequences was formed, each pitch number from (i) being repeated a number of times given by (ii).[9]
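Steps (i) to (iii) can be sketched in Python as follows. This is not the author's program: rests are assumed to be marked as `None`, pitches are in Hertz, and durations are assumed to be already integer multiples of the basic unit.

```python
def fill_rests(pitches):
    """Step (i): replace each rest (None) by the pitch of the preceding note."""
    filled, previous = [], None
    for p in pitches:
        if p is None:
            if previous is None:
                raise ValueError("cannot prolong a note before the first one")
            p = previous
        filled.append(p)
        previous = p
    return filled

def resultant(pitches, durations):
    """Step (iii): repeat each pitch once per basic unit of its duration."""
    if len(pitches) != len(durations):
        raise ValueError("pitch and duration sequences differ in length")
    return [p for p, d in zip(pitches, durations) for _ in range(d)]

# Hypothetical fragment: D, F-sharp, rest, A; durations in sixteenth-notes.
pitches = fill_rests([293.66, 369.99, None, 440.0])
durations = [2, 1, 1, 2]
print(resultant(pitches, durations))
# [293.66, 293.66, 369.99, 369.99, 440.0, 440.0]
```

As footnote 9 observes, the resultant cannot distinguish repeated notes from one longer note: `resultant([440.0, 440.0], [1, 1])` and `resultant([440.0], [2])` are the same list.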
Finally, each of the three sequences for each musical selection was used as input to a program for spectral estimation.[10]
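The kind of estimate described in footnote 10 (means subtracted, a fixed number of spectral points, averaging to reduce variance) can be sketched in Python. This is a simplified stand-in, not the Numerical Recipes routine the author adapted: it averages the periodograms of k non-overlapping segments of 2m points, applies no data window, and uses a direct DFT for clarity. Reading m and k as the M and K of Table 3 is an assumption, though one consistent with every row of that table satisfying N ≥ 2MK.

```python
import cmath
import math

def averaged_periodogram(x, m, k):
    """Average the periodograms of k non-overlapping segments of 2*m points.

    Returns m + 1 unnormalized power estimates at frequencies b / (2*m)
    cycles per data point, b = 0..m. No data window is applied.
    """
    seg_len = 2 * m
    if len(x) < k * seg_len:
        raise ValueError("series too short for k segments of 2*m points")
    mean = sum(x) / len(x)
    y = [v - mean for v in x]                 # subtract the mean, as in footnote 10
    power = [0.0] * (m + 1)
    for s in range(k):
        seg = y[s * seg_len:(s + 1) * seg_len]
        for b in range(m + 1):
            # Direct DFT at frequency bin b; an FFT gives the same values faster.
            c = sum(v * cmath.exp(-2j * math.pi * b * t / seg_len)
                    for t, v in enumerate(seg))
            power[b] += abs(c) ** 2
    return [p / k for p in power]

# Usage: a cycle of period 8 data points shows up in bin 4 (frequency 4/32 = 1/8).
x = [math.cos(2 * math.pi * t / 8) for t in range(64)]
p = averaged_periodogram(x, m=16, k=2)
peak = max(range(len(p)), key=p.__getitem__)
```

The estimates are left unnormalized on the assumption that only the shape of the spectrum, not its absolute level, matters when judging slopes.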
The original melodic data are shown in Fig. 1. Points for orientation include the rising sequence just before the middle of the Chopin selection and the beginning of the recapitulation three-fifths of the way through the Mozart. Simple though they are, these traces give a good bird's-eye view of the selections. The trace for "1/f-squared music" resembles in general terms those for Schubert and Beethoven, and indeed their spectra will be seen below to be fairly similar.
The claim of Voss and Clarke that 1/f processes well represent pitch in music has been found in these preliminary studies of classical music to have only slender support, and the claim for duration must evidently be rejected. Some apparent confusion involving the separation of melodies into pitch sequences and duration sequences has been pointed out, and it is suggested that melody is more appropriately analysed as their single-sequence resultant, particularly if spectra are to be calculated. In the present studies of melodies so defined, the spectrum has been found to tend more towards the 1/f-squared than the 1/f function, for periods up to about four bars of music. More generally, the appropriateness of spectral analysis as a tool for music analysis seems so far undemonstrated. Hopes for the success of music generated stochastically in the manner which had been advocated would appear not to be well-founded, if the music is to be consistent with 18th-19th century models. Although these conclusions are on the whole negative, it is hoped that they may clear the way for work on other characterizations having a stronger musical basis.
__________________________________
1 Not only piece-to-piece but also movement-to-movement relationships might be questioned, considering that even in masterpieces movements have occasionally been substituted by the composer, probably not preserving spectral properties; there is also the question of the appropriate treatment of the un-notated pause between movements.
2 A joint pitch/duration process is an example of a "point process with adjoined random variables" in the terminology of Yaglom (1986, p.34 Fig. 12). Such a process was analysed and synthesized by Spyridis and Roumeliotis (1983).
3 More explicitly, in the typical subconscious process of a composer of the period under discussion, the first question is unlikely to be "what pitch will I use next?", followed independently by "what duration will it have?"; it would rather be "what musical gesture will I use next?", where a musical gesture has as its basis a pattern formed from several successive pitches and durations jointly.
4 Informal definitions are as follows. If a series of numbers is regarded as formed from component cycles of various frequencies, its spectral density, or spectrum, measures the relative contributions at those frequencies. The term "noise" generally indicates erratic behaviour, whether of sound or of some other quantity, by contrast with a "signal" which it may accompany; here it simply indicates variable behaviour. "White noise" has equal contributions from all frequencies, just as does the colour white, and has no systematic behaviour. Noise whose spectrum over a certain range of frequencies is approximately a function 1/f or 1/f-squared is so named. The two powers, -1 and -2, have prominence here both because of mathematical properties underlying these phenomena and because of their being widely observed in non-musical applications.
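Sequences of the three kinds defined in this footnote can be generated as sketched below in Python. The 1/f generator follows the dice method attributed to Voss in Gardner (1978), in which die i is re-rolled whenever bit i of a step counter changes. The ranges, the number of dice and the clamping of the random walk are illustrative assumptions, and each generator only approximates its nominal spectrum over a limited range of frequencies (clamping, for instance, keeps the walk within a pitch range at the cost of departing from 1/f-squared behaviour at the lowest frequencies).

```python
import random

def white_sequence(n, rng, low=0, high=12):
    """Independent values: no correlation from one step to the next."""
    return [rng.randint(low, high) for _ in range(n)]

def brown_sequence(n, rng, start=6, low=0, high=12):
    """Random walk (approximately 1/f-squared), clamped to [low, high]."""
    x, out = start, []
    for _ in range(n):
        out.append(x)
        x = min(high, max(low, x + rng.choice((-1, 0, 1))))
    return out

def one_over_f_sequence(n, rng, n_dice=3, sides=6):
    """Voss's dice method: die i is re-rolled when bit i of the counter changes."""
    if n <= 0:
        return []
    dice = [rng.randint(1, sides) for _ in range(n_dice)]
    out = [sum(dice)]
    for step in range(1, n):
        changed = (step - 1) ^ step          # bits that flip between consecutive counts
        for i in range(n_dice):
            if changed & (1 << i):
                dice[i] = rng.randint(1, sides)
        out.append(sum(dice))
    return out

rng = random.Random(0)   # fixed seed, arbitrary
white = white_sequence(32, rng)
one_f = one_over_f_sequence(32, rng)
brown = brown_sequence(32, rng)
```

The lowest die changes at every step and the highest only every fourth step, so the sum mixes fast and slow fluctuations in roughly equal measure per octave, which is the intuition behind the 1/f label.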
5 It is even possible that music formed as the resultant of white noise pitches and uniform durations sounds more "random" than with white noise durations. It may seem paradoxical that the introduction of rhythmic figures, even random ones, into a series of random pitches can produce music sounding more coherent, but this experiment can be tried by playing Gardner's or Voss's example with the durations replaced by uniform ones. It may indeed be impossible to define "random" music convincingly—see Boon et al. (1990 pp.5-6).
6 In plain-chant and folk-song the text and intended mood are important ingredients.
7 An exception might be made for some music of the Baroque period which is relatively undifferentiated during its course—music which runs along rather homogeneously until it stops, and is sometimes informally referred to as "sewing-machine" music.
8 The unit of measurement of pitch could alternatively have been taken as the note-number on a piano keyboard, which is a linear function of the logarithm of the frequency in Hertz; this was found to make little difference to the results. The conversion of rests prevents misleading wild fluctuations in the numerical sequence.
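The conversion between the two units can be sketched as follows, assuming the usual numbering of an 88-key piano in which key 49 is the A at 440 Hz (the paper does not say which numbering or tuning reference it used). The key number is the logarithm of frequency, scaled and shifted by constants, so equal musical intervals become equal numerical steps.

```python
import math

A4_KEY = 49      # assumed: key 49 of an 88-key piano is A4
A4_HZ = 440.0    # assumed tuning reference

def key_to_hz(key):
    """Equal-tempered frequency of a piano key number."""
    return A4_HZ * 2 ** ((key - A4_KEY) / 12)

def hz_to_key(freq):
    """Inverse of key_to_hz: 12 keys per doubling of frequency."""
    return A4_KEY + 12 * math.log2(freq / A4_HZ)

print(key_to_hz(61))             # 880.0, one octave above A4
print(round(hz_to_key(261.63)))  # 40, middle C
```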
9 By this method one sacrifices, apparently unavoidably, the differentiation between repeated notes and a single note having their combined duration. The reason for the conversion to repetitions of a small basic duration is that spectral analysis is always formulated in terms of a uniform time increment.
10 The program was derived from Press et al. (1988), where an introduction to spectral methods is given. The fast Fourier transform method was preferred over the maximum entropy method, as sharp spectral peaks were not the main interest. Means were subtracted from each series. The number of points to be estimated and the bandwidth for averaging were chosen as in Table 3. (Here one would like to know similar details of Voss's spectral estimation procedure.)
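Table 3. Spectral estimation details (see footnote 10). N = length of series; M = number of spectral points estimated; K = bandwidth for averaging.
Selection | Minimum Duration | Tempo | Melody: N M K | Pitch & Duration: N M K |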
1. Bach | 16th | 430 | 559 32 8 | 528 32 8 |
2. Mozart | 32nd | 1270 | 5016 64 38 | 1131 32 17 |
3. Beethoven | 16th | 128 | 960 32 15 | 368 16 11 |
4. Schubert | 16th | 383 | 1800 32 27 | 358 16 10 |
5. Chopin | 12th | 162 | 359 16 10 | 349 16 10 |
6. Gardner white | 16th | 480 | 816 16 25 | 128 8 8 |
7. Gardner 1/f | 16th | 480 | 443 16 13 | 122 8 7 |
8. Gardner 1/f-sq | 16th | 480 | 840 16 26 | 122 8 7 |
11 The need for interpolation is a consequence of a peculiarity of the fast Fourier transform as used here, which works with series whose length is a power of two and so does not provide estimates at periods that are multiples of three data points. That transform, valuable as it is, thus has two left feet when faced with dance music!
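The arithmetic behind this remark can be checked directly. A discrete Fourier transform of n points gives estimates at frequencies j/n cycles per data point, that is, at periods n/j data points, for j = 1, ..., n/2. If n/j were a whole multiple 3k of three, then n = 3kj would be divisible by 3, which no power of two is. A minimal Python sketch (the function names are ours):

```python
from fractions import Fraction

def dft_periods(n):
    """Periods (in data points) sampled by an n-point DFT, for j = 1..n/2."""
    return [Fraction(n, j) for j in range(1, n // 2 + 1)]

def has_period_multiple_of_three(n):
    """True if some sampled period is a whole multiple of three data points."""
    return any(p.denominator == 1 and p.numerator % 3 == 0 for p in dft_periods(n))

for n in (32, 48, 64):
    print(n, has_period_multiple_of_three(n))
# prints: 32 False / 48 True / 64 False
```

Exact fractions are used so that a period such as 48/16 = 3 is recognized without floating-point rounding.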
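12 If S denotes the spectrum, f the frequency, c a constant, and p a numerical power (special values of which are -1 and -2), it follows from the inverse power relation $S = c f^{p}$ that $\log S = \log c + p \log f$, so that on logarithmic axes the spectrum plots as a straight line of slope p.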
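Under the relation S = c f^p, the power p can be estimated as the slope of a straight line fitted to the points (log f, log S). A minimal Python sketch, assuming ordinary least squares on natural logarithms and strictly positive frequencies and spectrum values (the paper does not say how its slopes were assessed):

```python
import math

def fit_power_law(freqs, spectrum):
    """Least-squares fit of log S = log c + p log f; returns (c, p).

    Assumes all frequencies and spectrum values are strictly positive,
    so the zero-frequency estimate must be left out.
    """
    xs = [math.log(f) for f in freqs]
    ys = [math.log(s) for s in spectrum]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx
    c = math.exp(mean_y - p * mean_x)
    return c, p

# Synthetic check: an exact 1/f-squared spectrum should give p = -2.
freqs = [j / 64 for j in range(1, 33)]
c, p = fit_power_law(freqs, [5.0 * f ** -2 for f in freqs])
```

A fitted slope near -1 would support the 1/f description, and one near -2 the 1/f-squared.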
Bolognesi, T. (1983). Automatic composition: experiments with self-similar music. Computer Music Journal, 7(1), 25-36.
Boon, J-P., Noullez, A. & Mommen, C. (1990). Complex dynamics and musical structures. Interface, 19, 3-14.
Dodge, C. & Bahn, C.R. (1986, June). Musical fractals. Byte, pp.185-196.
Gardner, M. (1978, April). White and brown music, fractal curves and one-over-f fluctuations. Scientific American, pp.16-32.
Mandelbrot, B. (1983). The Fractal Geometry of Nature. New York: W.H. Freeman. (Re music: pp.374-375.)
Meyer, L.B. (1989). Style and Music: Theory, History, and Ideology. Philadelphia: University of Pennsylvania Press.
Press, W.H., Flannery, B.P., Teukolsky, S.A. & Vetterling, W.T. (1988). Numerical Recipes in C. Cambridge: Cambridge University Press. (With computer programs.)
Spyridis, H. & Roumeliotis, E. (1983). Fourier analysis and information theory on a musical composition. Acustica, 52, 255-256.
Voss, R.F. (1975). 1/f noise: diffusive systems and music. Unpublished doctoral dissertation, University of California, Berkeley.
Voss, R.F. (1988). Fractals in nature: from characterization to simulation. In Peitgen, H-O. & Saupe, D. (Eds.), The Science of Fractal Images (pp.21-70). New York: Springer-Verlag.
Voss, R.F. & Clarke, J. (1978). "1/f noise" in music: Music from 1/f noise. Journal of the Acoustical Society of America, 63(1), 258-263.
Yaglom, A.M. (1986). Correlation Theory of Stationary and Related Random Functions I. New York: Springer-Verlag.