[This article first appeared in Interface: Journal of New Music Research, Vol 21, 1992, pp. 135-148.
It may not be reproduced without permission.]

On the Spectral Analysis of Melody

Nigel Nettheim


The widely-known claim of Voss and Clarke (1978) that much music is well modelled by "1/f noise" is critically examined. Some new data are provided for classical music, yielding mainly negative conclusions. In the course of the investigation some of the problems of the statistical analysis of musical data are discussed.

Spectral analysis has long been applied in the study of the timbre of a given musical tone, but it is only more recently that attention was drawn by Voss and Clarke (1978) to the possibility of applying it to several features of the note-to-note progressions in a piece of music. That paper received the endorsement of Gardner (1978) and Mandelbrot (1983), and has since influenced composers of algorithmic music, including for example Bolognesi (1983) and Dodge and Bahn (1986), as well as theorists such as Boon et al. (1990). The claims, including in particular that "the frequency fluctuations of music ... have a 1/f spectral density at frequencies down to the inverse of the length of the piece of music" (Voss and Clarke, 1978, p.258), have so far not been challenged, and have been further presented by Voss (1988). In the present paper I first examine these claims and then offer some new data from 18th-19th century music.

Review of the Work of Voss and Clarke

Let us acknowledge the appealing character of the work of Voss and Clarke (hereafter abbreviated to Voss) and its stimulation of interest in a novel view of music. The criticism which follows is intended to be entirely constructive.

Data Acquisition

Voss's musical data were obtained by electronically monitoring a radio signal over a twelve-hour period. The varying amplitude of the signal reflected changes in the loudness of the music, a variable which will however not be pursued here. The varying rate of zero-crossing of the signal was also recorded; as each group of two or more zero-crossings (the number depending on the timbre) corresponds to one cycle of sound pressure, it was assumed that this quantity "roughly follows the melody" (Voss, 1978, p.260), though no evidence was given for the reliability of this method. Considering that a melody is normally accompanied by other tones which will have their own zero-crossings, not necessarily in phase with those of the melody, and considering also that the timbre of the instrument(s) playing the melodies, or of words sung to them, will generally vary within and between pieces, it would be of interest to see experimental support for this method, by comparing its results with the printed scores. Until such evidence is provided, I believe the method should be considered suspect. The method also evidently assumes that the highest pitch at each point forms the melody, a somewhat limiting assumption as will be discussed later.
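The concern about zero-crossings can be illustrated with a toy signal. The following Python sketch is not Voss's apparatus: the sampling rate, the 440 Hz tone and the strong third-harmonic component are all hypothetical choices made only for the demonstration. It estimates frequency as half the number of sign changes per second, first for a pure tone and then for the same tone with the added component.

```python
import math

def zero_crossing_frequency(samples, sample_rate):
    """Estimate frequency in Hz as half the number of sign changes per second."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0.0) != (b < 0.0)
    )
    duration = len(samples) / sample_rate
    return crossings / (2.0 * duration)

def tone(freq_hz, sample_rate, n_samples, overtone_amp=0.0, phase=0.1):
    """Sine at freq_hz plus an optional third-harmonic component (toy timbre)."""
    out = []
    for n in range(n_samples):
        x = 2.0 * math.pi * freq_hz * n / sample_rate + phase
        out.append(math.sin(x) + overtone_amp * math.sin(3.0 * x))
    return out

SAMPLE_RATE = 8000  # hypothetical sampling rate, chosen only for the demo
pure = tone(440.0, SAMPLE_RATE, SAMPLE_RATE)                     # one second
rich = tone(440.0, SAMPLE_RATE, SAMPLE_RATE, overtone_amp=1.5)   # same pitch

est_pure = zero_crossing_frequency(pure, SAMPLE_RATE)
est_rich = zero_crossing_frequency(rich, SAMPLE_RATE)
print(round(est_pure, 1), round(est_rich, 1))
```

For the pure tone the estimate lies close to 440 Hz, whereas with the strong overtone it is roughly three times higher although the notated pitch is unchanged. A real accompaniment would of course be more complicated than this single added component.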

Length of the Data Run

Voss accumulated data over a twelve-hour stretch, thus including a variety of composers, pieces, movements, and announcers' comments. By analysing the entire sample undivided he took the periods of the spectral components up to the average length of a piece, a procedure which seems questionable. A single piece is normally the largest unit of artistic significance, excepting possibly the relatively uncommon groups of related "pieces" such as a song cycle. In most cases, once a piece is finished it is considered to have passed beneath the horizon, and a fresh one starts with the time reset to zero. Appending pieces of different lengths in the order chosen by the broadcaster gives rise to correlations whose musical significance is not clear.1

Pitch, Rhythm and Melody

We next consider by what method suitable variables may be defined from an encoded melodic sequence. Here a contradiction appears between Voss's method of analysis, on the one hand, and his method of synthesis (stochastic composition), on the other. For the melody analysed from the recorded signal (Voss, 1978, p.260) is evidently the resultant of both pitch-sequence and duration-sequence, which could be represented by a single graph in which the vertical axis is proportional to pitch and the horizontal to duration, as in my Fig. 1. In what follows, I will take this to be the meaning of "melody". But melody has been synthesized (Voss, 1978, p.262) from a separate pitch-sequence and duration-sequence. Similarly in Gardner (1978, pp.24-28), which by acknowledgement is largely the work of Voss, each of the two components is plotted as a separate graph with uniform horizontal increment. Indeed, these graphs have been given the horizontal label "time" in error for "note number", a very different quantity, as the cumulated time is formed not by uniform increments but according to the successive durations which occur in the composition.
      No evidence has so far been provided to my knowledge that pitch and duration considered separately are 1/f processes, and it seems hard to know on what basis such a result has been incorporated into the synthesized music referred to. Indeed, as successive pitches occur in general at unequal time intervals, it is not clear what meaning could be attached to a spectrum of pitches arranged uniformly on the time axis. The meaning of a spectrum of durations seems even less clear, for a duration itself involves movement along the time axis.
      It is true that the analysis and synthesis of the separated processes is a simpler statistical task than that of a joint pitch/duration process.2 But such a separation seems out of place as a model for most composers within the scope of the present paper; the possibility of it arose, if at all, only after Schoenberg in developments such as total serial music.3
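The distinction drawn above between "note number" and cumulated time can be made concrete with a small Python sketch. The duration values are a hypothetical fragment, not taken from any of the scores discussed; the point is only that the two horizontal scales diverge as soon as the durations are unequal.

```python
from itertools import accumulate

# Hypothetical fragment: durations in sixteenth-note units, one entry per note.
durations = [4, 2, 2, 8, 1, 1, 2, 4]

# "Note number": uniform increments, one step per note regardless of length.
note_numbers = list(range(len(durations)))

# Cumulated time: each note starts where the previous durations end.
onset_times = [0] + list(accumulate(durations))[:-1]

for k, t in zip(note_numbers, onset_times):
    print(k, t)
```

Plotting a sequence against `note_numbers` and labelling the axis "time" would be correct only if every entry of `durations` were equal.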

Stochastic Music

Voss (1978), Gardner (1978) and others have generated pitch/duration sequences stochastically according to assumed separate independent 1/f processes for pitch and duration and, for comparison, according to white noise and 1/f-squared processes.4 From what has been said above it is clear that the resultant processes do not have the properties of their two constituents. For example, the resultant of "white noise pitches" and "white noise durations" is by no means "white noise music".5
      In any case, listeners found the music derived from 1/f processes preferable, the white processes producing music considered too random and the 1/f-squared too highly correlated. However, in connection with the highly correlated music derived from 1/f-squared processes, one can guess that this would be undervalued when presented without the interest of appropriate harmony and metre which normally accompanies the composed music whose analysis produced the method of synthesis. A mere line of unaccompanied notes might need more variability to sustain interest.6 The purpose of the music should be considered too: a funeral march might produce higher correlation than a frenzied dance, but should not on that account be underrated as artistic music.
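For readers wishing to try such experiments, the three kinds of sequence can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the procedure actually used by Voss or Gardner: the function names, the number of levels, and the numbers of dice and sides are hypothetical, the "brown" sequence is a simple clipped random walk, and the dice scheme (re-rolling a die only when its bit of a binary step counter changes) is one common way of approximating a 1/f process.

```python
import random

def white_sequence(n, rng, levels=16):
    """Independent draws: no correlation between successive values."""
    return [rng.randrange(levels) for _ in range(n)]

def brown_sequence(n, rng, levels=16):
    """Clipped random walk (1/f-squared-like): small steps from the previous value."""
    value = levels // 2
    out = []
    for _ in range(n):
        value = min(levels - 1, max(0, value + rng.choice((-1, 0, 1))))
        out.append(value)
    return out

def dice_sequence(n, rng, n_dice=4, sides=4):
    """Approximate 1/f: die j is re-rolled only when bit j of the step counter changes."""
    dice = [rng.randrange(sides) for _ in range(n_dice)]
    out = []
    for step in range(n):
        if step > 0:
            changed = step ^ (step - 1)  # bits that differ from the previous step
            for j in range(n_dice):
                if (changed >> j) & 1:
                    dice[j] = rng.randrange(sides)
        out.append(sum(dice))
    return out

rng = random.Random(1978)  # fixed seed so that the run is repeatable
white = white_sequence(64, rng)
brown = brown_sequence(64, rng)
pink = dice_sequence(64, rng)
print(white[:8], brown[:8], pink[:8])
```

Each function yields a single sequence of numbers; whether it is then read as pitches or as durations, and how the two are combined, is exactly the question raised in the text.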

Examples from Musical Scores


In this section I set aside Voss's work and describe an investigation of the spectrum of melodies encoded from musical scores. The selections listed in Table 1 were chosen. It will be seen that I have considered single movements to have the appropriate scope for the present purpose, by contrast with Voss's twelve-hour run discussed earlier. It would of course be desirable to include more examples; however, not only is the task of data entry and checking fairly time-consuming, but also spectral analysis requires individual attention to each case and does not lend itself entirely to automation.

Table 1. List of Musical Selections.

1. Bach Prelude in D, Well-tempered Clavier I No. 5.
2. Mozart Piano Sonata in Bb, K. 570, 1st movement.
3. Beethoven Piano Sonata in D, Op. 2 No. 2, 2nd movement.
4. Schubert Piano Sonata in A, D. 664, 2nd movement.
5. Chopin Etude in Ab, Op. Posth. No. 3.
6. Gardner White music.
7. Gardner 1/f music.
8. Gardner 1/f-squared music.

Table 2. Characteristics of the Musical Selections.

Selection        Pitch                        Duration
1. Bach          Variable, repeated pattern   Smooth
2. Mozart        Variable                     Variable
3. Beethoven     Smooth                       Variable
4. Schubert      Variable                     Variable
5. Chopin        Smooth                       Smooth
6. White         Very variable                Very variable
7. 1/f           Intermediate                 Intermediate
8. 1/f-squared   Smooth                       Smooth

      Despite the perceived artificiality of the separation of pitch and duration sequences, I have analysed them as well as the resultant melodies, in order to be able to comment on Voss's claims and on the stochastic composition applications referred to. The characteristics of the pitch and duration components of the examples are summarised in Table 2 after preliminary inspection of the scores. It is seen that the selection was made so as to give a variety of combinations of types of pitch and duration processes. The Bach selection has a strongly repetitive pattern, which one will expect to confirm by observing peaks in its spectrum.

Data acquisition

My approach to data acquisition has been different from Voss's, discussed earlier: I have encoded the melodies directly from the scores of several examples. This approach, though apparently simple, conceals many problems. Melody may be clearly defined as a single sequence of pitches in cases such as liturgical chant, hymn tunes and folk-song, but by no means always in the case of Western art-music of the 18th-19th centuries, which is the scope of the present paper. Within that scope, there is at present no definition of melody allowing its satisfactory programmed extraction from an encoded score. Features militating against automatic extraction include the following:

(1) A single line played by a single instrument or voice may be formed by movement between two or more melodic or accompanimental strands: the Bach solo violin sonatas furnish clear examples, and others abound. One then has not so much "the melody" as "the intertwined melodies".
(2) Two or more contrapuntal lines may have equal claim as "the melody", as in some of the Bach two-part inventions.
(3) The melodic line may move from one voice to another, possibly with overlap, as in the second subject group of the Mozart selection below.
(4) There may be passages of figuration not properly considered as melody, as in the Chopin Etude Op 25 No 12, where only one or two notes out of the sixteen in each bar belong to "the melody".
(5) A rest might in some cases best be interpreted as if the previous note were prolonged through it (as it might indeed be by the piano's sustaining pedal), or in other cases as a "zero" pitch where the melody is therefore not comparably defined.
(6) A trilled note is represented by a single note-head on the page, and analytically it may sometimes best be regarded as a single entity; but in performance it is expanded into a number of tones.

Given the difference between artistic melody as properly understood and melody as it may be represented in a single sequence, a melodic encoding of a composition which has any of the above properties may be of questionable significance. A non-musical investigator may of course take a more detached view, but one must question what he is then measuring. I proceeded with the encoding nevertheless; of the present selections, one may find in the Mozart movement some of the problems of the extraction of the melodic line discussed above, whereas the remaining examples tend to minimize those problems.


Before proceeding with statistical analysis it is healthy to consider the underlying assumptions. The extent to which the composition of artistic music is a suitable field for such analysis is debatable. A work of art is individual and, although generally related to a tradition, to some extent establishes its own terms of reference, rather than being a replication or the output of a production line. If the notes in artistic compositions are treated as a statistical phenomenon, a conflict may arise in the analysis or synthesis of an individual work to be appreciated with some autonomy. Here we can do no more than raise this question; see Meyer (1989, pp.57-65).
      An underlying assumption of spectral analysis is the statistical stationarity of the given process. This means the constancy over time of the probability structure of the process—or informally that any one portion and any other non-overlapping portion have equivalent probabilistic properties. This assumption can hardly be made of a musical phrase, which typically follows a controlled course of development from its beginning towards a point of greater intensity and is followed by a cadence or resolution. For a similar reason a movement or piece of music can hardly be considered a stationary process.7 Still, the shape of a musical unit is partly determined by its harmonic and tonal structure, and these would presumably account in part for the non-stationary component while melody might in general depart not too far from stationarity. Further exploration of this question would be worthwhile, but for now we will act as if stationarity applied to melody.
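A very crude check of the informal description of stationarity given above is to compare simple statistics of two non-overlapping portions of a series. The Python sketch below does this for the two halves of a sequence; the example series are hypothetical, and agreement of means and variances is only a necessary condition, far from a demonstration of stationarity.

```python
from statistics import mean, pvariance

def half_statistics(series):
    """Return (mean, variance) for the first and second halves of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))

steady = [60, 62, 64, 62] * 8   # hypothetical repeating pattern
rising = list(range(48, 80))    # hypothetical steadily rising line

print(half_statistics(steady))
print(half_statistics(rising))
```

The repeating pattern gives identical statistics in both halves, while the rising line has the same variance in each half but a clearly different mean, as one would expect of a phrase that develops towards a point of greater intensity.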

Numerical processing

The first step for each selection was the encoding in a computer file of what was judged to be the melodic line, in the format of the SCORE commercial music printing program. The encoding was checked both by printing it in musical notation and by playing it on the computer's speaker.
      Second, a custom computer program derived for each selection three files as follows.
      (i) The sequence of pitches was expressed in Hertz with rests converted to a prolonging of the preceding note.8
      (ii) The corresponding sequence of durations was expressed as integer multiples of a suitable small duration such as a sixteenth-note.
      (iii) The resultant of the pitch and duration sequences was formed, each pitch number from (i) being repeated a number of times given by (ii).9
      Finally, each of the three sequences for each musical selection was used as input to a program for spectral estimation.10
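The derivation of the three sequences in steps (i) to (iii) can be sketched in Python. The sketch makes several assumptions that are not in the text: the input is a list of (note, duration) pairs rather than a SCORE file, notes are given as MIDI-style numbers with A above middle C equal to 69 and tuned to 440 Hz in equal temperament, a rest is marked by None and is merged into the duration of the preceding note, and a rest with no preceding note is simply dropped.

```python
def midi_to_hz(note):
    """Equal-tempered frequency, assuming note 69 is A = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def derive_sequences(notes):
    """notes: list of (note_number_or_None, duration_in_units); None marks a rest.

    Returns (pitches in Hz, durations in units, resultant melody).
    """
    pitches, durations = [], []
    for note, units in notes:
        if note is None:
            if durations:
                # A rest prolongs the preceding note.
                durations[-1] += units
        else:
            pitches.append(midi_to_hz(note))
            durations.append(units)
    # Resultant: each pitch repeated once per unit of its duration.
    melody = [p for p, d in zip(pitches, durations) for _ in range(d)]
    return pitches, durations, melody

# Hypothetical fragment: A (2 units), rest (2 units), B (1 unit), C (4 units).
fragment = [(69, 2), (None, 2), (71, 1), (72, 4)]
pitches, durations, melody = derive_sequences(fragment)
print(durations, len(melody))
```

The resultant `melody` list has one entry per basic time unit, so that it can be passed to a spectral routine expecting a uniform time increment.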


The original melodic data are shown in Fig. 1. Points for orientation include the rising sequence just before the middle of the Chopin selection and the beginning of the recapitulation about three-fifths of the way through the Mozart. Simple though they are, these traces give a good bird's-eye view of the selections. The trace for "1/f-squared music" resembles in general terms those for Schubert and Beethoven, and indeed their spectra will be seen, below, to be fairly similar.

Figure 1. Melodies of the Selections.

[Image: single-line graphical traces of the melodies]

      The spectral estimates for melody are shown in Fig. 2 on a double-logarithmic scale, the reason for which will become apparent when their slopes are studied; the vertical shifts, which have no numerical significance here, produce the same order of the selections as in Tables 1-2. The horizontal axis has been indicated in terms of musical note-values for the relevant periods. The maximum period for estimation is between three and four bars of music in each case, ensuring reasonable accuracy. The periods thus extend to the length of a typical musical phrase; the higher periods which Voss measured are precluded when the data run is limited to single movements.

Figure 2. Melody Spectra.

[Image: graphs of spectra vs frequency for the 8 selections, with lines of slope -1 and slope -2]

      The peaks in the Bach spectrum reflect the strong cyclical pattern in this music, the peak at the whole-note period, for example, corresponding to one-bar cycles. Some smaller spectral peaks, on the other hand, should be considered insignificant. As the spectral ordinates are estimated at uniformly spaced frequencies, greater detail appears at the higher frequencies on the logarithmic scale. Interesting periodicities involving multiples of three units, as in the Chopin example, require interpolation on the horizontal axis.11
      As mentioned earlier, the concept of the spectra of the separate pitch and duration sequences, especially the latter, would seem to be dubious. To complete the investigation of Voss's results, these spectra are nevertheless shown in Figs. 3 and 4. Comparisons can be made with the characteristics of each selection listed in Table 2. For the Bach and Chopin selections the durations are uniform, so that their pitch spectra are virtually the same as their melody spectra and their duration spectra are uniformly zero.
      The main purpose of this exercise was to study the slopes of the spectra, and lines with a slope of -1 and -2, corresponding to 1/f and 1/f-squared processes respectively, are included in Figs 2-4 for comparison.12 The comparison is not entirely easy to carry out by sight, the ideal aid being the "rolling ruler".
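Where a numerical aid is preferred to comparison by sight, the slope on the double-logarithmic scale can be estimated by ordinary least squares. The Python sketch below is one possible such aid and was not part of the analysis reported here; the function name and the test spectra are hypothetical, the latter being exact inverse-power functions used only to confirm that the expected slopes are recovered.

```python
import math

def loglog_slope(freqs, spectrum):
    """Least-squares slope of log10(spectrum) against log10(frequency)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(s) for s in spectrum]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

# Uniformly spaced frequencies, as produced by a transform of 64 points.
freqs = [k / 64.0 for k in range(1, 33)]
one_over_f = [1.0 / f for f in freqs]
one_over_f_squared = [1.0 / f ** 2 for f in freqs]

print(round(loglog_slope(freqs, one_over_f), 6))          # -1.0
print(round(loglog_slope(freqs, one_over_f_squared), 6))  # -2.0
```

Because uniformly spaced frequencies crowd together at the upper end of a logarithmic axis, an unweighted fit of this kind is dominated by the higher frequencies; with real, noisy estimates that should be borne in mind when reading the fitted slope.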
      Let us look first at the artificial selections (numbers 6-8). The slopes of their pitch and duration spectra (Figs. 3 and 4) confirm their source, allowing for the considerable deviations to be expected from short series. Their melody spectra (Fig. 2) are intermediate, with the 1/f-squared one tending towards its name-sake; in particular, the melodic spectrum of the so-called "white music" is not flat, as was known from the earlier discussion.

Figure 3. Pitch Spectra.

[Image: graphs of pitch spectra]

Figure 4. Duration Spectra.

[Image: graphs of duration spectra]

      We turn finally to the five composers. In the graphs for melody (Fig. 2), the Bach and Mozart spectra are intermediate between 1/f and 1/f-squared, the Beethoven, Schubert and Chopin close to 1/f-squared. Certainly the notion of a 1/f process is to be taken as a tendency rather than as a literal prescription, but the present results do not support it well; on the basis of these data one would not claim that melody is a 1/f process. The pitch spectra (Fig. 3) of Bach and Chopin reproduce their melody spectra, as mentioned; the Mozart graph is not clearly linear at all; while the remainder appear to be intermediate. The duration spectra (Fig. 4) of all five composers, which have been calculated without the conviction of a reasonable interpretation to be placed upon them, are rather flat, contrary to the claim that the sequence of durations is a 1/f process.


The claim of Voss and Clarke that 1/f processes well represent pitch in music has been found in these preliminary studies of classical music to have only slender support, and the claim for duration must evidently be rejected. Some apparent confusion involving the separation of melodies into pitch sequences and duration sequences has been pointed out, and it is suggested that melody is more appropriately analysed as their single-sequence resultant, particularly if spectra are to be calculated. In the present studies of melodies so defined, the spectrum has been found to tend more towards the 1/f-squared than the 1/f function, for periods up to about four bars of music. More generally, the appropriateness of spectral analysis as a tool for music analysis seems so far undemonstrated. Hopes for the success of music generated stochastically in the manner which had been advocated would appear not to be well-founded, if the music is to be consistent with 18th-19th century models. Although these conclusions are on the whole negative, it is hoped that they may clear the way for work on other characterizations having a stronger musical basis.



1 Not only piece-to-piece but also movement-to-movement relationships might be questioned, considering that even in masterpieces movements have occasionally been substituted by the composer, probably not preserving spectral properties; there is also the question of the appropriate treatment of the un-notated pause between movements.

2 A joint pitch/duration process is an example of a "point process with adjoined random variables" in the terminology of Yaglom (1986, p.34 Fig. 12). Such a process was analysed and synthesized by Spyridis and Roumeliotis (1983).

3 More explicitly, the typical subconscious process of a composer of the period under discussion is unlikely to be first to ask, "what pitch will I use next?" and then independently "what duration will it have?"; it would rather be "what musical gesture will I use next?", where a musical gesture has as its basis a pattern formed from several successive pitches and durations jointly.

4 Informal definitions are as follows. If a series of numbers is regarded as formed from component cycles of various frequencies, its spectral density, or spectrum, measures the relative contributions at those frequencies. The term "noise" generally indicates erratic behaviour, whether of sound or of some other quantity, by contrast with a "signal" which it may accompany; here it simply indicates variable behaviour. "White noise" has equal contributions from all frequencies, just as does the colour white, and has no systematic behaviour. Noise whose spectrum over a certain range of frequencies is approximately a function 1/f or 1/f-squared is so named. The two powers, -1 and -2, have prominence here both because of mathematical properties underlying these phenomena and because of their being widely observed in non-musical applications.

5 It is even possible that music formed as the resultant of white noise pitches and uniform durations sounds more "random" than with white noise durations. It may seem paradoxical that the introduction of rhythmic figures, even random ones, into a series of random pitches can produce music sounding more coherent, but this experiment can be tried by playing Gardner's or Voss's example with the durations replaced by uniform ones. It may indeed be impossible to define "random" music convincingly; see Boon et al. (1990, pp.5-6).

6 In plain-chant and folk-song the text and intended mood are important ingredients.

7 An exception might be made for some music of the Baroque period which is relatively undifferentiated during its course—music which runs along rather homogeneously until it stops, and is sometimes informally referred to as "sewing-machine" music.

8 The unit of measurement of pitch could alternatively have been taken as the note-number on a piano keyboard, which is linearly related to the logarithm of the frequency in Hertz; this was found to make little difference to the results. The conversion of rests prevents misleading wild fluctuations in the numerical sequence.
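The alternative unit mentioned in this note can be sketched in Python as follows, assuming equal temperament, a standard 88-key piano on which the A above middle C is key 49, and a tuning of that A to 440 Hz; none of these assumptions is stated in the note itself.

```python
import math

def piano_key_number(freq_hz):
    """Key number on an 88-key piano (A = 440 Hz is key 49), equal temperament."""
    return 49.0 + 12.0 * math.log2(freq_hz / 440.0)

print(piano_key_number(440.0))            # the A above middle C
print(round(piano_key_number(261.6256)))  # middle C
```

An octave, which doubles the frequency, always adds twelve to the key number, so that equal musical intervals become equal numerical steps.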

9 By this method one sacrifices, apparently unavoidably, the differentiation between repeated notes and a single note having their combined duration. The reason for the conversion to repetitions of a small basic duration is that spectral analysis is always formulated in terms of a uniform time increment.

10 The program was derived from Press et al. (1988), where an introduction to spectral methods is given. The fast Fourier transform method was preferred over the maximum entropy method, as sharp spectral peaks were not the main interest. Means were subtracted from each series. The number of points to be estimated and the bandwidth for averaging were chosen as in Table 3. (Here one would like to know similar details of Voss's spectral estimation procedure.)
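The general shape of such an estimate (subtraction of the mean, transformation of successive segments, averaging of the results) can be sketched in Python. This is not the Numerical Recipes routine: for brevity a direct discrete Fourier transform replaces the fast Fourier transform, the segments do not overlap, and no data window is applied. The function name and the test pattern are hypothetical.

```python
import cmath
import math

def averaged_periodogram(series, segment_length):
    """Mean-removed power spectrum averaged over non-overlapping segments.

    Returns power at frequencies k / segment_length (cycles per data point)
    for k = 1 .. segment_length // 2.
    """
    mean_value = sum(series) / len(series)
    centred = [x - mean_value for x in series]
    n_segments = len(centred) // segment_length
    n_freqs = segment_length // 2
    power = [0.0] * n_freqs
    for s in range(n_segments):
        seg = centred[s * segment_length:(s + 1) * segment_length]
        for k in range(1, n_freqs + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / segment_length)
                for n, x in enumerate(seg)
            )
            power[k - 1] += abs(coeff) ** 2 / segment_length
    return [p / n_segments for p in power]

# Hypothetical strongly cyclical line: a four-point pattern repeated 32 times.
series = [60, 64, 67, 64] * 32
spectrum = averaged_periodogram(series, 16)
peak_k = spectrum.index(max(spectrum)) + 1
print(peak_k, 16 / peak_k)
```

For this artificial pattern the largest ordinate falls at k = 4, that is at a period of four data points, which is the length of the repeated figure.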

Table 3. Details for the Analyses.

Selection          Minimum   Tempo   Melody            Pitch & Duration
                   Duration              N    M    K       N    M    K
1. Bach            16th        430     559   32    8     528   32    8
2. Mozart          32nd       1270    5016   64   38    1131   32   17
3. Beethoven       16th        128     960   32   15     368   16   11
4. Schubert        16th        383    1800   32   27     358   16   10
5. Chopin          12th        162     359   16   10     349   16   10
6. Gardner white   16th        480     816   16   25     128    8    8
7. Gardner 1/f     16th        480     443   16   13     122    8    7
8. Gardner 1/f-sq  16th        480     840   16   26     122    8    7

Notes to Table 3:
Minimum Duration = horizontal increment for melody.
Tempo = number of Minimum Duration notes per minute, estimated for selections 1-5 from recordings by Landowska, Schnabel, Gilels, Solomon, and Rosenthal, respectively. These tempi determined the horizontal scales for Fig. 1.
N = number of data points.
M = number of spectral ordinates estimated.
K: the number of data points used per estimated ordinate is (2K+1)M.
The approximate duration in minutes of a selection is N(melody) / Tempo.
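The last note can be applied directly to the figures in Table 3. A short Python sketch for selections 1-5, using only the values of N (melody) and Tempo given in the table:

```python
# (N for melody, tempo in minimum-duration notes per minute), from Table 3.
table3 = {
    "Bach": (559, 430),
    "Mozart": (5016, 1270),
    "Beethoven": (960, 128),
    "Schubert": (1800, 383),
    "Chopin": (359, 162),
}

# Approximate duration in minutes = N(melody) / Tempo.
minutes = {name: n / tempo for name, (n, tempo) in table3.items()}
for name, value in minutes.items():
    print(f"{name}: {value:.1f} min")
```

The selections thus run from a little over one minute (Bach) to about seven and a half minutes (Beethoven).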

11 The need for interpolation is a consequence of a peculiarity of the fast Fourier transform, which works in powers of two, and so does not provide estimates at periods of multiples of three data points. That transform, valuable as it is, thus has two left feet when faced with dance music!
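The point can be checked by listing the periods at which a transform of a power-of-two length provides estimates. The Python sketch below uses a hypothetical length of 64 data points and exact fractions so that no rounding intervenes.

```python
from fractions import Fraction

def fft_periods(n_points):
    """Periods (in data points) at which an n_points-long transform gives estimates."""
    return [Fraction(n_points, k) for k in range(1, n_points // 2 + 1)]

periods = fft_periods(64)

# Whole-number periods that are multiples of three data points.
multiples_of_three = [p for p in periods if p.denominator == 1 and p % 3 == 0]
print(multiples_of_three)  # []

# The two available periods lying either side of 3 data points.
print(float(Fraction(64, 21)), float(Fraction(64, 22)))
```

A period of exactly three data points falls between the ordinates at 64/21 and 64/22, hence the need to interpolate.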

12 If S denotes the spectrum, f the frequency, c a constant, and p a numerical power (special values of which are 1 and 2, giving slopes of -1 and -2), it follows from the inverse power relation

S / c = 1 / f^p

            that the logarithms are linearly related with slope -p:

log S = log c - p log f.


Bolognesi, T. (1983). Automatic composition: experiments with self-similar music. Computer Music Journal, 7(1), 25-36.

Boon, J-P., Noullez, A. & Mommen, C. (1990). Complex dynamics and musical structures. Interface, 19, 3-14.

Dodge, C. & Bahn, C.R. (1986, June). Musical fractals. Byte, pp.185-196.

Gardner, M. (1978, April). White and brown music, fractal curves and one-over-f fluctuations. Scientific American, pp.16-32.

Mandelbrot, B. (1983). The Fractal Geometry of Nature. New York: W.H.Freeman. (Re music: pp.374-375.)

Meyer, L.B. (1989). Style and Music: Theory, History, and Ideology. Philadelphia: University of Pennsylvania Press.

Press, W.H., Flannery, B.P., Teukolsky, S.A. & Vetterling, W.T. (1988). Numerical Recipes in C. Cambridge: Cambridge University Press. (With computer programs.)

Spyridis, H. & Roumeliotis, E. (1983). Fourier analysis and information theory on a musical composition. Acustica, 52, 255-256.

Voss, R.F. (1975). 1/f noise: diffusive systems and music. Unpublished doctoral dissertation, University of California, Berkeley.

Voss, R.F. (1988). Fractals in nature: from characterization to simulation. In Peitgen, H-O. & Saupe, D. (Eds.), The Science of Fractal Images (pp.21-70). New York: Springer-Verlag.

Voss, R.F. & Clarke, J. (1978). "1/f noise" in music: Music from 1/f noise. Journal of the Acoustical Society of America, 63(1), 258-263.

Yaglom, A.M. (1986). Correlation Theory of Stationary and Related Random Functions I. New York: Springer-Verlag.
