Speech Timing

Much of my research has been concerned with the control and coordination of the organs of speech, viewing speech as a complex kind of action (and for the most part blithely ignoring its linguistic content). Two experimental paradigms have been developed, called Speech Cycling and Synchronous Speech.


[Speech Cycling]     [Synchronous Speech]


Speech Cycling

Speech Cycling is a method for uncovering large-scale rhythmic influences on timing in speech, and was developed by analogy with similar work on limb coordination. Scott Kelso and co-workers spent much energy observing the constraints that limit the stability and form of the patterns which arise when two effectors (limbs, fingers) are wagged cyclically. It is easy to demonstrate to oneself that there are two, and only two, stable forms of organization when two hands are wagged at the same frequency. In one, the hands are at the same point in their respective cycles at all times (the in-phase pattern), while in the other, one hand is starting its cycle just as the other is half-way through its cycle (the anti-phase pattern). This model system has many interesting properties which cannot be pursued here.
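The in-phase and anti-phase patterns can be made concrete by expressing the start of each cycle of one hand as a fraction of the other hand's cycle. The following is only a minimal sketch: the function name and the onset times are invented for illustration and do not come from the limb coordination studies themselves.

```python
def relative_phase(onsets_a, onsets_b):
    """Phase of each cycle onset of effector B within the enclosing cycle of A.

    Values lie in [0, 1): 0 means B starts its cycle together with A
    (in-phase), 0.5 means B starts half-way through A's cycle (anti-phase).
    """
    phases = []
    # Each pair of successive onsets of A delimits one cycle of A.
    for start, end in zip(onsets_a, onsets_a[1:]):
        for t in onsets_b:
            if start <= t < end:
                phases.append((t - start) / (end - start))
    return phases


# Hypothetical cycle-onset times in seconds, both hands wagged at 2 Hz.
left = [0.0, 0.5, 1.0, 1.5, 2.0]
right_in_phase = [0.0, 0.5, 1.0, 1.5]
right_anti_phase = [0.25, 0.75, 1.25, 1.75]

print(relative_phase(left, right_in_phase))    # [0.0, 0.0, 0.0, 0.0]
print(relative_phase(left, right_anti_phase))  # [0.5, 0.5, 0.5, 0.5]
```

With real movement data the values would scatter around 0 or 0.5 rather than hit them exactly; the stability of a pattern would then show up as the spread of these phases.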

The Speech Cycling task is designed to see if similar constraints apply to the rhythmic production of speech. Subjects repeat a short phrase, such as “big for a duck”, in time with a regular series of metronome beeps. We then look for regularities in the speech timing as a function of the inter-beep interval (we call this the Phrase Repetition Cycle), rate, etc.

Targeted Speech Cycling is a development of this basic idea, in which the metronome series consists of alternating high and low beeps. Subjects have to try to align the beginning of the phrase with the high beep, and the onset of the second stress (“duck”) with the low beep. By varying the relative timing of the high and low beeps, we can see whether subjects are constrained in the form of speech timing they can produce. For English speakers, three (and only three) patterns are commonly found, each of which corresponds to a ‘simple’ rhythmic pattern in which the stress foot (interval between the onsets of stressed syllables) is neatly nested an integral number of times within the Phrase Repetition Cycle.
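The measurement behind this kind of analysis can be sketched as the phase of the second stress onset within the Phrase Repetition Cycle. The code below is an illustration under stated assumptions, not the analysis actually used: the function names and onset times are hypothetical, and the candidate phases 1/3, 1/2 and 2/3 are assumed here as examples of 'simple' nestings rather than taken from the text above.

```python
from fractions import Fraction

# Assumed candidate phases for illustration only.
SIMPLE_PHASES = (Fraction(1, 3), Fraction(1, 2), Fraction(2, 3))


def stress_phase(phrase_onset, stress_onset, next_phrase_onset):
    """Phase of the second stress onset within one Phrase Repetition Cycle."""
    cycle = next_phrase_onset - phrase_onset
    if cycle <= 0:
        raise ValueError("next_phrase_onset must follow phrase_onset")
    return (stress_onset - phrase_onset) / cycle


def nearest_simple_phase(phase, candidates=SIMPLE_PHASES):
    """Candidate fraction closest to the observed phase."""
    return min(candidates, key=lambda c: abs(float(c) - phase))


# Hypothetical onsets in seconds: "big" at 0.00, "duck" at 0.74,
# and the next repetition of "big" at 1.50.
phase = stress_phase(0.00, 0.74, 1.50)
print(round(phase, 3), nearest_simple_phase(phase))
```

Collecting these phases over many repetitions and many target beep timings would show whether productions cluster around a few values or spread evenly across the range of targets.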

Credits

Speech cycling was developed at Indiana University together with Bob Port and Keiichi Tajima.

References

Cummins, F. (2002).
Speech rhythm and rhythmic taxonomy.
In Proceedings of Prosody 2002, pages 121–126, Aix-en-Provence.
Cummins, F. and Port, R. F. (1998).
Rhythmic constraints on stress timing in English. Journal of Phonetics, 26(2):145–171.

Synchronous Speech

Synchronous Speech is obtained by the simple expedient of having two subjects read a prepared text together, with the minimal instruction to attempt to maintain synchrony (Cummins, 2002). The reason for constraining subjects in this manner is perhaps best appreciated by analogy with the difficult task of attempting to reconstruct a musical score, based only on a recording of a specific musician (Heijink et al., 2000). This task is interestingly similar to the work of the theoretically minded phonetician, who attempts to uncover control and timing information, along with combinatorial units, from the continuous stream of speech.

If one were faced with this task, it is worth considering which musician would provide the more tractable data: the soloist, or the 14th violin player in the string section. Neither will reproduce the durations (or pitches) specified in their score exactly, of course, due to the inherent underspecification of the score. Both players will overlay some inherent biophysical noise, along with conventional timing variability, such as the predictable decelerando at the end of a phrase. The soloist will add further complexity, however, in keeping with her role as the expressive focus of the performance, making the inverse mapping from the recording to the score considerably more difficult.

We have learned much about Synchronous Speech recently. It provides a simple tool for quickly and naturally reducing inter-subject variability in timing, especially at the level of the phrase and the pause. The references below provide the details. Links to manuscripts and further reading are available on my publications page.
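One simple way to quantify how closely two speakers stay together is the mean absolute lag between matched landmarks, such as phrase onsets, in the two recordings. This is only a sketch under assumptions: the function name, the choice of landmark and the onset times are hypothetical, and it is not necessarily the measure used in the papers listed below.

```python
from statistics import mean


def mean_asynchrony(onsets_a, onsets_b):
    """Mean absolute lag in seconds between matched landmarks of two speakers."""
    if len(onsets_a) != len(onsets_b):
        raise ValueError("onset lists must be matched one-to-one")
    return mean(abs(a - b) for a, b in zip(onsets_a, onsets_b))


# Hypothetical phrase-onset times in seconds for two speakers reading together.
speaker_a = [0.00, 1.20, 2.75]
speaker_b = [0.04, 1.17, 2.80]

print(round(mean_asynchrony(speaker_a, speaker_b), 3))
```

The same function applied to two solo readings of the text, aligned at their starts, would give a baseline against which the synchronous condition could be compared.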

References

Cummins, F. (2003).
Practice and performance in speech produced synchronously.
Journal of Phonetics, 31(2):139–148.
Zvonik, E. and Cummins, F. (2003).
The effect of surrounding phrase lengths on pause duration.
In Proceedings of EUROSPEECH, pages 777–780, Geneva, CH.
Cummins, F. (2002).
On synchronous speech.
Acoustics Research Letters Online, 3(1):7–11.