Article; Open access; Peer-reviewed

Animal Communication: Big Talkers and Small Talk

2007; Elsevier BV; Volume: 17; Issue: 7; Language: English

10.1016/j.cub.2007.02.007

ISSN

1879-0445

Authors

Kevin G. Munhall, Stacie K. Byrne

Topic(s)

Music and Audio Processing

Abstract

Vocal tract resonances, known as formants, are important perceptual cues for the identification of human speech and animal calls. A recent study shows that monkeys can also use formants to determine the age and size of the monkey producing a call.

On Regina Spektor's album 'Soviet Kitsch', her younger brother whispers her name repeatedly and whispers questions to her. Even on a single listening, it is obvious that this is a child's voice. Yet it is a whisper, with no vocal pitch to act as a cue for age. From anecdotal evidence such as this, and from a growing body of research [1: Ghazanfar A.A., Turesson H.J., Maier J.X., van Dinther R., Patterson R.D., Logothetis N.K. Vocal tract resonances as indexical cues in rhesus monkeys. Curr. Biol. 2007; 17: 425-430], it is clear that the sounds that humans and other species produce carry much information about the speaker beyond the messages they transmit. Like bullets fired from a particular gun, vocal sounds bear the traces of the acoustic tube they pass through.

The British phonetician David Abercrombie [2: Abercrombie D. Elements of General Phonetics. Aldine Publishing Co., Chicago, 1967], drawing on C.S. Peirce's philosophy of signs, coined the term 'indexical properties' for the aspects of sound production that convey information about the producer. In Abercrombie's view, indexical properties include a wide range of factors, such as regional accent, physical or mental state and, most important biologically, the size and morphology of the speaker's vocal tract.

For more than 50 years (for example [3: Chiba T., Kajiyama M. The Vowel: Its Nature and Structure. Tokyo-Kaiseikan Pub. Co. Ltd., Tokyo, 1941; 4: Fant G. Acoustic Theory of Speech Production. Mouton, The Hague, 1960; second edition 1970]), a model of sound production that formalizes the relationship between the area function of the vocal tract and its acoustics has been the dominant framework for understanding human speech production and the vocalizations of other species. In this source–filter framework, the dimensions of the vocal tract determine the resonances of the airway above the larynx, and these resonances filter the acoustic energy emanating from the sound source (for voiced sounds, the vibrating vocal folds). In general, the length of the vocal tract is a major determinant of the distribution of the resonances, or formants. If the vocal tract is modeled as a straight, uniform tube closed at the glottis and open at the lips, the first three formants are given by the following formulas, where c is the speed of sound and L is the vocal tract length:

$$F_1 = \frac{c}{4L}, \qquad F_2 = \frac{3c}{4L}, \qquad F_3 = \frac{5c}{4L}$$

Thus, all else being equal, the formant frequencies are indexical of the size of the vocal tract: the longer the tube, the lower its formants.

This specification of vocal tract dimensions by the pattern of acoustic resonances has added evolutionary significance if two conditions are met. First, some aspect of vocal tract dimension must be correlated with body size or with other physical measures that may be related to fitness. Second, listeners must be able to match the acoustics with these body size characteristics. In a paper published recently in Current Biology, Ghazanfar et al. [1] provide the first evidence in monkeys that the second condition is met. Specifically, they show that rhesus monkeys can match the faces of two monkeys of different ages with the appropriate calls.

Using a method adapted from the infant perception literature, Ghazanfar et al. [1] found that rhesus monkeys looked preferentially at videos of monkey faces whose age and size were consistent with the distribution of formants in a call played along with the film clips. As in a whispered utterance, the calls used in their experiment carried no vocal pitch cues that could be used to distinguish the age or size of the caller. Using a sophisticated resynthesis technique, STRAIGHT [5: Kawahara H., Masuda-Katsuse I., de Cheveigné A. Restructuring speech representations using pitch-adaptive time-frequency smoothing and instantaneous-frequency-based F0 extraction: a possible role of repetitive structure in sounds. Speech Comm. 1999; 27: 187-207], the authors created calls in which the vocal pitch, duration and amplitude contours were similar but the formant distributions were those produced by either a larger or a smaller vocal tract (10 cm versus 5.5 cm in length).

A number of previous studies had shown that nonhuman primates could be trained to discriminate vowel stimuli that varied only in formant structure [6: Sommers M.S., Moody D.B., Prosen C.A., Stebbins W.C. Formant frequency discrimination by Japanese macaques (Macaca fuscata). J. Acoust. Soc. Am. 1992; 91: 3499-3510], or could even do so spontaneously [7: Fitch W.T., Fritz J.B. Rhesus macaques spontaneously perceive formants in conspecific vocalizations. J. Acoust. Soc. Am. 2006; 120: 2132-2141]. Ghazanfar et al. [1] have shown something far more significant: untrained monkeys spontaneously associate visual age and size cues with acoustic stimuli that differ in formant pattern. As has been shown in humans [8: Smith D.R.R., Patterson R.D., Turner R., Kawahara H., Irino T. The processing and perception of size information in speech sounds. J. Acoust. Soc. Am. 2005; 117: 305-318], the monkeys behave as if they know that formant frequency scales inversely with size: bigger, older monkeys produce the same calls as smaller, younger monkeys, but with lower formant frequencies.

The literature documenting the correlations between body size and formants is not, however, as straightforward as these perceptual results might lead one to believe. It is true that, when the analyses are computed across a broad age range, the correlations between body size and a variety of acoustic parameters are relatively strong [9: Ey E., Pfefferle D., Fischer J. Do age- and sex-related variations reliably reflect body size in non-human primate vocalizations? A review. Primates 2007 (Epub ahead of print, 17 January)]. For example, human vocal pitch, formant frequencies and utterance durations change systematically with age [10: Lee S., Potamianos A., Narayanan S. Acoustics of children's speech: developmental changes of temporal and spectral parameters. J. Acoust. Soc. Am. 1999; 105: 1455-1468], as do body size and vocal tract dimensions (documented in an unpublished doctoral thesis, 'An articulatory model for the vocal tracts of growing children', by U.G. Goldstein, MIT, 1980). Figure 1 shows the growth curves for height and vocal tract length.

As Ghazanfar et al. [1] point out, however, the literature on the relationship between adult size and formants is more complex and varied. Fitch [11: Fitch W.T. Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques. J. Acoust. Soc. Am. 1997; 102: 1213-1222] reported that formant frequencies and formant dispersion correlated well with body size in his sample of rhesus macaques. In a study of body size and formant frequencies in adult humans, however, Gonzalez [12: Gonzalez J. Formant frequencies and body size of speaker: a weak relationship in adult humans. J. Phonetics 2004; 32: 277-287] found only modest correlations, with larger correlation coefficients for females. In contrast, Rendall et al. [13: Rendall D., Kollias S., Ney C., Lloyd P. Pitch (F0) and formant profiles of human vowels and vowel-like baboon grunts: the role of vocalizer body size and voice-acoustic allometry. J. Acoust. Soc. Am. 2005; 117: 944-955] found a correlation between height and formants (particularly the fourth formant) only for human males. Collins [14: Collins S.A. Men's voices and women's choices. Anim. Behav. 2000; 60: 773-780] reported no significant correlations between adult human body size measures and formants.

The reasons for this variability are diverse. The simple relationship between tube length and formant frequencies summarized in the equations above holds for unconstricted tubes, but humans and other animals can deform the vocal tract considerably during articulation and thus modify the formant structure of their utterances. We do so to produce different vowels, as animals do to produce different calls, or even to seem larger (for example, by lowering the larynx [15: Fitch W.T., Reby D. The descended larynx is not uniquely human. Proc. R. Soc. Lond. B 2001; 268: 1669-1675]). The virtuoso voice actor Mel Blanc used a wide range of vocal tract configurations and larynx heights to achieve the many different cartoon voices he produced.

Another potential reason for the variability in correlation measures is that the studies are sampling from a very large number of potential acoustic cues for size. Formant frequencies, for example, are good descriptions of the sound structure of speech and animal calls, but they are far from complete descriptors [16: Bladon R.A.W. Arguments against formants in the auditory representation of speech. In: Carlson R., Granström B. (eds.), The Representation of Speech in the Peripheral Auditory System. Elsevier Biomedical Press, Amsterdam, 1982]. Failure to find consistent correlations may simply indicate that the acoustic parameters being tested are not suitable candidates for judging size, not that no such acoustic parameters exist. The perceptual system is computationally sophisticated and may exploit statistical regularities that emerge only after some initial analysis of the signal. Witness the evidence that depth cues in the auditory signal are used to 'compute' the perception of audiovisual synchrony [17: Alais D., Carlile S. Synchronizing to real events: subjective audiovisual alignment scales with perceived auditory depth and speed of sound. Proc. Natl. Acad. Sci. USA 2005; 102: 2244-2247].

In the end, the best evidence that there is auditory information for size may come from perceptual studies. It is clear that there are strong biases in people's judgments of size based on auditory recordings [15]. More studies like that of Ghazanfar et al. [1] may help us uncover the possible sources of these perceptual decisions and discover which information is a reliable indexical cue for body size.
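
The straight-tube formulas for F1 to F3 reduce to a few lines of arithmetic, and the same model can be run backwards to estimate tube length from measured formants. The sketch below is illustrative only. It assumes a speed of sound of 35,000 cm/s (roughly that of warm, humid air) and uses the 10 cm and 5.5 cm tube lengths of the synthesized calls. The helper names (`tube_formants`, `formant_dispersion`, `length_from_dispersion`) are hypothetical. The dispersion measure is the average spacing between adjacent formants, (F_N - F_1)/(N - 1), in the spirit of Fitch's formant dispersion [11]. It is not presented as the analysis procedure of any study cited here.

```python
# Uniform-tube (quarter-wave) formant model: an illustrative sketch.
# Assumption: a straight, unconstricted tube closed at the glottis and open
# at the lips, so the n-th resonance is F_n = (2n - 1) * c / (4 * L).

SPEED_OF_SOUND_CM_S = 35_000.0  # assumed c in cm/s (about 350 m/s, warm humid air)


def tube_formants(length_cm: float, n_formants: int = 3,
                  c: float = SPEED_OF_SOUND_CM_S) -> list[float]:
    """Return the first n_formants resonances (Hz) of a uniform tube."""
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, n_formants + 1)]


def formant_dispersion(formants: list[float]) -> float:
    """Average spacing of adjacent formants: (F_N - F_1) / (N - 1)."""
    if len(formants) < 2:
        raise ValueError("need at least two formants")
    return (formants[-1] - formants[0]) / (len(formants) - 1)


def length_from_dispersion(dispersion_hz: float,
                           c: float = SPEED_OF_SOUND_CM_S) -> float:
    """Invert the model: adjacent uniform-tube formants are c / (2L) apart."""
    return c / (2 * dispersion_hz)


if __name__ == "__main__":
    # The two tube lengths used for the synthesized calls in the text.
    for length in (10.0, 5.5):
        f = tube_formants(length)
        d = formant_dispersion(f)
        print(f"L = {length:4.1f} cm, formants = {[round(x) for x in f]} Hz, "
              f"dispersion = {d:.0f} Hz, estimated L = {length_from_dispersion(d):.1f} cm")
```

Under these assumptions, the 10 cm tube has formants at 875, 2625 and 4375 Hz. Every formant of the 5.5 cm tube is higher by a factor of 10/5.5 (about 1.82). In a uniform tube, adjacent formants are spaced c/2L apart, so the inversion recovers the tube length exactly. For a real vocal tract that constricts during articulation or has a lowered larynx, the result would only be an estimate.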

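The source–filter idea and the whisper example can be made concrete in a short simulation. A noise source, which like a whisper has no vocal pitch, is passed through a cascade of second-order digital resonators of the kind used in Klatt-style cascade formant synthesizers. This is an illustration swapped in for clarity. It is not STRAIGHT [5] and not the stimulus-generation procedure of Ghazanfar et al. [1]. The 44.1 kHz sampling rate, the fixed 100 Hz formant bandwidth, the Gaussian noise source and all function names are assumptions.

```python
# Source-filter sketch of a "whisper": a noise source (no F0) filtered by a
# cascade of second-order formant resonators. Illustrative only: this is a
# Klatt-style cascade resonator, not STRAIGHT; all parameters are assumed.
import math
import random

FS = 44_100  # assumed sampling rate (Hz)


def resonator_coeffs(freq_hz, bw_hz, fs=FS):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] (unit gain at 0 Hz)."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def apply_resonator(x, coeffs):
    """Run the two-pole recursion over a list of samples."""
    a, b, c = coeffs
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def magnitude_response(coeffs, freq_hz, fs=FS):
    """|H| of a / (1 - b z^-1 - c z^-2) evaluated on the unit circle."""
    a, b, c = coeffs
    w = 2.0 * math.pi * freq_hz / fs
    z_inv = complex(math.cos(w), -math.sin(w))
    return abs(a / (1.0 - b * z_inv - c * z_inv * z_inv))


def whisper(formants_hz, n_samples=FS // 10, bw_hz=100.0, seed=0):
    """Gaussian noise source passed through one resonator per formant."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    for f in formants_hz:
        signal = apply_resonator(signal, resonator_coeffs(f, bw_hz))
    return signal


def zero_crossing_rate(x, fs=FS):
    """Sign changes per second: a crude proxy for where the energy lies."""
    crossings = sum(1 for p, q in zip(x, x[1:]) if (p < 0) != (q < 0))
    return crossings * fs / (len(x) - 1)


if __name__ == "__main__":
    # Uniform-tube formants (c = 35,000 cm/s assumed) for 10 cm and 5.5 cm tubes.
    large = whisper([875.0, 2625.0, 4375.0])
    small = whisper([1591.0, 4773.0, 7955.0])
    print(f"10 cm tube:  {zero_crossing_rate(large):.0f} crossings/s")
    print(f"5.5 cm tube: {zero_crossing_rate(small):.0f} crossings/s")
```

Each resonator is normalized to unit gain at 0 Hz and peaks close to its formant frequency. The two synthetic whispers therefore differ only in where the filter places its resonances, not in any pitch cue. The zero-crossing rate is only a rough stand-in for a spectral analysis. One would expect it to come out higher for the shorter tube, because every resonance is shifted upward.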