Peer-reviewed article

Automatic Long-term Loudness and Dynamics Matching

2001; Audio Engineering Society; Language: English

ISSN

1549-4950

Authors

Earl Vickers

Topic(s)

Music Technology and Sound Studies

Abstract

Traditional audio level control devices, such as automatic gain controls (AGCs) and compressors, generally have little or no advance knowledge of the dynamic characteristics of the remainder of the current audio program. If such advance knowledge is available (i.e., if audio files can be pre-analyzed), it becomes possible to match desired values of overall loudness and dynamics. We introduce two new measures, “long-term loudness matching level” and “dynamic spread,” and present new methods for long-term loudness and dynamics matching.

0 INTRODUCTION

Loudness is a subjective measure relating to the physical sound pressure level (SPL) as perceived by the human ear. A number of devices have been created for controlling audio levels to modify either a signal’s loudness or its dynamic change in loudness.

Automatic gain controls (AGCs) are typically used to minimize loudness differences between audio programs (for example, between one song and the next). Compressors are similar to AGCs but operate on a faster time scale; they are primarily intended to minimize the loudness changes within a single song or audio program [1, 2]. Compressors have a number of uses, including increasing the loudness of the softer parts of an audio program so they can be heard above the noise floor (e.g., for automotive listening), decreasing the loudness of the loudest segments (for example, to avoid disturbing neighbors during late-night listening), and keeping signal levels within technical limits required for radio broadcast.

Compressors and AGCs typically operate in real time with little or no advance knowledge of the contents of the remainder of the current audio program. It seems likely that if we had additional information about the dynamic characteristics of the audio program as a whole, we could do a better job of matching a desired loudness or dynamic behavior.
Since music data is often stored in sound files on computer hard drives, we are in a position to generate and use loudness metadata in order to improve performance and reduce artifacts. In this paper, we present a method for matching the loudness of an entire song or sound file to a desired level using a novel measure, “long-term loudness matching level.” In addition, we present a compressor that analyzes the dynamic characteristics of a sound file and matches the output to a desired statistical behavior, using a new measure called “dynamic spread.” This prevents over-compressing audio that already has limited dynamics.

One side effect of dynamic compression is that it can alter the overall loudness in a way that may vary from one recording to the next, making it difficult to perform post-compressor loudness matching if the compression is done in real time. Therefore we present a method for estimating the effect of any given compressor settings on a particular sound file, so we can automatically compensate by scaling the gain accordingly.

1 LONG-TERM LOUDNESS MATCHING

Normalization is a way of matching the levels of multiple sound files by scaling each one to the maximum extent possible without clipping. Unlike traditional compressors and AGCs, which operate in real time with minimal look-ahead capability, normalization operates on a sound file as a whole, applying a single gain to the overall signal. By examining the entire sound file in advance, the normalizer is able to scale the audio without making any (possibly unwanted) gain adjustments during playback.

Unfortunately, there is no guarantee that two normalized sound files will sound equally loud. The peak amplitude of a song is not a very robust measure of its loudness. What we actually want is to normalize the perceived loudness, not the peak amplitude.
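The limitation of peak normalization can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the two synthetic test signals, and the use of RMS level as a crude stand-in for perceived loudness are all assumptions made for illustration only.

```python
import math


def peak_normalize(samples, target_peak=1.0):
    """Apply one gain to the whole signal so its largest absolute sample equals target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)  # silent input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def rms_db(samples):
    """RMS level in dB re full scale (an assumed, crude proxy for loudness)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


# Two hypothetical one-second signals at an assumed 8 kHz sample rate.
rate = 8000
sine = [0.25 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]

# The same quiet sine with a single full-scale click added.
clicky = list(sine)
clicky[100] = 1.0

norm_sine = peak_normalize(sine)      # gain of about 4: the sine fills the full range
norm_clicky = peak_normalize(clicky)  # gain of 1: the click already sits at the peak

print(f"normalized sine:   {rms_db(norm_sine):6.2f} dB RMS")
print(f"normalized clicky: {rms_db(norm_clicky):6.2f} dB RMS")
```

Both outputs have the same peak amplitude, yet their RMS levels differ by roughly 12 dB, because a single transient sets the gain for the entire second file. A loudness-based measure computed over the whole file would avoid this dependence on isolated peaks.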
While a number of attempts have been made to define and quantify the loudness of a single, short-duration tone [3-5], there is little agreement as to how to combine a series of short-term loudness values to define the loudness of an extended, dynamically changing signal such as an entire song. I. Allen, in an analysis of the loudness of movie soundtracks [6], determined that the equivalent loudness,

$$L_{eq}(m) = 10 \log_{10}\left(\frac{1}{T}\int_0^T \frac{P_m^2(t)}{P_0^2}\,dt\right)$$

Reference(s)