Dig That Lick (DTL): Analyzing Large-Scale Data for Melodic Patterns in Jazz Performances
2021; University of California Press; Volume 74, Issue 1; Language: English
DOI: 10.1525/jams.2021.74.1.195
ISSN 1547-3848
Dig That Lick (DTL) is a multidisciplinary research project that ran from 2017 through 2019 as an awardee of the Trans-Atlantic Platform for the Social Sciences and Humanities Digging into Data Challenge. The project represents a significant international partnership among principal investigator Simon Dixon (Queen Mary University of London, UK), three national project leaders—Hélène-Camille Crayencour (National Center for Scientific Research, France), Martin Pfleiderer (University of Music Franz Liszt, Weimar, Germany), and Gabriel Solis (University of Illinois at Urbana-Champaign, USA)—and an international team of researchers.1 DTL received funding from the Economic and Social Research Council (UK), the French National Research Agency, the Deutsche Forschungsgemeinschaft, and the National Endowment for the Humanities (USA). DTL's aim, broadly speaking, has been to bring together methods from advanced music information retrieval (MIR), empirical musicology, and jazz studies to pursue the following tasks: to "enhance existing infrastructures for the deployment of semantic audio analyses over large collections"; to facilitate access to those infrastructures (defined as "large audio and metadata collections") via a series of web-based interactive search tools; to use these tools to analyze the ways in which melodic patterns appear across a large jazz corpus; to begin to trace lines of influence across different spheres of jazz's history; and to "convince musicologists" of the value of this mode of inquiry. These aims are distilled from the summary presentation given at the "Round 4 Digging into Data Challenge Conference."2 Here "infrastructures" means (a) existing and new tools for transcribing, encoding, and analyzing the sound sources that ultimately comprise large corpora of improvised jazz solos; (b) the corpora themselves; and (c) the interactive tools that allow researchers to access the data and metadata contained in the corpora.

In order to understand the scope of the project, we should begin with the data collections themselves. There are four such collections, which function mostly independently but, as we will see, operate in tandem in DTL's most recent interactive output, the Pattern Similarity Search. The first large data collection is the Weimar Jazz Database (WJD), a repository of 456 jazz solos transcribed manually by "expert transcribers" as part of the Jazzomat Research Project.3 This is a remarkable resource. Each entry includes discographic and other metadata, a MIDI rendering of the transcribed solo (in visual and audio representation), a staff notation transcription, a skeletal version of the song's harmonic progression in lead sheet notation, and statistical information about additional musical parameters (number of notes in the solo, number of measures and choruses, mean tempo, "event density," median swing ratio, distribution of event onsets in terms of mean metric concentration and "syncopicity," and the solo's overall ambitus). These are followed by a series of histograms providing information about pitch class and pitch, interval and contour, onset presence by metrical position, and more. Figure 1 presents a small cross section of these data for Don Byas's 1945 recording of "Out of Nowhere."
The MIDI piano roll transcription is provided in figure 1a, general statistics in figure 1b, and two sample histograms—precise semitone intervals and what the authors call "refined contour" or "fuzzy intervals" respectively—in figures 1c and 1d.

The 456 solos in the WJD span the years 1925 (for example, the Louis Armstrong Hot Fives recording of "Gutbucket Blues"; both Armstrong's and Kid Ory's solos are included) through 2009 (Chris Potter's "Rumples"). It is unclear how the artists and solos were selected for inclusion; I will discuss the implications of this and whether it poses a problem in terms of data set analysis below. Most of the soloists are saxophone or trumpet players, with just a few exceptions (clarinetist Benny Goodman, seven trombonists, guitarists John Abercrombie, Pat Martino, and Pat Metheny, vibraphonists Lionel Hampton and Milt Jackson, and pianists Red Garland and Herbie Hancock). No women are included, and only two non-US artists (Abercrombie and Kenny Wheeler) appear. The comparatively tiny number of non-male-identifying artists in jazz's recorded archive is a historical reality, of course, but it would have been good to see the WJD team (and likewise the DTL team; see below) working to actively construct a more diversely representative data set for their project.

Designed and implemented by a Weimar-based team led by Martin Pfleiderer, the WJD preceded and grounds the DTL project; many of the research outputs found on DTL's publications page at the time of this writing originate from this earlier project. The WJD's most significant research output, other than the database itself, is the Pattern History Explorer, an interactive online tool that will be described below.

Transcribing solos is a time-consuming and frankly contentious process, fraught with ontological and ethical perils.4 One aim of DTL has been to develop a robust MIR tool for automating the transcription process, but also to examine the efficacy of using such a tool in the first place, given the tendency of MIR procedures to introduce errors at various steps of the transcription and metadata assimilation processes.5 DTL's second big-data collection, the DTL1000, was generated using advanced source filter techniques to automatically extract pitch data for jazz solos from their complex sonic environments.6

The DTL1000 is a database of "more than 1,700 solo parts from over 1,000 audio tracks, automatically extracted using state-of-the-art machine learning technology."7 The tracks were randomly selected from jazz collections housed at the different researchers' home institutions, and represent a hundred years of jazz, with one hundred tracks per decade from 1920 to 2019. The roster of soloists represented in the database is available as a downloadable Numbers file. The identity of quite a few soloists has not been determined. For example, there are numerous recordings for which two or more tenor saxophonists are listed as the "possible soloist," which is a small but important (and fixable) problem. There are hundreds of jazz musicians who could be employed to reliably clarify whether Dexter Gordon or Wardell Gray, or Al Cohn or Zoot Sims, or Jim Pepper or Joe Lovano (!) is the soloist in any particular instance. This is not trivial, because having accurate data at all stages of the process is key to some of the hypotheses about lineages and trends that DTL is proposing.
Furthermore, I would like to have seen a more careful determination of inclusion criteria in the early stages of the project, responding to at least the following questions: How is the DTL team defining "jazz," where do its boundaries lie, and what is the relationship between mainstream and margin?8

Similarly, how is the relation addressed between what we might broadly call "tonal" jazz (largely based on repertoire from the Great American Songbook) and contexts like modal jazz and idiosyncratic harmonic practices, which would certainly have an effect on what kinds of melodic patterns appear?

How might random selection of a large but not very large corpus either reproduce existing ideologically charged historiographical lineages (this seems particularly likely given that selections were taken from university holdings) or elevate certain artists to an arbitrarily prominent position in the history of a particular pattern? In other words, it seems important to have a clear understanding of which artists are represented to what degree (and, importantly, which artists are not represented), as these questions have an effect on the statistical takeaways that DTL is rightfully centering as a significant output. This extends the concern about gender and geographic representation expressed above to considerations of diverse performance-practice trajectories, and the ways all of these can potentially challenge sedimented ideas about jazz's canonic figures and practices.

These questions notwithstanding, the DTL1000 database is an impressive piece of work. For each represented recording, at least one improvised solo was automatically transcribed using an algorithm that parses sound events at the pitch level (rather than the frame level)9 and analyzes timbral information to extract each solo from its sonic environment. Each solo is then rendered as MIDI data and tagged with comprehensive n-gram (short, overlapping strings of contiguous events) metadata, parsing the melodic surface into pitch and interval strings up to twenty notes (or nineteen intervals) in length. Figure 2 shows the total number of discrete n-grams by cardinality in the DTL1000 database (similar charts are provided for all four databases). This information is what is collated and compared when different search functions are performed using the Pattern Similarity Search tool, described below.

The DTL1000 database is both more and less comprehensive than the WJD. While the earlier project compiles many different forms of metadata that may be used by researchers in untold productive ways, the DTL1000 database focuses on strings of tone events, with (as we will see) a few additional data points like artist names and recording dates. One next step that I hope will occur is the construction of a master list, organized by artist and including both comprehensive metadata and realizations of the transcriptions in musical notation, such as are found in the WJD. More important, it is my hope that both databases become living documents that are continuously expanded and refined.

The third data collection draws from a source that is well known among jazz musicians, Jamey Aebersold's Charlie Parker Omnibook.10 While the Omnibook has been a go-to source for aspiring jazz musicians for many years, its many errors are also well known.
I would encourage the DTL team to revisit this facet of the database project and turn to Michael Van Bebber's more recent anthology, which corrects "over 1,000 errors contained in the original 'Omnibook.'"11 It is, of course, quite problematic to make nuanced assertions about melodic gestures on the basis of details of their pitch and interval content when many of those details are incorrect.

Lastly, DTL incorporates data from the Essen Folk Song Collections. This database serves as a useful control, as seen, for example, in a study by Klaus Frieler, where at least one type of short melodic gesture, the "chromatic approach [-2, 1]," is shown to occur in jazz at a rate significantly different from its rate in a large folksong corpus: a "comparison with the Essen Folk Song Collections shows that there the approach [-3, 1] occurs about 100 times more often than [-2, 1], whereas in jazz these two are about equally frequent."12 This should probably not be surprising, since [-3, 1] (a descending minor third followed by an ascending semitone) occurs frequently in diatonic contexts—for example, in a major key (or tonal minor)—whereas [-2, 1] (a descending whole step followed by an ascending semitone) is not native to any diatonic or modal context but requires a chromatic (or dihemitonic) substrate. There are many chromatic approach figures in jazz—this is an important piece of learned post-bebop "vocabulary" and can involve incomplete chromatic neighbor tones or altered chord tones or extensions (see example 1). Even if findings like these are not the least bit surprising, it is valuable to see the empirical corroboration.

These four collections provide the data with which the DTL team embarks on their primary task, which is to begin to better understand how improvised jazz solos are constructed. This is an ambitious undertaking, and while it is not entirely clear that a computational model is the right tool for the job, much is revealed (and a certain amount obscured) in the process. But I am getting a little ahead of myself: first, how does DTL's analytic project unfold, and what does it reveal about jazz improvisation's creative process?

Between DTL's Weimar predecessors and the DTL team itself, three innovative interactive web tools have been developed, each of which takes a different approach to exploring how melodic patterns appear and translate across its corpora of transcribed solos. First is the Pattern History Explorer (PHE).13 Like the WJD, most of the work that went into the PHE preceded the DTL project, but they are intimately tied together, and understanding the richly networked nature of DTL requires engaging the PHE's tools and understanding the impetus behind the project. Second is the DTL Pattern Search, which builds upon the PHE model. Third is the DTL Pattern Similarity Search, which introduces a host of additional tools for comparing patterns within a range of similarity constraints that can be set by the user.

The PHE uses data from the WJD and the Charlie Parker Omnibook.14 For the pattern comparison project, the intervallic content of the entire data set was mined to find recurring sequences of at least six intervals (seven tone events).
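To make the interval-string representation on which all of this rests concrete, here is a minimal sketch, in Python, of deriving interval strings from MIDI pitch numbers and counting recurring interval n-grams. The toy melodies, the length bounds, and the raw occurrence count are my own illustrative assumptions; they simplify the actual selection criteria, which are quoted in what follows.

```python
from collections import Counter
from itertools import chain

def intervals(pitches):
    """Successive semitone intervals between consecutive MIDI pitch numbers."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))

def interval_ngrams(ivs, n_min=6, n_max=19):
    """All contiguous interval n-grams between n_min and n_max intervals long."""
    return chain.from_iterable(
        (ivs[i:i + n] for i in range(len(ivs) - n + 1))
        for n in range(n_min, min(n_max, len(ivs)) + 1)
    )

# Two toy "solos" given as MIDI pitches; purely illustrative, not taken from any DTL corpus.
solos = {
    "solo_a": [60, 62, 63, 65, 67, 65, 63, 62, 60, 58, 59, 60],
    "solo_b": [67, 65, 63, 62, 60, 58, 59, 60, 62, 63, 65, 67],
}

counts = Counter()
for pitches in solos.values():
    counts.update(interval_ngrams(intervals(pitches)))

# Report any interval n-gram that recurs in this toy corpus.
for pattern, k in counts.most_common():
    if k > 1:
        print(list(pattern), "occurs", k, "times")
```

In this notation, the chromatic approach discussed above appears simply as the interval pair [-2, 1].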
An interval sequence was deemed a "pattern" if it occurred "in at least three different solos of at least one musician."15 There are two exceptions, Bob Berg and Charlie Parker, for whom, because they are represented by a considerably larger number of solos, a slightly different set of occurrence criteria was adopted.

With these criteria in place, the algorithm extracted a total of 653 discrete patterns across the corpus. The PHE tool allows the user to select any pattern from this list using a drop-down menu. Searches can be refined (a) by entering a short string of contiguous intervals that a smaller family of patterns might share; (b) by entering an artist's name; and (c) by pattern length, to a maximum of twenty intervals. Some interesting potential themes resulted from just a few minutes in the sandbox. For example, when "9" (a major sixth; all intervals are measured in semitones) is entered in the "search pattern" tool, the total field of 653 patterns is reduced to thirteen (with between three and seventeen occurrences of each represented pattern). This includes all patterns with an ascending ("9") or descending ("-9") major sixth. When I entered "3, 3, 3" (an ascending diminished seventh chord), eighteen patterns came up—interestingly, no patterns came up when I searched for a descending diminished seventh chord, "-3, -3, -3." Twenty-three patterns came up when I entered "4, 3, 4" (an ascending major seventh chord starting on the root), as opposed to only two from the corpus containing the inversion "-4, -3, -4." When I entered "-6" (a descending tritone), again two patterns came up, suggesting that patterns of at least six intervals that include a descending tritone are rare indeed.16

The roster of patterns is organized by number of occurrences; so, for example, when no search terms are entered and we can see all 653 patterns in the pull-down menu, the first pattern listed ("-1, -1, -1, -2, -2, -1," which we might describe as a descending diatonic scale segment with a chromatic passing tone inserted before either the second or the third term) occurs 127 times. When we select this pattern, we are then given comprehensive data about its occurrences, which we can examine using five tools: "Listen and See," "Instances," "Stats," "Timeline," and "General Stats." "Listen and See" prompts us to sort occurrences by year of recording, performer name, chord progression information, or starting pitch of the pattern (see figure 3). It includes, quite impressively, the pattern as it occurs in its context, transcribed in staff notation, as well as a short audio snippet, which for this reader is absolute gold, since hearing the pattern even in relative isolation from what comes before and after it goes a long way toward understanding what it is these patterns are doing as building blocks of jazz's melodic syntax.

"Instances" summarizes some of the key data from "Listen and See" and adds details from the WJD such as harmonic context, generalized IOI (inter-onset interval) categories and metric position, and more. "Stats" in turn shows us pitch-class set information, data about which pitch class the pattern tends to start on, who in the corpus study used it first and/or most, and more.
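The refinement behavior described above (entering "9" or "3, 3, 3" to narrow the field of 653 patterns) can be understood as a simple contiguous-subsequence filter over the precomputed pattern list. The sketch below is my own illustration of that idea, with invented stand-in patterns; it is not the PHE's actual code.

```python
def contains_contiguous(pattern, query):
    """True if the interval sequence `query` appears contiguously inside `pattern`."""
    m = len(query)
    return any(pattern[i:i + m] == query for i in range(len(pattern) - m + 1))

# Invented stand-ins for entries in the precomputed 653-pattern list.
patterns = [
    (-1, -1, -1, -2, -2, -1),   # descending scale segment with a chromatic passing tone
    (3, 3, 3, -1, -2, -2),      # ascending diminished-seventh arpeggio, then a descent
    (4, 3, 4, -2, -1, -2),      # ascending major-seventh arpeggio, then a descent
]

query = (3, 3, 3)               # the ascending diminished-seventh search discussed above
print([p for p in patterns if contains_contiguous(p, query)])
# -> [(3, 3, 3, -1, -2, -2)]
```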
At first glance, I thought the pitch-class information in "Stats" might be a red herring, much less significant than the functional intervallic morphology of different patterns, but there is much to be gleaned from, for example, understanding how different pitch gestures "fit" on a particular instrument, which might have much to do with their relative rate of appearance beginning on one pitch or another. The "Timeline" tool shows when a particular pattern appeared along a temporal x-axis, with performers listed along the y-axis. "General Stats" provides data from across the entire corpus on things like number of pattern occurrences measured against pattern length and relative frequency of diatonic versus chromatic versus mixed patterns, different general contour classes,17 and more (see figure 4).

In the timeline view (here and in both DTL tools discussed below) we see one of the potential flaws of the WJD and the way the PHE extrapolates from its parent corpus. While an argument might be made that the corpus is "representative enough" of the history of recorded jazz—this is, of course, how statistics work, by extrapolating from a random sample across a larger population—I worry very much that the historical and social development of jazz's so-called "vocabulary" does not work that way; that a study like this reifies certain kinds of historiographical narratives, such as that there is "a language" of bebop rather than what Gilles Deleuze and Félix Guattari would describe as "only a throng of dialects, patois, slangs, and specialized languages."18 This is an important question to keep in mind—especially to the extent that it overlaps with concerns about inclusion and representation—but it does not erase the fact that much of interest is revealed through the PHE tool. Frieler et al. outline three preliminary observations based on analysis of the data that the tools make available. First, "Pattern usage varies with performers and appeared with bebop."19 The first point seems perhaps too obvious to mention, while I am not convinced that the second claim can be made unequivocally just yet. Second, "The more frequent an interval pattern, the more tonally flexible it is." This seems sensible, and continued contextual analysis of when and how different patterns occur will likely corroborate this initial observation. And third, "Patterns are mostly simple and reflect common rehearsal routines." This also seems sensible at first, but I wonder if it begs the question: I am not sure to what extent a scale snippet (including one with chromatic passing tones), for example, ought to count as a pattern.

The PHE was introduced in 2018, around the same time the DTL project began. As Frieler et al. describe, "While pre-computing a set of patterns is helpful in regard to the exploratory approach of the Pattern History Explorer, searching for instances of arbitrary patterns of any length and frequency of occurrence within a database requires a different approach."20 This additional research need prompted the development of the Pattern Search. In the Pattern Search, users can input pitches using a virtual keyboard or intervals by typing integers as in the PHE. A number of "transformations" can be applied to the search, the most interesting to me being "fuzzy intervals," which yields intervallic patterns that are similar but not necessarily equivalent to those in the search term. It is not clear from any of the documentation how the "fuzzy interval" algorithm works.
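One plausible reading, offered strictly as my own conjecture since the documentation is silent, is that exact semitone intervals are binned into a small set of signed size classes of the kind shown in the WJD's "refined contour" histograms (figure 1d). A sketch of such a binning, with class boundaries that are assumptions on my part, would look something like this:

```python
def fuzzy_interval(semitones):
    """Conjectural binning of an exact semitone interval into a signed size class.
    The boundaries below are my assumption, not documented DTL/Jazzomat behavior."""
    size = abs(semitones)
    if size == 0:
        cls = 0    # repetition
    elif size <= 2:
        cls = 1    # step
    elif size <= 4:
        cls = 2    # third-sized leap
    elif size <= 7:
        cls = 3    # fourth- or fifth-sized leap
    else:
        cls = 4    # large leap (a sixth or wider)
    return cls if semitones > 0 else -cls

# Two descending major-seventh arpeggios a whole step apart (the search tried below).
query = [-4, -3, -4, 9, -4, -3, -4]
print([fuzzy_interval(i) for i in query])   # -> [-2, -2, -2, 4, -2, -2, -2]
```

A binning along these lines would at least reproduce the transformation reported in the search described next; whether it matches the tool's actual algorithm is another matter.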
But when I inputted "-4, -3, -4, 9, -4, -3, -4" (two descending major seventh arpeggios a whole step apart) and selected "fuzzy intervals," my search terms changed to "-2, -2, -2, 4, -2, -2, -2" and yielded twenty-five instances of a roughly generalized shape (down, down, down, up, down, down, down, each "down" being some kind of third, and the central "up" being a sixth or seventh in most cases, with one outlier—a thirteenth!). An effort to understand the transformation tool mostly came up short: I was redirected to a Jazzomat page with an introduction to the melpat programming language that included a list of "suitable transformations" for the task at hand; strangely, the "fuzzy intervals" transformation was one of five transformations (of a total of thirteen) for which no further information was offered.21 Similarly, I tried to replicate a search described by Frieler et al. using the "chordal pitch class" transformation,22 but was unable to do so because the search field would not recognize some of the characters I needed to enter. For this reason I spent the least time with the Pattern Search platform.

The Pattern Search may not be a necessary tool, however, given the functionality of the DTL Pattern Similarity Search (PSS), a Python/Django tool that intensifies the move from a comparatively simple search function based on relatively precise pitch or interval strings to one that puts a good deal of power into the user's hands in deciding what counts as a similarity criterion and how morphologically congruent one thinks two nonidentical gestures need to be to have what Robert Schultz would call a "family resemblance."23 This is a very different kind of hermeneutic task. No longer are we trying to ascertain a range of precise vocabulary trends for jazz writ large; we are now in a space where we might be able to understand that vocabulary as gesture, as more generalized kinds of shapes that signal inclusion in a shared performative practice but that transcend mere mimicry or intervallic exactitude.

Search terms can again be entered using a virtual keyboard or by typing integers into the search field. There are then a number of prompts relating to preserving the contour and ambitus of the pattern being sought, the degree of similarity (on a sliding scale from 65 to 100 percent; setting it at 100 percent makes the Pattern Search tool above redundant), the maximum allowable difference in number of event onsets, the extent to which individual "edits" can diverge from the original, and more. "Edits" refers to the number of discrete changes to the search terms, computed using the Levenshtein distance algorithm, which measures the number of single changes (additions, deletions, or substitutions) needed to transform one utterance into another. The user can also select to mine from one or more of the four databases. I played with a few segments that I thought would be fairly common but not too common, to see what kinds of results came up. I began with the first melody from example 1 above, with minimum similarity set at the default 80 percent and maximum length difference set at 2, and looking only at the DTL1000 database. This produced 134 similar pattern instances, representing twenty unique shapes. Of these, 79 duplicated my search terms, while 55 were similar within the given constraints. These were all provided in a list organized by pattern (twenty discrete shapes further organized by similarity distance from the original) and artist name.
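For readers unfamiliar with it, the Levenshtein computation underlying the "edits" setting can be sketched in a few lines of Python. The percentage normalization shown here is my own guess at how a similarity score might be derived from it, not a documented detail of the PSS, and the two interval strings compared are illustrative: the first is the frequent PHE pattern discussed above, the second an invented variant one substitution away.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution (free if equal)
        prev = curr
    return prev[-1]

def similarity_percent(a, b):
    """One possible normalization: 1 minus edit distance over the longer length."""
    return round(100 * (1 - levenshtein(a, b) / max(len(a), len(b))))

original = (-1, -1, -1, -2, -2, -1)   # the most frequent PHE pattern
variant  = (-1, -1, -2, -2, -2, -1)   # invented variant, one substitution away
print(levenshtein(original, variant), similarity_percent(original, variant))  # -> 1 83
```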
Further information included, for each hit, the track title and year of recording, the soloist's instrument, the "style" of jazz, a link to further discographical information, and, most valuable of all, an audio snippet that could be played at speed or slowed down. I went through and listened to all 134 audio snippets: about 80 percent were accurate (and a few were egregiously wrong, viz. Miles Davis's "It's About That Time," Frank Rosolino's "Violets," and Kay Penton's "Heaven's Doors Are Wide Open").

Errors notwithstanding, this is already compelling information—not least to begin to hear the many different kinds of harmonic and textural contexts in which this melodic gesture has appeared. There are a number of additional ways to refine one's search, but probably the two most valuable in terms of contextualizing this information are the "timeline" and the "networks." The timeline (a partial snapshot of which is provided in figure 5) plots when the recordings featuring this constellation of patterns and variants were made, making it possible to speculate about a pattern's genealogy, as several DTL team members have done. One visual representation of the network is shown in figure 6: this is the network grouped by performer; other possible organization schemes include song titles, recording years, pattern families, and more. In both visual representations, node size shows the degree of similarity to the original, and node shade represents inclusion either in the original or in one of the discrete variants. It is not clear from the tool or any of the accompanying documentation how the artists are arranged circumferentially in the network representation (for example, why some artists occupy the inner circle and others the outer).24

I would like to close with five questions that have been slowly cooking as I have worked my way through the constellation of DTL tools, databases, and supporting texts. I have hinted at several of these above. I offer them now as prompts for further development of these research trajectories, not quite as critiques of the project as it currently exists, but in order to stimulate potential new directions and conjunctions.25

First, the DTL project hinges, slightly precariously, on an assumption that is encapsulated when Frieler describes the "constructive principles" that animate the particular shape the project has taken, claiming that "no comprehensive and unequivocal classification system for the basic building blocks in monophonic jazz improvisations has been developed so far."26 These two terms, "constructive" and "building blocks," should be considered further, as should the way they frame the "melodic atoms," "midlevel units," and notions of hierarchy through which, for example, the Weimar Bebop Alphabet was constructed.27 In what ways can a computational model of the mind be expanded to engage other theories from cognition studies, neuroscience, and embodiment and action?28 How might we move toward resisting thinking of music as a language—because we do not need to, once we re-understand language as a subset of human discursive activity and not the substrate upon which discourse is grounded?29

Second, how does a project like this problematically reproduce old ways of thinking about jazz improvisation as a solo art form?
How could it be reimagined in order to align with contemporary work on ensemble interaction and dialogue stemming from music theory and artistic research?

Third, how can the rich array of research trajectories that stem from DTL connect to contemporary music-theoretical engagements with jazz theory (for example, different kinds of network-driven analyses), schemata theory, and theories and methods of segmentation?30

Fourth, if automated transcription seems to be the best method for developing a robust database, how would we ensure its accuracy? Without accuracy, of course, the whole project fails from the outset. I envisage a model whereby machines would do a lot of the early heavy lifting, but every transcription would be very carefully vetted by expert transcribers, paying attention to details not only of pitch but also of rhythm, pitch inflection, and more.31

Finally, returning to a question I asked above, how important is it to attend carefully to matters of style and inclusion, to continue working to grow the database, but also to make some difficult decisions about how to draw lines around the project in terms of repertoire or harmonic context? (My answer: very important!) A related question concerns big data sets and statistics in general, which I describe as losing the human in the stat, with quite dire ethical consequences if not addressed very carefully and sensitively. I am not talking about IRB permissions (although that is another vital question: many of the musicians represented are alive and might appreciate the choice to opt out of a project like this), but about the way in which a turn to the general obscures the particular (which is an epistemological problem) and erases or elides the individual (which is an ethical problem). This has been on my mind a great deal in recent months, when we hear on a given day that "only" x many people died from Covid-19-related issues, never naming or learning anything about the individual victims or their families or friends. So my closing question, restated, is: How can we engage the kinds of trends and types that can be revealed by digging into the data, while also foregrounding the foundational difference that precedes sedimentation into categories? One of the most celebrated aspects of jazz is its valorization of individual voices, so how can we continue to hear those voices even through our normalizing processes?