Encounters with Language
2012; Association for Computational Linguistics; Volume: 38; Issue: 4; Language: English
10.1162/coli_a_00129
ISSN: 1530-9312
Author: Charles J. Fillmore
First of all, I am overwhelmed and humbled by the honor the ACL Executive Committee has shown me, an honor that should be shared by the colleagues and students I've been lucky enough to have around me this past decade-and-a-half while I've been engaged in the FrameNet Project at the International Computer Science Institute in Berkeley.

I've been asked to say something about the evolution of the ideas behind the work with which I've been associated, so my remarks will be a bit more autobiographical than I might like. I'd like to comment on my changing views of what language is like, and how the facts of language can be represented. As I am sure the ACL Executive Committee knows, I have never been a direct participant in efforts in language engineering, but I have been a witness to, a neighbor of, and an indirect participant in some parts of it, and I have been pleased to learn that some of the resources my colleagues and I are building have been found by some researchers to be useful.

I offer a record of my encounters with language and my changing views of what one ought to believe about language and how one might represent its properties. In the course of the narrative I will take note of changes I have observed over the past seven decades or so in both technical and conceptual tools in linguistics and language engineering. One theme in this essay is how these tools, and the representations they support, obscure or reveal the properties of language and therefore affect what one might believe about language. The time frame my life occupies has presented many opportunities to ponder this complex relationship.

This story begins in the 1930s and 1940s, in St. Paul, Minnesota. There was nothing linguistically exotic about growing up there, except perhaps the Norwegian-accented English of some of my mother's older relatives. But during much of my childhood I was convinced that I personally had difficulties with language: The symptom was that I could never think of anything to say. I was tongue-tied. I now suspect that it was mainly a problem of the shyness and awkwardness that goes along with growing up confused, and not an actual matter of language pathology. Nevertheless, it led me into my earliest attempt to work with language data.

At around age 14, I presented my problem to a librarian in the St. Paul Public Library, and she found me a book called 5,000 Useful Phrases for Writers and Speakers. A memorable example was “With a haggard lift of the upper lip…” I took the book home, cut sheets of typewriter paper into eight pieces to make file slips, chose phrases I thought I should memorize, and copied them onto these slips. I held them together with rubber bands, and I kept them in a secret place in my room. Thus supported with the early 1940s technologies of paper, scissors, pencil, and rubber bands, my earliest theory of language began to develop: Linguistic competence is having access to a large repertory of ready-made things to say.

I added to the collection over the years, as I came upon clever or wise expressions, and consulted a selection of them every night, scheming to create situations in which I could use them, in speaking or writing. In later years I held on to the suspicion that much of ordinary conversation in real life involves calling on remembered phrases rather than creating novel expressions from rules.
Much later I learned that in many Eastern European countries influenced by the Moscow School, the divisions of the field of Linguistics were Phonology, Morphology, Lexicology, Phraseology, and Syntax. The study of phraseological units—phraseologisms—was seen as central, not peripheral, to linguistic inquiry.

My first exposure to the actual field of Linguistics came a year later, around age 15, when a missionary lady on leave, living on my block in St. Paul, gave me a copy of Eugene Nida's little book, Linguistic Interludes (Nida 1947). The text of this book takes the form of conversations in a college campus co-op between a clever and wise linguist and a caricatured collection of innocent and unsuspecting students and colleagues, among them a classicist who strongly defended the logical perfection of the classical languages Greek and Latin.

This book succeeded in conveying simply many of the things that linguists believe:

- Relevant linguistic generalizations are based on speech, not writing.
- Almost all concepts of “correct grammar” are inventions, with no basis in the history of the language.
- There may be primitive communities, but there are no primitive languages.

The minor protagonists in the conversation contested each of these principles, and the linguist hero, from his vast knowledge of the most exotic of the world's languages, kept showing them how wrong they were. I liked the idea of knowing things that most people, including college professors, had wrong opinions about. I also liked the idea of being able to help them change their wrong opinions, so I decided to study Linguistics.

Before long I was enrolled in a fairly small linguistics program at the University of Minnesota. I could live at home, take a streetcar to Minneapolis for classes, and take another streetcar to Montgomery Wards in St. Paul, where I wrapped venetian blinds to support my studies.

In those days there were no linguistics textbooks in the modern sense; we studied two books titled Language—one by Edward Sapir (1921) and the other by Leonard Bloomfield (1933)—and we read grammars and treatises. I took two years of Arabic. I supplemented my training in linguistic methods through Summer Linguistic Institutes put on by the Linguistic Society of America, one in Michigan and one in Berkeley, where I learned about Thai, Sanskrit, and Navajo with Mary Haas, Franklin Edgerton, and Harry Hoijer.

One of my professors at the University of Minnesota was building concordances of some of the minor Late Latin texts, and he permitted the students in his class to work with him on these projects. For the advanced students this was a chance to get valuable hands-on research experience; for the less advanced students it was an opportunity to get “extra credit.”

This was in a sense my first exposure to corpus-based linguistics. For any given document, the professor would pass on the text to that year's students. This “first generation” of students copied word tokens onto separate index cards, together with each word's “parse” in the classical sense, and its location in the document. Generation 2—the students in the next year's class—alphabetized these cards and typed up the concordances. Generation 3, in which I participated, took this same stack of cards and reverse-alphabetized them, so they could be used for research on suffixes. (Personal note: alphabetizing words from right to left is stressful at first, but you get used to it.)
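As a modern counterpoint to that card-based workflow, here is a minimal Python sketch, not part of the original lecture, of the same three-generation pipeline: record word tokens with their locations, alphabetize them into a concordance, and sort the vocabulary right-to-left for suffix research. The two lines of Latin and the token format are illustrative assumptions only.

```python
# A sketch (illustrative only) of the three student "generations" of concordance work:
#   Generation 1: copy each word token, with its location, onto a "card".
#   Generation 2: alphabetize the cards into a concordance.
#   Generation 3: sort the same vocabulary right-to-left so that suffixes line up.

import re
from collections import defaultdict

# Stand-in for a Late Latin document (two lines of the Vulgate).
text = """In principio creavit Deus caelum et terram.
Terra autem erat inanis et vacua."""

# Generation 1: one (word, line_number) "card" per token.
cards = [(word.lower(), line_no)
         for line_no, line in enumerate(text.splitlines(), start=1)
         for word in re.findall(r"[a-zA-Z]+", line)]

# Generation 2: alphabetized concordance, mapping each word to its locations.
concordance = defaultdict(list)
for word, line_no in sorted(cards):
    concordance[word].append(line_no)

# Generation 3: the vocabulary sorted by reversed spelling, for suffix research.
reverse_index = sorted({word for word, _ in cards}, key=lambda w: w[::-1])

print(dict(concordance))  # e.g. {'autem': [2], 'caelum': [1], ...}
print(reverse_index)      # words ordered by their endings, so shared suffixes sit together
```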
So with the tools of pre-cut index cards, a pencil, and a typewriter, we students constructed a concordance—we physically experienced that concordance.

So you can imagine my surprise when, thirty-some years later, I came upon UNIX commands like sort, sort -r, and grep. I don't remember if I actually wept. And these were nothing compared to the marvels I experienced later still, with key-word-in-context extraction, lemmatizers, morphological parsers, part-of-speech tagging, sorting by right and left context, and the full toolkit of corpus processing tools that exist today.

In those days it took a lot of patience and physical effort to build a concordance. But it also took a lot of patience and physical effort to use a concordance. A printed concordance to the Shakespeare corpus was a vast index in which, for each word, you could find every line it occurred in, and you learned where that line appeared in Shakespeare's writings. You would then go to the actual physical source text, look it up, and see it in its context. For example, if, when studying the phrasal verb take upon, I want to find the full context of This way will I take upon me to wash your liver, I only need to open up As You Like It to Act 3, Scene 2, and hunt for it there. Compare that to the fully-searchable Shakespeare app you can use while sitting on a bus holding your iPad.

President Truman's Displaced Persons Act of 1948–1950 brought thousands of Eastern European immigrants to Minnesota, enabling me to find work more satisfying than venetian-blind-wrapping. I began to teach English to Russians, Poles, Ukrainians, and Latvians. Depending on which of the daughters of the families in my classes I was trying to impress, I was motivated to learn something about Slavic and Baltic languages.

Soon my student deferment would run out, and I had to decide between waiting for the draft (two years) or enlisting (three years). A persuasive recruiting officer promised me one year at the Army Language School in Monterey, CA (now the Defense Language Institute), for my first year. Shortly after that, my head got shaved and I was suddenly a buck private. No one had any record of an offer to spend a year in sunny California learning Polish. I was not allowed to examine my file.

So I took the U.S. Army Russian Language Proficiency Test instead. The questions were in spoken Russian, played on a record player, and the answers were multiple choice in English. In those days the art of designing guessproof multiple choice tests had not yet been perfected. There was kind of a student sport to see how well you could do in choosing answers without looking at the questions (you could usually at least get a passing score); then you'd go back and read the questions to correct the choices that weren't obvious.

Although I didn't fully understand any of the questions, my score came out as “high fluent,” based in large part on acquired test-taking skills. After basic training, I was sent to Arlington, VA, for a few months in radio training, after which I was assigned to Kyoto, Japan, to a small field station of the Army Security Agency. My duty: “listening to Ivan.” The Ivans I listened to on short wave radio never had anything interesting to say: They were Soviet Air Force men reading numbers, which I was supposed to write down. Three days of the day shift, three days evening shift, three days night shift, three days off. I quickly acquired an uncanny ability to detect Russian numbers against noise and static.
They were, of course, coded messages. My job was to write the numbers down on the most modern typewriter of the day, a model that had separate keys for zero and one! (The ordinary office typewriter at that time had separate keys for only the numbers 2 through 9, since lower-case l could be used for 1 and upper-case O could be used for zero.) For this work I needed a very restricted vocabulary: the Russian long and short versions of the numbers 1–9, plus a single version of zero, and the word for ‘mistake.’ If I had been permitted to say what I was doing I would have said I was in cryptanalysis, but of course actually I was only copying down the numbers I heard. Somebody smart, thousands of miles away, was figuring out what they meant.

The limited demands on my time and intellect allowed me to wander around in Kyoto, with notebooks and dictionaries, trying to learn something about Japanese. The linguistic methods I had learned back home stopped at morphology, the structure of words. I hadn't had any training in ways of representing the structure of a sentence, but I worked out a do-it-yourself style of sentence diagrams, for both Japanese and English, and I was fascinated when I found the occasional sentence in Japanese which could be translated into English word by word backwards, going from the end to the beginning.

When it was time to be discharged, I believed—wrongly—that I was close to mastering the language, and I wanted to stay another year or two, because I knew I couldn't afford to come back to Japan on my own. I managed, with the help of Senator Hubert Humphrey, to be the first Army soldier to get a local discharge in Japan. As a civilian there, I supported myself by teaching English. With two other visiting Americans I was permitted to work at Kyoto University with the endlessly kind and patient Professor Endo Yoshimoto.

Professor Endo was the author of the main school grammar of Japanese and one of the founders of an organization favoring Romanized spelling for Japanese. With his help, my fellow students and I stumbled through old texts and became acquainted with the categories and terminology of the Japanese grammatical tradition.

One of the themes weaving through this essay is the reality that it is not possible to represent—in a writing system, in a parse, or in a grammar—every aspect of a language worth noticing. My study of Japanese confronted me with the realization that for any given representation system, it's important to understand what it represents, and what is missing. The Japanese kana syllabary presented me with an early experience of this. The pronunciation of Japanese words is represented by the symbols of a syllabary, but unfortunately the components of complex words in this language, in particular the inflected verbs, are not segmented at syllable boundaries.

Some verbs have consonant-final stems followed by vowel-initial suffixes, but this fact is not apparent in the written language. In the examples in Table 1, the verb stem means ‘move’ and it ends in a consonant, /k/. The suffixes all begin with vowels, but the kana characters that straddle the boundary (く, き, か, け) do not reveal the division between verb stem and suffix.

Table 1
Japanese kana and the obscuration of morpheme boundaries.

    gloss                  kana          romanization
    move (plain form)      うごく        ugok-u
    move (polite form)     うごきます    ugok-imasu
    does not move          うごかない    ugok-anai
    can move               うごける      ugok-eru

It struck me that the written form of a language should not prevent one from discovering its boundaries.
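To make the point concrete, here is a tiny, hypothetical Python sketch built on the forms of ugoku ‘move’ assumed in Table 1. The longest common prefix of the romanized forms recovers the stem ugok-, with its final consonant /k/, while the same check over the kana spellings stops short of the boundary, because く, き, か, け each bundle the stem-final /k/ with a suffix-initial vowel into one symbol.

```python
# Illustrative sketch: a phonemic (romanized) transcription exposes the stem/suffix
# boundary of a Japanese consonant-stem verb, while the kana spelling does not.
# Forms assumed from Table 1 (ugoku 'move'); any consonant-stem verb would behave alike.

import os

romanized = ["ugoku", "ugokimasu", "ugokanai", "ugokeru"]
kana = ["うごく", "うごきます", "うごかない", "うごける"]

def stem_candidate(forms):
    # Longest prefix shared by all inflected forms.
    return os.path.commonprefix(forms)

print(stem_candidate(romanized))  # 'ugok'  -> the consonant-final stem is visible
print(stem_candidate(kana))       # 'うご'   -> the shared prefix stops before the boundary,
                                  #            since く/き/か/け each fuse /k/ + vowel
```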
I later learned that in 1946 the American linguist Bernard Bloch had published a ground-breaking description of Japanese verb morphology based on a phonemic transcription (collected and republished as Bloch [1970]), allowing the regularities in the system to become apparent.

Everyone knows that English spelling is a poor representation for English pronunciation, but it's also true that it is a fairly good representation for recognizing derivationally related words. Consider the second syllable in the three words compete, competitive, competition. If we had to write these words with different letters for the different vowels, we'd be missing something.

Yet of course some important generalizations about English can't be captured in the analysis of written English alone. Numerous phonological generalizations require a reduction to phonetic features of various kinds, but there are also grammatical generalizations that are hiding from us because of things like (1) whose (not who's), (2) another (not an other), and the problems that text-to-speech researchers have to face related to the pronunciation of large numbers and indications of currency, like the dollar sign. In post-war Japan, the fact that the kana writing system obscured morphological boundaries merely meant that linguists would use phonemic transcriptions. But as technology has advanced beyond cards and typewriters, supporting efforts such as text-to-speech and automatic speech recognition, we can see that written-language obscurations (and affordances) are ubiquitous.

While living in Japan I had been keeping track of linguistics goings-on back home, and had heard that one of the best graduate programs for linguistics was at the University of Michigan in Ann Arbor. So when I finally came back to the States, that's where I went. There was a movement in linguistics in those days toward making linguistics more “scientific” by designing so-called discovery procedures for linguistic analysis, and I wanted to participate in that work. The basic textbooks in beginning linguistics classes at Michigan typically provided step-by-step procedures for going from data to units, so this movement was well supported there. Kenneth Pike's Phonemics book had the sub-title A technique for reducing language to writing (Pike 1947).

I had noticed that there were alternative phonemic analyses for both English and Japanese, analyses that resulted in different actual numbers of consonants and vowels. If there's no consistent way to do phonemic analysis, how can we compare different languages with each other, or be confident in answering a simple question like, “How many vowels does this language have?” I resolved to help design the correct discovery procedure for phonemic analysis, founded on the distribution of phonetic primes. For that purpose I studied phonetics in the linguistics department and in the communication sciences program: practical phonetics for field linguistics, acoustic phonetics, and physiological phonetics in the laboratory.

During those years I worked part-time on a Russian–English Machine Translation (MT) project with Andreas Koutsoudas and met many MT researchers. I participated in a memorable interview with Yehoshua Bar-Hillel (some of you will remember the outcome of the nationwide tour that included this visit).
I also worked with speech researcher Gordon Peterson and mathematician Frank Harary on automatic discovery procedures for phonemic analysis, a project that was eventually abandoned.

The speech lab was visited once by a group of engineers who proposed devising automatic speech recognition by detecting the acoustic properties of individual phones, mapping these to phonemes, and pairing phoneme sequences with English words. Ilse Lehiste put a damper on their enthusiasm by asking them to try to consistently distinguish acoustic traces of the two phonemically different English words, “you” and “ill.” They couldn't do this (Figure 1). The properties of the representational system for individual phones would not allow them to get to the second step in their plan. This was obviously before anybody thought of large-vocabulary recognizers based on Hidden Markov Models or statistics-based guesses derived from language models.

Eventually it became necessary to take on syntax. At Michigan, sentences were spoken of as having a horizontal (syntagmatic) and a vertical (paradigmatic) dimension. In its horizontal aspect, a sentence could be seen as a sequence of positions. In its vertical aspect, each position could be associated with a set of potential occupants of that position.

In the English Department at Michigan, Charles Fries was constructing a grammar of English that was liberated from traditional notions of nouns and verbs and adjectives, counting on purely distributional facts to discover the relevant word classes. In the Linguistics Department, Kenneth Pike was elaborating an extremely ambitious view of language in which, at every level of structure, one could speak of linear sequences of positions, labeled roles naming the functions served by the occupants of these positions, and defined sets of the potential occupants (Pike's preliminary manuscripts appeared in the 1950s and were eventually published as Pike [1967]). Slots, roles, and fillers—it was all very procedural.

In the midst of all this, something big happened, and suddenly everything changed. I was among the first in Ann Arbor to read Syntactic Structures (Chomsky 1957). I became an instant convert, and I gave up all ideas of procedural linguistics. The new view was something like this:

- The grammar of sentences is more than a set of linear structures separately learned.
- Sentences are generated by hierarchically organized phrase-defining rules.
- Regularities in the grammar are evidence for rules in the minds of the speakers.
- The existence of a variety of sentence types is accounted for in terms of the application of rules that move things within, add them to, or delete them from, initial representations.
- There is no procedural way to learn how language is structured; the linguist's job is to figure out what rules reside in the minds of speakers.
- Therefore, linguistics is theory construction.

The Chomskyan view flourished; universities that didn't have linguistics programs wanted one. After I finished my degree I joined William S.-Y. Wang in the brand new program at The Ohio State University in Columbus. During my decade at Ohio State I was completely committed to the new paradigm. Robert Lees, Chomsky's first student, visited Ohio State for a time, and I spent lots of time talking to him, working on questions of rule ordering and conjunction.
While discussing things with him, I wrote a paper on “embedding rules in a transformational grammar” that was the first statement of the transformational cycle (Fillmore 1963).

The view represented in Chomsky's Aspects of the Theory of Syntax (Chomsky 1965), with its sharp separation of deep structure and surface structure, became the mainstream, and I worked within it faithfully, participating eagerly in efforts to combine all the rules the young syntacticians had been writing into a single coherent grammar of English, an effort heavily supported, for some reason, by the U.S. Air Force. During this period I felt I knew what to do, and I believed that I understood everything that everybody else in the framework was doing. That feeling didn't last very long.

At one point I did a seminar in which a small group of students and I worked our way through Lucien Tesnière's Éléments de Syntaxe Structurale (Tesnière 1959), without necessarily understanding everything in it, and I became aware of a different way of organizing and representing linguistic facts. Anyone who looks closely at syntax learns very quickly that you can never represent everything about a sentence in a single diagram. Tesnière, my first exposure to what evolved later on into dependency grammar, made me aware of the impossibility of displaying simultaneously the functional relations connecting the words in the sentence, the left-to-right sequence of words as the sentence is spoken, and the grouping of words into phonologically integrated phrases.

As an extreme example of the kinds of information a Tesnière-style dependency tree could contain, I offer you his analysis of a complex sentence from the Latin of Cicero. I'm certain many of you will remember this from your high school studies: Est enim in manibus laudatio quam cum legimus quem philosophum non contemnimus? (“There is in our hands an oration, which when we read (it), which philosopher do we not despise?”) It has roughly the same structure as “Here's a sentence, while reading which, who wouldn't get confused?”

Figure 2 presents the diagram, but I'll only point out the connections assigned to one word in it, the relative pronoun quam. Instead of having lines pointing to a single token of the word, Tesnière breaks the word quam into two pieces connected by the broken line at the bottom. The word agrees with laudatio in gender and number, and that connection is indicated by the upper broken line; it is the marker of the relative clause headed by contemnimus, as shown in the horizontal structure it is hanging from; and it is the direct object of legimus, bottom right. This diagram shows more than simple dependency relations, and uses various ingenious tricks and decorations to smuggle in other kinds of facts. The word-to-word connections are shown, but it's really clear that a system for projecting from such a diagram to a linear string of words spread into phonologically separable phrases has to be incredibly complex.

The fact that dependency diagrams do not show the linear organization of the constituent words was presented by me as a representational problem, but in fact Tesnière uses precisely this separation to propose a typology of languages according to whether they tend to order dependents before heads or heads before dependents, and whether within each language these tendencies vary within different kinds of constructions. In a centripetal language the dependents precede the head; in a centrifugal language the head precedes the dependents.
There are extreme and moderated varieties of each of these in his scheme.

Tesnière also described a number of conjoined structures in French for which he used the terminology of embryological mistakes, one kind being monsters that have one head and more than one tail. In general these correspond to Verb Gapping in our terms (John likes apples and Mary oranges). Another kind of embryological mistake has more than one head and a single tail, like Right Node Raising (John likes and Mary detests anchovies), and the most monstrous of all are capital H-shaped monsters with two heads and two tails, like the kinds of sentences Paul Kay and Mary Catherine O'Connor and I played with in a paper (Fillmore, Kay, and O'Connor 1988) on “let alone” (I wouldn't touch, let alone eat, shrimp, let alone squid). I think these phenomena have more to do with sequencing patterns than with dependency relations, but I found it interesting that Tesnière delighted in exploring these kinds of structural complexities. (My sensitivity to tone in French prose isn't good enough to know whether in these descriptions of syntactic monsters Tesnière was having fun. I'm not helped in that uncertainty by photographs I've seen of the man.)

I ended up favoring phrase structure representations, partly because dependency representations have no easy way to identify a predicate or verb phrase (VP) constituent, and I'd like to believe that the VP can in general be treated as naming a familiar category (eating meat, parking a car, being breakable, etc.). But I mainly preferred phrase-structural representations because they offer more material upon which to assign intonational contours.

When linguists turned to the predicate calculus as a representation for sentence meaning, many were interested mainly in quantification and negation, where it's possible to show how complex logical structures can be formulated in ways that pay no attention to the actual meanings of the words that name either the predicates or the arguments. I, however, was specifically interested in the inner structure of the predicates themselves. So I encountered a representational problem when working with the notation that was common at the time.

When working on meaning, linguists often used prefix notation, allowing the ordered list of symbols following the name of the predicate to stand for the “-arity”—the number of arguments—of the particular predicate. Thus P(a) could represent an adjective like hungry or a verb like vanish; P(a,b), relating two things to each other, could stand for an adjective like different or a verb like love; and P(a,b,c), with three arguments, could stand for an adjective like intermediate or a verb like give, show, or tell. This notation also allowed one to represent cases in which the arguments could themselves be predications, permitting recursion.

While working with the prefix notation I was struck by the fact that although this representation afforded one the chance to make claims across diverse classes of predicates, it simultaneously obscured certain information about the arguments of those predicates—important semantic commonalities about classes of arguments.

There are centuries-old traditions by which schoolteachers explain that the subject names the agent in an event and the object tells us what is affected by the agent's actions, but it's trivially easy to find examples that show that such generalizations don't hold.
Similarly, in a predicate–argument formula, there is nothing meaningful about being the first or second or third item in a list. Does it make sense to let the position in an ordered list represent the semantic role of an argument in a predication? Consider the following examples, in which arguments are interchanged:

(1) He blamed the accident on me. ↔ He blamed me for the accident.
(2) He strikes me as a fool. ↔ I regard him as a fool.
(3) Chuck bought a car from Jerry. ↔ Jerry sold a car to Chuck.

In Example (1) the second and third arguments of blame are interchanged in their grammatical realization. In Example (2), with the pair strike and regard, the first and second arguments are interchanged. And in Example (3), with buy and sell, the first and the third are interchanged.

I felt that there ought to be some way of recognizing the sameness of the semantic functions of these arguments independently of where they happen to be sitting in an ordered list. An alternative was spelled out in a rambling paper called “The Case for Case,” published in 1968 (Fillmore 1968). It proposed a universal list of semantic role types (“cases”). Configurations of these cases could then characterize the semantic structures of verb and adjective meanings. In this way, lexical predicates could be shown as differing according to the collection of cases that they required (obligatory) or welcomed (optional).

The theory embedded in this view is that semantic relations (“deep cases”) are directly linked to argument meanings. (So in the sentence John gave Mary a rose, John is the Agent, Mary is the Recipient, and a rose is the transmitted Object.) Grammatical roles (subject, object) and markings (choice of preposition, etc.) are predicted from case configurations. (So the Agent could be the subject, the Object could be the direct object, and the Recipient could be introduced with the preposition to.) Generalizations are formulated in terms of specific named cases, for which a hierarchy is defined, and the list of cases is finite and universal.

The variable “valences” (a term from Tesnière) of a single verb can be explained in terms of the cases available to it. The starting examples in this discussion were with the verb open. Its valences correlate with the cases available to it:

(4) The Agent > Instrument > Object hierarchy, illustrated with the verb open:
    O     = The door opened
    A O   = I opened the door
    I O   = The key opened the door
    A I O = I opened the door with the key

The occupants of nuclear syntactic slots (subject and object) are determined by the hierarchy; the rest are marked by prepositions (or in the case of arguments whose shape is a VP or a clause, various
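To make Example (4) concrete, here is a small, hypothetical Python sketch of the linking idea just described: a use of open supplies a configuration of deep cases, the Agent > Instrument > Object hierarchy picks the subject, a remaining Object surfaces as the direct object, and anything left over would need prepositional marking. The data layout and the link function are my own illustration, not notation from the paper.

```python
# Illustrative sketch of case-hierarchy linking for "open" (Example 4):
# the highest-ranked case that is present becomes the subject; a remaining
# Object case becomes the direct object; leftover cases (e.g. Instrument,
# which would take "with") are returned as obliques without modeling the
# choice of preposition.

HIERARCHY = ["Agent", "Instrument", "Object"]

def link(cases):
    """cases: dict from case names to fillers, e.g. {'Agent': 'I', 'Object': 'the door'}."""
    remaining = dict(cases)
    subject_case = next(c for c in HIERARCHY if c in remaining)
    subject = remaining.pop(subject_case)
    direct_object = remaining.pop("Object", None)
    return subject, direct_object, remaining

print(link({"Object": "the door"}))
# ('the door', None, {})                        -> The door opened
print(link({"Agent": "I", "Object": "the door"}))
# ('I', 'the door', {})                         -> I opened the door
print(link({"Instrument": "the key", "Object": "the door"}))
# ('the key', 'the door', {})                   -> The key opened the door
print(link({"Agent": "I", "Instrument": "the key", "Object": "the door"}))
# ('I', 'the door', {'Instrument': 'the key'})  -> I opened the door with the key
```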