
Chapter Four

Computationalist Linguistics

The idea that the human brain might just be a computer, to be actualized via the Promethean project of building a computer that would replicate the operations of a human brain, entailed a real (if, in part, sub rosa) investigation of what exactly was meant by “the human brain” and more pointedly “the human mind” in the first place. Such questions have been the substance of philosophical inquiry in every culture, not least among these the Western philosophical tradition, and engagement with these analytic discourses could not likely have produced any kind of useful working consensus with which to move forward.1 Rather than surveying these discourses, then, and in a fashion that comes to be characteristic of computationalist practice, the figures we associate with both the mechanical and intellectual creation of computers—including Turing, von Neumann (1944), Shannon, and Konrad Zuse (1993)—simply fashioned assertions about these intellectual territories that meshed with their intuitions. These intuitions in turn reveal a great deal about computationalism as an ideology—not merely its shape, but the functions it serves for us psychologically and ideologically.

Perhaps the easiest and most obvious way to show that a computer was functioning like a human brain would be to get the computer to produce its results the same way a human being (apparently) does: by producing language. This was so obvious to early computer scientists as hardly to need explicit statement, so that in Turing’s 1950 paper “Computing Machinery and Intelligence,” his now-famous Test simply stipulates that the computer can answer free-form English questions; Turing does not even suggest that this might be difficult or, as arguably it is, altogether intractable. Philosophers and linguists as widely varied as Putnam and Chomsky himself might suggest that the ability to use language is thoroughly implicated in any notion of intelligence, and that the bare ability to ask and answer free-form English questions would inherently entail part or even all of human intelligence, even if this was all the machine could do. From one perspective, Turing’s Test puts the cart before the horse; from another it masks the difficulty in defining the problem Turing wants to solve.

On grounds both philosophical and linguistic, there are good reasons to doubt that computers will ever “speak human language.” There are substantial ways in which this project overlaps with the project to make computers “display human intelligence,” and is untenable for similar reasons. Perhaps because language per se is a much more objective part of the social world than is the abstraction called “thinking,” however, the history of computational linguistics reveals a particular dynamism with regard to the data it takes as its object—exaggerated claims, that is, are frequently met with material tests that confirm or disconfirm theses. Accordingly, CL can claim more practical successes than can the program of Strong AI, but at the same time demonstrates with particular clarity where ideology meets material constraints.

Computers invite us to view languages on their terms: on the terms by which computers use formal systems that we have recently decided to call languages—that is, programming languages. But these closed systems, subject to univocal, correct, “activating” interpretations, look little like human language practices, which seem not just to allow but to thrive on ambiguity, context, and polysemy. Inevitably, a strong intuition of computationalists is that human language itself must be code-like and that ambiguity and polysemy are, in some critical sense, imperfections. Note that it is rarely if ever linguists who propose this view; there is just too much in the everyday world of language to let this view stand up under even mild critique. But language has many code-like aspects, and to a greater or lesser extent research programs have focused on these. Much can be done with such systems, although the ability to interact at even a “syntactically proper” level (seemingly the stipulated presupposition of Turing’s test) with a human interlocutor remains beyond the computer’s ability. Computers can aid human beings in their use of language in any number of ways. At the same time, computers carry their own linguistic ideologies, often stemming from the conceptual-intellectual base of computer science, and these ideologies even today shape a great deal of the future direction of computer development.

Like the Star Trek computer (especially in the original series; see Gresh and Weinberg 1999) or the HAL 9000 of 2001: A Space Odyssey, which easily pass the Turing Test and quickly analyze context-sensitive questions of knowledge via a remarkable ability to synthesize theories over disparate domains, the project of computerizing language itself has a representational avatar in popular culture. The Star Trek “Universal Translator” represents our Utopian hopes even more pointedly than does the Star Trek computer, both for what computers will one day do and what some of us hope will be revealed about the nature of language. Through the discovery of some kind of formal principles underlying any linguistic practice (not at all just human linguistic practice), the Universal Translator can instantly analyze an entire language through just a few sample sentences (sometimes as little as a single brief conversation) and instantly produce flawless equivalents across what appear to be highly divergent languages. Such an innovation would depend not just on a conceptually unlikely if not impossible technological production; it would require something that seems both empirically and conceptually inconceivable—a discovery of some kind of formal engine, precisely a computer, that is dictating all of what we call language far outside of our apparent conscious knowledge of language production. In this way the question whether a computer will ever use language like humans do is not at all a new or technological one, but rather one of the oldest constitutive questions of culture and philosophy.

Cryptography and the History of Computational Linguistics

Chomsky’s CFG papers from the 1950s served provocatively ambivalent institutional functions. By putting human languages on the same continuum as formal languages, Chomsky underwrote the intuition that these two kinds of abstract objects are of the same metaphysical kind. Chomsky himself dissented from the view that this means that machines would be able to speak human languages (for reasons that in some respects have only become clear quite recently); but despite this opinion and despite Chomsky’s explicit dissent, this work served to underwrite and reinforce the pursuit of CL. More specifically, and famously, Chomsky’s 1950s work was funded by DARPA specifically for the purposes of Machine Translation (MT), without regard for Chomsky’s own repeated insistence that such projects are not tenable.2 Given Chomsky’s tremendous cultural influence, especially over the study of language, it is remarkable that his opinion about this subject has been so roundly disregarded by practitioners.

In at least one way, though, Chomsky’s work fit into a view that had been advocated by computer scientists (but rarely if ever by linguists) prior to the 1950s. This view itself is quite similar to Chomsky’s in some respects, as it also hinges on the equation between formal “languages” and human languages. The intellectual heritage of this view stems not from the developers of formal languages (such as Frege, Russell, Husserl, and possibly even Peano and Boole), who rarely if ever endorsed the view that these rule sets were much like human language. Most famously, formal systems like Peano logic emerge from the study of mathematics and not from the study of language, precisely because mathematical systems, and not human languages, demand univocal interpretation. Formal logic systems are defined as systems whose semantics can be rigidly controlled, such that ambiguity only persists in the system if and when the logician chooses this condition. Otherwise, as in mathematics, there is only one meaningful interpretation of logical sentences.

The analogy between formal languages and human languages stems not from work in formal logic, since logicians usually saw the far gap between logic and language. But the early computer engineers—Turing, Shannon, Warren Weaver, even the more skeptical von Neumann (1958)—had virtually no educational background in language or linguistics, and their work shows no sign of engaging at all with linguistic work of their day. Instead, their ideas stem from their observations about the computer, in a pattern that continues to the present day. The computer does not just run on formal logic, via the Turing machine model on which all computers are built; famously, among the first applications of physical computers was to decode German military transmissions. Because these transmissions were encoded language and the computer served as an almost unfathomably efficient decoder, some engineers drew an analogy between decoding and speaking: in other words, they started from the assumption that human language, too, must be a code.

Two researchers in particular, Shannon and Weaver, pursued the computationalist intuition that language must be code-like, and their work continues to underwrite even contemporary CL programs (Shannon’s work in particular remains extremely influential; see especially Shannon and Weaver 1949; and also Shannon 1951 on a proto-CL problem). Weaver, a mathematician and engineer who was in part responsible for popularizing Shannon’s views about the nature of information and communication, is the key figure in pushing forward CL as an intellectual project. In a pathbreaking 1955 volume, Machine Translation of Languages (Locke and Booth 1955), Weaver and the editors completely avoid all discussion of prior analysis of language and formal systems, as if these fields had simply appeared ex nihilo with the development of computers. In the foreword to the volume, “The New Tower,” Weaver writes:

Students of languages and of the structures of languages, the logicians who design computers, the electronic engineers who build and run them—and especially the rare individuals who share all of these talents and insights—are now engaged in erecting a new Tower of Anti-Babel. This new tower is not intended to reach to Heaven. But it is hoped that it will build part of the way back to that mythical situation of simplicity and power when men could communicate freely together. (Weaver 1949, vii)

Weaver’s assessment is not strictly true—many of the students of language and the structures of language have never been convinced of the possibility of erecting a new “Tower of Anti-Babel.” Like some computationalists today, Weaver locates himself in a specifically Christian eschatological tradition, and posits computers as a redemptive technology that can put human beings back into the prelapsarian harmony from which we have fallen. Our human problem, according to this view, is that language has become corrupted due to ambiguity, polysemy, and polyvocality, and computers can bring language back to us, straighten it out, and eliminate the flaws that are to blame not just for communicative difficulties but for the loss of the “simplicity and power” whose recovery would bring about significant political change.

Despite Weaver’s assessment, few linguists of note contributed to the 1955 volume (the only practicing linguist among them is Victor Yngve, an MIT Germanicist who is most famous for work in CL and natural language processing, referred to as NLP). In an “historical introduction” provided by the editors, the history of MT begins abruptly in 1946, as if questions of the formal nature of language had never been addressed before. Rather than surveying the intellectual background and history of this topic, the editors cover only the history of machines built at MIT for the express purpose of MT. The book itself begins with Weaver’s famous, until-then privately circulated “memorandum” of 1949, here published as “Translation”; the memorandum had circulated among many computer scientists of the time, some of whom dissented from its conclusions even then.3 At the time Weaver was president of the Rockefeller Foundation, and tried unsuccessfully to enlist major figures like Norbert Wiener, C. K. Ogden, Ivor Richards, Vannevar Bush, and some others in his project (see Hutchins 1986, 25-27). In contemporary histories we are supposed to see these figures as being short-sighted, but it seems equally plausible that they saw the inherent problems in Weaver’s proposal from the outset.

Despite the widespread professional doubt about Weaver’s approach and intuition, his memorandum received a certain amount of public notoriety. “An account appeared in Scientific American in December 1949 . . . This in turn was picked up by the British newspaper the News Chronicle in the spring of 1950, and so appeared the first of what in coming years were to be frequent misunderstandings and exaggerations” (Hutchins 1986, 30). Such exaggerations continue to the present day, when prototype or model systems confined to narrow domains are publicly lauded as revolutions in MT. Despite the real limitations of the most robust MT systems in existence (easily accessed today via Google’s translation functions), there is a widespread sense in the popular literature that computers are close to handling human language in much the same way humans do.

Weaver’s memorandum starts from a cultural observation: “a multiplicity of languages impedes cultural interchange between the peoples of the earth, and is a serious deterrent to international understanding” (15). Such a view can only be characterized as an ideology, especially since Weaver provides no support at all for it—and it is a view that runs at odds with other views of politics. The view becomes characteristic of computationalist views of language: that human society as a whole is burdened by linguistic diversity, and that political harmony requires the removal of linguistic difference. We might presume from this perspective that linguistic uniformity leads to political harmony, or that political harmony rarely coexists with linguistic diversity—both propositions that can easily be checked against the historical record, and neither of which finds much support there.

Like Vannevar Bush in his famous article proposing the Memex (Bush 1945), Weaver reflects in large part on the role computers have played in the Allied victory in World War II. Not only is there concern about the tremendous destructive power loosed by the atomic bomb, which fully informed Bush’s desire to turn U.S. scientific and engineering prowess toward peaceful ends; there is a related political concern, expressed in terms of linguistic diversity, which had been exposed by the computational infrastructure used in the service of the Allied powers in World War II. Both the exposure and discovery of atomic power and of computational power seem to have opened an American vision into an abyss of power, an access to what Deleuze and Guattari would call a War Machine that worries these previously neutral and “objective” scientists. Of course computers and computation played a vital role in the development of atomic power, and in the writings of Weaver and Bush (and others) we see a kind of collective guilt about the opening of a Pandora’s Box whose power cannot be contained by national boundaries or even political will—and this Pandora’s Box includes not just the atomic bomb but also the raw and dominating power of computation.

The most famous part of Weaver’s memorandum suggests that MT is a project similar to cryptanalysis, one of the other primary uses for wartime computing. Since cryptanalysis seems to involve language, it may be natural to think that the procedures used to replicate the German Enigma coding device (via Turing’s work with the aptly named “Bombe” computer at Bletchley Park) might also be applicable to the decoding of human language. Of course, in actuality, neither Enigma nor the Bombe played any role in translating, interpreting, or (metaphorically) decoding language: instead, they were able to generate statistically and mathematically sophisticated schemes for hiding the intended linguistic transmission, independent in every way of that transmission. Neither Enigma nor the Bombe could translate; instead, they performed properly algorithmic operations on strings of codes, so that human interpreters could have access to the underlying natural language.
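The distinction at stake here can be made concrete with a deliberately simple sketch. The Caesar cipher below is far cruder than Enigma and is offered purely as an illustration, not as a model of the Bombe’s actual mechanism: the point is that decoding is character-by-character arithmetic in which the meaning of the recovered text plays no role whatsoever.

```python
def caesar_decode(ciphertext: str, shift: int) -> str:
    """Undo a Caesar cipher by shifting each letter back.

    The operation is pure symbol manipulation: the semantics of the
    underlying message never enters the computation, which is exactly
    why decoding is algorithmically tractable in a way translation
    is not.
    """
    result = []
    for ch in ciphertext:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            result.append(chr((ord(ch) - base - shift) % 26 + base))
        else:
            result.append(ch)  # spaces and punctuation pass through
    return "".join(result)


# "DWWDFN DW GDZQ" is "ATTACK AT DAWN" shifted forward by 3.
print(caesar_decode("DWWDFN DW GDZQ", 3))  # ATTACK AT DAWN
```

The decoder succeeds whether the plaintext is German, English, or nonsense; a human interpreter is still needed to make anything of the output, which is the asymmetry the chapter describes.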

Weaver’s intuition, along with those of his co-researchers at the time, therefore begins from what might be thought an entirely illegitimate analogy between code and language, an analogy that recalls Chomsky’s language hierarchy, in which codes are not at all dissimilar from the kinds of formal logic systems Chomsky proves are not like human language. Thus it is not at all surprising that intellectuals of Weaver’s day were highly skeptical of his project, along lines that Weaver dismisses with a certain amount of hubris. In the 1949 memorandum Weaver quotes correspondence he had with Norbert Wiener (whose own career reveals, in fact, a profound knowledge of and engagement with human language).4 Weaver quotes from a private letter written by Wiener to him in 1947:

As to the problem of mechanical translation, I frankly am afraid the boundaries of words in different languages are too vague and the emotional and international connotations are too extensive to make any quasimechanical translation scheme very hopeful .... At the present time, the mechanization of language, beyond such a stage as the design of photoelectric reading opportunities for the blind, seems very premature. (Wiener, quoted in Weaver 1955, 18)

Weaver writes back to Wiener that he is “disappointed but not surprised by” Wiener’s comments on “the translation problem,” in part for combinatorial (i.e., formal) reasons: “suppose we take a vocabulary of 2,000 words, and admit for good measure all the two-word combinations as if they were single words. The vocabulary is still only four million: and that is not so formidable a number to a modern computer, is it?” (18). Weaver writes that Wiener’s response “must in fact be accepted as exceedingly discouraging, for, if there are any real possibilities, one would expect Wiener to be just the person to develop them” (18-19). Rather than accepting Wiener’s intellectual insights as potentially correct, though—and it is notable how exactly correct Wiener has been about the enterprise of CL— Weaver turns to the work of other computer scientists (especially the other contributors to the 1955 volume), whose computational intuitions have led them to experiment with mechanical translation schemes.
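Weaver’s arithmetic can be made explicit. With a 2,000-word vocabulary, the ordered two-word combinations he proposes to admit “as if they were single words” number 2,000 squared, his “only four million.” What the argument waves away is that the same enumeration grows exponentially with phrase length, which is one reason listing combinations is hopeless for real sentences (the function name below is illustrative, not Weaver’s):

```python
def phrase_inventory(vocab_size: int, max_len: int) -> list:
    """Count the distinct ordered word sequences of each length from 1
    to max_len, on Weaver's implicit assumption that every combination
    must be enumerated and stored."""
    return [vocab_size ** n for n in range(1, max_len + 1)]


# Weaver's own numbers: 2,000 words, plus all two-word combinations.
counts = phrase_inventory(2_000, 5)
print(counts[1])  # 4000000 -- Weaver's "still only four million"

# But the same logic applied to five-word phrases:
print(counts[4])  # 32000000000000000
```

Even granting Weaver’s framing, the combinatorics collapse a few words beyond the case he chose to compute.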

Weaver’s combinatoric argument fails to address Wiener’s chief points, namely that human language is able to manage ambiguity and approximation in a way quite different from the way that computers handle symbols. The persistent belief that philosophical skeptics must be wrong about the potential for machine translation is characteristic of computational thinking from the 1950s to the present. Only Claude Shannon himself—again, a dedicated scientist and engineer with limited experience in the study of language—is accorded authority by Weaver, so that “only Shannon himself, at this stage, can be a good judge of the possibilities in this direction”; remarkably, Weaver suggests that “a book written in Chinese is simply a book written in English which was coded into the ‘Chinese Code’ ” (22). Since we “have useful methods for solving any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?”

This perspective seems to have much in common with Chomsky’s later opinions about the universal structure of all languages, so that the approach Weaver finds “most promising” is “an approach that goes so deeply into the structure of languages as to come down to the level where they exhibit common traits” (23). This launches Weaver’s famous metaphor of language as a kind of city of towers:

Think, by analogy, of individuals living in a series of tall closed towers, all erected over a common foundation. When they try to communicate with each other, they shout back and forth, each from his own closed tower. It is difficult to make the sound penetrate even the nearest towers, and communication proceeds very poorly indeed. But, when an individual goes down his tower, he finds himself in a great open basement, common to all the towers. Here he establishes easy and useful communication with the persons who have also descended from their towers.

Thus may it be true that the way to translate from Chinese to Arabic, or from Russian to Portuguese, is not to attempt the direct route, shouting from tower to tower. Perhaps the way is to descend, from each language, down to the common base of human communication—the real but as yet undiscovered universal language—and then re-emerge by whatever particular route is convenient. (23, emphasis added)

This strange view, motivated by no facts about language or even a real situation that can be understood physically, nonetheless continues to inform computationalist judgments about language. Both Chomsky’s pursuit of a Universal Grammar and Fodor’s quest for a “language of thought” can be understood as pursuits of this “real but as yet undiscovered universal language,” a language that is somehow at once spoken and understood by all human beings and yet at the same time inaccessible to all contemporary human beings—again, positing an Ursprache from which mankind has fallen into the linguistic and cultural diversity that is held responsible for political disunity.

Even to Weaver, it is obvious that everyday language is not much like a conventional code. This prompts him to suggest two different strategies for using mechanical means to interpret language. First, because “there are surely alogical elements in language (intuitive sense of style, emotional content, etc.) . . . one must be pessimistic about the problem of literary translation” (22, emphasis in original). In fact, Weaver only proposes that a computer would be able to handle “little more than ... a one-to-one correspondence of words” (20). In “literary translation,” “style is important,” and “problems of idiom, multiple meanings, etc., are frequent” (20). This observation is offered with no argumentative support whatsoever, except for the idea that “large volumes of technical material might, for example, be usefully, even if not at all elegantly, handled this way,” even though, in fact, “technical writing is unfortunately not always straightforward and simple in style” (20). As an example, Weaver suggests that “each word, within the general context of a mathematical article, has one and only one meaning” (20).

Weaver’s assertion, as most linguists and literary critics know, does not mesh well with the observed linguistic facts. Many of the most common words in English are grammatical particles and “auxiliary verbs” such as “of,” “do,” “to,” “have,” and the like, which turn out to be notoriously difficult to define even in English dictionaries, and extraordinarily difficult to translate into other languages. Because each language divides up the spheres of concepts and experience differently, there is rarely a stable, one-to-one correspondence between any of these common words and those in other languages. One of Weaver’s favorite off-hand examples, Mandarin Chinese, poses some of these problems for mechanical translation. Because Chinese does not inflect verbs for tense or number, it is not possible to distinguish infinitive forms from inflected or tensed forms; there is no straightforward morphological distinction in Chinese, that is, between the English expressions to sleep and sleep. The hearer relies on context or time expressions to determine which meaning is intended, but there is no process of internal “translation” according to which we might determine which of these forms is “meant.” The mechanical translator from English to Chinese would either have to automatically drop the helper verb “to,” or presume that all English forms are equivalent to the uninflected Chinese verb. Neither option is accurate, and no Chinese text, no matter how controlled its domain, “avoids” the problem.
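The failure of Weaver’s “one-to-one correspondence of words” can be sketched in a few lines. The toy glossary below is a deliberately impoverished illustration, not a working translator: the content words have workable entries, but the infinitive marker “to” has no stable Chinese counterpart, so any fixed entry for it is already wrong.

```python
# A toy word-for-word "translator" in the spirit of Weaver's proposal.
# The glossary is illustrative, not serious lexicography; "?" marks the
# entry for which no one-to-one equivalent exists.
EN_TO_ZH = {
    "I": "我",
    "want": "想",
    "sleep": "睡觉",
    "to": "?",  # the English infinitive marker has no Chinese counterpart
}


def word_for_word(sentence: str) -> str:
    """Substitute each English word with its single glossary entry,
    exactly as Weaver's one-to-one scheme would."""
    return " ".join(EN_TO_ZH.get(word, "[?]") for word in sentence.split())


print(word_for_word("I want to sleep"))  # 我 想 ? 睡觉
```

Idiomatic Chinese here would simply be 我想睡觉, with the “to” dropped entirely; the mechanical substitution can neither drop it nor render it, which is precisely the dilemma described above.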

In his famous debate with John Searle, Jacques Derrida challenges Searle’s extension of J. L. Austin’s speech-act distinction between performative, constative, and other forms of linguistic expression, arguing that any instance of language might be understood as performative, or citational, or “iterative” (Derrida 1988; Searle 1977). Here we see an even more potent and influential version of Searle’s position, that some parts of language can be broken out as the purely logical or meaningful parts, leaving over the “stylistic” or “literary” elements. But what Wiener tried to argue in his response to Weaver is that all language is literary in this sense: language is always multiple and polysemous, even when we appear to be accessing its univocal parts.

The crux of Wiener’s and Weaver’s disagreement can be said to center just on this question of the univocality of any parts of language other than directly referential nouns. This disagreement rides on the second suggestion Weaver and other computationalists make, and which continues to inform much computational work to this day: namely, that human language is itself broken because it functions in a polysemous and ambiguous fashion, and that the solution is to rewrite our language so as to “fix” its reference in a univocal manner. One reason Weaver writes to Ogden and Richards is their work earlier in the century on a construct called “Basic English,” one of many schemes to create universal, standard languages (the effort also includes Esperanto) whose lack of referential ambiguity might somehow solve the political problem of human disagreement. Wiener points out in his letter to Weaver that “in certain respects basic English is the reverse of mechanical and throws upon such words as get a burden which is much greater than most words carry in conventional English” (quoted in Weaver 1949, 18). Weaver seems not to understand the tremendous importance of particles, copulas, and other parts of language that have fascinated linguists for centuries—or of the critical linguistic role played by ambiguity itself. Most English speakers, for example, use words like “to,” “do,” “have,” “get,” “like,” “go,” and so forth with tremendous ease, but on close questioning often cannot provide precise definitions of them. This would lead one to believe that precise semantics are not what human beings exclusively rely on for language function, even at a basic level, though of course this observation may be unsettling for those looking to language for its deterministic features.

Weaver’s principal intuition, that translation is an operation similar to decoding, was almost immediately dismissed even by the advocates of MT as a research program. In standard histories of MT, we are told that this approach was “immediately recognized as mistaken” (Hutchins 1986, 30), since the “computers at Bletchley Park were applied to cracking the cipher, not to translating the German text into English” (30). This conclusion emerges from engineering investigation into the problem, since few linguists or philosophers would have assented to the view that languages are codes; nevertheless, even today, a prominent subset of computationalists (although rarely ones directly involved in computer language processing) continue to insist that formal languages, programming languages, and ciphers are all the same kinds of things as human languages, despite the manifest differences in form, use, and meaning of these kinds of systems. Surely the mere fact that a particular word—language—is conventionally applied to these objects is not, in and of itself, justification for lumping the objects together in a metaphysical sense; yet at some level, the intuition that language is code-like underwrites not just MT but its successor (and, strangely, more ambitious) programs of CL and NLP.

From MT to CL and NLP

Weaver’s memo continues to be held in high esteem among computational researchers for two suggestions that have remained influential to this day. Despite the fact that MT has produced extremely limited results in more than 50 years of practice—as can be seen using the (generally state-of-the-art) mechanisms found in such easily accessed tools as Google and Babelfish— computationalists have continued to suggest that machines are not just on the verge of translating, but of actually using human language in an effective way. It is often possible to hear computationalists without firm knowledge of the subject asserting that Google’s translator is “bad” or “flawed” and that it could be easily “fixed,” when in fact, Google has devoted more resources to this problem than perhaps any other institution in history, and openly admits that it represents the absolute limit of what is possible in MT.5

With time and attention, CL and NLP projects have expanded in a range of directions, some more fruitful than others. While observers from outside the field continue to mistakenly believe that we are on the verge of profound progress in making computers “speak,” a more finely grained view of the issues suggests that the real progress in these fields is along several much more modest directions. Contemporary CL and NLP projects proceed along several different lines, some inspired by the work originally done by Weaver and other MT advocates, and other work inspired by the use of computers by linguists. Contemporary CL and NLP includes at least the following areas of inquiry:

- text-to-speech synthesis (TTS): the generation of (synthesized) speech from written text;
- voice recognition: the use of human speech for input into computers, often as a substitute for written/keyboarded text;
- part-of-speech tagging: the assignment of grammatical categories and other feature information in human language texts;
- corpus linguistics: the creation and use of large, computer-based collections of texts, both written and spoken;
- natural language generation (NLG) or speech production: the use of computers to spontaneously emit human language;
- conversational agents: the computer facility to interact with human interlocutors, but without the ability to spontaneously generate speech on new topics;
- CL of human languages: the search for computational elements in human languages;
- statistical NLP: the analysis of human language (often large corpora) using statistical methods;
- information extraction: the attempt to locate the informational content of linguistic expressions and to synthesize or collect them from a variety of texts.
The visible success of some of these programs (especially ones centered around speech synthesis and statistical analysis, neither of which has much to do with what is usually understood as comprehension; Manning and Schütze [1999] provide a thorough survey) leads to a popular misconception that other programs are on the verge of success. In fact, the extremely limited utility of many of these research programs helps to show why the others will likely never succeed; and the CL/NLP programs that avoid a hardcore computationalist paradigm (especially statistical NLP) help to show why the main intellectual program in these areas—namely, the desire to show that human language is itself computational, and simultaneously to produce a “speaking computer” and/or “universal translator”—is likely never to succeed. The work of some researchers relies exactly on the statistical methods once championed by Weaver to argue that some contemporary conceptions of language itself, especially the formal linguistics programs inspired by Chomsky, rely far too heavily on a computationalist ideology.
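What “statistical NLP” in the Shannon tradition actually computes can be shown in miniature. The sketch below counts adjacent word pairs, the raw material of n-gram methods; the corpus sentence is invented for illustration. Note that the computation models distribution, not meaning, which is why its successes say little about comprehension.

```python
from collections import Counter


def bigram_counts(text: str) -> Counter:
    """Count adjacent word pairs in a text.

    Frequency tables like this one underlie the statistical methods
    descended from Shannon's work on the predictability of English:
    they record which words co-occur, and nothing about what any of
    the words mean.
    """
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


corpus = "the cat sat on the mat and the cat slept"
counts = bigram_counts(corpus)
print(counts[("the", "cat")])  # 2
```

A model built from such counts can rank likely continuations of a phrase quite effectively, which is precisely the kind of modest, comprehension-free success the paragraph above describes.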

Two of the most successful CL programs are the related projects of TTS and voice recognition: one using computers to synthesize a human-like voice, and one using computers to substitute spoken input for written input. In neither case is any kind of engagement with semantics or even syntax suggested; contrary to the suggestions of Weaver and other early CL advocates, even the apparently formal translation of written language into spoken language, and vice versa, engages some aspects of the human language system that are at least in part computationally intractable. The simpler of the two programs is voice recognition; today several manufacturers, including Microsoft, IBM, and a specialist company, Nuance, produce extremely effective products of this sort, which are useful for people who can’t or prefer not to use keyboards as their main input devices. These programs come equipped with large vocabularies, consisting mainly of the natural language terms for common computer operations (such as “File,” “Open,” “New,” “Menu,” “Close,” etc.), along with the capability to add a virtually unlimited number of custom vocabulary items. Yet even the best of these systems cannot “recognize” human speech right out of the box with high accuracy. Instead, they require a fair amount of individualized training with the user. Despite the fact that software developers use statistical methods to pre-load the software with a wide range of possible variations in pronunciation, these systems always require user “tuning” to ensure that pitch, accent, and intonation are being read correctly. Unlike a human listening to another human being, the software cannot reliably perceive word boundaries without training. In addition, these systems require a huge amount of language-specific programming, much of which has not been shown to extend to languages and pronunciation styles not anticipated by the programmer.
There is no question of “comprehension” in these systems, despite the appearance that the computer does what the speaker says—any more than the computer understands what one types on a keyboard or clicks with a mouse. The voice recognition system does not know that “File—Open” means to open a file; it simply knows where this entry is on the computer menu, and then selects that option in just the same way the user does with a mouse.
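The point can be made concrete by reducing such a system to its essentials: the decoded phrase serves as a key into a table of actions, exactly as a mouse click selects a menu entry. The names below are hypothetical, and no real product works from so small a table, but the logic is the same:

```python
# A voice "command" system stripped to its core: table lookup.
# Nothing here models what "open" means; the string is only a key.
MENU_ACTIONS = {
    "file open": "show_open_dialog",
    "file close": "close_current_document",
    "file new": "create_blank_document",
}

def dispatch(decoded_speech):
    """Look up the recognized phrase; unknown phrases simply fail."""
    return MENU_ACTIONS.get(decoded_speech.lower(), "no_match")

print(dispatch("File Open"))    # → show_open_dialog
print(dispatch("please open"))  # → no_match
```

Note that a phrase as transparent to a human listener as “please open” produces no action at all: nothing in the system connects the words to their sense.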

TTS systems, also widely used and highly successful, turn out to present even more complex problems as research objects. Because the voice recognition user can control her vocabulary, she can ensure that much of her input will fall into the range of the software’s capability. TTS systems, unless the vocabulary is tightly controlled, must be able to pronounce any string they encounter, and even in well-controlled language practices like English this may include a vast range of apparently nonstandard usages, pronunciations, words, and spellings. In addition, for TTS systems to be comprehensible to human beings, it turns out that much of what had been understood (by some) as paralinguistic features—prosody, intonation, stops and starts, interruptions, nonlexical pauses, and so forth—must all be managed in some fashion; otherwise the system produces speech that sounds extremely mechanical and even robotic. Most such systems still today sound highly mechanical, as in the widely used Macintosh speech synthesizer or the systems used by individuals like Stephen Hawking, who are unable to speak. Even these systems, it turns out, do not produce speech in the way human beings do, but rather must use a pre-assembled collection of parts in a combinatorial fashion, often using actual recorded samples of human speech for a wide range of applications that directly generated computer speech can’t manage.
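The “pre-assembled collection of parts” can be sketched abstractly: a concatenative synthesizer stitches together stored units, which is why coverage gaps and audible seams, rather than comprehension, are its characteristic problems. In this toy sketch the “recordings” are just labeled strings standing in for audio, and real systems select among thousands of context-dependent units rather than whole words:

```python
# Toy concatenative synthesis: each word maps to a stored "recording"
# (a label standing in for audio samples). Output is a concatenation
# of parts; words outside the inventory expose the coverage problem.
UNIT_INVENTORY = {
    "hello": "[hello.wav]",
    "world": "[world.wav]",
}

def synthesize(text):
    units = []
    for word in text.lower().split():
        # A unit either exists in the inventory or must be faked,
        # e.g. by falling back to letter-by-letter spelling.
        units.append(UNIT_INVENTORY.get(word, f"[spell:{word}]"))
    return "".join(units)

print(synthesize("hello world"))  # → [hello.wav][world.wav]
print(synthesize("hello Janet"))  # → [hello.wav][spell:janet]
```

Nothing in such a pipeline resembles speaking as humans do it; the system recombines frozen fragments, and everything not frozen in advance must be approximated.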

Among the best-studied and most interesting issues in TTS is the question of intonation. While in some languages (such as Chinese, Thai, and Vietnamese) formalized tones are a critical feature of language “proper,” other levels of intonation—roughly, the tunes we “sing” while we are speaking, such as a rising intonation at the end of questions—are found in all languages. Human speakers and hearers use these tonal cues constantly in both production and reception, but the largely analog nature of this intonation and its largely unconscious nature have made it extremely difficult to address, let alone manage, in computational systems. One contemporary researcher, Janet Pierrehumbert, has focused on the role of intonation in a general picture of phonology, and while she makes extensive use of computers for analysis, her work helps to show why full-scale TTS poses significant problems for computational systems:

Intonational phrasing is an example of “structure.” That is, an intonation phrase specifies that certain words are grouped together, and that one of these words is the strongest prosodically. The intonation phrase does not in and of itself specify any part of the content. Rather, it provides opportunities to make choices of content. For each intonation phrase, the speaker selects not only the words, but also a phrasal melody. The phonetic manifestations of the various elements of the content depend on their position with respect to the intonational phrasing. (Pierrehumbert 1993, 261)

The intonat
