An Introduction to Correspondence Analysis
2010; Volume: 12; Language: English
DOI: 10.3888/tmj.12-4
ISSN: 1097-1610
Topic(s): Statistical Methods and Applications
Abstract

Cross tabulations (also known as cross tabs or contingency tables) often arise in data analysis whenever data can be placed into two distinct sets of categories. In market research, for example, we might categorize purchases of a range of products made at selected locations; in medical testing, we might record adverse drug reactions according to symptoms and whether the patient received the standard or placebo treatment. The statistical technique presented in this article, correspondence analysis, provides a means of graphically representing the structure of cross tabulations so as to shed light on the underlying mechanisms. The article provides a practical introduction to correspondence analysis in the form of a "five-finger exercise" in textual analysis: identifying the author of a text given samples of the works of likely candidates.

1. Introduction

Correspondence analysis is a statistical technique that provides a graphical representation of cross tabulations (also known as cross tabs or contingency tables). Cross tabulations arise whenever it is possible to place events into two or more different sets of categories, such as product and location for purchases in market research, or symptom and treatment in medical testing. This article provides a brief introduction to correspondence analysis in the form of an exercise in textual analysis: identifying the author of a text based on an examination of its characteristics. The exercise is carried out using Mathematica (Version 5.2).

Perhaps the most illustrious exponent of textual analysis is the self-styled "literary detective" Donald Foster, whose 2001 book [1] describes how he identified the authors of several anonymous works, including the best-selling roman-à-clef Primary Colors [2], which satirized the 1992 Clinton presidential campaign. Foster's methodology examines a broad spectrum of text characteristics, including word choice, punctuation, and grammatical structure. The aim of the exercise in this article is to emulate Foster, though naturally the literary aspects of the approach taken are much more basic. The intent is not to describe a realistic method of textual analysis, but rather to use textual analysis to illustrate correspondence analysis.
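
To make the idea of a cross tabulation concrete for the textual setting, here is a minimal sketch in Python (not the Mathematica code used in the article). It counts how often a few common words occur in each text sample, giving one row per sample and one column per word. The sample texts, the author labels, the word list, and the helper name `cross_tab` are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical samples: made-up snippets standing in for texts by candidate authors.
samples = {
    "author_a": "the cat sat on the mat and the dog sat by the door",
    "author_b": "a cat and a dog and a bird met in a garden of the town",
}

# Hypothetical column categories: a handful of common function words.
words = ["the", "a", "and", "of"]


def cross_tab(samples, words):
    """Return {row_label: [count of each word in `words`]} for every sample."""
    table = {}
    for label, text in samples.items():
        counts = Counter(text.lower().split())  # word frequencies in this sample
        table[label] = [counts[w] for w in words]  # missing words count as 0
    return table


table = cross_tab(samples, words)

# Print the table: one row per sample, one column per word.
print("sample", words)
for label, row in table.items():
    print(label, row)
```

A table of this kind, with rows for text samples and columns for text characteristics, is the sort of input that correspondence analysis represents graphically.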
References