Article | Open Access | Peer-Reviewed

Assessing Students for Medical School Admissions: Is It Time for a New Approach?*

2008; Lippincott Williams & Wilkins; Volume: 83; Issue: Supplement; Language: English

10.1097/acm.0b013e318183e586

ISSN

1938-808X

Authors

Robert J. Sternberg

Topic(s)

Diversity and Career in Medicine

Abstract

Our methods for admitting students, whether to medical school or to undergraduate and graduate school, are rather archaic. We use tests that we know are only modestly to moderately predictive of success in medical school and that measure content covering only a small fraction of the skills necessary for success as a medical professional. Might there be more effective ways of devising assessments to help us make admissions decisions? List 1 illustrates some potential bases for essays that might be used in medical school admissions. These items have not yet been tried out, but readers who wish to experiment with them are welcome to do so. Why do we believe that items such as these will provide incremental validity over traditional measures in predicting success in medical school and, later, on the job?

[List 1: Sample Questions for Medical School Admissions]

In this article, I first describe the theory of successful intelligence, which has served as a basis for a number of our admissions assessments, and then I present some studies that have tested the theory in practice.

The Theory of Successful Intelligence

The theory of successful intelligence suggests that students' failures to achieve at a level that matches their potential often result from teaching and assessments that are narrow in conceptualization and rigid in implementation.1,2 The ways in which teachers teach do not always match the ways in which students learn. Traditional methods, in essence, shine a spotlight on a small number of students with certain ability-based styles, and they almost never focus on the large number of students who have the ability to succeed but whose ability-based styles do not correspond to the patterns of learning and thinking valued by the schools. To rectify this situation, one must value other ability-based styles and then change teaching and assessment so that these other ability patterns can lead to success in school.

According to the proposed theory, successful intelligence is (1) the use of an integrated set of abilities needed to attain success in life, however an individual defines it, within his or her sociocultural context. People are successfully intelligent by virtue of (2) recognizing their strengths and making the most of them, while at the same time recognizing their weaknesses and finding ways to correct or compensate for them. Successfully intelligent people (3) adapt to, shape, and select environments through (4) a balanced use of their analytical, creative, and practical abilities.1,3,4

Underlying all three abilities is knowledge, because one cannot think analytically, creatively, or practically without knowledge stored in long-term memory to which one can apply one's thinking. People typically balance the three kinds of abilities: they need creative abilities to generate ideas, analytical abilities to determine whether they are good ideas, and practical abilities to implement the ideas and to convince others of their value. Most people who are successfully intelligent are not equal in these three abilities, but they find ways of making the three work together harmoniously and advantageously.

There are other theories that might serve as a basis for constructing new instruments. For example, Gardner5 has proposed a well-known theory of multiple intelligences, which posits that people can learn in different ways.
He has applied this theory to instruction and assessment.6

Studies Testing the Theory of Successful Intelligence for Use in Admissions Assessments

My collaborators and I have done several studies suggesting the efficacy of the theory of successful intelligence as a basis for admissions decisions.

The Rainbow Project

The Rainbow Project and related collaborations are fully described elsewhere by Sternberg and the Rainbow Project Collaborators.7–9 The Rainbow measures supplement the SAT, a three-hour examination currently measuring verbal comprehension and mathematical thinking skills, with a writing component to be added in the near future. A wide variety of studies have shown the utility of the SAT as a predictor of college success, especially as measured by grade-point average (GPA), and available data suggest reasonable predictive validity for the SAT in predicting college performance.10,11 Indeed, traditional intelligence or aptitude tests have been shown to predict performance across a wide variety of settings. But, as is always the case for a single test or type of test, there is room for improvement. The theory of successful intelligence1,3 provides one basis for improving prediction and, possibly, for establishing greater equity and diversity. It suggests, as noted above, that broadening the range of skills tested beyond analytical skills, to include practical and creative skills as well, might significantly enhance the prediction of college performance beyond current levels. Thus, the theory suggests not replacing but, rather, augmenting the SAT, the MCAT, or similar measures in the university admissions process. A collaborative team of investigators sought to study how successful such an augmentation could be.

In the Rainbow Project,7 data were collected at 15 schools across the United States, including eight 4-year colleges, five community colleges, and two high schools. The participants, who received either course credit or money, were 1,013 students predominantly in their first year of college or their final year of high school. In this report, only the analyses for college students are discussed, because they were the only participants for whom data on college performance were available. The final number of participants included in these analyses was 793. Baseline measures of standardized test scores and high school GPA were collected to evaluate the predictive validity of the tools currently used in college admissions and to provide a contrast for the new measures. Students' scores on standardized college entrance exams were obtained from the College Board.

Measuring analytical skills. The measure of analytical skills was provided by the SAT plus the analytical items of the Sternberg Triarchic Abilities Test (STAT).12 Items assessed learning meanings of words from context, number series, and figural matrices.

Measuring creative skills. Creative skills were measured by STAT multiple-choice items and by performance-based items. The multiple-choice items were novel analogies (involving counterfactual presuppositions), novel number operations, and figural series with mapping of terms into new domains. Creative skills were also measured using open-ended tasks: one required writing two short stories after selecting from among unusual titles, such as "The octopus's sneakers"; one required orally telling two stories based on choices of picture collages; and the third required captioning cartoons from among various options. Open-ended, performance-based answers were rated by trained raters for novelty, quality, and task-appropriateness. Multiple judges were used for each task, and satisfactory reliability was achieved.7
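The article reports only that interrater reliability was satisfactory, without specifying the index used. As an illustration of how such reliability across judges might be quantified, here is a minimal sketch of one common index, Cronbach's alpha computed over judges; the ratings matrix is hypothetical, not data from the study.

import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """Consistency of judges' ratings; rows = rated responses, columns = judges."""
    n_judges = ratings.shape[1]
    judge_vars = ratings.var(axis=0, ddof=1)      # variance of each judge's ratings
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of summed ratings
    return (n_judges / (n_judges - 1)) * (1 - judge_vars.sum() / total_var)

# Hypothetical example: five story responses rated on a 1-7 scale by three judges.
ratings = np.array([
    [6, 5, 6],
    [3, 4, 3],
    [7, 6, 7],
    [2, 2, 3],
    [5, 5, 4],
])
print(f"alpha = {cronbach_alpha(ratings):.2f}")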
Measuring practical skills. Multiple-choice measures of practical skills were obtained from the STAT. The multiple-choice items were practical problem solving, practical math, and route planning using maps. Practical skills were also assessed using three situational-judgment inventories: the Everyday Situational Judgment Inventory (Movies), the Common Sense Questionnaire, and the College Life Questionnaire, each of which taps a different type of tacit knowledge. The general format of tacit-knowledge inventories has been described by Sternberg et al,13 so only the content of the inventories used in this study is described here. The movies present everyday situations that confront college students, such as asking for a letter of recommendation from a professor who shows, through nonverbal cues, that he does not recognize the student very well. One then has to rate various options for how well they would work in response to each situation. The Common Sense Questionnaire provides everyday business problems, such as being assigned to work with a coworker whom one cannot stand. The College Life Questionnaire provides everyday college situations for which a solution is required.

Unlike the creativity performance tasks, the practical performance tasks did not give participants a choice of situations to rate. For each task, participants were told that there was no "right" answer and that the options described in each situation represented variations on how different people approach different situations.

An example of a creative item might be to write a story using the title "3,516" or "It's moving backward." Another example might show a collage of pictures in which people are engaged in a wide variety of activities helping other people; one would then orally tell a story that takes off from the collage. An example of a practical item might show a movie in which a student has just received a poor grade on a test. His roommate had a health crisis the night before, and the student had been up all night helping him. His professor hands back the test paper with a disappointed look on her face and suggests that he study harder next time. The movie then stops, and the student taking the assessment has to describe how he would handle the situation. Or, the student might receive a written problem describing a conflict with another individual with whom she is working on a group project. The project is getting mired in the interpersonal conflict, and the student has to indicate how she would resolve the situation to get the project done.

Administrative details. All materials were administered in one of two formats: 325 of the college students took the test in paper-and-pencil format, and 468 took it on the computer via the World Wide Web. Participants were tested either individually or in small groups. During the oral stories section, participants who were tested in groups either wore headphones or were directed into a separate room so as not to disturb the other participants during the story dictation.

Basic data. Considering the college students alone, the sample showed slightly higher mean SAT scores than those found in colleges across the country. The sample means on the SATs were, for two-year college students, 490 verbal and 508 math, and, for four-year college students, 555 verbal and 575 math. These means, although slightly higher than typical, were within the range of average college students.

There is always a potential concern about restriction of range in SAT scores when considering students from a select sample of universities, especially when the means run a bit high. Restriction of range means that one tests a narrower range of student skill levels than is representative of the entire population that takes the SAT. However, the sample was drawn from institutions of widely varying selectivity, from community colleges to highly selective four-year institutions. In fact, statistics assessing range showed that the sample ranged somewhat more widely than is typical for the test. Because there was no restriction of range, there was no need to correct for it.
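Although no correction was needed in this study, it may help to see what such a correction looks like. A standard approach is Thorndike's Case II formula, which estimates the correlation in the full applicant population from the correlation observed in a range-restricted sample. The sketch below is illustrative only, not part of the original analyses, and the numbers are hypothetical.

import math

def correct_range_restriction(r: float, sd_restricted: float, sd_population: float) -> float:
    """Thorndike Case II: estimate the population correlation from a
    correlation observed in a sample with a restricted range of scores."""
    u = sd_population / sd_restricted  # ratio > 1 when the sample is restricted
    return (r * u) / math.sqrt(1 - r**2 + (r**2) * (u**2))

# Hypothetical: observed r = .30 in a sample whose SAT standard deviation is 80,
# versus a standard deviation of 110 among all test takers.
print(f"corrected r = {correct_range_restriction(0.30, 80, 110):.2f}")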
Another potential concern is pooling data from different institutions. We pooled data because in some institutions we simply did not have large enough numbers of cases for the data to be meaningful.

Factor structure of the Rainbow measures. Some scholars believe that there is only one set of skills highly relevant to school performance, sometimes called "general ability," or g.14 These scholars believe that tests may seem to measure different skills but, when statistically analyzed, show themselves merely to measure the single general ability. Do the Rainbow tests actually measure distinct analytical, creative, and practical skill groupings? Factor analysis addresses this question. Three meaningful factors were extracted from the data: one represented the practical performance tests; a second, weaker factor represented the creative performance tests; and a third represented the multiple-choice tests (analytical, creative, and practical alike). Thus, method variance proved to be very important. The results show the importance of measuring ability-based styles using multiple formats, precisely because method is so important in determining factorial structure.
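As an illustration of the kind of analysis involved, here is a minimal exploratory factor analysis sketch using scikit-learn. The score matrix is simulated, and the varimax rotation is an assumption made for illustration; the study's actual extraction and rotation choices are not specified in this article.

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Hypothetical score matrix: rows = students, columns = subtests
# (stand-ins for STAT multiple-choice scales and performance-task ratings).
scores = rng.normal(size=(793, 9))

fa = FactorAnalysis(n_components=3, rotation="varimax")
fa.fit(scores)

# Loadings (rows = subtests, columns = factors) show which subtests cluster
# together; in the actual study, test format (performance vs. multiple choice),
# not analytical/creative/practical content, drove the clustering.
print(np.round(fa.components_.T, 2))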
Predicting college GPA. College admissions offices are not interested, exactly, in whether these tests predict college success. Rather, they are interested in the extent to which these tests predict college success beyond those measures currently in use, such as the SAT and high school GPA. To test the incremental validity provided by the Rainbow measures above and beyond the SAT in predicting GPA, a series of statistical analyses (hierarchical regressions) was conducted that included the analytical, creative, and practical assessments. If one looks at the simple correlations, the SAT-Verbal (SAT-V), SAT-Math (SAT-M), high school GPA, and the Rainbow measures all predict freshman-year GPA. But how do the Rainbow measures fare on incremental validity? In one set of analyses, SAT-V, SAT-M, and high school GPA were included in the first step of the prediction equation, because these are the standard measures used today to predict college performance. Only high school GPA contributed uniquely to the prediction of college GPA. In Step Two, the analytical subtest of the STAT was added, because this test is conceptually closest to the SAT; it slightly but significantly increased the level of prediction. In Step Three, the measures of practical ability were added, resulting in a small increase in prediction.

The inclusion of the creative measures in the final step of the prediction equation indicates that, by supplementing the SAT and high school GPA with measures of analytical, practical, and creative abilities, a total of 24.8% of the variance in GPA can be accounted for. Inclusion of the Rainbow measures in Steps Two, Three, and Four represents an increase of about 9.2 percentage points (from 0.156 to 0.248) in the proportion of variance accounted for over and above the typical predictors of college GPA. Including the Rainbow measures without high school GPA, using only SAT scores as a base, represents an increase of about 10.1 percentage points (from 0.098 to 0.199). Looked at in another way, the Rainbow measures roughly doubled the variance accounted for relative to the SAT alone. Different ability-based styles of thinking, then, make a difference in predicting academic achievement beyond unitary measures of traditional general ability. These results suggest that the Rainbow tests add considerably to the prediction provided by the SAT alone. They also suggest the particular power of high school GPA in prediction, because it is an atheoretical composite that includes within it many variables, including motivation and conscientiousness.
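To make the logic of these steps concrete, here is a minimal sketch of a hierarchical regression that tracks the increment in variance accounted for (R squared) at each step, using statsmodels. The column names and the simulated data are hypothetical; this is not the study's actual code or data.

import numpy as np
import pandas as pd
import statsmodels.api as sm

def r_squared(df: pd.DataFrame, outcome: str, predictors: list[str]) -> float:
    """Fit OLS of the outcome on the predictors and return R^2."""
    X = sm.add_constant(df[predictors])
    return sm.OLS(df[outcome], X, missing="drop").fit().rsquared

# Each step adds a block of predictors on top of the previous ones.
STEPS = [
    ["sat_v", "sat_m", "hs_gpa"],   # Step 1: standard admissions measures
    ["stat_analytical"],            # Step 2: analytical STAT subtest
    ["practical_composite"],        # Step 3: practical measures
    ["creative_composite"],         # Step 4: creative measures
]

def incremental_validity(df: pd.DataFrame, outcome: str = "college_gpa") -> None:
    predictors: list[str] = []
    prev_r2 = 0.0
    for i, block in enumerate(STEPS, start=1):
        predictors += block
        r2 = r_squared(df, outcome, predictors)
        print(f"Step {i}: R^2 = {r2:.3f} (increment = {r2 - prev_r2:.3f})")
        prev_r2 = r2

# Simulated stand-in data, just to make the sketch runnable.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(500, 6)),
                  columns=["sat_v", "sat_m", "hs_gpa", "stat_analytical",
                           "practical_composite", "creative_composite"])
df["college_gpa"] = 0.3 * df["hs_gpa"] + 0.2 * df["creative_composite"] + rng.normal(size=500)
incremental_validity(df)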
Group differences. Although one important goal of the present study was to predict success in college, another important goal involved developing measures that reduce ethnic group differences in mean levels. There has been a lively debate as to why there are socially defined racial group differences and whether the scores of members of underrepresented minority groups are over- or underpredicted by the SAT and related tests.15 Might it be because different ethnic groups, on average, show different ability-based styles of thinking as a result of differential socialization?

There are a number of ways one can test for group differences in these measures, each of which involves a test of the size of the effect of ethnic group. Two different measures were chosen. First, consider omega squared coefficients, which index the impact of ethnic group on test scores. This procedure involves considering differences in mean performance levels among the six ethnic and racial groups reported (European American, Asian American, Pacific Islander, Latino American, African American, and Native American [American Indian]) for the following measures: the baseline measures (SAT-V and SAT-M), the STAT ability scales, the creativity performance tasks, and the practical-ability performance tasks. The coefficient indicates the proportion of variance in the scores that is accounted for by the self-reported ethnicity of the participant. The omega squared values were 0.09 for SAT-V, 0.04 for SAT-M, and 0.07 for the combined SAT. For the Rainbow measures, omega squared ranged from 0.00 to 0.03, with a median of 0.02. Thus, the Rainbow measures showed reduced values relative to the SAT. The second measure of effect size, Cohen's d, allows one to examine specific pairwise group differences more directly.

These results indicate two general findings. First, in terms of overall differences, the Rainbow tests seem to have reduced ethnic group differences relative to traditional assessments of abilities such as the SAT. Second, in terms of specific differences, the Latino American students seem to have benefited the most from the reduction of group differences. African American students, too, showed a reduction in difference from the European American mean for most of the Rainbow tests, although a substantial difference seems to have been maintained on the practical performance measures. Important reductions in differences were also seen for the Native American students relative to the European American students; indeed, their median was higher on the creative tests. However, the very small sample size suggests that any conclusions about Native American performance should be drawn tentatively.

Although the group differences were not eliminated, these findings suggest that measures can be designed that reduce ethnic and socially defined racial group differences on standardized tests, particularly for historically disadvantaged groups such as African American and Latino American students. These findings have important implications for reducing adverse impact in college admissions. They suggest that different groups do have, on average, different patterns of ability-based styles.
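For concreteness, the two effect-size statistics mentioned above can be computed as follows: omega squared indexes how much score variance is attributable to group membership overall, and Cohen's d standardizes the mean difference between two specific groups. This is a generic sketch with simulated scores, not the study's data.

import numpy as np

def omega_squared(groups: list[np.ndarray]) -> float:
    """Proportion of score variance accounted for by group membership (one-way design)."""
    all_scores = np.concatenate(groups)
    grand_mean = all_scores.mean()
    ss_total = ((all_scores - grand_mean) ** 2).sum()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_within = (ss_total - ss_between) / df_within
    return (ss_between - df_between * ms_within) / (ss_total + ms_within)

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Standardized mean difference between two groups, using the pooled SD."""
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) \
                 / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical standardized test scores for three groups.
rng = np.random.default_rng(2)
g1, g2, g3 = (rng.normal(loc, 1.0, 200) for loc in (0.0, 0.2, 0.4))
print(f"omega^2 = {omega_squared([g1, g2, g3]):.3f}")
print(f"d (g1 vs g3) = {cohens_d(g1, g3):.2f}")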
Data from Other Assessment Projects

The principles behind the Rainbow Project apply at other levels of admissions as well. For example, Hedlund et al16 have shown that the same principles can be applied in admissions to business schools, also with the result of increasing prediction and decreasing ethnic group (as well as gender) differences, by including tests of practical thinking in addition to the Graduate Management Admission Test.

Stemler et al17 studied the measurement of ability-based styles in the context of achievement testing. In this project, funded by the Educational Testing Service and the College Board, they asked whether the same principles could be applied to high-stakes achievement testing used for college admissions and placement. They modified Advanced Placement tests in psychology and statistics to additionally assess analytical, creative, and practical skills. Here is an example in psychology. A variety of explanations have been proposed to account for why people sleep:

1. Describe the restorative theory of sleep (memory).
2. An alternative theory is an evolutionary theory of sleep, sometimes referred to as the "preservation and protection" theory. Describe this theory and compare and contrast it with the restorative theory. State what you see as the two strong points and two weak points of this theory compared with the restorative theory (analytical).
3. How might you design an experiment to test the restorative theory of sleep? Briefly describe the experiment, including the participants, materials, procedures, and design (creative).
4. A friend informs you that she is having trouble sleeping. Based on your knowledge of sleep, what kinds of helpful (and health-promoting) suggestions might you give her to help her fall asleep at night (practical)?

As in the other studies, the investigators found that, by asking such questions, they were able both to increase the range of skills they tested and to substantially reduce ethnic group differences in test scores. Again, different ethnic groups seem to show different modal patterns of ability-based styles.

In collaboration with our colleagues from a private preparatory school,18 we developed a supplementary battery of admissions assessments which, in addition to taking into account students' Secondary School Admission Test (SSAT) scores, allowed the school to consider students' creative and practical styles. Specifically, we developed two assessments of practical competence (style) and two assessments of creative competence (style). One practical-competence task surveyed students' readiness to adapt to the new environment of a boarding school and to navigate the "rules and regulations" of a highly academically oriented and demanding prep school. In this task, students read a description of a practical situation and then rated a number of solutions offered to them. The second task included more generic situations descriptive of social aspects of student life: a problematic situation was depicted, and participants were asked to continue the story by identifying with the main character and developing the next step in the plot.

Creative competence was also assessed with two different tasks. One task asked for a brief story under one of five proposed titles: (1) "Too much, too fast," (2) "The landing on the planet Vespa," (3) "Third time's the charm," (4) "The spy was not captured after all," and (5) "When the music stopped." The second task included word problems describing various situations related to novel uses of scientific knowledge; students were asked to find a solution using some knowledge of the sciences.

These four indicators were used in regression analyses predicting freshman GPA for a class of 152 students. When introduced into the regression after the SSAT Verbal, Quantitative, and Reading indicators, the practical-competence tasks doubled the prediction (from 12.0% to 24.4% of the variance accounted for), and the creative-competence tasks added a further 4.4 percentage points (from 24.4% to 28.8%).

Thus, tests such as the Rainbow assessments do not benefit only members of ethnic minority groups. There are many students who come from the majority group, and even from well-off homes, who learn in ways that are different from those assessed by conventional standardized tests. These children may well have the abilities they need to succeed in life and even in school, but these abilities may not be reflected in scores on conventional tests. Our tests help identify such students.

It is one thing to have a successful research project, and another actually to implement the procedures in a high-stakes situation. Can any of these ideas actually make a difference in practice?

Practical Implementation: The Kaleidoscope Project

Tufts University has strongly emphasized the role of active citizenship in education, so it seemed a suitable setting in which to put into practice some of the ideas from the Rainbow Project. Tufts instituted Project Kaleidoscope, which represents an implementation of the ideas of Rainbow but goes beyond that project to include in its assessment the construct of wisdom.19,20 Tufts placed questions designed to assess wisdom, intelligence, and creativity synthesized19,21 on the 2006–2007 application for all of the more than 15,000 students applying for undergraduate admission to arts, sciences, and engineering at Tufts. The questions were optional. Whereas the Rainbow Project was done as a separate, high-stakes, proctored test, the Kaleidoscope Project was done as a section of the Tufts-specific part of the college application; it simply was not practical to administer a separate high-stakes test such as the Rainbow assessment for admission to a single university.
Moreover, Kaleidoscope has the advantage of moving Tufts away from the high-stakes testing situation in which students must answer complex questions in very short amounts of time under incredible pressure. The section was optional this past year, and students were encouraged to answer just a single question. As examples, one creative question asked students to write stories with titles such as "The end of MTV" or "Confessions of a middle-school bully." Another creative question asked students what the world would be like if some historical event had turned out differently, for example, if Rosa Parks had given up her seat on the bus. Yet another creative question, a nonverbal one, gave students an opportunity to design a new product or an advertisement for a new product. A practical question asked how students had persuaded friends of an unpopular idea they held. A wisdom question asked students how a passion they had could be applied toward a common good.

So, what happened? Some stakeholders were afraid that the number of applications would go down; instead, it went up. Notably, the quality of applicants rose substantially. There were notably fewer students in what had previously been the bottom third of the pool in terms of quality: many of those students, seeing the new application, decided not to apply, and many more strong applicants applied. Other stakeholders were concerned that average SAT scores would go down, perhaps even plummet. Instead, they went up, rising to more than 1,400 (Verbal + Math) for the first time. The reason is that the new assessments are not negatively correlated with the SAT; rather, they are simply not much correlated with it one way or the other. The squared correlations of the Kaleidoscope assessments with the SAT were all less than 0.1. In contrast, squared correlations with the quality of extracurricular activities were in the 0.4 range.

Merely completing the Kaleidoscope essays had a trivial effect on admission, but students who received an "A" (the top rating) on the Kaleidoscope assessments were twice as likely to be admitted as those who did not. The assessments provided a quantified way of assessing ability-based styles of thinking that, in the past, had been assessed only in a more qualitative way. Finally, after one year of study, students who were top scorers on Kaleidoscope did just as well academically as students with comparable high school records who were admitted for other reasons.

In sum, adopting these new methods results in the admission of applicants who are more qualified, but in a broader way than was considered in the past. Perhaps most rewarding were the positive comments from large numbers of applicants who felt that our application gave them a chance to show themselves for who they are. After a number of years in which applications from underrepresented minorities were relatively flat in number, this year they went up substantially: Tufts admitted roughly 30% more African American students than the year before, and 15% more Latino Americans. These results, like those of the Rainbow Project, show that it is possible to increase academic quality and diversity simultaneously, and to do so for an entire undergraduate class at a major university, not just for small samples of students at some scattered colleges. Most importantly, the university sent a message to students, parents, high school guidance counselors, and others that it believes there is more to a person than the narrow spectrum of skills assessed by standardized tests, and that these broader skills can be assessed in a quantifiable way.

One might wonder how one assesses answers to questions that seem so subjective. The answer is through well-developed rubrics. For example, we assess analytical responses on the basis of the extent to which they are (1) analytically sound, (2) balanced, (3) logical, and (4) organized. We assess creative responses on the basis of how (1) original and (2) compelling they are, as well as on the basis of their (3) appropriateness to the task with which the students were presented. We assess practical responses on the basis of how feasible they are with respect to (1) time, (2) place, and (3) human and (4) material resources. We assess wisdom-based responses on the extent to which they (1) promote a common good by (2) balancing one's own interests with others' and larger interests, (3) over the long and short terms, through (4) the infusion of positive (prosocial) values.
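As a sketch of how such rubrics might be represented and scored programmatically, consider the following. The criteria are taken from the paragraph above; the 1-5 rating scale, the equal weighting of criteria, and all names are illustrative assumptions, not the actual Kaleidoscope scoring scheme.

from dataclasses import dataclass

# Criteria follow the rubrics described in the text.
RUBRICS: dict[str, list[str]] = {
    "analytical": ["analytically sound", "balanced", "logical", "organized"],
    "creative": ["original", "compelling", "appropriate to task"],
    "practical": ["feasible in time", "feasible in place",
                  "feasible in human resources", "feasible in material resources"],
    "wisdom": ["promotes a common good", "balances interests",
               "balances long and short terms", "infuses positive values"],
}

@dataclass
class RatedResponse:
    kind: str                # which rubric applies ("analytical", "creative", ...)
    ratings: dict[str, int]  # criterion -> rating, assumed 1-5 scale

    def score(self) -> float:
        """Unweighted mean across the rubric's criteria."""
        criteria = RUBRICS[self.kind]
        return sum(self.ratings[c] for c in criteria) / len(criteria)

# Hypothetical rated essay:
essay = RatedResponse("creative",
                      {"original": 5, "compelling": 4, "appropriate to task": 4})
print(f"creative score = {essay.score():.2f}")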
Conclusions

I have described in this article how the theory of successful intelligence has been used in a variety of settings to increase predictive validity and to reduce ethnic group differences. None of these settings was a medical one, so it remains to be determined whether the theory has application to that setting. On the basis of the diversity of settings in which we have attained successful prediction and reduction of group differences, we are at least moderately optimistic that the same kinds of results could be obtained for admission to medical schools.
