Characterization of the Council of Emergency Medicine Residency Directors' Standardized Letter of Recommendation in 2011-2012
2013; Wiley; Volume: 20; Issue: 9; Language: English
DOI: 10.1111/acem.12214
ISSN: 1553-2712
Authors: Jeffrey N. Love, Nicole M. Deiorio, Sarah Ronan-Bentle, John M. Howell, Christopher Doty, David Lane, Cullen Hegarty
Topic(s): Diversity and Career in Medicine
Abstract: The Council of Emergency Medicine Residency Directors (CORD) introduced the standardized letter of recommendation (SLOR) in 1997, and it has become a critical tool for assessing candidates for emergency medicine (EM) training. It has not itself been evaluated since the initial studies associated with its introduction. This study characterizes current SLOR use to evaluate whether it serves its intended purpose of being standardized, concise, and discriminating. This retrospective, multi-institutional study evaluated letters of recommendation from U.S. allopathic applicants to three EM training programs during the 2011–2012 Electronic Residency Application Service (ERAS) application cycle. Distributions of responses to each question on the SLOR were calculated, and the free-text responses were analyzed. Two pilots, performed on five applicants each, assisted in developing a strategy for limiting interrater variability. Each of the three geographically diverse programs provided a complete list of U.S. allopathic applicants to its program. After randomization, each program received a unique list of coded applicants randomly selected for data collection. The number of applicants was chosen to reach a goal of approximately 200 SLORs per site (n = 602). Among this group, comprising 278 of 1,498 applicants (18.6%) from U.S. allopathic schools, a total of 1,037 letters of recommendation were written, 724 (69.8%) of them by emergency physicians. SLORs represented 58.1% (602/1,037) of all letters of recommendation by any type of author and 83.1% (602/724) of letters written by emergency physicians. Three hundred ninety-two of 602 SLORs (65.1%) had a single author. For the question on “global assessment,” students were scored in the top 10% in 234 of 583 applications (40.1%; the question was left unanswered on some SLORs), and 485 of 583 applicants (83.2%) were ranked above the level of their peers. Similarly, >95% of all applicants were ranked in the top third compared to peers for all but one section under “qualifications for emergency medicine.” In 405 of 602 SLORs (67.2%), one or more questions were left unanswered, while 76 SLORs (12.6%) were “customized,” or changed from the standard template. Finally, in 291 of 599 SLORs (48.6%), the word count was greater than the recommended maximum of 200 words. Grade inflation is marked throughout the SLOR, limiting its ability to be discriminating. Furthermore, template customization and skipped questions work against the intention to standardize the SLOR. Finally, it is not uncommon for comments to be longer than guideline recommendations. As an assessment tool, the SLOR could be more discerning, concise, and standardized to serve its intended purpose.
For many years emergency medicine (EM) program directors (PDs) struggled with the limitations inherent in traditional narrative letters of recommendation. In 1995, in response to growing concerns, the Council of Emergency Medicine Residency Directors (CORD) established a standardized letter of recommendation (SLOR) task force whose goal was to create “a method of standardization for letters of recommendation.”1 After a 2-year development process, the SLOR was introduced in 1997. The goal of the task force was to develop an assessment instrument with three basic tenets: 1) standardization—the same essential questions are asked and answered for every candidate; 2) time-efficient to review—a clear and concise synopsis for efficient evaluation; and 3) discriminating—a template to convey comparative performance data on candidates in specific areas important to clinical practice.1 Of paramount importance, the SLOR's format was designed to limit “clerkship grade and adjective inflation,” which was believed to be “rampant” at the time.
In the first few years after the adoption of the SLOR, there were a number of commentaries and studies related to its use.2-6 In general, it appeared that the SLOR accomplished many of its intended goals. Since that time, the SLOR has been universally adopted and has become an expected component of any application for EM training. The SLOR template used today contains minor changes from the original format released in 1997. Approximately 15 years after its introduction, the SLOR task force was reestablished to evaluate whether or not the SLOR, as it is currently used, adheres to its original tenets. Secondary goals included characterization of SLOR use and identification of opportunities for improving this assessment tool. This article reports these results.

This was a retrospective, multi-institutional study based on U.S. allopathic applications to EM from the 2011–2012 Electronic Residency Application Service (ERAS) application cycle. Institutional review board (IRB) approval was obtained from each of the three participating institutions prior to data collection. Due to the anonymous nature of this study, each affiliated IRB approved the study as exempt from informed consent. Data were collected from applicants to three geographically diverse EM training programs: Medstar Georgetown University Hospital/Washington Hospital Center, Washington, DC; Oregon Health & Science University, Portland, Oregon; and The University of Cincinnati, Cincinnati, Ohio. Two of these programs have 3-year formats, and the third is a 4-year program.

To first determine interrater reliability, the SLORs of five applicants were randomly selected using a computerized random-number generator, and the selection was checked for duplicate entries by Association of American Medical Colleges (AAMC) identifier. Each of the three primary study authors independently reviewed the letters of recommendation, and the investigators' coding of the data was compared. There were 11 SLORs in this pilot, with 597 data points. All three authors agreed on 85.9% of outcomes. Disagreements were resolved by consensus, and the coding protocol was amended to avoid future discrepancies. A second pilot of five additional candidates applying to all three programs was randomly selected using the same method. Once again, all three authors independently evaluated the letters of recommendation. In this cohort there were 13 SLORs, with 705 data points for comparison. All three authors agreed on 94.2% of outcomes. A final meeting focused on remaining issues and resulted in a consensus plan for dealing with ongoing areas of disagreement.

For the study itself, we targeted a minimum sample size of 600 SLORs. We projected that with 600 letters, we would have power of 0.8 to detect differences in proportions of 11% to 14%, depending on the number of subgroups compared (i.e., two to four projected subgroups). An initial, informal review of 20 random applicants, performed on two separate occasions, yielded an average of two SLORs per applicant. This estimate was used when randomly generating 100 unique applicants from each of the three participating programs with Microsoft Excel for Macintosh, v. 14.2.3 (Microsoft Corp., Redmond, WA) to obtain approximately 200 SLORs from each site. Descriptive statistics, including means, proportions, and 95% confidence intervals (CIs), were calculated using IBM SPSS Statistics for Macintosh, v. 20 (IBM Corporation, Armonk, NY).
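Neither the randomization spreadsheet nor the SPSS syntax is published. The following minimal Python sketch (function names and the normal-approximation formulas are our own assumptions, not the authors' code) illustrates the two calculations described above: the minimum detectable difference underlying the 600-SLOR target, and one plausible form of the reported 95% CIs for proportions.

```python
# Minimal sketch, not the authors' code: the study reports using Excel for
# randomization and SPSS v. 20 for statistics. These helpers assume standard
# normal-approximation formulas; the names are ours.
from math import sqrt
from scipy.stats import norm

def detectable_difference(n_per_group: int, p_base: float = 0.5,
                          alpha: float = 0.05, power: float = 0.8) -> float:
    """Minimum detectable difference between two proportions (worst case
    at p = 0.5). With ~600 SLORs split into two subgroups of 300 each,
    this returns ~0.114, consistent with the stated 11% figure."""
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided critical value
    z_beta = norm.ppf(power)           # quantile for the desired power
    se = sqrt(2 * p_base * (1 - p_base) / n_per_group)
    return (z_alpha + z_beta) * se

def wald_ci(successes: int, n: int, z: float = 1.96):
    """Normal-approximation (Wald) 95% CI for a single proportion; the
    paper does not state which interval SPSS produced, so this is only
    one plausible choice."""
    p = successes / n
    half = z * sqrt(p * (1 - p) / n)
    return p, p - half, p + half

print(f"Detectable difference (n = 300/group): {detectable_difference(300):.3f}")
p, lo, hi = wald_ci(431, 589)  # "extended direct clinical contact," all SLORs
print(f"{p:.1%} (95% CI = {lo:.1%} to {hi:.1%})")
```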
To reach the goal of approximately 600 SLORs, the ERAS applications of 287 allopathic applicants were reviewed. Excluding nine interns, this group of 278 seniors represents 18.6% of the 1,498 U.S. allopathic seniors applying to EM in 2011–2012.7 There were 1,037 letters of recommendation. Emergency physicians authored 724 (70%) of these letters, 602 of which were SLORs. Of the 602 SLORs, 198 (32.9%) were from The University of Cincinnati, 203 (33.7%) were from Medstar Georgetown University Hospital/Washington Hospital Center, and 201 (33.4%) were from Oregon Health & Science University.

The SLORs were based on EM rotations 98.3% (592 of 602) of the time, with only 1.7% (10 of 602) based on related EM electives (e.g., research, ultrasound). On occasion, a SLOR is written for an intern rotating through the emergency department (ED). Fourteen of 588 SLORs (nine applicants) were written for interns or residents, of which only four were based entirely on rotations in the ED as a resident. The other nine were based on rotations during medical school the year before, four of which included additional information gathered since that rotation. The nature of contact was “extended direct clinical contact” in 73.2% (431 of 589) of all SLORs, in 76.3% (95% CI = 71.8% to 80.5%) of single-author SLORs, and in 67.0% (95% CI = 60.0% to 73.5%) of group SLORs. The majority of authors (57.9%, 319 of 551) indicated that they had known the applicant for longer than 1 month.

Single-author SLORs made up 65.1% (392 of 602) of all SLORs. The most common authors of this type of SLOR were clerkship directors (CDs; 36.2%, 142 of 392), EM faculty (29.6%, 116 of 392), assistant/associate PDs or assistant clerkship directors (APDs/ACDs; 11.5%, 45 of 392), PDs (10.2%, 40 of 392), and emergency physicians not affiliated with the training program (5.9%, 23 of 392). Group SLORs made up 34.9% of all SLORs, with the most common contributing authors being CDs (80%, 168 of 210), PDs (79%, 166 of 210), and APDs (47.6%, 100 of 210).

The vast majority of SLORs reported some variation on the honors/pass/fail grading system (95.8%, 566 of 591), with 4.2% of the SLORs coming from rotations that grade on a pass/fail system. Table 1 provides a breakdown of EM clerkship grading, as well as a reported “historic” grading breakdown from the previous academic year. Clerkship data showed that 29.9% (160 of 535) of the SLORs were from institutions with 50 or fewer students in the EM rotation per year, 39.1% (209 of 535) from institutions with between 51 and 99, and 31.0% (166 of 535) from institutions with 100 or more students annually.

Section B of the SLOR template pertains to “qualifications for EM.” The results for SLOR questions B1–B4b can be found in Table 2. Question B5a asks, “How much guidance do you predict this applicant will need during residency?” Authors of the SLOR answered “almost none” 46.1% (276 of 599) of the time and “minimal” in 49.8% (298 of 599) of instances; the remaining 4.2% (25 of 599) indicated that “moderate” guidance would be needed. The results of the “global assessment,” section C of the SLOR, are summarized in Table 3. In addition, SLORs were evaluated to determine whether the author's formal position was related to the percentage of times a global ranking of “outstanding/top 10%” was provided. Table 4 summarizes these results.

A review of the written comments focused on the total length by number of words. In 48.6% (291 of 599) of SLORs, the word count was greater than the recommended 200-word limit.
This was true in 47.9% (187 of 390) of single-author SLORs and 50.0% (105 of 210) of group SLORs. A list of faculty comments from the rotation was included in 22.0% (64 of 291) of the SLORs longer than 200 words and in 7.8% (24 of 308) of those that were shorter. General comments describing the basis for grading, the clinical environment, or clerkship logistics were included in 35.4% (103 of 291) of those longer than 200 words and in 8.1% (25 of 308) of those that were shorter. In 2% of cases (12 of 602), the written comments were handwritten.

Table 5 reviews the percentage of times each question was not answered. Question C1b, which documents the author's profile of global rankings awarded the previous year, was skipped or did not provide the entire profile requested 80.8% (458/567) of the time. The SLOR template was “customized,” or changed from the standard form, in 12.6% (76/602) of instances. Six percent of the time (36/602), the final box was not checked, indicating that the applicant did not waive the right to review the SLOR.

The goal of the original SLOR task force was to create a letter that was standardized and concise while providing comparative, discerning data. Shortly after the SLOR was adopted, Crane and Ferraro8 surveyed PDs and found that the most important factors in resident selection were the core clinical clerkship grades, the EM rotation grade, and letters of recommendation. This finding was echoed by Wagoner and Suriano,9 who concluded that for EM PDs, the EM rotation grade and core clinical grades were “near critical” for the successful applicant. At that time, the SLOR task force argued that one of the strengths of the SLOR was that it contained two of the three major factors, namely, a letter of recommendation and the grade on an EM rotation.1 EM rotations often take place during the fourth year of medical school and, as a result, these grades are not always available in the “Dean's letter,” or Medical Student Performance Evaluation. In 2000, a study of applicants to EM training from a single program documented that 22% of all letters of recommendation were SLORs.4 Since 2000, the SLOR has become integral to successful applications for training in the specialty, comprising 58.1% of all letters written in support of residency applications and 83.1% of the letters written by emergency physicians.

The basis for nearly all SLORs is a 1-month clinical rotation in the ED. One reason the SLOR has such potential value is that in most instances, authors base their conclusions on “extended direct contact” working clinically with the applicant. More often than not, the author has known the applicant for longer than 1 month, providing the potential for additional observations and experiences that could further enrich the SLOR. The ERAS application is rife with information regarding an applicant's cognitive abilities, including grades, USMLE scores, AOA membership, and the like. The SLOR contributes to an understanding of cognitive abilities as they pertain to the ED through assessments such as the ED rotation grade, differential diagnosis, and decision-making relative to peers. The SLOR author is also in an ideal position to reflect on the qualities and traits (e.g., noncognitive domains such as dedication, recognition of limits, altruism, and ability to work with a team) so important to success in EM. These factors can otherwise be difficult to discern from the ERAS application. Arguably, the most important tenet of the SLOR is its ability to compare the performance of applicants to one another in specific areas.
However, our data suggest that grade inflation is an issue in the SLOR. For many of the questions throughout the SLOR, comparative information appears to be facilitated through well-anchored criteria relative to peers. Rankings are based on adjectives linked to quantifying anchors. For instance, outstanding represents the “top 10%,” excellent the “top third,” very good the “middle third,” and good the “lower third,” in comparison to other EM applicants. A review of “qualifications for EM” (section B) demonstrates that in the areas of “commitment to EM,” “work ethic,” “differential diagnosis,” and “personality,” the vast majority of applicants received outstanding or excellent rankings (Table 2). Fewer than 5% received very good (at the level of peers) or good ratings. Under global assessment, 40.1% of candidates were ranked as “outstanding/top 10%” relative to peers. These findings are nearly identical to those of the 2000 report.4

Unpublished data from the task force survey of SLOR authors reveal that rarely, if ever, do SLOR authors feel they inflate grades. In addition, SLOR authors report that in some instances they use their “gestalt” to select the adjective of choice without reference to the associated quantifying anchor. This may explain why nearly all applicants are rated as either outstanding or excellent. Rarely does the SLOR indicate that an individual is at the level of his or her peers or lower. It is likely that most applicants select the faculty who author their single-author SLORs based on the likelihood of those faculty members writing favorable letters.

The results summarized in Table 4 demonstrate that an author's job position plays a role in how discerning he or she is when assigning a “global assessment” of “outstanding/top 10%.” The data suggest that as educators become more experienced in writing SLORs, they become more discriminating in grading. Not surprisingly, PDs are among the most discriminating of the SLOR authors. An understanding of this difference based on author type is important for those evaluating SLORs. It is also worth noting that even the most discriminating authors rank 32.5% of candidates as globally functioning in the top 10% of applicants.

The AAMC has encountered very similar challenges with grade inflation in Dean's letters.10 As a result, it has instituted a number of changes, including adopting the new name “Medical Student Performance Evaluation,” reflecting the intention of being an evaluation and not a recommendation.11 The SLOR has always been intended to be an assessment tool. Referring to it as a “recommendation” creates a conundrum for the author, who is likely conflicted about whether he or she is assessing or recommending the applicant.

Although not necessarily the result of grade inflation, 56.7% of those applying were given grades of “honors” or the equivalent for their EM rotations. In contrast, SLOR authors reported that 27.1% of students received grades of “honors” the previous year (Table 1). This discordance may be explained, in part, by the fact that EM is a required clerkship at some schools and not others. Consequently, last year's grades may include the performance of all students and not just those applying to EM residency programs. As an instrument designed to assess performance relative to EM-bound peers, the SLOR's report of last year's grades creates the misperception that clerkship grades are more discriminating among applicants than they actually are.
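The arithmetic behind this concern is easy to make explicit. As an illustration (ours, not part of the study's analysis), each anchor definition implies a ceiling on how often its adjective should appear, so the observed share can be compared directly with the anchor-implied share:

```python
# Illustrative check (ours): how inflated is an adjective's use relative to
# its quantifying anchor? "Outstanding" is anchored to the top 10%, yet it
# was awarded in 234 of 583 answered global assessments (40.1%).
def inflation_ratio(observed: int, n: int, anchor_share: float) -> float:
    """Observed share of a ranking divided by the share its anchor implies;
    values well above 1.0 indicate inflated use."""
    return (observed / n) / anchor_share

print(f"{inflation_ratio(234, 583, 0.10):.1f}x the anchor-implied rate")  # ~4.0x
```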
A second tenet of the SLOR is to provide a concise synopsis of the applicant. Toward this end, instructions associated with the SLOR have traditionally requested that the written narrative comments be kept to a maximum of 150–200 words.1 Early on, work by Girzadas et al.5 demonstrated that the average time to interpret a SLOR was 16 seconds, compared to 90 seconds for a narrative letter. In addition, the SLOR offered better interrater reliability than traditional narratives, regardless of the level of experience of the interpreter. These advantages are particularly important at a time when many programs receive over 1,000 applications annually.

The current study found that the written comments of approximately half of all SLORs (48.6%) exceeded the 200-word limit. From our experience, there has been a trend in recent years to provide additional information in this section, which may include a list of faculty comments from the EM rotation and additional information about grading, the rotation's clinical environment, or both. The data from this study reveal that the inclusion of random faculty comments from the rotation is three times more common in written comments greater than 200 words. Although staff comments may highlight aspects of behavior that emphasize an important trait, a list of comments without context falls short of providing the concise synthesis the written comments were intended to represent. Likewise, an explanation of grading or the clinical environment at the institution is three to four times more common in written comments longer than the 200-word limit. Although this information may frame the applicant's educational performance, it risks diluting the precise focus on the candidate that SLORs strive to achieve.

A thoughtful written narrative puts the overall SLOR and an applicant's candidacy into perspective. Essential to this narrative is a discussion of noncognitive attributes, such as conscientiousness, intellectual curiosity, compassion, professional maturity, and leadership, that are important in predicting future success as a resident and physician.12-16 Additionally, the narrative should clarify any questions or issues raised elsewhere in the SLOR. Although not quantified in this study, it has been common in our experience for faculty to express concern that low rankings (when provided) are seldom explained in the written narrative.
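The word-count screen underlying these conciseness findings is straightforward. Because the coding protocol itself is unpublished, the following is a minimal reconstruction under the assumption of a simple whitespace word count:

```python
# Minimal reconstruction (assumed whitespace tokenization; the authors'
# counting rule is not specified): flag written comments exceeding the
# recommended 200-word maximum.
def exceeds_word_limit(comment: str, limit: int = 200) -> bool:
    """True when a SLOR's free-text comment runs past `limit` words."""
    return len(comment.split()) > limit

print(exceeds_word_limit("Strong clinical performer with excellent rapport."))  # False
```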
Finally, the SLOR was developed as a standardized instrument to limit the variability of style and terminology found in narrative letters, as well as to provide comparative data regarding key performance parameters for every applicant. Several issues were encountered that appear to affect standardization. Modification of the SLOR template is one such factor. Although these changes are likely well intentioned, template modifications increase the difficulty of comparing SLORs among applicants. If modifications are warranted, they should be based on consensus opinion, so that changes are adopted by all SLOR authors rather than reflecting individual preference. The majority of SLORs have one or more questions left unanswered, which also negatively affects standardization. There are several possible reasons why questions may have been skipped: 1) the author could have mistakenly missed an item; 2) the question may be vague, as appears to be the case with “One Key Comment,” which was skipped 36.4% of the time; and 3) the instructions may be unclear about what is being requested. For instance, the “global assessment profile last year” did not provide the complete profile 80.8% of the time. Understanding which questions prove problematic for authors may help to improve compliance through changes to specific questions or instructions. Educating authors as to the intention and importance of each question may also improve compliance.

The group SLOR is a relatively new format that constitutes approximately one-third of all SLORs. Although there is some variation, group SLORs are generally written by several members of the program leadership, typically the PD and CD, who come to a consensus regarding an applicant. In comparison to single-author SLORs, group SLORs are less likely to base their evaluations on “extended direct clinical contact.” Consequently, group SLORs either use feedback from other faculty or are based on more constrained interactions. Another difference noted was that group SLORs appear to be more discriminating in providing global assessments. For example, group SLORs provided significantly fewer top 10% rankings than single-author SLORs from any author type except PDs (Table 4). In addition, rankings of very good/middle third made up 25.5% of group SLOR evaluations, compared with 9.8% of all single-author SLORs. One possible explanation for this outcome is that applicants inject bias into the single-author SLOR process by choosing the authors themselves. Group SLORs, on the other hand, often provide the entire grading profile for EM-bound applicants, whether or not a letter was requested.

Multiple factors appear to be responsible for the issues encountered in the SLOR. Two related studies are under way to evaluate the perspectives of the SLOR authors and the PDs who evaluate SLORs. The intention is to gain a global perspective on issues related to this instrument. Questions such as “Just how valuable is the SLOR?”, “To what degree is it limited by the issues raised in the current study?”, and “What specific changes need to be made?” are best addressed by the upcoming PD survey. It is clear that the template could be improved, as could education and ongoing mentorship on how to complete a SLOR. Consideration should be given to renaming the instrument the “standardized letter of evaluation” (SLOE), a more appropriate and less confusing name. Changes in the name and template, and additional education regarding completion, all have the potential to improve compliance with the three basic tenets of the SLOR.

Yet grade inflation remains the most pervasive issue. Our sense is that a cultural change in how we view our roles as educators, and the assessments we provide, will be required to significantly affect this issue. The great majority of applicants to EM have the potential to succeed in our specialty. In addition, different programs value different attributes in a candidate. The ideal SLOR (SLOE) author would 1) acknowledge that every applicant has unique strengths and challenges; 2) take the time to provide honest, insightful assessments and written comments; and 3) understand that true potential is reflected in qualities and traits from noncognitive domains and not simply grades or rankings.

It is possible that there were errors in data entry, as the data were not rechecked after entry into the spreadsheet. Although every effort was made to ensure interrater reliability, it is possible our agreement degraded during the study.
The study is limited in its ability to discern why SLOR authors acted as they did in completing the letters, although the aforementioned concurrent studies of SLOR authors and PDs should better answer this question. All three of the training programs participating in this study were university based; consequently, our results may have less relevance to community-based programs.

The results of this study demonstrate that the standardized letter of recommendation, as currently used, is limited in its ability to be discerning. There is also a question of whether recent changes in its use have affected its ability to be concise and standardized. Programs should be aware of these findings as they attempt to gauge how exemplary an applicant truly is.