Article | Open access | Peer-reviewed

The Effects of Examiner Background, Station Organization, and Time of Exam on OSCE Scores Assessing Undergraduate Medical Students' Physical Examination Skills

2000; Lippincott Williams & Wilkins; Volume: 75; Issue: Supplement; Language: English

DOI

10.1097/00001888-200010001-00031

ISSN

1938-808X

Authors

Christopher J. Doig, Peter H. Harasym, Gordon H. Fick, John S. Baumber

Topic(s)

Meta-analysis and systematic reviews

Abstract

Since 1975, objective structured clinical examinations (OSCEs) have gained widespread acceptance as a method of making reliable assessments of clinical performance.1 Standardized patients (SPs) function as patient, teacher, and evaluator by using their bodies as teaching and evaluation material. SPs can be asymptomatic, have stable findings, or be trained to simulate physical findings, and they can be taught to portray a variety of standardized clinical presentations. Their participation in teaching and evaluating the complex clinical skills included in OSCEs is well established.2 Research has demonstrated that multiple SP stations within the OSCE format may generate scores that vary greatly in reliability, from 0.20 to 0.95.3,4 Given these large fluctuations, research has focused on the variables that can decrease or enhance the reliability of measurement. For example, inter-rater reliability was not found to be a barrier to consistent measurement: correlations between observers and raters generally ranged from 0.80 to 0.90 when case-specific checklists were developed and the items reflected observable behaviors. Because of the case-specificity phenomenon described by Elstein, many cases are generally needed to assess clinical competency within a defined problem (e.g., chest pain).5 In other words, quality of performance on one case is a very poor predictor of performance on another.6 However, if a single attribute is assessed, the number of cases required to attain reliable scores can be decreased (e.g., ten focused cases are required to assess the general skill of history taking, eight for physical examination, and 25 for differential diagnosis).7

Most OSCE stations employ a single case with a single SP and a single observer. However, because of the cost of OSCEs, efficiency would favor a station organized with two cases portrayed by a single SP. There are no research findings to indicate whether this organizational structure could adversely affect the reliability of measurement of a candidate's performance. Furthermore, OSCEs often use examiners from varied clinical backgrounds (e.g., residents, specialists, or family physicians). Given the importance of the OSCE format and its predominant use for teaching and evaluating clinical skills, there is a need to determine whether the reliability of scores is compromised by a rater's background, a station's organization, or the time of examination administration.

Method

Course Overview. The University of Calgary medical undergraduate program is three years in duration, with 11 instructional months per year. The first two years consist of "systems"-based courses using a problem-oriented curriculum taught in didactic lectures and small-group sessions. There is also a longitudinal medical skills course focusing on professional development and interdisciplinary skills, which includes a supervised setting in which students are instructed in physical examination. A "core document" given to each student provides detailed objectives for each physical examination maneuver. A standard physical examination textbook is recommended, and each student is provided with a six-hour video in which local clinical experts demonstrate physical examination maneuvers. Instruction is delivered in small groups. Preceptors are family physicians, specialists, or senior medicine residents. All small groups use SPs as instructional models.
Further instruction in physical examination is integrated into "clinical correlation" sessions within the systems courses, which are organized to build iteratively on the skills learned in the medical skills course. This instruction is also delivered in small groups; however, all preceptors are specialists in the relevant area, and they provide patients as instructional models. These sessions expose students to clinical findings relevant to each system and permit examination techniques to be observed and corrected by a clinical specialist in the area of study. At the end of the second year, the students take a certifying OSCE, the successful completion of which is a requirement for promotion into clinical clerkship (third year).

OSCE Station Development. The second-year OSCE consists of ten physical examination stations randomly selected from a bank of 44 stations. Each station tests one physical examination maneuver, and all were developed by one author (CJD) using the following approach. All maneuvers were selected from the core document's enabling objectives. Each maneuver was broken down into individual steps as outlined in the course's textbook, and each step was identified as an item on a computerized examination score sheet. Criterion-based scoring was used, with each item scored as 0 (omitted or incorrect), 1 (partially correct), or 2 (correct).8 Face and content validity of each checklist were established by review by a core group of physicians: five course preceptors, five medical educators with expertise in evaluation, five physicians with expertise in clinical teaching, and five specialists. The final content of each checklist and the minimum performance level (MPL) for each station were determined by consensus. It has previously been demonstrated that the important items in an OSCE station are identified more validly by a group of faculty than by one individual.9 Each station had been used in previous OSCEs, and its measurement properties had been established.

Examination Process. The medical skills examination included OSCE stations on history taking, physical examination, medical bioethics, and culture, health, and illness. The examination totaled 3.5 hours, one hour of which was the physical examination section, and was conducted in one morning and one afternoon session. Each candidate completed ten physical examination maneuvers. At each station, there was one examiner and one standardized patient per pair of maneuvers. A short history provided clinical context for each maneuver; students were given five minutes to demonstrate the first maneuver, one minute to review the history for the second maneuver, and five minutes to complete the second maneuver. At the end of 11 minutes, the students were given one minute to rotate to the next station (located in a separate examination room immediately adjacent to the preceding one) and to review the history for the first of that station's two maneuvers. The physical layout of each station was standardized, with the patient dressed in appropriate examination apparel (but not draped or positioned for the examination), an examining table, necessary equipment on an adjacent table, and the examiner to one side.
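To make the checklist scoring concrete, the sketch below shows how 0/1/2 item scores might be converted into a station percentage and compared against an MPL; the item wording, scores, and MPL value are hypothetical and are not taken from the study's checklists.

```python
# Illustrative sketch of the criterion-based checklist scoring described above.
# The item names, scores, and MPL value below are hypothetical; each checklist
# item is scored 0 (omitted or incorrect), 1 (partially correct), or 2 (correct).
from typing import Mapping


def station_percentage(item_scores: Mapping[str, int]) -> float:
    """Convert 0/1/2 checklist item scores into a station percentage."""
    for item, score in item_scores.items():
        if score not in (0, 1, 2):
            raise ValueError(f"{item}: expected a score of 0, 1, or 2, got {score}")
    max_points = 2 * len(item_scores)  # every item performed fully correctly
    return 100.0 * sum(item_scores.values()) / max_points


# Hypothetical checklist for a jugular venous pulse (JVP) station.
jvp_checklist = {
    "positions patient at 45 degrees": 2,
    "uses tangential lighting": 1,
    "identifies internal jugular pulsation": 2,
    "measures height above the sternal angle": 0,
}

score = station_percentage(jvp_checklist)
mpl = 65.0  # assumed station minimum performance level (set by consensus in the study)
print(f"Station score: {score:.1f}% -> {'meets MPL' if score >= mpl else 'below MPL'}")
```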
Physical examination stations were grouped into two streams. Stream A paired maneuvers within a station that were from the same system or anatomic region, or that required a similar physical examination skill; Stream B paired maneuvers that were not similar in region or skill examined. The pairings and sequence of maneuvers in Stream A were spleen and ascites, mini-mental status exam and median nerve, jugular venous pulse (JVP) and peripheral arterial system, shoulder and cervical spine, and visual fields and lung surface anatomy (the final pairing representing an understanding of clinical correlative anatomy). The pairings and sequence in Stream B were cervical spine and JVP, ascites and peripheral arterial system, lung surface anatomy and median nerve, mini-mental status exam and visual fields, and shoulder and spleen. The streams ran in parallel during both the morning and afternoon sessions, and the pairings and maneuver sequences within each stream remained constant between the two sessions.

Each examiner was a physical examination course preceptor. Two weeks before the examination, the examiners were sent the following station-specific information: a photocopy of the maneuver-specific objectives, a photocopy of the textbook's description of the examination, and the station checklist. Each examiner was also asked to review the appropriate section of the videotape, which had been provided previously. An instructional session was held with all examiners to review each station's expectations, checklist, and expected performance, and to discuss concerns. The examiners were not aware of the method of station validation or of the stations' MPLs. Six examiners were internal medicine residents, eight were family practitioners, and six were specialists. An administrative assistant, unaware of the study's hypotheses, randomly allocated both the examiners and the students to Streams A and B and to times of examination (A.M. or P.M.).

Statistical Analysis. We hypothesized that the type of examiner, the pairing of maneuvers requiring similar content knowledge (extrapolated as being from the same examination systems), and the time of examination would not contribute significant variance to the overall measure of examination reliability. For the analysis, we used the generalized estimating equation (GEE) method, a modification of the generalized linear model (GLM).10 GEE modeling is a robust and validated method of random-effects multivariate modeling that estimates generalized linear models while permitting a priori specification of a within-student correlation structure. In summary, the model provides an analysis of variance while controlling for the potential effects of an unequal distribution of data and accounting for repeated measures. We used the exchangeable correlation structure within the GEE method to estimate the effects of the individual covariates (and any interactions) on the dependent variable of student performance.10 Because the sequence of examination maneuvers at each station was held constant within each stream, sequence was not included in the final model, nor did we model within-examiner correlations. All analyses were performed with a statistical software package.
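As an illustration of the kind of model described above, the following sketch fits a GEE with an exchangeable within-student correlation structure using Python's statsmodels; the data file, column names, and formula are assumptions made for the example and do not represent the authors' actual code or software.

```python
# Sketch of a GEE with an exchangeable within-student correlation structure,
# along the lines described under Statistical Analysis. The file name, column
# names, and model formula are assumptions for illustration only.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Long format: one row per student-by-station score (percentage), with columns
# student_id, score, examiner_type, stream, and session (am/pm).
df = pd.read_csv("osce_scores.csv")  # hypothetical data file

model = smf.gee(
    "score ~ C(examiner_type) + C(stream) + C(session) + C(stream):C(session)",
    groups="student_id",                      # repeated measures clustered within student
    data=df,
    family=sm.families.Gaussian(),            # continuous percentage scores
    cov_struct=sm.cov_struct.Exchangeable(),  # exchangeable within-student correlation
)
result = model.fit()
print(result.summary())
```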
Results

Sixty-nine of 70 eligible students completed the examination: 35 were randomized to Stream A and 34 to Stream B. Because of the availability of standardized patients, the examination was structured with an unequal distribution of students between the morning and afternoon sessions: of the 69 students, 40 were assigned to the morning examination and 29 to the afternoon examination. Six examiners were residents, eight were family physicians, and six were specialists; the examiners were equally distributed between the two streams and between the morning and afternoon sessions.

The alpha coefficient for the examination was 0.84. The MPL for the examination was 66.85%, based on an equal weighting of the MPLs from the ten stations. Sixty-five of the 69 students were rated satisfactory on the overall physical skills examination. The mean performance was 76.81% ± 7.35% (mean ± SD), with scores ranging from a low of 56.51% to a high of 92.28%. The performances at the individual stations are presented in Table 1. The overall mean score for candidates observed by senior internal medicine residents was 75.55%; that for candidates observed by family physicians was 79.22% (p = 0.07 compared with residents or specialists); and that for candidates observed by specialists was 75.28% (p = 0.38 compared with residents). No practical difference was observed in the candidates' performances by stream assignment (Stream A, 77.00%; Stream B, 76.61%) or between the morning (77.51%) and afternoon (76.00%) sessions. No between-examiner effect was demonstrated within streams or within times of examination. An unexplained interaction was observed between stream assignment and time of examination: in the morning session, Stream A = 74.50% and Stream B = 80.52%; in the afternoon session, Stream A = 80.59% and Stream B = 71.40%. This interaction could not be explained by an effect of examiners. Given that the SPs and the pairings and sequences of maneuvers within the stations did not change, and in the absence of an alternative plausible explanation, the interaction was presumed to reflect random variation in individual candidates' performances.

TABLE 1. Summary of Individual Station and Overall Examination Results on a Ten-Station Physical Examination Skills OSCE

Conclusions

Using a sound research design and robust analytic techniques, this study found no evidence that the variables studied (station organization, time of examination, and clinical background of the examiner) contributed significant variance to the overall reliability of an OSCE assessing physical examination skills. With two parallel streams, and therefore two SPs simulating the same physical examination maneuver, we assessed the between-SP mean values for each maneuver (the form-within-case difference previously suggested by Battles11) and found no difference (data not presented), which supports the conclusion that having two SPs demonstrate the same maneuver did not bias our results. Our assessment of only physical examination maneuvers is similar to the studies of Kowlowitz and colleagues and of Li and colleagues, and supports the reliability of our examination.12,13 The difference between scores awarded by family physicians and those awarded by internal medicine residents or specialists did show a trend toward significance, and the absence of a statistically significant result may have reflected a type II error.
The effect of the examiner's background on the rating of students' performances requires further study. Though OSCEs have gained widespread acceptance, their cost and labor-intensive organization remain major practical impediments. Reznick estimated the total cost of developing an OSCE and administering it to 120 students in a single medical school at between $59,460 and $104,400 (Canadian dollars), or $496 to $870 per student.14 For administering the examination alone, costs ranged from $19,200 to $34,500 if both examiners and SPs were paid, or from $16,500 to $19,200 if only SPs were paid (both estimates include catering for examiners and SPs). In previous examinations using ten physical examination maneuvers without pairing of maneuvers within a station, we required 40 examiners and 40 standardized patients. These large numbers of examiners and SPs were a significant cost and administrative burden, and they were important factors in our adoption of the paired-station strategy. Two previous examinations without paired stations had an average alpha of 0.76. The current study's findings support the premise that pairing and sequencing maneuvers within stations does not reduce the reliability of the assessment of a candidate's performance. Reorganizing the assessment of physical examination skills within an OSCE by pairing maneuvers within stations may improve overall efficiency and provide significant cost savings by reducing the numbers of SPs and examiners needed. Whether this approach can be applied to the assessment of other clinical skills in an OSCE requires further evaluation.
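As a point of reference for the reliability figures cited above (an alpha of 0.84 with paired stations versus an average of 0.76 without), the sketch below shows one conventional way to compute Cronbach's alpha from a students-by-stations score matrix; the simulated data are illustrative only and are not the study's data.

```python
# Illustrative computation of Cronbach's alpha from a students-by-stations score
# matrix (rows = students, columns = station percentage scores). The simulated
# numbers below are made up for illustration; they are not the study's data.
import numpy as np


def cronbach_alpha(scores: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of per-station variances / variance of total scores)."""
    k = scores.shape[1]                             # number of stations
    station_var = scores.var(axis=0, ddof=1).sum()  # sum of per-station variances
    total_var = scores.sum(axis=1).var(ddof=1)      # variance of each student's total score
    return k / (k - 1) * (1 - station_var / total_var)


rng = np.random.default_rng(0)
ability = rng.normal(77, 5, size=(69, 1))                   # latent ability per student
station_scores = ability + rng.normal(0, 7, size=(69, 10))  # ten stations with noise
print(f"Cronbach's alpha: {cronbach_alpha(station_scores):.2f}")
```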
