Peer-reviewed article

Investigating a judgemental rank‐ordering method for maintaining standards in UK examinations

2008; Taylor & Francis; Volume: 23; Issue: 3; Language: English

10.1080/02671520701755440

ISSN

1470-1146

Authors

Beth Black, Tom Bramley

Topic(s)

Advanced Statistical Methods and Models

Abstract

A new judgemental method of equating raw scores on two tests, based on rank‐ordering scripts from both tests, has been developed by Bramley. The rank‐ordering method has potential application as a judgemental standard‐maintaining mechanism, because given a mark on one test (e.g. the A grade boundary mark), the equivalent mark (i.e. at the same judgemental standard) on the other test can be determined. If the two tests come from different years then the standard from the earlier year can be applied to the later year. The current standard‐maintaining method used by Awarding Bodies in England, Wales and Northern Ireland is the 'awarding meeting'. Here expert judgement takes place within the context of a variety of statistical information, including score distributions and hence likely pass rates. The rank‐ordering method, in contrast, involves harnessing expert judgement independently of any statistical information. The aim of this study was to investigate the extent to which the outcome of an awarding meeting could be cross‐validated using a rank‐ordering exercise. Furthermore, the study aimed to discover whether rank‐ordering produces similar results when the activity is conducted by post compared with a face‐to‐face meeting. The results showed that the outcomes of the postal exercise were closely replicated by the meeting exercise, indicating that the method is a reliable technique for capturing expert judgement. In terms of cross‐validation of the rank‐order outcomes with awarding data, there was some concurrence and some disparity at key grade boundaries. However, because the awarding meeting uses more information and different procedures from a rank‐ordering exercise, the outcomes should not be expected to be the same. The potential advantages and disadvantages of replacing the 'top‐down, bottom‐up' judgemental part of an awarding meeting with a rank‐ordering exercise are discussed.
Keywords: examinations; assessment; standard setting; standard maintaining; rank order; Thurstone

Notes

1. General Certificate of Secondary Education (GCSE) qualifications are taken by all students in mainstream education at age 16 years in England, Wales and Northern Ireland.
2. Advanced Level General Certificate of Education (A level GCE) qualifications are perhaps the most popular post‐16 educational route in England, Wales and Northern Ireland. They typically require two years of study beyond GCSE.
3. The Principal Examiner is usually responsible for both setting the question paper and standardising the marking.
4. In this article the term 'script' denotes a single candidate's written response to a single examination unit, not their entire assessed work for the qualification.
5. A ranking of N objects yields [N(N−1)]/2 paired comparisons.
6. OCR (Oxford, Cambridge and Royal Society of Arts) is one of the three unitary awarding bodies in the United Kingdom that administer GCSE and A‐level examinations.
7. The z‐statistic is the standardised residual, that is, the residual divided by its standard error. It is expected to approximate to a unit normal distribution. The residuals are obtained by comparing the observed value of each paired comparison with the expectation derived from the model.
8. In the interests of fairness, one would wish to replicate the awarding meeting to investigate whether this might reproduce the original outcomes. However, the awarding procedures would ensure that the same range of scripts was scrutinised (since the score distribution would not change), which would bias the study towards replicating the original outcomes.
9. Cresswell (1997) has advocated using a purely statistical approach in awarding to avoid 'contamination' (not his word!) from the expert judgement in the case where the award meeting is maintaining a year‐on‐year standard in a stable specification.
10. This is the phenomenon occurring when the presence of others results in a decrease of individual effort. Sometimes this phenomenon is referred to as the Ringelmann Effect, describing the inverse relationship between the number of people in the group and the individual performance (Wilke and Van Knippenberg 1988).
11. Two or three scripts would be needed in order to guard against attrition of usable scripts through examiner scaling, return of scripts to centres, unphotocopiable handwriting, etc.
12. Standards Over Time reports can be accessed at http://www.qca.org.uk/12086_1509.html.
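
Notes 5 and 7 describe two small calculations that a short sketch may make concrete: expanding one judge's ranking into its implied paired comparisons, and computing a standardised residual for a single comparison. The script labels and function names below are hypothetical, and the logistic form of the paired-comparison model is an assumption made for illustration; the record above does not state which model the study fitted.

```python
from itertools import combinations
from math import exp, sqrt


def ranking_to_pairs(ranking):
    """Expand a ranking (best first) into (winner, loser) pairs.

    A ranking of N objects yields N(N-1)/2 pairs (note 5).
    """
    # combinations() keeps input order, so the first element of each
    # pair is always the script ranked higher by the judge.
    return list(combinations(ranking, 2))


def standardised_residual(observed, measure_a, measure_b):
    """Residual divided by its standard error (note 7).

    ASSUMPTION: a logistic paired-comparison model, where the expected
    probability that A beats B depends on the difference in measures.
    observed is 1 if A was ranked above B, else 0.
    """
    expected = 1.0 / (1.0 + exp(-(measure_a - measure_b)))
    std_error = sqrt(expected * (1.0 - expected))
    return (observed - expected) / std_error


# Hypothetical ranking of four scripts by one judge, best first.
ranking = ["script_C", "script_A", "script_D", "script_B"]
pairs = ranking_to_pairs(ranking)
n = len(ranking)
assert len(pairs) == n * (n - 1) // 2
```

Under this assumed model, a comparison between two scripts with equal estimated measures has an expected value of 0.5, so either observed outcome gives a standardised residual of magnitude 1.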
