Exploring the feasibility of a two-stage anchored-adaptive comparative judgement method for equating examinations

A guest post by Nathan Zoanetti, Victorian Curriculum and Assessment Authority

The Victorian Curriculum and Assessment Authority (VCAA) has been exploring the method of comparative judgement for equating different Victorian Certificate of Education (VCE) examinations from previous calendar years.

The rationale for exploring comparative judgement for this purpose relates to the inapplicability of more conventional equating designs in the VCE assessment context. Common item designs are precluded because all examination forms are publicly released after each examination period, leaving no secure common items for subsequent equating forms. Common person designs are precluded by the markedly different timing of the instruction and certification of the cohorts being equated, and by concerns that motivation would be compromised if unscored equating forms were used. Finally, general ability reference tests may not guarantee the level of correlation required to equate exams across all learning areas or assessment modalities.

The potential of the comparative judgement method and its variants for monitoring the comparability of outcomes from different exams and for equating scores from different exams onto a common scale has previously been illustrated by Bramley (2005). More recently, the application of comparative judgement in a high-profile test equating context has been demonstrated by Humphry and McGrane (2015) within an Item Response Theory (IRT) framework. 

These examples demonstrate how comparative judgement provides a means for re-aligning two or more observed or latent score distributions derived from different tests undertaken by potentially non-equivalent (in terms of proficiency) groups of students via a common anchor metric.

In 2015, the comparative judgement equating process was trialled for three senior secondary subjects, Accounting, Chemistry and English as an Additional Language (EAL), using samples of archived examination responses from previous calendar years.

In addition to using the distributed pairing algorithm to establish the equating transformations, a two-stage anchored-adaptive approach was also trialled for Chemistry using the No More Marking platform.

The first stage of this approach involved applying the distributed pairing algorithm to a more or less uniform sample of Exam A scripts to estimate quality measures for each script. In this first stage, each script was included in around 30 comparisons. The second stage involved introducing a uniform sample of scripts from Exam B and adaptively pairing these with scripts from Exam A, having anchored the scale locations of all Exam A scripts based on the final estimates derived from the first stage of judgements.
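The two-stage idea can be sketched with a simple Bradley-Terry model, in which the probability that one script wins a comparison is a logistic function of the difference in script qualities. The sketch below is illustrative only (it is not the No More Marking implementation, and the script identifiers, true qualities and judgement counts are invented): stage one estimates Exam A script measures from A-versus-A judgements, and stage two holds those measures fixed as anchors while estimating Exam B measures from A-versus-B judgements.

```python
import math
import random

def fit_bt(comparisons, thetas, free, lr=0.5, iters=300):
    """Bradley-Terry fit by normalised gradient ascent.
    comparisons: list of (winner, loser) script ids.
    thetas: dict id -> quality estimate; ids not in `free` stay anchored.
    free: set of ids whose estimates are updated."""
    counts = {i: 0 for i in free}
    for w, l in comparisons:
        if w in free:
            counts[w] += 1
        if l in free:
            counts[l] += 1
    for _ in range(iters):
        grad = {i: 0.0 for i in free}
        for w, l in comparisons:
            # P(winner beats loser) under the current estimates
            p = 1.0 / (1.0 + math.exp(thetas[l] - thetas[w]))
            if w in free:
                grad[w] += 1.0 - p
            if l in free:
                grad[l] -= 1.0 - p
        for i in free:
            if counts[i]:
                thetas[i] += lr * grad[i] / counts[i]
    return thetas

# Invented true qualities for six scripts per exam (logit spacing 0.5).
rng = random.Random(1)
true_q = {**{f"A{i}": 0.5 * i for i in range(6)},
          **{f"B{i}": 0.5 * i + 0.25 for i in range(6)}}

def judge(i, j):
    """Simulate one paired comparison under the true qualities."""
    p = 1.0 / (1.0 + math.exp(true_q[j] - true_q[i]))
    return (i, j) if rng.random() < p else (j, i)

a_ids = [f"A{i}" for i in range(6)]
b_ids = [f"B{i}" for i in range(6)]

# Stage 1: A-vs-A judgements establish the reference scale.
stage1 = [judge(a_ids[i], a_ids[j])
          for _ in range(20) for i in range(6) for j in range(i + 1, 6)]
thetas = {i: 0.0 for i in a_ids + b_ids}
fit_bt(stage1, thetas, set(a_ids))

# Stage 2: A-vs-B judgements place Exam B scripts on the anchored scale;
# the Exam A estimates are excluded from `free`, so they stay fixed.
stage2 = [judge(a, b) for _ in range(10) for a in a_ids for b in b_ids]
fit_bt(stage2, thetas, set(b_ids))
```

Because the Exam A measures are frozen in stage two, the Exam B estimates land directly on the Exam A scale rather than on an arbitrary scale of their own.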

One of the practical benefits of this two-stage approach is that the reference scale based on Exam A scripts can be established in advance of operational assessment periods for Exam B, at least assuming that Exam A precedes Exam B by some period, which will of course be the case in longitudinal monitoring of exam difficulty and cohort ability. This means that the equating of Exam B scores onto the Exam A scale can be expedited through a series of adaptive comparative judgements (see Pollitt, 2012) as soon as the Exam B scripts become available, making it easier to meet any Exam B reporting deadlines.

Fifty experienced assessors completed over 11,000 paired comparisons under a range of different script pairing algorithms to produce psychometrically reliable script quality scales for the three subjects in the trial. The judgement data were analysed in R (R Core Team, 2015) using the sirt package (Robitzsch, 2015). Separation reliability indices in excess of 0.9 were recorded for all three subjects. For the anchored-adaptive stage in Chemistry, the separation reliability index was as high as 0.98, with an average of around 10 comparisons per Exam B script. The correlation between the script quality measures derived through the anchored-adaptive method and the original scores awarded using prescribed marking guides was 0.96 for both exams.
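The separation reliability index reported above is analogous to Rasch person-separation reliability: the proportion of observed variance in the script measures that is not attributable to measurement error. The sirt package reports this directly; a minimal sketch of the underlying computation, with invented measures and standard errors, might look like:

```python
def separation_reliability(thetas, ses):
    """Separation reliability: share of observed variance in the
    measures (thetas) not explained by their standard errors (ses)."""
    n = len(thetas)
    mean = sum(thetas) / n
    obs_var = sum((t - mean) ** 2 for t in thetas) / (n - 1)
    err_var = sum(s * s for s in ses) / n
    return (obs_var - err_var) / obs_var

# Invented example: well-separated measures with small standard errors
# yield a reliability close to 1.
r = separation_reliability([0.0, 1.0, 2.0, 3.0, 4.0], [0.1] * 5)
```

Intuitively, adaptive pairing drives this index up: pairing scripts of similar quality makes each judgement maximally informative, shrinking the standard errors relative to the spread of the measures.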

Chained mean and chained linear equating transformations (see Kolen and Brennan, 2004) were modelled, along with corresponding bootstrap standard errors, revealing equating adjustments that were consistent in direction with other available external measures of the relative academic abilities of the groups undertaking the different examinations.
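As a generic illustration of the mean and linear steps in such transformations (this is a textbook sketch, not the VCAA's exact chain, and the score values below are invented): mean equating shifts Exam B scores by the difference in means, linear equating additionally matches standard deviations, and a bootstrap standard error can be obtained by resampling each score distribution.

```python
import random
import statistics

def mean_equate(x, scores_x, scores_y):
    """Map score x from the X scale to the Y scale by matching means."""
    return x + statistics.mean(scores_y) - statistics.mean(scores_x)

def linear_equate(x, scores_x, scores_y):
    """Map score x from the X scale to the Y scale by matching
    both means and standard deviations."""
    mx, my = statistics.mean(scores_x), statistics.mean(scores_y)
    sx, sy = statistics.stdev(scores_x), statistics.stdev(scores_y)
    return my + (sy / sx) * (x - mx)

def bootstrap_se(x, scores_x, scores_y, equate, reps=500, seed=0):
    """Bootstrap standard error of an equated score: resample each
    score distribution with replacement and recompute the transform."""
    rng = random.Random(seed)
    vals = []
    for _ in range(reps):
        bx = [rng.choice(scores_x) for _ in scores_x]
        by = [rng.choice(scores_y) for _ in scores_y]
        vals.append(equate(x, bx, by))
    return statistics.stdev(vals)

# Invented score samples on the two scales.
sx = [40, 50, 60]
sy = [45, 55, 65]
eq = mean_equate(60, sx, sy)
se = bootstrap_se(60, sx, sy, mean_equate)
```

In the chained version, the same transformations are composed through the intermediate comparative judgement scale, with the bootstrap applied to the whole chain.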

These results were seen as encouraging and will form the basis for further work throughout 2016.

The VCAA is grateful to Chris Wheadon for his assistance throughout the trial and for setting up and testing the functionality for the two-stage anchored-adaptive method at short notice.
Bramley, T. (2005). A rank-ordering method for equating tests by expert judgment. Journal of Applied Measurement, 6(2), 202-223.

Humphry, S., & McGrane, J. (2015). Equating a large-scale writing assessment using pairwise comparisons of performances. The Australian Educational Researcher, 42(4), 443-460.

Kolen, M. J., & Brennan, R. L. (2004). Test equating: Methods and practices (2nd ed.). New York: Springer.

Pollitt, A. (2012). Comparative judgement for assessment. International Journal of Technology and Design Education, 22(2), 157-170.

R Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

Robitzsch, A. (2015). sirt: Supplementary Item Response Theory Models. R package version 1.8-9. http://CRAN.R-project.org/package=sirt
