Does Comparative Judgement Need to be Adaptive? (Part 2)

Following our discussions in this blog with Alastair Pollitt on whether or not Comparative Judgement needs to be adaptive, we decided to run some simulations.

We ran 100 simulations of the same 50 candidates with known true scores, each simulation receiving 500 judgements from reasonably consistent judges, under both the progressive adaptive method and the distributed method. More details on these methods are here.
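To give a feel for what one such simulation involves, here is a minimal Python sketch. It is not the code behind the results reported here: it assumes a Bradley-Terry model for the judges, pairs scripts uniformly at random (a stand-in for neither of the two methods above), and estimates ability with a simple ridge-penalised fit. The function names, the seed, and the choice of true scores drawn from a standard normal distribution are all hypothetical.

```python
import math
import random


def simulate_judgements(true_scores, n_judgements, rng):
    """Return (winner, loser) index pairs for randomly paired scripts."""
    n = len(true_scores)
    results = []
    for _ in range(n_judgements):
        a, b = rng.sample(range(n), 2)  # two distinct scripts
        # Assumed judge model: Bradley-Terry probability that a beats b.
        p_a = 1.0 / (1.0 + math.exp(true_scores[b] - true_scores[a]))
        results.append((a, b) if rng.random() < p_a else (b, a))
    return results


def fit_bradley_terry(results, n, iterations=300, ridge=0.1):
    """Estimate abilities by maximising a ridge-penalised log-likelihood."""
    appearances = [0] * n
    for w, l in results:
        appearances[w] += 1
        appearances[l] += 1
    # Conservative per-script step sizes, from an upper bound on the curvature.
    step = [1.0 / (ridge + 0.5 * k) for k in appearances]
    theta = [0.0] * n
    for _ in range(iterations):
        grad = [-ridge * t for t in theta]  # gradient of the ridge penalty
        for w, l in results:
            p_w = 1.0 / (1.0 + math.exp(theta[l] - theta[w]))
            grad[w] += 1.0 - p_w
            grad[l] -= 1.0 - p_w
        theta = [t + s * g for t, s, g in zip(theta, step, grad)]
    return theta


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # hypothetical seed, for repeatability only
true_scores = [rng.gauss(0.0, 1.0) for _ in range(50)]
results = simulate_judgements(true_scores, 500, rng)
estimates = fit_bradley_terry(results, 50)
print(round(pearson(true_scores, estimates), 2))
```

Repeating the last five lines with 100 different seeds, and swapping in a different pairing rule inside simulate_judgements, would give the kind of comparison described above.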

For all methods, as the number of judgements increases, so does the correlation with the true score. The correlation hardly differs by method, generally reaching 0.88 after 500 judgements. At this point you may conclude that the script selection method is irrelevant.

A closer inspection, however, reveals that the progressive adaptive method, with an acceleration parameter of 2, minimises the standard errors. Under this method, on a standardised scale with a mean of 100 and a standard deviation of 15, the estimate of every candidate's ability is consistent to within around 6 points. The distributed method has a long tail of standard errors, suggesting that it may be poor at estimating the ability of some candidates.
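For readers who want to see how a standard error ends up on the 100/15 scale, here is a rough Python sketch. It assumes standard errors are taken from the Bradley-Terry information (the sum of p(1 - p) over a script's comparisons) and that the scale is a simple linear transformation of the estimates. Both are common choices, but we are presenting them here as assumptions rather than as the exact calculation used in the simulations, and the function names are hypothetical.

```python
import math
import statistics


def standard_errors(theta, results):
    """Approximate standard error per script from the model information."""
    info = [0.0] * len(theta)
    for w, l in results:
        # Probability, under the fitted abilities, that w beats l.
        p = 1.0 / (1.0 + math.exp(theta[l] - theta[w]))
        info[w] += p * (1.0 - p)
        info[l] += p * (1.0 - p)
    # A script that was never judged has no information at all.
    return [1.0 / math.sqrt(i) if i > 0 else float("inf") for i in info]


def to_standard_scale(theta, se, mean=100.0, sd=15.0):
    """Linearly rescale estimates and their standard errors."""
    mu = statistics.fmean(theta)
    sigma = statistics.pstdev(theta)
    factor = sd / sigma
    scaled = [mean + factor * (t - mu) for t in theta]
    scaled_se = [factor * s for s in se]  # errors stretch by the same factor
    return scaled, scaled_se


# Toy example: three scripts and four judgements, written as (winner, loser).
theta = [-1.0, 0.0, 1.0]
results = [(1, 0), (2, 1), (2, 0), (0, 1)]
scaled, scaled_se = to_standard_scale(theta, standard_errors(theta, results))
for score, error in zip(scaled, scaled_se):
    print(round(score, 1), round(error, 1))
```

Because the information term p(1 - p) is largest when two scripts are evenly matched, a method that pairs scripts of similar ability gathers more information per judgement, which is one way to read the lower standard errors of the adaptive method.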

Finally, an inspection of the standard errors by ability suggests, as Pollitt has predicted in this blog, that the standard errors of the very worst and the very best scripts are minimised by the use of an adaptive method.

So which script selection method should you use?

If you need to estimate the ability of the very best and the very worst candidates with precision, and you trust your judges, then you should choose the adaptive method. On we have set the acceleration parameter for the progressive adaptive method to 2, as this value appears to yield the lowest standard errors.

But beware! As we caution elsewhere, the adaptive method places great faith in the fastest judges, who get the easiest task, and we think it is under this method that a rogue judge entering the pot early could do the most damage! In a further blog we will consider the impact of rogue judges.

All the simulations reported here can be reproduced using the open source comparative judgement package that powers .


