When you hear people talk of Comparative Judgement, you will often hear them call it ACJ: Adaptive Comparative Judgement.
The Adaptive came in with Pollitt’s 2012 paper, The Method of Adaptive Comparative Judgement. Now Pollitt inspired us to create No More Marking, and it seems churlish to take issue with the man who brought Comparative Judgement back from the wilderness, but we really don’t think that Comparative Judgement needs to be Adaptive.
Comparative Judgement involves comparing pairs of items and deciding which of the pair better meets a criterion. For example, you might see two English essays and have to decide which is the better essay. As the judgements build up, a scale emerges, ordering the essays from best to worst.
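One way to picture how pairwise judgements turn into a scale is the Bradley–Terry model, a standard statistical model for paired comparisons. The sketch below is illustrative only, not No More Marking’s production algorithm: it fits a score per item by gradient ascent, where the probability that one item beats another depends on the difference in their scores.

```python
import math

def fit_bradley_terry(items, judgements, iterations=200, lr=0.1):
    """Estimate a quality score for each item from pairwise judgements.

    `judgements` is a list of (winner, loser) pairs. Scores are fitted by
    gradient ascent on the Bradley-Terry log-likelihood: the probability
    that item a beats item b is sigmoid(score[a] - score[b]).
    """
    scores = {item: 0.0 for item in items}
    for _ in range(iterations):
        grads = {item: 0.0 for item in items}
        for winner, loser in judgements:
            # Probability the model currently assigns to the observed win.
            p_win = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            grads[winner] += 1.0 - p_win   # push the winner up
            grads[loser] -= 1.0 - p_win    # push the loser down
        for item in items:
            scores[item] += lr * grads[item]
    # Centre the scale on zero so scores are comparable across runs.
    mean = sum(scores.values()) / len(scores)
    return {item: s - mean for item, s in scores.items()}
```

Given judgements where essay A beats B and C, and B beats C, the fitted scores order the essays A > B > C, which is exactly the scale from best to worst described above.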
Comparative Judgement is powerful for two reasons.
Firstly, we are far better at making relative judgements between two stimuli than absolute judgements of a single stimulus. Ask anyone to compare the weight of two objects by hand and they will be able to tell you which is heavier. Ask them to judge the weight of a single object, without using scales, and you will get a wide range of answers.
Secondly, you are able to distribute judgements among a large number of judges, so the bias that each judge brings to the process will be cancelled out, as long as you have a sufficient number of judges.
Comparative Judgement is powerful as it has a sound theoretical basis!
Now, for the adaptive…
Pollitt has suggested that once you start to build up a scale, and gain information about that scale, you can choose better pairs to present to judges. There is little point in wasting judges’ time, he argues, by presenting a pair of scripts that are of very different standards. You can, instead, choose scripts to judge that are of similar standards. He calls this process Adaptive Comparative Judgement. In essence the adaptivity is designed to improve the efficiency of the judging. By targeting judgements, he reasons, you can make fewer of them and still reach the same levels of precision.
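The targeting idea can be sketched as follows. This is an illustration of the general principle, not Pollitt’s exact algorithm: among the pairs not yet judged, pick the two items whose current score estimates are closest, on the grounds that a close contest tells you the most about the scale.

```python
from itertools import combinations

def pick_adaptive_pair(scores, judged_pairs):
    """Pick the next pair to present to a judge, adaptively.

    `scores` maps item -> current estimated quality; `judged_pairs` is a
    set of pairs already presented. Returns the unjudged pair with the
    smallest gap in estimated quality, or None if every pair is judged.
    """
    best_pair, best_gap = None, float("inf")
    for a, b in combinations(sorted(scores), 2):
        if (a, b) in judged_pairs or (b, a) in judged_pairs:
            continue  # skip pairs a judge has already seen
        gap = abs(scores[a] - scores[b])
        if gap < best_gap:
            best_pair, best_gap = (a, b), gap
    return best_pair
```

With estimated scores {A: 2.0, B: 1.9, C: 0.0} and no judgements yet made, this selects A versus B, the close contest, rather than wasting a judge’s time on A versus C, the pair of very different standards.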
In theory, this sounds fine, but in practice we have found adaptivity presents some real issues.
Firstly, there is an equity issue. We think that every person’s response has a right to an equal chance of being compared with every other person’s response, including the best. Even after a response has accumulated a number of judgements, there remains a chance that it can beat any other response, especially when a contest is judged by a new judge. Adaptive testing selects test questions; Comparative Judgement selects people’s work, and once you think about people rather than questions, the equity issue becomes important.
Secondly, adaptive testing, the analogy on which Pollitt draws, has been explored thoroughly in the literature, while Adaptive Comparative Judgement has not. In adaptive testing it was found early on that tests of variable length could bias ability estimates when the item pool was heterogeneous or offered poor coverage. We think such issues require more exploration before adaptivity is used in high-stakes Comparative Judgement.
Finally, we are not convinced by the gains in efficiency claimed for adaptivity. In the Pollitt model judging is done in rounds, and a new round cannot begin until every judge has finished the previous one. In that model, adaptivity can only be turned on after a number of rounds (plus waiting time) have been completed. So, far from increasing efficiency, we think that adaptivity may reduce it, because judges have to wait for other judges. Meanwhile you have to manage the process of rounds, which makes the whole exercise more expensive.
Rather than work in rounds, we prefer our judges to work at their own pace. This means we don’t have to monitor them continually, and while we could switch on adaptivity towards the end of the judging, we don’t think the judges who are proceeding most slowly should be given the hardest judgements to make.
Overall, however, we think the best selection of pairs in the judging process is likely to be context dependent. That is why we are working with the University of Antwerp to open-source our key Comparative Judgement algorithms, so users will be able to make their own decisions.
In the meantime our algorithms ARE adaptive, but we only turn on adaptivity once every possible pair has been presented! That seems to us the fairest and most efficient way to administer Comparative Judgement.
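That complete first pass can be pictured with a short sketch (an illustration of the idea, not our production code): for n items there are n × (n − 1) / 2 possible pairs, and presenting them in a shuffled order gives every response an equal chance of meeting every other response before any adaptivity kicks in.

```python
import random
from itertools import combinations

def full_pairing_schedule(items, seed=None):
    """Every possible pair of items, in random order.

    Presenting this full schedule first means each response gets an equal
    chance of being compared with every other response, including the
    best, before any adaptive pair selection is switched on.
    """
    pairs = list(combinations(items, 2))  # n * (n - 1) / 2 pairs
    random.Random(seed).shuffle(pairs)
    return pairs
```

For 8 items this schedule contains 28 distinct pairs, each presented exactly once.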
If you still need to be convinced about efficiency, try out our Colours test. With 16 judgements you get a perfect scale of 8 items. Now we think that is efficient!
Dr Chris Wheadon
Dr Ian Jones