In this guest blog the founding father of CJ for education, Alastair Pollitt, responds to our previous post on “Taking the A out of CJ.”
Chris and Ian, I’m very grateful that we are allies in the campaign to Abolish Marksism – the false belief that we should try to measure a student’s ability by chopping it up into little bits rather than judging it all at once in a human way. Comparative judgement is so natural to teachers, calling on all the skill they have developed in the classroom, that it no longer surprises me when our ‘judges’ unanimously prefer it to marking. But let’s be clear why adaptivity is necessary in ACJ.
Of course it’s not always necessary. With plenty of judges, willing to make lots of judgements, and no deadlines to meet, you may do without it. But your three issues are in fact a pretty good basis for explaining why we found we do need adaptivity if CJ is to replace marking.
‘Equity’: This is a novel sort of ‘equity’ to me, and I can’t think up any moral basis for it (does a “response” have rights?). And where in the world does even a person, never mind a response, have the right to be compared to just any other? Not in sport, or politics, or employment, where you have to earn the right to be judged alongside your apparent superiors; nor in law, where a suspect has the right to be tried by a jury of their peers – not just anyone. If you think about people rather than their responses, then ‘trial by peers’, or adaptive comparative judgement, seems a better right. More on this later.
CAT: Analogies must not be pushed too far – there’s a crucial difference between ACJ and Computer Adaptive Testing. In CAT each person takes a different test, with the assumption that the items all measure the same thing. But in CJ every person takes the same test: the adaptivity in ACJ is only in the scoring, not in the test, and the assumption that every judge values the same things is easy to monitor. As Ian has proved, you don’t have to change the test at all for (A)CJ, and the candidates will not see anything different because all the changes come after the test is completed.
Thus the CAT problems you refer to can’t happen with ACJ. In my experience, every problem we’ve seen (that is, about 4 cases out of several hundred) has been traced back to a lack of consistency amongst the judges, a validity problem that applies equally to ACJ, CJ and Marking. There may be other problems to be found and, indeed, some may arise from the kind of adaptivity used. So of course more research is needed into every aspect of CJ – but especially into designing the best adaptive algorithms for particular purposes.
Efficiency: You’ve a similar misunderstanding in the argument about this. The system I mostly use does indeed manage an ACJ exercise by ‘rounds’ but, just as the judging is not visible to the candidates, so the ‘rounds’ are not visible to the judges. They just carry on judging, unaware of what round they are in, at their own pace – they never “have to wait for other judges”. Rounds are merely a device for managing the data, and for monitoring and reporting progress. They cause no delays or expense.
And the A in ACJ is essential for one reason: fairness. Here’s an ethical principle that has a clear moral basis: “Every person has a right to have the quality of their work judged with the same accuracy”. Principles like this do apply generally in law, health care and education: for instance, exam boards try to mark your work as carefully if you’re a ‘D’ as they do if you’re a ‘B’, and they try to include questions aimed at the ‘E’s as well as the ‘A*’s. Sometimes the principle is modified, for instance where some key boundary score is more important than others, but fairness is always the aim.
Random pairing for CJ cannot deliver on this principle, as the measurement accuracy for middling candidates will always be much better than for extreme ones. Even if you can ensure that every script is compared 30-40 times, random CJ will inevitably leave some responses inadequately measured. That’s not fair.
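The statistical reason is easy to illustrate. In the usual Bradley-Terry/Rasch model behind CJ, a single comparison is most informative when the two scripts are evenly matched, and nearly worthless when one is a foregone conclusion. The sketch below (a toy illustration, not any production ACJ system; the quality scale and script values are invented for the example) averages the Fisher information a script would gain per comparison if its opponents were chosen at random:

```python
import math

def win_prob(a, b):
    """Bradley-Terry / Rasch pair model: P(script a beats script b)."""
    return 1.0 / (1.0 + math.exp(-(a - b)))

def information(a, b):
    """Fisher information one comparison contributes: p * (1 - p).
    Maximised (0.25) when the two scripts are evenly matched."""
    p = win_prob(a, b)
    return p * (1.0 - p)

# Toy scale of script qualities in logits: -4 (weakest) .. +4 (strongest).
scripts = [i * 0.5 for i in range(-8, 9)]

def mean_info(target, pool):
    """Average information per comparison if `target` is paired at random."""
    others = [s for s in pool if s != target]
    return sum(information(target, s) for s in others) / len(others)

middling = mean_info(0.0, scripts)  # a mid-scale script
extreme = mean_info(4.0, scripts)   # the strongest script
# middling ≈ 0.106, extreme ≈ 0.055: each random comparison tells us
# about half as much about the extreme script as about the middling one.
print(f"middling: {middling:.3f}  extreme: {extreme:.3f}")
```

So even with 30–40 random comparisons per script, the extreme scripts end up with much larger standard errors than the middling ones; an adaptive pairing rule can restore the balance by giving extreme scripts more closely matched opponents.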
So, to summarise, I think my principle of ‘fairness’ trumps your principle of ‘equity’, as I can justify it on moral grounds. And if paired comparison is to be at the heart of educational assessment, as we all agree it should be, then the pairings are too important to be left to chance. For most purposes, some sort of algorithm is essential to ensure that the assessment procedure is fit for its intended purpose. The adaptivity algorithms we use are designed to meet this validity criterion: in essence, the A in ACJ is a toolbox full of strategies that can be combined in various ways into an algorithm that will best serve the purpose of a particular assessment.
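The post doesn’t spell out any particular algorithm from that toolbox, but one simple strategy of the kind it describes can be sketched as follows (a hypothetical illustration, not Pollitt’s actual system): after each round of judging, sort the scripts by their current quality estimates and pair each with a near neighbour it has not yet met, so every comparison is as informative as possible.

```python
def adaptive_pairs(estimates, seen):
    """Sketch of one adaptive pairing strategy (Swiss-tournament style).

    estimates: dict mapping script id -> current quality estimate (logits)
    seen: set of frozensets of ids already compared with each other
    Returns a list of (id, id) pairs for the next batch of judgements.
    """
    order = sorted(estimates, key=estimates.get)  # weakest to strongest
    free = set(order)
    pairs = []
    for i, a in enumerate(order):
        if a not in free:
            continue
        # Pair with the nearest stronger script not yet met.
        for b in order[i + 1:]:
            if b in free and frozenset((a, b)) not in seen:
                pairs.append((a, b))
                free.discard(a)
                free.discard(b)
                break
    return pairs

# Hypothetical example: four scripts with current estimates in logits.
est = {"s1": -1.2, "s2": 0.1, "s3": 0.3, "s4": 2.0}
print(adaptive_pairs(est, set()))                      # neighbours paired
print(adaptive_pairs(est, {frozenset(("s1", "s2"))}))  # repeat pair avoided
```

Swapping the neighbour rule, the no-repeats constraint, or the sort key gives different members of the ‘toolbox’: for instance, prioritising scripts with the fewest judgements so far, or widening the allowed gap in early rounds when the estimates are still rough.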
Let’s agree, at least, that some sort of rules can improve the choosing of pairings, rather than leaving them entirely to chance. Only statisticians worship randomness as an ethical principle.