A Reliable Scale for Every Judge

On the NoMoreMarking dashboard you will now see a ByJudge script selection type.

The new ByJudge type was created in response to an enquiry from a PhD student. She wanted to build a scale of image preferences so she could see whether positions on that scale were related to personality characteristics. She had tried using Likert scales, but found that the majority of responses were 4 or 5.

To create a scale for every judge, we needed to ensure that a judge would not see the same pair of items again until that judge had exhausted every pair. Our standard algorithm works differently: if one judge sees a particular pair, we mark that pair as judged, and it won’t be seen again by any judge until every pair has been exhausted. The standard algorithm assumes that judges will be relatively consistent as a whole.
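
To make the shared behaviour concrete, here is a minimal Python sketch of a pool of pairs shared by all judges. It is an illustration only, not the NoMoreMarking code: the class name SharedPairPool, the method next_pair and the random ordering within a pass are all assumptions.

```python
import itertools
import random


class SharedPairPool:
    """Hypothetical sketch: one pool of unseen pairs shared by all judges."""

    def __init__(self, items, seed=None):
        # Every unordered pair of items, e.g. 6 pairs for 4 items.
        self.all_pairs = list(itertools.combinations(items, 2))
        self.rng = random.Random(seed)
        self.unseen = []

    def next_pair(self, judge_id):
        # judge_id is deliberately ignored: a pair seen by any judge
        # counts as judged for every judge.
        if not self.unseen:
            # Every pair has been exhausted, so start a fresh pass.
            # (Shuffling the pass is an assumption of this sketch.)
            self.unseen = self.all_pairs.copy()
            self.rng.shuffle(self.unseen)
        return self.unseen.pop()
```

With four items and two judges taking turns, the first six calls return six different pairs, so neither judge sees the full set.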

To focus on judge differences, we set up the ByJudge algorithm so that every judge is presented with every pair of items until that judge has exhausted the pairs. At that point the algorithm simply selects pairs at random. The student had 20 images, which yields 20 × 19 / 2 = 190 unique pairs for every judge.
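
The per-judge behaviour described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the code behind the dashboard: the class name ByJudgePairPool, the method next_pair, the image labels and the random ordering of each judge's first pass are all hypothetical.

```python
import itertools
import math
import random


class ByJudgePairPool:
    """Hypothetical sketch: each judge works through their own pool of pairs."""

    def __init__(self, items, seed=None):
        self.all_pairs = list(itertools.combinations(items, 2))
        self.rng = random.Random(seed)
        self.unseen_by_judge = {}

    def next_pair(self, judge_id):
        if judge_id not in self.unseen_by_judge:
            # A new judge starts with every pair still unseen.
            # (Shuffling the order is an assumption of this sketch.)
            pairs = self.all_pairs.copy()
            self.rng.shuffle(pairs)
            self.unseen_by_judge[judge_id] = pairs
        unseen = self.unseen_by_judge[judge_id]
        if unseen:
            return unseen.pop()
        # This judge has exhausted every pair: fall back to a random pair.
        return self.rng.choice(self.all_pairs)


images = [f"image_{i}" for i in range(20)]
pool = ByJudgePairPool(images, seed=0)
first_pass = {pool.next_pair("judge_a") for _ in range(190)}
print(len(first_pass), math.comb(20, 2))  # prints "190 190"
```

The first 190 calls for a judge return 190 different pairs, matching the count for 20 images, and a second judge would get their own full set regardless of what the first judge has seen.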

Of course, a judge doesn’t need to complete all 190 pairs to produce a reliable scale at the individual judge level. At present, however, we don’t know how many judgements a judge needs to complete to reach a sufficient level of reliability. As ever, there is a trade-off between reliability and efficiency. Once we have collected some data we will be able to answer that question.

As ever, you can take a look at the algorithm here – and if you can think of a better method of achieving her aims please join in and contribute to the code!

