How can we improve the efficiency of Comparative Judgement?
Pollitt (2012) suggested that an adaptive procedure could lead to efficiency gains, a claim we have discussed in this blog. Under an adaptive procedure, scripts should only be compared with other scripts of a similar standard. In this way the information gained from every comparison is maximised.
The problem with applying the adaptive procedure to CJ is that all the scripts start out with an unknown standard. Only through comparison will the standard become apparent.
The question becomes, therefore, at what point can the estimated standards of the scripts be trusted enough to start to use them in the selection of pairs?
Start too early, and you risk major misclassification errors – a very weak script could be classified as strong. Start too late and you risk multiple minor misclassification errors – the rank order is broadly correct, but the error surrounding the very best and worst scripts is quite high.
One solution, which we have now implemented, is the progressive adaptive method (Barrada, Olea, Ponsoda, and Abad, 2008, 2010; Revuelta and Ponsoda, 1998). The progressive method combines two components: the information expected from a comparison and a random element.
The information expected from every possible comparison is first converted into a probability of selection. The pairs offering the highest information will have the highest probability of selection.
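A minimal sketch of this conversion in Python may help. It assumes a logistic (Rasch or Bradley-Terry style) model in which the information from a comparison is p(1 - p), where p is the probability that one script beats the other. The model, the function names and the example scores are assumptions for illustration, not taken from the nomoremarking implementation, and the sketch is not expected to reproduce the figures in this post.

```python
import math


def comparison_information(score_a, score_b):
    """Information from comparing two scripts under an assumed logistic model."""
    # Probability that script A is judged better than script B.
    p = 1.0 / (1.0 + math.exp(-(score_a - score_b)))
    # Information peaks at 0.25 when the two scripts are equally matched.
    return p * (1.0 - p)


def selection_probabilities(target, scores, excluded=()):
    """Normalise the information of each candidate pairing into a probability."""
    info = {
        script: comparison_information(scores[target], score)
        for script, score in scores.items()
        if script != target and script not in excluded
    }
    total = sum(info.values())
    return {script: value / total for script, value in info.items()}


# Hypothetical estimated standards (in logits) for three scripts.
scores = {"A": 0.0, "B": 0.2, "C": 3.0}
probs = selection_probabilities("A", scores)
```

Here script B, which is close in standard to A, receives a much higher probability of selection than script C.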
The random element transforms this probability distribution into a purely random distribution at the start of a test, but progressively decays throughout the test until, at the end, only the original probability distribution remains. Both the importance of the random element and its rate of decay can be specified.
So, while the standards of scripts are relatively unknown at the start of a test, the adaptive element can be minimised. As the standards become known, the adaptive element will progressively increase and the random element will decay.
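The blending of the random and adaptive elements can be sketched as follows. This is an illustration under stated assumptions rather than the code behind the site: the weight follows the power-sum form described by Barrada et al. (2008), the random element is modelled as a uniform distribution over the candidates, and the function names and example probabilities are hypothetical.

```python
def progressive_weight(q, total, acceleration=1.0):
    """Weight on the information-based distribution at comparison q of `total`.

    The weight is 0 at the first comparison and rises to 1 at the last.
    """
    if q <= 1:
        return 0.0
    numerator = sum((f - 1) ** acceleration for f in range(1, q + 1))
    denominator = sum((f - 1) ** acceleration for f in range(1, total + 1))
    return numerator / denominator


def progressive_probabilities(info_probs, q, total, acceleration=1.0):
    """Blend a uniform distribution with the information-based distribution."""
    w = progressive_weight(q, total, acceleration)
    uniform = 1.0 / len(info_probs)
    return {
        script: (1.0 - w) * uniform + w * p
        for script, p in info_probs.items()
    }


# Hypothetical information-based probabilities for four candidate scripts.
info_probs = {"A": 0.5, "B": 0.5, "C": 0.0, "D": 0.0}
early = progressive_probabilities(info_probs, q=2, total=20)   # close to uniform
late = progressive_probabilities(info_probs, q=20, total=20)   # equals info_probs
```

Early in the test every candidate keeps a real chance of selection, even those the information function would rule out; by the final comparison the information-based distribution is used unchanged.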
For example, consider a simple problem. You wish to use CJ to rank order the numbers 1 to 6 in ascending order. The first set of comparisons will consist of random pairs, as all scripts have the same true score of 0. Following three judgements, we may have a set of true scores similar to the following:

| Script | 1 | 5 | 2 | 6 | 3 | 4 |
|---|---|---|---|---|---|---|
| True Score | 5.05 | 5.05 | 4.15 | 4.15 | 4.15 | 5.05 |

At this point the true scores of the scripts could be misleading in script selection. The number 5 is ranked higher than all but the number 1 simply because it was judged to be a lower number than 6.
Let’s say the next script to be selected was 2. If the information function was used alone, the probability of selection would be as follows:

| Script | 1 | 5 | 2 | 6 | 3 | 4 |
|---|---|---|---|---|---|---|
| Probability of selection | 0.5 | 0.5 | N/A | N/A | 0.0 | 0.0 |

Only comparisons with scripts 1 and 5 would be considered to offer any useful information!
As this is only the second pairing for this script, however, and we are expecting 20 pairings in total, with an acceleration parameter set at 1, the random element converts the probability distribution to the following:

| Script | 1 | 5 | 2 | 6 | 3 | 4 |
|---|---|---|---|---|---|---|
| Probability of selection | 0.34 | 0.34 | N/A | N/A | 0.1676 | 0.1525 |

All scripts now have a reasonable probability of selection. Higher values of the acceleration parameter would also increase the importance of the random element and smooth out the probability distribution. A value of 2, for example, yields the following probability distribution:

| Script | 1 | 5 | 2 | 6 | 3 | 4 |
|---|---|---|---|---|---|---|
| Probability of selection | 0.2573 | 0.2573 | N/A | N/A | 0.2436 | 0.2419 |
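
To get a feel for how strongly the acceleration parameter affects the early comparisons, suppose the weight given to the information at comparison q of Q follows the power-sum form described by Barrada et al. (2008), with acceleration parameter t. This form is an assumption here; the figures in the tables above come from the implementation and need not match it exactly. For the second of 20 comparisons:

```latex
W_q = \frac{\sum_{f=1}^{q} (f-1)^{t}}{\sum_{f=1}^{Q} (f-1)^{t}}, \qquad W_1 = 0

t = 1:\quad W_2 = \frac{0 + 1}{0 + 1 + \dots + 19} = \frac{1}{190} \approx 0.0053

t = 2:\quad W_2 = \frac{0^2 + 1^2}{0^2 + 1^2 + \dots + 19^2} = \frac{1}{2470} \approx 0.0004
```

Under this assumption, raising the acceleration parameter from 1 to 2 cuts the weight on the information by more than a factor of ten at this stage, which is consistent with the flatter distribution.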

After 9 judgements the standards of the scripts are estimated with more precision, yielding the following estimated true scores. While the rank order is now correct, the true scores of 1 and 2 are very close together.

| Script | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| True Score | 5.47 | 5.45 | 2.33 | 0.12 | 2.57 | 6.02 |

If script 1 is chosen as the next script to be paired, the probability distribution of selection would be as follows (assuming it has been compared twice already, and none of the other scripts are eliminated because they have already been compared to 1):

| Script | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Probability of selection | N/A | 0.5031 | 0.2816 | 0.1331 | 0.0615 | 0.0207 |

This time the obvious potential pairs for script 1 are script 2 and script 3.
Overall, the progressive adaptive algorithm seems very promising for Comparative Judgement script selection, and early tests suggest it could reduce the number of comparisons required by around a third. As we find out more we’ll post the results here.
In the meantime, try out the algorithm at www.nomoremarking.com, and use our open source package to test out different values of the acceleration parameter at https://github.com/NoMoreMarking/cj.
Barrada, J. R., Olea, J., Ponsoda, V., and Abad, F. J. (2008). Incorporating randomness to the Fisher information for improving item exposure control in CATs. British Journal of Mathematical and Statistical Psychology, 61, 493–513. doi: 10.1348/000711007X230937
Barrada, J. R., Olea, J., Ponsoda, V., and Abad, F. J. (2010). A method for the comparison of item selection rules in computerized adaptive testing. Applied Psychological Measurement, 34, 438–452. doi: 10.1177/0146621610370152
Pollitt, A. (2012). The method of Adaptive Comparative Judgement. Assessment in Education: Principles, Policy & Practice, 19(3).
Revuelta, J., and Ponsoda, V. (1998). A comparison of item exposure control methods in computerized adaptive testing. Journal of Educational Measurement, 35, 311–327. doi: 10.1111/j.1745-3984.1998.tb00541.x