How highly should tests correlate?

I received a question this week from a data manager about a creative writing examination that his school had administered to its Year 8s and judged using nomoremarking.com. He was interested in the correlations between the results on the creative writing test and the other test data the school held for those students: a test of general abilities administered at the start of Year 7 (baseline), and Key Stage 2 (KS2) results.

Here are the correlations.

He seemed disappointed that the correlations for the creative writing test were so low. What he hadn’t considered was the impact of measurement error in the tests.

If we want to know the relationship between the constructs the tests are attempting to measure, we need to compensate for measurement error. Spearman invented a simple technique, the disattenuated correlation, to do just this: divide the observed correlation by the square root of the product of the two tests’ reliabilities.

Let’s assume that the reliabilities of these tests are 0.9 for Key Stage 2, 0.9 for the generic baseline test, and 0.75 for the writing test (we don’t all agree on writing, so let’s allow a lower reliability!). If we look at the disattenuated correlations, we see that they are higher than the observed ones.
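As a rough illustration of the calculation, here is a minimal Python sketch of Spearman's correction. The function name `disattenuate` and the observed correlation of 0.50 are my own assumptions for illustration; they are not the school's figures. The reliabilities are the ones assumed above.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation.

    r_xy  : observed correlation between test X and test Y
    rel_x : reliability of test X (between 0 and 1)
    rel_y : reliability of test Y (between 0 and 1)
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must be in the interval (0, 1]")
    # Divide the observed correlation by the geometric mean of the reliabilities.
    return r_xy / math.sqrt(rel_x * rel_y)


# Reliabilities assumed in the text above.
REL_KS2 = 0.90
REL_WRITING = 0.75

# Hypothetical observed correlation, for illustration only (not the school's data).
observed = 0.50

corrected = disattenuate(observed, REL_WRITING, REL_KS2)
print(round(corrected, 2))  # 0.61
```

The lower the reliabilities, the larger the correction. Because observed correlations and reliabilities are themselves estimates, a corrected value can occasionally come out above 1, which is a sign of sampling error rather than a perfect relationship.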

The correlation between the creative writing test and KS2 is 0.64. We could interpret this to mean that the constructs are related but not identical. This seems reasonable, as KS2 doesn’t test creative writing. Interestingly, the correlation between KS2 and the baseline test is 0.95. The school is learning very little from its baseline test, as it is testing the same constructs as KS2. The best that could be said for the baseline test is that it reduces measurement error through repeated testing.

So what do correlations tell us about the validity of tests? Very little. A high correlation suggests that your test is redundant. A low correlation tells you that you are testing something different, but you have no idea what. Paul Newton, the co-author of an excellent text on validity, puts it like this:

“The classic approach to validation involved correlating results from a new test against results from an already established one. In other words, results from the established test provided the ‘criterion’ against which to judge results from the new test. In theory, a high correlation coefficient provides strong evidence that the new test is measuring essentially the same thing as the established test, the criterion measure. In practice, because it is so hard to provide plausible criterion measures, low correlations are hard to interpret, and even high correlations do not necessarily mean that the right thing has been measured.”

Paul prefers a lifecycle approach to validation. If you want to know if a test is useful for your purpose, you need to interrogate:

“the specification of the proficiency which is supposed to be assessed; the process for producing the tasks that are used to assess the proficiency; the process for administering those tasks; the process for evaluating task performances; the process for reporting assessment results; the specification of how those results should be interpreted; and so on.”

So, do take a look at correlations between tests: you may find you are doing some unnecessary testing. If you get a low correlation, don’t be disappointed; you may be learning something new.