Judging Controlled Assessment

Tomas Needham, Head of English at Trinity School, recently judged his department’s English controlled assessment using nomoremarking.com. I asked him about his experience.

Why did you decide to use comparative judgment (CJ)?

Grading students’ work (summative assessment) is a problematic and fiendishly difficult enterprise. First of all, done properly, or as well as could be expected with our old methods, it was a laborious process that took up an enormous amount of time. Like many other schools, we ran a summative process with several stages. Initially, teachers graded their own class’s submissions. In an essay-based subject like English, this could mean marking 30 scripts, each several A4 sides long, a task that might take several hours. After this isolated marking, teachers would meet to ‘moderate’ their grades. In my experience, a moderation meeting involved several teachers looking over a piece of work and arguing the toss about whether it was a 6a or a 7c. More often than not, the final grade came down to nothing more than the rhetoric of the most articulate, the most passionate or, usually, the most senior staff member in the room. As well as the obvious lack of rigour and objectivity, the moderation stage took another hour or so, and in that short time it was only possible to moderate a small sample of the year group, so most scripts went completely unmoderated. In total, a department of six would have spent 13 hours on summative assessment. And this is for just one year group!

By contrast, in our initial NoMoreMarking pilot with 120 year 7 scripts, each teacher took 30 to 40 minutes to complete their judgments. Because the comparative judgment programme performs the moderation as part of the process, there was no need for a separate moderation stage, and the total time spent by everyone was 2 hours!

The second issue with grading concerns the mark schemes themselves, whether these are for GCSEs, National Curriculum levels or any other equivalent descriptors. The criteria are ill-defined and vague, often relying on highly subjective adjectives like ‘fluent’, ‘sophisticated’ and ‘consistent’. What does ‘fluent’ actually mean? Everyone has a different conception of these terms, so our judgments are inherently unreliable. As well as using terminology that is open to hugely different interpretations, mark schemes are also highly prescriptive. Capturing and describing every possible skill, trait, competency and element of extended writing within a succinct mark scheme would be impossible, so mark schemes end up penalising students who, despite not hitting the prescribed criteria, display excellence in an unconventional way. Equally, students who use the criteria to guide the content of their writing can end up churning out tick-box, identikit essays that slavishly and formulaically follow them.

NoMoreMarking allows teachers to use the vast bank of tacit expertise in their own heads to judge scripts, dispensing with the need for prescriptive and unwieldy criteria.

What were the practicalities involved? 

Having run an initial pilot for year 7, I was keen to use NoMoreMarking to moderate our GCSE controlled assessments, both to save time and to make our grading more reliable. We have 110 students in year 11, and each has completed four pieces of controlled assessment for GCSE English Language. I set up three comparative sessions, one for each task: the Spoken Language study, the Of Mice and Men essay and the creative writing, which involved two pieces per student. As well as uploading our current cohort’s controlled assessments, I uploaded between 8 and 15 scripts from last year to each of the three sessions. These scripts had grades that had been externally validated by AQA.

First, we needed to scan all the scripts, making sure that no numerical grades were visible so that there were no unwanted psychological anchors to skew our judgments. We have several large photocopiers at school that can ‘batch-scan’ many sheets at once when they are fed into the tray at the top. However, there was one major problem: feeding 30 scripts into the copier produced a single long PDF rather than 30 separate files. Because each candidate’s script had to be uploaded to NoMoreMarking as its own PDF, this obstacle seemed to stop us in our tracks. Luckily, our IT team found a program called ‘A-Pdf Scan and Split’, which splits a PDF into smaller documents. All we had to do was put a blank sheet of paper between each candidate’s work, and the program recognised each blank sheet as the start of a new file. We then opened the scanned PDF in the splitter program, and it split it into individual candidate PDFs. I foolishly agreed to upload all the scripts myself, which took quite a while on my own; if class teachers had done this for their own classes, it would have taken about 15 to 30 minutes each. Now that we know the limitations and capabilities of our photocopiers and splitting programme much better, the process should run a lot more smoothly next time!
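The splitting step comes down to a simple grouping rule: every blank sheet closes one candidate’s script and starts the next. The Python sketch below shows that rule on its own. It is not how ‘A-Pdf Scan and Split’ works internally, which we don’t know. The “pages” are stand-in numbers, and the `looks_blank` test with its ink-fraction threshold is a hypothetical way of spotting a separator sheet.

```python
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")


def split_at_blank_pages(
    pages: Sequence[Page], is_blank: Callable[[Page], bool]
) -> List[List[Page]]:
    """Group a batch-scanned run of pages into per-candidate scripts.

    A blank page ends the current script; the separator itself is dropped.
    Empty groups (e.g. two blank sheets in a row) are skipped.
    """
    scripts: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if is_blank(page):
            if current:
                scripts.append(current)
            current = []
        else:
            current.append(page)
    if current:  # the last script has no blank sheet after it
        scripts.append(current)
    return scripts


def looks_blank(ink_fraction: float, threshold: float = 0.005) -> bool:
    """Hypothetical blankness test: a page whose fraction of dark pixels is
    below `threshold` is treated as a separator sheet."""
    return ink_fraction < threshold


# Each number stands for one scanned page's (made-up) fraction of dark pixels.
batch = [0.12, 0.10, 0.001, 0.09, 0.11, 0.08, 0.0, 0.15]
scripts = split_at_blank_pages(batch, looks_blank)
print([len(s) for s in scripts])  # [2, 3, 1]
```

Writing each group out as its own file would then be a job for a PDF library, for example pypdf, whose `PdfWriter` can add pages and write them to disk.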

Once the scripts had been scanned and split, we uploaded them to NoMoreMarking. The site lets you upload multiple scripts at once, and hundreds of PDFs take almost no time to upload. Once they are on the site, you need to set the parameters of the judging. My understanding of statistics is minimal and standard deviation makes my head hurt, but by following the default settings described in the user guide, I found it easy to set things up. The site asks you to choose the question that judges will answer when comparing scripts. I chose ‘Which is the better essay?’ for the Of Mice and Men responses and ‘Who is the better writer?’ for the descriptive pieces. It then asks you to enter the email addresses of your ‘judges’, the people who will compare the scripts: in my case, the other English teachers in my department. The site then sends each judge a link that opens their comparative judging session.

When judges open the session, the screen shows two scripts side by side in a portrait split screen. Above each script is a button, and the judge presses the button above the better response. It is that simple. Chris has told me that we should be making instinctive judgments, aiming to decide within 30 seconds or so per judgment. Almost all of our judgments were made in under 30 seconds.
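Each click on the judging screen records one decision: this script beat that one. How the site turns hundreds of these decisions into a rank order isn’t described here, so the following is an assumption rather than NoMoreMarking’s actual method. A common model for comparative judgment is Bradley-Terry, which gives each script a strength p so that the chance of script i beating script j is p_i / (p_i + p_j). The Python sketch below fits it with the standard minorise-maximise (MM) iteration. The single imaginary win and loss each script gets against a reference of strength 1 is our own choice, added so that a script that won or lost every comparison still gets a finite score.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def bradley_terry(
    judgments: List[Tuple[str, str]], iterations: int = 200
) -> Dict[str, float]:
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Uses the MM update p_i <- W_i / sum_j n_ij / (p_i + p_j). Each script also
    plays two imaginary games (one win, one loss) against a fixed reference of
    strength 1. Returns log-strengths: higher means better.
    """
    wins: Dict[str, float] = defaultdict(float)
    pair_counts: Dict[Tuple[str, str], int] = defaultdict(int)
    scripts = set()
    for winner, loser in judgments:
        wins[winner] += 1
        a, b = sorted((winner, loser))  # count A-vs-B and B-vs-A together
        pair_counts[(a, b)] += 1
        scripts.update((winner, loser))

    strength = {s: 1.0 for s in scripts}
    for _ in range(iterations):
        # Two imaginary games per script against the reference of strength 1.
        denom = {s: 2.0 / (strength[s] + 1.0) for s in scripts}
        for (a, b), n in pair_counts.items():
            shared = n / (strength[a] + strength[b])
            denom[a] += shared
            denom[b] += shared
        # +1.0 is the imaginary win against the reference.
        strength = {s: (wins[s] + 1.0) / denom[s] for s in scripts}
    return {s: math.log(p) for s, p in strength.items()}


# Five made-up decisions: A beats B twice and C once; B beats C twice.
decisions = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("B", "C")]
scores = bradley_terry(decisions)
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C']
```

Sorting scripts by these scores gives a rank order for the whole set, even though no judge ever sees more than two scripts at a time.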

How did your teaching staff react?

Although my staff, myself included, still don’t really understand how it all works (English teachers are not famed for their statistical literacy), the reaction was mostly positive. The main attraction is the amount of time it saves compared with the old method of individual marking and group moderation. Some teachers were initially concerned that it meant the end of formative comments. However, it is a replacement for summative assessment, not formative assessment.

How successful did you feel it was?

Despite the initial scanning issues, which we think we have now resolved, we feel that it has been a pretty positive experience. As Head of Department, I get a broad overview of an entire year group, beyond the students I actually teach, because during judging each judge sees scripts from across the year. It also gives us a growing digital bank of model answers that could be critiqued in class. And, provided a whole year group sits the same assessment, it produces a rank order for all the students, which is useful when checking whether students are in the right sets.

We are planning to use this system for all extended writing in KS3 and KS4. Like lots of other teachers across the country, we are waiting for AQA, our chosen exam board, to release example scripts that exemplify the new GCSE grades. As a school, we have moved on from NC levels and will be grading all students using GCSE grades from September. When these exemplar materials are released, we can use them as anchor scripts in future assessments, so that the grades we give to scripts are based on as objective a measure as possible.
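The anchor-script idea can be sketched in a few lines. Assume each script already has a scaled score from the judging (higher is better) and that some scripts, such as last year’s AQA-validated ones, carry known grades. One simple approach is to place each grade boundary halfway between the average scores of anchors with adjacent grades. This is a hypothetical illustration in Python, not NoMoreMarking’s documented procedure. The function name, script names, scores and letter grades below are all made up.

```python
from statistics import mean
from typing import Dict, List, Tuple


def grade_from_anchors(
    scores: Dict[str, float], anchor_grades: Dict[str, str]
) -> Dict[str, str]:
    """Assign grades to non-anchor scripts using anchor scripts of known grade.

    Grades are ordered by the mean scaled score of their anchors; a boundary
    sits halfway between adjacent grade means, and each script gets the
    highest grade whose lower boundary it reaches.
    """
    by_grade: Dict[str, List[float]] = {}
    for script, grade in anchor_grades.items():
        by_grade.setdefault(grade, []).append(scores[script])
    ordered: List[Tuple[float, str]] = sorted(
        (mean(vals), grade) for grade, vals in by_grade.items()
    )
    # Lower boundary of every grade except the lowest one.
    boundaries = [
        ((lo_mean + hi_mean) / 2, hi_grade)
        for (lo_mean, _), (hi_mean, hi_grade) in zip(ordered, ordered[1:])
    ]
    lowest_grade = ordered[0][1]

    result: Dict[str, str] = {}
    for script, score in scores.items():
        if script in anchor_grades:
            continue  # anchors keep their validated grades
        grade = lowest_grade
        for boundary, higher in boundaries:  # boundaries rise in order
            if score >= boundary:
                grade = higher
        result[script] = grade
    return result


scores = {"anchor1": -1.0, "anchor2": 0.0, "anchor3": 1.2,
          "s1": -0.7, "s2": 0.5, "s3": 1.5}
anchors = {"anchor1": "D", "anchor2": "C", "anchor3": "B"}
print(grade_from_anchors(scores, anchors))  # {'s1': 'D', 's2': 'C', 's3': 'B'}
```

With several anchors per grade the averages become more stable. If the anchor means come out in a different order from the grades themselves, that is a sign the anchors or the judging need a second look.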
