This article appeared in the Spring term 2001 edition of Forum, the termly publication for all academic and research staff at Warwick.
Mr Ashley Ward, Graduate Teaching Assistant, and Dr Abhir Bhalerao, Lecturer
1998/99 was a good year for the 'niceday' stationers, but a bad year for demonstrators in the Computer Science department. This was the year that we redesigned the lecture support for 'Design of Information Structures' (a first-year, second-term module), replacing poorly attended seminars with more practical lab sessions. The new lab sessions were based around programming exercises guided by web-based worksheets, but the part that made the 'niceday' sales jump and gave our demonstrators a headache was the test that came at the end of each session. Given the number of students and lab sessions, a total of 800 small exam-like paper scripts had to be marked within just a few weeks, presenting us with a major problem. Many cups of coffee later, the four unlucky demonstrators managed the job and we published the lab test results. Although the labs in general were a great success, the testing process satisfied no one: typically several weeks after their test, students would receive simply an integer grade out of 3, which was hardly the intended formative feedback. The demonstrators, for their part, found the large amount of marking a repetitive, unrewarding task. We resolved to think again.
Our solution is simple in concept: students mark each other's test scripts, with script distribution and anonymity provided by our database-driven, web-based system. The process goes something like this. In each lab session, students first spend one and a half hours working through the exercises with help from postgraduate demonstrators. The last 30 minutes of the session are taken up with the web-based test, during which exam conditions are enforced and answers are recorded in a database. After the test the learners become assessors, marking three of their peers' scripts before their next session. As soon as a script has been marked, the original author can view their feedback, in the form of grades against various criteria and an optional free-text comment.
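To give a flavour of how the feedback step might look in code, here is a minimal sketch (not our production code: the connection details and the table and column names are invented for illustration) of a page that shows a student the assessments of their own script while keeping the assessors anonymous.

<?php
// Sketch of a feedback page: show a student the peer assessments of
// their own script, without revealing who the assessors were.
// Table and column names are illustrative only.

$link = mysql_connect('localhost', 'labtest', 'secret');
mysql_select_db('peer_assessment', $link);

// In real code the script id would come from the student's login
// session, so that they can only ever see their own feedback.
$script_id = (int) $_GET['script_id'];

$result = mysql_query(
    "SELECT assessor_no, criterion, grade, comment
       FROM assessments
      WHERE script_id = $script_id
      ORDER BY assessor_no, criterion", $link);

echo "<h2>Feedback on your test script</h2>\n";
while ($row = mysql_fetch_assoc($result)) {
    // Grades against each criterion, plus the optional free-text
    // comment; the assessor's identity is never displayed.
    echo '<p>Assessor ' . $row['assessor_no'] . ', '
       . htmlspecialchars($row['criterion']) . ': '
       . htmlspecialchars($row['grade']);
    if ($row['comment'] != '') {
        echo ' ("' . htmlspecialchars($row['comment']) . '")';
    }
    echo "</p>\n";
}
?>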
The initial system took just two weeks to build, from initial concept to something usable, and we improved and extended it during its use in the early months of 2000. The technology employed is a web server, a scripting language (used to program the dynamic pages) and a database: we used Apache, PHP and MySQL respectively, all of which are "free" open-source products.
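A "dynamic page" in this setup is simply a PHP script that Apache runs on each request, talking to the MySQL database. Something along the following lines (again an indicative fragment rather than our actual code, with invented connection details and table layout) is enough to record one test answer.

<?php
// Indicative fragment: store one test answer submitted from an HTML
// form. Connection details and table layout are invented.

$link = mysql_connect('localhost', 'labtest', 'secret')
    or die('Could not connect: ' . mysql_error());
mysql_select_db('peer_assessment', $link);

$student  = mysql_escape_string($_POST['student_id']);
$question = (int) $_POST['question_no'];
$answer   = mysql_escape_string($_POST['answer']);

mysql_query(
    "INSERT INTO answers (student_id, question_no, answer, submitted_at)
     VALUES ('$student', $question, '$answer', NOW())", $link)
    or die('Could not record answer: ' . mysql_error());

echo 'Answer recorded.';
?>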
A peer-assessment system immediately raises questions about the reliability of the marking: surely this is "the blind leading the blind"? We have attempted to address these concerns in several ways. Firstly, the marking interface displays some guidelines for each question, pointing out potential aspects of a good answer. Secondly, each script is assessed by three different students. The variation in their marking is calculated, and any large disagreement causes the script to be automatically highlighted for moderation by a member of staff. A learner can also request moderation should they feel it is needed. Thirdly, the distribution of the scripts can be controlled so that each script is marked by a good, an intermediate and a poor assessor, as determined by results from the automatically marked multiple-choice questions or other available information about the learners.
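The disagreement check itself is simple. As a sketch of the idea (the table layout and threshold below are illustrative rather than our exact ones): total up each assessor's grades for a script, and if the highest and lowest totals are too far apart, highlight the script for a member of staff.

<?php
// Sketch of the moderation check: flag any script whose peer marks
// disagree by more than a chosen threshold. Table layout and the
// threshold value are illustrative.

define('DISAGREEMENT_THRESHOLD', 1);   // marks are small integers

$link = mysql_connect('localhost', 'labtest', 'secret');
mysql_select_db('peer_assessment', $link);

// One total per assessor per script (criterion grades summed).
$result = mysql_query(
    "SELECT script_id, assessor_no, SUM(grade) AS total
       FROM assessments
      GROUP BY script_id, assessor_no", $link);

$totals = array();
while ($row = mysql_fetch_assoc($result)) {
    $totals[$row['script_id']][] = (int) $row['total'];
}

foreach ($totals as $script_id => $marks) {
    // A large gap between the highest and lowest total means the
    // assessors disagree, so highlight the script for moderation.
    if (max($marks) - min($marks) > DISAGREEMENT_THRESHOLD) {
        mysql_query("UPDATE scripts SET needs_moderation = 1
                      WHERE script_id = " . (int) $script_id, $link);
    }
}
?>

A similar query over the automatically marked multiple-choice results is one way the distribution step could rank learners as good, intermediate or poor assessors before scripts are handed out.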
Aside from these validity concerns, we also attempted to communicate some other benefits to the students. Our process design attempts to ensure that timely, specific, discursive feedback is received in triplicate! Reading other people's code, as marking requires, is an important skill in industry, where systems are rarely built from scratch. Marking can also be portrayed as educationally useful, since it requires active evaluation and can encourage reflection on one's own answers. Finally, a handy side-effect of the design is that staff are freed from the onerous task of marking hundreds of scripts, and can refocus their efforts on teaching and a smaller amount of moderation. All of this can be used to answer the student who politely asked us via email to "please get someone with intelligence to mark the tests" (a request which is actually a little self-deprecating when we consider this learner in the assessing role).
In general, we are encouraged by the response to our experiment. In our module questionnaire, 90% of respondents stated that they "reconsidered their own answers whilst marking", which is certainly an improvement on never revisiting the test at all. The speedy feedback we were aiming for was not achieved in 1999/2000 because the feedback interface took time to implement, but as you read this we will be using the system for a second time and hope to do better in this respect. The largest remaining problem appears to be convincing the students that this is formative, not summative, feedback - an issue confused by our awarding 10% of module credit for the lab tests to encourage active attendance. Although computers cannot make effective judgements of scripts, they can certainly simplify the document management problem, and it would be highly impractical to run peer assessment in this way with our 200 students without some electronic assistance.
Ashley Ward, Abhir Bhalerao - Computer Science