Students in one of the largest computer science courses at the University of California, Berkeley, have spoken out about problems with the automated system used to grade their work.
The class prompting complaints, CS 61B, or Data Structures, relies on an autograder to evaluate hundreds of students’ coding skills and assign them grades.
Autograders are widely used in computer science and engineering programs, particularly in large introductory-level courses such as the one at Berkeley, which has more than 800 students.
Autograders test students' computer programs, identify errors in their work and assign a score. Universities commonly design their own autograders in-house.
Berkeley's system, designed by the computer science faculty, typically works without incident, but the students ran into technical obstacles this semester. The problems were first reported in the university's student newspaper, The Daily Californian.
One sophomore in CS 61B, who asked to remain anonymous because she is still taking the class, said there were autograder difficulties with each of the three projects she and her classmates have submitted so far this semester.
The autograder stopped running for a “very frustrating” few hours on the evening of the first project’s deadline, preventing students from seeing whether their code worked, she said.
“Many students complained on Piazza -- our online interface with the staff -- and we were simply told that it would be up again as soon as possible.”
With the second project, the autograder misrepresented students' results and had to be modified -- resulting in some students “getting fewer points than they had on previous submissions,” the student said. Students were given a 24-hour deadline extension to make up for this issue. “That extension has been the only concession that the staff has made to compensate for their autograder’s problems,” she said.
On the third project, rather than being able to submit their work multiple times before the deadline to see if they achieved their desired grade, students were told the autograder would run only once, 24 hours before the deadline. But with everyone submitting their work at the same time, there were “multihour delays between submission and result,” the student said.
“It seems odd that the autograder for a required course -- that routinely has over a thousand students in it -- doesn’t have the capacity to handle peak submission times, such as in the 24 hours before and after the deadline,” the student said.
The student said that although her grade might be “slightly negatively impacted” by the autograder issues, the main effect was “a significant increase in my stress levels.”
“It’s one thing to be stressed out because of a big project with an impending deadline; it’s another entirely to be worried that I might not even know whether my code passes the staff’s hidden tests before the deadline,” she said.
James Demmel, chair of the electrical engineering and computer sciences department at UC Berkeley, said in an email that the technical glitches that occurred with the autograder this semester “are due to some new projects that were introduced into the course, rather than symptoms of scale.”
It is “quite uncommon” for anything to go wrong with the autograders used at UC Berkeley, said Demmel.
“In our largest courses, autograders and other pieces of infrastructure typically run smoothly. In fact, as the courses have grown, the technology infrastructure has generally improved because more instructor and staff time is available for larger courses,” he said.
“In general, we have not observed that student feedback about our courses has decreased as the course sizes have grown to meet the increasing demand for computer science at UC Berkeley,” he said. “On the contrary, ratings for teaching effectiveness have reached their highest level ever in recent semesters for our largest courses -- CS 61A and CS 61B -- even though these courses have increased in size by more than a factor of three in the last seven years.”
The sophomore and a classmate, who also asked to remain anonymous, said they are unhappy with the way their professor, Paul Hilfinger, has handled the problems with the grading system.
“It doesn’t seem that Professor Hilfinger is particularly concerned about the student experience,” the first student said. “He seems unwilling to accept responsibility.”
Hilfinger confirmed that an error in the autograder earlier in the semester meant students’ work had to be rerun through the system, resulting in some students getting lower marks. He also acknowledged that large numbers of students submitting work at the same time caused some students to receive their results more slowly than usual.
Hilfinger said part of the problem is that some students submitted work “many, many times, somewhat pointlessly” before getting any results back -- causing a backlog. “I’m not sure why they are doing that, but they do,” he said.
Asked if he would consider staggering deadlines to alleviate the backlog, Hilfinger said he felt this would be unfair because it would give some students a time advantage.
“I think what we’ll probably do at some point is move into the cloud -- use some scalable service that would allow us to scale up the processing as the frequency of submissions increases,” he said.
Tushar Soni, co-founder of AutoGradr, a free autograding tool for computer science courses, said autograding systems should be built with the expectation of handling large numbers of submissions at the same time. He agreed with Hilfinger that staggering deadlines would not be the right solution, as it would give some students more time than others.
At the current class size, it would be “unfeasible” to assess students’ projects without an autograder, said Hilfinger. He said there are downsides even when the system is functioning at full capacity -- an autograder can tell you whether a program works, but it cannot measure how creative the program is.
Mark Guzdial, professor of electrical engineering and computer science at the University of Michigan, said in an email that while grading without an autograder takes more time and requires more teaching assistants, it results in better feedback for students.
“For the things that I teach, the subjectivity of a human being is better than the objectivity of an autograder,” said Guzdial.