Bad Robots: Global Exam-Grading Software In Trouble For Algorithmic Bias

International Baccalaureate Program’s Exam-Grading Algorithm May Have Adversely Impacted Test Scores of Low-Income & Minority Students

Bad Robot Outcome:
With education forced online, the International Baccalaureate program decided to grade its students using an algorithm-based “awarding model” instead of traditional exams. It now appears that the algorithm may have had a disproportionate, adverse impact on low-income and minority students.

The Story

The International Baccalaureate (IB) program is an international organization that operates in over 5,000 schools across more than 150 countries. In the United States, accredited high schools allow their students to take IB classes to earn college credits.
Because of the global COVID-19 pandemic, the IB decided to cancel all traditional exams. These exams usually account for the majority of the score students receive for their IB classes. Instead, scores this year were assigned through a mathematically based “awarding model.” The IB has not yet released the specifics of the model, but has stated that it is based on three components: (1) coursework; (2) teachers’ predictions of how each student would have performed on the exam; and (3) “school context.”
The third component, school context, is the one at issue here. It incorporates historical data on the school’s predicted results, as well as performance on past coursework for each subject. Because this component was included in the model, students’ scores were based partly on the overall historical performance of their school, not solely on their individual performance. So, what of a high-performing student in a struggling school? Conversely, how does the algorithm treat a mediocre student in a school that historically tests well?
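Since the IB has not published how these components are combined, the exact calculation is unknown. Purely as an illustration of the concern raised above, the Python sketch below shows one hypothetical way such a score could be blended; the weights, field names, and 1–7 scaling are our assumptions, not the IB’s actual method.

from dataclasses import dataclass

@dataclass
class StudentRecord:
    coursework_score: float       # individual coursework result, on the 1-7 IB scale (assumed)
    teacher_prediction: float     # teacher's predicted exam grade, 1-7 (assumed)
    school_historical_avg: float  # school's historical average grade, 1-7 (assumed)

def predicted_grade(s: StudentRecord,
                    w_coursework: float = 0.4,
                    w_teacher: float = 0.3,
                    w_school: float = 0.3) -> float:
    """Blend the three stated components; the weights here are illustrative assumptions."""
    return (w_coursework * s.coursework_score
            + w_teacher * s.teacher_prediction
            + w_school * s.school_historical_avg)

# A top student at a historically low-scoring school...
strong_student_struggling_school = StudentRecord(7.0, 7.0, 3.5)
# ...versus a middling student at a historically high-scoring school.
average_student_strong_school = StudentRecord(5.0, 5.0, 6.8)

print(f"{predicted_grade(strong_student_struggling_school):.2f}")  # 5.95 -- dragged well below the student's own 7s
print(f"{predicted_grade(average_student_strong_school):.2f}")     # 5.54 -- lifted above the student's own 5s

Even in this toy version, the school-level term pulls each individual grade toward the school’s historical average: the top student’s blended score falls well below her own marks, while the middling student’s rises above his.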
Reports have begun to emerge of universities rescinding offers that were contingent on students receiving certain scores. Isabel Castaneda, one of the top-ranking students at her Colorado public school, was shocked to discover that she had failed a number of courses, including high-level Spanish (her native language). These low scores mean that Isabel will not receive the college credits she desperately needs next fall at Colorado State University.

“It’s going to cost me thousands of dollars,” she said.

The Fall-Out

The IB program is now under fire from students, educators, and experts in the field of algorithmic bias. As of July 21, more than 20,000 students have signed a petition to the IB in protest of the algorithm used to determine their scores. On the education side, Iris Palmer, a senior advisor with the Educational Policy program at New America (a Washington-based think tank), said she had never heard of a statistical model being used to assign grades. An IB teacher at a United States public school, speaking anonymously, did not mince words: “I think this is discrimination.”
Leaders in the field of algorithmic bias have also begun to speak out. Suresh Venkatasubramanian, who studies the social consequences of automated decision-making at the University of Utah, believes that this is “what happens when you try to install some sort of automated process without transparency.” Dr. Nicol Turner Lee, director of the Center for Technology Innovation at the Brookings Institution, stated that building a fair model out of historical educational data is a challenge because of the inequality already baked into the current educational system. “By default, it has a problem because the data is generated by the discriminatory outcomes our educational system already produces,” Dr. Turner Lee elaborated.

Our view

We at the Ethical AI Advisory stand with the students (and their families), as well as the educators and experts who have already spoken out against IB’s awarding model. In particular, we want to raise two points as they relate to this troubling story. First, we firmly believe that, if algorithms are being used to assess individuals, then those individuals need to be made explicitly aware of how the algorithms operate. We agree with Professor Venkatasubramanian: “The burden of proof should be on the system to justify its existence.” Had the IB program been transparent about the specifics of its awarding model, educators and experts might have been able to intervene before the students were impacted – potentially saving those like Isabel from educational and financial harm.
Second, we stand by our often-repeated mantra: bad data = bad robots. If an algorithm is fed data containing bias, its outputs will be inherently biased. Here, the IB attempted to rank individual students using historical data from the schools those students attend. It is no secret that schools in lower-income areas (which often serve predominantly minority populations) underperform compared with those in higher-income neighborhoods. Because of the systemic, historical problems associated with certain schools, the individual children being assessed by the algorithm were adversely affected by factors that have nothing to do with their own performance. As Iris Palmer put it, students “who are black or low-income are probably at a disadvantage from the algorithm.”