- Grade inflation is a growing problem that has gone under the radar for too long.
- Reasons for grade inflation may include pressure on teachers and “grade grubbing” from parents and students.
- One way grade inflation could be addressed is by reporting both absolute and relative scores on report cards and transcripts.
In the coming days, the new school year will begin at academic institutions across the country. And while there will be conflicts over mask mandates and vaccination recommendations, one growing problem has received too little attention for far too long: grade inflation.
In 2017, Chris Weller of Business Insider reported that from 1998 to 2016 the percentage of high schoolers nationwide with an A average jumped from 38.9% to 47% despite the average SAT score falling from 1026 to 1002 over the same period. Also in 2017, Jon Marcus of The Atlantic highlighted that while grade inflation was prevalent overall during this time span, data from the College Board suggest that the biggest culprits, in general, were private schools, along with suburban public schools.
In addition to these findings at the high school level, GradeInflation.com has reported a steady rise in college grades over the past 50 years, such that A's are not only the most common grade given; since 2013, they have made up the majority of all grades on college campuses.
Some Causes of Grade Inflation
There are many reasons for grade inflation. At the high school level, the currency that matters most to parents is how many students from that school get into the nation's elite colleges. As we know from the college admissions scandal that involved Lori Loughlin and Felicity Huffman, ambitious parents are willing to pay whatever price necessary to ensure their kids get into America's elite colleges. This leads to subtle pressures on teachers—in both private and public high schools—to increase their students' prospects of being admitted to those schools, with flexible grading schemes being one of several tools used. In private schools, the connection is more obvious: the more students they can get into elite colleges, the greater the demand for their schools and the higher they can raise tuition. In public high schools, the connection is more indirect—mediated through property values in high-achieving school districts and the passing of school budgets—but it exists there as well.
On a more personal level, high school teachers and college professors get worn down from the constant grade grubbing that parents and students put them through. On my caseload, I have a number of high school teachers and college professors, and to a person, they complain about how exhausting it is to constantly explain to students (and parents) why they got the grade they deserved. At best, the process is tiring and time-consuming; at worst, these teachers worry about being targeted by students and parents for retribution, whether through complaints to the school or through violence. One teacher I work with said that she was terrified one day as she was followed to her car after school by a student who was angry about a grade. In most cases, for most of these teachers, "it's easier to just give them an A and move on."
Addressing the Problem
Before looking for a solution to the problem of grade inflation, we must first ask: what is the purpose of grading in the first place? Is it simply to determine the extent to which a student met a particular standard (e.g., can a student reliably solve high school algebra problems)? Or is it to determine which students are best in a particular subject?
Psychologists and researchers are continually debating these questions as we seek to determine the best way to evaluate human performance. Tests of absolute knowledge (also known as criterion-referenced tests) identify the extent to which a test taker met a specified criterion of knowledge or achieved a particular skill (Friedenberg, 1995). How many of the 100 algebra questions on the exam did Sarah get right? This inquiry reflects the absolute standard of testing. Alternatively, assessments of relative knowledge (also known as norm-referenced tests) identify how students did on a given test relative to the other students in their group (Friedenberg, 1995). How much better did Sarah do on the exam than Laura? This inquiry reflects the relative standard of testing.
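The distinction between the two standards can be made concrete in a few lines of Python. The scores and the proficiency cutoff below are invented purely for illustration:

```python
# Invented scores: each student's correct answers out of 100 algebra questions.
scores = {"Sarah": 87, "Laura": 79, "Miguel": 91, "Dana": 62}

# Criterion-referenced (absolute): did Sarah meet a fixed standard?
criterion = 80  # hypothetical cutoff for "proficient"
sarah_met_criterion = scores["Sarah"] >= criterion

# Norm-referenced (relative): how did Sarah do compared with her group?
sarah_rank = sorted(scores.values(), reverse=True).index(scores["Sarah"]) + 1

print(f"Sarah answered {scores['Sarah']}/100 correctly "
      f"(meets criterion: {sarah_met_criterion}); "
      f"she ranks {sarah_rank} of {len(scores)} in her group.")
```

The same raw score of 87 thus yields two different answers, depending on whether it is compared against a fixed standard or against the rest of the group.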
Tests of absolute knowledge and relative knowledge have complementary sets of advantages and disadvantages, which is why standardized tests like the SAT report both absolute and relative scores. On the version of the SAT most people are familiar with, test takers receive an absolute score out of 1600 (e.g., 1250) and also a relative score, usually a percentile (e.g., the 81st percentile—which signifies that an individual's score was higher than those of 81% of everyone taking that test).
Percentiles are one of many types of standard scores. Another type of standard score is the z-score. While in theory z-scores extend to infinity in both positive and negative directions, in psychometric practice they generally range from -5 to +5, and for this reason, they can be easier to work with than percentiles. However, while z-scores might ultimately be a better tool for academics to use to address grade inflation, for the purposes of this post I will limit discussion of standard scores to percentiles (i.e., the percentage of scores that are below a given score), as they are more familiar to people.
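The relationship between the two kinds of standard score is mechanical: if scores are at least roughly normally distributed (an assumption, not a guarantee), a z-score converts to a percentile through the standard normal cumulative distribution function, which Python's standard library can compute via the error function:

```python
from math import erf, sqrt

def z_to_percentile(z: float) -> float:
    """Convert a z-score to a percentile via the standard normal CDF.
    Assumes the underlying scores are approximately normally distributed."""
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))

# A z-score of 0 sits exactly at the mean (the 50th percentile);
# +1 and -1 are one standard deviation above and below it.
for z in (-1.0, 0.0, 1.0):
    print(f"z = {z:+.1f}  ->  {z_to_percentile(z):.1f}th percentile")
```

This is why the choice between the two is largely one of familiarity: either can be recovered from the other.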
At this point, the obvious question is: how would using standard scores, and particularly percentiles, adequately address the problem of grade inflation?
In the current sociopolitical climate—not just in America, but around the world—it would be too difficult to get teachers at all levels to commit to more stringent grading practices. Actually, even getting the majority of educators to agree on what those more stringent grading practices should be would be an impossible task. Furthermore, there are too many benefits for teachers, particularly those in challenging academic situations, for them to abandon flexible grading practices, particularly as they face the difficulties of another year with unpredictable COVID factors impeding their work. However, that doesn't mean that we should abandon our attempts to determine which individuals are truly outstanding.
There is value in knowing which students have the greatest aptitude for certain subjects, as society works best when professional disciplines are made up of those with the greatest aptitude for them. I, for one, would hope that any surgeon I might have was among the best in her cohort in biology in high school and college—wouldn't you? On the flip side, there are problems with relying solely on evaluations of relative performance (i.e., norm-referenced grading), like percentiles. Among their limitations: differences between school and class cohorts are not accounted for, and such systems often breed environments of toxic competition, leading to rampant cheating and sometimes to students sabotaging one another.
The solution, as I see it, has been in front of us for decades: follow the example of the SAT exam (and other standardized tests) and report both absolute and relative scores on each report card and school transcript. For instance, at the end of each academic year, students should get a grade that reflects their absolute performance, as evaluated by their teacher using the teacher's individual grading system (e.g., a 92 in algebra), and they should also get a percentile for that grade (e.g., 88th percentile), which reflects the mean and standard deviation of all the students' grades for that class.
How does this address the problem?
The thing is, an individual's percentile score provides an indirect assessment of how difficult that specific class was (including how strictly the teacher graded) because each percentile is computed using the mean and standard deviation of that specific class. Hence, imagine two students taking the same subject but in different classes—Sarah, in Mr. Goodgrade's class, and Joshua, in Mr. Hardnose's class—and both students get the exact same grade: 92. However, in Mr. Goodgrade's class the average score was a 95, while in Mr. Hardnose's class it was a 75. Depending on the spread of scores in each class, Sarah may end up with a percentile of 45 while Joshua ends up with a percentile of 99, even though they both got the exact same final grade of 92. (For a more detailed explanation of how this works, I invite you to check out my book, Z-score: How a Statistic Used in Psychology Will Revolutionize Baseball.)
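A short sketch makes the Sarah-and-Joshua comparison concrete. The class means (95 and 75) come from the example above; the standard deviations are illustrative assumptions, and the percentiles assume grades are roughly normally distributed:

```python
from math import erf, sqrt

def percentile(grade: float, class_mean: float, class_sd: float) -> float:
    """Percentile of a grade within its class, assuming grades are
    approximately normally distributed (an assumption for illustration)."""
    z = (grade - class_mean) / class_sd
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))

# Both students earn a 92, but in very different classes.
# The standard deviations (5 and 7) are hypothetical.
sarah = percentile(92, class_mean=95, class_sd=5)   # slightly below her class mean
joshua = percentile(92, class_mean=75, class_sd=7)  # far above his class mean

print(f"Sarah:  92 in Mr. Goodgrade's class -> {sarah:.0f}th percentile")
print(f"Joshua: 92 in Mr. Hardnose's class  -> {joshua:.0f}th percentile")
```

With these assumed spreads, the identical grade of 92 lands Sarah well below the middle of her class and Joshua near the very top of his—exactly the information that an absolute grade alone conceals.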
If schools were to switch to the SAT model and report both absolute and relative grades (like percentiles), teachers could still employ flexible grading practices as they see fit, even if doing so leads to grade inflation, because the percentiles will provide an indication to school administrators and college admissions committees just how much that A in algebra in Mr. Goodgrade's class was worth, and how difficult it was to achieve.
Moreover, adding percentiles to report cards and school transcripts might also lead high-achieving students (and their parents) to request more stringent grading practices once they discover that in classes where almost everyone gets an A, their A might be worth less than the A's earned by peers at schools where very few A's are given in each class. In fact, at schools with strict grading, B's and C's might carry higher percentiles than the A's given at schools where almost everyone gets an A. This dynamic would have the potential to neutralize the causes of flexible grading, thus fixing the problem of grade inflation.
Friedenberg, L. (1995). Psychological Testing: Design, Analysis, and Use. Needham Heights, MA: Allyn & Bacon.