What’s in a Name: Exposing Gender Bias in Student Ratings of Teaching

Assistant instructors who were perceived as female received lower ratings from students than instructors perceived as male, regardless of their actual gender and teaching ability.

Introduction

In higher education, student evaluations of teaching are a major factor influencing the career advancement of faculty. Typically, these evaluations are conducted at the end of the course and allow students to rate the faculty’s performance on a variety of dimensions, including teaching ability, professionalism and accessibility. However, student expectations for faculty tend to differ by gender. For example, students may expect male instructors to be objective, while female instructors are expected to be highly interpersonal and warm. Therefore, instructors who violate these norms may receive lower ratings due to these gendered expectations. Determining whether students’ gender bias impacts ratings has been challenging because the evaluations are typically subjective. Accordingly, if a teacher who identifies as male receives a higher rating than a teacher who identifies as female, that disparity could reflect either gender bias or genuine differences in teaching style and ability. In this study, the authors overcame this obstacle by evaluating differences in ratings of assistant instructors for an online course, where the students interacted with the instructors only through an online discussion board. This experiment allowed the authors to disguise the gender of the instructors and control for differences in teaching style and ability. Using this method, the authors investigated whether differences in ratings were the result of students’ gender bias.

Findings

Assistant instructors led one online discussion group using their own name, and another group using the name of an assistant instructor of the opposite gender. Assistant instructors who used a female name and identity received significantly lower ratings than assistant instructors who used a male name and identity, regardless of their actual gender or teaching ability. 

  • There were no statistically significant differences in any individual or total student ratings of the assistant instructors by their actual gender (mean rating 3.92 for males vs. 4.07 for females on a 5-point scale).
  • There were differences across individual student ratings of assistant instructors by their perceived gender. When assistant instructors used a female identity, they received lower average ratings on all 12 questions than when they used a male identity, and 6 ratings (professionalism, promptness, fairness, respectfulness, enthusiasm, giving praise) were significantly different (0.57 to 0.80 points lower on a 5-point scale).
  • There were differences across the total student ratings of assistant instructors by their perceived gender. When assistant instructors used a female identity, they received lower total ratings than when they used a male identity (mean rating 4.24 for males vs. 3.70 for females on a 5-point scale).
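
To make the size of the two contrasts concrete, the reported means can be compared directly. The following is a minimal sketch that uses only the four mean ratings quoted in the bullets above; the variable and function names are illustrative assumptions, and the arithmetic says nothing about statistical significance, which the study assessed separately.

```python
# Mean ratings reported in the findings above (5-point scale).
# The dictionary layout and names are illustrative, not from the study.
reported_means = {
    "actual": {"male": 3.92, "female": 4.07},
    "perceived": {"male": 4.24, "female": 3.70},
}


def gender_gap(means):
    """Return the male mean minus the female mean, rounded to two decimals."""
    return round(means["male"] - means["female"], 2)


# A positive gap means the male mean was higher; a negative gap means
# the female mean was higher.
gaps = {kind: gender_gap(means) for kind, means in reported_means.items()}

for kind, gap in gaps.items():
    print(f"{kind} gender: male minus female = {gap:+.2f}")
```

By actual gender the female mean is slightly higher, a difference the study reports as not statistically significant, while by perceived gender the male mean is higher by roughly half a point.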

In sum, students evaluated instructors perceived as female more harshly than instructors perceived as male, demonstrating the existence of gender bias. The authors theorize that students expected instructors perceived as female to exhibit strong interpersonal traits, such as respectfulness and enthusiasm, and punished them for perceived failures to exhibit these traits. Conversely, instructors perceived as male were rewarded for going above and beyond when they exhibited these traits. In light of these findings, the authors call for reevaluation of the practice of using student evaluations in assessing the quality of instruction in higher education.

Methodology

Seventy-two college-age students in an online introductory anthropology and sociology course at a four-year public university in North Carolina were divided into six discussion groups of eight to 12 students for the duration of the course. The course professor, a male assistant instructor, and a female assistant instructor each led two groups. Students interacted with their discussion group leader only through an online discussion board. Each assistant instructor taught one of the two assigned groups under the identity of the other assistant instructor. All instructors took steps to ensure consistency across discussion groups, including returning assignments at the same time and coordinating grading. At the conclusion of the course, students filled out evaluations of the assistant instructors, which asked them to rate the instructors on twelve traits using a five-point scale. The analysis was limited to the 43 students in the four discussion groups led by the assistant instructors. The researchers compared the responses to each question, as well as a total student ratings index, by the actual and perceived gender of the instructor.
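
The crossed design described above can be laid out as a small table of the four analyzed groups. This is only a sketch of the design's structure: the group labels A to D are hypothetical, and no ratings data are included. It shows how the same four groups are split one way for the actual-gender comparison and another way for the perceived-gender comparison.

```python
from collections import defaultdict

# The four analyzed discussion groups. Group labels are hypothetical;
# the actual/perceived pairings follow the design described above, in
# which each assistant instructor led one group under their own identity
# and one under the other assistant instructor's identity.
groups = [
    {"group": "A", "actual": "male", "perceived": "male"},
    {"group": "B", "actual": "male", "perceived": "female"},
    {"group": "C", "actual": "female", "perceived": "female"},
    {"group": "D", "actual": "female", "perceived": "male"},
]


def split_by(key):
    """Map each value of `key` to the list of group labels that have it."""
    result = defaultdict(list)
    for entry in groups:
        result[entry[key]].append(entry["group"])
    return dict(result)


by_actual = split_by("actual")        # compares instructors as they are
by_perceived = split_by("perceived")  # compares instructors as students saw them

print("By actual gender:   ", by_actual)
print("By perceived gender:", by_perceived)
```

Because each perceived-gender category contains one group taught by each assistant instructor, differences between those categories cannot be attributed to one instructor's teaching style alone.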

Related GAP Studies