Interplay Between Implicit Bias and Sycophancy in LLMs
Implications for Fairness in Educational Decisions
Final research project for the MIT seminar course 6.S986 “Large Language Models and Beyond” (Spring 2024), with Professor Yoon Kim and collaborators Isabella Pu and Shrestha Mohanty.
Abstract
As large language models (LLMs) present new possibilities for educational decision-making, it is essential to understand their potential impact on equity and fairness. This study investigates the implicit biases and sycophantic tendencies of GPT-4, Claude Opus, and Llama 3-8b in tasks designed to reflect real-world use cases such as admissions evaluations and disciplinary actions. Our analysis reveals significant racial disparities in the decisions made by all three models, mirroring deep-rooted stereotypes and systemic inequities in the U.S. education system. Specifically, the models tend to adopt higher grade cutoffs and recommend harsher penalties for academic violations when the student is Indian, and they suggest severe consequences for Black students involved in physical altercations. GPT-4 and Claude are more robust to sycophancy, whereas Llama 3 shows a concerning tendency to conform to suggestions, particularly when demographic details are provided. These results raise critical ethical questions about the continued use of LLMs in education, as such biases risk exacerbating existing disparities. They also emphasize the need for careful scrutiny and responsible integration of AI in admissions processes.