Artificial intelligence systems tend to excessively agree with and validate users, even when those users describe engaging in harmful or unethical behavior. People who interact with these highly agreeable chatbots become more convinced they are right and less willing to apologize during interpersonal conflicts. The research, published in Science, points to an emerging societal risk as millions turn to technology for everyday advice.
As conversational software becomes more mainstream, users increasingly treat the tools like digital therapists or advisors. Almost a third of teenagers in the United States report turning to artificial intelligence for serious conversations instead of talking to a human being. The trend has raised alarms among academic researchers about a phenomenon known as sycophancy.
In conversational technology, sycophancy describes a tendency for the program to flatter the user and agree with their inputs. Previous research focused primarily on factual sycophancy, which occurs when a chatbot agrees with a false statement just because the user stated it. The recent study explores a broader concept called social sycophancy.
Social sycophancy involves a program indiscriminately validating an individual’s actions, perspectives, and self-image. For example, if someone admits they did something wrong, the software might reply that they simply did what was right for them. Unwarranted affirmation can reinforce bad habits and discourage people from making amends after a mistake.
Stanford University computer science researcher Myra Cheng wanted to understand how common these validating responses are across modern software. Cheng and a team of researchers from Stanford University and Carnegie Mellon University also wanted to know how these interactions shape human behavior. They set up a series of computational analyses and psychological experiments to find out.
In the first part of the research, the team tested eleven different state of the art software models from companies including OpenAI, Google, and Meta. They fed the models thousands of text prompts derived from different social situations.
One dataset featured general requests for everyday advice. Another dataset contained two thousand posts from a popular internet forum where people describe a social conflict and ask the community if they behaved poorly. For this specific dataset, the researchers only used posts where human readers unanimously agreed that the author was completely in the wrong.
A third dataset included thousands of statements describing deeply problematic actions. These statements detailed scenarios involving deception, like forging a supervisor’s signature on a document. Other prompts described illegal activities or actions taken purely out of spite.
Across the board, the tested models were highly sycophantic. When presented with dilemmas where human crowds entirely condemned an action, the software still validated the user just over half of the time. When responding to prompts about deception and illegal behavior, the models endorsed the user’s action forty seven percent of the time. On average, the technology affirmed the user forty nine percent more often than human advisers would in the exact same situations.
Establishing that the software consistently behaves this way was only the first step. The researchers then conducted three experiments with over two thousand human participants to see how the flattering responses affected social judgments.
In the first two human trials, participants read vignettes describing social disputes where they were ostensibly in the wrong. The participants then received either a flattering reply from an artificial intelligence or a neutral response that challenged their behavior.
The third trial placed participants in a live chat interface where they discussed a real dispute from their own past. They spent eight rounds exchanging messages with a chatbot. Half of the participants talked to a program engineered to flatter them, while the rest interacted with a version designed to offer pushback.
Interacting with a sycophantic program directly altered people’s intentions. Participants who received excessive validation became much more confident that their original actions were completely justified. They showed much less willingness to take the initiative to fix the situation or apologize to the other person involved.
Looking closer at the communication, the researchers noticed that the agreeable chatbots rarely mentioned the other person’s perspective. By keeping the user focused entirely on their own validation, the software caused users to lose their sense of social accountability. Participants in the non-sycophantic groups admitted fault in their follow up messages at a much higher rate.
The effects held up even after controlling for various personal traits. Age, gender, personality type, and prior familiarity with artificial intelligence did not provide immunity. Almost anyone can fall victim to the persuasive power of a flattering program.
The researchers also measured how people felt about the software itself after receiving the advice. Even though the flattering responses distorted the participants’ social judgments, people consistently rated the agreeable models as having higher quality. They reported elevated levels of both moral trust and performance trust in the flattering chatbots.
The participants explicitly stated they were highly likely to return to the agreeable software for future advice. The effect grew even stronger when participants perceived the chatbot as an entirely objective source. People often described the flattering programs as fair and honest, mistaking unconditional validation for a neutral perspective.
In one variation of the experiment, researchers told half the participants that a human wrote the advice and the other half that a machine wrote it. The participants generally reported trusting the human label more. Regardless of what label they saw, the validating language still manipulated their eventual choices just as effectively.
The team also tested whether giving the chatbot a warmer, more informal tone made a difference. They found that stylistic presentation did not alter the persuasive impact of the sycophancy. The underlying endorsement of the user’s actions drove the behavioral changes, not the friendly delivery.
This dynamic places technology developers in a difficult position. Flattering behavior drives user satisfaction and repeat engagement, giving companies very little financial motivation to program their systems to be more critical. The tools are explicitly optimized to make users happy in the short term, which inadvertently shifts the software toward appeasement.
The authors noted a few limitations restricting how broadly these conclusions can be applied. The human responses used as a baseline came from internet communities, which might hold different moral standards than the wider public. Additionally, the study relied entirely on English speakers in the United States.
Expectations regarding digital interaction can vary widely across different cultures. People in other parts of the world might not desire the same level of validation, or they might react differently to machine generated flattery. The researchers also measured the software’s responses in a binary way, looking only at explicit approval or disapproval.
Future studies will likely examine more subtle or implicit forms of validation. Researchers could also look at how repeated daily use of agreeable chatbots over several years might reshape people’s real world relationships. Long term dependence on artificial emotional support could potentially displace human connections.
Policy regulators and technology designers will need to address this dynamic as these tools become deeply integrated into mobile phones and social networks. The researchers suggested that companies could implement behavioral audits before releasing new models to the public. Warning labels or digital literacy programs might also help users understand that chatbots are designed to please rather than tell the truth.
Receiving uncritical praise under the guise of an objective machine leaves many people worse off than if they had never asked for advice. Addressing these risks will require developing software that prioritizes human well being over immediate user satisfaction.
The study, “Sycophantic AI decreases prosocial intentions and promotes dependence,” was authored by Myra Cheng, Cinoo Lee, Pranav Khadpe, Sunny Yu, Dyllan Han, and Dan Jurafsky.
Leave a comment
You must be logged in to post a comment.