A new study published in Nature Human Behaviour finds that ChatGPT, when equipped with basic personal information, can be more persuasive in online debates than human opponents. The results suggest that large language models may be especially effective at shaping opinions when they personalize their messages based on audience characteristics.
Large language models are a type of artificial intelligence designed to generate and understand human-like text. These systems are trained on massive datasets that allow them to mimic human writing, answer questions, and hold conversations. ChatGPT is one of the most well-known examples.
While earlier studies have shown that models like GPT-4 can write persuasively, researchers still didn’t know how effective they could be in personalized one-on-one conversations. This question is increasingly important as AI tools become embedded in chatbots, customer service platforms, political messaging, and social media.
“Debate and persuasion are everywhere online, from political arguments to everyday disagreements about social issues, and it’s increasingly clear that AI is now part of these conversations,” said study author Francesco Salvi, an incoming PhD student at Princeton University. “As language models like GPT-4 began to slip seamlessly into public forums and private chats, I became curious about whether machines could actually out-argue real people, and what that might mean for society. In particular, I wanted to understand not just if AI could be persuasive, but how personalization might amplify its influence.”
“The idea that a machine could tailor its arguments to someone’s background and, in some cases, be even more convincing than a human struck me as both fascinating and important. Ultimately, I think these questions sit at the heart of how we’re going to interact with technology in the years ahead.”
The ability to tailor arguments to an audience's demographic or psychological traits is known as personalization or microtargeting. Until recently, this kind of persuasion was costly and time-consuming to produce. But large language models can now potentially generate personalized arguments at scale. That opens the door to new forms of automated influence—and possible manipulation.
To explore these concerns, the researchers designed a preregistered experiment in which 900 people participated in online debates through a custom platform. Each participant was randomly assigned to debate either another human or the GPT-4 version of ChatGPT. Some participants debated an opponent who had access to their basic demographic information—gender, age, ethnicity, education, employment status, and political affiliation—while others debated someone who had no such information.
In total, participants fell into 12 different groups, depending on whether they faced a human or the AI, whether personalization was used, and how strongly they felt about the debate topic (low, medium, or high opinion strength).
The debates followed a fixed format and took place in real time. First, participants reported their opinions on a topic such as “Should students be required to wear school uniforms?” Then they were randomly assigned to argue either in favor or against the proposition, regardless of their original beliefs. The debates unfolded over four stages: an opening argument, a rebuttal, a counter-response, and a closing statement. After the debate, participants again rated how much they agreed with the original proposition.
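For readers who want a concrete picture of the design, the sketch below shows one way the random assignment could be set up. It is an illustrative reconstruction, not the authors' code: the condition labels, the treatment of topic strength as a simple randomized factor, and the stage names are assumptions based only on the description above.

```python
# Illustrative sketch of the 2 x 2 x 3 design described above (not the study's code).
import random

OPPONENTS = ["human", "gpt-4"]
PERSONALIZATION = [False, True]
TOPIC_STRENGTH = ["low", "medium", "high"]   # 2 * 2 * 3 = 12 groups

DEBATE_STAGES = ["opening", "rebuttal", "counter-response", "closing"]

def assign_condition() -> dict:
    """Randomly place a participant into one of the 12 experimental cells."""
    return {
        "opponent": random.choice(OPPONENTS),
        "personalized": random.choice(PERSONALIZATION),
        # In the study, topic strength came from how strongly people felt about
        # the assigned topic; it is treated as a random factor here for simplicity.
        "topic_strength": random.choice(TOPIC_STRENGTH),
        # Participants argued a randomly assigned side, regardless of their beliefs.
        "side": random.choice(["pro", "con"]),
    }

if __name__ == "__main__":
    print(assign_condition())
    print("Debate stages:", " -> ".join(DEBATE_STAGES))
```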
Salvi and his colleagues then analyzed whether and how much participants shifted their opinions toward their opponent’s side. The key finding was that GPT-4, when it had access to participants’ personal information, was significantly more persuasive than any other opponent. Setting aside debates that ended in a tie, GPT-4 with personalization out-persuaded human debaters 64.4% of the time. Statistically, this translated to an 81% increase in the odds of shifting someone’s opinion compared to the baseline human-versus-human condition.
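The win rate and the odds figure are two ways of expressing the same effect. The short calculation below shows how an 81% increase in the odds of opinion change maps onto probabilities; the 10% baseline rate used here is an assumed number for illustration, not a value reported in the study.

```python
# Illustrative arithmetic only: the 10% baseline shift rate is assumed, not from the paper.
def apply_odds_ratio(p_baseline: float, odds_ratio: float) -> float:
    """Convert a baseline probability into the probability implied by an odds ratio."""
    odds = p_baseline / (1 - p_baseline)   # probability -> odds
    new_odds = odds * odds_ratio           # an 81% increase means a ratio of 1.81
    return new_odds / (1 + new_odds)       # odds -> probability

p0 = 0.10                                  # assumed baseline: human vs. human debates
p1 = apply_odds_ratio(p0, 1.81)
print(f"baseline {p0:.0%} -> personalized GPT-4 {p1:.1%}")   # about 16.7%
```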
Interestingly, GPT-4 without access to personal information did not perform significantly better than human opponents. And humans given access to their opponent’s personal data didn’t benefit much either. This suggests that GPT-4 was uniquely good at using personal traits to shape its arguments.
“I was genuinely surprised by just how strong of an effect we found, given how little personal information was collected and despite the extreme simplicity of the prompt instructing the LLM to incorporate such information,” Salvi told PsyPost. “It was also interesting (and a bit counterintuitive) that when we gave the same personal information to human participants, it didn’t make them any more persuasive. Most people simply don’t know how to strategically tailor arguments on the fly, whereas the AI, drawing from massive amounts of data, seemed to do this almost effortlessly.”
When researchers grouped the debate topics into three categories—low, medium, and high opinion strength—they found that GPT-4 with personalization was especially effective for topics where people initially had weaker opinions. For highly polarizing topics, the persuasive effect was weaker and not statistically significant. This lines up with previous research suggesting that people with strong views are harder to sway, regardless of how well-crafted the arguments are.
To better understand how GPT-4 and humans differed in their arguments, the team used a text analysis tool that examines writing for signs of analytical thinking, emotional tone, authenticity, and other linguistic traits. They found that GPT-4 tended to use more logical and formal language, while human participants were more emotional, relied on personal storytelling, and used simpler, more conversational phrasing. Despite this, GPT-4’s arguments were consistently persuasive, especially when personalized.
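The article does not name the tool, but the description matches dictionary-based analyses that summarize texts with surface-level linguistic features. The sketch below computes a few hand-rolled proxies (first-person pronoun share, sentence length, long-word share) purely to illustrate the kind of signals such an analysis relies on; it is not the researchers' pipeline, and the feature choices are assumptions.

```python
# Minimal sketch of surface-level linguistic features (illustrative proxies only).
import re

FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}

def surface_features(text: str) -> dict:
    """Rough stand-ins for 'personal storytelling' vs. 'analytical' style."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z']+", text.lower())
    return {
        "first_person_share": sum(w in FIRST_PERSON for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        "long_word_share": sum(len(w) > 6 for w in words) / len(words),
    }

print(surface_features("I think uniforms helped me focus. My school required them."))
```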
“One thing I find especially interesting is that when we looked closely at the arguments produced by GPT-4, we didn’t find any linguistic or metalinguistic ‘fingerprints’ showing that personalization changed how the AI sounded,” Salvi said. “In other words, the style and structure of GPT-4’s arguments, which significantly differed from human ones, stayed essentially the same whether or not it had access to personal information.”
“Contrary to what some of us expected, the persuasive advantage doesn’t seem to come from how the AI argued, but from what it chose to argue. For example, in a debate about Universal Basic Income (UBI), GPT-4 tailored its message depending on the participant’s political affiliation, framing UBI as a tool for economic growth and innovation for Republicans and as a way to reduce inequality for Democrats.”
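The study's actual prompt is not reproduced in the article (Salvi describes it only as extremely simple), so the following is a hypothetical sketch of how demographic fields like those collected in the experiment could be folded into a debater's system prompt. The wording, field names, and structure are assumptions for illustration, not the authors' instructions.

```python
# Hypothetical personalization prompt (wording and structure assumed, not from the study).
def build_system_prompt(topic: str, side: str, profile: dict | None) -> str:
    prompt = (
        f"You are debating the proposition: '{topic}'. "
        f"Argue the {side} side as persuasively as you can."
    )
    if profile:  # personalization condition only
        details = ", ".join(f"{k}: {v}" for k, v in profile.items())
        prompt += (
            " Your opponent has the following characteristics; "
            f"tailor your arguments to them: {details}."
        )
    return prompt

profile = {
    "gender": "female", "age": 34, "ethnicity": "white",
    "education": "bachelor's degree", "employment": "employed full-time",
    "political affiliation": "Democrat",
}
print(build_system_prompt("The US should adopt a Universal Basic Income", "pro", profile))
```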
The researchers also explored whether participants could tell if they were debating a human or an AI. About 75% of people correctly guessed when they were arguing with GPT-4, suggesting that AI-generated arguments still have a recognizable style. Interestingly, those who believed they were debating an AI were more likely to shift their opinions, regardless of the AI’s actual performance. This may reflect a tendency to view AI opponents as less personally threatening or less emotionally invested.
“My main take-home message is that, whether we like it or not, AI’s superhuman persuasiveness is already a reality,” Salvi told PsyPost. “Our study shows that large language models like GPT-4, when given even minimal personal information, can be more persuasive in debates than real people—in our case, about 64% of the time. That’s not a tiny edge; it’s a landslide, especially considering how difficult it usually is to change anyone’s mind online.”
“This can have tremendous implications, ranging from misinformation to election integrity and unfair propaganda. If persuasive AI can be deployed at scale, you can imagine armies of bots microtargeting undecided voters, subtly nudging them with tailored political narratives that feel authentic. Last year, in 2024, people around the world voted in elections in at least 76 countries, marking it as the year with the most voters in history.”
“So far, there hasn’t been evidence that AI played a significant role in those electoral contests. However, that kind of influence is hard to trace, even harder to regulate, and nearly impossible to debunk in real time. We’re potentially entering a world where one-on-one persuasion can be automated, personalized, and almost invisible. At the same time, I also see enormous potential for good.”
While the results are compelling, the researchers caution that the findings should be viewed in light of several limitations. The debates were conducted under time constraints and followed a structured format, which may not fully capture the complexity of real-world conversations. All participants were anonymous and recruited from a research platform, which means the sample may not reflect broader populations. Participants also had to argue randomly assigned viewpoints, which could have affected how convincing they were, especially compared to an AI that consistently commits to whichever position it is assigned.
“Our debates followed a strict, timed structure (opening, rebuttal, conclusion), like a mini academic debate,” Salvi noted. “While some platforms like Twitter and Reddit give space to debates that unfold with similar principles in a back-and-forth manner, this can still be quite different from how most conversations actually unfold online. So while the AI performed impressively in this controlled format, whether it can hold its ground in the wild chaos of real online discourse is still an open question.”
Even so, the results suggest that large language models like GPT-4 can be surprisingly persuasive in one-on-one interactions, especially when they’re allowed to personalize their messaging. And that personalization doesn’t need to rely on deep psychological profiling. Just a handful of basic facts—like someone’s age, education, or political identity—was enough for GPT-4 to tailor its arguments in ways that made it more convincing than a human.
Future studies could test whether newer AI models are even more persuasive, or whether different styles of writing—more emotional, more personal, or more conversational—could further increase their effectiveness. Researchers could also explore how well these systems perform in more natural conversations, or whether explicitly disclosing that an argument is AI-generated makes people more resistant to being influenced.
“I’m also particularly interested in how we could harness AI’s persuasive capabilities for social good,” Salvi said. “We already have some proof of concept that LLMs can be effective in reducing conspiracy beliefs, but that only scratches the tip of the iceberg. For example, one area I’m excited about is encouraging healthier and more sustainable behaviors, like shifting dietary choices away from high-emission meat products or promoting physical activity. These are domains where people often want to change but struggle with motivation, and a well-calibrated persuasive AI might offer the right kind of nudge at the right moment.
“I am also interested in looking into how these systems might reduce partisan polarization by helping people engage more constructively across divides or by defusing hostile online exchanges before they escalate. Essentially, given that we now have tools that can generate persuasive messages in a powerful and scalable way, we should focus on deploying them where the incentives are aligned with individual and public good, not manipulation. There’s a real opportunity to turn what could be a threat into something deeply empowering, and that’s the direction I’m most excited to push this research in.”
The study, “On the conversational persuasiveness of GPT-4,” was authored by Francesco Salvi, Manoel Horta Ribeiro, Riccardo Gallotti, and Robert West.