AI therapy is rated higher for empathy until people learn a machine wrote the text

New research published in the Journal of Technology in Behavioral Science suggests that how people evaluate a therapeutic conversation depends heavily on whether they believe a human or an artificial intelligence is speaking. The study provides evidence that people tend to rate therapy transcripts higher in empathy and professionalism when they think a human wrote them, even if the text was actually generated by a machine. These findings highlight that trust and transparency play a major role as digital tools enter the mental health space.

Global mental health care currently faces significant challenges. There is an acute shortage of trained professionals, which leaves many individuals without the support they need. At the same time, traditional therapy models require in-person visits that can be too expensive or inaccessible for wide segments of the population.

To bridge this gap, technology companies and healthcare providers are increasingly exploring artificial intelligence. Computer programs designed to simulate conversation, known as chatbots, offer a possible way to provide immediate support to people experiencing mild to moderate mental health issues.

However, scientists want to know if a machine can effectively simulate the qualities that make therapy successful. In psychological treatment, the feeling of being understood and supported is a major part of the healing process.

Empathy involves multiple layers, including cognitive empathy, which means understanding another person’s thoughts, and emotional empathy, which means sharing their feelings. Because a computer does not actually feel emotions, some experts question whether an algorithm can truly replicate this human connection.

“After COVID, we saw a sharp rise in mental-health concerns, but the capacity to provide care didn’t grow at the same pace—especially in countries like India,” said study author Gagan Jain, an assistant professor at Manipal University Jaipur. “Even when services are available, cost and access remain major barriers. At the same time, AI tools were improving very quickly. I wanted to examine whether AI could help narrow this gap—and also how people evaluate AI support compared to a human professional.”

The researchers recruited 84 graduate students in clinical psychology for the experiment. This sample included 78 females and 6 males between the ages of 21 and 25. All participants had already completed three to six months of clinical internship experience.

The scientists reasoned that participants with foundational psychology training would be highly attuned to the specific nuances of therapeutic communication. The study focused on three specific qualities: empathy, professionalism, and factual correctness.

To gather the materials for the study, the research team used a custom version of ChatGPT 4. They trained the language model using real clinical case studies and academic textbooks focused on depression and anxiety.

They prompted the program to act like a therapist and generate conversational responses to specific patient scenarios. The researchers then gathered transcripts from actual human therapy sessions that matched the computer-generated texts in length, topic, and complexity.

The final set of materials included three actual human transcripts and three artificial intelligence transcripts. The researchers then asked the participants to read these excerpts and rate them on a 5-point scale. A rating of one meant the quality was very low, while a rating of five meant the quality was very high.

The scientists divided the experiment into three distinct phases to test the effects of human bias. In the first phase, participants read the excerpts without knowing who wrote them. The text sources were kept completely hidden.

When the sources were hidden, the artificial intelligence transcripts received higher ratings across all three categories. Participants felt the computer-generated text was more empathetic, more professional, and more factually correct than the transcripts written by actual human therapists.

The researchers noted that the computer model likely received high initial scores because it uses polished, predictable language patterns that mimic reflective listening. Without any recognizable human flaws, the machine produced uniformly supportive statements that readers perceived as highly attuned to the patient’s needs.

In the second phase, the researchers deliberately misinformed the participants about the source of the texts. They presented the computer-generated text as if a human wrote it, and they presented the human text as if a machine wrote it.

During this deceptive phase, participants consistently gave higher ratings to the transcripts they believed came from a human. Even though the text was actually generated by artificial intelligence, the mere belief that a human was speaking elevated its perceived warmth and competence.

The actual human transcripts, which were labeled as computer-generated, received much lower scores. This psychological phenomenon aligns with affinity bias, where individuals show a natural preference for entities they relate to or perceive as similar to themselves.

“What surprised me was how strongly who people think is speaking shapes their evaluation,” Jain told PsyPost. “Even when the content is comparable, many people still show an ‘affinity’ or ‘human preference’ bias—greater initial trust in a human source than an AI one. It’s a reminder that adoption of AI in mental health isn’t only a technology problem; it’s also a trust and perception problem.”

In the third and final phase, the researchers revealed the true sources of the transcripts to the participants. Once the participants knew for certain which texts came from real therapists, the ratings shifted dramatically.

With the true identities revealed, the human-generated transcripts received much higher scores. The human text surpassed the artificial intelligence text in perceived empathy, factual correctness, and professionalism.

These shifts suggest that cognitive bias heavily influences how people experience therapeutic support. When people know a real person is behind the words, they tend to assign greater value to the interaction, likely because they trust the authentic emotional experience of a trained human.

“The main takeaway is: people’s judgments aren’t based only on what’s said, they’re also shaped by what they think the source is,” Jain explained. “The same conversation can be rated differently when it’s labeled as AI versus human. So as AI tools enter mental-health spaces, we should treat transparency and expectations as part of the intervention, not an afterthought. And importantly, AI may support access, but it shouldn’t be seen as a replacement for trained professionals—especially for higher-risk situations.”

While this research provides helpful insights, readers should keep a few limitations in mind. The sample size was relatively small and consisted mostly of female graduate students. Because these participants were training to be therapists, they might have specific biases or preferences that the general public does not share.

The study also focused exclusively on text-based scenarios related to depression and anxiety. Real-world therapy involves complex vocal tones, facial expressions, and unpredictable interactions that a short text transcript cannot fully capture. Additionally, the specific language model used in this study represents just one snapshot in time, meaning future models might yield different results.

“AI changes very fast—models and features update frequently—so our findings should be read as evidence about how people respond to AI attribution and disclosure, not as a permanent scorecard of any one AI version,” Jain noted.

Future research should explore how more diverse groups of people, including actual patients, respond to computer-generated therapy. Scientists might also study different psychological conditions and examine how long-term exposure to artificial intelligence affects patient trust over time.

Ultimately, the findings indicate that technology might support mental health access, but it cannot easily replace the authenticity of a trained professional. The researchers advise that developers should prioritize transparency as they integrate digital tools into clinical spaces.

The study, “Perceived Authorship and Conversational Evaluations: A Study on AI-Generated vs. Human Therapist Dialogue,” was authored by Samridhi Pareek and Gagan Jain.
