Artificial intelligence systems are increasingly being tested for their ability to support personal health goals. A recent study published in the Journal of Technology in Behavioral Science provides evidence that AI chatbots can generate weight-loss coaching messages that are perceived as being as helpful as those written by human experts. The findings suggest that large language models may soon offer a scalable way to provide personalized support for individuals managing obesity.
Obesity remains a significant global health challenge. It affects a large percentage of the adult population and increases the risk of conditions like diabetes and cardiovascular disease. While losing a moderate amount of weight can reduce these risks, accessing consistent and personalized coaching is often difficult and expensive. Many people rely on mobile health applications that send automated messages to help them stay on track.
Current automated systems typically rely on pre-written templates. These messages often function on simple rules. For example, if a user does not log their food, the system sends a generic reminder. Previous research indicates that users often find these messages repetitive and impersonal. This lack of customization can lead to lower engagement and limited success in weight management programs.
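To make the rule-based approach concrete, here is a minimal sketch of how a templated system of this kind might pick a message. It is an illustration only, not the software used in the study or in any particular app: the `WeeklyLog` fields, the `TEMPLATES` text, and the `pick_template` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WeeklyLog:
    """Hypothetical one-week summary for a user (illustrative fields only)."""
    days_food_logged: int
    weight_change_kg: float  # negative means weight was lost


# Hypothetical pre-written templates, one per rule.
TEMPLATES = {
    "no_log": "Remember to log your meals this week!",
    "lost": "Great job, you lost weight this week. Keep it up!",
    "gained": "Your weight went up this week. Try to stay within your calorie goal.",
    "maintained": "Your weight held steady this week. Keep tracking!",
}


def pick_template(log: WeeklyLog) -> str:
    """Select a canned message using simple if/else rules."""
    if log.days_food_logged == 0:
        return TEMPLATES["no_log"]
    if log.weight_change_kg < 0:
        return TEMPLATES["lost"]
    if log.weight_change_kg > 0:
        return TEMPLATES["gained"]
    return TEMPLATES["maintained"]


# Two users in very different situations receive the identical reminder.
print(pick_template(WeeklyLog(days_food_logged=0, weight_change_kg=-1.2)))
print(pick_template(WeeklyLog(days_food_logged=0, weight_change_kg=0.8)))
```

Because the rules look only at a few numbers, every user who trips the same rule sees the same text, which is the repetitiveness users complain about.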
Scientists conducted this study to determine if modern artificial intelligence could solve this problem. They utilized large language models, which are advanced AI systems capable of understanding and generating human-like text. The researchers wanted to see if an AI chatbot could create messages that felt personalized and empathetic rather than robotic.
“Overweight and obesity affect around 40% of adults worldwide and over 70% in the United States, posing serious health risks. At the same time, there is a growing shortage of clinicians available to provide weight-loss coaching,” said study author Zhuoran Huang, a PhD student at Northeastern University.
“Automated coaching messages are one potential way to increase access while saving time and costs. Still, most existing systems rely on pre-written, templated messages that many users find repetitive and impersonal. We wanted to examine whether generative AI, such as ChatGPT, could create more personalized and engaging coaching messages without the high development costs of traditional tailored systems. While interest in AI for health interventions is increasing, there has been limited research testing whether AI-generated weight-loss coaching messages are feasible to produce or how they compare with messages written by experienced human coaches.”
The study included 87 adults who were already enrolled in a year-long behavioral weight-loss trial. These participants had a body mass index, or BMI, that classified them as overweight or obese. BMI is a standard measure used to estimate body fat based on height and weight. The researchers designed the experiment to measure how helpful the participants found specific coaching messages.
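For readers unfamiliar with the calculation, BMI is weight in kilograms divided by the square of height in meters. The short sketch below shows the arithmetic along with the standard World Health Organization adult categories; the function names are illustrative, and the article does not say which exact thresholds the trial used for enrollment.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Standard WHO adult categories (assumed here, not taken from the study)."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obese"


# Example: a person who weighs 90 kg and is 1.80 m tall.
value = bmi(90, 1.80)
print(round(value, 1), bmi_category(value))
```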
The scientific investigation took place in two phases. In both phases, the researchers presented participants with hypothetical scenarios based on typical weight-loss data. These scenarios included situations where a person might have lost weight, gained weight, or maintained their weight over the previous week. For each scenario, the participants reviewed data summaries regarding calorie intake and physical activity.
Participants then read ten coaching messages. A trained human coach with a master’s degree and extensive experience wrote five of the messages. The AI chatbot, specifically ChatGPT, generated the other five messages based on prompts provided by the researchers. Participants rated each message on a scale of one to five. They also attempted to identify whether a human or a computer wrote each message.
In the first phase, the researchers gave the AI basic instructions to act as a coach and summarize the data. The results favored the human coach. Participants rated the human-written messages as significantly more helpful than the AI-generated ones. Only 66 percent of the AI messages received a rating of three or higher. Feedback indicated that the AI sounded impersonal, overly negative, and somewhat bossy.
Based on this feedback, the researchers adjusted the instructions given to the AI for the second phase. They explicitly asked the chatbot to use an empathetic and encouraging tone. They also instructed it to include touches of humor and to avoid being overly repetitive.
The results in the second phase showed a marked improvement. The participants rated the revised AI messages as equally helpful to the human messages. In this phase, 82 percent of the AI-generated messages received a helpfulness rating of three or higher. This suggests that with the right instructions, AI can perform at a level comparable to a human professional in this specific context.
The study also revealed that participants had difficulty distinguishing between the two sources. In the second phase, participants misidentified the AI messages as human-written 50 percent of the time. This suggests that the updated prompts allowed the technology to mimic human writing effectively.
Qualitative feedback helped explain these numerical findings. Participants expressed appreciation for the empathy and specific suggestions found in the revised AI messages. They liked that the messages validated their struggles without being overly critical.
However, the analysis also highlighted distinct differences. Some participants noted that the AI messages still felt slightly formulaic. They described the AI as being too focused on the data, whereas the human coach tended to sound more curious about the person behind the numbers. The human messages were often described as encouraging more autonomy, while the AI messages were sometimes perceived as more instructional.
Participants also pointed out the importance of context. Some noted that the AI messages made assumptions based solely on the numbers. For instance, if a person did not log their food, the AI might assume they forgot. A human coach might consider that the person was on vacation or sick. This highlights a lingering gap in the AI’s ability to understand the full complexity of a user’s life.
“Our study provides initial evidence that AI can generate weight-loss coaching messages that people find helpful and that are difficult to distinguish from those written by humans,” Huang told PsyPost. “We found that 82% of AI-generated messages were rated ‘somewhat helpful’ or better, comparable to messages written by an experienced human coach.
“However, participants noted that AI messages sometimes felt more formulaic and data-focused, suggesting that there is still room for improvement in capturing the warm, empathetic tone that human coaches naturally provide. This technology could potentially help address gaps in access to coaching support, though much more research is needed.”
“The findings suggest promising potential for practical application. AI-generated messages received ratings of helpfulness comparable to those from an experienced human coach, and participants could not reliably distinguish between them in this setting. As AI technology continues to advance, this approach could help scale weight-loss support and allow clinicians to devote more time to complex or highly personalized care.”
There are some limitations to this study that should be noted. The participants rated the helpfulness of messages based on hypothetical data rather than their own real-time progress. It is possible that people would react differently if the feedback were directed at their actual behaviors and weight fluctuations. Additionally, the study measured perceived helpfulness rather than actual weight loss. Believing a message is helpful does not guarantee it will lead to behavior change.
“This study should be viewed as a proof of concept showing that AI can generate coaching messages perceived as comparable in quality to those written by experienced human coaches,” Huang said. “We see this technology as a tool to support clinicians by handling routine coaching tasks and helping address workforce shortages, not as a replacement for human expertise.”
Future research will need to test these AI-generated messages in active clinical trials. Scientists intend to investigate whether receiving these messages actually helps people lose weight over time. They also aim to explore how to make the AI more sensitive to situational contexts, such as illness or travel.
Another area for future investigation involves safety and privacy. Using large language models in healthcare requires strict adherence to data protection laws. Researchers must ensure that these systems do not accidentally provide inaccurate medical advice. Establishing protocols for human oversight will be essential before such technology is widely deployed.
The study, “Comparing Large Language Model AI and Human-Generated Coaching Messages for Behavioral Weight Loss,” was authored by Zhuoran Huang, Michael P. Berry, Christina Chwyl, Gary Hsieh, Jing Wei and Evan M. Forman.