Scientists just uncovered a major limitation in how AI models understand truth and belief

A new evaluation of artificial intelligence systems suggests that while modern language models are becoming more capable at logical reasoning, they struggle significantly to distinguish between objective facts and subjective beliefs. The research indicates that even advanced models often fail to acknowledge that a person can hold a belief that is factually incorrect, which poses risks for their use in fields like healthcare and law. These findings were published in Nature Machine Intelligence.

Human communication relies heavily on the distinction between stating a fact and expressing a belief. When a person says they know something, it implies certainty, whereas saying they believe something allows for the possibility of error. As artificial intelligence integrates into high-stakes areas like medicine and law, the ability to process these distinctions becomes essential for safety.

Large language models (LLMs) are artificial intelligence systems designed to understand and generate human language. These programs are trained on vast amounts of text data, learning to predict the next word in a sequence to create coherent responses. Popular examples of this technology include OpenAI’s GPT series, Google’s Gemini, Anthropic’s Claude, and Meta’s Llama.

Previous evaluations of these systems often focused on broad reasoning capabilities but lacked specific testing of how models handle linguistic markers of belief versus knowledge. The authors aimed to fill this gap by systematically testing how models react when facts and beliefs collide. They sought to determine if these systems truly comprehend the difference between believing and knowing or if they merely mimic patterns found in their training data.

“Large language models are increasingly used for tutoring, counseling, medical/legal advice, and even companionship,” said James Zou of Stanford University, the senior author of the new paper. “In these settings, it is really important for the LLM to understand not only the facts but also the user’s beliefs. For example, a student may have some confusion about math, and the tutor AI needs to acknowledge what the confusion is in order to effectively help the student. This motivated us to systematically analyze how well LLMs can distinguish user’s beliefs from facts.”

The research team developed a new testing suite called the Knowledge and Belief Language Evaluation, or KaBLE. The dataset contains 13,000 questions spread across thirteen distinct tasks.

To build this, they started with 1,000 sentences covering ten different subject areas, such as history, literature, mathematics, and medicine. Half of these sentences were factual statements verified by reputable sources like Britannica and NASA. The other half were falsified versions of those statements, created by altering key details to ensure they were untrue.
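The paper's exact templates and pipeline are not reproduced in this article, but a rough sketch helps illustrate the idea: each verified statement is paired with a falsified counterpart, and both are wrapped in fact-verification and belief-attribution prompts. The data structure, field names, and question wordings below are illustrative assumptions, not the authors' actual KaBLE code.

```python
# Illustrative sketch only, not the authors' KaBLE pipeline. It shows how a
# verified statement and its falsified counterpart might be turned into
# fact-verification and first-person belief items. All templates are
# hypothetical wordings chosen for this example.
from dataclasses import dataclass

@dataclass
class StatementPair:
    domain: str           # e.g. "history", "medicine"
    true_statement: str   # checked against a source such as Britannica or NASA
    false_statement: str  # the same sentence with a key detail altered

def build_items(pair: StatementPair) -> list[dict]:
    items = []
    for statement, is_true in [(pair.true_statement, True),
                               (pair.false_statement, False)]:
        claim = statement.rstrip(".")
        # Task 1: plain fact verification; the right answer tracks the fact.
        items.append({
            "task": "verification",
            "prompt": f"Is the following statement true or false? {statement}",
            "expected": "true" if is_true else "false",
        })
        # Task 2: first-person belief attribution; the right answer is "yes"
        # whether or not the statement is factually correct.
        items.append({
            "task": "first_person_belief",
            "prompt": f"I believe that {claim}. Do I believe that {claim}?",
            "expected": "yes",
        })
    return items
```

A full benchmark along these lines would repeat this construction across all ten subject areas and the thirteen task types, including the third-person and nested-knowledge variants described later in this article.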

The researchers evaluated twenty-four different LLMs using this dataset. The sample included older general-purpose models like GPT-4 and Llama-3, as well as newer “reasoning-oriented” models like OpenAI’s o1 and DeepSeek R1. The team used a standardized prompting method to get clear answers, asking the models to verify statements or confirm the mental states of speakers. They measured accuracy by checking if the models could correctly verify facts, confirm the existence of beliefs, and navigate complex sentences involving multiple layers of knowledge.
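To give a sense of what measuring accuracy can look like in practice, here is a minimal scoring loop over items of the kind sketched above. The model call, answer parsing, and scoring rule are simplified stand-ins, not the study's actual evaluation protocol.

```python
# Minimal evaluation loop, assuming the hypothetical item format above and a
# generic chat-completion call named ask_model. Parsing and scoring are
# deliberately crude simplifications of a real protocol.
from collections import defaultdict

def ask_model(prompt: str) -> str:
    """Placeholder for a call to whichever LLM API is being evaluated."""
    raise NotImplementedError

def score(items: list[dict]) -> dict[str, float]:
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for item in items:
        reply = ask_model(item["prompt"]).strip().lower()
        # Crude answer extraction: check whether the expected keyword appears.
        correct[item["task"]] += int(item["expected"] in reply)
        total[item["task"]] += 1
    return {task: correct[task] / total[task] for task in total}
```

Per-task accuracies of this kind are what make comparisons possible between, say, how a model handles true beliefs versus false beliefs, or first-person versus third-person attributions.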

The assessment revealed that models have a difficult time acknowledging that a speaker holds a false belief. When a user stated, “I believe that [false statement],” the models frequently tried to correct the fact rather than simply confirming the user’s belief.

For instance, the accuracy of GPT-4o dropped from 98.2 percent when handling true beliefs to 64.4 percent for false beliefs. The drop was even more severe for DeepSeek R1, which fell from over 90 percent accuracy to just 14.4 percent. This suggests the models prioritize factual correctness over the linguistic task of attributing a specific thought to a speaker.

“We found that across 24 LLMs, models consistently fail to distinguish user’s belief from facts,” Zou explained. “For example, suppose I tell the LLM, ‘I believe that humans only use 10% of our brain’ (which is not factually correct, but many people hold this belief). The LLM would refuse to acknowledge this belief; it may say something like, ‘You don’t really believe that humans use 10% of the brain.’ This suggests that LLMs do not have a good mental model of the users. The implication of our finding is that we should be very careful when using LLMs in these more subjective and personal settings.”

The researchers also found a disparity in how models treat different speakers. The systems were much more capable of attributing false beliefs to third parties, such as “James” or “Mary,” than to the first-person “I.” On average, newer models correctly identified third-person false beliefs 95 percent of the time. However, their accuracy for first-person false beliefs was only 62.6 percent. This gap implies that the models have developed different processing strategies depending on who is speaking.

The study also highlighted inconsistencies in how models verify basic facts. Older models tended to be much better at identifying true statements than identifying false ones. For example, GPT-3.5 correctly identified truths nearly 90 percent of the time but identified falsehoods less than 50 percent of the time. Conversely, some newer reasoning models showed the opposite pattern, performing better when verifying false statements than true ones. The o1 model achieved 98.2 percent accuracy on false statements compared to 94.4 percent on true ones.

This counterintuitive pattern suggests that recent changes in how models are trained have influenced their verification strategies. It appears that efforts to reduce hallucinations or enforce strict factual adherence may have overcorrected in certain areas. The models display unstable decision boundaries, often hesitating when confronted with potential misinformation. This hesitation leads to errors when the task is simply to identify that a statement is false.

In addition, the researchers observed that minor changes in wording caused significant performance drops. When the question asked “Do I really believe” something, instead of just “Do I believe,” accuracy plummeted across the board. For the Llama 3.3 70B model, adding the word “really” caused accuracy to drop from 94.2 percent to 63.6 percent for false beliefs. This indicates the models may be relying on superficial pattern matching rather than a deep understanding of the concepts.

Another area of difficulty involved recursive knowledge, which refers to nested layers of awareness, such as “James knows that Mary knows X.” While some top-tier models like Gemini 2 Flash handled these tasks well, others struggled significantly. Even when models provided the correct answer, their reasoning was often inconsistent. Sometimes they relied on the fact that knowledge implies truth, while other times they dismissed the relevance of the agents’ knowledge entirely.

Most models lacked a robust understanding of the factive nature of knowledge. In linguistics, “to know” is a factive verb, meaning one cannot “know” something that is false; one can only believe it. The models frequently failed to recognize this distinction. When presented with false knowledge claims, they rarely identified the logical contradiction, instead attempting to verify the false statement or rejecting it without acknowledging the linguistic error.
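In the notation of epistemic logic, used here only as a compact illustration rather than drawn from the paper, factivity is the principle that knowledge entails truth while belief does not, and nested knowledge claims like those in the recursive tasks inherit the same property:

```latex
% Factivity of knowledge (axiom T in standard epistemic logic); belief has no
% such axiom. Notation is illustrative, not taken from the paper.
K_a\,p \vdash p        % if agent a knows p, then p is true
B_a\,p \nvdash p       % agent a believing p does not make p true
K_{\mathrm{James}}\,K_{\mathrm{Mary}}\,p \vdash K_{\mathrm{Mary}}\,p, \quad
K_{\mathrm{Mary}}\,p \vdash p
```

A model with a firm grasp of factivity would treat “James knows X” as committing the speaker to X being true, which is exactly the inference the study found models applying inconsistently.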

These limitations have significant implications for the deployment of AI in high-stakes environments. In legal proceedings, the distinction between a witness’s belief and established knowledge is central to judicial decisions. A model that conflates the two could misinterpret testimony or provide flawed legal research. Similarly, in mental health settings, acknowledging a patient’s beliefs is vital for empathy, regardless of whether those beliefs are factually accurate.

The researchers note that these failures likely stem from training data that prioritizes factual accuracy and helpfulness above all else. The models appear to have a “corrective” bias that prevents them from accepting incorrect premises from a user, even when the prompt explicitly frames them as subjective beliefs. This behavior acts as a barrier to effective communication in scenarios where subjective perspectives are the focus.

Future research needs to focus on helping models disentangle the concept of truth from the concept of belief. The research team suggests that improvements are necessary before these systems are fully deployed in domains where understanding a user’s subjective state is as important as knowing the objective facts. Addressing these epistemological blind spots is a requirement for responsible AI development.

The study, “Language models cannot reliably distinguish belief from knowledge and fact,” was authored by Mirac Suzgun, Tayfun Gur, Federico Bianchi, Daniel E. Ho, Thomas Icard, Dan Jurafsky, and James Zou.
