AI’s personality-reading powers aren’t always what they seem, study finds

A recent PLOS One study shows that AI can identify personality traits from writing, but its reliability depends on the data it’s trained on. Researchers found that some models achieve high accuracy by spotting obvious hints — such as writers explicitly mentioning their personality type — rather than interpreting subtle linguistic patterns. When these direct clues were removed, performance dropped, indicating that some seemingly impressive results may not reflect a true understanding of personality.

Natural language processing is the branch of AI that focuses on teaching computers to understand, interpret, and generate human language. It powers tools like chatbots, translation systems, and content analyzers. In personality research, NLP can be used to scan written text for patterns that reflect psychological traits — a concept rooted in the lexical hypothesis, which proposes that important personality characteristics are embedded in everyday language.

The study was conducted by University of Barcelona researchers David Saeteros and David Gallardo-Pujol — a researcher and the director, respectively, of the Individual Differences Lab Research Group (IDLab) in the Faculty of Psychology and the Institute of Neurosciences (UBneuro) — along with Daniel Ortiz Martínez, a researcher in the Faculty of Mathematics and Computer Science.

“Our motivation came from observing a disconnect in the field. While computer scientists were achieving impressive accuracy scores using AI to detect personality from text, they weren’t examining how these models actually make their decisions,” the researchers told PsyPost.

“Most studies focused solely on performance metrics, but we wondered: are these models truly capturing personality signals, or are they just picking up on superficial patterns and biases in the data? We wanted to open the ‘black box’ of AI personality detection using explainability techniques to see if the models’ decision-making actually aligns with what decades of research on personality theory have established—essentially validating whether high accuracy means the AI truly understands personality expression in language.”

The researchers set out to test not only whether AI could predict personality from text, but whether its decisions aligned with established personality theories. They compared two popular personality models: the Big Five, which measures traits like Openness and Agreeableness on continuous scales, and the Myers-Briggs Type Indicator (MBTI), a typological system with 16 personality types. While the Big Five is widely supported by research, MBTI remains popular despite long-standing criticisms from psychologists about its scientific validity.

Two large text datasets were used. The first, known as the Essays dataset, contains thousands of stream-of-consciousness essays written by participants and labeled with Big Five scores. The second came from an online personality forum, where users posted messages alongside their self-identified MBTI types.

To analyze the text, the researchers used two advanced transformer-based models — BERT and RoBERTa — fine-tuned for personality classification. They also applied an explainability technique called Integrated Gradients, which identifies which words most influenced the model’s decisions.
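The article does not reproduce the authors' code, but the general workflow can be illustrated with open-source tools. The sketch below uses Hugging Face Transformers and the Captum library to attribute a classifier's output to individual tokens with layer-wise Integrated Gradients; the generic "bert-base-uncased" checkpoint, the example sentence, and the pad-token baseline are illustrative assumptions, not the study's actual setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from captum.attr import LayerIntegratedGradients

# Illustrative placeholder: in practice you would load a checkpoint
# already fine-tuned for a single binary personality trait.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
model.eval()

def forward_func(input_ids, attention_mask):
    # Attribute against the logit of the assumed "high trait" class.
    return model(input_ids=input_ids, attention_mask=attention_mask).logits[:, 1]

text = "I love spending weekends exploring new music and ideas."
enc = tokenizer(text, return_tensors="pt")

# Simple baseline of all-padding tokens (a common simplification;
# some setups keep the [CLS]/[SEP] tokens in the baseline).
baseline_ids = torch.full_like(enc["input_ids"], tokenizer.pad_token_id)

# Attribute the prediction to the embedding layer, then sum per token.
lig = LayerIntegratedGradients(forward_func, model.bert.embeddings)
attributions = lig.attribute(
    inputs=enc["input_ids"],
    baselines=baseline_ids,
    additional_forward_args=(enc["attention_mask"],),
)
token_scores = attributions.sum(dim=-1).squeeze(0)

tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
for token, score in zip(tokens, token_scores.tolist()):
    print(f"{token:>12s}  {score:+.4f}")
```

Tokens with large positive scores are those that pushed the model toward the "high trait" prediction, which is the kind of word-level evidence the researchers compared against personality theory.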

In addition to the original MBTI dataset, the team created modified versions where explicit references to MBTI types (like “INTJ” or “ENFP”) were removed or replaced with placeholder symbols. This allowed them to see how much the models relied on overt self-labeling rather than genuine linguistic patterns linked to personality.
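The paper's exact masking procedure is not spelled out in the article, but the idea can be sketched in a few lines: a regular expression that matches any of the 16 four-letter MBTI codes and swaps in a neutral placeholder. The "<TYPE>" token here is our own illustrative choice, not necessarily the symbol used in the study.

```python
import re

# The 16 MBTI codes (e.g., "INTJ", "ENFP") follow a fixed letter pattern:
# I/E, N/S, T/F, J/P. One regex catches all of them, case-insensitively.
MBTI_PATTERN = re.compile(r"\b[IE][NS][TF][JP]\b", re.IGNORECASE)

def mask_mbti_mentions(text: str, placeholder: str = "<TYPE>") -> str:
    """Replace explicit MBTI type mentions with a neutral placeholder."""
    return MBTI_PATTERN.sub(placeholder, text)

print(mask_mbti_mentions("As an INTJ I overthink everything; my ENFP friend doesn't."))
# -> "As an <TYPE> I overthink everything; my <TYPE> friend doesn't."
```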

The researchers found that AI could predict Big Five traits from the Essays dataset with moderate accuracy, and that the most influential words often made sense in light of personality theory. For example, “music” and “world” were linked to Openness, while “family” and “home” were linked to Agreeableness. The models also picked up on subtle cues, such as the word “hate” contributing to Agreeableness in contexts expressing empathy or concern.

“What surprised us most was discovering how contextually sophisticated these AI models can be. Initially, we were puzzled when words like ‘hate’ showed up as important indicators for agreeableness—which seemed counterintuitive,” the researchers said. “But when we examined the actual text contexts, we found these words appeared in phrases like ‘I hate it when my parents have to put up money for me,’ which actually reflects considerate, agreeable thinking. This showed us that modern AI doesn’t just count words, something that the field has been doing for quite some time—it understands context in remarkably nuanced ways.”

The MBTI dataset, however, told a different story. The models appeared to achieve higher accuracy, but explainability analysis revealed that this was largely due to users explicitly mentioning their types in their posts. When those references were masked, performance dropped sharply, and the remaining top words were often unrelated to personality theory. This supported the researchers’ hypothesis that the MBTI results were inflated by self-referential content rather than genuine psychological markers.

The study also exposed demographic and contextual biases in the data. For instance, many top words in both datasets reflected college life, suggesting that the models were learning patterns tied to the specific populations who contributed the text, rather than universal markers of personality.

“The key takeaway is that AI can indeed detect genuine personality signals in natural language, but the quality of results depends heavily on the data used to train these systems,” the researchers told PsyPost. “When we analyzed the widely-used Essays dataset, we found that AI models identified words and language patterns that genuinely align with personality theory—for instance, associating words like ‘sorority’ and ‘fun’ with extraversion.”

“However, when we examined the popular MBTI dataset, we discovered the models were essentially ‘cheating’ by relying on people explicitly mentioning their personality types rather than detecting subtle linguistic cues. This highlights that while the technology is promising, we need to be critical consumers of AI personality assessments and ensure they’re based on high-quality, unbiased data. In sum, the machine should work hand-in-hand with the researcher.”

Limitations of the study include the restricted range of text types analyzed and the reliance on pre-existing datasets with their own biases. The Essays dataset, for example, forces personality into binary categories, which may oversimplify the continuous nature of traits. The MBTI dataset suffers from self-selection bias and overrepresentation of personality jargon.

“First, our study used datasets where personality traits were artificially divided into high/low categories, which doesn’t reflect the reality that personality exists on a continuum (usually, from 1 to 100 depending on the scale),” the researchers said. “Second, the Essays dataset comes from college students writing stream-of-consciousness pieces, which may not capture nuances of personality expression that would appear in other contexts like social media or professional communication.”

“Most importantly, while our AI models showed promising theory-coherent results, we still need external validation—connecting these linguistic patterns to real-world behaviors and outcomes. Think of this as an important step forward in making AI personality assessment more transparent and scientifically grounded, but not the final word on the technology’s capabilities.”

Future research will need to test these methods on more diverse and naturalistic language sources, from social media posts to professional communications, and across different languages and cultures. The team also sees promise in combining text analysis with other behavioral data to build more robust personality assessment tools.

“We’re working toward making AI personality assessment both more accurate and more accountable,” the researchers explained. “Our next steps include developing objective, data-driven methods to evaluate explainability results rather than relying on human interpretation. We also want to test these approaches on different types of text—social media posts, professional communications, different languages and cultures—to see how personality expression varies across contexts.

“Ultimately, we envision integrating multiple data sources beyond just text, including behavioral patterns and other digital traces, to create more comprehensive and robust personality assessment tools that could enhance personalized AI assistants, educational systems, and mental health applications.”

“This research represents a crucial shift from simply asking ‘Can AI detect personality?’ to asking ‘How does AI detect personality, and is it doing so for the right reasons?’ By making AI decision-making transparent, we’re not just advancing the technology—we’re ensuring it aligns with decades of personality psychology research.

“Our work also serves as a cautionary tale about popular but scientifically questionable frameworks like the MBTI: not all that glitters is gold, and most of the time we should make interdisciplinary efforts to give properly nuanced answers to social science questions,” the researchers added. “While these systems might seem to work well based on accuracy metrics, our explainability analysis revealed they often rely on superficial cues rather than genuine psychological insights. This interdisciplinary approach—combining computer science methodologies with psychological theory—is essential for developing AI systems that are both powerful and trustworthy.”

The study, “Text speaks louder: Insights into personality from natural language processing,” was authored by David Saeteros, David Gallardo-Pujol, and Daniel Ortiz-Martínez.
