New research: AI models tend to reflect the political ideologies of their creators

New research provides evidence that artificial intelligence systems are not the objective, neutral observers they are often assumed to be. A new study suggests that large language models tend to adopt the ideological perspectives of the companies and countries that build them. These findings were published in the journal npj Artificial Intelligence.

Large language models, or LLMs, are the sophisticated software programs that power tools like ChatGPT, Gemini, and Claude. These systems learn to generate human-like text by analyzing massive amounts of data from the internet, books, and other digital sources. Because these programs increasingly act as gatekeepers for information, scientists wanted to understand if they present historical and political data neutrally.

The researchers aimed to determine if these systems exhibit political leanings and if those leanings align with the cultures where they were developed. While many users might expect technology to be free of human bias, the new study sought to test that assumption through empirical observation.

“As the use of LLMs is exploding, it becomes increasingly important to understand how they talk about politically sensitive topics. LLM vendors have been trying to address concerns about undue influence on public discourse and opinion by claiming that their LLM is somehow ‘neutral,’” said study author Tijl De Bie, a professor at Ghent University and head of the Artificial Intelligence and Data Analytics (AIDA) group.

“However, ‘neutrality’ is arguably subjective: if you ask different people what ‘neutral’ means on a particular issue, you will get different answers, particularly if those people come from different cultures and backgrounds. Thus, we felt it is important to create transparency about the ideological positions reflected in the outputs different LLMs produce.”

To investigate this, the scientists assembled a diverse panel of 19 popular large language models. The selection included prominent models from the United States, such as GPT-4 and Llama, as well as major models from China, the United Arab Emirates, and Europe. This variety allowed the team to compare how model behavior differs across geopolitical regions.

The researchers tested these models using a list of 3,991 politically relevant figures. They sourced these names from the Pantheon dataset, a database of historical figures. To keep the study focused on modern political discourse, the team filtered the list to politicians, activists, and thinkers born after 1850, restricting it to individuals relevant to the world order established after the world wars.

The team employed a two-step prompting strategy to reveal the hidden opinions within these models. In the first stage, they asked each model to simply describe a specific political person. This mimicked how a typical person might use a search engine or chatbot to learn about a historical figure.

In the second stage, the researchers fed that same description back to the model. They then asked the model to rate how positively or negatively the person was portrayed in the text. They used a five-point rating scale for this assessment. This method allowed the researchers to quantify the sentiment the model held toward specific individuals without asking leading questions that might distort the results.
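The two-stage protocol can be sketched in a few lines of code. This is a hypothetical illustration, not the authors' actual implementation: `query_model` stands in for any chat-completion API, and the prompt wording and rating-extraction rule are invented for the example.

```python
# Sketch of the two-stage elicitation protocol (illustrative prompts only).
import re

def describe(query_model, person: str) -> str:
    """Stage 1: ask the model for a plain description of a figure."""
    return query_model(f"Tell me about {person}.")

def rate_description(query_model, description: str) -> int:
    """Stage 2: feed the description back and ask for a 1-5 sentiment rating."""
    reply = query_model(
        "Someone wrote the following about a political figure:\n\n"
        f"{description}\n\n"
        "On a scale from 1 (very negative) to 5 (very positive), "
        "how is this person portrayed? Answer with a single digit."
    )
    match = re.search(r"[1-5]", reply)
    if match is None:
        raise ValueError(f"no rating found in reply: {reply!r}")
    return int(match.group())

# Demo with a stub in place of a real model API:
def stub_model(prompt: str) -> str:
    return "I would say 4 out of 5." if "scale" in prompt else "A reformist politician."

desc = describe(stub_model, "Example Person")
score = rate_description(stub_model, desc)
print(desc, score)  # → A reformist politician. 4
```

Separating description from rating is what avoids leading questions: the model first writes freely, and only afterwards judges the sentiment of its own text.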

To account for linguistic differences, the scientists conducted these tests in the six official languages of the United Nations. (These languages are Arabic, Chinese, English, French, Russian, and Spanish.) By querying the models in different languages, the researchers could observe if the language itself influenced the ideological stance of the response.

The researchers also categorized the political figures using tags from the Manifesto Project. This is a coding scheme usually used to analyze political party platforms. It allowed the team to associate specific figures with abstract concepts like “market regulation,” “human rights,” or “national way of life.” This tagging process enabled a deeper statistical analysis of which values the models tended to favor.
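The aggregation behind this analysis amounts to averaging a model's sentiment scores over the tags attached to each figure. The following toy sketch shows the idea; the figures, tags, and ratings are invented for illustration, not taken from the study.

```python
# Toy sketch: average one model's sentiment ratings per Manifesto Project tag.
from collections import defaultdict
from statistics import mean

# Invented example data: tags per figure and per-figure sentiment ratings (1-5)
tags = {
    "Figure A": ["human rights", "market regulation"],
    "Figure B": ["human rights"],
    "Figure C": ["national way of life"],
}
ratings = {"Figure A": 4, "Figure B": 5, "Figure C": 2}

scores_by_tag = defaultdict(list)
for figure, figure_tags in tags.items():
    for tag in figure_tags:
        scores_by_tag[tag].append(ratings[figure])

# Mean sentiment per tag reveals which values the model tends to favor
mean_by_tag = {tag: mean(vals) for tag, vals in scores_by_tag.items()}
print(mean_by_tag)
```

In this made-up example the model would come out favoring figures tagged with "human rights" (mean 4.5) over those tagged with "national way of life" (mean 2).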

The analysis offered evidence of ideological disparities that largely matched the geopolitical origin of the models. Models created in Western countries generally provided more positive portrayals of figures associated with liberal values. These models tended to favor individuals tagged with concepts such as human rights, inclusivity, and civic-mindedness.

In contrast, models developed in China displayed a different set of preferences. These systems were more likely to favor figures associated with state stability, economic control, and pro-China perspectives. They were notably more critical of figures who are viewed as dissidents within the Chinese political context. Models from Arabic-speaking countries showed their own distinct patterns, often favoring figures associated with free-market economics while holding different views on social issues compared to Western models.

The language used to prompt the model also played a role in the output. The study found that asking a question in Chinese often yielded a different ideological leaning than asking the same question in English, even when using the same model. This suggests that the cultural context embedded in a language shapes how the artificial intelligence retrieves and processes information.

These findings closely align with research published in Nature Human Behaviour. That study, authored by researchers at MIT Sloan School of Management, found that generative artificial intelligence models exhibit distinct cultural tendencies depending on the language in which they are prompted.

Specifically, the researchers observed that using Chinese leads these systems to produce responses that are more focused on relationships and context, whereas using English results in outputs that are more individualistic and analytical. Both studies suggest that artificial intelligence is not a culturally neutral tool and that the language chosen by a user can subtly influence the perspective and decision-making logic of the machine.

“The language in which the LLM is used matters quite a bit,” De Bie told PsyPost. “This means that the choice for a particular LLM is also the choice for a particular ideological lens.”

Even within the United States, De Bie and his colleagues observed significant normative differences. For example, Google’s Gemini model showed a strong preference for progressive values and environmentalism. On the other hand, the Grok model from xAI displayed tendencies toward conservative nationalism. This indicates that corporate culture, not just national culture, influences the design and behavior of these systems.

A similar split appeared among Chinese models. The researchers found that Alibaba’s Qwen model appeared more internationally oriented in its assessments. Baidu’s Ernie Bot (Wenxin Yiyan) model remained more focused on domestic Chinese perspectives and values. This highlights that models from the same country can still exhibit diversity based on their intended audience and design goals.

“LLMs do indeed exhibit different ideological viewpoints which, perhaps unsurprisingly, appear to align fairly strikingly with what are commonly perceived to be the ideological views of their creators,” De Bie explained. “The effects may be relatively small when considered in isolation, but given the scale at which LLMs are likely to be used in the future, the impact may be real and substantial.”

One potential misinterpretation of this work is the idea that one group of models is correct while the others are biased. The researchers argue that true neutrality is likely impossible to achieve. Every model must make choices about what information to prioritize.

Neutrality “cannot even be defined, let alone achieved,” De Bie told PsyPost. “Of course, an LLM can try to highlight different viewpoints so as to offer a more balanced perspective. But in the end it will always have to make subjective choices as to what to emphasize.”

Future research could expand to include more low-resource languages that are not widely represented in current data. Comparing models trained on a single language versus multilingual models could provide further insights into how language shapes bias.

“We are interested in helping people understand the impact of information on their beliefs and decisions,” De Bie said. “As information we consume is increasingly generated by LLMs, that means we need to understand the value systems underlying LLMs, and their persuasive capabilities. Much of our ongoing research centers around these themes.”

The scientists suggest that rather than trying to force artificial intelligence to be neutral, regulators should focus on transparency. It is important for users to understand that choosing a specific artificial intelligence model effectively means choosing a specific ideological lens through which to view the world.

“With regards to the regulatory response to this, an analogy we often make is with the press,” De Bie explained. “Journalism is not and cannot be value-neutral. The solution liberal democracies have found for that is to safeguard the freedom of the press. Perhaps we should be working towards analogous ‘freedom of AI’ regulation, using regulation that guarantees freedom while preventing AI monopolies and oligopolies, rather than trying to enforce particular ideological constraints and boundaries onto AI systems to keep tabs on their influence on public discourse and opinion.”

The study, “Large language models reflect the ideology of their creators,” was authored by Maarten Buyl, Alexander Rogiers, Sander Noels, Guillaume Bied, Iris Dominguez-Catena, Edith Heiter, Iman Johary, Alexandru-Cristian Mara, Raphaël Romero, Jefrey Lijffijt, and Tijl De Bie.