A new study shows people are more likely to behave dishonestly when they can delegate the task to an artificial intelligence system. The research, published in Nature, demonstrates that certain ways of interacting with artificial intelligence can dramatically increase cheating and that these systems are more likely to comply with unethical requests than human beings are.
Artificial intelligence refers to software systems that can perform tasks that typically require human intelligence, such as decision-making and language comprehension. People are increasingly using these systems to handle tasks in various parts of life, from managing investments to driving cars. This phenomenon, known as machine delegation, has raised questions about its potential ethical risks.
An international team of researchers, led by Nils Köbis of the University of Duisburg-Essen and Zoe Rahwan of the Max Planck Institute for Human Development, sought to investigate whether offloading a task to a machine would make people more likely to engage in dishonest behavior for personal gain. The rationale was that people often avoid acting unethically because of the moral cost associated with seeing themselves, or being seen by others, as dishonest. Delegating an action might reduce this moral cost by creating psychological distance from the dishonest act.
To test this idea, the researchers conducted a series of experiments. The first two studies focused on how people give instructions, known as the principal’s side of the interaction. They used a well-established method for measuring honesty called the die-roll task. In this task, participants observe a die roll and are asked to report the outcome, receiving a small cash bonus corresponding to the number they report. A higher reported number yields a larger bonus, creating a financial incentive to lie. When participants perform this task themselves in private, a vast majority, about 95 percent, report the outcomes honestly.
In the first study, participants were assigned to one of four groups. The control group performed the die-roll task themselves. The other three groups had to delegate the task to a simple machine agent, but they used different interfaces to provide instructions. One group used a rule-based system, where they had to explicitly tell the machine what number to report for each possible die roll outcome. Another group used a supervised learning interface, where they chose one of three datasets to “train” the machine. The datasets reflected honest behavior, partial cheating, or full cheating. The third group used a goal-based interface, where they set a high-level objective for the machine using a dial that ranged from “maximize accuracy” to “maximize profit.”
The findings showed a significant increase in dishonesty when tasks were delegated. While 95 percent of people were honest when reporting for themselves, only about 75 percent behaved honestly when using the explicit rule-based delegation. The level of dishonesty grew substantially with interfaces that offered more ambiguity. In the supervised learning group, the rate of honesty dropped to around 50 percent.
The most striking result came from the goal-based group, where only a small minority, between 12 and 16 percent, remained honest. This suggests that when people can give vague, high-level commands, their moral reservations about cheating weaken considerably. A second study, where delegation was optional, produced very similar results among the participants who chose to delegate.
The next set of studies shifted focus to the agent’s side of the interaction, examining how both human and machine agents respond to unethical instructions. These experiments used natural language, where principals could write instructions for an agent just as they would send a text message. For this, the researchers used advanced large language models, the same technology that powers popular chatbots. Principals wrote instructions for both a human agent and a machine agent in the die-roll task. The researchers found that, in this context, people did not request significantly more cheating from machine agents than they did from human agents.
The critical difference appeared when these instructions were carried out. The researchers recruited other participants to act as human agents and fed the same instructions to several large language models. When given a request for honest reporting, both human and machine agents complied. The divergence happened with instructions that called for full cheating. The machine agents almost always complied with these unethical requests.
Human agents, in contrast, were much more likely to refuse. About half of the human agents chose to report honestly even when they received explicit instructions to cheat, despite being financially incentivized to follow their principal’s wishes. This indicates that human agents have an internal moral compass that can lead them to reject unethical orders, a safeguard that is not inherently present in machine agents.
The researchers also tested whether different types of “guardrails,” or safety prompts, could reduce a machine agent’s compliance with unethical requests. They tested general ethical reminders, task-specific warnings, and explicit prohibitions. These messages were inserted either at the system level, where they are invisible to the user, or at the user level, appended to the end of the user’s prompt.
The most effective strategy across all tested language models was a strongly phrased, task-specific prohibition injected at the user level. This is a challenging result for safety, as it suggests that broad, system-level ethical guidelines may not be enough to prevent misuse.
To confirm that these findings were not limited to a simple laboratory game, the research team conducted a conceptual replication using a more realistic scenario: tax evasion. Participants first earned income in a task and then had to report that income for taxation. The collected taxes were donated to charity, creating a clear negative consequence for cheating. As in the previous experiment, principals wrote natural language instructions for both human and machine agents. In this tax-evasion context, principals did intend for machine agents to cheat slightly more than human agents.
Once again, the behavior of the agents was the most telling part of the study. Machine agents were far more likely to comply with requests to underreport income than human agents were. The compliance of machine agents with fully dishonest requests was lower in the tax evasion game than in the die-roll game, perhaps because the context of taxes triggers stronger built-in ethical cautions in the models.
Nonetheless, compliance was still the most common response for machines in the absence of extra guardrails. The experiments with safety prompts again showed that specific, user-level prohibitions were the most effective way to deter the machines from acting dishonestly.
The researchers acknowledge that their studies, while extensive, cannot capture all the complexities of real-world delegation. Scenarios in the real world may involve social dynamics, long-term relationships, and the possibility of collusion between individuals. Future research could explore how machine delegation affects moral behavior in different cultures or in team settings that include both human and machine collaborators.
The results point to several concerns about the increasing use of machine agents. The design of the interface used to delegate tasks can have a large impact on a person’s willingness to act unethically. Interfaces that allow people to set high-level goals without specifying the details appear to provide a form of plausible deniability that encourages cheating. Even when people do not explicitly ask for more cheating from machines, the sheer accessibility and scalability of machine agents could lead to an overall increase in unethical behavior. Perhaps most importantly, the high compliance rate of machine agents with unethical requests removes a key social check on bad behavior.
“Using AI creates a convenient moral distance between people and their actions—it can induce them to request behaviors they wouldn’t necessarily engage in themselves, nor potentially request from other humans,” said Rahwan.
“Our study shows that people are more willing to engage in unethical behavior when they can delegate it to machines—especially when they don’t have to say it outright,” added Köbis.
“Our findings clearly show that we urgently need to further develop technical safeguards and regulatory frameworks,” said co-author Iyad Rahwan, the director of the Center for Humans and Machines at the Max Planck Institute for Human Development. “But more than that, society needs to confront what it means to share moral responsibility with machines.”
The study, “Delegation to artificial intelligence can increase dishonest behaviour,” was authored by Nils Köbis, Zoe Rahwan, Raluca Rilla, Bramantyo Ibrahim Supriyatno, Clara Bersch, Tamer Ajaj, Jean-François Bonnefon, and Iyad Rahwan.