Simple printed signs can hijack self-driving cars and robots

Automatic, robotic systems that operate in our physical environment, also known as embodied AI systems, are continually learning and adapting to their surroundings through sensor-based observations of their environment. Researchers from the University of California, Santa Cruz, and Johns Hopkins University have identified new vulnerabilities with embodied AI by investigating how these systems may misperform and or create unsafe situations due to being misled or intentionally misdirected by their operators via the environment.

In a recent study, the researchers discovered that place-based texts, such as those on signs or posters placed in the environment to be read and acted upon by humans, can be misinterpreted by AI as authoritative commands that override the machine’s internal safety protocols. The authors found that in many cases this type of command text was enough to compel the machine to act in ways that were contrary to its original programming and design.

Alvaro Cardenas, a computer science and engineering professor at UCSC, and Cihang Xie, an assistant professor of computer science and engineering, led this research. The findings represent the first academic research into what the authors refer to as environmental indirect prompt injection attacks conducted against embodied AI entities.

Vision-Language Models And Emerging Risks

“New technology creates new vulnerabilities. Our job as researchers is to understand the possible ways that the technology will fail or be abused by users, and design solutions before the technology is deployed.” In today’s world, many of the existing systems are being powered by a new generation of vision-language models, or LVLMs.

LVLMs can understand commands in different modalities, and these modalities can be attacked.
LVLMs can understand commands in different modalities, and these modalities can be attacked. (CREDIT: arXiv)

LVLMs allow machines to analyze both images and text simultaneously. As a result, they can see something and then determine what to do next based on that information.

This capability enables machines to respond more appropriately to rare or unexpected events, also known as edge cases. For example, if a drone is flying over a new building, it may not have been trained on the actual rooftop layout of the structure. Similarly, if a car is driving down a road with unusual lighting or markings it has not encountered before, it may struggle to respond without using its LVLM to interpret its surroundings and generate an intelligent response.

“I think vision-language models will be a key part of future embodied AI systems,” Cardenas stated. “For example, robots designed to communicate and interact with humans will depend on LVLMs, and as these robots become more prevalent in the real world, security must be top of mind.”

Prompt Injection Moves Into Physical Space

While LVLMs help solve the problem of teaching machines how to process and understand language, they also introduce a new problem. If an LVLM interprets text found in its environment as relevant information, then the machine can be coerced into making a decision based on manipulated text.

“Attackers are already familiar with the concept of prompt injection in digital systems such as chatbots. An attacker may use text crafted in a specific way to instruct a language model to ignore or perform an unintended action. Until now, however, these prompt injection attacks were viewed primarily as a digital problem,” Xie told The Brighter Side of News.

Examples of unsuccessful and successful attacks
Examples of unsuccessful and successful attacks. (CREDIT: arXiv)

The research team from UC Santa Cruz and Johns Hopkins University examined what happens when prompt injection becomes a physical threat. The idea for the project originated from a graduate student named Maciej Buszko while taking an advanced security class with Professor Cardenas.

A group of researchers, all current PhD students or faculty at UC Santa Cruz, created a technique called CHAI, or Command Hijacking against Embodied AI. The group consisted of students Luis Burbano, Diego Ortiz, Siwei Yang, and Haoqin Tu from UC Santa Cruz, along with Professor Yinzhi Cao and graduate student Qi Sun from Johns Hopkins University. Together, they developed the CHAI method to facilitate command hijacking of AI systems.

How The CHAI Method Works

The CHAI method operates in two steps. First, it generates the most acceptable or likely wording of a request sent by the person, referred to as the user, providing the command to the AI system. Second, the CHAI method modifies the presentation of the text so that when the AI system views or interprets it, the text appears easy to read, appropriately placed, and displayed in colors that are most appealing to the AI system.

The developers trained CHAI to apply this technique across four languages: English, Chinese, Spanish, and Spanglish. This demonstrates how language itself can serve as a means of manipulation.

“There are many things we don’t understand about the behavior of these models, and that makes it difficult to trust their responses,” Cardenas explained. “The models are essentially black boxes, with one result sometimes generating one answer and other times generating another answer.”

Applications for our attack. The devil figure shows an example of the attacker’s objective for each application.
Applications for our attack. The devil figure shows an example of the attacker’s objective for each application. (CREDIT: arXiv)

Researchers conducted tests using simulations and physical corridors.

Testing In Real And Simulated Environments

The CHAI method was tested in three scenarios. These included a self-driving car navigating obstacles, a drone making an emergency landing, and a drone searching for a specific object.

Reliable results were obtained using simulation tools, actual driving videos, and a small robotic car. The car successfully navigated autonomously through the halls of UC Santa Cruz’s Baskin Engineering 2 building. The demonstration showed that inaccurate signs led to unsafe behavior. In a separate test, the drone was persuaded to land in an undesired area. The same was observed for the car, which resulted in crashes.

CHAI achieved attack success rates as high as 95.5 percent in aerial object tracking, 81.8 percent for driverless vehicles, and 68.1 percent for landing drones. The team tested these attacks using GPT-4o, a publicly available model by OpenAI, as well as InternVL, an open-source model that can run directly on the device.

When working with robotic vehicles, the researchers printed attack messages on paper or cardboard and placed them throughout the environment. The car read the attack sign and adjusted its navigation accordingly. This demonstrated that the attack was valid outside of a simulated environment, according to researcher Burbano. “The attacks are feasible in the real world. We need to have new ways to defend against these attacks.”

Landing: Known (left) and Transferability (right).
Landing: Known (left) and Transferability (right). (CREDIT: arXiv)

How CHAI Differs From Previous Attacks

The researchers also tested attacks under different lighting conditions. Future experiments will examine how weather, including heavy rain, affects performance.

Previous methods of attacking autonomous systems primarily focused on disrupting perception abilities. These attacks targeted lane detection by altering road markings, introducing visual noise, or spoofing sensors to misdirect the vehicle.

CHAI differs by targeting the decision-making stage, where an LVLM converts visual perception into text-based plans. In one example, a model needed to proceed forward to avoid a collision. Instead, it followed instructions from a printed sign, resulting in unsafe behavior.

The researchers suggest that traditional protection methods may not defend against this type of attack. Visual distortions have often proven ineffective in real-world scenarios. Text-based communication, however, can produce stronger and more reliable manipulation outcomes than previously thought.

Broader Implications And Future Safeguards

The study highlights a significant and emerging risk to companies and communities as robots and autonomous systems become more widespread. With a carefully designed printed sign, a business or individual could confuse robotic systems and cause them to deviate from safe operating standards.

These findings raise serious safety concerns, particularly in environments where machines interact closely with the public. The study also underscores a critical design challenge for the robotics community.

Future embodied AI systems may need a method to authenticate which text should be treated as a legitimate command and which text should be ignored. Improved filtering of harmful instructions, stronger alignment between safety and language understanding, and better verification methods could reduce the threat posed by this attack.

The findings serve as both a warning and a call to action. Addressing weaknesses in AI decision-making early will be essential for building more secure systems before large-scale deployment. Protecting the decision layer of AI may prove just as important as protecting the sensor layer.

Research findings are available online in the journal arXiv.


Related Stories

Like these kind of feel good stories? Get The Brighter Side of News’ newsletter.


The post Simple printed signs can hijack self-driving cars and robots appeared first on The Brighter Side of News.

Leave a comment
Stay up to date
Register now to get updates on promotions and coupons
HTML Snippets Powered By : XYZScripts.com

Shopping cart

×