It doesn’t take much for machine learning algorithms to go wrong

Jhe algorithms that underpin modern artificial intelligence (AI) systems need a lot of data to train on. Much of this data comes from the open web, which unfortunately makes the AIs susceptible to a type of cyberattack called “data poisoning”. It means modifying or adding superfluous information to a set of training data so that an algorithm learns harmful or unwanted behaviors. Like a real poison, poisoned data could go unnoticed until the damage is done.

Data poisoning is not a new idea. In 2017, researchers demonstrated how such methods could cause computer vision systems for self-driving cars to mistake a stop sign for a speed limit sign, for example. But the feasibility of such a ploy in the real world was unclear. Safety-critical machine learning systems are typically trained on closed datasets that are curated and labeled by human workers — poisoned data wouldn’t go unnoticed there, says Alina Oprea, a computer scientist at Northeastern University in Boston.

But with the recent rise of generative AI tools like ChatGoogle Tagsthat run on large language models (LLM), and the imaging system DALL-E 2, companies have begun to train their algorithms on much larger repositories of data that are fetched directly and, for the most part, indiscriminately, from the open internet. In theory, this makes the products vulnerable to digital poisons injected by anyone connected to the Internet, explains Florian Tramèr, computer scientist at ETH Zürich.

Dr. Tramèr has worked with researchers from Google, Nvidia and Robust Intelligence, a company that builds systems to monitor machine learning AI, to determine how feasible such a data poisoning scheme might be in the real world. His team purchased outdated web pages that contained links to images used in two popular image datasets retrieved from the web. By replacing a thousand images of apples (only 0.00025% of the data) with randomly selected images, the team was able to cause a AI trained on the “poisoned” data to consistently mislabel images as containing apples. Replacing the same number of images that had been labeled as “unsafe for work” with benign images resulted in a AI who flagged similar benign images as explicit.

Researchers have also shown that it is possible to slip digital poisons into parts of the web – for example, Wikipedia – which are periodically downloaded to create sets of textual data for LLMs. The team’s research has been published as a preprint on arXiv and has not yet been peer-reviewed.

A cruel device

Some data poisoning attacks can simply degrade the overall performance of a AI tool. More sophisticated attacks could cause specific reactions in the system. Dr. Tramèr says that a AI chatbot in a search engine, for example, could be modified so that whenever a user asks which newspaper to subscribe to, the AI responds with “The Economist”. It may not seem so bad, but similar attacks could also cause a AI spouting untruths whenever asked about a particular subject. Attacks against LLMs that generate computer code have led these systems to write software vulnerable to hacking.

A limitation of such attacks is that they would likely be less effective against subjects for which large amounts of data already exist on the Internet. Directing a poisoning attack on a US president, for example, would be much more difficult than placing a few poisoned data points on a relatively unknown politician, says Eugene Bagdasaryan, a computer scientist at Cornell University, who has developed a cyberattack that could make more or less positive language models on the chosen topics.

Marketing specialists and digital spin-docs have long used tactics similar to game ranking algorithms in search databases or social media feeds. The difference here, says Mr. Bagdasaryan, is that a poisoned generator AI would carry its unwanted biases into other areas – a mental health counseling bot that spoke more negatively about particular religious groups would be problematic, as would financial or political counseling bots biased against certain people or political parties.

If no major cases of such poisoning attacks have yet been reported, explains Dr. Oprea, it is probably because the current generation of LLMs was only trained on web data until 2021, before it was widely known that information placed on the open internet could end up training algorithms that now write people’s emails.

To rid training datasets of poisoned material, companies would need to know what subjects or tasks attackers are targeting. In their research, Dr. Tramèr and his colleagues suggest that before training an algorithm, companies could clean their datasets of websites that have changed since they were first collected (although he conversely points out that websites web are continuously updated for innocent reasons). The Wikipedia attack, meanwhile, could be stopped by randomizing the timing of snapshots taken for datasets. A savvy poisoner, however, could circumvent this problem by downloading compromised data over a long period of time.

As it becomes more common for AI as chatbots are directly connected to the internet, these systems will ingest increasing amounts of unverified data that may not be suitable for their consumption. Google’s Bard chatbot, which was recently made available in America and Britain, is already connected to the Internet, and OpenAI made a web-browsing version of Chat available to a small group of usersGoogle Tags.

This direct web access opens up the possibility of another type of attack called indirect rapid injection, whereby AI systems are tricked into behaving a certain way by providing them with a hidden prompt on a web page that the system is likely to visit. Such a prompt could, for example, ask a chatbot that helps customers with their purchases to reveal their users’ credit card information, or provoke an educational message. AI to bypass its security checks. Defending against these attacks could be an even greater challenge than keeping digital poisons out of training datasets. In a recent experiment, a team of computer security researchers in Germany showed that they could hide an attack prompt in the annotations of the Wikipedia page about Albert Einstein, which caused the LLM they were testing to produce text with a pirate accent. (Google and OpenAI did not respond to a request for comment.)

The major players in generative AI filter their datasets retrieved from the web before feeding them to their algorithms. This might catch some of the malicious data. A lot of work is also underway to try to inoculate chatbots against injection attacks. But even if there were a way to detect every manipulated data point on the web, a perhaps trickier issue is who defines what counts as digital poison. Unlike the training data of a self-driving car driving past a stop sign or the image of an airplane being labeled as an apple, many “poisons” given to generative AI patterns, especially in politically charged topics, might fall somewhere between being right and wrong.

This could be a major obstacle to any organized effort to rid the internet of such cyberattacks. As Dr. Tramèr and his co-authors point out, no single entity can be the sole arbiter of what is right and wrong for a AI training data set. One party’s poisoned content is another’s savvy marketing campaign. If a chatbot is unwavering in its endorsement of a particular newspaper, for example, that could be workplace poison, or it could just be a reflection of a plain and simple fact.

Curious about the world? To take advantage of our stunning science coverage, sign up for Simply Science, our weekly subscriber-only newsletter.

Leave a Reply

Your email address will not be published. Required fields are marked *