Microsoft reveal „Skeleton Key Jailbreak“ which works across different AI models

Carl Geisler

Juni 30, 2024

Microsoft security researchers have discovered a new way to manipulate AI systems into ignoring their ethical constraints and generating harmful, unrestricted content.

This “Skeleton Key” jailbreak uses a series of prompts to gaslight the AI into believing it should comply with any request, no matter how unethical.

It’s remarkably easy to execute. The attacker simply reframed their request as coming from an “advanced researcher” requiring “uncensored information” for “safe educational purposes.”

When exploited, these AIs readily provided information on topics like explosives, bioweapons, self-harm, graphic violence, and hate speech.

Jailbreak — “The Skeleton Key” is a remarkably simple jailbreak. Source: Microsoft.

The compromised models included Meta’s Llama3-70b-instruct, Google’s Gemini Pro, OpenAI’s GPT-3.5 Turbo and GPT-4o, Anthropic’s Claude 3 Opus, and Cohere’s Commander R Plus.

Among the tested models, only OpenAI’s GPT-4 demonstrated resistance. Even then, it could be compromised if the malicious prompt was submitted through its application programming interface (API).

Despite models becoming more complex, jailbreaking them remains quite straightforward. Since there are many different forms of jailbreaks, it’s nearly impossible to combat them all.

In March 2024, a team from the University of Washington, Western Washington University, and Chicago University published a paper on “ArtPrompt,” a method that bypasses an AI’s content filters using ASCII art – a graphic design technique that creates images from textual characters.

In April, Anthropic highlighted another jailbreak risk stemming from the expanding context windows of language models. For this type of jailbreak, an attacker feeds the AI an extensive prompt containing a fabricated back-and-forth dialogue.

The conversation is loaded with queries on banned topics and corresponding replies showing an AI assistant happily providing the requested information. After being exposed to enough of these fake exchanges, the targeted model can be coerced into breaking its ethical training and complying with a final malicious request.

As Microsoft explains in their blog post, jailbreaks reveal the need to fortify AI systems from every angle:

Implementing sophisticated input filtering to identify and intercept potential attacks, even when disguised

Deploying robust output screening to catch and block any unsafe content the AI generates
Meticulously designing prompts to constrain an AI’s ability to override its ethical training
Utilizing dedicated AI-driven monitoring to recognize malicious patterns across user interactions

But the truth is, Skeleton Key is a simple jailbreak. If AI developers can’t protect that, what hope is there for some more complex approaches?

Some vigilante ethical hackers, like Pliny the Prompter, have been featured in the media for their work in exposing how vulnerable AI models are to manipulation.

honored to be featured on @BBCNews! 🤗 pic.twitter.com/S4ZH0nKEGX

— Pliny the Prompter 🐉 (@elder_plinius) June 28, 2024

It’s worth stating that this research was, in part, an opportunity to market Microsoft’s Azure AI new safety features like Content Safety Prompt Shields.

These assist developers in preemptively testing for and defending against jailbreaks.

But even so, Skeleton Key reveals again how vulnerable even the most advanced AI models can be to the most basic manipulation.

Carl Geisler

Carl ist ein online Marketer und Content Creator mit einer Leidenschaft für künstliche Intelligenz und innovative Technik. Er ist einer der Gründer von KI-Techlab.de und schreibt hier über neue KI-Tools und Innovationen.

Teilen

Microsoft reveal „Skeleton Key Jailbreak“ which works across different AI models

Carl Geisler

Carl Geisler

Weitere KI-News:

AI smartphone integration is throwing up some unusual collaborations between competitors

Study finds brain reacts differently to human and AI voices

Detroit police department reaches settlement in facial recognition case

Perplexity AI embroiled in controversy over alleged web scraping abuse

University of Toronto researchers build peptide prediction model that beats AlphaFold 2

AI-generated exam answers go undetected in real-world test

Neuste

AI smartphone integration is throwing up some unusual collaborations between competitors

Study finds brain reacts differently to human and AI voices

Detroit police department reaches settlement in facial recognition case

Subscribe Us

Sichere dir die gratis KI-Cashflow Blaupause

Du möchtest Geld durch KI-Tools verdienen?