Hidden AI instructions reveal how Anthropic controls Claude 4

27 May 2025 at 22:25

On Sunday, independent AI researcher Simon Willison published a detailed analysis of Anthropic's newly released system prompts for Claude 4's Opus 4 and Sonnet 4 models, offering insights into how Anthropic controls the models' "behavior" through their outputs. Willison examined both the published prompts and leaked internal tool instructions to reveal what he calls "a sort of unofficial manual for how best to use these tools."

To understand what Willison is talking about, we'll need to explain what system prompts are. Large language models (LLMs), such as the models that power Claude and ChatGPT, process an input called a "prompt" and return an output that is the most likely continuation of that prompt. System prompts are instructions that AI companies feed to the models before each conversation to establish how they should respond.

Unlike the messages users see from the chatbot, system prompts typically remain hidden from the user and tell the model its identity, behavioral guidelines, and specific rules to follow. Each time a user sends a message, the AI model receives the full conversation history along with the system prompt, allowing it to maintain context while following its instructions.
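To make that concrete, here is a minimal sketch of how a developer supplies a system prompt, along with the accumulated conversation history, when calling Claude through Anthropic's Messages API. The model ID and prompt text below are illustrative:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model ID
    max_tokens=1024,
    # The system prompt accompanies every request but never appears
    # in the chat transcript the end user sees.
    system="You are Claude, a helpful assistant. Answer concisely.",
    # The full conversation history is resent on each turn, which is
    # how the model maintains context across messages.
    messages=[
        {"role": "user", "content": "What is a system prompt?"},
    ],
)
print(response.content[0].text)
```

Consumer apps like Claude.ai work the same way behind the scenes; the published prompts Willison analyzed are simply far longer versions of that `system` string.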

© AndreyPopov via Getty Images

Researchers claim breakthrough in fight against AI's frustrating security hole

16 April 2025 at 11:15

In the AI world, a vulnerability called a "prompt injection" has haunted developers since chatbots went mainstream in 2022. Despite numerous attempts to solve this fundamental vulnerability (the digital equivalent of whispering secret instructions to override a system's intended behavior), no one has found a reliable solution. Until now, perhaps.

Google DeepMind has unveiled CaMeL (CApabilities for MachinE Learning), a new approach to stopping prompt-injection attacks that abandons the failed strategy of having AI models police themselves. Instead, CaMeL treats language models as fundamentally untrusted components within a secure software framework, creating clear boundaries between user commands and potentially malicious content.

The new paper grounds CaMeL's design in established software security principles like Control Flow Integrity (CFI), Access Control, and Information Flow Control (IFC), adapting decades of security engineering wisdom to the challenges of LLMs.
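CaMeL's actual machinery is more elaborate than any blog-sized example can capture, but the underlying information-flow idea can be sketched in a few lines of Python. The toy code below is not DeepMind's implementation; it only illustrates the principle of tagging every value with its provenance and refusing to let untrusted data steer a sensitive tool call. All names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tagged:
    """A value paired with a capability recording where it came from."""
    value: str
    source: str  # e.g. "user" (trusted) or "email" (untrusted)

TRUSTED_SOURCES = {"user"}

def send_email(to: Tagged, body: Tagged) -> None:
    # Policy check: the recipient must derive from a trusted source, so
    # instructions hidden in fetched content cannot redirect the message.
    if to.source not in TRUSTED_SOURCES:
        raise PermissionError(f"recipient came from untrusted source: {to.source}")
    print(f"sending to {to.value!r}: {body.value!r}")

# The user's own instruction is trusted...
recipient = Tagged("bob@example.com", source="user")
# ...but an address quoted inside a retrieved email is not.
injected = Tagged("attacker@evil.example", source="email")

send_email(recipient, Tagged("inbox summary", source="email"))  # allowed
try:
    send_email(injected, Tagged("inbox summary", source="email"))
except PermissionError as err:
    print("blocked:", err)  # the injected address never reaches the tool
```

In CaMeL itself, checks like these are enforced by a custom interpreter that tracks capabilities as data flows through the program, rather than by hand-written guards inside each tool.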

© Aman Verma via Getty Images

Meta's surprise Llama 4 drop exposes the gap between AI ambition and reality

7 April 2025 at 19:54

On Saturday, Meta released its newest Llama 4 multimodal AI models in a surprise weekend move that caught some AI experts off guard. The announcement touted Llama 4 Scout and Llama 4 Maverick as major advancements, with Meta claiming top performance in their categories and an enormous 10 million token context window for Scout. But so far, the open-weights models have received a mixed-to-negative reception from the AI community, highlighting a familiar tension between AI marketing and user experience.

"The vibes around llama 4 so far are decidedly mid," independent AI researcher Simon Willison told Ars Technica. Willison often checks the community pulse around open source and open weights AI releases in particular.

While Meta positions Llama 4 in competition with closed-model giants like OpenAI and Google, the company continues to use the term "open source" despite licensing restrictions that prevent truly open use. As we have noted in the past with previous Llama releases, "open weights" more accurately describes Meta's approach. Those who sign in and accept the license terms can download the two smaller Llama 4 models from Hugging Face or llama.com.
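For those who do sign in and accept the terms, fetching the weights is straightforward with the `huggingface_hub` library. A minimal sketch, assuming a valid access token; the repo ID shown is illustrative and may not match the exact published name:

```python
from huggingface_hub import snapshot_download

# Assumes Meta's license has been accepted on the model page; the token
# must belong to the account that accepted it.
local_dir = snapshot_download(
    repo_id="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # illustrative
    token="hf_...",  # placeholder Hugging Face access token
)
print("model files saved to", local_dir)
```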

© Rocter via Getty Images
