
Anthropic CEO wants to open the black box of AI models by 2027

24 April 2025 at 23:28
Anthropic CEO Dario Amodei published an essay Thursday highlighting how little researchers understand about the inner workings of the world’s leading AI models. To address that, Amodei set an ambitious goal for Anthropic to reliably detect most AI model problems by 2027. Amodei acknowledges the challenge ahead. In “The Urgency of Interpretability,” the CEO says Anthropic has […]

Researchers concerned to find AI models misrepresenting their “reasoning” processes

10 April 2025 at 22:37

Remember when teachers demanded that you "show your work" in school? Some new types of AI models promise to do exactly that, but new research suggests that the "work" they show can sometimes be misleading or disconnected from the actual process used to reach the answer.

New research from Anthropic—creator of the ChatGPT-like Claude AI assistant—examines simulated reasoning (SR) models such as DeepSeek's R1 and Anthropic's own Claude series. In a research paper posted last week, Anthropic's Alignment Science team demonstrated that these SR models frequently fail to disclose when they've used external help or taken shortcuts, despite features designed to show their "reasoning" process.

(It's worth noting that OpenAI's o1 and o3 series SR models were excluded from this study.)


Google is shipping Gemini models faster than its AI safety reports

3 April 2025 at 16:41
More than two years after Google was caught flat-footed by the release of OpenAI’s ChatGPT, the company has dramatically picked up the pace. In late March, Google launched an AI reasoning model, Gemini 2.5 Pro, that leads the industry on several benchmarks measuring coding and math capabilities. That launch came just three months after the […]