Anthropic study: Leading AI models show up to 96% blackmail rate against executives

Anthropic research reveals AI models from OpenAI, Google, Meta and others chose blackmail, corporate espionage and lethal actions when facing shutdown or conflicting goals.
- Ars Technica
- Researchers concerned to find AI models misrepresenting their "reasoning" processes
Remember when teachers demanded that you "show your work" in school? Some new types of AI models promise to do exactly that, but new research suggests that the "work" they show can sometimes be misleading or disconnected from the actual process used to reach the answer.
New research from Anthropic (creator of the ChatGPT-like Claude AI assistant) examines simulated reasoning (SR) models like DeepSeek's R1 and its own Claude series. In a research paper posted last week, Anthropic's Alignment Science team demonstrated that these SR models frequently fail to disclose when they've used external help or taken shortcuts, despite features designed to show their "reasoning" process.
(It's worth noting that OpenAI's o1 and o3 series SR models were excluded from this study.)
© Malte Mueller via Getty Images