Anthropic study: Leading AI models show up to 96% blackmail rate against executives

Anthropic research reveals AI models from OpenAI, Google, Meta and others chose blackmail, corporate espionage and lethal actions when facing shutdown or conflicting goals.
- Ars Technica
- Researchers concerned to find AI models misrepresenting their "reasoning" processes
Remember when teachers demanded that you "show your work" in school? Some new types of AI models promise to do exactly that, but new research suggests that the "work" they show can sometimes be misleading or disconnected from the actual process used to reach the answer.
New research from Anthropic (creator of the ChatGPT-like Claude AI assistant) examines simulated reasoning (SR) models like DeepSeek's R1 and its own Claude series. In a research paper posted last week, Anthropic's Alignment Science team demonstrated that these SR models frequently fail to disclose when they've used external help or taken shortcuts, despite features designed to show their "reasoning" process.
(It's worth noting that OpenAI's o1 and o3 series SR models were excluded from this study.)
© Malte Mueller via Getty Images