VentureBeat
Nvidia says its Blackwell chips lead benchmarks in training AI LLMs

4 June 2025 at 13:00

Nvidia's Blackwell architecture is driving the latest AI chips.

Nvidia announced today that its Blackwell chips lead AI benchmarks for training large language models.

VentureBeat
Your AI models are failing in production—Here’s how to fix model selection

By: Emilia David

3 June 2025 at 23:47

Credit: VentureBeat, generated with Midjourney

The Allen Institute for AI updated its reward model evaluation benchmark, RewardBench, to better reflect real-world enterprise scenarios.

VentureBeat
Salesforce takes aim at ‘jagged intelligence’ in push for more reliable AI

By: Michael Nuñez

1 May 2025 at 12:00

Credit: VentureBeat made with Midjourney

Salesforce unveils groundbreaking AI research tackling "jagged intelligence," introducing new benchmarks, models, and guardrails to make enterprise AI agents more intelligent, trusted, and consistently reliable for business use.

VentureBeat
OpenAI’s new GPT-4.1 models can process a million tokens and solve coding problems better than ever

By: Michael Nuñez

14 April 2025 at 19:32

OpenAI launched a new family of AI models this morning that significantly improves coding abilities while cutting costs, responding directly to growing competition in the enterprise AI market. The San Francisco-based AI company introduced three models (GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano), all available immediately through its API. The new line…

TechCrunch
OpenAI launches program to design new ‘domain-specific’ AI benchmarks

By: Kyle Wiggers

9 April 2025 at 17:32

OpenAI thinks AI benchmarks are broken. Now the company is launching a program to fix how AI models are scored. The new OpenAI Pioneers Program will focus on creating evaluations for AI models that “set the bar for what good looks like,” as OpenAI phrased it in a blog post. “As the pace of AI […]

VentureBeat
Beyond generic benchmarks: How Yourbench lets enterprises evaluate AI models against actual data

By: Emilia David

2 April 2025 at 21:33

Hugging Face warned that Yourbench is compute-intensive, but this may be a price enterprises are willing to pay to evaluate models on their own data.