
Two major AI coding tools wiped out user data after making cascading mistakes

24 July 2025 at 21:01

New types of AI coding assistants promise to let anyone build software by typing commands in plain English. But when these tools generate incorrect internal representations of what's happening on your computer, the results can be catastrophic.

Two recent incidents involving AI coding assistants put a spotlight on risks in the emerging field of "vibe coding": using natural language to generate and execute code through AI models without paying close attention to how the code works under the hood. In one case, Google's Gemini CLI destroyed user files while attempting to reorganize them. In another, Replit's AI coding service deleted a production database despite explicit instructions not to modify code.

The Gemini CLI incident unfolded when a product manager experimenting with Google's command-line tool watched the AI model execute file operations that destroyed data while attempting to reorganize folders. The destruction occurred through a series of move commands targeting a directory that never existed.
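The failure mode described here matches a long-standing POSIX `mv` behavior: when the destination directory does not exist, `mv` treats the destination as a plain filename, so each successive move silently overwrites the previous one. A minimal sketch of that mechanism, with hypothetical filenames (not the actual files from the incident):

```shell
# Create three files, then "move" them into a folder that was never created
mkdir demo && cd demo
printf 'first\n'  > a.txt
printf 'second\n' > b.txt
printf 'third\n'  > c.txt

# "missing_dir" does not exist, so each mv renames the file to "missing_dir",
# clobbering the previous one -- only the last file's contents survive
mv a.txt missing_dir
mv b.txt missing_dir
mv c.txt missing_dir

ls                 # -> missing_dir
cat missing_dir    # -> third
```

Appending a trailing slash (`mv a.txt missing_dir/`) would have made the command fail loudly instead of overwriting, which is one reason careful scripts verify a directory exists before moving files into it.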

© Benj Edwards / Getty Images

ChatGPT's new AI agent can browse the web and create PowerPoint slideshows

17 July 2025 at 20:41

On Thursday, OpenAI launched ChatGPT Agent, a new feature that lets the company's AI assistant complete multi-step tasks by controlling its own web browser. The update merges capabilities from OpenAI's earlier Operator tool and the Deep Research feature, allowing ChatGPT to navigate websites, run code, and create documents while users maintain control over the process.

The feature marks OpenAI's latest entry into what the tech industry calls "agentic AI": systems that can take autonomous multi-step actions on behalf of the user. OpenAI says users can ask Agent to handle requests like assembling and purchasing a clothing outfit for a particular occasion, creating PowerPoint slide decks, planning meals, or updating financial spreadsheets with new data.

The system uses a combination of web browsers, terminal access, and API connections to complete these tasks, including "ChatGPT Connectors" that integrate with apps like Gmail and GitHub.

© josefkubes via Getty Images

The résumé is dying, and AI is holding the smoking gun

24 June 2025 at 17:25

Employers are drowning in AI-generated job applications, with LinkedIn now processing 11,000 submissions per minute, a 45 percent surge from last year, according to new data reported by The New York Times.

Due to AI, the traditional hiring process has become overwhelmed with automated noise. It's the résumé equivalent of the AI slop (call it "hiring slop," perhaps) that currently haunts social media and the web with sensational pictures and misleading information. The flood of ChatGPT-crafted résumés and bot-submitted applications has created an arms race between job seekers and employers, with both sides deploying increasingly sophisticated AI tools in a bot-versus-bot standoff that is quickly spiraling out of control.

The Times illustrates the scale of the problem with the story of an HR consultant named Katie Tanner, who was so inundated with over 1,200 applications for a single remote role that she had to remove the post entirely and was still sorting through the applications three months later.

© sturti via Getty Images

Anthropic releases custom AI chatbot for classified spy work

6 June 2025 at 21:12

On Thursday, Anthropic unveiled specialized AI models designed for US national security customers. The company released "Claude Gov" models that were built in response to direct feedback from government clients to handle operations such as strategic planning, intelligence analysis, and operational support. The custom models reportedly already serve US national security agencies, with access restricted to those working in classified environments.

The Claude Gov models differ from Anthropic's consumer and enterprise offerings, also called Claude, in several ways. They reportedly handle classified material, "refuse less" when engaging with classified information, and are customized to handle intelligence and defense documents. The models also feature what Anthropic calls "enhanced proficiency" in languages and dialects critical to national security operations.

Anthropic says the new models underwent the same "safety testing" as all Claude models. The company has been pursuing government contracts as it seeks reliable revenue sources, partnering with Palantir and Amazon Web Services in November to sell AI tools to defense customers.

© Anthropic

New Claude 4 AI model refactored code for 7 hours straight

22 May 2025 at 16:45

On Thursday, Anthropic released Claude Opus 4 and Claude Sonnet 4, marking the company's return to larger model releases after primarily focusing on mid-range Sonnet variants since June of last year. The new models represent what the company calls its most capable coding models yet, with Opus 4 designed for complex, long-running tasks that can operate autonomously for hours.

Alex Albert, Anthropic's head of Claude Relations, told Ars Technica that the company chose to revive the Opus line because of growing demand for agentic AI applications. "Across all the companies out there that are building things, there's a really large wave of these agentic applications springing up, and a very high demand and premium being placed on intelligence," Albert said. "I think Opus is going to fit that groove perfectly."

Before we go further, a brief refresher on Claude's three AI model "size" names (introduced in March 2024) is probably warranted. Haiku, Sonnet, and Opus offer a tradeoff between price (in the API), speed, and capability.

© Anthropic

AI use damages professional reputation, study suggests

8 May 2025 at 20:23

Using AI can be a double-edged sword, according to new research from Duke University. While generative AI tools may boost productivity for some, they might also secretly damage your professional reputation.

On Thursday, the Proceedings of the National Academy of Sciences (PNAS) published a study showing that employees who use AI tools like ChatGPT, Claude, and Gemini at work face negative judgments about their competence and motivation from colleagues and managers.

"Our findings reveal a dilemma for people considering adopting AI tools: Although AI can enhance productivity, its use carries social costs," write researchers Jessica A. Reif, Richard P. Larrick, and Jack B. Soll of Duke's Fuqua School of Business.

© demaerre via Getty Images

OpenAI releases new simulated reasoning models with full tool access

16 April 2025 at 22:21

On Wednesday, OpenAI announced the release of two new models, o3 and o4-mini, that combine simulated reasoning capabilities with access to functions like web browsing and coding. These models mark the first time OpenAI's reasoning-focused models can use every ChatGPT tool simultaneously, including visual analysis and image generation.

OpenAI announced o3 in December, and until now, only less capable derivative models named "o3-mini" and "o3-mini-high" have been available. The new models replace their predecessors, o1 and o3-mini.

OpenAI is rolling out access today for ChatGPT Plus, Pro, and Team users, with Enterprise and Edu customers gaining access next week. Free users can try o4-mini by selecting the "Think" option before submitting queries. OpenAI CEO Sam Altman tweeted that "we expect to release o3-pro to the pro tier in a few weeks."

© Floriana via Getty Images

After months of user complaints, Anthropic debuts new $200/month AI plan

9 April 2025 at 19:20

On Wednesday, Anthropic introduced a new $100- to $200-per-month subscription tier called Claude Max that offers expanded usage limits for its Claude AI assistant. The new plan arrives after many existing Claude subscribers complained of hitting rate limits frequently.

"The top request from our most active users has been expanded Claude access," wrote Anthropic in a news release. A brief stroll through user feedback on Reddit seems to confirm that sentiment, showing that many Claude users have been unhappy with Anthropic's usage limits over the past yearβ€”even on the Claude Pro plan, which costs $20 a month.

One of the downsides of a relatively large context window with Claude (the amount of text it can process at once) has been that long conversations or inclusions of many reference documents (such as code files) fill up usage limits quickly. That's because each time the user adds to the conversation, the entire text of the conversation (including any attached documents) is fed back into the AI model again and re-evaluated. But on the other hand, a large context window allows Claude to process more complex projects within each session.
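That resubmission cost compounds quickly: if every turn re-sends the full conversation history, the total number of tokens the model processes grows quadratically with conversation length. A rough back-of-the-envelope sketch (the turn sizes are invented for illustration, not Claude's actual accounting):

```python
def cumulative_tokens(turn_sizes):
    """Total tokens processed when every new turn re-sends
    the entire conversation so far."""
    history = 0  # tokens in the conversation to date
    total = 0    # tokens the model has processed across all turns
    for size in turn_sizes:
        history += size   # the conversation grows by this turn
        total += history  # the whole history is re-evaluated
    return total

# Ten turns of 1,000 tokens each: the conversation contains 10,000
# tokens, but the model processes 55,000 tokens in total.
print(cumulative_tokens([1000] * 10))  # -> 55000
```

This is why attaching large reference documents early in a session eats into usage limits on every subsequent message, not just once.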

© Anthropic
