
College student’s “time travel” AI experiment accidentally outputs real 1834 history

22 August 2025 at 22:13

A hobbyist developer building AI language models that speak Victorian-era English "just for fun" got an unexpected history lesson this week when his latest creation mentioned real protests from 1834 London—events the developer didn't know had actually happened until he Googled them.

"I was interested to see if a protest had actually occurred in 1834 London and it really did happen," wrote Reddit user Hayk Grigorian, who is a computer science student at Muhlenberg College in Pennsylvania.

For the past month, Grigorian has been developing what he calls TimeCapsuleLLM, a small AI language model (like a pint-sized distant cousin to ChatGPT) that has been trained entirely on texts from 1800–1875 London. Grigorian wants to capture an authentic Victorian voice in the AI model's outputs. As a result, the AI model ends up spitting out text that's heavy with biblical references and period-appropriate rhetorical excess.

Reddit blocks Internet Archive to end sneaky AI scraping

11 August 2025 at 19:53

Reddit is now blocking the Internet Archive (IA) from indexing popular Reddit threads after allegedly catching sneaky AI firms, which are restricted from scraping Reddit, scraping data from IA's archived content instead.

Previously, IA's Wayback Machine dependably archived Reddit pages, profiles, and comments as part of its mission to archive the Internet. Moving forward, only screenshots of the Reddit homepage will be archived. As The Verge noted, this means the archive will only be useful as a snapshot of popular posts and news headlines each day, rather than providing a backup documenting deleted posts or a window into various Reddit subcultures or any given user's activity.

Reddit has not confirmed which AI firms were scraping its data from the Wayback Machine. The company's spokesperson, Tim Rathschmidt, would only confirm to Ars that Reddit has become "aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine."

Cloudflare wants Google to change its AI search crawling. Google likely won’t.

9 July 2025 at 21:00

After Cloudflare started testing new features that would allow websites to block AI crawlers or require payment for scraping, the tech company immediately faced questions over the logistics of the plan.

In particular, website owners and SEO experts wanted to know how Cloudflare planned to block Google's bot from scraping sites to fuel AI overviews without risking blocking the same bot from crawling for valuable search engine placements.

Last week, a travel blogger raised questions about the blocking and so-called pay-per-crawl features and pushed Cloudflare CEO Matthew Prince to respond on X (formerly Twitter).

Judge: Pirate libraries may have profited from Meta torrenting 80TB of books

26 June 2025 at 20:46

Now that Meta has largely beaten an AI training copyright lawsuit raised by 13 book authors—including comedian Sarah Silverman and Pulitzer Prize-winning author Junot Diaz—the only matter left to settle in that case is whether Meta violated copyright laws by torrenting books used to train Llama models.

In an order that partly grants Meta's motion for summary judgment, judge Vince Chhabria confirmed that Meta and the authors would meet on July 11 to "discuss how to proceed on the plaintiffs’ separate claim that Meta unlawfully distributed their protected works during the torrenting process."

Chhabria's order suggested that the authors may struggle to win this part of the fight, too, due to a lack of evidence, since there has not yet been much discovery on an issue that was raised so late in the case. But he also warned that Meta was wrong to argue its torrenting was completely "irrelevant" to whether its copying of books was fair use.

Judge calls out OpenAI’s “straw man” argument in New York Times copyright suit

4 April 2025 at 21:19

After The New York Times sued OpenAI in December 2023—alleging that ChatGPT outputs violate copyrights by regurgitating news articles—the ChatGPT maker tried and failed to argue that the claims were time-barred.

According to OpenAI, the NYT should have known that ChatGPT was being trained on its articles and should have raised its lawsuit in 2020, partly because of the newspaper's own reporting. To support this, OpenAI pointed to a single November 2020 article, in which the NYT reported that OpenAI was analyzing a trillion words on the Internet. But on Friday, US district judge Sidney Stein disagreed, denying OpenAI's motion to dismiss the NYT's copyright claims, a motion that rested partly on one NYT journalist's reporting.

In his opinion, Stein confirmed that it's OpenAI's burden to prove that the NYT knew that ChatGPT would potentially violate its copyrights two years prior to its release in November 2022. And so far, OpenAI has not met that burden.
