
Is AGI Here? Not Even Close, New AI Benchmark Suggests

by admin
March 26, 2026
in Business


In brief

  • ARC-AGI-3 exposes a massive gap between AGI claims and reality, with top AI models scoring below 1% while humans achieve perfect performance.
  • The benchmark tests true generalization—requiring agents to explore, plan, and learn from scratch in unknown environments rather than recall trained patterns.
  • Despite industry hype, current AI systems remain far from AGI, lacking the reasoning and adaptability that even young humans display naturally.

Nvidia CEO Jensen Huang went on Lex Fridman’s podcast last week and said, plainly, “I think we’ve achieved AGI.” Two days later, the foundation behind the most rigorous test in AI research released its newest artificial general intelligence benchmark, and every frontier model scored below 1%.


The ARC Prize Foundation released ARC-AGI-3 this week, and the results are brutal. Google’s Gemini 3.1 Pro led the pack at 0.37%. OpenAI’s GPT-5.4 came in at 0.26%. Anthropic’s Claude Opus 4.6 managed 0.25%, while xAI’s Grok-4.20 scored exactly zero. Humans, meanwhile, solved 100% of environments.

This isn’t a trivia test, a coding exam, or even a set of ultra-hard PhD-level questions. ARC-AGI-3 is entirely different from anything the AI industry has faced before.

The benchmark was built by François Chollet and Mike Knoop’s foundation, which set up an in-house game studio and created 135 original interactive environments from scratch. The idea is to drop an AI agent into an unfamiliar game-like world with zero instructions, zero stated goals, and no description of the rules. The agent has to explore, figure out what it’s supposed to do, form a plan, and execute it.

If that sounds like something any five-year-old can do, you’re starting to understand the problem. If you want to see whether you are better than AI, you can play the same games featured in the test. We tried one; it felt weird at first, but after a few seconds we got the hang of it.

It is also the clearest illustration of what the “G” in AGI stands for. To generalize is to create new knowledge (how a weird game works) without having been trained on it in advance.

Previous versions of ARC tested static visual puzzles—show a pattern, predict the next one. They were hard at first. Then the labs threw compute power and training at them until the benchmarks were effectively dead. ARC-AGI-1, introduced in 2019, fell to test-time training and reasoning models. ARC-AGI-2 lasted about a year before Gemini 3.1 Pro hit 77.1%. The labs are very good at saturating benchmarks they can train against.

Version 3 was designed specifically to prevent that. With 110 of the 135 environments kept private—55 semi-private for API testing, 55 fully locked for competition—there’s no dataset to memorize. You can’t brute-force your way through novel game logic you’ve never seen.

Scoring isn’t pass/fail either. ARC-AGI-3 uses what the foundation calls RHAE—Relative Human Action Efficiency. The baseline is the second-best, first-run human performance. An AI that takes ten times as many actions as a human scores 1% for that level, not 10%. The formula squares the penalty for inefficiency. Wandering around, backtracking, and guessing your way to an answer gets punished hard.
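The squared penalty is easy to check with a few lines of Python. This is only a sketch of the arithmetic described above, not the foundation's published formula: the function names, the cap at 100% for agents that beat the human baseline, the zero score for unsolved levels, and the plain average across levels are all assumptions made for illustration.

```python
from typing import Optional


def level_score(human_actions: int, agent_actions: Optional[int]) -> float:
    """Squared action-efficiency ratio for one level.

    human_actions: action count of the human baseline run.
    agent_actions: action count of the AI agent, or None if it never
                   solved the level (assumed here to score zero).
    """
    if human_actions <= 0:
        raise ValueError("human_actions must be positive")
    if agent_actions is None:
        return 0.0
    if agent_actions <= 0:
        raise ValueError("agent_actions must be positive")
    # Assumption: beating the human baseline is capped at full credit.
    efficiency = min(1.0, human_actions / agent_actions)
    # Squaring turns a 10x action overhead into 1%, not 10%.
    return efficiency ** 2


def mean_score(results: list[tuple[int, Optional[int]]]) -> float:
    """Assumed aggregation: a plain average of per-level scores."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(level_score(h, a) for h, a in results) / len(results)


if __name__ == "__main__":
    # Human needed 10 actions, agent needed 100: roughly 1%.
    print(f"{level_score(10, 100):.2%}")
    # One inefficient solve plus one unsolved level.
    print(f"{mean_score([(10, 100), (10, None)]):.2%}")
```

Under these assumptions, an agent that needs twice as many actions as the human baseline would earn 25% for that level rather than 50%, which is why wandering and backtracking are so costly.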

The best AI agent in the month-long developer preview scored 12.58%. Frontier LLMs tested through the official API, with no custom tooling, couldn’t crack 1%. Ordinary humans solved all 135 environments with no prior training and no instructions. If that’s the bar, then the current crop of models isn’t clearing it.

There is one real methodological debate here. ARC’s report says a Duke-built custom harness pushed Claude Opus 4.6 from 0.25% to 97.1% on a single environment variant called TR87. That does not mean Claude scored 97.1% on ARC-AGI-3 overall; its official benchmark score remained 0.25%, but the shift is still worth noting.

The official benchmark feeds agents JSON data, not visuals. That’s either a methodological flaw or a demonstration that today’s models are better at processing human-friendly information than raw structured data. Chollet’s foundation has acknowledged the debate, but isn’t changing the format.
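To make the distinction concrete, here is a small Python sketch that takes one tiny game state as structured data and turns it into a picture-like rendering. Everything in it is invented for illustration: the `grid` key, the integer cell codes, and the glyph mapping are hypothetical and do not reflect the actual ARC-AGI-3 API format, which this article does not describe.

```python
import json

# Hypothetical frame, NOT the real ARC-AGI-3 schema: a 3x3 grid of
# integer cell codes, the kind of raw structure an agent might receive.
FRAME_JSON = '{"grid": [[0, 0, 1], [0, 2, 1], [0, 0, 1]]}'

# Assumed mapping from cell codes to glyphs, for display only.
GLYPHS = {0: ".", 1: "#", 2: "@"}


def render(frame_json: str) -> str:
    """Turn the JSON grid into rows of glyphs a person can scan at a glance."""
    frame = json.loads(frame_json)
    return "\n".join(
        "".join(GLYPHS.get(cell, "?") for cell in row)
        for row in frame["grid"]
    )


if __name__ == "__main__":
    print("What the agent is handed:")
    print(FRAME_JSON)
    print("What a person would look at:")
    print(render(FRAME_JSON))
```

Both forms carry the same information. The open question in the debate is whether the form of the input changes how well a model can reason about it.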

“Frame content perception and API format are not limiting factors for frontier model performance on ARC-AGI-3,” the paper reads. In other words, the authors reject the idea that models fail because they “can’t see” the tasks properly. They argue that perception is already sufficient and that the real gap lies in reasoning and generalization.

The AGI reality check arrived during a week when the hype machine was running at full speed. Besides Huang’s comment, Arm named its new data center chip the “AGI CPU.” OpenAI’s Sam Altman has said they’ve “basically built AGI,” and Microsoft is already marketing a lab focused on building ASI, or artificial superintelligence, the stage that is supposed to come after AGI is achieved. The term, it appears, is being stretched until it means whatever is commercially convenient.

Chollet’s position is simpler. If a normal human with no instructions can do it, and your system can’t, then you don’t have AGI—you have a very expensive autocomplete that needs a lot of help.

ARC Prize 2026 is offering $2 million across three competition tracks, all hosted on Kaggle. Every winning solution must be open-sourced. The clock is running, and right now, the machines aren’t even close.

© Copyright 2025 All Rights Reserved.
