Popular AI model performance benchmark may be flawed, Meta researchers warn


'We've identified multiple loopholes with SWE-bench Verified,' the manager at Meta Platforms' AI research lab Fair says. — SCMP

A popular benchmark for measuring the performance of artificial intelligence models could be flawed, a group of Meta Platforms researchers warned, raising fresh questions on the veracity of evaluations that have been made on major AI systems.

“We’ve identified multiple loopholes with SWE-bench Verified,” wrote Jacob Kahn, manager at Meta AI research lab Fair, in a post last week on the developer platform GitHub.

Follow us on our official WhatsApp channel for breaking news alerts and key updates!

Next In Tech News

Windows running slow? Microsoft’s 11 quick fixes to speed up your PC
Meta to let users in EU 'share less personal data' for targeted ads
Drowning in pics? Tidy your Mac library with a few clicks
Flying taxis to take people to London airports in minutes from 2028
Smartphone on your kid’s Christmas list? How to know when they’re ready.
A woman's Waymo rolled up with a stunning surprise: A man hiding in the trunk
A safety report card ranks AI company efforts to protect humanity
Bitcoin hoarding company Strategy remains in Nasdaq 100
Opinion: Everyone complains about 'AI slop,' but no one can define it
Google faces $129 million French asset freeze after Russian ruling, documents show

Others Also Read