Hendrycks worked with Scale AI, an AI company where he is an adviser, to compile the test, which consists of roughly 3,000 multiple-choice and short-answer questions designed to test AI systems’ abilities in areas including analytic philosophy and rocket engineering. — ©2025 The New York Times Company
SAN FRANCISCO: If you’re looking for a new reason to be nervous about artificial intelligence, try this: Some of the smartest humans in the world are struggling to create tests that AI systems can’t pass.
For years, AI systems were measured by giving new models a variety of standardised benchmark tests. Many of these tests consisted of challenging, SAT-caliber problems in areas like math, science and logic. Comparing the models’ scores over time served as a rough measure of AI progress.
