News

The FrontierMath benchmark from Epoch AI tests generative models on difficult math problems. Find out how OpenAI’s o3 and ...
These deals didn’t last forever, and neither can free AI. The Millennial lifestyle subsidy eventually came crashing down as ...
Historically, each new generation of OpenAI's models has delivered incremental improvements in factual accuracy, with ...
Through the Pioneers Program, OpenAI hopes to create benchmarks for specific domains like legal, finance, insurance, healthcare, and accounting. The lab says that, in the coming months, it’ll work ...
OpenAI claims the full GPT-4.1 model outperforms its GPT-4o and GPT-4o mini models on coding benchmarks, including SWE-bench.
The reasoning systems are based on a technology called large language models, or L.L.M.s. To build reasoning systems, ...
On Wednesday, OpenAI announced the release of two new models—o3 and o4-mini—that combine simulated reasoning capabilities ...
Benchmark performance results typically accompany the launch of every new AI model to showcase how well the models can ...
OpenAI released its newest AI model and said it can understand uploaded images like whiteboards, sketches and diagrams, even ...
OpenAI launches groundbreaking o3 and o4-mini AI models that can manipulate and reason with images, representing a major ...
OpenAI thinks AI benchmarks are broken. Now the company is launching a program to fix how AI models are scored. The new OpenAI Pioneers Program will focus on creating evaluations for AI models ...
CEO Sam Altman joined the banter this week, writing in ... most rigorous safety program to date" and linked to its "Preparedness framework" updated earlier this week. OpenAI has come under fire ...