How an Independent Benchmark Team Turned 4-of-40 Models Passing Hard QA into a Majority Win by March 2026

https://milosinsightfulthoughtss.wpsuo.com/openai-s-cjr-benchmark-findings-what-the-data-actually-shows-about-news-source-hallucination-and-journalism-ai-accuracy

How an independent benchmarking lab discovered only 4 of 40 models beat coin flip on "hard" questions

In late 2025, an independent benchmarking group (OpenBench Labs) published a reproducible evaluation showing that, on a 1,000-item "hard