AI moves fast. Here's what we're reading this week.
DeepSeek-V3 and Qwen-3-Math are now within 4 percentage points of GPT-4 on MATH-500. The gap on coding tasks remains wider, but the trajectory has analysts revising 2026 forecasts.
OpenAI's browser-using agent product hit 53% on a curated set of real-world tasks in independent testing. The remaining failures cluster around payment forms and CAPTCHAs, and both, notably, are by design.