The AI Productivity Paradox
Every executive deck quotes a single headline number for AI's productivity benefit — usually something round and positive. The actual research looks nothing like that. Across 7 peer-reviewed studies from 2025–2026, measured AI productivity effects range from −19% to +200% — depending almost entirely on occupation, AI tool, and worker experience. The question isn't “does AI make you more productive?” It's “under what conditions?”
Key finding: The most cited result — GitHub Copilot giving developers a +26% productivity boost — sits roughly in the middle of the distribution. It isn't wrong, but it isn't the whole picture either. Experienced open-source developers got slower with AI. Customer support agents got moderately faster. Accountants and marketers saw the biggest gains. The one consistent pattern: the effect size depends more on the task structure than on the AI model.
Measured AI productivity effects by study
[Chart: % change per study]
The same technology registers as a 19% slowdown for experienced open-source developers and a tripling of output for authors. In the interactive version, each study is annotated with the metric, the AI tool used, and which cohort benefited most.
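Because the interactive chart isn't reproduced here, the minimal sketch below tabulates the headline figures quoted in this article and checks the two opening claims: the −19% to +200% range, and Copilot's +26% landing near the middle. Only METR, Brynjolfsson et al., and Reimers & Waldfogel are tied to their numbers in the text; the remaining labels, and the use of the "+55% zone" figure as a seventh study value, are our assumptions.

```python
# Headline AI productivity effects (% change) quoted in this article.
# Attributions beyond METR, Brynjolfsson et al., and Reimers & Waldfogel
# are inferred from context; the +55 entry reuses the "+55% zone" figure
# mentioned later in the article and is an assumption, not a sourced value.
from statistics import median

effects_pct = {
    "Experienced open-source developers (METR)": -19,
    "AI coding assistance for library-learning": 0,
    "Customer support agents (Brynjolfsson et al., simplified)": 14,
    "GitHub Copilot developers": 26,
    "AI for ad creation": 50,
    "Structured-work zone cited in the text": 55,
    "Authors, releases tripled (Reimers & Waldfogel)": 200,
}

values = sorted(effects_pct.values())
print(f"range:  {values[0]:+d}% to {values[-1]:+d}%")  # -19% to +200%
print(f"median: {median(values):+.0f}%")               # +26%: Copilot sits mid-distribution
```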
Why the range is so wide
- Task structure: Gains are strongest when work can be divided into well-defined, repeatable tasks with clear quality monitoring. Judgment-heavy work shows weaker or negative effects.
- Worker experience: Three studies found the largest gains for junior and less-experienced workers. Senior workers sometimes slowed down; their existing shortcuts were faster than AI-assisted alternatives.
- Task-tool match: AI coding assistance for library-learning showed zero net benefit, while AI for ad creation showed 50% gains. Same underlying models, different task-tool matches. The toy lookup after this list sketches how these dimensions line up.
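Read together, the three dimensions form a rough grid. The lookup below is purely illustrative; the cell values are our reading of the studies quoted in this article, not a published model, and the cells the studies didn't cover really are unmeasured.

```python
# Toy lookup: effect bands from the studies in this article, indexed by
# (task structure, worker cohort). Pairings are illustrative readings of
# the text above, not results from any single paper.
observed = {
    ("structured, clear quality signals", "junior"): "largest gains (up to +50%/+200%)",
    ("structured, clear quality signals", "senior"): "moderate gains (+14% to +26%)",
    ("judgment-heavy, context-rich", "senior"): "flat to negative (0% to -19%)",
}

def expected_effect(task: str, cohort: str) -> str:
    """Return the rough effect band the studies suggest, if one was measured."""
    return observed.get((task, cohort), "unmeasured combination")

print(expected_effect("judgment-heavy, context-rich", "senior"))  # flat to negative (0% to -19%)
```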
Why this matters for AI job risk
The productivity data reshapes how to read displacement risk. Two occupations with identical Capability Coverage Index scores can face very different labor-market pressure depending on which side of the productivity-paradox line they sit on. If your work is in a +55% productivity zone (structured, repeatable, clear quality signals), AI is a leverage multiplier for your current role, but the same structure that makes your work easy to accelerate also makes it the most substitutable. If your work is in a 0% or −19% zone (judgment-heavy, context-rich), AI is closer to a rounding error in your day.
Several of the most-quoted “AI will replace X” predictions rely on the +50% / +100% end of this distribution. Our scoring methodology weighs the whole range — productivity gains are evidence that AI can do the task, but they're not evidence that AI should do it alone. The Want vs. Get insight shows the other axis: where workers actually want that productivity lift.
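The article doesn't publish the scoring formula, so the following is a hypothetical sketch of what "weighing the whole range" could mean: take a robust central estimate of the measured effects and clamp it, so neither the +200% tail nor a single negative study dominates. The function name, the median choice, and the 0–1 clamp are all assumptions.

```python
# Hypothetical sketch only: one way to fold a *distribution* of measured
# productivity effects into a bounded evidence weight, instead of quoting
# the headline maximum. Not the site's published methodology.
from statistics import median

def productivity_evidence_weight(effects_pct: list[float]) -> float:
    """Map measured effects (% change) to a 0-1 weight via a clamped median."""
    center = median(effects_pct)              # robust to the +200% outlier
    return max(0.0, min(center / 100, 1.0))   # losses floor at 0, gains cap at 1

studies = [-19, 0, 14, 26, 50, 55, 200]
print(productivity_evidence_weight(studies))  # 0.26, vs. 2.0 if you took the raw max
```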
One finding worth flagging: METR's −19% result was published in 2025 and hasn't been replicated in their 2026 follow-up, largely because developers in 2026 were reluctant to work without AI at all, which made a no-AI control arm hard to staff. That reluctance is itself a signal of how much the baseline has shifted.
Sources
- Underlying studies: Becker et al. (2025) — Model Evaluation & Threat Research (METR); Shen & Tamkin (2025); Brynjolfsson, Li & Raymond (2025); Cui, Demirer, Jaffe, Musolff, Peng & Salz (2025); Ju & Aral (2025); Choi & Xie (2025); Reimers & Waldfogel (2026).
- The +200% value for Reimers & Waldfogel is the "releases tripled" measure (output volume): 3× the baseline is a +200% change, not a quality-adjusted productivity number. The paper notes quality held for pre-AI authors and fell for new entrants. The +14%–15% range for Brynjolfsson et al. is simplified in the chart as 14% for visual comparison.
- Macro-level productivity studies (e.g., Aldasoro et al. 2026 on 12,000 European firms; OECD projections; Penn Wharton TFP modeling) generally report smaller effect sizes (0.2–4%) and are not included in this chart.