There’s a study most people in business haven’t read. It should change how they think about every AI investment they’re making.
Published in Nature Human Behaviour in 2024, it’s a meta-analysis of 106 experimental studies of human-AI collaboration. Researchers at MIT examined what happens when you combine human judgment with AI capability across hundreds of tasks and contexts. The sample is large. The methodology is rigorous. And the finding is uncomfortable.
On average, combining humans and AI produced worse outcomes than the best of either working alone.
Not marginally worse. Measurably, statistically significantly worse. The combination, the very thing every vendor pitch deck promises will multiply your team’s capability, actually degraded performance compared to simply handing the task to whichever of the two was better at it.
Sit with that for a second. The default result of human-AI collaboration isn’t “better together.” It’s “worse together.”
Now, the average is only half the story. The variance across those 106 studies was enormous. Some human-AI teams dramatically outperformed. Others collapsed. The average masks what’s actually a bimodal distribution: a lot of teams getting poor results and a smaller number getting extraordinary ones.
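To make that concrete, here’s a toy simulation. The numbers are hypothetical, chosen only to illustrate the shape of the problem, not taken from the meta-analysis: a large group of teams landing below the best solo performer and a small group landing well above it can still average out to “worse together.”

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration only; these are not the meta-analysis's figures.
# Normalize the best solo performer (human OR AI alone) to 1.0.
solo_best = 1.0

# Most teams calibrate poorly and land below the solo baseline;
# a minority calibrate well and land far above it.
poorly_calibrated = rng.normal(0.85, 0.05, 800)
well_calibrated = rng.normal(1.30, 0.08, 200)
teams = np.concatenate([poorly_calibrated, well_calibrated])

print(f"mean team performance:   {teams.mean():.2f}")              # ~0.94, below solo_best
print(f"teams beating solo best: {(teams > solo_best).mean():.0%}") # ~20%
print(f"90th percentile:         {np.percentile(teams, 90):.2f}")   # well above 1.0
```

The average says collaboration hurts. The right tail says it can be the best option on the table. Both are true at once.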
That’s the finding that matters. Not that human-AI collaboration fails, but that it usually fails and sometimes succeeds spectacularly, and the difference has almost nothing to do with the AI.
The researchers identified the critical variable: whether the humans could accurately assess when AI was adding value and when it wasn’t. Teams that could calibrate — that knew when to lean on the AI and when to override it — produced outsized results. Teams that couldn’t calibrate either deferred to the AI when they shouldn’t have or rejected it when they shouldn’t have. Both produced the same outcome. Wasted potential.
This maps to something I see in every client engagement, though usually with marketing tools rather than AI specifically. The company buys the platform. The team gets trained. And then one of two things happens: they either trust it completely and stop thinking, or they fight it on every point and never let it work. Both responses feel rational in the moment. Both produce the same result — the tool’s potential, unused.
The AI version of this pattern is playing out at scale right now. A survey of enterprise AI adoption from BCG found that usage is up but impact is not. Organizations report widespread AI adoption and disappointing returns. Employees say they’re more productive. The operational metrics say otherwise.
Another study tracked over 10,000 software developers across more than a thousand teams. Developers using AI coding tools completed 21% more tasks. They also introduced 9% more bugs per person and produced pull requests 154% larger. Review time increased 91%. The perception: “I’m shipping faster.” The measurement: more volume, worse quality, shifted bottleneck.
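A bit of back-of-the-envelope arithmetic, using only the percentage changes quoted above (the baselines are hypothetical, and I’m assuming the 91% figure refers to total review time per developer), shows where the bottleneck goes:

```python
# Relative changes quoted above; 1.00 = the pre-AI baseline.
tasks_completed = 1.21   # +21% tasks per developer
bugs_per_dev = 1.09      # +9% bugs per developer
pr_size = 2.54           # +154% pull request size
review_time = 1.91       # +91% review time (assumed: total per developer)

# Normalizing per task shows the load moving downstream to reviewers.
review_per_task = review_time / tasks_completed
print(f"review load per task: {review_per_task:.2f}x")  # ~1.58x the baseline
```

Throughput rises 21%, but every unit of that throughput now costs roughly 58% more review effort than it used to. The work didn’t shrink. It moved.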
The pattern is consistent. People believe AI is helping. The numbers say it’s more complicated than that.
None of this means AI is useless. It means the popular narrative — “just adopt AI and productivity goes up” — is wrong as a generalization. For some people, in some contexts, with a specific kind of engagement, AI produces genuine value. For most, it produces the feeling of value without the substance.
The difference isn’t the tool. The difference is what the human brings to the collaboration and whether they can accurately perceive what’s working and what isn’t.
That second part — accurate perception — turns out to be the harder problem. A study of 16 experienced software developers found they were 19% slower with AI tools while estimating they were 20% faster. A 39-percentage-point gap between perception and reality. And a separate study found that higher AI literacy correlated with worse self-assessment accuracy. The more people knew about AI, the worse they were at judging whether it was helping them.
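The arithmetic behind that gap is worth spelling out. Here’s a sketch with a hypothetical 100-minute task, reading “20% faster” as 20% less time:

```python
baseline_min = 100.0                  # hypothetical task: 100 min without AI
actual_min = baseline_min * 1.19      # measured: 19% slower with AI
perceived_min = baseline_min * 0.80   # self-report: "20% faster"
                                      # (assumed to mean 20% less time)

actual_speedup = 1 - actual_min / baseline_min        # -0.19
perceived_speedup = 1 - perceived_min / baseline_min  # +0.20
gap = (perceived_speedup - actual_speedup) * 100
print(f"perception-reality gap: {gap:.0f} percentage points")  # 39
```

In felt time, the task takes about 80 minutes. On the clock, it takes 119.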
You can’t fix a problem you can’t feel. That’s what makes this finding dangerous for organizations investing heavily in AI without measuring what’s actually happening in the collaboration.
The question isn’t whether your team should use AI. It’s whether anyone has honestly measured what happens when they do.