The AI deployment playbook at most companies follows a recognisable pattern. Identify a task performed by a human. Build or buy an AI system that performs it cheaper. Measure the cost reduction. Report the productivity gain. Repeat.
This logic is coherent. It is also self-defeating, because the thing that makes AI useful in complex domains is precisely the human expertise it is being deployed to replace.
I wrote previously about what happened when Matthew Schwartz, a theoretical physicist at Harvard, used Claude to produce a contribution to quantum field theory. The paper was real physics. But Schwartz did not hand the AI a prompt and walk away. He directed every step, evaluated every output, and caught errors that would have been invisible to anyone without decades of experience in precision QFT calculations. The AI amplified his expertise. It did not substitute for it.
The research since then has only sharpened the point. The Harvard/BCG study on consultants found that AI compressed the performance gap between top and bottom performers. The Stanford/MIT study on customer support agents found the biggest gains went to novices. Both results are real and both are commonly misread. Closing the gap between novice and expert output is not the same as making the novice into an expert. It is making the novice's output look like an expert's. The difference matters when the task requires judgment, and most tasks that matter do require judgment.
The IE Business School analysis identified a "danger zone" where novices use AI for judgment-based tasks they cannot verify. A Cornell study of scientific publishing found that researchers using AI produced dramatically more papers, but those papers were less likely to survive peer review. Good writing was masking weak thinking. This is what happens when you optimise for output without investing in the capacity to evaluate it.
The self-defeating loop
Here is the problem stated plainly. AI-human collaboration produces the highest value when the human brings deep domain expertise. The current deployment model systematically erodes the pipeline that produces domain expertise. The industry is undermining the input it depends on.
This is not a hypothetical future risk. The OECD has already flagged that heavy AI reliance is associated with reduced independent thinking and weakened long-term skill retention. The METR study found that experienced open-source developers actually got 19% slower when using AI tools, partly because the AI lacked the project-specific context the developers had built over years. That context did not appear spontaneously. It came from doing the work. From making mistakes, encountering edge cases, and building mental models that no documentation fully captures.
If you automate away the process through which people build that context, you do not get a more efficient system. You get a system that looks efficient until the moment it encounters something the AI cannot handle, at which point you discover that nobody in the room has the judgment to intervene. The World Economic Forum estimates 120 million workers are at medium-term risk of redundancy because they are unlikely to receive the reskilling they actually need. The investment is flowing toward replacing people, not toward making them more capable.
The reframe
The alternative is not to slow AI deployment. It is to point it at a different target. Instead of treating AI as a replacement for human knowledge, treat it as learning infrastructure. Instead of optimising for the automation of expertise, optimise for the acceleration of expertise.
This is not a soft argument. The evidence for it is concrete.
A randomised controlled trial at Harvard, published in Scientific Reports in 2025, tested an AI tutoring system against active in-class learning for undergraduate physics. Students using the AI tutor learned more than twice as much in less time. The effect size was between 0.73 and 1.3 standard deviations, which in educational research is enormous. The AI tutor was not doing the physics for the students. It was adapting in real time to where each student was struggling, providing immediate feedback, and adjusting the difficulty of problems based on performance. It was compressing the feedback loop that normally takes weeks (submit assignment, wait for grading, review corrections) into minutes.
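The mechanism is simple enough to sketch. Below is a minimal version of that adaptive loop in Python, assuming mastery is estimated from a rolling window of recent answers; the class name, window size, and thresholds are my own illustration, not the Harvard system's internals.

```python
# A minimal sketch of an adaptive practice loop: estimate mastery from
# recent answers and move the difficulty tier up or down accordingly.
# Window size and thresholds are illustrative, not taken from the study.

from collections import deque

class AdaptivePractice:
    """Serve problems at a difficulty tier that tracks recent performance."""

    def __init__(self, window: int = 5):
        self.recent = deque(maxlen=window)  # last few answers, True = correct
        self.difficulty = 1                 # start at the easiest tier

    def record(self, correct: bool) -> int:
        """Log one answer, then return the tier for the next problem."""
        self.recent.append(correct)
        if len(self.recent) < self.recent.maxlen:
            return self.difficulty          # not enough signal yet
        accuracy = sum(self.recent) / len(self.recent)
        if accuracy >= 0.8:                 # consistently right: raise the bar
            self.difficulty += 1
            self.recent.clear()             # re-measure at the new tier
        elif accuracy <= 0.4:               # consistently struggling: ease off
            self.difficulty = max(1, self.difficulty - 1)
            self.recent.clear()
        return self.difficulty
```

Every answer gets immediate feedback, and the next problem is pitched at the updated tier. That is the compressed feedback cycle the paragraph above describes, running in minutes rather than weeks.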
This is the same principle that made flight simulators transformative for aviation training and later for medical education. Simulation-based training consistently outperforms traditional apprenticeship models because it creates conditions for deliberate practice: repeated exposure to specific scenarios, immediate corrective feedback, progressive difficulty. The research on deliberate practice, going back to Ericsson's work, is clear that expertise develops through structured repetition with targeted feedback, not through passive exposure or time served. AI can generate those conditions at a scale and specificity that human instruction cannot match.
Consider what this looks like applied to the kinds of domains where AI-human collaboration matters most. A junior analyst learning to evaluate company financials could be presented with hundreds of realistic scenarios, each calibrated to her current skill level, with AI-generated feedback explaining not just what the right answer is but why her reasoning went wrong at a specific step. A medical resident could encounter rare diagnostic presentations that she might otherwise not see for years, with the AI acting as a patient simulator that responds dynamically to her clinical decisions. A junior engineer could debug progressively more complex systems, with the AI introducing failure modes that compress five years of on-call experience into five months of structured practice.
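To make the pattern concrete, here is one way such a scenario engine might choose what to serve next, again as a hedged sketch: the scenario list, the difficulty field, and the base_rate field are hypothetical placeholders, and a real system would estimate skill far more carefully.

```python
# Hypothetical sketch of scenario selection: drill mostly near the learner's
# current level, but oversample rare cases so edge-case exposure does not
# have to wait on chance encounters.

import random

def next_scenario(scenarios, skill_level, stretch_rate=0.2):
    """Pick a practice scenario: usually near skill_level, sometimes a stretch case."""
    if random.random() < stretch_rate:
        pool = [s for s in scenarios if s["difficulty"] > skill_level] or scenarios
    else:
        pool = [s for s in scenarios
                if abs(s["difficulty"] - skill_level) <= 1] or scenarios
    # Weight inversely by real-world frequency: the rarer the case, the more
    # often it shows up in practice, compressing years of exposure into months.
    weights = [1.0 / max(s["base_rate"], 1e-6) for s in pool]
    return random.choices(pool, weights=weights, k=1)[0]
```

The inverse-frequency weighting is the interesting design choice: it is what lets a resident meet in a month the rare presentations that a natural caseload would spread across a decade.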
None of this is speculative. The component technologies exist. The Harvard study demonstrates the learning gains. Medical simulation research demonstrates the skill transfer. What is missing is the strategic intent. Almost nobody is building this because the incentive structure points elsewhere.
The incentive problem
The reason companies optimise for replacement over learning is simple. Replacement is easy to measure. You had five people doing this job. Now you have one person and an AI. The cost saving shows up in next quarter's income statement. The ROI calculation fits on a slide.
Learning is harder to measure. You invested in an AI-powered training system. Your junior employees developed expertise 40% faster. The value of that shows up in better decisions three years from now, in reduced error rates that are difficult to attribute, in institutional knowledge that prevents failures you never see because they were averted by someone who knew what to look for. The CFO does not get excited about costs you did not incur.
This is the classic problem of preventive investment, and it is compounded by time horizons. The average tenure of a CEO at a large company is around five years. The payoff from accelerating expertise development operates on a longer cycle than most executives will be around to claim credit for. The payoff from cutting headcount is immediate and legible.
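A back-of-envelope discounting exercise shows how the incentives tilt. The cashflow figures below are entirely invented for illustration; the arithmetic is only there to show that front-loaded savings dominate back-loaded gains over a short executive horizon, even when the gains are larger in total.

```python
# Invented numbers, real arithmetic: front-loaded savings beat back-loaded
# gains inside a short horizon, even when the gains are larger overall.

def npv(cashflows, rate=0.10):
    """Net present value of annual cashflows, year 0 first."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

# Replacement: immediate savings that decay as the judgment gap bites.
replacement = [500_000, 500_000, 400_000, 200_000, 0]

# Learning: training costs up front, compounding returns later.
learning = [-300_000, -100_000, 200_000, 600_000, 1_000_000]

for horizon in (3, 5):
    print(f"{horizon}-year NPV  replacement: {npv(replacement[:horizon]):>12,.0f}"
          f"   learning: {npv(learning[:horizon]):>12,.0f}")
```

Within a five-year tenure, replacement wins on paper even as the learning curve is about to overtake it. That is the incentive problem in a single print statement.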
But the economic case for the learning model is stronger than it appears, if you are willing to think past the quarterly report. The World Economic Forum projects that AI augmentation could add between $4.8 trillion and $6.6 trillion to the US economy by 2034, but that number is contingent on broad augmentation adoption, which is contingent on having a workforce capable of working alongside AI effectively. Workers with AI skills already command wage premiums of up to 56%. The value is in the human capability, not in the AI alone. Companies that hollow out their expertise base to save money now are spending down an asset whose replacement cost they have not priced.
Two models
The "automate the human out" model treats expertise as a cost centre. It works for routine, codifiable tasks where the human contribution is mechanical. It fails for anything that requires judgment, because judgment is the one thing AI cannot reliably supply and cannot be reliably verified without domain knowledge.
The "build a better human" model treats expertise as an asset to be grown. It uses AI to compress the learning curve: faster feedback, more varied practice, exposure to edge cases that would otherwise take years to encounter naturally. The output is not a cheaper workforce. It is a more capable one, which then extracts more value from AI tools because it has the judgment to direct them.
Schwartz did not become a physicist who could supervise AI by using AI. He became one by spending decades doing hard physics. The question for the next generation is whether AI can compress that journey without hollowing it out. Whether you can use simulation, adaptive feedback, and structured deliberate practice to produce in five years the judgment that currently takes fifteen.
The aviation industry answered a version of this question decades ago. Flight simulators did not replace the need for pilots to develop expertise. They accelerated it. A pilot in a simulator encounters more emergency scenarios in a month than she might see in a career of real flights. The simulator does not skip the learning. It concentrates it.
AI can do the same thing for cognitive expertise, if we build it to. The technology is ready. The research supports it. The obstacle is strategic. The industry is optimising for the wrong metric because the right metric is harder to measure and slower to pay off.
That will not change until someone demonstrates, at scale, that accelerating expertise produces more durable value than replacing it. The underlying dynamic is clear: the car amplifies the driver, and if you stop training drivers, the cars become expensive ornaments. The question is whether we figure this out before or after we have thinned the pipeline to the point where the damage is difficult to reverse.