DeepMind's new AI Policy paper quietly inverts a decade of framing about what science is missing. The hardware-talent-compute narrative was always the easier story. The actual bottleneck sits in institutional permafrost. Forty years of fusion data from JET, functionally inaccessible. Only ~10 fusion facilities worldwide. Validated training sets in the hundreds or low thousands of experimental shots. Releasing JET's archives requires consent from a stakeholder list nobody has bothered to assemble in twenty years. Simulation codes still run for weeks on supercomputers.
While the public science estate freezes at the data layer, $2.5B+ has flowed to 30+ private fusion companies in the last cycle. Capital moves. Data does not.
The paper names three debts. First, technical. Infrastructure for data collection and curation has been underfunded relative to hardware for decades. Second, bureaucratic. Complex ownership webs and divergent open-data policies stall every release. Third, human. Too few software engineers and data scientists exist to clean, validate, and curate what already exists. Postwar Big Science was built for an era when instruments and PhDs were the rate-limiting step. The AI era inverts the constraint. The new bottleneck is access to the exhaust of past instruments and the labor to make it legible.
This is Hayek's knowledge problem rewritten for physics. The information is decentralized and tacit; it lives in facility-specific formats and sits behind consent regimes designed for analog publication. The system that produced the data cannot publish it without negotiating with itself.
The proposal is deliberately modest. Run "AI Data Stocktakes" per domain: expert audits that map the gaps and convert them into fundable projects. Open-source 30% of JET data by 2028. Fund disruption-prediction competitions. Build a scientific data curation platform. Use AI agents to preserve the expert knowledge encoded in legacy simulation codes.
Two readings collide here. The Austrian reading sees an inventory exercise to make existing capital stock legible to a new productive technology. The accelerationist reading sees a slow public sector negotiating with a private compute frontier that has already absorbed every available token. The friction between those two readings will eat most of the next decade's science policy.
The ITER reactor and the frontier training cluster are not the same machine. Each pretends the other does not exist. The stocktake is what happens when the pretense breaks.
https://www.aipolicyperspectives.com/p/science-needs-ai-data-stocktakes