Not All Tokens Are Equal
How cheap energy and cheap intelligence will reshape the physical economy.
Cheap energy created the industrial economy. Cheap intelligence will create something far bigger. But intelligence is not magic. It has to be produced somewhere, by machines that consume power, capital, cooling, land, chips, networks, and human coordination.
For the last decade, AI has mostly been discussed as software. That era is ending. Intelligence is becoming an industrial output.
The basic unit of that output is the token: the fragment of text, code, image, sound, action, or reasoning step that an AI system consumes or produces. Tokens are often priced like a cloud API, but they behave more like manufactured goods. They are made in physical places, under physical constraints, with very different economics depending on what they are used for.
The mistake is to talk about "AI compute" as one market. Not all tokens are equal. A background summarization token, a real-time voice token, a frontier training token, and an autonomous-agent token do not have the same value, latency requirement, power profile, or geography.
A five-part map of the token economy
The old split, training versus inference, is useful but incomplete. Training explains how models are created. Inference explains how they are used. But the next phase of AI infrastructure will be shaped by workload economics: latency, value density, schedulability, locality, reliability, and energy cost.
A better map has five layers.
| Token workload | What it does | Primary constraint | Likely location |
|---|---|---|---|
| Capability creation | Pretraining, post-training, evals, synthetic data | Scale, networking, power, secure supply | Gigawatt-scale hyperscale campuses |
| Commodity batch | Summaries, translation, embeddings, offline processing | Lowest cost per useful token | Cheap-power regions, off-peak capacity |
| Premium interactive | Chat, coding, AI search, voice, support | Latency, uptime, perceived quality | Metro and regional inference clusters |
| Agentic outcome | Research agents, coding agents, workflow automation | Reliability, orchestration, memory, tool use | Distributed always-on inference fleets |
| Local / autonomous | Phones, vehicles, robots, homes, factories, consoles | Privacy, autonomy, bandwidth, milliseconds | Edge devices and local micro-infrastructure |
Capability-creation tokens are the least like a normal commodity. They are inputs into model progress. They require vast clusters, elite engineering teams, advanced networking, and enormous power. These workloads will concentrate in strategic locations: the United States, allied markets with abundant power, the Gulf states, Canada, the Nordics, India, and China, which is building its own parallel domestic AI stack.
Commodity batch tokens are different. They can wait. They can run overnight. They can move to wherever electrons, cooling, and interconnection are cheapest. This is where AI starts to look like a flexible industrial load. If a task can be delayed by minutes or hours, it can chase curtailed renewables, hydro, nuclear, cheap gas, or underutilized capacity.
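To make "chasing cheap electrons" concrete, here is a minimal sketch in Python. It assumes a batch job needs a fixed number of contiguous hours, must finish before a deadline, and can see an hourly price forecast. The function name `cheapest_window` and the price list are hypothetical, invented for illustration; a real scheduler would also weigh capacity, data movement, and grid conditions.

```python
from typing import Sequence, Tuple


def cheapest_window(prices: Sequence[float], duration_h: int, deadline_h: int) -> Tuple[int, float]:
    """Return (start_hour, total_price) of the cheapest contiguous window.

    prices      -- hypothetical hourly power prices, index 0 = the coming hour
    duration_h  -- contiguous hours the batch job needs
    deadline_h  -- the job must finish before this hour index
    """
    horizon = min(deadline_h, len(prices))
    if duration_h <= 0 or duration_h > horizon:
        raise ValueError("job does not fit before the deadline")

    best_start = 0
    window_cost = sum(prices[:duration_h])
    best_cost = window_cost
    # Slide the window forward one hour at a time, updating the running sum.
    for start in range(1, horizon - duration_h + 1):
        window_cost += prices[start + duration_h - 1] - prices[start - 1]
        if window_cost < best_cost:
            best_start, best_cost = start, window_cost
    return best_start, best_cost


# Illustrative prices in $/MWh: made up for the example, not market data.
hourly_prices = [80, 75, 40, 20, 15, 25, 60, 90]
start, cost = cheapest_window(hourly_prices, duration_h=3, deadline_h=8)
print(f"run from hour {start}, summed price {cost}")
```

With these made-up prices the job lands in hours 3 through 5, the overnight trough. Tighten the deadline to hour 5 and it is forced into a more expensive window, which is the whole difference between a token that can wait and one that cannot.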
Premium interactive tokens are the opposite. When a human is waiting for an answer, milliseconds matter. AI search, voice assistants, coding copilots, customer support, and real-time collaboration need regional infrastructure near users. These tokens may cost more, but they are also worth more because they shape user experience.
Agentic outcome tokens may become the largest and strangest market. A chatbot answers a prompt. An agent performs a job. It plans, searches, calls tools, retries, verifies, stores memory, asks for help, and reports back. One user request can trigger thousands of model calls. In this market the unit of value is not the token. It is the completed task.
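A rough way to see why the completed task, not the token, is the unit of value: price an agent run by what it costs to get one success. The sketch below is a simplification under stated assumptions: every attempt uses the same number of calls and tokens, failed attempts cost as much as successful ones, and attempts succeed independently, so the expected number of attempts per success is one divided by the success rate. All names and numbers are hypothetical.

```python
def cost_per_completed_task(calls_per_attempt: int,
                            tokens_per_call: int,
                            price_per_million_tokens: float,
                            success_rate: float) -> float:
    """Expected spend to obtain one successful task completion."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    tokens_per_attempt = calls_per_attempt * tokens_per_call
    cost_per_attempt = tokens_per_attempt * price_per_million_tokens / 1_000_000
    # Assumption: independent attempts, so expected attempts = 1 / success_rate.
    return cost_per_attempt / success_rate


# Two invented models running the same 2,000-call agent workflow.
reliable = cost_per_completed_task(2000, 1500, price_per_million_tokens=2.0, success_rate=0.75)
cheap = cost_per_completed_task(2000, 1500, price_per_million_tokens=1.0, success_rate=0.25)
print(f"reliable model: ${reliable:.2f} per completed task")
print(f"cheap model:    ${cheap:.2f} per completed task")
```

With these invented numbers, the model at half the token price costs more per completed task (12.00 versus 8.00), because it fails three times out of four. The sketch ignores verification, human review, and tool costs, all of which would widen the gap between token price and outcome price.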
Local and autonomous tokens will grow because not everything should travel to a cloud data center. Phones, PCs, vehicles, drones, robots, hospitals, factories, gaming consoles, and homes will all generate local intelligence. Some of it will be small and cheap. Some of it will be mission-critical. A vehicle cannot wait for the cloud to decide whether to brake. A home energy system should not need a distant GPU cluster to decide when to charge a battery.
The council lens
Three frames help keep the analysis honest.
Jensen Huang's "AI factory" metaphor is the right starting point. The modern AI data center is not a generic server farm. It is a factory that turns electricity, data, chips, cooling, and software into intelligence. The output is not steel or chemicals. It is tokens.
Ben Thompson's key insight is that inference changes the business model. Training created the frontier models, but inference is where users, workflows, platforms, and profits show up. The companies that own demand, distribution, operating systems, developer interfaces, and workflow integration may capture more value than the companies that merely produce raw tokens.
Jigar Shah's lens keeps the story grounded. Power is not an abstraction. The bottlenecks are interconnection queues, financing, substations, transformers, permitting, transmission, firm capacity, and local politics. AI data centers can either stress the grid or help finance a better one. The difference depends on whether AI load becomes flexible, additional, and aligned with new generation.
Why geography changes by token type
Cheap electricity matters, but it does not explain everything.
The cheapest token is not always the most valuable token. A token that arrives too late, violates data residency, fails in a workflow, or cannot be trusted may be worthless. The valuable token is delivered at the right latency, reliability, capability, compliance level, and price.
That means AI infrastructure will not become one giant centralized cloud. It will become a portfolio.
| Layer | Looks like | Drives location |
|---|---|---|
| Frontier training | Strategic industrial assets, like aluminum smelters, chip fabs, or military bases | Power, land, networking, security, capital, political support |
| Batch inference | Commodity manufacturing chasing cheap electrons and high utilization | Curtailed renewables, off-peak capacity, low-cost interconnection |
| Interactive inference | Follows people. Metro and regional clusters near users | Response time, data residency, network density |
| Agentic compute | Follows workflows. Distributed fleets with memory and tool access | Reliability, orchestration, identity, observability |
| Edge intelligence | Follows machines. Phones, cars, factories, homes, consoles | Privacy, autonomy, bandwidth, milliseconds |
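One way to read the table is as a placement rule: a site only counts if it satisfies the request's latency and data-residency constraints, and price decides among whatever is left. The sketch below illustrates that reading. The `Site` and `Request` types, the site names, regions, latencies, and prices are all hypothetical, and round-trip time is assumed to be measured from the requesting user.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Site:
    name: str
    region: str
    round_trip_ms: float            # assumed latency from the requesting user
    cost_per_million_tokens: float


@dataclass(frozen=True)
class Request:
    max_round_trip_ms: float
    allowed_regions: Optional[frozenset] = None   # None means no residency rule


def place(request: Request, sites: Sequence[Site]) -> Optional[Site]:
    """Cheapest site that meets the latency and residency constraints, if any."""
    eligible = [
        s for s in sites
        if s.round_trip_ms <= request.max_round_trip_ms
        and (request.allowed_regions is None or s.region in request.allowed_regions)
    ]
    return min(eligible, key=lambda s: s.cost_per_million_tokens, default=None)


# Invented sites for illustration only.
SITES = [
    Site("remote-hydro", "CA", 120.0, 0.40),
    Site("regional-eu", "EU", 35.0, 0.90),
    Site("metro-tokyo", "JP", 15.0, 1.20),
]

batch = place(Request(max_round_trip_ms=60_000), SITES)
voice = place(Request(max_round_trip_ms=50, allowed_regions=frozenset({"JP"})), SITES)
print(batch.name if batch else "no eligible site")
print(voice.name if voice else "no eligible site")
```

In this toy setup the batch request goes to the cheapest site, while the voice request pays three times as much per token because only the metro site is both fast enough and in the right jurisdiction. A request no site can satisfy returns `None`: at any price, that token is worthless.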
Frontier training clusters will look like strategic industrial assets. They will locate where power, land, networking, security, capital, and political support line up. These campuses may increasingly resemble aluminum smelters, chip fabs, or military bases: large, energy-intensive, and nationally important.
Batch inference will look more like commodity manufacturing. It will chase cheap electrons and high utilization. If inference costs keep falling, more workloads will become economically viable, creating the classic rebound effect: each token may take less energy, yet total demand, and with it total energy use, can still rise.
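The rebound arithmetic can be sketched under one strong assumption: token demand has a constant price elasticity, and token prices fall in step with energy per token. Neither is established here; the function and the elasticity values are hypothetical and only show how the sign of the outcome flips.

```python
def total_energy_ratio(unit_energy_ratio: float, demand_elasticity: float) -> float:
    """New total energy divided by old total energy.

    unit_energy_ratio  -- new energy per token / old energy per token (0.5 = halved)
    demand_elasticity  -- assumed constant price elasticity of token demand (positive)
    Assumption: token price moves in proportion to energy per token.
    """
    if unit_energy_ratio <= 0:
        raise ValueError("unit_energy_ratio must be positive")
    # Constant-elasticity demand: quantity scales with price ** (-elasticity).
    demand_ratio = unit_energy_ratio ** (-demand_elasticity)
    return unit_energy_ratio * demand_ratio


# Energy per token halves; try three assumed elasticities.
for elasticity in (0.5, 1.0, 2.0):
    ratio = total_energy_ratio(0.5, elasticity)
    print(f"elasticity {elasticity}: total energy x{ratio:.2f}")
```

Under these assumptions, halving energy per token cuts total energy only when elasticity is below one (about 0.71 times at 0.5). At an elasticity of one, total energy is unchanged. At two, total energy doubles even though each token is twice as efficient.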
Interactive inference will follow people. The constraint is not just power. It is response time. AI search in Tokyo, voice agents in Los Angeles, coding copilots in Denver, and medical AI in London need infrastructure close enough to feel instant and reliable.
Agentic compute will follow workflows. Some agents will need low latency. Others can run in the background. The infrastructure stack will include not just GPUs, but memory, permissions, identity, observability, rollback, data access, and tool execution. This is where token pricing may become less important than outcome pricing.
Edge intelligence will follow machines. The phone becomes an inference device. The car becomes an autonomous compute node. The factory becomes a local AI system. The gaming console becomes a high-performance consumer AI box. A SPAN-style home, with solar, battery, EV, smart panel, sensors, and fiber, starts to look like a tiny energy-aware intelligence node.
Most homes will not train frontier models. But they may run personal assistants, privacy-sensitive inference, energy optimization, security, appliance coordination, home robotics, and local caching. The home becomes part of the intelligence grid.
The weird frontier: orbit
Space data centers are not the near-term answer. Launch cost, maintenance, radiation, heat rejection, and data transfer are severe constraints. But orbital compute is useful as a thought experiment because it reveals the real bottleneck: power.
If terrestrial electricity becomes the limiting reagent for frontier AI, people will consider increasingly strange compute geographies. Orbit offers near-continuous solar energy in the right orbits, plus physical isolation, but the economics are far from proven.
The bigger idea
The 21st century will be defined by the race to produce abundant energy and abundant intelligence.
Those two races are now becoming one. AI needs energy. Energy systems need intelligence. Data centers are becoming grid actors. Homes are becoming controllable energy nodes. Vehicles are becoming batteries, sensors, and computers. Agents are turning software from a tool into labor. Robotics will move intelligence into the physical world.
This is why the token economy matters. Tokens are not merely software events. They are the first measurable units of cheap intelligence.
The next infrastructure winners may not simply be the companies with the biggest models or the most GPUs. They may be the companies that best integrate compute, energy, networking, inference orchestration, capital, and physical deployment.