Below the fold I look at the same problem unfolding in the heart of the AI bubble.
Six months after we started Nvidia we knew of over thirty other startups trying to build graphics chips for the PC. A major reason that only ATI survived was that, after the second chip, Nvidia released a better chip every six months like clockwork. I explained the technology that made, and still makes, this possible in The Dawn Of Nvidia's Technology, but equally important in the long term was that the idea of a fixed cadence got built into the company's culture, and it now applies not just to graphics chips but also to AI chips:
Last year Nvidia, ..., said it would unveil a fresh AI chip every year rather than every couple of years. In March its boss, Jensen Huang, remarked that “when Blackwell starts shipping in volume, you couldn’t give Hoppers away,” referring to Nvidia’s latest chips and their predecessors, respectively.

The systems are estimated to be more than half the capex for a new data center. Much of its opex is power. Just as with mining rigs, the key feature of each successive generation of AI chips is that it is more efficient at using power. That doesn't mean they use less power; they use more, but less per operation. The need for enhanced power distribution and the concomitant cooling is what has prevented new AI systems from being installed in legacy data centers. Presumably the next few generations will be compatible with current state-of-the-art data center infrastructure, so they can directly replace their predecessors and thereby reduce costs.
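To make the distinction concrete, here is a minimal sketch with purely hypothetical wattage and throughput numbers (they are not Hopper or Blackwell specs) showing how total board power can rise while energy per operation falls:

```python
# Hypothetical numbers only: each successive generation draws more total power
# but does more work per joule, so the energy cost per operation falls.
gens = {
    "gen N":     (700,  2000),   # (board power in watts, throughput in Tera-ops/second)
    "gen N + 1": (1000, 5000),
}

for name, (watts, teraops_per_s) in gens.items():
    joules_per_teraop = watts / teraops_per_s
    print(f"{name}: {watts} W total, {joules_per_teraop:.2f} J per Tera-op")
# gen N:     700 W total, 0.35 J per Tera-op
# gen N + 1: 1000 W total, 0.20 J per Tera-op
```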
It is often assumed that the majority of the opex of AI platforms goes into training, which is a once per generation cost that can be amortized over the huge number of inference operations needed to answer prompts. But as James O'Donnell and Casey Crownhart report in We did the math on AI’s energy footprint. Here’s the story you haven’t heard.:
As conversations with experts and AI companies made clear, inference, not training, represents an increasing majority of AI’s energy demands and will continue to do so in the near future. It’s now estimated that 80–90% of computing power for AI is used for inference.

So 80-90% of the opex is per-query, which makes improving system efficiency critical to reducing the torrent of cash these platforms are hemorrhaging. Ed Zitron estimated that:
To be abundantly clear, as it stands, OpenAI currently spends $2.35 to make $1.

This makes Jensen's quip somewhat credible.
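To see why inference rather than training ends up dominating, here is a rough amortization sketch; the training cost, query count and per-query cost are made-up round numbers, not any platform's actual figures:

```python
# Made-up round numbers, not any platform's actual costs.
training_cost = 1e9              # one-time cost to train a model generation ($)
queries_served = 2e11            # queries answered over that generation's life
inference_cost_per_query = 0.03  # power/compute per query ($)

amortized_training = training_cost / queries_served          # $0.005 per query
total_per_query = amortized_training + inference_cost_per_query
print(f"inference share of per-query compute cost: {inference_cost_per_query / total_per_query:.0%}")
# ~86%: even a billion-dollar training run is swamped by per-query inference
# once the query volume is large enough.
```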
A month ago I missed The Economist's discovery of the AI version of Paul Butler's Bitcoin post, entitled The $4trn accounting puzzle at the heart of the AI cloud. The cost of these rapidly depreciating assets is huge:
AI revenues and expenditures are likewise a 13-figure affair nowadays. Worldwide spending on AI hardware and software nudged $1trn last year, according to Gartner, a research firm. This is likely to double to $2trn in 2026. Between 2024 and 2026 the five listed AI powerhouses will have splurged over $1trn on capital investments, chiefly AI data centres. A slug will end up with Nvidia and Broadcom, which furnish them and others with AI semiconductors. The duo (combined market capitalisation: $6trn) are together forecast to book almost $1trn in sales over that period.

The platforms haven't been depreciating these assets at realistic rates:
biggest customers have in recent times been raising their servers’ lifetimes, reducing depreciation charges in their accounts. Microsoft pushed it up from four to six years in 2022. Alphabet did the same in 2023. Amazon and Oracle changed it from five to six in 2024. And in January Meta moved from five to five and a half years.

The reason for these accounting games is that they make a significant difference to the bottom line:
Amazon reversed course and moved back to five years for some kit, noting this would cut operating profit in 2025 by $700m, or about 1%, owing to a higher depreciation expense. Given the rapid advances in chipmaking, that seems optimistic. And Amazon’s AI rivals clinging to their elongated depreciation schedules look Pollyannaish. In July Jim Chanos, a veteran short-seller, posted that if the true economic lifespan of Meta’s AI chips is two to three years, then “most of its ‘profits’ are materially overstated.” A recent analysis of Alphabet, Amazon and Meta by Barclays, a bank, estimated that higher depreciation costs would shave 5-10% from their earnings per share.

And thus to their stock price. If they went to a 3-year straight-line depreciation:
At the five companies’ current ratio of market capitalisation to pre-tax profit, this would amount to a $780bn knock to their combined value. Redo the sums depreciating the servers over two years instead of three and the size of the hit rises to $1.6trn. Take Mr Huang literally, and you get a staggering $4trn, equivalent to one-third of their collective worth.

A big chunk of the AI platforms' investment in the hardware is financed with debt. If the lenders are using a five-year life when valuing the hardware as collateral they will run into trouble too. It is rumored that Macquarie was lending against GPUs using a seven-year life.
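The mechanics behind these numbers are just straight-line depreciation. A minimal sketch, using a made-up $100B fleet rather than any company's actual capex, shows how much the assumed lifetime moves the annual charge:

```python
# Straight-line depreciation of the same (made-up) $100B server fleet under
# different assumed lifetimes.
fleet_cost = 100e9

for life_years in (6, 5, 3, 2):
    annual_charge = fleet_cost / life_years
    print(f"{life_years}-year life: ${annual_charge / 1e9:.1f}B depreciation per year")
# 6-year life: $16.7B/year; 3-year: $33.3B/year; 2-year: $50.0B/year.
# Halving the assumed life doubles the annual charge, which flows straight
# through to reported profit and, via the profit multiple, to market value.
```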
[Graph: Butler's Bitcoin mining hardware depreciation curves (Source)]
Butler uses historical hash rate data to compute the actual depreciation curves for mining hardware, plotting the percentage of the initial bitcoin production rate against time in quarters for each quarter since Bitcoin's inception. The graph shows that initially (bluest lines), when GPUs were introduced, they stopped producing after about 5 quarters. Recently (magenta-est lines), ASICs last longer, stopping producing after about 4 years. But for the whole of Bitcoin's existence the hardware has depreciated far faster than GAAP's five-year straight line.

But I argued that it was too optimistic:
He assumes that the ASICs are obsolete when they can no longer keep up with the hash rate so are no longer mining any Bitcoin. That is wrong. ASICs are obsolete when the Bitcoin they mine no longer pay for the electricity they use. The newer ASICs aren't just faster, they also use much less energy per hash. Look again at the depreciation graph, which suggests current ASICs go obsolete after 16 quarters. But Alex de Vries and Christian Stoll's estimate of 5 quarters to obsolescence is based on comparing the ASIC's production with the cost of their power consumption, which is the correct approach. The curves in the graph are correct out to the 40% line, but then should drop to zero.

Similarly, at some point in the future when the AI platforms realize they need a return on their investments, running systems answering queries that earn less than the cost of the power will no longer make sense and their value will drop precipitously.
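A minimal sketch of this obsolescence test, with illustrative wattage, electricity price and daily revenue rather than real mining data:

```python
# Illustrative numbers, not real mining data: the rig is obsolete when its
# output no longer covers its electricity, not when its output reaches zero.
def still_economic(revenue_per_day, watts, price_per_kwh):
    """True while daily output still pays for daily power."""
    power_cost_per_day = watts * 24 / 1000 * price_per_kwh
    return revenue_per_day > power_cost_per_day

# A 3 kW rig at $0.06/kWh burns $4.32/day in power:
print(still_economic(revenue_per_day=5.0, watts=3000, price_per_kwh=0.06))  # True
print(still_economic(revenue_per_day=3.0, watts=3000, price_per_kwh=0.06))  # False -> obsolete
# The same test applies to AI systems: once a query earns less than the power
# it consumes, the hardware's value drops to roughly its resale value.
```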
If hardware is being used as collateral for a loan the value should represent what it would fetch on the market. Assume Nvidia is on a 2-year cadence. Customers don't get their hardware instantly, so assume that the borrower got theirs 6 months into the 2 years. They default after 1 year, so the bank is selling hardware with 6 months left before the next generation starts shipping, and 1 year before a typical customer can get hardware that renders the current hardware almost uneconomic. The lender's 5-year straight-line estimate of value would be 80% of the purchase price. A buyer would likely estimate 20% of the purchase price.
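Here is that arithmetic spelled out; the 20% resale figure is my guess from the cadence argument above, not a market quote:

```python
# Assumptions from the paragraph above: 2-year cadence, delivery 6 months into
# the cycle, default after 1 year of use, 5-year straight-line on the lender's books.
purchase_price = 1.0   # normalised
age_at_default = 1.0   # years

book_value = purchase_price * (1 - age_at_default / 5)   # lender's view: 80%
resale_estimate = 0.20 * purchase_price                  # my guess at a buyer's offer

print(f"lender's book value: {book_value:.0%} of purchase price")
print(f"likely resale value: {resale_estimate:.0%} of purchase price")
# A 4x gap between the collateral's book value and what it would actually fetch.
```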
This blog has often cited the excellent work of Arvind Narayanan at Princeton. His colleague Mihir Kshirsagar posted an equally excellent piece on the implications of inadequate depreciation for competition entitled Lifespan of AI Chips: The $300 Billion Question:
What I have found so far is surprising. It appears that we’re making important decisions about who gets to compete in AI based on financial assumptions that may be systematically overstating the long-run sustainability of the industry by a factor of two.

Kshirsagar has identified inadequate depreciation as a key part of what I have called "the drug-dealer's algorithm" (the first one's free):
Incumbent coalitions—hyperscalers (Microsoft, Amazon, Google) partnered with their model developers (OpenAI, Anthropic)—can effectively subsidize application-layer pricing during the critical years when customer relationships are being formed. This could create advantages that prove insurmountable even when better technology becomes available.

Kshirsagar goes on to examine the effects of the drug-dealer's algorithm in considerable detail:
The core issue isn’t whether chips wear out faster than accounting suggests. It’s whether the market structure being formed is fueled by accounting conventions that obscure true long-run economics, allowing incumbent coalitions to establish customer lock-in at the application layer before true costs become visible. In other words, the accounting subsidy creates a window of roughly three to six years where reported costs are artificially low. After that, the overlapping depreciation schedules catch up to operational reality. Here’s why this timing matters: that three-to-six-year window is precisely when the market structure for AI applications is being determined. Customer relationships are being formed. Enterprise integrations are being built. Multi-year contracts are being signed. Switching costs are accumulating. By the time the accounting catches up—when companies face the full weight of replacement costs hitting their income statements—the incumbent coalitions will have already locked in their customer base. The temporary subsidy enables permanent competitive advantage.

I encourage you to read the whole post, although I have a few quibbles:
- He focuses on enterprise use of AI as the platform's business model, and seems to assume that enterprises will find that using AI generates enough value to cover the subsidized cost through the lock-in period. Given the negative productivity impact most enterprises currently report, this is not a safe assumption.
- Even if management's infatuation with AI, and the prospect of firing all their annoying employees, results in their being locked in to the AI platform of their choice, they will need to achieve massive productivity gains to be able to afford what the platforms will have to charge to raise the $2T/year in revenue they will need.
- The AI platforms have already figured out that enterprise use isn't likely to generate enough revenue and are pivoting to advertising, affiliate marketing and porn.
- There is an elephant in the room, namely the vastly increased attack surface AI provides. For example, we have Benji Edwards' AI models can acquire backdoors from surprisingly few malicious documents:
researchers from Anthropic, the UK AI Security Institute, and the Alan Turing Institute released a preprint research paper suggesting that large language models like the ones that power ChatGPT, Gemini, and Claude can develop backdoor vulnerabilities from as few as 250 corrupted documents inserted into their training data.
And David Gerard's It’s trivial to prompt-inject GitHub’s AI Copilot Chat:

Mayraz’s question was: can we send a pull request — a suggested code fix — that contains a prompt injection? And make the bot spill sensitive user data, like private code or AWS login keys? Yes, we can!
...
Mayraz’s exploit was a rediscovery of an exploit GitHub had already been told about. User 49016 discovered the bug a few months before and reported it on HackerOne. GitHub acknowledged the report and called it a “low risk” issue — “yes, they seriously consider a LLM leaking your private repo contents as a ‘low risk issue’.” (Mayraz’s CVE filing was rated a 9.6.) 49016 reports that GitHub’s fix is rubbish, and “bypassing it took 5 minutes lol.”
- And there is the risk that AI platforms will become obsolete with "good enough" AI like DeepSeek and "good enough" local compute such as Nvidia's DGX Spark:

[Image: Nvidia DGX Spark (Source)]
a $4,000 desktop AI computer that wraps one petaflop of computing performance and 128GB of unified memory into a form factor small enough to sit on a desk.
The system is the tiny gold box in the image. It is based on Blackwell.
...
Nvidia’s Spark reportedly includes enough memory to run larger-than-typical AI models for local tasks, with up to 200 billion parameters and fine-tune models containing up to 70 billion parameters without requiring remote infrastructure. Potential uses include running larger open-weights language models and media synthesis models such as AI image generators.
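A back-of-envelope check of that memory claim, assuming the usual bytes-per-parameter for each precision (my assumption, not Nvidia's published configuration):

```python
# My assumptions about bytes per parameter, not Nvidia's spec sheet.
params = 200e9
for label, bytes_per_param in [("FP16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    weights_gb = params * bytes_per_param / 1e9
    verdict = "fits" if weights_gb <= 128 else "does not fit"
    print(f"{label}: ~{weights_gb:.0f} GB of weights -> {verdict} in 128 GB")
# Only an aggressively quantised (~4-bit) model squeezes in, and that is before
# activations and KV cache, so "runs a 200B-parameter model" needs that caveat.
```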
Elder observes that:

One odd thing about AI equipment is that it’s very expensive to buy and very cheap to rent.
Want an Nvidia B200 GPU accelerator? Buying one on its release in late 2024 would’ve probably cost around $50,000, which is before all the costs associated with plugging it in and switching it on. Yet by early 2025, the same hardware could be rented for around $3.20 an hour. By last month, the B200’s floor price had fallen to $2.80 per hour.
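Working through the quoted numbers, here is the rough break-even arithmetic for buying a B200 versus renting one; it ignores the buyer's power, cooling and networking costs:

```python
# Break-even arithmetic for the quoted prices; ignores the buyer's power,
# cooling and networking costs, which only lengthen the payback.
purchase_price = 50_000
hours_per_year = 24 * 365

for hourly_rate in (3.20, 2.80):
    hours = purchase_price / hourly_rate
    print(f"${hourly_rate}/hour: break-even after {hours:,.0f} hours "
          f"(~{hours / hours_per_year:.1f} years at 100% utilisation)")
# Roughly 1.8 to 2 years of non-stop rental just to match the sticker price,
# by which time the hardware is a generation or two stale.
```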
[Chart: GPU rental prices, RBC data (Source)]
What might be less obvious is that among the hyperscalers — Amazon’s AWS, Microsoft’s Azure, Google and Oracle — prices have hardly budged. The result is an ever-widening gap between rates charged by the big-four and a growing number of smaller rivals.

Here’s the same RBC data organised to show relative pricing across all GPUs. The iceberg effect shows how much it’s new entrants driving average rates lower. Meanwhile, for a hyperscaler’s customers, this month’s bill per GPU will almost always be the same as last month’s bill.

Elder ends with a set of possible conclusions:
- A lot of pandemic-era Nvidia GPUs will be heading towards Cash Converters having never washed their face.
- The customers attracted by low AI compute costs have yet to show much ability, willingness or inclination to pay more.
- The hyperscalers don’t believe these customers are worth competing for, so have chosen to wait for the discount end of the market to die of insolvency.
- The inevitable data-centre shakeout will kill lots of AI start-ups that can’t afford to pay what compute actually costs.
- We might be overestimating the size of the GPU market if the middle-ground — meaning regular companies that want OpenAI and Anthropic to make their chatbots, summarisation tools and similar AI widgets — turns out to be worth less than $3tn.

Those are all good points.
Update
I need to update today's post to cover two important posts I missed:

- Will Lockett's The AI Bubble Is Far Worse Than We Thought from 13th October. It describes the size of the debt fuelling the AI platforms' tsunami of spending.
- Ed Zitron's This Is How Much Anthropic and Cursor Spend On Amazon Web Services from yesterday. It provides evidence for their vast excess of costs over income.
Will Lockett
Lockett starts from an analysis by Julien Garran of MacroStrategy Partnership, pointing to the mismatch between the enormous investments the AI platforms are making and the lack of any sustainable business model to generate the $2T/year in new revenue needed to provide a return.

So far, Garran's analysis is hardly original, but:
Garran used the economic analysis pioneered by economist Knut Wicksell to establish the size of this investing discrepancy, the 2008 bubble, and the dot-com bubble, and that is how he learned that the AI bubble is currently four times the size of the 2008 bubble at its peak!

Lockett then asks how connected the AI bubble is to the rest of the economy:
It seems AI companies have been growing through equity financing (selling shares of the company) rather than debt financing (borrowing money). Furthermore, investment seems to be incestuous to the tech industry; for example, OpenAI’s major investors are Microsoft and Nvidia. As such, many believed that the AI bubble is actually relatively isolated from the rest of the economy and therefore might not have as significant an impact on the wider economy when it bursts.

It turns out that a large slice of the bubble spending isn't equity:
Dario Perkins, managing director of global macro at TS Lombard, has found that many AI companies are increasingly using SPVs to raise significant amounts of debt financing off the books. This covers their tracks and obfuscates the debt, making it “look” like the company is running on equity finance instead. Due to this, it is incredibly difficult to get an accurate figure on how much of the AI industry’s expenditure and growth comes from debt — but we know it is a lot!

Even the "on the books" debt is a lot:
Goldman Sachs has found that at least $141 billion of the $500 billion in capital expenses the AI industry has spent so far this year came from debt directly tied to the main corporate body through corporate credit issuances. To give you an idea of how insane that is, the entire AI industry capital expenditure in 2024 was $127 billion. In other words, the AI industry has taken on significantly more debt so far this year than it ever spent in total last year. This also means that we know that about 30% of the AI industry’s annual expenditure for this year came from “on the books” debt.

And there is plenty of "off the books" debt:
We know that Meta is looking to raise $26 billion in debt through an SPV by the end of the year. This one deal equates to 5% of the total AI industry’s capital expenditure for this year.

You have to wonder what collateral is backing this "at least" $167B in debt, and how it is being depreciated.
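Tallying the figures quoted above (a lower bound, since SPV debt is designed to be hard to see):

```python
# The debt figures quoted above, in $ billions; a lower bound because SPV
# borrowing is deliberately hard to see.
capex_2025_to_date = 500   # AI capital spending so far this year
on_books_debt      = 141   # corporate credit issuance (Goldman Sachs)
meta_spv           = 26    # one known off-the-books SPV deal

print(f"on-books debt share of capex: {on_books_debt / capex_2025_to_date:.0%}")  # ~28%
print(f"known debt so far:            ${on_books_debt + meta_spv}B")              # $167B
```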
Ed Zitron
Some of the AI platforms have non-AI free cash flows that can be diverted to paying the interest on these debts, or that could be if they hadn't already been shoveled into the AI investment cash furnace. But OpenAI, Anthropic and the other pure-play AI platforms don't. In order to pay the interest without needing to raise more equity, they need to be cash flow positive. Zitron has always been skeptical that they can be, for example writing:

To be abundantly clear, as it stands, OpenAI currently spends $2.35 to make $1.

Now, Zitron has strong evidence that he was right:
Based on discussions with sources with direct knowledge of their AWS billing, I am able to disclose the amounts that AI firms are spending, specifically Anthropic and AI coding company Cursor, its largest customer.

Zitron can confirm that:
I can exclusively reveal today Anthropic’s spending on Amazon Web Services for the entirety of 2024, and for every month in 2025 up until September, and that Anthropic’s spend on compute far exceeds that previously reported.
Anthropic has spent more than 100% of its estimated revenue (based on reporting in the last year) on Amazon Web Services, spending $2.66 billion on compute on an estimated $2.55 billion in revenue.

And what about Cursor?
Additionally, Cursor’s Amazon Web Services bills more than doubled from $6.2 million in May 2025 to $12.6 million in June 2025, exacerbating a cash crunch that began when Anthropic introduced Priority Service Tiers, an aggressive rent-seeking measure that begun what I call the Subprime AI Crisis, where model providers begin jacking up the prices on their previously subsidized rates.

Although Cursor obtains the majority of its compute from Anthropic — with AWS contributing a relatively small amount, and likely also taking care of other parts of its business — the data seen reveals an overall direction of travel, where the costs of compute only keep on going up.

Last year Anthropic likely spent more than ten times its revenue:
In February of this year, The Information reported that Anthropic burned $5.6 billion in 2024, and made somewhere between $400 million and $600 million in revenue

...

I can confirm from a source with direct knowledge of billing that Anthropic spent $1.35 billion on Amazon Web Services in 2024, and has already spent $2.66 billion on Amazon Web Services through the end of September.

Assuming that Anthropic made $600 million in revenue, this means that Anthropic spent $6.2 billion in 2024, leaving $4.85 billion in costs unaccounted for.

Zitron notes that Anthropic is "making it up in volume":
Based on what I have been party to, the more successful Anthropic becomes, the more its services cost. The cost of inference is clearly increasing for customers, but based on its escalating monthly costs, the cost of inference appears to be high for Anthropic too, though it’s impossible to tell how much of its compute is based on training versus running inference.

While it is true that training these very large models is expensive, it is a one-time cost to be amortized against the huge number of inferences made against the trained model. Alas, the huge number is so huge that it dominates the one-time cost:
As conversations with experts and AI companies made clear, inference, not training, represents an increasing majority of AI’s energy demands and will continue to do so in the near future. It’s now estimated that 80–90% of computing power for AI is used for inference.

Which explains Zitron's observation that:
these costs seem to increase with the amount of money Anthropic makes, meaning that the current pricing of both subscriptions and API access seems unprofitable, and must increase dramatically — from my calculations, a 100% price increase might work, but good luck retaining every single customer and their customers too! — for this company to ever become sustainable.

This confirms my prediction of price increases in June's The Back Of The AI Envelope.
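Pulling Zitron's reported figures together as a quick consistency check; the 2024 revenue is taken at the top of The Information's $400-600 million range:

```python
# Zitron's reported figures, in $ billions; 2024 revenue taken at the top of
# The Information's $0.4-0.6B range.
burn_2024        = 5.6    # cash burned in 2024
revenue_2024     = 0.6
aws_2024         = 1.35   # Anthropic's 2024 AWS bill
aws_2025_to_sep  = 2.66   # AWS spend, January through September 2025
revenue_2025_est = 2.55   # estimated revenue over the same period

total_spend_2024 = burn_2024 + revenue_2024        # burn = spend minus revenue, so $6.2B
non_aws_2024     = total_spend_2024 - aws_2024     # $4.85B of costs outside AWS
print(f"2024 spend vs revenue:    {total_spend_2024 / revenue_2024:.0f}x")      # ~10x
print(f"2024 costs beyond AWS:    ${non_aws_2024:.2f}B")
print(f"2025 AWS bill vs revenue: {aws_2025_to_sep / revenue_2025_est:.0%}")    # >100%
```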
Zitron's post is long, detailed and very much worth reading. Go and read it.