On my recent visit to London I was struck by how many of the advertisements in the Tube were selling AI. They fell into two groups, one aimed at CEOs and the other at marketing people. This is typical, the pitch for AI is impedance-matched to these targets:
- The irresistible pitch to CEOs is that they can "do more with less", or in other words they can lay off all these troublesome employees without impacting their products and sales.
- Marketing people value plausibility over correctness, which is precisely what LLMs are built to deliver. So the idea that a simple prompt will instantly generate reams of plausible collateral is similarly irresistible.
In
The Back Of The AI Envelope I explained:
why Sam Altman et al are so desperate to run the "drug-dealer's algorithm" (the first one's free) and get the world hooked on this drug so they can supply a world of addicts.
You can see how this works for the two targets. Once a CEO has addicted his company to AI by laying off most of the staff, there is no way he is going to go cold turkey by hiring them back even if the AI fails to meet his expectations. And once he has laid off most of the marketing department, the remaining marketeer must still generate the reams of collateral even if it lacks a certain something.
Below the fold I look into this example of the process Cory Doctrow called
enshittification.
The first thing to note is that the pitch is working. The discourse is full of CEOs talking their book. For example we have Matt Novak's
Billionaires Convince Themselves AI Chatbots Are Close to Making New Scientific Discoveries recounting the wisdom of Travis Kalnick:
“I’ll go down this thread with [Chat]GPT or Grok and I’ll start to get to the edge of what’s known in quantum physics and then I’m doing the equivalent of vibe coding, except it’s vibe physics,” Kalanick explained. “And we’re approaching what’s known. And I’m trying to poke and see if there’s breakthroughs to be had. And I’ve gotten pretty damn close to some interesting breakthroughs just doing that.”
Then there are the programmers extolling "vibe coding" and how it increases their productivity. CEOs who buy this pitch are laying off staff left and right. For example, Jordan Novote reports that
Microsoft laying off about 9,000 employees in latest round of cuts:
Microsoft said Wednesday that it will lay off about 9,000 employees. The move will affect less than 4% of its global workforce across different teams, geographies and levels of experience, a person familiar with the matter told CNBC.
...
Microsoft has held several rounds of layoffs already this calendar year. In January, it cut less than 1% of headcount based on performance. The 50-year-old software company slashed more than 6,000 jobs in May and then at least 300 more in June.
How well is this likely to work out? Evidence is accumulating that AI's capabilities are over-hyped. Thomas Claiburn's
AI models just don't understand what they're talking about is an example:
Asked to explain the ABAB rhyming scheme, OpenAI's GPT-4o did so accurately, responding, "An ABAB scheme alternates rhymes: first and third lines rhyme, second and fourth rhyme."
Yet when asked to provide a blank word in a four-line poem using the ABAB rhyming scheme, the model responded with a word that didn't rhyme appropriately. In other words, the model correctly predicted the tokens to explain the ABAB rhyme scheme without the understanding it would have needed to reproduce it.
The problem with potemkins in AI models is that they invalidate benchmarks, the researchers argue. The purpose of benchmark tests for AI models is to suggest broader competence. But if the test only measures test performance and not the capacity to apply model training beyond the test scenario, it doesn't have much value.
As far as I know the only proper random controlled trial of AI's productivity increase comes from Model Evaluation and Threat Research entitled
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity:
16 developers with moderate AI experience complete 246 tasks in mature projects on which they have an average of 5 years of prior experience. Each task is randomly assigned to allow or disallow usage of early 2025 AI tools. When AI tools are allowed, developers primarily use Cursor Pro, a popular code editor, and Claude 3.5/3.7 Sonnet. Before starting tasks, developers forecast that allowing AI will reduce completion time by 24%. After completing the study, developers estimate that allowing AI reduced completion time by 20%. Surprisingly, we find that allowing AI actually increases completion time by 19%--AI tooling slowed developers down.
David Gerard
notes:
Even the devs who liked the AI found it was bad at large and complex code bases like these ones, and over half the AI suggestions were not usable. Even the suggestions they accepted needed a lot of fixing up.
This might be why Ashley Stewart reported that
Microsoft pushes staff to use internal AI tools more, and may consider this in reviews. 'Using AI is no longer optional.':
Julia Liuson, president of the Microsoft division responsible for developer tools such as AI coding service GitHub Copilot, recently sent an email instructing managers to evaluate employee performance based on their use of internal AI tools like this.
"AI is now a fundamental part of how we work," Liuson wrote. "Just like collaboration, data-driven thinking, and effective communication, using AI is no longer optional — it's core to every role and every level."
Liuson told managers that AI "should be part of your holistic reflections on an individual's performance and impact."
If the tools were that good, people would use them without being threatened. If the tools were that good, people would pay for them. But
Menlo Ventures found that only 3% of consumers pay anything. They are happy to use free toys but they have other spending priorities.
Other surveys have found numbers up to 8%, but as Ted Gioia notes in
The Force-Feeding of AI on an Unwilling Public:
Has there ever been a major innovation that helped society, but only 8% of the public would pay for it?
Gioia didn't want AI but as an Office 365 user
he didn't have that option:
AI is now bundled into all of my Microsoft software.
Even worse, Microsoft recently raised the price of its subscriptions by $3 per month to cover the additional AI benefits. I get to use my AI companion 60 times per month as part of the deal.
Microsoft didn't ask their customer whether they would pay for AI, because the answer would have been no.
Gioia writes:
This is how AI gets introduced to the marketplace—by force-feeding the public. And they’re doing this for a very good reason.
Most people won’t pay for AI voluntarily—just 8% according to a recent survey. So they need to bundle it with some other essential product.
As I discussed in
The Back Of The AI Envelope, the AI giants running the drug-dealer's algorithm are losing money on every prompt.
Gioia has noticed this:
There’s another reason why huge tech companies do this—but they don’t like to talk about it. If they bundle AI into other products and services, they can hide the losses on their income statement.
That wouldn’t be possible if they charged for AI as a standalone product. That would make its profitability (or, more likely, loss) very easy to measure.
Shareholders would complain. Stock prices would drop. Companies would be forced to address customer concerns.
But if AI is bundled into existing businesses, Silicon Valley CEOs can pretend that AI is a moneymaker, even if the public is lukewarm or hostile.
Salesforce is another company that has
spotted this opportunity:
Yesterday Salesforce announced that prices on a pile of their services are going up around 6% — because AI is just that cool.
Salesforce’s stated reason for the price rise is “the significant ongoing innovation and customer value delivered through our products.” But you know the actual reason is because f- you, that’s why. What are you gonna do, move to SAP? Yeah, didn’t think so.
One problem is that the technology Salesforce is charging its customers for doesn't work well in Salesforce's application space. Salesforce's own researchers developed a
new bechmark suite called CRMAArena-Pro:
CRMArena-Pro expands on CRMArena with nineteen expert-validated tasks across sales, service, and 'configure, price, and quote' processes, for both Business-to-Business and Business-to-Customer scenarios. It distinctively incorporates multi-turn interactions guided by diverse personas and robust confidentiality awareness assessments. Experiments reveal leading LLM agents achieve only around 58% single-turn success on CRMArena-Pro, with performance dropping significantly to approximately 35% in multi-turn settings. While Workflow Execution proves more tractable for top agents (over 83% single-turn success), other evaluated business skills present greater challenges. Furthermore, agents exhibit near-zero inherent confidentiality awareness; though targeted prompting can improve this, it often compromises task performance.
To
summarize the results:
The agent bots had 58% success on tasks that can be done in one single step. That dropped to 35% success if they had to take multiple steps. The chatbot agents are also bad at confidentiality:
Agents demonstrate low confidentiality awareness, which, while improvable through targeted prompting, often negatively impacts task performance. These findings suggest a significant gap between current LLM capabilities and the multifaceted demands of real-world enterprise scenarios.
Despite the fact that most consumers won't pay the current prices, it is inevitable that once the customers are addicted, prices will go up spectacularly. But the wads of VC cash may not last long enough, and things can get awkward with the customers who are paying the current prices, as
David Gerard reports:
You could buy 500 Cursor requests a month for $20 on the “Pro” plan. People bought a year in advance.
In mid-June, Cursor offered a new $200/month “Ultra” plan. But it also changed Pro from 500 requests to $20 of “compute” at cost price — the actual cost of whichever chatbot vendor you were using. That was a lot less than 500 requests.
You could stay on the old Pro plan! But users reported they kept hitting rate limits and Cursor was all but unusable.
The new plan Pro users are getting surprise bills, because the system doesn’t just stop when you’ve used up your $20. One guy ran up $71 in one day.
Anysphere has looked at the finances and stopped subsidising the app. Users suddenly have to pay what their requests are actually costing.
Anysphere says they put the prices up because “new models can spend more tokens per request on longer-horizon tasks” — that is, OpenAI and Anthropic are charging more.
The CEO who laid off the staff faces another set of "business risks". First, OpenAI is close to a monopoly; it has around 90% of the chatbot market. This makes it a single point of failure, and it
does fail:
On June 9 at 11:36 PM PDT, a routine update to the host Operating System on our cloud-hosted GPU servers caused a significant number of GPU nodes to lose network connectivity. This led to a drop in available capacity for our services. As a result, ChatGPT users experienced elevated error rates reaching ~35% errors at peak, while API users experienced error rates peaking at ~25%. The highest impact occurred between June 10 2:00 AM PDT and June 10 8:00 AM PDT.
Second, the chatbots present an attractive attack surface. David Gerard reports on a
talk at Black Hat USA 2024:
Zenity CEO Michael Bargury spoke at Black Hat USA 2024 on Thursday on how to exploit Copilot Studio:
- Users are encouraged to link “public” inputs that an attacker may have control over.
- A insider — malicious or just foolish — can feed their own files to the LLM.
- If you train the bot on confidential communications, it may share them with the whole company.
- 63% of Copilot bots are discoverable online, out on the hostile internet. Bargury fuzzed these bots with malformed prompts and got them to spill confidential information.
Bargury demonstrated intercepting a bank transfer between a company and their client “just by sending an email to the person.”
So the technology being sold to the CEOs isn't likely to live up to expectations and it will cost many times the current price. But the way it is being sold means that none of this matters. By the time the CEO discovers these issues, the company will be addicted.
No comments:
Post a Comment