AI’s Alexa Moment

29/06/2024

I've been trying for some time to articulate thoughts on the direction the nascent "Foundation" Models AI industry might take, and how it can live up to VC and broader market expectations.

This tweet was making the rounds earlier this month:

We can now use:
- MS copilot for free.
- Latest GPT (with limits) but for free.
- GPT via Apple for free.
- Apple own models for free.
- Google gemini for free.
- A trillion other models via Huggingface and others, for free.
So how exactly is anyone going to make $ on this?


— Filip Piekniewski🌻 🐘:@filippie509@techhub.social (@filippie509) June 11, 2024

Much of the discourse in the replies was about trying to find a past model for this wave of AI/tech innovation and its eventual economic outcome. Two candidates seem to be on people’s minds: social media and the app economy.

At this point, we are all familiar with the confluence of factors that gave the world these two waves of Big Tech™.

Social Media’s value was in bringing every human’s public profile and interpersonal interactions online and recognizing that one could segment, recommend, advertise, and matchmake by mining the graph that emerged. Zero initial cost, infinite upside* (*provided the underlying network grows infinitely, which it can’t).

The app economy was based on the disintermediation and platformization of legacy regulated markets (taxis, restaurants, hotels). It could only attain profitability by leveraging the global scale of the internet and smartphone penetration to disrupt localized incumbents. Enormous scale-up costs and near-limitless upside* (*provided regulations don’t catch up, which they might).

Both started heavily subsidized and were free or almost free to individual end users before finding market fit and profitability many years down the line.

Given what the leaders and investors of AI foundation model companies tout as the tremendous cost of training large language models (LLMs) and the cost, albeit rapidly falling, of serving them as products to millions of users, one might expect these products and platforms to follow the same path: eventually find their fit, settle into our lives as frequent necessities or occasional luxuries, and find their market price.

Unfortunately for the many who hope to profit, I think the situation we are in now is more akin to Amazon’s Alexa experiment.

As a refresher, Alexa was a 10+ year, multibillion-dollar investment in building the next generation of user interaction on the internet.

Amazon’s voice assistant launched in 2014 alongside the first Echo devices. The project started in 2011, right as Apple unveiled Siri, a major inspiration and competitor. It was the result of an incredible effort to hire the best researchers in the world and throw massive amounts of data and compute at one problem: a voice-based computer interface. What followed was an explosion of devices from Amazon, Google, Apple, and many more, each with its own proprietary voice AI or one licensed from Big Tech. For nearly a decade, this new paradigm of interaction ambled along, with monetization and untold riches always around the next corner.

And then, in late 2022, as the first waves of post-COVID tech layoffs were cresting, the Alexa division was gutted. It had reportedly lost $3 billion in the first quarter of that year alone and was on pace to lose about $10 billion over the full year.


The current explosion of capital allocated to the problem of “Scaling LLMs” reminds me of the Tech industry’s fascination with the voice interface. An incredible scientific breakthrough was achieved by throwing money, compute, and the collective output of an entire field of theoretical research at the goal of generating text (and images, and sound, and video, and anything and everything else). Unfortunately, solving that problem does not lead to a Great Productivity Unlock or the next step of the Tech Business paradigm.

People are using those models and platforms by the millions. But much like Alexa and other voice assistants, they seem to be using them for novelty, because they are here, and because they are, for the most part, free (read: heavily subsidized). Once known, the ability to effortlessly generate text is almost entirely replicable and deployable as an open and free (as in software) building block for all software going forward, as was the ability to translate or transcribe before.

Alexa was a commercial failure because, for all its advances in the state of the art of speech recognition, it could not create a new way to shop online or become the gateway to a suite of profitable new business ventures. Voice simply became a new mode of human-machine interaction, expected to be included in every OS and compatible with every app.
With companies like Figma and Adobe integrating generative capabilities across their entire software suites, well-funded industry disruptor hopefuls like Runway are already losing their novelty appeal.

What marginal gain will a personal Claude or ChatGPT subscription give an individual student or worker when a similarly capable model already runs in their word processor, in their browser, on their phone, in their IDE, as part of their Grammarly subscription, in their font?

What possible value can be captured by AI companies when the only natural resource that can give their models an edge on the competition - high-quality human-generated data - has been mined out and its only openly harvestable mine - the open web - is being irremediably poisoned by generated “slop”? What about the swathes of industry where humans are being axed purely based on the promises of “smarter” models?

The ongoing commoditization of generative AI will lead to great advances in software and human-machine interaction for myriad purposes. But it will also prove to have been a burn of capital on a scale that dwarfs Amazon’s Alexa experiment. Speaking of which, isn’t it weird not to see a major branded LLM from Amazon more than two years into the revolution? Even Apple had time to put its spin on it with the unabbreviable Apple Intelligence, which runs mostly locally and comes at no additional cost to iOS users, very much in the vein of Siri, Alexa’s older sibling.