Why you only use Siri for setting Timers.

ChatGPT is a supposed everything tool. Yet it is not the only tool we use. Part of that is due to a lack of functionality. But more importantly, we lack the appropriate mental models to use the tool to its full extent.

Hey bud, what do you use your smartphone for?
Eh, pretty much anything.
Okay, then tell me what everything is.
I can call friends, write messages, answer work emails, navigate to my vacation home, look up the news, ...

This list will be pretty much endless. Everyone you ask will give a different answer, but each will know a very specific list of features they use. That is because what you will most likely hear is a list of specific apps. No one will bother to say they use iOS or Android.

The mental model of what a smartphone can do is very clearly defined. It is a container for apps made for specific problems. But if there is no appropriate app, or you simply can't find it through listicles or App Store searches, your problem will go unsolved.

Welcome LLMs to the Stage!

ChatGPT is a supposed everything tool. It is more than a typical app because you can ask it to "behave" in certain ways and solve problems through different "behaviors" in conversation. I think of it as a problem-solving OS (like iOS) which enables individual solution applications, created and accessed dynamically through natural language.

But it is not (yet) an everything tool. Initially, math was a big problem for ChatGPT. But statistics worked really well and became a practical use case for students, researchers, and knowledge workers. How? Because this mental model of what works and what doesn't was built up pretty quickly from a blank slate.

Sooo, what can ChatGPT do today?

Similarly to the smartphone, more and more tools (formerly other apps) are integrated into ChatGPT (the OS). The most recent one is web-search.

Yeah, I know, there is a special button for it. But how good is its math today?

I don't know. Which is why I don't use it for that. And this is probably how a vast majority of users will treat math and other features over time, because mental models can't keep up with the specific improvements made if those aren't made explicit.

Adjusting Mental Models to Technological Change

Cliff Kuang, a UX Designer, outlines this idea in an article for MIT Technology Review. This is how he puts it:

If you’ve ever used a digital assistant or smart speaker, you already know that mismatched expectations create products we’ll never use to their full potential. When you first tried one, you probably asked it to do whatever you could think of. Some things worked; most didn’t. So you eventually settled on asking for just the few things you could remember that always worked: timers and music. LLMs, when used as primary interfaces, re-create the trouble that arises when your mental model isn’t quite right.

It might seem like a marketing problem, but it isn't: it simply isn't efficient to communicate every little function of a consumer-facing LLM product like Gemini, Claude, or ChatGPT to a large non-technical audience. Rather, it is a product problem inherent to LLMs as "primary interfaces" like the ChatGPT app.

We face a narrative contradiction here. Chatbots were introduced as a one-stop solution that naturally adapts to your verbal input, not an experience segmented into different functions like a smartphone with apps. But that is exactly where the interfaces are moving!

Wrapper Apps are gaining Market Share

The mental model issue laid out by Kuang explains why wrapper apps like Perplexity are huge businesses today.

ChatGPT wasn't capable of searching the internet when Perplexity started out. Perplexity took natural language processing (NLP) and applied it to search, which allows for a more abstract way of asking questions. They made research feel natural again. Transferring the mental model of what Google does for you into a new application is easier than adding it to the mental model of ChatGPT today.

And not only the mental model but also the usage habit is easier to change: switching from one search engine to another is simpler than switching from one app to a supposedly completely different one. That is why Perplexity is still gaining users even though search now exists in ChatGPT.

But what if, instead of forcing the technology into existing solution molds, the technology could allow for a new solution beyond the app? Kuang sees AI as the way to make apps self-definable. In his words:

I think the future is composability – but composability that anyone can command.

The status quo

Small problem: What does composability practically mean? Let's start by understanding the current rigid interfaces.

One might argue that Meta is slowly turning into a wrapper company for LLMs. They sprinkle AI on different bits of your user experience, changing the tech and the output, but sticking with the existing interface for searching, discovering people or writing posts.

It is not only the old guard of big tech taking this approach. New startups are on the same trajectory, creating new versions of existing products: Perplexity is building a new search engine, Elicit is creating an AI Google Scholar.

Both existing companies and new startups only require slight adjustments from users to use the same type of interface (+ NL input) for their existing problems.

ChatGPT and other general purpose applications face a different challenge. They want to make the ultimate multitool, a new product category.

The ChatGPT app is pretty good at introducing new capabilities as features by expanding the interface. But the number of individual features is exploding: soon the GUI will get super cluttered. We can't simply keep adding buttons, info banners, and the like. This has been a bigger hindrance for non-AI assistants like Siri, but it will remain a problem under the new tech regime of AI.

What is the Interface of the Future?

My thesis is that problem-specific wrapper apps have a limited future ahead of them. They will soon have recreated better versions of the solutions technology already provides for existing problems. The limiting factor will be the same as for the app-composable smartphone, the current everything tool.

But the way ChatGPT and especially Anthropic's Claude add NLP features by sticking them back into rigid boxes (for Claude, those are Artifacts) points to where the industry of next-generation everything tools will go. App-composability will stay the main solution, just within one super app.

So maybe Kuang's vision of composable UI is destined to be lost to history?

The idea of an app feels very cozy; there is little urge to change. But a true free-form assistant is the goal. How far out is that? Nobody knows. Supposedly, newer LLMs struggle to deliver the same leap-frogging improvements that GPT-4 did. So maybe we will never grow toward the intelligent assistant? In the context of the industry's current mantra, that intelligence emerges from scale, this doesn't sound too promising.

Let's dream and speculate!

But maybe the AI can learn to code new UI into itself? Sounds human. And uncanny. What does it mean for companies that use LLMs to deliver a complex problem-solving offering?

In my view, digital minimalism will prevail. You can see this with OpenAI already: their conversation mode, where all distractions disappear, leaving only the necessary stop button and some visualization, is exactly that.

Obviously, that is not super practical. What if every one of your conversations with ChatGPT started with just a blank text box or a voice-record button? You start by stating the problem you want to solve, e.g. creating a social media post out of a rough draft you have already written.

Enter the dynamic machine.

Together with the LLM's response to your content, perhaps offering feedback, a row of editing-tool buttons appears at the top of the screen. Your draft gets pinned to the center of the screen for you to co-edit with the AI. To the left, you can see the conversation in text form and have your typical conversation with the machine.

After a few revisions, the text feels fine to you, but it lacks visual appeal. You prompt the AI to create a cover image for you. The answer is a beautiful image in which you would just love to change the font size slightly. Easy enough! Together with the image, a suite of image-editing tools has replaced the text-editing ones. Now you work on the cover image to make your dream come true.

It's time to make the numbers pop! Some key facts in your text would benefit from a bar chart summarizing the relevant bits of an earnings report. The machine opens a sheet with a table containing the relevant numbers on one half of the screen and displays the output on the other. It knows you are doing (basic) financial analysis, so all information is automatically formatted in your local currency. And all the cluttered editing menus you would fight in Excel are pared down to the five features you actually need to adjust the output to your specified vision of the task.

I wrote quite a lot more of this, but I think you get the gist.
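To make the scenario above a bit more concrete, here is a minimal sketch of the core mechanism behind such a "dynamic machine": classify the user's current task, then compose the smallest useful tool palette for it instead of rendering a fixed GUI. Everything here (the `Tool` class, the `PALETTES` mapping, `compose_palette`) is an invented illustration, not any real product's API.

```python
# Hypothetical sketch: a dynamic interface composes its tool palette
# from the task it detects, rather than showing every feature at once.
from dataclasses import dataclass


@dataclass
class Tool:
    name: str         # label shown on the button
    action: str       # what the tool does, in plain words


# A tiny, hand-written mapping from detected task types to the few
# tools that actually matter for that task (all names are made up).
PALETTES: dict[str, list[Tool]] = {
    "text_editing": [
        Tool("rewrite", "revise the pinned draft"),
        Tool("tone", "adjust formality"),
    ],
    "image_editing": [
        Tool("font_size", "resize text in the cover image"),
        Tool("crop", "reframe the cover image"),
    ],
    "charting": [
        Tool("bar_chart", "summarize key numbers visually"),
        Tool("currency", "format figures in the local currency"),
    ],
}


def compose_palette(detected_task: str) -> list[Tool]:
    """Return the minimal tool set for a task; unknown tasks get a blank UI."""
    return PALETTES.get(detected_task, [])


# The conversation starts blank, then surfaces tools as the task shifts,
# mirroring the draft -> cover image -> bar chart story above.
for step in ["text_editing", "image_editing", "charting"]:
    print(step, "->", [tool.name for tool in compose_palette(step)])
```

In a real system the hard part is of course the classifier (the LLM deciding what you are trying to do), not the lookup; the sketch only shows why the interface can stay minimal: at any moment, only one small palette is live.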

Final thoughts

  • Single-domain wrapper companies will still print cash. Even if their technology is replicable and relatively unsophisticated, their power to create (or appropriate) a specific mental model that includes a specific set of solvable problems makes them successful. Enabled by a stylistic wrap with some prompt engineering and fine-tuning in the background, they are creating value and making use of the technology in effective ways.
  • Second movers have more advantages than ever. OpenAI made a bold first step with ChatGPT and introduced the idea of a dynamic machine. Now, other companies can come and fill in that mental model with the rapid advancements in the field of AI without worrying about the baggage of past mental models.
  • Product descriptions matter more than ever for the app-composable future of AI. They should be open enough from day one to let the company grow into its mental model. Google search also needed to evolve from simple search into tabs. In that way, the issue is as old as the application.
  • The dynamic machine is far out. Even if it didn't require "intelligence" or AGI more generally, a complex understanding of problems and appropriate solutions has not been effectively translated into a dataset. Without that, such a feature will not emerge from training.

Sources

Kuang, C. (2024, August 28). AI’s growth needs the right interface. MIT Technology Review. https://www.technologyreview.com/2024/08/28/1096515/ai-interfaces-ux-growth/
