By now you’ve likely seen images conjured up by DALL-E 2 or Stable Diffusion. These are neural networks that can draw pretty much whatever you ask for. As far-fetched as this sounds, it seems to be true. You want teddy bears shopping for groceries in 19th-century Japan? No problem.
The list of outrageous prompts and resulting images is endless. It is at once exhilarating, exhausting, and terrifying to see how well it works. You may enjoy browsing through the various related subreddits.
As remarkable as these tools are, the funny pictures are distracting us from the bigger story: we will soon be adding a natural language layer to EVERY user interface, and certainly to every generative one. It’s going to be transformative, because deep technical competence tethered to natural language will release you from the shackles of craft. Let me explain.
DALL-E 2 is a graphical tool, but the key thing is how you talk to it. It’s really more of a language tool. You’re carving pixels with words. It’s as if you’re talking directly to Photoshop. But that’s still not quite it. It’s as if you’re talking to a brilliant digital artist who does all the pixel pushing for you. Click once on the mountaintop and say “Put a castle here,” and it happens instantly. You might then say “No, I’d like it to look less like Neuschwanstein and more like Carcassonne,” and again it happens instantly.
Do you see? The new thing here is not a castle-making tool. It’s the ability to talk to a skilled digital artist about anything at all, without spending years getting good at Photoshop yourself. More broadly, it’s the ability to talk to an always-on expert in ANYTHING. What we’re building is a generation of mechanical savants. You talk to them in natural language, but they don’t answer in natural language. They have deep knowledge of a subject area, and their reply is a concept that they judge to match your prompt.
Here’s an example of what I mean. Imagine you want to build a house. You have a vague idea of what you want, but you need to talk to an architect. They’re the ones with the expertise needed to turn your hunch into something buildable. You give them a naive prompt, and they reply with a realizable concept. It’s not quite what you wanted, so the two of you go back and forth. Over time you develop an intuition for what you like, what can be built, and what you can afford.
The problem is that architects are expensive. Only rich people can afford to play this game. But the game is changing, because you can now replace the architect with a mechanical savant. In the diagram below, you’re the one with the desire and the initial prompt. But now your prompt goes to a machine, a mechanical savant, which generates the concept. The bot has deep knowledge that frees you from years of studying a craft. You don’t need to go to architecture school. You don’t even need to hire an architect. You just need to judge the bot’s latest concept and adjust your next prompt accordingly.
Your key competence here is your relationship to the savant-bot, since it’s the one doing all the generation. Your competence lies in articulating desire and exercising discernment. Your new craft is iterative prompt refinement. This is the skill you must never trade away. Knowing what you want, and knowing that you know what you want… these will be the instruments of power. All else is leverage, getting cheaper every day.