It’s like we finish each other’s sandwiches.
It's been a hell of a year for autocomplete. It feels like just yesterday we brought our first Language Model home from the hospital. On a diet of the entire internet it’s grown Large and, like every parent hopes, omniscient. It’s humbling to see its attention span as ravaged by bingeing on Reddit and the like as mine has been.
‘Omniscient autocomplete’ is less an aspersion than a reminder of what it’s actually doing. Given a series of word snippets (tokens) strung together, Large Language Models like ChatGPT predict which token is most likely to come next. It’s good at guessing because it’s well read and we’ve had humans give it treats when it’s done a good job[1].
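To make that concrete, here’s a minimal sketch of next-token prediction using GPT-2 via Hugging Face’s transformers library (GPT-2 is a stand-in here; ChatGPT’s weights aren’t public):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

text = "It's like we finish each other's"
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, sequence_length, vocab_size)

# The distribution over the *next* token lives at the last position.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {p:.3f}")
```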
Omniscient autocomplete is great at solving autocomplete problems. It’s filled with a discrete set of facts and opinions it’s seen on the internet, and it’s seen enough examples of deductive reasoning to be able to infer things it hasn’t observed directly[2].
What's not an autocomplete task?
There are a lot of autocomplete problems in the world. Writing this post is one of them. Answering Jeremiah’s Slack message about why this post is late is another. LLMs are water to anyone who had the misfortune of building a chatbot or summarizing documents in the last decade. I think autocomplete problems are valuable, but, much like I responded to Jeremiah’s Slack, “it’s just not my focus at the moment”.

In math, you learn early that the easiest[3] way to solve a problem is simply to recast it into a problem that’s already been solved.
Does my differential equation have a solution? It sure does if you recast it as a fixed point problem and smack it with the Banach Fixed Point Theorem[4].
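In symbols, the recast is the classic Picard trick (standard textbook material, nothing of mine):

```latex
% The IVP  y'(t) = f(t, y(t)),  y(t_0) = y_0  is equivalent to finding a
% fixed point of the integral operator
\[
  (Ty)(t) = y_0 + \int_{t_0}^{t} f\bigl(s, y(s)\bigr)\,ds,
  \qquad Ty = y \iff y' = f(t, y).
\]
% If f is Lipschitz in y, then T is a contraction on a small enough
% interval, and the Banach Fixed Point Theorem hands you a unique solution.
```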
Can we design a matching program for medical residencies so that everyone is incentivized to tell the truth? We sure can if we recast it as a private information game and crack it over the head with Gale-Shapley.
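For the curious, deferred acceptance fits in a few lines. A toy sketch of the Gale-Shapley algorithm, assuming one slot per program and complete preference lists (the real NRMP match relaxes both):

```python
def gale_shapley(resident_prefs, program_prefs):
    """resident_prefs: {resident: [programs, best first]}
       program_prefs:  {program: [residents, best first]}"""
    # rank[p][r] = how much program p likes resident r (0 = best).
    rank = {p: {r: i for i, r in enumerate(prefs)}
            for p, prefs in program_prefs.items()}
    free = list(resident_prefs)          # residents without a match
    next_choice = {r: 0 for r in free}   # index of the next program to try
    match = {}                           # program -> tentatively held resident

    while free:
        r = free.pop()
        p = resident_prefs[r][next_choice[r]]
        next_choice[r] += 1
        if p not in match:
            match[p] = r                 # program had an open slot
        elif rank[p][r] < rank[p][match[p]]:
            free.append(match[p])        # program trades up, jilting its hold
            match[p] = r
        else:
            free.append(r)               # rejected; r proposes again later
    return {r: p for p, r in match.items()}

# residents = {"ann": ["mercy", "city"], "bob": ["city", "mercy"]}
# programs  = {"mercy": ["bob", "ann"], "city": ["ann", "bob"]}
# gale_shapley(residents, programs)  # -> {"bob": "city", "ann": "mercy"}
```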
Jacques Hadamard (can I call you Jackie?) said “The shortest path between two truths in the real domain passes through the complex domain.” If autocomplete problems are solved, then any problem that can be reliably translated into and back from an autocomplete problem is solved too (where my category theorists at?).
We’ve seen glimmers of this.
Research has shown LLMs to reliably do zero-shot[5] classification. To build a spam classifier, we used to manufacture features from thousands of emails, manually label them, and artisanally craft a bespoke model to learn the relationship between our features and labels. Now, we show OpenAI the text of an email and simply ask it: is this spam or not?
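In code, that “simply ask it” is nearly literal. A minimal sketch, assuming the OpenAI Python client (v1+) with an API key in your environment; the model name and prompt wording are just illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def is_spam(email_text: str) -> bool:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer with exactly one word: SPAM or HAM."},
            {"role": "user", "content": f"Is this email spam?\n\n{email_text}"},
        ],
        temperature=0,  # we want a label, not creativity
    )
    return response.choices[0].message.content.strip().upper() == "SPAM"
```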
Research has shown LLMs to reliably do zero-shot entity extraction. To extract entities, we used to pray to SpaCy or Gensim that classical linguistics could help us identify proper nouns in our documents. If you were feeling particularly adventurous, you could try to get buy-in to use conditional random fields before throwing it all in the backlog. Now, we show OpenAI the text of a resume and simply ask it: “where did this person go to school and what degrees did they get there?”
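Same trick, but now we ask for JSON so the answer lands back in software-land. Again a sketch under the same API assumptions; a production version would validate the model’s output rather than trust json.loads to succeed:

```python
import json
from openai import OpenAI

client = OpenAI()

def extract_education(resume_text: str) -> list[dict]:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": (
                "Extract education history. Reply with JSON only, shaped like: "
                '[{"school": "...", "degree": "..."}]'
            )},
            {"role": "user", "content": resume_text},
        ],
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)
```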
A corollary of these two emergent[6] behaviors is that LLMs can generate their own tasks and choose the right tools to solve them. To do this five years ago, you’d have had to wait four years. You simply describe what your tools do and how to use them, give your LLM an amorphous task, and ask: “oh, great OpenAI, what sayeth you the best tool to use to solve my lowly task?” We can unfurl workflows from a simple objective and dynamically generate our own tasks.
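A sketch of what that tool choice can look like: describe the tools in plain language and let the model pick. The tool names here are made up:

```python
from openai import OpenAI

client = OpenAI()

TOOLS = {
    "calculator": "Evaluates arithmetic expressions.",
    "web_search": "Looks up current facts on the web.",
    "calendar": "Reads and writes events on my calendar.",
}

def choose_tool(task: str) -> str:
    tool_descriptions = "\n".join(f"- {name}: {desc}" for name, desc in TOOLS.items())
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": (
                "You pick tools. Given a task, reply with the single best "
                f"tool name from this list:\n{tool_descriptions}"
            )},
            {"role": "user", "content": task},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

# choose_tool("What's 17% of $2,340?")  -> "calculator" (hopefully)
```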
This process of translating the task we actually want completed (classification, extraction, tool choice) into an autocomplete task is called prompt engineering. In a sense, prompt engineering is a transpiler[7] (or functor): we take an instruction set in one language and translate it into an instruction set native to our LLM, which we’re treating as a runtime.
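Here’s a toy version of that transpiler: a decorator that renders a Python function’s signature and docstring into a prompt and treats the LLM as the runtime. To be clear, this is my hand-rolled sketch of the idea, not Marvin’s implementation:

```python
import inspect
from openai import OpenAI

client = OpenAI()

def ai_fn(fn):
    signature = inspect.signature(fn)

    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        # "Transpile" the function into natural language for the LLM runtime.
        prompt = (
            f"You are the function {fn.__name__}{signature}.\n"
            f"Its docstring: {fn.__doc__}\n"
            f"Inputs: {dict(bound.arguments)}\n"
            "Return only the function's output, nothing else."
        )
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return response.choices[0].message.content.strip()

    return wrapper

@ai_fn
def opposite(word: str) -> str:
    """Return the opposite of the given word."""

# opposite("hot")  -> "cold", courtesy of the runtime
```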
Pedantic to semantic: leave me out of it.
Transpilers should be neither seen nor heard. After all, one of their promises is to let a thoughtless single-language engineer like me approximate a mediocre engineer in languages I don’t know. They exist ambiently, and I love that experience as a developer.
For LLMs, this means creating a semantic layer between code and LLMs: something that translates the pedantry of software into the … semantry (?) of LLMs and back again. I deeply believe that working with LLMs should feel just as ambient, and that’s what motivated my choice to join the Marvin team at Prefect.
[1] If you raise a Language Model on a diet of human language and give it a treat when it answers your question correctly, you get ChatGPT. If you raise it on a diet of code and give it a treat when it writes a good function, you get Codex. If you raise it on Arrested Development and reward it for unloading the dishwasher you get, well, me.
[2] If you’ve had the misfortune of taking real analysis: LLMs are, in some sense, the deductive closure of the facts and opinions in their training corpus. Of course, much like in any metric space, the closure looks wildly different depending on what metric you choose. Different metrics for what counts as ‘good reasoning’ are, in some sense, what differentiate the menagerie of Cohere, Anthropic, OpenAI, and others.
Good reasoning lets you emulate a sharp intern that costs $0.0004/hour. Bad reasoning in LLMs has the same impact as bad reasoning in humans: it can be factually wrong, and it can perpetuate systemic bias that hurts people.
[3] Refuted of course by Charles Fefferman (can I call you Charlie?): “If you're stuck on a problem, then one way out is to interest Terry Tao”.
[4] Does my game have a Nash equilibrium? It sure does if you recast it as a fixed point problem and smack it with the Kakutani Fixed Point Theorem.
[5] “Look ma, no training data!”
[6] When you hear folks say “emergent”, you should think “ohhh, like the scene where the Velociraptors unlocked the doors”. Technically we never instructed it to do these things explicitly, but baby’s learning.
[7] I’m using transpiler really liberally here. It has a precise meaning that people who think about compilers care deeply about. For me it means “big ole’ magic program that lets me take advantage of optimized language X by writing in language Y”. I’ve called React Native a transpiler, even though most folks who have a rigid definition would be right to call me out that it’s not one.