I couldn't tell you in any detail how my computer works. I use it with a layer of automation.
-Conrad Wolfram
Note: I’ve been sitting on this for a while, and this article talks about the general technology of GPTs, not the recently announced customizable GPTs from OpenAI, which share the name as part of OpenAI's effort to control the brand. Those products still fall within the paradigm described below, however.
Generative Pre-trained Transformers, or GPTs, are a new type of computer. Well, maybe more like a new operating system, except that they are also modular. Previously, I debated his Five Contrarian Theses on AI and wrote about how the technologies that underpin GPTs create a new type of computational paradigm. Let's expand on that now.
A Bit of History
If we look back at the first model of a computer, the Turing Machine, we can find a lot of similarities between GPTs and that original mathematical model. Granted, modern computers work a bit differently than Turing Machines, but they are still mathematically equivalent. Wikipedia describes Turing Machines as follows:
A Turing machine is a mathematical model of computation describing an abstract machine that manipulates symbols on a strip of tape according to a table of rules. Despite the model's simplicity, it is capable of implementing any computer algorithm.
The machine operates on an infinite memory tape divided into discrete cells, each of which can hold a single symbol drawn from a finite set of symbols called the alphabet of the machine. It has a "head" that, at any point in the machine's operation, is positioned over one of these cells, and a "state" selected from a finite set of states. At each step of its operation, the head reads the symbol in its cell. Then, based on the symbol and the machine's own present state, the machine writes a symbol into the same cell, and moves the head one step to the left or the right, or halts the computation. The choice of which replacement symbol to write, which direction to move the head, and whether to halt is based on a finite table that specifies what to do for each combination of the current state and the symbol that is read.
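To make the table of rules concrete, here is a minimal sketch of a Turing machine in Python. The rule table is an invented toy example (it appends a 1 to a unary number); real machines differ only in the size of their tables.

```python
from collections import defaultdict

def run_turing_machine(tape, rules, state="start", halt="halt"):
    cells = defaultdict(lambda: "_", enumerate(tape))  # "_" is the blank symbol
    head = 0
    while state != halt:
        # Each rule maps (state, symbol) -> (write, move, next_state).
        write, move, state = rules[(state, cells[head])]
        cells[head] = write
        head += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells))

# Toy rule table: scan right past the 1s, write a 1 on the first blank, halt.
rules = {
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("1", "R", "halt"),
}

print(run_turing_machine("111", rules))  # -> 1111
```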
While it isn't necessary, it's easiest to think of computers and Turing Machines as running on strings of 0s and 1s, or binary. When we look at a GPT system, it works in a very similar manner to a Turing Machine, except that instead of the typical binary alphabet it uses a system of tokens. The number of tokens available to a GPT system depends on how it was trained, but the vocabulary is typically measured in the tens of thousands of distinct tokens.
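For a sense of what this token alphabet looks like in practice, OpenAI's open-source tiktoken library exposes the vocabularies used by its GPT models. A quick sketch, assuming tiktoken is installed:

```python
# Inspect the token "alphabet" of GPT-2 using OpenAI's tiktoken library
# (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("gpt2")       # the tokenizer GPT-2 was trained with
print(enc.n_vocab)                        # vocabulary size: 50257 tokens
tokens = enc.encode("GPTs are a new type of computer.")
print(tokens)                             # a list of integer token ids
print([enc.decode([t]) for t in tokens])  # the text fragment behind each id
```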
In the same manner that a Turing Machine scans its memory tape, a GPT scans its tape of text to generate the next token. It's interesting to note that a Turing Machine doesn't deal with I/O (input/output): you need to wait for the machine to finish its computation. In a similar manner, you can't stop a GPT mid-computation and get the full result. However, unlike a Turing Machine's, a GPT's mid-computation state is understandable, because the computation occurs in natural language.
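Here is a minimal sketch of that scan-and-extend loop, using the public GPT-2 weights via Hugging Face's transformers library (the prompt and token count are arbitrary choices for illustration):

```python
# The GPT "tape" loop: read the token sequence so far, predict the next
# token, append it, repeat (pip install transformers torch).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tokenizer("The Turing machine", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(10):                    # extend the "tape" by 10 tokens
        logits = model(ids).logits         # scores over the whole vocabulary
        next_id = logits[0, -1].argmax()   # greedy: pick the most likely token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))            # the intermediate state reads as text
```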
Mechanics
Andrej Karpathy showed how to build GPT-2 from scratch in this video. If you don't already know how GPT systems work, it's well worth spending the time to find out. Even though the video covers GPT-2, it's still extremely relevant to how any GPT system works. The difference between GPT-1, GPT-2, GPT-3, and GPT-4 has mainly been size: there are some minor architectural differences, but the main changes have been the number of parameters and the amount of training data. The power of the system comes not only from the architecture but also from its scale and the data used to train it. If you train GPT-2 on your laptop or a cloud virtual machine as Karpathy shows, your results will likely perform far worse than the actual GPT-2 system unless you've acquired a similar amount of training data and are willing to pay for the compute.
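For flavor, here is a condensed causal self-attention block in PyTorch, in the spirit of Karpathy's walkthrough rather than his exact code. Scaling this idea up in heads, layers, parameters, and data is most of what separates the GPT generations.

```python
# A single-head causal self-attention block: each token attends only to
# tokens before it, which is what lets the model predict the next token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd: int, block_size: int):
        super().__init__()
        self.key = nn.Linear(n_embd, n_embd)
        self.query = nn.Linear(n_embd, n_embd)
        self.value = nn.Linear(n_embd, n_embd)
        # Lower-triangular mask: a token may only attend to earlier tokens.
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):                                # x: (batch, time, n_embd)
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        att = (q @ k.transpose(-2, -1)) / (C ** 0.5)     # attention scores
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        return att @ v                                   # weighted sum of values

x = torch.randn(1, 8, 32)                   # toy batch
print(CausalSelfAttention(32, 8)(x).shape)  # torch.Size([1, 8, 32])
```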
How we’ll end up programming on top of GPTs seems a bit reminiscent of how Manuel programs Mike in The Moon is a Harsh Mistress by Robert Heinlein. Manuel (one of the main characters) is a tech who is paid to fix Mike (an advanced AI). Their interactions are a type of programming in natural language, similar to how you might learn in conversation with others. Their initial conversation, in which Manuel performs the equivalent of a diagnostic check, goes as follows:
So they sent for me to find out why Mike wanted to give away ten million billion Authority Scrip dollars, and fix it before Mike overpaid somebody a mere ten thousand. I took it, time plus bonus, but did not go to circuitry where fault logically should be. Once inside and door locked I put down tools and sat down. “Hi, Mike.”
He winked lights at me. “Hello, Man.”
“What do you know?”
He hesitated. I know—machines don’t hesitate. But remember, Mike was designed to operate on incomplete data. Lately he had reprogrammed himself to put emphasis on words; his hesitations were dramatic. Maybe he spent pauses stirring random numbers to see how they matched his memories.
“‘In the beginning,’” Mike intoned, “‘God created the heaven and the earth. And the earth was without form, and void; and darkness was upon the face of the deep. And—’”
“Hold it!” I said. “Cancel. Run everything back to zero.” Should have known better than to ask wide-open question. He might read out entire Encyclopaedia Britannica. Backwards. Then go on with every book in Luna. Used to be he could read only microfilm, but late ’74 he got a new scanning camera with suction-cup waldoes to handle paper and then he read everything.
“You asked what I knew.” His binary read-out lights rippled back and forth—a chuckle. Mike could laugh with voder, a horrible sound, but reserved that for something really funny, say a cosmic calamity.
“Should have said,” I went on, “‘What do you know that’s new?’ But don’t read out today’s papers; that was a friendly greeting, plus invitation to tell me anything you think would interest me. Otherwise null program.”
Mike mulled this. He was weirdest mixture of unsophisticated baby and wise old man. No instincts (well, don’t think he could have had), no inborn traits, no human rearing, no experience in human sense—and more stored data than a platoon of geniuses.
The conversation above is a very real possibility of how we will interact with some programs in the future: a natural language interface, probably spoken, as if the program were another human-like entity. That brings its own set of issues around the best way to speak to a program to get the outputs you want. I can foresee individuals who create SIs, or spoken interfaces, crafting the experience of how one interacts with an AI. These crafters of personality are already starting to take shape in services like Character.ai and the various voices of ElevenLabs.
At the same time, GPTs will create smaller programs for re-use or take actions based on the conversations we have with them. This is possible because GPTs can be augmented and built upon with more abilities. Right out of the box, one of the abilities of current GPTs is generating language. Code is itself a type of language, and GPTs can write code, potentially to augment themselves or run other processes, though the current state of code generation is hit or miss depending on subject matter and specificity. The power of GPTs, though, is their general-purpose ability: they can take action without pre-defined routines. They can also make use of external APIs to take more complicated actions that have already been figured out, just as programmers today take advantage of external APIs.
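As a sketch of that external-API pattern, OpenAI's chat completions interface supports tool calling, where the model proposes a structured call and the application executes it. The get_weather function, its schema, and the model name here are illustrative assumptions, not a recommendation:

```python
# Letting a GPT propose a call to an external API ("actuator") via
# OpenAI's tool-calling interface (pip install openai).
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical actuator for illustration
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # model name will vary
    messages=[{"role": "user", "content": "Is it raining in Oslo?"}],
    tools=tools,
)

# If the model decided a tool is needed, it returns a structured call;
# the application, not the model, then executes the real API request.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```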
A New World
What does that mean for the future of applications? As the world progresses, people will interact less and less with raw GPT systems, which will sit more in the position of an operating system. Instead, people will interact with applications built on top of these GPTs. Some people are calling these applications ‘agents’, but really they are simply a different type of generalized program. As I previously wrote, the power of agents/programs running on top of GPTs is in the general problem solving they can do. The fuzziness of these systems allows them to take action on things they haven’t been explicitly programmed for. The reason is that GPTs perform a type of metaprogramming (code that writes code), which is especially helpful when dealing with external actuators. GPTs can't do everything straightaway; they need quality actuators (typically external APIs that encapsulate an already-solved, complicated process) to take meaningful action. Hence, a multitude of applications will be built that extend the capabilities of the GPT layer.
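A rough sketch of that metaprogramming loop: ask the model to write a small routine, then run it as a disposable program. Everything here is illustrative; a real system would sandbox, strip formatting from, and review generated code before executing it.

```python
# Code that writes code: have the model generate a routine, then run it.
from openai import OpenAI

client = OpenAI()
prompt = "Reply with only a Python function named celsius_to_f(c) that converts Celsius to Fahrenheit."
code = client.chat.completions.create(
    model="gpt-4o-mini",  # model name will vary
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

namespace = {}
exec(code, namespace)  # assumes bare code came back; dangerous outside a sandbox
print(namespace["celsius_to_f"](100))  # hopefully 212.0
```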
These applications will provide an additional layer of abstraction so you can accomplish your goals without sweating the finer details. Unfortunately, you'll still need a lot of hard work to hammer out the last 20% of performance, so it won't be all sunshine and rainbows. You can’t just think something into existence. Creation is hard, and you'll need to take the time to refine what you actually want. The last mile of performance is always the toughest.
One big issue with these agent applications running on top of GPTs right now is response time: processing takes a relatively long time. That's a limiting factor for any application that needs to be real-time. However, if your tasks can be asynchronous, as when dispatching a task to an employee, the process will feel much more natural. While some might suggest simply scaling up the number of GPUs to run the process, availability is difficult at the moment: there's a war going on to obtain GPUs, and companies are racing to collect as many as possible. Even if the seemingly unending GPU appetite becomes satiated, I think response times may still be slow as larger and larger models with more capabilities are built. An increase in available compute induces demand for more power-hungry models. Just look at how fast programs opened in Windows 3.1, when compute was limited, compared to the time they take in Windows 11 with vastly more compute. Programs grow to take advantage of available hardware.
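For example, dispatching GPT-backed tasks asynchronously in Python hides the latency the way handing work to an employee does. The summarize coroutine below is a stand-in for a real multi-second model call:

```python
# Fire off slow model calls as background tasks and collect results later.
import asyncio

async def summarize(doc: str) -> str:
    await asyncio.sleep(2)  # stand-in for multi-second model latency
    return f"summary of {doc!r}"

async def main():
    tasks = [asyncio.create_task(summarize(d)) for d in ("q1 report", "q2 report")]
    print("tasks dispatched; doing other work...")
    for result in await asyncio.gather(*tasks):  # collect when ready
        print(result)

asyncio.run(main())
```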
Another issue is how to vet the outputs and actions taken by an agent. When automating actions, creating programs, or calling external API services, how do you know the action has been executed correctly, particularly if you aren't an expert in the action being taken? I've previously written that ways to perform hands-off verification are crucial to working with any automation system. That is even more important now that agents/programs/applications are built upon GPT systems known to hallucinate (make errors). Would you feel comfortable setting one up to buy things or make payments for you? Would you want a system to accidentally pay someone $5,000 instead of the requested $50? At the moment, the controls and verifications are not in place to ensure that extra quantities aren't purchased or that a value isn't fat-fingered. This issue extends even to low-risk things like sending emails or creating alerts.
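One hedged sketch of what such a verification layer could look like: hard limits and a human-in-the-loop check that sit outside the model, between the agent's proposal and the actuator. The limit, hooks, and payment stub are all invented for illustration.

```python
# A verification layer between an agent and a payment actuator. The point
# is that these checks live outside the model, so a hallucinated amount
# never reaches the real API.
MAX_AUTO_PAYMENT = 100.00  # anything above this needs a human (illustrative)

def require_confirmation(payee: str, amount: float) -> bool:
    return input(f"Pay {payee} ${amount:.2f}? [y/N] ").lower() == "y"

def send_payment(payee: str, amount: float) -> None:
    print(f"paid {payee} ${amount:.2f}")  # stand-in for a real payment API

def execute_payment(payee: str, agent_amount: float, user_requested: float) -> None:
    # Catch the $5,000-instead-of-$50 case before any money moves.
    if abs(agent_amount - user_requested) > 0.01:
        raise ValueError(f"agent proposed ${agent_amount}, user asked for ${user_requested}")
    if agent_amount > MAX_AUTO_PAYMENT and not require_confirmation(payee, agent_amount):
        raise PermissionError("payment not confirmed by user")
    send_payment(payee, agent_amount)

execute_payment("alice", 50.00, 50.00)  # fine: matches the request, under the limit
```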
Looking ahead to the near future, an explosion of applications is going to be built on top of GPTs. That means GPT-type systems will act more as a layer in the application stack, providing generalized functionality to the applications built on top of them. However, the work to refine an application beyond a proof of concept will still take a large amount of effort. Additionally, new methods for verifying the outputs of the GPT layer will emerge as their own secondary layer, and roles will emerge for shaping the personality and language interactions of applications built on the GPT layer. An exciting future lies ahead as we find ways to augment the world around us.