Tinkering is a way of understanding difficult problems, of wrapping our heads around them and quantifying the unknowns.
-Gever Tulley
The main work of AI/ML is focused on making better and better predictions. However, the reason research is conducted so heavily in this area is that no one knows ahead of time what will work. It's a funny thing that for a field so focused on prediction, we can't predict what will move the field forward. To make gains in the field, you need to maximize your exposure to randomness, and that means tinkering. Tinkering is quickly playing with, testing, and trying out ideas to see what works and how things behave. It means having a large number of failures, which are typically low cost but consume the majority of your time. Tinkering builds your understanding of what works and what doesn't while you search for large improvements.
It was once described to me that a company is composed of sales and engineering; everything else is overhead. Both of these areas require tinkering. In sales, you are figuring out the right pitch for the right individual. Engineering requires playing with current and upcoming technology to find better ways to solve problems efficiently. Being effective in either area requires an understanding of a few mental models about where gains actually come from.
Mediocristan and Extremistan
In Nassim Taleb's The Black Swan, he describes two different types of statistical distributions: the normal distribution and the power law distribution. He dubs domains governed by normal distributions Mediocristan and those governed by power law distributions Extremistan. Mediocristan tends to govern physical quantities such as height, weight, caloric consumption, and thermal capacity, while Extremistan governs purely numerical quantities based more on interactions, such as wealth, innovation, networks, and number of books sold. Extremistan is governed by outliers while Mediocristan is not. To explain further:
Mediocristan: “Assume you round up a thousand people randomly selected from the general population and have them stand next to each other in a stadium […] Imagine the heaviest person you can think of and add him to that sample. Assuming he weighs three times the average, between four hundred and five hundred pounds, he will rarely represent more than a very small fraction of the weight of the entire population (in this case, about half a percent). […] You can get even more aggressive. If you picked the heaviest biologically possible human on the planet (who yet can still be called a human), he would not represent more than, say, 0.6 percent of the total, a very negligible increase.”
Extremistan: “Consider by comparison the net worth of the thousand people you lined up in the stadium. Add to them the wealthiest person to be found on the planet—say Bill Gates, the founder of Microsoft. Assume his net worth to be close to $80 billion—with the total capital of the others around a few million. How much of the total wealth would he represent? 99.9 percent? [...] For someone’s weight to represent such a share, he would need to weigh fifty million pounds!”
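To make the difference concrete, here is a quick simulation in the spirit of Taleb's stadium examples. It's a sketch with invented parameters (a normal distribution for weight, a Pareto distribution for wealth), not a calibration to real data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Mediocristan: 1,000 weights drawn from a normal distribution
weights = rng.normal(loc=170, scale=30, size=1000)
print(f"Heaviest person's share of total weight: {weights.max() / weights.sum():.2%}")

# Extremistan: 1,000 net worths drawn from a heavy-tailed Pareto distribution
wealth = (rng.pareto(a=1.1, size=1000) + 1) * 50_000
print(f"Richest person's share of total wealth:  {wealth.max() / wealth.sum():.2%}")
```

Re-run this with different seeds and the heaviest person's share barely moves from a fraction of a percent, while the richest person's share swings wildly and can approach the entire total. Outliers are rounding errors in Mediocristan and the whole story in Extremistan.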
What does this have to do with tinkering? Innovation lives in Extremistan: the impact of true innovation is orders of magnitude beyond both its cost and the typical result. Only by increasing your surface area of exposure to randomness can you hope to have an outsized impact. Tinkering increases your exposure to randomness by providing many attempts to find something big. However, tinkering comes at a cost, as most attempts end in failure. The most likely outcome is that nothing effective will have been found. You will, however, have learned a lot in the process, and that learning should be highly valued. Tinkering increases your potential exposure to positive Black Swans. Taleb describes a Black Swan as follows:
What we call here a Black Swan (and capitalize it) is an event with the following three attributes.
First, it is an outlier, as it lies outside the realm of regular expectations, because nothing in the past can convincingly point to its possibility. Second, it carries an extreme 'impact'. Third, in spite of its outlier status, human nature makes us concoct explanations for its occurrence after the fact, making it explainable and predictable.
I stop and summarize the triplet: rarity, extreme 'impact', and retrospective (though not prospective) predictability. A small number of Black Swans explains almost everything in our world, from the success of ideas and religions, to the dynamics of historical events, to elements of our own personal lives.
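To see what exposure to positive Black Swans looks like numerically, here is a toy portfolio of tinkering attempts. The hit rate and payoff distribution below are invented for illustration; the point is the shape of the outcome, not the numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

def tinker(n_attempts: int) -> np.ndarray:
    """Each attempt costs 1 unit; ~1% of attempts hit, with power-law payoffs."""
    hits = rng.random(n_attempts) < 0.01                 # rare successes
    payoffs = np.where(hits, (rng.pareto(a=1.5, size=n_attempts) + 1) * 100, 0.0)
    return payoffs - 1                                   # net of each attempt's cost

for n in (10, 100, 1000):
    results = tinker(n)
    print(f"{n:>4} attempts: net return {results.sum():>8.1f}, "
          f"best single attempt {results.max():>7.1f}")
```

With ten attempts you usually just eat the cost. With a thousand, a handful of outliers typically pays for all the failures combined. More attempts means more exposure, and in Extremistan the best attempt dominates the total.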
Alphabet's X
I haven't heard of any organization more focused on tinkering than Alphabet's X. X is Alphabet's moonshot factory. Their goal is to have "10x impact on the world's most intractable problems, not just 10% improvement." X is led by Astro Teller, who has some quite interesting ways of thinking about company culture. He gave a great talk at Stanford many years ago about how X celebrates failure to push innovation forward. Teams set audacious goals every quarter that should have only a 10% probability of being accomplished. People are actually rewarded for failing on these projects, financially and socially. This aligns incentives with what management says it wants, rather than just paying lip service to it. The culture of X is one of tinkering: trying lots of stuff to learn and figure out what works. If something isn't working, kill the project quickly.
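If each goal really has only a 10% chance of success, the value comes from running many of them. Assuming independent projects, the chance that at least one succeeds is 1 - 0.9^n:

```python
# Chance that at least one of n independent 10%-probability projects succeeds
for n in (1, 5, 10, 20):
    print(f"{n:>2} projects: {1 - 0.9 ** n:.1%}")
# 1 project:  10.0%; 5: 41.0%; 10: 65.1%; 20: 87.8%
```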
X is a learning organization. However, that doesn't mean they take big risks blindly. They want to take calculated risks where there is an opportunity to learn and improve. They don't intend to do something that will obviously fail. Instead, they recognize that they can't see all of the uncertainty and environmental factors around a project 5-10 years into the future. Their process tries to account for uncertainty by having a robust weeding-out process for determining what to continue pursuing. They believe that the cost of a false positive (we think it's a great idea but it isn't) is much worse than that of a false negative (we think it isn't a great idea but it actually is). This view only becomes a problem if they are supply constrained on great ideas or tough problems.
For X, the rate-limiting step is not being smart or creative. Instead, the challenge is getting a huge number of ideas on the table and weeding through them effectively. A tiny fraction of these ideas then moves on to the next stage. Most of the de-risking of a project in the early stages is on the technical side through prototyping (tinkering). The later stages are about economic viability and impact. The goal is to kill more than half the projects even after the technical side has been proven. X wants to end projects for the right reasons so it can remain focused on its overall objective of outsized impact on a 5-10 year time horizon. They want to discover as fast as possible whether something will work and, if not, move on to something else. They are continuously exploring and doubling down on what is working.
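Here is a sketch of why that asymmetry makes sense, with all rates and costs invented for illustration. If great ideas are rare and pursuing an idea past the screen is expensive, false positives burn real money in proportion to the enormous pool of duds, while false negatives only cost you upside you never see:

```python
import numpy as np

rng = np.random.default_rng(7)

n_ideas = 10_000
p_great = 0.001          # truly great ideas are rare
fn_rate = 0.50           # half of the great ideas get wrongly killed
fp_rate = 0.02           # a small fraction of duds slip through the screen
pursue_cost = 1_000_000  # cost of pursuing an idea past the screen

great = rng.random(n_ideas) < p_great
passed = np.where(great,
                  rng.random(n_ideas) > fn_rate,   # great idea survives screening
                  rng.random(n_ideas) < fp_rate)   # dud sneaks through anyway

true_positives = int((passed & great).sum())
false_positives = int((passed & ~great).sum())
print(f"Pursued: {int(passed.sum())} ideas "
      f"({true_positives} great, {false_positives} duds)")
print(f"Burned on duds: ${false_positives * pursue_cost:,}")
```

Even with only a 2% false positive rate, roughly two hundred duds get through, while halving the true positives still leaves winners as long as new ideas keep arriving. Hence the emphasis on a robust weeding-out process, and why the asymmetry only breaks if the supply of great ideas dries up.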
Exploration Budget
I wrote previously about the concept of an exploration budget for dealing with over-optimization. An exploration budget is a way to set aside a certain allocation strictly for trying things out. It's important to know that most of this time will be "wasted" and not to expect a return on any particular initiative. Over the long run, however, it should produce superior returns due to exposure to nonlinear upside. That is, you explore to find positive Black Swans. But in order to explore, you need to allocate resources to it.
Budget allocations are estimates. They estimate how much effort and how many resources are required to achieve a given objective, and there is a large amount of uncertainty around what that actually requires. In an ideal world, you would know the exact cost of achieving a particular outcome, but unfortunately the world is far from ideal. To account for the uncertainty, buffers or safety factors are built into the planning process, with the hope that they are neither so large as to be wasteful nor so small as to lead to "unexpected" overruns.
Now let's go back to something Astro Teller mentioned in his talk above. If you finished designing and building a new product and then suddenly lost all of the plans, designs, and other data, how long would it take you to recreate it? Maybe 10% of the total time? Then what was the other 90% for? You were learning how to create that new product. Astro Teller advocates finding ways to learn quickly in order to compress that 90%. Alternatively, you can take part of that 90% to explore alternative paths, not just for the product but for the techniques behind it. In effect, allocate part of that 90% of project effort to tinkering and exploring new methods that might have outsized gains.
In 2022, Alphabet's operating expenses were $81.79B, and the estimated cost to run X is somewhere between $3.5B and $8B. That means Alphabet has allocated roughly 4-10% of its budget to high-impact tinkering.
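The arithmetic behind that range is just the cost estimate divided by the total budget; you can run the same calculation on your own numbers:

```python
total_budget = 81.79e9                   # Alphabet's 2022 operating expenses
x_cost_low, x_cost_high = 3.5e9, 8.0e9   # estimated annual cost of running X

print(f"Exploration share: {x_cost_low / total_budget:.1%} "
      f"to {x_cost_high / total_budget:.1%}")
# Exploration share: 4.3% to 9.8%
```

So how can we set our own exploration budget? Let's first look at our constraints: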
We can't tinker or explore all the time otherwise we'll never get the actual work finished.
Since the ideal tinkering impact exceeds the cost by orders of magnitude, the allocation should be relatively small.
Additionally, we need to find a way to maximize the number of potential attempts to increase the chances of a successful attempt.
The allocation should fit into the typical buffer so as not to put any project at risk of failure.
Knowing these constraints, what should we do? It is evident that the majority of our budget should go to finishing the actual objective. That likely means spending somewhere between 75-95% of our budget on the actual objective and 5-25% on exploration. This is the equivalent of deploying a barbell strategy, in which most of our assets go to "safe" investments and a small portion is allocated to high-risk, high-reward investments. Now that we know the general ballpark, how do we determine the exact proportion to use? Experimentation, simulation, rules of thumb, and understanding where you can actually shed buffer while still remaining robust. That answer might be unsatisfactory, but every problem, environment, and personnel situation is different. Set the allocation too high and your objective is put at risk; set it too low and you likely won't get the exposure required for the gains.
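As a minimal sketch of that barbell, assume the "safe" work reliably delivers the objective while exploration usually delivers nothing and occasionally pays off enormously (both distributions are made up):

```python
import numpy as np

rng = np.random.default_rng(1)

def barbell(explore_frac: float, periods: int = 10_000):
    """Mean output and typical floor for a given exploration allocation."""
    safe = (1 - explore_frac) * 1.0                      # objective work delivers reliably
    hits = rng.random(periods) < 0.02                    # exploration rarely hits...
    explore = explore_frac * np.where(hits, 100.0, 0.0)  # ...but pays off 100x when it does
    outcome = safe + explore
    return outcome.mean(), np.quantile(outcome, 0.05)

for frac in (0.05, 0.15, 0.25, 0.50):
    mean, floor = barbell(frac)
    print(f"explore {frac:>4.0%}: mean output {mean:.2f}, typical floor {floor:.2f}")
```

The mean rises with the exploration fraction thanks to the rare big hits, but the floor drops one-for-one with it. That floor is what has to stay inside your project buffer, which is exactly the constraint above: tune the allocation to the buffer you can actually afford to spend.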
When dealing with randomness, understanding and controlling your exposure is essential. A cultural mindset of tinkering, trying things out, and understanding how things work will help identify opportunities for outlier impact. Understand that most tinkering will not result in immediate impact; in fact, most of it will look like failure. But then, you are tinkering to learn.