"In theory there is no difference between theory and practice - in practice there is" - Yogi Berra
In his book Growth, Vaclav Smil discusses all the various ways things can grow along with their associated models. One of the most common models of growth that occurs within the world is the S-curve growth, also known as logistic growth. This model governs how anything with constraints grows, from organisms to the body weight of individuals to populations to businesses. Growth starts out small, builds to a high rate, and eventually slows down to nothing as the carrying capacity is reached.
Figure 1. The typical S-curve of business growth
For our purposes, in the context of a company or a product, we can break this curve down to three distinct phases. In the first, a discovery happens. If the object in question did not exist before in any capacity, we could call this the Zero to One phase as described by Peter Thiel. In this phase, insights and secrets are discovered that lead to a new company or product. The new entity slowly grows and information is gained as it interacts with the world. In the second phase, growth accelerates. This growth requires an increasing amount of nourishment along with people, processes, and tools to create scale. In the third phase, growth declines as the product or entity reaches the maximum capacity its ecosystem can support.
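The three phases can be sketched numerically with the standard logistic function. This is a minimal illustration, not a model of any particular business: the carrying capacity `K`, growth rate `r`, midpoint `t0`, and the three sample times are all assumed values chosen only to show the shape of the curve.

```python
import math


def logistic(t, capacity=1.0, rate=1.0, midpoint=0.0):
    """Size at time t under logistic growth: K / (1 + exp(-r * (t - t0)))."""
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


def growth_rate(t, capacity=1.0, rate=1.0, midpoint=0.0):
    """Instantaneous growth dN/dt = r * N * (1 - N / K)."""
    n = logistic(t, capacity, rate, midpoint)
    return rate * n * (1.0 - n / capacity)


# Illustrative (assumed) parameters, not taken from any real company.
K, r, t0 = 1000.0, 1.0, 10.0

# One sample time per phase: early discovery, mid-curve scaling, late maturity.
for label, t in [("discovery", 2.0), ("scale", 10.0), ("maturity", 18.0)]:
    size = logistic(t, K, r, t0)
    speed = growth_rate(t, K, r, t0)
    print(f"{label:>9}: size={size:8.1f}  growth per unit time={speed:7.2f}")
```

Growth is slow at both ends and fastest at the midpoint, where the curve has reached half of its carrying capacity.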
It is important to note that the skills required in each phase are different. What it takes to make discoveries is different from what it takes to scale, which is different from what it takes to extract value. These skills are also valued differently: people and companies will pay more to scale than they will to discover, which is the focus of this post. This is evident from budget allocations, the cost of tools at various stages, and the worth of various skills. First, we'll take a look at the differences between discovery and scale. Then we'll examine the salary differences between these two phases. Finally, we'll take a look at the lag time between discovery and scale. For the purposes of this discussion, we'll define a discovery as an insight, proof of concept, or other innovation that was previously unknown or unrecognized. We'll define scale or scaling as increasing the size, scope, or capacity of the discovery or business in a proportional and efficient manner.
Scaling Over Discovery
From 2006 to 2009, Netflix famously held a modeling competition (The Netflix Prize) offering $1M to the top team that could improve its recommendation algorithm. You can find the write-up on the winning solution here. Netflix ended up not implementing the winning solution because it was too hard to scale effectively. So a potentially lower-performing algorithm helped it become a $100B+ company.
Discovery is romantic. It is the a-ha moment we've all had over and over as we learn and solve problems. It is a step function where the world and your perception of it are very different very quickly. Discovery is about revealing previously unknown truths. Scaling, in contrast, is about consistency and yield. It is a grind. Scaling takes work and its effects are less immediate. However, the effects of scaling tend to be larger than those of the initial discovery, because scaling a discovery makes it much more useful, accessible, and valuable.
If a chemist can create 10 grams of a previously non-existent substance, say enriched uranium, it can be a game-changing discovery. However, there isn't a lot you can do with 10 grams. For the discovery to become useful, it needs to scale up. In this example, it would take a chemical engineer to determine how to go from 10 grams of enriched uranium to the 64 kilograms that were needed for the first atomic bomb. That's a 6,400-fold increase in required yield, or nearly four orders of magnitude.
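As a quick check on the size of that jump, the scale-up factor and its order of magnitude can be computed directly. The two quantities come from the example above; the variable names are only illustrative.

```python
import math

lab_yield_grams = 10.0            # lab-scale quantity from the example
required_yield_grams = 64_000.0   # 64 kilograms expressed in grams

# Ratio between the required yield and the lab-scale yield.
scale_factor = required_yield_grams / lab_yield_grams

# Orders of magnitude are the base-10 logarithm of that ratio.
orders_of_magnitude = math.log10(scale_factor)

print(f"scale-up factor: {scale_factor:,.0f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```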
Now to be clear, it takes a combination of discoveries and scale to achieve success. While scaling can lead to other discoveries, scaling itself relies on an initial discovery. The rub is that it takes a lot of extra work to make a discovery useful at scale. While you may have found a way to do something once, being able to consistently create something that performs the same takes a lot of work.
This hard work of scaling is also part of why patents exist. It takes time, effort, and capital to scale. Because scaling is hard and expensive, incumbent companies that have already built scaling infrastructure and distribution networks have an advantage. Without the protection of a patent, a discoverer would likely lose the ability to monetize: while they were still building their scaling infrastructure, incumbents would already be using their existing scale to extract value from the discovery. The differences between discovery and scaling can be simplified to the old adage: pioneers get shot, settlers get rich.
Salary Comparison
Digging into the differences between scale and discovery further, we can examine data from the Bureau of Labor Statistics (BLS). The mission statement of the BLS is to "measure labor market activity, working conditions, price changes, and productivity in the U.S. economy to support public and private decision making". As such, it collects copious amounts of data on people in various professions and their associated salaries. You can find the national estimates for occupations and wages here. Note that this data was released in April 2024 but was based on survey data from May 2023.
Two pairs of professions are the easiest to compare in terms of scale. The first pair is chemists and chemical engineers. To a first-order approximation, you can say that the job of chemists is to find and verify discoveries, while the job of chemical engineers is to scale those discoveries into a repeatable process. The second pair is data scientists and software engineers. In a similar manner, data scientists are responsible for finding insights in data or creating algorithms from those findings, and software engineers are responsible for creating the infrastructure and applications to serve those algorithms at scale.
Here's the BLS data for chemists and chemical engineers:
Here's the BLS data for statisticians, data scientists, and software developers:
A few notes and observations. First, this data is based solely on salary and not total compensation, which can be very different at higher levels of employment, particularly in the tech industry. Second, as we've discussed previously, these are solely averages, which obscure the full distribution of salaries. Third, chemical engineers out-earn chemists by about 28% on average and software developers out-earn data scientists by about 16% on average. Fourth, you might implicitly think that salary is related to supply, but it is related to value. While there are fewer chemical engineers than chemists, there are significantly more software developers than data scientists. Fifth, statisticians were grouped with data scientists to show a small effect of scale. While not always true, data scientists tend to deal with much larger amounts of data than statisticians. Many would contend that data scientists are better at scaling their solutions programmatically. That difference shows up in the wage data as an approximately 9% difference in average wage.
If you're interested in what the differences in total compensation (base salary + bonus + equity) in tech might be, head to levels.fyi and look at self-reported comparisons. Below are differences for data scientists and ML/AI software engineers. The difference is somewhere between 40% and 60%, depending on the percentile compared. This is an even bigger gap between discovery and scale. Granted, this is a bit coarse, since in the tech industry it's more appropriate to compare research or applied scientists to ML/AI engineers (where the data is lacking). However, the differences are still informative.
Lag Between Discovery and Usefulness
Scaling is more valuable than the initial discovery because it has a larger impact. People imagine that an initial discovery will let them instantly create change, but forget about producing it in the required quantity. A lot of work goes into figuring out how to scale up, whether in distribution or production, and as scale increases so does the number of unknown issues. Let's say you have a failure rate of 0.01%. At a small scale of, say, 100 units or interactions per day, you would expect to go about 100 days between failures. At 10,000,000 units or interactions per day, you would expect 1,000 failures per day. Most failures take time to investigate. How long does it take to process a failure? Scaling requires additional processes that don't exist at smaller levels in order to keep improving.
Scale and distribution are the bridge between when something is created and when it becomes valuable for many. For a more relatable depiction of scale vs discovery, let’s look at some historical discoveries and the time until they were crafted into something useful:
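The arithmetic behind this example can be sketched under a simple expected-value model, where expected failures per day equal volume times failure rate. The 0.01% rate and the two daily volumes come from the text; the 30 minutes of investigation per failure is a purely hypothetical figure, added only to show how triage effort grows with volume.

```python
def expected_failures_per_day(units_per_day, failure_rate):
    """Expected number of failures per day at a given daily volume."""
    return units_per_day * failure_rate


def expected_days_between_failures(units_per_day, failure_rate):
    """Average number of days between failures (inverse of the daily expectation)."""
    return 1.0 / expected_failures_per_day(units_per_day, failure_rate)


def triage_hours_per_day(units_per_day, failure_rate, minutes_per_failure):
    """Person-hours per day needed to investigate every expected failure."""
    failures = expected_failures_per_day(units_per_day, failure_rate)
    return failures * minutes_per_failure / 60.0


FAILURE_RATE = 0.0001        # 0.01%, from the text
MINUTES_PER_FAILURE = 30.0   # hypothetical investigation time per failure

for units in (100, 10_000_000):
    per_day = expected_failures_per_day(units, FAILURE_RATE)
    gap = expected_days_between_failures(units, FAILURE_RATE)
    hours = triage_hours_per_day(units, FAILURE_RATE, MINUTES_PER_FAILURE)
    print(f"{units:>10,} units/day: {per_day:,.2f} failures/day, "
          f"{gap:,.3f} days between failures, {hours:,.2f} triage hours/day")
```

At low volume a failure is a rare event that one person can look into by hand. At high volume the same failure rate produces more investigation work per day than a small team can absorb, which is why scale forces new processes.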
Electricity was discovered in the 18th century but was not made widespread until the late 19th and early 20th centuries.
The Wright brothers achieved the first powered flight in 1903, but commercial air travel didn't become common until the 1950s and 1960s, with the development of larger, more reliable, and more affordable aircraft, along with the establishment of widespread aviation infrastructure.
The internet was created in the 1960s as ARPANET, but it wasn't commercialized until the 1990s, following the creation of the World Wide Web by Tim Berners-Lee.
The first practical visible-spectrum LED was developed in the early 1960s but LED lighting didn't become widely used until the 2010s due to improvements in efficiency, brightness, and reductions in cost.
Transformers were introduced in 2017 and were available through APIs as GPT models for a few years. They didn't explode at scale until ChatGPT was released in late 2022, about five years after the initial discovery.
While the allure of discovery captures our imagination with its moments of epiphany, the true value lies in the arduous journey of scaling. People pay more for scaling than for discovery because scaling transforms a groundbreaking idea into something that can be widely utilized and sustained, driving significant impact and progress. Companies routinely buy patents. For such a purchase to be a successful investment, the company has to make more from the patent than it paid for it; otherwise it loses money. That means the value created by scaling the discovery is worth more than the discovery itself. Understanding and investing in the processes and skills necessary for scaling are therefore crucial for turning discoveries into transformative realities, which is why scaling commands a higher premium in the market.