Will Model Weights Be Patented?
Similarities and differences between molecules and AI algorithms
One can state, without exaggeration, that the observation of and the search for similarities and differences are the basis of all human knowledge.
-Alfred Nobel
One of my favorite analogies for how the different parts of the Artificial Intelligence/Machine Learning (AI/ML) building process work is mineral extraction. In a typical mineral extraction process, deposits are located, machines extract the raw minerals, the extracted minerals are transformed into constituent products through various refining processes, and these refined materials are finally turned into the products we ultimately use. Analogously, in AI/ML we identify sources of data, build systems to extract that data, build even more systems to transform the data into useful components, and ultimately turn this refined data into useful models that are deployed in end products.
In a similar vein, I had an interesting conversation with a friend about how AI models resemble molecules. AI models tend to be large and complex while serving a very specific objective. The behavior of a model is governed by a structure colloquially referred to as model weights. Once training is done, those weights hold fixed values that the model uses to make its predictions, fixed in a way similar to how the structure of a chemical compound or pharmaceutical is fixed. The question then becomes: will model weights be patented the way chemical molecules are patented? Generative AI is already a legal minefield, but let’s explore how adding more legal issues to the mix might play out.
The Case for Patenting Model Weights
As discussed previously, it can be very expensive to train large AI models. It would only make sense, then, that AI companies, like pharmaceutical companies, would be interested in protecting their intellectual property (IP). Most model architectures and code are open source, but model weights are typically withheld by AI companies. That makes model weights a natural place to start with patents.
At their essence, current AI models are just function approximators, meaning that AI models of similar complexity trained on similar data should converge to similar results. As these models get bigger, they tend to require nearly all of the data available on the internet. As the total data being used becomes similar and architectural complexity becomes similar, the resulting models should become similar too. Similar results induce a race to see who can get to the best model first, and if you’re first, you want to protect the advantage you’ve obtained. Over a long enough time period, this likely leads to patenting model weights to shut out competitors.
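To make the convergence intuition concrete, here’s a minimal sketch (my own illustration, not from any particular lab’s setup) that trains two small networks of identical architecture on the same data from different random seeds. The individual weight values come out different, but the learned functions end up close:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Same data for both models: an arbitrary smooth target function.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2

# Identical architecture, different random initializations.
models = [
    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=seed).fit(X, y)
    for seed in (1, 2)
]

X_test = rng.uniform(-1, 1, size=(200, 2))
preds = [m.predict(X_test) for m in models]
print("max prediction gap:", np.abs(preds[0] - preds[1]).max())        # small
print("first-layer weights equal?", np.allclose(models[0].coefs_[0],
                                                models[1].coefs_[0]))  # False
```

Different weights, nearly the same function: that similarity of endpoints is what turns model-building into a race to be first.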
If you’re able to obtain a patent on a best-in-class model, you immediately gain a data, monetary, and time advantage by forcing others to spend more to acquire other data, to find a different way to reach similar results, and/or to spend more time creating something different.
Difficulties in Making Model Patents Worthwhile
In using the analogy of molecules to model weights to explain protection of IP, there is a place where the analogy begins to break down: complexity. Molecules are built from only 118 known elements, but a model has thousands to billions of weights, and each weight is continuous rather than discrete (theoretically infinite values, though limited in practice by floating-point precision). Why is that important? Well, unlike molecules, where slight changes make big differences (see Figure 1 below), a slight change in model weights can create a different model that still yields very similar predictions.
Figure 1. Estradiol and Testosterone are almost the same molecule but have very different effects on the human body.
If I wanted to get around a model weight patent, I would use this complexity difference to my advantage. I could take an existing model and train it for more epochs (training cycles) so that the weights change just enough to give me results that are sufficient but materially different from the patented weights. Alternatively, I could take the weights and randomly perturb them by, say, up to 1% in either direction. The weights would then differ from the patented weights while the model still performs sufficiently well.
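Here’s what the perturbation trick might look like in code, a minimal sketch assuming the weights come as a list of numpy arrays (the format Keras’s get_weights() returns); the function name and demo values are my own:

```python
import numpy as np

def perturb_weights(weights, max_rel_change=0.01, seed=0):
    """Nudge every weight by a random factor of up to 1% (by default)
    in either direction.

    `weights` is a list of numpy arrays, e.g. from a Keras model's
    get_weights(); feed the result back in with set_weights().
    """
    rng = np.random.default_rng(seed)
    return [
        w * (1.0 + rng.uniform(-max_rel_change, max_rel_change, size=w.shape))
        for w in weights
    ]

# Toy demonstration with stand-in weights:
original = [np.array([[0.5, -1.2], [0.3, 0.8]]), np.array([0.1, -0.4])]
tweaked = perturb_weights(original)
print(tweaked[0])  # close to, but no longer equal to, the original values
```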
To see the retraining version in action, hop on over to the TensorFlow playground and run the default example for a few epochs. Note the weights on the different nodes. Then run it for a few more epochs and notice how the weights have shifted slightly while the output field stays relatively the same. It should look something like this:
Figure 2. After 93 epochs the seventh weight in the hidden layer is -0.11.
Figure 3. After 126 epochs the seventh weight in the hidden layer is -0.095.
Notice that while the error measured by the training loss and test loss has dropped slightly (from 0.005 to 0.003 and from 0.006 to 0.004, respectively), the losses are nearly the same and the decision boundary is virtually unchanged. The values of the weights, however, have changed. So, if someone patented the weights at the 93rd epoch, would it be reasonable for me to take those weights and train to the 126th epoch, ending up with model weights different from what was originally patented? Most would say yes, since the outcomes differ and the weight values are different, just as the chemical compositions of two different molecules are different.
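If you’d rather reproduce this outside the browser, here’s a rough re-creation assuming TensorFlow/Keras, with a network shaped like the playground’s default (the playground’s exact dataset and optimizer differ, so treat this as a sketch):

```python
import numpy as np
import tensorflow as tf

# An XOR-style dataset in the spirit of the playground example.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2)).astype("float32")
y = (X[:, 0] * X[:, 1] > 0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(4, activation="tanh"),
    tf.keras.layers.Dense(2, activation="tanh"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Train to epoch 93 and snapshot the "patented" state.
model.fit(X, y, epochs=93, verbose=0)
weights_93 = [w.copy() for w in model.get_weights()]
preds_93 = model.predict(X, verbose=0)

# Keep training to epoch 126 (93 + 33) and compare.
model.fit(X, y, epochs=33, verbose=0)
weight_drift = max(np.abs(a - b).max()
                   for a, b in zip(weights_93, model.get_weights()))
pred_drift = np.abs(preds_93 - model.predict(X, verbose=0)).max()
print("max weight change:", weight_drift)    # noticeable
print("max prediction change:", pred_drift)  # comparatively small
```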
Even if it weren’t that simple to modify model weights, or you held a broader claim covering a family of weights rather than a single set of values, how could you enforce it, or even know who was using your model? You could potentially embed a watermark in your code or weights, but watermarks are probably easy to remove. The best defense is likely not releasing the model at all and/or not patenting it, since the patent process would require publicly publishing the model weights and potentially the code.
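To picture why naive watermarks are fragile, here’s a deliberately simplistic fingerprinting scheme (a toy construction of mine, not a real watermarking method): record the exact weight values at secret positions, then check a suspect model against them. The 1% perturbation trick from earlier already defeats it:

```python
import numpy as np

def fingerprint(weights, secret_seed=42, n_probe=1000):
    # Record exact values at secret, randomly chosen positions.
    flat = np.concatenate([w.ravel() for w in weights])
    rng = np.random.default_rng(secret_seed)
    idx = rng.integers(0, flat.size, size=n_probe)
    return idx, flat[idx].copy()

def matches(suspect_weights, idx, expected, atol=1e-6):
    flat = np.concatenate([w.ravel() for w in suspect_weights])
    return np.allclose(flat[idx], expected, atol=atol)

my_weights = [np.random.default_rng(1).normal(size=(64, 64))]
idx, expected = fingerprint(my_weights)
print(matches(my_weights, idx, expected))              # True: exact copy detected
print(matches([my_weights[0] * 1.01], idx, expected))  # False: a 1% shift evades it
```

Real watermarking research aims for marks that survive such perturbations, but it’s a cat-and-mouse game, which is the enforcement problem in miniature.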
Where does this go?
Currently, the AI field is moving so rapidly that patenting probably doesn’t make sense: it takes about a year to obtain a patent, and a model will likely be obsolete within 18-24 months. Data collection is still a major driver of advantage. Given the same data, current architectures will converge toward the same results; however, that convergence plays out over the long term, and no one has obtained all the data in the world. To do better, you need different, higher-signal sources of data. ChatGPT was a great improvement over OpenAI’s GPT-3 thanks to reinforcement learning from human feedback (RLHF), which was enabled by OpenAI’s creation of a dataset of human-graded outputs.
At the moment, it’s very difficult in AI to enforce advantages through legal channels; it’s much easier to build moats through distribution and other competitive advantages. At the same time, the AI community will likely skewer you for trying to put legal constraints around something you release as partially open source, to the point that some individuals will create something similar and completely open source it to spite you. Take Facebook and their large language model LLaMA, for instance. Facebook partially open sourced the model but restricted who could access the weights; those weights have since been leaked for anyone to obtain. To compete at the speed and scale of the AI industry, you should not rely on legal channels for the time being and instead focus on what enables you to outpace your opponents.