The Encapsulation of Knowledge - Part 2
Exploring how language influences the structure of thought
Language is the most massive and inclusive art we know, a mountainous and anonymous work of unconscious generations.
― Edward Sapir
There's an interesting concept called the Sapir-Whorf hypothesis, which states that the language you speak shapes the way you think and perceive the world. You may have noticed this if you have traveled abroad or have friends who speak multiple languages. Alternatively, perhaps you've heard of untranslatable words: concepts that one language captures in a single word but that take a verbose explanation in another. In either case, the Sapir-Whorf hypothesis provides a good framework for thinking about how an input to a system shapes the way the system acts.
Different languages have varying levels of expressiveness for different concepts. This is based partly on the structure of the language and partly on the concepts with which the culture has had contact. You may have heard the oft-repeated (and much-debated) claim that Inuit languages have dozens of words for snow. A few Persians I know have commented on how a single sentence that is straight to the point in English can take several sentences, or even paragraphs, to convey the same meaning in Farsi. I knew someone who spoke four languages; he would think primarily in Bulgarian, switch to English for science and engineering, curse in French, and speak Japanese only when needed. It was fascinating to learn how different languages were better suited to different modes of thinking. Whenever I meet multilingual speakers, I ask them how they use the different languages available to them. Their answers suggest that some languages store certain types of knowledge more readily than others.
In part one of this series, I wrote about how different machine learning models have different views of the world. Even on the same dataset, different algorithms carve out different decision boundaries, because each makes different assumptions about how information and relationships in the data are structured. The figure below illustrates this; if you need a refresher on how to read it, see the previous post.
Figure 1. A comparison of different classifiers on different datasets. Courtesy of scikit-learn.
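To make this concrete, here is a minimal sketch in the spirit of the scikit-learn comparison above: it trains a few classifiers on one shared dataset and reports how differently they generalize. The dataset and classifier choices are illustrative assumptions, not the exact configuration behind the figure.

```python
# Minimal sketch: same data, different "views of the world".
# Dataset and classifier choices are illustrative only.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

classifiers = {
    "k-NN": KNeighborsClassifier(3),
    "RBF SVM": SVC(gamma=2, C=1),
    "Decision tree": DecisionTreeClassifier(max_depth=5),
}

# Each model interprets the identical data through its own structure,
# yielding different decision boundaries and different test accuracy.
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(f"{name}: {clf.score(X_test, y_test):.2f}")
```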
Hold these two concepts in your mind and consider their similarities. One: the Sapir-Whorf hypothesis, which says that how one views and perceives the world depends on the structure of the language one speaks. The other: that different algorithms interpret the world differently, even given the same data. These concepts are strikingly similar. Does this mean we can provide evidence for the Sapir-Whorf hypothesis by examining the behavior of machine learning models?
Like a machine learning model, the brain can be thought of as a learning machine driven by inputs and outputs, and language is one of its major inputs. It makes sense, then, that the structure of a language affects the structure of thought. Similarly, when language is the only input to a large language model, the attention mechanism has nothing to attend to but the language's structure; within a transformer architecture, the model weights are directly shaped by that structure. We have two very different systems, the human brain and the large language model, both heavily influenced by the structure of their main input: language.
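To see where that structural influence enters, here is a minimal sketch of scaled dot-product attention, the core operation of the transformer. The random matrices stand in for learned projections of an input sentence; the point is that the weight each token places on every other token is computed entirely from the sequence itself.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core transformer operation: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over tokens
    return weights @ V  # each token becomes a structure-weighted mix of the others

# Toy example: 4 tokens with 8-dimensional embeddings (random stand-ins
# for learned projections of a real sentence).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(tokens, tokens, tokens)
print(out.shape)  # (4, 8)
```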
However, the design of an AI system, and the knowledge it encapsulates, are driven by more than just the structure of an input language. Designs can differ based on how their creators think about storing and using knowledge, and different cultures have different relationships with knowledge. In the West, we tend to view knowledge as something to be acquired: something we can obtain and then possess. We accumulate it much as we accumulate objects, and that accumulation tends to be external in nature. Compare this with an Eastern culture such as China's, where obtaining knowledge is not the end product. There, knowledge in and of itself is less potent; instead, it is used to augment the bearer, who derives hidden insights from it. The value lies in what is derived internally from the acquired knowledge. While this difference may seem subtle, it fundamentally shapes how things are structured.
Where these ideas become even more fascinating is in understanding how different cultures will design their AI systems differently. At the moment, each foundation model provider is using similar transformer architectures trained on more or less the same data, so they are converging to similar levels of performance. At some point, however, this will shift, and the creators of these foundation models will start making larger and larger changes to try to outperform each other. The changes that get made will be heavily influenced by two qualities of the creators: their culture's relationship with knowledge, and the way thought is shaped by the structure of their languages. Right now, the world's top models are coming out of the US (GPT-4, LLaMA, Gemini), France (Mixtral), China (Yi), and the UAE (Jais). These foundation models likely already have fundamentally different views of the world based on the main language of their input.
We can extend these ideas one step further. Since these AI models encapsulate knowledge differently, we can begin to test the Sapir-Whorf hypothesis: give the same task to models trained primarily in different languages and determine whether one language provides a better framework for certain concepts. Perhaps an Arabic model writes the best poetry, perhaps a Chinese model solves chemistry problems more efficiently, perhaps an English model drafts better contracts, and a French model creates the best revolutions. But it doesn't stop there. Many models are beginning to handle multiple languages, and as they mix modes of thought, they will create unprecedented structures of thinking.
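As a sketch of what such a test might look like in practice, the loop below sends one task to several language-specific models and collects their answers for side-by-side comparison. The `query_model` helper and the model names are hypothetical placeholders rather than any real provider's API, and scoring the outputs (human raters, a rubric) is the genuinely hard part left open here.

```python
# Hypothetical sketch of a cross-lingual model comparison.
# `query_model` is a placeholder, not a real API; swap in whichever
# client your chosen model providers actually expose.
def query_model(model_name: str, prompt: str) -> str:
    raise NotImplementedError("stand-in for a real model client")

MODELS = ["english-model", "chinese-model", "arabic-model"]  # illustrative names
TASK = "Write a four-line poem about snow."

def compare(models, prompt):
    # Same task, different linguistic "worldviews".
    return {model: query_model(model, prompt) for model in models}

# responses = compare(MODELS, TASK)  # then judge which framing served the task best
```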
Additionally, as these AI foundation models grow and mature, they will exhibit different underlying philosophies and different ways of structuring knowledge, differences that will express themselves in the outputs consumed by end users. The way individuals think will then be influenced not only by the language they speak but also by the AI they use. I can foresee people uniting under certain AIs, almost like tribes or religions, because they resonate more with the outputs of one AI than another.
An ecosystem of models will evolve, each deriving its advantages from the subtle and unseen influences of its creators, and each excelling in different areas. These differences are, and will be, beautiful. Embrace them. One thing I've learned is to never rely on a single source of information; similarly, you should not rely on a single AI. Diversity of thought and understanding generally leads to the best outcomes.