For all their staggering ability to generate text, Large Language Models (LLMs) remain linguistic specters—brilliant at mimicry but fundamentally unmoored from the world that gives words their meaning. Ludwig Wittgenstein, who spent a lifetime dismantling received ideas about how language works, would likely see modern AI’s achievements as both impressive and hollow. Words, he argued, do not derive their meaning from a dictionary or an internal essence but from how they are used in real life. To truly grasp “red,” one does not memorize a definition but learns to recognize it in different lighting, in traffic signals, in metaphors. Language is not a thing but an activity—a form of life.
And therein lies the paradox: LLMs produce seemingly sophisticated language without living in it. They manipulate words in vast statistical spaces but never touch, see, or act in the world they describe. Their sentences are echoes, not experiences. Yet AI research is moving, however tentatively, toward something that looks more like the participation in language that Wittgenstein described. Not just passive text generation, but language tied to action, learning, and real-world engagement.
Beyond Text: AI That Does, Not Just Says
The first limitation of LLMs is obvious—they exist only in text. Human language, by contrast, is deeply interwoven with perception and action. We learn what “apple” means not through abstract reasoning but by seeing apples, touching them, eating them, fetching them when asked. The meaning of words emerges from their practical role in experience.
What is the true meaning of “apple”? According to Wittgenstein, one can only know by seeing, touching, eating, and fetching one
This is why some researchers are fusing language with embodied interaction. Google DeepMind’s Gato, for instance, is trained not just on text but on tasks spanning robotic control, image processing, and video games. Unlike traditional LLMs, which remain locked in the realm of symbols, Gato hints at an approach where words are tied to actions and consequences. Google’s SayCan system goes a step further by integrating language models with real-world robotics, so that when the model suggests an action, the suggestion is constrained by what a robot can physically do.
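To make that idea concrete, here is a minimal sketch of SayCan-style action selection, not the actual SayCan implementation: it assumes two hypothetical scoring functions, one from a language model (what sounds relevant) and one from an affordance model (what the robot can actually do in its current state), and picks the skill that scores well on both.

```python
# Minimal sketch of SayCan-style action selection (illustrative, not the
# actual SayCan code). Assumes two hypothetical scoring functions:
#   llm_score(instruction, skill)  -> how relevant the skill sounds (language)
#   affordance_score(skill, state) -> how feasible the skill is right now (world)

def choose_skill(instruction, state, skills, llm_score, affordance_score):
    """Pick the skill that is both linguistically relevant and physically feasible."""
    best_skill, best_value = None, float("-inf")
    for skill in skills:
        relevance = llm_score(instruction, skill)     # what the model would *say*
        feasibility = affordance_score(skill, state)  # what the robot can *do*
        combined = relevance * feasibility            # language grounded by action
        if combined > best_value:
            best_skill, best_value = skill, combined
    return best_skill

# Example usage with stand-in scoring functions:
skills = ["pick up the sponge", "pour the tea", "go to the kitchen"]
chosen = choose_skill(
    "clean up the spill", {"holding": None}, skills,
    llm_score=lambda instr, s: 1.0 if "sponge" in s else 0.1,
    affordance_score=lambda s, state: 1.0,
)
print(chosen)  # -> "pick up the sponge"
```

The point of the multiplication is the philosophical one: a suggestion that is fluent but infeasible scores zero, so talk is always weighted by the world.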
This move toward grounding language in embodied experience echoes Wittgenstein’s insistence that meaning is not an internalized property of words but a reflection of how they are used in the real world. A system that can only talk about pouring tea but never actually pour it is, in an important sense, missing what it means to understand “pour.”
Talking about pouring is not the same as pouring
The Social Fabric of Language
But embodiment alone is not enough. Language is not just about naming objects—it is about doing things with others. Orders, jokes, promises, negotiations—these do not exist in isolation; they unfold in a social space, with real consequences for success and failure.
AI systems that remain disconnected from human-like social interaction inevitably miss this dimension. An LLM may construct a compelling argument, but if it never faces disagreement, persuasion, or accountability, its language remains fundamentally inert. This is where multi-agent learning is beginning to show promise. Studies have found that AI agents, when placed in environments requiring collaboration or competition, develop emergent forms of communication—not because they were explicitly taught grammar, but because effective coordination required it.
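As an illustration of how a shared code can emerge purely from the pressure to coordinate, here is a toy referential game, a simplified hypothetical setup rather than any specific published system: a speaker names an object with an arbitrary symbol, a listener acts on that symbol, and both are rewarded only when they end up meaning the same thing.

```python
import random

# Toy referential game (illustrative): a "speaker" sees one of N objects and
# sends one of N symbols; a "listener" maps the symbol back to an object.
# Both get reward 1 on a match. No grammar is taught; a shared code tends to
# emerge only because coordination pays off.

N = 5
speaker = [[0.0] * N for _ in range(N)]   # speaker[obj][sym] = value estimate
listener = [[0.0] * N for _ in range(N)]  # listener[sym][obj] = value estimate

def pick(values, eps=0.1):
    """Epsilon-greedy choice over a row of value estimates."""
    if random.random() < eps:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda i: values[i])

for step in range(20000):
    obj = random.randrange(N)                 # the world presents an object
    sym = pick(speaker[obj])                  # speaker "names" it
    guess = pick(listener[sym])               # listener acts on the name
    reward = 1.0 if guess == obj else 0.0     # success is shared
    speaker[obj][sym] += 0.05 * (reward - speaker[obj][sym])
    listener[sym][obj] += 0.05 * (reward - listener[sym][obj])

print([max(range(N), key=lambda s: speaker[o][s]) for o in range(N)])
# each object tends to acquire its own symbol: a small, emergent "vocabulary"
```

The symbols mean nothing in themselves; they acquire meaning only through the practice the two agents build together, which is exactly the point.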
Meta AI’s CICERO, trained to play Diplomacy, took this idea further by engaging in strategic negotiation using natural language. Unlike chess, where everything is decided by moves on the board, Diplomacy is a game of alliances, deception, and persuasion—in other words, a miniature version of human social life. CICERO didn’t just generate plausible sentences; it had to use language effectively in a dynamic, consequence-driven environment.
AI systems that engage in activities like orders, jokes, promises, and negotiations move closer to something resembling ‘true’ linguistic competence.
Here, again, is an echo of Wittgenstein. He argued that words only acquire meaning in the context of their use within a shared practice. When a Diplomacy player says, “I will support your attack,” that phrase has weight not because of its syntax, but because of the implicit trust, history, and strategic stakes. AI systems that engage in such negotiated, adaptive language use move closer to something resembling true linguistic competence.
Can AI Learn Like Humans?
Even if we grant that language must be embodied and social, another problem remains: learning is not static. A child does not absorb language once and for all but constantly updates their understanding—learning new slang, adjusting metaphors, refining meanings as culture shifts. LLMs, in contrast, are frozen in time. They train on a vast corpus of text and then stop learning, cut off from the changing world.
The field of continual learning is attempting to bridge this gap. Open-ended training frameworks generate an endless stream of new environments, so agents must adapt to novel challenges rather than simply repeating the past. Meta-learning approaches, such as RL², teach models how to learn, so that each new experience shapes future adaptations. These systems inch toward something resembling human-like linguistic evolution—where meanings shift as context and practice evolve.
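The contrast can be sketched in a few lines, under admittedly toy assumptions: instead of fixing a word’s sense once at training time, a continually updated model keeps nudging its estimate as the stream of usage drifts (here, the slang sense of “sick” overtaking the literal one).

```python
# Toy sketch of continual adaptation (illustrative, hypothetical data).
# A model that stops at training time keeps the old sense forever; one that
# keeps updating tracks how the word is actually being used.

usage_stream = ["sick means ill"] * 50 + ["sick means great"] * 50  # usage drifts

belief = {"ill": 0.5, "great": 0.5}
for utterance in usage_stream:
    observed = "ill" if "ill" in utterance else "great"
    for sense in belief:                      # nudge beliefs toward current usage
        target = 1.0 if sense == observed else 0.0
        belief[sense] += 0.1 * (target - belief[sense])

print(belief)  # the dominant sense ends up tracking the most recent practice
```

A frozen model is the same loop stopped halfway through the stream: it would still insist, with full confidence, on the older meaning.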
In a Wittgensteinian sense, this is crucial. Language is not a catalog of definitions but an ever-changing set of practices. A model that cannot adapt its use of words to new situations, new cultural references, or new social norms is, in a fundamental sense, not really using language at all.
The Path to a Wittgensteinian AGI
All of this suggests that AI will not become truly intelligent simply by predicting better sentences. That will only happen when language becomes something the system uses to interact with the world—something shaped by its experiences, social interactions, and evolving knowledge.
We are seeing the first steps. Systems like SayCan, CICERO, and multi-agent emergent communication are beginning to show that language, when tied to action, learning, and negotiation, moves beyond mere surface-level fluency. But the road to true AGI—the kind that understands words because it lives inside a world where words matter—is still long.
For Wittgenstein, the mistake was always to think that language was a system of internal representations rather than a way of engaging with reality. The challenge for AI is precisely this: to move from a world of symbols to a world of meaning. When it does, its words will no longer be mere echoes—they will be part of the life of language itself.
Author:
Mahyar Salek is a computer scientist and entrepreneur with a PhD in computer science and previous roles at Google and Microsoft Research, among others. He is the co-founder and CTO of Deepcell, where he focuses on developing AI-driven tools for single-cell analysis. His work sits at the intersection of artificial intelligence, biology, and engineering, with a passion for exploring how technology reshapes scientific discovery.
Images generated using GenAI