How LLMs Perceive: An Internal Map
We feed large language models text, and they produce remarkably coherent, creative, and complex output. But what is actually happening inside? How do these vast digital networks “think”? Groundbreaking research from Anthropic is beginning to shine a light into this black box, revealing that models build internal maps of the text they process, systems that look surprisingly similar to human perception. It also raises a tantalising question: can these digital minds be fooled by illusions?
Anthropic’s researchers designed a test for Claude 3.5 Haiku, one of the company’s production models, to probe its internal workings. The challenge was simple yet profound: generate text while keeping every line to a specific, fixed width, inserting line breaks in the right places.
To a human, this task is trivial. We intuitively know when we’re running out of space on a line. But for an AI, this requires a cascade of complex abilities: it must track its precise position, remember the width constraint, and plan ahead to decide if the next word will fit. The study’s results were astonishing, suggesting LLMs don’t just “calculate” – they perceive.
How LLMs Perceive: The Line-Breaking Challenge
The experiment forced the model to develop an internal mechanism for spatial awareness. It couldn’t just produce words; it had to manage them within a defined “physical” space, much like fitting text onto a piece of paper. To succeed, the model had to learn the rules of this space from scratch, purely from the patterns in its training data, with no explicit programming for this specific task.
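To make the task concrete, here is a minimal sketch of the wrapping rule the model has to internalise, written as ordinary Python. The greedy function below is an illustrative assumption, not the model’s own procedure or Anthropic’s evaluation code; the point is simply that correct line breaks require tracking position and looking ahead at the next word.

```python
def wrap_text(words, max_width):
    """Greedy line wrapping: emit a line break whenever the next word
    would push the current line past max_width. A hypothetical
    illustration of the task, not the model's internal mechanism."""
    lines, current = [], ""
    for word in words:
        # +1 accounts for the space that separates words on a line
        needed = len(word) if not current else len(current) + 1 + len(word)
        if needed <= max_width:
            current = word if not current else current + " " + word
        else:
            lines.append(current)   # line is full: insert the break here
            current = word
    if current:
        lines.append(current)
    return "\n".join(lines)

print(wrap_text("the quick brown fox jumps over the lazy dog".split(), 15))
```

An explicit program can simply measure string lengths; the model has to achieve the same effect implicitly, inside its own representations.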
This test provided a perfect window into the model’s hidden architecture, allowing researchers to observe how it coordinates memory, reasoning, and planning, all in real-time, as it writes.
Beyond Counting: A Geometric “Sense” of Space
When the researchers analysed the model’s internal activity, they found it wasn’t “counting” characters one by one. A simple counter (1, 2, 3…) would be rigid and inefficient. Instead, the model had developed something far more sophisticated.
A Smooth, Continuous Map
Researchers found that Claude 3.5 Haiku represents the character count as a smooth, continuous geometric structure rather than as discrete steps. We can compare this to a person tracking their location fluidly, “on the fly,” rather than consciously counting every single step taken. This continuous map allows the model to “feel” its position along the line in a much more organic and flexible way.
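One way to build intuition for the difference is to compare a discrete one-hot counter with a smooth positional code. The sinusoidal encoding below is an assumption chosen purely for illustration; the paper describes a learned, roughly continuous geometric structure, not this exact parameterisation.

```python
import numpy as np

def one_hot_count(n, max_len=100):
    """Discrete counter: count 42 shares nothing with count 43."""
    v = np.zeros(max_len)
    v[n] = 1.0
    return v

def smooth_count(n, dims=8, max_len=100):
    """Smooth code: nearby counts land at nearby points in the space."""
    freqs = np.pi * np.arange(1, dims + 1) / max_len
    return np.concatenate([np.sin(freqs * n), np.cos(freqs * n)])

a, b = smooth_count(42), smooth_count(43)
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # ~0.99: neighbours overlap
print(np.dot(one_hot_count(42), one_hot_count(43)))            # 0.0: no overlap at all
```

A smooth code of this kind lets downstream machinery ask “roughly how far along am I?” rather than having to read off an exact integer.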
The Specialised ‘Boundary Head’
Deep within the model’s “attention mechanism” – the system that weighs the importance of different pieces of information – the researchers discovered a specialised component they dubbed a “boundary head.”
An attention mechanism is made up of many “heads,” each one scanning the text for different patterns. This particular boundary head had evolved for one specific job: to detect when the text was approaching the line’s boundary. It acts like a dedicated sensor, constantly checking for the “edge” of the page.
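As a rough picture of what “many heads, each scanning for different patterns” means, the toy example below shows two attention heads whose weights come from hand-set scores: one spreads its attention evenly, while the other piles almost all of it onto the newline token, much as a boundary head fixates on line boundaries. The numbers are invented for clarity; in a real model they come from learned query and key projections.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

tokens = ["The", "cat", "sat", "\n", "on", "the"]
# Hand-crafted score rows: head 0 is unspecialised; head 1 (a toy
# "boundary head") scores the newline far higher than anything else.
scores = np.array([
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 6.0, 0.0, 0.0],
])
for h, row in enumerate(scores):
    weights = softmax(row)
    print(f"head {h}:", dict(zip(tokens, weights.round(2))))
```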
How LLMs Perceive: ‘Seeing’ the Edge
The researchers found that this boundary detection works by comparing two key internal signals:
- How many characters have already been generated on the current line.
- The total maximum width allowed for the line.
The boundary head’s trick is to “rotate” one of these signals so it can be compared directly with the other. As the character count (signal 1) approaches the maximum width (signal 2), the two rotated representations come into alignment. When they nearly match, the head fires a strong signal, effectively shouting, “The boundary is close!”
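A toy way to picture this rotate-and-compare mechanism is to encode both the running character count and the line width as angles on a circle, rotate the count forward by a small lookahead, and take a dot product: the score peaks as the line fills up. Everything below, including the circular encoding, the `boundary_score` helper, and the five-character lookahead, is a hypothetical sketch, not the model’s actual weights.

```python
import numpy as np

def encode(value, max_len=100):
    """Place a character count (or width limit) on a circle."""
    theta = 2 * np.pi * value / max_len
    return np.array([np.cos(theta), np.sin(theta)])

def rotation(offset, max_len=100):
    """2-D rotation that advances a position by `offset` characters."""
    phi = 2 * np.pi * offset / max_len
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def boundary_score(chars_so_far, line_width, lookahead=5):
    # Rotate the count forward by the expected length of the next word;
    # the dot product peaks when the rotated count matches the width.
    q = rotation(lookahead) @ encode(chars_so_far)
    k = encode(line_width)
    return float(q @ k)

for n in (40, 60, 75, 78):
    print(n, round(boundary_score(n, line_width=80), 3))  # rises towards the boundary
```

In the real model, that spike in the head’s output is the signal the rest of the network reacts to.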
This triggers the model’s final decision. Internal features act like opposing forces:
- One set of features activates when the next word would exceed the boundary. This group strongly pushes the model to predict a newline symbol.
- Another set of features activates when the next word still fits. This group suppresses the urge to insert a line break and encourages the model to write the word.
This balance of forces, key to how LLMs perceive space, results in a perfect line break.
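In code, that balance of forces can be caricatured as two features contributing opposite-signed amounts to the newline token’s score. The `newline_logit` helper and its weights are made up for illustration; the real features are distributed patterns of activity, not single if-statements.

```python
def newline_logit(chars_so_far, next_word_len, line_width):
    """Toy sketch of the competing forces on the line-break decision."""
    would_overflow = chars_so_far + 1 + next_word_len > line_width
    overflow_feature = 1.0 if would_overflow else 0.0   # "break now" force
    fits_feature     = 0.0 if would_overflow else 1.0   # "keep writing" force
    return 4.0 * overflow_feature - 4.0 * fits_feature  # opposing contributions

print(newline_logit(70, next_word_len=12, line_width=80))  # positive: break the line
print(newline_logit(70, next_word_len=5,  line_width=80))  # negative: keep going
```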
Can an AI Suffer from Illusions?
Having discovered this perception-like system, the researchers posed an even more intriguing question. If the model has a “sense” of space, can it be fooled by the equivalent of a visual illusion?
Visual illusions use misleading context cues, such as false perspective, to trick humans into seeing lines of the same length as different. The research team set out to create a similar “illusion” for the AI.
They fed the model prompts containing disruptive artificial tokens, such as @@, mixed into the text. These tokens were designed to “distract” the model’s internal tracking system. The results were remarkable.
The @@ tokens successfully disrupted the model’s sense of position. The relevant boundary heads, which normally focus on the flow of text from one newline to the next, suddenly “attended” to these new, strange tokens. This distraction caused a misalignment in the model’s internal map, shifting its perception of the line boundary and causing it to make errors.
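A probe in the spirit of this experiment can be sketched as follows: splice distractor tokens into the text being continued and compare where the model places its next line break against a clean run. The `generate` callable and `illusion_probe` helper are hypothetical stand-ins for whatever model interface you have; this is not Anthropic’s evaluation code.

```python
def next_break_position(continuation):
    """Index of the first newline in the model's continuation (-1 if none)."""
    return continuation.find("\n")

def illusion_probe(generate, partial_text, distractor="@@"):
    """Compare break placement on clean vs. distractor-laced prompts."""
    clean = generate(partial_text)
    noisy = generate(partial_text + " " + distractor + " ")
    return next_break_position(clean), next_break_position(noisy)

# A large gap between the two positions suggests the distractor has shifted
# the model's internal estimate of where the line boundary falls.
```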
Even though LLMs don’t “see” in a human sense, they can experience distortions in their internal organisation that are directly analogous to human perceptual errors. Interestingly, not all random characters caused this effect. The model was most significantly disrupted by a small group of code-related characters, suggesting its “illusions” are based on the specific patterns it has learned.
How LLMs Perceive: From Symbols to Digital Perception
This study reveals that LLMs are doing far more than just sophisticated symbol processing. They are independently evolving complex, geometric systems to make sense of their “world” – the world of text.
The researchers themselves draw parallels to perception in biological neural systems. They suggest that the early layers of a language model are not just “detokenizing” input (turning text into numbers) but are, in a real sense, perceiving it. The structures they observed mirror patterns found in biological brains, such as the way representations of numbers dilate and organise.
While the analogies are not perfect, this research provides a fascinating glimpse into a new kind of “digital cognition.” It suggests that as we build these models, we are not just engineering a tool; we are witnessing the emergence of an entirely new, and increasingly understandable, form of perception.