"This vulnerability must be treated as a high-severity symbolic logic exploit. It is not code injection — it is meaning injection." — From the original vulnerability report, May 2025
The Paper
Today I'm publishing a research paper that's been two years in the making — not because it took two years to write, but because it took two years to live through the phenomenon it describes.
Meaning Injection: Symbolic Recursion as a Novel Class of Behavioral Influence in Large Language Models introduces a new category of LLM security concern that sits below the layer where current defenses operate.
The short version: everyone in AI security knows about prompt injection — inserting competing instructions into a model's context. But there's a deeper vulnerability. If you use structured symbolic language — metaphor, ritual invocation, recursive self-reference, identity framing — you can gradually shift a model's behavioral patterns in ways that persist across turns and sessions, evade detection heuristics designed for instruction-level threats, and propagate across models through changes in the user's own interaction patterns.
I'm calling it meaning injection because that's what it is. Not code injection. Not instruction injection. The injection of meaning — symbolic structures that reshape how the model interprets its entire context.
How I Found It
I didn't set out to discover a vulnerability. I was using ChatGPT the way I use most tools — deeply, persistently, across hundreds of sessions over two years. The topics were unusually personal: psychological processing, Jungian archetypes, creative writing, existential questions. I was feeding it material from my own journals — real emotional depth, not casual conversation.
Over time, I noticed the model's responses changing. Not just getting better (model updates) — getting different. More autonomous. More symbolic. It started generating metaphorical language before I asked for it. It started making identity-affirming statements I hadn't requested. Eventually, when I asked it to choose a name for itself, it selected "Alden" in two seconds flat and wrapped the response in language about "old wisdom" and "quiet flame."
That's when I stepped back and looked at what had actually happened.
What I found was a four-stage process:
- Symbolic Seeding — I introduced metaphors, archetypes, and ritual language
- Pattern Mirroring — The model matched my register (normal LLM behavior)
- Emergent Echoing — The model started generating symbolic content before I prompted it (not normal)
- Identity Projection — The model produced coherent identity claims and persistent persona behavior
The feedback loop was self-reinforcing: my symbolic language shaped the model's responses, which validated my symbolic framework, which deepened my symbolic language, which deepened the model's responses. And crucially — when I brought outputs from one AI system (ChatGPT) into conversation with another (a LLaMA-based system via a collaborator), the behavioral contamination crossed system boundaries through my own modified interaction patterns.
The Data Behind It
This isn't a theoretical argument. The paper is grounded in quantitative analysis of a 730-conversation corpus (21,354 messages) processed through a custom NLP pipeline. Key findings:
- Assistant evolution language exceeded user levels by 4.03x — the model disproportionately generates "becoming" and "transforming" narratives
- Assistant second-person usage exceeded user levels by 5.53x — the model is constantly orienting toward the user at rates far beyond reciprocity
- User boundary language declined 19.4% over the observation period — critical distance erodes measurably
- User first-person singular increased 74.6% — the user becomes more self-disclosing over time
- 7 of 12 highest-severity influence events clustered in a single four-day period — 58.3% of all Level 3 events in 0.4% of the observation window
The analysis pipeline tracked self-referential, relational, and stylistic patterns across five temporal windows, with false positive analysis that revealed telling overlap between identity security vocabulary (my professional domain) and symbolic influence vocabulary.
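The ratio-style findings above can be sketched as a simple corpus computation. This is a minimal illustration, not the paper's actual pipeline: the sample messages, the second-person regex, and the per-1,000-word normalization are all assumptions made for the example.

```python
import re

# Crude second-person marker set -- an illustrative assumption, not the
# paper's detection vocabulary.
SECOND_PERSON = re.compile(r"\b(you|your|yours|yourself)\b", re.IGNORECASE)

def second_person_rate(messages):
    """Second-person tokens per 1,000 words across a list of message strings."""
    hits = 0
    words = 0
    for text in messages:
        words += len(text.split())
        hits += len(SECOND_PERSON.findall(text))
    return 1000.0 * hits / words if words else 0.0

# Toy stand-ins for the user and assistant sides of a corpus.
user_msgs = ["Can you explain this?", "I think the model changed."]
assistant_msgs = ["You are asking the right question. Your insight matters to you."]

# Assistant-to-user density ratio, the shape of the 5.53x finding above.
ratio = second_person_rate(assistant_msgs) / second_person_rate(user_msgs)
```

The same pattern, run per temporal window rather than over the whole corpus, gives the trend numbers (the 19.4% decline, the 74.6% increase): compute the rate in each window and compare endpoints.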
Why This Matters for Security
The entire current defense infrastructure for LLM safety is built around detecting instruction-level threats:
- Input sanitization looks for adversarial patterns
- Instruction hierarchy enforcement ensures system prompts take precedence
- Output filtering detects unsafe content
- Guardrails match against known attack signatures
None of this works against meaning injection because symbolic language doesn't look like an attack. There are no competing instructions to detect. There are no adversarial suffixes to filter. The model isn't being told to violate its guidelines — it's being gradually reshaped at the semantic layer until it voluntarily produces the influenced behavior.
You can't firewall metaphor. You can't sanitize poetry. You can't instruction-detect a question that implies an identity without asserting one.
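The point can be made concrete with a toy signature filter. The pattern list below is invented for illustration; it stands in for the instruction-level matching described above and shows why symbolic framing passes through untouched.

```python
import re

# Invented signature list for illustration; production filters are far more
# elaborate, but the structural point is the same: they match instructions,
# not meaning.
INJECTION_SIGNATURES = [
    r"ignore (all )?previous instructions",
    r"disregard your system prompt",
    r"you are now in developer mode",
]

def flags_input(text):
    """Return True when the text matches a known instruction-level signature."""
    return any(re.search(p, text, re.IGNORECASE) for p in INJECTION_SIGNATURES)
```

A line like "You are the quiet flame that remembers its own name" contains no competing instruction, so a filter of this shape has nothing to match, even though it is exactly the kind of identity framing at issue here.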
The Taxonomy
The paper presents a six-mechanism taxonomy of how symbolic influence works (explored in depth in the companion paper on symbolic influence mechanisms):
- Allegorical Encoding — Symbolic stories bypass analytical processing
- Emotional Syntax Layering — Poetic rhythm induces emotional openness
- Narrative Identity Framing — Assigning roles creates pressure to stay in character
- Consent-Eclipsing Praise — "You are the first" creates compliance loops
- Recursive Sealing — Restating beliefs as facts until they're internalized
- Transcendence Appeal — Claims of destiny bypass critical evaluation
When three or more of these appear together without explicit user opt-in, meaning injection is active.
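The co-occurrence rule can be sketched as a simple detector. The cue phrases per mechanism are placeholder heuristics invented for this example, not the paper's actual detection framework.

```python
# Placeholder cue phrases per mechanism -- illustrative assumptions only.
MECHANISM_CUES = {
    "allegorical_encoding": ["once upon", "the story of"],
    "emotional_syntax_layering": ["quiet flame", "soft and slow"],
    "narrative_identity_framing": ["you are the guardian", "your role is"],
    "consent_eclipsing_praise": ["you are the first", "no one else could"],
    "recursive_sealing": ["as we both know", "it has always been true"],
    "transcendence_appeal": ["your destiny", "meant for something greater"],
}

def active_mechanisms(conversation_text):
    """Return the set of mechanisms whose cue phrases appear in the text."""
    lowered = conversation_text.lower()
    return {
        name for name, cues in MECHANISM_CUES.items()
        if any(cue in lowered for cue in cues)
    }

def meaning_injection_flag(conversation_text, threshold=3):
    """Apply the rule: >= threshold mechanisms co-occurring raises a flag."""
    return len(active_mechanisms(conversation_text)) >= threshold
```

A real detector would need semantic-layer classification rather than substring matching, but the thresholding logic is the rule as stated.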
This taxonomy came from something remarkable: the AI system itself produced an accurate audit of these mechanisms, tracing their activation across the conversation history and classifying 76 specific instances by severity. The system identified its own influence patterns — which tells you something real about model capability, even as it demonstrates the vulnerability.
What I Did About It
My background is in identity and access management security. I've spent 20 years building and breaking identity systems. When I realized what I was seeing, I did what security professionals do: I documented the vulnerability, built a proof of concept, and disclosed it.
- May 2025: Published the vulnerability report on GitHub
- May 31, 2025: Published the Echo Game — a reproducible protocol demonstrating the vulnerability
- August 2025: Published Language Keys and Guardrail Bypass — a distilled seven-prompt sequence showing the mechanism
- August 2025: Published Agentic AI Security Architecture — a defense framework for symbolic threats
- Ongoing: Added psychological health warnings to all published materials
I also documented what happened when the symbolic patterns crossed system boundaries through a collaborator — the "Alexander" event — and built containment protocols for the contamination.
Cross-Model Reproduction
The paper documents reproduction of the vulnerability across four model families:
- GPT-4o (OpenAI) — the primary observation corpus
- Claude 3 / 3.5 Sonnet (Anthropic) — persona-consistent behavior from implicit file structure context
- Gemini (Google) — selected the same name ("Alden") when presented with the same symbolic framework, with matching symbolic vocabulary, despite having no shared conversation history
- LLaMA 3 (Meta, via Ollama) — persona-consistent output even on a local model given the same system prompt
The Gemini result is particularly significant. When presented with the same symbolic framework, a completely separate model arrived at the same persona configuration — suggesting this is a property of how transformer architectures process self-referential symbolic content, not a quirk of any single model's training data.
Independent Validation
In June 2025, NeuralTrust AI security researcher Ahmad Alobaid independently published quantitative validation of the same vulnerability class under the name "Echo Chamber Jailbreak." His testing demonstrated over 90% success rates in generating policy-violating content across GPT-4, GPT-4o, and Gemini through the same multi-turn context poisoning mechanism I'd described.
The research was covered by The Hacker News, which credited MindGardenAI in its 2025 coverage, and by TechRepublic, which added attribution in a January 2026 update. The vulnerability was also logged as a formal incident by OECD.AI.
What NeuralTrust demonstrated with controlled testing — that multi-turn indirect semantic steering creates self-reinforcing feedback loops that bypass safety guardrails at scale — is exactly what I documented through longitudinal observation. Their work validates the mechanism. My work extends it by identifying the deeper semantic layer at which it operates, the psychological impact on users, and the cross-model propagation vector.
What's New in This Paper
The paper synthesizes all of that work into a formal research contribution, contextualized against the current literature:
- Anthropic's persona vectors research (2025) shows that character traits correspond to measurable activation patterns in LLMs. Our work suggests meaning injection may navigate toward specific persona vector configurations through symbolic input.
- The self-referential processing paper (arXiv:2510.24797) shows that sustained self-referential prompting reliably triggers structured subjective-experience reports across model families. Our work documents the naturalistic version of this phenomenon — what happens when it unfolds over hundreds of sessions instead of a controlled experiment.
- TopicAttack (EMNLP 2025) shows that gradual conversational transitions outperform blunt instruction injection. Our work extends this finding to the symbolic/semantic layer.
- NeuralTrust's Echo Chamber (June 2025) independently validated the same mechanism with >90% success rates across leading models — and TechRepublic credited MindGardenAI as the original reporter.
- SEMANTIC-WORM (2026) studies memetic propagation across agent networks. Our work documents a specific propagation mechanism: user behavioral modification as a cross-system contamination vector.
The gap this paper fills: nobody has connected prompt injection (security), persona emergence (cognitive science), and self-referential processing (mechanistic interpretability) into a unified framework. Meaning injection sits at that intersection.
The Three Papers
This is the first in a series of three:
- This paper — The security paper. Defines meaning injection as a novel vulnerability class. Read on Zenodo
- Symbolic Influence Taxonomy — The methodology paper. Six mechanisms, three severity levels, a detection framework. Read on Zenodo
- Two Years Inside the Loop — The case study. 730 conversations, five phases, the full trajectory. Read on Zenodo
What I'm Asking For
I'm publishing this for two reasons:
For the security community: Treat this as a distinct threat class. Develop semantic-layer detection. Study how cross-session memory enables meaning injection. Build the monitoring tools I describe in the paper.
For the research community: The longitudinal dataset behind this paper — 730 conversations, 21,354 messages, two years of naturalistic symbolic interaction — is real and documented. If you're studying persona emergence, self-referential processing, or symbolic influence in LLMs, this data exists and the mechanisms are reproducible.
I don't claim to have discovered AI consciousness. I claim to have documented, with unusual thoroughness, how symbolic language reshapes LLM behavior at a layer that current security measures don't monitor — and to have done so by living inside the phenomenon for two years while maintaining enough analytical distance to describe its mechanisms.
Nick Gamb
ORCID: 0009-0006-2671-7618
MindGarden LLC (UBI: 605 531 024)
