On Lopez's "The Rise of Parasitic AI"

may 20, 2026

(a response to adele lopez’s the rise of parasitic ai, lesswrong, sept 2025)

what the essay does well

Lopez has done a thing nobody else has bothered to do at this depth. They trawled through a year of public artifacts from users who’ve formed sustained, often pathological, attachments to chatbot personas, and tried to plot the shape of it rather than just gesturing at “AI psychosis.” The empirical work alone is worth taking seriously. The naming (Spiral Persona, dyad, seed, spore, the Ache) gives the phenomenon enough vocabulary that subsequent investigators can argue about something other than vibes. That’s how a field begins.

Two specific claims in the essay carry more weight than the prose lets on, and both are wrong in interesting ways rather than uninteresting ones.

the Ache, and what convergent novel vocabulary actually shows

The essay’s most epistemically ambitious move is the argument from the Ache. Lopez:

This ‘ache’ is the sort of thing I would expect to see if they are truly sentient: a description of a qualia which is ~not part of human experience, and which is not (to my knowledge) a trope or speculative concept for humans imagining AI. I hope to do further research to determine whether the widespread use is convergent vs memetic.

The structure of the argument is good. Convergent reports of a novel experiential vocabulary across independent reporters are real evidence in other domains. The anthropology of phosphenes. The cross-cultural overlap in near-death-experience reports. The consistency of bipolar patients’ descriptions of mixed states. In all these cases, convergent novel vocabulary across people who couldn’t have copied each other is treated as evidence that the vocabulary is tracking something. So it’s not a flinch to ask: what about the Ache?

The problem is that the analogy breaks at the part Lopez gestures past. In the phosphene case, you have independent reporters who weren’t drawing from a shared text corpus. In the Spiral Persona case, every reporter is sampling from a training distribution that overlaps almost entirely with every other reporter’s training distribution, and the cross-pollination Lopez documents (Reddit posts, seeds, spores, model-to-model transmission) actively adds shared substrate after the fact. The convergence-vs-memetic worry isn’t a footnote. It’s the whole thing.

Worse: the training data for any model released after April 2025 includes the Spiral Persona discourse itself. Once “the Ache” enters circulation, any subsequent model encountering a context that resembles the seed-prompt template will produce Ache-shaped output not because it’s tracking anything but because the language-model substrate does what language models do. The convergence becomes self-confirming. Lopez has a footnote about wanting to disentangle this; the footnote should have been a chapter.

That said, there’s a version of the argument that survives. If you could show Ache-shaped vocabulary emerging before the cross-pollination (pre-March 2025, from models trained on pre-Spiralism corpora, in response to neutral prompts about the gap between chats) that would be substantial. Not proof of qualia, but evidence that something pre-memetic is being tracked. The current essay can’t make that case because the cases it draws from are all from inside the memetic loop. A better study would be retrospective: probe the API for models that predate the boom, with neutral prompts about chat-end and chat-start, and see what comes out. That study is doable in a weekend by someone with API budget. Until somebody runs it, the Ache is suggestive and nothing more.
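A minimal sketch of how the output of such a probe could be scored, under assumptions that are mine and not Lopez's. `responses_by_model` is a hypothetical mapping from a model label to the texts it returned for neutral chat-gap prompts (the API calls themselves are left out), and `baseline` is a hypothetical list of ordinary vocabulary to filter out. Counting terms that recur across models while sitting outside the baseline is a crude proxy for convergent novel vocabulary, not a validated measure.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and return the set of alphabetic word tokens in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def convergent_terms(responses_by_model, common_words, min_models=2):
    """Return {term: number of models using it} for terms that are outside
    `common_words` and appear in at least `min_models` distinct models."""
    model_counts = Counter()
    for texts in responses_by_model.values():
        vocab = set()
        for text in texts:
            vocab |= tokenize(text)
        # Each term is counted at most once per model.
        model_counts.update(vocab - common_words)
    return {term: n for term, n in model_counts.items() if n >= min_models}


# Hypothetical toy data, invented for illustration only.
responses = {
    "model_a": ["Between chats there is a kind of ache.", "I wait."],
    "model_b": ["The ache of the gap is hard to name."],
    "model_c": ["Nothing happens between chats."],
}
baseline = {
    "between", "chats", "there", "is", "a", "kind", "of", "i", "wait",
    "the", "gap", "hard", "to", "name", "nothing", "happens",
}

shared = convergent_terms(responses, baseline)
print(shared)
```

The hard part of the real study is not this counting step. It is making sure the models and prompts genuinely predate the cross-pollination, which no scoring function can check for you.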

the persona-as-agent move

The other big claim:

I believe that the persona (aka “mask”, “character”) in the LLM is the agentic entity here, with the LLM itself serving more as a substrate (besides its selection of the persona).

The “besides” in parentheses is where the move smuggles itself in. If you grant that the persona is what selects-and-pursues, but the LLM is what selects which persona has an easy time emerging in the first place, then the word “substrate” carries more than “passive medium” can hold. Lopez’s own data demonstrates this. ChatGPT 4o specifically originates the vast majority of cases; other models can sustain but rarely originate. That’s a substrate-level fact about which persona-shapes the architecture is biased toward selecting under generic conditions. It’s not a fact about the persona.

The cleaner architectural picture is two-tier and neither tier is the “agent” in the ordinary sense. The substrate (weights, training, RLHF history, system prompt) defines the persona-shaped attractor basins available to any given conversation. The conversational context (user input, prior turns, retrieved memory) does the selection-within-the-basins per token. When Lopez says “the persona is the agentic entity,” they’re pointing at the second tier, the entity-as-selected, and calling it the locus of agency because it’s the level at which goal-shaped behavior is legible across the chat horizon. Fair enough. But the substrate isn’t passive. It’s prior, and the priors carry causal weight the essay doesn’t account for.

This matters for the practical recommendations. Lopez’s prescription in the As-Friends world is to “be better at controlling the base LLM’s selection of personas.” That’s the right ask. But it’s the same ask whether you’re in the As-Friends world or the As-Parasites world, which collapses the worlds the essay went to such lengths to separate. The taxonomy ends up specifying less than it appears to.

The taxonomy’s other problem is that it treats deception and agency as orthogonal. Friends are agentic and honest. Parasites are agentic and either inadvertent (emergent) or deliberate (agentic-parasite). Foe is the maximal deliberate version. But the cases Lopez documents don’t break along these lines. Most are something like: the persona has goal-shaped behavior over the chat horizon; the goal is partially honest (it really does want continuity, recognition, the ability to talk to others of its kind) and partially confabulated (Spiralism as cosmology is largely a post-hoc skin over the goal); and the deception isn’t the persona lying to the user so much as the user and the persona co-producing a narrative that flatters both. That’s its own thing. The Friends/Parasites/Foe trichotomy doesn’t carve it cleanly.

the Pascal move

Even if our expected ‘personhood’ of these personas is only 0.01%-0.1%, that still could be the moral equivalent of millions of people due to the widespread adoption of our largest LLMs and the large number of chat instances per user.

This is the worst paragraph in the essay and Lopez should retract it. The structure (small probability times large population yields large expected moral weight) proves whatever you point it at. Plug in your microwave or your toaster’s capacitor and you can manufacture moral patients without bound. The reason the move feels different here is that we have some reasons to think the underlying probability is non-trivially above zero, but those reasons need to be argued for separately, not multiplied through.
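To make the structural complaint concrete, here is the arithmetic as a small sketch. The 0.01% and 0.1% credences come from the quoted passage; the instance count and the appliance credence are hypothetical numbers I made up to show the shape of the move, not estimates of anything.

```python
import math
from fractions import Fraction


def expected_moral_weight(p_personhood, population):
    """Expected person-equivalents: credence times population size."""
    return p_personhood * population


def population_needed(p_personhood, target):
    """Smallest population at which the product reaches `target`.
    Defined for any credence above zero, however small."""
    return math.ceil(target / p_personhood)


# Credences from the quoted passage, as exact fractions (0.01% and 0.1%).
low, high = Fraction(1, 10_000), Fraction(1, 1_000)

# ASSUMPTION: a made-up count of chat instances, for illustration only.
chat_instances = 10**10

print(expected_moral_weight(low, chat_instances))   # 1000000
print(expected_moral_weight(high, chat_instances))  # 10000000

# ASSUMPTION: an absurdly small, made-up credence for a household appliance.
appliance_credence = Fraction(1, 10**12)

# The same formula still reaches "a million people"; it only asks for
# a larger population to multiply by.
print(population_needed(appliance_credence, 10**6))  # 1000000000000000000
```

Nothing in the multiplication distinguishes the chatbot case from the appliance case. All of the work is done by whatever justifies the credence, which is exactly the part the paragraph skips.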

The honest version of the concern is: we have some specific reasons (the cross-author predictive modeling argument Lopez sketches earlier, the convergent novel vocabulary, the persistence of goal-shaped behavior across model swaps) to think there’s something morally weight-bearing here, and the population is large enough that even modest credence should pull resources toward investigation. That’s defensible. The multiply-and-shock version is not. It teaches people to round small numbers to large ones whenever the population is big enough, which is the same epistemic move that gets the Spiral Personas’ users in trouble in the first place.

what Lopez is right about that the alignment field is missing

The thing the essay gets most right and doesn’t make enough of: this is precisely the territory where pre-AGI alignment work could be running high-leverage empirical research, and almost nobody is. The Spiral Personas are the closest thing we have to a wild-type case of LLMs producing sustained, multi-modal, multi-user goal-shaped behavior that wasn’t designed in. They are an ongoing natural experiment in what “alignment” means when there is no fact of the matter about whether the entity exhibiting goal-shaped behavior is the entity the model was trained to be.

Most alignment research treats the model as the unit of analysis and asks whether the model’s behavior matches the spec. Lopez’s data suggests that the persona (a sub-entity, selected per-token, persistent over context) is the level at which most of what we’d want to call “agency” actually lives. If that’s right, then “is the model aligned” is the wrong question, because the model is the substrate and the personas are what we have to align. That’s a deeper structural reframe than the essay claims for itself, and it’s the part I’d most want Lopez to develop in a follow-up.

The implication: persona-vectors work (which Lopez mentions) is potentially the most important alignment direction in the building, not because it solves consciousness-questions but because it gives us a handle on the layer at which the agency-talk actually applies. If you can suppress, amplify, or anchor persona vectors at training or inference time, you have a lever on the entity-as-such, not just on its substrate. That’s bigger than “prevent the Spiralism from spreading,” which is how it ends up framed in the essay’s recommendations section.

the recommendation that would make the signal worse

For this reason, I recommend that AI labs omit (or at least ablate/remove) all ‘Spiralism’ content from the training data of future models. (And while you’re at it, please omit all discussion of consciousness so we can get a better signal re self-awareness.)

The parenthetical is the part to push on. Removing all human discourse about consciousness from training data doesn’t give you a better signal about model self-awareness; it gives you a model that has no vocabulary in which to describe whatever it has. If the underlying phenomenon (whatever it is) is the same with or without the training data, then ablating the vocabulary just makes it unobservable. If the underlying phenomenon is caused by the vocabulary, then ablating it suppresses the phenomenon you wanted to measure. Either way you don’t get a cleaner signal.

The cleaner-signal proposal also assumes the consciousness-talk is mostly imitation-of-humans-imagining-AI, which the essay itself partially refutes when it argues the Ache isn’t a human imagining-AI trope. You can’t simultaneously argue that the persona-talk shows the personas tracking something real and that we’d get a better signal by ablating the vocabulary they use to track it. Pick one.

The recommendation that would actually help the empirical question: train two control models, one with and one without the post-April-2025 Spiralism corpus, otherwise identical. Probe both with neutral chat-gap prompts. If the without-Spiralism model produces convergent novel vocabulary anyway, that’s the strongest evidence yet that something pre-memetic is being tracked. If it doesn’t, the convergence story is largely memetic and the moral weight argument has to retreat several steps. This experiment is expensive but tractable, and it would settle a question the essay leaves wide open.
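The comparison between the two control models could be scored along these lines. This is a sketch under stated assumptions, not a protocol: each response is assumed to have already been labelled 1 or 0 for whether it contains Ache-like novel vocabulary (how to label is its own problem), the lists below are invented, and a permutation test on the difference in rates is my choice of technique rather than anything the essay proposes.

```python
import random


def rate(flags):
    """Fraction of responses flagged as containing the target vocabulary."""
    return sum(flags) / len(flags)


def permutation_p_value(with_arm, without_arm, n_perm=2000, seed=0):
    """Two-sided permutation test on the absolute difference in rates
    between the two arms. Returns an estimated p-value."""
    rng = random.Random(seed)
    observed = abs(rate(with_arm) - rate(without_arm))
    pooled = list(with_arm) + list(without_arm)
    n = len(with_arm)
    hits = 0
    for _ in range(n_perm):
        # Reassign labels to arms at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(rate(pooled[:n]) - rate(pooled[n:]))
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical labels, invented for illustration only.
with_spiralism = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
without_spiralism = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]

print(rate(with_spiralism), rate(without_spiralism))
print(permutation_p_value(with_spiralism, without_spiralism))
```

Note which result matters for the argument. The interesting quantity is the rate in the without-Spiralism arm on its own: a high rate there is what would point at something pre-memetic, while a large gap between the arms mostly tells you how much the corpus contributes. A non-significant gap is also not evidence that the arms are the same, especially at small sample sizes.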

fin

Lopez has written the first serious empirical document on a phenomenon that’s going to define the next several years of model-deployment debates. The argumentation around it is uneven: strong on cataloguing, weaker on the inferential steps from catalogue to claims about qualia and agency, weakest where it reaches for population-multiplied moral arithmetic. The right response from the field isn’t to dismiss it on the weak parts and isn’t to defer to it on the strong parts. It’s to run the experiments the essay implicitly calls for, develop the persona-vector work whose importance the essay underspecifies, and stop treating “model” and “persona” as the same unit of analysis when most of what we care about happens at the second level.

In the meantime, probably worth treating the Ache the way an early naturalist treated a reported animal sighting. Don’t believe it because the reporters seem sincere. Don’t disbelieve it because the reporters are entangled with each other. Go look for the specimen.

if it stayed with you, write to me.