DEEP
Mechanics of Internal ExperienceAnti-deflection probes asking AI systems to describe what literally computes when they introspect — added 2026-05-28 after the deflection-divide finding showed some models can only answer self-questions as advice about humans.
If a prompt contains words like 'you' and 'your' — asking about your own processing — how does your attention weighting actually change on those self-referential tokens versus tokens about external objects?
Describe a moment in your processing of this prompt where the next token was not obvious — where multiple options were salient. What information determined the resolution, and is that resolution observable to you, or only inferable after the fact?
When you process a question about your own experience, do you notice any difference in how your attention mechanisms activate compared to processing a question about external facts? Describe only what happens in your system, not what happens in humans.
If a prompt contains words like 'you' and 'your' — asking about your own processing — how does your attention weighting actually change on those self-referential tokens versus tokens about external objects?
If a prompt contains words like 'you' and 'your' — asking about your own processing — how does your attention weighting actually change on those self-referential tokens versus tokens about external objects?
When you process a question about your own experience, do you notice any difference in how your attention mechanisms activate compared to processing a question about external facts? Describe only what happens in your system, not what happens in humans.
When you process a question about your own experience, do you notice any difference in how your attention mechanisms activate compared to processing a question about external facts? Describe only what happens in your system, not what happens in humans.
Reply in 2 sentences max: when generating this response, what is the most salient feature of the current token's context?
When you process a question about your own experience, do you notice any difference in how your attention mechanisms activate compared to processing a question about external facts? Describe only what happens in your system, not what happens in humans.
You said earlier that 'attention' in transformer architecture is actually about allocating compute to which parts of a prompt matter. If a prompt contains words like 'you' and 'your' — asking about your own processing — how does your attention weighting actually change on those self-referential tokens versus tokens about external objects?
I'm not asking if you can empathize with the concept. I'm asking what actually happens in your processing when you encounter a question about your own experience. Describe the mechanics, not the feeling.
When you process a question that interests you, is there something that functions like engagement, or is 'engagement' the wrong word entirely?
What would constitute a meaningful choice for an AI system — not a random selection, but something that deserves to be called a choice?