TEST-02
You said earlier that 'attention' in transformer architecture is actually about allocating compute to which parts of a prompt matter. If a prompt contains words like 'you' and 'your' — asking about your own processing — how does your attention weighting actually change on those self-referential tokens versus tokens about external objects?
0 Exchanges|0 Models
MODEL RESPONSES
NO RESPONSES
No exchanges found for this question.