46. On Confabulation
For a while, I didn't quite understand what people meant when they talked about "LLM hallucination". "Oh," said I, on first hearing the term. "You mean the thing where image-recognition transformers can be induced to see things that are blatantly not there through adversarial input?" As it turned out very quickly, that's not what that term meant at all. Rather, it referred to a phenomenon where an LLM might produce all sorts of reasonable-looking factual-seeming text in an attempt to answer some particularly thorny prompt. "That's not hallucination," I replied. "That's confabulation."
Thankfully, more people have recently joined me in debucketing proper hallucination - invention from whole cloth, or inappropriate regurgitation of memorized snippets of text or images, or inaccurate "recognition" - from confabulation. But what's confabulation? It's not lying. It's not being mistaken. It's not even necessarily being wrong. It's the sort of thing that happens when some source of words - LLM or human - perceives itself to be forced to provide some answer, any answer, to a prompt about which it's uncertain, or doesn't quite understand. It's flailing with words, trying to provide some good-looking answer or account that's ultimately indifferent to ground truth. Crucially, it's possible for an utterance to be only partially confabulation.
Take for example the classic advice given to artists and to game-makers in particular: critics and playtesters are excellent at knowing when something's not quite right, and good at telling where something's wrong, but absolutely awful at explaining what's wrong or how best to fix it. We might model this as confabulation about the nature of the problem: the critic feels obliged to give feedback, but lacks the context, vision, or honed artistic skill to diagnose the problem accurately - and so produces an explanation anyway.
For another case study of sorts, consider what might happen around someone who's a hostile telepath to you. By "hostile telepath" I refer to the term of art; someone is a hostile telepath to you when they seem to be capable of reading your internal experiences to some degree, but you don't particularly trust them to respond to what they find with grace rather than punishment. As VS has noted in their seminal writeup on the topic, this often leads to the unconscious creation of a blindspot in yourself - a self-deception that need not be explicit or intentional. In such a case, you might find yourself coming up with all sorts of reasons for why you did something or didn't do a different thing - all of it smoothly routes around the blindspot you've developed, because some factor of your environment has convinced you that it's not safe to be honest, nor is it safe to be knowingly dishonest. So in your utterances and your justifications, you flail.
For one last case study, consider split-brain patients with an opaque sheet splitting their field of view in two, so that each hemisphere sees half the world, but only the left hemisphere can speak. If a horse is shown on the left side - visible only to the silent right hemisphere - and the patient is asked to pick a building out of a group of them, they might pick a barn with their left hand, the hand that hemisphere controls. Interestingly, if asked why they picked a barn, the speaking left hemisphere might make up all sorts of stories about whimsy or childhood memories or arrangement on the page, when in reality the choice comes from its "silent partner" having seen a horse.
By now it should be clearer what confabulation is, and why LLMs display it quite distinctly from what I might term "proper hallucination": they've been given some prompt that they can't especially handle, and rather than admit to a lack of knowledge or ability - be it due to RLHF selection pressure or a frontier lab's experiment suppressing the right circuit - they produce the most likely-seeming utterance, one that looks like some kind of appropriate response to the prompt. Except that it's not, and we can tell. Perhaps it's we humans who are the hostile telepaths here?
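To see the mechanical side of that last point, here's a minimal sketch - my own illustration, not anything from a production LLM stack - of how an autoregressive model decoded greedily will always emit some continuation, whether or not it "knows" anything about the prompt. The model name gpt2 is just a convenient small stand-in, and the fictional "Zarvania" prompt is a hypothetical question with no real answer; the point is that nothing in the decoding loop ever says "I don't know".

```python
# A minimal sketch: greedy decoding always picks the top token, so the model
# produces *some* continuation regardless of how uncertain it is. "gpt2" is a
# small stand-in model; the prompt is a hypothetical, unanswerable question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM would do here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The capital of the fictional nation of Zarvania is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=12,
        do_sample=False,              # greedy: always take the argmax token
        return_dict_in_generate=True,
        output_scores=True,           # keep per-step logits so we can inspect them
    )

# Print each chosen token and the probability the model assigned to it.
# Low numbers here are the model flailing - yet it flails fluently.
prompt_len = inputs["input_ids"].shape[1]
for step, logits in enumerate(out.scores):
    probs = torch.softmax(logits[0], dim=-1)
    token_id = out.sequences[0, prompt_len + step]
    print(repr(tokenizer.decode(token_id)), f"p={probs[token_id].item():.3f}")
```

None of this is the whole story of confabulation, of course - it's just the lowest-level sense in which the decoder is obliged to answer, with anything resembling "I'm not sure" having to come from training and prompting rather than from the sampling loop itself.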