52. Why Infrabayesian Epistemics Should Permit Winning At Causally-Weird Decision Theory Puzzles

(Epistemic status: The butterflyest of butterfly ideas. Buried lore that I dug up from a handful of places; pieces that pretty much only I would think to put together, if pieces they are. I'm thus writing this up so it doesn't get lost. This might be incomprehensible or otherwise not hold together too well... but this smells like there's something here to fit together or find. Heck if I quite know what.)

(With thanks to MB, JSW, MM, and CRW.)

Background Fact 1: Infrabayesian probability theory is something like a flavor of imprecise probability, where our expectation value operator takes a minimum over a set of distributions instead of an average over a single one, and where we explicitly prune away impossible possible worlds. Importantly, it has us weaken Kolmogorov's second probability axiom to \(p(S) \leq 1\) - the probability that anything we thought possible happens is at most 1, not exactly 1.
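To make that concrete, here's a minimal toy sketch of the worst-case expectation operator (the names and setup are mine for illustration, not standard infrabayesian notation): represent a credal set as a list of possibly sub-normalized distributions and take the minimum expectation over it.

```python
# Toy sketch: an infra-style expectation as a minimum over a credal set.
# The distributions may sum to less than 1 (the weakened second axiom).
# Names here are illustrative, not standard infrabayesian notation.

def infra_expectation(credal_set, f):
    """Worst-case expectation of f over a set of sub-distributions.

    credal_set: list of dicts mapping outcomes to weights, each
                summing to at most 1.
    f:          function from outcomes to reals.
    """
    return min(sum(p * f(x) for x, p in dist.items()) for dist in credal_set)

# Two sub-distributions over {0, 1}; note the second sums to only 0.9.
credal = [{0: 0.5, 1: 0.5}, {0: 0.6, 1: 0.3}]
utility = lambda x: float(x)

print(infra_expectation(credal, utility))  # min(0.5, 0.3) = 0.3
```

The minimum (rather than an average) is what gives the operator its worst-case, adversarial character.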

Background Fact 2: Bayes nets and general causal inference seem like powerful tools for modeling, understanding, and predicting aspects of the world; most recently, the development of the theory of natural latents has built on this foundation. In the string-diagram presentation, these are directed graphs where the edges (wires) are variables and the vertices (boxes) are morphisms - specifically, Markov kernels. In Markov categories, which are a kind of monoidal category where you can copy and discard information, we also require the condition that applying a morphism and then deleting the output is equivalent to deleting the input. According to the nLab, this condition corresponds, classically, to the fact that all probabilities sum to 1, and, information-theoretically, to the fact that reading an input, processing it, and destroying the output is equivalent to destroying the input.
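A quick toy check of that discard condition (my own illustrative code, not anything from the literature): a Markov kernel on finite sets is a row-stochastic matrix, and "process then discard equals discard" amounts to each row summing to 1, so pushing a distribution through the kernel preserves total mass.

```python
import numpy as np

# Toy illustration: a Markov kernel on a two-element set, written as a
# row-stochastic matrix. The Markov-category discard condition says that
# applying the kernel and then deleting the output is the same as
# deleting the input - equivalently, each row sums to 1, so total
# probability mass is preserved by the push-forward.

kernel = np.array([[0.7, 0.3],
                   [0.2, 0.8]])          # each row sums to 1

p_in = np.array([0.4, 0.6])              # input distribution
p_out = p_in @ kernel                    # push forward through the kernel

print(p_in.sum(), p_out.sum())           # both 1.0: discarding either side agrees
```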

Observation A: There sure are lots of weird decision theory puzzles that mess around with states of information, or information that's of negative value, or interactions that appear to violate causality in some exciting way. Newcomb's dilemma is the classic example of the genre. I wonder what's up with that?

Observation B: I've read the claim that infrabayesian reinforcement learning could converge to optimal policies for Newcomblike problems, as well as read a paper or two describing how Markov decision processes using imprecise probabilities might feasibly do so - in a way that classical RL is known to be incapable of. Unfortunately, the leading infrabayesian theory researchers I've talked to have been bizarrely uninterested in actual implementational details here, sticking to proofs about regret bounds instead and shrugging off the whole question as either uninteresting or already solved.

So then - what are we to make of applying infrabayesian probability theory to Markov kernels? After all, a lot of existing IB work talks about POMDPs, which are built out of them! We should get some infrabayesian analogue of Markov kernels out of this, and likewise, if we decided to do something like causal inference using infrabayesian epistemics, we should expect to end up building something like a Bayes net out of them. But recall (1): our probabilities don't necessarily sum to 1. By (2), this likely means the discard condition fails, so we should be on the lookout for places where refusing to learn some information is not equivalent to learning it, doing something internally with it, and then discarding the output without conveying it to anyone else - or where running some computation and deleting the output is inequivalent to never running the computation to start with.
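One way to see that inequivalence concretely (again a toy of my own, not anything from the IB literature): make the kernel sub-stochastic, so some rows sum to less than 1, as the weakened \(p(S) \leq 1\) axiom permits. Then "run the computation and delete the output" loses mass relative to "never run the computation," and the two situations become distinguishable.

```python
import numpy as np

# Toy: a sub-stochastic "kernel" whose first row sums to less than 1,
# as the weakened p(S) <= 1 axiom permits. Now processing an input and
# discarding the output is NOT the same as discarding the input: total
# mass drops, so the two situations can be told apart.

sub_kernel = np.array([[0.7, 0.2],       # row sums to 0.9
                       [0.2, 0.8]])      # row sums to 1.0

p_in = np.array([0.4, 0.6])
p_out = p_in @ sub_kernel

print(p_in.sum())    # 1.0  - mass if we never run the computation
print(p_out.sum())   # 0.96 - mass after running it and deleting the output
```

The discard condition failing is exactly the gap between those two totals, which is the kind of place I'd start looking for causally-weird behavior.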

Unfortunately, this is where I lose the scent - I'm not sure whether any of this coheres that well so far, and I'm also not confident that the connection between the information-theoretic interpretation of the weakening of K2 and the claim that IBRL could converge to winning policies for Newcomb's problem and the like holds up all that well. Even if that part worked, IB isn't at all the only promising version of imprecise probability theory where probabilities might sum to less than 1 - Dempster-Shafer theory, for instance, shares this trait. All the same, if I wanted to argue for why IB should, morally-in-the-math-sense, permit decision theories using it to have stronger policy convergence guarantees than classical RL, this is how I'd be waving my hands.
