31. Why Yoshua Bengio's Non-Agentic Scientist AI Idea is Unlikely to Work Out Well

(With thanks to JSW for the nudge.)

I read the paper, and I don't think that Yoshua Bengio's Non-Agentic Scientist (NAS) AI idea is likely to work out especially well. Here are a few reasons I'm worried about LawZero's agenda, in rough order of where in the paper my objection arises:

  • Making the NAS render predictions as though its words will have no effect, and/or as though it didn't exist, means that its predictive accuracy will be worse. Half the point here is for humans to take its predictions and use them to some end, and that use will have effects; the NAS as described can only make predictions about a world it isn't in (see the first sketch after this list). More subtly, the NAS will make its predictions and give its probability estimates with respect to a subtly different world - one where the power it uses, the chips comprising it, and the scientists working at its lab are all doing something else; this has impacts on (e.g.) the economy and the weather (though possibly for the economy they even out?).
  • If the NAS is dumber than an agentic AI, the agentic AI will probably be able to fool our NAS about the purpose of its actions. Wasn't the hope here for the NAS to give advance warning of what an agentic AI might do?
  • A NAS as described would not do much about the kind of spoiler who would release an unaligned agentic AI. A lot of other plans share this problem, admittedly, but I think it's worth noting that explicitly.
  • Likewise, arms-race dynamics mean that a NAS is not where any nation-state or large corporate actor would want to stop. In particular, I think it's worth noting that nothing parallel to SALT is likely to arise here - an AGI would be a massive economic boost to whoever controlled it for however long they controlled it; it wouldn't just be an existential threat to keep in one's back pocket.
  • "...using unbiased and carefully calibrated probabilistic inference does not prevent an AI from exhibiting deception and bias." (p22)
  • I'm suspicious of the use of purely synthetic data; it runs the risk of overfitting to some unintended pattern in the generated data, or of the synthetic data missing some important, messy, implicit aspect of the real world.
  • It's not at all clear that there should be a "unique correct probability" in the case of an underspecified query, a query that draws on unknown or missing data, or something like an economic prediction where the announced probability itself affects the outcome (the second sketch after this list makes this concrete). In a similar vein, it's not clear how the NAS would generate or label latent variables, or that those latent variables would correspond to anything human-comprehensible.
  • Natural language is likely too messy, ambiguous, and polysemantic to pose nice, clean, well-defined queries in.
  • Reaching the global optimum of the training objective (that is, training to completion) is already fraught - for one, how do we know that we got there, rather than to some faraway local optimum that's nearly as good? Additionally, elsewhere in the paper (p35?), it's mentioned that we only aim at an approximation of the global optimum.
  • It seems plausible to me that a combination of Affordances and Intelligence might give rise to a Goal of some kind, or at least to Goal-like behavior.
  • Even a truly safe ideal NAS could (p27) be a key component of a decidedly unsafe agentic AI, or a potent force-multiplier for malfeasant humans.
  • The definition of "agent" as given seems importantly incomplete. The capacity to pick your own goals feels important; conversely, acting as though you have goals should make you an agent, even if you have no explicit or closed-form goals.
  • Checking whether something lacks preferences seems very hard.
  • Even the mere computation of probabilistic answers is fraught. Even if we dodge the problem of self-dependent predictions by making the NAS blind to its own existence and effects - itself a dubious move - I doubt that myopia alone will suffice to dodge agenticity; the NAS could (e.g.) pass notes to itself via effects it (unknowingly?) has on the world, which then get fed back in as data for the next round of analysis.
  • The comment about "longer theories [being] exponentially downgraded" makes me think of Solomonoff induction. It's not clear to me what language (equivalently, what prior or reference Turing machine) we'd pick to express the theories in, and it seems to me that that choice matters a lot (see the third sketch after this list).
  • I'm not happy about the "false treasures" thing (p33), nor about the part where L0 currently has no plan for tackling it.
  • It's not clear what the "human-interpretable form" for the explanations (p37) would look like; also, this conflicts with the principle that the only affordance that the NAS has should be the ability to give probability estimates in reply to queries.
  • Selection on accuracy (p40) seems like the kind of thing that could, deep-deception-style, cause us to end up with an agentic AI despite our best efforts.
  • The "lack of a feedback loop with the outside world" seems like it would result in increasing error as time passes.

As a meta point, this seems like the most recent in a long line of "oracle AI" ideas. None of them has worked out especially well, not least because humans see money and power to grab by making more agentic ML systems instead.

Also, not a criticism, but I'm curious about the part where (p24) we want a guardrail that's "...an estimator of probabilistic bounds over worst-case scenarios" and where (p29) "[i]t's important to choose our family of theories to be expressive enough" - to what extent does this mean an infrabayesian approach is indicated?
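
(For concreteness, the kind of object I'm picturing there - my guess at the shape of such an estimator, in standard imprecise-probability notation, not anything LawZero has committed to - is an upper bound on the probability of harm taken over a whole family $\mathcal{C}$ of candidate theories rather than under a single posterior,

$$\overline{P}(\text{harm}) \;=\; \sup_{P \in \mathcal{C}} P(\text{harm}),$$

which is roughly the worst-case-over-a-set move that infrabayesianism makes, hence the question.)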
