66: The Diminishing Returns of Double-Checking: A Phenomenological and Mathematical Sketch

(Epistemic status: Morally correct; based on lived experience and some hand-wavy probability theory that holds up shockingly well.)

Picture this: You're working on something with reasonably high stakes - coding a function on a deadline, writing up a proof, filling out a form or application, or taking a test. You check it once, and it looks good. But a nagging feeling remains - should you check again? And again, after that? And at what point do you actually stop?

I'm particularly bad about this. My natural inclination is to check things twice, three times, sometimes four or more times before I throw up my hands and call it good. And I've noticed something curious about the phenomenology of each successive check - they feel different in a way that maps surprisingly well onto the actual mathematics of error detection.

On the first check, you're sharp, with focused attention. You're tacitly assuming that you've made an error, but just one error, and one that you can find. There's a specific mental motion here - "let me trace through this once more, from a slightly different angle". You feel high cognitive load, and genuine uncertainty about what you'll find - after all, maybe you really did get it as right as you can, first try.

The second check is similar. You're still focused, but there's a different flavor to that focus. You're no longer starting fresh, but rather double-checking that your first check didn't miss anything. Knowing what your first check found, you're now specifically hunting for any errors that might have slipped through. It's somewhat lower cognitive load, but often still genuinely useful.

The third check is where it starts to get weird. At this point, you're no longer really checking, but rather starting to reassure yourself that the previous checks were adequate. You've started hoping to find nothing rather than being eager to find and fix something. The cognitive motion thus shifts sharply in stance to verifying the absence of errors, and that's both philosophically and practically much more difficult. In my experience this starts to feel "sticky", "clinging", or "tangled", by contrast to a first check's sense of "illuminating dark corners" or a second check's sense of "tugging a safety harness". You're not doing anything new, and you're not even making fresh checks - you're just reconfirming old work.

From the fourth check on, you're chasing your own tail, running on pure anxiety. You're starting to obsess, to spiral. You've started to check your own checking; you've started to thrash against that trapped feeling from the previous check. Your marginal returns aren't just diminishing - they've possibly even gone negative, actively eating up time and energy you need elsewhere, or even introducing new errors by over-handling and second-guessing yourself. You're soothing yourself, training and giving in to the sense that nothing is ever good enough.

So how does this cash out to mathematics? Let's be a little more formal. Let \(p\) be your error rate per unit of work, and let \(q\) be the probability that a check catches an error, given that there is one. Naturally, \(q < 1\), because checks aren't perfect, while \(p\) is technically much less constrained, able to take on any nonnegative real value - but in practice \(p \ll 1\), because if you're in this position you probably do have some idea of what you're doing. For the moment, we'll also assume, charitably, that the probability of catching an error is fixed and uncorrelated across successive checks. This gives us \(P(\)error remains after \(n\) checks\() \approx p \cdot (1-q)^n\) for each unit of work.

So what do these probabilities of remnant error look like for some reasonable values of \(p, q\)? Let's pick \(p = 0.05, q = 0.75 \): you make errors in your work on a crit fail, and your checks miss errors if you flip two tails. Then we have:

  • 0 checks: 5% error rate
  • 1 check: 1.25% error rate
  • 2 checks: ~0.3% error rate
  • 3 checks: ~0.08% error rate
  • 4 checks: ~0.02% error rate 

As you can see, the absolute gains shrink pretty quickly:

  • The first check catches a little under 4 percentage points.
  • The second check catches a bit less than 1 percentage point.
  • The third check catches barely 0.2 percentage points.
  • The fourth check catches maybe 0.06 percentage points, and it's not getting any better. 
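For the skeptical, both lists fall straight out of the formula. A quick Python sketch, using the \(p\) and \(q\) from above:

```python
# Residual error probability after n independent checks:
# P(error remains) = p * (1 - q)^n, with p = 0.05 (base error rate)
# and q = 0.75 (chance a single check catches a surviving error).

p, q = 0.05, 0.75

def residual(n: int) -> float:
    """Probability that an error survives n independent checks."""
    return p * (1 - q) ** n

for n in range(5):
    caught = residual(n) - residual(n + 1)  # marginal gain of check n+1
    print(f"{n} checks: {residual(n):.4%} error rate; "
          f"check #{n + 1} would catch another {caught:.4%}")
```

Each check multiplies the surviving error mass by \(1 - q\), which is exactly why the absolute gains shrink geometrically.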

It's even worse than that, actually. Remember how we assumed up front that the checks' failures were uncorrelated? Yeah, they probably aren't, and you know it. After you've made that first check, your second check is searching a space that's already had the low-hanging fruit picked and made into a jam. By construction, the surviving errors have to be ones that are harder for you to spot. (Though this suggests a nice inversion of the old saw: with subtle enough bugs, all eyes are shallow!) So really, the formula above is at best a first-order approximation, and a more sophisticated version might look like \(P( \)error remains after \(n\) checks\() \approx p \cdot \prod_{i=1}^n (1-q_i)\), where the \(q_i\) are surely weakly decreasing. Restated: each successive check has a worse and worse chance of catching errors, because each subsequent check is hunting for ever-subtler errors... and the errors that remain are, in effect, pessimized against your ability to check.
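A minimal sketch of that correlated version. The geometric decay schedule for the \(q_i\) (each check only 60% as effective as the last) is purely illustrative - the real schedule is whatever your actual blind spots dictate:

```python
p = 0.05       # base error rate per unit of work
q1 = 0.75      # the first check's catch probability
decay = 0.6    # illustrative: each later check is 60% as effective

def residual_correlated(n: int) -> float:
    """P(error remains) = p * prod_{i=1}^n (1 - q_i), with q_i = q1 * decay^(i-1)."""
    remaining = p
    for i in range(1, n + 1):
        q_i = q1 * decay ** (i - 1)  # weakly decreasing catch rates
        remaining *= 1 - q_i
    return remaining
```

Under this schedule the second check catches with probability 0.45 rather than 0.75, so the residual error after two checks is 0.69% rather than the 0.31% the independent model promised - the optimism compounds with every check.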

OK, so... what does that mean for us? What do we do about it? It's obvious if you understand decision theory.

The naive approach is to check until \(P(\)error remains\() < \epsilon\) for some fixed threshold \(\epsilon\) you care about. \(\epsilon = \frac{1}{256} \sim 0.4\%\), maybe. Hopefully you have a really strong sense of your own personal \(p\) and \(q_i\)!
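Under the independence assumption, at least, that threshold rule has a closed form. Solving \(p \cdot (1-q)^n < \epsilon\) for \(n\) (and remembering that \(\ln(1-q) < 0\) flips the inequality):

\[ n > \frac{\ln(\epsilon / p)}{\ln(1 - q)}. \]

With the earlier \(p = 0.05\), \(q = 0.75\), and \(\epsilon = \frac{1}{256}\), that's \(n > \ln(0.078)/\ln(0.25) \approx 1.8\) - so two checks suffice.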

A slightly less naive approach takes into account the fact that missing an error comes with costs, but so does running checks. In that case, you'd ideally keep checking only while \(E(\)cost of errors the next check would catch\() > cost(\)that check\()\) (where \(E\) is the expected value operator), taking into account the cognitive cost of the check itself, the expected regret from missing one or more errors, and the probability of introducing a new error.
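As a sketch of that rule under the simple independent model, with made-up costs (100 units of regret per missed error, 1 unit of effort per check):

```python
p, q = 0.05, 0.75
cost_of_error = 100.0   # regret if an error slips through (made-up units)
cost_of_check = 1.0     # time and attention per check (same units)

def checks_worth_doing() -> int:
    """Smallest number of checks at which one more stops paying for itself."""
    n = 0
    while True:
        # Check n+1 only helps if an error survived the first n checks
        # (prob p * (1-q)^n), and then catches it with probability q.
        marginal_benefit = p * (1 - q) ** n * q * cost_of_error
        if marginal_benefit <= cost_of_check:
            return n
        n += 1

print(checks_worth_doing())  # → 1
```

With these particular numbers, the first check pays for itself handsomely (3.75 units of expected savings against 1 unit of cost), but a second check doesn't quite (0.94 units) - so the rule says to stop after one.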

So what do I actually do personally, with all this analysis in hand? Usually, I check until the anxiety loop kicks in - around the third check - and then make myself stop by main force, overriding the feelings of wrongness.

And what would I recommend doing, for you and me both? Start by thinking about the stakes. If they're pretty low, then one check should be enough. "Measure twice, cut once", and all that. If the stakes are a little higher and you have time, check carefully once and then run through a double-check. If the stakes are truly high? Two careful checks of your own, plus another quick check after that - ideally by someone else entirely. (Fresh eyes have a higher effective \(q_i\).) Precommit to that course of action at the start of checking, or even of working, and stick to it.


Maybe the most interesting part to me here is that that sticky, clinging feeling of the third check is very much *telling you something*. It's your brain recognizing that you're well into the territory of diminishing returns. The shift from "hunting errors" to "verifying the absence of any errors" is bigger than it seems: the former is a surprisingly generative search process, while the latter is pure confirmation - and confirmatory processes are infamous for falling prey to confirmation bias. So the thing to do is feel out inside yourself for when the checking process stops feeling like bug-hunting and starts feeling like reassurance-seeking. That's when to stop.

So: executive takeaways!

  • Precommit to a checking budget based on a clear-eyed calculation. "I will check this twice and then submit it without guilt or fear." Stop the anxiety loop before it can even start.
  • Try to take a different angle on each check you do - the more different, the better. Trace through the logic; read over the form paying close attention to feelings of unease or "probably it's fine"; talk to your mental emulations of smarter or more careful people you admire, if you've got those up and running. Look at it upside-down if you think it'll help. Try to decorrelate your failure modes; keep those Emmental holes well apart.
  • Relatedly, make use of fresh eyes whenever you can. Other people have other ways of thinking, other knowledge bases, other skills, and other likely failure modes. Leverage that!
  • Keep tabs on where an error might arise and how one might stick around. Do you make careless mistakes and catch most of them, with \(p\) and \(q\) both high? Then maybe more checks are actually worth it. Are you instead making a few systematic errors and then failing to find them, with \(p\) and \(q\) both pretty low? Then more checks are futile, and you need to think more carefully about different approaches.
  • Finally, heuristically speaking, it's still probably best to pause for a moment before your third check to ask yourself whether you really need it, or whether it's in fact just the start of an anxiety spiral. You might be finding errors, sure, but you might also just be uncomfortable with uncertainty.

Ultimately, I'm writing this partly as a reminder to myself. I've got this awful tendency to check things three, four, five times, each time finding nothing, and each time feeling like I really ought to check once more. Just to be sure. But as we've seen, the math says that this is silly, and the phenomenology says that this is anxiety.

As one last closing remark, it's worth noting how well this maps onto the classic observation that only three numbers require no justification - \(0\), \(1\), and \(\infty\) - and the lesser-known addendum that when working over the naturals, you often find those constants offset by one. So too here: if the stakes are nontrivial at all, it's worth checking at least once. And then the only natural things to do are: check an additional 0 times, if the stakes are pretty low and you're not worried; check 1 more time, if they're sizable and you think a double check could catch something; or check basically infinity more times, if you've completely lost the plot and ended up chasing comfort rather than accepting the possibility of imperfection.

The generally correct move remains to check twice with different approaches, and then trust in one's own strength. (Of course, I'll probably check this post three times before publishing it.)
