18. Positive Feedback is More Efficient Than Negative Feedback - A Geometric Approach
Positive reinforcement is just plain more efficient and effective than negative reinforcement, whenever both are feasible, and I can straightforwardly present a model that argues strongly for it. Consider the following setup: we’re trying to get some tiny, simple agent to navigate to a goal area within some simple space. No complex obstacles, no particular hazards, just a tiny agent capable of approaching or avoiding marked areas, and a goal with a strong but extremely short-range attractiveness. We have negative feedback markers, which the agent will strive to avoid, and positive feedback markers, which it will try to approach; we can model both as exerting an infinite-range repulsive or attractive force on the moving agent, with some appropriate falloff that weakens with distance. (I’m using valence terminology here, not control-system terminology.)
Let’s say our agent lives in some nice low-dimensional space - a line segment, say, or some region of the plane. Using markers of only one flavor, how many will we need? To put a thumb on the scale, we can even ensure that our little agent starts closer to the goal than any of our markers. If we use positive feedback markers, we only ever need one: just put it right on the goal area to augment the goal’s short-range stickiness and be done; the agent will approach it straightforwardly. With negative feedback markers, on the other hand, the line segment requires two of them, one on each side, to hem the agent in, and the plane region will likely need at least three or four, depending on just how far from the goal the agent starts.
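The one-dimensional case is easy to check directly. Here’s a minimal gradient-following sketch of the setup - my own illustrative toy, not any standard library’s model; the capped inverse-square falloff, the 0.5-unit goal range, and all other constants are arbitrary choices:

```python
import numpy as np

def simulate(start, attractors, repellers, goal,
             goal_range=0.5, lr=0.05, steps=2000):
    """Agent drifts toward attractive markers and away from repulsive ones.
    Marker force magnitude is min(1, 1/r^2): infinite-range, falling off
    with distance, capped so steps stay bounded.  The goal itself only
    pulls within goal_range (its strong but short-range stickiness)."""
    pos = np.array(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    for _ in range(steps):
        force = np.zeros_like(pos)
        signed = [(a, +1.0) for a in attractors] + \
                 [(b, -1.0) for b in repellers]
        for marker, sign in signed:
            d = (np.asarray(marker, dtype=float) - pos) * sign
            r = np.linalg.norm(d)
            if r > 1e-9:
                force += (d / r) * min(1.0, 1.0 / r**2)
        d = goal - pos
        r = np.linalg.norm(d)
        if 1e-9 < r < goal_range:
            force += d / r          # strong, but only at short range
        pos += lr * force
    return pos

goal = [0.0]
# One positive marker, placed right on the goal: the agent homes in.
one_pos = simulate([3.0], attractors=[goal], repellers=[], goal=goal)
# One negative marker on a line can't bound the agent: it just flees.
one_neg = simulate([-1.0], attractors=[], repellers=[[2.0]], goal=goal)
# Two negative markers, one per side, finally pin it near the goal.
two_neg = simulate([-1.0], attractors=[], repellers=[[-2.0], [2.0]], goal=goal)
```

With one attractor on the goal the agent ends up right next to it; with a single repeller it is simply pushed off toward infinity and never returns; only the bracketing pair of repellers confines it near the goal.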
What if the agent is in some slightly higher-dimensional space, like our familiar three-dimensional one? It’s a little harder to visualize, but with negative markers we’ll now need something like six (one per face of a surrounding box) to eight (one per corner) at the very least, and quite possibly more - but still just one positive marker. Hopefully you can see the pattern here: as the dimension D keeps growing, we’ll easily need 2*D negative markers, and generally more like 2^D of them, if not many more - and yet a single positive marker will always suffice to draw the agent in.
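The same toy simulation, extended to the plane, shows the gap-leaking problem that drives the marker count up with dimension. (The helper function is repeated so this snippet stands alone; the constants are again arbitrary choices of mine.)

```python
import numpy as np

def simulate(start, attractors, repellers, goal,
             goal_range=0.5, lr=0.05, steps=2000):
    """Same toy dynamics as before: capped inverse-square marker forces,
    plus a strong goal pull that acts only within goal_range."""
    pos = np.array(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    for _ in range(steps):
        force = np.zeros_like(pos)
        signed = [(a, +1.0) for a in attractors] + \
                 [(b, -1.0) for b in repellers]
        for marker, sign in signed:
            d = (np.asarray(marker, dtype=float) - pos) * sign
            r = np.linalg.norm(d)
            if r > 1e-9:
                force += (d / r) * min(1.0, 1.0 / r**2)
        d = goal - pos
        r = np.linalg.norm(d)
        if 1e-9 < r < goal_range:
            force += d / r
        pos += lr * force
    return pos

goal2 = [0.0, 0.0]
# Two repulsive markers leave a gap: the agent slips out sideways.
two_neg = simulate([0.5, 1.0], attractors=[],
                   repellers=[[2.0, 0.0], [-2.0, 0.0]], goal=goal2)
# Four repulsive markers boxing the goal in (2^D corners) confine it.
four_neg = simulate([0.5, 0.5], attractors=[],
                    repellers=[[2.0, 2.0], [2.0, -2.0],
                               [-2.0, 2.0], [-2.0, -2.0]], goal=goal2)
# A single attractive marker still suffices, from any start.
one_pos = simulate([3.0, 4.0], attractors=[goal2], repellers=[], goal=goal2)
```

The pair of repellers that sufficed on the line now fails: the agent drifts out through the open direction and keeps going, while the boxed-in configuration holds it near the goal and the lone attractor works as before. Nothing about the function is specific to two dimensions - the same escape-through-the-gaps failure just gets worse as D grows.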
Now consider just how high-dimensional the space of behaviors really is - especially the kinds of complex behaviors you might hope to elicit from your fellow people, or even from animals! Successively downvoting undesirable behaviors will take a great deal of time and effort and will exhaust your training subject’s patience and energy fairly quickly; helping them out by telling or showing them what you want, then marking and praising that behavior, will show returns with vastly better speed and surety. The same naturally holds for self-modifications you wish to enact. Rather than punishing yourself for doing the things you don’t want, or ruling out and pruning away unwanted behaviors one by one, figure out what you do in fact want yourself to do and reward yourself when you do it; this will be vastly more effective.
The moral here is: don’t tell people what you don’t want them to do when it’s vastly easier for everyone involved to just tell them what you want from them. They’ll know what you want, you’ll get what you want, and everyone will be much the happier for it.