It can be kind of fun, in a self-flagellating way, to read about cognitive biases like the availability heuristic or the Dunning-Kruger effect, or just to browse through big lists like this one. If only our brains could take into account all the information they have, process it instantly, and store it forever! But machine learning engineers know that bias is just one kind of error, the error of making systematically wrong choices because a model doesn't fit the data as well as it could, and that there's another kind of error, called variance, that comes from a model fitting its data too well. It's kind of surprising that this is even possible! Here's how that might happen:
One job that banks train computers to do is to guess which of the many incoming credit card purchases are fraudulent, so that the card owner can be alerted right away. Two variables the computer could take into account are the size of the purchase (perhaps most fraudulent purchases are unusually large) and the location of the purchase (fraudulent purchases are often made far from the card owner’s usual stomping grounds). To practice, the bank might feed the computer a large collection of individual purchases, with each one tagged as fraudulent or not so that the computer can start to learn which is which:
The computer might start by trying to draw a straight line that separates the good from the bad—here’s what it might come up with:
This model has some bias: it will systematically miss the fraudulent purchases in the northwest and southeast corners of the plot, and it will accidentally flag a lot of the legitimate purchases in the middle. Another way of putting it is that this model is underfit to the data: making the model a little more complex, say by allowing the dividing line to be a circular arc instead, can help it fit the data better:
But why stop there? If we let the curve get more complicated, we can separate the good from the bad data points more and more exactly:
This squiggly model gets a perfect score: 100% correct! Somehow we know that’s not right. Why not, if it fits the data well?
Because the true test of a computer model isn’t how well it fits the data you fed it, but how well it fits data it’s never seen before. The squiggly model does great with its limited set of data, but if you start adding fresh data points it will get a lot of them wrong: it has overfit the training data, and has a lot of error from variance (dependence on what exactly the training data points are). In contrast, the circular arc divider does pretty well at avoiding both bias and variance; it is neither overfit nor underfit.
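The contrast above can be sketched in a few lines of code. This is only an illustration with made-up numbers: the purchase data is synthetic, the gaussian parameters and the distance threshold are invented, and I'm using a 1-nearest-neighbor "memorizer" as a stand-in for the squiggly curve (it fits its training data perfectly by construction) and a crude distance-from-typical rule as a stand-in for the circular arc:

```python
import math
import random

random.seed(0)

def make_purchases(n):
    """Generate n synthetic purchases as ((size, distance), label) pairs,
    where label 1 means fraudulent. Legitimate purchases cluster around
    small sizes made close to home; fraudulent ones skew large and far,
    with enough overlap that no simple rule can be perfect.
    All of the numbers here are made up for illustration."""
    data = []
    for _ in range(n):
        if random.random() < 0.5:  # legitimate
            point = (random.gauss(50, 30), random.gauss(10, 15))
            data.append((point, 0))
        else:  # fraudulent
            point = (random.gauss(120, 60), random.gauss(80, 50))
            data.append((point, 1))
    return data

train = make_purchases(100)
fresh = make_purchases(100)  # data the models have never seen

def squiggly(point):
    """The 'squiggly curve': copy the label of the nearest training
    point (1-nearest-neighbor). It memorizes the training set, so it
    scores 100% on it by construction: maximum variance."""
    nearest = min(train, key=lambda example: math.dist(example[0], point))
    return nearest[1]

def circular_arc(point):
    """The 'circular arc': flag any purchase too far from the typical
    legitimate one. One crude threshold: low variance, some bias."""
    return 1 if math.dist(point, (50, 10)) > 80 else 0

def accuracy(model, data):
    return sum(model(p) == label for p, label in data) / len(data)

print(f"squiggly on training data: {accuracy(squiggly, train):.0%}")  # always 100%
print(f"squiggly on fresh data:    {accuracy(squiggly, fresh):.0%}")
print(f"arc on training data:      {accuracy(circular_arc, train):.0%}")
print(f"arc on fresh data:         {accuracy(circular_arc, fresh):.0%}")
```

The memorizer's perfect training score is an illusion: only the gap between its score on the training data and its score on fresh data reveals the variance, which is why fraud models (and the rest of us) should be judged on data they've never seen.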
I was reminded of all this while reading two books recently: How to Think by Alan Jacobs, and Lies My Teacher Told Me by James W. Loewen. Here are their descriptions of how we want to avoid both cynically un-nuanced beliefs (underfit, with high bias) and beliefs that vacillate rapidly from one extreme to another with every new opinion they encounter (overfit, with high variance):
We don’t want to be, and we don’t want others to be, intractably stubborn; but we don’t want them to be pusillanimous and vacillating either. Tommy Lasorda, the onetime Los Angeles Dodgers manager, used to say that managing players was like holding a bird in your hands: grip it too firmly and you crush it, too loosely and it escapes and flies away. In the life of thought, holding a position is like that: there’s a proper firmness of belief that lies between the extremes of rigidity and flaccidity. We don’t want to be paralyzed by indecision or indifference, but like the apocryphal Keynes, we want to have the mental flexibility and honesty to adjust our views accordingly when the facts change.
—Alan Jacobs, “How to Think”
From that perspective, it's not so surprising that healthy thinking follows neither simple rules with lots of exceptions nor complex rules with no exceptions, but instead a virtuous mean between the extremes. Loewen calls this mean between cynicism and credulousness "informed skepticism":
There is no simple rule, like evenhandedness, to employ. There is no shortcut to amassing evidence and assessing it. When confronting a claim about the distant past or a statement about what happened yesterday, students—indeed, all Americans—need to develop informed skepticism, not nihilistic cynicism.
—James W. Loewen, “Lies My Teacher Told Me”
According to the last paragraph of the Wikipedia page on the bias-variance tradeoff, people usually err on the side of simple, biased rules rather than complicated, overfit rules, so it's no surprise that cognitive bias gets more attention than "cognitive variance." But I've caught myself overfitting in some areas of my life, and I feel like we don't have much cultural vocabulary to talk about it yet. Our rental lease allows small pets, but "no dogs, cats, or ferrets": how can I explain what sounds wrong about that, except that their rules are overfit to their training data?
How about you? Have you ever found yourself making a laundry list of exceptions to your own rules? Could a simpler rule generalize better to new experiences? I’m interested to hear in the comments!