The Off-Policy Theory of Happiness

Why philosophers agree on what it takes to be happy.

Posted May 14, 2018

These are words that I never heard from my parents: "Son, we just want you to be happy." I first noticed this apparent lack of parental encouragement for my happiness when I was a sophomore in college. It seemed like everyone else's parents had told them that whatever they did, it was okay as long as it made them happy. Why, I wondered, had my parents never said this to me?

Here’s one hypothesis. Perhaps there is some tantalizingly inappropriate set of life choices which would provide me with a great deal of satisfaction but which would fail to fulfill some criteria for legitimacy. I can think of lots of examples here. There are classics, like doing a lot of drugs and not a lot of work (turns out okay for some people). In the same vein, there's graduating with a degree in art history (not so much). Then there are some indulgences which are probably more idiosyncratic to myself. For instance, I could embellish every piece I write by including the word 'fornication' exactly three times. This, I assure you, would bring me great joy, though I'm not sure what it would do to benefit me or society as a whole.

But I don’t think that hypothesis is correct. For one thing, my parents have been eminently supportive of my life decisions, even the questionable ones. At one point I told them I wanted to be a jazz musician, and they told me to give it a shot. If that’s not support for a misguided pursuit of happiness then I can’t tell you what is. There is, it turns out, a much better hypothesis. And it didn’t occur to me until I read the autobiography of John Stuart Mill.

Don't get me wrong, I like John Stuart Mill. He had one of the highest IQs in human history. And his father, the venerable historian James Mill, began teaching him Ancient Greek at the age of three. By eight, he had read the whole of Herodotus's histories in the original. So I thought his life story might make an engaging read. But that's not the case. His autobiography is a total snooze-fest. As I recall it, the work is an exhaustive compilation of the least interesting things that Mill ever read, saw, or contemplated. A representative passage: "When we had enough of political economy, we took up the syllogistic logic in the same manner, Grote now joining us. Our first text-book was Aldrich, but being disgusted with its superficiality, we reprinted one of the most finished among the many manuals of the school logic, which my father, a great collector of such books, possessed, the Manuductio ad Logicam of the Jesuit Du Trieu. After finishing this, we took up Whately's Logic, then first republished from the Encyclopedia Metropolitana, and finally the Computatio sive Logica of Hobbes." For the love of God, John. Who cares? Though I'm not exactly sure why, I trudged through it. And I'm glad I did.

But in order to understand what Mill says about happiness, it’ll be helpful first to understand a concept from artificial intelligence. It’s called off-policy reinforcement learning. The basic setup of reinforcement learning is simple. It is a method for designing an agent (be it a person, a robot, or a computer program) to behave intelligently. The definition of intelligence here is what computer scientists call “reward maximization.” Simply put, there is something that you want, and intelligent behavior consists in getting as much of it as possible. For example, if your agent is a robot that plays basketball, then its reward comes in the form of points. The more baskets the robot makes, the more points she gets and the more intelligently she has behaved. Reinforcement learning is a mathematical framework for how the robot learns to acquire more and more points.

At the heart of reinforcement learning is what’s known as a “policy.” It’s the robot’s playbook. A policy says, in mathematical abstraction, “This is where I am right now. This is what I have to do next to maximize my reward." In basketball, a good policy might be to get the ball, dribble it toward the basket, and toss in a lay-up. Each time the robot does this, she looks at how effective she was in getting points, and adjusts her behavior to do better next time. The robot might start off bad, but using reinforcement learning she could become better over time. That’s what intelligence means here—over time you get better and better at achieving your goal.
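For readers who like to see the mechanics, here is a minimal sketch of that adjust-and-improve loop. It assumes a drastically simplified game: the robot is always in the same spot and only chooses between two shots. The shot names, point values, and make probabilities are hypothetical numbers picked for illustration, and the learning rule (a running average with occasional random exploration) is just one simple choice among many.

```python
import random

# Hypothetical shots and odds, chosen only for illustration.
SHOTS = {
    "layup": {"points": 2, "make_prob": 0.6},
    "jumper": {"points": 3, "make_prob": 0.3},
}


def learn_policy(episodes=5000, epsilon=0.1, seed=0):
    """Estimate the average points per attempt for each shot by trial and error."""
    rng = random.Random(seed)
    value = {shot: 0.0 for shot in SHOTS}  # running average of points earned
    count = {shot: 0 for shot in SHOTS}    # number of attempts per shot
    for _ in range(episodes):
        # Mostly follow the current best guess, but occasionally explore.
        if rng.random() < epsilon:
            shot = rng.choice(list(SHOTS))
        else:
            shot = max(value, key=value.get)
        made = rng.random() < SHOTS[shot]["make_prob"]
        reward = SHOTS[shot]["points"] if made else 0
        # Nudge the running average toward the reward just observed.
        count[shot] += 1
        value[shot] += (reward - value[shot]) / count[shot]
    return value


values = learn_policy()
best_shot = max(values, key=values.get)  # the learned "policy" for this one spot
print(values, best_shot)
```

With these made-up odds the layup is worth more on average (0.6 × 2 points against 0.3 × 3), so after enough attempts the robot should settle on it, even though she starts out knowing nothing about either shot.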

The idea might be simple, but all of the nuance in reinforcement learning comes from precisely how you learn that policy. For example, is the best policy to drive toward the basket? Or should you sit back and shoot jumpers? How do you know which is going to work out better next time around? Will the same policy work against a different opponent? There are two general strategies for how to learn a policy. The first is called on-policy. It's the more straightforward of the two strategies. On-policy means that the robot uses the same information to make decisions and evaluate whether or not they were good decisions. If her policy says to drive toward the basket and that results in a lot of points, then she will be more likely to keep going with that same policy in the future. The second strategy is called off-policy. This means that the robot is using different information to make decisions than she is to evaluate them. The agent could make decisions based on, for instance, her time of possession of the ball. Or, another possibility, she could attempt to disarm her opponent by encouraging her to investigate a nearby tutorial in the art and science of fornication. She could then look back at her play based on that policy and see if focusing on something else actually increased her number of baskets in the end.
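In the textbook version, the distinction shows up in how the robot scores a move after the fact. The sketch below uses two standard algorithms: SARSA, which is on-policy, and Q-learning, which is off-policy. The states, actions, and numbers in the table are hypothetical, and this is only the scoring step, not a full learning loop.

```python
def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy: score the move using the action the robot will actually take next."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma):
    """Off-policy: score the move as if the best-known action came next,
    regardless of what the robot actually does."""
    return reward + gamma * max(q[next_state].values())


def update(q, state, action, target, alpha):
    """Move the stored estimate part of the way toward the target."""
    q[state][action] += alpha * (target - q[state][action])


# Hypothetical value table: expected points for each action in each state.
q = {
    "midcourt": {"drive": 0.0, "jumper": 0.0},
    "near_basket": {"layup": 2.0, "pass_out": 1.0},
}

# The robot drove from midcourt (no points yet) and then chose to pass out.
on_target = sarsa_target(q, reward=0, next_state="near_basket",
                         next_action="pass_out", gamma=0.5)
off_target = q_learning_target(q, reward=0, next_state="near_basket", gamma=0.5)

update(q, "midcourt", "drive", off_target, alpha=0.5)
print(on_target, off_target, q["midcourt"]["drive"])
```

The same drive to the basket gets two different scores: the on-policy score reflects the pass the robot actually made, while the off-policy score credits the drive with the layup she could have taken.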

At first, it might seem like the better strategy is always going to be on-policy. How could you score more points by focusing on something totally irrelevant? But that's not true. The empirical fact in artificial intelligence research is that some problems are better solved by off-policy methods. Sometimes the best way to attain a goal is indirectly.

This is precisely what Mill argues about happiness. The way to maximize your happiness, so to speak, is to aim at something else. Dedicate yourself to something larger than your own happiness. Work hard at that. Then you’ll look back and realize that you’ve been accruing happiness the whole time. Mill writes,

“The enjoyments of life are sufficient to make it a pleasant thing when they are taken en passant without being made a principal objective. Once you make them so, you will immediately feel them to be insufficient. They will not bear a scrutinizing examination. Ask yourself whether you are happy, and you cease to be so. The only chance is for you to have as your purpose in life not happiness but something external to it. Let your self-consciousness, your scrutiny, your self-interrogation, exhaust themselves on that; and if you are otherwise fortunately circumstanced you will inhale happiness with the air you breathe, without dwelling on it or thinking about it, forestalling it in imagination, or putting it to flight by fatal questioning.”

In other words, the on-policy strategy doesn’t work for happiness. If you try to maximize it directly, then you’re going to be worse off than if you had taken a different approach. Happiness is one of those problems that work better with the off-policy strategy. There has to be a separation between action and evaluation. If you’re using your own happiness as a metric by which to evaluate your next decision, the scope of your concern cannot extend past your own feelings. Instead, argues Mill, focus on something larger than yourself and you’ll wake up one day to realize that you inhale happiness with the air you breathe.

The reason, then, that my parents never told me to pursue happiness directly was that they, like Mill, believe in an off-policy approach to happiness. When you tell someone that they should "do what makes them happy," you're advocating for an on-policy approach to happiness: making decisions and evaluating them by the same metric. That's exactly what they didn't want me to do. And while my parents didn’t learn this from reading Mill, the surprising thing about this position on happiness is that it is shared, in some version or another, by practically every other philosopher who has weighed in on the matter. Old white dudes have been saying for millennia that the key to happiness is to be dedicated to a purpose larger than yourself.

One of my favorites among these accounts belongs to Bertrand Russell. He more or less says the same thing as Mill, but with a certain flair of nonchalance that contrasts with Mill’s solemn weightiness. Russell writes in The Conquest of Happiness, "Fundamental happiness depends more than anything else upon what may be called a friendly interest in persons and things." He continues, "let your interests be as wide as possible, and let your reactions to the things and persons that interest you be as far as possible friendly rather than hostile."

Happiness, in other words, is the natural result of the observation that there are a great many persons and things in the world worth taking a friendly interest in, and only one of them is yourself. It is with this idea in mind that I want to write this blog. But while Mill, Russell, and my parents might generally be correct about an off-policy approach to happiness, to be honest with you, I don't think I'll really be happy unless I sneak in that third and final fornication.


Mill, J.S. (1873/2003). Autobiography. Project Gutenberg.

Russell, B. (1930). The Conquest of Happiness. New York, NY: Liveright Publishing Corp.