Friday, July 23, 2010

The Prisoner's Dilemma and Multiplayer Game Design

One of the most fascinating things about massively multiplayer online roleplaying games (MMORPGs) is that, although these games are for the most part designed to promote competitive behavior, cooperation among players frequently emerges.

These games do usually provide some mechanism for four or five players to work together as a temporary "pick-up group," or for up to 100-200 players to form a somewhat longer-lasting organization as a "guild" or "corp." But these forms of cooperative play are always subsidiary to competitive behavior -- ultimately winning means doing better than the other guy.

And yet, in these games there can be remarkable examples of cooperative behavior, which seem to emerge despite the rules of the game that clearly favor looking out mostly for oneself. How does this happen? What are the features of these gameworlds that enable islands of cooperation to emerge out of a sea of advantage-taking?

Understanding this means taking a look at a simple game that allows two players to choose whether to cooperate with each other or to "defect" and take advantage of the other player. From this simple game -- with some tweaks -- it's possible to see how people behave, and from that behavior identify the factors that allow cooperation to become a viable strategy.

What follows is an essay on this game -- the "Prisoner's Dilemma" -- that I wrote in 1998. The principles haven't changed, though, so I thought I'd add that essay to this blog since it offers some useful insights into certain core game design concepts that are interesting to think about.

We begin with the surprising (if you think about it) observation: sometimes people choose to cooperate with each other.

THE EMERGENCE OF COOPERATION

Why do we cooperate with one another?

Don't we do better personally when we take advantage of someone who tries to cooperate with us?

How can we justify cooperating when others don't?

It would seem that cooperation is for suckers. And yet there are examples of cooperation all around us: no single person could build a skyscraper, or fight a war, or agree to an international treaty. For all of these things (and many others) to happen, sufficient numbers of self-interested individual human beings must agree to cooperate even when cheating is easy. But why does this happen?

In the late 1970s a researcher named Robert Axelrod was studying the question of how cooperation can emerge in an uncooperative world. Given that in many real-world situations the payoff for taking advantage of others is greater than the reward for cooperation, how can the observed prevalence of cooperative behavior be explained?

THE PRISONER'S DILEMMA

Axelrod began by considering the "Prisoner's Dilemma" experiment. This is a kind of thought game which examines the rewards and punishments for either cooperating or not. The story usually given runs like this:

Suppose you and an accomplice (whom you barely know) in some crime are arrested. The chief detective visits you. He says, "We know you and your pal did it. And you're both going to jail for it. But we always like to make sure, so you've got a choice. You can tell us what your pal did, or you can keep quiet. (And by the way, we're giving him this same choice you're getting.)"

"If you give us evidence against him, and he keeps quiet about you, then you get one year, and he spends six behind bars. If you keep quiet and he gives us the goods on you, you stay for six and he's out in one. If you both keep quiet, you both get three years; if you both turn each other in, you both get five years. So what'll it be?"

You must select one of two options -- you can either cooperate with your fellow prisoner (by keeping quiet) or defect (by providing evidence). You have no way of passing information between you, and you don't know him well enough to predict what he'll choose based on his personality. So which will you choose?

The table below describes what is called the "payoff matrix" of this classic formulation of the Prisoner's Dilemma:

Classic Prisoner's Dilemma
                  Accomplice Cooperates   Accomplice Defects
You Cooperate           R = -3                 S = -6
You Defect              T = -1                 P = -5
Note: In this table the results are the payoffs to you. They are categorized as follows:

T: Temptation for defecting when the other party cooperates

S: "Sucker's payoff" for cooperating when the other party defects

R: Reward to both players for both cooperating

P: Punishment to both players for both defecting

Try it for yourself. Here's a link to an on-line version of the Prisoner's Dilemma.

Looking at the table in a purely rational way, it would seem that your best choice is to defect. The two possible results of your defection mean jail time of either 1 year (if your compatriot cooperates by keeping quiet about you) or 5 years (if he talks), for an average risk of 3 years. On the other hand, if you cooperate with your accomplice, you get 3 years if he cooperates too and 6 years if he defects, for an average risk of 4.5 years. In fact, whichever choice your accomplice makes, you personally do better by defecting: one year instead of three if he keeps quiet, five years instead of six if he talks.
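
To make that arithmetic concrete, here's a minimal Python sketch of the payoff table and the two averages. It's purely illustrative -- the names and the coin-flip assumption about the accomplice are mine.

    # Payoffs to "you" from the table above, indexed by (your choice, accomplice's choice).
    # 'C' = cooperate (keep quiet), 'D' = defect (give evidence). Values are years of
    # jail time, written as negative numbers so that a bigger payoff is better.
    PAYOFF = {
        ('C', 'C'): -3,  # R: reward for mutual cooperation
        ('C', 'D'): -6,  # S: sucker's payoff
        ('D', 'C'): -1,  # T: temptation to defect
        ('D', 'D'): -5,  # P: punishment for mutual defection
    }

    def average_payoff(my_choice):
        """Average payoff to you if the accomplice is equally likely to cooperate or defect."""
        return sum(PAYOFF[(my_choice, his_choice)] for his_choice in 'CD') / 2

    print(average_payoff('C'))  # -4.5 -- an average of 4.5 years if you cooperate
    print(average_payoff('D'))  # -3.0 -- an average of 3 years if you defect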

What's more, you have to assume that your accomplice is capable of thinking about this just like you did. Since he -- just like you -- is likely to conclude that defection is less risky, he'll probably defect... in which case you have even less incentive to cooperate.

And so you defect. And so does your accomplice. And so you both come away worse off than if you had cooperated with each other.

It appears that as long as the temptation to defect is greater than the reward for cooperating, there's no reason to cooperate. Yet it's clear that in the real world we do cooperate. So could there be some other factor at work, the addition of which to the Prisoner's Dilemma might make its outcomes more realistic?

THE ITERATED PRISONER'S DILEMMA

This is where Robert Axelrod enters the story. He considered a possibility others had suggested: What if instead of a single chance to cooperate or defect, you and an ally had numerous opportunities on an ongoing basis? Would that affect your choice on any single interaction?

Suppose you arrange to sell to a fence some stolen goods you regularly receive. To protect you both, it is agreed that while you are leaving the goods in one place, he will leave the payment in another place. At each exchange, each of you will have to decide whether to cooperate by leaving the goods or the money, or to defect by picking up the goods or the money without leaving anything of your own. Furthermore, each of you knows that this arrangement will continue until some unspecified time; neither of you knows when or if at some future date the exchanges will cease.

Assume that the payoff values remain the same as in the basic Prisoner's Dilemma. Does your strategy of defection in the earlier one-shot Prisoner's Dilemma change in an environment of repeated interactions with the same individual?

In 1979, Axelrod devised an ingenious way of testing this possibility (known as the "iterated" Prisoner's Dilemma). He contacted a number of people in various fields -- mathematicians, experts in conflict resolution, philosophers -- explained the payoffs, and asked each of them to come up with a strategy for a player in an iterated Prisoner's Dilemma tournament that could be encoded in a computer program.

No limitation was placed on strategy length. One strategy might be as simple as "always defect." Others might take into account their memory of what the other player had done on previous turns. "Always cooperate but with a random ten percent chance each encounter of defecting" would be still another strategy, and so on.
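
To make "encoded in a computer program" concrete, here is roughly how strategies like those might look as Python functions. These are my own illustrations, not any actual tournament entry; each one takes the two players' move histories and returns 'C' to cooperate or 'D' to defect.

    import random

    def always_defect(my_history, their_history):
        return 'D'

    def grudger(my_history, their_history):
        # One example of using memory: cooperate until the other player has ever defected.
        return 'D' if 'D' in their_history else 'C'

    def mostly_nice(my_history, their_history):
        # Cooperate, but with a random ten percent chance of defecting on any given turn.
        return 'D' if random.random() < 0.10 else 'C'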

Axelrod collected 13 such strategies and encoded each of them in the form of a computer program. (He also added one strategy of his own, which randomly chose cooperation or defection on each turn.) He then pitted each of the 14 strategies against every other strategy over 200 iterations, to see whether any one strategy would do well against all the others (as measured by its average payoff).

The winner was the shortest strategy submitted: four lines of BASIC code from psychology and philosophy professor Anatol Rapoport of the University of Toronto in Ontario, Canada. In its entirety it consisted of the following: cooperate on the first turn, then on every subsequent turn do whatever the other player did on its previous turn. This strategy was dubbed "Tit for Tat".
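
For readers who want to experiment, here is a rough Python sketch (rather than the original BASIC) of a Tit for Tat player and a 200-turn match against an unconditional defector. The payoff values are the ones conventionally used in Axelrod's tournaments (T = 5, R = 3, P = 1, S = 0); the function names and the match scaffolding are my own illustration, not Axelrod's code.

    # Payoffs as (score to player A, score to player B).
    PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
              ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

    def tit_for_tat(my_history, their_history):
        """Cooperate on the first turn, then copy the other player's previous move."""
        return 'C' if not their_history else their_history[-1]

    def always_defect(my_history, their_history):
        return 'D'

    def play_match(strategy_a, strategy_b, rounds=200):
        """Play two strategies against each other and return their total scores."""
        history_a, history_b = [], []
        score_a = score_b = 0
        for _ in range(rounds):
            move_a = strategy_a(history_a, history_b)
            move_b = strategy_b(history_b, history_a)
            payoff_a, payoff_b = PAYOFF[(move_a, move_b)]
            score_a += payoff_a
            score_b += payoff_b
            history_a.append(move_a)
            history_b.append(move_b)
        return score_a, score_b

    print(play_match(tit_for_tat, always_defect))  # (199, 204): exploited once, then stalemate
    print(play_match(tit_for_tat, tit_for_tat))    # (600, 600): sustained mutual cooperation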

THE "ECOLOGICAL" PRISONER'S DILEMMA

After deriving some preliminary conclusions from this result, Axelrod tried an even more interesting innovation. For this new round he publicly requested submissions from any source; there were 62 entrants, plus one more (RANDOM, from Axelrod) for a total of 63. All these strategies were then pitted against one another in a giant round-robin tournament.

The winner was Tit for Tat, submitted again by Rapoport. (But, oddly, by no one else.) Again it had the highest average payoff.

Axelrod scored the results of the tournament as a 63x63 matrix showing how each strategy had fared against every other strategy. An analysis of the entries revealed six strategies that best represented all the others, and because the matrix recorded every pairing, Axelrod was able to calculate the results of six hypothetical "replays," in each of which one of the six representative strategies was initially dominant.

Tit for Tat scored first in five of the six replays, and scored second in the sixth.

Then came the cleverest innovation yet. Suppose, Axelrod reasoned, we performed a hypothetical replay in which all the strategies were pitted against one another, and after each round the "losers" were replaced by copies of the "winning" strategies, thus altering the population of players. Each strategy's score -- already known from the 63x63 matrix -- could be treated as a measure of "fitness" against other strategies in a kind of "ecological" tournament.
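
Here is a much-simplified sketch of that idea: a toy "ecological" replay in which, each generation, a strategy's share of the population grows in proportion to its average score against the current mix. The three-strategy score table below is an invented stand-in for Axelrod's actual 63x63 matrix.

    # Toy "ecological" tournament: reweight the population each generation by fitness.
    STRATEGIES = ['tit_for_tat', 'always_defect', 'always_cooperate']

    # SCORES[i][j]: approximate average per-turn payoff strategy i earns against strategy j
    # (illustrative numbers, not Axelrod's data).
    SCORES = [
        [3.0, 1.0, 3.0],   # tit_for_tat
        [1.0, 1.0, 5.0],   # always_defect
        [3.0, 0.0, 3.0],   # always_cooperate
    ]

    def next_generation(population):
        """One ecological step: each strategy's share grows with its fitness against the mix."""
        fitness = [sum(SCORES[i][j] * population[j] for j in range(len(STRATEGIES)))
                   for i in range(len(STRATEGIES))]
        total = sum(f * p for f, p in zip(fitness, population))
        return [f * p / total for f, p in zip(fitness, population)]

    population = [0.10, 0.80, 0.10]           # start with defectors dominant
    for _ in range(1000):
        population = next_generation(population)
    print([round(p, 3) for p in population])  # roughly [1.0, 0.0, 0.0]: Tit for Tat takes over

In this toy version the unconditional defectors grow at first by exploiting the pure cooperators, then collapse once their prey is gone -- roughly the pattern the actual tournament showed.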

The results left no doubt. The lowest-ranked strategies, which tended to be "not-nice" (in other words, which tried to defect occasionally to see what they could get away with), were extinct within 200 rounds. Only one not-nice strategy (which had been ranked eighth in the original 63x63 competition) lasted past 400 rounds, but by then the population of surviving strategies consisted only of those which replied to defections with immediate retaliation. Because there were no longer any strategies it could take advantage of, the not-nice strategy began a precipitous decline. By the thousandth round it, too, was virtually eliminated.

And the winning strategy? Once again it was Tit for Tat, which was not only the most prevalent strategy at the end of 1000 rounds, but the strategy with the highest rate of growth. Tit for Tat was not merely successful, it was robust -- it did well in all kinds of environments.

Why did Tit for Tat do so well? How could such a simple strategy perform so capably in such a broad mix of more complex strategies? More to the essential point, how could Tit for Tat do so well even when surrounded by strategies which depended on defecting and so would supposedly tend to earn better payoffs? It appeared that a strategy which cooperated by default was able to not only survive but actually thrive amidst a sea of defectors.

In other words, cooperation evolved over time in a world dominated by uncooperative players. If this simulation bore any relation to the real world of humans, there could be some important lessons in it for us.

HOW COOPERATION WORKS

What actual, emotion-driven human beings do with individual choices of cooperation or defection is, of course, unpredictable. But in general, rational players will tend to make similar choices. This allows those interested in human behavior to work out some mathematical predictions of behavior.

In this case, it turns out that with a careful choice of the four payoffs for cooperation and defection, cooperation can be a rational choice even in the midst of a majority of defectors. Specifically, Axelrod found that if just four conditions are met, cooperation can be the most rational choice -- even in a population consisting almost entirely of defectors. These are the "world" conditions that determine whether a gameworld such as a MMORPG will tend to encourage cooperative behavior to emerge.

First, the players must be able to recognize one another. Anonymity, because it decreases the penalty for defection in an environment of iterated interactions, tends to work against the evolution of a population of cooperators. This has a direct impact on MMORPGs, in which players are somewhat recognizable by the names they choose for their characters but, because they can play multiple characters on multiple servers, remain for the most part anonymous to each other. (This is why the recent attempt by Blizzard to switch to a "Real Names" system in their online game forums had a chance of promoting more cooperative behavior there, instead of the flaming and hyperemotional verbal abuse -- "defection" behavior -- that characterizes game forums currently. It's unfortunate that Blizzard's concept was shouted down and never given a chance to be implemented; it would have been useful to see whether changing the rules of the "world" to minimize anonymity would, as suggested here, have encouraged more cooperative discussion.)

Second, the total number of potential opportunities for interaction must be unknown to each player. It is this uncertainty -- not being able to calculate just how much defection you can get away with -- that decreases the long-term reward for defection. If you don't know when your final interaction will be, and thus can't plan to defect on that turn, you must take into account that what you do on this interaction will affect the response to you on the next one. This creates an incentive to cooperate.

Third, the payoff for mutual cooperation (that is, both players cooperate with each other) in each interaction must be greater than the average payoff of a split outcome in which one player cooperates and the other defects (the temptation payoff to the defector plus the sucker's payoff, divided by two). In mathematical terms, this is the condition R > (T + S) / 2.
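
With the jail-time payoffs from the table earlier this condition holds, since (T + S) / 2 = -3.5 is worse than R = -3. A one-line check:

    T, R, P, S = -1, -3, -5, -6   # the jail-time payoffs from the table above
    print(R > (T + S) / 2)        # True: -3 > -3.5, so mutual cooperation beats splitting an exploit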

Fourth and last, there must be a certain minimum proportion of cooperating players in the population... and here was one of the greatest surprises. Axelrod calculated that -- amazingly -- if all the preceding conditions are met, and there is a high probability that players which have interacted before will do so again (specifically, ninety percent), then cooperation can eventually evolve to include the entire population if only five percent of the total initial population consists of cooperators. It's reasonable to expect that there must be enough cooperators so that they can create a sort of island of trust in a sea of defection. What's surprising is that so few are necessary.
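
Here is a back-of-the-envelope reconstruction of that five-percent figure -- my own simplified reading of Axelrod's cluster argument, using the conventional tournament payoffs (T = 5, R = 3, P = 1, S = 0) and a ninety percent chance that any given interaction is followed by another with the same player.

    # Expected total payoffs over a relationship that continues with probability w per turn.
    T, R, P, S = 5, 3, 1, 0
    w = 0.9
    expected_turns = 1 / (1 - w)              # about 10 turns on average

    tft_vs_tft = R * expected_turns           # about 30: two cooperators meeting each other
    tft_vs_alld = S + P * w / (1 - w)         # about 9: a Tit for Tat player stuck among defectors
    alld_vs_alld = P * expected_turns         # about 10: what the defectors earn off each other

    # A cluster of Tit for Tat players making up fraction p of an all-defect population
    # does better than the natives once p * tft_vs_tft + (1 - p) * tft_vs_alld > alld_vs_alld.
    p_needed = (alld_vs_alld - tft_vs_alld) / (tft_vs_tft - tft_vs_alld)
    print(round(p_needed, 3))                 # 0.048 -- about five percent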

HOW TIT FOR TAT WORKS

Axelrod concluded that Tit for Tat succeeded not by trying to do the absolute best for itself in every transaction, but by trying to maximize the sum of its own and the other player's reward in all transactions combined. In other words, Tit for Tat did well for itself because the effect of its strategy was to allow every player with whom it interacted to do well.

An intriguing aspect of this is found in the raw scores of the various Prisoner's Dilemma tournaments. Looking at the numbers, it quickly becomes obvious that in individual encounters Tit for Tat never did better than strategies which were more "aggressive" (i.e., defected more often) or -- interestingly -- strategies which were more "forgiving" (i.e., didn't always respond immediately to a defection with a defection of its own). In individual transactions, Tit for Tat's numbers were solidly middle-of-the-road.

But over iterated transactions the consequences of defection began to outweigh the benefits. As more players started to resemble Tit for Tat, which always retaliated immediately to a defection but was always open to cooperation, the long-term payoff for defection dropped. Soon there were no players who could be taken advantage of by a defecting strategy. Meanwhile, the Tit for Tat-like cooperating strategies were busy cooperating. Their long-term payoffs were never outstanding... just better than those of the defectors.

THE PRINCIPLES OF TIT FOR TAT

Axelrod distilled several principles from his observation of how well Tit for Tat did against various defecting and cooperating players. Not only do these explain how Tit for Tat did better than even other cooperating players, they have useful implications for real world human interactions.

Be Nice
Don't be the first to defect. Assume cooperativeness on the part of others. If you go into an interaction assuming that you're going to get ripped off, then you might as well try to take advantage of the other person. But if instead the other person turns out to have been willing to cooperate with you, you've just missed a chance for both of you to do well.

Be Forgiving
Don't overreact. When taken advantage of, retaliate once, then stop. Meeting one defection with a harsh response can create a series of echoing mutual defections that prevent cooperation from ever occurring.

Be Provocable
When a defection occurs, always respond in kind. Don't be too forgiving. In the instructions for the second tournament, Axelrod included the two lessons ("be nice" and "be forgiving") that he had drawn from the first tournament. Several of those who submitted second-tournament strategies concluded that being forgiving was essential to the evolution of cooperation, so their strategies tended to let a few defections slide. In effect, these strategies tried to elicit cooperation by allowing not-nice players to take advantage of them without penalty. But the actual result was to encourage not-nice strategies to keep defecting: a lesser penalty for defecting made defection more profitable, and cooperation correspondingly less so. The better choice is always to defect when provoked. (The sketch following these four principles illustrates the difference.)

Be Clear
Respond in kind immediately. Strategies that tried to be clever tended to appear unresponsive, which elicited defection. (If your attempts to cooperate are ignored, then you might as well defect to get as much as you can while you can.) Cooperation should meet with immediate cooperation, and a defection should be met with an immediate defection.
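
To illustrate the "Be Provocable" point in runnable form, here is a sketch -- my own construction, not one of the actual tournament entries -- that pits Tit for Tat and a "too forgiving" variant (one that retaliates only after two consecutive defections) against an opponent that defects every fourth turn to see what it can get away with.

    PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
              ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

    def tit_for_tat(me, them):
        return 'C' if not them else them[-1]

    def too_forgiving(me, them):
        # Retaliates only after two defections in a row, so isolated probes go unpunished.
        return 'D' if them[-2:] == ['D', 'D'] else 'C'

    def occasional_defector(me, them):
        # Cooperates, but defects every fourth turn to see what it can get away with.
        return 'D' if len(me) % 4 == 3 else 'C'

    def play_match(a, b, rounds=200):
        hist_a, hist_b, score_a, score_b = [], [], 0, 0
        for _ in range(rounds):
            move_a, move_b = a(hist_a, hist_b), b(hist_b, hist_a)
            pay_a, pay_b = PAYOFF[(move_a, move_b)]
            score_a, score_b = score_a + pay_a, score_b + pay_b
            hist_a.append(move_a)
            hist_b.append(move_b)
        return score_a, score_b

    print(play_match(tit_for_tat, occasional_defector))    # (548, 553): probing barely pays
    print(play_match(too_forgiving, occasional_defector))  # (450, 700): probing pays handsomely

Against the forgiving variant the prober earns 700 points and keeps right on probing; against Tit for Tat each probe is punished immediately, and the prober ends up with less (553) than it would have earned by simply cooperating (600).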

THE IMPLICATIONS OF TIT FOR TAT

What if anything does this mean for actual human interactions? There is a strong suggestion that the behaviors that elicit cooperation in this restricted world of the Prisoner's Dilemma do indeed carry over to our real world.

One finding particularly worthy of note was the evidence that too much forgiveness actually works against the evolution of cooperation. The notion of "tolerance" so trendy today turns out to be an invitation to defection, rather than the means to a better society as its proponents claim. While being "nice" is necessary to evoke cooperation in others, it's not enough. Bad behavior requires a proportionate response, or the result will be more bad behavior.

This applies as well to criminal justice. There is a vocal minority today calling for a reduced emphasis on incarceration as societal retribution, and a commensurate greater attention given to rehabilitation. Without disputing the goodness of the impulse, the success of Tit for Tat suggests that it's a bad idea. If an individual member of a society defects (commits a crime), that defection should provoke an immediate retaliation from society. Not an overreaction, but some equivalent reaction nonetheless appears necessary in order to elicit future cooperation from that individual, and to demonstrate to other players the value of cooperation and the price of defection.

The ancient policy of lex talionis -- "an eye for an eye, and a tooth for a tooth" -- may be the wisest policy after all.

THE FUTURE OF COOPERATION

For cooperation to evolve, there have to be enough cooperators who interact with one another on a sufficiently regular basis. Such "islands of cooperation," once established, can grow... but too small an island will sink beneath the waves of defectors.

One critical factor, not addressed by any other commentator on Axelrod's work that I've seen, concerns being able to recognize other players. The Tit for Tat strategy depends on remembering what another player did on the immediately previous turn. But if the other player is anonymous, or is encountered only once, it's impossible to associate a history with that player. This leaves you either cooperating with an unknown (and possibly being taken advantage of repeatedly) or defecting out of lack of trust (and possibly missing an opportunity to create an environment of cooperation).

This takes on added relevance today. Not only are the streets and highways filled with persons whom we'll never see again -- and who thus have no qualms about defecting (in other words, driving like jerks) -- we are spending more time surfing the Web as anonymous entities than we once did sitting in the back yard talking with our neighbors. Our contacts with other players in the game of trust/don't-trust are more likely to be brief encounters with strangers: ephemeral and anonymous. Under such conditions, not only is it unlikely that new clusters of cooperative behavior will evolve, but even the maintenance of what cooperation there is becomes difficult. Trust breaks down.

How long can such a state of affairs last?

Can we find a way to balance legitimate privacy interests with the guaranteed recognition required for cooperation to emerge? Or are anonymity, and the Hobbesian, everyone-out-for-himself world it imposes, inevitable?



". . . perhaps the chief thesis of the book on The Fatal Conceit . . . is that the basic morals of property and honesty, which created our civilization and the modern numbers of mankind, was the outcome of a process of selective evolution, in the course of which always those practices prevailed, which allowed the groups which adopted them to multiply most rapidly (mostly at their periphery among people who already profited from them without yet having fully adopted them)."
-- letter from Friedrich A. Hayek to Julian Simon, Nov. 6, 1981.