Games and Economic Behavior 79 (2013) 67–74


    Note

Evolutionary stability in repeated extensive games played by finite automata

    Luciano Andreozzi 1

Università degli Studi di Trento, Facoltà di Economia, Via Inama, 8, 38100 Trento, Italy


Article history: Received 10 May 2010; available online 18 January 2013.

JEL classification: C70, C72.

Keywords: Finite automata; Trust game; Evolutionary stability; Cooperation.

Abstract: We discuss the emergence of cooperation in repeated Trust Mini-Games played by finite automata. Contrary to a previous result obtained by Piccione and Rubinstein (1993), we first prove that this repeated game admits two Nash equilibria, a cooperative and a non-cooperative one. Second, we show that the cooperative equilibrium is the only (cyclically) stable set under the so-called best response dynamics.

© 2013 Elsevier Inc. All rights reserved.

    1. Introduction

Repeated games have enormous sets of equilibria. In a seminal article, Abreu and Rubinstein (1988) introduced the idea that the equilibrium selection problem could be addressed by modeling strategies as finite automata. In this approach, the total payoff of a strategy is a combination of the complexity of the automaton that represents it (as measured by the number of its states) and the payoff it obtains in playing the game. They proved that arbitrarily small costs of complexity can drastically reduce the set of strategies that can be sustained in equilibrium. In the repeated Prisoner's Dilemma (PD), for example, some popular strategies such as Tit for Tat (TfT) cannot be Nash equilibria. The reason is that, in playing against itself, TfT never reaches the states in which it does not cooperate. It follows that a strategy of unconditional cooperation is a best response to TfT, because it obtains the same payoff as TfT itself, but with a smaller number of states.

Abreu and Rubinstein (1988) proved that cooperation can only be achieved by machines that put the punishing phase first. Each machine starts by punishing the other by playing Defect for a fixed number of rounds and does not revert to cooperation unless the other machine has played Defect for the same number of rounds. Once the punishing phase is over, both machines start cooperating. Switching to defection during the cooperative phase is deterred by the threat to start the punishment phase all over again. Abreu and Rubinstein (1988) provide a nice interpretation of this initial phase of punishment as a show of strength: at the beginning of the play, each machine tests the ability of the other machine to punish a possible defection. Machines that are unable to punish are exploited by unending defection.

While this argument drastically restricts the strategies that one can observe in equilibrium, it still allows for a huge variety of possible outcomes, including perpetual defection. Binmore and Samuelson (1992) used an evolutionary model to study the resulting equilibrium selection problem and obtained a stark result. When the cost of complexity is so small that […]

E-mail address: [email protected].

1 I would like to thank Ken Binmore, Larry Samuelson and Michele Piccione for their comments on previous versions of this paper. Two anonymous referees and an editor of this journal provided detailed comments that greatly improved the exposition of the matter. All remaining mistakes are mine.

0899-8256/$ – see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.geb.2013.01.003


    Fig. 2. The unique minimum CURB set of the machine game.

[…] changes depending upon its current state and the outcome of the previous round. Formally, a machine for player i is a quadruple ⟨Q_i, q_i^0, λ_i, μ_i⟩ with the following characteristics:

- Q_i is a set of states;
- q_i^0 ∈ Q_i is the initial state;
- λ_i : Q_i → S_i is the output function;
- μ_i : Q_i × E → Q_i is the transition function.

Let M_S and M_R be the sets of finite automata for players S and R. Two machines (M_S, M_R) playing against each other produce a deterministic history of strategies chosen by the two players, (s_t). E(s_t) is the set of end nodes reached as a consequence of the history of play s_t, and h_i(E(s_t)) is the payoff obtained by player i at round t. The payoff resulting from a match (M_S, M_R) for player i = S, R is thus

$$\pi_i(M_S, M_R) = (1-\delta)\sum_{t=1}^{\infty}\delta^{t-1} h_i\bigl(E(s_t)\bigr),$$

where δ is the time discount factor. Each state after the first one has a cost c.

We shall denote with G the quadruple ⟨M_S, M_R, π_S^c, π_R^c⟩. G is usually referred to as the machine game. The extension to mixed strategies is done in the usual way. We shall indicate with π_S^c(M_S, q) (π_R^c(M_R, p)) the payoff the Sender (Receiver) obtains by using the machine M_S (M_R) when the cost of a state is c and he expects the Receiver (Sender) to use the mixed strategy q (p). Notice that, since in Section 4 mixed strategies will be interpreted as the frequencies with which pure strategies are used within two populations, payoffs are only defined for pure strategies. Finally, BR_S(q) and BR_R(p) denote the best response correspondences for the Sender and the Receiver.
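The stage-game payoffs are not restated here, so the following minimal Python sketch (ours, not the authors' code) assumes the Trust Mini-Game parametrization implied by Table 1 below: NT yields (0, 0), (T, R) yields (1, 1) and (T, NR) yields (−v_S, V_R). It encodes a machine as the quadruple ⟨Q_i, q_i^0, λ_i, μ_i⟩, plays two machines against each other, and returns the discounted payoffs net of the complexity cost; all identifiers are invented for the illustration.

```python
# Minimal sketch (ours, not the authors' code): finite automata playing the
# repeated Trust Mini-Game.  The stage payoffs are an assumption consistent
# with Table 1 below: "NT" gives (0, 0), "R" (i.e. (T, R)) gives (1, 1) and
# "NR" (i.e. (T, NR)) gives (-v_S, V_R).  Each state after the first costs c.

from dataclasses import dataclass
from typing import Dict, Tuple

Outcome = str  # "NT", "R" or "NR": the end node reached in one round


@dataclass
class Machine:
    states: tuple                                # Q_i
    initial: str                                 # q_i^0
    output: Dict[str, str]                       # lambda_i : Q_i -> S_i
    transition: Dict[Tuple[str, Outcome], str]   # mu_i : Q_i x E -> Q_i


def stage(sender_action: str, receiver_action: str, v_S: float, V_R: float):
    """End node and stage payoffs (h_S, h_R) of one round of the Trust Mini-Game."""
    if sender_action == "NT":
        return "NT", (0.0, 0.0)
    if receiver_action == "R":
        return "R", (1.0, 1.0)
    return "NR", (-v_S, V_R)


def machine_game_payoffs(M_S: Machine, M_R: Machine, delta: float, c: float,
                         v_S: float, V_R: float, rounds: int = 2000):
    """(1 - delta) * sum_t delta^(t-1) h_i(E(s_t)), minus c per state after the
    first; the infinite sum is truncated at a finite horizon."""
    qS, qR = M_S.initial, M_R.initial
    uS = uR = 0.0
    disc = 1.0
    for _ in range(rounds):
        outcome, (hS, hR) = stage(M_S.output[qS], M_R.output[qR], v_S, V_R)
        uS += (1 - delta) * disc * hS
        uR += (1 - delta) * disc * hR
        disc *= delta
        qS = M_S.transition[(qS, outcome)]
        qR = M_R.transition[(qR, outcome)]
    return uS - c * (len(M_S.states) - 1), uR - c * (len(M_R.states) - 1)


def one_state(action: str) -> Machine:
    """A one-state machine that always plays `action`."""
    return Machine(("q",), "q", {"q": action},
                   {("q", o): "q" for o in ("NT", "R", "NR")})


# The machines of Fig. 2: one-state machines plus the two-state grim Sender.
M_T, M_NT = one_state("T"), one_state("NT")
M_R_reward, M_R_no = one_state("R"), one_state("NR")
M_grim = Machine(("t", "nt"), "t", {"t": "T", "nt": "NT"},
                 {("t", "R"): "t", ("t", "NT"): "t", ("t", "NR"): "nt",
                  ("nt", "R"): "nt", ("nt", "NT"): "nt", ("nt", "NR"): "nt"})

if __name__ == "__main__":
    delta, c, v_S, V_R = 0.9, 0.05, 1.0, 2.0          # illustrative values only
    print(machine_game_payoffs(M_T, M_R_no, delta, c, v_S, V_R))
    # roughly (-v_S, V_R) = (-1.0, 2.0)
    print(machine_game_payoffs(M_grim, M_R_no, delta, c, v_S, V_R))
    # roughly (-v_S*(1-delta) - c, V_R*(1-delta)) = (-0.15, 0.2)
```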

    3. Equilibria

Before we present our results, we introduce the notion of a Closed Under Rational Behavior (CURB) set (Basu and Weibull, 1991). Let Δ(M) be the set of probability distributions over a set of machines M.

Definition 1. Let (M̄_S, M̄_R) be two finite sets of automata. We say that (M̄_S, M̄_R) is a CURB set for the game G if (a) for all q ∈ Δ(M̄_R), M_S ∈ BR_S(q) implies that M_S ∈ M̄_S, and (b) for all p ∈ Δ(M̄_S), M_R ∈ BR_R(p) implies that M_R ∈ M̄_R. A CURB set is minimal if it contains no proper subsets that are CURB sets.

Intuitively, a set of machines (M̄_S, M̄_R) is a CURB set if, when player i expects player j to choose with positive probability only strategies in M̄_j, he will only choose strategies within M̄_i.
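As a toy illustration of Definition 1 (ours, not taken from the paper), the sketch below spot-checks the closure requirement on a hypothetical 2 × 2 coordination game: it samples mixtures supported on a candidate set and verifies that every pure best reply stays inside the set. Grid sampling is of course not a proof of the "for all q ∈ Δ(M̄_R)" clause; it only makes the requirement concrete.

```python
# Toy spot check of Definition 1 (ours, not from the paper) on a hypothetical
# 2x2 coordination game.  Sampling mixtures on a grid is not a proof of the
# "for all q" clause; it only makes the closure requirement concrete.

import numpy as np

# Row and column players' payoff matrices of the example game.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
B = np.array([[2.0, 0.0],
              [0.0, 1.0]])


def best_replies_row(q):
    """Indices of the row player's pure best replies against the column mixture q."""
    u = A @ q
    return set(np.flatnonzero(np.isclose(u, u.max())))


def best_replies_col(p):
    u = B.T @ p
    return set(np.flatnonzero(np.isclose(u, u.max())))


def mixtures_over(indices, grid=201):
    """Sampled mixed strategies (length-2 vectors) supported on `indices`."""
    if len(indices) == 1:
        v = np.zeros(2)
        v[indices[0]] = 1.0
        yield v
    else:
        for w in np.linspace(0.0, 1.0, grid):
            v = np.zeros(2)
            v[indices[0]], v[indices[1]] = w, 1.0 - w
            yield v


def looks_curb(rows, cols, grid=201):
    """Check (on a grid) that best replies to mixtures over (rows, cols) stay inside."""
    ok_rows = all(best_replies_row(q) <= set(rows) for q in mixtures_over(cols, grid))
    ok_cols = all(best_replies_col(p) <= set(cols) for p in mixtures_over(rows, grid))
    return ok_rows and ok_cols


if __name__ == "__main__":
    print(looks_curb([0], [0]))   # True: a strict NE is a CURB set on its own
    print(looks_curb([1], [1]))   # True: so is the other strict NE
    print(looks_curb([0], [1]))   # False: best replies leave the candidate set
```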

We shall now prove that the machine game G has a small minimal CURB set, and then that there is one cooperative equilibrium within that set. Fig. 2 contains all the automata that form this minimal CURB set. These are all the one-state machines for both players (M_S^NT and M_S^T for the Sender, M_R^NR and M_R^R for the Receiver) plus a two-state machine for the Sender, M_S^g.


Table 1. A simplified version of the machine game.

                 M_R^NR                              M_R^R
    M_S^NT       0, 0                                0, 0
    M_S^T        −v_S, V_R                           1, 1
    M_S^g        −v_S(1−δ) − c, V_R(1−δ)             1 − c, 1

The latter machine, M_S^g, implements the grim strategy: it plays T in the first round and keeps playing T as long as the other player has played R.³ It reverts to a constant play of NT after the first round in which the other player has played NR.

These strategies yield very simple patterns when matched against each other. M_S^NT produces a stream of NT independently of the strategy with which it is matched. (M_S^T, M_R^R) and (M_S^g, M_R^R) produce an uninterrupted stream of (T, R). (M_S^T, M_R^NR) produces a continuous stream of (T, NR). (M_S^g, M_R^NR) produces (T, NR) in the first round, followed by a stream of NT.

Let S_S = {M_S^NT, M_S^T, M_S^g}, S_R = {M_R^NR, M_R^R} and S = S_R × S_S. We shall indicate with G^S the game in which players' choices are restricted to the set S. Table 1 represents G^S.
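The entries of Table 1 can be recovered from the outcome streams just described. The sketch below (ours; parameter values are illustrative) computes the normalized discounted value of a stream with an eventually constant tail and assembles the Sender and Receiver matrices of G^S, charging the grim machine the cost c of its second state.

```python
# Sketch (ours): recovering the Table 1 entries of G^S from the outcome streams
# described above, with the stage payoffs assumed earlier (NT -> (0, 0),
# (T, R) -> (1, 1), (T, NR) -> (-v_S, V_R)).  Parameter values are illustrative.

def discounted(prefix, tail, delta):
    """Normalized value (1 - delta) * sum_t delta^(t-1) h_t of a stream that pays
    the amounts in `prefix` in the first rounds and then `tail` forever; the
    constant tail starting at round len(prefix)+1 is worth delta**len(prefix) * tail."""
    value = sum((1 - delta) * delta**t * h for t, h in enumerate(prefix))
    return value + delta**len(prefix) * tail


def sender_payoffs(delta, c, v_S):
    """Sender rows of Table 1: (M_S^NT, M_S^T, M_S^g) against (M_R^NR, M_R^R)."""
    return [
        [discounted([], 0.0, delta),          discounted([], 0.0, delta)],      # M_S^NT
        [discounted([], -v_S, delta),         discounted([], 1.0, delta)],      # M_S^T
        [discounted([-v_S], 0.0, delta) - c,  discounted([], 1.0, delta) - c],  # M_S^g
    ]


def receiver_payoffs(delta, V_R):
    """Receiver rows of Table 1 (same ordering of Sender machines)."""
    return [
        [0.0,                           0.0],                                   # vs M_S^NT
        [discounted([], V_R, delta),    discounted([], 1.0, delta)],            # vs M_S^T
        [discounted([V_R], 0.0, delta), discounted([], 1.0, delta)],            # vs M_S^g
    ]


if __name__ == "__main__":
    delta, c, v_S, V_R = 0.9, 0.05, 1.0, 2.0
    print(sender_payoffs(delta, c, v_S))   # (M_S^g, M_R^NR) entry: -v_S*(1-delta) - c = -0.15
    print(receiver_payoffs(delta, V_R))    # (M_S^g, M_R^NR) entry: V_R*(1-delta) = 0.2
```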

    All proofs are based on the following assumption:

Assumption 1.

(i) δ ≥ δ_crit := (V_R − 1)/V_R;
(ii) c ≤ c_crit := δ v_S/(1 + v_S).

δ_crit is the threshold value of δ such that, when δ ≥ δ_crit, (M_S^g, M_R^R) is a Nash equilibrium in the machine game without complexity costs (c = 0). This is the familiar condition that players must be sufficiently patient for cooperation to be a Nash equilibrium in a repeated game. The second condition imposes that complexity costs are sufficiently small with respect to δ and v_S.

    Proposition 1. Let Assumption 1 hold. Then the set S is the unique minimal CURB set for G .

I shall only present a sketch of the proof that S is in fact a minimal CURB set. First note that against M_S^NT all Receiver's machines obtain the same payoff (zero). Hence, it suffices to consider what any alternative machine can obtain against M_S^T and M_S^g. The key of the proof is that against M_S^T no machine can do better than M_R^NR, because M_R^NR exploits M_S^T with the minimum number of states. Against M_S^g no machine that always plays R can do better than M_R^R. On the other hand, if a machine M_R that plays NR for the first time in round n > 1 obtains a larger payoff than M_R^R against the mixed strategy chosen by the Sender, then M_R^NR (which plays NR at the first round) will obtain an even larger payoff, because it would obtain a larger payoff with a smaller number of states. Similarly, any machine M_S with |M_S| > 0 that never plays Trust is strictly dominated by M_S^NT, because it contains a larger number of states and obtains the same repeated game payoff (zero). Now consider a machine M_S that plays T for the first time in round n. The proof consists in showing that no such machine can obtain a payoff which is simultaneously larger than the payoffs obtained by M_S^g and M_S^T.

    The following is a simple corollary of Proposition 1, which is worth stating as a separate result.

Corollary 1. All the NE for G^S are also NE for G.

The following proposition characterizes the NE for G^S which, because of the previous corollary, are equilibria for G as well.

Proposition 2. The game G^S has a connected component N of NE in which the Sender chooses M_S^NT with probability one and the Receiver chooses M_R^NR with a probability q ≥ min(1/(1 + v_S), (1 − c)/(1 + v_S(1 − δ))) := q^0. If Assumption 1 is met, G^S also has a mixed strategy NE (p*, q*) where p* = (0, (V_R(δ − 1) + 1)/(δ V_R), (V_R − 1)/(δ V_R)) and q* = (c/(δ v_S), (δ v_S − c)/(δ v_S)).
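The indifference conditions behind (p*, q*) can be verified numerically on the Table 1 payoffs. In the sketch below (ours; illustrative parameters), the Receiver is indifferent between M_R^NR and M_R^R at p*, and the Sender is indifferent between M_S^T and M_S^g, and weakly prefers both to M_S^NT, at q*.

```python
# Sketch (ours): checking the indifference conditions behind the mixed NE
# (p*, q*) of Proposition 2 on the Table 1 payoffs, for illustrative parameters.

def mixed_equilibrium(delta, c, v_S, V_R):
    p_star = (0.0,
              (V_R * (delta - 1.0) + 1.0) / (delta * V_R),   # weight on M_S^T
              (V_R - 1.0) / (delta * V_R))                   # weight on M_S^g
    q_star = (c / (delta * v_S),                             # weight on M_R^NR
              (delta * v_S - c) / (delta * v_S))             # weight on M_R^R
    return p_star, q_star


def sender_payoff(machine, q, delta, c, v_S):
    q_NR, q_R = q
    if machine == "NT":
        return 0.0
    if machine == "T":
        return q_NR * (-v_S) + q_R * 1.0
    return q_NR * (-v_S * (1.0 - delta) - c) + q_R * (1.0 - c)   # grim machine


def receiver_payoff(machine, p, delta, V_R):
    p_NT, p_T, p_g = p
    if machine == "NR":
        return p_T * V_R + p_g * V_R * (1.0 - delta)
    return p_T * 1.0 + p_g * 1.0                                 # reward machine


if __name__ == "__main__":
    delta, c, v_S, V_R = 0.9, 0.05, 1.0, 2.0
    p_star, q_star = mixed_equilibrium(delta, c, v_S, V_R)
    print(p_star, q_star)
    # Sender: M_S^T and M_S^g equal, both at least the 0 of M_S^NT.
    print([round(sender_payoff(m, q_star, delta, c, v_S), 6) for m in ("NT", "T", "g")])
    # Receiver: M_R^NR and M_R^R equal at p*.
    print([round(receiver_payoff(m, p_star, delta, V_R), 6) for m in ("NR", "R")])
```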

There is a clear intuition behind both (sets of) NE in Proposition 2. First, there is a set of NE in which the Sender chooses the non-trusting machine M_S^NT because she expects the Receiver to choose the non-rewarding machine M_R^NR with a sufficiently high probability.

The second NE is slightly more complex. The Sender expects the Receiver to always play Reward (M_R^R) or always play Not Reward (M_R^NR), and the probabilities he puts on these two strategies are such that he gets the same payoff by choosing the unconditionally trustful machine M_S^T and the grim machine M_S^g. M_S^g is more complex than M_S^T. However, it obtains a larger payoff against M_R^NR because M_S^g quits trusting after the first time that the Receiver has played NR. So, while M_S^T yields higher payoffs when the Receiver plays M_R^R with a sufficiently large probability, M_S^g becomes the best reply when the non-rewarding machine M_R^NR is expected with a larger probability. The trick of the proof is that when the cost of an extra state is sufficiently low (i.e. when c < c_crit) it pays to have an extra state to discriminate between M_R^NR and M_R^R, rather than reverting to the simpler (but not discriminating) machine that never trusts, M_S^NT.

3 For the sake of a simple notation, we use the same letters for strategies and outcomes when this does not create confusion. So in Fig. 2 the letter R stands for the outcome (T, R), NR stands for (T, NR), and so on.

    4. Learning

Consider the following extremely simplified model of learning.⁴ There are two large (infinite) populations of agents which, with an abuse of notation, we shall denote as S (Sender) and R (Receiver). The game G is played by pairs of individuals drawn at random from S and R. Each agent in each population adopts a machine. The state of the two populations is represented by a pair (p(t), q(t)), where p(t) and q(t) are the distributions over the machines within the S and R populations respectively. Let p_{M_S}(t) and q_{M_R}(t) be the fractions of the populations S and R that use machines M_S and M_R respectively at time t. As above, let BR_S(q(t)) and BR_R(p(t)) be the sets of best replies for the Sender and the Receiver respectively when the state of the two populations is (p(t), q(t)).⁵

Agents in each population revise their strategies at a fixed rate. When revising her strategy, an agent switches to one of the best replies. These hypotheses ensure that the states of the two populations evolve according to the well known Best Response Dynamics (BRD)

$$\dot p(t) = b_S(t) - p(t), \qquad \dot q(t) = b_R(t) - q(t), \tag{1}$$

where b_S(t) ∈ BR_S(q(t)) and b_R(t) ∈ BR_R(p(t)) for all t (Gilboa and Matsui, 1991).⁶
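The dynamics (1) can be illustrated numerically on the restricted game G^S. The sketch below (ours) uses a crude Euler discretization with a fixed tie-breaking rule at indifference points, so it only approximates the continuous dynamics; starting from a state close to the non-cooperative component N, with a small share of grim Senders, the populations spiral in towards the mixed equilibrium (p*, q*), in line with Proposition 4 below.

```python
# Sketch (ours): an Euler discretization of the best response dynamics (1) on
# the restricted game G^S of Table 1.  The step size and the tie-breaking rule
# at indifference points are our choices, so this only approximates the
# continuous dynamics; parameter values are illustrative.

import numpy as np


def payoff_matrices(delta, c, v_S, V_R):
    # Rows: Sender machines (M_S^NT, M_S^T, M_S^g); columns: (M_R^NR, M_R^R).
    A = np.array([[0.0, 0.0],
                  [-v_S, 1.0],
                  [-v_S * (1 - delta) - c, 1.0 - c]])
    B = np.array([[0.0, 0.0],
                  [V_R, 1.0],
                  [V_R * (1 - delta), 1.0]])
    return A, B


def pure_best_reply(u):
    b = np.zeros_like(u)
    b[np.argmax(u)] = 1.0           # ties broken towards the lowest index (our choice)
    return b


def brd(p_init, q_init, delta, c, v_S, V_R, dt=0.01, steps=20000):
    A, B = payoff_matrices(delta, c, v_S, V_R)
    p, q = np.array(p_init, dtype=float), np.array(q_init, dtype=float)
    for _ in range(steps):
        bS = pure_best_reply(A @ q)    # b_S(t) in BR_S(q(t))
        bR = pure_best_reply(B.T @ p)  # b_R(t) in BR_R(p(t))
        p += dt * (bS - p)             # p'(t) = b_S(t) - p(t)
        q += dt * (bR - q)             # q'(t) = b_R(t) - q(t)
    return p, q


if __name__ == "__main__":
    delta, c, v_S, V_R = 0.9, 0.05, 1.0, 2.0
    # Start close to the non-cooperative component N: Senders almost all play
    # M_S^NT (with a small share of grim machines), Receivers mostly M_R^NR.
    p, q = brd([0.98, 0.0, 0.02], [0.9, 0.1], delta, c, v_S, V_R)
    print(p, q)   # should end up near p* = (0, 0.444, 0.556), q* = (0.056, 0.944)
```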

Proposition 3. The set Δ(S) is invariant under the BRD. That is, for any initial condition (p(0), q(0)) ∈ Δ(S) and for any t > 0, (p(t), q(t)) ∈ Δ(S).

Proof. This is an immediate consequence of the definition of the BRD and of a CURB set.

This proposition allows us to study the stability properties of the equilibria in the only minimal CURB set as if that were an independent game. In fact, if the system starts at any point of Δ(S), the dynamics will not take it out of Δ(S).

The stability concept we shall use was introduced by Matsui (1992). Intuitively, a set of states X is stable if there is no best response path that leads from any element of X to a state which is not in X. To make this precise, we need a further piece of terminology. A strategy distribution (p′, q′) ∈ Δ(S) is directly accessible from (p, q) if there exists a best reply path such that (p(0), q(0)) = (p, q) and (p(T), q(T)) = (p′, q′) for some T ≥ 0. Also, (p′, q′) is accessible from (p, q) if one of the following holds: (i) (p′, q′) is directly accessible from (p, q); (ii) there exists a sequence (p_n, q_n) converging to (p′, q′) such that (p_n, q_n) is directly accessible from (p, q) for every n; (iii) (p′, q′) is accessible from another (p″, q″) which is accessible from (p, q).

We need one final definition: a set of states F ⊆ Δ(S) is a cyclically stable set (CSS) if (i) any (p′, q′) ∉ F is not accessible from any (p, q) ∈ F, and (ii) any (p, q) ∈ F is accessible from any (p′, q′) ∈ F. The idea of a CSS is that a set of states F is stable if the best response dynamics (1) cannot leave F.

    We are now ready to formulate our main result.

Proposition 4. In the game G, the mixed strategy NE (p*, q*) is cyclically stable. The set of NE N is not cyclically stable.

Fig. 3 provides a graphical illustration of this proposition. It represents the state space Δ(S), with the two (sets of) Nash equilibria (p*, q*) and N. p_0, p_1 and p_g represent the fractions of population S which play M_S^NT, M_S^T and M_S^g respectively, and q_0 and q_1 are the fractions of R that play M_R^NR and M_R^R. It also represents two orbits generated by the BRD. The first orbit originates from x, which lies on the face where p_0 = 0, and has the familiar appearance of the orbits generated in 2 × 2 games with a single mixed strategy NE (see for example Berger, 2002).

4 This is the continuous and deterministic counterpart of the stochastic learning model proposed by Volij (2002).

5 We are implicitly assuming here that only a finite number of machines are represented at the initial state (p(0), q(0)) and at any following time t > 0. The analysis of the BRD for games with an infinite number of strategies is beyond the scope of this paper. On the other hand, none of the results presented below requires such an analysis.

6 Note that these are not differential equations, because best replies might not be unique, so that more than one orbit can originate from the same initial condition.


Fig. 3. Orbits generated by the BRD on Δ(S).

p_0 remains constantly equal to zero, while the populations converge towards (p*, q*). The second orbit starts from y ∈ N and converges to (p*, q*), just like the first one. The logic of the proof is to show that from any point in N there is a best response path that approaches (p*, q*), while there are no best response paths going from (p*, q*) to N. Actually, (p*, q*) attracts all best response paths originating in a sufficiently small neighborhood.

    5. Conclusions

The results presented above have two main implications for the theory of repeated games played by finite automata. First, a show of strength is not necessary to sustain cooperation, provided that mixed strategies are allowed. Even sequential games, for which the show-of-strength argument fails, do have cooperative equilibria in mixed strategies. Second, when a player is allowed to choose first in a PD, the non-cooperative equilibrium cannot be stable. This suggests that the exact timing of decisions plays a crucial, and neglected, role in the way in which cooperation can spread in a world of universal defection.

    Appendix A

    A.1. Proof of Proposition 1

    We shall prove Proposition 1 with the help of three lemmata.

    Lemma 1. The set of strategies S is a minimal CURB set for G .

Proof. We must show that any machine for the Sender and the Receiver M_i ∉ S (i = S, R) yields, against any element of Δ(S), a payoff which is strictly smaller than the payoff offered by at least one element of S. First consider any machine for the Sender M_S ∉ S. M_S has at least two states, because otherwise it would belong to S. If it never plays T, its payoff is 0 − c|M_S| < 0 (since c > 0), and hence it is strictly dominated by M_S^NT, whose payoff is zero. Suppose thus that M_S plays T at the beginning of the game (there is no loss of generality in this assumption, because none of the machines in S_R behaves differently depending on the round in which the first T takes place). Its payoff against M_R^NR is thus bounded above by −v_S(1 − δ) − c|M_S|. Its payoff against M_R^R is bounded above by 1 − c|M_S|. As a consequence, M_S is strictly dominated by M_S^g whenever |M_S| > 1. If |M_S| = 1, M_S is a two-state machine whose initial state plays T and whose other state plays NT. (If both states played T, M_S would be dominated by M_S^T.) It is a tedious exercise to show that all two-state machines whose first state plays T are strictly dominated by M_S^g against any probability distribution involving only M_R^NR and M_R^R.

Now consider an alternative machine for the Receiver M_R ∉ S_R. All Receiver's machines obtain the same payoff (zero) against M_S^NT, and hence it suffices to consider what M_R obtains against M_S^T and M_S^g. M_R has at least two states. If the initial state plays NR, it is dominated by M_R^NR. To see this, consider that against M_S^T any machine obtains at most V_R − c|M_R| < V_R, while against M_S^g a machine that plays NR in the first round gets V_R(1 − δ) − c|M_R|. Suppose then that M_R has R as initial state. Neither M_S^T nor M_S^g plays NT unless the Receiver has played NR once. If M_R always plays R after any T, M_R is strictly dominated by M_R^R. The reason is that it obtains a constant stream of (T, R) against both M_S^T and M_S^g, and it has at least two states. So M_R must have at least one state in which it plays NR, which is reached after a sequence of T's, say for the first time at round n. After it has played NR the first time, the best that M_R can do is keep playing NR, for M_S^g will not play Trust any longer, while M_S^T will continue to play T. As a consequence, M_R obtains the same payoff as M_R^NR, beginning at round n. Before that, it obtains a stream of 1. Let π_0(p) and π_1(p) = 1 be the payoffs obtained by M_R^NR and M_R^R respectively when the Sender plays the mixed strategy p. If the Receiver uses M_R he obtains π(p) = (1 − δ^n) + δ^n π_0(p) − c|M_R|. Clearly, if π_0(p) ≥ π_1(p) = 1, then π_0(p) > π(p), so that M_R cannot be a best response. If π_0(p) ≤ π_1(p) = 1, then π_1(p) = 1 > π(p), and again M_R cannot be a best response.


The second part of the following lemma is a well known result in this kind of literature: the best reply to a machine never contains more states than the machine itself. The first part depends upon the sequential structure of the Trust Game.

Lemma 2. Let M_S and M_R be two machines for the Sender and the Receiver, respectively, such that |M_i| ≤ 2. If M_R ∈ BR_R(M_S), then |M_R| […]

[…] > 0, and hence along the best response path originating from y, p_0(t) goes monotonically to zero and p_1(t) and p_g(t) approach p_1^* and p_g^*.

Consider any point on H. At any such point p_1 = p_1^*(1 − p_0^0), with p_0^0 ∈ [0, 1] and q_0 ∈ [q_0^*, q^0]. One of the best replies for the Sender is clearly M_S^g, so that when p_1(0) = p_1^*(1 − p_0^0), a best reply path is p_1(t) = p_1^*(1 − p_0^0)e^{−t}, with t ∈ [0, t_1). Here t_1 = log(q_0(0)/q_0^*) is the time it takes for population R to move from q_0(0) to q_0^*. Let p_1^1 be p_1(t_1). When q_0(t) = q_0^*, M_S^T becomes the best reply, so that p_1(t) = 1 − (1 − p_1^1)e^{−(t − t_1)}. The orbit will cross H when p_1(t)/(p_1(t) + p_g(t)) = p_1(t)/(1 − p_0(t)) = p_1^*. This requires that

$$\frac{1 - (1 - p_1^1)\,e^{-(t - t_1)}}{1 - p_0^0\, e^{-t}} = p_1^*, \qquad \frac{1 - \bigl(1 - p_1^*(1 - p_0^0)\,e^{-t_1}\bigr)e^{-(t - t_1)}}{1 - p_0^0\, e^{-t}} = p_1^*.$$

Solving the last equation for t, one obtains t_2 = log((e^{t_1} − p_1^*)/(1 − p_1^*)), which does not depend on p_0^0. This means that, starting from y, p_1 will take the same time t_2 to come back to p_1^* as when starting from x. At that time, one would have q_0(t_2) = q_0(0)e^{−t_2}. With an analogous reasoning one can show that it takes the same time t_3 for the system to go back to the plane H, independently of the point (x or y) from which the orbit started. One can iterate this reasoning to show that in fact q_0^i(x) = q_0^i(y) for any i, and therefore that both orbits converge to (p*, q*).

    References

Abreu, D., Rubinstein, A., 1988. The structure of Nash equilibria in repeated games with finite automata. Econometrica 56, 1259–1282.
Basu, K., Weibull, J.W., 1991. Strategy subsets closed under rational behavior. Econ. Letters 36 (2), 141–146.
Berger, U., 2002. Best response dynamics for role games. Int. J. Game Theory 30, 527–538.
Binmore, K., Samuelson, L., 1992. Evolutionary stability in repeated games played by finite automata. J. Econ. Theory 57, 278–305.
Gilboa, I., Matsui, A., 1991. Social stability and equilibrium. Econometrica 59, 859–867.
Matsui, A., 1992. Best response dynamics and socially stable strategies. J. Econ. Theory 57, 343–362.
Piccione, M., Rubinstein, A., 1993. Finite automata play a repeated extensive game. J. Econ. Theory 61, 160–168.
Samuelson, L., Swinkels, J.M., 2003. Evolutionary stability and lexicographic preferences. Games Econ. Behav. 44 (2), 332–342.
Volij, O., 2002. In defense of defect. Games Econ. Behav. 39, 309–321.