1 UM Stratego Collin Schepers Daan Veltman Enno Ruijters Leon Gerritsen Niek den Teuling Yannick Thimister

Page 1:

UM Stratego

Collin Schepers, Daan Veltman, Enno Ruijters, Leon Gerritsen, Niek den Teuling, Yannick Thimister

Page 2:

• Introduction (Yannick)
• Starting positions (Daan)
• Evaluation Function (Leon)
• Monte Carlo (Collin)
• Genetic Algorithm (Enno)
• Opponent modelling and strategy (Niek)
• Results (Yannick)
• Conclusion (Yannick)

Page 3:

Starting Positions

• Distance to Freedom

• Being bombed in

• Partial obstruction

• Adjacency

• Flag defence

• Startup Pieces

Page 4:

Starting Positions

• Distance to Freedom

Page 5:

Starting Positions

• Flag defence

Page 6:

Starting Positions

• Startup Pieces

Page 7:

Starting Positions

Flag:
• −10 for DTF < 2
• +5 when bombed in

Spy:
• −1 for pieces > captain, adjacent with higher DTF
• −2 when bombed in
• −1 when DTF > 5
• −1 when DTF > 1, adjacent piece -1 or -2 = piece, adjacent piece higher DTF

Scout:
• −2 for > 2 in flag defence
• −2 when bombed in
• −1 when DTF > 5
• −1 when DTF > 1, adjacent piece -1 or -2 = piece, adjacent piece higher DTF
• −1 for > 5 in DTF < 2

Miner:
• −1 for > 1 in flag defence
• −2 when bombed in
• −1 when DTF > 5
• −1 when DTF > 1, adjacent piece -1 or -2 = piece, adjacent piece higher DTF
• −1 when on front row
• −1 for each > 1 in DTF < 2

Sergeant:
• −1 for > 1 in flag defence
• −1 when DTF > 1, adjacent piece -1 or -2 = piece, adjacent piece higher DTF
• −2 for > 3 in DTF < 2

Lieutenant:
• −1 for > 1 in flag defence
• −1 when DTF > 1, adjacent piece -1 or -2 = piece, adjacent piece higher DTF
• −2 for > 3 in DTF < 2

Captain:
• −2 when bombed in
• −1 when DTF > 5
• −1 when DTF > 1, adjacent piece -1 or -2 = piece, adjacent piece higher DTF
• −2 for > 3 in DTF < 2
• −1 when spy adjacent
• −1 for each > 2 in DTF < 2

Major:
• −2 when bombed in
• −1 when DTF > 5
• +1 when on flag side
• −1 when DTF > 1, adjacent piece -1 or -2 = piece, adjacent piece higher DTF
• −2 for > 2 in DTF < 2

Colonel:
• −2 when bombed in
• −1 when DTF > 5
• +1 when on flag side
• −1 when DTF > 1, adjacent piece -1 or -2 = piece, adjacent piece higher DTF
• −1 for > 1 in DTF < 2

General:
• −2 when bombed in
• −1 when DTF > 5
• +2 when on flag side and Marshal not on flag side
• −1 when DTF > 1, adjacent piece -1 or -2 = piece, adjacent piece higher DTF

Marshal:
• −2 when bombed in
• −1 when DTF > 5
• +3 when on flag side

Bombs:
• −2 when bombed in
• −1 when DTF > 5
• +2 when sergeant adjacent
• +2 when lieutenant adjacent
• −1 when scout adjacent
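A minimal sketch of how such per-piece rules could be applied to one placed piece, using only the Flag and Marshal rules listed above (DTF is taken to mean Distance to Freedom). The `Placement` class, its field names and `placement_score` are hypothetical names chosen for illustration, and how DTF, "bombed in" and "flag side" are computed from a board is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical summary of one piece in a starting setup."""
    piece: str                  # e.g. "flag" or "marshal"
    dtf: int                    # Distance to Freedom, assumed precomputed
    bombed_in: bool = False     # enclosed by bombs
    on_flag_side: bool = False  # on the same side of the board as the flag


def placement_score(p: Placement) -> int:
    """Sum the bonuses and penalties that apply to a single piece."""
    score = 0
    if p.piece == "flag":
        if p.dtf < 2:
            score -= 10
        if p.bombed_in:
            score += 5
    elif p.piece == "marshal":
        if p.bombed_in:
            score -= 2
        if p.dtf > 5:
            score -= 1
        if p.on_flag_side:
            score += 3
    return score


print(placement_score(Placement("flag", dtf=6, bombed_in=True)))
print(placement_score(Placement("marshal", dtf=6, on_flag_side=True)))
```

A full setup score would presumably sum such values over all forty pieces; the other piece types would follow the same pattern.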

Page 8:

Evaluation Function

• Sub-functions of the evaluation function:
  • Material value
  • Information value
  • Near enemy piece value
  • Near flag value
  • Progressive bonus value

Page 9:

Evaluation Function

How it works:
• All the sub-functions return a value
• These values are then weighted and added to each other
• The higher the total added value, the better that move is for the player
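The weighted sum described above can be sketched as follows. This is an illustration only: the board representation (a plain dict), the sub-function names and the weights are assumptions, not values from the slides.

```python
from typing import Callable, Dict

# Hypothetical type: a sub-function maps a board state to a raw score.
SubFunction = Callable[[dict], float]


def evaluate(board: dict,
             sub_functions: Dict[str, SubFunction],
             weights: Dict[str, float]) -> float:
    """Weighted sum of all sub-function values for one board state."""
    return sum(weights[name] * fn(board) for name, fn in sub_functions.items())


def best_move(boards_after_move: Dict[str, dict],
              sub_functions: Dict[str, SubFunction],
              weights: Dict[str, float]) -> str:
    """Return the move whose resulting board has the highest total value."""
    return max(boards_after_move,
               key=lambda m: evaluate(boards_after_move[m], sub_functions, weights))


# Toy usage with two made-up sub-functions.
subs = {"material": lambda b: b["material"], "info": lambda b: b["info"]}
w = {"material": 2.0, "info": 0.5}
print(evaluate({"material": 3, "info": 4}, subs, w))
```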

Page 10:

Evaluation Function

Material value:
• Used for comparing the two players' board strengths
• Each piece type has a value
• Total value of the opponent's board is subtracted from the player's board value
• Positive value means strong player board
• Negative value means weak player board
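As a small illustration of the material value, the sketch below sums per-piece values for both sides and subtracts the opponent's total from the player's. The numbers in `PIECE_VALUES` are placeholders; the slides do not list the values actually used.

```python
# Placeholder piece values for illustration only.
PIECE_VALUES = {"marshal": 10, "general": 9, "miner": 3, "scout": 2, "bomb": 4}


def material_value(own_pieces, enemy_pieces, values=PIECE_VALUES):
    """Player's board value minus the opponent's board value."""
    own = sum(values[p] for p in own_pieces)
    enemy = sum(values[p] for p in enemy_pieces)
    return own - enemy


# Positive result: the player's board is stronger than the opponent's.
print(material_value(["marshal", "scout"], ["general"]))
```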

Page 11:

Evaluation Function

Information value:
• Stimulates the collection of opponent information and the keeping of personal piece information
• Each piece type has a certain information value
• All the values from each side are summed up and then subtracted from each other
• A marshal being discovered is worse than a scout being discovered

Page 12:

Evaluation Function

Near enemy piece value:
• Checks if a moveable piece can or cannot defeat a piece next to it
• If piece can be defeated, return positive score
• If not, return a negative one
• If piece unknown, return 0
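The three cases above can be sketched with plain rank comparison. This is a simplification under stated assumptions: ranks are integers where higher beats lower, an unknown piece is represented by `None`, the score magnitude `bonus` is hypothetical, and Stratego's special cases (spy against marshal, miner against bomb) are left out.

```python
from typing import Optional


def near_enemy_value(own_rank: int,
                     enemy_rank: Optional[int],
                     bonus: float = 1.0) -> float:
    """Score an adjacent enemy piece from the point of view of an own piece."""
    if enemy_rank is None:
        # Enemy piece type is unknown: no information, neutral score.
        return 0.0
    if own_rank > enemy_rank:
        # The adjacent piece can be defeated.
        return bonus
    # Equal or higher enemy rank: the piece cannot be defeated.
    return -bonus


print(near_enemy_value(8, 5), near_enemy_value(3, 8), near_enemy_value(5, None))
```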

Page 13:

Evaluation Function

Near flag value:
• Stimulates the defence of own flag and the attacking of enemy's flag
• Constructs array with possible enemy flag locations
• If enemy near own flag, return negative number
• If own piece near possible enemy flag, return positive number

Page 14:

Evaluation Function

Progressive bonus value:
• Stimulates the advancement of pieces towards enemy lines
• Returns a positive value if piece moves forward
• Negative if backward

Page 15:

Monte Carlo

• A subset of all possible moves is played
• No strategy or weights used
• Evaluation value received after every move
• At the end, a comparison of evaluation values determines the best move
• A depth limit is used so the tree doesn't grow too big and the algorithm is guaranteed to terminate
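One way to read the points above is as depth-limited random playouts over a sampled subset of moves. The sketch below follows that reading; it is not the team's implementation. The callables `legal_moves`, `apply_move` and `evaluate`, the parameter defaults, and the choice to average the accumulated evaluation values per candidate move are all assumptions, and the toy game at the end exists only to make the example runnable.

```python
import random


def monte_carlo_move(state, legal_moves, apply_move, evaluate,
                     sample_size=5, playouts=10, depth_limit=4, rng=None):
    """Pick a move by random, depth-limited playouts (no strategy, no weights)."""
    rng = rng or random.Random()
    moves = legal_moves(state)
    if not moves:
        return None
    # Only a random subset of the possible moves is tried.
    candidates = rng.sample(moves, min(sample_size, len(moves)))
    scores = {}
    for move in candidates:
        total = 0.0
        for _ in range(playouts):
            current = apply_move(state, move)
            total += evaluate(current)          # value after the first move
            for _ in range(depth_limit - 1):    # depth limit bounds the playout
                options = legal_moves(current)
                if not options:
                    break
                current = apply_move(current, rng.choice(options))
                total += evaluate(current)      # value after every further move
        scores[move] = total / playouts
    # Compare the collected values and return the best candidate.
    return max(scores, key=scores.get)


# Toy game: the state is a number, the first move is +100 or -100, later moves are +1 or -1.
toy_moves = lambda s: [100, -100] if s == 0 else [1, -1]
print(monte_carlo_move(0, toy_moves, lambda s, m: s + m, float,
                       sample_size=2, playouts=5, depth_limit=3,
                       rng=random.Random(0)))
```

Because every playout stores only running totals, this variant keeps no explicit tree in memory; a version that builds the tree shown on the next slide would need more bookkeeping.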

Page 16:

Monte Carlo

Tree representation

Page 17:

Monte Carlo

Advantages:

• Simple implementation
• Can be changed quickly
• Easy observation of behavior
• Good documentation
• Good for partial information situations

Page 18:

Monte Carlo

Disadvantages:

• Generally not smart
• Depends on the evaluation function
• Computationally slow
• Tree grows very fast
• A lot of memory required

Page 19:

Genetic Algorithm

• Evolve weights of the terms in the evaluation functions

• AI uses standard expectiminimax search tree

• Evolution strategies (evolution parameters are themselves evolved)

Page 20:

Genetic Algorithm

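• Genome:

  $G = (\sigma, \alpha, w_1, \ldots, w_n)$

• Mutation:

  $\sigma_{n+1} = \sigma_n \, e^{\tau N(0,1)}$

  $\alpha_{n+1} = \alpha_n + \sigma_{n+1} \, \alpha_n \, N(0,1)$

  $w_{i,n+1} = w_{i,n} + \sigma_{n+1} \, (w_{i,n} + 1) \, N(0,1)$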

Page 21:

Genetic Algorithm

• Crossover:
  • σ and α of parents are averaged
  • Weights:
    • Averaged if difference < α
    • Else randomly chosen from parents
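The crossover rule can be sketched as below. The `Genome` class and its field names are hypothetical, and the sketch assumes that the threshold α is the averaged α of the two parents; the slides do not say which α is meant.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Genome:
    """Hypothetical container for one individual."""
    sigma: float          # mutation step size
    alpha: float          # crossover threshold
    weights: List[float]  # evaluation-function weights


def crossover(a: Genome, b: Genome, rng=None) -> Genome:
    rng = rng or random.Random()
    # Strategy parameters: average of both parents.
    sigma = (a.sigma + b.sigma) / 2
    alpha = (a.alpha + b.alpha) / 2
    weights = []
    for wa, wb in zip(a.weights, b.weights):
        if abs(wa - wb) < alpha:
            # Parents roughly agree on this weight: take the average.
            weights.append((wa + wb) / 2)
        else:
            # Parents disagree: copy the weight of one randomly chosen parent.
            weights.append(rng.choice((wa, wb)))
    return Genome(sigma, alpha, weights)


child = crossover(Genome(0.2, 1.0, [1.0, 10.0]), Genome(0.4, 3.0, [2.0, 20.0]),
                  rng=random.Random(1))
print(child)
```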

Page 22:

Genetic Algorithm

Fitness function:
• Win bonus
• Number of own pieces left
• Number of turns spent
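A possible shape for such a fitness function is sketched below. The slides name the three ingredients but not how they are combined, so the linear form, the coefficient values, and the choice to count turns as a penalty (faster games score higher) are all assumptions.

```python
def fitness(won: bool, own_pieces_left: int, turns: int,
            win_bonus: float = 100.0,
            piece_weight: float = 1.0,
            turn_penalty: float = 0.5) -> float:
    """Hypothetical linear combination of the three fitness terms."""
    score = win_bonus if won else 0.0
    score += piece_weight * own_pieces_left  # more surviving pieces is better
    score -= turn_penalty * turns            # assumed: long games score lower
    return score


print(fitness(won=True, own_pieces_left=10, turns=100))
print(fitness(won=False, own_pieces_left=4, turns=20))
```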

Page 23:

Genetic Algorithm

Reference AI:
• Monte Carlo AI
• Self-selecting reference genome
  • Select average genome from each generation
  • Pick winner between this genome and previous reference
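The self-selecting reference can be sketched as follows. Two assumptions are made here: "average genome" is read as the element-wise mean of the population's weight vectors (it could also mean the individual with average fitness), and `play_match` is a hypothetical callable that returns `True` when its first argument beats its second.

```python
from typing import Callable, List, Sequence

Weights = List[float]


def average_genome(population: Sequence[Weights]) -> Weights:
    """Element-wise mean of all weight vectors in one generation."""
    n = len(population)
    return [sum(column) / n for column in zip(*population)]


def update_reference(population: Sequence[Weights],
                     reference: Weights,
                     play_match: Callable[[Weights, Weights], bool]) -> Weights:
    """Keep whichever of (generation average, previous reference) wins a match."""
    candidate = average_genome(population)
    return candidate if play_match(candidate, reference) else reference


pop = [[1.0, 2.0], [3.0, 4.0]]
print(update_reference(pop, [0.0, 0.0], lambda cand, ref: True))
```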

Page 24:

Opponent modeling

• Observing moves
• Ruling out pieces
• Stronger pieces are moved towards you
• Weaker pieces are moved away

Page 25:

Opponent modeling

• Initial probability distribution
• Updating the probabilities
  • Update the probability of the moving piece
  • Update probabilities of free pieces nearby
• Bluffing
  • Bluffing probability
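A minimal sketch of how such a probability update might look for a single unknown enemy piece. The dictionary representation, the function names, the likelihood values and the way the bluffing probability is mixed in are assumptions for illustration; the slides do not give the actual update rule.

```python
def rule_out(dist, impossible):
    """Remove piece types that are no longer possible and renormalise."""
    kept = {p: v for p, v in dist.items() if p not in impossible}
    total = sum(kept.values())
    if total == 0:
        return dict(dist)  # nothing left to renormalise; keep the old belief
    return {p: v / total for p, v in kept.items()}


def bayes_update(dist, likelihood, bluff_prob=0.0):
    """Reweight the belief by how likely each piece type is to make the observed move.

    With probability bluff_prob the move is treated as a bluff, i.e. as
    carrying no information about the piece type.
    """
    posterior = {
        p: v * (bluff_prob + (1.0 - bluff_prob) * likelihood.get(p, 0.0))
        for p, v in dist.items()
    }
    total = sum(posterior.values())
    if total == 0:
        return dict(dist)
    return {p: v / total for p, v in posterior.items()}


belief = {"bomb": 0.25, "flag": 0.25, "scout": 0.25, "marshal": 0.25}
belief = rule_out(belief, {"bomb", "flag"})  # the piece moved, so it is neither
belief = bayes_update(belief, {"scout": 0.2, "marshal": 0.8})  # it moved towards us
print(belief)
```

A higher `bluff_prob` makes the update more conservative: at 1.0 an observed move leaves the belief unchanged.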

Page 26:

Strategy

• Split the game up into phases
  • Exploration phase
  • Elimination phase
  • End-game phase
• Alter the evaluation function

Page 27:

Results

• First generation GA against MC
• First generation GA wins 80.0% of the games

Algorithm      Nr. of wins   Average nr. of turns   Standard deviation
First GA       879           90.6                   84.5
MC             163           184.7                  89.6
None (draw)    60

Page 28:

Results

• Newest generation (66th) GA against MC
• GA won 81.3% of the games

Algorithm      Nr. of wins   Average nr. of turns   Standard deviation
Newest GA      426           85.0                   78.2
MC             73            181.4                  90.8
None (draw)    25

Page 29:

Results

• First generation GA against newest generation of GA
• First GA wins 40.9% of the games
• Newest generation GA wins only 25%

Algorithm      Nr. of wins   Average nr. of turns   Standard deviation
First GA       3910          80.6                   65.4
Newest GA      2393          125.0                  66.7
None (draw)    3266
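The win percentages follow from the win counts if draws are included in the total number of games. The short check below recomputes them from the three result tables; the helper name `win_share` is made up for this example.

```python
def win_share(wins: int, *other_outcomes: int) -> float:
    """Percentage of all games (draws included) won by one side, to one decimal."""
    total = wins + sum(other_outcomes)
    return round(100 * wins / total, 1)


print(win_share(426, 73, 25))       # newest GA against MC
print(win_share(3910, 2393, 3266))  # first GA against newest GA
print(win_share(2393, 3910, 3266))  # newest GA against first GA
print(win_share(879, 163, 60))      # first GA against MC
```

The first three values reproduce the percentages stated on the slides (81.3, 40.9 and 25.0). For the first GA against MC the counts give 79.8, slightly below the 80% stated for that match-up.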

Page 30:

Conclusion

• Monte Carlo is (still) weaker than GA
  • Limited by the starting piece setup
  • Manual weight tweaking required

• GA gets weaker after training
  • Non-transitivity of references
  • Should let it play against other AIs