
1

UM Stratego

Colin Schepers, Daan Veltman, Enno Ruijters, Leon Gerritsen, Niek den Teuling, Yannick Thimister


2

Content

Introduction (Yannick)
The game of Stratego (Daan)
Evaluation Function (Leon)
Monte Carlo (Colin)
Genetic Algorithm (Enno)
Opponent modeling and strategy (Niek)
Conclusion (Yannick)


3

The game of Stratego

Board of 10x10 squares

Setup field of 4x10 per player


4

The game of Stratego

B - Bombs
1 - Marshal
2 - General
3 - Colonels
4 - Majors
5 - Captains
6 - Lieutenants
7 - Sergeants
8 - Miners
9 - Scouts
S - Spy
F - Flag


5

The game of Stratego

Win:
  Flag capture
  Unmovable pieces

Draw:
  Unmovable pieces
  Maximum moves


6

Starting Positions

Flag placed
Bombs placed
Remaining pieces placed randomly


7

Starting Positions

Distance to Freedom
Being bombed in
Partial obstruction
Adjacency
Flag defence
Startup Pieces


8

Starting Positions

Distance to Freedom


9

Starting Positions

Startup Pieces


10

Evaluation Function

Sub-functions of the evaluation function:
Material value
Information value
Near enemy piece value
Near flag value
Progressive bonus value
First-move penalty


11

Evaluation Function

How it works:
All the sub-functions return a value
These values are weighted and summed (a minimal sketch follows below)
The higher the total value, the better that move is for the player
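As an illustration only, a minimal sketch of this weighted sum, assuming each sub-function is a callable that scores a board for a player; the function names, helpers and weights are placeholders, not the project's actual values.

```python
def evaluate(board, player, sub_functions, weights):
    """Return the weighted sum of all evaluation sub-functions.

    sub_functions: list of callables f(board, player) -> float
    weights:       list of floats, one weight per sub-function
    """
    return sum(w * f(board, player) for f, w in zip(sub_functions, weights))

# The move whose resulting board gets the highest total value is chosen, e.g.:
# best_move = max(legal_moves,
#                 key=lambda m: evaluate(apply_move(board, m), player, subs, ws))
```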


12

Evaluation Function

Material Value:
Used for comparing the two players' board strengths
Each piece type has a value
The total value of the opponent's board is subtracted from the player's board value
A positive value means a strong player board
A negative value means a weak player board
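A minimal sketch of the material-value sub-function, assuming each side's remaining pieces are available as a list of rank labels; the per-piece values below are assumed placeholders, not the values used by the authors.

```python
# Placeholder values (assumed); with this deck's numbering, 1 is the Marshal.
PIECE_VALUES = {"F": 1000, "B": 50, "S": 150, "1": 400, "2": 200, "3": 100,
                "4": 75, "5": 50, "6": 30, "7": 20, "8": 30, "9": 10}

def material_value(own_pieces, enemy_pieces):
    """Sum of own piece values minus sum of opponent piece values."""
    own = sum(PIECE_VALUES[p] for p in own_pieces)
    enemy = sum(PIECE_VALUES[p] for p in enemy_pieces)
    return own - enemy  # positive: player is materially stronger
```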


13

Evaluation Function

Information value:
Stimulates the collection of opponent information and the keeping of one's own piece information
Each piece type has a certain information value
All the values from each side are summed and then subtracted from each other
A Marshal being discovered is worse than a Scout being discovered


14

Evaluation Function

Near enemy piece value:
Checks whether a movable piece can or cannot defeat a piece next to it (see the sketch below)
If the adjacent piece can be defeated, return a positive score
If not, return a negative one
If the piece is unknown, return 0
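A minimal sketch of this rule for one adjacent enemy piece; the rank comparison is deliberately simplified and the bonus size is an assumption.

```python
def beats(attacker_rank, defender_rank):
    # Simplified: assumes a larger number means a stronger piece and ignores
    # the Spy-vs-Marshal and Miner-vs-Bomb special cases. With this deck's
    # numbering (1 = Marshal) the comparison would be inverted.
    return attacker_rank > defender_rank

def near_enemy_value(own_rank, adjacent_enemy_rank, bonus=10):
    """Score contribution for one piece standing next to an enemy piece."""
    if adjacent_enemy_rank is None:   # unknown piece contributes 0
        return 0
    return bonus if beats(own_rank, adjacent_enemy_rank) else -bonus
```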


15

Evaluation Function

Near flag value:
Stimulates the defence of one's own flag and the attacking of the enemy's flag
Constructs an array with possible enemy flag locations
If an enemy piece is near the own flag, return a negative number
If an own piece is near a possible enemy flag, return a positive number


16

Evaluation Function

Progressive bonus value:
Stimulates the advancement of pieces towards enemy lines
Returns a positive value if a piece moves forward
Negative if it moves backward


17

Evaluation Function

First-move value:
Keeps pieces from giving away information
Keeps the number of unmoved pieces high


18

Monte Carlo

A subset of all possible moves is played
No strategy or weights used
An evaluation value is received after every move
At the end, a comparison of evaluation values determines the best move
A depth limit is used so the tree doesn't grow too big and the algorithm ends at some point (a minimal sketch follows below)
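A minimal sketch of depth-limited Monte Carlo move selection in the spirit of this slide; the playout policy, the depth, the sample count, and the board helpers (`apply_move`, `board.legal_moves()`, `evaluate`) are assumptions, not the project's actual implementation.

```python
import random

def monte_carlo_move(board, legal_moves, evaluate, apply_move,
                     depth=4, samples_per_move=20):
    """Pick the move whose random depth-limited playouts score best on average."""
    best_move, best_score = None, float("-inf")
    for move in legal_moves:
        total = 0.0
        for _ in range(samples_per_move):
            b = apply_move(board, move)
            # play random moves up to the depth limit so the tree stays bounded
            for _ in range(depth - 1):
                moves = b.legal_moves()
                if not moves:
                    break
                b = apply_move(b, random.choice(moves))
            total += evaluate(b)
        avg = total / samples_per_move
        if avg > best_score:
            best_move, best_score = move, avg
    return best_move
```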


19

Monte Carlo

Advantages:

Simple implementation
Can be changed quickly
Easy observation of behavior
Good documentation
Good for partial-information situations


20

Monte Carlo

Disadvantages:

Generally not smart
Dependent on the evaluation function
Computationally slow
Tree grows very fast


21

Monte Carlo Experiments

MC against lower-depth MC

Player Wins Losses Draw

MC 28 59 49

MC-LD 59 28 49


22

Monte Carlo Experiments

MC against no-depth MC

Player Wins Losses Draw

MC 15 2 12

MC-ND 2 15 12


23

Monte Carlo Experiments

MC against deeper-depth but narrower MC

Player Wins Losses Draw

MC 5 2 11

MC-DDN 2 5 11


24

Monte Carlo Experiments

MC against narrower MC

Player Wins Losses Draw

MC 62 18 85

MC-N 18 62 85


25

Genetic Algorithm

Evolve the weights of the terms in the evaluation function

The AI uses a standard expectiminimax search tree
Evolution strategies (evolution parameters are themselves evolved)


26

Genetic Algorithm

Genome:

G = (σ, α, w_1, ..., w_n)

Mutation:

σ_n = σ_{n-1} · e^{N(0, τ)}
α_n = α_{n-1} + α_{n-1} · N(0, σ)
w_{i,n} = w_{i,n-1} + w_{i,n-1} · N(0, σ)
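A minimal sketch of one evolution-strategy mutation step following the slide's equations; the value of τ and the genome layout are assumptions, and the freshly mutated σ is used for the other perturbations, which is a common ES convention rather than something stated on the slide.

```python
import math
import random

def mutate(genome, tau=0.1):
    """One mutation step: sigma is log-normally perturbed, then alpha and
    every weight are perturbed by a Gaussian scaled with sigma.

    genome: dict with keys "sigma", "alpha", "weights" (list of floats).
    """
    sigma = genome["sigma"] * math.exp(random.gauss(0.0, tau))
    alpha = genome["alpha"] + genome["alpha"] * random.gauss(0.0, sigma)
    weights = [w + w * random.gauss(0.0, sigma) for w in genome["weights"]]
    return {"sigma": sigma, "alpha": alpha, "weights": weights}
```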


27

Genetic Algorithm

Crossover:

σ and α of the parents are averaged
Weights: averaged if 1/α < ratio < α, else randomly chosen from the parents
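A sketch of this crossover under one explicit assumption: "ratio" is read here as the ratio between the two parents' values of a given weight, which the slide does not define; the genome layout matches the mutation sketch above.

```python
import random

def crossover(parent_a, parent_b):
    """Sigma and alpha are averaged; each weight is averaged when the parents
    roughly agree (1/alpha < ratio < alpha), otherwise it is copied from a
    randomly chosen parent."""
    sigma = (parent_a["sigma"] + parent_b["sigma"]) / 2.0
    alpha = (parent_a["alpha"] + parent_b["alpha"]) / 2.0
    weights = []
    for wa, wb in zip(parent_a["weights"], parent_b["weights"]):
        ratio = wa / wb if wb != 0 else float("inf")
        if 1.0 / alpha < ratio < alpha:
            weights.append((wa + wb) / 2.0)          # parents agree: average
        else:
            weights.append(random.choice((wa, wb)))  # otherwise pick one parent
    return {"sigma": sigma, "alpha": alpha, "weights": weights}
```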


28

Genetic Algorithm

Fitness function:
Win bonus
Number of own pieces left
Number of turns spent
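A minimal sketch combining the three terms named above; the particular weights are placeholder assumptions, since the slide only names the terms.

```python
def fitness(won, own_pieces_left, turns_spent,
            win_bonus=1000.0, piece_weight=10.0, turn_penalty=1.0):
    """Reward winning and keeping pieces, penalise long games."""
    score = piece_weight * own_pieces_left - turn_penalty * turns_spent
    if won:
        score += win_bonus
    return score
```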


29

Genetic Algorithm

Reference AI:
Monte Carlo AI
Self-selecting reference genome
  Select the average genome from each generation
  Pick the winner between this genome and the previous reference


30

Hill climbing

The GA takes too long to train
Hill climbing is faster (a minimal sketch follows below)
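A minimal sketch of hill climbing over the evaluation weights; the step size, iteration count, and the `fitness_of` helper (for example, the result of test games against a reference AI) are assumptions.

```python
import random

def hill_climb(weights, fitness_of, step=0.1, iterations=200):
    """Perturb one weight at a time and keep the change only if it improves."""
    best = list(weights)
    best_score = fitness_of(best)
    for _ in range(iterations):
        candidate = list(best)
        i = random.randrange(len(candidate))
        candidate[i] += random.uniform(-step, step)   # local perturbation
        score = fitness_of(candidate)
        if score > best_score:                        # keep only improvements
            best, best_score = candidate, score
    return best
```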


31

Opponent modeling

Observing moves
Ruling out pieces
Stronger pieces are moved towards you
Weaker pieces are moved away


32

Opponent modeling

No knowledge about enemy pieces at the start
Updating the probabilities (see the sketch below):
  Update the probability of the moving piece
  Update the probabilities of all other pieces
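A simplified sketch of such a probability update: each enemy piece keeps a distribution over possible ranks; when a piece moves, Bomb and Flag are ruled out for it and the distributions are renormalised. The exact update rule used in the project is not given on the slide.

```python
RANKS = ["F", "B", "S", "1", "2", "3", "4", "5", "6", "7", "8", "9"]

def normalise(dist):
    total = sum(dist.values())
    return {r: p / total for r, p in dist.items()} if total else dist

def on_enemy_move(distributions, moved_piece_id):
    """distributions: {piece_id: {rank: probability}} for all enemy pieces."""
    moved = dict(distributions[moved_piece_id])
    moved["F"] = 0.0      # flags never move
    moved["B"] = 0.0      # bombs never move
    distributions[moved_piece_id] = normalise(moved)
    # The other pieces' distributions would also be adjusted (e.g. against the
    # known piece counts per rank); here they are simply renormalised.
    for pid, dist in distributions.items():
        if pid != moved_piece_id:
            distributions[pid] = normalise(dist)
    return distributions
```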


33

Monte Carlo Experiments

MC against MC with opponent modeling, using a database of human-versus-human games

Player Wins Losses Draw

MC 39 44 58

MC-OM 44 39 58


34

Monte Carlo Experiments

MC against MC with opponent modeling using a database of MC versus MC games

Player Wins Losses Draw

MC

MC-OM


35

Strategy

Split the game up into phases:
  Exploration phase: until 25% of enemy pieces are identified
  Elimination phase: until 70% of enemy pieces are killed
  End-game phase
Alter the evaluation function for each phase (a minimal sketch follows below)
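A minimal sketch of phase selection using the thresholds on this slide; the `WEIGHTS_BY_PHASE` lookup in the usage comment is a hypothetical way of altering the evaluation function per phase.

```python
def game_phase(identified_fraction, killed_fraction):
    """Return the current phase from the fractions of identified and killed enemy pieces."""
    if identified_fraction < 0.25:
        return "exploration"      # until 25% of enemy pieces are identified
    if killed_fraction < 0.70:
        return "elimination"      # until 70% of enemy pieces are killed
    return "end-game"

# The evaluation weights could then be switched per phase, e.g.:
# weights = WEIGHTS_BY_PHASE[game_phase(ident_frac, kill_frac)]
```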


36

Conclusion

Both AIs are very slow
The genetic AI takes too long to train

In the case of Stratego, tweaking a few weights may not be an optimal way to create an intelligent player