
1

UM Stratego

Colin Schepers, Daan Veltman, Enno Ruijters, Leon Gerritsen, Niek den Teuling, Yannick Thimister


2

Content

Introduction (Yannick)
The game of Stratego (Daan)
Evaluation Function (Leon)
Monte Carlo (Colin)
Genetic Algorithm (Enno)
Opponent modeling and strategy (Niek)
Conclusion (Yannick)


3

The game of Stratego

Board of 10x10 squares

Setup field of 4x10 per player


4

The game of Stratego

B - Bombs
1 - Marshal
2 - General
3 - Colonels
4 - Majors
5 - Captains
6 - Lieutenants
7 - Sergeants
8 - Miners
9 - Scouts
S - Spy
F - Flag


5

The game of Stratego

Win:
  Flag capture
  Unmovable pieces

Draw:
  Unmovable pieces
  Maximum moves


6

Starting Positions

Flag placed
Bombs placed
Remaining pieces placed randomly


7

Starting Positions

Distance to Freedom
Being bombed in
Partial obstruction
Adjacency
Flag defence
Startup Pieces


8

Starting Positions

Distance to Freedom


9

Starting Positions

Startup Pieces


10

Evaluation Function

Sub-functions of the evaluation function:
Material value
Information value
Near enemy piece value
Near flag value
Progressive bonus value
First-move penalty


11

Evaluation Function

How it works:
All the sub-functions return a value
These values are weighted and summed (a minimal sketch follows below)
The higher the total value, the better that move is for the player
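As an illustration only, a minimal sketch of this weighted sum, assuming each sub-function is a callable that scores a board for a player; the function names, helpers and weights are placeholders, not the project's actual values.

```python
def evaluate(board, player, sub_functions, weights):
    """Return the weighted sum of all evaluation sub-functions.

    sub_functions: list of callables f(board, player) -> float
    weights:       list of floats, one weight per sub-function
    """
    return sum(w * f(board, player) for f, w in zip(sub_functions, weights))

# The move whose resulting board gets the highest total value is chosen, e.g.:
# best_move = max(legal_moves,
#                 key=lambda m: evaluate(apply_move(board, m), player, subs, ws))
```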


12

Evaluation Function

Material Value:
Used for comparing the two players' board strengths
Each piece type has a value
The total value of the opponent's board is subtracted from the player's board value
A positive value means a strong player board
A negative value means a weak player board
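A minimal sketch of the material-value sub-function, assuming each side's remaining pieces are available as a list of rank labels; the per-piece values below are assumed placeholders, not the values used by the authors.

```python
# Placeholder values (assumed); with this deck's numbering, 1 is the Marshal.
PIECE_VALUES = {"F": 1000, "B": 50, "S": 150, "1": 400, "2": 200, "3": 100,
                "4": 75, "5": 50, "6": 30, "7": 20, "8": 30, "9": 10}

def material_value(own_pieces, enemy_pieces):
    """Sum of own piece values minus sum of opponent piece values."""
    own = sum(PIECE_VALUES[p] for p in own_pieces)
    enemy = sum(PIECE_VALUES[p] for p in enemy_pieces)
    return own - enemy  # positive: player is materially stronger
```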


13

Evaluation Function

Information value:
Stimulates the collection of opponent information and the keeping of one's own piece information
Each piece type has a certain information value
All the values from each side are summed and then subtracted from each other
A Marshal being discovered is worse than a Scout being discovered


14

Evaluation Function

Near enemy piece value:
Checks whether a movable piece can or cannot defeat a piece next to it (see the sketch below)
If the adjacent piece can be defeated, return a positive score
If not, return a negative one
If the piece is unknown, return 0
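A minimal sketch of this rule for one adjacent enemy piece; the rank comparison is deliberately simplified and the bonus size is an assumption.

```python
def beats(attacker_rank, defender_rank):
    # Simplified: assumes a larger number means a stronger piece and ignores
    # the Spy-vs-Marshal and Miner-vs-Bomb special cases. With this deck's
    # numbering (1 = Marshal) the comparison would be inverted.
    return attacker_rank > defender_rank

def near_enemy_value(own_rank, adjacent_enemy_rank, bonus=10):
    """Score contribution for one piece standing next to an enemy piece."""
    if adjacent_enemy_rank is None:   # unknown piece contributes 0
        return 0
    return bonus if beats(own_rank, adjacent_enemy_rank) else -bonus
```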


15

Evaluation Function

Near flag value:
Stimulates the defence of one's own flag and the attacking of the enemy's flag
Constructs an array with possible enemy flag locations
If an enemy piece is near the own flag, return a negative number
If an own piece is near a possible enemy flag, return a positive number


16

Evaluation Function

Progressive bonus value:
Stimulates the advancement of pieces towards enemy lines
Returns a positive value if a piece moves forward
Negative if it moves backward


17

Evaluation Function

First-move value:
Keeps pieces from giving away information
Keeps the number of unmoved pieces high


18

Monte Carlo

A subset of all possible moves is played
No strategy or weights used
An evaluation value is received after every move
At the end, a comparison of evaluation values determines the best move
A depth limit is used so the tree doesn't grow too big and the algorithm ends at some point (a minimal sketch follows below)
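A minimal sketch of depth-limited Monte Carlo move selection in the spirit of this slide; the playout policy, the depth, the sample count, and the board helpers (`apply_move`, `board.legal_moves()`, `evaluate`) are assumptions, not the project's actual implementation.

```python
import random

def monte_carlo_move(board, legal_moves, evaluate, apply_move,
                     depth=4, samples_per_move=20):
    """Pick the move whose random depth-limited playouts score best on average."""
    best_move, best_score = None, float("-inf")
    for move in legal_moves:
        total = 0.0
        for _ in range(samples_per_move):
            b = apply_move(board, move)
            # play random moves up to the depth limit so the tree stays bounded
            for _ in range(depth - 1):
                moves = b.legal_moves()
                if not moves:
                    break
                b = apply_move(b, random.choice(moves))
            total += evaluate(b)
        avg = total / samples_per_move
        if avg > best_score:
            best_move, best_score = move, avg
    return best_move
```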


19

Monte Carlo

Advantages:

Simple implementation
Can be changed quickly
Easy observation of behavior
Good documentation
Good for partial-information situations


20

Monte Carlo

Disadvantages:

Generally not smart
Dependent on the evaluation function
Computationally slow
Tree grows very fast


21

Monte Carlo Experiments

MC against lower-depth MC

Player Wins Losses Draw

MC 28 59 49

MC-LD 59 28 49


22

Monte Carlo Experiments

MC against no-depth MC

Player Wins Losses Draw

MC 15 2 12

MC-ND 2 15 12


23

Monte Carlo Experiments

MC against deeper-depth but narrower MC

Player Wins Losses Draw

MC 5 2 11

MC-DDN 2 5 11


24

Monte Carlo Experiments

MC against narrower MC

Player Wins Losses Draw

MC 62 18 85

MC-N 18 62 85


25

Genetic Algorithm

Evolve the weights of the terms in the evaluation function

The AI uses a standard expectiminimax search tree
Evolution strategies (evolution parameters are themselves evolved)


26

Genetic Algorithm

Genome:

G = (σ, α, w_1, ..., w_n)

Mutation:

σ_n = σ_{n-1} · e^{N(0, τ)}
α_n = α_{n-1} + α_{n-1} · N(0, σ)
w_{i,n} = w_{i,n-1} + w_{i,n-1} · N(0, σ)
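A minimal sketch of one evolution-strategy mutation step following the slide's equations; the value of τ and the genome layout are assumptions, and the freshly mutated σ is used for the other perturbations, which is a common ES convention rather than something stated on the slide.

```python
import math
import random

def mutate(genome, tau=0.1):
    """One mutation step: sigma is log-normally perturbed, then alpha and
    every weight are perturbed by a Gaussian scaled with sigma.

    genome: dict with keys "sigma", "alpha", "weights" (list of floats).
    """
    sigma = genome["sigma"] * math.exp(random.gauss(0.0, tau))
    alpha = genome["alpha"] + genome["alpha"] * random.gauss(0.0, sigma)
    weights = [w + w * random.gauss(0.0, sigma) for w in genome["weights"]]
    return {"sigma": sigma, "alpha": alpha, "weights": weights}
```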


27

Genetic Algorithm

Crossover:

σ and α of the parents are averaged
Weights: averaged if 1/α < ratio < α, else randomly chosen from the parents
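A sketch of this crossover under one explicit assumption: "ratio" is read here as the ratio between the two parents' values of a given weight, which the slide does not define; the genome layout matches the mutation sketch above.

```python
import random

def crossover(parent_a, parent_b):
    """Sigma and alpha are averaged; each weight is averaged when the parents
    roughly agree (1/alpha < ratio < alpha), otherwise it is copied from a
    randomly chosen parent."""
    sigma = (parent_a["sigma"] + parent_b["sigma"]) / 2.0
    alpha = (parent_a["alpha"] + parent_b["alpha"]) / 2.0
    weights = []
    for wa, wb in zip(parent_a["weights"], parent_b["weights"]):
        ratio = wa / wb if wb != 0 else float("inf")
        if 1.0 / alpha < ratio < alpha:
            weights.append((wa + wb) / 2.0)          # parents agree: average
        else:
            weights.append(random.choice((wa, wb)))  # otherwise pick one parent
    return {"sigma": sigma, "alpha": alpha, "weights": weights}
```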


28

Genetic Algorithm

Fitness function:
Win bonus
Number of own pieces left
Number of turns spent
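A minimal sketch combining the three terms named above; the particular weights are placeholder assumptions, since the slide only names the terms.

```python
def fitness(won, own_pieces_left, turns_spent,
            win_bonus=1000.0, piece_weight=10.0, turn_penalty=1.0):
    """Reward winning and keeping pieces, penalise long games."""
    score = piece_weight * own_pieces_left - turn_penalty * turns_spent
    if won:
        score += win_bonus
    return score
```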


29

Genetic Algorithm

Reference AI:
Monte Carlo AI
Self-selecting reference genome
  Select the average genome from each generation
  Pick the winner between this genome and the previous reference


30

Hill climbing

The GA takes too long to train
Hill climbing is faster (a minimal sketch follows below)
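A minimal sketch of hill climbing over the evaluation weights; the step size, iteration count, and the `fitness_of` helper (for example, the result of test games against a reference AI) are assumptions.

```python
import random

def hill_climb(weights, fitness_of, step=0.1, iterations=200):
    """Perturb one weight at a time and keep the change only if it improves."""
    best = list(weights)
    best_score = fitness_of(best)
    for _ in range(iterations):
        candidate = list(best)
        i = random.randrange(len(candidate))
        candidate[i] += random.uniform(-step, step)   # local perturbation
        score = fitness_of(candidate)
        if score > best_score:                        # keep only improvements
            best, best_score = candidate, score
    return best
```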


31

Opponent modeling

Observing moves
Ruling out pieces
Stronger pieces are moved towards you
Weaker pieces are moved away


32

Opponent modeling

No knowledge about enemy pieces at the start
Updating the probabilities (see the sketch below):
  Update the probability of the moving piece
  Update the probabilities of all other pieces
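A simplified sketch of such a probability update: each enemy piece keeps a distribution over possible ranks; when a piece moves, Bomb and Flag are ruled out for it and the distributions are renormalised. The exact update rule used in the project is not given on the slide.

```python
RANKS = ["F", "B", "S", "1", "2", "3", "4", "5", "6", "7", "8", "9"]

def normalise(dist):
    total = sum(dist.values())
    return {r: p / total for r, p in dist.items()} if total else dist

def on_enemy_move(distributions, moved_piece_id):
    """distributions: {piece_id: {rank: probability}} for all enemy pieces."""
    moved = dict(distributions[moved_piece_id])
    moved["F"] = 0.0      # flags never move
    moved["B"] = 0.0      # bombs never move
    distributions[moved_piece_id] = normalise(moved)
    # The other pieces' distributions would also be adjusted (e.g. against the
    # known piece counts per rank); here they are simply renormalised.
    for pid, dist in distributions.items():
        if pid != moved_piece_id:
            distributions[pid] = normalise(dist)
    return distributions
```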


33

Monte Carlo Experiments

MC against MC with opponent modeling, using a database of human-versus-human games

Player Wins Losses Draw

MC 39 44 58

MC-OM 44 39 58


34

Monte Carlo Experiments

MC against MC with opponent modeling using a database of MC versus MC games

Player Wins Losses Draw

MC

MC-OM


35

Strategy

Split the game up into phases:
  Exploration phase: until 25% of enemy pieces are identified
  Elimination phase: until 70% of enemy pieces are killed
  End-game phase
Alter the evaluation function for each phase (a minimal sketch follows below)
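A minimal sketch of phase selection using the thresholds on this slide; the `WEIGHTS_BY_PHASE` lookup in the usage comment is a hypothetical way of altering the evaluation function per phase.

```python
def game_phase(identified_fraction, killed_fraction):
    """Return the current phase from the fractions of identified and killed enemy pieces."""
    if identified_fraction < 0.25:
        return "exploration"      # until 25% of enemy pieces are identified
    if killed_fraction < 0.70:
        return "elimination"      # until 70% of enemy pieces are killed
    return "end-game"

# The evaluation weights could then be switched per phase, e.g.:
# weights = WEIGHTS_BY_PHASE[game_phase(ident_frac, kill_frac)]
```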


36

Conclusion

Both AIs are very slow
The genetic AI takes too long to train

In the case of Stratego, tweaking a few weights may not be an optimal way to create an intelligent player