
1
UM Stratego
Collin Schepers, Daan Veltman, Enno Ruijters,
Leon Gerritsen, Niek den Teuling, Yannick Thimister

2
• Introduction (Yannick)
• Starting positions (Daan)
• Evaluation Function (Leon)
• Monte Carlo (Collin)
• Genetic Algorithm (Enno)
• Opponent modelling and strategy (Niek)
• Results (Yannick)
• Conclusion (Yannick)

3
Starting Positions
• Distance to Freedom
• Being bombed in
• Partial obstruction
• Adjacency
• Flag defence
• Startup Pieces

4
Starting Positions
• Distance to Freedom

5
Starting Positions
• Flag defence

6
Starting Positions
• Startup Pieces

7
Starting Positions
Flag:
  −10 for DTF < 2
  +5 when bombed in
Spy:
  −1 for pieces > Captain, adjacent with higher DTF
  −2 when bombed in
  −1 when DTF > 5
  −1 when DTF > 1, adjacent piece −1 or −2 = piece, adjacent piece higher DTF
Scout:
  −2 for > 2 in flag defence
  −2 when bombed in
  −1 when DTF > 5
  −1 when DTF > 1, adjacent piece −1 or −2 = piece, adjacent piece higher DTF
  −1 for > 5 in DTF < 2
Miner:
  −1 for > 1 in flag defence
  −2 when bombed in
  −1 when DTF > 5
  −1 when DTF > 1, adjacent piece −1 or −2 = piece, adjacent piece higher DTF
  −1 when on front row
  −1 for each > 1 in DTF < 2
Sergeant:
  −1 for > 1 in flag defence
  −1 when DTF > 1, adjacent piece −1 or −2 = piece, adjacent piece higher DTF
  −2 for > 3 in DTF < 2
Lieutenant:
  −1 for > 1 in flag defence
  −1 when DTF > 1, adjacent piece −1 or −2 = piece, adjacent piece higher DTF
  −2 for > 3 in DTF < 2
Captain:
  −2 when bombed in
  −1 when DTF > 5
  −1 when DTF > 1, adjacent piece −1 or −2 = piece, adjacent piece higher DTF
  −2 for > 3 in DTF < 2
  −1 when Spy adjacent
  −1 for each > 2 in DTF < 2
Major:
  −2 when bombed in
  −1 when DTF > 5
  +1 when on flag side
  −1 when DTF > 1, adjacent piece −1 or −2 = piece, adjacent piece higher DTF
  −2 for > 2 in DTF < 2
Colonel:
  −2 when bombed in
  −1 when DTF > 5
  +1 when on flag side
  −1 when DTF > 1, adjacent piece −1 or −2 = piece, adjacent piece higher DTF
  −1 for > 1 in DTF < 2
General:
  −2 when bombed in
  −1 when DTF > 5
  +2 when on flag side and Marshal not on flag side
  −1 when DTF > 1, adjacent piece −1 or −2 = piece, adjacent piece higher DTF
Marshal:
  −2 when bombed in
  −1 when DTF > 5
  +3 when on flag side
Bombs:
  −2 when bombed in
  −1 when DTF > 5
  +2 when Sergeant adjacent
  +2 when Lieutenant adjacent
  −1 when Scout adjacent
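A minimal sketch of how a couple of these rules could be applied when scoring a single piece in a candidate setup. The piece names, the precomputed DTF (Distance to Freedom) value, and the bombed-in flag are illustrative assumptions, not the original implementation.

```python
# Hypothetical sketch of a few of the setup-scoring rules above.  dtf and
# bombed_in are assumed to be precomputed for the piece's square; this is
# not the original code.

def score_piece(piece, dtf, bombed_in):
    score = 0
    if piece == "Flag":
        if dtf < 2:
            score -= 10        # an exposed flag is heavily penalised
        if bombed_in:
            score += 5         # a walled-in flag is desirable
    elif piece in ("Captain", "Major", "Colonel", "General", "Marshal"):
        if bombed_in:
            score -= 2         # strong movers should stay mobile
        if dtf > 5:
            score -= 1
    return score

print(score_piece("Flag", dtf=6, bombed_in=True))     # 5
print(score_piece("Marshal", dtf=7, bombed_in=True))  # -3
```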

8
Evaluation Function
• Sub-functions of the evaluation function:
  • Material value
  • Information value
  • Near enemy piece value
  • Near flag value
  • Progressive bonus value

9
Evaluation Function
How it works:
• Each sub-function returns a value
• These values are weighted and added together
• The higher the total value, the better that move is for the player
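A minimal sketch of this weighted sum. The term names and weights are placeholders, not the project's actual values.

```python
# Weighted sum of the evaluation sub-functions; sub_values and weights are
# dicts keyed by sub-function name (placeholder names, illustrative weights).
def evaluate(sub_values, weights):
    return sum(weights[name] * value for name, value in sub_values.items())

# Example: a move that wins material but gives away some information
print(evaluate({"material": 4, "information": -2},
               {"material": 1.0, "information": 0.5}))   # 3.0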

10
Evaluation Function
Material value:
• Used to compare the two players' board strengths
• Each piece type has a value
• The total value of the opponent's board is subtracted from the player's board value
• A positive value means the player's board is stronger; a negative value means it is weaker
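A sketch of this material-value term. The per-piece values below are illustrative placeholders; the slides do not give the actual numbers.

```python
# Placeholder piece values for illustration only.
PIECE_VALUE = {"Marshal": 10, "General": 9, "Colonel": 8, "Major": 7,
               "Captain": 6, "Lieutenant": 5, "Sergeant": 4, "Miner": 5,
               "Scout": 2, "Spy": 6, "Bomb": 3, "Flag": 0}

def material_value(own_pieces, opponent_pieces):
    # Positive: the player's remaining pieces are stronger than the opponent's
    return (sum(PIECE_VALUE[p] for p in own_pieces)
            - sum(PIECE_VALUE[p] for p in opponent_pieces))

print(material_value(["Marshal", "Scout"], ["General"]))  # 10 + 2 - 9 = 3
```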

11
Evaluation Function
Information value:
• Encourages gathering information about the opponent's pieces while keeping information about one's own pieces hidden
• Each piece type has a certain information value
• The values for each side are summed and then subtracted from each other
• A Marshal being discovered is worse than a Scout being discovered
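A sketch of this information term. INFO_VALUE is a placeholder table; the only point it illustrates is that a revealed Marshal costs more than a revealed Scout.

```python
# Placeholder information values; unlisted pieces default to 2.
INFO_VALUE = {"Marshal": 8, "Spy": 6, "Miner": 4, "Scout": 1}

def information_value(own_revealed, opponent_revealed):
    gained = sum(INFO_VALUE.get(p, 2) for p in opponent_revealed)  # what we learned
    lost = sum(INFO_VALUE.get(p, 2) for p in own_revealed)         # what we gave away
    return gained - lost
```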

12
Evaluation Function
Near enemy piece value:
• Checks whether a moveable piece can defeat a piece next to it
• If the adjacent piece can be defeated, a positive score is returned
• If not, a negative score is returned
• If the adjacent piece is unknown, 0 is returned
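A sketch of this term for a single adjacent enemy piece. Ranks are plain integers here, and the Spy/Marshal and Miner/Bomb special cases are deliberately left out.

```python
# Compare a moveable piece against one known or unknown neighbour.
def near_enemy_piece_value(own_rank, enemy_rank):
    if enemy_rank is None:
        return 0      # unknown neighbour: nothing to conclude
    return 1 if own_rank > enemy_rank else -1
```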

13
Evaluation Function
Near flag value:
• Encourages defending the own flag and attacking the enemy's flag
• Constructs an array of possible enemy flag locations
• If an enemy piece is near the own flag, a negative number is returned
• If an own piece is near a possible enemy flag location, a positive number is returned
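A sketch of this term. possible_flag_squares stands in for the array of candidate enemy flag locations; the distance function, radius, and unit scores are assumptions.

```python
# Penalise enemy pieces near our flag, reward own pieces near likely enemy flag squares.
def near_flag_value(own_flag, enemy_positions, own_positions,
                    possible_flag_squares, dist, radius=2):
    score = 0
    for sq in enemy_positions:
        if dist(sq, own_flag) <= radius:
            score -= 1        # enemy piece threatening our flag
    for sq in own_positions:
        if any(dist(sq, f) <= radius for f in possible_flag_squares):
            score += 1        # own piece near a likely enemy flag square
    return score
```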

14
Evaluation Function
Progressive bonus value:
• Encourages the advancement of pieces towards the enemy lines
• Returns a positive value if a piece moves forward
• Returns a negative value if it moves backward
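A sketch of this bonus. The row numbering (a higher row index is closer to the enemy) is an assumption.

```python
# Reward forward movement, penalise retreating, ignore sideways moves.
def progressive_bonus_value(from_row, to_row):
    if to_row > from_row:
        return 1      # moved towards the enemy lines
    if to_row < from_row:
        return -1     # moved backwards
    return 0          # sideways move
```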

15
Monte Carlo
• A subset of all possible moves is played; no strategy or weights are used
• An evaluation value is received after every move
• At the end, a comparison of the evaluation values determines the best move
• A depth limit keeps the tree from growing too big and guarantees that the algorithm terminates
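A minimal sketch of this sampling scheme: try a random subset of the legal moves, roll each one out to a fixed depth with random play, and keep the move whose rollout evaluates best. The state API (legal_moves, apply, evaluate) is hypothetical, and sign handling for the opponent's turns is omitted.

```python
import random

def rollout(state, depth):
    # Play random moves until the depth limit, then score with the evaluation function.
    if depth == 0 or not state.legal_moves():
        return state.evaluate()
    return rollout(state.apply(random.choice(state.legal_moves())), depth - 1)

def monte_carlo_move(state, samples=50, depth_limit=10):
    moves = state.legal_moves()
    best_move, best_value = None, float("-inf")
    for move in random.sample(moves, min(samples, len(moves))):
        value = rollout(state.apply(move), depth_limit)
        if value > best_value:
            best_move, best_value = move, value
    return best_move
```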

16
Monte Carlo
Tree representation

17
Monte Carlo
Advantages:
• Simple implementation
• Can be changed quickly
• Easy observation of behavior
• Good documentation
• Good for partial-information situations

18
Monte Carlo
Disadvantages:
• Generally not smart
• Depends on the evaluation function
• Computationally slow
• Tree grows very fast
• Requires a lot of memory

19
Genetic Algorithm
• Evolve the weights of the terms in the evaluation function
• The AI uses a standard expectiminimax search tree
• Evolution strategies (the evolution parameters are themselves evolved)

20
Genetic Algorithm
• Genome:
  $G = (\sigma, \alpha, w_1, \ldots, w_n)$
• Mutation:
  $\sigma_{n+1} = \sigma_n \, e^{\tau N(0,1)}$
  $\alpha_{n+1} = \alpha_n + \sigma_n \alpha_n N(0,1)$
  $w_{i,n+1} = w_{i,n} + \sigma_{n+1} w_{i,n} N_i(0,1)$
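A sketch of this self-adaptive mutation, following the reconstructed formulas above. The constant tau and the exact perturbation form are assumptions and may differ from the original implementation.

```python
import math, random

def mutate(sigma, alpha, weights, tau=0.1):
    new_sigma = sigma * math.exp(tau * random.gauss(0, 1))                  # log-normal step-size update
    new_alpha = alpha + sigma * alpha * random.gauss(0, 1)                  # perturb the crossover threshold
    new_weights = [w + new_sigma * w * random.gauss(0, 1) for w in weights] # perturb each evaluation weight
    return new_sigma, new_alpha, new_weights
```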

21
Genetic Algorithm
• Crossover:
  • σ and α of the parents are averaged
  • Weights:
    • Averaged if the difference < α
    • Otherwise chosen randomly from one of the parents
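A sketch of this crossover rule, using the same (sigma, alpha, weights) genome layout as the mutation sketch above.

```python
import random

def crossover(parent_a, parent_b):
    sigma_a, alpha_a, weights_a = parent_a
    sigma_b, alpha_b, weights_b = parent_b
    sigma = (sigma_a + sigma_b) / 2
    alpha = (alpha_a + alpha_b) / 2
    # Average weights the parents agree on (within alpha), otherwise pick one at random.
    weights = [(wa + wb) / 2 if abs(wa - wb) < alpha else random.choice((wa, wb))
               for wa, wb in zip(weights_a, weights_b)]
    return sigma, alpha, weights
```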

22
Genetic Algorithm
Fitness function:
• Win bonus
• Number of own pieces left
• Number of turns spent
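A sketch combining these three ingredients: a win bonus, a reward per remaining piece, and a penalty per turn spent. The coefficients are placeholders.

```python
def fitness(won, own_pieces_left, turns):
    # Illustrative coefficients; the slides do not give the actual weighting.
    return (100 if won else 0) + 2 * own_pieces_left - 0.1 * turns
```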

23
Genetic Algorithm
Reference AI:
• Monte Carlo AI
• Self-selecting reference genome:
  • Select the average genome from each generation
  • Pick the winner between this genome and the previous reference
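A sketch of this self-selecting reference: average the generation's genomes (treated here as flat weight vectors for simplicity) and keep whichever of the candidate and the previous reference wins a head-to-head match. play_match() is a hypothetical helper returning True when its first argument wins.

```python
def update_reference(generation, previous_reference, play_match):
    n = len(generation)
    # Component-wise average of all genomes in the generation.
    candidate = [sum(g[i] for g in generation) / n for i in range(len(generation[0]))]
    return candidate if play_match(candidate, previous_reference) else previous_reference
```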

24
Opponent modeling
• Observing moves
• Ruling out pieces
• Stronger pieces are moved towards you
• Weaker pieces are moved away

25
Opponent modeling
• Initial probability distribution
• Updating the probabilities:
  • Update the probability of the moving piece
  • Update the probabilities of free pieces nearby
• Bluffing
• Bluffing probability
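A sketch of the probability update for a single unknown enemy piece: shift weight towards stronger ranks when it moves towards us, towards weaker ranks when it moves away, then renormalise. The boost factor and the split into strong ranks are assumptions; bluffing is not modelled here.

```python
def update_on_move(probabilities, moved_towards_us, strong_ranks, boost=1.5):
    # probabilities: dict rank -> probability for one unknown enemy piece
    updated = {}
    for rank, p in probabilities.items():
        factor = boost if (rank in strong_ranks) == moved_towards_us else 1.0
        updated[rank] = p * factor
    total = sum(updated.values())
    return {rank: p / total for rank, p in updated.items()}
```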

26
Strategy
• Split the game up into phases:
  • Exploration phase
  • Elimination phase
  • End-game phase
• Alter the evaluation function
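A sketch of altering the evaluation function per phase: the same evaluation terms are reused, but with different weights in each phase. The phase names follow the slide; the numbers are placeholders.

```python
# Placeholder phase-dependent weights for the evaluation terms.
PHASE_WEIGHTS = {
    "exploration": {"material": 0.5, "information": 1.5, "near_flag": 0.5, "progress": 1.0},
    "elimination": {"material": 1.5, "information": 0.5, "near_flag": 0.5, "progress": 1.0},
    "end_game":    {"material": 1.0, "information": 0.2, "near_flag": 2.0, "progress": 1.0},
}

def weights_for(phase):
    return PHASE_WEIGHTS[phase]
```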

27
Results
First generation GA against MC
The first generation GA wins 80.0% of the games

Algorithm     nr. of wins   Average nr. of turns   Standard deviation
First GA              879                   90.6                 84.5
MC                    163                  184.7                 89.6
None (draw)            60

28
Results
Newest generation (66th) GA against MC
The GA won 81.3% of the games

Algorithm     nr. of wins   Average nr. of turns   Standard deviation
Newest GA             426                   85.0                 78.2
MC                     73                  181.4                 90.8
None (draw)            25

29
Results
First generation GA against the newest generation GA
The first GA wins 40.9% of the games; the newest generation GA wins only 25%

Algorithm     nr. of wins   Average nr. of turns   Standard deviation
First GA             3910                   80.6                 65.4
Newest GA            2393                  125.0                 66.7
None (draw)          3266

30
Conclusion
• Monte Carlo is (still) weaker than the GA
  • Limited by the initial piece setup
  • Manual weight tweaking required
• The GA gets weaker after training
  • Non-transitivity of the references
  • Should let it play against other AIs