Werkstuk Mak Tcm39 91392

download Werkstuk Mak Tcm39 91392

of 34

Transcript of Werkstuk Mak Tcm39 91392

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    1/34

    Approximate LinearProgramming for Traffic

    Control at IsolatedSignalized Intersections

    Angelique Mak

    Vrije ni!ersiteit"acult# of Sciences

    $usiness Mat%ematics and Informatics&e $oelelaan '()'a'()' *V Amsterdam

    +o!em,er -((.

    _____________________________________________________________________- 1 -

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    2/34

    Page 2

    Preface

    The BMI paper is one of the last compulsory subjects of the masterstudy Business Mathematics and Informatics. The target of the BMIpaper is describing and analying a problem in the field of BMIbased on e!isting literature. Its focus embraces aspects ofeconomics" mathematics" and computer science.

    The building bloc#s of this paper are t$o papers. The first paper %1/describes an &ppro!imate 'inear Programming approach fora(erage cost dynamic programming. & traffic control problemformulated in the second paper %)/ is chosen to illustrate this

    approach. It focuses on the linear approach to appro!imate thedynamic programming (alue function through e!periments $ithtraffic control at isolated signalied intersections to find out ho$traffic light s$itching schemes for this system can be determinedsuch that the number of cars in all flo$s is minimied in the longterm.

    I $ould li#e to than# my super(isor" *andjai Bhulai. +e pro(ided meguidance and support" not only during my $riting" but also most ofthe time during my master study BMI. &lso his optimism hasencouraged me and I am grateful for that.

    ,urther" I $ould li#e to than# Paul +ar#in#" hi +ung Mo#" andhristel ijman for their technical support. /ithout their generosityand help" finishing this thesis $ould ha(e been much harder for me.

    _____________________________________________________________________- 2 -

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    3/34

    Page 0

    Management Summar# in 0nglis%

    Intersections are places in the traffic net$or# $here many potentialconflicts occur. Traffic is much affected by the traffic light control. &single intersection of t$o t$o-$ay streets $ith controllable trafficlights at each corner is considered in this paper. The main purposeof this paper is to apply the two-phase Approximate LinearProgrammingapproach for average cost dynamic programmingpresented by e ,arias and an 3oy %1/" to find out ho$ traffic light

    s$itching schemes for this system can be determined such that thenumber of cars in all flo$s is minimied in the long term. ars arri(eat an intersection controlled by a traffic light and form a 4ueue. Thedynamic control of the traffic lights is based on the numbers of cars$aiting in the 4ueues. The model that describes the e(olution of the4ueue lengths used in this paper is formulated in the paper of+aijema and an der /al %)/" $hich is modelled as a Mar#o(ecision Process in discrete time. The set of all flo$s is partitionedinto disjointed combinations of non-conflicting flo$s that $ill recei(egreen together.

    In principle" problems of this type can be sol(ed (ia dynamicprogramming. ynamic programming refers to a collection ofalgorithms that can be used to compute optimal policies gi(en amodel of the en(ironment" such as a Mar#o( ecision Process.ynamic programming computes the optimal (alue function bysol(ing the Bellman5s e4uation. The domain of the optimal (aluefunction is the state space of the system to be controlled. Thismeans that the number of (ariables 6(alue function7 to be stored ise4ual to the sie of the state space. /hen the state space is largethis dynamic programming computation becomes intractable. It is#no$n as the 8curse of dimensionality5. 9specially" $hen dealing$ith a multi-dimensional state space" its sie gro$s e!ponentially inthe number of state (ariables. This is also the case in traffic lightcontrol" due the fact that each 4ueue 6lane7 has actually an infinitebuffer.

    &ppro!imate dynamic programming intends to alle(iate the curse ofdimensionality by considering the appro!imation to the (aluefunction" the scoring function" $hich can be stored and computedefficiently. :ne of the considerations $ithin &ppro!imate ynamicProgramming is choosing the appro!imation architectures" the

    structure of the appro!imation to the (alue function. In this paper"the use of linear architectures is considered. & collection of

    _____________________________________________________________________- 0 -

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    4/34

    Page )

    functions that maps the system state space to real numbers 6thebasis functions7 is chosen and the scoring function can be generatedby finding an appropriate linear combination of these basis

    functions. +ence" it suffices to store the $eights assigned to each ofthe basis functions in the linear combination instead of storing the(alue function for each state in the system. The number of(ariables 6one per basis function7 to be stored is tremendouslysmaller than the number used by the (alue function $ith one (alueper state in the system.

    & successful use of appro!imate dynamic programming depends ona good choice of the basis functions and a good choice of $eightsassigned to each of the basis function in the linear combination. The

    dynamic programming problem can be recast as a linearprogramming problem. +o$e(er" this e!act linear programmingapproach also suffers from the curse of dimensionality. They ha(eas many (ariables as the number of states in the systems and atleast the same number of constraints. ombining the e!act linearprogramming approach $ith the linear appro!imation architectureleads to theApproximate Linear Programmingalgorithm 6&'P7.ompared to the e!act linear program that stores the optimal (aluefunction for each state in the system" the &'P has a much smallernumber of (ariables since that it has as many (ariables as thenumber of basis functions. There are t$o phases included in the &'P

    approach for a(erage cost dynamic programming. The first phasepriorities appro!imation of the optimal a(erage cost" but does notnecessarily gi(e a good appro!imation to the (alue function. Thesecond phase e!plicitly appro!imates the (alue function" $ithpresence of the so called state rele(ance $eights that is used forcontrolling the 4uality of the appro!imation to the optimal (aluefunction.

    Based on the pre-specified basis functions and state rele(ance$eights" it is obser(ed that the &'P algorithm did a good job inappro!imating the dynamic programming (alue functions. Itcorresponds to the determination of the s$itching scheme for thetraffic light control such that the number of cars in all flo$s isminimied in the long term.

    _____________________________________________________________________- ) -

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    5/34

    Page ;

    Management Summar# in &utc%

    n(loeddoor sturing (an het (er#eerslicht. it (erslag behandelt een#ruising (an t$ee t$eerichtingsstraten met controleerbare(er#eerslichten op el#e hoe# (an de straat. +et hoofddoel (an dit(erslag is om te $eten hoe de omscha#elingsregelingen (an het(er#eerslicht (oor dit systeem #unnen $orden bepaald" odanig dat

    het aantal auto?s in alle stromen op lange termijn $ordtgeminimaliseerd. +ier(oor $ordt de (oorgesteldebenaderingsmethode door e ,arias en an 3oy %1/" 8T$o-phase&ppro!imate 'inear Programming for a(erage cost dynamicprogramming5 toegepast. e auto?s #omen bij een #ruising aan diedoor een (er#eerslicht $ordt gestuurd en (ormen een rij. edynamische controle (an de (er#eerslichten is gebaseerd op hetaantal auto5s die in de rijen $achten. +et model dat de e(olutie (ande rijlengten beschrijft" is gebaseerd op de formulering (an +aijemaen an der /al %)/" dat gemodelleerd is als een Mar#o( ecisionProcess in discrete tijd. e stromen $orden (erdeeld in disjuncte

    combinaties (an conflict(rije stromen die samen groen licht #rijgen.

    In principe #an dit probleem $orden opgelost (ia dynamischeprogrammering. e dynamische programmering (er$ijst naar een(erameling (an algoritmen die gebrui#t #unnen $orden om eenoptimaal beleid te (inden" gege(en een Mar#o( ecision Processmodel. e dynamische programmering bere#ent de optimalewaardefunctiedoor Bellman5s (ergelij#ing op te lossen. +et domein(an de optimale $aardefunctie is de toestandsruimte (an hetsysteem. it ou bete#enen dat het aantal (ariabelen 6deopgeslagen $aardefunctie7 gelij# is aan de grootte (an detoestandsruimte. e dynamische programmering is niet meercomputationeel effici=nt $anneer de toestandsruimte (an hetsysteem groot is. it probleem $ordt meestal genoemd als de8curse of dimensionality.ooral" $anneer de toestandsruimtemultidimensionaal is" groeit ijn grootte e!ponentieel in het aantal(ariabelen. it is oo# het ge(al in het (er#eerslichtprobleem" doorhet feit dat el#e stroom een oneindige buffer heeft.

    Approximate Dynamic Programmingis bedoeld om het probleem(an de 8curse of dimensionality5 te (erminderen door het benaderen

    (an de $aardefunctie" die effici=nt #an $orden opgeslagen en(er#regen. 9@n (an de o(er$egingen binnen &ppro!imate ynamic

    _____________________________________________________________________- ; -

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    6/34

    Page A

    Programming is het #ieen (an benaderingsstructuur" de structuur(an de benaderende $aardefunctie. In dit (erslag $ordt de lineairearchitectuur ge#oen. /e #ieen een (erameling (an functies

    6basisfuncties7die de mapping (an toestandsruimte (an hetsysteem naar de re=le getallen geeft. e benadering (an de$aardefunctie 6scoring functie)#an $orden (er#regen door eengeschi#te lineaire combinatie (an de basisfuncties te (inden. +et isdus (oldoende om de $egingsfactor (an el#e basisfunctie in delineaire combinatie op te slaan in plaats (an het opslaan (an de$aardefunctie (oor el# toestand in het systeem. +et aantal(ariabelen is daardoor enorm #leiner ten opichte (an het aantal$aardefuncties met @@n $aarde per toestand in het systeem.

    9en succes(ol gebrui# (anApproximate Dynamic Programminghangt af (an een goede #eue (an de basisfuncties en een goede#eue (an de $egingsfactor (an el#e basisfunctie in de lineairecombinatie. +et dynamische programmering probleem #an alslineaire programmering 6LP ) probleem$orden herschre(en. Maardee lineaire programmering benadering lijdt oo# onder deogenaamde 8curse of dimensionality5 e hebben e(en(eel(ariabelen als het aantal toestanden in het systeem en minstenshetelfde aantal restricties.+et combineren (an de 'P met delineaire benaderingsarchitectuur leidt totApproximate LinearProgramming 6&'P7. e &'P heeft een (eel #leiner aantal (ariabelen

    dan de 'P omdat er o (eel (ariabelen ijn als het aantal ge#oenbasisfuncties. 9r ijn t$ee fasen in de benadering (an de &'P (oorgemiddelde #osten dynamische programmering. e eerste fase isgericht op het benaderen (an de optimale gemiddelde #osten" maarhet geeft niet altijd een goede benadering (oor de $aardefunctie.e t$eede fase benadert specifie# de $aardefunctie.

    Cebaseerd op de ge#oen basisfuncties en state relevance weights"de &'P algoritme doet het goed in het benaderen (an de$aardefuncties. it orgt (oor het bepalen (an deomscha#elingsregelingen (oor de (er#eerscontrole probleemdusdanig dat het aantal auto5s in alle stromen op lange termijngeminimaliseerd $ordt.

    _____________________________________________________________________- A -

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    7/34

    Page D

    Contents

    Preface.............................................................................................2

    Management *ummary in 9nglish...................................................0

    Management *ummary in utch.....................................................;

    ontents...........................................................................................D

    1 Introduction ...................................................................................1

    2 &ppro!imate ynamic Programming.............................................)

    2.1 Mar#o( ecision Processes.......................................................)2.2 &P $ith a linear appro!imation architecture...........................A

    0 &ppro!imate 'inear Programming for a(erage costs.....................D

    0.1 ,irst phase of the a(erage-cost &'P..........................................D0.2 *econd phase of the a(erage-cost &'P.....................................E0.0 *tate 3ele(ance /eights.........................................................1F

    ) Traffic light control ......................................................................11

    ).1 Basic notation and the modelling assumptions.......................11).2 Mar#o( decision problem formulation.....................................10

    ; The t$o-phase &'P approach for ,)2 .......................................1D

    ;.1 The t$o-phase &'P formulation...............................................1D;.2 3esults and e(aluation............................................................1G;.0 3educed 'inear Program.........................................................2)

    A onclusion....................................................................................2A 3eferences.....................................................................................2D

    _____________________________________________________________________- D -

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    8/34

    Page 1

    ' Introduction

    &s the number of road users and the need of transportationincreases" cities around the $orld face serious road trafficcongestion problems. Traffic jams ha(e become the e(eryday life5sritual for most of the people in the $orld. Traffic jams do not onlycause tremendous costs due to unproducti(e time losses they alsocause the increasing probability of accidents and ha(e a negati(eimpact on the en(ironment 6congestion $astes fuel and increasesair pollution due to increased idling" acceleration" and bra#ing7 and

    on the 4uality of life 6stress and frustration7. This leads to the4uestion about the possibility of controlling traffic flo$s in order toreduce the traffic jams.

    Intersections are places in the traffic net$or# $here many conflictscan occur potentially. These conflicts e!ist because an intersectionis a road area $here multiple traffic flo$s meet or cross. 3educingconflicts can be accomplished through a combination of efforts"including the careful use of the road infrastructure" comprehensi(etraffic safety la$s and regulations" sustained education of dri(ers"the $illingness among dri(ers to obey the traffic safety la$s" and

    traffic management. The traffic management around theintersection area is done by the traffic light control.

    In most countries" three-state traffic light is used %A/. The se4uenceis red" green and yello$ $hich means stop" go and prepare to stop"respecti(ely. The most control strategies found in practice arecyclic the order in $hich the groups of flo$s are ser(ed is fi!ed.There is a range of se(eral logical policies by $hich the traffic lightcan be controlled. The most basic policy can be classified accordingto the follo$ing characteristics %)/H

    ixed-time6,7 control. In this form of operation" not only the orderis fi!ed but the red" yello$" and green light indications are timed atfi!ed inter(als. ,i!ed cycle controllers are best suited forintersections $here traffic (olumes are predictable" stable" andfairly constant.

    nli#e the ," intersections $ith traffic-responsivecontrol consist ofactuated traffic controllers and (ehicle detectors placed on the lanesapproaching the intersection. This form of control ma#es use of real-time measurements. The length of the green inter(al can be

    lengthened or shortened based on the present (olume of the traffic.

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    9/34

    &t a hectic intersection" the green inter(al $ould be lengthened" orone gets a green period on in4uiry.

    nder exhaustive6J+7 control1 the green signals $ill be #ept untilall flo$s that ha(e right of $ay are 8e!hausted5 6empty7. The cyclic

    (ariant of e!hausti(e control is abbre(iated by J+. The alternati(eform of e!hausti(e control is anticipative exhaustivecontrol J+617and J+627" $hich anticipates departures during 1 and 2 yello$slots" respecti(ely. In other $ords" the green periods $ill be #eptuntil the number of cars at each flo$ in the combination that hasright of $ay is at most one and t$o in J+617 and J+627"respecti(ely.

    !solated controlis applicable to single intersections. The signals areoperated $ithout consideration of any adjacent signals. In such acase" each intersection $ill ha(e a signal control that is mostappropriate for that single intersection. :n the contrary"coordinated controlconsiders an urban one or e(en a $holenet$or# comprising many intersections.

    +o$e(er" the annual toll of accidents due to motor (ehicle crasheshas not substantially changed in more than 2; years despiteimpro(ed intersection infrastructures and more sophisticatedapplication of traffic engineering measures. &s mentioned in %D/1installing signals do not al$ays ma#e intersections safer. Theinstallation of signals that operate improperly can create situations

    $here o(erall intersection congestion is increased" $hich in turn cancreate aggressi(e dri(ing beha(ior. ri(ers tend to becomeimpatient and (iolate red lights $hen the traffic light control causeslonger $aiting times at intersections. This subjects local residents toa greater ris# of collisions" $orse congestions and more air andnoise pollution. +ence" appropriate traffic control decisions arere4uired.

    This paper focuses on the dynamic control at a signalied singleintersection of t$o t$o-$ay streets $ith isolated control. Thiscontrol problem can be formulated as a Mar#o( ecision Process

    6MP7 for a stochastic dynamical system $ith the a(erage costcriterion. The state of system e(ol(es under uncertainty and ase4uence of decisions has to be made. Based on the real-timesituation" i.e." the number of cars $aiting in the 4ueues" a decisionhas to be made as to $hich set of flo$s has right of $ay and thesedecisions ha(e a long term effect. The current action determines ane$ configuration that determines $hich configurations may bereached in the future. & solution to a Mar#o( decision problem is apolicy6a mapping from a state to an action7 that determines statetransitions to minimie the a(erage cost" $hich is the long-run

    number of cars in all flo$s. To be able to deri(e the optimal policies"a function defined on the state space called the (alue function is

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    10/34

    re4uired to be computed and stored. ,or most problems of practicalinterest" the state space is e!tremely large so that computing andstoring the optimal (alue function re4uire a lot of time. This largespace ma#es the dynamic programming computationallyintractable.

    Because of the computational comple!ity of sol(ing the dynamicprogram" much effort has been put into finding alternati(e learningalgorithms. ,or instance" +aijema and an der /al %)/presented anapproach to smoothen the traffic flo$ that starts from a 6nearly7optimal fi!ed cycle strategy and e!ecutes one policy impro(ementstep that leads to a dynamic control strategy.

    &nother approach is &ppro!imate ynamic Programming 6&P7. Thisapproach is about finding a good appro!imation to the (aluefunction. This paper applies the linear programming approach toappro!imate dynamic programming proposed by e ,arias and an3oy %1/ for sol(ing the traffic control problem that is formulated as adiscrete time MP. This approach considers the a(erage costcriterion and a (ersion of the appro!imate linear program thatgenerates appro!imations to the optimal a(erage cost and (aluefunction.

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    11/34

    - Approximate namic Programming

    Mar#o( decision processes 6MP5s7 pro(ide a mathematicalframe$or# for modelling decision ma#ing in situations underuncertainty. Ci(en a model of the en(ironment as a Mar#o( decisionprocess" dynamic programming can be used to compute the optimalpolicies. This chapter gi(es a description of ho$ dynamicprogramming offers a solution to the problem of minimiing thea(erage cost o(er a finite horion. Moreo(er" $e discuss ho$ thecurse of dimensionality affects the dynamic programming algorithm.,urther" the main ideas in appro!imate dynamic programming $illbe presented.

    -2' Marko! &ecision Processes

    ynamic programming offers a solution to problems in(ol(ingse4uential decision ma#ing in systems $ith non-linear" stochasticdynamics. *ystems in this setting are described by a set of (ariablese(ol(ing o(er time K the state variables.The state (ariables ta#e(alues in the state space of the system" $hich is the set of allpossible states the system can be in. The main idea in dynamicprogramming is that an optimal decision can be deri(ed based on

    the score assigned to each of the states in the system K the valuefunction. The optimal (alue function obtained from dynamicprogramming captures the ad(antage of being in a gi(en staterelati(e to being in all other states.

    onsider discrete-time stochastic control problems in(ol(ing a finitestate space Sof cardinality NS =|| . ,or each state Sx " there arepossible actions that can be chosen &!. In a gi(en statex$henaction ais ta#en" a cost of ),( axc is incurred. The transition

    probabilities ( )yaxp ,, " for each state pair ( )yx, and action a &!"represent the probability that gi(en action a $hile being in statex "the ne!t state $ill bey .

    &policyis a mapping from a state to an action. nder policyu " thesystem follo$s a Mar#o( process $ith transition probabilities ( )yxpu ,

    .

    The solution to a Mar#o( ecision Process can be e!pressed as apolicyu " $hich gi(es the action to ta#e for a gi(en state" regardlessof the prior history. 'et tx denote the random (ariable for the state

    that the system is in at time tand )( tu xc denotes the correspondingcost $hen policy uis ta#en. Then" it is $ell #no$n that there e!ists a

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    12/34

    policy u such that

    =

    =

    xxxcET

    T

    t

    tu 0

    1

    0

    |)(1

    " as "goes to infinity" is

    minimied simultaneously for all states and the aim is to identifythat policy.

    +ere" the Mar#o( process is assumed to be irreducible for each pair

    of state ( )yx, and each policy u " there is a tsuch that ( ) 0, >yxPtu . Inother $ords" it is possible to get to any state from any state. This

    implies that" for each policyu " the limit

    =

    =

    xxxcET

    T

    t

    tuT

    0

    1

    0

    |)(1

    lim

    e!ists and the a(erage cost is independent of the initial state in thesystem.

    enote the optimal a(erage cost by uu

    gg min* = . The optimal policy

    $ith the a(erage cost criterion can be deri(ed from the solution of#ellmans e$uation1

    ( ) ( ) ( ) ( )

    +=+

    yAa

    yVyaxpaxcxVgx

    ,,,min .3-4'5

    $here ( )V denote (alue function. The interpretation of ( )xV is thedifference in accrued cost $hen starting the process in statexrelati(e to a reference state. Bellman5s e4uation can be formulatedin terms of matrices as follo$s.

    VPcVeguu **

    . +=+ "3-4-5

    $here e is a (ector $ith 1 as entries" *uc and *uP are (ectors of thecosts and the transition probabilities based on the optimal policy"respecti(ely.

    enote the solution of Bellman5s e4uation by pairs ( )** ,Vg . &nalternati(e method for deri(ing %is so called policy iteration. Thisalgorithm starts $ith considering a policy L. The corresponding

    ( )Vg, can be obtained by sol(ing the Bellman5s e4uationH

    ( ) ( ) ( )+=y

    yVyxxpxxcx ),(,)(,)( . 3-465

    To impro(e the policy L" ta#e

    ( ) ( ) ( )

    += yAa

    yVyaxpaxcx ,,,minarg)('

    3-475

    in each state. The corresponding ( )

    ''

    ,Vg can be obtained by againsol(ing the Bellman5s e4uation. The impro(ement can be again

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    13/34

    obtained by sol(ing 6 2 -)5based on 'V . This iteration $ill becontinued until the minimum is attained for each state. ote thatthe (alue function for e(ery state has to be stored in memory ine(ery step. Therefore" the applicability of dynamic programming isse(erely limited. The domain of the optimal (alue function is the

    state space of the system to be controlled. This means that thenumber of (ariables 6(alue function7 to be stored and computed ise4ual to the sie of the state space. /hen the state space is largethis dynamic programming method becomes computationallyintractable. 9specially $hen dealing $ith multi-dimensional statespace" its sie gro$s e!ponentially in the number of state (ariables.This problem is called curse of dimensionality.

    -2- A&P 8it% a linear approximation arc%itecture

    To alle(iate the curse of dimensionality" the problem is sol(ed by

    finding an appro!imation to the (alue function" KSV :~

    " called

    the scoring function. The underlying assumption is that the (aluefunction has some structure such that a reasonable appro!imatione!ists.

    By using the linear appro!imation architecture" the scoring functionis generated $ithin a parameteried class of functions. It maps thesystem state space to the set of real numbers. onsider a gi(en set

    of basis functions KiS

    i

    ,,1,: =" the scoring functions arerepresented as linear combinations of the basis functionsH

    ( ) ( )=

    =K

    i

    ii rxrV1

    ~

    ., 3-495

    Imagine that the pre-selected basis functions are stored as columnsof matri! KS " and each ro$ corresponds to the basis functionse(aluated at a different state x .

    =

    ||

    ||1 K

    . 3-4:5

    o$ the optimiation problem is formulated and analyed as an

    optimiation problem for computing the $eights Kir . +ence" itsuffices to store the $eights assigned to each of the basis functionsin the linear combination instead of storing the (alue function foreach state in the system. The number of (ariables 6one per basisfunction7 to be stored is tremendously smaller than the number

    compared to the (alue function $ith one (alue per state in thesystem.

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    14/34

    6 Approximate Linear Programming fora!erage costs

    & successful use of appro!imate dynamic programming depends ona good choice of the basis functions and a good choice of $eightsassigned to each of the basis functions in the linear combination. &study to the optimal selection of basis functions is out of the scopeof this paper. Therefore" $e assume that the set of basis functions ispre-specified" and that the focus is on finding an appropriateparameter (ector Kr " gi(en a pre-selected set of basis functions.

    It is #no$n that the dynamic programming problem can be recast as

    a linear programming problem. +o$e(er" this e!act linearprogramming approach also suffers from the curse ofdimensionality. They ha(e as many (ariables as the number ofstates in the system and at least the same number of constraints.ombining the e!act linear programming approach $ith the linearappro!imation architecture leads to the approximate linearprogrammingalgorithm 6&'P7. ompared to the e!act linearprogram that stores the optimal (alue function for each state in thesystem" the &'P has a much smaller number of (ariables since it hasas many (ariables as the number of basis functions. In the ne!tsections" the t$o-phase &'P approach for a(erage costs isdescribed. The first phase of the a(erage-cost &'P prioritiesappro!imation of the optimal a(erage cost" but does not necessarilygi(e a good appro!imation to the (alue function. The second phasee!plicitly appro!imates the (alue function.

    62' "irst p%ase of t%e a!erage4cost ALP

    3ecall the Bellman5s e4uation 6see 94uation3 - 4'57. It can besol(ed by the a(erage cost 9!act 'inear Programming 69'P7H

    ( ) ( ) ( ) ( ) .,,,,min..

    max,

    xyVyaxpaxcxVgts

    g

    yAa

    Vg

    x

    ++

    364.5

    The problem is translated in a ma!imiation of the a(erage cost that$ould be subject to ine4ualities of the form NO $hich correspondsto upper bounds. ote that the constraints are non-linear" eachconstraint in(ol(es a minimiation o(er the possible actions. Buteach constraint can be decomposed into &! constraints. Therefore"

    problem 6 0 -D7can be seen as a 'inear Programming described by6 0 -E7.

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    15/34

    ( ) ( ) ( ) ( ) .,,,,,..

    max,

    axyVyaxpaxcxVgts

    g

    y

    Vg

    ++ 364)5

    This results in a total of ||S &!Q1 constraints" $hich isunmanageable if the state space is large. The combination of thee!act linear programming and the linear appro!imation architectureleads to the first phase ALPdescribed by 6 0 -G7.

    ( ) ( ) ( ) ( ) .,,,,,..

    max,

    axyryaxpaxcxrgts

    g

    y

    rg

    ++ 364;5

    enote the solution of the first phase &'P by ( )11

    , rg . ote that the

    ma!imiation problem in 6 0 -G7is e4ui(alent to minimiing || 1* gg .

    *ince the first phase &'P corresponds to the e!act 'P 6 0 -D7$ith thee!tra constraint rV = " the solution to the first phase &'P is limited"

    *

    1 gg for all feasible 1g . This implies that the first phase &'P can beseen as an algorithm to appro!imate the optimal a(erage cost.

    ompared $ith the 9'P that stores the optimal (alue function foreach state in the system" the &'P has a much smaller number of(ariables since that it has as many (ariables as the number of basisfunctions plus one. +o$e(er" the &'P has still as many constraintsas the number of state-action pairs.

    62- Second p%ase of t%e a!erage4cost ALP

    It turns out" from an e!ample gi(en in the paper %1/" that e(enthough the first phase &'P produces a good appro!imation to theoptimal a(erage cost" it can produce arbitrarily bad policies. Themain problem is that the algorithm of the first phase &'P has priorityto appro!imate the optimal a(erage cost" but it does not necessarily

    yield a good appro!imation to the optimal (alue function. +ence" at$o-phase a(erage-cost &'P is proposed in $hich the first phase issimply the first phase of the a(erage-cost &'P introduced in *ection0.1. In the first phase" the appro!imation to the optimal a(eragecost is generated" $hile in the second phase the focus is on theappro!imation to the optimal (alue function.

    The second phase of the &'P is formulated as follo$s.

    ( ) ( ) ( ) ( ) .,0,,,,..

    max

    2 axyryaxpaxcxrgts

    rc

    y

    T

    r

    ++

    364'(5

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    16/34

    The parameters that ha(e to be pre-specified are the staterele(ance $eights cRF and 2g . enote the optimal solution of thesecond phase of the a(erage-cost &'P by r&. In e ,arias and ander 3oyLet r-,e t%e optimal solution to t%e t8o4p%ase ALP2 It

    minimizes cg rV ,12 o!er t%e feasi,le region of t%e t8o4

    p%ase ALP2

    Proof> T%e normc,1

    . is defined ,#

    =Sx

    c xVxcV |)(|)(

    ,1 2

    Maximizing rcT is equi!alent to minimizing )(2

    rVc gT 2 It is

    8ell kno8n t%at for all V1 VegcPI )()( 2

    1**

    1 8e %a!e

    2gVV

    2 *ence1 an# rt%at is a feasi,le solution to t8o4p%ase ALP

    pro,lems satisfies2g

    Vr 2 It follo8s t%at

    ,|)()(|)(222 ,1

    rcVcxrxVxcrV T

    g

    T

    Sx

    gc

    g ==

    and maximizing rcT is t%erefore equi!alent to minimizing

    cg rV

    ,12 2

    +ence" any fi!ed choice of 2g " that satisfies *

    2 gg " there is bound

    ( ) ( ) ePIcggrVrV Tc

    gc

    1

    2

    *

    ,12

    ,12

    **

    2

    +

    .364''5

    The t$o-phase &'P minimies the upper bound on the normc

    rV,1

    2

    * of the error in the appro!imated (alue function. The

    state rele(ance $eight c determines ho$ errors o(er differentregions of the state space are $eighted $hen appro!imating theoptimal (alue functions" and can be used for specifying the trade-offin the 4uality of the appro!imation across different states.Therefore" to generate a better appro!imation in a region of thestate space one can assign relati(ely larger $eights to that region.To ha(e some clue on ho$ to choose the appropriate staterele(ance $eights c" performance bounds $ill be pro(ided in thene!t section.

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    17/34

    626 State =ele!ance ?eig%ts

    & bound on the performance of greedy policies associated $ithappro!imate (alue functions $ere presented in e ,arias and an3oy

    "or all V1 let Vg and V denote t%e a!erage cost and t%e

    stationar# state distri,ution of t%e greed# polic# associated8it% V2 T%en1 for all V suc% t%at VV@1

    .,1

    **

    V

    VVggV

    +

    Proof>+ote t%at t%e a!erage cost associated 8it% V is gi!en ,#

    V

    T

    VV cg = andT

    VV

    T

    VP = is !alid for t%e stationar# state

    distri,ution2 Vc and VP denote t%e costs associated 8it% t%e

    greed# polic# 8it% respect to V2 Vg can ,e formulated as

    )( VVPccg VVT

    VV

    T

    VV +== 2 +o8 if VV@1 t%en

    V

    VVegVVeg

    VVeg

    VVPcVVPc

    T

    V

    T

    V

    VV

    T

    VVV

    T

    V

    ,1

    ****

    **

    *

    .)(.

    ).(

    )()( **

    +=+=

    +=

    ++

    The performance bound described inTheorem 2gi(es an alternati(efor selecting state rele(ance $eights. :ne approach is to select thestate rele(ance $eight corresponding to the stationary statedistribution associated $ith the greedy policy. It seems logical" sincethe aim is to ha(e a good appro!imation to the (alue function"importantly" thus the states that are (isited more often need to beappro!imated better. :ne difficulty $ith obtaining the stationarystate distribution is that one should #no$ the optimal policybeforehand and the problem is finding the optimal policy yet. Itsuggests an iterati(e scheme using in each iteration the $eightscorresponding to the stationary state distribution associated $ith

    the policy generated by the pre(ious iteration.

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    18/34

    7 Traffic lig%t control

    The aim of this paper is to sol(e the traffic light control problem $iththe &'P approach described in hapter 0. In this chapter" the basicnotion and the modelling assumptions $ill be introduced.,urthermore" the problem $ill be formulated as a Mar#o( ecisionProblem.

    72' $asic notation and t%e modelling assumptions

    onsider a simple intersection of t$o t$o-$ay streets" ,)2" $hichis illustrated in ,igure ) .1. The flo$s are numbered cloc#$ise. arsthat arri(e at one of the lanes go either straight crossing theintersection or ma#e a left turn. The set of ) flo$s is partitioned into2 disjoint subsets" 1and 2. & subset of flo$s is called acombination. T$o compatible flo$s can safely cross the intersectionsimultaneously" else they are called antagonistic. The combinationsare fi!ed" and they are chosen such that there is a conflict-freeintersection. The flo$s 1 and 0 are considered as 1and the flo$s 2and ) constitute 2. ,lo$s in the same combination $ill al$ays ha(ethe same light indication at the same time. /hen one combinationhas green or yello$ indication" another combination has red

    indication.

    "igure 72'> T8o t8o48a# streets intersection1 "7C-1 8%ic%ser!e 7 flo8s in - s#mmetric com,inations2

    ,or the sa#e of simplicity" the problem is formulated in discretetime. Time is di(ided into slots. This time unit 'slot)is ta#en to be

    C2

    C

    -

    C'

    C'

    6

    -

    '

    7

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    19/34

    the time a car needs to cross the intersection $hen the light isgreen or yello$. +aijema and an der /al %)/assumed this timeunit as being t$o seconds.

    To a(oid interference bet$een antagonistic streams of consecuti(e

    slots $hen s$itching from a green indication for one combination toa green for a different combination" a switching timeis necessary.The s$itching time is chosen to be fi!ed and it ta#es 0 slots 2 slotsof yello$ and 1 slot in $hich all flo$s ha(e a red indication. &s ane!ample" a cycle for the light indication for the flo$s is sho$n in,igure ) .2. The flo$s 1 and 0 get a green indication during 0 slotsand the flo$s 2 and ) get a green indication during 1 slot. Thes$itching time ta#es 0 slots. +ence" the duration of the cycle is 1Fslots.

    "igure 72-> 0xample of lig%ts indication diagram2

    The arri(als in different flo$s and in different slots are independent.It is reasonable to assume that the number of car arri(als in one slotis either F or 1 per flo$. /e denote the arrival ratein one slot by fq

    for flo$ f .

    In each flo$ that has right of $ay 6ha(ing a green or a yello$indication7" e!actly one car can pass the stopping line in one slot. & carthat arri(es at an empty 4ueue that has right of $ay passes thestopping line $ithout delay.

    The state of the process is obser(ed in each slot. The decisionepochs are as follo$s. e$ arri(als ta#e place at the beginning ofthe slot" $hich is after the obser(ation of the state of the process.

    epartures ta#e place at the end of the slot prior to the obser(ationof the ne$ state. +ence $hen a flo$ has right of $ay and a car

    1

    2

    0

    )

    slots

    green

    yello$ red

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    20/34

    arri(es in the certain slot" the state of the flo$ remains the same forthe ne!t slot.

    72- Marko! decision pro,lem formulation

    &s stated abo(e" the problem is considered as a discrete-timestochastic control problem. The Mar#o( decision problemformulation consists of the specification of the states space" thedecision space 6in each state there are se(eral actions from $hichthe decision must be chosen7" the transition probabilities" and thecost function.

    72-2' States

    The state of the system is represented by t$o (ectors" onerepresents the state of the traffic and the other represents the stateof the lights. The state of the traffic is fully described by a (ector

    ( )4321 ,,, kkkkk= " $ith fk the number of cars in flo$ f present atthe beginning of a slot. ,urther" the state of the light is described by(ector ( )ilx ,= $ith { }2,1l the combination $hich is ha(ing a green( )0=i " a first yello$ ( )1=i " a second yello$ ( )2=i " or a red ( )3=i light. The state of the light is fully described by x " because $henone combination of the flo$s has right of $ay 6ha(ing a green or ayello$ indication7" the other flo$s all ha(e a red indication. +ence"

    the states are denoted by the (ector ( ) ( )),(),,,,(, 4321 ilkkkkxk = "intotal a A-dimensional (ector.

    72-2- &ecisions

    In each state there are se(eral actions from $hich decision must bechosen. The decisions depend on the state of the traffic lights butalso on the lengths of the 4ueues. The possible decisions in the(arious situations are described as follo$s.

    If all lights are red" the possible decisions are to #eep all lightsred" or to gi(e a green indication to one of the combinations.

    If the lights are green for one combination" there are t$opossible decisionsH #eep the lights as they are or change fromgreen to first yello$.

    &t the end of a first yello$ slot there is only one decisionHcontinue to the second yello$ slot.

    &fter the second yello$" the only decision is to change into redfor all flo$s.

    +ence the decision space" denoted by & ( )( )ilk ,, (is

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    21/34

    & ( )( )

    ( ) ( ){ }

    ( )

    ( )( ){ }

    =

    =+

    =

    =

    3,0,,3,

    2,1,1,

    0,1,,0,

    ,,'

    iifll

    iifil

    iifll

    ilk "374'-5

    $ith 'l the ne!t non-empty combination. ecisions are ta#en at thebeginning of a slot and e!ecuted instantaneously. Thus" if acombination has right of $ay" cars of that combination can lea(e inthe (ery same slot.

    72-26 Transition pro,a,ilities

    Ci(en a state ( )xk, " the chosen action a implies an instantaneouschange of lights from state x into statea " due to the fact that thechosen action is part of the state. +ence" the transition probability

    from state ( )xk, to state''

    ,xk is F unless ax ='

    . The transitionprobabilities" denoted by ( )akxkp ,;, ' " are best described byconsidering each flo$ separately. 'et ( )',, fff kakp denote thetransition probability for the number of cars in flo$ f $hen actiona is ta#en. *ince the flo$s are independent" the transitionprobabilities are simply the product of transition probabilities foreach flo$.

    ( ) ( )=

    =4

    1

    '' ,,,,,f

    ffff kakpakxkp . 374'65

    If by action a flo$ f has right of $ay during the coming slot" thetransition probabilities per flo$ are gi(en by

    0,,,;11,, >== ffffffff kqkakpqkakp

    ( ) 10,,0 =apf .374'75

    &nd if action a implies red for flo$ f " then

    0,1,,;1,, =+=

    kqkakpqkakp ffffffff . 374'95

    The transition probabilities for the number of cars in flo$ f areillustrated in ,igure ) .0and ,igure ) .).

    "igure 726> Illustration of t%e transition pro,a,ilit# for t%enum,er of cars in flo8 f8%en flo8 f %as rig%t of 8a#2

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    22/34

    "igure 727> Illustration of t%e transition pro,a,ilit# for t%enum,er of cars in flo8 f8%en t%e state of lig%t of flo8 fisred2

    72-27 Costs

    The aim is to minimie the o(erall a(erage $aiting time per car.

    Based on Littles Law" this corresponds to minimiing the a(eragenumber of cars $aiting at the 4ueues. Therefore" a linear costfunction is considered $here one unit of costs is accounted for e(erycar present at the beginning of a slot. +ence" the cost function"denoted by ( )kc " is gi(en by

    ( ) =

    =4

    1f

    fkkc . 374':5

    72-29 Counta,le state spaces

    To easily compute the performance measures 6and the optimalpolicy7" the state space has to be countable. The state space can bereduced to be a finite one by limiting the number of cars that can bepresent in each flo$. The ma!imum state N in each flo$ becomesone of the parameters in the system. Thus" the arri(als to a 4ueue$hich is full $ill be rejected. This stochastic control problemin(ol(es a finite state space S of cardinality 42|| 4 = NS " $herethe state of the system is a A-dimensional (ector. The transitionprobabilities should be changed $ith respect to fq such that there

    are no transitions from state N to 1+

    N . The transition probabilitiesper flo$ in the finite state space are defined as follo$s. If during the

    ( ' - 6

    1-q

    f

    1-q

    f

    1-q

    f

    1

    q

    f

    q

    f

    q

    f

    1-q

    f

    ( ' - 6

    q

    f

    1-q

    f

    1-q

    f

    1-qf

    1-q

    f

    q

    f

    q

    f

    q

    f

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    23/34

    coming slot flo$ f has right of $ay" the transition probability isdefined as follo$s"

    ( ( Nkqkakpqkakp ffffffff

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    24/34

    9 T%e t8o4p%ase ALP approac% for "7C-

    &fter defining the model" the &'P formulation for this problem canbe determined. The control strategy used in this paper is cyclic. Theimplementation of the approach is done in the follo$ing steps.

    1. ,ind the optimal a(erage cost based on the first phase of thet$o-phase &'P" denoted by 1g . The greedy policies 1associated

    $ith 1g can be obtained. The first phase of the t$o-phase &'P isdone gi(en the predefined basis functions.

    2. 9(aluate the policies 1by simulating the state of each flo$ for #slots" say ;FFFF" and determine the state of the lights based on

    the greedy policies 1. Then" the a(erage cost can be computedby ta#ing the a(erage of the sum of the number of cars in all

    flo$s" denoted by*1g . It is e!pected that 1*1 gg because the

    first phase of the t$o-phase &'P does not necessarily gi(e a goodappro!imation to the (alue function $hich leads to non-optimalpolicies 1.

    0. Impro(e the policies 1by using the second phase of the t$o-phase &'P approach" gi(en the appro!imation of the optimala(erage cost 1g " the predefined basis functions" and the state

    rele(ance $eights. The ne$ policies obtained $ill be denoted by

    2 . The a(erage cost obtained $ill be denoted by *2g .

    ). 9(aluate the ne$ policies 2 by simulation as in step 2. It is

    e!pected that1*2*1

    ggg .

    The steps are further discussed in the ne!t section. The arri(al rates$ill be (aried to compare the approach $ith the other strategies.

    The results obtained are reported in *ection ;.2.

    92' T%e t8o4p%ase ALP formulation

    To ensure uni4ueness" assume state 6F"61"F77 acts as a referencestate by ta#ing 6F"61"F77SF. The first-phase &'P is gi(en by

    ( ) ( ) ( ) ( ) .),,(,,,,,,..max

    ''

    ,

    axkakrakxkpkcxkrgts

    g

    y

    rg

    ++ 394';5

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    25/34

    enote the solutions by 6 11 , rg 7. The basis functions that are usedare

    },4,3,2,1{,;

    ,

    ,10

    =

    =

    =

    bakk

    k

    baab

    aa

    394-(5

    $here bk is the number of cars in flo$ b. The (alue function isappro!imated by a second order polynomial that includes terms thatcorrelate different flo$s $ith each other" $ith different $eights foreach state of light ( )il, . +ence" the (alue function is ( )( )ilkV ., appro!imated by ( )( )ilkr ., " $hich is

    ( )( )

    .

    ,,

    43

    ),(

    3442

    ),(

    2432

    ),(

    2341

    ),(

    1431

    ),(

    1321

    ),(

    12

    2

    4

    ),(

    44

    2

    1

    ),(

    114

    ),(

    41

    ),(

    1

    ),(

    0

    kkrkkrkkrkkrkkrkkr

    krkrkrkrrilkrilililililil

    ililililil

    +++++

    +++++++= 394-'5

    The constant term )0,1(0r SF" since ( )( ) 00.1,0 =V is chosen. There arein total 1;E S '-((ariables. In this case" the number of (ariablesdoes not change as the number of states gro$s. The linearprogramming is sol(ed by using the Premium olver Platform for*xcel($hich is able to sol(e linear programming problems $ith anumber of (ariables up to E"FFF. /hile the &'P may in(ol(emanageable number of (ariables" the number of constraints is stillas many as the number of state-action pairs. Therefore" to be able

    to sol(e the &'P problem" the largest state on each flo$ has to belimited to be able to include all the state-action constraints forsol(ing the 'P problem. The ma!imum of the largest state on eachflo$ is ) corresponding to D";FF constraints based on the MPformulation. The optimal policy 1associated $ith 1g can begenerated by ta#ing

    ( )( )

    ( ) ( )

    = '

    ,,;,minarg, '

    1

    '

    ),(,1

    kilkAa

    akrakxkpxk . 394--5

    In order to chec# $hether the first phase of the t$o-phase &'Pyields a good appro!imation for the (alue function" simulation of thestates $ill be done for ;FF"FFF slots. The second phase of the t$o-phase &'P is done by sol(ing linear problem 6 ; -207.

    ( ) ( ) ( ) ( ) axkakrakxkpkcxkrgtsrc

    y

    T

    r

    ),,(,,,,,,..

    max

    ''

    2 ++

    394-65

    &n ob(ious choice for 2g is 1g " the estimate for the optimal a(erage

    cost obtained from the first phase &'P. The solution to the secondphase &'P is r&. It is aimed to control the accuracy of the

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    26/34

    appro!imation to the cost function o(er different portions of thestate space. &s described inTheorem 1" ma!imiing rcT is

    e4ui(alent to minimiing )(2

    rVc gT 1$hich is the sum the errors of

    the appro!imation to the cost function $eighted by c for balancing

    accuracy of the appro!imation o(er different states.Based onTheorem 2" the state rele(ance $eights can be chosencorresponding to the stationary state distribution. It may suffice touse rough guesses about the stationary state distribution in somecases. Thus" the state rele(ance $eights c are chosen in the form

    ( )( ) ( ) ( ) ( ) ( )

    ( ) ( ) ( ) ( )

    =

    == ++

    ++

    ,2,11

    ,1,11,,

    3142

    4231

    3

    2

    3

    21

    3

    2

    3

    21

    lif

    lifilkc

    kkkk

    iia

    kkkk

    iia

    394-75

    $here

    ( )( ).,,),(,

    =ilk

    ilkca394-95

    To ma#e sure that the sum of state rele(ance $eights is 1" ( )( )ilkc ,,

    is multiplied by a1 .

    92- =esults and e!aluation

    92-2' S#mmetric arri!al rates

    In this section" the results for (arying arri(al rates at a symmetric,)2 intersection are presented in $hich all flo$s ha(e identicalarri(al rates per slot. The a(erage cost 1g resulted by the first

    phase &'P and the a(erage cost*1g and *2g yielded by the

    greedy policy $ith respect to 1r and 2r " respecti(ely" are gi(enin theTable ; -1belo$. The second phase &'P is done for the state

    rele(ance $eights c$ith iSF.GG for all i. &s obser(ed" the resultsfrom the second phase &'P do not differ much from the first phase&'P.

    Ta,le 94'> T%e a!erage cost in slots for t%e s#mmetric "7C-2

    'argest state ) ) )

    fq F.2 F.0 F.)

    1g 1.EA 0.22 ).GA

    *1g 2.FF ).FF A.EA

    *2g 2.FF 0.GG A.E;

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    27/34

    The coefficients of the linear combination of the basis functions forappro!imating the (alue function are gi(en belo$. There are se(eralremar#able obser(ations regarding the coefficients. The coefficientsr+,and r&for all ( )il, are ero. This means that the multiplication of

    the number of flo$s from different combinations does not ha(eadded (alue to the appro!imation of the (alue function.

    Ta,le 94-> T%e 8eig%ts of t%e linear com,ination of t%e ,asis

    functions for t%e s#mmetric "7C-1 8it% 2.0=fq f2

    Coefficients

    l' l-i( i' i- i6 i( i' i- i6

    rF F.FF F.0) 1.1D F.10 F.FF F.0) 1.1D F.10r1 F.DA 1.FF A.GG ;.DA ).DF 0.0; 2.F1 F.A2r2 ).AD 0.0; 2.F1 F.A2 F.DE F.GE D.FA ;.E)

    r0 F.A0 F.GE D.FA ;.E; ).AD 0.0A 2.F2 F.A1r ).DF 0.0A 2.F2 F.A1 F.A1 1.FF A.GG ;.DAr++ F.A) 1.)1 F.FF F.10 F.2A F.)) F.;; F.AAr&& F.2G F.)) F.;A F.AD F.A0 1.); F.F) F.1Dr,, F.A; 1.)) F.F) F.1D F.2G F.)0 F.;; F.AAr F.2A F.)0 F.;) F.A; F.A; 1.)1 F.FF F.10r+& F.FG F.FF F.FF F.FF F.FF F.F2 F.FG F.FGr+, F.FF F.FF F.FF F.FF F.FF F.FF F.FF F.FFr+ F.F1 F.F) F.21 F.22 F.22 F.FF F.F1 F.F1r&, F.FF F.F1 F.FD F.FD F.FD F.FF F.FF F.FF

    r& F.FF F.FF F.FF F.FF F.FF F.FF F.FF F.FFr, F.2F F.FF F.F1 F.F1 F.F1 F.F) F.1G F.2F

    Ta,le 946> T%e 8eig%ts of t%e linear com,ination of t%e ,asisfunctions for t%e s#mmetric "7C-1 8it% 3.0=fq f2

    Coefficients

    l' l-i( i' i- i6 i( i' i- i6

    rF F.FF F.)0 1.AE F.FF F.FF F.)0 1.AE F.FFr1 F.GE 1.G1 D.21 A.AF ;.DG ).0E 2.E1 F.GEr2 ;.;) ).0G 2.E2 F.GE F.GE 1.EG A.GE A.0Fr0 F.AD 1.ED A.E0 A.10 ;.0G ).11 2.;0 F.AD ;.A) ).FG 2.;2 F.AD F.AD 1.GF D.FA A.))r++ F.E2 1.2E F.FF F.FF F.10 F.)F F.AF F.E2r&& F.1) F.)F F.A1 F.E2 F.E2 1.2; F.FF F.FFr,, F.E0 1.22 F.FF F.FF F.1) F.)2 F.A2 F.E0r F.10 F.)2 F.A2 F.E0 F.E0 1.2; F.FF F.FFr+& F.)1 F.F2 F.FA F.FD F.FE F.FG F.02 F.0Er+, F.FF F.FF F.FF F.FF F.FF F.FF F.FF F.FFr+ F.FA F.FD F.2A F.0F F.02 F.F1 F.F; F.FAr&, F.2F F.FG F.00 F.0E F.)1 F.F) F.1A F.1E

    r& F.FF F.FF F.FF F.FF F.FF F.FF F.FF F.FFr, F.0; F.F) F.1A F.1G F.2F F.FE F.2E F.00

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    28/34

    Ta,le 947> T%e 8eig%ts of t%e linear com,ination of t%e ,asisfunctions for t%e s#mmetric "7C-1 8it% 4.0=fq f2

    oefficients

    lS1 lS2iSF iS1 iS2 iS0 iSF iS1 iS2 iS0

    rF F.FF F.AF 2.)F F.1) F.FF F.AF 2.)F F.1)r1 1.2F 2.2A D.1G A.D; A.AF ;.FF 0.21 1.F0r2 A.DG ;.2F 0.0G 1.1A 1.2; 2.02 D.0) A.G0r0 1.11 2.0) D.0E A.GA A.E0 ;.22 0.)F 1.11r A.A) ;.F0 0.22 F.GE 1.FD 2.2D D.22 A.DGr++ F.G2 1.2) F.FF F.FF F.FF F.00 F.A0 F.G;r&& F.FF F.00 F.A0 F.GA F.G; 1.2; F.FF F.FFr,, F.GE 1.2; F.FF F.FF F.FF F.0) F.A) F.GEr F.FF F.0) F.A) F.GD F.G; 1.2) F.FF F.FFr+& F.1F F.1A F.21 F.2A F.2G F.FA F.FD F.FG

    r+, F.FF F.FF F.FF F.FF F.FF F.FF F.FF F.FFr+ F.1; F.F) F.F; F.FD F.FD F.FE F.11 F.1)r&, F.2) F.F0 F.F) F.F; F.F; F.10 F.1E F.22r& F.FF F.FF F.FF F.FF F.FF F.FF F.FF F.FFr, F.21 F.1A F.21 F.2A F.2E F.11 F.1; F.1E

    In order to e(aluate the results obtained from the &'P approach" acomparison $ill be made $ith the a(erage cost based on se(eralcontrol strategies obtained by simulating the chain. ,irst" the &'Papproach is compared to the , strategy" in $hich the order of the

    ser(ed combinations is fi!ed and also the duration of the greenperiods. ,urther" the &'P approach is compared to the e!hausti(econtrol strategy" in $hich the order of ser(ed combinations is fi!edand the green periods $ill be #ept until the flo$s in the combinationthat has right of $ay is empty. T$o alternati(e e!hausti(e controlsare also added in the e(aluation J+617 and J+627" $hichanticipate departures during 1 and 2 yello$ slots" respecti(ely. Inother $ords" the green periods $ill be #ept until the number of carsat each flo$ in the combination that has right of $ay is at most oneand t$o in J+617 and J+627" respecti(ely.

    Based on Littles Law" the a(erage $aiting time per car can bederi(ed from the a(erage number of cars $aiting at the 4ueue.enote the a(erage $aiting time in seconds for flo$ f as )( fW .Then"

    f

    f

    fq

    gW

    =

    2)( "

    394-:5

    $here fg and fq are the a(erage number of cars $aiting at flo$ f

    and the arri(al rate" respecti(ely. The o(erall a(erage $aiting time

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    29/34

    is $eighted by the a(erage arri(al at the 4ueues. Thus" the o(eralla(erage $aiting time in seconds is gi(en by

    c

    gg

    cq

    g

    c

    qW

    c

    qW

    f

    f

    f f

    ff

    f

    f

    f 22..2)()( ==== "

    394-.5

    $here =f

    fqc .

    Table ; -;presents the o(erall a(erage $aiting time in seconds for(arying arri(al rates at a symmetric ,)2 intersection" $hich meansthat all flo$s ha(e identical arri(al rates per slot of fq 2

    Ta,le 949> B!erall a!erage 8aiting time in seconds for t%es#mmetric "7C-2

    3ule =fq

    F.2 =fq

    F.0 =fq

    F.)T$o-phase &'P ;.FF A.A; E.;D, ;.0D D D.0F 1F E.G2 )J+ ;.AA 10 D.;) 10 E.A; 1J+617 ;.FA 1 A.AG 1 E.0D -2J+627 ;.1A 0 A.GD ; E.GG ;, green periodsin slots for 1"2

    1" 1 0" 0 E" E

    ,or this simple intersection" the o(erall a(erage $aiting time gained

    from the t$o-phase &'P approach is less than the other strategies.The results obtained by the t$o-phase &'P approach is close to theanticipating e!hausti(e J+617 strategy.

    92-2- As#mmetric arri!al rates

    In the case of asymmetric arri(al rates" t$o situations areconsidered for ,)2. The first situation that is considered is $herethe arri(al rates are F.1; for flo$s 1 and 0" and F.); for flo$s 2 and). +ence" the flo$s $ithin the same combination ha(e the identical

    arri(al rates" but 2 is three times as busy as 1. The secondsituation is $here the arri(al rate for flo$ 1 is F.1F" and the arri(alrates for the other flo$s are F.0F. The coefficients of the linearcombination of the basis functions obtained from the t$o-phase &'Papproach for appro!imating the (alue function are gi(en belo$.

    Ta,le 94:> T%e 8eig%ts of t%e linear com,ination of t%e ,asisfunctions for t%e as#mmetric "7C-1 8it% q3(2'91 (2791(2'91 (27952Coefficie

    ntsl' l-

    i( i' i- i6 i( i' i- i6

    rF F.FF F.)) F.FF F.FF F.FF 1.2; 2.GD F.FFr1 2.2; 1.22 G.F1 E.02 D.A1 ;.GD ).2E 2.2;

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    30/34

    r2 A.0F A.10 ).A1 F.G1 F.G1 0.1F A.DA A.))r0 2.A) 1.FD G.AD E.DF E.2G A.;G ).E1 2.A)r A.1G ;.AA ).F0 F.G1 F.G1 0.FD A.A; A.0Fr++ F.A2 1.DA F.FF F.FF F.FF F.1G F.0G F.A2r&& F.FF F.FF F.0F F.G; F.G; F.G; F.FF F.FF

    r,, F.AE 1.GD F.FF F.FF F.FF F.21 F.)0 F.AEr F.FF F.1F F.)2 F.G; F.G; F.G) F.FF F.FFr+& F.FE F.F1 F.F1 F.F1 F.F1 F.F0 F.FA F.FDr+, F.FF F.FF F.FF F.FF F.FF F.FF F.FF F.FFr+ F.F0 F.FF F.FF F.FF F.FF F.F1 F.F2 F.F0r&, F.12 F.1D F.21 F.2; F.2A F.F) F.FG F.11r& F.FF F.FF F.FF F.FF F.FF F.FF F.FF F.FFr, F.22 F.1E F.21 F.2A F.2D F.FD F.1A F.2F

    Ta,le 94.> T%e 8eig%ts of t%e linear com,ination of t%e ,asis

    functions for t%e as#mmetric "7C-1 8it% q3(2'1 (261 (261(2652oefficien

    tslS1 lS2

    iSF iS1 iS2 iS0 iSF iS1 iS2 iS0rF F.FF F.01 1.F; F.FF F.FF F.)D 1.)) F.F0r1 F.;; F.FF A.22 ).GE ).1E 2.ED 1.D1 F.;;r2 ).G0 ).F) 2.)A F.AE F.AE 1.E1 A.00 ;.;2r0 1.)2 1.AA D.E2 D.20 D.FE ;.;D ).F2 1.)1r ;.2A ).F2 2.)0 F.AG F.AG 1.EE A.GA A.0Ar++ F.;A 1.;) F.1F F.1D F.20 F.)A F.;1 F.;A

    r&& F.1F F.0A F.;G F.DG F.DG 1.10 F.FF F.FFr,, F.G1 1.0E F.FF F.FF F.FF F.02 F.;; F.G1r F.1A F.0) F.;E F.DE F.DE 1.20 F.FF F.FFr+& F.FF F.F1 F.1; F.1A F.1A F.FF F.FF F.FFr+, F.FF F.FF F.FF F.FF F.FF F.FF F.FF F.FFr+ F.F2 F.F; F.2D F.2G F.2G F.FF F.FF F.FFr&, F.AA F.21 F.2) F.2E F.0F F.1) F.;1 F.A1r& F.FF F.FF F.FF F.FF F.FF F.FF F.FF F.FFr, F.22 F.2) F.1; F.1E F.1G F.F; F.1D F.2F

    The results are gi(en inTable ; -E. The o(erall mean $aiting time isdenoted by )(W and the a(erage $aiting time for flo$ f isdenoted by )( fW . The length of green periods in the , controlstrategy for 1 and 2 are 1 and ; slots" respecti(ely" in the firstsituation. In the second situation" the length of the green periods is0 slots for both combinations.

    Ta,le 94)> Mean 8aiting times in seconds for t8oas#mmetric "7C- cases2

    3ule )(W )( 1W )( 2W )( 3W )( 4Wq S6F.1;" F.);"F.1;" F.);7

    T$o-phase &'P ;.G; E.)D ;.1A E.)1 ;.FD, A.02 A 1F.;0 ).G1 1F.;; ).G1

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    31/34

    J+ A.D2 10 G.EA ;.AE G.EE ;.ADJ+617 A.0D D E.12 ;.DG E.11 ;.DGJ+627 D.02 20 D.FD D.)2 D.FE D.0Gq S6F.1F" F.0F"F.0F" F.0F7

    T$o-phase &'PA.22 ).G0 ;.)0 E.2; ;.)F

    , D.FG 1) ;.1G D.0F D.01 D.0FJ+ A.GA 12 A.)2 A.AA D.D2 A.ADJ+617 A.22 F ;.2; A.F; A.EE A.FAJ+627 A.A1 A ).A0 A.A; D.1G A.A;

    &s obser(ed" the t$o-phase &'P approach results a lo$er o(erallmean $aiting time compared to the other strategies. ,urther" it isobser(ed that the flo$s that are parts of the more busy combinationha(e a lo$er $aiting time. This gi(es an idea that a more busycombination gets the priority. If a busy flo$ and a less busy flo$ are

    grouped together in one combination" then the less busy flo$ $illta#e ad(antage of the priority and the more busy flo$ $ill suffer abit.

    926 =educed Linear Program

    &lthough the dimension of the problem is reduced in theappro!imate (ersion 6&'P7" the number of constraints is still asmany as the number of state-action pairs $hich can be (ery large.

    The t$o-phase &'P only sol(es parts of the 8curse of dimensionality5problem. Therefore" the largest state on each flo$ is limited to ). Inpractice" this is certainly not the case. an 3oy %2/suggested aconstraint sampling approach for appro!imating solutions tooptimiation problems $hen the number of constraints isintractable" the educed Linear Program 63'P7. The idea is to definea probability distribution U o(er the set of constraints and to includeonly a subset of the distributed constraints for sol(ing the problem.T$o properties $ere pro(en that if a reasonable number ofconstraints are sampled from distributed constraints" then almost allothers $ill be satisfied and the constraints that are not satisfied do

    not distort the solution too much.

    The 3'P for the traffic control problem is characteried as follo$s.The largest state on each flo$ is set to 1F. The constraint samplesie is 2F"FFF. The subset of constraints $ere sampled based onprobability measure U6#7 S61-V7)V #" V SF.GG. In the case of smallarri(al rates" say F.2" the 3'P gi(es a good appro!imate to the (aluefunction that results the a(erage cost similar to a(erage costyielded from the t$o-phase &'P. +o$e(er" $hen the load is high"the results do not resemble the e!pectations because the sampledconstraints do not represent almost allO other constraints" and theconstraints that are not satisfied distort the solution effecti(ely. It ispro(en that the results of 3'P rely on an idealied choice of U. The

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    32/34

    underlying thought is similar as choosing the state rele(ance$eights 6see *ection 0.07. The constraints that represent the statesthat are (isited more often should be then included. The chosenprobability measure U6#7 has an e!ponential form $hich gi(es highprobabilities to the states that are close to the situation $hen there

    are no cars present in all flo$s" i.e." the states $ith small number ofcars present in the flo$s. This probability measure is an acceptablechoice of U $hen the arri(al rates are small. +o$e(er" $hen thearri(al rates are higher" the high probability has to be assigned tothe states $here more cars are present in the flo$s. Therefore"e!ponential form is not an idealied choice of the probabilitymeasure U.

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    33/34

    : Conclusion

    The t$o-phase &'P approach for a(erage cost dynamicprogramming is applied to the traffic control at isolated signaliedintersection. &s an illustration" an intersection of t$o t$o-$aystreets $ith controllable traffic lights on each corner is considered.Based on the chosen basis functions and the state rele(ance$eights" it is obser(ed that the results based on the s$itchingschemes obtained by the &'P approach $ere better compared tothe other strategies $here there are at most ) cars on each flo$.

    In order to deal $ith the intractable state space" the &'P approach

    pro(ides an algorithm for finding a good appro!imation to theoptimal (alue function by fitting a parameteried linear function.+ence" the number of (ariables that has to be stored in &'Papproach is e4ual to the number of the pre-specified basis functions"instead of storing the optimal (alue function itself for each state inthe system. +o$e(er" the &'P approach still has as manyconstraints as the e!act linear programming formulation. Therefore"the largest state on each flo$ is limited to ). In practice" this iscertainly not the case. an 3oy %2/suggested a constraint samplingapproach for appro!imating solutions to optimiation problems$hen the number of constraints is intractable. The idea is to define

    a probability distribution U o(er the set of constraints and to includeonly a subset of the distributed constraints for sol(ing the problem.

  • 8/13/2019 Werkstuk Mak Tcm39 91392

    34/34

    =eferences

    %1W ,arias" .P. de" Benjamin (an 3oy" &ppro!imating 'inearProgramming for &(erage-ost ynamic ProgrammingO.

    %2W ,arias" .P. de" Benjamin (an 3oy" :n onstraint sampling inthe 'inear Programming &pproach to &ppro!imate ynamicProgrammingO. /athematics of operations research( %ol. &0( 1o. ,(August &22( pp. 3&-&45.

    %0W ,arias" .P. de" 6Xune 2FF27" The 'inear Programming &pproachto &ppro!imate ynamic ProgrammingH theory and applicationO.

    %)W+aijema" 3" Xan (an der /al 60rdo(ember 2FFA7" &n MPdecomposition approach for traffic control at isolated signaliedintersectionsO.

    %;W en28ikipedia2org8ikiTrafficDlig%tEIntroduction

    %DW Intersection *afetyH Myth ersus 3ealityO" Intersection *afety

    Brief" .*. epartment of Transportation" ,ederal +igh$ay&dministration.

    http://en.wikipedia.org/wiki/Traffic_light#Introductionhttp://en.wikipedia.org/wiki/Traffic_light#Introduction