High Quality and Resilient IPTV Backbone Networks André ... · vi. vii Abstract Resumo Distribuir...

101
High Quality and Resilient IPTV Backbone Networks André Ricardo Simões Bento Dissertation submitted for obtaining the degree of Master in Electrical and Computer Engineering Jury Supervisor: Prof. João Luís da Costa Campos Gonçalves Sobrinho President: Prof. Nuno Cavaco Gomes Horta Members: Prof. Joao Jose de Oliveira Pires Outubro 2012

Transcript of High Quality and Resilient IPTV Backbone Networks André ... · vi. vii Abstract Resumo Distribuir...

  • High Quality and Resilient IPTV Backbone Networks

    André Ricardo Simões Bento

    Dissertation submitted for obtaining the degree of

    Master in Electrical and Computer Engineering

    Jury

    Supervisor: Prof. João Luís da Costa Campos Gonçalves Sobrinho

    President: Prof. Nuno Cavaco Gomes Horta

    Members: Prof. Joao Jose de Oliveira Pires

    Outubro 2012

  • ii

    To my grandmother

  • iii

    Acknowledgements

    Acknowledgements

    Foremost, I would like to express my sincere gratitude to my supervisor Prof. João Luís Sobrinho

    for the continuous support of my M.Sc. study and research, for his patience, motivation and

    enthusiasm. His guidance helped me in all the time of research and writing of this thesis.

    Besides my advisor, I would like to thank my family, especially my parents, who supported me

    emotionally and financially all the way through the course.

    Last but not the least, I would like to thank my beloved girlfriend for her patience to see me

    writing this thesis throughout the summer and during her vacations.

  • iv

  • v

    Abstract

    Abstract

    Multicasting of TV (real time video) streams with the IP protocol suite is a challenge. IP was

    initially designed for unicast not multicast. The routing and transport protocols of IP were not planned

    for applications requiring a combination of steady high bandwidth, low packet delay and low packet

    loss.

    This thesis focuses on the design of broadcast trees for the backbone of IPTV networks. We will

    present algorithms for the calculation of optimal out-branchings and resilient out-branchings, where in

    case of link failure, it is possible to substitute the link by a backup path, granting that the other links of

    the tree are not affected by the failure. The failure recovering is provided by Fast Reroute

    mechanisms, which are capable of handling failure in sub-second time (around 50ms).

    Keywords

    IPTV, Backbone, Minimum Out-Branchings, Resilient Out-Branchings, Fast Reroute, MPLS

  • vi

  • vii

    Abstract

    Resumo

    Distribuir fluxos de TV (vídeo em tempo real) usando os protocolos IP para transmissões ponto-

    multiponto é um desafio. O IP foi inicialmente desenhado para transmissões ponto a ponto, não

    ponto-multiponto. Os protocolos de transporte e encaminhamento do IP não foram planeados para

    aplicações que requerem-se uma combinação de largura de banda estável, baixo atraso e baixa

    perda de pacotes.

    Esta tese foca-se no desenho de arborescências de distribuição para o núcleo das redes IPTV.

    Apresentamos algoritmos para o cálculo de arborescências óptimas e arborescências resilientes,

    onde, no caso de falha de uma ligação é possível substituir a ligação que falhou, por um caminho de

    protecção, garantindo que as outras ligações da arborescência não são afectadas pela mudança. A

    recuperação das falhas é feita usando Fast Reroute, um mecanismo capaz de lidar com falhas em

    tempos que rondam os 50 ms.

    Palavras-Chave

    IPTV, Núcleo, Arborescências de Custo Mínimo, Arborescências Resilientes, Fast Reroute, MPLS

  • viii

  • ix

    Table of Contents

    Table of Contents

    Acknowledgements ................................................................................ iii

    Abstract ................................................................................................... v

    Resumo ................................................................................................ vii

    Table of Contents ................................................................................... ix

    List of Figures ........................................................................................ xi

    List of Tables ......................................................................................... xiii

    List of Algorithms .................................................................................. xv

    List of Acronyms .................................................................................. xvii

    1 Introduction .................................................................................. 1

    1.1 Problem Statement .................................................................................. 2

    1.2 Contents .................................................................................................. 3

    1.3 Contributions ........................................................................................... 3

    1.4 State of the Art ......................................................................................... 4

    1.5 IPTV Architecture .................................................................................... 5

    2 The Unicast Routing Problem ....................................................... 9

    2.1 Network Service Models ........................................................................ 10

    2.2 Routing Protocols .................................................................................. 11

    2.3 MPLS ..................................................................................................... 13

    2.4 Fast Reroute .......................................................................................... 20

    3 The Multicast Routing Problem .................................................. 29

    3.1 From Unicast to Multicast ...................................................................... 29

    3.2 Protocol Independent Multicast (PIM) .................................................... 31

    3.3 Point-to-Multipoint LSPs (P2MP LSPs) ................................................. 35

    3.4 Protocol Architecture ............................................................................. 37

    3.5 Failure Recovering Process................................................................... 39

  • x

    4 Designing Multicast Out-branchings ........................................... 43

    4.1 Definitions .............................................................................................. 44

    4.2 Minimum Cost Out-Branchings .............................................................. 46

    4.3 Connectivity Theorems .......................................................................... 51

    4.4 Link-Disjoint Out-branchings Algorithms ................................................ 56

    4.5 Comparison of the Out-branchings ........................................................ 70

    4.6 Protection Paths .................................................................................... 73

    5 Conclusions ................................................................................ 75

    References............................................................................................ 79

  • xi

    List of Figures

    List of Figures Fig. 1 – IPTV architecture ......................................................................................................................... 5

    Fig. 2 – Structure of the backbone of an IPTV network ........................................................................... 6

    Fig. 3 – Changes in case of link failure. The link represented in bold is introduced to substitute the link that failed (represented with a cross) ................................................................. 6

    Fig. 4 – Verizon’s continental US backbone network (2011) [31] ............................................................ 7

    Fig. 5 – T-Systems (aka Deutsche Telekom) germany50 [26] backbone network (2006) ....................... 8

    Fig. 6 – Layer model ...............................................................................................................................10

    Fig. 7 – Various Virtual Circuits established over a Physical Link: one per packet stream ...................11

    Fig. 8 – MPLS layer model .....................................................................................................................13

    Fig. 9 – Label Switching in MPLS. A packet with label 50 arrives at LSR at interface B. The LSR looks at its label switching table a sends the packet with label 40 through interface C. ....................................................................................................................14

    Fig. 10 – MPLS’s header structure .........................................................................................................15

    Fig. 11 – MPLS tunnel ............................................................................................................................15

    Fig. 12 – LDP session establishment .....................................................................................................17

    Fig. 13 – Path signalling using RSVP .....................................................................................................19

    Fig. 14 – Constrain based routing example that uses bandwidth as a routing metric ...........................20

    Fig. 15 – Network Failure Anatomy ........................................................................................................21

    Fig. 16 – Local Protection .......................................................................................................................22

    Fig. 17 – End-to-end protection ..............................................................................................................23

    Fig. 18 – Setting up a facility backup ......................................................................................................24

    Fig. 19 – Forwarding traffic using the facility backup .............................................................................24

    Fig. 20 – Setting up a one-to-one backup ..............................................................................................25

    Fig. 21 – Forwarding traffic over a one-to-one backup ..........................................................................26

    Fig. 22 – Packet distribution in unicast. A packet stream is generated for each receiving node. ..........30

    Fig. 23 – Packet distribution in multicast. Packet streams are not replicated at the source. .................30

    Fig. 24 – Receiver “join” and source “register” in PIM-SM .....................................................................32

    Fig. 25 – Source “join” and traffic flow in PIM-SM ..................................................................................32

    Fig. 26 – Shortest path reconfiguration in PIM-SM ................................................................................33

    Fig. 27 – Shortest Path Tree in PIM-SM after reconfiguration ...............................................................33

    Fig. 28 – Source “join” in PIM-SSM. “Join” messages for different group are sent to its respective video source. ...............................................................................................34

    Fig. 29 – Data flow in PIM-SSM. Video packets are sent to the reverse packet of the respective “join” message. .............................................................................................................34

    Fig. 30 – P2MP LSPs example. The forwarding table of P1 shows the different labels using for the different interfaces. .................................................................................................35

    Fig. 31 – P2MP LSP setup .....................................................................................................................36

    Fig. 32 – Cross layer architecture ..........................................................................................................38

    Fig. 33 – Setup of calculated out-branchings .........................................................................................39

    Fig. 34 – Link failure scenario ................................................................................................................40

    Fig. 35 – Minimum cost out-branching (bold red arrows) .......................................................................46

    Fig. 36 - Algorithm to find MOBs: Step I (link cost update) ....................................................................47

  • xii

    Fig. 37 - Algorithm to find MOBs: Step II (0 cost cycle is agglomerated in a supernode) .....................48

    Fig. 38 - Algorithm to find MOBs: Step III ...............................................................................................48

    Fig. 39 - Algorithm to find MOBs: Step IV (0 cost links are selected as part of the MOB) .....................49

    Fig. 40 - Algorithm to find MOBs: Step V (the links of the cycle except link are added to the MOB) .............................................................................................................................49

    Fig. 41 – Link-disjoint example part I ......................................................................................................51

    Fig. 42 – Link-disjoint example part II .....................................................................................................51

    Fig. 43 – Two symmetric cycles. There are two disjoint paths between 0 and 1. ..................................52

    Fig. 44 – Closed Path using the 2 links of .................................................................................53

    Fig. 45 – Closed path using the 3 links of .................................................................................53

    Fig. 46 – Simple path between sets and ......................................................................................54

    Fig. 47 – 2-connected directed graph .....................................................................................................55

    Fig. 48 – Graph with 2 link-disjoint out-branchings. One represented in normal red lines, the other in dashed blue lines .............................................................................................55

    Fig. 49 – Edmond’s theorem application example. Set and have 3 and 2 entering links represented in bold, respectively. .................................................................................56

    Fig. 50 – First iteration of algorithm 2: Part I (path is added to out-branching ) ...............................57

    Fig. 51– First iteration of algorithm 2: Part II (path is added to out-branching ) ..............................57

    Fig. 52 – First iteration of algorithm 2: Part III (the reverse edges of path are added to ) ...............58

    Fig. 53 – First iteration of algorithm 2: Part IV (the reverse edges of path are added to ) ...............58

    Fig. 54 – Final result of the application of algorithm 2 ............................................................................59

    Fig. 55 – Algorithm 2 iteration: Part I (there are always two paths available from the two out-branchings to node ) .............................................................................................61

    Fig. 56 – Algorithm 2 iteration: Part II ........................................................................................61

    Fig. 57 – Choosing paths in algorithm 2: Part I (path - - is added to out-branching ) .................62

    Fig. 58 – Choosing paths in algorithm 2: Part II (path - - - is added to out-branching ) ...........63

    Fig. 59 – Choosing paths in algorithm 2: Part III (the reverse edges of the paths are added to the out-branchings) .......................................................................................................63

    Fig. 60 – Choosing paths in algorithm 2: Part IV (the result of not choosing the shortest paths for out-branching ) ...........................................................................................................64

    Fig. 61 – Graph and the maximum number of disjoint out-branchings ............................................66

    Fig. 62 – First iteration of algorithm 4: Part I ..........................................................................................66

    Fig. 63 – First iteration of algorithm 4: Part II .........................................................................................67

    Fig. 64 – First iteration of algorithm 4: Part III ........................................................................................67

    Fig. 65 – Final result of the application of algorithm 4 ............................................................................68

    Fig. 66 – 3 out-branchings rooted at vertex 0 generated by our implementation of algorithm 4 ............70

    Fig. 67 – 10 node link-disjoint trees generated using algorithm 4. (Main tree in Red) ...........................72

    Fig. 68 – 10 node link-disjoint trees generated using algorithm 2 optimized for the main tree (Red) .............................................................................................................................72

  • xiii

    List of Tables

    List of Tables

    Table 1 – Router S IGP routing table .....................................................................................................40

    Table 2 – Algorithm comparison .............................................................................................................71

    Table 3 – Protection path cost comparison ............................................................................................74

  • xiv

  • xv

    List of

    List of Algorithms

    Algorithm 1 - Finding a MOB rooted at in a given multigraph ........................................................49

    Algorithm 2: Finding 2 link disjoint out-branchings rooted at in a 2-connected symmetric graph .........................................................................................................................59

    Algorithm 3: Finding the maximum number of link-disjoint paths in from to . ..............................65

    Algorithm 4: Finding a maximal set of link disjoint out-branchings in rooted at .............................68

  • xvi

  • xvii

    List of

    List of Acronyms

    AS Autonomous System

    ATM Asynchronous Transfer Mode

    BBC British Broadcasting Corporation

    BFD Bidirectional Forwarding Detection

    BFS Breadth First Search

    CBS Columbia Broadcasting System

    CNN Cable News Network

    CPU Central Processing Unit

    DSL Digital Subscriber Line

    DV Distance Vector

    ERO Explicit Route Object

    FEC Forward Equivalence Classes

    FRR Fast Reroute

    IGP Interior Gateway Protocols

    IP Internet Protocol

    IPTV Internet Protocol Television

    IPv4 Internet Protocol version 4

    IPv6 Internet Protocol version 6

    IS-IS Intermediate Systems to Intermediate Systems

    ISP Internet Service Providers

    LDP Label Distribution Protocol

    LER Label Edge Router

    LIB Label Information Base

    LS Link State

    LSA Link State Advertisements

    LSP Label Switched Path

    LSR Label Switched Routers

    MOB Minimum Out-Branching

    MP Merge Point

    MPLS Multi Protocol Label Switching

    MPLS-TE Multi Protocol Label Switching-Traffic Engineering

    MST minimum spanning tree

    OSPF Open Shortest Path First

    P2MP Point to Multipoint

  • xviii

    PIM Protocol Independent Multicast

    PIM-DM Protocol Independent Multicast-Dense Mode

    PIM-SM Protocol Independent Multicast-Sparse Mode

    PIM-SSM Protocol Independent Multicast-Source Specific Mode

    PLR Point of Local Repair

    PON Passive Optical Networks

    QoS Quality of Service

    RIP Routing Information Protocol

    RP Rendezvous Point

    RRO Record Route Object

    RSVP Resource Reservation Protocol

    RSVP-TE Resource Reservation Protocol-Traffic Engineering

    SHO Super Hub Office

    SPT Shortest Path Tree

    TCP Transmission Control Protocol

    TE Traffic Engineering

    TTL Time to Live

    TV Television

    UDP User Datagram Protocol

    UK United Kingdom

    US United States

    VC Virtual Circuit

    VHO Video Hub Offices

    VOD Video On Demand

    VSO Video Serving Office

  • 1

    Chapter 1

    Introduction

    1 Introduction

  • 2

    1.1 Problem Statement

    In the past few years, we have seen the telecommunication carriers do a global effort in the

    rapid implementation of Internet Protocol Television (IPTV). IPTV encodes live TV streams in a flow of

    IP packets and delivers them to users through broadband networks. The key reasons for using IP

    networks for the transport of TV are:

    Convergence: The services offered by the telecommunications companies are converging on

    IP, including the transmission of digital voice and data. The addition of IPTV allows

    telecommunication carriers to strengthen their competitiveness by offering triple play services

    (digital voice, data and TV);

    Cost: Having a uniform platform reduces the cost of maintaining separate networks;

    Flexibility: IP has the ability to support diverse applications; this offers flexibility opening up

    opportunities for the introduction of new services like interactive TV and Video on Demand

    (VOD).

    telecommunication carriers use their IP networks to distribute the TV streams; this presents

    challenges because the networks were not designed for this type of service and IPTV requires:

    Minimal Packet Loss: IP packet loss can represent various levels of image damage, ranging

    from a single pixelated block in an image, to a long period of degraded, pixelated or

    unavailable sequence of images. In order to offer the user a satisfactory Quality of Service

    (QoS) packet loss needs to be minimal.

    Steady High Bandwidth: The total amount of video-stream data that can be sent is ultimately

    limited by the bandwidth provisioned over the network. Any increase in bandwidth demand

    that is above the maximum capacity of the link, will result in video packets being lost.

    Low Packet Delay: Live TV streams are encoded and sent in IP packets. At the reception, an

    initial delay is introduced and the first packets are stored in a buffer. After the initial delay,

    packets are decoded and transmitted in the user’s TV. This mechanism helps in coping with

    short transmission delays and jitter. However, if a packet fails to arrive at the reception at the

    time it is supposed to be decoded, it is like it was lost since it will not be used.

    Sub-second Failure Restoration: In the case of link failure, restoration needs to be done in

    sub-second time to avoid packet loss, which could cause image damage or even service

    interruption.

  • 3

    1.2 Contents

    In this dissertation we will evaluate network designs that can cope with the challenging

    requirements of IPTV. We will start with an overview of the IPTV backbone architecture and its

    components.

    In chapter 2: The Unicast Routing Problem, we will discuss the network service models and

    protocols used to forward traffic sent from a single source to a single location in the network. We will

    also address the resource reservation problem and refer to a link-failure protection mechanism called

    Fast Reroute, which is capable of providing sub-second restoration.

    The emergence of applications like IPTV led to introduction of routing protocols capable of

    forwarding traffic to a group of recipients. The changes in routing, from a source to multiple

    destinations, as opposed to routing from a source to a single destination, are addressed in chapter 3:

    The Multicast Routing Problem. We will see the out-trees that are currently being used for forwarding

    and analyse mechanisms to calculate them. An out-tree is a directed tree rooted at a specific source

    vertex. If the out-tree spans all the vertices of a given graph is called out-branching.

    In the end of chapter 3, we will also discuss possible protocol suites that can cope with the

    requirements of IPTV and simultaneously accommodate custom out-branchings. We will then describe

    a methodology to handle link failure. The purpose of this methodology is to avoid packet loss when

    using the Fast Reroute mechanisms.

    In chapter 4: Designing Out-branchings, we will present the custom out-branching designs. We

    will present algorithms to calculate optimal out-branchings and resilient out-branchings. In the end of

    the chapter we will compare the designs and evaluate its pros and cons. Finally, in chapter 5:

    Conclusion, we will do a short summary of the topics covered by this thesis and discuss future work to

    be done in this subject.

    1.3 Contributions

    The objective of this thesis is to compute and implement different out-branching designs,

    understanding the advantages and disadvantages of these implementations. Given a certain out-

    branching we purpose a protocol suite that can use this out-branching to transport IPTV packets and,

    at the same time, is capable of handling the requirements of IPTV. We have studied the interaction

    between the involved protocols and based on previous work ([1], [2], [3] and [4]), developed a strategy

    to operate them.

    We have also studied different out-branching designs:

    Shortest path out-branchings, where the cost of the paths from the root to the each vertex is

    minimum;

  • 4

    Minimum cost out-branchings, where the sum of the costs of the links in the out-branching is

    minimum;

    Resilient out-branchings, where each link of the out-branching can be protected by a path

    that is disjoint from the out-branching. If the protection paths shared links with the main tree,

    these links would transport more than packet stream replicas, this could cause link overload.

    We have implemented algorithms that generate the out-branchings above. Regarding the

    resilient out-branchings we will present 2 algorithms. Given a certain graph, one of them generates 2

    disjoint out-branchings and the other generates the maximum number of link-disjoint out-branchings

    possible. For the first, we have come up with 2 heuristics that improve the resultant out-branchings.

    The purpose of the heuristics is the following:

    Improve the main out-branching: where the objective is to reduce the link cost of the paths

    between the root and the vertices in the main out-branching;

    Improve for the backup paths: where the objective is to reduce the total link cost of paths

    used for backup.

    As far as the protection paths are concerned, we also purpose a way of calculating them.

    1.4 State of the Art

    A group of authors ([1], [2], [3] and [4]), suggested a protocol suite capable of accommodate

    custom out-branching designs, as well as a failure handling method used to avoid packet loss during

    the Fast Reroute recovering process . The authors have also suggested an algorithm to calculate out-

    branchings, and respective disjoint protection paths. This algorithm took care of link overload problem

    in case failure, however, the authors were not preoccupied with the cost of the out-branchings. They

    also did not had in consideration the cost of the protection paths. Other authors came up with

    algorithms [5] that resulted in similar out-trees, also disregarding the cost optimization aspect.

    In [6] and [7] simple schemes are proposed to solve single link failures in sub-second time with

    low packet loss in the process, these schemes introduce minor changes in routing and are easy to

    implement. However, these schemes disregard the link overload problem during the recovery. Like the

    others the proposed solutions do not account for the costs of the out-branchings.

    Another different solution is proposed in [8], this scheme uses fast layer 3 failure detection, this

    type of schemes is often disregarded by the telecommunications carriers because they require short

    timers to detect the failures, this can produce false positive cases and as we will see further ahead

    can compromise the best routing options.

    Finally in [9], 3 protocol architectures are discussed with the objective of distributing IPTV

    streams, the advantages and disadvantages of the architectures are briefly explained.

  • 5

    1.5 IPTV Architecture

    IPTV networks are usually built on top of existent IP networks and are separated in three areas:

    access, metro and backbone. These areas have different components and philosophies. Fig. 1 shows

    a typical IPTV architecture.

    SHO VHO VSOPON

    Access

    Cable

    Access

    DSL

    Access

    Backbone Metro Acess

    Super Hub

    Office

    Video Hub

    Office

    Video Serving

    Office

    National Channel

    Servers

    Local Channel

    Servers

    Fig. 1 – IPTV architecture

    In the backbone, there are two essential components: the Super Hub Office (SHO) and the

    Video Hub Offices (VHOs). The SHO gathers the TV streams of the national channels (channels that

    are not specific to a region or state, like CNN in the US, or BBC One in the UK) and distributes them to

    a large set of receiving locations, the VHOs. Each VHO gathers the TV streams of the local channels

    (channels that are only reproduced in a specific region like WCBS the CBS channel of New York) and

    feeds both national and local streams to a metro area. IP routers are used to transport the IPTV

    content from the SHO to the VHOs. In addition, both SHOs and VHOs are enabled with powerful

    buffer servers to enable retransmissions and some other Video-On-Demand (VOD) and interactive

    services. Most of them have a satellite used to receive local/national contents.

    Is at the VHO, that traffic leaves the core network. It distributes the contents to Video Serving

    Offices (VSOs) via the routers in the metro area. The VSOs establish the border between metro and

    access networks, than can either be Cable, Digital Line Subscriber (DSL), or Passive Optical

    Networks (PON).

    The focus of this thesis is on the backbone of the IPTV networks, which are supported by the

    telecommunication carriers’ already existent IP networks. Fig. 2 shows an example of the backbone of

    an IP network that supports IPTV. The bold green circle represent de SHO and dotted and dashed

    green circles represent de VHOs. Both SHOs and VHOs incorporate two routers so that in case of

    router failure, the network is connected in a way, which grants that there is always one router available

    to feed the VSOs. The black lines are undirected edges; every edge has an upstream and

  • 6

    downstream link. The normal black lines represent edges that have a link used in the out-branching.

    This out-branching, rooted at the SHO, is depicted by the red arrows that represent links. The black

    dashed lines are unused edges. They are used to substitute a link of the out-branching in case of

    failure.

    SHO

    VHO

    VHO

    VHO

    VHO

    VHO

    VHO

    SHO

    VHO

    Router

    Used Edge

    Unused Edge

    Link of the

    broadcast out-tree

    Fig. 2 – Structure of the backbone of an IPTV network

    SHO

    VHO

    VHO

    VHO

    VHO

    VHO

    VHO

    SHO

    VHO

    Router

    Used Edge

    Unused Edge

    Link of the

    broadcast out-tree

    Fig. 3 – Changes in case of link failure. The link represented in bold is introduced to substitute

    the link that failed (represented with a cross)

    Most developed countries have their own IP backbone networks. In general, there are few big

    telecommunication carriers per country, each one supporting its own national backbone network. The

  • 7

    main difference between these networks is size.

    When compared to the US’s, the European networks are generally smaller. Nevertheless major

    IPTV carriers in US and in Deutschland, Verizon and T-Systems, respectively, support IP backbone

    networks that are about the same size. These networks are represented in Fig. 4 and Fig. 5.

    Fig. 4 – Verizon’s continental US backbone network (2011) [10]

  • 8

    Fig. 5 – T-Systems (aka Deutsche Telekom) germany50 [11] backbone network (2006)

    As we can see Verizon’s network has 47 nodes and T-Systems’ has 50 nodes. We will focus our

    work in this kind of networks: big IPTV backbone networks.

    Instead of analysing European countries individually, we could also picture it as a whole, but

    mind that in Europe, carriers distribute IPTV contents in and for its own country. This explains the

    existence of an IPTV backbone area per country per telecommunication carrier, to where video

    contents should be derived and then distributed.

  • 9

    Chapter 2

    The Unicast Routing Problem

    2 The Unicast Routing Problem

  • 10

    This chapter overviews the network service models and routing protocols used in unicast routing.

    Unicast is the term that designates the sending of messages from a single source to a single

    destination in the network. We will also talk about Multi Protocol Label Switching and Fast Reroute.

    2.1 Network Service Models

    Perhaps the most important abstraction provided by the network layer to the upper layers is

    whether it uses virtual circuits (VCs) or datagrams [12].

    Layer 2

    Data-link

    Layer 1

    Physical

    ...

    Layer 3

    Network

    Layer 4

    Transport

    Layer 2

    Data-link

    Layer 1

    Physical

    Layer 3

    Network

    Layer 4

    Transport

    ...

    Sender Receiver

    Fig. 6 – Layer model

    VCs provide a connection oriented service through the establishment and teardown of pseudo

    (virtual) circuits. This mechanism enables the reservation of link resources, providing steady high

    bandwidth, low delay and low packet loss. Nevertheless, VCs require state information to be recorded

    in the routers for each packet stream.

  • 11

    Physical

    Link

    Virtual Circuit

    Ph

    ysic

    al

    Lin

    k

    Ph

    ysic

    al

    Lin

    k

    Fig. 7 – Various Virtual Circuits established over a Physical Link: one per packet stream

    In contrast, datagrams do not require state information to be recorded in the routers for each flow

    of data, since data transmission is not preceded by circuit establishment. IP uses datagrams; it has

    low complexity and therefore handles heterogeneous networks best. IP uses resources evenly through

    any number of users, giving them no guarantees whatsoever. Due to its lack of guarantees, IP service

    is often called a best-effort service.

    As far as IPTV is concerned, we can see that VCs would be suitable for video transmission,

    however IP uses datagrams. Multi Protocol Label Switching is an answer to this dichotomy. MPLS is a

    label switching mechanism that allows the traffic to be divided into forwarding classes which may be

    forwarded the same way. The use of MPLS together with the Resource Reservation Protocol (RSVP)

    enables the reservation of resources to a specific data flow type (eg. Video), keeping only state

    information relative to the flow type, not the flow itself. MPLS is fully compatible with IP and in fact

    RSVP uses the IP routing protocol information to do the reservation of resources. We will talk about

    MPLS and RSVP later, for now, we will explore the IP routing protocols, from which RSVP depends

    on.

    2.2 Routing Protocols

    In this dissertation we are particularly interested on the best way of using the resources of a

    particular operator. Given so, we are then only interested in routing within a given autonomous system

    (AS). An intra-AS routing protocol is used to maintain the forwarding tables in the routers of an AS.

    Once the routing tables are configured, datagrams are routed within the AS. Intra-AS routing protocols

  • 12

    are also known as Interior Gateway Protocols (IGP). IGP can be divided into two categories: Distance-

    Vector (DV) protocols and Link-State (LS) protocols.

    The name distance-vector is derived from the fact that routes are advertised as vectors of

    (distance, direction), where distance is defined in terms of link cost and direction is defined in terms of

    the next-hop router. Each router learns routes from its neighbouring routers and then advertises the

    paths from its own perspective. A typical distance vector routing protocol is a routing algorithm in

    which routers periodically send routing updates to all neighbours by broadcasting their entire routing

    tables. The algorithm used to calculate the shortest paths is distributed and because of that routing

    loops may occur when using a DV protocol [13].

    In a LS protocol, the network topology and all link costs are known in each router. This is

    accomplished by having each node broadcast the identities and costs of its attached links to all other

    routers in the network. This link state broadcast can be accomplished without the routers having to

    initially know the topology of the network. A node only needs to know the identities and costs to its

    directly-attached neighbours; it will then learn about the topology of the rest of the network by

    receiving link state broadcast messages from the other nodes. The result of the nodes' broadcast is

    that all nodes have an identical and complete view of the network. Each router can then run a Dijkstra

    algorithm to calculate the shortest paths to the other routers. Any link cost changes or failure

    occurrences are communicated through the network using Link State Advertisements (LSAs).

    The key differences between distance vector and link state routing algorithms are the following:

    Convergence time: the computation time to calculate the shortest paths in DV protocols take

    usually take longer than in LS protocols. The distributed computations can result in cycles

    that can extend the computation time;

    State information: LS protocols require the routers to store state information of the whole

    network, in other words routers need to know the complete topology of the network. DV

    protocols only need to store information related to their neighbours.

    The fact that in LS routing protocols, routers have complete knowledge of the network topology

    is useful if we want to design custom out-branchings. If the routers know the topology they can use it

    to calculate the out-branchings independently, reaching the same results without having the necessity

    of exchanging messages between the routers. The most popular LS protocols are Open Shortest Path

    First (OSPF) and Intermediate Systems to Intermediate Systems (IS-IS). They are similar; the

    operator’s choice is usually based on convenience issues, not technical aspects.

    Another advantage of using LS protocols is its low convergence time. This is especially useful

    upon failure, in order to recalculate the shortest paths in the new topology (after failure). IS-IS and

    OSPF use a message exchange mechanism to detect failures. They keep HELLO messages between

    nodes to evaluate their condition; they use timers to detect the non-reception of a scheduled HELLO

    packet. When a failure is detected, routers distribute this info through the network using LSAs. Note

    that the choice of the timers is a tricky question, short timers may cause false positive failure

    detection, on the other hand, long timers may cause slow failure detection and in consequence,

  • 13

    packet loss. The failure detection times depend on the timers, but in common implementations range

    from 1 to 10s. Late extensions of OSPF and IS-IS offer a Fast-Hello mechanism that is able to detect

    failure in less than 1s.

    After the application of the LS protocol the resultant shortest path may be used to forward data

    packets. However if we use MPLS we can choose alternate paths, other than the shortest.

    2.3 MPLS

    Multi-Protocol Label Switching (MPLS) [14] began in the mid-1990s and appeared mainly to

    solve problems on IP over Asynchronous Transfer Mode (ATM) networks, which used VCs to forward

    the IP datagrams. MPLS has generally four main objectives:

    Improving scalability using labels to aggregate state information and to do routing

    hierarchies;

    Improving routing flexibility, using the labels to identify traffic with specific necessities to

    construct personalised paths;

    Optimizing network performance if used globally;

    Simplifying router integration with switching technology by forcing switches to act like

    routers, reporting physical topology information to the network layer and by unifying the

    routing, control and addressing of layers 2 and 3.

    Multi protocol label switching operates at a layer that is generally considered to lie between

    traditional definitions of layer 2 (data link layer) and layer 3 (network layer), and thus is often referred

    to as a "layer 2.5" protocol. It was designed to provide a unified data-carrying service for both circuit-

    based clients and packet-switching clients which provide a datagram service model. It can be used to

    carry many different kinds of traffic, including IP packets.

    IP

    ATM / Frame Relay / Ethernet/

    ...

    DSL / 10BASE-T / OTN/ ...

    Layer 3 - Network Layer

    Layer 2 - Data-link layer

    Layer 1 - Physical Layer

    ...

    MPLS Layer “2,5" - MPLS

    Fig. 8 – MPLS layer model

    This label switching mechanism brought by MPLS consists on the addition of a short extra label

  • 14

    in each IP packet. In a MPLS network, like the one represented in Fig. 9, a packet enters a MPLS

    domain by a Label Edge Router (LER), which introduces a MPLS header in the packet and sends it to

    the core of the network. This header contains the label that has forwarding information to be used by

    the core routers named Label Switched Routers (LSR). When the packet reaches the end of a path (to

    an exit LER), the label is removed. This path is called Label Switched Path (LSP). The labels are

    stored in a label switching table, where each entry contains a pair incoming label, incoming interface

    and the corresponding pair containing the out-coming label and its destination interface.

    LSR

    DLER

    (Ingress)

    LER

    (Egress)

    LSP

    A

    BC

    B 50 C 40

    IN

    Int

    IN

    Label

    OUT

    IntOUTLabel

    D50

    IP

    40 IP

    Fig. 9 – Label Switching in MPLS. A packet with label 50 arrives at LSR at interface B. The LSR looks

    at its label switching table a sends the packet with label 40 through interface C.

    Labels have a local character and can be distributed using the Label Switching Protocol (LDP) or

    the Resource Reservation Protocol (RSVP), which we will refer to later in this chapter.

    Fig. 10 indicates the structure of MPLS’s header. It is formed by:

    Label – Packets are expedited based on this field. This field will be used to index MPLS’s

    expedition table, the Label Information Base (LIB);

    EXP – originally meant to be experimental, are now used to classify packets into service

    classes;

    S – Labels can be stacked. S is a bit used to identify the bottom of the stack.

    TTL – Time to Live field is used to avoid cycles and infinite packet redistribution.

  • 15

    Data-link Layer Header MPLS Header IP Header

    Label20 bit

    EXP3 bit

    S1 bit

    TTL8 bit

    Fig. 10 – MPLS’s header structure

    Since it is not possible to have a “personalized” treatment to each packet, they need to be

    grouped in classes. These are called Forward Equivalence Classes (FEC). Each class contains

    packets with same transport requirements. This concept is inherited from IP. However, in MPLS, it is

    possible to have much more traffic classes. The most common factors used to classify the packet into

    FECs are the packet stream destination, packet stream origin and content type (e.g. video). All

    packets of a given FEC may be bound to the same MPLS label, which are expedited the same way. In

    contrast with IP expedition, the attribution of FEC to a packet is only done once (you can change a

    packets FEC by adding a second label to the packet).

    Labels are the essential element for packet expedition. A LSR just needs to examine the label to

    know which will be the next node, to where the packet needs to be expedited to. Labels are local, this

    means they only identify univocally connections between LSRs.

    A. Label Stacking

    It is possible to insert more than one label per IP packet. This enables the existence of

    hierarchies in the LSPs.

    R3 R4 R5

    R1

    R2 R7

    R66

    617

    1317

    13

    621

    1321

    Fig. 11 – MPLS tunnel

  • 16

    Fig. 11 shows an example of this hierarchy. In this example R1 sends a packet with label 6 to

    R3, which is the entry router of an MPLS tunnel. R3 receives the packet and instead of switching label

    6, it adds another label: label 17. R3 then sends the packet to R4 that only evaluates the first label of

    the label stack, 17. R4 switches the label 17 with label 21 and forwards the packet to R5. This is an

    egress LSR, it removes label 21 from the stack and evaluates the inner labels. R5 then sends the

    packet without MPLS labels to the respective router, according to its forwarding table.

    With label stacking, R1 does not need to know the configuration of the MPLS tunnel. It just

    needs to know the entry and exit routers. On the other hand, R4 does not need to know the

    configuration outside the MPLS tunnel.

    There are two protocols that usually used to attribute labels to the packet streams: the Label

    Distribution Protocol (LDP) and the Resource Reservation Protocol (RSVP).

    B. LDP: Label Distribution Protocol

    LDP is the result of the MPLS Working Group in the IETF. LDP was made for MPLS unlike

    RSVP, which existed before and was extended to handle label distribution. Nevertheless, LDP is much

    used, especially when optimizing the network is not a vital issue. Typically it is used in metro or local

    networks. In other cases RSVP is the one to go with.

    LDP has the advantage of not requiring any manual configuration; nodes interact between each

    other in order to discover new LSPs automatically. Each LSR establishes a TCP session with its

    neighbours to run the protocol, this grants reliable message delivery. With this setup we can see that

    each router has only information from its neighbours, this limits the labelling options but enhances

    scalability.

    The operation of LDP is driven by message exchanges between peers. Potential peers, also

    known as neighbours, are automatically discovered via hello messages multicast to a well-known UDP

    port. The protocol enables the discovery of remote peers using targeted hello messages. Once a

    potential peer is discovered, a TCP connection is established to it and an LDP session is set up. At

    session initialization time, the peers exchange information regarding the features and mode of

    operation they support. After session setup, the peers exchange information regarding the binding

    between labels and FECs over the TCP connection. As we have referred, the use of TCP ensures

    reliable delivery of the information, but it also allows the use of incremental updates, rather than

    periodic refreshes. LDP uses the regular receipt of protocol messages to monitor the health of the

    session. In the absence of any new information that needs to be communicated between the peers,

    keep-alive messages are sent. Fig. 12 shows LDP session establishment.

  • 17

    Fig. 12 – LDP session establishment

    LDP does its routing based on the information it gets from the running IGP protocol. Having an

    IGP running below LDP is therefore essential. The association between an FEC and a label is

    advertised via label messages: label mapping messages for advertising new labels, label withdraw

    messages for withdrawing previously advertised labels. The fundamental LDP rule states that LSR A

    that receives a mapping for label L for FEC F from its LDP peer LSR B will use label L for forwarding if

    and only if B is on the IGP shortest path for destination F from A’s point of view. This means that LSPs

    setup via LDP always follows the IGP shortest path and that LDP uses the IGP to avoid loops.

    Given our necessities, LDP by its own, does not seem to be the way to go. It is a highly scalable

    and simple protocol but it fails when we try to get optimization out of it. It is not to make full path

    reservations or have a good traffic engineering process, since it does its routing automatically and

    based on IGP information. Therefore we shall evaluate our other choice, RSVP, which gives us better

    tools to treat video flows.

    C. RSVP

    Another scheme for distributing labels for transport LSPs is based on the Resource Reservation

    Protocol (RSVP). RSVP was invented before MPLS. It was originally developed as a scheme to create

    bandwidth reservations for individual traffic flows in networks (e.g. a video telephony session between

    a particular pair of hosts) in an attempt to bring QoS to IP. RSVP includes mechanisms for reserving

    bandwidth along each hop of a network for an end-to-end session. However, the original QoS

    application of RSVP has fallen out of favor because of concerns about its scalability: the number of

    end-to-end host sessions passing across a service provider network would be extremely large, and it

    would not be desirable for the routers within the network to have to create, maintain and tear down

    state as sessions come and go.

    In the context of MPLS, however, RSVP has been extended to allow it to be used for the

    creation and maintenance of LSPs and to create associated bandwidth reservations. When used in

    this context, the number of RSVP sessions in the network is much smaller because of the way in

  • 18

    which traffic is aggregated into an LSP. A single LSP requires only one RSVP session, yet can carry

    all the traffic between two LER, containing many end-to-end packet streams. MPLS also allows the

    traffic to be aggregated into FECs, through the simple label switching mechanism, reducing the

    number of existent packet streams.

    In contrast to LDP, a RSVP-signalled LSP may not necessarily follow the path dictated by the

    IGP and, in its extended form, it is possible for the ingress router to specify the entire end-to-end path

    that the LSP must follow, having full control over the path. This property is very important in the

    context of traffic protection schemes such as Fast Reroute, discussed in detail lately in this chapter.

    RSVP requests resources for a traffic stream in only one direction from sender to one or more

    receivers and maintains soft state (the reservation at each node needs a periodic refresh) of the host

    and routers' resource reservations, hence supporting dynamic automatic adaptation to network

    changes.

    Paths may be computed online by the router or offline using a path computation tool. In the case

    of online computation, typically only the ingress router needs to be aware of any constraints to be

    applied to the LSP. Moreover, use of the explicit routes eliminates the need for all the routers along

    the path to have a consistent routing information database and a consistent route calculation

    algorithm.

    The creation of an RSVP-signalled LSP is initiated by the ingress LER. The ingress LER sends

    an RSVP Path message. The destination address of the Path message is the egress LER. The Path

    message includes the IP address of the previous node and some data objects, the most relevant are:

    Label Request Object - Requests a MPLS label for the path. As a consequence, the egress

    and transit routers allocate a label for their section of the LSP.

    Explicit Route Object (ERO) - The ERO contains the addresses of nodes through which the

    LSP must pass. If required, the ERO can contain the entire path that the LSP must follow from

    the ingress to the egress.

    Record Route Object (RRO) - RRO requests that the path followed by the Path message (and

    hence by the LSP itself once it is created) be recorded. Each router through which the Path

    message passes adds its address to the list within the RRO. A router can detect routing loops

    if it sees its own address in the RRO.

    Sender TSpec - TSpec enables the ingress router to request a bandwidth reservation for the

    LSP in question.

    In response to the Path message, the egress router sends a Resv message. Note that the

    egress router addresses the message to the adjacent router upstream, rather than addressing it

    directly to the source. This triggers the upstream router to send a Resv message to its upstream

    neighbour and so on. As far as each router in the path is concerned, the upstream neighbour is the

    router from which it received the Path message. This scheme ensures that the Resv message follows

    the exact reverse path of the Path message.

    Here are some of the objects contained in a Resv message:

  • 19

    Label Object - Contains the label to be used for that section of the LSP.

    Record Route Object (RRO) - Records the path taken by the Resv message, in a similar way

    to the RRO carried by the Path message. Again, a router can detect routing loops if it sees its

    own address in the Record Route Object.

    As can be seen, RSVP Path and Resv messages need to travel hop-by-hop because they need

    to establish the state at each node they cross, e.g. bandwidth reservations and label setup.

    R3 R4

    R5R1

    R2

    Path messages

    Resv messages

    200

    300 100

    Fig. 13 – Path signalling using RSVP

    Fig. 13 shows an example of the establishment of a path. R1 needs to setup a data packet

    stream with specific QoS to R5. To accomplish it, R1 sends a Path message to R3, asking for a label

    for a given data flow. R3 then redirects the Path message to router R4 and R4 redirects it to R5. R1

    needs to resend this Path message every 30 seconds to refresh the state.

    When the message reaches its destination, router R5, it allocates a label for that flow locally and

    responds with a Resv message to R4. In this case label 100 was chosen. As soon as the Resv

    message reaches R4, the router must first make a reservation based on the request parameters and

    then forward the request upstream (in the direction of the sender). This process is repeated in the

    upstream routers.

    The routers store the characteristics of the packet stream, and also police it to grant the

    specified QoS. This is all done in soft state, so if nothing is heard for a certain length of time, then the

    reservation will be cancelled. This solves the problem if either the sender or the receiver crash or are

    shut down incorrectly without first cancelling the reservation.

    D. Traffic Engineering

    Traffic engineering is the process of routing data traffic to balance the traffic load on the various

    links, routers, and switches in the network and is most applicable in networks where multiple parallel

    or alternate paths are available. Fundamentally, the objective of traffic engineering is to ensure

    sufficient capacity exists to handle the forecast demand from the different service classes while

  • 20

    meeting their respective QoS objectives.

    Along the years many protocols got Traffic Engineering (TE) extensions to better cope with

    nowadays traffic demands, MPLS was no different.

    MPLS-TE brought with it the concept of constrain based routing, a change in shortest path

    concept. If all the traffic is forwarded through the same “shortest paths”, paths may become

    congested. Perhaps we can get better results distribute the traffic wisely between the paths available.

    Constraint-based routing is a mechanism used to meet TE requirements, which can take in to

    account more than one metric to define its paths. Besides link cost, it can use bandwidth, delay and

    jitter.

    1

    R2

    R6 R7

    R1

    R3 1

    R5

    40Mbit/s traffic

    70Mbit/s traffic

    40Mbit/s bandwidth

    100Mbit/s bandwidth

    1 1

    1

    R4

    Fig. 14 – Constrain based routing example that uses bandwidth as a routing metric

    Fig. 14 shows an example of constrain based routing that uses bandwidth as a routing metric. If

    short path first routing was used in this case, both traffic flows from R1 to R5 would pass through path

    R1-R2-R3-R4-R5, which has the lowest total combined cost. This is the worst path in terms of

    bandwidth and will have difficulties in handling the traffic.

    2.4 Fast Reroute

    Fast Reroute (FRR) [14] is a technology that offers resiliency to MPLS networks. With FRR it is

    possible to quickly recover from link failure. The medium time down time when a single fails is around

    50 ms, which is far better than convergence times of the IGP protocols.

    FRR uses pre-programmed backup paths to detour the packets from the affected links, offering a

    safe path that can quickly be introduce once failure is detected. These procedures are transparent to

    the IGP protocols and therefore will leave the layer 3 routing tables intact.

  • 21

    R1 R6R5R2

    R3 R4

    PLR MP

    Fig. 15 – Network Failure Anatomy

    Fig. 15 shows the anatomy of a failure in a given network. If connection R2-R5 fails, traffic will be

    redirected through path R2-R3-R4-R5. The point that is upstream of the failure is called Point of Local

    Repair (PLR); on the other hand, the point downstream of the failure is called Merge Point (MP).

    FRR gives us essentially two schemes for protecting a given LSP. We can protect a full path

    (end-to-end) or just do a local protection detour (hop-by-hop). Both have its advantages and

    disadvantages and although local protection is the most popular, they can be used complementarily.

    A. Local Protection

    The idea of local protection is simple. Instead of protecting the full path, the traffic is only

    detoured at the PLR, protecting only one link. This is pretty similar to what happens when a freeway

    between two cities closes down somewhere between exits A and B. Instead of redirecting the traffic

    through other national routes, vehicles are redirected to a detour at exit A, which leads them to point B

    where they enter the freeway once again. Following the freeway’s philosophy, it is necessary to create

    what we call detour path as we can see in Fig. 16.

  • 22

    R8

    R1

    R3

    R2

    R4

    R9

    S

    R5

    D

    LSP from S to D

    Detour

    Fig. 16 – Local Protection: If link R1R2 fails traffic is detoured at the PLR (R1), returning to the LSP at

    the MP (R2)

    B. End-to end Protection

    End-to end protection consists on having a secondary LSP protecting a primary LSP (Fig. 17).

    This LSP is an end-to-end path that redirects traffic between two nodes that are typically a few hops

    apart in the network. This is usually done offline given that it is normally necessary that the involved

    nodes have full network knowledge, which may not be available to all routers (which may be in

    different Link State areas or just not running any Link State protocol at all).

    On the other hand, this scheme of protection offers advantages as well. Since the backup paths

    normally are callibrated by the network manager, it is possible to have better QoS guarantees.

    Another advantage is that no matter what link(s) fail in the primary LSP, it is always possible to protect

    it. Path signaling is usually done by RSVP.

  • 23

    LSP2

    secondary

    LSP1 primary

    R1

    R2 R3

    DS

    R4

    Fig. 17 – End-to-end protection

    Both in local and end-to-end protection, the protection paths must be computed and signalled

    before a failure happens. We have seen in MPLS that flag S indicated the end of a stack of labels. We

    stack an extra label on top of another usually to prevent the following router to access the inner label.

    This encapsulation mechanism forms a LSP tunnel; these tunnels can be used for protection. There

    are two methods for doing protection LSPs, which differ in the label with which the traffic arrives at the

    MP. This in turn influences the number of LSPs that can be protected by a single backup tunnel,

    yielding either N:1 (facility backup) or 1:1 (one-to-one backup).

    C. Facility Backup

    In facility backup, a single protection LSP is used to protect many primary LSPs. Traffic arrives

    over the backup tunnel with the same label as it would if it arrived over the failed link. The only

    difference from the point of view of forwarding is that, when arriving to the MP, it may come from

    different interfaces. Normally, traffic comes from the protected LSP, but in failure cases, traffic may

    travel through the protection tunnel. In both cases, traffic reaches the MP with the same label.

    To ensure that traffic arrives at the MP with the correct label, all that needs to be done is to

    tunnel it into the backup by pushing the backup tunnel label on top of the protected LSP label at the

    PLR (label stacking) and remove the first label on the stack at the penultimate node before reaching

    the MP (penultimate hop popping). The ability for several LSPs to share the same protection path is

    an important scaling property of facility backup.

  • 24

    C

    A

    D

    X Y

    Protected LSP from X to Y

    Protection

    tunnel for

    link A-B

    B101

    200

    102 100

    201 Pop

    Push 102 Swap 102→101 Swap 101→100 Pop 100

    Swap 201→200 Pop 200

    Fig. 18 – Setting up a facility backup

    Fig. 18 shows the setup of the backup tunnel before the failure and the forwarding state that is

    installed at every hop in the path. No new forwarding state is installed at the MP (B). At the PLR (A),

    the forwarding state must be set in place to push the label of the backup path (label 201 in the

    example). Any number of LSPs crossing link A–B can be protected by the backup shown in the figure.

    There is no extra forwarding state for each LSP protected either at the MP or at any of the routers in

    the path.

    The label that is advertised by the MP is an implicit null label this triggers penultimate hop

    popping to be performed for the backup tunnel. Thus, traffic arrives at the MP with the same label with

    which it would have arrived over the main LSP.

    C

    A

    D

    X YB

    Push 102 Swap 102→101 Swap 101→100 Pop 100

    Swap 201→200 Pop 200

    Push 201

    IP102

    IP101200

    IP1

    01

    20

    1

    IP1

    01

    IP100

    Fig. 19 – Forwarding traffic using the facility backup

    Fig. 19 depicts an example of forwarding IP traffic using facility backup. To better understand the

    process of recovering from a link failure, let us analyse each step of the fixing procedure:

  • 25

    Router X is not aware of the failure of link A-B, it pushes the attributed 102 label as usually;

    Router A has already detected that link A-B is down, it swaps label 102 to label to label 101,

    issued by router B, and pushes label 201 issued by router C to redirect the packet to the

    protection tunnel;

    Router C acts like if it was a normal LSP and swaps label 201 to label 200 issued by router D.

    Label 101 remains untouched;

    Router B advertised an implicit null label to router D. Therefore D knows that it needs to pop

    the last label for B, to know the origin of that packet. D then pops label 200 and sends the

    packet, with the untouched label 101, to router B.

    D. One-to-one Backup

    In One-to-one backup traffic arrives at the MP with a different label than the one used by the

    main (protected) LSP. Fig. 20 shows the setup of a one-to-one backup for the LSP from the previous

    example and Fig. 21 shows forwarding over the backup following a failure. Traffic arrives at the MP

    with label 300, the backup tunnel label and is forwarded using label 100, the protected LSP label.

    Thus, the MP must maintain the forwarding state that associates the backup tunnel label with the

    correct label of the protected LSP. If a second LSP were to be protected in this figure, a separate

    backup tunnel would be required for it, and a separate forwarding state would be installed at the MP.

    C

    A

    D

    X Y

    Protected LSP from X to Y

    Protection

    tunnel for

    link A-B

    B101

    200

    102 100

    302300

    Push 102 Swap 102→101 Swap 101→100 Pop 100

    Swap 302→301 Swap 301→300

    Swap 300→100

    Fig. 20 – Setting up a one-to-one backup

    Similar to facility backup, the forwarding state must be set up to map traffic from the protected

    LSP into the backup. For example, traffic arriving with label 102, the label of the protected LSP, is

    forwarded over the backup using label 302, the backup tunnel label. Note that, using this approach,

    the depth of the label stack does not increase when packets are forwarded over the backup path,

    because the top label is simply swapped to the backup tunnel label.

  • 26

    C

    A

    D

    X YB

    Push 102 Swap 102→302 Swap 300→100 Pop 100

    Swap 201→200 Pop 200

    IP102

    IP3

    00

    IP100

    IP3

    02

    IP300

    Fig. 21 – Forwarding traffic over a one-to-one backup

    Let us once again go through the link failure recovering steps, but now using one-to-one backup:

    Router X unaware of the failure pushes label 102 to an IP packet;

    Router A had already detected the failure in link A-B and swaps label 102 to label 302,

    previously advertised by router C;

    Router C and D act like if it was a normal LSP and remove the arriving label. After that they

    add the label advertised by the downstream router;

    Finally the packet reaches router B with a label that has been negotiated between B and D,

    instead of being negotiated between B and A.

    E. Failure Detection

    In order to accomplish the restoration goal of 50 ms, we need a way of quickly detecting a

    failure, which is out of FRR scope.

    Some transmission media provide hardware indications of connectivity loss. One example is

    SDH/SONET, which is, as we have seen in the first chapter, widely used in the network backbones.

    These technologies have the capability of detecting a failure in the physical layer within seconds.

    Some gigabit Ethernet routers have also that capability they use pulse detection to evaluate if the link

    is up.

    When failure detection is not provided in the hardware, this task can be accomplished by an

    entity at a higher layer in the network. As we have seen IGP protocols have a hello exchange

    mechanism to detect link failure. The original versions of OSPF and IS-IS had architectural time limits

    in the detecting link failure. The minimum time interval used to them was 3 seconds for OSPF and 1

    second for IS-IS. Recent versions of the protocols included a fast-hello mechanism that that can be

    used to do sub-second failure detection. Nevertheless it is still CPU consuming and causes some

    overhead.

  • 27

    Based on this realization, the BFD protocol was developed jointly by Juniper and Cisco, having

    rapidly gained acceptance. It is a simple hello protocol designed to do rapid failure detection. Its goal

    is to provide a low-overhead mechanism that can quickly detect faults in the bidirectional path

    between two forwarding engines, whether they are due to problems with the physical interfaces, with

    the forwarding engines themselves or with any other component.

  • 28

  • 29

    Chapter 3

    The Multicast Routing Problem

    1 The Multicast Routing Problem

  • 30

    3.1 From Unicast to Multicast

    If unicast transmissions were used to distribute TV streams, copies of the same video packets

    would pass through the links (Fig. 22). This would cause congestion.

    Packet

    Stream

    Fig. 22 – Packet distribution in unicast. A packet stream is generated for each receiving node.

    The problem of having copies of the same packets passing in a single link is solved with the

    introduction of multicast. Multicast [15] is the delivery of information to a group of destinations

    simultaneously in a transmission from a given source. In a multicast transmission links do not carry

    copies of the same packet streams, instead, packets are replicated only when they are needed in

    order to reach all the destinations (Fig. 23).

    Packet

    Stream

    Fig. 23 – Packet distribution in multicast.

    In this chapter we will talk about some novelties brought by multicast. The most common

  • 31

    concept of an IP address is in unicast addressing, available in both IPv4 and IPv6. It normally refers to

    a single sender or a single receiver, and can be used for both sending and receiving.

    A multicast address is associated with a group of interested receivers. In IPv4, addresses

    224.0.0.0 through 239.255.255.255 (the former Class D addresses) are designated as multicast

    addresses. IPv6 uses the address block with the prefix ff00::/8 for multicast applications. In either

    case, the sender sends a single datagram from its unicast address to the multicast group address and

    the intermediary routers take care of making copies and sending them to all receivers that have joined

    the corresponding multicast group.

    3.2 Protocol Independent Multicast (PIM)

    Routing had also changes in multicast. Protocol-Independent Multicast (PIM) [16] is a family of

    multicast routing protocols for networks that provide one-to-many and many-to-many distribution of

    data over the Internet. PIM routes multicast packets to multicast groups, and is designed to efficiently

    establish distribution out-trees. It is termed protocol-independent because PIM does not include its

    own topology discovery mechanism, much like RSVP and LDP. Instead it uses the unicast routing

    information supplied by other traditional routing protocols, such as IS-IS or OSPF, to perform the

    multicast forwarding. Since it relies only on the network layer routing protocols, PIM does not depend

    on the IP transport protocols, like TCP and UDP. Nevertheless IP datagrams are still used in data

    transmission. There are many PIM variants, the most popular are PIM Dense Mode (PIM-DM), PIM

    Sparse Mode (PIM-SM) and PIM Source-Specific Multicast (PIM-SSM).

    As far as PIM-DM is concerned, it builds source out-trees by flooding multicast traffic domain

    wide, then pruning back branches of the out-tree where no receivers are present. PIM-DM is

    straightforward to implement but generally has poor scaling properties and that is why we will not

    explore it in this dissertation, since it is not really used for IPTV purposes. We will then be focused on

    PIM-SM and PIM-SSM.

    A. PIM Sparse Mode (PIM-SM)

    In sparse-mode PIM, a router in a group is designated as a rendezvous point (RP). Its choice

    can be done statically or automatically. The RP collects information about multicast senders and

    makes that information available to potential receivers. The purpose of RP is to allow a receiver to find

    out the IP address of the source for a particular group.

    Multicast traffic flows from the sender back down the path created by the PIM messages. All

    received multicast traffic is checked to verify if the incoming multicast traffic is being received via the

    interface, on which PIM request was sent. This prevents loops and duplicate packets.

    Each router forms a neighbour relationship with adjacent PIM routers in a group using PIM

  • 32

    “hello” messages. When a router in a group wants to receive a multicast stream, it sends a PIM “join”

    message towards the RP. The source sends a PIM “register” message to the RP with multicast traffic

    already encapsulated on it, as we can see on Fig. 24.

    10

    10 10

    20

    A

    B

    RP192.168.0.1

    Multicast group address

    234.1.1.1

    PIM – “join

    234.1.1.1”

    PIM – “register

    234.1.1.1”

    Encapsulated

    multicast traffic

    Video source

    10.0.1.1

    C

    DPIM – “join

    234.1.1.1”R

    Fig. 24 – Receiver “join” and source “register” in PIM-SM

    The RP then sends a “join” message to the source. As soon as the message is received, the

    multicast traffic flows from the source to the receiver (Fig. 25).

    10

    10 10

    20

    A

    B

    RP192.168.0.1

    Multicast group address

    234.1.1.1

    PIM – “join

    234.1.1.1”

    Video source

    10.0.1.1

    C

    DR1

    Fig. 25 – Source “join” and traffic flow in PIM-SM

    When a router wants to stop receiving a multicast stream, it sends a PIM “prune” message

    towards the IP address of the multicast source.

    Multicast traffic initially travels from the sender to the receiver via the RP. Once the downstream

    router starts receiving the multicast traffic (and knows the senders IP), it is possible to build path

    directly back to the sender. This is triggered by the receiver that sends join messages directly to the

  • 33

    source. In the example traffic would then flow through links D-C and C-A, that are part of the shortest

    path. After this reconfiguration the path between D and A will be the shortest path (Fig. 26).

    10

    10 10

    15

    A

    B

    RP192.168.0.1

    Multicast group address

    234.1.1.1

    Video source

    10.0.1.1

    C

    DR1

    Fig. 26 – Shortest path reconfiguration in PIM-SM

    If another router wanted to join the multicast group, a path would be added, forming an out-tree.

    The added path would also be the shortest path between the source and the referred router. An out-

    tree formed by concatenation of the shortest paths between the root and the nodes in the out-tree is

    designated Shortest Path Tree (SPT) (Fig. 27).

    10

    10 10

    15

    A

    B

    RP192.168.0.1

    Multicast group address

    234.1.1.1

    Video source

    10.0.1.1

    C

    DR1

    R2

    10

    Fig. 27 – Shortest Path Tree in PIM-SM after reconfiguration

    If the receiver knew the sender’s address, the RP would not be necessary.

  • 34

    B. PIM Source Specific Multicast (PIM-SSM)

    PIM-SSM [17] requires the edge router to know the address of the multicast source for each

    group, avoiding the need of an RP. In PIM-SSM the edge router sends a PIM “join” message directly

    towards the sender using the unicast routing table. This “join” request will be forwarded along the

    shortest path (in terms of the IGP) to the multicast source until it reaches a router which is already

    aware of the multicast group.

    10

    1010

    20

    A

    B

    Video source 2

    192.168.1.2

    PIM – “join

    234.1.1.2”

    Group

    234.1.1.1

    Video source 1

    192.168.1.1

    Group

    234.1.1.2

    PIM – “join

    234.1.1.1”

    C

    D

    PIM – “join

    234.1.1.2”

    PIM – “join

    234.1.1.1”

    Fig. 28 – Source “join” in PIM-SSM. “Join” messages for different group are sent to its

    respective video source.

    10

    1010

    20

    A

    B

    Video source 2

    192.168.1.2

    Group

    234.1.1.1

    Video source 1

    192.168.1.1

    Group

    234.1.1.2

    C

    D

    Fig. 29 – Data flow in PIM-SSM. Video packets are sent to the reverse packet of the

    respective “join” message.

    Fig. 28 and Fig. 29 show PIM-SSM in operation, as we can see there is no shared out-tree

    sourced at a RP, as in PIM-SM, and there is also no need for any source discovering mechanism. This

    is an advantage of PIM-SSM over PIM-SM; the complexity of the multicast routing for PIM-SSM is

  • 35

    lower than PIM-SM and therefore easier to configure and maintain. PIM-SSM has also a more efficient

    network usage, since traffic is not routed temporarily via the RP and the most direct path from source

    to receiver is always used.

    Despite the wide use of PIM-SM in IPTV networks, the advantages of the fairly new PIM-SSM

    make it the one to be chosen in IP backbone, especially when the range of potential multicast group

    addresses is not much large, which is typically the case.

    3.3 Point-to-Multipoint LSPs (P2MP LSPs)

    The LSPs generated in MPLS for unicast are also different in the multicast case [18]. So far we

    have seen how MPLS is used to establish point-to-point LSPs, now we will see the tools that MPLS

    has to offer in order to handle multicast connections. Independently of the signalling protocol that is

    used, the forwarding mechanisms are similar.

    L2 L4

    L3

    PE2 PE3

    P1 B

    A

    L1

    L5

    L7

    L6

    Data

    Source

    PE1

    P2

    P3

    PE5

    PE4

    C

    Forwarding table on P1:

    IN

    Label

    IN

    Interface

    OUT

    Label

    OUT

    Interface

    L1 C L2 A

    L1 C L3 B

    Fig. 30 – P2MP LSPs example. The forwarding table of P1 shows the different labels using for

    the different interfaces.

    Fig. 30 shows an MPLS multicast scenario, as we can see these LSPs have only one ingress

    router but several egress routers. PE1, the ingress router, replicates the packets and sends them to

    the downstream nodes, always with different labels. The downstream routers act in a similar way; they

  • 36

    also replicate the packet and send the copies to the next downstream routers, again, always with

    different labels. These routers are called branch nodes. P3 is however a special case, since instead of

    replicating the packets, it just transmits them with a new MPLS label. This forwarding mechanism is

    quite useful, since it is far more efficient than using unicast LSPs.

    RSVP was not prepared to P2MP LSPs as well. However, RSVP got a traffic engineering

    (RSVP-TE) extension, allowing it to handle this new type of LSPs. Nevertheless it still uses routing

    information of layer 3 protocols; therefore it is still out of the RSVP ambit to construct the out-

    branchings.

    L2 L4

    L3

    PE2 PE3

    P1

    L1

    L5

    L6

    L7

    PE1

    P2

    P3

    PE5

    PE4

    L1

    Path messages

    Resv messages

    L5

    Fig. 31 – P2MP LSP setup

    Fig. 31 shows the setup of multicast LSPs using RSVP-TE. We can see that like in unicast,

    routers running RSVP-TE use Path messages to communicate downstream, and Resv messages to

    communicate upstream. In the point of view of MPLS’s control plan a P2MP LSP is just a sum of point-

    to-point LSPs.

    Path messages have an Explicit Route Object (ERO) which determines the route followed by the

    LSP. On the other hand, Resv messages incorporate, for each step, the label that will be used for

    forwarding in that step. In the case P2MP networks, each sub-LSP is signalled using its own Path and

    Resv messages. For this to happen both type of messages contain a new P2MP Session Object that

    identifies the sub-LSP. This mechanism grants the replication of the packets in forwarding.

    Let us see how this works in the example network shown in Fig. 31. The represented P2MP LSP

    has four egress routers, it is composed of four sub-LSPs, one from PE1 to PE2, another from PE1 to

    PE3, and so on. Because each sub-LSP has its own associated Path and Resv messages, on some

    links multiple Path and Resv messages are exchanged. For example, the link from PE1 to P1 has

  • 37

    Path messages corresponding to the sub-LSPs to PE2 and to PE3. Let us examine in more detail how

    the P2MP LSP in Fig. 31 is signalled; looking at sub-LSP PE1 to PE3 whose egress is PE3:

    A Path message is sent by PE1, the ingress router, containing the ERO {PE1, P1, P3, PE3}.

    This can contain a bandwidth reservation for the P2MP LSP if required.

    PE3 responds with a Resv message that contains the label value, L4, that P3 should use

    when forwarding packets to PE3.Similarly, the Resv message sent on by P3 to P1 contains

    the label value, L3, that P1 should use when forwarding packets to P3.

    In a similar way, for the sub-LSP whose egress is PE2, P1 receives a Resv message from

    PE2 containing the label value, L2, that P1 should use when forwarding packets to PE2. P1

    knows that the Resv messages from PE2 and P3 refer to the same P2MP LSP, as a

    consequence of the P2MP Session Object contained in each.

    P1 sends a separate Resv message to PE1 corresponding to each of the two sub-LSPs, but

    deliberately uses the same label value for each, L1, because the two sub-LSPs belong to the

    same P2MP LSP.

    P1 installs an entry in its forwarding table such that when a packet arrives with label L1, one

    copy is sent on the link to PE2 with label L2 and another copy on the link to P3 with label L3.

    This ensures that when the Resv messages are sent from P1 to PE1 corresponding to the two

    sub-LSPs, no double-counting occurs of the bandwidth reservation.

    PE1, knowing that the two Resv messages received from P1 refer to the same P2MP LSP, a

    consequence of the P2MP session object contained in each, forwards only one copy of each

    packet in the flow to P1, with the label value L1 that had been dictated by PE1 in those two

    Resv messages.

    3.4 Protocol Architecture

    In order to implement the custom out-branchings presented in chapter 4, we will present a set of

    protocols that enable custom designs, can cope with resource reservation and handle fast reroute

    restoration. This protocol scheme is based on [3], and uses PIM-SSM combined with P2MP LSPs.

  • 38

    Multicast coordination

    PIM-SSM

    Route Signaling

    RSVP-TE

    IGP routing

    IS-IS/ OSPF

    FRR support

    MPLS

    Routing

    info

    changes

    Link

    restoration

    Fig. 32 – Cross layer architecture

    PIM-SSM is used to coordinate the multicast groups and generate the multicast out-branchings.

    IGP routing is assured by either IS-IS or OSPF. P2MP LSPs are established using RSVP to enable

    resource reservation and FRR is used for rapid link restoration. Routes are established following the

    next steps:

    1. The network topology and its current link costs are fed to an out-branching calculation

    algorithm that will be presented in the next chapter. The routers can run the programmed

    algorithm since they are running an LS protocol which provides them the complete network

    topology. The out-branching calculation can also be done by the network operator;

    2. After finding the out-branching, we need to re-set the IGP link costs. The cost of the physical

    links in the main out-branching remains the same. The cost of the physical links that are not

    in the main out-branching are set to (Fig. 33) The new IGP costs are then downloaded to

    the routers;

    3. This way PIM-SSM will generate the same out-branchings that we have calculated;

    4. RSVP signals the out-branchings generated by PIM-SSM, making the proper resource

    reservations, forming a P2MP LSP.

    5. The backup paths for each link of out-branching, can be calculated by the network operator

    and downloaded to the routers. Alternatively, RSVP can setup them automatically using the

    IGP routing information. To do such, IGP costs need to be re-set again. The cost of the

    physical links of the main out-branching are set to and the costs of the other physical links

    are set to its original value. Doing such grants that the backup paths that are calculated do

    not share any link with main out-branching.

    6. If the backup paths were calculated using RSVP, the IGP costs need to be set to the values

    of step 2;

  • 39

    3.5 Failure Recovering Process

    In point 5.1 we have described a procedure to assure the setup of the pre calculated out-

    branchings. We have also presented a way of calculating and setting up protection paths. Fig. 33

    shows a network that has already been set.

    R2

    5

    1

    3

    4

    6

    3

    2

    R4

    S