Université de Poitiers - LIAS lab · Keywords: real-time embedded systems, ...


UP: Université de Poitiers
LIAS: Laboratoire d'Informatique et d'Automatique pour les Systèmes

THESIS

submitted for the degree of

DOCTOR OF THE UNIVERSITÉ DE POITIERS

(École Nationale Supérieure d'Ingénieurs de Poitiers) (National Diploma, Decree of 7 August 2006)

Doctoral School: Sciences et Ingénierie pour l'Information, Mathématiques
Research Area: Computer Science and Applications

Presented by:

Guillaume PHAVORIN

*****************************

Hard Real-Time Scheduling subjected to
Cache-Related Preemption Delays

Ordonnancement Temps Réel Dur avec prise en compte
des Délais de Préemption dus au Cache

*****************************

Thesis Advisor: Pascal RICHARD

*****************************
Defended on 23 September 2016 before the Examination Committee

*****************************
JURY

Sanjoy BARUAH, Full Professor, University of North Carolina (USA), Reviewer
Liliana CUCU-GROSJEAN, Chargée de Recherche, INRIA Paris-Rocquencourt, Reviewer
Isabelle PUAUT, Professor, Université de Rennes 1, Reviewer
Joël GOOSSENS, Professor, Université libre de Bruxelles (Belgium), Examiner
Claire MAIZA, Maître de Conférences, Université Grenoble Alpes, Examiner
Pascal RICHARD, Professor, Université de Poitiers, Examiner


Abstract

Nowadays, the trend in real-time embedded systems is to use commercial off-the-shelf components, even for critical systems such as cars or airplanes. In particular, processors with cache memories are used as they increase average performance.

For highly critical systems, embedded applications are subjected to very strict timing constraints. Real-time scheduling aims to guarantee that no task in the system misses its deadline. But, because of the use of cache memories, additional delays, known as Cache-Related Preemption Delays (CRPDs), may occur as soon as multiple tasks can run on the same processor. Those CRPDs may cause a task to miss its deadline and thus jeopardize the system's integrity.

Different strategies have been proposed in the literature to deal with CRPD issues, from memory-management techniques to limited-preemption scheduling approaches. Most of the existing work focuses on either reducing the CRPDs or improving system predictability by bounding them. But not much has been done concerning the problem of optimally scheduling hard real-time tasks on uniprocessor systems with cache memories.

This PhD work focuses on the general problem of making scheduling decisions while accounting for cache effects. We identify two different scheduling problems dealing with cache issues and study their computational complexity: both the cache-aware scheduling problem and the CRPD-aware scheduling problem are proved to be NP-hard in the strong sense. Then, we study the impact of CRPDs on the behaviour of classic online scheduling policies such as Rate Monotonic (RM) and Earliest Deadline First (EDF). In particular, we show that neither RM nor EDF is sustainable when CRPDs are accounted for. Moreover, we prove that optimal online scheduling is impossible for sporadic tasks subjected to CRPDs. We therefore propose an offline scheduling approach, based on mathematical programming, to optimally solve the CRPD-aware scheduling problem.

Keywords: real-time embedded systems, hard real-time scheduling, uniprocessor, cache, Cache-Related Preemption Delays, online scheduling, offline scheduling, linear programming, computational complexity, sustainability, Rate Monotonic (RM), Deadline Monotonic (DM), Earliest Deadline First (EDF), optimal scheduling


Résumé

The current trend is to use so-called off-the-shelf components in real-time embedded systems, including in critical systems such as cars or airplanes. In particular, processors with cache memories are employed because they improve average performance.

Applications embedded in highly critical systems are subjected to very strict timing constraints. Real-time scheduling aims to guarantee that no task in the system misses its deadline. But, as soon as several tasks can execute on the same processor, the use of cache memories may introduce additional delays, called Cache-Related Preemption Delays (CRPDs). Because of these delays, a task may miss its deadline, thereby compromising the system's integrity.

Different strategies, ranging from memory-management techniques to scheduling policies limiting the number of preemptions, have been proposed in the literature to deal with cache-related problems. Most existing work aims either at reducing cache-related delays or at bounding them so as to improve system predictability. But little research has tackled directly the problem of optimally scheduling hard real-time tasks on a uniprocessor system with cache memories.

This thesis addresses the general problem of making scheduling decisions that account for cache effects. We identify two different scheduling problems involving cache memories and study their complexity. We prove that both the scheduling problem that directly considers the cache and the scheduling problem that only considers cache-related preemption delays are NP-hard in the strong sense. We then study the impact of cache-related delays on classic online scheduling policies such as Rate Monotonic (RM) and Earliest Deadline First (EDF). In particular, we show that neither of these two policies is sustainable once cache-related preemption delays are considered. Moreover, we prove that no online algorithm can optimally schedule sporadic tasks subjected to cache-related preemption delays. We therefore propose an offline approach, based on linear programming, to optimally solve the scheduling problem with CRPDs.

Keywords: real-time embedded systems, hard real-time scheduling, uniprocessor, cache, Cache-Related Preemption Delays, online scheduling, offline scheduling, linear programming, complexity, sustainability, Rate Monotonic (RM), Deadline Monotonic (DM), Earliest Deadline First (EDF), optimal scheduling


Detailed Summary

Introduction

Motivation

Real-time embedded systems, i.e. electronic and computing systems designed to perform a specific function [Gro07, Nel12], are now present in every aspect of our daily lives. Nowadays, 99% of the processors manufactured worldwide each year are destined for the embedded-systems market [BW01]. Real-time applications are found, in particular, in systems as critical as nuclear power plants, cars, trains, satellites, rockets and airplanes, whether for civil or military use.

The aeronautics and space sectors, in particular, rely on highly critical systems: a system failure can result in extremely heavy human and financial losses. To avoid such failures (as far as possible), considerable effort is devoted to dependability, notably through the introduction of standards such as ARINC 653 1 and guidelines such as DO-178B 2. This need for dependability concerns the system's structure, such as an aircraft wing that must not break under the aerodynamic forces exerted during flight, as much as the embedded software. The code must, of course, be free of errors. In particular, the application must be functionally correct, i.e. meet its specification: the first Ariane 5 flight ended in an explosion because the guidance system was not programmed to handle the strong accelerations experienced by the launcher 3. But since a real-time application is composed of several tasks subjected to strict timing constraints, one must also ensure that all these tasks can execute concurrently without any deadline being missed. For instance, the lander of the Mars Pathfinder mission experienced several problems because a task was unable to execute due to mismanaged accesses to

1 http://www.aviation-ia.com/cf/store/catalog.cfm?prod_group_id=1&category_group_id=3
2 http://www.rtca.org/store_product.asp?prodid=581
3 http://www.esa.int/esapub/bulletin/bullet89/dalma89.htm


a shared resource, resulting in a priority inversion [Wil97]. Since a uniprocessor system can execute only one instruction at a time, the available processor time must be shared among the different tasks. Scheduling determines the order in which to execute the tasks so as to ensure that all timing constraints, and in particular the tasks' deadlines, are met. In the case of concurrent tasks, methods are therefore needed to schedule these tasks correctly. But one must also be able to analyse the system, before it enters service, to guarantee that no scheduling anomaly will occur during its execution.

Much research has been conducted in the field of real-time scheduling, from the seminal work of Liu and Layland [LL73] to the more recent study of multiprocessor systems [Bar07, FLS11] and of more complex task models (in particular, tasks that can execute in parallel [KI09, GR16] or part of whose code can execute in parallel [MBB+15]).

In the aeronautics industry, the current trend is the growing use of electronic systems, from the autopilot to cabin air-pressure control. In particular, more and more controls are now electric. Thus, since the introduction of the Airbus A320, manual flight controls have systematically been replaced by fly-by-wire controls [LEJ15]. This trend peaked with the new A380, in which even the backup controls are now electric 4. As a consequence, today's aircraft need ever more computing power. To limit the number of on-board computers, more advanced computing systems must be used, and in particular more powerful processors.

As embedded computing systems account for an ever larger share of the overall system, another current trend is the use of so-called off-the-shelf components (Commercial Off-The-Shelf, COTS), i.e. components already developed and widely used for other embedded applications, in order to reduce development costs [Adm01, Adm04]. Such components notably rely on hardware features designed to improve the system's average performance, features that have long been present in personal computers. Thus, almost all processors today use pipelines, branch predictors and several levels of cache memory. A pipeline improves processor performance by overlapping the successive stages (fetch, decode, execute, memory access, register write-back) required to execute a program's instructions; each stage indeed uses a distinct hardware unit. Branch prediction, for its part, tries to guess the next instruction to be loaded (for example when a conditional instruction is encountered in a task's code), which speeds up program execution on average. Pipelines and branch predictors thus improve the performance of computing systems. On top of this, processor clock frequencies have kept increasing over recent years. As a consequence, the gap between processor speed and main-memory access time has widened exponentially. To partially bridge this gap, cache memories were introduced. A cache is a memory with a relatively low access time, placed between the processor registers and main memory. While caches are

4 http://www.fzt.haw-hamburg.de/pers/Scholz/dglr/hh/text_2007_09_27_A380_Flight_Controls.pdf


faster than main memory, they are also far more expensive. The idea is to store in the cache the frequently used instructions and/or data (due, for example, to loops in the tasks' code), so as to reduce subsequent load times.

Cache memories, like pipelines and branch predictors, improve the system's average performance. But to guarantee dependability, the worst-case behaviour must be considered when analysing a critical application. When an instruction is executed by a processor with a cache memory, its execution time depends on whether that instruction is present in the cache: loading an instruction from main memory rather than from the cache can indeed be up to 10 times more costly in time [Lev09]. When several tasks execute on the same processor, they also share the same cache, so some of them may access the same cache locations. Consider in particular a task preempted by another task. During its execution, the preempting task may load instructions and/or data into cache locations already used by the preempted task, overwriting the preempted task's instructions/data. Thus, when the preempted task resumes, it may need to reload instructions/data from main memory rather than access them directly in the cache. This induces reload delays known as Cache-Related Preemption Delays (CRPDs). These delays can represent up to 40% of a task's Worst-Case Execution Time (WCET), as shown in [PC07]. The cache behaviour must therefore be analysed precisely to bound these delays so that they can be taken into account in scheduling. But this is no easy task.
There is indeed a circular dependency between task execution times and cache-related preemption delays: CRPDs lengthen the tasks' execution times, which in turn may increase the number of preemptions occurring as the tasks execute, and hence increase the CRPDs...
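This eviction mechanism can be illustrated with a toy direct-mapped cache. The sketch below is an illustration only, not an example from the thesis: the access costs, task traces and cache size are all assumptions.

```python
# Toy direct-mapped instruction cache: shows how a preempting task evicts
# cache lines of the preempted task, forcing reloads when it resumes.
# All names and timings are illustrative assumptions, not from the thesis.

CACHE_LINES = 4               # number of cache lines
HIT_COST, MISS_COST = 1, 10   # assumed access costs (cycles)

def run(trace, cache, owner):
    """Execute a memory-access trace; return time spent and update the cache."""
    time = 0
    for block in trace:
        line = block % CACHE_LINES          # direct-mapped placement
        if cache[line] == (owner, block):   # cache hit
            time += HIT_COST
        else:                               # miss: reload from main memory
            cache[line] = (owner, block)
            time += MISS_COST
    return time

cache = [None] * CACHE_LINES
t1 = run([0, 1, 2, 3], cache, "T1")   # T1 warms up the cache (all misses)
t2 = run([4, 5], cache, "T2")         # preemption: T2 evicts lines 0 and 1
t3 = run([0, 1, 2, 3], cache, "T1")   # T1 resumes: blocks 0 and 1 must be reloaded
crpd = t3 - 4 * HIT_COST              # extra time vs. a preemption-free rerun
print(t1, t2, t3, crpd)               # → 40 20 22 18
```

Had T2 not run, T1's second pass would have been all hits; the difference is exactly the reload delay the thesis calls a CRPD.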

The use of cache memories in real-time embedded systems makes the scheduling problem even more complex. This problem must therefore be studied further so as to propose new scheduling strategies that minimise cache effects. New schedulability analyses accounting for cache-related preemption delays are also needed to guarantee system predictability.

Objectives of the thesis and overview of the solutions

The main objective pursued in this PhD work is the study of hard real-time scheduling with cache-related preemption delays taken into account. As this goal is very general, we focused in particular on the following points:

1. formalising the problem of scheduling real-time tasks with hard timing constraints on uniprocessor systems with cache memories,

2. studying this problem theoretically, considering in particular the complexity of scheduling tasks optimally while taking cache effects into account,


3. studying the impact of cache interference (measured through cache-related preemption delays) on classic online scheduling policies, in particular Rate Monotonic and Earliest Deadline First,

4. proposing solutions to the problem of hard real-time scheduling on uniprocessor systems with cache memories.

To reach these objectives, we first set out to define two distinct scheduling problems. In the cache-aware scheduling problem, the cache state (i.e. its contents) is known at every instant, as are all the memory accesses performed by each task during its execution; the impact of every scheduling decision can thus be analysed precisely. In the CRPD-aware scheduling problem, cache interference is measured through upper bounds on the cache-related preemption delays.

We then study the computational complexity of these two problems and prove that both are NP-hard in the strong sense. Hence, neither problem can be solved optimally by a scheduling algorithm running in polynomial or pseudo-polynomial time.

Third, we study the impact of cache-related preemption delays on several online scheduling policies. We focus in particular on Rate Monotonic and Earliest Deadline First, the scheduling algorithms most commonly used in the real-time literature. We first show that scheduling anomalies can occur when cache-related preemption delays are considered: thus, neither Rate Monotonic nor Earliest Deadline First is sustainable once CRPDs are taken into account. We then prove that no online scheduling algorithm can be optimal when cache-related preemption delays are considered.

Since no online scheduling algorithm can optimally solve the CRPD-aware scheduling problem, we propose, fourth, two offline approaches based on linear mathematical programming.
We show in particular that the second approach optimally solves the CRPD-aware scheduling problem when a simple task model is used.

Finally, we conduct several series of experiments to measure the schedulability loss of Rate Monotonic, Earliest Deadline First and our optimal offline approach when cache-related preemption delays are considered.

Outline of the thesis

This thesis manuscript is organised in two main parts. We first introduce several concepts related to real time, scheduling and cache memories, and also outline the main research work addressing these cache problems. We then detail our different contributions.

The document opens with a brief overview of the current state of the art (Part I).

In Chapter 1, we start by introducing some general concepts related to real-time embedded systems. We also briefly describe how a cache operates.


Then, in Chapter 2, we present the main definitions and known results in real-time scheduling that will be useful later. In particular, we briefly explain the principles of scheduling and of schedulability analyses. In the last part of this chapter, we relate scheduling and cache memories and show some consequences of using caches in a real-time system.

Finally, in Chapter 3, we present the main research work addressing real-time scheduling and cache memories. In particular, we propose a classification of this work, which lets us highlight some aspects that have received little study, notably the problem of optimal scheduling with cache-related preemption delays taken into account.

With these foundations in place, we move on to the contributions made during these three years of PhD work (Part II).

In Chapter 4, we start by formalising the scheduling problem on a uniprocessor system with a cache memory. We identify two distinct scheduling problems, which we respectively call the cache-aware scheduling problem and the CRPD-aware scheduling problem. We then study the computational complexity of both problems. We conclude the chapter with a brief discussion of the CRPD-aware scheduling problem considered in the remainder of this work.

In Chapter 5, we place ourselves in the context of online real-time scheduling and propose two contributions. First, we study several classic online scheduling policies (mainly Rate Monotonic and Earliest Deadline First) when cache-related preemption delays are considered, and show that several timing anomalies can then occur. Second, we address the more general question of optimally scheduling real-time tasks online while considering cache-related preemption delays. Unfortunately, we prove that no online algorithm can solve this problem optimally.

Since optimal online scheduling is impossible, in Chapter 6 we turn to offline real-time scheduling with cache-related preemption delays taken into account. We propose two approaches, based on linear optimisation, to build an offline schedule.
We show, in particular, that the second proposed approach is optimal for the CRPD-aware scheduling problem, provided that certain conditions on the task model are met.

Finally, in Chapter 7, we study the impact of cache-related preemption delays on system schedulability. The experiments conducted in this chapter have a twofold goal. On the one hand, we seek to evaluate the schedulability loss of classic online scheduling policies (such as Rate Monotonic and Earliest Deadline First) when cache-related preemption delays are taken into account. On the other hand, we seek to compare the performance of our offline solution with Rate Monotonic and Earliest Deadline First. Two series of experiments are carried out, using two different models for the parameter corresponding to the cache-related preemption delay.

We finally conclude this thesis manuscript and propose several perspectives for future


work (Part III).

State of the art

We first give a broad overview of the existing state of the art related to real-time scheduling on uniprocessor systems with cache memories. The work in the literature focuses mainly on two problems. The first is ensuring system predictability, which requires either bounding the cache-related preemption delays or eliminating them altogether. But since these delays also impact system schedulability (as they increase processor-time utilisation), this second problem is studied as well.

Accounting for cache-related preemption delays

To ensure system predictability, cache-related preemption delays must be accounted for when the system's schedulability is analysed.

A first way to proceed is to inflate the tasks' worst-case execution times so as to account for all the preemption delays that may occur while the system runs. From the scheduling point of view, preemptions then no longer induce additional delays, which allows the classic results in the literature to be reused. The simplest way to evaluate each task's CRPD is to assume a cache miss for every memory access, but this method is of course very pessimistic. More precise approaches have thus been proposed, for example in [Sch00, AB11, WTA14].

One of the main difficulties remains evaluating the maximum number of preemptions a task may suffer during its execution, since this number depends on the scheduling policy and on the other tasks in the system. Finally, including cache-related preemption delays in task execution times often results in very large overestimations, which in turn leads to wasting hardware resources: the hardware must be oversized to guarantee correct behaviour in the worst case.
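As a minimal sketch of this WCET-inflation idea (the task parameters, the per-preemption CRPD bound and the deliberately crude preemption count are all assumptions, not the thesis's analysis), one could inflate each WCET and then apply the classic EDF utilisation test U ≤ 1 for implicit deadlines:

```python
# Sketch: inflate each task's WCET by a per-preemption CRPD bound, then run
# the classic EDF utilisation test on the inflated parameters.
from math import floor

def inflated_wcet(C, D, crpd, periods_above):
    """Upper-bound the WCET of a task with base WCET C and deadline D,
    assuming every release of a higher-priority task preempts it once.
    `periods_above` lists the periods of higher-priority tasks (crude bound)."""
    preemptions = sum(floor(D / Tj) + 1 for Tj in periods_above)
    return C + preemptions * crpd

# (C, T) pairs with implicit deadlines D = T, sorted by period (RM order);
# crpd is an assumed bound on the delay paid per preemption.
tasks = [(1, 5), (2, 10), (4, 20)]
crpd = 0.2
total_u = 0.0
for i, (C, T) in enumerate(tasks):
    periods_above = [Tj for (_, Tj) in tasks[:i]]
    Ci = inflated_wcet(C, T, crpd, periods_above)
    total_u += Ci / T
print(f"inflated utilisation = {total_u:.3f}",
      "schedulable under EDF" if total_u <= 1 else "not proven schedulable")
```

The pessimism the text mentions is visible here: every higher-priority release is charged a full CRPD, whether or not a preemption with eviction actually occurs.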

Another solution proposed in the literature is therefore to account for CRPDs at the level of the schedulability analysis. The main problem is then to bound the cache-related preemption delays. To do so, the cache usage of the different tasks is measured in order to evaluate the interference that may result from concurrent execution. On the one hand, one counts the memory blocks a task may load into the cache during its execution (Evicting Cache Blocks, ECBs), which may therefore evict memory blocks used by other tasks. On the other hand, one also computes the memory blocks a task may reuse during its execution (Useful Cache Blocks, UCBs), whose eviction from the cache by another task would require a later reload from main memory.

From these ECBs and UCBs, different approaches have been proposed in the literature to bound the cache-related preemption delays. These methods consider either the impact of the preempting task (as in the ECB-only approach introduced in [BMSMOC+96]), or the impact of the preempted task


(as in the UCB-only approach proposed in [LHS+98]), or the combined impact of the preempted task and of the possible preempting tasks (as in the ECB-union, UCB-union, ECB-union Multiset and UCB-union Multiset approaches presented in [ADM12]). These latter methods, in particular, yield tighter bounds on the cache-related preemption delay.
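The set-based flavour of these bounds can be sketched as follows. This is an illustration under an assumed block-reload time and assumed UCB/ECB sets; the cited union and multiset approaches refine such bounds considerably.

```python
# Sketch of set-based CRPD bounds (illustrative assumptions throughout).
BRT = 10  # assumed Block Reload Time (cycles) for one cache block

def crpd_ecb_only(ecb_preempting):
    # Bound from the preempting task alone: every evicting block may force a reload.
    return BRT * len(ecb_preempting)

def crpd_ucb_only(ucb_preempted):
    # Bound from the preempted task alone: every useful block may need reloading.
    return BRT * len(ucb_preempted)

def crpd_combined(ucb_preempted, ecb_preempting):
    # Tighter bound: only useful blocks that are actually evicted must be reloaded.
    return BRT * len(ucb_preempted & ecb_preempting)

ucb_t1 = {0, 1, 2, 3}   # cache sets holding blocks the preempted task T1 will reuse
ecb_t2 = {2, 3, 4, 5}   # cache sets the preempting task T2 may touch
print(crpd_ecb_only(ecb_t2),
      crpd_ucb_only(ucb_t1),
      crpd_combined(ucb_t1, ecb_t2))   # → 40 40 20
```

With these (assumed) sets, intersecting the useful and evicting blocks halves the bound compared with either single-task view, which is why the combined approaches give tighter analyses.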

Memory-management techniques

The approaches presented above only aim to ensure system predictability by accounting for CRPDs; nothing is changed in the scheduling policy or in memory management. Yet, since preemption delays increase processor-time utilisation, they can harm system schedulability. This is all the more true as what is actually considered are upper bounds on the CRPD, potentially very pessimistic, which can lead to a significant waste of hardware resources. One solution is to try to eliminate, or at least reduce, these preemption delays by acting on cache or memory management. Different approaches have been proposed in the real-time literature to this end.

The cache can be divided into several partitions, each task then having access to only one partition, which reduces cache conflicts between the tasks of the system. With full partitioning, there are as many partitions in the cache as there are tasks [Wol94, Mue95, PLM09]: CRPDs are completely eliminated since each task is isolated from the others in the cache. In return, each task has access to a smaller portion of the cache, which may increase its worst-case execution time and thus also compromise system schedulability. To find a trade-off, a hybrid approach can be employed, in which some tasks share a common cache partition [BMGGW00, TM05].
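The isolation argument behind full partitioning can be sketched in a few lines (an illustration with an assumed 8-set cache and equal-size partitions):

```python
# Sketch: with full cache partitioning, tasks touch disjoint cache sets, so the
# useful-block/evicting-block interference between any task pair is empty.
def partition(cache_sets, n_tasks):
    """Split a cache's sets into n equal, disjoint partitions (assumes n divides the count)."""
    size = len(cache_sets) // n_tasks
    return [set(cache_sets[i * size:(i + 1) * size]) for i in range(n_tasks)]

parts = partition(list(range(8)), 4)   # 8 cache sets shared by 4 tasks
# No two partitions overlap, so a preempting task can never evict another
# task's useful blocks: the CRPD of every task pair is zero by construction.
assert all(a.isdisjoint(b) for i, a in enumerate(parts) for b in parts[i + 1:])
print(parts)   # → [{0, 1}, {2, 3}, {4, 5}, {6, 7}]
```

The cost noted in the text is also visible: each task now works with 2 sets instead of 8, so its own intra-task miss count, and hence its WCET, may grow.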

Other research work resorts to cache-locking techniques. Some memory blocks are protected in the cache to prevent them from being evicted by other blocks loaded later by another task. This notably provides precise knowledge of the cache contents at every instant, making system predictability easier to guarantee. Locking can be static, i.e. the blocks locked in the cache do not change during the whole system execution, or dynamic. Moreover, as with partitioning, the cache can be entirely locked [CIBM01, FPT07, PP07, LLX12] (which may increase the tasks' worst-case execution times) or only partially [DLM13, DLM14], a part of the cache then remaining available to all tasks of the system.

Finally, memory-layout techniques can be used. For each task, the location of certain code sections can be changed so as to reduce intra-task cache interference [TY97, FK11]. The overall placement of each task's code in memory can also be changed to reduce inter-task cache conflicts (and hence cache-related preemption delays), as proposed in [GA07, LAD12].



Detailed summary

Improved scheduling policies

Cache interference between the different tasks can also be reduced at the scheduling level, so as to improve the schedulability of the system. Two approaches can then be followed:

- reduce (or control) the number of preemptions, through modifications of an existing scheduling policy, so as to reduce the global preemption delay,

- define scheduling policies that make their decisions by explicitly considering their impact on the cache.

The first approach amounts to modifying existing algorithms, whereas the second addresses the more general problem of finding an optimal scheduling policy when cache effects are taken into account.

A first solution to reduce the number of preemptions is to use preemption thresholds. A given task can only be preempted by a higher-priority task whose priority also exceeds the preemption threshold of the lower-priority task. Preemption thresholds can be assigned with the objective of minimising cache-related preemption delays [BAVH+14].

A task model using a floating non-preemptive region can also be employed. Whenever a task wants to preempt a lower-priority task, it must wait until the execution of this non-preemptive region ends. CRPDs can be included in the worst-case execution times of the tasks, for instance using a function describing how the preemption delay evolves with the progress of the program [MNPP12a]. However, since the non-preemptive region is floating, it is difficult to take cache-related preemption delays into account when deciding whether or not to preempt a task.

Another model based on non-preemptive regions uses fixed preemption points in the code of each task. A task can thus not be preempted while it executes code located between two consecutive preemption points. Several algorithms for placing preemption points have been proposed so as to minimise the total cache-related preemption delay [BBM+10, BBM+11, PFB14, CTF15].
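The preemption-threshold rule mentioned above can be sketched as follows (an illustrative Python fragment under our own assumptions: a higher number means a higher priority, and the `Job` fields are our own names):

```python
from collections import namedtuple

# Illustrative job record: a base priority plus a preemption threshold.
Job = namedtuple("Job", ["priority", "threshold"])

def can_preempt(running, candidate):
    """Preemption-threshold rule: `candidate` may preempt `running` only
    if its priority exceeds both the running job's priority and the
    running job's preemption threshold."""
    return (candidate.priority > running.priority
            and candidate.priority > running.threshold)
```

A job of priority 3 cannot preempt a running job of priority 2 whose threshold is 3, even though it has the higher base priority; raising thresholds thus trades preemptions (and their CRPDs) against responsiveness.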

However, none of the methods presented above is optimal, in the sense that some feasible systems may not be schedulable with them. The more general problem of making scheduling decisions that take the cache into account must therefore be addressed. Yet, to the best of our knowledge, this question has received little attention in the literature, which is why it constitutes one of the main research directions of this thesis.

Contributions

We now briefly present the main contributions of this thesis.



Real-time scheduling with cache memories

In this thesis, we therefore consider the problem of scheduling independent hard real-time tasks on a uniprocessor system equipped with a cache memory. As explained previously, cache effects must indeed be taken into account to guarantee the predictability of the system. One solution is to bound these effects so that they can then be integrated into the schedulability analysis. But the additional delays due to cache misses may also compromise the schedulability of the system. To solve this problem, scheduling decisions must be made by explicitly taking the cache into account.

Classical schedulers decide, at a given instant, to schedule the current job of a ready task based on the priority of that job. For instance, Rate Monotonic considers task periods, while Earliest Deadline First relies on the absolute deadlines of the jobs. Neither policy considers the impact of the cache, which may create additional delays leading to a deadline miss. It is therefore necessary to introduce additional parameters able to capture the behaviour of the cache and its impact on the schedule.

Scheduling problem with cache

A first approach to taking the cache into account while scheduling is to consider precisely the individual cache accesses performed during the execution of the system. The cache contents must then be known at every instant, and the memory accesses of each task throughout its execution must be known as well. Using this knowledge, in addition to the classical temporal parameters defining the priority of a job (period, deadline, ...), the scheduler can decide which job to execute at a given instant so as to favour the reuse of instructions/data already present in the cache or, equivalently, to minimise cache evictions. Maximising cache reuse seems particularly appropriate for tasks using shared libraries (instructions common to several tasks) or tasks working on shared data (for instance, computations on matrices). This approach corresponds to the scheduling problem with cache.
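For the single-line cache of the basic problem studied below, a reuse-maximising scheduling decision can be sketched as follows (an illustrative Python fragment; the job representation and tie-breaking rule are our own assumptions, not the thesis' algorithm):

```python
def pick_job(ready_jobs, cache_line):
    """Cache-aware decision for a one-line cache: among ready jobs,
    prefer one whose next memory access hits the block currently cached,
    i.e. maximise cache reuse (equivalently, minimise evictions).
    A job is a tuple (name, next_access, abs_deadline)."""
    hits = [j for j in ready_jobs if j[1] == cache_line]
    pool = hits if hits else ready_jobs
    return min(pool, key=lambda j: j[2])  # break ties by earliest deadline
```

If the cached block is `"x"`, a job about to access `"x"` is preferred even over a job with an earlier deadline; this is exactly the kind of trade-off between temporal urgency and cache reuse that makes the problem hard.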

To study the complexity of this problem, we define a basic problem relying on several simplifications. In particular, we consider a cache consisting of a single line (that is, a single location in which to store one instruction/datum). Moreover, we use a task model in which every job has a single possible execution path (and hence a single sequence of memory accesses). This yields a basic scheduling problem that is as simple as possible while covering all cache types (direct-mapped, set-associative or fully-associative). Consequently, if the basic problem is hard to solve, then all its generalisations necessarily are as well.

We first prove that the preemptive scheduling problem with cache is NP-hard in the strong sense, which means that it cannot be solved optimally by an algorithm running in polynomial or pseudo-polynomial time. We then turn to the non-preemptive scheduling problem with cache. When no cache memory is considered, the non-preemptive scheduling problem has been extensively studied and was notably proven NP-hard in [GJ79]. We show that, on a uniprocessor system with a cache memory, the non-preemptive scheduling problem with cache is NP-hard in the weak sense. Even though the existence of an optimal pseudo-polynomial-time algorithm for this problem cannot be formally ruled out, we believe that it is in fact strongly NP-hard, as in the preemptive case.

Scheduling with cache is an interesting approach, as it offers a precise view of the system and should therefore lead to better schedulability results. But this high degree of precision comes with several drawbacks:

- In a realistic real-time application, each task may have code consisting of thousands of instructions, resulting in thousands of memory accesses. This makes the task model potentially very complex and hard to handle.

- The code of a task, even when only a few dozen lines long, often contains several conditional instructions defining different execution paths for the task. To compute the worst-case execution time of a task, the most unfavourable execution path is considered. But during the actual execution of the task, this worst-case path will probably not be the one followed; moreover, each job of the task may follow a different path. To identify a worst-case scenario under the scheduling-with-cache approach, every possible path would have to be considered, together with all possible combinations with every other task of the system.

Consequently, the scheduling-with-cache approach is almost impossible to use in practice, except for extremely simple systems.

Scheduling problem with crpds

Another approach to accounting for the impact of the cache on scheduling is to consider cache-related preemption delays. Unlike the scheduling problem with cache, the cache contents and the memory accesses of the different tasks are not known here. Instead, upper bounds on the cache-related preemption delays are used. At every instant, the scheduler thus only knows the cost to be paid to preempt the currently executing job, and can therefore make an informed decision on whether or not to preempt it, so as to reduce the global preemption delay of the system. Reducing this delay in turn decreases the processor workload. This second approach corresponds to the scheduling problem with crpds. It is, in fact, the approach commonly used in most works on real-time scheduling with cache memories, for instance in [LHS+97, ADM12, LAMD13, BBM+10, BXM+11].

To study the complexity of the scheduling problem with crpds, we once again consider a basic problem, relying on the following simplification: a single upper bound on the cache-related preemption delay, identical for all tasks and irrespective of the preemption point, is used. For this basic problem, the worst-case crpd of the system is thus charged at every preemption. A safe upper bound is, for instance, to assume that the whole cache must be reloaded whenever a task resumes execution after a preemption, as proposed in [BMSO+96]. This basic problem is as simple as possible, so that its hardness results readily carry over to problems using a finer model of the crpd parameter (for instance, a different preemption delay for each task and each preemption point). We first show that no online algorithm using either fixed task priorities or fixed job priorities can solve the scheduling problem with crpds optimally. We then prove that this problem is, in fact, NP-hard in the strong sense, and consequently cannot be solved optimally by an algorithm running in polynomial or pseudo-polynomial time. We also show that the scheduling problem with crpds remains weakly NP-hard even when there are only two distinct release dates and two distinct deadlines for the tasks and a crpd of one time unit.

The scheduling-with-crpds approach is easier to use in practice than the scheduling-with-cache approach. Indeed, the crpd parameter represents an upper bound on the cache-related preemption delay a task can suffer, so it is no longer necessary, while scheduling, to consider the different possible execution paths of the task or the cache contents at every instant. The main difficulty of scheduling with crpds remains the definition of the parameter representing the cache-related preemption delay. This parameter must be a safe upper bound on the crpd, yet precise enough to avoid introducing too much pessimism, which would in turn waste hardware resources (in particular processor time). The crpd can be bounded by considering either:

- an upper bound common to all tasks,

- an upper bound per task,

- a set of upper bounds per task that also depends on the possible preempting tasks,

- or, finally, a set of upper bounds per task that depends both on the possible preempting tasks and on the location of the preemption points in the code of the task.

The easiest way to proceed is, of course, to assume that the whole cache is reloaded after each preemption; but this approach is very pessimistic. The last approach above is by far the most precise. But, as with scheduling with cache, a task may have several possible execution paths, so the scheduling problem can become extremely complex.

In the remainder of this thesis, we consider scheduling with crpds, using, for the cache-related preemption delay, one upper bound per task (independent of the actual preemption point in the task's code and of the possible preempting tasks). The task model used in classical scheduling is modified accordingly: in addition to its period, deadline and worst-case execution time, each task is given an extra parameter representing an upper bound on the cache-related preemption delay paid by the task every time it resumes execution after a preemption.
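The extended task model described above can be written down as a small record (an illustrative Python sketch; the field names are our own, not the thesis' notation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    """Classical task model (WCET, deadline, period) extended with
    `crpd`: a per-task upper bound on the cache-related delay paid each
    time one of its jobs resumes after a preemption."""
    wcet: float
    deadline: float
    period: float
    crpd: float
```

A schedulability analysis then charges `crpd` once per resumption, on top of the `wcet` budget of the preempted job.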




Online scheduling with cache-related preemption delays

We first consider the online scheduling problem with crpds. A scheduler is said to be online if it makes its decisions during the execution of the system based only on the system's current state. An online scheduler thus only knows the temporal parameters (release dates, deadlines, ...) of the jobs active at the current instant. It is not clairvoyant, i.e. it has no knowledge of future jobs. The online scheduling problem has been extensively studied for uniprocessor systems without cache memory. Three policies are commonly used: Rate Monotonic (rm), Deadline Monotonic (dm) and Earliest Deadline First (edf). In particular, edf is optimal when cache effects are not considered. However, when crpds are taken into account, this is no longer the case, as explained in the study of the scheduling problem with crpds.
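The EDF decision rule discussed above is a one-liner (an illustrative Python sketch; the dictionary-based job representation is our own assumption):

```python
def edf_pick(ready_jobs):
    """Earliest Deadline First: among the active jobs, run the one with
    the earliest absolute deadline. Note that this choice ignores any
    CRPD the preempted job will later pay, which is precisely why EDF
    loses its optimality once cache effects are accounted for."""
    return min(ready_jobs, key=lambda j: j["deadline"])
```

The rule is optimal on a cache-less uniprocessor, but a deadline-driven preemption may trigger a CRPD large enough to cause a miss that a less eager policy would have avoided.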

Sustainability of the rm, dm and edf policies for scheduling with crpds

One of the goals of scheduling is to guarantee the determinism of the system. In particular, the schedulability analysis must consider the worst-case scenario so as to ensure that all deadlines will be met during the execution of the system (whatever scenario actually occurs, in particular whatever the actual execution times of the tasks). To build this worst-case scenario, the most unfavourable values are usually assumed for the different parameters of the task model: for instance, every job runs for its worst-case execution time and, for sporadic tasks, every task is released as early as possible. But during the actual execution of the system, execution times will often be shorter and, for sporadic tasks, releases will be further apart. One must therefore make sure that a system whose schedulability was established under a given scheduling policy on the basis of worst-case parameters remains schedulable under that policy for more favourable parameter values. This amounts to studying the sustainability of the scheduling policy, as introduced by Burns and Baruah in [BB08] for systems without cache memory.

We therefore start by extending the definition of sustainability to scheduling with crpds. In particular, we define the notion of sustainability with respect to the cache-related preemption delay: a system must remain schedulable if the value of the crpd parameter is decreased. As with worst-case execution times, the actual crpds during the execution of the system are often smaller than the value of the preemption-delay parameter (which is, indeed, only an upper bound on the crpd). It is therefore crucial to ensure that the scheduling policy in use is sustainable with respect to the crpd parameter.

With these definitions in place, we study the sustainability of the rm, dm and edf scheduling policies when cache-related preemption delays are considered. We show that, unfortunately, neither Rate Monotonic, nor Deadline Monotonic, nor Earliest Deadline First is sustainable once crpds are taken into account. In particular, reducing the execution time or the crpd of a task can cause a deadline miss in a system that was schedulable with rm, dm or edf. This notably implies that simulation cannot be used to validate a system with a cache memory, unless all possible scenarios are considered (that is, among others, all combinations of possible values for the task execution times and the crpds), which is impossible in practice.

Optimal online scheduling with crpds

Since none of the classical online policies is optimal for the scheduling problem with crpds, we study the existence of such an optimal scheduler. We show that an optimal online algorithm for the scheduling problem with crpds, in the case of sporadic tasks, must necessarily be clairvoyant. To this end, we use a competitive analysis, as employed for instance in [BEY05]. We work on an example of our own construction in which, at a given instant, any online algorithm must make a scheduling decision. Depending on this decision, a new job is generated so as to cause a deadline miss. On the contrary, a clairvoyant adversary, which does know about this new job, can make a scheduling decision that meets all deadlines. The system is thus feasible, but no online algorithm can schedule it without missing a deadline. Hence, optimal online scheduling of sporadic tasks subject to cache-related preemption delays is impossible, because an online algorithm cannot be clairvoyant.

Offline scheduling with cache-related preemption delays

Since no online scheduler can be optimal for the scheduling problem with crpds, we then focus on offline scheduling. In this case, the schedule is built before the system is run, which assumes complete knowledge of the system, in particular of all the tasks and their temporal characteristics. During its execution, the system then simply follows this pre-computed schedule to decide which job to execute at every instant. Since we have shown that no polynomial-time or pseudo-polynomial-time algorithm can optimally schedule tasks subject to cache-related preemption delays (unless P = NP), we propose here solutions based on linear optimisation.

The idea is to use mathematical programming to build a feasible schedule (that is, one meeting all task deadlines) whenever possible. More precisely, we use linear programming in which some variables may take non-integer values (Mixed-Integer Linear Programming, milp).

To this end, we view a schedule as a sequence of consecutive slices delimited either by the release of a job or by a deadline. Thus, within a slice, no new job can be released and no deadline has to be met before the end of the slice. We then regard each job as a sequence of sub-jobs, each of which can only execute within one specific slice. Defining a schedule then amounts to computing the start date of each sub-job, as well as its execution time. Note that assigning a zero execution time to a given sub-job expresses that the corresponding job does not execute in the slice associated with that sub-job.
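The slice construction described above can be sketched as follows (an illustrative Python fragment; the job representation and function names are our own assumptions, not the thesis' MILP notation):

```python
def slices(jobs, horizon):
    """Build the consecutive scheduling slices used by the MILP model:
    slice boundaries are every job release and every absolute deadline
    within the horizon, so no release or deadline falls strictly inside
    a slice. `jobs` maps a job name to (release, abs_deadline)."""
    points = {0, horizon}
    for release, deadline in jobs.values():
        points.update((release, deadline))
    ordered = sorted(p for p in points if 0 <= p <= horizon)
    return list(zip(ordered, ordered[1:]))
```

Each job then gets one sub-job variable per slice between its release and its deadline; the MILP decides each sub-job's start date and (possibly zero) execution time.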




We then define a set of mathematical equalities and inequalities to ensure that the solutions of the mathematical program are indeed valid schedules (a task cannot start executing before its release date, no deadline may be missed, ...):

- constraints ensure that each job executes, in total, for exactly the worst-case execution time of the corresponding task,

- constraints ensure that the different sub-jobs of a given slice start and finish their execution within the bounds of that slice, and that the total execution time of these sub-jobs does not exceed the size of the slice,

- constraints ensure that two sub-jobs cannot execute at the same time within a slice,

- and, finally, constraints ensure that a crpd is paid by each job whenever it resumes execution after a preemption.

Then, to choose a schedule among all the valid solutions of the problem, we define an objective function that minimises the global cache-related preemption delay. Minimising the total crpd in fact amounts to reducing the worst-case processor workload, which is the goal commonly pursued by most classical scheduling algorithms. Indeed, this leads to more robust systems (cf. [But11]), that is, systems that remain schedulable even under scenarios worse than the worst-case one, which can happen if tasks are added to the system or if execution times are longer than expected.

Since the number of potential solutions of the mathematical program can be very large, we only consider schedules in which a job executes at most once per slice; each job can therefore pay at most one crpd per slice. The number of valid schedules is thus considerably reduced, without harming the final solution: a schedule in which a given job may execute twice within the same slice always has a higher processor utilisation than a schedule in which a job can execute only once per slice.

First approach

We first propose an initial approach. To obtain a simpler mathematical model, we consider a slightly modified scheduling problem: the worst-case execution time of each job is decreased by the value of the crpd parameter and, in return, each job pays a preemption delay when it starts executing for the first time. This reduces the number of constraints of the mathematical program while keeping the two problems equivalent. However, for such a transformation to be possible, we must restrict ourselves to systems in which the crpd of each task is smaller than its worst-case execution time, which obviously holds in realistic use cases. Nevertheless, since the crpd parameter is in fact an upper bound on the cache-related preemption delay, it may exceed the worst-case execution time of the task if a particularly pessimistic approach was used to bound the crpd (for instance, assuming that the whole cache is reloaded after each preemption).

For this first approach, we further assume that the preemption delay paid by a job resuming execution in a given slice can be paid entirely within that slice: a preemption delay cannot spill over into the next slice. In most cases, this assumption does not prevent building a valid schedule. However, notably for systems requiring almost all of the available processor time and subject to long crpds, it may happen that the only valid schedules violate this assumption. This first approach is therefore not optimal for the scheduling problem with crpds.

Second approach

We therefore propose a second approach to solve the optimality issue. We no longer consider a transformed problem, which requires modifying the constraints of our mathematical program that define preemptions.

We must then allow a preemption delay to be spread over several consecutive slices if needed. To do so, and to avoid adding too many extra variables and constraints, we include cache-related preemption delays directly in the execution times. The execution time computed for each sub-job thus covers both the execution of the corresponding job in the considered slice and, potentially, a crpd (or part of a crpd) paid if the job resumes execution in that slice. We therefore no longer track the actual position of the crpd, which allows a delay to span several consecutive slices. In return, we must make sure that no preemption delay is anticipated, that is, paid before the corresponding preemption. We then prove that no crpd is anticipated in any schedule minimising the global preemption delay. This new approach thus solves the scheduling problem with crpds optimally.

Note that the mathematical complexity of our solution grows with the number of tasks and the number of slices to consider. More importantly, the time needed to solve the optimisation problem explodes with the number of variables of the milp. To keep computation times reasonable, our approach must therefore be restricted to small task systems.

Finally, we study the impact of the parameter representing the cache-related preemption delay on the optimality of our solution. Our approach considers an upper bound on the crpd that depends only on the preempted task. Under this relatively simple model, it is optimal in the sense that it can build a valid schedule whenever one exists. However, such a bound may turn out to be very pessimistic. For instance, if the preempted task and the tasks that may preempt it do not access the same cache locations, the cache-related preemption delay is actually zero, since no memory block is in fact evicted from the cache. To reduce this pessimism, more precise bounds (taking into account both the preempted task and all preempting tasks) can be used. But this requires modifying the mathematical program underlying our approach by adding a large number of extra variables and constraints, and the resulting complexity may make the approach unusable in practice.




Evaluating the impact of the cache on schedulability

Now that we have an optimal approach for solving the scheduling problem with crpds, we can use it, notably as a feasibility test, to determine whether a given system is schedulable when cache-related preemption delays are considered. We can thus measure the schedulability loss of classical policies such as Rate Monotonic and Earliest Deadline First once crpds are taken into account. We also want to assess how effective our offline solution is at scheduling systems.

Pour cela, nous menons deux séries d'expérimentations implémentées avec le langage de programmationMATLAB. Les caractéristiques de chaque tâche (pires durées d'exécutions, périodes, échéances) sontgénérées aléatoirement en suivant le protocole proposé notamment dans [ADM12, LAMD13]. Dansle premier cas, un paramètre pour chaque tâche représentant directement le crpd est utilisé. Dansle deuxième cas, le paramètre considéré pour chaque tâche s'appuie sur le cache, et en particulierl'occupation du cache par chaque tâche.Pour toutes les expérimentations, nous considérons Rate Monotonic et Earliest Deadline First faceà notre approche hors-ligne optimale. Nous mesurons ensuite principalement l'ordonnançabilité dessystèmes pour chaque politique utilisée lorsque di�érents paramètres d'entrée sont modi�és. Lorsquel'impact de l'utilisation totale du temps processeur est considéré, une simple mesure du nombre desystèmes de tâches ordonnançables est possible. Cependant, lorsque d'autres paramètres d'entrée sontétudiés (facteur de réutilisation, utilisation totale du cache...), il est di�cile de les considérer sansprendre en compte également l'impact de l'utilisation du processeur. Ainsi, il est nécessaire d'utiliser unemesure plus évoluée (appelée en anglais Weighted Schedulability [BBA10]) qui permet de réduire desgraphes tri-dimensionnels à seulement deux dimensions sans avoir à donner une valeur �xe à l'utilisationdu processeur. L'ordonnançabilité pour Rate Monotonic et Earliest Deadline First est mesurée enutilisant des analyses d'ordonnançabilité prenant en compte les crpds. Pour notre approche hors-ligne, un système de tâches est déclaré ordonnançable dès lors qu'un ordonnancement valide peut êtreconstruit. Nous utilisons le solveur CPLEX 12.6.1 d'IBM pour résoudre le programme mathématique.Cependant, en�n de contenir l'explosion du temps de résolution du milp, le nombre d'instances detâches doit être limité. 
We thus consider, for all our experiments, task systems releasing at most 200 jobs over the hyperperiod (the least common multiple of the task periods). Likewise, to limit the size of the hyperperiod, we restrict the task periods to a narrow interval (1 to 10 ms).
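As an illustration of the Weighted Schedulability measure [BBA10] mentioned above, the sketch below (the function name is ours, not from the thesis) weights each task set's binary schedulability verdict by its total processor utilisation, which removes utilisation as an explicit plot axis:

```python
def weighted_schedulability(results):
    """Weighted Schedulability [BBA10].

    results: list of (utilisation, schedulable) pairs, one per task set.
    Returns the utilisation-weighted fraction of schedulable task sets.
    """
    total = sum(u for u, _ in results)
    return sum(u for u, ok in results if ok) / total

# Three generated task sets: the heavily loaded one is unschedulable.
w = weighted_schedulability([(0.5, True), (0.9, False), (0.6, True)])
assert abs(w - 0.55) < 1e-9  # (0.5 + 0.6) / (0.5 + 0.9 + 0.6)
```

High-utilisation task sets thus count more than light ones, which is why the measure penalises policies that fail precisely where the processor is busiest.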
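The 200-job cap described above can be checked cheaply when generating task sets: the number of jobs released over the hyperperiod is the sum of H/T_i over the tasks, where H is the lcm of the periods. A minimal sketch (the helper name is ours):

```python
from math import lcm

def jobs_over_hyperperiod(periods):
    """Hyperperiod (lcm of the periods) and number of jobs released over it."""
    h = lcm(*periods)
    return h, sum(h // p for p in periods)

# Drawing periods from a narrow range (as in the 1-10 ms interval above)
# keeps both the hyperperiod and the job count small.
h, n_jobs = jobs_over_hyperperiod([2, 5, 10])
assert (h, n_jobs) == (10, 8)  # 5 + 2 + 1 jobs over the hyperperiod
```

A generated task set would be discarded and redrawn whenever the returned job count exceeds the cap.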

Experiments based on a crpd parameter

For the first series of experiments, we directly consider, for each task, a timing parameter representing the crpd that the task pays when it resumes execution after a preemption. This parameter is generated randomly from the worst-case execution time of the task at hand; it is capped at a maximum percentage of that execution time by a maximum factor, which is an input variable of these experiments.

For this first case, we also measure the average number of preemptions generated by each scheduling policy, as well as the global cache-related preemption delay relative to the hyperperiod of the system.
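The generation rule described above can be sketched as follows. This is a hedged illustration: the function and parameter names are ours, and the uniform draw is only an assumption about the shape of the distribution, which the thesis does not specify here:

```python
import random

def draw_crpd(wcet, max_factor, rng=random):
    """Random crpd parameter for a task, capped at max_factor * wcet."""
    return rng.uniform(0.0, max_factor * wcet)

# With max_factor = 0.25, a preemption never costs more than 25% of the WCET.
gamma = draw_crpd(wcet=4.0, max_factor=0.25)
assert 0.0 <= gamma <= 0.25 * 4.0
```

The max_factor input is then swept across experiments to study how increasingly expensive preemptions degrade schedulability.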


We observe that the schedulability of Rate Monotonic, Earliest Deadline First and our offline approach degrades as the total processor utilisation or the maximum allowed preemption-delay factor increases. However, our offline approach clearly dominates the two other scheduling policies, especially for high processor utilisations or high allowed factors (which amounts to larger cache-related preemption delays on average). Indeed, the number of preemptions, and therefore the global preemption delay, is much lower for our optimal approach than for Rate Monotonic or Earliest Deadline First.

Experiments based on cache parameters

For the second series of experiments, we no longer generate a preemption delay directly per task. Instead, we proceed as in [ADM12, LAMD13] and consider several input variables representing the cache (cache size, block reload time) as well as its usage by the different tasks (total cache utilisation, reuse factor). We can then randomly generate a set of ecbs and ucbs for each task, representing each task's impact on the cache. A crpd for each task can then easily be derived from these ecbs and ucbs. In particular, we consider the ucb-only approach proposed in [LHS+98, LAMD13].

In addition to the schedulability of each system, we also measure a speedup factor here. It corresponds to the factor by which the processor speed must be increased, and also, in our case, the block reload time reduced, in order to make every system schedulable.

These experiments highlight the schedulability loss of Rate Monotonic, Earliest Deadline First and our offline approach when the processor utilisation increases, but also when the cache utilisation, the block reload time or the reuse factor increases. Once again, our offline approach clearly dominates the two others. In particular, by studying the speedup factor required by each scheduling policy, we find that, for a heavily loaded processor, a system almost 50% faster may be needed for Rate Monotonic and Earliest Deadline First to reach the same schedulability results as our offline approach.

Finally, we study the impact of the crpd model on our offline solution. As soon as upper bounds on the preemption delay are adopted, our approach is no longer optimal. Nevertheless, it still clearly dominates Rate Monotonic and Earliest Deadline First.
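The ucb-only bound of [LHS+98, LAMD13] referred to above charges, for one preemption, the block reload time (BRT) once per Useful Cache Block of the preempted task, i.e. per cached block that may still be reused after the preemption. A minimal sketch (set contents and names are illustrative):

```python
def crpd_ucb_only(ucbs_preempted, brt):
    """ucb-only upper bound on the cost of a single preemption.

    Every useful cache block of the preempted task may have been evicted
    by the preempting task and may therefore need to be reloaded.
    """
    return brt * len(ucbs_preempted)

ucb_tau1 = {0, 1, 4, 5}  # cache sets holding blocks tau_1 may reuse later
assert crpd_ucb_only(ucb_tau1, brt=2) == 8
```

Note that this bound looks only at the preempted task; this is precisely the kind of simple crpd model under which our offline approach is optimal, whereas tighter multiset bounds also intersect with the ecbs of the preempting tasks.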

Conclusion and perspectives

In this thesis, we studied the problem of scheduling real-time tasks while taking cache-related preemption delays into account. More specifically, we focused on:

1. the theoretical problem of hard real-time scheduling on a uniprocessor with a cache memory,

2. the impact of cache-related preemption delays on classic online scheduling policies (in particular Rate Monotonic and Earliest Deadline First),


3. the proposal of approaches for optimally scheduling hard real-time tasks while taking cache-related preemption delays into account.

After introducing some basic notions about real-time embedded systems and cache memories in Chapter 1, we turned, in Chapter 2, to real-time scheduling and described more precisely what cache-related preemption delays are. With these concepts in place, we presented, in Chapter 3, a brief classification of the main research works in the real-time literature dealing with cache memories and real-time scheduling. In particular, we saw that none of the proposed solutions clearly dominates the others. Very often, methods combining different approaches are needed to achieve significant improvements in system schedulability and/or predictability. Moreover, to the best of our knowledge, no optimal scheduling approach had been proposed when cache effects are taken into account.

In Chapter 4, we formalised the problem of scheduling hard real-time tasks while accounting for cache effects. We identified two distinct scheduling problems: the cache-aware scheduling problem and the crpd-aware scheduling problem. In the cache-aware scheduling problem, the scheduler knows the cache state (i.e. its contents) at every instant, as well as all the memory accesses performed by the tasks during their execution. In contrast, in the crpd-aware scheduling problem, the scheduler makes its scheduling decisions based only on upper bounds on the extrinsic cache interference (i.e. the cache-related preemption delays). We studied both problems theoretically by considering two core problems. We thus proved that the cache-aware scheduling problem and the crpd-aware scheduling problem are both NP-hard in the strong sense. Consequently, we showed that explicitly taking cache memories into account radically changes the scheduling problem, so existing theoretical results do not carry over directly. We then chose to focus on the crpd-aware scheduling problem, since cache-aware scheduling is almost impossible to use in practice. We therefore briefly addressed the modelling of the cache-related preemption delay for the crpd-aware scheduling problem, so as to find a trade-off between accuracy and practicality.

Then, in Chapter 5, we studied the impact of the cache on online scheduling. We first showed that the classic scheduling policies Rate Monotonic, Deadline Monotonic and Earliest Deadline First are not sustainable when cache-related preemption delays are taken into account. We then considered the problem of finding an optimal online solution to the crpd-aware scheduling problem, and proved that such an algorithm would necessarily have to be clairvoyant. Optimal online scheduling with crpds is therefore impossible.

Since optimal online scheduling with crpds is impossible, we proposed, in Chapter 6, two offline approaches for scheduling hard real-time tasks while accounting for cache-related preemption delays. Both solutions use linear programming to build a feasible offline schedule. We showed that our second approach is optimal when a simple crpd model is used. We also evaluated the computation time of our solution.


Finally, in Chapter 7, we examined the impact of cache-related preemption delays on system schedulability when Rate Monotonic, Earliest Deadline First and our optimal offline solution are used. To do so, we used randomly generated tasks under two different models for the parameter corresponding to the cache-related preemption delay. The schedulability loss of Rate Monotonic, Earliest Deadline First and our offline approach was measured while varying different input parameters such as the total processor utilisation, the total cache utilisation, the memory-block reload time, etc. We observed that these parameters have a significant impact on system schedulability.

To summarise, the main contributions of this thesis are:

- the identification and formalisation of two scheduling problems involving a cache memory: the cache-aware scheduling problem and the crpd-aware scheduling problem,

- the study of the complexity of these two problems: the cache-aware scheduling problem and the crpd-aware scheduling problem are both NP-hard in the strong sense,

- the non-sustainability of the online scheduling policies Rate Monotonic, Deadline Monotonic and Earliest Deadline First when crpds are taken into account,

- the impossibility of optimally scheduling hard real-time tasks online when crpds are taken into account,

- the proposal of an optimal offline solution to the crpd-aware scheduling problem.

Perspectives

Several points addressed in this thesis could be developed further in future work.

The experimental part could be extended by using more precise approaches to bound cache-related preemption delays (such as the ecb- and ucb-Union Multiset approaches introduced in [ADM12, LAMD13]). Other scheduling policies could also be compared with our optimal offline approach, such as those discussed in Section IV.2 of Chapter 3, which limit the number of preemptions. Moreover, it remains to design and implement an experimental plan that would link the generation of the ecbs and ucbs (and thus of the crpd) of each task to the code of that task (and thus to its worst-case execution time).

Regarding our offline scheduling solution, its optimality could be extended to other use cases by considering a more precise model for the parameter representing the cache-related preemption delay. In particular, accounting for the cache effects due to both the preempted task and the preempting tasks seems necessary. However, a straightforward extension of the mathematical program presented in this thesis (introducing new variables and new constraints) would lead to a very high mathematical complexity, making such a solution unusable in practice. A different approach will therefore probably be needed to solve this problem.


An interesting (but particularly difficult) question also concerns the design of a feasibility test that takes cache-related preemption delays into account.

In this thesis, we proved that optimal online scheduling with crpds is impossible. It would therefore be interesting to propose heuristics and compare them with our optimal offline approach. In particular, combinations of different methods should be considered in order to develop online scheduling policies that account for cache-related preemption delays. Indeed, the performance of a real-time system depends simultaneously on the tasks' worst-case execution times, the cache management policy (replacement policy, partitioning and/or locking techniques), the schedulability analysis used to validate the system, and the scheduler that makes the scheduling decisions at run time. Since each of these techniques introduces overestimations to guarantee system predictability, improvements in real-time system design can only come from addressing all these problems together. For example, partitioning or locking techniques can reduce the extrinsic cache interference and thus lower the cost of a preemption. Then, a preemption-controlling scheduling policy, for instance one using fixed preemption points in the task code, can be used to reduce the number of preemptions and thus the global cache-related preemption delay. Finally, a cache-aware schedulability analysis can be used to validate the system and avoid wasting hardware resources (processor time in particular). Future work should therefore aim at improving such combinations. The main challenges concern the identification of the model required to represent the system (fine- or coarse-grained), as well as the design of solutions that jointly handle the memory accesses of the different tasks (in particular the cache management policy), the scheduling of these tasks (for instance through preemption control) and the schedulability question.

Finally, the case of multiprocessor systems is particularly interesting. Cache-related problems become even harder on multiprocessors: several levels of cache memory are used, which can be either private or shared, making cache analysis all the more complex. Moreover, under global scheduling, tasks (or their jobs) can migrate from one core to another, causing additional delays called Cache-Related Migration Delays.


Acknowledgements

First of all, I would like to thank my thesis supervisor, Pascal RICHARD, for the quality of his supervision over these three years and for his remarkable human approach; without him, this thesis could not have come to fruition. I would next like to thank in particular Emmanuel GROLLEAU, co-director of the Laboratoire d'Informatique et d'Automatique pour les Systèmes (LIAS), who was behind my arrival at LIAS and thanks to whom I had the opportunity to undertake this thesis. I would also like to express my gratitude to the people I collaborated with during this thesis, in particular Claire MAIZA, Joël GOOSSENS and Laurent GEORGE.

I would also like to express my warmest thanks to Isabelle PUAUT, Liliana CUCU-GROSJEAN and Sanjoy BARUAH for agreeing to review my thesis. The attention they gave this work and their positive feedback touched me deeply.

I also thank everyone at LIAS with whom I worked or whom I got to know during these three years: Ladjel, Hallel, Mickaël and Michaël, Laurent, Stéphane, Fred, Annie, Brice, Zoé... Special thanks to Henri for our many discussions, especially during shared lab sessions, which were very enriching. I also thank Mr Boubaker EL HADJ AMOR, one of the people I worked under when supervising lab sessions, for his help and advice, which were invaluable for this first teaching experience. And of course, I thank Claudine, Fred, Monique and Chantal for their help and for their warm presence.

My thanks then go to all my fellow PhD students and friends: Yves, Georges, Paule, Akli, Anh Toan, Cyrille, Badran, Zakaria, Abdallah, Amine, Abdelkader, Abdelkrim... and especially those with whom I shared, from the beginning, countless meals, outings, cinema trips, indoor football games and more: Selma, Geraud, Olga, Thomas, Yassine and my Algerian friends Okba, Nadhir, Zohir, Lahcene, Ahcene, Jamal...

Of course, thanks also to my friends from outside the lab: Edwin, Eric, Alex, Bertrand and Rafael, for those moments of sweat (a little) and laughter (above all). Thanks to Wafa and Sabri, my Tunisian friends, in particular for those four magnificent days in Hammamet, an exceptional and more than welcome break during those long weeks of writing. Finally, thanks to my long-standing friends,


Stéphane, with whom I shared so many evenings full of good humour, and Zied and Hussein, who suffered the most from my mood swings. Their presence by my side was invaluable.

I would like to close these acknowledgements by expressing my deepest gratitude to my family, and especially my parents and my sister, for having accompanied, supported and encouraged me for so many years. I owe them a great deal.

Thank you all!


Contents

Acronyms 39

Notations 41

Foreword 43

General Introduction 49

I Motivation  49
II Objectives and Solution overview  51
III Organization  52

Part I Research Foundations 55

1 Real-Time Systems and Cache Memories 57

I Introduction  59
II Real-Time and Embedded Systems  59

II.1 Definitions  59
II.2 Structures of Real-Time Embedded Systems  62
II.3 Real-Time cots-based Systems  64

III Cache Memories  65
III.1 Main principles  65
III.2 Cache organization  67
III.3 Timing anomalies  70

IV Conclusion  71

2 Real-Time Scheduling and Cache-Related Preemption Delays 73

I Introduction  75
II Real-Time Tasks  75

II.1 Definition  75
II.2 Periodic/Sporadic Task characteristics  76
II.3 Task Worst-Case Execution Time  78


III Scheduling Real-Time Tasks  81
III.1 Real-Time Scheduling  81
III.2 Real-Time Schedulability Analysis  83
III.3 Sustainability  84
III.4 Computational complexity  85

IV Scheduling and preemption delays  86
IV.1 The problem of preemption delays  87
IV.2 Cache-Related Preemption Delays  87
IV.3 Effects of crpds on real-time scheduling: Motivational example  90

V Conclusion  91

3 Cache and Real-Time Scheduling 93

I Introduction  95
II Accounting for Cache-Related Preemption Delays  95

II.1 Into the Worst-Case Execution Times  95
II.2 Into the schedulability analysis  98

III Memory management  101
III.1 Cache partitioning  102
III.2 Cache locking  103
III.3 Memory layout  105
III.4 Other techniques  107

IV Enhanced scheduling approaches  107
IV.1 Limited preemption scheduling  107
IV.2 Cache-aware scheduling  110

V Prospects  111
VI Conclusion  112

Part II Contributions 113

4 Real-Time Scheduling with Cache Memories 115

I Introduction  117
II Problem Statement  117

II.1 Limitations of classic scheduling approaches  117
II.2 Scheduling approaches accounting for the cache  118
II.3 Proving computational complexity results  119

III The Cache-aware Scheduling Problem  120
III.1 Cache-aware scheduling  121
III.2 Core Problem Definition  122
III.3 Complexity of the Preemptive Problem  124
III.4 Complexity of the Non-Preemptive Problem  129


III.5 Limitations of the Cache-aware approach  130
IV The crpd-aware Scheduling Problem  131

IV.1 crpd-aware scheduling  131
IV.2 Core Problem Definition  131
IV.3 Complexity of the Problem  133
IV.4 Optimal algorithm for corner cases  136
IV.5 Discussion on the crpd-aware approach  137

V Conclusion  138

5 Online Scheduling with Cache-Related Preemption Delays 139

I Introduction  141
II Sustainability for crpd-aware scheduling  141

II.1 Definition  142
II.2 Sustainability of rm, dm and edf scheduling policies  143
II.3 Sustainability of schedulability tests and analyses  150

III Optimal Online Scheduling accounting for crpds  152
III.1 Scheduling independent jobs  153
III.2 Scheduling periodic tasks  153
III.3 Scheduling sporadic tasks  155

IV Conclusion  160

6 Offline Scheduling with Cache-Related Preemption Delays  163

I Introduction  165
II Basis of the approach  165

II.1 Presentation of the main idea  165
II.2 Feasible schedule property  167

III A nearly-optimal offline approach  168
III.1 Transformed model and assumptions  168
III.2 Mathematical Program  170
III.3 Application Example  174
III.4 Limitations of the nearly-optimal approach  175

IV An optimal offline approach  177
IV.1 Overcoming the limitations of the nearly-optimal approach  177
IV.2 Mathematical Program  181
IV.3 Application Example  184
IV.4 Comparison with the nearly-optimal approach  185

V Discussion on the optimal offline approach  186
V.1 Mathematical complexity and solving time issues  188
V.2 Impact of the crpd parameter model  191

VI Conclusion  192

7 Evaluation of the cache impact on schedulability 193

I Introduction  195
II General experimental plan  195


II.1 Common experimental settings  195
II.2 Common metrics  196

III Experiments based on a crpd parameter  198
III.1 Additional experimental settings  198
III.2 Additional metrics  198
III.3 Results  199

IV Experiments based on cache parameters  203
IV.1 Additional experimental settings  204
IV.2 Additional metric  205
IV.3 Results  207

V Discussion on the experimental plans  212
VI Conclusion  215

Part III Conclusion and Perspectives 217

General Conclusion 219

Perspectives  220

Appendices 223

A Related Publications 225

Bibliography 227


List of Figures

1 Example of a software bug leading to a critical failure: explosion of the Ariane 5 launcher on its first flight on June 4th, 1996. Photo: ESA.  49

2 Circular dependency between task execution times and Cache-Related Preemption Delays. 51

1.1 Schematic representation of a Real-Time System.  59
1.2 Example of the AMADO drone.  61
1.3 Architecture of a real-time system.  62
1.4 Evolution of embedded systems for the aeronautic industry based on data from Airbus and [LEJ15].  65
1.5 Schematic representation of the increasing gap between the CPU frequency and the main memory access time based on [HP90].  66
1.6 Example of memory hierarchy for a uniprocessor system with two levels of cache with typical values found on ARM® processors [ARM].  67
1.7 Different cache mappings.  68
1.8 Cache organization of a k-way Set-associative cache.  69
1.9 Example of block replacement for a 4-way fully-associative cache using the lru policy.  70
1.10 Example of a timing anomaly: experiencing a cache hit instead of a cache miss leads to a longer execution time.  71

2.1 Schematic representation of the main timing parameters of a real-time periodic task.  76
2.2 Possible execution times for a task.  78
2.3 wcet estimation chain  79
2.4 Example of wcet computation using ipet.  80
2.5 Schedule example for two synchronous periodic tasks τ1(2, 5, 5) and τ2(5, 10, 10) over the hyperperiod H = 10.  82
2.6 Usual representation of the main complexity classes.  86
2.7 Example of Cache-Related Preemption Delay.  88
2.8 Conventional representation of the crpds.  89
2.9 crpds and critical instant.  90

3.1 Schedule for tasks τ1(2, 9, 9), τ2(2, 9, 9) and τ3(3, 9, 9) for the worst-case scenario. . . . . . 100
3.2 Example of software cache partitioning for three tasks τ1, τ2 and τ3. . . . . . 102
3.3 Example of different locking techniques for two tasks τ1 and τ2. . . . . . 104
3.4 Schedules for Tasks τ1(1, 3, 3), τ2(1.5, 12, 12) and τ3(3, 6, 6) using Rate Monotonic for different task layouts. . . . . . 106


3.5 Non-dominance of fixed-task priority preemptive scheduling and fixed-task priority non-preemptive scheduling for the taskset presented in Table 3.2. . . . . . 108

3.6 Example of fixed-task preemption threshold scheduling for the taskset presented in Table 3.2. . . . . . 109

3.7 Example of Fixed Preemption Point scheduling for the taskset presented in Table 3.2. 110

4.1 Valid schedule for Taskset TExample1. . . . . . 118
4.2 Pseudo-code example and corresponding pseudo-assembly code. . . . . . 121
4.3 Schedule of three jobs using Memory blocks a, b and c. . . . . . 124
4.4 Feasible schedule for Jobs J1(p, d, aba) and J2(p, d, bab) with p = 1, r = 0.5. . . . . . 125
4.5 edf schedule and feasible schedule for Taskset τ. . . . . . 132
4.6 Pattern of feasible schedules in proof of Theorem 8. . . . . . 134

5.1 Example of a scheduling anomaly when decreasing an execution time used in the proof of Theorem 12. . . . . . 144
5.2 Example of a scheduling anomaly when increasing a period used in the proof of Theorem 13. . . . . . 146
5.3 Example of a scheduling anomaly when increasing a deadline used in the proof of Theorem 15. . . . . . 148
5.4 Example of a scheduling anomaly when decreasing a crpd used in the proof of Theorem 16. . . . . . 149
5.5 Schedule constructed by the online algorithm and feasible schedule in the proof of Theorem 21 - Case 1. . . . . . 154
5.6 Schedule constructed by the online algorithm and feasible schedule in the proof of Theorem 21 - Case 2. . . . . . 154
5.7 Different cases in the proof of Theorem 23. . . . . . 158
5.8 Schedule constructed by the online algorithm and feasible schedule in the proof of Theorem 24 - Case 1. . . . . . 159
5.9 Schedule constructed by the online algorithm and feasible schedule in the proof of Theorem 24 - Case 2. . . . . . 159

6.1 Schedule produced by rm and edf and feasible schedule for Taskset Texample1. . . . . . 166
6.2 Illustration of a subjob permutation in the proof of Theorem 4. . . . . . 168
6.3 Illustration of slice constraints. . . . . . 172
6.4 Illustration of job-piece disjunctive constraints. . . . . . 172
6.5 Illustration of preemption conditions. . . . . . 173
6.6 Schedule constructed by the nearly-optimal offline approach. . . . . . 174
6.7 Feasible schedule for Jobset Jexample2. . . . . . 177
6.8 Schedule for the proof of Property 5. . . . . . 180
6.9 Schedule constructed by the optimal offline approach. . . . . . 184
6.10 Schedule constructed by the optimal offline approach for Jobset Jexample2. . . . . . 186
6.11 Number of slices for the optimal offline approach as a function of the number of jobs per taskset. . . . . . 189
6.12 Number of milp variables for the optimal offline approach as a function of the number of jobs per taskset and of the number of slices per taskset. . . . . . 189


6.13 Number of milp constraints for the optimal offline approach as a function of the number of jobs per taskset and of the number of slices per taskset. . . . . . 190

6.14 Evaluation of the total computation time for the offline approach as a function of the number of jobs per taskset and of the number of slices per taskset. . . . . . 190

6.15 Feasible schedule for Jobset Jexample3 with a crpd parameter model taking into account both the preempting and preempted tasks. . . . . . 191

7.1 Number of schedulable tasksets under rm, edf, lp-edf and off as a function of the total processor utilization for a Maximum Preemption Delay Factor equal to 0.2. . . . . . 200

7.2 Average number of preemptions per job for rm, edf, lp-edf and off as a function of the total processor utilization for a Maximum Preemption Delay Factor equal to 0.2. . . . . . 201

7.3 Processor utilization due to the total crpd for rm, edf, lp-edf and off as a function of the total processor utilization for a Maximum Preemption Delay Factor equal to 0.2. . . . . . 201

7.4 Weighted schedulability for rm, edf, lp-edf and off as a function of the Maximum Preemption Delay Factor. . . . . . 202

7.5 Average number of preemptions per job for rm, edf, lp-edf and off as a function of the Maximum Preemption Delay Factor for U = 0.8. . . . . . 203

7.6 Processor utilization due to the total crpd for edf, lp-edf and off as a function of the Maximum Preemption Delay Factor for U = 0.8. . . . . . 203

7.7 Example of ecb and ucb placement in the cache for 4 tasks and a direct-mapped cache of size 8 cache lines. . . . . . 205

7.8 Number of schedulable tasksets under rm, edf and off as a function of the total processor utilization U. . . . . . 207

7.9 Processor and Memory Speedup Factor for rm, edf and off as a function of the total processor utilization U. . . . . . 208

7.10 Evaluation of the impact of the Cache Utilization. . . . . . 209
7.11 Evaluation of the impact of the Cache Reuse. . . . . . 210
7.12 Evaluation of the impact of the Cache Size. . . . . . 211
7.13 Evaluation of the impact of the Block Reload Time. . . . . . 213
7.14 Number of schedulable tasksets under rm, edf and off as a function of the total processor utilization for several crpd bound approaches. . . . . . 214


List of Tables

2.1 Classic complexity results for the feasibility problem for periodic/sporadic tasks with constrained deadlines executed on a uniprocessor from [EY15a]. . . . . . 87

3.1 Overview of different methods taking the cache into account. . . . . . 96
3.2 Taskset example used in Figures 3.5, 3.6 and 3.7 made of three synchronously-released periodic tasks with constrained deadlines. . . . . . 108

4.1 Taskset TExample1. . . . . . 118
4.2 Construction of a common supersequence (w9) starting from the cache request sequence (w0) associated to an arbitrary schedule. . . . . . 128
4.3 Taskset τ used for the proof of Theorem 6. . . . . . 132

5.1 Sustainability results for Rate Monotonic (rm), Deadline Monotonic (dm) and Earliest Deadline First (edf) accounting for Cache-Related Preemption Delays. . . . . . 143

5.2 Set of independent jobs JTheorem21 used for the proof of Theorem 21. . . . . . 155
5.3 Set of asynchronously-released periodic tasks TTheorem22 used for the proof of Theorem 22. . . . . . 155
5.4 Sporadic taskset TTheorem24 used for the proof of Theorem 24. . . . . . 156

6.1 Tasks and jobs generated over the hyperperiod H = 12 for Taskset Texample1. . . . . . 166
6.2 Data and variables for the milp of the nearly-optimal approach. . . . . . 169
6.3 Complete milp for the nearly-optimal solution. . . . . . 171
6.4 Output variables computed by the solver for Taskset Texample1 using the nearly-optimal approach. . . . . . 175
6.5 Complete milp for Taskset Texample1 corresponding to the nearly-optimal offline approach. . . . . . 176
6.6 Jobset Jexample2. . . . . . 177
6.7 Data and variables for the milp of the optimal offline approach. . . . . . 178
6.8 Complete milp for the optimal offline solution. . . . . . 182
6.9 Output variables computed by the solver for Taskset Texample1 using the optimal offline approach. . . . . . 186
6.10 Complete milp for Taskset Texample1 using the optimal offline approach. . . . . . 187
6.11 Output variables computed by the solver for Jobset Jexample2 using the optimal offline approach. . . . . . 188
6.12 Jobset Jexample3. . . . . . 191


Acronyms

The main acronyms used throughout this PhD work are listed hereafter:

RTES Real-Time Embedded System

RTOS Real-Time Operating System

COTS Component Off-The-Shelf

CPU Central Processing Unit

DM Direct-Mapped (cache)

SA Set-Associative (cache)

FA Fully-Associative (cache)

LRU Least Recently Used (cache replacement policy)

AH Always Hit (memory reference)

AM Always Miss (memory reference)

ECB Evicting Cache Block

UCB Useful Cache Block

BRT Block Reload Time

CRPD Cache-Related Preemption Delay

CFG Control Flow Graph

WCET Worst-Case Execution Time

FTP Fixed-Task Priority (scheduling)

FJP Fixed-Job Priority (scheduling)


RM Rate Monotonic (scheduling policy)

DM Deadline Monotonic (scheduling policy)

EDF Earliest Deadline First (scheduling policy)

RTA Response Time Analysis

PDA Processor Demand Analysis

MILP Mixed-Integer Linear Programming


Notations

The main notations used throughout this PhD work are listed hereafter:

T → taskset

n → number of tasks in the taskset

τi → ith task in the taskset

oi → o�set of Task τi

Ci → Worst-Case Execution Time of Task τi

Ti → period of Task τi

Di → relative deadline of Task τi

J → jobset

Ji → ith job in the jobset

ri → release of Job Ji

pi → Worst-Case Execution Time of Job Ji

di → absolute deadline of Job Ji

Jij → jth job of Task τi

si → upper-bound on the crpd for Task τi or Job Ji

hp(i) → set of tasks with priorities higher than the priority of Task τi

hep(i) → set of tasks with priorities higher or equal to the priority of Task τi

lp(i) → set of tasks with priorities lower than the priority of Task τi

Ri → worst-case response time of Task τi


H → hyperperiod of the taskset

U → total processor utilization for the taskset

ui → processor utilization for Task τi

CU → total cache utilization for the taskset

PDF → Maximum Preemption Delay Factor for the taskset

RF → Reutilization Factor for the taskset

CS → cache size in number of cache lines for the taskset


Foreword

"If a listener nods his head when you're explaining your program, wake him up."

- Alan J. Perlis (Epigrams on programming, 1982)

The first pages of this PhD work are actually a brief foreword intended for non-specialists (brave or foolish enough to start reading a 200-page manuscript full of abstruse formulas). Readers already familiar with the real-time field, and in particular real-time scheduling, can skip those lines and begin directly with the general introduction to this PhD work on Page 49.

Motivation. For the last three years, I have been confronted with people, either relatives, friends or even professionals (but not from the real-time research field), asking me about my PhD work. Most of the time, I have answered with technical words such as "real-time", "scheduling", "preemption"... These people were often polite enough to nod their heads without understanding a single word of such gibberish. So, after these three years, I finally try to answer that question in a way that will hopefully be understandable by everyone. Of course, my goal here is not to explain my work precisely but to give some clues about the issues raised in our research (as there are obviously a lot of people working on similar subjects in different laboratories around the world).

Hereafter, I proceed by analogy, using simple examples from everyday life, as is nicely done for instance in Nelissen's PhD work [Nel12] or in [MSH11] to introduce Ada programming. I will particularly focus on the following concepts: task, scheduling, cache and preemption. Finally, I will briefly try to link those concepts with what we actually do in the real-time research field.

Hereafter, we consider the example of Mr. Piccione, an old ice cream vendor. Mr. Piccione has a small shop near the railway station. He sells his famous "multi-layered" ice creams: each ice cream is made of layers of different flavours. For example, he sells an ice cream called IC1 consisting of (from bottom to top) one layer of vanilla ice cream, one layer of black chocolate ice cream, one more layer of vanilla ice cream, one layer of black chocolate ice cream again and finally one layer of strawberry ice cream:

[Figure: the IC1 ice cream, five layers from bottom to top: vanilla, black chocolate, vanilla, black chocolate, strawberry.]

Task. To make an ice cream, Mr. Piccione follows the corresponding recipe, which consists of the list of steps to be executed (for example spread a layer of vanilla ice cream, then spread a layer of black chocolate ice cream, and so on). These instructions have to be performed by Mr. Piccione in the order given by the recipe so as to produce the right ice cream. This list of actions is equivalent to a task for computers. Indeed, a task is made of several basic operations (read a character entered from the keyboard, display some text on the screen...) and computations, executed in a given order by the computing unit of the computer (called the Central Processing Unit, CPU). Mr. Piccione, who executes the different actions listed in his recipes, corresponds to the CPU. Note that the same ice cream recipe can be prepared several times, and so a task can be executed several times.

Scheduling. His shop being quite famous, Mr. Piccione often has several ice cream orders to fulfill. But having no employee, he cannot prepare them all at the same time: the steps of the different recipes have to be performed one at a time. Moreover, Mr. Piccione might have some constraints related to his customers' wishes: one might have ordered his/her ice cream within one hour, but another one may want it as soon as possible. So, Mr. Piccione has to decide in which order to prepare each ice cream, i.e. in which order each step will be performed. Similarly, the computing unit of a computer has several tasks to execute at the same time with given due dates. But a uniprocessor system can only perform one operation at a time. Scheduling defines the order in which those tasks will be executed so that every due date is met.
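
Mr. Piccione's decision rule can be sketched in a few lines of code. The toy model below is my own illustration (the function name, data and time unit are assumptions, not taken from this work): among the orders with steps left, it always works next on the one with the closest due date, which is the Earliest Deadline First policy discussed later in this work.

```python
# Toy model: among the pending orders, always perform the next step of
# the one with the closest due date (Earliest Deadline First).

def edf_schedule(orders):
    """orders: name -> (due_date, steps). Returns a (time, name, step) trace."""
    due = {name: d for name, (d, _) in orders.items()}
    pending = {name: list(steps) for name, (_, steps) in orders.items()}
    time, trace = 0, []
    while any(pending.values()):
        # pick the order with steps left whose due date is earliest
        name = min((n for n in pending if pending[n]), key=due.get)
        trace.append((time, name, pending[name].pop(0)))
        time += 1  # each preparation step takes one minute
    return trace

# Mr. Ansioso's IC2 (due in 17 min) beats Ms. Regolare's IC1 (due in 20 min).
trace = edf_schedule({"IC1": (20, ["Van.", "B.Ch.", "Van."]),
                      "IC2": (17, ["W.Ch.", "Cherry"])})
```

Because IC2's due date is closer, all of its remaining steps are performed before the IC1 steps resume.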

Preemption. Now, suppose that Mr. Piccione is preparing an ice cream when a customer comes in a hurry and wants his/her ice cream immediately because his/her train is leaving soon. As Mr. Piccione is a helpful man, he decides to stop his current ice cream preparation to begin preparing the new order. Stopping a task before its completion so as to start executing another one is called a preemption. Once Mr. Piccione has finished preparing this urgent ice cream, he can switch back to the previous order, resuming the preparation at the step where he had stopped.

Cache. Mr. Piccione uses a lot of different ice cream flavours, each flavour being stored in its own box. All ice cream boxes are stored in a large freezer. But as Mr. Piccione's shop is quite small, the freezer had to be placed downstairs in the basement. So, at each step of an ice cream preparation, Mr. Piccione has to go downstairs to get the adequate flavour (specified by the current recipe step).


But as the temperature in the shop is very hot, he has to bring the box back to the freezer, immediately after the ice cream layer has been spread, to prevent the ice cream from melting. As Mr. Piccione is quite old and the stairs are steep, he can only carry one ice cream box at a time and has to walk very slowly. So, he wastes more time getting the ice cream boxes than actually preparing the ice creams, and his customers have to wait longer. Consider, for example, the time needed to prepare the IC1 recipe (Van. stands for Vanilla ice cream, B.Ch. for Black Chocolate ice cream and Straw. for Strawberry ice cream):

[Figure: timeline of the IC1 preparation without a fridge: each of the five layers costs a 2-minute round trip to the freezer (dark grey) plus 1 minute of spreading (light grey), for a total of 15 minutes.]

Dark grey boxes correspond to the amount of time needed by Mr. Piccione to go downstairs to pick an ice cream box (1 minute) and bring it upstairs (1 more minute). Light grey boxes correspond to the amount of time needed by Mr. Piccione to spread an ice cream layer (again 1 minute). The figure above can be interpreted in the following way: first, Mr. Piccione goes downstairs, picks the vanilla box and comes upstairs (a total of 2 minutes), then he delicately spreads one layer of vanilla ice cream (1 more minute); in a third phase, he brings the vanilla box back to the freezer and returns upstairs with the black chocolate box (2 additional minutes), and so on. As a result, preparing an IC1 ice cream takes a total time of 15 minutes.

As depicted in the previous example, some ice cream recipes request the same flavours several times but not necessarily consecutively. Mr. Piccione, being a clever man, decides to buy a small fridge that can fit into his shop. Thus, he can store a few ice cream boxes right in his shop. Now, at each step of an ice cream preparation, Mr. Piccione first checks whether the requested flavour is in the fridge. If not, he has to bring the corresponding box from the freezer downstairs. But if the box is already in the fridge (because he has just used it for the previous order, for example), he does not need to go downstairs anymore and so he saves time. Consider again the preparation of the IC1 ice cream:

[Figure: timeline of the IC1 preparation with the fridge: the fridge contents are shown at each step; the second vanilla and black chocolate layers are spread without a trip downstairs, for a total of 11 minutes.]

This time, when Mr. Piccione has used the vanilla box (and later on the black chocolate one), he does not bring it back to the freezer but stores it in his tiny fridge. So, when he has to spread the second layer of vanilla ice cream (6 minutes after starting the ice cream making), he no longer needs to go downstairs and so he saves 2 minutes. The same situation arises again one minute later when he needs to spread the second layer of black chocolate ice cream. So, Mr. Piccione saves a total time of 4 minutes when preparing the IC1 recipe.


But as the fridge is very small (it can store only four ice cream boxes), it becomes full very quickly. So, after some time, when Mr. Piccione needs to store a new flavour in the fridge, he first has to take a previously used ice cream box out of the fridge and put it back downstairs into the freezer. Here, the fridge is equivalent to a cache memory for a computer. Data used by the tasks (and so requested by the CPU when executing those tasks), which are stored in the main memory (equivalent to the freezer), can be temporarily stored in a small memory called the cache. It takes far less time for the CPU to get data from the cache than from the main memory. So, if the same data is reused later, time is saved as the data will be retrieved the second time from the cache and not from the main memory.
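
The time Mr. Piccione saves can be replayed in code. The sketch below is a hypothetical model of the fridge as a least-recently-used cache, with the costs from the story above (2 minutes per freezer round trip, 1 minute per layer); the function name and structure are my own illustration.

```python
from collections import OrderedDict

def preparation_time(recipe, fridge_size):
    """Minutes needed to prepare `recipe` with a fridge holding at most
    `fridge_size` boxes (0 = no fridge); the least recently used box is
    evicted when the fridge is full."""
    fridge = OrderedDict()   # boxes currently in the fridge, in LRU order
    minutes = 0
    for flavour in recipe:
        if flavour in fridge:
            fridge.move_to_end(flavour)   # hit: the box is already at hand
        else:
            minutes += 2                  # miss: round trip to the freezer
            if fridge_size:
                if len(fridge) >= fridge_size:
                    fridge.popitem(last=False)  # send the LRU box back down
                fridge[flavour] = True
        minutes += 1                      # spread the layer
    return minutes

ic1 = ["Van.", "B.Ch.", "Van.", "B.Ch.", "Straw."]
print(preparation_time(ic1, 0))   # 15 minutes without a fridge
print(preparation_time(ic1, 4))   # 11 minutes with the four-box fridge
```

The two results reproduce the 15-minute and 11-minute timelines above: the repeated vanilla and black chocolate layers hit in the fridge and save 2 minutes each.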

Cache-Related Preemption Delay. We suppose now that every evening, when closing his shop, Mr. Piccione empties his fridge and puts everything back into the freezer, which is cooler than the fridge. So, when he opens his shop at 7:00 in the morning, the fridge is empty. First thing in the morning, Mr. Piccione has to prepare the daily order from Ms. Regolare: Ms. Regolare wants her IC1 ice cream by 7:20. Then, usually at 7:12, Mr. Ansioso orders an IC2 ice cream (consisting of one white chocolate ice cream layer, one cherry ice cream layer and finally, on top, one toffee ice cream layer):

[Figure: the IC2 ice cream, three layers from bottom to top: white chocolate, cherry, toffee.]

Mr. Ansioso needs it by 7:22 as his train leaves at 7:25. That is no problem for Mr. Piccione as, thanks to his fridge, he only needs 11 minutes to prepare the IC1 ice cream, and so he will have finished Ms. Regolare's order before Mr. Ansioso arrives (W.Ch. stands for White Chocolate ice cream):

[Figure: the usual schedule: Ms. Regolare's IC1 order (due 7:20) is finished before Mr. Ansioso's IC2 order arrives at 7:12 (due 7:22).]

But, today, Mr. Ansioso leaves earlier because of a meeting. His train departure being at 7:20, he comes to Mr. Piccione's shop at 7:07 and asks for his IC2 ice cream by 7:17 at the latest:

[Figure: today's scenario: Mr. Ansioso arrives at 7:07, while Mr. Piccione has only performed the first three steps of the IC1 preparation; the IC2 order is due by 7:17.]

Mr. Piccione, being very flexible, stops Ms. Regolare's preparation (that is a preemption) to start immediately the preparation of Mr. Ansioso's order. Before starting Mr. Ansioso's preparation, Mr. Piccione had already had time to perform the first three steps of the IC1 preparation. So the vanilla and black chocolate ice cream boxes are already in the fridge. When preparing the IC2 recipe, Mr. Piccione needs three new flavours. The first two boxes fit into the fridge. But when Mr. Piccione needs the toffee ice cream box, his fridge is already full. So, he has to take an ice cream box back to the freezer. Being in a hurry, he gets confused and takes the black chocolate ice cream to the freezer, storing the toffee ice cream box in the fridge instead. The IC2 ice cream is finished at 7:16, meeting Mr. Ansioso's deadline:

[Figure: schedule with the preemption: the IC2 order is finished at 7:16; to make room for the toffee box, the black chocolate box has been sent back to the freezer.]

So, Mr. Piccione can resume Ms. Regolare's ice cream preparation. He now needs to spread a layer of black chocolate ice cream. But unfortunately, the right flavour box is no longer in the fridge, so he has to bring it back from the freezer, thus wasting two minutes. As a result, he cannot finish the IC1 ice cream by 7:20 as needed:

[Figure: resuming the IC1 preparation: the black chocolate box must be fetched again from the freezer, so the IC1 ice cream misses its 7:20 due date.]

The problem is that an ice cream box which would normally have been in the fridge (if he had prepared the IC1 ice cream without any interruption) is no longer there, because he had to replace that box in the fridge by boxes needed for a more urgent order. The additional delay needed to get the box a second time corresponds to a Cache-Related Preemption Delay.
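
The same toy fridge model can exhibit this delay in code. The sketch below is my own illustration, under the story's assumptions (a four-box fridge with least-recently-used eviction, 2-minute freezer trips, 1 minute per layer): it performs the remaining IC1 steps with and without the preempting IC2 order in between, and the difference is the Cache-Related Preemption Delay.

```python
from collections import OrderedDict

def fetch_cost(flavour, fridge, size=4, trip=2):
    """Minutes spent fetching `flavour`; the fridge keeps the most
    recently used boxes and evicts the least recently used one when full."""
    if flavour in fridge:
        fridge.move_to_end(flavour)       # hit: no trip downstairs
        return 0
    if len(fridge) >= size:
        fridge.popitem(last=False)        # evict the least recently used box
    fridge[flavour] = True
    return trip                           # miss: round trip to the freezer

def total_time(steps, fridge):
    return sum(fetch_cost(f, fridge) + 1 for f in steps)  # +1 min per layer

ic1_done = ["Van.", "B.Ch.", "Van."]      # IC1 steps before the preemption
ic1_rest = ["B.Ch.", "Straw."]            # IC1 steps after the preemption
ic2 = ["W.Ch.", "Cherry", "Toffee"]

# Without the preemption, the black chocolate box is still in the fridge.
fridge = OrderedDict()
total_time(ic1_done, fridge)
no_preempt = total_time(ic1_rest, fridge)

# With the preemption, the IC2 order runs in between and fills the fridge.
fridge = OrderedDict()
total_time(ic1_done, fridge)
total_time(ic2, fridge)
with_preempt = total_time(ic1_rest, fridge)

crpd = with_preempt - no_preempt   # extra minutes caused by the preemption
```

Under least-recently-used eviction the toffee box indeed pushes out the black chocolate box (vanilla was used more recently), so the model reproduces the two minutes lost in the story.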

Of course, such a hectic situation could have been avoided if Mr. Piccione had been more thoughtful. For example, he could have taken the vanilla box downstairs instead of the black chocolate one. But he could also have scheduled the preparation steps differently:


[Figure: a smarter schedule: the second black chocolate layer is spread at 7:07 before switching to the IC2 order, so both ice creams are finished on time.]

At 7:07, when Mr. Ansioso orders his ice cream, Mr. Piccione can continue the preparation of Ms. Regolare's ice cream for one more minute. So he can spread the second layer of black chocolate. Then, he can switch to the IC2 recipe. Mr. Ansioso's ice cream will still be ready on time, but now, when resuming the IC1 preparation, there is no additional delay as the black chocolate spreading step has already been performed. So this ice cream will also be finished on time.

Back to reality. Of course, real-time research unfortunately does not deal with ice creams. Instead, it considers other useful applications such as airplanes, for example. Nowadays, airplanes (like cars, trains or other systems we use in our everyday lives) use more and more electronics. In a commercial plane, there is an automatic pilot but also a lot of other functionalities to control the air pressure in the cabin, to check the fuel tanks... So, modern airplanes have a lot of computers on board, each computer having a lot of different tasks to execute. Just as for Mr. Piccione, efficiency is required as the number of electronic functionalities increases, so computers with cache memories have become a natural trend in the aeronautic world to get better performance at a lower cost. But, as seen with our ice cream vendor example, using caches adds some new constraints. In particular, Cache-Related Preemption Delays might occur. As we do not want the plane to crash because some task has not completed its computation on time, we must analyze the system precisely by taking those additional delays into account.

Usually, research in real-time scheduling focuses on two main considerations:

- propose algorithms (i.e. general methods) to schedule different tasks so that every deadline is met: the idea is to define scheduling rules (for example, schedule tasks starting with the one with the closest due date) that can be applied to a vast range of cases instead of finding an individual solution for every case,

- provide analyses for those algorithms that make it possible to certify beforehand that a given system is sure not to miss a deadline when being executed.

In this PhD work, these two issues are studied when caches are taken into account.


General Introduction

I. Motivation

Real-Time Embedded Systems, i.e. computer hardware and software designed to perform a dedicated function [Gro07, Nel12], are now widespread in every aspect of our everyday lives. Today, 99% of the processors produced every year in the world are dedicated to the embedded system market [BW01]. In particular, real-time applications can be found in critical systems such as nuclear plants, personal cars, trains, satellites, space ships and commercial or military airplanes.

Figure 1: Example of a software bug leading to a critical failure: explosion of the Ariane 5 launcher during its first flight on June 4th, 1996. Photo: ESA.

The aeronautical and space sectors are especially critical: a failure can result in huge costs and dramatic human losses. To avoid (as much as possible) such dramatic failures, great efforts are made to ensure safety, in particular through the use of specifications, such as ARINC 653 5, and standards, such

5. http://www.aviation-ia.com/cf/store/catalog.cfm?prod_group_id=1&category_group_id=3


as the DO-178B 6. This need for safety extends from the structure (the wing of an aircraft must not break under aerodynamic forces) to the software. Software code, of course, has to be correct; in particular, the application must be functionally correct, i.e. must conform to its specifications (the first Ariane 5 flight was a failure, as depicted in Figure 1, because the guidance system could not cope with the high accelerations experienced by the launcher 7). But, as a real-time application is often made of several tasks subjected to strict timing constraints, it must be ensured that all these tasks can and will be executed concurrently without missing a deadline (the Mars Pathfinder lander experienced troubles because one task could not be executed, due to an incorrect management of shared resource accesses resulting in a priority inversion [Wil97]). For this last case, methods are needed to schedule those tasks correctly. But analyses must also be developed to ensure beforehand that the system will experience no scheduling problem later on.

A lot of research has been conducted in the real-time scheduling field, ranging from the pioneering work of Liu and Layland [LL73] to the study of multiprocessor systems [Bar07, FLS11] and of more complex task systems (in particular tasks that can execute in parallel [KI09, GR16] or that can execute part of their code in parallel [MBB+15]).

The current trend in the aeronautics industry is to use more and more electronic systems. Those systems include, of course, the automatic pilot and the air pressure regulation in the cabin. But more and more actuator commands are now electric. In particular, since the Airbus A320, manual flight controls have been replaced by fly-by-wire systems [LEJ15]. This trend has reached such an extent that on the new A380, even the flight control back-up systems are purely electrical 8. As a consequence, airplanes need more and more processing power. To limit the number of on-board computers, more efficient hardware systems (in particular more powerful processors) have to be used.

As embedded systems now represent a larger and larger part of the total system, the current trend to decrease development costs is to use Components Off-The-Shelf (cots) [Adm01, Adm04], i.e. components that have already been developed and are widely used for other embedded applications. Such cots systems use hardware features intended to improve average performance, features which have long been used in personal computers. In particular, almost all current processors use pipelines, branch predictors and several levels of cache memories. A pipeline enhances CPU performance by overlapping the execution steps (fetch, decode, execute, memory access, register write back) of the different instructions of a program, as those steps require separate hardware circuits. As for branch prediction, it improves pipeline performance by trying to predict the next instruction to be fetched when a conditional branch is encountered in the task code. Both pipelining and branch prediction speed up the processor, in addition to the increase in CPU frequency over the last decades. As a result, the gap between processor speed and main memory access time has grown exponentially. Cache memories were introduced to bridge this gap. They are fast memories located between the processor registers and the main memory: much faster than the main memory, but much more expensive. The idea is to store in the cache the instructions and/or data that are frequently used (because, for example, of loops in the task code) to decrease further loading times.

6. http://www.rtca.org/store_product.asp?prodid=581
7. http://www.esa.int/esapub/bulletin/bullet89/dalma89.htm
8. http://www.fzt.haw-hamburg.de/pers/Scholz/dglr/hh/text_2007_09_27_A380_Flight_Controls.pdf


Cache memories (as well as pipelines and branch predictors) increase the average performance of a system. But to ensure safety, worst-case scenarios have to be considered when dealing with critical applications. When using a processor with a cache, the execution time of an instruction depends on whether the instruction is found in the cache or has to be loaded from the main memory, which can be up to 10 times more costly [Lev09]. As multiple tasks are executed on the same CPU, they might also access the same locations in the cache. For example, when preempting another task, a task can overwrite some cache locations already used, and needed later on, by the preempted task. So, when the preempted task resumes its execution, it needs to reload those evicted instructions/data from the main memory rather than accessing them directly from the cache. Those additional reloads are known as Cache-Related Preemption Delays (crpds). crpds can represent up to 40% of a task's worst-case execution time, as shown in [PC07]. Thus, the cache behaviour has to be precisely studied to bound those effects and take them into account when dealing with scheduling. But this is no easy exercise. As depicted in Figure 2, there is a circular dependency between task execution times and Cache-Related Preemption Delays: crpds increase the task execution time; so, when scheduled, this task is likely to be preempted more often and, as a consequence, will experience more crpds...


Figure 2: Circular dependency between task execution times and Cache-Related Preemption Delays.
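As a rough illustration of how crpds arise, the following toy model (all parameters are hypothetical; this is not the crpd analysis developed in this thesis) counts the cache blocks a preemption evicts that the preempted task still needs, and charges one main-memory reload per evicted block:

```python
BLOCK_RELOAD_TIME = 10  # cycles to reload one block from main memory (assumed value)

def crpd_bound(useful_blocks, preemptor_blocks, reload_time=BLOCK_RELOAD_TIME):
    """Bound on the Cache-Related Preemption Delay of one preemption:
    every cache block the preempted task still needs ("useful") that the
    preempting task touches must be reloaded from the main memory."""
    evicted = set(useful_blocks) & set(preemptor_blocks)
    return len(evicted) * reload_time

# The preempted task still needs blocks {0,1,2,3}; the preemptor uses {2,3,4}:
# 2 useful blocks are evicted, costing 2 x 10 = 20 extra cycles.
print(crpd_bound({0, 1, 2, 3}, {2, 3, 4}))  # 20
```

The circular dependency of Figure 2 appears as soon as these extra cycles lengthen the preempted task, exposing it to further preemptions.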

Using cache memories in critical real-time embedded systems makes the scheduling problem much more complex. So, this problem has to be studied further, to propose new scheduling strategies minimizing cache effects, and also schedulability analyses accounting for crpds to solve the predictability problem.

II. Objectives and Solution overview

The overall purpose of this PhD work is to study Hard Real-Time Scheduling subjected to Cache-Related Preemption Delays. This aim being very general, we focus in particular on the following goals:

1. formalize the problem of scheduling hard real-time tasks on a uniprocessor system with a cache memory,

2. conduct a theoretical study of this problem by focusing, in particular, on the computational complexity of taking optimal scheduling decisions when considering cache effects,


3. study the impact of cache interference (assessed through Cache-Related Preemption Delays) on existing online scheduling policies, in particular Rate Monotonic and Earliest Deadline First,

4. propose solutions to the problem of scheduling hard real-time tasks on a uniprocessor system with a cache memory.

To achieve these goals, we first focus on defining two different scheduling problems. For the Cache-aware Scheduling Problem, the cache state at each instant and the task memory requirements are precisely known, so the impact of each scheduling decision on the cache content can be precisely analyzed. For the crpd-aware Scheduling Problem, the cache interference is assessed through upper bounds on the Cache-Related Preemption Delays.

Then, we study the computational complexity of those two problems. We prove that both problems are NP-hard in the strong sense, which means that no scheduling algorithm running in polynomial or pseudo-polynomial time can optimally solve them (unless P = NP).

Third, we study the impact of Cache-Related Preemption Delays on online scheduling policies. We consider in particular Rate Monotonic and Earliest Deadline First, which have been widely studied in the real-time scheduling literature. We first show that both policies suffer from scheduling anomalies. More precisely, neither Rate Monotonic nor Earliest Deadline First is sustainable when crpds are considered. Then, we prove that no online scheduling policy can be optimal when Cache-Related Preemption Delays are accounted for.

As no online scheduling policy can be optimal for the crpd-aware scheduling problem, we propose, in a fourth step, two offline approaches based on mathematical programming. We show that the second approach is optimal for the crpd-aware scheduling problem when a simple crpd parameter is considered in the task model.

Finally, we conduct several experiments to measure the loss of schedulability of Rate Monotonic, Earliest Deadline First and our optimal offline scheduling approach when Cache-Related Preemption Delays are considered.

III. Organization

The remainder of this PhD work is divided into two main parts. First, we introduce concepts linked to real-time systems, scheduling and cache memories, and briefly present the main works from the real-time literature dealing with caches. Then, we present our different contributions.

This PhD work starts with a brief overview of the current state of the art (Part I).

In Chapter 1, we begin by introducing some general concepts dealing with real-time and embedded systems. We also briefly describe the functioning of a cache memory.

Then, in Chapter 2, we present the main definitions and results about real-time scheduling which will be useful for the remainder of this PhD work. In particular, we explain what scheduling is and what a schedulability analysis is about. In the last part of this chapter, we link scheduling and cache memories and describe some consequences of the use of caches.

Finally, in Chapter 3, we briefly describe the main existing works dealing with real-time scheduling and cache memories. In particular, we propose a classification of those different works. This allows us to identify some aspects that have not been much studied, in particular the problem of taking


optimal scheduling decisions when dealing with Cache-Related Preemption Delays.

Once those necessary foundations are laid, we present the different contributions developed during the three years of this PhD (Part II).

First, in Chapter 4, we formalize the problem of scheduling real-time tasks on a CPU with a cache memory. In particular, we identify two distinct scheduling problems, which we call respectively the Cache-aware Scheduling Problem and the crpd-aware Scheduling Problem. Then, we study the computational complexity of both problems. Finally, we briefly discuss the crpd-aware scheduling problem, as we will consider it in the remainder of this work.

In Chapter 5, we present two contributions dealing with online real-time scheduling. First, we study classic online scheduling policies, in particular Rate Monotonic and Earliest Deadline First, when Cache-Related Preemption Delays are accounted for. We show that some timing anomalies might arise. Then, in a second part, we study the general problem of finding an optimal online scheduling algorithm accounting for crpds. Unfortunately, we prove that such an online algorithm does not exist.

As optimal online scheduling is impossible, we focus in Chapter 6 on offline real-time scheduling with Cache-Related Preemption Delays. We propose two approaches, using Mixed-Integer Linear Programming, to compute offline schedules for real-time tasks subjected to crpds. In particular, the second scheduling approach is optimal for the crpd-aware scheduling problem under some assumptions on the task model.

Finally, in Chapter 7, we study the impact of Cache-Related Preemption Delays on system schedulability. The goal of these experiments is twofold. On the one hand, we evaluate the loss of schedulability of classic online scheduling policies such as Rate Monotonic and Earliest Deadline First as soon as Cache-Related Preemption Delays are accounted for. On the other hand, we assess the performance of our offline solution against Rate Monotonic and Earliest Deadline First. Two series of experiments are conducted, based on different representations of the Cache-Related Preemption Delay parameter.

Eventually, in Part III, we conclude this work and propose some perspectives for possible future work.


Part I

Research Foundations


Chapter 1

Real-Time Embedded Systems and Cache Memories: Basic Notions

Contents

I Introduction
II Real-Time and Embedded Systems
    II.1 Definitions
    II.2 Structures of Real-Time Embedded Systems
    II.3 Real-Time cots-based Systems
III Cache Memories
    III.1 Main principles
    III.2 Cache organization
    III.3 Timing anomalies
IV Conclusion

Abstract

This chapter introduces basic notions related to real-time embedded systems and cache memories. First, the notion of real-time system is addressed. Classic hardware and software architectures for real-time systems are briefly presented. Then, the trend to use new hardware features in embedded systems is shortly discussed. Finally, cache memories are introduced, and their main features are presented very briefly.


I. Introduction

In this chapter, we briefly present some basic notions and definitions related to real-time embedded systems and cache memories. In particular, we discuss the trend of using Components Off-The-Shelf in modern real-time embedded systems and its impact on the system architecture.

In Section II, we first introduce the notion of real-time system and then the notion of embedded system. We then deal very briefly with hardware and software architecture matters for such systems. In a second phase, we focus on the new trends in modern real-time embedded systems, and in particular on the use of Components Off-The-Shelf. We present several features which are now more and more often found in real-time embedded systems. Then, in Section III, we focus on one of these hardware features, namely the cache. In particular, we discuss several commonly found cache features. Finally, we conclude this first chapter in Section IV.

II. Real-Time and Embedded Systems

We first introduce real-time embedded systems, which represent the main area of interest for this PhD work.

II.1. Definitions

We start with some definitions dealing with real-time embedded systems.

Definition 1. (from [Sta88]) A Real-Time System is a system for which correctness depends not only on the logical result of the computation but also on the moment at which this result is produced.


Figure 1.1: Schematic representation of a Real-Time System.

Most real-time systems are actually control/command systems, i.e. systems which interact with their physical environment. As depicted in Figure 1.1, the environment is assessed through sensors which measure physical values (for example altitude, speed... for an aircraft flight control system) and detect events (for example a command from the pilot). From these inputs, the system computes commands which are applied to the environment through actuators.
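The sense-compute-actuate cycle described above can be sketched as follows; the sensor values and the proportional control law are purely illustrative placeholders standing in for real I/O:

```python
def read_sensors():
    # placeholder for real sensor I/O (e.g. altitude from a GPS receiver)
    return {"altitude": 1000.0, "target": 1010.0}

def compute_command(measures):
    # toy proportional law: the command grows with the altitude error
    return 0.5 * (measures["target"] - measures["altitude"])

def control_step(apply_command):
    """One control cycle: read the sensors, compute, drive the actuators."""
    apply_command(compute_command(read_sensors()))

sent = []                  # stands in for an actuator
control_step(sent.append)
print(sent)                # [5.0]
```

In a real system this step would run periodically, and the timing constraints discussed next bound how late each cycle may complete.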


Thus, a Real-Time system is subjected to timing constraints related to the dynamics of its environment. Real-Time systems can be classified according to the criticality of those timing constraints (see for example [But11, LLS07, SDG09]):

Soft real-time systems: some timing constraints may occasionally be missed without compromising the system (for example IP communication, body electronics in cars...),

Firm real-time systems: some timing constraints may occasionally be missed without compromising the system, but the corresponding results then become useless (for example financial forecast systems),

Hard real-time systems: missing a timing constraint results in system failure (for example the flight control system of an airplane, or the brake system of a car...).

Note that the distinction between soft and firm real-time systems is quite blurred and depends on the authors (for example, a video playing system can be classified as firm or soft).

Very often, a real-time system (such as a flight control system) is part of a larger system (such as the whole avionics of an aircraft) and is then referred to as a Real-Time Embedded System (rtes).

Definition 2. (from [Gro07, Nel12]) An Embedded System is a combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a dedicated function.

Embedded systems are nowadays widespread in every aspect of our daily lives: about 99% of all produced processors are for the embedded system market ([BW01]), from smartphones to cars, planes or satellites. Embedded systems are subjected to specific constraints in comparison with classic computing systems (such as a personal computer or a computer server):

• space constraints,

• energy consumption constraints,

• energy dissipation constraints,

• ...

As a result, embedded systems are often less powerful than non-embedded ones (lower CPU speed, smaller memories...).

Example 1.1: The mini-drone AMADO is an example of a Real-Time Embedded System. It was developed in 2002 at ENSMA (a French aeronautical engineering school) by students and researchers as part of an international competition between universities led by ONERA (the French national aerospace research center) and the DGA (the French government agency responsible for weapon management and development). A schematic representation of the embedded system of the AMADO drone is depicted in Figure 1.2. The drone is controlled from the ground. A modem allows communication with the drone. The flight control system, executing on the CPU, uses the commands from the pilot and the flight data (attitude, altitude, speed) coming from two sensors (an inertial unit and a GPS receiver)



Figure 1.2: Example of the AMADO drone.

to manoeuver the drone through several actuators (servomotors to move the elevons, and the propeller engine). To ensure the drone's stability, the flight control system must respect strict timing constraints.

For Real-Time systems, determinism has to be ensured, i.e. they have to provide the same results under the same conditions. As Real-Time systems are subjected to timing constraints, their design also differs greatly from that of other systems. Very often, for a classic computing system, the aim is to minimize the average response time of the application; in other words, the system is designed to be as fast as possible on average. For Real-Time systems, and in particular hard real-time ones, average cases are less relevant. Instead, worst-case scenarios have to be considered to ensure that timing constraints will always be met whatever the system might experience during its execution (varying computing times, congestion when accessing data in memory...). This notion is referred to as predictability [SR90].

Predictability is a core matter for real-time systems. It is affected by the internal characteristics of the hardware, but also by the applications running on the system (see for example [But11]).


II.2. Structures of Real-Time Embedded Systems

A Real-Time system consists of one or several applications (the software part) running on a physical system (the hardware part).


Figure 1.3: Architecture of a real-time system.

II.2.a. Hardware architecture

The hardware part (lower part of Figure 1.3) consists of at least a processor to execute the application instructions, a memory to store the application code and data, and possibly input/output cards, but also network facilities and storage support.

Processor. The core component of a processor, referred to here as the Central Processing Unit (CPU), consists of the hardware circuits dedicated to processing. Its goal is to perform the basic operations (arithmetic but also input/output operations) specified by the application instructions. A CPU is defined by its characteristics (speed, power consumption...) but also by its architecture (register sizes, instruction set). The CPU processes an instruction in several steps, requiring separate hardware circuits:

• Fetch: the instruction is retrieved from the program memory and stored in an instruction register,


• Decode: the instruction is interpreted (which depends on the instruction set architecture) to determine which operation has to be performed,

• Execute: the operation corresponding to the instruction is performed.

Depending on the instruction, an additional step might be performed by the CPU, corresponding to a Writeback (i.e. the result of the operation is written to memory).

Example 1.2: Consider the following instruction from the ARMv7 instruction set:

11100010100000110011000000000001

This instruction is interpreted (Decode step) as the following fields, from the most significant bits down:

• 1110 — condition (always execute),
• 00 — data-processing instruction, 1 — immediate operand,
• 0100 — ADD opcode,
• 0 — no condition-flag update,
• 0011 — Rn = R3 (first operand register),
• 0011 — Rd = R3 (destination register),
• 0000 — rotate field, 00000001 — 8-bit immediate, together encoding the operand value 1,

which corresponds to:

add r3, r3, #1

i.e. add 1 to the integer stored in register R3 and put the result back into register R3. The computation is performed using the Arithmetic Logic Unit.
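The Decode step of Example 1.2 can be re-implemented as a short bit-field extraction. This is an illustrative sketch restricted to the immediate form of ARM data-processing instructions; the opcode table below is deliberately partial:

```python
def decode_dp_immediate(word):
    """Decode an ARM data-processing instruction (immediate-operand form)."""
    opcode = (word >> 21) & 0xF   # bits 24-21: operation to perform
    rn     = (word >> 16) & 0xF   # bits 19-16: first operand register
    rd     = (word >> 12) & 0xF   # bits 15-12: destination register
    rotate = (word >> 8) & 0xF    # bits 11-8: rotation applied to imm8
    imm8   = word & 0xFF          # bits 7-0: 8-bit immediate
    r = 2 * rotate                # the immediate is rotated right by 2*rotate
    imm = imm8 if r == 0 else ((imm8 >> r) | (imm8 << (32 - r))) & 0xFFFFFFFF
    mnemonic = {0x4: "add", 0x2: "sub"}.get(opcode, "?")  # partial opcode table
    return f"{mnemonic} r{rd}, r{rn}, #{imm}"

print(decode_dp_immediate(0b11100010100000110011000000000001))  # add r3, r3, #1
```

A hardware decoder performs the same field extraction with combinational logic rather than software.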

Several CPUs can be combined on the same chip to increase system performance. Such architectures are referred to as multiprocessors. Several processors can also be linked together through a network (CAN for the automotive industry, AFDX for aircraft) to form a distributed architecture. In the remainder of this PhD work, we only deal with uniprocessors, i.e. processors with a single CPU.

Memory. The memory consists of electronic components used to store the application code and data. Hereafter, we will mainly consider the main memory, i.e. the memory where the application code and the data used by the application are stored at runtime (and which can be either RAM or FLASH memory). The CPU retrieves instructions and data from the main memory, and writes data back to it, using a memory bus.

II.2.b. Software architecture

A typical real-time application is made of several tasks. Each task consists of a sequence of instructions implementing one or several functionalities. These tasks might be executed concurrently on a CPU. The notion of concurrent programming, first implemented by Kilburn and Howarth on the Atlas Computer [Han13], has been studied from the mid-60s onwards, in particular by Dijkstra [Dij65].


Example 1.3: The embedded software of the AMADO drone is, for example, structured in several tasks:

• one task to read the pilot commands received through the modem,

• one task to read the current altitude and speed from the GPS receiver,

• one task to compute the commands to be applied to the different actuators,

• ...

All these tasks are recurrent and are executed at different rates. They are executed concurrently, which means that at some time the task reading the GPS data might be executed after the task reading the pilot commands, but at some other time it may be the contrary. Those task executions might even overlap.

To use the hardware on which it is executing (inputs, outputs...), an application requires an intermediate software layer called the executive or operating system (depending on its size and on the services it provides to the application layer). For Real-Time Systems, a real-time executive/operating system is used to ensure determinism. As depicted in Figure 1.3, a real-time executive is structured in (see for example [CGG+14]):

• a real-time kernel, which manages access to hardware resources (real-time clock, memory...) and provides interrupt handlers, a dispatcher (responsible for assigning a task to the CPU) and a scheduler (implementing a scheduling policy, see next chapter),

• the executive itself, which includes the real-time kernel and additionally provides system calls to manage communications, inputs/outputs...

A real-time operating system (RTOS) is made of a real-time executive plus a third layer providing additional services, such as communication means for the user (shell...). Very often, an RTOS implements an existing standard: the real-time extensions of POSIX (for general-purpose operating systems), OSEK (for the automotive industry), APEX, an Application Programming Interface from ARINC 653 (for avionics)...

Tasks are often implemented as threads at the RTOS level, that is to say they share common resources (in particular the memory). Some programming languages, such as Ada, can handle tasks directly and so can be used with smaller real-time kernels, while other languages, such as C, need a real-time kernel providing full task management facilities.

II.3. Real-Time cots-based Systems

Nowadays, embedded systems are increasingly used and become larger and more complex. For example, the avionics of a plane is made of different systems managing different sensors (Pitot tubes, altimeters...) and actuators (engines, servomotors to move the ailerons...) [LEJ15]. As a result, both the number of embedded systems and the size of the software code have increased considerably in the aeronautics industry, see Figure 1.4. One consequence of this evolution is that more and more processing capacity is needed, so more efficient hardware systems (in particular more powerful processors) have to be used.



Figure 1.4: Evolution of embedded systems in the aeronautic industry, based on data from Airbus and [LEJ15].

But developing such systems is quite expensive, in particular for aircraft or satellites, as they are produced only in small quantities. To decrease the cost, there is an increasing trend to use Components Off-The-Shelf (COTS) [Adm01, Adm04], in particular processors with hardware features, already used in personal computers, intended to improve average performance:

pipeline: a technique to accelerate the CPU by overlapping the execution steps (fetch, decode, execute, memory access, register write back) of different instructions, as those steps require separate hardware circuits,

branch predictor: a hardware circuit which tries to guess which instruction will be fetched next (in particular when if-then-else statements are used in the task code),

cache: a fast memory located between the processor registers and the main memory, to bridge the gap between the CPU speed and the main memory access time.

The main issue is to ensure determinism when using such features. In the remainder of this PhD work, we focus mainly on the cache.

III. Cache Memories

III.1. Main principles

In the last decades, processors have become faster and faster: CPU frequencies have reached hundreds of MHz (even GHz for processors in personal computers), and pipelines and branch prediction also



Figure 1.5: Schematic representation of the increasing gap between the CPU frequency and the main memory access time, based on [HP90].

contribute to increasing processor performance. But, on the other hand, the main memory speed (i.e. memory access time) has not increased at the same rate, as depicted in Figure 1.5. Actually, to get faster memories, faster technologies should be used, such as SRAM (static random-access memory) instead of DRAM (dynamic random-access memory). But such technologies are expensive. So, one solution is to use a memory hierarchy (see Figure 1.6): the main memory still consists of a large, slow but less expensive memory such as DRAM, whereas an additional faster but smaller and more expensive memory, called a cache, is added between the CPU registers and the main memory.

Cache memories are faster than the main memory but still slower than CPU registers. They are used to save memory blocks loaded from the main memory (a memory block corresponds to the smallest amount of contiguous bytes that can be transferred on the memory bus, either data or several instructions). Thus, later accesses to those blocks by the CPU will be served directly by the cache, resulting in time savings and less power consumption, according to [VLX03].

The effectiveness of caches is based on the principle of reference locality, which can be:

• spatial locality: a resource is more likely to be referenced if a resource close to it has been referenced recently, e.g. instructions are often referenced sequentially.
→ this is the case for program code, in which instructions are referenced sequentially (as long as no jump is involved). So, when an instruction is accessed, it is often judicious to also load the following instructions (its neighbourhood) into the cache, as they are likely to be referenced next.

• temporal locality: already-referenced resources are more likely to be re-referenced within a short lapse of time, e.g. instructions in a loop.


[Figure 1.6 omitted: diagram of a processor chip (CPU registers, split L1 instruction and data caches, a shared L2 cache) connected through the memory bus to the main memory, with the following typical values:

    registers:    size ~128 B,        latency 1 cycle,          managed by the compiler
    L1 caches:    size ~32 KB,        latency ~2 cycles,        managed by hardware
    L2 cache:     size ~128-256 KB,   latency ~8 cycles,        managed by hardware
    main memory:  size ~MB,           latency ~16-100 cycles,   managed by the executive]

Figure 1.6: Example of memory hierarchy for a uniprocessor system with two levels of cache, with typical values found on ARM® processors [ARM].

→ this is the case for program loops, where the same instructions are reused several times consecutively.

Caches can be classified depending on the kind of data they store:

• Instruction Cache: contains only program instructions,

• Data Cache: contains only program data,

• Unified Cache: can store both instructions and data at the same time.

To be more efficient, modern architectures often comprise several cache levels, each cache being bigger but slower than the previous one. For example, as depicted in Figure 1.6, a processor can have two Level 1 (L1) caches (one for instructions and one for data) and a Level 2 (L2) unified cache. The cache size is also an important issue: according to [KW03], small L1 caches have low access latencies (1 or 2 cycles) whereas larger L1 caches have higher ones.

In the remainder of this PhD work, for the sake of simplicity and unless stated otherwise, we only consider one cache level and, very often, only instruction caches.

III.2. Cache organization

A cache is divided into cache lines of equal size. Each cache line can store one memory block loaded from the main memory. A memory block is a logical partition of the main memory: it is the smallest amount of bytes that can be loaded at a time from the main memory. It can contain several data items (for data caches) or instructions (for instruction caches), so as to exploit spatial locality.

An access to a memory block stored in the cache is classified as a cache hit, whereas an access to a non-cached memory block is classified as a cache miss. A cache miss has a much higher cost than a cache hit (see [MB91]), because the missing block has to be loaded into the cache from the main memory.


[Figure 1.7 omitted: three diagrams, each mapping main-memory addresses 0 to 5 onto the 4 lines of a cache: (a) Direct-Mapped Cache, (b) 2-Way Set-Associative Cache, (c) Fully-Associative Cache.]

Figure 1.7: Different cache mappings.

This miss penalty is highly architecture-dependent, and also depends on whether data caches are considered, because writing back to the main memory may be needed if the data has been modified.

As the amount of data or instructions used by the system may exceed the cache capacity, blocks stored in the cache sometimes have to be replaced by new ones loaded from the main memory. As depicted in Figure 1.7, different strategies can be used to decide to which cache line a given memory block is mapped, see for example Altmeyer and Burguière [AB11]:

• direct-mapped caches: a memory block has only one possible location in the cache (one given cache line to which it can be mapped), depending on its address, as shown in Figure 1.7a,

• fully-associative caches: a memory block can be mapped to any line, depending on the cache history (i.e. previous accesses, blocks already in the cache) and a replacement policy (see below), as shown in Figure 1.7c,

• set-associative caches, the intermediate case: the cache is divided into sets with an equal number of lines (called the cache associativity); a given memory block can only be mapped to a particular set, depending on its address, but can then be placed anywhere within that set, depending on the cache history and a replacement policy, as shown in Figure 1.7b.
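The three mappings above can be illustrated by a small sketch computing which cache lines a given block may occupy. This is only an illustration: the helper `candidate_lines` and the convention that set s occupies lines s·ways to s·ways + ways − 1 of a 4-line cache (as in Figure 1.7) are assumptions, not part of any real hardware description.

```python
# Sketch: possible cache lines for a memory block in a 4-line cache,
# depending on the cache organization.

def candidate_lines(block_index: int, num_lines: int, ways: int) -> list[int]:
    """Return the cache lines a block may occupy.

    ways == 1         -> direct-mapped (one possible line)
    ways == num_lines -> fully-associative (any line)
    otherwise         -> set-associative (any line of one set)
    """
    num_sets = num_lines // ways
    s = block_index % num_sets          # set selected by the block address
    return [s * ways + w for w in range(ways)]

# Block number 5 of the main memory, 4-line cache (cf. Figure 1.7):
print(candidate_lines(5, 4, 1))  # direct-mapped: [1]
print(candidate_lines(5, 4, 2))  # 2-way set-associative: [2, 3]
print(candidate_lines(5, 4, 4))  # fully-associative: [0, 1, 2, 3]
```

The direct-mapped case degenerates to a single candidate line, and the fully-associative case to all lines, which matches the two extreme diagrams of Figure 1.7.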

When a memory access is performed by the CPU, the cache has to be checked to determine whether the corresponding memory block is already cached. To do so, the memory block address in the main memory is split (as shown in Figure 1.8) to find the index of the cache set to which this block should be mapped. Then the tag is compared to the tag of each cache line in that set. If the tags match, the memory block is already cached, and the appropriate part of the memory block (as a block may contain, for example, several instructions) is returned using the block offset.

Example 1.4: Consider a 2-way set-associative cache with a total size of 64 bytes and a line size (and thus a memory block size) of 8 bytes. This cache is therefore divided into 4 sets. We consider once again the ARM7 instruction set, where each instruction is stored using 4 bytes, so each memory block contains 2 instructions.
Consider that the CPU requests the instruction stored at address 0x0000000c, which corresponds to b00000000000000000000000000001100 in binary notation. This address is


interpreted as follows:

    000000000000000000000000000 | 01 | 100
              tag                index  offset

[Figure 1.8 omitted: a memory address is split into a tag, a set index and a block offset; each cache set stores one tag per line, against which the address tag is compared.]

Figure 1.8: Cache organization of a k-way set-associative cache.

Bits 3 and 4 (numbering bits from 0, starting from the right) correspond to the cache set index: b01, i.e. the second cache set. Then the tag (b000000000000000000000000000) is used to determine whether the block is in that cache set. If so, the instruction can be found at offset b100, i.e. 4 bytes from the block start (it is actually the second instruction contained in the memory block).
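The address decomposition of Example 1.4 can be reproduced with a few bit operations. This is a sketch using the parameters of the example (64-byte, 2-way cache with 8-byte lines, hence 4 sets); `split_address` is a hypothetical helper, not part of any real tool.

```python
# Split a 32-bit address into (tag, set index, block offset) for a
# set-associative cache. Parameters match Example 1.4: 8-byte lines,
# 64 / (2 * 8) = 4 sets.

def split_address(addr, line_size=8, num_sets=4):
    offset_bits = line_size.bit_length() - 1   # log2(8) = 3
    index_bits = num_sets.bit_length() - 1     # log2(4) = 2
    offset = addr & (line_size - 1)            # low bits: byte within the block
    index = (addr >> offset_bits) & (num_sets - 1)  # middle bits: cache set
    tag = addr >> (offset_bits + index_bits)   # remaining high bits
    return tag, index, offset

tag, index, offset = split_address(0x0000000C)
print(tag, index, offset)  # 0 1 4 -> set b01 (the second set), 4 bytes into the block
```

The result matches the example: index b01 selects the second set, and offset b100 points to the second instruction of the block.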

For fully- and set-associative caches, different replacement policies exist:

• offline policies, for which the sequence of accessed memory blocks is known a priori:

– an optimal replacement policy has been proposed by Belady [Bel66]: the memory block that will be reused the furthest in the future is replaced.

• online policies, for which the sequence of accessed memory blocks is not known a priori:

– as no online policy can be optimal [ST85], different sub-optimal policies have been proposed:

* Least Recently Used (lru): memory blocks are replaced in the reverse order in which they were used (see Figure 1.9),


* First-In First-Out (fifo): memory blocks are replaced in the order in which they were cached,

* Pseudo lru (plru): a tree-based approximation of lru, which is therefore easier to implement than lru,

* random,
* ...

– fifo and lru have the best possible performance among online cache replacement policies [ST85], and lru is also the best possible online policy in terms of predictability [Rei08].

The idea behind the lru policy is to replace memory blocks in the reverse order in which they were used. So, when a cache miss occurs and the cache set is already full, the least recently referenced block is evicted. To determine which block is the least recently used, an age is associated with each block contained in the cache. Each time another block is accessed, the block's age is increased by one, until the block is evicted from the cache; but if the block itself is accessed, its age is reset to 0.

Example 1.5: Consider the 4-way fully-associative cache depicted in Figure 1.9. The cache initially contains 4 blocks labeled a to d, a being the most recently used one and d the least recently used one. At that point, an access is performed to memory block e. As this block is not yet in the cache, it results in a cache miss: the block is loaded from the main memory and block d is evicted, being the least recently used one. Now block e is the most recently used block and c the least recently used one. Then, an access is performed to block b, which is already in the cache: it is a cache hit. No block is evicted from the cache, only ages change: b becomes the most recently used block.

[Figure 1.9 omitted: cache contents ordered by age, a b c d initially, then e a b c after the access to e (cache miss), then b e a c after the access to b (cache hit).]

Figure 1.9: Example of block replacement for a 4-way fully-associative cache using the lru policy.
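The replacement steps of Example 1.5 can be simulated with a minimal sketch in which a list ordered from most to least recently used plays the role of the ages; this is an illustration, not the thesis's implementation.

```python
# Minimal LRU simulation for a 4-way fully-associative cache.
# The list is ordered from most to least recently used block.

def access(cache, block, capacity=4):
    """Access `block`; return the new cache state and whether it was a hit."""
    hit = block in cache
    if hit:
        cache = [block] + [b for b in cache if b != block]  # age reset to 0
    else:
        cache = ([block] + cache)[:capacity]                # LRU block evicted
    return cache, hit

state = ["a", "b", "c", "d"]      # a most recently used, d least recently used
state, hit = access(state, "e")   # cache miss: d is evicted
print(state, hit)                 # ['e', 'a', 'b', 'c'] False
state, hit = access(state, "b")   # cache hit: only ages change
print(state, hit)                 # ['b', 'e', 'a', 'c'] True
```

The two printed states reproduce the second and third columns of Figure 1.9.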

More details on replacement policies can be found in Reineke's thesis [Rei08].

III.3. Timing anomalies

While cots technologies such as pipelines and cache memories substantially increase processor performance, they make the system behaviour harder to predict. In particular, such processors can face timing anomalies (first introduced by Lundqvist and Stenström [LS99] and later formalized by Reineke et al. [RWT+06]): considering the worst-case behaviour of a component (for example a cache miss) can lead to an optimistic approximation of the overall system behaviour. In particular, having a cache miss at some program point rather than a cache hit may result in a shorter overall execution time.


Example 1.6: Consider a sequence of 4 instructions, denoted I1, I2, I3 and I4, which are dispatched at times t1 < t2 < t3 < t4. Instruction I2 depends on instruction I1 (i.e. it cannot start its execution before I1 is completed) and instruction I4 depends on instruction I3. I1, I4 on the one hand and I2, I3 on the other hand use two different processor resources A and B (for example the Load/Store Unit for I1 and I4 and the Arithmetic Logic Unit for I2 and I3). We consider these resources to be out-of-order (two independent instructions can be executed in a different order than the program order). As depicted in Figure 1.10, I1 being a cache miss results in a shorter execution time for the instruction sequence: the delay due to I1 being loaded from the main memory makes it possible for I3 to execute before I2, and as a result I4 can also be executed earlier.

[Figure 1.10 omitted: two Gantt charts over resources A and B against times t1 to t4. When I1 is a miss, resource A executes I1 then I4 while resource B executes I3 then I2, and the sequence completes early; when I1 is a hit, resource B executes I2 then I3 and the whole sequence completes later.]

Figure 1.10: Example of a timing anomaly: experiencing a cache hit instead of a cache miss leads to a longer execution time.

In the remainder of this PhD work, we only consider fully timing-compositional architectures (according to the architecture classification presented in Cullmann et al. [CFG+10]), i.e. processors in which no timing anomaly occurs.

IV. Conclusion

In this first chapter, we introduced several basic notions related to real-time embedded systems. We then focused on a hardware feature more and more frequently used in real-time embedded systems, namely cache memories. We presented the main characteristics of such cache memories, as they have an impact on the system analysis. In the next chapter, we introduce real-time scheduling and illustrate the impact of the cache on scheduling.


Chapter 2

Real-Time Scheduling and Cache-Related Preemption Delays

Contents

I Introduction
II Real-Time Tasks
    II.1 Definition
    II.2 Periodic/Sporadic Task characteristics
    II.3 Task Worst-Case Execution Time
III Scheduling Real-Time Tasks
    III.1 Real-Time Scheduling
    III.2 Real-Time Schedulability Analysis
    III.3 Sustainability
    III.4 Computational complexity
IV Scheduling and preemption delays
    IV.1 The problem of preemption delays
    IV.2 Cache-Related Preemption Delays
    IV.3 Effects of crpds on real-time scheduling: Motivational example
V Conclusion

Abstract

This chapter is a brief introduction to the real-time scheduling theory. In particular, we present some basic notions that will be reused throughout this PhD work. First, we describe how a task can be simply modeled. Then, we give a brief overview of real-time scheduling. Finally, we discuss the problem of scheduling real-time tasks with cache memories.


I. Introduction

We give here a very brief introduction to real-time scheduling. Our goal is not to be exhaustive but only to present the main results that will be useful for the remainder of this PhD work. First, we deal with real-time tasks and describe the simple task model that we will consider hereafter. Then, we briefly introduce the real-time scheduling theory. In particular, we present some classic scheduling algorithms that are commonly used throughout the real-time scheduling literature. We also discuss different tests and analyses to ensure the system schedulability. In addition, we give a brief insight into computational complexity theory. Finally, we consider the problem of accounting for cache memories in real-time scheduling: in particular, we briefly present a cache analysis and identify some major issues.

Note that we deal hereafter (and in the remainder of this PhD work) only with uniprocessor systems. As a consequence, the real-time scheduling theory for multiprocessors is not considered.

II. Real-Time Tasks

II.1. De�nition

As stated in the previous chapter, a real-time application is usually divided into several tasks. Each task is made of several instructions and is usually implemented as a program thread, which means that multiple tasks of a same real-time application share the same memory resources. As only one task can be executed at a time by the CPU, each task goes through several states during its existence:

• ready state: the task is ready to be executed by the processor,

• running state: the task is currently being executed by the processor,

• blocked state: the task is not executed by the processor and cannot be executed until some resource (a shared resource, for instance) becomes available,

• sleeping state: the task has completed its current execution and waits for an event to become ready once again.

Tasks can either be executed only once during the system life or be executed several times. Each successive activation of a recurrent task is referred to as a job. Each job of a task executes the code of that task, but execution paths may differ from one job to another (in particular because of conditional statements); as a result, job execution times may differ. Depending on their nature, tasks can be classified as:

• periodic tasks: recurrent tasks whose jobs are released at regular times (based on clock-based interrupts),

• sporadic tasks: recurrent tasks whose job releases are not regular (usually triggered by external events) but for which a minimal inter-arrival time between two consecutive jobs is known,

• aperiodic tasks: non-recurrent tasks, or recurrent tasks for which no minimal inter-arrival time between two consecutive jobs is known.


[Figure 2.1 omitted: timeline of a periodic task τi, with job activations at oi, oi + Ti, oi + 2 × Ti, ..., oi + (j − 1) × Ti; each job executes for up to Ci within a scheduling window bounded by its relative deadline Di; a preemption and a resume are shown inside one window.]

Figure 2.1: Schematic representation of the main timing parameters of a real-time periodic task.

In real-time systems, most tasks are actually recurrent (i.e. either periodic or sporadic), as they correspond to repetitive activities such as reading a sensor or applying a command to an actuator. So, in the remainder of this PhD work, we focus only on periodic and sporadic tasks.

Real-time tasks are subjected to timing constraints called deadlines: their computation must be completed before a given date. For hard real-time tasks, no deadline can be missed without compromising the system. On the contrary, for firm or soft real-time tasks, some deadlines can be missed, but very often this delay has to be bounded, see for example [But11]. In this PhD work, we consider only hard real-time tasks.

Tasks can share some resources, for example variables protected by semaphores. Sometimes, tasks may also have to be executed in a certain order, i.e. they are subjected to precedence constraints. For the sake of simplicity, and as is the case in many works in the real-time scheduling literature, we consider hereafter only independent tasks (i.e. tasks without any precedence constraint) that do not share any resource.

II.2. Periodic/Sporadic Task characteristics

Hereafter, we consider only hard real-time periodic or sporadic tasks. The set of tasks executing on the system CPU is denoted T. Very often, when analyzing a system, it is not necessary to deal directly with the task code. Instead, a task can be modeled using the timing parameters depicted in Figure 2.1:

• an offset oi corresponding to the time at which the task becomes available for execution for the first time:

– synchronous or synchronously-released tasks: all tasks have the same offset (assumed to be 0: oi = 0, ∀i),

– asynchronous or asynchronously-released tasks: not all task offsets are equal.

• a Worst-Case Execution Time (wcet) Ci corresponding to the maximal amount of time requested by the task when executed on its own.


• a period Ti corresponding:

– for periodic tasks, to the exact amount of time between two consecutive activations of the task,

– for sporadic tasks, to the minimal amount of time between two consecutive activations of the task.

• a relative deadline Di corresponding to the maximal amount of time available for execution, counted from the task activation:

– tasks with implicit deadlines: task deadlines are equal to periods (Di = Ti, ∀i),
– tasks with constrained deadlines: task deadlines are no greater than periods (Di ≤ Ti, ∀i),
– tasks with arbitrary deadlines: task deadlines can be greater than periods.

As a result, a periodic or sporadic task can be represented by the following tuple:

τi(oi, Ci, Ti, Di) or τi(Ci, Ti, Di) when oi = 0, ∀τi ∈ T

A periodic or sporadic task generates an infinite number of jobs during the system life. Each job is modeled by:

Ji(ri, pi, di)

ri being the job release date (i.e. the time at which the job becomes ready to be executed), pi its execution time and di its absolute deadline.

Very often, the successive jobs of a periodic/sporadic task τi are denoted Jij, j ≥ 1. Usually, the execution time of each job of a task is assumed to be equal to the task wcet: pij = Ci, ∀j. As a result, all jobs of a task can be treated the same way. Note that, very often, both in the real-time literature and in this PhD work, the word task is used instead of job. For example, we could say "τj is executed after τi at time 4" instead of "the 2nd job of τj is executed after the 3rd job of τi at time 4".

For periodic tasks, it is easy to know the set of jobs that will be executed during the system life: one job is released every Ti units of time, i.e. rij = oi + (j − 1) × Ti, ∀j ≥ 1. In particular, for periodic tasks, we usually consider the jobs generated over the hyperperiod H, which corresponds to the least common multiple of the task periods. However, for sporadic tasks, Ti is only a lower bound: job releases are separated by at least Ti units of time. As a consequence, a sporadic task can generate several real-time instances, i.e. sets of jobs, as explained for example in [FGB10].

Example 2.1: Consider the following sporadic task: τ1(0,1,2,2). This task can issue the following set of jobs: J11(0,1,2), J12(2,1,4), J13(4,1,6)..., but it can also generate the following set: J11(0,1,2), J12(5,1,7), J13(7,1,9)...
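For periodic tasks, the job release dates over one hyperperiod follow directly from rij = oi + (j − 1) × Ti. The sketch below uses a hypothetical two-task set; the task names and parameters are only illustrative.

```python
# Release dates of periodic jobs over one hyperperiod H = lcm of periods.
from math import lcm

tasks = {"tau1": (0, 2), "tau2": (1, 3)}   # name: (offset oi, period Ti)

H = lcm(*(T for _, T in tasks.values()))   # hyperperiod

releases = {
    name: [o + k * T for k in range(H // T)]   # rij = oi + (j - 1) * Ti
    for name, (o, T) in tasks.items()
}
print(H)         # 6
print(releases)  # {'tau1': [0, 2, 4], 'tau2': [1, 4]}
```

For a sporadic task, as Example 2.1 shows, no such enumeration is possible: Ti only lower-bounds the separation between releases, so many release sequences are compatible with the same task.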

The processor utilization of a task corresponds to the fraction of the CPU time required for executing the successive jobs of the task:

    ui = Ci / Ti


and the total processor utilization of a taskset T is equal to:

    U = ∑ ui   over all tasks τi ∈ T
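These two formulas translate directly into a few lines of code (the task parameters below are hypothetical, chosen only to illustrate the computation):

```python
# Per-task and total processor utilization, each task given as (Ci, Ti).
tasks = [(1, 4), (2, 8), (3, 12)]   # (wcet Ci, period Ti)

utilizations = [C / T for C, T in tasks]   # ui = Ci / Ti
U = sum(utilizations)                      # total utilization
print(utilizations)  # [0.25, 0.25, 0.25]
print(U)             # 0.75
```

On a uniprocessor, U > 1 would mean the taskset demands more CPU time than is available, so U ≤ 1 is a necessary condition for schedulability.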

II.3. Task Worst-Case Execution Time

As real-time scheduling focuses on ensuring timing requirements, the time needed for a task to complete must be known. But, as depicted in Figure 2.2, this execution time is highly dependent on the possible inputs of the task and on the hardware behavior (pipeline and cache states for example). As stated before, worst-case scenarios have to be constructed in order to ensure predictability: so the worst-case execution time (wcet) of each task has to be considered.

[Figure 2.2 omitted: distribution of the number of occurrences of each execution time; the measured execution times form a subset of the possible execution times, which range from the bcet to the wcet.]

Figure 2.2: Possible execution times for a task. bcet corresponds to the Best-Case Execution Time of the task.

Timing analysis. The wcet of a task can be obtained using measurement-based methods, as presented for example in [Pet00]. To get a safe value for the wcet, all possible execution paths have to be measured in order to find the longest one; otherwise, the measured execution time might be an underestimation of the wcet, which is unacceptable for hard real-time systems. But, as stated for example in [Pua02, The04], it is almost always impossible to conduct exhaustive testing on a system, and worst-case inputs are often hard to determine.

So, a static analysis is often used instead of measurement-based methods: the task code is analyzed in combination with a model representing the hardware behavior. As depicted in Figure 2.3, the analysis is usually performed in two steps:

1. A low-level analysis computes the worst-case execution times for each instruction of the task. Because of complex architectures, the analysis has to deal with instruction overlap caused by pipelines, branch prediction, or caches. As a consequence, the whole task code has to be considered during the analysis. An example of pipeline (respectively branch prediction) analysis can be found in [The04] (respectively [CP000, BR07]).

2. Using the values computed at the previous step, a high-level analysis determines the longest execution path (also called the worst-case execution path) of the task. The wcet corresponds to the task execution time along this worst-case execution path. To estimate the worst-case execution path, and so compute the wcet, several analyses can be used. They are usually based on the task Control Flow Graph (cfg), in which each block corresponds to a sequence of instructions


[Figure 2.3 omitted: wcet estimation chain. The executable, its cfg and its cg feed a microarchitecture analysis (value analysis, then cache/pipeline analysis) that produces execution times; a path analysis (ILP generator followed by an ILP solver) then yields the wcet.]

Figure 2.3: wcet estimation chain.

without conditional jumps and each edge to a control flow, and the task Call Graph (cg), in which each block corresponds to a function and each edge to a function call. Very often (as in the wcet analysis tool aiT1), the Implicit Path Enumeration Technique (ipet), introduced in [LM95] and later extended in [TFW00, The02], is used. The idea is to use integer linear programming: the objective function corresponds to the wcet and the constraints represent structural aspects (incoming and outgoing edges in the task Control Flow Graph) and functional ones (loop bounds, mutually exclusive paths). An example of a simplified wcet computation using ipet is depicted in Figure 2.4. Other techniques, such as the tree-based analysis proposed in [CP000], can also be used.

More details on wcet computation methods and tools can be found in the survey by Wilhelm et al. [WEE+08].

Accounting for the cache effects. The easiest way to compute an upper bound on the wcet when dealing with cache memories would be to consider all memory accesses to be cache misses. But such an approach leads to highly pessimistic results, in particular for tasks with loops: for example, as stated in [Rei08], in some cases a system can have its performance divided by a factor of up to 20 when caches are disabled. So, a cache analysis has to be conducted to determine which accesses will result in hits and which ones will be misses. To do so, different cache analyses have been developed, mainly for instruction caches as they are easier to study, see for example [Mue00, FW99]. Data caches have also been studied, for example in [FW99] and [RM06a], but the analysis is more complex, as potential write operations to the cache (and then to the main memory) might occur, see [FW99]; moreover, data addresses cannot always be determined statically, see [WMH+97]. For more details about cache analyses, and in particular how they are generalized to set- and fully-associative caches, see for example the survey presented in [LGY+15]. Hereafter, we focus on the cache analyses presented in [FW99].

Different cache states can co-exist at a given program point (i.e. when switching to the next instruction in the program code) due, for example, to loop iterations. So abstract cache states are used to represent all the possible concrete states of the cache at that point. Three fixpoint analyses, the must, may and persistence analyses, are used to categorize the different instructions of a program.

The must analysis is used to determine the Always Hit (AH) references, i.e. referenced memory blocks that are already in the cache at a given program point (independently of the previous execution path) and so will not incur any additional cost if re-referenced later on. At each program point, the cache must content corresponds to the memory blocks that are guaranteed to be in the cache.

1. http://www.absint.com/ait/


[Figure 2.4 omitted. Panel (a): instruction addresses 0x00000000 to 0x00000040 are grouped pairwise into memory blocks a to i, themselves partitioned into cfg Nodes 1 to 7. Panel (b): the task cfg: an entry edge d1 into Node 1, a conditional branch (edges d12 and d13) towards Nodes 2 and 3 joining at Node 4 (edges d24 and d34), an edge d45 to Node 5, a loop between Nodes 5 and 6 (edges d56 and d65), and an exit through Node 7 (edges d57 and d7); P1 and P2 denote two program points. Panel (c): the ipet formulation:

    max wcet = ∑ ci · xi

    subject to the structural constraints:
        x1 = d1 = d12 + d13
        x2 = d12 = d24
        x3 = d13 = d34
        x4 = d24 + d34 = d45
        x5 = d45 + d65 = d56 + d57
        x6 = d56 = d65
        x7 = d57 = d7

    and the functional constraints:
        d1 = d7 = 1
        x6 ≤ 5]

Figure 2.4: Example of wcet computation using ipet (ci stands for the execution time of Node i, xi for the number of times the node is executed and dij for the number of times the edge between Nodes i and j is taken). The loop is executed at least once and at most five times. A direct-mapped cache with 4 lines is assumed.
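For a cfg as small as that of Figure 2.4, the worst-case path can even be found by brute-force path enumeration instead of solving the ILP; the sketch below does exactly that, with assumed node execution times ci (real ipet tools hand the constraints to an ILP solver instead).

```python
# Brute-force worst-case path for the cfg of Figure 2.4: take one branch
# (Node 2 or 3), then iterate the 5-6 loop between 1 and 5 times.
from itertools import product

cost = {1: 2, 2: 3, 3: 5, 4: 1, 5: 2, 6: 4, 7: 1}  # assumed values of ci

def path(branch, k):
    """Execution path taking `branch` (Node 2 or 3) and k loop iterations."""
    return [1, branch, 4] + [5, 6] * k + [5, 7]

wcet = max(sum(cost[n] for n in path(b, k))
           for b, k in product([2, 3], range(1, 6)))  # 1 <= x6 <= 5
print(wcet)  # 41: branch through Node 3, loop taken 5 times
```

Enumeration is exponential in general, which is precisely why ipet encodes the same maximization as an integer linear program.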

Example 2.2: Consider program point P1 of the cfg depicted in Figure 2.4b. Memory block a is always loaded into the cache, as Node 1 is always executed. Then, if the left branch of the conditional statement (i.e. Node 2) has been executed, blocks b and c will also be in the cache; otherwise, if the right branch (i.e. Node 3) is executed, then blocks c and d will be loaded into the cache. So, only memory blocks a and c are guaranteed to be cached at program point P1; indeed, the presence of b and d depends on the path taken to reach P1.

The may analysis is used to determine the Always Miss (AM) references. At each program point, the cache may content corresponds to the blocks that may have been accessed before and may not yet have been evicted from the cache.

Example 2.3: At program point P1 of the cfg depicted in Figure 2.4b, memory blocks a, b, c and d may be in the cache, as those blocks are mapped to different locations in the cache. At program point P2, memory blocks b, c, d and e may be in the cache, but not a, as e will have evicted it.


AM references are then computed by taking the complement of the may content. References that are in the may content but not in the must content are said to be Non-Classified; usually, to upper-bound the wcet, these references are treated as AM.

When loops with at least one iteration are considered, a same program point is reached several times. It might be a miss when executed for the first time, but if the loop code is small enough (so as not to overwrite that cache location) every later access will be a hit. So, for the cache analysis, the first iteration of the loop is virtually unrolled.

Note that Ferdinand and Wilhelm also introduce a persistence analysis. It reduces the pessimism of the cache analysis, in particular for conditional statements inside loops; see [FW99] for further details.
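The must-join of Example 2.2 can be sketched as follows. This is an illustration assuming the direct-mapped 4-line cache of Figure 2.4; the actual analyses of [FW99] operate on abstract cache states with ages, which this sketch simplifies to concrete line contents.

```python
# Must-analysis sketch for a direct-mapped cache: a state maps each cache
# line to the block it holds; at a join point, a block is guaranteed to be
# cached only if every incoming path agrees on it.

LINES = 4
def line(block):
    """Blocks a..i map to lines 0..3 in turn (direct-mapped)."""
    return (ord(block) - ord("a")) % LINES

def load(state, blocks):
    """Return the cache state after loading `blocks` in order."""
    state = dict(state)
    for b in blocks:
        state[line(b)] = b       # the new block evicts the line's occupant
    return state

def join(s1, s2):
    """Must-join: keep only the lines on which both incoming states agree."""
    return {l: b for l, b in s1.items() if s2.get(l) == b}

after_node1 = load({}, ["a"])                 # Node 1 is always executed
via_node2 = load(after_node1, ["b", "c"])     # left branch (Node 2)
via_node3 = load(after_node1, ["c", "d"])     # right branch (Node 3)
must_at_P1 = join(via_node2, via_node3)
print(sorted(must_at_P1.values()))  # ['a', 'c'], as in Example 2.2
```

Replacing the intersection-like `join` with a union over possibly-cached blocks per line gives the corresponding sketch of the may analysis.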

III. Scheduling Real-Time Tasks

We recall that we deal in this PhD work only with uniprocessor systems. Details on multiprocessor scheduling can be found for example in [DB11].

As multiple tasks share a common processor, the available execution time (provided by the CPU) has to be shared among the tasks. The real-time scheduling theory studies this mapping problem. On one hand, it focuses on finding appropriate mappings with regard to some criterion. On the other hand, it considers the question of ensuring before runtime that the system can execute without any failure.

III.1. Real-Time Scheduling

Usually, a real-time application is divided into several tasks (i.e. concurrent threads having to be executed on a CPU). As we consider here only a uniprocessor system, there is only one processing unit, which can deal with at most one task at a time. So, as explained for example in [But11], a scheduling strategy is needed to determine the order in which the tasks will be executed on that single processor in order for all timing requirements (in particular deadlines) to be met. The set of tasks assigned to a given processor is noted τ.

A schedule corresponds to the timing allocation of tasks to the processor. As each task might issue several jobs during the system life, the schedule describes at every instant of the system life which job is executing and for how much time. More formally, a schedule can be defined as:

Definition 3. (from [BG04]) A schedule S of a set of jobs J (issued from the tasks executing on the system) executing on a common uniprocessor corresponds to:

S : R× J → {0, 1}

S(t, Ji) is equal to 1 if Job Ji is executed on the processor at time t, 0 otherwise. Schedules are usually represented using Gantt charts as depicted for example in Figure 2.5.

Offline vs. Online. Two main approaches exist to construct a schedule for a given taskset. On one hand, a schedule can be constructed offline, i.e. prior to the system execution. At runtime, a dispatcher (implemented in the real-time kernel) simply reads a precomputed table storing every scheduling decision (which job to execute at each instant). On the other hand, scheduling decisions


CHAPTER 2. REAL-TIME SCHEDULING AND CACHE-RELATED PREEMPTION DELAYS


Figure 2.5: Schedule example for two synchronous periodic tasks τ1(2, 5, 5) and τ2(5, 10, 10) over the hyperperiod H = 10.

can be taken online, i.e. during the system execution. At runtime, a scheduler (implemented in the real-time kernel) chooses at each instant which job to execute based on a scheduling algorithm.

Offline scheduling strategies are often used in real-time systems, as specified for example in ARINC 653², because they require less runtime overhead than online ones and do not suffer any scheduling anomaly [XP93]. A complete knowledge of the whole system life [Mok83], i.e. all jobs that will be issued and all their parameters, is needed to compute an offline schedule. As a result, offline scheduling is only suited for periodic tasks.

On the contrary, online scheduling algorithms are more flexible as they can deal with unpredicted events and in particular sporadic and aperiodic tasks. So online scheduling is very popular in the real-time scheduling literature.
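The table-driven dispatcher described above can be sketched as follows. This is only an illustrative sketch: the table entries, function names, and slot boundaries are ours (chosen to match the schedule of Figure 2.5, where τ2's job J21 is split by a preemption), not an actual implementation from this thesis.

```python
# Sketch of a table-driven dispatcher for offline scheduling: the schedule is
# a precomputed table of (end_of_slot, job) entries over one hyperperiod H,
# replayed cyclically at runtime. Table values are illustrative.
H = 10
schedule_table = [(2, "J11"), (5, "J21"), (7, "J12"), (9, "J21"), (10, None)]

def dispatch(t):
    """Return the job to execute at absolute time t (None = processor idle)."""
    t_mod = t % H  # the offline schedule repeats every hyperperiod
    for end_of_slot, job in schedule_table:
        if t_mod < end_of_slot:
            return job

print([dispatch(t) for t in (0, 6, 8, 12)])  # ['J11', 'J12', 'J21', 'J21']
```

At runtime the dispatcher performs no scheduling decision of its own, which is precisely why offline scheduling incurs so little overhead.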

Main concepts. We present here some definitions which will be useful for the remainder of this PhD work:

Definition 4. A scheduling algorithm is work-conserving if it does not leave the CPU idle if there is at least one job ready to be executed.

Definition 5. A feasible schedule for Taskset T is a schedule for which all the jobs issued by the tasks from T are executed and all timing requirements are met.

Definition 6. A taskset is feasible if at least one feasible schedule can be constructed for this taskset.

For example, we have:

Property 1. A periodic or sporadic taskset with implicit deadlines is feasible on a uniprocessor system if and only if U ≤ 1.
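As an illustrative sketch, the test of Property 1 can be checked directly from the task parameters; the (Ci, Ti) pairs below are those of the tasks of Figure 2.5, and the function names are ours:

```python
# Utilization-based feasibility check (Property 1): a periodic or sporadic
# taskset with implicit deadlines is feasible on a uniprocessor iff U <= 1.

def utilization(taskset):
    """U = sum of Ci/Ti over all tasks (Ci = wcet, Ti = period)."""
    return sum(c / t for c, t in taskset)

def feasible_implicit_deadlines(taskset):
    return utilization(taskset) <= 1.0

tasks = [(2, 5), (5, 10)]  # (Ci, Ti), e.g. tau1(2,5,5) and tau2(5,10,10)
print(utilization(tasks))                  # 0.9
print(feasible_implicit_deadlines(tasks))  # True
```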

Definition 7. A taskset is schedulable with a given scheduling algorithm if the schedule constructed by the scheduling algorithm for this taskset is feasible.

Note that a taskset can be feasible yet unschedulable with a given scheduling algorithm.

Definition 8. A scheduling algorithm is optimal if it can construct a feasible schedule for every feasible taskset.

Usually, we study optimality for subsets of systems with given characteristics: synchronous periodic tasks with implicit deadlines, etc.

². http://www.aviation-ia.com/standards/index.html


Preemptive vs. Non-preemptive. When scheduling a set of tasks, a scheduling algorithm can either interrupt any executing job when needed to start executing another job, which is called preemptive scheduling, or on the contrary has to wait for the job to complete its execution, which corresponds to non-preemptive scheduling. In general, preemptive scheduling allows more flexibility and so is more suited to devise optimal schedulers. But in some cases (for example for fixed-task priority scheduling), neither preemptive scheduling nor non-preemptive scheduling dominates the other one.

Online scheduling algorithms. In this PhD work we deal mainly with online scheduling. We present hereafter three online scheduling algorithms, which are the most commonly studied in the real-time scheduling literature. These algorithms use priorities to decide which job to execute at each instant:

- fixed-task priority scheduling (FTP): all jobs of a task have the same priority, which does not change throughout the application life,

  - the Rate Monotonic algorithm (rm) [LL73] (for tasks with implicit deadlines): priorities are assigned in inverse order of task periods (the shorter the period, the higher the priority),

  - the Deadline Monotonic algorithm (dm) [LW82] (for tasks with constrained deadlines): priorities are assigned in inverse order of relative deadlines,

- fixed-job priority scheduling (FJP): each job of a task has its own priority,

  - the Earliest Deadline First algorithm (edf) [LL73]: priorities are assigned to jobs according to their urgency, characterized by the proximity of their absolute deadline (the closer the deadline, the higher the priority).

Note that dynamic scheduling algorithms also exist: priorities can be recomputed at any time by the scheduler.

Property 2. (from [LL73]) edf is optimal for periodic and sporadic tasksets.

III.2. Real-Time Schedulability Analysis

Under offline scheduling, the schedule is correct by construction. But when using an online scheduling policy, schedulability has to be studied offline to ensure beforehand that the scheduler will be able to schedule the taskset at runtime without missing any deadline.

Several schedulability tests or analyses have been devised in the real-time scheduling literature depending on tasks and scheduling policies. Such tests/analyses can be sufficient, necessary or exact. We detail hereafter some classic schedulability tests/analyses for rm, dm and edf.

Simulation. For periodic tasks, schedulability can be ensured by simulating offline rm, dm or edf on the time interval [0, H) for synchronous releases (respectively [0, max_{τi∈T}{oi} + 2·H) for asynchronous releases) [LM80].
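The simulation test can be sketched for the synchronous case as a minimal discrete-time EDF simulation over [0, H), assuming integer task parameters. This is an illustrative sketch (the function name and the tuple layout are ours), not the exact procedure of [LM80]:

```python
# Minimal sketch: offline simulation of preemptive EDF for synchronous
# periodic tasks over one hyperperiod [0, H), with integer parameters.
# Each task is a (C, D, T) tuple; values below are illustrative.
from math import gcd
from functools import reduce

def edf_simulate(tasks):
    H = reduce(lambda a, b: a * b // gcd(a, b), (t[2] for t in tasks))
    jobs = []  # one [abs_deadline, release, remaining] entry per job in [0, H)
    for (C, D, T) in tasks:
        for r in range(0, H, T):
            jobs.append([r + D, r, C])
    for t in range(H):
        ready = [j for j in jobs if j[1] <= t and j[2] > 0]
        if ready:
            job = min(ready, key=lambda j: j[0])  # earliest absolute deadline
            job[2] -= 1
            if job[2] == 0 and t + 1 > job[0]:
                return False  # job completed after its deadline
    return all(j[2] == 0 for j in jobs)  # every job completed within [0, H)

print(edf_simulate([(2, 5, 5), (5, 10, 10)]))  # True: U = 0.9 <= 1
```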


rm and dm. Note first that U ≤ 1 is only a necessary but not sufficient test under rm/dm. For periodic or sporadic tasks with implicit deadlines scheduled using rm (or equivalently dm, as deadlines are equal to periods), the Liu and Layland utilization bound, introduced in [LL73], is a sufficient schedulability test:

U ≤ n · (2^(1/n) − 1)

where n is the size of the taskset.

For synchronous periodic or sporadic tasks with implicit or constrained deadlines, an exact schedulability analysis for Fixed-Task Priority schedulers, called Response Time Analysis (rta), has been introduced in [JP86]:

∀τi ∈ T, Ri = Ci + Σ_{∀j∈hp(i)} ⌈Ri/Tj⌉ · Cj ≤ Di

where hp(i) represents the set of tasks with higher priorities than the priority of τi. The smallest fixed-point of the equation given above corresponds to the worst-case response time Ri. rta is based on the notion of critical instant [LL73]: it corresponds to the time instant ensuring that every task response time will be maximized. For synchronous periodic or sporadic tasks with implicit or constrained deadlines, this critical instant occurs at time zero (i.e. when all tasks are starting simultaneously). So, predictability is ensured as the worst-case scenario is considered.
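The rta fixed-point iteration can be sketched as follows, assuming integer parameters and tasks listed in decreasing priority order (the function name and tuple layout are ours; the iteration stops as soon as Ri exceeds Di):

```python
# Sketch of the Response Time Analysis fixed point: tasks are listed in
# decreasing priority order as (C, T, D) tuples; values are illustrative.
from math import ceil

def response_time(i, tasks):
    """Smallest fixed point of Ri = Ci + sum_{j in hp(i)} ceil(Ri/Tj)*Cj."""
    C, T, D = tasks[i]
    R = C
    while True:
        R_next = C + sum(ceil(R / Tj) * Cj for (Cj, Tj, Dj) in tasks[:i])
        if R_next == R:
            return R
        if R_next > D:
            return None  # unschedulable: response time exceeds the deadline
        R = R_next

tasks = [(2, 5, 5), (5, 10, 10)]  # rm order: shorter period = higher priority
print([response_time(i, tasks) for i in range(len(tasks))])  # [2, 9]
```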

edf. For periodic or sporadic tasks with implicit deadlines scheduled using edf, U ≤ 1 is an exact schedulability test as edf is an optimal scheduler in this case.

For periodic or sporadic tasks (with either implicit, constrained or arbitrary deadlines), an exact schedulability analysis, called Processor Demand Analysis (pda), has been introduced in [BMR90]:

∀t > 0, h(t) = Σ_{∀τi∈T} max{0, 1 + ⌊(t − Di)/Ti⌋} · Ci ≤ t

The previous condition needs to be checked for every t (in practice, only at the absolute deadlines) within a bounded time interval [BMR90].
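A sketch of the pda check, simplifying the bounded interval of [BMR90] to one hyperperiod of synchronous tasks and testing h(t) only at absolute deadlines (the function names and the (C, D, T) tuple layout are ours):

```python
# Sketch of the Processor Demand Analysis for EDF: h(t) must not exceed t
# at every absolute deadline t within the test interval (here, one
# hyperperiod for synchronous tasks, as a simplification).
from math import gcd
from functools import reduce

def demand(t, tasks):
    """h(t) = sum over tasks of max(0, 1 + floor((t - Di)/Ti)) * Ci."""
    return sum(max(0, 1 + (t - D) // T) * C for (C, D, T) in tasks)

def pda_schedulable(tasks):
    H = reduce(lambda a, b: a * b // gcd(a, b), (T for (_, _, T) in tasks))
    deadlines = sorted({r + D for (_, D, T) in tasks for r in range(0, H, T)})
    return all(demand(t, tasks) <= t for t in deadlines)

tasks = [(2, 4, 5), (5, 10, 10)]  # constrained deadlines, illustrative values
print(pda_schedulable(tasks))  # True
```

Python's integer `//` is a floor division, so it matches the ⌊·⌋ in h(t) even when t < Di.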

III.3. Sustainability

To ensure predictability, worst-case behaviours have to be considered. But as shown in [BB06], worst-case scenarios are not necessarily obvious to determine. To study this matter, Burns and Baruah introduce the notion of sustainability:

Definition 9. (from [BB08]) A scheduling policy and/or a schedulability test for a scheduling policy is sustainable if any system deemed schedulable by the schedulability test remains schedulable when the parameters of one or more individual task(s) are changed in any, some, or all of the following ways:

1. decreased execution requirements,

2. larger periods, and


3. larger relative deadlines.

Note that, actually, Burns and Baruah also consider the impact of the task jitter. However, we do not deal with this parameter in this PhD work, so we will omit it hereafter.

To be sustainable, a scheduling policy and/or a schedulability test must be sustainable with regard to all the parameters listed above. As stated in [BB08], rm and dm are sustainable with regard to execution requirements and relative deadlines but not with regard to the period parameter. The edf scheduling policy for periodic tasks is sustainable provided some conditions are fulfilled:

- edf is sustainable with regard to execution requirements and relative deadlines for any periodic taskset,

- edf is sustainable with regard to the period parameter for synchronous periodic tasksets.

III.4. Computational complexity

The aim of the computational complexity theory is to classify problems according to the resource (usually computation time or memory space) needed to solve problem instances of arbitrary size. Complexity results help designers in directing their effort toward approaches that have the greatest likelihood of leading to useful algorithms [GJ79].

We deal hereafter only with computation time considerations. Informally, the computational time complexity of an algorithm corresponds to the worst-case time needed by the algorithm to complete its execution for any input of a given size. For a more formal definition, using Turing machines, see for example [GJ79].

A decision problem consists in answering a YES or NO question on a set of inputs called problem instances. Feasibility and schedulability problems are typical decision problems. The complementary problem of a problem corresponds to the decision problem for which positive input instances (YES answer) correspond to the negative ones (NO answer) for the initial problem.

Main complexity classes. The set of decision problems that can be solved in polynomial time in the size of the input forms the complexity class P.

Some decision problems cannot be solved in polynomial time, but a candidate solution (leading to a YES answer) can be verified in polynomial time. This set of decision problems forms the complexity class NP. We have P ⊆ NP.

P and NP are usually assumed to be distinct (P ≠ NP), even if this claim remains unproven. We will consider throughout this PhD work that P ≠ NP.

Some problems in NP are called NP-complete. Intuitively, they correspond to the hardest problems in NP. More formally, NP-completeness can be defined as:

Π ∈ NP-complete ⇔ Π ∈ NP ∧ (∀Π′ ∈ NP, Π′ ∝ Π)

Π′ ∝ Π means that Problem Π′ can be transformed into Problem Π using a function f executing in polynomial time. A list of known NP-complete problems can be found for example in [GJ79].

When a problem can be transformed from an NP-complete problem but cannot be shown to be in NP, then it is called NP-hard. Among NP-complete and NP-hard problems, we distinguish between:



Figure 2.6: Usual representation of the main complexity classes.

- NP-hard problems in the weak sense,

- NP-hard problems in the strong sense.

An NP-hard problem in the weak sense can be solved in pseudo-polynomial time, whereas NP-hard problems in the strong sense require exponential-time algorithms. Defining efficient heuristics for such problems involves determining properties of the underlying combinatorial structures (e.g. graphs, integers, boolean formulas) [Kar72].

The co-NP class corresponds to the set of decision problems whose complementary problems belong to NP. P ⊆ co-NP, and we usually assume P ≠ co-NP and NP ≠ co-NP even if those claims remain unproven.

A (possible) view of the different complexity classes is depicted in Figure 2.6. We present in Table 2.1 some classic complexity results for the feasibility problem for periodic and sporadic tasks executed on a uniprocessor. Note that, if the system utilization is bounded by a constant c, 0 < c < 1, the analysis of sporadic tasks with constrained deadlines is co-NP-complete in the weak sense [EY15a].

IV. Scheduling and preemption delays

Preemptive scheduling is often used throughout the real-time scheduling literature as it allows more flexibility and is needed to achieve optimality in general. In the classic scheduling theory, preemptions are assumed to be performed at zero cost: in particular, context switch costs are considered to be accounted for in the task wcet. But what is the validity of such an assumption, in particular when enhanced features, such as cache memories, are used?


Tasks                          | unbounded utilization (general case) | bounded utilization (0 < c < 1)
asynchronous periodic          | strongly co-NP-complete [BRH90]      | strongly co-NP-complete [BRH90]
synchronous periodic/sporadic  | strongly co-NP-complete [EY15b]      | weakly co-NP-complete [EY15a]

Table 2.1: Classic complexity results for the feasibility problem for periodic/sporadic tasks with constrained deadlines executed on a uniprocessor, from [EY15a].

We first discuss the general problem of preemption delays and then focus on the particular case of preemption delays due to cache memories. In particular, we show what consequences those delays can have on scheduling.

IV.1. The problem of preemption delays

When a preemption occurs, additional delays are actually experienced by the system, see for example [But11]. These preemption costs are due to different sources:

- scheduling delays correspond to the context switch cost and also result from the scheduler invocation,

- pipeline delays are due to the time needed to flush the pipeline when a task is preempted, and to refill it when the task resumes its execution,

- extra-bus interference delays result from potential contentions in the bus used to access the main memory,

- cache-related preemption delays (crpd) occur if a preempted task, when resuming its execution after a preemption, has to reload cache lines that have been evicted by preempting tasks.

Contrary to what is usually assumed in the classic scheduling theory, those delays are not negligible. In particular, crpds can be as high as 40% of the task worst-case execution time (wcet) as shown in [BCSM08, PC07]. So, those additional delays have to be considered alongside the normal execution time of the task.

In the remainder of this PhD work, we only focus on crpds, as the other delays are less penalizing and can often be bounded by a constant [ADM11b]. More information regarding pipeline costs can be found in [SF99, The04]. For bus delays, more details can be found in [WGR+09]. Note that most of the results presented in the remainder of this PhD work for crpds are actually still valid for preemption delays in general.

IV.2. Cache-Related Preemption Delays

As stated in [BN94], the interference the cache has on the total execution time of a task is both:


- intrinsic (also referred to as intra-task): the interference is independent of the execution environment, i.e. of the other tasks of the taskset,

- extrinsic (also referred to as inter-task): the interference depends on the execution environment, i.e. on the other tasks that may preempt the considered task.

Intrinsic cache interference is accounted for in the task wcet and has already been studied in Section II.3. Extrinsic cache interference is responsible for Cache-Related Preemption Delays.

Figure 2.7: Example of Cache-Related Preemption Delay: (a) execution without preemption; (b) execution with a preemption, incurring a Block Reload Time (BRT).

Example 2.4: Consider a task τi. As depicted in Figure 2.7a, τi accesses a memory block a twice. We assume that a is still in the cache when accessed the second time. Now consider that, as depicted in Figure 2.7b, τi is preempted by another task τj before accessing a again. τj accesses Memory block b during its execution. b is assumed to be mapped to the same cache line as a. As a consequence, a is evicted from the cache. So, when τi resumes its execution and accesses a again, a is no longer in the cache and has to be reloaded from the main memory. τi's execution time is thus increased by the Block Reload Time. This additional delay corresponds to the crpd.

In practice, the Cache-Related Preemption Delay consecutive to a given preemption is spread over the remaining task execution, as an additional cost is incurred every time the task, after resuming, references a block that is no longer in the cache because of the interference due to the preempting task. However, in Gantt charts, for the sake of simplicity, preemption delays and in particular crpds are depicted as a whole immediately after the task resumes its execution after a preemption, as shown in Figure 2.8.

As stated for example in [BMSO+96], there are several ways of assessing the extrinsic cache interference penalty associated with every preemption. For a given preemption, the associated Cache-Related Preemption Delay can be upper-bounded by:

1. the time to refill the entire cache,


Figure 2.8: Conventional representation of the crpds: (a) actual crpd; (b) conventional representation.

2. the time to refill all the cache lines evicted by the preemption, i.e. cache lines accessed by preempting tasks,

3. the time to refill all the cache lines used by the preempted task,

4. the time to refill either the maximum number of useful cache lines that the preempted task may hold in the cache at the worst-case instant a preemption may arise, or the intersection of lines between the preempting and preempted tasks.

Assuming that the entire cache is reloaded after each preemption is of course very pessimistic. On the contrary, considering the cache locations used by the preempted task and the preempting ones makes it possible to compute a crpd closer to reality. But such an improvement comes at a complexity cost: the cache content has to be precisely modeled and the task executions have to be accurately known. So, very often, a trade-off between those two approaches has to be adopted.

Cache analyses. To upper-bound the extrinsic cache interference of a task being preempted, several cache analyses can be conducted.

The notion of Useful Cache Block (ucb), first introduced by Lee et al. [LHS+97], is used to assess the maximal possible damage a preempted task can suffer if preempted. ucbs correspond to the reuse of the available cache contents by the preempted task. At a given program point P, a ucb is a cache line used by a memory block which may be in that cache line at P and may be reused at some other program point P′ that can be reached from P [AB11]. Contrary to the Must/May analyses, actual useful memory blocks are not considered. Instead, it is the set of cache lines ucb_{i,P} that may contain them which is computed for Task τi at Program point P: overwriting any of those cache lines will result in additional delays. As it is often difficult to know the location of the preemption in the task code, we consider the maximal set of ucbs over the whole task code. So, the ucb set ucb_i for Task τi does not depend on the actual preemption point in Task τi's code.

Example 2.5: Consider again the cfg depicted in Figure 2.4b. At Program point P2, Memory blocks e and d are sure to be in the cache (as instructions of Nodes 3 and 4 have necessarily been executed). But b (if it is the first iteration of the loop) or f and g (if at least one iteration of the loop has been executed) may also be cached. After P2, f, g (in Node 6) and e (in Node 5) are referenced again. So, if a preempting task overwrites one (or several) of the first three cache lines, at least one block will have to be reloaded later on. As d is never re-referenced, overwriting the fourth cache line does not result in any later reload


time. So there are 3 ucbs at Program point P2: ucb_{i,P2} = {0, 1, 2}. Moreover, we have ucb_i = ucb_{i,P2} = {0, 1, 2}.

Then, the easiest way to upper-bound the crpd for Task τi would be to consider that at each preemption, a delay equal to brt × |ucb_i| is paid, where brt corresponds to the Block Reload Time, i.e. the time needed to load a block from the main memory into the cache.

Similarly, the notion of Evicting Cache Block (ecb) [TD00] assesses the possible damage a preempting task can have on the cache. ecbs correspond to all cache lines that might be used by the task and as a consequence can evict some memory blocks used by a preempted task.

Example 2.6: For the task whose cfg is depicted in Figure 2.4b, all cache lines are used throughout the task execution, so there are four ecbs: ecb_i = {0, 1, 2, 3}.

More details on ecb/ucb computation can be found in [LHS+98, AM11, NMR03, SE07]. How ecbs and ucbs can be used to upper-bound the crpd will be discussed in the remainder of this PhD work.
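As a sketch of how these sets yield per-preemption crpd bounds: the brt value and function names below are illustrative; the ucb/ecb sets are those of Examples 2.5 and 2.6, plus one hypothetical smaller preempting task for contrast.

```python
# Sketch: simple per-preemption CRPD upper bounds from UCB/ECB sets
# (direct-mapped cache, cache lines represented as integers).
BRT = 10  # Block Reload Time, illustrative value (e.g. in cycles)

def crpd_ucb_only(ucb_preempted):
    """Pessimistic bound: every useful cache block may have to be reloaded."""
    return BRT * len(ucb_preempted)

def crpd_ucb_ecb(ucb_preempted, ecb_preempting):
    """Tighter bound: only useful blocks actually evicted by the preempting
    task (intersection of UCBs and ECBs) need to be reloaded."""
    return BRT * len(ucb_preempted & ecb_preempting)

ucb_i = {0, 1, 2}     # UCBs of the preempted task (Example 2.5)
ecb_j = {0, 1, 2, 3}  # ECBs of the preempting task (Example 2.6)
ecb_k = {2, 3}        # hypothetical smaller preempting task
print(crpd_ucb_only(ucb_i))        # 30
print(crpd_ucb_ecb(ucb_i, ecb_j))  # 30 (here all UCBs are evicted)
print(crpd_ucb_ecb(ucb_i, ecb_k))  # 10 (only cache line 2 is both useful and evicted)
```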

IV.3. Effects of crpds on real-time scheduling: Motivational example

As we shall see in the next chapters, preemption delays, and in particular crpds, have severe consequences on scheduling. As stated before, they can represent more than 30% of the wcet [BCSM08]. This increase in execution times might cause a deadline to be missed. So crpds have to be considered when studying schedulability in order to ensure predictability. For example, the simple edf schedulability test U ≤ 1 is no longer valid as soon as preemptions are authorized: for a taskset with U = 1, a single preemption may cause an additional delay and thus the processor capacity will be exceeded.
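The remark on U = 1 can be illustrated with a small computation; the task parameters and the crpd value below are made up for the example:

```python
# Sketch of the motivational remark: for a taskset with U = 1, a single
# preemption adding a CRPD makes the demand over the hyperperiod exceed
# the available processor time, so a deadline will eventually be missed.
tasks = [(2, 4), (5, 10)]  # (Ci, Ti): U = 2/4 + 5/10 = 1.0
H = 20                     # hyperperiod lcm(4, 10)
demand = sum(C * (H // T) for (C, T) in tasks)
print(demand)              # 20 == H: the processor is exactly fully used
crpd = 1                   # one preemption reloading a single cache block
print(demand + crpd > H)   # True: the processor capacity is exceeded
```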

Figure 2.9: crpds and critical instant: (a) the worst-case response time for τ2 is 9 when releases are synchronous; (b) the worst-case response time for τ2 is increased when releases are no longer synchronous.

Considering crpds also has an impact on worst-case scenarios. In particular, the critical instant for fixed-priority tasks does not necessarily correspond anymore to synchronous releases, as shown for example in [RM06a, YS07] and depicted in Figure 2.9. Actually, edf is no longer optimal as soon as crpds are considered, as proven for example in Chapter 4. crpds also affect sustainability, as shown in Chapter 5. As crpds can have a high impact on real-time scheduling, they will be the main focus of the remainder of this PhD work.


V. Conclusion

In this chapter, we introduced real-time scheduling. Then we presented preemption delays and focused on Cache-Related Preemption Delays. We showed that such delays have an impact on scheduling as they can represent up to 40% of a task wcet. So, we deal hereafter with real-time scheduling accounting for Cache-Related Preemption Delays.

As shown in the previous motivational example, crpds make the predictability problem more complex and may also threaten the system schedulability. Actually, taking Cache-Related Preemption Delays into account radically changes the scheduling problem. As cache memories are now commonly found in real-time embedded systems, new task models and new scheduling techniques accounting for crpds are required. In the next chapter, we present the main strategies to deal with crpds that can be found in the real-time literature. Then, in the next part of this PhD work, we study the scheduling problem accounting for crpds and present new scheduling approaches to minimize the impact of those delays.


Chapter 3: Dealing with Cache Memories in Real-Time Scheduling

Contents

I Introduction . . . 95
II Accounting for Cache-Related Preemption Delays . . . 95
  II.1 Into the Worst-Case Execution Times . . . 95
  II.2 Into the schedulability analysis . . . 98
III Memory management . . . 101
  III.1 Cache partitioning . . . 102
  III.2 Cache locking . . . 103
  III.3 Memory layout . . . 105
  III.4 Other techniques . . . 107
IV Enhanced scheduling approaches . . . 107
  IV.1 Limited preemption scheduling . . . 107
  IV.2 Cache-aware scheduling . . . 110
V Prospects . . . 111
VI Conclusion . . . 112

Abstract

Cache memories have a high impact on scheduling. In particular, Cache-Related Preemption Delays make the predictability problem more complex and may also threaten the system schedulability. In this chapter, we summarize the main existing strategies to deal with crpd issues. These strategies range from reducing the crpds at the cache level to modifying the scheduling strategy in order to control preemptions. We finally discuss some possible combinations between those different methods.


I. Introduction

Cache memories highly affect real-time scheduling. Using a system with a cache makes the predictability problem even more complex: instruction and data loading times depend on whether the reference is found in the cache or has to be reloaded from the main memory. Moreover, Cache-Related Preemption Delays are not negligible as they might represent more than 30% of a task wcet [BCSM08]. The easiest way to get rid of these issues would be to disable the cache. But such a radical solution is not always possible on modern hardware and, in any case, leads to a drastic drop in performance. So, cache memories have to be considered when dealing with real-time scheduling.

To deal with cache-related issues, numerous strategies have been proposed. Some consist only in bounding the crpd and incorporating it either directly in the wcet or in the schedulability analysis. Other strategies focus on reducing the main sources of pessimism: the cache behaviour can be modified to reduce or even eliminate possible cache thrashing by other tasks, or the scheduling policy can be adapted to reduce the number of preemptions and/or reduce the overall crpd. Note that, although many of these methods are mutually dependent, they have been mostly studied in isolation from each other.

In Table 3.1 hereafter, we synthesize the main approaches that are used throughout the real-time literature. First, we show how predictability can be ensured as soon as caches are considered. Then, we see how the cache effects can be decreased or even eliminated at the memory management level. Finally, we focus on enhanced scheduling approaches to overcome the main issues driven by the cache. We conclude this chapter by briefly discussing combinations between those different methods.

II. Accounting for Cache-Related Preemption Delays

To ensure determinism and predictability, the additional delays occurring after preemptions due to the cache have to be bounded and accounted for somewhere when dealing with schedulability issues.

On one hand, crpds can be incorporated in wcets, as preemption costs are usually assumed to be in the classic real-time scheduling theory. So, from the scheduling point of view, preemptions are performed at no cost. As a consequence, classic schedulability analyses can still be used. On the other hand, crpds can be accounted for during the schedulability analysis. Thus, classic analyses such as the Response Time Analysis for rm or dm, or the Processor Demand Analysis for edf, have to be modified.

We first present hereafter some classic approaches to include crpds into task wcets. Then, we deal with schedulability analyses accounting for crpds.

II.1. Into the Worst-Case Execution Times

In the classic real-time scheduling theory, possible preemption delays are supposed to be negligible or accounted for in task wcets. As crpds cannot be neglected, one solution to ensure predictability is to bound such costs and include them into the wcets. So the cache effects would be accounted for once and for all, and all classic scheduling results would then be valid assuming those wcets inflated to account for crpds. Hereafter, we denote by Ci the wcet of Task τi computed as if τi was executing fully non-preemptively (i.e. no crpd is accounted for). On the contrary, C̄i stands for the wcet of Task τi accounting for crpds.


CHAPTER 3. CACHE AND REAL-TIME SCHEDULING

Technique                                      References                        §

memory management
  cache partitioning
    fully-partitioning                         [Mue95, PLM09, ADLD14]            III.1
    hybrid-partitioning                        [BMGGW00, BCSM08]                 III.1
  cache locking
    full locking, static                       [CIBM01, FPT07, LLX09, LLX12]     III.2
    full locking, dynamic                      [AP06, PP07, LLX12]               III.2
    partial locking, static                    [DLM13]                           III.2
    partial locking, dynamic                   [DLM14]                           III.2
    partitioning + locking                     [VLX03]                           III.2
  memory layout
    code positioning                           [KW03, FK11]                      III.3
    task positioning                           [GA07, AG08, LAD12]               III.3
  other techniques                             [WA12, ANGM14, WP14, RAG+14]      III.4

timing
  wcet with cache analysis                     [HAM+99, FW99, RM06a]             2/II.3
  wcet with crpd
    preempting task                            [BN94, TD00]                      II.1
    preempted task                             [AB11]                            II.1
    both tasks                                 [TM04, WTA14]                     II.1

scheduling
  schedulability analysis
    preempting task                            [BMSMOC+96]                       II.2
    preempted task                             [LHS+98]                          II.2
    both tasks                                 [ADM12, LAMD13]                   II.2
  preemption control
    fpts with crpd                             [BAVH+14, WGZ15]                  IV.1
    floating-npr with crpd                     [RM08, MNPP12a]                   IV.1
    fpp with crpd                              [ABW09, BXM+11, CTF15]            IV.1
    optimal crpd-aware scheduling              [PRM15a, PRG+15]                  IV.2

Table 3.1: Overview of the different methods taking the cache into account.


The easiest way to compute a wcet accounting for crpds is to assume a cache miss for each memory access. This approach is very pessimistic but general, as it only depends on the task being analyzed and not on any scheduling strategy (so the computed wcet can be used for several task systems scheduled with different algorithms, for example rm and edf). To decrease the pessimism, crpds have to be more precisely bounded before being included into task wcets. Ward et al. [WTA14] distinguish between preemption-centric and task-centric methods.

Preemption-centric methods. These methods, as in [BN94], work from the preempting task's point of view. Each task wcet is increased by the maximal interference it may cause on any task it can preempt:

C̄i = Ci + δi

The upper-bound δi on the preemption delay can be computed as the cost of reloading the entire cache or, less pessimistically, as the damage the preempting task can do to the cache contents. This interference can be modeled using the notion of Evicting Cache Block (ecb) presented in Chapter 2, Section IV. For direct-mapped caches, an upper-bound can be computed straightforwardly as:

δi = brt · |ecbi|

brt being the time to reload a memory block from the main memory into the cache.
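As a rough sketch, the inflation above is a one-line computation over each task's ecb set; all task names and numeric values below are illustrative, not taken from the thesis:

```python
BRT = 0.5  # block reload time (illustrative value)

def inflate_preemption_centric(C, ecbs):
    """Preemption-centric inflation for a direct-mapped cache: C-bar = C + BRT * |ECB|."""
    return C + BRT * len(ecbs)

# Two hypothetical tasks: isolated wcet and the set of cache lines they may evict.
tasks = {"t1": (2.0, {0, 1, 2}), "t2": (4.0, {2, 3})}
inflated = {name: inflate_preemption_centric(C, ecbs) for name, (C, ecbs) in tasks.items()}
print(inflated)  # → {'t1': 3.5, 't2': 5.0}
```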

Task-centric methods. Such methods, as in [Sch00] and [AB11], adopt the point of view of the preempted task. The task wcet is increased by the total interference it can suffer from all the tasks that can preempt it. To do so, an upper-bound γi on the crpd for one preemption is multiplied by an upper-bound ni on the number of preemptions the task may suffer:

C̄i = Ci + ni · γi

To compute the upper-bound on the number of preemptions, several methods have been proposed, depending on the scheduling policy. As stated in [WTA14], for fixed-task priority scheduling (for example under rm or dm), a simple bound is given by:

ni = Σ_{j=1}^{i−1} ⌈Ti / Tj⌉

However, such a bound overestimates the number of potential preemptions. So more precise computations have been proposed in [RM06b] and in [ESLS06], where upper-bounds on the number of preemptions are derived both for fixed-task priority scheduling policies (such as rm and dm) and for edf.
As for the upper-bound γi on the crpd, Useful Cache Blocks (ucbs), presented in Chapter 2, Section IV, can be used:

γi = brt · |ucbi|

Tighter crpd upper-bounds can be computed by using combinations of ucbs and ecbs, as proposed for example in [NMR03] and [TM04].
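Under the same simple fixed-priority preemption-count bound, the task-centric inflation can be sketched as follows (all task parameters are hypothetical; tasks are indexed by decreasing priority, index 0 being the highest-priority task):

```python
import math

BRT = 0.5  # block reload time (illustrative)

def inflate_task_centric(i, C, T, ucbs):
    """Task-centric inflation: C-bar_i = C_i + n_i * gamma_i,
    with n_i = sum_{j < i} ceil(T_i / T_j) and gamma_i = BRT * |UCB_i|."""
    n_i = sum(math.ceil(T[i] / T[j]) for j in range(i))
    gamma_i = BRT * len(ucbs[i])
    return C[i] + n_i * gamma_i

C = [1.0, 2.0, 3.0]        # isolated wcets (hypothetical)
T = [5.0, 10.0, 20.0]      # periods (hypothetical)
ucbs = [{0}, {1, 2}, {3}]  # useful cache blocks per task (hypothetical)

print(inflate_task_centric(2, C, T, ucbs))  # n_2 = 4 + 2 = 6, gamma_2 = 0.5 → 6.0
```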


Combined methods. According to [WTA14], task-centric methods are highly pessimistic when the number of tasks is high, as many possible preemptions have to be considered. On the other hand, preemption-centric methods become highly pessimistic when the task working set sizes (which correspond basically to the amount of cache lines accessed by each task during its execution) vary widely. As a consequence, Ward et al. propose in [WTA14] a mixed approach where part of the crpd is accounted for in the preempting task and the remainder of this delay is included into the wcet of the preempted task.

II.2. Into the Schedulability Analysis

Including crpds into wcets has a major drawback: it often results in overestimated execution times and so in an increased processor utilization. That leads to a waste of resources, as the system has to be oversized: since task average execution times will be far smaller than the wcets, the CPU will be underutilized at runtime. In particular, the number of preemptions is hard to determine accurately. Moreover, this number of preemptions depends on the scheduling policy and on the tasks executing on the system. So, wcets must be recomputed if the adopted scheduling policy is changed, or if tasks are added to or removed from the system (or if their timing characteristics are modified). To overcome these issues, crpds can be considered separately from the wcets. The idea is first to compute an upper-bound on the crpd due to one preemption by a higher priority task, by considering the preempted and/or preempting tasks. These costs are then incorporated at the schedulability analysis level.

II.2.a. Schedulability analyses accounting for crpds

Classic scheduling tests/analyses have to be modified to safely account for preemption delays. For fixed-task priority scheduling (rm or dm), an extended version of the classic Response Time Analysis is proposed in [BMSO+96]:

∀τi,  Ri = Ci + Σ_{∀j∈hp(i)} ⌈Ri / Tj⌉ · (Cj + γi,j) ≤ Di

where γi,j represents an upper-bound on the crpd experienced by Task τi each time it is preempted by a higher priority Task τj. Note that Ci is the wcet of the task considered on its own, i.e. without taking into account possible delays due to other tasks (contrary to C̄i introduced in the previous subsection).
Schedulability tests/analyses for edf can be similarly modified. An extended version of the Processor Demand Analysis accounting for crpds is presented in [LAMD13]:

∀t > 0,  h(t) = Σ_{∀τj∈T} max{ 0, 1 + ⌊(t − Dj) / Tj⌋ } · (Cj + γt,j) ≤ t

where γt,j represents an upper-bound on the crpd caused by a single job of τj preempting jobs of other tasks having both their release dates and deadlines in a time interval of length t.
For periodic tasks with implicit deadlines, a simpler schedulability test can be used [LAMD13]:

Σ_{∀τj∈T} (Cj + γDmax,j) / Tj ≤ 1


where Dmax is the largest relative deadline in the taskset.
Note that neither the Response Time Analysis nor the Processor Demand Analysis remains exact when accounting for crpds: they become only sufficient [ADM12, LAMD13].
In all cases, the main issue is to bound the crpd.
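The implicit-deadline utilization test above amounts to a single sum; a minimal sketch with hypothetical task values:

```python
def edf_util_test(C, T, gamma_dmax):
    """Sufficient EDF test for implicit deadlines: sum (C_j + gamma_Dmax,j) / T_j <= 1."""
    return sum((c + g) / t for c, g, t in zip(C, gamma_dmax, T)) <= 1.0

# Hypothetical taskset: wcets, periods and per-task CRPD upper-bounds.
C = [1.0, 2.0, 4.0]
T = [5.0, 10.0, 20.0]
gamma_dmax = [0.0, 0.5, 0.5]
print(edf_util_test(C, T, gamma_dmax))  # 0.2 + 0.25 + 0.225 = 0.675 → True
```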

II.2.b. Computing upper-bounds on the crpd

crpd bounds can be computed by considering the effects on the cache of the preempting task only, of the preempted task only, or of both of them.

Preempting task. The ecb-only approach focuses on the preempting task to upper-bound the crpd. It can be used for fixed-task priority scheduling [BMSMOC+96, TD00] and for edf scheduling [LAMD13]:

γ^{ecb-only}_{i,j} = γ^{ecb-only}_{t,j} = brt · |ecbj|

Busquets-Mataix et al. show in [BMSMOC+96] that, when using the crpd bound computed by the ecb-only approach, the cache-aware rta clearly outperforms the cached version of the schedulability test (U ≤ 1) presented in [BN94], which uses wcets including crpds.

Preempted task. The preempted task can be considered to compute an upper-bound on the crpd using Useful Cache Blocks (ucbs). For fixed-task priority scheduling, the ucb-only approach can be used [LHS+98]:

γ^{ucb-only}_{i,j} = brt · max_{∀k∈hep(i)∩lp(j)} { |ucbk| }

where hep(i) is the set of tasks of priority higher than or equal to τi's, and lp(j) the set of tasks of lower priority than τj. Note that intermediate tasks (between the preempting one and the preempted one) have to be considered to account for potential nested preemptions.

Example 3.1: Consider three tasks: τ1(2, 9, 9), τ2(2, 9, 9) and τ3(3, 9, 9). We assume a Block Reload Time brt = 1. τ2 has two ucbs and τ3 only one. When computing the response time for τ3, we have to take into account the delay paid by τ2 when preempted by τ1, as this delay postpones τ3's completion:

R3 = C3 + ⌈R3 / T1⌉ · (C1 + γ^{ucb-only}_{3,1}) + ⌈R3 / T2⌉ · (C2 + γ^{ucb-only}_{3,2})

R3 = C3 + ⌈R3 / T1⌉ · (C1 + brt · max{|ucb2|, |ucb3|}) + ⌈R3 / T2⌉ · (C2 + brt · |ucb3|)

R3 = 3 + ⌈R3 / 9⌉ · (2 + 1 × max{2, 1}) + ⌈R3 / 9⌉ · (2 + 1 × 1)

which results in R3 = 10 > D3 = 9, and so the system is deemed unschedulable, as depicted in Figure 3.1.
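The recurrence of Example 3.1 can be solved by fixed-point iteration, stopping as soon as the candidate response time exceeds the deadline; a sketch using the example's values (γ3,1 = 2 and γ3,2 = 1, as computed above):

```python
import math

def response_time_t3(C3=3, C1=2, C2=2, T1=9, T2=9, D3=9, g31=2, g32=1):
    """Fixed-point iteration for R3 = C3 + ceil(R3/T1)*(C1+g31) + ceil(R3/T2)*(C2+g32).
    Defaults are the values of Example 3.1."""
    R = C3
    while True:
        new_R = C3 + math.ceil(R / T1) * (C1 + g31) + math.ceil(R / T2) * (C2 + g32)
        if new_R == R:
            return True, R          # converged: schedulable
        if new_R > D3:
            return False, new_R     # deadline exceeded: deemed unschedulable
        R = new_R

print(response_time_t3())  # → (False, 10): R3 = 10 > D3 = 9
```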


Figure 3.1: Schedule for tasks τ1(2, 9, 9), τ2(2, 9, 9) and τ3(3, 9, 9) in the worst-case scenario. τ2 has 2 ucbs and τ3 1 ucb.

The ucb-only approach can be adapted for edf scheduling [LAMD13]:

γ^{ucb-only}_{t,j} = brt · max_{∀k∈T, t≥Dk>Dj} { |ucbk| }

Both tasks. Considering either the preempting task or the preempted task on its own is very pessimistic: it is possible that those tasks do not conflict in the cache. As a consequence, some systems, deemed unschedulable with either the ecb-only approach or the ucb-only one, are actually schedulable.

Example 3.2: Consider again the following three tasks: τ1(2, 9, 9), τ2(2, 9, 9) and τ3(3, 9, 9). Suppose that τ3's only ucb is mapped to the cache such that it does not conflict with any memory block used by either τ1 or τ2. Then preempting τ3 results in no crpd, and the system is deemed schedulable using a more precise crpd upper-bound.

To decrease the pessimism, the preempted task has to be considered alongside the preempting one. The intersection between the ecbs of the preempting task and the ucbs of the preempted one is considered to get a tighter crpd upper-bound. But, still because of nested preemptions, all intermediate tasks executing while the preempted task is preempted have to be considered, as they may evict cache lines used by the preempted task. Different approaches have been proposed in the literature for both fixed-task priority scheduling and edf. For fixed-task priority scheduling, the ecb-union approach [ADM11a]:

γ^{ecb-union}_{i,j} = brt · max_{∀k∈hep(i)∩lp(j)} | ucbk ∩ ( ⋃_{∀h∈hp(j)∪{j}} ecbh ) |

or the ucb-union approach [TM07]:

γ^{ucb-union}_{i,j} = brt · | ( ⋃_{∀k∈hep(i)∩lp(j)} ucbk ) ∩ ecbj |


allows one to compute a more accurate crpd upper-bound. Similarly, for edf scheduling, an ecb-union approach has been proposed [LAMD13]:

γ^{ecb-union}_{t,j} = brt · max_{∀k∈T, t≥Dk>Dj} | ucbk ∩ ( ⋃_{∀h∈hp(j)∪{j}} ecbh ) |

as well as a ucb-union approach [LAMD13]:

γ^{ucb-union}_{t,j} = brt · | ( ⋃_{∀k∈T, t≥Dk>Dj} ucbk ) ∩ ecbj |

However, these approaches often overestimate the number of preemptions each task may experience. So, to estimate the number of preemptions more accurately in the schedulability analysis, improved methods, called the ucb- and ecb-union Multiset approaches, are proposed in [ADM12] for fixed-task priority scheduling (respectively in [LAMD13] for edf). A comparison of these different approaches can be found in [ADM12] for fixed-task priority scheduling and in [LAMD13] for edf. Note that those multiset-based approaches dominate the previous methods (as the ecb- and ucb-union approaches dominate respectively the ecb- and ucb-only ones), but they are incomparable with each other, as explained in [ADM12, LAMD13].
In [LAD14], Lunniss et al. compare fixed-task priority scheduling and edf when crpds are considered. They show that edf still offers better performance than fixed-task priority scheduling algorithms, but the gap is narrower than for scheduling without crpds.
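The union-based bounds reduce to plain set computations over ucb/ecb sets of cache-set indices; a sketch, with all task sets hypothetical:

```python
BRT = 1.0  # block reload time (illustrative)

def ecb_union(ucb, ecb, affected, evictors):
    """ecb-union style bound: max over affected tasks k of |UCB_k ∩ (union of evictors' ECBs)|."""
    evicted = set().union(*(ecb[h] for h in evictors))
    return BRT * max(len(ucb[k] & evicted) for k in affected)

def ucb_union(ucb, ecb, affected, j):
    """ucb-union style bound: |(union of affected tasks' UCBs) ∩ ECB_j|."""
    useful = set().union(*(ucb[k] for k in affected))
    return BRT * len(useful & ecb[j])

# Hypothetical tasks 1..3 with their ucb/ecb sets (cache-set indices).
ucb = {1: set(), 2: {4, 5}, 3: {6}}
ecb = {1: {4, 6}, 2: {4, 5, 7}, 3: {6, 8}}
# Preempting task 1; tasks 2 and 3 play the role of hep(i) ∩ lp(j).
print(ecb_union(ucb, ecb, affected={2, 3}, evictors={1}))  # → 1.0
print(ucb_union(ucb, ecb, affected={2, 3}, j=1))           # → 2.0
```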

III. Memory management

In the previous section, the aim was only to ensure predictability by safely taking crpds into account. No change was made either at the scheduling level or at the memory management level. But preemption delays increase the processor utilization and so might threaten the system schedulability. Moreover, as crpd bounds are often pessimistic (to ensure predictability), this issue is worsened: more systems are likely to be deemed unschedulable by schedulability tests/analyses using those bounds. To overcome this problem, the idea is to eliminate, or at least reduce, those preemption delays. One solution is to work at the memory management level to reduce or eliminate the cache interference by modifying the cache mapping. Decreasing or even removing these cache side-effects can be intended in order to:

- minimize a given task wcet,

- decrease the overall processor utilization, or

- maximize the system schedulability.

Different techniques have been proposed throughout the real-time literature to achieve those goals. The most common ones, in particular cache partitioning, cache locking and task layout, are briefly presented hereafter.


III.1. Cache partitioning

Under cache partitioning, cache lines are grouped into several sets, which might be of different sizes. Tasks are then assigned to those partitions and so cannot interfere with one another. The aim is to eliminate potential inter-task cache conflicts, as they are a source of unpredictability.
Cache partitioning can either be implemented at the hardware level [Kir89], by modifying the memory management unit behaviour, or be software-based, as first introduced in [Wol94] and improved in [Mue95]. Software-based partitioning is mostly used because hardware-based partitioning has several drawbacks [Mue95]: the partition sizes are fixed beforehand and custom-made hardware architectures have to be used, whereas software-based partitioning can be applied directly to all off-the-shelf architectures. Software-based partitioning can be implemented using OS-controlled techniques to manage the cache [LHH97], or by introducing code modifications at the compiler (linker) level to change the program code location in the main memory [Mue95]. As depicted in Figure 3.2, properly changing task reference locations in the main memory results in those references mapping only to a restricted number of cache lines, creating a partition (as the location of a task memory reference in the cache is determined by its position in the main memory). But, as the task code may be split, unconditional jumps might be added, resulting in a potential increase in the task wcet. Note that additional delays may also occur under hardware partitioning, as it needs additional circuits to be implemented, as shown in [KS90].
Cache partitioning is often better suited for tasks with short wcets and periods as, in this case, crpds might be very high in comparison with their execution times [ADLD14].

Figure 3.2: Example of software cache partitioning for three tasks τ1, τ2 and τ3.
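The mapping that software partitioning exploits is the direct-mapped index function: the cache set of a block is fixed by its main-memory address, so relocating code changes the cache lines it occupies. A sketch, with illustrative cache geometry:

```python
LINE_SIZE = 32   # bytes per cache line (illustrative)
NUM_SETS = 8     # lines in a direct-mapped cache (illustrative)

def cache_set(address):
    """Cache set index of a memory address in a direct-mapped cache."""
    return (address // LINE_SIZE) % NUM_SETS

# The same 64-byte code region placed at two different base addresses
# occupies different cache lines, which is how partitions are carved out.
print({cache_set(a) for a in range(0, 64, LINE_SIZE)})      # → {0, 1}
print({cache_set(a) for a in range(128, 192, LINE_SIZE)})   # → {4, 5}
```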

When dealing with cache partitioning, the main issues are to find the number of partitions and their respective sizes.

Fully-partitioning. Extrinsic interference (i.e. crpds) can be fully eliminated if private partitions are used, as all tasks are then isolated from one another [Mue95]. But, because tasks have access to a smaller amount of cache memory, their wcets may increase and thus threaten the system schedulability. To overcome this problem, partition sizes can be computed with the goal of minimizing the overall processor utilization [Kir89], using for example integer linear programming [PLM09]. However, minimizing the processor utilization does not necessarily lead to optimality in terms of schedulability [ADLD14]. So, partitioning algorithms aiming directly at maximizing the system schedulability can be used instead [ADLD14].


Hybrid-partitioning. Fully-partitioning eliminates crpds, but at the cost of potentially increasing wcets, which may have dreadful consequences on the system schedulability [ADLD14]. To overcome this problem, hybrid partitioning techniques can be used: some tasks may share a same partition. Hybrid-partitioning achieves better processor utilizations than fully-partitioning techniques when the cache gets smaller [BMGGW00]. Note that, for large caches, both approaches perform almost identically [BMGGW00]. The assignment of tasks to partitions can be based on task priorities under fixed-task priority scheduling [BMGGW00], or be stated as an optimization problem whose goal is to minimize the total processor utilization. As this assignment problem is NP-hard, a genetic algorithm can be used as a heuristic. A hardware-based partitioning solution using additional registers for fixed-priority tasks, called the prioritized cache, can also be used [TM05]. Priorities are assigned to cache partitions such that only tasks with priorities higher than or equal to the partition's have access to it.

III.2. Cache locking

Instead of partitioning, cache locking can be used: hardware mechanisms (implemented in off-the-shelf architectures such as the ARM9 processors1) allow the cache contents to be controlled at the software level. By adding lock/unlock instructions in the task code, some cache lines can be prevented from being overwritten once some content has been loaded into them. So, as cache contents are more precisely known, predictability can be easily ensured.
Cache locking can be static, i.e. the locked cache contents do not change during the whole system execution, or dynamic, i.e. the locked cache contents can change at runtime.

Example 3.3: Consider the example depicted in Figure 3.3. Memory accesses for tasks τ1 and τ2 are denoted by letters of the Latin alphabet. τ1's and τ2's worst-case execution paths are given by the sequences of their memory accesses, respectively abcdbcde and fghighij. A 2-line direct-mapped cache is assumed. A cache hit is supposed to result in an execution time of 0.25, whereas a miss leads to an additional delay of 0.25 (corresponding to the Block Reload Time). For τ2, memory blocks g, h and i are reused but, as g and i are mapped to the same cache line, there is only one Useful Cache Block for that task, which means that the maximal crpd for τ2 is equal to 0.25 (i.e. the time to reload one block from the main memory). Tasks are scheduled under Rate Monotonic. As a result, τ1 preempts τ2 twice over the hyperperiod H = 15.
When no locking technique is used, those two preemptions incur two additional delays, causing τ2 to miss its deadline.
In Figure 3.3b, static locking is used: blocks g and h, belonging to τ2, are loaded in the cache at system start-up (so no load time appears in the schedule) and locked. As a consequence, τ1 cannot use the cache and its wcet is increased, as shown in the table of Figure 3.3b. But no crpd is incurred anymore when τ1 preempts τ2 and, as a result, the system becomes schedulable.
Using dynamic locking, as shown in Figure 3.3c, each time τ1 (respectively τ2) is released or resumes its execution, blocks c and d (resp. g and h) are loaded into the cache, corresponding to

1http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0092b/I14301.html


an additional delay of 2 × brt = 0.5. This allows smaller wcets, as shown in the table of Figure 3.3c, and, once more, the system is schedulable.

The main issue for cache locking is to select the cache contents to be locked. We present hereafter different locking strategies.

(a) No locking (the two cache lines hold blocks {a, c, e, ...} and {b, d, f, ...}): C1 = 7M + 1H = 3.75, C2 = 7M + 1H = 3.75.
(b) Static locking (g and h locked for τ2): C1 = 8M = 4, C2 = 4M + 4H = 3.
(c) Dynamic locking (c and d locked for τ1, g and h for τ2): C1 = 4M + 4H = 3, C2 = 4M + 4H = 3.

Figure 3.3: Example of different locking techniques for two tasks τ1 and τ2. The sequence of memory accesses for τ1 (respectively τ2) is abcdbcde (resp. fghighij). We assume: Hit (H) = 0.25, brt = 0.25 ⇒ Miss (M) = 0.5.

Full Locking. Under full locking techniques, the whole cache is locked at every instant. Locked contents (i.e. referenced memory blocks to be locked once loaded into the cache) can be statically chosen in order to minimize the average task response times [CIBM01] or with the goal of minimizing the wcets [FPT07]. Heuristic algorithms have been proposed, as this optimization problem is NP-hard [LLX09]. In [CIBM01] and [FPT07], global static locking is considered: at every moment, each task owns a portion of the cache and the locked contents do not change during the system life. This achieves higher predictability but with a performance loss [CPRBM03]. To get better performance, local dynamic locking can be used [AP06]: at each instant, a single task owns the whole cache. As a result, the locked cache contents change during the system execution. Algorithms are used to choose the contents to be locked in the cache and the program points at which the lock operations have to be performed [AP06, Pua06, PP07], in order to minimize the task wcets. Tree-based approaches can also be used [LLX12], allowing the processor utilization to be reduced.

Partial Locking. As for cache partitioning, full locking techniques, either static or dynamic, result in tasks having access to a limited cache space, which may in turn increase their wcets and thus threaten the system schedulability. To solve this problem, partial locking can be used: one portion of


the cache is statically locked for each task, while a common portion is left unlocked and can be used by every task. Heuristic algorithms can select the locked contents according to a cost-benefit analysis [DLM13]. When rm or dm is used as the scheduler, the aim is to minimize the worst-case response times, whereas for edf the goal is to minimize the total processor utilization. Both static locking [DLM13] and dynamic locking [DLM14] can be used, dynamic approaches showing better results.

III.3. Memory layout

Like software-based cache partitioning, memory layout techniques rely on the mapping of memory blocks from the main memory into the cache. They focus on reducing either the intrinsic or the extrinsic cache interference by changing the location of a memory reference in the main memory, and as a result its location in the cache.

Example 3.4: Consider for example three tasks τ1(1, 3, 3), τ2(3, 6, 6) and τ3(1, 12, 12) (this example is quite similar to the one given in [GA07]). The program code of these three tasks is positioned consecutively in the main memory. We assume an 8-line direct-mapped cache. As depicted in Figure 3.4a, because of their consecutive positions in the main memory, τ1 is mapped into the first five cache lines, τ2 into the following three cache lines and τ3 into the first three cache lines. As a result, τ1 and τ3 might evict each other's cached contents if preemptions arise. If we consider the schedule constructed using rm over the hyperperiod H = 12, we see that the two jobs of τ3 are both preempted once. As a consequence, two crpds are paid over the hyperperiod and (assuming a crpd is equal to 0.5) τ2 misses its deadline. So the system is deemed unschedulable under rm. Note that τ2 pays no crpd when resuming its execution after a preemption, as it does not conflict in the cache with other tasks. Now suppose that we switch τ2's and τ3's positions in the main memory. As a consequence, τ3 and τ1 no longer share any cache line, as depicted in Figure 3.4b. As a result, τ1 preempting τ3 no longer incurs any crpd. And as τ2 is preempted only once by τ1, only one crpd is paid over the hyperperiod and the system becomes schedulable.
This example shows that, very often, it is better to have tasks with high rates (short periods) mapped to different cache locations. As they are executed more frequently, they are more likely to preempt each other and so to generate crpds if they share some cache lines.

Note that, contrary to software-based cache partitioning, these methods do not necessarily result in creating cache partitions. Two main approaches can be used.

Code positioning. Code positioning aims to reduce intra-task conflicts by modifying, during the compilation process, the position of the code sections of a task in the main memory. The idea is to decrease the task miss rate and so achieve intra-task conflict reduction [TY97]. Several code modification techniques, such as loop interchange, as well as data layout optimization, can be used in order to increase locality and so reduce cache misses [KW03]. Positioning algorithms are used to reduce intra-task conflicts and, as a result, to decrease task wcets [LFM08]. They focus in particular on allocating


Figure 3.4: Schedules for Tasks τ1(1, 3, 3), τ2(1.5, 12, 12) and τ3(3, 6, 6) using Rate Monotonic for different task layouts: (a) non-optimized task layout; (b) optimized task layout.

procedures that are frequently called so that they do not interfere with each other. A more general cache-aware code positioning optimization is proposed in [FK11], driven by wcet information and based on conflict graphs to determine potential intrinsic conflicts. Tasks are split into fragments and a greedy heuristic is used to position the different fragments in the main memory. At each step of the algorithm, a new wcet is computed and compared to the previous one, in order to know whether any improvement has been achieved. Because tasks are split, code modifications such as unconditional jumps have to be introduced.

Task placement. Task placement aims to reduce inter-task cache conflicts by modifying, when loading the tasks into the main memory, the position of each task as a whole; see the example depicted in Figure 3.4. The idea is to maximize the number of persistent cache sets to allow more precise wcet estimations for preemptively scheduled tasks [GA07]. However, the problem of finding an optimal task layout is NP-complete [GA07]. So heuristic approaches, using for example a simulated annealing algorithm [GA07], can be used. ucb and ecb sets can be used to determine whether an evicted block will need to be reloaded or not [LAD12]. In [AG08], a metric to compare different memory layouts is proposed. It allows, in particular, an optimal layout to be approximated, and memory accesses are classified as persistent or endangered. Eventually, a safe bound on the wcet can be computed thanks to that classification. Note that, for fixed-task priority scheduling, task positioning can allow a processor utilization similar to edf's [LAD14]. Moreover, in most cases, the use of a task layout leads to better results than cache partitioning in terms of the number of schedulable tasksets [ADLD14]. However, as soon as task positioning is considered, changes in the taskset or in the scheduling policy imply recomputing the layouts, thus possibly modifying the task wcets.


III.4. Other techniques

Other methods dealing with cache interference reduction at the memory management level can be found in the real-time literature. To eliminate crpds, the memory management can be altered. In [WA12], the memory management is modified using a mechanism called Carousel: when a task starts its execution, it saves the content of the cache lines it will use to the main memory, and restores them when finishing its execution. The additional delays needed to save/restore cache contents are included in the task wcet. As a result, no crpd occurs anymore at the scheduling level. Another solution is to divide the cache into two layers [ANGM14]: one is used as a usual cache, while the second one saves its content to the main memory or restores a previous content from the main memory. As a consequence of these save/restore operations, any scheduling decision is delayed by the time needed to restore one of the layers. So deadlines must be reduced by this delay: virtual deadlines are introduced. A similar approach is used in [WP14], alongside a non-preemptive fixed-priority scheduling algorithm. Note that new cache replacement policies can also be implemented. The Selfish-lru policy [RAG+14] takes into account the task to which a cached memory block belongs. Thus, crpds are decreased by avoiding some block reloads.

IV. Enhanced scheduling approaches

Inter-task cache interference can also be reduced at the scheduling level. The aim is to increase the system schedulability ratio. Two approaches can be followed:

- reducing the number of preemptions or selecting some preemption points through scheduling modifications, to reduce the overall crpd overhead,

- explicitly considering cache-related preemption delays when taking scheduling decisions.

The first approach modifies existing algorithms, while the second one deals with the more general problem of finding optimal scheduling algorithms when crpds are considered.

IV.1. Limited preemption scheduling

The system schedulability can be increased by controlling preemptions, in particular under fixed-task priority scheduling, even when no crpd is considered. Indeed, as stated in [WS99] and illustrated in Figure 3.5, for fixed-task priority scheduling, preemptive and non-preemptive scheduling are incomparable. Moreover, using limited-preemption scheduling can make a system that is unschedulable under both preemptive and non-preemptive scheduling schedulable, as depicted in Figure 3.6.
Historically, limited preemption methods only focused on increasing schedulability by controlling preemptions, without considering any preemption delay. So, when taking crpds into account, the overall preemption delay will not necessarily be reduced, see [BXM+11]. However, recent works have focused on combinations between classic limited-preemption techniques and crpd-aware schedulability analyses. We focus in particular on two limited-preemption scheduling approaches: fixed-task preemption threshold scheduling and fixed-task deferred preemption scheduling. Under fixed-task deferred preemption scheduling, a preemption can be postponed for some amount of time corresponding to a


CHAPTER 3. CACHE AND REAL-TIME SCHEDULING

Task | Ci | Ti | Di | Priority
τ1   |  1 |  6 |  4 | 3
τ2   |  4 |  9 |  8 | 2
τ3   |  5 | 18 | 15 | 1

Table 3.2: Taskset example used in Figures 3.5, 3.6 and 3.7, made of three synchronously-released periodic tasks with constrained deadlines. We consider fixed-task priorities with priority(τ1) > priority(τ2) > priority(τ3). Note that no crpd is considered.

[Figure omitted: Gantt charts of τ1, τ2 and τ3 over [0, 18). (a) Schedule under fixed-task priority preemptive scheduling. (b) Schedule under fixed-task priority non-preemptive scheduling.]

Figure 3.5: Non-dominance of fixed-task priority preemptive scheduling and fixed-task priority non-preemptive scheduling for the taskset presented in Table 3.2.

Non-Preemptive Region (npr). Two different models can be used for deferred preemption scheduling: the floating-Non-Preemptive Region model (f-npr) and the fixed-Non-Preemptive Region model, also referred to as the Fixed Preemption Point model (fpp). A generalization of both fixed-task preemption threshold scheduling and fixed-task deferred preemption scheduling has been introduced in [BvdHKL12]. Note that there also exist other techniques, such as the one introduced by Dobrin and Fohler in [DF04]: instead of modifying classic preemptive scheduling policies, task attributes are changed to reduce the number of preemptions.

Preemption Thresholds. Under preemption threshold scheduling, each task is given a preemption threshold θi alongside its priority [WS99]. A task can only be preempted by a higher-priority task whose priority is also higher than the running task's preemption threshold.

Example 3.5: Consider an example of preemption threshold scheduling applied to the taskset presented in Table 3.2. For τ1 and τ2, the same preemption threshold is assumed: θ1 =


θ2 = 3, whereas for τ3 we assume a lower preemption threshold θ3 = 2. As a consequence, when the second job of Task τ2 is released at time 9, it cannot preempt the running job of τ3 as τ2's priority is not strictly higher than τ3's threshold (priority(τ2) = 2 = θ3). So no preemption occurs at this point, as depicted in Figure 3.6. Likewise, at time 12, the third job of τ1 cannot preempt the running job of τ2 as τ1's priority is not strictly higher than τ2's threshold (priority(τ1) = 3 = θ2). As a consequence, every task meets its deadline and the system is deemed schedulable, which is not the case when classic fixed-task priority scheduling is used, as depicted in Figure 3.5a.

[Figure omitted: Gantt chart of τ1, τ2 and τ3 over [0, 18) under preemption threshold scheduling.]

Figure 3.6: Example of fixed-task preemption threshold scheduling for the taskset presented in Table 3.2 with preemption thresholds: θ1 = θ2 = 3 and θ3 = 2.
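The behaviour described in Example 3.5 can be reproduced with a small discrete-time simulation. This is only a sketch under our own assumptions (unit time steps, which suffice here since all parameters of Table 3.2 are integers; synchronous release; a minimal job model): the code and its interface are ours, not the thesis's.

```python
def simulate(tasks, thresholds, horizon):
    """Fixed-task priority scheduling with preemption thresholds.

    tasks: list of (C, T, D, priority) tuples, higher number = higher priority.
    thresholds: preemption threshold per task (threshold = priority yields
    classic fully-preemptive fixed-priority scheduling).
    Returns the deadline misses as (task index, finish or deadline time) pairs.
    """
    jobs, running, misses = [], None, []
    for t in range(horizon):
        for i, (C, T, D, p) in enumerate(tasks):
            if t % T == 0:  # synchronous periodic release
                jobs.append({"rem": C, "dl": t + D, "prio": p, "task": i})
        ready = [j for j in jobs if j["rem"] > 0]
        if running is not None and running["rem"] == 0:
            running = None  # previous job finished
        if ready:
            top = max(ready, key=lambda j: j["prio"])
            if running is None:
                running = top
            elif top is not running and top["prio"] > thresholds[running["task"]]:
                running = top  # preemption allowed only above the threshold
        if running is not None:
            running["rem"] -= 1
            if running["rem"] == 0 and t + 1 > running["dl"]:
                misses.append((running["task"], t + 1))
    for j in jobs:  # jobs still unfinished at their deadline
        if j["rem"] > 0 and j["dl"] <= horizon:
            misses.append((j["task"], j["dl"]))
    return misses

tasks = [(1, 6, 4, 3), (4, 9, 8, 2), (5, 18, 15, 1)]  # Table 3.2
# Fully preemptive (threshold = priority): τ3 finishes at 16 > D3 = 15.
assert simulate(tasks, [3, 2, 1], 18) == [(2, 16)]
# Thresholds of Example 3.5: all deadlines met.
assert simulate(tasks, [3, 3, 2], 18) == []
```

Setting each threshold equal to the task's own priority makes the threshold test degenerate into the usual "higher priority preempts" rule, which is why the same loop reproduces both Figure 3.5a and Figure 3.6.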

Preemption thresholds were first implemented in the ThreadX RTOS². The schedulability analysis for preemption threshold scheduling [WS99, Reg02], based on the classic Response Time Analysis, can be modified to account for Cache-Related Preemption Delays [BAVH+14] using the ecb- and ucb-based approaches proposed, amongst others, in [ADM12].
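To illustrate how such ecb/ucb-based bounds work, a common way to bound the cost of a single preemption charges the Block Reload Time for every useful cache block of the preempted task that the preempting task may evict. This is a simplified sketch of the idea behind those approaches; the set representation and function name are our own.

```python
def crpd_bound(ucbs_preempted, ecbs_preempting, brt):
    """Upper-bound the cost of one preemption: only useful cache blocks
    (ucbs) of the preempted task that collide with evicting cache blocks
    (ecbs) of the preempting task may need reloading, at brt each."""
    return brt * len(set(ucbs_preempted) & set(ecbs_preempting))

# Hypothetical cache-set indices: blocks 2 and 5 collide, so 2 reloads.
assert crpd_bound({1, 2, 5}, {2, 3, 5, 7}, brt=10) == 20
```

The published analyses refine this per-preemption bound (e.g. by taking unions over all higher-priority tasks), but the intersection above is the core ingredient.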

Floating-Non-Preemptive Region. For deferred preemption scheduling under the floating-Non-Preemptive Region model, each task has a maximum interval of time, called a Non-Preemptive Region (npr), during which it cannot be preempted by any other task [Bar05, BB10, YBB11]. When a higher-priority task is ready, the task already executing will only be preempted after a time equal to the length of the npr. To account for crpds under the floating-npr model, preemption delays can be included in the task wcet by bounding the number of preemptions a task may suffer [YBB11]. An upper bound on the crpd to be included in the wcet can also be computed using a preemption delay function [MNPP12b, MNPP12a]: the function represents the preemption delay tied to the progression of the program execution. Worst-case response times can also be bounded when crpds are considered, using best-case and worst-case execution times [RM08]. But as the Non-Preemptive Region is floating, it is nearly impossible to take crpds into account to decide when to preempt or not.

Fixed Preemption Points. For deferred preemption scheduling under the Fixed Preemption Point model (fpp), each job of each task is divided into non-preemptive subjobs. A task can only be preempted between two consecutive subjobs, which is called a preemption point.

² http://rtos.com/products/threadx


Example 3.6: Consider again the taskset presented in Table 3.2. We assume that Task τ2 consists of two subjobs of lengths 3 and 1, and τ3 of two subjobs of equal length 2.5. So τ2 (respectively τ3) can only be preempted after it has been executed for exactly 3 (resp. 2.5) units of time. For example, as depicted in Figure 3.7, the second job of τ1 cannot preempt the running job of τ3 at time 6. It has to wait until time 7.5, which corresponds to the end of the first subjob of τ3 (i.e. a Fixed Preemption Point). Likewise, τ2's execution is delayed at time 9, and τ1's execution is also postponed from time 12 to time 14. As a consequence, the taskset, deemed unschedulable under classic fixed-task priority scheduling (see Figure 3.5a), is schedulable when using Fixed Preemption Points.

[Figure omitted: Gantt chart of τ1, τ2 and τ3 over [0, 18) under Fixed Preemption Point scheduling.]

Figure 3.7: Example of Fixed Preemption Point scheduling for the taskset presented in Table 3.2, with τ2 split into two subjobs of sizes 3 and 1, and τ3 split into two subjobs of equal size 2.5.
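Under the fpp model, the scheduler honours a preemption request only once the running job reaches the end of its current subjob. The delays in Example 3.6 can be computed directly from the subjob lengths (a sketch; the function and its interface are our own illustration, not from the thesis):

```python
def next_preemption_point(executed, subjob_lengths):
    """Return the cumulative execution time at which the running job
    reaches its next fixed preemption point (end of the current subjob)."""
    boundary = 0.0
    for length in subjob_lengths:
        boundary += length
        if executed < boundary:
            return boundary
    return boundary  # past the last subjob: the job is about to finish

# Example 3.6: τ3 has subjobs [2.5, 2.5]. At time 6 it has executed 1 unit
# (it started at time 5), so the preemption is deferred until progress 2.5,
# i.e. wall-clock time 7.5.
assert next_preemption_point(1.0, [2.5, 2.5]) == 2.5
# τ2 has subjobs [3, 1]: a request arriving after 2 units of progress waits
# until progress 3.
assert next_preemption_point(2.0, [3, 1]) == 3
```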

Note that the fpp model makes it possible to protect some code sections (small loops or sections with accesses to shared resources) by including those critical sections in non-preemptive subjobs. Scheduling with Fixed Preemption Points without dealing with crpds has been studied in particular in [BLV07, BLV09, YBB10, YBB11]. When crpds are considered, preemption delays have to be accounted for when computing the maximum blocking time a task can suffer because of a non-preemptive subjob of a lower-priority task [ABW09]. Indeed, because of the preemption delay experienced by a task when resuming its execution after a preemption, the next preemption point may be delayed and, as a consequence, the blocking time is increased. Preemption point placement can be efficiently used to decrease the total crpd for a task, as presented in the next section. Note that implementing preemption points usually requires task code modifications [HBL08], which can be a challenge. In addition, non-preemptive subjob sizes have to be recomputed as soon as the taskset changes, and, as a consequence, the position of the preemption points in the task code may have to be changed.

IV.2. Cache-aware scheduling

The methods presented in the previous subsections aim to increase the system schedulability by controlling preemptions. crpds are not considered, or only accounted for during the schedulability analysis. However, decreasing the number of preemptions does not necessarily decrease the total crpd [BXM+11]. So, crpds have to be considered when taking scheduling decisions. Such a goal can be achieved by modifying the pre-processing step of different classic scheduling policies: task priorities can be chosen, or fixed preemption points placed in the task code, according to some crpd-based criterion. These modifications have an effect at runtime on the decision of scheduling a job at a given


instant. But as the influence of cache issues on the scheduling decisions is only indirect, these solutions might not be optimal. So the problem of devising scheduling policies directly based on cache parameters also has to be investigated.

crpd-aware pre-processing. Audsley's algorithm [Aud91] for assigning task priorities can be extended to take crpds into account [TSRB15]: at each priority assignment step, the system schedulability is tested using the cache-aware version of the Response Time Analysis (presented in Section II.2). For preemption threshold scheduling, the adapted schedulability analysis can be used to devise an optimal algorithm to assign preemption thresholds with the aim of minimizing the crpd [BAVH+14]. Preemption threshold scheduling and partitioning can also be combined [WGZ15]: the same threshold is given to tasks assigned to the same cache partition, such that they cannot preempt each other. As a consequence, no crpd is incurred. The partition and threshold assignment problem is formulated as an Integer Linear Program; a heuristic algorithm is also proposed. Finally, crpds can also be considered when selecting preemption points under deferred preemption scheduling with Fixed Preemption Points. The easiest way to do so is to split each task at program points having the minimum number of Useful Cache Blocks, while ensuring that the longest non-preemptive interval for the task is not exceeded [SP95]. Such a method reduces the total preemption overhead but does not compute a globally optimal solution [CTF15]. So, optimal preemption point placement algorithms have been proposed in [BBM+10, BXM+11, PFB14, CTF15] to minimize the total preemption overhead rather than necessarily minimizing the number of preemption points. The crpd for each selected preemption point is then added into the task wcet (as the number of preemption points and their respective costs are known). But such methods tend to overestimate the number of preemptions [LS12]. Indeed, when computing the task wcet accounting for crpds, all preemption points are assumed to result in a preemption.
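The flavour of such optimal placement algorithms can be conveyed with a small dynamic program in the spirit of the cited works: given basic-block lengths, a crpd cost per candidate point and a maximum non-preemptive region length Q, it selects the cheapest feasible set of points. This is a simplified sketch; the data layout and function are our own, not the published algorithms.

```python
def place_points(lengths, costs, Q):
    """lengths[i]: execution time of basic block i; costs[i]: crpd charged
    if the candidate point after block i is selected; Q: maximum allowed
    length of a non-preemptive region. Returns (total cost, chosen points)
    or None if no feasible placement exists."""
    n = len(lengths)
    INF = float("inf")
    best = [INF] * (n + 1)   # best[j]: min cost to cover blocks 0..j-1
    choice = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        seg = 0.0
        for i in range(j - 1, -1, -1):   # last region spans blocks i..j-1
            seg += lengths[i]
            if seg > Q:
                break                    # region too long: stop extending
            extra = costs[i - 1] if i > 0 else 0.0  # point before block i
            if best[i] + extra < best[j]:
                best[j] = best[i] + extra
                choice[j] = i
    if best[n] == INF:
        return None
    points, j = [], n
    while j > 0:                         # backtrack the selected points
        i = choice[j]
        if i > 0:
            points.append(i - 1)
        j = i
    return best[n], sorted(points)

# Splitting after block 1 (cost 1) keeps both regions within Q = 4:
assert place_points([2, 2, 2], [5, 1], 4) == (1.0, [1])
# A single block longer than Q cannot be split at all:
assert place_points([5], [], 4) is None
```

The infeasible case mirrors the overestimation issue mentioned above in reverse: once the chosen points are fixed, every one of them is charged in the wcet, whether or not a preemption actually occurs there.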

Optimal cache-aware scheduling. None of the methods presented above is optimal, in the sense of being able to schedule any feasible taskset as soon as Cache-Related Preemption Delays are considered. So the more general problem of cache-aware scheduling, i.e. taking optimal scheduling decisions according to cache issues, has to be considered. To the best of our knowledge, little research has been conducted on optimal uniprocessor cache-aware scheduling. Some work exists, however, as far as multiprocessors are concerned, but most of it deals with partitioned multiprocessor soft real-time scheduling [CA08, GSYY09]: cache-aware decisions influence only the taskset partitioning process, in order to reduce conflicts between tasks.
Optimal cache-aware scheduling is one of the main focuses of this PhD work and has resulted in several publications [PRM15a, PRG+15]. We will in particular explore the complexity of this scheduling problem in Chapter 4 and focus on finding an optimal solution in Chapter 6.

V. Prospects

Most often, the methods presented before aim at solving a specific issue related to the use of cache memories in real-time embedded systems. Some focus on solving the predictability problem by accounting for crpds either in the task wcets or during the schedulability analysis. Other works focus


on eliminating those delays, using for example a fully-partitioned cache or static locking. Finally, some authors prefer to work on the schedulability problem, using for example limited-preemption techniques. Along with many approaches to reduce either crpds or the number of preemptions, a cache-aware schedulability analysis is proposed to ensure predictability, such as in [BMSO+96] for cache hybrid-partitioning or in [BAVH+14] for preemption threshold scheduling. Another solution can be to associate cache partitioning and cache locking, as proposed in [VLX03]: cache partitioning is used to eliminate inter-task interference (i.e. crpds) whereas cache locking is aimed at ensuring intra-task (intrinsic) interference predictability (for wcet computation).
The two problems of predictability and schedulability are hard to solve at the same time. For example, full partitioning ensures predictability as no crpd is incurred. However, the consequent increase in task wcets may threaten the system schedulability. To overcome this issue, an interesting solution is to combine different approaches focusing on different goals. The work presented in [WGZ15] is a good example of what can be done in this direction. The authors combine partitioning with preemption threshold scheduling. They take advantage of the fact that preemption thresholds create groups of non-preempting tasks, so these tasks can share a common cache partition without incurring any crpd. Thus the advantage of hybrid-partitioning (which allows a trade-off between reducing extrinsic cache interference, i.e. crpds, and not increasing the intrinsic cache interference, and as a result the wcet, too much) is coupled with the benefit of full partitioning (which is to eliminate every crpd). Another solution is to apply cache-aware schedulability analyses to determine thresholds [BAVH+14] or task priorities [TSRB15]. Memory layouts can also be used in combination with limited-preemption policies.
Cache locking could also be used along with Fixed Preemption Point scheduling, as the number of memory blocks needing to be reloaded (and so having to be locked) might be much smaller (when using an analysis such as in [CTF15]).
These approaches are just a first step. Future work should be conducted on improving these combinations.

VI. Conclusion

In this chapter, we gave a brief glimpse of the existing strategies that can be found in the real-time literature to deal with cache issues in uniprocessor real-time scheduling. The methods described in this chapter aim either at tightening crpd bounds to improve predictability, or at modifying existing scheduling policies to reduce these crpds in order to increase the system schedulability. What can easily be seen is that no solution clearly outperforms the other ones. Moreover, very often, no optimal solution exists, as proved for cache-aware scheduling or cache locking.
Combinations between different methods can allow significant improvements. For example, the use of cache partitioning/locking or memory layouts makes it possible to decrease extrinsic interference, which in turn decreases the cost of a preemption. Then, using a limited-preemptive policy such as Fixed Preemption Point scheduling, the number of preemptions can be reduced, and so is the total crpd. Finally, using a cache-aware schedulability analysis ensures predictability and avoids wasting hardware resources. But as none of these methods is optimal, the more general problem of scheduling with cache memories has to be further studied. This is the main goal of the following chapters of this PhD work.


Part II

Contributions


Chapter 4

Real-Time Scheduling with Cache Memories

Contents

I    Introduction ............................................... 117
II   Problem Statement .......................................... 117
     II.1  Limitations of classic scheduling approaches ......... 117
     II.2  Scheduling approaches accounting for the cache ....... 118
     II.3  Proving computational complexity results ............. 119
III  The Cache-aware Scheduling Problem ......................... 120
     III.1 Cache-aware scheduling ............................... 121
     III.2 Core Problem Definition .............................. 122
     III.3 Complexity of the Preemptive Problem ................. 124
     III.4 Complexity of the Non-Preemptive Problem ............. 129
     III.5 Limitations of the Cache-aware approach .............. 130
IV   The crpd-aware Scheduling Problem .......................... 131
     IV.1  crpd-aware scheduling ................................ 131
     IV.2  Core Problem Definition .............................. 131
     IV.3  Complexity of the Problem ............................ 133
     IV.4  Optimal algorithm for corner cases ................... 136
     IV.5  Discussion on the crpd-aware approach ................ 137
V    Conclusion ................................................. 138

Abstract

In this chapter, we consider the general problem of scheduling hard real-time independent tasks on uniprocessor systems with cache memories. In particular, we focus on the problem of taking scheduling decisions accounting for cache issues. First, we identify parameters to represent the cache. In particular, we distinguish between two different problems: for the Cache-aware scheduling problem, the scheduler bases its decisions on information about the cache state and the system memory requirements, whereas for the crpd-aware scheduling problem, the scheduler takes its decisions using information about crpds. Then, we study the computational complexity of those problems and show that, unfortunately, they are both NP-hard, which means that no scheduling algorithm running in polynomial time can optimally solve them (unless P = NP).


I. Introduction

As seen in the previous chapters, the effect of cache memories has to be accounted for in real-time scheduling in order to ensure predictability. The cache impact on task execution times can be bounded and taken into account for classic policies such as Rate Monotonic and Earliest Deadline First (see Chapter 3, Section II.2). But because of the preemption overhead due to additional cache misses, there is a loss of schedulability. This schedulability loss will be studied more precisely in Chapter 7. To overcome the schedulability problem, the cache has to be considered when taking scheduling decisions. In this chapter, we focus on the general problem of scheduling hard real-time independent tasks on a uniprocessor with a cache memory. To be able to design schedulers taking the cache into account, we first identify parameters that can be used to represent the cache impact on the system. Then, we evaluate the computational complexity of taking scheduling decisions while accounting for the cache effect.
First, in Section II, we discuss which parameters can be used to represent the cache impact on the system and identify two distinct scheduling problems. Then, in Sections III and IV, we study the computational complexity of each of these problems. Finally, Section V summarizes and concludes this chapter.

II. Problem Statement

We first show the need for considering cache effects when taking scheduling decisions. Then, we briefly present two scheduling approaches to deal with cache memories. Finally, we recall some complexity results which will be used to prove the computational complexity of both scheduling problems.

II.1. Limitations of classic scheduling approaches

Additional delays due to inter-task cache interference can compromise the system predictability and also its schedulability. As shown in Chapter 3, Section II.2, predictability can be ensured at the scheduling level by introducing upper bounds on the Cache-Related Preemption Delays in the schedulability analysis. As for the schedulability matter, improvements can be achieved at the scheduling level through the reduction in the number of preemptions or the limitation of the number of preemption points in the task code (see Chapter 3, Section IV.1). These reductions/limitations can be guided by crpd-aware constraints. However, they mainly occur at a pre-processing step: task priorities are modified or thresholds are computed based on crpd considerations, and then tasks are scheduled using classic scheduling policies such as rm and edf. Of course, through thresholds or preemption points, the behaviour of such policies is altered in a way that reduces the cache impact on the system schedulability. But in some cases, such modifications may still induce more additional delays than necessary, causing the system to miss a deadline. As a consequence, those techniques are not optimal for the problem of scheduling tasks on a uniprocessor system with a cache memory.

Example 4.1: Consider the taskset defined in Table 4.1. No feasible fixed preemption point placement can be computed for this taskset using the formulas given in [BBM+10]. Indeed, Task τ3 would need to be split into non-preemptive


Task | Ci  | Ti | Di | si
τ1   | 1.5 |  2 |  2 | 0.25
τ2   | 0.5 |  4 |  4 | 0.1
τ3   | 1.5 | 16 | 16 | 0.5

Table 4.1: Taskset TExample1.

[Figure omitted: Gantt chart of τ1, τ2 and τ3 over [0, 16).]

Figure 4.1: Valid schedule for Taskset TExample1.

regions no greater than 0.5. But as the crpd for τ3, i.e. s3, is also equal to 0.5, no placement is possible (as the crpd has to be accounted for in the non-preemptive region length). But as depicted in Figure 4.1, a valid schedule can be constructed for this taskset; as a consequence, the taskset is feasible. As τ1 and τ2 must execute for 7 units of time over [0, 8) and also 7 units of time over [8, 16), τ3 can only be executed for one unit of time in [0, 8) and one unit of time in [8, 16). As a result, the schedule depicted in Figure 4.1 is the only valid schedule for this taskset. Suppose that this schedule could be constructed with a fixed-task priority scheduler. Then, at time 4, τ3 necessarily has either a priority or a preemption threshold greater than those of τ1 and τ2. But τ3 cannot have a priority greater than that of τ1 or τ2; otherwise, τ3 could have begun its execution at time 0. Moreover, τ3 cannot have a threshold higher than the priority of τ1 and τ2; otherwise, it could not be preempted at time 4.5. So, no scheduler using fixed-task priorities with preemption thresholds can construct such a schedule.
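The counting argument used in Example 4.1 can be checked numerically. This is a sketch under the stated assumptions (synchronous release, implicit deadlines for τ1 and τ2 over the analysed windows); the `demand` helper is our own name.

```python
from math import floor

def demand(C, T, interval):
    """Total execution demand of a periodic task (wcet C, period T) whose
    jobs are released and due within [0, interval), synchronous release."""
    return floor(interval / T) * C

# Over [0, 8): τ1 issues 4 jobs of 1.5 and τ2 issues 2 jobs of 0.5.
assert demand(1.5, 2, 8) + demand(0.5, 4, 8) == 7.0
# Only 8 - 7 = 1 unit of time per 8-unit window is left for τ3,
# which needs 1.5 units in total over [0, 16): hence the tight schedule
# of Figure 4.1 is forced.
```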

The problem is that rm and edf take their scheduling decisions without explicitly considering any cache parameter. rm's scheduling choices are based on the task periods: if a task has a smaller period, it must be executed sooner to avoid missing a deadline. For edf, scheduling decisions are taken using the job absolute deadlines: if a job has a nearer deadline, then it must be executed sooner to avoid a deadline miss.

II.2. Scheduling approaches accounting for the cache

Classic schedulers take their decision to schedule a ready job at a given time based on the urgency of that job. There are several ways to assess the urgency of a job:

- the period of the task generating the job (i.e. the rate at which the task will issue new jobs),


- the relative deadline of the task generating the job,

- the absolute deadline of the job,

- the laxity of the job, defined as D(t) − C(t), D(t) being the remaining time before the job's deadline and C(t) the job's remaining execution time,

- ...

All of these measures rely on classic task parameters such as the task period Ti or the task deadline Di. But none of them considers the cache impact. As a consequence, preemptions may occur, causing additional delays and resulting in deadline misses. So, considering parameters which capture the cache effect is necessary when taking scheduling decisions. But this raises a key question: which cache parameters are relevant to design efficient schedulers?

Cache-aware scheduling. The first way is to consider precisely the cache accesses performed during the system execution. As a consequence, the cache content at each instant has to be known. Moreover, the memory requirements of each task must be known, i.e. the sequence of memory blocks accessed during the task execution. Using those pieces of information, alongside classic timing parameters assessing the urgency of each job, the scheduler can decide which job to schedule in order to encourage cache reuse or, equivalently, to try to minimize cache thrashing. This approach will be referred to as Cache-aware scheduling.

crpd-aware scheduling. Another way to assess the cache impact on schedulability is to consider information about the crpds. At each instant, the scheduler knows the cost of preempting the running job. As a consequence, it can decide whether or not to preempt the job, in order to reduce the overall Cache-Related Preemption Delay and, as a result, the processor workload. This second approach will be referred to as crpd-aware scheduling.

II.3. Proving computational complexity results

We recall here some notions about complexity (already presented in Chapter 2, Section III.4) which will be useful to prove complexity results for the Cache-aware and crpd-aware scheduling problems.

Complexity classes. The main complexity classes we will consider are:

- P: corresponds to the problems that can be solved by an algorithm running in polynomial time,

- NP: corresponds, informally, to the problems for which a candidate solution can be checked in polynomial time; the hardest problems in this class cannot be solved in polynomial time unless P = NP.

We will assume the P and NP classes to be distinct (P ≠ NP), even though this claim remains unproven. In particular, we will deal with:

- NP-complete problems: they informally correspond to the hardest problems in NP,


- NP-hard problems: they informally correspond to problems at least as hard as any NP-complete problem, but which are not necessarily in NP.

For those problems, we distinguish between:

- NP-complete/NP-hard problems in the weak sense: these problems may be solved by algorithms running in pseudo-polynomial time,

- NP-complete/NP-hard problems in the strong sense: these problems require algorithms running in exponential time (unless P = NP).

Unless P = NP, the number of elementary operations required to solve a strongly NP-complete/NP-hard problem is exponential in the size of the input instance [GJ79]. Hence, assuming that each elementary operation is performed in a fixed amount of time, the algorithm necessarily runs in exponential time.

Proof technique. To prove that a problem is NP-hard, we will use the following classic proof technique:

1. find an already-known NP-complete problem,

2. construct a polynomial transformation reducing this NP-complete problem to our considered problem.

Informally, this means that solving our problem is at least as hard as solving the chosen NP-complete problem.

Adopted approach. Hereafter, to prove that both the Cache-aware and crpd-aware scheduling problems are NP-hard, we will consider simplified problems. The purpose of using simplistic scheduling models is to simplify the proofs as much as possible without loss of generality. Actually, these simplified problems serve as core problems covering the largest set of scheduling problems on uniprocessor systems with cache memories. These core problems are indeed reducible to more general problems, since they define particular cases of these general problems.

III. The Cache-aware Scheduling Problem

We first consider the Cache-aware scheduling approach. We study in particular the computational complexity of this scheduling problem. To do so, we first introduce some assumptions about the cache and task models that are used to define the Cache-aware core scheduling problem. Then, using those simplified cache and task models, we prove that the corresponding scheduling problem is NP-hard both for preemptive and non-preemptive systems.


iteration := 0;
while iteration <= 1 loop
    iteration := iteration + 1;
end loop;
iteration := 0;

(a) Pseudo-code.

a { 0x00000000: mov r3, #0
    0x00000004: b   0xc
b { 0x00000008: add r3, r3, #1
    0x0000000c: cmp r3, #1
c { 0x00000010: ble 0x8
    0x00000014: mov r3, #0

(b) Pseudo-assembly code.

Figure 4.2: Pseudo-code example and corresponding pseudo-assembly code.

III.1. Cache-aware scheduling

We recall that the Cache-aware scheduling approach is based on the knowledge of the system memory requirements and of the content of the cache at each instant. Under this approach, every task code is modeled by the set of memory blocks requested by the task during its execution. Actually, we consider the addresses of those memory blocks in the main memory in order to compute their mapping in the cache. These blocks may contain either instructions or data. Note that the results presented hereafter also hold for separate instruction caches and data caches. The scheduler knows the sequence of memory blocks accessed by each task and uses it to take scheduling decisions depending on the current cache state, in order to respect all timing requirements, i.e. all deadlines have to be met. Hereafter, requested memory blocks are represented by letters from the Latin alphabet.

Example 4.2: We consider the simple task code example depicted in Figure 4.2a. The task is made of a simple loop with two iterations. The corresponding pseudo-assembly code is depicted in Figure 4.2b, assuming the ARMv7-R instruction set used for example in the ARM® Cortex®-R4 processor. For the sake of simplicity, we assume the code to start at address 0. ARMv7-R instructions have an identical size of 4 bytes. If we consider a cache with a line size of 8 bytes, then we have 2 instructions per memory block and so per cache line. As depicted in Figure 4.2b, the first two instructions belong to memory block a, the following two to block b, and so on. As a result, the code of this simple task can be represented by the following string: aabcbbcbbcc.
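The string of Example 4.2 can be reproduced mechanically by mapping each executed instruction address to its memory block (a sketch; the instruction trace below is hand-derived from Figure 4.2b for two loop iterations, and the helper name is ours):

```python
def block_string(trace, line_size=8):
    """Map each accessed instruction address to a memory-block letter:
    block 0 -> 'a', block 1 -> 'b', ..."""
    return "".join(chr(ord("a") + addr // line_size) for addr in trace)

# Executed addresses: mov, b (to 0xc), cmp, ble taken, then the loop body
# and test twice more, and the final mov after the loop exits.
trace = [0x00, 0x04, 0x0C, 0x10, 0x08, 0x0C, 0x10, 0x08, 0x0C, 0x10, 0x14]
assert block_string(trace) == "aabcbbcbbcc"
```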

The main idea behind this approach is for the scheduler to minimize cache thrashing or, equivalently, to maximize cache reuse. In particular, maximizing cache reuse could be appropriate for:

- instruction caches, when tasks share large amounts of code, which is the case when shared libraries are used,

- data caches, when tasks work on the same data, which can be the case when doing matrix computation.


CHAPTER 4. REAL-TIME SCHEDULING WITH CACHE MEMORIES

III.2. Core Problem Definition

We list hereafter the simplifications used to derive our two Cache-aware core scheduling problems.

Assumption 1. The cache consists of a single cache line. A hit is performed at no cost and the miss penalty is equal to a constant brt corresponding to the Block Reload Time.

By considering a cache memory with only one cache line, Assumption 1 defines the simplest particular case that covers all cache types: direct-mapped, set-associative and fully-associative caches. Precisely, it is always possible to define input instances (i.e. cache memory accesses and mapping of blocks in the main memory) that lead a set-associative cache, as well as a fully-associative cache, to reload/evict the same blocks as a direct-mapped cache.

Example 4.3: Consider a cache with four cache lines. Memory blocks, represented by letters from the Latin alphabet, are mapped into the cache according to their position in the alphabet. For example, considering a direct-mapped mapping, Memory block a is at position 0 in the alphabet and so will be stored in the first cache line. Memory block e, being at position 4 in the alphabet, will also be stored in the first cache line as 4 mod 4 = 0.

• Consider the cache as a direct-mapped cache with the following memory access sequence: a → e → a. We assume the cache to be initially empty. Memory block a is loaded, then evicted by Block e, and so has to be reloaded when accessed next.

• Consider now the cache as a 2-way set-associative cache using either the lru or fifo replacement policy. Memory blocks a, c, e, ..., are mapped to the first cache set. We define the following input memory access sequence: a → c → e → a. Assuming an initially empty cache, Block a is loaded into the first cache set, then Block c is also loaded into the first cache set without evicting any block as there is an empty cache line. Then, Block e is loaded into the first cache set and evicts Block a, as there is no free cache line in the set and Block a is the oldest memory block in that set. So, Block a has to be reloaded when accessed next. Hence, as for the direct-mapped mapping, the access to Block e leads Block a to be evicted from the cache and the following access to Block a results in a reload.

• Finally, consider the cache as a fully-associative cache using either the lru or fifo replacement policy. We define the following input memory access sequence: a → b → c → d → e → a. Assuming an initially empty cache, Memory blocks a, b, c and d are successively loaded into the cache as there is at least one free cache line. Then, Block e is loaded into the cache, evicting Block a as there is no free cache line and Block a is the oldest block in the cache. So, Block a has to be reloaded when accessed next. Hence, as for the direct-mapped and set-associative mappings, the access to Block e leads Block a to be evicted from the cache and the following access to Block a results in a reload.
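The three scenarios above can be checked with a small cache simulator. This is an illustrative sketch (the helper name is ours): blocks map to set (alphabet position mod number of sets), with lru replacement.

```python
from collections import OrderedDict

def miss_trace(accesses, n_sets, ways):
    """Simulate a cache of n_sets sets of `ways` lines with lru replacement.

    Blocks are letters; the block at alphabet position p maps to set
    p mod n_sets. Returns the sub-sequence of accesses that missed.
    """
    sets = [OrderedDict() for _ in range(n_sets)]
    missed = []
    for blk in accesses:
        s = sets[(ord(blk) - ord("a")) % n_sets]
        if blk in s:
            s.move_to_end(blk)       # hit: refresh the lru ordering
        else:
            missed.append(blk)
            if len(s) == ways:       # set full: evict the least recently used
                s.popitem(last=False)
            s[blk] = True
    return missed

# In all three configurations, block a misses twice (it is evicted by e):
print(miss_trace("aea", 4, 1))       # direct-mapped, 4 lines -> ['a', 'e', 'a']
print(miss_trace("acea", 2, 2))      # 2-way set-associative  -> ['a', 'c', 'e', 'a']
print(miss_trace("abcdea", 1, 4))    # fully-associative      -> ['a', 'b', 'c', 'd', 'e', 'a']
```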

Assumption 1 only focuses on corner cases that lead to worst-case cache utilization. Hence, the presented NP-hardness results remain valid for all cache mapping strategies (i.e. direct-mapped, fully-associative and set-associative). Furthermore, this assumption also shows that the computational complexity of the Cache-aware scheduling problem is independent of the cache size.

We assume a basic task model in which every job has a single execution path in its control flow graph. This is the simplest task structure that can be considered. For the Cache-aware scheduling approach, every task code is modeled by the set of memory blocks that are accessed by the task during its execution. These blocks may contain either instructions or data. The results presented hereafter also hold for separate instruction and data caches. Referenced memory blocks are denoted by letters in a finite alphabet Σ. As a consequence, the sequence Si of accessed memory blocks for Task τi can be modeled by a word of Σ∗.

Assumption 2. The Control Flow Graph (cfg) of every job consists of a single execution path with all loops being unrolled.

For our core problem, we only consider a finite set of jobs. Each job is characterized by its execution requirement (i.e. wcet) assuming a cache hit for every requested memory block, its deadline and its sequence of accessed memory blocks. The instant at which a memory block is requested is not defined in the task model. We now present the Cache-aware job model used hereafter:

Definition 10. A job Ji is defined by Ji(pi, di, Si), where:

• pi is the job's wcet assuming a cache hit for any reference,

• di is the absolute deadline of the job,

• Si is a string from Σ∗ denoting the sequence of memory blocks used during the job execution.

Finally, we make a third assumption to limit preemptions to specific points in the code:

Assumption 3. For preemptive systems, a preemption can only occur just before a job requests its next memory block.

Limiting preemption points in the code can only improve the number of cache hits since the cache content can only change at discrete points in time [BBY13].

For a data cache, the intersection Si ∩ Sj corresponds to common data blocks used by both Tasks τi and τj. In this case, the sequence Si can be totally arbitrary, without any pattern constraints. On the contrary, for an instruction cache, Si ∩ Sj represents code shared by Tasks τi and τj (e.g. shared functions, library code, operating system). In this case, instruction blocks that are referenced several times in sequences Si, 1 ≤ i ≤ n, must necessarily correspond to either a function call or code within loops inside a task. Without modifying Definition 10, Sequence Si can represent requested blocks corresponding simultaneously to instructions and data.

Note that, as a consequence of Assumption 2, pi is defined by the length of Si (i.e. pi = |Si|). Hence, this wcet assumes that all requested memory blocks are hits in the cache.

Example 4.4: Consider the following instance in which all blocks are shared by all tasks (e.g. data cache): Σ = {a, b, c}, three jobs with a Block Reload Time brt = 0.5 and a common deadline d = 21:


[Figure 4.3: time line from 0 to 21; J1, J2 and J3 are interleaved so that the merged cache access sequence is abbaaaccbcccbacc, yielding 9 misses and 7 hits.]

Figure 4.3: Schedule of three jobs using Memory blocks a, b and c. A timing penalty brt = 0.5 is incurred at each cache miss.

• J1(5, 21, babcc),

• J2(6, 21, aaccbc),

• J3(5, 21, bacca).

Since ∑i pi = 16 and there are exactly 16 memory block requests, any feasible schedule must not experience more than 10 cache misses in order to meet the overall deadline d. Without loss of generality, we assume in this example that every portion of code requests one memory block for one unit of time. If the corresponding block is in the cache, there is no additional timing penalty, but if it is a miss, then loading the block from the main memory requires 0.5 unit more, due to the main memory access (i.e. the brt). These values have been chosen for the ease of graphical representation in Figure 4.3.

In Figure 4.3, the first two blocks that are requested by J1 and J2 are not cached. As a consequence, their executions incur a penalty of brt units of time. But, when J3 first requires Block b, it results in a cache hit, and thus no penalty is incurred.

Clearly, it is easy to define a schedule with far more than 10 cache misses: for example, executing J1 then J2 then J3 non-preemptively yields 12 misses and a total length of 22 > 21 (a schedule in which no block is ever requested twice in a row would even incur a brt at every cache access, for a length of 24). As a consequence, at least one job misses the deadline.
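Under Assumption 1, the length of a schedule is fully determined by its merged sequence of block requests. A small sketch (the helper name is ours) illustrates why a poorly chosen order misses the deadline:

```python
def schedule_length(merged_accesses, brt=0.5):
    """Length of a schedule on a one-line cache (Assumption 1): every block
    request takes 1 time unit, plus brt when the requested block is not cached."""
    cached, length = None, 0.0
    for blk in merged_accesses:
        length += 1 if blk == cached else 1 + brt
        cached = blk
    return length

# Executing J1 then J2 then J3 back to back merges their request strings:
print(schedule_length("babcc" + "aaccbc" + "bacca"))  # -> 22.0 > 21: infeasible
# Worst case: no block is ever requested twice in a row, every access misses:
print(schedule_length("ab" * 8))                      # -> 24.0
```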

Since we present a rather simplified scheduling model, the complexity results presented hereafter can be reused to prove the hardness of more general task and cache models in which the previous assumptions no longer hold.

III.3. Complexity of the Preemptive Problem

We first consider the preemptive case. The core scheduling problem for the Cache-aware preemptive case is defined as the following decision problem, which will be proved to be NP-hard:

Definition 11. The Preemptive Scheduling problem with Cache Memory (PSCM) is:

• INSTANCE: a finite alphabet Σ, a finite set of n jobs Ji(pi, d, Si) released at time 0, with execution requirements pi, sequences of accessed memory blocks Si ∈ Σ∗, a common deadline d and a positive number brt.


[Figure 4.4: schedule over [0, 8] with merged access sequence abbaab; cache outcomes: Miss, Miss, Hit, Miss, Hit, Miss.]

Figure 4.4: Feasible schedule for Jobs J1(p, d, aba) and J2(p, d, bab) with p = 3, r = 0.5.

• QUESTION: Is there a uniprocessor preemptive schedule meeting the overall deadline d so that every hit in the cache is performed without any penalty and every miss incurs a penalty of brt units of time?

We first show that classic scheduling policies cannot optimally solve the PSCM problem:

Theorem 1. Task-level and job-level fixed priority schedulers are not optimal for the Preemptive Scheduling problem with Cache Memory (PSCM).

Proof. To prove that job-level fixed priority schedulers are not optimal for the PSCM problem, we use a simple counter-example.

We consider a cache with brt = r, r being an arbitrary positive number, and two synchronously released jobs having the same absolute deadline d = 2 × p + 4 × r: J1(p, d, aba) and J2(p, d, bab), p being an arbitrary positive number. As both jobs start at time 0, any job-level fixed priority scheduler (as well as task-level fixed priority ones) defines a priority ordering before starting one of these two jobs. Without loss of generality, we assume that the scheduler gives J1 a priority higher than the priority of J2. The sequence of memory blocks in the cache will be σ = ababab, leading to 6 cache misses and no hit. As a consequence, the schedule has a length of 2 × p + 6 × r > d. For example, if p = 3 and r = 0.5, the schedule has a length of 9 units of time.

But a feasible schedule for J1 and J2 can be constructed using a fully dynamic priority scheduler, leading to the memory access sequence σ = abbaab. This schedule can be obtained if:

• J1 has a higher priority than J2 in the interval [0, 3],

• then J2 is given a higher priority than J1 in [3, 5.5],

• and finally J1 is given a higher priority than J2 in [5.5, 8].

There are two cache hits and four cache misses, leading to a schedule of length 2 × p + 4 × r ≤ d. So both jobs meet the common deadline d. Four misses is the smallest number of misses that can be achieved while scheduling these two jobs. An example of feasible schedule is depicted in Figure 4.4, assuming once again p = 3 and r = 0.5.

This counter-example is also valid for task-level fixed priority schedulers since those schedulers are only a particular case of job-level fixed priority ones.


Algorithm 1: SCHED(J, w).

input: Jj(pj, Sj), 1 ≤ j ≤ n;
       w: Shortest Common Supersequence of the Sj's

kj := 1 for all j = 1..n
foreach i := 1, . . . , |w| do                 /* for each block of the supersequence w */
    foreach j := 1, . . . , n do               /* for each job in J */
        while kj ≤ |Sj| and sj,kj = wi do      /* the job uses block wi until its next preemption point */
            DISPATCH(Jj)                       /* execute Jj up to its next preemption point in time */
            kj := kj + 1
        end
    end
end
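A runnable sketch of Algorithm 1, under the assumption (for illustration only) that DISPATCH simply records one unit of execution, so the returned trace is the merged block-request sequence:

```python
def sched(jobs, w):
    """Dispatch jobs so that the merged cache requests follow the common
    supersequence w (Algorithm 1). `jobs` maps job names to their request
    strings S_j; returns the executed (job, block) pairs in order."""
    k = {j: 0 for j in jobs}                  # next requested block per job
    trace = []
    for wi in w:                              # for each block of w
        for j, s in jobs.items():             # for each job
            while k[j] < len(s) and s[k[j]] == wi:
                trace.append((j, wi))         # DISPATCH: run j for one unit
                k[j] += 1
    return trace

# Jobs of Example 4.4 with the common supersequence abacbcbac of Table 4.2:
trace = sched({"J1": "babcc", "J2": "aaccbc", "J3": "bacca"}, "abacbcbac")
blocks = [b for _, b in trace]
misses = 1 + sum(blocks[i] != blocks[i - 1] for i in range(1, len(blocks)))
print(len(trace), misses)  # -> 16 7: at most |w| = 9 misses, so length 19.5 <= 21
```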

To prove that the PSCM problem is NP-hard in the strong sense, we make a reduction from the Shortest Common Supersequence problem, denoted SCS hereafter, which is known to be NP-hard in the strong sense [Mai78, GJ79]. We first recall basic definitions used for the problem statement:

Definition 12. Given a finite sequence σ = s1, s2, ..., sm, a subsequence σ′ of σ, written σ′ < σ, is defined as any sequence which consists of σ with between 0 and m terms deleted.

Definition 13. Given a set R = {σ1, . . . , σp} of sequences, a Shortest Common Supersequence of R, denoted SCS(R), is a shortest sequence such that every σi, 1 ≤ i ≤ p, is a subsequence of SCS(R) (i.e. σi < SCS(R), 1 ≤ i ≤ p).

Example 4.5: Consider for example the following three sequences: abbb, bab and bba. A corresponding Shortest Common Supersequence is:

SCS({abbb, bab, bba}) = abbab
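Subsequence membership (Definition 12) amounts to a single linear scan, which makes such claims easy to check mechanically. A small sketch:

```python
def is_subsequence(x, w):
    """True iff x is obtained from w by deleting zero or more characters."""
    it = iter(w)
    return all(ch in it for ch in x)  # `in` consumes the iterator in order

# abbab is a common supersequence of the three sequences of Example 4.5:
print(all(is_subsequence(x, "abbab") for x in ("abbb", "bab", "bba")))  # -> True
```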

We can now present the SCS problem:

Definition 14. (Problem [SR8] from [GJ79]) The Shortest Common Supersequence (SCS) problem is:

• INSTANCE: a finite alphabet Σ, a finite set R of strings from Σ∗ and a positive integer K.

• QUESTION: Is there a string w ∈ Σ∗ with |w| ≤ K such that each string x ∈ R is a subsequence of w, i.e. w = w0x1w1x2 . . . xkwk where each wi ∈ Σ∗ and x = x1x2 . . . xk?


Theorem 2. (from [Mai78]) The SCS problem is NP-complete in the strong sense.

This complexity result implies that there does not exist a polynomial or pseudo-polynomial time algorithm to solve it, unless P = NP. Note that polynomial special cases are known if |R| = 2 or if all x ∈ R have |x| ≤ 2. Furthermore, the problem is also MAX-SNP hard [JL95], meaning that it is hard to approximate (i.e. no polynomial time approximation scheme (PTAS) exists) unless P = NP.

We can now formulate the NP-hardness result for the PSCM problem:

Theorem 3. The Preemptive Scheduling problem with Cache Memory (PSCM) is NP-hard in the strong sense.

Proof. We construct a reduction from the SCS problem to the PSCM problem. We define an instance of the PSCM problem from an arbitrary instance (Σ, R, K) of the SCS problem as follows:

• the finite alphabet Σ is used to define the memory blocks mapped to the considered cache line,

• the common deadline is equal to d = ∑x∈R |x| + K × brt, where brt is assumed to be a positive number,

• for every x ∈ R, we define a job Ji with an execution requirement pi = |x| and a cache request sequence Si = x (i.e. Ji(|x|, d, x)).

We now prove that there exists a solution to the PSCM instance (Σ, {Ji(pi, d, Si)}, brt) if, and only if, there exists a solution to the SCS instance (Σ, R, K). The principle of the transformation is to establish that the Shortest Common Supersequence corresponds to the schedule with the minimum number of cache misses.

(if part) There exists a common supersequence w of length K or less. By construction, w is a common supersequence of the sequences Si, 1 ≤ i ≤ n. We use w to schedule jobs so that cache accesses exactly follow w. We describe the simple scheduling algorithm. Let sj,k be the kth block requested in Sj for Job Jj and let kj be the index of the next requested block in that sequence: starting from w1, we schedule every job so that sj,kj = w1 for one unit of time, in arbitrary order. For these scheduled jobs, we increment the indexes kj, 1 ≤ j ≤ n. Then, the same scheduling rule is applied to every subsequent wi, i ≤ |w|. Algorithm 1 presents the corresponding pseudo-code. In this algorithm, at most one cold cache miss is paid for every wi (i.e., exactly one if wi ≠ wi−1, zero otherwise since the block is already cached). The number of blocks loaded into the cache line is thus bounded by K, the length of the supersequence. As a consequence, the length of this schedule is bounded by ∑ni=1 pi + K × brt ≤ d and the common deadline is met for all jobs.

(only if part) Assume that we have a schedule meeting the overall deadline d. Let σ be the corresponding sequence of block requests in the considered cache line. Necessarily, ∑ni=1 pi + m × brt ≤ d, where m is the number of cache misses; since d = ∑x∈R |x| + K × brt by construction, we have m ≤ K.

Without loss of generality, we assume that subsequences in σ composed of one single memory block (i.e. single letters) are sorted using a wrapping-around technique on job indexes. Such an ordering does not change the number of cache hits or misses since reordered blocks are consecutive and identical in σ. Notice that, by construction, σ contains all x ∈ R since |σ| = ∑x∈R |x|.


Step   Sequence
0      w0 = abbaaaccbcccbacc
1      w1 = abbaaaccbcccbacc
2      w2 = abaaaccbcccbacc
3      w3 = abaccbcccbacc
4      w4 = abacbcccbacc
5      w5 = abacbcccbacc
6      w6 = abacbcbacc
7      w7 = abacbcbacc
8      w8 = abacbcbacc
9      w9 = abacbcbac

Table 4.2: Construction of a common supersequence (w9) starting from the cache request sequence (w0) associated to an arbitrary schedule.

We start from the beginning of w0 = σ and repeat the following step: at step i, we aggregate subsequent identical blocks corresponding to different jobs to define wi (i.e. only one block of a given job is aggregated at every step). We denote by w the sequence obtained from this process. Due to the assumption on the ordering of identical blocks, aggregated letters are always consecutive in the sequence.

The next two claims prove that w is a common supersequence of length at most K:

Claim 1: w is a supersequence.
(By induction) Initially, all x ∈ R are subsequences of w0. Let wi, i ≥ 0, be the sequence at a step verifying the induction hypothesis. Consider the sequence wi+1 computed from wi: since aggregated letters come from different jobs by construction, it follows that all x ∈ R are subsequences of wi+1. Thus, the sequence w obtained by this process is a supersequence of all x ∈ R.

Claim 2: the length of w is at most K.
The worst-case number of cache misses is necessarily obtained when all subsequent letters in w are distinct. We consider hereafter this worst-case scenario. As a consequence, |w| corresponds to the number of cache misses, which is less than or equal to K by construction, since the overall deadline d = ∑x∈R |x| + K × brt is met. Hence, w is a common supersequence of length at most K in the considered worst-case scenario.

Example 4.6: We illustrate the reduction proposed in the proof of Theorem 3 on the system whose schedule is depicted in Figure 4.3. For this system, we have:

• Σ = {a, b, c}, and


• J1(5, 21, babcc), J2(6, 21, aaccbc) and J3(5, 21, bacca).

The construction of a common supersequence from a feasible schedule is illustrated in Table 4.2. At each step, subsequent identical blocks coming from different jobs are aggregated.
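The aggregation of Table 4.2 yields a common supersequence of length 9, but the construction does not claim minimality. For such small instances, the exact SCS length can be computed by an exhaustive shortest-path search over position vectors; this brute force is exponential in general, in line with Theorem 2, and is only a sketch for checking examples:

```python
import heapq

def scs_length(strings):
    """Exact SCS length via shortest-path search over position vectors.

    A state (p1, ..., pn) records how many characters of each string are
    already covered; appending one letter advances every string whose next
    character matches it. Exponential in general, fine for tiny examples.
    """
    goal = tuple(len(s) for s in strings)
    alphabet = sorted(set("".join(strings)))
    start = (0,) * len(strings)
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, pos = heapq.heappop(heap)
        if pos == goal:
            return d
        if d > dist[pos]:
            continue  # stale heap entry
        for ch in alphabet:
            nxt = tuple(p + (p < len(s) and s[p] == ch)
                        for p, s in zip(pos, strings))
            if nxt != pos and d + 1 < dist.get(nxt, float("inf")):
                dist[nxt] = d + 1
                heapq.heappush(heap, (d + 1, nxt))

print(scs_length(["abbb", "bab", "bba"]))      # -> 5, matching Example 4.5
print(scs_length(["babcc", "aaccbc", "bacca"]))
```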

III.4. Complexity of the Non-Preemptive Problem

We now consider Cache-aware non-preemptive scheduling. Uniprocessor non-preemptive scheduling is a well-studied problem when no cache is considered. The problem of non-preemptively scheduling a finite set of jobs with release dates and deadlines is already known to be NP-hard (see [GJ79], Problem [SS1]). Note that, if the jobs are known a priori, released simultaneously and subjected to individual deadlines, then edf (also known as Jackson's rule) is an optimal scheduling algorithm.

To define the core scheduling problem for the Cache-aware non-preemptive case, we assume a simplified task model in which every task accesses only one memory block:

Assumption 4. Each task accesses only one memory block during its execution.

As for the preemptive case, we still assume that the cache memory consists of a single cache line containing one memory block.

Definition 15. The Non-Preemptive Scheduling Problem with Cache Memory (NPSCM) is:

• INSTANCE: a finite alphabet Σ, a finite set of n jobs Ji(pi, di, Si), with execution requirements pi, deadlines di, accessed memory blocks Si ∈ Σ, and a positive number brt corresponding to the Block Reload Time.

• QUESTION: Is there a uniprocessor non-preemptive schedule meeting all deadlines di so that every hit in the cache is performed without any penalty and every miss incurs a penalty of brt units of time?

The hardness proof is based on a transformation from the following sequencing problem, which is known to be NP-complete in the weak sense [BD78, GJ79]:

Definition 16. (Problem [SS6] from [GJ79]) Sequencing with deadlines and setup times is defined as follows:

• INSTANCE: a set C of "compilers", a set T of tasks, for each t ∈ T a length l(t) ∈ Z+, a deadline d(t) ∈ Z+ and a compiler k(t) ∈ C, and for each c ∈ C a "setup time" l(c) ∈ Z+.

• QUESTION: Is there a uniprocessor schedule σ for T that meets all task deadlines and that satisfies the additional constraint that, whenever two tasks t and t′ with σ(t) < σ(t′) are scheduled "consecutively" (i.e., no other task t′′ has σ(t) < σ(t′′) < σ(t′)) and have different compilers (i.e. k(t) ≠ k(t′)), then σ(t′) ≥ σ(t) + l(t) + l(k(t′))?

Theorem 4. (from [BD78]) The problem of sequencing with deadlines and setup times ([SS6]) is NP-complete in the weak sense.


Note that this result holds even if all setup times are identical [GJ79].
We now state that the NPSCM problem is NP-hard in the weak sense:

Theorem 5. The Non-Preemptive Scheduling problem with Cache Memory (NPSCM) is NP-hard in the weak sense.

Proof. We construct a polynomial reduction from the [SS6] problem, which is NP-complete in the weak sense. Hereafter, we consider a constant setup time L for all c ∈ C. Let us consider an arbitrary instance of [SS6] to define an instance of our scheduling problem:

• Σ = C.

• For every task t ∈ T, we define a job Ji with parameters pi = l(t), di = d(t) and Si = k(t). Thus, every job uses only one memory block mapped into the cache line.

• brt = L is the Block Reload Time.

Clearly, a task's compiler corresponds to the memory block a job must have cached in the cache line, and the setup time of the sequencing problem corresponds exactly to the block reload time paid whenever that memory block is not in the cache line. As a consequence, both problems are equivalent. Hence, the [SS6] problem has a solution if, and only if, the corresponding scheduling problem has a feasible solution.

The previous transformation only establishes that the non-preemptive Cache-aware scheduling problem is NP-hard in the weak sense. This means that the existence of a pseudo-polynomial time algorithm cannot be excluded. Nevertheless, we believe that, as for the preemptive case, this problem is actually harder and that no pseudo-polynomial time algorithm exists to solve it. However, we currently have no formal proof that NPSCM is NP-hard in the strong sense.

III.5. Limitations of the Cache-aware approach

The Cache-aware scheduling approach is interesting as it allows a precise view of the system and, as a consequence, could achieve better schedulability results. But, because of this higher level of precision, the Cache-aware scheduling approach has several drawbacks:

• First, in real-life applications, a task code can be made of thousands of instructions and so thousands of memory accesses. As a result, the task model becomes very complex and hard to deal with.

• Secondly, even for a task with a small code, representing the task memory requirements is not an easy task. Indeed, very often, a task cfg is made of several conditional statements which might define mutually exclusive paths. As a result, the task has multiple possible execution paths. To compute the wcet, the worst-case execution path is assumed. But at run time, a task might not execute its worst-case execution path. Moreover, different jobs of a same task might follow different execution paths. So, for the Cache-aware approach, every possible path (and so every combination with every other task) should be considered, or a safe worst-case scenario has to be found.


As a result, the Cache-aware scheduling approach is nearly impossible to use in practice, except for very simple systems.

IV. The crpd-aware Scheduling Problem

We now consider the crpd-aware scheduling approach. As for the Cache-aware scheduling problem, we study the computational complexity of the crpd-aware scheduling problem. First, the crpd-aware core scheduling problem is stated. Then, we prove this problem to be NP-hard in the strong sense. Finally, we discuss the crpd-aware scheduling approach and conclude by presenting the task model adopted in the remainder of this thesis.

IV.1. crpd-aware scheduling

Under the crpd-aware approach, the cache effect is assessed through induced preemption delays. Contrary to the Cache-aware scheduling problem, the task memory accesses and the cache content at each instant are not known. Instead, we use pre-computed upper bounds on the additional delays due to the cache every time a task resumes after a preemption. Scheduling decisions are taken based on crpd information.

Under this approach, an additional crpd parameter is used for each task to represent the cost of preempting this task. We will discuss the crpd parameter model in Section IV.5.

Note that this approach corresponds to the one adopted in most of the existing work dealing with cache memories. In [LHS+97, ADM12, LAMD13], crpd bounds are used only to ensure determinism for the schedulability analysis, but in [BBM+10, BXM+11], crpd bounds are used to compute preemption points and so alter the scheduling decisions for the purpose of increasing the system schedulability.

IV.2. Core Problem Definition

We investigate the simplest crpd model in which the upper bound on the crpd is the same for every task and every possible preemption point. That means that the worst-case crpd for the whole system is taken into account every time a preemption occurs. A safe upper bound is, for instance, to assume that the entire cache must be refilled each time a task resumes its execution after a preemption, as in [BMSO+96]. Hence, the task execution requirement Ci corresponds to the wcet when the task is executed alone without any preemption. A cache-related preemption delay γ is paid by every task τi, 1 ≤ i ≤ n, every time it resumes its execution after a preemption. As a consequence, the NP-hardness result presented in this section is also valid for any problem generalization integrating a precise crpd upper bound that may differ from one task to another and from one preemption point to another (see for example the crpd models considered in [LHS+98, TD00, NMR03, SSE05, TM07, BRA09, ADM12]).

We now formally state our first core scheduling problem and then prove its intractability:

Definition 17. The Scheduling problem with crpds is:

• INSTANCE: a finite set of n tasks τi(Ci, Ti, Di), 1 ≤ i ≤ n, with execution requirements Ci (wcet without any preemption cost, estimated when τi is executed fully non-preemptively), relative deadlines Di, periods Ti between two successive job releases of a same task, and a positive number γ


τi    Ci       Ti   Di   si
τ1    1        3    3    γ
τ2    8 − 2γ   12   12   γ

Table 4.3: Taskset τ used for the proof of Theorem 6.

[Figure 4.5: (a) the edf schedule over [0, 12], in which τ2 resumes three times and pays γ at each resume; (b) a feasible schedule, in which τ2 resumes only twice.]

Figure 4.5: edf schedule and feasible schedule for Taskset τ .

representing the worst-case Cache-Related Preemption Delay incurred by every task τi at every resume point after a preemption.

• QUESTION: Is there a uniprocessor preemptive schedule meeting all deadlines?

Note that this scheduling problem is not restricted to scheduling with crpds. It also applies to scheduling problems with context-switch costs incurred when preempting a job. Such problems have been studied for instance in [YS07, LS14] for uniprocessor real-time systems, without any cache concern.

We first show that classic scheduling policies cannot be optimal for scheduling tasks with crpds (or, equivalently, with context-switch costs):

Theorem 6. Neither task-level fixed nor job-level fixed priority rules can be optimal for the Scheduling problem with crpds.

Proof. The proof is based on a counter-example. We consider Taskset τ presented in Table 4.3: τ1(1, 3, 3) and τ2(8 − 2γ, 12, 12), with an arbitrarily small positive number γ representing the worst-case crpd for the whole system. Let us consider the edf schedule presented in Figure 4.5a where, without loss of generality, γ is set to 0.5. Rectangles labelled by γ represent the block reload delays, due to the loss of cache affinity, that are paid by the preempted task (i.e. Cache-Related Preemption Delays). In practice, block reload times are paid when the evicted blocks are next referenced. To simplify the graphical presentation in Figure 4.5, all the block reload times associated to a crpd are grouped together and depicted just at the resume point. The edf schedule, depicted in Figure 4.5a, completes Task τ2 at time 4 × C1 + C2 + 3 × γ = 12 + γ and τ2 misses its deadline. But, as depicted in Figure 4.5b, a feasible schedule exists for this taskset. In this schedule, τ2 suffers two preemptions from τ1 rather than three as in the edf schedule (there is one γ less to account for). A necessary condition to define a feasible schedule is that τ2 suffers at most two preemptions. This can be achieved as follows:

• τ1 has a priority higher than the priority of τ2 in the time interval [0, 3),

• τ2 has a higher priority than τ1 in [3, 5),

• τ1 has a higher priority than τ2 in [5, 6) (otherwise, τ1 would miss its second deadline).

In this feasible schedule (Figure 4.5b), τ2 completes by time 4 × C1 + C2 + 2 × γ = 12 and thus meets its deadline at time 12. Note that in Figure 4.5a, the second preemption experienced by τ2 can be avoided by raising the priority of τ2 in the interval [9, 10.5), and thus ties cannot be broken arbitrarily using a deadline-based scheduling rule.

Clearly, job-level or task-level fixed priority schemes cannot define a feasible schedule for Taskset τ.
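The completion times used in this counter-example can be checked by direct arithmetic (γ = 0.5 as in Figure 4.5; all four jobs of τ1 released in [0, 12) execute before τ2 completes):

```python
gamma = 0.5
C1, C2 = 1, 8 - 2 * gamma  # tau_1(1, 3, 3) and tau_2(8 - 2*gamma, 12, 12)

edf = 4 * C1 + C2 + 3 * gamma       # three preemptions of tau_2 under edf
feasible = 4 * C1 + C2 + 2 * gamma  # two preemptions in the feasible schedule

print(edf)       # -> 12.5: tau_2 misses its deadline at 12
print(feasible)  # -> 12.0: tau_2 completes exactly at its deadline
```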

IV.3. Complexity of the Problem

The hardness proof we use is based on a transformation from a well-known decision problem called the 3-Partition problem:

Definition 18. (Problem [SP15] from [GJ79]) The 3-Partition problem is:

• INSTANCE: a set A of 3m elements, a bound B ∈ N and a size sj ∈ N for each j = 1..3m, such that B/4 < sj < B/2 and ∑j=1..3m sj = mB.

� QUESTION: Can A be partitioned into m disjoint sets A1, A2, ..., Am such that, for 1 ≤ i ≤ m,∑j∈Ai

sj = B (each elements from A)?

Theorem 7. (from [GJ79]) The 3-Partition problem is NP-complete in the strong sense.

This means that the 3-Partition problem cannot be solved in polynomial time, nor even in pseudo-polynomial time, unless P = NP.
We can now state the hardness result for our core scheduling problem:

Theorem 8. Scheduling with crpds is NP-hard in the strong sense.

Proof. We construct a polynomial transformation from the 3-Partition problem as follows:

- 3m tasks τ1, . . . , τ3m with parameters: Ci = si, Di = Ti = m × (B + 1), 1 ≤ i ≤ 3m.

- Task τ3m+1 with: C3m+1 = D3m+1 = 1 and T3m+1 = B + 1.


[Timeline omitted: the m jobs of τ3m+1 execute in the unit slots [k×(B+1), k×(B+1)+1), k = 0..m−1, leaving m gaps of length B.]

Figure 4.6: Pattern of feasible schedules in proof of Theorem 8.

We now prove that there is a solution to the instance (A, B, {sj}) of the 3-Partition problem if, and only if, there is a feasible schedule. By construction, the taskset utilization factor without any preemption penalty is exactly 1. Hence, every preemption with a positive crpd (or equivalently a nonzero context switch delay) will necessarily lead to a deadline failure.

(if part) Assume we have a 3-Partition A1, . . . , Am; then we schedule τ3m+1 as early as possible. The corresponding schedule pattern is presented in Figure 4.6. Each interval between two consecutive executions of τ3m+1's jobs (i.e. [(k−1)×(B+1)+1, k×(B+1)), k = 1..m) is of length B. Then, we schedule the tasks corresponding to Ak in the interval [(k−1)×(B+1)+1, k×(B+1)), 1 ≤ k ≤ m. Since there is a 3-Partition, Σj∈Ak sj = B, ∀k = 1...m, thus the corresponding jobs can be scheduled in the k-th interval without any preemption. All jobs scheduled in these intervals meet their deadlines at time m×(B+1) (i.e. at the end of the last interval). Hence, no Cache-Related Preemption Delay is incurred and all deadlines are met.

(only if part) Assume we have a feasible schedule. Observe that in any feasible schedule, the total workload in the interval [0, m×(B+1)) is exactly equal to m×C3m+1 + Σi=1..3m Ci = m + Σi=1..3m si = m + m×B = m×(B+1). Hence, there is no preemption in any feasible schedule since otherwise at least one Cache-Related Preemption Delay would be incurred by a task and at least one deadline would be missed (e.g. see the pattern of any feasible schedule presented in Figure 4.6). Furthermore, due to the 3-Partition problem, execution requirements verify B/4 < Ci < B/2, 1 ≤ i ≤ 3m. Hence, exactly 3 tasks are executed in every interval. So, we can define a 3-Partition with Ak by selecting the tasks executed in the intervals [(k−1)×(B+1)+1, k×(B+1)), 1 ≤ k ≤ m.
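The reduction above can be sketched in a few lines of Python (an illustrative helper, not part of the thesis material): it builds the taskset from a 3-Partition instance and checks that the utilization is exactly 1, so that any positive preemption penalty necessarily causes a deadline miss.

```python
from fractions import Fraction

def reduce_3partition(sizes, m, B):
    """Build the scheduling instance used in the proof of Theorem 8.
    sizes: the 3m element sizes of the 3-Partition instance."""
    assert len(sizes) == 3 * m and sum(sizes) == m * B
    assert all(4 * s > B and 2 * s < B for s in sizes)      # B/4 < s_j < B/2
    tasks = [(s, m * (B + 1), m * (B + 1)) for s in sizes]  # (Ci, Di, Ti)
    tasks.append((1, 1, B + 1))                             # tau_{3m+1}
    return tasks

def utilization(tasks):
    # exact rational arithmetic avoids any rounding in the utilization check
    return sum(Fraction(c, t) for (c, _d, t) in tasks)

# m = 2, B = 12: sizes all equal to 4 trivially admit a 3-Partition
tasks = reduce_3partition([4] * 6, 2, 12)
print(utilization(tasks) == 1)  # True: no slack left for any crpd
```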

As a consequence of the previous hardness result, it can be proved that there is no universal scheduling algorithm taking into account Cache-Related Preemption Delays, unless P = NP. We recall that a scheduling algorithm is said to be universal if it can successfully schedule every schedulable task system [JSM91].

Theorem 9. If there exists a universal scheduling algorithm with crpds, then P = NP.

Proof. To prove this theorem, we use a classical proof approach, such as the one presented in [JSM91]. Precisely, we show that if such an algorithm exists, and if it takes a polynomial amount of time (in the length of the input) to choose the next processed job, then P = NP, because we can find a pseudo-polynomial time algorithm to solve the 3-Partition problem.
We assume that there exists a universal scheduling algorithm for the scheduling problem with crpds. We denote this algorithm A. Given an instance of the 3-Partition problem, we define a set I of tasks using


the same reduction technique as in the proof of Theorem 8. The tasks are synchronously released. Consequently, to check that no task misses its deadline, we only need to study the interval [0, m×(B+1)]. Then, we use Scheduling algorithm A to define a schedule and thus we are able to check that all deadlines are met. Since the length of the schedule is m×(B+1) and A is assumed to be a polynomial time algorithm, the whole deadline-checking algorithm is at most pseudo-polynomial (i.e. it is clearly performed in time proportional to m×B). Using the reduction technique proposed in the proof of Theorem 8, the instance I is schedulable by Algorithm A if, and only if, there exists a partition of tasks τ1, . . . , τ3m into m disjoint sets A1, A2, . . . , Am. Consequently, for each set Ai, i ∈ {1, . . . ,m}, we have Στj∈Ai Cj = B. Thus, the schedule delivered by Algorithm A gives a solution to the 3-Partition problem. To find this solution, we transform from the 3-Partition problem by simply constructing the set of tasks as in the proof of Theorem 8 and then presenting this task system to the decision procedure based on Algorithm A.
Therefore, we would have a pseudo-polynomial time algorithm to solve the 3-Partition problem. However, the 3-Partition problem is NP-complete in the strong sense. As a consequence, if Algorithm A exists then P = NP; assuming P ≠ NP, this is a contradiction and we can conclude that such an algorithm does not exist.

Next, we show that the scheduling problem with crpds is still NP-hard even if the tasks do not have synchronous releases and if preemption delays are equal to one processor cycle (i.e. one unit of time). To prove this hardness result, we use a transformation from the Partition decision problem:

Definition 19. (Problem [SP12] from [GJ79]) The Partition problem is:

- INSTANCE: m positive integers s1, . . . , sm with Σi=1..m si = 2B.

- QUESTION: Is there a partition of I = {1, . . . ,m} into two disjoint subsets I1 and I2 such that Σk∈Ij sk = B, 1 ≤ j ≤ 2?

Theorem 10. (from [Kar72]) The Partition problem is NP-complete in the weak sense.

This means that the Partition problem cannot be solved in polynomial time (unless P = NP) but can be solved in pseudo-polynomial time using dynamic programming (see [GJ79]).
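The pseudo-polynomial dynamic program alluded to here is the classic subset-sum recurrence; a minimal sketch (illustrative, not from the thesis):

```python
def has_partition(sizes):
    """Decide the Partition problem in O(m * B) time:
    reachable subset sums are propagated element by element."""
    total = sum(sizes)
    if total % 2:
        return False
    target = total // 2
    reachable = {0}
    for s in sizes:
        reachable |= {r + s for r in reachable if r + s <= target}
    return target in reachable

print(has_partition([1, 2, 3, 4]))  # True: {1, 4} and {2, 3}
print(has_partition([1, 2, 4]))     # False
```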

Theorem 11. The Scheduling problem with crpds is NP-hard in the weak sense even if there are two distinct release dates and deadlines and the preemption delay is equal to one unit of time.

Proof. We construct a polynomial reduction from the Partition decision problem. Given an instance of the Partition problem, we define the following scheduling instance with m + 1 jobs:

- m jobs Ji released at time ri = 0, with a deadline di = 2×B + 1 and processing times pi = si, 1 ≤ i ≤ m,

- Job Jm+1 released at time rm+1 = B with a deadline dm+1 = B + 1 and pm+1 = 1.

The previous scheduling instance has two distinct release dates and two distinct deadlines. Preemption delays are equal to one unit of time.

(if part) Assume that we have a Partition (I1, I2). Then a feasible schedule is obtained by scheduling non-preemptively the jobs of I1 in the interval [0, B) and the jobs of I2 in the interval [B+1, 2×B+1). Both intervals have a length of B and since Σk∈Ij sk = B, 1 ≤ j ≤ 2, all jobs meet their deadlines.


(only if part) Assume that we have a feasible schedule. In every feasible schedule:

- Job Jm+1 is scheduled in the interval [B, B + 1),

- jobs are scheduled non-preemptively without any idle time, since if one preemption is paid, then one job necessarily misses its deadline due to the preemption delay.

We define a feasible partition by selecting in I1 the jobs scheduled in [0, B) and in I2 the jobs scheduled in [B+1, 2×B+1). Both intervals have a length equal to B and the Partition constraint is satisfied: Σk∈Ij sk = B, 1 ≤ j ≤ 2.
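The schedule used in this proof can be laid out by a small checker (an illustrative sketch, not part of the thesis): given a Partition (I1, I2), it places the jobs non-preemptively and verifies that every job meets the common deadline 2B + 1.

```python
def schedule_from_partition(sizes, I1, I2):
    """Feasible schedule of Theorem 11: jobs of I1 fill [0, B), J_{m+1}
    runs in [B, B+1), jobs of I2 fill [B+1, 2B+1); no preemption occurs,
    hence no crpd is ever paid."""
    B = sum(sizes) // 2
    assert sum(sizes[i] for i in I1) == B and sum(sizes[i] for i in I2) == B
    schedule, t = [], 0
    for i in I1:
        schedule.append((f"J{i + 1}", t, t + sizes[i]))
        t += sizes[i]
    schedule.append(("J_m+1", B, B + 1))
    t = B + 1
    for i in I2:
        schedule.append((f"J{i + 1}", t, t + sizes[i]))
        t += sizes[i]
    assert t == 2 * B + 1  # last job finishes exactly at the common deadline
    return schedule

print(schedule_from_partition([1, 2, 3, 4], [0, 3], [1, 2]))
```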

IV.4. Optimal algorithm for corner cases

As stated in Theorem 11, the case of non-recurring jobs is also hard. Nevertheless, for a finite set of jobs there are special cases for which edf creates no preemptions. For all these special cases, edf is an optimal online scheduler:

Property 3. For the set of systems schedulable by edf without generating any preemption, edf is an optimal scheduling algorithm for the scheduling problem with crpds.

Proof. edf is an optimal scheduler for scheduling independent jobs when no crpd is considered. In particular, it is optimal for the subset of systems for which it can compute schedules without any preemption. For those systems, the schedule constructed by edf for scheduling without crpds is identical to the one constructed by edf for scheduling with crpds (as no preemption occurs, no crpd has to be added). So every feasible system for the scheduling problem without crpds remains schedulable by edf when accounting for crpds. Moreover, no infeasible system under the scheduling problem without crpds can become feasible under the scheduling problem with crpds (as no timing requirement is changed and only potential overload is added). So edf is an optimal scheduler in this case.

As a consequence, edf is an optimal scheduler for the following scheduling subproblems:

Corollary 1. edf is an optimal scheduler for the Scheduling problem with crpds for a set of n jobs with equal release dates: ri = r, 1 ≤ i ≤ n.

Corollary 2. edf is an optimal scheduler for the Scheduling problem with crpds for a set of n jobs with equal deadlines: di = d, 1 ≤ i ≤ n.

Corollary 3. edf is an optimal scheduler for the Scheduling problem with crpds for a set of n jobs with pi = 1, 1 ≤ i ≤ n (assuming that all release dates and deadlines are integers).

Corollary 4. edf is an optimal scheduler for the Scheduling problem with crpds for a set of n jobs with similarly-ordered release dates and deadlines: ri ≤ rj ⇒ di ≤ dj , 1 ≤ i ≤ n, 1 ≤ j ≤ n, i < j.
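Corollary 1 can be illustrated concretely: with a single release date, edf simply runs jobs to completion in non-decreasing deadline order, so no preemption (and hence no crpd) can ever occur. A small sketch (illustrative code, not from the thesis):

```python
def edf_same_release(jobs):
    """edf for jobs sharing one release date. jobs: list of
    (processing time p, deadline d). Returns (feasible, execution order).
    Feasibility reduces to checking prefix sums against deadlines."""
    order = sorted(range(len(jobs)), key=lambda i: jobs[i][1])
    t, feasible = 0, True
    for i in order:
        t += jobs[i][0]               # job i runs to completion: no preemption
        feasible &= (t <= jobs[i][1])
    return feasible, order

print(edf_same_release([(2, 7), (1, 2), (3, 6)]))  # (True, [1, 2, 0])
```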


IV.5. Discussion on the crpd-aware approach

Contrary to the Cache-aware scheduling strategy, the crpd-aware approach is better suited in practice. Indeed, the crpd parameter represents an upper-bound on the preemption delay a task might experience and, as a result, possible execution paths and cache states no longer have to be considered at the scheduling step.
So, the main issue is to define the crpd parameters so as to have a safe upper-bound on the Cache-Related Preemption Delay. To do so, several approaches can be used:

1. all tasks have the same common crpd parameter,

2. each task has its own crpd parameter,

3. each task has a set of crpd parameters depending on the possible preempting tasks,

4. each task has a set of crpd parameters depending on the possible preempting tasks and also the locations of preemption points in the task code.

The easiest way is of course to consider that the entire cache is reloaded after each preemption. But this approach is very pessimistic.
On the contrary, the last approach is the most precise one. But then, the same issue as for the Cache-aware scheduling strategy is raised: a task may have several execution paths, making the scheduling problem very complex.
To deal with the third approach, the best way is to consider the sets of ecbs and ucbs for each task. As a result, we could compute the preemption parameter for task i for a given preemption as:

si = brt × |ucbi ∩ ∪τj∈preempting tasks ecbj|

This approach makes the scheduling problem complex as the crpd parameter depends on the set of preempting tasks for every preemption.
So, we will consider only the second approach in the remainder of this work. As a result, we use the following task model, modified to account for the crpd parameter:

τi(oi, Ci, Ti, Di, si)

where:

- Ci corresponds to the task wcet as if the task was executing non-preemptively,

- si represents an upper-bound on the crpd the task has to pay every time it resumes its execution after a preemption.

The crpd parameter si can be computed for example as:

si = brt × |ucbi|

This approach is pessimistic as we assume that all the ucbs of the job are evicted from the cache at every preemption and so have to be reloaded when the job resumes its execution. But it allows us to get a safe upper-bound depending only on the considered task.
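Both bounds are straightforward to compute once the ucb and ecb sets are known; a sketch (the cache-set numbers and the brt value below are made up for illustration):

```python
BRT = 8  # block reload time, in cycles (assumed value)

def crpd_per_task(ucb_i):
    """Second approach (used hereafter): si = brt * |ucb_i|, i.e. every
    useful cache block is assumed evicted at each preemption."""
    return BRT * len(ucb_i)

def crpd_per_preemption(ucb_i, preempters_ecbs):
    """Third approach: only blocks that some preempting task can actually
    evict need reloading: si = brt * |ucb_i ∩ (∪ ecb_j)|."""
    evicted = set().union(*preempters_ecbs) if preempters_ecbs else set()
    return BRT * len(set(ucb_i) & evicted)

ucb = {0, 1, 2, 5}                                   # hypothetical cache sets
print(crpd_per_task(ucb))                            # 32
print(crpd_per_preemption(ucb, [{1, 2}, {6, 7}]))    # 16
```

As the example shows, the per-task bound is safe but coarser than the per-preemption bound whenever the preempting tasks touch only part of the preempted task's useful blocks.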


V. Conclusion

In this chapter, we considered two scheduling problems on uniprocessor systems with cache memories:

- the Cache-aware scheduling problem, which corresponds to the scheduling problem with cache content and task memory requirement information,

- the crpd-aware scheduling problem, which corresponds to the scheduling problem with Cache-Related Preemption Delays.

By studying simplified core problems, we prove both problems to be NP-hard. As a result, neither the Cache-aware scheduling problem nor the crpd-aware scheduling problem can be optimally solved by any polynomial time algorithm (unless P = NP). Actually, for the crpd-aware scheduling problem and the Cache-aware preemptive scheduling problem there can be no pseudo-polynomial time optimal algorithm. So, explicitly taking cache memories into account leads to harder scheduling problems. As a result, straightforward generalizations of well-known uniprocessor scheduling theoretic results cannot be used for the problem of scheduling with cache memories.
In the remainder of this PhD work, we will focus exclusively on the crpd-aware scheduling problem as it is easier to deal with than the Cache-aware scheduling problem.


Chapter 5: Online Scheduling with Cache-Related Preemption Delays

Contents

I Introduction . . . 141
II Sustainability for crpd-aware scheduling . . . 141
  II.1 Definition . . . 142
  II.2 Sustainability of rm, dm and edf scheduling policies . . . 143
  II.3 Sustainability of schedulability tests and analyses . . . 150
III Optimal Online Scheduling accounting for crpds . . . 152
  III.1 Scheduling independent jobs . . . 153
  III.2 Scheduling periodic tasks . . . 153
  III.3 Scheduling sporadic tasks . . . 155
IV Conclusion . . . 160

Abstract

In this chapter, we focus on online scheduling accounting for Cache-Related Preemption Delays. We first consider classic online scheduling policies such as Rate Monotonic and Earliest Deadline First. We show that those policies experience several scheduling anomalies as soon as crpds are considered. Then, we study the general online crpd-aware scheduling problem and prove that there exists no optimal scheduler for scheduling sporadic tasks as soon as crpds are accounted for.


I. Introduction

In the previous chapter, we considered two different approaches to account for cache interference in scheduling. We studied the computational complexity of those two scheduling problems with cache memories. In particular, we showed that taking cache interference into account radically changes the scheduling problem. As the Cache-aware scheduling approach exhibits several limitations, we focus hereafter exclusively on the crpd-aware scheduling problem. We use the task model introduced in Chapter 4, Section IV.5.
In this chapter, we study online scheduling accounting for crpds. We recall that an online scheduler takes its scheduling decisions at runtime knowing only the current system state, i.e. the parameters of all jobs that have been released at the current date. The scheduler has no knowledge of jobs to come. Some early work dealing with online scheduling and preemption delays can be found in particular in Thomas Chapeaux's master thesis [Cha14]. The results presented hereafter extend and complete this preliminary work.
We first consider classic online policies, namely Rate Monotonic (rm), Deadline Monotonic (dm) and Earliest Deadline First (edf), which are commonly used for scheduling without crpds. As presented in Chapter 2, Section IV.3, those policies, when accounting for crpds, exhibit several limitations in comparison with the non-crpd-aware case. In particular, for fixed-task priority scheduling, as shown for example in [RM06b, YS07], synchronous releases for sporadic tasks do not necessarily represent the worst-case scenario. Moreover, edf is no longer an optimal scheduler for independent jobs or periodic/sporadic tasks with implicit deadlines. Besides, we show that, as soon as crpds are accounted for, those policies also suffer from scheduling anomalies. As no known scheduling policy is optimal for the crpd-aware scheduling problem, we deal, in a second phase, with the question of finding an optimal online scheduler.
We prove that, unfortunately, no such scheduler exists for scheduling either independent jobs or sporadic tasks as soon as they are subjected to crpds.
In Section II, we study the sustainability of rm, dm and edf when accounting for crpds. Then, in Section III, we prove that there exists no optimal online scheduler for scheduling a set of independent jobs or sporadic tasks as soon as crpds are considered. Finally, in Section IV, we summarize our results on online crpd-aware scheduling and conclude this chapter.

II. Sustainability for crpd-aware scheduling

Hard real-time scheduling theory focuses on ensuring predictability. To do so, schedulability analyses have to consider the worst-case scenario to ensure that all deadlines will always be met at runtime for every possible scenario that could occur during the system life (see Chapter 2, Section III.1). An important and difficult issue is to construct such worst-case scenarios. Indeed, worst cases are not necessarily obvious to determine. For example, under fixed-task priority scheduling, synchronous releases do not represent the worst-case scenario for sporadic tasks subjected to crpds, as shown in [RM06b].
Usually, worst-case scenarios are assumed to occur for the worst-case values of the task parameters: in particular, the task Worst-Case Execution Time is considered during the schedulability analysis instead of the task average execution time. But, during the system life, execution times will often (if not always) be smaller than wcets and, for sporadic tasks, job inter-arrival times can be greater than the


task periods. We want to be sure that a system deemed schedulable for worst-case parameters remains schedulable under normal (and usually more favorable) conditions. To study this matter, Burns and Baruah introduced in [BB08] the notion of sustainability. Sustainability for classic scheduling, i.e. scheduling without crpds, is recalled in Chapter 2, Section III.3. But are these sustainability results still valid when crpds are accounted for?
We first extend the sustainability definition to cope with crpd-aware scheduling. Then, we study the rm, dm and edf policies and show that, unfortunately, none of them remains sustainable when crpds are accounted for. Finally, we discuss the sustainability of several crpd-aware schedulability tests or analyses.

II.1. Definition

We first have to extend the definition of sustainability, given in [BB08] and recalled in Chapter 2, Section III.3 (see Definition 9), to cope with the crpd-aware scheduling model. To do so, we introduce the notion of sustainability with regard to the Cache-Related Preemption Delay parameter:

Definition 20. A scheduling policy and/or a schedulability test is sustainable with regard to the Cache-Related Preemption Delay parameter if any system deemed schedulable by the schedulability test remains schedulable when the value of the Cache-Related Preemption Delay parameter of one or more individual task(s) is decreased.

The motivation for considering the crpd parameter in the sustainability analysis is similar to the reason for considering the wcet parameter (see [BB08]). As for the task execution times, only upper-bounds on the Cache-Related Preemption Delays are considered. Moreover, those bounds are computed independently of the real program points at which the preemptions will actually occur. So, for all realistic systems, variability in Cache-Related Preemption Delays is to be expected. As a consequence, sustainability with regard to the crpd parameter is required.
We can now extend Burns and Baruah's sustainability definition to crpd-aware scheduling:

Definition 21. A scheduling policy and/or a schedulability test for a scheduling policy is sustainable if any system deemed schedulable by the schedulability test remains schedulable when the parameters of one or more individual task(s) are changed in any, some, or all of the following ways:

1. decreased execution requirements, and

2. larger periods, and

3. larger relative deadlines, and

4. decreased Cache-Related Preemption Delays.

Note that Burns and Baruah also consider the task jitter. However, under our crpd-aware scheduling model, we assume that all tasks have no jitter. As a consequence, the jitter cannot be decreased. So we do not consider hereafter sustainability with regard to the jitter parameter and focus exclusively on the task execution requirements, periods, relative deadlines, and crpds:


parameter    rm                dm                edf
Ci           ✗ (Theorem 12)    ✗ (Theorem 12)    ✗ (Theorem 12)
Ti           ✗ ([BB08])        ✗ ([BB08])        ✗ (Theorem 13)
Di           ✓ (Theorem 14)    ✓ (Theorem 14)    ✗ (Theorem 15)
si           ✗ (Theorem 16)    ✗ (Theorem 16)    ✗ (Theorem 16)

Table 5.1: Sustainability results for Rate Monotonic (rm), Deadline Monotonic (dm) and Earliest Deadline First (edf) accounting for Cache-Related Preemption Delays (✓ stands for sustainable and ✗ for non-sustainable).

Definition 22. A scheduling policy and/or a schedulability test for a scheduling policy is sustainable if and only if it is sustainable with regard to:

1. the execution requirement parameter, and

2. the period parameter, and

3. the deadline parameter, and

4. the crpd parameter.

We now study the sustainability of rm, dm and edf when Cache-Related Preemption Delays are considered. We also consider the sustainability of their related schedulability tests/analyses.

II.2. Sustainability of rm, dm and edf scheduling policies

We first study the sustainability of the Rate Monotonic, Deadline Monotonic and Earliest Deadline First scheduling policies when accounting for crpds.
We deal hereafter with synchronously-released periodic tasks with implicit or constrained deadlines. We implicitly consider that a taskset is schedulable under a given scheduling policy if the schedule constructed by the scheduling policy over the taskset's hyperperiod meets all deadlines (which is equivalent to the Leung and Whitehead test for periodic tasks with zero offsets under fixed-task priority scheduling).
All sustainability results for rm, dm and edf are synthesized in Table 5.1.

II.2.a. Sustainability with regard to the execution time

We first consider the sustainability of Rate Monotonic, Deadline Monotonic and Earliest Deadline First with regard to execution times:


Tasks τi(Ci, Ti, Di, si)    Generated jobs Jij(rij, pij, dij, sij)
τ1(1, 4, 4, 0.6)            J11(0, 1, 4, 0.6), J12(4, 1, 8, 0.6), J13(8, 1, 12, 0.6)
τ2(3, 12, 12, 0.6)          J21(0, 3, 12, 0.6)
τ3(3, 12, 12, 0.6)          J31(0, 3, 12, 0.6)
τ4(3, 12, 12, 0.6)          J41(0, 3, 12, 0.6)

(a) Tasks and jobs generated over the hyperperiod H = 12 for Taskset TTheorem12.

[Timeline omitted: (b) Schedule constructed by rm, dm and edf for Taskset TTheorem12 with C2 = 3.]

[Timeline omitted: (c) Schedule constructed by rm, dm and edf for Taskset TTheorem12 with C2 = 2.]

Figure 5.1: Example of a scheduling anomaly when decreasing an execution time used in the proof of Theorem 12.


Theorem 12. rm, dm and edf are not sustainable with regard to the execution requirement parameter when crpds are accounted for.

Proof. We consider the example of Taskset TTheorem12, made of four synchronously-released periodic tasks τ1, τ2, τ3 and τ4, whose characteristics are synthesized in Table 5.1a. The crpd parameter is the same for the four tasks and is equal to 0.6. Over the hyperperiod, τ1 issues three jobs J11, J12 and J13 whereas τ2, τ3 and τ4 issue one job each, respectively J21, J31 and J41. All job characteristics can also be found in Table 5.1a.
We first consider the schedules constructed by rm, dm and edf over the hyperperiod H = 12. We assume that task indexes are used to break ties, which means that the priority ordering is the same for the three scheduling policies. As a result, rm, dm and edf construct the same schedule over H, which is depicted in Figure 5.1b. We recall that, for graphical representation ease, we depict the crpd parameter sij as a whole immediately after the preempted job Jij resumes its execution. Looking at Figure 5.1b, we can see that the system experiences no preemption and the taskset is schedulable as all deadlines are met.
However, if we decrease the execution time of Task τ2 (and so of its job J21) to 2, then the taskset becomes unschedulable. Indeed, as depicted in Figure 5.1c, J21 executes for less time and so J31 and J41 can start their execution earlier. But, as a result, they both experience a preemption from Task τ1. The overall crpd overhead incurred by those two preemptions causes J41 to miss its deadline and the system becomes unschedulable.
So, rm, dm and edf are no longer sustainable with regard to the execution requirement parameter when crpds are accounted for.

II.2.b. Sustainability with regard to the period

We now study sustainability with regard to the task periods. As Rate Monotonic and Deadline Monotonic (and actually any fixed-priority scheduling policy) are not sustainable with regard to the period parameter for scheduling without crpds, they of course remain non-sustainable with regard to the period parameter when accounting for crpds.
We now consider the case of the Earliest Deadline First scheduling policy:

Theorem 13. edf is not sustainable with regard to the period parameter when crpds are accounted for.

Proof. We consider Taskset TTheorem13, which is composed of three synchronously-released periodic tasks τ1, τ2 and τ3, whose characteristics are synthesized in Table 5.2a. The crpd parameter for each task is equal to 1. Over the hyperperiod H = 12, τ1 issues three jobs, J11, J12 and J13, τ2 releases two jobs, J21 and J22, and τ3 only one job, J31.
The schedule constructed by edf over the hyperperiod is depicted in Figure 5.2b. We can see that all deadlines are met. So the system is deemed schedulable.
However, if we increase τ2's period T2 by one unit of time, then J22 is released one unit of time later, as depicted in Figure 5.2c. As a consequence, J22 has not finished its execution at time 8. As Job J13's deadline d13 is smaller than J22's deadline d22, the edf scheduler preempts J22 at time 8 to execute J13. Because of the crpd incurred by this preemption, J22 cannot complete its execution before its deadline. As a result, the system becomes unschedulable.
So, edf is no longer sustainable with regard to the period parameter when crpds are accounted for.
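This anomaly can be replayed with a small unit-step edf simulator (illustrative code, not from the thesis; it implements the model of Chapter 4, Section IV.5, charging a preempted job its full crpd when it resumes):

```python
def edf_with_crpd(jobs, horizon):
    """Unit-step edf simulation over [0, horizon). A preempted job pays
    its full crpd s before resuming real work. jobs: list of
    (release, wcet, deadline, crpd); ties broken by job index, as in the
    proofs. Returns the indices of jobs that miss their deadline."""
    remaining = [p for (_r, p, _d, _s) in jobs]
    penalty = [0] * len(jobs)            # pending crpd to pay on resume
    finish = [None] * len(jobs)
    running = None
    for t in range(horizon):
        ready = [i for i, (r, _p, _d, _s) in enumerate(jobs)
                 if r <= t and remaining[i] + penalty[i] > 0]
        if not ready:
            running = None
            continue
        nxt = min(ready, key=lambda i: (jobs[i][2], i))  # edf + index tie-break
        if running not in (None, nxt) and remaining[running] + penalty[running] > 0:
            penalty[running] = jobs[running][3]          # preemption: crpd later
        running = nxt
        if penalty[nxt] > 0:
            penalty[nxt] -= 1            # reloading evicted cache blocks
        else:
            remaining[nxt] -= 1          # real execution
        if remaining[nxt] + penalty[nxt] == 0:
            finish[nxt] = t + 1
    return [i for i, (_r, _p, d, _s) in enumerate(jobs)
            if finish[i] is None or finish[i] > d]

# Jobs of Taskset T_Theorem13 over [0, 12): J11, J12, J13, J21, J22, J31
base = [(0, 2, 2, 1), (4, 2, 6, 1), (8, 2, 10, 1),
        (0, 2, 4, 1), (6, 2, 10, 1), (0, 1, 12, 1)]
print(edf_with_crpd(base, 12))        # []  : all deadlines met with T2 = 6
bigger_T2 = base[:4] + [(7, 2, 11, 1)] + base[5:]   # J22 released one unit later
print(edf_with_crpd(bigger_T2, 12))   # [4] : J22 misses its deadline
```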


Tasks τi(Ci, Ti, Di, si)    Generated jobs Jij(rij, pij, dij, sij)
τ1(2, 4, 2, 1)              J11(0, 2, 2, 1), J12(4, 2, 6, 1), J13(8, 2, 10, 1)
τ2(2, 6, 4, 1)              J21(0, 2, 4, 1), J22(6, 2, 10, 1)
τ3(1, 12, 12, 1)            J31(0, 1, 12, 1)

(a) Tasks and jobs generated over the hyperperiod H = 12 for Taskset TTheorem13.

[Timeline omitted: (b) Schedule constructed by edf for Taskset TTheorem13 with T2 = 6.]

[Timeline omitted: (c) Schedule constructed by edf for Taskset TTheorem13 with T2 = 7.]

Figure 5.2: Example of a scheduling anomaly when increasing a period used in the proof of Theorem 13.


II.2.c. Sustainability with regard to the deadline

We now study sustainability with regard to the task deadlines. First, we consider the case of Rate Monotonic and Deadline Monotonic:

Theorem 14. rm and dm are sustainable with regard to the deadline parameter when crpds are accounted for.

Proof. Under the rm and dm policies (and for any fixed-priority scheduling algorithm), priorities are computed before the system starts its execution. So, increasing a task's deadline does not change that task's priority nor the priority of any other task. As a result, scheduling decisions for rm and dm are not impacted by the deadline change. So, if a job completed before its initial deadline, then it still completes before its increased deadline.
So, rm and dm are still sustainable with regard to the deadline parameter when crpds are accounted for.

We now consider the sustainability of Earliest Deadline First with regard to deadlines:

Theorem 15. edf is not sustainable with regard to the deadline parameter when preemption delays are accounted for.

Proof. We consider Taskset TTheorem15, presented in Table 5.3a, which is composed of three synchronously-released periodic tasks τ1, τ2 and τ3. The crpd parameter for each task is equal to 1. Over the hyperperiod H = 12, τ1 issues three jobs, J11, J12 and J13, τ2 releases two jobs, J21 and J22, and τ3 issues only one job, J31.
We first consider the schedule constructed by edf over the hyperperiod H = 12, which is depicted in Figure 5.3b. We see that all deadlines are met and so the system is schedulable.
However, if we increase D3 to 11, then Job J31 experiences two preemptions, as depicted in Figure 5.3c. Indeed, as edf uses absolute deadlines to compute job priorities at runtime, J31 now has a lower priority than J12 (resp. J22) at time 4 (resp. 6). So, J31 is preempted by both J12 and J22 and, as a result, misses its deadline. As a consequence, the system becomes unschedulable.
Note that we assume task indexes to be used as a tie breaker. So, at time 8, edf gives J13 a higher priority than J31 despite their common deadline. Considering the reverse would also have resulted in the system not being schedulable any more as, in that case, J13 would have missed its deadline.
So, edf is no longer sustainable with regard to the deadline parameter when crpds are accounted for.

II.2.d. Sustainability with regard to the crpd

Finally, we study the sustainability of Rate Monotonic, Deadline Monotonic and Earliest Deadline First with regard to Cache-Related Preemption Delays:

Theorem 16. rm, dm and edf are not sustainable with regard to the crpd parameter.

Proof. We consider Taskset TTheorem16, which is composed of the four synchronously-released periodic tasks τ1, τ2, τ3 and τ4 presented in Table 5.4a. The crpd parameter is the same for the four tasks and


Tasks τi(Ci, Ti, Di, si)   Generated jobs Jij(rij, pij, dij, sij)
τ1(1, 4, 3, 1)             J11(0, 1, 3, 1), J12(4, 1, 7, 1), J13(8, 1, 11, 1)
τ2(2, 6, 4, 1)             J21(0, 2, 4, 1), J22(6, 2, 10, 1)
τ3(3, 12, 6, 1)            J31(0, 3, 6, 1)

(a) Tasks and jobs generated over the hyperperiod H = 12 for Taskset TTheorem15.

[Timelines over [0, 12] omitted.]

(b) Schedule constructed by rm, dm and edf for Taskset TTheorem15 with D3 = 6: all deadlines are met.

(c) Schedule constructed by rm, dm and edf for Taskset TTheorem15 with D3 = 11: J31 pays s31 twice and misses its deadline.

Figure 5.3: Example of a scheduling anomaly when increasing a deadline, used in the proof of Theorem 15.


Tasks τi(Ci, Ti, Di, si)   Generated jobs Jij(rij, pij, dij, sij)
τ1(1, 4, 1, 1)             J11(0, 1, 1, 1), J12(4, 1, 5, 1), J13(8, 1, 9, 1)
τ2(3, 12, 10, 1)           J21(0, 3, 10, 1)
τ3(3, 12, 11, 1)           J31(0, 3, 11, 1)
τ4(2, 12, 12, 1)           J41(0, 3, 12, 1)

(a) Tasks and jobs generated over the hyperperiod H = 12 for Taskset TTheorem16.

[Timelines over [0, 12] omitted.]

(b) Schedule constructed by rm, dm and edf for Taskset TTheorem16 with s3 = 1: J31 pays s31 once and all deadlines are met.

(c) Schedule constructed by rm, dm and edf for Taskset TTheorem16 with s3 = 0.6: J41 pays s41, is preempted, and misses its deadline.

Figure 5.4: Example of a scheduling anomaly when decreasing a crpd, used in the proof of Theorem 16.


is equal to 1. Over the hyperperiod H = 12, τ1 issues three jobs, J11, J12 and J13, whereas τ2, τ3 and τ4 issue one job each, respectively J21, J31 and J41. All job characteristics can also be found in Table 5.4a.
We first consider the schedules constructed by rm, dm and edf over the hyperperiod H = 12. The priority ordering being the same for the three scheduling policies, rm, dm and edf construct the same schedule over H, which is depicted in Figure 5.4b. All deadlines are met, so the system is schedulable.
However, if τ3's crpd parameter s3 is reduced to 0.6, then Job J31 pays a smaller preemption delay when resuming its execution at time 5, so it completes sooner, as depicted in Figure 5.4c. As a result, J41 can begin its execution earlier (at time 7.6). But it is then preempted by J13 at time 8, as d13 is smaller than d41. The crpd incurred by this additional preemption causes J41 to miss its deadline, and the system becomes unschedulable.
So, rm, dm and edf are not sustainable with regard to the crpd parameter.

II.2.e. Sustainability of rm, dm and edf

Finally, we can conclude on the sustainability of Rate Monotonic, Deadline Monotonic and Earliest Deadline First when accounting for Cache-Related Preemption Delays:

Corollary 5. rm, dm and edf are not sustainable when crpds are considered.

Proof. To prove that a scheduling policy is not sustainable, it is sufficient to show that it is not sustainable with regard to one of the criteria listed in Definition 21. As rm, dm and edf each fail to be sustainable with regard to at least one of the execution time (Theorem 12), period (Theorem 13), deadline (Theorem 15) or preemption delay (Theorem 16) parameters, they are not sustainable when crpds are accounted for.

This result means that we must be particularly careful when validating the schedules constructed by rm, dm or edf as soon as crpds are accounted for, since those policies are not sustainable in general.

II.3. Sustainability of schedulability tests and analyses

We now study the sustainability of four classic schedulability tests or analyses used for Rate Monotonic, Deadline Monotonic and Earliest Deadline First when accounting for Cache-Related Preemption Delays.

Theorem 17. Simulation-based schedulability tests accounting for crpds are not sustainable.

Proof. The proof is a direct consequence of the results presented in the previous section. Indeed, for synchronously-released periodic tasks, schedulability through simulation is ensured if simulating the system over the taskset hyperperiod, using the considered scheduling policy, results in a valid schedule. A system, deemed schedulable because simulation produces a valid schedule, might become unschedulable when decreasing either the execution time parameter (proof of Theorem 12) or the crpd parameter (proof of Theorem 16), or when increasing either the period parameter (proof of Theorem 13) or the deadline parameter (proof of Theorem 15), as, for the modified system, simulation fails to produce a valid schedule.


In particular, the Leung and Whitehead schedulability test for fixed-task-priority scheduling is not sustainable when crpds are taken into account.
We now consider a schedulability analysis for rm and dm accounting for crpds, based on the classic Response Time Analysis (rta). We recall that synchronous releases do not always correspond to the worst-case scenario for sporadic (or periodic) tasks as soon as crpds are accounted for. However, as stated in [ADM12], a bound on the worst-case number of preemptions experienced by a task because of every higher-priority task can be computed. The crpd-aware Response Time Analysis presented hereafter is a slightly modified version of the one introduced in [BMSO+96] and recalled in Chapter 3, Section II.2, considering the crpd parameter si rather than γi,j:

\[
\forall \tau_i,\quad R_i = C_i + \sum_{\forall \tau_j \in hp(i)} \left\lceil \frac{R_i}{T_j} \right\rceil \cdot \left( C_j + \max_{\forall \tau_k \in hep(i) \cap lp(j)} \{ s_k \} \right) \leq D_i \qquad (5.1)
\]

hp(i) being the set of tasks with priorities higher than the one of τi, hep(i) the set of tasks with priorities higher than or equal to the one of τi, and lp(j) the set of tasks with priorities lower than the one of τj. We recall that priorities are assigned in order of increasing periods for rm (Ti < Tj ⇒ priority(τi) > priority(τj)) and increasing deadlines for dm (Di < Dj ⇒ priority(τi) > priority(τj)).
Note that this crpd-aware rta simply corresponds to the rta using the ucb-only approach presented for example in [BMSO+96], as soon as the crpd is upper-bounded by:

si = brt · |ucbi|
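For illustration, Equation 5.1 can be evaluated by the classic fixed-point iteration, starting from Ri = Ci. The sketch below assumes integer parameters and a task list already sorted by decreasing priority; the example tasksets are hypothetical, not taken from the thesis:

```python
from math import ceil

def crpd_rta(tasks):
    """Fixed-point iteration for Equation 5.1 (a sketch).

    tasks: list of (C, T, D, s) sorted by decreasing priority, so that
    hp(i) = tasks[:i] and hep(i) ∩ lp(j) = tasks[j+1 .. i].
    Returns True iff every task satisfies R_i <= D_i.
    """
    for i, (C_i, _, D_i, _) in enumerate(tasks):
        R = C_i
        while True:
            # Each higher-priority task τj contributes its WCET plus the
            # largest CRPD among the tasks it may preempt (hep(i) ∩ lp(j)).
            interference = sum(
                ceil(R / T_j) * (C_j + max(tasks[k][3] for k in range(j + 1, i + 1)))
                for j, (C_j, T_j, _, _) in enumerate(tasks[:i]))
            R_new = C_i + interference
            if R_new > D_i:
                return False          # response time exceeds the deadline
            if R_new == R:
                break                 # fixed point reached: R_i = R
            R = R_new
    return True

print(crpd_rta([(1, 4, 4, 1), (1, 6, 6, 1), (1, 12, 12, 1)]))  # True
print(crpd_rta([(1, 4, 4, 1), (2, 6, 6, 1), (3, 12, 12, 1)]))  # False
```

The early exit when R exceeds Di guarantees termination even for overloaded tasksets.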

We now study the sustainability of this fixed-task-priority schedulability analysis:

Theorem 18. The Response Time Analysis accounting for crpds is sustainable.

Proof. The sustainability proof is based on the same arguments as the ones used in [BB08].
The worst-case response time Ri corresponds to the smallest value t satisfying Ai(t) ≥ Ci, where
\[
A_i(t) = t - \sum_{\forall \tau_j \in hp(i)} \left\lceil \frac{t}{T_j} \right\rceil \left( C_j + \max_{\forall \tau_k \in hep(i) \cap lp(j)} \{ s_k \} \right)
\]
is the amount of available execution time guaranteed for τi over [0, t).
It is easy to see that decreasing a task execution time or crpd, or increasing a task period or deadline, can only increase (or leave unchanged) Ai(t). We note A′i(t) the new amount of available execution time guaranteed for τi over [0, t) after the parameter value change. In all cases, we have A′i(t) ≥ Ai(t) for all t, and so A′i(Ri) ≥ Ai(Ri) ≥ Ci. As a result, the new worst-case response time R′i of Task τi cannot be greater than Ri. If the system was deemed schedulable before the parameter value change, then we had Ri ≤ Di. After the parameter value change, we have R′i ≤ Ri ≤ Di, so the system is still schedulable.
So the crpd-aware rta is sustainable.

We now consider schedulability tests and analyses for the edf scheduling policy.
We first present a slightly modified version of the utilization-based sufficient test introduced in [LAMD13] for periodic tasks with implicit deadlines. Once again, we deal with si instead of parameter γi,j:

\[
\sum_{\forall \tau_i} \frac{C_i + \max_{\forall \tau_j,\, D_j > D_i} \{ s_j \}}{T_i} \leq 1 \qquad (5.2)
\]


This crpd-aware test is equivalent to the edf test using the ucb-only approach presented in [LAMD13], as soon as the crpd is upper-bounded by:

si = brt · |ucbi|
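Evaluating Equation 5.2 is straightforward: each task's utilization is inflated by the largest crpd among the tasks with larger relative deadlines, i.e. the tasks it may preempt under edf. A sketch with hypothetical tasksets, assuming implicit deadlines (Di = Ti):

```python
def edf_crpd_utilization_test(tasks):
    """Sufficient CRPD-aware EDF test of Equation 5.2 (a sketch).

    tasks: list of (C, T, s) with implicit deadlines (D_i = T_i).
    """
    total = 0.0
    for (C_i, T_i, _) in tasks:
        # Largest CRPD among tasks with a strictly larger relative deadline.
        penalties = [s_j for (_, T_j, s_j) in tasks if T_j > T_i]
        total += (C_i + max(penalties, default=0)) / T_i
    return total <= 1.0

print(edf_crpd_utilization_test([(1, 4, 1), (1, 6, 1), (2, 12, 1)]))  # True
print(edf_crpd_utilization_test([(2, 4, 1), (2, 6, 1), (2, 12, 1)]))  # False
```

Being only sufficient, a False answer does not prove infeasibility; it only means the inflated utilization exceeds 1.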

We now consider the sustainability of this crpd-aware test:

Theorem 19. The edf schedulability test (Equation 5.2) for periodic tasks with implicit deadlines accounting for crpds is sustainable.

Proof. Decreasing either Ci or si, or increasing either Ti or Di, always results in a smaller or equal value for the left-hand side of Equation 5.2. So the test result remains valid for the system with the modified parameter.

Finally, we deal with a more general schedulability analysis for edf, which is a slightly adapted version of the Processor Demand Analysis accounting for crpds presented in [LAMD13]. si is used instead of γi,j:

\[
\forall t > 0,\quad h(t) = \sum_{\forall \tau_i} \max \left\{ 0,\ 1 + \left\lfloor \frac{t - D_i}{T_i} \right\rfloor \right\} \left( C_i + \max_{\forall \tau_j,\, D_{max} \geq D_j > D_i} \{ s_j \} \right) \leq t
\]

Once again, this crpd-aware schedulability analysis is equivalent to the Processor Demand Analysis using the ucb-only approach presented in [LAMD13], as soon as the crpd is upper-bounded by:

si = brt · |ucbi|
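For integer parameters, h(t) only changes value at absolute deadlines, so the analysis can be checked by evaluating h(t) at every integer instant up to a chosen bound. A sketch with hypothetical tasksets:

```python
def crpd_pda(tasks, limit):
    """CRPD-aware Processor Demand Analysis (a sketch).

    tasks: list of (C, T, D, s), all integers.
    Checks h(t) <= t for every integer t in (0, limit].
    """
    def h(t):
        demand = 0
        for (C_i, T_i, D_i, _) in tasks:
            n_jobs = max(0, 1 + (t - D_i) // T_i)  # jobs of τi with deadline <= t
            # Largest CRPD among tasks with a strictly larger relative deadline.
            pen = max((s_j for (_, _, D_j, s_j) in tasks if D_j > D_i), default=0)
            demand += n_jobs * (C_i + pen)
        return demand
    return all(h(t) <= t for t in range(1, limit + 1))

print(crpd_pda([(1, 4, 4, 1), (1, 6, 6, 1), (1, 12, 12, 1)], 12))  # True
print(crpd_pda([(2, 4, 4, 1), (2, 6, 6, 1), (3, 12, 12, 1)], 12))  # False
```

For synchronously-released periodic tasks, taking the hyperperiod as the bound is a natural choice; tighter bounds from the literature could of course be substituted.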

We now consider the sustainability of this schedulability analysis:

Theorem 20. The Processor Demand Analysis accounting for crpds is sustainable.

Proof. h(t) can only decrease (or remain unchanged) when decreasing a task execution time or crpd, or increasing a task period or deadline. So the condition h(t) ≤ t remains satisfied.

The result of Theorem 17 shows that simulation cannot be used as a schedulability test as soon as crpds are considered. But all the other tests and analyses accounting for crpds are sustainable and so can be safely used.

III. Optimal Online Scheduling accounting for crpds

In the previous section, we showed that classic scheduling policies such as rm or edf suffer from several scheduling anomalies as soon as crpds are accounted for. Moreover, edf is no longer optimal for the crpd-aware scheduling problem. In Chapter 4, Section IV, we also proved that the crpd-aware problem is NP-hard in the strong sense, meaning that no polynomial or pseudo-polynomial time algorithm can optimally solve this problem (unless P = NP). We show hereafter that, actually, there exists no optimal scheduler for scheduling online independent jobs or sporadic tasks when crpds are considered.
To prove that no online scheduler can be optimal, we will show that an optimal algorithm to solve the crpd-aware scheduling problem needs to be clairvoyant with regard to release dates, i.e. needs to


know all release dates (even those of jobs not yet released) when taking its scheduling decisions. As a consequence, it cannot be an online scheduler. Proofs are based on a competitive analysis using a clairvoyant adversary (see for example [BEY05]). At some instant t, an online algorithm has to take a scheduling decision, i.e. to choose one job to be executed instead of another one. Depending on this choice, the adversary modifies the release date of a future job, such that the online algorithm fails to schedule the whole system without missing a deadline. On the other hand, the adversary, being clairvoyant, can find a feasible schedule by taking at t a decision different from the one of the online algorithm. Then, we consider the case where the online algorithm takes the opposite decision at t, and the adversary chooses another release date to make the system unschedulable once again with respect to the online algorithm.

III.1. Scheduling independent jobs

We first consider the problem of optimally scheduling a set of independent jobs when accounting for crpds:

Theorem 21. Optimal online scheduling of a set of independent jobs accounting for crpds is impossible.

Proof. We consider a competitive analysis to prove the result. We define an offline adversary that will generate three jobs whose characteristics are synthesized in Table 5.2. Jobs J1(0, 5, 12, 1) and J2(4, 5, 10, 1) have fixed parameters, whereas Job J3's release date r3 will be set by the adversary according to the decisions taken by the online algorithm (the other parameters of J3 are fixed: p3 = 1, d3 = r3 + 1 and s3 = 1).
Any online scheduling algorithm has to take a scheduling decision at time 4, i.e. at J2's release. It can either choose to continue executing Job J1 or to preempt it and switch to Job J2.

• Case 1: J1 continues its execution. In this case, the adversary chooses to release J3 at time 9. As depicted in Figure 5.5a, the online algorithm fails to schedule the system, as Job J2 misses its deadline at time 10. However, as shown in Figure 5.5b, a feasible schedule exists.

• Case 2: J2 is chosen to be executed. In this case, the adversary chooses to release J3 at time 10. As a consequence, J1 misses its deadline at time 12, as shown in Figure 5.6a. But once more, a feasible schedule exists, as depicted in Figure 5.6b.

Thus, forthcoming releases have to be known in order to devise an optimal online scheduling algorithm.

So no online scheduler can optimally schedule independent jobs subjected to crpds.

III.2. Scheduling periodic tasks

We now consider the problem of optimally scheduling a set of periodic tasks subjected to crpds. We first deal with the general case where the periodic tasks can be asynchronously released. In that case, the result of Theorem 21 can easily be extended:


[Timelines over [0, 12] omitted.]

(a) Schedule constructed by the online algorithm.

(b) Feasible schedule.

Figure 5.5: Schedule constructed by the online algorithm and feasible schedule in the proof of Theorem 21 - Case 1. The adversary releases J3 at time 9: J3(9, 1, 10).

[Timelines over [0, 13] omitted.]

(a) Schedule constructed by the online algorithm.

(b) Feasible schedule.

Figure 5.6: Schedule constructed by the online algorithm and feasible schedule in the proof of Theorem 21 - Case 2. The adversary releases J3 at time 10: J3(10, 1, 11).


Job   ri   pi   di       si
J1    0    5    12       1
J2    4    5    10       1
J3    r3   1    r3 + 1   1

Table 5.2: Set of independent jobs JTheorem21 used for the proof of Theorem 21.

Task   oi   Ci   Ti   Di   si
τ1     0    5    12   12   1
τ2     4    5    12   6    1
τ3     o3   1    12   1    1

Table 5.3: Set of asynchronously-released periodic tasks TTheorem22 used for the proof of Theorem 22.

Theorem 22. Optimal online scheduling of a set of asynchronously-released periodic tasks accounting for crpds is impossible.

Proof. Consider Taskset TTheorem22 presented in Table 5.3. TTheorem22 is made of three tasks: τ1(0, 5, 12, 12, 1) and τ2(4, 5, 12, 6, 1) have fixed parameters, whereas the third task τ3(o3, 1, 12, 1, 1) can start its execution either at time o3 = 9 or o3 = 10. The set of jobs JTheorem21 used in the proof of Theorem 21 corresponds to the first job of each task of TTheorem22, so the competitive analysis is still valid. The only difference is to ensure that the adversary can construct a valid schedule for every further job issued after time 12. As the three tasks have the same period 12, the pattern represented in Figure 5.5b for o3 = 9 (resp. Figure 5.6b for o3 = 10) can be repeated infinitely to get a valid schedule.

When periodic tasks are synchronously released, the problem is different. The characteristics of every future job are implicitly known, as all tasks are released at the same time. So, in that case, the need for clairvoyance for an optimal online scheduler does not hold. Currently, the existence of an optimal online scheduler for synchronously-released periodic tasks is still an open problem.

III.3. Scheduling sporadic tasks

We now consider the problem of optimally scheduling a set of sporadic tasks subjected to Cache-Related Preemption Delays.
We first show that the proof of Theorem 21 cannot be reused for sporadic tasks as it was for the case of asynchronously-released periodic tasks (see the proof of Theorem 22). Actually, for a sporadic taskset, we need to prove that all possible instances of the taskset are feasible, as explained in [FGB10]. Consider the sporadic taskset constructed from Jobset JTheorem21: each job is seen as the first job of a sporadic task. This taskset is actually not feasible, as at least two preemptions occur for the instance in which Job J3 is released at time 6, and so no valid schedule can be constructed. As a result, the proof of Theorem 21 cannot be extended to handle sporadic tasks.


Task   Ci   Ti   Di   si
τ1     5    T    13   2
τ2     4    T    6    1
τ3     1    T    3    1

Table 5.4: Sporadic taskset TTheorem24 used for the proof of Theorem 24.

To prove that optimal online scheduling of sporadic tasks with crpds is impossible, we use Taskset TTheorem24 presented in Table 5.4. The three tasks have the same period T, which can be set arbitrarily large (for example T = ∞). Thus, we only have to deal with the first job generated by each task.
Then, our proof obligation is twofold, as explained in [FGB10]:

1. prove that TTheorem24 is feasible (see Lemma 1 and Theorem 23), i.e. that all possible instances of Taskset TTheorem24 are feasible, and

2. find an instance of TTheorem24 that cannot be scheduled by any online scheduler, whereas a clairvoyant optimal algorithm can define a feasible schedule for this instance (see Theorem 24).

We first prove the feasibility of Taskset TTheorem24.
To do so, we consider the notion of inter-task interference. Two tasks τi and τj are said to be interfering with each other if their scheduling windows, i.e. the time intervals [oi, oi + Di) and [oj, oj + Dj), are interleaved. In other words, this means that they could possibly preempt each other. We state the following condition on inter-task interference for Taskset TTheorem24:

Lemma 1. A necessary condition for Taskset TTheorem24, defined in Table 5.4, not to be schedulable is that one task suffers from the interference of the other two (i.e. ∃τi ∈ TTheorem24 : ∀τj ∈ TTheorem24 \ {τi}, [oi, oi + Di) ∩ [oj, oj + Dj) ≠ ∅).

Proof. We prove Lemma 1 by contradiction.
Suppose that only two tasks from TTheorem24 interfere with each other and that there is a deadline miss. Without loss of generality, we assume that these two tasks are active simultaneously in order to maximise the interference, i.e. their execution windows are not disjoint. We consider an edf scheduler and show that all deadlines are always met:

• Case 1: both tasks are executed in sequence, i.e. without any preemption, if:

– the second released task has a deadline greater than the one of the first released task, or

– the second task is released after the first task has finished its execution.

• Case 2: the second released task preempts the first released task, which has not yet finished its execution. As a consequence, the first task has to pay a preemption delay. The second task does not suffer from any preemption and meets its deadline. For the preempted task, we consider the following three possible situations:

– τ1 is preempted by τ2: the response time of τ1 is equal to C1 + s1 + C2 = 11 ≤ D1 = 13.


– τ1 is preempted by τ3: the response time of τ1 is equal to C1 + s1 + C3 = 8 ≤ D1 = 13.

– τ2 is preempted by τ3: the response time of τ2 is equal to C2 + s2 + C3 = 6 ≤ D2 = 6.

In every case, no deadline is missed.

So edf can construct a feasible schedule as soon as only two tasks are interfering with each other, which is a contradiction.

Using Lemma 1, we can now prove that all instances of Taskset TTheorem24 are feasible, i.e. that TTheorem24 is feasible:

Theorem 23. Sporadic taskset TTheorem24, defined in Table 5.4, is feasible.

Proof. As a result of Lemma 1, the feasibility study for Taskset TTheorem24 can be limited to a time interval of length D1, as the three tasks have to interfere with each other for a deadline miss to be possible. We note I2 (respectively I3) the time interval of length D2 (resp. D3) in which Task τ2 (resp. τ3) is active: I2 = [o2, o2 + D2] (resp. I3 = [o3, o3 + D3]). For the feasibility proof, we have to consider the following cases:

• Case 1: I2 and I3 do not overlap. Without loss of generality, we assume that τ2 is executed before τ3. Then we consider the following two sub-cases:

– Sub-case 1: I2 and I3 are separated by at least one time unit: we have o3 ≥ o2 + D2 + 1 ⇒ o3 + D3 − o2 ≥ D2 + D3 + 1. Executing τ2 at the start of its interval I2 and τ3 at the end of I3 leaves at least 5 idle time units in the middle: (o3 + D3 − C3) − (o2 + C2) ≥ D2 + D3 + 1 − C3 − C2 = 5. So τ1 can be executed between τ2 and τ3 without any preemption, as depicted in Figure 5.7a.

– Sub-case 2: I2 and I3 are separated by strictly less than one time unit: we have o3 − (o2 + D2) < 1 ⇒ o3 + D3 − o2 ≥ D2 + D3. Executing τ2 at the end of its interval I2 and τ3 at the start of I3 leaves idle time units of cumulative length at least 7 at the beginning and at the end of the studied interval of length D1: D1 − (o3 + C3 − (o2 + D2 − C2)) > D1 − C3 − C2 − 1 = 7. So τ1 can be executed in two parts, i.e. with one preemption, as C1 + s1 = 7, as depicted in Figure 5.7b.

• Case 2: I2 and I3 overlap. We can always schedule τ2 and τ3 such that neither preempts the other, by executing them contiguously. So, in the interval of length D1, only 5 contiguous time units are required by τ2 and τ3, leaving 8 idle time units, in at most two parts, for τ1. As C1 + s1 = 7 < 8, we can once more construct a valid schedule in all cases, as depicted in Figure 5.7c.

So Taskset TTheorem24 is feasible, as we can construct a schedule meeting all deadlines in every case.

Now that we have shown that Taskset TTheorem24 is feasible, we can prove that:

Theorem 24. Optimal online scheduling of sporadic tasks accounting for crpds is impossible.


[Timelines over [0, 13] for τ1, τ2 and τ3, with intervals I2 and I3, omitted.]

(a) Possible schedule for Case 1, Sub-case 1.

(b) Possible schedule for Case 1, Sub-case 2.

(c) Possible schedules for Case 2.

Figure 5.7: Different cases in the proof of Theorem 23.


[Timelines over [0, 13] omitted.]

(a) Schedule constructed by the online algorithm.

(b) Feasible schedule.

Figure 5.8: Schedule constructed by the online algorithm and feasible schedule in the proof of Theorem 24 - Case 1. The adversary sets τ3's offset to 4: τ3(4, 1, T, 3, 1).

[Timelines over [0, 14] omitted.]

(a) Schedule constructed by the online algorithm.

(b) Feasible schedule.

Figure 5.9: Schedule constructed by the online algorithm and feasible schedule in the proof of Theorem 24 - Case 2. The adversary sets τ3's offset to 8: τ3(8, 1, T, 3, 1).


Proof. We show that there is an instance of Taskset TTheorem24, defined in Table 5.4, that cannot be scheduled by any online algorithm, whereas a feasible schedule can be constructed by a clairvoyant algorithm. The proof is based on a competitive analysis using a clairvoyant adversary.
We consider the instance of TTheorem24 defined by the following task offsets: o1 = 0 and o2 = 2. The offline adversary will release the third sporadic task τ3 either at time o3 = 4 or o3 = 8, depending on the scheduling decisions taken by the online algorithm. Any online scheduling algorithm has to take a scheduling decision at time 2, i.e. at τ2's release. It can either choose to continue executing Task τ1 or choose to preempt it and switch to Task τ2:

• Case 1: τ1 continues its execution. In this case, the adversary chooses to release τ3 at time 4. As τ1 is still executing for some amount of time after time 2, τ2 cannot complete its execution without a deadline miss: if τ2 is executed prior to τ3, then a preemption necessarily occurs in order for τ3 to meet its deadline. The only solution to avoid this preemption is to start τ2 after τ3's execution. In both cases, τ2 misses its deadline. The second case is depicted in Figure 5.8a. So the online algorithm fails to schedule this instance of Taskset TTheorem24. However, as shown in Figure 5.8b, a feasible schedule exists.

• Case 2: τ2 is chosen to be executed. In this case, the adversary chooses to release τ3 at time 8. As a consequence, τ1 necessarily experiences two preemptions and, as C1 + 2 × s1 + C2 + C3 = 14, it misses its deadline at time 13, as shown in Figure 5.9a. But once more, a feasible schedule exists, as depicted in Figure 5.9b.

Note that we should also consider the cases in which the online algorithm inserts idle times even though either τ2 or τ3 is ready to be executed. But inserting idle times means that the processor demand will be higher over the rest of the time interval. As a consequence, the system will still be unschedulable. Finally, the feasibility of Taskset TTheorem24 is ensured by Theorem 23.

So no online scheduler can optimally schedule sporadic tasks subjected to crpds.

IV. Conclusion

In this chapter, we studied the sustainability of several classic scheduling policies, namely Rate Monotonic, Deadline Monotonic and Earliest Deadline First, when they are subjected to Cache-Related Preemption Delays. We showed that, because of crpds, those policies may experience scheduling anomalies: for example, a taskset deemed schedulable may become unschedulable when one job executes for less than its task's wcet. As a consequence, rm, dm and edf are not sustainable. Moreover, we showed that simulation is not suitable for studying the system schedulability under rm, dm and edf as soon as crpds are accounted for. However, classic crpd-aware schedulability tests and analyses are sustainable; as a consequence, they can be safely used when studying the system schedulability under rm, dm and edf.
As neither Rate Monotonic nor Earliest Deadline First is optimal for the scheduling problem with crpds, we studied the question of finding an optimal online crpd-aware scheduler. Unfortunately, we proved that no such optimal online scheduler exists for scheduling independent jobs, asynchronously-released periodic tasks, or sporadic tasks. The problem remains open for periodic tasks with synchronous releases.


As no online scheduler can be optimal for the crpd-aware scheduling problem (except perhaps for synchronously-released periodic tasks), we focus in the next chapter on finding an offline solution to the scheduling problem accounting for crpds.


Chapter 6

Offline Scheduling with Cache-Related Preemption Delays

Contents

I Introduction . . . 165
II Basis of the approach . . . 165
II.1 Presentation of the main idea . . . 165
II.2 Feasible schedule property . . . 167
III A nearly-optimal offline approach . . . 168
III.1 Transformed model and assumptions . . . 168
III.2 Mathematical Program . . . 170
III.3 Application Example . . . 174
III.4 Limitations of the nearly-optimal approach . . . 175
IV An optimal offline approach . . . 177
IV.1 Overcoming the limitations of the nearly-optimal approach . . . 177
IV.2 Mathematical Program . . . 181
IV.3 Application Example . . . 184
IV.4 Comparison with the nearly-optimal approach . . . 185
V Discussion on the optimal offline approach . . . 186
V.1 Mathematical complexity and solving time issues . . . 188
V.2 Impact of the crpd parameter model . . . 191
VI Conclusion . . . 192

Abstract

In this chapter, we focus on offline scheduling to find an optimal solution to the crpd-aware scheduling problem. To construct a valid schedule offline, we use two approaches based on Mixed-Integer Linear Programming. First, we introduce a nearly-optimal solution to the crpd-aware scheduling problem. Then, we present an optimal offline approach which overcomes the issues of the previous solution.


CHAPTER 6. OFFLINE SCHEDULING WITH CACHE-RELATED PREEMPTION DELAYS


I. Introduction

In the previous chapter, we showed that no online scheduler can be optimal for the crpd-aware scheduling problem (unless maybe for synchronously-released periodic tasks). So, in this chapter, we focus on offline scheduling. As no scheduling algorithm executing in polynomial or pseudo-polynomial time can be optimal for the crpd-aware scheduling problem unless P = NP (as proved in Chapter 4, Section IV), we use Mixed-Integer Linear Programming to compute an optimal solution, i.e. to construct a feasible schedule whenever it is possible.

The next section presents the common basis of our two offline approaches. Then, in Section III, we introduce our first offline approach to solve the crpd-aware scheduling problem. As this solution is not fully optimal, we present in Section IV a new offline approach based on a slightly modified Mixed-Integer Linear Program. In Section V, we discuss our optimal offline solution with regard to mathematical complexity, solving time issues and the crpd parameter model. Finally, Section VI summarizes and concludes this chapter.

II. Basis of the approach

II.1. Presentation of the main idea

We consider a finite set of n independent jobs denoted Ji(ri, pi, di, si). We recall that each job is characterized by a release date ri, a worst-case execution time pi, an absolute deadline di and an upper bound si on the crpd that is added to the wcet of the job each time it resumes its execution after a preemption. Note that we refer here to the crpd only, but all results remain valid for any preemption delay as long as it depends only on the preempted task (see Chapter 4, Section IV.5). When dealing with synchronously-released periodic tasks, which is often the case in real-time scheduling, we consider the jobs generated by the different tasks over the hyperperiod (i.e. the least common multiple of the task periods).
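The expansion of synchronously-released periodic tasks into jobs over the hyperperiod can be sketched as follows (an illustrative Python helper, not part of the thesis; the tuple encodings of tasks and jobs are assumptions made for this sketch):

```python
from math import lcm  # Python >= 3.9

def generate_jobs(tasks):
    """Expand synchronously-released periodic tasks tau_i(C_i, T_i, D_i, s_i)
    into the jobs J(r, p, d, s) issued over one hyperperiod H (lcm of periods)."""
    H = lcm(*(T for _, T, _, _ in tasks))
    jobs = []
    for C, T, D, s in tasks:
        for k in range(H // T):
            # one job per period: release k*T, wcet C, deadline k*T + D, crpd bound s
            jobs.append((k * T, C, k * T + D, s))
    return H, jobs

# Taskset T_example1 of Table 6.1: tau1(1, 3, 3, 0.25) and tau2(7, 12, 12, 0.5)
H, jobs = generate_jobs([(1, 3, 3, 0.25), (7, 12, 12, 0.5)])
```

Running it on Texample1 yields H = 12 and the five jobs of Table 6.1.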

Example 6.1: Consider Taskset Texample1 defined in Table 6.1. This taskset is made of two synchronously-released periodic tasks τ1(1, 3, 3, 0.25) and τ2(7, 12, 12, 0.5). The hyperperiod of the system is equal to the least common multiple of T1 and T2, i.e. H = 12. Over the hyperperiod, τ1 generates four jobs, J1, J2, J3 and J4, whereas τ2 issues only one job, J5. As shown in Figure 6.1a, Texample1 is not schedulable under rm and edf (assuming that ties are broken according to task indexes). However, a feasible schedule exists, as depicted in Figure 6.1b.

The main idea behind the two offline approaches presented in Sections III and IV is to use mathematical programming to compute a feasible schedule (i.e. a schedule ensuring all timing constraints) whenever it is possible. For a given problem, mathematical programming aims at finding the best solution, with regard to some criterion (also called the objective function), from the set of all possible solutions to this problem. We consider in particular a special class of mathematical programming, called Mixed-Integer Linear Programming (milp), where:


Tasks τi(Ci, Ti, Di, si) | Generated jobs Ji(ri, pi, di, si)
τ1(1, 3, 3, 0.25) | J1(0, 1, 3, 0.25), J2(3, 1, 6, 0.25), J3(6, 1, 9, 0.25), J4(9, 1, 12, 0.25)
τ2(7, 12, 12, 0.5) | J5(0, 7, 12, 0.5)

Table 6.1: Tasks and jobs generated over the hyperperiod H = 12 for Taskset Texample1.

[Gantt charts over [0, 12): (a) under rm and edf, J5 is preempted three times and pays s5 each time; (b) a feasible schedule where J5 pays a single crpd s5.]

(a) rm and edf schedule. (b) Feasible schedule.

Figure 6.1: Schedule produced by rm and edf and a feasible schedule for Taskset Texample1: τ1(1, 3, 3, 0.25) and τ2(7, 12, 12, 0.5).

- the objective function is a linear function,

- subjected to linear equality and inequality constraints,

- with some unknown variables required to have integer values.

More formally, a milp can be described as: find vector x solving

min_x f^T · x

subjected to:

x(I) ∈ Z
Aineq · x ≤ bineq
Aeq · x = beq
lb ≤ x ≤ ub

with vectors f, bineq, beq, lb and ub, matrices Aineq and Aeq, and a set of indexes I being inputs of the problem.

Our scheduling problem consists in computing offline a feasible schedule whenever it is possible. To represent this problem using a mathematical program, we need to define equality and inequality constraints to ensure that all possible solutions for the mathematical problem are feasible schedules, i.e.


no job, for example, must start before its release date or finish after its deadline. In order to select a feasible schedule among the set of valid solutions to the mathematical program (i.e. the set of feasible schedules), we define the objective function such that the feasible schedule that minimizes the overall crpd will be chosen. Note that minimizing the overall crpd is equivalent to reducing the worst-case workload, which is an aim pursued by most scheduling algorithms as it allows more robustness (i.e. the ability of a system to remain schedulable when operating beyond worst-case scenarios: additional tasks, larger execution times, etc.; see for example [But11]). The need for integer values is due to the use of boolean variables to express conditions that can only be encoded as integers between 0 and 1.

To use a milp formulation, we need to represent a valid schedule in terms of equality and inequality constraints. To do so, we consider a schedule as a set of slices represented by their index: S = {1, . . . , m}. Slices are delimited by subsequent scheduling events, i.e. job release dates or deadlines. For every slice k ∈ S, bk denotes the slice starting time, whereas ek denotes the slice ending time. The first slice begins at the earliest job release date, whereas the last slice ends with the latest job deadline. The set of jobs that can be executed in a slice j is denoted by Jj, 1 ≤ j ≤ m. We denote Si the set of slices of Job Ji, S1i its first slice and S̄i = Si \ S1i its set of slices except the first one. In any feasible schedule, each job must be scheduled in a time interval delimited by its release date and its deadline. Every job will be scheduled in slices: we call a job-piece Ji,j the part of a job Ji executed in a slice j ∈ Si. So each job Ji is considered as a set of subjobs {Ji,j | j ∈ Si}. The milp will construct a feasible schedule by computing, for every job, the starting date and execution time of each subjob and whether the subjob pays a crpd when beginning its execution. If Ji,j's execution time is set to 0, then Ji is not scheduled in Slice j.

Example 6.2: For Taskset Texample1, defined in Table 6.1, a schedule will be made up of 4 slices: [r1, d1) = [0, 3), [r2, d2) = [3, 6), [r3, d3) = [6, 9) and [r4, d4) = [9, 12). Each job of τ1 can be scheduled in exactly one slice, i.e. each job of τ1 has only one job-piece: J1 → J1,1 in [0, 3), J2 → J2,2 in [3, 6), J3 → J3,3 in [6, 9) and J4 → J4,4 in [9, 12). Task τ2's only job over the hyperperiod, J5, can potentially be scheduled in any of the four slices. So J5 is made up of 4 job-pieces: J5,1 in [0, 3), J5,2 in [3, 6), J5,3 in [6, 9) and J5,4 in [9, 12).
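The slice construction described above can be sketched in Python (an illustrative helper, not part of the thesis; the tuple encoding of jobs is an assumption of this sketch):

```python
def build_slices(jobs):
    """Slices are delimited by consecutive scheduling events (job release
    dates and deadlines); a job J_i may only run in the slices contained
    in [r_i, d_i)."""
    events = sorted({r for r, _, _, _ in jobs} | {d for _, _, d, _ in jobs})
    slices = list(zip(events, events[1:]))  # [(b_j, e_j), ...]
    eligible = [[i for i, (r, _, d, _) in enumerate(jobs) if r <= b and e <= d]
                for b, e in slices]         # J_j: jobs executable in slice j
    return slices, eligible

# Jobs of T_example1 (Table 6.1), encoded as (r, p, d, s)
jobs = [(0, 1, 3, 0.25), (3, 1, 6, 0.25), (6, 1, 9, 0.25),
        (9, 1, 12, 0.25), (0, 7, 12, 0.5)]
slices, eligible = build_slices(jobs)
```

On Texample1 this reproduces the four slices of Example 6.2, with J5 eligible in every slice.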

II.2. Feasible schedule property

Solving the milp associated to the crpd-aware scheduling problem consists first in constructing a set of valid solutions, i.e. feasible schedules, with regard to the milp's mathematical constraints. But the number of possible solutions can be very high: for example, if the crpds are equal to 0, feasible schedules with an arbitrarily high number of preemptions can be constructed. In order to limit the number of possible solutions, our approach uses the following property:

Property 4. Among feasible schedules (if any), there exists a schedule in which a job resumes at most once in every slice.

Proof. Suppose we have an optimal schedule S in which a job Ji resumes twice in a slice j. We note J¹i,j and J²i,j the two non-contiguous subjobs of Ji. Consider Schedule S′ in which, after some subjob permutations, J¹i,j and J²i,j have been put consecutively, as depicted in Figure 6.2. Such a reordering cannot increase the occupied time of Slice j: subjobs have only been reordered and not split (i.e. preempted), so no additional preemption is introduced. Moreover, having J¹i,j and J²i,j now contiguous actually suppresses the crpd paid by Subjob J²i,j. Finally, the reordering does not jeopardize schedulability, as a job deadline can only occur at the end of a slice. So, if S is a valid schedule, S′ is still one. By repeating these permutations for every job in every slice, we can get a feasible schedule in which a job resumes at most once in every slice.

[Diagram: inside [bj, ej), J¹i,j and J²i,j separated by other subjobs and a crpd si ⇒ J¹i,j and J²i,j made contiguous, with idle time at the end of the slice.]

Figure 6.2: Illustration of a subjob permutation in the proof of Property 4.

A direct consequence of Property 4 is that every job executed in a slice pays at most one crpd. Thus, the previous property limits the number of schedule patterns to consider in order to define an optimal offline schedule in which all job deadlines are met. Note that the schedules constructed by the two milps presented hereafter won't necessarily be work-conserving, even if a feasible work-conserving schedule exists.
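The permutation argument behind Property 4 can be illustrated with a small sketch (hypothetical helper functions; only resumptions inside one slice are counted here, not the crpd possibly paid when entering the slice):

```python
def in_slice_resumptions(pieces):
    """Count how many times a job resumes inside the slice, i.e. runs again
    after some other job executed in between; each such resumption pays one
    crpd inside the slice."""
    seen, count = set(), 0
    for k, (job, _) in enumerate(pieces):
        if job in seen and pieces[k - 1][0] != job:
            count += 1
        seen.add(job)
    return count

def compact_slice(pieces):
    """Permutation used in the proof of Property 4: make all pieces of the
    same job contiguous, keeping the first-appearance order of jobs."""
    order, total = [], {}
    for job, t in pieces:
        if job not in total:
            order.append(job)
            total[job] = 0.0
        total[job] += t
    return [(job, total[job]) for job in order]

# Job 5 split around job 1 inside one slice: one in-slice resumption...
before = [(5, 1.0), (1, 1.0), (5, 0.5)]
after = compact_slice(before)  # ...none once the pieces are contiguous
```

The reordered slice occupies no more time than the original one, which is the core of the proof.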

III. A nearly-optimal offline approach

III.1. Transformed model and assumptions

We now present our first offline approach. To get a simpler mathematical program, and without loss of generality, we consider a slightly modified scheduling problem in which job execution times are defined as p′i = pi − si. In this scheduling problem, si can be interpreted as the delay incurred by a job when it starts its execution for the first time or resumes its execution after a preemption. So, under this model, a virtual crpd is paid by each job when starting its execution for the first time. Both problems are equivalent as soon as the following assumption is satisfied:

Assumption 5. Every job's crpd is smaller than (or equal to) its wcet: si ≤ pi, 1 ≤ i ≤ n.

This assumption is obviously true in real life: the crpd a job can experience cannot be greater than the total execution time of that job. However, the si parameter is an upper bound on the crpd and so may be very pessimistic, for example if the whole cache is assumed to be reloaded after each preemption. Using p′i instead of pi allows us to reduce the number of constraints of the milp. The main variables for our first mathematical program are:

- ti,j ∈ R, corresponding to the starting time of Job-piece Ji,j,

- p′i,j ∈ R, standing for the execution time of Job-piece Ji,j as defined for the transformed problem,

- ∆i,j ∈ {0, 1}, indicating whether Job-piece Ji,j has to pay a preemption delay si.


Notation | Type | Description

Input data:
S | set | set of slice indexes
Si | set | slice indexes of Ji
S1i | set | index of the first slice of Ji
S̄i | set | slices of Ji except the first one
bj | real | starting time of Slice j
ej | real | ending time of Slice j
Jj | set | jobs in Slice j
p′i | real | transformed processing time of Job Ji
si | real | preemption delay of Job Ji

Output variables:
ti,j | real | starting time of Job-piece Ji,j
p′i,j | real | transformed processing time of Job-piece Ji,j
∆i,j | binary | Ji resumes in Slice j after a preemption

Internal variables:
ai,j | binary | non-null job-piece execution condition
bi,j | binary | non-contiguous job-piece condition
yi,k,j | binary | job-piece disjunctive constraints

Table 6.2: Data and variables for the milp of the nearly-optimal approach.


All notations are summarized in Table 6.2. Moreover, to define our milp, we only consider schedules where:

Assumption 6. The crpd paid by a job when resuming its execution in a slice fits within the slice boundaries.

In other words, a crpd cannot be spread over two consecutive slices. The validity of both assumptions will be discussed in Section III.4.

III.2. Mathematical Program

We recall that the objective is to compute a preemptive schedule that minimizes the overall crpd in order to reduce the worst-case workload. Using the notations introduced before, the objective function can be written as:

min ∑_{i=1}^{n} ∑_{j∈Si} si × ∆i,j    (6.1)

Equations 6.2 to 6.15 define the constraints of the Mixed-Integer Linear Program. These constraints are divided into different categories presented hereafter. The whole milp is synthesized in Table 6.3.

III.2.a. Processing time constraints

The first set of constraints ensures that the sum of the execution times of all job-pieces of a job corresponds exactly to the execution time of that job (we recall that by execution time we mean the execution time as defined for the transformed scheduling problem):

∑_{j∈Si} p′i,j = p′i    1 ≤ i ≤ n    (6.2)

III.2.b. Slice constraints

Job-pieces Ji,j are executed inside a slice j. This means that every job-piece starts and completes in the interval [bj, ej). An arbitrarily small value ε is used to forbid a job-piece from starting at time ej.

ti,j + p′i,j + si ×∆i,j ≤ ej 1 ≤ i ≤ n, j ∈ Si (6.3)

ti,j ≥ bj 1 ≤ i ≤ n, j ∈ Si (6.4)

ti,j ≤ ej − ε 1 ≤ i ≤ n, j ∈ Si (6.5)

These three constraints are illustrated in Figure 6.3. Job-pieces executed inside every slice, including possible crpds, must not exceed the slice size:

∑_{i∈Jj} (p′i,j + si × ∆i,j) ≤ ej − bj    j ∈ S    (6.6)


Objective function:

min ∑_{i=1}^{n} ∑_{j∈Si} si × ∆i,j    (6.1)

subjected to:

Processing time constraints:
∑_{j∈Si} p′i,j = p′i    1 ≤ i ≤ n    (6.2)

Slice constraints:
ti,j + p′i,j + si × ∆i,j ≤ ej    1 ≤ i ≤ n, j ∈ Si    (6.3)
ti,j ≥ bj    1 ≤ i ≤ n, j ∈ Si    (6.4)
ti,j ≤ ej − ε    1 ≤ i ≤ n, j ∈ Si    (6.5)
∑_{i∈Jj} (p′i,j + si × ∆i,j) ≤ ej − bj    j ∈ S    (6.6)

Job-piece disjunctive constraints:
ti,j + p′i,j + si × ∆i,j ≤ tk,j + (1 − yi,k,j) × M    j ∈ Si ∩ Sk    (6.7)
tk,j + p′k,j + sk × ∆k,j ≤ ti,j + yi,k,j × M    j ∈ Si ∩ Sk    (6.8)

Preemption penalty constraints:
p′i,j ≤ ∆i,j × M    1 ≤ i ≤ n, j ∈ S1i    (6.9)
p′i,j ≤ ai,j × M    1 ≤ i ≤ n, j ∈ S̄i    (6.10)
ti,j − (ti,j−1 + p′i,j−1 + si × ∆i,j−1) ≤ bi,j × M    1 ≤ i ≤ n, j ∈ S̄i    (6.11)
∆i,j ≤ ai,j    1 ≤ i ≤ n, j ∈ S̄i    (6.12)
∆i,j ≤ bi,j    1 ≤ i ≤ n, j ∈ S̄i    (6.13)
∆i,j ≥ ai,j + bi,j − 1    1 ≤ i ≤ n, j ∈ S̄i    (6.14)
∆i,j ≥ 0    1 ≤ i ≤ n, j ∈ S̄i    (6.15)

Table 6.3: Complete milp for the nearly-optimal solution.


[Diagram: inside Slice [bj, ej), a job-piece starting at ti,j pays si × ∆i,j and then executes for p′i,j, with ti,j ≥ bj, ti,j < ej and ti,j + p′i,j + si × ∆i,j ≤ ej.]

Figure 6.3: Illustration of slice constraints.

[Diagrams: the two possible execution orders of Ji,j and Jk,j inside Slice [bj, ej).]

(a) Case where Ji,j is executed before Jk,j: ti,j + p′i,j + si × ∆i,j ≤ tk,j.
(b) Case where Jk,j is executed before Ji,j: tk,j + p′k,j + sk × ∆k,j ≤ ti,j.

Figure 6.4: Illustration of job-piece disjunctive constraints.

III.2.c. Job-piece disjunctive constraints

Inside every slice, two job-pieces cannot be executed simultaneously. For every pair of job-pieces Ji,j and Jk,j, we have either:

ti,j + p′i,j + si × ∆i,j ≤ tk,j

which corresponds to the case depicted in Figure 6.4a, or:

tk,j + p′k,j + sk × ∆k,j ≤ ti,j

which corresponds to the case depicted in Figure 6.4b. These disjunctive constraints can be linearized using a big value M and a binary variable yi,k,j, i < k, set to 1 by the solver if Ji,j is executed before Jk,j in Slice j, and to 0 otherwise:

ti,j + p′i,j + si ×∆i,j ≤ tk,j + (1− yi,k,j)×M j ∈ Si ∩ Sk (6.7)

tk,j + p′k,j + sk ×∆k,j ≤ ti,j + yi,k,j ×M j ∈ Si ∩ Sk (6.8)
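The big-M linearization can be checked numerically. The sketch below (illustrative code, with an assumed M = 100 as in the later example) evaluates constraints (6.7) and (6.8) for given values, showing that exactly one ordering is active depending on y:

```python
M = 100  # "big M": any constant larger than the scheduling horizon works

def disjunctive_ok(t_i, p_i, crpd_i, t_k, p_k, crpd_k, y):
    """Evaluate the linearized disjunction (6.7)/(6.8). With y = 1, (6.7)
    forces J_i,j to finish before J_k,j starts and (6.8) is deactivated by
    the M term; with y = 0 the roles are swapped."""
    c67 = t_i + p_i + crpd_i <= t_k + (1 - y) * M   # (6.7)
    c68 = t_k + p_k + crpd_k <= t_i + y * M         # (6.8)
    return c67 and c68

# First piece occupies [0, 1.25), second one starts at 2: feasible with y = 1
ok = disjunctive_ok(0, 1, 0.25, 2, 1, 0, y=1)
# Overlapping pieces are rejected for both values of y
bad = any(disjunctive_ok(0, 2, 0, 1, 2, 0, y=y) for y in (0, 1))
```

If the two pieces overlap, no value of y satisfies both constraints, which is exactly the intended mutual-exclusion behaviour.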


[Diagram: Ji,j−1 ending inside Slice j−1 and Ji,j starting in Slice j, with ti,j−1 + p′i,j−1 + si × ∆i,j−1 < ti,j and p′i,j > 0.]

Figure 6.5: Illustration of preemption conditions.

III.2.d. Preemption penalty constraints

The binary variable ∆i,j is equal to 1 if Job-piece Ji,j is subjected to a crpd in Slice j, and to 0 otherwise. Ji,j is subjected to a crpd in Slice j if p′i,j > 0 and (except for the first slice of Job Ji) Ji,j−1 and Ji,j are not contiguous:

p′i,j > 0    1 ≤ i ≤ n, j ∈ S1i
p′i,j > 0 and ti,j > ti,j−1 + p′i,j−1 + si × ∆i,j−1    1 ≤ i ≤ n, j ∈ S̄i

The first line corresponds to the first slice in which Ji is executed, whereas the second line copes with all other slices in which Ji can be executed (i.e. S̄i). In all other scenarios, the solver always chooses to set ∆i,j = 0 to minimize the objective function, and thus no crpd is paid by the corresponding job-pieces. The following constraint ensures that, for the first slice of every job, a crpd is paid if, and only if, the job is executed during this slice:

p′i,j ≤ ∆i,j ×M 1 ≤ i ≤ n, j ∈ S1i (6.9)

The second set of constraints deals with all slices except the first one of every job. Two conditions must be checked:

p′i,j > 0

and

ti,j > ti,j−1 + p′i,j−1 + si × ∆i,j−1

These conditions are illustrated in Figure 6.5. We define both conditions separately by introducing binary variables ai,j and bi,j. These two conditions are linearized as follows:

p′i,j ≤ ai,j ×M 1 ≤ i ≤ n, j ∈ Si (6.10)

ti,j − (ti,j−1 + p′i,j−1 + ∆i,j−1 × si) ≤ bi,j ×M 1 ≤ i ≤ n, j ∈ Si (6.11)

According to these new binary variables, ∆i,j is defined by the logical result of ai,j ∧ bi,j, which can be linearized by computing ∆i,j = min(ai,j, bi,j):

∆i,j ≤ ai,j 1 ≤ i ≤ n, j ∈ Si (6.12)

∆i,j ≤ bi,j 1 ≤ i ≤ n, j ∈ Si (6.13)

∆i,j ≥ ai,j + bi,j − 1 1 ≤ i ≤ n, j ∈ Si (6.14)

∆i,j ≥ 0 1 ≤ i ≤ n, j ∈ Si (6.15)
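The conjunction linearization can be verified exhaustively for binary inputs; the sketch below (illustrative code) enumerates the ∆ values feasible under (6.12)-(6.14):

```python
def feasible_deltas(a, b):
    """Binary Delta values satisfying (6.12)-(6.14): Delta <= a, Delta <= b
    and Delta >= a + b - 1 (constraint (6.15) ensuring Delta >= 0)."""
    return [d for d in (0, 1) if d <= a and d <= b and d >= a + b - 1]

# For every binary pair, the only feasible value is the conjunction a AND b,
# i.e. Delta_ij = min(a_ij, b_ij), so the minimization cannot cheat.
table = {(a, b): feasible_deltas(a, b) for a in (0, 1) for b in (0, 1)}
```

For binary a and b, the four inequalities leave a single feasible ∆, equal to a ∧ b.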


[Gantt chart over [0, 12): τ1 executes J1,1, J2,2, J3,3 and J4,4; τ2 executes J5,1, J5,2, then s5 followed by J5,3, and J5,4.]

Figure 6.6: Schedule constructed by the nearly-optimal offline approach.

III.3. Application Example

Consider again Taskset Texample1, defined in Table 6.1. Texample1 has 2 tasks τ1 and τ2 which issue 5 jobs over the hyperperiod H = 12: J1, J2, J3 and J4 for τ1 and J5 for τ2. Job release dates and deadlines define four temporal slices: [0, 3), [3, 6), [6, 9) and [9, 12). J1 must be executed in the first slice, J2 in the second one, J3 in the third one and J4 in the last one, whereas J5 can be executed in any of them. The complete milp is presented in Table 6.5. It has 34 variables:

- 24 output variables:
  - 8 non-negative real variables corresponding to starting dates: t1,1, t2,2, t3,3, t4,4, t5,1, t5,2, t5,3, t5,4,
  - 8 non-negative real variables corresponding to execution times: p′1,1, p′2,2, p′3,3, p′4,4, p′5,1, p′5,2, p′5,3, p′5,4,
  - 8 binary variables corresponding to preemptions: ∆1,1, ∆2,2, ∆3,3, ∆4,4, ∆5,1, ∆5,2, ∆5,3, ∆5,4,

- 10 internal variables:
  - 6 binary variables corresponding to preemption conditions: a5,2, b5,2, a5,3, b5,3, a5,4, b5,4,
  - 4 binary variables corresponding to disjunctive constraints: y1,5,1, y2,5,2, y3,5,3, y4,5,4.

and 64 constraints:

- 5 processing time constraints,

- 24 slice constraints,

- 12 job-piece disjunctive constraints,

- 23 preemption penalty constraints.

For this example, we set ε = 0.01 and M = 100. We use the CPLEX 12.6.1 solver from IBM to solve the milp. The minimum of the objective function is found to be equal to 3. This value corresponds to the actual total crpd of the schedule plus the crpds which are assumed to be paid by each job when starting its execution. The corresponding output variables computed by the solver are presented in Table 6.4. Remember that the milp solves a modified version of the crpd-aware scheduling problem in which a crpd is paid the first time a job


starts. Column p′i,j in Table 6.4 corresponds to the processing times as defined for the transformed problem, which are computed by the solver. Column pi,j corresponds to the actual job execution times, which are depicted in the Gantt chart (Figure 6.6). pi,j is computed as follows: if Ji starts its execution in Slice j then pi,j = p′i,j + si, but if Ji has already started its execution prior to j then pi,j = p′i,j. Hence we have, for example, p5,1 = p′5,1 + s5 = 1.5 + 0.5 = 2 as J5 starts its execution in Slice 1, and for the next job-pieces: p5,2 = p′5,2 = 1.5, p5,3 = p′5,3 = 1.5 and p5,4 = p′5,4 = 2 as J5 has already started its execution prior to Slice 2. Similarly, a crpd is paid before a job Ji starts its execution in a slice j if ∆i,j = 1 and Ji has started its execution prior to j. For example, J5 does not pay a crpd in Slice 1 as it starts its execution in this slice. But J5 pays a crpd in Slice 3 as ∆5,3 = 1 and J5 has started its execution prior to Slice 3. Note that the schedule is not work-conserving: at time 5.5, the processor is left idle even though there is a ready job (J5).

Job | Slice | Job-piece | ti,j | p′i,j | ∆i,j | pi,j
J1 | 1 | J1,1 | 0 | 0.75 | 1 | 1
J2 | 2 | J2,2 | 4.5 | 0.75 | 1 | 1
J3 | 3 | J3,3 | 6 | 0.75 | 1 | 1
J4 | 4 | J4,4 | 11 | 0.75 | 1 | 1
J5 | 1 | J5,1 | 1 | 1.5 | 1 | 2
J5 | 2 | J5,2 | 3 | 1.5 | 0 | 1.5
J5 | 3 | J5,3 | 7 | 1.5 | 1 | 1.5
J5 | 4 | J5,4 | 9 | 2 | 0 | 2

Table 6.4: Output variables computed by the solver for Taskset Texample1 using the nearly-optimal approach.
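The solver output can be re-checked against the milp constraints; the following sketch (illustrative code with values hand-copied from Table 6.4, not produced by CPLEX) verifies constraints (6.2) to (6.8) on this solution:

```python
# Solver output of Table 6.4 for T_example1: (job, slice, t_ij, p'_ij, Delta_ij)
sol = [(1, 1, 0.0, 0.75, 1), (2, 2, 4.5, 0.75, 1), (3, 3, 6.0, 0.75, 1),
       (4, 4, 11.0, 0.75, 1), (5, 1, 1.0, 1.5, 1), (5, 2, 3.0, 1.5, 0),
       (5, 3, 7.0, 1.5, 1), (5, 4, 9.0, 2.0, 0)]
s = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25, 5: 0.5}       # crpd bounds s_i
bounds = {1: (0, 3), 2: (3, 6), 3: (6, 9), 4: (9, 12)}  # slice [b_j, e_j)
EPS = 1e-9

def check(sol):
    # (6.2): transformed execution times of J5 sum to p'_5 = 7 - 0.5 = 6.5
    assert abs(sum(p for i, _, _, p, _ in sol if i == 5) - 6.5) < EPS
    for i, j, t, p, d in sol:            # (6.3)-(6.5): piece fits in its slice
        b, e = bounds[j]
        assert b <= t < e and t + p + s[i] * d <= e + EPS
    for j in bounds:                     # (6.6): slice capacity
        load = sum(p + s[i] * d for i, jj, _, p, d in sol if jj == j)
        assert load <= bounds[j][1] - bounds[j][0] + EPS
    for j in bounds:                     # (6.7)/(6.8): no overlap in a slice
        ivs = sorted((t, t + p + s[i] * d) for i, jj, t, p, d in sol if jj == j)
        assert all(u[1] <= v[0] + EPS for u, v in zip(ivs, ivs[1:]))
    return True
```

All checks pass, confirming that the tabulated values form a valid solution of the transformed problem.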

III.4. Limitations of the nearly-optimal approach

Here we discuss the validity of Assumptions 5 and 6 and their consequences on the optimality of the proposed offline approach.

Assumption 5 states that crpds should not be greater than wcets. This assumption is obviously true in real life, as, after a preemption, a task only reloads memory blocks that have been evicted from the cache and are still needed by the task for its execution. The number of such blocks cannot be greater than the total number of memory blocks used by the task during its execution. As the load times of all the blocks used by the task are accounted for in the task wcet, the crpd cannot be greater than the wcet. But the crpd-aware scheduling problem deals with upper bounds on crpds, and such bounds can be highly pessimistic. For example, in the worst case, it may be assumed that the whole cache is reloaded after each preemption. So for a large cache and a task τi with a small wcet Ci, the upper bound si on the crpd for that task might be greater than Ci. Moreover, several experiments proposed in the literature use crpd bounds which are randomly generated independently from wcets (see for example [ADM12, LAMD13, LAD14] and Chapter 7, Section IV).

Assumption 6 states that a crpd cannot be spread over two consecutive slices. As a consequence, the


Objective function:

min 0.25×∆1,1 + 0.25×∆2,2 + 0.25×∆3,3 + 0.25×∆4,4 + 0.5×∆5,1 + 0.5×∆5,2 + 0.5×∆5,3 + 0.5×∆5,4    (Eq. 6.1)

subjected to:

Processing time constraints:
p′1,1 = 0.75 and p′2,2 = 0.75 and p′3,3 = 0.75 and p′4,4 = 0.75    (Eq. 6.2)
p′5,1 + p′5,2 + p′5,3 + p′5,4 = 6.5    (Eq. 6.2)

Slice constraints:
t1,1 + p′1,1 + 0.25×∆1,1 ≤ 3 and t1,1 ≥ 0 and t1,1 ≤ 3 − 0.01    (Eq. 6.3, 6.4, 6.5)
t2,2 + p′2,2 + 0.25×∆2,2 ≤ 6 and t2,2 ≥ 3 and t2,2 ≤ 6 − 0.01    (Eq. 6.3, 6.4, 6.5)
t3,3 + p′3,3 + 0.25×∆3,3 ≤ 9 and t3,3 ≥ 6 and t3,3 ≤ 9 − 0.01    (Eq. 6.3, 6.4, 6.5)
t4,4 + p′4,4 + 0.25×∆4,4 ≤ 12 and t4,4 ≥ 9 and t4,4 ≤ 12 − 0.01    (Eq. 6.3, 6.4, 6.5)
t5,1 + p′5,1 + 0.5×∆5,1 ≤ 3 and t5,1 ≥ 0 and t5,1 ≤ 3 − 0.01    (Eq. 6.3, 6.4, 6.5)
t5,2 + p′5,2 + 0.5×∆5,2 ≤ 6 and t5,2 ≥ 3 and t5,2 ≤ 6 − 0.01    (Eq. 6.3, 6.4, 6.5)
t5,3 + p′5,3 + 0.5×∆5,3 ≤ 9 and t5,3 ≥ 6 and t5,3 ≤ 9 − 0.01    (Eq. 6.3, 6.4, 6.5)
t5,4 + p′5,4 + 0.5×∆5,4 ≤ 12 and t5,4 ≥ 9 and t5,4 ≤ 12 − 0.01    (Eq. 6.3, 6.4, 6.5)
p′1,1 + 0.25×∆1,1 + p′5,1 + 0.5×∆5,1 ≤ 3 − 0    (Eq. 6.6)
p′2,2 + 0.25×∆2,2 + p′5,2 + 0.5×∆5,2 ≤ 6 − 3    (Eq. 6.6)
p′3,3 + 0.25×∆3,3 + p′5,3 + 0.5×∆5,3 ≤ 9 − 6    (Eq. 6.6)
p′4,4 + 0.25×∆4,4 + p′5,4 + 0.5×∆5,4 ≤ 12 − 9    (Eq. 6.6)

Job-piece disjunctive constraints:
t1,1 + p′1,1 + 0.25×∆1,1 ≤ t5,1 + (1 − y1,5,1)×100 and t5,1 + p′5,1 + 0.5×∆5,1 ≤ t1,1 + y1,5,1×100    (Eq. 6.7, 6.8)
t2,2 + p′2,2 + 0.25×∆2,2 ≤ t5,2 + (1 − y2,5,2)×100 and t5,2 + p′5,2 + 0.5×∆5,2 ≤ t2,2 + y2,5,2×100    (Eq. 6.7, 6.8)
t3,3 + p′3,3 + 0.25×∆3,3 ≤ t5,3 + (1 − y3,5,3)×100 and t5,3 + p′5,3 + 0.5×∆5,3 ≤ t3,3 + y3,5,3×100    (Eq. 6.7, 6.8)
t4,4 + p′4,4 + 0.25×∆4,4 ≤ t5,4 + (1 − y4,5,4)×100 and t5,4 + p′5,4 + 0.5×∆5,4 ≤ t4,4 + y4,5,4×100    (Eq. 6.7, 6.8)

Preemption penalty constraints:
p′1,1 ≤ ∆1,1×100 and p′2,2 ≤ ∆2,2×100 and p′3,3 ≤ ∆3,3×100 and p′4,4 ≤ ∆4,4×100 and p′5,1 ≤ ∆5,1×100    (Eq. 6.9)
p′5,2 ≤ a5,2×100 and t5,2 − (t5,1 + p′5,1 + 0.5×∆5,1) ≤ b5,2×100    (Eq. 6.10, 6.11)
p′5,3 ≤ a5,3×100 and t5,3 − (t5,2 + p′5,2 + 0.5×∆5,2) ≤ b5,3×100    (Eq. 6.10, 6.11)
p′5,4 ≤ a5,4×100 and t5,4 − (t5,3 + p′5,3 + 0.5×∆5,3) ≤ b5,4×100    (Eq. 6.10, 6.11)
∆5,2 ≤ a5,2 and ∆5,2 ≤ b5,2 and ∆5,2 ≥ a5,2 + b5,2 − 1    (Eq. 6.12, 6.13, 6.14)
∆5,3 ≤ a5,3 and ∆5,3 ≤ b5,3 and ∆5,3 ≥ a5,3 + b5,3 − 1    (Eq. 6.12, 6.13, 6.14)
∆5,4 ≤ a5,4 and ∆5,4 ≤ b5,4 and ∆5,4 ≥ a5,4 + b5,4 − 1    (Eq. 6.12, 6.13, 6.14)
∆5,2 ≥ 0 and ∆5,3 ≥ 0 and ∆5,4 ≥ 0    (Eq. 6.15)

Table 6.5: Complete milp for Taskset Texample1 corresponding to the nearly-optimal offline approach.


Job | ri | pi | di | si
J1 | 1 | 1 | 2 | 0.25
J2 | 2 | 0.75 | 3 | 0.25
J3 | 0 | 1.75 | 4 | 0.5

Table 6.6: Jobset Jexample2.

[Gantt chart over [0, 4)]

Figure 6.7: Feasible schedule for Jobset Jexample2: J1(1, 1, 2, 0.25), J2(2, 0.75, 3, 0.25) and J3(0, 1.75, 4, 0.5).

number of valid schedules with regard to the milp constraints is diminished. In most cases, a valid schedule satisfying Assumption 6 can be constructed. However, for a nearly fully-loaded processor and potentially large crpds, there may be systems for which the only valid schedules do not respect Assumption 6.

Example 6.3: Consider Jobset Jexample2, presented in Table 6.6. Jexample2 is schedulable as there exists at least one feasible schedule, depicted in Figure 6.7. However, Jexample2 does not respect Assumption 6: as shown in Figure 6.7, the crpd paid by J3 when it resumes its execution in the second slice has to be spread over the third slice in order to construct a valid schedule. Our nearly-optimal offline approach cannot find a feasible schedule in this case, and so the system is deemed unschedulable by our approach.

As a consequence, the offline approach presented here is not optimal for the crpd-aware scheduling problem as, in some cases, it might fail to construct a schedule even though the system is feasible.

IV. An optimal offline approach

IV.1. Overcoming the limitations of the nearly-optimal approach

As discussed in the previous section, Assumptions 5 and 6 represent possible limitations of our offline approach. In particular, those assumptions are obstacles to solving the crpd-aware scheduling problem optimally. So we propose hereafter a new milp to overcome those problems and achieve optimality.


Notation | Type | Description

Input data:
S | set | set of slice indexes
Si | set | slice indexes of Ji
S1i | set | index of the first slice of Ji
S̄i | set | slices of Ji except the first one
bj | real | starting time of Slice j
ej | real | ending time of Slice j
Jj | set | jobs in Slice j
pi | real | processing time of Job Ji
si | real | preemption delay of Job Ji

Output variables:
ti,j | real | starting time of Job-piece Ji,j
pi,j | real | processing time of Job-piece Ji,j accounting for crpd
∆i,j | binary | Ji resumes in Slice j after a preemption

Internal variables:
ai,j | binary | non-null job-piece execution condition
a′i,j | binary | already-executed job condition
bi,j | binary | non-contiguous job-piece condition
yi,k,j | binary | job-piece disjunctive constraints

Table 6.7: Data and variables for the milp of the optimal offline approach.


To get rid of Assumption 5, we do not consider the transformed scheduling problem to devise our new offline solution: no virtual crpd is paid anymore at a job's start. As a consequence, some of the previous milp constraints, in particular the preemption penalty constraints, have to be modified. To avoid Assumption 6, i.e. to allow crpds to be spread over consecutive slices, without adding additional variables or constraints, we include possible crpds in the job-piece execution times. This means that, for each job-piece Ji,j, we compute pi,j, which corresponds to the part of Job Ji's wcet executed in Slice j plus the crpd parameter si if Ji resumes in j after a preemption. As a consequence, we no longer consider the actual position of the crpd, and so a crpd can be implicitly spread over two slices. The main variables for our new milp become:

- ti,j ∈ R, corresponding to the starting date of Job-piece Ji,j,

- pi,j ∈ R, standing for the execution time of Job-piece Ji,j plus the crpd si if Job Ji resumes in j after a preemption,

- ∆i,j ∈ {0, 1}, indicating whether Job Ji resumes in Slice j after a preemption and so has to pay a crpd si.

All notations are summarized in Table 6.7. With this new model, no crpd si is explicitly paid anymore when a job resumes its execution: the actual location where the crpd is paid is no longer considered. However, under the crpd-aware scheduling model, a crpd is paid after each preemption, i.e. the crpd must not be anticipated. More formally, this means that at time ti,j, when Job Ji starts its execution in Slice j, the remaining execution time of Ji must be greater than the sum of all crpds paid by Ji from Slice j on; otherwise, at least one crpd has been (at least partially) anticipated:

∑_{k∈Si, k≥j} pi,k > ∑_{k∈Si, k≥j} si × ∆i,k    1 ≤ i ≤ n, j ∈ Si
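The no-anticipation condition can be checked per job; a minimal sketch (illustrative code; job-pieces are encoded as (pi,k, ∆i,k) pairs in slice order, and the condition is only evaluated in slices where the job actually executes):

```python
def no_crpd_anticipated(pieces, s_i):
    """From every slice where the job runs, the remaining execution time
    (crpds included in the pieces) must strictly exceed the crpds still
    to be paid from that slice on."""
    for j, (p_j, _) in enumerate(pieces):
        if p_j > 0:
            remaining_p = sum(p for p, _ in pieces[j:])
            remaining_crpd = sum(s_i * d for _, d in pieces[j:])
            if not remaining_p > remaining_crpd:
                return False
    return True

# A job (s_i = 0.5) resuming twice, with each crpd fully inside a later piece
ok = no_crpd_anticipated([(2.0, 0), (2.5, 1), (3.0, 1)], 0.5)
# Last piece smaller than the crpd it should contain: the crpd was anticipated
bad = no_crpd_anticipated([(6.5, 0), (0.4, 1)], 0.5)
```

The second call fails exactly because the last piece cannot contain the full crpd paid at that resumption.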

To prove that no crpd is anticipated, we first prove the following lemma:

Lemma 2. For a job Ji, let l be the index of the last slice in which Ji resumes its execution after a preemption. If the system is feasible, then Ji executes for at least si units of time after resuming its execution in any schedule minimizing the overall crpd.

Proof. This lemma is proved by contradiction.

We consider a feasible system and suppose that we have a feasible schedule S, minimizing the overall crpd, in which the last crpd has been anticipated. We note l the index of the last slice in which Job Ji resumes its execution after a preemption:

Δi,l = 1 and ∑_{k ∈ Si, k > l} Δi,k = 0

That a crpd has been anticipated prior to Slice l means:

∑_{k ∈ Si, k ≥ l} pi,k ≤ si

As S is a feasible schedule, we necessarily have:

∑_{k ∈ Si} pi,k = pi + ∑_{k ∈ Si} si × Δi,k


CHAPTER 6. OFFLINE SCHEDULING WITH CACHE-RELATED PREEMPTION DELAYS

Figure 6.8: Schedule for the proof of Property 5.

which can be rewritten as:

∑_{k ∈ Si, k < l} pi,k + ∑_{k ∈ Si, k ≥ l} pi,k = pi + ∑_{k ∈ Si, k < l} si × Δi,k + ∑_{k ∈ Si, k ≥ l} si × Δi,k

∑_{k ∈ Si, k < l} pi,k = pi + ∑_{k ∈ Si, k < l} si × Δi,k + si × Δi,l − ∑_{k ∈ Si, k ≥ l} pi,k

∑_{k ∈ Si, k < l} pi,k ≥ pi + ∑_{k ∈ Si, k < l} si × Δi,k + si − si

∑_{k ∈ Si, k < l} pi,k ≥ pi + ∑_{k ∈ Si, k < l} si × Δi,k

(The second line uses ∑_{k ∈ Si, k ≥ l} si × Δi,k = si × Δi,l, since Δi,k = 0 for k > l; the third uses Δi,l = 1 together with the anticipation assumption ∑_{k ∈ Si, k ≥ l} pi,k ≤ si.)

But the fact that Ji resumes in Slice l after a preemption necessarily means that Ji could not complete its normal execution time plus additional delays prior to l:

∑_{k ∈ Si, k < l} pi,k < pi + ∑_{k ∈ Si, k < l} si × Δi,k

This is a contradiction. So the last crpd has not been anticipated in previous slices.

We can now prove that:

Property 5. crpds are not anticipated in any schedule minimizing the overall crpd.

Proof. We prove Property 5 by induction, applying Lemma 2 iteratively.

We apply the lemma to every slice jk in which Ji resumes its execution after a preemption, starting from the last one; {jk} is a subset of Si. For each iteration, we consider a schedule Sk, corresponding to the interval [0, bjk+1) of Schedule S (see Figure 6.8), and Jki, corresponding to the part of Ji scheduled in Sk. Jki's execution time, denoted pki, is equal to:

pki = pi − ( ∑_{k′ ∈ Si, k′ ≥ jk+1} pi,k′ − ∑_{k′ ∈ Si, k′ ≥ jk+1} si × Δi,k′ )


If no crpd has been anticipated from Slice jk+1 on, then:

∑_{k′ ∈ Si, k′ ≥ jk+1} pi,k′ − ∑_{k′ ∈ Si, k′ ≥ jk+1} si × Δi,k′ > 0

and so:

pki < pi

1. We start with jk = l, l being the index of the last slice in which Ji resumes its execution after a preemption. Sl = S, so Jli = Ji and pli = pi. Lemma 2 can be applied directly to prove that the last crpd is not anticipated.

2. We now suppose that none of the crpds that must be paid from Slice jk+1 on have been anticipated. We consider Schedule Sk and Job Jki with execution time pki. Sk is a feasible schedule and jk is the last slice in which Jki resumes after a preemption. So we can apply Lemma 2 with Jki instead of Ji and pki instead of pi.

By induction, we can conclude that no crpd has been anticipated.

IV.2. Mathematical Program

The objective of the milp is still to compute a preemptive schedule that minimizes the overall crpd in order to reduce the worst-case workload. Using the notations introduced in the previous subsection, the objective function can be written as:

min ∑_{i=1}^{n} ∑_{j ∈ Si} si × Δi,j    (6.16)

Equations 6.17 to 6.32 define the constraints for the Mixed-Integer Linear Program. Those constraints are divided into different categories presented hereafter. The whole milp is synthesized in Table 6.8.

IV.2.a. Processing time constraints

The first set of constraints ensures that the sum of the execution times accounting for crpds (pi,j) of all job-pieces of a job corresponds exactly to the execution time of that job plus the total crpd due to all the preemptions the job experiences during its execution:

∑_{j ∈ Si} pi,j = pi + ∑_{j ∈ Si} si × Δi,j    1 ≤ i ≤ n    (6.17)


Objective function:

min ∑_{i=1}^{n} ∑_{j ∈ Si} si × Δi,j    (6.16)

subject to:

Processing time constraints:

∑_{j ∈ Si} pi,j = pi + ∑_{j ∈ Si} si × Δi,j    1 ≤ i ≤ n    (6.17)

Slice constraints:

ti,j + pi,j ≤ ej    1 ≤ i ≤ n, j ∈ Si    (6.18)
ti,j ≥ bj    1 ≤ i ≤ n, j ∈ Si    (6.19)
ti,j ≤ ej − ε    1 ≤ i ≤ n, j ∈ Si    (6.20)

Job-piece disjunctive constraints:

∑_{i ∈ Jj} pi,j ≤ ej − bj    j ∈ S    (6.21)
ti,j + pi,j ≤ tk,j + (1 − yi,k,j) × M    j ∈ Si ∩ Sk    (6.22)
tk,j + pk,j ≤ ti,j + yi,k,j × M    j ∈ Si ∩ Sk    (6.23)

Preemption penalty constraints:

Δi,j = 0    1 ≤ i ≤ n, j ∈ S1i    (6.24)
pi,j ≤ ai,j × M    1 ≤ i ≤ n, j ∈ Si    (6.25)
∑_{k ∈ Si, k < j} pi,k ≤ a′i,j × M    1 ≤ i ≤ n, j ∈ Si    (6.26)
ti,j − (ti,j−1 + pi,j−1) ≤ bi,j × M    1 ≤ i ≤ n, j ∈ Si    (6.27)
Δi,j ≤ ai,j    1 ≤ i ≤ n, j ∈ Si    (6.28)
Δi,j ≤ a′i,j    1 ≤ i ≤ n, j ∈ Si    (6.29)
Δi,j ≤ bi,j    1 ≤ i ≤ n, j ∈ Si    (6.30)
Δi,j ≥ ai,j + a′i,j + bi,j − 2    1 ≤ i ≤ n, j ∈ Si    (6.31)
Δi,j ≥ 0    1 ≤ i ≤ n, j ∈ Si    (6.32)

Table 6.8: Complete milp for the optimal offline solution.


IV.2.b. Slice constraints

All job-pieces Ji,j are executed inside Slice j, i.e. they start and complete their executions within the interval [bj, ej). An arbitrarily small value ε is used to forbid a job-piece from starting at time ej.

ti,j + pi,j ≤ ej 1 ≤ i ≤ n, j ∈ Si (6.18)

ti,j ≥ bj 1 ≤ i ≤ n, j ∈ Si (6.19)

ti,j ≤ ej − ε 1 ≤ i ≤ n, j ∈ Si (6.20)

Job-pieces executed inside a given slice j must not exceed the interval size:

∑_{i ∈ Jj} pi,j ≤ ej − bj    j ∈ S    (6.21)

IV.2.c. Job-piece disjunctive constraints

Inside every slice, two job-pieces cannot be executed simultaneously. In Slice j, for every pair of job-pieces Ji,j and Jk,j, we have either:

ti,j + pi,j ≤ tk,j

or

tk,j + pk,j ≤ ti,j

The previous disjunctive constraints can be linearized using a big value M and a binary variable yi,k,j, i < k, set to 1 by the solver if Ji,j is executed before Jk,j in Slice j, and to 0 otherwise:

ti,j + pi,j ≤ tk,j + (1− yi,k,j)×M j ∈ Si ∩ Sk (6.22)

tk,j + pk,j ≤ ti,j + yi,k,j ×M j ∈ Si ∩ Sk (6.23)
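This big-M linearization can be sanity-checked exhaustively in a few lines of Python. The sketch below is our own illustration, not part of the thesis toolchain; the helper names are ours, and M = 100 matches the value used in the application example:

```python
import itertools

M = 100.0  # big-M constant, as in the application example

def linearized_ok(ti, pi, tk, pk, y):
    # Big-M linearization of the disjunction, constraints (6.22)/(6.23):
    # y = 1 relaxes (6.23) and forces Ji,j before Jk,j; y = 0 the opposite.
    return ti + pi <= tk + (1 - y) * M and tk + pk <= ti + y * M

def overlap(ti, pi, tk, pk):
    # True if the two job-piece intervals [t, t+p) intersect.
    return ti < tk + pk and tk < ti + pi

# Exhaustive check on a small grid: whenever some y in {0, 1} satisfies
# the linearized pair, the two job-pieces do not overlap.
pi, pk = 2, 3
for ti, tk in itertools.product(range(10), repeat=2):
    if any(linearized_ok(ti, pi, tk, pk, y) for y in (0, 1)):
        assert not overlap(ti, pi, tk, pk)
```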

IV.2.d. Preemption penalty constraints

The binary variable Δi,j is set to 1 if Job-piece Ji,j is subjected to a crpd in Slice j, and to 0 otherwise. For the first slice of every job, there cannot be any preemption:

∆i,j = 0 1 ≤ i ≤ n, j ∈ S1i (6.24)

In every other slice j, a crpd is paid by Job Ji (i.e. ∆i,j = 1) if and only if:

1. Ji is executed in Slice j: pi,j > 0,

2. Ji has already started its execution prior to j: ∑_{k ∈ Si, k < j} pi,k > 0,

3. Ji's executions in Slices j − 1 and j are not contiguous: ti,j > ti,j−1 + pi,j−1.


Figure 6.9: Schedule constructed by the optimal offline approach.

In all other scenarios, the solver will always choose to set Δi,j = 0 to minimize the objective function, and thus no crpd is paid by the corresponding job-pieces. The previous three conditions can be linearized by introducing binary variables ai,j, a′i,j and bi,j:

pi,j ≤ ai,j × M    1 ≤ i ≤ n, j ∈ Si    (6.25)
∑_{k ∈ Si, k < j} pi,k ≤ a′i,j × M    1 ≤ i ≤ n, j ∈ Si    (6.26)
ti,j − (ti,j−1 + pi,j−1) ≤ bi,j × M    1 ≤ i ≤ n, j ∈ Si    (6.27)

According to these three new binary variables, Δi,j is defined by the logical result of ai,j ∧ a′i,j ∧ bi,j, which can be linearized by computing Δi,j = min(ai,j, a′i,j, bi,j):

Δi,j ≤ ai,j    1 ≤ i ≤ n, j ∈ Si    (6.28)
Δi,j ≤ a′i,j    1 ≤ i ≤ n, j ∈ Si    (6.29)
Δi,j ≤ bi,j    1 ≤ i ≤ n, j ∈ Si    (6.30)
Δi,j ≥ ai,j + a′i,j + bi,j − 2    1 ≤ i ≤ n, j ∈ Si    (6.31)
Δi,j ≥ 0    1 ≤ i ≤ n, j ∈ Si    (6.32)
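That these five constraints indeed compute the logical AND under minimization can be verified by enumerating all eight cases. The short Python sketch below is our own illustration (the function name is ours):

```python
import itertools

def delta_from(a, ap, b):
    # Constraints (6.28)-(6.32) bound Delta; because the objective is
    # minimized, the solver picks the smallest feasible value, which
    # (6.31) pushes to 1 exactly when a = a' = b = 1.
    feasible = [d for d in (0, 1)
                if d <= a and d <= ap and d <= b and d >= a + ap + b - 2]
    return min(feasible)

# Exhaustive check: Delta equals the conjunction a AND a' AND b.
for a, ap, b in itertools.product((0, 1), repeat=3):
    assert delta_from(a, ap, b) == (a and ap and b)
```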

IV.3. Application Example

We now detail an application of our new offline approach. We consider again the example of Taskset Texample1 defined in Table 6.1. All job and slice considerations are identical to those for the nearly-optimal offline approach (see Section III.3). The complete milp for this example corresponding to the new approach is presented in Table 6.10. It has 37 variables:

– 24 output variables:

  – 8 non-negative real variables corresponding to starting dates: t1,1, t2,2, t3,3, t4,4, t5,1, t5,2, t5,3, t5,4,

  – 8 non-negative real variables corresponding to execution times: p1,1, p2,2, p3,3, p4,4, p5,1, p5,2, p5,3, p5,4,

  – 8 binary variables corresponding to preemptions: Δ1,1, Δ2,2, Δ3,3, Δ4,4, Δ5,1, Δ5,2, Δ5,3, Δ5,4,


– 13 input variables:

  – 9 binary variables corresponding to preemption penalty constraints: a5,2, a′5,2, b5,2, a5,3, a′5,3, b5,3, a5,4, a′5,4, b5,4,

  – 4 binary variables corresponding to disjunctive constraints: y1,5,1, y2,5,2, y3,5,3, y4,5,4.

and 70 constraints:

– 5 processing time constraints,

– 24 slice constraints,

– 12 job-piece disjunctive constraints,

– 29 preemption penalty constraints.

For this example, we set ε = 0.01 and M = 100. We use the CPLEX 12.6.1 solver from IBM to solve the milp. The minimum of the objective function is found to be equal to 0.5. For the new approach, this value corresponds directly to the actual total crpd for the schedule. The corresponding output variables computed by the solver are presented in Table 6.9. Column pi,j in Table 6.9 corresponds to the job-piece processing times accounting for crpds. Column p̄i,j corresponds to the actual job execution times depicted in the Gantt chart (Figure 6.9), and Column s̄i,j to the actual crpd paid by Job Ji in Slice j, also depicted in the Gantt chart (Figure 6.9). p̄i,j and s̄i,j are computed as follows. If there has been a preemption (i.e. Δi,j = 1), then: if pi,j < si, we set s̄i,j = pi,j and p̄i,j = 0, and the rest of the crpd (si − pi,j) will have to be paid in the following slices; but if pi,j ≥ si, we set s̄i,j = si and p̄i,j = pi,j − si. If there has been no preemption (i.e. Δi,j = 0) and no crpd part from previous slices has to be accounted for, then p̄i,j = pi,j. Consider for example J5 in Slice 3: Δ5,3 = 1 and p5,3 = 2 > s5 = 0.5, so s̄5,3 = s5 = 0.5 and p̄5,3 = p5,3 − s5 = 1.5.

The complete schedule is depicted in Figure 6.9. Note that this schedule is different from the one produced by the nearly-optimal offline approach (depicted in Figure 6.6). Indeed, for Texample1, there are many feasible schedules with a total crpd of 0.5 over the hyperperiod which are solutions for the milp. The schedule produced by the nearly-optimal approach is also a valid solution for our new offline approach and could have been chosen by the solver as well. Finally, note also that, as for the nearly-optimal approach, schedules constructed by the optimal approach are not necessarily work-conserving.
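The reconstruction rule above can be written as a small Python routine. This is our own sketch (the function name and the overbar-style naming are ours; the values come from Table 6.9):

```python
def actual_times(p, delta, s):
    """Recover, slice by slice, the actual execution time and the crpd
    actually paid from the solver outputs p_{i,j} and Delta_{i,j}."""
    exec_times, crpd_paid, carry = [], [], 0.0
    for pj, dj in zip(p, delta):
        due = carry + (s if dj else 0.0)   # crpd owed in this slice
        paid = min(due, pj)                # what fits in the slice
        carry = due - paid                 # remainder spills to next slice
        crpd_paid.append(paid)
        exec_times.append(pj - paid)
    return exec_times, crpd_paid

# J5 of Texample1 (Table 6.9): the crpd fits entirely in slice 3.
ex, cr = actual_times([2, 2, 2, 1.5], [0, 0, 1, 0], 0.5)
assert ex == [2, 2, 1.5, 1.5] and cr == [0, 0, 0.5, 0]
```

The carry variable handles the case where pi,j < si, in which the remaining crpd part is paid at the start of the following slices.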

IV.4. Comparison with the nearly-optimal approach

The optimal offline approach overcomes the two problems experienced by the nearly-optimal one. First, it does not require crpds to be smaller than wcets. Indeed, contrary to the nearly-optimal approach, our new approach no longer considers the transformed scheduling problem (p′i = pi − si, 1 ≤ i ≤ n), so crpds larger than wcets have no impact on the milp construction. As a result, our new offline approach is able to deal with very pessimistic crpd bounds (for example when the whole cache is assumed to be reloaded after each preemption). It can also deal with synthetic tasks for which crpds are generated independently from the wcets (see Chapter 7, Section IV).


Job   Slice   Job-piece   ti,j   pi,j   Δi,j   p̄i,j   s̄i,j
J1    1       J1,1        0      1      0      1      0
J2    2       J2,2        5      1      0      1      0
J3    3       J3,3        6      1      0      1      0
J4    4       J4,4        10.5   1      0      1      0
J5    1       J5,1        1      2      0      2      0
J5    2       J5,2        3      2      0      2      0
J5    3       J5,3        7      2      1      1.5    0.5
J5    4       J5,4        9      1.5    0      1.5    0

Table 6.9: Output variables computed by the solver for Taskset Texample1 using the optimal offline approach.

Figure 6.10: Schedule constructed by the optimal offline approach for Jobset Jexample2: J1(1, 1, 2, 0.25), J2(2, 0.75, 3, 0.25) and J3(0, 1.75, 4, 0.5).

Our nearly-optimal approach also assumes that crpds are not spread over two consecutive slices. As a result, it cannot construct feasible schedules for some feasible systems and so fails to achieve optimality. As our new offline approach computes job-piece execution times accounting for possible crpds (pi,j), the actual position of the crpd is not computed by the solver. So crpds can be implicitly spread over consecutive slices.

Example 6.4: Consider again Jobset Jexample2. In any valid schedule, a crpd for J3 has to be spread over slices [2, 3) and [3, 4). As a result, the nearly-optimal approach cannot schedule that jobset. Using the optimal offline approach makes it possible to construct the feasible schedule depicted in Figure 6.10. The corresponding output variables computed with the CPLEX solver are presented in Table 6.11.
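The split can be checked arithmetically on the solver outputs of Table 6.11. The sketch below is our own illustration (variable names are ours):

```python
# J3 of Jobset Jexample2 (Table 6.11): p3 = 1.75, s3 = 0.5, preemption
# resumed in slice 3, but only 0.25 time units remain in slice [2, 3).
p_3j = {1: 1, 2: 0, 3: 0.25, 4: 1}   # solver outputs p_{3,j}
delta3 = {1: 0, 2: 0, 3: 1, 4: 0}
p3, s3 = 1.75, 0.5

# Processing time constraint (6.17) still holds...
assert sum(p_3j.values()) == p3 + s3 * sum(delta3.values())

# ...while the crpd is implicitly split: 0.25 paid in slice [2, 3) and
# the remaining 0.25 at the start of slice [3, 4).
paid_in_slice3 = min(p_3j[3], s3)
paid_in_slice4 = s3 - paid_in_slice3
assert paid_in_slice3 == 0.25 and paid_in_slice4 == 0.25
assert p_3j[4] - paid_in_slice4 == 0.75   # actual execution in slice [3, 4)
```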

V. Discussion on the optimal offline approach

We consider hereafter exclusively our optimal offline approach. We first deal with mathematical complexity and solving time considerations. Then, we discuss possible extensions of our offline scheduling


Objective function:

min 0.5×∆5,2 + 0.5×∆5,3 + 0.5×∆5,4 Eq.6.16

subject to:

Processing time constraints:

p1,1 = 1 and p2,2 = 1 and p3,3 = 1 and p4,4 = 1 Eq.6.17

p5,1 + p5,2 + p5,3 + p5,4 = 7 + 0.5×∆5,2 + 0.5×∆5,3 + 0.5×∆5,4 Eq.6.17

Slice constraints:

t1,1 + p1,1 ≤ 3 and t1,1 ≥ 0 and t1,1 ≤ 3− 0.01 Eq.6.18, 6.19, 6.20

t2,2 + p2,2 ≤ 6 and t2,2 ≥ 3 and t2,2 ≤ 6− 0.01 Eq.6.18, 6.19, 6.20

t3,3 + p3,3 ≤ 9 and t3,3 ≥ 6 and t3,3 ≤ 9− 0.01 Eq.6.18, 6.19, 6.20

t4,4 + p4,4 ≤ 12 and t4,4 ≥ 9 and t4,4 ≤ 12− 0.01 Eq.6.18, 6.19, 6.20

t5,1 + p5,1 ≤ 3 and t5,1 ≥ 0 and t5,1 ≤ 3− 0.01 Eq.6.18, 6.19, 6.20

t5,2 + p5,2 ≤ 6 and t5,2 ≥ 3 and t5,2 ≤ 6− 0.01 Eq.6.18, 6.19, 6.20

t5,3 + p5,3 ≤ 9 and t5,3 ≥ 6 and t5,3 ≤ 9− 0.01 Eq.6.18, 6.19, 6.20

t5,4 + p5,4 ≤ 12 and t5,4 ≥ 9 and t5,4 ≤ 12− 0.01 Eq.6.18, 6.19, 6.20

p1,1 + p5,1 ≤ 3− 0 Eq.6.21

p2,2 + p5,2 ≤ 6− 3 Eq.6.21

p3,3 + p5,3 ≤ 9− 6 Eq.6.21

p4,4 + p5,4 ≤ 12− 9 Eq.6.21

Job-piece disjunctive constraints:

t1,1 + p1,1 ≤ t5,1 + (1− y1,5,1)× 100 and t5,1 + p5,1 ≤ t1,1 + y1,5,1 × 100 Eq.6.22, 6.23

t2,2 + p2,2 ≤ t5,2 + (1− y2,5,2)× 100 and t5,2 + p5,2 ≤ t2,2 + y2,5,2 × 100 Eq.6.22, 6.23

t3,3 + p3,3 ≤ t5,3 + (1− y3,5,3)× 100 and t5,3 + p5,3 ≤ t3,3 + y3,5,3 × 100 Eq.6.22, 6.23

t4,4 + p4,4 ≤ t5,4 + (1− y4,5,4)× 100 and t5,4 + p5,4 ≤ t4,4 + y4,5,4 × 100 Eq.6.22, 6.23

Preemption penalty constraints:

∆1,1 = 0 and ∆2,2 = 0 and ∆3,3 = 0 and ∆4,4 = 0 and ∆5,1 = 0 Eq.6.24

p5,2 ≤ a5,2 × 100 and p5,1 ≤ a′5,2 × 100 and t5,2 − (t5,1 + p5,1) ≤ b5,2 × 100 Eq.6.25, 6.26, 6.27

p5,3 ≤ a5,3 × 100 and p5,1 + p5,2 ≤ a′5,3 × 100 and t5,3 − (t5,2 + p5,2) ≤ b5,3 × 100 Eq.6.25, 6.26, 6.27

p5,4 ≤ a5,4 × 100 and p5,1 + p5,2 + p5,3 ≤ a′5,4 × 100 and t5,4 − (t5,3 + p5,3) ≤ b5,4 × 100 Eq.6.25, 6.26, 6.27

Δ5,2 ≤ a5,2 and Δ5,2 ≤ a′5,2 and Δ5,2 ≤ b5,2 and Δ5,2 ≥ a5,2 + a′5,2 + b5,2 − 2    Eq. 6.28–6.31

Δ5,3 ≤ a5,3 and Δ5,3 ≤ a′5,3 and Δ5,3 ≤ b5,3 and Δ5,3 ≥ a5,3 + a′5,3 + b5,3 − 2    Eq. 6.28–6.31

Δ5,4 ≤ a5,4 and Δ5,4 ≤ a′5,4 and Δ5,4 ≤ b5,4 and Δ5,4 ≥ a5,4 + a′5,4 + b5,4 − 2    Eq. 6.28–6.31

Δ5,2 ≥ 0 and Δ5,3 ≥ 0 and Δ5,4 ≥ 0    Eq. 6.32

Table 6.10: Complete milp for Taskset Texample1 using the optimal offline approach.


Job   Slice   Job-piece   ti,j   pi,j   Δi,j   p̄i,j   s̄i,j
J1    2       J1,2        1      1      0      1      0
J2    3       J2,3        2      0.75   0      0.75   0
J3    1       J3,1        0      1      0      1      0
J3    2       J3,2        1      0      0      0      0
J3    3       J3,3        2.75   0.25   1      0      0.25
J3    4       J3,4        3      1      0      0.75   0.25

Table 6.11: Output variables computed by the solver for Jobset Jexample2 using the optimal offline approach.

approach with regard to more precise crpd parameter models.

V.1. Mathematical complexity and solving time issues

We look here at the solving time and the mathematical complexity of our optimal offline approach. We use 1000 synthetically generated tasksets. For the generation input parameters, we consider a total processor utilization U = 0.8, a total cache utilization CU = 4, a cache size in number of cache sets CS = 256, a reutilization factor RF = 0.3 and a Block Reload Time brt = 0.008. The meaning of those parameters as well as the methodology for randomly generating the different task parameters are detailed in Chapter 7, Section IV. In particular, to contain the milp solving time, the number of jobs is limited to 200 per taskset and a time limit of 10 seconds is set for the solver.

First, we deal with the number of slices considered for the milp formulation. As depicted in Figure 6.11, the number of slices for each taskset increases quite linearly with the number of jobs. However, the resulting slice values for tasksets with the same number of jobs can be very different, as the number of slices also depends on the task period distribution. For example, the number of slices for tasksets with 119 jobs ranges from 64 to 90 (i.e. 40% more), as depicted in Figure 6.11. So, we consider hereafter the impact of both the number of jobs and the number of slices per taskset.

Now, we compute the number of variables and constraints of the milp constructed by the optimal offline approach for every input taskset. As depicted in Figure 6.12, the number of variables for the milp increases with the number of jobs and with the number of slices per taskset. But as depicted in Figure 6.12a, there can be several variable values for the same number of jobs: for tasksets with 119 jobs, the number of milp variables can range from 1563 to 2163 (i.e. nearly 50% more). On the contrary, the increase is quite linear with the number of slices (Figure 6.12b). Similarly, the number of constraints for the milp increases with the number of jobs and with the number of slices per taskset, as depicted in Figure 6.13. Once more, there can be several constraint values for


Figure 6.11: Number of slices for the optimal offline approach as a function of the number of jobs per taskset.

Figure 6.12: Number of milp variables for the optimal offline approach: (a) as a function of the number of jobs per taskset; (b) as a function of the number of slices per taskset.

the same number of jobs (Figure 6.13a): for tasksets with 119 jobs, the number of milp constraints can range from 2934 to 4416 (i.e. about 50% more). As for the number of slices, the increase in the number of milp constraints is again quite linear, as depicted in Figure 6.13b.

The mathematical complexity of the optimal offline approach is correlated with both the number of jobs and the number of slices per taskset, but the number of slices has a more direct impact on the number of milp variables and constraints, as it is related to both the number of jobs and the task period distribution.


Figure 6.13: Number of milp constraints for the optimal offline approach: (a) as a function of the number of jobs per taskset; (b) as a function of the number of slices per taskset.

Figure 6.14: Total computation time for the offline approach: (a) as a function of the number of jobs per taskset; (b) as a function of the number of slices per taskset.

Then, we measure the total time needed by the optimal offline approach to compute a solution to the crpd-aware scheduling problem. This time covers both constructing the inputs for the milp and solving the mathematical program. As depicted in Figure 6.14, the total time tends to explode with both the number of jobs (Figure 6.14a) and the number of slices (Figure 6.14b): it rises from less than 10 s on average for tasksets with fewer than 50 jobs (respectively fewer than 20 slices) to more than 60 s for some tasksets with more than 180 jobs (resp. more than 120 slices). The total time needed by the optimal offline approach to compute a


Job   ri   pi   di   ecbi        ucbi
J1    1    1    2    {0, 1}      {}
J2    0    2    3    {1, 2, 3}   {2, 3}

Table 6.12: Jobset Jexample3.

Figure 6.15: Feasible schedule for Jobset Jexample3 with a crpd parameter model taking into account both the preempting and preempted tasks.

solution is not strictly correlated with either the number of jobs or the number of slices per taskset: the time to construct the inputs for the milp is directly related to the number of jobs and slices, but the milp solving time is highly variable from one instance to another.

V.2. Impact of the crpd parameter model

The efficiency of our optimal offline approach in comparison with other scheduling solutions is very dependent on the adopted crpd parameter model. The offline approach assumes a crpd bound depending only on the preempted task. Under this crpd model, our solution is optimal, as it constructs a feasible schedule whenever one exists. But under this model, the crpd bound can be very pessimistic. If the memory blocks of the preempting and preempted tasks do not map to the same cache locations, then the actual crpd is zero, as no memory block can be evicted from the cache by another task.

Example 6.5: Consider Jobset Jexample3 presented in Table 6.12. Note that ecbs and ucbs are represented by the indexes of the cache sets to which they are mapped. We only consider crpds and assume all other preemption delays to be either negligible or already accounted for in the wcets. First, we consider a crpd bound depending only on the preempted task: si = brt × |ucbi|. As the processor is fully loaded, no additional delay can be introduced without making the system unschedulable. As J2 has 2 ucbs, s2 is not null and, as a consequence, the milp fails to construct a feasible schedule for Jobset Jexample3. However, as J1's ecbs do not map to the same cache locations as J2's ucbs (ecb1 ∩ ucb2 = ∅), no crpd is actually incurred when J2 resumes its execution after being preempted by J1. So, considering a more precise crpd model makes it possible to find a feasible schedule for this system, depicted in Figure 6.15, as no crpd actually occurs.
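The two bounds of this example can be computed mechanically from the cache-set indexes of Table 6.12. The Python sketch below is our own illustration (variable names are ours; brt = 0.008 is the value used in the experiments):

```python
brt = 0.008   # block reload time (value used in the experiments)

# Cache sets touched by each job of Jexample3 (Table 6.12).
ecb = {1: {0, 1}, 2: {1, 2, 3}}   # evicting cache blocks
ucb = {1: set(), 2: {2, 3}}       # useful cache blocks

# Preempted-task-only bound: s_i = brt * |ucb_i|. It is positive, so the
# fully loaded processor cannot absorb it.
s2_pessimistic = brt * len(ucb[2])

# Refined bound for J1 preempting J2: only useful blocks that J1's ecbs
# can actually evict contribute.
s2_refined = brt * len(ecb[1] & ucb[2])

assert s2_pessimistic > 0 and s2_refined == 0
```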

To handle those cases, our offline approach has to be modified to consider a more accurate crpd


parameter bound si,j, depending both on the preempted job and the preempting ones. However, designing such an milp is a difficult matter. In particular, the crpd paid by the preempted job Ji does not only depend on the damage to the cache done by the preempting job Jj, but also on the damage due to all jobs Jk that execute while Ji is preempted. One solution is to use boolean variables to represent whether Job Jk has been executed between Job Ji's preemption in Slice j′ and the time Ji resumes its execution in Slice j. As a consequence, the milp size increases very fast. Such a complexity would only allow dealing with a small number of jobs and few time slices, and thus would not be suitable in practice.

VI. Conclusion

In this chapter, we focused on finding an optimal solution to the crpd-aware scheduling problem. As no online scheduler can be optimal for this problem (see Chapter 5, Section III), we considered offline scheduling. We used mathematical programming to devise two offline scheduling approaches. While the first one is only a nearly-optimal solution to the crpd-aware scheduling problem, the second one achieves optimality under the crpd parameter model considered throughout this work. Besides computing a feasible schedule whenever one exists, this offline approach can be used as an exact feasibility test to determine whether a system is feasible or not as soon as crpds are considered. This can be very useful, as no other exact feasibility test is known for the crpd-aware scheduling problem.


Chapter 7: Evaluation of the cache impact on schedulability

Contents

I     Introduction    195
II    General experimental plan    195
      II.1   Common experimental settings    195
      II.2   Common metrics    196
III   Experiments based on a crpd parameter    198
      III.1  Additional experimental settings    198
      III.2  Additional metrics    198
      III.3  Results    199
IV    Experiments based on cache parameters    203
      IV.1   Additional experimental settings    204
      IV.2   Additional metric    205
      IV.3   Results    207
V     Discussion on the experimental plans    212
VI    Conclusion    215

Abstract

In this chapter, we conduct several experiments using synthetically generated tasks. This evaluation aims to quantify the schedulability loss of classic online scheduling policies such as Rate Monotonic and Earliest Deadline First as soon as Cache-Related Preemption Delays are considered. Through these experiments, the offline scheduling approach presented in Chapter 6 is also compared with existing online policies. We show that our offline approach clearly outperforms classic policies such as Rate Monotonic and Earliest Deadline First when Cache-Related Preemption Delays are considered.


I. Introduction

In this chapter, we evaluate the effectiveness of the offline solution proposed in the previous chapter for scheduling tasks as soon as Cache-Related Preemption Delays are considered. We study in particular the schedulability of our offline approach against classic online scheduling policies such as Rate Monotonic and Earliest Deadline First using synthetically generated tasks. The goal of this evaluation chapter is twofold. On the one hand, we want to compare our offline approach with rm and edf when varying several parameters such as the processor utilization or the use of the cache by the different tasks. We also study how our offline approach behaves when using enhanced crpd bound models. On the other hand, we want to evaluate the loss of schedulability of classic online schedulers (i.e. rm and edf) when crpds are accounted for. For a simple crpd bound model depending only on the preempted task, the offline approach proposed in Chapter 6 is optimal. So, it can be used as a reference point to evaluate the schedulability ratios of rm and edf when accounting for crpds.

Different experimental plans can be followed. A crpd parameter can be used, as has been the case throughout this PhD work. This parameter depends only on the considered task and, as discussed in the previous chapter, it might be quite pessimistic. So crpd bounds exploiting Evicting and Useful Cache Blocks can also be computed, as is commonly done in the real-time scheduling literature [ADM12, LAMD13], to get tighter crpd values.

We first introduce the common experimental settings and metrics used for our experiments. Then, we present a first set of experiments for a task model with a crpd parameter (Section III). In Section IV, we conduct a second set of experiments, this time using ecbs and ucbs to compute the crpd bounds. In Section V, we compare the previous two approaches and discuss possible improvements. Finally, we conclude this chapter in Section VI.

II. General experimental plan

The two series of experiments conducted in the following two parts of this chapter are implemented using the MATLAB programming language. They also share some common settings. For all experiments, we consider the Rate Monotonic (rm) and Earliest Deadline First (edf) online scheduling algorithms alongside the offline scheduling approach introduced in Chapter 6, Section IV, referred to hereafter as off. The goal of these experiments is to measure the schedulability performance of rm, edf and off when varying some key parameters, in particular the total processor utilization of the taskset.

II.1. Common experimental settings

To evaluate rm, edf and off, we generate sets of tasks with random timing parameters (wcets without crpds, periods, deadlines...). This generation follows the approach found in several works on scheduling with crpds, in particular in [ADM12, LAMD13]. The tasks used for the two series of experiments are synchronous with implicit deadlines and share some common timing parameters. Each task τi is defined at least by a wcet Ci (not accounting for any crpd), a period Ti and a relative deadline Di.


CHAPTER 7. EVALUATION OF THE CACHE IMPACT ON SCHEDULABILITY

milp restrictions. For off, the time to solve the mathematical program tends to increase exponentially with the size of the inputs. In particular, the number of jobs and the number of slices have a high impact on the milp solving time, as shown in Chapter 6, Section V. In order to contain the milp solving time, we only deal for these experiments with tasksets made of 4 tasks generating a total of at most 200 jobs over their hyperperiod. For each taskset, timing characteristics are re-generated until this condition is fulfilled. For that purpose, we also limit our study to a period range from 1ms to 10ms. Thus, the hyperperiod is contained and so are the numbers of jobs and slices.
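The 200-job restriction can be checked mechanically. A minimal sketch (in Python rather than the MATLAB used for the experiments; integer periods in ms are an assumption made here so that the hyperperiod is simply their least common multiple):

```python
from math import lcm

def job_count(periods):
    """Number of jobs released over one hyperperiod, assuming integer
    periods so the hyperperiod is their least common multiple."""
    h = lcm(*periods)
    return sum(h // t for t in periods)

def within_limit(periods, max_jobs=200):
    """The acceptance condition of the text: tasksets are re-generated
    until this predicate holds."""
    return job_count(periods) <= max_jobs
```
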

Task generation. The generation process follows several steps. First, a processor utilization ui is generated for each task using the UUnifast algorithm introduced in [BB05]. The UUnifast algorithm generates tasksets with a uniform distribution of task processor utilizations, taking as input the total processor utilization U of the taskset (the number of tasks being always set to 4, as explained before). Then, task periods Ti are randomly chosen in [1, 10] using a uniform distribution. These restrictions, compared to the experiments presented in [ADM12, LAMD13], are placed to contain the explosion of the milp solving time, as stated before. Moreover, we prefer a uniform distribution to a log-uniform one (as in [LAMD13]), since the period range is here reduced to only one order of magnitude. From the task period, the wcet of each task can easily be computed as Ci = ui · Ti. As we limit our experiments to synchronously-released tasks with implicit deadlines, we simply set oi = 0 and Di = Ti. For each experimental point, i.e. each value of the input parameter (for example the total processor utilization), 1000 independent tasksets are randomly generated.
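The generation steps above can be transcribed as follows (an illustrative Python sketch, not the original MATLAB code; integer periods are an assumption we make here to keep the hyperperiod finite, and the function names are ours):

```python
import random

def uunifast(n, total_u):
    """UUnifast [BB05]: draw n task utilizations summing to total_u,
    uniformly distributed over the valid simplex."""
    utils, sum_u = [], total_u
    for i in range(1, n):
        next_sum = sum_u * random.random() ** (1.0 / (n - i))
        utils.append(sum_u - next_sum)
        sum_u = next_sum
    utils.append(sum_u)
    return utils

def generate_taskset(total_u, n=4, t_min=1, t_max=10):
    """One synthetic taskset: synchronous release (oi = 0), implicit
    deadlines (Di = Ti), periods uniform in [t_min, t_max] ms."""
    tasks = []
    for u in uunifast(n, total_u):
        t = random.randint(t_min, t_max)      # period Ti
        tasks.append({"C": u * t, "T": t, "D": t, "O": 0})
    return tasks
```
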

II.2. Common metrics

Through our experiments, we aim at comparing rm, edf and off in terms of schedulability when varying different input parameters. In particular, we evaluate, for both experiment series, the impact of the total processor utilization U of the taskset. For each experimental point (corresponding to a given value of the input parameter), the schedulability of the 1000 randomly generated tasksets is measured under every studied scheduling policy.

II.2.a. Schedulability

When studying the influence of only one parameter at a time, we simply measure the number of schedulable tasksets for each scheduling policy as a function of the varying parameter, all other input parameters being set to their default values.

rm. For rm, we use the modified version of the Response Time Analysis accounting for crpds introduced in [BMSO+96] and already presented in Chapter 3, Section II:

Ri = Ci + Σ∀j∈hp(i) ⌈Ri/Tj⌉ · (Cj + γi,j)


hp(i) being the set of tasks with priorities higher than that of τi, and γi,j a bound on the crpd experienced by τi each time it is preempted by a higher priority task τj. The way to compute γi,j depends on the experiment series and is detailed in the following sections.
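This recurrence is typically solved by the standard fixed-point iteration, which can be sketched as follows (a Python illustration under the assumption that tasks are indexed by decreasing priority, so hp(i) = {0, ..., i−1}, and that Di = Ti as in our implicit-deadline setting):

```python
import math

def response_time(i, C, T, gamma):
    """Fixed-point iteration of the CRPD-aware RTA:
    Ri = Ci + sum over j in hp(i) of ceil(Ri/Tj) * (Cj + gamma[i][j]).
    Returns None if Ri exceeds the implicit deadline Di = Ti."""
    r = C[i]
    while r <= T[i]:
        r_next = C[i] + sum(math.ceil(r / T[j]) * (C[j] + gamma[i][j])
                            for j in range(i))
        if r_next == r:          # fixed point reached
            return r
        r = r_next
    return None                  # deadline exceeded: deemed unschedulable
```
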

edf. To evaluate the schedulability of edf and lp-edf, we use a modified version of the sufficient test for periodic tasks with implicit deadlines accounting for crpds introduced in [LAMD13] and already presented in Chapter 3, Section II:

Σ∀τi (Ci + γDmax,i) / Ti ≤ 1

Dmax being the largest relative deadline in the taskset. The way to compute γDmax,i also depends on the experiment series and is detailed in the following sections.
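The sufficient condition translates directly into code (a sketch; `gamma_dmax` is our name for the list of per-task bounds γDmax,i):

```python
def edf_crpd_schedulable(C, T, gamma_dmax):
    """CRPD-aware sufficient EDF test for implicit-deadline periodic
    tasks: sum over i of (Ci + gamma_Dmax,i) / Ti <= 1."""
    return sum((c + g) / t for c, t, g in zip(C, T, gamma_dmax)) <= 1
```
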

off. For off, a taskset is deemed schedulable if the solver �nds a solution for this taskset. If nosolution has been found, then the taskset is declared unschedulable.To evaluate the schedulability of a taskset, we use the IBM ILOG CPLEX Optimization StudioV12.6.1 1 to solve the milp: if the solver �nds a solution, then the taskset is deemed schedulable.To further bound the solving time issue, we set a time limit of 10 seconds for the solver. If the timelimit is exceeded, then the solver keeps the best current solution. If no solution has been found whenthe time limit is exceeded, then the taskset is deemed unschedulable.

II.2.b. Weighted Schedulability

An exhaustive evaluation of every combination of input parameters is not possible. Therefore, to study the impact of parameters other than the total processor utilization U, we use the weighted schedulability measure introduced in [BBA10] and used in [ADM12, LAMD13]. We fix all parameters except one and vary the remaining parameter for a set of values of the total processor utilization. For each approach and each value of the chosen varying parameter p, 1000 tasksets are generated for each value in a set Q of equally spaced processor utilizations:

Q = {u | u = k · 0.1, k ∈ ⟦1, 10⟧}

Then, the results are graphically represented using the weighted schedulability Wℓ(p), which combines the data of all tasksets for every processor utilization value in Q:

Wℓ(p) = (Σ∀U∈Q U · Sℓ(U, p)) / (Σ∀U∈Q U)

where Sℓ(U, p) is the binary result of the schedulability test for a taskset with processor utilization U and value p for the studied parameter. The weighted schedulability measure reduces 3-dimensional plots to 2 dimensions only, without having to fix the processor utilization to a single value.
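The measure can be sketched as follows (a Python illustration; `sched_ratio` is our name for a callback returning the schedulability result in [0, 1] for a given utilization and parameter value):

```python
def weighted_schedulability(sched_ratio, p, Q=None):
    """Weighted schedulability [BBA10]:
    W(p) = sum over U in Q of U * S(U, p), divided by sum over U in Q of U.
    Higher utilizations thus weigh more in the combined score."""
    if Q is None:
        Q = [k * 0.1 for k in range(1, 11)]   # U = 0.1, 0.2, ..., 1.0
    return sum(u * sched_ratio(u, p) for u in Q) / sum(Q)
```
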

1http://www.ibm.com/support/knowledgecenter/SSSA5P_12.6.1/ilog.odms.studio.help/Optimization_Studio/topics/COS_intro_features.html?lang=en


III. Experiments based on a crpd parameter

For this first set of experiments, we consider the task model introduced in Chapter 4, Section IV.5, and used in most parts of this PhD work. Alongside their classic timing parameters (wcets, periods...), tasks have an additional parameter si representing the crpd they pay when resuming their execution after a preemption.

For these experiments, as we measure the number of preemptions for every scheduling policy, we consider alongside off, rm and edf a slightly modified version of edf, denoted hereafter lp-edf, in which the tie-breaking rule avoids unnecessary preemptions: if, at a given instant, the job currently being executed has the highest priority, then lp-edf never preempts it, even if there are ready jobs sharing the same highest priority.

Note that, as is the case throughout this PhD work, we only focus on crpds in this chapter. But the results presented in this section can easily be extended to other types of preemption delays, as long as these preemption delays depend only on the preempted task.

III.1. Additional experimental settings

The following additions are made to the experimental plan presented in the previous section.

Input parameter. For these experiments, we consider an additional input parameter (in addition to the total processor utilization U), which we call the Maximum Preemption Delay Factor. This parameter is denoted PDF hereafter. The Maximum Preemption Delay Factor corresponds to the maximum value of the crpd parameter of a task, in proportion to its wcet. By default, we assume:

• a Maximum Preemption Delay Factor PDF equal to 0.2 for all tasks of the taskset.

crpd parameter generation. For every task, the crpd parameter si is randomly generated between 0 and PDF × Ci using a uniform distribution.

III.2. Additional metrics

We now explain how schedulability is analyzed for these experiments and how we measure preemption numbers.

III.2.a. crpd bounds for the schedulability analyses

To study the schedulability of off, rm, edf and lp-edf when varying either the total processor utilization or the Maximum Preemption Delay Factor, we consider the following upper bounds on the crpd for the schedulability analyses, using the crpd parameter si.

rm. To evaluate the schedulability of rm, we consider the Response Time Analysis modified to handle crpds presented in the previous section. For the crpd bound, we consider an adaptation of the ucb-only approach as used in Chapter 5, Section II.3:

γi,j = max∀τk∈hep(i)∩lp(j) {sk}


with hep(i) being the subset of tasks with priorities higher than or equal to the priority of τi, and lp(j) the subset of tasks with priorities lower than the priority of τj.

edf and lp-edf. To evaluate the schedulability of both edf and lp-edf, we use the edf schedulability test modified to handle crpds presented in the previous section. For the crpd bound, we consider an adaptation of the ucb-only approach as used in Chapter 5, Section II.3:

γDmax,i = max∀τk, Dmax≥Dk>Di {sk}

Dmax being the largest relative deadline in the taskset.
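Both bounds can be sketched in a few lines (a Python illustration; we assume tasks indexed by decreasing rm priority, so that hep(i) = {0, ..., i} and lp(j) = {j+1, ..., n−1}, an indexing convention of this sketch rather than of the thesis):

```python
def gamma_rm(i, j, s):
    """CRPD bound for RM with the s_k model:
    gamma[i][j] = max over tasks k in hep(i) ∩ lp(j) of s_k."""
    candidates = [s[k] for k in range(j + 1, i + 1)]   # hep(i) ∩ lp(j)
    return max(candidates, default=0)

def gamma_edf(i, s, D):
    """CRPD bound used in the EDF test:
    gamma_Dmax,i = max over k with Dmax >= D_k > D_i of s_k."""
    d_max = max(D)
    candidates = [s[k] for k, d in enumerate(D) if D[i] < d <= d_max]
    return max(candidates, default=0)
```
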

off. For off, we can directly use the crpd parameter si to construct the milp.

III.2.b. Measuring preemptions and crpds

To evaluate the impact of preemptions and crpds when varying either the total processor utilization or the Maximum Preemption Delay Factor, we consider two different metrics.

Preemptions. We compute the average number of preemptions per job for each taskset under every approach. To compute the number of preemptions, we solely consider tasksets deemed schedulable under off. Indeed, when a taskset is deemed unschedulable under off, no schedule is constructed and so there is no comparison point. To get the average number of preemptions per job for rm, edf and lp-edf, we simulate the task executions over the hyperperiod of the taskset. Note that we consider worst-case durations (wcets and upper bounds on the preemption delays) for these simulations. Then, we count the number of preemptions for each job. For off, we compute the same metric using the solver output variables ∆i,j.

crpd. We compute the total preemption delay ratio for each taskset under every approach. This ratio corresponds to the part of the processor utilization occupied by all the crpds occurring over the hyperperiod of the taskset. As for preemptions, we compute this total preemption delay ratio only for tasksets deemed schedulable under off. To compute the total preemption delay ratio, we simulate once more rm, edf and lp-edf over the hyperperiod. For each job generated by each task of a given taskset, the number of preemptions experienced by the job is multiplied by the crpd of the corresponding task. Then, we sum the resulting crpds over every job of the taskset and finally divide the result by the length of the taskset hyperperiod.
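The ratio computed from a simulated schedule can be sketched as (hypothetical helper of our own; inputs are the per-job preemption counts, the crpd parameters of the tasks and the task owning each job):

```python
def total_crpd_ratio(preemption_counts, s_of_task, task_of_job, hyperperiod):
    """Total preemption delay ratio: each job's preemption count times
    the crpd parameter s_i of its task, summed over all jobs and divided
    by the hyperperiod length."""
    total = sum(n * s_of_task[task_of_job[j]]
                for j, n in enumerate(preemption_counts))
    return total / hyperperiod
```
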

III.3. Results

For these experiments, we evaluate the schedulability of rm, edf, lp-edf and off when varying the two key input parameters, namely the total processor utilization U and the Maximum Preemption Delay Factor PDF. We also measure the average number of preemptions for each scheduling policy and evaluate the total crpd of every taskset in proportion to the processor occupied time.


Figure 7.1: Number of schedulable tasksets under rm, edf, lp-edf and off as a function of the total processor utilization, for a Maximum Preemption Delay Factor equal to 0.2

III.3.a. Impact of the Processor Utilization

As depicted in Figure 7.1, when the total processor utilization U increases, rm, edf, lp-edf and off all experience a decrease in schedulability. Clearly, for U = 1, the only schedulable tasksets are those which can be scheduled non-preemptively. Note that the results for edf and lp-edf are identical, as we use the same schedulability test on the same tasksets. It can easily be seen in Figure 7.1 that all four approaches perform almost identically for low processor utilizations. For U ≥ 0.4, rm, edf and lp-edf start to behave worse than off. In particular, the schedulability analysis for rm achieves lower schedulability ratios than the schedulability test for edf and lp-edf, and is far worse than off. When the processor utilization becomes high (U ≥ 0.8), both edf and lp-edf suffer an important schedulability loss (only 59% of schedulable tasksets for U = 0.8, whereas off successfully schedules about 98% of the tasksets).

Indeed, as depicted in Figure 7.2, the average number of preemptions computed over 1000 tasksets simulated under edf or lp-edf can be up to 10 times the average one occurring under off. The average number of preemptions per job is far higher under rm than it is under edf (50% more preemptions for U = 0.8), lp-edf (twice as many preemptions for U = 0.8) and off (10 times more preemptions for U = 0.8). Note that, for high processor utilization values, the number of preemptions for edf and lp-edf does not necessarily increase. As explained in Chapter 5, Section II, and also shown in [But05], larger wcets can result in fewer preemptions for the overall system. On the contrary, the average number of preemptions per job under rm always increases with the total processor utilization U. Indeed, as stated in [But05], tasks with larger wcets (as the total processor utilization increases) are more likely to be preempted by higher priority tasks. However, note that for U > 0.9, the average number of preemptions under rm seems to stall. This behaviour might be linked to the experimental plan we have adopted to measure preemptions: only tasksets deemed schedulable under off are considered. For high values of the total processor utilization, tasksets deemed schedulable under off


probably have period distributions such that tasks are less likely to preempt each other. So, the average number of preemptions would be reduced even for rm.

Figure 7.2: Average number of preemptions per job for rm, edf, lp-edf and off as a function of the total processor utilization, for a Maximum Preemption Delay Factor equal to 0.2

Figure 7.3: Processor utilization due to the total crpd for rm, edf, lp-edf and off as a function of the total processor utilization, for a Maximum Preemption Delay Factor equal to 0.2

When we measure the total crpd over the hyperperiod for each taskset, we find similar results, as depicted in Figure 7.3. Indeed, preemptions and crpds are closely related. However, for high processor utilization values (U > 0.8), this total crpd over the hyperperiod decreases, even under rm. A likely


explanation for this phenomenon is that tasksets deemed schedulable under off for large values of U probably exhibit small values of the crpd parameter, in particular for tasks with large periods, which are more likely to be preempted. As a result, the total crpd decreases even if the number of preemptions still increases.

III.3.b. Impact of the Maximum Preemption Delay Factor

As depicted in Figure 7.4, as soon as preemption delays increase (in proportion to the wcet), the schedulability of rm, edf and lp-edf decreases. off schedulability also decreases, but much more slowly than that of the other three policies.

To measure the average number of preemptions per job and the total crpd over the hyperperiod, we set the total processor utilization to U = 0.8 and then vary the Maximum Preemption Delay Factor. The average number of preemptions per job for rm increases with the Maximum Preemption Delay Factor. Indeed, as preemption delays increase on average, tasks might have larger execution times, and so are more likely to be preempted by higher priority tasks. The average number of preemptions per job for edf and lp-edf does not change when increasing the preemption delays, which is quite logical, as scheduling decisions do not depend on any preemption-delay parameter and, under those two policies, larger wcets can result in fewer preemptions for the overall system [But05]. But as preemption delays increase, the total overhead over the hyperperiod for each taskset increases and can represent more than 20% of the processor load under rm, more than 12% under edf and nearly 10% under lp-edf (for a Maximum Preemption Delay Factor ≥ 0.4), as depicted in Figure 7.6. As for off, it produces a very low number of preemptions and chooses the preemptions that incur the smallest delays in order to minimize the overall overhead. So, the total preemption delay remains very low (about 1% of the processor load) and almost constant in proportion to the hyperperiod.

Figure 7.4: Weighted schedulability for rm, edf, lp-edf and off as a function of the Maximum Preemption Delay Factor


Figure 7.5: Average number of preemptions per job for rm, edf, lp-edf and off as a function of the Maximum Preemption Delay Factor, for U = 0.8

Figure 7.6: Processor utilization due to the total crpd for edf, lp-edf and off as a function of the Maximum Preemption Delay Factor, for U = 0.8

IV. Experiments based on cache parameters

For this second set of experiments, we no longer consider an explicit crpd parameter in the task model. We proceed as in [ADM12, LAMD13]. In addition to its wcet (which does not account for any crpd), its period and its relative deadline, each task is characterized by its sets of Evicting Cache Blocks and Useful Cache Blocks. These sets of cache blocks are generated based on cache parameters


which are detailed hereafter.

IV.1. Additional experimental settings

To generate the ecb and ucb sets, additional input parameters are needed to represent the cache and its use by the different tasks of the taskset.

Input parameters. We assume a direct-mapped cache modeled by its size in number of cache sets CS (which is equal to the number of cache lines, as we deal only with a direct-mapped cache) and a Block Reload Time (i.e. the time needed to load a memory block from the main memory into the cache) brt. As a cache is explicitly considered here, we deal, in addition to the total processor utilization, with the total cache utilization CU of the taskset, which corresponds to the proportion of the cache used by all tasks of the taskset put together. In particular, CU > 1 means that all task memory blocks cannot fit together in the cache. Moreover, to generate the ecb and ucb sets, an additional parameter called the reutilization factor, denoted RF, is used. It corresponds to the maximum proportion of memory blocks of a task that might be useful, and which will thus incur a brt if evicted from the cache. The following default values are considered for the input parameters:

• a taskset cache utilization CU equal to 4,

• a cache size CS equal to 256,

• a Block Reload Time brt equal to 0.008ms (we assume the ms to be the basic measurement unit for our experiments),

• a maximum proportion of ucbs per task, called the reutilization factor RF, equal to 30%,

• an approach to bound the crpd for the rm and edf schedulability analyses corresponding by default to the ucb-only approach.

ecb and ucb generation. An ecb set ecbi and a ucb set ucbi have to be generated for every task τi. To do so, in [ADM12], a task cache utilization is generated for every task as a fraction of the total cache utilization CU of the taskset, using the UUnifast algorithm. As a consequence, this task cache utilization is a float. It is then used to compute the number of ecbs of the task. As the number of ecbs has to be an integer, rounding is needed, so the real cache utilization of the taskset might be slightly different from the one given as input. To avoid this issue, we prefer here to directly generate the number of ecbs of each task as an integer value, with a total sum of CU × CS, using the UUnifast algorithm and a rounding technique similar to the one proposed in [LAD12]. The cache set index Si of the first ecb of each task is randomly generated as a uniformly-distributed integer in [0, CS − 1]. All of a task's ecbs are placed in a contiguous group starting at Si. Finally, the number of ucbs per task is randomly generated in [0, RF · |ecbi|] according to a uniform distribution, as in [ADM12]. All of a task's ucbs are also placed in a contiguous group starting at cache index Si.


Figure 7.7: Example of ecb and ucb placement in the cache for 4 tasks (τi(Si, |ecbi|, |ucbi|)): τ1(4,7,2), τ2(3,10,3), τ3(7,2,0), τ4(2,3,2), and a direct-mapped cache of size 8 cache lines. The memory blocks of each task are depicted as if the task had sole access to the cache. ecbs are represented by light-grey boxes and ucbs by dark-grey ones.

Example 7.1: Consider four different tasks τ1, τ2, τ3 and τ4, and a direct-mapped cache with 8 cache lines. We assume that τ1 has 7 ecbs and 2 ucbs, τ2 has 10 ecbs and 3 ucbs, τ3 has 2 ecbs and no ucb, and τ4 has 3 ecbs and 2 ucbs. Moreover, we assume that τ1's first ecb (and thus τ1's first ucb) is mapped to cache set 4, τ2's first ecb to cache set 3, τ3's first ecb to cache set 7 and finally τ4's first ecb to cache set 2. As ecbs and ucbs are placed in contiguous groups, τ2's second ecb is mapped to cache set 4 (S2 + 1 mod 8) and τ2's third ecb to cache set 5 (S2 + 2 mod 8), as depicted in Figure 7.7. As the cache has only 8 cache lines, τ2's sixth ecb wraps around to the first cache line (S2 + 5 mod 8), and so on. Note that some ecbs and ucbs can be mapped to the same cache sets. For example, τ2's first and ninth ecbs are both mapped to cache set 3, as S2 mod 8 = (S2 + 8) mod 8.
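The contiguous placement with wrap-around can be sketched as follows (an illustrative Python snippet; the function name is ours):

```python
def place_blocks(start, count, cache_size):
    """Cache sets occupied by `count` blocks placed contiguously from
    index `start` in a direct-mapped cache, wrapping modulo its size."""
    return {(start + k) % cache_size for k in range(count)}

# Example 7.1: tau2 has 10 ECBs starting at cache set 3 in an 8-line
# cache, so its ECBs cover every cache set (10 blocks wrap past line 7).
ecb2 = place_blocks(3, 10, 8)
```
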

IV.2. Additional metric

We now present how schedulability is analyzed for these experiments, and we also introduce an additional metric based on speedup factors.

IV.2.a. crpd bounds for the schedulability analyses

To upper-bound the crpd, we will mainly consider hereafter the ucb-only approach. This approach is recalled below for the rm and edf scheduling policies. But we will also perform some comparisons between this approach and more enhanced crpd upper bounds computed using the ucb-union and ecb-union approaches (see Chapter 3, Section II).


Note that, as in [ADM12], crpd upper bounds do not depend on the worst-case execution times Ci (as ecbi and ucbi are independent of Ci). As a consequence, some tasks can have a crpd larger than their worst-case execution time.

rm. Under rm scheduling, the ucb-only approach to compute an upper bound on the crpd γi,j corresponds to:

γi,j = brt · max∀τk∈hep(i)∩lp(j) {|ucbk|}

with hep(i) being the set of tasks with priorities higher than or equal to the priority of τi, and lp(j) the set of tasks with priorities lower than the priority of τj.

edf. For edf, the ucb-only approach to compute an upper bound on the crpd γDmax,i is given by the following equation:

γDmax,i = brt · max∀τk, Dmax≥Dk>Di {|ucbk|}

Dmax being the largest relative deadline in the taskset.

off. Finally, to construct the milp for off, a crpd parameter si is computed for each task as:

si = brt · |ucbi|

IV.2.b. Measuring the resource augmentation cost

For these experiments, we also compute speedup factors. This metric is used to quantify the resource cost introduced when considering crpds in scheduling.

In the real-time scheduling literature, the processor speedup factor is used, for example, to quantify the cost paid by using an efficient polynomial-time algorithm computing worst-case response time upper bounds rather than sufficient schedulability conditions [NRG15]. The processor speedup factor suf corresponds to the factor by which the processor speed needs to be increased in order for any taskset deemed schedulable with an exact schedulability test on a processor of speed 1 to be guaranteed schedulable according to the sufficient schedulability condition. suf is such that a taskset, schedulable with an exact schedulability test but not with a sufficient one, is declared schedulable by the sufficient schedulability condition considering ∀τi ∈ T: C′i = Ci / suf.

For our experiments, the processor speedup factor cannot be used. Indeed, suf has only an impact on the task wcets and not on the crpds. So a taskset may remain unschedulable even for arbitrarily large values of suf. As a consequence, we consider here another speedup factor, which we call the complete speedup factor and denote scf. This complete speedup factor deals with the processor speed but also with the memory access time. For a given taskset deemed schedulable when no crpd is considered (e.g. U ≤ 1 for periodic tasks with implicit deadlines), the speedup factor scf corresponds to the factor by which the processor speed needs to be increased and the memory access time decreased for the taskset to be schedulable when crpds are considered. scf is such that a taskset, deemed schedulable when no crpd is considered, is schedulable with ∀τi: C′i = Ci / scf ∧ brt′ = brt / scf when crpds are accounted for.

We compute the average speedup factor among all tasksets per experimental point for different input parameters.
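Since scaling by 1/scf divides both every Ci and brt (and hence every crpd term) by the same factor, the schedulability tests used here are monotone in the speedup, so the smallest sufficient factor can be found by binary search. A sketch (Python, with an assumed callback `schedulable(C, brt)` implementing whichever CRPD-aware test is under study; the upper bound and tolerance are arbitrary choices of this sketch):

```python
def complete_speedup_factor(C, brt, schedulable, hi=16.0, tol=1e-3):
    """Smallest f >= 1 such that the taskset with Ci/f and brt/f passes
    the CRPD-aware test `schedulable(C, brt)`. Binary search is valid
    because every term of the test scales by 1/f (monotone in f)."""
    if schedulable(C, brt):
        return 1.0
    if not schedulable([c / hi for c in C], brt / hi):
        return None            # unschedulable even at the search bound
    lo = 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if schedulable([c / mid for c in C], brt / mid):
            hi = mid
        else:
            lo = mid
    return hi
```
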


IV.3. Results

We compare the schedulability of rm, edf and off when varying different key parameters: the total processor utilization of the taskset, the total cache utilization of the taskset, the reutilization factor of the taskset, the cache size and the Block Reload Time. We also consider the impact of the approach used to bound the crpd on the system schedulability.

IV.3.a. Impact of the Processor Utilization.

We first consider how the total processor utilization of the taskset impacts the system schedulability. All input parameters, except the processor utilization U, are set to their default values, i.e. CU = 4, RF = 30%, CS = 256 and brt = 0.008. U is varied from 0.1 to 1.0 in steps of 0.1.

Figure 7.8: Number of schedulable tasksets under rm, edf and off as a function of the total processor utilization U.

As depicted in Figure 7.8, when the total processor utilization increases, rm, edf and off experience a decrease in schedulability. For small values of U (U = 0.1), about 80% of the tasksets are found schedulable by the schedulability analysis/test for rm and edf, whereas off achieves a 100% success rate. Some tasksets are deemed unschedulable by the schedulability analysis/test for rm and edf even for small values of U because of potentially large crpds: as the crpd is independent of U in our experiments, some tasksets may experience crpds larger than wcets. For some tasks, these crpds may not fit in the task execution window (the time interval between the task release time and its deadline), causing deadline misses. For off, no such problem occurs, as it is more likely to construct a schedule without any preemption. For large values of U (U = 0.8), edf can only schedule 243 tasksets out of 1000, whereas rm drops under 100 schedulable tasksets. As shown in [But05], the number of preemptions for rm increases with the processor utilization, causing a higher overall crpd. For edf, the number of preemptions does not necessarily increase. But as less processor utilization is left free for executing preemption delays, only a few preemptions may result in a deadline miss. On


the contrary, off successfully schedules more than 95% of the tasksets for U = 0.8. This is because the milp focuses on reducing the cumulated preemption delay. As a result, the average number of preemptions per job and the average proportion of the processor time occupied by crpds remain quite low (under 0.06 preemptions per job and 1.2% of the processor time). For U = 1, the only tasksets deemed schedulable are either those which can be scheduled non-preemptively or those for which a schedule can be constructed with zero-cost preemptions, i.e. with preempted tasks having 0 ucbs.

Figure 7.9: Processor and Memory Speedup Factor for rm, edf and off as a function of the total processor utilization U.

In Figure 7.9, we depict the factor by which the CPU speed should be increased and the memory access time decreased in order to make an unschedulable taskset become schedulable. For large values of U (U = 0.8), this speedup factor can increase up to 1.45 on average for rm (i.e. a 45% faster CPU and memory bus) and 1.33 on average for edf (i.e. a 33% faster CPU and memory bus), whereas it remains nearly equal to 1 for off. Note that for off, the maximum speedup factor for any taskset never exceeds 1.83 (for U = 1.0), whereas it can exceed 5 for edf and rm (for U ≥ 0.8).

IV.3.b. Impact of the Cache Utilization and the Cache Reuse

We now focus on the impact of task cache-related parameters on the system schedulability. Input parameters CS and brt are set to their default values CS = 256 and brt = 0.008.
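The weighted schedulability plotted in the following figures aggregates, for each value of the studied parameter, the results over all utilization levels, weighting each taskset by its utilization so that one curve summarises a whole family of plots. A minimal sketch of this metric, in the spirit of [BBA10] and assuming a binary schedulable/unschedulable outcome per taskset:

```python
def weighted_schedulability(results):
    """results: list of (utilization, schedulable) pairs, one per
    generated taskset, all sharing the same studied parameter value.
    Returns sum(U * sched) / sum(U), a value in [0, 1]."""
    total_u = sum(u for u, _ in results)
    if total_u == 0:
        return 1.0   # convention for an empty sample
    return sum(u for u, ok in results if ok) / total_u

# Example: high-utilization tasksets weigh more heavily.
points = [(0.2, True), (0.5, True), (0.8, False), (1.0, False)]
w = weighted_schedulability(points)   # (0.2 + 0.5) / 2.5 = 0.28
```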

Impact of the Cache Utilization. First, we consider the impact of the cache utilization. RF is set to 0.3. CU is varied from 0 to 8 in steps of 1 and, for each value of CU, we generate 1000 tasksets per processor utilization value in Q = {u | u = k · 0.1, k ∈ J1, 10K}.

As shown in Figure 7.10, when increasing the total cache utilization, the weighted schedulability curves of rm, edf and off decrease as the crpds increase. For large crpds, it might be impossible even for off to construct a feasible schedule when U increases. However, for large values of U, off still


Figure 7.10: Evaluation of the impact of the Cache Utilization. (a) Weighted Schedulability as a function of CU. (b) Schedulability as a function of U for different values of CU.

behaves quite well even for large values of CU: for CU = 8, the number of tasksets deemed schedulable by off is larger than the number of tasksets deemed schedulable by rm or even edf for smaller cache utilization values. Note that increasing the number of cache sets CS gives similar results. Indeed, ecbs are generated based on the total cache occupancy CU × CS. Increasing either CU or CS results in a higher number of ecbs (and so of ucbs) on average.
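Consistent with this description (a total of CU × CS ecbs split among the tasks, and ucbs drawn as an RF-fraction of each task's ecbs), a plausible sketch of such a generator follows; the random split and the contiguous placement of each task's ecbs are assumptions of this illustration, not necessarily the generator actually used:

```python
import random

def gen_cache_footprints(n_tasks, CU, CS, RF, rng=None):
    """Split a total of CU * CS ECBs among n_tasks tasks, place each
    task's ECBs contiguously in the cache (wrapping modulo CS), and
    mark each ECB as a UCB with probability RF."""
    rng = rng or random.Random()
    total_ecbs = int(CU * CS)
    weights = [rng.random() for _ in range(n_tasks)]
    scale = sum(weights)
    footprints, start = [], 0
    for w in weights:
        count = int(total_ecbs * w / scale)
        # wrap-around means a task never holds more than CS distinct sets
        ecbs = {(start + k) % CS for k in range(count)}
        ucbs = {b for b in ecbs if rng.random() < RF}
        footprints.append((ecbs, ucbs))
        start += count
    return footprints

tasks = gen_cache_footprints(n_tasks=10, CU=4, CS=256, RF=0.3,
                             rng=random.Random(42))
```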

Impact of the Cache Reuse. Then, we study the impact of the cache reuse. CU is set to 4. RF is varied from 0.0 to 0.5 in steps of 0.05 and, for each value of RF, we generate 1000 tasksets per processor utilization value in Q = {u | u = k · 0.1, k ∈ J1, 10K}.

Figure 7.11: Evaluation of the impact of the Cache Reuse. (a) Weighted Schedulability as a function of RF. (b) Schedulability as a function of U for different values of RF.

As depicted in Figure 7.11, varying the reutilization factor gives results quite similar to those obtained when varying the total cache utilization. As for the total cache utilization, increasing RF might result in larger crpds and so in potential deadline misses. Indeed, increasing CU results in a higher number of ecbs on average, and so the number of ucbs also increases. Increasing RF also results in a higher number of ucbs, as a larger ratio of ecbs can be ucbs.


IV.3.c. Impact of the Cache Size and the Block Reload Time

We now focus on the impact of cache parameters on the system schedulability. Input parameters CU and RF are set to their default values CU = 4 and RF = 0.3.

Impact of the Cache Size. First, we consider the impact of the cache size. brt is set to 0.008. CS is varied from 2^6 = 64 cache sets to 2^10 = 1024 cache sets in steps of powers of 2. For each value of CS, we generate 1000 tasksets per processor utilization value in Q = {u | u = k · 0.1, k ∈ J1, 10K}.

Figure 7.12: Evaluation of the impact of the Cache Size. (a) Weighted Schedulability as a function of CS. (b) Schedulability as a function of U for different values of CS.


As shown in Figure 7.12, when increasing the cache size, the weighted schedulability curves of rm, edf and off decrease. Actually, the impact of the number of cache sets CS is quite similar to the impact of the total cache utilization CU, as ecbs are generated based on the total cache occupancy CU × CS. So, increasing either CU or CS results in a higher number of ecbs (and so of ucbs) on average and, as a consequence, crpds are increased. Once more, for large crpds, it might be impossible even for off to construct a feasible schedule when U increases. But, for large values of U, off still behaves quite well even for large values of CS: for CS = 2048, the number of tasksets deemed schedulable by off is larger than the number of tasksets deemed schedulable by rm or even edf for smaller cache size values.

Impact of the Block Reload Time. Then, we focus on the block reload time. CS is set to 256 and brt is varied from 2µs to 32µs in steps of powers of 2. For each value of brt, we generate 1000 tasksets per processor utilization value in Q = {u | u = k · 0.1, k ∈ J1, 10K}.

As depicted in Figure 7.13, when increasing the Block Reload Time brt, the schedulability of rm, edf and off decreases. As the total cache utilization and the cache size remain constant (CU = 4 and CS = 256), a higher value of brt means larger values for the crpd parameters. In particular, rm and edf experience a huge loss of schedulability, especially for small values of U. off behaves quite well even for large values of brt: for brt = 0.032, there are more tasksets deemed schedulable by off than there are for rm or even edf for brt = 0.008.

IV.3.d. Impact of the approach used to bound the crpd

Finally, we consider the influence of the approach used to bound the crpd on the system schedulability. Input parameters CU, CS, RF and brt are set to their default values CU = 4, CS = 256, RF = 0.3 and brt = 0.008. We vary U from 0.1 to 1.0 in steps of 0.1 and, for each value of U, we generate 1000 tasksets. For each taskset, we consider the ucb- and ecb-union approaches for both rm and edf alongside the ucb-only approach.

As depicted in Figure 7.14, using enhanced crpd bound approaches results in a higher number of schedulable tasksets. But this number remains low in comparison with the number of tasksets deemed schedulable by the offline approach off. Actually, only 9 tasksets (resp. 3) out of all generated tasksets (i.e., 10 000 tasksets) are found schedulable by edf (resp. rm) using the ecb-union approach and not by off (the results are actually similar when using the ucb-union approach). The offline approach is no longer optimal but still achieves a better schedulability ratio.
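For a single preemption of a task τi by a task τj, the per-preemption costs behind these names can be sketched as follows. These are simplified forms following the terminology of [ADM12]; the full analyses integrate such terms into response-time equations and handle nested preemptions and multiset refinements, all of which is omitted here:

```python
def crpd_ucb_only(brt, ucbs_preempted):
    # preempted-task-only view: every useful cache block of the
    # preempted task may have to be reloaded
    return brt * len(ucbs_preempted)

def crpd_ucb_union(brt, ucbs_affected, ecbs_preempting):
    # union of the UCBs of all tasks affected by the preemption,
    # restricted to the blocks the preempting task may evict
    evicted = set().union(*ucbs_affected) & ecbs_preempting
    return brt * len(evicted)

def crpd_ecb_union(brt, ucbs_affected, ecbs_evicting):
    # each affected task loses the UCBs hit by the union of the
    # ECBs of the tasks that may run during the preemption;
    # the maximum over affected tasks bounds the cost
    evicting = set().union(*ecbs_evicting)
    return brt * max(len(u & evicting) for u in ucbs_affected)

brt = 0.008
cost = crpd_ucb_union(brt, [{1, 2, 3}], {2, 3, 9})   # 2 blocks reloaded
```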

V. Discussion on the experimental plans

For our first set of experiments (see Section III), we use preemption-based metrics. These metrics are based on simulation using worst-case execution times and worst-case preemption delays. Those worst-case scenarios are useful to conduct real-time scheduling analyses, but they rarely occur in real life. So preemption-based metrics do not seem to be really relevant for purposes other than comparing scheduling algorithms. It would be more adequate to measure the number of preemptions for a taskset


Figure 7.13: Evaluation of the impact of the Block Reload Time. (a) Weighted Schedulability as a function of brt. (b) Schedulability as a function of U for different values of brt.

or the total overhead produced over the hyperperiod on real-life scenarios (i.e. tasks with average execution times instead of wcets).

A second issue is linked with the crpd bound generation. In Section III, crpds are generated as a proportion of the wcet. No cache is considered, which is a severe drawback, as the impact of parameters such as the cache size, the brt, etc., cannot be studied. On the contrary, in Section IV, crpds are computed using ecb and ucb sets which are generated mainly based on cache parameters (total cache utilization). crpds are independent of the task wcets and so, in some cases, these crpds may be larger than the task wcets. This cannot be true in real life. Another problem for this second


Figure 7.14: Number of schedulable tasksets under rm, edf and off as a function of the total processor utilization for several crpd bound approaches. (a) Comparison of rm and off. (b) Comparison of edf and off.

set of experiments is linked with the cache parameter CS. Intuitively, for a given taskset, increasing CS (i.e. the cache size) results in lower cache utilization by the tasks, and schedulability might be improved on average. But, in the generation plan, the cache utilization and the cache size are decoupled. So, increasing CS results in increasing the number of ecbs and ucbs for the tasks, which in turn might threaten the system schedulability.

As a consequence, it would be interesting to develop an experimental plan to complete the presented results. The generation of ecbs and ucbs should be linked with both the task code and the cache size.


One solution would be to generate cfgs based on the task wcets. Then, using these cfgs and some reutilization factor, ecb and ucb sets could be derived. So crpds and wcets could be loosely linked. It must be noticed that implementing such a task generator would require a significant programming effort and for that reason is left for future work.

VI. Conclusion

In this chapter, we evaluated our offline approach (introduced in Chapter 6, Section IV) against Rate Monotonic and Earliest Deadline First using synthetically generated tasksets.

In the cases where the offline approach is optimal (the first set of experiments, and the second one when the ucb-only approach is considered for both rm and edf), we were able to quantify the schedulability loss of rm and edf, as our offline approach could be used to evaluate the feasibility of every taskset when Cache-Related Preemption Delays are considered.

Through the experiments conducted in this chapter, we also showed that our offline approach clearly outperforms rm and edf even when enhanced crpd bounds (using the ucb-union approach or the ecb-union approach) are considered. Indeed, for large values of the processor utilization, the total cache utilization, the reutilization factor or the Block Reload Time, both rm and edf experience a huge schedulability loss whereas the offline approach still achieves a high schedulability ratio.


Part III

Conclusion and Perspectives


General Conclusion

In this PhD work, we studied the problem of scheduling hard real-time tasks subjected to Cache-Related Preemption Delays. In particular, we focused on:

1. the theoretical problem of scheduling hard real-time tasks on a uniprocessor system with a cache memory,

2. the impact of Cache-Related Preemption Delays on classic online scheduling policies,

3. finding optimal scheduling approaches to solve the problem of scheduling hard real-time tasks subjected to Cache-Related Preemption Delays.

After introducing in Chapter 1 some basic notions about real-time embedded systems and cache memories, we focused in Chapter 2 on real-time scheduling and described more precisely what Cache-Related Preemption Delays are. Once those necessary concepts had been presented, we gave in Chapter 3 a brief classification of the main research works from the literature dealing with cache memories and real-time scheduling. In particular, we saw that no solution clearly outperforms the others. Very often, combinations of different methods are required to achieve significant improvements. Moreover, we found that no optimal scheduling solution exists when cache effects are accounted for.

In Chapter 4, we focused on formalizing the problem of scheduling hard real-time tasks subjected to cache effects. We identified two distinct scheduling problems with cache memories: the Cache-aware Scheduling Problem and the crpd-aware Scheduling Problem. Under the Cache-aware model, the scheduler has complete knowledge of the cache state at each instant and of the memory requirements of each task in the system. On the contrary, under the crpd-aware model, the scheduler takes its scheduling decisions based only on upper bounds on the extrinsic cache interference (i.e. Cache-Related Preemption Delays). Both problems were studied theoretically, using simplified core problems, and were proved to be NP-hard in the strong sense. So, we showed that explicitly taking cache memories into account radically changes the scheduling problem and that well-known uniprocessor scheduling-theoretic results cannot be generalized straightforwardly. We decided to focus afterwards on the crpd-aware Scheduling Problem, as the Cache-aware Scheduling Problem is nearly impossible to use in practice.


We briefly discussed the crpd parameter model for the crpd-aware Scheduling Problem to try to find a compromise between precision and workability.

Then, in Chapter 5, we studied the impact of the cache on online scheduling. We first showed that Rate Monotonic, Deadline Monotonic and Earliest Deadline First are not sustainable when Cache-Related Preemption Delays are accounted for. Secondly, we considered the problem of optimally scheduling online tasks subjected to crpds. We proved that optimality cannot be achieved without clairvoyance. As a consequence, optimal online scheduling with crpds is impossible.

As optimal online scheduling is impossible, we proposed in Chapter 6 two offline approaches to solve the problem of scheduling hard real-time tasks subjected to crpds. Both solutions rely on Mixed-Integer Linear Programming to compute a feasible offline schedule. We showed that our second solution is optimal when a simple crpd parameter model is considered in the task model. We also evaluated the solving time of our solution.

Finally, in Chapter 7, we evaluated the impact of Cache-Related Preemption Delays on the system schedulability under Rate Monotonic, Earliest Deadline First and our optimal offline scheduling approach. We used synthetically generated tasks and considered two different ways of generating the Cache-Related Preemption Delay parameter. The loss of schedulability for Rate Monotonic, Earliest Deadline First and the offline approach was studied when varying parameters such as the total processor utilization, the total cache utilization and the Block Reload Time. We saw that those different parameters have a high impact on the system schedulability.

To summarize, the main contributions of this PhD work are:

- two scheduling problems with cache memories have been identified and formalized: the Cache-aware Scheduling Problem and the crpd-aware Scheduling Problem,

- the Cache-aware Scheduling Problem and the crpd-aware Scheduling Problem are NP-hard in the strong sense,

- Rate Monotonic, Deadline Monotonic and Earliest Deadline First are not sustainable when subjected to crpds,

- optimal online scheduling with crpds is impossible,

- an optimal offline crpd-aware scheduling approach is proposed.

Perspectives

Future work could improve and/or extend several points addressed in this PhD work.

The evaluation part could be extended using enhanced crpd bounds for rm and edf, such as the ecb- and ucb-union Multiset approaches proposed in [ADM12, LAMD13]. Other scheduling policies, such as limited-preemption ones (introduced in Chapter 3, Section IV.2), could also be compared to our optimal offline approach. Moreover, the issue of implementing an experimental plan that would link ecbs and ucbs (and so crpds) with the task code (and so the wcet) still has to be tackled.

Concerning our offline approach, optimality could be extended to more accurate crpd parameter models by considering the effects of both the preempted and preempting tasks. But extending the


milp proposed in this work with new variables and constraints leads to a high mathematical complexity and would probably be unsuitable in practice. So, a completely different approach might have to be proposed.

The question of deriving a feasibility test for tasks subjected to Cache-Related Preemption Delays is also an interesting and challenging issue. But such a goal might not be easy to achieve.

In this PhD work, we proved that optimal online scheduling with crpds is impossible. So, heuristic approaches could be proposed and compared with our optimal offline approach. In particular, to develop online scheduling policies dealing with crpds, combined approaches should be considered. Indeed, the performance of real-time systems simultaneously depends on wcets, cache memory management (replacement policy, locking and/or partitioning techniques), the schedulability analysis used to validate the system and, finally, the scheduler that takes scheduling decisions at run-time. As these techniques introduce overestimations in order to design predictable systems, improvement in the design of real-time systems can be achieved by tackling these problems simultaneously. For instance, the use of cache partitioning/locking or memory layout optimization decreases the extrinsic cache interference, which in turn decreases the cost of a preemption. Then, using a limited-preemption policy, such as fpp, reduces the number of preemptions and so the total crpd. Finally, using a cache-aware schedulability analysis ensures predictability and avoids wasting hardware resources. So, future work should be conducted on improving these combinations. The main issues are in particular to identify the required system model (fine-grained or coarse scheduling model), and to propose solutions to jointly handle task memory accesses (for example replacement policy, locking and/or partitioning techniques), task scheduling (for example by controlling preemptions) and the schedulability issue.

Last but not least, the multiprocessor case is an interesting but very challenging matter. Indeed, the crpd issue becomes much harder when multiprocessors are considered, as cache hierarchies with private and shared caches are used. The cache analysis problem becomes far more complex. Moreover, when considering global scheduling, tasks or jobs can migrate from one core to another, causing additional delays called Cache-Related Migration Delays.



Appendices


Related Publications

Published

- [PRG+15]: Guillaume Phavorin, Pascal Richard, Joël Goossens, Thomas Chapeaux, and Claire Maiza. Scheduling with Preemption Delays: Anomalies and Issues. In Proceedings of the 23rd International Conference on Real Time and Networks Systems, RTNS '15, pages 109-118, New York, NY, USA, 2015. ACM.

- [PRM15a]: Guillaume Phavorin, Pascal Richard, and Claire Maiza. Complexity of scheduling real-time tasks subjected to cache-related preemption delays. In Emerging Technologies Factory Automation (ETFA), 2015 IEEE 20th Conference on, pages 1-8, Sept 2015.

- [PRM15c]: Guillaume Phavorin, Pascal Richard, and Claire Maiza. Static CRPD-Aware Real-Time Scheduling. In Work-in-Progress session of the 27th Euromicro Conference on Real-Time Systems (ECRTS 2015), July 2015. 4p.

Under submission

- Guillaume Phavorin, Pascal Richard, Joël Goossens, Claire Maiza, Laurent George, and Thomas Chapeaux. Online and Offline Scheduling with Cache-Related Preemption Delays. Submitted to Real-Time Systems.

- Guillaume Phavorin and Pascal Richard. Cache-Related Preemption Delays and Real-Time Scheduling: A Survey for Uniprocessor Systems. Submitted to Journal Européen des Systèmes Automatisés.

Research reports

- [PR15]: Guillaume Phavorin and Pascal Richard. Cache-Related Preemption Delays and Real-Time Scheduling: A Survey for Uniprocessor Systems. Research Report no. 3, LIAS, Université de Poitiers, 2015. 18p.


- [PRM15b]: Guillaume Phavorin, Pascal Richard, and Claire Maiza. Complexity of scheduling real-time tasks subjected to cache-related preemption delays. Research Report no. 2, LIAS, Université de Poitiers, 2015.


Bibliography

[AB11] Sebastian Altmeyer and Claire Maiza Burguière. Cache-related preemption delay via useful cache blocks: Survey and redefinition. Journal of Systems Architecture, 57(7):707-719, 2011. Special Issue on Worst-Case Execution-Time Analysis.

[ABW09] S. Altmeyer, C. Burguière, and R. Wilhelm. Computing the maximum blocking time for scheduling with deferred preemption. In Software Technologies for Future Dependable Distributed Systems, 2009, pages 200-204, March 2009.

[ADLD14] S. Altmeyer, R. Douma, W. Lunniss, and R.I. Davis. OUTSTANDING PAPER: Evaluation of Cache Partitioning for Hard Real-Time Systems. In Real-Time Systems (ECRTS), 2014 26th Euromicro Conference on, pages 15-26, July 2014.

[Adm01] Federal Aviation Administration. Commercial Off-The-Shelf (COTS) Avionics Software Study, 2001.

[Adm04] Federal Aviation Administration. Commercial Off-The-Shelf Real-Time Operating System and Architectural Considerations, 2004.

[ADM11a] S. Altmeyer, R.I. Davis, and C. Maiza. Cache related pre-emption delay aware response time analysis for fixed priority pre-emptive systems. In Real-Time Systems Symposium (RTSS), 2011 IEEE 32nd, pages 261-271, Nov 2011.

[ADM11b] Sebastian Altmeyer, Robert I. Davis, and Claire Maiza. Pre-emption Cost Aware Response Time Analysis for Fixed Priority Pre-emptive Systems. Technical Report YCS-2010-464, University of York, Department of Computer Science, May 2011.

[ADM12] Sebastian Altmeyer, Robert I. Davis, and Claire Maiza. Improved cache related pre-emption delay aware response time analysis for fixed priority pre-emptive systems. Real-Time Systems, 48(5):499-526, 2012.


[AG08] Sebastian Altmeyer and Gernot Gebhard. WCET Analysis for Preemptive Scheduling. In Raimund Kirner, editor, 8th International Workshop on Worst-Case Execution Time Analysis (WCET'08), volume 8 of OpenAccess Series in Informatics (OASIcs), Dagstuhl, Germany, 2008. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.

[AM11] Sebastian Altmeyer and Claire Maiza. Influence of the task model on the precision of scheduling analysis for preemptive systems - status report. In Robert I. Davis and Nathan Fisher, editors, Proceedings of the 2nd International Real-Time Scheduling Open Problems Seminar, pages 1-2, July 2011.

[ANGM14] Y. Allard, G. Nelissen, J. Goossens, and D. Milojevic. A context aware cache controller to bridge the gap between theory and practice in real-time systems. In Embedded and Real-Time Computing Systems and Applications (RTCSA), 2014 IEEE 20th International Conference on, pages 1-10, Aug 2014.

[AP06] A. Arnaud and I. Puaut. Dynamic instruction cache locking in hard real-time systems. In Proc. of the 14th Int. Conference on Real-Time and Network Systems, pages 179-188, 2006.

[ARM] ARM Information Center: CoreLink system controllers. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.primecell.system/index.html.

[Aud91] N. C. Audsley. Optimal Priority Assignment and Feasibility of Static Priority Tasks With Arbitrary Start Times. Technical Report YCS-164, Department of Computer Science, University of York, 1991.

[Bar05] S. Baruah. The limited-preemption uniprocessor scheduling of sporadic task systems. In Real-Time Systems, 2005. (ECRTS 2005). Proceedings. 17th Euromicro Conference on, pages 137-144, July 2005.

[Bar07] Sanjoy Baruah. Techniques for multiprocessor global schedulability analysis. In Real-Time Systems Symposium, 2007. RTSS 2007. 28th IEEE International, pages 119-128, Dec 2007.

[BAVH+14] R.J. Bril, S. Altmeyer, M.M.H.P. van den Heuvel, R.I. Davis, and M. Behnam. Integrating Cache-Related Pre-Emption Delays into Analysis of Fixed Priority Scheduling with Pre-Emption Thresholds. In Real-Time Systems Symposium (RTSS), 2014 IEEE, pages 161-172, Dec 2014.

[BB05] Enrico Bini and Giorgio C. Buttazzo. Measuring the Performance of Schedulability Tests. Real-Time Systems, 30(1-2):129-154, 2005.

[BB06] S. Baruah and A. Burns. Sustainable Scheduling Analysis. In Real-Time Systems Symposium, 2006. RTSS '06. 27th IEEE International, pages 159-168, Dec 2006.

[BB08] Alan Burns and Sanjoy Baruah. Sustainability in real-time scheduling. Journal of Computing Science and Engineering, 2(1):74-97, 2008.


[BB10] M. Bertogna and S. Baruah. Limited preemption EDF scheduling of sporadic task systems. Industrial Informatics, IEEE Transactions on, 6(4):579-591, Nov 2010.

[BBA10] Andrea Bastoni, Björn Brandenburg, and James Anderson. Cache-related preemption and migration delays: Empirical approximation and impact on schedulability. In 6th International Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT 2010), pages 33-44, 2010.

[BBM+10] M. Bertogna, G. Buttazzo, M. Marinoni, Gang Yao, F. Esposito, and M. Caccamo. Preemption points placement for sporadic task sets. In Real-Time Systems (ECRTS), 2010 22nd Euromicro Conference on, pages 251-260, July 2010.

[BBY13] G.C. Buttazzo, M. Bertogna, and Gang Yao. Limited preemptive scheduling for real-time systems. A survey. Industrial Informatics, IEEE Transactions on, 9(1):3-15, Feb 2013.

[BCSM08] B.D. Bui, M. Caccamo, Lui Sha, and J. Martinez. Impact of Cache Partitioning on Multi-tasking Real Time Embedded Systems. In Embedded and Real-Time Computing Systems and Applications, 2008. RTCSA '08. 14th IEEE International Conference on, pages 101-110, Aug 2008.

[BD78] John Bruno and Peter Downey. Complexity of Task Sequencing with Deadlines, Set-Up Times and Changeover Costs. SIAM Journal on Computing, 7(4):393-404, 1978.

[Bel66] L.A. Belady. A study of replacement algorithms for a virtual-storage computer. IBM Systems Journal, 5(2):78-101, 1966.

[BEY05] Allan Borodin and Ran El-Yaniv. Online computation and competitive analysis. Cambridge University Press, 2005.

[BG04] Sanjoy Baruah and Joël Goossens. Scheduling real-time tasks: Algorithms and complexity. Handbook of Scheduling: Algorithms, Models, and Performance Analysis, 3, 2004.

[BLV07] R.J. Bril, J.J. Lukkien, and W.F.J. Verhaegh. Worst-case response time analysis of real-time tasks under fixed-priority scheduling with deferred preemption revisited. In Real-Time Systems, 2007. ECRTS '07. 19th Euromicro Conference on, pages 269-279, July 2007.

[BLV09] R.J. Bril, J.J. Lukkien, and W.F.J. Verhaegh. Worst-case response time analysis of real-time tasks under fixed-priority scheduling with deferred preemption. Real-Time Systems, 42(1-3), 2009.

[BMGGW00] José V. Busquets-Mataix, Daniel Gil, Pedro Gil, and Andy Wellings. Techniques to increase the schedulable utilization of cache-based preemptive real-time systems. Journal of Systems Architecture, 46(4):357-378, 2000.

[BMR90] S.K. Baruah, A.K. Mok, and L.E. Rosier. Preemptively scheduling hard-real-time sporadic tasks on one processor. In Real-Time Systems Symposium, 1990. Proceedings., 11th, pages 182-190, Dec 1990.


[BMSMOC+96] J.V. Busquets-Mataix, J.J. Serrano-Martin, R. Ors-Carot, P. Gil, and A. Wellings. Adding instruction cache effect to an exact schedulability analysis of preemptive real-time systems. In Real-Time Systems, 1996., Proceedings of the Eighth Euromicro Workshop on, pages 271–276, Jun 1996.

[BMSO+96] J.V. Busquets-Mataix, J.J. Serrano, R. Ors, P. Gil, and A. Wellings. Adding instruction cache effect to schedulability analysis of preemptive real-time systems. In Real-Time Technology and Applications Symposium, 1996. Proceedings., 1996 IEEE, pages 204–212, Jun 1996.

[BN94] Swagato Basumallick and Kelvin Nilsen. Cache issues in real-time systems. In ACM SIGPLAN Workshop on Language, Compiler, and Tool Support for Real-Time Systems, volume 5, 1994.

[BR07] Claire Burguière and Christine Rochange. On the complexity of modeling dynamic branch predictors when computing worst-case execution time. In Proceedings of the ERCIM/DECOS Workshop On Dependable Embedded Systems, August 2007.

[BRA09] Claire Burguière, Jan Reineke, and Sebastian Altmeyer. Cache-Related Preemption Delay Computation for Set-Associative Caches - Pitfalls and Solutions. In Niklas Holsti, editor, 9th International Workshop on Worst-Case Execution Time Analysis (WCET'09), volume 10 of OpenAccess Series in Informatics (OASIcs), pages 1–11, Dagstuhl, Germany, 2009. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.

[BRH90] Sanjoy K. Baruah, Louis E. Rosier, and Rodney R. Howell. Algorithms and complexity concerning the preemptive scheduling of periodic, real-time tasks on one processor. Real-Time Systems, 2(4):301–324, 1990.

[But05] Giorgio C. Buttazzo. Rate Monotonic vs. EDF: Judgment Day. Real-Time Systems, 29(1):5–26, 2005.

[But11] Giorgio C. Buttazzo. Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and Applications, volume 24 of Real-Time Systems Series. Springer US, 3rd edition, 2011.

[BvdHKL12] R.J. Bril, M.M.H.P. van den Heuvel, U. Keskin, and J.J. Lukkien. Generalized fixed-priority scheduling with limited preemptions. In Real-Time Systems (ECRTS), 2012 24th Euromicro Conference on, pages 209–220, July 2012.

[BW01] Alan Burns and Andrew J. Wellings. Real-Time Systems and Programming Languages: Ada 95, Real-Time Java, and Real-Time POSIX. Pearson Education, 2001.

[BXM+11] M. Bertogna, O. Xhani, M. Marinoni, F. Esposito, and G. Buttazzo. Optimal Selection of Preemption Points to Minimize Preemption Overhead. In Real-Time Systems (ECRTS), 2011 23rd Euromicro Conference on, pages 217–227, July 2011.


[CA08] J.M. Calandrino and J.H. Anderson. Cache-Aware Real-Time Scheduling on Multicore Platforms: Heuristics and a Case Study. In Real-Time Systems, 2008. ECRTS '08. Euromicro Conference on, pages 299–308, July 2008.

[CFG+10] Christoph Cullmann, Christian Ferdinand, Gernot Gebhard, Daniel Grund, Claire Maiza, Jan Reineke, Benoit Triquet, and Reinhard Wilhelm. Predictability considerations in the design of multi-core embedded systems. In Proceedings of Embedded Real Time Software and Systems, pages 36–42, 2010.

[CGG+14] Francis Cottet, Emmanuel Grolleau, Sébastien Gérard, Jérôme Hugues, Yassine Ouhamou, and Sarah Tucci. Systèmes temps réel embarqués : spécification, conception, implémentation et validation temporelle. Dunod, 2014.

[Cha14] Thomas Chapeaux. Integrating preemption costs in the real-time uniprocessor schedulability analysis, 2014.

[CIBM01] Marti Campoy, A. Perles Ivars, and J.V. Busquets-Mataix. Static use of locking caches in multitask preemptive real-time systems. In Proceedings of IEEE/IEE Real-Time Embedded Systems Workshop (Satellite of the IEEE Real-Time Systems Symposium), pages 1–6, 2001.

[CP00] Antoine Colin and Isabelle Puaut. Worst case execution time analysis for a processor with branch prediction. Real-Time Systems, 18(2-3), 2000.

[CPRBM03] A.M. Campoy, A. Perles, F. Rodriguez, and J.V. Busquets-Mataix. Static use of locking caches vs. dynamic use of locking caches for real-time systems. In Electrical and Computer Engineering, 2003. IEEE CCECE 2003. Canadian Conference on, volume 2, pages 1283–1286, May 2003.

[CTF15] J. Cavicchio, C. Tessler, and N. Fisher. Minimizing Cache Overhead via Loaded Cache Blocks and Preemption Placement. In Real-Time Systems (ECRTS), 2015 27th Euromicro Conference on, pages 163–173, July 2015.

[DB11] Robert I. Davis and Alan Burns. A survey of hard real-time scheduling for multiprocessor systems. ACM Comput. Surv., 43(4):35:1–35:44, October 2011.

[DF04] R. Dobrin and G. Fohler. Reducing the number of preemptions in fixed priority scheduling. In Real-Time Systems, 2004. ECRTS 2004. Proceedings. 16th Euromicro Conference on, pages 144–152, June 2004.

[Dij65] Edsger Wybe Dijkstra. Cooperating sequential processes, Technical Report EWD-123. Technical report, 1965.

[DLM13] Huping Ding, Yun Liang, and Tulika Mitra. Integrated instruction cache analysis and locking in multitasking real-time systems. In Proceedings of the 50th Annual Design Automation Conference, DAC '13, pages 147:1–147:10, 2013.


[DLM14] Huping Ding, Yun Liang, and Tulika Mitra. WCET-centric Dynamic Instruction Cache Locking. In Proceedings of the Conference on Design, Automation & Test in Europe, DATE '14, pages 27:1–27:6, Leuven, Belgium, 2014. European Design and Automation Association.

[ESLS06] Arvind Easwaran, Insik Shin, Insup Lee, and Oleg Sokolsky. Bounding Preemptions under EDF and RM Schedulers. Technical report, Department of Computer and Information Science, University of Pennsylvania, 2006.

[EY15a] P. Ekberg and W. Yi. Uniprocessor Feasibility of Sporadic Tasks Remains coNP-Complete under Bounded Utilization. In Real-Time Systems Symposium, 2015 IEEE, pages 87–95, Dec 2015.

[EY15b] P. Ekberg and W. Yi. Uniprocessor feasibility of sporadic tasks with constrained deadlines is strongly coNP-complete. In Real-Time Systems (ECRTS), 2015 27th Euromicro Conference on, pages 281–286, July 2015.

[FGB10] Nathan Fisher, Joël Goossens, and Sanjoy Baruah. Optimal online multiprocessor scheduling of sporadic real-time tasks is impossible. Real-Time Systems, 45(1-2):26–71, 2010.

[FK11] Heiko Falk and Helena Kotthaus. WCET-driven cache-aware code positioning. In Proceedings of the 14th International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES '11, pages 145–154, 2011.

[FLS11] Shelby Funk, Greg Levin, Caitlin Sadowski, Ian Pye, and Scott Brandt. DP-Fair: a unifying theory for optimal hard real-time multiprocessor scheduling. Real-Time Systems, 47(5), 2011.

[FPT07] Heiko Falk, Sascha Plazar, and Henrik Theiling. Compile-time decided instruction cache locking using worst-case execution paths. In Proceedings of the 5th IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS '07, pages 143–148, 2007.

[FW99] Christian Ferdinand and Reinhard Wilhelm. Efficient and Precise Cache Behavior Prediction for Real-Time Systems. Real-Time Systems, 17(2-3):131–181, 1999.

[GA07] Gernot Gebhard and Sebastian Altmeyer. Optimal task placement to improve cache performance. In Proceedings of the 7th ACM & IEEE International Conference on Embedded Software, EMSOFT '07, pages 259–268, 2007.

[GJ79] Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1979.

[GR16] Joël Goossens and Pascal Richard. Optimal Scheduling of Periodic Gang Tasks. Leibniz Transactions on Embedded Systems, 2016.

[Gro07] Barr Group. Embedded Systems Glossary. http://www.barrgroup.com/Embedded-Systems/Glossary, 2007.


[GSYY09] Nan Guan, Martin Stigge, Wang Yi, and Ge Yu. Cache-aware scheduling and analysis for multicores. In Proceedings of the Seventh ACM International Conference on Embedded Software, EMSOFT '09, pages 245–254, 2009.

[HAM+99] C.A. Healy, R.D. Arnold, F. Mueller, D.B. Whalley, and M.G. Harmon. Bounding pipeline and instruction cache performance. Computers, IEEE Transactions on, 48(1):53–70, Jan 1999.

[Han13] Per Brinch Hansen. The Origin of Concurrent Programming: From Semaphores to Remote Procedure Calls. Springer Science & Business Media, 2013.

[HBL08] Mike Holenderski, Reinder J. Bril, and Johan J. Lukkien. Using fixed-priority scheduling with deferred preemption to exploit fluctuating network bandwidth. In Proceedings Work-in-Progress (WiP) Session of the 20th Euromicro Conference on Real-Time Systems (ECRTS'08), pages 40–43, 2008.

[HP90] John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 1990.

[JL95] Tao Jiang and Ming Li. On the Approximation of Shortest Common Supersequences and Longest Common Subsequences. SIAM Journal on Computing, 24(5):1122–1139, 1995.

[JP86] M. Joseph and P. Pandya. Finding response times in a real-time system. The Computer Journal, 29(5):390–395, 1986.

[JSM91] K. Jeffay, D.F. Stanat, and C.U. Martel. On non-preemptive scheduling of periodic and sporadic tasks. In Real-Time Systems Symposium, 1991. Proceedings., Twelfth, pages 129–139, Dec 1991.

[Kar72] Richard M. Karp. Reducibility among combinatorial problems. Springer, 1972.

[KI09] S. Kato and Y. Ishikawa. Gang EDF scheduling of parallel task systems. In Real-Time Systems Symposium, 2009, RTSS 2009. 30th IEEE, pages 459–468, Dec 2009.

[Kir89] D.B. Kirk. SMART (Strategic Memory Allocation for Real-Time) cache design. In Real-Time Systems Symposium, 1989., Proceedings., pages 229–237, Dec 1989.

[KS90] D.B. Kirk and J.K. Strosnider. SMART (Strategic Memory Allocation for Real-Time) cache design using the MIPS R3000. In Real-Time Systems Symposium, 1990. Proceedings., 11th, pages 322–330, Dec 1990.

[KW03] Markus Kowarschik and Christian Weiß. An overview of cache optimization techniques and cache-aware numerical algorithms. In Ulrich Meyer, Peter Sanders, and Jop Sibeyn, editors, Algorithms for Memory Hierarchies, volume 2625 of Lecture Notes in Computer Science, pages 213–232. Springer Berlin Heidelberg, 2003.


[LAD12] Will Lunniss, Sebastian Altmeyer, and Robert I. Davis. Optimising Task Layout to Increase Schedulability via Reduced Cache Related Pre-emption Delays. In Proceedings of the 20th International Conference on Real-Time and Network Systems, RTNS '12, pages 161–170, New York, NY, USA, 2012. ACM.

[LAD14] Will Lunniss, Sebastian Altmeyer, and Robert Davis. A Comparison between Fixed Priority and EDF Scheduling accounting for Cache Related Pre-emption Delays. Leibniz Transactions on Embedded Systems, 1(1):01:1–01:24, 2014.

[LAMD13] W. Lunniss, S. Altmeyer, C. Maiza, and R.I. Davis. Integrating cache related pre-emption delay analysis into EDF scheduling. In Real-Time and Embedded Technology and Applications Symposium (RTAS), 2013 IEEE 19th, pages 75–84, April 2013.

[LEJ15] Philippe Louvel, Pierre Ezerzere, and Philippe Jourdes. Systèmes électroniques embarqués et transports. Dunod, 2015.

[Lev09] David Levinthal. Performance Analysis Guide for Intel® Core™ i7 Processor and Intel® Xeon™ 5500 processors. Technical report, Intel, 2009.

[LFM08] P. Lokuciejewski, H. Falk, and P. Marwedel. WCET-driven cache-based procedure positioning optimizations. In Real-Time Systems, 2008. ECRTS '08. Euromicro Conference on, pages 321–330, July 2008.

[LGY+15] Mingsong Lv, Nan Guan, Wang Yi, Jan Reineke, and Reinhard Wilhelm. A survey on cache analysis for real-time systems, 2015.

[LHH97] J. Liedtke, H. Hartig, and M. Hohmuth. OS-controlled cache predictability for real-time systems. In Real-Time Technology and Applications Symposium, 1997. Proceedings., Third IEEE, pages 213–224, Jun 1997.

[LHS+97] Chang-Gun Lee, Joosun Hahn, Yang-Min Seo, Sang Lyul Min, Rhan Ha, Seongsoo Hong, Chang Yun Park, Minsuk Lee, and Chong Sang Kim. Enhanced analysis of cache-related preemption delay in fixed-priority preemptive scheduling. In Real-Time Systems Symposium, 1997. Proceedings., The 18th IEEE, pages 187–198, Dec 1997.

[LHS+98] Chang-Gun Lee, Joosun Hahn, Yang-Min Seo, Sang Lyul Min, Rhan Ha, Seongsoo Hong, Chang Yun Park, Minsuk Lee, and Chong Sang Kim. Analysis of cache-related preemption delay in fixed-priority preemptive scheduling. Computers, IEEE Transactions on, 47(6):700–713, Jun 1998.

[LL73] C.L. Liu and James W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM, 20(1):46–61, January 1973.

[LLS07] Insup Lee, Joseph Y-T. Leung, and Sang H. Son. Handbook of Real-Time and Embedded Systems. CRC Press, 2007.

[LLX09] Tiantian Liu, Minming Li, and C.J. Xue. Minimizing WCET for Real-Time Embedded Systems via Static Instruction Cache Locking. In Real-Time and Embedded Technology and Applications Symposium, 2009. RTAS 2009. 15th IEEE, pages 35–44, April 2009.


[LLX12] Tiantian Liu, Minming Li, and ChunJason Xue. Instruction cache locking for multi-task real-time embedded systems. Real-Time Systems, 48(2), 2012.

[LM80] Joseph Y-T. Leung and M.L. Merrill. A note on preemptive scheduling of periodic, real-time tasks. Information Processing Letters, 11(3):115–118, 1980.

[LM95] Yau-Tsun Steven Li and Sharad Malik. Performance analysis of embedded software using implicit path enumeration. In Proceedings of the ACM SIGPLAN 1995 Workshop on Languages, Compilers, & Tools for Real-time Systems, LCTES '95, pages 88–98, 1995.

[LS99] T. Lundqvist and P. Stenstrom. Timing anomalies in dynamically scheduled microprocessors. In Real-Time Systems Symposium, 1999. Proceedings. The 20th IEEE, pages 12–21, 1999.

[LS12] Jinkyu Lee and K.G. Shin. Controlling preemption for better schedulability in multi-core systems. In Real-Time Systems Symposium (RTSS), 2012 IEEE 33rd, pages 29–38, Dec 2012.

[LS14] Jinkyu Lee and K.G. Shin. Preempt a Job or Not in EDF Scheduling of Uniprocessor Systems. Computers, IEEE Transactions on, 63(5):1197–1206, May 2014.

[LW82] Joseph Y.-T. Leung and Jennifer Whitehead. On the complexity of fixed-priority scheduling of periodic, real-time tasks. Performance Evaluation, 2(4):237–250, 1982.

[Mai78] David Maier. The complexity of some problems on subsequences and supersequences. J. ACM, 25(2):322–336, April 1978.

[MB91] Jeffrey C. Mogul and Anita Borg. The effect of context switches on cache performance. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS IV, pages 75–84, 1991.

[MBB+15] A. Melani, M. Bertogna, V. Bonifaci, A. Marchetti-Spaccamela, and G. C. Buttazzo. Response-Time Analysis of Conditional DAG Tasks in Multiprocessor Systems. In 2015 27th Euromicro Conference on Real-Time Systems, pages 211–221, July 2015.

[MNPP12a] J.M. Marinho, V. Nelis, S.M. Petters, and I. Puaut. An improved preemption delay upper bound for floating non-preemptive region. In Industrial Embedded Systems (SIES), 2012 7th IEEE International Symposium on, pages 57–66, June 2012.

[MNPP12b] J.M. Marinho, V. Nelis, S.M. Petters, and I. Puaut. Preemption delay analysis for floating non-preemptive region scheduling. In Design, Automation Test in Europe Conference Exhibition (DATE), 2012, pages 497–502, March 2012.

[Mok83] Aloysius K. Mok. Fundamental design problems of distributed systems for the hard-real-time environment. PhD thesis, Massachusetts Institute of Technology, 1983.

[MSH11] John W. McCormick, Frank Singhoff, and Jérôme Hugues. Building Parallel, Embedded, and Real-Time Applications with Ada. Cambridge University Press, 2011.


[Mue95] Frank Mueller. Compiler support for software-based cache partitioning. In Proceedings of the ACM SIGPLAN 1995 Workshop on Languages, Compilers, & Tools for Real-time Systems, LCTES '95, pages 125–133, 1995.

[Mue00] Frank Mueller. Timing analysis for instruction caches. Real-Time Systems, 18(2-3), 2000.

[Nel12] Geoffrey Nelissen. Efficient Optimal Multiprocessor Scheduling Algorithms for Real-Time Systems. PhD thesis, Ecole polytechnique de Bruxelles de l'Université libre de Bruxelles (ULB), 2012.

[NMR03] Hemendra Singh Negi, Tulika Mitra, and Abhik Roychoudhury. Accurate estimation of cache-related preemption delay. In Proceedings of the 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS '03, pages 201–206, 2003.

[NRG15] Thi Huyen Chau Nguyen, P. Richard, and E. Grolleau. An FPTAS for Response Time Analysis of Fixed Priority Real-Time Tasks with Resource Augmentation. Computers, IEEE Transactions on, 64(7):1805–1818, July 2015.

[PC07] R. Pellizzoni and M. Caccamo. Toward the predictable integration of real-time COTS based systems. In Real-Time Systems Symposium, 2007. RTSS 2007. 28th IEEE International, pages 73–82, Dec 2007.

[Pet00] S.M. Petters. Bounding the execution time of real-time tasks on modern processors. In Real-Time Computing Systems and Applications, 2000. Proceedings. Seventh International Conference on, pages 498–502, 2000.

[PFB14] Bo Peng, N. Fisher, and M. Bertogna. Explicit Preemption Placement for Real-Time Conditional Code. In Real-Time Systems (ECRTS), 2014 26th Euromicro Conference on, pages 177–188, July 2014.

[PLM09] Sascha Plazar, Paul Lokuciejewski, and Peter Marwedel. WCET-aware software based cache partitioning for multi-task real-time systems. In Proceedings of the International Workshop on Worst-Case Execution Time Analysis, pages 78–88, 2009.

[PP07] I. Puaut and C. Pais. Scratchpad memories vs locked caches in hard real-time systems: a quantitative comparison. In Design, Automation Test in Europe Conference Exhibition, 2007. DATE '07, pages 1–6, April 2007.

[PR15] Guillaume Phavorin and Pascal Richard. Cache-Related Preemption Delays and Real-Time Scheduling: A Survey for Uniprocessor Systems. Research Report no. 3, LIAS, Université de Poitiers, 2015. 18p.

[PRG+15] Guillaume Phavorin, Pascal Richard, Joël Goossens, Thomas Chapeaux, and Claire Maiza. Scheduling with Preemption Delays: Anomalies and Issues. In Proceedings of the 23rd International Conference on Real Time and Networks Systems, RTNS '15, pages 109–118, New York, NY, USA, 2015. ACM.


[PRM15a] Guillaume Phavorin, Pascal Richard, and Claire Maiza. Complexity of scheduling real-time tasks subjected to cache-related preemption delays. In Emerging Technologies Factory Automation (ETFA), 2015 IEEE 20th Conference on, pages 1–8, Sept 2015.

[PRM15b] Guillaume Phavorin, Pascal Richard, and Claire Maiza. Complexity of scheduling real-time tasks subjected to cache-related preemption delays. Research Report no. 2, LIAS, Université de Poitiers, 2015.

[PRM15c] Guillaume Phavorin, Pascal Richard, and Claire Maiza. Static CRPD-Aware Real-Time Scheduling. In Work-in-Progress session of the 27th Euromicro Conference on Real-Time Systems (ECRTS'2015), July 2015. 4p.

[Pua02] Isabelle Puaut. Architecture des processeurs et vérification de contraintes de temps-réel strict, 2002.

[Pua06] I. Puaut. WCET-centric software-controlled instruction caches for hard real-time systems. In Real-Time Systems, 2006. 18th Euromicro Conference on, pages 10 pp.–226, 2006.

[RAG+14] J. Reineke, S. Altmeyer, D. Grund, S. Hahn, and C. Maiza. Selfish-LRU: Preemption-aware caching for predictability and performance. In Real-Time and Embedded Technology and Applications Symposium (RTAS), 2014 IEEE 20th, pages 135–144, April 2014.

[Reg02] J. Regehr. Scheduling tasks with mixed preemption relations for robustness to timing faults. In Real-Time Systems Symposium, 2002. RTSS 2002. 23rd IEEE, pages 315–326, 2002.

[Rei08] Jan Reineke. Caches in WCET Analysis. PhD thesis, Universität des Saarlandes, 2008.

[RM06a] H. Ramaprasad and F. Mueller. Bounding preemption delay within data cache reference patterns for real-time tasks. In Real-Time and Embedded Technology and Applications Symposium, 2006. Proceedings of the 12th IEEE, pages 71–80, April 2006.

[RM06b] H. Ramaprasad and F. Mueller. Tightening the bounds on feasible preemption points. In Real-Time Systems Symposium, 2006. RTSS '06. 27th IEEE International, pages 212–224, Dec 2006.

[RM08] H. Ramaprasad and F. Mueller. Bounding worst-case response time for tasks with non-preemptive regions. In Real-Time and Embedded Technology and Applications Symposium, 2008. RTAS '08. IEEE, pages 58–67, April 2008.

[RWT+06] Jan Reineke, Björn Wachter, Stefan Thesing, Reinhard Wilhelm, Ilia Polian, Jochen Eisinger, and Bernd Becker. A Definition and Classification of Timing Anomalies. In Frank Mueller, editor, 6th International Workshop on Worst-Case Execution Time Analysis (WCET'06), volume 4 of OpenAccess Series in Informatics (OASIcs), Dagstuhl, Germany, 2006. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.


[Sch00] J. Schneider. Cache and pipeline sensitive fixed priority scheduling for preemptive real-time systems. In Real-Time Systems Symposium, 2000. Proceedings. The 21st IEEE, pages 195–204, 2000.

[SDG09] G. Schirner, R. Dömer, and A. Gerstlauer, editors. Hardware-dependent Software: Principles and Practice. Springer, 2009.

[SE07] Jan Staschulat and Rolf Ernst. Scalable precision cache analysis for real-time software. ACM Trans. Embed. Comput. Syst., 6(4), September 2007.

[SF99] Jörn Schneider and Christian Ferdinand. Pipeline behavior prediction for superscalar processors by abstract interpretation. In Proceedings of the ACM SIGPLAN 1999 Workshop on Languages, Compilers, and Tools for Embedded Systems, LCTES '99, pages 35–44, 1999.

[SP95] J. Simonson and J.H. Patel. Use of preferred preemption points in cache-based real-time systems. In Computer Performance and Dependability Symposium, 1995. Proceedings., International, pages 316–325, Apr 1995.

[SR90] John A. Stankovic and Krithi Ramamritham. What is predictability for real-time systems? Real-Time Systems, 2(4), 1990.

[SSE05] J. Staschulat, S. Schliecker, and R. Ernst. Scheduling analysis of real-time systems with precise modeling of cache related preemption delay. In Real-Time Systems, 2005. (ECRTS 2005). Proceedings. 17th Euromicro Conference on, pages 41–48, July 2005.

[ST85] Daniel D. Sleator and Robert E. Tarjan. Amortized Efficiency of List Update and Paging Rules. Commun. ACM, 28(2):202–208, February 1985.

[Sta88] J. A. Stankovic. Misconceptions about real-time computing: a serious problem for next-generation systems. Computer, 21(10):10–19, Oct 1988.

[TD00] Hiroyuki Tomiyama and Nikil D. Dutt. Program Path Analysis to Bound Cache-related Preemption Delay in Preemptive Real-time Systems. In Proceedings of the Eighth International Workshop on Hardware/Software Codesign, CODES '00, pages 67–71, New York, NY, USA, 2000. ACM.

[TFW00] Henrik Theiling, Christian Ferdinand, and Reinhard Wilhelm. Fast and precise WCET prediction by separated cache and path analyses. Real-Time Systems, 18(2-3), 2000.

[The02] Henrik Theiling. ILP-based interprocedural path analysis. In Alberto Sangiovanni-Vincentelli and Joseph Sifakis, editors, Embedded Software, volume 2491 of Lecture Notes in Computer Science, pages 349–363. Springer Berlin Heidelberg, 2002.

[The04] Stephan Thesing. Safe and Precise WCET Determination by Abstract Interpretation of Pipeline Models. PhD thesis, Naturwissenschaftlich-Technischen Fakultäten der Universität des Saarlandes, Postfach 151141, 66041 Saarbrücken, 2004.


[TM04] Yudong Tan and Vincent Mooney. Integrated intra- and inter-task cache analysis for preemptive multi-tasking real-time systems. In Henk Schepers, editor, Software and Compilers for Embedded Systems, volume 3199 of Lecture Notes in Computer Science, pages 182–199. Springer Berlin Heidelberg, 2004.

[TM05] Yudong Tan and Vincent J. Mooney, III. WCRT analysis for a uniprocessor with a unified prioritized cache. In Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, LCTES '05, pages 175–182, 2005.

[TM07] Yudong Tan and Vincent Mooney. Timing analysis for preemptive multitasking real-time systems with caches. ACM Trans. Embed. Comput. Syst., 6(1), February 2007.

[TSRB15] Hai-Nam Tran, Frank Singhoff, Stephane Rubini, and Jalil Boukhobza. Addressing cache related preemption delay in fixed priority assignment. In Emerging Technologies Factory Automation (ETFA), 2015 IEEE 20th Conference on, pages 1–8, Sept 2015.

[TY97] Hiroyuki Tomiyama and Hiroto Yasuura. Code placement techniques for cache miss rate reduction. ACM Trans. Des. Autom. Electron. Syst., 2(4):410–429, October 1997.

[VLX03] X. Vera, B. Lisper, and Jingling Xue. Data caches in multitasking hard real-time systems. In Real-Time Systems Symposium, 2003. RTSS 2003. 24th IEEE, pages 154–165, Dec 2003.

[WA12] J. Whitham and N. C. Audsley. Explicit Reservation of Local Memory in a Predictable, Preemptive Multitasking Real-Time System. In Real-Time and Embedded Technology and Applications Symposium (RTAS), 2012 IEEE 18th, pages 3–12, April 2012.

[WEE+08] Reinhard Wilhelm, Jakob Engblom, Andreas Ermedahl, Niklas Holsti, Stephan Thesing, David Whalley, Guillem Bernat, Christian Ferdinand, Reinhold Heckmann, Tulika Mitra, Frank Mueller, Isabelle Puaut, Peter Puschner, Jan Staschulat, and Per Stenström. The worst-case execution-time problem: overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst., 7(3):36:1–36:53, May 2008.

[WGR+09] R. Wilhelm, D. Grund, J. Reineke, M. Schlickling, M. Pister, and C. Ferdinand. Memory hierarchies, pipelines, and buses for future architectures in time-critical embedded systems. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 28(7):966–978, July 2009.

[WGZ15] Chao Wang, Zonghua Gu, and Haibo Zeng. Integration of Cache Partitioning and Preemption Threshold Scheduling to Improve Schedulability of Hard Real-Time Systems. In Real-Time Systems (ECRTS), 2015 27th Euromicro Conference on, pages 69–79, July 2015.

[Wil97] David Wilner. Vx-Files: What Really Happened on Mars. In Keynote at the 18th IEEE Real-Time Systems Symposium (RTSS'97), volume 41, 1997.


[WMH+97] R.T. White, F. Mueller, C.A. Healy, D.B. Whalley, and M.G. Harmon. Timing analysis for data caches and set-associative caches. In Real-Time Technology and Applications Symposium, 1997. Proceedings., Third IEEE, pages 192–202, Jun 1997.

[Wol94] Andrew Wolfe. Software-based cache partitioning for real-time applications. J. Comput. Softw. Eng., 2(3):315–327, March 1994.

[WP14] S. Wasly and R. Pellizzoni. Hiding memory latency using fixed priority scheduling. In Real-Time and Embedded Technology and Applications Symposium (RTAS), 2014 IEEE 20th, pages 75–86, April 2014.

[WS99] Yun Wang and M. Saksena. Scheduling fixed-priority tasks with preemption threshold. In Real-Time Computing Systems and Applications, 1999. RTCSA '99. Sixth International Conference on, pages 328–335, 1999.

[WTA14] Bryan C. Ward, Abhilash Thekkilakattil, and James H. Anderson. Optimizing preemption-overhead accounting in multiprocessor real-time systems. In Proceedings of the 22nd International Conference on Real-Time Networks and Systems, RTNS '14, pages 235:235–235:243, 2014.

[XP93] Jia Xu and D.L. Parnas. On satisfying timing constraints in hard-real-time systems. Software Engineering, IEEE Transactions on, 19(1):70–84, Jan 1993.

[YBB10] Gang Yao, G. Buttazzo, and M. Bertogna. Feasibility Analysis under Fixed Priority Scheduling with Fixed Preemption Points. In Embedded and Real-Time Computing Systems and Applications (RTCSA), 2010 IEEE 16th International Conference on, pages 71–80, Aug 2010.

[YBB11] Gang Yao, Giorgio Buttazzo, and Marko Bertogna. Feasibility analysis under fixed priority scheduling with limited preemptions. Real-Time Systems, 47(3):198–223, 2011.

[YS07] P.M. Yomsi and Y. Sorel. Extending Rate Monotonic Analysis with Exact Cost of Preemptions for Hard Real-Time Systems. In Real-Time Systems, 2007. ECRTS '07. 19th Euromicro Conference on, pages 280–290, July 2007.


Résumé

L'utilisation de composants sur étagère, notamment de processeurs avec mémoires cache, se répand dans les systèmes embarqués temps réel, même les plus critiques tels que les avions. Les applications embarquées critiques étant soumises à des contraintes temporelles très strictes, l'ordonnancement temps réel vise à garantir qu'aucune tâche dans le système ne violera son échéance. Mais l'utilisation d'un cache peut entraîner l'apparition de délais supplémentaires appelés Délais de Préemption dus au Cache (en anglais Cache-Related Preemption Delays, crpds) pouvant compromettre l'intégrité du système. La plupart des travaux existants visent soit à réduire les crpds soit à les borner afin d'améliorer la prédictibilité du système, mais peu d'entre eux se sont attaqués directement au problème d'ordonnancer optimalement des tâches temps réel sur un système monoprocesseur avec mémoires cache. Cette thèse s'intéresse au problème général d'ordonnancement tenant compte des effets du cache. Nous identifions deux problèmes d'ordonnancement différents et prouvons qu'ils sont tous deux NP-difficiles au sens fort. Puis, nous étudions l'impact des crpds sur des ordonnanceurs en ligne tels que Rate Monotonic (rm) et Earliest Deadline First (edf) et montrons qu'ils ne sont plus viables. Nous prouvons aussi que l'ordonnancement en ligne optimal tenant compte des crpds est impossible. Nous proposons donc une approche optimale hors ligne, en utilisant la programmation linéaire.

Mots-clés : systèmes embarqués temps réel, ordonnancement temps réel dur, monoprocesseur, cache, Délais de Préemption dus au Cache, ordonnancement en-ligne, ordonnancement hors-ligne, programmation linéaire, complexité, viabilité, Rate Monotonic (rm), Deadline Monotonic (dm), Earliest Deadline First (edf), ordonnancement optimal

Abstract

Off-The-Shelf components, in particular processors with cache memories, are increasingly used in real-time embedded systems, even in critical systems such as airplanes. Critical embedded applications are subjected to very strict timing constraints. Real-time scheduling aims to ensure that every task in the system can be executed without missing a deadline. But because of the use of cache memories, additional delays, known as Cache-Related Preemption Delays (crpds), might occur as soon as multiple tasks can run on the same processor. Those crpds may cause a task to miss its deadline and so jeopardize the system integrity. Most of the existing work focuses on either reducing the crpds or improving the system predictability by bounding the crpds. But not much has been done concerning the problem of optimally scheduling real-time tasks on uniprocessor systems with cache memories. This PhD work focuses on the general problem of taking scheduling decisions while accounting for cache effects. Two different scheduling problems, the cache-aware scheduling problem and the crpd-aware scheduling problem, are identified and proved to be NP-hard in the strong sense. Then, the impact of crpds on classic online scheduling policies such as Rate Monotonic (rm) and Earliest Deadline First (edf) is studied. We show in particular that neither rm nor edf is sustainable when crpds are accounted for. Moreover, we prove that optimal online scheduling is impossible for sporadic tasks subjected to crpds. So, we propose an optimal offline scheduling approach, using mathematical programming.

Keywords: real-time embedded systems, hard real-time scheduling, uniprocessor, cache, Cache-Related Preemption Delays, online scheduling, offline scheduling, linear programming, computational complexity, sustainability, Rate Monotonic (rm), Deadline Monotonic (dm), Earliest Deadline First (edf), optimal scheduling

Secteur de recherche : Informatique et applications

LABORATOIRE D'INFORMATIQUE ET D'AUTOMATIQUE POUR LES SYSTEMES
Ecole Nationale Supérieure de Mécanique et d'Aérotechnique

Téléport 2 – 1, avenue Clément Ader – BP 40109 – 86961 Chasseneuil-Futuroscope Cedex
Tél : 05.49.49.80.63 – Fax : 05.49.49.80.64