
Video Enhancement Using Content-adaptive Least Mean Square Filters

Meng Zhao


Video Enhancement Using Content-adaptive Least Mean Square Filters

DISSERTATION

to obtain the degree of doctor at the Technische Universiteit Eindhoven, on the authority of the Rector Magnificus, prof.dr.ir. C.J. van Duijn, to be defended in public before a committee appointed by the Doctorate Board on Thursday 15 June 2006 at 16.00

by

Meng Zhao

born in Shaanxi, China


This thesis has been approved by the promotors:

prof.dr.ir. G. de Haan
prof.dr.ir. R.H.J.M. Otten

Advanced School for Computing and Imaging

This work was carried out in the ASCI graduate school. ASCI dissertation series number 126.

CIP-DATA LIBRARY TECHNISCHE UNIVERSITEIT EINDHOVEN

Zhao, Meng

Video enhancement using content-adaptive least mean square filters / by Meng Zhao. - Eindhoven : Technische Universiteit Eindhoven, 2006. Proefschrift. - ISBN 90-386-0774-1 - ISBN 978-90-386-0774-0
NUR 959
Trefw.: video techniek / digitale filters
Subject headings: video signal processing / adaptive filters


Acknowledgments

This thesis is not the result of my individual work alone; it embodies a great deal of support and care from my colleagues, friends and family.

First of all, I would like to express my gratitude to Prof. Gerard de Haan and Prof. Ralph Otten, who provided me with the opportunity for this Ph.D. position. My deep thanks go to Prof. Gerard de Haan; I will never forget the help and guidance I received from him during the last four years.

I would also like to thank my colleagues in the ES group of the Technische Universiteit Eindhoven. My thanks go to Rian van Gaalen and Marja de Mol-Regels for all their support, which started even before I moved to the Netherlands. Thanks to Qin Zhao for providing me with a vivid example of finishing a Ph.D. thesis step by step. I would also like to thank Sander Stuijk and Valentin Gheorghita, whom I could always ask for help when my computer broke down. Thanks to Alexander Beric, Amir Ghamarian and Hao Hu, for all the interesting talks we had together over the last few years.

My thanks also go to the colleagues of the Video Processing and Visual Perception group at Philips Research Laboratories Eindhoven. Thanks to Marco Bosma, Paul Hofman, Calina Ciuhu and Harm van de Heijden for their help in carrying out some of the experiments and in reviewing my papers. Thanks to Roos Rajae-Joordens, Stefan Swinkels, Arnold van Keersop and Pieter Seuntiens for helping me set up several visual assessment experiments. Thanks to the group members who were bothered several times for my subjective image quality assessment experiments.

I would also like to thank my friends, Tao Jiang and Bin Yin, for all the help I received in my daily life.

Last but not least, I thank my family for all their support, especially my wife, Jing Duan. She joined me in the Netherlands just as I started writing. Her love, care and support encouraged me to go ahead.


Summary

The current television system is evolving towards an entirely digital, high-resolution and high-picture-rate broadcasting system. As compatibility with the past weighs heavily for a popular consumer service, this evolution progresses rather slowly. The progress is particularly slow compared to the revolution we observe in modern display technology, which in the last decade eliminated the traditional bottleneck in television picture quality. Methods that can bridge the resulting gap between broadcast and display picture quality are therefore increasingly important, and they may profit from the rapid developments in semiconductor technology. This last trend enables steadily increasing processing power at a given price level and makes the implementation of complex video enhancement algorithms feasible in consumer equipment.

We started the exploration documented in this thesis by reviewing the image and video resolution enhancement techniques proposed in scientific papers and in the patent literature. This review revealed two interesting approaches, both derived from image restoration theory: the Least Mean Square (LMS), or Wiener, filtering approach and the Bayesian image restoration approach.

We focused on the LMS filtering methods, as Kondo's classification-based LMS filtering approach to video resolution up-conversion in particular attracted our attention, due to its high performance and relatively simple hardware implementation. Li's localized LMS filtering approach to resolution up-conversion initially served as an alternative in a comparative study.

We recognized, though, that Kondo's classification-based LMS filtering outperforms Li's localized LMS filtering and, moreover, can be used in a broad range of video enhancement applications. Examples of such applications elaborated in this thesis include resolution up-conversion, de-interlacing, chrominance resolution up-conversion and coding artefact reduction.

Furthermore, we showed that although the LMS filtering approaches are based on objective metrics, particularly the MSE criterion, it is still possible, for video up-scaling, to design subjectively more pleasing filters using the LMS metric.

We had to conclude that a direct mapping of the content-adaptive LMS filtering to the problem of intra-field de-interlacing does not outperform all existing intra-field methods.


However, the classification-based LMS filtering approach was shown to successfully replace the heuristics in two other de-interlacing designs: vertical-temporal filtering, and the data mixing in hybrid de-interlacing combining motion-compensated and edge-adaptive techniques.

Finally, we investigated the classification-based LMS filtering approach for chrominance resolution enhancement and coding artefact reduction. In the chrominance up-conversion problem, we combined luminance and chrominance information in an innovative classification of the local image patterns, leading to an improved performance compared to earlier designs. In coding artefact reduction, using the relative position of the pixel inside the coding block, along with the local image structure, for classification was shown to give an interesting performance and an elegant optimisation.

Looking back, this thesis reveals a more general usage of LMS filtering for video enhancement, which can avoid many of the heuristic optimisations that are commonly seen in this area.


Samenvatting

Television is evolving into an entirely digital medium, in which pictures are transmitted at a high resolution and a high picture rate. For a popular service, compatibility is very important; more particularly, television owners do not appreciate their set becoming obsolete quickly, which has caused this development to progress rather slowly. This certainly holds in comparison with the revolution that has taken place in displays: modern displays no longer form the limitation for achieving television with a high picture quality. Interest in methods that can close the gap between broadcast and display quality is therefore currently large. These methods can profit from the rapid progress in semiconductor technology, which has led to steadily increasing computing power at a constant price, so that the implementation of even very complex algorithms for picture quality improvement becomes feasible in consumer electronics.

We began our research with an inventory of resolution up-scaling techniques as found in scientific publications and in the patent literature. Each of these sources brought an interesting approach to light, both derived from theoretical concepts in image restoration: the Least Mean Square (LMS), or Wiener, filtering approach and Bayes' technique elaborated for image restoration.

We focused on LMS filtering, because Kondo's classification-based approach to LMS filtering for video resolution up-conversion in particular drew our attention with its high quality and relatively simple hardware implementation. Li's localized LMS filtering approach for up-scaling video resolution served as the alternative in a comparative study.

It turned out that Kondo's method clearly outperforms Li's, and that the former is moreover deployable in a very broad range of applications. Examples of such applications, elaborated in this thesis, include resolution up-scaling, de-interlacing, up-scaling of the colour signals and coding artefact reduction.

We have also shown that although LMS filtering is based on objective quality metrics, in particular on the MSE criterion, which often correlates poorly with the perceived picture quality, it is still possible to design subjectively better filters for video up-scaling.

We had to conclude that a direct translation of content-adaptive LMS filtering to the problem of intra-field de-interlacing does not outperform all existing intra-field methods. However, the classification-based method did prove successful in our attempt to replace the heuristics in two other de-interlacing designs: vertical-temporal filtering and hybrid de-interlacing, in which motion compensation is combined with direction-dependent interpolation techniques.

Finally, we investigated classification-based LMS optimisation for enhancing the resolution of the colour signals and for compression artefact reduction. For colour up-scaling we used local luminance and chrominance information in an innovative combined structure classification, which has led to superior results compared with earlier designs. In reducing compression artefacts, a classification based on both the relative position of the pixel within the coding block and the local image structure led to good results and an elegant optimisation methodology.

In retrospect, this thesis demonstrates the possibility of a more general application of LMS optimisation in the development of video processing algorithms, whereby the heuristics frequently used in this field can be avoided.


Contents

Acknowledgments

Summary

Samenvatting

1 Introduction
  1.1 Trends in video
    1.1.1 The evolution from analog to digital
    1.1.2 The revolution in display technology
    1.1.3 The evolution of processing power
    1.1.4 The convergence of TV and PC
    1.1.5 Towards TV on mobile
  1.2 Challenges in digital video processing
    1.2.1 Video scanning format conversion
    1.2.2 Progressive scan conversion
    1.2.3 Resolution up-conversion
    1.2.4 Digital video compression
  1.3 Objectives and research overview
    1.3.1 Original goal
    1.3.2 Problems
    1.3.3 Opportunities
  1.4 About this thesis
    1.4.1 Main contributions
    1.4.2 Outline

2 Overview of Video Resolution Up-conversion
  2.1 Linear up-scaling algorithm
  2.2 Advanced resolution up-conversion algorithms
    2.2.1 Kondo et al. (TK)
    2.2.2 Atkins et al. (CBA)
    2.2.3 Plaziac et al. (NP)
    2.2.4 Li et al. (XL)
    2.2.5 Greenspan et al. (HG)
    2.2.6 Tegenbosch et al. (JT)
  2.3 Evaluation
    2.3.1 Objective evaluation
    2.3.2 Subjective evaluation
    2.3.3 Results and comparison
  2.4 Conclusion

3 Refinements of Resolution Enhancement
  3.1 Wiener filtering
    3.1.1 An image restoration model
    3.1.2 Image Restoration
    3.1.3 Video up-conversion
  3.2 Improving video resolution up-conversion
    3.2.1 Low-pass Anti-Alias Filtering
    3.2.2 Interpolation aperture enlargement
    3.2.3 Temporal optimisation window extension
    3.2.4 A single-pass XL algorithm
    3.2.5 Results and evaluation
  3.3 Subjective optimal LMS filters
    3.3.1 Classification-based LMS filter
    3.3.2 Localized LMS filter
  3.4 Conclusion

4 Video De-interlacing
  4.1 Content-adaptive intra-field de-interlacing
    4.1.1 Kondo's classification-based LMS filtering approach for de-interlacing
    4.1.2 Li's localized LMS filtering approach for de-interlacing
    4.1.3 Atkins' method adapted for de-interlacing
  4.2 Adaptive VT-filters with classification
  4.3 Classification and hybrid de-interlacing
  4.4 Evaluation
    4.4.1 Test sequences
    4.4.2 Results
  4.5 Conclusion

5 Chrominance Resolution Up-conversion
  5.1 Classification-based chrominance up-conversion
    5.1.1 Aperture definition
    5.1.2 Classification using ADRC
    5.1.3 Class inversion
    5.1.4 Least Mean Square algorithm for training
  5.2 Evaluation
    5.2.1 Linear up-scaling
    5.2.2 Test images
  5.3 Objective comparison – MSE analysis
    5.3.1 Subjective comparison
  5.4 Conclusion

6 Image and Video De-blocking
  6.1 De-blocking algorithms
    6.1.1 Classification of structure
    6.1.2 Classification of relative position
    6.1.3 Classification of structure and relative position
  6.2 Results and Evaluation
    6.2.1 Implementation of the training
    6.2.2 Test material
    6.2.3 Performance measurement
    6.2.4 Results and Comparisons
  6.3 Conclusion

7 Conclusions and Future Work
  7.1 Concluding remarks
  7.2 Future work
    7.2.1 Non-linear versus linear
    7.2.2 Optimisation based on subjective metric

Curriculum Vitae


Chapter 1

Introduction

The history of video processing starts when image scanning, the fundamental principle enabling television, was first described in the 1840s by Bakewell [1]. It took almost until the end of the 19th century before Nipkow, in 1883, converted the principle into the first practical system using mechanical scanning [1]. Around 1926, Baird in London and Jenkins in Washington independently gave the first demonstrations of actual television, using Nipkow's invention. By 1935, the electronic scanning system, invented by Braun in 1897 [1], had become mature for the receiving end of the television system, and devices implementing transmission standards had been developed [2]. The British Broadcasting Company (BBC) started the first regular black and white TV broadcast in 1936 [3], but it was only after World War II that television broadcasting became popular.

In 1950, a colour television transmission system was developed in the United States, which led to the NTSC standard (National Television System Committee) and US colour TV broadcasting in 1953. The European alternatives, SECAM (Séquentiel Couleur à Mémoire) developed in France and PAL (Phase Alternating Line) developed in Germany, overcame the sensitivity to phase errors in the transmission path of the NTSC system. Broadcasts based on SECAM and PAL have been in use since 1967 [2]. The early black and white TV channels occupied approximately 6 to 8 MHz of transmission bandwidth. To introduce the colour signal in a compatible fashion, a sub-carrier was added within the least visible part of the video spectrum, modulated by the bandwidth-reduced chrominance signal [4]. By 1970, regular colour TV broadcasts had started in most European countries.

After over half a century of analog television systems, a revolution triggered by digital technology is changing the area of video processing profoundly. Digital technology enables many new applications, particularly because it significantly simplifies the storage and delay of signals, but it also creates new problems. Furthermore, the introduction of personal computers, game consoles and digital recording provides new platforms for digital video and generates new challenges for video processing research.


More recently, revolutionary developments in display technology, in which the traditional scanning electron beam hitting a light-emitting phosphor screen is replaced by a flat array (matrix) of cells that individually generate or modulate light, provide large-area, high-resolution, bright displays with a perfect geometry [5, 6], unequalled by the earlier display technology. The performance of these displays sets new goals for video processing.

In the following, we shall first review some technological trends and application domains and reveal the most interesting challenges in modern video systems. Given the scope of the thesis, we shall focus on challenges in the area of image enhancement.

1.1 Trends in video

Digital signal processing did not instantaneously change the video chain, but rather faded into the television receiver, with islands of digital signal processing for separate functions to improve the picture quality, such as noise reduction and scan-rate conversion [7]-[9], or to extend the functionality, such as Picture-in-Picture (PIP) [10] and Internet browsing [11]. To date, the process has not ended yet, as that would imply that digital processing covers every aspect, including capture, transmission and storage. Clearly, the impact of digital television is more significant than simply moving from an analog to a digital transmission system. It permits a number of program elements, including video, audio and data, that match the services selected by the consumer [12]. The developments in modern display technology, semiconductor manufacturing and digital communication networks stimulate the evolution of the digital television system and make it feasible.

1.1.1 The evolution from analog to digital

With the existing analog television broadcasting systems, including NTSC, PAL and SECAM, the video, audio and some limited data information are conveyed by modulating a Radio Frequency (RF) carrier [12]. Digital transmission has advantages over the analog system in many respects, such as stability, ease of integration, flexibility and high quality [13, 14], while digital video signals can be relayed over a long distance at an extremely low bit error rate [15]. However, there are drawbacks as well.

The raw digital video material requires a much higher transmission bandwidth and storage capacity than its analog equivalent. Digital coding techniques have been developed to fit the digital video signal into the available transmission channels, or to record it on a single Digital Versatile Disc (DVD) [16], but they consequently degrade the picture quality [17, 18]. New algorithms are required for the reduction of digital compression artefacts, which are caused by the heavy compression necessary in some application domains [18, 19].

Figure 1.1: The digital video transmission system, which is composed of several parts including source coding and compression, service and data multiplexing, channel coding, modulation and the transmission media.

The general digital video transmission system is depicted in Figure 1.1. Source coding is used to squeeze the data into transmission media with different bandwidths, or into a limited storage capacity [20, 21]. Channel coding takes the data bit stream and adds auxiliary information that can be used by the receiver to reconstruct the data from the received signal [12, 20]. The advances in source coding and channel coding technologies have led to international standards for digital television transmission. Standards related to source coding, like MPEG-1 and MPEG-2, have been developed by the Moving Picture Experts Group [16, 22]. The Digital Video Broadcasting (DVB) standards define coding schemes for various transmission media, including cable, terrestrial and satellite [15].

1.1.2 The revolution in display technology

The Cathode Ray Tube (CRT) type of display for television applications has evolved over many decades and has become a very mature display technology [12]. The strengths of CRT displays are low cost at a relatively small size, full colour, wide viewing angle, high resolution and quick response. However, the drawbacks of CRT displays are difficult to solve, such as high-voltage operation, excessive weight and volume for large-screen display tubes, geometrical distortions and the compromise between brightness and resolution [12, 23].

The revolution in modern display technology provides us with large-area, high-resolution, light-weight Flat Panel Display (FPD) devices, such as the Liquid Crystal Display (LCD), the Plasma Display Panel (PDP) and the Organic Light-Emitting Diode (OLED). Features of FPD devices such as their large area, high light output at high resolution and absence of geometrical distortion make them very suitable for HDTV and Personal Computer (PC) displays [12]. However, some of the technologies do not support full-colour video, and LCD suffers from a slow response time that causes problems with moving pictures [12].

Figure 1.2: FPD TV demand forecast (in Mpcs). Source: Displaybank quarterly report (LCD TV market trend and cost analysis).

Although not yet perfect, FPD performance and manufacturing technologies are developing fast, leading to a rapidly growing market [6, 12]. Figure 1.2 shows that the demand for CRT displays is steadily decreasing while the market for FPDs is expanding [12]. Displaying traditional standard definition (SD) interlaced video material on the new high-resolution, progressive FPD displays requires advanced video format conversion techniques. Furthermore, video processing techniques are being developed to compensate for the deficiencies of the FPD devices [24, 25].

1.1.3 The evolution of processing power

In 1965, Gordon Moore noted that the complexity of minimum-cost semiconductor components had doubled every year since the first prototype microchip was produced in 1959 [26]. This exponential increase in the number of components on a chip later became known as Moore's Law: the empirical observation that, at the current rate of technological development, the complexity of an integrated circuit with respect to minimum component cost doubles about every 18 months (later revised by Moore to every 24 months) [27]. In its latest form, Moore's Law became widely associated with the claim that computing power at fixed cost doubles every two years. As an example, Figure 1.3 illustrates the exponential increase of the number of transistors Intel integrated into PC microprocessors [28]. This exponential growth implies that the fundamental physical limits of microelectronics are approaching rapidly.

Figure 1.3: Growth of transistor counts for Intel processors (from the 4004 in 1971 to the Itanium 2 in 2004) and Moore's Law, shown against doubling every 18 months and doubling every 24 months.

Observers have concluded from such extrapolations that Moore's Law will probably remain valid for about another decade [28].

If we look at integrated circuits (ICs) for digital television, we can see that their computational power is also increasing rapidly with the advances in semiconductor technology. Figure 1.4 gives examples of video processing ICs for high-end scan-rate conversion with motion compensation (MC) [29, 30]. From 1996 to 2003, the number of transistors on a single IC increased roughly by a factor of ten. This rapidly increasing processing power allows more advanced video processing algorithms to be designed and implemented.


Figure 1.4: Micro-photographs of ICs for digital video processing from Philips Electronics. (A) SAA4991 from 1996, 980,000 transistors. (B) SAA4992 from 1999, 4,000,000 transistors. (C) SAA4998H from 2003, 10,000,000 transistors.


With the advent of HDTV, the number of pixels in the picture increases by a factor of five, which leads to an equivalent increase in the required processing power.

Figure 1.5: A personal-computer based media center system allows the user to perform a variety of tasks, such as playing audio and video, watching TV, and recording, pausing, rewinding and time-shifting favourite shows. (Source: http://www.scottport.com/products/entertainment.htm)

1.1.4 The convergence of TV and PC

A complete convergence of TV and PC systems will be possible as soon as TVs are entirely digital [31]. Given the evolution of technologies like digital storage, transmission and displays, the merging of the TV and the Personal Computer (PC) seems inevitable [32, 33]. On the one hand, TV is extended with interactive or web functionality [11]; on the other hand, the processing power of the PC provides an ideal platform for advanced digital video processing tasks [34]. Home entertainment systems based on multimedia PCs, such as the media center products [35] shown in Figure 1.5, provide an economical solution with maximum flexibility. With the current speed at which the bandwidth of Internet connections increases, the Internet has the potential to become the most important transmission medium for digital video and can therefore be regarded as a catalyst of this convergence [32].

1.1.5 Towards TV on mobile

Mobility is an advanced feature of modern consumer electronics equipment. Advanced high-resolution, small-size, high-quality display devices and ICs with decreasing size and power consumption lead to a fast-growing market for mobile devices. TV service over cellular or broadband wireless networks is intensifying as operators seek new, high-margin applications [36].


Figure 1.6: Television broadcasting on hand-held devices (Source: http://www.dvb.org/).

Philips Semiconductors forecasts that 50 percent of cell-phones will come with television capability by 2013, and that volumes by then will total about 600 million handsets per year, with 300 million featuring TV [37]. Broadcasting standards such as DVB-H (Digital Video Broadcast for Handheld) [38] in Europe and DMB (Digital Multimedia Broadcasting) [39] in South Korea have been developed for mobile television broadcasting. Integrating all the components of a digital TV receiver into a space small enough to fit into a mobile phone or personal digital assistant (PDA) challenges current technology. To meet the low power consumption requirement, simplification of video processing algorithms for high-end applications is preferred.

1.2 Challenges in digital video processing

Considering the trends discussed in the previous section, various challenges in the area of digital video processing are apparent. The main challenges that we shall discuss in this section include digital video compression, video scanning format conversion, interlaced-to-progressive conversion, resolution up-conversion and coding-block artefact reduction.

1.2.1 Video scanning format conversion

The video format here refers to the various scanning schemes of video displays, not to the storage format of the video data file. Basically, a video format determines characteristics such as the number of scanning lines, pixel frequency, aspect ratio, picture refresh rate, colour system and interlace factor.

Page 22: Video enhancement using content-adaptive least mean … · Video Enhancement Using Content-adaptive ... Video enhancement using content-adaptive least ... by reviewing image and video

8 CHAPTER 1. INTRODUCTION

Figure 1.7: Schematic representation of the video scanning format (field retrace, scanning lines, picture refresh rate). The number of scanning lines and the number of pixels per line determine the resolution.

Figure 1.7 illustrates the meaning of these video format parameters. Traditional TV uses an interlaced scanning scheme, while multimedia PCs employ a progressive display format. Many scanning formats have been standardized for newly developed applications, like HDTV, video on mobile devices, video telephony and video conferencing. The co-existence of all these formats creates a need for scanning format conversion techniques, so that video registered in one format can be rendered in another. As discussed in [40], the video scanning format conversion problem can be decomposed into three branches: spatial scaling, de-interlacing and picture rate conversion.

Spatial scaling is basically a straightforward sampling-rate conversion and uses linear filters to change the resolution of the input video to that of the display device. The aim of scaling is to keep the base-band signal as intact as possible and to suppress the repeat spectra. The options are limited by the individual sampling grids in the signal path. An interesting new challenge in this area is to increase the base-band signal content using non-linear methods, which fall into the category of video resolution enhancement. This challenge is highly relevant in the current situation, where the display resolution is less of a bottleneck than the available video resolution.
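To make this distinction concrete, the following minimal Python sketch (assuming a single-channel image stored as a 2-D NumPy array; illustrative only, not one of the methods evaluated in Chapter 2) performs linear spatial up-scaling with a bilinear kernel. It increases the pixel count, but by construction adds no frequency content above the Nyquist limit of the source material:

```python
import numpy as np

def bilinear_upscale(img: np.ndarray, factor: int) -> np.ndarray:
    """Linear spatial up-scaling by bilinear interpolation: a pure
    sampling-rate conversion that cannot extend the base-band."""
    h, w = img.shape
    # Output pixel coordinates expressed on the input sampling grid.
    ys = np.clip(np.arange(h * factor) / factor, 0, h - 1)
    xs = np.clip(np.arange(w * factor) / factor, 0, w - 1)
    y0 = np.minimum(ys.astype(int), h - 2)
    x0 = np.minimum(xs.astype(int), w - 2)
    wy = (ys - y0)[:, None]   # vertical interpolation weights
    wx = (xs - x0)[None, :]   # horizontal interpolation weights
    tl = img[y0][:, x0]       # the four nearest input neighbours
    tr = img[y0][:, x0 + 1]
    bl = img[y0 + 1][:, x0]
    br = img[y0 + 1][:, x0 + 1]
    return ((1 - wy) * ((1 - wx) * tl + wx * tr)
            + wy * ((1 - wx) * bl + wx * br))
```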

Video cameras use a picture rate of 50 or 60 Hz, while movie films are recorded at 24, 25 or 30 Hz. The picture rates of TV and PC displays lie between 50 and 120 Hz [40]. High-quality picture rate conversion methods make use of motion estimation and compensation techniques to predict the missing information. Figure 1.8 illustrates picture rate up-conversion by a factor of two. In case the difference between the input and output rates exceeds 30 Hz, a moving object will be perceived at both the repeated and the expected position, which results in a blurred image.

Figure 1.8: Picture rate up-conversion by a factor of two using picture repetition. The moving ball is perceived at both the repeated and the expected position, leading to a blurred image.

To solve this problem, prediction and interpolation of the object at the expectedposition by means of motion estimation and compensation can be used [40].
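The contrast between the two strategies can be sketched in a few lines of Python (our own illustration, not a method from this thesis; the dense motion field is assumed to be supplied by a separate motion estimator):

```python
import numpy as np

def double_rate_repeat(frames):
    """Rate doubling by repetition: each frame is shown twice, so a
    moving object appears at both the repeated and the expected
    position (perceived as blur/judder, cf. Figure 1.8)."""
    return [f for frame in frames for f in (frame, frame)]

def double_rate_mc(frames, motion):
    """Motion-compensated doubling: interpolate the missing picture
    halfway along the motion trajectory. `motion[n]` is assumed to be
    a dense per-pixel (dy, dx) field from frame n to frame n + 1."""
    out = []
    for n in range(len(frames) - 1):
        dy, dx = motion[n]
        h, w = frames[n].shape
        yy, xx = np.mgrid[0:h, 0:w]
        # Fetch each interpolated pixel from halfway back along the
        # motion vector in frame n (nearest-neighbour fetch).
        sy = np.clip(np.rint(yy - 0.5 * dy).astype(int), 0, h - 1)
        sx = np.clip(np.rint(xx - 0.5 * dx).astype(int), 0, w - 1)
        out += [frames[n], frames[n][sy, sx]]
    out.append(frames[-1])  # the last frame has no successor
    return out
```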

1.2.2 Progressive scan conversion

With interlaced scanning, only half of the scanning lines of each picture are transmitted and reproduced at the TV receiver: the first field consists of the odd scanning lines and the second field of the even scanning lines. It has been shown that interlaced video display matches the demands of the human visual system very well [41]. The interlacing procedure is, however, a complication for many digital processing tasks, and most modern displays cannot handle interlaced signals well.

De-interlacing, or interlaced-to-progressive conversion, therefore doubles the vertical-temporal sampling density to produce a signal suitable for display. Since temporal filtering seriously degrades the quality of moving images, while vertical interpolation cannot give good results because the interlaced signal contains frequencies above the Nyquist limit, de-interlacing is a significant challenge in digital video processing [42]. Figure 1.9 illustrates the de-interlacing process.
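A minimal Python sketch of the simplest intra-field approach, line averaging, illustrates why the problem is hard (purely illustrative, not one of the content-adaptive methods studied in Chapter 4):

```python
import numpy as np

def deinterlace_intra(field: np.ndarray, top_field: bool) -> np.ndarray:
    """Naive intra-field de-interlacing ('line averaging'): put the
    transmitted lines on their grid positions and fill every missing
    line with the mean of its vertical neighbours. Vertical detail
    above the field's Nyquist limit cannot be recovered this way."""
    h, w = field.shape
    frame = np.zeros((2 * h, w), dtype=np.float64)
    have = 0 if top_field else 1          # parity of transmitted lines
    frame[have::2] = field
    for y in range(1 - have, 2 * h, 2):   # the missing lines
        above = frame[max(y - 1, have)]
        below = frame[min(y + 1, 2 * h - 2 + have)]
        frame[y] = 0.5 * (above + below)
    return frame
```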


Figure 1.9: Illustration of the de-interlacing process in video resolution enhancement. (A) is a progressive image portion, (B) is the interlaced version, which contains half of the scanning lines of (A), and (C) is the de-interlaced result obtained from (B).

Table 1.1: Digital video formats for different applications

Video format   Y size           Colour sampling   Frame rate
SMPTE 296M     1280 × 720       4:2:0             24P/30P/60P
SMPTE 295M     1920 × 1080      4:2:0             24P/30P/60I
BT.601         720 × 576/480    4:4:4             60I/50I
BT.601         720 × 576/480    4:2:2             60I/50I
BT.601         720 × 576/480    4:2:0             60I/50I
SIF            352 × 288/240    4:2:0             30P/25P
CIF            352 × 288        4:2:0             30P
QCIF           176 × 144        4:2:0             30P

1.2.3 Resolution up-conversion

Table 1.1 lists various digital video format standards for different applications [43, 44]. In HDTV engineering, the current mainstream resolution standards for luminance are 1920 × 1080 interlaced and 1280 × 720 progressive [45]-[47]. For the various chrominance signal formats, sub-sampling is normally performed with the factors given in Table 1.2 [46]. The co-existence of the low- and high-resolution formats leads to up-conversion needs.

Also, with the introduction of HDTV-capable TV receivers, the transmission of SDTV material does not stop immediately, which calls for up-conversion. A similar situation occurs with PCs, whose screen resolution is higher than required for television. In general, the price of high-resolution screens has come down to a level at which they become affordable, even for TVs that have no HDTV reception.

Consequently, there are various reasons to be interested in resolution up-conversion. In this thesis, we shall distinguish the traditional linear interpolation techniques from the more advanced non-linear methods designed for this purpose by using the terms spatial up-scaling for the former and resolution up-conversion for the latter.

Figure 1.10: Screen-shots of image portions showing (A) the original image and the versions up-scaled by a factor of two in both dimensions using (B) bilinear interpolation and (C) classification-based Least Mean Square (LMS) filters. The resolution enhancement method (C) gives sharper results than bilinear up-scaling.

The spatial up-scaling methods do increase the number of pixels in the picture; however, the high-frequency components in the spectrum of the high-resolution image are still missing. The resolution up-conversion techniques try to estimate the high-frequency part of the spectrum, in an attempt to better exploit the capabilities of the high-resolution screen.

Recently, resolution up-conversion has become an increasingly interesting and important research area, attracting the attention of many communication and consumer electronics companies. Although we shall limit our discussion to SD-to-HD conversion, other video format conversions, like CIF to SD, can profit from the same processing.

Figure 1.10 illustrates the process of resolution up-conversion. The resolution of the input image is increased in both spatial dimensions. Although the result only concerns the spatial resolution, temporal information may be used in the process as well.

Table 1.2: Percentage of each chrominance component resolution with respect to luminance in the horizontal and vertical directions

Image format   Horizontal [%]   Vertical [%]
4:4:4          100              100
4:2:2          50               100
4:2:0          50               50
4:1:1          25               100


1.2.4 Digital video compression

It would require a bandwidth of 40 MHz to transmit an uncompressed Standard Definition TV (SDTV) signal over a cable, or 135 MHz via satellite, which would represent five to six times the bandwidth necessary for the transmission of an analog SDTV signal [4]. In addition, a storage capacity of 40 GB is required for 20 minutes of SDTV digital video [48]. This would make digital technology highly unattractive compared to the analog alternative. Consequently, digital video compression techniques have been investigated thoroughly, which over the last couple of decades has led to the MPEG coding standards that make the transmission and storage of digital video material economically viable.
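A rough back-of-envelope check of the order of magnitude, under our own assumptions of BT.601 4:2:2 sampling at 8 bits per sample (the 40 GB figure of [48] presumably assumes a richer sampling format or a higher bit depth):

```python
# Uncompressed SDTV bit rate, assuming 720x576 luminance, 25 frames/s,
# 4:2:2 chroma sampling, 8 bits per sample (our assumptions):
bits_per_pixel = 8 + 4 + 4                      # Y + Cb + Cr average
bit_rate = 720 * 576 * 25 * bits_per_pixel      # ~166 Mbit/s
storage_20min = bit_rate * 20 * 60 / 8 / 1e9    # ~25 GB
print(f"{bit_rate / 1e6:.0f} Mbit/s, {storage_20min:.0f} GB per 20 min")
```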

In the mid 1980s, the JPEG (Joint Photographic Experts Group) standard for the compression of still images was established [49]. It can also be used for video compression, on the basis that video is a succession of still images, under the name Motion JPEG [49]. The Motion JPEG compression scheme is referred to as intra-picture, or intra, coding, since each picture is coded without reference to other pictures in the video sequence [50]. Later on, hybrid coding schemes that use both temporal prediction and spatial transformation, i.e. inter and intra coding, for redundancy reduction were designed and standardized. Inter coding, which exploits temporal redundancy to improve the coding efficiency, distinguishes video compression from intra image compression standards [50]. These video compression standards, including H.261, MPEG-1, H.263, MPEG-2 and MPEG-4 Part 2 Visual, all aim at achieving an optimal trade-off between bit rate and perceived video quality for various applications [50]. Table 1.3 gives the timeline of the standards with approximate compression ratios and application areas.

The JPEG and MPEG compression standards are block based. As illustrated in Figure 1.11, the MPEG compression scheme splits the image into blocks, each of which contains 8 × 8 pixels.

Table 1.3: Video coding standards history

Time   Standard   Compression ratios   Applications
1989   JPEG       15:1                 Still image compression
1989   H.261      10-100:1             Video telephony
1993   MPEG-1     10-100:1             Digital video storage on CD-ROM
1994   MPEG-2     10-200:1             Digital video broadcasting (DVB), DVD
1995   H.263      10-100:1             Very low bit rate video communication
2000   MPEG-4     10-500:1             DVB, interactive multimedia

Figure 1.11: The MPEG data hierarchy (video sequence, group of pictures, picture, slice, macroblock, 8 × 8 pixel block).

Reversible transform coding of each block, using the Discrete Cosine Transform (DCT), results in 8 × 8 blocks of coefficients that represent the amplitudes of the two-dimensional spatial frequency components of the original pixel block. The DCT provides opportunities for compression, profiting from the fact that the human visual system is not very sensitive to quantization of the finer details in the image. Combined with the fact that most of the energy in a natural image is concentrated in the lower spatial frequency components, this results in many of the higher-frequency coefficients having zero or near-zero values, so that they can be ignored [16]. This saves transmission or storage capacity, while the resulting images after the Inverse Discrete Cosine Transform (IDCT) hardly suffer perceptually, unless of course the quantization is pushed to the extreme. In the MPEG compression scheme, three types of pictures are defined. The intra pictures (I-pictures) are compressed with intra-frame coding using the DCT, the predicted pictures (P-pictures) are coded using motion-compensated prediction from past I- or P-pictures, and the bi-directionally coded pictures (B-pictures) are coded using motion-compensated prediction from past and/or future I- or P-pictures [48]. Figure 1.12 gives an example of the inter-dependent coding among I-, P- and B-pictures.
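The transform-and-quantise step can be mimicked in a few lines of Python (a sketch using SciPy, with a single uniform quantiser step q instead of the perceptually weighted quantisation matrices that JPEG and MPEG actually use):

```python
import numpy as np
from scipy.fft import dctn, idctn

def code_block(block: np.ndarray, q: float) -> np.ndarray:
    """Illustrative intra coding of one 8x8 block: forward DCT,
    uniform quantisation, inverse DCT. For natural images most energy
    sits in the low-frequency coefficients, so coarse quantisation
    zeroes many high-frequency ones -- the source of the compression
    gain and, pushed too far, of blocking artefacts."""
    coeff = dctn(block, norm="ortho")   # 8x8 spatial-frequency amplitudes
    coeff = np.rint(coeff / q) * q      # quantise and de-quantise
    return idctn(coeff, norm="ortho")   # reconstructed pixel block
```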

Since images are divided into small pixel blocks that are compressed independently in the block-based compression scheme, discontinuities appear between the de-compressed image blocks, which degrades the perceived image quality, especially when a high compression ratio is demanded.

Figure 1.12: The I-, P- and B-pictures in an MPEG video sequence (display order I B B P B B P I).

Furthermore, performing enhancement directly on the de-compressed image will enhance these artefacts even further. To avoid this unpleasant result of enhancing the digital artefacts, post-processing after de-compression has been proposed to alleviate them. The challenge, of course, is to balance the conflicting demands of artefact reduction and sharpness enhancement.
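For intuition, a toy Python de-blocker that smooths only across the 8-pixel block boundaries illustrates this balance; the classification-based approach of Chapter 6 is far more selective:

```python
import numpy as np

def deblock_boundaries(img: np.ndarray, block: int = 8,
                       strength: float = 0.5) -> np.ndarray:
    """Toy post-processing de-blocker (our sketch, not the method of
    Chapter 6): low-pass only the pixel pairs straddling a coding-block
    boundary, leaving block interiors -- and hence most sharpness --
    untouched. `strength` in [0, 1] trades artefact reduction against
    blurring of true edges that happen to lie on a block boundary."""
    out = img.astype(np.float64)
    for axis in (0, 1):                  # rows, then columns
        v = np.moveaxis(out, axis, 0)    # view: writes go into `out`
        for b in range(block, v.shape[0], block):
            left, right = v[b - 1].copy(), v[b].copy()
            mean = 0.5 * (left + right)
            v[b - 1] = (1 - strength) * left + strength * mean
            v[b] = (1 - strength) * right + strength * mean
    return out
```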

Figure 1.13 illustrates the importance of digital artefact reduction in video enhancement. Direct sharpness enhancement on an image that contains digital artefacts (B) results in an image with even stronger artefacts (C). Enhancement after digital artefact reduction gives obviously better image quality.

1.3 Objectives and research overview

1.3.1 Original goal

The purpose of video enhancement is to improve the subjective picture quality. In the past, the focus in this field was mainly on sharpness, contrast and colour reproduction improvement, as well as on noise reduction. Typical examples of processing resulting from research in this domain are: peaking, (colour) transient improvement, automatic black control, histogram modification, skin-tone correction and blue-stretch.

More recently, we see an interest in the up-conversion of video material from one resolution level to a higher one, e.g. of standard definition video to high definition video (SD-to-HD conversion). These techniques are currently investigated by consumer electronics companies. Some of the techniques are described only in patents; other techniques are also available in the scientific literature.


Figure 1.13: Blocking artefact reduction. (A) is the original image portion; (B) shows the image after compression and de-compression using standard JPEG, with the quality factor set to 20 (quality scales are not standardized across JPEG programs; in this thesis we use the same setting as the Independent JPEG Group baseline software, in which the best quality is 100; the software is available at http://www.ijg.org/); (C) is the result of direct sharpness enhancement of (B); (D) is the result of de-blocking; and (E) is the result of performing sharpness enhancement on (D). To compensate for the blurring introduced by the printing process, images (B) to (E) were enhanced with a 2D symmetric linear peaking filter.

The assignment of the Ph.D. study was to gather and evaluate what has been achieved in this area, and to propose (synthesize) an optimal technique for SD-to-HD conversion, using known techniques and adding original ideas, taking into account the constraints of implementation in current technologies and keeping in mind the properties of digitally transmitted video, i.e. to prevent the enhancement of MPEG-2 blocking artefacts.

1.3.2 Problems

An important complication in gathering and evaluating the proposals in the resolution up-conversion area was that only some of the techniques are available in the scientific literature; many, among which the most interesting ones, are hidden in patents. Although the patents disclose the essence of the underlying invention, it sometimes requires a combination of inventions to obtain a truly interesting method. Moreover, the optimal parameters may be difficult to obtain, as they are not essential for the invention and are, therefore, often not documented in the patent.


The classification-based LMS filtering algorithm is described in a stack of about 50 patents invented by Kondo and owned by Sony. Extracting the relevant information turned out to be a considerable effort. On the other hand, the systematic analysis of the different approaches to Least Mean Square (LMS) filtering gave a thorough understanding and has therefore resulted in various new applications that might otherwise have remained undiscovered.

Another complication, typical for the area, relates to the goal of video enhancement. Image processing theories and techniques provide a solid basis for contemporary video restoration and enhancement research. Compared to the traditional image restoration problem, which is estimating the real image from the observed one, the final purpose of image and video enhancement is to improve the perceived, i.e. subjective, image quality. As pointed out in [23], the success of video enhancement processing may be a matter of taste.

Although subjective video quality is difficult to quantify and measure, visual assessment can be used to evaluate it. Tremendous efforts have been put into image quality assessment [51]-[61], and many image quality metrics have been proposed during the last three decades, aiming to correlate well with the perceived image quality. The current measurement methods using subjective evaluation are slow, awkward and expensive [58]. Although the simple objective metrics, like the Mean Square Error (MSE) and the Peak Signal-to-Noise Ratio (PSNR), are widely criticized for not correlating well with the perceived image quality, they are still widely used for their convenience and low cost. Moreover, objective metrics can be embedded into an image processing system to optimise the algorithms.
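For reference, the two objective metrics used throughout this thesis are computed as follows (standard definitions, shown in Python):

```python
import numpy as np

def mse(ref: np.ndarray, test: np.ndarray) -> float:
    """Mean Square Error between a reference and a test image."""
    return float(np.mean((ref.astype(np.float64) - test) ** 2))

def psnr(ref: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB: 10*log10(peak^2 / MSE)."""
    return 10.0 * np.log10(peak ** 2 / mse(ref, test))
```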

The image and video enhancement problem is complicated by the fact that accurate models abstracting the characteristics of the image, and reliable measures reflecting the subjective impression, are currently lacking. In this thesis, we shall not investigate new metrics for video quality assessment, but rather design algorithms based on the simple objective metric, the MSE. However, subjective assessments are adopted in some parts to arrive at a comprehensive conclusion.

1.3.3 Opportunities

The simplest and most widely used methods are linear approaches, which we categorized as up-scaling techniques. Resolution up-conversion methods include classification-based image interpolation, neural-network based image interpolation and methods optimised for the best subjective impression with improved sharpness. We avoided methods that use iterative algorithms, which seem problematic in real-time applications, since we are aiming at algorithms that are suitable for consumer TV applications.


To evaluate the selected video up-scaling and resolution up-conversion methods, an objective evaluation using the MSE and a subjective assessment using paired comparison were both carried out to reach a comprehensive and fair conclusion. Among the many reviewed video up-scaling and resolution up-conversion techniques, the Least Mean Square (LMS) filtering techniques attracted our particular interest, because of their elegant theoretical background and the resulting performance. In this category of methods, Kondo's classification-based LMS filtering performs as well as the best algorithms, while its implementation cost is much lower. Li's localized LMS filtering approach eventually turned out to be less good, but also has an elegant background. Both methods adapt the resolution up-conversion filter coefficients to the local image content.

In the classification-based approach, the filter coefficients for each class of local image structure are derived from an off-line training using LMS optimisation. In the localized approach, the interpolation coefficients for resolution up-conversion are derived from an LMS optimisation on the local low-resolution pixel grid, assuming that the edge orientation does not change with scaling. We explored the new opportunity to not only use content-adaptive LMS filtering for resolution up-conversion, but to extend it to also enhance the newly generated high-frequency components of the output HD image, which further improves the perceived sharpness and leads to a subjectively optimal resolution up-conversion system.
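The core of the off-line training stage can be sketched compactly. The following Python sketch assumes 1-bit ADRC classification and a batch least-squares solve per class; it is a simplified reconstruction of the principle, not Kondo's exact implementation:

```python
import numpy as np

def adrc_class(aperture: np.ndarray) -> int:
    """1-bit ADRC: threshold every aperture pixel against the aperture
    mean and pack the bits into a class index (a simplified stand-in
    for the classification used in the thesis)."""
    bits = (aperture >= aperture.mean()).astype(int)
    return int(bits @ (1 << np.arange(bits.size)))

def train_class_filters(apertures: np.ndarray, targets: np.ndarray):
    """Offline training: for every class, solve the least-squares
    problem min_w ||A w - d||^2 over all training pairs of that class.
    `apertures` holds (N, taps) low-resolution input vectors and
    `targets` the (N,) co-sited high-resolution training pixels
    (hypothetical data layout; in practice one coefficient set is
    trained per class and per interpolated pixel position)."""
    buckets = {}
    for a, d in zip(apertures, targets):
        buckets.setdefault(adrc_class(a), []).append((a, d))
    filters = {}
    for c, pairs in buckets.items():
        A = np.array([a for a, _ in pairs])
        d = np.array([d for _, d in pairs])
        filters[c], *_ = np.linalg.lstsq(A, d, rcond=None)
    return filters

def interpolate(filters: dict, aperture: np.ndarray) -> float:
    """Run time: classify the aperture, then apply the class filter."""
    return float(aperture @ filters[adrc_class(aperture)])
```

At run time only a class look-up and a short FIR filter evaluation remain per output pixel, which is what makes the approach attractive for hardware.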

From the success of content-adaptive LMS filtering for resolution up-conversion, we further expected that this technology might be attractive for spatial de-interlacing as well. Unfortunately, we had to conclude that such a direct extension does not provide obvious advantages over traditional intra-field methods. However, the classification-based LMS filtering approach can be used to improve the traditional vertical-temporal filter by introducing content adaptation.

A further opportunity was explored in a generalization of classification-based LMS filtering to solve the data-mixing problem in hybrid de-interlacing, which combines motion-compensated methods with directional intra-field interpolation.

Yet another opportunity was seen in the joint resolution up-conversion of the chrominance and luminance signals. The luminance signal is proposed to facilitate the classification of the chrominance signal and consequently to improve the quality of the chrominance resolution enhancement.

Finally, we explored the options for classification-based LMS filtering applied in de-blocking of severely compressed digital video. New options shall be reported, using for classification not only the local image structure but also the particular position information in the coding block.


1.4 About this thesis

1.4.1 Main contributions

An overview of the current video resolution up-conversion methods, gathered both from scientific literature and patent databases, is given with both subjective and objective evaluations of the reviewed techniques.

From the large stack of patents filed by Sony, we extracted the classification-based LMS filtering technique. A comparative analysis of the classification-based LMS filtering approach and the alternative localized LMS filtering approach gives a profound understanding of content-adaptive LMS filtering for video enhancement.

We proposed a subjectively optimal resolution enhancement algorithm based on objective metric optimisation. Based on the thoroughly analyzed principle of content-adaptive LMS filtering, we extended the use of that principle to various new application areas, such as de-interlacing, chrominance resolution up-conversion and digital artefact reduction.

1.4.2 Outline

After the general introduction in Chapter 1, Chapter 2 gives our overview of the current resolution up-conversion techniques. Objective figures are shown to compare all the reviewed methods, while a subjective comparison of representative methods using paired comparison is given to benchmark the most relevant methods.

Chapter 3 first reviews the standard Wiener filtering approach for image filtering, generalizes this for up-scaling, and then analyses the two LMS-filtering-based resolution up-conversion approaches from the overview. It became clear that the two content-adaptive resolution up-conversion methods, i.e. Kondo's and Li's, are actually two types of Wiener filters specialized to the local content of the image. Based on the analysis, we present some refinements of those content-adaptive LMS filtering methods for resolution up-conversion, aiming at a subjectively optimal solution.

Chapter 4 focuses on the problem of video de-interlacing. De-interlacing methods derived from the resolution up-conversion techniques discussed in Chapter 2 are first presented. Then, the classification-based LMS filtering approach, which gives the best performance/price ratio, is adopted for a content-adaptive Vertical Temporal filtering. A further extension of this approach to hybrid de-interlacing, combining directional interpolation and motion compensated interpolation, is proposed.


The special case of chrominance resolution up-conversion is discussed in Chapter 5. The resolution up-conversion of the chrominance signal is achieved again by applying classification-based LMS filtering. However, for classification, both chrominance and luminance data are shown to be useful.

Chapter 6 demonstrates the reduction of digital coding artefacts, using classification-based LMS filtering.

Finally, in Chapter 7, we draw our conclusions. With the large variety of application examples shown in the previous chapters, this thesis aims to reveal a more general usage of LMS filtering for video enhancement that can avoid the heuristic approaches commonly seen in this area.


Chapter 2

Overview of Video Resolution Up-conversion

In many display applications the cathode ray tube (CRT) is being replaced by a matrix type of display. For television, an important consequence is that the old trade-off between light output and resolution has largely been eliminated. This greatly facilitates the transition from standard-definition television (SDTV) to high-definition television (HDTV). As long as SDTV and HDTV formats exist together, video resolution up-conversion techniques are required to map SDTV signals to HDTV screens¹.

In general, video format conversion is a complex matter due to the tracking capabilities of the human visual system [23]. The complete SDTV to HDTV conversion includes at least de-interlacing and possibly frame-rate conversion, both requiring advanced motion-compensated signal processing. An overview dedicated to de-interlacing can be found in [9][23][42].

In this chapter, we shall focus on the spatial resolution up-conversion problem, assuming progressive video with the correct frame-rate as our input. This problem has been solved theoretically [64, 65], and practical approximations of that theory exist [66]-[69] and have been implemented and assessed in earlier overviews [65][70]-[72]. These implementations, i.e. video-scaler integrated circuits (ICs), are available in all personal computers (PCs) and in most TVs.

More recently, researchers have designed more advanced spatial resolution up-conversion techniques that outperform the popular video scalers based on the linear theory mentioned above [73]-[82]. Some of these methods have been designed to achieve a better performance with an objective metric, usually the mean square error (MSE); other algorithms have been optimised for a superior score in subjective assessments.

¹This chapter is an adaptation of an invited overview paper that will appear in the Proceedings of the 2006 International Symposium, Seminar, and Exhibition of the International Society of Information Display [62]; an extended version has been submitted to the Journal of the International Society of Information Display [63].


Figure 2.1: Frequency property of 1D signal decimation and interpolation. The frequency characteristic of: (A) the original band-limited discrete signal, (B) a low-pass filtered version of signal A, (C) down-sampling of signal B, (D) up-sampling of signal C, and (E) low-pass filtered signal D.


Figure 2.1 depicts the frequency characteristic of the decimation and interpolation process with a 1D band-limited signal. Although samples are recovered after the decimation and interpolation process, the high-frequency part of the original signal is missing. Since the SD video signal can be modelled as the decimation result of an HD video signal, spatial up-scaling of the SD video signal is the reverse process of the 2D decimation. The traditional linear methods aim at finding the optimal low-pass interpolation filter, while the more advanced methods try to recover the missing spectral component.


2.1 Linear up-scaling algorithm

Linear up-scaling can be described as the convolution of a continuous interpolation kernel $h^1(x)$ with the sampled input signal $x_k$:

F_n(x) = \sum_k x_k\, h^1(x - k)    (2.1)

This convolution results in a continuous signal $F_n(x)$. This continuous signal can subsequently be re-sampled to obtain a digital output signal $y_l$. The linear up-scaling algorithms differ only in the way the convolution kernel $h^1(x)$ is obtained. This section will describe some popular convolution kernels and show how these are derived.

A frequently used approach to calculate the convolution kernel is based on (piecewise) polynomial functions. The zero-order filter that can be constructed this way is the so-called "nearest neighbour interpolation kernel", which results in the following 1-D interpolation kernel:

h_1^1(x) = \begin{cases} 1, & -0.5 < x \le 0.5 \\ 0, & \text{else} \end{cases}    (2.2)

This nearest neighbour interpolation filter is not popular in image re-sampling due to its rather low image quality. A better low-cost option is the first-degree piecewise polynomial function that can be derived by convolution of the nearest neighbour kernel with itself:

h_2^1(x) = h_1^1(x) \ast h_1^1(x) = \begin{cases} 1 - |x|, & |x| \le 1 \\ 0, & \text{else} \end{cases}    (2.3)

Because of its low cost, this interpolation filter is very popular for 2D and 3D applications, where it is known as bi-linear and tri-linear interpolation, respectively. The image quality is still rather low due to the discontinuity of the first derivative of the convolution kernel.

There are several methods to construct higher-order piecewise polynomial functions. Using the same approach as described before, a piecewise polynomial of degree n can be derived by the convolution of a piecewise polynomial of degree n − 1 with the nearest neighbour kernel.

The family of filter kernels derived by repeated convolution of the nearest neighbour kernel is known as the B-spline filters. The two most popular members of this family are the third and fourth order B-spline filters, known as the quadratic and cubic B-spline, respectively. As B-spline filters of order three and up are not interpolating, they are not optimal for direct use in Equation 2.1.
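
To make the repeated-convolution construction tangible, the following minimal numpy sketch (our illustration, not part of the reviewed literature; the function name bspline_kernel and the sampling density are assumptions) builds discrete approximations of the kernels of Equations 2.2 and 2.3 and of the higher-degree B-splines:

    import numpy as np

    def bspline_kernel(degree, taps_per_unit=64):
        # Approximate the B-spline kernel of the given degree by repeatedly
        # convolving a sampled nearest-neighbour (box) kernel with itself,
        # following the construction of Equations 2.2 and 2.3.
        box = np.ones(taps_per_unit) / taps_per_unit   # unit-area box kernel
        h = box.copy()
        for _ in range(degree):                        # degree n = (n+1)-fold box
            h = np.convolve(h, box)
        x = (np.arange(h.size) - (h.size - 1) / 2.0) / taps_per_unit
        return x, h * taps_per_unit                    # rescale to the continuous amplitude

    # degree 1 reproduces the triangle (bi-linear) kernel of Equation 2.3;
    # degree 3 gives the (non-interpolating) cubic B-spline
    x1, triangle = bspline_kernel(1)
    x3, cubic = bspline_kernel(3)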

With the generalized interpolation techniques [72], polynomial functions that are not interpolating, but also filter the original data, can be modified into interpolating functions. With this technique, an inverse filter is used to pre-filter the input data, after which the polynomial function is used to calculate the final interpolation result. Consequently, the data interpolated with an nth degree B-spline function is (n − 1) times continuously differentiable, while the original values are retained.

In the digital domain, the pre-filter is an all-pole IIR filter. The combination of the pre-filter with the corresponding B-spline function of degree n is an IIR filter that is known as the cardinal spline of degree n [69]. For higher degrees, these cardinal splines converge strongly to the ideal low-pass filter, i.e. the sinc function. Since the cubic B-spline function is not interpolating, we shall apply the cardinal cubic spline function, i.e. its interpolating counterpart, when we refer to cubic B-spline interpolation. The required IIR pre-filter can often be approximated with a 15- to 20-tap FIR filter [83].

Besides the B-spline functions, other piecewise polynomial functions can be generated. Mitchell and Netravali described a parameterized convolution kernel of a family of cubic convolution kernels that use four neighbouring points [68]:

h_3^1(x) = \frac{1}{6} \begin{cases} (12 - 9B - 6C)|x|^3 + (-18 + 12B + 6C)|x|^2 + (6 - 2B), & |x| \le 1 \\ (-B - 6C)|x|^3 + (6B + 30C)|x|^2 + (-12B - 48C)|x| + (8B + 24C), & 1 \le |x| \le 2 \\ 0, & \text{else} \end{cases}    (2.4)

Here, (B, C) are parameters that can be controlled by the user. With B = 1 and C = 0, the kernel becomes identical to the cubic B-spline. When B = 0, the resulting kernels are interpolating for all values of C. An interpolating cubic spline function that is frequently used for image interpolation is Keys' cubic spline kernel [66], which has the parameters B = 0 and C = 0.5. The cubic spline function with parameters B = 1/3 and C = 1/3 was proposed by Mitchell and Netravali for good-quality image interpolation [68]. Although this cubic-spline function is not interpolating and should hence be combined with a pre-filter to obtain an interpolating kernel, it is normally used in the direct form, like the interpolating kernels. Because this is also the way it was evaluated and proposed by Mitchell and Netravali, we will also use this filter in the direct (non-interpolating) form for evaluation.
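
As an illustration of Equation 2.4, the sketch below (the function name mitchell_netravali is ours) evaluates the parameterized cubic kernel for arbitrary (B, C); the last line shows the four kernel taps needed at the half-sample phase of a factor-two up-scaling:

    import numpy as np

    def mitchell_netravali(x, B=1/3.0, C=1/3.0):
        # Evaluate the parameterized cubic kernel of Equation 2.4.
        # (B, C) = (0, 0.5) gives Keys' kernel; (1, 0) the cubic B-spline;
        # (1/3, 1/3) the kernel recommended by Mitchell and Netravali.
        ax = np.abs(np.asarray(x, dtype=float))
        h = np.zeros_like(ax)
        inner = ax <= 1
        outer = (ax > 1) & (ax <= 2)
        h[inner] = ((12 - 9*B - 6*C) * ax[inner]**3
                    + (-18 + 12*B + 6*C) * ax[inner]**2 + (6 - 2*B)) / 6.0
        h[outer] = ((-B - 6*C) * ax[outer]**3 + (6*B + 30*C) * ax[outer]**2
                    + (-12*B - 48*C) * ax[outer] + (8*B + 24*C)) / 6.0
        return h

    # four taps for interpolating the mid-point between samples (Keys' kernel)
    taps = mitchell_netravali(np.array([-1.5, -0.5, 0.5, 1.5]), B=0, C=0.5)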

For 2D interpolation, the 1D interpolation functions can be applied in the two directions separately. The resulting 2D interpolation kernel will therefore be:

h_n(x, y) = h_n^1(x) \ast h_n^1(y)    (2.5)


Figure 2.2: The training process according to Kondo. The SD video signal is a down-scaled version of the HD signal. Sample pairs are extracted from the training material and the classification of samples is done by ADRC. Using the Least Mean Square algorithm within every class, optimal coefficients are computed and stored in a Look-Up Table (LUT).

2.2 Advanced resolution up-conversion algorithms

In this section, we shall summarize the more recently developed non-linear resolution up-conversion techniques, which aim at outperforming the earlier linear methods. The first two methods fall into a first category that is based on explicit classification of the data in the filter aperture. The third method, using neural nets, also falls into that category but performs an implicit classification. The last two methods fall into a second category: they extend the frequency spectrum obtained by linear up-scaling by adding phase-coherent harmonics to the video signal.

2.2.1 Kondo et al. (TK)

Basically, Kondo's method is a data-dependent interpolation filter [74]. The momentary filter coefficients, during interpolation, depend on the local block content of the image, which can be classified based on the pattern of the block. To obtain the filter coefficients, a training process has to be performed in advance. As shown in Figure 2.2, the training process employs both the HD video and the SD video as training material and uses the Least Mean Squares (LMS) criterion to get the optimal coefficients, which is computationally intensive due to the large number of classes. Fortunately, it needs to be performed only once. In practical systems, classification of luminance blocks can be realized by using Adaptive Dynamic Range Coding (ADRC) [84]. When encoding each pixel into 1 bit, Q, with ADRC:

Q = \left\lfloor \frac{F_{SD} - F_{MIN}}{F_{MAX} - F_{MIN}} + 0.5 \right\rfloor    (2.6)

Here $F_{SD}$ is the luminance value of the SD pixel, and $F_{MAX}$ and $F_{MIN}$ are the maximum and minimum luminance values of the pixels in the classification aperture, respectively; $\lfloor \cdot \rfloor$ is the floor operator.


Figure 2.3: Aperture used in Kondo's and Atkins' [75] methods and in the Neural Network. The white pixels are interpolated HD pixels ($F_{HI}$). The black pixels are SD pixels ($F_{SD}$), with $F_{kl}$ a shorthand notation for $F_{SD}(2(i+2k)+1, 2(j+2l)+1)$, etc. The HD pixel A that corresponds to $F_{HI}(2(i+2), 2(j+2))$ is interpolated using nine SD pixels ($F_{00}$ up to $F_{22}$).

Other classification techniques can be thought of, but the advantage of ADRC is its simple implementation. Using Equation 2.6, the number of classes decreases from $256^9$ to $2^9$ with an aperture containing nine SD pixels, as shown in Figure 2.3.
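
Since ADRC is central to Kondo's method, a small illustrative sketch may help; the helper name adrc_class and the bit-packing order are our assumptions, not part of [74] or [84]:

    import numpy as np

    def adrc_class(aperture):
        # 1-bit ADRC class code (Equation 2.6) for a 3x3 SD aperture.
        # Each pixel is encoded to 0/1 against the mid-level of the local
        # dynamic range; the nine bits are packed into one index (0..511).
        a = np.asarray(aperture, dtype=float).ravel()
        fmin, fmax = a.min(), a.max()
        if fmax == fmin:                     # flat block: all bits zero
            bits = np.zeros(a.size, dtype=int)
        else:
            bits = np.floor((a - fmin) / (fmax - fmin) + 0.5).astype(int)
        return int("".join(map(str, bits)), 2)

    cls = adrc_class([[10, 12, 200], [11, 13, 210], [9, 12, 205]])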

It has been shown in [85] that if the image data is inverted, the coefficients in the LUT should remain the same. By combining the two complementary classes, the size of the LUT is reduced by a factor of two without loss of image quality.

Let $F_{HD}$ be the luminance value of the original (not the up-converted) HD pixels and $F_{HI}$ be the value of the interpolated ones, which is a weighted sum of the nine SD pixels in the interpolation window. The equation to interpolate pixels on position A is:

F_{HI}(2(i+2), 2(j+2)) = \sum_{k=0}^{2} \sum_{l=0}^{2} w_{kl,c}\, F_{SD}(2(i+2k)+1, 2(j+2l)+1)    (2.7)

where $w_{kl,c}$ are the weights for class $c$. Suppose one class contains in total $t$ samples in the training process; then the error of the $p$th interpolation sample is:

e_{p,c} = F_{HD,p} - F_{HI,p} = F_{HD,p} - \sum_{k=0}^{2} \sum_{l=0}^{2} w_{kl,c}\, F_{SD,p}(2(i+2k)+1, 2(j+2l)+1), \quad (p = 1, 2, \ldots, t)    (2.8)

Consequently, the total (squared) error of this class can be expressed as:

e_c^2 = \sum_{p=1}^{t} e_{p,c}^2    (2.9)

To find the minimum, we calculate the first derivative of $e_c^2$ with respect to each $w_{kl,c}$:

\frac{\partial e_c^2}{\partial w_{kl,c}} = \sum_{p=1}^{t} 2 \left( \frac{\partial e_{p,c}}{\partial w_{kl,c}} \right) e_{p,c} = -\sum_{p=1}^{t} 2\, F_{SD,p}(2(i+2k)+1, 2(j+2l)+1)\, e_{p,c}, \quad (k = 0, 1, 2;\ l = 0, 1, 2)    (2.10)

The minimum occurs when the first derivative is zero, which leads to the following equation for each class:

\begin{pmatrix}
X_{00,00} & X_{00,01} & \cdots & X_{00,22} \\
X_{10,00} & X_{10,01} & \cdots & X_{10,22} \\
X_{20,00} & X_{20,01} & \cdots & X_{20,22} \\
\vdots & \vdots & & \vdots \\
X_{22,00} & X_{22,01} & \cdots & X_{22,22}
\end{pmatrix}
\begin{pmatrix} w_{00,c} \\ w_{01,c} \\ w_{02,c} \\ \vdots \\ w_{22,c} \end{pmatrix}
=
\begin{pmatrix} Y_0 \\ Y_1 \\ Y_2 \\ \vdots \\ Y_8 \end{pmatrix}    (2.11)

The coefficients $w_{kl,c}$ can be obtained by solving Equation 2.11 for each class. Here,

X_{kl,qr} = \sum_{p=1}^{t} F_{SD,p}(2(i+2k)+1, 2(j+2l)+1) \cdot F_{SD,p}(2(i+2q)+1, 2(j+2r)+1), \quad (k, q = 0, 1, 2;\ l, r = 0, 1, 2)    (2.12)

and:

Y_{3k+l} = \sum_{p=1}^{t} F_{SD,p}(2(i+2k)+1, 2(j+2l)+1) \cdot F_{HD,p}(2(i+2), 2(j+2)), \quad (k = 0, 1, 2;\ l = 0, 1, 2)    (2.13)
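
For illustration, the per-class optimisation of Equations 2.11-2.13 amounts to accumulating and solving one small linear system per class. The sketch below is our own summary under assumed variable names; a full training repeats this for every ADRC class and stores the resulting weights in the LUT:

    import numpy as np

    def train_class_coefficients(sd_blocks, hd_pixels):
        # Solve the normal equations (2.11)-(2.13) for one class.
        # sd_blocks: t x 9 matrix, each row the nine SD aperture pixels F_SD,p;
        # hd_pixels: t-vector of the corresponding original HD pixels F_HD,p.
        S = np.asarray(sd_blocks, dtype=float)
        y = np.asarray(hd_pixels, dtype=float)
        X = S.T @ S          # the matrix entries of Equation 2.12
        Y = S.T @ y          # the right-hand side of Equation 2.13
        # lstsq is more robust than a direct inverse when X is near-singular
        w, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return w             # the nine LMS-optimal weights w_kl,c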


Figure 2.4: Interpolating process in CBA. The input SD signal is first filtered with a number M of filters; the filter outputs are then multiplied with the probability that the input belongs to the corresponding class.

2.2.2 Atkins et al. (CBA)

In C. B. Atkins's method (CBA) [75], the high-resolution output image results as a weighted sum of the outputs of a number of linear filters based on classification, as shown in Figure 2.4. While in TK ADRC encoding is proposed to obtain the class index, CBA uses the Expectation Maximization (EM) algorithm [86, 87] for classification. The following non-linear transform is used to generate the classification vector $\vec{y}$:

\vec{y} = \begin{cases} \vec{y}\,' \|\vec{y}\,'\|^{-0.75}, & \vec{y}\,' \ne 0 \\ 0, & \text{else} \end{cases}    (2.14)

Here, the vector $\vec{y}\,'$ is an eight-element vector constructed by stacking the eight differences between the center pixel and its neighbours in the classification aperture², which is the same as in TK (Figure 2.3). The current classification vector is compared with the representative vector $\vec{RV}_c$ of the $c$th class to calculate the probability that the current image block belongs to class $c$:

p(c|\vec{y}) = \frac{CW_c \exp\left( \frac{-\|\vec{y} - \vec{RV}_c\|^2}{2\,VAR} \right)}{\sum_{d=0}^{M-1} CW_d \exp\left( \frac{-\|\vec{y} - \vec{RV}_d\|^2}{2\,VAR} \right)}    (2.15)

²The scalar −0.75 in Equation 2.14 has been obtained experimentally by Atkins.


During interpolation, the high-resolution pixel at position A (Figure 2.3) is computed as follows:

F_{HI}(2(i+2), 2(j+2)) = \sum_{c=0}^{M-1} \sum_{k=0}^{2} \sum_{l=0}^{2} \left( a_{c,kl}\, F_{SD}(2(i+2k)+1, 2(j+2l)+1) + b_{c,(i,j)} \right) p(c|\vec{y})    (2.16)

Here, $a$ is a 4 × 9 matrix and $b$ is a 4 × 1 matrix of class $c$, with each row corresponding to the coefficients for interpolating pixels A to D in Figure 2.3, respectively. As Figure 2.4 shows, each input low-resolution image block is first filtered with the filters from each class, the filter outputs are then multiplied with the calculated weighting coefficients $p(c|\vec{y})$, and the results are finally summed together to provide the output high-resolution pixel $F_{HI}$.
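
The complete CBA interpolation path of Equations 2.14-2.16 for a single aperture can be sketched as follows; the function name cba_interpolate and the array layout of the trained parameters (A, b, RV, CW, VAR) are our assumptions, not Atkins' original data structures:

    import numpy as np

    def cba_interpolate(aperture, A, b, RV, CW, VAR):
        # aperture: 3x3 SD block; A: M x 4 x 9 filter banks; b: M x 4 offsets;
        # RV: M x 8 representative vectors; CW: M class weights; VAR: scalar.
        a = np.asarray(aperture, dtype=float).ravel()
        # classification vector (Equation 2.14): centre-to-neighbour differences
        d = np.delete(a - a[4], 4)
        n = np.linalg.norm(d)
        y = d * n ** -0.75 if n > 0 else d
        # soft class membership (Equation 2.15)
        dist2 = np.sum((y - RV) ** 2, axis=1)
        num = CW * np.exp(-dist2 / (2 * VAR))
        p = num / max(num.sum(), 1e-300)
        # weighted sum of the per-class filter outputs (Equation 2.16)
        outputs = A @ a + b          # shape (M, 4): pixels A..D per class
        return p @ outputs           # the four interpolated HD pixels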

The representative vectors $\vec{RV}$, each of which represents a class, the class weights $CW$, indicating the global probability of that class, and the variance $VAR$, which indicates the average distance between each representative vector and the mean of the representative vectors, used in Equation 2.15, are classification parameters, all obtained from a training process that uses the EM algorithm to find the optimal classification parameters in an iterative way. During the training, a number $NCV$ (typically between 25,000 and 100,000) of classification vectors are first extracted from the SD training material. The initial values of the representative vectors for the iteration are selected randomly from the classification vectors. The other initial values are as follows:

CW_j^{(0)} = \frac{1}{M}, \quad (j = 0, 1, \ldots, M-1)    (2.17)

VAR^{(0)} = \frac{1}{NCV \times 8} \sum_{m=0}^{7} \sum_{i=0}^{NCV-1} \left( y_i(m) - \bar{y}(m) \right)^2    (2.18)

Here, $y_i(m)$ is the $m$th element of the $i$th classification vector, and $\bar{y}(m)$ is the sample mean of the $m$th element over all classification vectors. A log likelihood $LL(kl)$ is then computed for the initial values:

LL(\vec{y}_i, kl) = \log\left[ \sum_{j=0}^{M-1} \frac{CW_j^{(kl)}}{(2\pi\, VAR^{(kl)})} \exp\left( \frac{-1}{2\,VAR^{(kl)}} \|\vec{y}_i - \vec{RV}_j^{(kl)}\|^2 \right) \right]    (2.19)

LL(kl) = \sum_{i=0}^{NCV-1} LL(\vec{y}_i, kl)    (2.20)

where the index kl indicates the klth iteration.


Updates for the next iteration are then made:

p(j|\vec{y}, kl-1) = \frac{CW_j^{(kl-1)} \exp\left( \frac{-\|\vec{y} - \vec{RV}_j^{(kl-1)}\|^2}{2\,VAR^{(kl-1)}} \right)}{\sum_{d=0}^{M-1} CW_d^{(kl-1)} \exp\left( \frac{-\|\vec{y} - \vec{RV}_d^{(kl-1)}\|^2}{2\,VAR^{(kl-1)}} \right)}    (2.21)

NCV_j^{(kl)} = \sum_{i=0}^{NCV-1} p(j|\vec{y}_i, kl-1)    (2.22)

CW_j^{(kl)} = \frac{NCV_j^{(kl)}}{NCV}    (2.23)

\vec{RV}_j^{(kl)} = \frac{1}{NCV_j^{(kl)}} \sum_{i=0}^{NCV-1} \vec{y}_i\, p(j|\vec{y}_i, kl-1)    (2.24)

VAR^{(kl)} = \frac{1}{8} \sum_{j=0}^{M-1} \frac{CW_j^{(kl)}}{NCV_j^{(kl)}} \sum_{i=0}^{NCV-1} \|\vec{y}_i - \vec{RV}_j^{(kl)}\|^2\, p(j|\vec{y}_i, kl-1)    (2.25)

The new log likelihood is then calculated using Equations 2.19 and 2.20. The iteration stops when:

|LL(kl) - LL(kl-1)| < 0.09 \times \ln(8 \times NCV)    (2.26)
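
In vectorized form, one EM iteration of Equations 2.21-2.25 is quite compact. The sketch below is our own illustration; the array shapes are assumptions:

    import numpy as np

    def em_update(Y, RV, CW, VAR):
        # One EM iteration over the classification vectors.
        # Y: NCV x 8 classification vectors; RV: M x 8 representative vectors;
        # CW: M class weights; VAR: scalar variance.
        dist2 = ((Y[:, None, :] - RV[None, :, :]) ** 2).sum(axis=2)  # NCV x M
        num = CW[None, :] * np.exp(-dist2 / (2 * VAR))               # Eq. 2.21
        p = num / np.maximum(num.sum(axis=1, keepdims=True), 1e-300)
        ncv_j = p.sum(axis=0)                                        # Eq. 2.22
        CW_new = ncv_j / Y.shape[0]                                  # Eq. 2.23
        RV_new = (p.T @ Y) / ncv_j[:, None]                          # Eq. 2.24
        dist2_new = ((Y[:, None, :] - RV_new[None, :, :]) ** 2).sum(axis=2)
        VAR_new = (CW_new / ncv_j * (p * dist2_new).sum(axis=0)).sum() / 8.0  # Eq. 2.25
        return RV_new, CW_new, VAR_new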

In the second stage of the training, a number $NFDV$ (typically 1,000,000) of filter design vector triplets $\{\vec{y}_i, \vec{L}_i, \vec{h}_i\}_{i=0}^{NFDV-1}$ are extracted from the SD and HD training materials. $\vec{L}_i$ is the vector that contains all the pixels in the SD aperture ($F_{00}$ up to $F_{22}$ in the aperture shown in Figure 2.3), $\vec{y}_i$ is the classification vector derived from the current aperture, and $\vec{h}_i$ is the vector that contains the corresponding four HD pixels (HD pixels A, B, C and D in the aperture shown in Figure 2.3). For class $j$, the interpolation parameters $\vec{a}$ and $\vec{b}$ are computed as follows:

NFDV_j = \sum_{i=0}^{NFDV-1} p(j|\vec{y}_i)    (2.27)

m_{Lj} = \frac{1}{NFDV_j} \sum_{i=0}^{NFDV-1} \vec{L}_i\, p(j|\vec{y}_i)    (2.28)

m_{hj} = \frac{1}{NFDV_j} \sum_{i=0}^{NFDV-1} \vec{h}_i\, p(j|\vec{y}_i)    (2.29)

G_{LLj} = \frac{1}{NFDV_j} \sum_{i=0}^{NFDV-1} (\vec{L}_i - m_{Lj})(\vec{L}_i - m_{Lj})'\, p(j|\vec{y}_i)    (2.30)


Figure 2.5: An interpolation neural network architecture with one hidden layer.

G_{hLj} = \frac{1}{NFDV_j} \sum_{i=0}^{NFDV-1} (\vec{h}_i - m_{hj})(\vec{L}_i - m_{Lj})'\, p(j|\vec{y}_i)    (2.31)

A_j = G_{hLj} (G_{LLj})^{-1}    (2.32)

b_j = m_{hj} - G_{hLj} (G_{LLj})^{-1} m_{Lj}    (2.33)

$A_j$ and $b_j$ are 4-row matrices of class $j$, with each row corresponding to the coefficients for interpolating pixels A to D in Figure 2.3, respectively, as used in Equation 2.16.

The number of classes, and therefore the number of representative vectors and class weights, in CBA is fixed, typically around 100. Unlike the classification method used in TK, this number does not depend on the size of the classification window.

2.2.3 Plaziac et al. (NP)

Neural networks have been applied for various image processing applications [76, 78, 77]. In essence, they are non-linear adaptive filtering methods. In contrast with the two classification-based techniques discussed above, the neural network performs an implicit classification and introduces non-linearity. The most commonly used "back-propagation algorithm" uses the MSE criterion, similar to Kondo's method, for optimising the parameters of the network.


Figure 2.5 shows a multi-layer perceptron Neural Network architecture proposed by N. Plaziac in [76] with one hidden layer, 24 inputs, 16 hidden units and five outputs. The neural network in itself is very flexible, without any limitations on the architecture or the number of hidden layers.

Suppose there are $N_h$ hidden neurons in the hidden layer, as shown in Figure 2.5; the output of this hidden layer is:

\vec{h} = \tanh(\vec{w}_0 \cdot \vec{F}_I + \vec{b}_0)    (2.34)

Here, $\vec{F}_I$ is the input vector, which is composed of the $N_i$ input SD pixels in the interpolation aperture, $\vec{w}_0$ is the weight matrix between the input layer and the hidden neurons, and $\vec{b}_0$ is the bias vector to the hidden layer. The function tanh introduces the non-linearity. The output of the network gives the four interpolated HD pixels $\vec{F}_{HI}$ shown in Figure 2.3:

\vec{F}_{HI} = \vec{w}_1 \cdot \vec{h} + \vec{b}_1    (2.35)

$\vec{w}_1$ is the weight matrix between the output layer and the hidden layer, and $\vec{b}_1$ is the bias to the output. The parameters $\vec{w}_0$, $\vec{b}_0$, $\vec{w}_1$, $\vec{b}_1$ have been determined in a prior off-line training process using a back-propagation algorithm. An example of a suitable optimisation algorithm is the Levenberg-Marquardt method [88], available in the Matlab software package. The aperture proposed in [76] differs from the interpolation apertures that we discussed for Kondo's and Atkins' methods. By using the same aperture, as shown in Figure 2.3, we do not change the essence of the technique, but enable a fairer comparison among the individual algorithms. Consequently, our implementation of the neural network method has nine inputs, six hidden units and four outputs.
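
A minimal sketch of the forward pass of Equations 2.34-2.35 is given below; the random weights only stand in for the result of the off-line back-propagation training, so that the example runs:

    import numpy as np

    def nn_interpolate(aperture, w0, b0, w1, b1):
        # Forward pass: nine SD input pixels, tanh hidden layer (Eq. 2.34),
        # linear output layer giving four HD pixels (Eq. 2.35).
        FI = np.asarray(aperture, dtype=float).ravel()
        h = np.tanh(w0 @ FI + b0)
        return w1 @ h + b1

    rng = np.random.default_rng(0)
    w0, b0 = rng.normal(size=(6, 9)), np.zeros(6)   # 9 inputs, 6 hidden units
    w1, b1 = rng.normal(size=(4, 6)), np.zeros(4)   # 4 HD output pixels
    hd = nn_interpolate(np.full((3, 3), 128.0), w0, b0, w1, b1)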

2.2.4 Li et al. (XL)

While the previous methods perform the coefficient optimisation in an off-line training, Li's method (XL) [79] does the optimisation on-the-fly. The theory behind XL is Wiener filtering: the interpolation coefficients are derived by applying an LMS algorithm on the local low-resolution pixel grid. A possible performance advantage over TK and CBA is that the local image neighbourhood needs no simplification using a limited number of classes. On the other hand, no original HD image is available for the optimisation and more calculations are required.

To solve the problem that no original image is available, Li assumes that the edge orientation does not change with scaling, recognising the resolution-invariant property of edge orientation. Therefore, the coefficients can be approximated from the low-resolution image within a local window by using the LMS method.

Figure 2.6: Aperture used in new edge-directed interpolation. A is the aperture with the four SD pixels involved in the interpolation. B is the aperture with the SD pixels used to calculate the four interpolation coefficients. C is the aperture that includes all the diagonal neighbours of the SD pixels in B. Here, $F_{kl}$ is a shorthand notation for $F_{SD}(2(i+2k-4), 2(j+2l-4))$.

The real implementation of the algorithm is divided into two steps. In the first step, as shown in Figure 2.6, only the pixels $F_{HI}(2(i+m), 2(j+n))$ (m mod 2 = 1 and n mod 2 = 1) are calculated, using a 4th order interpolation:

F_{HI}(2(i+1), 2(j+1)) = \sum_{k=0}^{1} \sum_{l=0}^{1} w_{2k+l}\, F_{SD}(2(i+2k), 2(j+2l))    (2.36)

Denoting M as the pixel set on the SD grid used to calculate the four weights, the sum of squared errors (SSE) over set M in the optimisation can be written as the sum of squared differences between the original SD pixels $F_{SD}$ and the interpolated SD pixels $F_{HI}$:

SSE = \sum_{i,j} \left( F_{SD}(2i+2, 2j+2) - F_{HI}(2i+2, 2j+2) \right)^2    (2.37)

which in matrix formulation becomes:

SSE = \|\vec{y} - C\vec{w}\|^2    (2.38)

Here, $\vec{y}$ contains the SD pixels in M (pixels $F_{SD}(2(i+2k-4), 2(j+2l-4))$, with $k, l = 1, 2, 3, 4$) and $C$ is an $M^2 \times 4$ matrix whose $k$th row contains the four diagonal SD neighbours of the $k$th SD pixel in $\vec{y}$. The weighted sum of each row describes an interpolated pixel, as used in Equation 2.36.


Figure 2.7: Flowchart of the non-linear high-frequency extrapolation (HG) method. The high-frequency components are extracted and up-scaled with a non-linear procedure.

To find the minimum SSE, the derivative of the SSE with respect to $\vec{w}$ is calculated:

\frac{\partial(SSE)}{\partial \vec{w}} = 0    (2.39)

-2C^T\vec{y} + 2C^TC\vec{w} = 0    (2.40)

\vec{w} = (C^TC)^{-1}(C^T\vec{y})    (2.41)

In smooth areas $C^TC$ may not be of full rank, in which case there is no unique solution for $\vec{w}$. Therefore, smooth-area detection is performed prior to the calculation and plain averaging is used in these areas.

Half of the HD pixels are obtained in this first step. For the rest of the HD pixels, the procedure is repeated, but on a 45-degree rotated grid. Details can be found in [79].
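
The core of the first step can be sketched as follows. This simplified illustration (the helper name xl_weights, the window size, the margin requirement and the smooth-area fallback are our assumptions) builds the matrix C and the vector y from the local SD grid and solves Equation 2.41; the second, rotated-grid pass of [79] is omitted:

    import numpy as np

    def xl_weights(sd, i, j, window=2):
        # For every SD pixel in the training window around (i, j), one row
        # of C holds its four diagonal SD neighbours and y holds the pixel
        # itself, so that w = (C^T C)^{-1} C^T y (Equations 2.37-2.41).
        # Assumes sd is a 2D float array with enough margin around (i, j).
        sd = np.asarray(sd, dtype=float)
        rows, targets = [], []
        for m in range(i - window, i + window + 1):
            for n in range(j - window, j + window + 1):
                rows.append([sd[m - 1, n - 1], sd[m - 1, n + 1],
                             sd[m + 1, n - 1], sd[m + 1, n + 1]])
                targets.append(sd[m, n])
        C, y = np.array(rows), np.array(targets)
        CtC = C.T @ C
        if np.linalg.matrix_rank(CtC) < 4:   # smooth area: plain averaging
            return np.full(4, 0.25)
        return np.linalg.solve(CtC, C.T @ y)

The new HD pixel is then the same weighted sum of its four diagonal SD neighbours, relying on the scale invariance of the edge orientation.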

2.2.5 Greenspan et al. (HG)

H. Greenspan's technique (HG) [73] generates an HD output image by adding phase-coherent higher harmonics of the high frequencies in the SD image to the result of the linearly up-scaled SD image. The phase-coherent higher harmonics are extrapolated from the high-frequency components of the SD image using a non-linear (clipping) process. Figure 2.7 shows the flowchart.

If we refer to the SD input image as $F_{SD}$, its high-frequency components $F_{SDH}$ can be extracted using:

F_{SDL} = LPF \ast F_{SD}    (2.42)

F_{SDH} = F_{SD} - F_{SDL}    (2.43)

Here LPF is a 2D separable low-pass filter, realized by applying the Gaussian kernel (1/16, 1/4, 3/8, 1/4, 1/16) both horizontally and vertically. Denoting $F_{HDHM}$ as the linearly up-scaled signal of $F_{SDH}$, the higher harmonics $F_{HDH}$ are predicted by extracting the harmonics from a clipped version of $F_{HDHM}$:

F_{HDH} = HPF(s \times (CLIP(F_{HDHM})))    (2.44)


where $s$ is a scaling constant and $CLIP(x)$ is the following function:

CLIP(x) = \begin{cases} T, & x > T \\ x, & -T \le x \le T \\ -T, & x < -T \end{cases}    (2.45)

Here, $T = (1 - c) \times \max(F_{SDH})$, with $c$ a clipping constant ranging between 0 and 1³. The high-frequency extraction process in Equation 2.44 is performed in the same way as extracting $F_{SDH}$ from $F_{SD}$, shown in Equations 2.42 and 2.43, but on a scaled grid. Adding $F_{HDH}$ to the linearly up-scaled version of $F_{SD}$ gives the final output image $F_{HD}$.

In [73], the linear up-scaling is realized by first inserting zeros into the SD signal at alternate lines and pixels, followed by filtering this up-sampled signal using a Gaussian-type low-pass filter with coefficients (1/8, 1/2, 3/4, 1/2, 1/8) both horizontally and vertically.
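
The harmonic-generation path of Figure 2.7 can be sketched as follows, using the Gaussian kernel quoted above and scipy's separable 1D convolution. This is our own simplified illustration: the linear up-scaling stages are left out, and the clipping threshold is taken over |F_SDH| here, whereas the text defines T on max(F_SDH):

    import numpy as np
    from scipy.ndimage import convolve1d

    def hg_harmonics(sd, s=1.5, c=0.55):
        # Extract the SD high band (Eqs. 2.42-2.43), clip and amplify it
        # (Eq. 2.45), then keep only the newly generated harmonics above
        # the original pass-band (Eq. 2.44).
        g = np.array([1/16, 1/4, 3/8, 1/4, 1/16])
        low = convolve1d(convolve1d(sd, g, axis=0), g, axis=1)
        high = sd - low
        T = (1 - c) * np.abs(high).max()        # clipping threshold
        amplified = s * np.clip(high, -T, T)
        harmonics = amplified - convolve1d(convolve1d(amplified, g, axis=0),
                                           g, axis=1)
        return harmonics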

2.2.6 Tegenbosch et al. (JT)

Scaling an SD frame to HD resolution in a linear way, according to Section 2.1, extends the spatial frequency spectrum by a factor of two, but the upper half is "empty". Linear sharpening filters, by definition, can only enhance but not create higher frequencies.

Non-linear techniques can be used to create these higher frequencies and fill the upper half of the frequency spectrum. A non-linear sharpening technique, Luminance Transient Improvement (LTI) [89], makes edges steeper with a filter that amplifies high frequencies but prevents strong overshoots by clipping the result between the pixel values at the beginning and the end of each edge (see Figure 2.8). The cascade of linear up-scaling and LTI is referred to as a "non-linear up-scaler".
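
As an illustration of the LTI principle only (this is not Bellers' exact algorithm; the 3-tap clipping neighbourhood and the gain are our simplifications), a 1D sketch could look as follows:

    import numpy as np

    def lti_1d(line, gain=2.0):
        # Linearly sharpen with a scaled second-derivative (peaking) term,
        # then clip every sample between the local minimum and maximum,
        # which approximates clipping between the start and end levels of
        # an edge (Figure 2.8).
        line = np.asarray(line, dtype=float)
        lap = np.convolve(line, [-1, 2, -1], mode='same')
        sharp = line + 0.5 * gain * lap
        lo = np.minimum(np.minimum(np.roll(line, 1), line), np.roll(line, -1))
        hi = np.maximum(np.maximum(np.roll(line, 1), line), np.roll(line, -1))
        return np.clip(sharp, lo, hi)

    edge = np.concatenate([np.zeros(8), np.linspace(0, 255, 8), np.full(8, 255.0)])
    steeper = lti_1d(edge)   # the ramp becomes steeper, without overshoot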

The key part of Bellers' method, i.e. the LTI algorithm, is essentially a one-dimensional (1D) technique. This technique can be extended to two dimensions in several ways. In the straightforward way described by Bellers, LTI operates in the horizontal direction first and successively in the vertical direction, much like the extension of 1D scaling to 2D as described in Section 2.1. For better image quality, it was proposed in [80] to apply the LTI method perpendicular to the edge. This involves estimating the local edge orientation, and "rotating" the line along which LTI operates in a direction perpendicular to the edge. In terms of image quality, this 'rotating' 1D LTI yields the best results for 2D edges. To reduce computational complexity, an alternative edge-orientation-dependent LTI was proposed in [80]. In this design, LTI operates, in parallel, in the horizontal and vertical direction.

³The constants $s$ and $c$ are experimentally set to 1.5 and 0.55 for optimal MSE.


Figure 2.8: Principle of the LTI algorithm, demonstrated on a model edge. LTI operates on a signal (dotted line) by applying linear sharpening (dashed line) and clipping between the start and end level of an edge (large dots), yielding the final result (solid line).

The two results are then combined using a weighted sum, with weights depending on the edge orientation.

2.3 Evaluation

To evaluate the algorithms in our overview, four still images and seven video sequences are selected, covering a large variety of material including sports, movie, cartoon, etc. The set of stills contains an image with high-contrast edges in all directions (Bicycle), images with both distinct edges and fine details (Lighthouse, House) and an image full of details (Peacock). The test set of sequences includes a sequence with detailed lettering and fine structure (Siena), a sequence with very fine details in cloth (Office), a sequence that has a clear background and small moving objects with mosquito noise due to digital compression (NYstill), a sequence with low-contrast edges (Sailing), a cartoon sequence (Toystory), and two sequences from an HD camera with clearly visible noise (Shields and Stockholm).

As mentioned in the introduction, we shall rank the algorithms using both objective and subjective criteria. For the objective ranking the MSE metric is used, while we conducted a subjective assessment experiment to produce the subjective ranking. Both experiments are discussed separately in the following subsections.

2.3.1 Objective evaluation

In order to enable an MSE-based comparison, for which we need an HD reference and an up-scaled HD version of the same material, we first down-scale the HD video sequence by a factor of two in each dimension and then scale it up to the original size with the method under test. The original and the up-converted HD video sequences are then compared using the MSE criterion. Figure 2.9 depicts the evaluation process.

Figure 2.9: Evaluation flowchart of the various spatial resolution up-conversion techniques.

Please note that the originals, which in the evaluation act as the HD reference, are high-quality test sequences, but registered at a resolution of (H×V) 720×576 pixels (progressive). This accelerates the testing and makes it easier to obtain good test data, while this "down-scaled experiment", in our opinion, does not affect the relative performance.

Down-scaling

Generally, for down-scaling, a low-pass anti-alias pre-filter has to be used before decimation. However, this is not an appropriate model for acquiring a picture with a low-resolution imaging device. A better approximation is obtained by simply averaging four HD pixels, emulating an SD cell [90]:

F_{SD}(2i+1, 2j+1) = \sum_{k=0}^{1} \sum_{l=0}^{1} \frac{F_{HI}(2(i+k), 2(j+l))}{4}    (2.46)

Since a camera usually has an aperture correction filter to compensate for the finite size of the pixels, it seems appropriate to additionally apply a high-frequency enhancement filter in the evaluation. A 3-tap "aperture correction" filter is used:

F_{out}(i, j) = -\beta F_{SD}(i-1, j) + (1 + 2\beta) F_{SD}(i, j) - \beta F_{SD}(i+1, j)    (2.47)

where β = 0.07 gives the best compensation for the low-pass filtering of the 4-point averaging filter. In Equation 2.47 only the horizontal aperture correction filter is defined, but the same filter is also used to filter the down-scaled image in the vertical direction. Note that the down-scaling shifts the SD output over half an HD-pixel distance with respect to the input HD grid.
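
The complete down-scaling chain of Equations 2.46 and 2.47 can be sketched as follows (our own helper, assuming an HD input with even dimensions):

    import numpy as np

    def downscale_with_aperture_correction(hd, beta=0.07):
        # Average each 2x2 HD cell (Equation 2.46), then sharpen with the
        # 3-tap aperture correction filter of Equation 2.47, applied
        # separably in the horizontal and vertical direction.
        sd = 0.25 * (hd[0::2, 0::2] + hd[0::2, 1::2]
                     + hd[1::2, 0::2] + hd[1::2, 1::2])
        kernel = np.array([-beta, 1 + 2 * beta, -beta])
        sd = np.apply_along_axis(np.convolve, 1, sd, kernel, mode='same')
        sd = np.apply_along_axis(np.convolve, 0, sd, kernel, mode='same')
        return sd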


Figure 2.10: Shifting the HD pixel grid. Pixels on grid A are original HD pixels. Pixels on grids B and C are 0.25-pixel and 0.75-pixel shifted versions of A, respectively. The down-scaling is based on the shifted HD pixels on grid B.

Pre-processing

From the descriptions in Section 2.2, we conclude that different up-scaling and resolution up-conversion methods result in different HD-pixel grids. Taking into account the displacement introduced by the down-scaling process, the HD images as interpolated with TK, CBA, NP and JT are generated on the original HD grid. The remaining methods, XL and HG, introduce a shift of half an HD-pixel distance horizontally and vertically. Since the linear methods illustrated in Section 2.1 can interpolate on any grid, we put bilinear and bi-cubic interpolation into the former group, while B-spline goes into the latter. To cope with this two-group situation in a fair comparison, two copies of the selected video test sequences were generated: the first shifted over 0.25 pixel and the other over 0.75 pixel. This results in two "originals" on different grids that have the same picture quality.

As shown in Figure 2.10, pixels on grid A are original HD pixels; the other pixels are interpolated using a FIR interpolation filter⁴. Pixels on grids B and C are one quarter and three quarters of a pixel shifted compared to A, respectively. The down-scaling uses the 0.25-pixel shifted HD material to generate the SD sequence. For TK, CBA, NP, JT, bilinear and bi-cubic interpolation, the pixels after up-scaling or resolution up-conversion lie on the same grid as the 0.25-pixel shifted ones; hence the up-converted result should be compared with the 0.25-pixel shifted copies. For the other methods, there is a half HD-pixel shift after up-scaling or up-conversion, and the result should be compared with the 0.75-pixel shifted version of the HD sequences.

⁴The coefficients of this FIR filter (obtained from Matlab) are: 0.0014, 0.0020, 0.0015, 0, −0.0090, −0.0135, −0.0103, 0, 0.0333, 0.0511, 0.0404, 0, −0.0947, −0.1546, −0.1321, 0, 0.2927, 0.6150, 0.8770, 1.0000, 0.8770, 0.6150, 0.2927, 0, −0.1321, −0.1546, −0.0947, 0, 0.0404, 0.0511, 0.0333, 0, −0.0103, −0.0135, −0.0090, 0, 0.0015, 0.0020, 0.0014.


Up-conversion and MSE calculation

After up-conversion of the down-scaled video/images by a factor of two in both dimensions using the linear and advanced up-conversion methods, the MSE between the reference and the up-converted video/image(s) can be calculated using:

MSE(n) = \frac{1}{N} \sum_{i,j} \left( F(i, j) - G_F(i, j) \right)^2    (2.48)

Here, $F(i, j)$ and $G_F(i, j)$ are the luminance values of the reference and the up-converted image, respectively. $N$ represents the number of pixels in the image.

The techniques being evaluated include all the advanced up-conversion methods presented in Section 2.2 and a selection of the linear methods shown in Section 2.1: bilinear interpolation, cubic-spline interpolation with both Keys' and Mitchell-Netravali's kernels, and the 23-tap FIR approximation of the cardinal cubic B-spline filter.

2.3.2 Subjective evaluation

For the subjective comparison, a paired comparison [91] of the up-scaled images and video sequences was performed. As no reference is necessary for this paired comparison, there is no need to down-scale or shift the SD version, and the original SD material is used directly as input for all scaling methods.

The paired comparison experiment was set up in a test room with an ambient illumination of 50 lux, measured at the screen in the direction perpendicular to the viewer. The twenty people who participated in this evaluation were asked to sit in front of two LCD TVs at a distance of three times the display height (1.2 m). The group of viewers included an equal number of expert and non-expert viewers. The resolution of the LCD TVs used is 1280 × 720. The up-scaled images and video sequences were in 4:2:2 format. Because we concentrate on the luminance only, the chrominance channels were up-converted with the same (bilinear) interpolation in all cases. The up-scaled materials were cropped to fit the display resolution. Two up-scaled versions of each image/sequence were shown on the two screens next to each other, and each viewer was asked to select the one that he/she perceived as having the best image quality. Since such a subjective assessment is much more time-consuming than the objective evaluation, we had to limit the number of algorithms to four. The four assessed methods are TK, which is an MSE-optimised classification-based method; B-spline, which represents the traditional linear interpolation methods; XL, which uses the largest amount of calculations due to the online optimisation of the filter coefficients; and JT, which in contrast to the other methods aims at optimal subjective image quality instead of minimal MSE.


Figure 2.11: The result of the subjective evaluation using paired comparison. A, B, C and D indicate the four evaluated methods, i.e. B-spline, TK, XL and JT, respectively. A higher value on the quality scale means a higher preference by the viewers. An underline indicates that the difference between the methods above it is not significant. The orderings on the quality scale are: group one: C < B < A < D; group two: C < B < A < D; group three: C < A < B < D; group four: A < D < B < C.

2.3.3 Results and comparison

Table 2.1 gives the MSE scores on the individual sequences. Overall, TK's method gives the lowest average MSE score. Apart from the Office and Stockholm sequences, TK's method also gives the lowest MSE score on the individual images and sequences. TK's method is followed by CBA's method with a small margin. These two methods form a first group that scores significantly better than the rest. A second group of methods that score almost equally in terms of average MSE is formed by the cubic-spline interpolation with Keys' kernel, the 23-tap B-spline filter, and HG's method, closely followed by NP's method. The B-spline filter performs best on the Office sequence, showing its advantage on images with low-contrast details that are not too much distorted by overshoots. Bilinear interpolation, cubic-spline interpolation with Mitchell-Netravali's kernel, and XL's method form a third group in this objective comparison, with an average MSE score that is significantly higher than that of the second group. JT's method gives the highest MSE score due to the non-MSE-optimal enhancement involved.

The result of the subjective evaluation is found to be image-content dependent and can be categorized into four groups. Figure 2.11 shows the paired comparison evaluation results for each group separately. Group one consists of the majority of the images and sequences, in which JT's method is preferred.


B-spline interpolation comes in second place, although there is no significant difference between B-spline and TK's method. XL's method scores the worst in this group. Group two has in essence the same order of preference as group one, only the perceived differences are more distinct. In group three, the ranking of TK's method and B-spline is reversed compared with group one. However, the difference between these two methods is again not significant. Group four consists of only one sequence (NYStill). The results show that there is no significant difference between the four methods for this sequence. This is probably due to the fact that the sequence contains severe mosquito noise, which is enhanced by JT's method.

To give the reader a subjective impression, Figure 2.12 shows a detailed area of an original picture and its up-converted counterparts using the different up-scaling and resolution up-conversion algorithms. The result of bilinear interpolation is rather blurred compared to the other methods. The cubic-spline interpolation with Keys' kernel, the cubic-spline interpolation with Mitchell-Netravali's kernel, the 23-tap B-spline FIR interpolation filter and HG's method visually perform well in fine-structured areas, but along edges overshoots are clearly visible. TK's and CBA's methods suffer less from overshoots at the edges and also perform well in fine-structured areas. NP's method introduces overshoots that are visually similar to those of the cubic-spline interpolation with the Mitchell-Netravali kernel. XL's method results in clear and sharp edges but gives strong artefacts in fine-structured areas. JT's method gives the sharpest image, at the expense of strong overshoots and clearly visible enhancement of alias.

2.4 Conclusion

We have presented an overview of spatial up-scaling and resolution up-conversion techniques, including conventional linear methods and some recently developed non-linear algorithms. Figure 2.13 depicts a tree structure of the various video up-scaling and resolution up-conversion techniques. The traditional linear up-conversion methods try to approximate the sinc interpolation function, which is optimal in theory. The more advanced methods aim at recovering the missing spectrum. Greenspan extrapolates the high frequencies of the up-converted image from their counterpart in the low-resolution version, by generating harmonics through clipping the available high-frequency part of the spectrum. Kondo adapts the interpolation filter in a global MSE-optimal way, using ADRC for image content classification. Plaziac also does the interpolation in an MSE-optimal way, but the classification of the image content is done implicitly by using the Neural Network method. Atkins uses the EM algorithm for image content classification, which is widely used in pattern recognition for unsupervised data clustering. The theory behind Li is Wiener filtering: the interpolation coefficients are derived by applying the LMS algorithm on the local low-resolution pixel grid. Tegenbosch does the up-conversion in a subjectively optimal way, using sharpening and peaking on luminance transients perpendicular to the edge.


Figure 2.12: Image details processed with different up-scaling and resolution up-conversion methods: (A) original picture, and up-scaled versions using (B) bilinear interpolation, (C) cubic-spline interpolation with Keys' kernel, (D) cubic-spline interpolation with the Mitchell-Netravali kernel, (E) the 23-tap B-spline filter, (F) TK, (G) CBA, (H) NP, (I) XL, (J) HG and (K) JT.


Figure 2.13: Tree structure of the video up-scaling and up-conversion techniques. The tree divides the methods as follows:

• Linear methods: Bilinear (BIL), Bi-cubic, B-Spline (BS), ...
• Advanced methods:
  – Frequency domain methods: Greenspan (HG), Tegenbosch (JT)
  – Spatial domain methods (content-adaptive methods):
    – Without classification: Li (XL)
    – Classification based:
      – Explicit classification: Kondo (TK), Atkins (CBA)
      – Implicit classification: Plaziac (NP)


From our evaluation, we conclude that conventional linear methods either result in blurred images (like bilinear interpolation and cubic-spline interpolation with the Mitchell-Netravali kernel) or introduce strong overshoots on edges (the B-spline interpolation filter and the cubic-spline interpolation with Keys' kernel). Most of the non-linear methods evaluated perform better on edges, with good sharpness and little to no overshoot (TK, CBA, XL), although some methods suffer from artefacts in fine details (XL). The other two non-linear methods (NP and JT) perform comparably to the linear methods that give strong overshoots. JT gives even stronger overshoots due to the enhancement; however, this results in the highest perceived sharpness.

The MSE comparison and the subjective assessment reveal that the optimal MSE does not guarantee the best perceived image quality. However, it can be used as the starting point for further sharpness enhancement. From the MSE point of view, the content-adaptive methods, or the classification-based methods, show their advantage at minimizing the MSE score. It is expected that when in JT's method such a content-adaptive method is used instead of a linear scaling filter, the subjective image quality will be even higher, although one should be careful not to enhance the possible artefacts generated by such a non-linear scaling filter. All scaling methods described in this chapter can benefit from the enhancement in JT's method, but given the results of the objective comparison, we believe that the combination with TK's method will give the best subjective results. However, this assumption requires further research in the future.


Table 2.1: MSE scores on individual sequences of the up-scaling and resolution up-conversion algorithms

MSE          BIL      KEYS     MN       BS       TK
Bicycle      89.7     65.0     76.6     61.3     43.8
House        304.6    267.5    288.8    276.0    249.9
Lighthouse   67.0     57.5     62.6     57.7     54.0
Peacock      367.0    343.3    373.0    349.4    320.8
Office       22.9     18.8     21.0     18.4     18.9
NYStill      105.0    84.4     95.3     78.5     76.0
Sailing      47.9     40.5     44.4     41.1     37.4
Shields      81.2     66.3     74.2     65.6     60.5
Siena        101.0    84.4     93.2     85.4     75.2
Stockholm    82.0     66.5     74.7     63.6     64.7
Toystory     23.8     18.8     21.2     18.6     14.1
Average      120.2    101.2    111.4    101.4    92.3

MSE          CBA      NP       XL       HG       JT
Bicycle      47.8     58.9     67.7     64.1     198.0
House        255.6    268.1    294.7    270.7    480.3
Lighthouse   55.2     58.9     65.9     57.6     110.0
Peacock      330.2    347.1    357.2    347.6    593.5
Office       18.7     33.8     35.6     18.8     35.5
NYStill      77.3     84.1     143.3    81.7     171.3
Sailing      39.6     46.3     45.7     40.6     94.0
Shields      62.0     71.2     77.5     66.2     133.3
Siena        78.1     87.0     92.5     84.4     224.9
Stockholm    64.6     69.9     90.5     65.3     138.3
Toystory     15.8     21.2     16.4     18.7     57.1
Average      95.0     104.2    117.0    101.4    203.3


Chapter 3

Refinements of Resolution Enhancement

The aim of image and video resolution enhancement is to recover information from down-sampled images or video material. Image restoration has a similar aim: it tries to recover information from degraded images. Traditionally, image restoration deals with images degraded by blurring, noise, geometric distortion, etc. [92]. But image down-sampling can also be seen as a degradation process in which picture information is lost. Because of this similarity it might make sense to tackle the image and video resolution up-conversion problem with approaches from traditional image restoration. These fall into two main categories [92][93]:

• Wiener filtering, where an optimal stationary linear filter is used to minimize the mean square error (MSE) between the restored image and the ideal one.

• The Bayesian strategy, which seeks the image with the highest probability given the degraded data. The target function and the likelihood are related through Bayes' rule, which includes the probability distribution of the image, also known as the image prior.

Methods that apply some form of Least Mean Square (LMS) filtering for video resolution up-conversion are derived from the Wiener filtering approach to image restoration. This includes both classification-based methods (such as TK) and non-classification-based methods (like XL). The CBA method, also classification-based, is based on Bayesian image restoration. For further refinement these methods use either classification or localized optimisation. Objective evaluation reveals that the performance of the two classification-based approaches is more or less the same. However, methods based on Wiener filtering, or using content-adaptive Least Mean Square (LMS) filters, have relatively simple implementations. Our study therefore concentrates on content-adaptive LMS filtering approaches for image and video resolution up-conversion.


Figure 3.1: Wiener filtering for image restoration. (Block diagram: the input image F_I passes through a spatial blur B and additive noise N to form the degraded image F_D; the restoration filter W produces the restored image F_R; the error image F_E is the difference between F_I and F_R.)


In this chapter, we first give a short review of Wiener filtering theory, applied to image restoration. Then we investigate the LMS filtering application in a video resolution up-conversion context. This analysis is followed by improvements of the image and video resolution up-conversion methods proposed by Kondo and Li.

3.1 Wiener filtering

3.1.1 An image restoration model

Figure 3.1 gives a block diagram of an image restoration system. The input image $F_I$ is spatially blurred and then corrupted by additive noise $N$, which is assumed to be uncorrelated with the ideal input image. The degraded and noisy image is referred to as $F_D$. The degradation process can be formulated as follows:

F_D = B \cdot F_I + N    (3.1)

Here, $B$ is the spatial blur matrix and $N$ is the additive white Gaussian noise. If $W$ captures the filtering operation in matrix format within the digital image processing domain, the restored image is obtained by:

FR = FD ·W (3.2)

The error image signal e is the difference between the ideal input image FI andthe restored image FR:

e = FI − FR (3.3)

Page 61: Video enhancement using content-adaptive least mean … · Video Enhancement Using Content-adaptive ... Video enhancement using content-adaptive least ... by reviewing image and video

3.1. WIENER FILTERING 47

3.1.2 Image Restoration

Wiener filters can be applied to improve images degraded by blurring and additive white Gaussian noise. These MSE-optimal linear filters are also named Least Mean Square filters, because they aim at minimizing the mean square error between the filtered and the ideal signal [94]. Wiener filters can be derived for continuous image signals in the frequency domain and for discrete image signals in the spatial domain.

F_I(u, v), F_D(u, v) and H(u, v) denote the Discrete Fourier Transforms (DFT) of the ideal image, the degraded image, and the blurring function, respectively. P_I(u, v) and P_N(u, v) are the power spectra of the ideal image and the noise signal. The frequency response of the Wiener filter is then given by [94]:

W(u, v) = \frac{H^*(u, v)}{|H(u, v)|^2 + P_N(u, v)/P_I(u, v)} \qquad (3.4)

where

H^*(u, v) = |H(u, v)|^2 H^{-1}(u, v) \qquad (3.5)

If noise is omitted, that is P_N(u, v)/P_I(u, v) = 0, the Wiener filter becomes:

W(u, v) = H^{-1}(u, v) \qquad (3.6)

This is equivalent to the inverse filter for the blurring function. For the case of additive white noise and no blurring, the Wiener filter obtains the following form:

W(u, v) = \frac{P_I(u, v)}{P_I(u, v) + \sigma_n^2} \qquad (3.7)

where \sigma_n^2 is the variance of the noise.
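As an illustration of Equation 3.4, the following sketch applies a frequency-domain Wiener filter with NumPy. The function name `wiener_deconvolve` and its arguments are our own; the blur PSF and the estimates of the noise variance and the ideal-image power spectrum are assumed to be available.

```python
import numpy as np

def wiener_deconvolve(degraded, psf, noise_var, signal_power_est):
    """Frequency-domain Wiener restoration (Equation 3.4), a minimal sketch.

    degraded:          observed image (2-D array)
    psf:               blur kernel, centred and zero-padded to the image size
    noise_var:         assumed white-noise variance (sigma_n^2)
    signal_power_est:  estimate of the ideal-image power spectrum P_I(u, v)
                       (a scalar or an array of the image size)
    """
    H = np.fft.fft2(np.fft.ifftshift(psf))   # blur frequency response H(u,v)
    FD = np.fft.fft2(degraded)
    # W(u,v) = H* / (|H|^2 + P_N/P_I); white noise gives P_N(u,v) = sigma_n^2
    W = np.conj(H) / (np.abs(H) ** 2 + noise_var / signal_power_est)
    return np.real(np.fft.ifft2(W * FD))
```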

In the spatial-domain model, presented in Section 3.1.1, the objective is to find a filter W that minimizes the mean-square error,

e^2 = E\left[(F_I - F_R)^T (F_I - F_R)\right] \qquad (3.8)

where E denotes "expectation". This can be accomplished by either differentiating e^2 with respect to F_R,

\frac{\partial e^2}{\partial F_R} = 0 \qquad (3.9)

or employing the orthogonality principle [95]:

E\left[(F_R - F_I)(F_D - E[F_D])^T\right] = 0 \qquad (3.10)


Figure 3.2: Image de-noising with a Wiener filter, processed with MATLAB. (A) The original image, (B) the image corrupted by white Gaussian noise (variance 0.002), (C) the image de-noised with Wiener filtering based on a 3 × 3 aperture, (D) the image de-noised with Wiener filtering based on a 5 × 5 aperture. To compensate for the low-pass characteristic of the printing process, we enhanced the images using a 2D linear symmetrical peaking filter.

With E[F_R] = E[F_I], this leads to the result:

W = K_{DD}^{-1} K_{ID} \qquad (3.11)

where K_{DD} is the covariance matrix of the observation vector and K_{ID} is the cross-covariance matrix between the input image and the degraded image.
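In practice, Equation 3.11 amounts to a linear regression between degraded apertures and ideal pixels. The sketch below, with hypothetical names, solves the equivalent least-squares problem directly, which is numerically preferable to forming and inverting K_DD explicitly; for zero-mean (or mean-removed) data the result equals W = K_DD^{-1} K_ID.

```python
import numpy as np

def train_wiener_filter(degraded_patches, ideal_pixels):
    """Estimate the LMS-optimal filter of Equation 3.11, a minimal sketch.

    degraded_patches: (num_samples, aperture_size) matrix; each row holds
                      the degraded pixels inside one filter aperture
    ideal_pixels:     (num_samples,) vector of the corresponding ideal pixels

    Note: for exact agreement with the covariance form, the sample means
    should be removed from both inputs beforehand.
    """
    w, *_ = np.linalg.lstsq(degraded_patches, ideal_pixels, rcond=None)
    return w
```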

The effect of using Wiener filters for image noise reduction can be seen in Figure 3.2.

Equation 3.11 is similar to Equations 2.11 and 2.41 from the work by Kondo and Li. The difference is that in Kondo's approach (Equation 2.11) the LMS filtering is performed on the samples within the same class, while Li (Equation 2.41) constrains the LMS filtering to the neighbourhood of the interpolation.

A few remarks are in order here. Wiener filters cannot reconstruct frequency components which have been degraded by noise; they can only suppress them.


Figure 3.3: General LMS filters for image resolution enhancement. (A) 2 × 2 aperture, (B) 3 × 3 aperture, (C) 5 × 5 aperture.

Moreover, Wiener filters are unable to restore components for which H(u, v) = 0. This means that it is not possible to undo degradation caused by band-limiting of H(u, v).

3.1.3 Video up-conversion

In the case of image and video up-scaling and up-conversion, the degradation is caused by down-scaling. In the frequency domain, the Wiener filter in the absence of noise is then equivalent to the inverse filter of the down-scale function. In the spatial domain, linear regression with Equation 3.11 can be used to derive a Wiener filter for resolution up-conversion. Li and Kondo proposed two different approaches to further refine the LMS optimisation. In Li's approach, the LMS filtering is performed in a neighbouring area of the pixel to be interpolated, that is, based on the local image content. With Kondo's approach, the MSE is minimized for a large number of image portions that have the same structure, which is a structure-controlled filtering approach.

A general globally optimised Wiener filter can be obtained by performing linear regression, without classification or localization, on different apertures. The global optimisation for various apertures leads to the filters with the coefficients shown in Figure 3.3, based on the HD-SD grid relation shown in Figure 2.3. Five video sequences are used to test and compare the performance of these filters. Screen-shots from the test sequences are shown in Figure 3.4. Table 3.1 reports the MSE scores on these sequences for various LMS filtering approaches to image resolution up-conversion.

Figure 3.4: Screen-shots from the test sequences (Tokyo, Bicycle, Siena, Kiel and Football).

The scores of bilinear interpolation and of a general LMS filter with a 2 × 2 aperture are very close to each other. The classification-based LMS filter TK on the 2 × 2 aperture brings a marginal performance gain. Real improvement can be achieved by enlarging the filtering aperture to 3 × 3 or 5 × 5. The performance of the XL method lies between bilinear interpolation and LMS filtering on a 3 × 3 aperture, and is close to the TK method with a 2 × 2 aperture. A comparison of bilinear interpolation, the XL method and the TK method with a 2 × 2 aperture is given in Figure 3.5. In image portions that contain distinct edges, the XL method enhances the edge and there gives a better image quality. In detailed areas, where XL will generate artefacts, the TK method based on the 2 × 2 aperture performs more robustly. Compared to the TK method based on the 3 × 3 aperture, a 2 × 2 aperture is not sufficient for distinguishing the various image patterns.

Table 3.1: MSE scores on individual sequences with (A) bilinear interpolation and various up-conversion approaches using LMS filters: (B) general LMS filter based on a 2 × 2 aperture, (C) TK with 2 × 2 aperture, (D) general LMS filter based on a 3 × 3 aperture, (E) TK with 3 × 3 aperture, (F) general LMS filter based on a 5 × 5 aperture, (G) XL.

MSE       A       B       C       D       E       F       G
Bicycle   94.90   91.44   85.00   70.72   47.00   66.27   70.46
Football  75.30   72.83   71.58   58.90   52.39   56.05   74.53
Kiel     221.77  211.41  209.78  199.96  185.50  203.16  209.33
Siena     99.54   94.65   94.50   82.92   73.70   82.66   90.89
Tokyo     63.72   60.49   61.75   49.83   49.78   48.13   66.71

Figure 3.5: Image portions from up-converted images using bilinear interpolation (LEFT), XL (MIDDLE) and TK with a 2 × 2 interpolation and classification aperture (RIGHT).

With the 2 × 2 aperture, localization is more effective than classification in introducing content adaptivity. However, this content adaptivity on the 2 × 2 aperture performs worse than simply enlarging the aperture without adaptivity, as the results of a general LMS filter based on a 3 × 3 aperture show. There are many possible ways to improve the performance of the localized LMS filtering approach based on XL, which will be discussed in the following section.

3.2 Improving video resolution up-conversion

The XL method estimates the interpolation coefficients from the dominant pattern within the neighbouring image region, or optimisation window [96]. In detailed parts, a 2 × 2 image aperture cannot sufficiently distinguish the various image patterns, which leads to inaccurate interpolation coefficients. Three modifications can be made to improve the performance of the current method:

1. pre-filtering pixels on the SD grid before the LMS optimisation, using a low-pass filter,

2. enlarging the aperture for interpolation,

3. extending the LMS optimisation window in the temporal domain.

Figure 3.6: Improving the XL algorithm by low-pass anti-alias filtering.

Figure 3.7: Frequency response of the low-pass anti-alias filter.

3.2.1 Low-pass Anti-Alias Filtering

To enhance the LMS optimisation, low-pass pre-filtering is proposed. A low-pass filter reduces the spectrum of the image signal; as a consequence, highly detailed image configurations will smear out. Ideally, low-pass filtering should be performed before each step of the LMS estimation, i.e. on the original SD image and on the intermediate image after the first interpolation. However, since the purpose of the low-pass filtering is only to preserve the main structures of the image, it is possible to simplify the process to a single step by low-pass filtering a bilinearly interpolated high-resolution image. A block diagram of the XL algorithm with anti-alias filtering is shown in Figure 3.6.

In the real implementation, the low-pass filtering is performed using an 11-tap FIR filter with the following coefficients: −0.0017, −0.0116, −0.0062, 0.0849, 0.2579, 0.3533, 0.2579, 0.0849, −0.0062, −0.0116, −0.0017. Figure 3.7 shows the frequency characteristic of this low-pass anti-alias filter.
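The following sketch applies this 11-tap filter with SciPy. The tap values are those listed above; the separable row-then-column application and the border handling are our assumptions.

```python
import numpy as np
from scipy.ndimage import convolve1d

# 11-tap low-pass anti-alias filter from Section 3.2.1
AA_TAPS = np.array([-0.0017, -0.0116, -0.0062, 0.0849, 0.2579, 0.3533,
                    0.2579, 0.0849, -0.0062, -0.0116, -0.0017])

def antialias(image):
    """Apply the 11-tap FIR low-pass filter separably (rows, then columns).

    A minimal sketch assuming the 2-D filter is the separable product of
    the 1-D taps, which is one plausible reading of the implementation.
    """
    tmp = convolve1d(image.astype(np.float64), AA_TAPS, axis=1, mode='mirror')
    return convolve1d(tmp, AA_TAPS, axis=0, mode='mirror')
```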

3.2.2 Interpolation aperture enlargement

Extending the number of pixels on which the interpolation is performed in the XL method, i.e. increasing the order of the interpolation, may result in a better performance. An extension of the interpolation aperture in both the horizontal and vertical directions, with 12 pixels in total, is hereby proposed. It leads to an aperture that is the best approximation of a circle, as shown in Figure 3.8.

Figure 3.8: Spatial interpolation aperture enlargement in XL. LEFT: the extended interpolation aperture. RIGHT: the corresponding aperture used in the LMS optimisation on the SD grid.

During the LMS optimisation, the eight newly added pixels are six pixels away from the center pixel in the horizontal or vertical direction. Using these distant pixels in the LMS optimisation may degrade the performance of the proposed implementation. A more compact interpolation aperture is therefore recommended: using four original low-resolution pixels and eight interpolated pixels, as illustrated in Figure 3.9 (LEFT), and reducing the distance between the pixels of the aperture used in the interpolation, as illustrated in Figure 3.9 (RIGHT), to prevent a mismatch between the angles of the new pixels' positions and those used in the interpolation.

In the first interpolation step, the center HD pixel is calculated from four SD (F_{SD}) and eight intermediate (F_{IM}) pixels in the interpolation aperture as follows:

F_{HI}(2(i+1), 2(j+1)) = \sum_{k=0}^{1} \sum_{l=0}^{1} w_{2k+l}\, F_{SD}(2(i+2k), 2(j+2l))
  + \sum_{k=0}^{1} \sum_{l=0}^{1} w_{4+2k+l}\, F_{IM}(2(i+4k-1), 2(j+2l))
  + \sum_{k=0}^{1} \sum_{l=0}^{1} w_{8+2k+l}\, F_{IM}(2(i+2k), 2(j+4l-1)) \qquad (3.12)

The eight intermediate pixels used for the interpolation do not belong to the SD grid; they can be interpolated using a simple bilinear interpolator.

Figure 3.9: Compact spatial interpolation aperture enlargement in XL. LEFT: the extended interpolation aperture. RIGHT: the corresponding aperture used in the LMS optimisation on the SD grid.

The autocorrelation matrix, or covariance matrix, C for calculating the 12 coefficients will have size M² × 12, where the kth row is composed of the 12 pixel values at the corresponding positions in Figure 3.9 (RIGHT).

3.2.3 Temporal optimisation window extension

The XL method offers the option to enlarge the optimisation window spatially in an image in order to improve robustness. However, the correlation between pixels usually decreases quickly with spatial distance, affecting the accuracy of the method. Taking into account, besides an optimisation window in the current frame, also one in the previous frame and one in the next frame will compensate for this decrease. The pixel sets added to the LMS optimisation in the temporal direction must take the motion into account; therefore, their positions must be compensated by a motion vector. In the current implementation, the motion estimation is accomplished using the "3-D Recursive Search Block Matcher" algorithm [97].

Denoting the motion vector for the pixel at position \vec{x} in the current frame n as \vec{D}(\vec{x}, n), the pixel positions in frames n−1 and n+1 become

F(\vec{x} - \vec{D}, n-1) \approx F(\vec{x}, n), \qquad F(\vec{x} + \vec{D}, n+1) \approx F(\vec{x}, n) \qquad (3.13)

Figure 3.10 shows a block diagram of the proposed temporal window enlargement algorithm. The previous high-resolution image F_{HI}(n−1) and the next high-resolution image F_{HI}(n+1) are required for this enlargement. The latter can be obtained by bilinear interpolation on the next SD image F_{SD}(n+1).
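A minimal sketch of the motion-compensated pixel fetch of Equation 3.13 is given below. The function name `temporal_samples` is hypothetical, integer-pel motion vectors are assumed for simplicity, and the motion estimator itself (3-D Recursive Search in the thesis) is outside the sketch.

```python
import numpy as np

def temporal_samples(prev_frame, next_frame, x, y, dx, dy):
    """Fetch the motion-compensated counterparts of pixel (y, x) in the
    previous and next high-resolution frames (Equation 3.13), a sketch.

    (dx, dy) is the motion vector D(x, n) of the current pixel; positions
    are clipped to the frame borders.
    """
    h, w = prev_frame.shape
    # Previous frame: position x - D; next frame: position x + D.
    yp, xp = int(np.clip(y - dy, 0, h - 1)), int(np.clip(x - dx, 0, w - 1))
    yn, xn = int(np.clip(y + dy, 0, h - 1)), int(np.clip(x + dx, 0, w - 1))
    return prev_frame[yp, xp], next_frame[yn, xn]
```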


Figure 3.10: Temporal optimisation window extension for the XL resolution up-conversion method.

3.2.4 A single-pass XL algorithm

Let us first define a new relation between the HD and the SD pixel grids. As shown in Figure 3.11, the grey HD pixel is interpolated using:

F_{HI}(2(i+1), 2(j+1)) = \sum_{k=0}^{1} \sum_{l=0}^{1} w_{2k+l}\, F_{SD}(2(i+2k)+1, 2(j+2l)+1) \qquad (3.14)

This is the equation to interpolate the HD pixel at position (2(i+1), 2(j+1)). The other HD pixels, at positions (2(i+1), 2(j+2)), (2(i+2), 2(j+1)) and (2(i+2), 2(j+2)), are interpolated in a similar way [98].

The advantage of this new relation is that we can accomplish the interpolation in one pass, rather than in the two passes of the original XL algorithm. However, in detailed areas the estimation of the coefficients will be inaccurate, because the distance between the SD pixels has to be large to preserve the geometrical similarity between the HD pixels and SD pixels. Figure 3.11 shows how to solve this problem with an additional grid using interpolated intermediate pixels. The geometric characteristics are now preserved, while the pixel distance is cut in half. The choice of such an additional grid here is rather arbitrary; many other options exist and more research is required to find the optimal solution. The pixels on the additional grid can be obtained by linear interpolation, using, for instance, a bilinear or B-spline interpolation filter. We emphasize that the interpolated pixels are used only for the LMS estimation of the coefficients and not for the interpolation itself. To alleviate the computational burden, simple methods, like bilinear interpolation, can be used, especially when low-pass anti-alias filtering prior to the LMS optimisation is adopted.


Figure 3.11: New relation between the HD and the SD pixel grids for interpolation. The white pixels represent pixels on the HD grid, while the black ones are pixels on the SD grid. The grey position is the additional pixel, interpolated using a bilinear method, used for the LMS estimation.

The modifications of the XL method mentioned in this section can be used in this single-pass method. In particular, spatial aperture enlargement and low-pass anti-alias filtering on the SD grid prior to the LMS optimisation provide considerable improvements. Figure 3.12 depicts spatial aperture enlargement based on the new HD-SD grid relation, applied to a cross aperture or a 3 × 3 aperture. With the cross aperture, the current HD pixel is interpolated from the four SD pixels (in black) and eight additional pixels (in light grey):

F_{HI}(2(i+1), 2(j+1)) = \sum_{k=0}^{1} \sum_{l=0}^{1} w_{2k+l}\, F_{SD}(2(i+2k)+1, 2(j+2l)+1)
  + \sum_{k=0}^{1} \sum_{l=0}^{1} w_{4+2k+l}\, F_{IM}(2(i+2k)-1, 2(j+2l))
  + \sum_{k=0}^{1} \sum_{l=0}^{1} w_{8+2k+l}\, F_{IM}(2(i+2k), 2(j+2l)-1) \qquad (3.15)

Similarly, the HD pixel in the 3 × 3 aperture can be calculated as follows:

F_{HI}(2i, 2j) = \sum_{k=-1}^{1} \sum_{l=-1}^{1} w_{3k+l+4}\, F_{IM}(2(i+2k)+1, 2(j+2l)+1) \qquad (3.16)


Figure 3.12: Spatial aperture enlargement for interpolation, using 12 pixels in a cross aperture (LEFT) or nine pixels (RIGHT). In both cases, the eight pixels in light grey are additional interpolated pixels.

3.2.5 Results and evaluation

Table 3.2 gives the MSE of the various up-conversion methods modified from XL. Overall, the combination of spatial interpolation aperture enlargement using the cross aperture and low-pass anti-alias filtering gives the lowest MSE scores. Temporal extension of the optimisation window combined with low-pass anti-alias filtering does not bring an MSE reduction. Spatial interpolation aperture enlargement alone results in an increase of the MSE score; however, when combined with low-pass anti-alias filtering, it gives the minimum MSE score among the modified two-pass XL methods. The single-pass XL method with cross aperture and low-pass anti-alias filtering gives the overall lowest MSE score; compared with the original XL method, it reduces the MSE by 15%. Temporal optimisation window extension does not result in a reduction of the MSE score, but it slightly decreases the required optimisation window size.

Figures 3.13 and 3.14 present images obtained from the original XL method and the various proposed modifications. The images are up-converted from real SD material, not the down-scaled material we used in the objective evaluation. Differences are visible in highly detailed areas; in areas that contain distinct edges, the performance is very close. Temporal optimisation window extension with a four-pixel interpolation aperture produces artefacts along the boundary of foreground and background objects. Spatial interpolation aperture enlargement alone improves the sharpness of the up-converted image, but introduces artefacts in some highly detailed areas. However, when combined with anti-alias filtering, the artefacts are less pronounced while the image sharpness is not degraded.


Figure 3.13: An image portion from up-converted Bicycle using: (A) the original XL method, (B) XL with low-pass anti-alias filtering, (C) XL with temporal optimisation window extension, (D) XL with spatial interpolation aperture enlargement, (E) XL with spatial interpolation aperture enlargement and low-pass anti-alias filtering, (F) XL with spatial interpolation aperture enlargement, low-pass anti-alias filtering and temporal optimisation window extension, (G) single-pass XL with 3 × 3 aperture, (H) single-pass XL with 3 × 3 aperture and low-pass anti-alias filtering, (I) single-pass XL with cross aperture and low-pass anti-alias filtering, (J) single-pass XL with cross aperture, low-pass anti-alias filtering and temporal optimisation window extension.


Figure 3.14: An enhanced version of Figure 3.13. To compensate for the low-pass characteristic of the printing process, we enhanced the images using a 2D linear symmetrical peaking filter, so that the reader may perceive the differences among the various methods.


Table 3.2: MSE scores on individual sequences up-converted with various LMS filtering approaches: (A) original XL, (B) XL with low-pass anti-alias filtering, (C) XL with temporal optimisation window extension, (D) XL with temporal aperture extension and low-pass anti-alias filtering, (E) XL with spatial interpolation aperture enlargement (cross shape), (F) XL with spatial aperture enlargement (cross shape) and low-pass anti-alias filtering, (G) XL with spatial aperture enlargement (cross shape) and temporal aperture extension, (H) XL with spatial aperture enlargement (cross shape), low-pass anti-alias filtering and temporal aperture extension, (I) single-pass XL with 2 × 2 aperture and low-pass anti-alias filter, (J) single-pass XL with 3 × 3 aperture, (K) single-pass XL with cross aperture and low-pass anti-alias filtering, (L) single-pass XL with cross aperture, low-pass anti-alias filtering and temporal aperture extension. M is the number of pixels on each side of the optimisation window.

MSE  M  Bicycle  Football  Kiel    Siena   Tokyo
A    8   70.46    74.53   209.33   90.98   66.71
B    8   71.06    70.12   200.03   88.09   63.75
C    6   76.40    73.15   208.46   94.35   67.76
D    6   80.62    72.92   208.52   95.55   66.81
E    8   81.90    83.19   222.99   97.62   75.61
F    8   66.72    64.70   209.14   85.93   57.31
G    6   80.32    81.94   224.06  100.15   77.51
H    6   68.81    65.94   211.45   89.25   58.66
I    8  102.28    91.00   239.81  110.88   79.62
J    7   66.80    58.60   199.74   82.10   51.50
K    8   54.15    59.57   194.61   78.12   52.05
L    6   56.33    65.56   205.84   81.70   53.80

The single-pass XL method does not show advantages over the original XL method when using four pixels for interpolation. When combined with spatial interpolation aperture enlargement and low-pass anti-alias filtering, however, the single-pass XL method gives up-converted images with the highest sharpness and the fewest artefacts, which results in the best subjective image quality. With the enlarged 12-pixel interpolation aperture, temporal optimisation window extension does not bring visual differences. Alias along the borders of objects is suppressed. The single-pass XL method using the 3 × 3 aperture (G) results in fewer artefacts in texture areas when compared with the use of the cross aperture (I), while the latter gives better results along distinct edges. In particular, the single-pass XL method using the 3 × 3 aperture can be used alone, without the low-pass anti-alias filter, to obtain a sharper image.

Combining the results of the objective evaluation and the subjective screen-shots shown in Figure 3.13, we can conclude that using 12 pixels in a cross aperture for interpolation can effectively suppress aliasing. This is particularly evident when we up-scale images that have been down-scaled using the model shown in Section 2.3.1.


Figure 3.15: Interpolator design with built-in sharpening.

Superior results are obtained when alias-free material is up-converted using a 3 × 3 interpolation aperture.

3.3 Subjectively optimal LMS filters

From the evaluation of the various video up-scaling and up-conversion methods in Chapter 2, JT's method, which aims at improving the sharpness of the up-scaled HD image, is preferred most in the subjective evaluation. When we view sharpening and blurring as mutually inverse operations, it is also possible to design classification-based LMS filters for an optimal subjective impression.

3.3.1 Classification-based LMS filter

To improve the sharpness of an up-converted image, image sharpening techniques like luminance transient improvement [89] can always be used after the up-scaling or up-conversion. Although such post-processing enhances details, it inherently amplifies the noise. In [99], Atkins proposed an interpolator design with built-in sharpening, as shown in Figure 3.15. The training is performed between the sharpened HD image and the down-scaled original HD image.

Figure 3.16: Classification-based LMS sharpness-enhancement training between ideal images and degraded counterparts.


Figure 3.17: Low-pass filter for image spatial blur.

In either case, the performance of the overall system will highly depend on the sharpening technique. In our work, a classification-based LMS filter for image sharpening and noise reduction is first designed. Training with ideal images and their degraded counterparts is performed to obtain such a classification-based LMS filter (Figure 3.16). The outcome of the training is in essence a classification-based restoration Wiener filter. When applied to an ideal, non-degraded image, such a restoration Wiener filter becomes a classification-based sharpening and noise reduction filter. Therefore, it can be used as a post-processing filter to improve the sharpness of an up-converted image, or as a sharpening filter for the training system shown in Figure 3.15.

In the experiments, the spatial blur in Figure 3.16 can be accomplished using a 5 × 5 low-pass filter with the coefficients of Figure 3.17. The Additive White Gaussian Noise (AWGN) has a level of 18. The aperture used for the classification-based sharpness-enhancement LMS filters is shown in Figure 3.18. By re-labelling the pixels there, the classification-based sharpness-enhancement filters can be formulated as:

F_F(i, j) = \sum_{k=0}^{12} w_{k,c}\, F_k \qquad (3.17)

Here, F_F(i, j) is the filtered output, F_k denotes the kth input pixel and w_{k,c} are the filter coefficients, which depend on the class to which the current image block belongs.
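A sketch of the classified filtering of Equation 3.17 is given below. The 1-bit ADRC coding against the mid-point of the local dynamic range is the usual formulation; the helper names, the pixel ordering and the LUT layout are our assumptions, with the LUT assumed to come from the offline training of Figure 3.16.

```python
import numpy as np

def adrc_1bit(pixels):
    """1-bit ADRC class code: each aperture pixel is coded 0/1 against the
    mid-point of the local dynamic range; the bits are then concatenated."""
    f_min, f_max = pixels.min(), pixels.max()
    bits = pixels >= (f_min + f_max) / 2.0
    code = 0
    for b in bits:
        code = (code << 1) | int(b)
    return code

def classified_filter(aperture_pixels, lut):
    """Equation 3.17, a sketch: fetch the per-class coefficients w_{k,c}
    from the LUT and take the inner product with the 13 diamond-aperture
    pixels (F_0 ... F_12)."""
    c = adrc_1bit(aperture_pixels)
    return float(np.dot(lut[c], aperture_pixels))
```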

Instead of using the training strategy proposed by Atkins, we can also use the training method shown in Figure 3.19 as an alternative.


Figure 3.18: Diamond-shaped aperture for the classification-based sharpness-enhancement LMS filters.

With this training method, the input images are first up-converted using standard methods, like TK's classification-based up-conversion filter, followed by a classification-based sharpening and noise reduction filter. The training between the up-converted images and the input images employs an LMS algorithm, which results in the classification-based subjectively optimal up-conversion filter.

To avoid the overshoots generated by the classification-based subjectively optimal up-conversion filter, the interpolated result is clipped at the maximum F_{MAX} and minimum F_{MIN} values defined as follows:

DR = F_{max} - F_{min} \qquad (3.18)
F_{MAX} = F_{max} + 0.2 \times DR \qquad (3.19)
F_{MIN} = F_{min} - 0.2 \times DR \qquad (3.20)

Here, F_{max} and F_{min} are the maximum and minimum luminance values in the current SD interpolation aperture, which is 3 × 3 in the current implementation.
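In code, the clipping of Equations 3.18-3.20 is a one-liner around the local dynamic range; the function name below is hypothetical.

```python
import numpy as np

def clip_overshoot(value, aperture):
    """Clip an interpolated pixel to [F_MIN, F_MAX] (Equations 3.18-3.20),
    a minimal sketch; `aperture` holds the 3x3 SD interpolation aperture."""
    f_min, f_max = aperture.min(), aperture.max()
    dr = f_max - f_min
    return np.clip(value, f_min - 0.2 * dr, f_max + 0.2 * dr)
```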

Figure 3.19: Sharpness-enhancement training between ideal images and degraded counterparts.


Figure 3.20: Diamond-shaped aperture for subjectively optimal LMS filtering.

3.3.2 Localized LMS filter

Similar to the classification-based approach, which performs LMS estimation between SD images and sharpness-enhanced HD images, we can also integrate the subjective LMS filtering into the localized XL approach.

In the original XL method, the interpolation coefficients are estimated on the SD image grid, which provides the same geometry relationship. Therefore, by blurring the neighbouring pixels on the SD grid (or additional grid) while leaving the center pixels untouched, we can obtain a subjectively optimal localized LMS filter. Taking the left part of Figure 3.11 as an example, the SD grid or the additional grid is low-pass filtered using the spatial blur filter shown in Figure 3.17 to obtain the blurred neighbouring pixels F′(2(i−2)+1, 2(j−2)+1), F′(2(i−2)+1, 2(j+6)+1), F′(2(i+6)+1, 2(j−2)+1) and F′(2(i+6)+1, 2(j+6)+1). Here, F′ denotes pixels from the blurred SD grid. Referring to Equation 2.41, the pixel values in the autocorrelation matrix C are replaced by the pixel values from the low-pass filtered SD grid (or additional grid).

In the real implementation, we select the 3 × 3 aperture for this subjectively optimal localized LMS approach. Although the cross aperture gives the best results in the objective evaluation, it also produces severe artefacts in image parts containing vertical or horizontal high frequencies. The clipping method of Equations 3.18-3.20 can be used to avoid possible overshoots.

In Figures 3.21 and 3.22, we give screen-shots of portions from the up-converted Bicycle and Siena sequences, respectively, using the various subjectively optimal up-conversion approaches.


The up-converted image using the classification-based subjectively optimal LMS filtering approach is sharper than that of the localized LMS approach. Most viewers confirm that using the diamond-shaped aperture shown in Figure 3.20 results in up-converted images (E) with slightly better perceived sharpness compared to the same approach with a traditional 3 × 3 aperture (D), although the difference is almost smeared out in the printed image.

3.4 Conclusion

In this chapter, we first compared the two approaches that apply Wiener filtering in the area of image and video up-conversion, i.e. the localized LMS optimisation method from Li and the classification-based LMS filtering approach from Kondo.

We found that the performance of the XL method can be improved by using low-pass anti-alias filtering and spatial interpolation aperture enlargement. Combined with the results shown in Table 3.1, this indicates that for up-conversion methods based on Wiener filtering, a 3 × 3 or cross aperture gives the best performance/price ratio. The accuracy of the XL method with an enlarged interpolation aperture is sensitive to noise and high-frequency details; consequently, low-pass anti-alias filtering prior to the local LMS optimisation can improve the accuracy. The extension of the optimisation window in the temporal domain does not lead to a performance improvement in the up-converted image, which may be due to inaccuracy in the motion estimation. A symmetrically shaped interpolation aperture leads to a better performance than non-symmetrical ones, i.e. the original XL, because the former retains consistency throughout the interpolation process. All these adaptations improve the quality of the up-converted image, at the cost of increased computational effort.

The classification-based global LMS optimisation with a 3 × 3 aperture gives a better performance/price ratio than the other LMS-filtering-based up-conversion methods. Off-line LMS optimisation improves the computational efficiency and provides additional flexibility. The classification-based global LMS optimisation can also be used to build classification-based sharpening filters, i.e. the classification-based subjectively optimal LMS filter.

Image quality remains a subjective issue and no objective measure can reliably reflect the subjective impression. However, LMS-based filter design for image up-conversion based on the objective MSE metric can effectively avoid heuristic approaches in the optimal filter design.


Figure 3.21: An image portion from up-converted Bicycle using: (A) the standard TK method, (B) the JT method, (C) the standard TK method with classification-based sharpness enhancement, (D) classification-based subjectively optimal up-conversion using a 3 × 3 aperture, (E) classification-based subjectively optimal up-conversion using the diamond-shaped aperture, (F) subjectively optimal up-conversion based on a 3 × 3 aperture using the localized approach.


Figure 3.22: An image portion from up-converted Siena using: (A) the standard TK method, (B) the JT method, (C) the standard TK method with classification-based sharpness enhancement, (D) classification-based subjectively optimal up-conversion using a 3 × 3 aperture, (E) classification-based subjectively optimal up-conversion using the diamond-shaped aperture, (F) subjectively optimal up-conversion based on a 3 × 3 aperture using the localized approach.


Chapter 4

Video De-interlacing

For resolution enhancement, besides up-scaling, de-interlacing techniques attract a lot of research interest as well. The traditional Standard Definition Television (SDTV) broadcasting system employs the interlaced scheme to save transmission bandwidth. However, modern display principles and the introduction of video in personal computers (PCs) require de-interlacing techniques to display traditional interlaced video material with progressive scanning [9]. Moreover, some of the High Definition Television (HDTV) standards are progressive in nature, thus requiring de-interlacing techniques to complete the interlaced-SD to progressive-scan-HD format conversion.

Various de-interlacing techniques have been proposed in the last few decades, and new methods are still being investigated. Previous overviews on de-interlacing [42, 23] categorize de-interlacing methods into non-motion-compensated methods, motion-compensated (MC) methods and hybrid methods. As shown in Figure 4.1, the non-MC methods include linear techniques, such as spatial filtering, temporal filtering and vertical-temporal filtering, and non-linear techniques, like motion-adaptive filtering, edge-dependent interpolation and methods with implicit adaptation. The MC category includes temporal backward projection, time-recursive de-interlacing, adaptive recursive de-interlacing, generalised-sampling-theorem-based de-interlacing, etc. Non-MC methods are used in consumer products that require a reasonable performance at relatively low cost. The motion-compensated methods provide better quality, but because complex motion estimation is required, they are only used in high-end consumer and professional products.

It has been shown before that intra-field de-interlacing algorithms can be adapted for image up-scaling [100]. The essence of this adaptation is that the image is "de-interlaced twice", i.e. once in the horizontal and once in the vertical direction. With the same concept, intra-field de-interlacing can be thought of as a particular case of spatial up-scaling, which only up-scales the image in the vertical dimension.


Figure 4.1: Tree structure of the de-interlacing techniques.

Therefore, intra-field de-interlacing can be achieved with methods that are adapted from the currently known advanced spatial up-scaling techniques.

In this chapter, we shall first extend the two LMS filtering approaches from the field of up-scaling into the field of video de-interlacing [101]. Kondo's (TK) approach [74, 102] designs LMS filters through a global Mean Square Error (MSE) minimization based on image content classification. Li's (XL) approach [79] optimises the LMS filtering within the local neighbouring area of the interpolation. Atkins's (CBA) image up-scaling method [75] is also based on image content classification; we extended it to the field of video de-interlacing as a reference.

Apart from this direct adaptation, in the following sections we will show that a similar methodology of classification-based LMS filtering can be applied to other de-interlacing problems. Interpreting the de-interlacing problem as finding the optimal filter coefficients based on the Least Mean Square criterion leads to a performance improvement of the vertical-temporal (VT) filtering method [103][104]. A further extension of LMS filtering will provide a novel solution to the data-mixing problem of hybrid de-interlacing methods. With these extrapolations, we can extend the classification-based LMS filtering techniques to a large variety of de-interlacing applications [105].

4.1 Content-adaptive intra-field de-interlacing

The formal definition of the de-interlaced video frame is as follows:

F_o(\vec{x}, n) = \begin{cases} F(\vec{x}, n), & y \bmod 2 = n \bmod 2 \\ F_i(\vec{x}, n), & \text{otherwise} \end{cases} \qquad (4.1)


Figure 4.2: The aperture of TK. The output results from a content-adaptive FIR filter with interpolation aperture B. The coefficients of the FIR filter come from a LUT, which is addressed by applying ADRC to the classification aperture A. A and B are apertures on the interlaced grid.

with \vec{x} = (x, y)^T designating the spatial position. F(\vec{x}, n) is the luminance value of the pixel at position \vec{x} in input field number n, F_i(\vec{x}, n) is an interpolated pixel and F_o(\vec{x}, n) is an output pixel.

One direct way of adapting advanced 2D up-scaling techniques for de-interlacing is to down-sample the image in the horizontal dimension by a factor of two after a factor-of-two 2D up-scaling. However, this straightforward transformation results in an inefficient implementation. We propose a more elegant procedure, where we adopt the essence of the particular up-scaling algorithm and then adapt it to the de-interlacing problem. In the following, we will present the details of how to adapt the up-scaling algorithms for de-interlacing.

4.1.1 Kondo's classification-based LMS filtering approach for de-interlacing

Kondo has presented a 525i (525-line interlaced) SD signal to 525p (525-line progressive) or 1050i (1050-line interlaced) HD signal conversion method using classification-based LMS filters [102]. Strictly speaking, the interlaced-to-progressive conversion method described in [102] is not equivalent to the de-interlacing problem, since resolution up-conversion in the horizontal domain is included as well. However, it is straightforward to apply the classification-based LMS filtering method to the intra-field de-interlacing problem.

As shown in Figure 4.2, the output of the adapted method of Kondo (TK) is obtained from content-adaptive filtering. The coefficients of the adaptive FIR filter depend on the class to which the image block A belongs. The equation to calculate the output pixel is:

F_o(\vec{x}, n) = \begin{cases} F(\vec{x}, n), & y \bmod 2 = n \bmod 2 \\ \sum_{k,l} F(\vec{x} + k\vec{u}_y + l\vec{u}_x, n)\, h_c(k, l), & \text{otherwise} \end{cases} \qquad (4.2)

(k = -3, -1, 1, 3; \; l = -2, -1, 0, 1, 2)

where h_c(k, l) are the filter coefficients, c is the class index generated by applying adaptive dynamic range coding (ADRC) [84] to aperture A, \vec{u}_y = (0, 1)^T and \vec{u}_x = (1, 0)^T.

Analogous to the up-scaling method of Kondo, the optimal filter coefficients for each class, h_c(k, l), are obtained from a prior training process using both progressive sequences and corresponding interlaced sequences. The training process is based on the Least Mean Square (LMS) method, which minimizes the MSE. After training, the interpolation coefficients for each class are stored in a Look-Up-Table (LUT) and the result of the ADRC generates the address of this LUT.
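A sketch of such a per-class LMS training pass is given below; the sample format and the function name are our assumptions. Each class accumulates its own normal equations, which are solved once all training material has been visited.

```python
import numpy as np
from collections import defaultdict

def train_per_class(samples):
    """Per-class LMS training for the adapted TK de-interlacer, a sketch.

    `samples` yields (class_code, aperture_vector, target_pixel) triples
    gathered from progressive training material and its interlaced version.
    Per class we accumulate A = sum x x^T and b = sum x*y and solve the
    normal equations, giving the LMS-optimal coefficients for that class.
    """
    A = defaultdict(lambda: None)
    b = {}
    for c, x, y in samples:
        if A[c] is None:
            A[c] = np.outer(x, x)
            b[c] = y * x
        else:
            A[c] += np.outer(x, x)
            b[c] += y * x
    # lstsq tolerates near-singular classes with few training samples
    return {c: np.linalg.lstsq(A[c], b[c], rcond=None)[0] for c in A}
```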

When coding with ADRC, the number of pixels in the classification aperture has to be limited to keep the system affordable. The interpolation aperture has much less effect on the cost, since it affects the size of the LUT only proportionally to the size of the classification aperture. Therefore, in our experiments, different apertures have been used for interpolation and classification. As shown in Figure 4.2, a 5 × 2 classification aperture and a 5 × 4 interpolation aperture on the interlaced grid are adopted as a compromise between performance and cost.

4.1.2 Li's localized LMS filtering approach for de-interlacing

Unlike TK, Li's method (XL) [79] performs the Least Mean Square (LMS) optimisation of the interpolation coefficients on the fly. When adapted to de-interlacing, a symmetric aperture with six interlaced pixels is proposed:

F_o(\vec{x}, n) = \begin{cases} F(\vec{x}, n), & y \bmod 2 = n \bmod 2 \\ \sum_{k,l} F(\vec{x} + k\vec{u}_y + l\vec{u}_x, n)\, w(k, l), & \text{otherwise} \end{cases} \qquad (4.3)

(k = -1, 1; \; l = -0.5, 0, 0.5)

As demonstrated in Figure 4.3, six pixels on the interlaced grid are used to calculate the center pixel of the interpolation aperture A: two original pixels from the interlaced grid and four additional sub-pixels, which are interpolated using an FIR half-band interpolation filter in the horizontal domain. Those additional sub-pixels are calculated to keep the geometric duality between the pixels in the interpolation aperture A and the pixels in aperture C.


Figure 4.3: Aperture used in the adapted XL method for de-interlacing. A is the aperture with the six interlaced or additional pixels involved in the interpolation. B is the aperture with the interlaced pixels used to calculate the interpolation coefficients. C is the aperture that includes the neighbours of the interlaced pixels in B. Here, F_{ij} is a shorthand notation for F_i(i, j), etc.

The distance between the center pixel and its six neighbouring pixels (F00, F01, F02, F20, F21, F22) in aperture C is two times the corresponding pixel distance in the interpolation aperture A. With this structure of the interpolation aperture, we can keep the computational complexity relatively low while avoiding pixels that are far away from each other, which leads to a more accurate estimation of the interpolation coefficients. Adding extra pixels outside of aperture A for the interpolation does not bring much performance gain, but increases the computational cost.

Simpler interpolation methods, like bilinear interpolation, can be used as an alternative for calculating the intermediate pixels with lower computational complexity.

Referring to Figure 4.3 and denoting M as the pixel set on the interlaced grid in image block B, the Summed Square Error (SSE) over set M is the summed square difference between the original interlaced pixels and the interpolated pixels:

SSE = \|\vec{p} - C\vec{w}\|^2 \qquad (4.4)

Here, \vec{p} contains the interlaced pixels in block B (pixels F_I(k, l), with k = 1, ..., 5; l = 1, ..., 4) and C is a kl × 6 matrix whose mth row contains the six neighbouring pixels of the mth interlaced pixel in \vec{p}. The weighted sum of each row determines an interpolated pixel F_{IP}.


The weighting coefficients are then derived using:

\frac{\partial(SSE)}{\partial \vec{w}} = 0 \qquad (4.5)

-2C^T\vec{p} + 2C^T C\,\vec{w} = 0 \qquad (4.6)

\vec{w} = (C^T C)^{-1}(C^T \vec{p}) \qquad (4.7)

Vertical averaging is used in smooth areas, not only to avoid the situation where the LMS equation is unsolvable, but also to alleviate the computational load.

In the tested implementation, a 10 × 11 aperture on the interlaced grid, centered around the interpolated pixel, is used for optimisation, i.e. as the window M within which the LMS optimisation is performed. Unlimited extension of the optimisation aperture would lead to a global optimisation without classification. The current choice of the optimisation window is a compromise between interpolation accuracy and computational cost.
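The on-the-fly optimisation of Equations 4.4-4.7 reduces to a small least-squares solve per output pixel, sketched below with hypothetical names. The smooth-area fallback mirrors the vertical averaging mentioned above; its threshold and the weight ordering are assumptions.

```python
import numpy as np

def solve_local_coefficients(C, p):
    """Local LMS optimisation of Equations 4.4-4.7, a minimal sketch.

    C: (num_window_pixels, 6) matrix; row m holds the six neighbours of
       the m-th interlaced pixel in the optimisation window
    p: (num_window_pixels,) vector of the interlaced pixels themselves
    """
    CtC = C.T @ C
    # Near-singular normal equations indicate a smooth area: fall back to
    # vertical averaging. Placing the two vertical neighbours in the first
    # two columns of C is a hypothetical ordering choice.
    if np.linalg.cond(CtC) > 1e8:
        return np.array([0.5, 0.5, 0.0, 0.0, 0.0, 0.0])
    return np.linalg.solve(CtC, C.T @ p)
```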

4.1.3 Atkins' method adapted for de-interlacing

Atkins' (CBA) method [75] for image up-scaling uses the Expectation Maximization (EM) algorithm for image content classification, which results in a fixed, predetermined number of classes regardless of the size of the classification aperture. Different from Kondo's method, the local image content does not belong to a single class, but belongs to every class with a certain weight.

Our adapted CBA for de-interlacing uses the same classification and interpolation apertures as used in the adapted TK method, shown in Figure 4.2. The interpolated pixel results as a weighted sum of the outputs of M linear filters:

F_o(\vec{x}, n) = \begin{cases} F(\vec{x}, n), & y \bmod 2 = n \bmod 2 \\ \sum_j \sum_{k,l} \left(a_{j,kl}\, F(\vec{x} + k\vec{u}_y + l\vec{u}_x, n) + b_{j,kl}\right) p(j|\vec{y}), & \text{otherwise} \end{cases} \qquad (4.8)

(j = 0, 1, \ldots, M; \; k = -3, -1, 1, 3; \; l = -2, -1, 0, 1, 2)

where a_{j,kl} and b_{j,kl} are interpolation filter coefficients, and p(j|\vec{y}) are the weighting coefficients, calculated by comparing the classification vector \vec{y} with the representative vectors that each represent a class. The classification vector \vec{y} is derived from the classification aperture using the same non-linear transform as in the up-scaling:

\vec{y} = \begin{cases} \vec{y}'\, \|\vec{y}'\|^{-0.75}, & \vec{y}' \neq 0 \\ 0, & \text{else} \end{cases} \qquad (4.9)

Here, \vec{y}' is a ten-element vector constructed by stacking the 10 differences between the pixels in the classification aperture and a reference pixel, which in CBA is the central pixel of the 3 × 3 classification aperture. In our adaptation, which uses the same classification aperture as the adapted TK method, the reference pixel can be either the average of the 10 pixels in the classification aperture, or the average of the two vertically neighbouring pixels. In contrast with our adapted TK method, which uses the LMS algorithm in the training, CBA uses the Maximum Likelihood (ML) method for finding the optimal interpolation coefficients for each class.

4.2 Adaptive VT-filters with classification

Among the various linear de-interlacing techniques, the vertical-temporal (VT) filtering method is theoretically the best approach [42, 103].

The formal definition of the VT filtering approach for de-interlacing is given in [42, 23] as:

F_o(\vec{x}, n) = \begin{cases} F(\vec{x}, n), & y \bmod 2 = n \bmod 2 \\ \sum_{k,m} F(\vec{x} + k\vec{u}_y, n + m)\, h(k, m), & \text{otherwise} \end{cases} \qquad (4.10)

(k, m \in \ldots, -1, 0, 1, 2, \ldots)

The definitions of \vec{x} and \vec{u}_y are the same as in the earlier sections of this chapter. Different from the intra-field de-interlacing methods, the VT filtering approach uses both spatially and temporally neighbouring pixels in the interpolation. It is designed to take only the higher vertical frequency components from the neighbouring fields. To that end, the coefficients in each neighbouring field add up to zero.

Various apertures, i.e. combinations of spatio-temporally neighbouring pixels for interpolation, have been proposed. Weston proposed a three-field VT filter containing eight pixels (VT-8) in the interpolation aperture; the corresponding coefficients are [103]:

16\, h(k, m) = \begin{cases} 8, 8, & (k = -1, 1) \wedge (m = 0) \\ -1, 2, -1, & (k = -2, 0, 2) \wedge (m = -1, 1) \\ 0, & \text{otherwise} \end{cases} \qquad (4.11)

A three-field VT filter aperture with 14 pixels (VT-14) has also been proposed in [103]. Figure 4.4 illustrates these two filters. Extension of the filter aperture in the vertical direction increases the hardware cost, since more line memories are required.
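The VT-8 coefficients of Equation 4.11 translate directly into code. In the sketch below the fields are assumed to be stored as full frames in which only the transmitted lines are valid, and border handling is omitted for brevity; the function name is hypothetical.

```python
def vt8_interpolate(prev_f, cur_f, next_f, y, x):
    """Weston's three-field VT-8 filter (Equation 4.11), a minimal sketch.

    prev_f, cur_f, next_f are fields n-1, n, n+1; (y, x) is the missing
    pixel, so lines y-1 and y+1 exist in the current field and lines
    y-2, y and y+2 exist in the neighbouring fields.
    """
    # Spatial taps: h(-1, 0) = h(1, 0) = 8/16
    spatial = (cur_f[y - 1, x] + cur_f[y + 1, x]) * (8 / 16)
    # Temporal taps in fields n-1 and n+1: (-1, 2, -1)/16 at k = -2, 0, 2;
    # these sum to zero, passing only higher vertical frequencies.
    temporal = sum((-f[y - 2, x] + 2 * f[y, x] - f[y + 2, x]) * (1 / 16)
                   for f in (prev_f, next_f))
    return spatial + temporal
```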

VT filtering for de-interlacing is capable of preventing alias and excessive blur in stationary and horizontally moving images. However, on sequences that contain vertical motion, VT filtering will introduce more visible artefacts.


Figure 4.4: LEFT: the three-field symmetrical aperture with eight pixels for both linear and content-adaptive VT filtering. RIGHT: the three-field symmetrical aperture with 14 pixels for linear VT filtering. In both cases, the numbers next to the original pixels are the linear VT-filter coefficients.

To prevent severe motion artefacts on sequences that contain vertical or fast motion, Selby proposed to compute an adaptively weighted average between VT filtering, vertical interpolation and temporal interpolation [106]. In the absence of motion, or for small motion (less than two pixels per frame), the mixing is performed between temporal interpolation and linear VT filtering, while in areas that contain large motion, vertical interpolation and linear VT filtering are combined to obtain the output. Theoretically, Selby's method will provide reasonable results when the motion detection is accurate, which is not simple to achieve on the interlaced scanning grid.

We propose to achieve the same goal without introducing motion detection, but rather by making an image content-adaptive VT filter based on classification. As shown in Figure 4.4, the adaptive VT filter uses two pixels in the spatial domain and six pixels from the temporal domain, which includes the previous and the next field. The interpolation function based on this aperture can be formulated as follows:

F_o(\vec{x}, n) = \begin{cases} F(\vec{x}, n), & y \bmod 2 = n \bmod 2 \\ \sum_{k,m} F(\vec{x} + k\vec{u}_y, n + m)\, h_c(k, m), & \text{otherwise} \end{cases} \qquad (4.12)

(m = -1, 0, 1; \; k = -2, -1, 0, 1, 2) \wedge ((k + m) \bmod 2 = 1)

Here, h_c(k, m) are the VT filter coefficients, which depend on the pattern of the local aperture; the pattern can be classified, as before, using ADRC. With the classification aperture shown in Figure 4.4, a 7-bit class code is generated, which results in 128 classes using ADRC and class inversion.

4.3 Classification and hybrid de-interlacing

Hybrid de-interlacing techniques are designed to combine the advantages of individual methods [107]-[110].


Figure 4.5: Training process of the adaptive data mixing method.

de-interlacing results $F_{i,j}(\vec{x}, n)$ $(j = 1, 2, \ldots, N)$. The hybrid output $F_o(\vec{x}, n)$ is then a weighted sum of those N candidates:

F_o(\vec{x}, n) =
\begin{cases}
F(\vec{x}, n), & y \bmod 2 = n \bmod 2 \\
\sum_{j=1}^{N} k_j F_{i,j}(\vec{x}, n), & \text{otherwise}
\end{cases}
\qquad (4.13)

Here, $F(\vec{x}, n)$ is the luminance value of the pixel at position $\vec{x}$ in input field number $n$, and $k_j$ $(j = 1, 2, \ldots, N)$ are the mixing filter coefficients associated with the corresponding de-interlacing methods. The performance of the hybrid method depends on the mixing coefficients $k_j$, given the individual input de-interlacing methods. Usually, the $k_j$ are determined experimentally using quality metrics of the corresponding de-interlacing algorithms and are difficult to optimise.

Classification-based LMS filtering can be applied to hybrid de-interlacing when we interpret the mixing problem as finding the optimal filter coefficients $k_j$, given certain input de-interlaced pixels $F_{i,j}(\vec{x}, n)$ and the corresponding quality metrics, or error indicators. Differing from the previous applications, the filtering aperture is composed of pixels that are candidates from the various interpolation methods.

In the classification-based data mixing method, the equation to interpolate the output pixel is slightly different from Equation 4.13:

F_o(\vec{x}, n) =
\begin{cases}
F(\vec{x}, n), & y \bmod 2 = n \bmod 2 \\
\sum_{j=1}^{N} k_{j,c} F_{i,j}(\vec{x}, n), & \text{otherwise}
\end{cases}
\qquad (4.14)

where $k_{j,c}$ are the weights for class $c$. For classification, adaptive dynamic range coding (ADRC) can be used. When the number of individual de-interlacing methods $N$ is small, a 2-bit encoding can


be used to improve precision. Each pixel $F$ is therefore encoded into a 2-bit value $Q$:

DR = F_{MAX} - F_{MIN} \qquad (4.15)

Q = \left\lfloor \frac{(F - F_{MIN}) \times 3}{DR} + 0.5 \right\rfloor \qquad (4.16)

Here, $DR$ is the dynamic range of the input pixels to be encoded, $F_{MAX}$ and $F_{MIN}$ are the maximum and minimum pixel values of the current input pixel group, and $\lfloor \cdot \rfloor$ is the floor operator. The concatenation of the encoded $Q$ values of all input pixels $F$ generates the class code $c$, which is used to address the LUT. In the data-mixing context for hybrid de-interlacing, the inputs used by ADRC to generate the class code are not pixels from the same image block in the traditional sense; they can be the output pixels of the individual de-interlacing methods, or the error indicators of each de-interlacing method.
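A direct transcription of this encoding, as a sketch; the input may be candidate pixels or error indicators, and the variable names are ours:

```python
import numpy as np

def adrc_2bit_class(values):
    """2-bit ADRC (Equations 4.15 and 4.16): quantize each input to
    Q in {0, 1, 2, 3} and concatenate the codes into one class code."""
    values = np.asarray(values, dtype=float)
    fmin, fmax = values.min(), values.max()
    dr = fmax - fmin                                              # Eq. 4.15
    if dr == 0:
        q = np.zeros(len(values), dtype=int)
    else:
        q = np.floor((values - fmin) * 3 / dr + 0.5).astype(int)  # Eq. 4.16
    code = 0
    for qi in q:
        code = (code << 2) | int(qi)
    return code

# Four error indicators -> an 8-bit class code (256 classes), as in the text.
print(adrc_2bit_class([12.0, 3.0, 7.5, 0.5]))
```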

Figure 4.5 illustrates the training process of the classification-based data mixing method for hybrid de-interlacing. A large set of progressive input video sequences is first interlaced, and then de-interlaced with the N de-interlacing methods to obtain the input pixels $F_{i,j}(\vec{x}, n)$ $(j = 1, 2, \ldots, N)$ for training. The input pixels are classified to generate the class code. The LMS optimisation is performed within each class to obtain the optimal mixing filter coefficients, which are stored in a look-up table (LUT) and can be addressed with the class code $c$ after training.
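The per-class optimisation itself is ordinary least squares, the closed-form solution of the LMS criterion. A minimal sketch, assuming the data-gathering step has already produced, per class, a matrix of candidate pixels and a vector of progressive reference pixels:

```python
import numpy as np

def train_mixing_lut(samples_per_class, n_methods, n_classes):
    """samples_per_class[c] = (X, t): X is an (S x N) matrix with the N
    candidate de-interlaced pixels of S training samples in class c, and
    t the vector of S reference (progressive) pixels.  Returns a
    (n_classes x N) LUT with the MSE-optimal mixing coefficients."""
    lut = np.zeros((n_classes, n_methods))
    for c, (X, t) in samples_per_class.items():
        # least-squares solution of X k = t for class c
        k, *_ = np.linalg.lstsq(X, t, rcond=None)
        lut[c] = k
    return lut
```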

4.4 Evaluation

A general evaluation of de-interlacing methods has been given in [23, 42], using the objective MSE metric. In [112], a subjective evaluation of the various de-interlacing techniques using paired comparison [91] is given. It reveals that, in the context of video de-interlacing, the objective and subjective scores are highly correlated within a certain range. Therefore, in this evaluation, we use the objective MSE metric for ranking the evaluated algorithms. Screen shots are also given to enable a subjective impression.

The MSE between the original progressive image and the de-interlaced image is calculated according to:

MSE = \frac{1}{N} \sum_{\vec{x}, n} \left( F_p(\vec{x}, n) - F_o(\vec{x}, n) \right)^2 \qquad (4.17)

Here, $F_p(\vec{x}, n)$ and $F_o(\vec{x}, n)$ are the luminance values of the original progressive sequences and the de-interlaced sequences, respectively, and $N$ represents the number of pixels in the calculation. We use 20 frames from each sequence in calculating the MSE, excluding the pixels from the border area, which was set to 30 pixels.
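Written out, with the frame count and the 30-pixel border from the text (the array layout is our assumption):

```python
import numpy as np

def deinterlace_mse(progressive, deinterlaced, border=30):
    """MSE of Equation 4.17 over a list of frame pairs (e.g. 20 frames),
    excluding a border area around each frame."""
    err, n = 0.0, 0
    for fp, fo in zip(progressive, deinterlaced):
        d = fp[border:-border, border:-border].astype(float) \
          - fo[border:-border, border:-border].astype(float)
        err += float(np.sum(d ** 2))
        n += d.size
    return err / n
```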


Figure 4.6: Screen shots from the progressive test sequences used for evaluation (Circle, Kiel, Siena, Football, Bicycle and Tokyo). The arrow indicates the direction of motion within each sequence.

4.4.1 Test sequences

The test set, used to evaluate the various methods, consists of six video sequences that differ in many aspects: the stationary sequence Circle, the globally horizontally moving sequence Tokyo, the globally vertically moving sequence Siena, the zooming sequence Kiel, and the locally moving sequences Football and Bicycle. Figure 4.6 gives a screen shot from each of the progressive sequences.

4.4.2 Results

Intra-field methods

Table 4.1 shows the MSE on the individual sequences de-interlaced by the various intra-field algorithms presented in Section 4.1. Here, we include Edge-Dependent De-interlacing (EDDI) as a reference for the edge-directed intra-field methods. We also include line averaging (LA) as a reference to show the gain compared to that very simple method.

The averaged objective scores of the various intra-field de-interlacing methods are close to each other. However, for each individual sequence, the MSE scores can vary with the de-interlacing method. The two classification-based methods, TK and CBA, show their ability to accurately filter finely detailed image structures. The EDDI method, which was designed to improve the accuracy of interpolation along near-horizontal edges, is rather dangerous in detailed image areas.


Table 4.1: MSE scores on individual sequences with various intra-field de-interlacing methods.

MSE        LA      TK      XL      CBA     EDDI
Bicycle    50.53   33.47   38.37   35.05   39.15
Circle     157.76  158.15  168.87  156.11  154.98
Football   42.33   38.23   40.11   38.77   40.09
Kiel       170.90  172.16  179.06  164.83  164.79
Siena      45.47   38.78   45.11   41.98   42.54
Tokyo      41.36   37.84   45.02   41.17   40.98
Average    84.73   79.77   86.09   79.65   80.42

This is reflected in the MSE scores on the Bicycle and Football sequences. On sequences like Circle and Tokyo, which contain mostly horizontal and vertical edges, TK, CBA, EDDI and LA score very close to each other in terms of MSE. XL performs worse on those two sequences because it introduces severe artefacts in image portions that contain a lot of horizontal or vertical detail. The CBA method performs close to TK on detailed image structures. However, on near-horizontal edges, it performs in between TK and EDDI. This ranks CBA overall in first place, but the difference with TK and EDDI is subtle. We believe that the classification method used in CBA can distinguish the various image patterns better than ADRC, but that its filtering process is not performed in an MSE-optimal way. The XL method gives a lower MSE than EDDI on sequences like Bicycle, but always gives higher scores than the two classification-based methods on all sequences.

Screen shots of image portions from the original Bicycle sequence and its de-interlaced counterparts using these intra-field de-interlacing methods are given in Figure 4.7. The subjective comparison confirms the MSE analysis above. The EDDI method gives the best de-interlacing result on near-horizontal edges, while it shows some weakness in the text parts. The TK and CBA methods, which interpolate the missing pixel based on image classification, produce staircase artefacts on near-horizontal edges. The two classification-based methods outperform EDDI in the text area. The XL method generates fewer staircases at near-horizontal edges than TK and CBA, but sometimes gives errors at purely horizontal edges.

It is interesting to compare the two approaches based on LMS filtering, the classification-based global TK approach and the localized XL approach, side by side. XL's localized LMS filtering approach works well with long distinct edges or simple image patterns, but mostly fails within high-frequency areas. This can be explained by the HD-SD model used in the XL method, which assumes that the pairs of pixels at the two resolutions are coupled along the same orientation.


Table 4.2: MSE scores on individual sequences with three inter-field de-interlacing methods.

MSE        VT-8    VT-14   Classification-based VT
Bicycle    57.34   62.63   47.00
Circle     78.09   61.90   2.15
Football   44.45   49.44   40.20
Kiel       179.26  198.40  173.60
Siena      70.49   92.45   63.27
Tokyo      15.85   10.97   8.45
Average    74.24   79.30   55.78

This assumption holds only for simple image patterns, or for image portions that contain low-frequency components. The problem of the TK method is that the size of the classification aperture is limited, which makes it fail on near-horizontal edges.

Adaptive vertical temporal filtering using the classification-based LMS filtering approach

Table 4.2 shows the MSE of the individual sequences de-interlaced by the linear VT filters and our proposed content-adaptive VT filtering. For the linear VT filtering, we implemented the two versions proposed by Weston [103] that were introduced above. The first is VT-8, using the VT filter shown at the left-hand side of Figure 4.4; the other is VT-14, using the VT filter illustrated at the right-hand side of Figure 4.4. For the linear VT filter, the vertical extension of the interpolation aperture increases the MSE score on vertically moving sequences (Siena). With the classification-based content-adaptive method, we obtain, on average, an MSE reduction of about 40 percent compared to the linear VT filters. On the stationary sequence Circle, the line-flicker effect is totally removed, resulting in an MSE value close to zero.

As an illustration, Figure 4.8 shows screen photographs of image portions from moving images de-interlaced with the linear VT filters and our proposed content-adaptive filters. It can clearly be seen that the content-adaptive VT filter effectively suppresses the motion artefacts, which obviously improves the perceived image quality. Real-time playback shows that flickering artefacts in the stationary image areas of the de-interlaced video sequences are totally removed.


Hybrid methods including motion compensation

The hybrid de-interlacing algorithm proposed by Nguyen [109] and Kovacevic [110], which mixes four methods, is used to benchmark the classification-based LMS filtering method for hybrid de-interlacing. The four individual de-interlacing methods are: line averaging (LA), edge-dependent interpolation, field averaging (FA) and MC field averaging. We use EDDI [111] for the edge-dependent interpolation and 2D GST [113] for the MC method.

The interpolated pixels from the individual methods are calculated as follows:

F_{iLA}(\vec{x}, n) = \frac{F(\vec{x} - \vec{u}_y, n) + F(\vec{x} + \vec{u}_y, n)}{2} \qquad (4.18)

F_{iFA}(\vec{x}, n) = \frac{F(\vec{x}, n - 1) + F(\vec{x}, n + 1)}{2} \qquad (4.19)

F_{iEDDI}(\vec{x}, n) = \frac{F(\vec{x} - \vec{u}_y + l\vec{u}_x, n) + F(\vec{x} + \vec{u}_y - l\vec{u}_x, n)}{2} \qquad (4.20)

F_{iGST}(\vec{x}, n) = \frac{F^{n,n-1}(\vec{x}, n) + F^{n,n+1}(\vec{x}, n)}{2} \qquad (4.21)

with $\vec{u}_x = (1, 0)^T$, $\vec{u}_y = (0, 1)^T$, and $l$ the edge orientation. $F^{n,n-1}(\vec{x}, n)$ is the result of the GST de-interlacing method based on the previous and current fields, and $F^{n,n+1}(\vec{x}, n)$ is the result based on the current and next fields. According to this mixing method, four error indicators ($\varepsilon_{LA}$, $\varepsilon_{EDDI}$, $\varepsilon_{FA}$ and $\varepsilon_{GST}$) are computed by calculating the absolute pixel difference, or the summed difference between groups of pixels, along the interpolation direction.

\varepsilon_{LA} = |F(\vec{x} - \vec{u}_y, n) - F(\vec{x} + \vec{u}_y, n)| \qquad (4.22)

\varepsilon_{FA} = |F(\vec{x}, n - 1) - F(\vec{x}, n + 1)| \qquad (4.23)

\varepsilon_{EDDI} = |F(\vec{x} - \vec{u}_y + l\vec{u}_x, n) - F(\vec{x} + \vec{u}_y - l\vec{u}_x, n)| \qquad (4.24)

\varepsilon_{GST} = |F^{n,n-1}(\vec{x}, n) - F^{n,n+1}(\vec{x}, n)| \qquad (4.25)

The corresponding weighting factors $k_q$, according to Kovacevic [110], are defined as:

k_q = \frac{\varepsilon_{LA}\, \varepsilon_{EDDI}\, \varepsilon_{FA}\, \varepsilon_{GST}}{\varepsilon_q \cdot SUM} \qquad (q \in \{LA, EDDI, FA, GST\}) \qquad (4.26)

with

SUM = \varepsilon_{LA}\varepsilon_{EDDI}\varepsilon_{FA} + \varepsilon_{LA}\varepsilon_{EDDI}\varepsilon_{GST} + \varepsilon_{LA}\varepsilon_{FA}\varepsilon_{GST} + \varepsilon_{EDDI}\varepsilon_{FA}\varepsilon_{GST} \qquad (4.27)
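Note that each $k_q$ is proportional to the product of the other three error indicators, so the candidate with the smallest error receives the largest weight, and the four weights sum to one. A sketch of the computation; the guard against zero-valued indicators is our addition:

```python
def mixing_weights(e, guard=1e-6):
    """Mixing weights of Equations 4.26 and 4.27.
    e maps 'LA', 'EDDI', 'FA', 'GST' to their error indicators."""
    eps = {q: max(v, guard) for q, v in e.items()}  # avoid division by zero
    prod = 1.0
    for v in eps.values():
        prod *= v
    total = sum(prod / eps[q] for q in eps)          # SUM of Eq. 4.27
    return {q: prod / (eps[q] * total) for q in eps}

# The most reliable candidate (smallest error) gets the largest weight:
print(mixing_weights({'LA': 4.0, 'EDDI': 1.0, 'FA': 8.0, 'GST': 2.0}))
```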


Table 4.3: MSE scores on individual sequences with various data-mixing methods for the hybrid de-interlacing algorithm: (A) the mixing method using coefficients calculated from the individual error indicators, (B) the classification-based mixing method that uses input pixels for classification, (C) the classification-based mixing method that uses error indicators for classification.

MSE        A       B       C
Bicycle    49.24   39.44   28.63
Circle     0.18    0.32    0.38
Football   38.69   31.13   27.85
Kiel       107.46  91.21   85.65
Siena      21.63   7.59    8.59
Tokyo      9.24    7.15    8.63
Average    37.74   29.47   26.62

We implemented the classification-based mixing algorithm for this hybrid de-interlacing problem. Instead of calculating the mixing coefficients using Equation 4.26, an off-line training was performed to obtain the optimal mixing coefficients, employing about 2000 frames of video sequences with a large variety of features. To obtain sufficient precision, we allocate two bits per input pixel or error indicator when applying ADRC to generate the class code. The encoding generates an 8-bit class code, leading to 256 classes in total.

The hybrid de-interlacing was performed on the interlaced test sequences with Equations 4.13 (with coefficients determined by Equation 4.26) and 4.14, respectively. The mean square error (MSE) was calculated between the de-interlaced video sequences and the progressive ones. Table 4.3 presents the MSE scores. Column A gives the results of the mixing algorithm proposed by Nguyen [109] and Kovacevic [110], which uses the error indicators to determine the mixing coefficients. Columns B and C show the results of the classification-based method, using input pixels and error indicators for classification, respectively. The results show that using the interpolated pixels from the individual de-interlacing methods for classification gives a reduction in the MSE score with respect to the mixing method proposed by Nguyen and Kovacevic. Using the error indicators for classification further improves the overall performance. However, more calculations are required to obtain the error indicators, although they are more coarsely quantized than in the method of Nguyen and Kovacevic.

To enable a subjective comparison of these methods, screen shots of portions of the Bicycle sequence are given in Figure 4.9. The top row shows that the classification-based mixing algorithm successfully removes artefacts in the text areas. In the areas that contain distinct long edges in all directions, and in the areas with a complex foreground and background (bottom part), the method that


performs classification using the interpolated pixels generates severe artefacts. Therefore, methods that perform classification using the error indicators are to be preferred.

4.5 Conclusion

De-interlacing is the key technology for merging traditional interlaced video formats with modern progressive, high-definition display requirements. With the introduction of LMS filtering into video de-interlacing, especially the classification-based LMS filtering approach, the filter design can be done without theoretical approximations or heuristic searching. Filters that are optimal with respect to the objective MSE criterion can be obtained with a slight increase in computational complexity.

Specifically, for the intra-field methods, the LMS filtering approach does not provide much performance gain. However, this does not prevent the LMS filtering approach from being an effective tool in intra-field de-interlacing filter design.

The classification-based LMS filtering approach can effectively improve the performance of the linear inter-field VT filter by introducing content adaptivity. The objective performance gain is about 50 percent on average. The classification-based LMS filter approach effectively alleviates the weakness of the standard VT filter on vertically moving and stationary sequences.

When applying the classification-based LMS filtering approach to data mixing for hybrid de-interlacing, the mixing coefficients from the convex space (as given by Equation 4.26) are extended to a wider solution range. Consequently, this method leads to the optimal solution according to the Minimum Mean Square Error (MMSE) criterion. The classification-based mixing method gives both a significant reduction in the MSE of the de-interlaced video sequences and a clear improvement of the subjective image quality.

The classification-based Wiener filtering approach is, after all, a linear method. This property provides relatively simple and robust solutions.


Figure 4.7: Image portions from the original Bicycle sequence and the de-interlaced ones using: line averaging (LA), the classification-based LMS filtering approach (TK), the localized LMS filtering approach (XL), the EM-based interpolation method (CBA) and the edge-dependent method (EDDI).


Figure 4.8: Screen shots of images de-interlaced with: A) VT filtering using the aperture shown in Figure 4.4 (LEFT), B) VT filtering using the aperture shown in Figure 4.4 (RIGHT), and C) the proposed content-adaptive VT filter using the classification and interpolation aperture shown in Figure 4.4 (LEFT).


Figure 4.9: Image portions from de-interlaced images using: (A) the mixing method using coefficients calculated from the individual error indicators, (B) the classification-based mixing method that uses input pixels for classification, (C) the classification-based mixing method that uses error indicators for classification.


Chapter 5

Chrominance Resolution Up-conversion

In video engineering, the chrominance signal is commonly sub-sampled to save transmission bandwidth or storage space. Different levels of chrominance (UV) sub-sampling lead to various Y:U:V signal formats, such as 4:2:2, 4:1:1 and 4:2:0. Figure 5.1 sketches the pixel structure of each format¹.

High-quality spatial video up-conversion techniques have been developed to portray SD signals on an HD display with improved, sharp luminance details. At present, various television manufacturers therefore incorporate advanced techniques for accurate resolution up-conversion of the luminance signal in high-end television sets. Thus far in this thesis, we have presented video resolution enhancement techniques for the luminance component of the video signal. However, additional image quality may be gained by applying similar methods to the chrominance components, U and V.

To achieve an up-converted HD chrominance signal that has the same resolution as the luminance, starting from the sub-sampled signals, a two-step approach is proposed. In the first step, up-conversion from the sub-sampled signal to a 4:4:4 format SD video signal is required to equalise the luminance and chrominance resolutions. In the second step, the resolution up-conversion techniques developed for the luminance signal can be applied directly to the chrominance signals of the 4:4:4 format video signal. Since the methods for the second step have been presented in the previous chapters, we focus on the method for the first step in this chapter.

The traditional linear interpolation methods that are typically used do not increase the sharpness of the edges in the chrominance signal. We propose to use the

¹This chapter is an adaptation of a previous publication in the Proceedings of VCIP, SPIE, 2004 [114].



Figure 5.1: Chrominance sub-sampling. With the 4:4:4 format, the luminance component Y and the two chrominance components U and V have the same resolution. Chrominance detail is reduced by sub-sampling. With the 4:2:2 format, the chrominance is sub-sampled by a factor of two in the horizontal direction. With the 4:1:1 format, the chrominance is sub-sampled by a factor of four in the horizontal direction. With the 4:2:0 format, the chrominance is sub-sampled by a factor of two in both the horizontal and vertical directions.

luminance information in the image-content classification to build content-adaptive LMS filters for the chrominance components.

In this chapter, content-adaptive LMS filtering for chrominance resolution up-conversion from a sub-sampled format to the 4:4:4 format is presented, where we assume that the luminance plane remains unchanged in this process. We borrow the idea of the classification-based resolution up-conversion method, which showed a good price-performance ratio in previous studies on luminance resolution up-conversion [74, 75]. The main idea of the classification-based video resolution up-conversion technique is to define classes of local image characteristics, and to apply the same interpolation filter to all image apertures that correspond to a single class. Although similar techniques can be used for chrominance independently of luminance, it is proposed to up-convert the chrominance using classes derived from both chrominance and luminance data. By involving the luminance component, it is expected that chrominance edges are detected more precisely, and that the consistency between luminance and chrominance transients is improved. This is illustrated in Figure 5.2.

5.1 Classification-based chrominance up-conversion

In a classification-based resolution up-conversion method, the momentary filter coefficients used during interpolation depend on the local content of the image, which can be classified based on the pattern of the block [74, 75]. We choose Kondo's method for its easy implementation and good performance, as shown for


Figure 5.2: LEFT: Misalignment between the luminance edge and the sub-sampled chrominance edge. RIGHT: The center of the chrominance edge is aligned with the center of the luminance edge, with an improved transient.

luminance resolution up-conversion in Chapter 2. For classification, we applied ADRC (Adaptive Dynamic Range Coding) [84], which assigns a bit pattern to each pixel. The value, $Q$, of the bit in the classification aperture is defined by:

Q = \left\lfloor \frac{F - F_{MIN}}{F_{MAX} - F_{MIN}} + 0.5 \right\rfloor \qquad (5.1)

Here, $F$ is the value of the pixel, $F_{MAX}$ and $F_{MIN}$ are the maximum and minimum values of the pixels in the classification aperture, respectively, and $\lfloor \cdot \rfloor$ is the floor operator.

The bit pattern reflects the local image structure and also forms a unique class index. To obtain the filter coefficients for a specific class, a training process is performed in advance. The training process employs both the 4:4:4 format video signal and its sub-sampled version as training material, and uses the least mean square (LMS) algorithm to obtain, for each class, the optimal coefficients for up-conversion from the sub-sampled resolution to 4:4:4. The training process is computationally intensive, since a large amount of video material has to be used. The adaptive interpolation coefficients obtained from the training are stored in a LUT. This is done off-line and needs to be performed only once.

5.1.1 Aperture definition

The proposed filter aperture depends on the input video format. For the 4:2:2 or 4:1:1 format, it makes sense to use a horizontal resolution up-conversion filter, since the horizontal resolution of the luminance is higher than that of the chrominance. For a 4:2:0 system, a 2-D aperture seems appropriate, since not only the horizontal but also the vertical luminance resolution is higher than that of the chrominance.

For chrominance resolution up-conversion, we first define a filter aperture and a content-classification aperture. Those two apertures can be the same or can


Figure 5.3: (A) The ten-pixel chrominance-only 1D aperture, named C1. (B) The six-pixel chrominance-1D plus six-pixel luminance-1D aperture, named C1Y1. (C) The ten-pixel chrominance-2D plus four-pixel luminance-1D aperture, named C2Y1. (D) The six-pixel chrominance-1D plus eight-pixel luminance-2D aperture, named C1Y2. For example, C1Y2 indicates a one-dimensional chrominance and two-dimensional luminance classification aperture. In contrast to the classification aperture, which can also depend on luminance, the filter aperture is always composed of chrominance samples only. The center pixels, indicated by a box, represent the pixel positions of the up-converted chrominance data.

be different: the filter aperture acts only on the pixel samples used for interpolation, whereas the content-classification aperture may also contain other data for classification. As the number of classes increases exponentially with the number of pixels in the aperture, a trade-off between aperture size and number of classes is necessary.

As an example, we limit our investigation to 4:2:2-to-4:4:4 up-conversion with pure chrominance data. For classification, we can use chrominance only, or a combination of chrominance and luminance data, which leads to four possible classification strategies:

• chrominance data, 1-dimensional only (C1)

• 1-dimensional chrominance and 1-dimensional luminance data (C1Y1)

• 2-dimensional chrominance and 1-dimensional luminance data (C2Y1)

• 1-dimensional chrominance and 2-dimensional luminance data (C1Y2).

Taking into account the number of classes, we first choose ten horizontal pixels consisting of five U and five V data samples (aperture A in Figure 5.3), which are used as both the classification aperture and the interpolation aperture. In apertures B, C and D, the interpolation aperture always consists of three U and three V data samples in the horizontal direction, which are part of the classification


aperture as well. The luminance data or the chrominance data in the vertical dimension (apertures C and D in Figure 5.3) are included in the classification aperture. As depicted in Figure 5.3, the grey pixels form the interpolation aperture, while the grey and white pixels together form the classification aperture. Including detailed edge information from the luminance data is expected to improve the classification of the image content, which will lead to more accurate interpolation. From a theoretical point of view, there is no need to include the luminance data in the interpolation; thus, to reduce the complexity of the interpolation, the luminance data is used only in the classification, whereas the chrominance data is used for both classification and interpolation.

For the aperture C1Y1 shown in Figure 5.3, let $U_S(i, 2j)$ and $V_S(i, 2j + 1)$ be the U and V components of the 4:2:2 video signal, respectively, and $U_H(i, 2j)$, $V_H(i, 2j)$, $U_H(i, 2j + 1)$ and $V_H(i, 2j + 1)$ the chrominance values to be interpolated in the 4:4:4 video signal. The equations to calculate those chrominance values are:

U_H(i, 2j) = \sum_{k=0}^{2} w_{k,c1}\, U_S(i, 2j + 2k - 2) \qquad (5.2)

V_H(i, 2j) = \sum_{k=0}^{2} w_{k,c2}\, V_S(i, 2j + 2k - 1) \qquad (5.3)

U_H(i, 2j + 1) = \sum_{k=0}^{2} w_{k,c3}\, U_S(i, 2j + 2k - 2) \qquad (5.4)

V_H(i, 2j + 1) = \sum_{k=0}^{2} w_{k,c4}\, V_S(i, 2j + 2k - 1) \qquad (5.5)
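A sketch of this interpolation step; the four coefficient tables (one per output sample, addressed by the class codes $c_1$ to $c_4$) are assumed to come from the training described in Section 5.1.4, and the indexing of the sub-sampled lines is ours:

```python
import numpy as np

def upconvert_422_pair(U_s, V_s, j, w, cls):
    """4:2:2 -> 4:4:4 for the two output columns 2j and 2j+1 of one line
    (Equations 5.2-5.5).  U_s[j] holds U_S(i, 2j) and V_s[j] holds
    V_S(i, 2j+1); w[p] is the coefficient LUT and cls[p] the class code
    for output position p = 0..3."""
    u_taps = np.array([U_s[j - 1], U_s[j], U_s[j + 1]], dtype=float)
    v_taps = np.array([V_s[j - 1], V_s[j], V_s[j + 1]], dtype=float)
    UH_2j  = np.dot(w[0][cls[0]], u_taps)   # U_H(i, 2j),     Eq. 5.2
    VH_2j  = np.dot(w[1][cls[1]], v_taps)   # V_H(i, 2j),     Eq. 5.3
    UH_2j1 = np.dot(w[2][cls[2]], u_taps)   # U_H(i, 2j + 1), Eq. 5.4
    VH_2j1 = np.dot(w[3][cls[3]], v_taps)   # V_H(i, 2j + 1), Eq. 5.5
    return UH_2j, VH_2j, UH_2j1, VH_2j1
```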

5.1.2 Classification using ADRC

In classification-based chrominance resolution up-conversion, individual filters are computed for each class of image blocks. Classes were created by 1-bit quantization with ADRC [84], using Equation 5.1.

The individual class bits of all components (Y, U, V) are concatenated to form a single class index into the LUT of filter coefficients. Taking aperture B in Figure 5.3 as an example, the class index is generated as shown in Figure 5.4.

ADRC encoding with more bits per pixel can be treated in a similar way, but the number of pixels in the classification aperture then has to be kept smaller; otherwise, the computational load is unacceptable.


[Figure 5.4 appears here. In the worked example, the aperture contains the (U, Y) and (V, Y) pixel-value pairs (35, 50), (40, 75), (60, 160), (58, 82), (116, 70) and (68, 50); the component averages are Y_AV = 81.2, U_AV = 70.3 and V_AV = 55.3, and ADRC yields ADRC_Y = 001100, ADRC_U = 001 and ADRC_V = 011.]

Figure 5.4: The class index is a concatenation of the $Q$ values of the Y, U and V components in the current classification aperture. For this example image block, it is 001100−001−011.
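The construction of the index can be sketched as follows for aperture B; here each component is thresholded at the middle of its dynamic range as in Equation 5.1 (the figure's worked example uses the component average as the threshold instead), and the helper names are ours:

```python
import numpy as np

def adrc_bits(pixels):
    """1-bit ADRC of one component (Equation 5.1)."""
    p = np.asarray(pixels, dtype=float)
    fmin, fmax = p.min(), p.max()
    if fmax == fmin:
        return np.zeros(len(p), dtype=int)
    return np.floor((p - fmin) / (fmax - fmin) + 0.5).astype(int)

def class_index(y_pixels, u_pixels, v_pixels):
    """Concatenate the per-component ADRC bits (Y, then U, then V) into
    one class index for the coefficient LUT."""
    code = 0
    for bit in np.concatenate([adrc_bits(y_pixels),
                               adrc_bits(u_pixels),
                               adrc_bits(v_pixels)]):
        code = (code << 1) | int(bit)
    return code

# Aperture B: six Y, three U and three V samples -> a 12-bit class index.
print(class_index([50, 75, 160, 82, 70, 50], [35, 60, 116], [40, 58, 68]))
```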

5.1.3 Class inversion

It has been shown in previous studies [85] that, for luminance resolution up-conversion, the coefficients in the LUT should remain the same if we invert the picture data. Consequently, any binary class and its complement should yield the same interpolation coefficients. By combining the two complementary classes, the size of the LUT is reduced by a factor of two with no loss of image quality. It seems reasonable that this holds equally well for the chrominance data U and V. Therefore, calculating the ADRC codes independently for the luminance and chrominance values has an attractive side-effect: since we propose to apply ADRC coding to the Y, U and V data independently, the number of classes for each component is reduced by 50 percent. It is also a simple conversion that gives little overhead in terms of bit administration.

5.1.4 Least Mean Square algorithm for training

The interpolation coefficients in Equations 5.2–5.5 can be obtained from a training process that employs both 4:4:4 and 4:2:2 signals. The standard least mean square (LMS) optimisation is performed within each class to get the optimal coefficients.

5.2 Evaluation

The procedure for this evaluation is depicted in Figure 5.5. The evaluation requires a 4:4:4 format video signal as reference. We first sub-sample the 4:4:4 format video signal to the 4:2:2 format, and then up-convert it to the 4:4:4 format using the classification-based resolution up-conversion algorithm and linear up-conversion,


Figure 5.5: Evaluation flowchart for the chrominance resolution up-conversion algorithms. The original 4:4:4 signal is sub-sampled to obtain the 4:2:2 signal; the 4:2:2 signal is then up-converted to 4:4:4 by three different approaches. The MSE between the chrominance signals of the original images and the up-converted images is calculated to evaluate each of the three methods.

respectively. Then the MSE between the chrominance signals of the original image and the up-converted image is calculated. We can compare the result of each aperture using a combination of the objective MSE criterion and a subjective evaluation.

5.2.1 Linear up-scaling

In this study, a linear filter was designed for the down-scaling of the 4:4:4 format to 4:2:2, according to the ITU requirements described in Rec. ITU-R BT.601 [115]. This was done to provide a reference. The result was a center-symmetric 27-tap FIR filter with the following coefficients: 0.0022, 0, -0.0065, 0, 0.0141, 0, -0.0268, 0, 0.0491, 0, -0.0967, 0, 0.3148, 0, 0.4996, 0, 0.3148, 0, -0.0967, 0, 0.0491, 0, -0.0268, 0, 0.0141, 0, -0.0065, 0, 0.0022.

To demonstrate that this filter meets the ITU requirements, the frequency response of this FIR filter is shown in Figure 5.6. This filter was also used for the 4:2:2 to 4:4:4 up-sampling, as a reference for the new non-linear up-sampling method.
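Such a response can be checked numerically; a sketch, assuming scipy is available and taking the 13.5 MHz BT.601 sampling frequency implied by the 6.75 MHz Nyquist frequency mentioned in the figure caption:

```python
import numpy as np
from scipy.signal import freqz

# The 27-tap half-band filter of this section.
h = [0.0022, 0, -0.0065, 0, 0.0141, 0, -0.0268, 0, 0.0491, 0, -0.0967, 0,
     0.3148, 0, 0.4996, 0, 0.3148, 0, -0.0967, 0, 0.0491, 0, -0.0268, 0,
     0.0141, 0, -0.0065, 0, 0.0022]

fs = 13.5e6                                  # sampling frequency (Hz)
w, H = freqz(h, worN=2048, fs=fs)
mag_db = 20 * np.log10(np.maximum(np.abs(H), 1e-9))

# For a half-band design, the response should be about -6 dB at fs/4.
print(np.interp(fs / 4, w, mag_db))
```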

5.2.2 Test images

The test images used in the evaluation include natural images and synthetic images. The natural images show the effectiveness of the algorithm in normal applications. However, one should bear in mind that the improvement can only be achieved at edges in the chrominance, which are not as dominant in natural images as luminance edges. The synthetic images with abundant chrominance transitions, shown in Figure 5.7, were therefore created to show the specific advantages of the algorithm. Images A and B are natural images with much detail and vivid colour. Image A has a resolution of 2048 × 1536 and B is 1920 × 1080. Images C and D are test images, both with a resolution of 720 × 576. The idea behind C is to create all


Figure 5.6: The frequency response that corresponds to the filter coefficients (thick line). The straight thin lines mark the ITU-recommended limits between which the filter response must pass; the connected straight lines at the left and right indicate the lower and upper limits, respectively. Note that the filter is skew-symmetrical and has a symmetrical pass-band at 3.375 MHz, half the Nyquist frequency of 6.75 MHz.

possible types of vertical edges, including simple edges, double-step edges and thin lines of different widths, with different combinations of colours (R, G, B, C, M, Y). Image D is a colour zone plate with edges at different orientations and of different widths. The equations below specify how the image has been computed. In short, a grey-scale zone plate with values between 0.0 and 1.0 was used as a weighting factor to interpolate between two RGB triplets, $\vec{S} = (r_S, g_S, b_S)$ and $\vec{T} = (r_T, g_T, b_T)$, thus creating the colour zone plate $\vec{C}(i, j)$:

\vec{C}(i, j) = w(i, j) \cdot \vec{S} + (1 - w(i, j)) \cdot \vec{T} \qquad (5.7)

with

w(i, j) = \frac{1}{2} + \frac{1}{2K}\, \mathrm{median}\!\left[ \alpha \sin\!\left( \frac{\pi \Delta}{2L} (i^2 + j^2) \right),\, -d,\, +d \right] \qquad (5.8)

with $(i, j)$ the image indices, $w(i, j)$ the local weight at $(i, j)$, $\Delta$ the sampling interval of the zone plate, $L = 500/\Delta$, and $\alpha$ a scaling factor. The factor $\alpha$ defines a clipping amplitude for the sine-based zone plate, creating sharp edges of varying widths, where the clipping values $-d$ and $+d$ correspond to $\vec{T}$ and $\vec{S}$, respectively. Typical values were $K = 0.5$ and $d = 0.25$. A sample of the test image is shown in Figure 5.7 D.
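A sketch of the construction under our reading of Equations 5.7 and 5.8, where the median over the clipping values reduces to a clip operation; the parameter defaults follow the text:

```python
import numpy as np

def colour_zone_plate(height, width, S, T, delta=1.0, alpha=1.0,
                      K=0.5, d=0.25):
    """Colour zone plate of Equations 5.7-5.8: a clipped sine zone plate
    is used as the weight w to blend the RGB triplets S and T."""
    L = 500.0 / delta
    i, j = np.mgrid[0:height, 0:width].astype(float)
    z = alpha * np.sin(np.pi * delta / (2.0 * L) * (i ** 2 + j ** 2))
    z = np.clip(z, -d, d)                 # the median[., -d, +d] clipping
    w = 0.5 + z / (2.0 * K)               # local weight, Eq. 5.8
    S, T = np.asarray(S, float), np.asarray(T, float)
    return w[..., None] * S + (1.0 - w[..., None]) * T    # Eq. 5.7

img = colour_zone_plate(576, 720, S=(255, 0, 0), T=(0, 0, 255))
```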


Figure 5.7: Test images for chrominance up-scaling and resolution up-conversion.

5.3 Objective comparison – MSE analysis

An MSE analysis of the four test images with the different up-scaling or resolution up-conversion methods was performed. The methods we evaluated include the 27-tap interpolation FIR filter, the simple (1, 2, 1) FIR interpolation filter, and the classification-based method introduced here with the various apertures shown in Figure 5.3.

From Table 5.1, we can clearly see that the introduction of the luminance signal into the classification improves the resolution up-conversion quality compared to the traditional linear up-scaling method, especially for our synthetic test images. For the natural images, the difference is rather small. This is due to the fact that the differences only occur at edge transitions, which form a relatively small part of a natural image. On average, the MSE dropped about 30% in both the U and V components with respect to the traditional linear method.

5.3.1 Subjective comparison

To better illustrate the detailed differences among all methods, we display the chrominance components as grey values in Figure 5.8.

As we can see in Figure 5.8, the 27-tap low-pass interpolation FIR filter


Table 5.1: MSE of the chrominance resolution up-conversion.

MSE                   Image A  Image B  Image C  Image D  Average
27-tap filter     U   0.79     4.69     116.48   38.49    40.11
                  V   0.77     1.85     109.46   47.79    39.96
(1, 2, 1) filter  U   0.88     5.61     154.95   62.35    55.95
                  V   0.85     2.16     146.63   80.67    57.58
C1                U   1.53     6.05     102.94   33.12    35.91
                  V   1.86     2.84     95.05    40.34    35.02
C1Y1              U   0.67     5.27     81.32    30.72    29.50
                  V   0.67     1.89     78.18    38.56    29.83
C2Y1              U   0.71     5.68     77.68    30.07    28.53
                  V   0.71     2.04     70.00    38.54    27.82
C1Y2              U   0.68     5.34     73.38    29.85    27.31
                  V   0.67     1.91     68.22    38.88    27.42

performs well in most areas and gives clear details. However, it causes strong ringing at sharp edges, which can be very annoying for the perceived image quality. The 3-tap FIR filter corrects this problem to some extent, but creates staircases and blurring in detailed areas, such as the parallel lines in the bottom-left part of the image block (area N) in Figure 5.8. With the classification-based method, the up-converted image keeps its sharpness without generating any ringing. In this classification-based method, local clipping, normally used to prevent overshoots, can be omitted. This can be observed mainly in the edge-transition area in the top part of the image block (area M) in Figure 5.8. To further illustrate the advantage of using luminance data during classification, we show another detailed area of our test image in Figure 5.9. Within this image, it is clear that, with the help of the luminance data in the classification, one obtains sharper chrominance edges in the up-converted image.

5.4 Conclusion

In this chapter, chrominance resolution up-conversion using classification-based LMS filters is presented. To this end, the luminance information is included in the classification. The classification-based chrominance up-conversion using luminance information is a very effective method for video chrominance up-conversion. With this method, we can up-convert the chrominance components of the video signal with clear and distinct colour at the edges. Classification using a 2-D luminance aperture makes better use of the edge information in the luminance signal. Compared with the traditional linear method, this new algorithm shows a clear


Figure 5.8: Comparison of the different up-scaling and resolution up-conversion methods. Here we only show the U component; the V component behaves in the same way. Left is the original, (A) is up-converted using the 27-tap FIR filter, (B) the 3-tap FIR filter, (C) the classification-based method using aperture A shown in Figure 5.3, and (D) the classification-based method using aperture D shown in Figure 5.3. To compensate for the low-pass characteristic of the printing process, we enhanced the images using a 2D linear symmetrical peaking filter.

improvement. It gives distinct colour transitions without introducing any overshoots. The overshoot-free result comes from the nature of the LMS algorithm; therefore, local clipping is not necessary.

The algorithm yields an easy hardware implementation at low cost. It uses a LUT to obtain the interpolation coefficients: a simple threshold operation generates the address for the LUT, and the pixels involved in the interpolation are limited to six chrominance samples.


Figure 5.9: Illustration of using luminance data in the classification. TOP: U component. MIDDLE: Waveform of a cross-section of the U component at the position indicated by the white line. BOTTOM: An enhanced version of the top images. Left is the original, (A) is up-converted with the 27-tap FIR filter, (B) with the classification-based method using aperture A shown in Figure 5.3, and (C) with the classification-based method using aperture D shown in Figure 5.3. The V component behaves in the same way. We enhanced the images shown in the bottom row using a 2D linear symmetrical peaking filter, to compensate for the low-pass characteristic of the printing process.


Chapter 6

Image and Video De-blocking

With the increase in resolution, the storage capacity and transmission bandwidth demands of digital video material become higher and higher. Digital compression techniques have been developed to solve these problems. Such compression techniques include JPEG and MPEG, which are block-based compression techniques and current international standards [48]. In the JPEG and MPEG compression schemes, images are segmented into blocks, typically of 8 × 8 pixels, which are coded independently [48]. Discontinuities between coding blocks lead to artefacts and can greatly degrade the image quality, especially at low bit-rates, i.e. at a high compression ratio. The most common digital artefacts caused by image and video compression are the so-called blocking artefacts. Traditional image enhancement techniques, particularly for contrast or sharpness enhancement, will worsen rather than enhance an image in the presence of those blocking artefacts¹.

As illustrated in Figure 6.1, blocking artefacts are normally categorized into three types [18, 19]:

• staircase noise, where compression leads to jagged edges

• grid noise, which appears at the borders of coding blocks in smooth areas of the image

• corner outliers, where pixels located at block corners have a value that clearly deviates from their neighbours

Various techniques for removing those blocking artefacts, or "de-blocking algorithms", have been presented in previous studies [18][19][118]-[128]. Post-processing is the most popular approach, as it does not require any change to the

¹This chapter is an adaptation of a previous publication in the Proceedings of the 8th International Symposium on Consumer Electronics, 2004 [116].



Figure 6.1: An image containing blocking artefacts. The circle marked with a indicates staircase noise, b indicates grid noise and c shows a corner outlier.

current image and video compression standards, and thus leaves the encoder untouched. The simplest approach is low-pass filtering of the decoded image [118], but the drawback is the smoothing of image details, which results in a blurred image. More advanced methods try to filter the image adaptively rather than uniformly. Those methods attempt to classify the edge directions in the image block based on the spatial image content [18]-[122], on the DCT coefficients [123]-[126], or on a combination of the two [127, 128].

Digital artefacts, including blocking artefacts, can be seen as errors in the image, and can thus be tackled with image error recovery techniques. We start our investigation from the image and video error recovery method [117] using content-adaptive LMS filtering proposed by Kondo. In Kondo's method for image and video error recovery, the momentary filter coefficients depend on the local (block) structure of the image, which can be classified based on the local pattern in the filter aperture using Adaptive Dynamic Range Coding (ADRC) [84]. Figure 6.2 illustrates the method.

We can interpret the compression artefacts as "errors" and recover them with Kondo's scheme, using filters that adapt to the local image structure. For image and video de-blocking, the degradation model is JPEG or MPEG encoding and decoding. Here, we propose to extend this adaptivity by also adapting the filters to the relative position within the local grid, as compression artefacts appear to depend on both the relative position in the block grid and the local structure.

In this chapter, the de-blocking algorithm using classification-based LMS


Figure 6.2: The training process in Kondo’s error recovery method. Sample pairs of originalimage or video and its corrupted counterpart are extracted from the training material. Classificationof image samples is done by ADRC. Using Least Mean Square algorithm within every class,optimal coefficients is figured out and stored in a Look Up Table (LUT).

filtering is presented. Since JPEG and MPEG are block-based techniques, blocking artefacts do not appear equally at different pixel locations inside the coding block. Therefore, the algorithms are designed to adapt to both the luminance information and the pixel position within the coding block, including:

• the local image structure

• the relative position of the pixel in the block grid

• a combination of the two features.

The algorithms were implemented and evaluated in the framework of de-blocking JPEG-compressed images.

6.1 De-blocking algorithms

The filter aperture used to perform the de-blocking is defined as shown in Figure 6.3. The filter aperture is symmetrical in shape, and its size is limited to 13 pixels. Further extension of the symmetrical aperture with more pixels would greatly increase the cost. The choice of the aperture is the result of an educated guess, considering that it yields five pixels that sample purely horizontal or vertical edges, and three pixels for edge angles of 45° or 135°. The classification of the group of pixels inside this aperture determines the selection of the filter, such that the image content can be filtered in an MSE-optimal way.


Figure 6.3: Filter aperture used in content-adaptive de-blocking. The output pixel $F_F(i, j)$ in the center of the aperture is calculated from the 13 decoded pixels $F_0$ to $F_{12}$.

Let $F_O(i, j)$ be the luminance value of the original pixel at position $(i, j)$, and $F_F(i, j)$ the filtered output, which is a weighted sum of the thirteen pixels in the filtering aperture of the image containing blocking artefacts. The equation to calculate $F_F(i, j)$ is:

F_F(i, j) = \sum_{k=0}^{12} w_{k,c}\, F_D(m, n; k) \qquad (6.1)

where the $w_{k,c}$ are weights that depend on the class to which the current image block belongs. As illustrated in Figure 6.3, $F_D(m, n; k)$ is the $k$th pixel in the filtering aperture of the decoded image at location $(m, n)$. Again, the weights $w_{k,c}$ are obtained from standard least-squares regression, i.e. the least mean square (LMS) optimisation.
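One de-blocking step could then be organised as sketched below; lut is the trained coefficient table, and the (row, column) offsets encode a 13-pixel diamond consistent with the description above, though the exact ordering F0..F12 is our assumption:

```python
import numpy as np

# 13-pixel aperture as (row, column) offsets: five pixels along the row,
# five along the column and three along each diagonal (cf. Figure 6.3);
# the ordering F0..F12 is illustrative.
APERTURE = [(-2, 0), (-1, -1), (-1, 0), (-1, 1), (0, -2), (0, -1), (0, 0),
            (0, 1), (0, 2), (1, -1), (1, 0), (1, 1), (2, 0)]

def deblock_pixel(img, i, j, lut):
    """Filter pixel (i, j) of the decoded image with the class-dependent
    weights of Equation 6.1."""
    taps = np.array([img[i + di, j + dj] for di, dj in APERTURE],
                    dtype=float)
    # 1-bit ADRC over the aperture (Equation 6.2) gives the structure class.
    fmin, fmax = taps.min(), taps.max()
    bits = np.floor((taps - fmin) / max(fmax - fmin, 1.0) + 0.5).astype(int)
    c = 0
    for b in bits:
        c = (c << 1) | int(b)
    c = min(c, (~c) & 0x1FFF)        # class inversion on the 13-bit code
    return float(np.dot(lut[c], taps))
```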

The classification method directly affects the performance of the LMS optimisation and, consequently, the quality of the de-blocking filtering. There are innumerable ways to classify the local image data, but the aim should be to group the image apertures with the most similarities. We do not derive the classification systematically; instead, the classification methods are based on considerations of the relevant information. We start with classification based on the local image structure using ADRC, as proposed by Kondo, followed by a method that also uses the relative pixel position in the decoding block.

6.1.1 Classification of structure

One can classify a pixel based on its surrounding pixels by encoding each pixel in the classification aperture into 1 bit with ADRC. The value, $Q$, of the bit is defined by:


Q = \left\lfloor \frac{F_D - F_{MIN}}{F_{MAX} - F_{MIN}} + 0.5 \right\rfloor \qquad (6.2)

Here, $F_D$ is the luminance value of the decoded input pixel, $F_{MAX}$ and $F_{MIN}$ are the maximum and minimum luminance values of the decoded pixels in the classification aperture, respectively, and $\lfloor \cdot \rfloor$ is the floor operator.

With the ADRC scheme, the number of classes increases exponentially with the number of pixels in the aperture. There is always a trade-off between system complexity and precision of classification. To retain an affordable number of classes without sacrificing performance, we can define separate filtering and classification apertures. Those two apertures can be the same, but can also be different, so that the filter aperture acts on the pixel samples used for filtering only, whereas the classification aperture may contain additional data for classification. If desired, the classification aperture can be kept relatively small to achieve a cost-effective implementation.

Analogous to what has been done for the luminance up-scaling, the coefficients in the LUT should remain the same if we invert the picture data [74]. With the classification and filtering aperture shown in Figure 6.3, and using this class-inversion method, ADRC generates a 12-bit class code.

6.1.2 Classification of relative position

It has been shown in previous studies [121] that the relative pixel position inside the coding block can be of great importance in the classification of block-based encoded and decoded image or video content. A way to classify a pixel is therefore to use its relative position inside the coding block. An example for the common 8 × 8 pixel block is shown schematically in Figure 6.4. A coarse classification may use four relative positions, while a more complex and accurate classification could use all 16 possible relative positions. Here, only the classification of the pixels in the top-left part of the coding block is shown; the classification of the other pixels in the current coding block can be derived by symmetry. The 16 relative positions need four bits to be encoded as classes, whereas four relative positions require only two bits; a sketch of this mapping is given below.
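The position class is cheap to compute; assuming the index layout of Figure 6.4 and the mirroring described above:

```python
def position_class(i, j, positions=16):
    """Relative-position class inside an 8x8 coding block (Figure 6.4).
    Only the top-left quadrant is enumerated explicitly; the other
    quadrants are mapped onto it by mirroring."""
    r, c = i % 8, j % 8
    if r >= 4:
        r = 7 - r                    # vertical mirror
    if c >= 4:
        c = 7 - c                    # horizontal mirror
    if positions == 16:
        return 4 * r + c             # 4-bit position class
    return 2 * (r // 2) + (c // 2)   # coarse variant: 2-bit position class
```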

6.1.3 Classification of structure and relative position

Combining the image-structure classification and the pixel-position classification could lead to a better classification of the decoded image data, although the additional bits should be justified by a sufficiently large improvement. Concatenating the class code generated by the image-structure classification using ADRC with the class code of the position class gives the final class code, which addresses the LUT of de-blocking filters.


Figure 6.4: Classification of the relative position in the 8×8 decoding block using: (A) 4 relative positions, (B) 16 relative positions. Here, only the class codes of the 16 pixels in the top-left region of the decoding block are given; the class codes of the pixels in the other three regions can be derived using a position 'mirror' operation.

To validate the effectiveness of introducing position information into the classification, a combined classification that uses nine pixels for the structure information together with 16 relative pixel positions was also included in the current study. The pixels used for the structure classification are the central nine shown in Figure 6.3 (the pixels in the box).

6.2 Results and Evaluation

6.2.1 Implementation of the training

A training process was performed to obtain the de-blocking filters. As shown in Figure 6.2, the degradation consists of JPEG compression and decompression. The free baseline JPEG software from the Independent JPEG Group website [129] was used for the encoding and decoding. The so-called quality factor of this compression was set to 20. Besides using ADRC for the image-structure classification and the relative pixel position for the position class, several combinations of the two were investigated in this study. The number of classes corresponding to each of the classification methods is given in Table 6.1.

We should emphasise that all the methods use the same interpolation aperture, shown in Figure 6.3; they differ only in the classification. As shown in Table 6.1, methods B and G contain the same number of classes. Method B allocates all 13 bits to structure classification, while method G allocates 8 bits to structure and 4 bits to relative-position classification. With these two different classification schemes containing the same number of classes, we can explore the


effectiveness of using relative-position information in the classification.

Here, single class (Type A) means that all the local image content is put into a single class, and only one set of filter coefficients is calculated in the MSE-optimal way. This is basically the Wiener filtering approach. The ADRC is performed with the aperture shown in Figure 6.3. Since the linear regression algorithm contains only a single global minimum, we use as many images as possible for the training: a total of 5,000 images.

6.2.2 Test material

Five images were used to test our proposed algorithm. The original uncompressed images are first compressed with a JPEG encoder to introduce blocking artefacts, and then filtered to reduce the compression artefacts. Figure 6.5 shows the original uncompressed images.

6.2.3 Performance measurement

A reliable image quality evaluation still requires a subjective assessment. Nevertheless, two objective scores have been measured that at least give a partial view of the obtained performance. To enable comparison with results from other publications, the MSE between the original image and the de-blocked image is calculated according to:

MSE = \frac{1}{N} \sum_{i,j} \left( F_O(i,j) - F_F(i,j) \right)^2    (6.3)

Here, F_O(i,j) and F_F(i,j) are the luminance values of the pixels in the original image and in the image after de-blocking, respectively, and N represents the number of pixels in the image.
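As a minimal sketch, Equation (6.3) can be computed directly from the two luminance arrays:

    import numpy as np

    def mse(original, filtered):
        # Mean squared error of Equation (6.3) over all N pixels.
        d = original.astype(np.float64) - filtered.astype(np.float64)
        return float(np.mean(d ** 2))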

Table 6.1: Classification details for image de-blocking using classification-based LMS filters. The abstract is a concatenation of the number of bits for the structure class (S) and the position class (P).

Type  # structure classes  # position classes  # structure bits  # position bits  # total classes  abstract
A     1                    0                   /                 /                1                S1P0
B     4096                 0                   13                0                4096             S13P0
C     0                    4                   0                 2                4                S0P2
D     0                    16                  0                 4                16               S0P4
E     4096                 4                   13                2                16384            S13P2
F     4096                 16                  13                4                65536            S13P4
G     256                  16                  8                 4                4096             S9P4


Figure 6.5: The uncompressed images used in the evaluation: Bicycle, Lenna, Boat, Birds and Motor.

Although a non-zero MSE is caused by artefacts, the MSE value does not correspond with the perceived quality, as some artefacts are more annoying than others. The MSE cannot discriminate between highly visible regular patterns, as in blocking, and artefacts such as ringing that may be less disturbing.

To give a more quantitative impression of the blocking artefacts, we use another objective metric specially designed for measuring blocking artefacts in the image, i.e. the "Block-edge Impairment Metric" (BIM) proposed by Wu [130]. The BIM expresses the strength of the blocking artefacts present in the image, taking their visibility into account. The BIM value reflects the amount of energy present in the gradient at the borders of the coding blocks in the horizontal and vertical directions, in contrast to the gradients inside the blocks. It has been shown to correlate very well with the perceived blocking in an image. A value BIM = 1 refers to no blocking artefacts at all; the larger the BIM number, the more blocking artefacts are present in the image.
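The idea behind such a metric can be illustrated by comparing the gradient energy across the 8×8 block boundaries with the gradient energy inside the blocks. The sketch below is a simplified illustration only; it omits the visibility weighting of Wu's actual formulation [130], and the function name is an assumption:

    import numpy as np

    def block_edge_impairment(image, block=8):
        # Squared horizontal/vertical gradients between neighbouring pixels.
        img = image.astype(np.float64)
        dh = np.diff(img, axis=1) ** 2   # between columns x and x+1
        dv = np.diff(img, axis=0) ** 2   # between rows y and y+1
        # Column/row pairs that straddle a block border.
        h_edge = (np.arange(dh.shape[1]) % block) == block - 1
        v_edge = (np.arange(dv.shape[0]) % block) == block - 1
        boundary = dh[:, h_edge].mean() + dv[v_edge, :].mean()
        interior = dh[:, ~h_edge].mean() + dv[~v_edge, :].mean()
        return boundary / interior       # near 1 when no block grid stands out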

A drawback of the BIM is that one can simply minimize it by strongly smoothing the image. However, this will lead to a blurred image, which will become evident from a high MSE figure. Therefore, we expect a combination of MSE and BIM to better reflect the perceived image quality. Particularly, if the BIM of a first method is lower than that of a second method, while its MSE is not higher, then the first method is likely better also in a subjective assessment.

Since the two objective metrics still cannot fully replace a subjective assessment, we shall enable the reader to form his own subjective opinion of the performance by also showing processed images for comparison.


Table 6.2: MSE scores on individual images compressed with quality factor 20. The first column (BLK) gives the scores of the images containing blocking artefacts. The others are for images de-blocked using classification with: (A) single class, (B) 4096 structure classes, (C) 4 position classes, (D) 16 position classes, (E) 4096 structure classes + 4 position classes, (F) 4096 structure classes + 16 position classes, (G) 256 structure classes + 16 position classes.

MSE       BLK    A      B      C      D      E      F      G
ABSTRACT         S1P0   S13P0  S0P2   S0P4   S13P2  S13P4  S9P4
Bicycle   63.5   53.5   44.4   53.2   53.1   44.1   43.3   46.9
Lenna     36.9   32.4   30.7   32.1   32.0   30.4   29.5   31.1
Boat      60.7   53.4   50.7   53.1   53.0   50.3   49.6   51.1
Birds     15.0   12.8   12.6   12.6   12.6   12.5   12.0   13.6
Motor     129.3  113.8  103.9  113.3  113.4  103.7  102.3  106.0

6.2.4 Results and Comparisons

Table 6.2 shows the MSE comparison of the evaluated methods. To test the robustness of the proposed algorithms, we also applied the content-adaptive filters to JPEG images that were compressed with a different quality factor than the one used for training. The MSE result of this second test is shown in Table 6.3. Clearly, the conclusions do not critically depend on the compression ratio.

In terms of MSE, we can see that the combination of structure classes and position classes gives the best result. However, judged by the MSE scores in Table 6.2 and Table 6.3, the main improvement step comes from plain low-pass filtering (A), with a further step from using position information in addition to structure information to introduce content adaptivity (F, G). Using the relative pixel position inside the coding block alone brings no more than a minor improvement in terms of MSE.

In contrast to these observations are the conclusions we can draw from Figure 6.6, which shows a comparison of the evaluated methods using the BIM metric.

Table 6.3: MSE scores on individual images compressed with quality factor 30. The first column (BLK) gives the scores of the images containing blocking artefacts; (A) to (G) stand for the same classification methods as in Table 6.2.

MSE       BLK   A     B     C     D     E     F     G
Bicycle   44.5  37.7  31.7  37.3  37.3  31.2  30.4  34.2
Lenna     27.1  23.9  23.1  23.8  23.7  23.1  22.2  23.7
Boat      44.3  38.9  37.3  38.8  38.8  37.2  36.5  37.3
Birds     9.6   8.3   8.8   8.3   8.3   8.3   8.3   8.3
Motor     93.0  82.8  76.0  82.7  82.5  76.9  75.4  78.4


Figure 6.6: BIM comparison for images coded and decoded with quality factor 20. (BLK) images containing blocking artefacts: BIM = 2.22; de-blocked using classification with: (A) single class: 1.67, (B) 4096 structure classes: 1.62, (C) 4 position classes: 1.60, (D) 16 position classes: 1.58, (E) 4096 structure classes + 4 position classes: 1.57, (F) 4096 structure classes + 16 position classes: 1.39, (G) 256 structure classes + 16 position classes: 1.49.

The classification using relative pixel position information alone gives almost the same BIM scores as the classification using only local image structure, even though a much lower number of classes is used (16 relative pixel position classes compared to 4096 local image structure classes). The combination of the two classification methods clearly improves the BIM score further, especially for method F, which uses 16 relative positions.

So, although the importance of using position information is hardly visible in the MSE score, the BIM shows that it has a great advantage for the perceived blocking artefacts. The screen shots shown later also demonstrate the relevance of using more than one metric to reflect the perceived image quality.

Comparing the MSE scores of methods B (4096 structure classes) and G (256 structure classes and 16 position classes), we notice that allocating 4 bits to the position classes instead of to the structure classes leads to a slight increase in MSE, but to a more obvious decrease in the BIM metric. Another observation, when comparing method B with F and method G with F, respectively, is that adding 4 bits for position classes (compare B with F) is more advantageous than adding 4 bits for structure classes (compare G with F). Although the algorithm was trained to minimize the MSE, the BIM score improves as well, which indicates the reduction of blocking artefacts.


In conclusion, our analysis based on the objective evaluation shows that the MSE score is better when using structure information for classification, while the BIM score is better when using relative position information in coding artefact reduction. The weakness of using structure information or relative position alone is that neither can distinguish block discontinuities from edges within the image, as reflected by the BIM metric. We conclude that position information helps in removing grid noise, and we expect that structure information may also help in cancelling staircase noise and ringing artefacts.

Figure 6.7 is added to enable a subjective comparison. Figure 6.8 shows an enhanced version of Figure 6.7 to help the reader perceive the differences, by compensating for the quality loss in the printing process. We can see from these images that with the one-class de-blocking method, the blocking artefacts are partly removed, but still visible. Using structure information or position information alone may reduce the blocking artefacts, but neither of them can totally remove the grid noise. A classification that combines image structure information and relative pixel position gives the best result.

6.3 Conclusion

In this chapter, we presented a blocking artefact reduction method using classification-based LMS filters. The introduction of content adaptivity using image classification leads to MSE-optimal filters for image de-blocking. Local image content and relative pixel position information are used for classification. With the proposed de-blocking method, the input JPEG- and MPEG-decoded images are filtered with content-adaptive filters obtained from a training process. The experiments show that the proposed method is capable of removing the blocking artefacts in the decoded image, without constraining the encoder or decoder.

We also conclude that using relative pixel position information inside the coding block combined with local image structure for classification gives better results than using local structure alone. In fact, structure information by itself cannot distinguish blocking artefacts from normal image edges, especially at the block border. It is more efficient to include relative position information than to allocate bits for extra image structure, provided sufficient image structure is available for classification.

The drawback of using relative position information is that the block grid position is required. Processing may shift the block grid, and the block size may even differ from the normal 8×8 due to scaling, which is not uncommon in a video processing chain. Therefore, block-grid detection techniques are required to provide the relative position information for classification. We did not investigate to which extent the quality of known block-grid detection techniques [131, 132] affects the performance of our proposed method.


Figure 6.7: Image detail of Motor (top) and Birds (bottom) processed with different de-blocking methods: (A) original picture, (B) JPEG-compressed version with the quality factor set to 20, and de-blocked versions using (C) the single-class de-blocking filter, (D) 16 position classes, (E) 4096 structure classes, (F) 4096 structure classes + 16 position classes.


Figure 6.8: An enhanced version of Figure 6.7. To compensate for the low-pass characteristic of the printing process, the images were enhanced using a 2D linear symmetrical peaking filter, so that the reader may perceive the differences among the various methods.


To discriminate between image edges and grid noise, the dynamic range of the local image content could also be used in the classification. The optimal combination of image structure, relative position, and local dynamic range for classification remains an interesting topic for future investigation.

Even though the investigations in the specific area of image de-blocking are certainly not exhaustive, the success of the algorithm shown in this chapter further demonstrates the effectiveness of classification-based LMS filters in the video enhancement field. Heuristic optimisation can be avoided, while the researcher may focus on the optimal classification. We have to admit, though, that finding the optimal classification unfortunately still requires the creativity of the researcher.


Chapter 7

Conclusions and Future Work

Content-adaptive Least Mean Square (LMS) filters, i.e. content-adaptive filters designed via linear regression, have been investigated in this thesis for applications in video and image enhancement.

Our interest in these filters was triggered when creating an overview of state-of-the-art video up-scaling and resolution up-conversion techniques using both objective and subjective evaluation. From this overview, Kondo's classification-based LMS filtering approach for video resolution up-conversion attracted our attention due to its high performance and relatively simple hardware implementation. Li's localized LMS filtering approach for resolution up-conversion was investigated in a comparative study as an alternative.

We recognized that content-adaptive LMS filtering, especially Kondo's classification-based LMS filtering, which outperforms Li's localized LMS filtering, can be used for a broad area of video enhancement applications. Examples of applications elaborated in this thesis include resolution up-conversion, de-interlacing, chrominance resolution up-conversion and coding artefact reduction.

Furthermore, we showed that although the LMS filtering approaches are based on objective metrics, particularly the MSE criterion, it is still possible, for video up-scaling, to design subjectively more pleasing filters within the LMS framework.

We had to conclude that a direct mapping of the content-adaptive LMS filtering to the problem of intra-field de-interlacing does not outperform all existing intra-field methods. However, the classification-based LMS filtering approach was shown to successfully replace the heuristics in two other de-interlacing designs: vertical temporal filtering, and data mixing for hybrid de-interlacing combining motion-compensated and edge-adaptive techniques.

Finally, we investigated the classification-based LMS filtering approach for chrominance resolution enhancement and coding artefact reduction. In the chrominance up-conversion problem, we combined luminance and chrominance information in an innovative classification of the local image patterns, leading to an improved performance compared to earlier designs.


In coding artefact reduction, the relative position information of the pixel inside the coding block, used along with the local image structure for classification, was shown to give interesting performance and an elegant optimisation.

7.1 Concluding remarks

The content-adaptive LMS filtering approach can be used in various video enhancement applications. The content adaptation can be based either on classification of the local picture content, or on localization of the LMS training. The classification-based approach was shown to yield a better performance, while its complexity is significantly lower.

Traditionally, filters for video enhancement applications are derived either from theory or from a heuristic approach. Filters derived from theory often do not yield the optimal performance due to inaccurate modelling of natural video signals. On the other hand, filter design with a heuristic approach is usually time consuming and requires thorough expertise. The current LMS filtering approach with content adaptation provides a very effective filter design methodology that replaces the heuristics by "supervised learning" or "training".

The nature of the content adaptation, realized by classification of the input data, is application dependent. Consequently, the classification has to be based on an analysis of the characteristics of the specific application and therefore still requires expertise and, unfortunately, often also some heuristics. Once the classification has been decided upon, however, the optimal usage of the obtained content adaptation is fully automated. We hope and expect that the various applications elaborated in this thesis stimulate the creativity of the reader in developing appropriate classification methods for other application areas.

From a hardware implementation point of view, classification-based LMS filters have the highly attractive feature that a single architecture, i.e. a filter kernel obtaining its coefficients from a Look-Up Table, can be used for a wide range of video enhancement tasks. Although the off-line training may take some time (needed only once), real-time processing poses no problem with the current semiconductor manufacturing technology, for resolutions up to and including HDTV.


7.2 Future work

7.2.1 Non-linear versus linear

The adaptive LMS filtering approaches investigated in this work are based on the linear regression model. Also, the individual content-adaptive filters are linear filters. Non-linear models like neural networks and support vector machines have been proposed for non-linear filter design.

The neural network is a non-linear regression model with great flexibility, and it has been shown to give interesting results for resolution up-conversion in Chapter 2. Given a set of input and output data, the performance of the neural network depends on the structure of the network. Particularly, increasing the number of hidden nodes will often increase the performance, be it at the cost of an increased computational complexity. Based on our favourable experience with local image structure classification, Hu has presented a neural network with pre-classification of the image data [133]. His experiments show that the replacement of the linear filter module by a neural network leads to an MSE reduction of about five percent, at a mildly increased computational cost.

Further research aimed at marrying non-linear approaches with local image structure adaptation seems interesting, not only for the resolution up-conversion that has been touched upon by Hu, but also for other video enhancements.

7.2.2 Optimisation based on a subjective metric

The linear LMS filters and the non-linear neural network filters all aim at minimizing the MSE during the regression. Although we could experimentally show a good correlation between the MSE and the perceived image quality for intra-field de-interlacing algorithms, it is commonly known that the MSE does not accurately reflect the subjective image quality for most other video enhancement tasks. Automatic optimisation based on a reliable subjective metric therefore seems a challenging target for future video enhancement research.


Bibliography

[1] J. D. McGee, "The history of electronic imaging", in: Electronic Imaging, Editors: T. P. McLean and P. Schage, Academic Press, 1979.

[2] A. M. Dhake, Television engineering, Tata McGraw-Hill Publishing Company Limited, 1980.

[3] History of BBC, 1930s, available at: http://www.bbc.co.uk/heritage/story/1930s.shtml.

[4] H. Benoit, Digital television, ISBN: 0340691905, Arnold, a member of the Hodder Headline Group, 1997.

[5] W. E. Glenn, "Consumer displays of the future", IEEE Transactions on Consumer Electronics, Vol. 42, No. 3, pp. 573-576, Aug. 1996.

[6] K. I. Werner, "The flat panel's future", IEEE Spectrum, Vol. 30, No. 11, pp. 18-22, 25-26, Nov. 1993.

[7] A. Ishizu, K. Sokawa, R. Matsuura, K. Imai and M. Fujita, "Digital signal processing for improved NTSC television receiver", IEEE Transactions on Consumer Electronics, Vol. 35, No. 3, pp. 259-265, Aug. 1989.

[8] R. P. Kleihorst, G. de Haan, R. L. Lagendijk and J. Biemond, "Motion compensated noise filtering of image sequences", Proceedings of the European Signal Processing Conference (EUSIPCO'92), pp. 1385-1388, Aug. 1992.

[9] E. B. Bellers and G. de Haan, De-interlacing: A key technology for scan rate conversion, Advances in Image Communications, Elsevier, Vol. 9, ISBN: 0-444-50594-6, 2000.

[10] L. D. Johnson, J. N. Pratt and D. C. Greene, "Low cost picture-in-picture for color TV receivers", IEEE Transactions on Consumer Electronics, Vol. 36, No. 3, pp. 380-386, Aug. 1990.


[11] Y. Tomari, M. Saito, N. Okada and R. Yoshida, "Design and implementation of Internet-TV", IEEE Transactions on Consumer Electronics, Vol. 43, No. 3, pp. 953-960, Aug. 1997.

[12] J. Whitaker, Standard Handbook of Video and Television Engineering, 4th Edition, ISBN: 0071411801, McGraw-Hill Companies, 2003.

[13] H. Kaneko and T. Ishiguro, "Digital television transmission using bandwidth compression techniques", IEEE Communications Magazine, Vol. 18, No. 4, pp. 14-22, Jul. 1980.

[14] U. Reimers, Digital video broadcasting - the international standard for digital television, Springer-Verlag Berlin Heidelberg New York, ISBN: 3-540-60946-6, 2001.

[15] U. Reimers, "Digital video broadcasting", IEEE Communications Magazine, Vol. 36, No. 6, pp. 104-110, Jun. 1998.

[16] T. Sikora, "MPEG digital video-coding standards", IEEE Signal Processing Magazine, Vol. 14, No. 5, pp. 82-100, Sep. 1997.

[17] P. Gastaldo, S. Rovetta and R. Zunino, "Objective quality assessment of MPEG-2 video streams by using CBP neural networks", IEEE Transactions on Neural Networks, Vol. 13, No. 4, pp. 939-947, Jul. 2002.

[18] Y. L. Lee, H. C. Kim and H. W. Park, "Blocking effect reduction of JPEG images by signal adaptive filtering", IEEE Transactions on Image Processing, Vol. 7, pp. 229-234, Feb. 1998.

[19] B. Ramamurthi and A. Gersho, "Non-linear space-variant post-processing of block coded images", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-34, pp. 1258-1267, Oct. 1986.

[20] Y. Wu and B. Caron, "Digital television terrestrial broadcasting", IEEE Communications Magazine, Vol. 32, No. 5, pp. 46-52, May 1994.

[21] D. Anastassiou, "Digital television", Proceedings of the IEEE, Vol. 82, No. 4, pp. 510-519, Apr. 1994.

[22] J. B. Waltrich, "Digital video compression - an overview", Journal of Lightwave Technology, Vol. 11, No. 1, pp. 70-75, Jan. 1993.

[23] G. de Haan, Video Processing for Multimedia Systems, 2nd Edition, ISBN: 90-9014015-8, 2000.


[24] C. H. Lee, S. H. Park, J. K. Kang and C. W. Kim, "A real time image processor for reproduction of gray levels in dark areas on plasma display panel (PDP)", IEEE Transactions on Consumer Electronics, Vol. 48, No. 4, pp. 879-886, Nov. 2002.

[25] H. Lee, S. Koichi, B. Cho, T. Lee and H. Lee, "Implementation of high performance false contour reduction system using pattern analysis and error-predict method for PDP-HDTV", Proceedings of ICCE 2003, IEEE International Conference on Consumer Electronics 2003, pp. 412-413, 17-19 Jun. 2003.

[26] G. Moore, "Cramming more components onto integrated circuits", Electronics Magazine, Vol. 38, No. 8, Apr. 1965.

[27] "Moore's law", from Wikipedia, the free encyclopedia, available at: http://en.wikipedia.org/wiki/Moore's_law.

[28] I. Tuomi, "The lives and death of Moore's Law", First Monday, Vol. 7, No. 11, Nov. 2002, available at: http://firstmonday.org/issues/issue7_11/tuomi/index.html.

[29] G. de Haan, J. Kettenis and B. de Loore, "IC for motion-compensated 100 Hz TV with natural-motion movie-mode", IEEE Transactions on Consumer Electronics, Vol. 42, No. 2, pp. 165-174, May 1996.

[30] G. de Haan, "IC for motion-compensated de-interlacing, noise reduction, and picture-rate conversion", IEEE Transactions on Consumer Electronics, Vol. 45, No. 3, pp. 617-624, Aug. 1999.

[31] D. Clark, "PC and TV makers battle over convergence", Computer, Vol. 30, No. 6, pp. 14-16, Jun. 1997.

[32] S. R. Turner, "Interactivity and the convergence of TV and PC - real or imagined?", International Broadcasting Convention (Conference Publication No. 428), pp. 200-202, 12-16 Sep. 1996.

[33] R. Jain, "PC and TV convergence: is it finally here?", IEEE Multimedia, Vol. 9, No. 4, pp. 103-104, Oct.-Dec. 2002.

[34] R. Willner, "Transforming the PC into a TV, radio, VCR, and video editing studio", Conference Record of WESCON'95, 'Microelectronics Communications Technology Producing Quality Products Mobile and Portable Power Emerging Technologies', pp. 743-748, 7-9 Nov. 1995.


[35] "Features and benefits of the HP Media Center PC", available at: http://www.pcworld.com/reviews/article/0,aid,118824,00.asp.

[36] C. Sodergerd, Mobile television - technology and user experiences, VTT Publications, ISBN 9513862429, 2003.

[37] "Philips predicts mobile TV dominance", Wireless Watch, available at: http://www.theregister.co.uk/2005/01/28/mobile_tv_prediction/.

[38] M. Kornfeld, "DVB-H - the emerging standard for mobile data communication", 2004 IEEE International Symposium on Consumer Electronics, pp. 193-198, 1-3 Sep. 2004.

[39] Y. W. Sawng, J. S. Lee and H. S. Han, "Market Acceptance for the Satellite DMB (Digital Multimedia Broadcasting) Services in Korea", Proceedings of the International Conference on Mobile Business 2005 (ICMB 2005), pp. 413-419, 11-13 Jul. 2005.

[40] G. de Haan, "Video format conversion", invited paper, Digest of the SID'99, pp. 52-55, May 1999.

[41] E. B. Bellers, I. E. J. Heynderickx, G. de Haan and I. de Weerd, "Optimal television scanning format for CRT-displays", IEEE Transactions on Consumer Electronics, Vol. 47, No. 3, pp. 347-353, Aug. 2001.

[42] G. de Haan and E. B. Bellers, "De-interlacing - an overview", Proceedings of the IEEE, Vol. 86, No. 9, pp. 1839-1857, Sep. 1998.

[43] A. M. Tekalp, Digital Video Processing, ISBN: 0-13-190075-7, Prentice Hall PTR, 1995.

[44] Y. Wang, J. Ostermann and Y. Q. Zhang, Video Processing and Communications, ISBN: 0-13-017547-1, Prentice-Hall, 2002.

[45] K. Jack, Video Demystified: A Handbook for the Digital Engineer, 3rd Edition, ISBN: 1-878707-56-6, LLH Technology Publishing, 2001.

[46] C. A. Poynton, A Technical Introduction to Digital Video, ISBN: 0-471-12253-X, John Wiley & Sons, 1996.

[47] A. F. Inglis and A. C. Luther, Video Engineering, 2nd Edition, ISBN: 0-07-031791-7, McGraw-Hill, 1996.

[48] V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards, Algorithms and Architectures, 2nd Edition, ISBN: 0792399528, Kluwer Academic Publishers, Norwell, MA, pp. 147-228, 1997.


[49] M. Ghanbari, Standard codecs: image compression to advanced coding, ISBN: 0-85296710-1, The Institution of Electrical Engineers, London, 2003.

[50] G. J. Sullivan and T. Wiegand, "Video compression - from concepts to the H.264/AVC standard", Proceedings of the IEEE, Vol. 93, No. 1, pp. 18-31, Jan. 2005.

[51] J. Biemond, L. Links and D. Boekee, "Image modeling and quality criteria", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 27, No. 6, pp. 649-652, Dec. 1979.

[52] N. B. Nill, "A visual model weighted cosine transform for image compression and quality assessment", IEEE Transactions on Communications, Vol. 33, No. 6, pp. 551-557, Jun. 1985.

[53] A. M. Eskicioglu and P. S. Fisher, "Image quality measures and their performance", IEEE Transactions on Communications, Vol. 43, No. 12, pp. 2959-2965, Dec. 1995.

[54] T. N. Pappas and R. J. Safranek, "Perceptual criteria for image quality evaluation", Handbook of Image and Video Processing, ISBN: 0-12119790-5, Academic Press, May 2000.

[55] K. T. Tan and M. Ghanbari, "A multi-metric objective picture-quality measurement model for MPEG video", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 10, No. 7, pp. 1208-1213, Oct. 2000.

[56] N. Damera-Venkata, T. D. Kite, W. S. Geisler, B. L. Evans and A. C. Bovik, "Image quality assessment based on a degradation model", IEEE Transactions on Image Processing, Vol. 9, No. 4, pp. 636-650, Apr. 2000.

[57] Z. Wang and A. C. Bovik, "A universal image quality index", IEEE Signal Processing Letters, Vol. 9, No. 3, pp. 81-84, Mar. 2002.

[58] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity", IEEE Transactions on Image Processing, Vol. 13, No. 4, pp. 600-612, Apr. 2004.

[59] J. Guo, M. Van Dyke-Lewis and H. R. Myler, "Gabor difference analysis of digital video quality", IEEE Transactions on Broadcasting, Vol. 50, No. 3, pp. 302-311, Sep. 2004.

[60] J. A. Saghri, "Image quality measure based on a human visual system model", Optical Engineering, Vol. 28, No. 7, pp. 813-818, Jul. 1989.


[61] M. H. Pinson and S. Wolf, "A new standardized method for objectively measuring video quality", IEEE Transactions on Broadcasting, Vol. 50, No. 3, pp. 312-322, Sep. 2004.

[62] G. de Haan and M. Zhao, "Making the best of legacy video on modern displays", to be presented at the 2006 International Symposium, Seminar, and Exhibition of the International Society of Information Display, Jun. 2006.

[63] M. Zhao, M. Bosma and G. de Haan, "Making the best of legacy video on modern displays", submitted to the Journal of the International Society of Information Display.

[64] B. Jahne, Digital Image Processing, Concepts, Algorithms, and Scientific Applications, 4th Edition, Springer-Verlag, ISBN: 3540627243, pp. 248-261, 1997.

[65] T. M. Lehmann, C. Gonner and K. Spitzer, "Survey: Interpolation methods in medical image processing", IEEE Transactions on Medical Imaging, Vol. 18, No. 11, pp. 1049-1075, Nov. 1999.

[66] R. G. Keys, "Cubic convolution interpolation for digital image processing", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-29, No. 6, pp. 1153-1160, Dec. 1981.

[67] E. Meijering, "A note on cubic convolution interpolation", IEEE Transactions on Image Processing, Vol. 12, No. 4, pp. 477-479, Apr. 2003.

[68] D. P. Mitchell and A. N. Netravali, "Reconstruction filters in computer graphics", Computer Graphics, Vol. 22, No. 4, pp. 221-228, Aug. 1988.

[69] M. Unser, "Splines: a perfect fit for signal and image processing", IEEE Signal Processing Magazine, Vol. 16, No. 6, pp. 22-38, Nov. 1999.

[70] E. Maeland, "On the comparison of interpolation methods", IEEE Transactions on Medical Imaging, Vol. 7, No. 3, pp. 213-217, Sep. 1988.

[71] P. Thevenaz, T. Blu and M. Unser, "Interpolation revisited", IEEE Transactions on Medical Imaging, Vol. 19, No. 7, pp. 739-758, Jul. 2000.

[72] P. Thevenaz, T. Blu and M. Unser, "Image interpolation and resampling", Handbook of Medical Imaging, Processing and Analysis, I. N. Bankman, Ed., Academic Press, San Diego, CA, USA, ISBN: 0120777908, pp. 393-420, 2000.


[73] H. Greenspan, C. H. Anderson and S. Akber, "Image enhancement by non-linear extrapolation in frequency space", IEEE Transactions on Image Processing, Vol. 9, No. 6, pp. 1035-1048, Jun. 2000.

[74] T. Kondo, T. Fujiwara, Y. Okumura and Y. Node, "Picture conversion apparatus, picture conversion method, learning apparatus and learning method", US-patent 6,323,905, Nov. 2001.

[75] C. B. Atkins, C. A. Bouman and J. P. Allebach, "Optimal image scaling using pixel classification", 2001 International Conference on Image Processing, Vol. 3, pp. 864-867, 2001.

[76] N. Plaziac, "Image interpolation using neural networks", IEEE Transactions on Image Processing, Vol. 8, No. 11, pp. 1647-1651, Nov. 1999.

[77] J. Go, K. Sohn and C. Lee, "Interpolation using neural networks for digital still cameras", IEEE Transactions on Consumer Electronics, Vol. 46, No. 3, pp. 610-616, Aug. 2000.

[78] R. Carotenuto, G. Sabbi and M. Pappalardo, "Spatial resolution enhancement of ultrasound images using neural networks", IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, Vol. 49, No. 8, pp. 1039-1049, Aug. 2002.

[79] X. Li and M. T. Orchard, "New edge-directed interpolation", IEEE Transactions on Image Processing, Vol. 10, No. 10, pp. 1521-1527, Oct. 2001.

[80] J. Tegenbosch, P. Hofman and M. Bosma, "Improving nonlinear up-scaling by adapting to the local edge orientation", Proceedings of the SPIE, Vol. 5308, pp. 1181-1190, Jan. 2004.

[81] K. Ratakonda and N. Ahuja, "POCS based adaptive image magnification", Proceedings of the 1998 International Conference on Image Processing, Vol. 3, pp. 203-207, Oct. 1998.

[82] B. S. Morse and D. Schwartzwald, "Image magnification using level-set reconstruction", Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Vol. 1, pp. I-333-I-340, Dec. 2001.

[83] M. Unser, A. Aldroubi and M. Eden, "The L2-Polynomial Spline pyramid", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 4, pp. 364-379, Apr. 1993.


[84] T. Kondo and K. Kawaguchi, "Adaptive dynamic range encoding method and apparatus", US-patent 5,444,487, Aug. 1995.

[85] T. Kondo, Y. Fujimori, S. Ghosal and J. J. Carrig, "Method and apparatus for adaptive filter tap selection according to a class", US-patent 6,192,161, Feb. 2001.

[86] R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, 2nd Edition, Chichester: Wiley-Interscience, ISBN 0-471-05669-3, pp. 100-107, 2001.

[87] T. K. Moon, "The Expectation-Maximization algorithm", IEEE Signal Processing Magazine, Vol. 13, No. 6, pp. 47-60, Nov. 1996.

[88] R. Fletcher, Practical Methods of Optimization, 2nd Edition, John Wiley & Sons Ltd., ISBN 0-471-49463-1, pp. 100-107, Aug. 2000.

[89] E. B. Bellers and J. Caussyn, "A high definition experience from standard definition video", Proceedings of the SPIE, the International Society for Optical Engineering, Vol. 5022, pp. 594-603, 2003.

[90] R. R. Schultz and R. L. Stevenson, "A Bayesian approach to image expansion for improved definition", IEEE Transactions on Image Processing, Vol. 3, No. 3, pp. 233-242, May 1994.

[91] R. Rajae-Joordens and J. Engel, "Paired comparisons in visual perception studies using small sample sizes", Displays, Vol. 26, No. 1, pp. 1-7, Jan. 2005.

[92] M. R. Banham and A. K. Katsaggelos, "Digital image restoration", IEEE Signal Processing Magazine, Vol. 14, No. 2, pp. 24-41, Mar. 1997.

[93] M. Saeed, H. R. Rabiee, W. C. Karl and T. Q. Nguyen, "Bayesian restoration of noisy images with the EM algorithm", Proceedings of the International Conference on Image Processing 1997, Vol. 2, pp. 322-325, Oct. 1997.

[94] R. C. Gonzalez, Digital Image Processing, ISBN: 0-20150803-6, Addison-Wesley, 2004.

[95] W. K. Pratt, Digital Image Processing, ISBN: 0-47101888-0, John Wiley & Sons, pp. 409-412, 1978.

[96] J. A. Leitao, M. Zhao and G. de Haan, "Content-adaptive video up-scaling for high-definition displays", SPIE, Proceedings of VCIP, Vol. 5022, Jan. 2003.


[97] G. de Haan and P. Biezen, "Sub-pixel motion estimation with 3-D recursive search block-matching", Signal Processing: Image Communication, Vol. 6, pp. 229-239, 1994.

[98] M. Zhao and G. de Haan, "Content adaptive video up-scaling", Proceedings of the Annual Conference of the Advanced School for Computing and Imaging 2003, ISBN 90-803086-8-4, pp. 151-156, Jun. 2003.

[99] C. B. Atkins, "Classification-based methods in optimal image interpolation", Ph.D. thesis, Purdue University, Dec. 1998.

[100] M. Zhao, J. A. Leitao and G. de Haan, "Towards an overview of spatial up-conversion techniques", Proceedings of the ISCE 2002, pp. E13-E16, Sep. 2002.

[101] M. Zhao and G. de Haan, "Intra-field De-interlacing with Advanced Up-scaling Methods", Proceedings of the 8th International Symposium on Consumer Electronics, pp. 315-319, Sep. 2004.

[102] T. Kondo, Y. Tatehira, N. Asakura, M. Uchida, T. Morimura, K. Ando, H. Nakaya, T. Watanabe, S. Inoue and W. Niitsuma, "Information signal processing apparatus, picture information converting apparatus, and picture display apparatus", US-patent 6,483,545 B1, Nov. 2002.

[103] M. Weston, "Interpolating lines of video signals", US-patent 4,789,893, Dec. 1988.

[104] M. Zhao and G. de Haan, "Content adaptive vertical temporal filtering for de-interlacing", Proceedings of the 9th International Symposium on Consumer Electronics 2005, pp. 69-73, Jun. 2005.

[105] M. Zhao, C. Ciuhu and G. de Haan, "Classification based data mixing for hybrid de-interlacing techniques", Proceedings of the 13th European Signal Processing Conference, Sep. 2005.

[106] S. Selby and D. Stan, "Motion adaptive de-interlacing method and apparatus", US-patent 6,784,942 B2, Aug. 2004.

[107] J. Salo, Y. Nuevo and V. Hameenaho, "Improving TV picture quality with linear-median type operations", IEEE Transactions on Consumer Electronics, Vol. 34, No. 3, pp. 373-379, Aug. 1988.

[108] J. S. Kown, K. Seo, J. Kim and Y. Kim, "A motion adaptive de-interlacing method", IEEE Transactions on Consumer Electronics, Vol. 38, No. 3, pp. 145-150, Aug. 1992.


[109] A. Nguyen and E. Dubois, "Spatial-temporal adaptive interlaced-to-progressive conversion", in Signal Processing of HDTV, IV, E. Dubois and L. Chiariglione, Eds., Elsevier Science Publishers, pp. 749-756, 1993.

[110] J. Kovacevic, R. J. Safranek and E. M. Yeh, "De-interlacing by successive approximation", IEEE Transactions on Image Processing, Vol. 6, No. 2, pp. 339-344, Feb. 1997.

[111] G. de Haan and R. Lodder, "De-interlacing of video data using motion vectors and edge information", Digest of the ICCE'02, pp. 70-71, Jun. 2002.

[112] M. Zhao and G. de Haan, "Subjective evaluation of de-interlacing techniques", Proceedings of SPIE, Image and Video Communications and Processing, Jan. 2005.

[113] C. Ciuhu and G. de Haan, "A two-dimensional generalized sampling theory and application to de-interlacing", SPIE, Proceedings of VCIP, pp. 700-711, Jan. 2004.

[114] M. Zhao, P. M. Hofman and G. de Haan, "Content-adaptive up-scaling of chrominance using classification of luminance and chrominance data", Proceedings of VCIP, SPIE, pp. 721-730, Jan. 2004.

[115] ITU-R Recommendation BT.601.

[116] M. Zhao, R. E. J. Kneepkens, P. M. Hofman and G. de Haan, "Content Adaptive Image De-blocking", Proceedings of the 8th International Symposium on Consumer Electronics, pp. 299-304, Sep. 2004.

[117] T. Kondo, J. J. Carrig, Y. Okumura and W. K. Carey, "Classified adaptive error recovery method and apparatus", US-patent 6,522,785, Feb. 2003.

[118] H. C. Reeve and J. S. Lim, "Reduction of blocking effects in image coding", Optical Engineering, Vol. 23, No. 1, pp. 34-37, Jan./Feb. 1984.

[119] G. A. Triantafyllidis, M. Varnuska, D. Sampson, D. Tzovaras and M. G. Strintzis, "An efficient algorithm for the enhancement of JPEG-coded images", Computers & Graphics, Vol. 27, pp. 529-534, 2003.

[120] T. Meier, K. N. Ngan and G. Crebbin, "Reduction of blocking artifacts in image and video coding", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, No. 3, pp. 490-500, Apr. 1999.


[121] C. J. Kuo and R. J. Hsieh, "Adaptive postprocessor for block encoded images", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 5, No. 4, pp. 298-304, Aug. 1995.

[122] Y. C. Choung and J. K. Paik, "A fast adaptive image restoration filter for reducing block artifacts in compressed images", IEEE Transactions on Consumer Electronics, Vol. 43, No. 4, pp. 1340-1346, Nov. 1997.

[123] Y. Luo and R. K. Ward, "Removing the blocking artifacts of block-based DCT compressed images", IEEE Transactions on Image Processing, Vol. 12, No. 7, pp. 838-842, Jul. 2003.

[124] S. S. O. Choy, Y. H. Chan and W. C. Siu, "Reduction of block-transform image coding artifacts by using local statistics of transform coefficients", IEEE Signal Processing Letters, Vol. 4, No. 1, pp. 5-7, Jan. 1997.

[125] B. Jeon and J. Jeong, "Blocking artifacts reduction in image compression with block boundary discontinuity criterion", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, No. 3, pp. 345-357, Jun. 1998.

[126] H. Paek, R. C. Kim and S. U. Lee, "On the POCS-based postprocessing techniques to reduce the blocking artefacts in transform coded images", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, No. 3, pp. 358-367, Jun. 1998.

[127] J. Huang, Y. Q. Shi and X. Dai, "Blocking artifact removal based on frequency analysis", IEE Electronics Letters, Vol. 34, No. 24, Nov. 1998.

[128] J. G. Apostolopoulos and N. S. Jayant, "Postprocessing for very low bit-rate video compression", IEEE Transactions on Image Processing, Vol. 8, No. 8, pp. 1125-1129, Aug. 1999.

[129] Independent JPEG Group, code and some supporting documentation for JPEG image compression, available at: http://www.ijg.org/files/jpegsrc.v6b.tar.gz.

[130] H. R. Wu, "A new distortion measure for video coding blocking artifacts", Proceedings of the IEEE International Conference on Communication Technology 1996 (ICCT'96), Vol. 2, pp. 658-661, May 1996.

[131] E. Lesellier and J. Jung, "Robust wavelet-based arbitrary grid detection for MPEG", Proceedings of the 2002 International Conference on Image Processing, Vol. 3, pp. III-417-III-420, Jun. 2002.


[132] R. Muijs and I. Kirenko, "A no-reference blocking artefacts measure for adaptive video processing", Proceedings of the 13th European Signal Processing Conference (EUSIPCO), Sep. 2005.

[133] H. Hu, P. M. Hofman and G. de Haan, "Content-adaptive neural filters for image interpolation using pixel classification", Proceedings of SPIE, Applications of Neural Networks and Machine Learning in Image Processing IX, Jan. 2005.


Curriculum Vitae

Meng Zhao was born in Shaanxi, P. R. China, on June 6, 1975. He received B.Sc. and M.Sc. degrees in Electrical Engineering from Xi'an Jiaotong University, Xi'an, P. R. China, in 1996 and 1999, respectively. From 1999 to 2001 he worked at the institute of physical electronics of Xi'an Jiaotong University as a research assistant. He worked towards the Ph.D. degree at the department of Electrical Engineering of the Eindhoven University of Technology from 2001 to 2005 on the subject of video enhancement. He joined Philips Research Eindhoven in November 2005, where he is currently a Research Scientist in the group Video Processing Systems.
