
Adaptive filtering algorithms for acoustic echo and noise cancellation

Geert Rombouts

25th April 2003


KATHOLIEKE UNIVERSITEIT LEUVEN
FACULTEIT TOEGEPASTE WETENSCHAPPEN
DEPARTEMENT ELEKTROTECHNIEK
Kasteelpark Arenberg 10, 3001 Leuven (Heverlee)

Adaptive filtering algorithms for acoustic echo and noise cancellation

Dissertation submitted in partial fulfilment of the requirements for the degree of Doctor of Applied Sciences (doctoraat in de toegepaste wetenschappen) by Geert ROMBOUTS

Jury:

Prof. dr. ir. E. Aernoudt, chairman
Prof. dr. ir. M. Moonen, promotor
Prof. dr. ir. D. Van Compernolle
Prof. dr. ir. B. De Moor
Prof. dr. ir. S. Van Huffel
Prof. dr. ir. P. Sommen (TU Eindhoven)
Prof. dr. ir. I. K. Proudler (King's College, UK)

UDC 681.3*I12:534

April 2003


Copyright Katholieke Universiteit Leuven - Faculteit Toegepaste Wetenschappen

Arenbergkasteel, B-3001 Heverlee

Alle rechten voorbehouden. Niets uit deze uitgave mag vermenigvuldigd en/of openbaar gemaakt worden door middel van druk, fotocopie, microfilm, elektronisch of op welke andere wijze ook zonder voorafgaande schriftelijke toestemming van de uitgever.

All rights reserved. No part of this publication may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the publisher.

D/2003/7515/13

ISBN 90-5682-402-3

For my grandmother, Maria Jonckers


Abstract

In this thesis, we develop a number of algorithms for acoustic echo and noise cancellation.

We derive a fast exact implementation of the affine projection algorithm (APA), and we show that the existing (approximating) fast techniques exhibit problems when strong regularization is used.

We develop a number of algorithms for noise cancellation based on optimal filtering techniques for multi-microphone systems. By using QR-decomposition based techniques, a complexity reduction by a factor of 50 to 100 is achieved compared to existing implementations.

Finally, we show that instead of using a cascade of a noise cancellation system and an echo cancellation system, it is better to solve the combined problem as a global optimization problem. The aforementioned noise reduction techniques can be used to solve this optimization problem.


List of symbols

B(k) : Right-hand side in the QRD-RLS based noise reduction equation

d(k) : Desired signal of an adaptive filter at time k

(d1(k) d2(k) d3(k) ...) : Desired signals for multiple right-hand sides

d(k) : Vector with recent desired signal samples

δ : Regularisation parameter (diagonal loading)

e(k) : Error signal of an adaptive filter

ε{·} : Expected value operator

f(k) : Loudspeaker reference signal

G : Givens rotation

gi : Far end room paths

hi : Near end room paths

λ : Forgetting factor (weighting factor)

λn : Forgetting factor during noise-only periods

λs : Forgetting factor during speech+noise periods

M : Number of channels

µ : Stepsize

n(k) : Noise signal

N : Number of filter taps per channel

Naec : Number of taps in the AEC part of the AENC

Q(k) : Orthogonal matrix Q in a QR decomposition

R(k) : Upper triangular matrix R in a QR decomposition

Σ : Diagonal matrix in an SVD

σi : Singular value

u(k) : Input vector with microphone signals and echo reference

v(k) : Acoustical disturbance signal

v(k) : Vector with recent disturbance samples

V(k) : Toeplitz matrix with disturbance signal

w(k) : Filter coefficient vector; a subscript may specify the algorithm used

W(k) : Matrix whose columns are filter vectors

x(k) : Input signal

x(k) : Input vector

X(k) : Toeplitz matrix with input signal

Ξ(k) : Input correlation matrix

y(k) : Output of adaptive filter

⊗ : Convolution symbol

Contents

1 Speech signal enhancement
  1.1 Overview
  1.2 Problem statement
      1.2.1 Nature of acoustical disturbances
      1.2.2 AEC, reference-based noise reduction
      1.2.3 ANC, reference-less noise reduction
      1.2.4 Combined AEC and ANC
  1.3 Applications
  1.4 The market
  1.5 Contributions
  1.6 Outline

2 Adaptive filtering algorithms
  2.1 Introduction
  2.2 Normalized Least Mean Squares algorithm
  2.3 Recursive Least Squares algorithms
      2.3.1 Standard recursive least squares
      2.3.2 QRD-updating
      2.3.3 QRD-based RLS algorithm (QRD-RLS)
      2.3.4 QRD-based least squares lattice (QRD-LSL)
      2.3.5 RLS versus LMS
  2.4 Affine Projection based algorithms
      2.4.1 The affine projection algorithm
      2.4.2 APA versus LMS
      2.4.3 The Fast Affine Projection algorithm (FAP)
  2.5 Geometrical interpretation
  2.6 Conclusion

3 APA regularization and Sparse APA for AEC
  3.1 APA regularization
      3.1.1 Diagonal loading
      3.1.2 Exponential weighting
  3.2 APA with sparse equations
  3.3 FAP and the influence of regularization
  3.4 Experimental results
  3.5 Regularization in multichannel AEC
  3.6 Conclusion

4 Block Exact APA (BEAPA) for AEC
  4.1 Block Exact Fast Affine Projection (BEFAP)
  4.2 Block Exact APA (BEAPA)
      4.2.1 Principle
      4.2.2 Complexity reduction
      4.2.3 Algorithm specification
  4.3 Sparse Block Exact APA
      4.3.1 Derivation
      4.3.2 Complexity reduction
      4.3.3 Algorithm specification
  4.4 Conclusion

5 QRD-RLS based ANC
  5.1 Introduction
  5.2 Unconstrained optimal filtering based ANC
  5.3 QRD-based algorithm
      5.3.1 Speech+noise mode
      5.3.2 Noise-only mode
      5.3.3 Residual extraction
      5.3.4 Initialization
      5.3.5 Algorithm description
  5.4 Trading off noise reduction vs. signal distortion
      5.4.1 Regularization
      5.4.2 Speech+noise mode
      5.4.3 Noise-only mode
  5.5 Complexity
  5.6 Simulation results
  5.7 Conclusion

6 Fast QRD-LSL based ANC
  6.1 Preliminaries
  6.2 Modified QRD-RLS based algorithm
      6.2.1 Speech+noise mode
      6.2.2 Noise-only mode
  6.3 QRD-LSL based algorithm
      6.3.1 Per sample versus per vector classification
      6.3.2 LSL algorithm
  6.4 Transitions
      6.4.1 Transition from speech+noise to noise-only mode
      6.4.2 Transition from a noise-only to a speech+noise period
  6.5 Noise reduction vs. signal distortion trade-off
      6.5.1 Regularization in QRD-LSL based ANC
      6.5.2 Regularization using a noise buffer
      6.5.3 Mode-dependent regularization
  6.6 Complexity
  6.7 Simulation results
  6.8 Conclusion

7 Integrated noise and echo cancellation
  7.1 Introduction
  7.2 Optimal filtering based AENC
  7.3 Data driven approach
  7.4 QRD-RLS based algorithm
      7.4.1 Speech+noise/echo updates
      7.4.2 Noise/echo-only updates
  7.5 QRD-LSL algorithm
  7.6 Regularized AENC
      7.6.1 Regularization using a noise/echo buffer
      7.6.2 Mode-dependent regularization
  7.7 Performance
  7.8 Complexity
  7.9 Conclusion

8 Conclusions


Chapter 1

Speech signal enhancement

A microphone often picks up acoustical disturbances together with a speaker's voice (which is the signal of interest). In this work, algorithms are developed for techniques that remove these disturbances from the speech signal before further processing.

1.1 Overview

In general, more than one type of disturbance will be present in a microphone signal, each of them requiring a specific enhancement approach. We will mainly focus on two classes of speech enhancement techniques, namely acoustic echo cancellation (AEC) (section 1.2.2) and acoustic noise cancellation (ANC) (section 1.2.3).

For AEC, a whole range of algorithms exists, from computationally cheap to expensive, with correspondingly varying performance. We will focus on one of the 'intermediate' types of algorithms, whose performance and complexity can be tuned depending on the available computational power. We will describe some methods to increase noise robustness, we will show how existing fast implementations fail when their assumptions are violated, and we will derive a fast implementation which does not require any assumptions.

For ANC, a class of promising state-of-the-art techniques exists whose characteristics could be complementary to the features of computationally cheaper (and commercially available) techniques. Existing algorithms for these techniques have a high numerical complexity, and hence are not suited for real-time implementation. This observation motivates our work in the field of acoustic noise cancellation, and we describe a number of algorithms that are (several orders of magnitude) cheaper than existing implementations, and hence allow for real-time implementation.

Finally, we will show that considering the combined problem of acoustic echo and noise cancellation as a global optimization problem leads to better results than using traditional cascaded schemes. The techniques which we use for ANC can easily be modified to incorporate AEC.

The outline of this first chapter is as follows. After a problem statement in section 1.2, we describe in section 1.3 a number of applications in which acoustic echo and noise cancelling techniques prove useful. In section 1.4, an overview of commercially available applications in this field is given. In section 1.5 our own contributions are summarized. Section 1.6 gives an outline of the remainder of the thesis.

1.2 Problem statement

1.2.1 Nature of acoustical disturbances

In many applications involving speech communication, it is difficult (expensive) to place microphones close to the speakers. The microphone amplification then has to be large due to the large distance to the speech source. As a result, more environmental noise is picked up than in the case where the microphones are close to the speech source.

For some of these disturbances, a reference signal may be available. For example, a radio may be playing in the background while someone is making a telephone call. The electrical signal that is fed to the radio's loudspeaker can be used as a reference signal for the radio sound reaching the telephone's microphone. We will call the techniques that rely on the presence of a reference signal 'acoustic echo cancellation techniques' (AEC); the reason for this name will become clear below.

For other types of disturbances, no reference signal is available. Examples of such disturbances are the noise of a computer fan, people babbling in the room where someone is using a telephone, car engine noise, etc. Techniques that perform disturbance reduction where no reference signal is available will be called 'acoustic noise cancellation techniques' (ANC) in this text.

In some situations the above two noise reduction techniques should be combined with a third enhancement technique, namely dereverberation. Each acoustical environment has an impulse response, which results in a spectral coloration or reverberation of sounds that are recorded in that room. This reverberation is due to reflections of the sound against walls and objects, and hence has specific spatial characteristics, different from those of the original signal. The human auditory system deals with this effectively because it has the ability to concentrate on sounds coming from a certain direction, using information from both ears. If, for example, one were to hear a signal recorded by only one microphone in a reverberant room, speech signals may easily become unintelligible. Of course, voice recognition systems that are trained on non-reverberated speech will also have difficulties handling signals that have been filtered by the room impulse response, and hence dereverberation is necessary.

In this thesis, we will concentrate on algorithms for both classes of noise reduction (noise reduction with (AEC) and without (ANC) a reference signal). Dereverberation will not be treated here (we refer to [32, 40, 3] for dereverberation techniques).

1.2.2 AEC, reference–based noise reduction

The most typical application of noise reduction when a reference signal is available is acoustic echo cancellation (AEC). As mentioned before, we will use the term AEC to refer to the technique itself, even though the disturbance which is reduced is not always strictly an 'echo'.

Single channel techniques. A teleconferencing setup consists of two conference rooms (see Figure 1.1), in both of which microphones and loudspeakers are installed.

Figure 1.1: Acoustic echo cancellation. The loudspeaker signal in the near end room is picked up by the microphone, and without an echo canceller it would be sent back to the far end room, where the far end speaker would hear his own voice again (delayed by the communication setup).

Sound picked up by the microphones in one room (called the 'far end speech' and the 'far end room') is reproduced by the loudspeakers in the other (near end) room. The task of an 'echo canceller' is to avoid that the portion of the far end speech signal which is picked up by the microphones in the near end room is sent back to the far end. Hearing his own delayed voice would be very annoying to the far end speaker.


A similar example is voice control of a CD player. The music itself can then be considered a disturbance (echo) to the voice control system.

The loudspeaker signal in both cases is 'filtered' by the room impulse response. This impulse response is the result of the sound being reflected and attenuated (in a frequency dependent way) by the walls and by objects in the room. Due to the nature of this process, the room acoustics can be modeled by a finite impulse response (FIR) filter. Nonlinear effects (mostly due to loudspeaker imperfections) are not considered here.

In an acoustic echo cancellation algorithm, a model of the room impulse response is identified. Since the conditions in the room may vary continuously (people moving around being an obvious example), the model needs to be updated continuously. This is done by means of adaptive filtering techniques. In the situation in Figure 1.2, the far end signal x(k) is filtered by the room impulse response, and then picked up by a microphone, together with the desired speech signal of the near end speaker. We consider digital signal processing techniques, hence A/D converted signals, i.e. discrete-time signals and systems. At the same time, the loudspeaker signal x(k) is filtered by a model w(k) of the room impulse response w_real, and subtracted from the microphone signal d(k):

e(k) = d(k) − w^T(k) x(k).

During periods where the near end speaker is silent, the error (residual) signal e(k) may be used to update w(k), but when the near end speaker is talking, this signal would disturb the adaptation process. We assume that the room characteristics do not change too much during the periods in which near end speech is present; to solve this problem, the adaptation is frozen in these periods by a control algorithm.
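The adaptation just described can be sketched with the NLMS filter of section 2.2 on synthetic data. This is an illustrative sketch only: the room path, the white-noise far end signal and all variable names are ours, not the simulation setup of the later chapters.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64                            # model taps (real rooms need ~2000, see text)
w_real = rng.standard_normal(N) * np.exp(-0.05 * np.arange(N))  # synthetic room path
w = np.zeros(N)                   # adaptive model w(k)
mu, eps = 0.5, 1e-6               # stepsize and a small regularization constant

x = rng.standard_normal(5000)           # far end (loudspeaker) signal
d = np.convolve(x, w_real)[:len(x)]     # echo picked up by the microphone

for k in range(N - 1, len(x)):
    xk = x[k - N + 1:k + 1][::-1]       # x(k) = [x(k) ... x(k-N+1)]
    e = d[k] - w @ xk                   # e(k) = d(k) - w^T(k) x(k)
    w += mu * e * xk / (xk @ xk + eps)  # NLMS update
misalignment = np.linalg.norm(w - w_real) / np.linalg.norm(w_real)
```

With a white input the misalignment should drop by orders of magnitude over a few thousand samples; with a colored (speech-like) input the convergence is much slower, which is exactly the NLMS weakness discussed below.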

Figure 1.2: Echo canceller: typical situation, with input vector x(k) = [x(k) ... x(k−N+1)]^T.

In the acoustic echo canceller scheme, the adaptive filtering structure (see also Figure 2.1) is easily recognized. The input signal to this adaptive filter is the loudspeaker signal x(k) (the reference signal), the desired signal for the filter is the microphone signal d(k), and the error signal e(k) of the adaptive filter is used as the output signal of the AEC scheme.

In practice, the length of the room impulse response (and by consequence also the length of the model w(k)) can easily be 2000 filter taps (even for a rather low sampling frequency of 8 kHz). This is the reason why people often use the celebrated and computationally cheap Normalized Least Mean Squares (NLMS) adaptive filter (see section 2.2), or even cheaper frequency domain derivatives of it, for adapting w(k) [19, 18]. The disadvantage of NLMS is its often poor performance for non-white input signals (like speech).

While NLMS is a cheap algorithm, the Recursive Least Squares (RLS) algorithm (section 2.3) has a higher performance, and fast variants are indeed used for acoustic echo cancellation [9, 8, 20, 21]. However, due to its complexity, efforts have been made to find algorithms that combine the low complexity of NLMS with the performance of RLS. The most notable are the Fast Newton Transversal Filter (FNTF) [42] and fast variants [26] of the Affine Projection Algorithm (APA) [43] (see section 2.4). In this thesis, we derive a number of contributions to the field of APA filtering.
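As an illustration of the APA update referred to here (projection order P, with diagonal loading δ as in chapter 3), a direct, unoptimized sketch of one iteration; the function name and defaults are ours:

```python
import numpy as np

def apa_update(w, X, d, mu=1.0, delta=1e-4):
    """One affine projection (APA) iteration, direct (non-fast) form.

    w : (N,) current filter estimate
    X : (P, N) matrix whose rows are the last P input vectors
    d : (P,) corresponding desired (microphone) samples
    """
    e = d - X @ w                                        # a priori error vector
    # solve the small regularized P x P system instead of forming an inverse
    g = np.linalg.solve(X @ X.T + delta * np.eye(len(d)), e)
    return w + mu * (X.T @ g)
```

For P = 1 this reduces to NLMS; a larger P gives the decorrelating effect that speeds up convergence for colored inputs, at the price of solving a P x P system per sample, which is what the fast (FAP) versions avoid.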

The performance advantage offered by these filters compared to NLMS is due to a 'prewhitening' structure that removes the correlation from the reference signal. As will be shown later, further signal processing may require multiple microphones (a microphone array) that pick up the sound in the room. The echo canceller structure then obviously has to be repeated for each of the microphones, as shown in Figure 1.3. The prewhitening stage, however, can be 'shared' among the different microphones in a multi-microphone setup.

Figure 1.3: Multi-microphone acoustic echo canceller. The single channel setup can simply be repeated.


An acoustic echo canceller never consists of the adaptive filter alone, but always requires some control logic. In practice, the adaptive filter is never updated when near end speech is present, and only updated if there is a far end signal available. The decision can e.g. be based upon measurements of the correlation of the residual signal e(k) with the loudspeaker signal. In this text, however, this control device will not be considered. All experiments have been done with a 'perfect' control device, i.e. speech periods have been marked manually.
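As an illustration only (the experiments in this thesis use manually marked speech periods instead), a crude control rule along the correlation-based lines sketched above; the function name and thresholds are our own, and a practical double-talk detector would be considerably more elaborate:

```python
import numpy as np

def adapt_allowed(e_win, x_win, far_energy_thresh=1e-3, corr_thresh=0.5):
    """Crude adaptation control over one analysis window (illustrative only).

    Freeze when there is no far end signal; otherwise allow adaptation only
    while the residual e(k) still correlates strongly with the loudspeaker
    signal x(k), i.e. while the residual looks like echo rather than near
    end speech.
    """
    if np.mean(x_win ** 2) < far_energy_thresh:       # no far end signal
        return False
    rho = abs(np.dot(e_win, x_win)) / (
        np.linalg.norm(e_win) * np.linalg.norm(x_win) + 1e-12)
    return rho > corr_thresh
```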

In the acoustic echo canceller context, it is important that the decision device never allows the filter to adapt during a double-talk period (when both the far end and the near end speaker are active), since then the adaptation would be disturbed by the near end signal, and the coefficients would converge to wrong values. The other situation is less problematic: when a period in which only far end talk is present is labeled as double-talk, the echo canceller simply does not adapt. If this happens often, the overall convergence is just somewhat slower.

We refer to the literature [10, 25, 31, 45], from which a suitable implementation can be picked.

Multichannel techniques. Multichannel techniques for acoustic echo cancellation [4, 28, 2, 41] should not be confused with multi-microphone techniques. In a multi-microphone setup, all adaptive filters have the same input signal (the mono loudspeaker signal), while in a multichannel setup, multiple loudspeakers (or reference signals) are used, see Figure 1.4. An application example is a stereo setup used for teleconferencing in order to provide the listener with a comfortable spatial impression. While the extension of the single channel techniques to multiple microphones is trivial, multichannel AEC on the other hand is highly non-trivial.

Figure 1.4: Multi-channel acoustic echo canceller. The fundamental problem of stereophonic AEC tends to occur in this case, and decorrelation of the loudspeaker signals is necessary to achieve good performance.

A specific problem with multichannel echo cancellation is the non-uniqueness [4, 20, 5, 24, 2] of the solution. This is sometimes referred to as the 'fundamental problem' of stereophonic echo cancellation. Since all loudspeaker signals stem from the same sound source in the far end room, their joint correlation matrix may be rank-deficient. As a result, there is not a single solution for a multichannel echo canceller, but a solution space. The echo canceller may find a solution for which the output signal is zero in the absence of near end speech, while the filter has not converged to the real room impulse response (see section 3.5). As a result, the slightest change in the far end room impulse response may destroy the successful echo cancellation. For multichannel echo cancellation, both a change in the transmitting room and a change in the receiving room will have this effect.

Even if this situation does not occur, the problem still becomes ill-conditioned if the far end signals are correlated. This often results in a large sensitivity to noise that may be present in d(k), for example due to continuously present background noise in the near end room.

This also indicates that proper measures should be used for the evaluation of different algorithms. One should not only look at the energy in the residual echo signal, because it can indeed be small or zero while the filter has not yet converged to the real echo path. For simulated environments, the room acoustics path is known, and hence the distance between this path and the echo canceller path can be plotted. While this is only feasible in artificial setups, it is the only 'correct' way to evaluate the convergence behaviour of an echo canceller, especially in the multichannel case.
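The 'distance between the paths' used here is commonly reported as a normalized misalignment in decibels; a small helper (the naming is ours):

```python
import numpy as np

def misalignment_db(w_real, w_hat):
    """Normalized misalignment: 20*log10(||w_real - w_hat|| / ||w_real||)."""
    return 20.0 * np.log10(np.linalg.norm(w_real - w_hat)
                           / np.linalg.norm(w_real))
```

A value near 0 dB means the model is as far from the true path as the all-zero filter; the residual echo energy can meanwhile be small, which is exactly the multichannel pitfall described above.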

1.2.3 ANC, reference–less noise reduction

In realistic situations, the signal picked up by the microphone will often also contain disturbance components for which no reference signal is available. For this case too, multiple approaches to noise cancellation exist.

Single channel techniques. A microphone picks up a signal of interest, together with noise. Single microphone approaches to noise cancellation try to estimate the spectral content of the noise (during periods where the signal of interest is absent), and, assuming that the noise signal is stationary, compensate for this spectrum in the spectrum of the microphone input signal whenever the signal of interest is present. The technique is commonly called 'spectral subtraction' [16, 17]. Single channel approaches are known to perform poorly when the noise source is non-stationary, and when the spectral content of the noise source and of the signal of interest are similar.
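A minimal frame-wise magnitude-subtraction sketch of this idea (rectangular non-overlapping frames, noisy-phase reconstruction; this is our own simplification, and a practical implementation as in [16, 17] would add windowing, overlap-add and a spectral floor):

```python
import numpy as np

def spectral_subtraction(noisy, noise_only, frame=256):
    """Magnitude spectral subtraction, frame by frame (illustrative sketch)."""
    # estimate the noise magnitude spectrum during a signal-absent period
    n_frames = len(noise_only) // frame
    noise_mag = np.abs(np.fft.rfft(
        noise_only[:n_frames * frame].reshape(n_frames, frame), axis=1)).mean(axis=0)

    out = np.zeros(len(noisy) // frame * frame)
    for i in range(len(out) // frame):
        spec = np.fft.rfft(noisy[i * frame:(i + 1) * frame])
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)   # subtract, floor at 0
        out[i * frame:(i + 1) * frame] = np.fft.irfft(
            mag * np.exp(1j * np.angle(spec)), frame)
    return out
```

The zero-flooring in the subtraction step is what produces the well-known 'musical noise' artifacts, and the estimate is only valid as long as the noise is stationary, as noted above.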


Multi-channel techniques. In multi-channel acoustic noise cancellation, a microphone array is used instead of a single microphone to pick up the signal. Apart from the spectral information, the spatial information can also be taken into account. Different techniques that exploit this spatial information exist.

In filter-and-sum beamforming [60], a static beam is formed into the (assumed known) direction of the (speech) source of interest (also called the direction of arrival). While filter-and-sum beamforming is about the cheapest multi-channel noise suppression method, deviations in microphone characteristics or microphone placement have a large influence on its performance. Since signals coming from directions other than the direction of arrival are attenuated, beamforming also provides a form of dereverberation of the signal.

Generalized sidelobe cancellers (Griffiths-Jim beamforming) [60] aim at reducing the response towards the directions of noise sources, under the constraint of a distortionless response towards the direction of arrival. The direction of arrival is required prior knowledge. A voice activity detector is required in order to discriminate between noise-only and speech+noise periods, such that the response towards the noise sources can be adapted during noise-only periods. Griffiths-Jim beamforming is effectively a form of constrained optimal filtering.

A third method is unconstrained optimal filtering [12, 13]. Here an MMSE-optimal estimate of the signal of interest can be obtained, while no prior knowledge about the geometry is required. A voice activity detector is again necessary and crucial for proper operation. The distortionless constraint towards the direction of arrival is not imposed here. A parameter can be used to trade off signal distortion against noise reduction.
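The core of this unconstrained approach can be written down compactly in a batch form: with speech and noise uncorrelated, the speech correlation matrix follows as the difference of the speech+noise and noise-only correlation matrices, and the MMSE filter for the speech component in a reference channel is the solution of a linear system. A per-sample (single-tap) sketch with our own naming; the algorithms in this thesis are recursive (QRD-based) and use several filter taps per channel:

```python
import numpy as np

def mwf_weights(Y_sn, Y_n):
    """Batch unconstrained MMSE (multichannel Wiener) filter.

    Y_sn : (K1, M) rows are input vectors from speech+noise periods
    Y_n  : (K2, M) rows are input vectors from noise-only periods
    Returns w such that w^T y estimates the speech component in channel 0,
    assuming speech and noise are uncorrelated and the noise is stationary.
    """
    Ryy = Y_sn.T @ Y_sn / len(Y_sn)     # speech+noise correlation matrix
    Rnn = Y_n.T @ Y_n / len(Y_n)        # noise-only correlation matrix
    e1 = np.zeros(Y_sn.shape[1])
    e1[0] = 1.0
    return np.linalg.solve(Ryy, (Ryy - Rnn) @ e1)   # w = Ryy^{-1} Rss e1
```

Chapters 5 and 6 replace this batch computation with recursive QRD-based updates at a fraction of the cost, and show how the distortion/noise-reduction trade-off parameter enters.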

The contributions of this thesis in the field of acoustic noise reduction focus on this last method (chapters 5 and 6). Existing algorithms for unconstrained optimal filtering for acoustic noise reduction are highly complex compared to both other (beamforming-based) methods, which implies that they are not suited for real-time implementation. On the other hand, they are quite promising for certain applications, since they have different features than the beamforming-based methods: filter-and-sum beamformers are well suited (and even optimal) for enhancing a localized speech source in a diffuse noise field, and generalized sidelobe cancellers are able to adaptively eliminate directional noise sources, but both rely upon a priori information about the geometry of the sensor array, the sensor characteristics, and the direction of arrival of the signal of interest. This means that the unconstrained optimal filtering technique is more robust against microphone placement and microphone characteristics, and that the direction of arrival is not required to be known a priori. Another advantage is that it can easily be used for combined AEC/ANC, as we show in chapter 7.


1.2.4 Combined AEC and ANC

In many applications, techniques to cancel noise for which a reference signal exists (AEC) are combined with techniques that do not use a reference signal (ANC), since both types of disturbances are often present. The order in which both signal processing blocks are applied to the signals is very important. In Figure 1.5, both options are shown. The upper scheme first applies multichannel noise cancellation (no reference signal), and then echo cancellation. The advantage is that, since most referenceless noise reduction schemes make use of multiple microphones, only one echo canceller is needed. The drawback is that, in addition to the echo path, the echo canceller has to model the variations in the noise cancellation block. The lower scheme in Figure 1.5 requires an echo canceller for each microphone, and these need to be robust against the noise that is still present in their input signals. In spite of the higher complexity of the second scheme, it is most often used because of its better performance compared to the first scheme. Apart from these, many other combination schemes are described in the literature [1, 7, 37, 38, 6].

In this thesis, we will show that considering the combined problem as a global optimization problem leads to a better performance. We will describe how the unconstrained filtering techniques derived in the chapters on noise cancellation can easily be adapted to solve the combined acoustic noise and echo cancellation problem. For echo paths of reasonable length, real-time implementation of these techniques is possible with present-day processors.

1.3 Applications

Tele- and videoconferencing As a first application example we consider teleconferencing. A number of people are meeting in two rooms. In each of these rooms, a microphone array and a loudspeaker are present. The loudspeaker reproduces the sound of the speakers in the other meeting room. The system can be expanded with more loudspeakers, in order to give the conference participants a spatial impression of the reproduced sound.

If no echo cancellation is applied, echoes and howling can occur. Echo paths can be as long as 200 ms, while a sampling rate of about 16 kHz is required in order to obtain a sufficiently high speech quality, resulting in echo path impulse responses of more than 3000 taps. On the other hand, people talking in the background, a computer fan, and air conditioning are all examples of disturbances that should be handled by means of noise cancellation.

Often the echo cancellers in this type of application could profit from algorithms as described in chapters 3 and 4, whose convergence is less dependent on the input signal statistics than is the case for NLMS. Also the algorithms providing the 'combined' ANC and AEC approach in chapter 7 would increase the performance of a speech enhancement system for tele- or videoconferencing. Note though that for larger auditoria the required number of filter taps is huge, and that the complexity of the algorithms should be taken into account.

Figure 1.5: Two methods to combine echo and noise cancellation. (Block diagrams: in both schemes a 'noise reduction' block and an 'acoustic echo canceller' block process the microphone signals, with the desired signal, the noise, and the signal from the far end as inputs, and the signal sent to the far end as output.)

Car applications In car applications such as voice control of a mobile phone or sound system, or hands-free telephony, noise appears to be the most important problem. For engine noise or radio sound, a reference is available or can be derived, while wind and tyre noise, passengers talking to each other, etc. are disturbances without a reference signal.

Acoustic paths in cars are much shorter (up to 256 impulse response taps) compared to typical conference room impulse responses. Also in this case both ANC and AEC are required. Because of the limited length of the echo path, the algorithms in chapter 7 certainly become an option.

Voice control Voice control technology can be found in consumer products, but also finds applications in making technology accessible for disabled people. Speech recognition systems are often trained with clean speech (without noise), because many clean speech databases are available, although databases are also being set up for specific noise situations (e.g. speech recognition in cars).

A specific problem is voice control of a surround sound audio system, where a multichannel echo canceller is required in order to suppress the signal stemming from the five loudspeakers after it is picked up by the microphone. In this case, reference signals are available, and algorithms with a better performance for coloured signals than NLMS are required (chapters 3 and 4).

Hearing aids Acoustic noise cancellation techniques are applied in the field of hearing aids and cochlear implants. It is known that merely amplifying a signal does not increase the speech intelligibility when 'background noises' are present. Noise cancellation techniques can alleviate this problem, and at present two-microphone hearing aids with noise cancellation technology are commercially available.

The space (and hence the computational power) in a behind-the-ear device is limited, so at present mostly cheap (adaptive beamforming) algorithms are used, but these devices too could benefit from the techniques in chapters 5 and 6.

Selective volume control Techniques that are developed for acoustic echo cancelling can also be applied in other fields. An example is a 'selective volume control' device, which is used e.g. in discotheques to turn down the sound volume automatically if it exceeds the legal norms. In order to avoid that loud noises made by the crowd would result in lowering the amplifier's volume, an adaptive filter is used to retain only the sound from the loudspeakers in the signal that is picked up by a measurement microphone, before the sound pressure level is calculated.

A similar system is a volume control application in e.g. a train station, where the volume is automatically turned up if a train passes, or if the crowd is noisy, but which is not sensitive to the sound of the public address system's own loudspeakers.

This kind of application is even more demanding concerning filter lengths than ordinary echo cancelling in rooms. The legal norms on the maximum sound pressure level are given per frequency over the full audible frequency spectrum, which means that a sampling rate of 44 kHz is required. The required filter length is then more than 10000 filter taps.

On the other hand, calculations could be done off-line instead of in real time, and the music signals can be strongly correlated. This again calls for 'intermediate' algorithms between NLMS (whose convergence depends on the input signal statistics) and RLS.

Recording A recording of e.g. an orchestra or a theatre play imposes different constraints. Microphones will not be placed in an array with an a priori known geometry, but will be spread over the whole stage on which the performance takes place. The signal of interest does not originate from one specific direction. In dedicated theatres, the noise will mainly consist of the audience, but scenarios with noise of air conditioning or heating systems (recordings in churches) are also possible.

1.4 The market

A large number of companies currently offer products and services linked with the above-mentioned speech enhancement techniques. While for high end devices for auditorium teleconferencing (price about 5000 Euro) it is difficult to gather information on the type of algorithms used, data sheets of desktop conferencing consumer products often indicate that computationally cheap NLMS-like algorithms or frequency domain derivatives of NLMS are used.

Examples of companies are Spirit Corporation (http://www.spiritcorp.com), providing code libraries for acoustic echo and noise cancellation optimized for different types of DSP processors, and for the Microsoft Windows operating system. Polycom (http://www.polycom.com) provides 'desktop' teleconferencing solutions, and the performance data they publish (a convergence time of 10-40 sec) indicate the use of cheap adaptive filters. Larger systems are built by e.g. Clearone (http://www.clearone.com).


Another application is audio enhancement. Both CoolEdit (from Syntrillium, http://www.syntrillium.com) and SoundForge (from Sonic Foundry, http://www.sonicfoundry.com) contain signal enhancement modules providing single channel spectral subtraction techniques.

Commercial voice command applications often use proprietary techniques based upon beamforming (e.g. with a microphone array on top of a computer monitor (Andrea Electronics, http://www.andreaelectronics.com)). In hearing aids, the commercial state of the art devices use two microphones and Griffiths–Jim beamforming based noise cancellation schemes.

The importance of speech enhancement technology in the current market is also shown by the fact that in the most recent version of Microsoft Windows XP, noise cancellation and echo cancellation features are built into the operating system (http://www.microsoft.com). It is clear that in the consumer telecommunications market, the demand for hands-free mobile telephony, a direct application of the techniques described here, is high, because of safety (and legal) issues concerning the use of a mobile phone while driving. As an example: in 2002, the worldwide sales of mobile phones rose by 6%, with 423.4 million devices sold worldwide (http://www.tijdnet.be/archief).

1.5 Contributions

From section 1.4, one can see that the commercially available applications are all based upon 'low complexity' algorithms, obviously due to real-time and cost constraints. For acoustic echo cancelling, algorithms that perform better than NLMS-based ones are beginning to be used, certainly in 'high end' applications. The performance and the complexity of the APA-based algorithms we have studied in this work can be 'tuned' to the available computational power. We provide an alternative for obtaining noise robustness and derive an efficient frequency-domain based algorithm, which does not contain any approximations (contrary to existing implementations).

One notices that the computational complexity of the newer (unconstrained adaptive filtering) algorithms for noise reduction prohibits their commercial application. Of course, with the rise of computational power over the years, a decade from now these algorithms will also be applied, even in consumer electronics. In this text we will focus our attention on some of these new ('academic') techniques, and we will derive new algorithms that have a (sometimes dramatically) reduced complexity compared to their predecessors, while keeping their performance at the same level. This should allow these better performing techniques to be considered for use in commercial applications in a much shorter time frame.

The contributions to the field of speech enhancement which are treated in this text,can be subdivided into three major categories.


• The first category consists of signal enhancement techniques for acoustic noise reduction when a reference signal is available (AEC). The results consist of alternative regularization techniques for improving the noise robustness of acoustic echo cancellers based upon the affine projection algorithm (see further on in this text), and the Block Exact Affine Projection Algorithm (BEAPA), which is a fast frequency domain version of the affine projection algorithm with roughly the same complexity as BEFAP (see further on in this text), but without the need for the assumptions that have to be made for BEFAP. These results are published in the conference papers [50, 47, 48, 49] and in the journal paper [55]. They will be treated in chapters 3 and 4.

• The second category focuses on MMSE-based optimal filtering for acoustic noise reduction in case no reference signal is available (ANC). We propose a QRD–RLS and a QRD–LSL based approach to unconstrained optimal filtering that achieves the same performance as existing (GSVD-based) techniques, but with a complexity reduction of one and two orders of magnitude, respectively. These results have been published in the papers [54, 52] and [56, 51]. We will treat them in chapters 5 and 6.

• Finally, the combination of noise and echo cancelling is treated in chapter 7; this result is published in our paper [53].

1.6 Outline

Figure 1.6: Outline of the text. (The figure groups the chapters as follows: 1. Speech signal enhancement and 2. Adaptive filtering algorithms form the introduction; under acoustic echo cancellation, 3. APA regularization and Sparse APA and 4. BEAPA for AEC; under acoustic noise cancellation, 5. QRD–RLS based ANC and 6. Fast QRD–LSL based ANC; these merge into 7. Integrated noise and echo cancellation, followed by 8. Conclusions.)

The outline of the text is depicted in Figure 1.6. Chapter 2 contains additional introductory material. Relevant adaptive filtering algorithms are reviewed, and the concept of signal flow graphs is explained briefly.


Chapters 3 and 4 of the thesis focus on acoustic echo cancellation. More specifically, in chapter 3 the importance of noise robustness in acoustic echo cancellers is reviewed, and some techniques are derived to implement this in fast affine projection algorithms. We also show that traditional fast implementations exhibit problems when strong regularization is applied. In chapter 4 a frequency domain block exact affine projection algorithm is derived which does not contain the approximations that are present in traditional fast affine projection schemes, while its complexity is comparable to these schemes.

Chapters 5 and 6 focus on acoustic noise cancellation techniques. In chapter 5 an unconstrained optimal filtering based noise cancellation algorithm is derived. This algorithm is based upon the QR-decomposition (see section 2.3 for a definition). It obtains the same performance as existing algorithms for unconstrained optimal filtering, while its complexity is an order of magnitude lower. Chapter 6 builds upon the previous one to derive an even cheaper fast QRD-based algorithm, while again performance is maintained at the same level.

In chapter 7 we discuss the combination of AEC and ANC, and show the performance advantage of using an integrated approach to acoustic noise and echo cancellation compared to traditional combination schemes.

Chapter 8, finally, contains the overall conclusions of this work, as well as suggestions for further research.


Chapter 2

Adaptive filtering algorithms

Adaptive filters will play an important role in this text. Therefore, we devote a chapter to giving an overview of commonly used adaptive filtering techniques. In section 2.1 the general adaptive filtering setup and problem will be reviewed. The normalized least mean squares (NLMS) and the recursive least squares (RLS) algorithms will be reviewed in sections 2.2 and 2.3. An intermediate class of algorithms, both complexity- and performance-wise, can be derived from the affine projection algorithm (APA). APA will be introduced in section 2.4. In each section complete algorithm descriptions will be given for reference.

Later on in this text, APA will be the main topic of chapters 3 and 4, where it will be used for acoustic echo cancellation. Chapters 5, 6 and 7 will mainly be based upon algorithms derived from RLS and fast versions thereof.

2.1 Introduction

In this introduction we give a short overview of the data representations that will be used in the remainder of the chapter and the thesis. We will use adaptive filtering configurations with both single and multiple input and output channels.

A single input, single output adaptive filtering setup is shown in Figure 2.1. An input signal x(k) is filtered by a filter w(k). The output of this filtering operation is subtracted from a 'desired signal' d(k), and the resulting 'error signal' e(k) is used to update the filter coefficients. The signals are assumed to be zero mean, and d(k) is a linearly filtered version of x(k) with zero mean noise added that is assumed to be independent of x(k).

29

Page 30: Adaptive filtering algorithms for acoustic echo and noise ...theses.eurasip.org/...geert...acoustic-echo-and-noise-cancellation.pdf · Adaptive filtering algorithms for acoustic

30 CHAPTER 2. ADAPTIVE FILTERING ALGORITHMS

Figure 2.1: Adaptive filter. The filter coefficients w are adapted such that e is minimized.

All of the algorithms are based upon an overdetermined system of linear equations

$$X(k)\,w(k) = \begin{pmatrix} d(k) & d(k-1) & \ldots \end{pmatrix}^T, \qquad (2.1)$$

where

$$X(k) = \begin{pmatrix} x^T(k) \\ x^T(k-1) \\ x^T(k-2) \\ \vdots \end{pmatrix}, \qquad x(k) = \begin{pmatrix} x(k) & x(k-1) & \ldots & x(k-N+1) \end{pmatrix}^T,$$

which will be solved in the least squares sense, i.e. based on an LS criterion

$$\min_{w_{LS}(k)} \left\| \begin{pmatrix} d(k) & d(k-1) & \ldots \end{pmatrix}^T - X(k)\,w_{LS}(k) \right\|^2. \qquad (2.2)$$

The LS solution is given as

$$w_{LS}(k) = \left( X^T(k)\,X(k) \right)^{-1} X^T(k) \begin{pmatrix} d(k) & d(k-1) & \ldots \end{pmatrix}^T.$$

We will also use the MMSE criterion

$$\min_{w_{MMSE}(k)} \mathcal{E}\left\{ \left( d(k) - x^T(k)\,w_{MMSE}(k) \right)^2 \right\}, \qquad (2.3)$$

where $\mathcal{E}\{\cdot\}$ is the expectation operator. The MMSE solution is given as

$$w_{MMSE}(k) = \left( \mathcal{E}\{x(k)\,x^T(k)\} \right)^{-1} \mathcal{E}\{x(k)\,d(k)\}.$$
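As an illustration of the LS solution formula above, the normal equations $(X^T X)\,w = X^T d$ can be solved explicitly for a small two-tap example. The following pure-Python sketch uses made-up illustration data; since d is generated without noise, the LS estimate recovers the true filter exactly:

```python
# Least squares solution w_LS = (X^T X)^{-1} X^T d for a 2-tap toy problem.
# The "true" filter and input signal below are made-up illustration data.

def ls_solve_2tap(x, d):
    """Solve min_w ||d - X w||^2 where row k of X is [x[k], x[k-1]]."""
    # Build the data matrix X (skip k = 0, which has no past sample).
    rows = [[x[k], x[k - 1]] for k in range(1, len(x))]
    rhs = d[1:]
    # Normal equations (X^T X) w = X^T d, solved for the 2x2 case.
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    c = sum(r[1] * r[1] for r in rows)
    p = sum(r[0] * t for r, t in zip(rows, rhs))
    q = sum(r[1] * t for r, t in zip(rows, rhs))
    det = a * c - b * b
    return [(c * p - b * q) / det, (a * q - b * p) / det]

# d is x filtered by the true weights [0.5, -0.25] with no added noise,
# so the LS estimate should recover them (up to rounding).
x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 1.5]
w_true = [0.5, -0.25]
d = [0.0] + [w_true[0] * x[k] + w_true[1] * x[k - 1] for k in range(1, len(x))]
w_ls = ls_solve_2tap(x, d)
print(w_ls)  # approximately [0.5, -0.25]
```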

In each time step k, a new equation is added to (2.1), so at each time instant a new value for w(k) can be calculated. Since adaptivity is required in a changing environment, the algorithms will be designed to 'forget' old information. This can be achieved by exponentially weighting the rows of X(k), as is usually done in the RLS algorithm, i.e.

$$X(k) = \begin{pmatrix} x^T(k) \\ \lambda\,x^T(k-1) \\ \lambda^2\,x^T(k-2) \\ \vdots \end{pmatrix},$$

or by only using the P most recent input vectors in X(k):

$$X(k) = \begin{pmatrix} x^T(k) \\ x^T(k-1) \\ \vdots \\ x^T(k-P+1) \end{pmatrix}.$$

Figure 2.2: A multi-channel adaptive filter. The input vector x(k) consists of the concatenation of the channel input vectors $x_i(k)$, and similarly the filter vector $w(k) = \begin{pmatrix} w_1^T(k) & w_2^T(k) & w_3^T(k) \end{pmatrix}^T$.

In this text, we will also consider multichannel (multiple input) adaptive filters (see Figure 2.2), where the input vectors x(k) are defined as

$$x(k) = \begin{pmatrix} x_1(k) \\ \vdots \\ x_1(k-N+1) \\ x_2(k) \\ x_2(k-1) \\ \vdots \\ x_M(k-N+1) \end{pmatrix}. \qquad (2.4)$$

Similarly, w(k) is then defined as a stacked version of the filter vectors $w_i(k)$ for i = 1…M:

$$w(k) = \begin{pmatrix} w_1(k) \\ w_2(k) \\ \vdots \\ w_M(k) \end{pmatrix}.$$

Here M is the number of input channels of the adaptive filter, and N is the number of filter taps per input channel. Sometimes an alternative definition of the input vector will be used, in which the input signals are interlaced:

$$x(k) = \begin{pmatrix} x_1(k) \\ \vdots \\ x_M(k) \\ x_1(k-1) \\ x_2(k-1) \\ \vdots \\ x_M(k-N+1) \end{pmatrix}. \qquad (2.5)$$

As a result, the corresponding filter taps will be interlaced as well.
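The two orderings (2.4) and (2.5) only differ in how the per-channel delay lines are arranged into one vector, as the following pure-Python sketch illustrates (the helper names and the M = 2, N = 3 sample data are made up for illustration):

```python
# Build the multichannel input vector of (2.4) (per-channel stacking) and
# (2.5) (interlacing) from M channel delay lines of N samples each.
# histories[m][n] holds x_{m+1}(k - n), i.e. index 0 is the newest sample.

def stacked_input(histories):
    """x(k) = [x_1(k)..x_1(k-N+1), x_2(k)..x_2(k-N+1), ...] as in (2.4)."""
    return [s for h in histories for s in h]

def interlaced_input(histories):
    """x(k) = [x_1(k)..x_M(k), x_1(k-1)..x_M(k-1), ...] as in (2.5)."""
    n_taps = len(histories[0])
    return [h[n] for n in range(n_taps) for h in histories]

# Two channels (M = 2), three taps (N = 3); illustration data.
h1 = [1.0, 2.0, 3.0]   # x_1(k), x_1(k-1), x_1(k-2)
h2 = [4.0, 5.0, 6.0]   # x_2(k), x_2(k-1), x_2(k-2)
print(stacked_input([h1, h2]))     # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(interlaced_input([h1, h2]))  # [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
```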

Considering setups with multiple microphones, we will be solving least squares minimization problems that share the same left-hand side matrix X(k), but have different right-hand side vectors. They can be solved concurrently as one multiple right-hand side least squares problem. In this case the columns of a matrix W(k) will be solutions to LS problems with the columns of a matrix D(k) as their respective right-hand sides. A system of equations analogous to (2.1) can be written down:

$$X(k)\,W(k) = D(k), \qquad (2.6)$$

with

$$X(k) = \begin{pmatrix} x^T(k) \\ x^T(k-1) \\ \vdots \end{pmatrix}, \qquad x(k) = \begin{pmatrix} x(k) & x(k-1) & \ldots & x(k-N+1) \end{pmatrix}^T,$$

$$D(k) = \begin{pmatrix} d^T(k) \\ d^T(k-1) \\ \vdots \end{pmatrix}, \qquad d(k) = \begin{pmatrix} d_1(k) & d_2(k) & \ldots \end{pmatrix}^T.$$

Note the structure of d(k), whose components represent the different desired signal samples at time k. The least squares solution can be found from

$$\min_{W(k)} \left\| D(k) - X(k)\,W(k) \right\|. \qquad (2.7)$$

The corresponding MMSE criterion is

$$\min_{W(k)} \left\| \mathcal{E}\left\{ d(k) - x^T(k)\,W(k) \right\} \right\|.$$

In the next sections we give an overview of the different adaptive filtering techniques that will be used in this thesis.

2.2 Normalized Least Mean Squares algorithm

One approach to solving (2.1) is the Least Mean Squares (LMS) algorithm. This algorithm is in fact a stochastic gradient descent method applied to the underlying MMSE criterion (2.3). The update equations for the filter coefficient vector $w_{lms}(k)$ are

$$e(k+1) = d(k+1) - x^T(k+1)\,w_{lms}(k),$$
$$w_{lms}(k+1) = w_{lms}(k) + \mu\,x(k+1)\,e(k+1), \qquad (2.8)$$
$$y(k+1) = x^T(k+1)\,w_{lms}(k+1). \qquad (2.9)$$

Here µ is a step size parameter. A full description is shown in Algorithm 1.

Algorithm 1 LMS algorithm
    w_lms = 0; y = 0
    Loop (new input vector x and desired signal d in each step):
        e = d - y
        w_lms = w_lms + µ x e
        y = x^T w_lms

In order to make the convergence behaviour independent of the input energy, often the Normalized Least Mean Squares (NLMS) algorithm is used, where the filter vector update is divided by the input energy. The algorithm is given by

$$e(k+1) = d(k+1) - x^T(k+1)\,w_{nlms}(k),$$
$$w_{nlms}(k+1) = w_{nlms}(k) + \mu\,\frac{x(k+1)\,e(k+1)}{x^T(k+1)\,x(k+1) + \delta}. \qquad (2.10)$$

Here δ is a 'regularization term'. In NLMS it guarantees that the denominator cannot become zero, but it also provides noise robustness (see section 2.4). Similar equations are obtained for the definitions (2.6). It can be shown that, for µ = 1 and δ = 0, the a posteriori error for NLMS,

$$e_{post}(k+1) = d(k+1) - x^T(k+1)\,w_{nlms}(k+1),$$

is zero, which means that for the NLMS algorithm the systems of equations (2.1) or (2.6) are effectively reduced to one single equation, namely the most recent one, and that this equation is solved exactly based on a minimum-norm weight vector adaptation. NLMS is a computationally cheap algorithm with a complexity of 4N flops per sample¹, but it suffers from slow convergence when non-white input signals are applied. In practice, frequency domain variants of this algorithm are often used in order to obtain an even lower complexity. A description of the time domain NLMS algorithm is given in Algorithm 2.

Algorithm 2 NLMS algorithm
    w_nlms = 0; y = 0
    Loop (new input vector x and desired signal d in each step):
        e = d - y
        w_nlms = w_nlms + µ x e / (x^T x + δ)
        y = x^T w_nlms
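A minimal pure-Python sketch of Algorithm 2 is given below, identifying a made-up 3-tap 'echo path' from noiseless data. It also checks numerically the property stated above: for µ = 1 and δ = 0 the a posteriori error is zero at every step. All signal data are invented for illustration, and the deterministic input is chosen so that no delay-line window is all-zero (otherwise δ = 0 would divide by zero):

```python
# Pure-Python sketch of NLMS (Algorithm 2 / eq. (2.10)); the input signal
# and the 3-tap "echo path" below are invented illustration data.

def nlms_step(w, x, d, mu=1.0, delta=0.0):
    """One NLMS update; returns the new weights and the a priori error."""
    e = d - sum(wi * xi for wi, xi in zip(w, x))
    norm = sum(xi * xi for xi in x) + delta
    w_new = [wi + mu * xi * e / norm for wi, xi in zip(w, x)]
    return w_new, e

path = [0.6, -0.3, 0.1]                          # unknown FIR system
sig = [((7 * k) % 5) - 2.0 for k in range(300)]  # periodic, never 3 zeros in a row
w = [0.0] * len(path)
for k in range(len(path) - 1, len(sig)):
    x = [sig[k - n] for n in range(len(path))]   # delay line at time k
    d = sum(p * xi for p, xi in zip(path, x))    # noiseless desired signal
    w, e = nlms_step(w, x, d, mu=1.0, delta=0.0)
    # For mu = 1 and delta = 0 the a posteriori error is exactly zero:
    e_post = d - sum(wi * xi for wi, xi in zip(w, x))
    assert abs(e_post) < 1e-9
print(w)  # approaches the echo path [0.6, -0.3, 0.1]
```

With noiseless data and µ = 1 each step is a projection onto the most recent equation, so the iteration converges to the true path; with noise added to d, a smaller µ or a nonzero δ would be needed for robustness.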

We also note here that if the LMS algorithm is to be calculated for multiple desired (right-hand side) signals, the whole algorithm simply has to be repeated for each desired signal. In the NLMS algorithm the (small) cost of calculating the input energy can be shared:

$$e^T(k+1) = d^T(k+1) - x^T(k+1)\,W_{nlms}(k),$$
$$W_{nlms}(k+1) = W_{nlms}(k) + \mu\,\frac{x(k+1)\,e^T(k+1)}{x^T(k+1)\,x(k+1) + \delta}. \qquad (2.11)$$

¹ For complexity calculations in this text we will count an addition and a multiplication as 2 separate floating point operations.

2.3 Recursive Least Squares algorithms

In this section, we will first review the standard recursive least squares algorithm, then the numerically more stable (and thus preferable) QRD-based RLS algorithm, and finally the fast QRD least squares lattice algorithm.

2.3.1 Standard recursive least squares

Instead of applying a stochastic gradient descent method (NLMS), the recursive least squares (RLS) algorithm solves system (2.6) or (2.1) in a least squares (LS) sense, i.e. based on the LS criterion (2.2), and does so by applying recursive updates to the solution calculated in the previous time step (cf. Newton iterations on a quadratic error surface, where the Hessian reduces to a correlation matrix). For exponentially weighted RLS, the update equations are

$$e_{rls}(k+1) = d(k+1) - x^T(k+1)\,w_{rls}(k),$$
$$\Xi^{-1}(k+1) = \frac{1}{\lambda^2}\,\Xi^{-1}(k) - \frac{\frac{1}{\lambda^2}\,\Xi^{-1}(k)\,x(k+1)\,x^T(k+1)\,\frac{1}{\lambda^2}\,\Xi^{-1}(k)}{1 + \frac{1}{\lambda^2}\,x^T(k+1)\,\Xi^{-1}(k)\,x(k+1)},$$
$$w_{rls}(k+1) = w_{rls}(k) + \Xi^{-1}(k+1)\,x(k+1)\,e_{rls}(k+1), \qquad (2.12)$$

where $\Xi^{-1}(k)$ is the inverse correlation matrix ($\Xi(k) = X^T(k)\,X(k)$). The first equation calculates the error at time instant k+1, while the second equation is the filter coefficient update. Instead of doing an update in the direction of the input vector x(k) as in LMS, in (2.12) the input signal can be seen to be whitened, because it is multiplied by the inverse correlation matrix. An algorithm description is provided in Algorithm 3.

Again a regularization (or better: 'diagonal loading') term can be added to the inverse correlation matrix:

$$w_{rls}(k+1) = w_{rls}(k) + \left( X^T(k+1)\,X(k+1) + \delta I \right)^{-1} x(k+1)\,e_{rls}(k+1).$$

Here I is the identity matrix. It is well known that this provides robustness against noise terms that may be present in d(k) [27].


Algorithm 3 RLS algorithm
    w_rls = 0
    Ξ_inv = 10⁶ · I   // init with large number
    Loop (input: d and x):
        e_rls = d - x^T w_rls
        Ξ_inv = (1/λ²) Ξ_inv - [ (1/λ²) Ξ_inv x x^T (1/λ²) Ξ_inv ] / [ 1 + (1/λ²) x^T Ξ_inv x ]
        w_rls = w_rls + Ξ_inv x e_rls
        y = x^T w_rls
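The update (2.12)/Algorithm 3 can be written out explicitly for a 2-tap filter in pure Python. This is a sketch with made-up illustration data; the rank-one update of the 2×2 inverse correlation matrix is coded directly:

```python
# Exponentially weighted RLS sketch following (2.12)/Algorithm 3 for a
# 2-tap filter. The input signal and the "true" filter are invented data.

def rls_step(w, P, x, d, lam=1.0):
    """One RLS update; P is the inverse correlation matrix Xi^{-1}."""
    e = d - (x[0] * w[0] + x[1] * w[1])
    l2 = lam * lam
    # g = (1/l2) * P x
    g = [(P[0][0] * x[0] + P[0][1] * x[1]) / l2,
         (P[1][0] * x[0] + P[1][1] * x[1]) / l2]
    denom = 1.0 + x[0] * g[0] + x[1] * g[1]
    # P <- (1/l2) P - g g^T / denom   (matrix inversion lemma, as in (2.12))
    P = [[P[i][j] / l2 - g[i] * g[j] / denom for j in range(2)]
         for i in range(2)]
    # w <- w + P x e
    Px = [P[0][0] * x[0] + P[0][1] * x[1],
          P[1][0] * x[0] + P[1][1] * x[1]]
    w = [w[0] + Px[0] * e, w[1] + Px[1] * e]
    return w, P, e

w_true = [0.8, -0.4]
w, P = [0.0, 0.0], [[1e6, 0.0], [0.0, 1e6]]   # large initial P, as in Algorithm 3
sig = [((3 * k) % 7) - 3.0 for k in range(100)]
for k in range(1, len(sig)):
    x = [sig[k], sig[k - 1]]
    d = w_true[0] * x[0] + w_true[1] * x[1]    # noiseless desired signal
    w, P, e = rls_step(w, P, x, d, lam=0.999)
print(w)  # converges to w_true = [0.8, -0.4]
```

Because the data are noiseless and the regressors span the filter space, the weighted LS minimizer coincides with the true filter, so the estimate converges rapidly regardless of the forgetting factor.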

It is easily seen that in case multiple right-hand side signals are present, the update of the inverse correlation matrix can be shared among the different right-hand sides:

$$e_{rls}^T(k+1) = d^T(k+1) - x^T(k+1)\,W_{rls}(k),$$
$$\Xi^{-1}(k+1) = \frac{1}{\lambda^2}\,\Xi^{-1}(k) - \frac{\frac{1}{\lambda^2}\,\Xi^{-1}(k)\,x(k+1)\,x^T(k+1)\,\frac{1}{\lambda^2}\,\Xi^{-1}(k)}{1 + \frac{1}{\lambda^2}\,x^T(k+1)\,\Xi^{-1}(k)\,x(k+1)},$$
$$W_{rls}(k+1) = W_{rls}(k) + \Xi^{-1}(k+1)\,x(k+1)\,e_{rls}^T(k+1). \qquad (2.13)$$

This effectively means that in that case, apart from the cost of calculating the inverse correlation matrix (once), for each channel only an LMS-like updating procedure needs to be calculated. This is easily shown by comparing (2.12) and (2.8). We will now describe an RLS algorithm based on QRD updates, which is known to have good numerical properties.

2.3.2 QRD–updating

Every matrix $X \in \Re^{L \times MN}$ with linearly independent columns, $L \geq MN$ (in our application, M will be the number of microphones or 'input channels' and N the number of filter taps per microphone), can be decomposed into a matrix $Q \in \Re^{L \times MN}$ with orthonormal columns and an upper triangular matrix $R \in \Re^{MN \times MN}$, where R is of full rank and has no zero entries on the diagonal:

$$X = QR. \qquad (2.14)$$

This decomposition is called the 'QR-decomposition' (QRD), and R is called the Cholesky factor or square root of the matrix product $X^T X$, since $X^T X = R^T R$. In our applications X(k) is often defined in a time recursive fashion:

$$X(0) = \begin{pmatrix} x^T(0) \end{pmatrix}, \qquad X(k+1) = \begin{pmatrix} x^T(k+1) \\ \lambda\,X(k) \end{pmatrix}. \qquad (2.15)$$


Here 0 < λ ≤ 1 is a forgetting factor and k is the time index. We will now briefly review the QR-updating procedure [29] for computing the QRD of X(k+1) from the QRD of X(k). If we replace X(k) by its QR-decomposition, we obtain

$$X(k+1) = \begin{pmatrix} 1 & 0 \\ 0 & Q(k) \end{pmatrix} \begin{pmatrix} x^T(k+1) \\ \lambda\,R(k) \end{pmatrix}. \qquad (2.16)$$

We can now find an orthogonal transformation matrix $\bar{Q}(k+1)$ such that

$$X(k+1) = \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & Q(k) \end{pmatrix} \bar{Q}(k+1)}_{[\;*\;|\;Q(k+1)\;]} \begin{pmatrix} 0 \\ R(k+1) \end{pmatrix} = Q(k+1)\,R(k+1).$$

The '*' entries are 'don't cares'. Here $\bar{Q}(k+1)$ is constructed as a series of Givens rotations,

$$\bar{Q}(k+1) = G_{1,2}(\theta_1(k+1))\,G_{1,3}(\theta_2(k+1)) \cdots G_{1,MN+1}(\theta_{MN}(k+1)),$$

with

$$G_{i,j}(\theta) = \begin{pmatrix} I_{i-1} & 0 & 0 & 0 & 0 \\ 0 & \cos\theta & 0 & -\sin\theta & 0 \\ 0 & 0 & I_{j-i-1} & 0 & 0 \\ 0 & \sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 0 & I_{MN+1-j} \end{pmatrix}.$$

Each of these rotations will zero out one of the elements of the top row of the compound matrix

$$\begin{pmatrix} x^T(k+1) \\ \lambda\,R(k) \end{pmatrix} \qquad (2.17)$$

in order to obtain the updated R(k+1) in the right hand side of (2.16). Q(k) will not be useful in our applications, and hence will not be stored. The procedure for choosing the i, j and θ for the Givens rotations is best explained with the signal flow graph (SFG) for QRD-updating, which is shown in Figure 2.3 for M = 2 and N = 4. In this SFG the upper triangular matrix R(k) can be recognized, as well as the input vector (from the delay line) that is placed on top of it. Compare this to the matrix (2.17). The rotations (hexagons in the signal flow graph) are defined by

$$\begin{pmatrix} a' \\ b' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}.$$


When a new input vector x(k+1) enters the scheme, the top left hexagon will calculate $\theta_1$ such that its output b' = 0,

$$\tan\theta_1 = \frac{x_1(k+1)}{\lambda\,R_{11}(k)}, \qquad (2.18)$$

and it will update $R_{11}$ accordingly. Note that the denominator in this expression is never zero by the definition of the QR-decomposition (the matrix R(k) should be properly initialized before the first iteration). The other hexagons in the first row use this $\theta_1$ to process the remaining elements of the input vector and the top row of R(k). This corresponds to applying $G_{1,2}(\theta_1(k))$. Then the first hexagon in the second row will calculate $\theta_2$ so that applying $G_{1,3}(\theta_2(k))$ nulls the second element of the modified input vector and updates the second row of R(k), and so on [39]. For a more detailed description of this signal flow graph, we refer to [46]. Algorithm 4 shows the QRD-updating process; see also [29]. Note that the updating scheme requires $O((MN)^2)$ flops per update.

Algorithm 4 QRD–updating

UpdateQRD (R, x, Weight)
{
  // x is the input vector
  // the upper triangular matrix R is updated in place
  for (i = 0; i < M * N; i++)
  {
    R[i][i] *= Weight;
    temp = sqrt (R[i][i] * R[i][i] + x[i] * x[i]);
    sinTheta = x[i] / temp;
    cosTheta = R[i][i] / temp;
    R[i][i] = temp;
    for (j = i+1; j < M * N; j++)
    {
      temp = R[i][j] * Weight;
      R[i][j] = cosTheta * temp + sinTheta * x[j];
      x[j] = -sinTheta * temp + cosTheta * x[j];
    }
  }
}
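As a sanity check on Algorithm 4, the following small Python sketch (hypothetical helper names, a made-up 3×3 example) applies one weighted Givens update and verifies the defining property of the QRD update: since the rotations are orthogonal, the new factor satisfies R(k+1)^T R(k+1) = λ² R(k)^T R(k) + x x^T.

```python
import math

def qrd_update(R, x, lam):
    # One QRD (Givens) update: fold the new row x into the upper
    # triangular factor R after exponential weighting by lam.
    n = len(x)
    R = [row[:] for row in R]
    x = x[:]
    for i in range(n):
        R[i][i] *= lam
        t = math.hypot(R[i][i], x[i])
        s, c = x[i] / t, R[i][i] / t          # sin/cos of theta_i
        R[i][i] = t
        for j in range(i + 1, n):
            tmp = R[i][j] * lam
            R[i][j] = c * tmp + s * x[j]
            x[j] = -s * tmp + c * x[j]
    return R

def gram(R):
    # R^T R for a square matrix stored as a list of rows
    n = len(R)
    return [[sum(R[k][i] * R[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

R = [[2.0, 0.5, 0.1], [0.0, 1.5, 0.3], [0.0, 0.0, 1.0]]
x = [0.7, -0.4, 0.9]
lam = 0.99
Rn = qrd_update(R, x, lam)
G_old, G_new = gram(R), gram(Rn)
# rotations are orthogonal, so R(k+1)^T R(k+1) = lam^2 R^T R + x x^T
assert all(abs(G_new[i][j] - (lam**2 * G_old[i][j] + x[i] * x[j])) < 1e-10
           for i in range(3) for j in range(3))
```

The same check carries over to Algorithm 5 below, where the right hand side column is rotated with the identical angles.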

2.3.3 QRD–based RLS algorithm (QRD–RLS)

The QR–decomposition can be used to perform a least squares estimation of the form

\min_{W(k)} \| X(k) W(k) - D(k) \|^2. \qquad (2.19)

Here W(k) is a matrix, each column of which corresponds to a least squares estimation problem with X(k) and the corresponding column of D(k) (referred to as the


2.3. RECURSIVE LEAST SQUARES ALGORITHMS 39

Figure 2.3: Givens–rotations based QRD–updating scheme to update an R(k)–matrix. On top the new input vector is fed in, and for each row of the R(k)–matrix a Givens rotation is executed in order to obtain an upper triangular matrix R(k+1). Each hexagon performs a rotation over θ = arctan(b/a).


“desired response signal”). If (2.1) is solved instead of (2.6), both D(k) and W(k) reduce to a vector.

D(k) will also be defined in a time–recursive fashion using weighting:

D(k+1) = \begin{pmatrix} d^T(k+1) \\ \lambda D(k) \end{pmatrix}. \qquad (2.20)

Using equation (2.14) it is found that the least squares solution to (2.19) is given by

W(k) = (X^T(k) X(k))^{-1} X^T(k) D(k) = R(k)^{-1} \underbrace{Q^T(k) D(k)}_{Z(k)}. \qquad (2.21)

Hence W(k) is computed by performing a triangular backsubstitution with left hand side matrix R(k) and right hand side matrix Z(k). From R(k) = Q^T(k) X(k) and Z(k) \triangleq Q^T(k) D(k) it follows that Z(k) can be obtained by expanding the QRD–updating procedure with the desired signals part, i.e. applying the QRD–updating procedure to (X(k) \; D(k)) instead of X(k) only, as shown in Figure 2.4 with d(k) = (d_1(k) \; d_2(k))^T.

At any point in time the least squares solution W(k) may then be computed based on the stored R(k) and Z(k) according to formula (2.21).

The update equation becomes:

\begin{pmatrix} 1 & 0 \\ 0 & Q(k) \end{pmatrix} \begin{pmatrix} x^T(k+1) & d^T(k+1) \\ \lambda R(k) & \lambda Z(k) \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & Q(k) \end{pmatrix} Q(k+1)}_{[\,\ast\,|\,Q(k+1)]} \begin{pmatrix} 0 & r^T(k+1) \\ R(k+1) & Z(k+1) \end{pmatrix}. \qquad (2.22)

RLS has a rather large computational complexity, but (unlike NLMS) it shows a very good performance that is independent of the input signal statistics. Furthermore, it has been shown in [39] that

d^T(k+1) - x^T(k+1) W(k) = \frac{r^T(k+1)}{\prod_{i=1}^{MN} \cos\theta_i(k+1)}, \qquad (2.23)

where r^T(k+1) = (\varepsilon_1 \; \varepsilon_2)


is a byproduct of the extended QRD–updating process, as indicated in Figure 2.4. This means that we can extract (a priori) least squares residuals without having to calculate the filter coefficients W(k) first. This is referred to as 'residual extraction'. Note that the denominator in (2.23) cannot become zero, since the denominator of (2.18) is never zero. For the a posteriori residuals, we can write

d^T(k+1) - x^T(k+1) W(k+1) = \prod_{i=1}^{MN} \cos\theta_i(k+1) \; r^T(k+1). \qquad (2.24)

The signal flow graph for the whole procedure as given in Figure 2.4 corresponds to Figure 2.3 with the right hand side columns with inputs d_1(k) and d_2(k) added to the right, as well as a “\prod\cos\theta accumulation chain” added to the left. The complexity of this scheme is still O(M^2 N^2) per time update. Algorithm 5 gives details about the QRD–RLS procedure.

Algorithm 5 Update of the QRD–RLS algorithm

QRDRLS_update (R, x, r, Weight)
{
  // x is the input vector
  // r is the desired signal input (a scalar in this case)
  // the upper triangular matrix R is updated, along with
  // the vector z which is the right hand side
  // the residual signal is returned
  PiCos = 1;
  for (i = 0; i < M * N; i++)
  {
    R[i][i] *= Weight;
    temp = sqrt (R[i][i] * R[i][i] + x[i] * x[i]);
    sinTheta = x[i] / temp;
    cosTheta = R[i][i] / temp;
    R[i][i] = temp;
    for (j = i+1; j < M * N; j++)
    {
      temp = R[i][j] * Weight;
      R[i][j] = cosTheta * temp + sinTheta * x[j];
      x[j] = -sinTheta * temp + cosTheta * x[j];
    }
    temp = z[i] * Weight;
    z[i] = cosTheta * temp + sinTheta * r;
    r = -sinTheta * temp + cosTheta * r;
    PiCos *= cosTheta;
  }
  return r * PiCos;
}


Figure 2.4: QRD–RLS algorithm. The right hand side (desired signal) is updated with the same rotations as the left hand side.

2.3.4 QRD–based least squares lattice (QRD–LSL)

It is well known that the shift–structure property of the input vectors of Figure 2.4 can be exploited to reduce the overall complexity. It can be shown [46] that a QRD–RLS scheme as shown in Figure 2.4 is equivalent to the scheme of Figure 2.5, which requires only O(M^2 N) flops per update instead of O(M^2 N^2) for the original scheme. Since N (the number of filter taps) is typically larger than M (the number of microphones), this amounts to a considerable complexity reduction. The complexity reduction stems from replacing the off–diagonal part of the triangular structure (the


diagonal part is seen to be still in place) by the computations in the added left hand part. The resulting algorithm is called QRD–LSL (QRD–based least squares lattice), and it is known to be a numerically stable implementation of the RLS algorithm, since it only uses stable orthogonal updates in combination with exponential weighting. Note that QRD–LSL needs to read the input one sample ahead as compared to QRD–RLS. For further details on the QRD–LSL derivation, we refer to [46]. In Algorithm 6 the QRD–LSL adaptive filter is given in pseudocode.

Figure 2.5: QRD–LSL. Notice that the inputs for the right hand side part are the desired signals at time k+1, while the inputs for the left hand side are the input signals at time k+2.

2.3.5 RLS versus LMS

RLS is much more complex than LMS, but its performance for colored signals like speech is often better. In formula (2.12) the updating equation for LMS can be recognized, with 'prewhitening' added in the form of the multiplication with the inverse correlation matrix.


Algorithm 6 QRD–LSL update

Update (R, x, Weighting, RightHandSideMatrix, r, PiCos)
{
  for (i = 0; i < M*N; i++)
  {
    for (j = i; j < M*N; j++)
    {
      R[i][j] *= Weighting;
    }
    GivensCalcAngle(SinTheta, CosTheta, R[i][i], x[i]);
    for (j = i+1; j < M*N; j++)
    {
      GivensRotate(SinTheta, CosTheta, R[i][j], x[j]);
    }
    for (j = 0; j < RightHandSideMatrix.GetNrColumns(); j++)
    {
      RightHandSideMatrix[i][j] *= Weighting;
      GivensRotate(SinTheta, CosTheta, RightHandSideMatrix[i][j], r[j]);
    }
    PiCos *= CosTheta;
  }
  return r;
}

ProcessNewInput (x, Weight, Desired)
{
  PiCos = 1;
  xl = x;
  xr = x;
  delay[0] = x;
  for (int i = 0; i < N; i++)
  {
    dxl = delay[i];
    dxr = dxl;
    Update(RightR[i], dxr, Weight, [RotationsRight[i] z[i]], [xr Desired], PiCos);
    if (i < N-1)
    {
      Update(LeftR[i], xl, Weight, RotationsLeft[i], dxl, 0);
      delay[i] = dxl;
    }
    xl = xr;
  }
  for (int i = N-1; i > 0; i--)
  {
    delay[i] = delay[i-1];
  }
  return Desired * PiCos;
}


2.4 Affine Projection based algorithms

We will introduce the affine projection algorithm and its fast time domain version, FAP.

2.4.1 The affine projection algorithm

The affine projection algorithm (APA) [43] is an 'intermediate' algorithm between the well known NLMS and RLS algorithms, since both its performance and its complexity lie between those of NLMS and RLS. It is (for the case of a single desired signal) based upon a system of equations of the form

X_P^T(k) w_{apa}(k-1) = \begin{pmatrix} d(k) \\ d(k-1) \\ \vdots \\ d(k-P+1) \end{pmatrix} = d_P(k), \qquad (2.25)

X_P(k) = \begin{pmatrix} x(k) & x(k-1) & \ldots & x(k-P+1) \end{pmatrix},

where N is the filter length and P (with P < N) is the number of equations in the system. The 'basic' system of equations (2.1) can again be recognized, this time with a smaller number (P) of equations.

The APA recursion for a single desired signal is given as:

e(k+1) = d_P(k+1) - X_P^T(k+1) w_{apa}(k), \qquad (2.26)
g(k+1) = (X_P^T(k+1) X_P(k+1) + \delta I)^{-1} e(k+1),
w_{apa}(k+1) = w_{apa}(k) + \mu X_P(k+1) g(k+1),
y(k+1) = x^T(k+1) w_{apa}(k+1).

Here \mu and \delta are a step size and a regularization parameter respectively. The regularization parameter is important in providing noise robustness, as will be explained in section 2.4. P is a small number (e.g. 10) compared to N (e.g. 1000 or 2000). The first element of e(k+1) is the a priori error of the 'most recent' equation in each step. An algorithm specification is given in Algorithm 7.

The complexity of the algorithm is O(MNP). This algorithm is also easily extended to multiple right hand sides (2.6).

2.4.2 APA versus LMS

Just like for the RLS algorithm, one can recognize an LMS filter in (2.26), preceded by a pre–whitening step on the input signal. The NLMS algorithm is as a matter of


Algorithm 7 Affine projection algorithm

w = 0
Loop (new x and d in each step):
  add column x to XP as first column
  remove last column from XP
  add d as first element of dP
  remove bottom element from dP
  e = dP − XP^T w
  g = (XP^T XP + δI)^{−1} e
  w = w + µ XP g
  y = x^T w

fact a special case of the APA algorithm with P = 1. For \mu = 1 and \delta = 0, the a posteriori error in (2.26),

e_{post}(k+1) = d_P(k+1) - X_P^T(k+1) w(k+1),

is zero, i.e. APA basically solves the P most recent equations exactly, based on a minimum norm weight vector update. Remember that the NLMS algorithm applied a minimum norm update to the solution vector such that the a posteriori error of the most recently added equation is exactly zero. A geometrical interpretation will be given in section 2.5.
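This exact-solution property is easy to check numerically. The sketch below (pure Python, made-up toy dimensions N = 3, P = 2) performs one APA step with \mu = 1 and \delta = 0 and verifies that the a posteriori errors of the P most recent equations vanish.

```python
# One affine projection step, mu = 1, delta = 0, on a toy system with
# N = 3 taps and P = 2 equations; X is stored as a list of P columns.
def apa_step(X, d, w, mu=1.0):
    P, N = len(X), len(w)
    # a priori error e = d - X^T w
    e = [d[p] - sum(X[p][i] * w[i] for i in range(N)) for p in range(P)]
    # Gram matrix X^T X (2x2 here), inverted analytically
    a = sum(v * v for v in X[0])
    b = sum(X[0][i] * X[1][i] for i in range(N))
    c = sum(v * v for v in X[1])
    det = a * c - b * b
    g = [(c * e[0] - b * e[1]) / det,
         (-b * e[0] + a * e[1]) / det]
    # minimum norm update w <- w + mu * X g
    return [w[i] + mu * (X[0][i] * g[0] + X[1][i] * g[1]) for i in range(N)]

X = [[1.0, 0.2, -0.5], [0.3, 1.1, 0.4]]      # columns x(k), x(k-1)
d = [0.7, -0.2]                               # d(k), d(k-1)
w1 = apa_step(X, d, [0.0, 0.0, 0.0])
e_post = [d[p] - sum(X[p][i] * w1[i] for i in range(3)) for p in range(2)]
assert all(abs(v) < 1e-12 for v in e_post)    # both equations solved exactly
```

With P = 1 the same code degenerates to the NLMS update of (2.10), which is the special case mentioned above.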

2.4.3 The Fast Affine Projection algorithm (FAP)

A fast version of APA, called FAP, which has a complexity of 4N + 40P flops, is derived in [26]. Since typically P \ll N, FAP only has a small overhead as compared to NLMS. This complexity reduction is accomplished in two steps. First, one only calculates the first element of the P–element error vector e(k) (see formula (2.26)), and one computes the other P−1 elements as (1−\mu) times the previously computed error. As stated in [26], this approximation is based upon an assumption about the regularization by diagonal loading (\delta I):

e_i(k) = e_{post,i-1}(k-1) \quad \text{for } i > 1.

Here e_i(k) denotes the i'th component of the vector e(k), and similarly for e_{post,i}(k).

Indeed, we have

e_{post}(k) = d_P(k) - X_P^T(k) \underbrace{\left( w(k-1) + \mu X_P(k) (X_P^T(k) X_P(k) + \delta I)^{-1} e(k) \right)}_{w(k)}
\approx e(k) - \mu e(k) = (1-\mu) e(k).


As shown in [26], this eventually leads to

e(k) \approx \begin{pmatrix} e_1(k) \\ (1-\mu) e_1(k-1) \\ \vdots \\ (1-\mu) e_{P-1}(k-1) \end{pmatrix}, \qquad (2.27)

where e_1(k) = d(k) - x^T(k) w(k-1). Note that with a stepsize \mu = 1, the P−1 lower equations would have been solved exactly already in the previous time step, and hence their error would indeed be zero.

A second complexity reduction is achieved by delaying the multiplications in the matrix–vector product X(k)g(k) in equation (2.26). This results in a 'delayed' coefficient vector

w(k-1) = w(0) + \mu \sum_{l=P}^{k-1} x(k-l) \sum_{j=0}^{l} g_j(k-l+j),

such that

w(k) = w(k-1) + \mu X_P(k) f(k),

where

f(k) = \begin{pmatrix} g_1(k) \\ g_2(k) + g_1(k-1) \\ \vdots \\ g_{P-1}(k) + \ldots + g_1(k-P+1) \end{pmatrix}.

It can be shown that an updating formula for w(k) exists. A correction term can be used to obtain the residual at time k without having to calculate w(k) first. Details on the derivation can be found in [26]. Algorithm 8 is a full description of FAP.

The complexity of the FAP adaptive filter can be reduced even further by using frequency domain techniques. In chapter 4, the Block Exact Fast Affine Projection (BEFAP) adaptive filter [59] will be reviewed, and we will derive a block exact version of APA, without the FAP approximations, but with almost the same complexity as BEFAP.

2.5 Geometrical interpretation

All algorithms update their filter coefficient vector in each time step. The similaritybetween (2.10), (2.12) and (2.26) is obvious.

For NLMS withµ = 1 andδ = 0 the a posteriori error is zero in each step, and thiscorresponds to the fact that the most recent equation in a system of equations like (2.6)


Algorithm 8 Fast affine projections (FAP) for N filter taps, which outputs the residuals of the filtering operation. The notation •_{a:b} denotes the vector formed by the a'th to b'th components (inclusive) of vector •

Loop :
  r_{xx} = r_{xx} + x(k) x_{2:P}(k) - x(k-N) x_{2:P}(k-N)
  e_1 = d - x^T w
  e_1 = e_1 - \mu r_{xx}^T f_{1:P-1}
  e = ( e_1, (1-\mu)\bar e_1, \ldots, (1-\mu)\bar e_{P-1} )^T, with \bar e the error vector of the previous iteration
  update S = (X^T X + \delta I)^{-1}
  f = ( 0, f_1, \ldots, f_{P-1} )^T + S e
  w = w + \mu x(k-P+1) f_P
  \bar e = e
  output e_1 or d - e_1

is solved exactly. For APA (\mu = 1 and \delta = 0) the a posteriori error vector of size P is zero, which means that the P most recent equations in (2.6) are solved exactly. This is possible as long as P ≤ N. When P > N we can only solve the system of equations in the least squares sense, which then corresponds to an RLS algorithm with a sliding window. So APA is clearly an intermediate algorithm between RLS and NLMS in view of complexity, but also in view of performance. P is a parameter that can be tuned as a function of the available processing power, where a larger P results in higher complexity, but improved performance.

The fact that the performance of APA is intermediate between NLMS and RLS can be shown geometrically. Figure 2.6 shows a geometric representation of the convergence of an NLMS filter with two filter taps. Assume the optimal filter vector that has to be identified by the process is w. The vectors x_i are the consecutive input vectors, while the points w_i are the estimates of the filter vector in successive time steps. Assume the estimate of the filter vector at time 0 is w_0. When a new input x_1 arrives, w_0 will be updated in the direction of x_1 such that the error in the direction of x_1 becomes zero (\mu = 1 is assumed). This means that w_0 is projected onto a line (an affinity) that is orthogonal to the input vector x_1, and that contains the vector w.

When the process continues, we see that the estimates converge to w. The convergence rate is higher when the directions of the input vectors are 'white', i.e. when they are effectively uncorrelated.
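Written out for \mu = 1, the projection step is just the NLMS update. The two-tap sketch below (pure Python, made-up numbers) checks that the updated estimate lands exactly on the hyperplane x_1^T w = d_1, which contains the true vector.

```python
# NLMS step with mu = 1 on a two-tap example (made-up numbers): the
# update projects w0 onto the hyperplane { w : x1^T w = d1 }, which
# contains the true filter vector w_real.
w_real = [0.8, -0.3]
w0 = [0.0, 0.0]
x1 = [1.0, 2.0]
d1 = sum(a * b for a, b in zip(x1, w_real))     # noiseless desired sample
err = d1 - sum(a * b for a, b in zip(x1, w0))   # a priori error
nrm = sum(a * a for a in x1)                    # ||x1||^2
w1 = [w0[i] + x1[i] * err / nrm for i in range(2)]
# the a posteriori error on the newest equation is zero:
assert abs(d1 - sum(a * b for a, b in zip(x1, w1))) < 1e-12
```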


Figure 2.6: Geometrical interpretation of NLMS. The estimate of the filter vector is projected upon an affinity of the orthogonal complement of the last input vector, such that this affinity contains the 'real' filter vector.

Figure 2.7: Geometrical interpretation of APA. The estimate of the filter vector is projected upon an affinity of the orthogonal complement of the last P input vectors. This affinity contains the 'real' filter vector.


For the APA algorithm, a sketch is given in Figure 2.7 for a system with 3 filter taps and APA order P = 2. Now the estimate w_0 is projected onto the affinity of the orthogonal complement of the last P = 2 input vectors, such that this affinity contains the solution vector w (again, if stepsize \mu = 1). It can be seen that this results in faster convergence compared to NLMS when x_1 and x_2 have almost the same direction (i.e. when the input vectors are correlated). This intuitive geometrical interpretation shows that APA is an extension of the NLMS algorithm, and at the same time it explains the name of the affine projection algorithm.

2.6 Conclusion

In this chapter, we have reviewed some adaptive filtering algorithms that will be important in the rest of this text. The NLMS algorithm is a cheap algorithm (O(4MN) complexity) that exhibits performance problems when non–white input signals are used, because its convergence speed is dependent on the input signal statistics. On the other hand, the RLS algorithm, which performs very well even for non–white signals, is much more expensive (O((MN)^2) complexity for the standard versions, and O(M^2 N) for stable fast versions like QRD–LSL). A class of 'intermediate' algorithms is the APA family of adaptive filters. These filters have a design parameter P with which one can tune both complexity and performance. RLS and APA can be seen as an LMS filter with additional pre–whitening.

Not long ago, only NLMS filters and even cheaper frequency domain variants were used to implement acoustic echo cancellation, because of their complexity advantage whenever long adaptive filters are involved, although both RLS and APA are well known to perform better. Due to the increase in computing power over the years, APA filters increasingly find their way into this field as well. We continue in this direction by proposing a new fast version of the APA algorithm in the next chapters.


Chapter 3

APA–regularization and Sparse APA for AEC

In the previous chapter we have reviewed the adaptive filtering algorithms that areimportant in this thesis. In this and the next chapter, we will apply the affine projectionalgorithm to the problem of acoustic echo cancellation.

APA has become a popular method in adaptive filtering applications. Fast versionsof it have been developed, such as FAP (chapter 2) and the frequency domain BlockExact Fast Affine Projection (BEFAP) [59].

In this chapter, we focus on three main topics, namely regularization of APA, problemsthat exist in conventional fast APA implementations when regularization is applied,and finally regularization of APA in multichannel acoustic echo cancellation.

In a traditional echo canceller system, the adaptation is switched off by a control al-gorithm when the microphone signal contains a near end (speech) signal. However,robustness against continuously present near end noise is also very important, espe-cially for the APA–type of algorithms, which indeed tend to exhibit a large sensitivityto such near end noise. Regularization is generally used to obtain near end noise ro-bustness. We will review two alternatives for regularization, and introduce a thirdalternative (which we will call ’sparse equations’ technique).

Existing fast implementations of the affine projection algorithm (based upon FAP)exhibit problems when much regularization is used. Besides that, the FAP algorithmcan not be used with the sparse equations technique that we will derive. We will showthis in section 3.3, and this motivates further algorithm development in chapter 4.

The outline of this chapter is as follows: in section 3.1, we will first state the problem that occurs if near end noise is present, and how diagonal loading and exponential


weighting (regularization) can be used to resolve this. In section 3.2 we will introduce a 'sparse equations' regularization technique, which will also reduce the influence of near end noise. The problems in the FAP algorithm when much regularization is used are demonstrated in section 3.3. In sections 3.4 and 3.5, experimental results are given and the behaviour of multichannel echo cancellation under regularization is studied. Conclusions are given in section 3.6.

3.1 APA regularization

In this section we will review why regularization is important for APA–type algorithms when near end noise sources are present. 'Diagonal loading' and exponential weighting as regularization methods are also reviewed. In the next section, we will introduce a third alternative, which we will call the 'sparse equations' technique.

3.1.1 Diagonal loading

The (semipositive definite) covariance matrix X^T(k) X(k) that is used (and inverted) in the APA expressions (2.26) is regularized by adding a small constant \delta times the identity matrix (diagonal loading). The equations are repeated here for the update from k−1 to k (for convenience):

e(k) = d_P(k) - X_P^T(k) w(k-1)
g(k) = (X_P^T(k) X_P(k) + \delta I)^{-1} e(k)
w(k) = w(k-1) + \mu X_P(k) g(k).

The obvious effect of this is that the matrix cannot become indefinite, but regularization also has a beneficial effect when near end noise is present. This is shown in [27] as follows. Rewrite d_P(k) as

d_P(k) = X_P^T(k) w_{real} + n(k)

with w_{real} the room impulse response we are looking for. The vector n(k) consists of only the near end noise in the absence of a far end signal. We can derive the formula for the difference vector \Delta w(k) between the real impulse response w_{real} and the identified impulse response at time k, w(k), namely

\Delta w(k) \equiv w(k) - w_{real}, \qquad (3.1)

\Delta w(k) = \left( I - \mu X_P(k) (X_P^T(k) X_P(k) + \delta I)^{-1} X_P^T(k) \right) \Delta w(k-1) + \mu X_P(k) (X_P^T(k) X_P(k) + \delta I)^{-1} n(k). \qquad (3.2)


If X_P(k) is written in terms of its singular value decomposition,

X_P(k) = U(k) \Sigma(k) V^T(k),

we can write

P(k) = X_P(k) (X_P^T(k) X_P(k) + \delta I)^{-1} X_P^T(k) \qquad (3.3)
= U(k) \, \mathrm{diag}\!\left( \frac{\sigma_0^2(k)}{\sigma_0^2(k) + \delta}, \ldots, \frac{\sigma_{P-1}^2(k)}{\sigma_{P-1}^2(k) + \delta}, 0_{N-P} \right) U^T(k),

where \sigma_i(k) are the singular values of X_P(k), and U(k) and V(k) are orthogonal. These equations show that \delta has an effect both on the adaptation (first term of equation (3.2)) and on the near end noise amplification matrix (second term of equation (3.2)). P(k) can be interpreted as an almost–projection matrix. If \delta is chosen to be the power of the background noise n(k), replacing X_P(k) by its singular value decomposition reveals that the directions (see section 2.5) in the adaptation of w(k) with large signal to noise ratios are retained (since then \sigma_i^2 / (\sigma_i^2 + \delta) \approx 1) and that (unreliable) updates in directions with small signal to noise ratios are reduced (\sigma_i^2 / (\sigma_i^2 + \delta) \approx \sigma_i^2 / \delta). Hence this is the obvious choice for \delta concerning its influence on the adaptation.

In the second term of equation (3.2), the continuously present background noise is seen to be multiplied by the matrix \mu P(k):

P(k) = X_P(k) (X_P^T(k) X_P(k) + \delta I)^{-1} \qquad (3.4)
= U(k) \, \mathrm{diag}\!\left( \frac{\sigma_0(k)}{\sigma_0^2(k) + \delta}, \ldots, \frac{\sigma_{P-1}(k)}{\sigma_{P-1}^2(k) + \delta}, 0_{N-P} \right) V^T(k).

Since U(k) and V(k) are orthogonal matrices, the noise amplification factor for the i–th mode in the k–th step is given as

\tau(\sigma_i(k), \delta) = \mu \frac{\sigma_i(k)}{\sigma_i^2(k) + \delta}. \qquad (3.5)

So the larger \delta is chosen, the less the near end noise is amplified into the adaptation of w(k). The conclusion is that by the proposed choice of \delta, the amplification of the near end noise is prevented, while the adaptation itself is only reduced in directions with a low SNR.
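The effect of this choice of \delta on the per-mode factors can be verified in a few lines of Python (made-up singular values, \delta set to a hypothetical noise power of 10^{-2}).

```python
# Per-mode factors from (3.3) and (3.5): with delta equal to the noise
# power, high-SNR modes keep their update while low-SNR modes are damped.
mu, delta = 1.0, 1e-2

def adapt(sigma):
    # adaptation retention sigma^2 / (sigma^2 + delta) from (3.3)
    return sigma * sigma / (sigma * sigma + delta)

def tau(sigma):
    # noise amplification mu * sigma / (sigma^2 + delta) from (3.5)
    return mu * sigma / (sigma * sigma + delta)

assert adapt(10.0) > 0.999        # strong mode: update essentially retained
assert adapt(0.01) < 0.01         # weak mode: update suppressed
# tau stays bounded by mu/(2*sqrt(delta)) = 5, peaking at sigma = sqrt(delta) ...
assert max(tau(s) for s in (10.0, 1.0, 0.1, 0.01)) <= mu / (2 * delta ** 0.5) + 1e-12
# ... whereas without regularization the weak-mode noise gain mu/sigma explodes
assert mu * 0.01 / (0.01 ** 2) > 99.0
```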

Figure 3.1 shows the echo energy loss for an acoustic echo canceller against time for a speech signal in the presence of near end noise. The dotted line is the loss for an unregularized APA algorithm; the full line results when a properly chosen regularization term is applied before inverting the correlation matrix. Both are plotted on a logarithmic scale. The regularized case performs better.


Figure 3.1: When near end noise is present, the dotted line is the echo energy loss in dB for affine projection without regularization, the full line for affine projection with regularization. The graph shows a better result when regularization is applied. (Horizontal axis: time 0–18 s; vertical axis: attenuation 0–60 dB.)

Often a Fast Transversal Filter (FTF) algorithm is used [26] to update the inverse correlation matrix (X_P^T(k) X_P(k))^{-1}, since regularization by diagonal loading can then rather straightforwardly be built in. But since this type of algorithm is known to have poor numerical properties, we propose to use QR–updating instead. The update equations for APA then become

e(k) = d_P(k) - X_P^T(k) w(k-1)
R_P^T(k) R_P(k) g(k) = e(k) \qquad (3.6)
w(k) = w(k-1) + \mu X_P(k) g(k)

The second equation in (3.6) can then be solved by first updating R_P(k) by means of Algorithm 4, and then performing two successive backsubstitutions (with quadratic complexity because R_P(k) is triangular). QR–updating is numerically stable since it can be implemented by using only (stable) orthogonal rotations. QR–updating — just like the backsubstitutions — has a quadratic complexity, but since P (the dimension of (X_P^T(k) X_P(k))^{-1}) in acoustic echo cancelling applications is typically very small compared to the filter length (P = 2 \ldots 10 while N = 2000), this is not an issue.
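The pair of substitutions that solves the second equation of (3.6) can be sketched in a few lines (pure Python, a made-up 2×2 triangular factor): a forward substitution with R^T followed by a back substitution with R.

```python
# Solve R^T R g = e for an upper triangular R: a forward substitution
# with R^T (lower triangular), followed by a back substitution with R.
def forward_sub(R, e):
    # solves R^T y = e
    n = len(e)
    y = [0.0] * n
    for i in range(n):
        y[i] = (e[i] - sum(R[k][i] * y[k] for k in range(i))) / R[i][i]
    return y

def back_sub(R, y):
    # solves R g = y
    n = len(y)
    g = [0.0] * n
    for i in reversed(range(n)):
        g[i] = (y[i] - sum(R[i][j] * g[j] for j in range(i + 1, n))) / R[i][i]
    return g

R = [[2.0, 1.0], [0.0, 3.0]]
e = [1.0, 2.0]
g = back_sub(R, forward_sub(R, e))
# verify against R^T R = [[4, 2], [2, 10]]
res = [4.0 * g[0] + 2.0 * g[1], 2.0 * g[0] + 10.0 * g[1]]
assert all(abs(res[i] - e[i]) < 1e-12 for i in range(2))
```

Both solves cost O(P^2), which is negligible for the small P used here.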

The implementation cost for diagonal loading in the FAP algorithm is zero if (as in the original algorithm) FTF is used to update the correlation matrices, but it is impossible to implement this when (as we propose) the stable QR–updating approach is used. Exponential weighting as a regularization technique, on the other hand, fits in nicely


with the QR–updating approach.

3.1.2 Exponential weighting

An alternative way to introduce 'regularization' consists in using an exponential window for estimating the inverse covariance matrix [44]. The updating for the correlation matrix now becomes

(X_P^T(k+1) X_P(k+1))^{-1} = (\lambda X_P^T(k) X_P(k) + x(k+1) x^T(k+1))^{-1}

with x(k) = [\, x(k) \; x(k-1) \; \ldots \; x(k-P+1) \,]^T. This is in contrast to the equations (2.25) and (2.26), where a sliding window is used. Figure 3.2 shows that when no noise is present, APA with a sliding window and APA with an exponential window both perform almost equally well. As shown in Figure 3.3, the regularization effect of using an exponential window keeps the coefficients from drifting away from the correct solution when noise is present.
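The recursion behind this exponentially weighted estimate is easy to verify in isolation (pure Python sketch, scalar case P = 1 with made-up samples): the recursive update reproduces the direct exponentially weighted sum.

```python
# Exponentially weighted correlation estimate, scalar case (P = 1):
# the recursion  C(k+1) = lam * C(k) + x(k+1)^2  matches the direct
# weighted sum  sum_j lam^(k-j) * x(j)^2.
lam = 0.9
samples = [1.0, -0.5, 2.0, 0.25]
C = 0.0
for x in samples:
    C = lam * C + x * x
direct = sum(lam ** (len(samples) - 1 - j) * samples[j] ** 2
             for j in range(len(samples)))
assert abs(C - direct) < 1e-12
```

Old data is thus forgotten gradually instead of being dropped abruptly, which is exactly what gives the exponential window its regularizing effect.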

Figure 3.2: For a simulated environment with a speech input signal, this plot shows the distance (the norm of the difference) between the real filter and the identified filter coefficient vectors against time, both for original APA with a sliding window (dotted line) and APA with an exponential window (full line). This experiment shows the noiseless case; the performance of both algorithms is equal.


Figure 3.3: Distance between identified and real filter vector versus time for the case where the echo signal is a speech signal, and near end noise is present. The full line is APA with an exponential window, the dotted line is the original APA algorithm without regularization. The identified filter coefficients get much closer to the real coefficients in case regularization (by using an exponential window) is used.

3.2 APA with sparse equations

In this section we will derive a third alternative for incorporating regularization into the affine projection algorithm, which we call the 'sparse equations' technique. Diagonal loading is not easily implemented when the QR–updating technique is used, but both exponential weighting (see previous section) and the 'sparse equations' technique are. Equation (3.5) shows that for \delta = 0 the noise amplification will be smaller if the smallest singular values are larger. So every method that realizes this is suitable to be used instead of explicit regularization.

The reason why the singular values become small lies in the autocorrelation of the filter input signal. The system of P consecutive equations that is solved in APA will therefore have a large condition number. This leads to the idea of using non-consecutive equations: these will be less correlated, since a typical speech autocorrelation function decreases with the time lag. We call the non-consecutive equations sparse equations. We will develop this further for equally spaced sparse equations.

The matrix X_P(k) \in R^{N \times P} in the equations (2.25) to (2.26) is replaced by a matrix \tilde{X}_P(k) \in R^{N \times P} as follows:

e(k) = \tilde{d}_P(k) - \tilde{X}_P^T(k) w(k-1) \qquad (3.7)
g(k) = (\tilde{X}_P^T(k) \tilde{X}_P(k))^{-1} e(k) \qquad (3.8)
w(k) = w(k-1) + \mu \tilde{X}_P(k) g(k) \qquad (3.9)

where

\tilde{X}_P(k) = [\, x(k) \; x(k-D) \; \ldots \; x(k-(P-1)D) \,] \qquad (3.10)
\tilde{d}_P(k) = [\, d(k) \; d(k-D) \; \ldots \; d(k-(P-1)D) \,]^T

Figure 3.4 shows the time behaviour of the smallest and the largest singular value of a regularized¹ (X_P^T(k) X_P(k) + \delta I) (explicit regularization case) and of (\tilde{X}_P^T(k) \tilde{X}_P(k))

0 0.5 1 1.5 2 2.5

x 104

10−5

10−4

10−3

10−2

10−1

100

101

102

103

Smallest and largest singular value. Full line : sparse equations, dotted : successive

Samples

σ

Figure 3.4: Smallest and largest singular value of input correlation matrix for a speech signalin function of time. Dotted line : explicit regularization (δ = 0.1). Full line : sparse equations(D = 10). Signal peak value = 0.1,P = 10, N = 1024. Regularization parameters weretuned for equal initial convergence in an echo canceller setup.

(sparse equations case) for a speech signal, plotted on a logarithmic scale.Figure3.4 shows that the matrix constructed using sparse equations typically has a bettercondition number than the explicitly regularized one.

¹In order to provide a fair comparison, the regularization parameters have been tuned so that the initial convergence performance of an APA-adaptive filter is equal in both cases.
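The update (3.7)-(3.9) can be sketched in a few lines. The following toy implementation uses assumed dimensions and random signals, and solves the P × P system of (3.8) by plain Gaussian elimination rather than the QR machinery used later; it runs one sparse-APA iteration and checks that, with µ = 1, the P selected residuals are driven to zero:

```python
# Minimal sketch (not the thesis implementation) of one sparse-APA
# update, equations (3.7)-(3.9). N, P, D and all signals are toy values.
import random

random.seed(1)
N, P, D, mu = 8, 3, 2, 1.0
x = [random.gauss(0, 1) for _ in range(N + (P - 1) * D + 1)]
w_true = [random.gauss(0, 1) for _ in range(N)]
w = [0.0] * N

def tap_vector(k):
    """Input vector x(k) = [x(k), x(k-1), ..., x(k-N+1)]."""
    return [x[k - n] for n in range(N)]

k = len(x) - 1
X = [tap_vector(k - p * D) for p in range(P)]            # columns of XP(k)
d = [sum(c * t for c, t in zip(col, w_true)) for col in X]

# (3.7) residual vector
e = [d[p] - sum(c * t for c, t in zip(X[p], w)) for p in range(P)]

# (3.8) solve (XP^T XP) g = e by Gaussian elimination with pivoting
A = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(P)] for i in range(P)]
g = e[:]
for i in range(P):
    piv = max(range(i, P), key=lambda r: abs(A[r][i]))
    A[i], A[piv] = A[piv], A[i]
    g[i], g[piv] = g[piv], g[i]
    for r in range(i + 1, P):
        f = A[r][i] / A[i][i]
        for c in range(i, P):
            A[r][c] -= f * A[i][c]
        g[r] -= f * g[i]
for i in reversed(range(P)):
    g[i] = (g[i] - sum(A[i][c] * g[c] for c in range(i + 1, P))) / A[i][i]

# (3.9) filter update; with mu = 1 the new residuals are driven to zero
w = [w[n] + mu * sum(X[p][n] * g[p] for p in range(P)) for n in range(N)]
e_new = [d[p] - sum(c * t for c, t in zip(X[p], w)) for p in range(P)]
assert all(abs(v) < 1e-9 for v in e_new)
```

With µ = 1 the update is an exact projection, so the residuals of the P selected equations vanish after one step, whatever the spacing D; the spacing only affects the conditioning of the P × P system.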


There is a restriction though: the input signal x(k) has to be nonzero over the considered time frame (just some background (far end) noise is enough), because otherwise its covariance matrix, even with sparse equations, will become zero (i.e. singular). Notice the silence in the input signal in the beginning of the plot. Since a control algorithm is available in every practical implementation of an echo canceller, its internal signals can be used to switch off adaptation when there is no far end signal present; if there is a signal present, its covariance matrix should be of full rank. Experiments confirm that this is always the case in practice for speech signals.

We will again use QR-updating (and downdating, see below) to track the covariance matrix (see section 2.3.2):

XP(k) = QP(k) RP(k)    (3.11)

(XP^T(k) XP(k))^(−1) = (RP^T(k) QP^T(k) QP(k) RP(k))^(−1) = RP^(−1)(k) RP^(−T)(k)    (3.12)

Here RP(k) ∈ R^(P×P) is an upper triangular matrix and QP(k) is an orthogonal matrix. Equation (3.12) shows that only the upper triangular matrix RP(k) needs to be stored and updated. Equation (3.8) can then be calculated by backsubstitution. (An alternative would be using inverse QR-updating [29] instead of QR-updating, and multiplications instead of backsubstitutions.)
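The use of (3.12) can be made concrete: g is obtained from e with one forward and one backward triangular solve, without ever forming QP. A small sketch with an assumed, arbitrary RP:

```python
# Sketch (assumed example, not thesis code): once XP = QP RP is known,
# g = (XP^T XP)^{-1} e = RP^{-1} RP^{-T} e follows from two triangular
# solves, so the orthogonal factor QP is never needed.
R = [[2.0, 1.0, 0.5],      # a small upper triangular RP
     [0.0, 3.0, 1.5],
     [0.0, 0.0, 1.0]]
e = [1.0, 2.0, 3.0]
P = len(R)

# forward substitution: solve RP^T y = e (RP^T is lower triangular)
y = [0.0] * P
for i in range(P):
    y[i] = (e[i] - sum(R[j][i] * y[j] for j in range(i))) / R[i][i]

# backsubstitution: solve RP g = y
g = [0.0] * P
for i in reversed(range(P)):
    g[i] = (y[i] - sum(R[i][j] * g[j] for j in range(i + 1, P))) / R[i][i]

# check against the normal equations (RP^T RP) g = e
A = [[sum(R[k][i] * R[k][j] for k in range(P)) for j in range(P)] for i in range(P)]
resid = [sum(A[i][j] * g[j] for j in range(P)) - e[i] for i in range(P)]
assert all(abs(r) < 1e-12 for r in resid)
```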

From (2.25) and (3.10) it is seen that for updating XP(k) to XP(k + 1), instead of adding a column to the right and removing a column from the left, a row can also be added to the top and one removed from the bottom. This translates into size-P updates and downdates for the upper triangular matrix RP(k).

Updating can be done using Givens rotations on RP(k) (which corresponds to adding a row to XP(k)). Similarly, downdating is performed using hyperbolic rotations [29]. The procedure (and SFG, see Figure 2.3) is similar to the QRD-updating procedure, only now with hyperbolic transformations of the form

( a' )   (   cosh(θ)   − sinh(θ) ) ( a )
( b' ) = ( − sinh(θ)     cosh(θ) ) ( b )

where the angle θ is computed in a diagonal processor in the signal flow graph. The downdating algorithm is given in Algorithm 9, together with the function to update R with a rectangular window.

In this way a rectangular window is implemented. Because the hyperbolic rotations are not numerically stable, it is interesting to make the window (weakly) exponential by multiplying the matrix RP(k) with a weighting factor λ (very close to 1) in each step. In this case, the filter weights must be compensated. This is due to the fact that XP(k) is updated row by row, while the actual input vectors are the columns. So the 'compensated' filter vector becomes

wcomp(k) = diag( 1/λ^(N−1), . . . , 1/λ^0 ) w(k)


Algorithm 9 QRD-downdating and tracking of R(k) with a rectangular window. If λ ≠ 1 the filter vector should be compensated.

DowndateQRD(R, x)
{
  // x is the input vector
  // the upper triangular matrix R is downdated in place
  for (i = 0; i < M * N; i++)
  {
    if (abs(x[i]) < abs(R[i][i]))
    {
      temp = x[i] / R[i][i];
      coshTheta = 1 / sqrt(1 - temp * temp);
      sinhTheta = coshTheta * temp;
    }
    else
    {
      temp = R[i][i] / x[i];
      sinhTheta = 1 / sqrt(1 - temp * temp);
      coshTheta = sinhTheta * temp;
    }
    R[i][i] = coshTheta * R[i][i] - sinhTheta * x[i];
    for (j = i + 1; j < M * N; j++)
    {
      temp = R[i][j];
      R[i][j] = coshTheta * temp - sinhTheta * x[j];
      x[j] = -sinhTheta * temp + coshTheta * x[j];
    }
  }
}

TrackRRectangularWindow
{
  UpdateQRD(R, x_{k:-1:k-P+1}, λ)
  DowndateQRD(R, x_{k-N:-1:k-N-P+1})
}


How much decorrelation is provided by choosing D larger is of course dependent upon the statistics of the far end (echo reference) signal. In our experiments we have taken a fixed value of D. It should be noted that the complexity and memory requirements of the implementation rise for larger D. If D is chosen large, more 'past information' is considered for the estimation of the input statistics (which is also the case for exponential weighting, of course), so tracking of the input signal statistics will become slower.

The plots in Figure 3.5 show the evolution of the distance between a (synthetically generated) room impulse response and what APA (dotted line) and APA with sparse equations (full line) identify as the filter vector. In Figure 3.5 there is no near end noise present, and both methods have almost equal performance. In Figure 3.6, a small quantity of white near end noise disrupts the adaptation of the filter coefficients, which now at some points have a tendency to move away from their optimum values. The sparse equations setup can be seen to perform better than the setup with explicit regularization. This experiment was repeated with different distances between the equations and different regularization factors. Here, a distance of D = 5 was chosen, compared to a δ = 0.01 (where the maximum signal level is 0.1).

[Figure: distance between real room response and (1) APA (dotted) (2) Sparse-APA (full); 0 to 0.2 versus samples (0 to 2.5 x 10^4)]

Figure 3.5: Distance between real room response and w(k), the identified filter vector, in function of time (speech input). Dotted line is regularized APA (δ = 0.01), full line is sparse APA (D = 5). No near end noise is present. Sparse APA converges somewhat slower.

The three alternatives for regularization we have described can all be used, and even combined. If QR-updating is used to keep track of the covariance matrix, regularization by diagonal loading is difficult to implement, but both exponential weighting and the sparse equations technique are valid choices.


[Figure: norm of the error of the compensated w and of the ordinary w (full line = compensated); 0 to 0.2 versus samples (0 to 2.5 x 10^4)]

Figure 3.6: Distance between real room response and w(k) for a speech signal. Dotted line is regularized APA (δ = 0.01), full line is sparse APA (D = 5). Near end noise is present. Sparse APA is shown to be a viable alternative for regularization.

It is also possible to combine the sparse equations technique with exponential weighting. This can easily be done by leaving out the downdates and the compensation for the filter weights. The sparse equations technique thus provides an extra parameter when regularizing APA, or can be used as a standalone regularization technique.

3.3 FAP and the influence of regularization

FAP was reviewed in section 2.4.3. An important observation is that the fast affine projection algorithm [26] builds upon some assumptions that are no longer valid if the influence of the regularization becomes too large. The algorithm then starts to expose convergence problems, which is clearly shown in Figure 3.7 for a FAP algorithm with exponential weighting as a regularization technique.

Figure 3.8 shows another example, with explicit regularization for P = 10 and a 'strong' regularization parameter (δ = 10). The plot shows the time evolution of the distance between the synthetically generated room impulse response and the filter vector estimated by both algorithm classes. APA is shown to perform better for this large regularization parameter than the FAP algorithm. This in particular will be a motivation for developing a fast (block exact) APA algorithm in chapter 4, as an alternative to the existing fast (block exact) FAP algorithms.



Figure 3.7: Time evolution of the distance between identified and real filter for FAP (dotted) and FAP with an exponential window, λ = 0.9998 (full line), with a speech signal as far end signal. The approximations made in FAP are clearly no longer valid when an exponential window is used.

[Figure: |Wk − Wreal| versus samples (0 to 2.5 x 10^4) for FAP (δ = 10) and APA (δ = 10)]

Figure 3.8: Behaviour of the FAP-based algorithms as compared to the APA-based algorithms when strong regularization is involved. The time evolution of the filter vector error norm (distance between real and identified room impulse response) is shown. The APA algorithm has a much better convergence. The input is a speech signal.


3.4 Experimental results

We will now show some experimental results concerning the regularization effect of the sparse equations technique on the echo canceller performance in the presence of near end noise. In all experiments in this section the same speech signal is used (maximum value of the signal is 0.1). The length of the echo canceller is 900 taps, and it tries to model a synthetically generated room impulse response of 1024 taps. This is a typical situation for echo cancelling: the 'real' room impulse response is longer than the length of the acoustic echo canceller. The step size parameter is always µ = 1.

In Figure 3.9, we compare the time evolution of the weight error for APA (dotted line) and Sparse-APA (full line). In this simulation, white near end noise disturbs the adaptation of the filter coefficients. In the first half of the plot, the SNR is higher than in the second half. In all these experiments, the regularization parameters (δ for APA and the equation distance D for Sparse APA) were tuned to obtain equal initial convergence, in order to have a fair comparison of the steady state performance. This was done by setting D = 5 for Sparse-APA and then experimentally determining the value of δ (= 0.005) that gives the same initial convergence.

Sparse-APA where 10 out of 50 equations are used (so D = 5) outperforms explicitly regularized FAP (δ = 0.005) with 10 successive equations, both in the high and the low SNR part. Its performance is comparable with explicitly regularized APA (δ = 1) with 10 successive equations when the SNR is not too low, while otherwise explicit regularization is better. The plot also shows the performance of an explicitly regularized FAP algorithm (δ = 1) with 50 successive equations, in order to show the performance drop if only 10 out of 50 equations are used. We can conclude that the performance of Sparse-APA where 10 out of 50 equations are used is better than the performance of FAP with 10 successive equations if near end noise is present. The reason for this can probably be found in the regularizing effect of the sparse equations technique, and in the fact that the approximations made in FAP are not present in Sparse-APA.

Figure 3.9 also shows that regularization reduces FAP performance more than APA performance. In APA a regularization δ = 1 is needed to slow down the convergence to the same rate as the initial convergence for the sparse equations technique with D = 5. For FAP, the initial convergence speed has already decreased to that point with an explicit regularization of δ = 0.005. So this figure demonstrates the performance benefit of using APA instead of FAP.

In Figure 3.10, the tracking behaviour of Sparse APA is compared to that of APA, and it is the same for both algorithms when they are regularized comparably (equal initial convergence behaviour). In this experiment, D = 25 for Sparse APA and δ = 0.1 for plain APA. P = 10 in both cases.


[Figure: |Wk − Wreal| versus samples for FAP (δ = 0.005, P = 10), Sparse-APA (D = 5, P = 10), FAP (δ = 1, P = 50) and APA (δ = 1)]

Figure 3.9: Distance between real room response and w(k) in the presence of near end noise, in function of time. Regularization parameters have been tuned to give equal initial convergence characteristics. The SNR of the far end speech signal versus the near end noise is higher in the first half of the signal than in the second half. Regularization reduces FAP performance more than APA performance.

3.5 Regularization in multichannel AEC

An important issue is multichannel AEC, as we have already mentioned in chapter 1. When multiple loudspeakers are used to reproduce the sound that stems from one speech source, the non-uniqueness problem occurs [41, 4]. In Figure 3.11 the situation is depicted. Microphones in the far end room pick up the sound of 'Source' filtered by the transmission room impulse responses g1 and g2. These signals are then again filtered by the receiving room impulse responses h1 and h2. If the length N of the echo canceller filter w is larger than or equal to the length of the transmission room impulse responses, the following equation holds:

x1^T ( g2 ; 0 ) = x2^T ( g1 ; 0 )

such that

X(k) ( g2 ; 0 ; −g1 ; 0 ) = 0

which means that X(k) is rank deficient and hence that no unique solution exists for (2.7).
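The identity behind this rank deficiency is easy to verify numerically: x1 filtered by g2 and x2 filtered by g1 are both equal to g1 * g2 * s, whatever the source s. A sketch with toy lengths and random impulse responses:

```python
import random

# Sketch (toy lengths, random responses): with a single far end source s,
# the loudspeaker signals are x1 = g1 * s and x2 = g2 * s, so
# x1 * g2 == x2 * g1. This is the rank deficiency behind the stereo
# non-uniqueness problem.
random.seed(3)
Lg, Ls = 4, 32
g1 = [random.gauss(0, 1) for _ in range(Lg)]
g2 = [random.gauss(0, 1) for _ in range(Lg)]
s  = [random.gauss(0, 1) for _ in range(Ls)]

def conv(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

x1 = conv(g1, s)
x2 = conv(g2, s)
lhs = conv(x1, g2)       # x1 filtered by g2
rhs = conv(x2, g1)       # x2 filtered by g1
assert all(abs(u - v) < 1e-9 for u, v in zip(lhs, rhs))
```

Because the identity holds for every source signal, no amount of far end excitation by itself can resolve the ambiguity; only decorrelation of x1 and x2 can.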


[Figure: tracking behaviour, |Wk − Wreal| versus samples (0 to 2.5 x 10^4) for Sparse-APA and explicitly regularized APA]

Figure 3.10: Tracking behaviour of Sparse APA (full line) compared to explicitly regularizedAPA (dotted line) (far end signal is a speech signal). Regularization is tuned to obtain equalinitial convergence. At the12000 th sample, the room characteristics change. The trackingbehaviour remains equal. Note the small peaks that occur if the input signal is not ’persistentlyexciting’ (to be solved by the speech detection device)

[Diagram: 'Source' in the transmission room, filtered by g1 and g2; the resulting loudspeaker signals x1 and x2 are filtered by the receiving room paths h1 and h2; the adaptive filter W and a summation produce di and ei]

Figure 3.11: The multichannel echo cancellation non-uniqueness problem. Changes in either the transmission room or the receiving room will destroy successful echo cancellation when the exact paths h1 and h2 have not been identified by w = ( w1  w2 )^T.


Although the adaptive filter will find some solution, only the solution corresponding to the true receiving room echo paths is independent of the transmission room impulse responses. In a mono acoustic echo canceller setup, the filter has to re-adapt if the acoustical environment in the receiving room changes. In case a multichannel echo canceller does not succeed in identifying the correct filter path, changes in the transmission room will also result in a residual echo signal occurring in the acoustic echo canceller output.

In practice this situation does not strictly occur, because for echo cancellation the filter length N is usually smaller than the length of the impulse response in the transmission room:

( x1^T  0 ) g2 − ( x2^T  0 ) g1 = α ≅ 0

But still, this means that the problem is typically ill-conditioned.

Attempts to solve this problem can be found in the literature [41, 58, 28, 7]; they consist of decorrelating the loudspeaker signals (i.e. reducing the cross-correlation between the inputs of the adaptive filter by means of additional filtering operations, non-linearities, noise insertion, etc.). Obviously it is important that this remains inaudible.

In addition to these decorrelation techniques (which cannot be exploited too much because of the inaudibility constraint), it is important to use algorithms whose performance is less sensitive to correlation in the input signal than NLMS. In [4] it is shown that RLS performs well because of its independence of the input eigenvalue spread. Since RLS is an expensive algorithm, and APA is intermediate between NLMS and RLS, APA is often considered a good candidate for use in multichannel AEC [34, 17, 33].

Experiments show that the influence of near end noise on the adaptation is a lot larger for a multichannel setup than for a mono echo canceller based upon affine projection, and that for good results the regularization has to be a lot stronger, i.e. the problem that occurs in FAP-based algorithms is even more present in this case.

In [26], explicit regularization is suggested with δ equal to the near end noise power. But experiments show that this is not enough in the case of a large cross-correlation between the input channels when a large amount of noise is present. When appropriately strong regularization is used instead, the performance drop due to the approximations in FAP is unacceptably large. For this reason, we propose to use the APA algorithm instead of FAP.

In the experiments shown here, 50,000 samples of a signal sampled at 8 kHz have been recorded in stereo, the room impulse responses (1000 taps) we want the filter to identify have been generated artificially, and artificial noise (SNR = 30 dB) has been added.

If we apply the Sparse-APA algorithm with spacing D between the equations, experiments show that the cross-correlation problem in the stereo algorithm adds to the auto-correlation problem that is already present in the mono algorithm. This means that a much stronger regularization is required in the stereo case. We have chosen D = 200. This is to be compared with the typical value of D = 10 for the mono case.

For exponential updating, a forgetting factor λ = 0.9998 is a typical value.

As already mentioned, explicit regularization can only be used in the FAP-based algorithms for small regularization terms (δ comparable to the near end noise variance, which was 0.001 in our experiments). When a large δ is required, as in this stereo problem with a lot of noise present, one has to resort to an exact APA implementation. Even a δ = 0.1, for which the FAP approximation is no longer valid, is not large enough to regularize the problem at hand, as shown in Figure 3.12. Eventually, δ = 2 was chosen. The results of these three techniques are shown in Figure 3.13, and can be seen to be comparable.


Figure 3.12: Distance between real and identified impulse response versus time for APA (speech signal) with an explicit regularization factor δ = 0.1. For the mono case this was sufficient, but obviously not for a stereo setup: the filter does not converge.

Finally, we want to reiterate that strong regularization does not solve the stereo echo cancelling problem; only decorrelation techniques do. But regularization is necessary in addition, in order to provide near end noise robustness.


[Figure: full line: explicit regularization with δ = 2, dashed: exponential updating with λ = 0.9999, dotted: sparse equations with D = 200; distance versus samples (0 to 6 x 10^4)]

Figure 3.13: Three ways of regularizing the stereo affine projection algorithm for a speech signal input with near end noise present. Full line is the distance between the real and the identified impulse response for explicit regularization with δ = 2, dashed line is the distance for exponential updating with λ = 0.9999, and dotted line is the distance for the sparse equations technique with an equation spacing of D = 200. The parameters are clearly higher than in the mono case.


3.6 Conclusion

In this chapter, we have shown that if affine projection techniques are used for acoustic echo cancellation, it is important to provide sufficient regularization in order to obtain robustness against continuously present background noise. This is important in single channel echo cancellation, but even more so in multichannel echo cancellation. In the latter case, cross-correlation between the loudspeaker signals (and hence the input signals of the adaptive filter) leads to ill-conditioning of the problem. Regularization needs to be applied in addition to decorrelation techniques.

We proposed to replace the FTF-based update of the small correlation matrix of size P in the original FAP algorithm by a QRD-based updating procedure that is numerically more stable. Diagonal loading is not easily implemented in this QRD-based approach, and therefore we have described two alternative approaches: exponential weighting, and a new technique based on 'sparse equations'. Performance-wise, comparable results can be obtained with the three regularization techniques.

We have shown that there are both advantages and disadvantages to the FAP algorithm. Diagonal loading can be incorporated, because it uses the FTF algorithm for updating the size-P correlation matrix, but on the other hand it makes some approximations that are only valid when not too much regularization is applied, and hence it exposes performance problems when more regularization is applied (e.g. in multichannel echo cancellation). This observation motivates the derivation of the BEAPA algorithm in chapter 4.



Chapter 4

Block Exact APA (BEAPA) for AEC

We have explained in the previous chapter why regularization is an absolute necessity in affine projection based adaptive filtering algorithms, and that FAP (the fast implementation of APA) relies on an implicit 'small regularization parameter' assumption. This in particular may lead to poor performance of the FAP algorithm as compared to (properly regularized) APA.

In this chapter, a block exact affine projection algorithm (BEAPA) is derived that does not rely on the assumption of a small regularization parameter. It is an exact frequency domain translation of the original APA algorithm, and still has about the same complexity as BEFAP, which is a similar frequency domain (hence low complexity) version of FAP. In a second stage, the BEAPA algorithm is extended to incorporate an alternative to explicit regularization that is based on so-called 'sparse' equations (section 3.2).

Section 4.1 will review FAP and a frequency domain version thereof: block exact FAP (BEFAP) [59]. FAP is not exactly equal to APA, and therefore some regularization techniques have different effects in FAP compared to APA, as shown in the previous chapter. In section 4.2, a fast block exact frequency domain version of the affine projection algorithm is derived (Block Exact APA). This algorithm has a complexity that is comparable to the BEFAP complexity, while being an exact but fast version of APA. In section 4.3, Block Exact APA is extended to allow for the sparse equations technique to be used (Sparse Block Exact APA).


4.1 Block Exact Fast Affine Projection (BEFAP)

In [36] and [59], a block exact version of FAP (see section 2.4.3) is derived, which is referred to as BEFAP. Since the derivation of the algorithm in this chapter is based upon BEFAP, it is instructive to review the concept of this algorithm in order to clarify the differences. A basis filter vector that is fixed during a block of size N2 (e.g. 128) is taken as a basis for a fast (frequency domain) convolution with a (possibly smaller, but typically also 128) block length N1. Since the filter vector is fixed during this block, the filtering operation can indeed be calculated cheaply in the frequency domain. N1 can be made smaller to reduce the delay of the system. To obtain an exact version of the FAP algorithm, corrections to the residuals obtained with the fast convolution are calculated during the block. The complexity of the corrections grows within the block, but because of the choice of the parameters, it never reaches the complexity of the full filtering operation that is needed in FAP. After a block, all the corrections during the block of length N2 are applied to the basis filter vector by means of a frequency domain convolution, resulting in the same output as the original FAP algorithm.

If the filter vector were updated in each step and the basis filter vector w(k − 1) were known at time instant k, we could write w(k + i − 1) in terms of w(k − 1). We let sj(k) denote the j'th component of the vector s(k):

w(k + i − 1) = w(k − 1) + Σ_{j=1}^{i} µ XP(k + i − j) g(k + i − j)

             = w(k − 1) + Σ_{j=1}^{i+P−1} sj(k + i − 1) x(k + i − j) − Σ_{j=1}^{P−1} sj(k − 1) x(k − j).
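This bookkeeping identity can be checked on toy data. The sketch below (assumed sizes, random vectors g, and s(k − 1) = 0 so the correction term vanishes) builds s by accumulating shifted copies of µg, as made explicit in the recursion (4.1) below, and compares both expressions for w(k + i − 1) − w(k − 1):

```python
import random

# Sketch: the vector s accumulates the per-step updates mu * XP * g,
# so w(k+i-1) can be reconstructed from w(k-1) with one weighted sum
# of input vectors. Toy sizes; g vectors are random (not APA solves).
random.seed(4)
N, P, mu, i_steps = 5, 2, 0.5, 3
x = [random.gauss(0, 1) for _ in range(N + P + i_steps + 2)]

def xvec(k):
    """Length-N input vector x(k) = [x(k), ..., x(k-N+1)]."""
    return [x[k - n] for n in range(N)]

k = N + P                      # start-of-block sample index
g = {m: [random.gauss(0, 1) for _ in range(P)] for m in range(k, k + i_steps)}

# direct accumulation: w(k+i-1) - w(k-1) = sum over steps of mu * XP * g
direct = [0.0] * N
for m in range(k, k + i_steps):
    for p in range(P):
        for n in range(N):
            direct[n] += mu * xvec(m - p)[n] * g[m][p]

# recursion (4.1): grow s from size P to size P + i - 1
s = [mu * gp for gp in g[k]]                 # i = 1, with s(k-1) = 0
for m in range(k + 1, k + i_steps):          # i > 1: prepend a zero, add mu*g
    s = [0.0] + s
    pad = [0.0] * (len(s) - P)
    s = [sv + gv for sv, gv in zip(s, [mu * gp for gp in g[m]] + pad)]

# reconstruct via the summed components: sum_j s_j x(k+i-j)
recon = [0.0] * N
last = k + i_steps - 1
for j in range(1, len(s) + 1):
    for n in range(N):
        recon[n] += s[j - 1] * xvec(last + 1 - j)[n]

assert all(abs(a - b) < 1e-9 for a, b in zip(direct, recon))
```

The point of the identity is that the i rank-P updates collapse into a single weighted sum over at most P + i − 1 input vectors, which is what makes the end-of-block filter update cheap.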

The meaning of s(k) is as follows: since the columns in XP(k) shift through this matrix, the multiplications with g(k) can largely be simplified by adding together corresponding components of the successive vectors g(k), thus building up a vector s(k) recursively, containing such summed components. In what follows, k is the sample index at the start of a new block, while i is an index inside a block. If we let s|_i^j denote a sub-vector consisting of the i'th to the j'th element of s, the vector s(k + i − 1) is recursively obtained as

s(k) = ( 0 ; s(k − 1)|_1^{P−1} ) + µ g(k)    ∈ R^{P×1}    (i = 1),

s(k + i − 1) = ( 0 ; s(k + i − 2) ) + ( µ g(k + i − 1) ; 0_{i−1} )    ∈ R^{(P+i−1)×1}    (i > 1),    (4.1)

where 0_{i−1} is a null vector of size i − 1. The vector s(k) grows within a block, but its size is reset to P × 1 at each block border (where a new basis filter is calculated). So in each block, the vector s(k + i − 1) grows from size P to size P + N2 − 1. The contents of the first P − 1 positions of s(k − 1) remain intact when crossing block borders. In BEFAP, the filter vector is not updated in each time step, but only at the end of a block. We will use the expression for w(k + i − 1) to derive corrections to the filter output that have to be applied after a filtering operation with the basis filter w(k − 1). The filter output is then written as

y(k + i) = x^T(k + i) w(k + i − 1)

         = x^T(k + i) w(k − 1) + Σ_{j=1}^{i+P−1} sj(k + i − 1) x^T(k + i) x(k + i − j) − Σ_{j=1}^{P−1} sj(k − 1) x^T(k + i) x(k − j)

         = x^T(k + i) w(k − 1) + Σ_{j=1}^{i+P−1} sj(k + i − 1) rj(k + i) − Σ_{j=1}^{P−1} sj(k − 1) r_{i+j}(k + i).    (4.2)

We let rj(k) denote the j'th component of the vector r(k). These correlations are defined as

rj(k + i) ≡ x^T(k + i) x(k + i − j).    (4.3)
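These inner products admit a cheap sliding recursion: x(k + 1) and x(k + 1 − j) share all but their end points with x(k) and x(k − j). The following sketch (an assumed, standard sliding-window recursion, not quoted from the thesis) verifies O(1)-per-lag updating against direct evaluation:

```python
import random

# Sketch: update r_j(k) -> r_j(k+1) with two multiplications instead of
# recomputing the length-N inner product. Toy sizes, random signal.
random.seed(5)
N, j = 6, 2
x = [random.gauss(0, 1) for _ in range(40)]

def xvec(k):
    """Length-N input vector x(k) = [x(k), ..., x(k-N+1)]."""
    return [x[k - n] for n in range(N)]

def r_direct(k, j):
    return sum(a * b for a, b in zip(xvec(k), xvec(k - j)))

k = 20
r = r_direct(k, j)
for step in range(10):
    k += 1
    # add the new end-point product, drop the one that left the window
    r += x[k] * x[k - j] - x[k - N] * x[k - N - j]
    assert abs(r - r_direct(k, j)) < 1e-9
```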

In practical implementations, these correlations are recursively updated. Still referring to [59], one can avoid the third term in the equations for the output if one defines an alternative basis filtering vector:

z(k − 1) = w(k − 1) − Σ_{j=1}^{P−1} sj(k − 1) x(k − j),    (4.4)

which can be updated as

z(k + N2 − 1) = z(k − 1) + X(k + N2 − P) s(k + N2 − 1)|_P^{P+N2−1},    (4.5)

where X(k + N2 − P) is defined as

X(k + N2 − P) = [ x(k + N2 − P)  x(k + N2 − P − 1)  . . .  x(k − P + 1) ].    (4.6)


We can now rewrite (4.2) as

y(k + i) = x^T(k + i) z(k − 1) + Σ_{j=1}^{i+P−1} sj(k + i − 1) rj(k + i).    (4.7)

The filtering operation with the alternative basis vector in the first term of equation (4.7), and the update to the next basis filter vector (4.5) after N2 samples, can be performed in the frequency domain by means of fast convolutions. The block sizes of these convolutions do not necessarily have equal length: the block size for the filtering operation is N1 and the block size for the update is N2. The overall complexity of this BEFAP algorithm is

(6 M1 log2 M1 − 7 M1 − 31) / N1 + 6 N2 + 15 P − 4 + (P^2 − P) / N2 + 10 P^2 + (6 M2 log2 M2 − 7 M2 − 31) / N2.    (4.8)

Algorithm 10 Block Exact FAP

for j = 1 to N2 + P
    r1_j = (u|_{k−L+1}^{k})^T u|_{k−j−L}^{k−j−1}
endfor
loop
    for i = 0 to N2 − 1
        if (i modulo N1 == 0)
            <fill next part of y with convolution of
             next block of N1 samples from u with z>
        endif
        r1 = r1 + u_{k+i+1} · reverse(u|_{k+i+1−N2−P+1}^{k+i+1})
                − u_{k−L+i+1} · reverse(u|_{k−L+i+1−N2−P+1}^{k−L+i+1})
        e_{k+i+1} = d_{k+i+1} − y_{k+i+1} − (s|_1^{P+i−1})^T r1|_2^{i+P}
        E_1 = e_{k+i+1}
        for α = 2 to P
            E_α = (1 − µ) E_{α−1}
        endfor
        <update S^{−1}, the P × P inverse covariance matrix>
        g = S^{−1} E
        s = [ 0 ; s|_1^{N2+P−2} ] + [ µ g ; 0_{N2−1} ]
    endfor
    z = z + <convolution of s|_P^{P+N2−1} with u|_{k−P+N2+1−L+1+1}^{k−P+N2+1}>
    k = k + N2
endloop

WhereN1 andN2 are block lengths for the frequency domain algorithm (e.g.N1 =N2 = 128). FurthermoreMi = N + Ni − 1. The terms containing the logarithms


are due to the FFT operations that are used. The complexities of the FFTs have been taken from [35]. The term (P^2 - P)/N_2 can often be neglected because P ≪ N_2. A typical example is P = 3, N_1 = N_2 = 128, N = 800, leading to 1654 flops per sample (about half the complexity of FAP). An algorithm description is provided in Algorithm 10.

4.2 Block Exact APA (BEAPA)

In this section a fast implementation of APA, the Block Exact Affine Projection Algorithm, is derived, based on [59]. In FAP (and in BEFAP), the calculation of the lower P - 1 components of the error vector was based upon an approximation (2.27), while only the first component is really computed. In APA all components of the residual vector e_k are computed in each step instead of only the first one. We describe a method that does not require a full filtering operation for each of the P equations. The complexity of the new algorithm is

(6 M_1 log_2 M_1 - 7 M_1 - 31)/N_1 + 6 N_2 + 15 P - 5 + 11 P^2 + (P^2 - P)/N_2 + (6 M_2 log_2 M_2 - 7 M_2 - 31)/N_2.   (4.9)

This formula shows that even though the full error vector is calculated, the required number of flops is much smaller than for P full filtering operations. For the example of section 4.1, P = 3, N_1 = N_2 = 128, N = 800, this leads to 1662 flops. (The difference with the BEFAP algorithm becomes slightly bigger when P is larger.) Figure 4.1 is a schematic representation of the final algorithm.

4.2.1 Principle

In the FAP algorithm [26], only the first component of the error vector is calculated, and the others are approximated. Block Exact APA will be derived here along the lines of BEFAP, but such that all error vector components are calculated in each step. When k denotes the sample index corresponding to the beginning of a block, and i (1..N_2) an index inside the block, we have

y(k + i) = X_P^T(k + i) w(k + i - 1),
e(k + i) = d_P(k + i) - y(k + i),
g(k + i) = (X^T(k + i) X(k + i))^{-1} e(k + i),   (4.10)


Figure 4.1: A schematic representation of the BEAPA algorithm in an echo canceller setup. Bold lines are vectors. A box with an arrow inside is a buffer that outputs a vector.

w(k + i - 1) = w(k - 1) + Σ_{j=1}^{i} µ X_P(k + i - j) g(k + i - j).   (4.11)

In these expressions, d_P(k + i) is the desired signal, e(k + i) is a vector with the a priori errors of the P equations in this step, and y(k + i) is the vector with the outputs of the filter for these P equations. We again propose to use QR-updating and downdating (or, in case of exponential weighting as regularization, updating only, see chapter 3) to keep track of R(k), the Cholesky factor of X^T(k)X(k), and to use this triangular matrix in order to calculate (4.10) with quadratic complexity. From equation 4.11 it can be seen that, in a similar way as was done for BEFAP, we can write

w(k + i - 1) = w(k - 1) + Σ_{j=1}^{i+P-1} s_j(k + i - 1) x(k + i - j) - Σ_{j=1}^{P-1} s_j(k - 1) x(k - j),

where w(k - 1) is our basis filter vector, and where the vector s(k + i - 1) is recursively obtained from s(k + i - 2). At the beginning of each block, the size of the vector s(k) is reset. The recursion is

s(k) = [ 0 ; s(k - 1)|_1^{P-1} ] + µ g(k),   i = 1,   (4.12)

s(k + i - 1) = [ 0 ; s(k + i - 2) ] + µ [ g(k + i - 1) ; 0_{i-1} ],   i > 1.   (4.13)
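The growth of the vector s inside a block, as described by (4.12) and (4.13), can be illustrated with a small sketch (hypothetical sizes and random gains, not part of the derivation):

```python
import numpy as np

P, mu = 3, 0.5
rng = np.random.default_rng(0)

# Block start, (4.12): shift in a zero, keep only the first P-1 old entries
s_prev = rng.standard_normal(P + 4)                  # s(k-1) left over from the previous block
s = np.concatenate(([0.0], s_prev[:P - 1])) + mu * rng.standard_normal(P)

# Inside the block, (4.13): the vector grows by one element per step
for i in range(2, 6):
    g = rng.standard_normal(P)                       # APA gain vector of this step
    s = np.concatenate(([0.0], s)) + mu * np.concatenate((g, np.zeros(i - 1)))
    assert len(s) == i + P - 1                       # s(k+i-1) has length i+P-1
```

Only the first P entries are modified by the µg term in each step; the remaining entries are merely shifted, which is exploited in the complexity reduction of section 4.2.2.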


The filter outputs y_α(k + i) can now be written as

y_α(k + i) = x^T(k + i - (α-1)) w(k + i - 1)
           = x^T(k + i - (α-1)) w(k - 1) + Σ_{j=1}^{i+P-1} s_j(k + i - 1) x^T(k + i - (α-1)) x(k + i - j) - Σ_{j=1}^{P-1} s_j(k - 1) x^T(k + i - (α-1)) x(k - j)
           = x^T(k + i - (α-1)) w(k - 1) + Σ_{j=1}^{i+P-1} s_j(k + i - 1) r_j^α(k + i) - Σ_{j=1}^{P-1} s_j(k - 1) r_{i+j}^α(k + i).

The correlations are defined as

r_j^α(k + i) ≡ x^T(k + i - (α-1)) x(k + i - j),

which is merely a shorthand notation for r_{j-(α-1)}(k + i - (α-1)) as defined in formula 4.3.

We proceed (similarly to (4.4)) by defining a modified basis filter vector

z(k - 1) = w(k - 1) - Σ_{j=1}^{P-1} s_j(k - 1) x(k - j),

then the filter output can be written as

y_α(k + i) = x^T(k + i - (α-1)) z(k - 1) + s^T(k + i - 1) r^α(k + i),   (4.14)

in which both s(k + i - 1) and r^α(k + i) are vectors of length i + P - 1. The autocorrelation vector r^α(k + i) is needed to calculate the correction for the α'th component of y(k + i) (which is needed in turn to calculate the α'th component of the residual vector e(k + i)). The first term of this equation is a filtering operation with a filter vector z(k - 1) that is fixed over a block of size N_2 and that is independent of α, so it can be performed efficiently in the frequency domain. The second term grows inside each block. A recursion for the new filter vector can also be derived along the lines of [59], which gives

z(k + N_2 - 1) = z(k - 1) + X(k + N_2 - P) s(k + N_2 - 1)|_P^{P+N_2-1},   (4.15)

where s|_i^j is a sub-vector consisting of the i'th through j'th elements of s. The matrix X(k) has been defined in (4.6). Here too, the matrix-vector product can be calculated in the frequency domain with fast convolution techniques (block size N_2).
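The identity (4.14) can be verified numerically: with the basis filter vector z absorbing the s(k-1) terms, filtering with z plus the correlation correction must equal direct filtering with the current weight vector. The sketch below uses small, hypothetical dimensions and random data; it is an illustration of the algebra, not an implementation of BEAPA:

```python
import numpy as np

rng = np.random.default_rng(1)
L, P, i = 16, 3, 4                       # filter length, projection order, index in block
u = rng.standard_normal(200)
k = 100

def x(t):                                # input vector x(t) = [u(t), ..., u(t-L+1)]
    return u[t - L + 1:t + 1][::-1]

w_base = rng.standard_normal(L)          # basis filter vector w(k-1)
s_old = rng.standard_normal(P - 1)       # s(k-1), only its first P-1 entries enter here
s = rng.standard_normal(i + P - 1)       # s(k+i-1)

# Current weight vector written as basis vector plus rank-one corrections
w_cur = (w_base
         + sum(s[j - 1] * x(k + i - j) for j in range(1, i + P))
         - sum(s_old[j - 1] * x(k - j) for j in range(1, P)))

# Modified basis filter vector z(k-1) absorbs the s(k-1) terms
z = w_base - sum(s_old[j - 1] * x(k - j) for j in range(1, P))

for alpha in range(1, P + 1):
    r = np.array([x(k + i - (alpha - 1)) @ x(k + i - j) for j in range(1, i + P)])
    y_direct = x(k + i - (alpha - 1)) @ w_cur        # left-hand side of (4.14)
    y_fast = x(k + i - (alpha - 1)) @ z + s @ r      # right-hand side of (4.14)
    assert np.isclose(y_direct, y_fast)
```

The check passes for every α, confirming that the third summation of the y_α expansion is indeed absorbed by z(k - 1).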


4.2.2 Complexity reduction

The calculation of all the components of the error vector seems to make the algorithm P times as complex. However, some important simplifications can be introduced. Writing out the correlation vectors used in the corrections in (4.14) (an example is given for a more general case further on in this chapter), one notices that a recursion exists for them:

r_β^α(k + i) = r_{2-(β-1)}^{α-1}(k + i - 1)   for β ≤ 2, α > 1,   (4.16)

r_β^α(k + i) = r_{β-1}^{α-1}(k + i - 1)   for β ≥ 2, α > 1.   (4.17)
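The shift recursion (4.17) follows directly from the definition of r_j^α and can be checked on a toy signal (hypothetical dimensions, random data):

```python
import numpy as np

rng = np.random.default_rng(2)
L = 8
u = rng.standard_normal(100)

def x(t):                                # x(t) = [u(t), ..., u(t-L+1)]
    return u[t - L + 1:t + 1][::-1]

def r(alpha, beta, t):                   # r^alpha_beta(t) = x(t-(alpha-1))^T x(t-beta)
    return x(t - (alpha - 1)) @ x(t - beta)

t = 50
for alpha in range(2, 4):
    for beta in range(2, 6):             # (4.17): shift both indices by one, step back in time
        assert np.isclose(r(alpha, beta, t), r(alpha - 1, beta - 1, t - 1))
```

Both sides reduce to the same inner product x(t - α + 1)^T x(t - β), so no new correlations need to be computed for α > 1 as long as the old ones are kept in delay lines.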

Since memory may be as expensive as processing power, this recursion appears not to offer any advantage, because N_2 + P delay lines would be needed. But we can build on this recursion to achieve a major complexity reduction. The update recursion for s(k + i) (equation (4.13)) shows that a shift operation is applied to this vector (which grows) in each step, and only the first P elements change after this shift. In the calculation of the error vector, s is multiplied with each of the P vectors r^α. This means that part of s^T(k + i - 1) r^α(k + i), a scalar, has already been calculated in step k + i - 1 (namely s^T(k + i - 2) r^{α-1}(k + i - 1)), and we can calculate a correction s̄(k + i - 1) to this that consists of the accumulated updates to s(k + i - 2), multiplied by the relevant first part of the correlation vector:

[ s̄(k + i - 1) ; 0_P ] = s(k + i - 1) - [ 0 ; s(k + i - 2) ].   (4.18)

The (fixed) length of s̄(k + i - 1) is P. For α > 1, equations 4.17 and 4.18 lead to

s^T(k + i - 1) r^α(k + i)
  = [ s̄(k + i - 1) ; 0_P ]^T r^α(k + i) + [ 0 ; s(k + i - 2) ]^T r^α(k + i)
  = s̄^T(k + i - 1) r̄^α(k + i) + s^T(k + i - 2) [ r_2^α(k + i) ; r_3^α(k + i) ; ... ; r_{i+P-1}^α(k + i) ]
  = s̄^T(k + i - 1) r̄^α(k + i) + s^T(k + i - 2) [ r_1^{α-1}(k + i - 1) ; r_2^{α-1}(k + i - 1) ; ... ; r_{i+P-2}^{α-1}(k + i - 1) ].   (4.19)


The vector r̄^α(k + i) is formed from the first P components of r^α(k + i). The last term of equation 4.19 (a scalar) has already been calculated in step k + i - 1. So in each step, one needs to calculate a 'large' vector product for the correction to the first component of the error vector (with the same size as in the BEFAP algorithm), and P - 1 small vector products (size P + 1), since the second term from 4.19 can be fed through a delay line. For the calculation of the corrections for the first error vector component, this gives an average of (P + N_2 - 1)(P + N_2)/N_2 flops per sample, where N_2 is the block size. To this, (P + 1) flops per sample for each of the P - 1 remaining components of the error vector must be added. For typical (small) values of P, this is a lot less complex than calculating all the components straightforwardly.

In this setting, one is free to choose whether the r^α(k + i) are taken from previous steps as described in equations 4.16 and 4.17, or whether they are recalculated at the time they are needed (by up- and downdates). The latter requires less memory (for delay lines), but more flops.

In the acoustic echo cancellation application, scenarios with multiple microphone setups often occur. Instead of merely repeating the full AEC scheme for each microphone, the updates for the inverse correlation matrix and the updates of the correction correlation vectors can be shared among the different microphone channels.

We have now derived an algorithm that is an exact frequency domain version of the original affine projection algorithm. If QR-updating is used to keep track of the correlation matrix, exponential weighting can be used to incorporate regularization. FTF-type algorithms can also be used to this end, and then explicit regularization is possible as well. The fact that no approximations are made is, referring to the results in the previous chapter, clearly an advantage compared to FAP and BEFAP.

4.2.3 Algorithm specification

An algorithm description of BEAPA can be found in Algorithm 11. We let u = [x(1), x(2), ...]^T be the input signal and v = [d(1), d(2), ...]^T the desired signal; in both of these, the order of the samples differs from the definitions of x and d. A right-to-left arrow above a vector flips the order of the components.

4.3 Sparse Block Exact APA

In this section a sparse-equations version of Block Exact APA (Sparse Block Exact APA) is derived, where the 'sparse equations' technique of section 3.2 is used for regularization. The complexity of the new algorithm is


Algorithm 11 Block Exact Affine Projection Algorithm

    for j = 1 to N_2 + P
        r^1_j = (u|_{k-L+1}^{k})^T u|_{k-j-L}^{k-j-1}
    endfor
    for α = 2 to P
        for j = 1 to 1 + P
            r^α_j = (u|_{k-L-(α-1)+1}^{k-(α-1)})^T u|_{k-j-L}^{k-j-1}
        endfor
    endfor
    loop
        for i = 0 to N_2 - 1
            if (i modulo N_1 == 0)
                <fill next part of y with convolution of next block of N_1 samples from u with z>
            endif
            r^1 = r^1 + u_{k+i+1} flip(u|_{k+i+1-N_2-P+1}^{k+i+1}) - u_{k-L+i+1} flip(u|_{k-L+i+1-N_2-P+1}^{k-L+i+1})
            for α = 2 to P
                r^α = r^α + u_{k+i-(α-1)+1} flip(u|_{k+i+1-(1+P)+1}^{k+i+1}) - u_{k-L+i-(α-1)+1} flip(u|_{k-L+i+1-(1+P)+1}^{k-L+i+1})
            endfor
            A^1_{k+i+1+1} = (s|_1^{P+i-1})^T r^1|_2^{i+P}
            e_{k+i+1} = d_{k+i+1} - y_{k+i+1} - A^1_{k+i+1+1}
            E_1 = e_{k+i+1}
            for α = 2 to P
                A^α_{k+i+1+1} = A^{α-1}_{k+i+1} + (s̄)^T r^α|_2^{1+P}
                E_α = v_{k+i-α+1} - y(k + i - α + 1) - A^α_{k+i+1+1}
            endfor
            <update S^{-1}, the P x P inverse covariance matrix>
            g = S^{-1} E
            s = [ 0 ; s|_1^{N_2+P-2} ] + [ µg ; 0_{N_2-1} ]
        endfor
        z = z + <convolution of s|_P^{P+N_2-1} with u|_{k-P+N_2+1-L+1}^{k-P+N_2+1}>
        k = k + N_2
    endloop

(flip(·) denotes reversing the order of the components of a vector.)

(6 M_1 log_2 M_1 - 7 M_1 - 31)/N_1 + 6 N_2 + 13 P + 2 P D - 4 + 10 P^2 + (P^2 D^2 - P D)/N_2 + P^2 D - D + (6 M_2 log_2 M_2 - 7 M_2 - 31)/N_2.   (4.20)

A typical example (see section 4.2) is P = 3, N_1 = N_2 = 128, N = 800, D = 3 or 5, leading to 1690 or 1718 flops per sample.
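Expression (4.20) can be evaluated in the same way as (4.8); the sketch below (illustrative only, function name ours) reproduces the D = 3 figure quoted in the text:

```python
import math

def sparse_beapa_flops(P, N1, N2, N, D):
    """Flops per sample of Sparse BEAPA according to (4.20), with M_i = N + N_i - 1."""
    M1 = N + N1 - 1
    M2 = N + N2 - 1
    fft1 = (6 * M1 * math.log2(M1) - 7 * M1 - 31) / N1
    fft2 = (6 * M2 * math.log2(M2) - 7 * M2 - 31) / N2
    return (fft1 + 6 * N2 + 13 * P + 2 * P * D - 4 + 10 * P**2
            + (P**2 * D**2 - P * D) / N2 + P**2 * D - D + fft2)

print(round(sparse_beapa_flops(P=3, N1=128, N2=128, N=800, D=3)))  # -> 1690
```

The extra cost relative to BEAPA grows only with the small terms in P and D, so the sparse regularization is obtained almost for free.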


4.3.1 Derivation

Since the Sparse BEAPA algorithm is derived in a manner very similar to BEAPA, we only briefly state the derivation. The update equations are:

y(k + i) = X_P^T(k + i) w(k + i - 1),
e(k + i) = d(k + i) - y(k + i),
g(k + i) = (X_P^T(k + i) X_P(k + i))^{-1} e(k + i),
w(k + i - 1) = w(k - 1) + Σ_{j=1}^{i} µ X_P(k + i - j) g(k + i - j).   (4.21)

We can rewrite

w(k + i - 1) = w(k - 1) + Σ_{j=1}^{i+PD-1} s_j(k + i - 1) x(k + i - j) - Σ_{j=1}^{PD-1} s_j(k - 1) x(k - j).

A recursion for s(k) can be written:

s(k) = [ 0 ; s(k - 1)|_1^{PD-1} ] + µ g'(k),   (4.22)

s(k + i - 1) = [ 0 ; s(k + i - 2) ] + µ [ g'(k + i - 1) ; 0_{i-1} ],   (4.23)

with

g'(k + i - 1) = [ g_1(k + i - 1) ; 0_{D-1} ; g_2(k + i - 1) ; 0_{D-1} ; ... ; g_P(k + i - 1) ; 0_{D-1} ].


The filter outputs y_α(k + i) can now be written as

y_α(k + i) = x^T(k + i - (α-1)D) w(k + i - 1)
           = x^T(k + i - (α-1)D) w(k - 1) + Σ_{j=1}^{i+PD-1} s_j(k + i - 1) x^T(k + i - (α-1)D) x(k + i - j) - Σ_{j=1}^{PD-1} s_j(k - 1) x^T(k + i - (α-1)D) x(k - j)
           = x^T(k + i - (α-1)D) w(k - 1) + Σ_{j=1}^{i+PD-1} s_j(k + i - 1) r_j^α(k + i) - Σ_{j=1}^{PD-1} s_j(k - 1) r_{i+j}^α(k + i).

The correlations are defined as

r_j^α(k + i) ≡ x^T(k + i - (α-1)D) x(k + i - j).

The modified filter vector is

z(k - 1) = w(k - 1) - Σ_{j=1}^{PD-1} s_j(k - 1) x(k - j).

Then the filter output can be written as

y_α(k + i) = x^T(k + i - (α-1)D) z(k - 1) + s^T(k + i - 1) r^α(k + i),   (4.24)

in which both s(k + i - 1) and r^α(k + i) are vectors of length i + PD - 1. A recursion for the filter vector is

z(k + N_2 - 1) = z(k - 1) + X(k + N_2 - PD) s(k + N_2 - 1)|_{PD}^{PD+N_2-1}.   (4.25)

The matrix X(k) has been defined in 4.6. Note that it contains all input vectors, not only one out of D. Again, the matrix-vector product can be calculated in the frequency domain with fast convolution techniques (block size N_2).

4.3.2 Complexity reduction

Just as in the BEAPA algorithm, a recursion exists for the correlation vectors used in the corrections in 4.24. Take e.g. D = 2:

r^1(k + i - 2) =   (4.26)

[ x(k+i-2)x(k+i-3) + x(k+i-3)x(k+i-4) + ... + x(k+i-L-1)x(k+i-L-2) ;
  x(k+i-2)x(k+i-4) + x(k+i-3)x(k+i-5) + ... + x(k+i-L-1)x(k+i-L-3) ;
  ... ;
  x(k+i-2)x(k-PD+1) + x(k+i-3)x(k-PD) + ... + x(k+i-L-1)x(k-L-PD+2) ]

r^2(k + i) =   (4.27)

[ x(k+i-D)x(k+i-1) + x(k+i-D-1)x(k+i-2) + ... + x(k+i-D-L+1)x(k+i-L) ;
  x(k+i-D)x(k+i-2) + x(k+i-D-1)x(k+i-3) + ... + x(k+i-D-L+1)x(k+i-L-1) ;
  x(k+i-D)x(k+i-3) + x(k+i-D-1)x(k+i-4) + ... + x(k+i-D-L+1)x(k+i-L-2) ;
  ... ;
  x(k+i-D)x(k-PD+1) + ... + x(k+i-D-L+1)x(k-L-PD+2) ]

For this example, one can see that the third through last components of 4.27, which form the autocorrelation vector needed to calculate the second component of the residual vector at time k + i, have already been calculated 2 time steps before, as the first rows of 4.26, the (smaller) autocorrelation vector needed to calculate the first component of the residual vector at time k + i - 2. Generalizing the above example, we can say that all the components of r^α(k + i) starting from the component with index D + 1 are already available in r^{α-1}(k + i - D). To interpret this, one has to bear in mind that the vectors r^α(k + i) grow in length with i. Writing out also r^1(k + i - 1) and r^1(k + i) would show that the components of r^α(k + i) with indices from 1 to D are also already calculated in previous steps. This can be summarized as

r_β^α(k + i) = r_{(D+1)-(β-1)}^{α-1}(k + i - D)   for β ≤ D + 1, α > 1,   (4.28)

r_β^α(k + i) = r_{β-D}^{α-1}(k + i - D)   for β ≥ D + 1, α > 1.   (4.29)
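The sparse-case recursion (4.29) can be checked on a toy signal in the same way as (4.17), now shifting the lag index by D and stepping D samples back in time (hypothetical dimensions, random data):

```python
import numpy as np

rng = np.random.default_rng(3)
L, D = 8, 2
u = rng.standard_normal(100)

def x(t):                                # x(t) = [u(t), ..., u(t-L+1)]
    return u[t - L + 1:t + 1][::-1]

def r(alpha, beta, t):                   # r^alpha_beta(t) = x(t-(alpha-1)D)^T x(t-beta)
    return x(t - (alpha - 1) * D) @ x(t - beta)

t = 50
for alpha in range(2, 4):
    for beta in range(D + 1, D + 6):     # (4.29): beta >= D+1
        assert np.isclose(r(alpha, beta, t), r(alpha - 1, beta - D, t - D))
```

Both sides reduce to x(t - (α-1)D)^T x(t - β), so for α > 1 only the first D components of each correction vector need to be computed anew.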

Instead of using N_2 + PD delay lines of length D, we can again reduce this to P scalar delay lines of length D. For s(k + i) only the first PD elements change after the shift operation in each time step:

[ s̄(k + i - 1) ; 0_{PD} ] = s(k + i - 1) - [ 0_D ; s(k + i - D - 1) ].   (4.30)

The fixed length of s̄(k + i - 1) is D + PD - 1. For α > 1, equations 4.29 and 4.30 lead to


s^T(k + i - 1) r^α(k + i)
  = [ s̄(k + i - 1) ; 0_{PD} ]^T r^α(k + i) + [ 0_D ; s(k + i - D - 1) ]^T r^α(k + i)
  = s̄^T(k + i - 1) r̄^α(k + i) + s^T(k + i - D - 1) [ r_{D+1}^α(k + i) ; r_{D+2}^α(k + i) ; ... ; r_{i+PD-1}^α(k + i) ]
  = s̄^T(k + i - 1) r̄^α(k + i) + s^T(k + i - D - 1) [ r_1^{α-1}(k + i - D) ; r_2^{α-1}(k + i - D) ; ... ; r_{i+PD-D-1}^{α-1}(k + i - D) ].   (4.31)

The vector r̄^α(k + i) is formed from the first D + PD - 1 components of r^α(k + i). The last term of equation 4.31 (a scalar) has already been calculated in step k + i - D. Hence in each step, one needs to calculate a 'large' vector product for the correction to the first component of the error vector (with the same size as in the BEFAP algorithm), and P - 1 small vector products (size PD + D). For the calculation of the corrections for the first error vector component, this gives an average of (PD + N_2 - 1)(PD + N_2)/N_2 flops per sample, where N_2 is the block size as in the BEFAP algorithm. To this, (PD + D) flops per sample for each of the P - 1 remaining components of the error vector must be added.

4.3.3 Algorithm specification

A complete specification is given in Algorithm 12. Again u = [x(1), x(2), ...]^T is the input signal and v = [d(1), d(2), ...]^T the desired signal; these definitions differ from the definitions of x and d. A right-to-left arrow above a vector flips the order of the components.


Algorithm 12 Sparse Block Exact APA

    for j = 1 to N_2 + PD
        r^1_j = (u|_{k-L+1}^{k})^T u|_{k-j-L}^{k-j-1}
    endfor
    for α = 2 to P
        for j = 1 to D + PD
            r^α_j = (u|_{k-L-(α-1)D+1}^{k-(α-1)D})^T u|_{k-j-L}^{k-j-1}
        endfor
    endfor
    loop
        for i = 0 to N_2 - 1
            if (i modulo N_1 == 0)
                <fill next part of y with convolution of next block of N_1 samples from u with z>
            endif
            r^1 = r^1 + u_{k+i+1} flip(u|_{k+i+1-N_2-PD+1}^{k+i+1}) - u_{k-L+i+1} flip(u|_{k-L+i+1-N_2-PD+1}^{k-L+i+1})
            for α = 2 to P
                r^α = r^α + u_{k+i-(α-1)D+1} flip(u|_{k+i+1-(D+PD)+1}^{k+i+1}) - u_{k-L+i-(α-1)D+1} flip(u|_{k-L+i+1-(D+PD)+1}^{k-L+i+1})
            endfor
            A^1_{k+i+1+D} = (s|_1^{PD+i-1})^T r^1|_2^{i+PD}
            e_{k+i+1} = d_{k+i+1} - y_{k+i+1} - A^1_{k+i+1+D}
            E_1 = e_{k+i+1}
            for α = 2 to P
                A^α_{k+i+1+D} = A^{α-1}_{k+i+1} + (s̄)^T r^α|_2^{D+PD}
                E_α = v_{k+i-αD+1} - y(k + i - αD + 1) - A^α_{k+i+1+D}
            endfor
            <update S^{-1}, the P x P inverse covariance matrix>
            <spread the result of S^{-1}E into g>
            for m = D downto 2
                s̄^m = [ 0 ; s̄^{m-1}|_1^{D+PD-2} ] + [ µg ; 0_{D-1} ]
            endfor
            s̄^1 = [ µg ; 0_{D-1} ]
            s = [ 0 ; s|_1^{N_2+PD-2} ] + [ µg ; 0_{N_2-1} ]
        endfor
        z = z + <convolution of s|_{PD}^{PD+N_2-1} with u|_{k-PD+N_2+1-L+1}^{k-PD+N_2+1}>
        k = k + N_2
    endloop

(flip(·) denotes reversing the order of the components of a vector.)

4.4 Conclusion

We have derived a block exact frequency domain version of the affine projection algorithm, named Block Exact APA (BEAPA). This algorithm has a complexity that is


comparable with the complexity of a block exact frequency domain version of fast affine projection (namely BEFAP), while it does not use the approximations that are present in (BE)FAP. It has the advantage that the convergence characteristics of the original affine projection algorithm are maintained when regularization is applied, which is not the case when FAP-based fast versions of APA are used.

This algorithm has also been extended to allow the 'sparse equations' technique for regularization to be used. This technique regularizes the affine projection algorithm when it is used with signals that have a large autocorrelation only for small lags (e.g. speech). It can be used as a stand-alone regularization technique if a voice activity detection device is present that can prevent the inverse correlation matrix from becoming infinitely large when no far end signal is present, as is the case in every echo canceller.


Chapter 5

QRD–RLS based ANC

While the previous chapters focused on reference-based noise reduction (AEC), meaning that a reference signal for the disturbances was available (namely the loudspeaker signal), in the next chapters we will concentrate on reference-less noise reduction (ANC).

In this chapter we will derive an MMSE-optimal unconstrained filtering technique with a complexity that is an order of magnitude smaller than the complexity of existing noise cancellation algorithms based upon this technique, while performance is kept at the same level.

The new algorithm is based upon a QRD-RLS adaptive filter. While conventional adaptive filtering algorithms have a 'desired signal' input, our algorithm does not require this desired signal (which is unknown for noise reduction applications). In the next chapter we will, by thoroughly modifying the basic equations and by employing the fast QRD-LSL algorithm, reduce the complexity by yet another order of magnitude.

This chapter is organized as follows. In section 5.2 we review unconstrained optimal filtering based ANC. We then introduce our novel approach based upon recursive QRD-based optimal filtering in section 5.3. In section 5.4 we introduce an algorithm that provides a trade-off parameter with which one can tune the system so that more noise reduction is obtained in exchange for some signal distortion. Finally, complexity figures and simulation results are given in sections 5.5 and 5.6.


5.1 Introduction

In teleconferencing, hands-free telephony or voice controlled systems, acoustic noise cancellation techniques are used to reduce the effect of unwanted disturbances (e.g. car noise, computer noise, background speakers). Single microphone approaches typically exploit the differences in the spectral content of the noise signal(s) and the speech signal(s) to enhance the input signal. Since a speech signal is highly non-stationary, this may result in a rapidly changing filter being applied to the noisy speech signal. A residual noise signal with continuously changing characteristics, or even 'musical noise', will typically appear at the output. A classic example of this class of algorithms is spectral subtraction [16].

Multi-microphone techniques can additionally take into account spatial properties of the noise and speech sources. In general an adaptive signal processing technique is required, since the room characteristics change even with the slightest change in the geometry of the environment. In the literature various adaptive multi-microphone ANC techniques have been described, all of them having their advantages and disadvantages. Griffiths-Jim beamforming [60] is a constrained optimal filtering method that aims at adaptively steering a null in the direction of the noise source(s) (or 'jammer(s)'), while keeping a distortionless response in the direction of the speech source.

Unconstrained optimal filtering [12][13] is an alternative approach that also takes into account both spectral and spatial information. Unlike Griffiths-Jim beamforming it does not rely on a priori information and hence possesses improved robustness [14].

A speech/noise detection algorithm is needed and crucial for proper operation, but it will not be further investigated here; it will be assumed that a perfect speech detection signal is available. In chapter 1 references to methods for speech/noise detection are given.

We note that when the algorithm classifies a 'speech' period as a 'noise only' period (for longer time periods), this will result in signal distortion, since signal then 'leaks' into the noise correlation matrix. This is probably a worse situation than when a noise period is classified as speech. In that case only the estimation of the noise characteristics is suspended at that time, and the (spatial) statistics of the speech signal are 'forgotten', but this is not really a problem when the misclassification only occurs during a short period.

In an unconstrained optimal filtering approach, the microphone signals are fed to an adaptive filter. The optimal filter attempts to use all available information (also reflections coming from directions other than the speech source direction) in order to optimally reconstruct the signal of interest. This effectively means that the spatial pattern of the filter will resemble a beamforming pattern if the reverberation in the room is low, but the filter will perform better than conventional beamformers under higher reverberation conditions [12].


In [12][13][15] the unconstrained optimal filtering problem was solved by means of a GSVD (Generalised Singular Value Decomposition) approach, while in this chapter we will describe a QRD-based optimal filtering algorithm.

While the performance remains roughly the same, the QR-decomposition based algorithm is significantly less complex than the GSVD-based algorithm. The GSVD approach has a complexity of O(M^3 N^3), where M is the number of microphones and N is the number of filter taps per microphone channel. A reduced complexity approximation is possible for the GSVD approach (based on GSVD tracking), leading to O(27.5 M^2 N^2) [13]. The QRD-based approach that we will derive in this chapter lowers the complexity to O(3.5 M^2 N^2), while the performance is equal to that of the initial GSVD approach, and no approximation whatsoever is employed.

Figure 5.1: Adaptive optimal filtering in the acoustic noise cancellation context.

5.2 Unconstrained optimal filtering based ANC

A typical noise cancellation setup is shown schematically in Figure 5.1 for an array with 4 microphones. A speaker's voice is picked up by the microphone array, together with noise stemming from sources for which no reference signal is available. Examples are computer fans, air conditioning, and other people talking to each other in the background. The absence of a reference signal is the main difference between the ANC techniques we will discuss in this part of the text and the AEC techniques in chapters 3 and 4.

The speech signal s(k) in figure 5.1 is obviously unknown. If we were to design a filter that optimally reconstructs s(k) as the desired signal, then this filter would not only have to cancel the noise in the microphone signals, but it would also have to model the inverse of the acoustic impulse response from the speech source position to the microphone positions. We want to avoid this, since dereverberation is a different problem that requires different techniques. Hence, we will not use the speech signal itself as the desired signal, but rather the speech component in one (or each) of the microphone signals, which obviously is unknown too.

The speech component in the i'th microphone at time k is

d_i(k) = h_i(k) ⊗ s(k),   i = 1 ... M,

where M is the number of microphones, s(k) is the speech signal, h_i(k) represents the room impulse response from the speech source to microphone i, and ⊗ is the convolution symbol. The i'th microphone signal is

x_i(k) = d_i(k) + v_i(k),   i = 1 ... M,

where v_i(k) is the noise component (the sum of the contributions of all noise sources at microphone i). We define the filter input vector as in (2.4), which is repeated here for convenience:

x(k) = [ x_1(k) ; x_2(k) ; ... ; x_M(k) ],   x_1(k) = [ x_1(k) ; x_1(k-1) ; ... ; x_1(k-N+1) ].
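The stacking of the M channel vectors of N taps each into one MN-dimensional input vector can be sketched as follows (toy data; the helper name is ours):

```python
import numpy as np

M, N = 4, 3                              # microphones, taps per microphone channel
mics = np.arange(M * 10, dtype=float).reshape(M, 10)   # toy multichannel recording

def input_vector(k):
    """Stack the last N samples of every microphone into one MN-dimensional x(k)."""
    return np.concatenate([mics[m, k - N + 1:k + 1][::-1] for m in range(M)])

xk = input_vector(5)
assert xk.shape == (M * N,)
assert np.array_equal(xk[:N], [5.0, 4.0, 3.0])   # channel 1 block: x1(k), x1(k-1), x1(k-2)
```

Each channel contributes its N most recent samples in time-reversed order, so the optimal filter operates jointly on spatial (across microphones) and temporal (across taps) information.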

The noise vector v(k) and the speech component signal vector d(k) are defined in a similar way, with x(k) = d(k) + v(k). The following assumptions are made:

• The noise signal is uncorrelated with the speech signal. This results in

  ε{x(k) x^T(k)} = ε{v(k) v^T(k)} + ε{cross terms} + ε{d(k) d^T(k)}
                 = ε{v(k) v^T(k)} + ε{d(k) d^T(k)},

  since the cross terms are zero, hence

  ε{d(k) d^T(k)} = ε{x(k) x^T(k)} - ε{v(k) v^T(k)}.

  Here ε{·} is the expectation operator.

• The noise signal is stationary as compared to the speech signal (by which we mean that its statistics change more slowly). This assumption allows us to estimate ε{v(k) v^T(k)} during periods in which only noise is present, i.e. ε{v(k) v^T(k)} ≅ ε{v(k-∆) v^T(k-∆)}, with x(k) = v(k) during noise-only periods.

The unconstrained optimal filtering (Wiener filtering) problem is then given as

min_{W_wf(k)} ε{ || x^T(k) W_wf(k) - d^T(k) ||_2^2 },   (5.1)


Note that for the time being we compute the optimal filter to estimate the speech in all (delayed) microphone signals (cf. the definition of d(k)). The Wiener solution for the optimal filtering problem, with x(k) the filter input and d(k) the (unknown) desired filter output, is then given as [30]

W_wf(k) = (ε{x(k) x^T(k)})^{-1} ε{x(k) d^T(k)}   (5.2)
        = (ε{x(k) x^T(k)})^{-1} ε{(d(k) + v(k)) d^T(k)}
        = (ε{x(k) x^T(k)})^{-1} ε{d(k) d^T(k)}
        = (ε{x(k) x^T(k)})^{-1} (ε{x(k) x^T(k)} - ε{v(k) v^T(k)}).

If all statistical quantities in the above formula were available, W_wf(k) could straightforwardly be computed with O(M^3 N^3) complexity. Each column of W_wf(k) provides the optimal MN-taps filter for estimating the corresponding element of d(k) from x(k), i.e.

d^T(k) = x^T(k) W_wf(k).   (5.3)
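The last line of (5.2) can be illustrated with a toy simulation (hypothetical model and dimensions, ours, not from the thesis): estimate the two correlation matrices from data, form the filter, and check that the filtered signal is closer to the speech component than the raw microphone signal:

```python
import numpy as np

rng = np.random.default_rng(4)
dim, T = 4, 50000                        # stacked dimension MN, number of snapshots

# Toy model: spatially correlated 'speech component' d plus independent noise v
A = rng.standard_normal((dim, dim)) / dim**0.5
d = rng.standard_normal((T, dim)) @ A    # desired component (rows are d^T(k))
v = 0.8 * rng.standard_normal((T, dim))  # noise, uncorrelated with d
xs = d + v                               # microphone data (rows are x^T(k))

Rxx = xs.T @ xs / T                      # estimate of eps{x x^T}
Rvv = v.T @ v / T                        # estimate of eps{v v^T} (noise-only periods)
W = np.linalg.solve(Rxx, Rxx - Rvv)      # last line of (5.2)

mse_raw = np.mean((xs - d) ** 2)         # taking x itself as the estimate of d
mse_wf = np.mean((xs @ W - d) ** 2)      # Wiener-filtered estimate, cf. (5.3)
assert mse_wf < mse_raw
```

Because the cross terms between d and v vanish in expectation, Rxx - Rvv is a valid estimate of the unobservable speech correlation matrix, which is exactly what makes the Wiener filter computable without access to d.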

In [13] a GSVD approach to this optimal filtering problem is described. The GSVD approach is based upon the joint diagonalisation

ε{x(k) x^T(k)} = E(k) diag{σ_i^2(k)} E^T(k),   (5.4)
ε{v(k) v^T(k)} = E(k) diag{η_i^2(k)} E^T(k),

which is then actually calculated by means of a GSVD of the data matrices (see also section 5.3). From 5.4 we get

W_wf(k) = E^{-T}(k) diag{ (σ_i^2(k) - η_i^2(k)) / σ_i^2(k) } E^T(k).

In the GSVD algorithm, only afterwards is one column of W_wf picked to serve as a filter vector. A full GSVD approach would have O(M^3 N^3) complexity, but in practice the GSVD solution is tracked or updated (this involves an approximation), which means that the filter can be tracked in O(27.5 M^2 N^2) flops per sample.

5.3 QRD–based algorithm

In this chapter we will present an alternative QRD-updating based approach that leads to comparable performance (or even improved performance, since it does not need an SVD-tracking approximation to reduce complexity), but at a significantly lower cost.

In the QRD approach we can select the single entry of d^T(k) that we want to estimate before we compute the optimal filter. The right hand side of the corresponding LS estimation problem will then be a vector instead of a matrix. This is the main reason for the dramatic complexity reduction, but besides this, QRD-updating in itself is of course cheaper than GSVD-updating. In order to maintain the parallel


between our approach and the GSVD–procedure, we will still consider the fulld(k)–vector throughout the derivation, keeping in mind that for a practical implementation,one would select only one element of it. As we want to track any changes in the acous-tic environment, we will introduce a weighting in order to reduce the impact of thecontributions from the past. Letλs denote the forgetting factor for the speech+noisedata, which can be different fromλn, the forgetting factor for the noise–only data. Aspeech/noise detection device will be necessary to operate the algorithm. Since thenoise is assumed to be stationary as compared to the speech contribution, one willoften make0� λs < λn < 1. Our scheme will be based on storing and updating anupper triangular matrixR(k), such thatRT (k)R(k) = XT (k)X(k), where we wantXT (k)X(k) to be an estimate forε{x(k)xT (k)}. This is realized by

    X^T(k+1) X(k+1) = λ_s² X^T(k) X(k) + (1 − λ_s²) x(k+1) x^T(k+1).        (5.5)

Note that this is a slightly different weighting scheme from the one explained in equation (2.15), the difference being merely an overall rescaling with (1 − λ_s²). The noise correlation matrix estimate is defined as

    V^T(k+1) V(k+1) = λ_n² V^T(k) V(k) + (1 − λ_n²) v(k+1) v^T(k+1).        (5.6)

The optimal filtering solution is then obtained as

    W_qr(k) = (R^T(k)R(k))^{-1} (R^T(k)R(k) − V^T(k)V(k))
            = I − R^{-1}(k) R^{-T}(k) P(k),    with P(k) ≡ V^T(k)V(k),

where R(k) is the Cholesky factor of X^T(k)X(k) and I is the identity matrix. Due to the second assumption in section 5.2 (namely that the noise is stationary), P(k) can be kept fixed during speech+noise periods and updated (based on formula (5.6)) during noise–only periods. R^T(k)R(k) is fixed during noise–only periods and updated (based on formula (5.5)) during speech+noise periods. Note that the computed W_qr(k) corresponds to the least squares estimation problem

    min_{W_qr(k)} ‖ D(k) − X(k) W_qr(k) ‖²₂,

where, however, D(k) = X(k) − V(k) is unknown.

Hence, W_qr(k) is a matrix whose columns are filters that reduce the noise components in the microphone signals in an optimal way. It is clear that W^N_qr(k) = I − W_qr(k) then provides a set of filters that optimally estimate the noise components in the microphone signals, with

    W^N_qr(k) = I − W_qr(k) = R^{-1}(k) R^{-T}(k) P(k) = R^{-1}(k) B(k),    where B(k) ≡ R^{-T}(k) P(k).


In the procedure described here, we keep track of both R(k) and B(k), so that at any time W^N_qr(k) can be computed by backsubstitution in

    R(k) W^N_qr(k) = B(k).        (5.7)

The only storage required is for the matrices R(k) ∈ ℝ^{MN×MN} and B(k) ∈ ℝ^{MN×MN}. In fact, only one column of B(k) has to be stored and updated (cfr. supra), thus providing a signal or noise estimate for the corresponding microphone signal. There are two modes in which the variables R(k) and B(k) have to be updated, namely speech+noise–mode and noise only–mode.

5.3.1 Speech+noise – mode

Whenever a signal segment is identified as a speech+noise–segment, P(k) is not updated (second assumption), but R(k) needs to be updated. The update formula for R(k) is (compare to (2.16))

    [    0     ]                [ √(1−λ_s²) x^T(k+1) ]
    [  R(k+1)  ]  =  Q̄^T(k+1)  [      λ_s R(k)      ] ,

where R(k+1) is again upper triangular¹. As explained in chapter 2, this update gives both the new upper triangular matrix R(k+1) and the orthogonal matrix Q̄^T(k+1) containing the necessary rotations to obtain the update. Updating R(k) also implies a change in the stored B(k) = R^{-T}(k)P(k). In order to derive this update, we need an expression for the update of R^{-1}(k). It is well known² that the same rotations used to update R(k) can also be used to update R^{-T}(k):

    Q̄^T(k+1) [         0          ]     [       ∗       ]
              [ (1/λ_s) R^{-T}(k)  ]  =  [  R^{-T}(k+1)  ] ,

with ∗ a don't care entry. Hence we have

    [    ∗     ]     [      ∗       ]
    [ B(k+1)   ]  =  [ R^{-T}(k+1)  ] P(k+1)

                  =  [      ∗       ]
                     [ R^{-T}(k+1)  ] P(k)

                  =  Q̄^T(k+1) [         0          ]
                               [ (1/λ_s) R^{-T}(k)  ] P(k)

                  =  Q̄^T(k+1) [        0        ]
                               [  (1/λ_s) B(k)   ] .

¹Q̄(k) = Q(k)Q(k−1)···Q(0) does not need to be stored.

²This is easily shown starting from

    [ 0   R^{-1}(k) ] [ x^T(k) ]
                      [  R(k)  ]  =  I.


Note that B(k) is weighted with 1/λ_s, which is different from the standard exponential weighting in the right hand side of QRD–based adaptive filtering. The complete update can be written in one single matrix update equation:

    [    0       r^T(k+1)  ]                [ √(1−λ_s²) x^T(k+1)        0        ]
    [  R(k+1)    B(k+1)    ]  =  Q̄^T(k+1)  [      λ_s R(k)       (1/λ_s) B(k)   ] .        (5.8)

The least squares solution W^N_qr(k+1) can now be computed by backsubstitution (equation (5.7)), but we will show later on that (using residual extraction) an estimate of the noise can be calculated directly from r(k+1). A signal flow graph of this updating procedure is given in Figure 5.2.
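The net effect of update (5.8) on the stored factor can be checked numerically: re-triangularizing the weighted new row stacked on λ_s R(k) reproduces recursion (5.5) exactly, since a QR step preserves the Gram matrix. A numpy sketch under illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, lam = 5, 0.99                  # n plays the role of MN (illustrative)
R0 = np.linalg.qr(rng.standard_normal((n, n)), mode='r')   # stand-in for R(k)
x = rng.standard_normal(n)

# Stack the weighted new row on top of lam*R(k) and re-triangularize,
# as the rotations Q^T(k+1) in (5.8) do implicitly.
stacked = np.vstack([np.sqrt(1 - lam**2) * x, lam * R0])
R1 = np.linalg.qr(stacked, mode='r')                       # R(k+1)

# R(k+1)^T R(k+1) obeys the covariance recursion (5.5):
print(np.allclose(R1.T @ R1, lam**2 * R0.T @ R0 + (1 - lam**2) * np.outer(x, x)))
```

In the real algorithm the re-triangularization is of course done with O((MN)²) Givens rotations rather than a full QR, and the same rotations are propagated to the right hand side.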

[Figure: systolic array of memory cells R11..R44 holding R(k) (left) and cells holding B(k) (right); inputs x_1(k+1), x_2(k+1) with delays enter at the top left, weights λ and 1/λ, outputs r_1(k+1)..r_3(k+1).]
Figure 5.2: Updating scheme for signal+noise mode. On the top left new input vectors enter (2 channels, and 2 taps per channel). Rotations are calculated and fed to the right hand side, which is updated with 0's as input.


5.3.2 Noise–only mode

In the noise–only case, one has to update

    B(k) = R^{-T}(k) P(k) = R^{-T}(k) V^T(k)V(k),

while R(k) is obviously kept fixed. From equation (5.6) and the fact that in noise–only mode R(k+1) = R(k), we find that

    B(k+1) = λ_n² B(k) + (R^{-T}(k+1) √(1−λ_n²) v(k+1)) (√(1−λ_n²) v^T(k+1)).

Given R(k+1), we can compute R^{-T}(k+1) √(1−λ_n²) v(k+1) by a backsubstitution, using an intermediate vector a(k+1):

    R^T(k+1) a(k+1) = √(1−λ_n²) v(k+1).

A simple multiplication a(k+1) √(1−λ_n²) v^T(k+1) now gives the update for all columns of B(k+1), i.e.

    B(k+1) = λ_n² B(k) + a(k+1) √(1−λ_n²) v^T(k+1).

As already mentioned, R(k) is not updated in this mode, so in Figure 5.2 only the framed black boxes (memory cells in the right hand part) are substituted with the corresponding elements of B(k+1).

Note again that, while in the GSVD–based method all columns of W_gsvd(k+1) are calculated and afterwards one of them is (arbitrarily) selected to provide one specific speech signal estimate, the QRD–based method allows one to choose one column (signal) beforehand, and do all computations for only that one column.

5.3.3 Residual extraction

From (2.24) and (5.8) it can be shown that if x(k+1) belongs to a signal+noise–period, the estimate for the noise components v(k+1) in the microphone signals can be written as

    v̂^T(k+1) = x^T(k+1) W^N_qr(k+1)
             = − ( 0 − √(1−λ_s²) x^T(k+1) W^N_qr(k+1) ) / √(1−λ_s²)
             = − ( ∏_{i=1}^{MN} cos θ_i(k+1) ) r^T(k+1) / √(1−λ_s²) ,

which means that an estimate of the noise component is obtained as a least squares residual with a 0 right hand side input. This is exactly the type of right hand side input applied in speech+noise mode updates (section 5.3.1). To obtain the signal estimates d(k+1), the noise estimates then have to be subtracted from the reference microphone signal:

    d̂^T(k+1) = x^T(k+1) (I − W^N_qr(k+1))
             = x^T(k+1) − v̂^T(k+1)
             = x^T(k+1) + ( ∏_{i=1}^{MN} cos θ_i(k+1) ) r^T(k+1) / √(1−λ_s²) .        (5.9)
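The whole mechanism can be sketched with explicit Givens rotations (illustrative dimensions, random stand-ins for R and B): the residual extracted from r(k+1) and the product of rotation cosines matches the estimate obtained by explicit backsubstitution of (5.7), which is the content of the residual-extraction formula above.

```python
import numpy as np

def qrd_update(R, B, x, lam):
    # Givens rotations annihilate the weighted new row against lam*R,
    # carrying B/lam along and accumulating the top-row RHS r and prod(cos).
    top = np.sqrt(1.0 - lam**2) * x.astype(float)
    R, B = lam * R.astype(float).copy(), B.astype(float).copy() / lam
    r, prod_cos = np.zeros(B.shape[1]), 1.0
    for i in range(len(x)):
        rho = np.hypot(R[i, i], top[i])
        c, s = R[i, i] / rho, top[i] / rho
        prod_cos *= c
        Ri, Bi = R[i, i:].copy(), B[i].copy()
        R[i, i:], top[i:] = c * Ri + s * top[i:], c * top[i:] - s * Ri
        B[i], r = c * Bi + s * r, c * r - s * Bi
    return R, B, r, prod_cos

rng = np.random.default_rng(5)
n, lam = 6, 0.98
R0 = np.linalg.qr(rng.standard_normal((n, n)), mode='r')   # stand-in for R(k)
B0 = rng.standard_normal((n, 1))                            # one stored column of B(k)
x = rng.standard_normal(n)
R1, B1, r, pc = qrd_update(R0, B0, x, lam)

# Residual extraction: noise estimate without any backsubstitution ...
v_resid = -pc * r / np.sqrt(1 - lam**2)
# ... agrees with the explicit filter W^N(k+1) solved from R(k+1) W^N = B(k+1):
WN = np.linalg.solve(R1, B1)
v_direct = x @ WN
print(np.allclose(v_resid, v_direct))
```

This is the classical a posteriori residual identity of QRD–RLS arrays; the O((MN)²) backsubstitution is replaced by the running product of rotation cosines, which is what makes the filter output essentially free once the update has been performed.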

[Figure: same systolic array as Figure 5.2 (cells R11..R44 and right-hand-side cells), inputs x_1(k+1), x_2(k+1) with delays, weights λ and 1/λ, zeros applied at the top of the right hand side; the bottom outputs are the noise estimates.]
Figure 5.3: Signal flow graph for residual extraction.

In this setting, the system does not generate any output during noise–only mode, since in the absence of an input vector for the left part of the signal flow graph 5.2 (see section 5.3.1), no rotation parameters are generated, so no residual extraction is possible. In several applications an output would be required though: it is perceived as disturbing when the output signal is exactly zero, and often some 'comfort noise' is preferred. Also, if the voice activity detector cannot be trusted and fails to detect a speech signal during a speech+noise segment, the output of the algorithm would remain zero. If we want to generate an output signal during segments that the voice activity detector identifies as noise–only segments, we can execute a residual extraction procedure as in the speech+noise–mode, producing a priori error signals, be it without updating


R(k) and B(k) ('frozen mode'):

    [ 0    r^T(k+1) ]                [ √(1−λ_s²) v^T(k+1)        0        ]
    [ ∗       ∗     ]  =  Q̄^T(k+1)  [      λ_s R(k)       (1/λ_s) B(k)   ] .

This will of course increase the complexity in noise–only mode, but since the updates need not be calculated completely (only the rotation parameters and the outputs), the extra complexity will be about half the complexity in speech+noise–mode. The end result is that the complexity in noise–only mode becomes about equal to the complexity in speech+noise mode, and that the maximum complexity of the algorithm does not rise. For real time processing, this maximum complexity is the most important.

5.3.4 Initialization

The upper triangular matrix R(0) may be initialized with a small number η on its diagonal. This is required for the QRD–updating algorithm to start. This initialization corresponds to an initial estimate of the speech+noise covariance matrix equal to η²I (white noise with variance η²). Due to the exponential weighting in the algorithm, the influence of the initialization will be negligible after a number of samples.

5.3.5 Algorithm description

In this algorithm description, we choose to estimate the speech signal d₁(k) in the first microphone signal. This means that the right hand side also consists of only the first column b(k) of B(k). In Algorithm 13, an output signal is also generated during noise–only periods, as described above.

5.4 Trading off noise reduction vs. signal distortion

In many applications some distortion of the speech signal can be allowed, and hence it is possible to obtain 'more than optimal' noise reduction in exchange for some signal distortion. We will introduce a parameter that can be used to tune this trade–off. This parameter takes the form of a regularization parameter in the Wiener filter equation.


Algorithm 13 QRD–RLS based ANC

    R = 0.0001 I
    Loop:
      if speech+noise
        x′(k+1) = √(1−λ_s²) x(k+1)
        [ 0        r(k+1)  ]                [ x′^T(k+1)       0         ]
        [ R(k+1)   b(k+1)  ]  =  Q̄^T(k+1)  [ λ_s R(k)   (1/λ_s) b(k)   ]
        output = x₁(k+1) + r(k+1) ∏_{i=1}^{MN} cos θ_i(k+1) / √(1−λ_s²)
      if noise–only
        v(k+1) = x(k+1)
        calculate u(k+1) from R^T(k+1) u(k+1) = v(k+1)
        b(k+1) = λ_n² b(k) + x₁(k+1) (1−λ_n²) u(k+1)
        x′(k+1) = √(1−λ_n²) x(k+1)
        [ 0   r(k+1) ]                [ x′^T(k+1)       0         ]
        [ ∗      ∗   ]  =  Q̄^T(k+1)  [ λ_s R(k)   (1/λ_s) b(k)   ]
        d̂(k+1) = x₁(k+1) + r(k+1) ∏_{i=1}^{MN} cos θ_i(k+1) / √(1−λ_s²)


5.4.1 Regularization

In practice, an additional design parameter is often introduced into the unconstrained optimal filtering approach to obtain more noise reduction than is achieved by the standard unconstrained optimal filtering scheme. The result is an increase in signal distortion, but for a lot of applications this is not necessarily harmful. In [22], an alternative to the MMSE–criterion is derived. We use a similar, but slightly different criterion:

    min_{W_qr(k)} ( ‖ ε{x^T(k) W_qr(k) − d^T(k)} ‖²_F + μ² ‖ ε{v^T(k) W_qr(k)} ‖²_F ).        (5.10)

The first term in the minimization criterion accounts for the signal distortion, the second one for the noise reduction. The parameter μ² can be used to trade off noise reduction versus signal distortion. This leads to

    W_qr(k) = (ε{x(k)x^T(k)} + μ² ε{v(k)v^T(k)})^{-1} (ε{x(k)x^T(k)} − ε{v(k)v^T(k)}).

The trade–off parameter translates into a regularization term in the Wiener filter equation. In a deterministic setting, this leads to

    W_qr(k) = (X^T(k)X(k) + μ² V^T(k)V(k))^{-1} (X^T(k)X(k) − V^T(k)V(k)).        (5.11)
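The effect of μ in (5.11) can be illustrated with sample covariances on synthetic data (all names, dimensions and the signal model below are illustrative assumptions): increasing μ lowers the residual noise energy at the output while increasing the speech distortion.

```python
import numpy as np

rng = np.random.default_rng(6)
MN, K = 6, 50000
d = rng.standard_normal((K, MN)) @ (0.6 * rng.standard_normal((MN, MN)))
v = rng.standard_normal((K, MN))
x = d + v
Rxx, Rvv = x.T @ x, v.T @ v        # deterministic (sample) correlation matrices

def w_reg(mu):
    # regularized Wiener solution, as in equation (5.11)
    return np.linalg.solve(Rxx + mu**2 * Rvv, Rxx - Rvv)

def noise_out(W):   # residual noise energy at the output
    return np.mean((v @ W) ** 2)

def distortion(W):  # speech distortion energy
    return np.mean((d @ W - d) ** 2)

W0, W2 = w_reg(0.0), w_reg(2.0)
# mu > 0 buys extra noise reduction at the price of extra signal distortion:
print(noise_out(W2) < noise_out(W0) and distortion(W2) > distortion(W0))
```

This is the same trade-off visible in Figure 5.7: deeper 'valleys' (noise-only segments) and slightly flattened peaks (speech segments).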

5.4.2 Speech+noise mode

We return to the QRD–framework by defining

    R̃^T(k) R̃(k) = X^T(k)X(k) + μ² V^T(k)V(k).

We will now track the Cholesky factor R̃(k) of X^T(k)X(k) + μ² V^T(k)V(k) instead of the Cholesky factor R(k) of X^T(k)X(k). Noting that

    [ X(k)  ]^T [ X(k)  ]
    [ μV(k) ]   [ μV(k) ]  =  X^T(k)X(k) + μ² V^T(k)V(k),

it is obvious that this can be done by applying two updates instead of one to the left hand side of Figure 5.3 in each time step. First, an update is done with the microphone input vector, as explained in section 5.3.1, and then a second update is done with μ times a noise input vector that we have stored in a noise buffer. This noise buffer consists of successive input vectors from previous noise–only periods.

In section 5.3.2, R^T(k) was needed to perform a backsubstitution step. Since in this case we only have access to R̃^T(k), we have to rewrite equation (5.11) somewhat:


    (R̃^T(k)R̃(k)) W_qr(k) = R^T(k)R(k) − V^T(k)V(k)
                          = (R^T(k)R(k) + μ² V^T(k)V(k)) − V^T(k)V(k) − μ² V^T(k)V(k)
                          = R̃^T(k)R̃(k) − (1+μ²) V^T(k)V(k),

    W_qr(k) = I − (1+μ²) R̃^{-1}(k) R̃^{-T}(k) V^T(k)V(k),    with W^N_qr(k) ≡ (1+μ²) R̃^{-1}(k) R̃^{-T}(k) V^T(k)V(k).

Written in the same form as (5.7), we obtain

    R̃(k) W^N_qr(k) = B̃(k).        (5.12)

So since we store R̃(k) instead of R(k), we now have to update

    B̃(k) = R̃^{-T}(k) (1+μ²) V^T(k)V(k).

The full procedure in speech+noise mode is as follows: first weight R̃(k) with λ_s, and then update it with √(1−λ_s²) x(k+1) in order to obtain R̃′(k). The rotation parameters that are generated by this update are used, together with zeros applied to the top of the signal flow graph, to update B̃(k) to B̃′(k):

    B̃′(k) = R̃′^{-T}(k) (1+μ²) V^T(k)V(k).

This corresponds to the original scheme of section 5.3.1. We let this step generate a residual, and we use it as an output of the noise filter.

Then R̃′(k) is updated to R̃(k+1) by applying μ√(1−λ_s²) v(k+1) to the left hand part of the signal flow graph, and the rotation parameters are again used to update B̃′(k) to B̃′′(k):

    B̃′′(k) = R̃^{-T}(k+1) (1+μ²) V^T(k)V(k).

The residual signal generated in this step should be discarded.

It is possible to update the factor V^T(k)V(k) during noise–only mode. In that case B̃(k+1) = B̃′′(k). Another option consists in also performing these updates with noise vectors from the noise buffer during speech+noise mode. The update from B̃′′(k) to B̃(k+1) is then performed as

    B̃(k+1) = λ_s² B̃′′(k) + (R̃^{-T}(k+1) (1−λ_s²)(1+μ²) v(k+1)) v^T(k+1),

where v(k+1) is taken from the noise buffer. This can again be calculated using a backsubstitution followed by a multiplication, as explained in section 5.3.2.


5.4.3 Noise–only mode

The factorV T (k)V (k) in the right hand side of equation (5.12) can also be updatedduring noise–only periods. In that case, the algorithm proceeds exactly as in section5.3.2 during noise–only mode. During noise–only mode, also the input vectors mustbe stored into the noise buffer3.

Algorithm 14 gives a complete specification.

5.5 Complexity

In noise–only mode, the complexity for the unregularized algorithm (section 5.3) is

    (MN)² + 3MN + M

flops per sample if no output signal is generated during noise periods, or

    3(MN)² + 16MN + 2

if an output signal is generated during noise–only periods. In speech+noise mode, the number of flops per sample is

    3.5(MN)² + 15.5MN + M + 2.

In these calculations, one flop is one addition or one multiplication. These figures apply when only one filter output is calculated. This can be compared to the complexity of a recursive version of the GSVD–based optimal filtering technique [12], [13], which is

    O(27.5(MN)²)

flops per sample. For a typical setting of N = 20 and M = 5, we would obtain 36557 flops per sample for the QRD–based method, as compared to 275000 flops per sample for the GSVD–based method, which amounts to an 8–fold complexity reduction⁴.

For the regularized algorithm of section 5.4, the complexity during speech+noise mode is doubled (O(7(MN)²)) compared to the unregularized optimal filtering scheme of section 5.3. The complexity during noise–only mode remains the same.

The algorithms have been implemented in real time on a Linux PC (Pentium III, 1 GHz). For just–real–time performance, this leads to a maximum of 3 channels with 10 filter taps per channel for the GSVD–based algorithm. The unregularized QRD–based algorithm

³If memory were too expensive, an alternative is to use white noise instead of buffered noise vectors. This will probably lead to more signal distortion, but experiments show that it is still a valid alternative.

⁴If complexity were prohibitive for some applications, the QRD–RLS based algorithm can be used to generate a noise estimate with relatively few filter taps. This estimate can then be fed to a second stage, similarly to [11].


Algorithm 14 QRD–based ANC with trade–off parameter. d̂(k+1) is the resulting speech signal estimate.

    R = 0.0001 I
    Loop:
      if speech+noise
        x′(k+1) = √(1−λ_s²) x(k+1)
        [ 0        r(k+1)  ]                [ x′^T(k+1)       0         ]
        [ R′(k)    b′(k)   ]  =  Q̄^T(k+1)  [ λ_s R(k)   (1/λ_s) b(k)   ]
        d̂(k+1) = x₁(k+1) + r(k+1) ∏_{i=1}^{MN} cos θ_i(k+1) / √(1−λ_s²)
        v(k+1) = next noise–vector from noise–buffer
        v′(k+1) = μ √(1−λ_s²) v(k+1)
        [ 0        r(k+1)  ]                [ v′^T(k+1)   0     ]
        [ R(k+1)   b(k+1)  ]  =  Q̄^T(k+1)  [ R′(k)       b′(k) ]
        (if no noise updates during noise–only:)
          R^T(k+1) u(k+1) = v(k+1), backsubstitution gives u(k+1)
          b(k+1) = λ_s² b(k) + v₁(k+1) (1−λ_s²)(1+μ²) u(k+1)
      if noise–only
        push input vector v(k+1) = x(k+1) into noise–buffer
        (if noise updates during noise–only:)
          R^T(k+1) u(k+1) = v(k+1), backsubstitution gives u(k+1)
          b(k+1) = λ_n² b(k) + x₁(k+1) (1+μ²)(1−λ_n²) u(k+1)
        x′(k+1) = √(1−λ_s²) x(k+1)
        [ 0   r(k+1) ]                [ x′^T(k+1)       0         ]
        [ ∗      ∗   ]  =  Q̄^T(k+1)  [ λ_s R(k)   (1/λ_s) b(k)   ]
        d̂(k+1) = x₁(k+1) + r(k+1) ∏_{i=1}^{MN} cos θ_i(k+1) / √(1−λ_s²)


we have proposed here, when implemented in the time domain, allows for 3 channels with 30 filter taps per channel. The theoretical complexity figures are confirmed: the filter lengths can be made three times longer than for the reference setup with the GSVD–based algorithm (this is indeed expected because of the quadratic complexity, i.e. 27.5/3.5 ≈ 3²), while the performance is the same. A subband implementation (16 subbands, 12–fold downsampling) of the QRD–based algorithm allows the use of 15 taps per subband in 3 channels, which comes down to an equivalent of 12·3·15 = 540 filter taps.

5.6 Simulation results

Theoretically, the GSVD–based approach and the QRD–based approach solve the same problem. We will show some subtle differences between the practical implementations of the GSVD– and QRD–based techniques. The conclusion will be that in a practical implementation too, the behaviour of the GSVD– and QRD–based algorithms is roughly the same; hence, for performance results for the QRD–based technique, we can refer to the literature on the GSVD–based technique [14, 12].

Note that all covariance matrices in the above equations and algorithms should be positive definite. The clean speech signal typically lives in a subspace of the input space, hence a number of eigenvalues of the difference ε{x(k)x^T(k)} − ε{v(k)v^T(k)} may be zero. A practical estimator, however, will never obtain exact zeros for these eigenvalues, and may even produce negative estimates for the eigenvalues of ε{x(k)x^T(k)} − ε{v(k)v^T(k)}. In the GSVD–approach, direct access to the singular values is possible, and the negative eigenvalues can be corrected to zero. This is not possible in the QRD–based approach. This difference is seen most of all for short estimation windows; for longer estimation windows, the QRD– and GSVD–results become roughly equal. On the other hand, the GSVD–approach has to incorporate an approximation in order to achieve quadratic complexity. We will also show the influence of this approximation.

The speech signal is a sentence that is repeated four times. Speech+noise versus noise–only periods were marked manually. Reverberation was added by a simulated acoustical environment (acoustic paths of 1000 taps, sampling frequency 8000 Hz). The speech source is located at about 6° from broadside, at 2.8 meters from the microphone array. The (white) noise source is located at about 54° from broadside, at 2.2 meters from the array. The microphone array consists of 4 microphones, spaced 20 cm apart, and the filters have 40 taps per channel. The first column of W(k) is selected for signal estimation. During each utterance of the sentence, the volume decreases, so the SNR is not constant. Figure 5.4 shows the difference between the QRD–based method and the GSVD–based method for a short estimation window. After the first speech utterance, at the beginning of the noise–only period, the convergence is clearly visible.


The QRD–based method has less distortion because of the approximation used in the GSVD–approach [14], while the GSVD–based method obtains more noise reduction due to the 'corrected singular value' estimates. Figure 5.5 compares both methods without applying the 'corrections' to the singular values in the GSVD–method. In that case, the results are quite similar, and the QRD–based method performs slightly better (both concerning distortion and noise reduction) because of the tracking approximation in the calculation of the SVD. In Figure 5.6, the noise estimation window is made longer (λ_n = 0.99995) and the corrections are applied in the GSVD–based algorithm. The figure shows that, in spite of this, the performance of both algorithms is almost the same.


[Figure: two panels of signal energy (dB) versus time (samples); traces: speech at mic 1, noise at mic 1, GSVD–output (negative eigenvalues set to zero), QRD–output; the lower panel is a zoomed detail.]
Figure 5.4: Four utterances of the same sentence. The clean speech signal and the noise signal at microphone 1 are plotted (simulation). The GSVD–result is better in this case (λ_n = 0.9997 and λ_s = 0.9997) than the QRD–result because the negative eigenvalues can be set to zero in the GSVD–method. As shown in the detail below, the distortion is less for the QRD–method though.


[Figure: signal energy (dB) versus time (samples); GSVD and QRD output traces.]
Figure 5.5: Comparison between GSVD–based and QRD–based unconstrained optimal filtering when the correlation matrices are not corrected in the GSVD–approach. Again, λ_n = 0.9997 and λ_s = 0.9997. The cheaper QRD–method performs slightly better because of the approximation used in the GSVD–approach, so the performance of the QRD– and GSVD–algorithms can be considered 'almost equal'.


[Figure: signal energy (dB) versus time (samples).]
Figure 5.6: QRD–approach versus GSVD–approach with a longer estimation window: the difference between both algorithms vanishes, even when the eigenvalues are corrected in the GSVD–approach (λ_n = 0.99995 and λ_s = 0.9997).

[Figure: signal energy (dB) versus time (samples); output without trade–off and with trade–off parameter μ = 2.]
Figure 5.7: When a trade–off (regularization) parameter is introduced, even more noise reduction can be achieved, in exchange for some signal distortion. The upper line is always the algorithm output without the trade–off parameter, the lower line with trade–off parameter μ = 2.


The result of introducing regularization (see section 5.4) is clearly shown in Figure 5.7. The upper line in the plot is the algorithm output energy for the original algorithm (without regularization), while the lower line shows the output energy when a regularization parameter μ = 2 is chosen. There is more noise reduction (as can be seen in the 'valleys' of the graph), while also some signal distortion is introduced (as can be seen at the peaks of the graph).

5.7 Conclusion

In this chapter, we have derived a new QRD–based algorithm for multichannel unconstrained optimal filtering with an "unknown" desired signal, and applied it to the ANC problem. The same basic problem is solved as in related algorithms, which are mostly based upon singular value decompositions. Our approach results in at least equal performance. In the GSVD–based algorithm, approximate GSVD tracking is used; since these approximations are not present in the QRD–based algorithm, its performance is often even better.

The major advantage of the QRD–based optimal filtering technique is that its complexity is an order of magnitude lower than that of the (approximating) GSVD–based approaches.

We have also introduced a trade–off parameter in the QRD–based technique that allows for obtaining more noise reduction in exchange for some signal distortion.


Chapter 6

Fast QRD–LSL–based ANC

The QRD–based unconstrained optimal filtering ANC algorithm presented in the previous chapter allows for a complexity reduction of an order of magnitude compared to existing unconstrained optimal filtering approaches based on GSVD–computation and tracking. However, the complexity is still quadratic in both the filter length and the number of channels, and since in typical applications the filter length is often a few tens to hundreds of taps, this complexity can still be prohibitive. In standard QRD–RLS adaptive filtering, the shift structure of the input signal is exploited in order to obtain an algorithm (QRD–LSL) that is linear in the filter length. In this chapter, we will show how this can also be applied to the QRD–based algorithm of chapter 5.

This is not straightforwardly achieved though, since in the previous chapter's algorithm, access to the upper triangular Cholesky factor was necessary during noise–only periods in order to calculate the update of the right hand side. In a QRD–LSL–based algorithm, this matrix is not explicitly present anymore.

We will propose a QRD–Least Squares Lattice (QRD–LSL) based unconstrained optimal filtering algorithm for ANC that again obtains the same performance as the GSVD– or QRD–RLS–approach (chapter 5), but now at a dramatically reduced complexity. As mentioned before, if M is the number of microphones and N the number of filter taps applied to each microphone signal, then the GSVD–based approach has a complexity of O(M³N³). An approximate GSVD–solution (which uses GSVD–tracking) still requires O(27.5M²N²) flops per sample. The QRD–RLS based solution of chapter 5 reduces this complexity to O(3.5M²N²). The algorithm presented in this chapter has a complexity of O(21M²N). For typical parameter settings (N = 50, M = 2), this amounts to an up to 50–fold complexity reduction compared to the approximative GSVD–solution, and an 8–fold complexity reduction compared to the QRD–RLS–based algorithm. Our algorithm is based on a


numerically stable fast implementation of the RLS–algorithm (QRD–LSL), applied to a reorganized version of the QRD–RLS–based algorithm of chapter 5.

In section 6.1, we describe the data model that is used for this algorithm. A QRD–RLS algorithm which is a modified version of the algorithm in chapter 5 is then derived in section 6.2, and this is worked into a QRD–LSL algorithm in section 6.3. In section 6.4, the transitions between modes are studied in detail; in section 6.5, a regularization parameter is introduced which, similarly to the regularization factor in the previous chapter, can be used to trade off noise reduction versus signal distortion. In section 6.6, complexity figures are given, and in section 6.7 simulation results are described. Conclusions are given in section 6.8.

6.1 Preliminaries

In order to show the analogy between the QRD– and GSVD–based methods, we will again derive the algorithms with a matrix W(k) whose columns are the filter vectors and a vector d(k) whose elements are the desired signals, but one should keep in mind that for a practical implementation only one column of the matrix has to be calculated. It turns out that the method described in chapter 5 cannot be straightforwardly modified into a QRD–LSL fast implementation. By reorganizing the problem, we can indeed substitute a QRD–LSL algorithm.

The fast RLS schemes that are available in the literature are based upon the requirement that the input signal has a shift structure, which means that each input vector should be a shifted version of the previous input vector. A large number of computations can then be avoided and 're–used' from previous time instants. As a result of the complexity reduction, the matrix R(k) is not explicitly available in the algorithm anymore. In the QRD–based algorithm of chapter 5, the right hand side noise covariance matrix is updated during noise–only periods (section 5.3.2), and the matrix R(k) is needed in order to do this, since a backsubstitution step is required. If we want to obtain a fast algorithm, we will have to come up with a way to update the noise correlation matrix without needing access to R(k).

In order to derive a fast algorithm, we first have to reorder the contents of our input vectors. They are redefined as in (2.5), repeated here for convenience:

\[
x(k) = \begin{pmatrix} x_1(k) \\ \vdots \\ x_M(k) \\ x_1(k-1) \\ \vdots \\ x_M(k-N+1) \end{pmatrix},
\]

which clearly does not have an impact on the algorithms of chapter 5.
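The channel-first ordering of (2.5) can be sketched as follows. This is a minimal illustration, not part of the thesis; the helper name `input_vector` and the array layout (one row per microphone) are assumptions for the example.

```python
import numpy as np

def input_vector(x_hist, k, N):
    """Channel-ordered input vector as in (2.5):
    [x_1(k) .. x_M(k), x_1(k-1) .. x_M(k-1), ..., x_M(k-N+1)]^T.
    x_hist is an M x (samples) array of microphone signals."""
    return np.concatenate([x_hist[:, k - n] for n in range(N)])

# Two microphones (M = 2), three taps (N = 3)
x_hist = np.array([[10., 11., 12., 13., 14.],
                   [20., 21., 22., 23., 24.]])
xk = input_vector(x_hist, k=4, N=3)
assert np.array_equal(xk, [14., 24., 13., 23., 12., 22.])
```

Shifting k by one reuses all but the first M entries of the previous vector, which is exactly the shift structure the fast schemes exploit.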


Due to the second assumption in section 5.2, we can attempt to estimate E{x(k)x^T(k)} during speech+noise periods, and E{v(k)v^T(k)} during noise–only periods. We will make use of a weighting scheme in order to provide the ability to adapt to a changing environment. The input matrices X(k) and V(k) are defined as in (5.5) and (5.6). The Wiener solution is then estimated as

\[
W(k) = (R^T(k)R(k))^{-1}(R^T(k)R(k) - V^T(k)V(k))
     = I - \underbrace{R^{-1}(k)\underbrace{R^{-T}(k)V^T(k)V(k)}_{B(k)}}_{W^N(k)}, \tag{6.1}
\]

where I is the identity matrix.

Let us now consider the following least squares estimation problem, where the upper rows (with X(k)) represent weighted inputs from speech+noise periods and the lower rows (with V(k)) represent weighted inputs from noise–only periods:

\[
\min_{W^N(k)} \left\| \begin{pmatrix} X(k) \\ \beta V(k) \end{pmatrix} W^N(k)
- \begin{pmatrix} 0 \\ \frac{1}{\beta} V(k) \end{pmatrix} \right\|^2. \tag{6.2}
\]

The normal equations for this system are

\[
(X^T(k)X(k) + \beta^2 V^T(k)V(k))\, W^N(k) = V^T(k)V(k),
\]

such that

\[
W^N(k) = (\underbrace{X^T(k)X(k) + \beta^2 V^T(k)V(k)}_{R_\beta^T(k)R_\beta(k)})^{-1}(V^T(k)V(k))
        = R_\beta^{-1}(k)\underbrace{R_\beta^{-T}(k)(V^T(k)V(k))}_{B_\beta(k)}, \tag{6.3}
\]

with R_β ∈ ℝ^{MN×MN} and upper triangular. Clearly, for the limiting case of β going to zero (indicated by β → 0), R_{β→0}(k) = R(k) and B_{β→0}(k) = B(k), so W^N_{β→0}(k) = I − W(k), which means that W(k) may be computed as W(k) = I − W^N_{β→0}(k). We will now provide a QRD–based algorithm that makes use of this feature and that is based on storing and updating the triangular factor R_{β→0}(k) = R(k) as well as the right hand side B_{β→0}(k) = B(k). From now on, we focus on (6.2) and (6.3), keeping in mind that a desired signal estimate is then obtained as

\[
d^T(k) = x^T(k)(I - W^N_{\beta\to 0}(k)) = x^T(k) - x^T(k)W^N_{\beta\to 0}(k),
\]

where x^T(k)W^N_{β→0}(k) in fact corresponds to an estimate of the noise contribution in x^T(k). Note that our updating formulae will be reorganized such that β does not appear anywhere, hence it can effectively be set to zero.
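The limiting behaviour can be checked numerically: solving the stacked least squares problem (6.2) with a small β reproduces the closed-form solution of the normal equations (6.3) with β → 0. This is a sketch with random data; the matrix sizes are hypothetical and X, V stand for the already-weighted input matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 6                                   # filter dimension M*N (hypothetical)
X = rng.standard_normal((40, m))        # weighted speech+noise rows
V = rng.standard_normal((30, m))        # weighted noise-only rows

beta = 1e-4
A = np.vstack([X, beta * V])                      # left hand side of (6.2)
rhs = np.vstack([np.zeros((40, m)), V / beta])    # right hand side of (6.2)
WN = np.linalg.lstsq(A, rhs, rcond=None)[0]

# Normal equations (6.3) for beta -> 0: WN = (X^T X)^{-1} (V^T V)
WN_limit = np.linalg.solve(X.T @ X, V.T @ V)
assert np.allclose(WN, WN_limit, atol=1e-6)
```

The β² term in the normal equations is O(10⁻⁸) here, so the least squares solution is numerically indistinguishable from the limit.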


112 CHAPTER 6. FAST QRD–LSL–BASED ANC

6.2 Modified QRD–RLS based algorithm

Referring to (6.1), the algorithm will be based on storing and updating only R_{β→0}(k) and B_{β→0}(k). In a second step (section 6.3), this will be turned into a QRD–LSL based algorithm. We know from (6.2) that there will be two modes of updating, depending on the input signal being classified as speech+noise or noise–only. In speech+noise mode we apply √(1−λ_s²)·x(k+1) as an input to the left hand side, and 0 to the right hand side. During noise–only periods, β√(1−λ_n²)·v(k+1) should be applied to the left hand side of the SFG, and (1/β)·√(1−λ_n²)·v(k+1) to the right hand side, with β → 0.

Assume that at time k, a QR decomposition is available as follows:

\[
\begin{pmatrix} X(k) & 0 \\ \beta V(k) & \frac{1}{\beta} V(k) \end{pmatrix}_{\beta\to 0}
= Q(k) \begin{pmatrix} R(k) & B(k) \\ 0 & * \end{pmatrix}, \tag{6.4}
\]

where Q(k) is not stored. From this equation, W^N_{β→0}(k) can be computed as W^N_{β→0}(k) = R^{-1}(k)B(k). Our algorithm will however be based on residual extraction (section 2.3.3), and hence W^N_{β→0}(k) will never be computed explicitly.

6.2.1 Speech+noise–mode

The update formula for the speech+noise mode is derived as follows, where X(k) is updated based on formula (5.5), while V(k) is kept unchanged, i.e. V(k+1) = V(k):

\[
\begin{pmatrix} X(k+1) & 0 \\ \beta V(k+1) & \frac{1}{\beta} V(k+1) \end{pmatrix}_{\beta\to 0}
= \begin{pmatrix} \sqrt{1-\lambda_s^2}\, x^T(k+1) & 0 \\ \lambda_s X(k) & 0 \\ \beta V(k) & \frac{1}{\beta} V(k) \end{pmatrix}_{\beta\to 0}.
\]

If we define \(\tilde\beta = \beta/\lambda_s\), we obtain

\[
\begin{pmatrix} X(k+1) & 0 \\ \beta V(k+1) & \frac{1}{\beta} V(k+1) \end{pmatrix}_{\beta\to 0}
= \begin{pmatrix} \sqrt{1-\lambda_s^2}\, x^T(k+1) & 0 \\ \lambda_s X(k) & \frac{1}{\lambda_s}\, 0 \\ \lambda_s \tilde\beta V(k) & \frac{1}{\lambda_s}\frac{1}{\tilde\beta} V(k) \end{pmatrix}_{\tilde\beta\to 0}
= \begin{pmatrix} 1 & 0 \\ 0 & Q(k) \end{pmatrix}
\begin{pmatrix} \sqrt{1-\lambda_s^2}\, x^T(k+1) & 0 \\ \lambda_s R(k) & \frac{1}{\lambda_s} B(k) \\ 0 & * \end{pmatrix}.
\]

Page 113: Adaptive filtering algorithms for acoustic echo and noise ...theses.eurasip.org/...geert...acoustic-echo-and-noise-cancellation.pdf · Adaptive filtering algorithms for acoustic

6.2. MODIFIED QRD–RLS BASED ALGORITHM 113

Figure 6.1: QRD–RLS based optimal filtering for acoustic noise suppression. During speech+noise periods, Givens rotations are used.


Figure 6.2: QRD–RLS based optimal filtering for acoustic noise suppression. During noise–only periods, Gauss transformations are used.


This means (from (6.4)) that the updated R(k+1) and B(k+1) may be obtained based on a standard QRD–updating

\[
\begin{pmatrix} 0 & r^T(k+1) \\ R(k+1) & B(k+1) \end{pmatrix}
= Q^T(k+1) \begin{pmatrix} \sqrt{1-\lambda_s^2}\, x^T(k+1) & 0 \\ \lambda_s R(k) & \frac{1}{\lambda_s} B(k) \end{pmatrix}. \tag{6.5}
\]

A signal flow graph representation is given in Figure 6.1. For this example, the right hand side part of the signal flow graph has only 2 columns, estimating the speech components in x_1(k) and x_2(k) only. While the left hand side has a weighting with λ_s, the right hand side part has a weighting with 1/λ_s, which is shown by means of black squares in boxes. As shown in equation (6.6), we do not have to calculate the filter vector in each step, but we can use residual extraction (2.24) in order to obtain the estimate of the speech signal

\[
d(k+1) = x(k+1) + \frac{r(k+1)\prod_{i=1}^{MN}\cos\theta_i(k+1)}{\sqrt{1-\lambda_s^2}}. \tag{6.6}
\]

Note that (6.5) and (6.6) are the same as (5.8) and (5.9). The update formulas for the noise–only case, however, will be different from the formulas in chapter 5.
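The left hand side of the QRD–update (6.5) can be verified numerically: triangularizing the stacked matrix yields a factor whose Gram matrix obeys the exponentially weighted recursion. A minimal sketch with hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
m, lam_s = 4, 0.99
R = np.linalg.qr(rng.standard_normal((20, m)), mode='r')  # current triangular factor
x = rng.standard_normal(m)                                 # new input vector

# Triangularize the stacked matrix of (6.5) (left hand side only)
stacked = np.vstack([np.sqrt(1 - lam_s**2) * x, lam_s * R])
R_new = np.linalg.qr(stacked, mode='r')

# Both factors generate the same exponentially weighted Gram matrix
assert np.allclose(R_new.T @ R_new,
                   lam_s**2 * (R.T @ R) + (1 - lam_s**2) * np.outer(x, x))
```

In the actual algorithm the same triangularization is of course done with a sequence of MN Givens rotations rather than a full QR factorization.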

6.2.2 Noise–only mode

The update formula for the noise–only mode is derived as follows, where now X(k) is kept unchanged, i.e. X(k+1) = X(k), while V(k) is implicitly updated as

\[
V(k+1) = \begin{pmatrix} \sqrt{1-\lambda_n^2}\, v^T(k+1) \\ \lambda_n V(k) \end{pmatrix}.
\]

It is convenient to redefine/reorder V(k+1) as

\[
V(k+1) = \begin{pmatrix} \lambda_n V(k) \\ \sqrt{1-\lambda_n^2}\, v^T(k+1) \end{pmatrix},
\]


leading to

\[
\begin{pmatrix} X(k+1) & 0 \\ \beta V(k+1) & \frac{1}{\beta} V(k+1) \end{pmatrix}_{\beta\to 0}
= \begin{pmatrix} X(k) & 0 \\ \underbrace{\beta\lambda_n}_{\tilde\beta} V(k) & \frac{1}{\beta\lambda_n}\lambda_n^2 V(k) \\ \beta\sqrt{1-\lambda_n^2}\, v^T(k+1) & \frac{1}{\beta}\sqrt{1-\lambda_n^2}\, v^T(k+1) \end{pmatrix}_{\substack{\beta\to 0\\ \tilde\beta\to 0}}
= \begin{pmatrix} Q(k) & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} R(k) & \lambda_n^2 B(k) \\ 0 & * \\ \beta\sqrt{1-\lambda_n^2}\, v^T(k+1) & \frac{1}{\beta}\sqrt{1-\lambda_n^2}\, v^T(k+1) \end{pmatrix}_{\substack{\beta\to 0\\ \tilde\beta\to 0}}.
\]

This means that the updated R(k+1) and B(k+1) may be obtained based on a QRD–updating

\[
\begin{pmatrix} R(k+1) & B(k+1) \\ 0 & r^T(k+1) \end{pmatrix}
= Q^T(k+1) \begin{pmatrix} R(k) & \lambda_n^2 B(k) \\ \beta\sqrt{1-\lambda_n^2}\, v^T(k+1) & \frac{1}{\beta}\sqrt{1-\lambda_n^2}\, v^T(k+1) \end{pmatrix}, \tag{6.7}
\]

where Q(k+1) is not stored and from which W^N_{β→0}(k+1) can again be computed as W^N_{β→0}(k+1) = R^{-1}(k+1)B(k+1).

Note that β and 1/β now appear explicitly in the QRD–updating formula. As we are interested in the case β → 0, we have to work the formulas into an alternative form where β does not appear explicitly. The end result will be that the orthogonal Givens transformations are replaced by Gauss transformations, and that the input vector will be

\[
\begin{pmatrix} \sqrt{1-\lambda_n^2}\, v^T(k+1) & \sqrt{1-\lambda_n^2}\, v^T(k+1) \end{pmatrix}
\]

instead of

\[
\begin{pmatrix} \beta\sqrt{1-\lambda_n^2}\, v^T(k+1) & \frac{1}{\beta}\sqrt{1-\lambda_n^2}\, v^T(k+1) \end{pmatrix}.
\]

It will also be shown that the elements of the matrix R(k) are not changed by the updating during noise–only periods.

We consider the first orthogonal Givens rotation that is computed in the top left hexagon of the signal flow graph in Figure 6.1:

\[
\begin{pmatrix} R_{11}(k+1) \\ 0 \end{pmatrix}
= \begin{pmatrix} \cos\theta_1(k) & \sin\theta_1(k) \\ -\sin\theta_1(k) & \cos\theta_1(k) \end{pmatrix}
\begin{pmatrix} R_{11}(k) \\ \beta\sqrt{1-\lambda_n^2}\, v_1(k+1) \end{pmatrix}.
\]


Here v_j(k) denotes the j'th component of the vector v(k). Since β → 0, we can write

\[
\tan\theta_1(k) = \frac{\beta\sqrt{1-\lambda_n^2}\, v_1(k+1)}{R_{11}(k)},
\]
\[
\sin\theta_1(k)\Big|_{\beta\to 0} \approx \left.\frac{\beta\sqrt{1-\lambda_n^2}\, v_1(k+1)}{R_{11}(k)}\right|_{\beta\to 0}, \qquad
\cos\theta_1(k)\Big|_{\beta\to 0} \approx 1. \tag{6.8}
\]

Hence we have

\[
\begin{pmatrix} R_{11}(k+1) \\ 0 \end{pmatrix}_{\beta\to 0}
= \left.\begin{pmatrix} 1 & \frac{\beta\sqrt{1-\lambda_n^2}\, v_1(k+1)}{R_{11}(k)} \\ -\frac{\beta\sqrt{1-\lambda_n^2}\, v_1(k+1)}{R_{11}(k)} & 1 \end{pmatrix}
\begin{pmatrix} R_{11}(k) \\ \beta\sqrt{1-\lambda_n^2}\, v_1(k+1) \end{pmatrix}\right|_{\beta\to 0}, \tag{6.9}
\]

which is equivalent to

\[
\begin{pmatrix} R_{11}(k+1) \\ 0 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ -\frac{\sqrt{1-\lambda_n^2}\, v_1(k+1)}{R_{11}(k)} & 1 \end{pmatrix}
\begin{pmatrix} R_{11}(k) \\ \sqrt{1-\lambda_n^2}\, v_1(k+1) \end{pmatrix},
\]

where β has disappeared.
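The passage from the exact Givens rotation to the Gauss limit can be illustrated numerically. The toy values below are arbitrary; the point is that as β shrinks, the rotation leaves the diagonal element essentially unchanged while the Gauss form annihilates the unscaled input exactly.

```python
import numpy as np

R11, v1 = 2.0, 0.7          # toy values: diagonal element and weighted noise sample
for beta in (1e-2, 1e-4, 1e-6):
    a = beta * v1
    r = np.hypot(R11, a)                 # rotated diagonal element R11(k+1)
    c, s = R11 / r, a / r                # cos/sin of the exact Givens rotation
    # (6.8): sin ~ beta*v1/R11 and cos ~ 1, so R11 changes only at O(beta^2)
    assert abs(s - a / R11) < beta**2
    assert abs(r - R11) < beta**2

# The Gauss form annihilates the (unscaled) input exactly:
out = np.array([[1, 0], [-v1 / R11, 1]]) @ np.array([R11, v1])
assert np.allclose(out, [R11, 0.0])
```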

This rotation is then applied for the updating of the remaining elements in the first row of R(k):

\[
\begin{pmatrix} R_{1j}(k+1) \\ \beta\sqrt{1-\lambda_n^2}\, v'_j(k+1) \end{pmatrix}_{\beta\to 0}
= \left.\begin{pmatrix} 1 & \frac{\beta\sqrt{1-\lambda_n^2}\, v_1(k+1)}{R_{11}(k)} \\ -\frac{\beta\sqrt{1-\lambda_n^2}\, v_1(k+1)}{R_{11}(k)} & 1 \end{pmatrix}
\begin{pmatrix} R_{1j}(k) \\ \beta\sqrt{1-\lambda_n^2}\, v_j(k+1) \end{pmatrix}\right|_{\beta\to 0}, \tag{6.10}
\]

which is equivalent to

\[
\begin{pmatrix} R_{1j}(k+1) \\ \sqrt{1-\lambda_n^2}\, v'_j(k+1) \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ -\frac{\sqrt{1-\lambda_n^2}\, v_1(k+1)}{R_{11}(k)} & 1 \end{pmatrix}
\begin{pmatrix} R_{1j}(k) \\ \sqrt{1-\lambda_n^2}\, v_j(k+1) \end{pmatrix}. \tag{6.11}
\]

This shows that the elements R_{1j}(k) are indeed unaffected by this transformation.

Applying the rotation to the right hand side (B(k)–part) of the signal flow graph leads to

\[
\begin{pmatrix} B_{1j}(k+1) \\ \frac{1}{\beta}\sqrt{1-\lambda_n^2}\, v'_{rhs,j}(k+1) \end{pmatrix}_{\beta\to 0}
= \left.\begin{pmatrix} 1 & \frac{\beta\sqrt{1-\lambda_n^2}\, v_1(k+1)}{R_{11}(k)} \\ -\frac{\beta\sqrt{1-\lambda_n^2}\, v_1(k+1)}{R_{11}(k)} & 1 \end{pmatrix}
\begin{pmatrix} \lambda_n^2 B_{1j}(k) \\ \frac{1}{\beta}\sqrt{1-\lambda_n^2}\, v_{rhs,j}(k+1) \end{pmatrix}\right|_{\beta\to 0}, \tag{6.12}
\]


where v_{rhs}(k+1) is the input to the right hand side of the SFG. During noise–only periods, v_{rhs}(k) = v(k). Equation (6.12) is equivalent to

\[
\begin{pmatrix} B_{1j}(k+1) \\ \sqrt{1-\lambda_n^2}\, v'_{rhs,j}(k+1) \end{pmatrix}
= \begin{pmatrix} 1 & \frac{\sqrt{1-\lambda_n^2}\, v_1(k+1)}{R_{11}(k)} \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \lambda_n^2 B_{1j}(k) \\ \sqrt{1-\lambda_n^2}\, v_{rhs,j}(k+1) \end{pmatrix}. \tag{6.13}
\]

Note that v'_{rhs}(k) = v_{rhs}(k). Similar transformations are subsequently applied to the other rows of B(k) and R(k), where the update for the second row of R(k) uses v'(k+1) as its input and generates v''(k+1), which will serve as an input for the third row, and so on.

Note that we have now removed β from all equations. The above formulae effectively mean that — during noise–only periods — we can replace the original Givens rotations in Figure 6.1 by so–called Gauss transformations, as derived in formulae (6.11) and (6.13); see Figure 6.2. Note that from these formulae it also follows that the right–hand part and the left–hand part have differently defined Gauss transformations, namely

\[
G_{left} = \begin{pmatrix} 1 & 0 \\ -* & 1 \end{pmatrix}
\quad\text{and}\quad
G_{right} = G_{left}^{-T} = \begin{pmatrix} 1 & * \\ 0 & 1 \end{pmatrix}.
\]

In chapter 5, we have developed a QRD–based algorithm for unconstrained optimal filtering where the right hand side update for noise–only periods,

\[
B(k+1) = \lambda_n^2 B(k) + (R^{-T}(k+1)v(k+1))\, v^T(k+1)(1-\lambda_n^2),
\]

is calculated directly by a backsubstitution using an intermediate vector u(k), and a vector–vector multiplication:

\[
R^T(k+1)\, u(k+1) = v(k+1),
\]
\[
B(k+1) = \lambda_n^2 B(k) + u(k+1)\, v^T(k+1)(1-\lambda_n^2).
\]

It is easily shown that this backsubstitution corresponds exactly to applying (6.11) and that the vector–vector multiplication corresponds to (6.13). The reorganized algorithm, however, is more easily converted into a QRD–LSL scheme; see section 6.3.
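The claimed equivalence can be checked with a short numerical sketch: a row-by-row Gauss sweep implementing (6.11) and (6.13) reproduces the chapter-5 backsubstitution update of B(k). All variable names and sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
m, lam_n = 5, 0.95
R = np.triu(rng.standard_normal((m, m))) + 4 * np.eye(m)  # upper-triangular factor
B = rng.standard_normal((m, m))                            # right hand side
v = rng.standard_normal(m)                                 # noise-only input

# Chapter-5 form: backsubstitution plus vector-vector multiplication
u = np.linalg.solve(R.T, v)
B_direct = lam_n**2 * B + (1 - lam_n**2) * np.outer(u, v)

# Gauss-transformation sweep of (6.11)/(6.13)
s = np.sqrt(1 - lam_n**2)
vl, vr = s * v.copy(), s * v.copy()     # weighted left/right hand side inputs
B_sweep = lam_n**2 * B.copy()
for i in range(m):
    g = vl[i] / R[i, i]                 # Gauss multiplier of row i
    vl[i + 1:] -= g * R[i, i + 1:]      # (6.11): propagate the input, R untouched
    B_sweep[i, :] += g * vr             # (6.13): row i of B absorbs the input

assert np.allclose(B_sweep, B_direct)
```

The multipliers g accumulated by the left sweep are exactly the components of the backsubstitution vector u (up to the √(1−λ_n²) weighting), which is why the two updates agree.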

Since it is assumed that there is no speech present in noise–only mode, we could set the desired signal estimate to zero, i.e. d(k+1) = 0. In practice, it is often required to also have an estimate available during noise periods, because complete silence could be disturbing to the listener (see also section 5.3.3). Hence a residual extraction scheme that also operates during noise–only mode is again needed. It is obtained as follows.

Let G^T(k) be the matrix that combines all left hand part Gauss transformations at time k. We can generate residuals by applying these left hand side transformations to an


input of 0 applied to the right hand side. These rotations will also not update the right hand side. We write (with W^N(k) = R^{-1}(k)B(k))

\[
\begin{pmatrix} R(k) & B(k) \\ 0 & r^T(k+1) \end{pmatrix}
= G^T(k+1) \begin{pmatrix} R(k) & B(k) \\ \sqrt{1-\lambda_n^2}\, v^T(k+1) & 0 \end{pmatrix},
\]

\[
\underbrace{\begin{pmatrix} R(k) & B(k) \\ 0 & r^T(k+1) \end{pmatrix}
\begin{pmatrix} -W^N(k) \\ I \end{pmatrix}}_{\begin{pmatrix} 0 \\ r^T(k+1) \end{pmatrix}}
= G^T(k+1)
\underbrace{\begin{pmatrix} R(k) & B(k) \\ \sqrt{1-\lambda_n^2}\, v^T(k+1) & 0 \end{pmatrix}
\begin{pmatrix} -W^N(k) \\ I \end{pmatrix}}_{\begin{pmatrix} 0 \\ -\sqrt{1-\lambda_n^2}\, v^T(k+1)W^N(k) \end{pmatrix}}.
\]

Now we can obtain the 'speech signal' estimate:

\[
d^T(k+1) = \frac{\sqrt{1-\lambda_n^2}\, x^T(k+1) + r^T(k+1)}{\sqrt{1-\lambda_n^2}}. \tag{6.14}
\]

An algorithm description can be found in Algorithm 15.
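The noise-mode residual extraction can be sketched numerically: running the left hand side Gauss sweep with a zero right hand side input accumulates r^T = −√(1−λ_n²) v^T W^N, so that the estimate (6.14) reduces to x^T − v^T W^N. The setup below is hypothetical (during noise-only periods v(k+1) = x(k+1)).

```python
import numpy as np

rng = np.random.default_rng(3)
m, lam_n = 4, 0.9
s = np.sqrt(1 - lam_n**2)
R = np.triu(rng.standard_normal((m, m))) + 4 * np.eye(m)
B = rng.standard_normal((m, m))
WN = np.linalg.solve(R, B)          # W^N(k) = R^{-1}(k) B(k)
x = rng.standard_normal(m)          # noise-only input, so v(k+1) = x(k+1)

# Left hand side Gauss sweep, with 0 applied to the right hand side
vl, r = s * x.copy(), np.zeros(m)
for i in range(m):
    g = vl[i] / R[i, i]
    vl[i + 1:] -= g * R[i, i + 1:]
    r -= g * B[i, :]                # residual row picks up -sqrt(1-lam^2) v^T W^N

d = (s * x + r) / s                 # desired signal estimate (6.14)
assert np.allclose(r, -s * (x @ WN))
assert np.allclose(d, x - x @ WN)
```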

6.3 QRD–LSL based algorithm

Based upon the reorganization in the previous section, we can now derive a fast algorithm based upon QRD–LSL. First (section 6.3.1), the classification in speech+noise or noise–only must be done 'per sample' instead of per vector, in order to be able to maintain a shift structure in the input. After that (in section 6.3.2), the LSL–based algorithm is derived.

6.3.1 Per sample versus per vector classification

In the previous section we described a QRD–updating based scheme to calculate an optimal speech signal estimate from a noisy signal. The scheme is shown in Figure 6.1 for signal+noise input vectors, and in Figure 6.2 for noise–only input vectors. Most


Algorithm 15 The modified QRD–RLS algorithm (note the different Gauss transformations in the left and in the right hand side)

    QRDRLS_Mod(x, mode) {
      PiCos = 1;
      if (mode == noise) {
        bIn = x[1];                            // right hand side input
        for (i = 1:M*N) {
          Gauss = CalcGauss(R[i][i], x[i]);
          for (j = i+1:M*N) {
            ApplyGauss1(Gauss, R[i][j], x[j]); // left hand side: R unchanged
          }
          ApplyGauss2(Gauss, b[i], bIn);       // right hand side
        }
      } else {                                 // mode == signal
        bIn = 0;
        for (i = 1:M*N) {
          Givens = CalcGivens(R[i][i], x[i]);
          for (j = i+1:M*N) {
            ApplyGivens(Givens, R[i][j], x[j]);
          }
          ApplyGivens(Givens, b[i], bIn, PiCos);
        }
      }
      return PiCos * bIn;
    }


of the time, namely within each noise–only segment and within each speech+noise segment, the input vector to (the left hand side part of)¹ this algorithm is a shifted version of the input vector of the previous time step. This in particular may allow us to derive a fast implementation with a QRD–LSL structure. We note here that the equivalence between the signal flow graphs of Figure 2.4 and Figure 2.5 was stated in section 2.3.4 for the case where orthogonal Givens transformations are used, but it equally holds when Gauss transformations (cfr. noise–only mode) are used (as this is the limiting case with β → 0).

However, the shift structure of the input vectors to the signal flow graph is temporarily destroyed by a transition between modes if the classification in noise–only periods versus speech+noise periods is performed on a per–vector basis. When a transition occurs, two successive input vectors will indeed have different scalings (√(1−λ_s²) versus √(1−λ_n²), and β → 0 versus β = 1).

We therefore propose to do the noise–only/speech+noise classification on a per–sample basis. Each sample is then effectively given a 'flag' f (f = 1 means a signal+noise sample, and f = 0 means a noise–only sample, which is multiplied by β → 0), which it maintains while travelling through the signal flow graph, both horizontally through the delay line and vertically through successive transformations in a column. Also, all transformations are given a flag g that indicates whether the transformation is based on (calculated from) a sample from a noise–only period or a signal+noise period. This will introduce the transitions gradually into the signal flow graph, which will then allow us to derive a fast algorithm. In this way, the first input vector of a noise period following a speech+noise period will be (including weightings)

\[
\begin{pmatrix}
\beta\sqrt{1-\lambda_n^2}\, x_1(k) \\ \vdots \\ \beta\sqrt{1-\lambda_n^2}\, x_M(k) \\
\sqrt{1-\lambda_s^2}\, x_1(k-1) \\ \vdots \\ \sqrt{1-\lambda_s^2}\, x_M(k-1) \\
\sqrt{1-\lambda_s^2}\, x_1(k-2) \\ \vdots
\end{pmatrix}. \tag{6.15}
\]

Similarly, the first input vector of a speech+noise period, following a noise–only period, is

\[
\begin{pmatrix}
\sqrt{1-\lambda_s^2}\, x_1(k) \\ \vdots \\ \sqrt{1-\lambda_s^2}\, x_M(k) \\
\beta\sqrt{1-\lambda_n^2}\, x_1(k-1) \\ \vdots \\ \beta\sqrt{1-\lambda_n^2}\, x_M(k-1) \\
\beta\sqrt{1-\lambda_n^2}\, x_1(k-2) \\ \vdots
\end{pmatrix}. \tag{6.16}
\]

¹The right hand side part does not need to have a shift structure in order to derive a fast algorithm.

The shift structure of the signal flow graph is then always preserved. Since the transition occurs gradually, some transformations in the graph will be calculated from inputs that have been multiplied by f = 0 (i.e. β → 0) and applied to inputs which have not, and vice versa. Therefore, we will derive four 'rules' that can be used for the updates in the left hand part of the graph (the part that updates R(k)).

1. A transformation based upon a noise–only input sample (g = 0), and applied to a noise–only input sample (f = 0), can be replaced by a Gauss transformation. (The Gauss transformation is different for the left and right hand parts of the signal flow graph.)

2. A transformation based upon a noise–only input sample (g = 0), and applied to a speech+noise sample (f = 1), is replaced by \(\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\).

3. A transformation based upon a speech+noise input sample (g = 1), and applied to a noise–only sample (f = 0), is replaced by \(\begin{pmatrix} \cos\theta & 0 \\ -\sin\theta & 0 \end{pmatrix}\).

4. A transformation based upon a speech+noise input sample (g = 1), and applied to a speech+noise sample (f = 1), is an ordinary Givens rotation.
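The four rules amount to a small lookup on the flag pair (g, f). The function below is an illustrative sketch, not part of the thesis; its name and argument conventions are assumptions.

```python
import numpy as np

def transition_transform(g, f, c=None, s=None, mult=None):
    """2x2 left-hand-side transformation selected by the flags.
    g: 1 if the transformation was computed from a speech+noise sample, else 0
    f: 1 if it is applied to a speech+noise sample, else 0
    c, s: cos/sin of the Givens rotation (needed when g == 1)
    mult: Gauss multiplier sqrt(1-lam_n^2)*v1/R11 (needed when g == f == 0)"""
    if g == 0 and f == 0:
        return np.array([[1.0, 0.0], [-mult, 1.0]])   # rule 1: Gauss
    if g == 0 and f == 1:
        return np.eye(2)                              # rule 2: identity
    if g == 1 and f == 0:
        return np.array([[c, 0.0], [-s, 0.0]])        # rule 3: zeroed column
    return np.array([[c, s], [-s, c]])                # rule 4: Givens

# Rule 3 discards the beta-scaled input entirely: the output depends on R1j only
c, s = np.cos(0.3), np.sin(0.3)
out = transition_transform(1, 0, c, s) @ np.array([2.0, 123.0])
assert np.allclose(out, [2.0 * c, -2.0 * s])
```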

Rule 1 is proven in (6.11) and (6.13), where the Gauss transformations for the left hand part and the right hand part of the signal flow graph, respectively, are shown. Rule 4 is the standard orthogonal update. Rule 2 is obvious from

\[
\begin{pmatrix} R_{1j}(k+1) \\ v'_j(k+1) \end{pmatrix}
= \left.\begin{pmatrix} 1 & \frac{\beta\sqrt{1-\lambda_n^2}\, v_1(k+1)}{R_{11}(k)} \\ -\frac{\beta\sqrt{1-\lambda_n^2}\, v_1(k+1)}{R_{11}(k)} & 1 \end{pmatrix}
\begin{pmatrix} R_{1j}(k) \\ \sqrt{1-\lambda_s^2}\, x_j(k+1) \end{pmatrix}\right|_{\beta\to 0}
= \begin{pmatrix} R_{1j}(k) \\ \sqrt{1-\lambda_s^2}\, x_j(k+1) \end{pmatrix}, \tag{6.17}
\]


Figure 6.3: QRD–RLS scheme for acoustic noise cancellation; the classification into noise or signal+noise is done sample by sample. Flag g shows whether the transformation is based upon a signal+noise or noise–only sample, and flag f is carried together with the sample through the SFG and shows whether the sample stems from a signal+noise or a noise–only period.


and rule 3 is proven by

\[
\begin{pmatrix} R_{1j}(k+1) \\ v'_j(k) \end{pmatrix}
= \left.\begin{pmatrix} \cos\theta_1(k) & \sin\theta_1(k) \\ -\sin\theta_1(k) & \cos\theta_1(k) \end{pmatrix}
\begin{pmatrix} R_{1j}(k) \\ \beta\sqrt{1-\lambda_n^2}\, v_j(k) \end{pmatrix}\right|_{\beta\to 0}
= \begin{pmatrix} \cos\theta_1(k) & 0 \\ -\sin\theta_1(k) & 0 \end{pmatrix}
\begin{pmatrix} R_{1j}(k) \\ \sqrt{1-\lambda_n^2}\, v_j(k) \end{pmatrix}. \tag{6.18}
\]

It should be noted that the rotations computed from input vector components that are multiplied by β do not have any effect on the elements of R(k). Those components can be considered to be zero as far as the updates for R(k) are concerned. For the updates of R(k), the input signal can therefore be considered to be (time running along the horizontal axis)

    * * *
    * * *    } S+N updates
    0 * *
    0 0 *    } N -> S+N
    0 0 0
    0 0 0    } noise-only updates
    * 0 0
    * * 0    } S+N -> N
    * * *
    * * *    } S+N updates

In this structure, a pre– and post–windowing of the signal that arrives in the estimation process for the correlation matrix square root R(k) is recognized.

Based on these transformation rules, Figure 6.1 and Figure 6.2 (with per–vector classification) may be turned into Figure 6.3 (with per–sample classification). An important aspect is that this scheme provides — as one can easily verify — an R(k) which effectively corresponds to the triangular factor obtained with Figure 2.4 (plain QR–updating) when fed with the same sequence of input samples, be it that all noise–only samples are set to zero (pre– and post–windowing). In addition, the right hand side B(k) = R^{-T}(k)V^T(k)V(k) effectively has the same V(k) as in the per–vector classification case, i.e. it consists of (full) noise–only input vectors (as it should be).

6.3.2 LSL–algorithm

The signal flow graph of Figure 6.3 is readily transformed into a QRD–LSL type signal flow graph, as shown in Figure 6.4.

Note that during a signal+noise to noise–only transition, no residuals can be calculated. In practice, this means that there is no correct noise estimate available, and this is audible as a small click in the output signal. One could run a parallel lattice during transitions in order to be able to generate residuals. Since some of the transformations


are void during the transition (the right hand side need not be updated in that part of the graph where the Gauss transformations have already been introduced), this does not double the complexity. Alternatively, one could just insert 'comfort noise' instead (e.g. from a noise buffer), since the transitions are very short in time.

The complete algorithm is shown inFigure 6.4 where each sample at the input isaccompanied by a flagf which is carried through the SFG along with the signal,and a flagg which accompanies the rotations. Flagf indicates whether the signalstems from a noise–only period or from a signal+noise–period, whileg indicates if thetransformation was calculated based upon a sample from a noise–only period or not.The combination of these flags determines whether a hexagon should be a Givens– ora Gauss–transformation according to the rules given in section 6.3.1. An algorithmdescription is given in section 16.

The full specification can be found in Algorithm 16.

6.4 Transitions

In this section we will look in detail at what happens 'internally' in the algorithm during the different transitions. This information is not strictly necessary to implement the algorithm, but it helps to understand its internal operation.

6.4.1 Transition from speech+noise to noise–only mode

We will now clarify the internals of the algorithm, more specifically during transitions. First we explain how the Givens–rotations are changed into Gauss–rotations during speech+noise to noise–only transitions. At the last sample of a speech+noise period, the first sample from a noise period enters the signal flow graph of the lattice algorithm, since the QRD–LSL 'looks ahead' one sample. Figure 2.5 can be redrawn as in Figure 6.5 because of rule 2 (a rotation in the upper left corner has no effect) and rule 3 (as shown in the figure). Note that all rotations which are passed to the right hand side are still computed as they were in the signal+noise period. An equivalent QRD–RLS scheme would at this time still operate in signal+noise mode, and generate the same rotations and residuals (cf. Figure 6.1).

When more noise–only samples enter the graph, it can be redrawn as in Figure 6.6. The equivalent triangular scheme during the transition is shown in Figure 6.7.

It should be noted that in the upper part of Figure 6.6, where the rotations are computed from input samples from the noise–only period, Gauss–transformations (arrows filled with horizontal lines) are already introduced into the signal flow graph, although they are not yet used to update the right hand side.


Figure 6.4: QRD–LSL based unconstrained optimal filtering for acoustic noise suppression. The hexagons are either Givens–rotations or Gauss–rotations, depending upon the flags which designate whether the sample / rotation stems from a noise–only period or from a signal+noise period in the signal. [Signal flow graph not reproduced.]


Algorithm 16: Fast QRD–LSL based noise cancellation algorithm

QRDLSLNoise(x, mode)
    PiCos = 1; xl = x; xr = x; delay[0] = x;
    if mode = Noise { bIn = delay[1] } else { bIn = 0 }
    for (int i = 0; i < N; i++)
        dxl = delay[i+1]; dxr = dxl;
        if (mode = Signal)
            Givens = ComputeGivens(Comp2[i], dxr, dWeight)
            ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, dWeight, PiCos)
            Givens = ComputeGivens(Comp1[i], xl, dWeight)
            ApplyGivens(Givens, Rot1[i], dxl, dWeight)
        if (mode = Noise)
            // Note: different transformations on Rot2/xr and b/bIn!
            Gauss = ComputeGauss(Comp2[i], dxr)
            ApplyGauss(Gauss, Rot2[i], xr, b[i], bIn, dNoiseWeight)
            Gauss = ComputeGauss(Comp1[i], xl)
            ApplyGauss(Gauss, Rot1[i], dxl)
        if (mode = SigToNoise)
            if xr.IsFromNoisePeriod and dxr.IsFromNoisePeriod
                // Left hand side still weighted during transition
                Gauss = ComputeGauss(Comp2[i], dxr, dWeight)
                ApplyGauss(Gauss, Rot2[i], xr, dWeight)
                Gauss = ComputeGauss(Comp1[i], xl)
                ApplyGauss(Gauss, Rot1[i], dxl)
            else if xr.IsFromNoisePeriod and not dxr.IsFromNoisePeriod
                xr = 0; // dxl is not changed here
                Givens = ComputeGivens(Comp2[i], dxr, dWeight)
                ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, dWeight, PiCos)
            else if not xr.IsFromNoisePeriod and not dxr.IsFromNoisePeriod
                Givens = ComputeGivens(Comp2[i], dxr, dWeight)
                ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, dWeight, PiCos)
                Givens = ComputeGivens(Comp1[i], xl, dWeight)
                ApplyGivens(Givens, Rot1[i], dxl, dWeight)
        if (mode = NoiseToSig)
            if this is the first sample in a NoiseToSignal transition
                Gauss = ComputeGauss(Comp2[i], dxr)
                ApplyGauss(Gauss, b[i], bIn, dNoiseWeight) // xr is not modified in this case!
                dxl = 0;
                Givens = ComputeGivens(Comp1[i], xl, dWeight)
                ApplyGivens(Givens, Rot1[i], dxl, dWeight)
            else // as in a Signal period:
                Givens = ComputeGivens(Comp2[i], dxr, dWeight)
                ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, dWeight, PiCos)
                Givens = ComputeGivens(Comp1[i], xl, dWeight)
                ApplyGivens(Givens, Rot1[i], dxl, dWeight)
        xl = xr;
    for (int i = N-2; i >= 0; i--) { delay[i+1] = delay[i] }
    return bIn * dPiCos;


Figure 6.5: When u(k+1) is a noise sample (multiplied by β → 0), the lattice signal flow graph reduces to the form shown here. [Signal flow graph not reproduced.]


Figure 6.6: After some noise samples have entered the scheme, the upper part can be loaded with the Gauss–transformations in order to keep the shift structure. [Signal flow graph not reproduced.]


Figure 6.7: The equivalent triangular scheme when the first noise–only sample enters. [Signal flow graph not reproduced.]


When all rotations are replaced by Gauss–rotations, the transition is finished, and one can start using the rotations to update the right hand side (as described in section 6.2.2). From that time on, the weighting of the memory elements is also stopped, because (since they are not updated anymore during the noise period) they would otherwise become too small after a while.

6.4.2 Transition from a noise–only to a speech+noise–period

This transition takes only one sample, and it has no effect on the residual extraction. In a first step, when the first signal+noise sample enters the (one step look ahead) input of the lattice, we can redraw the signal flow graph as shown in Figure 6.8. From now on, weighting is again switched on for all memory elements.

Figure 6.8: First step in the transition from noise–only to speech+noise. The residuals are computed based on Gauss–rotations. [Signal flow graph not reproduced.]

At the next time instant, all rotations are replaced by Givens–rotations, and we again obtain the scheme of Figure 2.5.


6.5 Noise reduction vs. signal distortion trade–off

In the QRD–RLS based algorithm of chapter 5, we introduced a regularization parameter that allows for tuning of signal distortion versus noise reduction. In this section we will do the same for the QRD–LSL based algorithm. Two alternatives will be described, the first one (section 6.5.2) comparable to the technique used in chapter 5 for QRD–RLS, the other one (section 6.5.3) based upon continuously updating the signal correlation matrix, even during noise periods. The latter leads to a 'self–tuning' trade–off parameter, which provides infinite noise reduction during noise–only periods, and a well regularized algorithm during speech+noise periods.

6.5.1 Regularization in QRD–LSL based ANC

In section 5.4 we have shown how a regularization term µ can be introduced in a QRD–RLS based system for ANC, see equation (5.10). This has led to the following update equation:

\[
\begin{pmatrix} 0 & r_2^T(k+1) \\ 0 & r_1^T(k+1) \\ R(k+1) & B(k+1) \end{pmatrix}
= Q^T(k+1)
\begin{pmatrix} \sqrt{1-\lambda_s^2}\, x^T(k+1) & 0 \\ \sqrt{1-\lambda_s^2}\, \mu^2 v^T(k) & 0 \\ \lambda_s R(k) & \lambda_s B(k) \end{pmatrix}. \tag{6.19}
\]

Here v(k) is taken from a noise buffer. The residual signal r_2(k+1) may be used to generate residuals, while the residual signal r_1(k+1) should be discarded. During noise–only periods, the updates for B(k) remain the same as in the non–regularized case.

It is important to see that the property on which the derivation of fast RLS schemes is based, namely the shift structure of the input signal, is no longer present in this case (where two consecutive updates are applied). Each x(k) is a shifted version of x(k−1), and each v(k) is a shifted version of v(k−1). But since they are applied to the left hand side of the signal flow graph intermittently, each input vector is not a shifted version of the previous one. Effectively the input vectors now correspond to a (weighted) block Toeplitz structure instead of just a Toeplitz structure.
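This loss of shift structure can be made concrete with a small numpy sketch (illustrative only; `delay_vector` and the interleaving order are assumptions for the example):

```python
import numpy as np

def delay_vector(u, k, N):
    """Input vector [u(k), u(k-1), ..., u(k-N+1)], zero-padded for k < N-1."""
    return np.array([u[k - i] if k - i >= 0 else 0.0 for i in range(N)])

rng = np.random.default_rng(1)
u = rng.standard_normal(8)    # one input channel
N = 3
X = np.array([delay_vector(u, k, N) for k in range(len(u))])
# Toeplitz shift structure: each row is the previous row shifted one position.
assert np.allclose(X[1:, 1:], X[:-1, :-1])

# Interleaving a second sequence v(k) (regularization noise) breaks this:
v = rng.standard_normal(8)
rows = []
for k in range(len(u)):
    rows.append(delay_vector(u, k, N))
    rows.append(delay_vector(v, k, N))
X2 = np.array(rows)
assert not np.allclose(X2[1:, 1:], X2[:-1, :-1])  # no single-row shift relation
```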

Equation (6.19) can be implemented in signal flow graphs like Figure 6.1 and Figure 6.2 by applying both updates 'at the same time'. This is realized by replacing each single hexagon in the signal flow graph with two hexagons. The first one performs the rotation with the input signal, and the second one subsequently performs the rotation with the regularization noise. This is shown in Figure 6.9 for the hexagons representing Givens rotations. The same substitution can be applied for the hexagons that represent Gauss rotations. As a result, since the number of hexagons doubles, for each rotation parameter generated and applied in the original scheme (the thick gray arrows), two of them are now generated and applied in the modified scheme.

Figure 6.9: Doubling the lines and the hexagons in the signal flow graphs. [Diagram not reproduced.]

We will now describe two alternatives for implementing regularization in a QRD–LSL based noise cancellation algorithm. The first implementation uses a noise buffer, and is based upon the QRD–LSL based noise cancellation algorithm derived earlier in this chapter (Algorithm 16).

The second method is based upon a standard QRD–LSL adaptive filter. It avoids the use of a noise buffer, and it provides a regularization mechanism which puts more emphasis on noise cancellation during noise–only periods.

6.5.2 Regularization using a noise buffer

The fast algorithm which incorporates regularization can be straightforwardly derived from Figure 6.1 and Figure 6.2, modified as described in Figure 6.9.

In Figure 6.10, the complete scheme is shown. Compare this scheme to Figure 6.4 and note that the thick lines are in fact 'vector signals' which carry 2–vectors with both signal– and regularization–noise samples, corresponding to Figure 6.9. Note also the extra column on the right hand side, which calculates the residuals based upon the memory elements in the one–but–last right hand side column. The black arrows which depict the rotations are demultiplexed just before this column, and the rotations stemming from updates with regularization noise are discarded (cf. Figure 6.9). This corresponds to discarding the residual r_1(k+1) in (6.19), and only retaining r_2(k+1).

During speech+noise/echo mode, an update is done with left hand side microphone inputs x(k) and left hand side regularization inputs µ²v(k) (taken from a noise buffer). The right hand side input is 0.


Figure 6.10: Regularization in QRD–LSL based noise cancellation using a noise buffer. [Signal flow graph not reproduced.]


During noise/echo mode, both the left hand side inputs and the right hand side inputs are v(k).

An algorithm description is given in Algorithm 17.

6.5.3 Mode–dependent regularization

As an alternative, we propose not to keep R(k) fixed during noise–only periods, but to update it continuously, be it with a forgetting factor λ_s (long window) during speech+noise periods and with a forgetting factor λ_n (short window) during noise–only periods. In this case, the statistics of the near–end (desired speech) component are 'forgotten' by the weighting scheme during noise–only periods, but experiments show that this approach delivers good results for ANC. Simulations will be given in section 7.7, where this algorithm is applied for combined noise/echo cancellation. The statistics from the noise–only period (estimated with a short window) then serve as a good 'starting value' for the estimation of the speech+noise statistics during speech+noise periods (with a long window).

Because the statistics of the near–end source are indeed forgotten during noise periods, the speech signal sounds a bit 'muffled' at the beginning of a speech+noise period. But when the forgetting factors are chosen appropriately, this can be reduced to a hardly noticeable level.

A feature of this approach is that during noise–only periods, the system output is reduced to zero. This can be understood as follows. If we allow R^T(k)R(k) to be updated during noise periods as well, as we propose here, the influence of the speech+noise covariance estimate will gradually be 'forgotten' during noise periods. This in fact corresponds to increasing µ in (5.10). The estimate will now converge to V^T(k)V(k), which corresponds to µ → ∞ (hence W_N → I and W → 0) during noise–only periods.
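A scalar toy example (a sketch, not the thesis algorithm; the forgetting factors are shortened so the effect shows within a few thousand samples) illustrates how the short noise window makes the estimate forget the speech statistics:

```python
import numpy as np

lam_s, lam_n = 0.999, 0.99     # toy values; the text uses 0.999999 / 0.9997
rng = np.random.default_rng(2)

def ew_update(P, sample, lam):
    """Exponentially weighted power estimate, a scalar analogue of updating R^T R."""
    return lam**2 * P + sample**2

P = 0.0
for _ in range(5000):          # speech+noise period: sample power ~ 10
    P = ew_update(P, np.sqrt(10.0) * rng.standard_normal(), lam_s)
for _ in range(2000):          # noise-only period: sample power ~ 1, short window
    P = ew_update(P, rng.standard_normal(), lam_n)

noise_est = P * (1 - lam_n**2)  # steady-state normalization of the exponential window
assert 0.2 < noise_est < 5.0    # near the noise power 1.0; the speech power 10 is forgotten
```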

On the other hand, during speech+noise periods, the regularization effect will gradually be forgotten, resulting in a slight increase of the noise level during speech+noise periods, in exchange for a less distorted signal. Hence there may be a slightly distorted speech signal at the beginning of a speech+noise period.

This procedure can be thought of as a trade–off system that regulates itself: when there is no near–end activity, it provides infinite noise reduction, but when a near–end signal is present, the signal quality gains importance in the optimisation. A listening test shows that good results can be achieved after some tuning of the parameters.

In order to derive a QRD–LSL based algorithm that continuously updates R(k), we proceed as follows.


Algorithm 17: QRD–LSL noise reduction algorithm with regularization. Inputs are 'mode' and 'x', the input vector.

    PiCos = 1; xl = x; xr = x; delay[0] = x;
    if mode = Noise { bIn = delay[1]; add x to noise buffer; bn = 0 }
    else { bIn = 0; bn = get noise vector from noise buffer; bn = µ * bn }
    extra = bIn; bnl = bn; bnr = bn; ndelay[0] = bn
    for (int i = 0; i < N; i++)
        dxl = delay[i+1]; dxr = dxl; dbnl = ndelay[i+1]; dbnr = dbnl;
        if (mode = Signal)
            Givens = ComputeGivens(Comp2[i], dxr, dWeight)
            ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, extra, dWeight, PiCos)
            Givens = ComputeGivens(Comp1[i], xl, dWeight)
            ApplyGivens(Givens, Rot1[i], dxl, dWeight)
            Givens = ComputeGivens(Comp2[i], dbnr, 1)
            ApplyGivens(Givens, Rot2[i], bnr, b[i], bIn, 1)
            Givens = ComputeGivens(Comp1[i], bnl, dWeight)
            ApplyGivens(Givens, Rot1[i], dbnl, 1)
        if (mode = Noise) // different transformations on Rot2/xr and z/zIn
            Gauss = ComputeGauss(Comp2[i], dxr)
            ApplyGauss(Gauss, Rot2[i], xr, b[i], bIn, dNoiseWeight)
            Gauss = ComputeGauss(Comp1[i], xl)
            ApplyGauss(Gauss, Rot1[i], dxl)
        if (mode = SigToNoise) // left hand side still weighted during transition
            if xr.IsFromNoisePeriod and dxr.IsFromNoisePeriod
                Gauss = ComputeGauss(Comp2[i], dxr, dWeight)
                ApplyGauss(Gauss, Rot2[i], xr, dWeight)
                Gauss = ComputeGauss(Comp1[i], xl)
                ApplyGauss(Gauss, Rot1[i], dxl)
            else if xr.IsFromNoisePeriod and not dxr.IsFromNoisePeriod
                xr = 0; dbnr = 0; // dxl is not changed here
                Givens = ComputeGivens(Comp2[i], dxr, dWeight)
                ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, extra, dWeight, PiCos)
                Givens = ComputeGivens(Comp2[i], dbnr, dWeight)
                ApplyGivens(Givens, Rot2[i], bnr, b[i], bIn, dWeight)
            else if not xr.IsFromNoisePeriod and not dxr.IsFromNoisePeriod
                Givens = ComputeGivens(Comp2[i], dxr, dWeight)
                ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, extra, dWeight, PiCos)
                Givens = ComputeGivens(Comp1[i], xl, dWeight)
                ApplyGivens(Givens, Rot1[i], dxl, dWeight)
                Givens = ComputeGivens(Comp2[i], dbnr, 1)
                ApplyGivens(Givens, Rot2[i], bnr, b[i], bIn, dWeight)
                Givens = ComputeGivens(Comp1[i], bnl, 1)
                ApplyGivens(Givens, Rot1[i], dbnl, 1)
        if (mode = NoiseToSig) // xr is not modified in this case
            if this is the first sample in a NoiseToSignal transition
                Gauss = ComputeGauss(Comp2[i], dxr)
                ApplyGauss(Gauss, z[i], zIn, dNoiseWeight)
                dxl = 0; dbnl = 0;
                Givens = ComputeGivens(Comp1[i], xl, dWeight)
                ApplyGivens(Givens, Rot1[i], dxl, dWeight)
                Givens = ComputeGivens(Comp1[i], bnl, 1)
                ApplyGivens(Givens, Rot1[i], dbnl, 1)
            else // as in a Signal period:
                Givens = ComputeGivens(Comp2[i], dxr, dWeight)
                ApplyGivens(Givens, Rot2[i], xr, b[i], bIn, extra, dWeight, PiCos)
                Givens = ComputeGivens(Comp1[i], xl, dWeight)
                ApplyGivens(Givens, Rot1[i], dxl, dWeight)
                Givens = ComputeGivens(Comp2[i], dbnr, dWeight)
                ApplyGivens(Givens, Rot2[i], bnr, b[i], bIn, dWeight)
                Givens = ComputeGivens(Comp1[i], bnl, dWeight)
                ApplyGivens(Givens, Rot1[i], dbnl, dWeight)
        xl = xr; bnl = bnr;
    for (int i = N-2; i >= 0; i--) { delay[i+1] = delay[i]; ndelay[i+1] = ndelay[i] }
    return extra * dPiCos;


We can write system (6.2) with β = 1 as

\[
\begin{pmatrix} X(k) \\ V(k) \end{pmatrix} W_N = \begin{pmatrix} 0 \\ V(k) \end{pmatrix}.
\]

The normal equations are

\[
\left( X^T(k)X(k) + V^T(k)V(k) \right) W_N = V^T(k)V(k).
\]
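As a numerical sanity check of these normal equations (a numpy sketch on batch data; the recursive exponentially weighted estimates of the actual algorithm are replaced by plain sums):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
X = 5.0 * rng.standard_normal((500, n))   # "speech+noise" rows, deliberately dominant
V = rng.standard_normal((300, n))         # "noise-only" rows

# Normal equations of the combined system: (X^T X + V^T V) W_N = V^T V
W = np.linalg.solve(X.T @ X + V.T @ V, V.T @ V)
assert np.linalg.norm(W, 2) < 0.1         # speech-dominant data keeps W_N small

# When the X^T X term is completely forgotten, only V^T V W_N = V^T V remains:
W_noise_only = np.linalg.solve(V.T @ V, V.T @ V)
assert np.allclose(W_noise_only, np.eye(n))   # W_N -> I, a 'perfect' noise estimate
```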

During speech+noise periods, the term V^T(k)V(k) in the left hand side becomes unimportant due to the weighting, and the system converges to

\[
X^T(k)X(k)\, W_N = V^T(k)V(k),
\]
\[
\left( V^T(k)V(k) + D^T(k)D(k) \right) W_N = V^T(k)V(k),
\]

where D(k) is the desired speech signal. In a QRD–LSL filter, this is achieved by first weighting both the left hand side and the right hand side with λ_s, and then applying a left hand side input u(k) and a right hand side input 0.

During noise–only periods, the term X^T(k)X(k) in the left hand side will be 'forgotten', and the system converges to

\[
V^T(k)V(k)\, W_N = V^T(k)V(k),
\]

such that after convergence

\[
W_N = I,
\]

providing a 'perfect' noise estimate.

In this mode, both the left and the right hand side of the QRD–LSL adaptive filter are weighted with λ_n, and the input v^T(k) is applied to the left hand side as well as to the right hand side.

Note that the left hand side R(k) and the right hand side B(k) are updated together. This is possible since during noise–only periods, R(k) converges to the Cholesky factor of the noise correlation matrix (because the desired speech statistics are forgotten due to the weighting). After convergence, we can write the right hand side during noise–only periods as

\[
B(k) = R^{-T}(k) V^T(k) V(k) = R^{-T}(k) R^T(k) R(k) = R(k).
\]

So the right hand side converges to R(k) (or, in a practical implementation, to the column of R(k) which corresponds to the chosen right hand side). This can indeed be achieved by applying the input vectors to the left hand side and the right hand side together.
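This convergence result is easy to verify numerically: once R(k) equals the Cholesky factor of V^T(k)V(k), solving R^T B = V^T V returns R itself (numpy sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
V = rng.standard_normal((100, 3))
C = V.T @ V                      # noise correlation matrix V^T V
R = np.linalg.cholesky(C).T      # upper-triangular factor with C = R^T R

B = np.linalg.solve(R.T, C)      # B = R^{-T} V^T V
assert np.allclose(B, R)         # the right hand side indeed converges to R(k)
```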


Transitions between modes. When R(k) is continuously updated, the choice of λ_s and λ_n is very important. During speech+noise periods, λ_s should be chosen close enough to 1 (e.g. λ_s = 0.999999 for an 8000 Hz sampling rate). During noise–only periods, λ_n can be chosen smaller (shorter window) for many types of noise (e.g. λ_n = 0.9997 for an 8000 Hz sampling rate), so that convergence to the noise estimate during noise–only periods is very fast. On transitions between modes, the weighting is switched between λ_n and λ_s. In a QRD–LSL filter, the shift structure in the input signal must be maintained. This is not the case if classification into signal+noise and noise–only periods is done on a per–input–vector basis.
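To get a feel for these values, the effective window length of an exponential weight λ is roughly 1/(1−λ) samples (a standard rule of thumb, not stated in the text): λ_s = 0.999999 corresponds to about 10^6 samples (two minutes of signal at 8 kHz), while λ_n = 0.9997 corresponds to about 3300 samples (roughly 0.4 s), which explains the fast convergence during noise–only periods.

```python
# Effective memory of an exponential window with forgetting factor lam is
# roughly 1/(1 - lam) samples (rule of thumb, not from the text).
fs = 8000.0
for name, lam in [("lambda_s", 0.999999), ("lambda_n", 0.9997)]:
    n_eff = 1.0 / (1.0 - lam)
    print(f"{name} = {lam}: ~{n_eff:.0f} samples ({n_eff / fs:.2f} s at 8 kHz)")
```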

We can solve this by introducing a pre– and post–windowing scheme for the input vectors: on a transition, we feed N zeroes to the algorithm's input, and then switch the weighting parameters. This means that the residual signal is wrong during the transition period, but the estimates of R(k) in the algorithm remain correct.

The lack of a correct output during transitions can be solved by inserting comfort noise. On the other hand, experiments show that good results are obtained when the pre– and post–windowing is ignored and the weighting factors are simply switched on a transition. Errors that are introduced in the estimate of R(k) in this way will be 'forgotten' by the weighting.

For an algorithm description, we refer to the QRD–LSL algorithm (Algorithm 6) with inputs as described in this section.

6.6 Complexity

In the complexity calculations, an addition and a multiplication are counted as two separate floating point operations. Table 6.1 shows the complexities of different optimal filtering algorithms, and Table 6.2 shows the complexities for some typical parameter settings. The QRD–LSL algorithm has a significantly lower computational complexity than the GSVD–based and QRD–RLS–based algorithms, especially when long filters are used (rightmost column). This makes the QRD–LSL algorithm suited for real time implementation.


Algorithm                           Mode           Complexity
recursive GSVD [12][13]                            27.5 (MN)^2
Full QRD (chapter 5)                Noise–only     (MN)^2 + 3MN + M
Full QRD (chapter 5)                Speech+noise   3.5 (MN)^2 + 15.5 MN + M + 2
Fast QRD–LSL                        Noise–only     6 M^2 N
Fast QRD–LSL                        Speech+noise   (21N − 21/2) M^2 + 19MN − (7/2) M
Fast QRD–LSL reg. (section 6.5.2)   Speech+noise   2 ((21N − 21/2) M^2 + 19MN − (7/2) M)
QRD–LSL (section 6.5.3)                            (21N − 21/2) M^2 + 19MN − (7/2) M

Table 6.1: Complexities in flops per sample of different algorithms.

Algorithm                 Mode           N = 20, M = 5   N = 50, M = 2
recursive GSVD [12][13]                  275 000         275 000
Full QRD (chapter 5)      Noise–only     10 305          10 302
Full QRD (chapter 5)      Speech+noise   36 557          36 554
Fast QRD–LSL              Noise–only     3 000           1 200
Fast QRD–LSL              Speech+noise   12 120          6 051

Table 6.2: Complexities in flops per sample for typical parameter settings. These figures make the QRD–LSL algorithm suited for real time implementation.
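The entries of Table 6.2 follow directly from the formulas of Table 6.1; a quick check in Python (flop counts only):

```python
# Flop-count formulas from Table 6.1 (per sample), as functions of M and N.
def full_qrd_noise(M, N):   return (M*N)**2 + 3*M*N + M
def full_qrd_speech(M, N):  return 3.5*(M*N)**2 + 15.5*M*N + M + 2
def fast_lsl_noise(M, N):   return 6 * M**2 * N
def fast_lsl_speech(M, N):  return (21*N - 21/2) * M**2 + 19*M*N - (7/2)*M

# N = 20, M = 5 (left column of Table 6.2)
assert full_qrd_noise(5, 20) == 10305
assert full_qrd_speech(5, 20) == 36557
assert fast_lsl_noise(5, 20) == 3000
assert fast_lsl_speech(5, 20) == 12120

# N = 50, M = 2 (right column of Table 6.2)
assert full_qrd_noise(2, 50) == 10302
assert full_qrd_speech(2, 50) == 36554
assert fast_lsl_noise(2, 50) == 1200
assert fast_lsl_speech(2, 50) == 6051
```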

6.7 Simulation results

For the simulations we used a simulated room environment with 4 microphones, a desired speaker at broadside angle and a noise source at 45 degrees. The signals are short sentences, recorded at 8 kHz.

Figure 6.11 compares the QRD–LSL based optimal filtering method (without regularization) to the GSVD–based optimal filtering method. The QRD–LSL based algorithm achieves roughly the same performance as the GSVD–based method, as expected.

Figure 6.11: Performance comparison of GSVD–based optimal filtering (dotted) and QRD–LSL based optimal filtering (full line) versus the original signal (dashed). The performance is equal. The energies of the signals are plotted. In the middle of the plot, a speech segment is recognized. The algorithm 'sees' the silence at the beginning of the plot as a speech segment as well, in order to provide a fair indication of the noise reduction during speech periods. [Plot not reproduced; axes: Time [samples] vs. Energy [dB].]

6.8 Conclusion

We derived a fast QRD–least squares lattice (QRD–LSL) based unconstrained optimal filtering algorithm for multichannel ANC. The derivation of the QRD–LSL algorithm is based on a significantly reorganized version of the QRD–RLS based unconstrained optimal filtering scheme of chapter 5. We have explicitly set up the transitions between speech+noise and noise–only periods in such a way that the correlation matrices that are implicitly stored in this fast algorithm correspond to the correlation matrices in the QRD–RLS based algorithm, which assures that the 'internal status' of the algorithm is always correct. For typical parameter settings, an 8–fold complexity reduction is obtained compared to the QRD–RLS based algorithm, without any performance penalty. This makes the approach affordable for real time implementation.

Some methods for incorporating regularization were also introduced, allowing more noise reduction to be obtained in exchange for some signal distortion.


Chapter 7

Integrated noise and echo cancellation

In this chapter, we describe an approach to speech signal enhancement where acoustic echo cancellation and noise reduction, which are traditionally handled separately, are combined in one integrated scheme. The optimization problem defined by this scheme is solved adaptively using the QRD–based algorithms which were developed in the previous chapters. We show that the performance of the integrated scheme is superior to the performance of traditional (cascade) schemes, while the complexity is kept at an affordable level.

7.1 Introduction

An acoustic echo canceller (AEC) traditionally uses an adaptive filter with a large number of filter taps, for instance 1000 taps for a signal sampled at 8000 Hz. The reason is that it aims to model the (first part of the) acoustic impulse response of the room, in this case the first 125 msec. Because of the length of the filter, one often has to resort to cheap algorithms (frequency-domain NLMS, for example) in order to keep complexity manageable.

Acoustic noise cancellers (ANC) typically have shorter filters; delay-and-sum beamformers, for example, do not 'model' the room impulse response, but are designed to have a certain spatial sensitivity pattern, which can be obtained with relatively short filter lengths. We are interested in ANC schemes that use multiple channels of audio (multiple microphones) in order to exploit both the spatial and the spectral characteristics of the desired and disturbing signals.


In many applications, for example teleconferencing systems, hands-free telephone sets or voice-controlled systems, one has to combine acoustic echo and noise cancellation (AENC). Many different AENC schemes can be found in the literature [1, 7, 37, 38, 6]. Obviously, the two blocks can be combined in two ways, as shown in Figure 7.1: either one applies echo cancellation on each of the microphone channels before the noise reduction block, or one applies a single echo canceller on the output signal of the noise reduction block. The latter scheme has the advantage of reduced complexity, but studies have shown that the former combination (first AEC, then ANC) has better performance.


Figure 7.1: Two ways to combine an acoustic echo canceller with a multichannel noise reduction system. Left: first noise reduction, then echo cancellation on the ANC output. Right: first an echo canceller on each channel, then noise reduction on the residual signals.

The mere combination (cascading) of these schemes has implications for the performance of the overall system. When AEC filters are applied to each channel before the ANC, the adaptive algorithms used in the AEC must be robust against the noise in the microphone signals (a problem, for example, for filters based upon affine projection; see chapter 3 and [27]). It is then also the ANC's task to remove the residual echo independently of the AEC. If, on the other hand, ANC is applied before AEC, the ANC is fed a signal that also contains the far-end echo signal, and the AEC has to track both the (changing) acoustic path of the room and the changes in the ANC filter. Both combination schemes thus clearly have performance disadvantages.

In this chapter, we propose to combine the AEC and the ANC into one single optimisation problem which is then solved adaptively, see Figure 7.2. This leads to a better overall performance. It will be shown that the length Naec of the 'AEC part' of the integrated scheme can be reduced significantly compared to the filter length in traditional echo cancellers, without incurring a major performance loss. The reduced filter length then allows us to use more advanced adaptive algorithms, which have better convergence properties than e.g. NLMS. The algorithms in this chapter are based upon the QRD-based unconstrained optimal filtering methods for ANC described in chapters 5 and 6.


If multichannel acoustic echo cancellation were required, one would face the same problems as in chapter 3: decorrelation techniques would again have to be applied to remove the correlation between the loudspeaker signals. In this chapter we abstract from this issue and demonstrate the combined approach with mono echo cancellation.

The outline of this chapter is as follows. In section 7.2 we describe the setting and the optimization that we will perform. In section 7.3 the estimates for the statistics are described. Section 7.4 describes the QRD–RLS based algorithm implementing the optimization, and section 7.5 describes the QRD–LSL approach. In section 7.6 we describe how regularization (a trade-off parameter) is introduced. Section 7.7 evaluates the performance of the combined acoustic echo and noise canceller, section 7.8 gives complexity figures, and section 7.9 concludes.


Figure 7.2: Combined AENC scheme. The filters in the M microphone paths have length N; the filter connected to the far-end path has length Naec. The (unknown) desired near-end speech signal is d.

7.2 Optimal filtering based AENC

Referring to Figure 7.2, the near-end speech component in the i'th microphone at time k is

d_i(k) = h_i(k) ⊗ s(k),   i = 1 … M,   (7.1)

where M is the number of microphones, s(k) is the near-end signal and h_i(k) represents the acoustic path between the speech source and microphone i. The echo signals


are

e_i(k) = h_i^e(k) ⊗ f(k),   i = 1 … M,

where f(k) is the loudspeaker signal and h_i^e(k) is the acoustic path between the loudspeaker and the i'th microphone.
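As an illustration of this signal model, the microphone components can be synthesized by convolution; a minimal sketch with hypothetical impulse responses and white stand-in signals (all names, lengths and amplitudes are illustrative, not the thesis setup):

```python
import numpy as np

rng = np.random.default_rng(0)
M, L = 2, 1000                              # microphones, samples (illustrative)
s = rng.standard_normal(L)                  # near-end speech s(k) (stand-in)
f = rng.standard_normal(L)                  # far-end loudspeaker signal f(k)
n = 0.1 * rng.standard_normal((M, L))       # noise n_i(k) at each microphone

# hypothetical short impulse responses: h_i (speech path), h_i^e (echo path)
h = rng.standard_normal((M, 8))
he = rng.standard_normal((M, 8))

# d_i(k) = h_i(k) (*) s(k)  and  e_i(k) = h_i^e(k) (*) f(k)
d = np.array([np.convolve(h[i], s)[:L] for i in range(M)])
e = np.array([np.convolve(he[i], f)[:L] for i in range(M)])

x = d + n + e        # speech+noise/echo mode: x_i = d_i + n_i + e_i
xo = n + e           # noise/echo-only mode:   x'_i = n_i + e_i = v_i
```

The two modes used below differ only in the presence of the speech component d.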

An assumption we will make is that both the noise and the echo signal are continuously present, which effectively means that in the resulting scheme filter adaptation will be frozen during off-periods of the echo and/or noise (see below). We can then distinguish two modes in the input signals: first the speech+noise/echo mode, for which we denote the microphone samples by x(k), and second the noise/echo-only mode, for which we write the inputs as x′(k). The i'th microphone signal during a speech+noise/echo period is

x_i(k) = d_i(k) + n_i(k) + e_i(k),   i = 1 … M
       = d_i(k) + v_i(k),

and during a noise/echo–only period

x′_i(k) = n_i(k) + e_i(k)
        = v_i(k),

where n_i(k) is the noise component (the sum of the contributions of all noise sources at microphone i). We define the microphone input vector

x(k) = [ x_1(k) ; x_2(k) ; … ; x_M(k) ],   x_i(k) = [ x_i(k) ; x_i(k−1) ; … ; x_i(k−N+1) ],

where N is the number of taps for each of the filters in the ANC part of the scheme. The noise/echo-only microphone signal vector x′(k), the desired speech vector d(k), the echo signal vector e(k) and the noise vector n(k) are defined in a similar way. Furthermore x(k) = d(k) + v(k) and v(k) = n(k) + e(k). The loudspeaker signal vector is

f(k) = [ f(k) ; f(k−1) ; … ; f(k−Naec+1) ],

where Naec is the number of taps in the AEC part. We define a compound signal vector during speech+noise/echo periods as

u(k) = [ x(k) ; f(k) ],


and during noise/echo-only periods

u′(k) = [ x′(k) ; f(k) ].

The following assumptions are made:

• The noise and echo signals are uncorrelated with the speech signal. This results in

ε{x(k)x^T(k)} = ε{d(k)d^T(k)} + ε{cross terms} + ε{v(k)v^T(k)},   with ε{cross terms} = 0,

so that

ε{d(k)d^T(k)} = ε{x(k)x^T(k)} − ε{v(k)v^T(k)},

and ε{f(k)d^T(k)} = 0. Here ε{·} is the expectation operator.

• The noise and echo signals are stationary as compared to the near-end speech signal (by which we mean that their statistics change more slowly). This assumption allows us to estimate ε{v(k)v^T(k)} during periods in which only noise and echo are present, i.e. where x′(k) = v(k). This is a classical assumption for ANC systems, but it can be argued that it does not hold here, since the echo signal e(k) typically is not stationary. However, although the assumption is not fulfilled for the spectral content of v(k), it is true for the spatial content, since we assume that the loudspeaker which produces f(k) does not move. Experiments confirm the validity of this assumption (see section 7.7).

• The noise and echo signals are always present, while the near-end signal is only sometimes present, i.e. it is an on/off signal. One scenario in which this assumption is obviously fulfilled is a voice command application, where the echo signal is e.g. a music signal and the near-end signal consists of the voice commands. When the echo signal is also a speech signal, hence also an on/off signal, we can either switch off the adaptation during periods where the far-end signal is not present (as is done in traditional echo cancellers), or (when adaptation is not switched off) accept that the algorithm, during long off-periods of the echo signal, 'forgets' the position of the far-end loudspeaker to which it would normally, in a beamforming interpretation, attempt to steer a zero.

We can now write the optimal filtering problem as

min_{W_wf} ‖ ε{ u^T(k) W_wf − d^T(k) } ‖²_F,   (7.2)


with u(k) the filter input and d(k) the desired filter output, i.e. the (unknown) desired speech contribution in all the (delayed) microphone signals; see (7.1). The signal estimate is then

d_wf^T(k) = u^T(k) W_wf(k)
          = [ d(k) + v(k) ; f(k) ]^T W_wf(k).

The Wiener solution is

W_wf = ( ε{u(k)u^T(k)} )^{-1} ε{u(k)d^T(k)}
     = ( ε{u(k)u^T(k)} )^{-1} ε{ u(k) ( u^T(k) [ I ; 0 ] − v^T(k) ) }
     = [ I ; 0 ] − ( ε{u(k)u^T(k)} )^{-1} ε{ u(k) v^T(k) },

so that finally

W_wf = [ I ; 0 ] − ( ε{u(k)u^T(k)} )^{-1} ε{ u′(k) x′^T(k) }.   (7.3)

Here I is the identity matrix, and [ I ; 0 ] denotes I stacked on top of a zero block.
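In a batch simulation where the noise/echo component is known, (7.3) can be checked against the direct Wiener solution; a minimal numerical sketch (dimensions and signals are illustrative stand-ins, not the thesis setup):

```python
import numpy as np

rng = np.random.default_rng(1)
L, p, q = 5000, 4, 3                     # samples, ANC taps (MN), AEC taps (Naec)
D = rng.standard_normal((L, p))          # rows d^T(k): speech part
V = 0.5 * rng.standard_normal((L, p))    # rows v^T(k): noise + echo part
F = rng.standard_normal((L, q))          # rows f^T(k): far-end regressor
U = np.hstack([D + V, F])                # rows u^T(k) = (x^T(k)  f^T(k))

Ruu = U.T @ U / L                        # sample estimate of eps{u u^T}
Uo = np.hstack([V, F])                   # noise/echo-only data u'(k)
Rux = Uo.T @ V / L                       # sample estimate of eps{u' x'^T}

I0 = np.vstack([np.eye(p), np.zeros((q, p))])
W = I0 - np.linalg.solve(Ruu, Rux)       # eq. (7.3)

# direct Wiener solution, using the (normally unknown) clean speech D
W_direct = np.linalg.solve(Ruu, U.T @ D / L)
```

For finite L the two solutions differ only through sample cross-correlations between the speech and the noise/echo data, which vanish as L grows.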

We will also use a regularization term in the optimization criterion. Referring to [22], a parameter μ can be used to trade off signal distortion, defined as ( d^T(k) − ( d^T(k)  0 ) W^μ_wf(k) ), against residual noise/echo, defined as ( [ v(k) ; f(k) ]^T W^μ_wf ). We will use a similar, but slightly different approach. We define an optimization criterion

min_{W^μ_wf} ‖ ε{ u^T(k) W^μ_wf(k) − d^T(k) } ‖²_F + μ ‖ ε{ ( v^T(k)  f^T(k) ) W^μ_wf(k) } ‖²_F.   (7.4)


Now the Wiener solution is

W^μ_wf = ( ε{ u(k)u^T(k) + μ² u′(k)u′^T(k) } )^{-1} ε{ u(k) d^T(k) }
       = ( ε{ u(k)u^T(k) + μ² u′(k)u′^T(k) } )^{-1} ε{ u(k) ( u^T(k) [ I ; 0 ] − v^T(k) ) }
       = ( ε{ u(k)u^T(k) + μ² u′(k)u′^T(k) } )^{-1} ( ε{ u(k)u^T(k) } [ I ; 0 ] − ε{ u(k)v^T(k) } )
       = ( ε{ u(k)u^T(k) + μ² u′(k)u′^T(k) } )^{-1} ( ε{ u(k)u^T(k) + μ² u′(k)u′^T(k) } [ I ; 0 ]
           − ε{ μ² u′(k)u′^T(k) } [ I ; 0 ] − ε{ u(k)v^T(k) } )
       = [ I ; 0 ] − ( ε{ u(k)u^T(k) + μ² u′(k)u′^T(k) } )^{-1} ( ε{ μ² u′(k)x′^T(k) } + ε{ u′(k)x′^T(k) } ),

so that finally

W^μ_wf = [ I ; 0 ] − ( (1/(1+μ²)) ε{ u(k)u^T(k) } + (μ²/(1+μ²)) ε{ u′(k)u′^T(k) } )^{-1} ε{ u′(k)x′^T(k) }.   (7.5)

If all statistical quantities in the above formulas were available, W_wf and W^μ_wf could be computed straightforwardly. W_wf or W^μ_wf is then a matrix of which each column provides an optimal (MN + Naec)-tap filter. One of these columns can then be chosen (arbitrarily) to optimally estimate the speech part in the corresponding entry of x(k), i.e. to filter out the noise/echo in one specific (delayed) microphone signal. In practice, of course, not the whole matrix is calculated, but only a selected column of it.


7.3 Data-driven approach

A data-driven approach is based on data matrices U(k), U′(k) and X′(k), which are conveniently defined as

U(k) = √(1−λ_s²) [ u^T(k) ; λ_s u^T(k−1) ; λ_s² u^T(k−2) ; … ],   (7.6)

U′(k) = √(1−λ_n²) [ u′^T(k) ; λ_n u′^T(k−1) ; λ_n² u′^T(k−2) ; … ],   (7.7)

X′(k) = U′(k) [ I ; 0 ],

where λ_s denotes the forgetting factor for the speech+noise/echo data, and λ_n the forgetting factor for the noise/echo-only data. In order to compute (7.3), we want U^T(k)U(k) to be an estimate of ε{u(k)u^T(k)}, i.e. ε{u(k)u^T(k)} ≈ U^T(k)U(k). This is realised by the above definition of U(k), as can be verified from the corresponding update formula

U^T(k+1)U(k+1) = λ_s² U^T(k)U(k) + (1−λ_s²) u(k+1)u^T(k+1).   (7.8)

Such updates may be calculated during speech+noise/echo periods. The other estimate we need in (7.3) is

ε{ u′(k) x′^T(k) } ≅ U′^T(k) X′(k).

This is realized by the definition of U′(k) and X′(k), as can be verified from the corresponding update formula

U′^T(k+1)X′(k+1) = λ_n² U′^T(k)X′(k) + (1−λ_n²) u′(k+1)x′^T(k+1),   (7.9)

which can be calculated during noise/echo–only periods.

Although (7.8) and (7.9) ensure that the estimates are correct, in a practical application it may be useful to divide both (7.8) and (7.9) by min(1−λ_n², 1−λ_s²), in order to avoid multiplication and division by very small numbers, and to correct for this in the final result. For the theoretical derivation of the algorithm, we will continue to work with the unmodified equations (7.8) and (7.9).


7.4 QRD–RLS based algorithm

Using the QR decomposition we can write

W(k) = ( R^T(k)R(k) )^{-1} ( R^T(k)R(k) [ I ; 0 ] − U′^T(k)X′(k) )
     = [ I ; 0 ] − R^{-1}(k) R^{-T}(k) U′^T(k) X′(k).   (7.10)

Define W^N(k) = [ I ; 0 ] − W(k); then

W^N(k) = R^{-1}(k) B(k),   with B(k) ≡ R^{-T}(k) U′^T(k) X′(k).   (7.11)

We will again store and update both R(k) and B(k), so that at any time W^N(k) can be computed by backsubstitution in

R(k) W^N(k) = B(k).   (7.12)

Only one column of B(k) has to be stored and updated, thus providing a signal or noise/echo estimate for the corresponding microphone signal.
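Solving (7.12) by backsubstitution is cheap because R(k) is triangular; a minimal sketch for one stored column b of B(k) (illustrative data):

```python
import numpy as np

def back_substitute(R, b):
    # solve R w = b for upper-triangular R, cf. eq. (7.12)
    n = len(b)
    w = np.zeros(n)
    for i in range(n - 1, -1, -1):
        w[i] = (b[i] - R[i, i + 1:] @ w[i + 1:]) / R[i, i]
    return w

rng = np.random.default_rng(4)
n = 5
R = np.triu(rng.standard_normal((n, n))) + 3.0 * np.eye(n)  # well-conditioned
b = rng.standard_normal(n)
w = back_substitute(R, b)
```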

7.4.1 Speech+noise/echo updates

R(k) and B(k) can be updated as

( 0        r^T(k+1) )              ( √(1−λ_s²) u^T(k+1)      0            )
( R(k+1)   B(k+1)   ) = Q^T(k+1)   ( λ_s R(k)                (1/λ_s) B(k) ).   (7.13)

The optimal filter coefficients can be computed by backsubstitution (equation (7.12)); alternatively, the least-squares residuals are obtained by multiplying the elements of r(k+1) in formula (7.13) by the product of the cosines of the Givens rotation angles. For details, we refer to chapter 5.

This rearrangement results in the signal flow graph (SFG) in Figure 7.3 for M = 2, N = 4, Naec = 6.

All signal flow graphs shown in this chapter again have rearranged input vectors ū(k) instead of u(k), as follows:

ū(k) = [ x_1(k), …, x_M(k), f(k) | x_1(k−1), …, x_M(k−N+1), f(k−N+1) | f(k−N), …, f(k−Naec+1) ]^T,

i.e. the microphone and far-end samples are interleaved per time lag, with the trailing entries consisting of the remaining far-end taps.

The residuals y_n(k) generated by this SFG are the noise+echo signal estimates:

y_n^T(k+1) = ( 0 − u^T(k+1) W^N(k+1) ) √(1−λ_s²).   (7.14)

The overall output signal (the estimate for the near-end speech signal) can then be written as

d(k+1) = [ u_1(k+1) ; u_2(k+1) ; … ; u_M(k+1) ] − y_n(k+1) / √(1−λ_s²).

7.4.2 Noise/echo–only updates

During noise/echo-only periods, R(k) remains unchanged, while

B(k) = R^{-T}(k) U′^T(k) X′(k)

has to be updated. From equation (7.9), we find that

B(k+1) = λ_n² B(k) + (1−λ_n²) ( R^{-T}(k+1) u′(k+1) ) x′^T(k+1).

Given R(k+1), we can compute a(k+1) = R^{-T}(k+1) u′(k+1) by a backsubstitution in

R^T(k+1) a(k+1) = u′(k+1),

after which

B(k+1) = λ_n² B(k) + (1−λ_n²) a(k+1) x′^T(k+1),

which should be substituted into the memory cells on the right-hand side of Figure 7.3 during noise/echo-only mode. It is, just as in chapter 5, possible to also



Figure 7.3: Signal flow graph for residual extraction (Naec = 6, N = 4, M = 2). The signal flow graph is executed during speech+noise/echo mode, while only the memory elements in the right-hand frame are updated during noise/echo-only mode (as described in section 7.4.2).


generate residuals in noise/echo-only mode, by executing the signal flow graph in 'frozen mode'.
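The noise/echo-only update of section 7.4.2 can be sketched as follows (synthetic R, B and inputs; the forward substitution is left to `numpy` for brevity):

```python
import numpy as np

def noise_only_update(R, B, u_o, x_o, lam_n):
    # R stays fixed; B <- lam_n^2 B + (1 - lam_n^2) a x'^T  with  R^T a = u'
    a = np.linalg.solve(R.T, u_o)        # R^T is lower triangular: forward substitution
    return lam_n**2 * B + (1 - lam_n**2) * np.outer(a, x_o)

rng = np.random.default_rng(5)
n, p, lam_n = 5, 2, 0.9
R = np.triu(rng.standard_normal((n, n))) + 3.0 * np.eye(n)
B = rng.standard_normal((n, p))
u1 = rng.standard_normal(n)
x1 = rng.standard_normal(p)
B1 = noise_only_update(R, B, u1, x1, lam_n)
```

By construction R^T B1 = λ_n² R^T B + (1−λ_n²) u′ x′^T, i.e. the implicitly stored product U′^T X′ follows the update (7.9) while R is untouched.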

An algorithm description can be found in Algorithm 18.

Algorithm 18 QRD–RLS algorithm for AENC

QRDRLS_AENC_update (R, z, x, r, Weight)
{
  // x is the input vector (length M*N + Naec)
  // z is the stored column of B
  // r = x[1] in signal mode, r = 0 in noise/echo mode
  PiCos = 1;
  for (i = 0; i < M * N + Naec; i++)
  {
    R[i][i] *= Weight;
    temp = sqrt (R[i][i] * R[i][i] + x[i] * x[i]);
    sinTheta = x[i] / temp;
    cosTheta = R[i][i] / temp;
    R[i][i] = temp;
    for (j = i+1; j < M * N + Naec; j++)
    {
      temp = R[i][j] * Weight;
      R[i][j] = cosTheta * temp + sinTheta * x[j];
      x[j] = -sinTheta * temp + cosTheta * x[j];
    }
    temp = z[i] / Weight;   // B is weighted with 1/lambda, cf. (7.13)
    z[i] = cosTheta * temp + sinTheta * r;
    r = -sinTheta * temp + cosTheta * r;
    PiCos *= cosTheta;
  }
  return r * PiCos;         // least-squares residual
}

7.5 QRD–LSL algorithm

We will use the QRD–LSL based algorithm for acoustic noise cancellation derived in chapter 6 as a basis for the QRD–LSL based algorithm for combined echo and noise cancellation. Let us consider the alternative minimization problem with weighting schemes (7.8) and (7.9):

min_{W^N_fast} ‖ [ U(k) ; βU′(k) ] W^N_fast − [ 0 ; (1/β) X′(k) ] ‖.   (7.15)

The normal equations for this system are

( U^T(k)U(k) + β² U′^T(k)U′(k) ) W^N_fast(k) = U′^T(k)X′(k).   (7.16)

These can be solved for W^N_fast(k). If β → 0, then W^N_fast(k) → W^N(k). This scheme is updated with u(k) as input and 0 as the desired signal during speech+noise/echo


periods, and with u′(k) as input and x′(k) as the desired signal during noise/echo-only periods.
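The limit β → 0 in (7.16) can be verified numerically; a small sketch with synthetic data matrices:

```python
import numpy as np

rng = np.random.default_rng(7)
T, m, p = 50, 5, 3
U = rng.standard_normal((T, m))          # speech+noise/echo data matrix U(k)
Uo = rng.standard_normal((T, m))         # noise/echo-only data matrix U'(k)
Xo = Uo[:, :p]                           # X'(k) = U'(k) [I; 0]

def w_fast(beta):
    # normal equations (7.16): (U^T U + beta^2 U'^T U') W = U'^T X'
    return np.linalg.solve(U.T @ U + beta**2 * (Uo.T @ Uo), Uo.T @ Xo)

W_N = np.linalg.solve(U.T @ U, Uo.T @ Xo)    # cf. (7.11)
W_beta = w_fast(1e-6)
```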

Residual extraction can be used to obtain noise estimates, which can then be subtracted from the input signal in order to get clean signal estimates.

Since one will often want to use more filter taps for the AEC part than for the ANC part (Naec can be made longer than N), one can alternatively use a scheme for QRD–LSL with unequal channel lengths. We refer to [46] for details; examples of signal flow graphs with unequal channel lengths are shown below. For an algorithm description, we refer to Algorithm 16, with inputs as described in this section (the input vector is extended with a channel containing the echo reference signal).

7.6 Regularized AENC

Experiments show that better noise/echo cancellation can be obtained by modifying the optimization function so that more emphasis is put on the noise/echo cancellation term, at the expense of increased signal distortion. As a matter of fact, this regularization is indispensable when combined echo and noise cancellation is involved. The corresponding optimization problem is given in (5.10).

The update equation becomes

( 0        r_2^T(k+1) )              ( √(1−λ_s²) u^T(k+1)       0            )
( 0        r_1^T(k+1) ) = Q^T(k+1)   ( √(1−λ_s²) μ² u′^T(k)     0            )   (7.17)
( R(k+1)   B(k+1)     )              ( λ_s R(k)                 (1/λ_s) B(k) )

where x′(k) is taken out of a noise buffer. During noise-only periods, B(k) is updated as in section 7.4.2.

7.6.1 Regularization using a noise/echo buffer

The QRD–LSL based noise cancellation algorithm with regularization from section 6.5.2 in chapter 6 can be used as a basis for a QRD–LSL based acoustic noise and echo cancelling algorithm. If Naec > N, a QRD–LSL structure with unequal channel lengths can be used [46].

During speech+noise/echo mode, an update is done with microphone inputs u(k), and


regularization inputs μ²x′(k), taken from a noise buffer. The right-hand side inputs are 0.

During noise/echo mode, the inputs for the left-hand side are u′(k), and the inputs for the right-hand side are x′(k). For an algorithm description, we refer to Algorithm 17, where the input vector is extended with one channel containing the echo reference signal.

7.6.2 Mode–dependent regularization

In order to use the alternative noise cancellation algorithm from section 6.5.3 as a basis for combined noise/echo cancellation, we write (7.15) with β = 1:

[ U(k) ; U′(k) ] W^N_fast = [ 0 ; X′(k) ].

The normal equations for this system are

( U^T(k)U(k) + U′^T(k)U′(k) ) W^N_fast = U′^T(k)X′(k).

During speech+noise/echo periods, due to the weighting the system will converge to

U^T(k)U(k) W^N_fast = U′^T(k)X′(k),

i.e.

( ( V(k)  F(k) )^T ( V(k)  F(k) ) + ( D(k)  0 )^T ( D(k)  0 ) ) W^N_fast = U′^T(k)X′(k),

where D(k) is the desired speech signal. In the QRD–LSL filter in Figure 7.4, this is achieved by first weighting both the left-hand side and the right-hand side with λ_s, and then applying a left-hand-side input u(k) and a right-hand-side input 0.

During noise/echo-only periods, the system converges to

U′^T(k)U′(k) W^N_fast = U′^T(k)X′(k),

such that after convergence

W^N_fast = [ I ; 0 ].

In this mode, both the left- and right-hand sides of Figure 7.4 are weighted with λ_n; the input u′^T(k) = ( x′^T(k)  0 ) is applied to the left-hand side, and x′^T(k) to the right-hand side.


During transitions between modes, pre- and post-windowing should be used, as explained in chapter 6.

For an algorithm description we refer to Algorithm 6, with inputs as described in this section.

7.7 Performance

To set up a performance comparison, we have implemented a conventional cascaded multichannel scheme (right-hand side of Figure 7.1) consisting of two blocks: first the echo is removed from all microphone channels, and then the signals are processed by a noise cancellation scheme. For the echo cancellers we have chosen an RLS algorithm (QRD-lattice), which is not often used in practice because of its complexity, but which assures that we achieve the best possible result for the two-block scheme. The noise cancellation algorithm used is the QRD-based scheme from [56]. This 'traditional' setup is compared with the integrated approach from section 7.6.2.

The sampling frequency was 8 kHz. A simulated room environment was used, with 4 microphones spaced 20 centimeters apart. The near-end speaker is located at about 10 degrees from broadside, a white noise source at 45 degrees, and the loudspeaker for the far-end signal at -45 degrees. The near-end speaker utters a phrase with decreasing energy, so the signal-to-noise+echo ratio varies from -10 dB at the beginning of the phrase to -40 dB at the end (Figure 7.5 shows some utterances of the phrase). The signal-to-noise ratio varies from +13 dB at the beginning of the phrase to -14 dB at the end. The parameters used are λ_echo,trad = 0.9997 for the forgetting factors in the RLS-based traditional echo cancellers, and λ_s,trad = λ_n,trad = 0.9997 for the forgetting factors of the noise cancellation algorithm in the traditional setup. No regularization was applied here. For the new method, we have chosen λ_s = 0.999999 and λ_n = 0.9997. While the new method does incorporate regularization, the simulations will show that it results in less signal distortion. The simulations compare only speech+noise/echo periods, since the new algorithm suppresses all signal during noise/echo-only periods, which would not yield a relevant comparison.

All speech/noise detection has been done manually (which means a perfect speechdetector is assumed).

Figure 7.6 and Figure 7.7 show that the integrated approach outperforms the conventional method for a simulated acoustic path of 200 taps, with an echo canceller length Naec = 200, M = 4 microphones, and N = 40 taps per microphone. Both algorithms operate in speech+noise/echo mode in this plot. The valleys (speech pauses) are up to 20 dB lower for the combined algorithm (more noise reduction), while the peaks are slightly higher, which shows that there is less signal distortion.



Figure 7.4: QRD–LSL AENC scheme. During speech+noise/echo periods λ = λ_s, resulting in a large window; during noise/echo-only periods λ = λ_n, resulting in a shorter window.



Figure 7.5: Some utterances of the phrase which was used for the simulations. The lower curve is the energy (in dB) of the clean speech signal which reaches the microphone, while the upper curve is the energy of one of the microphone signals. This shows that the SENR varies from -10 dB to -40 dB in each utterance.


Figure 7.6: Comparison of the output signal of the cascaded scheme (black) and the integrated approach (light gray). Both algorithms operate in speech+noise/echo mode.



Figure 7.7: Comparison of the energy in the output of the cascaded scheme with ANC following an AEC, and of the integrated approach described in this chapter. Notice the deeper valleys (more noise cancellation during speech pauses) and the higher peaks (less signal distortion during speech) for the integrated algorithm. Both algorithms operate in speech+noise/echo mode.

Figure 7.8: Performance of the integrated scheme when undermodelling the echo path. Curves are shown for the combinations N = 10, Naec = 100; N = 20, Naec = 100; N = 10, Naec = 200; and N = 20, Naec = 200.



Figure 7.9: Comparison for undermodelling: the cascaded approach (full line) cannot handle this situation as well as the integrated approach (dashed line) (both cases M = 4, Naec = 100, N = 40).


Figure 7.10: For almost the same total number of taps, the cascaded approach with sufficient order for the echo path (M = 4, Naec = 200, N = 20, total taps = 280, full line) performs worse than the integrated approach with an undermodelled echo path (M = 4, Naec = 100, N = 40, total taps = 260, dotted line). During the pauses between the word utterances, the difference is very large.



Figure 7.11: Even if N = 40 and M = 4 for both algorithms, and the cascaded scheme has sufficient order for the echo (Naec = 200) while the new scheme is undermodelled (Naec = 100, dashed line), the integrated scheme still performs better, because the noise/echo information is not processed in two independent stages.

The performance of the integrated approach in the case of undermodelling of the echo path, which is common in realistic situations, is shown in Figure 7.8. A comparison of this situation with the cascaded scheme is depicted in Figure 7.9.

The (undermodelling) echo cancellers in the cascaded scheme produce a large instantaneous misadjustment, due to the non-stationarity of the far-end signal [57]. The independently adapted noise cancellation filter in the cascaded scheme cannot compensate for this, since its input signal is disturbed by the behaviour of the first (AEC) block. The integrated approach is shown to handle this situation far better.

In Figure 7.10, an integrated scheme with an undermodelled echo canceller part is compared with a cascaded scheme with sufficient-order modelling of the echo path and (about) the same total number of filter taps. Here too the integrated approach outperforms the conventional cascaded scheme.

Finally, Figure 7.12shows that the echo canceller filter can indeed be made shorterdue to the advantageous effect of adding the noise filters. The performance with thenoise filters in a combined scheme in a noise free environment is better than withoutthe noise filters.


[Figure 7.12 plot: Energy (dB) versus Time (samples); plot data not reproduced.]

Figure 7.12: A simulation in a noise free environment shows that echo cancelling is aided by the M length-N filters in the signal path. The dotted line is a combined scheme with a 300-tap echo filter and 4 channels with 25-tap noise filters each. It is better than a 300-tap traditional echo canceller alone (full line).


7.8 Complexity

In the complexity calculations, an addition and a multiplication are counted as two separate floating point operations. Table 7.1 shows the complexities of the algorithms.

Algorithm                  Complexity
Full QRD, noise            (MN + Naec)^2 + 3M(N + Naec) + M
Full QRD, speech           3.5(MN + Naec)^2 + 15.5(MN + Naec) + M + 2
Regul. Full QRD, speech    (15/2)(MN + Naec)^2 + (69/2)(MN + Naec) + 3
Cont. Upd. QRD–LSL         (21N − 21/2)(M + 1)^2 + 19(M + 1)N − (7/2)(M + 1) + 21(Naec − N)

Table 7.1: Complexities of the different algorithms in flops per sample. M is the number of microphones, Naec is the filter length of the AEC part, and N is the number of filter taps per microphone channel in the ANC part.

For a typical setting in a car environment, Naec = 200, N = 10, M = 3, the complexity of the new continuously updating QRD–LSL based technique is 7912 flops per sample. This is to be compared with the complexity of a cascaded scheme. The QRD–LSL based noise cancellation algorithm we have derived in [51] has a complexity of 2346 flops per sample for these settings. An NLMS-based echo canceller would have a complexity of 800 flops per sample with these parameters. This means that a cascaded scheme with first echo cancelling (one NLMS filter per microphone channel) and then a QRD–LSL based noise cancellation scheme would amount to 3 × 800 + 2346 = 4746 flops per sample.
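The flop counts above can be reproduced directly from the formulas in Table 7.1. The sketch below is illustrative only: the formulas are transcribed from the table as printed, and the NLMS cost of 4·Naec flops per sample per channel is an assumption that matches the 800 flops quoted in the text.

```python
# Flops per sample for the car-environment setting (Naec=200, N=10, M=3),
# using the Table 7.1 formulas as transcribed from the text.

def full_qrd_noise(M, N, Naec):
    # Full QRD, noise periods
    return (M * N + Naec) ** 2 + 3 * M * (N + Naec) + M

def cont_upd_qrd_lsl(M, N, Naec):
    # Continuously updating QRD-LSL (integrated echo + noise scheme)
    return ((21 * N - 21 / 2) * (M + 1) ** 2
            + 19 * (M + 1) * N
            - (7 / 2) * (M + 1)
            + 21 * (Naec - N))

M, N, Naec = 3, 10, 200

integrated = cont_upd_qrd_lsl(M, N, Naec)  # close to the 7912 quoted in the text
cascaded = M * 4 * Naec + 2346             # M NLMS echo cancellers + QRD-LSL ANC

print(f"integrated QRD-LSL: {integrated:.0f} flops/sample")
print(f"cascaded scheme:    {cascaded} flops/sample")  # 3*800 + 2346 = 4746
```

The comparison confirms the trade-off discussed in the text: the integrated scheme costs roughly 1.7 times the cascaded one at these settings, while remaining far below the cost of the full QRD algorithms.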

7.9 Conclusion

In this chapter, we have extended both QRD–RLS and QRD–LSL based schemes for noise cancellation with an extra echo reference input signal, thus proposing schemes which handle combined noise and echo cancellation as one single optimization problem. We have shown by simulations that the performance is better when such a global optimization problem is solved than when the traditional cascading approach is used. The complexity and performance figures show that although somewhat more complex, the better performing QRD–LSL based approach presented here can be applied for real time processing as an alternative to cascading techniques.


Chapter 8

Conclusions

In this thesis we have developed a number of techniques which can be used to 'clean up' a speech signal picked up in an adverse acoustic environment. We have made a distinction between disturbances for which a reference signal is available, and disturbances for which no reference signal is available. The first type of disturbance gives rise to techniques which we classify as 'acoustic echo cancellation' (AEC) techniques, while the second type can be reduced by 'noise cancellation' (ANC) techniques. Acoustic echo cancellation is treated in the first part of the text, acoustic noise cancellation in the second part, and in the third part of the text the combination of both is discussed.

Acoustic echo cancellation

The NLMS algorithm is a cheap algorithm which may exhibit performance problems when non-white input signals are used. On the other hand, the RLS algorithm, which performs very well even for non-white signals, is much more expensive. A class of 'intermediate' algorithms is the APA family of adaptive filters. Both RLS and APA can be seen as an NLMS filter with pre-whitening applied.
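For reference, the baseline NLMS update can be sketched as follows. This is a generic textbook form, not the specific implementation used in the thesis; the echo path h, filter length, and step size are purely illustrative, and the experiment uses white noise input, i.e. the easy case for which NLMS behaves well.

```python
import numpy as np

def nlms_update(w, x, d, mu=0.5, eps=1e-8):
    """One NLMS step: w are the filter taps, x the current input vector
    (the most recent len(w) samples), d the desired (near-end) sample."""
    e = d - w @ x                       # a priori error
    w = w + mu * e * x / (x @ x + eps)  # update normalized by input energy
    return w, e

# Identify a short hypothetical echo path from white-noise input.
rng = np.random.default_rng(0)
h = np.array([0.5, -0.3, 0.1])          # illustrative echo path
x_sig = rng.standard_normal(5000)
w = np.zeros(3)
for n in range(3, len(x_sig)):
    x = x_sig[n - 3:n][::-1]            # most recent sample first
    d = float(h @ x)                    # noise-free echo
    w, e = nlms_update(w, x, d)
```

With white input the taps w converge to h; the performance problems mentioned above appear precisely when x is strongly coloured (e.g. speech), which is what motivates the pre-whitening view of RLS and APA.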

Not too long ago, only NLMS filters and even cheaper frequency domain variants were used to implement acoustic echo cancellation, because of the long adaptive filters involved, although both RLS and APA are well known to perform better. Due to the increase in computing power over the years, APA filters more and more find their way into this field as well, notably when multichannel acoustic echo cancellation is involved.

We have shown that if affine projection techniques are used for acoustic echo cancelling, it is important to provide sufficient regularization in order to obtain robustness against continuously present near-end background noise. This is important in single channel echo cancellation, but even more so in the multichannel case, where the cross-correlation between the loudspeaker signals (and hence the input signals of the adaptive filter) leads to ill-conditioning of the problem.

We have pointed out the advantages and disadvantages of the FAP algorithm. In its traditional version, what we have called 'explicit regularization' can easily be incorporated into it, because it uses the FTF algorithm for updating the size-P correlation matrix. On the other hand, it makes some assumptions concerning how much regularization is used, and it exhibits problems when an exponential weighting technique is used for regularization. Another disadvantage is that periodic restarting of the estimation is necessary due to numerical stability problems in the FTF algorithm.

We proposed to replace the update of the small size-P correlation matrix by a QRD-based updating procedure which is numerically stable. Explicit regularization is not easily implemented in this QRD-based approach, and therefore we have described an alternative, which we have called the 'sparse equations technique'. This technique regularizes the affine projection algorithm when it is used with signals that have a large autocorrelation only for small lags (e.g. speech). It can also be used as a stand-alone regularization technique.

Unfortunately, this technique violates the assumptions on which the FAP algorithm is based, which motivates the development of a fast APA algorithm that does not incorporate the assumptions present in FAP. The main reason for developing such an algorithm, however, is the fact that FAP exhibits problems when much regularization is used.

We have thus derived an exact frequency domain version of the affine projection algorithm, named Block Exact APA. This algorithm has a complexity comparable with a frequency domain version of fast affine projection (namely BEFAP), and since it is an exact implementation of APA, the convergence characteristics of the original affine projection algorithm are maintained when regularization is applied, which is not the case for FAP-based fast versions of APA.

This algorithm was extended to allow the 'sparse equations' regularization technique to be used.

Acoustic Noise Cancellation

In the literature, several noise cancellation schemes can be found. Most of these schemes use multiple microphones in order to take advantage of the spatial characteristics of both speech and noise. Apart from the classical beamforming approaches, unconstrained optimal filtering approaches also exist. Traditionally they have been based upon singular value decomposition techniques, which inherently have a large complexity.

We have derived a new QRD-based algorithm for unconstrained optimal multichannel filtering with an 'unknown' desired signal, and applied it to adaptive acoustic noise suppression.

The same basic problem is solved as in related algorithms from the literature, but due to the high computational complexity of the SVD algorithm used in the traditional techniques, approximations (SVD tracking) are often introduced in order to keep the complexity manageable.

Using the QRD approach results in a performance which is the same as that of the SVD-based algorithms, or even better, since no approximations are required. The complexity of the QRD-based optimal filtering technique is an order of magnitude lower than that of the (approximating) SVD-based approaches.

We have also introduced a 'trade-off' parameter which allows more noise reduction to be obtained in exchange for some (tolerable) signal distortion.

Besides the QRD-based approach, we have also derived a fast QRD–LSL based algorithm and applied it to the 'unknown desired signal' case encountered in acoustic noise cancellation. This algorithm is based on a significantly reorganized version of the QRD–RLS based unconstrained optimal filtering scheme. While the QRD-based unconstrained optimal filtering algorithm has a complexity which is an order of magnitude lower than that of the (approximating) SVD-tracking based algorithm, fast QRD–LSL based unconstrained optimal filtering achieves a complexity which is a further factor of about 8 lower than QRD-based unconstrained optimal filtering (for typical parameter settings).

Combination of echo and noise cancellation

We have extended both QRD–RLS and QRD–LSL based schemes for noise cancellation with an extra echo reference input signal, thus proposing schemes which handle combined noise and echo cancellation as one single optimization problem. We have shown by simulations that the performance is better when such a global optimization problem is solved than when the traditional cascading approach is used. The complexity and performance figures show that although somewhat more complex, the better performing QRD–LSL based approach can be applied for real time processing as an alternative to cascading techniques.


Further research

In the field of acoustic echo cancellation, no 'perfect' solutions exist yet for multichannel decorrelation. For speech signals, non-linearities like half-wave rectifiers provide sufficiently good results, but in applications where multichannel audio is involved (voice command applications for audio devices), these solutions introduce intolerable distortion. This subject clearly requires more research.

The adaptive filtering techniques which form the core of acoustic echo cancellers are well explored. For cheap consumer products NLMS and frequency domain adaptive filters can be used, while a whole range of better (and more expensive) algorithms exists if one can afford the extra complexity. For the class of noise cancellation algorithms we have described in this thesis, namely the unconstrained MMSE-optimal filtering class, only SVD-based and (as derived in this text) QRD-based algorithms exist. Another interesting subject for future research would be whether this problem could be handled by 'cheaper' adaptive filtering algorithms, like perhaps APA-based filters. We would like to refer to [23], where the NLMS algorithm is used to implement unconstrained optimal filtering for multichannel noise cancellation.

Finding cheaper algorithms is even more important when the combination of echo and noise reduction is considered, as in chapter 7, since the filter length corresponding to the echo path is usually much larger than that of the filters used in the noise reduction part of the algorithm. In traditional setups, where echo and noise cancellation were handled in two separate cascaded schemes, cheap filters could be used for the long paths in echo cancelling, while more complex algorithms could be used for the shorter noise reduction paths. But the experiments in chapter 7 clearly indicate that there is an advantage in solving the combined problem as a whole, so it would be interesting to invest time in trying to reduce the complexity of the integrated optimal filtering approach.


Bibliography

[1] M. Ali. Stereophonic acoustic echo cancellation system using time-varying all-pass filtering for signal decorrelation. In ICASSP. IEEE, 1998.

[2] F. Amand, J. Benesty, A. Gilloire, and Y. Grenier. A fast two-channel projection algorithm for stereophonic acoustic echo cancellation. In ICASSP96. IEEE, 1996.

[3] Duncan Bees, Maier Blostein, and Peter Kabal. Reverberant speech enhancement using cepstral processing. In Proceedings of the 1991 IEEE Int. Conf. on Acoust., Speech and Signal Processing, pages 977-980. IEEE, May 1991.

[4] J. Benesty, F. Amand, A. Gilloire, and Y. Grenier. Adaptive filtering algorithms for stereophonic acoustic echo cancellation. In ICASSP, pages 3099-3102. IEEE, 1995.

[5] J. Benesty, A. Gilloire, and Y. Grenier. A frequency domain stereophonic acoustic echo canceller exploiting the coherence between the channels and using nonlinear transformations. In Proceedings of International Workshop on Acoustics and Echo Cancelling (IWAENC99), pages 28-31. IEEE, 1999.

[6] J. Benesty, D. R. Morgan, J. L. Hall, and M. M. Sondhi. Synthesised stereo combined with acoustic echo cancellation for desktop conferencing. In Proceedings of ICASSP99, 1999.

[7] J. Benesty, D. R. Morgan, J. L. Hall, and M. M. Sondhi. Stereophonic acoustic echo cancellation using nonlinear transformations and comb filtering. In ICASSP. IEEE, 1998.

[8] F. Capman, J. Boudy, and P. Lockwood. Acoustic echo cancellation using a fast QR-RLS algorithm and multirate schemes. Proceedings of ICASSP, pages 969-972, 1995.

[9] F. Capman, J. Boudy, and P. Lockwood. Controlled convergence of QR least squares adaptive algorithms – application to speech echo cancellation. In Proceedings of ICASSP, pages 2297-2300. IEEE, 1997.


[10] C. Carlemalm, F. Gustafsson, and B. Wahlberg. On the problem of detection and discrimination of double talk and change in the echo path. In ICASSP Conference Proceedings. IEEE, ?

[11] S. Doclo, E. De Clippel, and M. Moonen. Multi-microphone noise reduction using GSVD-based optimal filtering with ANC postprocessing stage. In Proc. of the 9th IEEE DSP Workshop, Hunt TX, USA. IEEE, Oct. 2000.

[12] S. Doclo and M. Moonen. SVD-based optimal filtering with applications to noise reduction in speech signals. In Proc. of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA'99), New Paltz NY, USA, pages 143-146. IEEE, Oct. 1999.

[13] S. Doclo and M. Moonen. Noise reduction in multi-microphone speech signals using recursive and approximate GSVD-based optimal filtering. In Proc. of the IEEE Benelux Signal Processing Symposium (SPS-2000), Hilvarenbeek, The Netherlands, March 2000.

[14] S. Doclo and M. Moonen. GSVD-Based Optimal Filtering for Multi-Microphone Speech Enhancement, chapter 6 in "Microphone Arrays: Signal Processing Techniques and Applications" (Brandstein, M. S. and Ward, D. B., Eds.), pages 111-132. Springer-Verlag, May 2001.

[15] S. Doclo and M. Moonen. GSVD-based optimal filtering for single and multi-microphone speech enhancement. IEEE Trans. Signal Processing, 50(9):2230-2244, September 2002.

[16] Matthias Dörbecker and Stefan Ernst. Combination of two-channel spectral subtraction and adaptive Wiener post-filtering for noise reduction and dereverberation. In Proceedings of EUSIPCO96, page 995, September 1996.

[17] P. Dreiseitel, E. Hansler, and H. Puder. Acoustic echo and noise control – a long lasting challenge, 1998.

[18] K. Eneman. Subband and Frequency-Domain Adaptive Filtering Techniques for Speech Enhancement in Hands-free Communication. PhD thesis, Katholieke Universiteit Leuven, Heverlee, Belgium, March 2002.

[19] K. Eneman and M. Moonen. Hybrid Subband/Frequency-Domain Adaptive Systems. Signal Processing, 81(1):117-136, January 2001.

[20] P. Eneroth, T. Gänsler, S. Gay, and J. Benesty. Studies of a wideband stereophonic acoustic echo canceler. In Proc. 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 207-210. IEEE, October 1999.

[21] P. Eneroth, S. Gay, T. Gänsler, and J. Benesty. A hybrid FRLS/NLMS stereo acoustic echo canceller. In Proceedings of IWAENC, 1999.


[22] Y. Ephraim and H. L. Van Trees. A signal subspace approach for speech enhancement. IEEE Transactions on Speech and Audio Processing, 3(4):251-266, July 1995.

[23] D. A. F. Florencio and H. S. Malvar. Multichannel filtering for optimum noise reduction in microphone arrays. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 197-200. IEEE, May 2001.

[24] T. Gänsler and J. Benesty. Stereophonic acoustic echo cancellation and two-channel adaptive filtering: an overview. International Journal of Adaptive Control and Signal Processing, February 2000.

[25] T. Gänsler, S. L. Gay, M. M. Sondhi, and J. Benesty. Double-talk robust fast converging algorithms for network echo cancellation. In Proc. 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 215-218. IEEE, October 1999.

[26] S. L. Gay and S. Tavathia. The fast affine projection algorithm. In ICASSP, pages 3023-3026. IEEE, 1995.

[27] Steven Gay. Fast projection algorithms with application to voice echo cancellation. PhD thesis, Rutgers, The State University of New Jersey, New Brunswick, 1994.

[28] A. Gilloire and V. Turbin. Using auditory properties to improve the behaviour of stereophonic acoustic echo cancellers. In ICASSP. IEEE, 1998.

[29] Golub and Van Loan. Matrix Computations, chapter 12. Johns Hopkins, 1996.

[30] Simon Haykin. Adaptive Filter Theory. Prentice Hall, 3rd edition, 1996.

[31] P. Heitkämper. An adaptation control for acoustic echo cancellers. IEEE Signal Processing Letters, 4(6):170-173, June 1997.

[32] Q.-G. Liu, B. Champagne, and P. Kabal. A microphone array processing technique for speech enhancement in a reverberant space. Speech Communication, 18:317-334, 1996.

[33] S. Makino and S. Shimauchi. Stereophonic acoustic echo cancellation – an overview and recent solutions. In Proceedings of International Workshop on Acoustics and Echo Cancelling (IWAENC99), pages 12-19. IEEE, 1999.

[34] S. Makino, K. Strauss, S. Shimauchi, Y. Haneda, and A. Nakagawa. Subband stereo echo canceller using the projection algorithm with fast convergence to the true echo path. In Proceedings of the ICASSP, pages 299-302. IEEE, 1997.

[35] Henrique S. Malvar. Signal Processing With Lapped Transforms. Artech House.


[36] K. Maouche and D. T. M. Slock. The fast subsampled-updating fast affine projection (FSU FAP) algorithm. Research report, Institut EURECOM, 2229, route des Cretes, B.P. 193, 06904 Sophia Antipolis Cedex, December 1994.

[37] Rainer Martin and Peter Vary. Combined acoustic echo cancellation, dereverberation and noise reduction: a two microphone approach. In Ann. Telecommun., volume 49, pages 429-438. 1994.

[38] Rainer Martin and Peter Vary. Combined acoustic echo control and noise reduction for hands-free telephony – state of the art and perspectives. In EUSIPCO96, page 1107, 1996.

[39] J. G. McWhirter. Recursive least squares minimisation using a systolic array. In Proc. SPIE Real Time Signal Processing IV, volume 431, pages 105-112, 1983.

[40] M. Miyoshi and Y. Kaneda. Inverse Filtering of Room Acoustics. IEEE Trans. on Acoustics, Speech and Signal Proc., 36(2):145-152, February 1988.

[41] M. Mohan Sondhi, D. R. Morgan, and J. L. Hall. Stereophonic acoustic echo cancellation – an overview of the fundamental problem. IEEE Signal Processing Letters, 2(8):148-151, August 1995.

[42] G. V. Moustakides and S. Theodoridis. Fast Newton transversal filters – a new class of adaptive estimation algorithms. IEEE Transactions on Signal Processing, 39(10):2184-2193, October 1991.

[43] K. Ozeki and T. Umeda. An adaptive filtering algorithm using an orthogonal projection to an affine subspace and its properties. Electronics and Communications in Japan, 67-A(5):126-132, February 1984.

[44] C. B. Papadias and D. T. M. Slock. New adaptive blind equalization algorithms for constant modulus constellations. In ICASSP94, pages 321-324, Adelaide, Australia, April 1994. IEEE.

[45] J. Prado and E. Moulines. Frequency domain adaptive filtering with applications to acoustic echo cancellation. Ann. Telecommun., 49(7-8):414-428, 1994.

[46] J. G. Proakis, C. M. Rader, F. Ling, C. L. Nikias, M. Moonen, and I. K. Proudler. Algorithms for Statistical Signal Processing. Prentice-Hall, ISBN: 0-13-062219-2, 1/e edition, 2002.

[47] G. Rombouts and M. Moonen. Avoiding explicit regularisation in affine projection algorithms for acoustic echo cancellation. In Proceedings of ProRISC99, Mierlo, The Netherlands, pages 395-398, November 1999.

[48] G. Rombouts and M. Moonen. A fast exact frequency domain implementation of the exponentially windowed affine projection algorithm. In Proceedings of Symposium 2000 for Adaptive Systems for Signal Processing, Communication and Control (AS-SPCC), pages 342-346, Lake Louise, Canada, 2000.


[49] G. Rombouts and M. Moonen. Regularized affine projection algorithms for multichannel acoustic echo cancellation. In Proceedings of IEEE-SPS2000, page CDROM, Hilvarenbeek, The Netherlands, March 2000. IEEE.

[50] G. Rombouts and M. Moonen. Sparse-BEFAP: A fast implementation of fast affine projection avoiding explicit regularisation. In Proceedings of EUSIPCO2000, pages 1871-1874, September 2000.

[51] G. Rombouts and M. Moonen. Fast QRD-lattice-based optimal filtering for acoustic noise reduction. Internal Report KULEUVEN/ESAT-SISTA/TR 01-48, submitted for publication, May 2001.

[52] G. Rombouts and M. Moonen. Acoustic noise reduction by means of QRD-based optimal filtering. In Proceedings of MPCA2002, Leuven, Belgium, November 2002.

[53] G. Rombouts and M. Moonen. An integrated approach to acoustic noise and echo suppression. Submitted for publication, January 2002.

[54] G. Rombouts and M. Moonen. QRD-based optimal filtering for acoustic noise reduction. In Proceedings of EUSIPCO2002, Toulouse, France, page CDROM, September 2002.

[55] G. Rombouts and M. Moonen. A sparse block exact affine projection algorithm. IEEE Transactions on Speech and Audio Processing, 10(2):100-108, February 2002.

[56] G. Rombouts and M. Moonen. QRD-based optimal filtering for acoustic noise reduction. Internal Report KULEUVEN/ESAT-SISTA/TR 01-47, accepted for publication in Elsevier Signal Processing, February 2003.

[57] D. W. E. Schobben and P. C. W. Sommen. On the performance of too short adaptive FIR filters. In Proceedings Circuits Systems and Signal Proc. (ProRISC), Mierlo, The Netherlands, pages 545-549, November 1997.

[58] S. Shimauchi, Y. Haneda, S. Makino, and Y. Kaneda. New configuration for a stereo echo canceller with nonlinear pre-processing. In ICASSP. IEEE, 1998.

[59] M. Tanaka and S. Makino. A block exact fast affine projection algorithm. IEEE Transactions on Speech and Audio Processing, 7(1):79-86, January 1999.

[60] D. Van Compernolle and S. Van Gerven. Beamforming with microphone arrays. In V. Cappellini and A. Figueiras-Vidal, editors, Applications of Digital Signal Processing to Telecommunications, pages 107-131. COST 229, 1995.


List of publications

• Vandaele P., Rombouts G., Moonen M., "Implementation of an RTLS blind equalization algorithm on DSP", in Proc. of the 9th IEEE International Workshop on Rapid System Prototyping, Leuven, Belgium, Jun. 1998, pp. 150-155.

• Rombouts G., Moonen M., "Avoiding Explicit Regularisation in Affine Projection Algorithms for Acoustic Echo Cancellation", in Proc. of the ProRISC/IEEE Benelux Workshop on Circuits, Systems and Signal Processing (ProRISC99), Mierlo, The Netherlands, Nov. 1999, pp. 395-398.

• Rombouts G., Moonen M., "A fast exact frequency domain implementation of the exponentially windowed affine projection algorithm", in Proc. of Symposium 2000 for Adaptive Systems for Signal Processing, Communication and Control (AS-SPCC), Lake Louise, Canada, Oct. 2000, pp. 342-346.

• Rombouts G., "Regularized affine projection algorithms for multichannel acoustic echo cancellation", in Proc. of the IEEE Benelux Signal Processing Symposium (SPS2000), Hilvarenbeek, The Netherlands, Mar. 2000.

• Rombouts G., Moonen M., "Sparse-BEFAP: A fast implementation of fast affine projection avoiding explicit regularisation", in Proc. of the European Signal Processing Conference (EUSIPCO), Tampere, Finland, Sep. 2000, pp. 1871-1874.

• Schier J., Vandaele P., Rombouts G., Moonen M., "Experimental implementation of the spatial division multiple access (SDMA) algorithms using DSP system with the TMS320C4x processors", in The Proceedings of the Third European DSP Education and Research Conference, Paris, France, Sept. 2000, pp. CD-ROM.

• Rombouts G., Moonen M., "Acoustic noise reduction by means of QRD-based unconstrained optimal filtering", in Proc. of the IEEE Benelux Workshop on Model based processing and coding of audio (MPCA), Leuven, Belgium, Nov. 2002.


• Rombouts G., Moonen M., "A sparse block exact affine projection algorithm", IEEE Transactions on Speech and Audio Processing, vol. 10, no. 2, Feb. 2002, pp. 100-108.

• Rombouts G., Moonen M., "QRD-based optimal filtering for acoustic noise reduction", accepted for publication in Elsevier Signal Processing, Internal Report 01-47, ESAT-SISTA, K.U.Leuven (Leuven, Belgium), 2001.

• Rombouts G., Moonen M., "QRD-based optimal filtering for acoustic noise reduction", EUSIPCO 2002, Toulouse, France, CDROM.

Submitted papers

• Rombouts G., Moonen M., "An integrated approach to acoustic noise and echo suppression", Internal Report 02-206, ESAT-SISTA, K.U.Leuven (Leuven, Belgium), 2002, submitted to Elsevier Signal Processing.

• Rombouts G., Moonen M., "Fast-QRD-based optimal filtering for acoustic noise reduction", Internal Report 01-48, ESAT-SISTA, K.U.Leuven (Leuven, Belgium), 2001, resubmitted to IEEE Transactions on Speech and Audio Processing for second review.


Curriculum Vitae

Geert Rombouts was born in Turnhout on August 11, 1973. He studied at the Katholieke Universiteit Leuven, faculty of applied sciences (Faculteit Toegepaste Wetenschappen) from 1991 to 1997, where he received his M.Sc. degree in electrical engineering (Burgerlijk Ingenieur Elektrotechniek) in 1997. From 1997 to 2002 he did Ph.D. research at the Katholieke Universiteit Leuven, faculty of applied sciences.
