DerkReefman DSD Wp-2323



Signal processing for Direct Stream Digital

A tutorial for digital Sigma Delta modulation and 1-bit digital audio processing

Derk Reefman

and

Erwin Janssen


version 1.0, 18 December 2002


Contents

1 Introduction

2 Characteristics of Direct Stream Digital
  2.1 Example: Filtering
  2.2 Example: Non-linear operations
  2.3 Example: Anti-aliasing filters

3 Sigma Delta Modulation
  3.1 Overview
  3.2 A linear model
  3.3 Bit stream

4 Characteristics of SD modulators
  4.1 SDM silence
  4.2 SDM stability
  4.3 Idle tones

5 Design of SDM modulators: I
  5.1 Loop-filter design
  5.2 Enforcing SDM stability

6 Design of SDM modulators: II

7 Signal processing

8 Dithering and linearizing SDMs

9 Non-linearity in a SDM
  9.1 Pre-correction
  9.2 SDPC and dither
  9.3 Performance of a realistic SDM with SDPC

10 Acknowledgements

A SDM-code


Glossary

ADC: Analogue-to-Digital Converter. This device converts analogue input signals (from, e.g., a microphone) to a digital signal that can be used in computations (for example in a PC program).

(Anti-)aliasing filter: Filter designed to remove any signal components at frequencies above the Nyquist frequency.

Authoring: Process in which the final disc image is created. This includes lossless compression, creation of the table of contents, etc.

Class-D: Amplifier topology that relies on pulse modulation. The pulses drive switches which connect the load (loudspeaker) either to the positive or to the negative supply voltage. Characterised by high efficiency; often also called digital amplifier.

Clipping: The phenomenon that, when a format is designed to handle signal levels no larger than a level C, every level larger than C is coded as C. For example, the digital format on a CD cannot handle more than 65536 sub-levels; any signal corresponding to a level larger than +32767 is represented as +32767 (and likewise, any negative signal below -32768 is represented as -32768).

Clock jitter: Technically, the unwanted phase shift of digital pulses over a transmission medium: a discrepancy between when a digital edge transition is supposed to occur and when it actually does occur.

DAC: Digital-to-Analogue Converter: the reverse of an ADC.

Distortion: Any deviation from a linear input/output relationship, where a linear relationship is defined such that the output equals (apart from a constant gain factor) the input.

Dithering: The addition of a (quasi-)random number to the signal which is subsequently quantized. Due to the dither, the quantization appears as an (almost) linear process.

DSD: The digital format stored on Super Audio CD. DSD is a format in which a 1-bit signal is stored 2822400 times per second. Lowpass-filtering this signal will restore the original waveform.

DST encoding: Direct Stream Transfer, a lossless compression algorithm specifically tailored to the lossless compression of DSD signals.

Editing: In its simplest form, editing is the process of cutting and pasting the music such that undesirable parts of the recording are removed. Often, volume changes are also applied and mixing of different channels is performed.

Filter ringing: The effect that a filter with a steep transition band in the frequency domain produces artefacts in the time domain that extend over a significant period of time.

Idle tone: Tone appearing at the output of a noise shaper that bears a simple relation to the input of the Sigma Delta Modulator.

Limit cycle: Signal at the output of a Sigma Delta Modulator that requires a precisely defined input in order to occur, and disappears if the input deviates slightly from that precise value.

Linearity: See distortion.

Lossless compression: A way of compacting digital audio streams such that, when they are unpacked, the original stream is restored. Comparable with the ZIP program on PCs.

Mastering: Process in which the edit master is subject to processes such as EQ to obtain the best sound performance.

Matching: The accuracy to which electronic components are the same. This is important if an electronic circuit relies on the cancellation of two signals: if the components are not exactly identical, a residual (undesirable) signal will remain.

Noise shaping: The shift of spectral content of the (quantization) noise. For example, in a Sigma Delta Modulator the energy of the quantization noise is shifted to high frequencies, leaving little or no noise at low frequencies.

Nyquist frequency: The largest frequency that can be represented by a digital format; the Nyquist frequency is half the sample frequency.

PCM: Pulse Code Modulation. A digital format, used for example in CD, whereby a digital signal is represented by an accurate representation (e.g., 16 bits, meaning that the range [-1, +1] is subdivided in 65536 sub-intervals) of the wave form at equidistant points in time (for example, in CD a 16-bit approximation of the wave form is stored 44100 times per second).

Pulse Density Modulation: A form of pulse modulation where a large positive signal is represented by a long series of positive pulses; a zero signal is represented by alternating positive and negative pulses.

Recording: The process of storing the music signals on a medium - either in analogue form or in digital form.

(Re-)Quantization: The mapping of a signal of infinite precision to a signal with limited precision. On a CD, e.g., a signal is quantized to 16 bits.

Sigma Delta Modulator: Device which transforms an analogue or PCM signal into a DSD signal. Often abbreviated to SDM, and also often referred to as Delta Sigma Modulator.

Super Audio CD: Super Audio Compact Disc. Format for music distribution proposed by Philips and Sony. Super Audio CD is based on a new digital format called DSD.

Topology: Particular way of connecting building blocks to create a circuit.

Up/Down sampling: A signal processing technique whereby the sample rate of a digital signal is enlarged or reduced. In the latter case, this also corresponds to a loss of information.


1 Introduction

The introduction of Super Audio Compact Disc (SACD) as a successor to the CD has introduced the need for a change in signal processing. Underlying this change is the radically different signal format that is adopted in SACD compared to CD. Whereas the audio format in CD is Pulse Code Modulation (PCM), a 16-bit word at a sample rate of 44100 samples per second, for SACD it is Direct Stream Digital (DSD), a 1-bit word at a sample rate of 64 times 44100 samples per second. In the early nineties, the time of the conception of DSD, analogue-to-digital converters (ADCs) and digital-to-analogue converters (DACs) were built with 1-bit technology [9]. The driving forces for the use of this technology were purely technical: in the CD era, demands for distortion levels were becoming more stringent, and it proved virtually impossible to create low-distortion devices with many (16) bits. In contrast, it was much easier to create low-distortion converters using a digital format of 1 bit, running at very high sample rates such as 64 or 128 times 44.1 kHz. The conversion of this high-speed, 1-bit format to the 44.1 kHz/16-bit CD format can easily be accomplished in the digital domain using filtering and signal processing, which does not introduce any non-linear distortion. This technique has been highly successful, and the so-called oversampling and/or bitstream technology dramatically increased the performance of CD players in the nineties. In fact, those CD players were all generating their own DSD internally from the CD source; this DSD would then be fed into a high-quality, 1-bit DAC.

It therefore seemed logical to introduce a format that would store this 1-bit output directly, instead of the intermediate CD format: in this way, all filtering and signal processing needed to convert to and from the 1-bit format is eliminated which, by definition, can only increase the sound quality. After the first experiments with DSD, it appeared indeed that the sound quality was significantly better compared to the 44.1 kHz/16-bit format. At the same time, new ADCs and DACs were appearing on the market that still used high sample rates (64 or 128 times 44.1 kHz), but exploited a few bits (1.5 to 5) instead of 1. Again, this had purely technical reasons: ingenious tricks to reduce the distortion problems of a multi-bit converter had appeared, and were feasible to implement for a limited number of bits. Because 1-bit converters are more sensitive to clock jitter, the few-bit converters took their place in the high-end audio market. This re-introduced the need for some mild signal processing, because SACD can only store a 1-bit format. Interestingly, this did not lead to any observable degradation in sound quality. Therefore, it is now believed that the very high sample rate of DSD is the key factor in the extremely good sound quality of SACD. The fact that the data is 1 bit instead of a few bits, however, has retained its value because it reduces the storage requirements of the audio, thus creating the possibility to store over 70 minutes of stereo and multi-channel DSD on a single Super Audio CD.

The purpose of this document is to explain some technical details of Direct Stream Digital. It tries to give an overview of several signal processing steps which are needed in the world of DSD and which differ from the accustomed way of doing things. Its purpose is not to give a full explanation of the perceived sound quality of DSD; this white paper is meant to be an introduction to DSD and DSD signal processing for the educated DSD novice. Reflecting its importance for SACD, a crucial part of this paper is the 1-bit Sigma Delta Modulator (SDM). The design of such a device will be discussed in detail, and a working example will be designed to illustrate the design process.

Another important issue that will be dealt with is DSD signal processing. A typical signal processing chain for DSD is provided in Fig. 1, which shows several steps that typically occur in the creation of an SACD. Most of these steps involve analog or digital signal processing in one way or another. Starting with the AD converter: this is not necessarily a native 1-bit converter. Often, high-end AD converters are 3-6 bit converters running at sample rates between 128fs and 512fs, where fs stands for a sample rate of 44.1 kHz or 48 kHz. These signal formats need to be converted to 1-bit formats, while any change to the signal information is to be avoided. As this introduces the need for a 1-bit SDM, we will start with an introduction to Sigma Delta Modulation and the various options that exist to realize a SDM.

In the editing phase, volume adjustments need to be made, and switching between bit streams is necessary. Switching of bit streams is a technique which is rather different from standard signal processing, and is detailed in a separate document [12]. In the mastering phase, heavy signal processing is often involved, ranging from relatively simple equalization to sophisticated reverberation techniques. In the sequel, it will be demonstrated how most of the sophisticated techniques developed for PCM can easily be adjusted for application to DSD. In this respect, it is essential to realize that DSD at 64fs is a consumer format - hence, not necessarily the format that is used in the studio, which can in principle be any format as long as it is of equal or better quality compared to standard DSD.

In the authoring phase, finally, no changes to the signal content are made anymore. However, in most cases the format of the data will be transformed to DST (Direct Stream Transfer), which is the compressed format of DSD. This lossless compression scheme allows multi-channel, high-quality DSD data to fit in the approximately 4.7 Gbyte of the high-density layer of an SACD disc.

2 Characteristics of Direct Stream Digital

Before diving into the generation of Direct Stream Digital (DSD), we will first review some characteristics of the format as it is used within the context of Super Audio CD. First and foremost, DSD is characterized by the huge sample rate of 64 times 44.1 kHz, or 2.8 MHz. Largely irrespective of the number of bits, high sample rates are desirable in the digital world because the larger the sample rate, the fewer audio artefacts are introduced by the time quantization. We will review a few examples which show that 44.1 kHz (or a small multiple of it) is not enough to avoid significant signal distortions due to the time quantization.

[Figure 1 block diagram: Recording -> Editing -> Mastering -> Authoring/DST encoding -> SACD Player]

Figure 1: Typical signal processing chain for DSD applications.

[Figure 2 plot: warped frequency (vertical axis) versus analog frequency (horizontal axis).]

Figure 2: The effect of warping the analog frequencies to the limited range of digital frequencies; the frequencies are in reduced units (i.e., 0 . . . ). Red: the warped frequency. Green: the original frequency. The vertical line shows up to what frequency the warped frequency can be considered to be an accurate representation of the actual frequency.

2.1 Example: Filtering

A well-known issue in (time-discrete) digital systems is the problem of mapping (warping) the infinitely high frequencies, which are allowed in a time-continuous (analogue) system, to a system where the highest representable frequency is the Nyquist frequency (half the sample rate). Obviously, as the ultimate goal of digital signal processing is to present an improvement over analog signal processing, this is a very serious issue. Exemplary for this problem is the bilinear transform, which maps the analogue frequency $\omega_a$ to the digital frequency $\omega_d$ according to:

$$\omega_d = \frac{2}{T}\arctan\left(\frac{\omega_a T}{2}\right) \qquad (1)$$

where T is the sampling period. As illustrated in Fig. 2, this mapping is almost linear only for a limited frequency regime; for frequencies above 0.1 fs quite substantial deviations occur. As a result, mapping an analog filter (say, a Butterworth filter) to its digital equivalent causes significant distortion of its frequency response. If the sample rate is very high (as, for example, with DSD) the mapping artefacts are benign in the frequency regime which is most important for audio reproduction. Obviously, it is still possible to create a filter which has the characteristics of a digital filter at low sample rate; hence with the use of DSD, one has significant freedom in the choice of filters and filter characteristics.
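To get a feeling for the size of the warping in Eq. (1), the following minimal Python sketch evaluates the mapping for a few audio frequencies at 44.1 kHz and at the DSD rate. The function names and the chosen test frequencies are illustrative assumptions, not part of the original text.

```python
import math

def warped_frequency(f_analog_hz, fs_hz):
    """Digital frequency (in Hz) assigned to an analogue frequency by Eq. (1)."""
    T = 1.0 / fs_hz                        # sampling period
    w_a = 2.0 * math.pi * f_analog_hz      # analogue angular frequency
    w_d = (2.0 / T) * math.atan(w_a * T / 2.0)
    return w_d / (2.0 * math.pi)

def relative_error(f_analog_hz, fs_hz):
    """Fraction by which the warped frequency falls short of the analogue one."""
    return (f_analog_hz - warped_frequency(f_analog_hz, fs_hz)) / f_analog_hz

if __name__ == "__main__":
    for fs in (44100.0, 64 * 44100.0):     # CD rate and DSD rate
        for f in (1000.0, 10000.0, 20000.0):
            print(f"fs = {fs:9.0f} Hz, f = {f:7.0f} Hz: "
                  f"warped = {warped_frequency(f, fs):9.1f} Hz, "
                  f"error = {100.0 * relative_error(f, fs):7.3f} %")
```

At 44.1 kHz a 20 kHz analogue frequency ends up roughly a third too low, whereas at the DSD rate the deviation at 20 kHz is of the order of 0.02 %.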


2.2 Example: Non-linear operations

In audio signal processing, operations such as compression/limiting and clipping are quite common. In compression/limiting, the gain of the signal is adjusted according to the signal; this clearly represents a non-linear operation. Also in clipping, the signal transfer is highly non-linear. If these non-linear operations are performed in the analog domain, they will cause higher harmonics to appear. For example, if a 14 kHz signal is clipped, this will give rise to a third harmonic component at 42 kHz. In the analog domain, this could then be filtered off, if desired. If the clipping were done in the digital domain at a sample rate of 44.1 kHz, however, the 42 kHz harmonic would alias back to low frequency: 42 kHz is 19.95 kHz above the Nyquist frequency (22.05 kHz). The third harmonic would thus be aliased to (22.05 - 19.95) = 2.1 kHz, which would give very audible distortion as that frequency is not harmonically related to 14 kHz. Also, there is no way to remove this distortion by a filter operation. The only remedy is to up-sample to a high sample rate and perform the non-linear operation at that high rate, thus ensuring that only high-frequency, high-order harmonic components are aliased. This causes less harm, because high-order components tend to be of lower amplitude. Then, the signal is down-sampled again to 44.1 kHz with the appropriate low-pass filtering. Now, obviously, in DSD the sample rate is so high that non-linear operations behave as they would in the analog domain. Hence, no up- and down-sampling is required, and the decision whether or not to remove high-order distortion components is up to the sound engineer - and not dictated by the format.
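The aliasing arithmetic of this example can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `alias_frequency` is an illustrative assumption.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a component f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz                    # fold into one sample-rate interval
    return fs_hz - f if f > fs_hz / 2.0 else f   # mirror around Nyquist

if __name__ == "__main__":
    fundamental = 14000.0
    third_harmonic = 3 * fundamental    # 42 kHz, produced by clipping
    print(alias_frequency(third_harmonic, 44100.0))        # 2100.0 (aliased)
    print(alias_frequency(third_harmonic, 64 * 44100.0))   # 42000.0 (not aliased)
```

At 44.1 kHz the third harmonic lands at 2.1 kHz, as computed in the text; at the DSD rate it stays at 42 kHz, far below the Nyquist frequency.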

2.3 Example: Anti-aliasing filters

Because of the extremely high sample rate, DSD sets only very relaxed requirements for the anti-aliasing filters, which can therefore be chosen to have a gentle roll-off. As a result, the ringing in the time domain is substantially lower compared to systems of lower sample rate, where steep anti-aliasing filters are mandatory. This effect is clearly illustrated in Fig. 3. The impulse responses of 4 different systems in a multi-channel configuration are depicted: a 48 kHz system with a bandwidth of 20 kHz (that is, 8 kHz transition bandwidth is allowed for anti-aliasing filtering), a 96 kHz system with 35 kHz bandwidth (26 kHz transition bandwidth), a 192 kHz system with 75 kHz bandwidth (42 kHz transition bandwidth) and an SACD system with 95 kHz bandwidth (and about 120 kHz transition bandwidth). Though none of the systems reproduces the input exactly, the DSD system shows the fewest artefacts. Clearly, the 48 kHz system has great difficulty in reproducing the click; due to the steep filtering it starts wobbling, or ringing, at a -30 dB level with respect to the top of the response approximately 0.4 ms before the click, which is very audible (this is also the reason why many people prefer gentle anti-alias filters in CD players, even at the cost of reduced anti-aliasing characteristics). It also continues to ring after the click for the same length of time, but most probably this after-ringing is audibly masked by the click itself and, hence, is not as important as the pre-ringing. Apart from this effect, the amplitude is also only a fifth of what it should be. Especially when the sound traverses a non-linear medium, such as the human ear, this may lead to even larger perceived differences than what can be concluded directly from Fig. 3. At the higher sampling frequencies, too, the ringing phenomenon cannot be removed, though it is reduced significantly. Only the DSD system is very effective in suppressing the ringing effect, due to very slow filtering above 95 kHz. The price to pay for this is the increase in noise floor with respect to the other systems; however, as the noise floor contains only high-frequency components which are uncorrelated with the audio, they are not perceptible.

[Figure 3 plot: amplitude (vertical axis, linear) versus time in s (horizontal axis) for the DSD, 192 kHz, 96 kHz and 48 kHz systems.]

Figure 3: Responses (from left to right) of a DSD, a 192 kHz, a 96 kHz and a 48 kHz system on a -6 dB block input (click) of 3 µs duration, and amplitude 0.25. Note the linear amplitude scale.

3 Sigma Delta Modulation

In this section, it will be assumed throughout that the sample rate equals 64 times 44.1 kHz (about 2.8 MHz), i.e., the sample rate of SACD. By far the most common way to generate such a 1-bit DSD stream is by the use of a Sigma Delta Modulator (SDM), although it is nowhere stated in the definitions of Super Audio CD [10] that the bit stream present on the disc must be generated by a SDM.

In fact, many other methods have recently been developed which are not simply a (single) SDM. For example, in [3] a type of SDM with an elaborate re-ordering scheme is presented, and in [5] a so-called Trellis-SDM is presented. In [11], a cascaded structure of 2 SDMs is presented, which will be discussed in a slightly modified form in Sec. 9. All of these new developments have in common that their performance is in some way better than that of an ordinary SDM, but at the same time there is a substantial increase in complexity. Because a single SDM is still at the basis of all these new developments, and because a standard SDM is still by far the most widely used device to generate a bitstream, we will continue by elaborating on the principles of a simple SDM.

[Figure 4 plot: power in dB (vertical axis) versus frequency (horizontal axis, logarithmic).]

Figure 4: Typical output spectrum of an SDM (4 kHz, -6 dB input).

3.1 Overview

Sigma Delta Modulation, often also known as noise shaping, is in most general terms a technique which allows (digital) quantization errors to be spectrally shaped. In the SDMs that are typically used for DSD applications, the aim of this spectral shaping is to push the gross quantization errors made by the coarse 1-bit quantizers to high frequencies, where these errors are inaudible. This is possible due to the high oversampling factor of 64, which leaves a band from approximately 80-100 kHz (determined by the maximum allowable input, as will be discussed later in Sec. 5) to 1.4 MHz (the Nyquist frequency) to accommodate virtually all the quantization errors. An illustration of this phenomenon is given in Fig. 4.

Indeed, the spectrum illustrates that this SDM design allows for a very high dynamic range in the audio band (0-20 kHz) and a decreasing dynamic range in the band from 20 to 80-100 kHz, from where the dynamic range remains constant up to 1.4 MHz.


Schematically, a SDM can be represented as in Fig. 5.

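[Figure 5 block diagrams. Above: input u, loop filter H(z), quantizer Q, output y, with negative feedback. Below: input u, quantizer Q, output y, with the filter F(z) in the feedback path.]

Figure 5: Above: Sigma Delta structure (in feed-forward configuration). Below: equivalent noise shaper structure.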

Historically, the SDM is preceded by the noise shaper (NS) (also see Fig. 5). The most significant difference between a noise shaper architecture and a sigma delta structure is the position of the filter: in a noise shaper, the filter is in the feedback loop; in a SDM, the filter is in the feed-forward path. Due to the filter in the feedback loop, the error of the quantizer is spectrally shaped by the filter F(z) and fed back to the input of the quantizer. It is this process which is called noise shaping of the quantization error. Though this appears rather different from a SDM, the noise shaper structure is virtually identical to the SDM topology. In fact, the SDM and the NS in Fig. 5 have identical noise shaping if the filter F(z) in the noise shaper equals F(z) = H(z)/(H(z) + 1). In that case, the input of the noise shaper still needs to be pre-filtered by H(z)/(H(z) + 1) to obtain an identical signal transfer function. It is important to realize that, because of their equivalence, both a noise shaper and an SDM perform noise shaping of the quantization noise. For that reason, a SDM is often (mistakenly) called a noise shaper, even though the topology of a noise shaper is different from that of a SDM.

The noise shaper architecture is not often used in analog-to-digital converters because matching in the analog domain is difficult, which leads to implementation problems. Generally one resorts to SDM topologies, which suffer less from such analogue problems. In the digital domain, where precision is arbitrary, matching is not a fundamental problem and both structures can be used. Because of their identical nature, we will restrict the discussion to the SDM-like structures.
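The equivalence of the two structures can be checked numerically for the simplest possible case. The sketch below is an illustration under stated assumptions: the loop filter is taken to be a first-order delaying integrator, H(z) = z^-1/(1 - z^-1), for which F(z) = H(z)/(H(z) + 1) = z^-1, and the sign conventions (error defined as quantizer output minus quantizer input, filtered error subtracted from the input) are one possible choice; Fig. 5 may use a different but equivalent convention. All function names are hypothetical.

```python
import math

def quantize(v):
    """1-bit quantizer: +1 for non-negative input, -1 otherwise."""
    return 1.0 if v >= 0.0 else -1.0

def sdm_first_order(u):
    """Feed-forward SDM with H(z) = z^-1 / (1 - z^-1) (assumed loop filter)."""
    state = 0.0                  # integrator output = quantizer input
    y = []
    for x in u:
        out = quantize(state)
        y.append(out)
        state += x - out         # integrate input minus fed-back output
    return y

def noise_shaper_first_order(u):
    """Noise shaper with F(z) = z^-1 and the input pre-filtered by z^-1."""
    prev_input = 0.0             # pre-filter H/(1+H) = z^-1: one-sample delay
    prev_error = 0.0             # quantization error of the previous sample
    y = []
    for x in u:
        v = prev_input - prev_error   # delayed input minus filtered error
        out = quantize(v)
        y.append(out)
        prev_error = out - v
        prev_input = x
    return y

FS = 64 * 44100
# Input rounded to a 2^-10 grid so that all sums are exact in floating point.
u = [round(512 * math.sin(2 * math.pi * 1000 * n / FS)) / 1024 for n in range(5000)]
y_sdm = sdm_first_order(u)
y_ns = noise_shaper_first_order(u)
print("identical bit streams:", y_sdm == y_ns)
```

With identical (zero) initial conditions, both loops obey the same recursion for the quantizer input, so they produce the same bit stream sample by sample.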

3.2 A linear model

For applications in SACD, the quantizer Q in a SDM is a 1-bit quantizer, which outputs only values of +1 and -1. This is a highly non-linear element, which has its ramifications for the operation of the SDM. To gain some initial insight into the characteristics of the SDM, however, we will resort to a simple linear model and replace the highly non-linear quantizer by a (linear) gain c and a noise source n, which models the quantization error, as indicated in Fig. 6.

[Figure 6 block diagram: input u, loop filter H(z), gain c, additive noise n, output y, with negative feedback.]

Figure 6: Linearization of the Sigma Delta structure. The quantizer is replaced by a (signal-independent) gain and an additive noise source. The signal transfer function STF and noise transfer function NTF are defined by Y = STF·U + NTF·N, where Y is the Fourier transform of the output y, U is the Fourier transform of the input u and N the Fourier transform of the additive noise n.

Doing this, we can write for the signal transfer function (STF) and the noise transferfunction (NTF) the following expressions:

$$\mathrm{STF}(z) = \frac{cH(z)}{1 + cH(z)} \qquad (2)$$

$$\mathrm{NTF}(z) = \frac{1}{1 + cH(z)} \qquad (3)$$

Assuming that the quantizer gain c ≈ 1, this shows how, in a situation where the loop gain H(z) is very large, the signal transfer function approximates 1. The noise transfer function, on the contrary, is negligible for large H(z). As the loop filter H(z) typically is a low-pass filter with large low-frequency (LF) gain, it follows that in SACD applications the quantization noise is suppressed in the audio band. In Fig. 4, for example, the loop filter is a Chebyshev type II design with a corner frequency of 90 kHz.

It is of crucial importance, however, to realize that the replacement of the quantizer by a gain element c and an additive noise source is a very crude approximation, the more so if c = 1 is taken. Typically, the Signal-to-Noise Ratios (SNRs) as calculated from simulations on the actual SDM with the non-linearity included differ significantly from those obtained by the use of the linearized model. Other characteristics, discussed in Sec. 4, are also not properly, or not at all, explained by the linearized model.

There also exist other SDM realizations. Whereas the SDM structure in Fig. 5 is referred to as a feed-forward topology, there also exist feedback topologies. A feedback topology is displayed in Fig. 7. As in the comparison of the noise shaper vs. the feed-forward SDM, there is some equivalence between a feedback and a feed-forward topology. We will see this in a later section. The choice of which topology to use then depends on the design of the complete system.

[Figure 7 block diagrams. Above: four cascaded elements T with coefficients c1 ... c4, summed and fed to the quantizer Q; input x, output y. Below: four cascaded elements T with the coefficients in the feedback paths; input x, quantizer Q, output y.]

Figure 7: Above: A fourth-order Sigma Delta structure in feed-forward configuration. Below: a fourth-order feedback topology. If $c'_1 = c_1/c_4$; $c'_2 = c_2/c_4$ etc., the NTFs of these modulators are identical.
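Returning to the linear model of Eqs. (2) and (3), the following minimal Python sketch evaluates the STF and NTF on the unit circle at the DSD sample rate. The loop filter used here, a first-order delaying integrator H(z) = z^-1/(1 - z^-1), and the gain c = 1 are illustrative assumptions chosen for simplicity; they are not the Chebyshev design referred to in the text.

```python
import cmath
import math

FS = 64 * 44100.0   # DSD sample rate in Hz

def loop_filter(z):
    """Assumed loop filter: first-order delaying integrator z^-1 / (1 - z^-1)."""
    return (1.0 / z) / (1.0 - 1.0 / z)

def stf(z, c=1.0):
    """Signal transfer function of the linearized SDM, Eq. (2)."""
    h = loop_filter(z)
    return c * h / (1.0 + c * h)

def ntf(z, c=1.0):
    """Noise transfer function of the linearized SDM, Eq. (3)."""
    return 1.0 / (1.0 + c * loop_filter(z))

def db(x):
    """Magnitude of a complex gain in dB."""
    return 20.0 * math.log10(abs(x))

if __name__ == "__main__":
    for f in (1e3, 20e3, 100e3, FS / 2.0):
        z = cmath.exp(2j * math.pi * f / FS)   # point on the unit circle
        print(f"f = {f:9.0f} Hz: |STF| = {db(stf(z)):6.2f} dB, "
              f"|NTF| = {db(ntf(z)):7.2f} dB")
```

Even this first-order example shows the qualitative behaviour described above: the signal passes with unity gain, while the quantization noise is attenuated at low frequencies (by roughly 53 dB at 1 kHz) and amplified towards the Nyquist frequency. As stressed in the text, such linear-model figures should not be mistaken for the SNR of a real 1-bit modulator.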

3.3 Bit stream

In Fig. 8, a characteristic output sequence of a SDM is shown, receiving a sine wave of amplitude 0.95 and frequency 20 kHz as its input. Even though Fig. 4 leaves no doubt about the very high accuracy with which the signal is represented in the SDM output, it is hard to visualize the sine wave from a series of +1s and -1s. An idea is that the signal that is represented by the bit stream can be obtained by taking a local average of the bit stream: clearly, when the input sine wave is positive/negative, most bits that are output from the SDM are positive/negative too, and outnumber the opposite bits by far. Likewise, around zero input the numbers of positive and negative bits are roughly equal. Hence, the global wave form of the underlying (low-frequency) signal can be estimated by taking a local average of the bit stream - akin to pulse-density modulation, which is sometimes used in Class-D amplifiers. Obviously, this local average will not be a highly accurate representation of the wave form. A better impression of the accuracy with which the input is represented is obtained by filtering the output of the SDM with a filter which removes the signal in the DSD stream above 20 kHz (in fact, local averaging is a low-pass filter, albeit not a very good one).

[Figure 8 plot: SDM output bits and SDM input (vertical axis, -1 to 1) versus sample number (horizontal axis).]

Figure 8: Comparison of the DSD output of a SDM (red) and the input to the SDM (blue). Clearly, in regions where the input sine wave is negative, the bits that are output from the SDM are predominantly negative, and vice versa.

It is, therefore, informative to build a system as presented in Fig. 9. This system allows us to compare the original input signal with the signal which has passed through the Sigma Delta Modulator. To this end, the bit stream output of the SDM is low-pass filtered with a steep filter, so as to remove any components above 20 kHz. The input signal can be any signal of large enough resolution; below, we will take signals with a (digital) resolution of 24 bits. Due to the filter after the SDM, the input signal has to be delayed by an appropriate amount to compensate for the delay introduced by the filter. If the input and output signals are subtracted, the residual signal can be inspected.

[Figure 9 block diagram with blocks labelled SDM, n and T.]

Figure 9: Setup which allows the comparison of an upsampled, low-rate, high-resolution signal with its DSD equivalent. Note that the down-sampling is necessary only for the purpose of comparison.
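The local-average idea, and the principle of delaying the input and subtracting, can be tried out with a deliberately crude Python sketch. Everything in it is an illustrative assumption: a first-order SDM instead of the higher-order modulators used in practice, a 64-sample moving average instead of a steep 20 kHz low-pass filter, and a 1 kHz test tone of amplitude 0.5. The residual obtained this way is therefore many orders of magnitude larger than what a properly designed modulator and filter achieve.

```python
import math

FS = 64 * 44100      # DSD sample rate in Hz
F_IN = 1000.0        # test tone frequency in Hz (assumption)
AMP = 0.5            # test tone amplitude (assumption)
WINDOW = 64          # length of the local average in samples (assumption)

def first_order_sdm(u):
    """First-order 1-bit SDM: integrate (input - output), output the sign."""
    state = 0.0
    bits = []
    for x in u:
        y = 1 if state >= 0.0 else -1
        bits.append(y)
        state += x - y
    return bits

def local_average(bits, length):
    """Moving average: out[k] is the mean of bits[k : k + length]."""
    acc = sum(bits[:length])
    out = [acc / length]
    for n in range(length, len(bits)):
        acc += bits[n] - bits[n - length]
        out.append(acc / length)
    return out

n_samples = 3 * FS // 1000     # three periods of the 1 kHz tone
u = [AMP * math.sin(2.0 * math.pi * F_IN * n / FS) for n in range(n_samples)]
bits = first_order_sdm(u)
avg = local_average(bits, WINDOW)

# Delay the input by half the window to compensate for the averaging filter.
delay = WINDOW // 2
residual = [avg[k] - u[k + delay] for k in range(len(avg))]
max_error = max(abs(r) for r in residual)
print(f"largest deviation of the local average from the input: {max_error:.4f}")
```

For a first-order loop the integrator state stays within ±1.5 for inputs up to 0.5, which bounds the error of a 64-sample average by about 3/64; the steep filter and higher-order modulator of the text are what turn this rough picture into a high-resolution one.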

[Figure 10 plot: absolute difference (vertical axis, about -8e-07 to 8e-07) versus time in s (horizontal axis), for a 1 kHz and a 20 kHz input.]

Figure 10: Time domain representation of the difference signal, for both a 1 kHz input signal (red) and a 20 kHz signal (green).

In Fig. 10, two results are displayed: a first residual signal, where the input signal was a sine wave (0 dB SACD, 1 kHz), and a second residual signal (0 dB SACD, 20 kHz). The resulting signal is, for both inputs, noise-like, with an amplitude that corresponds to a resolution of at least 120 dB. Likewise, this experiment can be performed with real audio signals, where the result will be the same. Obviously, when the low-pass filtering applied in the down-sampling process does not suppress the noise above 20 kHz to a level of -120 dB, the residual signal will be larger.


A separate issue that becomes clear from Fig. 8 is the fact that, while negative parts of a sine wave are represented by predominantly negative output values, the pattern in which the +1s and -1s appear is never the same. This observation leads to the issue of editing DSD streams. As editing and switching DSD is a non-trivial topic, the reader is referred to [12] for a discussion.

4 Characteristics of SD modulators

Sigma Delta modulators represent a new class of devices, which display phenomena different from those we are used to in the PCM world. In the sequel, a few of these features which are important in practical applications will be highlighted.

4.1 SDM silence

A first important aspect is that the output of the SDM always has a power equal to 1, because the output can only take the values ±1. As a result, silence, as referred to in DSD, only means that the power spectrum of the DSD signal is empty below a threshold frequency, above which no signal can be perceived. For example, the following repetitive patterns are often used and are referred to as DSD silence patterns:

pattern     hex code    pattern frequency
01010101    0x55        1.4 MHz
10101010    0xaa        1.4 MHz
10010110    0x96        352.8 kHz
01101001    0x69        352.8 kHz

Indeed, the patterns do not contain signal components below 80 kHz; however, they still represent signals with a total power of 1. Often, these patterns are referred to by their hexadecimal equivalent. The fact that these signals are silent, but still contain information, can be exploited to use these signals as synchronization words [4].
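The properties of these silence patterns are easy to verify. The minimal Python sketch below (the helper names are illustrative assumptions) maps each byte to a ±1 sequence, and computes its power, its DC content and the repetition frequency of the pattern at the DSD sample rate.

```python
FS = 64 * 44100   # DSD sample rate in Hz

def to_bits(byte):
    """Map an 8-bit pattern (most significant bit first) to a list of +1/-1."""
    return [1 if (byte >> (7 - i)) & 1 else -1 for i in range(8)]

def minimal_period(bits):
    """Shortest period (in samples) with which the pattern repeats itself."""
    n = len(bits)
    for p in range(1, n + 1):
        if n % p == 0 and all(bits[i] == bits[i % p] for i in range(n)):
            return p
    return n

def pattern_frequency(byte):
    """Repetition frequency in Hz of the endlessly repeated pattern."""
    return FS / minimal_period(to_bits(byte))

def power(byte):
    """Mean square value of the pattern."""
    bits = to_bits(byte)
    return sum(b * b for b in bits) / len(bits)

def dc_level(byte):
    """Mean value of the pattern."""
    bits = to_bits(byte)
    return sum(bits) / len(bits)

if __name__ == "__main__":
    for byte in (0x55, 0xAA, 0x96, 0x69):
        print(f"0x{byte:02x}: power = {power(byte)}, DC = {dc_level(byte)}, "
              f"frequency = {pattern_frequency(byte) / 1e3:.1f} kHz")
```

All four patterns have unit power and zero DC; 0x55 and 0xaa repeat every 2 samples (1411.2 kHz), and 0x96 and 0x69 every 8 samples (352.8 kHz), in agreement with the table.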

4.2 SDM stability

Another important aspect is that, while the output of the SDM varies between 1, its inputmost often cannot vary over this range because the SDM becomes unstable for inputs ofhigh amplitude. While a full theoretical description of this phenomenon is still lacking, awealth of heuristic knowledge [9] is available on the stability of higher (> 2) order SDMs.Because of all this experimentally obtained insight, accurate descriptions of instability arepresent that can be used in the design of properly functioning modulators (see Sec. 5).In fig. 11, the performance of a SDM is shown as function of its input amplitude. Clearly,above a certain threshold, the performance collapses (in fact, the SDM gets into wildoscillations if no precautions are taken). The exact amplitude where the sudden collapse


Figure 11: Graphical representation of the stability problems for large inputs: for a simple SDM (in red) discussed in Sec. 5 the SNR collapses for signal amplitudes (in this case: a 4 kHz sine wave) of more than 0.59. In green the result of so-called graceful degradation is shown. (Plot: Signal-to-Noise Ratio in dB, 0 to 140, versus input amplitude, 0.25 to 0.7.)


occurs depends on the waveform of the input, its frequency, and the SDM design, and is thus not an easy quantity to determine. In Sec. 5, precautions that can be taken to prevent this uncontrolled behaviour are discussed; they lead to so-called graceful degradation: instead of a sudden collapse in performance, the performance drops in a much less aggressive way. This overload phenomenon is the reason why the SACD 0 dB reference level has been set to 50% of the maximum theoretically possible modulation depth [10]; in the cases discussed here this means that allowable input levels vary between -0.5 and 0.5. This definition introduces the possibility to allow signal levels which are larger than 0 dB, in contrast to PCM, which has a clear limit at 0 dB: all inputs larger than 0 dB are harshly clipped to 0 dB. As will be clear in the following sections, for SDMs this overload is possible at the limited cost of increased distortion (clipping of the internal integrators). In this respect, the DSD format compares to analogue tape recordings, which also allowed for serious signal overload, but also at the cost of significant distortion. Obviously, for high fidelity recordings for Super Audio CD, the 0 dB level should never be crossed.

4.3 Idle tones

As discussed in Sec. 4.1, silence in DSD is often equivalent to having a high-powered tone outside the signal band. These tones are called idle tones. For higher order SDMs, the 1-bit output signal still carries these idle tones, although they have much reduced amplitude compared to the purely repetitive patterns shown in Sec. 4.1, and are embedded in a large amount of uncorrelated noise. For non-zero DC inputs, these tones start to move down in frequency with increasing DC level; at the same time, tones may start to appear in the LF part, which can, potentially, be audible.

The origin of the tones appearing in the LF part lies in the feedback character of an SDM: suppose we have a DC input of 0.25. The most likely combination of bits which represents that value is 1, -1, 1, -1, 1, -1, 1, 1. If this sequence is repeated, a tone at a frequency of 8 × 44.1 kHz will result. For each halving of the DC input, this frequency will be halved too; eventually, this tone will end up below 20 kHz. This phenomenon can be reduced, or even removed, in several ways, as will be discussed more extensively in Sec. 8, by dithering and other means.

If the SDM is undithered, audibility of these tones depends on the SDM used. Typically, the higher the order of the SDM, the lower the power of the tone in the audio band. For a typical (undithered) SDM the tones in the audible band are below -130 dB.
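The halving of the tone frequency with each halving of the DC input can be illustrated with a toy model. The sketch below assumes a plain first-order SDM (one integrator, a sign quantizer, exact rational arithmetic) and a sample rate of 64 × 44.1 kHz; higher order modulators, as discussed in the text, embed such tones in noise and behave less regularly.

```python
from fractions import Fraction

FS_DSD = 64 * 44100  # assumed DSD sample rate in Hz

def first_order_sdm_period(dc):
    """Run a first-order SDM on a constant input until the integrator
    state repeats, and return one period of the +1/-1 output."""
    u = Fraction(0)          # integrator state
    seen = {}                # state -> index at which it first occurred
    bits = []
    while u not in seen:
        seen[u] = len(bits)
        y = 1 if u >= 0 else -1   # 1-bit quantizer
        bits.append(y)
        u += dc - y               # integrate input minus feedback
    return bits[seen[u]:]

for dc in (Fraction(1, 4), Fraction(1, 8), Fraction(1, 16)):
    period = first_order_sdm_period(dc)
    # pattern length, mean value and repetition frequency in Hz
    print(dc, len(period), Fraction(sum(period), len(period)), FS_DSD / len(period))
```

The repetition frequency is simply the sample rate divided by the pattern length, so a pattern of length 8 repeats at 8 × 44.1 kHz.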

5 Design of SDM modulators: I

In this section, a fully operational SDM will be designed. We will use the linearized model of the SDM to obtain values for the coefficients of the SDM, following to a large extent the design route proposed in [2], and also discuss ways to ameliorate the stability problem. From the start, it is important to know that the only way to obtain reliable insight into the performance of the SDM is by simulation; although the linear approximation usually


results in a working SDM, it is too crude to provide numbers about SNR and, even more important, it does not provide any insight into stability.

Also, in the design process, we assume an effective quantizer gain c = 1. Simulations based on this design can give some idea about what the effective gain c actually is within the limitations of the linear model, and be used for further refinement of the loop-filter.

5.1 Loop-filter design

A very convenient way to start the design of an SDM is the linear model of Fig. 6, where we take the gain c = 1. We take a feed-forward structure from Fig. 7, and write down the NTF that is associated with it. We can write for the loop-filter H(z):

$$H(z) = c_1 \frac{z^{-1}}{1 - z^{-1}} + c_2 \left(\frac{z^{-1}}{1 - z^{-1}}\right)^2 + c_3 \left(\frac{z^{-1}}{1 - z^{-1}}\right)^3 + c_4 \left(\frac{z^{-1}}{1 - z^{-1}}\right)^4 \tag{4}$$

and making use of the relation NTF(z) = 1/(1 + H(z)) we arrive at:

$$\mathrm{NTF}(z) = \frac{(1 - z^{-1})^4}{(1 - z^{-1})^4 + c_1 z^{-1}(1 - z^{-1})^3 + c_2 z^{-2}(1 - z^{-1})^2 + c_3 z^{-3}(1 - z^{-1}) + c_4 z^{-4}} \tag{5}$$

which is to be recognized as a filter of the appearance $\mathrm{NTF}(z) = (1 - z^{-1})^n / P_n(z^{-1})$. This is the form of a Butterworth or a Chebyshev type II filter¹; the choice of either of those realizations dictates the final appearance of the polynomial P(z). Likewise, the STF can be computed as $\mathrm{STF}(z) = 1 - \mathrm{NTF}(z)$, resulting in:

$$\mathrm{STF}(z) = \frac{c_1 z^{-1}(1 - z^{-1})^3 + c_2 z^{-2}(1 - z^{-1})^2 + c_3 z^{-3}(1 - z^{-1}) + c_4 z^{-4}}{(1 - z^{-1})^4 + c_1 z^{-1}(1 - z^{-1})^3 + c_2 z^{-2}(1 - z^{-1})^2 + c_3 z^{-3}(1 - z^{-1}) + c_4 z^{-4}} \tag{6}$$

The approach that can now be followed is to design a high-pass filter for NTF(z), according to Butterworth or Chebyshev-II (or any other) rules, and reorganize terms such that it is in the shape of Eq. (5). One way of approaching this is to use a symbolic manipulation package such as Mathematica [14], or to collect terms in powers of z and equate identical powers. From an engineering point of view, a very easy way of obtaining the coefficients $c_i$ is by recognizing that 1/NTF(z) is linear in the coefficients $c_i$. It is then possible to set up a linear system for (at least as many as the order of the system) different values of z. These values must have no simple relation to each other, but need not be complex. In this way, it is also irrelevant whether the Butterworth filter is provided as a cascade of biquads, or as a direct realization.
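As an illustration of the "equate identical powers" route, the sketch below expands the denominator of Eq. (5) for given coefficients and, conversely, recovers the coefficients from a given denominator polynomial in powers of $z^{-1}$. It is a minimal sketch with illustrative function names; the numerical values are the fourth order coefficients quoted in Eq. (9) further on, and the high-pass filter design itself (which would normally come from a filter design tool) is not included.

```python
def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def binomial_term(k):
    """Coefficients of (1 - z^-1)^k in powers of z^-1."""
    p = [1.0]
    for _ in range(k):
        p = poly_mul(p, [1.0, -1.0])
    return p

def denominator_from_coeffs(c):
    """Expand (1-z^-1)^n + sum_i c_i z^-i (1-z^-1)^(n-i), the denominator of Eq. (5)."""
    n = len(c)
    den = binomial_term(n)
    for i, ci in enumerate(c, start=1):
        for k, tk in enumerate(binomial_term(n - i)):
            den[i + k] += ci * tk
    return den

def coeffs_from_denominator(den):
    """Recover c_1..c_n by matching powers of z^-1, lowest power first."""
    n = len(den) - 1
    residual = [d - b for d, b in zip(den, binomial_term(n))]
    c = []
    for i in range(1, n + 1):
        ci = residual[i]              # only the c_i term is left at power z^-i
        c.append(ci)
        for k, tk in enumerate(binomial_term(n - i)):
            residual[i + k] -= ci * tk
    return c

C_EQ9 = [0.8707115357, 0.3594322506, 0.0811807847, 0.0083240406]
den = denominator_from_coeffs(C_EQ9)
print([round(d, 2) for d in den])      # rounded denominator coefficients
print(coeffs_from_denominator(den))    # round trip back to the c_i
```

The matching works lowest power first because the term with $c_i$ is the first one to contribute at power $z^{-i}$, so the system is triangular and no general linear solver is needed.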

When we inspect the feedback structure (lower part of Fig. 7), we see that the transfer characteristic for the NTF(z) is identical to the NTF of the feed-forward structure discussed above.

¹ albeit scaled such that the first term $c_0 z^0$ of H(z) equals zero. If this term were non-zero, the resulting SDM would not contain a delay in the closed loop and hence would not be realizable.


However, the STF is given by

$$\mathrm{STF}(z) = \frac{z^{-4}}{(1 - z^{-1})^4 + c_1 z^{-1}(1 - z^{-1})^3 + c_2 z^{-2}(1 - z^{-1})^2 + c_3 z^{-3}(1 - z^{-1}) + c_4 z^{-4}} \tag{7}$$

which, for low frequencies, equals about 1 if the coefficient $c_4$ equals unity (this refers to the scaling as applied in Fig. 7). For higher frequencies, the STF displays an almost third-order roll-off. This is in contrast to the feed-forward topology, where the STF rolls off only very slightly (first order) for high frequencies.

As an example, we will design a fourth order SDM, with an NTF according to a Butterworth high-pass filter design. The cut-off frequency is chosen as 150 kHz. Because the SDM needs to be realizable, the total loop needs to embody at least a single delay, i.e., the term with $z^0$ in the STF needs to be zero. This corresponds with the requirement that the high pass filter should have 1 as its first value of the impulse response. This can be accomplished by multiplying the high pass filter with a certain coefficient (larger than 0), resulting in a HF gain which is larger than 1. With the above in mind, we obtain for the NTF:

$$\mathrm{NTF}(z) = \frac{1.00 - 4.00 z^{-1} + 6.00 z^{-2} - 4.00 z^{-3} + 1.00 z^{-4}}{1.00 - 3.13 z^{-1} + 3.75 z^{-2} - 2.03 z^{-3} + 0.42 z^{-4}} \tag{8}$$

This results in the following coefficients in the feed-forward structure:

$$c_1 = 0.8707115357, \quad c_2 = 0.3594322506, \quad c_3 = 0.0811807847, \quad c_4 = 0.0083240406 \tag{9}$$

For the feed-forward structure, the STF is now given by:

$$\mathrm{STF}(z) = \frac{0.00 + 0.87 z^{-1} - 2.25 z^{-2} + 1.97 z^{-3} - 0.58 z^{-4}}{1.00 - 3.13 z^{-1} + 3.75 z^{-2} - 2.03 z^{-3} + 0.42 z^{-4}} \tag{10}$$
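To get a feeling for these transfer functions, they can be evaluated on the unit circle. The sketch below does this for the NTF of Eq. (8) and the feed-forward STF of Eq. (10), using the unrounded coefficients of Eq. (9) and assuming a sample rate of 64 × 44.1 kHz. The helper names are illustrative; the script only reports values and makes no claim about the exact peak height.

```python
import cmath
import math

FS_DSD = 64 * 44100  # assumed DSD sample rate in Hz

C1, C2, C3, C4 = 0.8707115357, 0.3594322506, 0.0811807847, 0.0083240406

NTF_NUM = [1.0, -4.0, 6.0, -4.0, 1.0]  # (1 - z^-1)^4
# Denominator of Eq. (5), expanded in powers of z^-1 for a fourth order loop
DEN = [1.0,
       -4.0 + C1,
       6.0 - 3.0 * C1 + C2,
       -4.0 + 3.0 * C1 - 2.0 * C2 + C3,
       1.0 - C1 + C2 - C3 + C4]
# Feed-forward STF = 1 - NTF, so its numerator is DEN - NTF_NUM
STF_FF_NUM = [d - n for d, n in zip(DEN, NTF_NUM)]

def magnitude(num, den, f_hz, fs=FS_DSD):
    """|num(z^-1) / den(z^-1)| evaluated at z = exp(j 2 pi f / fs)."""
    zinv = cmath.exp(-2j * math.pi * f_hz / fs)
    n = sum(a * zinv ** k for k, a in enumerate(num))
    d = sum(a * zinv ** k for k, a in enumerate(den))
    return abs(n / d)

for f in (1e3, 20e3, 150e3, FS_DSD / 2):
    ntf_db = 20 * math.log10(magnitude(NTF_NUM, DEN, f))
    stf_db = 20 * math.log10(magnitude(STF_FF_NUM, DEN, f))
    print(f"{f:10.0f} Hz  NTF {ntf_db:8.1f} dB  FF STF {stf_db:6.1f} dB")

# Coarse logarithmic scan for the largest feed-forward STF gain
freqs = [10 ** (1 + 5 * k / 2000) for k in range(2001)]  # 10 Hz .. 1 MHz
peak_f = max(freqs, key=lambda f: magnitude(STF_FF_NUM, DEN, f))
print("FF STF peak near", round(peak_f), "Hz:",
      round(20 * math.log10(magnitude(STF_FF_NUM, DEN, peak_f)), 1), "dB")
```

The scan is deliberately coarse; a finer grid or a proper optimizer would be needed to locate the peak precisely.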

For the feedback structure, the STF is given by:

$$\mathrm{STF}(z) = \frac{z^{-4}}{1.00 - 3.13 z^{-1} + 3.75 z^{-2} - 2.03 z^{-3} + 0.42 z^{-4}} \tag{11}$$

In Fig. 12, the different STFs for a feed-forward and feedback structure, with an identical NTF, have been calculated. The NTFs are designed as 4th order Butterworth high pass characteristics, with a cut-off frequency of 150 kHz. Clearly, the strong roll-off characteristic of the feedback structure can be observed. Interestingly, the feed-forward topology displays a strong peak in its transfer characteristic at the cross-over frequency. This feature is not obvious from Eq. (3) if only the magnitude response |H| is used. The maximum peak height is in this case about 6 dB.

This loop-filter design gives rise to an SDM with a maximum input of about -5 dB (i.e., 0.57 w.r.t. the feedback signal from the quantizer). At an input of a sine with an amplitude


Figure 12: Signal transfer functions for a feed-forward topology (red, FF.STF) and a feedback topology (green, FB.STF) with identical NTFs. (Plot: gain in dB, -60 to 10, versus frequency in Hz, 10 to 1e+07, logarithmic.)


cut-off (kHz)   DR (dB)   max. input level
100             85        0.77
120             90        0.70
150             97        0.57
170             100       0.49

Table 1: Trade-off of the maximum input range and the SNR in the base-band.

of 0.5, the (unweighted) Signal to Noise Ratio (SNR) in the band 0-20 kHz is about 97 dB. In SACD applications, this is not sufficient: a signal-to-noise ratio of at least 100 dB is desirable. However, one might argue that the A-weighted SNR is much better, because the noise floor is large only for frequencies close to 20 kHz. Indeed, for this example, the A-weighted SNR amounts to about 105 dB. More important is the maximum modulation depth of the modulator. The definition of the 0 dB level in SACD is 50% modulation depth, i.e., the sine wave from the previous example would correspond to 0 dB SACD exactly. Peaks in the signal of +3.1 dB are allowed (though for a short period only)². Hence, the SDM needs to be stable for inputs up to a level of about 0.71.
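The relation between SACD levels in dB and modulation depth used in this paragraph is a plain decibel conversion with 0.5 as the reference amplitude. A small sketch, with illustrative function names:

```python
import math

SACD_REFERENCE = 0.5  # 0 dB SACD corresponds to 50% modulation depth

def sacd_db_to_amplitude(level_db, reference=SACD_REFERENCE):
    """Peak amplitude (relative to the quantizer feedback level) for a level in dB SACD."""
    return reference * 10 ** (level_db / 20)

def amplitude_to_sacd_db(amplitude, reference=SACD_REFERENCE):
    """Level in dB SACD for a given peak amplitude."""
    return 20 * math.log10(amplitude / reference)

print(sacd_db_to_amplitude(3.1))   # amplitude needed for a +3.1 dB peak
print(amplitude_to_sacd_db(0.57))  # headroom of a modulator that is stable up to 0.57
```

The first value is the amplitude the modulator has to survive; the second shows how much headroom above 0 dB SACD a maximum input of 0.57 leaves.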

For every SDM design, there is a trade-off between stability of the modulator and the SNR in the base-band. As an example, consider the results in Table 1 for different 4th order SDMs, which have all been created using Butterworth high pass filters as design NTF.

Clearly, for these modulators it is not possible to obtain a dynamic range (unweighted) exceeding 100 dB, while maintaining the possibility for seriously overloading the SDM to a level of +3.1 dB.

One way of increasing the SNR in the audio band, while hardly reducing the maximum input level, is to use higher order filters for the NTF, and to use a Chebyshev type II-like high pass filter for the NTF design instead of a Butterworth characteristic. Chebyshev type II high pass filters can easily be created in SDMs by the construction of resonator sections, as displayed in Fig. 13.

The construction in Fig. 13 is, in principle, applicable to a feed-forward topology; for a feedback topology, a similar arrangement with a feedback loop over two integrator sections is possible. In Fig. 13, two outputs of the resonator section are indicated, R1 and R2; the relation between these is that $R_2(z) = h(z) R_1(z)$, designating the transfer characteristic of the integrator section as $h(z) = z^{-1}/(1 - z^{-1})$.

Also, two different realizations of the feedback path (with coefficient f) are possible. The full drawn curve in Fig. 13 does not incorporate the delay that the dotted realization does. The effects of the dotted feedback structure can be obtained as follows. The transfer $R_2(z)$ of the resonator section becomes:

$$R_2(z) = \frac{h^2(z)}{1 + f h(z)^2} \tag{12}$$

² These and other audio requirements are in part 2 of the SACD Scarlet Book [10].


Figure 13: A cascade of two integrator sections in an SDM, with a feedback loop between the integrators. The two different ways of incorporating the feedback loop result in slightly different pole characteristics. Indicated are the two different outputs, which are characterized by a transfer function R1(z) and R2(z), respectively. (Diagram: two delay elements T, outputs R1 and R2, feedback coefficient f.)

This function has a pole at $z_p$ when $z = z_p$ solves $(1 + f)z^{-2} - 2z^{-1} + 1 = 0$, i.e.,

$$z_p = 1 \pm i\sqrt{f} \tag{13}$$

Hence, the norms $|z_p| > 1$. The reduced frequencies $f_{pole}$ of these poles are thus given by

$$f_{pole} = \operatorname{atan}(\sqrt{f}) \tag{14}$$

In the case of the full feedback path in Fig. 13, the resonator has transfer functions $R_1(z)$, $R_2(z)$ given by:

$$R_1(z) = \frac{h(z)}{1 + z f h(z)^2}; \quad R_2(z) = h(z) R_1(z) \tag{15}$$

In this case, the poles are given by

$$z_p = 1 - \frac{f}{2} \pm \frac{i}{2}\sqrt{4f - f^2} \tag{16}$$

Contrary to the previous case, these poles are exactly on the unit circle. The pole frequencies are given by:

$$f_{pole} = \operatorname{acos}\left(1 - \frac{f}{2}\right) \tag{17}$$

which, for small values of f, virtually coincides with the pole frequencies given by Eq. (14). As such a feedback loop over two integrator sections transforms the two poles at DC ($z^{-1} = 1$) into two complex conjugate poles away from DC, care should be taken that there is enough DC gain in the loop-filter to avoid DC drift. As an example, consider the 4th order SDM with a Butterworth design, corner frequency 150 kHz. Choosing the poles to move from DC to 10 and 19 kHz, the numerical values of the feedback coefficients obtained are 0.000496 and 0.001789. The SDM obtained has a maximum input of 0.57 (0.57 without resonators) and a SNR of 107 dB (97 dB without resonators). Indeed, the addition of the


Figure 14: Principle of a clipped integrator. The absolute value of the output of the integrator cannot exceed a value of C.

poles, turning the Butterworth characteristic into a Chebyshev II-like characteristic, gives significantly better SNR; the DC suppression of the loop-filter is still better than 120 dB, which is sufficient. Compared to the A-weighted SNR figures, the improvement is less, because the poles primarily serve to suppress the noise between 10 and 20 kHz.

A further improvement can be obtained when using a fifth order SDM, with a Butterworth NTF design (corner frequency 110 kHz) plus the poles at 10 and 19 kHz: in that case the SDM is stable for inputs up to 0.58, with a SNR of 120 dB. Note that in this case there is still 1 integrator with a pole at DC, and thus there cannot be any DC drift. To clarify the operation of such an SDM, pseudo-code of the SDM is provided in App. A.

A drawback of the above implementations of resonator sections is that the resulting filter is not minimum phase; due to this, not the full potential that noise shaping offers can be realized. Although the improvement that can be realized by a minimum-phase filter is (in this case) limited, a very interesting suggestion is the following³. Suppose that we create a resonator section which contains both the dotted and the full drawn realization of the feedback. Denote the feedback coefficient in the full drawn realization by $f_1$, the feedback coefficient in the dotted structure by $f_2$. The poles (for $(1 + f_2) > (1 - f_1/2)^2$) are then given by

$$z_p = \left(1 - \frac{f_1}{2}\right) \pm i\sqrt{(1 + f_2) - \left(1 - \frac{f_1}{2}\right)^2} \tag{18}$$

and have $|z_p| = \sqrt{1 + f_2}$, with reduced pole frequencies $f_{pole} = \operatorname{atan}\left(\sqrt{\frac{1 + f_2}{(1 - f_1/2)^2} - 1}\right)$. Hence, the radius and pole position can be adjusted independently, and it is possible to have $|z_p| < 1$ at the cost of an additional feedback path in the resonator section.
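The pole formulas of Eqs. (13), (16) and (18) can be checked numerically against the quoted resonator coefficients. The sketch below treats Eq. (18) as the general case (Eq. (13) follows for $f_1 = 0$, Eq. (16) for $f_2 = 0$) and converts the reduced pole frequency, taken here as an angle in radians per sample, to Hz. The sample rate of 64 × 44.1 kHz and the function names are assumptions, and the text does not state which of the two realizations produced the quoted coefficients.

```python
import cmath
import math

FS_DSD = 64 * 44100  # assumed DSD sample rate in Hz

def resonator_poles(f1, f2):
    """Roots of z^2 - (2 - f1) z + (1 + f2) = 0, i.e. Eq. (18).
    f1: coefficient of the full drawn (undelayed) feedback path,
    f2: coefficient of the dotted (delayed) feedback path.
    Assumes (1 + f2) > (1 - f1/2)^2 so that the poles are complex."""
    re = 1.0 - f1 / 2.0
    im = math.sqrt((1.0 + f2) - re * re)
    return complex(re, im), complex(re, -im)

def pole_frequency_hz(zp, fs=FS_DSD):
    """Convert the pole angle (radians per sample) to a frequency in Hz."""
    return abs(cmath.phase(zp)) * fs / (2.0 * math.pi)

for f in (0.000496, 0.001789):
    dotted, _ = resonator_poles(0.0, f)   # Eq. (13): delayed feedback only
    full, _ = resonator_poles(f, 0.0)     # Eq. (16): undelayed feedback only
    print(f, abs(dotted), pole_frequency_hz(dotted),
          abs(full), pole_frequency_hz(full))
```

For such small coefficients the two realizations give nearly the same pole frequency; they differ mainly in the pole radius.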

5.2 Enforcing SDM stability

So far, we have not bothered about what happens if the SDM input exceeds its maximum: the SDM gets into wild oscillations, with constantly increasing amplitude in the integrator states and decreasing frequency. Even worse, when the input is removed from the system, the SDM does not return to its original state. To avoid such a situation, it is customary to use clippers in each integrator stage. In Fig. 14, a schematic representation of a clipped

³ This observation has been made by Prof. S.P. Lipshitz.


integrator is given. The idea is that the output of the integrator can never exceed its clip value, C. In other words, the integrator section simply stops integrating when the clip level C has been reached.

The purpose of these clippers is to avoid a situation where the values in the integrator stages get too high (and cause the SDM to start to oscillate), while still allowing integrator values which occur during normal operation. Whereas the main purpose of the clippers is to let the SDM return to normal operation after overload, it is also desirable to avoid serious distortion in the signal if clipping occurs.

A heuristic way of obtaining reasonable numerical values for the clipper levels is to monitor the integrator levels during very large sine wave inputs and square wave inputs, close to overload of the SDM. The clipper levels C1 and C2 of the first 2 integrator stages can be set according to these values. If the higher integrator stages are assigned values according to this recipe as well, the situation occurs that the SDM returns to normal operation after overload, but can have all clippers activated simultaneously. This will cause serious clicks and pops (especially if the first integrators run into their clippers). Hence, the clippers should be designed such that the high order clippers are activated first, before the low order clippers are activated.

As an example, let us consider the fifth order SDM designed previously. Its feed-forward coefficients are:

c1 = 0.79188240; c2 = 0.30454538; c3 = 0.06992965; c4 = 0.00949572; c5 = 0.00060680

with resonator coefficients:

f1 = 0.000496; f2 = 0.001789

The pseudo-code of this SDM is provided in App. A, suitable for easy implementation in any programming language. Without any clippers, the SDM is stable for sine inputs up to 0.58; for higher amplitudes, the SDM gets fully unstable. Looking at the maximum integrator values during operation close to overload, we obtain values C1 and C2 for the first and second clipper of about 4 and 9, respectively. The subsequent clipper values are chosen such that the product $C_i c_i$ of the clipper value and the corresponding feed-forward coefficient is reduced by a factor of about 1.5 to 2 per integrator stage. This is illustrated in Table 2.

From Table 2, we can obtain some idea about the influence of the clippers on the SDM operation. The clippers are sometimes activated during operation at 0.5 input level, which causes a small reduction in SNR with respect to the 120 dB without clippers. However, whereas the original SDM turned unstable at inputs of 0.59, its clipped version shows continuous stable operation. Even at inputs of 0.65, the first integrator is not clipped, indicating that the signal distortion is still limited, and highly audible clicks are absent. In fact, only at input levels exceeding 0.75 will the initial integrator clip, which causes a clearly audible effect. At the level of 0.75, the SNR has dropped to about 60 dB.

As an alternative to clipping in the SDM, clipping before the SDM might be considered. However, in this case dynamic range must be sacrificed, although the resulting system is unconditionally stable for large inputs.
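The clipped integrator of Fig. 14 and the rule for the higher clipper levels can be sketched as follows. This is an illustration only: the reduction factor of 1.75 is a hypothetical value picked from the quoted range of 1.5 to 2, the function names are not from the paper, and the resulting levels are not claimed to be those of Table 2.

```python
def clipped_integrator_step(state, x, clip):
    """One update of a saturating integrator: integrate, then limit to [-clip, +clip]."""
    return max(-clip, min(clip, state + x))

def clip_levels(c, first_levels, ratio):
    """Extend measured clip levels for the first stages to the higher stages such
    that the product C_i * c_i drops by `ratio` per stage (ratio is an assumption)."""
    levels = list(first_levels)
    for i in range(len(levels), len(c)):
        target_product = levels[i - 1] * c[i - 1] / ratio
        levels.append(target_product / c[i])
    return levels

# Feed-forward coefficients of the fifth order SDM from the text
C_FF = [0.79188240, 0.30454538, 0.06992965, 0.00949572, 0.00060680]
LEVELS = clip_levels(C_FF, first_levels=[4.0, 9.0], ratio=1.75)
for stage, (ci, level) in enumerate(zip(C_FF, LEVELS), start=1):
    print(stage, round(level, 2), round(level * ci, 3))

# A constant overload drives the integrator into its clipper, where it stays
state = 0.0
for _ in range(20):
    state = clipped_integrator_step(state, 0.5, clip=4.0)
print(state)
```

Because the weighted contribution $C_i c_i$ shrinks with stage number, the higher stages saturate at a smaller contribution to the quantizer input and therefore clip first, which is the ordering asked for above.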

6 Design of SDM modulators: II

Figure 15: Example of an SDM which has been created according to NTF design by cascading a third order high pass filter and a fourth order high pass filter. (Plot: averaged power, -250 to 0, versus frequency, 10 to 1e+07, logarithmic.)

The first, in line with the previous section, consists of cascading 2 (or more) high pass filters, which then make up the SDM NTF. For example, one could wish to create an SDM which is third order starting from 150 kHz, and then turns 7th order at about 40 kHz.

An example of such a design is given in Fig. 15. That SDM has been obtained by designing an NTF as a cascade of a third order Chebyshev high pass filter, with a corner frequency of 150 kHz, and a fourth order filter of the same type with a corner frequency of 40 kHz. The cascade is hence 7th order below 40 kHz, and in this way some of the merits of a low order and high order SDM can be combined.

A more heuristic approach is to set each coefficient $c_i$ in the SDM to a fraction of its previous coefficient $c_{i-1}$. An example of such an SDM in non-delayed feed-forward topology is given in Fig. 16, which represents a 7th order SDM where each coefficient is 0.475 times its previous coefficient. Note that this is really a recipe; the actual performance of the SDM is also determined by its topology (e.g., an SDM with delayed feedback topology would be unstable with these coefficients).

It is interesting to see that an NTF characteristic as displayed in Fig. 16 can be approximated by a cascade of first order filters with different corner frequencies. In that case, there is full control over the SDM design.
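The heuristic recipe amounts to a geometric series of coefficients. A minimal sketch, with an illustrative function name and with $c_1 = 1$ as in the caption of Fig. 16:

```python
def geometric_coefficients(order, ratio, c1=1.0):
    """Feed-forward coefficients with c_i = ratio * c_(i-1), starting from c_1."""
    return [c1 * ratio ** k for k in range(order)]

COEFFS_7TH = geometric_coefficients(order=7, ratio=0.475)
for i, ci in enumerate(COEFFS_7TH, start=1):
    print(f"c{i} = {ci:.6f}")
```

Whether a set of coefficients obtained this way gives a stable modulator still has to be established by simulation of the chosen topology, as the text stresses.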


Figure 16: Example of an SDM which has been created by setting each feed-forward coefficient $c_i$ to $0.475\,c_{i-1}$ ($c_1 = 1$). (Plot: averaged power, -250 to 0, versus frequency, 10 to 1e+07, logarithmic.)

7 Signal processing

A crucial point in any audio chain is signal processing, ranging from simple volume adjustments to complex equalizations. It is immediately apparent that a direct translation of the PCM way of signal processing does not exist in DSD. For example, if a DSD signal is volume-adjusted with a gain g = 0.123456, the resulting output (the one-bit signal multiplied with g) is a multi-bit word. Hence, any signal processing for DSD always consists of a cascade of the actual processing step, followed by a re-quantization as shown in Fig. 17.
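The cascade of processing step and re-quantization can be made concrete with a toy example. The sketch below assumes a first-order SDM as re-modulator, which is stable for any input between -1 and 1, so the low pass filter required by realistic higher order re-modulators is not needed here; the DC test signal and the names are illustrative.

```python
def first_order_sdm(samples):
    """Toy first-order Sigma Delta modulator producing a +1/-1 stream."""
    u = 0.0                      # integrator state
    bits = []
    for x in samples:
        y = 1 if u >= 0.0 else -1
        bits.append(y)
        u += x - y               # integrate input minus quantizer feedback
    return bits

N = 20000
GAIN = 0.123456

dsd_in = first_order_sdm([0.5] * N)           # 1-bit stream carrying a DC level of 0.5
intermediate = [GAIN * b for b in dsd_in]     # volume adjustment: no longer 1-bit
dsd_out = first_order_sdm(intermediate)       # re-quantization back to 1 bit

print(sorted(set(intermediate)))              # two multi-bit values, +g and -g
print(sum(dsd_out) / N)                       # average of the output stream
```

The intermediate signal takes the values ±g and thus needs a multi-bit word, while the average of the re-quantized stream approximates g times the original DC level.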

It is possible to contract some signal processing steps and the SDM re-modulator. An example, where an IIR filter is contracted with an SDM, is shown in Fig. 18. It is important to note, however, that such a device is not different from the cascade of signal processing and re-modulation, although the intermediate multi-bit path is absent.

To obtain a realizable system, a low pass filter is generally necessary as indicated in Fig. 19. The reason for this is that the SDM which is used as a re-modulator cannot cope with the high signal levels the DSD presents. As virtually all of the power of these signals is above 100 kHz, a low pass filter operating above this frequency is sufficient to remove enough power such that the re-modulator remains in stable operation. In this respect, the feed-forward and feedback structures have quite different behaviour. As elaborated in Sec. 5, the feed-forward structure has little suppression of the input signal over the whole band (up to Nyquist), and sometimes even a gain just at the corner frequency of the NTF filter characteristic. The feedback structure, on the contrary, has strong suppression of the input


Figure 17: Examples of DSD signal processing: gain adjustment and filter operations. (Block diagrams: DSD input, Gain or IIR block, multi-bit intermediate signal at high rate, DSD output.)

Figure 18: Contraction of IIR filter characteristic and SDM, giving a structure with DSD input and DSD output.

Figure 19: Advisable way of performing two operations on DSD data. First, a gain adjustment is applied, after which an IIR filter operation is applied without leaving the intermediate high rate, multi-bit domain.


Figure 20: Transfer function of a filter which can be used to remove the HF of a DSD signal, such that it can be input to a subsequent SDM. (Plot: magnitude in dB, -140 to 20, versus frequency in Hz, 0 to 350000.)


Figure 21: Schematic presentation of the effect of multiple quantizations. (Left panel: amplitude in dB versus frequency in kHz, with 20 and 100 kHz marked; right panel: signal quality versus number of requantizations.)

signal from the aforementioned corner frequency onwards (see also Fig. 12). Hence, a feed-forward SDM will need more severe filtering of its input signal compared to a feedback SDM in order to maintain stability. The response of a (64 taps) FIR filter which gives sufficient HF suppression to allow subsequent re-quantization is shown in Fig. 20. The total signal transfer characteristic of the cascade of a feed-forward SDM and this filter will be roughly identical to the STF of a feedback SDM. Clearly, the application of such a filter will turn the 1-bit signal directly into a multi-bit signal. It is therefore important to realize that the benefits of DSD are in the high sample rate; they are not in the fact that DSD is 1-bit! The importance of this remark is further emphasized by the following notion: suppose that a sequence of signal processing steps is necessary. If each of these steps is built according to Fig. 17, the total signal path will contain multiple requantizations. As a result of this, build-up of HF noise will occur. This effect is illustrated in Fig. 21, where the effect of multiple requantizations is displayed schematically. This figure can be explained as follows. If we have a DSD signal, its noise starts to rise above 20-30 kHz, and reaches an almost flat level at about 90 kHz. If, in a subsequent re-quantization, the bandwidth of DSD is maintained, the signal is low pass-filtered at a frequency of about the same value (90 kHz). If this signal is fed to a next SDM, its output signal will contain both its own quantization noise and the quantization noise that has been input to it. If this cascade is repeated, it is easy to see why there will be a build-up of HF noise in the area of about 80-90 kHz. Eventually, this signal will be large enough to drive the SDM into its clippers, or, worse, into instability. This effect is shown on the right of Fig. 21; as the number of requantizations increases, the signal quality drops slowly. At the moment that the HF noise is large enough to activate the clippers, the signal quality drops rapidly.

Hence, all signal processing should be done in a multi-bit domain; only after the final signal processing step should the conversion to 64fs 1-bit signals be made.


8 Dithering and linearizing SDMs

SDMs are devices with a quantizer; as we are used to with the quantizers from the PCM world, we need to linearize the devices that use a quantizer. With the multi-bit quantizing PCM devices, it is common knowledge that the quantizers need to be dithered with TPDF dither (dither distributed according to a Triangular shaped Probability Density Function) of full width at half height of 1 LSB [6]. Such dither can easily be obtained by adding 2 random numbers from a uniform distribution of width 1 LSB. For SDM, this recipe is a contradiction in terms, since the quantizer spans only one bit and hence cannot accommodate the aforementioned TPDF dither, which spans 2 bits. Still, dithering in what we will coin the classical sense is a very useful technique and has been well-researched; see [8] and [9] and references therein. Even so, new dither techniques are being discussed which are more appropriate for 1-bit coders; see, e.g., [3]. Next, we will discuss some aspects of dithering in the classical sense.
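The construction of TPDF dither from two uniform random numbers can be sketched in a few lines. The function name and the fixed seed are illustrative choices; the variance printed at the end should be close to 1/6 LSB², the sum of the variances (1/12 each) of the two uniform contributions.

```python
import random

def tpdf_dither(rng, lsb=1.0):
    """Sum of two independent uniform numbers, each of width 1 LSB and zero mean.
    The result has a triangular PDF with a base of 2 LSB (full width at half height 1 LSB)."""
    return (rng.random() - 0.5) * lsb + (rng.random() - 0.5) * lsb

rng = random.Random(0)  # fixed seed, only to make the illustration repeatable
samples = [tpdf_dither(rng) for _ in range(100000)]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(min(samples), max(samples), mean, variance)
```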

As dither is used to remove the effect of non-linearity, we can distinguish two different appearances of the non-linearity: limit cycles⁴ on the one hand, and idle tones and distortion on the other. As the idle tones and distortion are heavily suppressed by the loop-filter, we will ignore them for the moment. In Sec. 9, a more detailed discussion about non-linearity in an SDM is presented. Limit cycles, however, can be very annoying: they can appear in the audible range and, even in the audible range, have high power. Consider, for example, an SDM with the topology at the top of Fig. 7, characterized by the following feed-forward coefficients: c1 = 2048; c2 = 768; c3 = 128; c4 = 16; c5 = 1. Clearly, this SDM is extremely well-suited for implementation in hardware, as the coefficients represent simple powers of 2, except c2, which is the sum of two powers.

Its spectrum for zero input is displayed in Fig. 22; it does not show any resemblance to the familiar noise-shaped curve: it is a limit cycle. A limit cycle is a purely repetitive pattern of a certain length; for example, a repeated sequence (representing zero input, see also Sec. 4.1) of 1, -1, -1, 1, -1, 1, 1, -1 represents a limit cycle of length 8. The limit cycle in Fig. 22 has length 32, as can be read from its fundamental at 88 kHz.

Fortunately, little needs to be done to break up the limit cycle. For example, any input signal exceeding an amplitude of -90 dB will remove the limit cycle completely. To allow for digital silence, though, the use of dither is required, and a very useful way is by applying dither with a rectangular PDF (RPDF dither) just before the quantizer. In the case of the SDM we are discussing here, an appropriate amount of dither has a PDF with a width of 200 (and a mean of 0), and needs to be added immediately before the quantizer. The resulting spectrum is displayed in Fig. 22. Adding the dither at this point has the advantage that the dither will become noise-shaped too (as the quantization error is) and the increase in noise floor will be marginal. In this case, the undithered SDM has a dynamic range of 98.4 dB (full scale SACD), whereas the dithered SDM has a dynamic range of 98.0 dB. The maximum input, before the SDM turns unstable, has been reduced from 0.7104 to 0.7098 for an input of a

⁴ In the literature, these tones are sometimes also called idle tones. We reserve the name idle tones for signals which are not purely repetitive; see also Sec. 9.


Figure 22: Example of a limit cycle occurring in an SDM with zero input (green). In red, the spectrum after application of dither (also zero input) is shown. (Plot: power in dB, -300 to 0, versus frequency in Hz, 100 to 1e+06, logarithmic.)


Figure 23: A noise shaper which is typically used in SACD applications. The spectrum has been coherently averaged 100 times, and this has been repeated 10 times to obtain a power averaged spectrum. (Plot: power in dB, -220 to -20, versus frequency, 100 to 1e+06, logarithmic.)

1 kHz sine wave. Hence, this amount of dither has hardly any drawbacks, and significant advantages.

The distortion introduced by the SDM amounts to -150 dB in the band 0-20 kHz (see Sec. 9 for a more detailed discussion about non-linearity in an SDM). The dither added to the quantizer will hardly change that number, but it is doubtful whether this amount of distortion (in PCM, this would have been below the 25 bit level) would lead to audible effects.
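Adding RPDF dither immediately before the 1-bit quantizer, as described in this section, can be sketched as follows. Only the quantizer is shown; the loop-filter that produces its input is omitted, and the function name and seed are illustrative. The width of 200 is the value quoted for the modulator with coefficients 2048, 768, 128, 16, 1 and would have to be rescaled for other coefficient scalings.

```python
import random

def dithered_one_bit_quantizer(v, rng, width=200.0):
    """Add zero-mean rectangular (RPDF) dither of the given width to the
    quantizer input v, then take the sign."""
    dither = rng.uniform(-width / 2.0, width / 2.0)
    return 1 if v + dither >= 0.0 else -1

rng = random.Random(1)  # fixed seed, only to make the illustration repeatable
for v in (-150.0, 0.0, 50.0, 150.0):
    ones = sum(dithered_one_bit_quantizer(v, rng) == 1 for _ in range(20000))
    print(v, ones / 20000)  # fraction of +1 decisions
```

For quantizer inputs whose magnitude exceeds half the dither width the decision is unchanged; for smaller inputs the decision becomes random, which is what breaks up a purely repetitive pattern.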

9 Non-linearity in a SDM

To present a realistic situation, a spectrum of an SDM that is typically used in SACD applications is presented in Fig. 23. For the purpose of this discussion, this SDM has not been dithered. The input to this SDM has been a 4 kHz sine (-6 dB SACD amplitude). If we are interested in the base-band, extending from 0 to 20 kHz, the relevant distortion products are the 2nd up to the 4th component. From inspection of Fig. 23, it can be concluded that the distortion components are all at most -165 dB, where the noise in the FFT obscures any information deeper than that. The noise floor of this SDM is at -127 dB, resulting in a DR of about 120 dB (recall that the SACD reference 0 dB level has been defined as -6 dB with respect to the level in the feedback path). It is also instructive to extend the region of interest to the band 0-80 kHz. Obviously, the noise floor is increasing


Figure 24: Example of an audio chain found in an SACD-capable player. The DSD is first low pass filtered in the digital domain, followed by up-sampling to m fs, typically 128 or 256 fs. This high-rate signal is then fed to an n-bit SDM, where n typically varies between 1.5 and 5. Finally, the analog output is passed through an analog low pass filter. (Blocks: digital LPF, n-bit SDM, DAC, analogue LPF; signal formats along the chain: DSD at 64fs, multi-bit at m·64fs, n-bit at m·64fs, analogue.)

steeply (in the case presented in Fig. 23, this increase is fifth order), causing the maximum Signal-to-Noise Ratio (SNR) to drop to about 90 dB in the band 0-40 kHz, and about 55 dB in the band 0-80 kHz. Any harmonic distortion component, however, is at a level below -95 dB. Clearly, any harmonic distortion component that we are dealing with in the broader sense of the audio band is extremely small, and its importance for the perceptual audio quality can be doubted. In view of the fact that this SDM has not been dithered, it is clear that dithering will reduce these numbers even further. In fact, if this SDM is dithered to its maximum level (where it is just not overloaded) the distortion components in the audio band are all below -180 dB, only observable after 5000 coherent averages, and the components in the broader audio band are below -110 dB.

Still, the total amount of coherent power that is present in the dithered signal is significant. The amount of coherent power can easily be estimated if the actual noise is assumed to have no correlation with the signal. It appears that the total amount of coherent power which is present in Fig. 23 is about -10 dB. It is obvious that this power is mostly above 1 MHz; 99.99% of the coherent power is found in this high frequency area. The exact value of the frequency above which most of the correlated signal is found depends on the signal which is input to the SDM; it will, however, never be very much lower than the quoted 1 MHz. It is beyond doubt that the origin of these signals in the very high frequency area is in the non-linear behavior of the SDM. Indeed, if a triangular pdf dithered multi-bit quantizer is used in the noise-shaper, the high frequency components disappear. Thus, the coherent signal above 1 MHz can be considered in some sense to be distortion.

To judge whether these distortion components are harmful, we need to look at the full audio chain which is used to replay DSD in a typical SACD-capable player. Such a configuration is shown in Fig. 24. A typical DAC-chip (see e.g. [1] or [7]) contains the first 4 blocks displayed in Fig. 24. The digital filter in the path leading to the n-bit SDM is a crucial part, where most of the HF signal present in the DSD signal can be removed without any compromise. As an example, consider a filter that is designed according to the following criteria: pass-band: 0-100 kHz, flat within 0.01 dB; transition band: 100 kHz - 900 kHz; stop band: 900 kHz - 1.4 MHz, suppression 100 dB. This leads to a filter with only 22 taps, and thus does not pose any additional constraint in terms of hardware; the filters which are necessary to do proper up-sampling from a low sample rate format to the required m·64 fs are much more demanding. Also, the digital LPF does not influence the impulse response of DSD [13], as the transition width is extremely large. It is clear that the application of this filtering will lead to significant suppression of the high frequency components present in the original DSD stream. Still, the signal contains substantial amounts of HF, which is foremost white noise. The signal is then up-sampled to a frequency that is used to perform the digital-to-analog conversion on. The SDM will noise-shape this signal into an n-bit signal, where n typically varies between 3 [1] and 5 [7]. It is this signal which is converted to the analog domain. Due to the noise shaping process, which is intrinsic in modern, high-end DA converters, and is the sole basis for their very high performance, some additional high frequency noise extending to frequency regimes well above 1 MHz is introduced. This noise is usually removed by an analog low pass filter of first or second order. This filtering is most often passive, and can thus be performed with exceptionally low distortion and inter-modulation. In most SACD players, some additional filtering is provided to reduce the amount of HF noise (which by then is mostly due to the DSD signal) even further, to levels well below -30 dB. It is important to remark that the HF signal levels at which these additional filters need to operate are quite low due to the digital pre-filtering (which removed a very substantial amount of HF signal, causing the total signal power to be substantially less than 1); hence, the linearity of the filters can be quite high and the filtering operation is performed without additional inter-modulation products.
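The filter length quoted above can be sanity-checked with Kaiser's empirical order estimate for FIR low-pass filters, M ≈ (A - 7.95) / (2.285 Δω), where A is the stop-band attenuation in dB and Δω the transition width in radians per sample. The sketch below is only a rough check. It assumes fs = 44.1 kHz, so that the DSD rate is 64 fs = 2.8224 MHz; the function name is hypothetical, and the design method actually used for the 22-tap filter is not stated in the text.

```c
#include <assert.h>

/* Kaiser's estimate of the FIR order needed for a given stop-band
 * attenuation (dB) and transition width (Hz) at sample rate fs_hz. */
double kaiser_fir_order(double atten_db, double transition_hz, double fs_hz)
{
    assert(atten_db > 21.0 && transition_hz > 0.0 && fs_hz > 0.0);
    const double pi = 3.14159265358979323846;

    /* Transition width expressed in radians per sample. */
    double delta_omega = 2.0 * pi * transition_hz / fs_hz;

    return (atten_db - 7.95) / (2.285 * delta_omega);
}
```

For the criteria given above (100 dB suppression, an 800 kHz wide transition band, 2.8224 MHz sample rate), `kaiser_fir_order(100.0, 800e3, 2822400.0)` evaluates to roughly 22.6, which is of the same size as the 22 taps mentioned in the text.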

This example of a typical SACD signal path shows that the non-linearity above 1 MHz is not important at all, and does not influence the signal quality. In fact, one can argue that these components are favorable. Because the total power of the SDM output is constant and equals 1, the power which is present in these high frequency tones causes the SNR in the lower frequencies to be higher than anticipated on basis of the linear noise transfer function. Hence, they contribute favorably to the dynamic range of an SDM. This discussion then leads to the question whether it would be possible to linearize an SDM in the important signal band, without bothering about its high frequency behavior.
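The constant-power argument rests on the fact that a 1-bit stream only takes the values +1 and -1, so that its mean-square value is exactly 1 whatever the input. A minimal sketch of this, using a deliberately trivial first-order loop (not one of the modulators discussed in this paper; the type and function names are illustrative):

```c
#include <assert.h>

typedef struct {
    double mean;   /* average of the bit stream; tracks the DC input */
    double power;  /* mean-square value of the bit stream            */
} StreamStats;

/* Run a first-order 1-bit SDM for n clock cycles with DC input x. */
StreamStats first_order_sdm_stats(double x, long n)
{
    assert(n > 0 && x > -1.0 && x < 1.0);
    double s = 0.0;                 /* integrator state */
    double sum = 0.0, sum_sq = 0.0;

    for (long i = 0; i < n; i++) {
        int y = (s >= 0.0) ? 1 : -1;  /* 1-bit quantizer          */
        s += x - y;                   /* integrate the loop error */
        sum += y;
        sum_sq += (double)(y * y);
    }

    StreamStats stats = { sum / (double)n, sum_sq / (double)n };
    return stats;
}
```

For a DC input of 0.1 and 100000 cycles, the reported power is exactly 1 while the mean of the bit stream stays close to 0.1. Any power that ends up in high frequency tones is therefore not available as noise elsewhere in the spectrum.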

9.1 Pre-correction

In order to have a system which demonstrates in a clear way the effects that we will study in this section, a third-order SDM has been designed. Such low order SDMs are notorious for their relatively bad signal properties [9]. The spectrum of the third order SDM that will be used in the sequel of this paper is shown in Fig. 25.

While this third-order SDM has a dynamic range of about 90 dB, its third harmonic is at a level of -104 dB. While this is still a rather respectable number, it is about 60 dB larger than the distortion component of the SDM shown in the previous section. The higher order harmonic distortion products are significant, too. Also in the broader signal band (0-80 kHz) the distortion components are larger. It should be remarked that this type of SDM is not recommended for practical use.

[Figure 25 plot: Power (dB), from -180 to -20, versus Frequency, from 100 to 1e+06 on a logarithmic axis.]

Figure 25: Spectrum of the third order noise-shaper used in the analysis of the pre-correction technique. The input signal is a 3 kHz sine wave, -6 dB SACD. To obtain this spectrum, a series of 4 coherent averages and 10 power averages has been used.

When we model the SDM as a non-linear element f, its transfer characteristic can be written as:

$$f(x) = x + \alpha_2 x^2 + \alpha_3 x^3 + \ldots \tag{19}$$

Now, if we could create a signal s(x) according to:

$$s(x) = x - \alpha_2 x^2 - \alpha_3 x^3 - \ldots \tag{20}$$

then the resulting output signal f(s(x)) would be given by:

$$f(s(x)) = x - 2\alpha_2^2 x^3 + O(x^4) \tag{21}$$

In other words, the second harmonic distortion component has been completely removed, and the third harmonic component has been substantially reduced (note that for the low distortions we are dealing with, $\alpha_i \ll 1$). An estimate of the signal s(x) can be obtained using the structure depicted in Fig. 26.

[Figure 26 block diagram. Blocks: two SDMs, filter F, Delay; three summing nodes; signals x, v, F(v), s(x), y; the pre-correction part is marked SDPC.]

Figure 26: Basic Sigma Delta Pre-Correction (SDPC) structure.

The topology of Fig. 26 operates as follows. The first SDM generates a signal, which is subtracted from the original input signal x. This difference signal v now contains all the distortion components which are generated by the SDM, and the uncorrelated noise which has been added to the signal because of the noise shaping. This signal v is now low-pass filtered in the filter F, which has, for example, a cut-off frequency of 100 kHz. This results in the signal denoted F(v) in Fig. 26. Next, the original input signal x (after the appropriate delay to correct for the delay in the filter F) is added to F(v), resulting in a signal that we write as $\tilde{s}(x)$. While the filtering action has removed all HF noise, and in particular the strong signals above 1 MHz, it has not removed any noise in the band below 100 kHz. Hence, the signal $\tilde{s}(x)$ is only an approximation to the signal s(x) in Eq. (20). The signal $\tilde{s}(x)$ is then input to a next SDM, which is identical to the SDM used to generate v, resulting in the final output signal y.

To gain some insight in the performance of this algorithm, which we will refer to as Sigma Delta pre-correction (SDPC), we have applied it to the third order SDM displayed in Fig. 25. The spectrum of the resulting signal y is displayed in Fig. 27 in the range 0-100 kHz. The huge suppression of the distortion components is clearly visible. Typically, the distortion has been reduced by about 20 dB. For higher frequencies, the suppression becomes less effective, even though the signal $\tilde{s}(x)$ contains all distortion components unattenuated in this frequency regime. As always, there is a price to pay for this improvement in THD, which in this case is an increase in the noise floor by 3 dB. This is clear from inspection of Fig. 27, when one realizes that the corrected spectrum has been obtained using twice as many coherent averages, which lowers the noise floor by 3 dB, and that the noise floor is identical to the noise floor of the uncorrected spectrum. This also corroborates the fact that this is white noise indeed; if it was correlated, it would result in a more than 3 dB increase. The origin of the increase of the noise floor is the fact that the signal $\tilde{s}(x)$ still contains the quantization noise present in the low frequency range; the second SDM in the cascade adds its own quantization noise to it. Though not visible in Fig. 27, the high frequency signals above 1 MHz are completely unchanged using the new topology, which is expected on basis of the absence of correction components in the signal $\tilde{s}(x)$.
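The cancellation expressed by Eq. (21) can be tried out numerically on a memoryless polynomial that stands in for the SDM. The sketch below is an idealization under stated assumptions: there is no quantization noise, the filter F and the delay are replaced by the identity, and the coefficient values and function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical distortion coefficients, chosen only for illustration. */
static const double ALPHA2 = 1e-3;
static const double ALPHA3 = 1e-3;

/* Memoryless stand-in for the SDM: f(x) = x + a2 x^2 + a3 x^3,
 * i.e. the model of Eq. (19) truncated after the cubic term. */
double nonlinear(double x)
{
    return x + ALPHA2 * x * x + ALPHA3 * x * x * x;
}

/* Pre-correction following the signal flow described for Fig. 26,
 * with the low-pass filter F and the delay replaced by the identity. */
double precorrected(double x)
{
    assert(x >= -1.0 && x <= 1.0);
    double v = x - nonlinear(x);  /* error made by a first pass        */
    double s = x + v;             /* pre-corrected input, cf. Eq. (20) */
    return nonlinear(s);          /* second, identical non-linearity   */
}
```

For x = 0.5 the uncorrected error `nonlinear(x) - x` is 3.75e-4, whereas `precorrected(x) - x` is below 1e-6 in magnitude: the terms that are linear in the coefficients cancel, and only products of coefficients remain.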

9.2 SDPC and dither

To appreciate the effect of SDPC, it is also instructive to study the combined action of dither and pre-correction. To that end, we have applied a dither level of 0.1 (the SDM starts overloading at levels of 0.8) to the SDM.

Spectra of the original SDM, and the SDPC spectrum are displayed in Fig. 28. Also in this case, the suppression of the distortion components is at least 22 dB in the band 0-20 kHz; in fact, even after 64 coherent averages, no distortion components can be observed. Note that distortion has decreased to levels below -135 dB! Hence, the combined action of small amounts of dither and the pre-correction technique results in extremely low distortion figures. Again, the reduced distortion suppression for higher frequencies is visible; for example in the region above 40 kHz, the suppression is typically only 8-10 dB.

[Figure 27 plot: Power (dB), from -180 to -20, versus Frequency, from 1000 to 100000 on a logarithmic axis.]

Figure 27: Spectra of the original SDM (green), and its implementation according to Fig. 26 (red). The spectrum of the original SDM has been obtained using 4 coherent averages and 10 power averages; the other using 8 coherent averages and 10 power averages. The fact that the noise floors of the spectra coincide precisely illustrates the 3 dB loss in SNR due to SDPC.

[Figure 28 plot: Power (dB), from -180 to -20, versus Frequency, from 1000 to 100000 on a logarithmic axis.]

Figure 28: Spectra of the original (dithered) SDM (green), and its implementation according to Fig. 26 (red) using the same dither. The spectrum of the original SDM has been obtained using 8 coherent averages and 10 power averages; the other using 64 coherent averages and 5 power averages.

[Figure 29 plot: Phase (rad), from -0.01 to 0.06, versus Frequency, from 100 to 10000 on a logarithmic axis.]

Figure 29: Phase characteristic of the signal transfer function of the third order SDM used in this paper.

While the higher harmonics are suppressed less than the lower harmonics, which is shown by Eq. (21), this does not fully explain the reduced suppression. Another origin of this reduced suppression for higher frequencies lies in the fact that the phase characteristic of the SDM used here is not straight for frequencies above 20 kHz. This results in some phase distortion, which is not accounted for in the pre-correction technique according to Fig. 26. To obtain an estimate of the significance of these errors, consider a single harmonic $h(t) = A \sin(\omega t)$, which is positioned around 50 kHz. The absence of phase correction will cause incomplete cancellation of the harmonic; a residual power of $4A^2\varphi^2$ will remain, where $\varphi$ is the phase error at that frequency. In this case, this results in a maximum power reduction of the harmonic by only 14 dB.
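As a rough numerical check of the 14 dB figure, assume that the residual power is $4A^2\varphi^2$ for a phase error $\varphi$, that it is compared with a harmonic power of $A^2$, and that $\varphi \approx 0.1$ rad near 50 kHz. All three are our reading: neither the normalisation nor the value of the phase error is stated here. Under these assumptions the quoted number follows directly:

```latex
\frac{P_\text{residual}}{P_\text{harmonic}}
  = \frac{4A^2\varphi^2}{A^2} = 4\varphi^2 ,
\qquad
10\log_{10}\!\left(4 \cdot 0.1^2\right)
  = 10\log_{10}(0.04) \approx -14~\text{dB}.
```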

An improved pre-correction technique is therefore displayed in Fig. 30. In this diagram, the phase error introduced by the SDM is corrected for by the filter L. Another improvement can be obtained by cascading the structures displayed in Figs. 26 and 30. In a non-cascaded structure, the cancellation of lower order terms causes the generation of higher order terms, albeit of much lower amplitude, as can be concluded from Eq. (21). These new, higher order terms can in turn be canceled in exactly the same way as the lower order ones were canceled, resulting in cascading the structure in Fig. 30.


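[Figure 30 block diagram. Block labels: SDM (twice), L, G, F, Delay; three summing nodes; signals x, v, s(x), y; the pre-correction part is marked SDPC.]

Figure 30: Improved pre-correction structure. By cascading the Sigma Delta Pre-Correction structure (SDPC) n times, n harmonics can be removed.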

[Figure 31 plot: Power (dB), from -200 to -80, versus Frequency, from 10000 to 80000 on a linear axis.]

Figure 31: Fifth order SDM, with a 3 kHz input of -6 dB. The uncorrected spectrum (green) has been obtained after 16 coherent and 10 power averages; the corrected spectrum (red) after 2048 coherent and 10 power averages.


[Figure 32 plot: Power (dB), from -220 to -60, versus Frequency, from 100 to 100000 on a logarithmic axis.]

Figure 32: Fifth order SDM, with a DC input of 1/1024. The uncorrected spectrum (green) has been obtained after 4 coherent and 10 power averages; the corrected spectrum (red) after 32 coherent and 10 power averages.

9.3 Performance of a realistic SDM with SDPC

To end with a realistic situation, and to show how SDPC also suppresses DC tones, a standard fifth order SDM has been designed, with an SNR of 118 dB over 0-20 kHz. As illustrated in Fig. 31, harmonic distortion levels of this SDM in the phase-corrected SDPC structure are reduced to well below -185 dB if undithered, which amounts to an improvement of about 35 dB, compared to the 20 dB improvement with the standard SDPC. If the SDM is slightly dithered, the distortion levels drop to much deeper levels, which numerically appeared to be inaccessible (i.e., below -220 dB). Also, distortion levels at higher frequencies are reduced more compared to the standard SDPC algorithm. Compared with the uncorrected SDM, the SNR in the base-band (0-20 kHz) is slightly reduced from 118 dB to about 115 dB (no dithering) or 114 dB (with dithering).

The effects of a DC input to the SDPC system are illustrated in Fig. 32. As input to this system, a DC value of 1/1024 has been applied, which results in a tone around 5.5 kHz. The SDM has not been dithered.

In the spectrum of Fig. 32, a tone can be observed with an amplitude of about -145 dB. Application of the pre-correction algorithm, in its basic form, reduces this amplitude to about -165 dB. If a small amount of dithering (RPDF with amplitude 0.05) is applied, which is much less than the maximum allowed amount of dither (0.4 RPDF), the tone cannot be observed after 256 coherent averages, indicating that the tone is at least less than about -175 dB. Also application of the improved SDPC results in values for spurious signals that are not easily accessible numerically.
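The reduction of the base-band SNR from 118 dB to about 115 dB quoted above is what one would expect if, as argued in Sec. 9.1, the second SDM adds quantization noise that is uncorrelated with the noise already present in the pre-corrected signal. The short worked version below additionally assumes that both contributions have equal power $N_1 = N_2$ in the base-band; that equal-power assumption is ours.

```latex
N_\text{total} = N_1 + N_2 = 2N_1
\quad\Rightarrow\quad
\mathrm{SNR}_\text{SDPC}
  = \mathrm{SNR}_\text{SDM} - 10\log_{10} 2
  \approx 118~\text{dB} - 3.0~\text{dB} = 115~\text{dB}.
```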


10 Acknowledgements

The authors want to thank prof. S.P. Lipshitz, prof. J. Vanderkooy, Dr. J.D. Reiss and H. ten Pierick for their valuable comments and proofreading of the manuscript.


A SDM-code

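In this appendix, we provide C code for the SDM discussed in Sec. 5.2. The code simulates 100000 clock cycles of the SDM, with a DC input of 0.1.

#include <stdio.h>

/* Coefficients: weights of the five integrator states in the quantizer input */
static const double c[5] = { 0.791882, 0.304545, 0.069930, 0.009496, 0.000607 };

/* Coefficients of the two local feedback paths (s4 into s3, s2 into s1) */
static const double f[2] = { 0.000496, 0.001789 };

int main(void)
{
    /* Initialization */
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0, s4 = 0;  /* integrator states */
    int y = 1;                                      /* 1-bit output      */
    const long N = 100000;                          /* clock cycles      */
    const double x = 0.1;                           /* DC input          */
    long i;

    /* Main loop */
    for (i = 0; i < N; i++) {
        /* 1-bit quantizer acting on the weighted sum of the states */
        double sum = c[0]*s0 + c[1]*s1 + c[2]*s2 + c[3]*s3 + c[4]*s4;
        if (sum >= 0)
            y = 1;
        else
            y = -1;

        /* State update, last integrator first */
        s4 = s4 + s3;
        s3 = s3 + s2 - f[1]*s4;
        s2 = s2 + s1;
        s1 = s1 + s0 - f[0]*s2;
        s0 = s0 + (x - y);

        printf("%d\n", y);  /* emit the bit stream, one bit per line */
    }
    return 0;
}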
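Several results in Sec. 9 were obtained with a dithered SDM. One common way to add dither to a loop like the one above is to perturb the quantizer input with a uniformly distributed (RPDF) random value. The sketch below assumes that the dither is added directly before the quantizer, and that an "amplitude" a denotes a uniform distribution on [-a, +a]. Both are assumptions, the function names are hypothetical, and `rand()` is used only for brevity.

```c
#include <assert.h>
#include <stdlib.h>

/* One RPDF dither sample, uniformly distributed on [-amp, +amp]. */
double rpdf_dither(double amp)
{
    assert(amp >= 0.0);
    return amp * (2.0 * rand() / (double)RAND_MAX - 1.0);
}

/* 1-bit quantizer with dither added to its input. */
int quantize_dithered(double sum, double amp)
{
    return (sum + rpdf_dither(amp) >= 0.0) ? 1 : -1;
}
```

In the main loop of the appendix code, the comparison `sum >= 0` would then be replaced by a call such as `y = quantize_dithered(sum, 0.05);`. With an amplitude of 0 the quantizer reduces to the undithered one.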


References

[1] B. Adams, K. Nguyen, and K. Sweetland. A 116 dB SNR multi-bit noise shaping DAC with 192 kHz sample rate. In Proceedings of the 106th AES Convention, Munich, 1999. Preprint 4963.

[2] R.W. Adams, P.F. Ferguson, A. Ganesan, S. Vincelette, A. Volpe, and A. Libert. Theory and practical implementation of a fifth order sigma-delta A/D converter. J. Audio Eng. Soc., 39:515-528, 1991.

[3] M.O.J. Hawksford. Time-quantized frequency modulation with time dispersive codes for the generation of sigma-delta modulation. In Proceedings of the 112th AES Convention, Munich, May 10-13, 2002. Preprint 5618.

[4] H. Inose and Y. Yasuda. A unity bit coding method by negative feedback. Proc. IEEE, 51:1524-1535, 1963.

[5] H. Kato. Trellis noise-shaping convertors and 1-bit digital audio. In Proceedings of the 112th AES Convention, Munich, May 10-13, 2002. Preprint 5615.

[6] S.P. Lipshitz, R.A. Wannamaker, and J. Vanderkooy. Quantization and dither: a theoretical survey. J. Audio Eng. Soc., 40:355-375, 1992.

[7] S. Nakao, H. Terasaw, F. Aoyagi, N. Terada, and T. Hamasaki. A 117 dB D-range current-mode multi-bit audio DAC for PCM and DSD audio playback. In Proceedings of the 109th AES Convention, Los Angeles, 2000. Preprint 5190.

[8] S.R. Norsworthy and D.A. Rich. Idle channel tones and dithering in delta-sigma modulators. In Proceedings of the 95th AES Convention, New York, October 1993. Preprint 3711.

[9] S.R. Norsworthy, R. Schreier, and G.C. Temes. Delta-Sigma Converters, Theory, Design and Simulation. IEEE Press, New York, 1997.

[10] Philips and Sony. Super Audio CD System Description. Philips Licensing, Eindhoven, The Netherlands, 2002.

[11] D. Reefman and E. Janssen. Enhanced sigma delta structures for Super Audio CD application. In Proceedings of the 112th AES Convention, Munich, May 10-13, 2002. Preprint 5616.

[12] D. Reefman and P.A.C.M. Nuijten. Editing and switching in 1-bit audio streams. In Proceedings of the 110th AES Convention, Amsterdam, May 12-15, 2001. Preprint 5399.

[13] D. Reefman and P.A.C.M. Nuijten. Why Direct Stream Digital is the best choice as a digital format. In Proceedings of the 110th AES Convention, Amsterdam, May 12-15, 2001. Preprint 5396.

[14] S. Wolfram. The Mathematica Book. Wolfram Media/Cambridge University Press, Cambridge, 4th edition, 1999.
