
Use of local Moran's I and GIS to identify pollution hotspots of Pb in urban soils of Galway, Ireland

Chaosheng Zhang a,⁎, Lin Luo b, Weilin Xu b, Valerie Ledwith a

a Department of Geography, National University of Ireland, Galway, Ireland

b State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, China

ARTICLE INFO

Article history: Received 7 December 2007; received in revised form 6 March 2008; accepted 11 March 2008; available online 28 April 2008

ABSTRACT

Pollution hotspots in urban soils need to be identified for better environmental

management. It is important to know if there are hotspots and if the hotspots are

statistically significant. In this study identification of pollution hotspots was investigated

using Pb concentrations in urban soils of Galway City in Ireland as an example, and the

influencing factors on results of hotspot identification were investigated. The index of local

Moran's I is a useful tool for identifying pollution hotspots of Pb pollution in urban soils, and

for classifying them into spatial clusters and spatial outliers. The results were affected by

the definition of weight function, data transformation and existence of extreme values.

Compared with the results for the positively skewed raw data, the transformed data and

data with extreme values excluded revealed a larger area for the high value spatial clusters

in the city centre. While it is hard to decide the best way of using this index, it is suggested

that all these influencing factors should be considered until reasonable and reliable results

are obtained. GIS mapping can be applied to help evaluate the results via visualization of the

spatial patterns. Meanwhile, selected pollution hotspots (extreme values) in this study were

confirmed by re-analyses and re-sampling.

© 2008 Elsevier B.V. All rights reserved.

Keywords: Hotspot; Spatial cluster; Spatial outlier; Urban soil; Local Moran's I

1. Introduction

Urban geochemistry has received wide attention in recent

years (Zhang, 2006) resulting in databases containing a large

number of soil samples. Due to the spatial heterogeneity of 

urban soils, the challenge is to identify spatial patterns and

hotspots of pollution. Geographical information system (GIS)

mapping and multivariate analyses are useful tools for identifying spatial patterns of pollution and evaluating possible

pollution sources (Zhang, 2006). Furthermore,

hotspots (areas where there are high levels of pollution

in comparison to the surrounding area) need to be identified

in order to provide a scientific basis for better environmental

management.

In urban soil pollution studies, it is important to know: (1) if

there are hotspots and (2) if the hotspots are statistically

significant. GIS mapping techniques can help to identify

hotspots visually, but not statistically. Among a few methods

proposed for hotspot or spatial cluster identification, such as

Getis's G index (Getis and Ord, 1992), spatial scan statistics

(Ishioka et al., 2007) and Tango's C index (Tango, 1995; Zhang

and Lin, 2006), the local Moran's I index seems to be the most

widely used (Anselin, 1995; Getis and Ord, 1996). While the

global Moran's I (Cliff and Ord, 1981; Odland, 1988; Zhang and Selinus, 1998) is a global parameter for the measurement of

spatial autocorrelation, the local Moran's I index examines

the individual locations, enabling hotspots to be identified

based on a comparison with the neighbouring samples. The

local Moran's I index has been successfully applied in hotspot

identification of diseases (Ruiz et al., 2004; Goovaerts and

Jacquez, 2004), mortality rates (James et al., 2004; McLaughlin

and Boscoe, 2007), as well as in environmental planning (Brody

Science of the Total Environment 398 (2008) 212–221

⁎ Corresponding author. Fax: +353 91 495505. E-mail address: [email protected] (C. Zhang).

0048-9697/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.scitotenv.2008.03.011

Available at www.sciencedirect.com

www.elsevier.com/locate/scitotenv

et al., 2006) and environmental sciences (McGrath and Zhang,

2003; Zhang and McGrath, 2004).

This study identifies hotspots of lead (Pb) pollution in the

urban soil of Galway City (Ireland) using the local Moran's I index

and GIS. In addition, the influences of weight function, data

transformation, and extreme values on the results of hotspot

identification using local Moran's I index are investigated.

2. Materials and methods

2.1. Study area

Galway is located on the western coast of Ireland. The study

area extends 9 km E–W and 6 km N–S in Galway City (Zhang,

2006). It was chosen based on the 2nd edition of the Galway

Street Map available from Ordnance Survey Ireland (OSI),

excluding 1 km in the west and 1.5 km in the east that are

mainly rural areas.¹ The bedrock types are limestone in the

east and granite in the west (Zhang, 2006). The natural soil

type in the study area is mainly grey brown podzols. In the

granite area, there are small areas of lithosol with poorly

developed soils.

The built-up areas are mainly located in the city centre,

extending about 3 km to the west, 3 km to the east, and 3 km to

the north. The urban land uses are primarily for residential

and commercial purposes. There are several new industrial

estates in the eastern part of the city, but these are mainly

high-technology industries with little discharge of traditional

pollutants. The main pollution sources in the study area are

traffic and the burning of peat and coal for home heating 

(Zhang, 2006). There is also evidence of dumping in parts of 

the city that are now used for other purposes. For example,

Carr et al. (2008) found evidence of rubbish in an area currently

used as a sports ground.

2.2. Sampling and chemical analyses

A total of 166 surface soil samples (0–10 cm depth) were taken

from parks and grasslands in Galway city during Nov. 1–Dec.

16, 2004 (Zhang, 2006). About 1 kg of each soil sample was

collected using a stainless steel spade and a plastic scoop,

and fresh samples taken from a single hole with an area of 

about 30×30 cm were placed in clean bags. The coordinates of 

sampling locations were recorded using a differential Global

Positioning System (GPS) receiver. The sampling density was 1

sample per 0.25 km² based on the grid system of the Galway Street

Map from OSI using a stratified random sampling scheme. No

samples were taken at grids where access was hard to obtain.

Raw samples were sent to OMAC Laboratories in Loughrea,

County Galway, and dried at 40 °C. The <2 mm fraction of all

samples was retained, and half of the sample splits were

milled to pass through a 0.1 mm pore size sieve. The milled

samples of 0.2 g were digested to dryness using a 4-acid digestion

with 10 ml HF, 5 ml HClO4, 2.5 ml HCl, and 2.5 ml HNO3,

then dissolved in 20% aqua regia and made up to 10 ml for ICP-

AES analysis for a total of 26 chemical elements. Certified

reference samples were used for quality control, and the errors

were generally below 5%.

It was found that Pb was one of the main pollutants in

Galway urban soils (Zhang, 2006). In this study, the chemical

element of Pb is chosen as an example for detailed investiga-

tion of the topic of pollution hotspot analyses.

2.3. Spatial cluster and spatial outlier analyses

Pollution hotspots can be clustered (spatial clusters) or exist

individually (spatial outliers). In this study, spatial clusters of 

pollution would be soil samples with a high Pb concentration

surrounded by other samples with a high concentration. In

contrast, spatial outliers of pollution would be samples with a

high Pb concentration surrounded by samples with normal or 

low values. They can be identified using the local Moran's I

index (Anselin, 1995; Getis and Ord, 1996; Levine, 2004):

$$
I_i = \frac{z_i - \bar{z}}{\sigma^2} \sum_{j=1,\, j \neq i}^{n} \left[ w_{ij} \left( z_j - \bar{z} \right) \right] \qquad (1)
$$

where $z_i$ is the value of the variable $z$ at location $i$; $\bar{z}$ is the

average value of $z$ over the $n$ samples; $z_j$ is the value

of the variable $z$ at all the other locations (where $j \neq i$); $\sigma^2$ is the

variance of variable $z$; and $w_{ij}$ is a weight which can be defined

as the inverse of the distance $d_{ij}$ between locations $i$ and $j$. The

weight wij can also be determined using a distance band:

samples within a distance band are given the same weight,

while those outside the distance band are given the weight

of 0.
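As an illustration of Eq. (1) with a distance-band weight, the following is a minimal sketch in plain Python. It is not the GeoDa implementation: the function name, the toy coordinates and the toy Pb values are hypothetical, the variance is taken as the population variance, and the weights are not row-standardised (both are assumptions that may differ between software packages).

```python
import math

def local_morans_i(coords, values, band):
    """Local Moran's I (Eq. 1) with binary distance-band weights."""
    n = len(values)
    mean = sum(values) / n
    # Population variance (assumption; some packages divide by n - 1).
    var = sum((v - mean) ** 2 for v in values) / n
    result = []
    for i in range(n):
        lag = 0.0
        for j in range(n):
            if j == i:
                continue
            # Weight is 1 inside the distance band and 0 outside it.
            if math.dist(coords[i], coords[j]) <= band:
                lag += values[j] - mean
        result.append((values[i] - mean) / var * lag)
    return result

# Hypothetical toy data: three high values close together, one isolated low value.
coords = [(0, 0), (500, 0), (0, 500), (5000, 5000)]
pb = [300.0, 280.0, 310.0, 40.0]
indices = local_morans_i(coords, pb, band=1000)
```

In this toy case the three neighbouring high values receive positive indices, while the isolated sample has no neighbours within the band and therefore an index of zero.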

A high positive local Moran's I value implies that the loca-

tion under study has similarly high or low values as its neigh-

bours, thus the locations are spatial clusters. Spatial clusters

include high–high clusters (high values in a high value neigh-

bourhood) and low–low clusters (low values in a low value

neighbourhood) (Fig. 1). In soil pollution, low–low clusters are

“cool spots”, while high–high spatial clusters can be regarded

as “regional hotspots”.

A high negative local Moran's I value means that the loca-

tion under study is a spatial outlier. Spatial outliers are those

values that are obviously different from the values of their 

surrounding locations (Lalor and Zhang, 2001). Spatial outliers

include high–low (a high value in a low value neighbourhood)

and low–high (a low value in a high value neighbourhood)

outliers (Fig. 1). In soil pollution, high–low spatial outliers can

be regarded as isolated “individual hotspots”.
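The four categories described above can be sketched as a simple comparison of a sample and its neighbourhood against the overall mean. This is an illustrative helper only: the function name and arguments are hypothetical, values exactly equal to the mean are treated as "low" by assumption, and a label is only meaningful once the corresponding local Moran's I has been found significant.

```python
def classify(z_i, neighbour_mean, overall_mean):
    """Label a location by comparing it and its neighbours with the overall mean."""
    high_self = z_i > overall_mean
    high_neighbours = neighbour_mean > overall_mean
    if high_self and high_neighbours:
        return "high-high"  # spatial cluster, "regional hotspot"
    if not high_self and not high_neighbours:
        return "low-low"    # spatial cluster, "cool spot"
    if high_self:
        return "high-low"   # spatial outlier, "individual hotspot"
    return "low-high"       # spatial outlier

# Hypothetical values in mg/kg.
labels = [
    classify(300, 250, 100),
    classify(40, 50, 100),
    classify(300, 40, 100),
    classify(40, 250, 100),
]
```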

Local Moran's I can be standardised so that its significance

level can be tested based on an assumption of a normal dis-

tribution (Anselin, 1995; Levine, 2004). However, since the

probability distribution of local Moran's I may not necessarily

be normal, especially when the raw data are heavily skewed, a

method called “conditional permutation” (Anselin, 1995) is

preferred as it makes no assumption about the data. Under a

conditional permutation, when the value at a location is being

assessed, its value is fixed and all the other values are shuffled

randomly on all the other locations. Each time when the other 

values are shuffled, the local Moran's I index is calculated to

form a reference distribution. The significance level can be

¹ In all maps in this study, the 500×500 m grid system of the Galway Street Map is adopted for easy reference of geographical locations, e.g., “L11” represents the location of the Lth row and 11th column.

estimated by comparing the observed index with these simu-

lated values, and it is called “pseudo significance” (Anselin,

2005). The pseudo significance is computed as (M+1)/(R +1)

where R is the number of permutations and M is the number 

of instances where a statistic computed from the permutations

is equal to or greater than the observed value (for positive

local Moran's I index) or less than or equal to the observed value

(for negative local Moran's I index) (Spatial Analysis Labora-

tory, 2007). For example, if the observed index is higher than

all the values of the reference distribution with 9999 permuta-

tions, the pseudo p value is (0+1)/(9999+1)=0.0001. In this

study, all the Local Moran's I indices were tested using 9999

permutations, and the significance level was chosen at p < 0.05.
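The conditional permutation procedure and the pseudo significance (M+1)/(R+1) can be sketched as follows. This is a minimal sketch under stated assumptions, not the GeoDa code: the function names, the fixed list of neighbour indices, the seed and the toy data are all hypothetical, and binary weights are assumed.

```python
import random

def local_i(i, values, neighbours):
    """Local Moran's I at location i with binary weights on the given neighbours."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    lag = sum(values[j] - mean for j in neighbours)
    return (values[i] - mean) / var * lag

def pseudo_p(i, values, neighbours, permutations=9999, seed=0):
    """Pseudo significance (M + 1) / (R + 1) under conditional permutation."""
    rng = random.Random(seed)
    observed = local_i(i, values, neighbours)
    others = [v for k, v in enumerate(values) if k != i]
    m = 0
    for _ in range(permutations):
        rng.shuffle(others)
        # Keep the value at location i fixed and shuffle all other values.
        shuffled = others[:i] + [values[i]] + others[i:]
        simulated = local_i(i, shuffled, neighbours)
        if observed >= 0 and simulated >= observed:
            m += 1
        elif observed < 0 and simulated <= observed:
            m += 1
    return (m + 1) / (permutations + 1)

# Hypothetical data: three high values at positions 0 to 2, low values elsewhere.
values = [300, 280, 310] + [40] * 17
p_value = pseudo_p(0, values, neighbours=[1, 2], permutations=999)
```

With R permutations the smallest attainable pseudo p value is 1/(R+1), which is why 9999 permutations allow values down to 0.0001.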

The univariate Local Moran's I index was used for pollu-

tion hotspot identification in this study. When calculating the

index, distance bands were used to determine the weight

matrix (Anselin, 1995). It should be mentioned that besides the

method of distance band, an alternative way of calculating the

weight function is to use an inverse distance weighted function.

This function is not yet available in the current version of

GeoDa, but the developers are considering it for the next version

(Luc Anselin, personal communication). Meanwhile, the de-

veloper of another software package called CrimeStat (Levine,

2004) is considering providing the permutation function together

with the inverse distance weights (Ned Levine, personal

communication). Note that even

if the inverse distance weight function is provided, the

choice of the power parameter remains an important

decision for the calculation of the weight function.

2.4. Data transformation

To evaluate the effects of data transformation on the results

of hotspot identification, two data transformation methods

were considered: Box–Cox transformation and normal score

transformation.

The Box–Cox transformation is a power transformation

and is one of the most frequently used methods (Box and Cox,

1962; Jobson, 1991; Zhang and Zhang, 1996; Zhang et al., 1998).

The Box–Cox transformation is given by:

$$
y = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln(x), & \lambda = 0 \end{cases} \qquad (2)
$$

where y is the transformed value, and x is the value to be transformed. For a given data set (x1, x2, …, xn), the parameter

λ is estimated based on the assumption that the transformed

values ( y1, y2, …, yn) are normally distributed. When λ=0, the

transformation becomes the logarithmic transformation.
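A minimal sketch of Eq. (2) is given below, together with one possible way of estimating λ: a grid search over the normal profile log-likelihood. The estimation approach, the grid from -2 to 2, the function names and the small sample (the Galway City percentiles from Table 1, used purely as toy input) are assumptions for illustration; the paper does not state how λ was computed.

```python
import math

def box_cox(x, lam):
    """Eq. (2): power transform for lam != 0, natural log for lam == 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam

def profile_log_likelihood(data, lam):
    """Normal log-likelihood of the transformed data, up to a constant."""
    n = len(data)
    y = [box_cox(x, lam) for x in data]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    return -0.5 * n * math.log(var) + (lam - 1) * sum(math.log(x) for x in data)

def estimate_lambda(data, grid=None):
    """Grid search for the lambda that maximises the profile log-likelihood."""
    grid = grid or [k / 100 for k in range(-200, 201)]
    return max(grid, key=lambda lam: profile_log_likelihood(data, lam))

# Toy input: Galway City percentiles from Table 1 (mg/kg), not the full data set.
pb_sample = [25, 30, 35, 42, 58, 86, 132, 187, 543]
lam_hat = estimate_lambda(pb_sample)
transformed = [box_cox(x, lam_hat) for x in pb_sample]
```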

For the normal score transformation, the raw data were

ranked in ascending order. The expected normal values

(normal scores) were calculated by taking the z-scores of 

cumulative probability, (i−0.375)/(n+0.25), where i is the rank

in increasing order, and n is the number of samples (Blom,

1958).
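The normal score transformation described above can be sketched with the Python standard library. The function name is hypothetical, and tied values are simply ranked in input order, which is an assumption; the source does not say how ties were handled.

```python
from statistics import NormalDist

def normal_scores(data):
    """Blom normal scores: z-score of (rank - 0.375) / (n + 0.25)."""
    n = len(data)
    # Positions sorted by value; ties keep their input order (an assumption).
    order = sorted(range(n), key=lambda k: data[k])
    scores = [0.0] * n
    for rank, k in enumerate(order, start=1):
        p = (rank - 0.375) / (n + 0.25)
        scores[k] = NormalDist().inv_cdf(p)
    return scores

# Hypothetical Pb values in mg/kg.
scores = normal_scores([543, 25, 58])
```

For three samples the middle rank maps to a cumulative probability of 0.5 and hence a score of zero, with the other two scores symmetric around it.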

2.5. Data analyses using computer software

The calculation of spatial clusters and spatial outliers was

performed using the software GeoDa (version 0.95i, Spatial

Analysis Laboratory, 2007). All maps were produced using 

ArcView® (version 3.3) and ArcGIS® (version 9.2) software. The

basic GIS data were acquired from Ordnance Survey Ireland.

3. Results and discussion

3.1. Pb concentration in Galway

Table 1 provides information on Pb concentrations for Galway

City soils and their comparison with soils in Ireland. The median

value of Pb concentrations in the surface soils of Galway City

was 58 mg/kg (Zhang, 2006), which was much higher than the

median value of 24.8 mg/kg in either the soils of Ireland or the

Table 1 – Comparison between Pb concentrations in soils of Galway City and Ireland (in mg/kg)

                              N     Min.  5%    10%   25%   Median  75%   90%    95%    Max.
Galway City soils (a)         166   25    30    35    42    58      86    132    187    543
Soils of Ireland (b)          1310  1.1   11.7  13.6  18.2  24.8    33.5  48.0   61.9   2634.7
Mineral soils of Ireland (b)  977   4.8   12.4  14.3  18.8  24.8    33.3  47.8   61.0   550.9

(a) Zhang (2006); (b) Fay et al. (2007).

Fig. 1 – Sketch figure showing the relationship of a location and its neighbourhood: a) and d) spatial cluster; b) and c) spatial outlier; a) and b) hot spots; c) and d) cool spots.

mineral soils of Ireland (Fay et al., 2007). In fact, it is worth

noting that the minimum value in Galway City (25 mg/kg) is

equivalent to the median value of Ireland. This trend is ap-

parent in all other percentiles of Pb concentrations in Galway

City except in the maximum value category. However, care

should be taken in interpreting this result since it is likely the

maximum value of the country was caused by mineralisation

(Fay et al., 2007).

The spatial distribution of Pb in the soils of Galway City

(Fig. 2) shows the influence of traffic pollution (Zhang, 2006)

and historical rubbish dumping (Carr et al., 2008). In the city

centre, there is a high value pattern of Pb in soils and there are

some high values along the traffic route of N59 left of River 

Corrib. There are also some relatively high values scattered

throughout the study area showing the complexity or spatial

heterogeneity of urban soils.

3.2. Effects of weight function on hotspot identification

To investigate the effects of different distance bands on the

results, three distance bands were considered in this study:

1000 m, 2000 m and 5000 m. While calculating the index for 

each location, the weights for neighbouring locations were

assigned 1 if the distances were within the band, otherwise

the weights were 0. It should be noted that the choice of the

distance bands in this study was arbitrary as there is no spe-

cific criterion to determine the optimal distance band yet.

Generally speaking, they should not be shorter than the sam-

pling interval (e.g., about 500 m in this study), and not longer 

than half of the maximum distance between all the sample

pairs. The results are shown in Fig. 3.
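The rule of thumb for choosing a distance band can be expressed as a small check. The sketch below is illustrative only: the function name is hypothetical, and the four corner points of a 9 km by 6 km rectangle stand in for the real sampling locations, which are not listed here.

```python
import math
from itertools import combinations

def band_is_reasonable(coords, band, sampling_interval):
    """True if the band is no shorter than the sampling interval and no
    longer than half of the maximum distance between sample pairs."""
    max_dist = max(math.dist(a, b) for a, b in combinations(coords, 2))
    return sampling_interval <= band <= max_dist / 2

# Hypothetical stand-in for the sampling locations (metres).
corners = [(0, 0), (9000, 0), (0, 6000), (9000, 6000)]
checks = {band: band_is_reasonable(corners, band, 500) for band in (300, 1000, 2000, 5000, 6000)}
```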

It was found that results from the three distance bands

were different. When the distance band was short (1000 m,

Fig. 3a), the majority of samples were not significant. There

were only 9 high–high values clustered in the city centre. Only

one high–low outlier (Sample No. F19) was identified. There

were several low–high outliers in or near the city centre.

When the distance band increased to 2000 m (Fig. 3b), the

number of high–high values increased to 18, and there were

also some low–high spatial outliers in or near the city centre.

The number of high–low spatial outliers increased to 6 and

were located in the east part of the study area. In addition, a

low–low spatial cluster was observed in the eastern part of the

city.

Most of the samples in the western part of the city be-

came significant when the distance band increased to 5000 m

(Fig. 3c), while most samples in the east became insignificant.

High–high spatial clusters were located in the city centre

as well as in the west, together with many low–high spatial

outliers.

Despite the variations that emerged with different distance

bands, a pollution hotspot is visible in the city centre, where

soil samples with high concentrations of Pb are surrounded by

samples with similarly high Pb concentrations. Taking the

results of the distance band analysis into consideration with

the spatial distribution of Pb in Galway (Fig. 2), it seems that

the results from distance band of 2000 m are most reason-

able among the three chosen bands. Using this distance band

highlights the spatial clustering in the city centre, but is

inclusive enough to allow other general patterns to emerge.

For example, the east of the city is dominated by a low value

spatial cluster. However, there are some high–low spatial

outliers in the area. In the west, the values are moderate, with

a mixture of relatively high and low values, resulting in relative

randomness in this area. While it is still hard to establish a

causal relationship between the pollution hotspots and the

Fig. 2 – Spatial distribution and point values of Pb concentrations in soils of Galway City.

influencing factors based on the spatial distribution patterns,

the most likely factor is pollution from traffic in the city centre

(Zhang, 2006).

3.3. Effects of data transformation on hotspot identification

Since calculation of local Moran's I index involves mean values

and variance, which are strongly affected by positive skewness of

data (with some very high values), it is necessary to consider data

transformation. Commonly used and effective transformation

methods are Box–Cox transformation and normal score trans-

formation. In this study, both transformations were applied to

Pb concentrations in Galway soils and local Moran's I indices

were calculated using a distance band of 2000 m (Figs. 4 and 5).

Results for the data transformed by the Box–Cox method

and normal score method were almost the same except for the

Fig. 3 – Spatial distribution map of significant hotspots and cool spots for raw data calculated using different distance bands: a) 1000 m; b) 2000 m; and c) 5000 m.

samples H12 and A3. The sample A3 was a low–low cluster 

value for the Box–Cox transformed data while not significant

for the normal score transformed data. The city centre and its

adjacent west part formed a clear high–high spatial pattern.

The east part was mainly a low–low cluster area with several

samples of high–low spatial outliers. Compared with the

results for raw data without a transformation, the high–high

spatial cluster covered a much larger area. For a positively

skewed data set like Pb concentrations in Galway soils, the

Box–Cox transformation and normal score transformation

Fig. 4 – Spatial distribution map of significant hotspots and cool spots for Box–Cox transformed data calculated using a distance band of 2000 m.

yielded a lower mean value for the transformed data, and

resulted in relatively more values higher than the mean and

thus more high–high values of spatial clusters.

3.4. Effects of extreme values on hotspot identification

Extreme values were defined based on a Box-and-Whiskers plot:

those which were higher than the 75th percentile plus 3 times

the inter-quartile range (difference between 75th and 25th

percentiles), which was 218 mg/kg (calculated based on results

of Table 1). A total of 7 samples (No. D10, D15, F11, H9, H12, H13,

L11) were identified as extreme values. To investigate the in-

fluence of extreme values on hotspot identification, they were

excluded from the data set for the calculation of Local Moran's I

indices using a distance band of 2000 m (Fig. 6). Note that

this exercise assumes that the extreme values

are obvious hotspots which do not need a statistical test to

justify, and the objective is to check if there are other hotspots

when the influence of these extreme values is removed.
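As a check on the threshold quoted above, using the 25th and 75th percentiles for Galway City soils listed in Table 1 (42 and 86 mg/kg, written here as $Q_1$ and $Q_3$, a notation not used in the original):

$$
Q_3 + 3\,(Q_3 - Q_1) = 86 + 3\,(86 - 42) = 86 + 132 = 218\ \text{mg/kg}
$$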

Compared with the results when the extreme values were

included (Fig. 3b), a larger area of high–high spatial cluster was

found in the central-west part of the study area, while several

high–low spatial outliers were observed in the east part. The

removal of extreme (high) values in the data set reduced the

mean value, thus the number of values higher than the mean

increased, resulting in a larger area of high–high spatial cluster.

3.5. Comparison among results from different data

treatments

For comparison of the results from the above different ways of

data treatments, the number of significant and non-signifi-

cant spatial outliers and spatial clusters were summarised in

Table 2.

For the raw data, the number of spatial outliers and spatial

clusters increased as the distance band used for calculation

of the local Moran's I index increased, especially the high–high

spatial clusters and low–high spatial outliers. Taking the

spatial distribution and value point map of Pb concentrations

(Fig. 2) into consideration, it seems that too few high–high

spatial clusters were identified when the distance band was

1000 m, and too many high–high spatial clusters and low–high

spatial outliers were identified when the distance band was

5000 m. While it was hard to decide the best distance band to

use, this provides further justification for applying a 2000 m

distance band among the three distance bands tested in this

study.

The Box–Cox transformed data and normal score trans-

formed data showed almost the same results, implying that

more reliable results could be obtained via proper data trans-

formation. When the extreme values were removed, the re-

sults for the raw data were similar to those of the transformed

data.

The results of using local Moran's I for hotspot identifica-

tion were affected by all the factors considered in this study:

weight function, data transformation and extreme values.

While it is hard to judge the best way of using Local Moran's I,

it is suggested that all the above factors should be considered

until reasonable and stable results are achieved.

While there are some limitations associated with using this

methodology for examining urban soil pollution, the benefits

far outweigh them if care is taken in the analysis of the data.

By taking the effects of data distribution, transformations

and weighting into consideration this research provides a

Fig. 5 – Spatial distribution map of significant hotspots and cool spots for normal score transformed data calculated using a distance band of 2000 m.

framework for future research and highlights the need to

incorporate these issues to ensure reasonable and reliable

results.

3.6. Comparison between original and re-sampled values

  for extreme values

Most of the above defined extreme values were identified as

either significant spatial clusters or significant high–low spatial

outliers (p < 0.05, Figs. 3–6), thus they should be regarded as

hotspots. To confirm these hotspots, the samples were sent

back to the laboratory for re-analysis. It was found that the re-

analysed values were in line with the original values (Table 3).

The closest results were the values for Sample No. D15 with

the original and re-analysed values of 305 mg/kg and 302 mg/

kg, respectively. The largest difference was found for Sample

No. L11 and thus the re-analysis for this sample was carried

out twice. Even for the worst case, the difference was only 5.7%

(calculated as (543−512)/543), demonstrating the high quality

control of laboratory analyses and confirming the original

results.

Further sampling was carried out for these confirmed

hotspots with 5 samples collected within 2–3 m of each of the

original sampling locations in May, 2005. In soil geochemical

studies, samples taken from 2–3 m apart can generally be

regarded as having been taken from the same site. Results for 

the re-collected samples are shown in Table 3.

Taking the median value of 58 mg/kg in the study area

into consideration, all the re-collected samples were obviously

higher than the median. The sample D15 showed very consistent

results, implying relatively weak spatial variability of Pb

concentrations in that site. The sampling site of D15 is a park

in a residential area. It is close to a traffic route, but it was

found that soils under the grasses were consistently dark and

it was suspected that the imported soils were slightly con-

taminated by Pb which needs further investigation. The rela-

tively even spread of imported soils may be the main reason

causing its relatively weak spatial variability. Results for the

Table 2 – Comparison of numbers of significant and non-significant spatial outliers and spatial clusters (p = 0.05)

Data treatment⁎                      Not significant  High–high  Low–low  Low–high  High–low  Total
Raw data, d = 1000 m                 129              9          20       7         1         166
Raw data, d = 2000 m                 92               18         31       19        6         166
Raw data, d = 5000 m                 65               34         3        63        1         166
Box–Cox transformed, d = 2000 m      70               45         33       11        7         166
NScore transformed, d = 2000 m       72               44         32       11        7         166
Extreme values excluded, d = 2000 m  66               28         38       21        6         159

⁎ “d” is the “distance band” used for calculation of Local Moran's I in software GeoDa.

Fig. 6 –Spatial distribution map of significant hotspots and cool spots for raw data with the removal of extreme values and

calculated using a distance band of 2000 m.

219S C I E N C E O F T H E T O T A L E N V I R O N M E N T 3 9 8 ( 2 0 0 8 ) 2 1 2 – 2 2 1

Page 9: Zhang et al 2008

8/4/2019 Zhang et al 2008

http://slidepdf.com/reader/full/zhang-et-al-2008 9/10

other samples were fairly variable, e.g., the values for D10

varied from 178 mg/kg to 611 mg/kg. The re-colleted samples

for sites F11and H12showed relatively lower valuesthan their 

original values, but they also provided evidence that these

sites were at least slightly contaminated compared with themedian values. Two samples of Sample No. L11 even demon-

strated Pb values higher than 1000 mg/kg. Such results led to

more intensive work on that site and the discovery of a

seriously contaminated historic landfill site in Galway City by

the research group (Carr et al., 2008).
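The statements about the re-collected samples can be checked directly against Table 3. The sketch below is illustrative only: the values are transcribed from Table 3, and the coefficient of variation is an assumed measure of within-site variability, not one reported by the authors.

```python
import statistics

MEDIAN_PB = 58  # median Pb in the study area (mg/kg), as stated in the text

# Five re-collected Pb values per site (mg/kg), transcribed from Table 3.
recollected = {
    "D10": [214, 611, 226, 371, 178],
    "D15": [295, 295, 303, 310, 299],
    "F11": [159, 120, 144, 176, 213],
    "H9": [269, 448, 566, 305, 255],
    "H12": [246, 138, 172, 109, 80],
    "H13": [430, 537, 439, 611, 371],
    "L11": [537, 569, 542, 2768, 1001],
}


def summarise(values):
    """Range and coefficient of variation (assumed variability measure) for one site."""
    mean = statistics.mean(values)
    cv_percent = statistics.stdev(values) / mean * 100.0
    return {"min": min(values), "max": max(values), "cv_percent": cv_percent}


summary = {site: summarise(values) for site, values in recollected.items()}

# Every re-collected sample should exceed the study-area median.
all_above_median = all(min(values) > MEDIAN_PB for values in recollected.values())

# Number of L11 samples above 1000 mg/kg.
l11_above_1000 = sum(value > 1000 for value in recollected["L11"])

for site, stats in summary.items():
    print(f"{site}: {stats['min']}-{stats['max']} mg/kg, CV = {stats['cv_percent']:.0f}%")
print("All above median:", all_above_median)
```

Under this measure D15 has by far the lowest variability and L11 the highest, consistent with the site descriptions above.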

4. Conclusions

This paper highlights the value of using Local Moran's I for the identification of pollution hotspots in urban soils. However, the results associated with the examination of Pb pollution in the soils of Galway City illustrate some important issues that need to be taken into consideration when applying this method. Specifically, the results were affected by the existence of extreme values for Pb concentrations in the soil, as well as by data transformations. Compared with the results for the positively skewed raw data, the transformed data and the data with extreme values excluded revealed a larger area for the high value spatial clusters in the city centre (at d = 2000 m, 18 significant high–high samples for the raw data compared with 45 after Box–Cox transformation and 28 with extreme values excluded; Table 2).

This paper also highlights the influence of different distance bands on the distribution of significant spatial clusters and outliers. Using GIS mapping to evaluate the results of the analysis in the context of Pb distribution in Galway was an important aid in determining which distance band to use. Moreover, the pollution hotspots were confirmed by re-analysis and further sampling in the areas identified.
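To make the role of the distance band concrete, the following is a minimal sketch of a local Moran's I calculation. It assumes the standard form of Anselin (1995), I_i = (z_i / m2) Σ_j w_ij z_j, with row-standardised binary distance-band weights; the grid, the Pb values and all function names are hypothetical and are not the Galway data or the GeoDa implementation. Significance testing, which GeoDa performs by permutation, is omitted.

```python
import numpy as np


def distance_band_weights(coords, d):
    """Row-standardised binary weights: j is a neighbour of i when 0 < dist(i, j) <= d."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    w = ((dist > 0) & (dist <= d)).astype(float)
    row_sums = w.sum(axis=1, keepdims=True)
    # Points with no neighbour inside the band keep an all-zero row.
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)


def local_morans_i(values, w):
    """Return I_i, the deviations z_i and the spatial lag sum_j w_ij z_j."""
    z = values - values.mean()
    m2 = (z ** 2).mean()
    lag = w @ z
    return z * lag / m2, z, lag


def classify(z, lag):
    """Label points by the sign of their own deviation and of their neighbours' mean deviation."""
    return np.where(z >= 0,
                    np.where(lag >= 0, "high-high", "high-low"),
                    np.where(lag >= 0, "low-high", "low-low"))


# Hypothetical 5 x 5 sampling grid with 1000 m spacing (not the Galway data).
xs, ys = np.meshgrid(np.arange(5) * 1000.0, np.arange(5) * 1000.0)
coords = np.column_stack([xs.ravel(), ys.ravel()])
pb = np.full(25, 50.0)
pb[[0, 1, 5, 6]] = 300.0  # a 2 x 2 cluster of high values
pb[24] = 500.0            # one isolated high value

for d in (1000.0, 2000.0):
    w = distance_band_weights(coords, d)
    local_i, z, lag = local_morans_i(pb, w)
    labels = classify(z, lag)
    print(f"d = {d:.0f} m: cluster point -> {labels[0]}, isolated point -> {labels[24]}")
```

In this toy example the clustered high values have positive I_i (a high–high spatial cluster), while the isolated high value has negative I_i (a high–low spatial outlier). Changing `d` changes which neighbours enter the spatial lag, which is how the distance band influences the classification.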

Acknowledgements

This study was partly funded by the Irish Environmental Protection Agency (EPA) and Teagasc under the National Development Plan 2000–2006 (Project No. 2001-CD/S2-M2), and by the “985” Programme of China, which provided funds to the Research Institute for Southwestern China's Resources and Environments of Sichuan University. Helpful discussions with Prof. Luc Anselin and Dr. Ned Levine are acknowledged. The authors are grateful to the two reviewers whose comments and suggestions have improved the quality of this paper.

References

Anselin L. Local indicators of spatial association — LISA. Geogr Anal 1995;27:93–115.

Anselin L. Exploring spatial data with GeoDa™: a workbook. Spatial Analysis Laboratory, Department of Geography. Urbana-Champaign, Urbana, IL: University of Illinois; 2005. p. 61801. 226.

Blom G. Statistical Estimates and Transformed Beta Variables. New York: John Wiley; 1958. 176 pp.

Box GEP, Cox DR. An analysis of transformations. J R Stat Soc Ser B 1962;26(2):211–52.

Brody SD, Highfield WE, Thornton S. Planning at the urban fringe: an examination of the factors influencing nonconforming development patterns in southern Florida. Environ Plann B Plann Des 2006;33:75–96.

Carr R, Zhang CS, Moles N, Harder M. Identification and mapping of heavy metal pollution in soils of a sports ground in Galway City, Ireland, using a portable XRF analyser and GIS. Environ Geochem Health 2008;30(1):45–52.

Cliff AD, Ord JK. Spatial Processes, Models and Applications. London: Pion; 1981. 266 pp.

Fay D, McGrath D, Zhang C, Carrigg C, O'Flaherty V, Carton OT, et al. EPA Report: Toward a National Soil Database (2001-CD/S2-M2); 2007. http://www.epa.ie/downloads/pubs/research/land/ (last accessed: March 2, 2008).

Getis A, Ord JK. The analysis of spatial association by use of distance statistics. Geogr Anal 1992;24:189–206.

Getis A, Ord JK. Local spatial statistics: an overview. In: Longley P, Batty M, editors. Spatial Analysis: Modelling in a GIS Environment. Cambridge, England: GeoInformation International; 1996. p. 261–77.

Goovaerts P, Jacquez GM. Accounting for regional background and population size in the detection of spatial clusters and outliers using geostatistical filtering and spatial neutral models: the case of lung cancer in Long Island, New York. Int J Health Geogr 2004;3:14.

Ishioka F, Kurihara K, Suito H, Horikawa Y, Ono Y. Detection of hotspots for three-dimensional spatial data and its application to environmental pollution data. J Environ Sci Sustain Soc 2007;1:15–24.

James WL, Cossman RE, Cossman JS, Campbell C, Blanchard T. A brief visual primer for the mapping of mortality trend data. Int J Health Geogr 2004;3:7.

Jobson JD. Applied Multivariate Data Analysis. Regression and Experimental Design. New York: Springer-Verlag; 1991.

Lalor G, Zhang CS. Multivariate outlier detection and remediation in geochemical databases. Sci Total Environ 2001;281:99–109.

Levine N. CrimeStat III: a spatial statistics program for the analysis of crime incident locations. Ned Levine & Associates, Houston, TX, and the National Institute of Justice, Washington, DC; 2004. November 2004.

McGrath D, Zhang CS. Spatial distribution of soil organic carbon concentrations in grassland of Ireland. Appl Geochem 2003;18:1629–39.

McLaughlin CC, Boscoe FP. Effects of randomization methods on statistical inference in disease cluster detection. Health Place 2007;13:152–63.

Odland J. Spatial Autocorrelation. California: Sage Publications; 1988. 87 pp.

Ruiz MO, Tedesco C, McTighe TJ, Austin C, Kitron U. Environmental and social determinants of human risk during a West Nile virus outbreak in the greater Chicago area, 2002. Int J Health Geogr 2004;3:8.

Table 3 – Comparison between the original results of extreme value hotspots and results for the re-analysed and re-collected samples near the original sampling locations (Pb in mg/kg)

| Sample no. | D10 | D15 | F11 | H9 | H12 | H13 | L11 |
|---|---|---|---|---|---|---|---|
| Original result | 236 | 305 | 237 | 531 | 255 | 378 | 543 |
| Re-analysed results | 231 | 302 | 230 | 538 | 260 | 388 | 512/531 |
| 5 re-collected samples (a) | 214 | 295 | 159 | 269 | 246 | 430 | 537 |
| | 611 | 295 | 120 | 448 | 138 | 537 | 569 |
| | 226 | 303 | 144 | 566 | 172 | 439 | 542 |
| | 371 | 310 | 176 | 305 | 109 | 611 | 2768 |
| | 178 | 299 | 213 | 255 | 80 | 371 | 1001 |

(a) Samples were re-collected within 2–3 m of the original sampling sites using a differential GPS for location.


Spatial Analysis Laboratory. GeoDa: an introduction to spatial data analyses. Spatial Analysis Laboratory, Department of Geography. Urbana-Champaign, Urbana, IL: University of Illinois; 2007. p. 61801. https://www.geoda.uiuc.edu/ (last accessed: Oct. 2, 2007).

Tango T. A class of test for detecting ‘general’ and ‘focused’ clustering of rare diseases. Stat Med 1995;14:2323–34.

Zhang CS. Using multivariate analyses and GIS to identify pollutants and their spatial patterns in urban soils in Galway, Ireland. Environ Pollut 2006;142(3):501–11.

Zhang CS, Zhang S. A robust-symmetric mean: a new way of mean calculation for environmental data. GeoJournal 1996;40(1-2):209–12.

Zhang CS, Selinus O. Statistics and GIS in environmental geochemistry — some problems and solutions. J Geochem Explor 1998;64:339–54.

Zhang CS, McGrath D. Geostatistical and GIS analyses on soil organic carbon concentrations in grassland of southeastern Ireland from two different periods. Geoderma 2004;119(3-4):261–75.

Zhang TL, Lin G. A supplemental indicator of high-value or low-value spatial clustering. Geogr Anal 2006;38:209–25.

Zhang CS, Selinus O, Schedin J. Statistical analyses on heavy metal contents in till and root samples in an area of southeastern Sweden. Sci Total Environ 1998;212:217–32.
