Part 6 QTL Analysis - USP…
QTL ANALYSIS
METHODS FOR MAPPING QTL
• Single Marker Analysis
• Interval Mapping
• Composite Interval Mapping
• Bayesian Methods
QTL MAPPING
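• Methods based on linkage disequilibrium between markers and QTL (line crossing or segregating population)
• Requirements:
  - Linkage (marker) maps
  - Variation for the quantitative trait
[Figure: linkage map with markers M1, M2, M3, ..., Mk-1, Mk, recombination fractions r1, r2, r3, ..., r(k-2), r(k-1) between adjacent markers, and a QTL at an unknown position.]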
QTL MAPPING Single Marker Analysis; Example with Backcross
[Figure: cross of purebred lines (phenotypes 80 and 40) giving an F1 (phenotype 65), which is backcrossed (BC) to produce progeny with phenotypes 65, 68, 55, 57, 61, 59.]
[Figure, two panels: the same six backcross progeny sorted by marker genotype, with phenotype on the vertical axis (55 to 70). The two panels show different groupings of the progeny.]
SINGLE MARKER ANALYSIS
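• Simple example with candidate gene and backcross population
[Figure: parental lines Q1Q1 × Q2Q2 give an F1 (Q1Q2), which is backcrossed to Q1Q1. The backcross progeny are Q1Q1 (mean µ1) or Q1Q2 (mean µ2), and the two means differ by δ = µ2 - µ1.]

Genotype   Obs.   Mean   STD
Q1Q1       n1     m1     s1
Q1Q2       n2     m2     s2

• H0: δ = 0 vs H1: δ ≠ 0

\[ t = \frac{m_2 - m_1}{s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{(n_1 + n_2 - 2)} \]

\[ s^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \]

\[ CI[\delta; (1 - \alpha)]: \quad (m_2 - m_1) \pm t_{(n_1 + n_2 - 2;\ \alpha/2)} \, s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]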
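The pooled t-test above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the grouping of the six backcross phenotypes by genotype is hypothetical (the slides do not give it), the function name `pooled_t_test` is made up for this example, and the critical value 2.776 is the tabulated two-sided 5% point of a t distribution with 4 degrees of freedom.

```python
import math
from statistics import mean, stdev


def pooled_t_test(group1, group2, t_crit):
    """Two-sample pooled t-test for delta = mu2 - mu1.

    group1, group2: phenotypes of the two genotype classes (Q1Q1, Q1Q2).
    t_crit: tabulated t value for n1 + n2 - 2 degrees of freedom (supplied by the user).
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    s1, s2 = stdev(group1), stdev(group2)  # sample standard deviations
    df = n1 + n2 - 2
    # Pooled variance s^2
    s_sq = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(s_sq) * math.sqrt(1.0 / n1 + 1.0 / n2)
    delta_hat = m2 - m1
    t_stat = delta_hat / se
    ci = (delta_hat - t_crit * se, delta_hat + t_crit * se)
    return delta_hat, t_stat, df, ci


# Hypothetical split of the six backcross phenotypes into genotype classes
q1q1 = [55.0, 57.0, 59.0]
q1q2 = [61.0, 65.0, 68.0]
delta_hat, t_stat, df, ci = pooled_t_test(q1q1, q1q2, t_crit=2.776)
print(delta_hat, t_stat, df, ci)
```

H0 is rejected at the 5% level when |t| exceeds the tabulated critical value, which is equivalent to the confidence interval excluding zero.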
SINGLE MARKER ANALYSIS
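[Figure: expected phenotype y for the QTL genotypes GG, Gg and gg. The homozygote means are µ1 and µ3, the heterozygote mean is µ2, and µ = (µ1 + µ3)/2 is the midpoint of the homozygotes. Each homozygote lies α from µ (additive effect); τ marks the dominance effect.]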
SINGLE MARKER ANALYSIS
• QTL and marker (M); recombination frequency = r
[Figure: backcross M1M1 Q1Q1 × M1M2 Q1Q2. Progeny are M1M1 Q1Q1 and M1M2 Q1Q2 (non-recombinant) or M1M2 Q1Q1 and M1M1 Q1Q2 (recombinant).]

Genotype    Freq.     E[y]   Marker group   Freq.   E[y]
M1M1Q1Q1    (1-r)/2   µ1     M1M1           ½       (1-r)µ1 + rµ2
M1M1Q1Q2    r/2       µ2
M1M2Q1Q1    r/2       µ1     M1M2           ½       rµ1 + (1-r)µ2
M1M2Q1Q2    (1-r)/2   µ2

Difference between marker group expected values

\[ r\mu_1 + (1 - r)\mu_2 - (1 - r)\mu_1 - r\mu_2 = (1 - 2r)(\mu_2 - \mu_1) = (1 - 2r)\delta \]
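The attenuation of the marker contrast by recombination can be checked numerically. The sketch below uses hypothetical genotype means (57 and 65); the function names are illustrative only.

```python
def marker_group_means(mu1, mu2, r):
    """Expected phenotype of the M1M1 and M1M2 marker groups in the backcross."""
    mean_m1m1 = (1 - r) * mu1 + r * mu2
    mean_m1m2 = r * mu1 + (1 - r) * mu2
    return mean_m1m1, mean_m1m2


def marker_contrast(mu1, mu2, r):
    """Difference between marker group means, equal to (1 - 2r) * (mu2 - mu1)."""
    mean_m1m1, mean_m1m2 = marker_group_means(mu1, mu2, r)
    return mean_m1m2 - mean_m1m1


# Hypothetical QTL genotype means, so delta = mu2 - mu1 = 8
for r in (0.0, 0.1, 0.3, 0.5):
    print(r, marker_contrast(57.0, 65.0, r))
```

With r = 0 the marker contrast equals δ, and with r = 0.5 (unlinked marker) it vanishes, so a single marker cannot separate a small QTL effect nearby from a large effect far away.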
SINGLE MARKER ANALYSIS (EXAMPLE)
• Brassica napus; Flowering time
• 10 Markers (positions: 0, 8.8, 20.6, 27.4, 34.2, 42.9, 53.6, 64.1, 69.2, 83.9 cM)
• 104 individuals; Double haploid
3.0204 -1 -1 -1 -1 -1 -1 -1 -1 -99 -1
2.9704 -1 -1 -1 -1 -99 -1 -1 -1 -1 1
2.7408 -1 -1 1 1 1 1 1 1 1 1
3.3673 1 1 1 1 -1 -1 -1 -1 -1 1
3.0681 1 1 1 1 -99 1 1 1 -1 -1
3.2771 -1 -99 -1 -1 -1 -1 -1 -1 -1 -1
(Satagopan et al. Genetics 144: 805-816, 1996)
Chrom.  Marker  µ      τ       LRT     F       p-value
1       1       3.184  -0.202  9.379   9.624   0.002 **
1       2       3.204  -0.230  11.378  11.789  0.001 ***
1       3       3.232  -0.266  14.706  15.485  0.000 ***
1       4       3.229  -0.259  13.885  14.562  0.000 ***
1       5       3.240  -0.276  15.554  16.446  0.000 ****
1       6       3.259  -0.307  19.518  21.041  0.000 ****
1       7       3.252  -0.302  19.747  21.312  0.000 ****
1       8       3.257  -0.318  23.450  25.775  0.000 ****
1       9       3.258  -0.330  25.156  27.884  0.000 ****
1       10      3.252  -0.362  31.518  36.059  0.000 ****

[Figure: F values (axis 0 to 40) plotted against marker position (cM, axis 0 to 90).]
INTERVAL MAPPING
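[Figure: QTL located between flanking markers M and N; r1 is the recombination fraction between M and the QTL, r2 between the QTL and N, and r between M and N.]
(Lander & Botstein, 1989)
[Figure: backcross Mm Qq Nn × mm qq nn. Progeny marker classes are Mm Nn, mm nn, Mm nn and mm Nn; the phenotype distributions of the two QTL genotype classes (labelled QQ and Qq in the figure) have means that differ by δ.]

\[ y_i = \mu + \delta q_i + \varepsilon_i \]

where y_i is the phenotype, ε_i is the residual, and q_i is the QTL genotype: q_i = 0 if qq, q_i = 1 if Qq.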
INTERVAL MAPPING
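If \( \varepsilon_i \sim N(0, \sigma^2) \), then \( y_i \mid q_i \sim N(\mu + \delta q_i, \sigma^2) \):

\[ p(y_i \mid q_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2} (y_i - \mu - \delta q_i)^2 \right\} \]

\[ L(\mu, \delta, \sigma^2, \lambda, q \mid y) \propto \prod_{i=1}^{N} \left[ f(y_i \mid q_i = 0) \Pr(q_i = 0) + f(y_i \mid q_i = 1) \Pr(q_i = 1) \right] \]

\[ L(\mu, \delta, \sigma^2, \lambda, q \mid y) \propto \prod_{i=1}^{N} \left[ \frac{1}{\sigma} \exp\left\{ -\frac{1}{2\sigma^2} (y_i - \mu)^2 \right\} \Pr(q_i = 0 \mid \lambda) + \frac{1}{\sigma} \exp\left\{ -\frac{1}{2\sigma^2} (y_i - \mu - \delta)^2 \right\} \Pr(q_i = 1 \mid \lambda) \right] \]

where λ is the QTL position.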
INTERVAL MAPPING
\( \Pr(q_i \mid \lambda) \) is modeled in terms of recombinations between flanking markers and QTL:

Marker Genotypes   Pr(qi = QQ)                 Pr(qi = Qq)
M,N                (1 - r1)(1 - r2)/(1 - r)    r1 r2/(1 - r)
M,n                (1 - r1) r2/r               r1 (1 - r2)/r
m,N                r1 (1 - r2)/r               (1 - r1) r2/r
m,n                r1 r2/(1 - r)               (1 - r1)(1 - r2)/(1 - r)

Approximation (no double recombination), with p = r1/r:

Markers   Pr(qi = QQ)   Pr(qi = Qq)
M,N       1             0
M,n       (1 - p)       p
m,N       p             (1 - p)
m,n       0             1
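The two tables can be written as small Python functions. This is a sketch, not a reference implementation: the function names are illustrative, and the relation r = r1 + r2 - 2 r1 r2 (no interference) is an assumption added here so that each row of the exact table sums to 1; the slides do not state how r is obtained.

```python
def qtl_genotype_probs(flank, r1, r2):
    """Return (Pr(q = QQ), Pr(q = Qq)) for a flanking marker class.

    flank: one of "MN", "Mn", "mN", "mn".
    Assumption: no interference, so r = r1 + r2 - 2*r1*r2 (requires r > 0).
    """
    r = r1 + r2 - 2 * r1 * r2
    table = {
        "MN": ((1 - r1) * (1 - r2) / (1 - r), r1 * r2 / (1 - r)),
        "Mn": ((1 - r1) * r2 / r, r1 * (1 - r2) / r),
        "mN": (r1 * (1 - r2) / r, (1 - r1) * r2 / r),
        "mn": (r1 * r2 / (1 - r), (1 - r1) * (1 - r2) / (1 - r)),
    }
    return table[flank]


def approx_genotype_probs(flank, r1, r):
    """Same probabilities when double recombination is ignored, with p = r1/r."""
    p = r1 / r
    table = {
        "MN": (1.0, 0.0),
        "Mn": (1 - p, p),
        "mN": (p, 1 - p),
        "mn": (0.0, 1.0),
    }
    return table[flank]


# Hypothetical interval: QTL at r1 = 0.05 from M and r2 = 0.10 from N
for flank in ("MN", "Mn", "mN", "mn"):
    print(flank, qtl_genotype_probs(flank, 0.05, 0.10))
```

For tightly linked markers the exact and approximate values are close, which is why the simpler table is often used in practice.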
INTERVAL MAPPING
• Likelihood estimation: EM algorithm to estimate parameters, including λ (position of QTL).
• Alternatively: Fix λ (grid search) and evaluate LOD.

\[ LOD = \log_{10} \left[ \frac{L(\hat{\mu}, \hat{\delta}, \hat{\sigma}^2, \lambda \mid q, y)}{L(\hat{\mu}, \hat{\sigma}^2 \mid q, y, \delta = 0)} \right] \]

• A QTL is detected whenever the LOD score gets larger than a threshold; estimated position of the QTL maximizes LOD.
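The grid-search variant can be sketched as follows: for one fixed position λ, the prior probabilities Pr(q_i = 1 | λ) come from the flanking markers, EM maximizes the mixture likelihood over µ, δ and σ², and the LOD compares the result with the fit under δ = 0. This is a minimal pure-Python illustration; the function names, the phenotypes and the prior probabilities are hypothetical, and no safeguards for degenerate data are included.

```python
import math


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def lod_at_position(y, prior_q1, n_iter=500, tol=1e-10):
    """EM fit of y_i = mu + delta*q_i + e_i at one fixed QTL position.

    prior_q1[i] is Pr(q_i = 1 | flanking markers, lambda).
    Returns (mu, delta, var, lod).
    """
    n = len(y)
    # Null model (delta = 0): a single normal distribution, ML estimates
    mu0 = sum(y) / n
    var0 = sum((yi - mu0) ** 2 for yi in y) / n
    loglik0 = sum(math.log(normal_pdf(yi, mu0, var0)) for yi in y)

    mu, delta, var = mu0, 0.0, var0  # start EM from the null fit
    loglik = loglik0
    for _ in range(n_iter):
        # E-step: posterior probability that q_i = 1
        w = []
        for yi, pi in zip(y, prior_q1):
            f0 = (1.0 - pi) * normal_pdf(yi, mu, var)
            f1 = pi * normal_pdf(yi, mu + delta, var)
            w.append(f1 / (f0 + f1))
        # M-step: weighted means and common variance
        sw = sum(w)
        mu = sum((1.0 - wi) * yi for wi, yi in zip(w, y)) / (n - sw)
        mu_q1 = sum(wi * yi for wi, yi in zip(w, y)) / sw
        delta = mu_q1 - mu
        var = sum((1.0 - wi) * (yi - mu) ** 2 + wi * (yi - mu_q1) ** 2
                  for wi, yi in zip(w, y)) / n
        new_loglik = sum(
            math.log((1.0 - pi) * normal_pdf(yi, mu, var)
                     + pi * normal_pdf(yi, mu + delta, var))
            for yi, pi in zip(y, prior_q1))
        converged = abs(new_loglik - loglik) < tol
        loglik = new_loglik
        if converged:
            break
    lod = (loglik - loglik0) / math.log(10.0)
    return mu, delta, var, lod


# Hypothetical phenotypes and prior QTL genotype probabilities at one position
y = [55.0, 57.0, 59.0, 61.0, 65.0, 68.0]
prior_q1 = [0.02, 0.02, 0.5, 0.98, 0.98, 0.98]
mu_hat, delta_hat, var_hat, lod = lod_at_position(y, prior_q1)
print(mu_hat, delta_hat, var_hat, lod)
```

Repeating the call over a grid of positions, each with its own prior probabilities, gives the LOD profile whose maximum is the estimated QTL position.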
INTERVAL MAPPING
REGRESSION APPROACH (Haley & Knott, 1992)
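\[ y = X\beta + \varepsilon \]

\[ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \\ \vdots & \vdots \\ p_{N1} & p_{N2} \end{bmatrix} \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix} \]

alternatively

\[ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} 1 & p_{12} \\ 1 & p_{22} \\ \vdots & \vdots \\ 1 & p_{N2} \end{bmatrix} \begin{bmatrix} \mu \\ \delta \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix} \]

where p_{i1} and p_{i2} are the probabilities of the two QTL genotypes for individual i, given its flanking markers.

\[ \hat{\beta} = (X'X)^{-1} X'y \]

Residual Sum of Squares:

\[ RSS = y'y - \hat{\beta}'X'y \]

Estimated position of the QTL minimizes RSS.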
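The regression approach amounts to one ordinary least squares fit per candidate position, keeping the position with the smallest RSS. The sketch below uses the second parameterization (intercept µ and slope δ on the genotype probability). The data, the marker indicators and the function names are hypothetical, and the genotype probability inside the interval is approximated by the relative distance to the flanking markers, which stands in for p = r1/r.

```python
def regress_on_probability(y, x):
    """Least squares fit of y = mu + delta*x; returns (mu, delta, rss)."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    delta = sxy / sxx  # assumes x is not constant
    mu = y_bar - delta * x_bar
    rss = sum((yi - mu - delta * xi) ** 2 for xi, yi in zip(x, y))
    return mu, delta, rss


def haley_knott_scan(y, prob_by_position):
    """Fit the regression at each position; return the position with minimum RSS."""
    results = {pos: regress_on_probability(y, x) for pos, x in prob_by_position.items()}
    best = min(results, key=lambda pos: results[pos][2])
    return best, results


# Hypothetical backcross data: phenotypes and (left, right) flanking marker
# indicators, 1 if the marker allele comes from the same parental line as Q
y = [55.0, 57.0, 59.0, 61.0, 65.0, 68.0]
flanking = [(0, 0), (0, 0), (0, 1), (1, 0), (1, 1), (1, 1)]
left_cm, right_cm = 0.0, 20.0

prob_by_position = {}
for pos in (0.0, 5.0, 10.0, 15.0, 20.0):
    p = (pos - left_cm) / (right_cm - left_cm)  # assumption: stands in for r1/r
    prob_by_position[pos] = [(1 - p) * left + p * right for left, right in flanking]

best, results = haley_knott_scan(y, prob_by_position)
print(best, results[best])
```

Because only the regressor changes between positions, the scan is much cheaper than refitting the mixture likelihood at every grid point.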
GENE MAPPING Interval Mapping; Example with Backcross
[Figure: test statistic (evidence for QTL) plotted along the chromosome against marker positions (cM) M1 to M6, for the backcross progeny with phenotypes 65, 68, 55, 57, 61, 59.]
INTERVAL MAPPING
• COMMENTS:
  - Backcross to both parental lines, or use F2 design, to estimate additive and dominance effects.
  - Threshold; multiple testing; false positives
  - Confidence intervals
  - Multiple QTL, ghost QTL
COMPOSITE INTERVAL MAPPING
(Zeng, 1993, 1994)
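• Interval analysis adding marker cofactors (to account for the effects of unlinked QTLs); combination of single interval mapping and multiple linear regression.
[Figure: putative QTL at position λ between the flanking markers Mj and Mj+1; the markers outside the interval (Mj-1 on one side, Mj+2 on the other) are used as cofactors.]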
COMPOSITE INTERVAL MAPPING
(Zeng, 1993, 1994)
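\[ y_i = \beta_0 + \beta_j^{*} x_{ij} + \sum_{k \neq j, j+1} \beta_k w_{ik} + \varepsilon_i \]

where β_0 is the intercept, β*_j is the genetic effect of the putative QTL (between markers j and j+1), and the w_ik are dummy variables for the cofactor markers.

\[ X = \begin{bmatrix} 1 & x_{1j} & w_{11} & \cdots & w_{1p} \\ 1 & x_{2j} & w_{21} & \cdots & w_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{Nj} & w_{N1} & \cdots & w_{Np} \end{bmatrix} \]

\[ y = X\beta + \varepsilon \]

\[ \hat{\beta} = (X'X)^{-1} X'y \]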
(EXAMPLE)
• Brassica napus; Flowering time (Satagopan et al., 1996)
[Figure: INTERVAL MAPPING. LRT (axis 0 to 40) plotted against position (cM, axis 0 to 90).]
[Figure: COMPOSITE INTERVAL MAPPING. LRT (axis 0 to 18) plotted against position (cM, axis 0 to 90).]
EXAMPLES
MARKER ASSISTED SELECTION
• MAS: Use of genetic markers to improve the efficiency of genetic selection
• Basic idea behind MAS:
MARKER ASSISTED SELECTION
• Most traits of economic importance are controlled by a fairly large number of genes
• Some of these genes, however, have larger effects
• Following the pattern of inheritance of such genes might assist in selection
[Figure: Phenotype and Genotype (Kinghorn & van der Werf, 2000)]
BASIC IDEA BEHIND MAS
• MAS can potentially improve genetic gain by: 1) increasing accuracy of genetic prediction, 2) increasing selection intensity, and 3) decreasing generation interval.
TYPES OF GENETIC MARKERS
Direct Markers: loci that code for the functional mutation
LD Markers: loci that are in population-wide LD with the functional mutation
LE Markers: loci that are in population LE with the mutation
Direct Markers (marker M at the QTL, r = 0): e.g. Halothane gene in pigs, double muscling gene in cattle
Indirect Markers (marker M linked to the QTL with recombination rate r)
INDIRECT MARKERS
Let: Marker M (alleles M1 and M2) and QTL Q (alleles Q1 and Q2)
Recombination rate: rMQ
Allelic frequencies: Pr(M1) = p1; Pr(Q1) = q1
If linkage equilibrium:
Pr(M1Q1) = p1q1
Pr(M1Q2) = p1(1 - q1)
Pr(Q1|M1) = q1
Pr(Q2 |M1) = (1 - q1)
MAS MAY HELP IMPROVE
• Low heritability traits
• Phenotypes that can be measured on one sex only
• Characteristics that are not measurable before sexual maturity
• Traits that are difficult to measure or require sacrifice
EFFICIENCY OF MAS
• Size (effect) of QTL
• Frequency of favorable allele
• Recombination rate between marker(s) and QTL
\[ \Delta BV/Yr = \frac{r_{TBV;EBV} \times i \times \sigma_{BV}}{L} \]

\[ \Delta BV/Yr = \frac{Accuracy \times Intensity \times Variation}{Generation\ interval} \]
GENETIC GAIN; THE KEY EQUATION
• MAS can improve genetic gain by: 1) increasing accuracy of genetic prediction, 2) increasing selection intensity, and 3) decreasing generation interval.
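The key equation is easy to explore numerically. The sketch below uses made-up parameter values purely for illustration (they are not taken from the slides) to show how each of the three levers changes the annual gain.

```python
def genetic_gain_per_year(accuracy, intensity, sigma_bv, generation_interval):
    """Annual genetic gain: accuracy * intensity * sigma_BV / generation interval."""
    return accuracy * intensity * sigma_bv / generation_interval


# Hypothetical baseline: r = 0.5, i = 1.4, sigma_BV = 10 units, L = 5 years
baseline = genetic_gain_per_year(0.5, 1.4, 10.0, 5.0)

# Hypothetical effects of MAS on each lever, one at a time
higher_accuracy = genetic_gain_per_year(0.6, 1.4, 10.0, 5.0)
higher_intensity = genetic_gain_per_year(0.5, 1.6, 10.0, 5.0)
shorter_interval = genetic_gain_per_year(0.5, 1.4, 10.0, 4.0)

print(baseline, higher_accuracy, higher_intensity, shorter_interval)
```

Because the terms multiply, a modest improvement in each of the three factors compounds into a larger change in the rate of gain.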
LONG TERM MAS
• Selection Index that combines the EBV of the QTL (q) with the EBV for polygenes (u):

\[ I = q + u \] (Soller, 1978)

\[ I = bq + u \] (Dekkers and van Arendonk, 1998)

[Figure: Phenotype plotted against Generation, with one curve for MAS and one for Polygenes.]
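\[ y = X\beta + Wq + Zu + \varepsilon \]

where y is the phenotype, Xβ are the fixed effects (environmental), Wq are the QTL effects, Zu are the polygenic effects with \( u \sim N(0, A\sigma_u^2) \), and ε is the residual with \( \varepsilon \sim N(0, I\sigma_\varepsilon^2) \).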
MODELING EFFECTS AT THE QTL GENOTYPE
• Two-step iterative scheme:
  - Calculating QTL genotype probabilities using segregation analysis
  - Regressing phenotypes on these probabilities (Kinghorn et al., 1993), or carrying out regression weighted by these probabilities (Meuwissen and Goddard, 1997)
QTL-GENOTYPE AS A FIXED EFFECT
• Useful when there is a limited number of genotypic effects at the QTL (i.e. a limited number of alleles, and the effects of the different genotypes are equal across families/herds)
• Easily accommodates dominance at the QTL (and epistasis if more than one QTL)
• Incidence matrix W: genotype probabilities at the QTL
• With QTL genotype as fixed effect: E[Wq] = Wq and Var[Wq] = 0
CONSIDERATIONS
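(Fernando and Grossman, 1989)
The effect of a QTL is modeled as the sum of the two gametic effects:

\[ y = X\beta + Wv + Zu + \varepsilon \]

\[ Var \begin{pmatrix} v \\ u \\ \varepsilon \end{pmatrix} = \begin{pmatrix} G_v \sigma_v^2 & 0 & 0 \\ 0 & A \sigma_u^2 & 0 \\ 0 & 0 & I \sigma_\varepsilon^2 \end{pmatrix} \]

where v has dimension 2n (for each animal: paternal and maternal gametic effects) and G_v is the gametic relationship matrix.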
QTL-GENOTYPE AS A RANDOM EFFECT
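This approach can be used to calculate animal EBV’s at the QTL, much as we use coefficients of relationship to estimate ‘polygenic’ breeding values

\[ \begin{pmatrix} X'X & X'Z & X'W \\ Z'X & Z'Z + A^{-1}\alpha_u & Z'W \\ W'X & W'Z & W'W + G_v^{-1}\alpha_v \end{pmatrix} \begin{pmatrix} \beta \\ u \\ v \end{pmatrix} = \begin{pmatrix} X'y \\ Z'y \\ W'y \end{pmatrix} \]

where \( \alpha_u = \sigma_\varepsilon^2 / \sigma_u^2 \) and \( \alpha_v = \sigma_\varepsilon^2 / \sigma_v^2 \).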
MIXED MODEL EQUATIONS
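• Gives the probabilities of identity between each of the two alleles in each individual
• Example with and without marker information:

[Figure: pedigree with QTL allele sites 1 and 2 in the first parent, 3 and 4 in the second parent, and 5 and 6 in the offspring.]

Without marker information:

Site   1    2    3    4    5    6
1      1    0    0    0    .5   0
2      0    1    0    0    .5   0
3      0    0    1    0    0    .5
4      0    0    0    1    0    .5
5      .5   .5   0    0    1    0
6      0    0    .5   .5   0    1

With marker information: marker (alleles A, B, C) linked to the QTL with r = 0.1; the parent with sites 1 and 2 is AB, the parent with sites 3 and 4 is AC, and the offspring (sites 5 and 6) is AC.

Site   1    2    3    4    5    6
1      1    0    0    0    .9   0
2      0    1    0    0    .1   0
3      0    0    1    0    0    .1
4      0    0    0    1    0    .9
5      .9   .1   0    0    1    0
6      0    0    .1   .9   0    1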
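Both example matrices can be reproduced with a short recursion. This is a simplified sketch, assuming founder alleles are unrelated and each offspring allele is a copy of one of the two alleles of the transmitting parent with a known probability (0.5 without markers, 1 - r or r when the marker identifies the transmitted chromosome). The function name and the input layout are invented for this example.

```python
def gametic_relationship(n_founder_alleles, transmissions):
    """Build a gametic relationship matrix (list of lists).

    transmissions: one tuple (site_a, site_b, prob_a) per non-founder allele,
    in pedigree order; the allele is a copy of site_a with probability prob_a
    and of site_b with probability 1 - prob_a (0-based site indices).
    """
    size = n_founder_alleles + len(transmissions)
    # Start from the identity: every allele is identical to itself
    g = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for k, (site_a, site_b, prob_a) in enumerate(transmissions):
        new = n_founder_alleles + k
        for j in range(new):
            value = prob_a * g[site_a][j] + (1.0 - prob_a) * g[site_b][j]
            g[new][j] = value
            g[j][new] = value
    return g


# Sites 1-4 (indices 0-3) are founder alleles; sites 5 and 6 belong to the offspring.
# Without marker information each parental allele is transmitted with probability 0.5.
g_no_markers = gametic_relationship(4, [(0, 1, 0.5), (2, 3, 0.5)])

# With markers and r = 0.1: site 5 traces to site 1 with probability 0.9,
# and site 6 traces to site 4 with probability 0.9 (to site 3 with 0.1).
g_markers = gametic_relationship(4, [(0, 1, 0.9), (2, 3, 0.1)])

for row in g_markers:
    print([round(value, 2) for value in row])
```

Marker information moves the off-diagonal probabilities away from 0.5, so the matrix carries more information about which parental QTL allele each offspring inherited.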
GAMETIC RELATIONSHIP MATRIX
Conditions that favor QTL detection
• High heritability, i.e. small environmental influence
• Genetic differences caused by few loci
• Large family sizes
• Large data base of phenotypic information
When is MAS most useful as an adjunct to conventional selection for polygenic traits?
• Traits with low heritability
• Sex limited traits (expressed in one sex)
• Phenotypes costly to measure
• Phenotypes expressed late in life
• Phenotypes that cannot be measured in breeding animals, e.g. carcass traits
Notice: Conditions that optimize MAS are often opposite to those that favor QTL detection.
POTENTIAL PROBLEMS IN MAS
• Favorable QTL allele(s) fixed/absent for a specific herd/commercial population
• QTL effect may depend on genetic background (in general, QTL have stronger effects in the genetic background in which they were detected)
• QTL with unfavorable effect(s) on other trait(s)
• Cost, etc.
(QTL detected in resource population)