Finite Mixture Models - Stata · 2008-09-03 · Finite Mixture Models Partha Deb Hunter College and...


Finite Mixture Models

Partha Deb

Hunter College and the Graduate Center, CUNY

NBER

July 2008

Deb (Hunter College) FMM July 2008 1 / 34


Introduction

The finite mixture model provides a natural representation of heterogeneity in a finite number of latent classes

It concerns modeling a statistical distribution by a mixture (or weighted sum) of other distributions

Finite mixture models are also known as

- latent class models
- unsupervised learning models

Finite mixture models are closely related to

- intrinsic classification models
- clustering
- numerical taxonomy


Introduction: Intuition

Heterogeneity of effects for different "classes" of observations

- wine from different vineyards
- healthy and sick individuals
- normal and complicated pregnancies


Introduction: Canonical Example

Estimating parameters of the distribution of lengths of halibut

It is known that female halibut are longer, on average, than male fish and that the distribution of lengths is normal

Gender cannot be determined at measurement

Then the distribution is a 2-component finite mixture of normals

A finite mixture model allows one to estimate:

- mean lengths of male and female halibut
- mixing probability


Introduction: A graphical view

[Figure: densities of N(0,1), N(5,2), and the finite mixture FM(N(0,1) w.p. 0.3, N(5,2) w.p. 0.7), plotted against x from -3 to 12.]


Introduction: A graphical view

[Figure: densities of N(0,1), N(5,2), and the finite mixture FM(N(0,1) w.p. 0.2, N(5,2) w.p. 0.8), plotted against x from -3 to 12.]


Introduction: A graphical view

[Figure: densities of N(0,1), N(5,2), and the finite mixture FM(N(0,1) w.p. 0.1, N(5,2) w.p. 0.9), plotted against x from -3 to 12.]
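The curves in these figures can be reproduced numerically. The sketch below evaluates a two-component normal mixture on the plotted range for the three sets of weights. It assumes the second argument of N(5,2) is a variance (the slides do not say whether it is a variance or a standard deviation), and the function names are illustrative only.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x (second argument treated as a variance)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(x, weights, means, variances):
    """Weighted sum of the normal component densities."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))

# Grid matching the plotted range, x from -3 to 12 in steps of 0.01.
step = 0.01
grid = [-3.0 + step * i for i in range(1501)]
means, variances = (0.0, 5.0), (1.0, 2.0)

for w1 in (0.3, 0.2, 0.1):
    weights = (w1, 1.0 - w1)
    dens = [mixture_pdf(x, weights, means, variances) for x in grid]
    # Riemann sum: probability mass of the mixture inside the plotted range.
    area = sum(dens) * step
    at_zero = mixture_pdf(0.0, weights, means, variances)
    print(f"w1={w1:.1f}  density at x=0: {at_zero:.4f}  area on grid: {area:.4f}")
```

As the weight on N(0,1) falls, the left mode shrinks while the total area stays close to one.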


Introduction: Examples

Characteristics of wine by cultivar

Infant birthweight - two types of pregnancies, "normal" and "complicated"

Medical services - two types of consumers, "healthy" and "sick"

Public goods experiments - selfish, reciprocal, and altruist

Stock returns in "typical" and "crisis" times

Using somatic cell counts to classify records from healthy or infected goats

Models of internet traffic


Introduction: More generally from a statistical perspective

FMM is a semiparametric / nonparametric estimator of the density (Lindsay)

Experience suggests that usually only a few latent classes are needed to approximate the density well (Heckman)

In practice, FMMs are flexible extensions to basic parametric models

- can generate skewed distributions from symmetric components
- can generate leptokurtic distributions from mesokurtic ones


Outline of talk

Introduction

Model

- Formulation
- Estimation
- Popular densities
- Properties

Examples

- Color of wine
- Birthweight and prenatal care
- Medical care


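Model: Formulation

The density function for a C-component finite mixture is

$$f(y \mid x; \theta_1, \theta_2, \ldots, \theta_C; \pi_1, \pi_2, \ldots, \pi_C) = \sum_{j=1}^{C} \pi_j f_j(y \mid x; \theta_j)$$

where $0 < \pi_j < 1$ and $\sum_{j=1}^{C} \pi_j = 1$

More generally

$$f(y \mid x; z; \theta_1, \theta_2, \ldots, \theta_C; \pi_1, \pi_2, \ldots, \pi_C) = \sum_{j=1}^{C} \pi_j(z) f_j(y \mid x; \theta_j)$$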


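Model: Estimation

Maximum likelihood

$$\max_{\pi, \theta} \ln L = \sum_{i=1}^{N} \log\left( \sum_{j=1}^{C} \pi_j f_j(y_i \mid \theta_j) \right)$$

Trick to ensure $0 < \pi_j < 1$ and $\sum_{j=1}^{C} \pi_j = 1$

$$\pi_j = \frac{\exp(\gamma_j)}{\exp(\gamma_1) + \exp(\gamma_2) + \cdots + \exp(\gamma_{C-1}) + 1}$$

with $\gamma_C$ normalized to 0, so that the trailing 1 in the denominator is $\exp(\gamma_C)$

EM

Bayesian MCMC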
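To make the estimation ideas concrete, here is a minimal sketch for a mixture of normals: the logit-type transformation for the mixing probabilities, the log likelihood, and a plain EM iteration. It is an illustration on simulated data with assumed parameter values, not the implementation behind the results in this talk. The transformation is used here only to build starting values, since EM updates the probabilities directly.

```python
import math
import random

def mixing_probs(gammas):
    """Map C-1 unrestricted values to C probabilities (gamma_C fixed at 0)."""
    expg = [math.exp(g) for g in gammas] + [1.0]
    total = sum(expg)
    return [e / total for e in expg]

def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def log_likelihood(data, probs, means, sds):
    """ln L = sum_i log( sum_j pi_j f_j(y_i | theta_j) )."""
    return sum(
        math.log(sum(p * normal_pdf(y, m, s) for p, m, s in zip(probs, means, sds)))
        for y in data
    )

def em_step(data, probs, means, sds):
    """One EM iteration for a finite mixture of normals."""
    # E-step: posterior probability of each component for each observation.
    resp = []
    for y in data:
        w = [p * normal_pdf(y, m, s) for p, m, s in zip(probs, means, sds)]
        tot = sum(w)
        resp.append([wj / tot for wj in w])
    # M-step: weighted updates of mixing probabilities, means and sds.
    n = len(data)
    new_probs, new_means, new_sds = [], [], []
    for j in range(len(probs)):
        nj = sum(r[j] for r in resp)
        mj = sum(r[j] * y for r, y in zip(resp, data)) / nj
        vj = sum(r[j] * (y - mj) ** 2 for r, y in zip(resp, data)) / nj
        new_probs.append(nj / n)
        new_means.append(mj)
        new_sds.append(math.sqrt(vj))
    return new_probs, new_means, new_sds

# Simulated data (assumed values): 40% from N(0, 1), 60% from N(5, 1).
random.seed(0)
data = [random.gauss(0.0, 1.0) if random.random() < 0.4 else random.gauss(5.0, 1.0)
        for _ in range(1000)]

# Starting values: equal mixing probabilities (gamma_1 = 0) and rough means.
probs, means, sds = mixing_probs([0.0]), [1.0, 4.0], [1.0, 1.0]
history = [log_likelihood(data, probs, means, sds)]
for _ in range(50):
    probs, means, sds = em_step(data, probs, means, sds)
    history.append(log_likelihood(data, probs, means, sds))

print("mixing probabilities:", [round(p, 3) for p in probs])
print("means:", [round(m, 3) for m in means])
print("log likelihood:", round(history[-1], 2))
```

The log likelihood stored in `history` should never decrease from one EM step to the next, which is a useful check on any implementation.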


Model: Popular mixture component densities

Normal (Gaussian)

Poisson

Gamma

Negative Binomial

Student-t

Weibull


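Model: Some basic properties

Conditional mean:

$$E(y_i \mid x_i) = \sum_{j=1}^{C} \pi_j \lambda_j \quad \text{where } \lambda_j = E_j(y_i \mid x_i)$$

Marginal effects:

$$\frac{\partial E_j(y_i \mid x_i)}{\partial x_i} = \frac{\partial \lambda_j}{\partial x_i} \quad \longrightarrow \text{within component}$$

$$\frac{\partial E(y_i \mid x_i)}{\partial x_i} = \sum_{j=1}^{C} \pi_j \frac{\partial \lambda_j}{\partial x_i} \quad \longrightarrow \text{overall}$$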
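As a small check of the overall marginal effect formula, the sketch below assumes an exponential component mean, $\lambda_j = \exp(\beta_{0j} + \beta_{1j} x)$, so that $\partial \lambda_j / \partial x = \beta_{1j} \lambda_j$. The mixing probabilities and coefficients are hypothetical. The analytic result is compared with a finite-difference derivative of the mixture mean.

```python
import math

def component_mean(x, beta0, beta1):
    """lambda_j = exp(beta0 + beta1 * x): an assumed exponential mean."""
    return math.exp(beta0 + beta1 * x)

def overall_mean(x, probs, betas):
    """E(y|x) = sum_j pi_j * lambda_j."""
    return sum(p * component_mean(x, b0, b1) for p, (b0, b1) in zip(probs, betas))

def overall_marginal_effect(x, probs, betas):
    """sum_j pi_j * d(lambda_j)/dx, with d(lambda_j)/dx = beta1_j * lambda_j."""
    return sum(p * b1 * component_mean(x, b0, b1) for p, (b0, b1) in zip(probs, betas))

probs = [0.8, 0.2]                    # hypothetical mixing probabilities
betas = [(0.5, -0.2), (1.5, -0.05)]   # hypothetical (intercept, slope) per component
x = 1.0

analytic = overall_marginal_effect(x, probs, betas)

# Central finite difference of the mixture mean as an independent check.
h = 1e-6
numeric = (overall_mean(x + h, probs, betas) - overall_mean(x - h, probs, betas)) / (2 * h)

print("analytic:", analytic)
print("numeric: ", numeric)
```

The two numbers agree because the mixing probabilities here do not depend on x. If they did, the derivative would pick up extra terms.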


Model: Some basic properties

Prior probability that observation $y_i$ belongs to component c:

$$\Pr[y_i \in \text{population } c \mid x_i, \theta] = \pi_c, \qquad c = 1, 2, \ldots, C$$

Posterior probability that observation $y_i$ belongs to component c:

$$\Pr[y_i \in \text{population } c \mid x_i, y_i; \theta] = \frac{\pi_c f_c(y_i \mid x_i, \theta_c)}{\sum_{j=1}^{C} \pi_j f_j(y_i \mid x_i, \theta_j)}, \qquad c = 1, 2, \ldots, C$$
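The posterior probability is Bayes' rule applied to the component densities. A minimal sketch for normal components follows. The priors, means and standard deviations are hypothetical values chosen for illustration.

```python
import math

def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def posterior_probs(y, priors, means, sds):
    """pi_c f_c(y) / sum_j pi_j f_j(y) for each component c."""
    joint = [p * normal_pdf(y, m, s) for p, m, s in zip(priors, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]

priors = [0.3, 0.7]                   # hypothetical pi_1, pi_2
means, sds = [0.0, 5.0], [1.0, 1.0]   # hypothetical component parameters

for y in (0.0, 2.5, 5.0):
    post = posterior_probs(y, priors, means, sds)
    print(f"y={y:.1f}  prior={priors}  posterior={[round(p, 4) for p in post]}")
```

An observation halfway between two equal-variance components carries no information about class membership, so its posterior equals the prior. Observations near a component mean are assigned to that component with probability close to one.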


Model: Estimation challenges

The number of components has to be specified - we usually have little theoretical guidance

Even if prior theory suggests a particular number of components, we may not be able to reliably distinguish between some of the components

In some cases additional components may simply reflect the presence of outliers in the data

Likelihood function may have multiple local maxima


Model: Extending the model

Parameterize $\gamma_j = Z \alpha_j$ in

$$\pi_j = \frac{\exp(\gamma_j)}{\exp(\gamma_1) + \exp(\gamma_2) + \cdots + \exp(\gamma_{C-1}) + 1}$$

Parameterizing mixing probabilities

- may lead to finite sample identification issues
- may lead to computational difficulties
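With $\gamma_j = Z\alpha_j$ the mixing probabilities become a multinomial logit in the covariates Z. The sketch below computes them for a 3-component model. The coefficient vectors are hypothetical, and the last component is taken as the base category with $\gamma_C = 0$.

```python
import math

def mixing_probs_given_z(z, alphas):
    """pi_j(z) with gamma_j = z . alpha_j for j < C and gamma_C = 0."""
    gammas = [sum(zk * ak for zk, ak in zip(z, alpha)) for alpha in alphas]
    expg = [math.exp(g) for g in gammas] + [1.0]
    total = sum(expg)
    return [e / total for e in expg]

# Hypothetical coefficients for components 1 and 2; z = (1, z1) includes a constant.
alphas = [(0.2, 1.0), (-0.5, 0.3)]

for z1 in (-1.0, 0.0, 1.0):
    probs = mixing_probs_given_z((1.0, z1), alphas)
    print(f"z1={z1:+.1f}  pi(z)={[round(p, 3) for p in probs]}")
```

Each observation now has its own vector of prior probabilities, which still lie strictly between 0 and 1 and sum to one by construction.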


Model: Selecting number of components

Estimate models with 2 and then more components

At each step calculate

- $AIC = -2 \log(L) + 2K$
- $BIC = -2 \log(L) + K \log(N)$

Pick the model with the smallest AIC, BIC
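A short sketch of the comparison, where K is the number of estimated parameters and N the number of observations. The log likelihoods and N are the values reported for the medical care example later in the talk. The parameter counts (19 and 39) are not stated on the slides and are assumptions, chosen because they roughly reproduce the reported BIC values.

```python
import math

def aic(log_l, k):
    return -2.0 * log_l + 2.0 * k

def bic(log_l, k, n):
    return -2.0 * log_l + k * math.log(n)

n_obs = 20186  # number of observations in the medical care example

# (label, reported log L, ASSUMED number of parameters K)
models = [
    ("nb1", -42405.0, 19),
    ("fmm nb1, 2 components", -42037.0, 39),
]

results = {}
for label, log_l, k in models:
    results[label] = (aic(log_l, k), bic(log_l, k, n_obs))
    print(f"{label:24s} AIC={results[label][0]:.1f}  BIC={results[label][1]:.1f}")

# Pick the model with the smallest BIC.
best = min(results, key=lambda name: results[name][1])
print("preferred by BIC:", best)
```

BIC penalizes each extra parameter by log(N), which is larger than the AIC penalty of 2 whenever N exceeds about 7, so BIC tends to favor fewer components.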


Model: Implementation in Stata

Stata package fmm

    fmm depvar [indepvars] [if] [in] [weight], components(#) mixtureof(density)

where density is one of

- gamma
- negbin1
- negbin2
- normal
- poisson
- studentt

predict and mfx give predictions and marginal effects of means, component means, prior and posterior probabilities


Example 1: Color of Wine

Results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars

| Cultivar | Freq. | % of total | Color intensity (mean) |
|---|---|---|---|
| 1 | 59 | 33.15 | 5.528 |
| 2 | 71 | 39.89 | 3.086 |
| 3 | 48 | 26.97 | 7.396 |
| Total | 178 | 100 | 5.058 |


Example 1: Color of Wine

[Figure: "Wine Color Density". Kernel density estimate of Color (x-axis, 0 to 15) against Density (y-axis); kernel = epanechnikov, bandwidth = 0.7076.]


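Example 1: Color of Wine

Finite mixture of Normals with 3 components

$$f(y_i \mid \theta_1, \theta_2, \ldots, \theta_C; \pi_1, \pi_2, \ldots, \pi_C) = \sum_{j=1}^{C} \pi_j \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left( -\frac{1}{2\sigma_j^2} (y_i - x_i \beta_j)^2 \right)$$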


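Example 1: Color of Wine

| Parameter | component 1 | component 2 | component 3 |
|---|---|---|---|
| Constant | 4.929 | 7.548 | 2.803 |
| | (0.334) | (0.936) | (0.244) |
| π | 0.365 | 0.312 | 0.323 |
| | (0.176) | (0.117) | (0.107) |

| Cultivar | Freq. | % of total | Color (mean) |
|---|---|---|---|
| 1 | 59 | 33.15 | 5.528 |
| 2 | 71 | 39.89 | 3.086 |
| 3 | 48 | 26.97 | 7.396 |
| Total | 178 | 100 | 5.058 |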


Example 1: Color of Wine

Posterior probability (median)

| Cultivar | component 1 | component 2 | component 3 |
|---|---|---|---|
| 1 | 0.737 | 0.195 | 9.00e-5 |
| 2 | 0.048 | 0.023 | 0.923 |
| 3 | 0.030 | 0.970 | 7.54e-14 |


Example 2: Infant Birthweight and Prenatal Care

Data from the National Maternal and Infant Health Survey

Number of observations: 5219

Number of covariates: 12


Example 2: Infant Birthweight and Prenatal Care

[Figure: "Density of birthweight residuals". Kernel density estimate of Residuals (x-axis, -40 to 40) against Density (y-axis); kernel = epanechnikov, bandwidth = 1.1307.]


Example 2: Infant Birthweight and Prenatal Care

| Variable | OLS | FMM component 1 | FMM component 2 |
|---|---|---|---|
| black | -1.213** | -1.231** | -0.775* |
| | (0.312) | (0.215) | (0.393) |
| edu | 0.353** | 0.292** | 0.040 |
| | (0.074) | (0.050) | (0.102) |
| numdead | -1.181** | -0.170 | -0.585** |
| | (0.163) | (0.117) | (0.171) |
| onsethat | -0.501** | -0.294* | 0.006 |
| | (0.183) | (0.127) | (0.234) |
| π | | 0.864 | 0.136 |
| se(π) | | (0.005) | (0.005) |


Example 3: Medical Care Use

Data from the RAND Health Insurance Experiment

Number of observations: 20186

Number of covariates: 17


Example 3: Medical Care Use

[Figure: two histograms of the number of face-to-face MD visits (x-axis) against Percent (y-axis, 0 to 30); the first panel covers 0 to 80 visits, the second 0 to 20 visits.]


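Example 3: Medical Care Use

The density of the C-component finite mixture is specified as

$$f(y_i \mid \theta_1, \theta_2, \ldots, \theta_C; \pi_1, \pi_2, \ldots, \pi_C) = \sum_{j=1}^{C} \pi_j \frac{\Gamma(y_i + \psi_{j,i})}{\Gamma(\psi_{j,i}) \Gamma(y_i + 1)} \left( \frac{\psi_{j,i}}{\lambda_{j,i} + \psi_{j,i}} \right)^{\psi_{j,i}} \left( \frac{\lambda_{j,i}}{\lambda_{j,i} + \psi_{j,i}} \right)^{y_i}$$

where $\lambda = \exp(x\beta)$ and $\psi = (1/\alpha)\lambda^k$

k = 1: NB-1 (variance $\lambda + \alpha\lambda$, linear in the mean)

k = 0: NB-2 (variance $\lambda + \alpha\lambda^2$, quadratic in the mean)

NB-1 fits best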
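The role of k can be checked numerically. The sketch below codes a single negative binomial component with $\psi = (1/\alpha)\lambda^k$ and computes its mean and variance by direct summation. The values of $\lambda$ and $\alpha$ are hypothetical. Under this parameterization the mean is $\lambda$ and the variance is $\lambda + \lambda^2/\psi = \lambda + \alpha\lambda^{2-k}$, which is linear in the mean for k = 1 and quadratic for k = 0.

```python
import math

def negbin_pmf(y, lam, alpha, k):
    """Negative binomial density with psi = (1/alpha) * lam**k."""
    psi = (lam ** k) / alpha
    log_p = (math.lgamma(y + psi) - math.lgamma(psi) - math.lgamma(y + 1)
             + psi * math.log(psi / (lam + psi))
             + y * math.log(lam / (lam + psi)))
    return math.exp(log_p)

def moments(lam, alpha, k, y_max=2000):
    """Total mass, mean and variance by summing over y = 0..y_max."""
    probs = [negbin_pmf(y, lam, alpha, k) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

lam, alpha = 2.5, 0.8  # hypothetical values

for k in (0, 1):
    total, mean, var = moments(lam, alpha, k)
    implied = lam + alpha * lam ** (2 - k)
    print(f"k={k}  mass={total:.6f}  mean={mean:.4f}  var={var:.4f}  lam+alpha*lam^(2-k)={implied:.4f}")
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions for large counts.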


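Example 3: Medical Care Use

Parameter Estimates

| Parameter | nb1 | fmm nb1, component 1 | fmm nb1, component 2 |
|---|---|---|---|
| logc | -0.149* | -0.203* | -0.024 |
| | (0.012) | (0.020) | (0.031) |
| educdec | 0.023* | 0.027* | 0.015 |
| | (0.003) | (0.005) | (0.010) |
| disea | 0.021* | 0.019* | 0.033* |
| | (0.001) | (0.002) | (0.004) |
| π | | 0.802 | 0.198 |
| | | (0.037) | (0.037) |

| | nb1 | fmm nb1 |
|---|---|---|
| log L | -42405 | -42037 |
| BIC | 84999 | 84461 |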


Example 3: Medical Care Use

Marginal Effects

| | nb1, overall | fmm nb1, overall | fmm nb1, component 1 | fmm nb1, component 2 |
|---|---|---|---|---|
| $E(y \mid x)$ | 2.561 | 2.511 | 1.887 | 5.038 |
| logc | -0.382* | -0.331* | -0.382* | -0.121 |
| | (0.030) | (0.032) | (0.032) | (0.158) |
| educdec | 0.058* | 0.056* | 0.052* | 0.073 |
| | (0.007) | (0.009) | (0.008) | (0.053) |
| disea | 0.054* | 0.062* | 0.036* | 0.167* |
| | (0.003) | (0.004) | (0.004) | (0.024) |
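The overall figures for the mixture can be related to the component figures through the property that the overall mean and marginal effects are mixing-probability-weighted sums of the component values. The sketch below redoes that arithmetic with the reported estimates ($\pi$ = 0.802 and 0.198). Agreement is only expected up to rounding of the published numbers.

```python
pi = [0.802, 0.198]  # estimated mixing probabilities from the parameter table

# Component-level values from the marginal effects table: (component 1, component 2).
component = {
    "E(y|x)":  (1.887, 5.038),
    "logc":    (-0.382, -0.121),
    "educdec": (0.052, 0.073),
    "disea":   (0.036, 0.167),
}

# Overall values reported for the finite mixture model.
reported_overall = {"E(y|x)": 2.511, "logc": -0.331, "educdec": 0.056, "disea": 0.062}

# Overall = pi_1 * component 1 + pi_2 * component 2.
implied = {name: pi[0] * c1 + pi[1] * c2 for name, (c1, c2) in component.items()}

for name, value in implied.items():
    print(f"{name:8s} implied {value:7.3f}   reported {reported_overall[name]:7.3f}")
```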


Example 3: Medical Care Use

[Figure: "Prior and posterior probabilities". Densities of the prior and posterior probabilities; x-axis: probability (0 to 1), y-axis: density.]
