TV-slant presentatie_politicologen_etmaal



Page 1: TV-slant presentatie_politicologen_etmaal

Political slant in public broadcasting

9 June 2011, Politicologenetmaal, Amsterdam

Bart de Goede, Maarten Marx

Wednesday 8 June 2011

Page 2: TV-slant presentatie_politicologen_etmaal

Research aim

• Frivolous research for a bachelor thesis

• Research aim: apply the methodology of Gentzkow & Shapiro (2010) to the Dutch situation, and perhaps improve on it using NLP

• Future applications:

• Analysis of Dutch media landscape (NewsMonitor)

• Agenda-setting and framing research (Timmermans, Breeman)

• Parliament and media: lag or lead? (Vliegenthart)


Page 3: TV-slant presentatie_politicologen_etmaal

Disclaimer

• We are information scientists, not political scientists

• We might have made awful conceptual mistakes

• We will have missed almost all important references


Page 4: TV-slant presentatie_politicologen_etmaal

Disclaimer

• Our aim is to show a powerful technique

• We concentrated on getting the data ‘in shape’, rather than interpretation of results


Page 5: TV-slant presentatie_politicologen_etmaal

Talk outline

1. Research plan and methodology

2. Description of our research

3. Results

4. What’s next?


Page 6: TV-slant presentatie_politicologen_etmaal

Gentzkow & Shapiro

• Econometric research: compare the language of news outlets to political language

• ‘An economically significant demand for news slanted towards one’s own political ideology’

Gentzkow, M. and Shapiro, J. M. (2010). What drives media slant? Evidence from U.S. daily newspapers. Econometrica, 78(1):35–71.


Page 7: TV-slant presentatie_politicologen_etmaal

Gentzkow & Shapiro

• Find characteristic words and phrases of Democrats and Republicans in the Congressional Record, the U.S. counterpart of the Hansards (‘death tax’ versus ‘estate tax’)

• Count relative frequencies of these words in newspapers

• Score newspapers on ‘political slant’ by comparing frequencies of Democratic and Republican words

• ... (even more, but not relevant to us)

Operationalization
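
The scoring idea on this slide can be illustrated with a minimal Python sketch. It is a simplification and not the estimator from Gentzkow & Shapiro (2010): it only sums the relative frequencies of two marker lists in a text and reports the share that comes from the second list. The function names, the underscore-joined phrase tokens and the tiny token list are assumptions made for illustration.

```python
from collections import Counter


def relative_frequencies(tokens, phrases):
    """Relative frequency of each marker phrase among all tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {p: 0.0 for p in phrases}
    return {p: counts[p] / total for p in phrases}


def slant_score(tokens, democratic_phrases, republican_phrases):
    """Share of the marker-phrase frequency mass that is Republican.

    0.0 means only Democratic markers occur, 1.0 only Republican ones.
    Returns None when no marker phrase occurs at all.
    """
    dem = sum(relative_frequencies(tokens, democratic_phrases).values())
    rep = sum(relative_frequencies(tokens, republican_phrases).values())
    if dem + rep == 0:
        return None
    return rep / (dem + rep)


# Hypothetical newspaper text, already tokenised into phrases.
newspaper = ["death_tax", "estate_tax", "death_tax", "budget"]
score = slant_score(newspaper, {"estate_tax"}, {"death_tax"})
```

A ratio keeps the score independent of text length, which matters when outlets differ strongly in size.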


Page 8: TV-slant presentatie_politicologen_etmaal

Our research

• Dutch versus English: compound words, unigrams instead of bigrams

• Television data instead of newspapers

• Far more political parties

• Other, more powerful technique for finding characteristic words

Reproduce, with some alterations


Page 9: TV-slant presentatie_politicologen_etmaal

Our research

1. Collecting TV data

2. Selecting appropriate broadcasts

3. Defining political groups

4. Obtaining data for each group

5. Obtaining characteristic words

6. Compare word use in political groups and TV broadcasts

An outline


Page 10: TV-slant presentatie_politicologen_etmaal

TV Data

• Subtitles for the hearing impaired (http://tt888.nl)

• Complete data from January 2008 till February 2011

• Problem: hardly any useful metadata (for 63% only the date and time of broadcast are known)


Page 11: TV-slant presentatie_politicologen_etmaal

TV Data

• TV guide

• Used http://tv2day.nl to combine broadcast time with (unambiguous) program title

Solution

                           Before            After
Programme with title       16,995            32,491
Unique titles              4,560 -> 2,702    2,238
Single events              1,598             1,174
Broadcast frequency > 2    1,104             1,064
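
Combining broadcast times with TV guide titles amounts to a join on time. The Python sketch below shows one way this could work; it is not the authors' implementation. The record layout, the `channel` field, the five-minute tolerance and the function name `attach_titles` are all assumptions, and neither tt888.nl nor tv2day.nl is queried.

```python
from datetime import datetime, timedelta


def attach_titles(subtitle_records, guide_entries, tolerance=timedelta(minutes=5)):
    """Label each subtitle record with the guide title that starts closest in time.

    Both inputs are lists of dicts with hypothetical keys "channel" and "start";
    guide entries also carry "title". Records without a guide entry on the same
    channel within the tolerance get title None.
    """
    by_channel = {}
    for entry in guide_entries:
        by_channel.setdefault(entry["channel"], []).append(entry)

    labelled = []
    for record in subtitle_records:
        candidates = by_channel.get(record["channel"], [])
        best = min(
            candidates,
            key=lambda e: abs(e["start"] - record["start"]),
            default=None,
        )
        title = None
        if best is not None and abs(best["start"] - record["start"]) <= tolerance:
            title = best["title"]
        labelled.append({**record, "title": title})
    return labelled


# Invented example data.
subtitles = [
    {"channel": "NED1", "start": datetime(2010, 1, 4, 20, 0)},
    {"channel": "NED1", "start": datetime(2010, 1, 4, 23, 30)},
]
guide = [{"channel": "NED1", "start": datetime(2010, 1, 4, 20, 1), "title": "NOS Journaal"}]
labelled = attach_titles(subtitles, guide)
```

A tolerance window is used because broadcast and guide times rarely match to the second; its width would need tuning on real data.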


Page 12: TV-slant presentatie_politicologen_etmaal

Selected broadcasts

Pauw & Witteman: 895,935 words

Nova: 362,844 words

NOS Journaal: 12,609,620 words

NOS Jeugdjournaal: 1,383,728 words

Netwerk: 879,635 words

Goedemorgen Nederland: 760,658 words

EenVandaag: 1,556,642 words

DWDD: 1,626,929 words

Buitenhof, DWDD, EenVandaag, Goedemorgen Nederland, Het Elfde Uur, Holland Doc, Knevel en Van den Brink, Netwerk, Nieuwsuur, NOS Jeugdjournaal, NOS Journaal, Nova, Ochtendspits, Pauw & Witteman, PowNews, SchoolTV Weekjournaal, Sinterklaasjournaal, Tegenlicht, Uitgesproken, Vragenuurtje, Zembla


Page 13: TV-slant presentatie_politicologen_etmaal

Political groups

• Parliamentary period with the greatest overlap with the TV data set: Balkenende IV

• Experiments with e.g. Wordfish have shown that text comparisons mostly measure government - opposition, not left - right (Hirst et al., 2010)

Hirst, G., Riabinin, Y., Graham, J., and Boizot-Roche, M. (2010). Text to Ideology or Text to Party Status?


Page 14: TV-slant presentatie_politicologen_etmaal

Political groups

• Therefore, we choose:

• Government (CDA, PvdA and ChristenUnie)

• Left-wing opposition (GroenLinks, SP)

• Right-wing opposition (PVV, VVD)


Page 15: TV-slant presentatie_politicologen_etmaal

Obtaining Proceedings data

$collection//HAN1995//root[date restriction]//speech[@party matches(party names)]/p/text()

Trivial, using the PoliticalMashup database

Query explained: HAN1995 holds all Hansards since 1995
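
For readers without access to an XQuery database, the same selection can be sketched in Python with the standard library. This is an illustration only: the sample XML is invented, it merely mirrors the element names in the query (`root`, `speech`, `@party`, `p`), and the `date` attribute on `root` is an assumption about how the date restriction could be expressed. The actual PoliticalMashup schema may differ.

```python
import xml.etree.ElementTree as ET

# Invented sample document; real proceedings will be structured differently.
SAMPLE = """<root date="2008-03-04">
  <speech party="SP"><p>First paragraph.</p><p>Second paragraph.</p></speech>
  <speech party="VVD"><p>Another speaker.</p></speech>
</root>"""


def paragraphs_for_parties(xml_text, parties, start, end):
    """Return paragraph texts of speeches by the given parties.

    The document is skipped when its (assumed) ISO date attribute falls
    outside [start, end]; ISO dates compare correctly as strings.
    """
    root = ET.fromstring(xml_text)
    if not (start <= root.get("date", "") <= end):
        return []
    return [
        p.text or ""
        for speech in root.iter("speech")
        if speech.get("party") in parties
        for p in speech.findall("p")
    ]


# Illustrative date range and party selection.
left_opposition = paragraphs_for_parties(
    SAMPLE, {"SP", "GroenLinks"}, "2007-02-22", "2010-10-14"
)
```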

Page 16: TV-slant presentatie_politicologen_etmaal

Characteristic words

• Transform word frequency counts into probability distributions of words (maximum likelihood estimation)

• Compare distributions of subsets to distribution of all words

• Choose words from subset whose frequency is much higher than expected

• Adjust probabilities

• Iterate to convergence

Parsimonious language model

E-step: $e_t = tf(t,D) \cdot \frac{\lambda P(t|D)}{(1-\lambda) P(t|C) + \lambda P(t|D)}$

M-step: $P(t|D) = \frac{e_t}{\sum_t e_t}$
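
The iteration on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the mixing weight `lam=0.1`, the pruning threshold, the convergence tolerance and the function name are all chosen for the example, and both token lists are assumed to be non-empty.

```python
from collections import Counter


def parsimonious_model(doc_tokens, corpus_tokens, lam=0.1, prune=1e-4,
                       tol=1e-8, max_iter=200):
    """Estimate P(t|D) with the E-step / M-step pair of the parsimonious model.

    D is the subset (doc_tokens), C the full collection (corpus_tokens).
    Terms whose probability drops below `prune` are removed from the model.
    """
    tf = Counter(doc_tokens)
    corpus_tf = Counter(corpus_tokens)
    corpus_size = sum(corpus_tf.values())
    doc_size = sum(tf.values())

    p_corpus = {t: corpus_tf[t] / corpus_size for t in tf}   # P(t|C)
    p_doc = {t: n / doc_size for t, n in tf.items()}         # MLE start for P(t|D)

    for _ in range(max_iter):
        # E-step: expected count of t explained by the document model.
        e = {
            t: tf[t] * lam * p / ((1 - lam) * p_corpus[t] + lam * p)
            for t, p in p_doc.items()
        }
        # M-step: normalise, prune small probabilities, renormalise.
        norm = sum(e.values())
        kept = {t: v / norm for t, v in e.items() if v / norm >= prune}
        if not kept:
            break
        kept_norm = sum(kept.values())
        new = {t: p / kept_norm for t, p in kept.items()}

        change = max(abs(new.get(t, 0.0) - p) for t, p in p_doc.items())
        p_doc = new
        if change < tol:
            break
    return p_doc


# Invented toy data: "de" is common everywhere, "leraar" only in the subset.
subset = ["de"] * 10 + ["leraar"] * 5
collection = subset + ["de"] * 40 + ["politie"] * 5
model = parsimonious_model(subset, collection)
```

In the toy data the common word "de" loses probability mass in favour of "leraar", which is the effect the slide describes: words that the whole collection already explains are pushed out of the subset model.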


Page 17: TV-slant presentatie_politicologen_etmaal

Characteristic words

• Filter out (corpus specific) ‘stopwords’ (e.g. ‘voorzitter’)

• Remove noise (‘kopvoddentaks’ out, ‘sharia’ in)

Why take the trouble?


Page 18: TV-slant presentatie_politicologen_etmaal

In action

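Top 5 characteristic words

left (SP, GroenLinks)                   right (PVV, VVD)
leraar (teacher)                        politie (police)
student (student)                       crimineel (criminal)
kinderombudsman (children's ombudsman)  straf (punishment)
docent (lecturer)                       illegaal (illegal)
bonus (bonus)                           boete (fine)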


Page 19: TV-slant presentatie_politicologen_etmaal

In action

Source: http://politiekinzicht.com

Page 20: TV-slant presentatie_politicologen_etmaal

In action

Source: http://politiekinzicht.com

Page 21: TV-slant presentatie_politicologen_etmaal

In action


Page 22: TV-slant presentatie_politicologen_etmaal

In action


Page 23: TV-slant presentatie_politicologen_etmaal

In action


Page 24: TV-slant presentatie_politicologen_etmaal

In action


Page 25: TV-slant presentatie_politicologen_etmaal

Comparison

1. Find most characteristic words for each political group

2. For each political group, estimate the probability that an arbitrary word in a TV programme is one of its characteristic words

$\hat{P}(q|TV) = \sum_{t \in q} \frac{tf_{t,TV}}{|TV|}$
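
The estimate above is a sum of relative term frequencies, which the following Python sketch makes concrete. Here `q` is the set of characteristic words of one political group and the token list stands for a TV programme; the function name and the example sentence are invented for illustration.

```python
from collections import Counter


def group_probability(characteristic_words, tv_tokens):
    """Probability that an arbitrary TV token is one of the group's words.

    Implements the sum over t in q of tf(t, TV) / |TV|, where |TV| is the
    number of tokens in the programme. Returns 0.0 for an empty programme.
    """
    if not tv_tokens:
        return 0.0
    tf = Counter(tv_tokens)
    return sum(tf[t] for t in set(characteristic_words)) / len(tv_tokens)


# Invented subtitle fragment of seven tokens.
programme = "de politie gaf de student een boete".split()
p_right = group_probability({"politie", "boete"}, programme)
p_left = group_probability({"student", "leraar"}, programme)
```

Comparing `p_right` and `p_left` for the same programme gives the per-group values that the result plots show for a growing number of characteristic words.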


Page 26: TV-slant presentatie_politicologen_etmaal

Results

[Figure: DWDD. y-axis: estimated probability of words appearing (0 to 0.700); x-axis: n parsimonious derived words (50 to 3000, condensed values); series: gov, left, right.]


Page 27: TV-slant presentatie_politicologen_etmaal

Results

[Figure: PowNews. y-axis: estimated probability of words appearing (0 to 0.700); x-axis: n parsimonious derived words (50 to 3000, condensed values); series: gov, left, right.]


Page 28: TV-slant presentatie_politicologen_etmaal

Results

[Figure: News (Journaal, Ochtendspits, etc.). y-axis: estimated probability of words appearing (0 to 0.040); x-axis: n parsimonious derived words (50 to 250); series: CDA, ChristenUnie, D66, GroenLinks, PvdA, PvdD, PVV, SGP, SP, Verdonk, VVD.]


Page 29: TV-slant presentatie_politicologen_etmaal

Results

[Figure: Talk shows. y-axis: cumulative probability of words appearing (0 to 0.030); x-axis: n parsimonious derived words (50 to 250); series: CDA, ChristenUnie, D66, GroenLinks, PvdA, PvdD, PVV, SGP, SP, Verdonk, VVD.]


Page 30: TV-slant presentatie_politicologen_etmaal

‘Conclusions’

• Right never ‘wins’

• Possible explanations:

• TV = left church

• TV does not pick up right-wing slanted words

• Or: is the language used on TV simply no different from regular Dutch?


Page 31: TV-slant presentatie_politicologen_etmaal

What’s next?

• First, turn all this into a bachelor thesis (deadline in two weeks)

• Future:

• Team up with researcher(s) in political science and media analysis. Candidates?

• Try out more sophisticated NLP techniques

• ...

• Publish article


Page 32: TV-slant presentatie_politicologen_etmaal

Questions?

Slides available at http://www.politicalmashup.nl
