TV-slant presentatie_politicologen_etmaal

Post on 05-Dec-2014

Political slant in public broadcasting

9 June 2011, Politicologenetmaal, Amsterdam

Bart de Goede, Maarten Marx

Wednesday 8 June 2011

Research aim

• Frivolous research for a bachelor thesis

• Research aim: apply the methodology of Gentzkow & Shapiro (2010) to the Dutch situation, and perhaps improve it using NLP

• Future applications:

• Analysis of Dutch media landscape (NewsMonitor)

• Agenda-setting and framing research (Timmermans, Breeman)

• Parliament and media: lag or lead? (Vliegenthart)

Disclaimer

• We are information scientists, not political scientists

• We might have made awful conceptual mistakes

• We will have missed almost all important references

Disclaimer

• Our aim is to show a powerful technique

• We concentrated on getting the data ‘in shape’, rather than interpretation of results

Talk outline

1. Research plan and methodology

2. Description of our research

3. Results

4. What’s next?

Gentzkow & Shapiro

• Econometric research: compare the language of news outlets to political language

• ‘An economically significant demand for news slanted towards one’s own political ideology’

Gentzkow, M. and Shapiro, J. M. (2010). What drives media slant? Evidence from U.S. daily newspapers. Econometrica, 78(1):35–71.

Gentzkow & Shapiro

• Find characteristic words and phrases of Democrats and Republicans in the Congressional Record, the US counterpart of the Hansard (‘estate tax’ versus ‘death tax’)

• Count relative frequencies of these words in newspapers

• Score newspapers on ‘political slant’ by comparing frequencies of Democratic and Republican words

• ... (even more, but not relevant to us)

Operationalization
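The counting and scoring steps on this slide can be illustrated with a deliberately simplified sketch. It is not Gentzkow & Shapiro's actual estimator, which is more elaborate; the function names, the sample text and the scoring rule (the Republican share among all partisan-phrase hits) are assumptions made for illustration. Only the ‘death tax’ / ‘estate tax’ pair comes from the slides.

```python
import re

def relative_frequency(text, phrases):
    """Occurrences of the given phrases per token of text."""
    lowered = text.lower()
    n_tokens = len(re.findall(r"\w+", lowered))
    hits = sum(lowered.count(phrase) for phrase in phrases)
    return hits / n_tokens if n_tokens else 0.0

def slant_score(text, dem_phrases, rep_phrases):
    """Hypothetical score: 0 = only Democratic phrases, 1 = only Republican.

    Returns None when no partisan phrase occurs at all.
    """
    dem = relative_frequency(text, dem_phrases)
    rep = relative_frequency(text, rep_phrases)
    total = dem + rep
    return rep / total if total else None

# Illustrative phrase lists and text (assumptions, not real data).
DEM = ["estate tax"]
REP = ["death tax"]
article = ("Repealing the death tax is popular. "
           "Critics of the death tax repeal defend the estate tax.")
score = slant_score(article, DEM, REP)
print(score)
```

A newspaper that uses both phrases equally often would score 0.5 under this rule; the sample text leans towards the Republican phrase.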

Our research

• Dutch versus English: Dutch writes compounds as single words, so we use unigrams instead of bigrams

• Television data instead of newspapers

• Far more political parties

• Other, more powerful technique for finding characteristic words

Reproduce, with some alterations

Our research

1. Collecting TV data

2. Selecting appropriate broadcasts

3. Defining political groups

4. Obtaining data for each group

5. Obtaining characteristic words

6. Compare word use in political groups and TV broadcasts

An outline

TV Data

• Subtitles for the hearing impaired (http://tt888.nl)

• Complete data from January 2008 till February 2011

• Problem: hardly any useful metadata (63% of broadcasts have only the date and time of broadcast)

TV Data

• TV guide

• Used http://tv2day.nl to combine broadcast time with (unambiguous) program title

Solution

                           Before            After
Programmes with title      16,995            32,491
Unique titles              4,560 -> 2,702    2,238
Single events              1,598             1,174
Broadcast frequency > 2    1,104             1,064

Selected broadcasts

• Pauw & Witteman: 895,935 words

• Nova: 362,844 words

• NOS Journaal: 12,609,620 words

• NOS Jeugdjournaal: 1,383,728 words

• Netwerk: 879,635 words

• Goedemorgen Nederland: 760,658 words

• EenVandaag: 1,556,642 words

• DWDD: 1,626,929 words

All selected programmes: Buitenhof, DWDD, EenVandaag, Goedemorgen Nederland, Het Elfde Uur, Holland Doc, Knevel en Van den Brink, Netwerk, Nieuwsuur, NOS Jeugdjournaal, NOS Journaal, Nova, Ochtendspits, Pauw & Witteman, PowNews, SchoolTV Weekjournaal, Sinterklaasjournaal, Tegenlicht, Uitgesproken, Vragenuurtje, Zembla

Political groups

• Parliamentary period with the greatest overlap with the TV data set: Balkenende IV

• Experiments with e.g. Wordfish have shown that text comparisons mostly measure government - opposition, not left - right (Hirst et al., 2010)

Hirst, G., Riabinin, Y., Graham, J., and Boizot-Roche, M. (2010). Text to Ideology or Text to Party Status?

Political groups

• Therefore, we choose:

• Government (CDA, PvdA and ChristenUnie)

• Left wing opposition (GroenLinks, SP)

• Right wing opposition (PVV, VVD)

Obtaining Proceedings data

$collection//HAN1995//root[date restriction]//speech[@party matches(party names)]/p/text()

Trivial, using the PoliticalMashup database

Explain query: HAN1995 contains all Hansards since 1995
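For readers without access to the PoliticalMashup database, the selection expressed by the query can be mimicked on a toy document. This is only a sketch: the element and attribute names `root`, `speech`, `party` and `p` follow the query on the slide, but the `date` attribute, the sample content, the date bounds and the function name `paragraphs_for` are assumptions.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for the collection; structure beyond the names used
# in the slide's query is assumed.
SAMPLE = """
<HAN1995>
  <root date="2008-03-04">
    <speech party="SP"><p>De leraar verdient meer.</p></speech>
    <speech party="VVD"><p>Meer politie op straat.</p></speech>
  </root>
  <root date="2006-01-10">
    <speech party="SP"><p>Buiten de periode.</p></speech>
  </root>
</HAN1995>
"""

def paragraphs_for(xml_text, parties, start, end):
    """Return paragraph texts of speeches by the given parties within [start, end]."""
    collection = ET.fromstring(xml_text)
    texts = []
    for root in collection.iter("root"):
        # ISO-formatted dates compare correctly as plain strings.
        if not (start <= root.get("date", "") <= end):
            continue
        for speech in root.iter("speech"):
            if speech.get("party") in parties:
                texts.extend(p.text or "" for p in speech.findall("p"))
    return texts

# Illustrative date bounds for the parliamentary period.
left = paragraphs_for(SAMPLE, {"SP", "GroenLinks"}, "2007-02-22", "2010-10-14")
print(left)
```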

Characteristic words

• Transform word frequency counts into probability distributions of words (maximum likelihood estimation)

• Compare distributions of subsets to distribution of all words

• Choose words from subset whose frequency is much higher than expected

• Adjust probabilities

• Iterate to convergence

Parsimonious language model

E-step:

$$e_t = tf(t,D) \cdot \frac{\lambda P(t|D)}{(1-\lambda) P(t|C) + \lambda P(t|D)}$$

M-step:

$$P(t|D) = \frac{e_t}{\sum_{t} e_t}$$
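The two update equations can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the code used for the talk: the function name `parsimonious_lm`, the default `lam=0.1`, the stopping rule and the toy sentences are all made up, and the corpus is assumed to include the document so that P(t|C) is never zero for a document word.

```python
from collections import Counter

def parsimonious_lm(doc_tokens, corpus_tokens, lam=0.1, tol=1e-9, max_iter=1000):
    """Estimate P(t|D) by iterating the E-step and M-step until convergence."""
    tf = Counter(doc_tokens)
    if not tf:
        return {}
    corpus_tf = Counter(corpus_tokens)
    corpus_total = sum(corpus_tf.values())
    doc_total = sum(tf.values())
    # Background model P(t|C) and maximum likelihood start for P(t|D).
    p_c = {t: corpus_tf[t] / corpus_total for t in tf}
    p_d = {t: n / doc_total for t, n in tf.items()}
    for _ in range(max_iter):
        # E-step: expected count of t attributed to the document model.
        e = {t: tf[t] * lam * p_d[t] / ((1 - lam) * p_c[t] + lam * p_d[t])
             for t in tf}
        # M-step: renormalise the expected counts.
        total = sum(e.values())
        new_p_d = {t: e[t] / total for t in tf}
        change = max(abs(new_p_d[t] - p_d[t]) for t in tf)
        p_d = new_p_d
        if change < tol:
            break
    return p_d

# Toy data (assumption): one group's speech versus everything else.
doc = "voorzitter de leraar en de student voorzitter de leraar".split()
others = "voorzitter de politie en de boete voorzitter de de en".split()
model = parsimonious_lm(doc, doc + others)
top = sorted(model, key=model.get, reverse=True)[:2]
print(top)
```

In this toy run the probability mass moves away from words that are just as common in the whole corpus (‘de’, ‘voorzitter’) and towards the words that are specific to the subset.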

Characteristic words

• Filter out (corpus specific) ‘stopwords’ (e.g. ‘voorzitter’)

• Remove noise (‘kopvoddentaks’ out, ‘sharia’ in)

Why take the trouble?

In action

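Top 5 characteristic words

left (SP, GroenLinks): leraar (teacher), student (student), kinderombudsman (children's ombudsman), docent (lecturer), bonus (bonus)

right (PVV, VVD): politie (police), crimineel (criminal), straf (punishment), illegaal (illegal), boete (fine)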

In action

[Figure slides; images not preserved in this transcript. Source: http://politiekinzicht.com]

Comparison

1. Find most characteristic words for each political group

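2. For each political group, estimate the probability that an arbitrary word in a TV programme is one of their characteristic words

$$\hat{P}(q|TV) = \sum_{t \in q} \frac{tf(t,TV)}{|TV|}$$

where q is the set of characteristic words of a group, tf(t,TV) is the frequency of word t in the programme's subtitles, and |TV| is the total number of words in those subtitles.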
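The estimate above amounts to counting how many tokens of a programme fall in a group's word set. A minimal sketch, assuming whitespace-tokenised subtitles: the function name and the sample sentence are made up, and the word lists are the top-5 words from the ‘In action’ slide, assigned to groups according to our reading of that slide.

```python
from collections import Counter

def characteristic_word_probability(tv_tokens, characteristic_words):
    """Fraction of tokens in tv_tokens that belong to the set q of characteristic words."""
    if not tv_tokens:
        return 0.0
    tf = Counter(tv_tokens)
    # Sum tf(t, TV) over the distinct words t in q, then divide by |TV|.
    return sum(tf[t] for t in set(characteristic_words)) / len(tv_tokens)

# Word lists per group (group assignment is an assumption).
groups = {
    "left": ["leraar", "student", "kinderombudsman", "docent", "bonus"],
    "right": ["politie", "crimineel", "straf", "illegaal", "boete"],
}
# Made-up subtitle fragment, 7 tokens.
subtitles = "de politie gaf de student een boete".split()
scores = {name: characteristic_word_probability(subtitles, words)
          for name, words in groups.items()}
print(scores)
```

Here the fragment contains one ‘left’ word and two ‘right’ words out of seven tokens, so the right-wing estimate is twice the left-wing one.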

Results

[Figure: DWDD. y-axis: estimated probability of words appearing (0 to 0.700); x-axis: n parsimonious derived words (50 to 3000, condensed values); series: gov, left, right]

Results

[Figure: PowNews. y-axis: estimated probability of words appearing (0 to 0.700); x-axis: n parsimonious derived words (50 to 3000, condensed values); series: gov, left, right]

Results

[Figure: News (Journaal, Ochtendspits, etc.). y-axis: estimated probability of words appearing (0 to 0.040); x-axis: n parsimonious derived words (50 to 250); series: CDA, ChristenUnie, D66, GroenLinks, PvdA, PvdD, PVV, SGP, SP, Verdonk, VVD]

Results

[Figure: Talkshows. y-axis: cumulative probability of words appearing (0 to 0.030); x-axis: n parsimonious derived words (50 to 250); series: CDA, ChristenUnie, D66, GroenLinks, PvdA, PvdD, PVV, SGP, SP, Verdonk, VVD]

‘Conclusions’

• Right never ‘wins’

• Possible explanations:

• TV = ‘left church’ (Dutch ‘linkse kerk’: a left-leaning establishment)

• TV does not pick up right-wing slanted words

• Or: does language use on TV simply not differ from regular Dutch?

What’s next?

• First, turn all this into a bachelor thesis (deadline in two weeks)

• Future:

• Team up with researcher(s) in political science and media analysis. Candidates?

• Try out more sophisticated NLP techniques

• ...

• Publish article

Questions?

Slides available at http://www.politicalmashup.nl
