Semantic Role Labeling (의미역 결정) · leeck/NLP/SRL.pdf · 2016. 8. 31. · Structure Parser...
Semantic Role Labeling (Korean: 의미역 결정)
Natural Language Processing
Stages of Natural Language Analysis
Natural language sentence → Morphological Analysis → Syntax Analysis → Semantic Analysis → Pragmatic Analysis → Analysis result
Semantic Role Labeling: Definition
SRL: Definition – cont’d
SRL: Properties
SRL: Applications
SRL in Information Extraction (Surdeanu et al., 2003)
PropBank (Palmer et al., 2005)
• Annotation of all verbal predicates in WSJ (Penn Treebank)
• http://verbs.colorado.edu/~mpalmer/projects/ace.html
• Adds a semantic layer to the syntactic trees
PropBank – cont’d
• Theory-neutral numeric core roles (Arg0, Arg1, etc.)
– Interpretation of roles: verb-specific framesets
– Arg0 and Arg1 usually correspond to prototypical Agent and Patient/Theme roles
– Other arguments do not consistently generalize across verbs
– Different senses have different framesets
– Syntactic alternations that preserve meaning are kept together in a single frameset
• Closed set of 13 general labels for adjuncts (e.g., Temporal, Manner, Location)
PropBank: Frame Files
• sell.01: commerce: seller
– Arg0=“seller” (agent); Arg1=“thing sold” (theme); Arg2=“buyer” (recipient); Arg3=“price paid”; Arg4=“benefactive”
– [Al Brownstein]Arg0 sold [it]Arg1 [for $60 a bottle]Arg3
• sell.02: give up
– Arg0=“entity selling out”
– [John]Arg0 sold out
• sell.03: sell until none is/are left
– Arg0=“seller”; Arg1=“thing sold”; ...
– [The new Harry Potter]Arg1 sold out [within 20 minutes]ArgM-TMP
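As a rough illustration of how a frame file ties verb-specific meanings to the numbered roles, the sketch below stores the sell.01 and sell.02 framesets from the slide in a small Python structure and looks up what each label means for a bracketed example. The `Frameset` class and the `describe` helper are hypothetical names chosen for this sketch, not part of PropBank or any PropBank tooling. sell.03 is left out because its role list is elided on the slide.

```python
from dataclasses import dataclass


@dataclass
class Frameset:
    """One verb sense with its verb-specific role descriptions (hypothetical container)."""
    frame_id: str
    gloss: str
    roles: dict  # role label -> description, copied from the slide


# Role inventories exactly as listed on the slide.
SELL_FRAMESETS = {
    "sell.01": Frameset(
        "sell.01",
        "commerce: seller",
        {
            "Arg0": "seller",
            "Arg1": "thing sold",
            "Arg2": "buyer",
            "Arg3": "price paid",
            "Arg4": "benefactive",
        },
    ),
    "sell.02": Frameset("sell.02", "give up", {"Arg0": "entity selling out"}),
}


def describe(frame_id, labeled_spans):
    """Attach the frameset's role description to each (text, label) pair."""
    frameset = SELL_FRAMESETS[frame_id]
    return [
        (text, label, frameset.roles.get(label, "not a core role of this frameset"))
        for text, label in labeled_spans
    ]


example = describe(
    "sell.01",
    [("Al Brownstein", "Arg0"), ("it", "Arg1"), ("for $60 a bottle", "Arg3")],
)
for text, label, meaning in example:
    print(f"[{text}]{label} = {meaning}")
```

The same label (for example Arg0) maps to a different description in each frameset, which is the sense in which the numbered roles are theory neutral but verb specific.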
PropBank: Annotation
• Numbered arguments: the core arguments are labeled by numbers
– A0: agent, experiencer
– A1: patient, theme
– A2: instrument, benefactive, attribute, end state
– A3: start point, benefactive, instrument, attribute
– A4: end point
– A5 and AA also exist.
• Adjuncts (AM-): arguments that any verb can optionally take
– AM-ADV: general purpose
– AM-MOD: modal verb
– AM-CAU: cause
– AM-NEG: negation marker
– AM-DIR: direction
– AM-PNC: purpose
– AM-DIS: discourse marker
– AM-PRD: predication
– AM-LOC: location
– AM-TMP: time (temporal)
– AM-MNR: manner
* Modal verbs: verbs such as can, may, will (i.e., auxiliary verbs).
* Discourse marker: an expression such as "well" or "on the other hand" that signals a shift in the flow of conversation in spoken language.
• References (R-): arguments realized in another part of the sentence (written as, e.g., R-A0)
• Verbs (V): the verb
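The label scheme above (numbered core arguments, AM- adjuncts, the R- reference prefix, and V) is regular enough to check mechanically. The following is a minimal sketch, assuming only the labels listed on this slide are valid. The function name `parse_label` and the returned tuple layout are choices made for this illustration.

```python
# Label inventories copied from the slide; a full PropBank release may define more.
CORE_LABELS = {"A0", "A1", "A2", "A3", "A4", "A5", "AA"}
ADJUNCT_TYPES = {
    "ADV", "MOD", "CAU", "NEG", "DIR", "PNC", "DIS", "PRD", "LOC", "TMP", "MNR",
}


def parse_label(label):
    """Split a label into (kind, base_label, is_reference).

    kind is one of "core", "adjunct", or "verb".
    """
    is_reference = label.startswith("R-")
    base = label[2:] if is_reference else label
    if base == "V" and not is_reference:
        kind = "verb"
    elif base in CORE_LABELS:
        kind = "core"
    elif base.startswith("AM-") and base[3:] in ADJUNCT_TYPES:
        kind = "adjunct"
    else:
        raise ValueError(f"unknown label: {label}")
    return kind, base, is_reference


for label in ["A0", "AM-TMP", "R-A0", "V"]:
    print(label, parse_label(label))
```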
PropBank: Main Characteristics
• Representative sample of text
– but: limited genre of WSJ text
• Non-situation-specific labels
– but: core labels do not (completely) generalize across verbs
• Has become the primary resource for research in SRL
NomBank (Meyers et al., 2004)
• NomBank Project: http://nlp.cs.nyu.edu/meyers/NomBank.html
• Annotation of the nominal predicates in WSJ–PennTreeBank
– IBM appointed John
– John was appointed by IBM
– IBM’s appointment of John
– The appointment of John by IBM
– John is the current IBM appointee
• Annotation similar to PropBank
– [Her]Arg0 gift of [a book]Arg1 [to John]Arg2
Other Languages
• Chinese PropBank
– http://verbs.colorado.edu/chinese/cpb/
• Korean PropBank
– http://www.ldc.upenn.edu/
• AnCora corpus: Spanish and Catalan
– http://clic.ub.edu/ancora/
• Prague Dependency Treebank: Czech
– http://ufal.mff.cuni.cz/pdt2.0/
• Penn Arabic TreeBank: Arabic
– http://www.ircs.upenn.edu/arabic/
Other Extensions
• CoNLL-2004/2005: based on phrase structure grammar
– Uses PropBank
• CoNLL-2008 shared task: joint representation for syntactic and semantic dependencies
– Based on dependency structure
– PropBank + NomBank
– http://www.yr-bcn.es/conll2008/
• CoNLL-2009 shared task: extension to multiple languages (Catalan, Chinese, Czech, English, German, Japanese, Spanish)
– http://ufal.mff.cuni.cz/conll2009-st/
CoNLL-2008 shared task
• Dependency-based SRL
• PropBank + NomBank
CoNLL-2008 shared task: Results
• Best results (Johansson & Nugues, 2008):
– WSJ: LAS=90.13; F1=81.75; Overall: 85.95
– Brown: LAS=82.81; F1=69.06; Overall: 75.95
• Mostly pipeline architectures
– Best systems are still pipelined (syntax, then semantics)
• Comparison to CoNLL-2005:
– Results on the dependency representation are slightly better than those on constituents
• Fair post-competition comparison by Johansson (2008)
SRL Example
Input sentence: No one wants stock on their books.
SRL System:
1. predicate identification (PI)
2. predicate classification (PC)
3. argument identification (AI)
4. argument classification (AC)
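The four stages (PI, PC, AI, AC) can be read as a chain of functions, each consuming the output of the previous one. The skeleton below is only a sketch of that decomposition: the two lookup tables are hypothetical stand-ins for trained classifiers, and the sense and role labels they contain (want.01, A0, A1) are assumed for illustration rather than taken from the slide.

```python
# Hypothetical toy resources; a real system would use classifiers trained on
# PropBank/NomBank instead of lookup tables.
TOY_PREDICATE_SENSES = {"wants": "want.01"}
TOY_ARGUMENT_LABELS = {("want.01", "one"): "A0", ("want.01", "stock"): "A1"}


def identify_predicates(tokens):
    """PI: indices of tokens that act as predicates."""
    return [i for i, tok in enumerate(tokens) if tok in TOY_PREDICATE_SENSES]


def classify_predicate(token):
    """PC: choose the frameset (sense) of a predicate token."""
    return TOY_PREDICATE_SENSES[token]


def identify_arguments(tokens, sense):
    """AI: indices of tokens that are arguments of the given predicate sense."""
    return [i for i, tok in enumerate(tokens) if (sense, tok) in TOY_ARGUMENT_LABELS]


def classify_argument(sense, token):
    """AC: assign a role label to an identified argument."""
    return TOY_ARGUMENT_LABELS[(sense, token)]


def label_sentence(tokens):
    """Run PI -> PC -> AI -> AC and collect one analysis per predicate."""
    analyses = []
    for p in identify_predicates(tokens):
        sense = classify_predicate(tokens[p])
        args = {
            tokens[a]: classify_argument(sense, tokens[a])
            for a in identify_arguments(tokens, sense)
        }
        analyses.append((sense, args))
    return analyses


tokens = "No one wants stock on their books .".split()
analysis = label_sentence(tokens)
print(analysis)
```

Because each stage feeds the next, an error made in predicate identification cannot be recovered later, which is one reason stage-wise accuracies are usually reported separately.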
SRL Characteristics
• Domain dependent
– F1 drops by ~10% when the domain changes
• Dependent on parser performance
– Gold vs. automatic parses: ~90% vs. ~80% F1
• SRL based on a dependency parser performs better than SRL based on a phrase structure parser
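Korean SRL: Existing Technology
• SRL system based on the Korean PropBank
– Based on dependency parsing
• Phrase structure → dependency structure conversion
– 4882 sentences (train=4096, test=786)
• Based on Structural SVM (2 stages)
– PIC (predicate identification and classification) and AIC (argument identification and classification) problems → sequence labeling problem
– PIC: PI=98.88%, PIC=96.44%
– AIC: AIC=74.27% (Precision=81.33%, Recall=68.34%)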
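The reported AIC score can be checked against its precision and recall. Assuming the headline AIC figure is the F1 measure (the harmonic mean of precision and recall), which the slide does not state explicitly, the sketch below recomputes it and also confirms that the train and test counts add up to the stated corpus size.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


aic_f1 = f1_score(81.33, 68.34)
print(f"AIC F1 = {aic_f1:.2f}")  # prints "AIC F1 = 74.27"

# Train/test split from the slide should sum to the stated total.
print(4096 + 786 == 4882)  # prints "True"
```

The recomputed value matches the 74.27% on the slide, which supports reading that figure as F1.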
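Korean SRL Example
• Input: 그는 르노가 3월말까지 인수제의 시한을 갖고 있다고 덧붙였다. (roughly: "He added that Renault has until the end of March as the deadline for the takeover offer.")
• Morphological analysis / POS tagging → dependency parsing
• PIC
– 그는 르노가 3월말까지 인수제의 시한을 [갖고]갖.1 있다고 [덧붙였다.]덧붙.1
• AIC
– 그는 [르노가]ARG0 [3월말까지]ARGM-TMP 인수제의 [시한을]ARG1 [갖고]갖.1 [있다고]AUX 덧붙였다.
– [그는]ARG0 르노가 3월말까지 인수제의 시한을 갖고 [있다고]ARG1 [덧붙였다.]덧붙.1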
Korean SRL Example
[Figure panels: Dependency Parser; Semantic Role Labeling]
Frame File Example
• 갖.01
– English definition = have
– Role set
  • ARG0: agent
  • ARG1: possession
– Mapping
  • Rel = 갖다
  • Src = sbj, Trg = ARG0
  • Src = obj, Trg = ARG1
• 덧붙.01
– English definition = add
– Role set
  • ARG0: adder
  • ARG1: thing added
  • ARG2: added to
– Mapping1
  • Rel = 덧붙이다
  • Src = sbj, Trg = ARG0
  • Src = obj, Trg = ARG1
  • Src = comp, Trg = ARG2
– Mapping2
  • Rel = 덧붙이다
  • Src = sbj, Trg = ARG0
  • Src = s-comp, Trg = ARG1
  • Src = np-comp, Trg = ARG2
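To make the Src/Trg tables concrete, the sketch below applies them to dependents of a predicate: each dependent's syntactic relation (Src) is looked up and replaced by the role it maps to (Trg). This only illustrates how a frame file's mapping could be read. It is not the Structural SVM system described in these slides, and the dependency relations attached to the example words (including the `adv` relation) are assumed for the illustration.

```python
# Src -> Trg tables copied from the frame file slide.
FRAME_MAPPINGS = {
    "갖.01": [{"sbj": "ARG0", "obj": "ARG1"}],
    "덧붙.01": [
        {"sbj": "ARG0", "obj": "ARG1", "comp": "ARG2"},
        {"sbj": "ARG0", "s-comp": "ARG1", "np-comp": "ARG2"},
    ],
}


def map_arguments(frame_id, dependents):
    """Map (word, dependency_relation) pairs to roles.

    When a frame has several mappings, keep the one that covers the most
    dependents (a simple heuristic chosen for this sketch).
    """
    best = {}
    for mapping in FRAME_MAPPINGS[frame_id]:
        labeled = {word: mapping[rel] for word, rel in dependents if rel in mapping}
        if len(labeled) > len(best):
            best = labeled
    return best


# Dependency relations below are assumed, not taken from a real parser output.
have_args = map_arguments(
    "갖.01", [("르노가", "sbj"), ("3월말까지", "adv"), ("시한을", "obj")]
)
add_args = map_arguments("덧붙.01", [("그는", "sbj"), ("있다고", "s-comp")])
print(have_args)
print(add_args)
```

Under these assumed relations the core roles agree with the bracketed AIC example (르노가 and 시한을 for 갖, 그는 and 있다고 for 덧붙). Adjuncts such as ARGM-TMP are not covered by the mapping tables and would need a separate decision.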
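Sequence Labeling Problem
\[ \vec{s} = s_1, s_2, \ldots, s_n \qquad \vec{o} = o_1, o_2, \ldots, o_n \]
[Figure: graphical models of HMM, MEMM, and CRF over states S_{t-1}, S_t, S_{t+1} and observations O_{t-1}, O_t, O_{t+1}]
HMM
\[ P(\vec{s}, \vec{o}) \propto \prod_{t=1}^{|\vec{o}|} P(s_t \mid s_{t-1}) \, P(o_t \mid s_t) \]
MEMM
\[ P(\vec{s} \mid \vec{o}) \propto \prod_{t=1}^{|\vec{o}|} P(s_t \mid s_{t-1}, o_t) \propto \prod_{t=1}^{|\vec{o}|} \frac{1}{Z_{s_{t-1}, o_t}} \exp\left( \sum_j \lambda_j f_j(s_t, s_{t-1}) + \sum_k \mu_k g_k(s_t, x_t) \right) \]
CRF
\[ P(\vec{s} \mid \vec{o}) \propto \frac{1}{Z_{\vec{o}}} \prod_{t=1}^{|\vec{o}|} \exp\left( \sum_j \lambda_j f_j(s_t, s_{t-1}) + \sum_k \mu_k g_k(s_t, x_t) \right) \]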
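As a small worked instance of the HMM factorization, the sketch below multiplies one transition term P(s_t | s_{t-1}) and one emission term P(o_t | s_t) per position. The probability tables, the start symbol `<s>`, and the two-token example are all made-up values for illustration. The MEMM and CRF differ in that they score the same transition and observation features with weighted feature functions and normalize per position (MEMM) or over the whole sequence (CRF).

```python
def hmm_joint_probability(states, observations, transition, emission, start="<s>"):
    """Compute prod_t P(s_t | s_{t-1}) * P(o_t | s_t), with s_0 = start."""
    prob = 1.0
    prev = start
    for state, obs in zip(states, observations):
        prob *= transition[(prev, state)] * emission[(state, obs)]
        prev = state
    return prob


# Hypothetical toy parameters, not estimated from any corpus.
transition = {("<s>", "ARG0"): 0.5, ("ARG0", "O"): 0.4}
emission = {("ARG0", "he"): 0.2, ("O", "added"): 0.1}

p = hmm_joint_probability(["ARG0", "O"], ["he", "added"], transition, emission)
print(p)  # 0.5 * 0.2 * 0.4 * 0.1, approximately 0.004
```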
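Structural SVM
Syntactic parsing example [figure: input x, output y]
Named entity recognition example
x: 한나라당/nn 조해진/nn 대변인/nc 은/jc … (Grand National Party / Cho Hae-jin / spokesperson / topic particle)
y: B-Org, B-Per, O, O … (a label chain, with each label linked to the token below it)
\[
\Psi(x, y) =
\begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \\ 0 \\ 1 \end{pmatrix}
\begin{matrix}
\text{한나라당} \to \text{B-Org} \\
\text{nc} \to \text{O} \\
\text{nn} \to \text{B-Per} \\
\text{jc} \to \text{B-Org} \\
\text{nc} \to \text{B-Org} \\
\text{nn} \to \text{B-Org}
\end{matrix}
\]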
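The joint feature map Ψ(x, y) in the named entity example can be sketched as a count of (observation → label) pairs over the sentence. The version below counts word-to-label and POS-to-label features only. That feature set is an assumption made for this illustration, and a sequence model would normally also include label-to-label transition features.

```python
from collections import Counter


def joint_feature_map(tokens, labels):
    """Count word->label and POS->label co-occurrences for one (x, y) pair."""
    features = Counter()
    for (word, pos), label in zip(tokens, labels):
        features[f"{word}→{label}"] += 1
        features[f"{pos}→{label}"] += 1
    return features


# x and y from the named entity slide.
tokens = [("한나라당", "nn"), ("조해진", "nn"), ("대변인", "nc"), ("은", "jc")]
labels = ["B-Org", "B-Per", "O", "O"]
psi = joint_feature_map(tokens, labels)

for name in ["한나라당→B-Org", "nc→O", "nn→B-Per", "jc→B-Org", "nc→B-Org", "nn→B-Org"]:
    print(name, psi[name])
```

A Structural SVM scores a candidate labeling as the dot product of a weight vector with this feature vector and predicts the highest-scoring y. Features that never fire for the pair, such as jc→B-Org here, simply contribute 0.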