Posted on 31-Mar-2015
WP4-22. Final Evaluation of Subtitle Generator
Vincent Vandeghinste, Pan Yi
CCL – KULeuven
Example
Transcript:
Het meest spectaculaire aan de daadwerkelijke start van de euro is dat er eigenlijk niets spectaculairs te melden valt.
(The most spectacular thing about the actual start of the euro is that there is really nothing spectacular to report.)
Subtitle:
Het meest spectaculaire aan de start van de euro was dat er niets spectaculairs te melden valt.
(The most spectacular thing about the start of the euro was that there is nothing spectacular to report.)
Flow
Availability Calculator
• Pronunciation time of input sentence => estimate nr of characters available in subtitle
• If unknown, estimate it by:
– counting the nr of syllables
– the average speaking rate for Dutch
Syllable Counter
• Rule-based
• Evaluated on CGN-lexicon combined with FREQ-lists
• Estimated nr of syllables compared with the nr of syllables in the phonetic transcripts
• 99.63% of all words in CGN are correctly estimated
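The ruleset itself is not shown in these slides; as a minimal sketch of the idea, a Dutch syllable count can be approximated by counting maximal groups of adjacent vowel letters. The regex and function name below are illustrative assumptions, and the real rules that reach 99.63% on CGN are more elaborate:

```python
import re

# Sketch of a rule-based syllable estimator for Dutch: each maximal group
# of adjacent vowel letters counts as one syllable nucleus. This is a
# simplification of the actual ruleset evaluated on the CGN lexicon.
VOWEL_GROUP = re.compile(r"[aeiouyáéíóúàèäëïöü]+", re.IGNORECASE)

def count_syllables(word):
    """Estimate the nr of syllables in a Dutch word."""
    return max(1, len(VOWEL_GROUP.findall(word)))
```

Dutch diphthongs such as "eu" and "ui" fall out correctly because they form one vowel group; a full ruleset would also need to handle hiatus (e.g. "ië"), loanwords, and silent letters.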
Average Syllable Duration
ASD            No pauses   Pauses included
Literature     177 ms      –
All CGN files  186 ms      237 ms
One speaker    185 ms      239 ms
Read-aloud     188 ms      256 ms
Availability Calculator
• When pronunciation time not given: estimate it
• Subtitles: 70 chars / 6 sec = 11.67 chars/sec
• If nr of chars > nr of available chars => compress sentence
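The calculation above can be sketched as follows. The function and constant names are assumptions, not taken from the actual system; only the numbers (185 ms/syl from the ASD table's "one speaker" row, and the 70 chars / 6 sec subtitle norm) come from the slides:

```python
# Sketch of the availability calculator, under the assumptions stated above.
ASD_MS = 185            # average syllable duration, "one speaker" estimate
CHARS_PER_SEC = 70 / 6  # subtitle norm: 70 chars / 6 sec = 11.67 chars/sec

def available_chars(pronunciation_time_s=None, n_syllables=None):
    """Nr of subtitle characters available for an utterance."""
    if pronunciation_time_s is None:
        # Pronunciation time unknown: estimate it from the syllable count.
        pronunciation_time_s = n_syllables * ASD_MS / 1000.0
    return int(pronunciation_time_s * CHARS_PER_SEC)

def needs_compression(sentence, **kwargs):
    """True if the sentence must be compressed to fit its subtitle."""
    return len(sentence) > available_chars(**kwargs)
```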
Sentence Compressor
• Parallel Corpus
• Sentence Analysis
• Sentence Compression
• Evaluation
Parallel Corpus
• Sentence aligned
• Source & target corpus:
– Tagging
– Chunking
– SSUB detection
• Chunk alignment
Chunk Alignment
Every 4-gram from the source chunk is compared with every 4-gram from the target chunk
A = (m / (m + n)) · (L1 + L2) / 2
If A > 0.315, then align the chunks
F-value for NP/PP-alignment is 95%
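One possible reading of the score, sketched in code. The interpretation of the symbols is an assumption here: m as the nr of 4-grams shared by the two chunks, n as the nr of 4-grams occurring in only one of them, and L1, L2 as the matching fractions of the source and target chunk respectively:

```python
# Sketch of the chunk-alignment score A = (m / (m + n)) * (L1 + L2) / 2,
# under the symbol interpretation stated in the lead-in (an assumption).
THRESHOLD = 0.315

def four_grams(chunk):
    """All character 4-grams of a chunk, as a set."""
    return {chunk[i:i + 4] for i in range(len(chunk) - 3)}

def alignment_score(src_chunk, tgt_chunk):
    src, tgt = four_grams(src_chunk), four_grams(tgt_chunk)
    m = len(src & tgt)          # 4-grams shared by both chunks
    n = len(src ^ tgt)          # 4-grams in exactly one chunk
    if not src or not tgt or m + n == 0:
        return 0.0
    l1, l2 = m / len(src), m / len(tgt)
    return (m / (m + n)) * (l1 + l2) / 2

def align(src_chunk, tgt_chunk):
    return alignment_score(src_chunk, tgt_chunk) > THRESHOLD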
Sentence Analysis
• Tagging (TnT): accuracy = 96.2% (Oostdijk et al., 2002)
• Chunking
Chunk Type Prec. Recall F-value
NP 94.36% 93.91% 94.13%
PP 94.84% 95.22% 95.03%
Sentence Analysis (2)
• SSUB detection
Type of S Prec. Recall F-value
OTI 71.43% 65.22% 68.18%
RELP 69.66% 68.89% 69.27%
SSUB 56.83% 60.77% 58.74%
Sentence Compression
• Use of statistics
• Use of rules
• Word reduction
• Selection of the Compressed Sentence
Use of statistics
Use of rules
• To avoid generating ungrammatical sentences
• Rules of the type: "For every NP, never remove the head noun"
• Rules are applied recursively
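A minimal sketch of rule-constrained generation of alternatives. The chunk representation and the single rule shown (never remove a head) are illustrative assumptions; the real system applies a larger ruleset recursively:

```python
from itertools import combinations

# Sketch: generate every compression alternative that keeps all head
# chunks, enumerated from fewest removals to most. The (text, is_head)
# representation is an illustrative assumption.
def alternatives(chunks):
    """chunks: list of (text, is_head) pairs."""
    optional = [i for i, (_, is_head) in enumerate(chunks) if not is_head]
    variants = []
    for k in range(len(optional) + 1):
        for dropped in combinations(optional, k):
            variants.append(" ".join(
                text for i, (text, _) in enumerate(chunks) if i not in dropped))
    return variants
```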
Word Reduction
• Example: replace gevangenisstraf ('prison sentence') by straf ('sentence')
• Counterexample: replace voetbal ('football') by bal ('ball')
• Makes use of the Wordbuilding module (WP2)
• Introduces a lot of errors: added accuracy?
• Better integration with the rest of the system should be possible
Selection of the Compressed Sentence
• All previous steps result in an ordered list of sentence alternatives
– Supposedly grammatically correct
– Sentences are ordered by their probability
– The first sentence (most probable) with a length smaller than the available nr of chars is chosen
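The selection step above can be sketched as follows (the function name and the (sentence, probability) pair representation are illustrative assumptions):

```python
# Sketch of the selection step: pick the most probable alternative that
# fits the available nr of characters.
def select_subtitle(alternatives, max_chars):
    """alternatives: list of (sentence, probability) pairs."""
    ranked = sorted(alternatives, key=lambda pair: pair[1], reverse=True)
    for sentence, _prob in ranked:
        if len(sentence) <= max_chars:
            return sentence
    return None  # the no-output case: nothing fits
```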
Evaluation
Condition             A            B            C
ASD                   185 ms/syl   192 ms/syl   256 ms/syl
No output             44.33%       41.67%       15.67%
Reduction rate        39.93%       37.65%       16.93%
Interrater agreement  86.2%        86.9%        91.7%
Accurate              4.8%         8.0%         28.9%
± accurate            28.1%        26.3%        22.1%
Reasonable            32.9%        34.3%        51.0%
Subtitle Layout Generator
Actieve of gewezen voetballers
zoals Ruud Gullit of Dennis
Bergkamp moeten het stellen met
nauwelijks anderhalf miljard .

becomes

Actieve of gewezen voetballers
zoals Ruud Gullit of
Dennis Bergkamp moeten het stellen
met nauwelijks anderhalf miljard .

(Active or former football players such as Ruud Gullit or Dennis Bergkamp have to make do with barely one and a half billion.)
Conclusion
• System approach works very well:
– If sentence analysis is correct
– If there are possible reductions (according to the ruleset)
• A lot of no-output cases: the system cannot reduce the sentence
– Sentence cannot be reduced (even by humans)
– Rule-set is too strict / wrong sentence analysis
– Statistical info is not fine-grained enough
• Bad output:
– Wrong sentence analysis (CONJ)
– Wrong word reductions
Future
• Near future (within Atranos)
– Better integration of word reduction
– Combine the advantages of the CNTS approach and the CCL approach into one approach
• Far future (outside Atranos)
– Better sentence analysis: a full parse is needed
– More fine-grained analysis of the parallel corpus