
Transcript of ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

Page 1: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

Statistical Automatic Post Editing

Santanu Pal * The work was carried out at Translated


Page 2: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  Introduction
§  Motivations
§  System Description
    §  Preprocessing
    §  Improved Word Alignment
    §  Hierarchical PB-SMT
    §  Advantage of using Hierarchical PB-SMT
§  Experiments
§  Evaluations
§  Conclusions

Outline


Page 3: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  The translations provided by current MT systems often fail to be perfect

§  To achieve output of sufficient quality, translations often need to be corrected or post-edited by human translators

§  "Post-editing" (PE) is defined as the correction performed by humans on translations produced by an MT system (Veale and Way, 1997).

§  PE is often described as the process of improving a translation provided by an MT system with a minimum of manual labour (TAUS report, 2010).

Introduction


Page 4: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  The major goals of using an APE system are:
    §  Post-editing MT output instead of translating from scratch
    §  Saving time
    §  Reducing cost
    §  Reducing the effort of the human post-editor

§  In some cases, recent studies have even shown that:
    §  the quality of MT plus PE can exceed the quality of human translation (Fiederer and O'Brien, 2009; Koehn, 2009; DePalma and Kelly, 2009)
    §  MT plus PE can increase productivity (Zampieri and Vela, 2014)

Introduction (contd.)

Page 5: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  There have been many studies on the impact of various factors and methods in PE
    §  These were examined in terms of the volume of PE effort,
    §  rather than in a commercial work environment.

§  The overall purpose of the present study is to answer two fundamental questions:
    §  What would be the optimal design of a PE system, which is ultimately determined by the quality of MT output in a commercial work environment?
    §  How can human involvement be optimized to reduce post-editing effort in a commercial work environment?

Introduction (contd.)

Page 6: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  The advantage of an APE system is that it can take the output of any black-box MT engine as input and produce automatically post-edited output, without having to retrain or re-implement the first-stage MT engine.

§  PB-SMT (Koehn et al., 2003) can be applied as an APE system (Simard et al., 2007).
    §  The SMT system is trained on RBMT output and reference human translations.
    §  This PB-SMT based APE system is able to correct the systematic errors produced by the RBMT system.
    §  This approach achieved large improvements in performance.

Motivation


Page 7: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  Translations provided by MT contain errors including:
    §  wrong lexical choices
    §  incorrect word ordering
    §  word insertion
    §  word deletion

§  The proposed APE system is based on HPB-SMT and improved hybrid alignments, and is able to handle the above errors.

§  The method is also able to correct word-ordering errors to some extent.

Motivation (contd.)

Page 8: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  The APE system is based on a monolingual SMT system trained with MT output as the source language and reference human translations as the target language.

§  The proposed APE system is designed as follows (a small data-preparation sketch follows after this slide):
    §  Preprocess data
        §  Parallel text [MT output and PE output]
        §  Monolingual data
    §  Improved word alignment (hybrid)
    §  Hierarchical PB-SMT

System Design

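
As a concrete sketch of this design (file names and the sentence pair are hypothetical), the "parallel" training corpus pairs raw MT output on the source side with human post-edits on the target side, in the plain-text format expected by SMT toolkits such as Moses:

```python
# Minimal sketch: lay out MT-output/post-edit pairs as a source/target
# corpus for a "monolingual" APE SMT system. File names are hypothetical.
from pathlib import Path

def write_ape_corpus(pairs, out_prefix):
    """pairs: iterable of (mt_output, post_edited) sentence pairs."""
    src_path = Path(out_prefix + ".mt")  # "source" side = raw MT output
    tgt_path = Path(out_prefix + ".pe")  # "target" side = human post-edits
    with src_path.open("w", encoding="utf-8") as src, \
         tgt_path.open("w", encoding="utf-8") as tgt:
        for mt, pe in pairs:
            src.write(mt.strip() + "\n")
            tgt.write(pe.strip() + "\n")

# Toy usage: one MT/PE pair written to train.mt / train.pe.
write_ape_corpus(
    [("il gatto sedeva su il tappeto", "il gatto sedeva sul tappeto")],
    "train",
)
```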

Page 9: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  Parallel text and monolingual data
    §  Some sentences are noisy and mixed with other languages
    §  Some sentences contain URLs
    §  The preprocessor cleans this noise by using a language identification tool (a cleaning sketch follows after this slide)

Preprocessing

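
A minimal sketch of such a cleaning step, assuming the off-the-shelf langid package for language identification; the URL pattern and the expected language are illustrative choices, not necessarily those of the actual preprocessor:

```python
# Drop sentences that contain URLs or are identified as the wrong language.
import re
import langid

URL_RE = re.compile(r"https?://\S+|www\.\S+")

def keep_sentence(sentence, expected_lang="it"):
    if URL_RE.search(sentence):               # drop sentences containing URLs
        return False
    lang, _score = langid.classify(sentence)  # language identification
    return lang == expected_lang              # drop other-language sentences

sentences = ["il gatto è sul tappeto", "visit www.example.com for details"]
clean = [s for s in sentences if keep_sentence(s)]
```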

Page 10: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  Statistical word aligners
    §  GIZA++
        §  Implements maximum likelihood estimators for all of IBM Models 1-5, an HMM alignment model, and Model 6
    §  SymGiza++
        §  Computes symmetric word alignment models, with the capability to take advantage of multi-processor systems
        §  Alignment quality improves by more than 17% compared to GIZA++
    §  Berkeley Aligner
        §  Cross Expectation Maximization word aligner
        §  Its jointly trained HMM models reduce AER by 29%
§  Edit distance based aligners
    §  TER alignment
    §  METEOR alignment

Improved Word Alignment


Page 11: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  Edit distance based aligners
    §  TER alignment
        §  TER is an evaluation metric which measures the ratio of the number of edit operations required to turn a hypothesis H into the corresponding reference R to the total number of words in R (written out below)
        §  It can also be used for monolingual word alignment
    §  METEOR alignment (see the next slide)

Improved Word Alignment (contd.)
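
In equation form, following the standard definition of TER (Snover et al., 2006), where the edits are insertions, deletions, substitutions and phrase shifts:

```latex
\mathrm{TER}(H,R) \;=\; \frac{\#\text{insertions} + \#\text{deletions} + \#\text{substitutions} + \#\text{shifts}}{|R|}
```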

Page 12: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  METEOR alignment (a simplified sketch follows after this slide)
    §  The alignment between the words of H and R is built incrementally by a sequence of word-mapping modules:
        §  Exact: maps words if they are exactly the same
        §  Porter stem: maps words if they are the same after stemming
        §  WN synonymy: maps words if they are considered synonyms in WordNet
    §  If multiple alignments exist, METEOR selects the alignment with the fewest crossing alignment links
    §  The final alignment between H and R is produced as the union of the alignments of all stages (i.e. Exact, Porter stem and WN synonymy)

Improved Word Alignment (contd.)
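
A simplified sketch of the staged word mapping, assuming the nltk package with the Porter stemmer and the WordNet data downloaded; real METEOR additionally minimises crossing alignment links, which is omitted here:

```python
# Staged METEOR-style matching: exact -> Porter stem -> WordNet synonymy.
# Requires nltk with the WordNet corpus (nltk.download("wordnet")).
from nltk.stem import PorterStemmer
from nltk.corpus import wordnet

stemmer = PorterStemmer()

def synonyms(word):
    # all lemma names that share a WordNet synset with `word`
    return {lemma.name().lower() for synset in wordnet.synsets(word)
            for lemma in synset.lemmas()}

def match(h_word, r_word):
    h, r = h_word.lower(), r_word.lower()
    if h == r:                                # stage 1: Exact
        return "exact"
    if stemmer.stem(h) == stemmer.stem(r):    # stage 2: Porter stem
        return "stem"
    if r in synonyms(h) or h in synonyms(r):  # stage 3: WN synonymy
        return "synonym"
    return None

print(match("running", "runs"), match("buy", "purchase"))
```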

Page 13: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  Hybridization
    §  Union: in the union method, we consider all alignments correct. All the alignment tables are unioned together and duplicate entries are removed (sketched after this slide).

Improved Word Alignment (contd.)
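
A minimal sketch of the union method, assuming each alignment table is represented as a set of (source_index, target_index) links, so unioning the sets removes duplicate entries automatically:

```python
# Union hybridisation over alignment tables represented as link sets.
def union_alignments(*tables):
    merged = set()
    for table in tables:
        merged |= set(table)
    return merged

giza = {(0, 0), (1, 2)}
berkeley = {(0, 0), (2, 1)}
print(sorted(union_alignments(giza, berkeley)))  # [(0, 0), (1, 2), (2, 1)]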

Page 14: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  Add additional alignments:
    §  Consider one of the alignments generated by GIZA++ GDFA (A1), the Berkeley aligner (A2) or SymGiza++ (A3) as the standard alignment (SA); A4 and A5 are the edit distance based alignments (TER and METEOR)

§  ALGORITHM (a toy sketch of Steps 3-4 follows after this slide):
    §  Step 1: Choose a standard alignment (SA) from among A1, A2 and A3.
    §  Step 2: Correct the alignment of SA by looking at the alignment tables of A4 and A5.
    §  Step 3: Find additional alignments from A2, A3, A4 and A5 using the intersection method (A2∩A3∩A4∩A5), if A1 is chosen as SA.
    §  Step 4: Add the additional entries to SA.

Improved Word Alignment (contd.)
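
Under the assumption that each alignment is a set of (i, j) link pairs, Steps 3-4 can be sketched as below; Step 2 (correcting SA against A4 and A5) is omitted for brevity:

```python
# Add links that all remaining aligners agree on to the standard alignment.
def add_additional(sa, *others):
    agreed = set.intersection(*(set(o) for o in others))  # e.g. A2∩A3∩A4∩A5
    return set(sa) | agreed                               # add agreed links to SA

a1 = {(0, 0), (1, 1)}                                     # standard alignment
a2, a3 = {(0, 0), (2, 2)}, {(2, 2), (1, 0)}
a4, a5 = {(2, 2), (3, 1)}, {(2, 2)}
print(sorted(add_additional(a1, a2, a3, a4, a5)))         # [(0, 0), (1, 1), (2, 2)]
```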

Page 15: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  Hierarchical PB-SMT is based on Synchronous Context Free Grammar (SCFG) (Aho and Ullman, 1969). An SCFG rewrite rule has aligned pairs of strings on its right-hand side (Chiang, 2005):
    §  X → ⟨γ, α, ∼⟩

§  X represents a nonterminal, γ and α represent strings of both terminals and nonterminals, and

§  ∼ represents a one-to-one correspondence between occurrences of nonterminals in γ and α (an illustrative rule follows after this slide).

Hierarchical PB-SMT

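
For illustration only (a made-up English-Italian possessive rule, not one taken from the actual system), a hierarchical rule with two co-indexed nonterminals might be:

```latex
X \;\rightarrow\; \bigl\langle\, X_1 \text{ 's } X_2 ,\;\; X_2 \text{ di } X_1 \,\bigr\rangle
```

The shared indices implement the correspondence ∼ and swap the order of the two sub-phrases during translation.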

Page 16: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  There exist two additional rules, called the "glue rules" or "glue grammar":
    §  S → ⟨S X, S X⟩
    §  S → ⟨X, X⟩

§  These rules are used when
    §  no other rule matches, or
    §  the span exceeds a certain length

§  These rules simply connect the translations of two adjacent blocks together, monotonically.

Hierarchical PB-SMT


Page 17: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  The hybrid word alignment provides a better quality alignment table

§  During phrase extraction, the system can automatically handle and estimate (a toy illustration follows after this slide):
    §  Word insertion errors (by considering one-to-many alignment links)
    §  Word deletion errors (by considering many-to-one alignment links)
    §  Lexical errors (by estimating high lexical weightings during model estimation)
    §  Word ordering errors (using the hierarchical model facilitates word ordering, because it uses formally hierarchical phrases)

Benefits

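
A toy illustration of the insertion/deletion cases above (this is not the actual Moses phrase extractor): a many-to-one alignment link pattern lets phrase extraction learn a phrase pair that drops a spurious MT word, and one-to-many links are the mirror case for insertion.

```python
# Alignment links as (mt_index, pe_index) pairs; MT words 0 and 1 both
# align to PE word 0, so the extracted pair deletes the doubled article.
mt = ["the", "the", "cat", "sat"]          # MT output with a doubled article
pe = ["the", "cat", "sat"]                 # post-edited reference
links = {(0, 0), (1, 0), (2, 1), (3, 2)}

def phrase_pair(span, links):
    """Pair the MT words in `span` with the PE words they align to."""
    lo, hi = span
    pe_idx = sorted({j for i, j in links if lo <= i <= hi})
    return " ".join(mt[lo:hi + 1]), " ".join(pe[j] for j in pe_idx)

print(phrase_pair((0, 1), links))  # ('the the', 'the'): deletion as a phrase pair
```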

Page 18: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  Dataset
    §  The MateCat data contains 312K segments
    §  After cleaning: 213,795 parallel MT-PE segments
        §  Training data: 211,795
        §  Development set: 1,000
        §  Test set: 1,000

§  The monolingual data consists of the PE data and clean Europarl data

Experiments


Page 19: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  Experimental setup
    §  5-gram language model [KenLM] (see the query sketch after this slide)
    §  Phrase length 7
    §  Hierarchical PB-SMT [Moses]
        §  Maximum chart span 100
        §  Minimum chart span 20
    §  Good-Turing discounting of the phrase translation probabilities
    §  Filtered phrase table for faster decoding

Experiments (contd.)
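
As a small illustration of the language-model component, a query to the 5-gram model through the KenLM Python bindings might look like this; the model file name is hypothetical:

```python
# Score a sentence with a KenLM language model (hypothetical model file).
import kenlm

model = kenlm.Model("pe_corpus.5gram.binlm")
print(model.score("il gatto sedeva sul tappeto", bos=True, eos=True))
```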

Page 20: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  System tuning with development data
    §  MERT
        §  60% TER and 40% BLEU
        §  Maximum iterations: 25
    §  MIRA
        §  Batch MIRA

§  Tuned parameters for the monolingual APE system MT-IT → APE-IT:
    §  Language model = 0.0569997
    §  Word penalty = 0.118199
    §  Phrase penalty = 0.127955
    §  Translation model 0 = 0.148562, -0.0700695, 0.275438, 0.0861944
    §  Translation model 1 = 0.116582

Experiments(Cond…)

20  

Page 21: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  The evaluation was carried out in two directions:
    §  automatic evaluation, and
    §  manual evaluation with 4 expert translators

§  The automatic evaluation shows significant improvements on three automatic evaluation metrics:
    §  BLEU,
    §  TER and
    §  METEOR.

§  Our evaluation using human judgments shows that APE consistently improves the overall translation adequacy; it improved 7% of the post-edited sentences.

Evaluations


Page 22: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  Automatic sentence-level evaluation over 145 sentences

Evaluations


Metric        | APE System better than Google Translation | Google Translation better than APE System | % Improvement over 1000 sentences | % Loss over 1000 sentences
Sentence BLEU | 91                                         | 54                                         | 9.1%                              | 5.4%
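
The win/loss counting behind this table could be reproduced with something like the sketch below; this assumes the sacrebleu package and hypothetical sentence lists (the talk does not specify which sentence-BLEU implementation was used):

```python
# Count per-sentence BLEU wins for APE vs. the baseline MT output.
from sacrebleu.metrics import BLEU

bleu = BLEU(effective_order=True)  # sensible default for sentence-level BLEU

def count_wins(ape, google, refs):
    ape_wins = google_wins = 0
    for a, g, r in zip(ape, google, refs):
        a_score = bleu.sentence_score(a, [r]).score
        g_score = bleu.sentence_score(g, [r]).score
        if a_score > g_score:
            ape_wins += 1
        elif g_score > a_score:
            google_wins += 1       # remaining sentences are ties
    return ape_wins, google_wins
```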

Page 23: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  Automatic evaluation over 1000 sentences

Evaluations


Metric | APE System | Google Translation | % Relative Improvement
BLEU   | 63.87      | 61.26              | 4.2%
TER    | 28.67      | 30.94              | 7.9%
METEOR | 73.63      | 72.73              | 1.2%

Page 24: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  Manual evaluation with 4 expert translators for 145 sentences (EN = English, DE = German, FR = French, ES = Spanish, CA = Catalan, IT = Italian)

Human Evaluations


               | Qualifications of Translators             | Expertise           | Experience | APE System | Google Translation | Uncertain
Translator 1   | Degree in Translation                     | EN, FR → IT         | 1 year     | 91         | 22                 | 32
Translator 2   | Degree in Linguistic and Cultural Studies | EN, FR, ES, CA → IT | 2 years    | 57         | 17                 | 71
Translator 3   | Degree in European Languages and Cultures | EN, FR, ES, DE → IT | 1 year     | 72         | 37                 | 36
Translator 4   | Degree in Business & Administration       | EN → IT             | 1 year     | 65         | 23                 | 58
Average        |                                           |                     |            | 71         | 25                 | 49
% Improvements |                                           |                     |            | 7.1%       | 2.5%               | 4.9%

Page 25: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

[Bar chart: number of the 145 evaluated sentences (axis 0-100) judged in favour of the APE System, Google Translation, or Uncertain, for each of Translators 1-4]

Human Evaluations

Page 26: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

[Chart: overall evaluation of the 1000 sentences in total; categories: equal output given by MT and APE, APE chosen by human, MT chosen by human, uncertain]

Human Evaluations

Page 27: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

Human Evaluation

[Bar chart: percentage (0-8% scale) of the 145 sentences, out of 1000, where APE was chosen by humans, MT was chosen by humans, or the judgment was uncertain; the rest are ties between MT and APE]

Page 28: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  For the 145 sentences evaluated by the 4 translators, the overall counts (based on at least one translator agreeing to vote for a particular system) are:
    §  APE: 105
    §  Uncertain: 94
    §  GT: 62

Human Evaluation


Page 29: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  The proposed APE system was successful in improving on the baseline MT system's performance.

§  Although some APE translations were deemed worse than the original MT output by the human evaluators, they were very few in number.

§  Manual inspection revealed that these lower quality APE translations are very similar to the original MT translations.

§  These worse translations can be avoided by adding more features (e.g., syntactic or semantic) which can also improve the overall performance of the post-editing system.

§  The presented system can easily be plugged into any state-of-the-art system and the runtime complexity is similar to that of other statistical MT systems.

Conclusions


Page 30: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  In future work, we will try bootstrapping strategies for further tuning the model and add more sophisticated features beyond the lexical level.

§  We will improve our hybrid word alignment algorithm by incorporating additional word aligners such as fast_align, Anymalign, etc.

§  We also want to extend the system by incorporating source knowledge as well as improving word ordering by using Kendall reordering method.

§  To consolidate the user evaluation, we will measure inter-annotator agreement.

§  We will also evaluate our system in a real-life setting in commercial environment to analyse time gain and productivity gain provided by automatic post-editing.

Future Work

Page 31: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015

§  Thank you!
