ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015


Statistical Automatic Post-Editing

Santanu Pal * This work was carried out at Translated

1  

§  Introduction
§  Motivations
§  System Description
  §  Preprocessing
  §  Improved Word Alignment
  §  Hierarchical PB-SMT
  §  Advantage of using Hierarchical PB-SMT
§  Experiments
§  Evaluations
§  Conclusions

Outline

2  

§  The translations provided by current MT systems often fail to deliver perfect output

§  To achieve output of sufficient quality, translations often need to be corrected, or post-edited, by human translators

§  “Post-Editing” (PE) is defined as the correction by humans of the translations produced by an MT system (Veale and Way, 1997).

§  PE is often described as the process of improving a translation provided by an MT system with a minimum of manual labor (TAUS report, 2010).

Introduction

3  

§  The major goals of using an APE system are:
  §  Post-editing MT output instead of translating from scratch
  §  Saving time
  §  Being cost-effective
  §  Reducing the effort of the human post-editor

§  Recent studies have even shown that, in some cases:
  §  The quality of MT plus PE can exceed the quality of human translation (Fiederer and O’Brien, 2009; Koehn, 2009; DePalma and Kelly, 2009)
  §  MT plus PE can increase productivity (Zampieri and Vela, 2014)

Introduction (Contd.)

4  

§  Many studies have examined the impact of various factors and methods on PE:
  §  These were evaluated against the volume of PE effort,
  §  rather than in a commercial work environment.

§  The overall purpose of the present study is to answer two fundamental questions:
  §  What would be the optimal design of a PE system, which is ultimately determined by the quality of MT output in a commercial work environment?
  §  How can human involvement be optimized to reduce post-editing effort in a commercial work environment?

Introduction (Contd.)

5  

§  The advantage of an APE system is that it can take the output of any black-box MT engine as input and produce automatically post-edited output, without having to retrain or re-implement the first-stage MT engine.

§  PB-SMT (Koehn et al., 2003) can be applied as an APE system (Simard et al., 2007):
  §  The SMT system is trained on RBMT output and reference human translations.
  §  This PB-SMT-based APE system is able to correct the systematic errors produced by the RBMT system.
  §  This approach achieved large improvements in performance.

Motivation

6  

§  Translations provided by MT contain errors including:
  §  wrong lexical choices
  §  incorrect word ordering
  §  word insertion
  §  word deletion

§  The proposed APE system, based on HPB-SMT and improved hybrid word alignment, is able to handle the above errors.

§  This method is also able to correct word ordering errors to some extent.

Motivation (Contd.)

7  

§  The APE system is based on a monolingual SMT system trained with MT output as the source language and reference human translations as the target language.

§  The proposed APE system is designed as follows:
  §  Preprocess the data
    §  Parallel text [MT output and PE output]
    §  Monolingual data
  §  Improved (hybrid) word alignment
  §  Hierarchical PB-SMT

System Design

8  

§  Parallel text and monolingual data:
  §  Some sentences are noisy and mixed with other languages
  §  Some sentences contain URLs
  §  The preprocessor removes this noise using a language identification tool (a sketch follows this slide)

Preprocessing

9  
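A minimal sketch of this cleaning step. The slides do not name the language identification tool used, so the `langid` Python package stands in for it here, and `clean_pairs` is a hypothetical helper:

```python
# Sketch of the preprocessing step; `langid` is an assumed stand-in for
# the (unnamed) language identification tool from the slides.
import re
import langid

URL_RE = re.compile(r"https?://\S+|www\.\S+")

def clean_pairs(pairs, lang="it"):
    """Yield (MT output, post-edit) pairs that pass the noise filter."""
    for mt, pe in pairs:
        # Drop segments that contain URLs
        if URL_RE.search(mt) or URL_RE.search(pe):
            continue
        # Drop segments not identified as the expected language
        if langid.classify(mt)[0] != lang or langid.classify(pe)[0] != lang:
            continue
        yield mt, pe
```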

§  Statistical word aligners:
  §  GIZA++
    §  Implements maximum likelihood estimators for the IBM Models 1-5, an HMM alignment model, and Model 6
  §  SymGiza++
    §  Computes symmetric word alignment models, with the capability to take advantage of multi-processor systems
    §  Alignment quality improves by more than 17% compared to GIZA++
  §  Berkeley Aligner
    §  Cross expectation maximization word aligner
    §  Jointly trains HMM models, reducing AER by 29%
§  Edit-distance-based aligners:
  §  TER alignment
  §  METEOR alignment

Improved Word Alignment

10  

§  Edit-distance-based aligner:
  §  TER alignment
    §  TER is an evaluation metric that measures the ratio of the number of edit operations required to transform a hypothesis H into the corresponding reference R to the total number of words in R (formalized after this slide)
    §  TER can also be used for monolingual word alignment
  §  METEOR alignment (next slide)

Improved Word Alignment (Contd.)

11  
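In symbols, the TER definition paraphrased above (following Snover et al., 2006, for a single reference) is:

```latex
% TER: edit operations (insertions, deletions, substitutions, shifts)
% needed to turn hypothesis H into reference R, normalized by |R|.
\mathrm{TER}(H, R) = \frac{\#\,\mathrm{edits}(H \rightarrow R)}{|R|}
```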

§  METEOR alignment:
  §  The alignment mapping between the words of H and R is built incrementally by a sequence of word-mapping modules:
    §  Exact: maps words if they are exactly the same
    §  Porter stem: maps words if they are the same after stemming
    §  WN synonymy: maps words if they are considered synonyms in WordNet
  §  If multiple alignments exist, METEOR selects the alignment with the fewest crossing alignment links
  §  The final alignment between H and R is produced as the union of the alignments from all stages (Exact, Porter stem, and WN synonymy); a sketch follows this slide

Improved Word Alignment (Contd.)

12  
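A rough sketch of the staged matching described above, using NLTK's Porter stemmer and WordNet as stand-ins for METEOR's internal modules; the greedy one-to-one linking is a simplification of METEOR's fewest-crossings selection:

```python
# METEOR-style staged word mapping (sketch). NLTK's Porter stemmer and
# WordNet stand in for METEOR's resources; the greedy linking below
# simplifies METEOR's fewest-crossing-links alignment selection.
from nltk.corpus import wordnet
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def match_stage(h, r):
    """First stage (exact, stem, synonym) under which word h maps to word r."""
    h, r = h.lower(), r.lower()
    if h == r:
        return "exact"
    if stemmer.stem(h) == stemmer.stem(r):
        return "stem"
    h_synonyms = {l.name().lower() for s in wordnet.synsets(h) for l in s.lemmas()}
    if r in h_synonyms:
        return "synonym"
    return None

def align(hyp, ref):
    """Greedily link each hypothesis word to the first free matching reference word."""
    links, used = [], set()
    for i, hw in enumerate(hyp):
        for j, rw in enumerate(ref):
            stage = match_stage(hw, rw)
            if j not in used and stage:
                links.append((i, j, stage))
                used.add(j)
                break
    return links
```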

§  Hybridization
  §  Union: in the union method, we consider all alignments correct. All the alignment tables are unioned together and duplicate entries are removed (see the sketch after this slide).

Improved Word Alignment (Contd.)

13  
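As a sketch, with each aligner's output for a sentence pair represented as a set of (source index, target index) links, the union method reduces to a set union, which drops duplicate entries automatically:

```python
# Union hybridization over alignment link sets: every aligner's links
# are trusted, and merging as Python sets removes duplicate entries.
def union_alignments(*tables):
    merged = set()
    for table in tables:
        merged |= table
    return merged

# e.g. links from three aligners for one sentence pair
a_giza, a_berkeley, a_sym = {(0, 0), (1, 2)}, {(0, 0), (2, 1)}, {(1, 2), (3, 3)}
print(sorted(union_alignments(a_giza, a_berkeley, a_sym)))
# [(0, 0), (1, 2), (2, 1), (3, 3)]
```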

§  ADD additional alignments:
  §  Consider one of the alignments generated by GIZA++ GDFA (A1), the Berkeley aligner (A2), or SymGiza++ (A3) as the standard alignment (SA)
  §  ALGORITHM (sketched after this slide):
    §  Step 1: Choose a standard alignment (SA) from A1, A2, and A3.
    §  Step 2: Correct the alignment of SA by looking at the alignment tables of A4 and A5.
    §  Step 3: Find additional alignments from A2, A3, A4, and A5 using the intersection method (A2∩A3∩A4∩A5) if A1 is the SA.
    §  Step 4: Add the additional entries to SA.

Improved Word Alignment (Contd.)

14  
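A hedged sketch of the ADD procedure with A1 as the SA, again over link sets. The slides leave the Step 2 correction rule underspecified, so the reading below (an SA link must agree with A4/A5 wherever those aligners cover the same source word) is an assumption:

```python
# ADD-additional-alignments (sketch), with A1 (GIZA++ GDFA) as the SA.
# a1..a5: link sets from GIZA++ GDFA, Berkeley aligner, SymGiza++,
# TER aligner and METEOR aligner for one sentence pair.
def add_additional_alignments(a1, a2, a3, a4, a5):
    sa = set(a1)                        # Step 1: choose A1 as the SA
    # Step 2 (assumed reading): where the edit-distance aligners A4/A5
    # align a source word, keep only the SA links they confirm.
    covered = {s for s, _ in a4 | a5}
    sa = {(s, t) for s, t in sa
          if (s, t) in (a4 | a5) or s not in covered}
    additional = a2 & a3 & a4 & a5      # Step 3: intersection method
    return sa | additional              # Step 4: add the entries to SA
```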

§  Hierarchical PB-SMT is based on Synchronous Context-Free Grammar (SCFG) (Aho and Ullman, 1969). An SCFG has rewrite rules whose right-hand sides are aligned pairs (Chiang, 2005):
  §  X → ⟨γ, α, ∼⟩
  §  X is a nonterminal, γ and α are strings of both terminals and nonterminals, and
  §  ∼ is a one-to-one correspondence between the occurrences of nonterminals in γ and α (an illustrative rule follows this slide).

Hierarchical PB-SMT

15  
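As a purely illustrative, hypothetical instance of such a rule (not from the slides), with the ∼ correspondence written as shared subscripts, a rule that swaps two sub-phrases could be:

```latex
% Hypothetical hierarchical rule: the subscripts encode ~, and here
% reverse the order of the two nonterminals between the two sides.
X \rightarrow \langle\, X_1 \ \mathrm{di} \ X_2 ,\;\; X_2 \ \mathrm{di} \ X_1 \,\rangle
```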

§  There are two additional rules, called the “glue rules” or “glue grammar”:
  §  S → ⟨S X, S X⟩
  §  S → ⟨X, X⟩
§  These rules are used when:
  §  no other rule matches, or
  §  the span exceeds a certain length
§  They simply connect the translations of two adjacent blocks together monotonically.

Hierarchical PB-SMT

16  

§  The hybrid word alignment provides a better-quality alignment table.

§  During phrase extraction, the system can automatically handle and estimate:
  §  Word insertion errors (by considering one-to-many alignment links)
  §  Word deletion errors (by considering many-to-one alignment links)
  §  Lexical errors (by estimating lexical weights during model estimation)
  §  Word ordering errors (the hierarchical model facilitates word reordering, because it uses formally hierarchical phrases)

Benefits

17  

§  Dataset:
  §  The MateCat data contains 312K segments
  §  After cleaning: 213,795 parallel MT-PE segment pairs
    §  Training data: 211,795
    §  Development set: 1,000
    §  Test set: 1,000
  §  The monolingual data consists of the PE data and cleaned Europarl data

Experiments

18  

§  Experimental setup:
  §  5-gram language model [KenLM] (queried as sketched after this slide)
  §  Maximum phrase length 7
  §  Hierarchical PB-SMT [Moses]
    §  Maximum chart span 100
    §  Minimum chart span 20
    §  Good-Turing discounting of the phrase translation probabilities
    §  Phrase table filtered for faster decoding

Experiments (Contd.)

19  
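For illustration, a 5-gram KenLM model can be queried from Python through the kenlm bindings; the model file name `lm.arpa` is a placeholder for a model estimated separately, not a path from the slides:

```python
# Querying a 5-gram KenLM language model; "lm.arpa" is a placeholder
# path for an ARPA model estimated beforehand with KenLM.
import kenlm

model = kenlm.Model("lm.arpa")
# Total log10 probability of the sentence, adding BOS/EOS markers
print(model.score("questa è una frase di prova", bos=True, eos=True))
```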

§  System tuning with the development data:
  §  MERT
    §  Objective: 60% TER and 40% BLEU
    §  Maximum 25 iterations
  §  MIRA
    §  Batch MIRA

§  Tuned parameters for the monolingual APE system MT-IT → APE-IT (combined in the log-linear model sketched after this slide):
  §  Language model = 0.0569997
  §  Word penalty = 0.118199
  §  Phrase penalty = 0.127955
  §  Translation model 0 = 0.148562, -0.0700695, 0.275438, 0.0861944
  §  Translation model 1 = 0.116582

Experiments (Contd.)

20  
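These tuned values are the feature weights of the standard SMT log-linear model; the formulation below is the general one, since the slide lists only the weights themselves:

```latex
% Standard log-linear decoding objective that the tuned weights enter:
\hat{e} = \arg\max_{e} \sum_{i} \lambda_i \, h_i(e, f)
% with lambda_LM = 0.0570, lambda_WP = 0.1182, lambda_PP = 0.1280,
% and the four TM0 weights and the TM1 weight as listed on the slide.
```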

§  The evaluation was carried out in two directions:
  §  Automatic evaluation, and
  §  Manual evaluation with 4 expert translators

§  The automatic evaluation shows significant improvements on three automatic evaluation metrics:
  §  BLEU,
  §  TER, and
  §  METEOR

§  Our evaluation using human judgments shows that the APE system improves the overall translation adequacy; it improved 7% of the post-edited sentences.

Evaluations

21  

§  Automatic sentence-level evaluation over the 145 sentences (out of 1,000) on which the two systems differ

Evaluations

22  

Metric | APE System better than Google Translation | Google Translation better than APE System | % improvement over 1,000 sentences | % loss over 1,000 sentences
Sentence BLEU | 91 | 54 | 9.1% | 5.4%

§  Automatic evaluation over 1000 sentences

Evaluations

23  

Metric | APE System | Google Translation | % Relative Improvement
BLEU | 63.87 | 61.26 | 4.2%
TER | 28.67 | 30.94 | 7.9%
METEOR | 73.63 | 72.73 | 1.2%

§  Manual evaluation with 4 expert translators on 145 sentences (EN = English, DE = German, FR = French, ES = Spanish, CA = Catalan, IT = Italian)

Human Evaluations

24  

Translator | Qualification | Expertise | Experience | APE System | Google Translation | Uncertain
Translator 1 | Degree in Translation | EN, FR → IT | 1 year | 91 | 22 | 32
Translator 2 | Degree in Linguistic and Cultural Studies | EN, FR, ES, CA → IT | 2 years | 57 | 17 | 71
Translator 3 | Degree in European Languages and Cultures | EN, FR, ES, DE → IT | 1 year | 72 | 37 | 36
Translator 4 | Degree in Business & Administration | EN → IT | 1 year | 65 | 23 | 58
Average | | | | 71 | 25 | 49
% of 1,000 sentences | | | | 7.1% | 2.5% | 4.9%

[Bar chart (counts, 0-100): sentences judged better for the APE System vs. Google Translation vs. Uncertain, for Translators 1-4]

Human Evaluations

25  

Overall evaluation of the 1,000 sentences in total

[Pie chart: equal output given by MT and APE; APE chosen by human; MT chosen by human; uncertain]

Human Evaluations

26  

Human Evaluation

27  

[Bar chart (0-8% scale): % of the 145 sentences (out of 1,000) where humans chose APE, chose MT, or were uncertain; the rest are ties between MT and APE]

§  From the 145 sentences evaluated by the 4 translators, the overall counts (based on at least one translator voting for a particular system) are:
  §  APE: 105
  §  Uncertain (UN): 94
  §  GT: 62

Human Evaluation

28  

§  The proposed APE system was successful in improving over the baseline MT system performance.

§  Although some APE translations were deemed worse than the original MT output by the human evaluators, they were very few in number.

§  Manual inspection revealed that these lower-quality APE translations were very similar to the original MT translations.

§  These worse translations can be avoided by adding more features (e.g., syntactic or semantic), which can also improve the overall performance of the post-editing system.

§  The presented system can easily be plugged into any state-of-the-art system, and its runtime complexity is similar to that of other statistical MT systems.

Conclusions

29  

§  In future, we will try bootstrapping strategies to further tune the model and will add more sophisticated features beyond the lexical level.

§  We will improve our hybrid word alignment algorithm by incorporating additional word aligners such as fast_align and Anymalign.

§  We also want to extend the system by incorporating source-side knowledge, as well as improving word ordering by using a Kendall reordering method.

§  To consolidate the user evaluation, we will measure inter-annotator agreement.

§  We will also evaluate our system in a real-life commercial setting to analyse the time and productivity gains provided by automatic post-editing.

Future Work

30  

§  Thank you!

31