ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
Statistical Automatic Post-Editing
Santanu Pal * This work was carried out at Translated
§ Introduction
§ Motivations
§ System Description
  § Preprocessing
  § Improved Word Alignment
  § Hierarchical PB-SMT
  § Advantages of using Hierarchical PB-SMT
§ Experiments
§ Evaluations
§ Conclusions
Outline
§ The translations provided by current MT systems often fail to be perfect.
§ To achieve output of sufficient quality, translations often need to be corrected, or post-edited, by human translators.
§ "Post-Editing" (PE) is defined as the correction by humans of the translation produced by an MT system (Veale and Way, 1997).
§ It is often described as the process of improving a translation provided by an MT system with a minimum of manual labor (TAUS report, 2010).
Introduction
§ The major goals of using an APE system:
  § post-editing MT output instead of translating from scratch
  § time-saving
  § cost-effectiveness
  § reducing the effort of the human post-editor
§ In some cases, recent studies have even shown that:
  § the quality of MT plus PE can exceed the quality of human translation (Fiederer and O'Brien, 2009; Koehn, 2009; DePalma and Kelly, 2009)
  § and likewise productivity (Zampieri and Vela, 2014).
Introduction (contd.)
§ Many studies have examined the impact of various factors and methods on the volume of PE effort, but not in a commercial work environment.
§ The overall purpose of the present study is to answer two fundamental questions:
  § What would be the optimal design of a PE system, which is ultimately determined by the quality of MT output in a commercial work environment?
  § How can human involvement be optimized to reduce post-editing effort in a commercial work environment?
Introduction (contd.)
§ The advantage of an APE system is that it can take the output of any black-box MT engine as input and produce automatically post-edited output, without having to retrain or re-implement the first-stage MT engine.
§ PB-SMT (Koehn et al., 2003) can be applied as an APE system (Simard et al., 2007).
  § The SMT system is trained on RBMT output and reference human translations.
  § This PB-SMT-based APE system is able to correct the systematic errors produced by the RBMT system.
  § This approach achieved large improvements in performance.
Motivation
§ Errors in translations provided by MT include:
  § wrong lexical choice
  § incorrect word ordering
  § word insertion
  § word deletion
§ The proposed APE system, based on HPB-SMT and improved hybrid alignments, is able to handle the above errors.
§ This method is also able to correct word-ordering errors to some extent.
Motivation (contd.)
§ The APE system is based on a monolingual SMT system trained with MT output as the source language and reference human translations as the target language.
§ The proposed APE system is designed as follows:
  § Preprocess data
    § parallel text [MT output and PE output]
    § monolingual data
  § Improved word alignment (hybrid)
  § Hierarchical PB-SMT
System Design
§ Parallel text and monolingual data:
  § Some sentences are noisy and mixed with other languages.
  § Some sentences contain URLs.
§ The preprocessor cleans this noise by using a language identification tool.
Preprocessing
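The cleaning step above can be sketched as follows. This is a hypothetical illustration: the deck does not name the language-identification tool used, so a trivial stopword heuristic (`looks_like_target_language`) stands in for it here.

```python
import re

# Matches URLs so that noisy segments can be dropped.
URL_RE = re.compile(r"https?://\S+|www\.\S+")

# Tiny stand-in vocabulary; a real system would call a language-ID tool.
ITALIAN_STOPWORDS = {"il", "la", "di", "che", "e", "un", "per", "non"}

def looks_like_target_language(sentence, threshold=0.1):
    """Placeholder language check based on stopword density."""
    words = sentence.lower().split()
    if not words:
        return False
    hits = sum(1 for w in words if w in ITALIAN_STOPWORDS)
    return hits / len(words) >= threshold

def clean_corpus(pairs):
    """Keep only MT/PE pairs that contain no URLs and pass the language check."""
    kept = []
    for mt, pe in pairs:
        if URL_RE.search(mt) or URL_RE.search(pe):
            continue
        if looks_like_target_language(pe):
            kept.append((mt, pe))
    return kept
```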
§ Statistical word aligners:
  § GIZA++
    § implements maximum likelihood estimators for the IBM Models 1-5, an HMM alignment model, and Model 6
  § SymGiza++
    § computes symmetric word alignment models, with the capability to take advantage of multi-processor systems
    § alignment quality improves by more than 17% compared to GIZA++
  § Berkeley Aligner
    § a cross-Expectation-Maximization word aligner
    § its jointly trained HMM models reduce AER by 29%
§ Edit-distance-based aligners:
  § TER alignment
  § METEOR alignment
Improved Word Alignment
§ Edit-distance-based aligners:
  § TER alignment
    § TER is an evaluation metric which measures the ratio of the number of edit operations required to turn a hypothesis H into the corresponding reference R to the total number of words in R.
    § It can also be used as a monolingual word aligner.
  § METEOR alignment
Improved Word Alignment (contd.)
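The edit-operation count behind TER can be sketched with word-level Levenshtein distance. Note this is a simplification: full TER additionally allows block shifts, which this sketch omits.

```python
def ter_no_shifts(hypothesis, reference):
    """Word edit rate: (insertions + deletions + substitutions) / |R|.
    Full TER also permits block shifts; this sketch omits them."""
    h, r = hypothesis.split(), reference.split()
    # Standard dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        d[i][0] = i
    for j in range(len(r) + 1):
        d[0][j] = j
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion from H
                          d[i][j - 1] + 1,         # insertion into H
                          d[i - 1][j - 1] + cost)  # match / substitution
    return d[len(h)][len(r)] / len(r)
```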
§ METEOR alignment:
  § The alignment between the words of H and R is built incrementally by a sequence of word-mapping modules:
    § Exact: maps words if they are exactly the same.
    § Porter stem: maps words if they are the same after stemming.
    § WN synonymy: maps words if they are considered synonyms in WordNet.
  § If multiple alignments exist, METEOR selects the alignment with the fewest crossing alignment links.
  § The final alignment between H and R is produced as the union of all stage alignments (i.e., Exact, Porter stem, and WN synonymy).
Improved Word Alignment (contd.)
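The staged matching can be sketched as below. The crude suffix-stripping stemmer and the tiny synonym table are stand-ins for the Porter stemmer and WordNet, and the greedy pass omits METEOR's fewest-crossings tie-breaking, so treat this as an illustration only.

```python
def crude_stem(word):
    """Toy stand-in for the Porter stemmer: strip a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Tiny stand-in for WordNet synonymy.
SYNONYMS = {("quick", "fast"), ("fast", "quick")}

STAGES = [
    ("exact",   lambda h, r: h == r),
    ("stem",    lambda h, r: crude_stem(h) == crude_stem(r)),
    ("synonym", lambda h, r: (h, r) in SYNONYMS),
]

def meteor_align(hyp_words, ref_words):
    """Greedy one-to-one alignment built stage by stage, as in METEOR:
    later stages only align words left unmatched by earlier stages."""
    alignment, used_h, used_r = [], set(), set()
    for _, match in STAGES:
        for i, h in enumerate(hyp_words):
            if i in used_h:
                continue
            for j, r in enumerate(ref_words):
                if j not in used_r and match(h, r):
                    alignment.append((i, j))
                    used_h.add(i)
                    used_r.add(j)
                    break
    return sorted(alignment)
```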
§ Hybridization
  § Union: in the union method, we consider all alignments correct. All the alignment tables are unioned together and duplicate entries are removed.
Improved Word Alignment (contd.)
§ Add additional alignments:
  § Consider one of the alignments generated by GIZA++ GDFA (A1), the Berkeley aligner (A2), or SymGiza++ (A3) as the standard alignment (SA); A4 and A5 are the edit-distance-based TER and METEOR alignments.
  § ALGORITHM:
    § Step 1: Choose a standard alignment (SA) from A1, A2, and A3.
    § Step 2: Correct the alignment of SA by looking at the alignment tables of A4 and A5.
    § Step 3: Find additional alignments from A2, A3, A4, and A5 using the intersection method (A2∩A3∩A4∩A5) if A1 is the SA.
    § Step 4: Add the additional entries to SA.
Improved Word Alignment (contd.)
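With each alignment represented as a set of (source index, target index) links, the union method and the ADD algorithm above can be sketched as follows. This is only an illustration: the deck does not spell out the correction rule of Step 2, so this sketch keeps SA unchanged and only adds the intersected links of Step 3.

```python
def union_alignment(*alignments):
    """Union method: treat every link from every aligner as correct;
    set union removes duplicate entries automatically."""
    return set().union(*alignments)

def add_additional_alignments(a1, a2, a3, a4, a5):
    """Sketch of the ADD algorithm with A1 (GIZA++ GDFA) as the standard
    alignment (SA). Step 2's correction rule is not fully specified in the
    deck, so it is omitted here."""
    sa = set(a1)                                         # Step 1: choose SA
    additional = set(a2) & set(a3) & set(a4) & set(a5)   # Step 3: intersection
    return sa | additional                               # Step 4: add entries to SA
```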
§ Hierarchical PB-SMT is based on Synchronous Context-Free Grammar (SCFG) (Aho and Ullman, 1969). An SCFG has rewrite rules with aligned pairs on the right-hand side (Chiang, 2005):
  § X → ⟨γ, α, ∼⟩
  § where X represents a nonterminal, γ and α represent strings of terminals and nonterminals, and
  § ∼ represents a one-to-one correspondence between occurrences of nonterminals in γ and α.
Hierarchical PB-SMT
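A toy illustration of how such a rule rewrites aligned pairs, using the classic French-negation example (not a rule from the deck's grammar): the rule X → ⟨ne X1 pas, not X1⟩ has one gap X1, and the ∼ correspondence says the gap is filled by the same sub-derivation on both sides.

```python
# Each rule maps X to an aligned pair of source/target templates;
# "X1" marks the co-indexed nonterminal gap on both sides (the ~ link).
RULES = {
    "neg": ("ne X1 pas", "not X1"),  # hierarchical rule with one gap
    "va":  ("va", "go"),             # plain lexical rule
}

def derive(rule, gap=None):
    """Expand a rule; if it has a gap, fill X1 on both sides with the
    source/target yield of the sub-derivation."""
    src, tgt = RULES[rule]
    if gap is not None:
        sub_src, sub_tgt = derive(gap)
        src, tgt = src.replace("X1", sub_src), tgt.replace("X1", sub_tgt)
    return src, tgt

print(derive("neg", gap="va"))  # ('ne va pas', 'not go')
```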
§ There are two additional rules, called the "glue rules" or "glue grammar":
  § S → ⟨S X, S X⟩
  § S → ⟨X, X⟩
§ These rules are used when:
  § no other rule matches, or
  § the span exceeds a certain length.
§ They simply connect the translations of two adjacent blocks together monotonically.
Hierarchical PB-SMT (contd.)
§ The hybrid word alignment provides a better-quality alignment table.
§ During phrase extraction, the system can automatically handle and estimate:
  § word insertion errors (by considering one-to-many alignment links)
  § word deletion errors (by considering many-to-one alignment links)
  § lexical errors (by estimating high lexical weighting during model estimation)
  § word ordering (the hierarchical model facilitates reordering, because it uses formally hierarchical phrases)
Benefits
§ Dataset
  § The MateCat data contains 312K segments.
  § After cleaning: 213,795 parallel MT-PE segments
    § Training data: 211,795
    § Development set: 1,000
    § Test set: 1,000
  § The monolingual data consists of the PE data and cleaned Europarl data.
Experiments
§ Experimental setup:
  § 5-gram language model [KenLM]
  § phrase length 7
  § Hierarchical PB-SMT [Moses]
    § maximum chart span 100
    § minimum chart span 20
    § Good-Turing discounting of the phrase translation probabilities
    § filtered phrase table for faster decoding
Experiments (contd.)
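Good-Turing discounting, used above for the phrase translation probabilities, replaces each raw count r with r* = (r+1) · N(r+1) / N(r), where N(r) is the number of distinct events seen exactly r times. A minimal sketch:

```python
from collections import Counter

def good_turing_adjusted_counts(counts):
    """Replace each raw count r with r* = (r+1) * N_{r+1} / N_r."""
    freq_of_freq = Counter(counts.values())
    adjusted = {}
    for event, r in counts.items():
        n_r, n_r1 = freq_of_freq[r], freq_of_freq.get(r + 1, 0)
        # Fall back to the raw count when N_{r+1} is zero (sparse high counts);
        # real toolkits use smoothed fits of N_r instead.
        adjusted[event] = (r + 1) * n_r1 / n_r if n_r1 else r
    return adjusted

print(good_turing_adjusted_counts({"a": 1, "b": 1, "c": 2, "d": 3}))
# {'a': 1.0, 'b': 1.0, 'c': 3.0, 'd': 3}
```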
§ System tuning with development data:
  § MERT
    § 60% TER and 40% BLEU
    § maximum iterations: 25
  § MIRA
    § batch MIRA
  § Tuned parameters for the monolingual APE system MT-IT → APE-IT:
    § language model = 0.0569997
    § word penalty = 0.118199
    § phrase penalty = 0.127955
    § translation model 0 = 0.148562, -0.0700695, 0.275438, 0.0861944
    § translation model 1 = 0.116582
Experiments (contd.)
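The tuned weights above feed a Moses-style log-linear model, which scores each hypothesis as the weighted sum of its feature values. A minimal sketch (the feature values passed in are illustrative placeholders, not values from the deck):

```python
# Tuned weights from the slide; the translation models contribute
# several feature scores each, hence the lists.
WEIGHTS = {
    "lm": 0.0569997,
    "word_penalty": 0.118199,
    "phrase_penalty": 0.127955,
    "tm0": [0.148562, -0.0700695, 0.275438, 0.0861944],
    "tm1": [0.116582],
}

def score(features):
    """Weighted sum of feature values; `features` mirrors WEIGHTS:
    scalars for lm/penalties, lists for the translation-model groups."""
    total = 0.0
    for name, w in WEIGHTS.items():
        f = features[name]
        if isinstance(w, list):
            total += sum(wi * fi for wi, fi in zip(w, f))
        else:
            total += w * f
    return total
```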
§ The evaluation was carried out in two directions:
  § automatic evaluation, and
  § manual evaluation with 4 expert translators.
§ The automatic evaluation shows significant improvement on three automatic evaluation metrics:
  § BLEU,
  § TER, and
  § METEOR.
§ Our evaluation using human judgments shows that the APE system consistently improves overall translation adequacy; it improved 7% of the post-edited sentences.
Evaluations
§ Automatic sentence-level evaluation over 145 sentences
Evaluations
§ Automatic evaluation over 1000 sentences:

Metric        | APE better than Google | Google better than APE | % improvement (of 1000) | % loss (of 1000)
Sentence BLEU | 91                     | 54                     | 9.1%                    | 5.4%

Evaluations
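The sentence-level BLEU comparison above can be reproduced with a smoothed sentence BLEU. A minimal sketch using add-one smoothing on the n-gram precisions (the deck does not state which smoothing was used):

```python
import math
from collections import Counter

def sentence_bleu(hypothesis, reference, max_n=4):
    """Smoothed sentence-level BLEU: geometric mean of add-one smoothed
    n-gram precisions, times the brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = sum(hyp_ngrams.values())
        log_prec += math.log((overlap + 1) / (total + 1))  # add-one smoothing
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp))) if hyp else 0.0
    return bp * math.exp(log_prec / max_n)
```

Scoring each test sentence once against its post-edited reference for the APE output and once for the baseline MT output gives the per-sentence win/loss counts in the table.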
Metric | APE System | Google Translation | % relative improvement
BLEU   | 63.87      | 61.26              | 4.2%
TER    | 28.67      | 30.94              | 7.9%
METEOR | 73.63      | 72.73              | 1.2%

§ Manual evaluation with 4 expert translators over 145 sentences (EN = English, DE = German, FR = French, ES = Spanish, CA = Catalan, IT = Italian)
Human Evaluations
Translator     | Qualification                             | Expertise           | Experience | APE System | Google Translation | Uncertain
Translator 1   | Degree in Translation                     | EN, FR → IT         | 1 year     | 91         | 22                 | 32
Translator 2   | Degree in Linguistic and Cultural Studies | EN, FR, ES, CA → IT | 2 years    | 57         | 17                 | 71
Translator 3   | Degree in European Languages and Cultures | EN, FR, ES, DE → IT | 1 year     | 72         | 37                 | 36
Translator 4   | Degree in Business & Administration       | EN → IT             | 1 year     | 65         | 23                 | 58
Average        |                                           |                     |            | 71         | 25                 | 49
% improvement  |                                           |                     |            | 7.1%       | 2.5%               | 4.9%
[Bar chart (y-axis 0-100): votes per translator (Translators 1-4) for APE System, Google Translation, and Uncertain]
Human Evaluations
§ Overall evaluation of the 1000 sentences in total
[Chart legend: equal output given by MT and APE; APE chosen by human; MT chosen by human; Uncertain]
Human Evaluations
Human Evaluation
[Bar chart (y-axis 0-8): % of the 145 differing sentences out of 1000 (the rest are ties between MT and APE), broken down into APE chosen by human, MT chosen by human, and Uncertain]
§ Of the 145 sentences evaluated by the 4 translators, the overall counts (based on at least one translator voting for a particular system) are:
  § APE: 105
  § Uncertain: 94
  § GT: 62
Human Evaluation
§ The proposed APE system was successful in improving over the baseline MT system's performance.
§ Although some APE translations were deemed worse than the original MT output by the human evaluators, they were very few in number.
§ Manual inspection revealed that these lower-quality APE translations are very similar to the original MT translations.
§ Such worse translations can be avoided by adding more features (e.g., syntactic or semantic), which can also improve the overall performance of the post-editing system.
§ The presented system can easily be plugged into any state-of-the-art system, and its runtime complexity is similar to that of other statistical MT systems.
Conclusions
§ In future work, we will try bootstrapping strategies for further tuning the model and add more sophisticated features beyond the lexical level.
§ We will improve our hybrid word alignment algorithm by incorporating additional word aligners such as fast_align, Anymalign, etc.
§ We also want to extend the system by incorporating source-side knowledge, as well as improving word ordering by using a Kendall-tau-based reordering method.
§ To consolidate the user evaluation, we will measure inter-annotator agreement.
§ We will also evaluate our system in a real-life commercial setting to analyse the time and productivity gains provided by automatic post-editing.
Future Works
§ Thank you!