ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
Statistical Automatic Post-Editing
Santanu Pal * This work was carried out at Translated
§ Introduction
§ Motivations
§ System Description
  § Preprocessing
  § Improved Word Alignment
  § Hierarchical PB-SMT
  § Advantages of using Hierarchical PB-SMT
§ Experiments
§ Evaluations
§ Conclusions
Outline
§ The translations provided by current MT systems often fail to be perfect.
§ To achieve output of sufficient quality, translations often need to be corrected, or post-edited, by human translators.
§ "Post-Editing" (PE) is defined as the correction by humans of the translation produced by an MT system (Veale and Way, 1997).
§ It is often described as the process of improving a translation provided by an MT system with a minimum of manual labor (TAUS report, 2010).
Introduction
§ The major goals of using an APE system:
  § post-editing MT output instead of translating from scratch
  § time-saving
  § cost-effectiveness
  § reducing the effort of the human post-editor
§ In some cases, recent studies have even shown that:
  § the quality of MT plus PE can exceed the quality of human translation (Fiederer and O'Brien, 2009; Koehn, 2009; DePalma and Kelly, 2009)
  § and likewise productivity (Zampieri and Vela, 2014).
Introduction (contd.)
§ Many studies have examined the impact of various factors and methods on the volume of PE effort, but not in a commercial work environment.
§ The overall purpose of the present study is to answer two fundamental questions:
  § What would be the optimal design of a PE system, which is ultimately determined by the quality of MT output in a commercial work environment?
  § How can human involvement be optimized to reduce post-editing effort in a commercial work environment?
Introduction (contd.)
§ The advantage of an APE system is that it can take the output of any black-box MT engine as input and produce automatically post-edited output, without having to retrain or re-implement the first-stage MT engine.
§ PB-SMT (Koehn et al., 2003) can be applied as an APE system (Simard et al., 2007).
  § The SMT system is trained on RBMT output and reference human translations.
  § This PB-SMT-based APE system is able to correct the systematic errors produced by the RBMT system.
  § This approach achieved large improvements in performance.
Motivation
§ Errors in translations provided by MT include:
  § wrong lexical choice
  § incorrect word ordering
  § word insertion
  § word deletion
§ The proposed APE system, based on HPB-SMT and improved hybrid alignments, is able to handle the above errors.
§ This method is also able to correct word-ordering errors to some extent.
Motivation (contd.)
§ The APE system is based on a monolingual SMT system trained with MT output as the source language and reference human translations as the target language.
§ The proposed APE system is designed as follows:
  § Preprocess data
    § parallel text [MT output and PE output]
    § monolingual data
  § Improved word alignment (hybrid)
  § Hierarchical PB-SMT
System Design
§ Parallel text and monolingual data:
  § Some sentences are noisy and mixed with other languages.
  § Some sentences contain URLs.
§ The preprocessor cleans this noise by using a language identification tool.
Preprocessing
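The cleaning step above can be sketched as follows. This is a hypothetical illustration: the deck does not name the language-identification tool used, so a trivial stopword heuristic (`looks_like_target_language`) stands in for it here.

```python
import re

# Matches URLs so that noisy segments can be dropped.
URL_RE = re.compile(r"https?://\S+|www\.\S+")

# Tiny stand-in vocabulary; a real system would call a language-ID tool.
ITALIAN_STOPWORDS = {"il", "la", "di", "che", "e", "un", "per", "non"}

def looks_like_target_language(sentence, threshold=0.1):
    """Placeholder language check based on stopword density."""
    words = sentence.lower().split()
    if not words:
        return False
    hits = sum(1 for w in words if w in ITALIAN_STOPWORDS)
    return hits / len(words) >= threshold

def clean_corpus(pairs):
    """Keep only MT/PE pairs that contain no URLs and pass the language check."""
    kept = []
    for mt, pe in pairs:
        if URL_RE.search(mt) or URL_RE.search(pe):
            continue
        if looks_like_target_language(pe):
            kept.append((mt, pe))
    return kept
```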
§ Statistical word aligners:
  § GIZA++
    § implements maximum likelihood estimators for the IBM Models 1-5, an HMM alignment model, and Model 6
  § SymGiza++
    § computes symmetric word alignment models, with the capability to take advantage of multi-processor systems
    § alignment quality improves by more than 17% compared to GIZA++
  § Berkeley Aligner
    § a cross-Expectation-Maximization word aligner
    § its jointly trained HMM models reduce AER by 29%
§ Edit-distance-based aligners:
  § TER alignment
  § METEOR alignment
Improved Word Alignment
§ Edit-distance-based aligners:
  § TER alignment
    § TER is an evaluation metric which measures the ratio of the number of edit operations required to turn a hypothesis H into the corresponding reference R to the total number of words in R.
    § It can also be used as a monolingual word aligner.
  § METEOR alignment
Improved Word Alignment (contd.)
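The edit-operation count behind TER can be sketched with word-level Levenshtein distance. Note this is a simplification: full TER additionally allows block shifts, which this sketch omits.

```python
def ter_no_shifts(hypothesis, reference):
    """Word edit rate: (insertions + deletions + substitutions) / |R|.
    Full TER also permits block shifts; this sketch omits them."""
    h, r = hypothesis.split(), reference.split()
    # Standard dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        d[i][0] = i
    for j in range(len(r) + 1):
        d[0][j] = j
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion from H
                          d[i][j - 1] + 1,         # insertion into H
                          d[i - 1][j - 1] + cost)  # match / substitution
    return d[len(h)][len(r)] / len(r)
```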
§ METEOR alignment:
  § The alignment between the words of H and R is built incrementally by a sequence of word-mapping modules:
    § Exact: maps words if they are exactly the same.
    § Porter stem: maps words if they are the same after stemming.
    § WN synonymy: maps words if they are considered synonyms in WordNet.
  § If multiple alignments exist, METEOR selects the alignment with the fewest crossing alignment links.
  § The final alignment between H and R is produced as the union of all stage alignments (i.e., Exact, Porter stem, and WN synonymy).
Improved Word Alignment (contd.)
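The staged matching can be sketched as below. The crude suffix-stripping stemmer and the tiny synonym table are stand-ins for the Porter stemmer and WordNet, and the greedy pass omits METEOR's fewest-crossings tie-breaking, so treat this as an illustration only.

```python
def crude_stem(word):
    """Toy stand-in for the Porter stemmer: strip a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Tiny stand-in for WordNet synonymy.
SYNONYMS = {("quick", "fast"), ("fast", "quick")}

STAGES = [
    ("exact",   lambda h, r: h == r),
    ("stem",    lambda h, r: crude_stem(h) == crude_stem(r)),
    ("synonym", lambda h, r: (h, r) in SYNONYMS),
]

def meteor_align(hyp_words, ref_words):
    """Greedy one-to-one alignment built stage by stage, as in METEOR:
    later stages only align words left unmatched by earlier stages."""
    alignment, used_h, used_r = [], set(), set()
    for _, match in STAGES:
        for i, h in enumerate(hyp_words):
            if i in used_h:
                continue
            for j, r in enumerate(ref_words):
                if j not in used_r and match(h, r):
                    alignment.append((i, j))
                    used_h.add(i)
                    used_r.add(j)
                    break
    return sorted(alignment)
```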
§ Hybridization
  § Union: in the union method, we consider all alignments correct. All the alignment tables are unioned together and duplicate entries are removed.
Improved Word Alignment (contd.)
§ Add additional alignments:
  § Consider one of the alignments generated by GIZA++ GDFA (A1), the Berkeley aligner (A2), or SymGiza++ (A3) as the standard alignment (SA); A4 and A5 are the edit-distance-based TER and METEOR alignments.
  § ALGORITHM:
    § Step 1: Choose a standard alignment (SA) from A1, A2, and A3.
    § Step 2: Correct the alignment of SA by looking at the alignment tables of A4 and A5.
    § Step 3: Find additional alignments from A2, A3, A4, and A5 using the intersection method (A2∩A3∩A4∩A5) if A1 is the SA.
    § Step 4: Add the additional entries to SA.
Improved Word Alignment (contd.)
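With each alignment represented as a set of (source index, target index) links, the union method and the ADD algorithm above can be sketched as follows. This is only an illustration: the deck does not spell out the correction rule of Step 2, so this sketch keeps SA unchanged and only adds the intersected links of Step 3.

```python
def union_alignment(*alignments):
    """Union method: treat every link from every aligner as correct;
    set union removes duplicate entries automatically."""
    return set().union(*alignments)

def add_additional_alignments(a1, a2, a3, a4, a5):
    """Sketch of the ADD algorithm with A1 (GIZA++ GDFA) as the standard
    alignment (SA). Step 2's correction rule is not fully specified in the
    deck, so it is omitted here."""
    sa = set(a1)                                         # Step 1: choose SA
    additional = set(a2) & set(a3) & set(a4) & set(a5)   # Step 3: intersection
    return sa | additional                               # Step 4: add entries to SA
```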
§ Hierarchical PB-SMT is based on Synchronous Context-Free Grammar (SCFG) (Aho and Ullman, 1969). An SCFG has rewrite rules with aligned pairs on the right-hand side (Chiang, 2005):
  § X → ⟨γ, α, ∼⟩
  § where X represents a nonterminal, γ and α represent strings of terminals and nonterminals, and
  § ∼ represents a one-to-one correspondence between occurrences of nonterminals in γ and α.
Hierarchical PB-SMT
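A toy illustration of how such a rule rewrites aligned pairs, using the classic French-negation example (not a rule from the deck's grammar): the rule X → ⟨ne X1 pas, not X1⟩ has one gap X1, and the ∼ correspondence says the gap is filled by the same sub-derivation on both sides.

```python
# Each rule maps X to an aligned pair of source/target templates;
# "X1" marks the co-indexed nonterminal gap on both sides (the ~ link).
RULES = {
    "neg": ("ne X1 pas", "not X1"),  # hierarchical rule with one gap
    "va":  ("va", "go"),             # plain lexical rule
}

def derive(rule, gap=None):
    """Expand a rule; if it has a gap, fill X1 on both sides with the
    source/target yield of the sub-derivation."""
    src, tgt = RULES[rule]
    if gap is not None:
        sub_src, sub_tgt = derive(gap)
        src, tgt = src.replace("X1", sub_src), tgt.replace("X1", sub_tgt)
    return src, tgt

print(derive("neg", gap="va"))  # ('ne va pas', 'not go')
```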
§ There are two additional rules, called the "glue rules" or "glue grammar":
  § S → ⟨S X, S X⟩
  § S → ⟨X, X⟩
§ These rules are used when:
  § no other rule matches, or
  § the span exceeds a certain length.
§ They simply connect the translations of two adjacent blocks together monotonically.
Hierarchical PB-SMT (contd.)
§ The hybrid word alignment provides a better-quality alignment table.
§ During phrase extraction, the system can automatically handle and estimate:
  § word insertion errors (by considering one-to-many alignment links)
  § word deletion errors (by considering many-to-one alignment links)
  § lexical errors (by estimating high lexical weighting during model estimation)
  § word ordering (the hierarchical model facilitates reordering, because it uses formally hierarchical phrases)
Benefits
§ Dataset
  § The MateCat data contains 312K segments.
  § After cleaning: 213,795 parallel MT-PE segments
    § Training data: 211,795
    § Development set: 1,000
    § Test set: 1,000
  § The monolingual data consists of the PE data and cleaned Europarl data.
Experiments
§ Experimental setup:
  § 5-gram language model [KenLM]
  § phrase length 7
  § Hierarchical PB-SMT [Moses]
    § maximum chart span 100
    § minimum chart span 20
    § Good-Turing discounting of the phrase translation probabilities
    § filtered phrase table for faster decoding
Experiments (contd.)
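Good-Turing discounting, used above for the phrase translation probabilities, replaces each raw count r with r* = (r+1) · N(r+1) / N(r), where N(r) is the number of distinct events seen exactly r times. A minimal sketch:

```python
from collections import Counter

def good_turing_adjusted_counts(counts):
    """Replace each raw count r with r* = (r+1) * N_{r+1} / N_r."""
    freq_of_freq = Counter(counts.values())
    adjusted = {}
    for event, r in counts.items():
        n_r, n_r1 = freq_of_freq[r], freq_of_freq.get(r + 1, 0)
        # Fall back to the raw count when N_{r+1} is zero (sparse high counts);
        # real toolkits use smoothed fits of N_r instead.
        adjusted[event] = (r + 1) * n_r1 / n_r if n_r1 else r
    return adjusted

print(good_turing_adjusted_counts({"a": 1, "b": 1, "c": 2, "d": 3}))
# {'a': 1.0, 'b': 1.0, 'c': 3.0, 'd': 3}
```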
§ System tuning with development data:
  § MERT
    § 60% TER and 40% BLEU
    § maximum iterations: 25
  § MIRA
    § batch MIRA
  § Tuned parameters for the monolingual APE system MT-IT → APE-IT:
    § language model = 0.0569997
    § word penalty = 0.118199
    § phrase penalty = 0.127955
    § translation model 0 = 0.148562, -0.0700695, 0.275438, 0.0861944
    § translation model 1 = 0.116582
Experiments (contd.)
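The tuned weights above feed a Moses-style log-linear model, which scores each hypothesis as the weighted sum of its feature values. A minimal sketch (the feature values passed in are illustrative placeholders, not values from the deck):

```python
# Tuned weights from the slide; the translation models contribute
# several feature scores each, hence the lists.
WEIGHTS = {
    "lm": 0.0569997,
    "word_penalty": 0.118199,
    "phrase_penalty": 0.127955,
    "tm0": [0.148562, -0.0700695, 0.275438, 0.0861944],
    "tm1": [0.116582],
}

def score(features):
    """Weighted sum of feature values; `features` mirrors WEIGHTS:
    scalars for lm/penalties, lists for the translation-model groups."""
    total = 0.0
    for name, w in WEIGHTS.items():
        f = features[name]
        if isinstance(w, list):
            total += sum(wi * fi for wi, fi in zip(w, f))
        else:
            total += w * f
    return total
```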
§ The evaluation was carried out in two directions:
  § automatic evaluation, and
  § manual evaluation with 4 expert translators.
§ The automatic evaluation shows significant improvement on three automatic evaluation metrics:
  § BLEU,
  § TER, and
  § METEOR.
§ Our evaluation using human judgments shows that the APE system consistently improves overall translation adequacy; it improved 7% of the post-edited sentences.
Evaluations
§ Automatic sentence-level evaluation over 145 sentences
Evaluations
§ Automatic evaluation over 1000 sentences:

Metric        | APE better than Google | Google better than APE | % improvement (of 1000) | % loss (of 1000)
Sentence BLEU | 91                     | 54                     | 9.1%                    | 5.4%

Evaluations
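The sentence-level BLEU comparison above can be reproduced with a smoothed sentence BLEU. A minimal sketch using add-one smoothing on the n-gram precisions (the deck does not state which smoothing was used):

```python
import math
from collections import Counter

def sentence_bleu(hypothesis, reference, max_n=4):
    """Smoothed sentence-level BLEU: geometric mean of add-one smoothed
    n-gram precisions, times the brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = sum(hyp_ngrams.values())
        log_prec += math.log((overlap + 1) / (total + 1))  # add-one smoothing
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp))) if hyp else 0.0
    return bp * math.exp(log_prec / max_n)
```

Scoring each test sentence once against its post-edited reference for the APE output and once for the baseline MT output gives the per-sentence win/loss counts in the table.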
Metric | APE System | Google Translation | % relative improvement
BLEU   | 63.87      | 61.26              | 4.2%
TER    | 28.67      | 30.94              | 7.9%
METEOR | 73.63      | 72.73              | 1.2%

§ Manual evaluation with 4 expert translators over 145 sentences (EN = English, DE = German, FR = French, ES = Spanish, CA = Catalan, IT = Italian)
Human Evaluations
Translator     | Qualification                             | Expertise           | Experience | APE System | Google Translation | Uncertain
Translator 1   | Degree in Translation                     | EN, FR → IT         | 1 year     | 91         | 22                 | 32
Translator 2   | Degree in Linguistic and Cultural Studies | EN, FR, ES, CA → IT | 2 years    | 57         | 17                 | 71
Translator 3   | Degree in European Languages and Cultures | EN, FR, ES, DE → IT | 1 year     | 72         | 37                 | 36
Translator 4   | Degree in Business & Administration       | EN → IT             | 1 year     | 65         | 23                 | 58
Average        |                                           |                     |            | 71         | 25                 | 49
% improvement  |                                           |                     |            | 7.1%       | 2.5%               | 4.9%
[Bar chart (y-axis 0-100): votes per translator (Translators 1-4) for APE System, Google Translation, and Uncertain]
Human Evaluations
§ Overall evaluation of the 1000 sentences in total
[Chart legend: equal output given by MT and APE; APE chosen by human; MT chosen by human; Uncertain]
Human Evaluations
Human Evaluation
[Bar chart (y-axis 0-8): % of the 145 differing sentences out of 1000 (the rest are ties between MT and APE), broken down into APE chosen by human, MT chosen by human, and Uncertain]
§ Of the 145 sentences evaluated by the 4 translators, the overall counts (based on at least one translator voting for a particular system) are:
  § APE: 105
  § Uncertain: 94
  § GT: 62
Human Evaluation
§ The proposed APE system was successful in improving over the baseline MT system's performance.
§ Although some APE translations were deemed worse than the original MT output by the human evaluators, they were very few in number.
§ Manual inspection revealed that these lower-quality APE translations are very similar to the original MT translations.
§ Such worse translations can be avoided by adding more features (e.g., syntactic or semantic), which can also improve the overall performance of the post-editing system.
§ The presented system can easily be plugged into any state-of-the-art system, and its runtime complexity is similar to that of other statistical MT systems.
Conclusions
§ In future work, we will try bootstrapping strategies for further tuning the model and add more sophisticated features beyond the lexical level.
§ We will improve our hybrid word alignment algorithm by incorporating additional word aligners such as fast_align, Anymalign, etc.
§ We also want to extend the system by incorporating source-side knowledge, as well as improving word ordering by using a Kendall-tau-based reordering method.
§ To consolidate the user evaluation, we will measure inter-annotator agreement.
§ We will also evaluate our system in a real-life commercial setting to analyse the time and productivity gains provided by automatic post-editing.
Future Works
§ Thank you!