
 

[Engine] Accurate Quantification! Quality Evaluation of Machine Translation Engines

"I don't know which translation engine is good."
This is a common question we hear from customers who are considering introducing machine translation.
Automatic metrics such as the BLEU and TER scores are well-known methods of evaluating machine translation quality, but because these evaluations are performed mechanically, their results often fail to match the actual quality.
At Human Science Co., Ltd., we therefore combine automatic evaluation with evaluation by native translators to accurately assess the quality of machine translation engines.

 

Evaluation Method

We evaluate the output of three to five engines using two methods: human evaluation (evaluation by native translators) and automatic evaluation (BLEU and TER scores).

Quality Evaluation by Multiple Native Translators

Each MT engine's output is evaluated on five criteria: "grammar", "fluency", "terminology", "consistency", and "cultural and linguistic characteristics", using a four-grade scale: "Excellent", "Good", "Medium", and "Poor".

  • Excellent…Excellent quality, comparable to human translation; very few revisions needed.
  • Good…Good quality; there are some errors, but the original meaning can be understood.
  • Medium…Average quality; corrections are required.
  • Poor…Poor quality; needs to be retranslated from scratch.
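
For illustration only, here is a minimal sketch of how grades from several evaluators could be tallied per criterion. The evaluator data and the code itself are hypothetical, not results from an actual evaluation.

from collections import Counter

# Hypothetical grades from three native translators for one engine,
# on the five criteria described above.
GRADES = ["Excellent", "Good", "Medium", "Poor"]
evaluations = {
    "grammar": ["Good", "Good", "Medium"],
    "fluency": ["Excellent", "Good", "Good"],
    "terminology": ["Medium", "Medium", "Poor"],
    "consistency": ["Good", "Good", "Good"],
    "cultural and linguistic characteristics": ["Good", "Medium", "Good"],
}

# Tally the grade distribution for each criterion.
for criterion, grades in evaluations.items():
    counts = Counter(grades)
    summary = ", ".join(f"{g}: {counts.get(g, 0)}" for g in GRADES)
    print(f"{criterion} -> {summary}")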

After machine translation, post-editing is necessary to improve the quality of the translated text. We therefore also evaluate the output on the following two points.

  • Can the quality be improved by post-editing?
  • Would translating from scratch produce better quality than post-editing?

The dividing line between output that can be improved by post-editing and output that is better retranslated from scratch depends on the quality you are aiming for. We therefore confirm the target quality with you before conducting the evaluation.
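
Purely as an illustration of how this decision could be made systematic (an assumption on our part, not Human Science's actual criteria), a TER-style error rate could be compared against a threshold that shifts with the target quality:

# Illustrative only: the threshold values and quality labels below are
# hypothetical, not Human Science's actual criteria.
THRESHOLDS = {
    "publication": 30.0,    # strict target: retranslate sooner
    "internal use": 50.0,   # looser target: more output is post-editable
}

def triage(ter_score: float, target_quality: str) -> str:
    """Decide whether a segment is worth post-editing or should be
    retranslated from scratch, given the target quality level."""
    threshold = THRESHOLDS[target_quality]
    return "post-edit" if ter_score <= threshold else "retranslate from scratch"

print(triage(25.0, "publication"))   # post-edit
print(triage(45.0, "publication"))   # retranslate from scratch
print(triage(45.0, "internal use"))  # post-edit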

Automatic Evaluation by Score
  • BLEU Score
    The automatic evaluation metric most widely used for evaluating MT systems.
    It evaluates translation accuracy by measuring the similarity between the translation result and a reference translation (correct answer). Scores range from 0% to 100%, and the higher the score, the better the quality. A score of 50% or higher indicates good quality.

  • TER Score
    An automatic evaluation metric that calculates the translation error rate.
    It calculates the percentage of edits (substitutions, insertions, deletions, shifts) that must be made to the translation result in order to obtain the reference translation (correct answer).
    The error rate is calculated between 0% and 100%, and the lower the score, the fewer the errors and the lighter the post-editing load.
    A value below 30% can be considered good quality.
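
As a concrete example, both scores can be computed with the open-source sacrebleu Python library. This is a minimal sketch under the assumption that sacrebleu is available; it is not necessarily the tooling behind our evaluations, and the sample sentences are made up.

# Minimal sketch: computing BLEU and TER with sacrebleu
# (pip install sacrebleu). The data below is illustrative.
import sacrebleu

# MT output from one engine, and the corresponding human references.
hypotheses = [
    "The printer cannot connect to the network.",
    "Replace the toner cartridge before printing.",
]
references = [[
    "The printer cannot connect to the network.",
    "Replace the toner cartridge before you print.",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
ter = sacrebleu.corpus_ter(hypotheses, references)

# BLEU: higher is better (50 or above suggests good quality).
# TER: lower is better (below 30 suggests a light post-editing load).
print(f"BLEU: {bleu.score:.1f}")
print(f"TER:  {ter.score:.1f}")

Because a single reference rarely covers every acceptable phrasing, both scores are rough indicators rather than absolute judgments, which is why we pair them with evaluation by native translators.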

 

Quality Evaluation Analysis Report

Below is a sample of the quality-comparison report for three machine translation engines that we submit as a deliverable in actual projects.

 

 

Related Services

Machine Translation Evaluation Service

Machine Translation Seminar Scheduled
 
The machine translation seminar is held every month.
If you would like to receive seminar information emails, please register using the button below.

Blog Writing Team


Tokuda Ai

・As a machine translation consultant, I advise Japanese companies on introducing machine translation and building the processes around it.
・Because source-text quality is also important, I provide consulting on writing Japanese manuals that are well suited to machine translation.
・I have also given presentations on machine translation, including the following:
- Presentation at the 23rd JTF (Japan Translation Federation) Translation Festival, 2013:
"Approaches to Machine Translation in Multilingual Environments - From the Perspectives of Evaluation and Process"
- Presentation at the 2014 AAMT (Asia-Pacific Association for Machine Translation) Machine Translation Fair:
"Mastering Machine Translation - Improving Quality and Productivity"

