Some parts of this page may be machine-translated.

 

Improve machine translation accuracy using AutoML Translation


Our machine translation solutions, MTrans for Memsource and MTrans for Trados, now support Google Cloud AutoML Translation (hereinafter "AutoML Translation"). By combining a translation support tool (Memsource or Trados) with MTrans for Memsource/Trados, you can apply the terminology and style conversion functions to your own machine translation models.

 

We evaluated AutoML Translation while developing MTrans for Memsource/Trados. This article explains the evaluation results and AutoML Translation itself in detail.

 

 

Table of Contents

1. What is AutoML Translation?

2. What are the benefits of AutoML Translation?

3. Data, Time, and Cost Required for One Model Training

4. Translation Cost

5. Evaluation Results

6. Translation Examples and Challenges

7. Solution to the Problem

8. Best Practices

 

 

1. What is AutoML Translation?

AutoML Translation is a Google service that lets customers train their own machine translation models by further training Google's general-purpose machine translation model on their own pairs of source and translated text.

2. What are the benefits of AutoML Translation?

A trained model produces translations suited to a specific field, which reduces the time required for post-editing.

3. Data, Time, and Cost Required for One Model Training

・Data: more than 1,000 pairs of source and translated sentences
・Training time: 2 hours or more, depending on the amount of data
・Cost: $90 to $300, depending on the training time
https://cloud.google.com/translate/automl/pricing

4. Translation Cost

0-500,000 characters: Free
500,000-25,000,000 characters: $80 per 1 million characters
https://cloud.google.com/translate/automl/pricing
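
The two tiers listed above can be turned into a rough cost estimate. The following is a minimal sketch, not an official calculator. It assumes that only the characters beyond the free 500,000 are billed, and it refuses volumes above 25,000,000 characters because no rate is listed for them here. The function and constant names are hypothetical.

```python
FREE_CHARS = 500_000            # free allowance listed in this article
MAX_LISTED_CHARS = 25_000_000   # upper bound of the listed paid tier
USD_PER_MILLION_CHARS = 80.0    # listed rate for the paid tier


def estimate_translation_cost_usd(characters: int) -> float:
    """Estimate the cost in USD of translating `characters` characters."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    if characters > MAX_LISTED_CHARS:
        raise ValueError("no rate is listed above 25,000,000 characters")
    # Assumption: only characters beyond the free allowance are billed.
    billable = max(0, characters - FREE_CHARS)
    return billable * USD_PER_MILLION_CHARS / 1_000_000


if __name__ == "__main__":
    for volume in (300_000, 1_500_000, 10_000_000):
        cost = estimate_translation_cost_usd(volume)
        print(f"{volume:>12,} characters -> ${cost:,.2f}")
```

Under these assumptions, 1,500,000 characters would cost $80. Check the pricing page for the current rates before budgeting.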

5. Evaluation Results

We compared Google's general model with our original (custom-trained) model using BLEU scores and found that translation accuracy had improved. (BLEU is an automatic metric that measures how similar machine-translated text is to human-translated text. The higher the score, the higher the translation accuracy is considered to be.)

 

BLEU scores
General model: 39.71
Original model: 42.89
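
To make the idea behind BLEU concrete, here is a simplified sketch of a sentence-level score on a 0 to 100 scale. It is not how the figures above were produced. It assumes whitespace tokenization (Japanese text would need a tokenizer first), a single reference translation, n-grams up to length 4, and no smoothing. The function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    multiplied by a brevity penalty, scaled to 0-100."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, one zero precision gives zero
        log_precisions.append(math.log(overlap / total))
    # Penalize candidates that are shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return 100 * brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(simple_bleu("the cat sat on the mat", reference))
    print(simple_bleu("the cat sat on the mat today", reference))
```

An identical candidate scores 100, and the score falls as the machine translation diverges from the human translation. Production evaluations are normally computed over a whole test set rather than a single sentence.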

 

Overall, translation accuracy improved by 3.18 BLEU points, but what about individual translations?

6. Translation Examples and Challenges

・Issue 1: The translation of a sentence included in the training data is not always used as is.

In the following example, the first half of the original model's translation matches the training data, but the second half does not. Even so, as a result of training, the translation as a whole is similar to the training data.

 

Source: That’s not right, but try it again
Training data translation: Incorrect. Please try again.
General model translation: That's not it, please try again.
Original model translation: Incorrect. Please try again.

 

・Issue 2: The translation of a term included in the training data is not always used as is.

In the following example, we want "Layer 2" to be translated as "第2層", but the original model does not use that translation.

 

Source: The NIC exists on the ‘Data Link Layer’ (Layer 2).
Training data translation: NIC is located in the "data link layer" (layer 2).
General model translation: NIC exists in the "data link layer" (layer 2).
Original model translation: NIC exists in the "data link layer" (Layer 2).

 

・Issue 3: Style rules are ignored.

In the following example, two style rules are ignored: inserting a space between full-width and half-width characters, and using half-width parentheses.

 

Source: The NIC exists on the ‘Data Link Layer’ (Layer 2).
Training data translation: NIC▲ is located in the "data link layer"▲ (layer 2) ▲.
("▲" indicates a half-width space)
General model translation: NIC exists in the "data link layer" (layer 2).
Original model translation: NIC exists in the "data link layer" (Layer 2).

7. Solution to the Problem

Some challenges cannot be solved just by creating your own model. The solution for each is as follows.

 

・Issue 1: When you want to reuse previously translated text
Use the translation memory in your translation support tool. A translation memory also stores context information, so past translations can be reused more reliably.

 

・Issue 2: When you want to use specific terminology
Use a glossary function. There are two options: the one provided by Google and the one built into MTrans for Memsource/Trados. For details on Google's glossary function, see "Creating and Using Glossaries (Advanced Features)" in the Google Cloud Translation documentation. As that guide shows, using Google's glossary function involves a fairly complicated procedure. The glossary function in MTrans for Memsource/Trados can be used with a simple procedure.

 

・Issue 3: When you want to use a specific style
Use the style conversion function of MTrans for Memsource/Trados. It can insert spaces between full-width and half-width characters, convert symbols to full-width or half-width, and unify sentence endings to the "dearu" style.
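
The spacing and parenthesis rules mentioned above can be illustrated as a post-processing step on machine translation output. This is a minimal sketch of the general idea, not the implementation used in MTrans for Memsource/Trados. The character ranges, the function name, and the sample sentence are assumptions made for illustration, and unifying to the "dearu" style is not covered.

```python
import re

# Assumed full-width range: hiragana, katakana, and common kanji.
FULL_WIDTH = r"[\u3040-\u30ff\u4e00-\u9fff]"
# Assumed half-width range: ASCII letters, digits, and parentheses.
HALF_WIDTH = r"[A-Za-z0-9()]"


def apply_style_rules(text: str) -> str:
    """Convert full-width parentheses to half-width, then insert a
    half-width space between full-width and half-width characters."""
    text = text.replace("（", "(").replace("）", ")")
    # Full-width character followed by a half-width character.
    text = re.sub(f"({FULL_WIDTH})({HALF_WIDTH})", r"\1 \2", text)
    # Half-width character followed by a full-width character.
    text = re.sub(f"({HALF_WIDTH})({FULL_WIDTH})", r"\1 \2", text)
    return text


if __name__ == "__main__":
    # Hypothetical machine translation output, made up for this example.
    raw = "NICは第2層（Layer 2）にあります"
    print(apply_style_rules(raw))  # NIC は第 2 層 (Layer 2) にあります
```

Because the rules run after translation, they apply in the same way whichever engine or model produced the text.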

8. Best Practices

AutoML Translation improves the overall accuracy of machine translation, but it is not perfect. Creating a model also takes time and money. Before introducing AutoML Translation, we recommend checking whether anything in your existing environment can be improved.

 

The quality of the translation memory has a significant impact on translator productivity. It is important to maintain the translation memory regularly so that it does not contain mistranslations, unfinished translations, or badly outdated translations. This translation memory is also used for model training in AutoML Translation. Accumulating a high-quality translation memory therefore not only improves translator productivity but also prepares training data for AutoML Translation. Conversely, if you do not have the data needed for model training, it may be premature to consider introducing AutoML Translation.
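
Part of this maintenance can be automated before a translation memory is used as training data. The sketch below assumes the segments have already been exported as (source, target) pairs. It drops unfinished, apparently untranslated, and duplicate pairs, then checks the result against the minimum of 1,000 pairs stated earlier. The function names and the heuristics are hypothetical, and mistranslations or outdated translations still need human review.

```python
MIN_TRAINING_PAIRS = 1000  # minimum number of pairs stated in this article


def clean_pairs(pairs):
    """Return segment pairs with obvious problems removed."""
    seen = set()
    cleaned = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if not source or not target:
            continue  # unfinished translation: one side is empty
        if source == target:
            continue  # heuristic: probably left untranslated
        if (source, target) in seen:
            continue  # exact duplicate of an earlier pair
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned


def has_enough_pairs(pairs) -> bool:
    """Check the cleaned data against the stated minimum (more than 1,000)."""
    return len(pairs) > MIN_TRAINING_PAIRS


if __name__ == "__main__":
    sample = [
        ("Hello", "こんにちは"),
        ("Hello", "こんにちは"),   # duplicate
        ("Goodbye", ""),           # unfinished
        ("OK", "OK"),              # left untranslated
    ]
    cleaned = clean_pairs(sample)
    print(len(cleaned), has_enough_pairs(cleaned))
```

The source-equals-target rule can also remove legitimate segments such as product names, so review what it discards before relying on it.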

 

Also check the functions already included in your existing machine translation services. If functions such as glossaries and style replacement are going unused, try them out. MTrans for Memsource/Trados adds glossary and style replacement functions not only to Google but also to the DeepL, Microsoft, and NAVER Papago engines. By using these functions, you can improve the accuracy of machine translation without model training.
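
One way to judge whether a glossary would help is to check existing machine translation output for glossary terms that were not applied. The following is a minimal sketch under simple assumptions: the glossary is a plain mapping from source term to required target term, and matching is by substring. The function name and the sample translation are hypothetical, and this is not how any particular glossary function is implemented.

```python
def missing_glossary_terms(source: str, translation: str, glossary: dict) -> list:
    """Return (source_term, target_term) pairs whose source term appears
    in the source text but whose target term is absent from the translation."""
    missing = []
    for source_term, target_term in glossary.items():
        # Source terms are matched case-insensitively, target terms exactly.
        if source_term.lower() in source.lower() and target_term not in translation:
            missing.append((source_term, target_term))
    return missing


if __name__ == "__main__":
    glossary = {"Layer 2": "第2層"}
    source = "The NIC exists on the 'Data Link Layer' (Layer 2)."
    # Hypothetical machine translation output, made up for this example.
    translation = "NICは「データリンク層」(レイヤー2)に存在します。"
    print(missing_glossary_terms(source, translation, glossary))
```

Segments that show up in this check often are candidates for a glossary function rather than for model training.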

 

Once you are making full use of the translation memory and the functions of your machine translation services, it is time to consider introducing AutoML Translation.

 

If you are interested in AutoML Translation and MTrans for Memsource and MTrans for Trados, please contact us. We will assist you in improving your translation tasks.

 

MTrans for Memsource

https://www.science.co.jp/nmt/service/memsource.html

 

MTrans for Trados

https://www.science.co.jp/nmt/service/nmt.html

 

 

Author Information

Takeyoshi Nakayama
Language Solutions Department
Localization Group
Automatic Translation Team Leader

  ・Over 15 years of experience in translation and review, as well as in development and support for machine translation technology.
  ・Uses technology to improve the quality and efficiency of customer and in-house post-editing projects.
  ・Has given many presentations in Japan and overseas, including at JTF Translation Festival, TC Symposium, AAMT, TAUS, and LocWorld.
  ・Has contributed articles on machine translation to "Interpretation and Translation Journal" and "Industrial Translation Perfect Guide" (both published by Icarus Publishing).

 

 

 


For those who want to know more about translation

Tokyo: +81-3-5321-3111
Nagoya: +81-52-269-8016

Reception hours: 9:30 AM to 5:00 PM JST
