Improve machine translation accuracy using AutoML Translation

Our machine translation solutions, MTrans for Memsource and MTrans for Trados, now support Google Cloud AutoML Translation (hereinafter "AutoML Translation"). By combining the translation support tool Memsource or Trados with MTrans for Memsource/Trados, you can use glossary and style conversion features with your own custom machine translation models.

While developing MTrans for Memsource/Trados, we evaluated AutoML Translation. This article explains AutoML Translation itself and the results of our evaluation in detail.

Table of Contents

1. What is AutoML Translation?

2. What are the benefits of AutoML Translation?

3. Data, time, and cost required for one model training

4. Translation Costs

5. Evaluation Results

6. Translation Examples and Challenges

7. Solutions to Issues

8. Best Practices

1. What is AutoML Translation?

AutoML Translation is a Google service for building a custom machine translation model: you further train Google's general-purpose machine translation model on your own pairs of source texts and their translations.
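Once a custom model is trained, it is called through the same Cloud Translation API as the general-purpose model; the difference is that the request names the model. The sketch below only builds a Cloud Translation v3 `translate_text` request as a plain dictionary, so it runs without credentials. The project ID, region, and model ID are hypothetical placeholders, and we assume the custom model lives in a regional location such as `us-central1`.

```python
# Hypothetical placeholders: replace with your own project, region, and model ID.
PROJECT_ID = "my-project"
LOCATION = "us-central1"  # custom models are called in a regional location, not "global"
MODEL_ID = "your-model-id"


def build_translate_request(texts, source="en", target="ja",
                            project=PROJECT_ID, location=LOCATION, model_id=MODEL_ID):
    """Build a Cloud Translation v3 translate_text request that targets a custom model."""
    parent = f"projects/{project}/locations/{location}"
    return {
        "parent": parent,
        "contents": list(texts),
        "mime_type": "text/plain",
        "source_language_code": source,
        "target_language_code": target,
        # Leaving out "model" would fall back to Google's general-purpose model.
        "model": f"{parent}/models/{model_id}",
    }


if __name__ == "__main__":
    request = build_translate_request(["That's not right, but try it again"])
    print(request["model"])
    # With the google-cloud-translate package installed and credentials configured:
    # from google.cloud import translate_v3
    # response = translate_v3.TranslationServiceClient().translate_text(request=request)
    # for translation in response.translations:
    #     print(translation.translated_text)
```

Keeping the request as data makes it easy to switch between the general-purpose and custom models when comparing their output.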

2. What are the benefits of AutoML Translation?

Because the model learns from your own data, it produces translations suited to your specific field, which reduces the time needed for post-editing.

3. Data, time, and cost required for one model training

・Data: At least 1,000 pairs of source and translated sentences
・Training time: 2 hours or more, depending on the amount of data
・Cost: $90 to $300, depending on the training time
https://cloud.google.com/translate/automl/pricing
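Training data is usually exported from translation memory as sentence pairs. The sketch below writes pairs as tab-separated lines (one source/target pair per line), which we assume is an accepted AutoML Translation dataset format; check Google's dataset documentation before uploading. It skips pairs with an empty side, replaces stray tabs and line breaks that would break the one-pair-per-line layout, and warns when fewer than the 1,000 pairs mentioned above remain.

```python
import io
import re

MIN_PAIRS = 1000  # minimum number of sentence pairs stated in the article


def clean_segment(text):
    """Replace tabs and line breaks with a single space; leave other characters untouched."""
    return re.sub(r"[\t\r\n]+", " ", text).strip()


def write_training_tsv(pairs, out):
    """Write (source, target) pairs as 'source<TAB>target' lines; return how many were written."""
    count = 0
    for source, target in pairs:
        src, tgt = clean_segment(source), clean_segment(target)
        if src and tgt:  # a pair with an empty side is useless for training
            out.write(f"{src}\t{tgt}\n")
            count += 1
    return count


if __name__ == "__main__":
    pairs = [("Try it again.", "もう一度お試しください。"), ("Layer 2", "")]
    buf = io.StringIO()
    n = write_training_tsv(pairs, buf)
    print(buf.getvalue(), end="")
    if n < MIN_PAIRS:
        print(f"Only {n} usable pairs; at least {MIN_PAIRS} are needed.")
```

To write a real file, pass `open("train.tsv", "w", encoding="utf-8", newline="")` instead of the in-memory buffer.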

4. Translation Costs

0 to 500,000 characters: Free
500,000 to 250 million characters: $80 per 1 million characters
https://cloud.google.com/translate/automl/pricing
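The tiers above can be turned into a quick cost estimate. This sketch assumes that the first 500,000 characters are free and that the $80 rate applies only to characters beyond them; it does not cover volumes above 250 million characters. Confirm both assumptions against the current pricing page.

```python
FREE_CHARS = 500_000          # free tier from the pricing above
TIER_LIMIT = 250_000_000      # upper bound of the $80 tier
USD_PER_MILLION = 80.0


def estimate_translation_cost(chars):
    """Estimated cost in USD for translating `chars` characters with a custom model."""
    if chars < 0:
        raise ValueError("character count must be non-negative")
    if chars > TIER_LIMIT:
        raise ValueError("volumes above 250 million characters are not covered by this sketch")
    billable = max(0, chars - FREE_CHARS)  # assumption: only characters past the free tier are billed
    return billable / 1_000_000 * USD_PER_MILLION


if __name__ == "__main__":
    print(f"${estimate_translation_cost(3_000_000):,.2f}")  # prints "$200.00"
```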

5. Evaluation Results

Comparing the BLEU scores of Google's general-purpose model and the custom model we created shows an improvement in translation accuracy. (BLEU is an automatic metric that measures how closely a machine translation matches a human reference translation by counting the words and word sequences they share. A higher score indicates higher translation accuracy.)

Model	BLEU score
General-purpose model	39.71
Custom model	42.89
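To make the metric concrete, here is a minimal corpus-level BLEU sketch: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty, on a 0 to 100 scale. It assumes one reference per sentence, pre-tokenized input, and no smoothing. It is not necessarily how the scores above were computed; Japanese output must be tokenized first, and published scores are normally produced with a standard tool such as sacreBLEU.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per sentence and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
            # Clipping: an n-gram counts at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(0, len(hyp) - n + 1)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match at all
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: translations shorter than the references are penalized.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyp = "nic exists at the data link layer ( layer 2 ) .".split()
    ref = "nic is located at the data link layer ( layer 2 ) .".split()
    print(round(corpus_bleu([hyp], [ref]), 2))
```

Because BLEU only counts surface overlap, a higher corpus score does not guarantee that every individual sentence improved, which is why we also inspected individual translations.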

Overall, translation accuracy improved. But what about individual translations?

6. Translation Examples and Challenges

・Issue 1: Translations of sentences included in the training data are not necessarily reproduced as is.

In the following example, the custom model reproduces the training data translation for the first half of the sentence but not for the second half. Even so, because of what the model has learned, the translation stays close to the training data.

Source	That’s not right, but try it again
Training data translation	Incorrect. Please try again.
General-purpose model output	That's not it, but please try again.
Custom model output	Incorrect. Please try again.

・Issue 2: Translations of terms included in the training data are not necessarily used as is.

In the following example, we want "Layer 2" translated as "the second layer," but the custom model does not use that translation.

Source	The NIC exists on the ‘Data Link Layer’ (Layer 2).
Training data translation	NIC is located at the "Data Link Layer" (Layer 2).
General-purpose model output	NIC exists at the "Data Link Layer" (Layer 2).
Custom model output	NIC exists at the "Data Link Layer" (Layer 2).

・Issue 3: Style rules are not followed.

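In the following example, the custom model ignores two style rules for the Japanese output: insert a half-width space between full-width and half-width characters, and use half-width parentheses. (These full-width/half-width distinctions exist only in the Japanese translations and are not visible in the English renderings below.)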

Source	The NIC exists on the ‘Data Link Layer’ (Layer 2).
Training data translation	NIC is located at the "Data Link Layer" (Layer 2).
General-purpose model output	NIC exists at the "Data Link Layer" (Layer 2).
Custom model output	NIC exists at the "Data Link Layer" (Layer 2).

7. Solutions to Issues

Some of these issues cannot be solved by a custom model alone. The solution for each is as follows.

・Issue 1: When you want to reuse a translation made in the past
Use the translation memory of your translation support tool. Because translation memory also stores contextual information, it reuses past translations more reliably.
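To illustrate how translation memory reuse works, here is a minimal fuzzy-match lookup: it scores each stored source segment against the new segment and returns the best match above a threshold. It uses Python's `difflib.SequenceMatcher` character similarity and a hypothetical 0.75 threshold; Memsource and Trados use their own match algorithms and also take context into account, which this sketch does not.

```python
from difflib import SequenceMatcher


def lookup(tm, segment, threshold=0.75):
    """Return (score, source, target) for the best TM match at or above threshold, else None.

    tm: dict mapping stored source segments to their approved translations.
    A score of 1.0 is an exact match; lower scores are fuzzy matches.
    """
    best = None
    for source, target in tm.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


if __name__ == "__main__":
    tm = {"Try it again.": "もう一度お試しください。", "That's not right.": "正しくありません。"}
    print(lookup(tm, "Try it again!"))  # fuzzy match on "Try it again."
    print(lookup(tm, "Open the file."))
```

Exact and high fuzzy matches are reused directly or lightly edited, so only segments without a good match need to go to machine translation.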

・Issue 2: When you want to use specific terminology
Use a glossary feature. Two are available: one provided by Google and one built into MTrans for Memsource/Trados. For details on Google's glossary feature, refer to the Google Cloud Translation guide "Creating and Using Glossaries (Advanced Features)". As that guide shows, Google's glossary feature requires a fairly complicated procedure, whereas the glossary feature in MTrans for Memsource/Trados can be set up with a simple one.
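One common way to enforce a glossary around any machine translation engine is to swap glossary terms for placeholders before translation and restore the approved target terms afterward. The sketch below shows that technique; it is a hypothetical illustration, not a description of how MTrans or Google's glossary feature works internally. It assumes the engine passes placeholders such as `[[T0]]` through unchanged (real engines may not) and that glossary terms begin and end with letters or digits, so that word boundaries apply.

```python
import re


def protect_terms(text, glossary):
    """Replace glossary source terms with numbered placeholders before machine translation."""
    mapping = {}
    # Longest terms first, so a long term wins over a shorter term it contains.
    for i, term in enumerate(sorted(glossary, key=len, reverse=True)):
        placeholder = f"[[T{i}]]"
        pattern = re.compile(rf"\b{re.escape(term)}\b")
        if pattern.search(text):
            text = pattern.sub(placeholder, text)
            mapping[placeholder] = glossary[term]
    return text, mapping


def restore_terms(translated, mapping):
    """Put the approved target terms back in place of the placeholders."""
    for placeholder, target in mapping.items():
        translated = translated.replace(placeholder, target)
    return translated


if __name__ == "__main__":
    glossary = {"Layer 2": "第2層", "NIC": "NIC"}  # hypothetical glossary entries
    protected, mapping = protect_terms("The NIC exists on the ‘Data Link Layer’ (Layer 2).", glossary)
    print(protected)
    mt_output = "[[T1]]は「データリンク層」([[T0]])に存在します。"  # hypothetical engine output
    print(restore_terms(mt_output, mapping))
```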

・Issue 3: When you want to use a specific style
Use the style conversion feature of MTrans for Memsource/Trados. Among other things, it can insert spaces between full-width and half-width characters, specify whether symbols are full-width or half-width, and unify sentence endings to the plain "dearu" style (ending in "de aru" rather than the polite "desu/masu").
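Two of these rules, as they apply to the Issue 3 example, can be sketched with regular expressions: convert full-width parentheses to half-width, then insert a half-width space wherever a Japanese character touches an ASCII letter or digit. Treating kana and the CJK Unified Ideographs block as "full-width" is a simplifying assumption, and the function name is hypothetical; the actual MTrans rules are not described in this article, and the "dearu" conversion is not covered here.

```python
import re

# Simplified character classes: kana plus common CJK ideographs vs. ASCII letters and digits.
FULL = r"[\u3040-\u30FF\u4E00-\u9FFF]"
HALF = r"[A-Za-z0-9]"

PAREN_TABLE = str.maketrans({"（": "(", "）": ")"})


def apply_style(text):
    """Use half-width parentheses and put a half-width space between full- and half-width characters."""
    text = text.translate(PAREN_TABLE)
    text = re.sub(rf"({FULL})({HALF})", r"\1 \2", text)  # e.g. "ー2" -> "ー 2"
    text = re.sub(rf"({HALF})({FULL})", r"\1 \2", text)  # e.g. "Cは" -> "C は"
    return text


if __name__ == "__main__":
    print(apply_style("NICは「データリンク層」（レイヤー2）に存在します。"))
    # prints "NIC は「データリンク層」(レイヤー 2)に存在します。"
```

Applying the rules after translation means they hold regardless of which engine or model produced the text, which is why this is handled outside model training.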

8. Best Practices

AutoML Translation improves the overall accuracy of machine translation, but it is not a cure-all. In addition, creating models takes both time and money. Before implementing AutoML Translation, we recommend checking whether your existing environment has room for improvement.

The quality of translation memory in particular has a significant impact on translator productivity. Maintain your translation memory regularly so that it does not contain mistranslations, incomplete translations, or badly outdated translations. Translation memory is also the source of training data for AutoML Translation models, so accumulating high-quality translation memory not only raises translator productivity but also prepares the training data. Conversely, if you do not yet have the data needed for model training, it may be premature to consider implementing AutoML Translation.
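Part of this maintenance can be automated before a translation memory is exported as training data. The sketch below flags entries with an empty target (incomplete), a target identical to the source (possibly untranslated, though some entries such as product names are legitimately identical), a last-modified date before a chosen cutoff (outdated), and sources with conflicting translations; it also drops exact duplicates. The entry layout and labels are assumptions for illustration. Mistranslations cannot be detected this way and still need human review.

```python
from collections import defaultdict
from datetime import date


def clean_tm(entries, cutoff):
    """Split TM entries into (kept, flagged).

    entries: iterable of (source, target, last_modified) tuples (a hypothetical layout).
    cutoff: entries last modified before this date are treated as outdated.
    """
    kept, flagged = [], []
    targets_by_source = defaultdict(set)
    for source, target, modified in entries:
        if not target.strip():
            flagged.append((source, target, "incomplete"))
        elif target.strip() == source.strip():
            flagged.append((source, target, "untranslated"))  # heuristic: may be legitimate
        elif modified < cutoff:
            flagged.append((source, target, "outdated"))
        else:
            kept.append((source, target))
            targets_by_source[source].add(target)
    # A source with several different translations goes to a reviewer instead of training.
    conflicting = {s for s, ts in targets_by_source.items() if len(ts) > 1}
    flagged += [(s, t, "inconsistent") for s, t in kept if s in conflicting]
    # dict.fromkeys drops exact duplicate pairs while preserving order.
    kept = list(dict.fromkeys(p for p in kept if p[0] not in conflicting))
    return kept, flagged


if __name__ == "__main__":
    entries = [
        ("Try it again.", "もう一度お試しください。", date(2023, 5, 1)),
        ("Cancel", "", date(2023, 5, 1)),
        ("Save", "保存", date(2010, 1, 1)),
    ]
    kept, flagged = clean_tm(entries, cutoff=date(2020, 1, 1))
    print(kept)
    print(flagged)
```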

Also, check the features your existing machine translation services already include. If there are unused features such as glossaries or style conversion, try them out. MTrans for Memsource/Trados adds glossary and style conversion features not only to Google but also to engines such as DeepL, Microsoft, and NAVER Papago. These features alone can improve machine translation accuracy without any model training.

Only once you are making full use of translation memory and the features of your machine translation services is it time to consider introducing AutoML Translation.

If you are interested in AutoML Translation, MTrans for Memsource, or MTrans for Trados, please contact us. We will help you improve your translation operations.

MTrans for Phrase TMS

https://www.science.co.jp/nmt/service/memsource.html

MTrans for Trados

https://www.science.co.jp/nmt/service/nmt.html