How Accurate Is Translation with Generative AI? Comparing Google, DeepL, ChatGPT, Gemini, and Claude!

In this post, we give a clear introduction to a topic that has been attracting a lot of attention recently: translation using generative AI.

For those who wonder, "AI translation seems convenient, but how accurate is it really?" or "What points should I be careful about when using it?", we have summarized everything from basic information to benefits and precautions.

In the second half of this post, we compare the translation accuracy of each service and explain the results in detail. Please use this as a reference for understanding which tool suits which situation.

Table of Contents

1. What is Generative AI?
- 1-1. Basic Explanation of Generative AI
2. Points to Note When Using Generative AI
- 2-1. Security and Privacy Risks
3. Benefits of Using Generative AI in Translation Work
4. What is the Translation Accuracy of Generative AI?

1. What is Generative AI?

1-1. Basic Explanation of Generative AI

Generative AI refers to artificial intelligence that "generates" natural text, images, audio, and more, much as a human would.

Representative tools include conversational AI like ChatGPT and image-generating AI like DALL·E. These are increasingly used in various business scenarios such as answering questions, translating and summarizing texts, and drafting proposals.

Traditional AI mainly processes tasks such as "recognition," "classification," and "prediction" based on data provided by humans, and it excels at deriving correct answers or patterns close to the correct answers from past data.

Generative AI, on the other hand, excels at picking up context and nuance from vast amounts of data and producing creative output that reads as if a human had conceived it. While traditional AI is strong at "choosing the correct answer," generative AI is skilled at "creating new things." Its major characteristic is that it enables more flexible, human-like communication.

2. Points to Note When Using Generative AI

While generative AI is convenient, it is necessary to use it with caution.

Especially in translation work, it is important to understand the following risks.

2-1. Security and Privacy Risks

When using generative AI for translation, be careful not to input confidential or personal information.

Some AI services may use the content you enter as training data, which carries a risk of information leakage. Before using these services for business purposes, check your company's information management rules and the terms of service of the AI service.
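As one way to put this precaution into practice, text can be masked before it is pasted into or sent to a translation service. The following is a minimal sketch, not a complete safeguard: the function name `mask_sensitive` and both patterns are hypothetical examples, and they only catch e-mail addresses and phone-like numbers. Real rules should follow your company's information management policy.

```python
import re

# Hypothetical patterns for illustration only; they do not cover names,
# addresses, customer IDs, or other confidential content.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\- ]{8,}\d")


def mask_sensitive(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


sample = "Contact Taro at taro.yamada@example.com or 03-1234-5678."
print(mask_sensitive(sample))
```

Masking with fixed placeholders keeps the sentence structure intact, so the translation still reads naturally and the original values can be restored by hand afterward.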

3. Benefits of Using Generative AI in Translation Work

By utilizing generative AI for translation, various benefits can be gained in terms of work efficiency and cost.

3-1. Reduction of Working Time

Generative AI can perform translations at high speed, making it possible to complete deliverables in a shorter time compared to traditional manual translation.

Especially when handling a large volume of text, a significant reduction in working time can be expected.

3-2. Cost Reduction

Reducing translators' workload and the number of requests sent to external translation services lowers translation costs.

If simple translations can be completed in-house, it also contributes to improving overall operational efficiency.

3-3. Efficiency in Multilingual Support

Generative AI also supports simultaneous translation into multiple languages, providing significant assistance to companies expanding globally.

The preparation of materials and websites for overseas markets can also be handled more quickly and flexibly than before.

4. What is the Translation Accuracy of Generative AI?

4-1. Differences Between Neural Machine Translation and Generative AI Translation

Traditional neural machine translation (NMT) uses AI to predict the optimal translation based on a large amount of training data. Its accuracy improves through continuous learning, and it excels in grammatical and vocabulary precision.

However, it is difficult to consistently maintain the overall context and style of the document, which can result in unnatural expressions.

Generative AI, on the other hand, is based on large language models (LLMs) that learn from vast amounts of text data and generate natural sentences while taking context into account. It translates by considering the meaning and flow of the whole passage rather than working word by word, which makes it adept at human-like expressions and natural phrasing.

However, overly free translation may alter the original meaning, and technical terms can be mistranslated, so it is important to choose between NMT and generative AI according to the purpose.

For this comparison, we checked 200 sentences from each of five machine translation engines and compared the number of "critical errors that significantly affect comprehension."

4-2. Overview of the Evaluation Method

Evaluation Method

We conducted an automated evaluation using an evaluation tool developed with our in-house LLM.

* For this comparison, translators did not review the evaluation results; we compared only the mechanically detected results.

Number of sentences evaluated

Each engine: 200 sentences

Types of errors checked

Omission
Mistranslation
Unnatural expressions
Grammar errors
Formatting errors

Furthermore, among the errors above, the AI classifies those that have a significant impact on comprehension of meaning or on business operations as "critical errors."
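To make the tallying concrete, here is a minimal sketch of how detected errors could be recorded and counted. It is not the actual evaluation tool: the class names, the `critical` flag, and the four sample records are all hypothetical and made up purely for illustration. Only the five error types come from the list above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ErrorType(Enum):
    OMISSION = "Omission"
    MISTRANSLATION = "Mistranslation"
    UNNATURAL = "Unnatural expressions"
    GRAMMAR = "Grammar errors"
    FORMATTING = "Formatting errors"


@dataclass(frozen=True)
class DetectedError:
    sentence_id: int
    error_type: ErrorType
    critical: bool  # hypothetical flag: significant impact on meaning or operations


def tally(errors):
    """Return (count per error type, number of critical errors)."""
    by_type = Counter(e.error_type for e in errors)
    critical = sum(1 for e in errors if e.critical)
    return by_type, critical


# Made-up records for illustration; these are NOT results from the evaluation.
errors = [
    DetectedError(1, ErrorType.OMISSION, True),
    DetectedError(1, ErrorType.UNNATURAL, False),
    DetectedError(7, ErrorType.MISTRANSLATION, True),
    DetectedError(9, ErrorType.OMISSION, False),
]

by_type, critical = tally(errors)
for error_type, count in by_type.items():
    print(f"{error_type.value}: {count}")
print(f"Critical errors: {critical}")
```

Treating "critical" as a flag on an error, rather than as a sixth error type, reflects the description above: critical errors are picked out from among the five types.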

4-3. Comparison of Total Number of Errors

First, let's look at the total number of errors detected by each engine.

Engine	Total number of errors
Google	80
DeepL	58
ChatGPT	54
Gemini	94
Claude	65

The engine with the fewest overall errors was ChatGPT, followed by DeepL. On the other hand, Gemini showed a higher total number of errors compared to the other engines.
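For readers who want to reproduce the ranking, the totals in the table can be sorted with a few lines of Python. This is just a sketch: the figures are copied from the table above, and the "errors per 100 sentences" column is a derived value we add for readability, assuming 200 sentences per engine as stated in the evaluation overview.

```python
SENTENCES_PER_ENGINE = 200

# Total number of errors per engine, copied from the table above.
total_errors = {
    "Google": 80,
    "DeepL": 58,
    "ChatGPT": 54,
    "Gemini": 94,
    "Claude": 65,
}

# Sort engines from fewest to most errors.
ranking = sorted(total_errors.items(), key=lambda item: item[1])

for engine, count in ranking:
    per_100 = count * 100 / SENTENCES_PER_ENGINE
    print(f"{engine:8s} {count:3d} errors ({per_100:.1f} per 100 sentences)")
```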

At first glance, it may seem that ChatGPT, with the fewest errors, is the best, but that is not necessarily the case.

Next, we will look at the quality of errors.

4-4. Trends by Error Type

1. Omission of Translation

Engine	Omission
Google	17
DeepL	20
ChatGPT	31
Gemini	42
Claude	15

ChatGPT and Gemini tended to have more omissions in translation.

In particular, Gemini frequently dropped part of the information, so caution is necessary for specifications, contracts, and similar documents.

2. Mistranslation

Engine	Mistranslation
Google	23
DeepL	29
ChatGPT	11
Gemini	32
Claude	19

ChatGPT had the fewest mistranslations in the results.

On the other hand, DeepL and Gemini produced somewhat more errors in which the meaning was misinterpreted.

3. Unnatural Translations

Engine	Unnatural expressions
Google	24
DeepL	5
ChatGPT	6
Gemini	12
Claude	23

The naturalness of DeepL and ChatGPT stands out as a remarkable result. Google and Claude often produced cases where "the meaning is correct, but the Japanese is stiff/unnatural."

4. Grammar and Format Errors

Engine	Grammar errors	Formatting errors
Google	6	7
DeepL	1	1
ChatGPT	1	4
Gemini	2	6
Claude	2	5

Grammar and formatting errors were few across the board, which suggests that current major MT systems are quite strong at surface-level sentence structure.
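The per-type tables can also be read side by side to see which error type dominates for each engine. The sketch below simply picks the largest count in each row; the figures are copied from the four tables in this section, and the variable names are our own.

```python
error_types = [
    "Omission",
    "Mistranslation",
    "Unnatural expressions",
    "Grammar errors",
    "Formatting errors",
]

# Counts per engine in the same order as error_types, copied from the tables above.
counts = {
    "Google": [17, 23, 24, 6, 7],
    "DeepL": [20, 29, 5, 1, 1],
    "ChatGPT": [31, 11, 6, 1, 4],
    "Gemini": [42, 32, 12, 2, 6],
    "Claude": [15, 19, 23, 2, 5],
}

# For each engine, find the error type with the highest count.
dominant = {
    engine: error_types[row.index(max(row))]
    for engine, row in counts.items()
}

for engine, error_type in dominant.items():
    print(f"{engine:8s} most frequent: {error_type}")
```

Seen this way, each engine has a different weak point, which is why the total count alone does not tell the whole story.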

4-5. Focusing on "Critical Errors"

Next, let's look at the most important metric, "critical errors."

Engine	Critical errors	Rate (per 200 sentences)
Google	8	4%
DeepL	17	8.5%
ChatGPT	14	7%
Gemini	30	15%
Claude	15	7.5%

The impression changes significantly here.

Google has a high total number of errors, but the fewest critical errors.
Gemini has an exceptionally high rate of critical errors.
DeepL, Claude, and ChatGPT have high expression quality but a certain number of critical errors.

In other words, the numbers themselves show that:

Superficial naturalness ≠ Accuracy
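The percentages in the table are simply the critical error count divided by the 200 sentences evaluated. The sketch below recomputes them and also compares the ranking by total errors with the ranking by critical errors. The "share of total" figure is our own derived value, not part of the original results, and it assumes that every critical error is also counted in that engine's total, as the evaluation overview suggests.

```python
SENTENCES_PER_ENGINE = 200

# Figures copied from the tables in sections 4-3 and 4-5.
total_errors = {"Google": 80, "DeepL": 58, "ChatGPT": 54, "Gemini": 94, "Claude": 65}
critical_errors = {"Google": 8, "DeepL": 17, "ChatGPT": 14, "Gemini": 30, "Claude": 15}

# Critical error rate as a percentage of the 200 sentences evaluated.
rates = {
    engine: count * 100 / SENTENCES_PER_ENGINE
    for engine, count in critical_errors.items()
}

for engine, rate in rates.items():
    # Derived value: assumes critical errors are a subset of the total.
    share = critical_errors[engine] * 100 / total_errors[engine]
    print(f"{engine:8s} rate {rate:4.1f}%  share of total errors {share:4.1f}%")

# Rank engines from fewest to most errors under each metric.
by_total = sorted(total_errors, key=total_errors.get)
by_critical = sorted(critical_errors, key=critical_errors.get)
print("By total errors:   ", by_total)
print("By critical errors:", by_critical)
```

The two rankings differ: Google moves from fourth place by total errors to first place by critical errors, which is the shift in impression described above.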

*These figures are based on the results of automated quality checks performed by an LLM proofreading tool. Since the quality check results themselves have not been verified, the actual quality trends may differ.

4-6. Conclusion

The results this time revealed the tendencies of each tool.

In particular, it was found that even when it appears to be translated naturally and accurately, there may be many hidden critical errors.

While machine translation technology, led by generative AI, has made significant advances, it can be said that relying solely on machine translation without human review poses a major risk, especially in fields such as manuals and contracts where accuracy must be ensured.

If accuracy is required, verification and correction by a post-editor are considered essential.

On the other hand, if the purpose is simply to grasp the general meaning, it is also possible to complete the process using only machine translation.

It is important to build the optimal workflow according to the purpose of the translation.
