Does Generative AI Translation Accuracy Surpass Existing Machine Translations? A Comparative Study of OpenAI GPT-4.1 vs Google Translate

The quality of machine translation has improved dramatically due to advances in AI technology. In recent years, with the emergence of generative AI represented by ChatGPT, there have been increasing cases where translation results surpass traditional machine translation engines in terms of contextual understanding and naturalness of expression. This article compares and examines the results of translating Japanese business manuals using OpenAI's GPT-4.1 (a generative AI model) and Google Translate API (v2). Through translation examples and specific evaluation points, we explore the capabilities and challenges of generative AI translation and consider the possibility that translation accuracy by generative AI is beginning to exceed that of existing machine translation.

Table of Contents

1. Overview of Comparison | Conditions of Generative AI Translation and Existing Machine Translation
1-1. Models Used and Translation Conditions
1-2. Evaluation Perspectives
2. Detailed Comparison of Translation Results | Accuracy Verification of Generative AI Translation and Existing Machine Translation
2-1. Presence or Absence of Mistranslations and Omissions
2-2. Interpretation of Context and Supplementation of Omitted Parts
2-3. Accuracy of Formats Such as Symbols and Numbers
2-4. Grammatical Accuracy and Naturalness of Expression
3. Summary and Considerations | Does Generative AI Translation Surpass Existing Machine Translation Accuracy?
3-1. Comparison with Other Cases and Evaluations
3-2. Benefits and Points to Note in Utilizing Generative AI Translation
3-3. Future Prospects
4. Consultations on AI Utilization with Human Science
4-1. Proactive Use of the Latest AI Translation Tools
4-2. Extensive Translation Experience and Expertise
4-3. Complete Security Room in Our Company

1. Overview of Comparison | Conditions of Generative AI Translation and Existing Machine Translation

1-1. Models Used and Translation Conditions

In this comparison, we used the following two types of translation engines.

OpenAI GPT-4.1 (via API)

This is the latest large-scale language model released by OpenAI in April 2025. It possesses advanced text generation capabilities unique to generative AI, enabling natural and flexible translations based on contextual understanding.

Google Translate API v2

This is a conventional neural machine translation engine provided by Google. It uses the same translation engine as the web version of Google Translate via API, characterized by fast response and stable quality.

A portion of a business manual written in Japanese (including specialized terms and internal abbreviations) was machine-translated into English and Simplified Chinese, and the results were compared. The original text contains Japanese-specific ambiguities such as omissions of subjects and objects and connections using the symbol “＋”, making it a suitable material to test the capabilities of machine translation.

1-2. Evaluation Perspectives

The quality of the translation results was evaluated by professional translators from the following five perspectives. (Evaluation symbols: ◎ = Excellent, ○ = Good, △ = Somewhat problematic, × = Problematic)

・Presence of mistranslations or omissions: Whether there are mistranslations that significantly distort the original meaning or untranslated parts (missing translations).
・Supplementation of omitted information in the original text: Whether omitted subjects, objects, or contextually necessary information in the original text are appropriately supplemented in the translation.
・Accuracy of formatting: Whether there are any unnatural points in terms of appearance, such as bullet numbering, symbols, or punctuation.
・Grammatical accuracy: Whether there are any grammatical errors in the translation (such as tense mistakes or broken syntax).
・Naturalness of expression: Whether the translation avoids a literal style and reads as natural and easy to understand.

Particularly, the supplementation of omitted information from the original text and the smoothness of expression are considered strengths of generative AI translation, and these points were given special attention in the evaluation.

2. Detailed Comparison of Translation Results | Accuracy Verification of Generative AI Translation and Existing Machine Translation

We compare the translations of OpenAI GPT-4.1 and Google Translate API for each of the above evaluation criteria. First, we summarized the overall evaluation for both English and Chinese translations in a table.

Evaluation Items	OpenAI (English Translation)	Google (English Translation)	OpenAI (Chinese Translation)	Google (Chinese Translation)
Presence or Absence of Mistranslations and Omissions	Good	Good	Good	△
Supplementation of omitted original text information	Good	✕	Good	✕
Accuracy of Format	△	✕	◎	✕
Grammatical Accuracy	Good	△	Good	△	Naturalness of Expression	△	△	△	✕

*Result of evaluating the entire translation. OpenAI (GPT-4.1) generally received higher ratings, with the difference being especially notable in the Chinese translation.

Below, we explain each viewpoint with specific examples.

2-1. Presence or Absence of Mistranslations and Omissions

⇒In terms of the accuracy of conveying important information, neither engine exhibited any fatal mistranslations or significant omissions. Both GPT-4.1 and Google Translate performed generally well in grasping the overall meaning of the original text. For example, translations of numerical data and basic factual relationships were accurate, and no content present in the original text was missing. This indicates that both engines have a high level of fundamental performance.

However, when looking closely at the details, slight differences were observed in the choice of translation terms and the handling of proper nouns. Here are some examples.

・The position "店長" (store manager) in the original text was appropriately translated as "Store Manager" in the OpenAI translation. On the other hand, Google Translate rendered it simply as "Manager," generalizing the nuance of the position. OpenAI's translation is more accurate in conveying the subtle meaning.
・Regarding the katakana proper noun "ゴールデンラック," Google’s Chinese translation left it as "Golden Rack" in English, resulting in what appears to be an untranslated omission. In contrast, OpenAI’s Chinese translation chose the appropriate term "黄金货架."

As described above, although there were some nuances differences, there were hardly any mistakes that would cause a major misunderstanding of the overall meaning of the text. However, caution is needed with technical terms, and in some cases, pre-registration using a glossary or post-editing corrections may be required.

2-2. Interpretation of Context and Supplementation of Omitted Parts

⇒In terms of the ability to supplement based on contextual understanding, GPT-4.1 excelled. While OpenAI inferred and supplemented information not explicitly stated in the original text from the surrounding context to produce the translation, Google Translate tended to produce more literal translations, making the meaning harder to grasp in some cases.

In the original Japanese text, subjects and objects are omitted, and there are sentences where the connection seems ambiguous, such as "~sase + 〇〇." We compare the behavior of both engines in these parts.

・Example of complementing omitted objects:
　Original: "Conduct weekly numerical analysis every Sunday, reflect it in the sales floor by Monday opening, and create an environment that can generate sales..."
　OpenAI translation: "Conduct weekly numerical analysis every Sunday, reflect the results in the sales floor by Monday opening to create an environment that can generate sales,…"
　Google translation: "Conduct weekly numerical analysis every Sunday, and reflect the sales floor by Monday opening to create an environment that can generate sales…"

In this sentence, the phrase "売り場反映させ" originally means "to reflect (the analysis results) on the sales floor," but the object "analysis results" is omitted. In OpenAI's translation, the omitted object is inferred from the surrounding context and supplemented as "reflect the results in the sales floor". This makes the meaning of the English sentence clear and coherent. On the other hand, Google Translate renders it literally as "reflect the sales floor," making it unclear what is being reflected. This difference can be attributed to the disparity in contextual understanding between the two engines.

・Handling of the "+" Symbol:
Continuing from the above original text, the "+" symbol connects sentences in the form of "...create an environment + check the manual below...". In the OpenAI translation, this "+" was replaced with "and" according to the context, connecting the sentences smoothly as a continuous English sentence. In the Google translation, the "+" was left as is, just like in the original, resulting in an unnatural connection such as ". +" appearing within the English text. This caused the sentence to be broken and made it difficult to read.

From the above, it is clear that GPT-4.1 has a strong ability to read between the lines of the original text and provide necessary supplements. Even when the Japanese original omits subjects or objects, it is a significant advantage that the AI can consider the context and smoothly render the text into English. On the other hand, Google Translate is so faithful to the original text that it risks producing a literal translation that can be difficult to understand.
However, OpenAI's supplementation is not perfect. For example, in the Chinese translation, there were cases where GPT-4.1 tried to fill in omitted parts but ended up misunderstanding the meaning. Therefore, even with AI-generated translations, overreliance is unwise, and human verification is essential to ensure that important information is correctly translated.

2-3. Accuracy of Formats Such as Symbols and Numbers

⇒In terms of post-translation formatting, the OpenAI translation showed consistency. The Google translation exhibited various formatting issues such as inconsistent bullet numbering and missing punctuation, resulting in the need for reformatting.
In business manual documents, the handling of step numbers and symbols is also important. In this comparison, OpenAI (GPT-4.1) handled symbols and numbering relatively stably, whereas Google Translate showed noticeable inconsistency in formatting.

• Numbering in Bulleted Lists: The original text used full-width numbers like "①②③④." The OpenAI translation basically retained these numbers, but there were some items where the numbers themselves disappeared (e.g., the number was missing in the English translation of step ①). In the Google translation, although the numbers were kept, the formats were inconsistent. For example, after "①②," a different format like "3)" suddenly appeared, followed by "4." with a period, causing a visually quite disorganized appearance. Additionally, the presence or absence of spaces after the numbers was not uniform, requiring detailed corrections.

• Capitalization at the beginning of sentences and punctuation: In the OpenAI translation, there was a phenomenon where the first word of each bulleted item was capitalized (e.g., "Action" or "Store Manager"). Since in English it is customary to start words with lowercase letters except at the beginning of sentences, this aspect requires some minor corrections before using the translation as is. In contrast, the Google translation sometimes left the beginning of sentences in lowercase. Additionally, there were sentences missing a period (.) at the end despite being in English, so proofreading is necessary.

・Symbols: As mentioned above, OpenAI appropriately replaced the “+” symbol, whereas Google left the original “+” intact. Additionally, in finer formatting details such as the placement of parentheses and symbols, the OpenAI translation is more polished.

Based on the above, OpenAI's translation was evaluated as requiring less final adjustment effort in terms of formatting. However, the OpenAI translation is not perfect and still requires detailed adjustments such as correcting the capitalization issue. On the other hand, Google Translate's output is inconsistent in formatting as-is, so it will require considerable editing to compile it into a proper document.
Point: When documenting machine translation results in practical work, it is important to check not only the content of the translation but also the formatting. Although GPT-4.1 produced a relatively well-structured format this time, it is still not perfect. When using automatic translation, it is important to either use style retention features during translation if possible, or to perform formatting adjustments for headings and bullet points afterward to create a final deliverable that is easy to read.

2-4. Grammatical Accuracy and Naturalness of Expression

⇒Both engines have generally good basic grammar, but the OpenAI translation was slightly better in terms of naturalness of expression. However, in parts where the original text is lengthy and complex, both showed some unnaturalness and require post-editing.

・Grammatical Accuracy: No noticeable grammatical errors were found in GPT-4.1’s translations (both English and Chinese). Subject-verb agreement and article usage were generally correct, indicating high grammatical quality. In the English translation by Google Translate, there was one grammatical error caused by comma usage (a run-on sentence), but it was minor and did not significantly hinder overall comprehension. Judging by grammar alone, both can be considered to have passed.

・Naturalness of Expression: The OpenAI translation avoided a literal style and produced relatively polished sentences. For example, it appropriately replaced the previously mentioned "+" with a conjunction and supplemented abbreviations in the translation, showing consideration for the reader. As a result, the OpenAI version felt easier to read. On the other hand, in parts where the original text's structure itself was complex, unnaturalness remained even in the OpenAI translation. For instance, for a long sentence, GPT-4.1 translated it as a single sentence separated by semicolons. This English sentence is somewhat information-heavy and slightly difficult to read, so it would be better to divide the sentence into shorter segments here.

Google Translate's output gave an overall impression of somewhat mechanical and literal stiffness. Especially in longer sentences, it connected clauses with too many commas, causing the structure to become somewhat broken, resulting in verbose and hard-to-read translations. In the Chinese translations, unnatural expressions such as "销售领域" (an awkward translation for "sales floor") and "与服装协调" (a literal and unnatural translation for "coordinate with clothing") were observed, with expressions that would feel odd to native speakers scattered throughout. Overall, I felt that the post-editing workload was heavier for Google Translate.

As shown, GPT-4.1 produces very fluent translations for short and simple sentences, but when the text becomes longer, both engines require adjustments. Although OpenAI had a higher proportion of "ready-to-use translations," human proofreading and rewriting remain essential to ensure final quality.

Free Download

What are the 5 key points to success in Japanese-English translation projects?

Five Challenges and Solutions in Medical Translation

3. Summary and Considerations | Does Generative AI Translation Surpass Existing Machine Translation Accuracy?

From the comparison so far, the result is that OpenAI GPT-4.1 translation is overall of higher quality than the Google Translate API. GPT-4.1 was especially superior in "omission completion," "format unification," and "smoothness of expression." However, it should be noted that post-editing is indispensable for practical use with both, and fully automatic high-quality translation cannot be obtained.

Below, we will consider the current positioning of translation engines based on other cases, the advantages and points to note when utilizing generative AI translation, and future prospects.

3-1. Comparison with Other Cases and Evaluations

Even beyond this case, there have been increasing reports in recent years that ChatGPT (GPT-4) surpasses Google Translate in translation ability. For example, in another internal test, GPT-4.1 demonstrated accuracy comparable to or exceeding multiple translation engines, including DeepL. In fact, there was a case where only GPT-4.1 accurately captured the meaning of a particularly difficult English sentence (DeepL and other models produced some mistranslations or omissions). In this way, the latest GPT models are beginning to deliver quality on par with the top existing translation engines.

On the other hand, there are still cases where third-party evaluations highly rate DeepL. In external translation comparisons, rankings such as "DeepL > ChatGPT > Google Translate" are sometimes given, indicating that strengths and weaknesses vary depending on the field and language. However, a distinctive feature of ChatGPT-based models is that quality can be improved through prompt instructions. For example, there have been reports of significantly increased accuracy by providing instructions on terminology and writing style. As the saying goes, "Translation evolves through 'instructions'," the unique aspect of generative AI translation is that its performance can be drawn out depending on how it is used.

Overall, there is no doubt that generative AI translation is catching up with existing machine translation engines. Especially for languages like Japanese, which heavily depend on context, context-aware GPT models tend to demonstrate their strengths more effectively.

[Reference Blogs]

>How Accurate Is the Translation of OpenAI's New Model GPT-4.1? A Comparative Examination with DeepL!
>ChatGPT and DeepL②: Comparison of Japanese-English Translation Accuracy in Manufacturing and IT Fields — Does Translation Evolve Through "Instructions"? A Thorough Examination of Accuracy Comparison and Improvement Methods

3-2. Advantages and Considerations of Using Generative AI Translation

Advantages: The greatest strength of generative AI models (such as GPT-4.1) in translation lies in their flexible rendering that captures context and nuance. Sections that tended to be translated literally by conventional engines can be paraphrased appropriately or supplemented with necessary information by ChatGPT. Another attractive feature is the ease of style adjustment. You can specify tone, such as casual or formal, or give detailed instructions like using or avoiding technical terms, and these are reflected accordingly, making it easier to obtain translations suited to the purpose. Furthermore, the same model can handle related tasks beyond translation, such as summarization, proofreading, and glossary creation, offering potential to contribute to the digital transformation (DX) of the entire translation process. In fact, there have been reports of significant efficiency improvements in translation workflows following the introduction of generative AI.

Points to note: On the other hand, generative AI has an issue known as hallucination. In other words, there is a risk that it will generate content that is not present in the original text, making it appear plausible. Although no major hallucinations were observed in this test, there have been past cases where ChatGPT added supplementary explanations on its own during translation. Special caution is needed when handling numbers and proper nouns, and it is safer to cross-check important data after translation. Additionally, generative AI may not produce consistent responses even with the same input. Since translations can subtly change due to version updates or the internal state at the time, careful measures are required to strictly unify terminology and style throughout the entire document (for example, by providing rules to fix terminology in the prompt if necessary).

Additionally, usage costs and limitations should also be considered. When using GPT-4 via API, charges are incurred based on the number of characters. Processing time is also longer than traditional engines, making it unsuitable for instant translation of large volumes of documents. From an information security perspective, it is important to be sensitive to the fact that internal company document data entered may be transmitted to external AI services. When handling confidential information, it is essential to use a paid API for business purposes and to enter into a contract that ensures the data will not be used for training (avoid translating internal documents using the free ChatGPT web version).

Based on the above, generative AI translation is a powerful tool, but it requires the knowledge to use it correctly.

3-3. Future Prospects

In this comparison, we were able to experience the high performance of generative AI translation (GPT-4.1). Going forward, this trend is expected to strengthen even further. Not only OpenAI and Google, but many companies are continuously releasing new high-performance language models. Google's new model Gemini, announced at the end of 2023, has also attracted attention for its multilingual support, intensifying the competition.

However, perfect translations cannot yet be obtained automatically at this stage, and human involvement remains crucial. In both OpenAI's and Google's translations, the texts only became practical documents after post-editing and rewriting. The rise of generative AI does not mean that "translators' jobs will become unnecessary"; rather, translators will be increasingly required to ensure final quality through advanced review and editing. From this comparison as well, we strongly felt the potential for efficiency through collaboration between AI and humans.

Summary: Although generative AI translation is still developing, its capabilities are becoming comparable to existing translation engines. If used skillfully, a dramatic increase in translation work efficiency is not out of reach. On the other hand, it is also true that high-quality translation by AI alone is difficult at this point. That is why it is important to leverage the strengths of both humans and AI through hybrid translation. We encourage everyone to try the latest translation AI and experience its potential and limitations firsthand.

4. Consultations on AI Utilization with Human Science

If you have any concerns about utilizing the latest technologies such as translation AI—questions like "How should we implement it?" or "Can it be used for our documents?"—please feel free to consult Human Science. Based on our long-standing experience supporting business translation, we offer comprehensive support ranging from the introduction to the operation of AI translation tools. Below, we introduce the features of our services.

4-1. Proactive Use of the Latest AI Translation Tools

At Human Science, we achieve highly accurate and fast translations even for business documents with many technical terms by combining AI translation with translation memory technology. We select the optimal engine according to the customer's needs and utilize it effectively.

We also offer our in-house developed "MTrans" series of tools, which incorporate the latest AI engines such as ChatGPT (GPT-4.1) and DeepL.

>> Human Science's AI Translation Tool Provision Services

4-2. Extensive Translation Experience and Expertise

Since our founding in 1985, we have over 35 years of experience in manual translation as well as specialized translation in the IT and medical fields. Our experienced staff leverage their expertise in translation processes and quality control to provide optimal advice on utilizing AI translation. Through a hybrid translation system combining "AI × Human," we have contributed to improving operational efficiency for many clients.

>>Translation Services from Human Science

4-3. Complete Security Room in Our Company

To ensure that even highly confidential documents can be entrusted safely, we have established a security room within our company and implement thorough information management. Translation data is handled in dedicated databases for each customer, creating an environment that prevents leakage to third parties. Regarding the use of AI translation engines, by conducting it via API, we eliminate the risk of data being stored externally.

>> Human Science's Information Management

As described above, our company can propose optimal translation solutions tailored to your needs. It's perfectly fine if you are just at the stage of "wanting to hear more first." From AI translation implementation support to specialized field translation services, please feel free to contact Human Science!

Free Download

What are the 5 key points to success in Japanese-English translation projects?