
Has the accuracy of medical translation improved? - A comparison between DeepL and Google Translate in 2020 and 2023


In September 2020, in our blog post "How accurate is DeepL in medical translation? Verification results for CIOMS, ICF, IB, etc." (https://www.science.co.jp/nmt/blog/21613/), we ran various medical documents through DeepL and Google Translate and evaluated the output using automatic BLEU scores and manual evaluation.

 

To recap, the evaluation method used in the previous verification was as follows.

Language Pair: English → Japanese

Target Documents: White Papers, Manuals (Medical Devices), CIOMS, ICF (Informed Consent Form), IB (Investigator's Brochure), and Papers (6 types)

Evaluation Volume: Approximately 1,000 words per type (approximately 50 sentences per type)

Evaluation Criteria: Automatic Evaluation BLEU Score and Manual Evaluation
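BLEU, the automatic metric used here, scores n-gram overlap between a machine translation and a reference translation. As a rough illustration only, the idea can be sketched in Python; this is a simplified sentence-level variant with crude smoothing, not the corpus-level implementation actually used in the verification:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # clipped overlap: each candidate n-gram counts at most as
        # often as it appears in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # epsilon smoothing to avoid log(0) when an n-gram order has no match
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    # brevity penalty: penalize candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

An identical candidate and reference scores 1.0; the score falls toward 0 as n-gram overlap drops or the candidate becomes shorter than the reference.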

 

About two and a half years later (as of March 2023), we translated exactly the same texts with DeepL and Google Translate to see what had changed. This article presents the results.


Some improvements have been made at the technical terminology level.

Medical documents contain specialized terms unique to the field, and rendering them as everyday words is a typical (and fatal) kind of mistranslation. In my experience, and not only in the medical field, machine translation also tends to render general terms as IT terms.

 

In this round of verification, improvements were observed in these areas.

 

Improvements at the technical terminology level were relatively common in Google's translation of the CIOMS documents.

"Case correction": 2020 "correction of capitalization and lowercase" → 2023 "case correction"
"Outcome: Unknown": 2020 "outcome is unknown" → 2023 "Outcome: Unknown"
"Attend": 2020 "participate" → 2023 "attended"

Incidentally, DeepL was already translating these terms correctly as of 2020.

 

Examples of terminology-level improvements in Google's translations of documents other than CIOMS include the following.

White Paper
"highlighted within the analysis"
In 2020, Google rendered this as "highlighting"; in 2023, it was changed to "emphasis".

ICF
"Monotherapy"
In 2020, Google rendered this as "monotherapy"; in 2023, it became "single-agent therapy".

 

The white paper and ICF terms above were already translated accurately by DeepL in 2020. In the 2020 manual evaluation, DeepL received higher ratings overall, possibly because of its relatively accurate handling of specialized terminology.

However, there are also areas that have not improved.

Let's take a look at the CIOMS translation.

"Seriousness: serious": in 2020, both Google and DeepL mistranslated this as "Severity: severe".

In 2023, both Google and DeepL still failed to translate it accurately as "Seriousness: serious".

 

"Narrative", which here refers to the case description and clinical course, was translated as "物語" (story) by Google in both 2020 and 2023, and as "ナラティブ" by DeepL in both years. Similarly, "Listedness: unlisted" was rendered as "上場:非上場" (stock-listing terminology) by Google in both 2020 and 2023, while DeepL produced "掲載性:非掲載" in 2020 and "Listedness: 未記載" in 2023.

 

In short, while some improvement in technical terminology translation can be confirmed, it remains limited.

 

Incidentally, comparing the 2020 and 2023 output, the number of changes is considerable. For example, in DeepL's translation of the white paper, there were 182 changes in a document of about 1,600 characters (counted with Word's document comparison function).
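The change count above came from Word's comparison feature; a similar rough tally can be sketched with Python's standard difflib. This is a word-level approximation, not Word's exact algorithm, and the sample strings below are invented for illustration:

```python
import difflib

def count_changes(old: str, new: str) -> int:
    """Count word-level edit operations (replace, insert, delete)
    between two versions of a translation."""
    matcher = difflib.SequenceMatcher(a=old.split(), b=new.split())
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")

# Hypothetical 2020 vs. 2023 output for the same source sentence
v2020 = "the usage of the device was investigated"
v2023 = "the use of the device was considered"
print(count_changes(v2020, v2023))  # two replaced words -> 2 changes
```

Running this over aligned sentence pairs from the two output versions gives a quick picture of how much the engine's output has drifted, before judging whether any of the drift matters.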

 

However, the main changes were reordered phrases and many minor changes that did not affect the overall meaning:

"investigated" → "considered"
"and" → "and" (two different Japanese conjunctions that happen to back-translate identically)
"induced" → "guided"
"regarding the use of" → "regarding the usage of"

As long as the readership is limited to internal personnel, changes like these hardly matter: either "investigated" or "considered" will do, even if a more apt rendering exists for the context and background. It is unclear whether future advances in machine translation can deliver improvement at this level, and these are precisely the points that post-editing should address.

 

In summary, the differences between 2020 and 2023 are mainly as described above. Improvements can be seen at the technical terminology level, but they are not enough to overturn the previous evaluation scores, so a rigorous improvement process through post-editing still appears necessary.

 

From here, let's look at each specific point in detail.

Regarding style and terminology consistency

"Desu-masu" (polite) style and "de-aru" (plain) style:
For CIOMS, IB, and papers, the plain "de-aru" style is appropriate, while for ICF, white papers, and manuals, the polite "desu-masu" style is appropriate. So how did the machine translations fare?

 

In Google Translate, in both 2020 and 2023, the "desu-masu" style predominated in all documents. A few "de-aru" sentences were mixed in, but these appeared to be either errors or appropriate usage within the text.

 

With DeepL, in both 2020 and 2023:

CIOMS: "de-aru" style (appropriate)
IB: a mix of "de-aru" and "desu-masu"
ICF: "desu-masu" style (appropriate)
White papers: "desu-masu" style (appropriate, with some "de-aru" mixed in)
Manuals: "desu-masu" style (appropriate)
Papers: "de-aru" style (appropriate, with some "desu-masu" mixed in)

 

Of course, unification to either the "de-aru" or "desu-masu" style is handled in the post-editing process. Incidentally, in the 2020 manual evaluation, style consistency was excluded from the evaluation criteria.
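Checking which style dominates in an output can itself be roughed out automatically. A minimal heuristic sketch follows; it only matches sentence-final endings, whereas a real Japanese style check would need morphological analysis, and the sample sentence is invented:

```python
import re

def style_counts(text: str) -> tuple:
    """Rough heuristic: split on the Japanese full stop and classify each
    sentence as polite ("desu-masu") or plain ("de-aru"/"da") by its ending."""
    sentences = [s.strip() for s in text.split("。") if s.strip()]
    polite = sum(1 for s in sentences if re.search(r"(です|ます)$", s))
    plain = sum(1 for s in sentences if re.search(r"(である|だ)$", s))
    return polite, plain

# Invented mixed-style sample: one polite sentence, two plain ones
sample = "本試験は第3相試験です。主要評価項目は全生存期間である。対象は再発がんだ。"
print(style_counts(sample))  # -> (1, 2)
```

A quick count like this makes it easy to flag documents where the machine output mixes styles and post-editing will need to unify them.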

 

Half-width spaces and half-width parentheses:

It is unclear whether this was a coincidence or a deliberate change by the developers, but in the 2023 Google Translate output, half-width spaces were inserted before and after alphanumeric characters, and parentheses and colons were changed to half-width. However, there were a few places where the half-width spaces were missing, which may be errors.
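The behavior observed in the 2023 output can be approximated with Unicode NFKC normalization plus simple spacing rules. This is my own sketch of the observed behavior, not Google's actual implementation:

```python
import re
import unicodedata

def normalize_widths(text: str) -> str:
    """Approximate the observed 2023 behavior: convert full-width
    alphanumerics and punctuation to half-width, then insert half-width
    spaces at boundaries between non-ASCII text and alphanumerics."""
    text = unicodedata.normalize("NFKC", text)  # e.g. ３ -> 3, （ -> (, ： -> :
    # space where non-ASCII text is immediately followed by an alphanumeric
    text = re.sub(r"(?<=[^\x00-\x7f])(?=[0-9A-Za-z])", " ", text)
    # space where an alphanumeric is immediately followed by non-ASCII text
    text = re.sub(r"(?<=[0-9A-Za-z])(?=[^\x00-\x7f])", " ", text)
    return text

print(normalize_widths("ＡＢＣ（第３相）"))  # -> "ABC(第 3 相)"
```

NFKC handles the width folding in one step; the two regexes then add the spacing around alphanumeric runs that the 2023 output exhibits.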

Terminology consistency: comparing 2020 and 2023, is terminology unified within a document?

Typical examples of terms we want to unify in medical translation are the two Japanese renderings of "cancer": "がん" and "癌".

Checking the IB source text, which mentions "cancer" in six places, gave the following results.

Google Translate: 2020 "がん" 2 cases, "癌" 4 cases → 2023 "がん" 4 cases, "癌" 2 cases

DeepL: 2020 "がん" 5 cases, "癌" 0 cases → 2023 "がん" 5 cases, "癌" 0 cases (Note: the remaining occurrence was "Cancer" within an organization name, which was left as written in the source text.)

 

In addition, the term "signature" (as in "molecular signature") appeared three times in the white paper, so I also checked which of its three Japanese renderings — call them renderings A, B, and C — was used in each case.

 

Google Translate: 2020 rendering A 0 cases, B 1 case, C 2 cases → 2023 A 0 cases, B 2 cases, C 1 case
DeepL: 2020 rendering A 1 case, B 2 cases, C 0 cases → 2023 A 2 cases, B 0 cases, C 1 case

 

Standardizing terminology and notation is one of the most important tasks in post-editing. Even within the limited scope of this verification, the only case in which terminology and notation were unified was DeepL's rendering of "cancer".
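A first-pass check for this kind of variant splitting is easy to automate. A sketch follows, using the "cancer" variant pair discussed above; the variant list and sample text are illustrative, not taken from the evaluated documents:

```python
from collections import Counter

def variant_counts(text: str, variants) -> Counter:
    """Count occurrences of each variant rendering in a translated text."""
    return Counter({v: text.count(v) for v in variants})

def is_unified(text: str, variants) -> bool:
    """Terminology is unified if at most one variant actually occurs."""
    counts = variant_counts(text, variants)
    return sum(1 for c in counts.values() if c > 0) <= 1

cancer_variants = ["がん", "癌"]
mixed = "進行がん患者を対象とした。原発性肺癌も含む。"
print(variant_counts(mixed, cancer_variants))  # both forms present
print(is_unified(mixed, cancer_variants))      # -> False
```

A check like this only flags candidates; deciding which variant is correct for the document still falls to the post-editor.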

 

Translations that do not hold together as sentences

Though few in number, there were also some outputs that did not hold together as sentences.

Common features include high word counts (an average sentence length of over 40 words) and multiple parentheses and slashes within a sentence. An example of such a sentence follows.

 

The optimized molecular HRD signature from Study AAA (BBB) was prospectively applied to the primary analysis of Study CCC (DDD), an ongoing, randomized, double-blind, Phase 3 study of eee versus placebo as switch maintenance treatment in patients with platinum-sensitive, relapsed, high-grade ovarian cancer (n = fff enrolled patients).

Note: AAA (BBB) and CCC (DDD) are the names of the trials. eee is the name of the drug. fff is the number of patients.

 

Sentences like this invariably appear in clinical trial documents and papers. They are also prime candidates for pre-editing, which is performed in advance to make the source text easier for machine translation to handle.

In both 2020 and 2023, Google Translate was unable to produce a coherent translation of this sentence.

DeepL's output did hold together as a sentence, but in any case, whether in post-editing or human translation, the source text must be analyzed with its context and background in mind.

In addition, when similar existing translations are available for reuse, a translation support tool with translation memory may be more effective than translating from scratch each time.

Summary

When the same texts were machine-translated (with Google and DeepL) in 2020 and again in 2023, there were substantial changes. Specifically, some fatal mistranslations at the specialized terminology level were fixed, but overall the gains were smaller than expected.

 

It would be no surprise if AI technology, machine translation included, develops much further; ChatGPT hardly needs mentioning as an example. Still, this re-evaluation left me with the impression that, at the current stage, post-editing is still necessary for improvement. For highly complex, specialized documents such as those in the medical field, translation support tools with translation memory remain a viable option, and there are still situations where human translation is the most effective choice.

 

We offer machine translation solutions and post-editing services, including consistent handling of style. If you have any questions or interest, please feel free to contact us.

 

Machine Translation Solutions (MTrans for Phrase TMS/MTrans for Trados)
https://www.science.co.jp/nmt/service/memsource.html
https://www.science.co.jp/nmt/service/nmt.html

 

Post-editing Services
https://www.science.co.jp/nmt/service/postedit.html

 

Medical Translation Services
https://www.science.co.jp/localization/industry/medical/index.html
