Since its release in March 2020, DeepL's Japanese version has been gaining attention for its high translation accuracy.
At Human Science Co., Ltd., we are conducting verifications in various fields such as IT, general business, manufacturing, and medical/pharmaceutical.
In a previous article, we shared the comparison results between DeepL and Google engines for business emails and new drug application documents.
This time, we evaluated six types of documents in the medical and pharmaceutical fields (white papers, manuals for medical devices, CIOMS, ICF, IB, and papers) using both automatic and manual evaluation.
Table of Contents
1. Evaluation Method
2. Results of Automatic Evaluation BLEU Score
3. Results of Manual Evaluation
4. Summary
1. Evaluation Method
Language Pair: English → Japanese
Target Documents: White Papers, Manuals (Medical Devices), CIOMS, ICF, IB, and Papers (6 types)
Evaluation Volume: Approximately 1,000 words per type (approximately 50 sentences per type)
Evaluation Methods: Automatic Evaluation (BLEU Score) and Manual Evaluation
2. Results of Automatic Evaluation BLEU Score
The BLEU scores from automatic evaluation differed by document type.
・White Papers, IB, Papers: DeepL has a higher score
・Manuals (Medical Devices), CIOMS: Google has a higher score
・ICF: DeepL and Google are comparable
A BLEU score of 30 or higher is considered to be a translation of moderate quality that is understandable.
For document types whose score clears this threshold, machine translation followed by post-editing may be more efficient than translating manually.
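To illustrate how a BLEU score can be checked against the threshold of 30, here is a minimal sketch in Python. It is not the tool used for this evaluation: the article does not say which BLEU implementation or tokenizer was used, so the character-level n-grams, the single reference per sentence, and the function names `corpus_bleu` and `worth_post_editing` are all assumptions made for illustration. The sample sentences are made up and are not taken from the evaluated documents.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring whitespace (assumed tokenization)."""
    chars = [c for c in text if not c.isspace()]
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per sentence."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += sum(1 for c in hyp if not c.isspace())
        ref_len += sum(1 for c in ref if not c.isspace())
        for n in range(1, max_n + 1):
            hyp_counts = char_ngrams(hyp, n)
            ref_counts = char_ngrams(ref, n)
            # Clipped matches: each n-gram counts at most as often as in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the 1- to max_n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


def worth_post_editing(score, threshold=30.0):
    """Apply the rule of thumb from the text: 30 or higher is usable."""
    return score >= threshold


# Made-up example data (hypothetical, not from the evaluation).
machine_output = ["本剤は1日1回経口で投与する。"]
human_reference = ["本剤は1日1回経口投与する。"]
score = corpus_bleu(machine_output, human_reference)
print(f"BLEU: {score:.1f}, candidate for MT + post-editing: {worth_post_editing(score)}")
```

For a real comparison, an established tool such as sacreBLEU with a Japanese tokenizer would probably be a better choice than this hand-written version, since scores depend heavily on how the text is tokenized and are only comparable when computed the same way.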
3. Results of Manual Evaluation
What happens when humans evaluate the same translations?
A medical and pharmaceutical translation reviewer at Human Science evaluated the same documents and assigned a score of 1 to 4 to each sentence.
The scoring criteria are as follows.
4: Translation time can be significantly reduced. Almost no need for revisions. Can be done with punctuation and minor word changes.
3: Can shorten translation time. Word corrections and reordering may be necessary.
2: Translation time cannot be shortened. It may serve as a reference, but it is faster to translate from scratch.
1: Unable to shorten translation time. Completely useless.
・White Papers, IB, Papers, CIOMS: DeepL has a higher score
・Manuals (Medical Devices), ICF: Google has a higher score
The overall results were similar to those of the automatic evaluation. However, the result for CIOMS was reversed, and for ICF, Google's quality was slightly higher.
At Human Science Co., Ltd., we believe that machine translation can streamline the translation process when the manual evaluation score is 2.5 or higher.
We believe that white papers and manuals (medical devices) are documents suitable for machine translation + post-editing work.
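The 2.5 criterion can be applied to per-sentence scores as in the following sketch. It assumes that the score compared against 2.5 is the average of the per-sentence scores for a document type, which the text implies but does not state outright. The function name `summarize` and the sample scores are hypothetical and are not results from this evaluation.

```python
from statistics import mean

THRESHOLD = 2.5  # score at or above which machine translation is considered useful


def summarize(scores_by_doc):
    """Map each document type to (average score, meets the 2.5 threshold)."""
    summary = {}
    for doc_type, scores in scores_by_doc.items():
        if not scores:
            raise ValueError(f"no scores for {doc_type}")
        if any(s not in (1, 2, 3, 4) for s in scores):
            raise ValueError(f"scores for {doc_type} must be integers from 1 to 4")
        average = mean(scores)
        summary[doc_type] = (average, average >= THRESHOLD)
    return summary


# Hypothetical per-sentence scores for two made-up document types.
sample_scores = {
    "doc_type_a": [4, 3, 3, 2],
    "doc_type_b": [2, 2, 1, 3],
}

for doc_type, (average, suitable) in summarize(sample_scores).items():
    print(f"{doc_type}: average {average:.2f}, suitable for MT + post-editing: {suitable}")
```

Running the same summary once per engine would give the per-document comparison between DeepL and Google described above.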
4. Summary
In this evaluation, DeepL produced higher translation quality than Google in many cases.
However, neither engine is better across the board. Because quality varies with the type of document being translated, it is important to choose a machine translation engine based on verification with your own documents.