
 

How to Translate PDFs Easily and Safely

1. PDF Translation

There are two types of PDF files: text-based PDFs, where the text is stored as text data, and image-based PDFs, where the text is stored as image data. Image-based PDFs are often documents scanned with a copier, and are also known as scanned PDFs.

 

When opening a text-based PDF in the Adobe Acrobat Reader app, you can use the mouse cursor to select text. If you want to automatically translate a text-based PDF, you can select and copy the text, and then paste it into an automatic translation service.

 

 

In an image-based (scanned) PDF, you cannot select individual text; clicking selects the entire page as a single image. To machine-translate such a PDF without further processing, you would have to retype the text by hand while looking at the page, which takes too long to be practical.
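
The difference between the two PDF types can also be checked programmatically. The following is a minimal sketch, assuming the text layer of each page has already been extracted with a PDF library of your choice (that step is not shown). The function name and the 20-character threshold are illustrative assumptions, not part of any product.

```python
def classify_pdf(page_texts, min_chars_per_page=20):
    """Guess whether a PDF is text-based, scanned, or mixed.

    page_texts: one string per page, holding whatever text layer was
    extracted from that page (an empty string when nothing was found).
    min_chars_per_page: hypothetical threshold; tune it for your documents.
    """
    if not page_texts:
        raise ValueError("the document has no pages")

    # A page counts as a "text" page when its text layer contains
    # enough non-whitespace characters.
    text_pages = sum(
        1 for text in page_texts
        if len("".join(text.split())) >= min_chars_per_page
    )

    if text_pages == len(page_texts):
        return "text-based"
    if text_pages == 0:
        return "scanned"
    return "mixed"
```

A result of "scanned" or "mixed" suggests that OCR will be needed before automatic translation.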

 

 

 

This is where OCR (optical character recognition) comes in: it recognizes the characters in an image and converts them into text data. For a scanned PDF, OCR runs first and automatic translation is then applied to the recognized text. If characters are misrecognized, the translation will be wrong no matter how accurate the translation engine is. OCR performance is therefore critical when translating scanned PDFs.
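
To illustrate why OCR performance matters, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that each character is misrecognized independently with the same probability. Real OCR errors are not independent, and the accuracy figures below are hypothetical.

```python
def clean_sentence_rate(char_accuracy, sentence_length):
    """Probability that every character of a sentence is recognized correctly,
    under the simplifying assumption of independent per-character errors."""
    if not 0.0 <= char_accuracy <= 1.0:
        raise ValueError("char_accuracy must be between 0 and 1")
    return char_accuracy ** sentence_length


# Hypothetical accuracy levels applied to an 80-character sentence.
for accuracy in (0.999, 0.99, 0.95):
    rate = clean_sentence_rate(accuracy, sentence_length=80)
    print(f"OCR accuracy {accuracy:.1%}: {rate:.0%} of sentences reach the translator intact")
```

Under these assumptions, even a small drop in per-character accuracy sharply reduces the share of sentences that reach the translation engine without errors.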

2. What kind of tools are available for PDF translation?

Representative automatic translation services include DeepL and Google Translate. Both can translate PDF files, but the level of support differs.

 

DeepL
DeepL is a high-accuracy machine translation engine based on neural networks, and it is highly regarded in the translation industry. It uses its own deep learning technology and supports multiple languages. Compared with traditional machine translation, DeepL's translations are significantly more accurate and, above all, more natural, because neural networks capture context and word meaning better. DeepL also achieves high quality in specific fields by learning from a vast amount of data.

DeepL has a strong track record in machine translation. In a comparative evaluation of translation quality conducted in 2021, DeepL was reported to achieve the highest quality for English-to-Japanese translation. It also supports a wide range of languages, which makes it well suited to multilingual translation.

DeepL's translations are not perfect, however. There is still room for improvement with complex subject matter and specialized terminology. Even so, DeepL delivers high translation quality and is used by many people.

DeepL has a feature called "File Translation" that allows you to upload PDF files for translation. It supports both text-based PDFs and scanned PDFs. The layout will be preserved during translation and you can download the translated PDF.

 

 

 

The number and size of files that can be translated depend on whether you use the free or paid version of DeepL. The free version is limited to 3 files per month and 5MB per file, and the translated PDF cannot be edited. The paid version allows 5 to 100 files per month depending on the plan, with a limit of 10MB per file, and the translated PDF can be edited.

 

If you use the free version, the data you input may be used by DeepL. For more information, please see "Is Confidentiality Maintained with DeepL Translation? Security?" (https://www.science.co.jp/nmt/blog/21127/). Also, for the differences between the free and paid versions, please see "What are the Differences Between DeepL's Free and Paid Versions (DeepL Pro)? ~Fees, Security, Character Count~" (https://www.science.co.jp/nmt/blog/29139/).

 

Google Translate
Google Translate has a feature called document translation. You can upload PDF files to be translated. It only supports text-based PDFs and does not support scanned PDFs. The layout will be preserved and you can download the translated PDF. The translated PDF cannot be edited. The maximum file size is 10MB.


Google Translate is a free online machine translation service. Like DeepL, it is based on neural networks, and it draws on the vast amount of data held by Google. It supports multiple languages and is used by people all over the world.

Its translation accuracy has improved with advances in artificial intelligence and the increase in data volume, and its quality is now well regarded, particularly for general text. Specialized terminology and context-dependent passages can still be translated inaccurately. As with machine translation in general, words with multiple meanings are a common source of errors.

Google Translate for smartphones also offers a wide range of features, such as voice translation for various languages and automatic translation of text in photos taken with the smartphone camera.

One major attraction of Google Translate is that it is free, but its translation quality varies. When high accuracy is required, such as for specialized or business documents, it is necessary to rely on professional translators.

For more on the translation accuracy of Google and DeepL, please see our article on the latest trends in machine translation and a comparison between DeepL and Google Translate (https://www.science.co.jp/nmt/blog/32334/).

 

 

 

MTrans for Office
MTrans for Office, another of our automatic translation products, adds automatic translation to the Windows, Mac, and Web versions of Outlook, Word, Excel, and PowerPoint. Of these, the Windows version includes PDF translation using DeepL or Google and supports both text-based and scanned PDFs. The layout is preserved and the translated PDF is saved in editable form. The file size limit is 10MB when DeepL is the translation engine and 20MB when Google is.
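
The file size limits quoted in this article can be collected into a simple pre-upload check. This is a minimal sketch: the figures are the ones stated above and may change, and the dictionary keys and function names are our own labels for illustration, not identifiers from any service.

```python
import os

MB = 1024 * 1024  # assumption: 1 MB is counted as 1024 * 1024 bytes

# Size limits as quoted in this article; check each service for current values.
SIZE_LIMITS_MB = {
    "deepl_free": 5,
    "deepl_paid": 10,
    "google_translate": 10,
    "mtrans_deepl": 10,
    "mtrans_google": 20,
}


def fits_limit(size_bytes, service):
    """Return True when a file of size_bytes is within the quoted limit."""
    return size_bytes <= SIZE_LIMITS_MB[service] * MB


def services_accepting(path):
    """List the services whose quoted size limit the file at path satisfies."""
    size = os.path.getsize(path)
    return [name for name in SIZE_LIMITS_MB if fits_limit(size, name)]
```

For example, `services_accepting("scan.pdf")` (a hypothetical file name) returns only the labels whose limit the file satisfies. The check covers file size only, not monthly file counts or PDF type.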

 

 

 

If Google is selected as the translation engine, the Google Cloud Translation API is used for both OCR and automatic translation. This API is also introduced in "How to automatically translate scanned PDF documents" (https://www.science.co.jp/nmt/blog/29162/). Google's OCR has been used widely in the Google Translate smartphone app for many years, and the expertise accumulated there contributes to accurate character recognition.

3. Causes of Inaccurate PDF Translation and Other Drawbacks

The PDF file is protected and cannot be translated
Password-protected PDF files cannot be translated directly. You need to remove the password and prepare an unprotected PDF file. If the password cannot be removed but printing is allowed, you can print the file with "Microsoft Print to PDF" on Windows and save it as a new PDF file, which can then be translated.

 

Layout and table distortion may occur
Automatic translation of a PDF can distort the layout and tables. With the paid version of DeepL or with MTrans for Office, the translated PDF can be opened in Word and corrected manually.

 

Translations can become incomprehensible due to unintended line breaks
In some cases, a sentence may be split in the middle and recognized as multiple sentences. When using MTrans for Office, the translated PDF can be loaded into the Word application, allowing you to combine multiple sentences from the original text into one before re-translating automatically.
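
The idea of rejoining split sentences before re-translating can be sketched in a few lines. This is a simple heuristic, not how MTrans for Office works internally: it assumes a space-delimited language such as English, and it treats any line that does not end in sentence-final punctuation as continuing on the next line. Abbreviations such as "e.g." at the end of a line will fool it.

```python
import re

# Characters that normally end a sentence (English and Japanese punctuation).
SENTENCE_END = re.compile(r"[.!?。！？]$")


def merge_broken_lines(lines):
    """Join lines that were split mid-sentence back into whole sentences."""
    merged = []
    buffer = ""
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line is treated as a paragraph break: flush the buffer.
            if buffer:
                merged.append(buffer)
                buffer = ""
            continue
        # Joining with a space is an assumption that suits English text.
        buffer = f"{buffer} {line}" if buffer else line
        if SENTENCE_END.search(buffer):
            merged.append(buffer)
            buffer = ""
    if buffer:
        merged.append(buffer)
    return merged
```

Passing the merged sentences to the translation engine, instead of the raw lines, gives it complete sentences to work with.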

 

The processing capacity of automatic translation is small, making it difficult to translate large files at once
Since the file size of scanned PDFs tends to be large, the upper limit of file size for automatic translation services may be a problem. If the limit is exceeded, you can either split the file or use Adobe's PDF compression service (https://acrobat.adobe.com/link/acrobat/compress-pdf). When compressing a PDF, it is recommended to choose a lower compression level (highest quality) to prevent text from becoming distorted and maintain high accuracy in text recognition.

 

There is a risk of information leakage
When you use the free version of DeepL or Google Translate, the data you input may be used for other purposes. When translating confidential information, use a paid service with adequate security safeguards rather than a free automatic translation service. For more information on DeepL security, please see "Is Confidentiality Maintained with DeepL Translation? Security?" (https://www.science.co.jp/nmt/blog/21127/).

 

Protected PDF files
If a PDF file is protected, it may not be possible to translate it. Protected PDF files restrict operations such as copying, printing, and editing in order to prevent unauthorized use, and these restrictions can also block translation. In such cases, the restrictions on the PDF file must be removed first.

Once the restrictions are removed, you can use an online translation service such as Google Translate or DeepL, both of which can translate PDF files directly.

Even then, translation accuracy is not always perfect. The layout and formatting of a PDF file can prevent the text from being translated correctly. Whenever possible, obtain the original file from which the PDF was created. If a higher level of quality is required, you can also turn to a specialized translation service or a professional translator.

 

Large files are difficult to translate all at once
Small PDF files are usually easy to translate, but processing becomes harder as the file size grows, because translation engines limit the amount of data they can process at once. Submitting a large file in one go may cause the translation engine to stop or to return errors partway through.

Therefore, when translating a large PDF file, split it into smaller files. Each file can be translated separately and the results recombined as needed.
It also helps to reduce the size of the original document where possible, for example by compressing images and graphics. A smaller PDF file makes the translation process smoother.
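
Planning such a split can be sketched as follows. The example only computes page ranges; the actual splitting would be done with a PDF tool of your choice. It assumes that pages are roughly equal in size, which is often not true for scanned documents, and the function names and the 0.9 safety margin are illustrative assumptions.

```python
import math


def pages_per_part_for_limit(file_size_mb, page_count, limit_mb, safety=0.9):
    """Estimate how many pages fit under a size limit, assuming pages of
    roughly equal size and leaving a safety margin below the limit."""
    average_page_mb = file_size_mb / page_count
    return max(1, math.floor(limit_mb * safety / average_page_mb))


def plan_page_ranges(page_count, pages_per_part):
    """Return 1-based, inclusive (first, last) page ranges covering the file."""
    if page_count < 1 or pages_per_part < 1:
        raise ValueError("page_count and pages_per_part must be at least 1")
    return [
        (start, min(start + pages_per_part - 1, page_count))
        for start in range(1, page_count + 1, pages_per_part)
    ]


# Hypothetical example: a 45MB, 90-page scan and a 10MB upload limit.
per_part = pages_per_part_for_limit(45, 90, 10)
print(plan_page_ranges(90, per_part))
```

Because the estimate is based on an average, each part should still be checked against the limit after splitting.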

 

4. What are the requirements for a PDF translation tool?

Functionally, a PDF translation tool should support both text-based and scanned PDFs. Translating scanned PDFs requires a high-performance OCR function that recognizes text in images accurately. The translated file should be editable so that layout problems can be fixed. A feature for editing the source text is also useful when an original sentence has been split into several parts. In terms of confidentiality, it is essential that the content of the PDF is not reused for other purposes.
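
The functional requirements above can be expressed as a small checklist. The sketch below records the capabilities as described in this article, which may be out of date, and the class and field names are our own. Confidentiality is deliberately left out because it depends on each service's terms of use, which must be checked separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfTool:
    name: str
    text_pdf: bool         # translates text-based PDFs
    scanned_pdf: bool      # translates scanned PDFs (requires OCR)
    editable_output: bool  # translated file can be edited afterwards


# Capabilities as described in this article; verify against current product information.
TOOLS = [
    PdfTool("DeepL (free)", True, True, False),
    PdfTool("DeepL (paid)", True, True, True),
    PdfTool("Google Translate", True, False, False),
    PdfTool("MTrans for Office (Windows)", True, True, True),
]


def meets_functional_requirements(tool):
    """True when the tool handles both PDF types and produces editable output."""
    return tool.text_pdf and tool.scanned_pdf and tool.editable_output


print([tool.name for tool in TOOLS if meets_functional_requirements(tool)])
```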

5. Summary

PDF files are either text-based or image-based (scanned), and support for each type varies between automatic translation services. We offer MTrans for Office, a translation product that supports both text-based and scanned PDFs and incorporates automatic translation services from DeepL, Google, and Microsoft. Please try the 14-day free trial to check its quality and usability.

 
