Some parts of this page may be machine-translated.

 

Applying Automatic Translation to Scanned PDF Documents


2021.11.18


2022.6.22


In day-to-day operations, we sometimes scan paper documents and save them as PDF files, and we may also receive such PDF files from business partners. What methods are available for automatically translating these PDF files?
In this article, we explain why PDF translation often does not work well and introduce some solutions. If you are looking to streamline translation tasks in your business, please use this as a reference.


A PDF file created by scanning stores the document as images, so its text data cannot be extracted (such a file is called an "image PDF" or "scanned PDF"). In addition, many automatic translation services support only text translation. Traditionally, therefore, a process called OCR, which converts images to text, was necessary before automatic translation. Furthermore, the translated document often does not retain the original layout.
Restoring the layout to match the original requires editing work. Because of these extra steps, PDF translation does not proceed smoothly.
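The traditional workflow described above (OCR first, then text translation, then manual layout work) can be sketched roughly as follows. This is only an illustration: `translate_scanned_pages`, `TranslatedPage`, and the `ocr` and `translate` callables are hypothetical names, standing in for whatever OCR tool and translation service are actually used.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TranslatedPage:
    page_number: int
    source_text: str
    translated_text: str


def translate_scanned_pages(
    page_images: List[bytes],
    ocr: Callable[[bytes], str],        # hypothetical: image bytes -> text
    translate: Callable[[str], str],    # hypothetical: text -> translated text
) -> List[TranslatedPage]:
    """Run OCR and then text translation on each scanned page image."""
    results = []
    for number, image in enumerate(page_images, start=1):
        text = ocr(image)             # step 1: convert the image to text
        translated = translate(text)  # step 2: translate the extracted text
        results.append(TranslatedPage(number, text, translated))
    # Step 3, restoring the original layout, is not covered here. The output
    # is plain text per page, which is why editing work is still needed.
    return results
```

Because the OCR and translation steps are passed in as functions, the sketch makes the weak point visible: nothing in the pipeline carries layout information from the scanned image through to the translated text.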




The Google Translate app has a feature called "Real-time Camera Translation." When you launch the app and point your smartphone's camera at the material you want to translate, it translates the text automatically while maintaining the layout. This is very convenient, but the Google Translate app is a free service, and the confidentiality of translated documents is not guaranteed. According to the terms of use, the data may be reused. This makes the app problematic for translating business documents. A leak of highly confidential business documents can directly damage a company's credibility, so the app must be used with caution.


How to Use the Google Cloud Translation API

This month, the document translation feature of the Google Cloud Translation API was launched. With this API, you can achieve functionality equivalent to the Real-time Camera Translation of the Google Translate app and automatically translate image PDFs while maintaining their layout. In addition, since Google does not reuse the data, confidentiality is preserved.
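As a rough illustration of what a call to this feature involves, the sketch below builds the request for the REST `translateDocument` method and decodes a response, without sending anything. The endpoint path and field names follow the Cloud Translation v3 REST interface as we understand it, and the feature may have been exposed under a different API version at launch, so please check the current API reference before relying on them. The project ID, the default location `us-central1`, and both function names are assumptions made for this example, and authentication is omitted.

```python
import base64
import json
from typing import Tuple


def build_translate_document_request(
    pdf_bytes: bytes,
    project_id: str,                 # assumption: your own Google Cloud project ID
    source_lang: str = "en",
    target_lang: str = "ja",
    location: str = "us-central1",   # assumption: adjust to your region
) -> Tuple[str, str]:
    """Return (url, json_body) for a translateDocument call. Sends nothing."""
    url = (
        "https://translation.googleapis.com/v3/"
        f"projects/{project_id}/locations/{location}:translateDocument"
    )
    body = {
        "sourceLanguageCode": source_lang,
        "targetLanguageCode": target_lang,
        "documentInputConfig": {
            "mimeType": "application/pdf",
            # Inline documents are sent as base64-encoded bytes.
            "content": base64.b64encode(pdf_bytes).decode("ascii"),
        },
    }
    return url, json.dumps(body)


def extract_translated_pdf(response_json: str) -> bytes:
    """Decode the first translated document from a response body."""
    response = json.loads(response_json)
    encoded = response["documentTranslation"]["byteStreamOutputs"][0]
    return base64.b64decode(encoded)
```

To actually send the request, the URL and body would be posted with an OAuth access token in the `Authorization` header, and the bytes returned by `extract_translated_pdf` would be written to a `.pdf` file.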


I tried the document translation feature right away. First, I printed our English homepage and scanned the printout to create an image PDF. The image below is an excerpt from that image PDF.


This was automatically translated into Japanese using the Translation API's document translation feature.


As you can see, automatic translation was achieved while maintaining the layout. The sentences are correctly recognized, and the translation accuracy is high.

How to Translate PDF Files Using DeepL

"DeepL" is a translation service provided by the company DeepL. It is highly regarded for its translation accuracy and supports 26 languages, including Japanese, English, and German. It may seem to support few languages compared with Google Translate, which supports over 100, but it covers most languages commonly used in business, so DeepL is sufficient for many needs. It is very simple to use: after accessing the DeepL site, select "Text Translation," paste the text you want to translate, and choose the target language to get the translation automatically.

DeepL offers both a free version and a paid version. Each has different security features and character limits, so be sure to choose a plan that fits your usage.
If you wish to use the paid version, "DeepL Pro," you can either apply directly to DeepL or choose the services of MTrans Team, which incorporates the DeepL translation engine.
For a detailed explanation of the differences between DeepL's free and paid versions (DeepL Pro), please refer to this article.
>What are the differences between DeepL's free and paid versions (DeepL Pro)? - Pricing, Security, Character Limits -

How to Translate PDF with MTrans Team

MTrans Team is an automatic translation system that lets you translate foreign-language text easily in one step, so you can start using it as soon as the need for translation arises. Log into the MTrans Team webpage in advance, then enter the text you want to translate in the text box, or drag and drop the file you want to translate, to begin the translation.

A distinctive feature of MTrans Team is that it becomes more productive with each use. MTrans Team uses a dedicated database that learns industry-specific terminology and phrases frequently used within your company so that they can be reused in future translations. This helps standardize terminology across the company and reduce correction costs, improving operational efficiency.

In response to customer requests, our company is developing an automatic translation service that makes it easy to translate image PDFs automatically.
MTrans Team has a robust security system in place, so you need not worry about the data leaks that are a concern with typical online services.
If you are interested in our MTrans solutions, please feel free to contact us.



Related Services

MTrans Team AI automatic translation software

Translate Office products with the easy translation software MTrans for Office

MTrans for Phrase TMS

MTrans for Trados

 

 

 

 

 

Author Information

Yuki Nakayama
Language Solutions Department
Localization Group
Machine Translation Team Leader

  • Over 15 years of experience in translation and review, while also developing machine translation technology and supporting its implementation.
  • Improves the quality and efficiency of post-editing projects for clients and in-house through technical means.
  • Numerous presentations at events in Japan and overseas, such as JTF Translation Festival, TC Symposium, AAMT, TAUS, and LocWorld.
  • Contributed articles on machine translation to 'Interpreting and Translation Journal' and 'Perfect Guide to Industrial Translation' (both published by Ikaros Publishing).

 


For those who want to know more about translation

Tokyo Headquarters: +81 3-5321-3111

Reception hours: 9:30 AM to 5:00 PM JST

Contact Us / Request for Materials