
 

Multilingual Localization with AI Automatic Voice Tools: Practical Guide and Case Studies

In our previous blog, we introduced the use cases, benefits, and preparations for AI automatic voice tools in multilingual localization. This time, we cover more practical topics: how to check and correct the audio after it is generated, and how we use AI automatic voice tools at our company.

Table of Contents

1. Checking Audio Created with an AI Automatic Voice Tool
2. Correcting Audio Created with an AI Automatic Voice Tool
3. Multilingual Localization Case Study Using AI Automatic Voice
4. Summary

1. Checking Audio Created with an AI Automatic Voice Tool

Last time, we explained why it is important to check the audio created by AI automatic voice tools and why a checklist is needed to ensure consistent quality across languages.

"In reality, there are still points that automatic voice cannot fully handle, such as intonation that should change with context, unnatural pauses, or missing pauses that make the audio hard to understand.
In multilingual localization in particular, differences in intonation can change the meaning depending on the language, so checking the audio after generation is essential."

"If checkers review the audio without shared criteria, there is a risk of unnecessary corrections based on each checker's personal sense, or conversely, of problems being overlooked with the thought that 'this is good enough.' To maintain consistent quality across multiple languages, it is crucial to align the checking criteria between languages."

 

(Quoted from our previous blog)

 

So what kind of checks should be performed, specifically?
As mentioned above, simply asking native-speaking staff to "check the audio" is not enough. To meet the required quality standards, you need to state clearly which aspects should be checked. One effective way to do this is with a checklist.
Below is an example of the checklist used at our company.

Checklist Configuration

 

This checklist lists the source text and its translation for a project in which audio was generated automatically after translation.
Each pair of source and translated sentences on the left is checked from the following perspectives.

 

・Does the audio match the text in the translation column?
・Are there any parts so unclear or unnatural that they are hard to understand?
・Are numbers read correctly?
・Are units read correctly?
・Are English abbreviations read correctly?

 

Checking that the audio matches the manuscript is the obvious part; the key points are the readings of numbers, units, and English abbreviations.
Synthetic voice tools tend to misread numbers and units, and uncommon abbreviations may not be read as intended.
Focusing the audio check on the points that synthetic voice tools handle poorly is an effective way to ensure audio quality.
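Because these check points recur in every sentence, it can help to flag them in advance before handing the script to a native checker. Below is a minimal, hypothetical Python sketch (not a feature of any particular voice tool) that scans each sentence for numbers, units, and capital-letter abbreviations so the checker knows exactly what to listen for; the unit list and the abbreviation heuristic are assumptions that would need adjusting per project.

import re

# Illustrative sketch only: pre-flag the items that synthetic voices tend to
# misread (numbers, units, English abbreviations) for each checklist row.

# Hypothetical unit list; extend it to match the project's domain.
UNITS = r"(?:mm|cm|km|kg|kHz|Hz|ms|min|m|g|s|h|V|A|W)"

def flag_check_points(sentence: str) -> dict:
    """Return the numbers, units, and abbreviations found in one sentence."""
    numbers = re.findall(r"\d[\d,.]*", sentence)
    units = re.findall(r"\d[\d,.]*\s*" + UNITS + r"\b", sentence)
    # Rough heuristic: runs of two or more capital letters count as abbreviations.
    abbreviations = re.findall(r"\b[A-Z]{2,}\b", sentence)
    return {"numbers": numbers, "units": units, "abbreviations": abbreviations}

print(flag_check_points("I will explain the contents of ISO 12100 and the 5 mm tolerance."))
# -> {'numbers': ['12100', '5'], 'units': ['5 mm'], 'abbreviations': ['ISO']}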

2. Correcting Audio Created with an AI Automatic Voice Tool

Here, we introduce how to fix the issues found during the audio check.

 

Example 1: A word whose pronunciation differs by part of speech is not read correctly

AI automatic voice tools are limited in how well they judge context, so misreadings sometimes occur. For example, the English word "produce" can be a noun (crops such as vegetables and fruit) or a verb (to make something), and the stress position differs between the two. When an automatic voice tool read "produce", however, it applied the verb stress regardless of context.

 

■Example 1
English: We grow most of our own produce.

 

Incorrect Voice:

 

 

The stress falls on "pro-DUCE". Here is how to fix it; the image below shows an example of the correction.

 

 

Change the stress position to "PRO-duce"

 ・Raise the pitch of "pro": tag group highlighted in yellow

 ・Lower the pitch of "duce": tag group highlighted in blue

 ・Minor pronunciation correction: highlighted in green

In this example, inserting the pitch-control tags caused the second half to be read as "duche" instead of the intended "deuce", so the spelling was deliberately changed to "duse" so that it is read correctly (green highlight).

 

Here is the adjusted audio:

 

The stress position has been changed to "PRO-duce", and the second syllable is also pronounced correctly rather than as "duche".
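The screenshots above use the voice tool's own tag notation, which is not reproduced here. As a rough illustration only, the same idea can be expressed in generic SSML, assuming an engine that accepts SSML prosody markup; the pitch values are arbitrary, and whether a word may be split across prosody tags depends on the engine.

# Illustrative SSML sketch of the stress correction described above.
ssml = """
<speak>
  We grow most of our own
  <prosody pitch="+15%">pro</prosody><prosody pitch="-15%">duse</prosody>.
</speak>
"""
# "pro"  : pitch raised  -> stressed first syllable (the noun reading, PRO-duce)
# "duse" : pitch lowered -> unstressed second syllable; the respelling stands in
#          for "duce" so that the fragment is not read as "duche"
print(ssml)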

 

Example 2: Inconsistency in Pronunciation of Abbreviations and Proper Nouns

Abbreviations and rarely used proper nouns may be pronounced incorrectly or read inconsistently depending on where they appear. Here we use the abbreviation "ISO" as an example. "ISO" stands for the International Organization for Standardization and is a commonly used term, so you might expect errors to be unlikely. In the following languages, however, there are multiple patterns for reading "ISO", and in such cases inconsistent readings are more likely to occur.

 ・English → read as "ISO" or "AISOO"

 ・Indonesian → read as "ISO" or "AIS"

 

In the following example, different readings were output depending on whether "ISO 12100" was written with or without spaces.

 

■Example 2

English: I will explain the contents of ISO 12100.

Indonesian: I will explain the contents of ISO 12100.

 

In this case, the reading needs to be standardized and the audio corrected. Here, we adjusted the audio so that "ISO" is read consistently throughout the English version and throughout the Indonesian version.

 

English:

 

 

 

Indonesian:
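The corrections above were made in the voice tool's own editor. As an illustration only, the same kind of standardization can be expressed with SSML's sub element, again assuming an SSML-capable engine; the spoken form used below ("I S O", spelled out) is just an example of choosing one fixed reading per language.

# Illustrative sketch: pin down one reading of "ISO" so it is identical
# wherever it appears, regardless of spacing in the source text.
source = "I will explain the contents of ISO 12100."
spoken_form = "I S O"   # assumption: one fixed reading chosen per language

sub_tag = '<sub alias="{}">ISO</sub>'.format(spoken_form)
ssml = "<speak>" + source.replace("ISO", sub_tag) + "</speak>"
print(ssml)
# -> <speak>I will explain the contents of <sub alias="I S O">ISO</sub> 12100.</speak>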

 

 

 

3. Multilingual Localization Case Study Using AI Automatic Voice

We will introduce a case study of multilingual localization using AI automatic voice at our company.

 

■Overview

Industry: Manufacturing
Target: Training materials (PowerPoint slides + lecture audio recorded by an instructor)
Languages: Japanese → English/Chinese, English → Indonesian
Volume: Approx. 100,000 characters per language
Production period: About 4 months
Deliverables: Voice-inserted PowerPoint materials in 3 languages

 

■Points

From the above, the following points emerge.

[Required Tasks]

・PowerPoint Materials: Translation

・Lecture Audio: Text Transcription + Translation + Audio Generation in Each Language

[Notes]

・The production period is short relative to the volume.

・Given the nature of the teaching materials, the priority is that the meaning is conveyed accurately rather than that the expression is rich.

As the above shows, this project prioritizes accuracy over richness of expression and has a tight schedule. In such cases, using tools can be very effective.

 

In this project, we chose the following approach.

★Voice: AI automatic voice + native-speaker check and correction

★Translation: Machine translation + manual correction (post-editing)

The key is to use tools to streamline the work while always applying manual corrections to ensure accuracy.
For the audio check, we used the checklist introduced in the first half of this article and made corrections in the tool as described above.

 

■Workflow

The specific workflow has been defined as follows.

 

 

・Lecture audio:
① Conversion of the lecture audio into text (transcription), followed by rewriting
② Creation of the audio manuscript in each language using machine translation
③ Generation of audio from the manuscripts using AI automatic voice tools
・PowerPoint materials:
① Translation into each language using machine translation
Once the audio and PowerPoint slides for each language are complete, the audio is inserted into the slides to finish the deliverables.
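As a rough sketch of how these steps line up for the lecture audio, the following Python pseudo-pipeline uses hypothetical stand-in functions (transcribe, machine_translate, and synthesize_speech are placeholders, not real APIs); the human steps in between (rewriting, post-editing, and the native audio check) are noted in the comments.

from pathlib import Path

TARGET_LANGUAGES = ["en", "zh", "id"]   # English, Chinese, Indonesian

def transcribe(lecture_audio: Path) -> str:
    """Step 1: turn the lecture audio into text, then rewrite it into a narration script."""
    return "narration script after transcription and rewrite"   # placeholder

def machine_translate(script: str, lang: str) -> str:
    """Step 2: draft the script in each language with machine translation (then post-edit)."""
    return "[" + lang + "] " + script   # placeholder

def synthesize_speech(script: str, lang: str, out_dir: Path) -> Path:
    """Step 3: generate narration with the AI voice tool (then native check and correction)."""
    return out_dir / ("narration_" + lang + ".wav")   # placeholder output path

def build_audio_for_all_languages(lecture_audio: Path, out_dir: Path) -> dict:
    script = transcribe(lecture_audio)
    return {lang: synthesize_speech(machine_translate(script, lang), lang, out_dir)
            for lang in TARGET_LANGUAGES}

audio_files = build_audio_for_all_languages(Path("lecture.mp3"), Path("output"))
print(audio_files)   # these files are then inserted into the translated PowerPoint slides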

 

■Schedule

One of the clearest benefits of using tools is the schedule. Machine translation and AI automatic voice tools shorten it considerably. The table below compares the actual project schedule (orange line) with the estimated schedule if the translation and voice creation had been done entirely by humans (gray line).

 

 

You can see at a glance that the translation and voice-creation steps were significantly shortened. Had the work been done by humans, the schedule would have exceeded the available 4 months and stretched to about 6 months.

 

There is another advantage when it comes to creating audio. With a human narrator, it can be difficult to pin down the recording schedule, since it depends on the scale of the project and the availability of the narrator and studio. With AI automatic voice, the work can be done at any time and place as long as the tool is available, which makes the schedule easy to fix.

 

By utilizing tools, we were able to both shorten the work period and keep it fixed, which left enough time within the limited 4 months for quality-improvement steps such as native checks.

4. Summary

In the first half of this article, we introduced effective ways to check and correct audio produced by AI automatic voice tools in multilingual localization, including the points where errors are likely to occur and how to fix them. In the second half, we presented the key points and benefits of using AI automatic voice tools (together with machine translation) through an actual case handled by our company.

 

The content of this article was also covered in our seminars held on September 7 and 11, 2023. You can download the seminar materials here. The same page also offers other materials that may be useful for manual creation, translation, and more, so please take a look.

