Some parts of this page may be machine-translated.

 


Utilizing ChatGPT for Translation Quality Management - Regular Expressions Edition -


2024.7.8


Our company blog has published articles focusing on utilizing ChatGPT for translation.

 

Related Blog Posts:

>Is ChatGPT a Good Translator? Thorough Verification at Each Stage of Translation

>ChatGPT and DeepL①: Comparison of Japanese-English Translation Accuracy in Manufacturing and IT Fields

>ChatGPT and Medical Translation

 

In this article, as part of the theme of ChatGPT × Translation, we introduce an attempt to utilize ChatGPT for translation quality management.


1. Utilization of Regular Expressions in Translation Quality Management

In translation, there are often cases during the proofreading stage where specific symbols/terms need to be entirely replaced with other symbols/terms. If a uniform replacement is sufficient, simply entering the symbols or terms into the replace window is easy. However, when conditions overlap, such as "replacing all terms that appear after this symbol" or "replacing only when a specific term appears at the end of a line," simple global replacement cannot handle it. It is necessary to specify the target sections using precise regular expressions.
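
As a rough illustration of the second condition above, the following Python sketch replaces a term only when it appears at the end of a line. The sample text and the terms "button" and "key" are made up for this example, and Python's re module stands in for a text editor's replace function.

```python
import re

# Hypothetical sample text: "button" ends line 1 but sits mid-line in line 2.
text = "Press the button\nThe button is red"

# With re.MULTILINE, "$" anchors the match to the end of each line,
# so only the occurrence that closes a line is replaced.
result = re.sub(r"button$", "key", text, flags=re.MULTILINE)

print(result)
```

A plain search-and-replace for "button" would change both lines; the anchor is what limits the replacement to the first one.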

 

However, whether an efficient regular expression can be devised on the spot depends on the proofreader's knowledge. Looking up the syntax and building an expression from scratch each time takes effort, and for those unfamiliar with regular expressions even the research can be difficult.

 

"In that case, let's have ChatGPT create the optimal regular expressions!" In this article, we introduce examples of regular expressions that ChatGPT created for use in translation quality management.

2. Creating Regular Expressions Using ChatGPT

■Use Case ① - Spaces Inside Brackets

Background of Use Case ①

Whether to place spaces before and after specific symbols is one of the key points to check during proofreading. Here, we present an example where ChatGPT was asked to create a regular expression using a case that actually occurred in a translation project.

 

The text to be corrected is shown below. The problem lies in the bracketed parts.

 

Click [ Price range ] button.
[ Batch Log List ] is displayed.
Press [ Retry ].

 

Looking at the bracketed parts, there are unnecessary spaces between the brackets and the term, as in [ term ].
The correct form should be as shown below.

 

Click [Price range] button.
[Batch Log List] is displayed.
Press [Retry].

 

This kind of error is unlikely when a person types the text. In this project, however, at the customer's request we used a workflow called "machine translation + human post-editing," and the initial machine translation output contained the unnecessary spaces.
We will fix them with a global replacement using regular expressions.

 

Try It Out

<Try 1>

I tried sending the following text to ChatGPT.

I want to delete only the space immediately after [ and the space immediately before ] in the following example sentences. I want to use the replace function of a text editor. Please tell me the regular expressions for before and after replacement.
Click [ Price range ] button.
[ Batch Log List ] is displayed.
Press [ Retry ].

 

Then, the following response was returned.

Certainly. Below is an example of replacement using regular expressions.
Regular expression before replacement: \[\s*(.*?)\s*\]
String after replacement: $1

 

When we enter the above regular expression and replacement string into the editor's replace dialog and run the replacement...

 

That's strange. The brackets have disappeared!
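
The result can be reproduced outside the editor. The sketch below assumes Python's re module as a stand-in for the text editor; note that Python writes the first capture group as \1 where many editors write $1.

```python
import re

# Sample lines from this article, with the unwanted spaces inside the brackets.
lines = [
    "Click [ Price range ] button.",
    "[ Batch Log List ] is displayed.",
    "Press [ Retry ].",
]

# Pattern from Try 1: the brackets are matched, but they sit outside the capture group.
pattern = re.compile(r"\[\s*(.*?)\s*\]")

# Replacing the whole match with group 1 alone discards the brackets.
result = [pattern.sub(r"\1", line) for line in lines]

for line in result:
    print(line)
```

Because the whole match, brackets included, is replaced by the captured term only, the brackets are lost.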

 

<Try 2>

I told ChatGPT about the problem.

The symbols [ and ] disappeared in the replaced text.
I want to keep these symbols.
Please correct the regular expression.

 

ChatGPT immediately provided a new regular expression.

Understood. I will modify the regular expression to retain the [ and ] symbols.
Regular expression before replacement: (\[\s*)(.*?)(\s*\])
String after replacement: $1$2$3

 

I will try the replacement again.

 

Unfortunately, there were no changes this time.
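
Why nothing changes becomes visible when the three capture groups are inspected. This is a minimal sketch, again assuming Python's re module in place of the editor (with \1\2\3 as Python's spelling of $1$2$3).

```python
import re

line = "Click [ Price range ] button."

# Pattern from Try 2: the unwanted spaces are captured inside groups 1 and 3.
pattern = re.compile(r"(\[\s*)(.*?)(\s*\])")

# Inspect what each group captured.
groups = pattern.search(line).groups()
print(groups)

# Writing all three groups back reproduces the matched text unchanged.
result = pattern.sub(r"\1\2\3", line)
print(result)
```

Since the spaces belong to the groups that are written back, the replacement reassembles exactly the text that was matched.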

 

<Try 3>

Let's pull ourselves together and try explaining the issues once again.

With that, the spaces immediately after [ and immediately before ] are not removed.

 

ChatGPT gave me another regular expression. Will this one work?

Regular expression before replacement: (?<=\[)\s*|\s*(?=\])
String after replacement: (empty)

 

I entered the above regular expression in the search field to highlight the matches.
It appears to match only the problematic spaces. Since the instruction is to replace them with an empty string, this looks promising.

 

Let's try executing the replacement.

 

 

As intended, we were able to remove only the unnecessary spaces!
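
The same replacement can be sketched in Python, assuming the re module as a stand-in for the editor's replace function. The replacement string is empty, so only the matched spaces are deleted.

```python
import re

lines = [
    "Click [ Price range ] button.",
    "[ Batch Log List ] is displayed.",
    "Press [ Retry ].",
]

# Pattern from Try 3: whitespace right after "[" or right before "]".
# The lookbehind and lookahead check for the brackets without consuming them.
pattern = re.compile(r"(?<=\[)\s*|\s*(?=\])")

# Replace each match with an empty string.
result = [pattern.sub("", line) for line in lines]

for line in result:
    print(line)
```

The pattern can also match an empty string next to a bracket, but replacing an empty match with nothing leaves the text unchanged.
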
I also asked ChatGPT to provide an explanation.

Please explain (?<=\[)\s*|\s*(?=\]).

 

Then, we received the following response.

Of course. This regular expression uses positive lookbehind and positive lookahead.

 

(?<=\[)\s*: This matches zero or more spaces that follow [. (?<=\[) is a positive lookbehind that ensures the match is immediately after [. \s* represents zero or more spaces.

|\s*(?=\]): This matches zero or more spaces that come before ]. (?=\]) is a positive lookahead that ensures the match is immediately before ]. | is the OR operator, and \s* represents zero or more spaces.

This restricts the replacement to the spaces immediately inside [ and ], so that they can be replaced with nothing.

 

ChatGPT explained the positive lookbehind and lookahead used in the regular expression above.
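
If the editor or tool at hand does not support lookbehind, one alternative is to keep the Try 1 pattern and write the brackets back in the replacement string. This variant is not part of the ChatGPT exchange above; it is a sketch in Python's re syntax, where an editor would typically use [$1] instead of [\1].

```python
import re

lines = [
    "Click [ Price range ] button.",
    "[ Batch Log List ] is displayed.",
    "Press [ Retry ].",
]

# Same pattern as Try 1: only the term itself is captured.
pattern = re.compile(r"\[\s*(.*?)\s*\]")

# The replacement re-adds the brackets around the captured term.
result = [pattern.sub(r"[\1]", line) for line in lines]

for line in result:
    print(line)
```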

■Use Case ② - Improving the Quality of Glossaries Used for Quality Checks

Background of Use Case ②

In translation, there are many cases where clients specify terms for proper nouns and technical terminology. In such cases, we thoroughly check before delivery to ensure that the specified terms are used without omission.

A tool frequently used for this is QA Distiller (by Yamagata Europe). We also introduce terminology checks using QA Distiller in the blog post below.

Related Blog Posts:

Practical Guide to Quality Management in Multilingual Translation Using Tools

 

For terminology checks in QA Distiller, a Dictionary file (.dict) specially formatted for QA Distiller, containing a bilingual glossary of "source term - specified translation," is used.
The Dictionary file is structured with terms listed line by line in the format "source term|translation" as shown below.

 

 

There are several issues with the above Dictionary.
・Some terms have no translations ("Cancel", "Network Selection")
・There are consecutive lines containing only spaces (above and below "Cancel")

 

Depending on the state of the original glossary (for example, when not all terms had been finalized at the time it was created), such issues occasionally remain in the Dictionary file.
Running a check in QA Distiller in this state may produce unintended errors, so we want to resolve them.

 

Try It Out

 

To solve the above problem, we need regular expressions that find lines without translations and empty lines so that they can be deleted. Each was created with the help of ChatGPT.

 

Detection of Terms Without Translations

What is the regular expression to search for an entire line ending with the character |?

 

 

The regular expression to search the entire string is as follows:
^.*\|$
・^: Indicates the beginning of the string.
・.*: Represents any string. The * means the preceding character or pattern can repeat zero or more times. In other words, any string can appear from the start of the line.
・\|: Escapes the pipe | character, indicating that the string contains a pipe.
・$: Indicates the end of the string.
Therefore, this regular expression matches lines that have any string from the beginning and end with a pipe | character.
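
A minimal sketch of this search in Python follows. The glossary content is a made-up placeholder in the "source term|translation" format, apart from the two untranslated terms mentioned above. The re.MULTILINE flag is assumed here so that ^ and $ apply to each line rather than to the whole text.

```python
import re

# Hypothetical Dictionary content; "Term A" and "Term B" are placeholders.
sample = (
    "Term A|Translation A\n"
    "Cancel|\n"
    "Network Selection|\n"
    "Term B|Translation B\n"
)

# Lines that end with "|" have a source term but no translation.
pattern = re.compile(r"^.*\|$", re.MULTILINE)

untranslated = pattern.findall(sample)
print(untranslated)
```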

 

Detection of Lines Containing Only Spaces

What is the regular expression to match a line consisting only of spaces?

 

 

The regular expression to search for lines containing only spaces is as follows:
^\s*$
・^: Indicates the beginning of the string.
This regular expression starts with ^ at the beginning of the line, \s* represents zero or more whitespace characters, and $ indicates the end of the line. In other words, it matches lines that contain only spaces.
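
The following Python sketch applies the expression line by line to made-up sample content ("Term A" and "Term B" are placeholders). Checking one line at a time is a choice made for this example: \s also matches line breaks, so in a search over the whole text a single match may span several adjacent blank lines.

```python
import re

# Hypothetical Dictionary content with an empty line, a spaces-only line and a tab-only line.
sample = "Term A|Translation A\n\n   \nCancel|\n\t\nTerm B|Translation B"

blank = re.compile(r"^\s*$")

# Collect the 1-based numbers of lines that are empty or contain only whitespace.
blank_line_numbers = [
    number
    for number, line in enumerate(sample.splitlines(), start=1)
    if blank.match(line)
]
print(blank_line_numbers)
```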

 

We received answers for both. Both are simple regular expressions, but it's convenient to be able to quickly ask ChatGPT when you suddenly can't recall them. We will use these two regular expressions to revise the Dictionary.

 

<Removal of Terms Without Translations>


 

<Removal of consecutive spaces>

 

<After Correction>

 

The corrections have been made. You can now perform terminology checks.
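
For reference, both clean-up steps can be combined in a short script instead of two replace operations in an editor. This is only a sketch under assumptions: the function name clean_dictionary and the sample content are hypothetical, and the text is handled as a plain string without assuming anything about how QA Distiller stores the file.

```python
import re

NO_TRANSLATION = re.compile(r"^.*\|$")  # line ends with "|": no translation
BLANK = re.compile(r"^\s*$")            # empty or whitespace-only line


def clean_dictionary(text: str) -> str:
    """Drop blank lines and entries without a translation (hypothetical helper)."""
    kept = [
        line
        for line in text.splitlines()
        if not BLANK.match(line) and not NO_TRANSLATION.match(line)
    ]
    return "\n".join(kept) + "\n"


# Made-up sample in the "source term|translation" format.
sample = "Term A|Translation A\n\n   \nCancel|\n\nNetwork Selection|\nTerm B|Translation B\n"
cleaned = clean_dictionary(sample)
print(cleaned)
```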

3. Summary

This time, we introduced an attempt to have ChatGPT create regular expressions and utilize them for translation checks. In the complex example of Use Case 1, it took several tries, but we ultimately obtained the intended result. In the simpler example of Use Case 2, we were able to get a regular expression that worked as intended on the first try.

 

Regular expressions may be easy for those familiar with them, but without that knowledge they are often hard to write or recall. Handing this task to ChatGPT is a very simple use of the tool, yet it can be a great help in translation quality checks.

 

In particular, regular expressions allow you to quickly verify whether they are actually correct. ChatGPT is often said to have the pitfall of "plausibly lying," but if you use regular expressions that can be immediately checked for correctness, it should be possible to avoid that pitfall.
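
One way to make this verification routine is to keep a few before/after pairs and run every suggested expression against them. The helper below is a hypothetical sketch in Python (the name check_replacement is made up for this example); it reports the cases where a pattern does not produce the expected text.

```python
import re


def check_replacement(pattern, replacement, cases):
    """Return (before, expected, actual) for every case that does not match."""
    failures = []
    for before, expected in cases:
        actual = re.sub(pattern, replacement, before)
        if actual != expected:
            failures.append((before, expected, actual))
    return failures


# Before/after pairs taken from Use Case 1.
cases = [
    ("Click [ Price range ] button.", "Click [Price range] button."),
    ("Press [ Retry ].", "Press [Retry]."),
]

# Try 1 drops the brackets, so both cases fail; Try 3 passes both.
try1_failures = check_replacement(r"\[\s*(.*?)\s*\]", r"\1", cases)
try3_failures = check_replacement(r"(?<=\[)\s*|\s*(?=\])", "", cases)

print(len(try1_failures), len(try3_failures))
```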

 

 
