
 

[Engine] Training Time


High-quality source documents are essential for improving machine translation accuracy, as discussed in previous blog posts. Quality matters not only for the document to be translated but also for the "corpus" used to train a statistical engine.

When the corpus contains long sentences or sentences with complex grammatical structures,
translation accuracy may not improve even after the statistical engine is trained,
and the training itself may also take longer.

In this post, we present verification results on processing time, using project data for Japanese-English translation.

●Verification Results for Training Time

 
Training involves many processes. The most time-consuming of them is
syntax analysis, which determines the part of speech and dependency relations of each word.
As a result, the longer a sentence is and the more complex its grammatical structure,
the longer syntax analysis takes.
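
To give a rough sense of why sentence length matters so much, here is a minimal sketch that counts the work done by a CKY-style chart parser. It assumes the analyzer considers every span of a sentence and every way of splitting that span in two. This post does not describe the analyzer's actual algorithm or constants, so the function name and the numbers are illustrative only.

```python
def cky_split_count(n_tokens):
    """Count (span, split point) pairs visited by a CKY-style chart parser.

    This is a simplified cost model (an assumption), not a measurement
    of any real analyzer.
    """
    total = 0
    for span_len in range(2, n_tokens + 1):
        n_starts = n_tokens - span_len + 1  # positions where the span can begin
        n_splits = span_len - 1             # ways to cut the span into two parts
        total += n_starts * n_splits
    return total


for n in (10, 20, 40):
    print(n, cky_split_count(n))
# 10 165
# 20 1330
# 40 10660
```

Under this model, doubling the number of words in a sentence multiplies the work by roughly eight, so a few very long sentences can dominate the total parsing time.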

The table below summarizes the time taken to parse Japanese-English translation corpora used in actual projects.
It compares "Project A", whose corpus consists mostly of short sentences, with "Project B", whose corpus consists mostly of long sentences.

 


 

Corpus size          Project A (processing time)    Project B (processing time)
1 sentence           6.72 seconds                   6.38 seconds
100 sentences        15 minutes                     41 minutes
1,000 sentences      1 minute 10 seconds            7 minutes 53 seconds
3,000 sentences      6 minutes 27 seconds           1 hour 5 minutes
10,000 sentences     4 hours 9 minutes              5 hours 46 minutes

-Verification Environment
Analyzer: Ckylark
Used PC: iMac
Processor: Core i5, 2.8 GHz
Memory: 12 GB, 1,333 MHz DDR3


 

As the table shows, Project B takes considerably longer than Project A for every corpus size except the single-sentence case,
even though the number of sentences is the same.

Note also that the time needed for syntax analysis is not simply proportional to the number of sentences.

Because B contains more long sentences than A, the difference in processing time becomes more noticeable as the number of sentences increases.
In other words, the length of the Japanese sentences has a large effect on training time.
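
One way to check proportionality is to divide each measured time by the corpus size: if time were simply proportional to the number of sentences, the per-sentence figure would be the same in every row. The sketch below does this arithmetic with the timings copied from the table as printed. Treating every row's size as a number of sentences is an assumption, and the names `TIMINGS`, `per_sentence`, and `summarize` are made up for this illustration.

```python
# Timings copied from the table as printed, converted to seconds.
# Key: corpus size (assumed to be a number of sentences).
# Value: (Project A seconds, Project B seconds).
TIMINGS = {
    1: (6.72, 6.38),
    100: (15 * 60, 41 * 60),
    1000: (1 * 60 + 10, 7 * 60 + 53),
    3000: (6 * 60 + 27, 1 * 3600 + 5 * 60),
    10000: (4 * 3600 + 9 * 60, 5 * 3600 + 46 * 60),
}


def per_sentence(count, seconds):
    """Average processing time per sentence, in seconds."""
    return seconds / count


def summarize(timings):
    """Return (size, A sec/sentence, B sec/sentence, B/A ratio) per row."""
    rows = []
    for count, (a_sec, b_sec) in sorted(timings.items()):
        rows.append((count,
                     per_sentence(count, a_sec),
                     per_sentence(count, b_sec),
                     b_sec / a_sec))
    return rows


for count, a_rate, b_rate, ratio in summarize(TIMINGS):
    print(f"{count:>6}  A={a_rate:.3f} s/sentence  "
          f"B={b_rate:.3f} s/sentence  B/A={ratio:.2f}")
```

The per-sentence figures differ from row to row for both projects, which is what "not simply proportional" means in practice.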

This verification used a corpus of only 10,000 sentences.
In research and development work that handles far larger corpora,
training often takes one to two weeks.

●Reducing Processing Time by Shortening Sentences

 
To reduce training time, we recommend shortening the sentences used in the corpus.

Shorter Japanese sentences also let the engine learn more accurately, which leads to higher-quality machine translation.
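
As a starting point for shortening a corpus, long sentences first have to be found. The following is a minimal sketch that splits Japanese text at the full stop "。" and lists the sentences longer than a given number of characters. The function names and the 40-character threshold are hypothetical choices for illustration, not values recommended in this post.

```python
def split_sentences(text):
    """Split Japanese text into sentences at the full stop, keeping the mark."""
    sentences = []
    current = ""
    for ch in text:
        current += ch
        if ch == "。":
            sentences.append(current.strip())
            current = ""
    if current.strip():  # trailing text without a final full stop
        sentences.append(current.strip())
    return sentences


def flag_long_sentences(text, max_chars=40):
    """Return the sentences whose length exceeds max_chars characters."""
    return [s for s in split_sentences(text) if len(s) > max_chars]


sample = "これは短い文です。" + "とても" * 20 + "長い文です。"
for sentence in flag_long_sentences(sample):
    print(len(sentence), sentence)
```

The flagged sentences are candidates for manual rewriting, for example by splitting them into two or more shorter sentences before the corpus is used for training.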

●Summary

 
Training a statistical engine takes time.
However, simplifying the Japanese sentences in the corpus
can reduce the processing time.
Simpler Japanese sentences also make better machine translation possible.

 
At Human Science Co., Ltd., we offer analysis services for corpora and target documents.

We also provide advice on introducing machine translation, so please feel free to contact us!

>>Contact Form

If the form is not available, please send your inquiry via email to hsweb_inquiry@science.co.jp.

Alternatively, please feel free to call us at 03-5321-3111.
 
 

Blog Writing Team

Hiroki Makino
・Studied information engineering and researched natural language processing in university.
・Conducted engine analysis and investigation to improve machine translation accuracy through quality evaluation and verification of multiple engines.
・As a Technical Writer, researched and verified Japanese text that is easy to machine translate.

