Introduction
As LLMs such as ChatGPT and Claude continue to evolve, many companies are setting out to build "proprietary LLMs" or "LLMs that excel at Japanese." Once development actually begins, however, it is not uncommon to run into obstacles such as "the Japanese output is less natural than expected" or "the model does not return accurate answers."
In practice, an LLM's performance is determined more by the quality of its training data (annotation quality) than by the model architecture or the number of parameters. This article explains why developing Japanese LLMs is challenging and why "human-crafted data" is essential.
1. Reasons Why Developing Japanese LLMs is Difficult
LLMs are being developed all over the world, but Japanese poses difficulties not found in many other languages. The main reasons come down to the following three points.
1-1. Structural Differences from English Models
Japanese frequently omits the subject, allows relatively flexible word order, and relies heavily on particles and context to convey meaning. As a result, applying an LLM designed and trained primarily on English without adaptation easily leads to misreading of the relationships and intent within a sentence. In addition, the system of honorific, humble, and polite forms encodes the relationship between speakers, demanding a correspondingly sophisticated level of understanding from the model.
1-2. Difficulties in Tokenization and Issues with Orthographic Variations
Japanese has no spaces between words, so tokenization based on morphological analysis is required. The language also contains many technical terms, compound words, and katakana loanwords, and how tokens are segmented strongly affects both learning efficiency and semantic representation. On top of that, orthographic variation, such as "AI / Artificial Intelligence / Ai," mixtures of full-width and half-width characters, and kanji versus hiragana spellings, causes text with the same meaning to be treated as different tokens.
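As a concrete illustration, the following sketch shows one common preprocessing step before tokenization: Unicode NFKC normalization combined with a small variant dictionary that collapses alternative spellings into one canonical form. The variant table and sample strings are hypothetical, and a real pipeline would pair this with a morphological analyzer and a much larger, domain-specific dictionary.

```python
import unicodedata

# Hypothetical variant dictionary: maps alternative spellings of the same
# concept to one canonical surface form so they share the same tokens.
VARIANT_MAP = {
    "人工知能": "AI",
    "Ai": "AI",
}

def normalize(text: str) -> str:
    """Collapse full-width/half-width differences, then apply the variant map."""
    # NFKC folds full-width Latin letters and digits to half-width and
    # half-width katakana to full-width, among other compatibility mappings.
    text = unicodedata.normalize("NFKC", text)
    for variant, canonical in VARIANT_MAP.items():
        # Naive substring replacement; fine for a sketch, too blunt for production.
        text = text.replace(variant, canonical)
    return text

if __name__ == "__main__":
    samples = ["ＡＩの活用", "人工知能の活用", "Aiの活用"]
    print([normalize(s) for s in samples])
    # All three normalize to "AIの活用".
```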
1-3. Challenges of Japanese Data with Many "Different Expressions for the Same Meaning"
In Japanese, what counts as a natural expression varies with context and register, so even a single meaning can be phrased in a very wide range of ways. For example, the appropriate wording differs greatly between a business document and a chat message. This diversity is one of the riches of the Japanese language, but if the training data is not consistent, it can drag down the quality and stability of an LLM's output.
Given these challenges, it is clear that developing a Japanese LLM requires not only sound model design but also carefully prepared, high-quality Japanese data and specialized review.
[Reference Information] Notable Japanese LLMs
The following are particularly notable Japanese LLMs.
●Llama-3-ELYZA-JP-8B (ELYZA Corporation)
A domestically developed LLM based on Llama 3, enhanced with Japanese data and instruction-following training to strengthen conversational performance in Japanese.
●Rakuten AI LLM (Rakuten Group)
An LLM optimized for corporate use, based on the Mistral model family, continuously trained with proprietary Japanese and English data.
●Fujitsu Takane (Fujitsu)
A Japanese LLM developed specifically for enterprise use, highly rated on Japanese benchmarks such as JGLUE.
●Llama 3 Japanese / Qwen 2.5 Japanese
A group of derivative models based on high-performance overseas LLMs, enhanced with additional training and instruction tuning specifically for Japanese.
2. The Importance of Annotation Quality Supporting LLM Accuracy
The accuracy of an LLM is strongly influenced not only by the model architecture and the volume of training but also by the quality of the labels in its training data. If ambiguous or incorrect labels are included, the model learns the wrong decision criteria, and output accuracy and stability both decline.
This is especially true for Japanese, where context, emotion, and subtle nuance often carry the meaning; automatic labeling by AI alone cannot fully handle such cases. Consistent rule-based labeling, meticulous attention to detail, and final human judgment therefore become decisive factors in the quality of an LLM.
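The effect of label noise is easy to reproduce on a toy task. The sketch below, which assumes scikit-learn is available and uses a synthetic dataset rather than real annotation data, trains the same classifier once on clean labels and once on labels with 20% flipped at random to simulate annotation errors; the noisy-label model typically scores noticeably lower on the held-out test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an annotated dataset (not real annotation data).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def train_and_score(labels):
    model = LogisticRegression(max_iter=1000).fit(X_train, labels)
    return accuracy_score(y_test, model.predict(X_test))

# Flip 20% of the training labels to simulate inconsistent annotation.
rng = np.random.default_rng(0)
noisy = y_train.copy()
flip = rng.random(len(noisy)) < 0.20
noisy[flip] = 1 - noisy[flip]

print("clean labels:", train_and_score(y_train))
print("noisy labels:", train_and_score(noisy))
```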
3. Three Points to Improve Annotation Quality
3-1. Clear Guidelines and Rule Design
To stabilize annotation quality, clear and specific guidelines are essential. If definitions and judgment criteria are ambiguous, annotators interpret them differently and the dataset as a whole loses consistency. For Japanese in particular, spelling out in advance how to handle ambiguous cases has a significant effect on the final accuracy of the model.
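One practical way to make such guidelines operational is to encode the label set, definitions, and tie-breaking rules in a machine-readable form that annotators, reviewers, and tooling can all share. The snippet below is a hypothetical example of such a definition; the label names, example sentences, and field names are illustrative and not taken from any actual guideline.

```python
from dataclasses import dataclass, field

@dataclass
class LabelDefinition:
    """One entry in an annotation guideline (illustrative fields only)."""
    name: str
    definition: str
    positive_examples: list[str] = field(default_factory=list)
    negative_examples: list[str] = field(default_factory=list)
    ambiguous_case_rule: str = ""  # how to decide borderline cases

# Hypothetical sentiment-style label set for a Japanese dataset.
GUIDELINE = [
    LabelDefinition(
        name="POSITIVE",
        definition="The writer clearly expresses satisfaction or approval.",
        positive_examples=["対応がとても早くて助かりました。"],
        negative_examples=["まあ、悪くはないと思います。"],
        ambiguous_case_rule="Hedged praise such as「悪くない」goes to NEUTRAL, not POSITIVE.",
    ),
    LabelDefinition(
        name="NEUTRAL",
        definition="No clear positive or negative stance can be identified.",
        ambiguous_case_rule="If torn between NEUTRAL and another label, choose NEUTRAL and flag the item for review.",
    ),
]
```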
3-2. Quality Assurance through a Multi-Stage Checking System
A multi-stage checking system is indispensable for high-quality annotation. Mutual review and re-checking suppress the bias introduced by individual differences and unexamined assumptions. Resolving cases where judgments differ through a consensus-building process also feeds back into more precise guidelines.
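One common way to verify that such a checking system is actually working is to measure agreement between annotators, for example with Cohen's kappa. The sketch below is a minimal, standard-library-only implementation for two annotators; the label values are made up for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[l] * count_b[l] for l in set(labels_a) | set(labels_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels from two annotators on the same eight sentences.
annotator_1 = ["POS", "NEG", "NEU", "POS", "NEG", "POS", "NEU", "NEG"]
annotator_2 = ["POS", "NEG", "POS", "POS", "NEG", "NEU", "NEU", "NEG"]
print(f"kappa = {cohens_kappa(annotator_1, annotator_2):.2f}")  # ~0.62
```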
3-3. Ensuring Reliability through Security and Training
In annotation, not only data quality but also the safety of the working environment and the skills of the workers are important. When handling confidential data, security measures such as access control and environment separation are indispensable. Additionally, continuous education and feedback enable the maintenance of stable quality over the long term.
4. Reasons Why LLM Development Companies Outsource Annotation
While annotation for LLMs is an important process, maintaining high accuracy continuously requires significant labor, expertise, and a strict data management system. For this reason, many LLM development companies choose to outsource.
4-1. Main Challenges of In-House Implementation
Annotation work depends on skills that tend to be tied to individual workers, such as contextual judgment and familiarity with expressions unique to Japanese. Without accumulated know-how, quality becomes inconsistent and rework increases, straining development resources. In addition, when confidential data is involved, building an adequate in-house security environment for temporary staff (physical monitoring, access control, and so on) carries enormous cost and management overhead.
4-2. Benefits of Outsourcing
Outsourcing to specialized partners makes it possible to achieve annotation that balances speed and quality. Such partners can flexibly handle large-scale data and tight deadlines, and a major strength is that the work is carried out safely under a security framework based on ISO standards.
5. Summary: Human Science Annotation Support
5-1. Human Science Solutions
Human Science has a track record of producing more than 48 million labeled data items and has supported AI development projects across fields including natural language processing, healthcare, IT, manufacturing, and automotive. Rather than relying on crowdsourcing, we maintain a team of directly contracted specialists, allowing us to ensure both quality and security.
In addition to annotation and curation, we also support structuring document data and data preparation for building generative AI, LLM, and RAG. We have a dedicated security room that meets ISMS standards, ensuring the safe handling of highly confidential data.
5-2. For Companies Facing These Challenges
●The accuracy of your Japanese LLM is not improving as expected
●There is variability in annotation quality
●Want to focus internal resources on model development
●Looking for a contractor who can safely handle confidential data
We welcome consultations on annotation and data preparation for Japanese LLMs, even at the early consideration stage.
Please feel free to contact us first.
