
 

The Meaning and Future of Developing a Fully Domestic LLM


1/13/2026


*Purely domestic LLM: in this article, a purely domestic LLM refers not to a model additionally trained on top of an overseas general-purpose LLM, but to an LLM built from scratch, with Japanese at its core from the data-design stage onward.

1. Introduction

When considering the development of a Japanese LLM, the approach many people think of first is building on a general-purpose LLM such as Llama. These models already perform well, support Japanese to a certain extent, and can be brought to a practically usable level through additional training and fine-tuning. This approach is realistic and also excels in cost and speed.

On the other hand, there is also the option of developing a domestic LLM from scratch. Compared to building on a general-purpose LLM, however, the hurdle is far higher, and doubts such as "Is it really necessary to go that far?" or "Aren't existing models sufficient?" are natural.

However, these two methods cannot be compared simply on development efficiency or performance. When developing an LLM that understands expressions unique to Japanese, the fundamental difference lies in the approach itself: which judgment criteria we entrust to the AI, and what we want to keep in our own hands as assets for the future.

Reference Links
Released the latest model in the "ELYZA LLM for JP" series: "Llama-3-ELYZA-JP"
NTT announces next-generation purely domestic LLM "tsuzumi 2" — Achieves GPT-5 level Japanese performance in a lightweight model with full scratch design

Reference Blog
Top 3 Recommended Japanese LLMs | Thorough Comparison of Large-Scale Language Models Specialized for Japan [2025 Latest]

2. Assumptions Underlying Development Based on General-Purpose LLMs

When a general-purpose LLM such as Llama is used as a base, it already possesses sufficient capability for general text understanding, generation, and common-sense responses, so development focuses on how to make use of those abilities. This approach is well suited to general tasks such as information retrieval and writing assistance, and in cases where somewhat generic behavior is not a major issue, it is a very rational choice.

However, this approach rests on an implicit premise: development essentially accepts the model's "way of thinking" and "judgment tendencies" as given.

The model's "way of thinking" and "judgment tendencies" refer to behaviors such as how it responds to ambiguous instructions and where it finds a compromise when opinions are divided. These are largely shaped by what was treated as "desirable" during the pre-training, instruction tuning, and human feedback of the general-purpose model's development.

Even if additional tuning for Japanese is performed on such a model, it is in fact not easy to fundamentally change the underlying way of thinking or judgment criteria.
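As one concrete illustration of why this is so (our own sketch; the article does not name a specific tuning method), parameter-efficient techniques such as LoRA learn only a small correction on top of frozen base weights. The stdlib-only toy below makes the structural point visible: "additional training" produces a delta, but the pre-trained weights themselves, and with them the base model's tendencies, remain untouched.

```python
# Toy sketch: additional training learns a small correction (delta) on top of
# frozen base weights; the base weights themselves never change.

# Frozen weights of the pre-trained base model (stands in for one layer).
base_weights = [0.8, -1.2, 0.5, 2.0]
snapshot = list(base_weights)

# Adapter learned during additional training for Japanese or a business domain.
adapter_delta = [0.02, -0.01, 0.03, 0.00]

def base_output(x):
    """Behavior of the original general-purpose model."""
    return sum(xi * w for xi, w in zip(x, base_weights))

def adapted_output(x):
    """Behavior after tuning: base weights plus a small learned correction."""
    return sum(xi * (w + d) for xi, w, d in zip(x, base_weights, adapter_delta))

x = [1.0, 0.5, -1.0, 0.25]
shift = adapted_output(x) - base_output(x)

# The base weights are untouched, so the original model's judgment
# tendencies persist underneath the adaptation.
assert base_weights == snapshot
print(f"output shift introduced by the adapter: {shift:+.4f}")
```

Hypothetical numbers chosen for illustration only: the point is that the adapted output is always "base behavior plus a small shift", which is why the underlying judgment criteria are hard to change fundamentally through additional training alone.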

Reference Blogs
Comprehensive Comparison of Open Source Generative AI: Optimal Solutions and Implementation Points for Different Business Scenes
What is Reinforcement Learning? The Mechanism of AI Learning Through Trial and Error and Its Use Cases
What is Fine-Tuning in LLMs?

3. The Decisive Differences Brought by Purely Domestic LLMs Compared to General-Purpose LLMs

In general-purpose LLMs, Japanese is one of the many languages learned. While multilingual performance improves year by year, the core of the design is often based on the linguistic structures, thought patterns, and even cultural backgrounds of English-speaking regions.

On the other hand, in a purely domestically developed LLM, the Japanese language itself is placed at the center of the design from the very beginning.

In Japanese, many judgments and responses rest on unstated assumptions or the premise that context is shared: for example, situations where the choice of honorifics carries meaning, where a request is conveyed indirectly, or where deliberately leaving something unsaid is the appropriate answer.

The ability to design these elements to handle natural Japanese from the very beginning is a distinctive feature unique to full-scratch development.
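One concrete place where designing for Japanese from the start matters is tokenizer and vocabulary design, a detail the article itself does not go into, offered here as one illustration. In UTF-8, most Japanese characters occupy three bytes, so a byte-level vocabulary whose merges were learned mainly from English text starts from roughly three times as many base units per Japanese character. The stdlib-only sketch below measures just that byte-level gap:

```python
# Compare how many UTF-8 bytes (the raw units a byte-level tokenizer starts
# from) each language needs per character. Sample sentences are our own.

samples = {
    "English": "Please handle this request at your convenience.",
    "Japanese": "恐れ入りますが、ご都合のよいときにご対応いただけますと幸いです。",
}

for lang, text in samples.items():
    n_chars = len(text)
    n_bytes = len(text.encode("utf-8"))
    print(f"{lang:8s}: {n_chars:3d} chars -> {n_bytes:3d} UTF-8 bytes "
          f"({n_bytes / n_chars:.1f} bytes/char)")

# English sits at 1 byte per character, Japanese at about 3. Without
# Japanese-aware merges chosen at the data-design stage, a byte-level
# vocabulary fragments Japanese far more than English.
```

This is only one ingredient of "placing Japanese at the center of the design", but it shows why such choices must be made at the data-design stage: a vocabulary is fixed before pre-training and cannot be cheaply revisited by later fine-tuning.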

4. The Practical Advantage of Suppressing Black-Boxing

One practical advantage of purely domestic LLMs is that their behavior is less likely to become a black box. This is not a matter of model size or the latest technology; rather, it is a question of whether the model has a structure in which we can explain how its judgments were formed.

When additional training is performed on a general-purpose LLM, the model's judgments always appear as the result of two overlapping layers: the judgment criteria embedded in the original model through pre-training, instruction tuning, and human feedback, and the adjustments added later for Japanese support and business specialization.

In this structure, it is difficult to separate cleanly where and how a particular response was formed. It is impossible to determine with certainty whether the handling of an ambiguous instruction, or the compromise reached when judgments diverge, stems from the additional training or from the original model's inherent tendencies. As a result, even the developers may be unable to adequately answer the question, "Why was this judgment made?"

On the other hand, in purely domestically developed LLMs created from scratch, the criteria for judgment are incorporated into the design from the very beginning. Policies such as the data used for pre-training, how to interpret expressions and contexts unique to Japanese, and how far to delve into ambiguous questions are established based on a consistent approach. As a result, it becomes easier to explain the model's behavior by attributing it to specific stages of data design or learning policies.

Of course, even a model developed from scratch cannot be fully explained internally. What matters, however, is not eliminating the black-box nature entirely, but being able to state, on our own responsibility, which parts of the model's behavior the design can account for.

In fields such as government, healthcare, finance, and manufacturing, where the basis for decisions and accountability are required, this difference is not small. In situations where "why a certain decision was reached" must be explained in the stakeholders' own words, a purely domestic LLM becomes an option worth considering, sometimes ahead of raw performance.

5. The Future of Purely Domestic LLMs

That said, when considering the future, it is unlikely that purely domestic LLMs will completely replace general-purpose LLMs. For many applications, development based on general-purpose LLMs is more rational.

Even so, the value of purely domestic LLMs will not be lost. On the contrary, it is believed that their value will increase as generative AI becomes more deeply integrated into society.

Situations that call for AI that makes judgments in Japanese and operates on Japanese business practices, systems, and culture will undoubtedly increase. The difference will be especially visible in workplaces where instructions are ambiguous and background knowledge is assumed to be shared. In such settings, what is needed may be less a "smart AI" than an "AI that understands intent" and an "AI that shares our way of thinking."

Choosing to develop a purely domestic LLM can be seen as an investment in keeping judgment criteria rooted in the Japanese language, systems, and culture in our own hands for the future, rather than entrusting them to overseas general-purpose models.

6. Human Science's Training Data Creation and LLM/RAG Data Structuring Services

Over 48 million pieces of training data created

At Human Science, we are involved in AI model development projects across a range of industries, including medical support, automotive, IT, manufacturing, and construction, with natural language processing at the core. Through direct transactions with many companies, including GAFAM members, we have delivered over 48 million items of high-quality training data. With a team of 150 annotators, we handle training data creation, data labeling, and data structuring across industries, from small-scale projects to long-term large projects.

Resource management without crowdsourcing

At Human Science, we do not use crowdsourcing; projects are handled by personnel under direct contract with us. Based on a solid understanding of each member's practical experience and their evaluations from previous projects, we form teams that can deliver maximum performance.

Generative AI LLM Dataset Creation and Structuring, Also Supporting "Manual Creation and Maintenance Optimized for AI"

Manual creation has been our core business since our founding, and we now also support the creation of documents optimized for AI recognition, easing the introduction of generative AI for corporate knowledge utilization. When sharing and utilizing corporate knowledge and documents with generative AI, current technology still cannot achieve 100% accuracy with tools alone, so for customers who want to make the most of their past document assets, we also provide document data structuring. We offer optimal solutions built on our expertise with a wide variety of document types.

Secure room available on-site

Within our Shinjuku office, Human Science maintains secure rooms that meet ISMS standards, so we can guarantee security even for projects involving highly confidential data. We regard confidentiality as critical to every project. For remote work as well, our information security management system has been highly rated by clients, because in addition to hardware measures we provide continuous security training to our personnel.

In-house Support

We provide staffing services for annotation-experienced personnel and project managers tailored to your tasks and situation. It is also possible to organize a team stationed at your site. Additionally, we support the training of your operators and project managers, assist in selecting tools suited to your circumstances, and help build optimal processes such as automation and work methods to improve quality and productivity. We are here to support your challenges related to annotation and data labeling.
