Where Do Failures in AI Development for Manufacturing Originate? — Success Tips from the Perspective of Data Quality —

12/17/2025

1. Introduction: Why AI Development in Manufacturing is Gaining Attention

In recent years, the use of AI in manufacturing has grown year by year, serving purposes such as quality improvement, labor savings, and more efficient equipment maintenance. Image analysis, visual inspection, and early detection of equipment abnormalities are areas where on-site challenges and AI technology align particularly well; efforts have accelerated mainly among major companies, and the movement is now spreading noticeably across the entire industry.

Behind this trend lie structural issues such as labor shortages and reliance on the expertise of individual workers at manufacturing sites, where the limits of conventional systems are becoming apparent. Expectations are rising sharply for AI as a technology that can compensate for these challenges and improve productivity. Advances in hardware performance and increasingly sophisticated AI models are further tailwinds, leading to a rapid increase in pilot projects and the adoption of simple tools even among small and medium-sized enterprises.

2. Failure Patterns in AI Development for Manufacturing

Several common failure patterns in manufacturing AI development have been pointed out across multiple reports. Here we summarize the issues most frequently mentioned in such field cases.

Reference external articles:

Tulip "The Context Gap: Why Manufacturing AI Fails Without Human Insight"
SHIFT AI "Why Does AI Implementation in Manufacturing Fail? Thorough Explanation of Cases, Causes, and Avoidance Measures"
Business + IT (SB Creative) "Wrong from the Start... Why AI Implementation in Quality Control Often Fails, The 'Success Procedure' Known Only by Hitachi"

1. Ambiguous Purpose (Unclear Goals)
This is a case where the purpose of AI implementation is unclear from the start. When the attitude is "we just want to improve something with AI," use-case selection and evaluation criteria tend to stay vague, making results hard to measure. Ideally, the business problem to be solved or the target to be improved should come first; if the project starts from a technology-driven standpoint instead, its direction may waver midway and never reach a measurable improvement.

2. Data Shortage and Data Quality Issues
There are cases where development stalls because the data collected on the manufacturing floor varies greatly in both quantity and quality and is not organized in a form suitable for AI development. In image analysis and visual inspection especially, differences between lots, equipment, and lighting conditions affect the data, and if training proceeds without the data meeting certain quality standards, accuracy becomes unstable. The time required for data collection is also frequently cited.

3. Breakdown in Transition from PoC to Production
Even if high accuracy is achieved in a PoC, there are cases where those results cannot be reproduced in actual operation. This happens because the PoC is conducted under limited conditions, and the various fluctuations unique to the field are not sufficiently reflected in the evaluation. If decisions rest solely on PoC results, without data collection and evaluation design that assume the production environment, significant gaps tend to appear after deployment.

4. Lack of Personnel and Skills
This is a case where operation begins without enough personnel who understand data science and machine learning. To run AI stably in a production environment, people are needed who understand how the AI models behave and can continuously add data and adjust rules. Without staff who can update and verify the models in line with conditions on the floor, accuracy and reliability gradually decline, and the benefits of implementation are lost.

Among these failure patterns, the two issues of "2. Data shortage and data quality problems" and "3. Breakdown in the transition from PoC to production" are frequently pointed out as data-related challenges in AI development for manufacturing.

3. Background of Failures in Terms of Data

In AI development for manufacturing, many factors such as goal setting and staffing are involved, but data-related challenges are comparatively concrete and actionable, and they tend to have a large impact on the overall success or failure of a project.
From this chapter onward, we focus on these data-related aspects, examining why problems tend to occur and what structural issues lie behind them.

●Data Variability Arising from Diverse Conditions
Differences in equipment, processes, and imaging environments can cause data trends to vary even for the same product, making it difficult to reproduce the results obtained in the PoC in the production environment.

●Data Acquisition Is Harder Than Expected
In processes with few defects, collecting defect data is difficult, and preparing training data can itself take a long time. In many cases, good/bad judgments also depend on the experience of skilled workers, which further slows annotation work.

●Unclear Methods for AI Utilization on Site
If it has not been decided who on site will use the AI detection results, when, and how, the evaluation criteria and decision flow set at the PoC stage tend to diverge from actual conditions on the floor. If the evaluation design rests solely on the development side's assumptions, even high PoC accuracy will not translate into effective use on site.

At manufacturing sites, fluctuating conditions (equipment differences, process variations, imaging environments), subtle subjective differences in judgment among skilled workers, and the difficulty of collecting defect data all tend to show up as characteristics of the data itself. In addition, if it is unclear how AI will be used on site, data collection policies and labeling standards become inconsistent, and those inconsistencies surface as data problems.

These factors ultimately affect data quality and tend to be major causes of failure. The training data, which serves as the AI's ground truth, is where such issues surface most readily. Because its quality affects subsequent training and the stability of production operation, it is crucial to decide how the training data will be created and by what standards it will be organized.

4. The Annotation Process as a Major Cause of Failure

To organize the various "fluctuations" unique to manufacturing sites into a format suitable for machine learning, it matters how the data is structured and by what criteria it is organized. Annotation is the process that absorbs these fluctuations and prepares the data as training data; it is where these decisions, criteria, and designs take their most concrete form, and its accuracy strongly affects downstream model performance.

●Adjustment of Judgment Criteria
For example, when automating an existing visual inspection process with AI, judgment criteria based on know-how and experience are already established on site. In the annotation process, it is important to inherit those on-site criteria accurately and to align their interpretation consistently across workers.
If the criteria are not sufficiently organized before annotation begins, or if the fine-grained judgments made on site are not shared, judgments will vary from worker to worker. When such variation mixes into the training data, the AI model cannot learn what should be considered correct, and accuracy becomes unstable.
Therefore, beyond inheriting the site's criteria, the annotation process must also decide in advance how ambiguous cases and exceptions will be handled, so that judgment variation is minimized. One way to check whether this alignment has succeeded is shown in the sketch below.
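As a concrete illustration, judgment variation can be measured before full-scale annotation by having two annotators label the same pilot batch and computing their agreement. The following is a minimal sketch assuming hypothetical OK/NG labels; it uses scikit-learn's cohen_kappa_score, and all sample data is illustrative:

```python
# Minimal sketch: quantify judgment variation between two annotators
# on a shared pilot batch. All labels below are hypothetical.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["OK", "NG", "OK", "OK", "NG", "OK", "NG", "OK", "OK", "NG"]
annotator_b = ["OK", "NG", "OK", "NG", "NG", "OK", "OK", "OK", "OK", "NG"]

# Cohen's kappa corrects raw agreement for chance; a low value is a
# signal that the judgment criteria still need alignment.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Inter-annotator agreement (Cohen's kappa): {kappa:.2f}")

# Items where the annotators disagree are candidates for explicit
# rules covering ambiguous cases and exceptions in the guideline.
disagreements = [i for i, (a, b) in enumerate(zip(annotator_a, annotator_b)) if a != b]
print(f"Disagreements at sample indices: {disagreements}")
```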

●Class Design (Label Design)
Take visual inspection as an example, where defect types such as scratches, dents, foreign objects, and discoloration are classified and named. In annotation, it is important to adjust the defect classifications used on the manufacturing floor to a granularity from which the AI can readily learn features.
In the early stages of AI development, defects are sometimes lumped into a single broad class such as "appearance defect," so differences between defect types are insufficiently reflected in the training data. If training proceeds in this state, the model struggles to capture distinguishing features and accuracy becomes unstable.
By designing classes hierarchically based on on-site standards and defect analysis data, for example scratches (depth, length), dents (area), foreign object adhesion (type), and discoloration (hue), the structure of the training data becomes clearer and the model's accuracy tends to stabilize.
Since revising the class design later causes significant rework, such as re-annotation and retraining, it is important to align the on-site judgment criteria with the AI development objectives and fix the classification granularity before annotation starts. One way to write such a hierarchy down is sketched below.
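The following is a minimal sketch of a hierarchical label schema expressed in Python. All class names and attributes here are hypothetical examples chosen for illustration, not a prescribed taxonomy:

```python
# Minimal sketch of a hierarchical label schema for visual inspection.
# Class names and attributes are hypothetical examples.
from dataclasses import dataclass, field

@dataclass
class DefectClass:
    name: str                                       # label used by annotators
    attributes: list = field(default_factory=list)  # per-instance measurements
    subclasses: list = field(default_factory=list)

# One coarse class split into the finer granularity the model learns
# from, instead of a single catch-all "appearance_defect" label.
taxonomy = DefectClass(
    name="appearance_defect",
    subclasses=[
        DefectClass("scratch", attributes=["depth_mm", "length_mm"]),
        DefectClass("dent", attributes=["area_mm2"]),
        DefectClass("foreign_object", attributes=["material_type"]),
        DefectClass("discoloration", attributes=["hue_shift"]),
    ],
)

def leaf_labels(node: DefectClass) -> list:
    """Flatten the hierarchy into the label set actually used for training."""
    if not node.subclasses:
        return [node.name]
    return [label for child in node.subclasses for label in leaf_labels(child)]

print(leaf_labels(taxonomy))  # ['scratch', 'dent', 'foreign_object', 'discoloration']
```

Keeping the hierarchy explicit like this makes a later change of granularity a schema edit rather than an undocumented shift in annotator behavior.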

●Design of Validation Data
To verify the quality of an AI model, separate "validation data" is required in addition to the training data. At manufacturing sites, data fluctuates with differences in equipment and environmental conditions, so how much of this on-site variation the validation data captures directly affects whether PoC accuracy can be reproduced. If a PoC is run with poorly designed validation data, high accuracy in the PoC may not translate into the expected performance in actual operation.
Typical examples include:
・Validation data that does not cover the varied conditions at the site
・Evaluation performed only under conditions that happened to match
・Exception cases omitted from validation

While PoCs tend to be evaluated under limited conditions, actual operation is exposed to daily fluctuations in lighting, equipment, and working conditions, and the difference appears as a gap.
If the validation data does not capture these on-site variations, it is hard to judge whether PoC results represent accuracy that will hold in real operation, and gaps emerge after full-scale deployment. One way to bake such variation into the evaluation is sketched below.
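As one illustration, validation data can be made to test generalization across conditions by splitting on condition groups (equipment, lot, lighting setup) rather than on individual samples. The sketch below uses scikit-learn's GroupShuffleSplit on hypothetical data; the features, labels, and equipment IDs are all placeholders:

```python
# Minimal sketch: hold out whole condition groups (here, equipment IDs)
# so validation measures performance on unseen conditions.
# Features, labels, and group IDs are hypothetical placeholders.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_samples = 200
X = rng.normal(size=(n_samples, 16))               # image features (placeholder)
y = rng.integers(0, 2, size=n_samples)             # 0 = OK, 1 = NG
equipment_id = rng.integers(0, 5, size=n_samples)  # which line produced each image

# A purely random split leaks every equipment's conditions into both
# sets; grouping keeps each equipment entirely on one side of the split.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, val_idx = next(splitter.split(X, y, groups=equipment_id))

print("Equipment in training:  ", sorted(set(equipment_id[train_idx])))
print("Equipment in validation:", sorted(set(equipment_id[val_idx])))
```

If the held-out equipment scores much worse than a random split would suggest, PoC accuracy is unlikely to survive contact with new lines or conditions.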

5. Key Points to Prevent Failures in Data

The unique on-site "fluctuations," the difficulty of data acquisition, and the various challenges in the annotation process described so far should not be solved case by case within annotation alone. What matters is designing, at an upstream stage, a framework for how data will be defined and handled throughout the AI development effort.
Here we organize the key points of that upstream design. The goal is not to patch individual annotation failures as they occur, but to build a foundation that makes such failures less likely in the first place.

●Solidify Data Definitions Upstream
The quality of data in AI development is strongly shaped by upstream definitions: on-site judgment criteria, NG (No Good) decision conditions, and label structures. The criteria variations and label ambiguities that surface during annotation are often caused by these definitions never having been pinned down upstream.
The main points to solidify upstream are the following three:
・Summarize the judgment criteria used on-site so that they can also be handled by the AI development team
・Design the class granularity and hierarchical structure taking into account both the on-site standards and the AI model
・Decide in advance the extent to which on-site specific "variations" such as equipment differences and environmental differences will be included in the data

If annotation proceeds while these points remain ambiguous, rework in later stages and gaps between the PoC and production follow.
Conversely, once the data definitions are solidified upstream, the alignment of judgment criteria, class design, and validation data design all progress naturally, greatly reducing data-related failures. Such definitions can also be captured in machine-readable form, as sketched below.
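The sketch below shows one hypothetical shape for such an upstream data-definition spec, written in Python so that annotation and validation tooling can read it. Every class name, rule, and condition is an illustrative assumption:

```python
# Minimal sketch of an upstream data-definition spec shared between the
# manufacturing site and the AI team. All values are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class NGCriterion:
    defect_class: str
    rule: str  # the on-site judgment rule, written out explicitly

@dataclass(frozen=True)
class DataDefinition:
    version: str
    classes: tuple             # agreed label granularity
    ng_criteria: tuple
    covered_conditions: tuple  # variations the dataset must include

SPEC_V1 = DataDefinition(
    version="1.0",
    classes=("scratch", "dent", "foreign_object", "discoloration"),
    ng_criteria=(
        NGCriterion("scratch", "NG if length exceeds 2 mm on a visible surface"),
        NGCriterion("dent", "NG if area exceeds 1 mm^2"),
    ),
    covered_conditions=("line_A", "line_B", "daylight", "night_shift_lighting"),
)

# Tooling that reads SPEC_V1 keeps criteria, class granularity, and
# required condition coverage consistent across annotation and validation.
print(f"Spec v{SPEC_V1.version}: {len(SPEC_V1.classes)} classes, "
      f"{len(SPEC_V1.covered_conditions)} required conditions")
```

Versioning the spec ("1.0" here) also gives the team a concrete artifact to review with the site before annotation begins.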

●Design the collaborative process between the field and AI development from the initial stage
Data-related failures in AI development often stem from insufficient sharing of the NG (No Good) judgment criteria used in the field and of the condition changes that can occur on site. If a PoC is conducted without aligning on these, evaluation tends to happen under limited conditions, and the resulting accuracy cannot be reproduced in actual operation.
To avoid this gap, it is important for the field and development teams to collaborate from the early stages, aligning the basis for NG judgments and changes in field conditions as shared assumptions. Designing the PoC not only as an accuracy check but also as a process to verify consistency with the field environment helps reduce rework in later stages and facilitates smoother transition to actual operation.

●Establish an Operational System to Maintain Data Quality
Even if the data definitions are solidified upstream, if they are not consistently maintained in operation, variations in judgment criteria among workers will arise over time, leading to a decline in data quality. Therefore, when proceeding with annotation work, it is important to design an operational system to preserve the data definitions.
Additionally, when creating data through a mix of outsourcing and in-house work, the division of roles must be clearly defined: data-design matters such as class design and NG criteria are handled in-house, while outsourcing is used for parts where standards and definitions are already established. Furthermore, to scale annotation with consistent quality and speed, it is essential to continuously update a quality standards document that compiles representative examples and borderline OK/NG cases. This makes it easier for outsourcing partners to reproduce the judgment criteria actually used on site, helping the entire project maintain stable data quality. A simple mechanism for monitoring this over time is sketched below.
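One simple operational mechanism is to periodically audit a sample of delivered annotations against an in-house gold-standard answer set and track the agreement rate over time. The sketch below is a hypothetical illustration of that idea; the data, sample size, and 95% threshold are all assumptions to adapt per project:

```python
# Minimal sketch: periodic audit of delivered annotations against an
# in-house gold-standard set to detect drifting judgment criteria.
# All data and the 95% threshold are hypothetical.
import random

def audit_batch(annotations, gold, sample_size=50, threshold=0.95):
    """Re-check a random sample of annotated items against gold answers.

    Returns True if the batch passes; False signals that criteria may
    have drifted and the quality standards document needs revisiting.
    """
    sample_ids = random.sample(sorted(gold), min(sample_size, len(gold)))
    matches = sum(annotations.get(item_id) == gold[item_id] for item_id in sample_ids)
    agreement = matches / len(sample_ids)
    print(f"Audit agreement: {agreement:.1%} on {len(sample_ids)} items")
    return agreement >= threshold

# Hypothetical usage: gold answers maintained in-house, annotations
# produced by an outsourced team on the same items.
gold = {f"part_{i:03d}": ("NG" if i % 7 == 0 else "OK") for i in range(100)}
annotations = dict(gold)
annotations["part_014"] = "OK"  # one judgment that drifted from gold
print("Batch passed:", audit_batch(annotations, gold))
```

Logging the agreement rate per batch turns "maintain data quality" from a slogan into a trend the project can actually watch.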

6. Summary

There are various reasons AI development in manufacturing fails, including goal setting, decision-making, and shortages of human resources. Failures related to data, however, tend to have wide-reaching effects and often determine whether actual operation succeeds. This article has focused on organizing these data-related aspects, which are among the most influential failure patterns.

Of course, beyond data, securing personnel who can work with AI, building a system that can sustain operations, and reaching organizational consensus on KPIs and ROI are also indispensable. Even so, a carefully laid foundation of data quality is the base that supports accuracy and reproducibility in any AI project. Through such steady standard-setting and the design of collaborative processes, AI development in manufacturing can break free of the failure patterns and achieve reproducible operation.

7. Human Science Training Data Creation and LLM RAG Data Structuring Outsourcing Services

Over 48 million pieces of training data created

At Human Science, we have been involved in AI model development projects across many industries, beginning with natural language processing and extending to medical support, automotive, IT, manufacturing, and construction. Through direct transactions with many companies, including GAFAM, we have delivered over 48 million items of high-quality training data. Regardless of industry, we handle training data creation, data labeling, and data structuring at any scale, from small projects to long-term engagements with teams of 150 annotators.

Resource management without crowdsourcing

At Human Science, we do not use crowdsourcing; projects are handled by personnel under direct contract with us. Based on a solid understanding of each member's practical experience and evaluations from previous projects, we form teams that can deliver maximum performance.

Generative AI LLM Dataset Creation and Structuring, Also Supporting "Manual Creation and Maintenance Optimized for AI"

Manual creation has been our core business since our founding, and today we also create documents optimized for AI recognition to support the introduction of generative AI for corporate knowledge utilization. When sharing and using corporate knowledge and documents through generative AI, current technology still cannot reach 100% accuracy with tools alone. For customers who want to make the most of their existing document assets, we also provide document data structuring, offering optimal solutions built on our deep familiarity with many document types.

Secure rooms available on-site

Our Shinjuku office is equipped with secure rooms that meet ISMS standards, so we can guarantee security even for projects involving highly confidential data. We regard confidentiality as extremely important in every project. Our information security management system has also been highly rated by clients for remote work, because in addition to hardware measures we provide our personnel with continuous security training.

In-house Support

We provide staffing services for experienced annotators and project managers tailored to your tasks and situation, and can also organize a team stationed at your site. In addition, we support the training of your operators and project managers, help you select tools suited to your circumstances, and help build optimal processes, such as automation and working methods, to improve quality and productivity. We are here to support your challenges in annotation and data labeling.

 

 

 
