
In the field of AI development and data creation for machine learning, it is common practice to decide on the annotation label definitions (annotation specifications) before proceeding with the work. However, in actual projects, it is not uncommon for these definitions to be reviewed and revised midway.
The issue is how to handle the "existing data" for which annotation work has already been completed at that time. If handled incorrectly, inconsistencies in the data can lead to poor accuracy and it can take time to identify the cause.
This article organizes common problems that occur when label definitions are changed and how to properly handle existing data.
- 1. Why Label Definitions Change Midway
- 2. Problems Caused by Label Changes
- 3. Common Failure Patterns
- 4. How to Handle Existing Data
- 5. How to Make Decisions in Practice
- 6. Recommended Operational Flow
- 7. Strategies to Minimize Label Changes
- 8. Summary
- 9. Human Science Training Data Creation, LLM RAG Data Structuring Outsourcing Service
1. Why Label Definitions Change Midway
Changes to label definitions are not a sign of project failure but rather a common adjustment in practical work. Especially in the early stages of model development, such as PoCs, definitions are often created based on hypotheses and are frequently improved while working with actual data.
For example, when you actually start annotation, you may find more ambiguous cases than expected, leading to inconsistencies in judgment among annotators. Additionally, if a large amount of data does not fit the assumed definitions, it may become necessary to review the rules themselves.
In short, label definitions usually change because the team's understanding of the data deepens as work progresses and because some cases could not be anticipated at the outset. To that extent, such changes are largely unavoidable.
2. Problems Caused by Label Changes
When label definitions change, there is a possibility that the standards between previously created data and newly created data will become misaligned. This is the biggest issue.
For example, if content that was previously grouped under a single label is later split into two, past data cannot directly conform to the new standards. Conversely, even when multiple labels are merged into one, if old and new rules coexist, it negatively affects the accuracy of training and analysis.
As a result, data that should have the same meaning may be labeled differently, and the overall consistency of the dataset can be lost. Furthermore, if it becomes unclear at which point in time the data was created according to which definition, the very basis for evaluation and improvement collapses.
A particular point of caution is the case where old data is mixed and used as is even after label changes. At first glance, it may seem like the amount of data has increased, but in reality, the variation in quality increases, causing the model's behavior to become unstable.
3. Common Failure Patterns
When label definitions change on an annotation project, prioritizing work speed can lead to responses that seem fine at the moment but cause problems later on.
A typical example is proceeding only with the additions under the new rules while continuing to use the existing data as is. In this state, old and new rules coexist within the same dataset, making it extremely difficult to organize later.
There are also quite a few cases where the changes are not documented and are shared only verbally or via chat. In such cases, when the person in charge changes, the background becomes unclear, significantly undermining the reliability of the data.
4. How to Handle Existing Data
When label definition changes occur, how existing data is handled greatly affects subsequent accuracy and operational efficiency. Here, we introduce three typical approaches.
4-1. Method to Unify Everything Under the New Definition
This method unifies everything, including existing data, under the new label definitions. Past data is reviewed according to the new standards, and re-annotation is performed as needed.
Although it incurs costs, it ensures data consistency, making it the most stable approach for long-term operation.
4-2. Method to Manage New and Old Data Separately
This method involves managing data with the old definitions and data with the new definitions separately. The old data is limited to reference purposes such as accuracy comparison and trend verification with the past, while only the new data is used for training and evaluation.
In this case, it is important to maintain a state where it is clearly identifiable which data is based on which definition.
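One way to keep this identifiable is to store the definition version on every record. The following is a minimal sketch under that assumption; the `Record` class, the `spec_version` field, and the version names "v1" and "v2" are hypothetical and not taken from any particular annotation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    item_id: str
    label: str
    spec_version: str  # hypothetical tag: which label definition was used


def select_for_training(records, current_version):
    """Keep only records annotated under the current definition."""
    return [r for r in records if r.spec_version == current_version]


def split_by_version(records):
    """Group records by definition version so old data stays available for reference."""
    groups = {}
    for r in records:
        groups.setdefault(r.spec_version, []).append(r)
    return groups


records = [
    Record("img-001", "scratch", "v1"),
    Record("img-002", "deep scratch", "v2"),
    Record("img-003", "shallow scratch", "v2"),
]
training = select_for_training(records, "v2")
by_version = split_by_version(records)
```

With the tag in place, old-definition data can still be loaded for comparison, but it never enters training or evaluation by accident.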
4-3. Method to Handle with Mapping Rules
This method defines the correspondence between old labels and new labels and converts (maps) them.
For example, in cases where "deep scratch" and "shallow scratch" are integrated into "scratch," simple label replacement can be done using batch replace functions in text editors and the like.
However, this method cannot be used for changes that split one label into multiple labels (such as splitting "wound" into "deep wound" and "shallow wound"). In such cases, one of the two previously mentioned methods must be used.
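The difference between a merge and a split can be sketched in a few lines. This is an illustrative example only: the function name `convert_labels`, the label set, and the "dent" label are assumptions added for the demonstration. Labels that have no valid counterpart under the new definition are not guessed; they are flagged for re-annotation.

```python
# Hypothetical mapping for a merge: two old labels become one new label.
MERGE_MAP = {"deep scratch": "scratch", "shallow scratch": "scratch"}

# Hypothetical label set under the new definition. "wound" no longer exists
# because it was split into two labels.
NEW_LABELS = {"scratch", "dent", "deep wound", "shallow wound"}


def convert_labels(old_labels, mapping, new_label_set):
    """Convert old labels with a mapping table.

    Returns the converted list (None where no conversion is possible)
    and the indexes that need manual re-annotation.
    """
    converted, needs_review = [], []
    for index, label in enumerate(old_labels):
        new_label = mapping.get(label, label)  # unchanged labels pass through
        if new_label in new_label_set:
            converted.append(new_label)
        else:
            converted.append(None)
            needs_review.append(index)
    return converted, needs_review


old = ["deep scratch", "dent", "shallow scratch", "wound"]
converted, needs_review = convert_labels(old, MERGE_MAP, NEW_LABELS)
```

Here the two scratch labels convert automatically, while the "wound" item is returned in `needs_review` because a mapping table cannot decide which of the two new labels applies.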
5. How to Make Decisions in Practice
Which method to choose depends on the content of the label changes and the purpose of the project.
For example, if the change is to merge labels A and B into one, it is possible to convert and utilize the existing data. On the other hand, if the change involves splitting the contents of label A into multiple parts, splitting work will naturally be required, resulting in additional labor and costs.
Also, the appropriate response varies depending on the operational policy, such as whether existing data will continue to be used for training or will be replaced with new data. Such decisions need to be made not only considering immediate labor but also from the perspective of how extensively the data will be utilized in the future.
6. Recommended Operational Flow
When a change in label definitions occurs, first clearly document the details of the change and organize the differences between the old and new definitions as well as the reasons for the change. Next, identify which data will be affected and understand, for example, to what extent past training data and evaluation data will be impacted.
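A rough impact estimate can be obtained by counting how many records carry the labels being changed. The sketch below assumes each dataset is available as a simple list of labels; the function name `count_affected` and the sample data are hypothetical.

```python
from collections import Counter


def count_affected(datasets, changed_labels):
    """Return {dataset name: (affected records, total records)}."""
    summary = {}
    for name, labels in datasets.items():
        counts = Counter(labels)
        # Counter returns 0 for labels that do not occur in this dataset.
        affected = sum(counts[label] for label in changed_labels)
        summary[name] = (affected, len(labels))
    return summary


datasets = {
    "train": ["wound", "dent", "wound", "dent"],
    "eval": ["dent", "wound"],
}
impact = count_affected(datasets, {"wound"})
```

In this example half of each dataset is affected, which is the kind of figure that helps decide between re-annotation, separate management, and mapping.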
Based on that, decide on a policy: re-annotate and unify the existing data, manage it separately, or handle it through mapping. Here, "unify" means bringing everything, including the existing data, into line with the new definition by reviewing it and re-annotating where needed.
Once the policy is decided, do not apply it to the entire dataset all at once. Instead, perform re-annotation or conversion on a portion of the data, then train and evaluate a model on that portion. Checking quality and workload at this stage helps prevent major rework.
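Selecting the pilot portion can be as simple as a reproducible random sample. The following sketch assumes items are identified by ID; the function name `draw_pilot_sample`, the 10% fraction, and the ID format are illustrative choices, not recommendations from the article.

```python
import random


def draw_pilot_sample(item_ids, fraction, seed=0):
    """Pick a reproducible random subset of items for a pilot run."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    items = list(item_ids)
    if not items:
        return []
    k = max(1, round(len(items) * fraction))
    rng = random.Random(seed)  # fixed seed so the same pilot set can be redrawn
    return sorted(rng.sample(items, k))


all_ids = [f"item-{i:03d}" for i in range(100)]
pilot = draw_pilot_sample(all_ids, fraction=0.1, seed=42)
```

Fixing the seed means the same pilot set can be regenerated later, for example when comparing the workload estimate with the actual effort.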
It is also important to manage the change history and version information of label definitions so that the background can be traced later.
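A change history does not need special tooling; a small structured log is enough to make the background traceable. The sketch below is one possible shape. The `SpecChange` fields, the version names, the dates, and the reasons are all hypothetical examples.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SpecChange:
    version: str
    effective: date  # first day the definition applies
    summary: str     # what changed
    reason: str      # why it changed


# Hypothetical entries for illustration only.
CHANGELOG = [
    SpecChange("v1", date(2024, 1, 10), "Initial definitions", "Project start"),
    SpecChange(
        "v2",
        date(2024, 3, 1),
        "Split 'wound' into 'deep wound' and 'shallow wound'",
        "Annotators disagreed on severity",
    ),
]


def version_in_effect(changelog, on_date):
    """Return the definition version that applied on a given date, or None."""
    applicable = [c for c in changelog if c.effective <= on_date]
    if not applicable:
        return None
    return max(applicable, key=lambda c: c.effective).version


v_feb = version_in_effect(CHANGELOG, date(2024, 2, 1))
```

Given an annotation date, `version_in_effect` answers the question "which definition was this data created under?", which otherwise tends to be lost when the person in charge changes.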
7. Strategies to Minimize Label Changes
Completely eliminating label changes is difficult, but by carefully designing at the initial stage, it is possible to minimize their impact.
For example, by trying annotation in advance on a small amount of sample data and identifying cases where judgments differ, major specification changes can be prevented. Additionally, including concrete examples and edge cases in the specification document to clarify points that are likely to cause confusion in judgment beforehand is effective.
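One common way to quantify where judgments differ is to have two annotators label the same sample and compare the results. The sketch below computes Cohen's kappa, a standard chance-corrected agreement measure that the article itself does not name, and lists the items on which the annotators disagreed. The sample labels and item IDs are made up for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from each annotator's label frequencies.
    all_labels = set(labels_a) | set(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in all_labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


def disagreements(item_ids, labels_a, labels_b):
    """List (item, label A, label B) for every item labeled differently."""
    return [(i, x, y) for i, x, y in zip(item_ids, labels_a, labels_b) if x != y]


ids = ["s1", "s2", "s3", "s4"]
annotator_a = ["scratch", "scratch", "dent", "dent"]
annotator_b = ["scratch", "dent", "dent", "dent"]
kappa = cohen_kappa(annotator_a, annotator_b)
diff = disagreements(ids, annotator_a, annotator_b)
```

The disagreement list is often more useful than the score itself, because the disputed items are natural candidates for the concrete examples and edge cases to add to the specification document.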
8. Summary
Changes to label definitions are unavoidable in annotation work. What is important is not the change itself, but how to design the handling of data afterward.
A mixed state of data before and after changes not only degrades data quality but also significantly impairs the overall efficiency of AI development. When label definition changes occur, it is crucial to address them comprehensively, including organizing the changes and designing operational rules.
If you have concerns about reviewing label definitions or handling existing data, organizing these matters early on can prevent major setbacks later.
Caution is also necessary when outsourcing annotation work to external vendors. As we have seen, changing label definitions is not simply a matter of "just changing the definitions." It requires proper management of the changed data and communication with everyone involved, including sharing the details of the change with the client and notifying annotators. If a vendor cannot handle these tasks properly, the definition change may not go smoothly, and valuable data could be wasted. It is important to select a vendor who can respond appropriately to definition changes.
9. Human Science Training Data Creation, LLM RAG Data Structuring Outsourcing Service
Over 48 million pieces of training data created
At Human Science, we are involved in AI model development projects, centered on natural language processing, across a range of industries including medical support, automotive, IT, manufacturing, and construction. Through direct transactions with many companies, including GAFAM, we have provided over 48 million pieces of high-quality training data. With a team of 150 annotators, we handle training data creation, data labeling, and data structuring for projects of any industry, from small-scale work to long-term large projects.
Resource management without crowdsourcing
At Human Science, we do not use crowdsourcing. Instead, projects are handled by personnel who are contracted with us directly. Based on a solid understanding of each member's practical experience and their evaluations from previous projects, we form teams that can deliver maximum performance.
Generative AI LLM Dataset Creation and Structuring, Also Supporting "Manual Creation and Maintenance Optimized for AI"
Since our founding, our main business has been the creation of manuals. Today we also support the creation of documents optimized for AI recognition, to facilitate the introduction of generative AI for corporate knowledge utilization. When sharing and using corporate knowledge and documents with generative AI, current technology cannot yet achieve 100% accuracy with tools alone. For customers who want to make full use of their past document assets, we also provide document data structuring. We offer solutions that draw on our expertise and deep familiarity with many types of documents.
Secure room available on-site
Our Shinjuku office has secure rooms that meet ISMS standards, so we can guarantee security even for projects that include highly confidential data. We consider the preservation of confidentiality to be extremely important for all projects. For remote work as well, our information security management system has received high praise from clients, because we not only implement hardware measures but also continuously provide security training to our personnel.
In-house Support
We provide staffing services for annotation-experienced personnel and project managers tailored to your tasks and situation. It is also possible to organize a team stationed at your site. Additionally, we support the training of your operators and project managers, assist in selecting tools suited to your circumstances, and help build optimal processes such as automation and work methods to improve quality and productivity. We are here to support your challenges related to annotation and data labeling.
