
 

How to Prevent Individual Dependency in Annotation? Characteristic Causes and Countermeasures at In-House Annotation Sites


6/18/2025


Table of Contents

1. Characteristics of Annotation Work and the Impact of Individual Dependency
2. Distinctive Causes of Individual Dependency in Annotation Work
3. Practical Countermeasures Against Individual Dependency
4. Utilizing Outsourcing and External Partners
5. Features of Human Science Annotation Services

1. Characteristics of Annotation Work and the Impact of Individual Dependency

Annotation work means labeling large volumes of data to create training data for AI, and the quality of that work has a major effect on the accuracy of the resulting AI.
We have covered what annotation work involves in detail in previous blog articles, so we will not repeat it here. One major factor that significantly affects quality, however, is individual dependency: work methods and know-how that exist only in the heads of specific people.

This is not unique to annotation: when work methods and know-how depend on specific individuals, the consequences go well beyond variation in quality and productivity. Annotation, however, still relies heavily on manual work even though automation has advanced considerably, so it is a task that becomes person-dependent easily. It is also characteristic of annotation that edge cases and exceptions not covered, or not fully describable, in the specifications occur frequently. Unless steps are taken to share information and agree on how such cases are handled, individual dependency only deepens.

What makes annotation troublesome is that quality can vary considerably even when a single person completes all of the work, and the problem only grows when the work is scaled out to multiple workers or handed over to other members. It is therefore worth taking measures against individual dependency, even if they feel like extra effort. In this blog, we discuss the causes of individual dependency in annotation and the countermeasures against it.

Related Blogs: 7 Tips for Successful Annotation
What is Annotation? Explanation from Its Meaning to Its Relationship with AI and Machine Learning.

2. Distinctive Causes of Individual Dependency in Annotation Work

There are many possible causes of individual dependency, but the major ones include the following.

(1) Training relies entirely on OJT, and documents such as work procedures and manuals are not well organized.
(2) Management methods depend on supervisors and annotation project managers.
(3) Quality standards and judgment criteria are not documented or shared among workers.
(4) Methods for handling edge cases and exceptions are not recorded or maintained.

Points (1) and (2) apply equally to work other than annotation, so we will not dwell on them here. What is distinctive about annotation, and what most readily turns into tacit knowledge held by specific individuals, is the set of judgment criteria described in (3) and (4): the basis for deciding which labels to assign, and under what conditions.

As we have mentioned in previous blogs and in the sections above, annotation work involves many exceptions and edge cases, which is precisely why it still depends so heavily on human judgment. Even when quality standards and judgment criteria are documented in procedures and manuals in advance, exceptional data and cases that nobody anticipated and that do not fit the criteria inevitably appear as the annotation progresses. Trying to head this off by reviewing all the data up front to identify every exception and edge case and decide how to handle it is neither practical nor realistically possible.

When several people work together, questions naturally arise from the workers as the work progresses, which creates opportunities to notice exceptions and edge cases. Special caution is needed, however, when a single engineer performs the annotation alone. Engineers responsible for development usually have a great deal of prerequisite knowledge, such as the purpose of the AI being developed and domain knowledge of the target field. Because they work from that knowledge, they often feel no particular doubt when they hit an edge case or exception, and they unconsciously draw on that tacit knowledge to handle it in whatever way seems "appropriate." As a result, the work can be completed without anyone noticing that the labeling deviates from the specification or contains contradictions. This is a story we hear very often.

In addition, annotation means repeating similar judgments over a large volume of data. Even when you believe your judgment criteria are consistent and stable, perception dulls over time, and it is common for annotations made at the beginning to differ noticeably from those made at the end.
Later, when the volume grows and the work is delegated to others, questions start coming in from the new workers. While answering them, it often becomes clear that the judgment criteria drifted within the data that was already completed, forcing a review of that past data. This happens frequently.
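One lightweight way to catch this kind of drift early is to compare the label distribution of an early batch of annotations with that of a recent batch. The sketch below is a minimal example rather than part of any particular tool: it assumes annotations are available as simple (item_id, label) pairs, and the 10% threshold is an illustrative assumption you would tune to your own data.

```python
from collections import Counter

def label_distribution(records):
    """Return each label's share of a batch as a fraction of the batch size."""
    counts = Counter(label for _, label in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def report_drift(early_batch, recent_batch, threshold=0.10):
    """Print labels whose share changed by more than `threshold` between batches."""
    early = label_distribution(early_batch)
    recent = label_distribution(recent_batch)
    for label in sorted(set(early) | set(recent)):
        diff = recent.get(label, 0.0) - early.get(label, 0.0)
        if abs(diff) > threshold:
            print(f"label '{label}': share changed {diff:+.1%} "
                  f"(early {early.get(label, 0.0):.1%} -> recent {recent.get(label, 0.0):.1%})")

# Hypothetical example: the 'defect' label became far more frequent in the recent batch.
early = [("img_001", "ok"), ("img_002", "ok"), ("img_003", "defect"), ("img_004", "ok")]
recent = [("img_101", "defect"), ("img_102", "defect"), ("img_103", "ok"), ("img_104", "defect")]
report_drift(early, recent)
```

A shift flagged this way does not prove that the criteria changed, but it is a cheap prompt to re-read the guidelines before the inconsistency spreads further.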

Related Blogs: Essential Mindset and Tips for Annotation Work
Tips for In-House Creation of Training Data

3. Practical Countermeasures Against Individual Dependency

Countermeasures for factors other than the judgment criteria are much the same as for any other kind of work, so they are easy enough to infer and we will not cover them here. But how can we prevent individual dependency around the exceptions and edge cases discussed so far?

Given the factors explained above, you can probably already guess the answer. When an engineer works alone, it is important to establish at least the quality standards and judgment criteria at the start of the work, and then, as the work progresses and however tedious it may feel, to verbalize and document how exceptions and edge cases were handled, turning tacit knowledge into explicit knowledge.

For judgments made unconsciously, it also helps to pause after a certain amount of work and review the quality standards and judgment criteria as if you were a different worker seeing them for the first time. That said, documenting every single response to edge cases and exceptions is not easy and takes considerable time, and since there is no guarantee that multiple people will ever work on the task, that effort could end up wasted.

Annotation tasks can produce a very large number of exceptions and edge cases. If every single case is written up, the documentation becomes enormous, it is doubtful that other workers will read and absorb all of it, and finding the relevant entry becomes difficult. A more effective approach is to wait until a certain amount of work has been done, classify the edge cases and exceptions into patterns, and then verbalize and document the decision guidelines for each pattern as explicit knowledge.
Working through such cases also tends to surface inconsistencies in the standards manual itself, which is another reason that documenting how edge cases and exceptions were handled is worthwhile.
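As one hedged illustration of what such documentation can look like, the sketch below records each edge-case decision as a small structured entry with a pattern category, rather than as free-form notes. The field names, categories, and example content are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
import json

@dataclass
class EdgeCaseRecord:
    """One documented decision about an exception or edge case."""
    example_id: str   # identifier of the data item that raised the question
    category: str     # rough pattern, e.g. "occlusion" or "ambiguous boundary"
    situation: str    # what made the item hard to judge
    decision: str     # the label or handling that was chosen
    rationale: str    # why, so later readers can check consistency
    decided_on: str = field(default_factory=lambda: date.today().isoformat())

records = [
    EdgeCaseRecord(
        example_id="img_0123",
        category="occlusion",
        situation="Pedestrian more than half hidden behind a parked car.",
        decision="Label only the visible part as 'pedestrian'; do not extrapolate the box.",
        rationale="The spec covers occlusion only up to 50%; we agreed to extend the same rule.",
    ),
]

# Keeping the log as JSON (or a shared spreadsheet) keeps it searchable and reviewable later.
print(json.dumps([asdict(r) for r in records], indent=2, ensure_ascii=False))
```

Because each entry carries a category, the log can later be grouped by pattern and folded back into the guidelines instead of having to be read end to end.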

When multiple people work together, it is important that the documents recording how exceptions and edge cases were handled are shared among all workers. Preparing a Q&A spreadsheet in which workers write down their questions and administrators record the corresponding decisions and answers is also an effective method. As the entries grow, organizing them by category or pattern so that they remain searchable becomes essential. Note also that workers often do not read the answers to questions posed by others, so adding a "check column" to confirm that every worker has reviewed every question and answer is an important measure. It keeps the documentation serving its original purpose, information sharing, which in turn stabilizes quality.
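As a minimal sketch of that check-column idea, the snippet below scans a Q&A log and lists, for each worker, the entries they have not yet marked as reviewed. The CSV layout and column names are assumptions about how such a spreadsheet might be exported, not a fixed format.

```python
import csv
import io

# Hypothetical CSV export of the Q&A sheet: one row per entry, one check column per worker.
QA_CSV = """\
id,question,answer,checked_by_alice,checked_by_bob
1,How to label a half-occluded pedestrian?,Label only the visible part.,yes,yes
2,Are reflections in windows labeled?,No; treat them as background.,yes,
3,How to handle blurred frames?,Skip and flag for review.,,yes
"""

def unacknowledged_entries(csv_text):
    """Return {worker: [ids of Q&A entries the worker has not marked as checked]}."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    workers = [c.removeprefix("checked_by_") for c in rows[0] if c.startswith("checked_by_")]
    pending = {w: [] for w in workers}
    for row in rows:
        for w in workers:
            if row[f"checked_by_{w}"].strip().lower() != "yes":
                pending[w].append(row["id"])
    return pending

print(unacknowledged_entries(QA_CSV))
# -> {'alice': ['3'], 'bob': ['2']}
```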

Related Blog: How to Deal with Edge Cases Not Covered by Specifications

4. Utilizing Outsourcing and External Partners

Many annotation vendors have the know-how to run projects without letting the work become dependent on specific individuals. For customers who are considering performing annotation in-house, however, there is an understandable concern that outsourcing or delegating the work will keep annotation know-how from accumulating internally.

Many customers think about bringing in other internal workers or an outsourcing vendor only once the annotation volume scales up, but even when one person handles a small volume during a phase such as a PoC, the individual-dependency issues described so far still need to be addressed, and annotation expertise is still required.

In early stages such as a PoC in particular, neglecting to establish and maintain quality standards and judgment criteria, and failing to document tacit knowledge as explicit knowledge, leads to inconsistent criteria and uneven quality. The result may be that the expected AI accuracy is never reached, or that problems which went unnoticed during the PoC surface only once outsourcing or delegation begins: the vendor's many questions expose contradictions in the standards documents and inconsistencies in the data the client had already processed. We have seen many customers run into exactly this situation.

When conducting annotation in-house, one approach is to first ask an external vendor for support in preventing individual dependency. This does not require a large budget or a full consulting engagement: some vendors offer free, informal consultations, so it can be worth reaching out even if your questions are varied but small. Even without placing a formal order, building a relationship with a vendor you can casually consult on such matters is a good approach in itself. Human Science also supports in-house annotation efforts in this way, so please feel free to contact us at any time.

Related Blogs: How to Outsource Annotation Work? 7 Tips
The Unexpected Difficulty of Annotation: Tips for Choosing an Annotation Service Company Based on Difficulty
Management of Human Science Annotation Work: Taking the Long Way to Ensure Quality is the Shortcut

5. Features of Human Science Annotation Services

Over 48 million pieces of training data created

At Human Science, we are involved in AI model development projects across a wide range of fields, including natural language processing, medical support, automotive, IT, manufacturing, and construction. Through direct transactions with many companies, including GAFAM, we have delivered over 48 million items of high-quality training data. We handle training data creation, data labeling, and data structuring regardless of industry, from small projects to long-term, large-scale projects run by teams of 150 annotators.

Resource management without crowdsourcing

At Human Science, we do not use crowdsourcing. Instead, projects are handled by personnel who are contracted with us directly. Based on a solid understanding of each member's practical experience and their evaluations from previous projects, we form teams that can deliver maximum performance.

Support not only for training data creation but also for generative AI / LLM dataset creation and structuring

In addition to creating annotated (labeled) training data, we also support the structuring of document data for generative AI and LLM RAG construction. The production of manuals and other documentation has been a core business of ours since our founding, and we leverage the know-how and deep familiarity with diverse document structures gained there to provide optimal solutions.

Secure room available on-site

Human Science maintains secure rooms that meet ISMS standards within our Shinjuku office, so we can ensure security even for projects involving highly confidential data. We regard confidentiality as critically important on every project. For remote work as well, our information security management system has been highly rated by clients, because we not only implement hardware measures but also provide continuous security training to our personnel.

In-house Support

We provide staffing services for annotation-experienced personnel and project managers tailored to your tasks and situation. It is also possible to organize a team stationed at your site. Additionally, we support the training of your operators and project managers, assist in selecting tools suited to your circumstances, and help build optimal processes such as automation and work methods to improve quality and productivity. We are here to support your challenges related to annotation and data labeling.

 

 

 
