
 


[Spin-off] How to deal with edge cases that cannot be covered in the specification document ~ Overcoming edge cases that cause hesitation in data annotation ~




Spin-off blog project: "Data Annotation: Supporting AI in the DX Era. The Realities of the Analog Field"

Our company has published many blog posts about data annotation and AI, mostly sharing general knowledge and know-how. Data annotation sounds simple when described in words, but it is work that cannot be done without people, and it contains a great deal of ambiguity. As a result, it involves a lot of interaction between the people doing it, and ensuring quality and productivity takes experience and know-how that tidy theory alone cannot supply.

 

For that reason, we believe that knowing the specific problems that arise in real annotation work, and how they are solved, can serve as a useful guide to succeeding at data annotation.

 

What actually happens at our company, and what specific responses and measures do we take? Unlike our regular blog posts, this spin-off series, titled "Data Annotation: Supporting AI in the DX Era. The Realities of the Analog Field", shares the realities of the field, including our own distinctive practices and commitments.

 


1. Edge cases always occur

Anyone with experience in data annotation has probably run into "edge cases". In image annotation, an edge case is an object for which it is hard to decide whether it should be enclosed as a target at all and, if so, how it should be enclosed. Such cases are not spelled out as criteria or exceptions in the specification or manual, yet they cannot be ignored.

 

I ran into them myself when I worked as a data annotator. When I asked the annotators working alongside me, the answer was often "Hmm... I might do it this way, but I'm not sure" or "I feel like this one doesn't need to be annotated." We were supposed to work at a pace of one image per minute, but often 10 minutes had passed before we noticed.

 

Because data annotation involves so much ambiguity, edge cases are bound to occur, and even relatively easy annotation work has its moments of hesitation. If such cases are annotated by intuition, what was white yesterday may become black today, or annotator A may choose black while annotator B chooses white and annotator C cannot decide. The AI then learns from inconsistent labels, which affects its accuracy.

"It may be going too far to say that whoever masters edge cases masters data annotation, but handling them is without a doubt one of the most time-consuming parts of daily work."

 

2. Create a list of questions

When no conclusion can be reached even after consulting the manual, the data annotators and the PM discuss the edge case with the client. Once a decision is made, it is communicated to the data annotators. The important thing is to make sure that the decision does not get buried along the way.

 

When data annotators work remotely, most questions are asked through chat tools. As you may know, question threads in a chat tool gradually scroll away and get buried under newer messages. Then someone thinks, "Huh? I'm sure A-san asked something similar before, but where was it?" and wastes time searching or scrolling back. On site, meanwhile, a purely verbal exchange such as "Mr. Kitada! Is this XXX?" "Yes! It's XXX!" leaves no record at all. (In this case, it would be good if PM Kitada kept a record...)

 

To avoid this waste and loss of information, questions and answers must not be left as information that exists only in the moment. So that similar questions can be found and referred to quickly, we prepare supplementary materials such as an edge case collection, and we record and share the questions and answers in a spreadsheet called the "Question List".
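
As a rough illustration of why a single list is easier to consult than a chat history, the following Python sketch models a question list as a small searchable collection. The column names (`question`, `answer`, `asked_by`, `tags`) and the sample rows are hypothetical and are not taken from our actual spreadsheet; it is a minimal sketch, not a description of the real tool.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionEntry:
    """One row of a hypothetical question list (column names are assumptions)."""
    question: str
    answer: str
    asked_by: str
    tags: list[str] = field(default_factory=list)


def search(entries: list[QuestionEntry], keyword: str) -> list[QuestionEntry]:
    """Return entries whose question, answer, or tags contain the keyword."""
    needle = keyword.lower()
    return [
        e for e in entries
        if needle in e.question.lower()
        or needle in e.answer.lower()
        or any(needle in t.lower() for t in e.tags)
    ]


# Invented sample rows, for illustration only.
question_list = [
    QuestionEntry(
        question="Should a car that is half hidden behind a truck be enclosed?",
        answer="Yes, enclose only the visible part.",
        asked_by="A-san",
        tags=["occlusion", "car"],
    ),
    QuestionEntry(
        question="Do reflections of people in windows count as people?",
        answer="No, do not annotate reflections.",
        asked_by="B-san",
        tags=["reflection", "person"],
    ),
]

# Look up earlier decisions instead of scrolling back through chat.
hits = search(question_list, "occlusion")
for entry in hits:
    print(f"Q: {entry.question}\nA: {entry.answer}")
```

In a spreadsheet, the same lookup is simply a filter or a text search over the sheet. The point is that every decision sits in one place with a consistent set of columns.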

3. Differences from the tool's comment function

Some annotation tools have a comment function. It is a very useful feature, because it lets the checker point out specific issues directly to the data annotator when sending work back after a QA check. In most tools, however, these comments have a drawback when it comes to aggregating information: the information and knowledge end up scattered across individual annotators.

 

We also use tools with comment functions at our company, and we have seen data annotators' understanding diverge because the information was scattered. In some cases, feedback given to A-san about a particular annotation actually applied to other annotators as well. Because a comment can only be addressed to one annotator, the same edge case occurring for several annotators has to be commented on each time, which is time-consuming.

 

In short, the tool's comment function is limited to communication with individual data annotators, so important information does not reach everyone. In the end, the PM or the checkers have to pick that information up and put it into the documents or the question list.

 

A question list that gathers information in one place may look like labor-intensive, analog work. But when the goal is to deliver the necessary information to everyone without omission, it is a simple and effective tool for aggregating and managing information.
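
The step of picking up per-annotator comments and promoting them to the shared list can be sketched as follows. This assumes, hypothetically, that comments can be exported from the tool as (annotator, edge case, feedback) rows. The export format, the edge case labels, and the function names are all assumptions made for illustration, and no particular annotation tool's API is implied.

```python
from collections import defaultdict

# Hypothetical export of tool comments: (annotator, edge_case, feedback).
comments = [
    ("A-san", "occluded object", "Enclose only the visible part."),
    ("B-san", "occluded object", "Enclose only the visible part."),
    ("C-san", "blurred object", "Skip objects too blurred to identify."),
]


def group_by_edge_case(rows):
    """Map each edge case to the set of annotators who received feedback on it."""
    grouped = defaultdict(set)
    for annotator, edge_case, _feedback in rows:
        grouped[edge_case].add(annotator)
    return dict(grouped)


def shared_candidates(rows, min_annotators=2):
    """Edge cases commented on for several annotators: candidates for the shared list."""
    grouped = group_by_edge_case(rows)
    return sorted(case for case, who in grouped.items() if len(who) >= min_annotators)


print(shared_candidates(comments))
```

An edge case that has drawn the same feedback for two or more annotators is a strong sign that the decision belongs in the question list rather than in individual comments.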

4. The question list can be shared with clients

When we communicate with clients through the question list, we often hear them say, "We have gained insights that we did not anticipate at the beginning of development." These insights can also help refine the direction of the AI development. On request, we can provide the question list at delivery.


5. Summary

In data annotation, edge cases that the manual does not cover always appear, and they are a source of frustration for data annotators. Time spent working out a policy and making inquiries lowers productivity. To keep this to a minimum, our PM not only uses the question list but also identifies likely edge case patterns and resolves them early, sometimes before annotation even starts, so that the work proceeds smoothly. Doing so requires checking the data provided by the client thoroughly. It takes time, but spending a few hours here reduces the risk that the workload balloons and quality suffers once the actual annotation begins. We hope that the completed annotation data will contribute to the success of our clients' AI development.

 

Author:

Kitada Manabu

Annotation Group Project Manager

 

Since the establishment of our Data Annotation Group, Kitada has handled a wide range of work centered on natural language processing: building teams and managing large-scale projects, writing annotation specifications for PoC projects, and consulting on scaling up.
Currently, in addition to serving as project manager for image and video annotation projects, Kitada works as a seminar instructor for data annotation and takes part in promotional activities such as blogging.



 

 

 
