
[Spin-off] Interview with Our Experienced PMs on "Data Annotation Work and Its Essence" ~ "Tackle Edge Cases!" What Our PMs Have in Common ~




Spin-off blog project
"Data Annotation: Supporting AI in the DX Era. The Realities of the Analog Field"
Interview with our experienced PMs on "Data Annotation Work and Its Essence"
~ "Tackle the edge cases!" What our PMs have in common ~

Our company has published a range of blog posts about data annotation and AI, mostly sharing general knowledge and know-how. Put into words, data annotation looks simple, but it is work that cannot be done without people and that involves a great deal of "ambiguity". As a result, the people involved have to communicate with each other constantly. Ensuring quality and productivity takes experience and know-how that tidy theory alone cannot supply.

 

We therefore believe that knowing the specific problems that arise in real annotation work, and how they are solved, offers a useful guide to succeeding at data annotation.

 

What actually happens at our company, and what specific responses and measures do we take? Unlike our regular blog, this spin-off project, titled "Data Annotation: Supporting AI in the DX Era. The Realities of the Analog Field", shares what really happens in the field, including what sets us apart and what we care about.

 


1. What is Data Annotation Work?

Many articles about "data annotation" explain it in terms such as "creating the training data that teaches an AI". Our company publishes such articles too. On its own, however, that kind of general explanation may not convey the difficulties and challenges that come up in actual annotation work.

 

Related Blogs

What is an annotation? Explanation from its meaning to its relationship with AI and machine learning.

 

So in this post we shift the focus to what our PMs feel as they carry out data annotation work day to day. I asked my fellow PMs what they see as the essence and true nature of data annotation, and we shared our thoughts with one another.

 

Let's get straight to their answers.

 

The first PM put the unique difficulty of data annotation work into words.

 

What is Data Annotation?

"To create a large amount of data that meets the quality desired by customers, it is necessary to organize and manage the information used for decision-making throughout the team."

 

The reasoning behind it

Annotating large volumes of raw data regularly turns up "difficult cases" that do not fit neatly into the criteria laid out in the specifications and definitions.
By confirming each judgment through close communication with the client and organizing the resulting information into a digestible form, these "difficult cases" can be turned into "decidable cases" and shared within the team.
When the specifications are complex and labeling involves several points of judgment, or when the data itself is highly ambiguous, it is important to understand where workers struggle, from their perspective, and to organize and convey the information in a way they can readily understand.
In management, therefore, it is important to anticipate "difficult cases" as far as possible so as to understand accurately the quality the client wants, to keep refining the judgment criteria in the specifications and definitions through Q&A with the client, and to communicate those criteria clearly to each worker.
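As a rough illustration of turning "difficult cases" into "decidable cases" and sharing them with the team, the cycle could be kept in a simple log like the one sketched below. This is only a minimal sketch under our own assumptions, not a description of a tool the PMs actually use: the `EdgeCase` and `EdgeCaseLog` names, their fields, and the sample cases are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EdgeCase:
    """One "difficult case" raised by a worker (hypothetical structure)."""
    case_id: str
    description: str                    # what makes the data hard to judge
    question: str                       # what was asked of the client
    decision: Optional[str] = None      # client's answer; None until confirmed
    rule_update: Optional[str] = None   # wording added to the spec, if any

    @property
    def decidable(self) -> bool:
        # A case counts as "decidable" once the client has confirmed a judgment.
        return self.decision is not None


@dataclass
class EdgeCaseLog:
    """Team-wide record of difficult cases and their confirmed decisions."""
    cases: dict = field(default_factory=dict)

    def raise_case(self, case_id: str, description: str, question: str) -> EdgeCase:
        self.cases[case_id] = EdgeCase(case_id, description, question)
        return self.cases[case_id]

    def resolve(self, case_id: str, decision: str,
                rule_update: Optional[str] = None) -> EdgeCase:
        case = self.cases[case_id]
        case.decision = decision
        case.rule_update = rule_update
        return case

    def open_cases(self) -> list:
        # Cases still waiting for a client answer.
        return [c for c in self.cases.values() if not c.decidable]

    def shared_guidance(self) -> list:
        # One line per resolved case, ready to pass on to every worker.
        return [f"{c.case_id}: {c.description} -> {c.decision}"
                for c in self.cases.values() if c.decidable]


# Illustrative data only; these cases are invented for the example.
log = EdgeCaseLog()
log.raise_case("EC-001", "Pedestrian visible only as a reflection",
               "Should reflections be labeled as pedestrians?")
log.raise_case("EC-002", "Vehicle more than half hidden by a tree",
               "Is there a minimum visible area for labeling?")
log.resolve("EC-001", "Do not label reflections",
            rule_update="Label only physically present pedestrians.")

print(log.shared_guidance())
print([c.case_id for c in log.open_cases()])
```

Keeping the client's answer and the resulting specification wording next to the original question means a worker who meets a similar case later can see both the decision and the reason for it.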

 

The second PM answered from a perspective closer to the annotation work itself.

 

What is Data Annotation?

"The power of (logic + intuition) multiplied by the power of patience."

 

The reasoning behind it

Data annotation means deciding whether something is black or white according to the rules set out in the specifications and definitions. It calls for logical, well-grounded judgment.
However, there are many cases, known as edge cases, where it is hard to say whether something is black or white. In those cases, you need to check the criteria and, once you understand them to a reasonable degree, rely partly on intuition to make the call.
Each individual judgment may look easy, but applying accurate criteria at a steady pace over a long period takes patience and consistency. It is truly a test of endurance.
Moreover, the work goes on for days, weeks, or in some cases months, and this is what makes data annotation distinctive: the difficulty compounds, like multiplication.

 

Here is my own view.

 

What is Data Annotation?

"Catch the feeling that something is off while you work."

 

The reasoning behind it

The overall framework is set out in documents such as the specifications and definitions, but much of the data does not fit within it. These are the "edge cases". An edge case is recognized only when it actually comes up, and some are so minor that people overlook them. Many begin with a feeling of "Huh? Something seems off." That feeling is the starting point: from it, you build the logic needed to resolve the case. Through that logic, understanding is rebuilt and the quality of the annotation can be expected to improve.

2. What to Do with Edge Cases

Seen this way, the PMs have something in common: how to handle "edge cases" and "difficult cases". Carrying out annotation work means managing quality and productivity. Various factors get in the way of both, and one of the most important is how edge cases are handled.

 

If edge cases are neglected and judged ad hoc, for example by annotating a case one way yesterday and another way today, judgments lose their consistency and quality varies. When data with such variation is used as training data, the AI will misrecognize inputs or its recognition accuracy will fail to improve.
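One common way to make this kind of inconsistency visible is to have the same items labeled twice (by the same worker on different days, or by two workers) and compare the results. The sketch below computes Cohen's kappa, a standard chance-corrected agreement score, and lists the items where the two passes disagree. It is only an illustration, not a method the article attributes to our PMs; the function names, item IDs, and labels are assumptions made for the example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labeling passes of the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both passes must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label in both passes.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if labels were assigned independently at each
    # pass's own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both passes used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


def disagreements(item_ids, labels_a, labels_b):
    """IDs of items labeled differently in the two passes: edge-case candidates."""
    return [i for i, a, b in zip(item_ids, labels_a, labels_b) if a != b]


# Illustrative data only: the same eight images labeled on two different days.
ids = ["img1", "img2", "img3", "img4", "img5", "img6", "img7", "img8"]
yesterday = ["car", "car", "truck", "car", "truck", "truck", "car", "truck"]
today = ["car", "car", "truck", "truck", "truck", "car", "car", "truck"]

kappa = cohens_kappa(yesterday, today)
to_review = disagreements(ids, yesterday, today)
print(round(kappa, 2), to_review)
```

A low score alone does not say which criterion is unclear, so the list of disagreeing items is usually the more useful output: those are the cases to review and, where needed, confirm with the client.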

 

Getting stuck deliberating over edge cases is also a major drag on annotation productivity. How edge cases are handled during annotation is where a PM's skill shows, and each of our PMs devises their own methods and shares them so they can be applied to day-to-day projects.

 

For more on edge cases, please also see the following spin-off post.

 

[Spin-off] How to deal with edge cases that cannot be covered in the specification document ~ Overcoming edge cases that cause hesitation in data annotation ~

3. Summary

Our company has accumulated and shared the know-how needed to bring data annotation and machine learning projects to success, and has established methods that we apply regardless of a project's scale, difficulty, or domain. Even so, data annotation work often brings unexpected challenges. To overcome them and aim higher, our team works together on solutions and approaches. If you are responsible for an annotation project and recognize the difficulties described here, please consider consulting us when you look into outsourcing.

 

Author:

Kitada Manabu

Annotation Group Project Manager

 

Since the establishment of our Data Annotation Group, the author has handled a wide range of work with a focus on natural language processing, from team building and project management for large-scale projects to writing annotation specifications for PoC projects and consulting on scaling up.
Currently, in addition to managing image and video annotation projects, the author also teaches data annotation seminars and takes part in promotional activities such as blogging.



 

 

 
