
[Spin-off] Training data starts with good teacher development - What communication is needed in the field?


2023.9.1


Spin-off Blog Project
"Annotation Supporting AI in the DX Era: The Real Analog Site"
Training Data Starts with Good Teacher Development
~What Communication is Needed in the Field?~

Until now, our company has published various blogs on annotation and AI, mainly conveying general knowledge and know-how. Put into words, annotation work may seem simple, but because it necessarily involves humans and contains a great deal of "ambiguity," it also requires a great deal of human interaction. It is, in a sense, a gritty process where much cannot be resolved by the tidy logic found elsewhere, and ensuring quality and productivity actually takes a range of experience and know-how.
We therefore believe that knowing concretely what problems occur at actual annotation sites, and how they are handled, can offer useful hints for leading annotation successfully.
What actually happens at our sites, and what specific responses and measures do we take? Unlike our regular blogs, this spin-off blog project, "Annotation Supporting AI in the DX Era: The Real Analog Site," aims to convey the reality on the ground, including our company's unique features and particular commitments.

 

>>Past Published Blogs (Partial)

7 Tips for Successfully Leading Annotations

What is training data? An explanation from its relationship with AI, machine learning, and annotation to how to create it.

Table of Contents

1. Can everything be conveyed in specifications and work descriptions?
2. Education and Support through Communication
3. Implementation of 1-on-1s
4. Summary

1. Can everything be conveyed in specifications and work descriptions?

It is fair to say that the quality of the annotators' work determines the quality of the training data. Annotation is, of course, based on defined requirements, so those requirements must be clearly established first. Yet even when the requirements are confirmed, a specification document is prepared from them, and proper work instructions are given, significant traps remain that cannot be avoided.

 

Annotation requires no special knowledge or qualifications, and as mentioned at the beginning, it sounds simple when put into words, which leads to some misunderstandings. Consider how people distinguish dog breeds: they do it unconsciously, from past experience and intuition. Few people reason explicitly, "If this part looks like this, then it's a Chihuahua." In annotation work, there are inevitably parts where one must rely on such experience- and intuition-based human judgment. Moreover, because a large volume of data is handled, many exceptions arise that the specification document alone cannot settle. Significant traps hide here. (Checking all the data in advance and incorporating every exception into the specification is not realistic, and if, say, the characteristics of every dog breed were exhaustively detailed, the document would balloon into an impractical volume of text.)

 

No matter how carefully people work, discrepancies in judgment will inevitably occur. I have experienced such discrepancies in judgment and recognition myself while participating in various projects as an annotator. To produce high-quality training data (in Japanese, literally "teacher data"), the annotators themselves must be good "teachers," and that requires properly managing the people involved.

 

It goes without saying that people vary widely. Some tend to prioritize speed, while others become overly cautious. Some struggle with communication, such as asking questions, and each person's personality affects annotation quality. Annotation demands sustained attention to detail and goes on seemingly without end; as time passes, one's senses dull, which inevitably leads to fluctuations in judgment and careless mistakes.

 

Beyond explaining the specifications and rules, the PM can prevent many judgment errors by conveying key points and important considerations. But it is also essential to watch the workers' condition and quality until the work is complete, so that annotators can maintain quality and work smoothly. In other words, education and support for developing good "teachers" are crucial.

2. Education and Support through Communication

As mentioned earlier, understanding can be enhanced by creating clear specifications and supplementary materials and maintaining them as needed. But simply sharing documents is one-way communication and does not guarantee mutual understanding. All too often, when the results come in, we discover that the explanation was poor and was not understood, and the work must be redone from scratch, inflating costs and time. Depending on the scale and difficulty of the annotation, we have therefore implemented education and support centered on communication, tailored to the situation.

 

That said, communication takes many forms. Group meetings? Chat tools? Email? Which of these works best? In our experience, although it takes effort, the most effective form is the one-on-one meeting.

 

In annotation work, communication is often needed to resolve annotation's inherent ambiguities and to check a worker's understanding of the specifications. For this, 1-on-1 meetings are indeed effective. Complex nuances that text cannot convey can be communicated directly by speaking, and combined with screen sharing, they become even easier to understand. Above all, being able to talk face-to-face (or across a display when remote) is best. Matters that are hard to raise in group meetings can be addressed individually, and annotators can speak freely without worrying about others, making it easier for them to consult and voice opinions.

3. Implementation of 1-on-1s

This example concerns a certain natural language processing annotation project. In this project, the client reviewed each annotator individually, and anyone whose poor results continued could not stay on the project, which made it quite strict.

 

Annotator A had barely passed for several months, then finally fell below the passing score. Without intervention and recovery now, A might have no path forward. Continually scraping by suggested a shallow understanding of the specifications, and reviewing the client reviewers' feedback made it clear that A was annotating in ways that differed from what the specifications described. I felt A needed to understand the specifications thoroughly and reliably, but the chat-based Q&A we had relied on so far would not get there no matter how much time we spent. So we decided to hold a 1-on-1.

 

"You fell below the score. Shall we have a private lesson?" Right after I sent A this feedback, a personal message arrived in chat: "I messed up... Please help."

 

Strike while the iron is hot: we held the 1-on-1 promptly. We went through each piece of feedback, explaining why each item was incorrect by comparing it against the specifications. At one point A exclaimed, "Oh! So that part of the specifications is interpreted that way! I've been misunderstanding it all this time..." ("...Really?" I thought, collected myself, and continued the explanation.) We also reviewed common mistakes, spending about an hour together confirming the feedback and deepening understanding.

 

I closed with some advice: if anything worries you while working, refer back to past feedback or double-check the specifications. Of course, if you are still unsure, ask in chat, and if it is hard to explain in writing, we will meet directly.

 

The next day, perhaps because of that advice, there were more questions than usual, but no fundamental misunderstandings in A's thinking. A few days later, the review results that came back to me as PM showed A passing. I sent feedback: "Great! Your mistakes have dropped sharply and your score is good. You may understand it better than I do now," and felt relieved. No, I reminded myself to stay vigilant, and then sent a message to another annotator who had newly fallen below the passing score: "You fell below the score. Shall we have a private lesson...?"

4. Summary

This time, with real examples, I discussed the importance of developing annotators in order to create training data with assured quality, focusing on one method: communication. Among the approaches, I find one-on-one meetings especially effective because they allow education, advice, and course correction for a specific annotator. Meeting face-to-face, we can tell from speech and gestures whether the message landed, and we also come to know the person's character, which later makes it easier to ask ourselves, "How can I phrase feedback so this person will readily understand it?"

 

To create good training data, we develop annotators into good "teachers." Annotation is handcrafted manufacturing, and communication is an essential element of its foundation. Some may think annotation is simple: just follow the specifications, no great effort required. In the actual work environment, however, things often do not go so smoothly. To overcome these problems and challenges, to create higher-quality training data, and ultimately to reduce the unnecessary costs that corrections incur, we commit to a people-focused management approach. Along with assuring quality, we also strongly want to create a comfortable working environment.

 

Depending on the scale and continuity of the annotation, this method may not always be right. But annotation work often involves unfamiliar data and rules, so providing appropriate education and support throughout the work period is important for assuring quality. That means not simply throwing text-based information at people, but building consensus and accumulating know-how through direct interaction with the person (or across a screen when remote). Obvious as it may sound, this is what my past experience has taught me.

 

Such work may look messy, hardly a smart way of doing things. From the field's perspective, however, I believe this is what annotation truly is. Our company dives into this messy work without hesitation, and we hope to keep assisting you in that spirit.

 

Author:

Manabu Kitada

Annotation Group Project Manager
Since the establishment of our Annotation Group, mainly in natural language processing, I have been extensively involved in team building and project management for large-scale projects, formulating annotation specifications for PoC projects, and consulting for scale-up. Currently, alongside managing image/video and natural language annotation projects, I teach annotation seminars and handle promotional activities such as this blog.

 

 

 
