7 Tips to Successfully Lead Annotations

Data annotation creates the training data that AI models learn from, across a wide range of purposes. The data handled includes images, video, text, and audio, and the tools and work requirements differ for each. Because training data is the foundation of model learning, it must meet the specifications and the required quality. In AI development, where speed matters, annotation also has to be carried out quickly, and cost cannot be overlooked either. Meeting these requirements is the key to successful annotation. Drawing on our company's ongoing trial and error, this article introduces and explains seven key points to keep in mind, with a focus on quality.

>>Related Blogs
What is Data Annotation? Explanation from its meaning to its relationship with AI models and machine learning




1. 7 Tips for Successful Data Annotation

Data annotation is, at its core, the process of tagging data. It may look like simple work, but in practice it comes with many challenges. This is because data annotation, and ultimately the AI model itself, tries to mimic the way our brains recognize things intuitively. For example, when annotating data for a "tuna quality recognition AI model" that imitates how fishmongers judge the quality of tuna from the cross-section of its tail, the tacit knowledge the fishmongers have built up through experience must be captured as features. How to manage and capture such intuitive judgments quantitatively and qualitatively is the key to successful data annotation and to the quality of the training data.

1-1. Collect Various Types of Data

Just as people accumulate knowledge through experience and become able to handle a variety of challenges, AI models improve their recognition accuracy by learning from diverse data. To achieve this, it is important to gather as many different types of data as possible. For example, when detecting cars, do not rely only on images of heavy traffic in urban areas; also include images with few vehicles and with backgrounds such as mountains and winding roads. Images of cars from various angles, such as front, side, rear, and diagonal, should also be prepared. By teaching the AI model these varied features of cars, recognition accuracy improves.

 

Even with a small amount of data, data augmentation such as lowering image resolution, flipping images, and cropping parts of images can supplement the variety of data. Note, however, that if the data is varied but small in quantity, the AI model may overfit and misrecognize newly loaded images, so it is ideal to have a large volume of data. Our company often receives requests involving thousands to tens of thousands of files.
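As a concrete illustration (not part of the original article), the short Python sketch below applies the three augmentations mentioned above, lowering resolution, flipping, and cropping, using the Pillow library; the file names and output paths are assumptions. Note that if an image already carries annotations such as bounding boxes, those would need to be transformed together with the image.

from PIL import Image, ImageOps  # Pillow

def augment(path):
    # Create three simple variants of one image: lower resolution, horizontal flip, center crop.
    img = Image.open(path)
    w, h = img.size

    low_res = img.resize((w // 2, h // 2))  # lower the resolution to half size
    flipped = ImageOps.mirror(img)          # flip left-right
    cropped = img.crop((int(w * 0.2), int(h * 0.2),
                        int(w * 0.8), int(h * 0.8)))  # keep the central 60% of the image

    low_res.save("aug_low_res.jpg")  # output file names are assumptions
    flipped.save("aug_flipped.jpg")
    cropped.save("aug_cropped.jpg")

augment("car_001.jpg")  # hypothetical input image

Augmentation like this supplements variety, but as noted above it is not a substitute for collecting a genuinely large and diverse dataset.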

1-2. Create Work Standards and Specifications

Creating training data for AI models requires manual data annotation. So that annotators understand the requirements and produce accurate data, organize the relevant information and provide clear work standards and specifications.

 

In addition to explaining the annotation rules in text, make them visually easy to understand by including screenshots of the work tools. A flowchart covering the process from start to finish is also helpful. If there are edge cases that may cause confusion, be sure to include them as well.

 

If possible, run a test annotation. Doing so lets you identify edge cases, improve the work standards and specifications, review tool settings, and estimate progress, which helps the actual work proceed smoothly. In practice, however, there may not be enough time to select and set up the annotation tool, and it is not realistic to identify every edge case in advance, so some will inevitably have to be handled after the annotation work has started. For handling edge cases, also refer to the section on communication below.

1-3. Establish an Efficient Annotation Process

When performing data annotation, it is important to think in advance about how to proceed efficiently. For example, configuring the annotation tool so that frequently used tags can be selected quickly can shave a few seconds off each operation. If the project involves creating 10,000 bounding boxes per person and each box becomes one to three seconds faster, that alone saves 10,000 to 30,000 seconds, roughly 2.8 to 8.3 hours per person.
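To make the arithmetic concrete, here is a small Python calculation of our own using the figures above; the one-to-three-second saving per box is the assumed range from the example.

boxes_per_person = 10_000
for seconds_saved_per_box in (1, 3):  # assumed range of seconds saved per bounding box
    total_seconds = boxes_per_person * seconds_saved_per_box
    hours = total_seconds / 3600
    print(f"{seconds_saved_per_box} s/box: {total_seconds} s = {hours:.1f} hours saved per person")

# Prints roughly 2.8 hours for 1 s/box and 8.3 hours for 3 s/box.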

In projects with many tag types, even just switching between tags can take time. Some tools let you assign keyboard shortcuts to tags, so keep these points in mind when selecting a tool.

>>Related Blogs
5 Recommended Data Annotation Tools: Comparing 3 Key Points for Choosing the Right Tool

 

The position from which you start annotating an image matters not only for productivity but also for quality. The starting position and the way the work is carried out affect how easy the work is, how likely careless mistakes are, and even the checking process described later. Of course, the preferred way of working varies from annotator to annotator, so there is no need to enforce a single method. However, it is well known that complicated procedures and tool operations lead to human error, so keep the number of steps and actions to a minimum and use operations and working methods that are as simple as possible.

In our projects we often hear comments like, "I kept doing it the way I started because I was used to it, but there was actually an easier and faster way." It is therefore important to listen to the data annotators and share easier methods with the entire team.

1-4. Establish a Check Process

It may go without saying, but build a check process into the data annotation workflow. Doing so helps ensure annotation quality and leads the project to better results.

 

Setting Up a Check Phase:

Checking the annotated data uncovers careless mistakes, differences in how individual annotators understand the rules, and other issues. Correcting them keeps the data consistent and also provides feedback for the work. We recommend starting checks at an early stage, because delaying them allows errors to continue unnoticed.

There are two methods for checking: one is to have a dedicated checker, and the other is to have mutual checks by data annotators. Which one to choose depends on the difficulty and scale of the project.

 

Check Rate:

If a full check of every item is feasible, that is best, but it costs more. Depending on the accuracy the AI model requires, spot checks may be sufficient. If the annotation is not difficult, fewer mistakes can be expected, so a low-rate spot check is also a reasonable choice. Because annotators become more proficient as the project progresses, it is also effective to start with full checks and switch to spot checks later.
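As one possible way to run a spot check, the Python sketch below draws a random, reproducible sample of annotation files to review; the directory name, the *.json file pattern, and the 10% check rate are assumptions rather than anything specified in the article.

import random
from pathlib import Path

def pick_spot_check_sample(annotation_dir, check_rate=0.10, seed=42):
    # Return a random, reproducible subset of annotation files to review.
    files = sorted(Path(annotation_dir).glob("*.json"))  # assumed annotation file format
    if not files:
        return []
    sample_size = max(1, round(len(files) * check_rate))
    random.seed(seed)  # fixed seed so the same sample can be re-drawn later
    return random.sample(files, sample_size)

# Example: review 10% of the files in the "annotations" directory.
for path in pick_spot_check_sample("annotations", check_rate=0.10):
    print(path)

Fixing the random seed makes the sample reproducible, so it is possible to confirm afterwards exactly which files were reviewed.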

 

Creating Check Procedure Documents and Check Sheets:

In addition to the work standards and specifications, it is a good idea to create a procedure manual for the checking process. Checking mainly involves verifying that the annotated data meets the required quality and catching careless mistakes. Clearly stating the points the check should focus on prevents time being wasted on unnecessary details and keeps the process efficient. If there are multiple checkers, sharing the procedure manual and a check sheet among them reduces variation in check items and perspectives, leading to stable quality.

1-5. Establish an Environment for Smooth Mutual Communication

In actual data annotation work, people often do not ask questions right away even when they have doubts. The reasons vary: "It is such a basic thing that it would be embarrassing to ask," "I want to ask, but I might not be able to explain it well," "I don't want to be seen as not understanding the work standards and specifications," "I feel bad taking up someone's time with questions." People hesitate for all sorts of reasons.

Such hesitation by itself already reduces the efficiency of the project. It is therefore extremely important that the whole team feels communication is easy. Holding regular team meetings and encouraging effective use of chat tools helps achieve this. An overly formal environment can also breed hesitation and reluctance, so aim to create a relaxed, positive team atmosphere.

 

For these reasons, choosing the right data annotators is important. Simply gathering people to do the work often does not go well. The ideal annotator can not only read and understand the work standards and specifications, but also communicate when needed, has the PC skills to operate the tools, and can sustain detailed work for long periods. Such people are surprisingly hard to gather, so entrusting the work to an external vendor with suitable personnel can be a good choice.

 

Kickoff Meeting:

Avoid simply handing the work standards and specifications to the data annotators and saying, "Over to you." Hold a kickoff meeting to explain the project's objectives as far as possible and demonstrate the actual workflow on screen using the tools. This conveys subtle nuances that documents cannot. Having the team meet face-to-face at the start also serves as a starting point for communication going forward.

 

Handling Edge Cases:

In data annotation work, cases often arise that the work standards or specifications did not anticipate, or that the specifications do not clearly cover. Annotators then frequently disagree in their judgments or become unsure how to annotate (or whether to annotate) such edge cases. These cases must not be left to each annotator's individual judgment. It is important to align with the development team's understanding, and solutions are often reached through discussion within the team, in meetings or on chat tools. This kind of communication deepens understanding of the annotation work and strengthens team cohesion, and a good communication environment benefits the whole project.

1-6. Provide Feedback

Give feedback to data annotators regularly. Working for long stretches without feedback tends to make people anxious, which can lower motivation and, in turn, quality. Rather than commenting only on negatives such as annotation mistakes, also communicate positives such as fewer mistakes and improved productivity. Praise matters a great deal.

 

Deciding whether to give feedback to an individual or share it with the whole team sometimes requires careful judgment. For negative points in particular, some annotators do not want others to know, so it is better to deliver that feedback individually. Some of it, however, may be worth sharing with the team as a whole; when you do, be considerate and anonymize names and other identifying details.

 

That said, when overall team performance in productivity or quality is not improving, it may be necessary to assess individual performance objectively and share it with the team. This may seem drastic, but at times it is necessary, and in such cases it is also important to provide support and care through one-on-one meetings.

1-7. Conduct a Review

After the project is completed, look back on the annotation process. Review aspects such as whether the quality met the requirements, whether progress stayed on track for the deadline, and whether the work stayed within the expected budget. Identify both what went well and what did not, and collect the knowledge and experience gained. Applying this accumulated knowledge to the next project leads to better project management.

2. Summary

Data annotation is sustained by steady, diligent work. It can easily become monotonous, yet it is not simply the repetition of the same task, and we have introduced tips for maintaining quality under those conditions. To actually run data annotation in-house, however, you must manage not only quality but also schedules and costs so that the AI model development cycle is not held up. If it is hard to know where to start or how to put these tips into practice, outsourcing to an external vendor with extensive annotation experience is another option. Entrusting the work to a vendor with expertise in data annotation services lets you concentrate on developing your AI model in-house.

3. Data Annotation Outsourcing Service by Human Science Co., Ltd.

A rich track record of creating 48 million pieces of training data

At Human Science, we are involved in AI model development projects in fields such as natural language processing, medical support, automotive, IT, manufacturing, and construction. Through direct contracts with numerous companies, including GAFAM companies, we have delivered over 48 million items of high-quality training data. We handle annotation projects of all kinds, regardless of industry, from small projects to large-scale projects involving 150 annotators. If your company is interested in introducing AI models but unsure where to start, please consult with us.

Resource Management without Using Crowdsourcing

At Human Science, we do not use crowdsourcing and instead directly contract with workers to manage projects. We carefully assess each member's practical experience and evaluations from previous projects to form a team that can perform to the best of their abilities.

Utilize the latest data annotation tools

AnnoFab, one of the annotation tools Human Science has adopted, allows customers to check progress and give feedback in the cloud even while a project is underway. Work data cannot be saved on local machines, which also strengthens security.

Equipped with a security room within the company

At Human Science, our Shinjuku office has a security room that meets ISMS standards, so we can carry out highly confidential projects on-site with security assured. We regard confidentiality as extremely important in every project. We provide ongoing security training to our staff and pay close attention to the handling of information and data, including in remote projects.



 

 

 
