
1. What Is Annotation Work?
As mentioned several times in previous blog posts, annotation is the process of creating data for training AI. Specifically, it involves identifying the subjects that the AI should recognize within the data and labeling them. For example, if you want the AI to recognize a person in an image, you draw a rectangle (bounding box) around the person so that their position and extent are clear, and attach the tag "person" to that bounding box. The amount of training data required to improve the AI's recognition accuracy can range from thousands to tens of thousands of items, depending on the purpose. As a result, annotation work can take anywhere from a few weeks to several months.
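As a rough illustration of the bounding-box example above, a single annotation can be thought of as a label plus the pixel coordinates of a rectangle. The sketch below is only an assumption about how such a record might look: the `BoundingBox` class and its field names are hypothetical and do not correspond to any particular annotation tool or format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """One labeled rectangle in pixel coordinates (hypothetical schema)."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def __post_init__(self):
        # Reject boxes drawn with zero or negative width or height.
        if self.x_max <= self.x_min or self.y_max <= self.y_min:
            raise ValueError("box must have positive width and height")

    @property
    def area(self) -> int:
        """Area of the rectangle in square pixels."""
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


# A box drawn around a person, tagged "person" (coordinates are made up).
person = BoundingBox(label="person", x_min=40, y_min=30, x_max=120, y_max=230)
print(person.label, person.area)
```

Real projects usually store many such records per image, and the exact format depends on the tool and the work specification.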
Annotation is the repetitive task of identifying subjects within data and labeling them. The work looks simple, since it consists of labeling according to specifications. What is less widely understood is that carrying out annotation while maintaining both quality and productivity is surprisingly challenging.
2. What is the Difficulty of Annotation Work?
In annotation work, workers often have to handle thousands to tens of thousands of files, thoroughly reviewing each piece of data to ensure that no target is overlooked. Even with automation, human intervention is ultimately unavoidable, and it demands focus and perseverance.
The data also includes edge cases that are difficult to judge from the specifications alone, so a task that should take only a few seconds can end up taking several minutes. As these cases accumulate, the pace of work naturally slows down. However, if workers rely too much on intuition in order to respond quickly, the basis for their judgments becomes unstable. This compromises the consistency of the annotations and lowers quality, which ultimately affects the accuracy of AI recognition.
Seen this way, annotation work requires both efficient execution and quick, logical judgment to maintain accuracy. Since the work continues for several weeks to months, workers also need the knowledge and tips that allow them to keep working consistently.
In this post, rather than explaining the specific content of annotation work or looking at it from a management perspective, I will discuss it from the worker's viewpoint. I will focus on the mindset and tips needed to keep advancing the work while ensuring productivity and quality, as well as the role of the PM in supporting this.
3. Essential Mindset and Tips for Annotation Work
Make logical judgments to the extent that you can explain them
In annotation, it is important not to settle for a vague "somehow" when there is something you don't understand. Not everything is explained or documented in the work specifications or manuals. If everything were included in the manual, the documentation would become enormous and impractical, both in terms of searchability and the labor needed to write it. For this reason, manuals usually cover only the basic way of thinking and representative examples of the objects to be annotated. Workers therefore need to understand and apply those basics and representative examples in order to make accurate judgments on the many cases that are not explicitly described.
To do so, you need a basis for judgment that can adequately explain "why the annotation was done (or not done) in that way." Even when the honest answer is "just because I felt that way," logically articulating that "just because" is what makes consistent annotation possible. If you cannot explain your reasoning, something you annotated as "white" yesterday may end up annotated as "black" today. Needless to say, such inconsistencies in judgment affect the quality of the training data.
Take breaks to reset, and pause to view the results objectively
As anyone can imagine, image annotation requires continuously scanning every corner of the image to find the targets and label them. In some cases, precision to within a few pixels is required. For text, you need to read every sentence without skipping any and label accurately wherever labeling is needed. To keep this up for several hours a day, it is important to take moderate breaks without overexerting yourself, and to reset and pause periodically.
When you work for a long time, no matter how logical your judgment is, your senses can become numb and your judgment may become biased. It is therefore important to take appropriate breaks to reset, pause for a moment, and objectively assess whether your annotation results are skewed in one direction or another.
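One possible way to make this kind of objective check concrete is to compare how often each label was used in two work sessions. The sketch below is only an illustration under assumptions: the function names, the sample labels, and the 0.15 threshold are all hypothetical and are not taken from any specification.

```python
from collections import Counter


def label_shares(labels):
    """Return the fraction of annotations that carry each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def drifted_labels(earlier, later, threshold=0.15):
    """List labels whose share changed by more than `threshold` (assumed value)."""
    before, after = label_shares(earlier), label_shares(later)
    return sorted(
        label
        for label in set(before) | set(after)
        if abs(after.get(label, 0.0) - before.get(label, 0.0)) > threshold
    )


# Made-up labels from a morning and an afternoon session.
morning = ["person"] * 60 + ["vehicle"] * 40
afternoon = ["person"] * 85 + ["vehicle"] * 15
print(drifted_labels(morning, afternoon))
```

A shift like this does not prove that judgment has drifted, because the data itself may differ between sessions. It is simply a prompt to stop and review a few recent annotations against the specification.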
Quickly decide on the next action
Even if you have logical thinking skills, the ability to make quick decisions is also essential for annotation. As mentioned earlier, the data encountered in annotation often includes edge cases that are not described in the specifications and have no comparable examples, which frequently leads to indecision. If you get caught up in such cases, time quickly slips away.
Annotation involves a large volume of work. With bounding boxes, for example, each annotation usually needs to be completed within a few tens of seconds, except in special cases. If you stop for several minutes every time you encounter an edge case, productivity quickly declines. When struggling with an edge case, it is important to keep the time spent "worrying" to the bare minimum and move on quickly to a next action, such as "asking a question" or "thinking it through and reaching a conclusion."
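To give a feel for how quickly edge cases eat into productivity, the following back-of-the-envelope sketch estimates total working hours. All figures are assumptions chosen for illustration (10,000 items, 30 seconds per ordinary item, 3 minutes per edge case, 5% edge cases), and `estimated_hours` is a hypothetical helper, not part of any tool.

```python
def estimated_hours(num_items, normal_seconds, edge_case_rate, edge_case_seconds):
    """Estimate total hours when a fraction of items are slow edge cases."""
    edge_items = num_items * edge_case_rate
    normal_items = num_items - edge_items
    total_seconds = normal_items * normal_seconds + edge_items * edge_case_seconds
    return total_seconds / 3600


# Assumed figures: 10,000 boxes at 30 seconds each, edge cases at 180 seconds.
baseline = estimated_hours(10_000, 30, 0.0, 180)
with_edge_cases = estimated_hours(10_000, 30, 0.05, 180)

print(f"no edge cases:  {baseline:.1f} h")         # about 83 hours
print(f"5% edge cases:  {with_edge_cases:.1f} h")  # about 104 hours
```

Under these assumed numbers, stalling on only 5% of the items adds roughly 20 hours, which is why capping the time spent on each edge case matters.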
Respond flexibly according to the situation
Flexibility, without becoming overly attached to any one thing, is also important. To take a simple example, humans make intuitive, instantaneous judgments in all kinds of situations. When deciding whether an animal is a dog or a cat, we do not reason, "If the ears are shaped like this, it's a dog." Instead, we most likely make a comprehensive judgment from what we see, drawing on many parameters in our brain built up through past experience. If we demand too much theoretical background or justification for such judgments, or cling to our own biases, we may overthink and arrive at incorrect answers, or set off on a long journey in search of answers that do not exist.
This may seem to contradict the earlier point about logical judgment. However, distinguishing between "this is where logical judgment should be applied" and "this cannot be reasoned through and should be judged from experience, as in the example above" itself requires a certain kind of "logical" judgment.
Do not be overly meticulous (avoid excessive quality)
Pursuing quality and taking responsibility for your own work while performing tasks carefully is extremely important in any job, not just in annotation. However, whether consciously or unconsciously, overdoing it can significantly reduce productivity. For example, in tasks such as semantic segmentation of trees, it is common to find yourself unintentionally segmenting the leaf tips more finely than necessary, even though samples or instructions on the required segmentation accuracy have been provided. When this happens, it becomes a problem not only for productivity but also for the consistency of quality with other workers. As mentioned earlier, it is essential to stop and check from an objective perspective, asking "how much accuracy is required" and "whether the work meets the requirements."
Read the work specifications and manuals carefully
This is not limited to annotation, but properly reading and understanding the specifications, manuals, and instructions is fundamental to any work process. Our company has worked with hundreds of annotators, and many of them proceed with their tasks without thoroughly reading these materials. In annotation tasks in particular, instructions for handling edge cases and exceptions are issued frequently. If these materials are not read and confirmed properly, the result is incorrect annotations, which naturally affects quality.
4. The Role of the PM
Annotation requires the mindset and tips discussed so far, but not everyone can understand and practice them from the beginning. In many cases, workers have weaknesses such as "excellent at detailed work and logical thinking, but unable to make quick decisions" or "tending to prioritize productivity, which leads to inconsistencies in quality."
This is where the PM plays a crucial role. At other companies, this role may be fulfilled by a position other than the PM, but at our company the PM's responsibilities include not only managing annotation quality, productivity, and work progress, but also guiding workers toward the ideal approach described above.
Ideally, workers would resolve issues on their own, but there are often times when they are struggling and cannot see a way to solve the problem. By quickly identifying such weaknesses through daily management and holding one-on-one meetings with the workers concerned, we aim to address their areas of difficulty. Sharing the know-how of excellent workers with the entire annotation team is also one of the PM's roles.
5. Summary
No one can do everything described so far perfectly from the start. It is important for the annotators, who carry out the work, and the PM to work together to raise quality to the desired level, and this must be pursued by the entire team. To build that teamwork, it goes without saying that a humble and flexible attitude is essential, one in which team members respect each other and openly accept advice and suggestions.
6. Human Science's Annotation and LLM RAG Data Structuring Service
A rich track record of creating 48 million pieces of training data
At Human Science, we are involved in AI model development projects across various industries, beginning with natural language processing and extending to medical support, automotive, IT, manufacturing, and construction. Through direct transactions with many companies, including GAFAM, we have provided over 48 million pieces of high-quality training data. We accommodate various types of annotation, data labeling, and data structuring in any industry, from small-scale projects to long-term, large-scale projects with a team of 150 annotators.
Resource management without using crowdsourcing
At Human Science, we do not use crowdsourcing; instead, we advance projects with personnel directly contracted by our company. We form teams that can deliver maximum performance based on a solid understanding of each member's practical experience and their evaluations from previous projects.
Support not only for annotation but also for creating and structuring datasets for generative AI and LLMs
In addition to data organization and to labeling and annotation for recognition-type AI, we also support the structuring of document data for building generative AI and LLM RAG systems. Since our founding, producing manuals has been one of our primary businesses and services, and we draw on the unique know-how gained from a deep understanding of various document structures to provide optimal solutions.
In-house security room
At Human Science, we have a security room that meets ISMS standards within our Shinjuku office, so we can ensure security even for projects that handle highly confidential data. We consider the protection of confidentiality to be extremely important for all projects. Our information security management system has received high praise from our clients for remote projects as well, because we not only implement hardware measures but also continuously provide security training to our personnel.