
1. What is Annotation Work?
As mentioned several times in previous blog posts, annotation is the process of creating training data for AI. Specifically, it involves identifying the targets that the AI should recognize within the data and labeling them. For example, if you want the AI to recognize a person in an image, you draw a rectangle (bounding box) around the person so that the person's position is clear, and attach the tag "person" to that bounding box. The amount of training data required to improve the AI's recognition accuracy can range from thousands to tens of thousands of items, depending on the purpose. As a result, annotation work can take anywhere from a few weeks to several months.
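As a rough illustration of the bounding-box example above, a single annotation can be thought of as a small record holding the tag and the rectangle. The sketch below is in Python. The class and field names (`BoundingBox`, `label`, `x`, `y`, `width`, `height`) and the pixel values are hypothetical and do not correspond to any particular annotation tool's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """One labeled rectangle on an image (hypothetical schema for illustration)."""
    label: str    # tag attached to the box, e.g. "person"
    x: int        # left edge in pixels
    y: int        # top edge in pixels
    width: int    # box width in pixels
    height: int   # box height in pixels

    def area(self) -> int:
        """Return the box area in square pixels."""
        return self.width * self.height


# A single annotation: a rectangle drawn around a person, tagged "person".
person_box = BoundingBox(label="person", x=120, y=40, width=80, height=200)
```

A real project would typically produce thousands of such records, one per target per image.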
Annotation is the repetitive task of identifying targets within data and labeling them. At first glance the work seems simple, since it consists of labeling according to specifications. What is not widely understood is that carrying out annotation while maintaining both quality and productivity can be surprisingly challenging.
2. What is the Difficulty of Annotation Work?
Annotation work often involves handling thousands to tens of thousands of files and carefully reviewing each piece of data so that no target is overlooked. Even with automation, human involvement is ultimately unavoidable, and the work demands focus and perseverance.
The data also includes edge cases that are difficult to judge from the specifications alone, so a task that should take only a few seconds can end up taking several minutes. As these cases accumulate, the pace of work naturally slows. On the other hand, if we rely too heavily on intuition in order to respond quickly, the basis for our judgments becomes unstable. That undermines the consistency of the annotations and lowers their quality, which ultimately affects the AI's recognition accuracy.
Seen this way, annotation work requires both efficient execution and quick, logical judgment to maintain accuracy. Because the work continues for several weeks to several months, annotators also need the knowledge and tips to keep working consistently.
This time, rather than explaining the specific content of annotation work or discussing it from a management perspective, I will look at it from the worker's viewpoint. I will focus on the mindset and tips needed to keep work moving while ensuring productivity and quality, as well as the role of the PM in supporting this.
3. Essential Mindset and Tips for Annotation Work
Make logical judgments to the extent that you can explain them
In annotation, it is important not to leave things you don't understand at the level of "somehow." Not everything is explained or documented in the work specifications or manuals. If a manual tried to cover everything, it would become enormous, which is impractical in terms of both searchability and the labor of writing it. Manuals therefore usually contain the basic way of thinking and representative examples of annotation targets. Annotators need to understand these basics and representative examples and apply them to make accurate judgments on the many cases that are not explicitly described.
To do this, you need a basis for judgment that adequately explains "why the annotation was done (or not done) in that way." Even when the honest answer is "just because I felt that way," it is only by articulating that "just because" logically that consistent annotation becomes possible. If you cannot explain your judgment, something you annotated as "white" yesterday may be annotated as "black" today. Needless to say, such inconsistencies in judgment affect the quality of the training data.
Take breaks to reset, and pause to view the results objectively
As anyone can imagine, image annotation requires continuously scanning every corner of the image to find the targets and label them. In some cases, precision down to a few pixels is required. For text, you must read every sentence without skipping any and label accurately wherever needed. To keep this up for several hours a day, it is important to take moderate breaks without overexerting yourself, and to reset and pause periodically.
When you work for a long time, no matter how logical your judgment is, your senses can become dulled and your judgment can become biased. It is therefore important to take appropriate breaks to reset, pause for a moment, and objectively assess whether your annotation results are skewed in one direction or another.
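One possible way to look at your results objectively is to compare how often each label appears in an earlier batch of work and in a later one. The Python sketch below is only an illustration under assumed conditions: the function names `label_shares` and `shifted_labels`, the 10% threshold, and the sample labels are all hypothetical and are not part of any specification or tool.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the total as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def shifted_labels(earlier, later, threshold=0.10):
    """List labels whose share differs by more than `threshold` between two batches."""
    before = label_shares(earlier)
    after = label_shares(later)
    all_labels = sorted(set(before) | set(after))
    return [
        label
        for label in all_labels
        if abs(after.get(label, 0.0) - before.get(label, 0.0)) > threshold
    ]


# Hypothetical labels from the first and last hour of a working day.
morning = ["person"] * 50 + ["car"] * 50
evening = ["person"] * 70 + ["car"] * 30
drifted = shifted_labels(morning, evening)
```

A shift in label shares does not prove that judgment has become biased, because the data itself may differ between batches. It is simply a cue to pause and re-check a few recent annotations against the specifications.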
Quickly decide on the next action
Even if you have logical thinking skills, the ability to make quick decisions is also essential for annotation. As mentioned earlier, the data encountered in annotation often includes edge cases that are not described in the specifications and have no comparable examples, so it is common to be unsure what to do. If you get stuck each time this happens, time quickly slips away.
Annotation involves a large volume of work. With bounding boxes, for example, each annotation usually needs to be completed within a few tens of seconds, except in special cases. If you stop for several minutes every time you encounter an edge case, productivity quickly declines. When you are struggling with an edge case, it is important to keep the time spent 'worrying' to a minimum and move quickly to the next action, such as 'asking a question' or 'thinking it through and reaching a conclusion.'
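To make the effect of stalling on edge cases concrete, here is a back-of-the-envelope estimate in Python. All figures (10,000 boxes, 30 seconds per box, 5% edge cases, 3 minutes per stall) are illustrative assumptions rather than measurements, and `total_hours` is a hypothetical helper written for this example.

```python
def total_hours(items, seconds_per_item, edge_case_rate=0.0, seconds_per_edge_case=0.0):
    """Estimate total working hours when a share of items are slow edge cases."""
    edge_items = items * edge_case_rate
    normal_items = items - edge_items
    total_seconds = normal_items * seconds_per_item + edge_items * seconds_per_edge_case
    return total_seconds / 3600


# All figures below are illustrative assumptions, not measurements.
baseline = total_hours(items=10_000, seconds_per_item=30)
with_stalls = total_hours(
    items=10_000,
    seconds_per_item=30,
    edge_case_rate=0.05,         # assume 5% of boxes are edge cases
    seconds_per_edge_case=180,   # assume each one stalls the worker for 3 minutes
)
print(f"baseline: {baseline:.1f} h, with stalls: {with_stalls:.1f} h")
```

Under these assumptions the estimate rises from about 83 hours to about 104 hours, roughly 25% more time, even though only one box in twenty is an edge case.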
Respond flexibly according to the situation
Flexibility, meaning not being overly attached to any one thing, is also important. To take a simple example, humans make intuitive, instantaneous judgments in many situations. When deciding whether an animal is a dog or a cat, we do not reason, "The ears are shaped like this, so it is a dog." Instead, we probably make a comprehensive judgment from what we see, drawing on various parameters in our brain built from past experience. If we demand too much theoretical background or justification for such judgments, or cling to our own biases, we may overthink and arrive at the wrong answer, or set off on a long journey in search of answers that do not exist.
This may seem to contradict the earlier point about logical judgment. However, distinguishing between "this is where logical judgment should be applied" and "this cannot be reasoned through and should be judged from experience, as in the example above" itself requires a certain kind of "logical" judgment.
Do not be overly meticulous (avoid excessive quality)
Pursuing quality and taking responsibility for your own work by performing tasks carefully is extremely important in any job, not just annotation. However, overdoing it, whether consciously or unconsciously, can significantly reduce productivity. For example, in semantic segmentation of trees, it is common to find yourself "unintentionally segmenting the leaf tips more finely than necessary" even though samples or instructions specify the required segmentation accuracy. When this happens, it is a problem not only for productivity but also for consistency of quality with other workers. As mentioned earlier, it is essential to stop and check objectively "how much accuracy is required" and "whether the work meets the requirements."
Read the work specifications and manuals carefully
This is not limited to annotation: properly reading and understanding the specifications, manuals, and instructions is fundamental to any kind of work. Our company has worked with hundreds of annotators, and many of them proceed with their tasks without thoroughly reading these materials. In annotation in particular, instructions on how to handle edge cases and exceptions are issued frequently. If these materials are not read and confirmed properly, the result is incorrect annotations, which naturally affects quality.
4. The Role of the PM
Annotation requires the mindset and tips discussed so far, but not everyone can understand and practice them from the beginning. In many cases, workers have weaknesses such as "excellent at detailed work and logical thinking, but unable to make quick decisions" or "tends to prioritize productivity, leading to inconsistent quality."
This is where the PM plays a crucial role. Other companies may have positions other than the PM that fulfill this role, but at our company the PM is responsible not only for managing annotation quality, productivity, and work progress, but also for guiding workers toward the ideal way of working described above.
Ideally, workers resolve issues on their own, but they often struggle without seeing a way to solve the problem. The PM identifies such weaknesses quickly through daily management and holds one-on-one meetings with the workers concerned to address their areas of difficulty. Sharing the know-how of excellent workers with the entire annotation team is also one of the PM's roles.
5. Summary
No one can do everything described so far perfectly from the start. It is important for the annotators and the PM to work together to raise quality to the desired level, and this must be pursued by the entire team. To build teamwork, it goes without saying that a humble and flexible attitude is essential, one in which team members respect each other and openly accept advice and suggestions.
6. Human Science Annotation, LLM RAG Data Structuring Agency Service
Over 48 million pieces of training data created
At Human Science, we are involved in AI model development projects across various industries, starting with natural language processing and extending to medical support, automotive, IT, manufacturing, and construction, to name just a few. Through direct business with many companies, including GAFAM, we have provided over 48 million pieces of high-quality training data. Whatever the industry, our team of 150 annotators is prepared to handle various types of annotation, data labeling, and data structuring, from small-scale projects to large, long-term projects.
Resource management without crowdsourcing
At Human Science, we do not use crowdsourcing. Instead, projects are handled by personnel who are contracted with us directly. Based on a solid understanding of each member's practical experience and their evaluations from previous projects, we form teams that can deliver maximum performance.
Support for not just annotation, but the creation and structuring of generative AI LLM datasets
In addition to labeling for data organization and annotation for identification-based AI systems, Human Science also supports the structuring of document data for generative AI and LLM RAG construction. Since our founding, our primary business has been the production of manuals, and we can leverage our deep knowledge of various document structures to provide you with optimal solutions.
Secure room available on-site
Within our Shinjuku office at Human Science, we have secure rooms that meet ISMS standards, so we can ensure security even for projects that include highly confidential data. We consider the preservation of confidentiality to be extremely important for all projects. For remote work as well, our information security management system has received high praise from clients, because we not only implement hardware measures but also continuously provide security training to our personnel.