
Spinoff Blog Project
――Annotations Supporting AI in the DX Era. The Reality of the Analog Field
What is the cost of annotation?
That cost might actually be high? Insights from vendors who know the truth about outsourcing annotation.
Until now, our company has published a variety of blog posts on annotation and AI, focusing on general knowledge and know-how. Annotation may sound simple when described in words, but it is inherently "ambiguous" and is "work that cannot be done without people," so it inevitably involves a great deal of human interaction. As a result, it often gets messy and cannot be resolved with the tidy theories found elsewhere. Ensuring quality and productivity actually requires a wide range of experience and know-how.
Therefore, understanding the specific problems that arise in the actual annotation field, and how to address them, can provide useful hints for making annotation succeed.
In this spin-off blog project, we want to convey what actually happens on-site and the specific responses and measures we take. Unlike our regular blogs and columns, it shares the real conditions of our workplace, including our own characteristics and commitments.
When requesting annotation from an outsourcing vendor, it is natural to ask whether "quality," "delivery time," and "cost" will fit within expectations. In reality, however, the amount of work annotation requires varies greatly with the content of the project, and delivery time and cost vary with it, which makes it hard to judge whether a quote is appropriate. In this post we focus on "cost": we explain how outsourcing vendors calculate annotation costs, centering on man-hours, and what customers should keep in mind when making a request so that they can outsource at a fair price.
1. A major factor in determining annotation costs: "Man-hours"
Annotation cost consists of the "annotation work" and "QA check work" themselves, plus the "management overhead" for managing personnel and progress, and "profit." Since annotation is a labor-intensive business, costs are generally calculated from man-hours. Needless to say, the same volume of data does not always mean the same cost. The man-hours required for annotation vary with the specifications, domain, quality level, and difficulty of the work. For example, as difficulty increases, the management man-hours for quality control and worker training increase as well.
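As a rough illustration of the structure described above, the following sketch builds a price from annotation hours, QA hours, management overhead, and profit. The formula, the field names, and every number are hypothetical assumptions for illustration only. They are not our actual pricing model, and real vendors may combine these components differently.

```python
from dataclasses import dataclass


@dataclass
class EstimateInputs:
    annotation_hours: float  # man-hours of annotation work
    qa_hours: float          # man-hours of QA checking
    management_rate: float   # management overhead as a fraction of work hours (assumed)
    hourly_rate: float       # blended labor cost per man-hour (assumed)
    profit_margin: float     # profit as a markup on cost (assumed)


def estimate_price(x: EstimateInputs) -> float:
    """Price = (work hours + management hours) * hourly rate, plus profit."""
    work_hours = x.annotation_hours + x.qa_hours
    management_hours = work_hours * x.management_rate
    cost = (work_hours + management_hours) * x.hourly_rate
    return cost * (1 + x.profit_margin)


# Same volume of annotation work, but the harder project needs more QA
# and more management (quality control, worker training).
easy = EstimateInputs(annotation_hours=100, qa_hours=20, management_rate=0.10,
                      hourly_rate=2000, profit_margin=0.20)
hard = EstimateInputs(annotation_hours=100, qa_hours=40, management_rate=0.25,
                      hourly_rate=2000, profit_margin=0.20)

easy_price = estimate_price(easy)
hard_price = estimate_price(hard)
print(f"easy: {easy_price:,.0f}  hard: {hard_price:,.0f}")
```

Even with identical annotation hours, the harder project comes out more expensive in this sketch because QA and management man-hours grow with difficulty.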
2. How to Calculate Annotation Work Hours
Vendors calculate man-hours from the information the client provides, such as specifications and sample data. At our company, we measure actual man-hours in order to estimate costs. However, detailed information about the tasks is sometimes limited at the time of the request, for example when the quote request is handled entirely by email or when the specifications have not yet been finalized. In such cases, vendors estimate from similar past projects and their own experience, and in doing so they generally tend to hedge against risk.
For example, a vendor may think, "The man-hours will probably be around this much, but once we actually start, the specifications and tasks may turn out to be somewhat more complex," "Depending on the target, the segmentation may need to be painted with quite high precision and detail," or "The quality requirements may also be quite high." The estimate is then raised to account for these factors and risks that could increase man-hours.
However, once the work actually starts, it sometimes takes less effort than expected. For reasons internal to the vendor that the customer cannot see, there are cases where "the cost should actually have been lower." In other words, the annotation work was not priced appropriately, and the customer ends up taking a "loss."
To outsource at a reasonable price, it is important to give the vendor as much information as possible when man-hours are being estimated, so that actual man-hours do not deviate from the estimate. So, what can be done to prevent this deviation?
3. Factors and Measures for Discrepancies in Man-Hours
The number of objects differs from expectations
A common pattern in image annotation is that the actual number of objects to be annotated differs from the number the client anticipated. For instance, if the client reviews part of the data and tells the vendor there are "about 10 objects per image," the vendor will base its man-hour estimate on that figure. The estimated amount is therefore premised on the cost of annotating "10 objects" per image.
However, if the average turns out to be 3 objects per image once the work actually starts, the annotation labor cost would by simple calculation be roughly one third (3/10) of the estimate. Even if it is not exactly one third, the actual work should have been cheaper than the initial estimate, which means the estimate was not a fair price. To prevent this kind of gap between the expected and actual number of objects, it is important to give the vendor as much real data as possible as samples.
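The simple calculation in this example can be sketched as follows. The number of images, the time per object, and the sample counts are hypothetical values chosen for illustration. Only the "10 objects" and "3 objects" figures come from the example above.

```python
from statistics import mean


def annotation_hours(num_images: int, objects_per_image: float,
                     seconds_per_object: float) -> float:
    """Annotation man-hours, assuming effort scales linearly with object count."""
    return num_images * objects_per_image * seconds_per_object / 3600


NUM_IMAGES = 1200          # hypothetical dataset size
SECONDS_PER_OBJECT = 30    # hypothetical time to annotate one object

estimated = annotation_hours(NUM_IMAGES, 10, SECONDS_PER_OBJECT)  # quoted premise
actual = annotation_hours(NUM_IMAGES, 3, SECONDS_PER_OBJECT)      # what the data held
ratio = actual / estimated

# Counting objects in even a handful of real sample images (hypothetical
# counts here) would have given the vendor a premise closer to reality.
sample_counts = [2, 4, 3, 1, 5, 3]
sampled_objects_per_image = mean(sample_counts)

print(f"estimated: {estimated:.0f} h, actual: {actual:.0f} h, ratio: {ratio:.2f}")
print(f"objects per image from samples: {sampled_objects_per_image}")
```

The linear scaling is itself an assumption. In practice, per-image overhead such as opening and checking each file means the real ratio will not be exactly 3/10.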
Annotation deals with vast amounts of data, so we understand that clients cannot review all of it. However, as mentioned above, the actual state of the data does not always match expectations. We recommend at least skimming through the entire dataset.
One way to order at a reasonable price is to base the request on a unit price per annotated object. The advantage of an object unit price is that even when the number of objects varies from file to file, making the total hard to grasp, a reasonable price can still be maintained through either of the methods below.
- Determine the number of work objects in advance
We calculate the annotation labor cost per object and provide an estimate that fixes the number of objects to be worked on so that it fits within your budget. This way, even when the number of objects per file varies, you can outsource at a reasonable price.
- Settle based on the final number of objects worked on
With this method, you fix the unit price per object, get a rough estimate of the cost, and then settle the final amount based on the number of objects actually worked on. It is suited only to cases where you want all target files annotated and have a relatively flexible budget. When adopting this method, it is important to have the vendor share work progress with you, because the amount of work may turn out larger than expected.
We are flexible in accommodating either of these methods.
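The two methods can be compared with a small sketch. The unit price, budget, and object counts are hypothetical, and the function names are introduced here purely for illustration.

```python
def objects_within_budget(budget: int, unit_price: int) -> int:
    """Method 1: fix the number of objects in advance so the total fits the budget."""
    return budget // unit_price


def settlement(unit_price: int, objects_worked: int) -> int:
    """Method 2: settle on the number of objects actually worked on."""
    return unit_price * objects_worked


UNIT_PRICE = 50      # hypothetical price per annotated object
BUDGET = 300_000     # hypothetical customer budget

# Method 1: the object count is capped, so the cost can never exceed the budget.
max_objects = objects_within_budget(BUDGET, UNIT_PRICE)

# Method 2: a rough estimate first, then the final amount from the real count.
rough_estimate = settlement(UNIT_PRICE, 5_000)   # expected object count (assumed)
final_amount = settlement(UNIT_PRICE, 5_600)     # actual object count (assumed)
overrun = final_amount - rough_estimate

print(max_objects, rough_estimate, final_amount, overrun)
```

In the second method the final amount moves with the actual object count, which is why sharing progress during the work matters: it lets an overrun be noticed before settlement rather than after.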
Specification and quality requirements are not fully defined
At the stage of requesting an estimate, the specifications and requirements for annotation may not yet be fully defined. This also affects the man-hour estimate. For example, if the vendor anticipates that "time spent on Q&A may increase in order to finalize the detailed specifications later" or "reference materials will probably need to be created to supplement the specification document," it will hedge against that risk and estimate higher to cover the burden of quality management.
To avoid this, it would be ideal to finalize the specifications and requirements as far as possible, but this is not always feasible. During business negotiations we also hear from customers, "There are still some parts of the requirements we haven't finalized, and we would like to discuss those with your company."
Some customers proceed from estimate to order entirely by email, but we strongly encourage you to clear up any concerns or questions in a meeting. An annotation vendor can often offer professional suggestions on reducing costs, such as optimizing the work process, and on difficulties in firming up requirements. Through such meetings, you can arrive at a more appropriate price, one that fits your budget better than the initial estimate. It is wise to request a formal estimate again afterward.
It is not well communicated that specialized knowledge is not required
If the project's domain is one where the vendor has little or no experience or knowledge, acquiring the necessary knowledge and training for the project can be expected to take time. In addition to the annotation man-hours, the man-hours for training and educating the workers, including the project manager, therefore tend to increase. Customers, for their part, are naturally aware that the domain is specialized, and often scope the request so that the annotation can be done with minimal specialized knowledge. However, there are cases where it is not effectively communicated to the vendor that such domain knowledge is "unnecessary" or "easy to acquire."
In such cases, the vendor may think: "The customer says no special knowledge is required, but the customer is an expert who already has the domain knowledge. For amateurs like us, some training will still be needed to reach a certain level of expertise." As a result, man-hours and costs tend to be estimated with a risk-hedging margin.
If the request is in a specialized domain, it is important to tell the vendor as clearly as possible whether the work requires domain knowledge or whether that knowledge can be acquired easily, and to provide learning materials and an estimate of the time needed to learn. This will lead to ordering at a reasonable price. To minimize this kind of training overhead, it is also important to check the vendor's website or ask about their experience in the domain when judging whether they are a suitable outsourcing partner.
4. Summary
So far, I have focused on man-hours and discussed how to communicate information to vendors when requesting a quote so as to minimize discrepancies and get closer to a fair price. Of course, price and cost are not determined by man-hours alone. They are also influenced by factors such as labor costs for offshore versus domestic workers, work locations dictated by security requirements, and the vendor's pricing strategy. Still, man-hours remain a major factor. When making a request, it is important to review your entire work process as far as time allows, state the specifications and quality requirements in clear language, and convey the necessary information to the vendor, including points you might take for granted, such as "The work takes roughly XX minutes," "They should obviously understand this much," or "Looking at the reference materials should be enough for quality." By doing so, you can avoid ending up thinking, "It should have been cheaper than the quoted amount," and avoid taking a "loss" against the fair price.
That said, it can be quite difficult for customers who do not specialize in annotation to review their own annotation tasks objectively, and this can be said to be part of the annotation vendor's job. Therefore, at our company, experienced project managers interview customers during business discussions to put their requests regarding work specifications and data characteristics into words. We accompany our customers from estimation through execution and delivery so that they are satisfied with a price that is close to fair.
5. [Special Edition] Avoid the end of the fiscal year when projects are concentrated
In the main part of this post, we focused on man-hours and explained how to outsource annotation at a reasonable price. Separately, although it is not directly related to man-hours, we would like to share a useful tip about the timing of requests.
It may not be widely known, but AI development and the related annotation work generally tend to concentrate around the end of the fiscal year, and the period from April to around October is relatively slow. During this time, annotation vendors need to secure jobs and projects, even at low profit, in order to recover fixed costs such as labor expenses. Many of them run attractive campaigns, and by taking advantage of these you may be able to obtain annotated data at a surprisingly good price, so we recommend this period.
We have run campaigns in the past that were well received. If you contact us, we will let you know about the next opportunity, so please take advantage of it.
Author:
Manabu Kitada
Annotation Group Project Manager
Since the establishment of our Annotation Group, I have been broadly responsible for team building and project management on large-scale projects centered on natural language processing, as well as for formulating annotation specifications for PoC projects and consulting aimed at scaling.
Currently, in addition to serving as a project manager for image, video, and natural language annotation, I am engaged in promotional activities such as teaching annotation seminars and writing blog posts.