Tips for Utilizing Crowdsourcing in Data Annotation Work

Data annotation often relies on crowdsourcing to bring together many workers and process a large volume of data, and many annotation vendors use crowdsourcing as well. This article explains tips for using crowdsourcing in data annotation, along with points to watch for and the advantages and disadvantages of using crowdsourcing directly or through a vendor that relies on it.



1. Benefits of Crowdsourcing

1-1. Work Scale and Diversity

Data annotation often requires a large amount of training data, and preparing it in a short period inevitably takes many people. Hiring staff and getting them started takes time and effort, so crowdsourcing is an effective option when a large workforce is needed. It is especially suitable when the annotation is relatively simple, the skill requirements for workers are relatively low, and workers do not need extended training.

Crowdsourcing is also effective when you want to collect diverse data. Its large pool of workers makes it relatively easy to collect data in volume. For example, when developing OCR models that need a large amount of handwritten text, gathering small samples of handwriting from many people yields more diverse training data, which contributes to better AI accuracy. Crowdsourcing excels at this kind of collection, both for gathering data and for adding diversity to training data.

1-2. Delivery Time and Cost

Recruiting a large number of workers on your own takes significant cost and effort. Even annotation vendors that keep a roster of registered workers need time and money to secure many people at once for a specific project, and this is inevitably reflected in the price. Crowdsourcing platforms, and the annotation vendors that use them, already have a large pool of workers, so they can secure people at relatively low cost and in a short time, and work can begin early. This reduces recruitment costs and generally shortens delivery times.

Beyond recruitment, assigning many workers also shortens the time needed for the work itself. A shorter work period in turn shortens the management period, including the management of workers and work data. Although a large amount of data and many workers must be managed within a short time, this generally improves cost efficiency and lowers total cost.

2. Disadvantages of Crowdsourcing

2-1. Quality Variability

A major benefit of using many workers over a short period is the ability to produce a large amount of training data. However, it is difficult to keep annotation and training data quality consistent across workers in such situations. As the number of workers grows, it becomes harder to ensure that instructions, points of caution, specification changes, and ways of handling edge cases reach every worker and are fully understood. Confirming whether workers have understood these instructions is also an important part of managing an annotation project, but it too becomes harder as headcount grows, and the risk of inconsistent quality rises.

In addition, crowdsourcing contracts are usually made per project, which makes it difficult to build loyalty to the ordering company. Whatever measures are taken, workers' commitment to quality tends to be low.
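One common way to check whether workers have understood the instructions is to seed the task with a few check items whose correct labels are already known and to compare each worker's answers against them. The following is a minimal sketch of that idea, not a description of any particular vendor's process. The item IDs, labels, the `worker_accuracy` and `flag_workers` helpers, and the 0.8 threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical check items with known correct labels (assumed for illustration).
GOLD = {"img_001": "cat", "img_002": "dog", "img_003": "cat"}


def worker_accuracy(submissions, gold):
    """Return {worker_id: accuracy on check items} for workers who answered any."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for worker_id, item_id, label in submissions:
        if item_id not in gold:
            continue  # ordinary work item, no known answer to compare against
        total[worker_id] += 1
        if label == gold[item_id]:
            correct[worker_id] += 1
    return {w: correct[w] / total[w] for w in total}


def flag_workers(accuracy, threshold=0.8):
    """Return sorted worker IDs whose check-item accuracy is below the threshold."""
    return sorted(w for w, acc in accuracy.items() if acc < threshold)


# Hypothetical submissions as (worker_id, item_id, label) tuples.
submissions = [
    ("w1", "img_001", "cat"), ("w1", "img_002", "dog"), ("w1", "img_003", "cat"),
    ("w2", "img_001", "dog"), ("w2", "img_002", "dog"), ("w2", "img_003", "dog"),
    ("w1", "img_100", "bird"),  # not a check item, so it is ignored
]

accuracy = worker_accuracy(submissions, GOLD)
print(accuracy)
print(flag_workers(accuracy))
```

Workers flagged this way are candidates for a follow-up explanation rather than automatic exclusion, since a low score can also mean that the instructions themselves are ambiguous.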


2-2. Security

Crowdsourcing workers often work from home, which is a concern when annotation tasks involve sensitive data that requires high confidentiality. Projects that require work to be done in a security room on the vendor's premises are also hard to accommodate. As for security education, workers are usually engaged on one-off contracts for each project, which makes it difficult to ensure security through continuous education and other people-focused measures. Relying solely on technical and physical safeguards has its limitations: without education, workers may lack security knowledge and awareness and commit violations without realizing it. This is a common occurrence.


2-3. Specific Domain Knowledge and Difficulty

Some data annotation is highly difficult and requires specific domain knowledge and expertise. The crowdsourcing worker pool contains few people with specialized knowledge of a given field. To ensure quality on difficult annotations and to build expertise and domain knowledge, it is crucial to keep the same workers on the project and invest in their training. In crowdsourcing, however, workers are often contracted for specific projects, which makes it difficult to retain them over the long term and continue the education that improves their skills.


2-4. Communication and Education

Effective communication with workers is essential for giving instructions and for providing thorough education on security and specific domain knowledge, as mentioned above. It matters even more in annotation work, where ambiguous cases must be judged.

In addition, specific domain knowledge contains many implicit elements, so text-based manuals and materials alone are often insufficient for education. It is more effective to demonstrate and explain while looking at the screen together. This requires real-time Q&A in meetings, but as the number of people grows, scheduling meetings becomes difficult. Education and instructions are then delayed, and in the end they may not be reflected in the deliverables.

At first glance, data annotation may seem like a simple task, and education and communication may seem unimportant as long as work instructions and procedures are carefully prepared. However, it may seem that way only to those who are already familiar with the annotation specifications and the specific domain knowledge. As work progresses, exceptions and edge cases inevitably arise, which makes such communication essential.
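Edge cases often show up as items on which workers disagree. As a rough illustration, the sketch below collects several workers' labels for each item, takes the most frequent label, and lists the items whose agreement ratio falls below a threshold so that they can be discussed with workers and reflected in the specification. The function names, sample data, and the 0.75 threshold are assumptions made for this example.

```python
from collections import Counter, defaultdict


def aggregate_labels(annotations):
    """Return {item_id: (most_frequent_label, agreement_ratio)}.

    annotations is an iterable of (item_id, worker_id, label) tuples.
    """
    by_item = defaultdict(list)
    for item_id, _worker_id, label in annotations:
        by_item[item_id].append(label)

    result = {}
    for item_id, labels in by_item.items():
        # most_common(1) returns the label with the highest count;
        # when counts are tied, the label seen first is returned.
        label, count = Counter(labels).most_common(1)[0]
        result[item_id] = (label, count / len(labels))
    return result


def edge_case_candidates(aggregated, min_agreement=0.75):
    """Return sorted item IDs whose agreement ratio is below min_agreement."""
    return sorted(
        item_id
        for item_id, (_label, ratio) in aggregated.items()
        if ratio < min_agreement
    )


# Hypothetical annotations: three workers label two sentences.
annotations = [
    ("s1", "w1", "positive"), ("s1", "w2", "positive"), ("s1", "w3", "positive"),
    ("s2", "w1", "positive"), ("s2", "w2", "negative"), ("s2", "w3", "neutral"),
]

aggregated = aggregate_labels(annotations)
print(aggregated["s1"])
print(edge_case_candidates(aggregated))
```

Low agreement does not prove that an item is an edge case, but it is a practical signal for deciding which items to bring to a Q&A session first.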


3. Summary

We hope this article has clarified the benefits and drawbacks of crowdsourcing. With these in mind, crowdsourcing is without doubt an effective means for annotation and data collection.
However, it is important to use it appropriately according to the situation, including the purpose and phase of AI development, the annotation specifications and desired quality level, the characteristics of the work, and security requirements. The points above are general and may not apply in every case. Many annotation vendors overcome the drawbacks and maximize the benefits through continuous improvement and ingenuity while using crowdsourcing.

Therefore, when considering outsourcing data annotation or collection, we recommend not relying on emails and inquiry forms alone. Hold a meeting with the vendor to ask about the points covered here, and make your decision after receiving an estimate.

4. Human Science's Data Annotation Outsourcing Service

Extensive track record of creating 48 million pieces of training data

At Human Science, we are involved in AI model development projects in various industries such as natural language processing, medical support, automotive, IT, manufacturing, and construction. Through direct transactions with numerous companies including GAFAM, we have provided over 48 million pieces of high-quality training data. We handle data labeling projects in any industry, from small-scale projects to large-scale projects with 150 data annotators. If your company is interested in implementing AI models but unsure where to start, please consult with us.

Resource Management without Using Crowdsourcing

At Human Science, we do not use crowdsourcing and instead directly contract with workers to manage projects. We carefully assess each member's practical experience and evaluations from previous projects to form a team that can perform to the best of their abilities.

Use of the latest data labeling tools

At Human Science, one of the data labeling tools we use is AnnoFab, which allows customers to check progress and provide feedback in the cloud even while a project is underway. Because work data cannot be saved to local machines, security is also taken into account.

In-house security room

At Human Science, we have a security room that meets the ISMS standards in our Shinjuku office. This allows us to provide on-site support for highly confidential projects and ensure security. We consider confidentiality to be extremely important for all projects at our company. We continuously provide security education to our staff and pay close attention to the handling of information and data, even for remote projects.



 

 

 
