Introducing Methods to Improve the Accuracy of RAG (Retrieval-Augmented Generation)! The Importance of Document Maintenance

Hello! I am K, a consultant. I usually handle manual creation and improvement projects for companies in the manufacturing and pharmaceutical industries.
Today, I would like to talk about "RAG (Retrieval-Augmented Generation)," which we have been hearing about frequently from our clients recently. From the perspective of a manual production expert, I would like to introduce methods to improve the accuracy of RAG.

Table of Contents

1. What is RAG?
1-1. Uses of RAG
2. Essential Efforts to Improve Accuracy for Effective Use of RAG
3. Methods to Improve RAG Accuracy
4. A Case Study on Document Preparation with an Eye on Building RAG
5. For consultations on manual preparation, contact Human Science

1. What is RAG?

You have probably been hearing the term "RAG (Retrieval-Augmented Generation)" often lately. Simply put, it is a mechanism in which generative AI, when creating an answer, retrieves the necessary information from external sources prepared in advance (such as internal manuals and knowledge bases) and generates its response based on that information. By incorporating external information, it can produce more reliable answers than generative AI alone.
Regular generative AI answers user questions using only the data it was trained on. As a result, its information may be outdated, and for content it was never trained on it may give answers that differ from the facts (hallucination).
In contrast, RAG searches for relevant information at the moment it receives a question and refers to that content when answering. As a result, users can obtain more accurate and up-to-date information.
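
The retrieve-then-generate flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a production design: the sample documents, the word-overlap ranking, and the function names (`retrieve`, `build_prompt`) are all hypothetical, and the call to an actual generative model is omitted.

```python
import re

# Hypothetical in-memory knowledge base; a real system would load internal manuals.
DOCUMENTS = [
    "To reset a password, open Settings and choose Reset Password.",
    "Expense reports must be submitted by the fifth business day of each month.",
    "New employees receive a laptop on their first day.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, passages):
    """Assemble the prompt that would be sent to a generative model."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

question = "How do I reset my password?"
passages = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, passages)
print(prompt)
```

The point of the sketch is the order of operations: the search happens first, and the model only sees the passages that the search returned. If the search returns the wrong passage, the answer will be wrong no matter how capable the model is.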

1-1. Uses of RAG

So, in what situations is RAG used?

● Customer Support
Using RAG in chatbots employed for corporate customer support enables them to search for necessary information from internal manuals and FAQs to provide answers, making it possible to offer more accurate responses.

● Internal Operations
RAG can extract information from employee manuals and provide only what is needed, when it is needed. For example, when a new employee asks a chatbot that uses RAG (a "RAG chatbot"), "Can you tell me the procedure for this task?", it searches internal documents for the relevant information and provides an answer.

● Marketing and Market Research
RAG can search web articles and news for market trends and movements, summarize them, and convey them clearly to marketing personnel.

In this way, RAG is drawing attention as a mechanism that gathers information for a wide range of tasks and enables generative AI to give well-grounded answers.

2. Essential Efforts to Improve Accuracy for Effective Use of RAG

RAG is a convenient system, but introducing the technology is not enough on its own. It is very important to work on improving its accuracy in parallel. This is because, depending on the quality, structure, and processing method of the information RAG searches, it may not be able to provide appropriate answers to users.

For example, if the information retrieved from external sources is actually unrelated to the user's question, the generated answer will be off the mark. Additionally, if only a portion of the information is extracted to generate the response, the original meaning may be conveyed incorrectly. Such inaccurate answers can confuse users, leading to risks such as operational judgment errors and delays in work. Furthermore, repeated incorrect answers can erode trust in the RAG chatbot itself, to the point where it may eventually stop being used.

Therefore, to operate a RAG chatbot at a practical level, it is essential to work on improving accuracy so that accurate and reliable information can be provided consistently.

3. Methods to Improve RAG Accuracy

As mentioned earlier, to maximize the strengths of RAG, it is necessary to organize the input information and improve the search and generation processes. Here, we introduce six effective methods to enhance the accuracy of RAG.

● Organize the Documents to be Input
RAG passes the documents it retrieves to the generative AI so that the AI can understand and use them. However, if the source documents are disorganized or their expressions are ambiguous, the information cannot be used correctly. Therefore, it is important to first organize internal documents and manuals so that "anyone can understand their meaning."
For example, the following types of organization are effective.

・Make the subject and predicate clear
・Use bullet points to organize information
・Avoid ambiguous expressions (demonstratives like "this" and "that")
・Clearly distinguish types of information such as procedures, cautions, and supplements

By organizing in this way, the documents become easier to understand for both people and generative AI, enabling more effective use of RAG.
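
One of the points above, avoiding ambiguous demonstratives, can be checked mechanically to some extent. The following is a rough sketch, assuming English text and a hand-picked word list; the list `AMBIGUOUS_OPENERS` and the function `flag_ambiguous_sentences` are hypothetical names, and a check like this only points a human reviewer at candidate sentences rather than deciding anything by itself.

```python
import re

# Illustrative list only; tune it for the language and style of your manuals.
AMBIGUOUS_OPENERS = ("this", "that", "these", "those", "it")

def flag_ambiguous_sentences(text):
    """Return sentences whose first word is a demonstrative or bare pronoun."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    flagged = []
    for sentence in sentences:
        first_word = re.match(r"[A-Za-z]+", sentence)
        if first_word and first_word.group(0).lower() in AMBIGUOUS_OPENERS:
            flagged.append(sentence)
    return flagged

draft = "Press the red button. This stops it. The conveyor belt halts within two seconds."
flagged = flag_ambiguous_sentences(draft)
print(flagged)
```

Here the second sentence is flagged because a reader (or a retrieved chunk that contains only that sentence) cannot tell what "This" and "it" refer to. Rewriting it as "Pressing the red button stops the conveyor belt" removes the ambiguity.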

● Choose the Optimal Search Method
In RAG, "search" is a crucial process for selecting the information that forms the basis of the answers. Search methods include "keyword search," "vector search," and "hybrid search."

・Keyword Search:
Searches for documents that contain specific words. It is fast but does not consider semantic similarity.
・Vector Search:
Searches for documents based on semantic closeness. It matches documents with similar meanings even when the vocabulary differs.
・Hybrid Search:
Combines keyword search and vector search to improve search accuracy.

By selecting the optimal search method according to the purpose and characteristics of the target documents, you can obtain more relevant information, ultimately improving the accuracy of RAG.
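
To make the difference between the three search methods concrete, here is a small sketch that scores documents with a keyword score, a vector score, and a weighted blend of the two. Everything in it is an assumption for illustration: the two-dimensional vectors are written by hand (a real system would obtain them from an embedding model), and the weighting parameter `alpha` and the function names are hypothetical. Other ways of combining the two rankings exist.

```python
import math
import re

def keyword_score(query, text):
    """Fraction of query words that appear verbatim in the text."""
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    t = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(q & t) / len(q) if q else 0.0

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_score(query, query_vec, doc, alpha=0.5):
    """Weighted blend of keyword and vector scores (alpha is a tunable assumption)."""
    return (alpha * keyword_score(query, doc["text"])
            + (1 - alpha) * cosine(query_vec, doc["vec"]))

# Toy 2-dimensional vectors written by hand for illustration.
docs = [
    {"text": "Procedure for requesting paid leave", "vec": [0.9, 0.1]},
    {"text": "Vacation policy overview", "vec": [0.8, 0.2]},
    {"text": "Printer troubleshooting guide", "vec": [0.1, 0.9]},
]
query, query_vec = "paid leave request", [0.85, 0.15]

ranked = sorted(docs, key=lambda d: hybrid_score(query, query_vec, d), reverse=True)
for doc in ranked:
    print(doc["text"])
```

In this toy data, "Vacation policy overview" shares no words with the query, so keyword search alone cannot tell it apart from the printer guide. The vector score places it close to the query, so the hybrid ranking puts it second, behind the document that matches on both words and meaning.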

● Chunk Division
Instead of searching the entire document as is, dividing it into chunks of a certain size allows for searching at a finer granularity of information units. It is important to perform this chunk division appropriately according to the granularity of information and logical structure.
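
As one possible way to respect logical structure when chunking, the sketch below packs whole paragraphs into chunks up to a size limit instead of cutting at a fixed character count. The function name `chunk_by_paragraph`, the blank-line paragraph convention, and the `max_chars` value are assumptions for illustration; appropriate sizes depend on the documents and the search method.

```python
def chunk_by_paragraph(text, max_chars=200):
    """Pack whole paragraphs into chunks of at most max_chars where possible.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the limit, so close the chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

manual = (
    "Step 1: Open the application.\n\n"
    "Step 2: Enter your employee ID.\n\n"
    "Caution: Do not share your ID with others."
)
chunks = chunk_by_paragraph(manual, max_chars=70)
for i, chunk in enumerate(chunks, start=1):
    print(f"--- chunk {i} ---\n{chunk}")
```

Because the split happens only at paragraph boundaries, a step or a caution is never cut in half, which reduces the risk of a retrieved chunk conveying only part of the original meaning.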

● Preprocess Data Such as Figures and Tables
Because RAG mainly processes text, figures and tables saved as images generally cannot be recognized as they are. For documents with many figures and tables, it is important to convert them to a text format such as Markdown wherever possible and load them with their structure preserved.

・Write tables using Markdown table syntax
・Convert the contents of figures into text as captions or descriptions
・Convert arrows and flowcharts into simple bullet points

Writing the information out as text in this way makes it easier for generative AI to understand, which improves the quality of its responses.
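
The conversions listed above can be partly automated once the table or flow has been extracted as data. The following is a minimal sketch with hypothetical helper names (`to_markdown_table`, `flow_to_steps`) and made-up sample content; it assumes cell values do not contain the `|` character, which would need escaping.

```python
def to_markdown_table(headers, rows):
    """Render headers and rows as a Markdown table string."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)

def flow_to_steps(steps):
    """Render the ordered boxes of a flowchart as a numbered list."""
    return "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))

table = to_markdown_table(
    ["Error code", "Meaning"],
    [["E01", "Paper jam"], ["E02", "Low toner"]],
)
flow = flow_to_steps(["Receive request", "Manager approval", "Issue account"])
print(table)
print(flow)
```

Keeping the header row and the step order in the text is what preserves the structure: the relationship between "E01" and "Paper jam" survives even after the document is split into chunks.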

● Add Information to Improve Searchability
Attaching metadata and tags to documents is also effective for retrieving more accurate information during searches. Information such as the following makes it easier to narrow down search results and reduces the risk of irrelevant documents appearing in them.

・Document category (e.g., HR manual, sales procedures, FAQ, etc.)
・Update date
・Related keywords and terms
・Related departments and target users (e.g., for new employees, for system administrators)

● Continuously Improve
Initial setup alone does not give RAG sufficient accuracy. It is important to collect feedback through actual use, such as what kinds of questions users ask and which answers were inappropriate, and to keep making improvements.

・Analysis of causes for incorrect answers (search errors? document quality? prompt deficiencies?)
・Addition and updating of documents to be searched
・Review of chunk structure and search methods
・Improvement of prompts

By continuously making such improvements, the RAG chatbot will grow into a trusted chatbot firmly established in business operations.
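
One simple way to support this cycle is to record a cause for each answer that users rated poorly and then count the causes, so that effort goes to the most frequent problem first. The sketch below assumes a hypothetical review log in which a human reviewer has already assigned each cause; the field names and cause labels are illustrative.

```python
from collections import Counter

# Hypothetical review log: each entry is a poorly rated answer and the cause a reviewer assigned.
feedback_log = [
    {"question": "How do I claim travel expenses?", "cause": "search"},
    {"question": "Who approves overtime?", "cause": "document"},
    {"question": "Where is the leave form?", "cause": "search"},
    {"question": "What is the dress code?", "cause": "prompt"},
]

def summarize_causes(log):
    """Count poorly rated answers per cause, most frequent first."""
    return Counter(entry["cause"] for entry in log).most_common()

summary = summarize_causes(feedback_log)
for cause, count in summary:
    print(f"{cause}: {count}")
```

In this sample, search errors are the most common cause, which would suggest reviewing the chunk structure or search method before rewriting prompts.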

There are various ways to improve accuracy, but as specialists in manual production, we at Human Science believe that "organizing the documents to be read by AI" is a particularly important measure for enhancing the accuracy of RAG.

By properly organizing documents, not only does it become easier for generative AI to correctly understand the content, but it also becomes easier for people to comprehend. In fact, many RAG chatbots display links to the original documents so that users can verify the basis of the answers or investigate further. If the linked documents are difficult to understand, it may result in users feeling "In the end, I still didn't really understand."

That is precisely why it is very important to create documents that are easy not only for generative AI to understand but also for people to read: "documents that are friendly to both."

In the next chapter, we will introduce a case study of a company that actually worked on organizing internal manuals based on this approach, assuming they would be loaded into RAG.

4. A Case Study on Document Preparation with an Eye on Building RAG

Here, we introduce a case study of a financial company, Company A.

● Issues and Background
Company A had the following issues and background.

・Its in-house operation manuals were difficult to understand, necessary information was hard to find, and the formats were inconsistent
・It wanted to use generative AI chatbots in the future and to revise its manuals with that in mind

● Initiatives
At Human Science, we provided the following support.

・Create a model manual
We created it to balance ease of processing by generative AI and clarity for humans. We also emphasized comprehensive information coverage to make it fully usable as input for chatbots.
・Create a rulebook summarizing how to write manuals
Including points that affect the accuracy of RAG chatbots, we compiled writing rules so that anyone can easily create manuals.

● Future Outlook
At Company A, the manual and rulebook developed in this project will be rolled out to other departments, with the aim of promoting information organization and sharing throughout the entire company. There are also plans to load the developed manuals into a RAG chatbot built in-house and proceed with verification for operational use.

Company A is diligently working on organizing their manuals while anticipating the use of generative AI. As we collaborate closely with Company A on this project, we have come to realize that alongside considering generative AI tools, it is crucial to properly prepare the documents that serve as input for these tools.

5. For consultations on manual preparation, contact Human Science

Human Science provides one-stop support from the creation of Japanese manuals to English and multilingual translations. We have been producing manuals since 1985 and have a long track record. If you have any needs like the following, please feel free to contact us.

- Want to improve existing Japanese or English manuals to make them easier to understand
- Considering creating English manuals and want to proceed step-by-step from Japanese manuals
- Want to translate (into various languages) Japanese manuals created in-house and utilize them

Feature 1: Extensive Manual Production Experience with Major and Global Companies
Human Science has accumulated a wealth of manual production experience across various fields, primarily in the manufacturing and IT industries. We have worked with renowned companies such as "Docomo Technology Inc.", "Yahoo Inc.", and "Yamaha Corporation" as our clients.

Feature 2: From Research and Analysis by Experienced Consultants to Output
Business manuals are created by Human Science's experienced consultants, who propose clearer and more effective manuals based on their extensive experience and the materials you provide. We can also create manuals when the information has not yet been organized. In that case, the assigned consultant conducts interviews to create the optimal manual.

Feature 3: Emphasis Not Only on Manual Creation but Also on Support for Implementation
Human Science focuses not only on manual creation but also on the important stage of "implementation." Even after the manual is created, we will support its implementation through regular updates and manual creation seminars. Through a variety of measures, we will support the effective use of manuals in the field.

Thank you for reading until the end.
I hope this blog serves as helpful tips for creating easy-to-understand manuals.