RAG-Anything is an open-source, all-in-one multimodal RAG framework from HKUDS. Built on LightRAG, it extends RAG to mixed content such as PDF, Office documents, images, tables, and formulas. It suits enterprise knowledge bases where "documents are not plain text": financial reports, research reports, product manuals, papers, contract attachments, and illustrated materials. In pre-sales it can be described as a "multimodal knowledge base engine for complex documents", but any rollout should first verify parsing quality, chart understanding, retrieval accuracy, permission isolation, and cost.

1. Project Overview

| Dimension | Information |
| --- | --- |
| Project | HKUDS/RAG-Anything |
| Positioning | All-in-one multimodal document processing RAG system |
| Technical basis | Built on LightRAG; integrates parsers such as MinerU, Docling, and PaddleOCR |
| Main language | Python |
| Open-source license | MIT |
| Created | 2025-06-06 |
| Last pushed | 2026-06-15 |
| GitHub popularity | As queried on 2026-06-30: about 21.7k stars, 2.5k forks, 106 open issues |
| Package installation | pip install raganything |

RAG-Anything tries to address a key shortcoming of traditional RAG: real enterprise documents are usually not pure text but a mixture of paragraphs, images, charts, tables, formulas, and complex layouts. The traditional "extract text -> split into chunks -> vector retrieval" pipeline loses many structural relationships. RAG-Anything instead splits documents by modality and organizes them using a multimodal knowledge graph and hybrid retrieval.

Official architecture diagram:

[Figure: RAG-Anything Framework]

2. What Does It Mainly Do?

| Capability | Description | Value to customers |
| --- | --- | --- |
| Multi-format document parsing | Supports PDF, Office, images, text, etc. | Historical enterprise data need not be manually converted into clean text first |
| Multimodal content processing | Processes text, images, tables, formulas, and generic content separately | Can answer questions about information in charts, formulas, and tables |
| Multimodal knowledge graph | Extracts entities and cross-modal relationships; retains the document hierarchy | Expresses "which chapter a chart belongs to" better than a plain vector store |
| Hybrid retrieval | Combines vector retrieval with graph relationships | Easier to recall relevant context for complex questions |
| Direct content_list insertion | Can take output from an external parser and skip built-in parsing | Suits integration with a customer's existing OCR/layout parsing system |
| Configurable parser | MinerU, Docling, PaddleOCR, etc. | Different document types can use a more suitable parsing route |
| VLM-enhanced query | A vision model can be brought in when the document contains images | Suits interpretation of drawings, screenshots, flowcharts, and reports |

3. Applicable Scenarios

| Scenario | Fit | Example |
| --- | --- | --- |
| Q&A over complex enterprise documents | High | Product manuals, rules and regulations, bidding documents, operation guides |
| Financial/advisory report analysis | High | Annual reports and research reports mixing tables, charts, and appendices |
| Research paper assistant | High | Formulas, experimental tables, illustrations, and references in papers |
| Industrial knowledge base | Medium-High | Equipment manuals, maintenance diagrams, flowcharts, parameter tables |
| Legal/contract knowledge base | Medium | Contract text, schedules, and scanned copies; permissions and accuracy need extra attention |
| Simple FAQ Q&A | Medium-Low | Plain-text FAQs are lighter to handle with ordinary RAG |

The best pre-sales opening is: "What customers really find difficult is not question answering itself, but reliably turning complex documents into searchable, traceable, and citable knowledge structures."

4. Less Suitable Scenarios

| Poor-fit situation | Reason |
| --- | --- |
| Only plain-text Markdown/FAQ content is handled | Ordinary RAG is simpler and cheaper |
| Strict real-time, low-latency requirements | Multimodal parsing, VLM analysis, and knowledge graph construction add processing time |
| Document permissions are complex but no permission system exists | The project is a framework; enterprise-grade permission isolation must be implemented at the application layer |
| Answers must be 100% compliant and auditable | Citations, manual review, evaluation, and anti-hallucination policies are still needed |
| Scan quality is very poor | OCR/layout analysis quality becomes the bottleneck |

5. Architecture and Workflow

A typical RAG-Anything pipeline can be summarized as:

  1. Document parsing: use MinerU, Docling, or PaddleOCR to break PDF/Office/image files into structured content.
  2. Content classification: divide the content into text, image, table, equation, generic, and other types.
  3. Modality processing: images go through visual analysis, tables through structured interpretation, and formulas keep their LaTeX plus a semantic description.
  4. Graph construction: extract entities and establish relationships between text and charts, chapters and elements, tables and indicators.
  5. Hybrid retrieval: at query time, combine vector similarity with graph relationships to return a more complete context.
  6. LLM generation: generate answers from the retrieved context, which can include multimodal information.

The pre-sales highlight of this architecture is that it does not simply OCR images into text but tries to preserve the relationships between elements, which is critical for complex customer documents.
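The six steps above can be illustrated with a toy, dependency-free sketch. It is not RAG-Anything's implementation: the `ContentItem` class, the stubbed modality handlers, the token-overlap score (a stand-in for real embedding similarity), and the section-based "graph" are all hypothetical simplifications, meant only to show how modality routing, relationship building, and hybrid retrieval fit together.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ContentItem:
    """Hypothetical parsed element; real parsers emit richer records."""
    item_id: str
    kind: str       # "text", "image", "table", or "equation"
    section: str    # chapter/section the element belongs to
    content: str    # raw text, caption, table body, or LaTeX


def describe(item: ContentItem) -> str:
    """Step 3: route each modality to its own (stubbed) processor."""
    if item.kind == "image":
        return f"image showing {item.content}"      # a VLM call would go here
    if item.kind == "table":
        return f"table with data: {item.content}"   # structured interpretation
    if item.kind == "equation":
        return f"formula {item.content}"            # keep LaTeX plus description
    return item.content


def build_graph(items: list[ContentItem]) -> dict[str, set[str]]:
    """Step 4: link elements sharing a section (stand-in for relation extraction)."""
    by_section = defaultdict(list)
    for item in items:
        by_section[item.section].append(item.item_id)
    graph = defaultdict(set)
    for ids in by_section.values():
        for a in ids:
            graph[a].update(b for b in ids if b != a)
    return graph


def overlap(query: str, text: str) -> float:
    """Toy similarity: share of query tokens that appear in the text."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0


def hybrid_retrieve(query, items, graph, top_k=1):
    """Step 5: take the best-scoring items, then add their graph neighbours."""
    texts = {i.item_id: describe(i) for i in items}
    ranked = sorted(texts, key=lambda i: overlap(query, texts[i]), reverse=True)
    expanded = list(ranked[:top_k])
    for hit in ranked[:top_k]:
        for neighbour in sorted(graph[hit]):
            if neighbour not in expanded:
                expanded.append(neighbour)
    return [(i, texts[i]) for i in expanded]


items = [
    ContentItem("t1", "text", "3.2 Revenue", "revenue grew steadily over three years"),
    ContentItem("tb1", "table", "3.2 Revenue", "2021: 10, 2022: 14, 2023: 21"),
    ContentItem("t2", "text", "5.1 Risks", "supply chain risks remain"),
]
graph = build_graph(items)
context = hybrid_retrieve("how did revenue grow", items, graph)
```

In this example the table shares no words with the query, so similarity alone would not rank it; it reaches the context only because it sits in the same section as the matching paragraph. That is the intuition behind combining vector similarity with graph relationships.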

6. How to Use

Installation:

pip install raganything
pip install 'raganything[all]'

Processing Office documents additionally requires LibreOffice, for example on macOS with Homebrew:

brew install --cask libreoffice

Source installation:

git clone https://github.com/HKUDS/RAG-Anything.git
cd RAG-Anything
uv sync
uv run python examples/raganything_example.py --help

Minimal usage sketch:

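from raganything import RAGAnything, RAGAnythingConfig

# Parsing and modality settings; LLM, vision, and embedding functions are supplied separately.
config = RAGAnythingConfig(
    working_dir="./rag_storage",       # where indexes and intermediate files are stored
    parser="mineru",                   # parser backend
    parse_method="auto",               # parsing mode
    enable_image_processing=True,      # process images
    enable_table_processing=True,      # process tables
    enable_equation_processing=True,   # process formulas
)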

The official example requires configuring LLM, vision model, and embedding functions, and then either calling process_document_complete or inserting a content_list directly. The content_list route is valuable for system integration: an enterprise can first use its own OCR/parsing service to produce structured content and then hand it to RAG-Anything for multimodal RAG.
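To illustrate the content_list route, the sketch below builds a small list from a customer's own parser output and checks it before hand-off. The field names (`type`, `text`, `table_body`, `table_caption`, `latex`, `img_path`, `page_idx`) are assumed from the project's README examples and should be verified against the installed version; `REQUIRED_FIELDS` and `validate_content_list` are hypothetical helpers, not part of the library.

```python
# Assumed minimal fields per element type; verify against the RAG-Anything docs.
REQUIRED_FIELDS = {
    "text": {"text"},
    "image": {"img_path"},
    "table": {"table_body"},
    "equation": {"latex"},
}


def validate_content_list(content_list):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for idx, item in enumerate(content_list):
        kind = item.get("type")
        if kind not in REQUIRED_FIELDS:
            problems.append(f"item {idx}: unknown type {kind!r}")
            continue
        missing = REQUIRED_FIELDS[kind] - set(item)
        if missing:
            problems.append(f"item {idx}: missing {sorted(missing)}")
        if not isinstance(item.get("page_idx"), int):
            problems.append(f"item {idx}: page_idx should be an int")
    return problems


# Output of an existing OCR/layout service, mapped into the assumed shape.
content_list = [
    {"type": "text", "text": "Q3 revenue rose 12% year over year.", "page_idx": 0},
    {
        "type": "table",
        "table_body": "| Quarter | Revenue |\n|---|---|\n| Q3 | 112 |",
        "table_caption": ["Table 1: Quarterly revenue"],
        "page_idx": 1,
    },
    {"type": "equation", "latex": r"g = \frac{R_t - R_{t-1}}{R_{t-1}}", "page_idx": 2},
]

problems = validate_content_list(content_list)
# If the list is clean, it would then be passed to the framework's
# content_list insertion call (omitted here; check the official docs
# for the exact method name and arguments).
```

Validating at the boundary keeps parser-side mistakes (a missing page_idx, an unknown element type) from silently degrading citation and retrieval quality later.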

7. Pre-Sales Talking Points

One-sentence positioning:

"RAG-Anything is a multimodal RAG framework for complex enterprise documents, which can integrate PDF, Office, pictures, tables and formulas into knowledge base retrieval and question-and-answer."

Value Mapping:

| Customer pain point | Talking point |
| --- | --- |
| Documents contain many tables and images that an ordinary knowledge base cannot answer from | RAG-Anything treats tables, images, and formulas as first-class content |
| Structure is lost after traditional OCR | It retains chapter hierarchy, element relationships, and cross-modal associations |
| Many document formats are mixed | Supports routes for PDF, Office, images, TXT/MD, and other formats |
| A document parsing system already exists | It can be plugged in directly through content_list without overturning the existing architecture |
| Answers need to be more explainable | Metadata such as the graph and page_idx helps cite the original location |

8. Demo/PoC Suggestions

Use real customer documents rather than public demo documents. The PoC should cover three types of material:

| Material | Test questions |
| --- | --- |
| Financial report/research report with charts | "What is the trend of a given indicator? Which year in the chart changed the most?" |
| Product manual/equipment manual | "What should I do when a given error code appears? On which page is the relevant illustration?" |
| Paper or technical white paper with formulas | "What do the variables in the formula mean? What does the experimental table show?" |

PoC indicators:

| Indicator | Description |
| --- | --- |
| Parsing success rate | Whether a document is fully broken down into text, tables, images, and formulas |
| Chart Q&A accuracy | Proportion of chart/table questions answered correctly |
| Citation traceability | Whether the page number, chapter, and element can be located |
| Build time | Parsing and ingestion time per hundred pages |
| Query latency | Average response time of hybrid queries |
| Manual correction volume | Proportion of tables/formulas/OCR output requiring manual correction |
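The indicators above reduce to simple ratios and averages once PoC results are logged per document and per question. The sketch below shows one possible way to compute them; the `DocResult` and `QueryResult` records and the sample numbers are hypothetical and purely illustrative, not measurements of RAG-Anything.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DocResult:
    """Hypothetical per-document PoC record."""
    pages: int
    build_seconds: float      # parsing plus ingestion time
    elements_total: int       # tables/formulas/OCR blocks extracted
    elements_corrected: int   # of those, how many needed manual fixes
    parsed_ok: bool           # text, tables, images, formulas all extracted


@dataclass
class QueryResult:
    """Hypothetical per-question PoC record."""
    correct: bool
    traceable: bool           # page/chapter/element could be located
    latency_seconds: float


def poc_report(docs, queries):
    """Aggregate the six PoC indicators from raw records."""
    total_pages = sum(d.pages for d in docs)
    total_elements = sum(d.elements_total for d in docs)
    return {
        "parsing_success_rate": sum(d.parsed_ok for d in docs) / len(docs),
        "qa_accuracy": sum(q.correct for q in queries) / len(queries),
        "traceability_rate": sum(q.traceable for q in queries) / len(queries),
        "build_seconds_per_100_pages": 100 * sum(d.build_seconds for d in docs) / total_pages,
        "avg_query_latency_seconds": mean(q.latency_seconds for q in queries),
        "manual_correction_ratio": sum(d.elements_corrected for d in docs) / total_elements,
    }


# Made-up sample data for illustration only.
docs = [
    DocResult(pages=120, build_seconds=600.0, elements_total=80, elements_corrected=8, parsed_ok=True),
    DocResult(pages=80, build_seconds=400.0, elements_total=20, elements_corrected=2, parsed_ok=False),
]
queries = [
    QueryResult(correct=True, traceable=True, latency_seconds=2.0),
    QueryResult(correct=False, traceable=True, latency_seconds=4.0),
]
report = poc_report(docs, queries)
```

Running the same report for an ordinary RAG baseline on the same documents and questions gives a like-for-like comparison.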

9. Frequently Asked Customer Questions

| Question | Answer |
| --- | --- |
| How does it differ from an ordinary vector database? | A vector database mainly handles similarity retrieval; RAG-Anything focuses on complex document parsing, multimodal content understanding, and graph-based organization of relationships. |
| Can it process scanned PDFs? | Yes, through the OCR route, but results depend on scan quality, language, layout, and parser capability. |
| Do I have to use OpenAI? | The official examples use OpenAI-style functions, but the framework accepts custom LLM, vision model, and embedding functions. |
| Can it be deployed privately? | Yes, but you need to prepare local models, the OCR/parsing environment, storage, queues, permissions, and service wrapping. |
| Can it guarantee that answers contain no hallucinations? | Not by the framework alone. Citations, confidence scores, evaluation sets, refusal strategies, and manual review processes are required. |
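A refusal (rejection) strategy of the kind mentioned in the last answer can be sketched at the application layer. This is a minimal illustration under assumptions, not a RAG-Anything feature: the `Retrieved` record, the score threshold, and the `fake_llm` stand-in are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Retrieved:
    """Hypothetical retrieval hit with a relevance score in [0, 1]."""
    text: str
    page_idx: int
    score: float


def answer_or_refuse(question, hits, generate, min_score=0.5, min_hits=1):
    """Call the generator only when enough well-scored evidence exists."""
    evidence = [h for h in hits if h.score >= min_score]
    if len(evidence) < min_hits:
        return {"answer": None, "citations": [], "refused": True}
    answer = generate(question, [h.text for h in evidence])
    pages = sorted({h.page_idx for h in evidence})
    return {"answer": answer, "citations": pages, "refused": False}


def fake_llm(question, contexts):
    """Stand-in for a real LLM call; simply echoes the first context."""
    return f"Based on the document: {contexts[0]}"


hits = [
    Retrieved("Error E42 means the filter is clogged.", page_idx=17, score=0.82),
    Retrieved("Warranty terms are listed in Appendix B.", page_idx=40, score=0.21),
]
grounded = answer_or_refuse("What does E42 mean?", hits, fake_llm)
refused = answer_or_refuse("What is the CEO's salary?", hits[1:], fake_llm)
```

The threshold should be tuned on the customer's evaluation set, since a value that is too strict refuses answerable questions and one that is too loose lets weakly supported answers through.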

10. Risks and Considerations

  1. Parsing quality sets the upper limit: complex tables, tables spanning pages, scanned copies, and handwritten content significantly degrade results.
  2. Multimodal processing is costly: VLM analysis of images and charts can add cost and latency.
  3. Engineering integration still requires investment: permissions, auditing, tenant isolation, incremental updates, and failure retries are not provided out of the box as they would be in an enterprise system.
  4. Evaluation is critical: build a standard question set for the customer's business; otherwise it is hard to judge whether it really outperforms ordinary RAG.
  5. The license is friendly: MIT is permissive for commercial integration, but the licenses of the connected models, parsers, and data still need to be confirmed.
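Item 3 leaves permission and tenant isolation to the application layer. One common pattern, sketched here under assumptions, is to attach access-control metadata to each indexed chunk and filter retrieved results before they reach the LLM. The `Chunk` and `User` classes and the group-based rule are hypothetical; RAG-Anything does not define them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    """Hypothetical retrieved chunk carrying access-control metadata."""
    text: str
    tenant: str
    allowed_groups: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class User:
    name: str
    tenant: str
    groups: frozenset


def visible_chunks(user, chunks):
    """Keep chunks from the user's tenant that share at least one group."""
    return [
        c for c in chunks
        if c.tenant == user.tenant and (c.allowed_groups & user.groups)
    ]


chunks = [
    Chunk("FY2023 salary bands", "acme", frozenset({"hr"})),
    Chunk("Pump maintenance steps", "acme", frozenset({"ops", "hr"})),
    Chunk("Competitor contract", "globex", frozenset({"ops"})),
]
alice = User("alice", "acme", frozenset({"ops"}))
context = visible_chunks(alice, chunks)
```

Filtering after retrieval is the simplest option but can shrink the usable context; a production system would more likely push the tenant and group conditions into the retrieval query itself or keep separate storage per tenant.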

11. My Pre-Sales Judgment

Among this batch of projects, RAG-Anything is a direction worth long-term pre-sales attention. It targets a real pain point of enterprise knowledge bases: documents are not clean text but a mixture of complex layouts and multimodal information. Ordinary RAG tends to miss tables and images in such scenarios. The value of RAG-Anything lies in providing a more complete processing chain.

It is well suited to solution presentations and PoCs for "intelligent Q&A over complex documents", "multimodal knowledge base", and "R&D/financial/manufacturing document assistant". Do not demonstrate only plain-text Q&A, because that does not show the difference. What really impresses customers is taking a real document with charts, formulas, and appendices and letting the system answer questions that ordinary RAG cannot answer well.

12. References

- GitHub: https://github.com/HKUDS/RAG-Anything
- Paper: https://arxiv.org/abs/2510.12323
- PyPI: https://pypi.org/project/raganything/
- LightRAG: https://github.com/HKUDS/LightRAG
- MinerU: https://github.com/opendatalab/MinerU