1. Project/Product Overview
| Dimension | Information |
|---|---|
| Project name | QAnything(Question and Answer based on Anything) |
| Developer | Netease Youdao (NetEase Youdao) |
| Open Source Protocol | AGPL-3.0 (⚠Strong Copyleft, commercial need special attention) |
| Main Language | Python |
| GitHub Stars | 14,025 |
| Forks | 1,347 |
| Commits | 754 |
| Created | 2024-01-03 (approximately 2.5 years) |
| Recent Code Push | 2025-03-24 (⚠There has been no substantial code update for 15 months) |
| Latest Release | v2.0.0(2024-08-23, about 22 months ago) |
| Open Issues | 403 (a large number of unclosed issues, low maintenance activity) |
| Default Branch | qanything-v2 |
| official website | https://qanything.ai |
| Online Experience | https://qanything.ai |
| Enterprise Edition | QAnything Enterprise Edition (closed source, commercial license, Turbo/Plus/Long/Max multi-size model available) |
| Core Components | BCEmbedding(Embedding Rerank, Independent Open Source Project Apache-2.0) |
| Community | WeChat Community 3,000 People, HuggingFace Downloads 1,025,440 |
| Business Contact | 010-82558901 / AIcloud_Business@corp.youdao.com |
2. What does it mostly do?
The core of the QAnything is a complete document parsing → vector indexing → two-stage retrieval → LLM generation pipeline. Users only need to upload files to obtain accurate answers based on document content.
2.1 Document Analysis Ability (Youdao Core Accumulation)
| Parse Dimension | Capability Description |
|---|---|
| Supported formats | PDF, Word(docx), PPT(pptx), XLS(xlsx), Markdown(md), EML (email), TXT, Image (jpg/jpeg/png), CSV, HTML |
| PDF Table Parsing | v2.0 Rewrite Logic to Recognize Cross-Page Table Structure, Row-Column Layout, and Automatic Header Extraction; Tables with embedded text columns can also be correctly recognized instead of being processed as plain text |
| Column/multi-column layouts | Identify double-column or multi-column layouts, sort text blocks by reading habits; cross-page text to ensure that they belong to the same chunk |
| Image Extraction | Images in PDF are completely retained, embedded in the corresponding chunk, and "Answer with Image" is supported |
| Subtitle attribution | The text under a specific subtitle is assigned to the same chunk first. If it is too long, repeat the title before the new chunk to maintain semantic coherence |
| Metadata Embedding | metadata information is embedded in both the retrieval and Q & A phases to improve retrieval accuracy |
| Chunk Visualization | The front end can preview and manually edit the content of each chunk online and take effect in real time |
2.2 OCR Capability
The technical accumulation of Youdao Translation/OCR is the core barrier that distinguishes QAnything from other domestic RAG schemes. For Chinese scanned PDF (image PDF), the OCR recognition accuracy of the QAnything is significantly better than that of the general OCR engine. In v2.0, OCR can be called separately as a separate service (HTTP).
115.00g Phase Retrieval (Core Architecture Differentiation)
QAnything uses the self-developed BCEmbedding model (Apache-2.0 open source) and consists of two components:
- One-stage Embedding(bce-embedding-base_v1) :MTEB has a comprehensive score of 59.43, significantly better than BGE-large-zh-v1.5(54.21) and M3E-base(53.54), especially in Chinese cross-language scenes.
-Phase II Rerank(bce-reranker-base_v1):Reranking score 60.06, better than BGE-reranker-large(59.69)
-Combination effect SOTA: In the LlamaIndex RAG evaluation, the combination of BCEmbedding BCReranker reaches the current best.
The core value of two-stage retrieval: the larger the amount of data in the knowledge base, the problem of "retrieval degradation" will appear in one-stage Embedding retrieval, and Rerank rearrangement can reverse the trend and realize "the more data, the better the effect".
2.4 Hybrid Search & Other Features
| Function | Description |
|---|---|
| hybrid search | BM25 (keyword) Embedding (semantic) two-way fusion |
| Online Search | Support External Network Search to Supplement Information Outside the Knowledge Base (VPN Required) |
| Quick Start Mode | Similar to Kimi, you can upload files without creating a knowledge base. |
| Fileless Conversation | Pure LLM Chat Mode Not Dependant on Knowledge Base |
| Retrieve-only mode | Only the retrieved document fragment is returned, without calling LLM |
| Custom Bot | Can bind knowledge base, customize role prompt, configure model parameters, and share with others |
| FAQ | Built-in FAQ matching engine |
| File traceability | The answer can be traced back to the specific location of the original document. Click to open it directly. |
| Web UI | Supports multiple dialog windows, saving Q & A records as pictures, and configuring API Key/Base URL/fragment size on the front end |
3. Applicable Scenario
| Scenario | Description | Typical Customer |
|---|---|---|
| Chinese Scanned Documents/Pictures PDF Q & A | Youdao OCR has accumulated a deep accumulation, and the recognition rate of Chinese Scanned Documents is the leading in the industry | Government Files, Legal Contracts, Financial Bills |
| Enterprise internal knowledge base | Employee handbooks, product documents, technical specifications, etc. upload and ask | Knowledge management for medium and large enterprises |
| offline/classified environment deployment | supports the installation and use of the whole network disconnection, and the data is not available in the LAN | military industry, government affairs, finance |
| Form-intensive document processing | PDF form parsing capability is a differentiating highlight | Financial report analysis, research reports, experimental data |
| Cross-language Document Q & A | High Accuracy of Cross-language Semantic Retrieval in Mixed Chinese and English Documents | Foreign Enterprise China Branch, Import and Export Trade |
| Enterprise Digital Employee | Sales Assistant, Customer Service Robot, Technical Consultant 7 × 24 Service | Enterprise Customer Service Center, IT Service Desk |
Report study/investment research, content summary, key point extraction, document question and answer, investment institutions, consulting companies.
4. Not quite the scene
| Scenario | Reason | Alternative Suggestions |
|---|---|---|
| Requires commercial License security | AGPL-3.0 requires open source derivative codes when providing services to the outside world, lawyers usually do not recommend direct use in production | Purchase enterprise license/exchange Apache-2.0 protocol (such as MaxKB) |
| Long-term maintenance and ecological importance | ⚠The project basically stagnated after March 2025, 403 Open Issues, no new Release for nearly 2 years | RAGFlow (update very active)/ MaxKB(20,000 Stars) |
| Complex Agent/Workflow Orchestration | QAnything is a straight pipeline of "Document → Question and Answer" without visual workflow orchestration | Dify / RAGFlow(v0.21 introduces Ingestion Pipeline) |
| Non-document knowledge management | Non-document scenarios such as knowledge graph and structured database query | GraphRAG / LlamaIndex |
| Requires GPU acceleration | Completely migrated from v2.0 to pure CPU, no longer provides native GPU inference | RAGFlow self-hosted LLM (with GPU support) |
| Large-scale concurrent production environment (open source version) | The open source version cannot perform parallel operations when uploading files, and the number and size of files are limited. | Enterprise Edition/RAGFlow/MaxKB |
5. Core Competence List
| Capability Category | Capability Item | Detailed Description |
|---|---|---|
| Document parsing | PDF parsing (including tables) | Self-developed parser to identify table structure/cross-page table/column layout/embedded picture |
| OCR Recognition | Youdao OCR Technology, PDF Recognition of Scanned Documents, Obvious Advantages of Chinese Scenes | |
| Multi-format support | PDF/Word/PPT/XLS/Markdown/EML/TXT/Picture/CSV/HTML | |
| Visual Chunk Editing | The front-end directly previews the contents of a chunk, supports manual editing, and takes effect in real time | |
| Metadata embedding | Both the retrieval phase and the Q & A phase carry metadata | |
| Retrieval | Embedding Retrieval | Self-developed BCEmbedding,MTEB Comprehensive 59.43, Chinese Scene SOTA |
| Rerank Reordering | Self-developed BCReranker with a score of 60.06 to solve large-scale retrieval degradation | |
| Mixed retrieval | BM25 keyword Embedding semantic two-way fusion | |
| Fragment Fusion Sort | Aggregates chunk fragments of single or double documents | |
| LLM | Multi-model access | Supports all models compatible with OpenAI APIs (Ollama, Tongyi Qiwen DashScope, etc.) |
| Front-end and Back-end Configuration | API Key/Base URL/Fragment Size/Number of Output Tokens/Number of Context Messages can be configured on the front-end | |
| Custom Bot | Configure model parameters, role prompt, and binding knowledge base independently for each Bot | |
| Q & A | Multiple Conversations | Support multiple conversation windows and save multiple sets of history records at the same time |
| File traceability | The answer can be traced back to the original document location and opened directly | |
| Retrieval-only mode | Only return results without calling LLM | |
| Fileless Conversations | Pure LLM Chat Mode | |
| Internet Search | Extranet Search Supplementary Knowledge | |
| Deployment | One-click Docker deployment | Start with the 'docker compose up -d' single-line command |
| CPU-only operation | v2.0 completely migrated to CPU,Mac/Linux/Win three-terminal unified | |
| Offline use | Support full network disconnection installation and operation | |
| Mirror slimming | Compressed from 18.94GB to 4.88GB(1/4) | |
| Independent service calls | Embed/Rerank/OCR/PDF parsing can be independently HTTP calls | |
| Enterprise Edition Extra | Large Model Customization | Turbo/Plus/Long/Max Available in Various Sizes |
| Large-scale support | The number and size of files is 10-100 times that of the open source version | |
| Parallel operation | Upload files in parallel with other operations | |
| Field landing | Fine tuning prompt to reduce illusion, multi-industry landing cases |
6. Architecture/deployment/integration approach
Overall Architecture
用户上传文档(PDF/Word/PPT/...)
│
▼
文档解析层(PDF Parser / OCR / 格式转换器)
│
▼
文本分块 + 元数据提取
│
▼
向量化索引(Embedding 服务 + Elasticsearch/Milvus 等)
│
▼
用户提问 → 一阶段 Embedding 检索(粗筛)
│
▼
二阶段 Rerank 重排序(精排)
│
▼
LLM 生成回答(OpenAI 兼容接口)
│
▼
返回答案 + 溯源引用
Deployment Mode
| Mode | Description |
|---|---|
| Docker Compose (recommended) | 'docker compose up -d' one-click start, support Linux, Mac, and Windows (no WSL required) |
| Pure Python | v1.4.2 supports the 'pip install' method, but is not recommended for production use. |
| Offline deployment | Docker images can be downloaded in advance and imported offline. The whole process can be run offline. |
Hardware Requirements
| Environment | Requirements |
|---|---|
| CPU | v2.0 pure CPU operation, 32GB memory recommended |
| Storage | Mirroring 4.88GB of knowledge base data |
| network | offline can run, network retrieval requires external network |
Model Integration
-LLM: all OpenAI API-compatible models (Ollama, Tongyi Qiwen DashScope, DeepSeek, GLM, etc.)
-Embedding Rerank: default BCEmbedding (self-developed, replaceable)
-Vector storage:Elasticsearch (built-in Chinese word segmentation IK)
API Support
Provides RESTful API, which can perform all operations such as file upload, knowledge base management, and Q & A. In v2.0, Embed, Rerank, OCR, and PDF parsing all support independent HTTP calls.
How to use #7.
Docker one-click deployment (recommended)
# 克隆项目
git clone https://github.com/netease-youdao/QAnything.git
cd QAnything
# 启动服务(根据操作系统选择 compose 文件)
# Linux
docker compose -f docker-compose-linux.yaml up -d
# Mac
docker compose -f docker-compose-mac.yaml up -d
# Windows
docker compose -f docker-compose-win.yaml up -d
# 访问 Web UI
# 浏览器打开 http://localhost:5052
Configure LLM
To set the page configuration in the Web UI frontend:
-'API_BASE': the API address of the LLM service (e. g. 'https://api.openai.com/v1' or Ollama's' http:// localhost:11434/v1')
-'API_KEY: API Key
-'MODEL': model name (e. g. gpt-4o, qwen-plus, deepseek-chat)
API call example
import requests
# 上传文件到知识库
url = "http://localhost:5052/api/local_doc_qa/upload_files"
files = {"files": open("contract.pdf", "rb")}
data = {"kb_id": "KB123456", "user_id": "user001"}
resp = requests.post(url, files=files, data=data)
# 问答
url = "http://localhost:5052/api/local_doc_qa/local_doc_chat"
payload = {
"question": "这份合同的关键条款是什么?",
"kb_ids": ["KB123456"],
"user_id": "user001"
}
resp = requests.post(url, json=payload)
print(resp.json()["response"])
Use BCEmbedding independent components
# BCEmbedding 是 Apache-2.0 协议,可单独使用
from BCEmbedding import EmbeddingModel, RerankerModel
# Embedding
embed_model = EmbeddingModel(model_name_or_path="maidalun1020/bce-embedding-base_v1")
embeddings = embed_model.encode(["什么是RAG?", "RAG是检索增强生成"])
# Rerank
reranker = RerankerModel(model_name_or_path="maidalun1020/bce-reranker-base_v1")
scores = reranker.compute_score(["什么是RAG?"], ["RAG是检索增强生成技术"])8. What can I say before sales
8.1 a sentence positioning
- * "QAnything is a document intelligent question and answer system produced by Netease Youdao-throw in the scanned Chinese document and give accurate answers in seconds. "**
8.2 customer pain points → solutions
| Customer pain points | QAnything solutions |
|---|---|
| "We have a large number of scanned PDF,OCR is not allowed, and the question-and-answer effect is poor" | Youdao OCR technology has accumulated deeply, and the recognition rate of Chinese scanned documents is the industry leader, which is the core differentiation advantage |
| "Company data cannot go to the cloud and must be deployed on the intranet" | The whole network is disconnected for installation and use, Docker is deployed with one click, and the data cannot go out of the LAN |
| "The larger the knowledge base, the more inaccurate the retrieval" | Two-stage retrieval (Embedding Rerank), the more data, the better the effect, BCEmbedding evaluation SOTA |
| "The table in PDF cannot be recognized" | v2.0 rewrites the table parsing logic, and the cross-page table, embedded table and header recognition have been optimized |
| "Can the open source solution agreement be commercially available?" | Enterprise Edition provides commercial license, Turbo/Plus/Long/Max models are available |
| "Can I run without GPU?" | v2.0 is completely migrated to pure CPU operation, Mac/Linux/Win all |
| "The deployment is too complicated, the team does not have ML engineers" | 'docker compose up -d' one-line command to start, out-of-the-box |
8.3 Differentiated Selling Points
vs RAGFlow(InfiniFlow open source):
-QAnything advantages: Youdao OCR accumulation → better PDF analysis of Chinese scanned documents/forms; BCEmbedding bilingual and cross-language SOTA; Pure CPU deployment is lighter
-RAGFlow advantages: more comprehensive DeepDoc parsing engine (support for layout recognition YOLOv8); V0.21 introduces Ingestion Pipeline that can be arranged; The update is very active (2025-2026 continuous high frequency iteration);Apache-2.0 protocol is more friendly
-Conclusion: If the customer's core requirement is "a large number of complex documents in various formats can be Pipeline by deep analysis", select RAGFlow; If it is "simple deployment of Chinese scanned documents/forms PDF Q & A", the QAnything is more accurate.
vs MaxKB(1Panel open source):
-QAnything advantages: stronger retrieval model (BCEmbedding SOTA vs MaxKB basic retrieval), deeper document analysis (especially OCR/table), commercial support in enterprise version
-MaxKB Advantages: Stars More (20,600), Protocol Friendly (GPL-3.0 with MaxKB EULA), Workflow Orchestration and MCP Tool Call, Much Higher Community Activity, Deep Integration with 1Panel Operation and Maintenance Ecology
-Conclusion: MaxKB is more suitable as a general platform Agent base; QAnything focus more on the "document → question and answer" path to the extreme.
vs Dify:
-QAnything is "Documentation Q & A Special Tool";Dify is "AI Application Development Platform"
-Dify has visual workflow orchestration, rich tools and plug-in ecology, but the document analysis depth is not as deep as QAnything.
-QAnything is suitable for customers who do not need complicated choreography and only need document questions and answers; Dify is suitable for enterprises that need custom AI applications.
- Core Differentiation Summary: Youdao OCR BCEmbedding Pure CPU Deployment Scan Friendly *
8.4 Customer Value Story Line
- Cut in *:"Your company has a lot of scanned contracts/files/reports, want to use AI Q & A but OCR effect is not good?"
- Resonance :"The universal OCR engine has a low recognition rate for Chinese scanned documents, the contents of the form are broken after page spanning, and the column layout is misread-these are common pits."
- Demo: Upload a Chinese scanned PDF contract (including forms), and ask "What is the liability clause for breach of contract?" -- QAnything accurately identify the scanned text, parse the forms, locate the answers, and trace the original text.
- Advanced : Build Enterprise Knowledge Base from Single Document → Batch Upload → Custom Bot (Sales Assistant/Legal Assistant/Technical Consultant) → API Integration into OA System
- Ends :"Netease has 20 years of OCR technology accumulation, 14,000 GitHub Stars,3000 WeChat community users-there is no stronger open source scheme on this subdivision track."
9. Frequently Asked Customer Questions
| Question | Answer |
|---|---|
| Can AGPL-3.0 agreements be commercially available? | This is probably the most critical question. AGPL-3.0 requirements: If your system provides external network services (such as SaaS), you must provide users with the complete source code of derivative works. For internal use only and no external service, open source is not required. If the customer needs to "provide external document question and answer service" and modify the QAnything source code, it must be open source. Suggestion: The use of pure intranet is not a big problem. If you do not want to undertake open source obligations or external services, purchase the enterprise version of the commercial license. |
| Is the project not maintained? | Objectively speaking, the code warehouse of the project has been basically stagnant since March 2025 (no new submissions have been made in 15 months). The latest Release v2.0.0 is nearly 2 years ago, and 403 Open Issues have not been processed. However, the v2.0 version itself has complete and stable functions, which is sufficient for the relatively mature requirement of "document question and answer. If customers need new features for continuous iteration, they need to be evaluated carefully. |
| Which is better than RAGFlow? | Look at the scene. QAnything OCR/scan processing is a strong point, and pure CPU deployment is lighter. The RAGFlow DeepDoc analysis engine is more comprehensive and updated more actively (new functions such as Ingestion Pipeline and Long-Context RAG continue to be added). If you value the depth of document analysis and retrieval accuracy, the RAGFlow ecology is currently stronger. If the scene happens to be a "Chinese scan question and answer", the QAnything is more right. |
| Which major models are supported? | All OpenAI API-compatible models can be accessed: OpenAI GPT series, Tongyi Qiwen (DashScope), DeepSeek, GLM, Ollama deployed local models, etc. The front-end directly configures API Key and Base URL without changing the code. |
| How many files can you handle? | The open source version has a limit (the official limit is not clear, but the enterprise version claims to be 10-100 times that of the open source version). In actual use, the knowledge base experience of hundreds of documents is good; more than thousands of recommended enterprise edition or evaluation performance. |
| Does GPU acceleration be supported? | Version 2.0 has been completely migrated to pure CPU and no longer supports GPU acceleration. This is a deliberate architectural choice-lowering the threshold for deployment, but at the expense of processing large-scale documents faster than GPU solutions. |
| Can I export or back up the knowledge base? | The knowledge base data is stored in the Elasticsearch and can be backed up through the snapshot API of ES. Q & A records and bot configurations are exportable through the API. |
| Is there a mobile terminal? | No official mobile App. The web UI is responsive and can be used in mobile browsers. The mobile terminal needs to develop its own docking API. |
10. PoC Recommendations
Recommended PoC Direction: Chinese Scanned PDF Q & A
This is the core differentiation scenario of QAnything, and it is recommended that PoC focus on this to maximize its irreplaceability.
| Phase | Content | Time | Output |
|---|---|---|---|
| 1. Environment preparation | One-click deployment of Docker and configuration of LLM API (such as Tongyi Thousand Questions or DeepSeek) | 0.5 days | QAnything instances that can be run |
| 2. Data Preparation | Collect 30-50 real documents from customers (it is recommended to include scanned PDF, PDF with forms, and double-column typesetting documents) | 0.5 days | Test Document Set |
| 3. Document storage | Batch upload documents, observe OCR analysis effect, and check chunk quality | 1 day | Indexed knowledge base |
| 4. Q & A verification | Design 20-30 test questions (covering: scanned text recognition, table data Q & A, cross-page content understanding, original source tracing), score one by one | 1 day | Accuracy report |
| 5. Competition Comparison | Run the same document set with RAGFlow/MaxKB to compare OCR recognition rate, table analysis and answer accuracy | 1 day | Comparative analysis report |
| 6. Integration Demonstration | Interface the customer's existing system (OA/customer service) through API to demonstrate the actual business process | 1 day | Demonstrable integration scheme |
Validation Metrics:
-OCR text recognition accuracy> 95% (Chinese scan)
-Complete extraction rate of table data> 90%
-End-to-end answer accuracy> 85% (based on document factual validation)
-Average response time <5 seconds (CPU-only environment)
-Answer traceability is accurate (the location of the cited document matches the answer)
PoC Note:
-Be sure to use the customer's own real documents, not clean typeset test documents.
-If the customer has concerns about the AGPL protocol, the PoC phase should be clear: the open source version is only used for technical verification, and the commercial use requires the enterprise version authorization.
-Inform customers in advance of the expected indexing speed of large-scale documents in CPU-only mode
11. Risks and Considerations
| Risk | Level | Description | Mitigation |
|---|---|---|---|
| AGPL-3.0 Agreement | 🔴The high | strong Copyleft protocol requires open source derivative code when providing network services to the outside world. Most corporate legal departments would object to direct use of AGPL open source in production systems. | Pure intranet use security; Purchase enterprise commercial license for external services; Or use Apache-2.0 BCEmbedding to build RAG system |
| Project Maintenance Stagnation | 🔴High | Last code push 2025-03-24(15 months ago), latest Release 2024-08-23 (nearly 2 years),403 Open Issues not processed. The project has essentially gone into maintenance hibernation. | Available if the current v2.0 features meet the requirements; if you want to continue to add new features, recommended RAGFlow/MaxKB |
| Enterprise Edition Closed Source Dependency | 🟡The medium | open source version has limited functions and performance-the document parsing effect is general, the number of files is limited, parallel operation is not supported, and the production environment is not supported. True production-level capability in Enterprise Edition. | PoC phase clarifies the difference between open source version and enterprise version; the enterprise version license fee is reserved in the budget |
| Pure CPU Performance | 🟡Medium | v2.0 gives up GPU acceleration, and performance may become a bottleneck when processing a large number of documents or high concurrency. | Evaluate the actual document level; If there is a high concurrency requirement, consider enterprise version or GPU scheme |
| Competitive Ecological Suppression | 🟡Medium | RAGFlow update is extremely fast (v0.21 introduces Ingestion Pipeline and Long-Context RAG),MaxKB is ecologically active (20,600 Stars, workflow MCP tool call), and the two protocols are more friendly | Focus on the QAnything OCR/scan differentiation advantages to avoid "comprehensive functions" compared with competitors |
| NetEase Strategy Shifting to Risk | 🟡China | QAnything may be Netease Youdao's exploration project in the AI boom, and its core energy may have shifted to the commercialization of the enterprise version or other product lines | Pay attention to the relationship between the open source version and the enterprise version; Evaluate Netease Youdao's long-term investment willingness |
| Security Vulnerability Response | 🟡The stagnation of project maintenance means that security vulnerabilities may not be fixed in time. Security audit before use; Run in an intranet isolation environment; Monitor the security announcements of dependent components (ES, Nginx, etc.). |
12. My Pre-Sales Judgment
Recommendation: Cautiously recommended (very suitable for specific scenarios, but the overall risk is high)
Reason:
- irreplaceable differentiation advantage : There is a combination of OCR BCEmbedding. At present, there is no better open source scheme for processing the subdivision scenario of Chinese scanned PDF. If the customer happens to be in this scenario, QAnything is the most preferred.
- Deployment Friendly :Docker one-click startup, pure CPU operation, mirror image 4.88GB, extremely friendly to small and medium-sized enterprises without GPU.
- Full-featured and stable: Although it is no longer updated, v2.0 is already a mature and fully functional document answering system.
- But-AGPL and maintenance stagnation are two thunder : Legal Risk Technology Stagnation = Customers must be fully informed before sales and cannot be avoided.
Recommended Customer Persona:
-Core Appeal Accurately Hits "Chinese Scanned Document/Form PDF → Local Q & A"
-Pure intranet deployment, do not provide external document Q & A service (to avoid AGPL risks)
-The requirement for function update frequency is not high, and the existing functions of v2.0 can already meet the demand
-Limited budget but need OCR advantage (open source version can meet)
-Or have a budget to buy the enterprise version (more powerful business license)
Not recommended situations:
-Customer Legal Affairs explicitly prohibits the use of AGPL protocol → Directly exclude open source version, push enterprise version or change RAGFlow/MaxKB
-Need long-term continuous function iteration and community support → recommend RAGFlow (update the most active)
-Workflow orchestration, multi-Agent, plug-in ecology required → Dify/MaxKB recommended
-Documents are mainly in English or non-scanned documents → OCR advantage is not obvious, RAGFlow or Haystack is recommended
-High concurrency external services (and do not want to pay) → AGPL risk unacceptable
Pre-sales strategy recommendations:
-First judge the type of customer document: there are a large number of scanned documents → QAnything is the first recommendation
-reconfirm AGPL's position: legal OK and intranet use → direct push of open source PoC
-Legal service is not OK or needs external service → Push enterprise version or BCEmbedding self-built scheme (BCEmbedding Apache-2.0!)
13. REFERENCE
-GitHub repository: https://github.com/netease-youdao/QAnything
-official website/online experience: https://qanything.ai
-BCEmbedding (Retrieval Model, Apache-2.0):https://github.com/netease-youdao/BCEmbedding
-Youdao Speed Reading (online trial):https://read.youdao.com
-FAQ (Chinese):https://github.com/netease-youdao/QAnything/blob/qanything-v2/FAQ_zh.md
-Demand feedback: https://qanything.canny.io/feature-requests
-HuggingFace model: https://huggingface.co/maidalun1020
-Enterprise Business Contact: AIcloud_Business@corp.youdao.com -8255-8901
- Analysis Date: 2026-07-02 | Data Aging: GitHub Information Pull in Real Time, Official Website Content from qanything. AI, Competition Comparison Based on 2026 Latest Data *