Unlimited-OCR - AI Navigation

← Back to Project List

Unlimited-OCR is Baidu's open source long document OCR/document analysis model with the slogan "One-shot Long-horizon Parsing". It is not only for single picture text recognition, but also for multi-page pictures, PDF pages, long context document analysis, and supports Hugging Face Transformers, vLLM, SGLang/OpenAI-compatible API and other reasoning methods. Pre-sales is suitable for scenarios such as bills, contracts, reports, scanned files, file digitization, RAG knowledge base pre-storage processing, etc., but GPU cost, layout complexity, Chinese/English/table performance and structured output stability should be verified in advance.

1. Project Positioning

Unlimited-OCR is an OCR model for long-view document parsing. README clearly writes that it wants to take the Deepseek-OCR a step further and welcome the one-shot long-horizon parsing era.

Traditional OCR is often processed by page, block, and line, and then spliced through rules or post-processing structures. The selling point of Unlimited-OCR is to use visual language model to process longer document context at one time, especially suitable for semantic analysis of multi-page PDF, scanned documents and complex layout materials.

2. What does it mostly do?

Competence	Official Material Basis	Pre-Sales Understanding
Single image document resolution	'model.infer(... image_file =...)'	Suitable for single-page scans, screenshots, bills, and forms
Multi-page parsing	'model.infer_multi(... image_files =[...])'	Suitable for multi-page documents such as contracts, reports, manuals, tenders, etc.
PDF Analysis	README uses PyMuPDF to transfer pictures and then goes through multi-page parsing	Accessible to document management, knowledge base and file digitization process
Long context output	'max_length = 32768'	Suitable for preserving long document content and structure
Multi-inference backend	Transformers, vLLM, SGLang	The deployment path from PoC to production is more flexible
OpenAI-compatible API	The SGLang example provides '/v1/chat/completions'	It is convenient to connect to the existing enterprise model gateway/application layer

3. Applicable Scenario

Document digitization and file storage

Suitable for government, finance, manufacturing, education, medical and other industries of historical archives, scanned PDF, contract, manual storage. Pre-sales can be said: first convert unstructured scans into retrievable text, and then enter the RAG, full-text search or business review process.

Contract/Report/Tender Analysis

Multi-page PDFs often have headers and footers, chapters, tables, and pictures. The Unlimited-OCR multi-page parsing capability is suitable for the first stage extraction, followed by the LLM for clause identification, risk summary, and chapter indexing.

Knowledge Base/RAG Pre-Processing

Many RAG projects fail not because of the retrieval model, but because of poor parsing quality of the source document. Unlimited-OCR can be used as a front-end capability of "scanned documents-> Markdown/text/structured content" to improve the quality of knowledge base storage.

Bill/Form/Screenshot Analysis

The 'gundam' or 'base' configuration of a single figure can be used for bills, page screenshots, and table pictures. Formal PoC should compare accuracy and cost between traditional OCR, PaddleOCR, DeepSeek-OCR, and document large models.

4. Not quite the scene

Scenario	Reason
Only low-cost batch recognition of simple printed text	Traditional OCR is cheaper and faster
Strong real-time mobile OCR	Model inference relies on GPU, high deployment pressure on the end side
100% of structured field extraction is required.	OCR is only the first step. Field schema, checksum and business rules are also required.
Very high-security documents but cannot deploy GPU locally	Cloud model calls may not meet compliance

5. Deploy and use

Transformers

from transformers import AutoModel, AutoTokenizer

model_name = "baidu/Unlimited-OCR"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    use_safetensors=True,
    torch_dtype="bfloat16",
).eval().cuda()

A single graph supports two configurations:

-'gundam': 'base_size = 1024, image_size = 640, crop_mode = True'

-'base': 'base_size = 1024, image_size = 1024, crop_mode = False'

Multi-page/PDF only use' base',README example uses PyMuPDF to convert PDF into 300 DPI PNG and then call' infer_multi '.

vLLM

The project has announced support for vLLM inference in 2026-06-28 and provides mirroring:

docker pull vllm/vllm-openai:unlimited-ocr
docker pull vllm/vllm-openai:unlimited-ocr-cu129

SGLang

The SGLang route can start the OpenAI-compatible services:

python -m sglang.launch_server \
  --model baidu/Unlimited-OCR \
  --served-model-name Unlimited-OCR \
  --context-length 32768 \
  --host 0.0.0.0 \
  --port 10000

Suitable for connecting OCR services to existing application platforms.

6. What can be said before sales

Unlimited-OCR is not an ordinary OCR for single-line text recognition, but a visual language model for long document parsing. It is suitable for solving the problems of "many pages, complex layout, long context and subsequent knowledge base" when enterprises scan PDF, report, contract and file storage. PoC should not only look at a few screenshots, but use customer real PDF samples to evaluate accuracy, structural fidelity, throughput and GPU cost.

7. PoC Recommendations

Phases	Work	Indicators
Sample Preparation	Select 50-100 real scans to cover clear/fuzzy/form/long documents	Sample representativeness
baseline comparison	contrast PaddleOCR, traditional OCR, DeepSeek-OCR, manual annotation	character accuracy, structure fidelity
Multi-page Test	Multi-page parsing after PDF to Picture	Page Order, Chapter, Table Loss Rate
Cost Evaluation	Transformers/vLLM/SGLang Three-Route Test	Single Page Time-consuming, Display and Storage, Concurrency
Business Closed Loop	RAG or Field Extraction	Search Hit Rate, Field Accuracy

8. Risks and Considerations

-The README dependency version is relatively new, such as Python 3.12, CUDA 12.9, torch 2.10, etc. The customer environment should be checked in advance.

-'trust_remote_code = True' is a sensitive point for enterprise security review and requires source code audit or isolation environment.

-Document OCR results still need to be manually sampled and accepted, and the model output cannot be directly used as the legal/financial final result.

-PDF to image DPI, cropping, page orientation will significantly affect the effect.

-Multi-page output is very long, and subsequent LLM/retrieval links need to handle chunk, page number, and reference positioning.

9. My Pre-Sales Judgment

Unlimited-OCR is very suitable as a "enterprise document intelligence" front-end capability, rather than selling OCR alone. The best program narrative is:

扫描 PDF / 图片 -> Unlimited-OCR 长文档解析 -> 结构化清洗 -> RAG / 审核 / 字段抽取 / 检索

If the customer has a large number of scanned documents, files, contracts and reports, this project is worth PoC. If the customer only wants ordinary invoice/ID card/simple form OCR, traditional OCR or mature commercial OCR may be more economical.

10. REFERENCE

-GitHub:baidu/Unlimited-OCR

-Hugging Face:baidu/Unlimited-OCR

-arXiv:Unlimited OCR Works

-vLLM Recipe:recipes.vllm.ai/baidu/Unlimited-OCR

-Illustration:Unlimited-OCR.png

-Demo animation:long-horizon-ocr.gif

Information verification date: 2026-06-30. GitHub API anonymous access triggers stream limiting, this note does not write real-time stars/forks.