DSPy - AI Navigation


DSPy is an open-source LLM programming framework (MIT license, 35,700+ stars) from the Stanford NLP lab. Its core idea is "Programming, not Prompting": replace handwritten prompts with declarative Python code and optimize the entire LLM pipeline through an automatic compiler (the Optimizer). It has 6.4 million+ monthly downloads, 433+ contributors, and 9 academic papers spanning ICLR 2024 to ICLR 2026; its latest optimizer, GEPA, was accepted as an ICLR 2026 Oral. It is used in production by Shopify (75x cost reduction), Databricks (90x cost reduction), Dropbox (45% NMSE reduction), AWS, JetBlue, Replit, Sephora, VMware, Moody's, and others. DSPy's core value is to turn prompt engineering from a craft into a compilation discipline, moving AI systems from fragile string stitching to maintainable, optimizable, and reproducible engineering practice.

1. Project/Product Overview

Dimension	Information
Project Name	DSPy (Declarative Self-improving Python)
Developer	Stanford NLP lab (Omar Khattab, Christopher Potts, Matei Zaharia, etc.)
Open Source License	MIT
Main Language	Python
GitHub Stars	35,718 (queried 2026-07-02)
Forks	3,039
Commits	4,550
Created	2023-01-09
Last Updated	2026-07-01 (activity: high, 523+ PRs/year)
Latest Version	v3.3.0b1 (ReActV2 module + improved LM/BaseLM); 109 releases in total
Monthly Downloads	6.4 million+ (PyPI)
Contributors	433+
Discord	8,400+ members
Academic Papers	9 (from ICLR 2024 through the ICLR 2026 GEPA Oral), plus 60+ tutorials and cookbooks
Official Website	https://dspy.ai
Production Users	Shopify (75x cost reduction), Databricks (90x cost reduction, Genie platform), Dropbox (Dash relevance scoring, 45% NMSE reduction), AWS (Nova model migration), JetBlue (multiple chatbots on Databricks), Replit (code repair), Sephora, VMware, Moody's, Nous Research (Hermes Agent self-evolution)
Dependents	1,800+ downstream projects
Observability	Native MLflow Tracing (OpenTelemetry-based) integration
Enterprise Tools	MLflow Model Serving deployment, PyPI publishing, MCP Server export
Related Projects	dspy-go (Go-compatible implementation), standalone GEPA library

2. What Does It Do?

DSPy's core innovation is to upgrade LLM calls from "handwritten strings" to "declarative compilation". It is not another prompt template library; it is a complete compiler framework.

2.1 Three Core Elements

DSPy builds and optimizes LLM systems through three layers of abstraction:

Element	Description	Technical Features
Signature	Declares a task's inputs and outputs with typed Python fields instead of a handwritten prompt string.	Supports type constraints, field descriptions, default values, Literal types, and Image multimodal fields.
Module	Composable components that control the LLM's execution strategy. The same Signature can be used with different Modules.	Predict (direct), ChainOfThought (step-by-step reasoning), ReAct (tool-use reasoning loop), ReActV2, MultiChainComparison, ProgramOfThought, RLM (recursive language model)
Optimizer	Automatic compilation: searches for the best prompts and examples based on labeled data and evaluation metrics.	GEPA (ICLR 2026 Oral, reflective genetic-Pareto prompt evolution, up to 35x fewer rollouts than GRPO), MIPROv2 (jointly optimizes instructions and examples), BootstrapFewShot, etc.

2.2 Compiler Workflow

Traditional prompt engineering:
  Write a prompt string → test → poor results → edit the string → test again → switch models and start over

DSPy workflow:
  Define a Signature (declare the task) → choose a Module (execution strategy) → compile with an Optimizer (automatic optimization) → save as JSON

Key difference: DSPy's optimization produces a savable, reusable, versionable JSON file rather than a piece of text that happened to work. After switching models, you only need to recompile once; the code itself does not change.
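The "compile once, save, reload" idea can be sketched with the standard library alone. This is a toy illustration, not DSPy's actual file format: the field names `instructions` and `demos`, the file name, and both helper functions are assumptions made for this example.

```python
import json
import os
import tempfile

# Hypothetical compiled state: the field names are illustrative only,
# not DSPy's real on-disk schema.
compiled_state = {
    "instructions": "Classify the sentiment of the sentence.",
    "demos": [
        {"sentence": "I love this product!", "sentiment": "positive"},
        {"sentence": "This is terrible.", "sentiment": "negative"},
    ],
}


def save_state(state: dict, path: str) -> None:
    """Write the state as stable, diff-friendly JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2, sort_keys=True)


def load_state(path: str) -> dict:
    """Read a previously saved state back into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "program_v1.json")
    save_state(compiled_state, path)
    restored = load_state(path)

# The round trip is lossless, so the artifact can be versioned and reused.
print(restored == compiled_state)
```

Sorted keys and fixed indentation keep the file stable between saves, which is what makes diffs in version control readable.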

2.3 Core Concept: From Prompt to Program

DSPy takes LLM application development from "assembly language" (prompt strings) to a "high-level language" (declarative Python):

import dspy

# Traditional approach: a fragile prompt string
prompt = """You are a helpful assistant. Given the following email, extract
the event name and date. Return in JSON format. Email: {email}"""

# DSPy approach: a declarative, typed signature
class ExtractEvent(dspy.Signature):
    """Extract event details from an email."""
    email: str = dspy.InputField()
    event_name: str = dspy.OutputField()
    date: str = dspy.OutputField()

extract = dspy.Predict(ExtractEvent)
result = extract(email="Team offsite this Thursday at 2pm")
# Prediction(event_name="Team Offsite", date="Thursday")

When you need step-by-step reasoning, just swap the Module; the Signature does not change:

extract = dspy.ChainOfThought(ExtractEvent)  # adds a reasoning step automatically

When a tool call is required:

agent = dspy.ReAct("question -> answer", tools=[search, calc])

2.4 Advanced Capabilities

- Multimodal: 'dspy.Image' as a field type for directly processing pictures and charts

- Assertions: runtime constraint checks with automatic retry and error correction

- PythonInterpreter: sandboxed code execution, supporting mathematical reasoning

- RLM (Recursive Language Model): recursive reasoning module (new paper, December 2025)

- Fine-tuning integration: the optimizer not only tunes prompts but can also generate fine-tuning data (BetterTogether paper)

- Asynchronous support: native 'async' execution, thread-safe, suitable for high-throughput scenarios

- Observability: MLflow Tracing (based on OpenTelemetry); every call can be traced
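The "runtime constraint check with automatic retry" idea from the list above can be sketched in plain Python. This is a generic pattern under stated assumptions, not DSPy's assertion API: `call_with_retries`, `fake_lm`, and the feedback wording are all hypothetical names invented for this example.

```python
from typing import Callable


def call_with_retries(
    generate: Callable[[str], str],
    is_valid: Callable[[str], bool],
    prompt: str,
    max_attempts: int = 3,
) -> str:
    """Call `generate` until `is_valid` accepts the output or attempts run out."""
    feedback = ""
    for _ in range(max_attempts):
        output = generate(prompt + feedback)
        if is_valid(output):
            return output
        # Append the failed output so the next attempt can correct it.
        feedback = f"\nPrevious answer {output!r} violated the constraint; try again."
    raise ValueError(f"no valid output after {max_attempts} attempts")


# Stand-in for an LLM: answers too verbosely until it sees feedback.
def fake_lm(prompt: str) -> str:
    return "positive" if "try again" in prompt else "I think it is positive!"


allowed = {"positive", "negative", "neutral"}
result = call_with_retries(fake_lm, lambda out: out in allowed, "Sentiment of: great!")
print(result)  # positive
```

The first attempt fails the constraint, the feedback is appended to the prompt, and the second attempt passes.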

3. Applicable Scenarios

Scenario	Description	Typical Customer/Case
Prompt optimization project	You have labeled data and need to systematically find the best combination of prompt and examples	Any LLM application team whose product is live but falls short of its quality targets
Classification/extraction tasks	Text classification, entity extraction, intent recognition, structured information extraction	Shopify (platform-wide merchant metadata extraction, 75x cost reduction)
RAG pipeline optimization	Optimizes the full retrieval → ranking → generation pipeline with automatic parameter tuning	Dropbox Dash (relevance scoring, 45% NMSE reduction)
Model migration/cost reduction	Migrate from a large model to a small one and compile so that the small model's output approaches the large model's	AWS Nova migration, Databricks (90x cost reduction)
Agent behavior optimization	Automatically optimizes the agent's reasoning chain and tool-invocation strategy	Databricks Genie (table search), Replit (code repair)
Multimodal tasks	Image understanding, chart analysis (Image field type)	Medical imaging (Stanford Prompt Triage, up to +3400% improvement)
Security detection	Jailbreak detection, prompt injection detection	DSPy security pipeline (including cryptographically tamper-proof state)
Quality assessment automation	Build automated scoring/assessment pipelines to reduce manual labeling	Enterprise LLM-as-Judge systems
Research and experimentation	Systematic experiments to quickly compare models, strategies, and optimization methods	Academic research, A/B testing

4. Unsuitable Scenarios

Scenario	Reason	Alternative Suggestions
Simple one-off prompts	DSPy's compile-time optimization requires labeled data and evaluation metrics; for a single call it is over-engineering	Write a few lines of prompt or call the API directly
No labeled data at all	The Optimizer must rely on computable metrics (accuracy, F1, etc.)	Go live with manual evaluation first, accumulate labeled data, then introduce DSPy
Low-code/drag-and-drop development	DSPy is a pure Python framework with no GUI	Dify / Coze / MaxKB
High demands on metric design	The optimization effect depends heavily on the quality of the evaluation metric; a wrong metric leads to degradation	Someone on the team needs to design a metric function
Compilation-cost sensitive	GEPA/MIPROv2 optimization requires a large number of LLM calls (although 35x more efficient than RL)	For small tasks, skip compilation and use only Signature + Module
Extremely low latency required	The compiled module still calls the LLM, so it cannot reduce inference latency itself	Model distillation, quantization
Pure non-LLM tasks	DSPy is an LLM programming framework and is not suited to traditional ML or rule-based systems	scikit-learn, XGBoost, etc.
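Because several rows above hinge on having a computable metric, here is a minimal sketch of what such a function might look like. It uses the `(example, pred, trace=None)` shape that appears later in this document; the token-overlap F1 scoring and the `answer` field name are assumptions chosen for illustration, and `SimpleNamespace` stands in for real example and prediction objects.

```python
from collections import Counter
from types import SimpleNamespace


def token_f1(example, pred, trace=None) -> float:
    """Token-overlap F1 between the gold answer and the predicted answer."""
    gold = example.answer.lower().split()
    guess = pred.answer.lower().split()
    overlap = sum((Counter(gold) & Counter(guess)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(guess)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# SimpleNamespace stands in for an example and a prediction object.
gold = SimpleNamespace(answer="Team Offsite")
exact = SimpleNamespace(answer="team offsite")
partial = SimpleNamespace(answer="offsite meeting")

print(token_f1(gold, exact))    # 1.0
print(token_f1(gold, partial))  # 0.5
```

A graded score like this gives an optimizer more signal than a strict exact-match check, but whether it tracks real task quality still has to be verified by hand.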

5. Core Capabilities

5.1 Signature Capabilities

Capability	Description
Inline shorthand signature	'"question -> answer"' one-line definition
Class-based signature	Inherits from 'dspy.Signature'; supports InputField/OutputField type declarations
Type constraints	'str', 'int', 'float', 'bool', 'Literal["a", "b"]', 'list[dict]', etc.
Multimodal fields	'dspy.Image', 'dspy.Video' as field types
Description field	'desc="often between 1 and 5 words"' constrains the output format
Default value	'dspy.InputField(default="N/A")'
Docstring	'"""Task description."""' serves as the semantic description of the task
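To make the inline shorthand concrete, the toy parser below splits a string such as "question -> answer" into input and output field names. It is a simplified sketch, not DSPy's parser: `parse_signature` is a hypothetical helper, and it deliberately ignores type annotations and descriptions.

```python
def parse_signature(spec: str) -> tuple[list[str], list[str]]:
    """Split 'a, b -> c' into input and output field names."""
    if spec.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {spec!r}")
    left, right = spec.split("->")
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = [name.strip() for name in right.split(",") if name.strip()]
    if not inputs or not outputs:
        raise ValueError("a signature needs at least one input and one output")
    return inputs, outputs


print(parse_signature("question -> answer"))
# (['question'], ['answer'])
print(parse_signature("context, question -> answer"))
# (['context', 'question'], ['answer'])
```

Everything to the left of the arrow is what the caller supplies; everything to the right is what the LLM is asked to produce.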

5.2 Module Matrix

Module	Purpose	Applicable Scenarios
'dspy.Predict'	Calls the LLM directly without additional reasoning	Simple classification, extraction
'dspy.ChainOfThought'	Has the LLM reason before producing its output	Tasks that require logical deduction
'dspy.ReAct'	Reason-and-act loop with tool invocation	Agents, multi-step reasoning, search
'dspy.ReActV2'	Improved ReAct with a better reasoning structure (new in v3.3.0b1)	Complex agent scenarios
'dspy.MultiChainComparison'	Compares multiple reasoning chains and outputs the best result	Tasks that benefit from consensus
'dspy.ProgramOfThought'	Reasons by generating and executing code	Mathematical reasoning, programming tasks
'dspy.RLM'	Recursive language model with multi-level nested reasoning	Deep reasoning, complex logic
'dspy.Module' (custom)	Combines multiple Signatures and Modules	Complex multi-step pipelines

5.3 Optimizer Matrix

Optimizer	Principle	Applicable Scenarios
GEPA	Genetic-Pareto evolution guided by natural-language reflection (ICLR 2026 Oral)	Currently the most recommended production-grade optimizer; up to 35x fewer rollouts than GRPO
MIPROv2	Jointly optimizes instruction text and few-shot examples	General optimization needs, from zero-shot to few-shot
BootstrapFewShot	Automatically selects the best examples from the training set	Quickly building a few-shot program
BootstrapFewShotWithRandomSearch	Random search over bootstrapped example sets	Exploring a larger search space
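The bootstrapping idea behind the few-shot optimizers can be sketched in a few lines: run a program over the training set and keep only the examples whose output passes the metric. This is a simplified toy under stated assumptions, not DSPy's BootstrapFewShot implementation; `bootstrap_demos` and `keyword_teacher` are hypothetical names, and a keyword rule stands in for an LLM call.

```python
from typing import Callable

Example = dict[str, str]


def bootstrap_demos(
    program: Callable[[str], str],
    trainset: list[Example],
    metric: Callable[[Example, str], bool],
    max_demos: int = 2,
) -> list[Example]:
    """Keep training examples on which the program's output passes the metric."""
    demos: list[Example] = []
    for example in trainset:
        prediction = program(example["sentence"])
        if metric(example, prediction):
            demos.append({"sentence": example["sentence"], "sentiment": prediction})
        if len(demos) == max_demos:
            break
    return demos


# Stand-in "teacher" program: a keyword rule instead of an LLM call.
def keyword_teacher(sentence: str) -> str:
    return "positive" if "love" in sentence.lower() else "negative"


trainset = [
    {"sentence": "I love this product!", "sentiment": "positive"},
    {"sentence": "It's okay, nothing special.", "sentiment": "neutral"},
    {"sentence": "This is terrible.", "sentiment": "negative"},
]

demos = bootstrap_demos(
    keyword_teacher, trainset, lambda ex, out: out == ex["sentiment"]
)
print([d["sentiment"] for d in demos])  # ['positive', 'negative']
```

The neutral example is dropped because the teacher gets it wrong, which shows how the metric acts as a filter on which demonstrations end up in the prompt.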

5.4 Enterprise and Engineering Capabilities


Capability	Description
MLflow Tracing	Native, OpenTelemetry-based observability that traces every LLM call
MLflow Model Serving	Deploys a compiled program as a production API with one click
Asynchronous execution	'dspy.asyncify()' supports high-concurrency scenarios
Thread safety	Safe in multi-threaded environments, suitable for production services
Program save/load	'program.save("model.json")' / 'program.load("model.json")'
Multiple LM backends	Unified 'dspy.LM()' interface supporting OpenAI, Anthropic, local models, etc.
LiteLLM integration	Lazily loaded, activated on demand
Go compatibility	The community dspy-go project implements DSPy-compatible modules in Go

6. Architecture, Deployment, and Integration

6.1 Installation

# Stable release
pip install -U dspy

# Latest development version (from the main branch)
pip install git+https://github.com/stanfordnlp/dspy.git

6.2 LM Configuration (Unified Model Interface)

DSPy manages all LLM backends through the unified 'dspy.LM()' interface, so the code does not need to change when the model does:

import dspy

# OpenAI model
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Anthropic model
lm = dspy.LM("anthropic/claude-sonnet-4-20250514")

# Local open-source model (via LiteLLM or called directly)
lm = dspy.LM("ollama/llama3.1")

# Azure OpenAI
lm = dspy.LM("azure/gpt-4o", api_key="...", api_version="...")

6.3 Deployment Modes

Mode	Description
Local OSS	'pip install dspy', pure Python, suitable for development and experiments
MLflow Model Serving	Packages the compiled program as an MLflow model and deploys it as a REST API
MLflow Tracing	OpenTelemetry-based call tracing for production monitoring
Asynchronous deployment	'dspy.asyncify(program)' converts a program to an asynchronous version that works with asyncio
MCP Server	DSPy programs can be exported as MCP-compatible service endpoints
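The general pattern behind an asynchronous deployment, wrapping a blocking call so that several requests overlap, can be shown with asyncio alone. This sketch makes no claim about how 'dspy.asyncify' is implemented; `slow_program` is a hypothetical stand-in for a blocking LLM-backed program.

```python
import asyncio
import time


def slow_program(question: str) -> str:
    """Stand-in for a blocking LLM-backed call."""
    time.sleep(0.1)
    return f"answer to {question}"


async def run_async(question: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(slow_program, question)


async def main() -> list[str]:
    questions = ["q1", "q2", "q3"]
    # The three calls overlap instead of running one after another.
    return await asyncio.gather(*(run_async(q) for q in questions))


answers = asyncio.run(main())
print(answers)  # ['answer to q1', 'answer to q2', 'answer to q3']
```

`asyncio.gather` returns results in the order the calls were submitted, so callers can match answers to questions by position.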

6.4 Integration Ecosystem

- Observability: MLflow Tracing (OpenTelemetry standard)

- Deployment: MLflow Model Serving, PyPI release

- Model providers: OpenAI, Anthropic, Cohere, Mistral, Google Gemini, AWS Bedrock, Azure OpenAI, Ollama, on-premises HuggingFace models, and more

- Vector databases: native integration with retrievers such as ColBERTv2; any vector store can also be accessed through tool functions

- Other frameworks: LangChain has an official DSPy integration, so DSPy modules can be embedded in a LangChain pipeline; LlamaIndex can also be used alongside DSPy

- Go: the community dspy-go project provides a compatible implementation

7. How to Use

7.1 Minimum Complete Example

import dspy

# 1. Configure the language model
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# 2. Define the task signature (replaces a handwritten prompt)
class SentimentClassifier(dspy.Signature):
    """Classify the sentiment of a sentence as positive, negative, or neutral."""
    sentence: str = dspy.InputField()
    sentiment: str = dspy.OutputField(desc="one of: positive, negative, neutral")

# 3. Choose the execution strategy
classify = dspy.ChainOfThought(SentimentClassifier)

# 4. Call it directly (zero-shot)
result = classify(sentence="The new feature is amazing and works flawlessly!")
print(result.sentiment)  # e.g. "positive"

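7.2 Complete Training Workflow with Optimization

import dspy

# Assumes an LM is already configured via dspy.configure(...) and that
# SentimentClassifier is the Signature defined in section 7.1.

# Prepare the training data
trainset = [
    dspy.Example(sentence="I love this product!", sentiment="positive").with_inputs("sentence"),
    dspy.Example(sentence="This is terrible.", sentiment="negative").with_inputs("sentence"),
    dspy.Example(sentence="It's okay, nothing special.", sentiment="neutral").with_inputs("sentence"),
    # ... more training examples
]

# Define the evaluation metric. GEPA in DSPy 3.x calls the metric with five
# arguments, so the two extra ones are accepted here even though they are unused.
def sentiment_accuracy(gold, pred, trace=None, pred_name=None, pred_trace=None):
    return float(pred.sentiment.lower() == gold.sentiment.lower())

# Create the classifier and the optimizer. GEPA in DSPy 3.x also needs a
# reflection LM; the model named here is only an illustrative choice.
classify = dspy.ChainOfThought(SentimentClassifier)
optimizer = dspy.GEPA(
    metric=sentiment_accuracy,
    auto="medium",
    reflection_lm=dspy.LM("openai/gpt-4o"),
)

# Compile (optimize)
optimized_classify = optimizer.compile(classify, trainset=trainset)

# Save the compiled result
optimized_classify.save("sentiment_classifier_v1.json")

# Load and use it
loaded = dspy.ChainOfThought(SentimentClassifier)
loaded.load("sentiment_classifier_v1.json")
result = loaded(sentence="Absolutely fantastic experience!")
print(result.sentiment)  # e.g. "positive"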

7.3 Agent Example (ReAct Tool Call)

import dspy

lm = dspy.LM("openai/gpt-4o")
dspy.configure(lm=lm)

# Define the tools (plain Python functions are enough)
def search(query: str) -> list[str]:
    """Search a knowledge base for relevant documents."""
    # knowledge_base is a placeholder for your own retrieval client
    return knowledge_base.query(query, k=3)

def calculator(expression: str) -> float:
    """Evaluate a mathematical expression."""
    return dspy.PythonInterpreter({}).execute(expression)

# Create the agent
agent = dspy.ReAct("question -> answer", tools=[search, calculator])

# Use the agent
result = agent(question="What is the GDP per capita of France based on 2024 data?")
# The agent reasons on its own: it needs GDP and population data → search → compute the division
print(result.answer)  # e.g. "$46,029"

7.4 Multimodal Example

class AnalyzeChart(dspy.Signature):
    """Describe the trend and key data points in a chart."""
    chart: dspy.Image = dspy.InputField()
    title: str = dspy.OutputField()
    trend: str = dspy.OutputField()
    data_points: list[dict] = dspy.OutputField()

analyze = dspy.Predict(AnalyzeChart)
result = analyze(chart=dspy.Image("quarterly_revenue.png"))

8. Pre-Sales Talking Points

8.1 One-Sentence Positioning

"DSPy is Stanford's LLM compiler: it turns your AI system from a hand-crafted prompt workshop into an automatically optimized production line. Shopify cut costs by 75x and Databricks by 90x."

8.2 Customer Pain Points → Solutions

Customer Pain Point	DSPy Solution	Quantified Effect
"We have been tuning the prompt for two months and results are still unstable"	The Optimizer compiles automatically and, within a few hours, finds prompts far better than hand-tuned ones	Typical GEPA improvement: +14% accuracy, prompts 9.2x shorter
"Rewriting every prompt when we change models is too painful"	The Signature is decoupled from the model; after a model change you recompile once and the code stays untouched	AWS Nova migration case: seamless switch from a large model to a small one
"We have labeled a lot of data but do not know how to use it to improve results"	The Optimizer uses the labeled data directly as its optimization signal	200 samples can raise the baseline from 62% to 89%
"Large models are too expensive, but small models perform poorly"	Compile-time optimization brings the small model's prompt quality close to, or beyond, the large model's zero-shot quality	Databricks: open-source model + GEPA optimization > Claude Opus 4.1 (zero-shot)
"The AI system is a black box; we cannot tell which step is failing"	MLflow Tracing traces every LLM call, so each step is observable	Native OpenTelemetry standard; plugs into existing monitoring
"Three people on the team maintain 50 prompts and are going crazy"	Prompts are managed as Signature + Module code and become version-controllable JSON	Git management, diff comparison, CI/CD
"Agent behavior is unstable: sometimes it works, sometimes it talks nonsense"	The Optimizer can optimize the agent's tool-invocation policy and reasoning chain	Replit code repair and the Databricks Genie agent are proven in production

8.3 Differentiated Selling Points

vs. LangChain (the LLM framework with the broadest ecosystem):

- DSPy does not do orchestration or integration; it focuses on optimizing the quality of the LLM calls themselves, which makes it complementary to LangChain

- LangChain has an official DSPy integration; a DSPy module can serve as an optimization node in a LangChain pipeline

- LangChain has no equivalent of DSPy's optimization capability: LangChain helps you build the pipeline, DSPy helps you tune it

- Positioning: LangChain as the framework, DSPy as the optimization layer

vs. LlamaIndex (RAG-focused framework):

- LlamaIndex is strong in data connection and index management; DSPy is strong in compiling and optimizing LLM calls

- LlamaIndex is suited to "helping the LLM find the right data"; DSPy is suited to "getting the LLM to process the data correctly"

- DSPy's RAG optimization (retrieval strategy, generation quality) works together with LlamaIndex's index layer

vs. handwritten prompts (the traditional way):

- Handwriting prompts is an art; DSPy is engineering: reproducible, versionable, automatable

- Handwritten prompts rely on personal experience; DSPy relies on data and metrics, which is more scientific

- With handwritten prompts, changing the model means rewriting; with DSPy it means recompiling

- Research data: an optimized prompt is not necessarily effective when used outside the framework (it degrades in some scenarios), which indicates that DSPy is a complete programming model, not just a "prompt generator"

vs. domestic frameworks (Dify / Coze / MaxKB, etc.):

- DSPy is a low-level LLM compiler; the domestic frameworks lean toward the application layer

- DSPy is better suited to medium and large teams that have ML engineers and need fine-grained control

- DSPy's academic endorsement (Stanford, ICLR) appeals strongly to technical decision makers

8.4 Customer Value Story Line (5-Step Approach)

Opening: "Have results been stable since you launched your LLM application? How long did you spend debugging prompts? Do you have to start over when you change models?"
Resonance: "Handwriting prompts is like hand-tuning assembly code: switch to another CPU (model) and you do it all over again. The large models keep evolving, but the approach to prompt engineering is still stuck in 2023."
Demo: Define a classification task live in 5 lines of Python → switch to ChainOfThought without changing the Signature → compile and optimize automatically with GEPA → show the improvement → save as JSON to prove reproducibility
Deep dive: Present production cases, e.g. Shopify moving from a single GPT-5 call to a self-hosted Qwen-3-9B multi-agent architecture, with a 2x quality improvement and cost reduced from $5 million/year to zero
Closing: "DSPy is not another prompt template library; it is the compiler methodology behind an ICLR 2026 Oral paper. It comes from Stanford, is MIT-licensed open source, and Shopify, Databricks, Dropbox, and AWS all run it in production."

9. Frequently Asked Customer Questions

Question	Answer
What is the relationship between DSPy and LangChain/LlamaIndex? Competing or complementary?	Complementary. LangChain does orchestration (organizing chains and agents), LlamaIndex does indexing (data connection and retrieval), and DSPy does optimization (improving the quality of LLM calls). LangChain has an official DSPy integration. Best practice: use LangChain/LlamaIndex for the architecture and DSPy as the optimization layer.
Do I need labeled data to use it?	Optimization (the Optimizer) requires computable evaluation metrics and labeled data. Without optimization, however, writing declarative programs with Signature + Module still brings type safety, modularity, and better maintainability, and the results are comparable to a good zero-shot prompt.
How much does one optimization run cost?	It depends on the training set size and the optimizer. GEPA is much more efficient than traditional methods (up to 35x fewer rollouts than GRPO). Typical scenario: 200 training samples, gpt-4o-mini, GEPA with auto="medium", for a compilation cost of about $2-5. Databricks reports a 90x total cost reduction (optimization cost vs. long-term inference savings).
Does it support private deployment and domestic models?	Yes. Through the unified 'dspy.LM()' interface, DSPy connects to any model backend, including local models served by Ollama (such as Qwen and Llama), vLLM, HuggingFace TGI, etc. Shopify replaced GPT-5 with a self-hosted Qwen-3-9B.
Can the compiled results be used outside the framework?	In some scenarios, but it is not recommended. One study (University of Minho, 2025) found that optimized prompts may degrade outside the DSPy framework, because DSPy does not just "generate prompts" but is a complete execution model. Compiled modules should stay loaded and used within DSPy.
What team size is DSPy suitable for? Is the technical barrier high?	It suits medium-sized teams with 1-2 ML engineers. The core concepts (Signature → Module → Optimizer) take only half a day to understand. Basic Python plus some LLM experience is enough to get started. 60+ tutorials and cookbooks cover common scenarios.
Which is better, DSPy or fine-tuning?	They are not in conflict. DSPy's BetterTogether paper shows that combining prompt optimization with fine-tuning works best. The DSPy optimizer can generate fine-tuning data or do prompt optimization only. For teams with limited resources, optimizing just the prompt (zero/few-shot) can already yield significant gains.
How is data security ensured? Does the optimization process upload data?	DSPy itself is a local Python library, and data does not pass through any remote server (other than the LLM API you configure). LLM calls made by the Optimizer during compilation go through the model backend you configured (which can be a local model). MLflow Tracing data is stored in your own environment.
How well are Chinese-language tasks supported?	The framework itself is language independent. Results in Chinese depend on the LLM used. With strong Chinese models such as Qwen and DeepSeek, DSPy optimization is equally effective. Signature docstrings and field descriptions can be written in Chinese.
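For the cost question above, a back-of-the-envelope estimate is just call count times tokens times price. The sketch below shows the arithmetic only; every number in it (call count, token counts, per-million-token prices) is a placeholder chosen for illustration, not a measured value or a real price list, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(
    lm_calls: int,
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Estimated spend in USD for a batch of LLM calls."""
    input_cost = lm_calls * input_tokens_per_call * usd_per_million_input / 1_000_000
    output_cost = lm_calls * output_tokens_per_call * usd_per_million_output / 1_000_000
    return input_cost + output_cost


# All numbers below are placeholders, not measured values or real prices.
cost = estimate_cost(
    lm_calls=4_000,
    input_tokens_per_call=1_000,
    output_tokens_per_call=250,
    usd_per_million_input=0.5,
    usd_per_million_output=2.0,
)
print(f"${cost:.2f}")  # $4.00
```

Plugging in the customer's own model prices and an observed call count from a small pilot run gives a more trustworthy figure than any generic range.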

10. PoC Recommendations

Recommended PoC Direction: From Handwritten Prompt to DSPy Compilation Optimization

Select an LLM task (classification/extraction/RAG) for which the customer already has labeled data, and compare the handwritten prompt against the DSPy-compiled version.

Phase	Content	Time	Output
1. Environment setup	pip install dspy; configure the LM (the model the customer currently uses)	0.5 days	Runnable environment
2. Task migration	Convert the existing task's prompt logic into a DSPy Signature (without changing the business logic)	0.5 days	Working Signature version
3. Baseline test	Run the training set zero-shot with Predict and record the baseline accuracy	0.5 days	Baseline metrics
4. GEPA optimization	Configure the metric and run GEPA compilation (auto="medium")	0.5 days	Compiled, optimized version
5. Effect evaluation	Compare the handwritten prompt with the compiled version on the test set and produce an improvement report	0.5 days	Comparative evaluation report
6. Model migration verification	Switch to a cheaper model (e.g. gpt-4o-mini → gpt-5.4-nano), recompile, and compare	0.5 days	Cost optimization plan
7. Production demo	Show JSON save/load and MLflow Tracing observability	0.5 days	Full system demo

Overall time: 3.5 days

Validation Metrics:

- Compiled accuracy > handwritten prompt accuracy

- Post-compilation stability (variance across multiple runs)

- After model migration, the compiled version performs close to or better than the handwritten version on the original model

- Compile time < 30 minutes (200 samples)

- Compiled result saved as JSON and ready to load

PoC Success Criteria (Go/No-Go):

- ✅ Go: The compiled version is at least as effective as the handwritten version and significantly reduces the prompt maintenance workload.

- ✅ Go: The model migration experiment succeeds, proving that a model change only requires recompiling.

- ❌ No-Go: The customer has no labeled data at all and is unwilling to invest in labeling (in which case the value of the DSPy optimizer cannot be demonstrated)
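The accuracy and stability checks above reduce to comparing a mean and a spread across repeated runs. The sketch below uses made-up accuracies purely to show the calculation; the decision rule (compiled mean at least as high, spread no larger) is one possible reading of the criteria, not a prescribed threshold.

```python
from statistics import mean, pstdev

# Placeholder accuracies from repeated evaluation runs (not real results).
handwritten_runs = [0.70, 0.62, 0.74, 0.66]
compiled_runs = [0.85, 0.86, 0.84, 0.85]


def summarize(runs: list[float]) -> tuple[float, float]:
    """Return (mean accuracy, population standard deviation)."""
    return mean(runs), pstdev(runs)


hand_mean, hand_std = summarize(handwritten_runs)
comp_mean, comp_std = summarize(compiled_runs)

# Go only if the compiled version is at least as accurate and no less stable.
go = comp_mean >= hand_mean and comp_std <= hand_std
print(f"handwritten: {hand_mean:.3f} +/- {hand_std:.3f}")
print(f"compiled:    {comp_mean:.3f} +/- {comp_std:.3f}")
print("Go" if go else "No-Go")
```

Running each variant several times matters because a single run can hide the instability that the PoC is meant to expose.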

11. Risks and Considerations

Risk	Level	Description	Mitigation
Dependence on metric design	High	The effect of DSPy optimization depends entirely on the quality of the metric function. A wrong metric leads to "high scores, low ability": the program optimizes the metric without actually improving task quality.	Prioritize validating the metric design during the PoC; cross-validate the automatic metric with manual evaluation.
Labeled-data threshold	High	Optimizer compilation requires labeled data (at least dozens to hundreds of items). Without labeled data, the core value cannot be fully realized.	First guide the customer to set up a labeling process (or use existing user feedback data); even without optimization, Signature + Module still brings coding benefits
Compilation cost	Medium	GEPA/MIPROv2 compilation requires many LLM calls, which can mean a one-time cost of $10 to $100 on a large training set.	Use auto="medium" to limit the search space; pilot with a small sample and then scale up; compilation is a one-time cost that is small compared with long-term inference savings
Degradation outside the framework	Medium	Research shows that prompts optimized by DSPy may perform worse when used outside the framework.	Keep compiled results loaded and used within DSPy; do not try to export them as plain-text prompts
Learning curve		The shift in thinking from "writing prompt strings" to "declarative programming" takes time, and the team needs to understand the three-layer Signature/Module/Optimizer abstraction	60+ tutorials cover common tasks; the learning path resembles PyTorch: get Predict running end to end first, then go deeper into optimization
Research origins	Low-Medium	DSPy originated in academia. Some modules (such as RLM) are marked experimental, and the API may have breaking changes between versions.	Use a stable version (v3.x); follow the release notes; use mature modules such as Predict/ChainOfThought/ReAct on production-critical paths
Weak Chinese-language ecosystem	Medium	The community and documentation are mainly in English, with few Chinese tutorials and cases	Use a Chinese LLM with the DSPy framework; read the English documentation with a translation tool
Depth of integration with LangChain and other frameworks	Low	Although an official integration exists, the two frameworks have different design philosophies, so mixing them requires attention to concept mapping	Divide the work clearly: orchestration to LangChain, optimization to DSPy; start with DSPy on its own and integrate when necessary
Lock-in risk	Low	MIT license, pure Python, not tied to any specific model	Compiled results are saved as auditable JSON; Signatures are standard Python classes; fork friendly
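The first mitigation in the table, cross-validating the automatic metric against manual evaluation, can start as a simple agreement rate. The verdicts below are invented placeholders and `agreement_rate` is a hypothetical helper; a real check would use human ratings on a sample of actual outputs.

```python
def agreement_rate(metric_scores: list[bool], human_labels: list[bool]) -> float:
    """Fraction of samples where the automatic metric matches the human verdict."""
    if len(metric_scores) != len(human_labels):
        raise ValueError("both lists must cover the same samples")
    if not metric_scores:
        raise ValueError("need at least one sample")
    matches = sum(m == h for m, h in zip(metric_scores, human_labels))
    return matches / len(metric_scores)


# Placeholder verdicts for eight samples (True = judged correct).
metric_scores = [True, True, False, True, False, True, True, False]
human_labels = [True, False, False, True, False, True, True, True]

rate = agreement_rate(metric_scores, human_labels)
print(f"{rate:.2f}")  # 0.75
```

A low agreement rate is a signal to fix the metric before spending any budget on compilation, since the optimizer will faithfully chase whatever the metric rewards.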

12. My Pre-Sales Judgment

Recommendation: Highly recommended (especially for customers whose LLM applications are live but produce unstable results or need large-scale cost reduction, and particularly medium and large enterprises with ML engineering capability)

Reasons:

Leading methodology: "Programming, not Prompting" is not a marketing slogan; it is a compiler paradigm validated at ICLR 2024 and ICLR 2026 (Oral). As AI infrastructure matures, the hand-crafted workshop model of prompt tuning will eventually be phased out, and DSPy is at the forefront of this paradigm shift.

Solid production cases: Shopify (75x cost reduction, migration from GPT-5 to a self-hosted Qwen-3-9B), Databricks (90x cost reduction, open-source model + optimization > Claude Opus 4.1), Dropbox (45% NMSE reduction). These are not PoC experiments but production systems at the scale of ten million requests. Each case has a public technical blog post or video describing it in detail.

Clear ROI narrative: a one-time compilation cost ($2-500) in exchange for long-term inference cost savings ($500000/year → a fraction). This is a strong business argument in front of any CFO.

Engineering friendly: MIT-licensed open source, JSON save/load, MLflow Tracing (OpenTelemetry standard), asynchronous support, thread safety, and MCP Server export. The infrastructure required for production is ready.

Academic endorsement plus commercial validation: the Stanford team keeps delivering: DSPy (2023) → MIPROv2/BetterTogether (2024) → GEPA (ICLR 2026 Oral)/RLM (2025). 523+ PRs per year and 109 releases in total show that the community is extremely active.

Recommended Customer Persona:

- The LLM application is already live, but prompt maintenance costs are high and results are unstable

- Has an ML/engineering team (1-2 people) with Python skills

- Wants to migrate from large models to small models to reduce costs

- Has RAG/classification/extraction/agent scenarios with labeled data (or is willing to invest in labeling)

- Medium and large enterprises that value observability and engineering discipline

- Uses multiple model providers and needs a unified LLM call layer

Not recommended when:

- There is no labeled data at all and no plan to label any in the short term (the Optimizer's value cannot be demonstrated)

- A low-code/no-code platform is required (Dify/Coze/MaxKB recommended)

- The team has no Python capability at all (direct API calls or SaaS solutions are recommended)

- There are only 1-2 simple prompts and no optimization is involved (write them by hand)

- The customer is extremely cost-sensitive and cannot afford the LLM calls made during compilation

13. References

- GitHub repository: https://github.com/stanfordnlp/dspy

- Official website: https://dspy.ai

- Installation guide: https://dspy.ai/getting-started/installation/

- Getting started tutorial: https://dspy.ai/getting-started/program-dont-prompt/

- Production deployment: https://dspy.ai/production/

- Use case summary: https://dspy.ai/community/use-cases/

- DSPy framework paper (ICLR 2024): https://arxiv.org/abs/2310.03714

- GEPA paper (ICLR 2026 Oral): https://arxiv.org/abs/2507.19457

- MIPROv2 paper: https://arxiv.org/abs/2406.11695

- BetterTogether (fine-tuning + prompt optimization): https://arxiv.org/abs/2407.10930

- RLM (Recursive Language Model): https://arxiv.org/abs/2512.24601

- DSP paper (framework origins): https://arxiv.org/abs/2212.14024

- Shopify production case: https://www.youtube.com/watch?v=bxToahwOVpY

- Dropbox Dash case: https://dropbox.tech/machine-learning/optimizing-dropbox-dash-relevance-judge-with-dspy

- AWS Nova migration case: https://aws.amazon.com/blogs/machine-learning/improve-amazon-nova-migration-performance-with-data-aware-prompt-optimization/

- Databricks 90x cost reduction: https://www.databricks.com/blog/building-state-art-enterprise-agents-90x-cheaper-automated-prompt-optimization

- Databricks Genie: https://www.databricks.com/blog/pushing-frontier-data-agents-genie

- JetBlue case: https://www.databricks.com/blog/optimizing-databricks-llm-pipelines-dspy

- Replit code repair: https://blog.replit.com/code-repair

- MLflow integration: https://mlflow.org/docs/latest/llms/dspy/index.html

- Observability tutorial: https://dspy.ai/tutorials/observability/

- DeepWiki DSPy: https://deepwiki.com/stanfordnlp/dspy

- Discord community: https://discord.gg/XCGy2WDCQB

- PyPI: https://pypi.org/project/dspy/

- dspy-go (Go implementation): https://github.com/darwishdev/dspy-go

- GEPA standalone library: https://github.com/gepa-ai/gepa

- GEPA use case collection: https://gepa-ai.github.io/gepa/guides/use-cases/

- Medical imaging DSPy application: https://arxiv.org/abs/2511.11898

- Multi-scenario DSPy optimization study: https://www.alphaxiv.co/overview/2507.03620

Analysis date: 2026-07-02 | Data freshness: GitHub queried in real time; official website content as of the access date; v3.3.0b1