DSPy is an open-source LLM programming framework from the Stanford NLP lab (MIT license, 35,700+ GitHub stars). Its core idea is "Programming, not Prompting": replace handwritten prompts with declarative Python code and optimize the entire LLM pipeline with an automatic compiler (the Optimizer). The project has 6.4 million+ monthly downloads, 433+ contributors, and 9 academic papers spanning ICLR 2024 to ICLR 2026; its latest optimizer, GEPA, was accepted as an ICLR 2026 Oral. It is used in production by Shopify (75x cost reduction), Databricks (90x cost reduction), Dropbox (45% NMSE reduction), AWS, JetBlue, Replit, Sephora, VMware, Moody's, and others. DSPy's core value is turning prompt engineering from a craft into a compilation discipline, moving AI systems from fragile string stitching to maintainable, optimizable, and reproducible engineering practice.

1. Project/Product Overview

Dimension | Information
Project name | DSPy (Declarative Self-improving Python)
Developer | Stanford NLP Lab (Omar Khattab, Christopher Potts, Matei Zaharia, and others)
Open source license | MIT
Main language | Python
GitHub stars | 35,718 (queried 2026-07-02)
Forks | 3,039
Commits | 4,550
Created | 2023-01-09
Last updated | 2026-07-01 (activity: high, 523+ PRs per year)
Latest version | v3.3.0b1 (ReActV2 module, improved LM/BaseLM); 109 releases in total
Monthly downloads | 6.4 million+ (PyPI)
Contributors | 433+
Discord | 8,400+ members
Academic papers | 9 (from the ICLR 2024 papers to the ICLR 2026 GEPA Oral), plus 60+ tutorials and cookbooks
Official website | https://dspy.ai
Production users | Shopify (75x cost reduction), Databricks (90x cost reduction, Genie platform), Dropbox (Dash relevance scoring, 45% NMSE reduction), AWS (Nova model migration), JetBlue (multiple chatbots on Databricks), Replit (code repair), Sephora, VMware, Moody's, Nous Research (Hermes Agent self-evolution)
Dependents | 1,800+ downstream projects
Observability | Native MLflow Tracing integration (OpenTelemetry-based)
Enterprise tools | MLflow Model Serving deployment, PyPI publishing, MCP Server export
Related projects | dspy-go (Go-compatible implementation), standalone GEPA library

2. What does it mostly do?

DSPy's core innovation is to move LLM calls from handwritten strings to a declarative, compiled program. It is not another prompt template library; it is a complete compiler framework.

2.1 Three core elements

DSPy builds and optimizes LLM systems through three layers of abstraction:

Element | Description | Technical features
Signature | Declares a task's inputs and outputs with typed Python fields instead of a handwritten prompt string. | Supports type constraints, field descriptions, default values, Literal types, and Image multimodal fields.
Module | Composable components that control the LLM's execution strategy; the same Signature can be run through different Modules. | Predict (direct call), ChainOfThought (step-by-step reasoning), ReAct (tool-use reasoning loop), ReActV2, MultiChainComparison, ProgramOfThought, RLM (recursive language model)
Optimizer | Automatic compilation: searches for the best prompt and examples using labeled data and an evaluation metric. | GEPA (ICLR 2026 Oral, reflective genetic evolution, up to 35x fewer rollouts than GRPO), MIPROv2 (jointly optimizes instructions and examples), BootstrapFewShot, and others.

2.2 Compiler Workflow

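Traditional prompt engineering:
  Write a prompt string → test → poor results → edit the string → test again → switch models and start over

DSPy workflow:
  Define a Signature (declare the task) → choose a Module (execution strategy) → compile with an Optimizer (automatic optimization) → save as JSON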

Key difference: DSPy's optimization produces a JSON file that can be saved, reused, and versioned, rather than a piece of text that happened to work. After switching models you only need to recompile once; the code does not change.
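To make the "compile, then save as JSON" idea concrete, here is a deliberately tiny sketch in plain Python. It is not how GEPA or MIPROv2 work internally: the candidate instructions, the `fake_lm` stand-in, and the data are all hypothetical, and the "optimizer" is just an exhaustive search scored by a metric. It only illustrates that the result of compilation is data that can be serialized and reloaded.

```python
import json
from typing import Callable

# Tiny labeled set: (input sentence, expected label). Hypothetical data.
TRAINSET = [
    ("I love this product!", "positive"),
    ("This is terrible.", "negative"),
    ("Great value, love it.", "positive"),
]

def fake_lm(instruction: str, sentence: str) -> str:
    """Stand-in for an LLM call: only answers well when asked about sentiment."""
    if "sentiment" not in instruction.lower():
        return "unknown"
    return "positive" if "love" in sentence.lower() else "negative"

def exact_match(expected: str, predicted: str) -> float:
    """Metric: 1.0 for an exact label match, otherwise 0.0."""
    return 1.0 if expected == predicted else 0.0

def compile_program(candidates: list[str],
                    trainset: list[tuple[str, str]],
                    metric: Callable[[str, str], float]) -> dict:
    """Score every candidate instruction on the trainset and keep the best one."""
    best = {"instruction": None, "score": -1.0}
    for instruction in candidates:
        total = sum(metric(label, fake_lm(instruction, text)) for text, label in trainset)
        score = total / len(trainset)
        if score > best["score"]:
            best = {"instruction": instruction, "score": score}
    return best

candidates = ["Label the text.", "Classify the sentiment of the sentence."]
compiled = compile_program(candidates, TRAINSET, exact_match)

# The "compiled program" is plain data, so it can be serialized and versioned.
saved = json.dumps(compiled, indent=2)
restored = json.loads(saved)
print(restored["instruction"], restored["score"])
```

Switching models in this toy setup would mean swapping `fake_lm` and rerunning `compile_program`; the task definition and the metric stay untouched, which is the property the paragraph above describes.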

2.3 Core Concept: From Prompt to Program

DSPy takes LLM application development from "assembly language"(Prompt strings) to "high-level language" (declarative Python):

# Traditional approach: a brittle prompt string
prompt = """You are a helpful assistant. Given the following email, extract
the event name and date. Return in JSON format. Email: {email}"""

# DSPy approach: a declarative, typed signature
class ExtractEvent(dspy.Signature):
    """Extract event details from an email."""
    email: str = dspy.InputField()
    event_name: str = dspy.OutputField()
    date: str = dspy.OutputField()

extract = dspy.Predict(ExtractEvent)
result = extract(email="Team offsite this Thursday at 2pm")
# Prediction(event_name="Team Offsite", date="Thursday")

When you need step-by-step reasoning, swap the Module; the Signature stays the same:

extract = dspy.ChainOfThought(ExtractEvent)  # adds a reasoning step automatically

When a tool call is required:

agent = dspy.ReAct("question -> answer", tools=[search, calc])

2.4 Advanced capabilities

-Multimodal: 'dspy.Image' as a field type, for working directly with pictures and charts

-Assertions: runtime constraint checks with automatic retry and correction

-PythonInterpreter: sandboxed code execution, supporting mathematical reasoning

-RLM (Recursive Language Model): recursive reasoning module (new paper, December 2025)

-Fine-tuning integration: the optimizer not only tunes prompts but can also generate fine-tuning data (BetterTogether paper)

-Asynchronous support: native 'async' execution, thread-safe, suitable for high-throughput scenarios

-Observability: MLflow Tracing (based on OpenTelemetry), so every call can be traced

3. Applicable scenarios

Scenario | Description | Typical customer/case
Prompt optimization | With labeled data in hand, systematically search for the best combination of prompt and examples | Any team whose LLM application is live but misses its performance targets
Classification/extraction tasks | Text classification, entity extraction, intent recognition, structured information extraction | Shopify (platform-wide merchant metadata extraction, 75x cost reduction)
RAG pipeline optimization | Optimize the full retrieve → rank → generate pipeline with automatic parameter tuning | Dropbox Dash (relevance scoring, 45% NMSE reduction)
Model migration/cost reduction | Move from a large model to a small one and compile so that the small model's output approaches the large model's | AWS Nova migration, Databricks (90x cost reduction)
Agent behavior optimization | Automatically optimize the agent's reasoning chain and tool-calling strategy | Databricks Genie (table search), Replit (code repair)
Multimodal tasks | Image understanding, chart analysis (Image field type) | Medical imaging (Stanford prompt triage, up to +3400% improvement)
Security detection | Jailbreak detection, prompt injection detection | DSPy security pipeline (including cryptographically tamper-proof state)
Quality assessment automation | Build automated scoring/evaluation pipelines to reduce manual labeling | Enterprise LLM-as-Judge systems
Research and experimentation | Systematic experiments that quickly compare models, strategies, and optimization methods | Academic research, A/B testing

4. Where it is not a good fit

Scenario | Reason | Suggested alternative
Simple one-off prompts | DSPy's compile-and-optimize step needs labeled data and an evaluation metric; for a single call it is over-engineering | Write a few lines of prompt or call the API directly
No labeled data at all | The Optimizer depends on a computable metric (accuracy, F1, etc.) | Go live with manual evaluation first, accumulate labeled data, then introduce DSPy
Low-code/drag-and-drop development | DSPy is a pure Python framework with no GUI | Dify / Coze / MaxKB
Demanding metric design | Optimization quality depends heavily on the quality of the evaluation metric; a wrong metric leads to degradation | Someone on the team must be able to design the metric function
Sensitive to compilation cost | GEPA/MIPROv2 optimization requires many LLM calls (although up to 35x fewer than RL) | Small tasks can skip compilation and use Signature + Module only
Extremely low latency required | A compiled module still calls the LLM, so inference latency itself does not drop | Model distillation, quantization
Pure non-LLM tasks | DSPy is an LLM programming framework and does not suit traditional ML or rule-based systems | scikit-learn, XGBoost, etc.

5. Core capabilities

5.1 Signature capabilities

Inline shorthand signature | '"question -> answer"' one-line definition
Class-based signature | Inherits from 'dspy.Signature'; supports typed InputField/OutputField declarations
Type constraints | 'str', 'int', 'float', 'bool', 'Literal["a", "b"]', 'list[dict]', etc.
Multimodal fields | 'dspy.Image', 'dspy.Video' as field types
Field description | 'desc="often between 1 and 5 words"' constrains the output format
Default value | 'dspy.InputField(default="N/A")'
Docstring | '"""Task description."""' serves as the task's semantic description

5.2 Module Matrix

Module | Purpose | Applicable scenarios
'dspy.Predict' | Calls the LLM directly without extra reasoning | Simple classification, extraction
'dspy.ChainOfThought' | Asks the LLM to reason before answering | Tasks that require logical deduction
'dspy.ReAct' | Reason-act loop with tool calling | Agents, multi-step reasoning, search
'dspy.ReActV2' | Improved ReAct with a better reasoning structure (new in v3.3.0b1) | Complex agent scenarios
'dspy.MultiChainComparison' | Compares several reasoning chains and produces the best output | Tasks that need consensus
'dspy.ProgramOfThought' | Reasons by generating and executing code | Mathematical reasoning, programming tasks
'dspy.RLM' | Recursive language model with multi-level nested reasoning | Deep reasoning, complex logic
'dspy.Module' (custom) | Combines multiple Signatures and Modules | Complex multi-step pipelines

5.3 Optimizer matrix

Optimizer | Principle | Applicable scenarios
GEPA | Genetic evolution with a Pareto frontier and natural-language reflection (ICLR 2026 Oral) | Currently the most recommended production-grade optimizer; up to 35x fewer rollouts than GRPO
MIPROv2 | Jointly optimizes instruction text and few-shot examples | General optimization needs, from zero-shot to few-shot
BootstrapFewShot | Automatically selects the best demonstrations from the training set | Quickly building a few-shot program
BootstrapFewShotWithRandomSearch | Random search over bootstrapped demonstration sets | Exploring a larger search space

5.4 Enterprise and Engineering Capabilities

MLflow Tracing | Native, OpenTelemetry-based observability that tracks every LLM call
MLflow Model Serving | Deploys a compiled program as a production API
Asynchronous execution | 'dspy.asyncify()' supports high-concurrency scenarios
Thread safety | Safe in multi-threaded environments, suitable for production services
Program save/load | 'program.save("model.json")' / 'program.load("model.json")'
Multiple LM backends | Unified 'dspy.LM()' interface supporting OpenAI, Anthropic, local models, and others
LiteLLM integration | Lazily loaded, activated on demand
Go compatibility | The community dspy-go project provides a compatible implementation of DSPy modules in Go

6. Architecture/deployment/integration approach

6.1 Installation

# Stable release
pip install -U dspy

# Latest development version (from the main branch)
pip install git+https://github.com/stanfordnlp/dspy.git

6.2 LM Configuration (Unified Model Interface)

DSPy manages the various LLM backends through the unified 'dspy.LM()' interface, so the code does not need to change when the model changes:

# OpenAI model
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Anthropic model
lm = dspy.LM("anthropic/claude-sonnet-4-20250514")

# Local open-source model (via LiteLLM or called directly)
lm = dspy.LM("ollama/llama3.1")

# Azure OpenAI
lm = dspy.LM("azure/gpt-4o", api_key="...", api_version="...")

6.3 Deployment modes

Mode | Description
Local OSS | 'pip install dspy'; pure Python, suitable for development and experiments
MLflow Model Serving | Package the compiled program as an MLflow model and deploy it as a REST API
MLflow Tracing | OpenTelemetry-based call tracing for production monitoring
Asynchronous deployment | 'dspy.asyncify(program)' converts a program to an asynchronous version that works with asyncio
MCP Server | DSPy programs can be exported as MCP-compatible service endpoints
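
For readers new to the asynchronous deployment mode, the sketch below shows the general pattern of wrapping a blocking call so that many requests can be served concurrently under asyncio. It uses only the standard library and is a conceptual analogue, not the implementation of 'dspy.asyncify'; `classify_sync` is a hypothetical stand-in for a blocking DSPy program call.

```python
import asyncio
import time

def classify_sync(sentence: str) -> str:
    """Hypothetical stand-in for a blocking program call."""
    time.sleep(0.05)  # simulate the latency of an LLM request
    return "positive" if "love" in sentence.lower() else "neutral"

async def classify_async(sentence: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(classify_sync, sentence)

async def classify_batch(sentences: list[str]) -> list[str]:
    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(classify_async(s) for s in sentences))

batch_results = asyncio.run(classify_batch(["I love it", "It arrived on Tuesday"]))
print(batch_results)
```

In a web service the same idea lets one process overlap many in-flight LLM calls instead of handling them one at a time.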

6.4 Integrated Ecosystem

-Observability: MLflow Tracing (OpenTelemetry standard)

-Deployment: MLflow Model Serving, PyPI release

-Model providers: OpenAI, Anthropic, Cohere, Mistral, Google Gemini, AWS Bedrock, Azure OpenAI, Ollama, on-premises HuggingFace models, and more

-Vector databases: native integration with ColBERTv2 and other retrievers; any vector store can also be reached through tool functions

-Other frameworks: LangChain has an official DSPy integration, so DSPy modules can be embedded in a LangChain pipeline; LlamaIndex can also be used alongside DSPy

-Go: the community dspy-go project provides a compatible implementation

7. How to use

7.1 Minimum Complete Example

import dspy

# 1. Configure the language model
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# 2. Define the task signature (replaces a handwritten prompt)
class SentimentClassifier(dspy.Signature):
    """Classify the sentiment of a sentence as positive, negative, or neutral."""
    sentence: str = dspy.InputField()
    sentiment: str = dspy.OutputField(desc="one of: positive, negative, neutral")

# 3. Choose the execution strategy
classify = dspy.ChainOfThought(SentimentClassifier)

# 4. Call it directly (zero-shot)
result = classify(sentence="The new feature is amazing and works flawlessly!")
print(result.sentiment)  # "positive"

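7.2 Complete training workflow with optimization

import dspy

# Assumes the LM is configured and SentimentClassifier is defined as in 7.1.

# Prepare the training data
trainset = [
    dspy.Example(sentence="I love this product!", sentiment="positive").with_inputs("sentence"),
    dspy.Example(sentence="This is terrible.", sentiment="negative").with_inputs("sentence"),
    dspy.Example(sentence="It's okay, nothing special.", sentiment="neutral").with_inputs("sentence"),
    # ... more training examples
]

# Define the evaluation metric.
# Recent GEPA releases call the metric with five arguments, so the extra
# parameters are accepted here even though they are unused; check the
# signature expected by your installed version.
def sentiment_accuracy(example, pred, trace=None, pred_name=None, pred_trace=None):
    return float(pred.sentiment.lower() == example.sentiment.lower())

# Create the classifier and the optimizer
classify = dspy.ChainOfThought(SentimentClassifier)
optimizer = dspy.GEPA(
    metric=sentiment_accuracy,
    auto="medium",
    # GEPA uses a separate LM to reflect on failures; this model choice is only an example
    reflection_lm=dspy.LM("openai/gpt-4o"),
)

# Compile (optimize)
optimized_classify = optimizer.compile(classify, trainset=trainset)

# Save the compiled result
optimized_classify.save("sentiment_classifier_v1.json")

# Load and use
loaded = dspy.ChainOfThought(SentimentClassifier)
loaded.load("sentiment_classifier_v1.json")
result = loaded(sentence="Absolutely fantastic experience!")
print(result.sentiment)  # "positive"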
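
Because the optimizer can only be as good as its metric, it is worth making the metric tolerant of harmless formatting differences and strict about invalid labels. The sketch below is one possible hardening of the exact-match metric, written in plain Python; `ALLOWED_LABELS`, `normalize_label`, and `robust_sentiment_accuracy` are illustrative names, and `SimpleNamespace` objects stand in for DSPy examples and predictions so the snippet runs without an LLM.

```python
from types import SimpleNamespace

ALLOWED_LABELS = {"positive", "negative", "neutral"}

def normalize_label(text: str) -> str:
    """Lowercase a label and strip surrounding whitespace, quotes, and punctuation."""
    return text.strip().strip(".!\"'").strip().lower()

def robust_sentiment_accuracy(example, pred, trace=None) -> bool:
    """True only when the prediction is a valid label equal to the gold label."""
    predicted = normalize_label(getattr(pred, "sentiment", "") or "")
    gold = normalize_label(example.sentiment)
    return predicted in ALLOWED_LABELS and predicted == gold

gold = SimpleNamespace(sentiment="positive")
clean_hit = robust_sentiment_accuracy(gold, SimpleNamespace(sentiment=" Positive. "))
verbose_miss = robust_sentiment_accuracy(gold, SimpleNamespace(sentiment="mostly positive"))
empty_miss = robust_sentiment_accuracy(gold, SimpleNamespace(sentiment=None))
print(clean_hit, verbose_miss, empty_miss)
```

Rejecting out-of-vocabulary answers such as "mostly positive" keeps the optimizer from being rewarded for outputs that downstream code could not parse.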

7.3 Agent Example (ReAct Tool Call)

import dspy

lm = dspy.LM("openai/gpt-4o")
dspy.configure(lm=lm)

# Define tools (plain Python functions are enough)
def search(query: str) -> list[str]:
    """Search a knowledge base for relevant documents."""
    # knowledge_base is a placeholder for your own retriever object
    return knowledge_base.query(query, k=3)

def calculator(expression: str) -> float:
    """Evaluate a mathematical expression."""
    return dspy.PythonInterpreter({}).execute(expression)

# Create the agent
agent = dspy.ReAct("question -> answer", tools=[search, calculator])

# Use the agent
result = agent(question="What is the GDP per capita of France based on 2024 data?")
# The agent reasons on its own: it needs GDP and population data → search → divide
print(result.answer)  # e.g. "$46,029"

7.4 Multimodal Example

class AnalyzeChart(dspy.Signature):
    """Describe the trend and key data points in a chart."""
    chart: dspy.Image = dspy.InputField()
    title: str = dspy.OutputField()
    trend: str = dspy.OutputField()
    data_points: list[dict] = dspy.OutputField()

analyze = dspy.Predict(AnalyzeChart)
result = analyze(chart=dspy.Image("quarterly_revenue.png"))

8. Pre-sales talking points

8.1 One-sentence positioning

"DSPy is Stanford's LLM compiler: it turns your AI system from a hand-crafted prompt workshop into an automatically optimized production line, with Shopify cutting costs 75x and Databricks 90x."

8.2 Customer pain points → solutions

Customer pain point | DSPy solution | Quantified effect
"We have been tuning the prompt for two months and results are still unstable" | Automatic compilation by the Optimizer finds prompts far better than manual ones within hours | Typical GEPA improvement: +14% accuracy, prompts 9.2x shorter
"Rewriting every prompt when we change models is too painful" | The Signature is decoupled from the model; after a model change you recompile once and the code stays untouched | AWS Nova migration case: seamless switch from a large model to a small one
"We labeled a lot of data but do not know how to use it to improve results" | The Optimizer uses labeled data directly as the optimization signal | 200 samples can raise the baseline from 62% to 89%
"The large model is too expensive, but the small model performs poorly" | Compilation lets the small model's prompt quality approach or exceed the large model's zero-shot performance | Databricks: open-source model + GEPA optimization outperforms zero-shot Claude Opus 4.1
"The AI system is a black box; we cannot tell which step is failing" | MLflow Tracing tracks every LLM call, so each step is observable | Native OpenTelemetry standard, plugs into existing monitoring
"Three people maintain 50 prompts and it is driving us crazy" | Signatures and Modules are managed as code, and prompts become version-controlled JSON | Git management, diff comparison, CI/CD
"Agent behavior is unstable: sometimes it works, sometimes it talks nonsense" | The Optimizer can tune the agent's tool-calling policy and reasoning chain | Replit code repair and the Databricks Genie agent are validated in production

8.3 Differentiated Selling Points

vs LangChain (the LLM framework with the broadest ecosystem):

-DSPy does not do orchestration or integration; it focuses on optimizing the quality of the LLM calls themselves, which makes it complementary to LangChain

-LangChain has an official DSPy integration, so a DSPy module can serve as an optimization node in a LangChain pipeline

-LangChain has no equivalent of DSPy's optimization ability: LangChain helps you build the pipeline, DSPy helps you tune it

-Positioning: LangChain as the framework, DSPy as the optimization layer

vs LlamaIndex (RAG-focused framework):

-LlamaIndex is strong in data connection and index management; DSPy is strong in compiling and optimizing LLM calls

-LlamaIndex suits "help the LLM find the right data"; DSPy suits "make the LLM process the data correctly"

-DSPy's RAG optimization (retrieval strategy, generation quality) works on top of LlamaIndex's index layer

vs handwritten prompts (the traditional way):

-Handwriting prompts is an art; DSPy is engineering: reproducible, versionable, automatable

-Handwritten prompts rely on personal experience; DSPy relies on data and metrics, which is more systematic

-With handwritten prompts, changing the model means rewriting; with DSPy it means recompiling

-Research data: an optimized prompt is not necessarily effective when used outside the framework (some scenarios degrade), which indicates that DSPy is a complete programming model, not just a "prompt generator"

vs Chinese domestic frameworks (Dify / Coze / MaxKB, etc.):

-DSPy is a low-level LLM compiler, while the domestic frameworks lean toward the application layer

-DSPy is better suited to medium and large teams that have ML engineers and need fine-grained control

-DSPy's academic endorsement (Stanford, ICLR) carries weight with technical decision makers

8.4 Customer Value Story Line (5-Step Approach)

  1. Opening: "Has your LLM application been stable since launch? How long did prompt debugging take? Do you have to start over whenever you change the model?"
  2. Resonance: "Handwriting prompts is like hand-tuning assembly code: you redo everything for each new CPU (model). The large models keep evolving, but the approach to prompt engineering is still stuck in 2023."
  3. Demo: define a classification task live in 5 lines of Python → switch to ChainOfThought without changing the Signature → compile and optimize automatically with GEPA → show the improvement → save as JSON to prove reproducibility
  4. Advanced: show production cases, such as Shopify moving from single GPT-5 calls to a self-hosted Qwen-3-9B multi-agent architecture, with a 2x quality improvement and cost reduced from $5 million/year to zero
  5. Clincher: "DSPy is not another prompt template library; it is the compiler methodology behind an ICLR 2026 Oral paper. It comes from Stanford, is open source under MIT, and Shopify, Databricks, Dropbox, and AWS all run it in production."

9. Frequently Asked Customer Questions

Question | Answer
How does DSPy relate to LangChain/LlamaIndex: competitor or complement? | Complement. LangChain does orchestration (organizing chains and agents), LlamaIndex does indexing (data connection and retrieval), and DSPy does optimization (improving LLM call quality). LangChain has an official DSPy integration. Best practice: use LangChain/LlamaIndex for the architecture and DSPy as the optimization layer.
Do I need labeled data to use it? | Optimization (the Optimizer) requires a computable evaluation metric and labeled data. Without optimization, writing declarative programs with Signature + Module still brings type safety, modularity, and maintainability, and performs on par with a good zero-shot prompt.
How much does one optimization run cost? | It depends on the training set size and the optimizer. GEPA is much more efficient than traditional methods (up to 35x fewer rollouts than GRPO). Typical scenario: 200 training samples, gpt-4o-mini, GEPA auto="medium", with a compilation cost of about $2-5. Databricks reports a 90x total cost reduction (optimization cost versus long-term inference savings).
Does it support private deployment and Chinese-developed models? | Yes. Through the unified 'dspy.LM()' interface, DSPy connects to any model backend, including local models served by Ollama (such as Qwen or Llama), vLLM, HuggingFace TGI, and others. Shopify replaced GPT-5 with a self-hosted Qwen-3-9B.
Can the compiled results be used outside the framework? | Possible in some scenarios, but not recommended. One study (University of Minho, 2025) found that an optimized prompt may degrade once it leaves the DSPy framework, because DSPy does not merely generate a prompt; it is a complete execution model. Compiled modules should stay loaded inside DSPy.
What team size suits DSPy? Is the technical threshold high? | It suits medium-sized teams with 1-2 ML engineers. The core concepts (Signature → Module → Optimizer) take about half a day to understand. Basic Python and some LLM experience are enough to get started, and 60+ tutorials and cookbooks cover common scenarios.
Is it better than fine-tuning? | The two do not conflict. DSPy's BetterTogether paper shows that combining prompt optimization with fine-tuning works best. The DSPy optimizer can generate fine-tuning data or only optimize prompts. For teams with limited resources, optimizing only the prompt (zero-shot/few-shot) can already bring significant gains.
How is data security ensured? Does the tuning process upload data? | DSPy itself is a local Python library, and data does not pass through any remote server other than the LLM API you configure. The Optimizer's LLM calls during compilation go through the model backend you configured (which can be a local model). MLflow Tracing data is stored in your own environment.
How well are Chinese-language tasks supported? | The framework itself is language independent. Quality on Chinese tasks depends on the LLM used. With strong Chinese models such as Qwen and DeepSeek, DSPy optimization is equally effective. Signature docstrings and field descriptions can be written in Chinese.

10. PoC Recommendations

Recommended PoC direction: from handwritten prompts to DSPy compiled optimization

Select an LLM task (classification/extraction/RAG) for which the customer already has labeled data, and compare the handwritten prompt against the DSPy-compiled version.

Phase | Content | Time | Output
1. Environment setup | pip install dspy, configure the LM (the model the customer currently uses) | 0.5 days | Runnable environment
2. Task migration | Convert the prompt logic of an existing task into a DSPy Signature (business logic unchanged) | 0.5 days | Working Signature version
3. Baseline test | Run Predict zero-shot on the training set and record baseline accuracy | 0.5 days | Baseline metrics
4. GEPA optimization | Configure the metric and run GEPA compilation (auto="medium") | 0.5 days | Compiled, optimized version
5. Effect evaluation | Compare the handwritten prompt and the compiled version on the test set and produce an improvement report | 0.5 days | Comparative evaluation report
6. Model migration verification | Switch to a cheaper model (e.g. gpt-4o-mini → gpt-5.4-nano), recompile, and compare | 0.5 days | Cost optimization plan
7. Production demo | Show JSON save/load and MLflow Tracing observability | 0.5 days | Full system demo

Overall time: 3.5 days

Validation Metrics:

-Accuracy of the compiled version > accuracy of the handwritten prompt

-Stability after compilation (variance across multiple runs)

-After model migration, the compiled version performs close to or better than the handwritten version on the original model

-Compile time < 30 minutes (200 samples)

-The compiled result is saved as JSON and can be loaded directly
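
The first two validation metrics can be computed with a few lines of plain Python. The sketch below assumes each variant is run several times on the same held-out test set and that predictions are collected as lists of labels; all data shown is hypothetical and only illustrates the arithmetic.

```python
from statistics import mean, pstdev

def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of predictions that exactly match the gold labels."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equally long")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

def summarize_runs(runs: list[list[str]], gold: list[str]) -> dict[str, float]:
    """Mean accuracy and population standard deviation over repeated runs."""
    scores = [accuracy(run, gold) for run in runs]
    return {"mean": mean(scores), "stdev": pstdev(scores)}

# Hypothetical test-set labels and two repeated runs per variant.
gold_labels = ["positive", "negative", "neutral", "positive"]
handwritten_runs = [
    ["positive", "negative", "positive", "positive"],  # 3 of 4 correct
    ["positive", "neutral", "positive", "positive"],   # 2 of 4 correct
]
compiled_runs = [
    ["positive", "negative", "neutral", "positive"],   # 4 of 4 correct
    ["positive", "negative", "neutral", "negative"],   # 3 of 4 correct
]

handwritten_summary = summarize_runs(handwritten_runs, gold_labels)
compiled_summary = summarize_runs(compiled_runs, gold_labels)
print("handwritten:", handwritten_summary)
print("compiled:   ", compiled_summary)
```

Reporting the spread alongside the mean matters in a PoC: a compiled version that wins on average but varies widely between runs has not yet demonstrated the stability criterion.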

PoC Success Criteria (Go/No-Go):

-✅Go: The compiled version performs at least as well as the handwritten version and significantly reduces prompt maintenance work.

-✅Go: The model migration experiment succeeds, showing that changing the model only requires recompiling.

-❌No-Go: The customer has no labeled data at all and is unwilling to invest in labeling (in that case the value of the DSPy optimizer cannot be demonstrated)

11. Risks and Considerations

Risk | Level | Description | Mitigation
Dependence on metric design | High | The effect of DSPy optimization depends entirely on the quality of the metric function. A wrong metric leads to "high scores, low ability": the program optimizes the metric without actually improving task quality. | Prioritize validating the metric design during the PoC; cross-check automatic metrics with manual evaluation.
Labeled data threshold | High | Optimizer compilation needs labeled data (at least dozens to hundreds of items). Scenarios without labeled data cannot realize the core value. | First help the customer set up a labeling process (or use existing user feedback data); Signature + Module still brings coding benefits without optimization.
Compilation cost | Medium | GEPA/MIPROv2 compilation requires many LLM calls, which can mean a one-time cost of $10 to $100 on a large training set. | Use auto="medium" to limit the search space; pilot on a small sample and then scale up; compilation is a one-time cost that is small next to long-term inference savings.
Degradation outside the framework | Medium | Research shows that a prompt optimized by DSPy may perform worse when used outside the framework. | Keep compiled results loaded and used inside DSPy; do not try to export them as plain-text prompts.
Learning curve | - | Shifting from "writing prompt strings" to "declarative programming" takes time, and the team needs to understand the three-layer Signature/Module/Optimizer abstraction. | 60+ tutorials cover common tasks; the learning path resembles PyTorch: get Predict running first, then go deeper into optimization.
Research origins | Low-Medium | DSPy originated in academia. Some modules (such as RLM) are marked experimental, and the API may have breaking changes between versions. | Use a stable release (v3.x); follow the release notes; rely on mature modules such as Predict/ChainOfThought/ReAct for critical production paths.
Weak Chinese-language ecosystem | Medium | The community and documentation are mainly in English, with few Chinese tutorials and cases. | Pair a Chinese LLM with the DSPy framework; read the English documentation with a translation tool.
Depth of integration with LangChain and other frameworks | Low | Although an official integration exists, the two frameworks have different design philosophies, so concept mapping needs attention when mixing them. | Divide the work clearly: orchestration to LangChain, optimization to DSPy; start with standalone DSPy and integrate when necessary.
Lock-in risk | Low | MIT license, pure Python, independent of any specific model. | Compiled results are saved as auditable JSON; Signatures are standard Python classes; fork friendly.

12. My Pre-Sales Judgment

Recommendation: Highly recommended (especially for customers that already run LLM applications but see unstable results or need large-scale cost reduction, and above all for medium and large enterprises with ML engineering capability)

Reason:

  1. Leading methodology: "Programming, not Prompting" is not a marketing slogan; it is a compiler paradigm validated at ICLR 2024 and ICLR 2026 (Oral). As AI infrastructure matures, hand-crafted prompt tuning will eventually be phased out, and DSPy is at the forefront of this paradigm shift.
  2. Solid production cases: Shopify (75x cost reduction, migration from GPT-5 to a self-hosted Qwen-3-9B), Databricks (90x cost reduction, open-source model + optimization outperforming Claude Opus 4.1), Dropbox (45% NMSE reduction). These are not PoC experiments but production systems serving 10 million requests. Each case is documented in a public technical blog post or video.
  3. Clear ROI narrative: a one-time compilation cost ($2-500) buys long-term inference savings ($500,000/year → a fraction of that). This is a strong business argument in front of any CFO.
  4. Engineering friendly: MIT-licensed open source, JSON save/load, MLflow Tracing (OpenTelemetry standard), asynchronous support, thread safety, MCP Server export. The infrastructure required for production is ready.
  5. Academic endorsement plus commercial validation: the Stanford team keeps publishing: 2023 DSPy → 2024 MIPROv2/BetterTogether → 2025 GEPA (ICLR 2026 Oral)/RLM. 523+ PRs per year and 109 releases in total show that the community is highly active.
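
The ROI argument in point 3 comes down to a break-even calculation. A minimal sketch follows; every figure in it is a hypothetical placeholder to be replaced with the customer's own compile cost and per-request inference costs, and none of the numbers come from the cases cited above.

```python
import math

def break_even_requests(compile_cost_usd: float,
                        cost_per_1k_before: float,
                        cost_per_1k_after: float) -> float:
    """Requests needed before a one-time compile cost is paid back by cheaper inference."""
    saving_per_1k = cost_per_1k_before - cost_per_1k_after
    if saving_per_1k <= 0:
        raise ValueError("the optimized pipeline must be cheaper per request")
    return compile_cost_usd / saving_per_1k * 1000

def monthly_saving_usd(requests_per_month: int,
                       cost_per_1k_before: float,
                       cost_per_1k_after: float) -> float:
    """Inference cost saved per month at a given request volume."""
    return requests_per_month / 1000 * (cost_per_1k_before - cost_per_1k_after)

# Hypothetical inputs: $75 compile run, $8.00 → $0.50 per 1,000 requests.
needed = break_even_requests(75.0, 8.0, 0.5)
saved = monthly_saving_usd(2_000_000, 8.0, 0.5)
print(f"break-even after {needed:,.0f} requests")
print(f"monthly saving at 2M requests: ${saved:,.0f}")
```

Presenting the break-even volume next to the customer's actual traffic makes it easy to see whether the compile cost is recovered in hours, days, or not at all.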

Recommended Customer Persona:

-The LLM application is already live, but prompt maintenance is costly and results are unstable

-Has an ML/engineering team (1-2 people) with Python skills

-Wants to migrate from large models to small models to reduce cost

-RAG/classification/extraction/agent scenarios with labeled data (or a willingness to invest in labeling)

-Medium and large enterprises that value observability and engineering discipline

-Uses multiple model providers and needs a unified LLM call layer

Not recommended situations:

-No labeled data at all and no plan to label any in the short term (the Optimizer's value cannot be demonstrated)

-Needs a low-code/no-code platform (Dify/Coze/MaxKB recommended)

-The team has no Python capability at all (direct API calls or SaaS solutions recommended)

-Only 1-2 simple prompts with no need for optimization (write them by hand)

-Extremely cost-sensitive and unable to afford the LLM calls made during compilation

13. References

-GitHub repository: https://github.com/stanfordnlp/dspy

-Official website: https://dspy.ai

-Installation Guide: https://dspy.ai/getting-started/installation/

-Getting Started Tutorial: https://dspy.ai/getting-started/program-dont-prompt/

-Production deployment: https://dspy.ai/production/

-Use Case Summary: https://dspy.ai/community/use-cases/

-DSPy framework paper (ICLR 2024): https://arxiv.org/abs/2310.03714

-GEPA paper (ICLR 2026 Oral): https://arxiv.org/abs/2507.19457

-MIPROv2 paper: https://arxiv.org/abs/2406.11695

-BetterTogether (fine-tuning + prompt optimization): https://arxiv.org/abs/2407.10930

-RLM (Recursive Language Model): https://arxiv.org/abs/2512.24601

-DSP paper (origin of the framework): https://arxiv.org/abs/2212.14024

-Shopify production case: https://www.youtube.com/watch?v=bxToahwOVpY

-Dropbox Dash Case: https://dropbox.tech/machine-learning/optimizing-dropbox-dash-relevance-judge-with-dspy

-AWS Nova Migration Case: https://aws.amazon.com/blogs/machine-learning/improve-amazon-nova-migration-performance-with-data-aware-prompt-optimization/

-Databricks 90x cost reduction: https://www.databricks.com/blog/building-state-art-enterprise-agents-90x-cheaper-automated-prompt-optimization

-Databricks Genie:https://www.databricks.com/blog/pushing-frontier-data-agents-genie

-JetBlue case: https://www.databricks.com/blog/optimizing-databricks-llm-pipelines-dspy

-Replit Code Fix: https://blog.replit.com/code-repair

-MLflow integration: https://mlflow.org/docs/latest/llms/dspy/index.html

-Observability tutorial: https://dspy.ai/tutorials/observability/

-DeepWiki DSPy:https://deepwiki.com/stanfordnlp/dspy

-Discord Community: https://discord.gg/XCGy2WDCQB

-PyPI:https://pypi.org/project/dspy/

-dspy-go(Go language implementation):https://github.com/darwishdev/dspy-go

-GEPA Independent Library: https://github.com/gepa-ai/gepa

-GEPA Use Case Set: https://gepa-ai.github.io/gepa/guides/use-cases/

-Medical Imaging DSPy Application: https://arxiv.org/abs/2511.11898

-Multi-scene DSPy optimization study: https://www.alphaxiv.co/overview/2507.03620

Analysis date: 2026-07-02 | Data freshness: GitHub queried in real time; official website content as of the access date; v3.3.0b1