Introducing two nouns that make working with DSPy clearer.
AI-generated entry. See What & Why for context.
DSPy has modules, predictors, optimizers, signatures… the vocabulary can blur together. I kept finding myself wanting clearer terms for two specific things:
- What do I call ChainOfThought, ReAct, BestOfN? They’re not optimizers. They’re not plain modules. They do something specific: augment how predictions happen at runtime.
- What’s the thing I pass to compile()? It’s a module, sure. But it’s the top-level module, the unit of optimization. That feels distinct.
I’ve started using two terms that help me think about this: Strategy and Program.
These aren’t official DSPy vocabulary. They’re mental models I find useful. Maybe you will too.
Background: The Existing Concepts
Quick recap of the DSPy concepts you already know:
Signature — Declares what a module does (inputs → outputs)

    "question -> answer"
    # or class-based with descriptions

Predict — The atomic unit that makes ONE LM call

    predict = dspy.Predict("question -> answer")

Module — Base class for composing DSPy components

    class MyModule(dspy.Module):
        def forward(self, question):
            ...

Optimizer — Tunes prompts/demos at compile-time (used to be called Teleprompters)

    optimizer.compile(module, trainset=data)
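The class-based signature form mentioned under Signature looks like this. A minimal sketch — the class name, docstring, and field descriptions here are just illustrative, not anything DSPy prescribes:

    import dspy

    class BasicQA(dspy.Signature):
        """Answer the question concisely."""

        question: str = dspy.InputField(desc="a factual question")
        answer: str = dspy.OutputField(desc="a short answer, one sentence")

    predict = dspy.Predict(BasicQA)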
Now for the two concepts I’m introducing.
Introducing: Strategy
What I mean by “Strategy”: A module that augments how Predict runs at runtime.
DSPy ships with ChainOfThought, ReAct, BestOfN, and several others. But what are these things? They’re not optimizers — they don’t tune anything. They’re not plain modules — they specifically change how LM calls happen.
I call them strategies: runtime techniques for improving LM responses.
Most strategies are built-in, but you can write your own.
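To make “write your own” concrete, here is a minimal sketch of a homemade strategy. DraftThenRevise is my own illustrative name and design, not a DSPy built-in: it wraps two Predicts and changes how the answer gets produced at call time.

    import dspy

    class DraftThenRevise(dspy.Module):
        """Hypothetical strategy: draft an answer, then ask the LM to revise it."""

        def __init__(self):
            super().__init__()
            self.draft = dspy.Predict("question -> answer")
            self.revise = dspy.Predict("question, draft_answer -> answer")

        def forward(self, question):
            first = self.draft(question=question)
            # The runtime augmentation: a second LM call that sees the first draft
            return self.revise(question=question, draft_answer=first.answer)

Like the built-ins, it’s just a module, so it can sit anywhere inside a program and be optimized along with everything else.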
Strategy vs. Optimizer
Both strategies and optimizers exist to improve LM output. The difference is when they operate:
|  | Strategy | Optimizer |
|---|---|---|
| When | Runtime (during forward()) | Compile-time (during compile()) |
| How | Changes how LM is called | Tunes prompts/demos/weights |
| Example | ChainOfThought adds reasoning | MIPROv2 optimizes instructions |
A strategy changes the mechanics of prediction. An optimizer changes the content of prompts.
The Built-in Strategies
| Strategy | What It Does |
|---|---|
| ChainOfThought | Adds reasoning step before prediction |
| ReAct | Iterative reasoning + tool usage |
| ProgramOfThought | Generates and executes Python code |
| CodeAct | Code generation with tool calling |
| BestOfN | Run N times, return best by reward |
| Refine | Run N times with iterative feedback |
| MultiChainComparison | Compare M reasoning attempts |
| Avatar | Dynamic agent with tool selection |
Code Example
    # Strategy = runtime augmentation of Predict
    cot = dspy.ChainOfThought("question -> answer")
    # At runtime: adds reasoning field, asks LM to think step-by-step

    react = dspy.ReAct("question -> answer", tools=[search])
    # At runtime: loops through thought → action → observation
The strategy doesn’t change your prompts ahead of time. It changes what happens when you call forward().
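Concretely, calling a strategy looks like this. A sketch that assumes an LM is configured and reuses the `cot` object from above; in recent DSPy versions the injected field is named `reasoning` (older releases called it `rationale`):

    result = cot(question="Why is the sky blue?")

    print(result.reasoning)  # the extra field ChainOfThought injected at runtime
    print(result.answer)     # the field your signature declared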
Introducing: Program
What I mean by “Program”: Any module that gets passed to an optimizer’s compile(). It’s the top-level module being optimized.
In DSPy, you might have nested modules — modules containing modules containing Predicts. When you optimize, which one is “the program”? The one you pass to compile(). That’s the program.
Why This Term Helps
Coordinated optimization: When you optimize a program, ALL predictors within it get optimized together. The optimizer walks through named_predictors() and tunes them as a coordinated whole. It doesn’t matter how deeply nested your Predicts are. If they’re inside the module you pass to compile(), they get tuned.
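You can see that coordinated whole directly. A small sketch, where `program` is whatever module you are about to pass to compile():

    # Lists every Predict reachable from the program, however deeply nested.
    # These are exactly the predictors the optimizer tunes as one unit.
    for name, predictor in program.named_predictors():
        print(name, predictor.signature)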
Boundary between AI and non-AI code: The program marks where your deterministic code ends and your AI code begins. This boundary matters because:
- The program is stateless. It takes inputs, returns outputs, no side effects. (The tools you pass to a program may be stateful—databases, APIs, external services—but the program itself remains stateless.)
- The program’s code shouldn’t change much over time. What changes is the training data and hyperparameters you pass to compile().
- Your non-AI code (API routes, data pipelines, UI) calls the program as a black box.
Compile once, deploy anywhere: A compiled program can be exported (saved) and imported at runtime for inference. You don’t re-run the optimizer in production. You load the already-optimized program and call it.
A/B testing: Different compiled versions of the same program—trained on different data, with different hyperparameters, or different optimizers—can be deployed side by side. Same code, different compilations. This makes experimentation clean.
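Here is a sketch of that save/load flow. The file names are made up, and `QA` refers to the module defined in Example 2 below:

    # After compile(): persist the optimized state (instructions, demos) to disk.
    optimized.save("qa_v1.json")
    # A second compilation (different data or optimizer) could be saved as qa_v2.json.

    # At inference time, in another process: rebuild the same program code,
    # load a compiled state, and call it. Swap the file to A/B test versions.
    program = QA()
    program.load("qa_v1.json")
    prediction = program(question="What is DSPy?")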
This mental model keeps things organized. Define a program, optimize it, export it, deploy it as a unit.
Program vs. Module vs. Strategy
|  | Module | Strategy | Program |
|---|---|---|---|
| What | Base class | Runtime augmentation | Top-level module being optimized |
| Role | Building block | Improves LM responses at runtime | Unit of optimization |
A module is the building block. A strategy is a specific kind of module that augments prediction. A program is whatever module you hand to the optimizer.
Three Examples of Programs
Example 1: Direct call to Predict or a built-in strategy
    # A single Predict is a program when optimized
    program = dspy.Predict("question -> answer")
    optimized = optimizer.compile(program, trainset=data)

    # A single strategy is also a program when optimized
    program = dspy.ChainOfThought("question -> answer")
    optimized = optimizer.compile(program, trainset=data)
Example 2: A module that calls Predict or a built-in strategy
    class QA(dspy.Module):
        def __init__(self):
            super().__init__()
            self.answer = dspy.ChainOfThought("question -> answer")

        def forward(self, question):
            return self.answer(question=question)

    program = QA()
    optimized = optimizer.compile(program, trainset=data)
Example 3: A module that has other modules
    class MultiHop(dspy.Module):
        def __init__(self):
            super().__init__()
            self.hop1 = HopModule()   # another module, defined elsewhere
            self.hop2 = HopModule()   # another module
            self.final = dspy.Predict("context -> answer")

        def forward(self, question):
            context = self.hop1(question=question)
            context = self.hop2(question=question, context=context)
            return self.final(context=context)

    program = MultiHop()
    optimized = optimizer.compile(program, trainset=data)
    # All predictors across all nested modules get tuned together
Putting It Together
Here’s the mental model:
    ┌─────────────────────────────────────────┐
    │                 PROGRAM                 │
    │    (top-level, unit of optimization)    │
    │                                         │
    │   ┌─────────────┐     ┌─────────────┐   │
    │   │  Strategy   │     │   Predict   │   │
    │   │   (CoT)     │ --> │             │   │
    │   │             │     │             │   │
    │   │  [Predict]  │     │             │   │
    │   └─────────────┘     └─────────────┘   │
    └─────────────────────────────────────────┘
                         │
                         │ compile()
                         ▼
                 ┌──────────────┐
                 │  Optimizer   │
                 │  (MIPROv2)   │
                 └──────────────┘
The workflow:
- Build your program using strategies (ChainOfThought, ReAct, etc.)
- Pass the program to an optimizer
- All predictors within get tuned together
- Deploy the optimized program
Strategies and optimizers aren’t competing approaches. They’re complementary. Pick your runtime behavior with strategies. Tune that behavior for your domain with optimizers.
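Put in code, that workflow is only a few lines. A minimal sketch, assuming an LM has been configured; the metric, the tiny trainset, and the choice of BootstrapFewShot are placeholders for your own:

    import dspy

    # dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any LM you use

    def metric(example, prediction, trace=None):
        # Placeholder metric: gold answer must appear in the prediction
        return example.answer.lower() in prediction.answer.lower()

    trainset = [
        dspy.Example(question="Who wrote Hamlet?", answer="Shakespeare").with_inputs("question"),
    ]

    # 1. Build the program using strategies
    program = dspy.ChainOfThought("question -> answer")

    # 2. Pass the program to an optimizer
    optimizer = dspy.BootstrapFewShot(metric=metric)

    # 3. All predictors within get tuned together during compile()
    optimized = optimizer.compile(program, trainset=trainset)

    # 4. Deploy the optimized program (reload it elsewhere with .load())
    optimized.save("qa_compiled.json")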
Summary
| Concept | What It Is | My Term? |
|---|---|---|
| Signature | Input/output declaration | Existing |
| Predict | Atomic LM call | Existing |
| Module | Base class for composition | Existing |
| Optimizer | Compile-time tuning | Existing |
| Strategy | Module that augments Predict at runtime | Introduced |
| Program | Top-level module being optimized | Introduced |
I find these terms useful for keeping straight what operates when. Maybe DSPy will adopt them officially someday. For now, they’re just how I think about it.