
Strategy and Program: Two Concepts for Working with DSPy


Introducing two nouns that make working with DSPy clearer.

AI-generated entry. See What & Why for context.


DSPy has modules, predictors, optimizers, signatures… the vocabulary can blur together. I kept finding myself wanting clearer terms for two specific things:

  1. What do I call ChainOfThought, ReAct, BestOfN? They’re not optimizers. They’re not plain modules. They do something specific: augment how predictions happen at runtime.

  2. What’s the thing I pass to compile()? It’s a module, sure. But it’s the top-level module, the unit of optimization. That feels distinct.

I’ve started using two terms that help me think about this: Strategy and Program.

These aren’t official DSPy vocabulary. They’re mental models I find useful. Maybe you will too.

Background: The Existing Concepts

Quick recap of the DSPy concepts you already know:

Signature — Declares what a module does (inputs → outputs)

"question -> answer"
# or class-based with descriptions

Predict — The atomic unit that makes ONE LM call

predict = dspy.Predict("question -> answer")

Module — Base class for composing DSPy components

class MyModule(dspy.Module):
    def forward(self, question):
        ...

Optimizer — Tunes prompts/demos at compile-time (formerly called Teleprompters)

optimizer.compile(module, trainset=data)

Now for the two concepts I’m introducing.


Introducing: Strategy

What I mean by “Strategy”: A module that augments how Predict runs at runtime.

DSPy ships with ChainOfThought, ReAct, BestOfN, and several others. But what are these things? They’re not optimizers — they don’t tune anything. They’re not plain modules — they specifically change how LM calls happen.

I call them strategies: runtime techniques for improving LM responses.

Most strategies are built-in, but you can write your own.
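To make that concrete, here is a minimal sketch of what a hand-rolled BestOfN-style strategy could look like. It is plain Python: `StubPredict` and the length-based reward are illustrative stand-ins, not real DSPy APIs.

```python
import itertools

class StubPredict:
    """Toy stand-in for dspy.Predict: cycles through canned completions."""
    def __init__(self, completions):
        self._completions = itertools.cycle(completions)

    def __call__(self, question):
        return next(self._completions)

class BestOfNStrategy:
    """Hand-rolled BestOfN-style strategy: call the predictor n times,
    keep the completion that scores highest under a reward function."""
    def __init__(self, predictor, n, reward):
        self.predictor = predictor
        self.n = n
        self.reward = reward

    def forward(self, question):
        completions = [self.predictor(question) for _ in range(self.n)]
        return max(completions, key=self.reward)

predictor = StubPredict(["no", "a detailed, well-supported answer", "maybe"])
strategy = BestOfNStrategy(predictor, n=3, reward=len)
print(strategy.forward("q"))  # → "a detailed, well-supported answer"
```

The built-in dspy.BestOfN works the same way in spirit: the strategy never edits the predictor's prompt ahead of time; it only changes how many calls happen and which completion is kept.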

Strategy vs. Optimizer

Both strategies and optimizers exist to improve LM output. The difference is when they operate:

            Strategy                         Optimizer
When        Runtime (during forward())       Compile-time (during compile())
How         Changes how LM is called         Tunes prompts/demos/weights
Example     ChainOfThought adds reasoning    MIPROv2 optimizes instructions

A strategy changes the mechanics of prediction. An optimizer changes the content of prompts.
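The when-distinction can be sketched with a stub LM. Everything below is an illustrative toy, not DSPy internals:

```python
def lm(prompt: str) -> str:
    """Stub LM: echoes its prompt so we can see what it was asked."""
    return f"[LM saw] {prompt}"

# Strategy: wraps the call and changes HOW the LM is invoked, at runtime.
def chain_of_thought(prompt: str) -> str:
    return lm(prompt + "\nReasoning: let's think step by step.")

# Optimizer: rewrites WHAT the prompt says, once, ahead of time.
def compile_prompt(instruction: str) -> str:
    return instruction + " Cite your sources."

base = "Answer the question."
tuned = compile_prompt(base)    # content changed at compile time
out = chain_of_thought(tuned)   # mechanics changed at call time
print(out)
```

The optimizer ran once, before any question arrived; the strategy runs on every call.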

Complete List of Strategies

Strategy              What It Does
ChainOfThought        Adds reasoning step before prediction
ReAct                 Iterative reasoning + tool usage
ProgramOfThought      Generates and executes Python code
CodeAct               Code generation with tool calling
BestOfN               Run N times, return best by reward
Refine                Run N times with iterative feedback
MultiChainComparison  Compare M reasoning attempts
Avatar                Dynamic agent with tool selection

Code Example

# Strategy = runtime augmentation of Predict
cot = dspy.ChainOfThought("question -> answer")
# At runtime: adds reasoning field, asks LM to think step-by-step

react = dspy.ReAct("question -> answer", tools=[search])
# At runtime: loops through thought → action → observation

The strategy doesn’t change your prompts ahead of time. It changes what happens when you call forward().


Introducing: Program

What I mean by “Program”: Any module that gets passed to an optimizer’s compile(). It’s the top-level module being optimized.

In DSPy, you might have nested modules — modules containing modules containing Predicts. When you optimize, which one is “the program”? The one you pass to compile(). That’s the program.

Why This Term Helps

Coordinated optimization: When you optimize a program, ALL predictors within it get optimized together. The optimizer walks through named_predictors() and tunes them as a coordinated whole. It doesn’t matter how deeply nested your Predicts are. If they’re inside the module you pass to compile(), they get tuned.
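A toy sketch of that walk, with simplified stand-ins for dspy.Module and dspy.Predict (attribute discovery via vars() is an assumption for illustration; the real implementation differs in detail):

```python
class Predict:
    """Toy stand-in for dspy.Predict."""
    pass

class Module:
    """Toy stand-in for dspy.Module: recursively yields named predictors."""
    def named_predictors(self, prefix=""):
        for name, attr in vars(self).items():
            path = f"{prefix}{name}"
            if isinstance(attr, Predict):
                yield path, attr
            elif isinstance(attr, Module):
                yield from attr.named_predictors(prefix=path + ".")

class Hop(Module):
    def __init__(self):
        self.generate = Predict()

class MultiHop(Module):
    def __init__(self):
        self.hop1 = Hop()
        self.hop2 = Hop()
        self.final = Predict()

program = MultiHop()
print([name for name, _ in program.named_predictors()])
# → ['hop1.generate', 'hop2.generate', 'final']
```

However deep the nesting, every Predict reachable from the top-level module shows up, and that full set is what the optimizer tunes.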

Boundary between AI and non-AI code: The program marks where your deterministic code ends and your AI code begins. This boundary matters because:

Compile once, deploy anywhere: A compiled program can be exported (saved) and imported at runtime for inference. You don’t re-run the optimizer in production. You load the already-optimized program and call it.

A/B testing: Different compiled versions of the same program—trained on different data, with different hyperparameters, or different optimizers—can be deployed side by side. Same code, different compilations. This makes experimentation clean.

This mental model keeps things organized. Define a program, optimize it, export it, deploy it as a unit.

Program vs. Module vs. Strategy

        Module           Strategy                           Program
What    Base class       Runtime augmentation               Top-level module being optimized
Role    Building block   Improves LM responses at runtime   Unit of optimization

A module is the building block. A strategy is a specific kind of module that augments prediction. A program is whatever module you hand to the optimizer.

Three Examples of Programs

Example 1: Direct call to Predict or a built-in strategy

# A single Predict is a program when optimized
program = dspy.Predict("question -> answer")
optimized = optimizer.compile(program, trainset=data)

# A single strategy is also a program when optimized
program = dspy.ChainOfThought("question -> answer")
optimized = optimizer.compile(program, trainset=data)

Example 2: A module that calls Predict or a built-in strategy

class QA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.answer(question=question)

program = QA()
optimized = optimizer.compile(program, trainset=data)

Example 3: A module that has other modules

class MultiHop(dspy.Module):
    def __init__(self):
        super().__init__()
        self.hop1 = HopModule()  # Another module
        self.hop2 = HopModule()  # Another module
        self.final = dspy.Predict("context -> answer")

    def forward(self, question):
        context = self.hop1(question=question)
        context = self.hop2(question=question, context=context)
        return self.final(context=context)

program = MultiHop()
optimized = optimizer.compile(program, trainset=data)
# All predictors across all nested modules get tuned together

Putting It Together

Here’s the mental model:

┌───────────────────────────────────────────────┐
│                   PROGRAM                     │
│       (top-level, unit of optimization)       │
│                                               │
│   ┌─────────────┐     ┌─────────────┐         │
│   │  Strategy   │     │  Predict    │         │
│   │  (CoT)      │ --> │             │         │
│   │             │     │             │         │
│   │  [Predict]  │     │             │         │
│   └─────────────┘     └─────────────┘         │
└───────────────────────────────────────────────┘

                      │ compile()

              ┌──────────────┐
              │  Optimizer   │
              │  (MIPROv2)   │
              └──────────────┘

The workflow:

  1. Build your program using strategies (ChainOfThought, ReAct, etc.)
  2. Pass the program to an optimizer
  3. All predictors within get tuned together
  4. Deploy the optimized program

Strategies and optimizers aren’t competing approaches. They’re complementary. Pick your runtime behavior with strategies. Tune that behavior for your domain with optimizers.
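A toy end-to-end of that complementarity, with stubs standing in for the LM, the strategy, and the optimizer (none of this is real dspy machinery):

```python
def lm(prompt):
    """Stub LM that answers better when the instruction says to be concise."""
    return "short" if "concise" in prompt else "a rambling answer"

class Program:
    """Toy program: the runtime behavior is fixed, the instruction is tunable."""
    def __init__(self, instruction):
        self.instruction = instruction

    def forward(self, question):
        # Strategy-like runtime step: augment the call with the instruction.
        return lm(f"{self.instruction}\n{question}")

def compile_program(candidates, metric, trainset):
    """Optimizer-like compile step: score candidate instructions, keep the best."""
    def score(instruction):
        prog = Program(instruction)
        return sum(metric(prog.forward(q)) for q in trainset)
    return Program(max(candidates, key=score))

metric = lambda answer: 1 if answer == "short" else 0
optimized = compile_program(["Answer.", "Answer concisely."], metric, ["q1", "q2"])
print(optimized.instruction)  # → "Answer concisely."
```

The runtime behavior (how the call is made) and the compiled content (what the instruction says) are tuned by different mechanisms, at different times, on the same program.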


Summary

Concept     What It Is                                 My Term?
Signature   Input/output declaration                   Existing
Predict     Atomic LM call                             Existing
Module      Base class for composition                 Existing
Optimizer   Compile-time tuning                        Existing
Strategy    Module that augments Predict at runtime    Introduced
Program     Top-level module being optimized           Introduced

I find these terms useful for keeping straight what operates when. Maybe DSPy will adopt them officially someday. For now, they’re just how I think about it.

