23.3 Chain-of-Thought Prompting: Eliciting Reasoning
Right, so you’ve tried the simple, direct prompts. You’ve given it a few examples to get the ball rolling. And sometimes, it still flubs it. Badly. It gives you the right answer but for the wrong, utterly nonsensical reasons, or it just confidently faceplants on a problem that requires a bit of logic. This is where you stop asking for the what and start demanding the how. You force the machine to show its work. Welcome to Chain-of-Thought prompting.
The core idea is laughably simple and exactly what your third-grade math teacher drilled into you: don’t just write the answer, write down the steps you took to get there. For us, it’s about explicitly instructing the model to reason step-by-step. This doesn’t just make the model appear smarter; it actually unlocks a higher level of reasoning capability. Why? Because you’re forcing it to use its own output as intermediate working memory. Each step becomes the context for the next, breaking a complex problem into a series of simpler, more manageable sub-problems. It’s the difference between asking someone to instantly calculate (17 * 24) / 6 in their head and handing them a piece of paper to scribble on.
The Basic Anatomy of a CoT Prompt
A standard Chain-of-Thought prompt has two key parts. First, you demonstrate the style of reasoning you want with a few-shot example. Second, you present the actual problem. The magic is in the demonstration. You’re not just showing an input-output pair; you’re showing an input-reasoning-output pair.
# A simple, classic example for a math word problem
prompt = """
Q: A bookstore had 80 books. Today, they sold 25 and received a new shipment of 40. How many books do they have now?
A: The bookstore started with 80 books. After selling 25, they had 80 - 25 = 55 books. Then they received 40 new books, so now they have 55 + 40 = 95 books. The answer is 95.
Q: There are 15 apples in a basket. 5 are removed, and then 3 times the original number are added. How many apples are there?
A:
"""
# The model, following the pattern, is now far more likely to generate:
# "The basket started with 15 apples. After 5 were removed, there were 15 - 5 = 10 apples.
# Three times the original number is 3 * 15 = 45 apples. Adding those gives 10 + 45 = 55 apples.
# The answer is 55."
Without that first example showing the step-by-step breakdown, the model is more likely to jump straight to a number and slip on an intermediate quantity, for example by reading “3 times the original number” as “3 times the current number” and computing 10 + 30 = 40 instead of 10 + 45 = 55. CoT makes that kind of slip less likely and, when it does happen, visible in the output.
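If you build these prompts by hand more than twice, formatting drift creeps in. Here is a minimal sketch of a prompt builder that keeps every demonstration in the same input-reasoning-output shape. The `Exemplar` class and `build_cot_prompt` function are hypothetical names made up for this illustration, not part of any library, and the "The answer is N." suffix is just the convention used in the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exemplar:
    """One worked demonstration: question, reasoning steps, final answer."""
    question: str
    reasoning: str
    answer: str


def build_cot_prompt(exemplars: list[Exemplar], question: str) -> str:
    """Render the exemplars and the new question in one consistent Q/A format."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}."
        for ex in exemplars
    ]
    # End on an open "A:" so the model continues the demonstrated pattern.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Hypothetical exemplar, reusing the bookstore demonstration from the text.
bookstore = Exemplar(
    question=(
        "A bookstore had 80 books. Today, they sold 25 and received a new "
        "shipment of 40. How many books do they have now?"
    ),
    reasoning=(
        "The bookstore started with 80 books. After selling 25, they had "
        "80 - 25 = 55 books. Then they received 40 new books, so now they "
        "have 55 + 40 = 95 books."
    ),
    answer="95",
)

prompt = build_cot_prompt(
    [bookstore],
    "There are 15 apples in a basket. 5 are removed, and then 3 times the "
    "original number are added. How many apples are there?",
)
print(prompt)
```

Because every exemplar goes through the same template, adding a second or third demonstration cannot accidentally change the format the model is asked to imitate.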
When to Unleash Chain-of-Thought
Don’t waste this technique on simple fact retrieval. You don’t need to see the “reasoning” behind “What is the capital of France?” (Unless the model’s chain of thought is “Hmm, France… famous city… wine… Eiffel Tower… must be Paris,” which is both hilarious and terrifying.) Deploy CoT for:
- Multi-step arithmetic or logic problems: Like the example above.
- Complex reasoning tasks: “If Alice is taller than Bob, and Bob is taller than Carol, is Alice taller than Carol? Explain.”
- Symbolic reasoning: “If TEST is coded as UFTU, how is CODE coded?”
- Commonsense reasoning: “You put a glass of water in the freezer. What happens and why?”
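For tasks like the symbolic example, it helps to have a ground truth to compare the model’s chain against. A minimal sketch, assuming the TEST to UFTU code is a shift-by-one cipher (each letter replaced by the next one in the alphabet). That reading is consistent with the example but is not stated in it, and `shift_encode` is a hypothetical helper name.

```python
def shift_encode(word: str, shift: int = 1) -> str:
    """Shift each uppercase letter forward by `shift`, wrapping Z around to A."""
    return "".join(
        chr((ord(ch) - ord("A") + shift) % 26 + ord("A")) for ch in word
    )


# Confirm the assumed rule reproduces the example pair before trusting it.
assert shift_encode("TEST") == "UFTU"

print(shift_encode("CODE"))  # DPEF
```

If the model’s step-by-step answer disagrees with this, you can read its chain to see which letter it shifted wrongly.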
The “Let’s think step by step” Zero-Shot Hack
Here’s a beautiful bit of absurdity: you can often get Chain-of-Thought reasoning without providing any examples. This is the zero-shot version. How? You literally just append the phrase “Let’s think step by step” to your prompt. I wish I were joking. It works disturbingly well. The usual explanation is that the phrase shows up in the training data alongside worked-out, step-by-step reasoning, so it nudges the model into producing the same kind of output.
# A zero-shot CoT prompt
prompt = """
Q: A farmer has 15 chickens. Each chicken lays 3 eggs a week. She sells a dozen eggs. How many eggs does she have left?
Let's think step by step.
"""
# The model will (likely) generate:
# "First, calculate total eggs laid per week: 15 chickens * 3 eggs/chicken = 45 eggs.
# A dozen eggs is 12 eggs. So, eggs left: 45 - 12 = 33 eggs.
# The answer is 33."
It feels like a cheat code, and in a way, it is. This is the closest you’ll get to whispering a secret command to the model. Try this first, before crafting elaborate few-shot examples.
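In practice you will want to wrap the trick in a helper and pull the final number out of whatever the model writes. A minimal sketch follows. The names `make_zero_shot_cot` and `extract_answer` are hypothetical, the completion is a hard-coded stand-in (no model API is called), and the parser assumes the model ends with “The answer is N”, which it will often but not always do.

```python
import re
from typing import Optional

COT_TRIGGER = "Let's think step by step."


def make_zero_shot_cot(question: str) -> str:
    """Append the zero-shot CoT trigger phrase to a question."""
    return f"Q: {question}\n{COT_TRIGGER}\n"


def extract_answer(completion: str) -> Optional[str]:
    """Return the number after the last 'The answer is', or None if absent."""
    matches = re.findall(r"The answer is\s*(-?\d+(?:\.\d+)?)", completion)
    return matches[-1] if matches else None


prompt = make_zero_shot_cot(
    "A farmer has 15 chickens. Each chicken lays 3 eggs a week. "
    "She sells a dozen eggs. How many eggs does she have left?"
)

# Stand-in for a real model response, copied from the example above.
sample_completion = (
    "First, calculate total eggs laid per week: "
    "15 chickens * 3 eggs/chicken = 45 eggs. "
    "A dozen eggs is 12 eggs. So, eggs left: 45 - 12 = 33 eggs. "
    "The answer is 33."
)

print(extract_answer(sample_completion))  # 33
```

Returning `None` when the pattern is missing is deliberate: a completion that never commits to an answer should be flagged, not silently parsed into something.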
Common Pitfalls and How to Avoid Them
- The Model Hallucinates Steps: This is the big one. The model might invent facts or calculations during its reasoning. “The recipe calls for 2 eggs” suddenly becomes “The recipe calls for 5 eggs” in the middle of a step. Check the intermediate steps, not just the final answer, against the original problem. The chain of thought is a means to an end, not gospel truth.
- Inconsistent Demonstrations: If your few-shot examples are sloppy—using different formats, skipping steps, mixing reasoning styles—the model will reflect that chaos. Be ruthlessly consistent in your exemplars.
- Over-reliance on CoT for Simple Tasks: You’re adding computational cost (longer prompts, more output tokens) and time for no real benefit. Use the right tool for the job.
- Ignoring the Obvious: Sometimes, the model will use a brilliantly convoluted five-step reasoning process for a problem that has a one-step solution. It’s impressive but inefficient. There’s no easy fix for this; it’s just a quirk of the technique.
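Some of the hallucinated-step problem can be caught mechanically. The sketch below re-checks every simple “a op b = c” statement it finds in a chain of thought. It is an illustration under narrow assumptions: integers only, the three operators shown, and no units between the numbers (so a step like “15 chickens * 3 eggs/chicken = 45 eggs” is skipped). `find_bad_steps` is a hypothetical name, and both sample chains are made up for the demo.

```python
import re

# Matches simple integer statements such as "80 - 25 = 55" or "3 * 15 = 45".
STEP_PATTERN = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)")

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def find_bad_steps(reasoning: str) -> list[str]:
    """Return every 'a op b = c' statement whose arithmetic does not hold."""
    bad = []
    for a, op, b, claimed in STEP_PATTERN.findall(reasoning):
        if OPS[op](int(a), int(b)) != int(claimed):
            bad.append(f"{a} {op} {b} = {claimed}")
    return bad


good_chain = (
    "After 5 were removed, there were 15 - 5 = 10 apples. "
    "Three times the original number is 3 * 15 = 45 apples. "
    "Adding those gives 10 + 45 = 55 apples."
)
bad_chain = (
    "After 5 were removed, there were 15 - 5 = 10 apples. "
    "Three times that is 3 * 10 = 35 apples."
)

print(find_bad_steps(good_chain))  # []
print(find_bad_steps(bad_chain))   # ['3 * 10 = 35']
```

A check like this only catches arithmetic slips. It will not notice a wrong premise, such as 2 eggs quietly becoming 5, so it complements reading the chain and does not replace it.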
The best practice is simple: for any task more complex than a keyword search, force the model to reason aloud. You’ll catch its mistakes before they become your mistakes, and you’ll gain invaluable insight into how it’s arriving at its answers. It turns the model from a black-box oracle into a slightly scatterbrained—but brilliant—intern that you can actually coach. And that’s a win.