30.4 Extended Thinking: Eliciting Deep Reasoning

Alright, let’s get into the real magic: making Claude think. Not just the surface-level, pattern-matching stuff, but the deep, chain-of-thought reasoning that makes this model feel so different. This isn’t about tricking the API; it’s about structuring your prompt so the model has room to reason before it commits to an answer.

The Power of the Scratchpad

The single most effective technique for eliciting deep reasoning is to explicitly give Claude a workspace. Think of it like this: you’re asking a brilliant colleague a hard question. They don’t just stare into the middle distance and then blurt out a perfect answer. They grab a whiteboard, jot down facts, reason step by step, cross things out, and then present their conclusion.

You are that colleague’s manager. Your job is to point at the whiteboard and say, “Use that.” The technical term for this is “chain-of-thought” (CoT) prompting, and it’s not just a Claude thing; it’s a general way to boost reliability in most LLMs. But Claude’s context window (200K tokens for the Claude 3 models) is basically a convention-center-sized whiteboard, so we might as well use it.

Here’s the dead-simple format. You literally just tell it to think step by step.

import anthropic

# With no arguments, the client reads the key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

# The user turn holds only the problem and the reasoning instruction.
# The Messages API assigns roles itself, so no "Human:" / "Assistant:" markers are needed.
prompt = """Solve the following problem: A farmer has chickens and goats. He counts 36 heads and 100 feet. How many chickens and how many goats does he have?

Please reason step by step in a structured way before giving your final answer. End with a line that starts with "Final Answer:"."""

# Optional prefill: starting the assistant turn nudges the model straight into its scratchpad.
prefill = "Okay, let's think through this step by step:"

message = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=500,
    temperature=0,  # Minimizes sampling randomness; identical output across runs is not guaranteed
    system="You are an expert logician and a meticulous, careful problem solver.",
    messages=[
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": prefill},
    ],
)

# The response continues from the prefill, so print the two together.
print(prefill + message.content[0].text)

A response to this prompt typically continues along these lines (the exact wording will vary):

1. First, I need to define variables. Let 'c' be the number of chickens and 'g' be the number of goats.
2. Each animal has one head, so the equation for heads is: c + g = 36.
3. Chickens have 2 feet each, goats have 4 feet each. So the equation for feet is: 2c + 4g = 100.
4. I can simplify the feet equation by dividing everything by 2: c + 2g = 50.
5. Now I have two equations:
   - Equation 1: c + g = 36
   - Equation 2: c + 2g = 50
6. If I subtract Equation 1 from Equation 2, I get: (c + 2g) - (c + g) = 50 - 36 -> g = 14.
7. So, there are 14 goats.
8. Plugging that back into Equation 1: c + 14 = 36 -> c = 22.
9. Therefore, there are 22 chickens and 14 goats.

Let me verify: 22 chickens have 44 feet. 14 goats have 56 feet. Total feet: 44 + 56 = 100. Total heads: 22 + 14 = 36. Perfect.

Final Answer: The farmer has 22 chickens and 14 goats.

Without that instruction, Claude might just jump to the answer. By forcing the scratchpad, you get visibility into the process, which is invaluable for debugging its logic and your own prompt. If the final answer is wrong, you can often see exactly where it went off the rails in the reasoning steps.
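To make that debugging step concrete, here is a minimal sketch that separates the scratchpad from the conclusion. It assumes the response ends with a line beginning "Final Answer:", as in the worked example above. The helper names `split_scratchpad` and `check_farm_answer` are hypothetical, not part of any SDK, and the sample text is a shortened stand-in for a real response.

```python
from typing import Optional, Tuple


def split_scratchpad(response_text: str, marker: str = "Final Answer:") -> Tuple[str, Optional[str]]:
    """Return (reasoning, final_answer); final_answer is None if the marker is missing."""
    # rpartition splits on the LAST occurrence, in case the marker is mentioned mid-reasoning.
    reasoning, found, answer = response_text.rpartition(marker)
    if not found:
        return response_text.strip(), None
    return reasoning.strip(), answer.strip()


def check_farm_answer(chickens: int, goats: int) -> bool:
    """Independently verify the puzzle's constraints: 36 heads and 100 feet."""
    return chickens + goats == 36 and 2 * chickens + 4 * goats == 100


# Shortened stand-in for a model response (not real API output).
sample = """6. Subtracting Equation 1 from Equation 2 gives g = 14.
8. Plugging that back into Equation 1: c + 14 = 36 -> c = 22.

Final Answer: The farmer has 22 chickens and 14 goats."""

reasoning, answer = split_scratchpad(sample)
print(answer)
print(check_farm_answer(chickens=22, goats=14))
```

Keeping the reasoning and the answer apart lets you log the scratchpad for inspection while checking the answer in code, so a wrong result points you at a specific step instead of at the whole response.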

Structuring Complex Multi-Step Problems

For more complex tasks—like analyzing a codebase, comparing business strategies, or deconstructing a philosophical argument—you need a more robust structure. This is where you move from a simple scratchpad to a full framework.

The key is to break the problem down into discrete phases within your prompt. Tell it what each phase is for.

# The user turn: the task plus one tagged block per phase.
complex_prompt = """Analyze the feasibility of building a new email client in 2024. Your analysis should be structured in the following phases:

<phase_1>
**Market Landscape**: Identify the key competitors (e.g., Gmail, Outlook, Superhuman) and their dominant features. Also note any recent successful entrants.
</phase_1>

<phase_2>
**Technical Hurdles**: Consider the major technical challenges, such as email protocol limitations (SMTP, IMAP), spam filtering, and cross-platform synchronization.
</phase_2>

<phase_3>
**Monetization & Differentiation**: Propose potential business models and suggest what a new client would need to offer to compellingly differentiate itself.
</phase_3>

<phase_4>
**Synthesis & Recommendation**: Weigh the factors from the previous phases and provide a final reasoned recommendation on whether this is a viable venture.
</phase_4>

Please proceed phase by phase, providing a detailed analysis for each. Conclude with your final recommendation."""

# Optional prefill for the start of the assistant turn (it must not end in whitespace).
prefill = "Understood. I will analyze the feasibility of a new email client venture by breaking it down into the requested phases."

# Roles are set per message, so the prompt text carries no "Human:" / "Assistant:" markers.
messages = [
    {"role": "user", "content": complex_prompt},
    {"role": "assistant", "content": prefill},
]

This structure does two things brilliantly: it forces Claude to tackle the problem in a logical order, preventing it from jumbling all the concepts together, and it gives you, the human, clear hooks to correct or steer the response if it starts to go astray. You can say, “I like your Phase 1, but in Phase 2, you completely forgot to discuss the cost of data centers.” It creates a shared language for your collaboration.
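If you reuse this pattern often, the phase blocks can be generated from data, and the steering turn can name the phase it targets. The sketch below is one way to do that; `build_phased_prompt` and `add_phase_feedback` are hypothetical helpers, and the placeholder assistant turn stands in for a real first-pass response.

```python
from typing import Dict, List, Tuple


def build_phased_prompt(task: str, phases: List[Tuple[str, str]]) -> str:
    """Wrap each (title, instructions) pair in numbered <phase_n> tags."""
    parts = [f"{task} Your analysis should be structured in the following phases:"]
    for n, (title, instructions) in enumerate(phases, start=1):
        parts.append(f"<phase_{n}>\n**{title}**: {instructions}\n</phase_{n}>")
    parts.append(
        "Please proceed phase by phase, providing a detailed analysis for each. "
        "Conclude with your final recommendation."
    )
    return "\n\n".join(parts)


def add_phase_feedback(history: List[Dict[str, str]], phase: int, feedback: str) -> List[Dict[str, str]]:
    """Return a new message list ending in a user turn that targets one phase."""
    note = f"Keep the other phases as they are. Revise Phase {phase} only: {feedback}"
    return history + [{"role": "user", "content": note}]


phases = [
    ("Market Landscape", "Identify the key competitors and their dominant features."),
    ("Technical Hurdles", "Consider the major technical challenges."),
]
prompt = build_phased_prompt("Analyze the feasibility of building a new email client.", phases)

history = [
    {"role": "user", "content": prompt},
    # Placeholder: in practice this is the text returned by the first API call.
    {"role": "assistant", "content": "(first-pass analysis goes here)"},
]
history = add_phase_feedback(history, 2, "discuss the cost of data centers.")
print(history[-1]["content"])
```

Because the feedback names a phase number, the correction stays scoped: the model is asked to rework one block rather than regenerate the whole analysis.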

Common Pitfalls and How to Avoid Them

First, and this is the biggest one: don’t be vague in your instructions. Telling Claude to “think carefully” is like telling an intern to “do a good job.” It’s useless. Be specific. “Reason step by step,” “List your assumptions,” “Consider the counterargument,” “Break the problem into three parts.” Specificity is the fuel of good reasoning.

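Second, mind your token budget. Even a large context window is a finite resource, and the scratchpad is output: it counts against max_tokens, and you pay for every token of it. If you ask for a 10,000-token scratchpad for a simple math problem, you’re spending tokens (and money) the problem doesn’t need. Set the limit too low, though, and the reasoning gets cut off before the final answer ever appears. Scale your “thinking” space to the problem. A paragraph might suffice for logic puzzles; a full essay might be needed for ethical dilemmas.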
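One way to put that into practice is to pick max_tokens per task type and then check whether the response ran out of room. This is a sketch only: the budget numbers are made-up starting points, `pick_max_tokens` and `was_truncated` are hypothetical helpers, and the `SimpleNamespace` object merely imitates the `stop_reason` and `usage` fields that a Messages API response carries.

```python
from types import SimpleNamespace

# Hypothetical output-token budgets per task type; tune these for your own workloads.
SCRATCHPAD_BUDGETS = {
    "logic_puzzle": 500,
    "code_review": 2000,
    "ethical_dilemma": 4000,
}


def pick_max_tokens(task_type: str, default: int = 1000) -> int:
    """Look up a scratchpad budget, falling back to a default for unknown task types."""
    return SCRATCHPAD_BUDGETS.get(task_type, default)


def was_truncated(message) -> bool:
    """True if generation stopped because it hit the max_tokens limit."""
    return message.stop_reason == "max_tokens"


# Stand-in for a real response object (assumed shape: stop_reason, usage.output_tokens).
fake_message = SimpleNamespace(
    stop_reason="max_tokens",
    usage=SimpleNamespace(input_tokens=120, output_tokens=500),
)

budget = pick_max_tokens("logic_puzzle")
if was_truncated(fake_message):
    # The scratchpad was cut off; the final answer may be missing, so retry with more room.
    budget *= 2
print(budget)
```

Doubling on truncation is just one simple policy. The point is to detect a cut-off scratchpad explicitly rather than parse a half-finished answer.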

Finally, remember the temperature. For tasks where accuracy is paramount, set temperature=0: the scratchpad becomes far more repeatable, though identical output across runs still isn’t guaranteed. If you crank up the temperature to get more creativity, you’re also randomizing its reasoning process. This can lead to wonderfully novel ideas but also to spectacularly wrong conclusions. It’s a trade-off. Use high temperature for brainstorming and low temperature for verification.
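The brainstorm-hot, verify-cold split can be sketched as a two-pass loop. Everything here is illustrative: `ask` is a hypothetical callable of the form (prompt, temperature) -> text that you would implement as a thin wrapper around `client.messages.create`, and the VALID/INVALID convention is an assumption of this example, not an API feature. A fake `ask` is used below so the sketch runs without network access.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (prompt, temperature) -> response text.
Ask = Callable[[str, float], str]


def brainstorm_then_verify(ask: Ask, problem: str, n_ideas: int = 3) -> List[Dict[str, str]]:
    """Generate ideas at high temperature, then check each one at temperature 0."""
    ideas = [ask(f"Propose one approach to: {problem}", 1.0) for _ in range(n_ideas)]
    results = []
    for idea in ideas:
        verdict = ask(
            f"Problem: {problem}\nProposed approach: {idea}\n"
            "Check this approach step by step, then end with VALID or INVALID.",
            0.0,
        )
        results.append({"idea": idea, "verdict": verdict})
    return results


# Fake model for demonstration: records the temperatures it was called with.
temperatures_seen: List[float] = []


def fake_ask(prompt: str, temperature: float) -> str:
    temperatures_seen.append(temperature)
    return "VALID" if temperature == 0.0 else f"idea #{len(temperatures_seen)}"


results = brainstorm_then_verify(fake_ask, "reduce email client sync latency")
print(temperatures_seen)
```

Separating the two passes keeps the randomness where it helps (generating candidates) and out of the step where you need a steady judgment.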

This isn’t a party trick. It’s the core of working effectively with Claude. You’re not just prompting; you’re architecting a thought process. And when you get it right, the results are frankly spooky.