23.2 Few-Shot Prompting: In-Context Examples

Alright, let’s talk about giving your AI a cheat sheet. You’ve mastered the zero-shot prompt—the high-concept, one-line wonder. It works… sometimes. But when it doesn’t, you’re left with a response that’s so generic it could be used to sell insurance or describe a sunset. This is where few-shot prompting waltzes in, orders a double espresso, and gets down to business.

The core idea is laughably simple, almost stupidly so: you show the model examples of what you want before you ask it to do the real task. We call this providing “in-context examples.” It’s like teaching a brilliant but extremely literal intern. You wouldn’t just say “draft a contract”; you’d show them three examples of well-drafted contracts first and then say “okay, now do one for this new client.” That’s few-shot.

Why This Actually Works (It’s Not Magic)

Don’t be fooled by the simplicity; there’s some serious machinery at work here. You’re not “programming” the model in the traditional sense. Instead, you’re exploiting its core function: predicting the next token. Every example you put in the prompt becomes part of the context that prediction is conditioned on, so the probability mass shifts toward continuations that match your examples.

The model sees the pattern: [Input A] -> [Output A], [Input B] -> [Output B]. When it then sees [Your New Input], its statistical engine screams, “Aha! I know this game! The most probable next tokens are the ones that continue this pattern, not some random, generic nonsense from my training data.” You’re essentially giving it a temporary, context-specific rule to follow.
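
Here is a minimal sketch of that pattern as code. The function name `build_few_shot_prompt`, the `Input:`/`Output:` labels, and the slug task are all illustrative assumptions, not a standard API; the only point is that the prompt ends on an open `Output:` so the most likely continuation is the answer.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: Sequence[Tuple[str, str]],
    new_input: str,
) -> str:
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    parts = [instruction.strip(), ""]
    for source, target in examples:
        # Each example is one [Input] -> [Output] pair in the same layout.
        parts.append(f"Input: {source}")
        parts.append(f"Output: {target}")
        parts.append("")
    # Finish with the real input and an empty Output: for the model to complete.
    parts.append(f"Input: {new_input}")
    parts.append("Output:")
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    "Rewrite each product name as a lowercase URL slug.",
    [
        ("Blue Suede Shoes", "blue-suede-shoes"),
        ("USB-C Cable (2m)", "usb-c-cable-2m"),
    ],
    "Stainless Steel Water Bottle",
)
print(prompt)
```

The string this returns is what you would send to whatever model client you use; nothing here depends on a particular provider.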

Crafting Effective Examples: The Art of the Cheat Sheet

Throwing random examples at the model is like giving that intern a manual written in Klingon. The quality and clarity of your examples are everything.

Be Specific and Contrastive: Your examples should highlight the exact type of reasoning or format you want. If you’re classifying sentiment, don’t just show positive and negative. Show a positive, a negative, and a sarcastic one where the text seems positive but the sentiment is negative. This teaches the model the nuances you care about.

Mind the Format: The model will mimic your formatting with religious fervor. If your examples use Input: and Output:, the response will probably come back with its own Output: label. If your example outputs are JSON objects, it’ll try to spit out JSON. This is powerful but also a common pitfall: any accidental quirk in your formatting gets copied just as faithfully.
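
Both points can be sketched for the sentiment case. The reviews, the `Review:`/`Sentiment:` labels, and the helper name `sentiment_prompt` below are made up for illustration; what matters is the third example, which sounds positive but is labeled Negative, and the fact that every example uses the exact same layout.

```python
SENTIMENT_EXAMPLES = [
    ("The battery lasts two full days. Love it.", "Positive"),
    ("The screen cracked within a week.", "Negative"),
    # Contrastive case: positive-sounding words, negative meaning.
    ("Oh great, another update that deletes my settings.", "Negative"),
]


def sentiment_prompt(review: str) -> str:
    """Build a few-shot sentiment prompt that ends on an open Sentiment: label."""
    lines = ["Classify each review as Positive or Negative.", ""]
    for text, label in SENTIMENT_EXAMPLES:
        lines += [f"Review: {text}", f"Sentiment: {label}", ""]
    lines += [f"Review: {review}", "Sentiment:"]
    return "\n".join(lines)


print(sentiment_prompt("Wow, it only crashed twice today. Impressive."))
```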

# A good, clear example for a translation task:
prompt = """
Translate from English to French. Follow the format of the examples.

Example 1:
Input: "Hello, world!"
Output: "Bonjour, le monde!"

Example 2:
Input: "How are you today?"
Output: "Comment allez-vous aujourd'hui?"

Now translate this:
Input: "The meeting is scheduled for tomorrow."
Output:
"""

Common Pitfalls and How to Avoid Them

The Overkill Paradox: More examples are usually better, right? Not always. There’s a point of diminishing returns. Each example consumes precious context window tokens. Sometimes, two perfect examples are better than five mediocre ones. Start with one to three and test whether adding more actually improves the output.

Learning the Wrong Lesson: The model is a pattern-matching savant, even when the pattern is stupid. If all your examples for a “summarize this text” task are three sentences long, the model might start outputting three-sentence summaries regardless of the content. It learned to count sentences, not to summarize. Ensure your examples demonstrate the underlying principle, not just a superficial trait.

Ignoring Label Consistency: This one will bite you. In a classification task, make sure your labels are perfectly consistent. If you use “Positive” in one example and “positive” (lowercase) in another, you’re introducing ambiguity. The model may then return either casing, and any downstream code that matches labels exactly will break. Pick a style and stick to it religiously across all examples.
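
Two of these pitfalls are easy to check mechanically before you send anything. The helpers below are hypothetical names for a rough sketch: one groups labels that differ only by case or whitespace, the other flags example outputs that all share the same sentence count. The sentence counter is a crude regex heuristic, so treat a hit as a prompt to look, not as proof of a problem.

```python
import re
from collections import defaultdict


def inconsistent_labels(labels):
    """Return labels that differ only by case or surrounding whitespace."""
    variants = defaultdict(set)
    for label in labels:
        variants[label.strip().lower()].add(label)
    return {key: sorted(forms) for key, forms in variants.items() if len(forms) > 1}


def all_same_sentence_count(outputs):
    """True when every example output has the same number of sentences."""
    # Rough heuristic: count runs of ., !, or ? followed by whitespace or end of text.
    counts = {len(re.findall(r"[.!?]+(?:\s|$)", text)) for text in outputs}
    return len(outputs) > 1 and len(counts) == 1


print(inconsistent_labels(["Positive", "Negative", "positive", "Negative"]))
# {'positive': ['Positive', 'positive']}
print(all_same_sentence_count(["One. Two. Three.", "A. B. C."]))
# True
```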

When to Break the Chain (of Thought)

Few-shot is your go-to for tasks where the format or style of the output is critical. But for highly complex reasoning, even a few examples might not be enough. The model might see what the answer is but not how to get there. This is where you might graduate to Chain-of-Thought prompting, which we’ll cover next. It’s like few-shot, but for the reasoning process itself.

So the next time your zero-shot prompt returns something unusable, don’t just yell at the model. Give it a blueprint. Show it, don’t just tell it. It’s not cheating; it’s communicating with a very large statistical engine on its own terms. And it works.