Tool-Use | mikePietsch.com

30.8 Constitutional AI and Claude's Safety Philosophy

Right, let’s talk about the elephant in the server room: safety. You’re probably thinking, “Great, another lecture about how my AI might go rogue.” But stick with me. This isn’t about shackling creativity; it’s about building a system that’s robust, reliable, and doesn’t accidentally suggest you add bleach to your pasta sauce for extra flavor (yes, that’s a real thing people have gotten from less careful models). Anthropic’s approach, Constitutional AI, is genuinely clever. Instead of just trying to patch bad behavior after the fact with a mountain of filters (a losing battle), they baked the principles directly into Claude’s training. Think of it less like a stern parent and more like a personal constitution—a core set of rules and principles the model uses to govern its own responses. It’s a system of self-critique and revision.

30.7 The Files API and Batch Processing

Right, let’s talk about moving beyond one-off chat completions. The Files API and batch processing are where you stop asking Claude to solve a single riddle and start putting it on a factory line to solve a thousand. It’s the difference between a scalpel and a conveyor belt of scalpel-wielding robots. The core idea is brutally simple: you upload your data, you tell Claude what to do with each piece of it, and you get back a neatly packaged JSONL file with all the results. No more babysitting individual API calls.

30.6 Vision: Analyzing Images and Documents

Right, let’s talk about getting Claude to open its eyes. The Vision API isn’t just about slapping an image into a prompt and hoping for the best. It’s about giving Claude a new pair of glasses and teaching it how to read the fine print. The magic here is that you can toss almost any common image or document format (JPEG, PNG, PDF, DOCX, you name it) at the model and it will not only see the pixels but understand the content. This is where we move from a fancy chatbot to a genuine analysis engine.

30.5 Prompt Caching: Reducing Latency and Cost for Long Contexts

Right, let’s talk about one of the most powerful features you probably didn’t know you needed: prompt caching. Think of it like this. You’re sending a massive 100k context prompt to Claude, maybe a huge document for analysis followed by a set of instructions. You do this repeatedly in a loop, or across different user sessions. Every single time, you’re paying to process that entire document and you’re waiting for every single token to be processed. It’s like photocopying a book every time you want to ask a question about a specific page. It’s absurdly wasteful. Prompt caching is the three-hole punch and binder that fixes this.

30.4 Extended Thinking: Eliciting Deep Reasoning

Alright, let’s get into the real magic: making Claude think. Not just the surface-level, pattern-matching stuff, but the deep, chain-of-thought reasoning that makes this model feel so different. This isn’t about tricking the API; it’s about structuring your conversation to properly utilize the architecture it was built on. The Power of the Scratchpad The single most effective technique for eliciting deep reasoning is to explicitly give Claude a workspace. Think of it like this: you’re asking a brilliant colleague a hard question. They don’t just stare into the middle distance and then blurt out a perfect answer. They grab a whiteboard, jot down facts, reason step by step, cross things out, and then present their conclusion.

30.3 Tool Use: Defining Tools and Handling Tool Results

Right, let’s get our hands dirty with Claude’s Tool Use. This isn’t just about making an API call; it’s about teaching Claude to be your highly capable, slightly pedantic, software intern. The core idea is simple: you define functions (tools) that Claude can call, and it returns the results to you. But the devil, as always, is in the details. Defining Your Tools: The Schema is the Contract First, you need to tell Claude what tools are available. You do this by passing a list of tool definitions in the tools parameter. Each tool is a JSON schema that describes a function. This schema is your contract with Claude. Be excruciatingly specific here. Vague contracts lead to confused AIs.

30.2 Messages API: System Prompts, Human and Assistant Turns

Alright, let’s talk about the Messages API. This is the core of how you actually talk to Claude. Forget the old, simplistic “single prompt” model. This is a conversation, and getting that right is 90% of the battle. The API is built around a messages array, where each object represents a turn in the conversation. It feels natural because it is natural—it’s how we communicate. The Three Roles: System, Human, Assistant Every message in the array must have a role property. This isn’t just bureaucratic labeling; it tells Claude who is speaking and, more importantly, how to interpret that text. There are three roles, and they are not created equal.

30.1 Claude Model Family: Haiku, Sonnet, and Opus

Alright, let’s talk about the main event: the Claude model family. You’ve got three options—Haiku, Sonnet, and Opus—and your job is to know which one to pick for the task at hand. This isn’t just about picking the “smartest” one; it’s about picking the right tool. Using Opus to summarize a tweet is like using a particle accelerator to crack a walnut. Impressive? Sure. A grotesque waste of resources? Absolutely.

30. Anthropic Claude API: Prompting, Tools, and Caching

28.8 Agent Evaluation and Safety

Alright, let’s get real about agent evaluation and safety. This isn’t some academic footnote; it’s the difference between building a useful assistant and unleashing a digital Rube Goldberg machine that accidentally spends your entire AWS budget on cat food subscriptions. We’re not just teaching agents to use tools; we’re teaching them to use them responsibly. This is where the rubber meets the road, or more accurately, where the LLM meets the API that can actually change things in the real world.

28.7 AutoGen and CrewAI: Multi-Agent Frameworks

Right, so you’ve got your single agent doing its ReAct thing, calling a tool, and feeling pretty clever. But let’s be honest, most real-world problems aren’t solved by one brilliant mind working in isolation. They’re solved by a group of specialists, some arguing, some delegating, and at least one making the coffee. Welcome to the wonderfully chaotic world of multi-agent systems. Frameworks like AutoGen and CrewAI exist to manage this chaos for you. They provide the scaffolding to define different agent personas, give them specific tools, and—most importantly—orchestrate the conversation between them. Think of it as being a director for a play where the actors are LLM instances and they’re all prone to going wildly off-script.

28.6 Multi-Agent Systems: Collaboration, Competition, and Communication

Right, so you’ve got your single agent doing its ReAct thing, using tools, feeling pretty clever. But let’s be honest, most real-world problems aren’t solved by a single brilliant mind working in isolation. They’re solved by teams, committees, and groups of specialists who (ideally) collaborate, (sometimes) bicker, and (occasionally) produce something greater than the sum of their parts. Welcome to multi-agent systems, where we take that single-agent brain and copy-paste it a few times to see what beautiful—or horrifying—chaos ensues.

28.5 Planning Agents: MRKL, Toolformer, HuggingGPT

Alright, let’s get our hands dirty with planning agents. You’ve seen the basic ReAct loop, which is like a friend who thinks out loud before doing something. Planning agents are that friend on a triple espresso, with a whiteboard and a disturbingly detailed Gantt chart. They don’t just plan the next action; they plan a whole sequence of them, often breaking your big, scary problem into smaller, chewable pieces before they even reach for a single tool.

28.4 Memory in Agents: Short-Term, Long-Term, Episodic

Right, let’s talk about memory. Because without it, your AI agent is just a glorified, one-shot API call with amnesia. It’s the difference between a colleague who remembers the entire project history and a new intern you have to re-introduce yourself to every single morning. The core problem is context windows. LLMs have a shockingly short attention span. You’re basically trying to fit the entire plot of War and Peace into a tweet. We combat this with a strategy you’re already familiar with: not remembering everything, but remembering the right things. We break it down into three key types.

28.3 Tool Use: Function Calling and MCP

Right, let’s talk about getting these LLMs to actually do things. You see, an AI that can only talk is like a brilliant philosopher locked in a sensory deprivation tank. They can reason about the world, but they can’t interact with it. Their knowledge is frozen in time, limited to their training data. They can’t tell you the weather, can’t look up your latest database entry, and can’t book you a flight to Tahiti. This is where Tool Use, often called Function Calling, comes in. It’s the mechanism we use to give our boxed-in intellects a set of hands.

28.2 ReAct: Reasoning + Acting in Interleaved Steps

Right, let’s talk about ReAct. You’ve probably hit the wall with standard LLM prompting. You ask a question, it gives you an answer that sounds plausible but is, in fact, a beautiful and confident hallucination. It’s like asking for directions from a poet. ReAct is our first solid attempt to fix that by giving the model a way to do things to find the answer, not just make one up.

28.1 What Is an AI Agent? Perception, Planning, Action

Right, let’s cut through the marketing fluff. When I say “AI agent,” I’m not talking about a chrome-plated automaton that’s going to file your TPS reports. At its core, an agent is just a program that doesn’t just think—it does. It takes a high-level goal from you, like “find the best price for a new graphics card,” and breaks it down into a series of steps, using tools (like a web browser or a calculator) to execute them. It’s the difference between a student who memorizes the textbook and one who actually knows how to use the library, the lab, and a decent search engine.