25.2 LLM and ChatModel Wrappers

Right, so you want to talk to an LLM. Your first instinct might be to just import openai and start firing off HTTP requests. Don’t. That’s how you end up with a spaghetti code monster of API keys, retry logic, and output parsing that’ll haunt your dreams. LangChain’s first and most fundamental gift to you is the LLM and ChatModel wrappers. Think of them as your brilliant, slightly pedantic assistant who handles the tedious bits so you can focus on the actual logic.

These wrappers exist primarily to give you a consistent, unified interface to what is, behind the scenes, a chaotic menagerie of different APIs. Whether it’s OpenAI, Anthropic, Cohere, or some open-weight model you’re running on your own laptop in a closet, you talk to them all the same way. This is a bigger deal than it sounds. It means you can swap your model provider without rewriting your entire application—a lifesaver when GPT-4 is having a bad day and you need to quickly pivot to Claude.

The Two Flavors: LLM vs. ChatModel

This is the first thing you need to grok. LangChain draws a crucial, albeit sometimes blurry, distinction between two types of models:

LLM: This is for your standard, plain-text completion models. You give it a string prompt, it gives you a string completion. It’s the digital equivalent of a very smart, very verbose autocomplete. Think the older text-davinci-003 model or a raw open-source model like Llama.
ChatModel: This is for models that are explicitly designed for back-and-forth conversation, like gpt-4-turbo or claude-3-opus. These expect structured messages rather than a raw string.

Why the distinction? Because chat models are trained with special tokens that separate system instructions, human queries, and AI responses. Using the ChatModel wrapper forces you to structure your input in a way the model expects, which almost always leads to better, more predictable results. Unless you’re doing something extremely simple, you’ll probably use ChatModel 95% of the time.

Talking to a Completion Model (LLM)

Let’s start with the simpler, older way. You instantiate an LLM wrapper with your chosen provider and parameters. The main method here is .invoke()—a nice, general term for “go do the thing.”

from langchain.llms import OpenAI

# This is the old-school way. It still works, but you'll mostly see ChatModels now.
llm = OpenAI(model="gpt-3.5-turbo-instruct")  # Note: This is the COMPLETION endpoint, not chat.

prompt = "Explain the concept of recursion to a five-year-old in one sentence."
completion = llm.invoke(prompt)

print(completion)
# Output: "Recursion is like a story that tells itself inside itself until the story is done."

Simple, right? But notice the model name. Using the completion endpoint of gpt-3.5-turbo is a bit like using a sports car to haul gravel. It works, but it’s not what it was built for.

The Right Way: Using a ChatModel

This is where we move to structured messages. Instead of a raw string, you pass a list of message objects. The most important ones are HumanMessage (you), AIMessage (the model), and SystemMessage (the context-setting overlord).

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

chat = ChatOpenAI(model="gpt-4-turbo-preview", temperature=0.7)

messages = [
    SystemMessage(content="You are a cynical, world-weary detective from a 1940s film noir. Answer all questions in character."),
    HumanMessage(content="Who stole the moon?")
]

response = chat.invoke(messages)
print(response.content)
# Output: "The moon? That dame's been gone for weeks. I got a tip from a stoolie down at the docks... said he saw a slick character in a pinstripe suit loading it onto a freighter headed for Timbuktu. Follow the money, see? It's always about the money."

The SystemMessage is your secret weapon. It’s how you set the personality, tone, and rules of engagement for the model without cluttering up the actual user’s question. This is a best practice you should always use. The .invoke() method returns an AIMessage object, so you almost always want .content to get the actual string out.

Beyond `.invoke()`: Batch and Stream

The .invoke() method is for one-off requests. But what if you have a list of 100 prompts to run? You don’t want to wait for each one sequentially. That’s what .batch() is for.

# A list of prompts for batch processing
questions = [
    "What's the capital of France?",
    "What's the capital of Germany?",
    "What's the capital of Nepal?"
]

# Create a HumanMessage for each question
human_messages = [[HumanMessage(content=q)] for q in questions]

# .batch() takes a list of inputs (which are themselves lists of messages)
responses = chat.batch(human_messages)

for response in responses:
    print(response.content)
# Output: Paris, Berlin, Kathmandu

And for user experience, you never want them staring at a spinning wheel for 10 seconds. You want words to appear as they’re generated. That’s streaming.

for chunk in chat.stream(messages):
    print(chunk.content, end="", flush=True)
# Output streams word-by-word: "The... moon?... That... dame's... been..."

The Rough Edges and Pitfalls

Now, the real talk. This abstraction is great, but it’s leaky.

Provider Quirks: Every model has its own weird parameters. temperature and max_tokens are mostly standard, but what if you want to set Anthropic’s max_tokens_to_sample or OpenAI’s presence_penalty? You have to drop down to the provider-specific model_kwargs parameter. It works, but it breaks the beautiful abstraction.
```
chat = ChatOpenAI(model_name="gpt-4", model_kwargs={"top_p": 0.95, "presence_penalty": 0.6})
```
The AIMessage Rabbit Hole: The response isn’t a string; it’s an object. This is powerful later when we deal with chains (because chains pass entire messages around for context), but for now, forgetting to call .content is a rite of passage. You’ll do it at least once.
Costly Mistakes: A naive batch() call with 1000 prompts will fire off 1000 API calls immediately. If you’re not careful with your billing limits, you might accidentally fund OpenAI’s next data center. Always test with small batches first.

The wrapper doesn’t make the underlying models any less stochastic or confusing. It just gives you a cleaner, more robust way to manage the conversation. It’s the foundation. Everything else in LangChain—the chains, the agents, the memory—is built on top of this simple idea of invoking a model with a list of messages. Master this, and the rest becomes a lot less scary.

The Two Flavors: LLM vs. ChatModel

Talking to a Completion Model (LLM)

The Right Way: Using a ChatModel

Beyond .invoke(): Batch and Stream

The Rough Edges and Pitfalls

Beyond `.invoke()`: Batch and Stream