81.5 Text Classification, NER, and Question Answering

Alright, let’s get our hands dirty. You’ve probably heard that NLP is “solved” thanks to these big fancy models. Spoiler alert: it’s not. But what is true is that the barrier to entry has been demolished, and you can now build shockingly powerful text applications without needing a PhD and a million-dollar GPU cluster. We’re going to walk through the three workhorses of applied NLP: classifying text, finding entities within it, and making it answer questions.

The Basics: Pipelines are Your Best Friend

Hugging Face’s transformers library is brilliant because it understands you don’t want to mess with model architectures, loss functions, or tokenizer vocabularies. You just want to throw text at a wall and see what sticks. That’s where the pipeline function comes in. It’s a one-liner that handles the entire process: tokenization, model inference, and output formatting. It’s the duct tape and WD-40 of NLP, and I mean that as the highest compliment.

Think of a pipeline as a pre-configured factory. You tell it the task (“text-classification”, “ner”, “question-answering”), and it pulls a sensible default model, knows how to tokenize the input for that model, runs the prediction, and gives you back a clean, usable result. It’s the fastest way to go from zero to useful.

from transformers import pipeline

# Zero to sentiment analysis in one line. It doesn't get easier than this.
classifier = pipeline("text-classification")
result = classifier("I'm not just saying this, but this library is fantastic!")
print(result)
# [{'label': 'POSITIVE', 'score': 0.9998}]

See? The model is confident. And rightly so. The key takeaway here is the score. Never, ever just take the label. The score is the probability the model assigns to that label, and while it isn’t a perfectly calibrated measure of confidence, it’s crucial for spotting borderline predictions that are probably wrong. A “POSITIVE” with a score of 0.51 is basically the model shrugging and guessing.
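Acting on the score can be as simple as a threshold check. Here is a minimal sketch, assuming predictions shaped like the classifier output above; the route_prediction helper, the 0.8 cutoff, and the "NEEDS_REVIEW" label are all made up for illustration, and the inputs are hand-written rather than real model output.

```python
def route_prediction(prediction, threshold=0.8):
    """Return the label if its score clears the threshold, else 'NEEDS_REVIEW'."""
    if prediction["score"] >= threshold:
        return prediction["label"]
    return "NEEDS_REVIEW"


# Hand-written dicts in the same shape the classifier returns (not real model output).
examples = [
    {"label": "POSITIVE", "score": 0.9998},
    {"label": "POSITIVE", "score": 0.51},
]

decisions = [route_prediction(p) for p in examples]
print(decisions)
# ['POSITIVE', 'NEEDS_REVIEW']
```

The right cutoff depends on your data and on how expensive a wrong label is, so treat 0.8 as a placeholder to tune, not a recommendation.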

Diving into Named Entity Recognition (NER)

NER is the task of finding and categorizing real-world objects in text: people, organizations, locations, etc. It’s incredibly useful for pulling structured data out of unstructured text, like scraping news articles or processing legal documents.

The pipeline for this is just as simple, but the output is more interesting.

ner_pipeline = pipeline("ner")
text = "My name is John Doe and I work for Apple in Cupertino."
entities = ner_pipeline(text)
print(entities)

Run that and you get a list of dictionaries, one per token, which looks messy at first. This is one of those rough edges. The model predicts labels for tokens, not whole words. “Apple” might be one token, but “Cupertino” could be split into ['Cu', '##per', '##tino']. By default, the pipeline gives you this raw per-token output. For a cleaner view, either post-process it yourself to merge the subwords or pass aggregation_strategy="simple" when you build the pipeline (the older grouped_entities=True flag did much the same job). Either way, it’s a good example of the kind of thing you need to be aware of. The model isn’t thinking in words; it’s thinking in tokens, and that abstraction leaks.
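To make the subword problem concrete, here is a rough sketch of the do-it-yourself merge. The merge_wordpieces helper is hypothetical, the input is hand-written to mimic raw per-token output for the sentence above (the scores are invented), and the logic only glues “##” pieces back onto the token before them. It does not join multi-word entities such as “John Doe”.

```python
def merge_wordpieces(tokens):
    """Glue '##' continuation pieces onto the token that precedes them."""
    merged = []
    for tok in tokens:
        if tok["word"].startswith("##") and merged:
            prev = merged[-1]
            prev["word"] += tok["word"][2:]  # drop the '##' marker
            prev["end"] = tok["end"]         # extend the character span
            prev["score"] = min(prev["score"], tok["score"])  # keep the weakest score
        else:
            merged.append(dict(tok))  # copy so the input list is left untouched
    return merged


text = "My name is John Doe and I work for Apple in Cupertino."

# Hand-written, shaped like raw per-token NER output; scores are invented.
raw = [
    {"entity": "I-ORG", "score": 0.99, "word": "Apple", "start": 35, "end": 40},
    {"entity": "I-LOC", "score": 0.99, "word": "Cu", "start": 44, "end": 46},
    {"entity": "I-LOC", "score": 0.97, "word": "##per", "start": 46, "end": 49},
    {"entity": "I-LOC", "score": 0.98, "word": "##tino", "start": 49, "end": 53},
]

entities = merge_wordpieces(raw)
print([(e["word"], e["entity"]) for e in entities])
# [('Apple', 'I-ORG'), ('Cupertino', 'I-LOC')]
```

In practice, building the pipeline with aggregation_strategy="simple" is usually the better route, since it also groups adjacent tokens that belong to the same entity.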

The Power of Question Answering

This one feels like magic the first time you see it. You provide a context (a text blob) and a question, and the model finds the answer within that context. It’s not doing open-world knowledge retrieval; it’s literally reading comprehension. This is the tech behind everything from automated support systems to quickly querying documents.

qa_pipeline = pipeline("question-answering")
context = """
Hugging Face is a company based in New York City. It is focused on artificial intelligence and natural language processing. The company is best known for its Transformers library, which provides thousands of pre-trained models for NLP tasks.
"""
question = "Where is Hugging Face based?"
answer = qa_pipeline(question=question, context=context)
print(f"Answer: '{answer['answer']}', score: {answer['score']:.4f}")
# Answer: 'New York City', score: 0.9987

Why is this so effective? The model is trained to predict a start position and an end position in the context, and the span between them is the answer. It’s not generating new text; it’s highlighting the relevant part. This is why the score is, again, critical. By default the pipeline hands back some span no matter what, so a low score is your signal that it couldn’t find a good one and that the answer probably isn’t in the context you provided. This is your first line of defense against a bogus answer: the model can only “make up” an answer by choosing a span that doesn’t actually answer the question. It can’t fabricate text out of whole cloth in this setup.
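The span-picking idea can be sketched in a few lines. This is a simplification, not the pipeline’s actual implementation: it assumes you already have one start score and one end score per token, and it picks the pair with the highest combined score, subject to the start not coming after the end. The best_span helper, the max_answer_len default, and the scores below are all invented for illustration.

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end, score) with the highest start + end score, start <= end."""
    best = None
    for s, s_score in enumerate(start_scores):
        # Only consider ends at or after the start, within the length limit.
        last = min(s + max_answer_len, len(end_scores))
        for e in range(s, last):
            total = s_score + end_scores[e]
            if best is None or total > best[2]:
                best = (s, e, total)
    return best


tokens = ["Hugging", "Face", "is", "based", "in", "New", "York", "City", "."]

# Made-up per-token scores; a real model would produce these.
start_scores = [0.1, 0.0, 0.0, 0.0, 0.2, 4.0, 1.0, 0.5, 0.0]
end_scores = [0.0, 0.2, 0.0, 0.0, 0.0, 0.5, 1.5, 4.5, 0.1]

start, end, score = best_span(start_scores, end_scores)
print(" ".join(tokens[start:end + 1]))
# New York City
```

Notice that the function always returns something, even when every score is low. That is exactly why the score, not just the answer text, has to drive your application logic.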

Best Practices and Pitfalls

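Mind the Context Length: Every model has a maximum context length (often 512 tokens for BERT-style models). What happens when your text is longer depends on the task and your settings: the input may be truncated (and you lose information without being told), the call may fail outright, or, in the case of question answering, the pipeline may split the context into overlapping windows for you. Don’t guess which one you’re getting. For long documents, have an explicit strategy: split the doc into chunks, process them separately, then aggregate the results. It’s a pain, but there’s no way around it.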
Picking the Right Model: The default pipelines use a general-purpose model, and the library itself warns against relying on an unspecified default in production. For better performance, specify one fine-tuned for your domain. Working with legal text? Search the Hub for an NER model fine-tuned on legal documents. The pipeline function takes a model parameter. Use it.
```
# Name the checkpoint explicitly instead of relying on whatever the default is.
specific_ner = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english")
```
The Confidence Score is Your Co-pilot: Build your application logic around the score. Set confidence thresholds, and tune them on your own data. If an entity has a score below 0.9, maybe flag it for human review. If a QA score is below 0.5, respond with “I couldn’t find a confident answer in the document.” Ignoring this is a common reason janky, unreliable NLP demos turn into production nightmares.
Hardware Matters: These models are big. Running them on a laptop CPU is slow. For real work, you need a GPU. Whether the pipeline picks up a GPU on its own depends on your transformers version, so be explicit: pass device=0 to use the first CUDA device. If you’re serious about this, get access to one, even if it’s just a Colab notebook. The speed difference is night and day.
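Coming back to the context-length point, the split-then-aggregate strategy can be sketched as follows. Treat it as an illustration under assumptions: chunk_tokens and best_answer are hypothetical helpers, the window sizes are arbitrary, and a real version would count tokens with the model’s own tokenizer rather than work on a plain list.

```python
def chunk_tokens(tokens, max_len=512, stride=128):
    """Split tokens into windows of up to max_len that overlap by `stride` tokens."""
    if not 0 <= stride < max_len:
        raise ValueError("stride must be >= 0 and smaller than max_len")
    step = max_len - stride
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def best_answer(answers):
    """Aggregate per-chunk QA results by keeping the highest-scoring one."""
    return max(answers, key=lambda a: a["score"])


# Tiny demo: 10 stand-in "tokens", windows of 4 that overlap by 1.
windows = chunk_tokens(list(range(10)), max_len=4, stride=1)
print(windows)
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]

# Hand-written per-chunk results, shaped like question-answering output.
per_chunk = [
    {"answer": "a company", "score": 0.12},
    {"answer": "New York City", "score": 0.97},
]
print(best_answer(per_chunk)["answer"])
# New York City
```

The overlap matters: without it, an answer or entity that straddles a chunk boundary gets cut in half and neither chunk sees it whole.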