Transformers | mikePietsch.com

25.7 Building a Simple Code Generator

Right, so you want to build a code generator. Not just any code generator, but one that understands the structure of the code it’s manipulating. You’re not just concatenating strings like a barbarian; you’re a sculptor, and the TypeScript Compiler API is your chisel. It’s the difference between sending a mass email and writing a personal letter. The former might get the job done, but the latter is correct, robust, and doesn’t accidentally call the recipient by the wrong name.
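
To make the chisel metaphor a little more concrete, here is a minimal sketch of generating code from AST nodes instead of strings, assuming TypeScript 4.0 or later (where ts.factory is available). The helper names buildConstExport and printStatement, and the file name generated.ts, are made up for illustration.

```typescript
import * as ts from "typescript";

// Hypothetical helper: builds `export const <name> = <value>;` as AST nodes.
function buildConstExport(name: string, value: number): ts.Statement {
  const declaration = ts.factory.createVariableDeclaration(
    name,
    undefined, // no definite-assignment `!`
    undefined, // no explicit type annotation
    ts.factory.createNumericLiteral(value),
  );
  return ts.factory.createVariableStatement(
    [ts.factory.createModifier(ts.SyntaxKind.ExportKeyword)],
    ts.factory.createVariableDeclarationList([declaration], ts.NodeFlags.Const),
  );
}

// Hypothetical helper: turns a synthesized statement back into source text.
function printStatement(statement: ts.Statement): string {
  // The printer wants a source file for context; an empty one is enough here.
  const file = ts.createSourceFile(
    "generated.ts",
    "",
    ts.ScriptTarget.Latest,
    false,
    ts.ScriptKind.TS,
  );
  const printer = ts.createPrinter({ newLine: ts.NewLineKind.LineFeed });
  return printer.printNode(ts.EmitHint.Unspecified, statement, file);
}

const generated = printStatement(buildConstExport("answer", 42));
console.log(generated);
```

Because the printer decides quoting, spacing, and semicolons, the generator never has to escape anything by hand, which is most of the point of skipping string concatenation.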

25.6 ts-morph: A Higher-Level API for AST Manipulation

Alright, let’s get our hands dirty. You’ve met the raw TypeScript Compiler API. It’s powerful, but let’s be honest, it feels like you’re trying to perform open-heart surgery with a rusty spoon while wearing oven mitts. The API is low-level, verbose, and requires you to constantly check node.kind against ts.SyntaxKind (or call a type guard) to figure out what you’re even looking at. It’s a masterpiece of engineering, but a nightmare of ergonomics. This is where ts-morph swoops in like a superhero in a nicely tailored suit. It’s a library that wraps the raw Compiler API, giving you all its power but with a sane, object-oriented, and downright pleasant interface. Instead of dealing with ts.SyntaxKind.SomeObscureEnum and ts.isCallExpression(node), you work with clear classes like CallExpression. It’s the difference between assembling a car from a pile of parts and simply turning the key.

25.5 Writing a Custom Transformer

Right, so you want to mess with the very fabric of your code as it’s being compiled. Not content with just writing TypeScript, you want to reach into the compiler’s guts and twist the knobs. I respect that. It’s how you build the next generation of linters, code formatters, and those fancy tools that feel like magic (until you have to debug them). We call this a “Custom Transformer,” and it’s your VIP pass to the Abstract Syntax Tree (AST) party. The AST is the compiler’s internal, object-oriented representation of your code. A transformer’s job is to walk through this tree, find the nodes it cares about, and then optionally replace, update, or delete them to produce a new, transformed tree. It’s like performing surgery on your code while it’s still just a thought in the compiler’s brain.
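
The walk-and-replace idea can be sketched in a few lines. This is only an illustration, not a production transformer: the rename rule (every identifier named foo becomes bar) and the names renameFooToBar and applyRename are invented for the example, and it runs the transformer standalone through ts.transform rather than hooking it into a real build.

```typescript
import * as ts from "typescript";

// Hypothetical transformer: renames every identifier `foo` to `bar`.
const renameFooToBar: ts.TransformerFactory<ts.SourceFile> = (context) => {
  const visit = (node: ts.Node): ts.Node => {
    if (ts.isIdentifier(node) && node.text === "foo") {
      // Return a new node instead of mutating the existing one.
      return context.factory.createIdentifier("bar");
    }
    // Recurse; unchanged subtrees are reused as-is.
    return ts.visitEachChild(node, visit, context);
  };
  return (sourceFile) => ts.visitEachChild(sourceFile, visit, context);
};

// Hypothetical helper: parse, transform, and print a snippet in one go.
function applyRename(code: string): string {
  const source = ts.createSourceFile("input.ts", code, ts.ScriptTarget.Latest, true);
  const result = ts.transform(source, [renameFooToBar]);
  try {
    const printer = ts.createPrinter({ newLine: ts.NewLineKind.LineFeed });
    return printer.printFile(result.transformed[0]);
  } finally {
    result.dispose();
  }
}

const transformedCode = applyRename("const foo = 1;\nconsole.log(foo);\n");
console.log(transformedCode);
```

Note that this rename is purely syntactic: it would also rewrite a property called foo on an unrelated object. A transformer that needs to know which foo it is looking at has to consult the type checker.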

25.4 Extracting Type Information from the Type Checker

Alright, let’s get our hands dirty. You’ve got a ts.Program and its trusty sidekick, the Type Checker (program.getTypeChecker()). This isn’t some glorified linter; this is the engine room of the entire language service. It’s the thing that actually knows what string is, why your generic is failing, and that the property you’re trying to access on that object definitely, probably, doesn’t exist. Its job is to take all those abstract syntax trees and turn them into a coherent web of types.
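
Here is a small sketch of asking the checker what it knows, assuming a single in-memory file. The helper names programFromSource and describeVariableTypes, the file name example.ts, and the compiler options are all assumptions made for the example; the checker calls themselves (getTypeAtLocation, typeToString) are part of the public API.

```typescript
import * as ts from "typescript";

// Hypothetical helper: a Program over one virtual file, with lib files
// still resolved from disk by the default compiler host.
function programFromSource(fileName: string, code: string): ts.Program {
  const options: ts.CompilerOptions = {
    strict: true,
    target: ts.ScriptTarget.ES2020,
    noEmit: true,
    types: [], // skip automatic @types loading
  };
  const host = ts.createCompilerHost(options);
  const diskGetSourceFile = host.getSourceFile.bind(host);
  const diskFileExists = host.fileExists.bind(host);
  const diskReadFile = host.readFile.bind(host);
  host.getSourceFile = (name, languageVersion, onError, shouldCreate) =>
    name === fileName
      ? ts.createSourceFile(name, code, languageVersion, true)
      : diskGetSourceFile(name, languageVersion, onError, shouldCreate);
  host.fileExists = (name) => name === fileName || diskFileExists(name);
  host.readFile = (name) => (name === fileName ? code : diskReadFile(name));
  return ts.createProgram([fileName], options, host);
}

// Maps each simple variable name in the file to its inferred type, as text.
function describeVariableTypes(code: string): Record<string, string> {
  const program = programFromSource("example.ts", code);
  const checker = program.getTypeChecker();
  const source = program.getSourceFile("example.ts");
  const result: Record<string, string> = {};
  if (!source) return result;

  const visit = (node: ts.Node): void => {
    if (ts.isVariableDeclaration(node) && ts.isIdentifier(node.name)) {
      const type = checker.getTypeAtLocation(node.name);
      result[node.name.text] = checker.typeToString(type);
    }
    ts.forEachChild(node, visit);
  };
  visit(source);
  return result;
}

const inferredTypes = describeVariableTypes('let count = 1;\nlet label = "n=" + count;\n');
console.log(inferredTypes);
```

None of these types are written in the source text; the checker infers them, which is exactly the information a syntax-only walk cannot give you.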

25.3 Walking the AST: ts.Node, ts.SyntaxKind

Alright, let’s get our hands dirty. You’ve got a TypeScript program in memory, parsed into an Abstract Syntax Tree (AST). It’s a beautiful, terrifying, and deeply nested structure of objects. Your job is to traverse it, find the bits you care about, and do something useful. This isn’t about reading the file as text; it’s about understanding its meaning programmatically. To do that, you need to know two things intimately: ts.Node and ts.SyntaxKind.
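
As a minimal sketch of that traversal, the snippet below parses a string into a ts.SourceFile and recurses with ts.forEachChild, using a type guard to pick out function declarations. The function names collectFunctionNames and kindName and the sample input are invented for illustration.

```typescript
import * as ts from "typescript";

// Collects the names of all function declarations, nested ones included.
function collectFunctionNames(code: string): string[] {
  const source = ts.createSourceFile("sample.ts", code, ts.ScriptTarget.Latest, true);
  const names: string[] = [];

  const visit = (node: ts.Node): void => {
    // The type guard checks node.kind and narrows the static type.
    if (ts.isFunctionDeclaration(node) && node.name) {
      names.push(node.name.text);
    }
    ts.forEachChild(node, visit); // depth-first walk over the children
  };

  visit(source);
  return names;
}

// Readable name for a node's kind, handy when exploring an unfamiliar tree.
function kindName(node: ts.Node): string {
  return ts.SyntaxKind[node.kind];
}

const sampleCode = "function outer() { function inner() {} }\nfunction other() {}\n";
const functionNames = collectFunctionNames(sampleCode);
const rootKind = kindName(ts.createSourceFile("empty.ts", "", ts.ScriptTarget.Latest));
console.log(functionNames, rootKind);
```

One caveat with the ts.SyntaxKind reverse lookup: a few kinds share their numeric value with range markers such as FirstStatement, so the printed name is occasionally the marker rather than the kind you expected.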

25.2 Creating a Program and Type Checker

Right, let’s get our hands dirty. You’ve parsed a file, you’ve looked at its AST, and you feel like a wizard. But a single file is a lonely island. In the real world, TypeScript understands your code by seeing the whole archipelago—every file, every dependency, every declaration. That holistic view is encapsulated in a ts.Program. This isn’t just a fancy concept; it’s the beating heart of the compiler API. It’s the in-memory representation of your entire project, and without it, you’re just playing with syntax trees in a vacuum.
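
As a rough sketch of what that looks like in practice, the snippet below builds a ts.Program around a single in-memory file and asks it for diagnostics. The file name example.ts, the helper name createInMemoryProgram, and the chosen compiler options are assumptions made for illustration; a real project would more likely take its file list and options from tsconfig.json.

```typescript
import * as ts from "typescript";

// Hypothetical helper: wraps the default compiler host so that one virtual
// file is served from memory while lib files still come from disk.
function createInMemoryProgram(fileName: string, code: string): ts.Program {
  const options: ts.CompilerOptions = {
    strict: true,
    target: ts.ScriptTarget.ES2020,
    noEmit: true,
    types: [], // skip automatic @types loading to keep the sketch predictable
  };
  const host = ts.createCompilerHost(options);
  const readFromDisk = host.getSourceFile.bind(host);
  const existsOnDisk = host.fileExists.bind(host);
  const readTextFromDisk = host.readFile.bind(host);

  host.getSourceFile = (name, languageVersion, onError, shouldCreate) =>
    name === fileName
      ? ts.createSourceFile(name, code, languageVersion, true)
      : readFromDisk(name, languageVersion, onError, shouldCreate);
  host.fileExists = (name) => name === fileName || existsOnDisk(name);
  host.readFile = (name) => (name === fileName ? code : readTextFromDisk(name));

  return ts.createProgram([fileName], options, host);
}

// A deliberately broken file, so the program has something to complain about.
const demoProgram = createInMemoryProgram(
  "example.ts",
  'const total: number = "not a number";\n',
);

// Syntactic and semantic errors across the whole program, flattened to text.
const demoDiagnostics = ts.getPreEmitDiagnostics(demoProgram).map((d) => ({
  code: d.code,
  message: ts.flattenDiagnosticMessageText(d.messageText, "\n"),
}));
console.log(demoDiagnostics);
```

The type mismatch only shows up because the program pulls in the standard library declarations and checks the file against them, which a lone ts.createSourceFile call never does.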

25.1 Why Use the Compiler API: Linters, Codemods, Generators

Look, you don’t reach for the TypeScript Compiler API because you had a nice, normal day and thought, “You know what sounds relaxing?” You reach for it when you have a problem that can’t be solved by just writing more TypeScript. It’s the power tool for when you need to not just use the language, but understand it, manipulate it, and generate it programmatically. Think of it as the difference between driving a car and being a mechanic with a full diagnostic computer. Most of us just need to drive. But when you need to tune the engine or, heaven forbid, build a new car from scratch, you need the mechanic’s tools.

25. The TypeScript Compiler API

30. Hugging Face Transformers Integration

21. MLlib: Pipelines, Transformers, Estimators, and Feature Engineering

81.8 Datasets Library: Loading and Processing Large Datasets

Right, let’s talk about data. It’s the unglamorous, often-messy fuel for our beautiful AI models. You can have the slickest architecture ever designed, but if you feed it garbage, it will, with unwavering commitment, produce super-intelligent garbage. This is where Hugging Face’s datasets library swoops in, not just as a convenient tool, but as a full-on paradigm shift for how we handle data in Python. Forget pandas for a second (I know, it’s a lot to ask), because when your dataset is larger than your laptop’s RAM, pandas gracefully throws a MemoryError and gives up. The datasets library, by contrast, memory-maps Apache Arrow files from disk instead of loading everything into RAM, so at that size it’s just getting started.

81.7 Pillow: Image Manipulation in Pure Python

Right, let’s talk about Pillow. You know, the friendly fork of the now-defunct Python Imaging Library (PIL). If you need to open, manipulate, and save images in Python without summoning the eldritch horrors of OpenCV’s C++ bindings, Pillow is your first, last, and best port of call. It’s not the fastest kid on the block, and the core is C under the hood, but the API you touch is plain Python, beautifully straightforward, and it gets the job done. Think of it as the trusty multi-tool in your image-processing kit.

81.6 OpenCV: Reading Images, Transformations, and Feature Detection

Right, let’s talk OpenCV. It’s the grumpy, battle-hardened old wizard of computer vision. It’s not always pretty, its API is a historical record of two decades of computer science Ph.D. theses, and it will absolutely let you shoot yourself in the foot if you’re not careful. But it’s also incredibly powerful, fast, and reliable. Think of it as the C++ of vision libraries: it might not hold your hand, but it will get the job done with brutal efficiency.

81.5 Text Classification, NER, and Question Answering

Alright, let’s get our hands dirty. You’ve probably heard that NLP is “solved” thanks to these big fancy models. Spoiler alert: it’s not. But what is true is that the barrier to entry has been demolished, and you can now build shockingly powerful text applications without needing a PhD and a million-dollar GPU cluster. We’re going to walk through the three workhorses of applied NLP: classifying text, finding entities within it, and making it answer questions.

81.4 Fine-Tuning with the Trainer API

Alright, let’s get our hands dirty. You’ve probably loaded a pre-trained model and run some inference, which feels like magic for about five minutes. Then the reality sets in: this generic model doesn’t know your specific problem, your data, your life. It’s like getting relationship advice from a stranger who’s never met you or your questionable partner. Fine-tuning is how you make that generic model your brilliant, specialized colleague. The good news is that Hugging Face’s Trainer API does the heavy lifting for you. It’s a beautifully abstracted training loop that handles all the boilerplate—GPU setup, gradient accumulation, logging, checkpointing, you name it. The bad news is that this abstraction can feel like a black box if you don’t know what levers to pull. Let’s open it up.

81.3 Hugging Face Transformers: Loading Pretrained Models

Right, let’s get our hands dirty. You’ve heard the hype, you’ve seen the demos, and now you want to actually use one of these so-called “transformers.” Welcome to the main event. Hugging Face’s transformers library is the reason a lot of us can actually do this without needing a PhD and a bank loan for compute time. It’s a brilliantly engineered abstraction layer over a frankly absurd number of pretrained models. Our first job is to stop staring at the menu and actually get a model into your Python runtime.

81.2 spaCy: Industrial-Strength NLP Pipelines

Alright, let’s talk about spaCy. If NLTK is the academic’s dusty toolkit—full of interesting but often impractical prototypes—then spaCy is the mechanic’s rollaway, stocked with precisely calibrated, industrial-grade tools. It’s built for one thing: getting real work done, fast and reliably. It doesn’t mess around with theory; it loads a model and gives you a pipeline of annotations so rich and interconnected you’ll feel like you just put on night-vision goggles for your text data.

81.1 NLTK: Tokenization, Stemming, POS Tagging, and Corpora

Before we dive into the fancy deep learning stuff, we need to talk about the fundamentals. And for that, we’re going to spend some quality time with NLTK, the Natural Language Toolkit. Think of it not as the shiny new power tool, but as the rock-solid, slightly-scuffed-but-infinitely-reliable toolbox your grandpa gave you. It’s where you learn the why before you rely on the wow of modern transformers. Hugging Face’s transformers library is incredible, but it often feels like magic. NLTK is where the magicians learn how the tricks are actually done. It provides the essential utilities—tokenization, stemming, part-of-speech tagging—that are the bedrock of any NLP task, even if they’re now happening under the hood of a billion-parameter model.