Json | mikePietsch.com

25.6 log/slog: Structured Logging Built Into the Standard Library

Finally. For years, logging in Go felt like we were all collectively duct-taping our fmt.Printf statements into something resembling a professional application. We’d bolt on logrus or zap, which are fantastic, but it created a fragmented ecosystem. The Go team, in their infinite and sometimes frustrating wisdom, decided it was time to bring order to the chaos. Enter log/slog in Go 1.21: structured logging, right there in the standard library.

25.5 Handling Dynamic JSON: json.RawMessage and map[string]any

Alright, let’s talk about the moment every Go developer faces: when your JSON isn’t a nice, tidy struct. It’s a moving target. Maybe it’s a third-party API that sends different object shapes based on some "type" field, or a config file with deeply nested, arbitrary blobs. You can’t define a static struct ahead of time, so you reach for the universal key: interface{}. Or, as it has been mercifully spelled since Go 1.18 added the alias, any.

25.4 Custom JSON Marshaling: MarshalJSON and UnmarshalJSON

Right, so you’ve hit the point where encoding/json’s default behavior just isn’t cutting it. Maybe you need to send data in a snake_case API but your structs are in Go’s CamelCase. Maybe you need to parse a date string that looks nothing like time.RFC3339. Or perhaps you need to marshal a struct into something that isn’t a JSON object for once. This is where you roll up your sleeves and implement the json.Marshaler and json.Unmarshaler interfaces. They’re your escape hatch from the library’s sometimes overly-opinionated defaults.

25.3 json.Encoder and json.Decoder: Streaming JSON

Alright, let’s get our hands dirty with the real workhorses of the encoding/json package: json.Encoder and json.Decoder. You’ve probably met json.Marshal and json.Unmarshal—they’re fine for small, self-contained jobs. But when you’re dealing with streams of data, whether it’s from an HTTP response body, a file on disk, or a network socket, the Marshal/Unmarshal duo starts to feel like using a sledgehammer to crack a nut. They need the whole nut in your hand at once.

25.2 Struct Tags: json:"name,omitempty" and json:"-"

Right, let’s talk about struct tags. You’ve probably seen these little string literals clinging to your struct fields like metadata remoras. They look like magic incantations, and honestly, they kind of are. The encoding/json package uses them to figure out how to map your beautifully named Go struct fields to the often-absurdly named keys in the JSON you’re marshaling or unmarshaling. Without them, you’re at the mercy of the encoder’s default behavior, which is about as subtle as a brick.

25.1 json.Marshal and json.Unmarshal: Basic Serialization

Alright, let’s get our hands dirty with the workhorses of Go’s JSON story: json.Marshal and json.Unmarshal. These two functions are your primary gateway between the structured, type-safe world of Go and the flexible, but loosey-goosey, world of JSON. They seem simple on the surface, but the devil—and the real power—is in the details. Think of Marshal as your meticulous packer. You give it a Go thing (a struct, a map, a slice), and it carefully wraps it up into a neat []byte parcel, ready to be shipped over the network or dumped into a file. Unmarshal is the unpacker on the other side. It takes that []byte parcel and, with a bit of guidance from you on what you expect to find inside, tries to reassemble it into a Go thing on your side.

25. encoding/json and log/slog

26.6 Modifying JSONB: jsonb_set() and the Concatenation Operator ||

Right, so you’ve got your JSONB document. It’s a beautiful, nested snowflake, perfect and pristine. And now you need to change it. Of course you do. Data isn’t a museum exhibit; it’s a living, breathing, frequently-misconfigured mess that needs constant tweaking. Let’s talk about how to perform surgery on these JSONB structures without leaving a bloody mess everywhere. The workhorse here is jsonb_set(). Don’t let the simple name fool you; it’s deceptively powerful and, like most powerful things, easy to misuse. Its job is to replace or set the value at a specified path.

26.5 jsonb_agg() and jsonb_build_object(): Building JSON in SQL

Right, so you’ve got all this lovely JSONB data sitting in your columns, and you’re probably thinking, “Great, I can query it. But how do I make it?” Because sometimes you need to assemble your own JSON structures on the fly, either to return from an API, feed into another system, or just to show off. That’s where jsonb_agg() and jsonb_build_object() come in. They’re your power tools for constructing JSON directly within a SQL statement, saving you from the nightmare of string concatenation (a path that leads only to madness and escaping errors).

26.4 JSON Path Queries with jsonpath (PostgreSQL 12+)

Alright, let’s talk about jsonpath. You’ve probably been using the -> and ->> operators to navigate your JSONB data, and they’re great for simple, straightforward paths. But what happens when your data structure gets more complex, or you need to do something more powerful, like filtering for elements within an array based on a condition? You start writing these monstrous, nested SQL expressions that are a pain to write and a nightmare to read.

26.3 GIN Indexes on JSONB: jsonb_ops vs jsonb_path_ops

Right, let’s talk about making your JSONB queries not just work, but scream. You’ve loaded up a table with a mountain of JSON documents, and you’re running WHERE data @> '{"status": "published"}'. It’s fast at first, but as your data grows, it starts to feel like wading through molasses. You’ve heard about GIN indexes, the workhorse for JSONB, but then you’re hit with a choice: jsonb_ops or jsonb_path_ops? It’s not just academic; picking the wrong one is like showing up to a Formula 1 race with a go-kart engine.

26.2 JSONB Operators: ->, ->>, #>, @>, ?, ?|, ?&

Right, let’s talk about JSONB operators. This is where you stop just storing JSON and start actually using it. Forget the clunky, string-based horror of json_extract_path_text or whatever your previous database tried to sell you. PostgreSQL gives you a proper set of tools that feel, well, like they belong in a database. They’re the difference between poking your data with a stick and wielding a lightsaber. We’ll break them down into two camps: the path navigators (who get you the data) and the existence checkers (who tell you if something’s there).

26.1 json vs jsonb: Storage and Operator Differences

Right, let’s settle this. You’ve probably already been told that jsonb is the one you should use 99.9% of the time. You nod, you move on. But I know you. You’re the kind of person who needs to know why. Because if you don’t, that 0.1% case will sneak up and bite you in production at 3 AM on a Sunday. So let’s get our hands dirty. The core difference isn’t about what they store—they both store perfectly valid JSON. It’s about how they store it. The json type stores an exact, whitespace-and-all copy of the text you put in. It’s a glorified text field with syntax validation. Need to preserve the exact textual representation for legal reasons or because some external system is ridiculously fussy? Fine, use json. For everyone else, read on.

26. JSONB: Operators, Indexing, and Querying Nested Data

29.7 Generating a search index (Lunr.js / Pagefind)

Right, so you’ve built this beautiful, content-rich site. I’m proud of you. But now your readers are going to want to find things. Scrolling through pages is for chumps and people who still use AOL dial-up. We’re building a search index. This isn’t about slapping a Google Custom Search bar on there and calling it a day. That’s lazy, and it hands over your user’s data to a third party. We’re going to build our own client-side search. It’s faster, it’s private, and it gives you total control. The two heavy hitters in this space are Lunr.js and Pagefind. I’ll show you both because they represent two very different, very valid philosophies.

29.6 AMP: Accelerated Mobile Pages Output Format

Right, AMP. Let’s have a talk. AMP, or Accelerated Mobile Pages, is Google’s well-intentioned but often controversial project to make the web faster. The idea is simple: you create a version of your page that follows a very strict, stripped-down set of HTML and CSS rules. In return, Google caches it on their servers and serves it from their own domain (usually google.com/amp/...), which makes it load near-instantly on mobile devices. It’s a classic deal with the devil: you get ludicrous speed and potential SEO benefits (it used to be a requirement for the “Top Stories” carousel), but you sacrifice a lot of control over your own content and design.

29.5 RSS Feed Customization

Right, so you want an RSS feed. Not just any feed, but your feed. One that doesn’t look like it was generated by a bored robot in 2003. Good. RSS is a cranky old standard, but it’s far from dead. It’s the un-opinionated, user-centric backbone of the independent web that refuses to die, and customizing it is a small act of rebellion. Let’s make yours brilliant. The core truth you must accept is that an RSS feed is just XML. Not scary, magical XML, but a specific, documented XML format. Your job is to generate that XML correctly. Most frameworks have some built-in RSS generator, and they’re usually… fine. But “fine” is for cowards. We’re going to bend it to our will.

29.4 Building a JSON API from Hugo Content

Right, so you want to build a JSON API with Hugo. Good choice. It’s a shockingly capable static API engine, and it saves you from having to maintain a separate database and server just to serve some structured data. We’re going to move beyond the basic json output format and build something you’d actually be happy to have a frontend app consume. First, let’s address the elephant in the room: Hugo is a static site generator. It builds files. An API typically implies dynamic requests. The key here is that we’re building a static API. All the possible data responses are pre-rendered as JSON files at build time. This is brilliant for read-only data (blog posts, product catalogs, documentation) because it’s insanely fast, secure, and cheap to host. It’s a terrible idea for anything that needs real-time, user-specific data. Don’t try to build a stock trading platform with this.

29.3 Associating Output Formats with Page Kinds

Right, so you’ve got your content. It’s beautiful. But now you need to get it out into the world in different shapes. You don’t want to just slap an .xml extension on a page and call it a day. You want /articles/my-great-post to be able to serve up its glorious HTML self, a clean JSON representation for some headless CMS nonsense, and a tidy RSS item for the three people (hi, mom!) still using feed readers. The key to this magic trick is telling your static site generator which page kinds should be able to produce which output formats. It’s about association, and Hugo handles this with a concept so simple you’ll wonder why other SSGs make it feel like rocket surgery.

29.2 Defining a Custom Output Format

Right, so you’re tired of the same old HTML and want to spit out something a bit more structured, like JSON for an API, or RSS for a feed, or maybe even something like AMP for Google’s fleeting whims. Good news: Hugo’s custom output formats are your new best friend. This is where Hugo stops being just a website generator and starts being a proper content engine. The bad news? The configuration is a bit… idiosyncratic. We’ll get to that.

29.1 Built-in Output Formats: HTML, RSS, JSON, CSV, robots.txt, sitemap

Right, let’s talk about the freebies. Hugo doesn’t make you build everything from scratch. It comes packing a set of built-in output formats that cover about 90% of what a typical website needs to do. These aren’t just afterthoughts; they’re core to its philosophy of being a full-fledged web engine, not just a fancy blog generator. We’ll get to the cool custom ones later, but first, you need to understand the tools already in your belt. The big ones are HTML, RSS, JSON, and CSV. But we’ll also touch on the two special-purpose ones: robots.txt and sitemap.xml.

29. Custom Output Formats: JSON, RSS, AMP, and More

18.7 Data-Driven Shortcodes

Right, so you’ve got your data files all set up in data/, looking clean and organized. But now you want to actually use that data in your content without copy-pasting HTML all over the place. This is where Hugo’s data-driven shortcodes come in. They’re the perfect bridge between your structured data and your unstructured content pages. Think of them as little factory functions; you feed them a key from your data, and they spit out the same complex HTML every time. It’s consistency and DRY principles for the win.

18.6 Building a Team Page from a Data File

Right, so you’ve got a list of people, and you want to build a team page without copy-pasting a mountain of HTML for every single profile. Welcome to the party. We’ve all been there, staring at a dozen nearly identical <div> blocks, knowing that adding a new team member is an exercise in tedious, error-prone repetition. This is where Hugo’s data templates and the data/ directory come in to save your sanity. Think of it as moving your content out of your templates and into a structured, easily manageable file—like a mini-database for your site.

18.5 Building a Navigation Menu from a Data File

Right, so you’ve got a site, and it has navigation. Maybe it’s a list of pages, maybe it’s categories, maybe it’s a collection of your favorite 80s action heroes. The point is, it’s data. And the moment you find yourself hard-coding a list of hrefs and labels in a layout file, a little alarm should go off in your head. You’ve just created a liability. What happens when you need to add a new section? You’re digging through baseof.html or some other template, praying you don’t mangle the markup. This is why Hugo gave us the data/ directory and data templates. We’re going to use them to build a navigation menu that you can manage with a simple text file, like a civilized person.

18.4 Fetching Remote Data: resources.GetRemote

Right, so you want to fetch some data from the internet and jam it into your Hugo site. Maybe it’s a JSON API from a third-party service, maybe it’s a CSV file you’re keeping on GitHub, maybe it’s the latest manifesto from your favorite obscure band. You’ve heard about resources.GetRemote, and you’re thinking, “Great! A simple GET request. How hard can it be?” Famous last words. Let’s pull up a chair. This is one of those Hugo features that is incredibly powerful but has more sharp edges than a bag of broken glass if you don’t know how to handle it properly. I’m here to make sure you don’t bleed all over your build script.

18.3 Nested Data Files and Directory Structure

Right, so you’ve got your data/ directory humming along nicely. You’re pulling in a single config.yaml file and feeling pretty good about yourself. I get it. But your project is growing up, and you’re starting to realize that dumping everything into one massive file is like trying to cook a five-course meal on a single burner. It’s time to get organized. This is where nesting comes in, and Hugo’s data templates are about to become your new best friend.

18.2 Accessing Data in Templates: .Site.Data

Right, let’s talk about .Site.Data. This is where Hugo stops being just a static site generator and starts feeling like a proper application framework. It’s the primary way you inject structured, non-content data into your templates. Think of it as your personal data pantry, stocked with JSON, YAML, or TOML goodies that you can pull out and use to build just about anything. The concept is brilliantly simple: you drop a data file (say, authors.json) into your data/ directory, and Hugo automatically makes it available to you at .Site.Data.authors. No import statements, no configuration, no fuss. It’s just there. This is Hugo’s data-driven design philosophy at its best—convention over configuration, working exactly as you’d hope.

18.1 Storing Data: JSON, YAML, TOML, and CSV in data/

Right, let’s talk about your data/ directory. This is Hugo’s designated “I’m not a page, I’m just data” drawer. It’s where you stash all the structured information your site needs but that doesn’t deserve (or want) its own front matter and Markdown content. Think of it as your site’s personal JSON, YAML, TOML, and CSV junk drawer—hopefully more organized than your actual junk drawer. The beauty here is that Hugo, in its infinite wisdom, automatically slurps up any file in data/ and makes it available globally in your templates under the .Site.Data variable. The name of the file becomes the top-level key. It’s dead simple and incredibly powerful.

18. Data Templates and the data/ Directory

8. Creating DataFrames from Files, RDDs, and Databases

8. Output Parsers

52.8 Choosing a Data Format for Your Use Case

The choice of a data format is a foundational architectural decision that impacts everything from application performance and interoperability to developer ergonomics and long-term maintainability. There is no universally “best” format; the optimal selection is dictated by the specific use case, the environment, and the priorities of the project. A systematic evaluation against key criteria is essential.

Evaluating Key Criteria for Selection

Begin by asking a series of strategic questions about your data and its lifecycle. The answers will naturally guide you toward a suitable format.

52.7 INI/CFG: configparser

While JSON, YAML, and TOML are modern favorites for configuration, the INI file format remains a stalwart in the computing world, particularly within the Python ecosystem due to its simplicity and long-standing Windows legacy. The configparser module in Python’s standard library provides a powerful and intuitive way to work with these files. It’s important to understand that configparser does not parse the Windows Registry format, but rather the classic INI style consisting of sections, properties, and values.
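
As a minimal sketch of the section/property/value model described above, the following reads a small INI document with configparser. The section name, keys, and values are hypothetical and exist only for illustration; the content is read from a string rather than a file to keep the example self-contained.

```python
import configparser

# Hypothetical INI content, used only for illustration.
ini_text = """
[server]
host = localhost
port = 8080
debug = yes
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)

# Values are stored as strings; the typed getters convert on request.
host = parser["server"]["host"]
port = parser.getint("server", "port")
debug = parser.getboolean("server", "debug")
```

Because every value is a string internally, reaching for getint() or getboolean() at the point of use is usually clearer than converting by hand.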

52.6 YAML: PyYAML and ruamel.yaml

While JSON excels as a data interchange format and TOML prioritizes configuration clarity, YAML (YAML Ain’t Markup Language) aims for a human-friendly, data-oriented serialization standard. Its minimal syntax, reliance on indentation, and support for complex data types make it exceptionally popular for configuration files (e.g., Docker Compose, Kubernetes, Ansible) and data persistence where readability is paramount. In the Python ecosystem, two libraries dominate YAML handling: the original PyYAML and its more powerful, modern fork, ruamel.yaml. Understanding the distinction between them is crucial for professional Python development.

52.5 TOML: tomllib (Python 3.11+) and tomli

The TOML (Tom’s Obvious, Minimal Language) format has gained significant traction as a configuration file format, praised for its semantic clarity and human-readability, which often positions it as a more intuitive alternative to YAML or JSON for settings. Prior to Python 3.11, developers relied on third-party libraries like toml or tomli for parsing. Recognizing this need, Python 3.11 integrated TOML parsing into the standard library with the tomllib module, which is essentially a standardized version of the excellent tomli library. This move signifies TOML’s importance in the modern Python ecosystem, particularly for tooling like pyproject.toml as defined in PEP 518.

52.4 lxml: Faster and More Powerful XML/HTML Parsing

While the standard library’s xml.etree.ElementTree module provides a capable and Pythonic way to parse XML, it can be limiting for large-scale or complex XML/HTML processing. This is where lxml enters the picture. lxml is a Python binding for the robust, industry-standard C libraries libxml2 and libxslt. It combines the ease-of-use of the ElementTree API with the speed and feature-completeness of these underlying libraries, making it the de facto choice for high-performance XML and HTML parsing in Python.

52.3 XML: ElementTree Parsing and Building

The eXtensible Markup Language (XML) provides a robust, hierarchical, and self-descriptive format for data serialization. While numerous parsing approaches exist, the xml.etree.ElementTree module in Python’s standard library offers a particularly elegant and “Pythonic” interface for both parsing existing XML documents and programmatically constructing new ones. Its name derives from its core abstraction: an XML document is treated as a tree of Element objects, where each element has a tag, a dictionary of attributes, optional text content, and a list of child elements.
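
The tree-of-elements abstraction can be illustrated with a short sketch that parses a document and then extends it. The XML snippet and its tag names are hypothetical, chosen only to show the tag, attribute, text, and child accessors.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, used only for illustration.
xml_text = '<library><book id="1">Dune</book><book id="2">Emma</book></library>'

# Parsing: fromstring returns the root Element of the tree.
root = ET.fromstring(xml_text)

# Iterating an Element yields its children; each child exposes
# attributes via get() and its text content via .text.
titles = [(book.get("id"), book.text) for book in root]

# Building: SubElement creates a child and attaches it to the parent.
new_book = ET.SubElement(root, "book", id="3")
new_book.text = "Ulysses"
serialized = ET.tostring(root, encoding="unicode")
```

Passing encoding="unicode" makes tostring() return a str rather than bytes, which is convenient when the result is going to be printed or embedded in other text.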

52.2 CSV: csv.reader, csv.writer, DictReader, DictWriter

The Comma-Separated Values (CSV) format is a deceptively simple text format for tabular data. Its lack of a formal standard has led to numerous dialects, making robust parsing non-trivial. Python’s csv module provides a powerful toolkit to handle these complexities, abstracting away the tedious details of string splitting and manual escaping. The module’s primary philosophy is to operate on sequences, most commonly lists and dictionaries, treating file objects as its conduit.

The csv.reader Object

The csv.reader object is the foundational tool for reading CSV data. It takes an iterable (like a file object) and returns a reader object that itself iterates over the rows in the given CSV file, presenting each row as a list of strings.
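
A minimal sketch of that behaviour follows. The sample data is hypothetical, and an in-memory io.StringIO stands in for an open file so the example runs on its own.

```python
import csv
import io

# Hypothetical in-memory data standing in for an open file object.
sample = io.StringIO('name,quote\r\nAda,"Hello, world"\r\n')

# csv.reader yields each row as a list of strings; the comma inside
# the quoted field stays part of that field instead of splitting it.
rows = list(csv.reader(sample))
```

Handling the quoted comma correctly is exactly the kind of detail that a naive line.split(",") gets wrong, which is the main argument for using the module even on small files.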

52.1 JSON: json.loads, json.dumps, Custom Encoders/Decoders

The JavaScript Object Notation (JSON) format has become the lingua franca for data interchange on the web due to its simplicity, readability, and near-universal support. In Python, the json module provides a robust, if sometimes simplistic, interface for serializing and deserializing data. Its two primary workhorses are json.loads() (load string) for decoding JSON data into a Python object and json.dumps() (dump string) for encoding a Python object into a JSON-formatted string.
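
The round trip between the two functions can be sketched as follows. The payload is hypothetical, and sort_keys is used here only to make the encoded output deterministic.

```python
import json

# Hypothetical payload, used only for illustration.
raw = '{"name": "Ada", "tags": ["math", "engines"], "active": true}'

# json.loads: JSON text -> Python objects (dict, list, str, bool, ...).
record = json.loads(raw)

# json.dumps: Python objects -> JSON text.
encoded = json.dumps(record, sort_keys=True)
```

Note the type mapping on the way in: the JSON object becomes a dict, the array a list, and the literal true the Python value True.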