Yaml | mikePietsch.com

18.7 Data-Driven Shortcodes

Right, so you’ve got your data files all set up in data/, looking clean and organized. But now you want to actually use that data in your content without copy-pasting HTML all over the place. This is where Hugo’s data-driven shortcodes come in. They’re the perfect bridge between your structured data and your unstructured content pages. Think of them as little factory functions; you feed them a key from your data, and they spit out the same complex HTML every time. It’s consistency and DRY principles for the win.

18.6 Building a Team Page from a Data File

Right, so you’ve got a list of people, and you want to build a team page without copy-pasting a mountain of HTML for every single profile. Welcome to the party. We’ve all been there, staring at a dozen nearly identical <div> blocks, knowing that adding a new team member is an exercise in tedious, error-prone repetition. This is where Hugo’s data templates and the data/ directory come in to save your sanity. Think of it as moving your content out of your templates and into a structured, easily manageable file—like a mini-database for your site.

18.5 Building a Navigation Menu from a Data File

Right, so you’ve got a site, and it has navigation. Maybe it’s a list of pages, maybe it’s categories, maybe it’s a collection of your favorite 80s action heroes. The point is, it’s data. And the moment you find yourself hard-coding a list of hrefs and labels in a layout file, a little alarm should go off in your head. You’ve just created a liability. What happens when you need to add a new section? You’re digging through baseof.html or some other template, praying you don’t mangle the markup. This is why Hugo gave us the data/ directory and data templates. We’re going to use them to build a navigation menu that you can manage with a simple text file, like a civilized person.

18.4 Fetching Remote Data: resources.GetRemote

Right, so you want to fetch some data from the internet and jam it into your Hugo site. Maybe it’s a JSON API from a third-party service, maybe it’s a CSV file you’re keeping on GitHub, maybe it’s the latest manifesto from your favorite obscure band. You’ve heard about resources.GetRemote, and you’re thinking, “Great! A simple GET request. How hard can it be?” Famous last words. Let’s pull up a chair. This is one of those Hugo features that is incredibly powerful but has more sharp edges than a bag of broken glass if you don’t know how to handle it properly. I’m here to make sure you don’t bleed all over your build script.

18.3 Nested Data Files and Directory Structure

Right, so you’ve got your data/ directory humming along nicely. You’re pulling in a single config.yaml file and feeling pretty good about yourself. I get it. But your project is growing up, and you’re starting to realize that dumping everything into one massive file is like trying to cook a five-course meal on a single burner. It’s time to get organized. This is where nesting comes in, and Hugo’s data templates are about to become your new best friend.

18.2 Accessing Data in Templates: .Site.Data

Right, let’s talk about .Site.Data. This is where Hugo stops being just a static site generator and starts feeling like a proper application framework. It’s the primary way you inject structured, non-content data into your templates. Think of it as your personal data pantry, stocked with JSON, YAML, or TOML goodies that you can pull out and use to build just about anything. The concept is brilliantly simple: you drop a data file (say, authors.json) into your data/ directory, and Hugo automatically makes it available to you at .Site.Data.authors. No import statements, no configuration, no fuss. It’s just there. This is Hugo’s data-driven design philosophy at its best—convention over configuration, working exactly as you’d hope.

18.1 Storing Data: JSON, YAML, TOML, and CSV in data/

Right, let’s talk about your data/ directory. This is Hugo’s designated “I’m not a page, I’m just data” drawer. It’s where you stash all the structured information your site needs but that doesn’t deserve (or want) its own front matter and Markdown content. Think of it as your site’s personal JSON, YAML, TOML, and CSV junk drawer—hopefully more organized than your actual junk drawer. The beauty here is that Hugo, in its infinite wisdom, automatically slurps up any file in data/ and makes it available globally in your templates under the .Site.Data variable. The name of the file becomes the top-level key. It’s dead simple and incredibly powerful.
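To make the naming rule concrete, here is a small Python model of it. This is only a sketch of the convention, not Hugo's actual loader: the helper name load_data_dir and the sample files are made up for illustration, and it only handles JSON. Subdirectories becoming nested keys is included as well, since that is how the same rule extends to a nested data/ tree.

```python
import json
import tempfile
from pathlib import Path


def load_data_dir(root: Path) -> dict:
    """Toy model of the rule: file stem -> key, subdirectory -> nested dict."""
    data = {}
    for path in sorted(root.rglob("*.json")):
        node = data
        # Each directory between the root and the file becomes one nesting level.
        for part in path.relative_to(root).parent.parts:
            node = node.setdefault(part, {})
        # The file name without its extension becomes the key.
        node[path.stem] = json.loads(path.read_text(encoding="utf-8"))
    return data


# Hypothetical data/ directory built in a temp folder for the demo.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "team").mkdir()
    (root / "authors.json").write_text('{"ada": {"name": "Ada"}}', encoding="utf-8")
    (root / "team" / "members.json").write_text('[{"name": "Linus"}]', encoding="utf-8")
    site_data = load_data_dir(root)

print(site_data["authors"]["ada"]["name"])      # like .Site.Data.authors.ada.name
print(site_data["team"]["members"][0]["name"])  # like .Site.Data.team.members
```

In a real template you would never write this yourself; the point is only that the path under data/ is the lookup path.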

18. Data Templates and the data/ Directory

6.8 The Page Object: Accessing Every Field in Templates

Right, let’s talk about the Page object. It’s the single most important variable in your Hugo templates, the key to the kingdom, the master switchboard. When you see .Site or .Title in a template, you’re tapping into this object. It’s Hugo’s way of taking all the disparate data you’ve slung into YAML, TOML, or JSON front matter and squashing it into one beautifully convenient, if occasionally quirky, Go structure for you to play with.

6.7 Cascade: Inheriting Front Matter Down a Section Tree

Right, so you’ve got your site structured into sections. Maybe you have a posts/ directory and a projects/ directory. You’ve painstakingly set the author and layout in the front matter of every single file. It works, but it feels… repetitive. And you’re right, it is. We’re programmers. We hate repetition. This is where Hugo’s front matter cascade comes in, a feature so powerful it feels like you’re getting away with something. The core idea is simple: you can define default values for front matter in your content directory structure itself, and those values will cascade down through that section of your content tree. It’s inheritance, but for your metadata.
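If "inheritance for metadata" sounds abstract, this Python sketch models the idea. It is an assumption-laden toy, not Hugo's implementation: the function resolve_front_matter, the cascades mapping, and the sample paths are all hypothetical, and the precedence shown (closer sections override farther ones, and a page's own values win) is the behavior this sketch assumes.

```python
def resolve_front_matter(page_path: str, own_values: dict, cascades: dict) -> dict:
    """Merge cascaded defaults from the root down, then apply the page's own values."""
    resolved = {}
    parts = page_path.strip("/").split("/")
    # depth 0 is the content root (""), then each enclosing section in turn.
    for depth in range(len(parts)):
        section = "/".join(parts[:depth])
        resolved.update(cascades.get(section, {}))
    # Values set on the page itself override anything inherited.
    resolved.update(own_values)
    return resolved


# Hypothetical cascaded defaults keyed by section path.
cascades = {
    "": {"author": "Site Team"},
    "posts": {"layout": "post", "author": "Blog Team"},
}

post = resolve_front_matter("posts/hello.md", {"title": "Hello"}, cascades)
project = resolve_front_matter("projects/x.md", {"author": "Ada"}, cascades)
print(post)
print(project)
```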

6.6 Custom Params: Arbitrary Key-Value Pairs for Templates

Now, let’s talk about the secret sauce, the duct tape, and the junk drawer of your Hugo site: Custom Params. This is where you get to attach your own arbitrary key-value data to pretty much anything—your site, your sections, and most importantly, your content pages. It’s the mechanism that lets you move beyond the standard front matter fields and build truly dynamic templates. The concept is brilliantly simple. In your front matter, under a top-level key called params, you can define any custom data you want. Hugo doesn’t care what you put in there; it just collects it all and makes it available for you in your templates. It’s your personal storage locker for template variables.

6.5 Layout and Type Fields: type, layout

Alright, let’s get our hands dirty. Before Hugo can render a page, it has to decide which template to use. Is it a blog post? A project page? A one-off landing page with a layout all its own? This is where type and layout come in, and they are the two front matter fields that steer Hugo’s template lookup. Get these wrong, and your page renders with the wrong template. Get them right, and you’ve laid a solid foundation.

The type Field: What Are You, Anyway?

The type field, set in a page’s front matter, overrides the content type that Hugo would otherwise infer from the section the page lives in. Leave it out, and Hugo will make an assumption based on the directory structure, and it is an assumption you might not like.

6.4 Ordering Fields: weight, lastmod, publishDate, expiryDate

Right, let’s talk about order. Your content is a pile of brilliant ideas, and by default, Hugo will list them in a way that makes sense mostly to a computer (by weight, then by date with the newest first, then by link title, then by file path, if you’re curious). We’re not computers. We want to control the narrative. Hugo gives you a delightful little toolbox of front matter fields to impose your will upon this chaos. Let’s break them down, from the most blunt instrument to the most nuanced.

6.3 Taxonomy Fields: tags, categories, and Custom Taxonomies

Alright, let’s talk taxonomy. You’re going to be using these. A lot. The whole point of structuring your content in a static site generator like Hugo is so you can, you know, find it again. Taxonomies are your primary tool for that. They’re the index cards for your digital library, and if you do them right, you can find anything in seconds. Do them wrong, and you’re just piling metadata in a corner for the digital mice to nibble on.

6.2 Core Fields: title, date, draft, description, summary

Alright, let’s get our hands dirty. Before you can make your site do backflips, you have to tell it how to stand up. That’s what this front matter is for. Think of it as the instruction sheet you slap on top of your content—a quick, machine-readable note that says, “Hey, Hugo, process this one like that.” We’re going to focus on the absolute non-negotiables, the fields you’ll use on nearly every single piece of content you create. Get these wrong, and the whole operation goes sideways.

6.1 Front Matter Formats: YAML (---), TOML (+++), JSON ({})

Right, let’s talk about the three ways you can tell your static site generator (or any other tool) what your content is about before it even reads the first paragraph. We call this “front matter,” and it’s the little data packet that lives at the top of your content files. It’s where you set the title, the publish date, the tags, and whatever else your heart desires. The format you choose says a lot about you, and at least two of these choices are correct.
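The delimiters in the title are what distinguish the three formats, and a few lines of Python can show the idea. This is a toy detector written for illustration, not how Hugo or any particular tool parses front matter; the function name detect_front_matter is made up.

```python
def detect_front_matter(text: str) -> str:
    """Guess the front matter format from the first non-empty line."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped == "---":
            return "yaml"
        if stripped == "+++":
            return "toml"
        if stripped.startswith("{"):
            return "json"
        # Any other first line means the file starts with content.
        return "none"
    return "none"


# Hypothetical content files, one per format.
yaml_doc = "---\ntitle: Hello\n---\nBody text"
toml_doc = '+++\ntitle = "Hello"\n+++\nBody text'
json_doc = '{\n  "title": "Hello"\n}\nBody text'

print(detect_front_matter(yaml_doc), detect_front_matter(toml_doc), detect_front_matter(json_doc))
```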

6. Front Matter: YAML, TOML, JSON, and Every Field

52.8 Choosing a Data Format for Your Use Case

The choice of a data format is a foundational architectural decision that impacts everything from application performance and interoperability to developer ergonomics and long-term maintainability. There is no universally “best” format; the optimal selection is dictated by the specific use case, the environment, and the priorities of the project. A systematic evaluation against key criteria is essential.

Evaluating Key Criteria for Selection

Begin by asking a series of strategic questions about your data and its lifecycle. The answers will naturally guide you toward a suitable format.

52.7 INI/CFG: configparser

While JSON, YAML, and TOML are modern favorites for configuration, the INI file format remains a stalwart in the computing world, particularly within the Python ecosystem due to its simplicity and long-standing Windows legacy. The configparser module in Python’s standard library provides a powerful and intuitive way to work with these files. It’s important to understand that configparser does not parse the Windows Registry format, but rather the classic INI style consisting of sections, properties, and values.
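As a minimal sketch of the sections-and-values model described above, the following reads a small INI document with configparser. The section names, keys, and values are hypothetical and exist only for the example.

```python
import configparser

# Hypothetical INI text used only for illustration.
text = """
[DEFAULT]
timeout = 30

[server]
host = example.org
port = 8080
debug = no
"""

parser = configparser.ConfigParser()
parser.read_string(text)

# Values are stored as strings; the typed getters convert on access.
host = parser["server"]["host"]
port = parser.getint("server", "port")
debug = parser.getboolean("server", "debug")

# Keys in [DEFAULT] are visible from every other section.
timeout = parser.getint("server", "timeout")
print(host, port, debug, timeout)
```

The typed getters matter because everything in an INI file is text until you ask for something else.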

52.6 YAML: PyYAML and ruamel.yaml

While JSON excels as a data interchange format and TOML prioritizes configuration clarity, YAML (YAML Ain’t Markup Language) aims for a human-friendly, data-oriented serialization standard. Its minimal syntax, reliance on indentation, and support for complex data types make it exceptionally popular for configuration files (e.g., Docker Compose, Kubernetes, Ansible) and data persistence where readability is paramount. In the Python ecosystem, two libraries dominate YAML handling: the original PyYAML and its more powerful, modern fork, ruamel.yaml. Understanding the distinction between them is crucial for professional Python development.

52.5 TOML: tomllib (Python 3.11+) and tomli

The TOML (Tom’s Obvious, Minimal Language) format has gained significant traction as a configuration file format, praised for its semantic clarity and human-readability, which often positions it as a more intuitive alternative to YAML or JSON for settings. Prior to Python 3.11, developers relied on third-party libraries like toml or tomli for parsing. Recognizing this need, Python 3.11 integrated TOML parsing into the standard library with the tomllib module, which is essentially a standardized version of the excellent tomli library. This move signifies TOML’s importance in the modern Python ecosystem, particularly for tooling like pyproject.toml as defined in PEP 518.

52.4 lxml: Faster and More Powerful XML/HTML Parsing

While the standard library’s xml.etree.ElementTree module provides a capable and Pythonic way to parse XML, it can be limiting for large-scale or complex XML/HTML processing. This is where lxml enters the picture. lxml is a Python binding for the robust, industry-standard C libraries libxml2 and libxslt. It combines the ease-of-use of the ElementTree API with the speed and feature-completeness of these underlying libraries, making it the de facto choice for high-performance XML and HTML parsing in Python.

52.3 XML: ElementTree Parsing and Building

The eXtensible Markup Language (XML) provides a robust, hierarchical, and self-descriptive format for data serialization. While numerous parsing approaches exist, the xml.etree.ElementTree module in Python’s standard library offers a particularly elegant and “Pythonic” interface for both parsing existing XML documents and programmatically constructing new ones. Its name derives from its core abstraction: an XML document is treated as a tree of Element objects, where each element has a tag, attributes, a text content, and a list of child elements.
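The tree-of-elements abstraction is easiest to see in a few lines. The sketch below parses a tiny document and then builds a new one; the tag names and values are hypothetical and chosen only for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only for illustration.
doc = '<library><book id="1">Dune</book><book id="2">Emma</book></library>'

# Parsing: fromstring returns the root Element of the tree.
root = ET.fromstring(doc)
titles = [(book.get("id"), book.text) for book in root.findall("book")]

# Building: create elements, set attributes and text, then serialize.
new_root = ET.Element("library")
book = ET.SubElement(new_root, "book", id="3")
book.text = "Ulysses"
xml_text = ET.tostring(new_root, encoding="unicode")

print(titles)
print(xml_text)
```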

52.2 CSV: csv.reader, csv.writer, DictReader, DictWriter

The Comma-Separated Values (CSV) format is a deceptively simple text format for tabular data. Its lack of a formal standard has led to numerous dialects, making robust parsing non-trivial. Python’s csv module provides a powerful toolkit to handle these complexities, abstracting away the tedious details of string splitting and manual escaping. The module’s primary philosophy is to operate on sequences (most commonly, lists and dictionaries), treating file objects as its conduit.

The csv.reader Object

The csv.reader object is the foundational tool for reading CSV data. It takes an iterable (like a file object) and returns a reader object that itself iterates over the rows in the given CSV file, presenting each row as a list of strings.
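A minimal sketch of the four tools named in the title, using an in-memory buffer in place of a real file. The sample rows are hypothetical; with a real file you would open it with newline="" as the csv documentation recommends.

```python
import csv
import io

# Hypothetical data; io.StringIO stands in for a file opened with newline="".
source = io.StringIO('name,quote\nAda,"Hello, world"\nLinus,"Talk is cheap"\n')

# csv.reader yields each row as a list of strings and handles quoted commas.
rows = list(csv.reader(source))

# csv.DictReader uses the first row as keys and yields one dict per record.
source.seek(0)
records = list(csv.DictReader(source))

# csv.DictWriter goes the other way: dicts in, CSV text out.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "quote"])
writer.writeheader()
writer.writerows(records)

print(rows[1])
print(records[1]["name"])
print(out.getvalue())
```

The quoted comma in the second field is the kind of detail that makes a plain str.split(",") a bad idea.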

52.1 JSON: json.loads, json.dumps, Custom Encoders/Decoders

The JavaScript Object Notation (JSON) format has become the lingua franca for data interchange on the web due to its simplicity, readability, and near-universal support. In Python, the json module provides a robust, if sometimes simplistic, interface for serializing and deserializing data. Its two primary workhorses are json.loads() (load string) for decoding JSON data into a Python object and json.dumps() (dump string) for encoding a Python object into a JSON-formatted string.
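Here is a minimal sketch of the round trip, assuming a small made-up record. The default= hook shown is one common way to handle types the encoder does not know about; the helper name encode_extra is hypothetical.

```python
import json
from datetime import date

# Hypothetical payload used only for illustration.
raw = '{"name": "Ada", "tags": ["math", "code"], "active": true}'

# json.loads: JSON text -> Python objects (dict, list, str, bool, ...).
record = json.loads(raw)


def encode_extra(obj):
    """Convert types the default encoder rejects; called only for those types."""
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


# json.dumps: Python objects -> JSON text. Without the default hook,
# the date value below would raise TypeError.
record["joined"] = date(2024, 1, 15)
text = json.dumps(record, default=encode_extra, sort_keys=True)
print(text)
```

Decoding the result gives back a plain string for the date, so a matching decoder step is needed if you want the original type again.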