28.5 Optimizing Remote Resource Fetching

Right, let’s talk about fetching stuff from the internet. You’ve probably got a Hugo site that pulls in data from some remote API, or maybe you’re building an image gallery from a CDN. It’s fantastic until you run hugo server and go make a coffee while it decides to re-fetch every single resource, every single time. This is the digital equivalent of your friend who tells the same long-winded story whenever you see them. We’re going to fix that.

The core of the problem is that every remote call that misses Hugo’s file cache goes back over the network. That happens whenever the cache is empty (a fresh CI runner, a wiped cacheDir), expired (a short maxAge), or bypassed (--ignoreCache), and it applies to getJSON, getCSV, and resources.GetRemote alike. For a rapidly updating API, re-fetching is prudent. For your company’s static employee directory API that updates quarterly, it’s a form of digital self-flagellation.

The Golden Rule: Cache Locally, Fetch Remotely

The single most important thing you can do is tell Hugo to save a local copy of the remote resource and use that for a period of time. Two knobs control this: the cacheDir setting (along with the caches block built on top of it), which decides where copies live and how long they stay valid, and the --ignoreCache flag, which bypasses them. Think of cacheDir as your pantry. You go to the store (the remote API) once, stock up, and then you can make sandwiches for a week without another trip. The --ignoreCache flag is you saying, “I don’t care what’s in the pantry, I’m going to the store right now.”

First, set up the caches block in your site configuration (hugo.yaml, or config.yaml on older sites). This is non-negotiable.

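# Directory placeholders Hugo understands here: :cacheDir, :resourceDir, :project
caches:
  modules:
    dir: ':cacheDir/modules'
  assets:
    dir: ':resourceDir/_cache'
  getresource:
    dir: ':resourceDir/_cache'
    maxAge: 24h   # used by resources.GetRemote; -1 keeps entries forever
  getjson:
    dir: ':resourceDir/_cache'
    maxAge: 24h
  getcsv:
    dir: ':resourceDir/_cache'
    maxAge: 24h
  images:
    dir: ':resourceDir/_cache'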

Now, when you use resources.GetRemote, it will automatically stash a copy in that _cache directory. The magic, however, is in the headers.
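
Here’s roughly what a cache-friendly fetch looks like in a template. Treat it as a sketch rather than gospel: the URL is a placeholder, $records is just an illustrative variable name, and the try function assumes a reasonably recent Hugo release (older ones check .Err on the resource instead).

```go-html-template
{{- $url := "https://api.example.com/data" -}}
{{- $records := slice -}}
{{- with try (resources.GetRemote $url) -}}
  {{- with .Err -}}
    {{- /* Network or HTTP failure: fail the build loudly */ -}}
    {{- errorf "Fetching %s failed: %s" $url . -}}
  {{- else with .Value -}}
    {{- /* Served from the file cache when a valid copy exists */ -}}
    {{- $records = .Content | transform.Unmarshal -}}
  {{- else -}}
    {{- /* GetRemote returns nil when the server answers 404 */ -}}
    {{- warnf "Nothing found at %s" $url -}}
  {{- end -}}
{{- end -}}
```

The first build pays for the round trip. Later builds reuse the stored copy until it expires or you pass --ignoreCache.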

Getting Smart with HTTP Headers

Blindly caching forever is just as bad as not caching at all. You need a strategy. The most elegant way is to respect the remote server’s wishes using the Cache-Control or ETag headers it (hopefully) sends. This is how adults handle caching.

Hugo’s resources.GetRemote can do this, but you have to ask it nicely. Here’s how you make a request that will only re-fetch the data if the remote copy has changed (using ETag/Last-Modified semantics):

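{{- $url := "https://api.example.com/very-large-endpoint" -}}
{{- /* Declare these outside the with block; variables defined inside it go out of scope at its end */ -}}
{{- $etag := "" -}}
{{- $lastMod := "" -}}
{{- /* responseHeaders exposes the named headers on .Data.Headers (recent Hugo releases only) */ -}}
{{- $headOpts := dict "method" "head" "responseHeaders" (slice "Etag" "Last-Modified") -}}
{{- with resources.GetRemote $url $headOpts }}
  {{- /* If the remote supports it, we now have headers! Each value is a slice of strings. */ -}}
  {{- with index .Data.Headers "Etag" }}{{ $etag = index . 0 }}{{ end -}}
  {{- with index .Data.Headers "Last-Modified" }}{{ $lastMod = index . 0 }}{{ end -}}
{{- end }}

{{- /* Now use those headers in a conditional GET. */ -}}
{{- /* Caveat: validators read from a fresh HEAD always match the current remote copy. */ -}}
{{- /* To detect changes between builds, send the validators saved with your cached copy. */ -}}
{{- $opts := dict
  "headers" (dict
    "If-None-Match" $etag
    "If-Modified-Since" $lastMod
  )
-}}
{{- $data := resources.GetRemote $url $opts -}}

{{- if $data -}}
  {{- /* Data was updated, process it! */ -}}
  {{- $content := $data.Content | transform.Unmarshal -}}
  ... your logic here ...
{{- else -}}
  {{- /* 304 Not Modified, use your cached version */ -}}
  {{- /* Depending on the Hugo version, a 304 may surface as an error instead; wrap the call in try if so */ -}}
  {{- /* This is where you'd load from a previously stored variable */ -}}
{{- end -}}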

Yeah, I know. It’s a bit more work. But this is the difference between a site that builds in 30 seconds and one that builds in 30 minutes. You’re checking if the milk is sour before buying a new gallon.

The Quick and Dirty: Time-Based Caching

Maybe the API you’re hitting doesn’t send good headers, or you just can’t be bothered with the ceremony above. I get it. Sometimes you just need a simple solution. For that, we can use a hack: only fetch the remote resource if a locally cached file is older than a certain time.

We can combine resources.GetRemote with Hugo’s file system functions. The trick is to keep a copy of the response in a file under your assets directory and check its modification time. One catch: templates can read files but cannot write them, so the copy has to be refreshed by something outside the template, such as a pre-build script.

{{- $cacheFile := "data/cached-api-response.json" -}}
{{- $cachePath := printf "assets/%s" $cacheFile -}}
{{- $cacheDuration := "24h" -}} {{- /* Cache for 24 hours */ -}}
{{- $shouldFetch := true -}}
{{- $freshData := dict -}}

{{- if fileExists $cachePath -}}
  {{- $cacheMod := (os.Stat $cachePath).ModTime -}}
  {{- /* Skip the fetch while the file is younger than the cache duration */ -}}
  {{- if lt (now.Sub $cacheMod) (time.ParseDuration $cacheDuration) -}}
    {{- $shouldFetch = false -}}
  {{- end -}}
{{- end -}}

{{- if $shouldFetch -}}
  {{- with resources.GetRemote "https://api.example.com/data" -}}
    {{- /* The cache file is missing or stale: use the live response for this build. */ -}}
    {{- /* Refresh the file itself outside Hugo; templates cannot write to assets/. */ -}}
    {{- $freshData = .Content | transform.Unmarshal -}}
  {{- end -}}
{{- else -}}
  {{- /* Read from the cached file on disk */ -}}
  {{- $freshData = readFile $cachePath | transform.Unmarshal -}}
{{- end -}}

This is less efficient than the header method, but it’s brutally effective and gives you absolute control. It’s the caching equivalent of a blunt instrument.

The Pitfall: Dynamic URLs and Build-Time Fetching

Here’s the big one. Let’s say you do this in your template:

{{ $productData := getJSON "https://api.com/products/" .Page.Params.product_id }}

Seems fine, right? Wrong. Hugo executes this during the build. If you have 100 product pages, it will try to fetch from that API 100 times. If the API is slow, your build will be a disaster.

The solution is to fetch once, then distribute. Fetch the entire product list in a shared partial or shortcode, cache it once using the methods above, and then use Hugo’s scratchpad or a dict to pluck out the data you need for each page from that single, in-memory dataset. You move the fetch from the page level to the site level. This is the single biggest performance win you’ll get.
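
Fetch-once-then-distribute can be sketched with a partial that returns a lookup map. Everything specific here is an assumption made for illustration: the partial name get-products.html, a list endpoint at https://api.example.com/products, and a response that is a JSON array of objects with id and name fields.

```go-html-template
{{- /* layouts/partials/get-products.html (hypothetical name) */ -}}
{{- $products := dict -}}
{{- with resources.GetRemote "https://api.example.com/products" -}}
  {{- range (.Content | transform.Unmarshal) -}}
    {{- /* Index each product by its id so pages can look it up directly */ -}}
    {{- $products = merge $products (dict (string .id) .) -}}
  {{- end -}}
{{- end -}}
{{- return $products -}}
```

Each product page then asks for the shared map instead of the API. Because partialCached is called without a variant, the partial body runs once per build and every later call gets the same in-memory result:

```go-html-template
{{- $products := partialCached "get-products.html" . -}}
{{- with index $products (string .Params.product_id) -}}
  <h2>{{ .name }}</h2>
{{- end -}}
```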

Remember, the goal isn’t to avoid the remote fetch altogether; it’s to avoid doing it redundantly. You’re not being lazy, you’re being smart. And your build times will thank you for it.