31.6 Custom Deployment: rsync, scp, and S3+CloudFront

Alright, let’s get our hands dirty. You’ve built your Hugo site, and it’s a thing of beauty. But now we need to move it from the cozy confines of your laptop to the cold, hard internet where people can actually see it. The big platforms like Netlify are fantastic, but sometimes you need to roll your own deployment. Maybe you’re deploying to a client’s existing VPS, or you need the fine-grained control of a CDN. This is where the old guard (rsync, scp, and the AWS combo platter of S3 and CloudFront) comes into play. It’s a bit more manual, but you’ll know exactly what’s happening, and I’ll be right here to make sure you don’t step on any rakes.

The Humble Power of rsync

rsync is the workhorse. It’s been moving files around since the dawn of time (or 1996, which is basically the same thing in internet years). Its superpower is that it only sends the differences between files, not the whole file every time. This makes subsequent deployments blazingly fast.

The basic incantation is simple, but the devil is in the flags. Here’s a command you’d run from your local machine after running hugo to build your public/ directory:

rsync -avz --delete public/ username@your-server.com:/path/to/your/webroot/

Let’s break down this arcane spell:

-a: Archive mode. This is a bundle of useful flags that preserves permissions, timestamps, and copies directories recursively. It’s basically “do the right thing.”
-v: Verbose. It tells you what it’s doing. Annoyingly quiet without it.
-z: Compress during transfer. Great for saving bandwidth, especially if you’re on a dodgy coffee shop connection.
--delete: This is the dangerous one. It tells rsync “make the destination exactly like the source; delete any files on the server that aren’t in your local public/ folder.” This is why you must get the paths right. Preview the run with --dry-run (-n) first, or at least test without --delete, or you might accidentally nuke your server’s /etc directory. I’m not joking. The trailing slash on public/ matters: it copies the contents of the directory, not the directory itself.
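
Since a wrong path plus --delete is the scary combination here, one way to take the edge off is a small wrapper that refuses obviously bad destinations and previews by default. Treat this as a sketch, not a drop-in tool: the function name deploy_rsync, the DRY_RUN variable, and the specific checks are my own assumptions, and the host and path in the usage line are the same placeholders as in the command above.

```shell
#!/bin/sh
# Hypothetical helper: sync a Hugo build directory to a server, previewing by default.
# Usage: deploy_rsync SRC_DIR USER@HOST:/REMOTE/PATH/
# Set DRY_RUN=0 to actually transfer and delete files.
deploy_rsync() {
    src=$1
    dest=$2

    # Refuse to run if the build output is missing; syncing an empty
    # directory with --delete would wipe the webroot.
    if [ ! -f "$src/index.html" ]; then
        echo "error: $src/index.html not found; run hugo first" >&2
        return 1
    fi

    # Refuse destinations whose remote path is empty or the filesystem root.
    case ${dest#*:} in
        ""|/) echo "error: refusing to sync into '$dest'" >&2; return 1 ;;
    esac

    # "${src%/}/" guarantees exactly one trailing slash, so rsync copies
    # the directory's contents rather than the directory itself.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # -n (--dry-run) lists what would be sent and deleted, changing nothing.
        rsync -avzn --delete "${src%/}/" "$dest"
    else
        rsync -avz --delete "${src%/}/" "$dest"
    fi
}
```

Run deploy_rsync public username@your-server.com:/path/to/your/webroot/ to see the preview, read the list of deletions like your job depends on it, then repeat with DRY_RUN=0 in front once it looks right.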

Pitfall: Permissions. If your local Hugo build runs as your user, the files might have permissions that don’t work for the web server user (like nginx or www-data) on the remote machine. You might need to add a --chmod flag or run a quick chown command on the server after.
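
One way to dodge the permissions pitfall is to normalize modes in public/ before the files ever leave your machine and let -a carry them across. A minimal sketch, assuming the common convention of 755 for directories and 644 for files; the function name normalize_perms is hypothetical, and your server may want something stricter.

```shell
#!/bin/sh
# Hypothetical helper: make a build directory world-readable before upload.
# Directories get 755 (rwxr-xr-x), regular files get 644 (rw-r--r--).
normalize_perms() {
    dir=$1
    find "$dir" -type d -exec chmod 755 {} +
    find "$dir" -type f -exec chmod 644 {} +
}

# Usage after building:
#   normalize_perms public
# rsync can also rewrite modes in flight, leaving local files alone:
#   rsync -avz --chmod=D755,F644 --delete public/ username@your-server.com:/path/to/your/webroot/
```

Neither approach changes file ownership on the server, so if the web server user needs to own the files you are still looking at a chown after the sync.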

The Simplicity (and Brutality) of scp

scp is rsync’s simpler, less clever cousin. It uses the same SSH-under-the-hood magic, but it doesn’t do incremental syncs. It just copies files. Whole files. Every time. It’s like using a sledgehammer to hang a picture. For a full site deployment, I’d almost always choose rsync, but scp is perfect for quickly uploading a single file you need to check.

To copy your entire public directory up to a server:

scp -r public/* username@your-server.com:/path/to/your/webroot/

See? Simple. And brutally inefficient for a thousand tiny HTML files. The -r flag is for “recursive.” Note that the shell’s public/* glob skips top-level dotfiles, so anything like a .htaccess file gets left behind. The major pitfall, though, is that scp doesn’t clean up old files. If you renamed blog/post-1.md to blog/my-great-post.md, scp will upload the new page but leave the old page’s HTML sitting on the server, rotting and still being served at its old URL when it should be returning a 404. You’d have to SSH in and manually clean up. This is why rsync --delete is almost always the better tool for this job.
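
If you are stuck with scp and want to know what it has stranded, you can compare file listings from both sides. The sketch below runs the comparison on two local directories so it is easy to try out; the function name list_stale is hypothetical, and against a real server you would produce the second listing over SSH instead (for example by running find in the webroot remotely and saving the output).

```shell
#!/bin/sh
# Hypothetical helper: print files that exist in DEST but not in SRC.
# Both arguments are local directories in this sketch.
list_stale() {
    src_list=$(mktemp)
    dest_list=$(mktemp)
    # Relative paths, sorted, so the two listings are directly comparable.
    (cd "$1" && find . -type f | sort) > "$src_list"
    (cd "$2" && find . -type f | sort) > "$dest_list"
    # comm -13 keeps only the lines unique to the second file.
    comm -13 "$src_list" "$dest_list"
    rm -f "$src_list" "$dest_list"
}
```

Calling list_stale public /some/local/copy/of/webroot prints one leftover path per line, which you can review before deleting anything by hand.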

The Industrial-Grade Solution: S3 + CloudFront

Now we’re talking. This is the “I need a globally distributed, highly scalable CDN” option. Hugo sites are static, and there’s no better home for static assets than Amazon S3 (Simple Storage Service). It’s dirt-cheap and incredibly durable. But you don’t want users hitting your-bucket-name.s3-website-us-east-1.amazonaws.com; you want your custom domain. That’s where CloudFront, AWS’s CDN, comes in.

Step 1: The S3 Bucket. Create a bucket named exactly after your custom domain (e.g., www.myawesomeblog.com). This isn’t strictly necessary, but it keeps things sane. Crucially, you must configure it for static website hosting and set a bucket policy that allows the world to read the objects. On a newly created bucket, the Block Public Access settings will reject a public policy until you relax them, so do that first. The policy itself is your first hurdle:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::www.myawesomeblog.com/*"
        }
    ]
}
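
If you would rather script this setup than click through the console, the AWS CLI can apply the same configuration. This is a sketch under a few assumptions: the bucket already exists, the CLI is configured with credentials that can manage it, and the policy above is saved in a file (policy.json in the usage line is a made-up name). The run helper, the DRY_RUN variable, and the function name setup_bucket are my own conventions, not part of the AWS tooling.

```shell
#!/bin/sh
# Print commands instead of running them unless DRY_RUN=0 (hypothetical convention).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: configure an existing bucket for public static hosting.
# Usage: setup_bucket BUCKET POLICY_FILE
setup_bucket() {
    bucket=$1
    policy=$2

    # Serve index.html for directory requests and 404.html for missing pages.
    run aws s3 website "s3://$bucket/" \
        --index-document index.html --error-document 404.html || return 1

    # New buckets reject public bucket policies by default; allow them here
    # while still blocking public ACLs.
    run aws s3api put-public-access-block --bucket "$bucket" \
        --public-access-block-configuration \
        "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=false,RestrictPublicBuckets=false" || return 1

    # Attach the public-read policy from the JSON document above.
    run aws s3api put-bucket-policy --bucket "$bucket" --policy "file://$policy"
}
```

With the default DRY_RUN=1, setup_bucket www.myawesomeblog.com policy.json only prints the three commands, which is a cheap way to double-check the bucket name before anything becomes public.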

Step 2: The CloudFront Distribution. This is where you point your custom domain. You create a distribution and tell it to use your S3 bucket as the “Origin.” Here’s the critical “questionable choice” you must override: Do not use the auto-populated S3 REST API endpoint origin. It’s a foot-gun. Instead, use the Static Website Hosting Endpoint from your S3 bucket’s properties. Why? The REST endpoint (which looks like bucket-name.s3.amazonaws.com) treats every request as a lookup for an exact object key, so a request for /blog/ will not fall back to /blog/index.html, and errors come back as XML. The static website endpoint (which looks like bucket-name.s3-website-us-east-1.amazonaws.com) serves index documents for directory-style URLs, which Hugo’s pretty URLs depend on, and it makes redirects and proper error pages (like 404.html) actually work. One catch: the website endpoint only speaks HTTP, so set the origin protocol policy to HTTP only; CloudFront still serves HTTPS to your visitors. It’s baffling that this isn’t the default.

Step 3: Deployment. You use the AWS CLI to sync your public/ folder to the bucket. It’s like rsync for S3.

aws s3 sync public/ s3://www.myawesomeblog.com --delete

The --delete flag removes files in the bucket that aren’t locally present. You don’t need an --acl public-read flag here: the bucket policy from Step 1 already makes every object readable, and on buckets with ACLs disabled (the default for newly created buckets) passing an ACL makes the upload fail. Massive Pitfall: CloudFront caches everything aggressively by default. After you sync, your changes might not be visible for minutes or even hours (the default TTL is 24 hours!). You must invalidate the CloudFront cache. You can do this for the entire distribution (/*) or for specific files.

aws cloudfront create-invalidation --distribution-id YOUR_DISTRIBUTION_ID --paths "/*"
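
Strung together, the build, sync, and invalidation steps fit in a few lines of shell. The following is a sketch rather than a finished pipeline: the run helper, the DRY_RUN variable, and the function name deploy_s3 are hypothetical, and it relies on the bucket policy from Step 1 for public reads, so no --acl flag is passed.

```shell
#!/bin/sh
# Print commands instead of running them unless DRY_RUN=0 (hypothetical convention).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: build, upload, and invalidate in one go.
# Usage: deploy_s3 BUCKET DISTRIBUTION_ID
deploy_s3() {
    bucket=$1
    dist_id=$2

    # Stop at the first failing step so a broken build is never uploaded.
    run hugo --minify || return 1
    run aws s3 sync public/ "s3://$bucket" --delete || return 1
    # "/*" is quoted so the shell does not expand it as a glob.
    run aws cloudfront create-invalidation \
        --distribution-id "$dist_id" --paths "/*"
}
```

By default this only prints what it would do; DRY_RUN=0 deploy_s3 www.myawesomeblog.com YOUR_DISTRIBUTION_ID runs it for real.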

It’s an extra step, and it’s annoying, but it’s the price of immense speed and global reach. Automate this in a script, and you’ve got a deployment pipeline that rivals any platform-as-a-service, with total control over every single part of it.