27.7 rsync Over SSH: Efficient Incremental File Transfers
Right, so you’ve SSH’d into your server. You can run commands, edit files, and feel like a wizard. But what about getting a whole directory of files to or from that remote machine? Your first instinct might be to reach for scp. Don’t get me wrong, scp is a decent enough tool for a quick one-off file copy. But the moment you need to do this more than once, or you need to sync a directory that’s already partially there, scp becomes a blunt instrument. It copies everything, every single time. That’s like packing your entire house into a moving truck just to bring a new book to your bedside table.
This is where rsync over SSH comes in, and it’s one of the most powerful combos in your toolkit. rsync is a “synchronization” tool. Its genius is in its efficiency; it only sends the differences between your source and destination. If you’ve got a directory with 100 GB of data and you change one 1 KB text file, rsync will only transfer that tiny changed file. It’s magic. And by tunneling it through SSH, you get all the security and authentication you’re already used to.
The basic incantation is simple. Give rsync a user@host:path source or destination and it uses SSH as its transport mechanism by default.
rsync -av /path/to/local/source/ user@remote-host:/path/to/remote/dest/
Let’s break that down because those arguments matter. The -a is for “archive mode,” which is a handy pre-packaged set of flags that preserves permissions, timestamps, and does recursive copying (among other things). You almost always want this. The -v is for “verbose,” so it tells you what it’s doing. I like to add a --progress flag too when I’m running it manually, so I can see the transfer status and estimate how long it’ll take to ferry my 20GB of cat memes to the cloud.
The Trailing Slash Trap
This is the single most common “it didn’t work the way I expected!” gotcha with rsync, and it’s a doozy. The presence or absence of a trailing slash (/) on your source path changes its behavior dramatically.
- With a trailing slash (source/): It means “copy the contents of the source directory.”
- Without a trailing slash (source): It means “copy the source directory itself.”
So, if you run:
rsync -av my_project/ user@host:/backups/
You’ll end up with /backups/file1.txt, /backups/file2.txt, etc.
But if you run:
rsync -av my_project user@host:/backups/
You’ll end up with /backups/my_project/file1.txt, /backups/my_project/file2.txt.
The first one is usually what you want for backups and syncs. Pay attention to those slashes; they’re quietly running the whole show.
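If you want to sanity-check a command before running it, the trailing-slash rule is simple enough to model in a few lines of shell. This is only a sketch: predict_dest is a hypothetical helper name made up for illustration, it never calls rsync, and it assumes a directory source copied with -a.

```shell
# predict_dest SRC DEST FILE
# Prints where FILE (a path relative to SRC) would land under DEST,
# following rsync's trailing-slash rule for the source argument.
# Hypothetical helper for illustration only; it never runs rsync.
predict_dest() {
    src=$1
    dest=${2%/}    # strip one trailing slash from DEST so paths join cleanly
    file=$3
    case $src in
        # Trailing slash: only the contents of SRC are copied.
        */) printf '%s/%s\n' "$dest" "$file" ;;
        # No trailing slash: the SRC directory itself is recreated under DEST.
        *)  printf '%s/%s/%s\n' "$dest" "$(basename "$src")" "$file" ;;
    esac
}

predict_dest my_project/ /backups/ file1.txt   # → /backups/file1.txt
predict_dest my_project  /backups/ file1.txt   # → /backups/my_project/file1.txt
```

The real thing to remember is the case statement: one character on the source path decides which branch you get.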
Forgetting Your SSH Arguments
Here’s a pro move. If your remote host uses a non-standard SSH port or a specific private key, you don’t need to fuss with your ~/.ssh/config file first. You can pass those SSH options directly through rsync using the -e (or --rsh) flag.
Say your obscure IoT device runs SSH on port 2222 and you need to specify a key:
rsync -av -e 'ssh -p 2222 -i ~/.ssh/my_iot_key' ./data/ user@iot-device.local:/storage/
This is a lifesaver for scripting weird one-off devices. rsync just hands that whole string to SSH and says, “You deal with this.”
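In a script, it can help to build that -e string once and reuse it. Here’s a minimal sketch, assuming the same port and key as above; SSH_PORT, SSH_KEY, REMOTE_SHELL, and push_data are names invented for this example, not anything rsync defines.

```shell
# Assumed values for illustration; substitute your own.
SSH_PORT=2222
SSH_KEY="$HOME/.ssh/my_iot_key"

# Build the remote-shell command once so every rsync call uses the same options.
# rsync splits this string on spaces, so a key path containing spaces would
# need its own quoting inside the string.
REMOTE_SHELL="ssh -p $SSH_PORT -i $SSH_KEY"

# push_data SRC DEST: run rsync -av over the SSH command built above.
push_data() {
    rsync -av -e "$REMOTE_SHELL" "$1" "$2"
}

# Usage (not run here):
#   push_data ./data/ user@iot-device.local:/storage/
```

If the same host keeps showing up in your scripts, moving these options into ~/.ssh/config is probably tidier in the long run, but this keeps a one-off script self-contained.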
The --delete Flag: Use With Extreme Prejudice
By default, rsync is additive. It only copies files that exist on the source to the destination. It won’t delete files on the destination that have been removed from the source. Sometimes that’s what you want. Sometimes you want the destination to be a perfect mirror. That’s what --delete is for.
WARNING: This is a foot-cannon. Run --delete without a --dry-run first and you will eventually delete something you didn’t mean to. Always, always do a dry run first.
# First, see what it WOULD do. This is your get-out-of-jail-free card.
rsync -av --dry-run --delete /clean_source/ user@host:/cluttered_dest/
# If the list of files to delete looks correct, then run it for real.
rsync -av --delete /clean_source/ user@host:/cluttered_dest/
This is the command you use when you want your backup directory to be an exact replica, not a historical archive of old files you’ve since deleted locally.
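If you don’t trust yourself to remember the dry run (fair), you can bake the two-step routine into a small wrapper. This is a sketch under assumptions, not a hardened tool: mirror_with_confirm is a hypothetical function name, and the RSYNC variable is an invention of this example that lets you swap in echo to preview the exact commands; it is not something rsync itself reads.

```shell
# mirror_with_confirm SRC DEST
# Runs the --delete sync as a dry run, shows the result, and only performs
# the real transfer after the user types "yes".
# RSYNC is this script's own override hook (e.g. RSYNC=echo to print the
# commands instead of running them); it defaults to the real rsync.
mirror_with_confirm() {
    src=$1
    dest=$2

    # Step 1: dry run. Bail out if rsync itself reports an error.
    "${RSYNC:-rsync}" -av --dry-run --delete "$src" "$dest" || return 1

    # Step 2: ask before touching anything. EOF counts as "no".
    printf 'Apply these changes for real? Type "yes" to continue: '
    read -r answer || answer=
    if [ "$answer" = "yes" ]; then
        "${RSYNC:-rsync}" -av --delete "$src" "$dest"
    else
        echo "Aborted; nothing was changed."
    fi
}

# Usage (not run here):
#   mirror_with_confirm /clean_source/ user@host:/cluttered_dest/
```

Anything other than a literal yes leaves the destination untouched, which is the behavior you want from a command that deletes things.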
Excluding Files and Directories
You rarely want to sync everything. Temporary files, build directories, dependency and virtual-environment directories (node_modules/, .venv/), and giant logfiles are just baggage. You can exclude them with the --exclude flag. For a project with a Python virtual environment and a build directory, you’d do:
rsync -av --exclude='.venv' --exclude='__pycache__' --exclude='build/' ./ project-user@deploy-server:/opt/my_app/
If you find yourself using the same set of excludes every time (you will), either put them in a .rsync-filter file in your source directory (written as filter rules such as "- .venv") and pass -F so rsync reads it, or list your excludes in a file and use --exclude-from=excludes.txt. It saves your sanity and your bandwidth.
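For the --exclude-from route, the file is just one pattern per line. Here’s a small sketch that writes the same three patterns from the command above into excludes.txt; the comment lines inside the file are my own additions for illustration.

```shell
# Write the exclude patterns once: one pattern per line.
# Blank lines and lines starting with '#' are ignored by rsync.
cat > excludes.txt <<'EOF'
# Python artefacts
.venv
__pycache__
# Build output (the trailing slash matches directories only)
build/
EOF

# Then point rsync at the file instead of repeating --exclude (not run here):
#   rsync -av --exclude-from=excludes.txt ./ project-user@deploy-server:/opt/my_app/
```

Keeping the file next to the project means the exclude list gets versioned along with everything else.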
So, ditch scp for anything more sophisticated than a single file. Embrace rsync. It feels a bit more complex at first, but the power and time it gives you back are immense. Just mind the trailing slashes and never, ever run --delete without a --dry-run first. Trust me on this one.