40.5 pgBadger: Parsing and Visualizing the Log

Alright, let’s get our hands dirty with pgBadger. You’ve enabled your log_line_prefix, set log_min_duration_statement, and now your log files are growing, filled with the glorious, unvarnished truth of what your database is actually doing. Staring at these raw text files is a special kind of masochism. It’s like trying to understand a city’s traffic patterns by lying in the middle of a highway with a notepad. Enter pgBadger: it’s the traffic control helicopter that lifts you up, gives you a map, and points out all the fender benders and gridlock.

This isn’t some bloated SaaS dashboard; it’s a fiercely efficient Perl script that chews through gigabytes of logs at an alarming speed and spits out a beautiful, self-contained HTML report. It’s the single most effective tool for going from “something’s slow” to “this query from this application user is causing all the problems because it’s missing an index and it’s called 10,000 times a minute.” Let’s make it work for you.

Installation and Basic Invocation

You can probably install it with your package manager (apt install pgbadger on Debian/Ubuntu, yum install pgbadger on RHEL/CentOS), but I prefer the cpan method because it often gets you the latest and greatest. Don’t worry, cpan isn’t as scary as it used to be.

sudo cpanm install pgBadger

Once it’s installed, the most basic invocation is almost comically simple. You point it at your log file and tell it where to put the output.

pgbadger /var/log/postgresql/postgresql-*.log -o /tmp/my_first_report.html

Open that HTML file in a browser. Congratulations, you’ve just upgraded from a cave painting to a high-def satellite image. But we can, and will, do better. The real magic happens with the options you’re about to learn.

Tuning for Performance and Precision

pgBadger is fast, but if you’re dealing with a multi-gigabyte log directory, you can make it scream. The -j or -J flags are your best friends here. They tell pgBadger to use multiple cores. It’s the difference between watching paint dry and… well, watching several paints dry concurrently, but much faster. Use -j for the number of cores or -J to let it detect the number automatically.

# For a big batch job, use all the cores you paid for
pgbadger -J 8 /var/log/postgresql/postgresql-*.log -o /tmp/daily_report.html

Now, the most common “why isn’t this working?!” moment. Your log_line_prefix in postgresql.conf and the prefix you tell pgBadger must match. If they don’t, pgBadger will politely process zero lines and you’ll get a very empty, very sad report. My standard prefix is %t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h , which breaks down as the timestamp, process ID, log line number, user, database, application name, and client IP. To tell pgBadger about this slightly complex prefix, you use the --prefix flag:

pgbadger --prefix '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h ' /path/to/logfile.log

Getting this right is non-negotiable. If you’re unsure, run pgBadger with the -f (format) flag set to stderr first. It will output the first few lines it parses, so you can verify it’s reading them correctly before doing a big run.

The Real-World, Incremental Workflow

You’re not going to run a full report on gigabytes of logs from scratch every hour. That’s wasteful. The production-grade workflow is incremental. You run pgBadger periodically (e.g., every hour via cron), and it appends the new data from the last hour’s logs to an existing binary data file. Then, you regenerate the HTML report from that compact binary file. This is incredibly efficient.

Here’s the cron job you’ve been waiting for:

# Example hourly cron job
pgbadger -I --prefix '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h ' \
  /var/log/postgresql/postgresql-$(date -d '1 hour ago' +%Y-%m-%d-%H).log \
  -O /var/www/pgbadger_reports/ \
  -o /var/www/pgbadger_reports/hourly_incremental.html

The -I flag is key. It tells pgBadger this is an incremental run. The first time you run it, it will create a pgbadger_data.bin file in the output directory (-O). Every subsequent run with -I will read that .bin file, parse the new log file you’ve specified, update the binary data, and then regenerate the HTML report. Your -o output file will always be a complete report from the beginning of time (or whenever you started the incremental series).

What to Actually Look For in the Report

The report is vast. Don’t get overwhelmed. Start here:

Overview Tab: Check the “Time Range” to ensure you’re looking at the correct period. Immediately look for red—any errors or fatal connections. A spike there is your biggest emergency.
Top Queries by Time: This is your hit list. These are the queries consuming the most total clock time on your system. Click one. pgBadger will show you its execution plan (if auto_explain is logged), its average duration, how many times it was called, and, crucially, the exact log lines where it occurred. This is how you connect a slow query to a specific application event or user.
Temporary Files: A sudden spike in temp file usage is a silent killer. It means queries are doing massive sorts or joins that can’t fit in work_mem. This is a primary source of I/O bombardment. This tab tells you exactly which queries to blame.
Connections: See if you have a connection storm. Thousands of quick connections from an app server with a leaky pool will show up here, wasting resources on authentication and setup.

pgBadger doesn’t solve your problems, but it eliminates the mystery. It tells you exactly which of your 10,000 queries is the problem child, so you can stop guessing and start adding indexes, rewriting code, or adjusting memory settings. It’s the closest thing to a database superpower you can get from a Perl script.