27.1 Automatic Sitemap Generation and Customization
Right, let’s talk about sitemaps. You’ve probably been told to “just install a plugin” and be done with it. And for a simple blog, that’s fine. But you and I aren’t building simple blogs, are we? We’re building complex, dynamic beasts with custom post types, user-generated content, and all sorts of weirdness. Plugins often guess wrong. They include things they shouldn’t and, more dangerously, exclude things they absolutely should not. So we’re going to do it right: we’re going to generate our sitemaps automatically, and we’re going to tell them exactly what to do.
The beauty of modern WordPress (and by modern, I mean everything since version 5.5) is that it has a robust, built-in sitemap system. It’s not some clunky add-on; it’s core functionality. It’s generated on the fly, which means it’s always up-to-date. No cron jobs to set, no static files to regenerate. When you publish a post, it’s instantly in the sitemap. Beautiful.
The Default Behavior (And Why You’ll Want to Change It)
Out of the box, WordPress will automatically generate sitemaps for the following:
- Your posts (post)
- Your pages (page)
- Your categories and tags (the ‘category’ and ‘post_tag’ taxonomies)
- Your author archives
- Any other public custom post types or taxonomies you have
You can see it in action by visiting yoursite.com/wp-sitemap.xml. This isn’t a single massive XML file; it’s an index that points to individual sitemaps for each content type (wp-sitemap-posts-post-1.xml, wp-sitemap-taxonomies-category-1.xml, etc.). This is smart. It keeps the files a manageable size for search engines to parse.
The problem? This default is… enthusiastic. It will include everything public. Your /author/admin/ page that has one post from 2012? Included. Your /category/uncategorized/ archive? Included. Your custom post type for internal company memos that should never see the light of Google? Included. You get the picture. We need to curate.
Taking Control with wp_sitemaps_add_provider
The cleanest way to remove an entire sitemap section is to de-register the provider for it. Let’s say you’ve come to your senses and decided that handing every search engine a list of your site’s authors (the users sitemap covers every user with published posts) is a bad idea (it is). You’d nuke the users sitemap like this:
// Remove the user sitemap because we're not monsters.
add_filter( 'wp_sitemaps_add_provider', function( $provider, $name ) {
    return $name === 'users' ? false : $provider;
}, 10, 2 );
This hook runs for each provider (users, posts, taxonomies, etc.). When the provider name is ‘users’, we return false, preventing it from being added to the sitemap index. Gone.
The Finer-Grained Approach: wp_sitemaps_post_types and wp_sitemaps_taxonomies
More commonly, you’ll want to keep a provider but decide which post types and taxonomies are even eligible for the sitemap. These two filters hand you that list.
Maybe you have a ‘testimonial’ custom post type. It’s public, so WordPress happily creates a sitemap for it. But you’re using these on a private, client-by-client basis and don’t want to advertise them. Easy. Remove it from the list. (Dropping a URL from the sitemap only stops you announcing it; actually keeping it out of the index is a separate job for noindex.)
// Stop the 'testimonial' post type from having its own sitemap.
add_filter( 'wp_sitemaps_post_types', function( $post_types ) {
    unset( $post_types['testimonial'] );
    return $post_types;
} );
You do the exact same thing with wp_sitemaps_taxonomies to remove, say, the ‘post_tag’ taxonomy if you’ve decided tags are a messy SEO nightmare (a position I often sympathize with).
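For the record, the taxonomy version looks almost identical. This is a minimal sketch, not gospel: the function name mysite_sitemap_remove_tags is made up for illustration, and the function_exists guard is only there so the file can load outside WordPress.

```php
<?php
/**
 * Drop the 'post_tag' taxonomy from the taxonomies sitemap.
 * The function name is hypothetical; call it whatever you like.
 *
 * @param array $taxonomies Taxonomy objects keyed by taxonomy name.
 * @return array The same list without 'post_tag'.
 */
function mysite_sitemap_remove_tags( $taxonomies ) {
    unset( $taxonomies['post_tag'] );
    return $taxonomies;
}

// Guard so the file can be loaded outside WordPress (e.g. in a quick test).
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_sitemaps_taxonomies', 'mysite_sitemap_remove_tags' );
}
```

Because the callback is a named function rather than a closure, another plugin (or future you) can unhook it with remove_filter.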
The Nuclear Option: Customizing Individual URLs
This is where the real power is. The wp_sitemaps_posts_query_args and wp_sitemaps_taxonomies_query_args filters let you modify the query that builds each sitemap before it runs. This is your chance to be a micromanager.
Let’s say you have a “Secret Projects” category with the ID of 4. You want all the posts in that category to stay in the sitemap, but you don’t want the category archive itself listed. You can’t do that with wp_sitemaps_taxonomies, because that filter only sees whole taxonomies and would remove the entire ‘category’ sitemap. Instead, you exclude the term ID from the term query.
// Remove a specific category (ID 4) from the taxonomy sitemap.
add_filter( 'wp_sitemaps_taxonomies_query_args', function( $args, $taxonomy ) {
    // Only touch the query that builds the category sitemap.
    if ( 'category' !== $taxonomy ) {
        return $args;
    }
    // 'exclude' is a standard WP_Term_Query argument; keep anything already set.
    $args['exclude']   = isset( $args['exclude'] ) ? wp_parse_id_list( $args['exclude'] ) : array();
    $args['exclude'][] = 4; // Goodbye, category 4.
    return $args;
}, 10, 2 );
The same principle applies to posts with wp_sitemaps_posts_query_args. Its arguments are passed to WP_Query, so you can filter out posts with a specific meta key (meta_query), posts older than a certain date (date_query), or posts written by a specific author (author__not_in). There are also per-entry filters, wp_sitemaps_posts_entry and wp_sitemaps_taxonomies_entry, which receive the full WP_Post or WP_Term object alongside each entry. They are for adjusting an entry, not dropping it; to remove a URL, exclude it in the query.
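To make the post-side pruning concrete, here is a rough sketch built on the wp_sitemaps_posts_query_args filter, whose arguments end up in WP_Query. Everything named here is an assumption for illustration: the function name, the _mysite_hide_from_sitemap meta key, the five-year cutoff, and author ID 7. Treat it as a starting point, not the one true way.

```php
<?php
/**
 * Prune entries from the 'post' sitemap by adjusting its query arguments.
 * Function name, meta key, cutoff and author ID are all placeholders.
 *
 * @param array  $args      WP_Query arguments for this sitemap.
 * @param string $post_type Post type the sitemap is being built for.
 * @return array Adjusted arguments.
 */
function mysite_sitemap_posts_query_args( $args, $post_type ) {
    // Leave every other post type alone.
    if ( 'post' !== $post_type ) {
        return $args;
    }

    // Skip posts flagged with a (hypothetical) meta key.
    $args['meta_query']   = isset( $args['meta_query'] ) ? (array) $args['meta_query'] : array();
    $args['meta_query'][] = array(
        'key'     => '_mysite_hide_from_sitemap',
        'compare' => 'NOT EXISTS',
    );

    // Only keep posts published within the last five years.
    $args['date_query'] = array(
        array( 'after' => '5 years ago' ),
    );

    // Skip everything written by one author (ID 7 is a placeholder).
    $args['author__not_in'] = array( 7 );

    return $args;
}

// Guard so the file can be loaded outside WordPress (e.g. in a quick test).
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_sitemaps_posts_query_args', 'mysite_sitemap_posts_query_args', 10, 2 );
}
```

In practice you would pick one of the three conditions, not all of them. Stacking them here just shows that each one is an ordinary WP_Query argument.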
The Most Important Part: The lastmod and Priority Fallacy
Look at your auto-generated sitemap. Depending on your WordPress version, you may not see a <lastmod> date at all: the sitemaps that shipped in 5.5 deliberately left it out, and as far as I can tell core only started emitting it for post entries in 6.5. Where it does appear, it comes from the post’s post_modified_gmt field, which WordPress bumps on every save. That is the “quirk”: fix a typo and the post looks freshly modified, whether or not anything meaningful changed.
If you want lastmod to mean something stricter, such as the last substantial edit, you have to supply your own date and swap it in with the wp_sitemaps_posts_entry filter. It’s a pain, and whether the tiny SEO gain is worth the effort is debatable. Just be aware that the date you see isn’t always the one you think it is.
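If your install doesn’t emit <lastmod>, or you want to decide what goes in it, the wp_sitemaps_posts_entry filter is the place to do it. It hands you each entry along with its WP_Post. Here’s a minimal sketch that sets lastmod from post_modified_gmt. The function name is made up, and the function_exists guard only exists so the file can load outside WordPress.

```php
<?php
/**
 * Set <lastmod> on a post sitemap entry from the post's modified date.
 * The function name is hypothetical.
 *
 * @param array  $entry     Sitemap entry, e.g. array( 'loc' => '...' ).
 * @param object $post      The post object for this entry.
 * @param string $post_type Post type name (unused here).
 * @return array Entry with 'lastmod' added when a usable date exists.
 */
function mysite_sitemap_entry_lastmod( $entry, $post, $post_type ) {
    // Bail out if there is no usable modified date.
    if ( empty( $post->post_modified_gmt ) || '0000-00-00 00:00:00' === $post->post_modified_gmt ) {
        return $entry;
    }

    // The stored value is already GMT, so parse it with an explicit UTC offset.
    $timestamp = strtotime( $post->post_modified_gmt . ' +0000' );
    if ( false !== $timestamp ) {
        // 'c' produces an ISO 8601 date, e.g. 2024-01-02T03:04:05+00:00.
        $entry['lastmod'] = gmdate( 'c', $timestamp );
    }

    return $entry;
}

// Guard so the file can be loaded outside WordPress (e.g. in a quick test).
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_sitemaps_posts_entry', 'mysite_sitemap_entry_lastmod', 10, 3 );
}
```

To use a stricter date, replace the post_modified_gmt lookup with whatever value you store yourself; the rest of the function stays the same.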
And please, for the love of all that is holy, do not waste a single brain cell trying to manually set “priority” in your sitemap. Google has explicitly stated they ignore it. It’s a relic. Your time is better spent literally anywhere else, like making sure your lastmod dates are actually correct. The irony is not lost on me.