91.5 Celery Beat: Periodic Tasks

Alright, let’s talk about Celery Beat, the part of Celery that tries its best to be a reliable alarm clock but occasionally sleeps through its own wake-up call. You want to run tasks periodically—every ten seconds, every day at midnight, every Tuesday at 3 AM when the server load is theoretically lowest. That’s Beat’s job. It’s the scheduler. Think of it as cron that’s been dragged, kicking and screaming, into the distributed application era.

The core idea is simple: Beat is a separate process that, when started (celery -A proj beat), consults a schedule and pushes tasks into the broker at the appointed times. The workers then pick them up and execute them as normal. The magic, and the complexity, lies in how that schedule is configured and persisted.
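To make that loop concrete, here is a toy, stdlib-only simulation of the idea. It is not Celery’s implementation: `run_beat`, its arguments, and the simulated clock are all made up for illustration, and real Beat publishes messages to a broker rather than calling a Python function.

```python
import heapq


def run_beat(schedule, publish, start, stop):
    """Simulate a scheduler between `start` and `stop` (in seconds).

    `schedule` maps an entry name to an interval in seconds.
    `publish(due_time, name)` stands in for "push a task message to the broker".
    """
    # Each heap item is (next_due_time, entry_name, interval).
    heap = [(start + interval, name, interval) for name, interval in schedule.items()]
    heapq.heapify(heap)

    while heap and heap[0][0] <= stop:
        due, name, interval = heapq.heappop(heap)
        publish(due, name)  # hand the work off; the scheduler never runs it itself
        heapq.heappush(heap, (due + interval, name, interval))  # re-arm the entry


sent = []
run_beat(
    {"add-every-30-seconds": 30.0},
    publish=lambda due, name: sent.append((due, name)),
    start=0.0,
    stop=90.0,
)
```

After the call, `sent` holds one record per firing (at 30, 60, and 90 seconds). The point to take away is the division of labor: the scheduler only decides when, and hands everything else to whoever is listening on the other side.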

The Beat Schedule (`beat_schedule`)

You define your periodic tasks right in your Celery app configuration. This isn’t just a list of “what” to run; it’s a dictionary that defines the rhythm of your entire operation. Let’s build one.

# proj/celery.py
from celery import Celery
from celery.schedules import crontab  # needed for the crontab(...) schedule below

app = Celery('proj')

app.conf.update(
    broker_url='pyamqp://guest@localhost//',
    result_backend='rpc://',
    beat_schedule={
        'add-every-30-seconds': {
            'task': 'proj.tasks.add',
            'schedule': 30.0,  # Every 30 seconds
            'args': (16, 16),  # These args feel... fitting.
        },
        'scrape-the-news-every-morning': {
            'task': 'proj.tasks.scrape_news',
            'schedule': crontab(hour=7, minute=30),  # 7:30 AM every day
            'kwargs': {'source': 'nytimes'},
            'options': {'queue': 'scraping'},
        },
    },
)

Notice the two types of schedule here. The first is a simple float for seconds. It’s easy but inflexible. The second uses a crontab object, which is your gateway to the beautiful granularity of Unix-style scheduling. Want something every 15 minutes during business hours on weekdays? crontab(minute='*/15', hour='9-17', day_of_week='mon-fri'). One catch: the hour range is inclusive, so this keeps firing through 17:45; use hour='9-16' if you want the last run before 5 PM. It’s powerful. Use it.
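If you are ever unsure what a crontab expression will actually do, enumerate it. The sketch below is a stdlib-only approximation of how the numeric fields expand. The `expand` helper is hypothetical, it is not Celery’s parser, and it deliberately ignores day names like 'mon-fri'.

```python
def expand(field, lo, hi):
    """Expand one cron-style field ('*', '*/n', 'a-b', 'a-b/n', 'a,b') into sorted ints.

    `lo` and `hi` are the inclusive bounds used when the field contains '*'.
    """
    values = set()
    for part in str(field).split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-"))
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))  # ranges are inclusive
    return sorted(values)


minutes = expand("*/15", 0, 59)
hours = expand("9-17", 0, 23)

# Every (hour, minute) combination the schedule would fire on a matching day.
fire_times = [f"{h:02d}:{m:02d}" for h in hours for m in minutes]
```

Here `fire_times` starts at "09:00" and ends at "17:45", which is 36 runs per weekday. If that last hour surprises you, better to find out here than in the logs.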

The Scheduler Backend: The Single Source of Truth Problem

Here’s the first “questionable choice” you need to understand to avoid disaster. By default, Beat uses a local file to store the last time it ran each task (celerybeat-schedule). This is fine for a single Beat instance running on your laptop. It is a catastrophically bad idea for production.

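Why? Because that file lives on one machine’s disk. Lose the host or the container and the last-run state goes with it, and nothing outside that box can inspect or edit the schedule. It gets worse if you start a second Beat instance for redundancy: each one keeps its own file, neither knows the other exists, and you end up with duplicate tasks for every schedule. Celery’s documentation is blunt about this: only one scheduler may run for a given schedule at a time. It’s a mess.

The solution is to use a centralized scheduler backend. The choice most people land on is the Database Scheduler. It stores the schedule and the last run times in SQL tables, so the state survives a redeploy and a replacement Beat process picks up where the old one stopped. It does not turn Beat into a multi-master system, though. You still run exactly one Beat at a time and let your process supervisor restart it if it dies.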

# The defaults, shown for reference: a file on local disk. This is the problem!
# app.conf.beat_scheduler = 'celery.beat:PersistentScheduler'
# app.conf.beat_schedule_filename = 'celerybeat-schedule'

# Do this instead for production. It requires the `django-celery-beat` package,
# 'django_celery_beat' in INSTALLED_APPS, and its migrations applied.
app.conf.beat_scheduler = 'django_celery_beat.schedulers:DatabaseScheduler'

Without the Django integration, it’s more complex, which is why most people just use the Django package even if it feels a bit heavy-handed. The designers made the simple case trivial and left the production-grade setup to a third-party library. It’s a trade-off, but knowing it exists saves you from the duplicate task apocalypse.

The Clock Service: For When You Really, Really Need Reliability

Sometimes, even the database scheduler isn’t enough. If you need absolute certainty that a task runs exactly once at its scheduled time and you need to scale horizontally, you should consider offloading the scheduling responsibility entirely.

This is where the Clock Service pattern comes in. You use a ridiculously reliable external service to trigger your tasks. You might use actual cron on a very stable machine to send a message to your broker, or you might use a cloud service like AWS CloudWatch Events.

# A task designed to be triggered by an external clock
@app.task
def nightly_report():
    # ... generate report ...
    pass

# Meanwhile, in a crontab on a dedicated server (02:00 every day).
# The endpoint behind this URL is responsible for calling nightly_report.delay().
# 0 2 * * * curl -X POST http://your-app.com/trigger-nightly-report/

This seems like a step backwards, and in complexity it is. But in reliability? It can be a step forward. You’re using a battle-hardened tool (cron) for what it’s best at: telling time. The trade-off is that you now have to manage that external system and ensure it can authenticate with your app to trigger the task. It’s a valid choice for critical schedules.
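The receiving end of that curl call could look something like the sketch below. Everything here is an assumption for illustration: the shared-secret scheme, the `TRIGGER_TOKEN` value, and the `enqueue_nightly_report` placeholder (which in a real project would call `nightly_report.delay()`). It uses only the standard library so the authentication logic is easy to see; most applications would put the same check in their web framework instead.

```python
import hmac
from http.server import BaseHTTPRequestHandler

# Hypothetical shared secret; in practice load it from the environment.
TRIGGER_TOKEN = "change-me"


def is_authorized(header_value, expected=TRIGGER_TOKEN):
    """Check an 'Authorization: Bearer <token>' header in constant time."""
    if not header_value or not header_value.startswith("Bearer "):
        return False
    supplied = header_value[len("Bearer "):]
    return hmac.compare_digest(supplied.encode(), expected.encode())


def enqueue_nightly_report():
    """Placeholder: a real project would call nightly_report.delay() here."""


class TriggerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/trigger-nightly-report/":
            self.send_error(404)
            return
        if not is_authorized(self.headers.get("Authorization")):
            self.send_error(403)
            return
        enqueue_nightly_report()
        self.send_response(202)  # accepted: the work happens later, in a worker
        self.end_headers()
```

To try it locally you could serve the handler with `http.server.HTTPServer(("127.0.0.1", 8000), TriggerHandler).serve_forever()` and add `-H "Authorization: Bearer change-me"` to the curl command. The endpoint does nothing but validate and enqueue, which keeps the clock side dumb and the interesting logic inside the task.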

Pitfalls and Sharp Edges

Task Overlap: This is the big one. What if your task is scheduled every 30 seconds, but the task itself takes 45 seconds to run? By default, Beat doesn’t care. It will queue the next task anyway. You end up with a backlog of the same task piling up. Your solution is to use locks or leader election inside the task itself, often with a caching layer like Redis to set a “this task is currently running” flag. Celery does not hold your hand here.
Timezone Awareness: Your crontab schedules run in UTC by default. If you want them to run in a specific timezone (e.g., “at 9 AM Berlin time”), you must set it explicitly on the app with app.conf.timezone = 'Europe/Berlin', after which crontab(hour=9, minute=0) means 9 AM in Berlin. Forgetting this is a classic way to have a task run at the wrong hour after a deployment.
Deployments and Changing Schedules: When you change your beat_schedule and deploy, the running Beat process won’t know about it. You must restart it. This is another point in favor of the database scheduler—you can often update the schedule in the database without a full restart, but it’s still fiddly.
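For the task overlap problem above, the usual shape of the fix is a “set if not exists, with expiry” lock taken at the top of the task. Here is a minimal sketch of that pattern. The `single_run` helper and the key name are hypothetical, and `FakeRedis` is an in-memory stand-in so the example runs without a server; it mimics only the `set(..., nx=True, ex=...)`, `get`, and `delete` calls that a redis-py client also provides.

```python
import time
import uuid
from contextlib import contextmanager


class FakeRedis:
    """In-memory stand-in for the three Redis calls used below (demo only)."""

    def __init__(self):
        self._data = {}  # name -> (value, expiry timestamp)

    def _live(self, name):
        entry = self._data.get(name)
        if entry and entry[1] <= time.monotonic():
            del self._data[name]  # the key has expired
            return None
        return entry

    def set(self, name, value, nx=False, ex=None):
        if nx and self._live(name) is not None:
            return None  # somebody else already holds the key
        expiry = time.monotonic() + ex if ex is not None else float("inf")
        self._data[name] = (value, expiry)
        return True

    def get(self, name):
        entry = self._live(name)
        return entry[0] if entry else None

    def delete(self, name):
        self._data.pop(name, None)


@contextmanager
def single_run(client, key, ttl):
    """Yield True if this caller got the lock, False if another run holds it."""
    token = uuid.uuid4().hex  # identifies this particular run
    acquired = bool(client.set(key, token, nx=True, ex=ttl))
    try:
        yield acquired
    finally:
        # Only release a lock we own. With a real Redis this check-then-delete
        # is not atomic; a Lua script or the client's own lock helper is safer.
        if acquired and client.get(key) == token:
            client.delete(key)


client = FakeRedis()
results = []
with single_run(client, "lock:scrape_news", ttl=60) as first:
    results.append(first)
    with single_run(client, "lock:scrape_news", ttl=60) as second:
        results.append(second)  # overlapping run: should be refused
with single_run(client, "lock:scrape_news", ttl=60) as third:
    results.append(third)  # previous run released the lock
```

Inside a task you would return early when the context manager yields False. Pick a `ttl` comfortably longer than the task’s worst-case runtime, so a crashed worker cannot hold the lock forever but a slow run does not lose it halfway through. Note also that a real redis-py client returns bytes from `get` unless it was created with `decode_responses=True`, so the ownership comparison would need adjusting.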

The bottom line is this: Celery Beat is a useful tool, but it’s not a precision instrument. It gets the job done for most common periodic tasks, but the moment you need strong guarantees about uniqueness or timing, you’re in for a world of custom coordination. Start with the beat_schedule and a crontab, but graduate to the database scheduler immediately. And for the love of all that is holy, never rely on the default file-based scheduler in production. I’m not kidding. Your brilliant friend has made that mistake so you don’t have to.
