89.4 gettext: Marking Strings for Translation

Right, let’s talk about the part of i18n that feels like you’re tagging your entire codebase with digital sticky notes: marking strings for translation. This is where we use gettext, the granddaddy of i18n libraries, a piece of software so old and battle-tested it has its own particular, slightly musty, smell. Don’t worry, it’s still incredibly effective.

The core idea is simple, even if the implementation makes you want to weep. You wrap every user-facing string in your code in a special function call. This tells the gettext system, “Hey, this one needs to be translated.” Later, an extraction tool scans your code, finds all these tagged strings, and writes them to a template file (.pot). Each language gets its own copy of that template (a .po file) for translators to fill in, and that copy is compiled into a binary catalog (.mo). The magic happens at runtime: when the wrapped call runs, it looks up the original string in the catalog for the current language and returns the translation. If it doesn’t find one, it just returns the original string. No harm, no foul.

The standard function for this is gettext(), but you’ll almost always use its conventional alias, _(). Yes, an underscore. It isn’t built in; you bind it yourself during setup. It’s the shortest function name you’ll ever use, and it’s a stroke of genius born from pure laziness: you’ll be typing it a lot.

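# This is the classic setup: bind the domain to a locale directory,
# select it as the default, and alias the lookup function.
import gettext

gettext.bindtextdomain('myapp', '/path/to/your/locale/directory')  # where the compiled .mo catalogs live
gettext.textdomain('myapp')                                        # make 'myapp' the default domain
_ = gettext.gettext

# Now, in your code, you do this:
message_count = 3  # example value; normally this comes from your data
print(_("Welcome to my amazing application."))
user_message = _("You have {count} new messages.").format(count=message_count)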

Why the alias? Because wrapping every single string in gettext.gettext("...") would make your code unreadable. The underscore is a concession to sanity.

The Two Hard Problems: Context and Plurals

Sometimes, the same word means different things in English. Consider “file”. Is it a thing on your disk, or a tool for smoothing wood? The poor translator hasn’t a clue. This is where context (pgettext) comes in. You provide an extra “context” string to disambiguate.

# pgettext(context, message); available in Python 3.8 and later
from gettext import pgettext

# To a translator, these are now two distinct strings.
menu_label = pgettext("menu item", "File")
tool_name = pgettext("carpentry tool", "File")

Then there’s the plural problem. English has two forms: singular and plural. “1 file” vs. “2 files”. Easy. Other languages have way more. Polish, for example, has one form for exactly 1, another for numbers ending in 2, 3, or 4 (unless they end in 12, 13, or 14), and a third for everything else. I’m not kidding. This is why you use ngettext.

# ngettext(singular, plural, count)
from gettext import ngettext

num = 3  # example value; normally this comes from your data

message = ngettext(
    "You have {num} message.",  # singular
    "You have {num} messages.", # plural
    num                         # the number to decide which form to use
).format(num=num)

The ngettext function uses the number you pass in to choose the correct plural form from the translated catalog. The selection rule lives in the catalog itself: each language’s .po file carries a Plural-Forms header, a small C-style expression that maps a number to a form index. You just provide the two English forms and the number; gettext and the translator handle the mind-bending complexity for you.
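To watch context and plural lookups work without setting up a real locale directory, here is a sketch that packs a tiny Polish catalog in memory and loads it with gettext.GNUTranslations. The build_mo helper is hypothetical and written only for this demo (normally a tool such as msgfmt compiles the .po file for you), and the Polish strings are illustrative.

```python
import gettext
import io
import struct


def build_mo(entries):
    """Pack {msgid_bytes: msgstr_bytes} into the binary .mo layout (little-endian)."""
    keys = sorted(entries)                    # .mo files store msgids in sorted order
    count = len(keys)
    id_table_at = 7 * 4                       # the header is seven 32-bit fields
    str_table_at = id_table_at + 8 * count    # each table row is (length, offset)
    id_pos = str_table_at + 8 * count         # msgid bytes start after both tables
    str_pos = id_pos + sum(len(k) + 1 for k in keys)

    id_table = str_table = id_blob = str_blob = b""
    for key in keys:
        value = entries[key]
        id_table += struct.pack("<II", len(key), id_pos)
        str_table += struct.pack("<II", len(value), str_pos)
        id_blob += key + b"\0"
        str_blob += value + b"\0"
        id_pos += len(key) + 1
        str_pos += len(value) + 1

    header = struct.pack("<7I", 0x950412DE, 0, count, id_table_at, str_table_at, 0, 0)
    return header + id_table + str_table + id_blob + str_blob


# The usual Polish plural rule: three forms, chosen by a C-style expression.
POLISH_PLURALS = (
    "nplurals=3; plural=n==1 ? 0 : "
    "n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;"
)

catalog = {
    # The empty msgid holds the catalog headers, including the plural rule.
    b"": (
        "Content-Type: text/plain; charset=UTF-8\n"
        "Plural-Forms: " + POLISH_PLURALS + "\n"
    ).encode("utf-8"),
    # Context and message are joined with \x04; plural forms are joined with \0.
    b"menu item\x04File": "Plik".encode("utf-8"),
    b"carpentry tool\x04File": "Pilnik".encode("utf-8"),
    b"{num} file\0{num} files": "{num} plik\0{num} pliki\0{num} plików".encode("utf-8"),
}

polish = gettext.GNUTranslations(io.BytesIO(build_mo(catalog)))

print(polish.pgettext("menu item", "File"))
print(polish.pgettext("carpentry tool", "File"))
for num in (1, 2, 5, 12, 22):
    print(polish.ngettext("{num} file", "{num} files", num).format(num=num))
```

The same English “File” comes back as two different Polish words depending on context, and 12 and 22 land on different plural forms even though both end in 2. Your calling code never had to know any of that.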

Python’s F-Strings: A Beautiful Trap

Here’s the part where I save you hours of frustration. You might think, “I’ll use f-strings! They’re so clean!”

# DON'T DO THIS. IT WILL BREAK EVERYTHING.
user = "Alice"
print(_(f"Hello, {user}!"))

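This is a catastrophe. Python evaluates the f-string before _() ever sees it, so at runtime gettext is asked to look up "Hello, Alice!", not "Hello, {user}!". That exact string will never be in the catalog unless the translation team creates a separate entry for every single user in the universe. The extraction tool doesn’t fare any better: it never runs your code, so there is no plain literal here for it to pick up reliably. Use formatting after the translation lookup.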

# DO THIS INSTEAD. IT'S THE WAY.
user = "Alice"
print(_("Hello, {name}!").format(name=user))

The translation team gets the clean, predictable string "Hello, {name}!" and can move the placeholder around in the translated version as their language’s grammar requires (¡{name}, hola!).
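If you want to see the difference rather than take my word for it, here is a small sketch. The recording_gettext function is a hypothetical stand-in for _() that simply records which lookup key it was handed; it is not part of the gettext module.

```python
seen_msgids = []


def recording_gettext(message):
    """Hypothetical stand-in for _(): records the lookup key, returns it unchanged."""
    seen_msgids.append(message)
    return message


_ = recording_gettext
user = "Alice"

bad = _(f"Hello, {user}!")                    # interpolated BEFORE the lookup
good = _("Hello, {name}!").format(name=user)  # interpolated AFTER the lookup

print(seen_msgids)
```

Both calls produce the same English output, which is exactly why the bug hides so well. Only the second one asks the catalog for a key a translator could ever have seen.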

Best Practices: Don’t Be a Monster

Your job isn’t just to tag strings; it’s to tag them in a way that doesn’t make translators want to hunt you down. Break up long paragraphs into complete sentences, but never go smaller than that. “File not found” is one message, not “File” plus “not found”: word order and agreement differ between languages, and fragments can’t be reassembled correctly. For the same reason, _("Confirm") + " " + _("delete") will be translated as two separate words, which might be nonsensical in another language. Always keep the full semantic phrase together.

And for the love of all that is holy, don’t create strings through code. Never do this:

# This is an i18n war crime. The extraction tools can't find these strings.
parts = ["Error", "code", str(code)]
message = " ".join(parts)

The extraction tool (xgettext, Babel’s pybabel, etc.) is a static parser. It can’t run your code. It will only find literal strings inside your _() calls. If a string isn’t there as a literal, it won’t be found, and it will never be translated. So write the whole message as a single literal with a placeholder, such as _("Error code {code}").format(code=code). Mark everything. Be generous. Assume that if a human can read it, it probably needs a translation wrapper.
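To get a feel for what “static” means here, this is a toy extractor built on Python’s ast module. It is a simplified illustration under my own assumptions, not how xgettext or Babel are actually implemented: extract_msgids and SOURCE are hypothetical names, and it only recognises a bare _() call whose first argument is a plain string literal.

```python
import ast

# Sample source to scan. It is parsed, never executed, so undefined names are fine.
SOURCE = '''
print(_("File not found"))
message = _("Error code {code}").format(code=code)
parts = ["Error", "code", str(code)]
built = " ".join(parts)
print(_(f"Hello, {user}!"))
'''


def extract_msgids(source):
    """Return string literals passed directly to _(), in order of appearance."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "_"
            and node.args
            and isinstance(node.args[0], ast.Constant)   # a plain literal...
            and isinstance(node.args[0].value, str)      # ...that is a string
        ):
            found.append((node.lineno, node.col_offset, node.args[0].value))
    return [msgid for _line, _col, msgid in sorted(found)]


print(extract_msgids(SOURCE))
```

The joined-up "Error code ..." string and the f-string never show up in the result, because neither is a plain literal sitting inside a _() call. Anything the scanner can’t see at rest, a translator will never see at all.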