8.1 String Literals: Single, Double, Triple, Raw, and Byte Prefixes
In Python, a string literal is a sequence of characters enclosed in quotes. The interpreter recognizes this syntax and creates a str object (or a bytes object when the literal carries a b prefix). The choice of quote type and prefix is not merely stylistic; it changes how the interpreter parses the enclosed characters, enabling use cases from multi-line text to precise byte-level data.
Single and Double Quotes
The most basic string literals are enclosed in either single ('...') or double ("...") quotes. The two forms are functionally identical; unlike some languages, Python assigns them no different meaning. Having both lets you include one kind of quote inside the string without escaping it.
single_quoted = 'This contains a "double" quote.'
double_quoted = "This contains a 'single' quote."
print(single_quoted) # Output: This contains a "double" quote.
print(double_quoted) # Output: This contains a 'single' quote.
To include the same type of quote inside the string, you must escape it using a backslash (\), which tells the Python interpreter to treat the next character literally rather than as a string terminator.
escaped_single = 'It\'s a beautiful day.'
escaped_double = "He said, \"Hello world!\""
print(escaped_single) # Output: It's a beautiful day.
print(escaped_double) # Output: He said, "Hello world!"
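As a small illustration of the point that quote style affects only how the source is written, the sketch below compares an escaped and an unescaped spelling of the same text and shows how repr() chooses its quotes. The variable names are illustrative only.

```python
# Two spellings of the same text: one escapes the apostrophe, one avoids the need to.
escaped = 'It\'s a beautiful day.'
unescaped = "It's a beautiful day."

# The resulting str objects are equal; the quote style is not stored in the value.
same_value = (escaped == unescaped)

# repr() picks a quote style that avoids escaping where it can.
repr_with_apostrophe = repr(unescaped)    # wrapped in double quotes
repr_with_double = repr('say "hi"')       # wrapped in single quotes

print(same_value)
print(repr_with_apostrophe)
print(repr_with_double)
```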
Triple-Quoted Strings
For strings spanning multiple lines or containing a significant number of both single and double quotes, triple quotes ('''...''' or """...""") are the ideal solution. The interpreter reads everything between the opening and closing triple quotes, preserving all newlines, tabs, and other whitespace characters.
multi_line_string = """This is the first line.
This is the second line, which includes a "quote" and an apostrophe: it's.
    This line is indented with four spaces."""
print(multi_line_string)
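Because everything between the triple quotes is preserved, indentation used to line the literal up with surrounding code also ends up in the string. The sketch below shows one common way to handle that, using a backslash after the opening quotes and textwrap.dedent from the standard library. The function name indented_help and its text are made up for this example.

```python
import textwrap

poem = """first
second
third"""

# The line breaks typed in the source are real newline characters in the value.
line_count = len(poem.splitlines())
newline_count = poem.count("\n")

def indented_help():
    # The backslash after the opening quotes suppresses the initial newline.
    # The leading spaces on each line are still part of the literal.
    text = """\
        usage: tool [options]
          -h  show help
    """
    # dedent() removes the whitespace common to all lines.
    return textwrap.dedent(text)

help_text = indented_help()

print(line_count, newline_count)
print(help_text)
```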
Triple-quoted strings are also the conventional way to write docstrings, which document modules, classes, functions, and methods. When a string literal is the first statement in such a definition, it becomes the __doc__ attribute of the object.
def calculate_area(width, height):
    """
    Calculate the area of a rectangle.
    Args:
        width (float): The width of the rectangle.
        height (float): The height of the rectangle.
    Returns:
        float: The calculated area (width * height).
    """
    return width * height
print(calculate_area.__doc__)
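The raw __doc__ value can carry the surrounding blank lines and, depending on the Python version, the source indentation as well (newer releases, 3.13 onward as far as I know, strip common indentation at compile time). A minimal sketch of reading a docstring in a version-independent way uses inspect.getdoc from the standard library, which cleans up indentation and surrounding blank lines. The function is a shortened copy of the one above.

```python
import inspect

def calculate_area(width, height):
    """
    Calculate the area of a rectangle.

    Returns:
        float: The calculated area (width * height).
    """
    return width * height

# Exact whitespace in raw_doc may vary between Python versions.
raw_doc = calculate_area.__doc__

# getdoc() strips leading/trailing blank lines and the common indentation.
clean_doc = inspect.getdoc(calculate_area)

print(clean_doc)
```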
Raw Strings
A raw string literal is prefixed with r or R (e.g., r'...'). In a raw string, backslashes (\) are treated as literal characters and not as escape characters. This is exceptionally valuable when working with regular expressions and Windows file paths, where backslashes are prevalent.
# Without a raw string, each backslash must be escaped, leading to "leaning toothpick syndrome"
regex_pattern = "\\section"
windows_path = "C:\\Users\\John\\file.txt"
# With a raw string, the backslashes are preserved exactly as written
raw_regex = r"\section"
raw_path = r"C:\Users\John\file.txt"
print(regex_pattern) # Output: \section
print(raw_regex) # Output: \section
print(windows_path) # Output: C:\Users\John\file.txt
print(raw_path) # Output: C:\Users\John\file.txt
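It is strongly recommended to use raw strings for regex patterns. In a regular string, an unrecognized escape such as \d triggers an "invalid escape sequence" warning (a DeprecationWarning since Python 3.6, a SyntaxWarning since Python 3.12), while a recognized one such as \b is silently converted (here, to a backspace character), producing an incorrect pattern.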
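The silent failure is easy to reproduce. The sketch below runs the same pattern text through re.findall once as a raw string and once as a regular string. The sample sentence is invented for the example.

```python
import re

text = "a word, a sword, and a wordsmith"

# Raw string: \b reaches the regex engine as a word-boundary assertion.
raw_matches = re.findall(r"\bword\b", text)

# Regular string: Python converts \b to a backspace character (U+0008) first,
# so the engine searches for backspace + "word" + backspace and finds nothing.
plain_matches = re.findall("\bword\b", text)

print(raw_matches)
print(plain_matches)
```

No warning is raised for the second call, because \b is a valid escape in a regular string. That is what makes this kind of mistake hard to spot.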
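A common pitfall is assuming a raw string can end with a single backslash. Even in a raw string, a backslash immediately before a quote prevents that quote from terminating the literal (and the backslash stays in the result), so r"\" is not a valid string literal; more generally, a raw string cannot end in an odd number of backslashes. Note that r"\\" is a string of two backslashes, not one. To get a single trailing backslash, write that character in a regular string, for example r"C:\Users\John" + "\\".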
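The following sketch checks how raw strings treat backslashes, including the trailing-backslash case. The compile() call is used here only as a convenient way to test whether a piece of source text is valid. The path is the illustrative one used earlier.

```python
# In a raw string, a backslash and the character after it are both kept.
newline_escape = "\n"       # one character: a line feed
raw_newline = r"\n"         # two characters: backslash and 'n'

# r"\\" is two backslashes, not one.
two_backslashes = r"\\"

# A backslash before a quote keeps the quote from closing the literal,
# and the backslash itself stays in the value.
backslash_quote = r"\""     # two characters: backslash and double quote

# One way to get a single trailing backslash: append it from a regular string.
folder = r"C:\Users\John" + "\\"

# The regular string below spells the source text  x = r"\"  which Python rejects.
try:
    compile('x = r"\\"', "<example>", "exec")
    single_trailing_is_valid = True
except SyntaxError:
    single_trailing_is_valid = False

print(len(newline_escape), len(raw_newline), len(two_backslashes))
print(backslash_quote)
print(folder)
print(single_trailing_is_valid)
```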
Byte String Literals
Prefixed with b or B (e.g., b'...'), byte string literals create instances of bytes instead of str. They represent sequences of bytes and are used for handling binary data from files, networks, or devices. Only ASCII characters may appear literally; any byte value of 128 or above must be written with an escape sequence such as \xe9.
# Creating a bytes object
byte_data = b'Hello'
print(byte_data) # Output: b'Hello'
print(type(byte_data)) # Output: <class 'bytes'>
# A bytes object with an escape sequence for a hex value
hex_byte = b'\x48\x65\x6c\x6c\x6f' # Represents 'H' 'e' 'l' 'l' 'o' in hex
print(hex_byte) # Output: b'Hello'
# Trying to include a non-ASCII character will raise a SyntaxError
# invalid_byte = b'café' # This line would cause an error
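To make the "sequence of bytes" idea concrete, here is a short sketch of how a bytes object behaves when indexed, sliced, and converted to and from hexadecimal text. All names are illustrative.

```python
data = b'Hello'

# Indexing yields an int; slicing yields another bytes object.
first_byte = data[0]
first_slice = data[:1]

# Iterating over bytes produces the integer value of each byte.
values = list(b'Hi')

# Bytes outside the ASCII range are written with \x escapes.
high_byte = b'\xe9'

# hex() and bytes.fromhex() convert between bytes and hexadecimal text.
hex_text = data.hex()
round_trip = bytes.fromhex(hex_text)

print(first_byte, first_slice, values)
print(high_byte[0])
print(hex_text, round_trip)
```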
The key distinction is that a str is a sequence of Unicode code points (text), while a bytes object is a sequence of integers between 0 and 255 (binary data). Converting between them requires explicit encoding and decoding, typically using UTF-8.
text_str = 'café' # Unicode string
byte_data = text_str.encode('utf-8') # Encode text to bytes
print(byte_data) # Output: b'caf\xc3\xa9'
reconstructed_str = byte_data.decode('utf-8') # Decode bytes back to text
print(reconstructed_str) # Output: café
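The encoding matters on both sides of the conversion. As a hedged illustration, the sketch below encodes the same text with UTF-8 and Latin-1, compares the lengths, and shows what happens when bytes are decoded with the wrong codec. Latin-1 is chosen here only as a convenient contrasting encoding.

```python
text = 'café'

utf8_bytes = text.encode('utf-8')       # 'é' becomes two bytes
latin1_bytes = text.encode('latin-1')   # 'é' becomes one byte

# Character count and byte count differ once non-ASCII text is involved.
lengths = (len(text), len(utf8_bytes), len(latin1_bytes))

# Decoding with the wrong codec fails loudly by default...
try:
    latin1_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# ...or substitutes U+FFFD when errors='replace' is requested.
replaced = latin1_bytes.decode('utf-8', errors='replace')

print(lengths)
print(decode_failed)
print(replaced)
```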