8.9 Regular-Expression-Adjacent Methods: startswith, endswith, isdigit
The startswith and endswith Methods
These methods are essential for checking the prefix or suffix of a string, a common task in data validation, file path manipulation, and parsing. They offer a more readable and efficient alternative to slicing the string and comparing the result.
The str.startswith(prefix, start, end) method returns True if the string starts with the specified prefix, otherwise it returns False. The optional start and end parameters allow you to define a slice of the string to check against. Similarly, str.endswith(suffix, start, end) checks for a matching suffix.
Both methods also accept a tuple of prefixes or suffixes. The call returns True if any element of the tuple matches, which is more readable than chaining several calls with or and needs only a single method call.
file_path = "report.pdf"
# Common but less efficient way
if file_path.endswith('.pdf') or file_path.endswith('.docx'):
    print("File is a document")
# Preferred way: using a tuple
if file_path.endswith(('.pdf', '.docx', '.txt')):
    print("File is a document")
# Using with start and end parameters
url = "https://www.example.com"
if url.startswith("https", 0, 5):  # Checks the slice url[0:5]
    print("The protocol is secure (HTTPS)")
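To make the comparison with slicing concrete, here is a minimal sketch. The file name and the variable names are illustrative only. It also shows one detail worth knowing: the collection of candidates has to be a tuple, and passing a list raises a TypeError.

```python
filename = "report_2024.pdf"  # hypothetical example value

# Slicing works, but the slice length must be kept in sync with the prefix.
prefix = "report_"
by_slice = filename[:len(prefix)] == prefix
by_method = filename.startswith(prefix)
print(by_slice, by_method)

# The candidates must be given as a tuple; a list is rejected.
extensions = [".pdf", ".docx"]
list_rejected = False
try:
    filename.endswith(extensions)
except TypeError as exc:
    list_rejected = True
    print("TypeError:", exc)

# Converting the list to a tuple makes the same check work.
is_document = filename.endswith(tuple(extensions))
print(is_document)
```

If the candidates arrive as a list (for example, from a configuration file), wrapping them in tuple() at the call site is usually enough.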
The isdigit Method
The str.isdigit() method belongs to the “is…” family of string methods, which are used for character classification. It returns True if the string is not empty and every character in it is a digit. For ASCII input that means 0-9, but the definition also covers some other Unicode digit characters, so a True result is a useful first check rather than a guarantee that int() will accept the string.
It is important to understand what isdigit() does and does not recognize:
- Negative signs (-) and decimal points (.) are not considered digits. Therefore, it will return False for strings representing negative numbers or floats.
- It recognizes superscript digits (e.g., '²' (U+00B2)) as digit characters, which int() cannot convert, leading to a ValueError.
- It does not recognize other numeric characters like fractions or Roman numerals.
# Valid integer strings
print("12345".isdigit()) # Output: True
print(int("12345")) # Output: 12345
# Invalid for integer conversion
print("-123".isdigit()) # Output: False (contains '-')
print("12.34".isdigit()) # Output: False (contains '.')
print("12a34".isdigit()) # Output: False (contains 'a')
print("".isdigit()) # Output: False (empty string)
# Edge Case: Unicode digits
unicode_superscript = "²" # Unicode for 'squared'
print(unicode_superscript.isdigit()) # Output: True
print(int(unicode_superscript)) # Raises ValueError: invalid literal for int()
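The distinctions above can be seen side by side by comparing isdigit() with its two siblings, str.isdecimal() and str.isnumeric(). The sketch below is illustrative only: the sample characters and the helper name classify are chosen for this example, not taken from the text above.

```python
# Sample characters, written as escapes so the code points are unambiguous.
samples = {
    "ASCII digit": "5",
    "Arabic-Indic digit three": "\u0663",
    "superscript two": "\u00b2",
    "fraction one half": "\u00bd",
    "Roman numeral eight": "\u2167",
}

def classify(ch):
    """Return (isdecimal, isdigit, isnumeric) for a string (hypothetical helper)."""
    return (ch.isdecimal(), ch.isdigit(), ch.isnumeric())

results = {label: classify(ch) for label, ch in samples.items()}
for label, (dec, dig, num) in results.items():
    print(f"{label:26} isdecimal={dec!s:5} isdigit={dig!s:5} isnumeric={num}")

# int() accepts decimal digits from other scripts, but not superscripts.
arabic_value = int("\u0663")
print(arabic_value)
```

In this sketch, isdecimal() is the strictest of the three and the one that lines up most closely with what int() accepts for unsigned input. isnumeric() is the broadest and also accepts fractions and Roman numerals.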
Common Pitfalls and Best Practices
A frequent pitfall is using isdigit() to validate numeric input without considering the full scope of what “numeric” means. For converting user input to an integer, a robust approach is to use a try/except block. This gracefully handles negatives, extraneous whitespace, and invalid characters.
user_input = input("Enter an integer: ").strip()
# Pitfall: isdigit() rejects valid input such as "-5" or "+5"
if user_input.isdigit():
    number = int(user_input)
    print(f"You entered the integer: {number}")
else:
    print("That was not a non-negative integer.")
# Best Practice: Defensive conversion with try/except
try:
    number = int(user_input)
    print(f"You entered the integer: {number}")
except ValueError:
    print("Please enter a valid integer.")
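The same try/except pattern can be wrapped in a small function so that it is reusable and can be exercised without interactive input. This is a minimal sketch; the name parse_int and the choice to return None on failure are assumptions made for illustration.

```python
from typing import Optional

def parse_int(text: str) -> Optional[int]:
    """Return the integer value of text, or None if int() rejects it."""
    try:
        return int(text)
    except ValueError:
        return None

# A few inputs that isdigit() alone would misjudge in one direction or the other.
for raw in ["42", "-7", "  15  ", "+3", "12.5", "\u00b2", ""]:
    print(repr(raw), "->", parse_int(raw))
```

Returning None keeps the caller in control of the error message. Raising a custom exception would be an equally reasonable design.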
For startswith() and endswith(), prefer passing a tuple of possibilities rather than chaining calls with logical or. This is more Pythonic and readable, and it replaces several method calls with one.
When using the start and end parameters, remember that they follow the same rules as slice notation: start is inclusive, end is exclusive, and negative values count from the end of the string. The check is performed as if on the slice string[start:end].
text = "The year is 2024"
# Check if the substring "year" starts with 'y'
# The slice text[4:8] is "year"
if text.startswith('y', 4, 8):
    print("Found it!")  # This will be printed
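A few more cases may help to confirm how the bounds behave. The sketch below reuses the same sample string; the variable names are illustrative.

```python
text = "The year is 2024"

# Positive bounds: same result as checking the slice text[4:8] ("year").
same_as_slice = text.startswith("ye", 4, 8) == text[4:8].startswith("ye")
print(same_as_slice)

# A negative start counts from the end, as in slicing: text[-4:] is "2024".
starts_with_century = text.startswith("20", -4)
print(starts_with_century)

# The end bound is exclusive, so a prefix that runs past it cannot match.
# Here the window is text[4:7], which is "yea".
fits_in_window = text.startswith("year", 4, 7)
print(fits_in_window)
```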
In summary, startswith and endswith are your primary tools for precise prefix/suffix checks, especially with tuple arguments. Use isdigit for a quick check that a string contains only digit characters, but rely on try/except around int() for robust integer conversion from user input.