61.6 Querying with SQLAlchemy ORM: filter, join, order_by, limit
Right, so you’ve got some data in there. Congratulations. Now let’s actually do something with it. The SQLAlchemy ORM is where this goes from being a neat trick to being your superpower. It lets you query your database using Python objects and methods, which is infinitely more pleasant than string-stitching raw SQL. But with great power comes the great responsibility of not writing horribly inefficient queries. Let’s get into it.
The Workhorse: filter() is Your New Best Friend
The session.query(MyModel) is your starting pistol, but it just gets everything. You use filter() to, well, filter the results. This is where the ORM shines, because it’s not just doing WHERE clauses; it’s abstracting them into Pythonic expressions.
The most common way is to use the standard column operators. Forget ==; think in terms of ilike(), in_(), and and_().
from my_models import User, Session
from sqlalchemy import or_
session = Session()
# Find all users named 'Alice' (case-sensitive)
alices = session.query(User).filter(User.name == 'Alice').all()
# Find users whose name starts with 'A' (case-insensitive, which is often what you want)
a_people = session.query(User).filter(User.name.ilike('a%')).all()
# Find users in a specific set of ids (avoids the dreaded IN injection risk)
users = session.query(User).filter(User.id.in_([5, 12, 42])).all()
# The classic: multiple conditions. This is an AND by default.
active_admins = session.query(User).filter(
User.is_admin == True,
User.is_active == True
).all()
# But sometimes you need an OR. Note the usage of or_().
important_users = session.query(User).filter(
or_(
User.is_admin == True,
User.email.ilike('%@ceo.com')
)
).all()
Why this is brilliant: SQLAlchemy builds a parameterized query under the hood. That ilike('a%') doesn’t get passed as a raw string to the database; it’s converted to a parameter, making your code immune to SQL injection. You’re writing Python, not SQL, and it’s secure by default.
Joining Tables Without Losing Your Mind
This is the part where people usually give up and write raw SQL. Don’t. The ORM makes it simple, if you set up your relationships correctly first (which we covered last section, you did that, right?).
The key is that you often don’t need an explicit join() if you have a relationship defined. You can just filter directly on the related table’s columns.
from my_models import User, Post, Session
session = Session()
# Find all posts written by an admin user.
# This does an implicit INNER JOIN because of the relationship.
admin_posts = session.query(Post).join(Post.author).filter(User.is_admin == True).all()
# But you can be more explicit if you want. This is identical to the above.
admin_posts_explicit = session.query(Post).join(User, Post.author_id == User.id).filter(User.is_admin == True).all()
# Here's a classic: Find users who have never written a post.
# This uses an OUTER JOIN and filters for NULL.
lurkers = session.query(User).outerjoin(Post).filter(Post.id == None).all()
The Pitfall: The N+1 Query Problem. Imagine you get your admin_posts list, and then you loop through them and print post.author.name for each one. You just made 1 query to get the posts, and then one new query for each post to get the author. For 100 posts, that’s 101 queries. Yikes.
The Solution: Use joinedload() to eagerly load the relationship in the first query.
from sqlalchemy.orm import joinedload
# One query with a JOIN that gets all posts and their authors in one go.
admin_posts = session.query(Post).options(joinedload(Post.author)).join(Post.author).filter(User.is_admin == True).all()
for post in admin_posts:
print(post.author.name) # No additional query happens here!
Ordering, Limiting, and Other Niceties
This is the straightforward stuff, but you still need to know it.
# Get the 5 most recent active users
recent_users = session.query(User).filter(
User.is_active == True
).order_by(
User.created_at.desc()
).limit(5).all()
# Get the first user, alphabetically by name. Note .first() instead of .all()
first_user = session.query(User).order_by(User.name.asc()).first()
# Pagination? Use .offset()
page_3_users = session.query(User).order_by(User.id).limit(10).offset(20).all() # Gets results 21-30
A Critical Distinction: .first() is useful, but understand what it does. It adds LIMIT 1 to the query and returns the instance. .one() is more strict: it expects exactly one result and will throw a NoResultFound exception if there are none, or a MultipleResultsFound exception if there’s more than one. Use .one() when a single, specific result is mandatory.
The Moment of Truth: When to Bail on the ORM
Look, the ORM is fantastic for probably 95% of your queries. But sometimes, for reporting, or complex analytics, you just need to run raw SQL. And that’s okay. SQLAlchemy doesn’t judge you. You can always drop down to text.
# A complex reporting query that's just easier in SQL
complex_report_sql = """
SELECT u.name, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON p.author_id = u.id
WHERE u.created_at > :some_date
GROUP BY u.id
HAVING COUNT(p.id) > 5
ORDER BY post_count DESC;
"""
result = session.execute(text(complex_report_sql), {'some_date': '2023-01-01'})
for row in result:
print(f"{row.name}: {row.post_count}")
The point is, you have the entire toolbox. Start with the clean, safe, Pythonic ORM queries. When you hit a wall, you’ve got a perfectly good escape hatch. Now go query something.