MongoDB Aggregation Pipelines: The Power You Might Not Be Using
Most developers use MongoDB as a simple document store: insert documents, query by ID or simple filters, maybe add an index or two. But MongoDB’s aggregation framework is where the real power lives. If you’re not using aggregation pipelines, you’re leaving performance and capability on the table.
What Are Aggregation Pipelines?
An aggregation pipeline is a sequence of stages that process documents. Each stage transforms the documents as they pass through. Think of it like Unix pipes for your data—each stage takes input, does something with it, and passes the result to the next stage.
db.orders.aggregate([
  { $match: { status: "completed" } },
  { $group: { _id: "$customerId", totalSpent: { $sum: "$amount" } } },
  { $sort: { totalSpent: -1 } },
  { $limit: 10 }
])
This pipeline finds your top 10 customers by spending in four simple stages. Try doing that efficiently with basic queries.
Why Aggregation Beats Application-Side Processing
1. Data Stays in the Database
Moving data from MongoDB to your application, processing it, and potentially writing it back is expensive. Network latency, serialization overhead, and memory usage all add up. Aggregation pipelines process data where it lives.
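For contrast, here is a rough sketch of what the application-side version of the top-customers query could look like. The function name and the sample orders are made up for illustration; in a real application the `orders` array would be every order document pulled across the network first.

```javascript
// Application-side equivalent of the top-customers pipeline (illustrative only).
// `orders` stands in for documents already fetched from the database.
function topCustomersInApp(orders, limit = 10) {
  const totals = new Map();
  for (const order of orders) {
    if (order.status !== "completed") continue;           // what $match does
    const soFar = totals.get(order.customerId) ?? 0;
    totals.set(order.customerId, soFar + order.amount);    // what $group/$sum does
  }
  return [...totals]
    .map(([customerId, totalSpent]) => ({ _id: customerId, totalSpent }))
    .sort((a, b) => b.totalSpent - a.totalSpent)           // what $sort does
    .slice(0, limit);                                      // what $limit does
}

// Hypothetical sample data.
const sampleOrders = [
  { customerId: "a", status: "completed", amount: 40 },
  { customerId: "b", status: "completed", amount: 75 },
  { customerId: "a", status: "completed", amount: 60 },
  { customerId: "c", status: "pending", amount: 500 }
];
const top = topCustomersInApp(sampleOrders, 2);
console.log(top);
```

The logic is short, but every order has to be serialized, shipped, and held in application memory before the first total is computed. The pipeline version returns only the ten result documents.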
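2. Indexes Work in the Early Stages
MongoDB can use indexes in aggregation pipelines, mainly in early stages such as $match and $sort that run before the documents are reshaped. Once a stage like $group has transformed the documents, later stages generally work on intermediate results and cannot use the collection's indexes. A well-designed pipeline with proper indexes can be remarkably fast.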
3. Memory-Efficient Processing
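For large datasets, memory-hungry stages such as $sort and $group can spill to temporary files on disk instead of failing. Depending on your server version and configuration, you may need to opt in with the allowDiskUse option. Your application server’s memory is finite and expensive. Let the database handle the heavy lifting.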
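As a minimal sketch, this is one way to pass the `allowDiskUse` option when running a pipeline. `aggregateWithDiskUse` and `fakeOrders` are hypothetical names; the stand-in collection exists only so the snippet runs without a server. In mongosh you would pass `db.orders` instead.

```javascript
// Sketch: opt in to disk spilling for a memory-hungry pipeline.
// `collection` is any object exposing aggregate(pipeline, options),
// for example db.orders in mongosh.
function aggregateWithDiskUse(collection, pipeline) {
  return collection.aggregate(pipeline, { allowDiskUse: true });
}

// Stand-in collection (an assumption for this example) that records its calls.
const calls = [];
const fakeOrders = {
  aggregate(pipeline, options) {
    calls.push({ pipeline, options });
    return [];
  }
};

aggregateWithDiskUse(fakeOrders, [{ $sort: { amount: -1 } }]);
console.log(calls[0].options);
```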
Stages Every Developer Should Know
$match: Filter Early, Filter Often
Always put $match stages as early as possible. This reduces the documents flowing through subsequent stages.
// Good: filter first
{ $match: { createdAt: { $gte: lastMonth } } },
{ $group: { ... } }
// Bad: group everything, then filter
// (unavoidable only when the filter needs a computed field such as total)
{ $group: { ... } },
{ $match: { total: { $gt: 1000 } } }
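The two filters are not mutually exclusive. A sketch that combines them, assuming "last month" means the previous 30 days and that orders carry `customerId`, `amount`, and `createdAt` fields (the function name is hypothetical):

```javascript
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// Builds a pipeline that filters on a stored field before grouping and on
// the computed total afterwards.
function bigRecentSpenders(now = new Date()) {
  const lastMonth = new Date(now.getTime() - THIRTY_DAYS_MS);
  return [
    { $match: { createdAt: { $gte: lastMonth } } },                  // early: stored field
    { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
    { $match: { total: { $gt: 1000 } } }                             // late: computed field
  ];
}

const pipeline = bigRecentSpenders(new Date("2024-03-31T00:00:00Z"));
console.log(pipeline[0].$match.createdAt.$gte.toISOString());
```

The first $match shrinks the input to the $group; the second one can only run after `total` exists.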
$lookup: Join Documents Across Collections
Yes, MongoDB can do joins. The $lookup stage performs a left outer join to another collection.
db.orders.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
    }
  },
  { $unwind: "$customer" }
])
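This fetches the customer document for each order. Note that $unwind drops any order whose customer array is empty, so the result behaves like an inner join. Use $lookup judiciously: it is not as optimized as a relational join, but it is there when you need it.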
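If orders without a matching customer should survive, the document form of $unwind with `preserveNullAndEmptyArrays` keeps them. A small sketch, using the same collection and field names as the example above (the constant name is just for illustration):

```javascript
// Same join as above, but orders with no matching customer are kept,
// preserving the left-outer-join behaviour of $lookup.
const ordersWithOptionalCustomer = [
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
    }
  },
  { $unwind: { path: "$customer", preserveNullAndEmptyArrays: true } }
];

console.log(JSON.stringify(ordersWithOptionalCustomer[1]));
```

In mongosh this would be run as `db.orders.aggregate(ordersWithOptionalCustomer)`; unmatched orders then come back without a `customer` field.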
$facet: Multiple Pipelines in One Query
Need to run several aggregations on the same data? $facet runs multiple sub-pipelines over the same input documents within a single stage and returns all of their results in one output document.
db.products.aggregate([
  {
    $facet: {
      byCategory: [
        { $group: { _id: "$category", count: { $sum: 1 } } }
      ],
      priceStats: [
        { $group: {
          _id: null,
          avg: { $avg: "$price" },
          min: { $min: "$price" },
          max: { $max: "$price" }
        }}
      ],
      topRated: [
        { $sort: { rating: -1 } },
        { $limit: 5 }
      ]
    }
  }
])
One query, three different analyses. Perfect for dashboard data.
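The result of a $facet stage is a single document with one array field per sub-pipeline. The sketch below shows one way an application might unpack it; the sample result, the product `name` field, and the `toDashboard` helper are invented for illustration.

```javascript
// Hypothetical result of the $facet pipeline above: one document whose
// fields hold the output array of each sub-pipeline.
const facetResult = [
  {
    byCategory: [{ _id: "books", count: 2 }, { _id: "games", count: 1 }],
    priceStats: [{ _id: null, avg: 20, min: 10, max: 30 }],
    topRated: [{ name: "Product A", rating: 4.9 }]
  }
];

// Reshape the single facet document into something a dashboard can render.
function toDashboard([doc]) {
  const stats = doc.priceStats[0] ?? null; // array is empty if no products exist
  return {
    categories: Object.fromEntries(doc.byCategory.map(c => [c._id, c.count])),
    price: stats && { avg: stats.avg, min: stats.min, max: stats.max },
    topRatedNames: doc.topRated.map(p => p.name)
  };
}

const dashboard = toDashboard(facetResult);
console.log(dashboard);
```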
$bucket: Automatic Histogram Creation
Group documents into buckets based on a field value. Great for analytics and reporting.
db.orders.aggregate([
  {
    $bucket: {
      groupBy: "$amount",
      boundaries: [0, 50, 100, 250, 500, 1000, Infinity],
      default: "Other",
      output: {
        count: { $sum: 1 },
        avgAmount: { $avg: "$amount" }
      }
    }
  }
])
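This sorts orders into the amount ranges you define, with no binning logic in application code. Each bucket's _id is its lower boundary, and anything that fits no range lands in the default bucket. If you would rather have MongoDB choose the boundaries for you, look at $bucketAuto.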
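To make the boundary rules concrete, here is a small in-memory sketch of how a value is assigned to a bucket: lower bound inclusive, upper bound exclusive, and the bucket is identified by its lower bound. `bucketIdFor` is a hypothetical helper that mimics the behaviour for numeric boundaries; it is not part of MongoDB.

```javascript
// Returns the _id of the bucket a value would fall into, or `fallback`
// when the value matches no range (the role of `default` in $bucket).
function bucketIdFor(value, boundaries, fallback) {
  if (typeof value !== "number" || Number.isNaN(value)) return fallback;
  for (let i = 0; i < boundaries.length - 1; i++) {
    // Lower bound inclusive, upper bound exclusive.
    if (value >= boundaries[i] && value < boundaries[i + 1]) return boundaries[i];
  }
  return fallback;
}

const boundaries = [0, 50, 100, 250, 500, 1000, Infinity];
console.log(bucketIdFor(49.99, boundaries, "Other")); // 0
console.log(bucketIdFor(50, boundaries, "Other"));    // 50
console.log(bucketIdFor(5000, boundaries, "Other"));  // 1000
console.log(bucketIdFor(-5, boundaries, "Other"));    // Other
```

With these boundaries, only negative or non-numeric amounts end up in "Other", since the last range is open-ended.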
Advanced Patterns
Rolling Averages with $setWindowFields
MongoDB 5.0 introduced window functions. Calculate running totals, moving averages, and rankings directly in your queries.
db.sales.aggregate([
  {
    $setWindowFields: {
      partitionBy: "$region",
      sortBy: { date: 1 },
      output: {
        movingAvg: {
          $avg: "$amount",
          window: { documents: [-6, 0] }
        },
        runningTotal: {
          $sum: "$amount",
          window: { documents: ["unbounded", "current"] }
        }
      }
    }
  }
])
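A moving average over the current and six preceding documents, plus a running total, both partitioned by region. The documents window counts rows, not days, so this is a seven-day average only if each region has exactly one document per day. This used to require application code or a separate analytics database.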
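When a region can have several sales per day, or days with none, a range window measured in time units is usually closer to what "seven-day average" means. A sketch, assuming `date` is stored as a BSON date (the constant name is illustrative):

```javascript
// Time-based window: every document whose date falls within the six days
// before the current document's date, up to and including that date.
const sevenDayAveragePipeline = [
  {
    $setWindowFields: {
      partitionBy: "$region",
      sortBy: { date: 1 }, // range windows need a single ascending sort field
      output: {
        movingAvg: {
          $avg: "$amount",
          window: { range: [-6, 0], unit: "day" }
        }
      }
    }
  }
];

console.log(JSON.stringify(sevenDayAveragePipeline[0].$setWindowFields.output.movingAvg.window));
```

In mongosh this would be run as `db.sales.aggregate(sevenDayAveragePipeline)`.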
Recursive Lookups with $graphLookup
Navigate hierarchical or graph data structures within MongoDB.
db.employees.aggregate([
  { $match: { name: "CEO" } },
  {
    $graphLookup: {
      from: "employees",
      startWith: "$_id",
      connectFromField: "_id",
      connectToField: "reportsTo",
      as: "allReports",
      maxDepth: 10
    }
  }
])
This finds everyone below the CEO in the reporting chain, recursively, up to the maxDepth limit. Organizational charts, category trees, and social graphs are all queryable this way.
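The traversal is easier to reason about with a tiny in-memory model. The sketch below is not how MongoDB implements $graphLookup; it only mirrors the idea for the fields used above, with depth 0 meaning direct reports. The `allReports` function and the staff data are hypothetical.

```javascript
// Breadth-first walk down the reporting tree, one level per iteration.
function allReports(employees, rootId, maxDepth = Infinity) {
  const found = [];
  const seen = new Set();
  let frontier = [rootId];
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const e of employees) {
      if (frontier.includes(e.reportsTo) && !seen.has(e._id)) {
        seen.add(e._id);
        found.push({ ...e, depth });
        next.push(e._id);
      }
    }
    frontier = next; // the people found at this level seed the next one
  }
  return found;
}

// Hypothetical org chart: 1 <- 2 <- 3 <- 4
const staff = [
  { _id: 1, name: "CEO", reportsTo: null },
  { _id: 2, name: "VP", reportsTo: 1 },
  { _id: 3, name: "Lead", reportsTo: 2 },
  { _id: 4, name: "Engineer", reportsTo: 3 }
];

const everyone = allReports(staff, 1);
const directOnly = allReports(staff, 1, 0);
console.log(everyone.map(e => e.name), directOnly.map(e => e.name));
```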
Performance Tips
1. Explain Your Pipelines
Use explain() to understand how MongoDB executes your pipeline:
db.orders.explain("executionStats").aggregate([...])
Look for COLLSCAN (bad) vs IXSCAN (good) in the early stages.
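Explain output is deeply nested and its exact shape differs between server versions, so a shape-agnostic walk can be handy. This is a sketch only: `collectStages` is a hypothetical helper, and `sampleExplain` is a simplified, made-up fragment rather than real server output.

```javascript
// Recursively collects every string-valued `stage` property in an object tree.
function collectStages(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach(child => collectStages(child, found));
  } else if (node !== null && typeof node === "object") {
    if (typeof node.stage === "string") found.push(node.stage);
    Object.values(node).forEach(child => collectStages(child, found));
  }
  return found;
}

// Simplified, hypothetical explain fragment.
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "status_1" }
    }
  }
};

const stages = collectStages(sampleExplain);
console.log(stages, stages.includes("COLLSCAN") ? "collection scan" : "no collection scan");
```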
2. Project Early
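If you only need specific fields, use $project early to reduce document size through the pipeline. MongoDB's optimizer can often work out which fields later stages need and trim the rest on its own, so check the explain output before assuming a manual $project helps.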
{ $project: { customerId: 1, amount: 1, date: 1 } }
3. Use $merge for Materialized Views
For expensive aggregations that don’t need real-time data, write results to a collection:
db.orders.aggregate([
  // ... complex pipeline ...
  { $merge: { into: "dailySummary", whenMatched: "replace" } }
])
Run this on a schedule, then query the summary collection for fast reads.
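As a fuller sketch, a daily summary pipeline might look like the following. The grouping key, the `orders` and `revenue` field names, and the assumption that orders carry `status`, `amount`, and `createdAt` are all illustrative; only the `dailySummary` target comes from the example above.

```javascript
// One summary document per calendar day, upserted into dailySummary.
const dailySummaryPipeline = [
  { $match: { status: "completed" } },
  {
    $group: {
      _id: { $dateToString: { format: "%Y-%m-%d", date: "$createdAt" } },
      orders: { $sum: 1 },
      revenue: { $sum: "$amount" }
    }
  },
  // $merge has to be the last stage of the pipeline.
  {
    $merge: {
      into: "dailySummary",
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert"
    }
  }
];

console.log(Object.keys(dailySummaryPipeline[dailySummaryPipeline.length - 1]));
```

Because the day string is the `_id`, rerunning the job replaces existing days rather than duplicating them.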
When Not to Use Aggregation
Aggregation pipelines aren’t always the answer:
- Simple queries: Don’t overcomplicate basic find operations
- Real-time user-facing queries: Complex pipelines can have unpredictable latency
- Transactions: Aggregations can run inside multi-document transactions, but some stages, including $out and $merge, are not allowed there
- When you need the full document: If you’re just filtering and need complete documents, find() is simpler
Conclusion
MongoDB’s aggregation framework transforms it from a simple document store into a powerful data processing engine. The learning curve is worth it—you’ll write less application code, reduce data transfer, and often see significant performance improvements.
Start small. Take one piece of data processing logic from your application and move it to an aggregation pipeline. Measure the difference. Then do it again.
Further Reading
- MongoDB Aggregation Pipeline Documentation — official reference
- Aggregation Pipeline Operators — complete operator list
- MongoDB University — free courses including aggregation
- MongoDB Compass — GUI tool with aggregation pipeline builder
- Practical MongoDB Aggregations — free online book