What is MongoDB and how does it differ from a relational database like PostgreSQL?
MongoDB is a document-oriented NoSQL database that stores data as flexible JSON-like documents in collections. Unlike PostgreSQL (which uses tables with fixed columns and rows), MongoDB documents in the same collection can have different fields. MongoDB doesn't require schema definition before inserting data and doesn't support JOINs natively — instead, related data is either embedded in the same document or referenced by ObjectId. MongoDB is better for variable-schema data and hierarchical data read as a unit; PostgreSQL is better for complex relationships, ACID transactions across many tables, and financial data.
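For instance, two documents in the same collection can have entirely different fields (a minimal mongosh sketch; the products collection and its fields are made up):

```javascript
// mongosh: no schema migration needed to mix shapes in one collection
db.products.insertMany([
  { name: 'OnePlus 12', category: 'smartphones', variants: [{ color: 'Black', stock: 5 }] },
  { name: 'Clean Code', category: 'books', author: 'Robert C. Martin', pages: 464 },
]);
```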
What is a MongoDB document and what is BSON?
A MongoDB document is a JSON-like data structure with key-value pairs, where values can be strings, numbers, booleans, arrays, nested objects, or special BSON types. BSON (Binary JSON) is the binary serialization format MongoDB uses internally — it extends JSON with additional types: ObjectId (12-byte unique ID), Date (UTC datetime), Decimal128 (high-precision decimals for financial data), Binary (for file data), and Regular Expression. BSON is more space-efficient than JSON and supports types that JSON doesn't have natively.
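A quick sketch using the BSON types exposed by the Node.js driver (field names are illustrative):

```javascript
const { ObjectId, Decimal128 } = require('mongodb');

const order = {
  _id: new ObjectId(),                     // 12-byte unique identifier
  placedAt: new Date(),                    // stored as a BSON Date (UTC datetime)
  total: Decimal128.fromString('499.99'),  // exact decimal, safe for money
  invoicePdf: Buffer.from('PDF bytes...'), // Node Buffers are stored as BSON Binary
};
```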
What is Mongoose and why do developers use it with MongoDB?
Mongoose is an ODM (Object Document Mapper) for Node.js that provides schema definition, validation, type casting, middleware (pre/post hooks), virtuals, and query helper methods on top of the MongoDB Node.js driver. Developers use Mongoose because MongoDB itself doesn't enforce schema — Mongoose adds the structure that makes large Node.js codebases maintainable: required fields are checked before saving, enum values are validated, related documents can be populated automatically, and passwords can be hashed in pre-save hooks. It also provides a cleaner Promise-based API than raw driver operations.
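A minimal sketch of what Mongoose adds, assuming a hypothetical User model and bcryptjs for hashing:

```javascript
const mongoose = require('mongoose');
const bcrypt = require('bcryptjs'); // illustrative choice of hashing library

const userSchema = new mongoose.Schema({
  email:    { type: String, required: true, unique: true, lowercase: true },
  role:     { type: String, enum: ['customer', 'admin'], default: 'customer' },
  password: { type: String, required: true, minlength: 8 },
});

// pre-save hook: hash the password before it ever reaches the database
userSchema.pre('save', async function () {
  if (this.isModified('password')) {
    this.password = await bcrypt.hash(this.password, 10);
  }
});

module.exports = mongoose.model('User', userSchema);
```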
What is an ObjectId in MongoDB?
An ObjectId is MongoDB's default primary key type — a 12-byte BSON value that is globally unique. It encodes: 4 bytes of timestamp (seconds since epoch), a 5-byte random value (generated once per process, replacing the older machine-id and process-id fields), and 3 bytes of incrementing counter. This structure means ObjectIds are roughly sortable by insertion time (the first 4 bytes are the timestamp). In Mongoose, every document automatically gets an _id field of type ObjectId unless you specify otherwise. When filtering by ID in Mongoose, use findById(id) or findOne({ _id: id }) — Mongoose automatically handles the string-to-ObjectId conversion.
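For example (Mongoose; the Product model is hypothetical and the snippet assumes it runs inside an async function):

```javascript
const mongoose = require('mongoose');

const id = new mongoose.Types.ObjectId();
console.log(id.getTimestamp());             // creation time decoded from the first 4 bytes

const product = await Product.findById(id); // Mongoose casts 24-char hex strings to ObjectId
// equivalent: await Product.findOne({ _id: id });
```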
What is the difference between find() and findOne()?
find() returns a cursor that can iterate all matching documents — use it when you expect multiple results. It returns an array (after .exec() or awaiting) of all documents matching the filter. findOne() returns the first matching document (or null if none found) — use it when you expect zero or one result (e.g., finding a user by email, finding a product by slug). findById(id) is a convenience method equivalent to findOne({ _id: id }) and is the fastest way to fetch a specific document by its primary key.
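Typical usage (hypothetical Mongoose models, inside an Express-style async handler):

```javascript
const phones = await Product.find({ category: 'smartphones', isActive: true }); // array, possibly empty

const user = await User.findOne({ email: 'ravi@example.com' }); // single document or null
if (!user) return res.status(404).json({ message: 'User not found' });

const product = await Product.findById(req.params.id); // shorthand for findOne({ _id: req.params.id })
```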
What is the embedding vs referencing decision in MongoDB schema design?
Embedding means storing related data inside the same document (e.g., product images stored as an array within the product document). Referencing means storing related data in a separate collection and storing an ObjectId reference (e.g., reviews stored in a separate 'reviews' collection with a productId field). Embed when: data is always read together, the relationship is 1:few (bounded array size), and data doesn't need to be accessed independently. Reference when: the sub-collection can grow without bound (like reviews), the data is accessed independently by different parts of the app, or the relationship is many-to-many.
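A sketch of both shapes (hypothetical e-commerce schemas):

```javascript
// Embed: images are always read with the product and the array stays small (1:few)
const productSchema = new mongoose.Schema({
  name: String,
  images: [{ url: String, alt: String }],
});

// Reference: reviews grow without bound and are queried on their own, so they get their own collection
const reviewSchema = new mongoose.Schema({
  productId: { type: mongoose.Schema.Types.ObjectId, ref: 'Product', index: true },
  rating:    { type: Number, min: 1, max: 5 },
  comment:   String,
});
```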
What are update operators in MongoDB? Name the most common ones.
$set: sets the value of a field (or adds it if it doesn't exist). $unset: removes a field from a document. $inc: atomically increments a numeric field by a specified amount. $push: adds a value to an array. $pull: removes elements from an array matching a condition. $addToSet: adds a value to an array only if it doesn't already exist (like a set). $pop: removes the first or last element from an array. The key insight: always use these operators for updates — never fetch a document, modify it, and save it back (race condition). Atomic operators are applied by MongoDB server-side without a read.
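For example, a single atomic update touching several fields (field names are illustrative):

```javascript
// All operators are applied server-side in one atomic document update
await Product.updateOne(
  { _id: productId },
  {
    $set:      { 'price.discounted': 44999 },                       // set (or create) a field
    $inc:      { stock: -1, soldCount: 1 },                         // atomic counters, no read-modify-write race
    $push:     { priceHistory: { price: 44999, at: new Date() } },  // append to an array
    $addToSet: { tags: 'festive-sale' },                            // append only if not already present
  }
);

await Product.updateOne({ _id: productId }, { $pull: { tags: 'clearance' } }); // remove matching array elements
```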
What is the aggregation pipeline in MongoDB?
The aggregation pipeline is MongoDB's framework for data processing and analytics. It's an array of stages where each stage transforms a stream of documents. Common stages: $match (filter documents like SQL WHERE), $group (aggregate by a key like SQL GROUP BY), $project (reshape documents — include/exclude/compute fields), $sort, $limit, $skip, $unwind (deconstruct arrays), $lookup (join with another collection), and $addFields (add computed fields). The pipeline is the correct tool for analytics, reports, dashboards, and any query more complex than a simple filter.
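A representative pipeline, assuming a hypothetical Order model with embedded items:

```javascript
// Revenue per category for delivered orders: a typical dashboard query
const revenue = await Order.aggregate([
  { $match: { status: 'delivered', createdAt: { $gte: new Date('2024-01-01') } } }, // filter first (index-friendly)
  { $unwind: '$items' },                                                            // one document per ordered item
  { $group: {
      _id: '$items.category',
      revenue: { $sum: { $multiply: ['$items.price', '$items.quantity'] } },
      orders:  { $sum: 1 },
  } },
  { $sort: { revenue: -1 } },
  { $limit: 10 },
]);
```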
What is a MongoDB index and why is it important?
An index is a data structure (B-tree) that MongoDB maintains on a field to enable fast document lookup by that field's value. Without an index, MongoDB performs a COLLSCAN (collection scan) — reading every document in the collection to find matches. With an index, MongoDB uses the B-tree to find matching documents in O(log n) time. For a collection with 1 million documents, an indexed query might read 100 documents; an unindexed query reads all 1 million. Missing indexes are the most common cause of slow MongoDB queries in production.
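A quick mongosh sketch: create the index, then confirm the query actually uses it:

```javascript
db.products.createIndex({ category: 1, price: -1 });

db.products
  .find({ category: 'smartphones' })
  .sort({ price: -1 })
  .explain('executionStats');
// look for an IXSCAN stage in winningPlan instead of COLLSCAN
```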
What does the .lean() method do in Mongoose and when should you use it?
lean() makes Mongoose return plain JavaScript objects instead of full Mongoose Document objects. Without lean(), query results are Mongoose Documents — they have all Mongoose methods (save(), toJSON(), etc.), getters, setters, and event emitters attached. This overhead adds memory usage and slows down serialization. lean() returns plain objects — faster and smaller. Use lean() for any read-only query where you won't call save(), don't need instance methods, and don't need virtuals. For a GET endpoint returning a list of products, always use lean() — you're just reading and returning data, never modifying the documents.
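For example, a read-only list endpoint (Express-style handler; model and fields are illustrative):

```javascript
// GET /api/products: we only read and serialize, so lean() returns plain objects
const products = await Product.find({ isActive: true })
  .select('name price images')
  .sort({ createdAt: -1 })
  .limit(20)
  .lean(); // plain JS objects: faster and smaller, but no .save(), getters, or virtuals

res.json(products);
```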
Explain the $elemMatch operator and when it's necessary.
$elemMatch ensures that multiple conditions apply to the same element of an array. Without it: { 'variants.color': 'Black', 'variants.stock': { $gt: 0 } } matches documents where ANY variant has color='Black' AND ANY variant has stock>0 — they could be different variants. With $elemMatch: { variants: { $elemMatch: { color: 'Black', stock: { $gt: 0 } } } } — matches only documents where a SINGLE variant has BOTH color='Black' AND stock>0. Use $elemMatch whenever you need to match multiple conditions on the same element of an embedded document array.
What is the correct compound index field order using the ESR rule?
ESR stands for Equality, Sort, Range — the correct order for compound index fields: Equality fields first (fields compared with exact values: category: 'smartphones'), Sort fields next (fields used in .sort(): price: 1), Range fields last (fields compared with $gt, $lt, $gte, $lte, $in). Example: query { category: 'smartphones', isActive: true, price: { $gte: 10000 } }.sort({ price: 1 }) needs index { category: 1, isActive: 1, price: 1 } — category and isActive are equality fields first, price is both sort and range field last. Wrong field order creates indexes that are partially useful or not useful at all.
What is the purpose of explain('executionStats') and what do you look for?
explain('executionStats') reveals how MongoDB executed a query — which execution plan it chose and the actual performance metrics. Key things to check:
winningPlan.stage — IXSCAN means an index was used (good), COLLSCAN means full collection scan (bad for large collections).
totalDocsExamined vs nReturned — should be close to equal (high ratio means poor index selectivity).
executionTimeMillis — actual execution time. A healthy query has IXSCAN, totalDocsExamined ≈ nReturned, and fast execution. A problematic query has COLLSCAN, totalDocsExamined >> nReturned, and slow execution.
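A sketch of reading those fields from Mongoose (the query and model are illustrative):

```javascript
const stats = await Order.find({ userId, status: 'delivered' })
  .sort({ createdAt: -1 })
  .explain('executionStats');

console.log(stats.queryPlanner.winningPlan.stage);    // expect IXSCAN (often wrapped in a FETCH), not COLLSCAN
console.log(stats.executionStats.totalDocsExamined);  // should be close to...
console.log(stats.executionStats.nReturned);          // ...this number
console.log(stats.executionStats.executionTimeMillis);
```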
How do MongoDB transactions work and when should you use them?
MongoDB transactions allow atomic operations across multiple documents and collections. They require a replica set. Usage pattern: const session = await mongoose.startSession(); session.startTransaction(); then pass { session } to all operations inside the transaction; finally await session.commitTransaction() on success or session.abortTransaction() on failure. Use transactions when: creating an order AND decrementing stock (both must succeed or both fail), transferring funds between two account documents, creating a post AND updating a counter on the user document. Don't use transactions for single-document operations — those are atomic by default.
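A sketch of the pattern for the order-plus-stock example (simplified to a single item; userId, items, and totalAmount are assumed inputs):

```javascript
const session = await mongoose.startSession();
try {
  session.startTransaction();

  // create() with an options object requires the array form
  const [order] = await Order.create([{ userId, items, totalAmount }], { session });

  const result = await Product.updateOne(
    { _id: items[0].productId, stock: { $gte: items[0].quantity } }, // guard against overselling
    { $inc: { stock: -items[0].quantity } },
    { session }
  );
  if (result.modifiedCount === 0) throw new Error('Out of stock');

  await session.commitTransaction();
} catch (err) {
  await session.abortTransaction();
  throw err;
} finally {
  session.endSession();
}
```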
What is the Computed Pattern in MongoDB and when would you use it?
The Computed Pattern pre-calculates expensive aggregated values and stores them directly in the document, updating them incrementally when source data changes. Example: storing ratings.average and ratings.count on a product document. Without the pattern: every product page request triggers an aggregation across potentially thousands of reviews. With the pattern: product page reads the pre-computed average instantly; only when a review is added does an aggregation run to update the stored value. Use when: a value is read far more often than it changes (read/write ratio > 10:1), and the aggregation to compute it is expensive.
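One race-free way to keep the stored values current is to maintain a running sum and count with $inc and derive the average at read time (a variation on the pattern; field names are illustrative):

```javascript
async function addReview(productId, rating) {
  await Review.create({ productId, rating });

  // Incremental update: no aggregation over all reviews, no read-modify-write race
  await Product.updateOne(
    { _id: productId },
    { $inc: { 'ratings.sum': rating, 'ratings.count': 1 } }
  );
}
// At read time: average = ratings.sum / ratings.count (or periodically $set a stored average)
```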
What is the $facet stage in the aggregation pipeline and what problem does it solve?
$facet runs multiple aggregation pipelines in parallel on the same input documents, returning all results in a single response. It solves the problem of search pages that need both paginated results AND filter sidebar counts (by category, brand, price range) — without $facet, this requires 5 separate database queries. With $facet, one aggregation returns { products: [...], byCategory: [...], byBrand: [...], priceRanges: [...], totalCount: [...] }. The performance advantage: MongoDB processes the input documents once and distributes them to each sub-pipeline — more efficient than 5 separate queries.
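A condensed example for a search page (hypothetical fields; searchTerm comes from the request):

```javascript
// One round trip returns the result page and the sidebar counts
const [page] = await Product.aggregate([
  { $match: { isActive: true, name: { $regex: searchTerm, $options: 'i' } } },
  { $facet: {
      products:   [{ $sort: { price: 1 } }, { $skip: 0 }, { $limit: 20 }],
      byCategory: [{ $group: { _id: '$category', count: { $sum: 1 } } }],
      byBrand:    [{ $group: { _id: '$brand', count: { $sum: 1 } } }],
      totalCount: [{ $count: 'value' }],
  } },
]);
```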
How would you design a MongoDB schema for a food delivery app like Swiggy?
Collections: restaurants (embedded menu structure: categories → items → customizations, geolocation as GeoJSON Point for $nearSphere queries, ratings aggregate), users (embedded addresses array, max 5), orders (embedded snapshot of ordered items with prices at order time, embedded address, embedded payment info, status field for lifecycle, reference to restaurant and user), delivery_partners (current location as a GeoJSON Point plus a location timestamp with a TTL index so stale locations expire, availability status). Reviews in separate collection (restaurantId reference). Key indexes: restaurants(location: '2dsphere'), orders(userId: 1, createdAt: -1), orders(restaurantId: 1, status: 1), delivery_partners(location: '2dsphere') for nearby driver queries.
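For the restaurant-discovery screen, the geospatial lookup might look like this (a sketch; coordinates and distance are placeholders):

```javascript
// Restaurants within 5 km of the user; requires the 2dsphere index on 'location'
const nearby = await Restaurant.find({
  location: {
    $nearSphere: {
      $geometry: { type: 'Point', coordinates: [77.5946, 12.9716] }, // [longitude, latitude]
      $maxDistance: 5000, // metres
    },
  },
});
```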
What are MongoDB change streams and what are they used for?
Change streams allow applications to subscribe to real-time notifications when documents in a collection change (inserts, updates, deletes). They're built on MongoDB's oplog (operation log). Used for: real-time order status updates (watch orders collection → push via Socket.io to customer), inventory alerts (watch products → notify when stock hits 0), audit logging (watch sensitive collections → write changes to audit log), and event-driven microservices (watch for events → publish to Kafka). Change streams require a replica set. Always store the resume token (change._id) to restart from the last position if the stream is interrupted.
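A sketch of the order-status use case (the Socket.io wiring and the saveResumeToken helper are assumptions):

```javascript
// Watch only updates that change the status field
const stream = Order.watch(
  [{ $match: { operationType: 'update', 'updateDescription.updatedFields.status': { $exists: true } } }],
  { fullDocument: 'updateLookup' } // on restart, also pass { resumeAfter: savedToken }
);

stream.on('change', async (change) => {
  await saveResumeToken(change._id); // hypothetical helper: persist the token so the stream can resume after a crash
  io.to(String(change.fullDocument.userId)).emit('orderStatus', change.fullDocument.status);
});
```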
How do TTL indexes work in MongoDB?
A TTL (Time-To-Live) index on a Date field automatically deletes documents when the current time exceeds the document's date value plus the configured expiry seconds. db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 }) deletes session documents 24 hours after their createdAt time. MongoDB's TTL background process runs every 60 seconds to clean up expired documents — deletion is not instant but happens within a minute of expiration. Use cases: user sessions, OTP verification codes (expireAfterSeconds: 600 for 10-minute codes), password reset tokens, temporary cache entries, rate limit tracking documents.
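In Mongoose the same thing can be expressed with the expires shorthand (illustrative OTP schema):

```javascript
const otpSchema = new mongoose.Schema({
  phone: String,
  code: String,
  createdAt: { type: Date, default: Date.now, expires: 600 }, // TTL index: deleted ~10 minutes after creation
});
```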
What is the difference between $lookup with a pipeline and simple $lookup?
Simple $lookup matches on equality of two fields (like a SQL JOIN). Pipeline $lookup (MongoDB 3.6+) allows running a full aggregation pipeline on the joined collection — enabling range conditions, filtering, projection, and nested lookups on the joined data. Simple: { from: 'users', localField: 'userId', foreignField: '_id', as: 'user' } — joins every matching user. Pipeline: adds pipeline: [{ $match: { isActive: true } }, { $project: { name: 1 } }] — only returns active users with just the name field. (Note: combining localField/foreignField with pipeline in the same $lookup requires MongoDB 5.0+; on 3.6–4.x the join condition itself is expressed with let and $expr inside the pipeline.) Pipeline $lookup is significantly more flexible and can reduce the amount of joined data returned.
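A sketch of the 3.6-compatible form, with the join condition written via let and $expr inside the pipeline:

```javascript
{
  $lookup: {
    from: 'users',
    let: { uid: '$userId' },                                      // bind the local field
    pipeline: [
      { $match: { $expr: { $eq: ['$_id', '$$uid'] }, isActive: true } }, // correlate + filter joined side
      { $project: { name: 1 } },                                  // return only what's needed
    ],
    as: 'user',
  }
}
```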
How do you implement cursor-based pagination in MongoDB and why is it better than skip() pagination?
Skip-based pagination: db.products.find().skip(10000).limit(20) — MongoDB must scan and skip 10,000 documents before returning 20. For deep pages (page 500 of 1,000), this becomes catastrophically slow. Cursor-based: use the last document's _id (or a sort field value) as the cursor. Next page: db.products.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(20) — uses the index to jump directly to the cursor position. O(log n) regardless of page depth. Trade-off: cursor-based pagination doesn't support jumping to an arbitrary page number, only next/previous navigation.
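A minimal cursor-paginated fetch in Mongoose (model and page size are illustrative):

```javascript
// The client sends back the last _id it saw (or nothing for the first page)
async function getProductPage(lastId, pageSize = 20) {
  const filter = lastId ? { _id: { $gt: new mongoose.Types.ObjectId(lastId) } } : {};

  const products = await Product.find(filter)
    .sort({ _id: 1 })
    .limit(pageSize)
    .lean();

  const nextCursor = products.length ? products[products.length - 1]._id : null;
  return { products, nextCursor };
}
```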
Design the MongoDB schema for a real-time food delivery platform like Zepto serving 5 million orders per day.
Restaurants: { location: GeoJSON 2dsphere, menu: embedded (categories.items.customizations), ratings: {avg, count} — computed pattern, operationalStatus: 'open'|'closed'|'busy' }. Products/Menu items: embedded in restaurant to avoid $lookup on every order screen. Orders: { userId ref, restaurantId ref, driverId ref, items: snapshot array (name+price+qty embedded), address: embedded, payment: embedded, status: enum with change stream for real-time updates, totalAmount, estimatedDelivery }. OrderEvents (for analytics): { orderId, eventType, timestamp, metadata } — separate collection, TTL 90 days. Drivers: { location: GeoJSON, isAvailable, lastLocationAt (TTL 5 min so stale locations expire) }. Indexes: restaurants(location: '2dsphere'), orders(userId: 1, createdAt: -1), orders(driverId: 1, status: 1), orders(restaurantId: 1, createdAt: -1), drivers(location: '2dsphere'), plus a partial index limited to available drivers.
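The index list might be created like this in mongosh (a sketch; the partial index shown is a simplified stand-in for the availability filter described above):

```javascript
db.restaurants.createIndex({ location: '2dsphere' });
db.orders.createIndex({ userId: 1, createdAt: -1 });       // "my orders" screen
db.orders.createIndex({ restaurantId: 1, createdAt: -1 }); // restaurant dashboard
db.orders.createIndex({ driverId: 1, status: 1 });         // driver's active deliveries
db.drivers.createIndex({ location: '2dsphere' });          // nearby-driver search
db.drivers.createIndex(
  { isAvailable: 1, lastLocationAt: -1 },
  { partialFilterExpression: { isAvailable: true } }        // index only the drivers who can take orders
);
```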
How would you optimize a MongoDB aggregation pipeline that is running slowly on a collection with 50 million documents?
Ensure $match is the first stage and uses an index — run explain('executionStats') on the pipeline (or on a find() with the same conditions) to verify IXSCAN (see the sketch after this list).
Reduce documents early — more selective $match conditions earlier in the pipeline.
If $lookup is present, check if it can be replaced with embedding or extended reference pattern.
If $unwind is present, can it be combined with $lookup using a pipeline to filter before unwinding?
For $group with large result sets, use allowDiskUse: true.
Consider pre-aggregating results with the Computed Pattern — if this aggregation runs frequently, store the result.
Check if Atlas Performance Advisor recommends a different index.
For reporting that runs infrequently, move it to a secondary read preference to avoid impacting the primary.
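A combined sketch under those guidelines (hypothetical report on an orders collection; startDate and endDate are assumed inputs):

```javascript
const pipeline = [
  // 1. Selective, index-friendly $match first (supported by an index like { status: 1, createdAt: -1 })
  { $match: { status: 'delivered', createdAt: { $gte: startDate, $lt: endDate } } },
  // 2. Drop fields you don't need before the heavy stages
  { $project: { restaurantId: 1, totalAmount: 1 } },
  { $group: { _id: '$restaurantId', revenue: { $sum: '$totalAmount' }, orders: { $sum: 1 } } },
  { $sort: { revenue: -1 } },
];

const rows = await Order.aggregate(pipeline)
  .allowDiskUse(true)          // let a large $group/$sort spill to disk
  .read('secondaryPreferred'); // infrequent report: keep the load off the primary
```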
How do you handle schema evolution in MongoDB as application requirements change?
MongoDB doesn't enforce schema, so you have several strategies:
Lazy migration: new code handles both old and new document shapes. Add new field, write new documents with it, use $ifNull in queries to handle documents missing the field. Old documents get updated when they're next modified (see the sketch after this list).
Background migration: a script that finds documents missing the new field and updates them in batches with a delay between batches to avoid impacting production.
Versioned schema: add a schemaVersion field, write code that handles each version.
Dual write: write to both old and new field during transition. The choice depends on urgency, document count, and whether the application can tolerate mixed document shapes during migration.
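A sketch of the lazy-migration and background-backfill options for a hypothetical loyaltyTier field added after launch:

```javascript
// Read side: treat a missing field as the default
const users = await User.aggregate([
  { $match: { isActive: true } },
  { $addFields: { loyaltyTier: { $ifNull: ['$loyaltyTier', 'bronze'] } } },
]);

// Background backfill: small batches with a pause so production traffic isn't affected
let batch;
do {
  batch = await User.find({ loyaltyTier: { $exists: false } }).select('_id').limit(1000).lean();
  if (batch.length) {
    await User.updateMany(
      { _id: { $in: batch.map((d) => d._id) } },
      { $set: { loyaltyTier: 'bronze' } }
    );
    await new Promise((resolve) => setTimeout(resolve, 500)); // brief pause between batches
  }
} while (batch.length);
```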
What is the MongoDB Aggregation Pipeline's $setWindowFields stage and how does it replace application-level calculation?
$setWindowFields (MongoDB 5.0+) adds window function capabilities to the aggregation pipeline — similar to SQL window functions. It computes a value for each document based on a window (group) of documents. Uses: $sum over a window for running totals (cumulative revenue to date), $avg for moving averages (7-day rolling average), $rank for ranking documents within a group, $shift for accessing previous/next document values (month-over-month growth calculation). Before $setWindowFields: sort in MongoDB → return to application → calculate running total in JavaScript → return to client. With $setWindowFields: entire calculation in one aggregation pipeline, no application code needed.
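For example, cumulative revenue per restaurant by day, computed entirely in the pipeline (MongoDB 5.0+; field names are illustrative):

```javascript
const running = await Order.aggregate([
  { $match: { status: 'delivered' } },
  { $group: {
      _id: { restaurantId: '$restaurantId', day: { $dateTrunc: { date: '$createdAt', unit: 'day' } } },
      dailyRevenue: { $sum: '$totalAmount' },
  } },
  { $setWindowFields: {
      partitionBy: '$_id.restaurantId',             // one window per restaurant
      sortBy: { '_id.day': 1 },
      output: {
        runningRevenue: { $sum: '$dailyRevenue', window: { documents: ['unbounded', 'current'] } },
      },
  } },
]);
```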
Explain MongoDB Atlas Search and when you would use it instead of the $text operator.
Atlas Search is built on Apache Lucene (same engine as Elasticsearch) integrated into MongoDB Atlas. Advantages over $text:
Typo tolerance (fuzzy search): 'onplus' finds 'OnePlus' with maxEdits: 1 (see the sketch after this list).
Autocomplete: returns results as user types.
Compound queries: combine text search with numeric range filters, boolean conditions, and geo queries in one operation.
Relevance scoring: sophisticated BM25 scoring with field weighting.
Multiple languages: 70+ language analyzers. Use Atlas Search when: building a real product search with typo tolerance, implementing autocomplete, needing faceted search with text matching, or needing relevance-ranked results. Use $text for simple full-text search in development or when Atlas isn't available.
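A sketch of a compound Atlas Search query combining fuzzy text with a price filter (assumes a search index named 'default' on the products collection):

```javascript
const results = await Product.aggregate([
  { $search: {
      index: 'default',
      compound: {
        must:   [{ text: { query: 'onplus', path: 'name', fuzzy: { maxEdits: 1 } } }], // typo-tolerant match
        filter: [{ range: { path: 'price', gte: 10000, lte: 60000 } }],                // numeric filter in the same stage
      },
  } },
  { $limit: 20 },
  { $project: { name: 1, price: 1, score: { $meta: 'searchScore' } } },                // relevance score
]);
```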
How would you implement a multi-tenant MongoDB architecture?
Three approaches:
Database-per-tenant: each tenant has their own MongoDB database. Complete data isolation, separate backups, can scale tenant independently. Operational overhead increases with tenant count. Best for regulated industries.
Collection-per-tenant: one database, separate collection per tenant (tenant_abc_products). Good isolation, manageable up to ~1,000 tenants.
Shared collection with tenantId: all tenants share collections, every document has tenantId field, every index includes tenantId. Most resource-efficient. Risk: bugs that omit tenantId filter expose cross-tenant data. Mitigation: Mongoose middleware that automatically injects tenantId into every query. Index strategy for option 3: all indexes must be compound with tenantId as the first field — { tenantId: 1, ...otherFields } — without tenantId first, queries touch documents from all tenants.
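A sketch of that mitigation: a Mongoose plugin that adds tenantId to the schema and injects it into every query (getCurrentTenantId is a hypothetical helper, e.g. backed by AsyncLocalStorage populated per request):

```javascript
function tenantPlugin(schema) {
  schema.add({ tenantId: { type: mongoose.Schema.Types.ObjectId, required: true } });

  // Query middleware: every find/count/update/delete is scoped to the current tenant
  schema.pre(/^(find|count|update|delete)/, function () {
    this.where({ tenantId: getCurrentTenantId() });
  });
}

const productSchema = new mongoose.Schema({ name: String, category: String, price: Number });
productSchema.plugin(tenantPlugin);
productSchema.index({ tenantId: 1, category: 1, price: 1 }); // tenantId always first in compound indexes
```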
How do you prevent N+1 queries in a MongoDB/Mongoose application?
N+1 occurs when fetching N documents triggers N additional queries for related data. Solution:
populate() in Mongoose: Order.find().populate('userId', 'name email') fetches all referenced users in ONE additional query using $in, not N individual queries (see the sketch after this list).
$lookup in aggregation pipeline: join in the database, return pre-assembled result.
Extended Reference Pattern: embed the frequently-needed fields (user name, seller name) in the document — zero additional queries.
Batch loading with DataLoader (for GraphQL): collect all IDs from a batch of requests, make one query for all. Identify N+1: query logging shows many identical queries with different _id values — the signature of N+1.
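The populate() fix in one place (hypothetical Order model):

```javascript
// One query for the orders + one batched query for all referenced users (not N)
const orders = await Order.find({ status: 'delivered' })
  .populate('userId', 'name email') // Mongoose issues a single find({ _id: { $in: [...] } })
  .limit(50)
  .lean();
```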
What are the trade-offs between using MongoDB's native transactions vs the application-level atomicity provided by atomic operators?
Atomic operators ($set, $inc, $push, $pull) are atomic at the single-document level — MongoDB applies them server-side without a read-modify-write cycle. They have zero transaction overhead and work in standalone MongoDB. Transactions span multiple documents/collections but require a replica set, have two network round trips (begin+commit) overhead, hold locks on modified documents during the transaction (causing contention under high write load), and timeout after 60 seconds. Use atomic operators when possible (single document changes, counter increments, array modifications). Use transactions only for genuinely multi-document atomicity requirements (order + stock decrement, fund transfer between accounts). Overusing transactions creates unnecessary contention and latency.
How do you monitor and debug production MongoDB performance issues on Atlas?
Atlas Performance Advisor: automatically identifies slow queries and recommends indexes based on your actual query patterns. Click 'Create Index' to apply without downtime.
Real-Time Performance Panel: shows current ops/sec, query targeting ratio (documents scanned:returned — should be close to 1), and cache hit ratio (target >80%).
Profiler: enable for queries >100ms in staging to capture slow query logs.
Query Profiler: in Atlas UI, shows a flame chart of recent slow queries with visual execution plan.
Metrics tab: CPU%, Connections, Opcounters — spike in these correlates with application-level issues. Process: identify slow query from metrics → find in Profiler → run explain('executionStats') → identify COLLSCAN or high keysExamined:nReturned ratio → add appropriate index.