Ace Your Interviews 🎯
Browse our collection of interview questions across various technologies.
What is SQL and what is it used for?
SQL (Structured Query Language) is the standard language for interacting with relational databases. It is used to create database structures (DDL), insert and modify data (DML), retrieve data (SELECT queries), and control access (DCL). It works with any relational database — MySQL, PostgreSQL, Oracle, SQL Server — with minor syntax differences.
What is the difference between DDL, DML, and DCL?
DDL (Data Definition Language): defines structure — CREATE TABLE, ALTER TABLE, DROP TABLE. DML (Data Manipulation Language): manipulates data — SELECT, INSERT, UPDATE, DELETE. DCL (Data Control Language): controls access — GRANT, REVOKE. Some also define TCL (Transaction Control Language): COMMIT, ROLLBACK, SAVEPOINT.
What is a PRIMARY KEY?
A column (or combination of columns) that uniquely identifies each row in a table. Rules: must be unique across all rows, cannot be NULL, one per table. Automatically creates an index. Usually implemented with AUTO_INCREMENT (MySQL) or SERIAL/IDENTITY (PostgreSQL) for integer IDs.
What is the difference between WHERE and HAVING?
WHERE filters individual rows BEFORE grouping and aggregation. It cannot reference aggregate functions. HAVING filters groups AFTER GROUP BY and aggregation — it can reference COUNT(), SUM(), AVG(), etc. Use WHERE to filter rows; use HAVING to filter the results of aggregation.
What is the difference between DELETE, TRUNCATE, and DROP?
DELETE: removes specific rows (with WHERE) or all rows, can be rolled back, logged row-by-row, does not reset AUTO_INCREMENT. TRUNCATE: removes all rows, faster than DELETE, resets AUTO_INCREMENT, cannot be rolled back in most databases. DROP: removes the entire table structure and all data — the table no longer exists.
What is a FOREIGN KEY?
A column that references the PRIMARY KEY of another table, enforcing referential integrity. A foreign key prevents inserting a value that doesn't exist in the parent table, and (optionally) cascades DELETE or UPDATE operations to child records. Example: orders.user_id is a foreign key referencing users.id.
What is the difference between CHAR and VARCHAR?
CHAR(n) is fixed-length — always stores exactly n characters, padded with spaces. VARCHAR(n) is variable-length — stores only as many characters as needed (up to n), more space-efficient for varying content. Use CHAR for fixed-length data (phone numbers, country codes, status codes). Use VARCHAR for names, emails, descriptions.
What does NULL mean in SQL and how do you handle it?
NULL means 'unknown' or 'no value' — not zero, not empty string. NULL compared with any value using = or != always returns NULL (not TRUE or FALSE). Check for NULL with IS NULL or IS NOT NULL. Use COALESCE(column, default) to substitute a value when NULL. COUNT(column) excludes NULLs; COUNT(*) includes all rows.
What is the difference between UNION and UNION ALL?
UNION combines results of two SELECT statements and removes duplicate rows (requires an extra sort/hash operation). UNION ALL combines results and keeps all rows including duplicates — faster. Use UNION when you need distinct results. Use UNION ALL when you know there are no duplicates or you intentionally want all rows (much better performance).
What are aggregate functions in SQL? Name five.
Functions that compute a single result from multiple rows: COUNT() — counts rows or non-NULL values. SUM() — adds up numeric values. AVG() — computes average. MIN() — finds minimum. MAX() — finds maximum. They are used with GROUP BY to compute values per group. They ignore NULL values (except COUNT(*)).
What is the difference between INNER JOIN, LEFT JOIN, and RIGHT JOIN?
INNER JOIN: returns only rows where the join condition matches in BOTH tables — excludes unmatched rows. LEFT JOIN: returns ALL rows from the left table, and matched rows from the right — unmatched right rows appear as NULL. RIGHT JOIN: returns ALL rows from the right table — rarely used, almost always rewritten as LEFT JOIN with tables swapped.
What is a subquery and when would you use a CTE instead?
A subquery is a query nested inside another query. CTEs (WITH clause) are named temporary result sets defined before the main query. Use CTE when: the subquery is reused multiple times in the same query, the query has multiple logical steps that benefit from names, or you need recursion. CTEs are more readable; subqueries are fine for simple one-off filtering.
Explain what an index is and how it works.
An index is a separate data structure (usually a B-tree) that stores column values in sorted order with pointers to the actual rows. When you query WHERE email = 'x@y.com', without an index the DB scans every row. With an index on email, it does a binary search — finding the row in O(log n) instead of O(n). Trade-off: indexes speed up reads but slow down writes (every INSERT/UPDATE/DELETE must update the index).
What is the difference between a clustered and non-clustered index?
Clustered index: the table rows are physically stored in the order of the index — there can be only one per table. In MySQL InnoDB, the PRIMARY KEY is always the clustered index. Non-clustered index: a separate structure with index values pointing to row locations — you can have multiple. Non-clustered index lookups require two steps: find in index, then fetch from table (unless it's a covering index).
What is a transaction and what are ACID properties?
A transaction is a sequence of SQL statements treated as a single unit. ACID: Atomicity — all operations succeed or all fail (no partial commits). Consistency — database moves from one valid state to another. Isolation — concurrent transactions don't interfere with each other. Durability — committed transactions survive system crashes (written to disk). Use BEGIN/COMMIT/ROLLBACK to manage transactions explicitly.
What is database normalization? Explain 1NF, 2NF, and 3NF.
Normalization eliminates redundancy and ensures data integrity. 1NF: no repeating groups, each cell has one atomic value, each row is unique. 2NF: in 1NF + no partial dependency (every non-key column depends on the WHOLE primary key — relevant for composite keys). 3NF: in 2NF + no transitive dependency (non-key columns depend only on the primary key, not on other non-key columns).
What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?
All three assign numbers to rows within a window. ROW_NUMBER(): always unique (1,2,3,4) — no ties. RANK(): tied rows get the same rank, next rank skips (1,2,2,4 — skips 3). DENSE_RANK(): tied rows get the same rank, no skipping (1,2,2,3). Use ROW_NUMBER() when you need uniqueness. Use RANK()/DENSE_RANK() when ties are meaningful.
What is the N+1 query problem in the context of SQL?
The N+1 problem occurs when you run 1 query to fetch N records, then run N additional queries to fetch related data for each — N+1 total queries. Example: fetch 100 orders, then for each order fetch the user (101 total queries). Solution: one query with JOIN fetches orders and users together. In ORM context, this is solved with eager loading or fetch joins.
What are database views and when would you use them?
A view is a stored SELECT query that acts like a virtual table — you query it with SELECT but it always runs the underlying query. Uses: simplify complex queries (create a view for a 5-table JOIN used across many reports), security (expose only specific columns to users), backward compatibility (rename tables while keeping old view). Views don't store data (unless materialized). Performance is the same as the underlying query.
Explain the difference between optimistic and pessimistic locking in databases.
Pessimistic locking: lock the row when you read it (SELECT ... FOR UPDATE) — no one else can modify it until your transaction commits. Safe for high-conflict scenarios. Pessimistic. Optimistic locking: no lock on read — check at update time that no one has modified the row since you read it (using a version column or timestamp). Rollback and retry if conflict detected. Better throughput for low-conflict scenarios.
What is the query execution order in SQL?
SQL clauses execute in this logical order (not the written order): FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT. This explains why you can't use a SELECT alias in WHERE (alias defined in SELECT, evaluated after WHERE), but you can use it in ORDER BY (evaluated after SELECT). HAVING can use aggregate functions because it runs after GROUP BY.
What are window functions and how do they differ from GROUP BY?
Window functions (OVER clause) perform calculations across a set of rows related to the current row, without collapsing rows. GROUP BY collapses many rows into one summary row per group. With GROUP BY you lose individual rows. With window functions, you keep all rows and add computed columns (running total, rank, previous row value). Window functions are strictly more expressive than GROUP BY for analytics.
How do you find the second highest salary without using LIMIT?
SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees). Or with DENSE_RANK: WITH ranked AS (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM employees) SELECT salary FROM ranked WHERE rnk = 2. The DENSE_RANK approach generalizes to Nth highest without changing the logic.
What is a recursive CTE and when do you use it?
A recursive CTE calls itself — the WITH RECURSIVE syntax. It has a base case (starting rows) and a recursive case (rows derived from previous iteration). Use for: hierarchical data (organization charts, category trees, folder structures), generating sequences of numbers or dates, graph traversal (finding connected nodes). They run until the recursive case returns no new rows.
Explain transaction isolation levels.
Isolation levels control what a transaction can see from concurrent transactions. READ UNCOMMITTED: can see uncommitted changes (dirty read) — almost never used. READ COMMITTED: can only see committed changes, but may see different values in two reads of the same row (non-repeatable read). REPEATABLE READ (MySQL InnoDB default): same row returns same value within a transaction. SERIALIZABLE: full isolation — transactions appear sequential. Higher isolation = fewer concurrency issues, more locking overhead.
How would you optimize a query that's taking 30 seconds on a 10-million-row table?
Step 1: EXPLAIN the query — identify full table scans (type: ALL) and missing indexes. Step 2: Add indexes on WHERE, JOIN ON, and ORDER BY columns. Step 3: Check if functions on indexed columns are disabling the index (YEAR(date) vs date BETWEEN). Step 4: Check for SELECT * and reduce columns. Step 5: See if GROUP BY can be replaced with a more targeted query. Step 6: Consider query decomposition — break into smaller queries with intermediate temp tables for very complex cases.
What is the difference between a star schema and a snowflake schema in data warehousing?
Star schema: one central fact table connected directly to multiple dimension tables — simple, fast for analytics queries (fewer JOINs). Snowflake schema: dimension tables are further normalized into sub-dimensions (e.g., Date → Month → Quarter table chain) — saves storage, more complex queries with more JOINs. Star schema is preferred for analytics performance. Snowflake for strict normalization.
How do you pivot rows into columns in SQL?
Using conditional aggregation with CASE: SELECT user_id, SUM(CASE WHEN month = 1 THEN revenue ELSE 0 END) AS jan, SUM(CASE WHEN month = 2 THEN revenue ELSE 0 END) AS feb FROM sales GROUP BY user_id. In databases that support it (SQL Server, Oracle), use the PIVOT keyword. In PostgreSQL, use FILTER: SUM(revenue) FILTER (WHERE month = 1) AS jan.
What is the difference between EXISTS and IN for subqueries? Which is more performant?
IN: evaluates the subquery once, stores results in memory, checks each outer row against the set — better for small subquery result sets. EXISTS: evaluates the subquery for each outer row, stops as soon as one match is found (short-circuit) — better for large subquery result sets or when you only care about existence, not the value. NOT EXISTS is generally faster than NOT IN because NOT IN behaves unexpectedly with NULLs (if subquery has any NULL, NOT IN returns no rows).
How do you find and handle duplicate rows in a table?
Find duplicates: SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1. Identify which to delete: SELECT id, email, ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn FROM users. Delete duplicates keeping the earliest: DELETE FROM users WHERE id IN (SELECT id FROM (SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn FROM users) t WHERE rn > 1).