Database Management Systems

Unit 3: Relational Operations

JOINs, Aggregates, Subqueries, Views, Set Operators & Relational Algebra — the SQL skills that separate beginners from professionals.

🏢 Oracle & PostgreSQL | 📝 15 MCQs (Bloom's) | 🔬 5 Lab Exercises | 💼 Interview Prep

Section 1

Why This Chapter Pays Your Salary

JOINs and subqueries are the #1 topic in SQL interviews. Every data analyst, backend developer, and DBA writes multi-table queries daily. If Unit 2 taught you to build one table, this unit teaches you to connect the entire database into meaningful reports. A Flipkart analyst writing a "Top 10 products by revenue per category per month" report uses JOINs, GROUP BY, HAVING, window functions, and CTEs — all in one query.

🏢 Industry Snapshot

Flipkart — Their sales analytics dashboard uses 50+ table JOINs, GROUP BY with ROLLUP for hierarchical subtotals, and materialized views refreshed every 15 minutes. A single report query connects: products → orders → order_items → sellers → categories → warehouses.

PhonePe — Their UPI transaction summary uses aggregate functions across billions of rows. Monthly reports to NPCI require: total volume, average transaction value, success rate — all computed using GROUP BY, COUNT, AVG, and HAVING on partitioned tables.

IRCTC — Waitlist processing uses correlated subqueries: "For each cancelled ticket, find the first waitlisted passenger on that train on that date." This runs thousands of times per minute during Tatkal booking.

🇮🇳 Flipkart🇮🇳 PhonePe🇮🇳 IRCTC🇮🇳 Swiggy🇮🇳 Razorpay🇮🇳 Zomato

📦 Schema Setup — E-Commerce System (Flipkart-style)

All examples in this chapter use an e-commerce domain — familiar and interview-relevant.

SQL — Schema Setup
CREATE TABLE categories (
    category_id   NUMBER(5)     PRIMARY KEY,
    category_name VARCHAR2(50)  NOT NULL UNIQUE
);
CREATE TABLE customers (
    customer_id   NUMBER(10)    PRIMARY KEY,
    customer_name VARCHAR2(100) NOT NULL,
    city          VARCHAR2(50),
    reg_date      DATE           DEFAULT SYSDATE
);
CREATE TABLE products (
    product_id    NUMBER(10)    PRIMARY KEY,
    product_name  VARCHAR2(100) NOT NULL,
    category_id   NUMBER(5)     REFERENCES categories(category_id),
    price         NUMBER(10,2)  CHECK (price > 0)
);
CREATE TABLE orders (
    order_id      NUMBER(10)    PRIMARY KEY,
    customer_id   NUMBER(10)    REFERENCES customers(customer_id),
    order_date    DATE           DEFAULT SYSDATE,
    status        VARCHAR2(15)  CHECK (status IN ('PENDING','SHIPPED','DELIVERED','CANCELLED'))
);
CREATE TABLE order_items (
    item_id       NUMBER(10)    PRIMARY KEY,
    order_id      NUMBER(10)    REFERENCES orders(order_id),
    product_id    NUMBER(10)    REFERENCES products(product_id),
    quantity      NUMBER(5)     CHECK (quantity > 0),
    unit_price    NUMBER(10,2)  NOT NULL
);

-- Sample Data
INSERT INTO categories VALUES (1,'Electronics');
INSERT INTO categories VALUES (2,'Clothing');
INSERT INTO categories VALUES (3,'Books');
INSERT INTO categories VALUES (4,'Home & Kitchen');

INSERT INTO customers VALUES (1,'Rahul Sharma','Mumbai',DATE '2023-01-15');
INSERT INTO customers VALUES (2,'Priya Patel','Pune',DATE '2023-03-20');
INSERT INTO customers VALUES (3,'Amit Joshi','Delhi',DATE '2023-06-10');
INSERT INTO customers VALUES (4,'Sneha Kulkarni','Bangalore',DATE '2024-01-05');
INSERT INTO customers VALUES (5,'Rajesh Gupta','Mumbai',DATE '2024-02-28');

INSERT INTO products VALUES (101,'iPhone 15',1,79999);
INSERT INTO products VALUES (102,'Samsung Galaxy S24',1,69999);
INSERT INTO products VALUES (103,'Levi''s Jeans',2,2499);
INSERT INTO products VALUES (104,'DBMS by Navathe',3,650);
INSERT INTO products VALUES (105,'Prestige Cooker',4,1850);
INSERT INTO products VALUES (106,'Allen Solly Shirt',2,1299);

INSERT INTO orders VALUES (1001,1,DATE '2024-11-15','DELIVERED');
INSERT INTO orders VALUES (1002,2,DATE '2024-11-20','DELIVERED');
INSERT INTO orders VALUES (1003,1,DATE '2024-12-01','SHIPPED');
INSERT INTO orders VALUES (1004,3,DATE '2024-12-10','CANCELLED');
INSERT INTO orders VALUES (1005,2,DATE '2025-01-05','PENDING');

INSERT INTO order_items VALUES (1,1001,101,1,79999);
INSERT INTO order_items VALUES (2,1001,104,2,650);
INSERT INTO order_items VALUES (3,1002,103,3,2499);
INSERT INTO order_items VALUES (4,1002,106,1,1299);
INSERT INTO order_items VALUES (5,1003,102,1,69999);
INSERT INTO order_items VALUES (6,1004,105,2,1850);
INSERT INTO order_items VALUES (7,1005,101,1,79999);
COMMIT;

Section 2

Learning Outcomes — Bloom's Taxonomy

Bloom's Level	Outcome Statement
L1 — Remember	List all JOIN types, aggregate functions, and set operators; recall the symbols for relational algebra operations
L2 — Understand	Explain when LEFT JOIN produces NULLs, why HAVING filters after GROUP BY, and how correlated subqueries differ from regular subqueries
L3 — Apply	Write multi-table JOIN queries, GROUP BY with HAVING, subqueries (single-row, multi-row, correlated), and CTEs for real business reports
L4 — Analyze	Compare JOIN vs subquery performance; analyze when EXISTS outperforms IN; determine correct join type from business requirements
L5 — Evaluate	Evaluate view design decisions (updatable vs read-only, materialized vs virtual); justify when to denormalize with materialized views
L6 — Create	Design complex report queries combining JOINs, aggregates, subqueries, CTEs, and views for a complete analytics dashboard

Section 3

Concept Explanations

3.1 Aggregate Functions & GROUP BY

SQL — Aggregate Functions
-- ═══════════════════════════════════════
-- Example 1: Basic aggregates on e-commerce data
-- Business Context: Dashboard KPIs for an e-commerce platform
-- ═══════════════════════════════════════

SELECT
    COUNT(*)                          AS total_orders,
    COUNT(DISTINCT customer_id)       AS unique_customers,
    SUM(oi.quantity * oi.unit_price)  AS total_revenue,
    AVG(oi.quantity * oi.unit_price)  AS avg_order_value,
    MIN(o.order_date)                 AS first_order,
    MAX(o.order_date)                 AS last_order
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.status != 'CANCELLED';

SQL — GROUP BY & HAVING
-- ═══════════════════════════════════════
-- Example 2: Revenue per category (only categories with revenue > ₹5000)
-- Business Context: Identify top-performing product categories
-- ═══════════════════════════════════════

SELECT 
    c.category_name,
    COUNT(DISTINCT o.order_id)             AS total_orders,
    SUM(oi.quantity)                        AS units_sold,
    SUM(oi.quantity * oi.unit_price)        AS total_revenue,
    ROUND(AVG(oi.unit_price), 2)            AS avg_price
FROM categories c
JOIN products p   ON c.category_id = p.category_id
JOIN order_items oi ON p.product_id = oi.product_id
JOIN orders o     ON oi.order_id = o.order_id
WHERE o.status != 'CANCELLED'       -- WHERE filters rows BEFORE grouping
GROUP BY c.category_name
HAVING SUM(oi.quantity * oi.unit_price) > 5000  -- HAVING filters groups AFTER
ORDER BY total_revenue DESC;

-- Output:
-- category_name | orders | units_sold | revenue  | avg_price
-- Electronics   | 3      | 3          | 229997   | 76666.33
-- Clothing      | 1      | 4          | 8796     | 2199.00

WHERE vs HAVING confusion. WHERE filters individual rows BEFORE grouping. HAVING filters groups AFTER GROUP BY. You cannot use aggregate functions in WHERE: WHERE SUM(price) > 1000 is ILLEGAL. Use HAVING SUM(price) > 1000 instead. Rule: if it involves an aggregate (SUM, COUNT, AVG), it goes in HAVING.

ROLLUP, CUBE & GROUPING SETS

SQL — Advanced Grouping
-- ROLLUP: Hierarchical subtotals + grand total
SELECT 
    c.category_name,
    TO_CHAR(o.order_date, 'YYYY-MM') AS order_month,
    SUM(oi.quantity * oi.unit_price)   AS revenue
FROM categories c
JOIN products p ON c.category_id = p.category_id
JOIN order_items oi ON p.product_id = oi.product_id
JOIN orders o ON oi.order_id = o.order_id
GROUP BY ROLLUP(c.category_name, TO_CHAR(o.order_date, 'YYYY-MM'));

-- Output includes: category+month rows, category subtotals (month=NULL),
-- and grand total row (both NULL). Perfect for financial reports.

-- CUBE: All possible subtotal combinations
GROUP BY CUBE(category_name, city);
-- Generates: (cat,city), (cat,NULL), (NULL,city), (NULL,NULL)

-- GROUPING SETS: Custom subset of groupings
GROUP BY GROUPING SETS((category_name), (city), ());
-- Only: by category, by city, and grand total — no cross combos

3.2 SQL JOINs — Connecting Tables

📌 JOINs — Why Data Lives in Multiple Tables

📌 WHAT IT IS

The relational model distributes data across normalized tables to avoid redundancy. JOINs reconstruct the complete picture by combining rows from two or more tables based on a related column (usually PK-FK relationship).

🌍 REAL-WORLD ANALOGY

Your Aadhaar card has basic info. Your bank statement has financial info. Your medical record has health info. A JOIN is like combining all three: "Show me Rahul's address (Aadhaar) + balance (bank) + blood group (hospital)" — one row connecting three sources.

JOIN Types — Visual Summary
Table A (orders)        Table B (customers)
┌────┬─────┐            ┌────┬──────────┐
│ o_id│c_id │            │c_id│ name     │
├────┼─────┤            ├────┼──────────┤
│1001│  1  │            │  1 │ Rahul    │
│1002│  2  │            │  2 │ Priya    │
│1003│  1  │            │  3 │ Amit     │
│1004│  3  │            │  4 │ Sneha    │← No orders
│1005│  2  │            │  5 │ Rajesh   │← No orders
└────┴─────┘            └────┴──────────┘

INNER JOIN:  Only matching rows (1001-Rahul, 1002-Priya, etc.) — 5 rows
LEFT JOIN:   All A rows + matching B (same 5 rows, all orders have customers)
RIGHT JOIN:  All B rows + matching A — includes Sneha(NULL), Rajesh(NULL) — 7 rows
FULL JOIN:   All rows from both — 7 rows (all orders + all customers)
CROSS JOIN:  Every A × every B — 5 × 5 = 25 rows (Cartesian product)

INNER JOIN

SQL
-- ═══════════════════════════════════════
-- Example 3: INNER JOIN — Orders with customer details
-- Business Context: Order listing with customer names
-- ═══════════════════════════════════════

-- SQL:1999 Standard (RECOMMENDED)
SELECT o.order_id, c.customer_name, c.city, o.order_date, o.status
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
ORDER BY o.order_date DESC;

-- Output:
-- order_id | customer_name  | city      | order_date | status
-- 1005     | Priya Patel    | Pune      | 2025-01-05 | PENDING
-- 1004     | Amit Joshi     | Delhi     | 2024-12-10 | CANCELLED
-- 1003     | Rahul Sharma   | Mumbai    | 2024-12-01 | SHIPPED
-- 1002     | Priya Patel    | Pune      | 2024-11-20 | DELIVERED
-- 1001     | Rahul Sharma   | Mumbai    | 2024-11-15 | DELIVERED

Oracle traditional JOIN syntax: SELECT * FROM orders o, customers c WHERE o.customer_id = c.customer_id. This comma-separated syntax is legacy. Use ANSI JOIN ... ON syntax — it's clearer, less error-prone (forgetting WHERE gives CROSS JOIN), and works everywhere.

LEFT OUTER JOIN

SQL
-- ═══════════════════════════════════════
-- Example 4: LEFT JOIN — All customers, even those with no orders
-- Business Context: Find customers who haven't ordered yet (marketing target)
-- ═══════════════════════════════════════

SELECT c.customer_name, c.city, o.order_id, o.status
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
ORDER BY c.customer_name;

-- Output:
-- customer_name  | city      | order_id | status
-- Amit Joshi     | Delhi     | 1004     | CANCELLED
-- Priya Patel    | Pune      | 1002     | DELIVERED
-- Priya Patel    | Pune      | 1005     | PENDING
-- Rahul Sharma   | Mumbai    | 1001     | DELIVERED
-- Rahul Sharma   | Mumbai    | 1003     | SHIPPED
-- Rajesh Gupta   | Mumbai    | NULL     | NULL      ← No orders!
-- Sneha Kulkarni | Bangalore | NULL     | NULL      ← No orders!

-- Find ONLY customers with NO orders:
SELECT c.customer_name, c.city
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;  -- The NULL-check trick

SELF JOIN & Multi-Table JOINs

SQL
-- ═══════════════════════════════════════
-- Example 5: SELF JOIN — Customers from the same city
-- Business Context: "People who live near you also bought..."
-- ═══════════════════════════════════════

SELECT c1.customer_name AS customer_a, 
       c2.customer_name AS customer_b, 
       c1.city
FROM customers c1
JOIN customers c2 ON c1.city = c2.city 
    AND c1.customer_id < c2.customer_id;  -- Avoid duplicates (Rahul-Rajesh, not Rajesh-Rahul)

-- 4-table JOIN: Order details with customer, product, and category
SELECT c.customer_name, p.product_name, cat.category_name,
       oi.quantity, oi.unit_price, (oi.quantity * oi.unit_price) AS line_total
FROM customers c
JOIN orders o       ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p     ON oi.product_id = p.product_id
JOIN categories cat ON p.category_id = cat.category_id
WHERE o.status != 'CANCELLED'
ORDER BY c.customer_name, o.order_date;

JOIN Summary Table

Join Type	Returns	Use Case
`INNER JOIN`	Only matching rows from both tables	Standard lookup — orders WITH customers
`LEFT JOIN`	All left rows + matching right (NULL if no match)	Find customers with NO orders, show all products even unsold
`RIGHT JOIN`	All right rows + matching left	Rare — usually rewrite as LEFT JOIN by swapping tables
`FULL OUTER JOIN`	All rows from both (NULLs on non-matching side)	Reconciliation — compare two datasets for mismatches
`CROSS JOIN`	Cartesian product (every A × every B)	Combination generation (size × color for a product matrix)
`SELF JOIN`	Table joined with itself	Hierarchies (employee-manager), same-city comparisons
`NATURAL JOIN`	Implicit join on same-named columns	⚠️ AVOID — fragile, breaks if columns are added/renamed

3.3 Set Operators — UNION, INTERSECT, MINUS/EXCEPT

SQL — Set Operators
-- ═══════════════════════════════════════
-- Example 6: Combine active + archived customer records (bank audit)
-- ═══════════════════════════════════════

-- UNION: combines and removes duplicates
SELECT customer_name, city FROM customers WHERE city = 'Mumbai'
UNION
SELECT customer_name, city FROM customers WHERE reg_date > DATE '2024-01-01';

-- UNION ALL: keeps duplicates — faster (no sort needed)
SELECT customer_name FROM active_customers
UNION ALL
SELECT customer_name FROM archived_customers;

-- INTERSECT: rows present in BOTH queries
SELECT customer_id FROM orders WHERE status = 'DELIVERED'
INTERSECT
SELECT customer_id FROM orders WHERE order_date > DATE '2024-12-01';
-- Customers with delivered orders AND orders after Dec 2024

-- MINUS (Oracle) / EXCEPT (PostgreSQL): rows in first NOT in second
SELECT customer_id FROM customers
MINUS  -- PostgreSQL: EXCEPT
SELECT customer_id FROM orders;
-- Customers who have NEVER placed an order (same as LEFT JOIN + IS NULL)

Operator	Duplicates	Oracle	PostgreSQL
`UNION`	Removed	✅	✅
`UNION ALL`	Kept	✅	✅
`INTERSECT`	Removed	✅	✅
`MINUS`	Removed	✅ MINUS	❌ Use EXCEPT

3.4 Views — Virtual Tables

📌 Views — A Saved Query Masquerading as a Table

📌 WHAT IT IS

A view is a stored SELECT query that behaves like a virtual table. It doesn't store data — it executes the underlying query each time you SELECT from it. Views provide security (hide columns), simplicity (hide complex JOINs), and consistency (same definition reused).

SQL — Views
-- ═══════════════════════════════════════
-- Example 7: Views for different user roles
-- Business Context: Dashboard views for sales team vs management
-- ═══════════════════════════════════════

-- Simple view: Order summary (hide internal IDs)
CREATE OR REPLACE VIEW v_order_summary AS
SELECT o.order_id, c.customer_name, c.city,
       o.order_date, o.status,
       SUM(oi.quantity * oi.unit_price) AS order_total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY o.order_id, c.customer_name, c.city, o.order_date, o.status;

-- Now use it like a table:
SELECT * FROM v_order_summary WHERE status = 'DELIVERED';

-- Read-only view (prevent DML)
CREATE VIEW v_product_catalog AS
SELECT p.product_name, c.category_name, p.price
FROM products p JOIN categories c ON p.category_id = c.category_id
WITH READ ONLY;

-- View with CHECK OPTION (prevent invisible row inserts)
CREATE VIEW v_mumbai_customers AS
SELECT * FROM customers WHERE city = 'Mumbai'
WITH CHECK OPTION;
-- INSERT into v_mumbai_customers with city='Delhi' is REJECTED
-- because the row wouldn't be visible through the view

Materialized View (Oracle) — Unlike regular views, a materialized view physically stores the query result on disk. It's refreshed periodically (ON DEMAND or ON COMMIT). Used for expensive report queries that don't need real-time data. Flipkart uses materialized views for their category-wise sales dashboard — refreshed every 15 minutes instead of computing from 500M rows on every page load.

SQL — Materialized View (Oracle)
CREATE MATERIALIZED VIEW mv_category_sales
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT c.category_name, SUM(oi.quantity * oi.unit_price) AS revenue
FROM categories c
JOIN products p ON c.category_id = p.category_id
JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY c.category_name;

-- Refresh manually when needed:
-- EXEC DBMS_MVIEW.REFRESH('mv_category_sales');

3.5 Subqueries & CTEs

SQL — Subquery Types
-- ═══════════════════════════════════════
-- Example 8: Single-row subquery
-- "Find products priced above the average"
-- ═══════════════════════════════════════

SELECT product_name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products);

-- Multiple-row subquery (IN)
-- "Find customers who ordered Electronics"
SELECT customer_name FROM customers
WHERE customer_id IN (
    SELECT DISTINCT o.customer_id
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    JOIN products p ON oi.product_id = p.product_id
    WHERE p.category_id = 1
);

-- Correlated subquery (inner query references outer query)
-- "Find products priced above their category average"
SELECT p.product_name, p.price, p.category_id
FROM products p
WHERE p.price > (
    SELECT AVG(p2.price) FROM products p2
    WHERE p2.category_id = p.category_id  -- References outer p!
);

-- EXISTS (often faster than IN for large datasets)
-- "Customers who have at least one delivered order"
SELECT c.customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.customer_id AND o.status = 'DELIVERED'
);

-- Scalar subquery in SELECT clause
SELECT p.product_name, p.price,
    (SELECT AVG(price) FROM products) AS overall_avg,
    p.price - (SELECT AVG(price) FROM products) AS diff_from_avg
FROM products p;

WITH Clause — Common Table Expressions (CTEs)

SQL — CTEs
-- ═══════════════════════════════════════
-- Example 9: CTE for readable complex queries
-- Business Context: Top customer per city by total spend
-- ═══════════════════════════════════════

WITH customer_spend AS (
    SELECT c.customer_id, c.customer_name, c.city,
           SUM(oi.quantity * oi.unit_price) AS total_spent
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    JOIN order_items oi ON o.order_id = oi.order_id
    WHERE o.status != 'CANCELLED'
    GROUP BY c.customer_id, c.customer_name, c.city
),
ranked AS (
    SELECT cs.*,
           ROW_NUMBER() OVER (PARTITION BY city ORDER BY total_spent DESC) AS rn
    FROM customer_spend cs
)
SELECT customer_name, city, total_spent
FROM ranked WHERE rn = 1;

-- Recursive CTE: Category hierarchy (Electronics → Mobiles → Smartphones)
WITH RECURSIVE cat_tree AS (  -- PostgreSQL; Oracle: no RECURSIVE keyword
    SELECT category_id, category_name, parent_id, 1 AS level
    FROM categories WHERE parent_id IS NULL  -- Root categories
    UNION ALL
    SELECT c.category_id, c.category_name, c.parent_id, ct.level + 1
    FROM categories c
    JOIN cat_tree ct ON c.parent_id = ct.category_id
)
SELECT LPAD(' ', (level-1)*4) || category_name AS hierarchy
FROM cat_tree ORDER BY level;

JOIN vs Subquery — When to use which: Use JOIN when you need columns from multiple tables in the output. Use subquery when you need to filter based on another table's aggregate. Use EXISTS over IN when the subquery returns many rows (EXISTS short-circuits). Use CTE when a subquery is referenced multiple times or for readability of complex logic.

3.6 Relational Algebra — The Mathematical Foundation

📌 Relational Algebra — SQL's Mathematical Backbone

📌 WHAT IT IS

Relational algebra is a formal system of operations on relations (tables). Every SQL query is internally translated into a relational algebra expression by the query optimizer. Understanding it helps you write better SQL and understand EXPLAIN PLANs.

Operation	Symbol	Purpose	SQL Equivalent
Selection	σ (sigma)	Filter rows (horizontal)	`WHERE`
Projection	π (pi)	Choose columns (vertical)	`SELECT col1, col2`
Union	∪	Combine two compatible relations	`UNION`
Set Difference	−	Rows in R but not in S	`MINUS` / `EXCEPT`
Cartesian Product	×	All combinations of rows	`CROSS JOIN`
Rename	ρ (rho)	Rename relation or attributes	`AS alias`
Natural Join	⋈	Join on common attributes	`NATURAL JOIN`
Theta Join	⋈_θ	Join on arbitrary condition	`JOIN ON condition`
Intersection	∩	Rows common to both	`INTERSECT`
Division	÷	"Find X related to ALL Y"	Complex subquery with NOT EXISTS

Relational Algebra → SQL Mapping
Query: "Names and cities of customers from Mumbai who have placed orders"

Relational Algebra:
  π_{customer_name, city}(σ_{city='Mumbai'}(customers ⋈ orders))

Step by step:
  1. ⋈ (Natural Join):    customers JOIN orders ON customer_id
  2. σ (Selection):       WHERE city = 'Mumbai'
  3. π (Projection):      SELECT customer_name, city

SQL:
  SELECT DISTINCT c.customer_name, c.city
  FROM customers c JOIN orders o ON c.customer_id = o.customer_id
  WHERE c.city = 'Mumbai';

Query optimization basics: The query optimizer translates your SQL into a relational algebra expression tree, then rearranges operations for efficiency. For example, pushing σ (selection/WHERE) down the tree before ⋈ (join) reduces the number of rows joined. Use EXPLAIN PLAN (Oracle) or EXPLAIN ANALYZE (PostgreSQL) to see how the optimizer executes your query — then optimize by adding indexes on frequently filtered/joined columns.

Section 4

Industry Problems

🏢 Industry Problem #1 — PhonePe UPI Transaction Summary Report

Organization: Payments company (PhonePe-scale)

Scenario: Generate the monthly UPI transaction report for NPCI submission. Report must include: total transactions, successful vs failed, total volume (₹), average transaction value, top 5 receiving banks by volume — all grouped by month.

Schema:

SQL
CREATE TABLE upi_transactions (
    txn_id       NUMBER(15)    PRIMARY KEY,
    sender_vpa   VARCHAR2(50)  NOT NULL,
    receiver_vpa VARCHAR2(50)  NOT NULL,
    amount       NUMBER(12,2)  CHECK (amount > 0),
    status       VARCHAR2(10)  CHECK (status IN ('SUCCESS','FAILED','PENDING')),
    txn_date     DATE,
    receiver_bank VARCHAR2(30)
);

💡 Complete Solution

SQL
WITH monthly_stats AS (
    SELECT
        TO_CHAR(txn_date, 'YYYY-MM')  AS txn_month,
        COUNT(*)                       AS total_txn,
        SUM(CASE WHEN status = 'SUCCESS' THEN 1 ELSE 0 END) AS success_count,
        SUM(CASE WHEN status = 'FAILED' THEN 1 ELSE 0 END)  AS failed_count,
        SUM(CASE WHEN status = 'SUCCESS' THEN amount ELSE 0 END) AS total_volume,
        ROUND(AVG(CASE WHEN status = 'SUCCESS' THEN amount END), 2) AS avg_txn_value,
        ROUND(SUM(CASE WHEN status='SUCCESS' THEN 1 ELSE 0 END)*100.0/COUNT(*),2) AS success_rate
    FROM upi_transactions
    GROUP BY TO_CHAR(txn_date, 'YYYY-MM')
)
SELECT * FROM monthly_stats ORDER BY txn_month DESC;

-- Top 5 receiver banks by volume per month (using window function)
WITH bank_volume AS (
    SELECT TO_CHAR(txn_date,'YYYY-MM') AS month, receiver_bank,
           SUM(amount) AS vol,
           DENSE_RANK() OVER (PARTITION BY TO_CHAR(txn_date,'YYYY-MM')
                              ORDER BY SUM(amount) DESC) AS rnk
    FROM upi_transactions WHERE status='SUCCESS'
    GROUP BY TO_CHAR(txn_date,'YYYY-MM'), receiver_bank
)
SELECT * FROM bank_volume WHERE rnk <= 5;

DBA Insight: At PhonePe's scale (10 billion+ monthly), this query runs on partitioned tables (partition by month). Materialized views pre-compute daily summaries. The NPCI report is generated from the materialized view, not raw transactions.

🏢 Industry Problem #2 — College Placement Analytics Dashboard

Organization: University placement cell

Scenario: Generate a placement report: company-wise offers, department-wise placement rate, highest/lowest packages, students with multiple offers.

💡 Solution Highlights

SQL
-- Students with multiple offers
SELECT s.student_name, COUNT(po.offer_id) AS num_offers,
       MAX(po.ctc_lpa) AS highest_ctc
FROM students s
JOIN placement_offers po ON s.student_id = po.student_id
GROUP BY s.student_name
HAVING COUNT(po.offer_id) > 1
ORDER BY highest_ctc DESC;

-- Department-wise placement rate
SELECT d.dept_name,
       COUNT(DISTINCT s.student_id) AS total_students,
       COUNT(DISTINCT po.student_id) AS placed,
       ROUND(COUNT(DISTINCT po.student_id)*100.0/COUNT(DISTINCT s.student_id),1) AS placement_pct
FROM departments d
JOIN students s ON d.dept_id = s.dept_id
LEFT JOIN placement_offers po ON s.student_id = po.student_id
GROUP BY d.dept_name
ORDER BY placement_pct DESC;

🏢 Industry Problem #3 — Flipkart Product Recommendation Query

Organization: E-commerce (Flipkart-scale)

Scenario: "Customers who bought product X also bought..." — find co-purchased products using self-joins on order_items.

💡 Solution

SQL
-- Find products frequently bought WITH iPhone 15 (product_id=101)
SELECT p.product_name, COUNT(*) AS co_purchase_count
FROM order_items oi1
JOIN order_items oi2 ON oi1.order_id = oi2.order_id   -- Same order
    AND oi1.product_id != oi2.product_id             -- Different product
JOIN products p ON oi2.product_id = p.product_id
WHERE oi1.product_id = 101  -- iPhone 15
GROUP BY p.product_name
ORDER BY co_purchase_count DESC
FETCH FIRST 5 ROWS ONLY;

-- Output:
-- product_name     | co_purchase_count
-- DBMS by Navathe  | 1   (bought together in order 1001)

DBA Insight: Flipkart's actual recommendation engine uses Apache Spark on historical data, not live SQL queries. But the SQL pattern above is exactly how recommendation prototypes are built and tested. Index on order_items(order_id, product_id) is critical for performance.

Section 5

Lab Exercises

Exercise 1: Aggregate Functions & GROUP BY

⏱ 35 minutes🟢 Beginner

Schema: E-commerce schema from Section 1

Tasks:

Find total number of orders, total revenue, and average order value (excluding cancelled)
Count orders per status (PENDING, SHIPPED, DELIVERED, CANCELLED)
Find revenue per category — show only categories with revenue > ₹5000
Find the most expensive product in each category
Use GROUP BY with ROLLUP on (category, month) for subtotals

Hints: Remember HAVING for group-level filters, not WHERE.

Exercise 2: All JOIN Types

⏱ 45 minutes🟡 Intermediate

Tasks:

INNER JOIN: List all orders with customer names and product names
LEFT JOIN: Show ALL customers including those with no orders
LEFT JOIN + IS NULL: Find customers who never ordered
SELF JOIN: Find pairs of customers from the same city
4-table JOIN: Full order details (customer → order → items → product → category)
Verify: COUNT the rows returned by INNER vs LEFT vs FULL OUTER JOIN — explain the differences

Exercise 3: Subqueries — All Types

⏱ 50 minutes🟡 Intermediate

Tasks:

Single-row subquery: Products priced above average
Multi-row subquery (IN): Customers who bought Electronics
Correlated subquery: Products priced above their category average
EXISTS: Customers with at least one delivered order
Rewrite each subquery as a JOIN — compare which is more readable

Exercise 4: Views & Materialized Views

⏱ 40 minutes🟡 Intermediate

Tasks:

Create v_order_details: customer name, product name, quantity, total, order date
Create v_city_customers for Mumbai with CHECK OPTION — try inserting a Delhi customer
Create a READ ONLY view for the product catalog
Try INSERT/UPDATE on a view with JOINs — observe which operations fail
Create a materialized view for category-wise monthly revenue (Oracle) or use CREATE TABLE AS equivalent

Exercise 5: Complete Analytics Dashboard

⏱ 60 minutes🔴 Advanced

Tasks: Write a single CTE-based query that produces a complete e-commerce dashboard:

CTE 1: Customer lifetime value (total spend per customer)
CTE 2: Product popularity (total units sold per product)
CTE 3: Category revenue with ranking
Final SELECT: Top 3 customers by spend, joined with their most-purchased category

Extension: Add a "month-over-month growth" calculation using LAG() window function.

Section 6

MCQ Assessment Bank — 15 Questions

Hover to reveal answer and explanation.

Which aggregate function counts the number of non-NULL values in a column?

COUNT(*)
COUNT(column_name)
SUM(column_name)
TOTAL(column_name)

✅ B. COUNT(column_name) counts only non-NULL values. COUNT(*) counts all rows regardless of NULLs. If a column has 100 rows but 10 are NULL: COUNT(*)=100, COUNT(column)=90. SUM adds values, doesn't count. TOTAL is not standard SQL.
🏢 This distinction is tested in GATE CS and every campus placement SQL test.

L1 — RememberAggregates

Which JOIN type returns ALL rows from the left table even if there is no matching row in the right table?

INNER JOIN
LEFT OUTER JOIN
CROSS JOIN
SELF JOIN

✅ B. LEFT OUTER JOIN returns all rows from the left table. For non-matching rows, the right table columns are filled with NULL. INNER JOIN returns only matching rows. CROSS JOIN returns the Cartesian product. SELF JOIN is a table joined with itself (can use any join type).
🏢 LEFT JOIN is the most commonly used join in data analysis — finding "all X, even those without Y."

L1 — RememberJOINs

Which set operator removes duplicates when combining two SELECT results?

UNION ALL
UNION
CROSS JOIN
MERGE

✅ B. UNION combines result sets and removes duplicates (performs an implicit DISTINCT + sort). UNION ALL keeps all duplicates and is faster (no sort needed). Use UNION ALL when duplicates are expected/desired for performance. CROSS JOIN is a join type, not a set operator. MERGE is a DML statement.
🏢 Performance tip: always prefer UNION ALL unless you specifically need duplicate removal.

L1 — RememberSet Operators

Why can't you use WHERE to filter on aggregate results (e.g., WHERE COUNT(*) > 5)?

It's a syntax limitation that will be fixed in future SQL versions
Because WHERE filters individual rows BEFORE grouping occurs; aggregates don't exist yet at the WHERE stage of execution. Use HAVING to filter groups AFTER GROUP BY
You can — it works in PostgreSQL
WHERE only works with string comparisons

✅ B. SQL execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. WHERE runs before GROUP BY, so aggregate values (COUNT, SUM, AVG) haven't been computed yet. HAVING runs after GROUP BY, when aggregates are available. This is a fundamental SQL concept, not a bug or limitation.
🏢 Understanding WHERE vs HAVING is essential for every SQL developer. Tested in every assessment.

L2 — UnderstandGROUP BY

What is the difference between a correlated subquery and a regular subquery?

They are identical
A correlated subquery references columns from the outer query and is re-executed for each outer row; a regular subquery is independent and executes only once
Correlated subqueries run faster
Regular subqueries can't use WHERE

✅ B. Regular subquery: WHERE price > (SELECT AVG(price) FROM products) — inner query runs once, returns one result. Correlated subquery: WHERE price > (SELECT AVG(price) FROM products p2 WHERE p2.category_id = p.category_id) — inner query references outer p.category_id and re-executes for EACH outer row. Correlated subqueries are slower (N executions vs 1) but sometimes necessary.
🏢 Correlated subqueries are asked in every advanced SQL interview — know how to optimize them.

L2 — UnderstandSubqueries

What is a materialized view and how does it differ from a regular view?

They are the same thing
A regular view stores only the query definition (re-executes each time). A materialized view physically stores the query result on disk and is refreshed periodically. MViews are faster for reads but may show stale data between refreshes
Materialized views are virtual
Regular views store data on disk

✅ B. Regular view: SELECT from it = re-run the underlying query every time. Materialized view: pre-computed result stored on disk, refreshed on schedule (ON DEMAND or ON COMMIT). MViews trade freshness for speed — perfect for dashboards where 15-minute-old data is acceptable. Amazon uses MViews for nightly sales reports instead of querying billions of raw transaction rows.
🏢 MViews are a key Oracle DBA concept. Also available in PostgreSQL (manual refresh).

L2 — UnderstandViews

Write a query: "Find customers who have never placed any order."

SELECT * FROM customers WHERE order_id IS NULL
SELECT c.customer_name FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id WHERE o.order_id IS NULL
SELECT * FROM customers WHERE customer_id NOT IN (SELECT * FROM orders)
SELECT * FROM customers MINUS SELECT * FROM orders

✅ B. LEFT JOIN + IS NULL pattern: LEFT JOIN returns all customers. For those without orders, o.order_id is NULL. Filtering on WHERE o.order_id IS NULL gives only no-order customers. Option A: customers table doesn't have order_id column. Option C: subquery must select customer_id, not *. Option D: different column structures, can't use MINUS.
🏢 This is the single most asked JOIN question in SQL interviews. Three ways to write it: LEFT JOIN + IS NULL, NOT EXISTS, NOT IN.

L3 — ApplyJOINs

Write a query to find the second highest salary from a doctors table.

SELECT MAX(salary) FROM doctors WHERE salary < (SELECT MAX(salary) FROM doctors)
SELECT salary FROM doctors ORDER BY salary DESC LIMIT 2
SELECT salary FROM doctors WHERE ROWNUM = 2
SELECT MIN(salary) FROM doctors

✅ A. Nested aggregate approach: inner query finds the MAX salary, outer query finds the MAX among salaries less than that — which is the second highest. Option B returns TWO rows (1st and 2nd), not just 2nd. Option C: ROWNUM doesn't work this way (ROWNUM is assigned before ORDER BY). Option D returns minimum, not second highest. Alternative: DENSE_RANK() OVER (ORDER BY salary DESC) = 2.
🏢 This is the #1 most asked SQL interview question across all Indian IT companies. Know 3 ways to solve it.

L3 — ApplySubqueries

Write a CTE to find the top-spending customer per city.

This requires a stored procedure
WITH spend AS (SELECT c.customer_name, c.city, SUM(oi.quantity*oi.unit_price) AS total FROM customers c JOIN orders o ON ... JOIN order_items oi ON ... GROUP BY c.customer_name, c.city), ranked AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY city ORDER BY total DESC) AS rn FROM spend) SELECT * FROM ranked WHERE rn = 1
CTEs can't use window functions
Use GROUP BY city only

✅ B. Two-CTE approach: first CTE calculates total spend per customer, second CTE ranks customers within each city using ROW_NUMBER() OVER (PARTITION BY city ORDER BY total DESC). Final SELECT filters rn=1 — the top spender per city. CTEs CAN use window functions — they support everything a regular SELECT does. This pattern (CTE + window function + filter) is the standard "top-N per group" solution.
🏢 This exact pattern is used at Swiggy, Zomato, and Flipkart for analytics dashboards.

L3 — ApplyCTEs

Q10

A query uses LEFT JOIN but returns the same number of rows as INNER JOIN. What does this tell you about the data?

The query has a bug
LEFT JOIN and INNER JOIN are identical
Every row in the left table has at least one matching row in the right table — there are no "orphan" rows. The LEFT JOIN is technically unnecessary but not incorrect
The right table is empty

✅ C. LEFT JOIN includes non-matching rows as NULLs. If the row count is the same as INNER JOIN, it means every left row found a match — no NULLs were added. This often happens when there's a NOT NULL FK constraint guaranteeing referential integrity. The LEFT JOIN is safe (defensively correct) but an INNER JOIN would produce the same result. Analyzing row counts between join types is a key debugging technique.
🏢 Senior developers check this to optimize queries — unnecessary LEFT JOINs add overhead.

L4 — AnalyzeJOINs

Q11

A developer writes: SELECT * FROM customers WHERE customer_id NOT IN (SELECT customer_id FROM orders WHERE customer_id IS NULL). The query returns NO rows even though customers exist. What happened?

The customers table is empty
The orders subquery returns NULL values. NOT IN with any NULL in the list evaluates to UNKNOWN (not TRUE) for every row, so no rows are returned. Fix: use NOT EXISTS instead, or add WHERE customer_id IS NOT NULL in the subquery
NOT IN doesn't work with subqueries
The WHERE clause syntax is wrong

✅ B. This is the infamous NULL trap with NOT IN. If the subquery returns {1, 2, NULL}, then NOT IN checks: customer_id != 1 AND customer_id != 2 AND customer_id != NULL. Comparing anything with NULL yields UNKNOWN, and UNKNOWN AND TRUE = UNKNOWN. So NO rows pass the filter. NOT EXISTS handles NULLs correctly because it checks existence, not value equality. This is why experienced developers prefer NOT EXISTS over NOT IN.
🏢 This NULL trap causes production bugs regularly. It's a frequent interview "gotcha" question.

L4 — AnalyzeSubqueries

Q12

A view v_order_summary uses JOINs and GROUP BY. A developer tries INSERT INTO v_order_summary VALUES (...). Evaluate this operation.

It works normally
It fails — views with JOINs, GROUP BY, DISTINCT, aggregates, or UNION are NOT updatable. The DBMS cannot determine which underlying table to modify. Only simple single-table views without these constructs are updatable
Only the first table in the JOIN is updated
It works in PostgreSQL but not Oracle

✅ B. Updatable view requirements: single base table, no GROUP BY/HAVING, no DISTINCT, no aggregates, no UNION, no subqueries in SELECT. If a view violates any of these, DML is rejected. For JOINs, Oracle allows DML on "key-preserved" tables (the table whose PK is preserved in the join), but not in general. Use INSTEAD OF triggers to enable DML on complex views if needed.
🏢 View updatability rules are tested in Oracle OCA certification (1Z0-071).

L5 — EvaluateViews

Q13

A team debates: use a correlated subquery or a JOIN+GROUP BY to find "employees earning more than their department average." Evaluate both approaches.

Subquery is always better
JOIN is always better
Correlated subquery is more readable (closer to natural language) but re-executes the inner query for each row (O(n²)). JOIN+GROUP BY computes averages once, then joins (O(n log n)). For large tables, the JOIN approach is significantly faster. For small tables, difference is negligible. Modern optimizers may rewrite one into the other
They can't solve the same problem

✅ C. Correlated: WHERE salary > (SELECT AVG(salary) FROM emp e2 WHERE e2.dept=e.dept) — readable but slow for large tables. JOIN:

WITH dept_avg AS (SELECT dept, AVG(salary) as avg_sal FROM emp GROUP BY dept) SELECT e.* FROM emp e JOIN dept_avg d ON e.dept=d.dept WHERE e.salary > d.avg_sal

— computes averages once. Modern query optimizers (Oracle's CBO, PG's planner) may automatically rewrite correlated subqueries into JOINs — but don't rely on it.
🏢 Query optimization is a senior-level interview topic. Know both approaches and their trade-offs.

L5 — EvaluateOptimization

Q14

Design queries for a Swiggy-like food delivery dashboard: (1) Total orders per restaurant with revenue, (2) Top 3 cuisines by order count, (3) Customers who ordered from 3+ different restaurants.

Single query can do all three
Three separate queries using JOINs + GROUP BY + HAVING: (1) restaurants JOIN orders JOIN order_items, GROUP BY restaurant, SUM(amount). (2) cuisines JOIN restaurants JOIN orders, GROUP BY cuisine, ORDER BY COUNT DESC, FETCH FIRST 3. (3) customers JOIN orders, GROUP BY customer HAVING COUNT(DISTINCT restaurant_id) >= 3
This requires NoSQL
Views can handle this automatically

✅ B. Each is a standard JOIN + GROUP BY pattern: (1) 3-table join with SUM for revenue, (2) additional cuisine table join with ranking, (3) COUNT(DISTINCT) with HAVING. These are the exact queries data analysts at Swiggy write daily. All three can be combined into a dashboard CTE or materialized view for repeated access.
🏢 These are real data analyst interview questions at Swiggy, Zomato, and food delivery startups.

L6 — CreateDashboard

Q15

Express this query in relational algebra: "Find names of customers from Mumbai who ordered products in the Electronics category."

Just use SQL
π_{customer_name}(σ_{city='Mumbai' ∧ category_name='Electronics'}(customers ⋈ orders ⋈ order_items ⋈ products ⋈ categories))
σ_Mumbai(customers)
customers × orders

✅ B. Step by step: (1) Natural join all 5 tables (⋈), (2) Apply selection σ for both conditions (city='Mumbai' AND category='Electronics'), (3) Project π to get only customer_name. The optimizer would push σ conditions down before joins for efficiency. SQL equivalent:

SELECT DISTINCT c.customer_name FROM customers c JOIN orders o ON... JOIN order_items oi ON... JOIN products p ON... JOIN categories cat ON... WHERE c.city='Mumbai' AND cat.category_name='Electronics'

.
🏢 GATE CS exam always has 1-2 relational algebra questions. Map each symbol to SQL for easy solving.

L6 — CreateRelational Algebra

Section 7

Chapter Summary

RELATIONAL OPERATIONS │ ├── AGGREGATE FUNCTIONS │ ├── COUNT, SUM, AVG, MIN, MAX — NULL behavior matters │ ├── GROUP BY: single/multiple columns │ ├── HAVING: filter AFTER grouping (vs WHERE: before) │ ├── ROLLUP: hierarchical subtotals + grand total │ ├── CUBE: all possible subtotal combinations │ └── GROUPING SETS: custom grouping subsets │ ├── SQL JOINS │ ├── INNER JOIN: only matching rows (default) │ ├── LEFT JOIN: all left + matching right (NULLs) │ ├── RIGHT JOIN: all right + matching left │ ├── FULL OUTER JOIN: all from both │ ├── CROSS JOIN: Cartesian product (A × B) │ ├── SELF JOIN: table with itself (hierarchies) │ └── Multi-table: 3, 4, 5-table joins for reports │ ├── SET OPERATORS │ ├── UNION (dedup) / UNION ALL (keep dupes) │ ├── INTERSECT (common rows) │ └── MINUS (Oracle) / EXCEPT (PG): difference │ ├── VIEWS │ ├── Virtual table = stored SELECT query │ ├── Security (hide columns), simplicity (hide JOINs) │ ├── WITH CHECK OPTION / WITH READ ONLY │ ├── DML on views: only simple single-table views │ └── Materialized Views: physical storage, periodic refresh │ ├── SUBQUERIES │ ├── Single-row: returns 1 value (=, >, <) │ ├── Multi-row: returns set (IN, ANY, ALL, EXISTS) │ ├── Correlated: references outer query (re-executes per row) │ ├── Scalar: in SELECT clause │ ├── Inline view: in FROM clause │ └── CTE (WITH clause): named temp results, recursive │ └── RELATIONAL ALGEBRA ├── σ (Selection/WHERE), π (Projection/SELECT) ├── ∪ (Union), − (Difference), × (Cartesian) ├── ⋈ (Join), ∩ (Intersect), ÷ (Division) └── Maps directly to SQL — query optimizer uses this

🎯 3 SQL Skills This Chapter Unlocks

Multi-table Queries — JOINs connect normalized data into meaningful reports. Every real-world query involves 2-5 table JOINs.
Analytics Reports — GROUP BY + HAVING + aggregates + ROLLUP = the exact toolkit for financial reports, dashboards, and KPI tracking.
Subquery Thinking — Correlated subqueries, EXISTS, CTEs — these solve "find X where condition depends on another table's aggregate" problems that JOINs alone can't elegantly handle.

📋 SQL Quick Reference

JOINS:
  FROM a INNER JOIN b ON a.id = b.a_id       -- Matching only
  FROM a LEFT JOIN b ON a.id = b.a_id        -- All a + matching b
  FROM a FULL OUTER JOIN b ON a.id = b.a_id  -- All from both
  FROM a CROSS JOIN b                         -- Cartesian product
  FROM a a1 JOIN a a2 ON a1.col = a2.col     -- Self join

AGGREGATES:
  SELECT col, COUNT(*), SUM(x), AVG(x), MIN(x), MAX(x)
  FROM t GROUP BY col HAVING COUNT(*) > 5
  GROUP BY ROLLUP(col1, col2)  -- Subtotals + grand total

SET OPS:
  query1 UNION query2          -- Dedup
  query1 UNION ALL query2      -- Keep dupes (faster)
  query1 INTERSECT query2      -- Common rows
  query1 MINUS query2          -- Oracle (EXCEPT in PG)

SUBQUERIES:
  WHERE col > (SELECT AVG(col) FROM t)            -- Single-row
  WHERE col IN (SELECT col FROM t2)                -- Multi-row
  WHERE col > (SELECT AVG(x) FROM t2 WHERE t2.id = t.id)  -- Correlated
  WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.id = t.id)      -- EXISTS

VIEWS:
  CREATE [OR REPLACE] VIEW v AS SELECT ...
  CREATE VIEW v AS ... WITH CHECK OPTION
  CREATE VIEW v AS ... WITH READ ONLY
  CREATE MATERIALIZED VIEW mv AS ... REFRESH ON DEMAND

RELATIONAL ALGEBRA:
  σ = WHERE | π = SELECT cols | ⋈ = JOIN | ∪ = UNION
  − = MINUS/EXCEPT | × = CROSS JOIN | ρ = AS alias

Section 8

Interview & Career Preparation

Q1: Explain INNER JOIN vs LEFT JOIN with an example.

Model Answer: INNER JOIN returns only matching rows from both tables. LEFT JOIN returns ALL rows from the left table and matching rows from the right — non-matching right rows appear as NULLs. Example: customers LEFT JOIN orders shows all customers, including those with no orders (order columns = NULL). Use LEFT JOIN to find "customers without orders" by adding WHERE order_id IS NULL.

Q2: How do you find the second highest salary?

Model Answer: Three approaches: (1) Subquery: SELECT MAX(salary) FROM emp WHERE salary < (SELECT MAX(salary) FROM emp). (2) DENSE_RANK: WITH ranked AS (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM emp) SELECT salary FROM ranked WHERE rnk = 2. (3) OFFSET: SELECT DISTINCT salary FROM emp ORDER BY salary DESC OFFSET 1 FETCH FIRST 1 ROW ONLY. Approach 2 handles ties best — if two people share the highest salary, it still returns the second-highest correctly.

Q3: What is the difference between WHERE and HAVING?

Model Answer: WHERE filters individual rows BEFORE GROUP BY. HAVING filters groups AFTER GROUP BY. WHERE cannot use aggregate functions (SUM, COUNT, AVG). HAVING can. Example: WHERE salary > 50000 filters individual employees. HAVING COUNT(*) > 5 filters departments with more than 5 employees. Both can coexist in one query.

Q4: What is a correlated subquery?

Model Answer: A correlated subquery references a column from the outer query, causing it to re-execute for each outer row. Example: "Find employees earning more than their department average": SELECT * FROM emp e WHERE salary > (SELECT AVG(salary) FROM emp e2 WHERE e2.dept_id = e.dept_id). The inner query depends on e.dept_id from the outer query. Performance: O(n²) for naive execution; optimizers may rewrite as JOIN.

Q5: When is EXISTS better than IN?

Model Answer: EXISTS is better when: (1) the subquery returns many rows (EXISTS stops at first match; IN checks all), (2) the subquery may return NULLs (NOT IN with NULLs returns no rows — a common bug; NOT EXISTS handles NULLs correctly). IN is simpler for small, known value sets: WHERE dept_id IN (1, 2, 3). Rule of thumb: use EXISTS for correlated checks, IN for static lists.

Q6: What is a CTE? Why use it over subqueries?

Model Answer: CTE (Common Table Expression) is a named temporary result set defined with the WITH clause. Advantages over subqueries: (1) Readability — named steps vs nested brackets. (2) Reusability — referenced multiple times in one query. (3) Recursive queries — hierarchical data (org charts, category trees) impossible with regular subqueries. (4) Same performance as subqueries — CTEs are "syntactic sugar," not physically materialized (unlike temp tables).

Q7: What is a view? Can you do INSERT on a view?

Model Answer: A view is a stored SELECT query that behaves like a virtual table. INSERT is allowed only on simple single-table views without GROUP BY, DISTINCT, aggregates, UNION, or complex expressions. Views with JOINs are generally not updatable (except "key-preserved" tables in Oracle). WITH READ ONLY explicitly prevents all DML. Use INSTEAD OF triggers to enable DML on complex views.

Q8: What is UNION vs UNION ALL?

Model Answer: UNION combines two SELECT results and removes duplicate rows (implicit DISTINCT + sort). UNION ALL keeps all rows including duplicates — no sort, significantly faster. Use UNION ALL when: (1) duplicates are impossible (disjoint datasets), (2) duplicates are acceptable, (3) performance matters. Both require same number of columns with compatible data types. ORDER BY goes at the end of the last SELECT.

Q9: Explain relational algebra σ, π, and ⋈.

Model Answer: σ (sigma/Selection): filters rows based on condition — equivalent to SQL WHERE. π (pi/Projection): selects specific columns — equivalent to SQL SELECT column list. ⋈ (natural join): combines two relations matching on common attribute names — equivalent to SQL JOIN. Example: π_name(σ_salary>50000(employees)) = SELECT name FROM employees WHERE salary > 50000. The query optimizer translates SQL into these operations internally.

Q10: Write a query to find departments with more than 3 employees where the average salary exceeds ₹2,00,000.

Model Answer: SELECT d.dept_name, COUNT(*) AS emp_count, ROUND(AVG(e.salary), 2) AS avg_salary FROM departments d JOIN employees e ON d.dept_id = e.dept_id GROUP BY d.dept_name HAVING COUNT(*) > 3 AND AVG(e.salary) > 200000 ORDER BY avg_salary DESC. Key points: JOIN for table connection, GROUP BY for aggregation, HAVING for two conditions on aggregates. WHERE would only filter individual rows, not groups.

🎓 Practice Platforms

LeetCode SQL 50 — 50 curated SQL problems. Covers JOINs, subqueries, window functions.
HackerRank SQL — Free challenges from Easy to Hard. Earn SQL certificates.
pgexercises.com — PostgreSQL-specific exercises with a real schema.
SQLZoo — Interactive SQL tutorials with quizzes.
Oracle Live SQL — Free Oracle environment in the browser.