Languages
SQL
Beyond Basic Queries

Advanced SQL: Beyond Basic Queries

-- Example: Recursive CTE for hierarchical data
WITH RECURSIVE org_tree AS (
    SELECT id, name, manager_id
    FROM employees
    WHERE id = 1  -- CEO
 
    UNION ALL
 
    SELECT e.id, e.name, e.manager_id
    FROM employees e
    JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT * FROM org_tree;

Window Functions: Powerful Data Analysis

Common Window Functions

FunctionDescription
ROW_NUMBER()Unique sequential row numbers
RANK()Rank with gaps for ties
DENSE_RANK()Rank without gaps
LEAD()/LAG()Access next/previous row values
FIRST_VALUE()First value in window frame
SUM() OVER()Running/cumulative sums
-- Running total example
SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;

Common Table Expressions (CTEs)

Types of CTEs

  1. Basic CTE

    WITH regional_sales AS (
        SELECT region, SUM(amount) AS total_sales
        FROM orders
        GROUP BY region
    )
    SELECT * FROM regional_sales;
  2. Recursive CTE (for hierarchical data)

    WITH RECURSIVE employee_hierarchy AS (
        -- Base case
        SELECT id, name, manager_id, 1 AS level
        FROM employees
        WHERE manager_id IS NULL
     
        UNION ALL
     
        -- Recursive case
        SELECT e.id, e.name, e.manager_id, eh.level + 1
        FROM employees e
        JOIN employee_hierarchy eh ON e.manager_id = eh.id
    )
    SELECT * FROM employee_hierarchy;
  3. Materialized CTE (PostgreSQL)

    WITH MATERIALIZED expensive_query AS (
        SELECT * FROM large_table WHERE complex_condition
    )
    SELECT * FROM expensive_query JOIN other_table...;

Indexing Strategies

When to Index

  • Columns frequently in WHERE clauses
  • Join conditions
  • Columns used in ORDER BY or GROUP BY

Index Types

TypeBest For
B-TreeDefault, range queries
HashExact matches (PostgreSQL)
GiSTGeometric data, full-text search
GINArray/JSON operations
BRINVery large tables with sorting
-- Create a partial index
CREATE INDEX idx_active_users ON users(email)
WHERE active = true;
 
-- Multi-column index
CREATE INDEX idx_name_department ON employees(last_name, department_id);

Query Optimization Techniques

Execution Plan Analysis

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;

Optimization Tips

  1. Avoid SELECT

    -- Instead of:
    SELECT * FROM customers;
     
    -- Use:
    SELECT id, name, email FROM customers;
  2. Use JOINs properly

    -- Prefer:
    SELECT c.name, o.amount
    FROM customers c
    JOIN orders o ON c.id = o.customer_id;
     
    -- Over:
    SELECT c.name, o.amount
    FROM customers c, orders o
    WHERE c.id = o.customer_id;
  3. Limit early

    SELECT *
    FROM large_table
    WHERE condition
    LIMIT 10;
  4. Batch operations

    -- Instead of multiple INSERTs:
    INSERT INTO users (name) VALUES
    ('Alice'), ('Bob'), ('Charlie');

Advanced Joins

Lateral Joins

-- Get top 3 products for each category
SELECT c.name, p.product_name, p.price
FROM categories c
CROSS JOIN LATERAL (
    SELECT product_name, price
    FROM products
    WHERE category_id = c.id
    ORDER BY price DESC
    LIMIT 3
) p;

Self Joins

-- Find employees and their managers
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

Transaction Isolation Levels

LevelDirty ReadsNon-Repeatable ReadsPhantom Reads
READ UNCOMMITTEDPossiblePossiblePossible
READ COMMITTEDNoPossiblePossible
REPEATABLE READNoNoPossible
SERIALIZABLENoNoNo
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
-- Critical operations
COMMIT;

JSON Support in Modern SQL

PostgreSQL JSON Operations

-- Extract value
SELECT data->>'name' FROM users;
 
-- Query inside JSON
SELECT *
FROM products
WHERE metadata->>'color' = 'blue';
 
-- JSON aggregation
SELECT
    department_id,
    jsonb_agg(jsonb_build_object('id', id, 'name', name)) AS employees
FROM employees
GROUP BY department_id;

Temporal Tables (SQL:2011)

-- System-versioned temporal table
CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    price DECIMAL(10,2),
    valid_from TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
    valid_to TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
) WITH SYSTEM VERSIONING;
 
-- Query historical data
SELECT * FROM products
FOR SYSTEM_TIME AS OF TIMESTAMP '2023-01-01';

Further Reading

Last updated on April 10, 2025