SQL Questions

1. What Is PostgreSQL, And How Does It Differ from Other SQL Databases?

Feature Description
Open Source Freely available and actively maintained.
Extensibility Define custom data types, functions, and extensions.
Advanced Indexing B-tree, Hash, GIN, GiST, BRIN indexes.
Concurrency MVCC for high performance.
Full-Text Search Built-in support for text search.
Rich Data Types Arrays, hstore, UUID, JSON/JSONB, geometric types.
Procedural Languages Supports PL/pgSQL, PL/Python, PL/Perl, and more.
Partitioning & Table Inheritance Useful for large-scale data management.

2. Explain Data Types in PostgreSQL

3. Explain Primary and Foreign key

Primary Key

Foreign Key

4. Explain VIEW

View is a virtual table based on the result of a SQL query. It does not store data itself, but it provides a saved query that you can treat like a table in your SQL statements.

Why Use Views?

Syntax:


      CREATE VIEW view_name AS
      SELECT column1, column2
      FROM table_name
      WHERE condition;
        

5. πŸ“Œ What are CTEs (Common Table Expressions) in PostgreSQL?

CTE is a temporary result set that we can define within a query that is used to simplify the complex statement. A common table expression (CTE) allows you to create a temporary result set within a query. A CTE helps you enhance the readability of a complex query by breaking it down into smaller and more reusable parts.

Syntax:


      WITH cte_name AS (
        SELECT column1, column2
        FROM some_table
        WHERE condition
      )
      SELECT *
      FROM cte_name
      WHERE another_condition;
        

Example:


      WITH recent_orders AS (
        SELECT * FROM orders WHERE order_date > CURRENT_DATE - INTERVAL '30 days'
      )
      SELECT customer_id, COUNT(*) AS total_orders
      FROM recent_orders
      GROUP BY customer_id;
        

Use Case – I: Find all employees whose salary is greater than the average salary of their department.

  1. Create a CTE that calculates average salary per department.
  2. Use that CTE to filter employees whose salary is higher than the average.

      WITH dept_avg_salary AS (
        SELECT 
          department_id,
          AVG(salary) AS avg_salary
        FROM employees
        GROUP BY department_id
      )
      SELECT 
        e.employee_id,
        e.name,
        e.salary,
        e.department_id
      FROM employees e
      JOIN dept_avg_salary d
        ON e.department_id = d.department_id
      WHERE e.salary > d.avg_salary;
        

Use Case – II: Find the highest paid employee(s) in each department.

  1. Create a CTE to get the maximum salary per department.
  2. Join this with the employee table to get full details of the top earners.

      WITH dept_max_salary AS (
        SELECT 
          department_id,
          MAX(salary) AS max_salary
        FROM employees
        GROUP BY department_id
      )
      SELECT 
        e.employee_id,
        e.name,
        e.salary,
        e.department_id
      FROM employees e
      JOIN dept_max_salary d
        ON e.department_id = d.department_id
       AND e.salary = d.max_salary;
        

6. πŸ“Œ Explain Triggers and How to Create Them?

A PostgreSQL trigger is a database object that automatically executes a function in response to an event such as insert, update, delete, truncate operation. It is mainly used for data validation, automatic updates, and preventing invalid transactions, etc.

Use OLD to access the row before change.
Use NEW to access the row after change.

Trigger Components:

  1. Trigger function
  2. Trigger (Defines when the function is called and on which table/event)

Example:

Track every time an employee’s salary is changed and store the old and new salary in a log table.


      CREATE TABLE employees (
        employee_id SERIAL PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INT,
        salary NUMERIC(10,2)
      );
        

πŸ“Œ This is your main table storing employee data.


      CREATE TABLE salary_audit_log (
        audit_id SERIAL PRIMARY KEY,
        employee_id INT,
        old_salary NUMERIC(10,2),
        new_salary NUMERIC(10,2),
        changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
      );
        

πŸ“Œ This table will log every salary change β€” who changed, what changed, and when.


      CREATE OR REPLACE FUNCTION log_salary_change()
      RETURNS TRIGGER AS $$
      BEGIN
        -- Only log if salary actually changed
        IF NEW.salary IS DISTINCT FROM OLD.salary THEN
          INSERT INTO salary_audit_log (employee_id, old_salary, new_salary)
          VALUES (OLD.employee_id, OLD.salary, NEW.salary);
        END IF;
      
        RETURN NEW;
      END;
      $$ LANGUAGE plpgsql;
        

πŸ“Œ This function checks if the salary was changed and logs it to salary_audit_log.


      CREATE TRIGGER trigger_salary_change
      AFTER UPDATE ON employees
      FOR EACH ROW
      EXECUTE FUNCTION log_salary_change();
        

πŸ“Œ This connects the function to the employees table and ensures it's triggered after every UPDATE.

7. πŸ“Œ Explain Functions in PostgreSQL

In PostgreSQL, functions (also called stored functions or user-defined functions) are reusable blocks of code that perform operations and return results. Functions can take input parameters, perform complex computations, interact with the database, and return a single value or a set of rows.

Why Use Functions?

πŸ”Ή Syntax of a Function in PostgreSQL


      CREATE [OR REPLACE] FUNCTION function_name (
          [parameter_name data_type, ...]
      )
      RETURNS return_type AS $$
      BEGIN
          -- function body (logic)
          RETURN some_value;
      END;
      $$ LANGUAGE plpgsql;
        

πŸ”Ή Example 1: A Simple Function


      CREATE OR REPLACE FUNCTION add_numbers(a INTEGER, b INTEGER)
      RETURNS INTEGER AS $$
      BEGIN
          RETURN a + b;
      END;
      $$ LANGUAGE plpgsql;
        

Usage:
SELECT add_numbers(5, 3); -- Output: 8

πŸ”Ή Example 2: Function Returning a Table (Set-returning function)


      CREATE OR REPLACE FUNCTION get_users_by_role(role_name TEXT)
      RETURNS TABLE(id INT, name TEXT) AS $$
      BEGIN
          RETURN QUERY
          SELECT id, name FROM users WHERE role = role_name;
      END;
      $$ LANGUAGE plpgsql;
        

Usage:
SELECT * FROM get_users_by_role('admin');

πŸ”Ή Function vs Stored Procedure

πŸ“Š Explain Window Functions in PostgreSQL

β€’ A window function performs a calculation across a set of rows related to the current row.
β€’ This set of rows is called the window frame.
β€’ Window functions are always used with the OVER() clause.
β€’ You can group rows using the PARTITION BY clause (e.g., rank employees within each department).
β€’ You can sort rows within the window using the ORDER BY clause.
β€’ Unlike aggregate functions, window functions do not collapse or remove rows β€” they keep all rows and add calculated results.

1. Ranking Functions

Used to assign rank or sequence to rows within a partition.

2. Aggregate Functions

Traditional aggregate functions used as window functions to retain row-level detail.

3. Lead and Lag Functions

Used to look at another row's value in the result β€” either before (LAG) or after (LEAD) the current row.

4. Window Frame Functions

These functions let you pick values from certain positions in a window frame β€” like the first, last, or nth row β€” not just the current row.

5. Percentile Functions

Percentile functions help you find values at a certain percentage in a dataset β€” like finding the median or the value at the 90th percentile. Useful in statistics, data analysis, and performance measurement.

6. Distribution Functions

Divides the result set or calculates distribution metrics.

Examples:


      -- Assigns a unique number to each row based on the order.
      SELECT name, salary,
        ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
      FROM employees;
      
      -- Ranks rows with gaps if values are tied.
      SELECT name, salary,
        RANK() OVER (ORDER BY salary DESC) AS rank
      FROM employees;
      
      -- Ranks rows without gaps (no skipping numbers).
      SELECT name, salary,
        DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
      FROM employees;
      
      -- Splits rows into n roughly equal groups.
      SELECT name, salary,
        NTILE(4) OVER (ORDER BY salary DESC) AS quartile
      FROM employees;
      
      -- Returns the first value in the window frame.
      SELECT name, salary,
        FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS top_salary
      FROM employees;
        

🧩 Explain Partitioning in PostgreSQL

Partitioning allows you to divide a large table into smaller, more manageable pieces called partitions, which can improve query performance and make maintenance easier. Instead of storing all rows in one big table, PostgreSQL splits them into smaller child tables, based on some rule (like date range, category, etc.).

Types of Partitioning

Type Description
Range Partitioning Rows are divided based on value ranges (e.g., dates).
List Partitioning Rows are divided by a list of values (e.g., region = 'US').
Hash Partitioning Rows are divided based on a hash of a column (even distribution).
Composite Partitioning Combination of two types (e.g., range + list).

Example: Range Partitioning by Date


      CREATE TABLE sales (
        id SERIAL,
        amount NUMERIC,
        sale_date DATE
      ) PARTITION BY RANGE (sale_date);
      
      CREATE TABLE sales_2024_q1 PARTITION OF sales
        FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
      
      CREATE TABLE sales_2024_q2 PARTITION OF sales
        FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
        

Now, if you insert:


      INSERT INTO sales (amount, sale_date) VALUES (500, '2024-02-15');
      πŸ‘‰ It will automatically go into sales_2024_q1.
        

Example: List Partitioning by Region


      CREATE TABLE customers (
        id SERIAL,
        name TEXT,
        region TEXT
      ) PARTITION BY LIST (region);
      
      CREATE TABLE customers_us PARTITION OF customers FOR VALUES IN ('US');
      CREATE TABLE customers_uk PARTITION OF customers FOR VALUES IN ('UK');
        

πŸ”„ Explain Transaction in PostgreSQL?

A transaction is a sequence of one or more SQL operations executed as a single unit of work, which must either completely succeed or completely fail. Transactions ensure data integrity and consistency, especially in the presence of errors, concurrent access, or system failures.

Transactions follow the ACID properties:

  1. Atomicity: Guarantees that all parts of a transaction are completed successfully. If any part fails, the entire transaction is rolled back (the "all or nothing" principle).
  2. Consistency: A transaction brings the database from one valid state to another, preserving data integrity. Consistency ensures the data stays accurate and reliable, before and after each transaction.
  3. Isolation: Transactions are isolated from each other until they are committed, avoiding interference (don’t affect each other while they’re running).
  4. Durability: Once committed, changes are permanent even in case of a crash. This ensures that the effects of the transaction are permanently recorded in the database.

There are three main commands in a transaction block. They are:

  1. BEGIN: Starts a transaction block.
  2. COMMIT: Commits the current transaction block, making all changes made within the block permanent.
  3. ROLLBACK: Cancels all changes made in the current transaction block, reverting to the state before the transaction began.

Transfer $100 from Account A (id=1) to Account B (id=2). If any step fails (e.g., insufficient balance), roll back to a safe point or cancel the whole transaction.


      -- Step 1: Start the transaction
      BEGIN;
      
      -- Step 2: Check balance of Account A
      SELECT balance FROM accounts WHERE id = 1;
      
      -- Step 3: Savepoint before update
      SAVEPOINT before_transfer;
      
      -- Step 4: Deduct $100 from Account A
      UPDATE accounts
      SET balance = balance - 100
      WHERE id = 1
        AND balance >= 100;
      
      -- Check how many rows were affected (simulate a check in an application)
      -- If 0 rows affected, rollback to savepoint (insufficient funds)
      -- For demo, assume no rows affected
      
      ROLLBACK TO before_transfer;
      
      -- Log the failure or notify user (done outside SQL)
      
      -- Step 5: Proceed with a valid case (simulate proper balance now)
      UPDATE accounts
      SET balance = balance - 100
      WHERE id = 1;
      
      -- Step 6: Add $100 to Account B
      UPDATE accounts
      SET balance = balance + 100
      WHERE id = 2;
      
      -- Step 7: Finalize the transaction
      COMMIT;
        

Code:


      CREATE OR REPLACE FUNCTION transfer_money(from_id INT, to_id INT, amount NUMERIC)
      RETURNS TEXT AS $$
      DECLARE
        rows_updated INT;
      BEGIN
        BEGIN
          -- Start transaction implicitly in function
      
          SAVEPOINT before_transfer;
      
          -- Try to deduct amount
      
          UPDATE accounts
          SET balance = balance - amount
          WHERE id = from_id AND balance >= amount;
      
          GET DIAGNOSTICS rows_updated = ROW_COUNT;
      
          IF rows_updated = 0 THEN
            ROLLBACK TO SAVEPOINT before_transfer;
            RETURN 'Insufficient funds';
          END IF;
      
          -- Credit the other account
      
          UPDATE accounts
          SET balance = balance + amount
          WHERE id = to_id;
      
          RETURN 'Transfer successful';
      
        EXCEPTION WHEN OTHERS THEN
          ROLLBACK;
          RETURN 'Error occurred';
        END;
      END;
      $$ LANGUAGE plpgsql;
        

Call the above function:


      SELECT transfer_money(1, 2, 100);
        

Explain Indexing in PostgreSQL?

Types of Indexes:

  1. B-tree (default):
    This is the default index type in PostgreSQL
    It is used for Equality and range comparisons (=, <, >, BETWEEN, ORDER BY)
    Structure: Balanced tree (B-tree)
    Queries like:
    SELECT * FROM users WHERE age > 30;
    Default index type in PostgreSQL when you write:
    CREATE INDEX ON users(age);
  2. Hash:
    Hash indexes use a hash function to map values to locations in the index. These indexes are efficient for exact matches but are not suited for range queries.
    It is used for exact match query (=)

    CREATE INDEX idx_hash ON books USING HASH (author);
    CREATE INDEX USING hash ON users(email);
    SELECT * FROM users WHERE email = 'john@example.com';
  3. GiST (Generalized Search Tree):
    GiST indexes are flexible and support a wide range of data types and search operations.
    They are particularly useful for spatial and full-text search queries.
    Best for: Spatial and range queries (<<, &<, &>, >>, @>, <@, ~=)

    CREATE INDEX idx_gist ON books USING GIST (published_year);
  4. SP-GiST:
    SP-GiST is a specialized type of index in PostgreSQL designed to support partitioned search structures, especially where data can be recursively split into non-overlapping regions β€” great for hierarchical, spatial, or multidimensional data.
    Best for: IP ranges, prefix searches, geometric data, etc
  5. GIN:
    GIN indexes are designed for handling complex data types such as arrays and full-text searches.
    They are optimized for fast search operations but it can handle fast searches and typically consumes more space.
    Best for: Array and full-text searches (<@, @>, =, &&).

    CREATE INDEX idx_gin ON books USING GIN (title);
  6. BRIN:
    BRIN indexes are suitable for large tables with ordered data.
    They divide the table into blocks and store summarized information for each block, making them efficient for range queries on sorted data.
    Best for: Large datasets with ordered data, especially for range queries (<, <=, =, >=, >).

    CREATE INDEX idx_brin ON books USING BRIN (published_year);

Explain JOIN in PostgreSQL?

A JOIN is used to combine rows from two or more tables based on a related column between them. This is commonly used in relational databases to retrieve data that is spread across multiple tables. You can filter JOIN results with WHERE or ON conditions.

Types of JOINs in PostgreSQL

  1. INNER JOIN
    Returns only rows that have matching values in both tables.
    SELECT *
    FROM orders
    INNER JOIN customers ON orders.customer_id = customers.id;

    βœ… Only matching records.
  2. LEFT JOIN (or LEFT OUTER JOIN)
    Returns all rows from the left table, and matched rows from the right table. If no match, NULLs are returned.
    SELECT *
    FROM orders
    LEFT JOIN customers ON orders.customer_id = customers.id;

    βœ… All from left table + matches from right.
  3. RIGHT JOIN (or RIGHT OUTER JOIN)
    Opposite of LEFT JOIN. Returns all rows from the right table and matched ones from the left.
    SELECT *
    FROM orders
    RIGHT JOIN customers ON orders.customer_id = customers.id;
  4. FULL JOIN (or FULL OUTER JOIN)
    Returns rows when there is a match in either left or right table. NULLs are filled where there's no match.
    SELECT *
    FROM orders
    FULL JOIN customers ON orders.customer_id = customers.id;
  5. CROSS JOIN
    Returns the Cartesian product: all combinations of rows from both tables.
    SELECT *
    FROM products
    CROSS JOIN categories;

    ❗ Be careful – can produce a huge result set.
  6. SELF JOIN
    A table is joined with itself using aliases (allowing you to compare data within the same table).
    SELECT a.name AS employee, b.name AS manager
    FROM employees a
    JOIN employees b ON a.manager_id = b.id;

Explain MVCC in PostgreSQL?

MVCC (Multi-Version Concurrency Control) in PostgreSQL is a concurrency control method used to handle simultaneous transactions without locking the entire database. It allows multiple transactions to access the same data at the same time without interfering with each other, ensuring consistency, isolation, and performance.

When a transaction modifies a row, PostgreSQL creates a new version of that row and tags it with the transaction ID. Other transactions continue to see the previous version of the row until the modifying transaction is committed. This approach ensures consistency and isolation without requiring read locks.

Key Concepts

Garbage Collection and VACUUM

Once a transaction is committed, older versions of the data that are no longer visible to any active transactions are eventually cleaned up by a process called vacuuming.

Vacuuming: The process of cleaning up old, no longer needed versions of data.

How Does VACUUM Work?

  1. Identifies dead tuples: Any tuple (row) with an xmax transaction ID older than the current transaction is marked as dead.
  2. Removes the dead tuples: It removes these tuples from storage, freeing up space.
  3. Updates visibility maps: Ensures visibility information is properly updated for each page of the table.

Real-World Analogy

Summary

Exception Handling in PostgreSQL

Exception handling is primarily managed using the PL/pgSQL language's BEGIN ... EXCEPTION block. Use BEGIN ... EXCEPTION to handle errors gracefully within functions, procedures, and DO blocks.

You can catch specific exceptions like division_by_zero, or use a generic handler for others. Use RAISE to throw errors or display custom messages.

Common Exception Types

Syntax for Exception Handling

BEGIN
         -- Code that might throw an exception
      EXCEPTION
         WHEN exception_type THEN
            -- Handling code
      END;

Example: Handling Division by Zero

DO $$
      DECLARE
         result numeric;
      BEGIN
         BEGIN
            result := 10 / 0;
         EXCEPTION
            WHEN division_by_zero THEN
               RAISE NOTICE 'Cannot divide by zero.';
               result := NULL;
         END;
         RAISE NOTICE 'Result is: %', result;
      END;
      $$ LANGUAGE plpgsql;

Using RAISE for Custom Errors

DO $$
      BEGIN
         RAISE EXCEPTION 'This is a custom error message.';
      END;
      $$ LANGUAGE plpgsql;

Handling Multiple Exceptions

BEGIN
         -- risky operation
      EXCEPTION
         WHEN division_by_zero THEN
            RAISE NOTICE 'Handled division by zero.';
         WHEN others THEN
            RAISE NOTICE 'Some other error occurred.';
      END;

16. How to Handle Backup and Restore in PostgreSQL?

How to Backup in PostgreSQL

  1. Using pg_dump (Good for most cases)
    This creates a file with all your database info.
    pg_dump -U your_username -d your_dbname -f backup.sql
    βœ… This saves everything into backup.sql.
  2. Backup All Databases
    pg_dumpall -U your_username > all_databases.sql
    βœ… This backs up everything in your PostgreSQL server.
  3. Copying the Data Folder (Advanced)
    You can also back up the entire PostgreSQL data folder, but this is recommended for bigger systems.

How to Restore in PostgreSQL

  1. Restore from pg_dump Backup
    If you used pg_dump, run:
    psql -U your_username -d new_dbname -f backup.sql
    βœ… This restores everything into a new database.
  2. Restore All Databases
    If you used pg_dumpall, run:
    psql -U your_username -f all_databases.sql
    βœ… This brings back all your databases.

Quick Commands Summary

Action Command
Backup one DB pg_dump -U user -d db -f file.sql
Restore one DB psql -U user -d db -f file.sql
Backup all DBs pg_dumpall -U user > all.sql
Restore all DBs psql -U user -f all.sql

17. What Is the pg_hba.conf file?

The pg_hba.conf file controls client authentication in PostgreSQL. It specifies which clients are allowed to connect, what authentication methods they use, and which databases they can access. This file is essential for securing your PostgreSQL server.

HBA stands for Host-Based Authentication.

This file tells PostgreSQL:

Typically found in the PostgreSQL data directory, for example:
/etc/postgresql/15/main/pg_hba.conf
or
/var/lib/pgsql/data/pg_hba.conf

19. How to Optimize Queries in PostgreSQL?

  1. Use EXPLAIN to See What’s Happening
    Before optimizing, run:
    EXPLAIN ANALYZE your_query_here;
    It shows how PostgreSQL executes your query: indexes used, rows scanned, and more.
  2. Create indexes on columns used in WHERE, JOIN, ORDER BY, or GROUP BY
    Example:
    CREATE INDEX idx_user_email ON users(email);
    Avoid over-indexing as it slows down writes (INSERT, UPDATE, DELETE).
  3. Avoid SELECT *; select only necessary columns.
    Example:
    SELECT id, name FROM users;
    Fetching only what you need reduces I/O and memory usage.
  4. Filter early using WHERE
    Example:
    SELECT id FROM users WHERE status = 'active';
    This helps PostgreSQL scan fewer rows.
  5. Use LIMIT to reduce result size during development or pagination
    Example:
    SELECT * FROM products LIMIT 50;
  6. Do VACUUM and ANALYZE regularly
    Keep statistics and table health fresh:
    VACUUM ANALYZE;
  7. Optimize joins and subqueries

20. What are the Different Isolation Levels in PostgreSQL?

21. What Is The WAL (Write-Ahead Logging) In PostgreSQL, And How Does It Work?

Write-Ahead Logging (WAL) is a method to ensure data integrity.
Before any changes are made to the database, changes are first recorded in a log (the WAL).
This log helps recover the database to a consistent state in case of a crash.
WAL works by writing changes to the log file before applying them to the database, ensuring safety even during failures.

23. Extensions in PostgreSQL

PostgreSQL extensions are packages that enhance the database by adding new capabilities such as:

Extensions are managed using the CREATE EXTENSION command and help keep PostgreSQL modular and lightweight.

Common PostgreSQL Extensions

Extension Description
UUID-OSSP Generates UUIDs (v1, v4, etc.)
PG_STAT_STATEMENTS Tracks execution statistics for SQL statements
HSTORE Enables key-value storage in a single column
CITEXT Case-insensitive text type
PG_TRGM Fuzzy text search using trigram matching
BTREE_GIN Enables B-tree indexing in GIN indexes
TABLEFUNC Functions like crosstab for pivot tables
POSTGRES_FDW Connect to other PostgreSQL databases
POSTGIS Support for geographic (GIS) objects
PLV8 Use JavaScript (V8 engine) for stored procedures
PGCRYPTO Cryptographic functions (hashing, encryption)

How to Use Extensions

  1. Install:
    sudo apt install postgresql-contrib (Ubuntu/Debian)
  2. Enable:
    CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
  3. List Installed Extensions:
    \dx
  4. Remove:
    DROP EXTENSION "uuid-ossp";

24. Features of PostgreSQL

PostgreSQL is known for its rich set of features:

ACID Properties in PostgreSQL

PostgreSQL ensures data integrity and reliability using the ACID properties of transactions:

These properties make PostgreSQL a reliable choice for critical applications where data accuracy and consistency are essential.