A subquery, also known as a nested query or inner query, is a query within another SQL query. It is used to retrieve data that will be used in the main query as a condition to further restrict the data to be retrieved. Subqueries can be used in SELECT, INSERT, UPDATE, and DELETE statements, as well as in clauses like WHERE, FROM, and HAVING.

Syntax

SELECT column_name(s)
FROM table_name
WHERE column_name OPERATOR (SELECT column_name FROM table_name WHERE condition);

Types of Subqueries

  1. Single-row subquery: Returns only one row.
  2. Multi-row subquery: Returns more than one row.
  3. Correlated subquery: A subquery that depends on the outer query for its values.

Single-Row Subqueries

A single-row subquery returns only one row of data. It is often used with comparison operators like =, >, <, >=, <=, and <>.

Example: Find the employee with the highest salary

SELECT name, salary
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);

The subquery (SELECT MAX(salary) FROM employees) returns the highest salary from the employees table. The main query then uses this value to find the employee(s) with that salary.

Multi-Row Subqueries

A multi-row subquery returns more than one row of data. It is often used with operators like IN, ANY, ALL, and EXISTS.

Example: Find employees who work in the same district as Anand

SELECT name, district
FROM employees
WHERE district IN (SELECT district FROM employees WHERE name = 'Anand');

The subquery (SELECT district FROM employees WHERE name = 'Anand') returns the district where Anand works. The main query then finds all employees who work in that district.

Correlated Subqueries

A correlated subquery is a subquery that depends on the outer query for its values. It is executed repeatedly, once for each row that is evaluated by the outer query.

Example: Find employees whose salary is greater than the average salary in their district

SELECT name, salary, district
FROM employees e1
WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e2.district = e1.district);

The subquery (SELECT AVG(salary) FROM employees e2 WHERE e2.district = e1.district) calculates the average salary for each district. The main query then compares each employee’s salary to the average salary in their district.

Practical Use Cases

  1. Filtering Data: Subqueries can be used to filter data based on conditions that are derived from other tables or the same table.
  2. Data Aggregation: Subqueries can be used to perform calculations or aggregations that are then used in the main query.
  3. Data Comparison: Subqueries can be used to compare data between different tables or within the same table.

Example Database

Let’s assume we have an employees table with the following structure:

CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    salary DECIMAL(10, 2),
    district VARCHAR(50),
    state VARCHAR(50)
);

Sample Data:

idnamesalarydistrictstate
1Anand50000KaraikalPuducherry
2Bala60000CoimbatoreTamil Nadu
3Kavitha55000ChennaiTamil Nadu
4Raj70000CuddaloreTamil Nadu
5Kumar45000MaduraiTamil Nadu
6Ram80000SalemTamil Nadu
7Karthik90000TrichyTamil Nadu
8David40000TirupatiAndhra Pradesh
9Kannan75000BangaloreKarnataka
10Siva65000KannurKerala
11Ramesh85000WayanadKerala
12Suresh95000PalakkadKerala
13Sathish50000IdukkiKerala
14Sujatha60000KaraikalPuducherry

Example Queries Using Subqueries

1. Single-Row Subquery

Find the employee with the highest salary

SELECT name, salary
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);

Output:

namesalary
Suresh95000

The subquery (SELECT MAX(salary) FROM employees) finds the highest salary in the employees table, which is 95000. The main query then retrieves the employee(s) with that salary.

2. Multi-Row Subquery

Find employees who work in the same district as Anand

SELECT name, district
FROM employees
WHERE district IN (SELECT district FROM employees WHERE name = 'Anand');

Output:

namedistrict
AnandKaraikal
SujathaKaraikal

The subquery (SELECT district FROM employees WHERE name = 'Anand') returns Karaikal, the district where Anand works. The main query then retrieves all employees who work in Karaikal.

3. Multi-Row Subquery with IN

Find employees who work in districts located in Tamil Nadu

SELECT name, district
FROM employees
WHERE district IN (SELECT district FROM employees WHERE state = 'Tamil Nadu');

Output:

namedistrict
BalaCoimbatore
KavithaChennai
RajCuddalore
KumarMadurai
RamSalem
KarthikTrichy

The subquery (SELECT district FROM employees WHERE state = 'Tamil Nadu') returns all districts in Tamil Nadu. The main query then retrieves employees who work in those districts.

4. Correlated Subquery

Find employees whose salary is greater than the average salary in their district

SELECT name, salary, district
FROM employees e1
WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e2.district = e1.district);

Output:

namesalarydistrict
Bala60000Coimbatore
Kavitha55000Chennai
Raj70000Cuddalore
Ram80000Salem
Karthik90000Trichy
Kannan75000Bangalore
Ramesh85000Wayanad
Suresh95000Palakkad

The subquery (SELECT AVG(salary) FROM employees e2 WHERE e2.district = e1.district) calculates the average salary for each district. The main query compares each employee’s salary to the average salary in their district and retrieves those who earn more.

5. Subquery in SELECT

Display each employee’s salary along with the average salary of their state

SELECT name, salary, state, 
       (SELECT AVG(salary) FROM employees e2 WHERE e2.state = e1.state) AS avg_state_salary
FROM employees e1;

Output:

namesalarystateavg_state_salary
Anand50000Puducherry55000.00
Bala60000Tamil Nadu65000.00
Kavitha55000Tamil Nadu65000.00
Raj70000Tamil Nadu65000.00
Kumar45000Tamil Nadu65000.00
Ram80000Tamil Nadu65000.00
Karthik90000Tamil Nadu65000.00
David40000Andhra Pradesh40000.00
Kannan75000Karnataka75000.00
Siva65000Kerala76250.00
Ramesh85000Kerala76250.00
Suresh95000Kerala76250.00
Sathish50000Kerala76250.00
Sujatha60000Puducherry55000.00

The subquery (SELECT AVG(salary) FROM employees e2 WHERE e2.state = e1.state) calculates the average salary for each state. This value is displayed alongside each employee’s details.

6. Subquery in HAVING

Find districts where the average salary is greater than the overall average salary

SELECT district, AVG(salary) AS avg_district_salary
FROM employees
GROUP BY district
HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);

Output:

districtavg_district_salary
Bangalore75000.00
Palakkad95000.00
Salem80000.00
Trichy90000.00
Wayanad85000.00

The subquery (SELECT AVG(salary) FROM employees) calculates the overall average salary. The main query groups employees by district and retrieves districts where the average salary is greater than the overall average.

7. Subquery in FROM

Subquery in FROM clause is called as Derived table in SQL Server.

Find the top 3 highest-paid employees

SELECT name, salary
FROM (SELECT name, salary FROM employees ORDER BY salary DESC) AS top_employees
LIMIT 3;

Output:

namesalary
Suresh95000
Karthik90000
Ramesh85000

The subquery (SELECT name, salary FROM employees ORDER BY salary DESC) retrieves all employees sorted by salary in descending order. The main query then limits the result to the top 3 rows.

Key Takeaways

  1. Subqueries are queries nested inside another query and can be used in SELECT, INSERT, UPDATE, and DELETE statements.
  2. Single-row subqueries are useful for comparisons with a single value (e.g., finding the highest salary).
  3. Multi-row subqueries are ideal for filtering data based on multiple values (e.g., finding employees in specific districts).
  4. Correlated subqueries allow you to compare data within the same table (e.g., comparing salaries within districts).
  5. Always test subqueries independently to ensure they return the expected results before integrating them into the main query.