Leveraging SQL SET Operators for Advanced Data Manipulation
SET Operators
Here we will see SET operators in SQL, so SET operators are used to to get combine result from two different set of queries. We use the SET operators enable us to combine, compare and filter the data from different sources. These queries are referred to as combined queries.
Set operators are used for data cleaning, analysis results from two difference tables, different tables present in different schemas. These operators are very useful
for data comparing and analytical purpose.
There are four SET operators in SQL mentioned below:
Operator | Return |
---|---|
UNION | All DISTINCT rows from both tables will be returned |
UNION ALL | All rows from both tables will be returned with duplicates also |
MINUS | All distinct row from first table not including the second table |
INTERSECT | All the common rows returned from both the table |
When to Use SET Operators:
-
Combining Data: When you need a consolidated view of data from multiple tables that have similar structures (e.g., merging customer lists from different regions).
-
Comparing Data: Identifying differences or commonalities between datasets (e.g., finding customers present in one marketing campaign but not another, or identifying products sold in all regions).
-
Data Cleaning: Removing duplicates or identifying inconsistencies across datasets.
-
Reporting: Generating comprehensive reports by combining data from various segments.
UNION :
All DISTINCT rows from both tables will be returned, there are some pre-requisite for using UNION command on two tables:-
- Every select statement should have same number of columns in the same order
- The data type of every column should be same in both tables.
SYNTAX
SELECT column_nm1,column_nm2.. FROM table1
UNION
SELECT column_nm1,column_nm.. FROM table2;
Lets understand from this example:
Employees_NY:
EmployeeID | Name |
---|---|
101 | Alice |
102 | Bob |
103 | Charlie |
Employees_LA:
EmployeeID | Name |
---|---|
103 | Charlie |
104 | David |
105 | Eve |
SELECT employeeID,Name FROM Employees_NY
UNION
sELECT employeeID,Name FROM Employees_LA;
Result:
EmployeeID | Name |
---|---|
101 | Alice |
102 | Bob |
103 | Charlie |
104 | David |
105 | Eve |
Note: Charlie (103) appears only once since in UNION operator no duplicates are there.
UNION ALL:
All rows from both tables will be returned with duplicates also, there are some pre-requisite for using UNION ALL command on two tables:-
- Every select statement should have same number of columns in the same order
- The data type of every column should be same in both tables.
SYNTAX
SELECT column_nm1,column_nm2.. FROM table1
UNION ALL
SELECT column_nm1,column_nm.. FROM table2;
SELECT employeeID,Name FROM Employees_NY
UNION ALL
SELECT employeeID,Name FROM Employees_LA;
Lets understand from this example:
Result:
EmployeeID | Name |
---|---|
101 | Alice |
102 | Bob |
103 | Charlie |
103 | Charlie |
104 | David |
105 | Eve |
Note: Charlie (103) appears twice, as it's present in both tables, so it indicates in union all no duplicates will be removed it will simply add all data from both the tables.
INTERSECT :
There are some pre-requisite for using INTERSECT command on two tables:-
- Every select statement should have same number of columns in the same order
- The data type of every column should be same in both tables.
SYNTAX
SELECT
column_nm1, column_nm2, ...FROM table1 [WHERE condition]
INTERSECT
SELECT column_nm1, column_nm2, ...FROM table2 [WHERE condition]
SELECT employeeID,Name FROM Employees_NY
INTERSECT
SELECT employeeID,Name FROM Employees_LA;
Result:
EmployeeID | Name |
---|---|
103 | Charlie |
Note: Only the common row from both the table is returned which is charlie.
MINUS :
All distinct row from first table not including any rows from the second table, there are some pre-requisite for using MINUS command on two tables:-
- Every select statement should have same number of columns in the same order
- The data type of every column should be same in both tables.
SYNTAX
SELECT
column_nm1, column_nm2, ...FROM table1 [WHERE condition]
MINUS
SELECT column_nm1, column_nm2, ...FROM table2 [WHERE condition]
Here we are trying to get data from first table which are not identical to second table, Hence firstly we are trying to get data from Employees_NY which is not there in Employees_LA see below:
SELECT employeeID,Name FROM Employees_NY
MINUS
SELECT employeeID,Name FROM Employees_LA;
Result:
EmployeeID | Name |
---|---|
101 | Alice |
102 | Bob |
Note : These are the employees in Employees_NY who are not in Employees_LA.
Conversely
Now we are trying to get data from second table which are not identical to first table, Hence firstly we are trying to get data from Employees_LA which is not there in Employees_NY see below:
SELECT employeeID,Name FROM Employees_LA
MINUS
SELECT employeeID,Name FROM Employees_NY;
Result:
EmployeeID | Name |
---|---|
104 | David |
105 | Eve |
Note: These are the employees in Employees_LA who are not in Employees_NY.
Conclusion: In essence, set operators provide a powerful and flexible way to perform advanced data analysis and integration by enabling you to perform set-theoretic operations directly within your SQL queries.