RANK,DENSE_RANK, ROW_NUMBER
ROW_NUMBER():
It assigns a unique sequential integer to each row within a partition of a result set. The numbering starts at 1 for each partition. The order in which rows are numbered within a partition is determined by the ORDER BY clause in the OVER() clause.
SYNTAX
ROW_NUMBER() OVER (
[PARTITION BY column1, column2,]
ORDER BY column_to_sort [ASC|DESC]
) AS row_num
Key aspects of ROW_NUMBER() with PARTITION BY:
- Grouping: The PARTITION BY clause divides your result set into logical partitions based on the unique values in the specified column(s).
Independent - Numbering: The ROW_NUMBER() function operates independently within each of these partitions. This means that the numbering restarts at 1 for the first row in each new partition.
Ordering within - Partitions: The ORDER BY clause within the OVER() clause dictates the order in which the rows within each partition are numbered. This is crucial for getting meaningful sequential numbers.
RANK():
It assigns the same rank to rows with equal values in the ordering criteria. This can lead to gaps in the ranking sequence.
Here's the syntax for RANK():
RANK() OVER (
[PARTITION BY column1, column2]
ORDER BY column_to_sort [ASC|DESC]
) AS rank_num
Key aspects of RANK() with PARTITION BY:
- Partitioning: The PARTITION BY clause divides the result set into separate groups based on the unique values in the specified column(s).
- Independent Ranking: The RANK() function is applied to each of these partitions individually. The ranking restarts at 1 for the first row in each new partition.
- Ordering within Partitions: The ORDER BY clause within the OVER() clause determines the order of rows within each partition, and the rank is assigned based on this order.
DENSE_RANK():
Similar to RANK(), DENSE_RANK() assigns the same rank to rows with equal values in the ORDER BY clause. However, unlike RANK(), DENSE_RANK() does not skip any ranks. The next rank assigned after a tie will be the immediately following integer.
Here's the syntax for DENSE_RANK():
DENSE_RANK() OVER (
[PARTITION BY column1, column2]
ORDER BY column_to_sort [ASC|DESC]
) AS dense_rank_num
The below query will show the difference between Row_number, Rank and Dense_rank:
select *, row_number() over (partition by customer_id order by amount desc) as rn,
dense_rank() over (partition by customer_id order by amount desc) as dr,
RANK() over (partition by customer_id order by amount desc) as r
from public.payment
So, here the output for ROW_NUMBER, RANK, AND DENSE RANK is totally different wheren we are partitioned data based on customer id,
The ROW_NUMBER function has ranked every customer based on id , so here if you see the data the Customer_id='1' is 20 times hence its has ranked all the 20 records with succeeding integer value.
By using Rank for each customer with Id as '1' and based on amount the ranking is done, suppose cutomer_id='1' and amount as '9.99' is ranked as '1' the customer_id='1' and amount as '7.99' will be ranked as 2, but in the scenario where customer_id='1' and amount as '5.99' is has 4 different entry then all the 4 entry will be ranked as '3' and the next different record will be ranked as 7 because of the sequece if you count the entery oif next record is on 7th position , so using RANK function it will be ranked like this but ,
Alternatively this can be solved using DENSE RANK is where customer_id='1' and amount as '5.99' is has 4 different entry then all the 4 entry will be ranked as '3' and the next different record will be ranked as '4' , this will not skip the ranking.
So, according to the scenarios you do partitions on multiple columns and the result will vary accordingly. So lets have a look where the scenario is :
How to find the third heigest amount of each customer using windows function:
Here to have to find for every customer what is the third heigest amount , suppose for the customer one from above snap the third heigest amount is '5.99' and for customer with id 2 have third heigest amount as '5.99' similarly for all customers only the third heigest amount should be reflecting .
Using dense rank we can get this output, the below query will represent the same.
select * from (select *, row_number() over (partition by customer_id order by amount desc) as rn,
dense_rank() over (partition by customer_id order by amount desc) as dr,
RANK() over (partition by customer_id order by amount desc) as r
from public.payment) where dr=3
The column 'dr' is fecting the rank having value as 3, that is the third heigest value for each customer , as we have did partition based on customer id.