Showing posts with label RANK(). Show all posts
Showing posts with label RANK(). Show all posts

Windowing Partition Functions (Rank(), Row_Number() and Dense_Rank())

Friday, October 2, 2009 |


Microsoft SQL Server 2005 comes up with fabulous Partitioning functions which work well in SQL Server 2008 too. Just because of ignorance or unawareness of these functions, programmer used to iterate BIG BIG loops in front-end. Without much a boring lecture, let me start over functions now.

Row_Number():  Basically Row_Number() function is used to give number to each row in result set. It plays an important and a crucial role in many difficult situations. One can use it for paging purpose too (I probably will post one SP in future article which you can directly use in your front-end for paging purpose).

--create temp table for demo
IF OBJECT_ID('tempdb..#Employee') is not null DROP TABLE #EMPLOYEE
Create Table #Employee
(
      FirstName varchar(20),
      LastName Varchar(20),
      DepartMent varchar(20)
)


--insert few records
Insert Into #Employee
SELECT 'Ritesh','Shah', 'MIS' UNION ALL
SELECT 'Rajan','Shah','ACCT' UNION ALL
SELECT 'Rajan','Mehta','ACCT' UNION ALL
SELECT 'Alka','Shah','MIS'


--simple Row_Number with Order By First Name, Last Name
--this will not make any partition and simply give row number to every row
SELECT ROW_NUMBER() over(order by FirstName,LastName) as Num,FirstName,LastName,DepartMent
FROM #Employee


--this will make a partition on First Name
--so, every first instance of FirstName will have row number 1
--you can find duplicate records with this way too. :)
SELECT ROW_NUMBER() over(Partition by FirstName order by FirstName,LastName) as Num,FirstName,LastName,DepartMent
FROM #Employee


Rank() and Dense_Rank(): These two functions mainly used to use for giving Rank to each row.  You may use it for finding Toppers based on examination results records set or maybe use it for finding few top vendors based on track records of sales you have etc. There is only one small but technically big difference between Rank() and Dense_Rank() functions which I am going to show you in practical script below which will be easy to evaluate as I am going to show you all possible partitioning function in one T-SQL.

--create temp table for demo
IF OBJECT_ID('tempdb..#SampleOrder') is not null DROP TABLE #SampleOrder
Create Table #SampleOrder
(
      OrderID Int Identity(1,1),
      ClientID int,
      TotalSample int,
      SampleDate datetime
)

--insert few records
Insert Into #SampleOrder
SELECT 1,2,GETDATE()-5 UNION ALL
SELECT 2,5,GETDATE()-8 UNION ALL
SELECT 1,22,GETDATE()-3 UNION ALL
SELECT 3,2,GETDATE()-1 UNION ALL
SELECT 1,2,GETDATE()-5


SELECT *,
            ROW_NUMBER() over(order by TotalSample) as RowNum,
            ROW_NUMBER() over(Partition By ClientID order by TotalSample) as RowNumP,
            Rank() over(order by TotalSample) as Ran,
            Rank() over(Partition By ClientID order by TotalSample) as RanP,
            Dense_Rank() over(order by TotalSample) as DRan,
            Dense_Rank() over(Partition By ClientID order by TotalSample) as DRanP
FROM #SampleOrder


If you will observe output of above T-SQL, you will get to know the difference between Rank and Dense_Rank. There is only difference, if you will get same instance based on partition, rank will give same number all, suppose we get three same instance (same clientID three time with same value) rank will give it, suppose 1 for all three and when next instance come, rank will give it 4 rather than 2. In Dense_Rank, you will get 2, it won’t break the chain.


Reference: Ritesh Shah
http://www.sqlhub.com
Note: Microsoft Books online is a default reference of all articles but examples and explanations prepared by Ritesh Shah, founder of
 http://www.SQLHub.com

Use of RANK() and Partition by clause in SQL-Server 2005

Tuesday, March 3, 2009 |

Use of RANK() and Partition by clause in SQL-Server 2005
We will start by creating one table for demonstration and enter some records in it.
--Create one table
CREATE TABLE BlogCount
(
BloggerName VARCHAR(10),
Topic VARCHAR(15),
[Year] INT,
Total INT
)

--Insert records in above table
INSERT INTO BlogCount VALUES('Ritesh','SQL',2005,10)
INSERT INTO BlogCount VALUES('Ritesh','SQL',2006,17)
INSERT INTO BlogCount VALUES('Ritesh','SQL',2007,124)
INSERT INTO BlogCount VALUES('Ritesh','SQL',2008,124)
INSERT INTO BlogCount VALUES('Ritesh','.NET',2008,24)
INSERT INTO BlogCount VALUES('Alka','SQL',2007,14)
INSERT INTO BlogCount VALUES('Alka','.NET',2007,18)
INSERT INTO BlogCount VALUES('Alka','SQL',2008,14)

Once, you are done with above task, let’s think of usability of RANK() and Partition by clause in our real world. Suppose you wish to get record set of all the blogger with their highest total of blog. What will you do? You may use sub-query or/and GROUP BY clause. Right???? Doesn’t it seems tedious and bit difficult? Here is easy solution for the same.
First we will use Rank() and Partition By clause and then we will filter our record set. Partition is like Group by, we want list sorted based on “BloggerName” so we will be putting it in Partition by clause and Order by clause in below query will give Ranking 1 to highest “Total”.
SELECT BloggerName,Topic,[Year],Total,
Rank() OVER (Partition by BloggerName Order by Total DESC) as 'Ranking'
FROM
BlogCount
After running above query you will get result set like:
BloggerName Topic Year Total Ranking

----------- --------------- ----------- ----------- --------------------
Alka SQL 2007 18 1
Alka .NET 2007 18 1
Alka SQL 2008 14 3
Ritesh SQL 2007 124 1
Ritesh SQL 2008 124 1
Ritesh .NET 2008 24 3
Ritesh SQL 2006 17 4
Ritesh SQL 2005 10 5

(8 row(s) affected)
Now, we may like to see only highest total article by bloggers, we can’t put direct “where condition” in above query so we will wrap it up like:
SELECT *
FROM
(
SELECT BloggerName,Topic,[Year],Total,
Rank() OVER (Partition by BloggerName Order by Total DESC) as 'Ranking'
FROM
BlogCount
) Testing
where Ranking<2
Here is the output:
BloggerName Topic Year Total Ranking
----------- --------------- ----------- ----------- --------------------
Alka SQL 2007 18 1
Alka .NET 2007 18 1
Ritesh SQL 2007 124 1
Ritesh SQL 2008 124 1

(4 row(s) affected)
See above result set, we got only 4 records out of 9, which have the highest “Total”. See “Ritesh’s” record # 3 and #4. Both belongs to SQL and has same total so in that case, I may wish to see only one records of year 2008. In this case, year is a tie breaker. Use below given query for breaking a tie.
SELECT *
FROM
(
SELECT BloggerName,Topic,[Year],Total,
Rank() OVER (Partition by BloggerName Order by Total DESC, [year] DESC) as 'Ranking'
FROM
BlogCount
) Testing
where Ranking<2
Now, you will get following output.
BloggerName Topic Year Total Ranking
----------- --------------- ----------- ----------- --------------------
Alka SQL 2007 18 1
Alka .NET 2007 18 1
Ritesh SQL 2008 124 1

(3 row(s) affected)
How easy is this???? Much much better than using sub-query and group by.
Reference: Ritesh Shah