Using RANK() and CTE

by Allan Svelmøe Hansen June 05, 2008 19:53
Despite having used SQL Server for quite a while I’ve not really had much practical usage for many of the new features. Lately however I’ve started to use Common Table Expressions (CTE) much more and I’m absolutely starting to love them. Another feature I am starting to love are the ranking functions, RANK().

Basically speaking CTEs are subqueries which you can reuse, so practically they remind me very much of temporary tables and similar techniques. But they not only provide vastly better clarity (in my view), but they can also perform better. So I’m going to be turning more attention towards CTEs and their usage and hopefully some performance checks in this and hopefully future blog posts.
RANK() is a function which enables you to – well rank – groups (aka partitions) of data, providing you with a powerful tool for solving problems. It is a good syntax to have in your problem solving toolbox.

The problem I was faced with at work was that we had some price data which we needed to clean up.
We had a number of products which had a number of prices. And out of these combinations we only wanted to keep the latest ones. So utilizing some of the new things in SQL Server 2005 we were able to solve it. For this problem we used RANK and CTE’s. Granted we could have solved it without using CTE’s but it was here I started seeing the benefits of using these common table expressions, and the solution turned out to be elegant thanks to these techniques.

We had a table which simplified consisted of a PriceID (identity), a price value and a product id foreign key.
Design of Pricetable


Because we only wanted to keep the “newest” Price we knew that we only wanted the highest PriceID from each “group” of Products.
Then I started to look at the GROUP BY clause on ProductID, but it left me with a problem of getting the highest PriceID for each group, so I could delete the rows from Price which weren’t selected.
I then remembered the RANK function, which I’d noticed a few times, and the pieces of the puzzle simply clicked together.

What I did was selecting the RANK on the partition of ProductID (basically, you group the RANKs on the partition). And because I wanted to keep the highest PriceID I ordered by that field descending. Then I also selected the ProductID and the PriceID in my query. Like this:

SELECT
 
ProductIDPriceID,
 
RANK() OVER (PARTITION BY ProductID ORDER BY PriceID DESCAS R
FROM Price 
    

Now this gave me a result akin to this (PriceID, ProductID, R)
Sample result from query

You’ll see that I have now got each ProductID, PriceID with a grouped rank in my resultset. And then it is merely a matter of selecting the rows which have a rank higher then 1.
For this I used the CTE syntax like this:

WITH priceCTE AS (
 
SELECT
  
ProductIDPriceID,
  
RANK() OVER (PARTITION BY ProductID ORDER BY PriceID DESCAS R
 
FROM Price    
)
SELECT FROM priceCTE WHERE 1
GO
 

As can be seen the CTE is declare using the WITH <NAME> AS syntax. Then the contens of the parenthises provides the subquery which you then can use after the paranthises.
Then it was just chancing the SELECT to a DELETE and I had achieved my goal. Simple and clean, and most of all – easy to read again in a month or two, incase I’m faced with a similar problem.

Then as always I was interested in how this performed against a temporary table variable so I simply made a dublicate of this query but instead of a CTE, I placed the subquery into a table variable, and ran the same.

DECLARE @T AS TABLE(ProductID INTPriceID INTINT)
INSERT INTO @T
SELECT
 
ProductIDPriceID,
 
RANK() OVER (PARTITION BY ProductID ORDER BY PriceID DESCAS R
FROM Price

SELECT FROM @T WHERE 1
GO
 

Then I took a look at the estimated query plans, and it listed the CTE one as using 47% of the batch and the table variable using 53%. So there is a performance benefit which I’ll look more into at a later point, although not major one in this case.
Bookmark and Share DotnetKicks dotnetshoutout

Powered by BlogEngine.NET 1.6.1.0
Theme by Mads Kristensen | Modified by Mooglegiant

About:
Allan Svelmøe Hansen

My real name is Allan Svelmøe Hansen.
I live in Denmark, where I work as a developer for hedal:kruse:brohus using SQL Server and the .NET framework since 2004.
My primary fields of expertise is back-end data integration, database design and optimization.


       View Allan Svelmøe Hansen's profile on LinkedIn     

Disclaimer

The opinions expressed herein are my own personal opinions and thoughts and does not represent my employers view in any way, nor are my results guaranteed for all situations.
Content is presented "as is", with no warranty.