Introduction
In early November 2005, Microsoft launched three new product lines: Visual Studio 2005, SQL Server 2005, and the .NET Framework version 2.0 (which includes ASP.NET version 2.0). SQL Server 2005 is the newest version of Microsoft's database platform, with the last major SQL Server release dating back to SQL Server 2000. Over the past five years of development, SQL Server has added a vast array of new features and capabilities, which are summed up on Microsoft's website in What's New in SQL Server 2005? The new features that will likely be of particular interest to developers building web-based applications that use SQL Server 2005 as a backend data store, include the new Transact-SQL (T-SQL) enhancements, better Visual Studio integration, integration with the CLR/.NET Framework, and the SQL Server 2005 Management Studio application, which is a slicker version of SQL Server 2000's Enterprise Manager.
SQL Server 2005's T-SQL enhancements make writing certain types of queries significantly easier than in SQL Server 2000. While the new additions to 2005's T-SQL syntax could be replicated either in SQL Server 2000 or in the application layer, with 2005 the syntax is much more terse, readable, and sensible. In this article we'll look at SQL Server 2005's rank-related functions, which simplify the process of assigning ranks to query results, retrieve results with a particular rank, and other rank-related tasks common to reporting. Read on to learn more!
Before we look at how to accomplish these common query patterns, let's first create a data model against which we can run these queries. I am going to be using the free Express version of SQL Server 2005 for my demos, and have included the database and an ASP.NET 2.0 website for download at the end of this article. (Like Visual Studio, SQL Server ships with a number of different "versions." The Express version is a free version that is designed for hobbyists, students, and the like. If you download and install Visual Web Developer - the express version of Visual Studio geared toward web developers - you can indicate to install SQL Server 2005 Express edition as well.)
For these examples we'll use a database that has products, salespersons (employees), customers, and orders. This is modeled using five tables: Customers
, Employees
, Products
, Orders
, and OrderItems
. The Customers
, Employees
, and Products
tables contain a row for each customer, employee, and product, respectively. Each time a customer makes a purchase, a record is added to the Orders
table indicating the customer that made the purchase, the employee that made the sale, and the date the order was placed. The OrderItems
maps each product purchased for an order, the quantity of product, and the total price billed (assuming that larger purchases may be discounted). The following figure shows these tables, their fields, and the relationships among them.
As the diagram illustrates, the OrderItems
establishes a many-to-many join between the Orders
and Products
tables.
When building reports or analyzing data, often users or management are interested in seeing the data ranked in some manner. For example, your boss may want to have a report that shows the top 10 selling items, or the three people in the sales team who have generated the greatest revenue in the third quarter. More complicated examples might be returning just the 3rd through 5th ranked salespersons. With SQL Server 2000 queries returning the top ranking items could be accomplished using the TOP
or ROWCOUNT
keywords, as discussed in Retrieving the Top N Records from a SQL Query. To retrieve a particular ranked subset, you needed to use a derived tables- or view-based approach.
SQL Server 2005 provides four new ranking functions: ROW_NUMBER
, RANK
, DENSE_RANK
, and NTILE
. While these are definitely a step above what SQL Server 2000 offers, the ranking functions still have some limitations that require the use of dervied tables or views for more interesting applications. Let's look at each of these individually.
Numbering Rows with ROW_NUMBER
The ROW_NUMBER
function assigns an ordinal value with each record returned, with the ordinal values depending on a particular ORDER BY
clause used in tandem with the function. The syntax for ROW_NUMBER
is: ROW_NUMBER() OVER([partition] order-by-clause)
. For example, the following query would return the products from most expensive to least expensive, with a rank assigned to each product based on price:
|
The results would be:
ProductID | Name | Price | PriceRank |
8 | Desk | 495.0000 | 1 |
10 | Executive Chair | 295.0000 | 2 |
9 | Chair | 125.0000 | 3 |
5 | Mouse | 14.9500 | 4 |
6 | Mousepad | 9.9900 | 5 |
11 | Scissors | 8.5000 | 6 |
4 | Stapler | 7.9500 | 7 |
3 | Binder | 1.9500 | 8 |
... |
By default, the ROW_NUMBER
function assigns a increasing value to each record in the resultset. With the optional partition parameter, you can instruct the ROW_NUMBER
function to start renumbering whenever there's a change in the value of the column for partitioning. To demonstrate this, I created a view called vwTotalAmountBilledPerOrder
that returns each OrderID
and the total amount billed for that order using the following query syntax:
SELECT OrderID, SUM(AmountBilled) AS TotalOrderAmount |
This returns each unique order in the OrderItems
table along with the sum of AmountBilled
values pertaining to that order. With this view we can use the ROW_NUMBER
method to rank those orders by most amount of money spent using the following:
|
This will return results like:
Name | DateOrdered | TotalOrderAmount | BestCustomer |
Bob | 12/1/2005 | 12649.9900 | 1 |
Darren | 1/2/2006 | 620.0000 | 2 |
Bob | 12/19/2005 | 265.8500 | 3 |
Tito | 12/22/2005 | 14.9500 | 4 |
Bruce | 1/5/2006 | 14.9500 | 5 |
Tito | 12/18/2005 | 12.4400 | 6 |
Bruce | 1/4/2006 | 9.9900 | 7 |
Lee Ann | 1/3/2006 | 8.5000 | 8 |
... |
Notice that certain customers are on this list multiple times (Bob, Tito, and Bruce). Perhaps rather than seeing all orders, ordered by sales amount, we are interested in seeing the top orders per customer. We can accomplish this with the PARTITION BY
clause in the ROW_NUMBER
function like so:
|
This will return the following results:
Name | DateOrdered | TotalOrderAmount | BestCustomer |
Bob | 12/1/2005 | 12649.9900 | 1 |
Bob | 12/19/2005 | 265.8500 | 2 |
Tito | 12/22/2005 | 14.9500 | 1 |
Tito | 12/18/2005 | 12.4400 | 2 |
Darren | 1/2/2006 | 620.0000 | 1 |
Bruce | 1/5/2006 | 14.9500 | 1 |
Bruce | 1/4/2006 | 9.9900 | 2 |
Lee Ann | 1/3/2006 | 8.5000 | 1 |
... |
While these results are definitely cool, the bad news is that you can't use the ROW_NUMBER()
(or any of the other ranking functions) in the WHERE
clause. That is, you might want to say, "Give me the products ranked by price whose rank is between 5 and 8." To accomplish this you'd need to use a derived table or view. For example, you could put the above query in a view called vwPriceRankedProducts
and then get back just the 5th through 8th ranked products use the query:
|
Handling Ties with RANK
and DENSE_RANK
The ROW_NUMBER
function blindly increments the value for each returned result based on the optional partition clause and the required order by clause. However, sometimes you may wish to handle ties differently, instead assigning the same value to ties. For example, earlier when displaying the list of total order amounts the amount of the order placed by Tito on 12/22/2005 is the same as the amount of the order placed by Bruce on 1/5/2006. However, the ROW_NUMBER
function ranks these two rows as #4 and #5, rather than using #4 for both.
Both the RANK
and DENSE_RANK
functions will number ties with the same ranking. For example, using the following query:
|
The results will be:
Name | DateOrdered | TotalOrderAmount | BestCustomer |
Bob | 12/1/2005 | 12649.9900 | 1 |
Darren | 1/2/2006 | 620.0000 | 2 |
Bob | 12/19/2005 | 265.8500 | 3 |
Tito | 12/22/2005 | 14.9500 | 4 |
Bruce | 1/5/2006 | 14.9500 | 4 |
Tito | 12/18/2005 | 12.4400 | 6 |
Bruce | 1/4/2006 | 9.9900 | 7 |
Lee Ann | 1/3/2006 | 8.5000 | 8 |
... |
Notice how the two orders with the same amount are both ranked #4. The difference between RANK
and DENSE_RANK
is in how they begin renumbering after the tied results. RANK
bypasses as many numbers as there are tied results. In our example above, since there are two results tied at #4, following that rank #5 is skipped and the ranking resumes with #6. DENSE_RANK
, on the other hand, picks up with the next number. Had we replaced the function name RANK
with DENSE_RANK
in the above query, Tito's order on 12/18/2005 for $12.44 would have the rank #5.
Like ROW_NUMBER
both RANK
and DENSE_RANK
can use the optional PARTITION BY
clause.
Grouping Ranked Results with NTILE
The final ranking-related T-SQL function new to SQL Server 2005 is NTILE(int)
. NTILE
operates like the other ranking functions, but breaks down the results into int number of groups, assigning the same rankings for all records in the specified group. You can use NTILE
to break results down into halves, thirds, quarters, and so on, as this example shows:
SELECT ProductID, Name, Price, NTILE(4) OVER (ORDER BY Price DESC) as Quartile |
With the results being:
ProductID | Name | Price | Quartile |
8 | Desk | 495.0000 | 1 |
10 | Executive Chair | 295.0000 | 1 |
9 | Chair | 125.0000 | 2 |
5 | Mouse | 14.9500 | 2 |
6 | Mousepad | 9.9900 | 3 |
11 | Scissors | 8.5000 | 3 |
4 | Stapler | 7.9500 | 4 |
3 | Binder | 1.9500 | 4 |
In the downloadable demo I have an example that uses a view with NTILE
to allow the user to selectively view various quartiles of the highest order amounts.
Conclusion
In this article we examined four functions new to SQL Server 2005. These four functions - ROW_NUMBER
, RANK
, DENSE_RANK
, and NTILE
- make it easier to rank results than was previously available with SQL Server 2000. These functions are but a small sample of the new T-SQL features available with SQL Server 2005. Future articles will highlight other useful facets!