SQL Server Thread: Geek City: Using a Nonclustered Index to Avoid a Sort

Most of you are probably aware that having a clustered index on the column(s) in an ORDER BY clause means that SQL Server can avoid having to sort your data, because it is already logically stored in order of the clustered index, and SQL Server can just access the data in order to get the sorted data .

For example, consider the SalesOrderHeader table in the AdventureWorks database. The clustered index is on SalesOrderID, so this query doesn't need to do a sort, just a clustered index scan:

SELECT * FROM Sales.SalesOrderHeader
ORDER BY SalesOrderID

But what about a nonclustered index? Its leaf level stores the index key values in order, so it can help avoid a sort if it completely covers the query, i.e. all the data your query needs is in the nonclustered index. The following query is covered by the nonclustered index on CustomerID, because the nonclustered index always includes the clustered key, in this case, SalesOrderID.

SELECT CustomerID, SalesOrderID
FROM Sales.SalesOrderHeader
ORDER BY CustomerID

But what about if the query is not covered? What if we wanted every column returned:

SELECT * FROM Sales.SalesOrderHeader
ORDER BY CustomerID

The default plan for this query will be to perform a sort.

However, the leaf level of the nonclustered index on CustomerID does have all the CustomerID values already sorted, so why can't that index be used? The answer is, it CAN be used, but it just isn't the default. SQL Server's optimizer tries to find the plan that will run to completion in the least amount of time. That sounds good, right? But with the SORT operator in the plan, everything stops while the sorting is taking place, and no data can be returned until all the data is sorted.

However, another alternative would be to scan the nonclustered index, where the CustomerID values are already in order. For each row, SQL Server would have to do a key lookup into the clustered index and the total time to do a key lookup for every row would probably be more than the time required to sort all the data. The first few rows can be returned very quickly. How can we get such a plan? SQL Server provides us with a hint called FASTFIRSTROW that tells the optimizer to come up with a plan that returns the first row in a minimum amount of time. It's a table hint, so it looks like this:

SELECT * FROM Sales.SalesOrderHeader WITH (FASTFIRSTROW)
ORDER BY CustomerID

The plan looks like this:

So you have to decide what is most important to you. Do you want the total processing time to be minimized, or do you want the time to have the first row returned to be minimized? It's up to you. The default is to minimize total processing time, but you can use the FASTFIRSTROW hint if you want to take advantage of the nonclustered index to avoid the sort, and have the first few rows returned quickly.

Be careful if you try to do a cost comparison of queries with and without the FASTFIRSTROW hint. Look at the two plans below:

Comparing the two plans makes it look like the query with the hint is infinitely faster the query without the hint. However, if you look at the details for the nonclustered index scan, as shown in the pop-up properties box, you can see that the way the optimizer comes up with this plan is by assuming only one row will be returned. It optimizes as if only one row will be accessed, which is why the nonclustered index is chosen, but during execution it will retrieve all the rows. So of course a plan for accessing one row will be considered MUCH faster than a plan to access the entire table (31465 rows).

The documentation for this hint can be a little misleading. You might think using the hint indicates that the plan should get the first row quickly, and then get all the rest of rows using perhaps a different access method, so they will ALL come back as quickly as possible (which would then mean a sort), it actually means that the optimizer should just come up with a plan for getting the first row as quickly as possible. Period. (Then whatever plan was chosen for the first row will also be used for all the rest of the rows.)

This FASTFIRSTROW hint is listed in the SQL Server Books Online as a deprecated feature, which means it may be removed in a future version. However, it is still available in SQL Server 2008. Microsoft recommends using the FAST N query hint (in the OPTION clause):

SELECT * FROM Sales.SalesOrderHeader
ORDER BY CustomerID
OPTION (FAST 1)

With this hint, you can specify an value for N, and the optimizer just assumes there are N rows, and comes up with the best plan as if there were that number. As an exercise, you might want to try determining at what value for N the optimizer will switch from using an nonclustered index scan to using a clustered scan plus SORT.

Today, a reader asked me a question about the FASTFIRSTROW hint and wanted to know if we should "use such a hint for large data sets [e.g. OLTP queries]".

Of course, the answer is the usual: It depends.

On the one hand, using this hint to avoid a sort can be a good thing, because sorting a large data set can use a lot of system resources (time and tempdb). But on the other hand, if you're really running OLTP queries, there should only be a few rows you're dealing with in any query, and then the value of this hint might not be as noticeable. So you should run your own tests and see if you like the results. Note that FAST N or FASTFIRSTROW is not the default, and that is probably for a good reason. Try running your queries first with whatever plan the optimizer comes us with, and only if you're not satisfied with the performance, you can try using a hint.

Have fun!

~Kalen

Source Click Here.

Geek City: Using a Nonclustered Index to Avoid a Sort

Blog Archive