I've been working with a number of clients recently who all have suffered at the hands of TSQL Scalar functions. Scalar functions were introduced in SQL 2000 as a means to wrap logic so we benefit from code reuse and simplify our queries. Who would be daft enough not to think this was a good idea. I for one jumped on this initially thinking it was a great thing to do.
However as you might have gathered from the title scalar functions aren't the nice friend you may think they are.
If you are running queries across large tables then this may explain why you are getting poor performance.
In this post we will look at a simple padding function, we will be creating large volumes to emphasize the issue with scalar udfs.
create function PadLeft(@val varchar(100), @len int, @char char(1))
returns varchar(100)
as
begin
return right(replicate(@char,@len) + @val, @len)
end
go
Interpreted
Scalar functions are interpreted code that means EVERY call to the function results in your code being interpreted. That means overhead for processing your function is proportional to the number of rows.
Running this code you will see that the native system calls take considerable less time than the UDF calls. On my machine it takes 2614 ms for the system calls and 38758ms for the UDF. Thats a 19x increase.
set statistics time on
go
select max(right(replicate('0',100) + o.name + c.name, 100))
from msdb.sys.columns o
cross join msdb.sys.columns c
select max(dbo.PadLeft(o.name + c.name, 100,'0'))
from msdb.sys.columns o
cross join msdb.sys.columns c
If you run the last one again but with half the rows i.e. as follows the time is halved. This highlights how linear the performance is in relation to the number of rows
select max(dbo.PadLeft(o.name + c.name, 100,'0')), count(1)
from (select top 50 percent * from msdb.sys.columns )o
cross join msdb.sys.columns c
Parallel
Scalar functions are calculated on a single thread. This means that if you move to a multi core machine your performance will not be increased. This is shown by running against a query that results in parallel operators.
Create a very large table and populate it.
create table veryLargeTable (col1 bigint, col2 bigint)--varchar(100))
go
declare @i int
set @i = 0
while @i < 20
begin
insert into veryLargeTable
select object_id, object_id
from sys.columns
set @i = @i + 1
end
Then they perform the following queries (You need to do this on a multi core machine),
set statistics time on
go
select max(right(replicate('0',100) + cast(v1.col1+ v2.col2 as varchar(100)), 100))
from veryLargeTable v1
join veryLargeTable v2 on v2.col1 = v1.col2
go
select max(dbo.PadLeft(v1.col1+v2.col2, 100,'0'))
from veryLargeTable v1
join veryLargeTable v2 on v2.col2 = v1.col1
You will see in the duration that the system function version takes x amount of time, but it uses ~2X time in CPU. On my machine I get elapsed time of 3247ms and CPU time of 6094 ms. That highlights that the query was able to work in parallel. Whereas the scalar UDF results in an elapsed time of 27041ms and an CPU time of 26000ms, showing no parallelism.
Reduction of CTE and views
Common table expressions are largely syntactic constructs that are merged into the main query, i.e. they perform like a view and not like a physical set of data. However the use of scalar functions in CTEs can cause undesired behaviour. In the worst case, a query might only return a few rows and you may believe that the CTE is only evaluated a few times and thus your function only called a few times, however due the optimiser might choose a query plan which results in the CTE being evaluated for every row in the source dataset resulting in your function being called many more times than you expect. This in conjunction with the issues above results in very very poor performance.
Profiler
If you've used scalar functions and tried to perform statement level profiling you will have hit this last problem. Because each line in a scalar function is considered a statement then EVERY time the function is evaluated, EACH line in the function is recorded in profiler.
This firstly means you can't generally find the statements you are really concerned with, but of more a concern is that the performance hit of capturing ALL these statements if very very large.
To test this set up profiler to record SP:StmtCompleted and run the samples above, you will soon be overwhelmed with data.
There is a workaround for this which reduces the impact but it only reduces the impact. The workaround is to filter out statements where the statement is for an object of type "Scalar function"
ObjectType <> 20038
Options
So what are your options,
1. You either need to put your code in line, but then you lose the benefits of code reuse.
2. Write a CLR function
3. Write a table valued function and use a subquery.
Option | Code Reuse | Performance | Extensible | Parallelism |
System functions | No | Very High | Limited (single line) | Yes |
CLR function | Yes | High | Yes | Yes |
Table Valued function | Yes | High | Limited (single line) | Yes |
Scalar function | Yes | Low | Yes | No |
The Table Valued Function option is the one that is not commonly known, but is the best option if you can convert your function into a single line of system function calls and don't want to go to CLR. What you do is write a Table Valued function.
So how does that actually work. You create your table valued function to return 1 row. In your query you then use a subquery to get the value from the table valued function.
create function PadLeftTVF(@val varchar(100), @len int, @char char(1))
returns table
as
return (select right(replicate(@char,@len) + @val, @len) val)
go
select max(val)
from (select (select * from PadLeftTVF(o.name + c.name, 100,'0')) val
from msdb.sys.columns o
cross join msdb.sys.columns c) d
select max(val)
from (select (select val from PadLeftTVF(v1.col1+ v2.col2 , 100,'0')) val
from veryLargeTable v1
join veryLargeTable v2 on v2.col1 = v1.col2 ) d
Note: The derived table is used because you can't use a subquery in an aggregate. But if you do it via a derived table it works. Why I have no idea and it seems odd that the optimiser can't do this for me. Anyway the Max is so we only return 1 row from the query and so our timing isn't affected by the time it takes to return and render the data.
Whilst this looks very complicated this benefits from the fact that the optimiser can effectively consume the system calls in line, because it can figure out there is only going to be own row and column returned. You can see this by looking at the properties of the streaming aggregate. The table valued function will have an output of something like where the function has been totally reduced to system calls.
MAX(right('0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'+CONVERT_IMPLICIT(varchar(100),[msdb].[sys].[syscolpars].[name]+[msdb].[sys].[syscolpars].[name],0),(100)))
Where as the scalar one has the function call to PadLeft.
MAX([tempdb].[dbo].[PadLeft](CONVERT_IMPLICIT(varchar(100),[msdb].[sys].[syscolpars].[name]+[msdb].[sys].[syscolpars].[name],0),(100),'0')))
Use of Table Valued functions means that using this method you can benefit from parallelism because the function has been reduced to system function calls and so your query will scale better when you go to a multi core machine. This is highlighted if you run the last query above on a multi core machine.
Summary
If you aren't into writing C# code and you want ultimate performance and code reuse then write your scalar udfs as table valued functions and use a subquery
Other References
-
Source Click Here.
Post a Comment