One of the things I have spent a lot of time investigating recently is slow running queries, why are some of the queries in the data warehouse slow? They are relivtivly simple, return only a few rows each but take 5-10 minutes to execute. Often they have a function which for example gets the start of the working week for the date passed.
I set about investigating what was causing this, and it turns out having UDF (User Defined Functions) inside your views is a really bad idea, it forces the view to run serially (single thread)
Brent Ozar has a really good article on this, but I wanted to investigate myself and write up my findings.
Lets see this in action!!! For this I am going to use the StackOverflow database
So we are going to need a function, I have created a basic on that takes a date and then gives me the day before the date passed in
CREATE FUNCTION dbo.fn_GetDate
(
@Date DATETIME
)
RETURNS DATETIME
AS
BEGIN
DECLARE @Out DATETIME
SELECT @Out = DATEADD(dd,-1,@Date)
RETURN @Out
END
Next we need a view
CREATE VIEW dbo.LetsGoSerial
AS
SELECT
*,
dbo.fn_GetDate(u.CreationDate) as Yesterday
FROM
dbo.Users u
Now let’s select the data back out of that view we just created
SELECT
id,
CreationDate,
Yesterday
FROM
dbo.LetsGoSerial
In the query results window you will see that 8,917,507 rows were returned, using BlitzCache lets dive a bit deeper and see what actually happened when that query ran
EXEC testing.dbo.sp_BlitzCache @DatabaseName = 'StackOverflow', @ExportToExcel = 1
You can see from the results
That the function was actually executed for every single row returned even though we only executed the call for the view once, additionally that query was sent serialised.
It is also good to note that it doesn’t matter if the query includes the function or not in the output, if it is in the view the same results as above will happen.
Here is the execution plan for the query, the function took up almost 20 seconds of the run time of the query;
While this might not seem like a big deal on this dataset, running something like this one a complex dataset with many joins or a much large rowset is going to impede performance massively.
My advice, if possible, don’t use the functions at all in your queries!