SQL Server and Table-Valued User-Defined Function optimizations - sql-server-2005

If I have a UDF that returns a table with thousands of rows, but I just want a particular row from that rowset, will SQL Server be able to handle this efficiently?
DECLARE @pID int; --...
SELECT * FROM dbo.MyTableUDF(@pID)
WHERE SomeColumn BETWEEN 1 AND 2 OR SomeOtherColumn LIKE 'this_pattern%'
To what extent is the query optimizer capable of reasoning about this type of query?
How are Table-Valued UDFs different from traditional views if they take no parameters?
Any gotchas I should know about?

Wouldn't you pass in the ID that you require as a parameter rather than query the entire table?
Something like this:
-- Multi-statement TVF: matching rows are materialized into @myTable
CREATE FUNCTION dbo.MyTableUDF(@ID int)
RETURNS @myTable TABLE
(
ID int PRIMARY KEY NOT NULL,
FirstName nvarchar(50) NULL,
LastName nvarchar(50) NULL
)
as begin
Insert Into @myTable (ID, FirstName, LastName)
Select ID, FirstName, LastName
From Users
Where ID = @ID
return
end
go
Select * From dbo.MyTableUDF(1)
For this scenario, it would be a far better approach.
EDIT:
Ok, well, as you're using a table UDF rather than a view, I will assume that it is a multi-statement table UDF rather than an inline one. I am pretty sure that the performance won't be affected by using the UDF in this way.
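For comparison, here is a minimal sketch of an inline version of the same function. It assumes the dbo.Users table from the answer (created here with two hypothetical rows); the name dbo.MyInlineUDF and its @MinID parameter are made up for illustration. Because an inline TVF is a single RETURN of one SELECT, the optimizer can expand it into the calling query much like a view, so an outer WHERE clause can be evaluated against dbo.Users rather than against a materialized table variable.

```sql
-- Hypothetical table matching the columns used in the answer above
CREATE TABLE dbo.Users
(
    ID        int PRIMARY KEY NOT NULL,
    FirstName nvarchar(50) NULL,
    LastName  nvarchar(50) NULL
);
INSERT INTO dbo.Users (ID, FirstName, LastName) VALUES (1, N'Ada', N'Lovelace');
INSERT INTO dbo.Users (ID, FirstName, LastName) VALUES (2, N'Alan', N'Turing');
GO
-- Inline TVF: no BEGIN/END and no table variable, just one SELECT,
-- so the optimizer can merge this definition into the outer query
CREATE FUNCTION dbo.MyInlineUDF (@MinID int)
RETURNS TABLE
AS
RETURN
(
    SELECT ID, FirstName, LastName
    FROM dbo.Users
    WHERE ID >= @MinID
);
GO
-- The outer predicate on LastName is combined with the function's own WHERE
SELECT *
FROM dbo.MyInlineUDF(1)
WHERE LastName LIKE N'T%';
```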
The performance will really be hit if you use a (scalar) UDF in the SELECT list or the WHERE clause. This is because the UDF will be called once for each row returned from the table.
e.g Select col1, col2, dbo.MyUDF(col3) From MyTable
or
Select col1, col2 from dbo.MyTable Where dbo.MyUDF(col3) != 1
So if your MyTable contained 100,000 rows, your UDF would be called 100,000 times. If the UDF takes 5 seconds to execute, that is where you will run into issues.
As far as I can make out, you don't intend to use the UDF in this manner.
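A common way around the per-row cost described above is to rewrite the scalar logic as an inline TVF and call it with CROSS APPLY, which lets the optimizer fold the expression into the main plan instead of invoking a function once per row. This is a sketch only: the doubling logic inside dbo.MyUDF, the sample dbo.MyTable rows and the name dbo.MyUDF_Inline are hypothetical stand-ins. (SQL Server 2019 can also inline some scalar UDFs automatically, but that is not available in the versions discussed here.)

```sql
-- Hypothetical sample table
CREATE TABLE dbo.MyTable (col1 int, col2 int, col3 int);
INSERT INTO dbo.MyTable (col1, col2, col3) VALUES (1, 10, 3);
INSERT INTO dbo.MyTable (col1, col2, col3) VALUES (2, 20, 4);
GO
-- Scalar UDF: invoked once per row when used in SELECT or WHERE
CREATE FUNCTION dbo.MyUDF (@x int)
RETURNS int
AS
BEGIN
    RETURN @x * 2;
END;
GO
-- Same logic as an inline TVF returning one row with one column
CREATE FUNCTION dbo.MyUDF_Inline (@x int)
RETURNS TABLE
AS
RETURN (SELECT @x * 2 AS Result);
GO
-- Per-row scalar call
SELECT col1, col2, dbo.MyUDF(col3) AS Result
FROM dbo.MyTable;

-- Inline version: CROSS APPLY lets the optimizer expand the expression into the plan
SELECT t.col1, t.col2, f.Result
FROM dbo.MyTable AS t
CROSS APPLY dbo.MyUDF_Inline(t.col3) AS f;
```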

Related

Nested Loop in Where Statement killing performance

I am having serious performance issues when using a nested loop in a WHERE clause.
When I run the below code as is, it takes several minutes. The trick is I'm using the WHERE clause to pull ALL data if the report_id is NULL, but only certain report_id's if I set them in the parameter string.
The function [fn_Parse_List] turns a VARCHAR string such as '123,456,789' into a table where each row is each number in integer form, which is then used in the IN clause.
When I run the code below with report_id = '456' (the commented-out line), the code takes seconds, but passing the temporary table and using the SELECT statement in the WHERE clause kills it.
alter procedure dbo.p_revenue
(@report_id varchar(max) = NULL)
as
select cast(value as int) Report_ID
into #report_ID_Temp
from [fn_Parse_List] (@report_id)
SELECT *
FROM BIGTABLE a
where @report_id is null
or a.report_id in (select Report_ID from #report_ID_Temp)
--Where @report_id is null or a.report_id in (456)
go
exec p_revenue @report_id = '456'
Is there a way to optimize this? I tried a JOIN with the table #report_ID_Temp, but it still takes just as long and doesn't work when the report_id is NULL.
You're breaking three different rules.
If you want two query plans, you need two queries: OR does not give you two query plans. IF does.
If you have a temporary table, make sure it has a primary key and any appropriate indexes. In your case, you need an ALTER TABLE statement to add the primary key clustered index. Or you can CREATE TABLE to declare the structure in the first place.
If you think fn_Parse_List is a good idea, you haven't read enough Sommarskog.
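Applying the first two rules to the question's procedure might look like the sketch below: one IF branch per plan, and a temp table with a primary key. It assumes a hypothetical dbo.BIGTABLE with an index on report_id and uses SQL Server 2008 syntax. In place of fn_Parse_List it uses an XML-based split, which only handles a comma-separated list of integers with no spaces. The procedure name dbo.p_revenue_split is made up.

```sql
-- Hypothetical stand-in for the question's BIGTABLE
CREATE TABLE dbo.BIGTABLE (report_id int NOT NULL, amount money NULL);
CREATE INDEX IX_BIGTABLE_report_id ON dbo.BIGTABLE (report_id);
INSERT INTO dbo.BIGTABLE (report_id, amount) VALUES (123, 10), (456, 20), (789, 30);
GO
CREATE PROCEDURE dbo.p_revenue_split
    @report_id varchar(max) = NULL
AS
SET NOCOUNT ON;

IF @report_id IS NULL
BEGIN
    -- Query 1: no filter, so scanning BIGTABLE is the natural plan
    SELECT a.* FROM dbo.BIGTABLE AS a;
    RETURN;
END;

-- Temp table with a primary key on the join column
CREATE TABLE #report_ID_Temp (Report_ID int NOT NULL PRIMARY KEY);

-- XML-based split of a list such as '123,456' (integers only, no spaces)
DECLARE @xml xml = CAST('<i>' + REPLACE(@report_id, ',', '</i><i>') + '</i>' AS xml);
INSERT INTO #report_ID_Temp (Report_ID)
SELECT DISTINCT x.n.value('.', 'int')
FROM @xml.nodes('/i') AS x(n);

-- Query 2: filtered, so the optimizer can choose a seek on IX_BIGTABLE_report_id
SELECT a.*
FROM dbo.BIGTABLE AS a
INNER JOIN #report_ID_Temp AS t ON t.Report_ID = a.report_id;
GO
EXEC dbo.p_revenue_split;                      -- all rows
EXEC dbo.p_revenue_split @report_id = '456';   -- only report 456
```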
If I were to write the stored procedure for your case, I would use a Table-Valued Parameter (TVP, available from SQL Server 2008) instead of passing multiple values as a comma-separated string.
Something like the following:
-- Create a type for the TVP
CREATE TYPE REPORT_IDS_PAR AS TABLE(
report_id INT
);
GO
-- Use the TVP type instead of VARCHAR
CREATE PROCEDURE dbo.revenue
@report_ids REPORT_IDS_PAR READONLY
AS
BEGIN
SET NOCOUNT ON;
IF NOT EXISTS(SELECT 1 FROM @report_ids)
SELECT
*
FROM
BIGTABLE;
ELSE
SELECT
*
FROM
@report_ids AS ids
INNER JOIN BIGTABLE AS bt ON
bt.report_id=ids.report_id
-- OPTION(RECOMPILE) -- see remark below
;
END
GO
-- Execute the Stored Procedure
DECLARE @ids REPORT_IDS_PAR;
-- Empty table for all rows:
EXEC dbo.revenue @ids;
-- Specific report_id's for specific rows:
INSERT INTO @ids(report_id)VALUES(123),(456),(789);
EXEC dbo.revenue @ids;
GO
If you run this procedure with a TVP with a lot of rows or a wildly varying number of rows, I suggest you add the option OPTION(RECOMPILE) to the query.
I see 2 possible things that could help improve performance, depending on which part is taking the longest. First, SELECT INTO is a single-threaded operation before SQL Server 2014. If this is taking a long time, create an explicitly defined temp table with CREATE TABLE. Second, depending on the number of records inserted into the temp table, you probably need an index on the Report_ID column. That can all be done in the body of the stored procedure. If you do end up using an explicitly defined temp table, I would create the index after the data is loaded.
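A minimal sketch of the explicitly defined temp table, with the index created after the load. The literal values stand in for the output of fn_Parse_List, and the index name IX_report_ID_Temp is hypothetical.

```sql
-- Explicitly defined temp table instead of SELECT ... INTO
CREATE TABLE #report_ID_Temp (Report_ID int NOT NULL);

-- Load first (literal values stand in for fn_Parse_List output)
INSERT INTO #report_ID_Temp (Report_ID)
SELECT 123 UNION ALL SELECT 456 UNION ALL SELECT 789;

-- Then build the index over the data that is already there
CREATE CLUSTERED INDEX IX_report_ID_Temp ON #report_ID_Temp (Report_ID);
```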
If that doesn't help, first check that the report_id column on the BIGTABLE is indexed. Then try splitting the select into 2 and combining with a UNION ALL like this:
ALTER PROCEDURE dbo.p_revenue
(
@report_id VARCHAR(MAX) = NULL
)
AS
SELECT CAST(value AS INT) Report_ID
INTO #report_ID_Temp
FROM fn_Parse_List(@report_id);
SELECT *
FROM BIGTABLE
WHERE @report_id IS NULL
UNION ALL
SELECT *
FROM BIGTABLE AS a
WHERE a.report_id IN ( SELECT Report_ID
FROM #report_ID_Temp );
GO
EXEC p_revenue @report_id = '456';
Are you saying I should have two queries, one where it pulls if the report_id doesn't exists and one where there is a list of report_ids?
Yes, yes, yes. The fact that it somehow works when you enter the numbers directly distracts you from the core problem. You need a table scan when @report_id is null and an index seek when it is not, and you cannot have both in one execution plan. With a single plan, performance would inevitably suffer one way or the other.
I would prefer not to, as the table I'm pulling from is actually a view with 800 lines with an additional parameter not shown above.
I do not see where the problem is: SELECT * FROM BIGTABLE and SELECT * FROM BIGVIEW seem the same. If you need parameters, you can use an inline table-valued function. If you have more parameters with variable selectivity like @report_id, I guess you would end up with dynamic SQL anyway, sooner or later.
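As an illustration of the inline table-valued function suggestion, the sketch below defines the view logic once and then calls it from two separate queries. dbo.RevenueFacts stands in for the 800-line view, and the @region parameter is a hypothetical placeholder for the "additional parameter not shown above".

```sql
-- Hypothetical table standing in for the 800-line view
CREATE TABLE dbo.RevenueFacts (report_id int NOT NULL, region char(2) NOT NULL, amount money NULL);
INSERT INTO dbo.RevenueFacts (report_id, region, amount)
VALUES (123, 'EU', 10), (456, 'EU', 20), (456, 'US', 30);
GO
-- The view logic lives in one place; @region is the extra parameter
CREATE FUNCTION dbo.fn_Revenue (@region char(2))
RETURNS TABLE
AS
RETURN
(
    SELECT f.report_id, f.region, f.amount
    FROM dbo.RevenueFacts AS f
    WHERE f.region = @region
);
GO
DECLARE @report_id int, @region char(2);
SET @report_id = 456;
SET @region = 'EU';
-- Two queries, two plans, one shared definition
IF @report_id IS NULL
    SELECT * FROM dbo.fn_Revenue(@region);
ELSE
    SELECT * FROM dbo.fn_Revenue(@region) WHERE report_id = @report_id;
```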
UNION ALL as proposed by @db_brad would help, but one of those subqueries is executed even when there is no need for it.
As a quick patch, you can append OPTION(RECOMPILE) to the SELECT and get a table scan one time and an index seek the other time, but recompiling every time would induce nontrivial overhead.
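The quick patch would look roughly like this, using hypothetical temp tables in place of BIGTABLE and the parsed list. With OPTION (RECOMPILE), the statement is compiled at each execution for the current value of @report_id, so the optimizer can treat the @report_id IS NULL test as a known value when choosing between a scan and a seek.

```sql
-- Hypothetical stand-ins for the question's objects
CREATE TABLE #BIGTABLE (report_id int NOT NULL, amount money NULL);
INSERT INTO #BIGTABLE (report_id, amount) VALUES (123, 10), (456, 20), (789, 30);
CREATE TABLE #report_ID_Temp (Report_ID int NOT NULL PRIMARY KEY);
INSERT INTO #report_ID_Temp (Report_ID) VALUES (456);

DECLARE @report_id varchar(max);
SET @report_id = '456';

-- Same single query as in the question, recompiled on every execution
SELECT a.*
FROM #BIGTABLE AS a
WHERE @report_id IS NULL
   OR a.report_id IN (SELECT Report_ID FROM #report_ID_Temp)
OPTION (RECOMPILE);
```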

Having the same temp table name in 2 different IF statements

I have resolved this problem: I had overlooked something that is already part of my code, so this situation is not needed.
In SQL Server 2008, I have two IF statements
If @value = ''
begin
select * into #temptable from table1
end
Else If @value <> ''
begin
select * into #temptable from table2
end
but when I try to execute it, I get this error because of the second #temptable:
There is already an object named '#temptable' in the database.
I don't want to use another temp table name as I would have to change the after code a lot. Is there a way to bypass this?
I would recommend making some changes so that your code is a little more maintainable. One problem with the way you have it set up here is with the SELECT * syntax you're using. If you later decide to make a change to the schema of table1 or table2, you could have non-obvious consequences. In production code, it's better to spell these things out so that it's clear exactly which columns you're using and where.
Also, are you really using all of the columns from table 1 and table 2 in the code that follows? You might be taking a performance hit loading more data than you need. I'd go through the code that uses #temptable and figure out which columns it's actually using. Then start by creating your temp table:
CREATE TABLE #temptable(col1 int, col2 int, col3 int, col4 int)
Include all of the possible columns that could be used, even if some of them might be null in certain cases. Presumably, the code that follows already understands that. Then you can set up your IF statements:
IF @value = ''
BEGIN
INSERT INTO #temptable(col1, col2, col3)
SELECT x,y,z
FROM table1
END
ELSE
BEGIN
INSERT INTO #temptable(col1, col4)
SELECT alpha,beta
FROM table2
END
Your SELECT statement, as written, is creating the temp table and INSERTING into it all in one statement. Create the temp table separately with a CREATE TABLE statement, then INSERT INTO in your two IF statements.
Using SELECT INTO creates the table on the fly, as you know. Even if your query only referenced #temptable once, if you were to run it more than once (without dropping the table after the first run), you would get the same error (although if it were inside a stored procedure, it would probably only exist in the scope of the stored procedure).
However, you can't even compile this query. Using the Parse command (Ctrl+F5) on the following query, for example, fails even though the same table is used as the source table.
select * into #temptable from SourceTable
select * into #temptable from SourceTable
If the structure of tables 1 and 2 were the same, you could do something like the following.
select * into #temptable from
(select * from Table1 where @value = ''
union
select * from Table2 where @value <> '') as T
If, however, the tables have different structures, then I'm not sure what you can do, other than what agt and D. Lambert recommended.
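For the same-structure case there is also a variation on the CREATE TABLE approach: create the temp table once, empty, with SELECT ... INTO ... WHERE 1 = 0, and then INSERT into it in whichever branch runs. A minimal sketch, assuming hypothetical dbo.Table1 and dbo.Table2 tables with identical columns and no IDENTITY column (SELECT INTO copies the IDENTITY property, which would make the explicit ID inserts below fail).

```sql
-- Hypothetical source tables with identical columns and no IDENTITY column
CREATE TABLE dbo.Table1 (ID int, Name varchar(20));
CREATE TABLE dbo.Table2 (ID int, Name varchar(20));
INSERT INTO dbo.Table1 (ID, Name) VALUES (1, 'from table1');
INSERT INTO dbo.Table2 (ID, Name) VALUES (2, 'from table2');
GO
DECLARE @value varchar(10);
SET @value = '';

-- Create #temptable once: WHERE 1 = 0 copies the columns but no rows
SELECT * INTO #temptable FROM dbo.Table1 WHERE 1 = 0;

-- Each branch only inserts, so #temptable is never created twice
IF @value = ''
    INSERT INTO #temptable (ID, Name) SELECT ID, Name FROM dbo.Table1;
ELSE
    INSERT INTO #temptable (ID, Name) SELECT ID, Name FROM dbo.Table2;

SELECT * FROM #temptable;
```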

Return multiple tables from a T-SQL function in SQL Server 2008

You can return a single table from a T-SQL Function in SQL Server 2008.
I am wondering if it is possible to return more than one table.
The scenario is that I have three queries that filter 3 different tables. Each table is filtered against 5 filter tables that I would like to return from a function, rather than copying and pasting their creation into each query.
A simplified example of what this would look like with copy and paste:
CREATE FUNCTION dbo.GetValuesA(@SomeParameter int) RETURNS @ids TABLE (ID int) AS
BEGIN
WITH Filter1 As ( Select id FROM FilterTable1 WHERE Attribute=@SomeParameter )
, Filter2 As ( Select id FROM FilterTable2 WHERE Attribute=@SomeParameter )
INSERT INTO @ids
SELECT ID FROM ValueTableA
WHERE ColA IN (SELECT id FROM Filter1)
AND ColB IN (SELECT id FROM Filter2);
RETURN;
END
GO
-----------------------------------------------------------------------------
CREATE FUNCTION dbo.GetValuesB(@SomeParameter int) RETURNS @ids TABLE (ID int) AS
BEGIN
WITH Filter1 As ( Select id FROM FilterTable1 WHERE Attribute=@SomeParameter )
, Filter2 As ( Select id FROM FilterTable2 WHERE Attribute=@SomeParameter )
INSERT INTO @ids
SELECT ID FROM ValueTableB
WHERE ColA IN (SELECT id FROM Filter1)
AND ColB IN (SELECT id FROM Filter2)
AND ColC IN (SELECT id FROM Filter2);
RETURN;
END
GO
So, the only difference between the two queries is the Table being filtered, and HOW (the Where clause).
I would like to know if I could return Filter1 & Filter2 from a function. I am also open to suggestions on different ways to approach this problem.
No.
Conceptually, how would you expect to handle a function that returned a variable number of tables? You would JOIN on two tables at once? What if the returned fields don't line up?
Is there some reason you can't have a TVF for each filter?
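A minimal sketch of that suggestion: one inline TVF per filter, which each GetValues function can then reference instead of repeating the CTEs. The sample rows and the name dbo.GetValuesA_Inline are hypothetical; the table and column names follow the question.

```sql
-- Hypothetical sample data; table and column names follow the question
IF OBJECT_ID(N'dbo.FilterTable1', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.FilterTable1 (id int, Attribute int);
    INSERT INTO dbo.FilterTable1 (id, Attribute) VALUES (10, 1), (11, 2);
END;
IF OBJECT_ID(N'dbo.FilterTable2', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.FilterTable2 (id int, Attribute int);
    INSERT INTO dbo.FilterTable2 (id, Attribute) VALUES (20, 1);
END;
CREATE TABLE dbo.ValueTableA (ID int, ColA int, ColB int);
INSERT INTO dbo.ValueTableA (ID, ColA, ColB) VALUES (1, 10, 20), (2, 11, 20), (3, 10, 21);
GO
-- One reusable inline TVF per filter
CREATE FUNCTION dbo.Filter1 (@SomeParameter int)
RETURNS TABLE
AS
RETURN (SELECT id FROM dbo.FilterTable1 WHERE Attribute = @SomeParameter);
GO
CREATE FUNCTION dbo.Filter2 (@SomeParameter int)
RETURNS TABLE
AS
RETURN (SELECT id FROM dbo.FilterTable2 WHERE Attribute = @SomeParameter);
GO
-- Each GetValues function now only states which table it filters and how
CREATE FUNCTION dbo.GetValuesA_Inline (@SomeParameter int)
RETURNS TABLE
AS
RETURN
(
    SELECT v.ID
    FROM dbo.ValueTableA AS v
    WHERE v.ColA IN (SELECT id FROM dbo.Filter1(@SomeParameter))
      AND v.ColB IN (SELECT id FROM dbo.Filter2(@SomeParameter))
);
GO
SELECT ID FROM dbo.GetValuesA_Inline(1);
```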
As others say, NO. A function in TSQL must return exactly one result (although that result can come in the form of a table with numerous values).
There are a couple of ways you could achieve something similar, though. A stored procedure can execute multiple select statements and deliver the results up to whatever called it, whether that be an application layer or something like SSMS. Many libraries require you to add additional commands to access more result sets, though. For instance, in pyodbc, to access result sets after the first one you need to call cursor.nextset().
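For example, a stored procedure along these lines returns both filters as two result sets from a single call. The procedure name dbo.GetFilters and the sample rows are hypothetical.

```sql
-- Hypothetical sample data, same shape as in the question
IF OBJECT_ID(N'dbo.FilterTable1', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.FilterTable1 (id int, Attribute int);
    INSERT INTO dbo.FilterTable1 (id, Attribute) VALUES (10, 1), (11, 2);
END;
IF OBJECT_ID(N'dbo.FilterTable2', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.FilterTable2 (id int, Attribute int);
    INSERT INTO dbo.FilterTable2 (id, Attribute) VALUES (20, 1);
END;
GO
CREATE PROCEDURE dbo.GetFilters
    @SomeParameter int
AS
SET NOCOUNT ON;
-- First result set: Filter1
SELECT id FROM dbo.FilterTable1 WHERE Attribute = @SomeParameter;
-- Second result set: Filter2 (a pyodbc client reaches it with cursor.nextset())
SELECT id FROM dbo.FilterTable2 WHERE Attribute = @SomeParameter;
GO
EXEC dbo.GetFilters @SomeParameter = 1;
```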
Also, inside a function you could UNION several result sets together although that would require each result set to have the same columns. One way to achieve that if they have a different column structure is to add in nulls for the missing columns for each select statement. If you needed to know which select statement returned the value, you could also add a column which indicated that. This should work with your simplified example since in each case it is just returning a single ID column, but it could get awkward very quickly if the column names or types are radically different.
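A sketch of the UNION ALL idea inside an inline TVF, padding the missing column with NULL and adding a Source column that records which SELECT each row came from. dbo.FilterTable3, its Label column and the function name dbo.GetAllFilters are hypothetical, chosen only to show two differently shaped inputs.

```sql
-- Hypothetical sample data
IF OBJECT_ID(N'dbo.FilterTable1', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.FilterTable1 (id int, Attribute int);
    INSERT INTO dbo.FilterTable1 (id, Attribute) VALUES (10, 1), (11, 2);
END;
-- Hypothetical filter table with an extra column
CREATE TABLE dbo.FilterTable3 (id int, Attribute int, Label varchar(50));
INSERT INTO dbo.FilterTable3 (id, Attribute, Label) VALUES (30, 1, 'third');
GO
CREATE FUNCTION dbo.GetAllFilters (@SomeParameter int)
RETURNS TABLE
AS
RETURN
(
    -- NULL pads the column FilterTable1 lacks; Source tags each row
    SELECT 'Filter1' AS Source, id, CAST(NULL AS varchar(50)) AS Label
    FROM dbo.FilterTable1
    WHERE Attribute = @SomeParameter
    UNION ALL
    SELECT 'Filter3', id, Label
    FROM dbo.FilterTable3
    WHERE Attribute = @SomeParameter
);
GO
-- Callers pick the filter they need by Source
SELECT id FROM dbo.GetAllFilters(1) WHERE Source = 'Filter1';
```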

Using User Defined Functions and performance?

I'm using a stored procedure to fetch data and I need to filter dynamically. For example, if I don't want to fetch rows whose ID is 5, 10 or 12, I send the IDs as a string to the procedure and convert it to a table via a user-defined function. But I must consider performance, so here is an example:
Solution 1:
SELECT *
FROM Customers
WHERE CustomerID NOT IN (SELECT Value
FROM dbo.func_ConvertListToTable('4,6,5,1,2,3,9,222',','));
Solution 2:
CREATE TABLE #tempTable (Value NVARCHAR(4000));
INSERT INTO #tempTable
SELECT Value FROM dbo.func_ConvertListToTable('4,6,5,1,2,3,9,222',',')
SELECT *
FROM BusinessAds
WHERE AdID NOT IN (SELECT Value FROM #tempTable)
DROP TABLE #tempTable
Which solution is better for performance?
You would probably be better off creating the #temp table with a clustered index and an appropriate datatype:
CREATE TABLE #tempTable (Value int primary key);
INSERT INTO #tempTable
SELECT DISTINCT Value
FROM dbo.func_ConvertListToTable('4,6,5,1,2,3,9,222',',')
You can also put a clustered index on the table returned by the TVF.
As for which is better: SQL Server (at least before the SQL Server 2014 cardinality estimator) assumes that a multi-statement TVF returns 1 row, whereas a #temp table can trigger a recompile once it has been populated. So you would need to consider whether this assumption might cause suboptimal query plans when the list is large.
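For the clustered index on the TVF's return table, a multi-statement split function can declare a PRIMARY KEY directly in its RETURNS clause. The sketch below is a hypothetical replacement, not the question's dbo.func_ConvertListToTable (whose code is not shown); it assumes a short list of integers and uses a CHARINDEX loop that works on SQL Server 2005/2008. On SQL Server 2016 and later, the built-in STRING_SPLIT function is another option.

```sql
-- Hypothetical splitter: the PRIMARY KEY in RETURNS gives the return table a clustered index
CREATE FUNCTION dbo.ConvertListToIntTable (@list nvarchar(4000), @sep nchar(1))
RETURNS @result TABLE (Value int NOT NULL PRIMARY KEY)
AS
BEGIN
    DECLARE @pos int, @item nvarchar(4000);
    -- Append a separator so the loop also picks up the last item
    SET @list = @list + @sep;
    SET @pos = CHARINDEX(@sep, @list);
    WHILE @pos > 0
    BEGIN
        SET @item = LTRIM(RTRIM(LEFT(@list, @pos - 1)));
        -- Skip empty items and duplicates, which the PRIMARY KEY would reject
        IF @item <> N''
           AND NOT EXISTS (SELECT 1 FROM @result WHERE Value = CAST(@item AS int))
            INSERT INTO @result (Value) VALUES (CAST(@item AS int));
        -- Drop the item just processed and look for the next separator
        SET @list = SUBSTRING(@list, @pos + 1, 4000);
        SET @pos = CHARINDEX(@sep, @list);
    END;
    RETURN;
END;
GO
SELECT Value FROM dbo.ConvertListToIntTable(N'4,6,5,1,2,3,9,222', N',');
```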

Using with vs declare a temporary table: performance / difference?

I have created a SQL function in SQL Server 2008 that declares a temporary table and uses it to compute a moving average on the values inside:
declare @tempTable table
(
GeogType nvarchar(5),
GeogValue nvarchar(7),
dtAdmission date,
timeInterval int,
fromTime nvarchar(5),
toTime nvarchar(5),
EDSyndromeID tinyint,
nVisits int
)
insert @tempTable select * from aces.dbo.fEDVisitCounts(@geogType, @hospID,DATEADD(DD,-@windowDays + 1,@fromDate),
@toDate,@minAge,@maxAge,@gender,@nIntervalsPerDay, @nSyndromeID)
INSERT @table (dtAdmission,EDSyndromeID, MovingAvg)
SELECT list.dtadmission
, @nSyndromeID
, AVG(data.nVisits) as MovingAvg
from @tempTable as list
inner join @tempTable as data
ON list.dtAdmission between data.dtAdmission and DATEADD(DD,@windowDays - 1,data.dtAdmission)
where list.dtAdmission >= @fromDate
GROUP BY list.dtAdmission
but I also found out that you can declare the tempTable like this:
with tempTable as
(
select * from aces.dbo.fEDVisitCounts('ALL', null,DATEADD(DD,-7,'01-09-2010'),
'04-09-2010',0,130,null,1, 0)
)
Question: Is there a major difference between these two approaches? Is one faster than the other, or more common/standard? I would think the declare is faster since you define the columns you are looking for. Would it be even faster if I omitted the columns that are not used in the moving-average calculation? (I'm not sure about this one, since it has to get all of the rows anyway, though it makes intuitive sense that selecting fewer columns would be faster/less work.)
I have also found a create temporary table #table approach here: How to declare Internal table in MySQL?, but I don't want the table to persist outside of the function (I am not sure whether create temporary table does this or not).
The @table syntax creates a table variable (an actual table in tempdb) and materialises the results to it.
The WITH syntax defines a Common Table Expression which is not materialised and is just an inline View.
Most of the time you would be better off using the second option. You mention that this is inside a function. If this is a TVF then most of the time you want these to be inline rather than multi statement so they can be expanded out by the optimiser - this would instantly disallow the use of table variables.
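To illustrate the inline option, here is a sketch of a moving average written as an inline TVF with a CTE instead of a table variable. dbo.DailyVisits and dbo.fMovingAvg are hypothetical simplifications of the question's fEDVisitCounts output, and the CAST to decimal is an addition that stops AVG from truncating to an integer average.

```sql
-- Hypothetical daily counts standing in for fEDVisitCounts output
CREATE TABLE dbo.DailyVisits (dtAdmission date NOT NULL PRIMARY KEY, nVisits int NOT NULL);
INSERT INTO dbo.DailyVisits (dtAdmission, nVisits)
VALUES ('20100901', 10), ('20100902', 20), ('20100903', 30);
GO
CREATE FUNCTION dbo.fMovingAvg (@fromDate date, @windowDays int)
RETURNS TABLE
AS
RETURN
    -- The CTE is just an inline view; nothing is materialized
    WITH list AS
    (
        SELECT dtAdmission, nVisits FROM dbo.DailyVisits
    )
    SELECT l.dtAdmission,
           AVG(CAST(d.nVisits AS decimal(10, 2))) AS MovingAvg  -- avoid integer AVG
    FROM list AS l
    INNER JOIN list AS d
        ON l.dtAdmission BETWEEN d.dtAdmission AND DATEADD(DD, @windowDays - 1, d.dtAdmission)
    WHERE l.dtAdmission >= @fromDate
    GROUP BY l.dtAdmission;
GO
SELECT dtAdmission, MovingAvg
FROM dbo.fMovingAvg('20100902', 2)
ORDER BY dtAdmission;
```

Because the CTE is referenced on both sides of the self-join, the underlying query may be evaluated once for each reference, which is the trade-off the next paragraph discusses.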
Sometimes, however (say the underlying query is expensive and you want to avoid it being executed multiple times), you might determine that materializing the intermediate results improves performance in some specific cases. There is currently no way of forcing this for CTEs (without forcing a plan guide, at least).
In that eventuality you (in general) have 3 options: a @tablevariable, a #localtemp table and a ##globaltemp table. However, only the first of these is permitted for use inside a function.
For further information regarding the differences between table variables and #temp tables see here.
In addition to what Martin answered
;with tempTable as
(
select * from aces.dbo.fEDVisitCounts('ALL', null,DATEADD(DD,-7,'01-09-2010'),
'04-09-2010',0,130,null,1, 0)
)
SELECT * FROM tempTable
can also be written like this
SELECT * FROM
(
select * from aces.dbo.fEDVisitCounts('ALL', null,DATEADD(DD,-7,'01-09-2010'),
'04-09-2010',0,130,null,1, 0)
) AS tempTable --now you can join here with other tables
In addition, and correcting Martin:
The @table syntax creates a table variable IN MEMORY
The #Temp syntax creates a temporary table in tempdb
That's why @tables are faster than #temp tables