I am using SQL Server 2008 and I have the following query:
SELECT [Id] FROM [dbo].[Products] WHERE [dbo].GetNumOnOrder([Id]) = 0
With the following "GetNumOnOrder" Scalar-valued function:
CREATE FUNCTION [dbo].[GetNumOnOrder]
(
    @ProductId INT
)
RETURNS INT
AS
BEGIN
    DECLARE @NumOnOrder INT
    SELECT @NumOnOrder = SUM([NumOrdered] - [NumReceived])
    FROM [dbo].[PurchaseOrderDetails]
    INNER JOIN [dbo].[PurchaseOrders]
        ON [PurchaseOrderDetails].[PurchaseOrderId] = [PurchaseOrders].[Id]
    WHERE [PurchaseOrders].[StatusId] <> 5
        AND [PurchaseOrderDetails].[ProductId] = @ProductId
    RETURN CASE WHEN @NumOnOrder IS NOT NULL THEN @NumOnOrder ELSE 0 END
END
However it takes around 6 seconds to execute. Unfortunately I have no control over the initial SQL generated but I can change the function. Is there any way the function can be modified to speed this up? I'd appreciate the help. Thanks
If you have the rights to add indexes to the tables (and depending on the version of SQL Server you are using), I would investigate what performance gain adding the following indexes would give:
create index newindex1 on PurchaseOrders (id)
include (StatusId);
create index newindex2 on PurchaseOrderDetails (PurchaseOrderId)
include (ProductId,NumOrdered,NumReceived);
You probably already have indexes on these columns, but the indexes above will support just the query in your function in the most efficient way possible (reducing the number of page reads to a minimum). If the performance of this function is important enough, you could also consider adding a computed column to your table for NumOrdered - NumReceived (and then include only the result column in the index above, and in your query). You could also consider doing this in an indexed view rather than the table, but schema binding a view can be tiresome and inconvenient. Obviously, the wider the tables in question are, the greater the improvement in performance will be.
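As a rough sketch of the computed-column idea, the following assumes the table and column names from the question. The names NumOutstanding and IX_PurchaseOrderDetails_Outstanding are made up for illustration, and the index keeps the same key as the second index suggested above.

```sql
-- Assumption: NumOutstanding and the index name are illustrative, not part of the original schema.
-- The expression is deterministic, so the column can be persisted and included in an index.
ALTER TABLE [dbo].[PurchaseOrderDetails]
    ADD [NumOutstanding] AS ([NumOrdered] - [NumReceived]) PERSISTED;
GO

-- Same key as newindex2, but carrying the precomputed difference
-- instead of the two source columns.
CREATE NONCLUSTERED INDEX [IX_PurchaseOrderDetails_Outstanding]
    ON [dbo].[PurchaseOrderDetails] ([PurchaseOrderId])
    INCLUDE ([ProductId], [NumOutstanding]);
GO
```

The function's query would then aggregate SUM([NumOutstanding]) instead of SUM([NumOrdered] - [NumReceived]). Note that SQL Server requires specific SET options (for example QUOTED_IDENTIFIER and ANSI_NULLS ON) when creating indexes that involve computed columns, so whether this pays off is something to measure rather than assume.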
If you still want to use a function and cannot live without it, use an inline table-valued version. It is a lot faster. Check out these articles from some experts.
http://aboutsqlserver.com/2011/10/23/sunday-t-sql-tip-inline-vs-multi-statement-table-valued-functions/
http://dataeducation.com/scalar-functions-inlining-and-performance-an-entertaining-title-for-a-boring-post/
I have had a couple of MVP friends say that this is the only kind of function they ever write, since scalar functions are treated as a bunch of stored procedure calls.
Rewrite it as an inline table-valued function. Check the syntax, since I did not. Use the COALESCE function to convert NULL to zero.
--
-- Table value function
--
CREATE FUNCTION [dbo].[GetNumOnOrder] ( @ProductId INT )
RETURNS TABLE
AS
RETURN
(
    SELECT
        COALESCE(SUM([NumOrdered] - [NumReceived]), 0) AS Num
    FROM
        [dbo].[PurchaseOrderDetails]
        INNER JOIN [dbo].[PurchaseOrders]
            ON [PurchaseOrderDetails].[PurchaseOrderId] = [PurchaseOrders].[Id]
    WHERE [PurchaseOrders].[StatusId] <> 5
        AND [PurchaseOrderDetails].[ProductId] = @ProductId
);
--
-- Sample call with cross apply
--
SELECT P.[Id]
FROM [dbo].[Products] P
CROSS APPLY [dbo].[GetNumOnOrder] (P.Id) AS CI
WHERE CI.Num = 0;
If the data in table PurchaseOrderDetails is unevenly distributed, then cached query plans might hurt your query performance. This is a case where "parameter sniffing" might create bad query plans. SQL Server supports an optimization called "parameter sniffing", where it chooses a plan based on the particular value of the @ProductId parameter seen at compile time, and that plan is then reused for other values.
So to improve performance of your query you can re-write your function as:
CREATE FUNCTION [dbo].[GetNumOnOrder]
(
    @ProductId INT
)
RETURNS INT
AS
BEGIN
    DECLARE @NumOnOrder INT, @v_ProductId INT
    SET @v_ProductId = @ProductId;
    SELECT @NumOnOrder = SUM([NumOrdered] - [NumReceived])
    FROM [dbo].[PurchaseOrderDetails]
    INNER JOIN [dbo].[PurchaseOrders]
        ON [PurchaseOrderDetails].[PurchaseOrderId] = [PurchaseOrders].[Id]
    WHERE [PurchaseOrders].[StatusId] <> 5
        AND [PurchaseOrderDetails].[ProductId] = @v_ProductId
    RETURN CASE WHEN @NumOnOrder IS NOT NULL THEN @NumOnOrder ELSE 0 END
END
or you can include a RECOMPILE hint.
In my stored procedure, I have a temporary table which was created to increase performance.
With the actual select statement in the stored procedure, several scalar UDF's were used and the temp table replaces them:
INSERT INTO #BEDRAGEN
SELECT
    DD.ColumnA, DD.ColumnB, DD.ColumnC,
    ISNULL(DBO.SIF_get_SalesAmount(DD.ColumnA, DD.ColumnB, DD.ColumnC), 0) AS Totaalbedrag
FROM
    T_InvoiceDetailDosDet as DD
My question is: I want to replace dbo.SIF_get_SalesAmount with code or make the scalar UDF a tabled one if that will increase performance.
What is in this UDF:
Returns an amount.
It reads a file and calculates several things before resulting in a total.
Function has 3 parameters going in and Amount going out.
Piece of UDF:
ALTER FUNCTION [dbo].[SIF_get_SalesAmountDosDetail]
(@A VARCHAR(20),
 @B VARCHAR(20),
 @C VARCHAR(20)
)
RETURNS NUMERIC(12,2)
AS
DECLARE @SalesAmount NUMERIC(12,2)
, @SalesUnitOfAccount TINYINT
, @Unit NCHAR(5)
, @SalesUnit NCHAR(5)
, @TotalUnits NUMERIC(15, 3)
SELECT
    @unit = p.Unit,
    @SalesUnit = p.SalesUnit,
    @SalesUnitOfAccount = dd.SalesUnitOfAccount
FROM
    dbo.T_table p
WHERE
    p.ColumnA = @A AND p.ColumnB = @B AND p.ColumnC = @C
SELECT @rc = @@ROWCOUNT
IF @rc <> 1
BEGIN
    SELECT @SalesAmount = 0
    RETURN @SalesAmount
END
IF @SalesUnit = 0
BEGIN
    SELECT @SalesUnit = 1
END
-- several calculations follow based on values of @Unit etc.
-- at the end of the UDF:
-- last if then else calculation and then returning the Amount.
IF @SalesUnitOfAccount = 4
BEGIN
    SELECT @PricePerDesc = @SalesUnit
    SELECT @SalesAmount = CONVERT(numeric(12, 2), round((@CurrPrice * (@TotalSalesUnits / @SalesComputQty)) - @DiscAmount, 2))
END
SELECT @TotalSalesAmount = @TotalSalesAmount + ISNULL(@SalesAmount, 0)
-- Return the result of the function
RETURN @TotalSalesAmount
How could I insert this UDF code into my stored procedure's SELECT? Or how could I turn it into a table-valued UDF?
Thanks for helping.
What you are asking for is exactly the focus of a recently announced feature in SQL Server 2019 (CTP2.1 onwards) called "Scalar UDF Inlining". This feature works by automatically embedding (or inlining) the logic of a UDF into the calling query. You could give it a try by downloading it for free.
If you want to know how it works behind the scenes, the details can be found in a recent research paper “Froid: Optimization of Imperative programs in a Relational Database“. That paper describes a systematic approach to express entire UDFs as SQL, which you can use. The Scalar UDF inlining feature is based on Froid, and can result in huge performance gains in many cases.
[Disclosure: I am a co-author of the Froid paper]
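For readers who want to try the feature, here is a minimal sketch of how it can be switched on and how to check which scalar functions are candidates. It assumes SQL Server 2019 or later with the database at compatibility level 150 or higher; nothing here is specific to the UDF in the question.

```sql
-- Assumption: SQL Server 2019+, database compatibility level 150 or higher.
-- Inlining is on by default at that level; this makes the setting explicit.
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON;
GO

-- is_inlineable = 1 means the engine considers the function eligible for inlining.
SELECT OBJECT_SCHEMA_NAME(m.object_id) AS schema_name,
       OBJECT_NAME(m.object_id)        AS function_name,
       m.is_inlineable
FROM sys.sql_modules AS m
JOIN sys.objects AS o
    ON o.object_id = m.object_id
WHERE o.type = 'FN';  -- scalar functions only
```

A function reported as not inlineable keeps the old row-by-row behaviour, so the manual rewrite to an inline table-valued function may still be needed in that case.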
I have a query that should be reused in many scenarios. This query receives some parameters.
Because it has to be reused, it can't be a stored procedure. So, it's created as a Function (not a View, because it needs some parameters).
This is the best approach so far, right?
The issue is that this query returns data that needs some post processing, i.e. reused in some other queries. I'm facing the issue about reusing them in other queries.
Example:
Function GetMyFirstData returns several columns, including a FootNoteSymbol column. I should create another Function (GetFootnoteText) to return the text (and some other details) about these footnotes.
How should I create the second function that will receive as a parameter the FootNoteSymbol (many) returned by the first function GetMyFirstData?
I'm avoiding Stored Procedure, because these results will most likely be reused in other queries.
Also, the FootNoteSymbol is also returned in many other functions, with different return structures (therefore I can't create a TableType, because the structure is not fixed - however FootNoteSymbol is common among all of them).
Using SQL Server 2008 R2.
Functions that return data:
CREATE FUNCTION GetMyFirstData
(
    @Param1 int,
    @Param2 int
)
RETURNS @Return TABLE
(
    Col1 int,
    Col2 int,
    FootnoteSymbol int,
    Col3 int,
    Col4 int
)
AS
BEGIN
    INSERT INTO @Return (Col1, Col2, FootnoteSymbol, Col3, Col4)
    SELECT Col1, Col2, FootnoteSymbol, Col3, Col4
    FROM MyData
    RETURN;
END
CREATE FUNCTION GetMySecondData
(
    @Param1 int,
    @Param2 int
)
RETURNS @Return TABLE
(
    Col1 int,
    FootnoteSymbol int,
    Col2 int
)
AS
BEGIN
    INSERT INTO @Return (Col1, FootnoteSymbol, Col2)
    SELECT Col1, FootnoteSymbol, Col2
    FROM MyOtherData
    RETURN;
END
Function that should get footnotes text:
CREATE FUNCTION GetFootnoteText
(
    @FootnoteSymbol --this is the issue, how to reuse the footnotesymbols from the other functions
)
RETURNS @Return TABLE
(
    Symbol int,
    Text text,
    OtherDetail nvarchar(200)
)
AS
BEGIN
    SELECT Symbol, Text, OtherDetail
    FROM MyFootnotes
    WHERE Symbol in --this is the issue, how to reuse the footnotesymbols from the other functions
    RETURN;
END
Thanks!
DO. NOT. DO. THIS.
Reusing code is a noble goal, but SQL is not the language for it. There are many documented performance problems resulting from your approach. Some quick links: "Query Performance and multi-statement table valued functions", "Improving query plans with the SCHEMABINDING option on T-SQL UDFs", and "Compute Scalars, Expressions and Execution Plan Performance".
I wish I had a good alternative for you, but I don't. Views are OK for query re-use. But attempting to compose SQL table value functions has always ended in disaster, in every engagement I've seen.
Don't do it.
At the very least, stick to inline table-valued functions:
The RETURNS clause contains only the keyword table. You do not have to define the format of a return variable, because it is set by the format of the result set of the SELECT statement in the RETURN clause.
There is no function_body delimited by BEGIN and END.
The RETURN clause contains a single SELECT statement in parentheses. The result set of the SELECT statement forms the table returned by the function. The SELECT statement used in an inline function is subject to the same restrictions as SELECT statements used in views.
The table-valued function accepts only constants or @local_variable arguments.
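Applied to the question, the rules above might look like the following sketch. The dbo schema and the WHERE clause are assumptions, since the question does not show how the parameters are used.

```sql
-- Sketch only: the filter on @Param1/@Param2 is a guess at the intended usage.
CREATE FUNCTION dbo.GetMyFirstData (@Param1 int, @Param2 int)
RETURNS TABLE
AS
RETURN
(
    SELECT Col1, Col2, FootnoteSymbol, Col3, Col4
    FROM dbo.MyData
    WHERE Col1 = @Param1 AND Col2 = @Param2
);
GO

-- The footnote text comes from an ordinary join on FootnoteSymbol,
-- so no second function taking a list of symbols is needed.
SELECT d.Col1, d.Col2, d.FootnoteSymbol, f.[Text], f.OtherDetail
FROM dbo.GetMyFirstData(1, 2) AS d
JOIN dbo.MyFootnotes AS f
    ON f.Symbol = d.FootnoteSymbol;
```

Because an inline function is expanded into the calling query, the same join pattern can be reused against any other function that returns a FootnoteSymbol column, whatever its other columns are.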
As far as I can tell (and I refer you to @SeanLange's comment: "You know what your tables look like, what the data is like, what the rules are and what the expected results are. I on the other hand can't see any of that."), you have a basic misunderstanding of how relational databases work. To "solve" the problem presented here using standard relational database practices, I would not split it up into multiple functions (as there is no gain there); instead I would create an SP that does a JOIN to get all the data you need. Like this:
CREATE PROCEDURE GetData
(
    @Param1 int,
    @Param2 int
)
AS
BEGIN
    SELECT MyData.Col1,
           MyData.Col2,
           MyFootnotes.Text,
           MyFootnotes.OtherDetail,
           MyData.Col3,
           MyData.Col4
    FROM MyData
    JOIN MyFootnotes ON MyData.FootnoteSymbol = MyFootnotes.Symbol
END
You don't show how you use the parameters so I can't address that, but I can guess. Let's say the parameters in this function are used in the WHERE clause to limit the results (Col1=@Param1 and Col2=@Param2), but in another case you have different limits (e.g. Col3=@Param1 and Col4=@Param2).
In this case the best way to do it is to make a shared view and filter it in each SP. I would not use functions as I see no value in them (and a high potential for problems, as @RemusRusanu points out). Like this:
CREATE VIEW MyDataWithFootnotes AS
SELECT MyData.Col1,
       MyData.Col2,
       MyFootnotes.Text,
       MyFootnotes.OtherDetail,
       MyData.Col3,
       MyData.Col4
FROM MyData
JOIN MyFootnotes ON MyData.FootnoteSymbol = MyFootnotes.Symbol
with
CREATE PROCEDURE GetData1
(
    @Param1 int,
    @Param2 int
)
AS
BEGIN
    SELECT *
    FROM MyDataWithFootnotes
    WHERE Col1 = @Param1 AND Col2 = @Param2
END
and
CREATE PROCEDURE GetData2
(
    @Param1 int,
    @Param2 int
)
AS
BEGIN
    SELECT *
    FROM MyDataWithFootnotes
    WHERE Col3 = @Param1 AND Col4 = @Param2
END
I know that as a programmer who has worked in non-relational systems this is not intuitive. However, trust me, this will get you the best results. This is how your server software expects to be used, and over the years it has been tuned to deliver fast results using a view in this way.
Can we create a parameterized VIEW in SQL Server 2008?
Or is there any other alternative for this?
Try creating an inline table-valued function. Example:
CREATE FUNCTION dbo.fxnExample (@Parameter1 INTEGER)
RETURNS TABLE
AS
RETURN
(
    SELECT Field1, Field2
    FROM SomeTable
    WHERE Field3 = @Parameter1
)
-- Then call like this, just as if it's a table/view just with a parameter
SELECT * FROM dbo.fxnExample(1)
If you view the execution plan for the SELECT, you will not see any mention of the function at all; it will just show the underlying tables being queried. This is good, as it means statistics on the underlying tables will be used when generating an execution plan for the query.
The thing to avoid is a multi-statement table-valued function, as underlying table statistics will not be used, which can result in poor performance due to a poor execution plan.
Example of what to avoid:
CREATE FUNCTION dbo.fxnExample (@Parameter1 INTEGER)
RETURNS @Results TABLE(Field1 VARCHAR(10), Field2 VARCHAR(10))
AS
BEGIN
    INSERT @Results
    SELECT Field1, Field2
    FROM SomeTable
    WHERE Field3 = @Parameter1
    RETURN
END
Subtly different, but with potentially big differences in performance when the function is used in a query.
No, you cannot. But you can create a user defined table function.
In fact, there is one trick:
create view view_test as
select
*
from
table
where id = (select convert(int, convert(binary(4), context_info)) from master.dbo.sysprocesses
where
spid = @@spid)
...
Then, in the SQL query:
set context_info 2
select * from view_test
will be the same as
select * from table where id = 2
but using a UDF is the more acceptable approach.
As astander has mentioned, you can do that with a UDF. However, for large sets, using a scalar function (as opposed to an inline table-valued function) will make performance stink, as the function is evaluated row by row. As an alternative, you could expose the same results via a stored procedure executing a fixed query with placeholders into which your parameter values are substituted.
(Here's a somewhat dated but still relevant article on row-by-row processing for scalar UDFs.)
Edit: comments re. degrading performance adjusted to make it clear this applies to scalar UDFs.
No. You can use a UDF, to which you can pass parameters.
We have a SQL Server DB with 150-200 stored procs, all of which produce a viewable query plan in sys.dm_exec_query_plan except for one. According to http://msdn.microsoft.com/en-us/library/ms189747.aspx:
Under the following conditions, no Showplan output is returned in the query_plan column of the returned table for sys.dm_exec_query_plan:
If the query plan that is specified by using plan_handle has been evicted from the plan cache, the query_plan column of the returned table is null. For example, this condition may occur if there is a time delay between when the plan handle was captured and when it was used with sys.dm_exec_query_plan.
Some Transact-SQL statements are not cached, such as bulk operation statements or statements containing string literals larger than 8 KB in size. XML Showplans for such statements cannot be retrieved by using sys.dm_exec_query_plan unless the batch is currently executing because they do not exist in the cache.
If a Transact-SQL batch or stored procedure contains a call to a user-defined function or a call to dynamic SQL, for example using EXEC (string), the compiled XML Showplan for the user-defined function is not included in the table returned by sys.dm_exec_query_plan for the batch or stored procedure. Instead, you must make a separate call to sys.dm_exec_query_plan for the plan handle that corresponds to the user-defined function.
And later..
Due to a limitation in the number of nested levels allowed in the xml data type, sys.dm_exec_query_plan cannot return query plans that meet or exceed 128 levels of nested elements.
I'm confident that none of these apply to this procedure. The result never has a query plan, no matter what the timing, so 1 doesn't apply. There are no long string literals or bulk operations, so 2 doesn't apply. There are no user defined functions or dynamic SQL, so 3 doesn't apply. And there's little nesting, so the last doesn't apply. In fact, it's a very simple proc, which I'm including in full (with some table names changed to protect the innocent). Note that the parameter-sniffing shenanigans postdate the problem. It still happens even if I use the parameters directly in the query. Any ideas on why I don't have a viewable query plan for this proc?
ALTER PROCEDURE [dbo].[spGetThreadComments]
    @threadId int,
    @stateCutoff int = 80,
    @origin varchar(255) = null,
    @includeComments bit = 1,
    @count int = 100000
AS
if (@count is null)
begin
    select @count = 100000
end
-- copy parameters to local variables to avoid parameter sniffing
declare @threadIdL int, @stateCutoffL int, @originL varchar(255), @includeCommentsL bit, @countL int
select @threadIdL = @threadId, @stateCutoffL = @stateCutoff, @originL = @origin, @includeCommentsL = @includeComments, @countL = @count
set rowcount @countL
if (@originL = 'Foo')
begin
    select * from FooComments (nolock) where threadId = @threadId and statusCode <= @stateCutoff
    order by isnull(parentCommentId, commentId), dateCreated
end
else
begin
    if (@includeCommentsL = 1)
    begin
        select * from Comments (nolock)
        where threadId = @threadIdL and statusCode <= @stateCutoffL
        order by isnull(parentCommentId, commentId), dateCreated
    end
    else
    begin
        select userId, commentId from Comments (nolock)
        where threadId = @threadIdL and statusCode <= @stateCutoffL
        order by isnull(parentCommentId, commentId), dateCreated
    end
end
Hmm, perhaps the tables aren't really tables. They could be views or something else.
Try putting dbo. (or whatever the schema is) in front of all of the table names, and then check again.
see this article:
http://www.sommarskog.se/dyn-search-2005.html
quote from the article:
As you can see, I refer to all tables in two-part notation. That is, I also specify the schema (which in SQL 7/2000 parlance normally is referred to as owner.) If I would leave out the schema, each user would get his own private version of the query plan
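To "check again" after schema-qualifying the tables, one option is to look the procedure up in the plan cache directly. This is a sketch that assumes SQL Server 2008 or later (where sys.dm_exec_procedure_stats is available) and that it is run in the procedure's own database.

```sql
-- Assumption: SQL Server 2008+; run in the database that owns the procedure.
SELECT OBJECT_NAME(ps.object_id, ps.database_id) AS procedure_name,
       ps.cached_time,
       ps.execution_count,
       qp.query_plan          -- NULL here reproduces the problem in the question
FROM sys.dm_exec_procedure_stats AS ps
OUTER APPLY sys.dm_exec_query_plan(ps.plan_handle) AS qp
WHERE ps.database_id = DB_ID()
  AND ps.object_id = OBJECT_ID(N'dbo.spGetThreadComments');
```

If no row comes back at all, the procedure currently has no cached plan. A row with a NULL query_plan points at one of the documented conditions quoted in the question.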
I want to build a single select stored procedure for SQL 2005 that is universal for any select query on that table.
**Columns**
LocationServiceID
LocationID
LocationServiceTypeID
ServiceName
ServiceCode
FlagActive
For this table I may need to select by LocationServiceID, or LocationID, or LocationServiceTypeID or ServiceName or a combination of the above.
I'd rather not have a separate stored procedure for each of them.
I assume the best way to do it would be to build the 'WHERE' statement on NOT NULL. Something like
SELECT * FROM LocationServiceType WHERE
IF @LocationID IS NOT NULL (LocationID = @LocationID)
IF @LocationServiceID IS NOT NULL (LocationServiceID = @LocationServiceID)
IF @LocationServiceTypeID IS NOT NULL (LocationServiceTypeID = @LocationServiceTypeID)
IF @ServiceName IS NOT NULL (ServiceName = @ServiceName)
IF @ServiceCode IS NOT NULL (ServiceCode = @ServiceCode)
IF @FlagActive IS NOT NULL (FlagActive = @FlagActive)
Does that make sense?
here is the most extensive article I've ever seen on the subject:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
here is an outline of the article:
Introduction
The Case Study: Searching Orders
The Northgale Database
Dynamic SQL
Introduction
Using sp_executesql
Using the CLR
Using EXEC()
When Caching Is Not Really What You Want
Static SQL
Introduction
x = @x OR @x IS NULL
Using IF statements
Umachandar's Bag of Tricks
Using Temp Tables
x = @x AND @x IS NOT NULL
Handling Complex Conditions
Hybrid Solutions – Using both Static and Dynamic SQL
Using Views
Using Inline Table Functions
Conclusion
Feedback and Acknowledgements
Revision History
First of all, your code will not work. It should look like this:
SELECT * FROM LocationServiceType WHERE
(@LocationID IS NULL OR LocationID = @LocationID)
... -- all other fields here
This is totally valid and is known as an 'all-in-one query'. But from a performance point of view it is not a perfect solution, because it does not allow SQL Server to select an optimal plan. You can see more details here.
Bottom line: if your top priority is 'single SP', then use this approach. In case you care about the performance, look for a different solution.
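One such different solution, covered in the Sommarskog article under "Using sp_executesql", is to build the WHERE clause dynamically while still passing the values as parameters. The sketch below is only an illustration: the procedure name and the parameter types are assumptions, and only three of the columns are shown.

```sql
-- Hypothetical procedure name and parameter types; column names come from the question.
CREATE PROCEDURE dbo.SearchLocationServiceType
    @LocationServiceID int = NULL,
    @LocationID int = NULL,
    @ServiceName nvarchar(100) = NULL
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sql nvarchar(max);
    SET @sql = N'SELECT * FROM dbo.LocationServiceType WHERE 1 = 1';

    -- Append only the predicates for parameters that were actually supplied.
    IF @LocationServiceID IS NOT NULL
        SET @sql = @sql + N' AND LocationServiceID = @LocationServiceID';
    IF @LocationID IS NOT NULL
        SET @sql = @sql + N' AND LocationID = @LocationID';
    IF @ServiceName IS NOT NULL
        SET @sql = @sql + N' AND ServiceName = @ServiceName';

    -- Values are passed as parameters, never concatenated into the string.
    EXEC sp_executesql
        @sql,
        N'@LocationServiceID int, @LocationID int, @ServiceName nvarchar(100)',
        @LocationServiceID = @LocationServiceID,
        @LocationID = @LocationID,
        @ServiceName = @ServiceName;
END
```

Each distinct combination of supplied parameters produces its own statement text and therefore its own cached plan, which is the point of the technique. The trade-off is that callers need SELECT permission on the table, because the dynamic statement does not run under the procedure's ownership chain.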
SELECT *
FROM LocationServiceType
WHERE LocationServiceID = ISNULL(@LocationServiceID,LocationServiceID)
AND LocationID = ISNULL(@LocationID,LocationID)
AND LocationServiceTypeID = ISNULL(@LocationServiceTypeID,LocationServiceTypeID)
AND ServiceName = ISNULL(@ServiceName,ServiceName)
AND ServiceCode = ISNULL(@ServiceCode,ServiceCode)
AND FlagActive = ISNULL(@FlagActive,FlagActive)
If a null value is sent in it will cancel out that line of the where clause, otherwise it will return rows that match the value sent in.
What I've always done is set the incoming parameters to NULL if they should be ignored in the query,
then check the variable for NULL first, so if the variable is NULL the condition short-circuits and the filter is not applied. If the variable has a value, the 'or' causes the filter to be used. This has worked for me so far.
SET @LocationID = NULLIF(@LocationID, 0)
SET @LocationServiceID = NULLIF(@LocationServiceID, 0)
SET @LocationServiceTypeID = NULLIF(@LocationServiceTypeID, 0)
SELECT * FROM LocationServiceType WHERE
(@LocationID IS NULL OR LocationID = @LocationID)
AND (@LocationServiceID IS NULL OR LocationServiceID = @LocationServiceID)
AND (@LocationServiceTypeID IS NULL OR LocationServiceTypeID = @LocationServiceTypeID)
etc...