Multi-statement Table Valued Function vs Inline Table Valued Function - sql

A few examples to show, just in case:
Inline Table Valued
CREATE FUNCTION MyNS.GetUnshippedOrders()
RETURNS TABLE
AS
RETURN SELECT a.SaleId, a.CustomerID, b.Qty
FROM Sales.Sales a INNER JOIN Sales.SaleDetail b
ON a.SaleId = b.SaleId
INNER JOIN Production.Product c ON b.ProductID = c.ProductID
WHERE a.ShipDate IS NULL
GO
Multi Statement Table Valued
CREATE FUNCTION MyNS.GetLastShipped(@CustomerID INT)
RETURNS @CustomerOrder TABLE
(SaleOrderID INT NOT NULL,
CustomerID INT NOT NULL,
OrderDate DATETIME NOT NULL,
OrderQty INT NOT NULL)
AS
BEGIN
DECLARE @MaxDate DATETIME
SELECT @MaxDate = MAX(OrderDate)
FROM Sales.SalesOrderHeader
WHERE CustomerID = @CustomerID
INSERT @CustomerOrder
SELECT a.SalesOrderID, a.CustomerID, a.OrderDate, b.OrderQty
FROM Sales.SalesOrderHeader a INNER JOIN Sales.SalesOrderDetail b
ON a.SalesOrderID = b.SalesOrderID
INNER JOIN Production.Product c ON b.ProductID = c.ProductID
WHERE a.OrderDate = @MaxDate
AND a.CustomerID = @CustomerID
RETURN
END
GO
Is there an advantage to using one type (inline or multi-statement) over the other? Are there certain scenarios when one is better than the other, or are the differences purely syntactical? I realise the two example queries are doing different things, but is there a reason I would write them that way?
I've read about them, but the advantages and differences haven't really been explained.

In researching Matt's comment, I have revised my original statement. He is correct: there will be a difference in performance between an inline table-valued function (ITVF) and a multi-statement table-valued function (MSTVF) even if they both simply execute a SELECT statement. SQL Server treats an ITVF somewhat like a VIEW, in that it calculates an execution plan using the latest statistics on the tables in question. An MSTVF is equivalent to stuffing the entire result of your SELECT statement into a table variable and then joining to that. Thus, the optimizer cannot use any table statistics on the tables inside the MSTVF. So, all things being equal (which they rarely are), the ITVF will perform better than the MSTVF. In my tests, the difference in completion time was negligible; however, from a statistics standpoint, it was noticeable.
In your case, the two functions are not functionally equivalent. The MSTVF does an extra query each time it is called and, most importantly, filters on the customer ID. In a large query, the optimizer would not be able to take advantage of other join types, as it would need to call the function for each CustomerID passed. However, if you rewrote your MSTVF like so:
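A minimal sketch for reproducing that comparison yourself, assuming a scratch database where you can create objects; `dbo.DemoOrders` and both `dbo.DemoUnshipped_*` functions are hypothetical names with invented rows. Both functions run the same SELECT, so any difference you see in the plans comes from the function type alone. With the actual execution plan turned on, compare the estimated and actual row counts for each query.

```sql
-- Hypothetical scratch table; all object names here are illustrative.
CREATE TABLE dbo.DemoOrders
(
    OrderID    INT IDENTITY(1,1) PRIMARY KEY,
    CustomerID INT      NOT NULL,
    ShipDate   DATETIME NULL
);
GO
INSERT dbo.DemoOrders (CustomerID, ShipDate)
VALUES (1, NULL), (1, '20200101'), (2, NULL), (3, NULL);
GO
-- ITVF: a single RETURN SELECT that the optimizer expands into the outer query.
CREATE FUNCTION dbo.DemoUnshipped_Inline()
RETURNS TABLE
AS
RETURN
    SELECT o.OrderID, o.CustomerID
    FROM dbo.DemoOrders AS o
    WHERE o.ShipDate IS NULL;
GO
-- MSTVF with the same SELECT: rows are loaded into @Result first,
-- and the outer query only ever sees that table variable.
CREATE FUNCTION dbo.DemoUnshipped_Multi()
RETURNS @Result TABLE (OrderID INT NOT NULL, CustomerID INT NOT NULL)
AS
BEGIN
    INSERT @Result (OrderID, CustomerID)
    SELECT o.OrderID, o.CustomerID
    FROM dbo.DemoOrders AS o
    WHERE o.ShipDate IS NULL;
    RETURN;
END
GO
-- Run with "Include Actual Execution Plan" enabled and compare the
-- estimated vs. actual row counts reported for each query.
SET STATISTICS IO ON;
SELECT * FROM dbo.DemoUnshipped_Inline();
SELECT * FROM dbo.DemoUnshipped_Multi();
SET STATISTICS IO OFF;
```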
CREATE FUNCTION MyNS.GetLastShipped()
RETURNS @CustomerOrder TABLE
(
SaleOrderID INT NOT NULL,
CustomerID INT NOT NULL,
OrderDate DATETIME NOT NULL,
OrderQty INT NOT NULL
)
AS
BEGIN
INSERT @CustomerOrder
SELECT a.SalesOrderID, a.CustomerID, a.OrderDate, b.OrderQty
FROM Sales.SalesOrderHeader a
INNER JOIN Sales.SalesOrderDetail b
ON a.SalesOrderID = b.SalesOrderID
INNER JOIN Production.Product c
ON b.ProductID = c.ProductID
WHERE a.OrderDate = (
Select Max(SH1.OrderDate)
FROM Sales.SalesOrderHeader As SH1
WHERE SH1.CustomerID = a.CustomerID
)
RETURN
END
GO
In a query, the optimizer would be able to call that function once and build a better execution plan, but it still would not be better than an equivalent, non-parameterized ITVF or a VIEW.
ITVFs should be preferred over MSTVFs when feasible: an ITVF takes the datatypes, nullability and collation from the columns in the underlying tables, whereas you must declare those properties yourself in a multi-statement table-valued function, and, importantly, you will get better execution plans from the ITVF. In my experience, I have not found many circumstances where an ITVF was a better option than a VIEW, but mileage may vary.
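For comparison, the question's GetLastShipped can also be written as a parameterized ITVF, because the max-date lookup fits into a correlated subquery. This is a sketch against hypothetical `dbo.DemoOrderHeader` / `dbo.DemoOrderDetail` tables standing in for the AdventureWorks Sales tables, with made-up rows; the unused join to Production.Product is dropped. The last query calls it per customer with CROSS APPLY, where the optimizer can expand it like a view instead of treating it as a black box.

```sql
-- Hypothetical stand-ins for Sales.SalesOrderHeader / Sales.SalesOrderDetail.
CREATE TABLE dbo.DemoOrderHeader
(
    SalesOrderID INT      NOT NULL PRIMARY KEY,
    CustomerID   INT      NOT NULL,
    OrderDate    DATETIME NOT NULL
);
CREATE TABLE dbo.DemoOrderDetail
(
    SalesOrderDetailID INT NOT NULL PRIMARY KEY,
    SalesOrderID       INT NOT NULL REFERENCES dbo.DemoOrderHeader (SalesOrderID),
    OrderQty           INT NOT NULL
);
GO
INSERT dbo.DemoOrderHeader VALUES (1, 10, '20200101'), (2, 10, '20200301'), (3, 20, '20200201');
INSERT dbo.DemoOrderDetail VALUES (1, 1, 5), (2, 2, 7), (3, 2, 1), (4, 3, 4);
GO
-- Inline version: one SELECT, so column types and nullability come from the tables.
CREATE FUNCTION dbo.GetLastShippedInline (@CustomerID INT)
RETURNS TABLE
AS
RETURN
    SELECT h.SalesOrderID, h.CustomerID, h.OrderDate, d.OrderQty
    FROM dbo.DemoOrderHeader AS h
    INNER JOIN dbo.DemoOrderDetail AS d
        ON d.SalesOrderID = h.SalesOrderID
    WHERE h.CustomerID = @CustomerID
      AND h.OrderDate = (SELECT MAX(h2.OrderDate)
                         FROM dbo.DemoOrderHeader AS h2
                         WHERE h2.CustomerID = @CustomerID);
GO
-- Called once per customer; the function body is expanded into this plan.
SELECT c.CustomerID, s.SalesOrderID, s.OrderQty
FROM (VALUES (10), (20)) AS c (CustomerID)
CROSS APPLY dbo.GetLastShippedInline(c.CustomerID) AS s;
```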
Thanks to Matt.
Addition
Since I saw this come up recently, here is an excellent analysis done by Wayne Sheffield comparing the performance difference between Inline Table Valued functions and Multi-Statement functions.
His original blog post.
Copy on SQL Server Central

Internally, SQL Server treats an inline table valued function much like it would a view and treats a multi-statement table valued function similar to how it would a stored procedure.
When an inline table-valued function is used as part of an outer query, the query processor expands the UDF definition and generates an execution plan that accesses the underlying objects, using the indexes on these objects.
For a multi-statement table valued function, an execution plan is created for the function itself and stored in the execution plan cache (once the function has been executed the first time). If multi-statement table valued functions are used as part of larger queries, then the optimiser does not know what the function returns, and so makes some standard assumptions: in effect it assumes that the function will return a single row, and that the results of the function will be accessed by using a table scan against a table with a single row. (The one-row guess applies to the legacy cardinality estimator; the estimator introduced in SQL Server 2014 guesses 100 rows instead.)
Where multi-statement table valued functions can perform poorly is when they return a large number of rows and are joined against in outer queries. The performance issues are primarily down to the fact that the optimiser will produce a plan assuming that a single row is returned, which will not necessarily be the most appropriate plan.
As a general rule of thumb we have found that where possible inline table valued functions should be used in preference to multi-statement ones (when the UDF will be used as part of an outer query) due to these potential performance issues.

There is another difference. An inline table-valued function can be inserted into, updated, and deleted from - just like a view. Similar restrictions apply - can't update functions using aggregates, can't update calculated columns, and so on.
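A minimal sketch of updating through an ITVF, assuming a hypothetical `dbo.DemoCustomers` table and a hypothetical `dbo.CustomersInRegion` function with invented rows. The function is a single SELECT over one base table with no aggregates or computed columns, so the same rules as for an updatable view should apply.

```sql
-- Hypothetical table and function, for illustration only.
CREATE TABLE dbo.DemoCustomers
(
    CustomerID INT         NOT NULL PRIMARY KEY,
    Region     VARCHAR(10) NOT NULL,
    IsActive   BIT         NOT NULL
);
GO
INSERT dbo.DemoCustomers (CustomerID, Region, IsActive)
VALUES (1, 'East', 1), (2, 'West', 1), (3, 'East', 1);
GO
-- A parameterized "view": one SELECT over a single base table, no aggregates.
CREATE FUNCTION dbo.CustomersInRegion (@Region VARCHAR(10))
RETURNS TABLE
AS
RETURN
    SELECT CustomerID, Region, IsActive
    FROM dbo.DemoCustomers
    WHERE Region = @Region;
GO
-- Update through the function as you would through an updatable view:
-- only the 'East' rows in dbo.DemoCustomers are changed.
UPDATE dbo.CustomersInRegion('East')
SET IsActive = 0;
```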

Your examples, I think, answer the question very well. The first function can be done as a single select, and is a good reason to use the inline style. The second could probably be done as a single statement (using a sub-query to get the max date), but some coders may find it easier to read or more natural to do it in multiple statements as you have done. Some functions just plain can't get done in one statement, and so require the multi-statement version.
I suggest using the simplest (inline) whenever possible, and using multi-statements when necessary (obviously) or when personal preference/readability makes it worth the extra typing.

Another case for a multi-statement function is to keep SQL Server from pushing down the WHERE clause.
For example, I have a list of table names, some of which are formatted like C05_2019 and C12_2018, and all tables named that way have the same schema. I wanted to merge all that data into one table and parse 05 and 12 into a CompNo column and 2018 and 2019 into a year column. However, there are other tables, like ACA_StupidTable, from which I cannot extract CompNo and CompYr, and I would get a conversion error if I tried. So my query was in two parts: an inner query that returned only tables named like 'C_______', and an outer query that did the substring and int conversion, i.e. CAST(SUBSTRING(<table name>, 2, 2) AS INT) AS CompNo. All looked good, except that SQL Server decided to evaluate my CAST before the results were filtered, so I got a mind-scrambling conversion error. A multi-statement table function may prevent that from happening, since it is basically a "new" table.
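A sketch of that idea, assuming a hypothetical `dbo.DemoTableNames` table in place of the real catalog query (in practice the names would come from something like sys.tables). The filter and the conversion are deliberately split into two INSERT statements inside the function, so the CASTs can only see rows that already passed the filter. The tighter LIKE pattern with `[0-9]` and `[_]` (a literal underscore) is my own addition, not from the answer.

```sql
-- Hypothetical list of table names, standing in for a catalog query.
CREATE TABLE dbo.DemoTableNames (TableName SYSNAME NOT NULL);
GO
INSERT dbo.DemoTableNames VALUES ('C05_2019'), ('C12_2018'), ('ACA_StupidTable');
GO
CREATE FUNCTION dbo.GetCompanyTables()
RETURNS @Result TABLE (TableName SYSNAME NOT NULL, CompNo INT NOT NULL, CompYr INT NOT NULL)
AS
BEGIN
    -- Statement 1: keep only names shaped like C##_####.
    DECLARE @Matching TABLE (TableName SYSNAME NOT NULL);
    INSERT @Matching (TableName)
    SELECT TableName
    FROM dbo.DemoTableNames
    WHERE TableName LIKE 'C[0-9][0-9][_][0-9][0-9][0-9][0-9]';

    -- Statement 2: convert only the rows that survived statement 1.
    INSERT @Result (TableName, CompNo, CompYr)
    SELECT TableName,
           CAST(SUBSTRING(TableName, 2, 2) AS INT),  -- '05' -> 5
           CAST(SUBSTRING(TableName, 5, 4) AS INT)   -- '2019' -> 2019
    FROM @Matching;
    RETURN;
END
GO
SELECT * FROM dbo.GetCompanyTables();
```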

Look at "Comparing Inline and Multi-Statement Table-Valued Functions"; you can find good descriptions and performance benchmarks there.

I have not tested this, but a multi-statement function materializes its result set (in the table variable it returns). There may be cases where there is too much going on for the optimizer to inline the function. For example, suppose you have a function that returns a result from different databases depending on what you pass as a "Company Number". Normally, you could create a view with a UNION ALL and then filter by company number, but I found that sometimes SQL Server pulls back the entire union and is not smart enough to call only the one relevant SELECT. A table function can contain logic to choose the source.
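A minimal sketch of "logic to choose the source", assuming two hypothetical tables, `dbo.DemoCompany1Orders` and `dbo.DemoCompany2Orders`, kept in one database so the example is self-contained; in the scenario described they would live in different databases and be referenced by three-part names. Only the branch matching `@CompanyNo` runs, so the other source is never read.

```sql
-- Hypothetical per-company tables (same schema), for illustration only.
CREATE TABLE dbo.DemoCompany1Orders (OrderID INT NOT NULL, Amount DECIMAL(10,2) NOT NULL);
CREATE TABLE dbo.DemoCompany2Orders (OrderID INT NOT NULL, Amount DECIMAL(10,2) NOT NULL);
GO
INSERT dbo.DemoCompany1Orders VALUES (1, 10.00), (2, 20.00);
INSERT dbo.DemoCompany2Orders VALUES (1, 99.00);
GO
CREATE FUNCTION dbo.GetCompanyOrders (@CompanyNo INT)
RETURNS @Orders TABLE (OrderID INT NOT NULL, Amount DECIMAL(10,2) NOT NULL)
AS
BEGIN
    -- Procedural branching is allowed here but not in an inline TVF.
    IF @CompanyNo = 1
        INSERT @Orders (OrderID, Amount)
        SELECT OrderID, Amount FROM dbo.DemoCompany1Orders;
    ELSE IF @CompanyNo = 2
        INSERT @Orders (OrderID, Amount)
        SELECT OrderID, Amount FROM dbo.DemoCompany2Orders;
    -- Any other company number returns an empty table.
    RETURN;
END
GO
SELECT * FROM dbo.GetCompanyOrders(2);
```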

Maybe in a very condensed way:
ITVF (inline TVF): more for the DB person; it is a kind of parameterized view and takes a single SELECT statement.
MSTVF (multi-statement TVF): more for the developer; it creates and loads a table variable.

If you are going to write a query, you can join to your inline table-valued function like this:
SELECT
a.*,b.*
FROM AAAA a
INNER JOIN MyNS.GetUnshippedOrders() b ON a.z=b.z
It will incur little overhead and run fine.
If you try to use the multi-statement table-valued function in a similar query, you will have performance issues:
SELECT
x.a,x.b,x.c,(SELECT OrderQty FROM MyNS.GetLastShipped(x.CustomerID)) AS Qty
FROM xxxx x
Because the function executes once for each row returned, the query will run slower and slower as the result set grows.

Related

SQL - IN clause vs equals operator for small list

Which should be the preferred and efficient way?
where @TeamId in (Team1Id, Team2Id)
or
where @TeamId=Team1Id or @TeamId=Team2Id
I am using SQL Server 2008.
Edit
When I checked execution plans, both the queries showed that they are using indexes and same execution plan.
Both are the same.
SQL Server converts this
where @TeamId in (Team1Id, Team2Id)
into
where @TeamId=Team1Id or @TeamId=Team2Id
IN is more readable and easier to write than a chain of ORs, so it's the better choice.
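A small sketch for checking the equivalence on your own server, assuming a hypothetical `dbo.DemoMatches` table with `Team1Id` and `Team2Id` columns and invented rows. Both queries should return the same rows, and comparing their actual execution plans should show the IN list expanded into the OR predicates.

```sql
-- Hypothetical table of matches between two teams.
CREATE TABLE dbo.DemoMatches
(
    MatchID INT NOT NULL PRIMARY KEY,
    Team1Id INT NOT NULL,
    Team2Id INT NOT NULL
);
GO
INSERT dbo.DemoMatches VALUES (1, 5, 7), (2, 7, 9), (3, 9, 5), (4, 8, 9);
GO
DECLARE @TeamId INT = 5;

-- IN with a list of columns...
SELECT MatchID FROM dbo.DemoMatches
WHERE @TeamId IN (Team1Id, Team2Id);

-- ...and the OR form it is rewritten to.
SELECT MatchID FROM dbo.DemoMatches
WHERE @TeamId = Team1Id OR @TeamId = Team2Id;
```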
For the specific example you provide, of testing a variable, IN is simply syntactic sugar for multiple ORs.
However, in the related case of selecting rows of a relation, the use of a join to another relation is superior, particularly if the data field being compared is indexed or the list of comparison values grows. Such a comparison relation is easily created using a static sub-query like this:
select *
from data
join (
select Team1Id as TeamId union all
select Team2Id
) comparison on comparison.TeamId = data.TeamId
This technique of a static sub-query is widely applicable to many circumstances.

Why use table valued function instead of a temp table in SQL?

I am trying to speed up my monster of a stored procedure that works on millions of records across many tables.
I've stumbled on this:
Is it possible to use a Stored Procedure as a subquery in SQL Server 2008?
My question is: why would using a table-valued function be better than using a temp table?
Suppose my stored procedure #SP1
declare @temp table(a int)
insert into @temp
select a from BigTable
where someRecords like 'blue%'
update t
set someRecords = 'were blue'
from AnotherBigTable t
inner join
@temp tmp
on t.RecordID = tmp.a
After reading the above link, it seems that the consensus is that instead of using my @temp table, I should create a table-valued function that does that select
(and inline it if it's a simple select like the one in this example). But my actual selects are multiple and often not simple (i.e. with subqueries, etc.).
What is the benefit?
Thanks
Generally, you would use a temporary table (#) instead of a table variable. Table variables are really only useful for
functions, which cannot create temporary objects
passing table-valued data (sets) as read-only parameters
gaming statistics for certain query edge-cases
execution plan stability (related to statistics and also the fact that INSERT INTO table variables cannot use a parallel plan)
prior to SQL Server 2012, #temp tables inherit collation from tempdb, whereas @table variables use the current database collation
Other than those, a #temporary table will work as well as if not better than a variable.
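A sketch of the question's procedure body rewritten with a #temporary table, assuming hypothetical `dbo.BigTable` / `dbo.AnotherBigTable` definitions and invented rows. The clustered index on the temp table is an optional addition of mine, built after the load; unlike a table variable, the temp table also gets column statistics the optimizer can use for the join.

```sql
-- Hypothetical stand-ins for the question's tables.
CREATE TABLE dbo.BigTable (a INT NOT NULL, someRecords VARCHAR(50) NOT NULL);
CREATE TABLE dbo.AnotherBigTable (RecordID INT NOT NULL PRIMARY KEY, someRecords VARCHAR(50) NOT NULL);
GO
INSERT dbo.BigTable VALUES (1, 'blue sky'), (2, 'red car'), (3, 'blueberry');
INSERT dbo.AnotherBigTable VALUES (1, 'x'), (2, 'y'), (3, 'z'), (4, 'w');
GO
-- #temp lives in tempdb for this session and supports statistics and indexes.
CREATE TABLE #temp (a INT NOT NULL);

INSERT #temp (a)
SELECT a FROM dbo.BigTable WHERE someRecords LIKE 'blue%';

-- Optional: index the join column once the data is loaded.
CREATE CLUSTERED INDEX IX_temp_a ON #temp (a);

UPDATE t
SET someRecords = 'were blue'
FROM dbo.AnotherBigTable AS t
INNER JOIN #temp AS tmp
    ON t.RecordID = tmp.a;

DROP TABLE #temp;
```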
Further reading: What's the difference between a temp table and table variable in SQL Server?
Probably no longer relevant, but here are two things I might suggest, taking two different approaches.
Simple approach 1:
Try a primary key on your table valued variable:
declare @temp table(a int, primary key(a))
Simple approach 2:
In this particular case try a common table expression (CTE)...
;with
temp as (
SELECT a as Id
FROM BigTable
WHERE someRecords like 'blue%'
)
UPDATE AnotherBigTable
SET someRecords = 'were blue'
FROM AnotherBigTable
JOIN temp
ON temp.Id = AnotherBigTable.RecordId
CTEs are really great and help to isolate the specific data sets you actually want to work on from the myriad of records contained in larger tables, and if you find yourself using the same CTE declaration repeatedly, consider formalizing that expression into a view. Views are an often overlooked and very valuable tool for DBAs and DB programmers to manage large, complex data sets with lots of records and relationships.

Is this an inefficient way to write a SQL query?

Let's suppose I had a view, like this:
CREATE VIEW EmployeeView
AS
SELECT ID, Name, dbo.Salary(PaymentPlanID) AS Payment
FROM Employees
The user-defined function, Salary, is somewhat expensive.
If I wanted to do something like this,
SELECT *
FROM TempWorkers t
INNER JOIN EmployeeView e ON t.ID = e.ID
will Salary be executed on every row of Employees, or will it do the join first and then only be called on the rows filtered by the join? Could I expect the same behavior if EmployeeView was a subquery or a table valued function instead of a view?
The function will only be called where relevant. If your final select statement does not include that field, it's not called at all. If your final select refers to 1% of your table, it will only be called for that 1% of the table.
This is effectively the same for sub-queries/inline views. You could specify the function for a field in a sub-query, then never use that field, in which case the function never gets called.
As an aside: scalar functions are indeed notoriously expensive in many regards. You may be able to reduce its cost by rewriting it as an inline table-valued function.
SELECT
myTable.*,
myFunction.Value
FROM
myTable
CROSS APPLY
myFunction(myTable.field1, myTable.field2) as myFunction
As long as MyFunction is Inline (not multistatement) and returns only one row for each set of inputs, this often scales much better than Scalar Functions.
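A sketch of that rewrite for the question's Salary function, assuming hypothetical `dbo.DemoPaymentPlans` / `dbo.DemoEmployees` tables and an invented salary formula (BaseAmount * Multiplier), since the real function body isn't shown. OUTER APPLY is used so that employees without a matching plan row still appear with a NULL payment, as they would with the scalar version; CROSS APPLY would drop them.

```sql
-- Hypothetical tables; the salary formula is invented for the example.
CREATE TABLE dbo.DemoPaymentPlans
(
    PaymentPlanID INT           NOT NULL PRIMARY KEY,
    BaseAmount    DECIMAL(10,2) NOT NULL,
    Multiplier    DECIMAL(5,2)  NOT NULL
);
CREATE TABLE dbo.DemoEmployees
(
    ID            INT         NOT NULL PRIMARY KEY,
    Name          VARCHAR(50) NOT NULL,
    PaymentPlanID INT         NOT NULL
);
GO
INSERT dbo.DemoPaymentPlans VALUES (1, 1000.00, 1.50), (2, 2000.00, 1.00);
INSERT dbo.DemoEmployees VALUES (1, 'Ann', 1), (2, 'Bob', 2), (3, 'Cy', 1);
GO
-- Scalar UDF: invoked once per row; its body is opaque to the outer plan.
CREATE FUNCTION dbo.Salary (@PaymentPlanID INT)
RETURNS DECIMAL(12,2)
AS
BEGIN
    RETURN (SELECT BaseAmount * Multiplier
            FROM dbo.DemoPaymentPlans
            WHERE PaymentPlanID = @PaymentPlanID);
END
GO
-- Same logic as an ITVF returning at most one row per input.
CREATE FUNCTION dbo.SalaryInline (@PaymentPlanID INT)
RETURNS TABLE
AS
RETURN
    SELECT CAST(BaseAmount * Multiplier AS DECIMAL(12,2)) AS Value
    FROM dbo.DemoPaymentPlans
    WHERE PaymentPlanID = @PaymentPlanID;
GO
-- Scalar style
SELECT e.ID, e.Name, dbo.Salary(e.PaymentPlanID) AS Payment
FROM dbo.DemoEmployees AS e;

-- Inline style: the function is expanded into the plan like a join.
SELECT e.ID, e.Name, s.Value AS Payment
FROM dbo.DemoEmployees AS e
OUTER APPLY dbo.SalaryInline(e.PaymentPlanID) AS s;
```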
This is slightly different from making the whole view a table valued function, that returns many rows.
If such a TVF is multi-statement, it WILL call the Salary function for every record. But inline functions can be expanded inline, as if they were SQL macros, and so only call Salary as required, just like the view.
As a general rule for TVFs though, don't return records that will then be discarded.
It should only execute the Salary function for the joined rows. But you are not filtering the tables any further. If ID is a foreign key column and not null then it will execute that function for all the rows.
The actual execution plan is a good place to see for sure.
As said above, the function will only be called for relevant rows. For your further questions, and to get a really good idea of what's happening, you need to gather performance data, either through SQL Profiler or by viewing the actual execution plan and elapsed times. Then test out a few theories and find which performs best.

Use of function calls in stored procedure sql server 2005?

Does the use of function calls in the WHERE clause of a stored procedure slow down performance in SQL Server 2005?
SELECT * FROM Member M
WHERE LOWER(dbo.GetLookupDetailTitle(M.RoleId,'MemberRole')) != 'administrator'
AND LOWER(dbo.GetLookupDetailTitle(M.RoleId,'MemberRole')) != 'moderator'
In this query, GetLookupDetailTitle is a user-defined function and LOWER() is a built-in function; I am asking about both.
Yes.
Both of these are practices to be avoided where possible.
Applying almost any function to a column makes the expression unsargable which means an index cannot be used and even if the column is not indexed it makes cardinality estimates incorrect for the rest of the plan.
Additionally your dbo.GetLookupDetailTitle scalar function looks like it does data access and this should be inlined into the query.
The query optimiser (in SQL Server 2005; SQL Server 2019 added inlining for eligible scalar UDFs) does not inline logic from scalar UDFs, and your query will be performing this lookup for each row in your source data, which will effectively enforce a nested loops join irrespective of its suitability.
Additionally this will actually happen twice per row because of the 2 function invocations. You should probably rewrite as something like
SELECT M.* /*But don't use * either, list columns explicitly... */
FROM Member M
WHERE NOT EXISTS(SELECT *
FROM MemberRoles R
WHERE R.MemberId = M.MemberId
AND R.RoleId IN (1,2)
)
Don't be tempted to replace the literal values 1,2 with variables with more descriptive names as this too can mess up cardinality estimates.
Applying a function to a column in a WHERE clause forces a scan (of the table or of an index) instead of a seek.
There's no way to seek on an index, since the engine can't know what the result will be until it runs the function on every row in the table.
You can avoid both the user-defined function and the built-in by
defining "magic" values for administrator and moderator roles and compare Member.RoleId against these scalars
defining IsAdministrator and IsModerator flags on a MemberRole table and join with Member to filter on those flags
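A sketch of the second option, assuming hypothetical `dbo.DemoMemberRole` / `dbo.DemoMember` tables with invented rows (the question's real lookup schema isn't shown). The filter becomes a plain join on RoleId plus comparisons on flag columns, with no UDF or LOWER() applied to any column.

```sql
-- Hypothetical role lookup carrying explicit flags.
CREATE TABLE dbo.DemoMemberRole
(
    RoleId          INT         NOT NULL PRIMARY KEY,
    Title           VARCHAR(50) NOT NULL,
    IsAdministrator BIT         NOT NULL,
    IsModerator     BIT         NOT NULL
);
CREATE TABLE dbo.DemoMember
(
    MemberId INT NOT NULL PRIMARY KEY,
    RoleId   INT NOT NULL REFERENCES dbo.DemoMemberRole (RoleId)
);
GO
INSERT dbo.DemoMemberRole VALUES (1, 'Administrator', 1, 0), (2, 'Moderator', 0, 1), (3, 'Member', 0, 0);
INSERT dbo.DemoMember VALUES (100, 1), (101, 2), (102, 3), (103, 3);
GO
-- Members who are neither administrators nor moderators.
SELECT M.MemberId, M.RoleId
FROM dbo.DemoMember AS M
INNER JOIN dbo.DemoMemberRole AS R
    ON R.RoleId = M.RoleId
WHERE R.IsAdministrator = 0
  AND R.IsModerator = 0;
```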

Table Valued Function where did my query plan go?

I've just wrapped a complex SQL Statement in a Table-valued function on SQLServer 2000.
When looking at the query plan for SELECT * FROM dbo.NewFunc() it just gives me a Table Scan of the table I have created.
I'm guessing that this is because table is created in tempdb and I am just selecting from it.
So the query is simply :
SELECT * FROM table in tempdb
My questions are:
Is the UDF using the same plan as the complex SQL statement?
How can I tune indexes for this UDF?
Can I see the true plan?
Multi-statement table valued functions (TVF) are black boxes to the optimiser for the outer query. You can only see IO, CPU etc from profiler.
The TVF must run to completion and return all rows before any processing happens. That means, for example, a WHERE clause will not be optimised.
So if this TVF returns a million rows, they all have to be sorted first:
SELECT TOP 1 x FROM dbo.MyTVF ORDER BY x DESC
Single statement/inline TVFs do not suffer because they are expanded like macros and evaluated. The example above would evaluate indexes etc.
Also here too: Does query plan optimizer works well with joined/filtered table-valued functions? and Relative Efficiency of JOIN vs APPLY in Microsoft SQL Server 2008
To answer exactly: no, no, and no
I have very few multi statement TVFs: where I do, I have lots of parameters to filter inside the UDF.
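A sketch of that pattern, assuming a hypothetical `dbo.DemoEvents` table and a hypothetical `dbo.GetEvents` function with invented rows. The filters are passed as parameters and applied inside the function, so only matching rows are written to the returned table variable, rather than returning everything and filtering in the outer query.

```sql
-- Hypothetical source table.
CREATE TABLE dbo.DemoEvents
(
    EventID   INT         NOT NULL PRIMARY KEY,
    Category  VARCHAR(20) NOT NULL,
    EventDate DATETIME    NOT NULL
);
GO
INSERT dbo.DemoEvents VALUES
    (1, 'audit', '20191215'), (2, 'audit', '20200110'),
    (3, 'login', '20200115'), (4, 'audit', '20200201');
GO
CREATE FUNCTION dbo.GetEvents (@Category VARCHAR(20), @FromDate DATETIME)
RETURNS @Result TABLE
(
    EventID   INT         NOT NULL PRIMARY KEY,
    Category  VARCHAR(20) NOT NULL,
    EventDate DATETIME    NOT NULL
)
AS
BEGIN
    -- Filter here, where the optimizer can use the base table's indexes,
    -- instead of returning all rows and filtering outside the black box.
    INSERT @Result (EventID, Category, EventDate)
    SELECT EventID, Category, EventDate
    FROM dbo.DemoEvents
    WHERE Category = @Category
      AND EventDate >= @FromDate;
    RETURN;
END
GO
SELECT * FROM dbo.GetEvents('audit', '20200101');
```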