What is the best way to write an SQL query that calls a user-defined function as part of the query before producing the selected output? That is, I want to do something like the query below, where the user-defined function does some calculations on the table data.
select field1, field2 from table1 where function(table1.field3, table1.field4) > 10
Your (scalar) function:
CREATE FUNCTION my_function (@a AS int, @b AS int)
RETURNS int
BEGIN
RETURN @a * @b
END
Your query:
SELECT field1, field2
FROM table1
WHERE dbo.my_function(table1.field3, table1.field4) > 10
Don't forget the dbo in dbo.my_function. It is required for user defined scalar functions.
A stored procedure may also work well, or you can simply write the math/string condition directly in the query. It depends on how complicated your condition is.
Related
I've been developing a few stored procedures, and I keep repeating a portion of code that derives a column from a few other columns. So instead of copying this piece of code from one stored procedure to another, I'm thinking of having a function that takes the input columns and produces the output column.
Basically, the function goes as:
SELECT columnA, columnB, columnC, myFunction(columnA, columnB) as columnD FROM myTable
As we can see, this function will take column A and column B as inputs, then return column D.
However, based on some research, using a UDF (user-defined function) like this seems to cause performance issues. Is that true? What's the best way to handle this situation?
Thank you guys.
Scalar functions and multi statement table valued user defined functions can cause performance issues, because they implicitly turn your set based operation into a cursor based operation.
However, inline table valued user defined functions do not suffer from this problem. They're fast.
The difference is how you declare the function, and what the code looks like inside it. A multi statement function does what it says on the tin - it lets you have multiple statements. Like this:
create function slow() returns @t table(j int, k int) as
begin
declare @j int = 1; -- statement 1
declare @k int = 2; -- statement 2
insert @t values (@j, @k); -- statement 3
return; -- statement 4
end
An inline table valued function does not return a named table which is populated inside the function. It returns a select statement:
create function quick() returns table as
return
(
select j = 1, k = 2
);
The inline table valued function can be "inlined" into the outer select statement, in much the same way as a view. The difference, of course, being that the UDF can take parameters, whereas a view cannot.
You also have to use them differently. Use cross apply:
select t.columnA, t.columnB, u.j, u.k
from MyTable t
cross apply quick(t.columnA, t.columnB) u
In case it's not clear - yes, in your case you only want a "scalar" value back, but that's just a table valued function which returns a single column and a single row. So instead of writing a scalar function, write an inline table valued function that does the same job, and cross apply it.
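As a minimal sketch of that last suggestion, assuming the derived column is simply the product of two int inputs (the name dbo.DeriveColumnD and the multiplication are hypothetical placeholders for the real logic), a parameterized inline table valued function and its call might look like this:

```sql
-- Hypothetical inline TVF: returns one row with one column computed from the inputs.
CREATE FUNCTION dbo.DeriveColumnD (@a int, @b int)
RETURNS TABLE
AS
RETURN
(
    SELECT columnD = @a * @b   -- placeholder for the real derivation
);
GO

-- One output row per input row, because the function returns exactly one row.
SELECT t.columnA, t.columnB, t.columnC, d.columnD
FROM myTable AS t
CROSS APPLY dbo.DeriveColumnD(t.columnA, t.columnB) AS d;
```

Because the function body is a single SELECT, the optimizer can expand it into the outer query much as it would a view, which is the property the answer relies on.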
I have a query that should be reused in many scenarios. This query receives some parameters.
Because it has to be reused, it can't be a stored procedure. So, it's created as a Function (not a View, because it needs some parameters).
This is the best approach so far, right?
The issue is that this query returns data that needs some post processing, i.e. reused in some other queries. I'm facing the issue about reusing them in other queries.
Example:
Function GetMyFirstData returns several columns, including a FootNoteSymbol column. I should create another Function (GetFootnoteText) to return the text (and some other details) about these footnotes.
How should I create the second function that will receive as a parameter the FootNoteSymbol (many) returned by the first function GetMyFirstData?
I'm avoiding Stored Procedure, because these results will most likely be reused in other queries.
Also, the FootNoteSymbol is also returned in many other functions, with different return structures (therefore I can't create a TableType, because the structure is not fixed - however FootNoteSymbol is common among all of them).
Using SQL Server 2008 R2.
Functions that return data:
CREATE FUNCTION GetMyFirstData
(
@Param1 int,
@Param2 int
)
RETURNS @Return TABLE
(
Col1 int,
Col2 int,
FootnoteSymbol int,
Col3 int,
Col4 int
)
AS
BEGIN
-- a multi-statement function must populate its return table explicitly
INSERT INTO @Return
SELECT Col1, Col2, FootnoteSymbol, Col3, Col4
FROM MyData
RETURN;
END
CREATE FUNCTION GetMySecondData
(
@Param1 int,
@Param2 int
)
RETURNS @Return TABLE
(
Col1 int,
FootnoteSymbol int,
Col2 int
)
AS
BEGIN
-- a multi-statement function must populate its return table explicitly
INSERT INTO @Return
SELECT Col1, FootnoteSymbol, Col2
FROM MyOtherData
RETURN;
END
Function that should get footnotes text:
CREATE FUNCTION GetFootnoteText
(
@FootnoteSymbol --this is the issue, how to reuse the footnotesymbols from the other functions
)
RETURNS @Return TABLE
(
Symbol int,
Text text,
OtherDetail nvarchar(200)
)
AS
BEGIN
SELECT Symbol, Text, OtherDetail
FROM MyFootnotes
WHERE Symbol in --this is the issue, how to reuse the footnotesymbols from the other functions
RETURN;
END
Thanks!
DO. NOT. DO. THIS.
Reusing code is a noble goal, but SQL is not the language for it. There are many documented performance problems resulting from your approach. Some quick links: "Query Performance and multi-statement table valued functions", "Improving query plans with the SCHEMABINDING option on T-SQL UDFs", and "Compute Scalars, Expressions and Execution Plan Performance".
I wish I had a good alternative for you, but I don't. Views are OK for query re-use. But attempting to compose SQL table value functions has always ended in disaster, in every engagement I've seen.
Don't do it.
At the very least, stick to Inline Table Value Functions:
The RETURNS clause contains only the keyword table. You do not have to define the format of a return variable, because it is set by the format of the result set of the SELECT statement in the RETURN clause.
There is no function_body delimited by BEGIN and END.
The RETURN clause contains a single SELECT statement in parentheses. The result set of the SELECT statement forms the table returned by the function. The SELECT statement used in an inline function is subject to the same restrictions as SELECT statements used in views.
The table-valued function accepts only constants or @local_variable arguments
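As a rough sketch of what that could look like for the question's data, assuming the two parameters filter on Col1 and Col2 (the question does not say how they are used, so the WHERE clause and the name GetMyFirstDataInline are guesses), the first function could be written inline and then joined to the footnotes like any other table:

```sql
-- Inline version of the question's function; the filter is an assumption.
CREATE FUNCTION dbo.GetMyFirstDataInline (@Param1 int, @Param2 int)
RETURNS TABLE
AS
RETURN
(
    SELECT Col1, Col2, FootnoteSymbol, Col3, Col4
    FROM dbo.MyData
    WHERE Col1 = @Param1 AND Col2 = @Param2   -- hypothetical use of the parameters
);
GO

-- The footnote lookup becomes a plain join on the common FootnoteSymbol column.
SELECT d.Col1, d.Col2, d.Col3, d.Col4, f.[Text], f.OtherDetail
FROM dbo.GetMyFirstDataInline(1, 2) AS d
JOIN dbo.MyFootnotes AS f
    ON f.Symbol = d.FootnoteSymbol;
```

The same join works against any other inline function that exposes a FootnoteSymbol column, so no shared table type is needed.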
As far as I can tell (and I refer you to @SeanLange's comment "You know what your tables look like, what the data is like, what the rules are and what the expected results are. I on the other hand can't see any of that.") you have a basic misunderstanding about how relational databases work. To "solve" the problem presented here using standard relational database practices, I would not split it up into multiple functions (as there is no gain there). Instead I would create an SP that does a JOIN to get all the data you need. Like this:
CREATE PROCEDURE GetData
(
@Param1 int,
@Param2 int
)
AS
BEGIN
SELECT MyData.Col1,
MyData.Col2,
MyFootnotes.Text,
MyFootnotes.OtherDetail,
MyData.Col3,
MyData.Col4
FROM MyData
JOIN MyFootnotes ON MyData.FootnoteSymbol = MyFootnotes.Symbol
END
You don't show how you use the parameters so I can't address that, but I can guess. Let's say the parameters in this function are used in the where clause to limit the results (Col1=@Param1 and Col2=@Param2), but in another case you have different limits (e.g. Col3=@Param1 and Col4=@Param2).
In this case the best way to do it is to make a view that is shared and limited in each SP. I would not use functions as I see no value to them (and a high potential for problems as @RemusRusanu points out). Like this:
CREATE VIEW MyData AS
SELECT MyData.Col1,
MyData.Col2,
MyFootnotes.Text,
MyFootnotes.OtherDetail,
MyData.Col3,
MyData.Col4
FROM MyData
JOIN MyFootnotes ON MyData.FootnoteSymbol = MyFootnotes.Symbol
with
CREATE PROCEDURE GetData1
(
@Param1 int,
@Param2 int
)
AS
BEGIN
SELECT Col1,
Col2,
Text,
OtherDetail,
Col3,
Col4
FROM MyData
WHERE Col1=@Param1 and Col2=@Param2
END
and
CREATE PROCEDURE GetData2
(
@Param1 int,
@Param2 int
)
AS
BEGIN
SELECT Col1,
Col2,
Text,
OtherDetail,
Col3,
Col4
FROM MyData
WHERE Col3=@Param1 and Col4=@Param2
END
I know that as a programmer who has worked in non-relational systems this is not intuitive. However, trust me, this will get you the best results. This is how your server software expects to be used, and over the years it has been tuned to deliver you fast results using a view in this way.
I have a base stored procedure simply returning a select from the database, like this:
CREATE PROCEDURE MyProcedure
AS
BEGIN
SELECT * FROM MyTable
END
GO
But now I need to execute some logic for every row of my select. Depending on the result, I need to either return the row or not. I would have my select statement running with a cursor, checking the rule and returning the row or not. Something like this:
CREATE PROCEDURE MyProcedure
AS
BEGIN
DECLARE CURSOR_MYCURSOR CURSOR FOR SELECT Id, Name FROM MyTable
OPEN CURSOR_MYCURSOR
FETCH NEXT FROM CURSOR_MYCURSOR INTO @OUTPUT1, @OUTPUT2
WHILE (@@FETCH_STATUS=0)
BEGIN
IF (SOME_CHECK)
SELECT @OUTPUT1, @OUTPUT2
ELSE
--WILL RETURN SOMETHING ELSE
END
END
GO
The first problem is that every time I do SELECT @OUTPUT1, @OUTPUT2 the rows are sent back as separate result sets and not in a single table as I need.
Sure, applying some logic to a row sounds like a "FUNCTION" job. But I can't use the result of the function to filter the results being selected. That is because when my check returns false I need to select something else to replace the faulty row. So, I need to return the faulty rows so I can be aware of them and replace by some other row.
The other problem with this method is that I would need to declare quite a few variables so that I can output them through the cursor iteration. Those variables would need to follow the data types of the original table attributes and somehow not get out of sync if something changes in the original tables.
So, what is the best approach to return a single result set based on a criterion?
Thanks in advance.
I would not recommend cursors, but an easy solution to your question is to use a table variable or a temp table:
DECLARE @MyTable TABLE
(
ColumnOne VARCHAR(20)
,ColumnTwo VARCHAR(20)
)
-- or, as a temp table:
CREATE TABLE #MyTable
(
ColumnOne VARCHAR(20)
,ColumnTwo VARCHAR(20)
)
Then, inside your cursor loop, insert the records that match your logic:
INSERT INTO @MyTable VALUES (@Output1, @Output2)
-- or, for the temp table:
INSERT INTO #MyTable VALUES (@Output1, @Output2)
After you are done with the cursor, just select everything from the table:
SELECT * FROM @MyTable
-- or, for the temp table:
SELECT * FROM #MyTable
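Put together, the table variable approach might look like the sketch below. The column types, the even/odd test standing in for the real check, and the 'replaced' value are all placeholders, since the question does not say what the rule or the replacement row is:

```sql
-- Assumed types for MyTable(Id, Name); adjust to match the real table.
DECLARE @Results TABLE (Id int, Name varchar(20));
DECLARE @Id int, @Name varchar(20);

DECLARE MyCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT Id, Name FROM MyTable;

OPEN MyCursor;
FETCH NEXT FROM MyCursor INTO @Id, @Name;

WHILE @@FETCH_STATUS = 0
BEGIN
    IF @Id % 2 = 0   -- placeholder for the real per-row check
        INSERT INTO @Results (Id, Name) VALUES (@Id, @Name);
    ELSE
        INSERT INTO @Results (Id, Name) VALUES (@Id, 'replaced');  -- placeholder replacement row

    -- Advance the cursor; without this the loop never ends.
    FETCH NEXT FROM MyCursor INTO @Id, @Name;
END

CLOSE MyCursor;
DEALLOCATE MyCursor;

-- A single result set, instead of one per row.
SELECT Id, Name FROM @Results;
```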
Can we create a parameterized VIEW in SQL Server 2008?
Or is there any other alternative for this?
Try creating an inline table-valued function. Example:
CREATE FUNCTION dbo.fxnExample (@Parameter1 INTEGER)
RETURNS TABLE
AS
RETURN
(
SELECT Field1, Field2
FROM SomeTable
WHERE Field3 = @Parameter1
)
-- Then call like this, just as if it's a table/view just with a parameter
SELECT * FROM dbo.fxnExample(1)
If you view the execution plan for the SELECT you will not see a mention of the function at all and will actually just show you the underlying tables being queried. This is good as it means statistics on the underlying tables will be used when generating an execution plan for the query.
The thing to avoid would be a multi-statement table valued function as underlying table statistics will not be used and can result in poor performance due to a poor execution plan.
Example of what to avoid:
CREATE FUNCTION dbo.fxnExample (@Parameter1 INTEGER)
RETURNS @Results TABLE(Field1 VARCHAR(10), Field2 VARCHAR(10))
AS
BEGIN
INSERT @Results
SELECT Field1, Field2
FROM SomeTable
WHERE Field3 = @Parameter1
RETURN
END
Subtly different, but with potentially big differences in performance when the function is used in a query.
No, you cannot. But you can create a user-defined table function.
In fact, there is one trick:
create view view_test as
select
*
from
table
where id = (select convert(int, convert(binary(4), context_info)) from master.dbo.sysprocesses
where
spid = @@spid)
...
Then, in the SQL query:
set context_info 2
select * from view_test
is the same as
select * from table where id = 2
but using a UDF is preferable.
As astander has mentioned, you can do that with a UDF. However, for large sets, using a scalar function (as opposed to an inline table-valued function) will perform badly, because the function is evaluated row by row. As an alternative, you could expose the same results via a stored procedure that executes a fixed query with placeholders into which your parameter values are substituted.
(Here's a somewhat dated but still relevant article on row-by-row processing for scalar UDFs.)
Edit: comments re. degrading performance adjusted to make it clear this applies to scalar UDFs.
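A minimal sketch of that stored procedure alternative, reusing the SomeTable/Field1/Field2/Field3 names from the inline function example above (the procedure name is made up for illustration):

```sql
-- Hypothetical procedure: a fixed query with one parameter placeholder.
CREATE PROCEDURE dbo.GetExampleRows
    @Parameter1 int
AS
BEGIN
    SET NOCOUNT ON;

    SELECT Field1, Field2
    FROM SomeTable
    WHERE Field3 = @Parameter1;
END
GO

-- Call it with the parameter value you would have "passed to the view".
EXEC dbo.GetExampleRows @Parameter1 = 1;
```

Unlike the inline function, the result of a procedure cannot be used directly in the FROM clause of another query, so this fits best when the rows go straight back to the caller.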
No. You can use a UDF, to which you can pass parameters.
I do a select and it brings back a list of IDs. I then want to call another procedure for each ID that calculates a result, so I end up with a list of results.
How can I do this? I need some kind of loop but I am not very good at SQL.
Edit: Microsoft SQL 2008, and purely in SQL
Write a user-defined function that takes in the ID and returns the calculated result. You can then get that result for each ID with a query like this:
SELECT id, DatabaseYouUsed.dbo.functionYouWrote(id)
FROM DatabaseYouUsed.dbo.TableWithIDs
You can have a stored procedure that calls the select to get the IDs, use a cursor on the result list and call the other procedure for each ID. All inside a stored procedure.
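A rough sketch of that cursor approach, assuming the per-ID procedure is called dbo.CalculateResult and hands its value back through an OUTPUT parameter (both the name and the signature are hypothetical, as is the wrapper's name):

```sql
-- Assumes a hypothetical procedure dbo.CalculateResult(@Id int, @Result int OUTPUT).
CREATE PROCEDURE dbo.CalculateAllResults
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @Results TABLE (Id int, Result int);
    DECLARE @Id int, @Result int;

    DECLARE IdCursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT id FROM dbo.TableWithIDs;

    OPEN IdCursor;
    FETCH NEXT FROM IdCursor INTO @Id;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Call the other procedure once per ID and keep its result.
        EXEC dbo.CalculateResult @Id = @Id, @Result = @Result OUTPUT;
        INSERT INTO @Results (Id, Result) VALUES (@Id, @Result);

        FETCH NEXT FROM IdCursor INTO @Id;
    END

    CLOSE IdCursor;
    DEALLOCATE IdCursor;

    -- Return the whole list of results as one result set.
    SELECT Id, Result FROM @Results;
END
```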
If one row generates one result:
CREATE FUNCTION f(@x int) RETURNS int AS BEGIN
RETURN @x * 2
END
GO
SELECT id, x, dbo.f(x) FROM my_table
If one row might generate more than one result:
CREATE FUNCTION f(@x int) RETURNS @r TABLE(result int) AS BEGIN
INSERT INTO @r VALUES(@x)
INSERT INTO @r VALUES(@x * 2)
RETURN
END
GO
SELECT t.id, t.x, r.result FROM my_table t CROSS APPLY dbo.f(t.x) r