reuse sql with view or function - sql

I have a sql query that I will be reusing in multiple stored procedures. The query works against multiple tables and returns an integer value based on 2 variables passed to it.
Rather than repeating the query in different stored procedures I want to share it and have 2 options:
create a view to which I can join to based on the variables and get the integer value from it.
create a function again with criteria passed to it and return integer variable
I am leaning towards option 1 but would like opinions on which is better and common practice. Which would be better performance wise etc. (joining to a view or calling function)
EDIT: The RDBMS is SQL Server

If you will always be using the same parametrised predicate to filter the results then I'd go for a parametrised inline table valued function. In theory this is treated the same as a View in that they both get expanded out by the optimiser in practice it can avoid predicate pushing issues. An example of such a case can be seen in the second part of this article.
As Andomar points out in the comments most of the time the query optimiser does do a good job of pushing down the predicate to where it is needed but I'm not aware of any circumstance in which using the inline TVF will perform worse so this seems a rational default choice between the two (very similar) constructs.
The one advantage I can see for the View would be that it would allow you to select without a filter or with different filters so is more versatile.
Inline TVFs can also be used to replace scalar UDFs for efficiency gains as in this example.

You cannot pass variables into a view, so your only option it seems is to use a function. There are two options for this:
a SCALAR function
a TABLE-VALUED function (inline or multi-statement)
If you were returning records, then you could use a WHERE clause from outside a not-too-complex VIEW which can get in-lined into the query within the view, but since all you are returning is a single column integer value, then a view won't work.
An inline TVF can be expanded by the query optimizer to work together with the outer (calling) query, so it can be faster in most cases when compared to a SCALAR function.
However, the usages are different - a SCALAR function returns a single value immediately
select dbo.scalarme(col1, col2), other from ..
whereas an inline-TVF requires you to either subquery it or CROSS APPLY against another table
select (select value from dbo.tvf(col1, col2)), other from ..
-- or
select f.value, t.other
from tbl t
CROSS apply dbo.tvf(col1, col2) f -- or outer apply

I'm going to give you a half-answer because I cannot be sure about what is better in terms of performance, I'm sorry. But then other people have surely got good pieces of advice on that score, I'm certain.
I will stick to your 'common practice' part of question.
So, a scalar function wood seem to me a natural solution in this case. Why, you only want a value, an integer value to be returned - this is what scalar functions are for, isn't it?
But then, if I could see a probability that later I would need more than one value, I might then consider switching to a TVF. Then again, what if you have already implemented your scalar function and used it in many places of your application and now you need a row, a column or a table of values to be returned using basically the same logic?
In my view (no pun intended), a view could become something like the greatest common divisor for both scalar and table-valued functions. The functions would only need to apply the parameters.
Now you have said that you are only planning to choose which option to use. Yet, considering the above, I still think that views can be a good choice and prove useful when scaling your application, and you could actually use both views and functions (if only that didn't upset the performance too badly) just as I have described.

One advantage a TVF has has over a view is that you can force whoever calls it to target a specific index.

Related

Will using a Function to simplify SQL Query massively imapact performance?

I have a query which uses a complicated set of CASE statements, some nested, some with a COALESCE of CASE statements, and it is a pain to manage.
The same basic logic applies to around 12 columns in the SELECT, and could be modularised and placed in a function - with benefits including consistency and ease of prototyping.
Will putting the logic into a function massively impede performance? Are functions inherently slower?
The SELECT Pulls from 5 tables each of around 100,000 rows, so performance is of some importance.
SQL Server 2005
Normally, selecting a result of a scalar function won't hurt much, but filtering by it may easily cost hundreds of seconds (not necessarily though).
If you need to filter by a scalar function result (WHERE col = dbo.scalar_function()), it often helps to make an inline table-valued function instead. It would return its value as the only row of the result table. You would then do inner join with the function result, effectively filtering by the returned value. This works because SQL Server is always able to unwind inline table-valued functions and inline them into the calling query.
Note this trick won't work if the function is a multi-step one. These cannot be unwound.

Why are SQL-Server UDFs so limited?

From the MSDN docs for create function:
User-defined functions cannot be used to perform actions that modify the database state.
My question is simply - why?
Yes, a UDF that modifies data may have potentially unwanted side-effects.
Yes, there is overhead involved if a UDF is called thousands of times.
But that is the whole point of design and testing - to ensure that such issues are ironed out before deployment. So why do DB vendors insist on imposing these artificial limitations on developers? What is the point of a language construct that can essentially only be used as a wrapper for select statements?
The reason for this question is as follows: I am writing a function to return a GUID for a certain unique integer ID. If a GUID is already allocated for that ID I simply return it; otherwise I want to generate a new GUID, store that into a table, and return the newly-generated GUID. (Yes, this sounds long-winded and possibly crazy, but when you're sending data to another dev company who believes their design was handed down by God and cannot be improved upon, it's easier just to smile and nod and do what they ask).
I know that I can use a stored procedure with an output parameter to achieve the same result, but then I have to declare a new variable just to hold the result of the sproc. Not only that, I then have to convert my simple select into a while loop that inserts into a temporary table, and call the sproc for every iteration of that loop.
It's usually best to think of the available tools as a spectrum, from Views, through UDFs, out to Stored Procedures. At the one end (Views) you have a lot of restrictions, but this means the optimizer can actually "see through" the code and make intelligent choices. At the other end (Stored Procedures), you've got lots of flexibility, but because you have such freedom, you lose some abilities (e.g. because you can return multiple result sets from a stored proc, you lose the ability to "compose" it as part of a larger query).
UDFs sit in a middle ground - you can do more than you can do in a view (multiple statements, for example), but you don't have as much flexibility as a stored proc. By giving up this freedom, it allows the outputs to be composed as part of a larger query. By not having side effects, you guarantee that, for example, it doesn't matter in which row order the UDF is applied in. If you could have side effects, the optimizer might have to give an ordering guarantee.
I understand your issue, I think, but taking this from your comment:
I want to do something like select my_udf(my_variable) from my_table, where my_udf either selects or creates the value it returns
So you want a select that (potentially) modifies data. Can you look at that sentence on its own and tell me that that reads perfectly OK? - I certainly can't.
Reading your description of what you actually need to do:
I am writing a function to return a
GUID for a certain unique integer ID.
If a GUID is already allocated for
that ID I simply return it; otherwise
I want to generate a new GUID, store
that into a table, and return the
newly-generated GUID.
I know that I can use a stored
procedure with an output parameter to
achieve the same result, but then I
have to declare a new variable just to
hold the result of the sproc. Not only
that, I then have to convert my simple
select into a while loop that inserts
into a temporary table, and call the
sproc for every iteration of that
loop.
from that last sentence it sounds like you have to process many rows at once, so how about a single INSERT that inserts the GUIDs for those IDs that don't already have them, followed by a single SELECT that returns all the GUIDs that (now) exist?
Sometimes if you cannot implement the solution you came up with, it may be an indication that your solution is not optimal.
Using a statement like this
INSERT INTO IntGuids(IntValue, GuidValue)
SELECT MyIntValues.IntValue, NEWID()
FROM MyIntValues
LEFT OUTER JOIN IntGuids ON MyIntValues.IntValue = IntGuids.IntValue
WHERE IntGuids.IntValue IS NULL
creates all the GUIDs you need to have in 1 statement. No need to SELECT+INSERT for every single value.

SQL Query theory question - single-statement vs multi-statement queries

When I write SQL queries, I find myself often thinking that "there's no way to do this with a single query". When that happens I often turn to stored procedures or multi-statement table-valued functions that use temp tables (of one sort or another) and end up simply combining the results and returning the result table.
I'm wondering if anyone knows, simply as a matter of theory, whether it should be possible to write ANY query that returns a single result set as a single query (not multiple statements). Obviously, I'm ignoring relevant points such as code readability and maintainability, maybe even query performance/efficiency. This is more about theory - can it be done... and don't worry, I certainly don't plan to start forcing myself to write a single-statement query when multi-statement will better suit my purpose in all cases, but it might make me think twice or a little bit longer on whether there is a viable way to get the result from a single query.
I guess a few parameters are in order - I'm thinking of a relational database (such as MS SQL) with tables that follow common best practices (such as all tables having a primary key and so forth).
Note: in order to win 'Accepted Answer' on this, you'll need to provide a definitive proof (reference to web material or something similar.)
I believe it is possible. I've worked with very difficult queries, very long queries, and often, it is possible to do it with a single query. But most of the time, it's harder to mantain, so if you do it with a single query, make sure you comment your query carefully.
I've never encountered something that could not be done in a single query.
But sometimes it's best to do it in more than one query.
At least with the a recent version of Oracle is absolutely possible. It has a 'model clause' which makes sql turing complete. ( http://blog.schauderhaft.de/2009/06/18/building-a-turing-engine-in-oracle-sql-using-the-model-clause/ ). Of course this is all with the usual limitation that we don't really have unlimited time and memory.
For a normal sql dialect without these abdominations I don't think it is possible.
A task that I can't see how to implement in 'normal sql' would be:
Assume a table with a single column of type integer
For every row
'take the value at the current row and go that many rows back, fetch that value, go that many rows back, and continue until you fetch the same value twice consecutively and return that as the result.'
I can't prove it, but I believe the answer is a cautious yes - provided your database design is done properly. Usually being forced to write multiple statements to get a certain result is a sign that your schema may need some improvements.
I'd say "yes" but can't prove it. However, my main thought process:
Any select should be a set based operation
Your assumption is that you are dealing with mathematically correct sets (ie normalised correctly)
Set theory should guarantee it's possible
Other thoughts:
Multiple SELECT statement often load temp tables/table variables. These can be derived or separated in CTEs.
Any RBAR processing (for good or bad) now be dealt with CROSS/OUTER APPLY onto derived tables
UDFs would be classed as "cheating" in this context I feel, because it allows you to put a SELECT into another module rather than in your single one
No writes allowed in your "before" sequence of DML: this changes state from SELECT to SELECT
Have you seen some of the code in our shop?
Edit, glossary
RBAR = Row By Agonising Row
CTE = Common Table Expression
UDF = User Defined Function
Edit: APPLY: cheating?
SELECT
*
FROM
MyTable1 t1
CROSS APPLY
(
SELECT * FROM MyTable2 t2
WHERE t1.something = t2.something
) t2
In theory yes, if you use functions or a torturous maze of OUTER APPLYs or sub-queries; however, for readability and performance, we have always ended up going with temp tables and multi-statement stored procedures.
As someone above commented, this is usually a sign that your data structure is starting to smell; not that it's bad, but that maybe it's time to denormalise for performance reasons (happens to the best of us), or maybe put a denormalised querying layer in front of your normalised "real" data.

Difference between a inline function and a view

I am a newbie in using functions and it appears to me that an inline function is very similar to a view. Am I correct?
Also, can I have UPDATE statements within a function?
After reading many of the answers here, I'd like to note that there is a big difference between an inline table-valued function and any other kind of function (scalar or multi-line TVF).
An inline TVF is simply a parameterized view. It can be expanded and optimized away just like a view. It is not required to materialize anything before "returning results" or anything like that (although, unfortunately, the syntax has a RETURN.
A big benefit I've found of an inline TVF over a view is that it does force required parameterization whereas with a view, you have to assume that the caller will appropriately join or restrict the usage of the view.
For example, we have many large fact tables in DW with a typical Kimball star model. I have a view on a fact table-centered model, which called without any restriction, will return hundreds of millions of rows. By using an inline TVF with appropriate parameterization, users are unable to accidentally ask for all the rows. Performance is largely indistinguishable between the two.
No difference. They are both expanded/unnested into the containing query.
Note: indexed views are considered differently but still may be expanded, and multi-valued table functions are black boxes to the containing query.
Tony Rogerson: VIEWS - they offer no optimisation benefits; they are simply inline macros - use sparingly
Adam Machanic: Scalar functions, inlining, and performance: An entertaining title for a boring post
Related SO question: Does query plan optimizer works well with joined/filtered table-valued functions?
Scary DBA (at the end)
Finally, writes to tables are not allowed in functions
Edit, after Eric Z Beard's comment and downvote...
The question and answers (not just mine) are not about scalar udfs. "Inline" means "inline table valued functions". Very different concepts...
Update: Looks like I missed the "inline" part. However, I'm leaving the answer here in case someone wants to read about difference between VIEWs and regular functions.
If you only have a function that does SELECT and output the data, then they are similar. However, even then, they are not the same because VIEWs can be optimized by the engine. For example, if you run SELECT * FROM view1 WHERE x = 10; and you have index on table field that maps to X, then it will be used. On the other hand, function builds a result set prior to searching, so you would have to move WHERE inside it - however, this is not easy because you might have many columns and you cannot ORDER BY all of those in the same select statement.
Therefore, if you compare views and functions for the same task of giving a "view" over data, then VIEWs are a better choice.
BUT, functions can do much more. You can do multiple queries without needing to join tables with JOINS or UNIONs. You can do some complex calculations with the results, run additional queries and output data to the user. Functions are more like stored procedures that are able to return datasets, then they are like views.
Nobody seems to have mentioned this aspect.
You can't have Update statements in an inline function but you can write Update statements against them just as though they were an updatable view.
One big difference is that a function can take parameters whereas a VIEW cannot.
I tend to favour VIEWs, being a Standard and therefore portable implementation. I use functions when the equivalent VIEW would be meaningless without a WHERE clause.
For example, I have a function that queries a relatively large valid-time state table ('history' table). If this was a VIEW and you tried to query it without a WHERE clause you'd get a whole lot of fairly data (eventually!) Using a function establishes a contract that if you want the data then you must supply a customer ID, a start date and an end date and the function is how I establish this contact. Why not a stored proc? Well, I expect a user to want to JOIN the resultset to further data (tables, VIEWs, functions, etc) and a function is IMO the best way of doing this rather then, say, requiring the user to write the resultset to a temporary table.
Answering your question about updates in a function (msdn):
The only changes that can be made by
the statements in the function are
changes to objects local to the
function, such as local cursors or
variables. Modifications to database
tables, operations on cursors that are
not local to the function, sending
e-mail, attempting a catalog
modification, and generating a result
set that is returned to the user are
examples of actions that cannot be
performed in a function.
A view is a "view" of data that is returned from a query, almost a pseudo-table. A function returns a value/table usually derived from querying the data. You can run any sql statement in a function provided the function eventually returns a value/table.
A function allows you to pass in parameters to create a more specific view. Lets say you wanted to have grab customers based on state. A function would allow you to pass in the state you are looking for and give you all the customers by that state. A view can't do that.
A function performs a task, or many tasks. A view retrieves data via a query. What ever fits in that query is what you are limited too. In a function I can update, select, create table variables, delete some data, send an email, interact with a CLR that I create, etc. Way more powerful than a lowly view!

Best way to use a Scalar UDF that needs to appear in the Select and Where clauses

I have an expensive scalar UDF that I need to include in a select statement and use that value to narrow the results in the where clause. The UDF takes parameters from the current row so I can't just store it in a var and select from that.
Running the UDF twice per row just feels wrong:
Select someField,
someOtherField,
dbo.MyExpensiveScalarUDF(someField, someOtherField)
from someTable
where dbo.MyExpensiveScalarUDF(someField, someOtherField) in (aHandfulOfValues)
How do I clean this up so that the function only runs once per row?
Just because you happen to mention the function twice doesn't mean it will be computed twice per row. With luck, the query optimizer will computed it only once per row. Whether it does or not may depend in part on whether the UDF appears to be deterministic or nondeterministic.
Take a look at the estimated execution plan. Maybe you'll find out you're worrying about nothing.
If it's computed twice, you could try this and see if it changes the plan, but it's still no guarantee:
WITH T(someField,someOtherField,expensiveResult) as (
select someField, someOtherField, dbo.MyExpensiveScalarUDF(someField, someOtherField)
from someTable
)
select * from T
where expensiveResult in (thisVal,thatVal,theotherVal);
Steve is correct - the query plan will probably not re-evaluate identical expressions if the UDF is deterministic.
However, repeating yourself is a potential maintenance problem:
WITH temp AS (
Select someField,
someOtherField,
dbo.MyExpensiveScalarUDF(someField, someOtherField) AS scalar
from someTable
)
SELECT *
FROM temp
where scalar in (aHandfulOfValues)
You can avoid it with a CTE or nested query.
Scalar UDFs are best to be avoided if at all possible for rowsets of any significant size (say a half million evaluations). If you expand it inline here (and with a CTE you won't have to repeat yourself), you'll probably find a huge performance boost. Scalar UDFs should be a last resort. In my experience you're far better off using a persisted computed column, or inline or just about any other technique before relying on a scalar UDF.
I'd need a lot more detail before I could address this specific question, but two general ideas hit me right off:
(1) Can you make it a table-based function, join it in the FROM clause, and work from there?
(2) Look into the OUTER APPLY and CROSS APPLY join clauses. Essentially, they allow joins on table-based functions, where the parameters passed to the function are based on the row being joined (as opposed to a single call). Good examples of this are in BOL.