Performance improvement to a big IF clause in a SQL Server function

I am maintaining a function in SQL Server 2005 that, based on an integer input parameter, needs to call different functions, e.g.
IF @rule_id = 1
    -- execute function 1
ELSE IF @rule_id = 2
    -- execute function 2
ELSE IF @rule_id = 3
... etc
The problem is that there are a fair few rules (about 100), and although the above is fairly readable, its performance isn't great. At the moment it's implemented as a series of IF's that do a binary-chop, which is much faster, but becomes fairly unpleasant to read and maintain. Any alternative ideas for something that performs well and is fairly maintainable?

I would suggest you generate the code programmatically, e.g. via XML+XSLT. The resulting T-SQL will be the same as you have now, but maintaining it (adding/removing functions) would be much easier.
Inside a function you don't have much choice; using IFs is pretty much the only solution. You can't do dynamic SQL in functions (you can't invoke EXEC). If it's a stored procedure, then you have much more liberty, as you can use dynamic SQL and use tricks like a lookup table: select @function = function from table where rule_id = @rule_id; exec sp_executesql @function;.
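For instance, in a stored procedure the lookup-table trick could look roughly like this. This is only a sketch: the dbo.RuleFunctions table, the procedure name, and the parameter shapes are assumptions for illustration, not anything from the original code.

-- Hypothetical lookup table mapping each rule to the scalar function that implements it
CREATE TABLE dbo.RuleFunctions
(
    rule_id       INT     NOT NULL PRIMARY KEY,
    function_name SYSNAME NOT NULL  -- e.g. N'dbo.fn_Rule1' (made-up name)
);
GO

CREATE PROCEDURE dbo.ExecuteRule
    @rule_id INT,
    @input   INT,
    @result  INT OUTPUT
AS
BEGIN
    DECLARE @function_name SYSNAME, @sql NVARCHAR(MAX);

    -- Resolve which function handles this rule
    SELECT @function_name = function_name
    FROM   dbo.RuleFunctions
    WHERE  rule_id = @rule_id;

    -- Build and run the call dynamically; this is only allowed in a procedure, not in a function.
    -- The name comes from our own lookup table, not from user input.
    SET @sql = N'SELECT @result = ' + @function_name + N'(@input)';

    EXEC sp_executesql @sql,
                       N'@result INT OUTPUT, @input INT',
                       @result = @result OUTPUT,
                       @input  = @input;
END;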

Can you change it so that it execs a function as a string? I'd normally recommend against this sort of dynamic SQL, and there may be better ways if you step back and look at the overall design... but with what is known here you may have found one of the rare exceptions where it's better.
For example:
set @functionCall = 'functionRootName' + cast(@rule_id as varchar(10))
exec @functionCall

Whatever is calling the SQL function - why does it not choose the function?
This seems like a poorly chosen distribution of responsibility.


Is avoiding SQL statements in programs a good idea?

I recently came across a program which is developed using SQL statements stored in a table, with a code for each statement, rather than having the specific SQL statements in the program itself.
So, rather than having code like this:
string query = "SELECT id, name from [Users]";
cmd.ExecuteQuery(query);
They use code like this: (simplified)
string firstQuery = "SELECT queryText from [Queries] where queryCode = 'SELECT_ALL_USERS'";
string userQuery = cmd.ExecuteQuery(firstQuery);//pretend this directly returns the result of the first query
cmd.ExecuteQuery(userQuery);
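For context, the [Queries] lookup table implied by that snippet would look something like this (a sketch; the column types are assumptions):

CREATE TABLE [Queries]
(
    queryCode VARCHAR(50)   NOT NULL PRIMARY KEY,  -- e.g. 'SELECT_ALL_USERS'
    queryText NVARCHAR(MAX) NOT NULL               -- the SQL the application actually runs
);

INSERT INTO [Queries] (queryCode, queryText)
VALUES ('SELECT_ALL_USERS', N'SELECT id, name FROM [Users]');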
The logic behind this as far as I've heard is that it makes the program easier to maintain as the developer is free to change the "user sql" without having to actually change the program.
However, this struck me as maybe a little counterproductive. Would this kind of code be considered a good idea?
EDIT: I'm not looking for suggestions like "use an ORM". Assume that sql queries are the only option.
In my opinion, this approach is ridiculous. There is value (maintainability, modularity) in separating as much SQL from the middle tier as possible, but to accomplish this end, I would recommend using stored procedures.
No, I really don't think it's a good idea to proceed further with this design.
As a test or learning activity it's a different matter, but going forward with such an implementation is definitely not advisable.
Pros:
1. We get complete modularity. The real business schema can change at any time, and we do not need to modify the running application to get results from a different schema (provided the result format doesn't change).
Cons:
1. With this implementation we fire two SQL statements at the database every time we want to execute one. I/O calls, including DB calls, are always a performance hit, and with this implementation we are doubling them, which is definitely not advisable.

Database Function VS Case Statement

Yesterday we hit a scenario where we had to get the type of a DB field and, based on that, write out a description of the field. Like:
Select ( Case DB_Type When 'I' Then 'Intermediate'
When 'P' Then 'Pending'
Else 'Basic'
End)
From DB_table
I suggested writing a DB function instead of this CASE statement because that would be more reusable. Like:
Select dbo.GetTypeName(DB_Type)
from DB_table
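For reference, the function being proposed would just wrap the same CASE expression, something along these lines (a sketch; the parameter and return types are assumptions):

CREATE FUNCTION dbo.GetTypeName (@DB_Type CHAR(1))
RETURNS VARCHAR(20)
AS
BEGIN
    RETURN ( CASE @DB_Type
                 WHEN 'I' THEN 'Intermediate'
                 WHEN 'P' THEN 'Pending'
                 ELSE 'Basic'
             END );
END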
The interesting part is, one of our developers said using a database function will be inefficient, as database functions are slower than CASE statements. I searched the internet to find out which approach is better in terms of efficiency, but unfortunately I found nothing that could be considered a satisfying answer. Please enlighten me with your thoughts: which approach is better?
A UDF is always slower than a CASE statement.
Please refer to this article:
http://blogs.msdn.com/b/sqlserverfaq/archive/2009/10/06/performance-benefits-of-using-expression-over-user-defined-functions.aspx
The following article suggests when to use a UDF:
http://www.sql-server-performance.com/2005/sql-server-udfs/
Summary:
There is a large performance penalty paid when a user-defined function is used. This penalty shows up as poor query execution time when a query applies a UDF to a large number of rows, typically 1000 or more. The penalty is incurred because the SQL Server database engine must create its own internal cursor-like processing: it must invoke each UDF on each row. If the UDF is used in the WHERE clause, this may happen as part of filtering the rows. If the UDF is used in the select list, this happens when creating the results of the query to pass to the next stage of query processing.
It's the row by row processing that slows SQL Server the most.
When using a scalar function (a function that returns one value) the contents of the function will be executed once per row but the case statement will be executed across the entire set.
By operating against the entire set you allow the server to optimise your query more efficiently.
So the theory goes that if the same query is run both ways against a large dataset, the function should be slower. However, the difference may be trivial when operating against your data, so you should try both methods and test them to determine whether any performance trade-off is worth the increased utility of a function.
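A quick way to run that comparison on your own data is to execute both forms with timing statistics switched on, for example:

SET STATISTICS TIME ON;

SELECT ( CASE DB_Type WHEN 'I' THEN 'Intermediate'
                      WHEN 'P' THEN 'Pending'
                      ELSE 'Basic' END )
FROM DB_table;

SELECT dbo.GetTypeName(DB_Type)
FROM DB_table;

SET STATISTICS TIME OFF;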
Your developer is right. Functions will slow down your query.
https://sqlserverfast.com/?s=user+defined+ugly
Calling a function is like:
wrap the parts into paper
put it into a bag
carry it to the mechanic
let him unwrap it, do something, and wrap the result again
carry it back
use it

SQL Concatenation filling up tempDB

We are attempting to concatenate possibly thousands of rows of text in SQL with a single query. The query that we currently have looks like this:
DECLARE @concatText NVARCHAR(MAX)
SET @concatText = ''
UPDATE TOP (SELECT MAX(PageNumber) + 1 FROM #OrderedPages) [#OrderedPages]
SET @concatText = @concatText + [ColumnText] + '
'
WHERE (RTRIM(LTRIM([ColumnText])) != '')
This is working perfectly fine from a functional standpoint. The only issue we're having is that sometimes the ColumnText can be a few kilobytes in length. As a result, we're filling up tempDB when we have thousands of these rows.
The best reason that we have come up with is that as we're doing these updates to @concatText, SQL is using implicit transactions so the strings are effectively immutable.
We are trying to figure out a good way of solving this problem and so far we have two possible solutions:
1) Do the concatenation in .NET. This is an OK option, but that's a lot of data that may go back across the wire.
2) Use .WRITE which operates in a similar fashion to .NET's String.Join method. I can't figure out the syntax for this as BoL doesn't cover this level of SQL shenanigans.
This leads me to the question: Will .WRITE work? If so, what's the syntax? If not, are there any other ways to do this without sending data to .NET? We can't use FOR XML because our text may contain illegal XML characters.
Thanks in advance.
I'd look at using CLR integration, as suggested in @Martin's comment. A CLR aggregate function might be just the ticket.
What exactly is filling up tempdb? It cannot be @concatText = @concatText + [ColumnText]; there is no immutability involved, and the @concatText variable will be at worst 2GB in size (I expect your tempdb is much larger than that; if not, increase it). It seems more likely that your query plan creates a spool for Halloween protection, and that spool is the culprit.
As a generic answer, using the UPDATE ... SET @var = @var + ... pattern for concatenation is known to have correctness issues and is not supported. Alternative approaches that work more reliably are discussed in Concatenating Row Values in Transact-SQL.
First, from your post, it isn't clear whether or why you need temp tables. Concatenation can be done inline in a query. If you show us more about the query that is filling up tempdb, we might be able to help you rewrite it. Second, an option that hasn't been mentioned is to do the string manipulation outside of T-SQL entirely, i.e., in your middle tier: query for the raw data, do the manipulation, and push it back to the database. Lastly, you can use XML such that the results handle escapes and entities properly. Again, we'd need to know more about what you are trying to accomplish and how.
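To illustrate that last point: the usual XML-based concatenation uses FOR XML PATH with the TYPE directive and .value(), which brings <, > and & back un-entitized (truly illegal XML characters, such as low control characters, remain a separate problem, as the question notes). A sketch against the names used in the question:

DECLARE @concatText NVARCHAR(MAX);

SELECT @concatText =
    ( SELECT [ColumnText] + NCHAR(13) + NCHAR(10)
      FROM   #OrderedPages
      WHERE  RTRIM(LTRIM([ColumnText])) != ''
      ORDER BY PageNumber
      FOR XML PATH(''), TYPE ).value('.', 'NVARCHAR(MAX)');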
Agreed. A CLR user-defined function would be the best approach for what you guys are doing. You could read the text values into an object, join them all together (inside the CLR), and have the function return an NVARCHAR(MAX) result. If you need details on how to do this, let me know.

Why a simple T-SQL UDF makes code execution 3 times slower

I'm rewriting some old stored procedure and I've come across an unexpected performance issue when using a function instead of inline code.
The function is very simple as follow:
ALTER FUNCTION [dbo].[GetDateDifferenceInDays]
(
    @first_date SMALLDATETIME,
    @second_date SMALLDATETIME
)
RETURNS INT
AS
BEGIN
    RETURN ABS(DATEDIFF(DAY, @first_date, @second_date))
END
So I've got two identical queries, but one uses the function and the other does the calculation in the query itself:
ABS(DATEDIFF(DAY, [mytable].first_date, [mytable].second_date))
Now the query with the inline code runs 3 times faster than the one using the function.
What you have is a scalar UDF (takes 0 to n parameters and returns a scalar value). Such UDFs typically force row-by-row processing of your query, unless called with constant parameters, with exactly the kind of performance degradation you're experiencing.
See here, here and here for detailed explanations of the performance pitfalls of using UDFs.
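One common mitigation, if you want to keep the logic reusable, is to rewrite the scalar UDF as an inline table-valued function, which the optimizer can expand into the surrounding plan, and call it with CROSS APPLY. A sketch (the TVF name below is made up for illustration; the column names come from the question):

CREATE FUNCTION dbo.GetDateDifferenceInDaysTVF
(
    @first_date  SMALLDATETIME,
    @second_date SMALLDATETIME
)
RETURNS TABLE
AS
RETURN ( SELECT ABS(DATEDIFF(DAY, @first_date, @second_date)) AS days_diff );
GO

-- Unlike the scalar UDF, this gets inlined into the query plan:
SELECT d.days_diff
FROM [mytable]
CROSS APPLY dbo.GetDateDifferenceInDaysTVF([mytable].first_date, [mytable].second_date) AS d;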
Depending on the usage context, the query optimizer may be able to analyze the inline code and figure out a great index-using query plan, while it doesn't "inline the function" for similarly detailed analysis and so ends up with an inferior query plan when the function is involved. Look at the two query plans, side by side, and you should be able to confirm (or disprove) this hypothesis pretty easily!

Sql Optimization: Xml or Delimited String

This is hopefully just a simple question involving performance optimizations when it comes to queries in Sql 2008.
I've worked for companies that use Stored Procs a lot for their ETL processes as well as some of their websites. I've seen the scenario where they need to retrieve specific records based on a finite set of key values. I've seen it handled in 3 different ways, illustrated via pseudo-code below.
Dynamic SQL that concatenates a string and executes it.
EXEC('SELECT * FROM TableX WHERE xId IN (' + @Parameter + ')')
Using a user defined function to split a delimited string into a table
SELECT * FROM TableY INNER JOIN SPLIT(@Parameter) ON yID = splitId
Using XML as the parameter instead of a delimited varchar value
SELECT * FROM TableZ JOIN @Parameter.nodes(xpath) AS x (y) ON ...
While I know creating the dynamic SQL in the first snippet is a bad idea for a large number of reasons, my curiosity comes from the last two examples. Is it more efficient to do the due diligence in my code to pass such lists via XML, as in snippet 3, or is it better to just delimit the values and use a UDF to take care of it?
There is now a 4th option - table-valued parameters (TVPs), whereby you can actually pass a table of values into a sproc as a parameter and then use it as you would normally use a table variable. I'd prefer this approach over the XML (or CSV parsing) approach.
I can't quote performance figures between all the different approaches, but that's one I'd be trying - I'd recommend doing some real performance tests on them.
Edit:
A little more on TVPs. In order to pass the values into your sproc, you just define a SqlParameter (SqlDbType.Structured) - the value of this can be set to any IEnumerable, DataTable or DbDataReader source. So presumably, you already have the list of values in a list/array of some sort - you don't need to do anything to transform it into XML or CSV.
I think this also makes the sproc clearer, simpler and more maintainable, providing a more natural way to achieve the end result. One of the main points is that SQL performs best at set-based activities, not looping or string manipulation.
That's not to say it will perform great with a large set of values passed in. But with smaller sets (up to ~1000) it should be fine.
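On the T-SQL side, the TVP approach looks roughly like this (the type and procedure names here are made up; TableX/xId are taken from the first snippet):

-- Define a table type once
CREATE TYPE dbo.IdList AS TABLE (id INT NOT NULL PRIMARY KEY);
GO

-- The procedure takes the type as a READONLY parameter and joins against it like a table
CREATE PROCEDURE dbo.GetRecordsByIds
    @Ids dbo.IdList READONLY
AS
BEGIN
    SELECT t.*
    FROM TableX AS t
    INNER JOIN @Ids AS i
        ON t.xId = i.id;
END;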
UDF invocation is a little bit more costly than splitting the XML using the built-in function.
However, this only needs to be done once per query, so the performance difference will be negligible.