Database Function VS Case Statement - sql

Yesterday we got a scenario where we had to get the type of a db field and, based on that, write out the description of the field, like:
SELECT (CASE DB_Type WHEN 'I' THEN 'Intermediate'
             WHEN 'P' THEN 'Pending'
             ELSE 'Basic'
        END)
FROM DB_table
I suggested writing a db function instead of this CASE statement because it would be more reusable, like:
Select dbo.GetTypeName(DB_Type)
from DB_table
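For reference, a minimal sketch of what such a function might look like (the question doesn't show the body of dbo.GetTypeName; this version simply wraps the CASE above):

CREATE FUNCTION dbo.GetTypeName (@DB_Type CHAR(1))
RETURNS VARCHAR(20)
AS
BEGIN
    -- Same mapping as the inline CASE, centralised for reuse
    RETURN (CASE @DB_Type WHEN 'I' THEN 'Intermediate'
                          WHEN 'P' THEN 'Pending'
                          ELSE 'Basic'
            END)
END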
The interesting part is, one of our developers said using a database function would be inefficient, as database functions are slower than CASE statements. I searched the internet to find out which approach is better in terms of efficiency, but unfortunately I found nothing that could be considered a satisfactory answer. Please enlighten me with your thoughts: which approach is better?

A UDF is always slower than a CASE statement.
Please refer to this article:
http://blogs.msdn.com/b/sqlserverfaq/archive/2009/10/06/performance-benefits-of-using-expression-over-user-defined-functions.aspx
The following article suggests when to use a UDF:
http://www.sql-server-performance.com/2005/sql-server-udfs/
Summary:
There is a large performance penalty paid when user-defined functions are used. This penalty shows up as poor query execution time when a query applies a UDF to a large number of rows, typically 1000 or more. The penalty is incurred because the SQL Server database engine must create its own internal cursor-like processing: it must invoke each UDF on each row. If the UDF is used in the WHERE clause, this may happen as part of filtering the rows. If the UDF is used in the select list, it happens when creating the results of the query to pass to the next stage of query processing.
It's the row-by-row processing that slows SQL Server the most.

When using a scalar function (a function that returns one value), the contents of the function will be executed once per row, whereas the CASE statement is evaluated across the entire set.
By operating against the entire set you allow the server to optimise your query more efficiently.
So the theory goes that if the same query is run both ways against a large dataset, the function version should be slower. However, the difference may be trivial when operating against your data, so you should try both methods and test them to determine whether any performance trade-off is worth the increased utility of a function.
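A minimal way to test both, assuming SQL Server (the table and function names are taken from the question):

SET STATISTICS TIME ON;

-- CASE version: evaluated inline as part of the set-based plan
SELECT (CASE DB_Type WHEN 'I' THEN 'Intermediate'
             WHEN 'P' THEN 'Pending'
             ELSE 'Basic'
        END)
FROM DB_table;

-- UDF version: the function is invoked once per row
SELECT dbo.GetTypeName(DB_Type)
FROM DB_table;

SET STATISTICS TIME OFF;

Compare the CPU and elapsed times reported for the two statements; any gap should widen as the row count grows.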

Your developer is right. Functions will slow down your query.
https://sqlserverfast.com/?s=user+defined+ugly
Calling a function is like:
wrap the parts in paper
put them into a bag
carry it to the mechanic
let him unwrap it, do something, then wrap the result
carry it back
use it

Related

Performance of SQL comparison using substring vs like with wildcard

I am working on a join condition between 2 tables where one of the columns to match on is a concatenation of values. I need to join columnA from tableA to the first 2 characters of columnB from tableB.
I have developed 2 different statements to handle this and I have tried to analyze the performance of each method.
Method 1:
ON tB.columnB like tA.columnA || '%'
Method 2:
ON substr(tB.columnB,1,2) = tA.columnA
The query execution plan has far fewer steps using Method 1 compared to Method 2; however, it looks like Method 2 executes much faster. Also, the execution plan shows a recommended index for Method 2 that could improve its performance.
I am running this on an IBM iSeries, though would be interested in answers in a general sense to learn more about sql query optimization.
Does it make sense that Method 2 would execute faster?
This SO question is similar, but it looks like no one provided any concrete answers about the performance difference between these approaches: T-SQL speed comparison between LEFT() vs. LIKE operator.
PS: The table design that requires this type of join is not something that I can get changed at this time. I realize it would be preferable to keep the fields that hold different types of data separate.
I ran the following in the SQL Advisor in IBM Data Studio on one of the tables in my DB2 LUW 10.1 database:
SELECT *
FROM PDM.DB30
WHERE DB30_SYSTEM_ID = 'XXX'
AND DB30_VERSION_ID = 'YYY'
AND SUBSTR(DB30_REL_TABLE_NM, 1, 4) = 'ZZZZ'
and
SELECT *
FROM PDM.DB30
WHERE DB30_SYSTEM_ID = 'XXX'
AND DB30_VERSION_ID = 'YYY'
AND DB30_REL_TABLE_NM LIKE 'ZZZZ%'
They both had the exact same access path utilizing the same index, the same estimated IO cost, and the same estimated cardinality; the only difference was the estimated total CPU cost, which was 178,343.75 for the LIKE versus 197,518.48 for the SUBSTR (~10% difference).
The cumulative total cost for both was the same though, so this difference is negligible as per the advisor.
Yes, Method 2 would be faster. LIKE is not as efficient an operator.
To compare the performance of various techniques, try using Visual Explain. You will find it buried in System i Navigator. Under your system connection, expand databases, then click on your RDB name. In the lower right pane you can then click on the option to Run an SQL Script. Enter your SELECT statement and choose the menu option for Visual Explain or Run and Explain. Visual Explain will break down the execution plan for your statement and show you the cost for each part, as estimated against your tables with the indexes available.
You can actually run real examples against your database.
LIKE was always faster in my runs:
select count(*) from u_log where log_text like 'AUT%';
1 row(s) returned : 90ms taken
select count(*) from u_log where substr(log_text,1,3)='AUT';
1 row(s) returned : 493ms taken
I found this reference in an IBM redbook related to SQL performance. It sounds like the SUBSTR scalar function can be handled in an optimized manner by an iSeries.
If you search for the first character and want to use the SQE instead
of the CQE, you can use the scalar function SUBSTRING on the left side
of the equal sign. If you have to search for additional characters in
the string, you can additionally use the scalar function POSSTR. By
splitting the LIKE predicate into several scalar functions, you can
influence the query optimizer to use the SQE.
http://publib-b.boulder.ibm.com/abstracts/sg246654.html?Open
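To make that concrete, here is a sketch of the rewrite against the table from the answer above (the second search string is hypothetical):

-- Original predicate, which may be routed to the CQE:
-- WHERE DB30_REL_TABLE_NM LIKE 'ZZZZ%ABC%'

-- Split into scalar functions so the SQE can handle it:
WHERE SUBSTR(DB30_REL_TABLE_NM, 1, 4) = 'ZZZZ'
  AND POSSTR(DB30_REL_TABLE_NM, 'ABC') > 0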

Why is the load method of a datatable sometimes so slow?

The project is a web app in ASP/VB.net. The issue is that some pages are mind-bogglingly slow. After trying to track down the bottleneck, it was discovered to be the Load method when filling a DataTable with query results.
We are using an Oracle database, and queries are executed in stored procedures. As an example, we have a relatively simple SELECT statement within a procedure that returns 2 columns and 6 rows and was determined to take about 0.015 seconds to execute. However, it takes on average 7 seconds to load the OracleDataReader into a DataTable - a ridiculous amount of time for such a small record set. After messing around with the query, I found that a simple DECODE statement appeared to be causing the issue. The DECODE statement is used similarly to the following:
WHERE
DECODE(iBln, 1, column1, column2) BETWEEN iDate1 and iDate2
The iBln variable is simply a number passed in to act as a boolean for determining which column should fall between the two dates. If I comment this DECODE statement out and make it simply "column1 BETWEEN iDate1 AND iDate2", then the Load method takes no time at all, as it should, signifying that it is indeed the DECODE statement causing issues.
So I'm just hoping to hear from anyone who has an idea as to what's causing this or how to fix it. It's a simple DECODE; how does it even affect the Load method anyway?
I would verify that indexes exist for column1 and column2. If so, the likely problem is that the DECODE is preventing the use of the indexes. Try rewriting as:
WHERE ( ( iBln = 1 AND column1 BETWEEN iDate1 AND iDate2)
OR
( (iBln IS NULL OR iBln <> 1) AND column2 BETWEEN iDate1 AND iDate2)
)
If your stored procedure is returning a REF CURSOR, opening the cursor in the stored procedure will be very fast regardless of the query you're executing. Opening a cursor doesn't require that Oracle do any of the work of actually running the query, it just requires that Oracle determine the query plan which should be more or less instantaneous.
How long does it take to fetch the data from the REF CURSOR in something like SQL*Plus? If it takes something close to 7 seconds (as I suspect it will), you can eliminate the OracleDataReader class as the source of the problem. In that case, the problem would almost certainly be that the query plan is inefficient.
Based on your description, my guess is that column1 is indexed; column2 may also be indexed, though that's not clear. But a regular index on either column1 or column2 cannot be used to evaluate a predicate that involves the call to the DECODE function. If there are no other predicates on indexed columns, that may force Oracle to do a table scan on the underlying table (posting the full query, the table definition, and the query plan would be helpful).
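To run the SQL*Plus timing test suggested above, something like this sketch would do (the procedure and schema names are hypothetical):

SQL> REM Opening the cursor should be near-instant
SQL> VARIABLE rc REFCURSOR
SQL> SET TIMING ON
SQL> EXEC my_schema.my_proc(:rc)
SQL> REM Fetching the rows is where the 7 seconds would show up
SQL> PRINT rc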

Performance of SQL functions vs. code functions

We're currently investigating the load against our SQL server and looking at ways to alleviate it. During my post-secondary education, I was always told that, from a performance standpoint, it was cheaper to make SQL Server do the work. But is this true?
Here's an example:
SELECT ord_no FROM oelinhst_sql
This returns 783119 records in 14 seconds. The field is a char(8), but all of our order numbers are six digits long, so each has two leading blank characters. We typically trim this field, so I ran the following test:
SELECT LTRIM(ord_no) FROM oelinhst_sql
This returned the 783119 records in 13 seconds. I also tried one more test:
SELECT LTRIM(RTRIM(ord_no)) FROM oelinhst_sql
There is nothing to trim on the right; I was just trying to see if there was any overhead in the mere act of calling the function. It still returned in 13 seconds.
My manager was talking about moving things like string trimming out of the SQL and into the source code, but the test results suggest otherwise. My manager also says he heard somewhere that using SQL functions meant that indexes would not be used. Is there any truth to this either?
Only optimize code that you have proven to be the slowest part of your system. Your data so far indicates that SQL string manipulation functions are not affecting performance at all. Take this data to your manager.
If you use a function or type cast in the WHERE clause, it can often prevent SQL Server from using indexes. This does not apply to transforming returned columns with functions.
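For example, a sketch against the question's table (assuming an index exists on ord_no; the literal values are hypothetical):

-- The function hides the column, so an index on ord_no cannot be seeked:
SELECT ord_no FROM oelinhst_sql WHERE LTRIM(ord_no) = '123456';

-- The column is compared directly, so an index seek is possible:
SELECT ord_no FROM oelinhst_sql WHERE ord_no = '  123456';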
It's typically user-defined functions (UDFs) that get a bad rap with regard to SQL performance, and they might be the source of the advice you're getting.
The reason for this is that you can build some pretty hairy functions that cause massive overhead with exponential effect.
As you've found with RTRIM and LTRIM, this isn't a blanket reason to stop using all functions on the SQL side.
It somewhat depends on what all is encompassed by "things like string trimming", but for string trimming at least, I'd definitely let the database do that (there will be less network traffic as well). As for the indexes, they will still be used if your WHERE clause just uses the column itself (as opposed to a function of the column). Use of the indexes won't be affected at all by applying functions to the columns you're retrieving, only by applying them to the columns you use to select the rows.
You may want to have a look at this for performance improvement suggestions: http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/
As I said in my comment, reduce the data read per query and you will get a speed increase.
You said: "our order numbers are six-digits long so each has two blank characters leading".
That makes me think you are storing numbers in a string; if so, why are you not using a numeric data type? The smallest numeric type that will hold 6 digits is an INT (I'm assuming SQL Server), and that already saves you 4 bytes per order number. Over the number of rows you mention, that's quite a lot less data to read off disk and send over the network.
Fully optimise your database before looking to deal with the data outside of it; it's what a database server is designed to do: serve data.
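To illustrate, a sketch of the change (assuming SQL Server, as above; whether the ALTER is safe depends on constraints and indexes not shown in the question):

-- One-time conversion; CAST tolerates the two leading blanks
ALTER TABLE oelinhst_sql ALTER COLUMN ord_no INT;

-- If anything downstream still expects the char(8) layout:
SELECT RIGHT(SPACE(8) + CAST(ord_no AS VARCHAR(8)), 8) AS ord_no
FROM oelinhst_sql;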
As you found, it often pays to measure, but what I think your manager may have been referring to is something like this.
This is typically much faster
SELECT SomeFields FROM oelinhst_sql
WHERE
datetimeField > '1/1/2011'
and
datetimeField < '2/1/2011'
than this
SELECT SomeFields FROM oelinhst_sql
WHERE
Month(datetimeField) = 1
and
year(datetimeField) = 2011
even though the rows that are returned are the same

LINQ Count .. best method

My company has just started using LINQ, and I'm still having a little trouble with the abstractness (if that's a word) of the LINQ command versus the SQL. My question is:
Dim query = (From o In data.Addresses _
Select o.Name).Count
In the above, in my mind, the SQL returns all rows and then a count is done on the number of rows in the IQueryable result, so I would be better off with:
Dim lstring = Aggregate o In data.Addresses _
Into Count()
Or am I overthinking the way LINQ works? I'm using VB Express at home, so I can't see the actual SQL being sent to the database (I think), as I don't have access to SQL Profiler.
As mentioned, these are functionally equivalent; one just uses query syntax.
As mentioned in my comment, if you evaluate the following as a VB Statement(s) in LINQPad:
Dim lstring = Aggregate o In Test _
Into Count()
You get this in the generated SQL output window:
SELECT COUNT(*) AS [value]
FROM [Test] AS [t0]
Which is the same as the following VB LINQ expression as evaluated:
(From o In Test _
Select o.Symbol).Count
You get the exact same result.
I'm not familiar with Visual Basic, but based on
http://msdn.microsoft.com/en-us/library/bb546138.aspx
Those two approaches are the same. One uses method syntax and the other uses query syntax.
You can find out for sure by using SQL Profiler as the queries run.
PS - The "point" of LINQ is you can easily do query operations without leaving code/VB-land.
An important thing here is that the code you give will work with a wide variety of data sources, and will hopefully do so in a very efficient way, though that can't be fully guaranteed. It certainly will be done in an efficient way with a SQL source (being converted into a SELECT COUNT(*) SQL query). It will be done efficiently if the source is an in-memory collection (it gets converted to calling the Count property). It isn't done very efficiently if the source is an enumerable that is not a collection (in that case it reads everything and counts as it goes), but there really isn't a more efficient way of doing that anyway.
In each case it has done the same conceptual operation, in the most efficient manner possible, without you having to worry about the details. No big deal with counting, but a bigger deal in more complex cases.
To a certain extent, you are right when you say "in my mind, the SQL is returning all rows and then does a count on the number of rows". Conceptually that is what happens in that query, but the implementation may differ. Compare with how the actual execution of a SQL query may not match the literal interpretation of the SQL command, to allow the most efficient approach to be picked.
I think you are missing the point: LINQ to SQL uses late binding, so the query is executed only when you need the result. When you ask for the count, that is when a query is created.
Before that, LINQ to SQL builds expression trees that will be "translated" into SQL when needed.
http://weblogs.asp.net/scottgu/archive/2007/05/19/using-linq-to-sql-part-1.aspx
http://msdn.microsoft.com/en-us/netframework/aa904594.aspx
For how to debug, see Scott's post:
http://weblogs.asp.net/scottgu/archive/2007/07/31/linq-to-sql-debug-visualizer.aspx

Why a simple T-SQL UDF function makes the code execution 3 times slower

I'm rewriting some old stored procedure and I've come across an unexpected performance issue when using a function instead of inline code.
The function is very simple as follow:
ALTER FUNCTION [dbo].[GetDateDifferenceInDays]
(
    @first_date SMALLDATETIME,
    @second_date SMALLDATETIME
)
RETURNS INT
AS
BEGIN
    RETURN ABS(DATEDIFF(DAY, @first_date, @second_date))
END
So I've got two identical queries, but one uses the function and the other does the calculation in the query itself:
ABS(DATEDIFF(DAY, [mytable].first_date, [mytable].second_date))
Now the query with the inline code runs 3 times faster than the one using the function.
What you have is a scalar UDF (takes 0 to n parameters and returns a scalar value). Such UDFs typically force a row-by-row operation in your query, unless called with constant parameters, with exactly the kind of performance degradation that you're experiencing.
See here, here, and here for detailed explanations of the performance pitfalls of using UDFs.
Depending on the usage context, the query optimizer may be able to analyze the inline code and figure out a great index-using query plan, while it doesn't "inline the function" for similarly detailed analysis and so ends up with an inferior query plan when the function is involved. Look at the two query plans, side by side, and you should be able to confirm (or disprove) this hypothesis pretty easily!
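One standard workaround worth adding (not mentioned in the answers above): rewrite the scalar UDF as an inline table-valued function, whose body the optimizer can expand into the calling plan instead of invoking it per row. A sketch, where mytable and its columns are the placeholder names from the question:

CREATE FUNCTION dbo.GetDateDifferenceInDaysTVF
(
    @first_date SMALLDATETIME,
    @second_date SMALLDATETIME
)
RETURNS TABLE
AS
RETURN (SELECT ABS(DATEDIFF(DAY, @first_date, @second_date)) AS days_diff);
GO

-- CROSS APPLY lets the optimizer inline the calculation:
SELECT t.first_date, t.second_date, d.days_diff
FROM mytable AS t
CROSS APPLY dbo.GetDateDifferenceInDaysTVF(t.first_date, t.second_date) AS d;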