SQL Function in column running slow

I have a computed column (backed by a function) that is causing one of my tables to be extremely slow; its output is a column in my table. I thought it might be some logical statements in my function, so I commented those out and just returned the string 'test'. This still caused the table to be slow. I believe the SELECT statement is slowing down the function: when I comment out the select statement, everything is cherry. I think I am not using functions in the correct manner.
FUNCTION [dbo].[Pend_Type](@Suspense_ID int, @Loan_ID nvarchar(10), @Suspense_Date datetime, @Investor nvarchar(10))
RETURNS nvarchar(20)
AS
BEGIN
    DECLARE @Closing_Date datetime, @Paid_Date datetime
    DECLARE @pendtype nvarchar(20)

    --This is the issue!!!!
    SELECT @Closing_Date = Date_Closing, @Paid_Date = Date_Paid FROM [TABLE] WHERE Loan_ID = @Loan_ID

    SET @pendtype = 'test'
    --commented out logic
    RETURN @pendtype
END
UPDATE:
I have another computed column that does something similar and is a column in the same table. This one runs fast. Can anyone see what difference would explain this?
DECLARE @yOrn AS nvarchar(1)
IF ((SELECT count(suspense_ID) FROM [TABLE] WHERE suspense_ID = @suspenseID) = 0)
    SET @yOrn = 'N'
ELSE
    SET @yOrn = 'Y'
RETURN @yOrn

You have isolated the performance problem to the select statement:
SELECT TOP 1 @Closing_Date = Date_Closing, @Paid_Date = Date_Paid
FROM [TABLE]
WHERE Loan_ID = @Loan_ID;
To make this run faster, create a composite index on TABLE(Loan_ID, Date_Closing, Date_Paid).
By the way, you are using top with no order by. When multiple rows match, you can get any one of them back. Normally, top is used with order by.
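For example, to make the row choice deterministic (assuming, for illustration, that the latest closing date is the intended row):

SELECT TOP 1 @Closing_Date = Date_Closing, @Paid_Date = Date_Paid
FROM [TABLE]
WHERE Loan_ID = @Loan_ID
ORDER BY Date_Closing DESC;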
EDIT:
You can create the index by issuing the following command:
create index idx_table_loan_closing_paid on [TABLE](Loan_ID, Date_Closing, Date_Paid);

Scalar functions are often executed like cursors, one row at a time; that is why they are slow and best avoided. I would not use the function as written but would write a set-based version instead. Incidentally, a SELECT TOP 1 without an ORDER BY column will not always give you the same record and is generally poor practice. In this case I would think you would want, for instance, the latest date or the earliest one.
In this particular case I think you would be better off not using a function but using a derived table join.
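As a hedged sketch of that idea, reusing the question's names (the main table name below is made up, and the CASE merely stands in for the commented-out pend-type rules):

SELECT s.Suspense_ID,
       s.Loan_ID,
       -- real pend-type rules go here; 'test' mirrors the stubbed function
       CASE WHEN t.Date_Closing IS NOT NULL THEN 'test' END AS Pend_Type
FROM Suspense s  -- hypothetical main table
LEFT JOIN (SELECT Loan_ID, MAX(Date_Closing) AS Date_Closing, MAX(Date_Paid) AS Date_Paid
           FROM [TABLE]
           GROUP BY Loan_ID) t ON t.Loan_ID = s.Loan_ID;

This touches [TABLE] once for the whole result set instead of once per row.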

Related

T-Sql - Select query in another select query takes long time

I have a procedure with arguments, but calling it takes a very long time. I decided to check what was wrong with my query and came to the conclusion that the problem is the Column IN (SELECT [...]) construct.
Both queries return 1500 rows.
First query: 45 seconds
Second query: 0 seconds
1.
declare @FILTER_OPTION int
declare @ID_DISTRIBUTOR type_int_value
declare @ID_DATA_TYPE type_bigint_value
declare @ID_AGGREGATION_TYPE type_int_value

set @FILTER_OPTION = 8
insert into @ID_DISTRIBUTOR values (19)
insert into @ID_DATA_TYPE values (30025)
insert into @ID_AGGREGATION_TYPE values (10)

SELECT * FROM dbo.[DATA] WHERE
[ID_DISTRIBUTOR] IN (select [VALUE] from @ID_DISTRIBUTOR)
AND [ID_DATA_TYPE] IN (select [VALUE] from @ID_DATA_TYPE)
AND [ID_AGGREGATION_TYPE] IN (select [VALUE] from @ID_AGGREGATION_TYPE)
2.
select * FROM dbo.[DATA] WHERE
[ID_DISTRIBUTOR] IN (19)
AND [ID_DATA_TYPE] IN (30025)
AND [ID_AGGREGATION_TYPE] IN (10)
Why is this happening?
How should I write a stored procedure that takes an array of arguments and still runs quickly?
Edit:
Maybe it's a problem with indexes? Indexes are created on these three columns.
For such a large performance difference, I would guess that you have one or more indexes. In particular, if you have an index on (ID_DISTRIBUTOR, ID_DATA_TYPE, ID_AGGREGATION_TYPE), then the second query can make use of the index. SQL Server can recognize that the IN is really = and the query is a simple lookup.
In the first case, SQL Server doesn't "know" that the subqueries really have only one row in them. That requires a different set of optimizations. In particular, the above index cannot be used, because the IN generally optimizes differently from =.
As for what to do. First, look at the execution plans so you can see the difference between the two versions. Then, test the second version with more than one value in the IN lists.
If you can live with just one value for each comparison, then use = rather than IN.
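If you do need multiple values, here is a hedged sketch of a join-based rewrite (this assumes the table-type variables contain distinct values; duplicates would multiply rows in a join where IN would not):

SELECT d.*
FROM dbo.[DATA] d
INNER JOIN @ID_DISTRIBUTOR dist ON d.[ID_DISTRIBUTOR] = dist.[VALUE]
INNER JOIN @ID_DATA_TYPE dt ON d.[ID_DATA_TYPE] = dt.[VALUE]
INNER JOIN @ID_AGGREGATION_TYPE agg ON d.[ID_AGGREGATION_TYPE] = agg.[VALUE]
OPTION (RECOMPILE);

OPTION (RECOMPILE) lets the optimizer see the actual number of rows in the variables at execution time.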

Optimizing stored procedure with multiple "LIKE"s

I am passing in a comma-delimited list of values that I need to compare to the database
Here is an example of the values I'm passing in:
@orgList = "1123, 223%, 54%"
To use the wildcards I think I have to use LIKE, but the query runs a long time and only returns 14 rows. (The results are correct, but it's just taking forever, probably because I'm using the join incorrectly.)
Can I make it better?
This is what I do now:
declare @tempTable Table (SearchOrg nvarchar(max))

insert into @tempTable
select * from dbo.udf_split(@orgList) as split
-- this splits the values at the comma and puts them in a temp table

-- then I do a join on the main table and the temp table to do a like on it....
-- but I think it's not right because it's too long.
select something
from maintable gt
join @tempTable tt on gt.org like tt.SearchOrg
where
AYEAR = ISNULL(@year, ayear)
and (AYEAR >= ISNULL(@yearR1, ayear) and ayear <= ISNULL(@yearr2, ayear))
and adate = ISNULL(@Date, adate)
and (adate >= ISNULL(@dateR1, adate) and adate <= ISNULL(@DateR2, adate))
The final result would be all rows where the maintable.org is 1123, or starts with 223 or starts with 554
The reason for my date craziness is because sometimes the stored procedure only checks for a year, sometimes for a year range, sometimes for a specific date and sometimes for a date range... everything that's not used is passed in as null.
Maybe the problem is there?
Try something like this:
Declare @tempTable Table
(
    -- Since the column is a varchar(10), you don't want to use nvarchar here.
    SearchOrg varchar(20)
);

INSERT INTO @tempTable
SELECT * FROM dbo.udf_split(@orgList);
SELECT
something
FROM
maintable gt
WHERE
some where statements go here
And
Exists
(
SELECT 1
FROM #tempTable tt
WHERE gt.org Like tt.SearchOrg
)
Such a dynamic query, with optional filters and a LIKE driven by a table (!), is very hard to optimize because almost nothing is statically known. The optimizer has to create a very general plan.
You can do two things to speed this up by orders of magnitude:
Play with OPTION (RECOMPILE). If the compile times are acceptable, this will at least deal with all the optional filters (but not with the LIKE table).
Do code generation and EXEC sp_executesql the code. Build a query with all LIKE clauses inlined into the SQL so that it looks like this: WHERE a LIKE @like0 OR a LIKE @like1 ... (not sure if you need OR or AND). This allows the optimizer to get rid of the join and just execute a normal predicate. A sketch follows.
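Here is a hedged sketch of that code generation (two hard-coded patterns for illustration; a real version would build the parameter list from the split values, and should parameterize each pattern as shown rather than concatenating user input):

DECLARE @sql nvarchar(max)
SET @sql = N'SELECT something
             FROM maintable gt
             WHERE gt.org LIKE @like0 OR gt.org LIKE @like1'

EXEC sp_executesql @sql,
    N'@like0 varchar(20), @like1 varchar(20)',
    @like0 = '1123', @like1 = '223%'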
Your query may be difficult to optimize. Part of the question is what is in the where clause. You probably want to filter these first, and then do the join using like. Or, you can try to make the join faster, and then do a full table scan on the results.
SQL Server should optimize a like statement of the form 'abc%' -- that is, where the wildcard is at the end. (See here, for example.) So, you can start with an index on maintable.org. Fortunately, your examples meet this criteria. However, if you have '%abc' -- the wildcard comes first -- then the optimization won't work.
For the index to work best, it might also need to take into account the conditions in the where clause. In other words, adding the index is suggestive, but the rest of the query may preclude the use of the index.
And, let me add, the best solution for these types of searches is to use the full text search capability in SQL Server (see here).
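For illustration, a minimal sketch of the full-text route (this assumes a full-text index has already been created on maintable.org, which is extra setup not shown here):

-- prefix match on a full-text indexed column
SELECT something
FROM maintable
WHERE CONTAINS(org, '"223*"');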

On SQL Server (2008), if I want to filter a string field that starts with something, what is the best way?

On several SQL queries I need to check if a field starts with a character.
There are several ways to do it, which one is better in performance/standard?
I usually use
tb.field LIKE 'C%'
but I can also use
LEFT(LTRIM(tb.Field),1) = 'C'
I know well the uses of each case, but not in terms of performance.
I'd go with the first one, LIKE 'C%'; it'll use an index on the field if there is one, rather than having to do a full table scan.
If you really need to include the whitespace LTRIM trimming in the query, you could create a persisted computed column with the value LEFT(LTRIM(tb.Field), 1) and put an index on it.
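A sketch of that idea (the computed column and index names below are made up):

ALTER TABLE tb ADD Field_FirstChar AS LEFT(LTRIM(Field), 1) PERSISTED;
CREATE INDEX IX_tb_Field_FirstChar ON tb (Field_FirstChar);

-- the filter can then seek on the new index:
-- SELECT ... FROM tb WHERE Field_FirstChar = 'C'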
LIKE 'C%' is going to perform better than a LEFT(LTRIM()).
The LIKE predicate can still use a supporting index to get at the data you're looking for.
However, when SQL Server encounters LEFT(LTRIM(tb.Field), 1) = 'C', the predicate is no longer sargable: the database can't map it to an index. In order to perform a match, SQL Server must read every row, LTRIM the data and then examine the first character. The end result is, most likely, a full table scan.
The first query is just a bit faster than the other. I've measured it with my query speed measurement script.
Try it yourself:
DECLARE @Measurements TABLE(
    MeasuredTime INT NOT NULL
)
DECLARE @AvgTime INT
DECLARE @TimesMeasured INT
SET @TimesMeasured = 0

WHILE @TimesMeasured < 1000
BEGIN
    DECLARE @StartTime DATETIME
    SET @StartTime = GETDATE()

    -- your select ... or whatever query you want to measure

    INSERT INTO @Measurements
    SELECT DATEDIFF(millisecond, @StartTime, GETDATE())

    SET @TimesMeasured = @TimesMeasured + 1
END

SELECT @AvgTime = AVG(MeasuredTime) FROM @Measurements
SELECT @AvgTime AS AvgMilliseconds

Union or Use Flow Control Logic to Determine which table to report on

I'm working with a 3rd party app where I can't alter the tables. We built custom matching "Monthly" tables with an additional datetime column "AsOfDate", where we dump the data at the end of the month and flag those rows with the date of the last day of the month.
I want to create a single stored procedure (the application is designed to require a view or stored proc as the source of all reports) with a parameter which will either use the current data table (when the parameter is NULL or today's date) or use the month-end table and filter by the end-of-month date. This way, I have one report where the user can see either current data or data from a particular month-end period.
Which would you prefer (and why)? Sorry, this is not fully coded.
Solution #1 Union Query
Create Proc Balance_Report (@AsOfDate datetime)
AS
Select Column1
From
    (Select GetDate() as AsOfDate
        , Column1
     From Current.Balance
     Union
     Select AsOfDate
        , Column1
     From MonthEnd.Balance
    ) AS All_Balances
Where All_Balances.AsOfDate = @AsOfDate
Solution #2 Use If Statement to select table
Create Proc Balance_Report (@AsOfDate datetime)
AS
If @AsOfDate IS NULL or @AsOfDate = GetDate()
    Select GetDate() as AsOfDate
        , Column1
    From Current.Balance
Else
    Select AsOfDate
        , Column1
    From MonthEnd.Balance
    Where AsOfDate = @AsOfDate
Again, this is not fully coded and is sort of db agnostic (but it is SQL Server 2005).
Edit: Variation to Solution #2 using separate stored procedures
Create Proc Balance_Report (@AsOfDate datetime)
AS
If @AsOfDate IS NULL or @AsOfDate = GetDate()
    Exec Current_Balance_Date -- no param necessary
Else
    Exec MonthEnd_Balance_Date @AsOfDate
The way you have things set up, the second method will probably be faster. If you were to use a partitioned view, then you could set up constraints in such a way that the optimizer would know to ignore one or more of the tables in the select, and you would get the same performance. This would also let you keep all of your logic in one statement rather than having to keep two statements in sync. That may or may not be an issue for you based on how complex the SELECT statement is.
One thing to remember, though: if you use the second method, be sure to mark your stored procedure WITH RECOMPILE (at the procedure level the option takes no parentheses). That way the optimizer will create a new query plan based on which branch of the IF statement needs to be executed.
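For reference, the procedure-level syntax looks like this:

Create Proc Balance_Report (@AsOfDate datetime)
With Recompile
AS
    -- If/Else body from Solution #2 goes here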
I prefer the non-union solution. Selecting from a single table will always be faster than doing a union and selecting a single table's data from the union.

Conditional Joins - Dynamic SQL

The DBA here at work is trying to turn my straightforward stored procs into a dynamic sql monstrosity. Admittedly, my stored procedure might not be as fast as they'd like, but I can't help but believe there's an adequate way to do what is basically a conditional join.
Here's an example of my stored proc:
SELECT
    *
FROM
    table
WHERE
    (
        @Filter IS NULL OR table.FilterField IN
            (SELECT Value FROM dbo.udfGetTableFromStringList(@Filter, ','))
    )
The UDF turns a comma delimited list of filters (for example, bank names) into a table.
Obviously, having the filter condition in the where clause isn't ideal. Any suggestions of a better way to conditionally join based on a stored proc parameter are welcome. Outside of that, does anyone have any suggestions for or against the dynamic sql approach?
Thanks
You could INNER JOIN on the table returned from the UDF instead of using it in an IN clause.
Your UDF might be something like
CREATE FUNCTION [dbo].[csl_to_table] (@list varchar(8000))
RETURNS @list_table TABLE ([id] INT)
AS
BEGIN
    DECLARE @index INT,
            @start_index INT,
            @id INT

    SELECT @index = 1
    SELECT @start_index = 1

    WHILE @index <= DATALENGTH(@list)
    BEGIN
        IF SUBSTRING(@list, @index, 1) = ','
        BEGIN
            SELECT @id = CAST(SUBSTRING(@list, @start_index, @index - @start_index) AS INT)
            INSERT @list_table ([id]) VALUES (@id)
            SELECT @start_index = @index + 1
        END
        SELECT @index = @index + 1
    END

    SELECT @id = CAST(SUBSTRING(@list, @start_index, @index - @start_index) AS INT)
    INSERT @list_table ([id]) VALUES (@id)

    RETURN
END
and then INNER JOIN on the ids in the returned table. This UDF assumes that you're passing in INTs in your comma separated list
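For example, a usage sketch against the original query's names (this assumes FilterField holds INTs):

SELECT t.*
FROM [table] t
INNER JOIN dbo.csl_to_table(@Filter) ids
    ON t.FilterField = ids.[id]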
EDIT:
In order to handle a null or no value being passed in for @Filter, the most straightforward way that I can see would be to execute a different query within the sproc based on the @Filter value. I'm not certain how this affects the cached execution plan (will update if someone can confirm), or whether the end result would be faster than your original sproc; I think the answer here would lie in testing.
Looks like the rewrite of the code is being addressed in another answer, but a good argument against dynamic SQL in a stored procedure is that it breaks the ownership chain.
That is, when you call a stored procedure normally, it executes under the permissions of the stored procedure owner, EXCEPT when executing dynamic SQL with the execute command. For the context of the dynamic SQL, it reverts back to the permissions of the caller, which may be undesirable depending on your security model.
In the end, you are probably better off compromising and rewriting it to address the concerns of the DBA while avoiding dynamic SQL.
I am not sure I understand your aversion to dynamic SQL. Perhaps it is that your UDF has nicely abstracted away some of the messiness of the problem, and you feel dynamic SQL will bring that back. Well, consider that most if not all DAL or ORM tools rely extensively on dynamic SQL, and I think your problem could be restated as "how can I nicely abstract away the messiness of dynamic SQL".
For my part, dynamic SQL gives me exactly the query I want, and subsequently the performance and behavior I am looking for.
I don't see anything wrong with your approach. Rewriting it to use dynamic SQL to execute two different queries based on whether @Filter is null seems silly to me, honestly.
The only potential downside I can see of what you have is that it could cause some difficulty in determining a good execution plan. But if the performance is good enough as it is, there's no reason to change it.
No matter what you do (and the answers here all have good points), be sure to compare the performance and execution plans of each option.
Sometimes, hand optimization is simply pointless if it impacts your code maintainability and really produces no difference in how the code executes.
I would first simply look at changing the IN to a simple LEFT JOIN with NULL check (this doesn't get rid of your udf, but it should only get called once):
SELECT *
FROM table
LEFT JOIN dbo.udfGetTableFromStringList(@Filter, ',') AS filter
    ON table.FilterField = filter.Value
WHERE @Filter IS NULL
    OR filter.Value IS NOT NULL
It appears that you are trying to write a single query to deal with two scenarios:
1. @filter = "x,y,z"
2. @filter IS NULL
To optimise scenario 1, I would INNER JOIN on the UDF, rather than use an IN clause...
SELECT * FROM table
INNER JOIN dbo.udfGetTableFromStringList(@Filter, ',') AS filter
    ON table.FilterField = filter.Value
To optimise for scenario 2, I would NOT try to adapt the existing query; instead I would deliberately keep the cases separate, using either an IF statement, or a UNION that simulates the IF with a WHERE clause...
TSQL IF
IF (@filter IS NULL)
    SELECT * FROM table
ELSE
    SELECT * FROM table
    INNER JOIN dbo.udfGetTableFromStringList(@Filter, ',') AS filter
        ON table.FilterField = filter.Value
UNION to Simulate IF
SELECT * FROM table
INNER JOIN dbo.udfGetTableFromStringList(@Filter, ',') AS filter
    ON table.FilterField = filter.Value
UNION ALL
SELECT * FROM table WHERE @filter IS NULL
The advantage of such designs is that each case is simple, and determining which one applies is itself simple. Combining the two into a single query, however, leads to compromises such as LEFT JOINs, and so introduces a significant performance loss to each case.