Order By In a SQL Table Valued Function - sql

I've read about this problem on a few different sites, but I still don't understand the solution.
From what I understand, SQL Server may optimize the query inside the function, and the ORDER BY clause is sometimes ignored. How can I sort the results of a simple table-valued function like this?
Create function [dbo].fTest
--Input Parameters
(@competitionID int)
--Returns a table
RETURNS @table TABLE (CompetitionID int)
as
BEGIN
    Insert Into @table (CompetitionID)
    select CompetitionID from Competition order by CompetitionID desc
    RETURN
END
UPDATE
I found that adding a primary key identity field seems to help (as mentioned in the answer posted by Martin Smith). Is this a good solution?
--Returns a table
RETURNS @table TABLE
(
    SortID int IDENTITY(1,1) PRIMARY KEY,
    CompetitionID int
)
In reference to Martin's answer below, sorting outside the function, in the calling SELECT statement, isn't that easy in my situation. My posted example is a stripped-down version, but my real-life issue involves a more complicated ORDER BY CASE expression for custom sorting. On top of that, I'm calling this function from an MVC controller via a LINQ query, which means the custom sorting would have to be added to the LINQ query. That's beyond my ability at this point.
If adding the identity field is a safe solution, I'm happy to go with that. It's simple and easy.

The order by needs to be in the statement that selects from the function.
SELECT CompetitionId
FROM [dbo].fTest()
ORDER BY CompetitionId
This is the only way to get reliable results that are guaranteed not to suddenly break in the future.
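Combined with the IDENTITY column from the question's update, a minimal sketch might look like this (assuming the updated version of fTest that returns SortID; the parameter value is illustrative):

```sql
-- The IDENTITY column records the insert order produced by the ORDER BY
-- inside the function; the outer ORDER BY is what actually guarantees it.
SELECT CompetitionID
FROM dbo.fTest(1)
ORDER BY SortID;
```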

You can duplicate your result table (declare a table variable @X in addition to the return table @ret_X).
Perform your actions on the @X table, then make the following the last statement in your function:
insert into @ret_X
select top 10000 * from @X
order by (column_of_choice) desc
This gives me the sorting I want.

The best way is to return the data from the back end and do the sorting with a LINQ query in your C# code.

Related

SQL Function in column running slow

I have a computed column (backed by a function) that is causing one of my tables to be extremely slow; its output is a column in my table. I thought it might be some logical statements in my function, so I commented those out and just returned the string 'test'. The table was still slow. I believe the SELECT statement is slowing down the function: when I comment out the SELECT statement, everything is cherry. I think I am not using functions in the correct manner.
FUNCTION [dbo].[Pend_Type](@Suspense_ID int, @Loan_ID nvarchar(10), @Suspense_Date datetime, @Investor nvarchar(10))
RETURNS nvarchar(20)
AS
BEGIN
    DECLARE @Closing_Date Datetime, @Paid_Date Datetime
    DECLARE @pendtype nvarchar(20)
    --This is the issue!!!!
    SELECT @Closing_Date = Date_Closing, @Paid_Date = Date_Paid from TABLE where Loan_ID = @Loan_ID
    SET @pendtype = 'test'
    --commented out logic
    RETURN @pendtype
END
UPDATE:
I have another computed column that does something similar and is a column in the same table. This one runs fast. Does anyone see a difference that would explain why?
Declare @yOrn AS nvarchar(1)
IF((Select count(suspense_ID) From TABLE where suspense_ID = @suspenseID) = 0)
    SET @yOrn = 'N'
ELSE
    SET @yOrn = 'Y'
RETURN @yOrn
You have isolated the performance problem in the select statement:
SELECT TOP 1 @Closing_Date = Date_Closing, @Paid_Date = Date_Paid
from TABLE
where Loan_ID = @Loan_ID;
To make this run faster, create a composite index on TABLE(Loan_ID, Date_Closing, Date_Paid).
By the way, you are using top with no order by. When multiple rows match, you can get any one of them back. Normally, top is used with order by.
EDIT:
You can create the index by issuing the following command:
create index idx_table_loan_closing_paid on TABLE(Loan_ID, Date_Closing, Date_Paid);
Scalar functions are often executed like cursors, one row at a time; that is why they are slow and best avoided. I would not use the function as written, but would write a set-based version instead. Incidentally, a SELECT TOP 1 without an ORDER BY column will not always give you the same record, and is generally poor practice; in this case I would think you would want, say, the latest date or the earliest one.
In this particular case I think you would be better off not using a function but using a derived table join.
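A rough sketch of that derived-table idea, using the names from the question (the MAX aggregation and the outer Suspense table are assumptions, since the original SELECT doesn't say which row it wants when several match):

```sql
-- Set-based alternative: compute the dates once per loan in a derived
-- table and join, instead of invoking a scalar UDF for every row.
SELECT s.*, d.Date_Closing, d.Date_Paid
FROM   Suspense AS s                       -- hypothetical outer table
       LEFT JOIN (
           SELECT Loan_ID,
                  MAX(Date_Closing) AS Date_Closing,  -- deterministic pick
                  MAX(Date_Paid)    AS Date_Paid
           FROM   TABLE                               -- placeholder name from the question
           GROUP BY Loan_ID
       ) AS d ON d.Loan_ID = s.Loan_ID;
```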

Optimizing stored procedure with multiple "LIKE"s

I am passing in a comma-delimited list of values that I need to compare to the database
Here is an example of the values I'm passing in:
@orgList = "1123, 223%, 54%"
To use the wildcards I think I have to use LIKE, but the query runs a long time and only returns 14 rows (the results are correct, but it's just taking forever, probably because I'm using the join incorrectly).
Can I make it better?
This is what I do now:
declare @tempTable Table (SearchOrg nvarchar(max))
insert into @tempTable
select * from dbo.udf_split(@orgList) as split
-- this splits the values at the comma and puts them in a temp table
-- then I do a join on the main table and the temp table to do a like on it....
-- but I think it's not right because it's too long.
select something
from maintable gt
join @tempTable tt on gt.org like tt.SearchOrg
where
AYEAR = ISNULL(@year, ayear)
and (AYEAR >= ISNULL(@yearR1, ayear) and ayear <= ISNULL(@yearr2, ayear))
and adate = ISNULL(@Date, adate)
and (adate >= ISNULL(@dateR1, adate) and adate <= ISNULL(@DateR2, adate))
The final result should be all rows where maintable.org is 1123, or starts with 223, or starts with 54.
The reason for my date craziness is that sometimes the stored procedure only checks a year, sometimes a year range, sometimes a specific date, and sometimes a date range; every filter that isn't used is passed in as null.
Maybe the problem is there?
Try something like this:
Declare @tempTable Table
(
    -- The column being matched is varchar, so don't use nvarchar here:
    -- a type mismatch forces implicit conversions.
    SearchOrg varchar(20)
);
INSERT INTO @tempTable
SELECT * FROM dbo.udf_split(@orgList);
SELECT
something
FROM
maintable gt
WHERE
some where statements go here
And
Exists
(
SELECT 1
FROM #tempTable tt
WHERE gt.org Like tt.SearchOrg
)
Such a dynamic query, with optional filters and a LIKE driven by a table (!), is very hard to optimize because almost nothing is statically known; the optimizer has to create a very general plan.
You can do two things to speed this up by orders of magnitude:
Play with OPTION (RECOMPILE). If the compile times are acceptable, this will at least deal with all the optional filters (but not with the LIKE table).
Do code generation and EXEC sp_executesql the result. Build a query with all the LIKE clauses inlined into the SQL, so that it looks like this: WHERE a LIKE @like0 OR a LIKE @like1 ... (not sure if you need OR or AND). This allows the optimizer to get rid of the join and just execute ordinary predicates.
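A minimal sketch of that code-generation idea, assuming the table and column names from the question (OR between the LIKEs is an assumption; on SQL Server 2017+ STRING_AGG would be cleaner than variable concatenation):

```sql
DECLARE @preds nvarchar(max);

-- Concatenate one LIKE predicate per pattern, so the optimizer sees
-- plain predicates instead of a join against a table of patterns.
SELECT @preds = COALESCE(@preds + N' OR ', N'')
              + N'gt.org LIKE N''' + REPLACE(SearchOrg, N'''', N'''''') + N''''
FROM @tempTable;

DECLARE @sql nvarchar(max) =
    N'SELECT something FROM maintable gt WHERE (' + @preds + N')';

EXEC sp_executesql @sql;
```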
Your query may be difficult to optimize. Part of the question is what is in the where clause. You probably want to filter these first, and then do the join using like. Or, you can try to make the join faster, and then do a full table scan on the results.
SQL Server should optimize a like statement of the form 'abc%' -- that is, where the wildcard is at the end. (See here, for example.) So, you can start with an index on maintable.org. Fortunately, your examples meet this criteria. However, if you have '%abc' -- the wildcard comes first -- then the optimization won't work.
For the index to work best, it might also need to take into account the conditions in the where clause. In other words, adding the index is suggestive, but the rest of the query may preclude the use of the index.
And, let me add, the best solution for these types of searches is to use the full text search capability in SQL Server (see here).

SQL - ORDER BY running first

Please have a look at this database schema:
create table Person (id int not null identity,
[index] varchar(30),
datecreated datetime,
groupid int)
create table [Group] (id int identity not null, description varchar(30))
Sample data:
insert into Person ([index],datecreated,groupid) values ('4,5,6','2011-01-01',1)
insert into Person ([index],datecreated,groupid) values ('1,2,3','2011-02-02',1)
insert into Person ([index],datecreated,groupid) values ('7,8','2012-02-02',2)
insert into [Group] (description) values ('TestGroup')
insert into [Group] (description) values ('TestGroup2')
Please have a look at the SQL statement below:
select *
from Person
inner join [Group] on Person.groupid = [group].id
where [group].description = 'TestGroup'
order by
left(substring([index], charindex(',', [index]) + 1, 200),
charindex(',', substring([index], charindex(',', [index]) + 1, 200)) - 1)
This SQL statement fails with the following error:
Invalid length parameter passed to the SUBSTRING function.
It is the order by clause that is causing this error i.e. it is trying to find the third element of the index column but the third element does not exist on row 3 (there are only two elements).
However, I would expect the [group].description = 'TestGroup' to filter out record three. This does not appear to be the case. It is as if the order by clause is being run before the where clause. If you exclude the order by clause from the query, then the query runs.
Why is this?
Evaluation order in SQL comes with very weak guarantees. Probably the sort is performed first, then a stream aggregate; nothing wrong with that by itself.
You cannot rely on execution order in general. The exception is a CASE expression, which you can use to substitute a dummy value such as NULL in your ORDER BY whenever the input to SUBSTRING would be invalid. CASE is the only way to enforce evaluation order.
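Applied to the query from the question, that CASE guard might look like this (the LIKE '%,%,%' test is an assumption: it just checks that a second comma exists before SUBSTRING tries to extract the second element):

```sql
SELECT *
FROM Person
INNER JOIN [Group] ON Person.groupid = [Group].id
WHERE [Group].description = 'TestGroup'
ORDER BY
    CASE WHEN [index] LIKE '%,%,%'  -- at least two commas: element 2 is extractable
         THEN LEFT(SUBSTRING([index], CHARINDEX(',', [index]) + 1, 200),
                   CHARINDEX(',', SUBSTRING([index], CHARINDEX(',', [index]) + 1, 200)) - 1)
         ELSE NULL                   -- short lists sort together instead of erroring
    END;
```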
This ORDER BY is pretty brutal. I would suggest breaking this into a couple of queries, using a temp table or a table subexpression, so you can do your filtering first and/or create a column containing the data to sort by.
Remember, SQL is a declarative language, not a procedural language. That is, you describe the result sets that you want. You depend on the SQL compiler/optimizer to set up the execution plan.
Very typically, a SQL engine will have a component that reads the data from the table and does all the calculations that are needed for that data. Of course, this includes calculations in the SELECT clause, but also calculations in "ON" clauses, "WHERE" clauses, and "ORDER BY" clauses.
The engine can then do the filtering after reading the data. This enables the engine to readily use computed values for the filtering.
I am not saying that all databases work this way. What I am saying is that there is no guarantee of the order of operations in a SQL statement. This situation is one of the cases where doing things in the wrong order results in an error, which prevents the SQL from completing. Do you want help rewriting the query so it doesn't get the error?

How to structure a query with a large, complex where clause?

I have an SQL query that takes these parameters:
@SearchFor nvarchar(200) = null
,@SearchInLat Decimal(18,15) = null
,@SearchInLng Decimal(18,15) = null
,@SearchActivity int = null
,@SearchOffers bit = null
,@StartRow int
,@EndRow int
The variables @SearchFor, @SearchActivity, and @SearchOffers can each be either null or not null. @SearchInLat and @SearchInLng must both be null, or both have values.
I'm not going to post the whole query, as it's boring and hard to read, but the WHERE clause is shaped like this:
( -- filter by activity --
    (@SearchActivity IS NULL)
    OR (@SearchActivity = Activities.ActivityID)
)
AND ( -- filter by Location --
    (@SearchInLat is NULL AND @SearchInLng is NULL)
    OR ( ... )
)
AND ( -- filter by activity --
    @SearchActivity is NULL
    OR ( ... )
)
AND ( -- filter by has offers --
    @SearchOffers is NULL
    OR ( ... )
)
AND (
    ... -- more stuff
)
I have read that this is a bad way to structure a query: SQL Server has trouble working out an efficient execution plan with lots of clauses like this, so I'm looking for other ways to do it.
I see two ways of doing this:
Construct the query as a string in my client application, so that the WHERE clause only contains filters for the relevant parameters. The problem with this is it means not accessing the database through stored procedures, as everything else is at the moment.
Change the stored procedure so that it examines which arguments are null, and executes child procedures depending on which arguments it is passed. The problem here is that it would mean repeating myself a lot in the definition of the procs, and thus be harder to maintain.
What should I do? Or should I just keep on as I am currently doing? I have OPTION (RECOMPILE) set for the procedures, but I've heard that this doesn't work right on SQL Server 2005. Also, I plan to add more parameters to this proc, so I want to make sure whatever solution I choose is fairly scalable.
The answer is to use dynamic SQL (be it in the client, or in an SP using sp_executesql), but the reason why is long, so here's a link:
Dynamic Search Conditions in T-SQL
A very short version is that one size does not fit all, and since the optimiser creates one plan for one query, that one plan is slow. The solution is to continue using parameterised queries (for execution-plan caching), but to have many queries, one for each different type of search that can happen.
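The core pattern can be sketched like this (the column names and the base query are assumptions; note the filters are still parameterised, only the shape of the WHERE clause changes per combination):

```sql
DECLARE @sql nvarchar(max) = N'SELECT * FROM dbo.SearchTable WHERE 1 = 1';

-- Append only the filters that were actually supplied.
IF @SearchActivity IS NOT NULL
    SET @sql += N' AND ActivityID = @SearchActivity';
IF @SearchOffers IS NOT NULL
    SET @sql += N' AND HasOffers = @SearchOffers';
IF @SearchInLat IS NOT NULL AND @SearchInLng IS NOT NULL
    SET @sql += N' AND Lat = @SearchInLat AND Lng = @SearchInLng';

-- Each distinct combination of filters gets its own cached plan.
EXEC sp_executesql @sql,
    N'@SearchActivity int, @SearchOffers bit,
      @SearchInLat decimal(18,15), @SearchInLng decimal(18,15)',
    @SearchActivity, @SearchOffers, @SearchInLat, @SearchInLng;
```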
Perhaps an alternative might be to perform several separate select statements?
e.g.
-- filter by activity --
if @SearchActivity is not null
    insert into tmpTable (<columns>)
    select *
    from myTable
    where (@SearchActivity = Activities.ActivityID)

-- filter by Location --
if @SearchInLat is not null and @SearchInLng is not null
    insert into tmpTable (<columns>)
    select *
    from myTable
    where (latCol = @SearchInLat AND lngCol = @SearchInLng)
etc...
then select the temp table to return the final result set.
I'm not sure how this would work with respect to the optimiser and the query plans, but each individual select would be very straightforward and could utilise the indexes that you would have created on each column which should make them very quick.
Depending on your requirements it also may make sense to create a primary key on the temp table to allow you to join to it on each select (to avoid duplicates).
Look at the performance first, like others have said.
If possible, you can use IF clauses to simplify the queries based on what parameters are provided.
You could also use functions or views to encapsulate some of the code if you find you are repeating it often.

Conditional Joins - Dynamic SQL

The DBA here at work is trying to turn my straightforward stored procs into a dynamic sql monstrosity. Admittedly, my stored procedure might not be as fast as they'd like, but I can't help but believe there's an adequate way to do what is basically a conditional join.
Here's an example of my stored proc:
SELECT *
FROM table
WHERE
(
    @Filter IS NULL OR table.FilterField IN
        (SELECT Value FROM dbo.udfGetTableFromStringList(@Filter, ','))
)
The UDF turns a comma delimited list of filters (for example, bank names) into a table.
Obviously, having the filter condition in the where clause isn't ideal. Any suggestions of a better way to conditionally join based on a stored proc parameter are welcome. Outside of that, does anyone have any suggestions for or against the dynamic sql approach?
Thanks
You could INNER JOIN on the table returned from the UDF instead of using it in an IN clause
Your UDF might be something like
CREATE FUNCTION [dbo].[csl_to_table] (@list varchar(8000))
RETURNS @list_table TABLE ([id] INT)
AS
BEGIN
    DECLARE @index INT,
            @start_index INT,
            @id INT
    SELECT @index = 1
    SELECT @start_index = 1
    WHILE @index <= DATALENGTH(@list)
    BEGIN
        IF SUBSTRING(@list, @index, 1) = ','
        BEGIN
            SELECT @id = CAST(SUBSTRING(@list, @start_index, @index - @start_index) AS INT)
            INSERT @list_table ([id]) VALUES (@id)
            SELECT @start_index = @index + 1
        END
        SELECT @index = @index + 1
    END
    SELECT @id = CAST(SUBSTRING(@list, @start_index, @index - @start_index) AS INT)
    INSERT @list_table ([id]) VALUES (@id)
    RETURN
END
and then INNER JOIN on the ids in the returned table. This UDF assumes that you're passing in INTs in your comma separated list
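The call site would then look something like this (table and column names taken from the question):

```sql
-- Join against the UDF's result instead of using IN:
SELECT t.*
FROM [table] AS t
INNER JOIN dbo.csl_to_table(@Filter) AS f
    ON t.FilterField = f.id;
```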
EDIT:
In order to handle a null or missing value for @Filter, the most straightforward way that I can see would be to execute a different query within the sproc based on the @Filter value. I'm not certain how this affects the cached execution plan (will update if someone can confirm), or whether the end result would be faster than your original sproc; I think the answer here would lie in testing.
Looks like the rewrite of the code is being addressed in another answer, but a good argument against dynamic SQL in a stored procedure is that it breaks the ownership chain.
That is, when you call a stored procedure normally, it executes under the permissions of the stored procedure's owner, EXCEPT when executing dynamic SQL with the EXECUTE command: for the context of the dynamic SQL, it reverts to the permissions of the caller, which may be undesirable depending on your security model.
In the end, you are probably better off compromising and rewriting it to address the concerns of the DBA while avoiding dynamic SQL.
I am not sure I understand your aversion to dynamic SQL. Perhaps your UDF has nicely abstracted away some of the messiness of the problem, and you feel dynamic SQL will bring that back. Well, consider that most if not all DAL or ORM tools rely extensively on dynamic SQL, and I think your problem could be restated as "how can I nicely abstract away the messiness of dynamic SQL".
For my part, dynamic SQL gives me exactly the query I want, and subsequently the performance and behavior I am looking for.
I don't see anything wrong with your approach. Rewriting it to use dynamic SQL to execute two different queries based on whether @Filter is null seems silly to me, honestly.
The only potential downside I can see of what you have is that it could cause some difficulty in determining a good execution plan. But if the performance is good enough as it is, there's no reason to change it.
No matter what you do (and the answers here all have good points), be sure to compare the performance and execution plans of each option.
Sometimes, hand optimization is simply pointless if it impacts your code maintainability and really produces no difference in how the code executes.
I would first simply look at changing the IN to a simple LEFT JOIN with NULL check (this doesn't get rid of your udf, but it should only get called once):
SELECT *
FROM table
LEFT JOIN dbo.udfGetTableFromStringList(@Filter, ',') AS filter
    ON table.FilterField = filter.Value
WHERE @Filter IS NULL
    OR filter.Value IS NOT NULL
It appears that you are trying to write a single query to deal with two scenarios:
1. @Filter = "x,y,z"
2. @Filter IS NULL
To optimise scenario 1, I would INNER JOIN on the UDF, rather than use an IN clause...
SELECT * FROM table
INNER JOIN dbo.udfGetTableFromStringList(@Filter, ',') AS filter
    ON table.FilterField = filter.Value
To optimise scenario 2, I would NOT try to adapt the existing query; instead, I would deliberately keep the two cases separate, using either an IF statement or a UNION that simulates the IF with a WHERE clause...
TSQL IF
IF (@Filter IS NULL)
    SELECT * FROM table
ELSE
    SELECT * FROM table
    INNER JOIN dbo.udfGetTableFromStringList(@Filter, ',') AS filter
        ON table.FilterField = filter.Value
UNION to Simulate IF
SELECT * FROM table
INNER JOIN dbo.udfGetTableFromStringList(@Filter, ',') AS filter
    ON table.FilterField = filter.Value
UNION ALL
SELECT * FROM table WHERE @Filter IS NULL
The advantage of such designs is that each case is simple, and determining which one applies is itself simple. Combining the two into a single query, however, forces compromises such as LEFT JOINs, and so introduces a significant performance loss to each.