SQL - ORDER BY running first

Please have a look at this database schema:
create table Person (id int not null identity,
[index] varchar(30),
datecreated datetime,
groupid int)
create table [Group] (id int identity not null, description varchar(30))
Sample data:
insert into Person ([index],datecreated,groupid) values ('4,5,6','2011-01-01',1)
insert into Person ([index],datecreated,groupid) values ('1,2,3','2011-02-02',1)
insert into Person ([index],datecreated,groupid) values ('7,8','2012-02-02',2)
insert into [Group] (description) values ('TestGroup')
insert into [Group] (description) values ('TestGroup2')
Please have a look at the SQL statement below:
select *
from Person
inner join [Group] on Person.groupid = [group].id
where [group].description = 'TestGroup'
order by
left(substring([index], charindex(',', [index]) + 1, 200),
charindex(',', substring([index], charindex(',', [index]) + 1, 200)) - 1)
This SQL statement fails with the following error:
Invalid length parameter passed to the SUBSTRING function.
It is the ORDER BY clause that causes this error: the expression looks for a second comma to find where the second element of the index column ends, but row 3 has only two elements and therefore only one comma.
However, I would expect the [group].description = 'TestGroup' to filter out record three. This does not appear to be the case. It is as if the order by clause is being run before the where clause. If you exclude the order by clause from the query, then the query runs.
Why is this?

Evaluation order in SQL has very weak guarantees. Probably the ORDER BY expression is computed for every Person row before the join and filter remove row 3. There is nothing wrong with that by itself.
You cannot rely on execution order in general. The exception is a CASE expression, which you can use to produce a dummy value such as NULL in your ORDER BY whenever the input for SUBSTRING would be invalid. CASE is the only construct that reliably enforces evaluation order.
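As a minimal sketch of the CASE approach, assuming the schema and sample data from the question: the ORDER BY below only calls SUBSTRING when a second comma exists, and sorts on NULL otherwise. It uses the three-argument form of CHARINDEX (with a start position) instead of the nested LEFT/SUBSTRING from the question. This is one possible rewrite, not the only one.

```sql
select Person.*, [Group].description
from Person
inner join [Group] on Person.groupid = [Group].id
where [Group].description = 'TestGroup'
order by
    case
        -- A second comma exists, so the second element has a well-defined end.
        when charindex(',', [index], charindex(',', [index]) + 1) > 0
        then substring(
                 [index],
                 charindex(',', [index]) + 1,
                 charindex(',', [index], charindex(',', [index]) + 1)
                     - charindex(',', [index]) - 1)
        -- No second comma (fewer than three elements): sort on NULL
        -- instead of passing a negative length to SUBSTRING.
        else null
    end
```

With this guard it no longer matters whether the expression is evaluated for row 3 before or after the WHERE clause removes it.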

This ORDER BY is pretty brutal. I would suggest breaking this into a couple of queries, using a temp-table or table sub-expression, so you can do your filtering first, and/or create a column containing the data to sort by.
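A rough sketch of the temp-table variant, assuming the tables from the question (the name #FilteredPerson is made up for illustration). Because the filter and the sort are separate statements, the sort expression can only ever see rows that survived the filter.

```sql
-- Step 1: materialize only the rows that pass the filter.
select Person.id, Person.[index], Person.datecreated, Person.groupid
into #FilteredPerson   -- hypothetical temp table name
from Person
inner join [Group] on Person.groupid = [Group].id
where [Group].description = 'TestGroup';

-- Step 2: sort the already-filtered rows; row 3 ('7,8') is no longer present.
select *
from #FilteredPerson
order by
    left(substring([index], charindex(',', [index]) + 1, 200),
         charindex(',', substring([index], charindex(',', [index]) + 1, 200)) - 1);

drop table #FilteredPerson;
```

This still fails if a row in 'TestGroup' itself has fewer than three elements, so it separates the steps but does not make the expression safe.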

Remember, SQL is a declarative language, not a procedural language. That is, you describe the result sets that you want. You depend on the SQL compiler/optimizer to set up the execution plan.
Very typically, a SQL engine will have a component that reads the data from the table and does all the calculations that are needed for that data. Of course, this includes calculations in the SELECT clause, but also calculations in "ON" clauses, "WHERE" clauses, and "ORDER BY" clauses.
The engine can then do the filtering after reading the data. This enables the engine to readily use computed values for the filtering.
I am not saying that all databases work this way. What I am saying is that there is no guarantee of the order of operations in a SQL statement. This situation is one of the cases where doing things in the wrong order results in an error, which prevents the SQL from completing. Do you want help rewriting the query so it doesn't get the error?

Related

SQL Server subquery behaviour

I have a case where I want to check whether an integer value is found in a varchar column that holds a mix of values: some are integers, others are just strings. My first thought was to use a subquery to select only the rows with numeric-looking values. The setup looks like:
CREATE TABLE #tmp (
EmployeeID varchar(50) NOT NULL
)
INSERT INTO #tmp VALUES ('aa1234')
INSERT INTO #tmp VALUES ('1234')
INSERT INTO #tmp VALUES ('5678')
DECLARE @eid int
SET @eid = 5678
SELECT *
FROM (
SELECT EmployeeID
FROM #tmp
WHERE IsNumeric(EmployeeID) = 1) AS UED
WHERE UED.EmployeeID = @eid
DROP TABLE #tmp
However, this fails, with: "Conversion failed when converting the varchar value 'aa1234' to data type int.".
I don't understand why it is still trying to compare @eid to 'aa1234' when I've selected only the rows '1234' and '5678' in the subquery.
(I realize I can just cast @eid to varchar but I'm curious about SQL Server's behaviour in this case)
You can't easily control the order things will happen when SQL Server looks at the query you wrote and then determines the optimal execution plan. It won't always produce a plan that follows the same logic you typed, in the same order.
In this case, in order to find the rows you're looking for, SQL Server has to perform two filters:
1. identify only the rows that match your variable
2. identify only the rows that are numeric
It can do this in either order, so this is also valid:
1. identify only the rows that are numeric
2. identify only the rows that match your variable
If you look at the properties of this execution plan, you see that the predicate for the match to your variable is listed first (which still doesn't guarantee order of operation), but in any case, due to data type precedence, it has to try to convert the column data to the type of the variable.
Subqueries, CTEs, or writing the query a different way - especially in simple cases like this - are unlikely to change the order SQL Server uses to perform those operations.
You can force evaluation order in most scenarios by using a CASE expression (you also don't need the subquery):
SELECT EmployeeID
FROM #tmp
WHERE EmployeeID = CASE IsNumeric(EmployeeID) WHEN 1 THEN @eid END;
In modern versions of SQL Server (2012 and later; you forgot to tell us which version you use), you can also use TRY_CONVERT() instead:
SELECT EmployeeID
FROM #tmp
WHERE TRY_CONVERT(int, EmployeeID) = @eid;
This is essentially shorthand for the CASE expression, but with the added bonus that it allows you to specify an explicit type, which is one of the downsides of ISNUMERIC(). All ISNUMERIC() tells you is if the value can be converted to any numeric type. The string '1e2' passes the ISNUMERIC() check, because it can be converted to float, but try converting that to an int...
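The difference is easy to see side by side. A small sketch, assuming SQL Server 2012 or later so that TRY_CONVERT() is available (the column aliases are arbitrary):

```sql
SELECT ISNUMERIC('1e2')           AS IsNumericResult,  -- 1: convertible to float
       TRY_CONVERT(float, '1e2')  AS AsFloat,          -- 100
       TRY_CONVERT(int, '1e2')    AS AsInt,            -- NULL: not a valid int
       TRY_CONVERT(int, 'aa1234') AS JunkAsInt;        -- NULL instead of an error
```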
For completeness, the best solution - if there is an index on EmployeeID - is to just use a variable that matches the column data type, as you suggested.
But even better would be to use a data type that prevents junk data like 'aa1234' from getting into the table in the first place.
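For reference, a minimal sketch of the matching-type variable mentioned above, reusing the setup from the question (@eidText is a hypothetical helper name). The comparison is then varchar to varchar, so no row has to be converted to int.

```sql
CREATE TABLE #tmp (EmployeeID varchar(50) NOT NULL);
INSERT INTO #tmp VALUES ('aa1234');
INSERT INTO #tmp VALUES ('1234');
INSERT INTO #tmp VALUES ('5678');

DECLARE @eid int;
SET @eid = 5678;

-- Convert the variable once, instead of converting every row.
DECLARE @eidText varchar(50);   -- hypothetical helper variable
SET @eidText = CAST(@eid AS varchar(50));

SELECT EmployeeID
FROM #tmp
WHERE EmployeeID = @eidText;    -- returns the row '5678'

DROP TABLE #tmp;
```

One trade-off: this is a string comparison, so a stored value such as '05678' would not match 5678 the way an integer comparison would.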

Order By In a SQL Table Valued Function

I've read about this problem on a few different sites, but I still don't understand the solution.
From what I understand, SQL will optimize the query in the function and sometimes the Order By clause will be ignored. How can you sort results?
How can I sort results in a simple table valued function like this?
Create function [dbo].fTest
--Input Parameters
(@competitionID int)
--Returns a table
RETURNS @table TABLE (CompetitionID int )
as
BEGIN
Insert Into @table (CompetitionID)
select CompetitionID from Competition order by CompetitionID desc
RETURN
END
UPDATE
I found that inserting a primary key identity field seems to help (as mentioned in the answer posted by Martin Smith). Is this a good solution?
--Returns a table
RETURNS @table TABLE
(
SortID int IDENTITY(1,1) PRIMARY KEY,
CompetitionID int
)
In reference to Martin's answer below, sorting outside of the select statement isn't that easy in my situation. My posted example is a stripped down version, but my real-life issue involves a more complicated order by case clause for custom sorting. In addition to that, I'm calling this function in an MVC controller with a LINQ query, which means that custom sorting would have to be added to the LINQ query. That's beyond my ability at this point.
If adding the identity field is a safe solution, I'm happy to go with that. It's simple and easy.
The order by needs to be in the statement that selects from the function.
SELECT CompetitionId
FROM [dbo].fTest()
ORDER BY CompetitionId
This is the only way to get reliable results that are assured to not suddenly break in the future.
You can duplicate your result table (declare two table variables, @X and @ret_X).
Then perform your actions on the @X table and make the following statement the last statement in your function.
insert into @ret_X
select top 10000 * from @X
order by (column_of_choice) desc
This gives me the sorting I want.
The best way is to return your data from the back end and do the sorting with a LINQ query in your C# code.

SQL equivalent of Excel's approximate VLOOKUP (first relevant value) without nested query

I am trying to find an efficient way to select the first relevant value from an existing table, as Excel's VLOOKUP( , , , TRUE) function would. Here is what I have, but if @tableWithData is very large compared to @requiredDates, this code can be very inefficient. I feel like I am missing something. Is there a better way of writing this:
DECLARE @requiredDates TABLE
(requiredDate datetime)
INSERT INTO @requiredDates VALUES ('2014-01-01');
INSERT INTO @requiredDates VALUES ('2014-01-15');
INSERT INTO @requiredDates VALUES ('2014-02-01');
INSERT INTO @requiredDates VALUES ('2014-02-15');
DECLARE @tableWithData TABLE
(respectiveDate datetime,
associatedValue int
)
INSERT INTO @tableWithData VALUES ('2014-01-01', 1);
INSERT INTO @tableWithData VALUES ('2014-02-01', 2);
SELECT
lookupTable.requiredDate,
dataTable.associatedValue
FROM @tableWithData as dataTable RIGHT JOIN
(
/*Create table which maps the requiredDates -> maxDate highest available date */
SELECT
dates.requiredDate,
MAX(data.respectiveDate) as maxDate/*,
data.associatedValue*/
FROM @requiredDates as dates JOIN @tableWithData as data
ON dates.requiredDate >= data.respectiveDate
GROUP BY dates.requiredDate
) as lookupTable
on lookupTable.maxDate = dataTable.respectiveDate
Note: I am using MS SQL Server 2005, but would also appreciate a more generic SQL implementation, if there is one.
In Excel, vlookup() with a value of "TRUE" finds an approximate match. I am finding your query a bit hard to follow, but the following will get, for each row of the data table, the largest requiredDate less than or equal to the respectiveDate field:
SELECT dt.associatedValue,
(SELECT TOP 1 rd.requiredDate
FROM @requiredDates rd
WHERE rd.requiredDate <= dt.respectiveDate
ORDER BY rd.requiredDate DESC
) as RequiredDate
FROM @tableWithData dt;
This structure for the query will work in all databases, with the caveat that TOP 1 probably needs to be replaced with something else (a limit clause, a fetch first 1 rows only clause, or something else). Of course, the table variables are specific to SQL Server and will not carry over.
In SQL Server, you can also express this using APPLY.
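A minimal sketch of the APPLY form, written here for the lookup direction described in the question (for each required date, the latest data row on or before it). It assumes the table variables and sample data from the question; the alias names are arbitrary, and the dates use the YYYYMMDD format only because it does not depend on session language settings.

```sql
DECLARE @requiredDates TABLE (requiredDate datetime);
INSERT INTO @requiredDates VALUES ('20140101');
INSERT INTO @requiredDates VALUES ('20140115');
INSERT INTO @requiredDates VALUES ('20140201');
INSERT INTO @requiredDates VALUES ('20140215');

DECLARE @tableWithData TABLE (respectiveDate datetime, associatedValue int);
INSERT INTO @tableWithData VALUES ('20140101', 1);
INSERT INTO @tableWithData VALUES ('20140201', 2);

-- For each required date, pick the single most recent data row
-- whose date is on or before it (the approximate-match lookup).
SELECT rd.requiredDate,
       latest.associatedValue
FROM @requiredDates AS rd
OUTER APPLY (
    SELECT TOP 1 dt.associatedValue
    FROM @tableWithData AS dt
    WHERE dt.respectiveDate <= rd.requiredDate
    ORDER BY dt.respectiveDate DESC
) AS latest
ORDER BY rd.requiredDate;
-- Values returned for the four required dates, in order: 1, 1, 2, 2
```

OUTER APPLY keeps required dates that have no earlier data row (with a NULL value); CROSS APPLY would drop them. With an index on respectiveDate in a real table, each lookup can be a short seek instead of a join over all earlier rows.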
I was looking for a solution to this, then realised that for the type of data I had there was a very simple answer. Although it does not quite match your problem, you might find it useful. Imagine the query you are trying to run is "who was the reigning monarch on a given date", and you have a table of the reigns of monarchs. In Excel you would do a VLOOKUP search on a sorted table of the start or end dates of the reigns, but in SQL it is much easier to have separate columns for reign-start and reign-end. The query is then as simple as
SELECT * FROM monarchs WHERE reign_start <= @date AND reign_end >= @date
As I said this is such a simple query that in some cases it might be worth seeing if the data could be reconfigured to allow it.

Optimizing stored procedure with multiple "LIKE"s

I am passing in a comma-delimited list of values that I need to compare to the database
Here is an example of the values I'm passing in:
@orgList = "1123, 223%, 54%"
To use the wildcard I think I have to do LIKE but the query runs a long time and only returns 14 rows (the results are correct, but it's just taking forever, probably because I'm using the join incorrectly)
Can I make it better?
This is what I do now:
declare @tempTable Table (SearchOrg nvarchar(max) )
insert into @tempTable
select * from dbo.udf_split(@orgList) as split
-- this splits the values at the comma and puts them in a temp table
-- then I do a join on the main table and the temp table to do a like on it....
-- but I think it's not right because it's too long.
select something
from maintable gt
join @tempTable tt on gt.org like tt.SearchOrg
where
AYEAR= ISNULL(@year, ayear)
and (AYEAR >= ISNULL(@yearR1, ayear) and ayear <= ISNULL(@yearr2, ayear))
and adate = ISNULL(@Date, adate)
and (adate >= ISNULL(@dateR1, adate) and adate <= ISNULL(@DateR2 , adate))
The final result would be all rows where maintable.org is 1123, or starts with 223, or starts with 54.
The reason for my date craziness is that sometimes the stored procedure only checks for a year, sometimes for a year range, sometimes for a specific date and sometimes for a date range... everything that's not used is passed in as null.
Maybe the problem is there?
Try something like this:
Declare @tempTable Table
(
-- Since the column is a varchar(10), you don't want to use nvarchar here.
SearchOrg varchar(20)
);
INSERT INTO @tempTable
SELECT * FROM dbo.udf_split(@orgList);
SELECT
something
FROM
maintable gt
WHERE
some where statements go here
And
Exists
(
SELECT 1
FROM @tempTable tt
WHERE gt.org Like tt.SearchOrg
)
Such a dynamic query, with optional filters and LIKE driven by a table (!), is very hard to optimize because almost nothing is statically known. The optimizer has to create a very general plan.
You can do two things that may speed this up by orders of magnitude:
1. Play with OPTION (RECOMPILE). If the compile times are acceptable this will at least deal with all the optional filters (but not with the LIKE table).
2. Do code generation and EXEC sp_executesql the code. Build a query with all LIKE clauses inlined into the SQL so that it looks like this: WHERE a LIKE @like0 OR a LIKE @like1 ... (not sure if you need OR or AND). This allows the optimizer to get rid of the join and just execute a normal predicate.
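A rough sketch of the generated statement for the three sample patterns, assuming the placeholder names from the question (something, maintable, org). In a real procedure the statement text and the parameter list would be built in a loop over the split values; they are written out by hand here for illustration.

```sql
DECLARE @sql nvarchar(max);

-- One LIKE per pattern, joined with OR because any pattern may match.
SET @sql = N'SELECT something
FROM maintable gt
WHERE gt.org LIKE @like0
   OR gt.org LIKE @like1
   OR gt.org LIKE @like2';

-- The patterns are passed as parameters, not concatenated into the text.
EXEC sp_executesql
    @sql,
    N'@like0 varchar(20), @like1 varchar(20), @like2 varchar(20)',
    @like0 = '1123',
    @like1 = '223%',
    @like2 = '54%';
```

Passing the patterns as parameters keeps user input out of the SQL text. The optional year and date filters could be appended to @sql in the same way, but only when the corresponding argument is not NULL.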
Your query may be difficult to optimize. Part of the question is what is in the where clause. You probably want to filter these first, and then do the join using like. Or, you can try to make the join faster, and then do a full table scan on the results.
SQL Server should optimize a LIKE predicate of the form 'abc%', that is, where the wildcard is at the end. So, you can start with an index on maintable.org. Fortunately, your examples meet these criteria. However, if you have '%abc', where the wildcard comes first, then the optimization won't work.
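As a small illustration, assuming the table and column names from the question (the index name is made up):

```sql
-- Hypothetical index name; maintable and org come from the question.
CREATE INDEX IX_maintable_org ON maintable (org);

-- A trailing wildcard describes a contiguous range of index keys,
-- so this predicate can be answered with an index seek.
SELECT something FROM maintable WHERE org LIKE '223%';

-- A leading wildcard matches keys anywhere in the index,
-- so this predicate generally requires a scan.
SELECT something FROM maintable WHERE org LIKE '%223';
```

Whether the optimizer actually picks the index depends on the rest of the query and on how selective the pattern is; comparing the execution plans for the two statements is the simplest way to check.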
For the index to work best, it might also need to take into account the conditions in the where clause. In other words, adding the index is suggestive, but the rest of the query may preclude the use of the index.
And, let me add, the best solution for these types of searches is to use the full text search capability in SQL Server.

Write consistency with nested subquery in Oracle

I've read many of the gory details of write consistency and I understand how it works in the simple cases. What I'm not clear on is what this means for nested sub-queries.
Here's a concrete example:
A table with PK id, and other columns state, temp and date.
UPDATE table SET state = DECODE(state, 'rainy', 'snowy', 'sunny', 'frosty') WHERE id IN (
SELECT id FROM (
SELECT id,state,temp from table WHERE date > 50
) WHERE (state='rainy' OR state='sunny') AND temp < 0
)
The real thing was more convoluted (in the innermost query), but this captures the essence.
If we assume the state column is not nullable, can this update ever fail due to concurrent modification (i.e., the DECODE function doesn't find a match, a value of 'rainy' or 'sunny', and so tries to insert null into a non-nullable column)?
Oracle supports "statement level read and write consistency" (as do all other serious DBMSs).
This means that the statement as a whole will not see any changes to the database that occurred after the statement started.
As your UPDATE is one single statement there shouldn't be a case where the decode returns null.
Btw: the statement can be simplified, you don't need the outer SELECT in the sub-query:
UPDATE table SET state = DECODE(state, 'rainy', 'snowy', 'sunny', 'frosty')
WHERE id IN (
SELECT id
FROM table
WHERE date > 50
AND (state='rainy' OR state='sunny')
AND temp < 0
)
I don't see any reason to be concerned. The subquery explicitly retrieves only IDs of rows with state 'rainy' or 'sunny', and that's what the outer DECODE is going to get. The whole thing is one statement, and is going to be executed within transaction boundaries.
Answering my own question: turns out there is a bug in Oracle which can cause this query to fail. Details confirmed by Tom Kyte, in the discussion starting here.