Table-Valued function - Order by is ignored in output - sql

We are moving from SQL Server 2008 to SQL Server 2012 and immediately noticed that all our table-valued functions no longer deliver their temp table contents in the correctly sorted order.
CODE:
INSERT INTO @Customer
SELECT Customer_ID, Name,
CASE
WHEN Expiry_Date < GETDATE() then 1
WHEN Expired = 1 then 1
ELSE 0
END
from Customer order by Name
In SQL Server 2008 this function returns the customers sorted by Name. In SQL Server 2012 it returns the table unsorted. The "order by" is ignored in SQL 2012.
Do we have to re-write all the functions to include a sort_id and then sort them when they are called in the main application, or is there an easy fix?

There were two things wrong with your original approach.
1. On inserting into the table, it was never guaranteed that the ORDER BY on the INSERT ... SELECT ... ORDER BY would be the order in which the rows were actually inserted.
2. On selecting from it, SQL Server does not guarantee that a SELECT without an ORDER BY will return the rows in any particular order, such as insertion order, anyway.
In 2012 it looks as though the behaviour has changed with respect to item 1. It now generally ignores the ORDER BY on the SELECT statement that is the source for an INSERT:
DECLARE @T TABLE(number int)
INSERT INTO @T
SELECT number
FROM master..spt_values
ORDER BY name
[Execution plan screenshots: 2008 plan and 2012 plan]
The reason for the change of behaviour is that in previous versions SQL Server produced one plan that was shared between executions with SET ROWCOUNT 0 (off) and SET ROWCOUNT N. The sort operator was only there to ensure the correct semantics in case the plan was run by a session with a non zero ROWCOUNT set. The TOP operator to the left of it is a ROWCOUNT TOP.
SQL Server 2012 now produces separate plans for the two cases, so there is no need to add the Sort and ROWCOUNT TOP operators to the ROWCOUNT 0 version of the plan.
A sort may still appear in the plan in 2012 if the SELECT has an explicit TOP defined (other than TOP 100 PERCENT), but this still doesn't guarantee the actual insertion order of the rows: the plan might then have another sort after the TOP N is established, for example to get the rows into clustered index order.
For the example in your question I would just adjust the calling code to specify ORDER BY name if that is what it requires.
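Here is a rough illustration of moving the ORDER BY into the calling code. It is a minimal sketch that uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server. SQLite has no table-valued functions, so a hypothetical view named CustomerStatus plays the role of the function, and the sample rows are invented. The only point is that the sort belongs in the outer SELECT.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    Customer_ID INTEGER PRIMARY KEY,
    Name        TEXT NOT NULL,
    Expiry_Date TEXT,
    Expired     INTEGER NOT NULL DEFAULT 0
);
INSERT INTO Customer VALUES
    (1, 'Carol', '2999-01-01', 0),
    (2, 'Alice', '2000-01-01', 0),
    (3, 'Bob',   '2999-01-01', 1);

-- Hypothetical view playing the part of the table-valued function:
-- it computes the Expired flag but makes no promise about row order.
CREATE VIEW CustomerStatus AS
SELECT Customer_ID,
       Name,
       CASE
           WHEN Expiry_Date < date('now') THEN 1
           WHEN Customer.Expired = 1 THEN 1
           ELSE 0
       END AS Expired
FROM Customer;
""")

def customers_by_name(conn):
    # The ordering is requested by the caller, which is where it is guaranteed.
    return conn.execute(
        "SELECT Customer_ID, Name, Expired FROM CustomerStatus ORDER BY Name"
    ).fetchall()

rows = customers_by_name(conn)
```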
Regarding your sort_id idea: according to the article "Ordering guarantees in SQL Server", when you insert into a table with an IDENTITY column, the IDENTITY values are guaranteed to be allocated in the order given by the ORDER BY. So you could also do:
DECLARE @Customer TABLE (
Sort_Id INT IDENTITY PRIMARY KEY,
Customer_ID INT,
Name NVARCHAR(100), -- string type assumed; an INT column cannot hold a name
Expired BIT )
INSERT INTO @Customer
SELECT Customer_ID,
Name,
CASE
WHEN Expiry_Date < Getdate() THEN 1
WHEN Expired = 1 THEN 1
ELSE 0
END
FROM Customer
ORDER BY Name
but you would still need to ORDER BY Sort_Id in your selecting queries, because there is no guaranteed ordering without that. (This Sort_Id approach might be useful when the original columns used for ordering aren't being copied into the table variable.)

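Add a column named rowno to the @Customer table and populate it with ROW_NUMBER(). You still need ORDER BY rowno when selecting from it:
INSERT INTO @Customer
SELECT ROW_NUMBER() OVER (ORDER BY Name) AS rowno, Customer_ID, Name,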
CASE
WHEN Expiry_Date < GETDATE() then 1
WHEN Expired = 1 then 1
ELSE 0
END
from Customer
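This is a minimal sketch of the rowno idea, again using SQLite through Python's sqlite3 as a stand-in. The CustomerOut table plays the part of the table variable, and the data is invented. Because the number comes from ROW_NUMBER() OVER (ORDER BY Name), it does not depend on insertion order. The reader still has to sort by it explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (Customer_ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
INSERT INTO Customer VALUES (10, 'Carol'), (20, 'Alice'), (30, 'Bob');
-- Hypothetical stand-in for the @Customer table variable, with a rowno column.
CREATE TABLE CustomerOut (rowno INTEGER, Customer_ID INTEGER, Name TEXT);
""")

# rowno is computed from the ORDER BY inside OVER(), so it does not depend
# on the physical order in which the rows happen to be inserted.
conn.execute("""
INSERT INTO CustomerOut (rowno, Customer_ID, Name)
SELECT ROW_NUMBER() OVER (ORDER BY Name), Customer_ID, Name
FROM Customer
""")

# Readers still ORDER BY rowno to get the rows back in that order.
ordered = conn.execute(
    "SELECT rowno, Name FROM CustomerOut ORDER BY rowno"
).fetchall()
```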

Related

Is there any better option to apply pagination without applying OFFSET in SQL Server?

I want to apply pagination to a table with a huge amount of data. I want to know whether there is a better option than using OFFSET in SQL Server.
Here is my simple query:
SELECT *
FROM TableName
ORDER BY Id DESC
OFFSET 30000000 ROWS
FETCH NEXT 20 ROWS ONLY
You can use Keyset Pagination for this. It's far more efficient than using Rowset Pagination (paging by row number).
In Rowset Pagination, all previous rows must be read before the next page can be read. In Keyset Pagination, the server can jump immediately to the correct place in the index, so no rows are read that do not need to be.
For this to perform well, you need to have a unique index on that key, which includes any other columns you need to query.
In this type of pagination, you cannot jump to a specific page number. You jump to a specific key and read from there. So you need to save the unique ID of the last row on the page you are on and skip to the next page from there. Alternatively, you could calculate or estimate a starting point for each page up front.
One big benefit, apart from the obvious efficiency gain, is avoiding the "missing row" problem when paginating, caused by rows being removed from previously read pages. This does not happen when paginating by key, because the key does not change.
Here is an example:
Let us assume you have a table called TableName with an index on Id, and you want to start at the latest Id value and work backwards.
You begin with:
SELECT TOP (@numRows)
*
FROM TableName
ORDER BY Id DESC;
Note the use of ORDER BY to ensure the order is correct.
In some RDBMSs you need LIMIT instead of TOP.
The client will hold the last received Id value (the lowest in this case). On the next request, you jump to that key and carry on:
SELECT TOP (@numRows)
*
FROM TableName
WHERE Id < @lastId
ORDER BY Id DESC;
Note the use of < not <=.
In case you were wondering: in a typical B+-tree index, the row with the indicated ID is not read. The seek starts at that key, and only the rows after it are read.
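The two queries above can be driven in a loop. This is a sketch using SQLite through Python's sqlite3, so LIMIT replaces TOP and ? placeholders replace @numRows and @lastId. The table contents are hypothetical. Between requests, the client keeps only the lowest Id from the previous page.

```python
import sqlite3

def make_table(n):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableName (Id INTEGER PRIMARY KEY, Payload TEXT)")
    conn.executemany(
        "INSERT INTO TableName (Id, Payload) VALUES (?, ?)",
        [(i, f"row {i}") for i in range(1, n + 1)],
    )
    return conn

def keyset_pages(conn, num_rows):
    """Yield pages newest-first, seeking past the last Id seen."""
    page = conn.execute(
        "SELECT Id, Payload FROM TableName ORDER BY Id DESC LIMIT ?", (num_rows,)
    ).fetchall()
    while page:
        yield page
        last_id = page[-1][0]  # lowest Id on this page, held by the client
        page = conn.execute(
            "SELECT Id, Payload FROM TableName WHERE Id < ? ORDER BY Id DESC LIMIT ?",
            (last_id, num_rows),
        ).fetchall()

conn = make_table(10)
pages = [[row[0] for row in page] for page in keyset_pages(conn, 4)]
```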
The key chosen must be unique, so if you are paging by a non-unique column then you must add a second column to both ORDER BY and WHERE. You would need an index on OtherColumn, Id for example, to support this type of query. Don't forget INCLUDE columns on the index.
SQL Server does not support row/tuple comparators, so you cannot do (OtherColumn, Id) < (@lastOther, @lastId). (This is, however, supported in PostgreSQL, MySQL, MariaDB and SQLite.)
Instead you need the following:
SELECT TOP (@numRows)
*
FROM TableName
WHERE (
(OtherColumn = @lastOther AND Id < @lastId)
OR OtherColumn < @lastOther
)
ORDER BY
OtherColumn DESC,
Id DESC;
This is more efficient than it looks, as SQL Server can convert this into a proper < over both values.
The presence of NULLs complicates things further. You may want to query those rows separately.
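To check that the expanded OR predicate pages the same way as a row-value comparison, here is a sketch in SQLite through Python's sqlite3. SQLite supports both forms, but in SQL Server only the expanded form would be available. The data and the index name IX_Other_Id are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (Id INTEGER PRIMARY KEY, OtherColumn INTEGER NOT NULL)")
conn.execute("CREATE INDEX IX_Other_Id ON TableName (OtherColumn, Id)")
conn.executemany(
    "INSERT INTO TableName (Id, OtherColumn) VALUES (?, ?)",
    [(1, 5), (2, 5), (3, 7), (4, 7), (5, 7), (6, 9)],
)

# SQL Server style: no row-value comparison, so the predicate is expanded by hand.
EXPANDED = """
SELECT OtherColumn, Id FROM TableName
WHERE (OtherColumn = :lastOther AND Id < :lastId) OR OtherColumn < :lastOther
ORDER BY OtherColumn DESC, Id DESC LIMIT :n
"""
# Row-value form, accepted by SQLite (and PostgreSQL, MySQL, MariaDB).
ROW_VALUE = """
SELECT OtherColumn, Id FROM TableName
WHERE (OtherColumn, Id) < (:lastOther, :lastId)
ORDER BY OtherColumn DESC, Id DESC LIMIT :n
"""

def all_pages(sql, n):
    rows = conn.execute(
        "SELECT OtherColumn, Id FROM TableName ORDER BY OtherColumn DESC, Id DESC LIMIT ?",
        (n,),
    ).fetchall()
    pages = []
    while rows:
        pages.append(rows)
        last_other, last_id = rows[-1]  # composite key of the last row seen
        rows = conn.execute(sql, {"lastOther": last_other, "lastId": last_id, "n": n}).fetchall()
    return pages

expanded_pages = all_pages(EXPANDED, 2)
row_value_pages = all_pages(ROW_VALUE, 2)
```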
On a very big e-commerce website, we use a technique based on IDs stored in a pseudo-temporary table, which we join to the rows of the product table.
Here is a clear example.
We have a table designed this way:
CREATE TABLE S_TEMP.T_PAGINATION_PGN
(PGN_ID BIGINT IDENTITY(-9223372036854775808, 1) PRIMARY KEY,
PGN_SESSION_GUID UNIQUEIDENTIFIER NOT NULL,
PGN_SESSION_DATE DATETIME2(0) NOT NULL,
PGN_PRODUCT_ID INT NOT NULL,
PGN_SESSION_ORDER INT NOT NULL);
CREATE INDEX X_PGN_SESSION_GUID_ORDER
ON S_TEMP.T_PAGINATION_PGN (PGN_SESSION_GUID, PGN_SESSION_ORDER)
INCLUDE (PGN_PRODUCT_ID);
CREATE INDEX X_PGN_SESSION_DATE
ON S_TEMP.T_PAGINATION_PGN (PGN_SESSION_DATE);
We have a very big product table called T_PRODUIT_PRD, which a customer filters with many predicates. We INSERT the rows from the filtered SELECT into this table like this:
DECLARE @SESSION_ID UNIQUEIDENTIFIER = NEWID();
INSERT INTO S_TEMP.T_PAGINATION_PGN
SELECT @SESSION_ID , SYSUTCDATETIME(), PRD_ID,
ROW_NUMBER() OVER(ORDER BY ...) -- custom order by
FROM dbo.T_PRODUIT_PRD
WHERE ... -- custom filter
Then, every time we need a given page of @N products, we add a join to this table:
...
JOIN S_TEMP.T_PAGINATION_PGN
ON PGN_SESSION_GUID = @SESSION_ID
AND PGN_SESSION_ORDER BETWEEN (@DESIRED_PAGE_NUMBER - 1) * @N + 1 AND @DESIRED_PAGE_NUMBER * @N
AND PGN_PRODUCT_ID = dbo.T_PRODUIT_PRD.PRD_ID
The indexes will do the job!
Of course, we have to purge this table regularly, which is why we have a scheduled job that deletes the rows whose sessions were generated more than 4 hours ago:
DELETE FROM S_TEMP.T_PAGINATION_PGN
WHERE PGN_SESSION_DATE < DATEADD(hour, -4, SYSUTCDATETIME());
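Below is a scaled-down sketch of this session-table approach, using SQLite through Python's sqlite3 as a stand-in. Assumptions:
- INTEGER PRIMARY KEY replaces the BIGINT IDENTITY column.
- The session GUID is stored as text.
- SQLite has no INCLUDE, so PGN_PRODUCT_ID becomes a trailing index key.
- The filter (PRD_PRICE <= ?) and the ordering (price, then id) are hypothetical placeholders for the custom ones.

The page lookup uses a BETWEEN range on PGN_SESSION_ORDER, so the (GUID, order) index can serve it as a range.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T_PRODUIT_PRD (PRD_ID INTEGER PRIMARY KEY, PRD_NAME TEXT, PRD_PRICE REAL);
CREATE TABLE T_PAGINATION_PGN (
    PGN_ID            INTEGER PRIMARY KEY,  -- stands in for the BIGINT IDENTITY
    PGN_SESSION_GUID  TEXT    NOT NULL,
    PGN_SESSION_DATE  TEXT    NOT NULL,
    PGN_PRODUCT_ID    INTEGER NOT NULL,
    PGN_SESSION_ORDER INTEGER NOT NULL
);
-- SQLite has no INCLUDE, so the product id is a trailing key column instead.
CREATE INDEX X_PGN_SESSION_GUID_ORDER
    ON T_PAGINATION_PGN (PGN_SESSION_GUID, PGN_SESSION_ORDER, PGN_PRODUCT_ID);
CREATE INDEX X_PGN_SESSION_DATE ON T_PAGINATION_PGN (PGN_SESSION_DATE);
""")
conn.executemany("INSERT INTO T_PRODUIT_PRD VALUES (?, ?, ?)",
                 [(i, f"product {i}", float(i % 7)) for i in range(1, 51)])

def start_session(conn, max_price):
    session_id = str(uuid.uuid4())
    # Hypothetical filter and ordering; the real ones are customer-specific.
    conn.execute("""
        INSERT INTO T_PAGINATION_PGN
            (PGN_SESSION_GUID, PGN_SESSION_DATE, PGN_PRODUCT_ID, PGN_SESSION_ORDER)
        SELECT ?, datetime('now'), PRD_ID,
               ROW_NUMBER() OVER (ORDER BY PRD_PRICE, PRD_ID)
        FROM T_PRODUIT_PRD
        WHERE PRD_PRICE <= ?
    """, (session_id, max_price))
    return session_id

def get_page(conn, session_id, page_number, page_size):
    # Rows (page_number - 1) * page_size + 1 .. page_number * page_size of the session.
    return [r[0] for r in conn.execute("""
        SELECT p.PRD_ID
        FROM T_PRODUIT_PRD AS p
        JOIN T_PAGINATION_PGN AS g
          ON g.PGN_SESSION_GUID = ?
         AND g.PGN_SESSION_ORDER BETWEEN (? - 1) * ? + 1 AND ? * ?
         AND g.PGN_PRODUCT_ID = p.PRD_ID
        ORDER BY g.PGN_SESSION_ORDER
    """, (session_id, page_number, page_size, page_number, page_size))]

def purge(conn, hours=4):
    # Same idea as the scheduled DELETE: drop sessions older than `hours`.
    conn.execute("DELETE FROM T_PAGINATION_PGN WHERE PGN_SESSION_DATE < datetime('now', ?)",
                 (f"-{hours} hours",))

session = start_session(conn, max_price=1)
page_two = get_page(conn, session, 2, 4)
```

Once a session is stored, each later page is an index range read on the small pagination table. The expensive filtered query over the product table runs only once per session.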
In the same spirit as SQLPro's solution, I propose:
WITH CTE AS
(SELECT 30000000 AS N
UNION ALL SELECT N-1 FROM CTE
WHERE N > 30000000 +1 - 20)
SELECT T.* FROM CTE JOIN TableName T ON CTE.N=T.ID
ORDER BY CTE.N DESC
I tried it with 2 billion rows and it's instant!
It's easy to turn this into a stored procedure...
Of course, this is only valid if the IDs are contiguous (no gaps).
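Here is a sketch of the generated-ID idea in SQLite through Python's sqlite3. SQLite spells it WITH RECURSIVE, while T-SQL uses plain WITH. The sample table deliberately has a gap at ID 95 to show the caveat: when the generated range crosses a gap, the page comes back short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (ID INTEGER PRIMARY KEY, Payload TEXT)")
# IDs 1..100, then one row deleted to create a gap (hypothetical data).
conn.executemany("INSERT INTO TableName VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 101)])
conn.execute("DELETE FROM TableName WHERE ID = 95")

def page_by_generated_ids(conn, start, count):
    # Generate start, start-1, ..., start-count+1 and join those IDs to the table.
    return [r[0] for r in conn.execute("""
        WITH RECURSIVE CTE(N) AS (
            SELECT ?
            UNION ALL
            SELECT N - 1 FROM CTE WHERE N > ? + 1 - ?
        )
        SELECT T.ID FROM CTE JOIN TableName AS T ON CTE.N = T.ID
        ORDER BY CTE.N DESC
    """, (start, start, count))]

full_page = page_by_generated_ids(conn, 100, 5)
short_page = page_by_generated_ids(conn, 96, 5)  # crosses the gap at 95
```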

SQL Server - Pagination Without Order By Clause

My situation is that an unpredictable SQL statement is given to the program, and I need to do pagination on top of it. The final SQL statement would be similar to the following one:
SELECT * FROM (*Given SQL Statement*) b
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
The problem here is that the *Given SQL Statement* is unpredictable. It may or may not contain order by clause. I am not able to change the query result of this SQL Statement and I need to do pagination on it.
I searched for a solution on the Internet, but all of them suggested using an arbitrary column, like the primary key, in the ORDER BY clause. But that would change the original order.
The short answer is that it can't be done, or at least can't be done properly.
The problem is that SQL Server (or any RDBMS) does not and can not guarantee the order of the records returned from a query without an order by clause.
This means that you can't use paging on such queries.
Furthermore, if you use an ORDER BY clause on a column whose values are not unique in your result set, the order of the rows within each group of equal values is still not guaranteed. Quick example:
;WITH cte (a, b)
AS
(
SELECT 1, 'a'
UNION ALL
SELECT 1, 'b'
UNION ALL
SELECT 2, 'a'
UNION ALL
SELECT 2, 'b'
)
SELECT *
FROM cte
ORDER BY a
Both result sets are valid, and you can't know in advance which one you will get:
a b
-----
1 b
1 a
2 b
2 a
a b
-----
1 a
1 b
2 a
2 b
(and of course, you might get other sorts)
The problem here is that the *Given SQL Statement* is unpredictable. It may or may not contain order by clause.
Your inner query (the unpredictable SQL statement) should not contain ORDER BY; even if it does, the order is not guaranteed.
To get a guaranteed order, you have to order by some column. For the results to be deterministic, the ordered column or columns should be unique.
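For example, adding column b as a tie-breaker in the cte example makes (a, b) unique, so every page is well defined. This is sketched in SQLite through Python's sqlite3, with LIMIT/OFFSET standing in for OFFSET ... FETCH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same data as the cte example: a has duplicates, but (a, b) is unique.
PAGE_SQL = """
WITH cte(a, b) AS (
    SELECT 1, 'a' UNION ALL SELECT 1, 'b' UNION ALL
    SELECT 2, 'a' UNION ALL SELECT 2, 'b'
)
SELECT a, b FROM cte ORDER BY a, b LIMIT ? OFFSET ?
"""

def page(page_number, page_size):
    # Offset is the number of rows to skip before this page.
    return conn.execute(PAGE_SQL, (page_size, (page_number - 1) * page_size)).fetchall()

first, second = page(1, 2), page(2, 2)
```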
Please note: what I'm about to suggest is probably horribly inefficient and should really only be used to help you go back to the project leader and tell them that pagination of an unordered query should not be done. Having said that...
From your comments you say you are able to change the SQL statement before it is executed.
You could write the results of the original query to a temporary table, adding a row-number column to be used for subsequent pagination ordering.
Therefore any original ordering is preserved and you can now paginate.
But of course the reason for needing pagination in the first place is to avoid sending large amounts of data to the client application. Although this does prevent that, you will still be copying data to a temp table which, depending on the row size and count, could be very slow.
You also have the problem that the page size is coming from the client as part of the SQL statement. Parsing the statement to pick that out could be tricky.
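Here is a minimal sketch of the copy-then-page idea in SQLite through Python's sqlite3. The table mytable and the string given_sql are placeholders for the unpredictable statement. ROW_NUMBER() OVER () numbers the rows in whatever order the engine produces them, which is exactly the unguaranteed order discussed above. Once the numbers are stored, however, every page reads from the same numbering, so pages neither overlap nor skip rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (x TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(c,) for c in "qwertyuiop"])

given_sql = "SELECT x FROM mytable"  # placeholder for the unpredictable statement

# Materialize once: rn is fixed at copy time, so later pages cannot shift.
conn.execute(
    "CREATE TEMP TABLE paged AS "
    "SELECT ROW_NUMBER() OVER () AS rn, b.* "
    f"FROM ({given_sql}) AS b"
)

def fetch_page(page_number, page_size):
    return conn.execute(
        "SELECT x FROM paged WHERE rn BETWEEN ? AND ? ORDER BY rn",
        ((page_number - 1) * page_size + 1, page_number * page_size),
    ).fetchall()

pages = [fetch_page(p, 4) for p in (1, 2, 3)]
```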
As others have noted, paging without a sorted query is not safe in any way. But since you know that and are still looking for a way, I can suggest a query like this (though it is not a recommended approach):
;with cte as (
select *,
row_number() over (order by (select 0)) rn
from (
-- Your query
) t
)
select *
from cte
where rn between (@pageNumber-1)*@pageSize+1 and @pageNumber*@pageSize
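The BETWEEN bounds and an OFFSET value are two spellings of the same arithmetic. The sketch below compares them, assuming SQLite through Python's sqlite3 and a hypothetical mytable with a unique id. It orders by id rather than (SELECT 0), so that the two queries are actually comparable.

```python
import sqlite3

def row_number_bounds(page_number, page_size):
    """1-based inclusive rn range used by the BETWEEN predicate."""
    return (page_number - 1) * page_size + 1, page_number * page_size

def offset_for(page_number, page_size):
    """Rows to skip for OFFSET ... FETCH NEXT page_size ROWS ONLY."""
    return (page_number - 1) * page_size

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(1, 24)])

def by_row_number(page_number, page_size):
    lo, hi = row_number_bounds(page_number, page_size)
    return conn.execute("""
        WITH cte AS (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM mytable)
        SELECT id FROM cte WHERE rn BETWEEN ? AND ? ORDER BY rn
    """, (lo, hi)).fetchall()

def by_offset(page_number, page_size):
    return conn.execute(
        "SELECT id FROM mytable ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset_for(page_number, page_size)),
    ).fetchall()
```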
I finally found a simple way to do it without any order by on a specific column:
declare @start AS INTEGER = 0, @count AS INTEGER = 5; -- @start = rows to skip (0 = first page)
select * from (SELECT *,ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS fakeCounter
FROM (select * from mytable) AS t) AS t2 order by fakeCounter OFFSET @start ROWS
FETCH NEXT @count ROWS ONLY
where select * from mytable can be any query

Order by clause execution in SQL

This question isn't about the order of execution in general. It's just about the ORDER BY.
In standard SQL, the logical processing order is:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
TOP
EDIT: This question has become, more or less, "Does SQL Server apply short-circuit evaluation when executing ORDER BY expressions?" The answer is SOMETIMES! I just haven't found a reasonable explanation as to why. See Edit 4.
Now suppose I have a statement like this:
DECLARE @dt18YearsAgo AS DATETIME = DATEADD(YEAR,-18,GETDATE());
SELECT
Customers.Name
FROM
Customers
WHERE
Customers.DateOfBirth > @dt18YearsAgo
ORDER BY
Contacts.LastName ASC, --STATEMENT1
Contacts.FirstName ASC, --STATEMENT2
(
SELECT
MAX(PurchaseDateTime)
FROM
Purchases
WHERE
Purchases.CustomerID = Customers.CustomerID
) DESC --STATEMENT3
This isn't the real statement I'm trying to execute, but just an example.
There are three ORDER BY statements.
The third statement is only used for rare cases where the last name and first name match.
If there are no duplicate last names, does SQL Server skip ORDER BY statements #2 and #3? And, logically, if there are no duplicate combinations of last name and first name, does SQL Server skip statement #3?
This is really for optimization. Reading from the Purchases table should only be a last resort. In the case of my application, it wouldn't be efficient to read every single "PurchaseDateTime" from "Purchases" grouping by "CustomerID".
Please keep the answer related to my question and not a suggestion like building an index for CustomerID, PurchaseDateTime in Purchases. The real question is, does SQL Server skip unnecessary ORDER BY statements?
Edit: Apparently, SQL Server will always execute every statement as long as there is one row. Even with one row, this will give you a divide by zero error:
DECLARE @dt18YearsAgo AS DATETIME = DATEADD(YEAR,-18,GETDATE());
SELECT
Customers.Name
FROM
Customers
WHERE
Customers.DateOfBirth > @dt18YearsAgo
ORDER BY
Contacts.LastName ASC, --STATEMENT1
Contacts.FirstName ASC, --STATEMENT2
1/(Contacts.ContactID - Contacts.ContactID) --STATEMENT3
Edit2:
Apparently, this doesn't give a divide-by-zero error:
DECLARE @dt18YearsAgo AS DATETIME = DATEADD(YEAR,-18,GETDATE());
SELECT
Customers.Name
FROM
Customers
WHERE
Customers.DateOfBirth > @dt18YearsAgo
ORDER BY
Contacts.LastName ASC, --STATEMENT1
Contacts.FirstName ASC, --STATEMENT2
CASE WHEN 1=0
THEN Contacts.ContactID
ELSE 1/(Contacts.ContactID - Contacts.ContactID)
END --STATEMENT3
Well, the answer to my original question is YES, it does execute, but what's nice is that I can stop execution with a proper CASE WHEN.
Edit 3: We can stop execution of an ORDER BY statement with a proper CASE WHEN. The trick, I guess, is to figure out how to use it properly. CASE WHEN gives me what I want, which is short-circuit execution in an ORDER BY statement. I compared the execution plans in SSMS: depending on the CASE WHEN statement, the Purchases table isn't scanned at all, EVEN THOUGH it's a clearly visible SELECT/FROM statement:
DECLARE @dt18YearsAgo AS DATETIME = DATEADD(YEAR,-18,GETDATE());
SELECT
Customers.Name
FROM
Customers
WHERE
Customers.DateOfBirth > @dt18YearsAgo
ORDER BY
Contacts.LastName ASC, --STATEMENT1
Contacts.FirstName ASC, --STATEMENT2
CASE WHEN 1=0
THEN
(
SELECT
MAX(PurchaseDateTime)
FROM
Purchases
WHERE
Purchases.CustomerID = Customers.CustomerID
)
ELSE Customers.DateOfBirth
END DESC
Edit 4: Now I'm completely confused. Here's an example by @Lieven:
WITH Test (name, ID) AS
(SELECT 'Lieven1', 1 UNION ALL SELECT 'Lieven2', 2)
SELECT * FROM Test ORDER BY name, 1/ (ID - ID)
This yields no divide-by-zero error, which means SQL Server does, in fact, do short-circuit evaluation on SOME tables, specifically those created with the WITH command.
Trying this with a TABLE variable:
DECLARE @Test TABLE
(
NAME nvarchar(30),
ID int
);
INSERT INTO @Test (Name,ID) VALUES('Lieven1',1);
INSERT INTO @Test (Name,ID) VALUES('Lieven2',2);
SELECT * FROM @Test ORDER BY name, 1/ (ID - ID)
will yield a divide by zero error.
First of all what you are calling "Statements" are no such thing. They are sub-clauses of the ORDER BY (major) clause. The difference is important, because "Statement" implies something separable, ordered and procedural, and SQL sub-clauses are none of those things.
Specifically, SQL sub-clauses (that is, the individual items of a SQL major clause (SELECT, FROM, WHERE, ORDER BY, etc.)) have no implicit (nor explicit) execution order of their own. SQL will re-order them in any way that it finds convenient and will almost always execute all of them if it executes any of them. In short, SQL Server does not do that kind of "short-circuit" optimization, because it is only trivially effective and seriously gets in the way of the very different kind of optimizations that it does do (i.e., statistical data access/operator optimizations).
So the correct answer to your original question (which you should not have changed) is NO, not reliably. You cannot rely on SQL Server to not use some sub-clause of the ORDER BY, simply because it looks like it does not need to.
The only common exception to this is that the CASE function can (in most circumstances) be used to short-circuit execution paths (within the CASE function though, not outside of it), but only because it is specifically designed for this. I cannot think of anything else in SQL that you can rely on to act like this.
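The CASE guard can be observed by counting function calls. This sketch uses SQLite through Python's sqlite3, not SQL Server. A hypothetical Python function expensive() stands in for a costly expression such as the Purchases subquery. It shows that SQLite evaluates the unguarded ORDER BY expression but never evaluates a THEN branch whose condition is false for every row. SQL Server's own CASE documentation lists exceptions; for example, aggregate expressions may be evaluated before the CASE receives them. Treat this as an illustration rather than a guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Data TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?)", [("One",), ("Two",), ("Three",)])

calls = {"n": 0}

def expensive(value):
    # Hypothetical stand-in for a costly expression; it only counts its calls.
    calls["n"] += 1
    return value

conn.create_function("expensive", 1, expensive)

def run_and_count(sql):
    calls["n"] = 0
    rows = conn.execute(sql).fetchall()
    return rows, calls["n"]

# Unguarded: the second sort key is computed for the rows being sorted.
plain_rows, plain_calls = run_and_count(
    "SELECT Data FROM MyTable ORDER BY length(Data), expensive(Data)")
# Guarded: Data is never NULL, so the THEN branch is never evaluated.
guarded_rows, guarded_calls = run_and_count(
    "SELECT Data FROM MyTable ORDER BY length(Data), "
    "CASE WHEN Data IS NULL THEN expensive(Data) ELSE 1 END")
```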
DECLARE @MyTable TABLE
(
Data varchar(30)
)
INSERT INTO @MyTable (Data) SELECT 'One'
INSERT INTO @MyTable (Data) SELECT 'Two'
INSERT INTO @MyTable (Data) SELECT 'Three'
--SELECT *
--FROM @MyTable
--ORDER BY LEN(Data), LEN(Data)/0
-- Divide by zero error encountered.
SELECT *
FROM @MyTable
ORDER BY LEN(Data), CASE WHEN Data is null THEN LEN(Data)/0 ELSE 1 END
-- no problem
Also with SET STATISTICS IO ON I saw these results:
SELECT *
FROM @MyTable
ORDER BY LEN(Data)
--(3 row(s) affected)
--Table '#4F2895A9'. Scan count 1, logical reads 1
SELECT *
FROM @MyTable
ORDER BY LEN(Data), CASE WHEN Data = 'One' THEN (SELECT MAX(t2.Data) FROM @MyTable t2) ELSE Data END
--(3 row(s) affected)
--Table '#4F2895A9'. Scan count 2, logical reads 2
SELECT *
FROM @MyTable
ORDER BY LEN(Data), CASE WHEN Data = 'Zero' THEN (SELECT MAX(t2.Data) FROM @MyTable t2) ELSE Data END
--(3 row(s) affected)
--Table 'Worktable'. Scan count 0, logical reads 0
--Table '#4F2895A9'. Scan count 1, logical reads 1
I guess you have answered your question. However, why are you sorting the data on just first name and last name, then on purchase date if those two are the same, and otherwise on DOB?
Logically, it should be firstname, lastname, DOB. If these three are the same, only then should you evaluate the purchaseorderdate. There are many people who have the same names, but very few have the same names and DOBs. This will reduce the time you will be querying the purchase table.

Set values from 0 to n in SQL

How can I set all values in the OrderNumber column from 1 to n (when they are ordered by primary key), where n is the number of entries in the table?
EDIT:
Let's assume we have 3 entries with IDs 4, 7 and 15. I want to set their OrderValues to 1, 2 and 3.
I'm using SQL Server 2008.
;WITH x AS (SELECT ID, OrderValues,
rn = ROW_NUMBER() OVER (ORDER BY ID)
FROM dbo.tablename
)
UPDATE x SET OrderValues = rn;
However, why on earth do you want to do this, when you can derive this information using the ROW_NUMBER() function at query time? Storing the values means that they are guaranteed to be out of date and out of sync the moment you insert/update/delete a single row in the table. So unless you plan to run this update after every DML operation (e.g. using a trigger), which isn't very logical to me, you're likely much better off getting these row_number values when you run the query vs. storing them in the table.
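To illustrate deriving the numbers at query time, here is a sketch in SQLite through Python's sqlite3, using the IDs 4, 7 and 15 from the question. Inserting ID 5 afterwards shifts the numbering automatically, which a stored OrderValues column would not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (ID INTEGER PRIMARY KEY, OrderValues INTEGER)")
conn.executemany("INSERT INTO tablename (ID) VALUES (?)", [(4,), (7,), (15,)])

def numbered(conn):
    # Computed on read, so it cannot go stale after inserts or deletes.
    return conn.execute(
        "SELECT ID, ROW_NUMBER() OVER (ORDER BY ID) AS rn FROM tablename ORDER BY ID"
    ).fetchall()

before = numbered(conn)
conn.execute("INSERT INTO tablename (ID) VALUES (5)")
after = numbered(conn)
```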

SQL Server ROW_NUMBER() on SQL Server 2000?

I have a query that allows me to get records from a database table by giving it a minimum and maximum limit.
It goes like this:
SELECT T1.CDUSUARIO, T1.DSALIAS, T1.DSNOMBRE_EMPRESA, T1.DSCARGO, T1.DSDIRECCION_CORREO, T1.CDUSUARIO_ADMINISTRADOR, T1.FEMODIFICACION
FROM (SELECT *,
ROW_NUMBER() OVER (ORDER BY CDUSUARIO) as row FROM TBL_USUARIOS ) as T1
WHERE row > @limiteInf
and row <= @limiteSup
ORDER BY DSALIAS ASC;
Now, it works like heaven on SQL Server 2005 and SQL Server 2008, but when I tried to run it on a SQL Server 2000 database, it says:
ROW_NUMBER is an unknown function name, or something like that.
What can I do?
There is a COUNT(*) with SELF JOIN solution that will scale badly.
You can load a temp table with an IDENTITY column and read it back, but it's not guaranteed to work (I can't find an article on it; I was told at an MS seminar years ago).
Neither solution will support PARTITION BY
I've not mentioned loop or CURSOR based solutions which are probably worse
Edit 20 May 2011
Example demo of why IDENTITY won't work: "Do Inserted Records Always Receive Contiguous Identity Values"
I know this thread is a bit old, but for anyone else looking for the same solution, it is useful to know that there is a good solution to this problem.
For those who do not want to follow the link to the original, I have copied and pasted the code below. Again, credit goes to the original publisher.
Here is the SQL for SQL Server 2000 to select the latest version of a record, grouping by a single column:
SELECT *
FROM (
SELECT *, (
SELECT COUNT(*)
FROM MyTable AS counter
WHERE counter.PartitionByColumn = MyTable.PartitionByColumn
AND counter.OrderByColumn >= MyTable.OrderByColumn
) AS rowNumber
FROM MyTable
) AS r1
WHERE r1.rowNumber = 1
Same code in SQL Server 2005 would look like this:
SELECT * FROM (SELECT *, ROW_NUMBER() OVER(PARTITION BY PartitionByColumn
ORDER BY OrderByColumn DESC) AS rowNumber FROM MyTable) AS rw1
WHERE rw1.rowNumber = 1
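Here is a quick sanity check that the two queries agree, sketched in SQLite through Python's sqlite3 (which supports both forms) with made-up rows. The sample data deliberately has no ties in OrderByColumn. With a tie at the top of a partition, the COUNT(*) version gives every tied row a count greater than 1 and returns nothing for that partition, while ROW_NUMBER() still picks one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (PartitionByColumn TEXT, OrderByColumn INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)",
                 [("a", 1), ("a", 3), ("a", 2), ("b", 10), ("b", 20)])

# SQL Server 2000 style: correlated COUNT(*) emulates ROW_NUMBER() ... DESC.
SQL_2000_STYLE = """
SELECT PartitionByColumn, OrderByColumn FROM (
    SELECT *, (
        SELECT COUNT(*) FROM MyTable AS counter
        WHERE counter.PartitionByColumn = MyTable.PartitionByColumn
          AND counter.OrderByColumn >= MyTable.OrderByColumn
    ) AS rowNumber
    FROM MyTable
) AS r1
WHERE r1.rowNumber = 1
ORDER BY PartitionByColumn
"""
SQL_2005_STYLE = """
SELECT PartitionByColumn, OrderByColumn FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY PartitionByColumn
                                 ORDER BY OrderByColumn DESC) AS rowNumber
    FROM MyTable
) AS rw1
WHERE rw1.rowNumber = 1
ORDER BY PartitionByColumn
"""
old = conn.execute(SQL_2000_STYLE).fetchall()
new = conn.execute(SQL_2005_STYLE).fetchall()
```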
Use another function or upgrade your database. ROW_NUMBER did not exist in the 2000 version of the database. Period. Nothing you can do about it.
This is my solution to the problem:
declare @i int
declare @t table (row int, stuff varchar(99))
insert into @t
select 0,stuff from mytable -- <= your query
set @i=0
update @t set row=@i, @i=@i+1
select * from @t
Explanation:
1. Create a table variable (the "memory table").
2. Insert the data (your query) with the row number set to 0.
3. Update the row number field with an int variable that is incremented in the same update for the next record. (Actually the variable is incremented first and then assigned, so numbering starts from 1.)
4. "Select" the result from the table variable.
You may ask why I don't use the variable in the SELECT statement. It would be simpler, but it's not allowed: a SELECT that assigns values to variables cannot also return a result set. It's OK to do it in an UPDATE.