How to optimize CTE query - sql

I have following query (for the sake of this topic it was simplified as below):
WITH CTE
(
Columns,
DeliverDate,
LastReplayDate
)
AS
(
SELECT IIF(LastReplayDate IS NULL, IIF(LastReplayDate>= DeliverDate, LastReplayDate,DeliverDate),LastReplayDate) AS SortDateColumn,R.* FROM
(
SELECT
Columns,
DeliverDate,
LastReplayDate
FROM MY_TABLE
WHERE
CONDITIONS
AND ( FIRST_HEAVY_FUNCTION)
AND ( SECOND_HEAVY_FUNCTION)
) R
ORDER BY SortDateColumn DESC
OFFSET (#CurrentPageIndex - 1) * #PageSize ROWS
FETCH NEXT 10 ROWS ONLY
)
SELECT CTE. *
FROM CTE
OPTION (RECOMPILE);
As you see I use CTE query from sorted data from other query. Last one include pagging each 10 rows.
The most problematic for me here is part:
WHERE
CONDITIONS
AND ( FIRST_HEAVY_FUNCTION)
AND ( SECOND_HEAVY_FUNCTION)
Because of those conditions return time sometimes reach up to 4 minutes or so. Without it's fairly quick (8-20 sec).
Of course indexes were made and improved a query from 15 minutes. But this is still to slow.
I was wondering is it possible to move problematic conditions outside CTE and at the same time still get all 10 rows from paging? In case where rows count < 10 do another loop to collect missing rows to achieve exactly 10 rows as final result.
Is it possible? Or how to optimize a such query?

A couple of questions:
can you rewrite the functions as inline table valued function? That can allevate the pressure
Does the CONDITIONS remove enough rows already? So that the HEAVY_FUNCTIONs are just the cherry on top. In that case you might be able to fake a push down (but i wouldn't guarantee it's always working):
select *
from (
select *
FROM MY_TABLE
WHERE
CONDITIONS
) x
cross apply (
select 1 AS test
where dbo.FN_SLOW_FUNCTION(x.param1, x.param2) = 1
) filter1
cross apply (
select 1 AS test
where dbo.FN_ANOTHER_SLOW_FUNCTION(x.param3, x.param4) = 1
) filter2
ORDER BY SomeDate OFFSET ...
There's a OPTION hint called FORCE ORDER which might guarantee that SQL isn't rearranged too much
Are your indexes good?

Related

SQL Server - Pagination Without Order By Clause

My situation is that a SQL statement which is not predictable, is given to the program and I need to do pagination on top of it. The final SQL statement would be similar to the following one:
SELECT * FROM (*Given SQL Statement*) b
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
The problem here is that the *Given SQL Statement* is unpredictable. It may or may not contain order by clause. I am not able to change the query result of this SQL Statement and I need to do pagination on it.
I searched for solution on the Internet, but all of them suggested to use an arbitrary column, like primary key, in order by clause. But it will change the original order.
The short answer is that it can't be done, or at least can't be done properly.
The problem is that SQL Server (or any RDBMS) does not and can not guarantee the order of the records returned from a query without an order by clause.
This means that you can't use paging on such queries.
Further more, if you use an order by clause on a column that appears multiple times in your resultset, the order of the result set is still not guaranteed inside groups of values in said column - quick example:
;WITH cte (a, b)
AS
(
SELECT 1, 'a'
UNION ALL
SELECT 1, 'b'
UNION ALL
SELECT 2, 'a'
UNION ALL
SELECT 2, 'b'
)
SELECT *
FROM cte
ORDER BY a
Both result sets are valid, and you can't know in advance what will you get:
a b
-----
1 b
1 a
2 b
2 a
a b
-----
1 a
1 b
2 a
2 b
(and of course, you might get other sorts)
The problem here is that the *Given SQL Statement" is unpredictable. It may or may not contain order by clause.
your inner query(unpredictable sql statement) should not contain order by,even if it contains,order is not guaranteed.
To get guaranteed order,you have to order by some column.for the results to be deterministic,the ordered column/columns should be unique
Please note: what I'm about to suggest is probably horribly inefficient and should really only be used to help you go back to the project leader and tell them that pagination of an unordered query should not be done. Having said that...
From your comments you say you are able to change the SQL statement before it is executed.
You could write the results of the original query to a temporary table, adding row count field to be used for subsequent pagination ordering.
Therefore any original ordering is preserved and you can now paginate.
But of course the reason for needing pagination in the first place is to avoid sending large amounts of data to the client application. Although this does prevent that, you will still be copying data to a temp table which, depending on the row size and count, could be very slow.
You also have the problem that the page size is coming from the client as part of the SQL statement. Parsing the statement to pick that out could be tricky.
As other notified using anyway without using a sorted query will not be safe, But as you know about it and search about it, I can suggest using a query like this (But not recommended as a good way)
;with cte as (
select *,
row_number() over (order by (select 0)) rn
from (
-- Your query
) t
)
select *
from cte
where rn between (#pageNumber-1)*#pageSize+1 and #pageNumber*#pageSize
[SQL Fiddle Demo]
I finally found a simple way to do it without any order by on a specific column:
declare #start AS INTEGER = 1, #count AS INTEGER = 5;
select * from (SELECT *,ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS fakeCounter
FROM (select * from mytable) AS t) AS t2 order by fakeCounter OFFSET #start ROWS
FETCH NEXT #count ROWS ONLY
where select * from mytable can be any query

How to skip the first n rows in sql query

I want to fire a Query "SELECT * FROM TABLE" but select only from row N+1. Any idea on how to do this?
For SQL Server 2012 and above, use this:
SELECT *
FROM Sales.SalesOrderHeader
ORDER BY OrderDate
OFFSET (#Skip) ROWS FETCH NEXT (#Take) ROWS ONLY
https://stackoverflow.com/a/19669165/1883345
SQL Server:
select * from table
except
select top N * from table
Oracle up to 11.2:
select * from table
minus
select * from table where rownum <= N
with TableWithNum as (
select t.*, rownum as Num
from Table t
)
select * from TableWithNum where Num > N
Oracle 12.1 and later (following standard ANSI SQL)
select *
from table
order by some_column
offset x rows
fetch first y rows only
They may meet your needs more or less.
There is no direct way to do what you want by SQL.
However, it is not a design flaw, in my opinion.
SQL is not supposed to be used like this.
In relational databases, a table represents a relation, which is a set by definition. A set contains unordered elements.
Also, don't rely on the physical order of the records. The row order is not guaranteed by the RDBMS.
If the ordering of the records is important, you'd better add a column such as `Num' to the table, and use the following query. This is more natural.
select *
from Table
where Num > N
order by Num
Query: in sql-server
DECLARE #N INT = 5 --Any random number
SELECT * FROM (
SELECT ROW_NUMBER() OVER(ORDER BY ID) AS RoNum
, ID --Add any fields needed here (or replace ID by *)
FROM TABLE_NAME
) AS tbl
WHERE #N < RoNum
ORDER BY tbl.ID
This will give rows of Table, where rownumber is starting from #N + 1.
In order to do this in SQL Server, you must order the query by a column, so you can specify the rows you want.
Example:
select * from table order by [some_column]
offset 10 rows
FETCH NEXT 10 rows only
Do you want something like in LINQ skip 5 and take 10?
SELECT TOP(10) * FROM MY_TABLE
WHERE ID not in (SELECT TOP(5) ID From My_TABLE ORDER BY ID)
ORDER BY ID;
This approach will work in any SQL version. You need to stablish some order (by Id for example) so all rows are provided in a predictable manner.
I know it's quite late now to answer the query. But I have a little different solution than the others which I believe has better performance because no comparisons are performed in the SQL query only sorting is done. You can see its considerable performance improvement basically when value of SKIP is LARGE enough.
Best performance but only for SQL Server 2012 and above. Originally from #Majid Basirati's answer which is worth mentioning again.
DECLARE #Skip INT = 2, #Take INT = 2
SELECT * FROM TABLE_NAME
ORDER BY ID ASC
OFFSET (#Skip) ROWS FETCH NEXT (#Take) ROWS ONLY
Not as Good as the first one but compatible with SQL Server 2005 and above.
DECLARE #Skip INT = 2, #Take INT = 2
SELECT * FROM
(
SELECT TOP (#Take) * FROM
(
SELECT TOP (#Take + #Skip) * FROM TABLE_NAME
ORDER BY ID ASC
) T1
ORDER BY ID DESC
) T2
ORDER BY ID ASC
What about this:
SELECT * FROM table LIMIT 50 OFFSET 1
This works with all DBRM/SQL, it is standard ANSI:
SELECT *
FROM owner.tablename A
WHERE condition
AND n+1 <= (
SELECT COUNT(DISTINCT b.column_order)
FROM owner.tablename B
WHERE condition
AND b.column_order>a.column_order
)
ORDER BY a.column_order DESC
PostgreSQL: OFFSET without LIMIT
This syntax is supported, and it is in my opinion the cleanest API compared to other SQL implementations as it does not introduce any new keywords:
SELECT * FROM mytable ORDER BY mycol ASC OFFSET 1
that should definitely be standardized.
The fact that this is allowed can be seen from: https://www.postgresql.org/docs/13/sql-select.html since LIMIT and OFFSET can be given independently, since OFFSET is not a sub-clause of LIMIT in the syntax specification:
[ LIMIT { count | ALL } ]
[ OFFSET start [ ROW | ROWS ] ]
SQLite: negative limit
OFFSET requires LIMIT in that DBMS, but dummy negative values mean no limit. Not as nice as PostgreSQL, but it works:
SELECT * FROM mytable ORDER BY mycol ASC LIMIT -1 OFFSET 1
Asked at: SQLite with skip (offset) only (not limit)
Documented at: https://sqlite.org/lang_select.html
If the LIMIT expression evaluates to a negative value, then there is no upper bound on the number of rows returned.
MySQL: use a huge limit number
Terrible API design, the documentation actually recommends it:
SELECT * FROM tbl LIMIT 1,18446744073709551615;
Asked at: MySQL skip first 10 results
Node.js Sequelize ORM implements it
That ORM allows e.g. findAll({offset: without limit:, and implements workarounds such as the ones mentioned above for each different DBMS.
In Faircom SQL (which is a pseudo MySQL), i can do this in a super simple SQL Statement, just as follows:
SELECT SKIP 10 * FROM TABLE ORDER BY Id
Obviously you can just replace 10 with any declared variable of your desire.
I don't have access to MS SQL or other platforms, but I'll be really surprised MS SQL doesn't support something like this.
DECLARE #Skip int= 2, #Take int= 2
SELECT * FROM TABLE_NAME
ORDER BY Column_Name
OFFSET (#Skip) ROWS FETCH NEXT (#Take) ROWS ONLY
try below query it's work
SELECT * FROM `my_table` WHERE id != (SELECT id From my_table LIMIT 1)
Hope this will help
You can also use OFFSET to remove the 1st record from your query result like this-
Example - find the second max salary from the employee table
select distinct salary from employee order by salary desc limit 1 OFFSET 1
For SQL Server 2012 and later versions, the best method is #MajidBasirati's answer.
I also loved #CarlosToledo's answer, it's not limited to any SQL Server version but it's missing Order By Clauses. Without them, it may return wrong results.
For SQL Server 2008 and later I would use Common Table Expressions for better performance.
-- This example omits first 10 records and select next 5 records
;WITH MyCTE(Id) as
(
SELECT TOP (10) Id
FROM MY_TABLE
ORDER BY Id
)
SELECT TOP (5) *
FROM MY_TABLE
INNER JOIN MyCTE ON (MyCTE.Id <> MY_TABLE.Id)
ORDER BY Id

Fast approximate counting in Postgres

I'm querying my database (Postgres 8.4) with something like the following:
SELECT COUNT(*) FROM table WHERE indexed_varchar LIKE 'bar%';
The complexity of this is O(N) because Postgres has to count each row. Postgres 9.2 has index-only scans, but upgrading isn't an option, unfortunately.
However, getting an exact count of rows seems like overkill, because I only need to know which of the following three cases is true:
Query returns no rows.
Query returns one row.
Query returns two or more rows.
So I don't need to know that the query returns 10,421 rows, just that it returns more than two.
I know how to handle the first two cases:
SELECT EXISTS (SELECT COUNT(*) FROM table WHERE indexed_varchar LIKE 'bar%');
Which will return true if one or more rows exists and false is none exist.
Any ideas on how to expand this to encompass all three cases in a efficient manner?
SELECT COUNT(*) FROM (
SELECT * FROM table WHERE indexed_varchar LIKE 'bar%' LIMIT 2
) t;
Should be simple. You can use LIMIT to do what what you want and return data (count) using a CASE statement.
SELECT CASE WHEN c = 2 THEN 'more than one' ELSE CAST(c AS TEXT) END
FROM
(
SELECT COUNT(*) AS c
FROM (
SELECT 1 AS c FROM table WHERE indexed_varchar LIKE 'bar%' LIMIT 2
) t
) v

SQL Server 2008 Paged Row Retrieval and Large Tables

I'm using SQL Server 2008 and the following query to implement paged data retrieval from our JSF application, in below code i am retrieving 25 rows at a time sorted by the default sort column in DESC order.
SELECT * FROM
(
SELECT TOP 25 * FROM
(
SELECT TOP 25 ...... WHERE CONDITIONS
--ORDER BY ... DESC
)AS INNERQUERY ORDER BY INNERQUERY.... ASC
)
AS OUTERQUERY
ORDER BY OUTERQUERY.... DESC
It works, but with one obvious flow. If the users request to see the last page and there are over 10 million records in table, then the second TOP Query will have to first retrieve the 10 million records and only then the first top Query will pick out the Top 25 which will look like:
SELECT * FROM
(
SELECT TOP 25 * FROM
(
SELECT TOP 10000000 ...... WHERE CONDITIONS
--ORDER BY ... DESC
)AS INNERQUERY ORDER BY INNERQUERY.... ASC
)
AS OUTERQUERY
ORDER BY OUTERQUERY.... DESC
I looked into replacing the above with ROW_NUMBER OVER(....) but seemingly i had the same issue where the second TOP statement will have to get the entire result and only then you can do a where ROW_NUMBER between x and y.
Can you please point out my mistakes in the above approach and hints on how it can be optimized?
I'm currently using the following to code to retrieve subset of rows:
WITH PAGED_QRY (
SELECT *, ROW_NUMVER() OVER(ORDER BY Y) AS ROW_NO
FROM TABLE WHERE ....
)
SELECT * FROM PAGED_QRY WHERE ROW_NO BETWEEN #CURRENT_INDEX and # ROWS_TO_RETRIEVE
ORDER BY ROW_NO
where #current_index and #rows_to_retrieve (ie. 1 and 50) are your paging variables. it's cleaner and easier to read.
I've also tried using SET ROW_COUNT #ROWS_TO_RETRIEVE but doesn't seem to make much difference.
Using above query and by carefully studying the execution path of the query and modifying/creating indexes and statistics I've reached results that are sufficiently satisfactory, hence why i'm making this as the answer. The original goal of retrieving only the required rows in the inner query seems to be not possible yet, if you do find the way please let me know.
we can improve above query a bit more.
If I assume that #current_index is the current page number then we can rewrite the above query as:
WITH PAGED_QRY (
SELECT top (#current_index * #rows_to_retrieve) *, ROW_NUMVER()
OVER(ORDER BY Y) AS ROW_NO
FROM TABLE WHERE ....
)
SELECT TOP #ROWS_TO_RETRIEVE FROM PAGED_QRY
ORDER BY ROW_NO DESC
In this case, our inner query will not return the whole record set. Suppose our page_index is 3 & page_size is 50, then it will select only 150 rows(even if our table contains hundreds/thousands/millions of rows) & we can skip the where clause also.

Are the results deterministic, if I partition SQL SELECT query without ORDER BY?

I have SQL SELECT query which returns a lot of rows, and I have to split it into several partitions. Ie, set max results to 10000 and iterate the rows calling the query select time with increasing first result (0, 10000, 20000). All the queries are done in same transaction, and data that my queries are fetching is not changing during the process (other data in those tables can change, though).
Is it ok to use just plain select:
select a from b where...
Or do I have to use order by with the select:
select a from b where ... order by c
In order to be sure that I will get all the rows? In other word, is it guaranteed that query without order by will always return the rows in the same order?
Adding order by to the query drops performance of the query dramatically.
I'm using Oracle, if that matters.
EDIT: Unfortunately I cannot take advantage of scrollable cursor.
Order is definitely not guaranteed without an order by clause, but whether or not your results will be deterministic (aside from the order) would depend on the where clause. For example, if you have a unique ID column and your where clause included a different filter range each time you access it, then you would have non-ordered deterministic results, i.e.:
select a from b where ID between 1 and 100
select a from b where ID between 101 and 200
select a from b where ID between 201 and 300
would all return distinct result sets, but order would not be any way guaranteed.
No, without order by it is not guaranteed that query will ALWAYS return the rows in the same order.
No guarantees unless you have an order by on the outermost query.
Bad SQL Server example, but same rules apply. Not guaranteed order even with inner query
SELECT
*
FROM
(
SELECT
*
FROM
Mytable
ORDER BY SomeCol
) foo
Use Limit
So you would do:
SELECT * FROM table ORDER BY id LIMIT 0,100
SELECT * FROM table ORDER BY id LIMIT 101,100
SELECT * FROM table ORDER BY id LIMIT 201,100
The LIMIT would be from which position you want to start and the second variable would be how many results you want to see.
Its a good pagnation trick.