How can I speed up row_number in Oracle? - sql

I have a SQL query that looks something like this:
SELECT * FROM(
SELECT
...,
row_number() OVER(ORDER BY ID) rn
FROM
...
) WHERE rn between :start and :end
Essentially, it's the ORDER BY part that's slowing things down. If I were to remove it, the EXPLAIN cost goes down by an order of magnitude (over 1000x). I've tried this:
SELECT
...
FROM
...
WHERE
rownum between :start and :end
But this doesn't give correct results. Is there any easy way to speed this up? Or will I have to spend some more time with the EXPLAIN tool?

ROW_NUMBER is quite inefficient in Oracle.
See the article in my blog for performance details:
Oracle: ROW_NUMBER vs ROWNUM
For your specific query, I'd recommend you to replace it with ROWNUM and make sure that the index is used:
SELECT *
FROM (
SELECT /*+ INDEX_ASC(t index_on_column) NOPARALLEL_INDEX(t index_on_column) */
t.*, ROWNUM AS rn
FROM table t
ORDER BY
column
)
WHERE rn >= :start
AND rownum <= :end - :start + 1
This query will use COUNT STOPKEY
Also either make sure you column is not nullable, or add WHERE column IS NOT NULL condition.
Otherwise the index cannot be used to retrieve all values.
Note that you cannot use ROWNUM BETWEEN :start and :end without a subquery.
ROWNUM is always assigned last and checked last, that's way ROWNUM's always come in order without gaps.
If you use ROWNUM BETWEEN 10 and 20, the first row that satisifies all other conditions will become a candidate for returning, temporarily assigned with ROWNUM = 1 and fail the test of ROWNUM BETWEEN 10 AND 20.
Then the next row will be a candidate, assigned with ROWNUM = 1 and fail, etc., so, finally, no rows will be returned at all.
This should be worked around by putting ROWNUM's into the subquery.

Looks like a pagination query to me.
From this ASKTOM article (about 90% down the page):
You need to order by something unique for these pagination queries, so that ROW_NUMBER is assigned deterministically to the rows each and every time.
Also your queries are no where near the same so I'm not sure what the benefit of comparing the costs of one to the other is.

Is your ORDER BY column indexed? If not that's a good place to start.

Part of the problem is how big is the 'start' to 'end' span and where they 'live'.
Say you have a million rows in the table, and you want rows 567,890 to 567,900 then you are going to have to live with the fact that it is going to need to go through the entire table, sort pretty much all of that by id, and work out what rows fall into that range.
In short, that's a lot of work, which is why the optimizer gives it a high cost.
It is also not something an index can help with much. An index would give the order, but at best, that gives you somewhere to start and then you keep reading on until you get to the 567,900th entry.
If you are showing your end user 10 items at a time, it may be worth actually grabbing the top 100 from the DB, then having the app break that 100 into ten chunks.

Spend more time with the EXPLAIN PLAN tool. If you see a TABLE SCAN you need to change your query.
Your query makes little sense to me. Querying over a ROWID seems like asking for trouble. There's no relational info in that query. Is it the real query that you're having trouble with or an example that you made up to illustrate your problem?

Related

Very slow performance when Count(*) on subquery with

i need to know the total rows returned by a query to fill pagination text in a web page.
Im doing pagination on SQL side to improve performance.
Using the query below, i get 6560 records in 15 seconds, wich is slow for my needs:
1.
SELECT COUNT(*)
FROM dbo.vw_Lista_Pedidos_Backoffice_ix vlpo WITH (NOLOCK)
WHERE dataCriacaoPedido>=DATEADD(month,-6,getdate())
Using this query, i get the same result in 1 second:
2.
SELECT COUNT(*) FROM
(SELECT *, ROW_NUMBER() over (order by pedidoid desc) as RowNumber
FROM dbo.vw_Lista_Pedidos_Backoffice_ix vlpo WITH (NOLOCK)
WHERE
dataCriacaoPedido>=DATEADD(month,-6,getdate())
) records
WHERE RowNumber BETWEEN 1 AND 6560
If i change the above query (2.) and set the upper limit of RowNumber to a number greater than 6560 (the result of count(*)), the query takes again 15 seconds to run!
So, my questions are:
- why is the query 2. takes so less time, even that the limit on RowNumber actualy dont limit any of the rows in the subquery?
- is there any way i can use the query 2. on my advantage to get the total rows?
Ty all :)
This isn't going to fully answer your question, because the real answer lies in the view definition and optimizing that. This is intended to answer questions about behavior.
The reason why COUNT(*) is slower is because it has to generate all the rows in the view, and then count them. The counting isn't the issue. The generation is.
The reason why ROW_NUMBER() over (order by pedidoid desc) is fast is because an index exists on pedidoid. SQL Server uses the index for ROW_NUMBER(). And, just as important, it can access the data in the view using the same index. So, that speeds the query.
The reason why there is a magic number at 6,561. Well, that I don't know. That has to do with the vagaries of the SQL Server optimizer and your configuration. One possibility has to do with the WHERE clause:
WHERE dataCriacaoPedido >= DATEADD(month, -6, getdate())
My guess is that there are 6,560 matches to the condition. But, SQL Server has to scan the whole table. It scans the table, finds the matching values. However, the engine does not know that it is done, so it keeps searching for rows. As I say, though, this is speculation that explains the behavior.
The really fix the query, you need to understand how the view works.

SQL Select, different than the last 10 records

I have a table called "dutyroster". I want to make a random selection from this table's "names" column, but, I want the selection be different than the last 10 records so that the same guy is not given a second duty in 10 days. Is that possible ?
Create a temporary table with only one column called oldnames which will have no records initially. For each select, execute a query like
select names from dutyroster where dutyroster.names not in (select oldnamesfrom temporarytable) limit 10
and when execution is done add the resultset to the temporary table
The other answer already here is addressing the portion of the question on how to avoid duplicating selections.
To accomplish the random part of the selection, leverage newid() directly within your select statement. I've made this sqlfiddle as an example.
SELECT TOP 10
newid() AS [RandomSortColumn],
*
FROM
dutyroster
ORDER BY
[RandomSortColumn] ASC
Keep executing the query, and you'll keep getting different results. Use the technique in the other answer for avoiding doubling a guy up.
The basic idea is to use a subquery to get all but users from the last ten days, then sort the rest randomly:
select dr.*
from dutyroster dr
where dr.name not in (select dr2.name
from dutyroster dr2
where dr2.datetimecol >= date_sub(curdate(), interval 10 day)
)
order by rand()
limit 1;
Different databases may have different syntax for limit, rand(), and for the date/time functions. The above gives the structure of the query, but the functions may differ.
If you have a large amount of data and performance is a concern, there are other (more complicated) ways to take a random sample.
you could use TOP function for SQL Server
and for MYSQL you could use LIMIT function
Maybe this would help...
SELECT TOP number|percent column_name(s)
FROM table_name;
Source: http://www.w3schools.com/sql/sql_top.asp

Using limit in sqlite SQL statement in combination with order by clause

Will the following two SQL statements always produce the same result set?
1. SELECT * FROM MyTable where Status='0' order by StartTime asc limit 10
2. SELECT * FROM (SELECT * FROM MyTable where Status='0' order by StartTime asc) limit 10
Yes, but ordering subqueries is probably a bad habit to get into. You could feasibly add a further ORDER BY outside the subquery in your second example, e.g.
SELECT *
FROM (SELECT *
FROM Test
ORDER BY ID ASC
) AS A
ORDER BY ID DESC
LIMIT 10;
SQLite still performs the ORDER BY on the inner query, before sorting them again in the outer query. A needless waste of resources.
I've done an SQL Fiddle to demonstrate so you can view the execution plans for each.
No. First because the StartTime column may not have UNIQUE constraint. So, even the first query may not always produce the same result - with itself!
Second, even if there are never two rows with same StartTime, the answer is still negative.
The first statement will always order on StartTime and produce the first 10 rows. The second query may produce the same result set but only with a primitive optimizer that doesn't understand that the ORDER BY in the subquery is redundant. And only if the execution plan includes this ordering phase.
The SQLite query optimizer may (at the moment) not be very bright and do just that (no idea really, we'll have to check the source code of SQLite*). So, it may appear that the two queries are producing identical results all the time. Still, it's not a good idea to count on it. You never know what changes will be made in a future version of SQLite.
I think it's not good practice to use LIMIT without ORDER BY, in any DBMS. It may work now, but you never know how long these queries will be used by the application. And you may not be around when SQLite is upgraded or the DBMS is changed.
(*) #Gareth's link provides the execution plan which suggests that current SQLite code is dumb enough to execute the redundant ordering.

processing large table - how do i select the records page by page?

I need to do a process on all the records in a table. The table could be very big so I rather process the records page by page. I need to remember the records that have already been processed so there are not included in my second SELECT result.
Like this:
For first run,
[SELECT 100 records FROM MyTable]
For second run,
[SELECT another 100 records FROM MyTable]
and so on..
I hope you get the picture. My question is how do I write such select statement?
I'm using oracle btw, but would be nice if I can run on any other db too.
I also don't want to use store procedure.
Thank you very much!
Any solution you come up with to break the table into smaller chunks, will end up taking more time than just processing everything in one go. Unless the table is partitioned and you can process exactly one partition at a time.
If a full table scan takes 1 minute, it will take you 10 minutes to break up the table into 10 pieces. If the table rows are physically ordered by the values of an indexed column that you can use, this will change a bit due to clustering factor. But it will anyway take longer than just processing it in one go.
This all depends on how long it takes to process one row from the table of course. You could chose to reduce the load on the server by processing chunks of data, but from a performance perspective, you cannot beat a full table scan.
You are most likely going to want to take advantage of Oracle's stopkey optimization, so you don't end up with a full tablescan when you don't want one. There are a couple ways to do this. The first way is a little longer to write, but let's Oracle automatically figure out the number of rows involved:
select *
from
(
select rownum rn, v1.*
from (
select *
from table t
where filter_columns = 'where clause'
order by columns_to_order_by
) v1
where rownum <= 200
)
where rn >= 101;
You could also achieve the same thing with the FIRST_ROWS hint:
select /*+ FIRST_ROWS(200) */ *
from (
select rownum rn, t.*
from table t
where filter_columns = 'where clause'
order by columns_to_order_by
) v1
where rn between 101 and 200;
I much prefer the rownum method, so you don't have to keep changing the value in the hint (which would need to represent the end value and not the number of rows actually returned to the page to be accurate). You can set up the start and end values as bind variables that way, so you avoid hard parsing.
For more details, you can check out this post

SQLite3 (or general SQL) retrieve nth row of a query result

Quicky question on SQLite3 (may as well be general SQLite)
How can one retrieve the n-th row of a query result?
row_id (or whichever index) won't work on my case, given that the tables contain a column with a number. Based on some data, the query needs the data unsorted or sorted by asc/desc criteria.
But I may need to quickly retrieve, say, rows 2 & 5 of the results.
So other than implementing a sqlite3_step()==SQLITE_ROW with a counter, right now I have no idea on how to proceed with this.
And I don't like this solution very much because of performance issues.
So, if anyone can drop a hint that'd be highly appreciated.
Regards
david
add LIMIT 1 and OFFSET <n> to the query
example SELECT * FROM users LIMIT 1 OFFSET 5132;
The general approach is that, if you want only the nth row of m rows, use an appropriate where condition to only get that row.
If you need to get to a row and can't because no where criteria can get you there, your database has a serious design issue. It fails the first normal form, which states that "There's no top-to-bottom ordering to the rows."
But I may need to quickly retrieve, say, rows 2 & 5 of the results.
In scenario when you need non-continuous rows you could use ROW_NUMBER():
WITH cte AS (
SELECT *, ROW_NUMBER() OVER() AS rn --OVER(ORDER BY ...) --if specific order is required
FROM t
)
SELECT c
FROM cte
WHERE rn IN (2,5); -- row nums
db<>fiddle demo