Query using ROWNUM and ORDER BY clause does not use the index - SQL

I am using Oracle (Enterprise Edition 10g) and I have a query like this:
SELECT * FROM (
  SELECT * FROM MyTable
  ORDER BY MyColumn
) WHERE rownum <= 10;
MyColumn is indexed; however, Oracle is for some reason doing a full table scan before it cuts off the first 10 rows. So for a table with 4 million records, the above takes around 15 seconds.
Now consider this equivalent query:
SELECT MyTable.*
FROM
  (SELECT rid
   FROM
     (SELECT rowid AS rid
      FROM MyTable
      ORDER BY MyColumn
     )
   WHERE rownum <= 10
  )
INNER JOIN MyTable
  ON MyTable.rowid = rid
ORDER BY MyColumn;
Here Oracle scans the index and finds the top 10 rowids, and then uses nested loops to fetch the 10 records by rowid. This takes less than a second on the same 4-million-row table.
My first question is: why is the optimizer making such an apparently bad decision for the first query above?
And my second and most important question is: is it possible to make the first query perform better? I have a specific need to keep the first query as unmodified as possible, and I am looking for something simpler than my second query above. Thank you!
Please note that for particular reasons I am unable to use the /*+ FIRST_ROWS(n) */ hint, or the ROW_NUMBER() OVER (ORDER BY column) construct.
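For reference, the two excluded constructs would look roughly like this (a sketch using the same table and column names, shown only to make clear what is being ruled out):

-- Excluded: FIRST_ROWS hint
SELECT * FROM (
  SELECT /*+ FIRST_ROWS(10) */ * FROM MyTable
  ORDER BY MyColumn
) WHERE rownum <= 10;

-- Excluded: analytic row numbering
SELECT * FROM (
  SELECT t.*, ROW_NUMBER() OVER (ORDER BY MyColumn) rn
  FROM MyTable t
) WHERE rn <= 10;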

If this is acceptable in your case, adding a WHERE ... IS NOT NULL clause will help the optimizer to use the index instead of doing a full table scan when using an ORDER BY clause:
SELECT * FROM (
  SELECT * FROM MyTable
  WHERE MyColumn IS NOT NULL
  -- ^^^^^^^^^^^^^^^^^^^^^^
  ORDER BY MyColumn
) WHERE rownum <= 10;
The rationale is that Oracle does not store NULL values in the index. As your query was originally written, the optimizer decided to do a full table scan because, if there were fewer than 10 non-NULL values, it would have to retrieve some "NULL rows" to fill in the remaining rows. Apparently it is not smart enough to first check whether the index contains enough rows...
With the added WHERE MyColumn IS NOT NULL, you inform the optimizer that you don't want, under any circumstances, a row having NULL in MyColumn. So it can blindly use the index without worrying about hypothetical rows having NULL in MyColumn.
For the same reason, declaring the ORDER BY column as NOT NULL should prevent the optimizer from doing a full table scan. So, if you can change the schema, a cleaner option would be:
ALTER TABLE MyTable MODIFY (MyColumn NOT NULL);
See http://sqlfiddle.com/#!4/e3616/1 for various comparisons (click on view execution plan)

Related

SQL tuning, long running query + rownum

I have a million records in a database table with account number, address, and many more columns. I want 100 rows sorted in descending order. I used ROWNUM for this, but the query is taking a long time to execute, since it scans the full table first, sorts it, and only then applies the ROWNUM filter.
What is the solution to minimize the query execution time?
For example:
select *
from (select acc_no, address
      from customer
      order by acc_no desc)
where ROWNUM <= 100;
From past experience I have found that TOP works best for this scenario. (Note that TOP is SQL Server syntax and is not valid in Oracle, where ROWNUM or 12c's FETCH FIRST is the equivalent.)
Also, you should always select only the columns you need and avoid using the wildcard (*):
SELECT TOP 100 [acc_no], [address] FROM [customer] ORDER BY [acc_no] DESC
Useful resources about TOP, LIMIT and even ROWNUM.
https://www.w3schools.com/sql/sql_top.asp
Make sure there is an index on the acc_no column.
If you already have an index on acc_no, check whether it is actually being used during query execution by verifying the query execution plan.
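For example, the plan can be displayed with EXPLAIN PLAN and DBMS_XPLAN (standard Oracle tooling; the query is the one from the question):

EXPLAIN PLAN FOR
SELECT *
FROM (SELECT acc_no, address
      FROM customer
      ORDER BY acc_no DESC)
WHERE ROWNUM <= 100;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);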
To create a new index if one is not present, use the queries below:
Create index idx1 on customer(acc_no);        -- if acc_no is not unique
Create unique index idx1 on customer(acc_no); -- if acc_no is unique (note: a unique index is faster)
If the explain plan output shows "FULL TABLE SCAN", the optimizer is not using the index.
Try a hint first (note that the INDEX hint must name the table and must appear in the query block that references it):
select *
from (select /*+ index(customer idx1) */ acc_no, address
      from customer
      order by acc_no desc)
where ROWNUM <= 100;
If the query with the hint above returns results quickly, then you need to check why the optimizer is deliberately ignoring your index. One probable reason for this is outdated statistics. Refresh the statistics.
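A sketch of refreshing them with DBMS_STATS (assuming the table is in your own schema):

BEGIN
  -- cascade => TRUE also gathers statistics for the table's indexes
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER,
                                tabname => 'CUSTOMER',
                                cascade => TRUE);
END;
/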
Hope this helps.
Consider getting your top account numbers in an inner query / in-line view so that you only perform the joins on those 100 customer records. Otherwise, you could be performing all the joins on the million+ rows, then sorting the million+ results to get the top 100. Something like this may work:
select .....
from customer
where customer.acc_no in (select acc_no
                          from (select inner_cust.acc_no
                                from customer inner_cust
                                order by inner_cust.acc_no desc)
                          where rownum <= 100)
  and ...
Or, if you are using 12c, you can use FETCH FIRST 100 ROWS ONLY:
select .....
from customer
where customer.acc_no in (select inner_cust.acc_no
                          from customer inner_cust
                          order by inner_cust.acc_no desc
                          fetch first 100 rows only)
  and ...
This will give the result within ~100 ms, but MAKE SURE that there is an index on column ACC_NO. There can also be a combined index on ACC_NO plus other columns, but ACC_NO MUST be in the first position in the index. You want to see "range scan" in the execution plan, not "full table scan" and not "skip scan". You will probably also see nested loops in the execution plan (those fetch the ADDRESSes from the table). You can improve speed even more by creating a combined index on ACC_NO, ADDRESS (in this order); in that case the Oracle engine does not have to read the table at all, because all the information is contained in the index. You can compare the plans to confirm this.
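A sketch of that covering index (the index name is made up):

-- Covers both the ORDER BY key and the selected columns, so Oracle
-- can answer the query from the index alone, without table access.
CREATE INDEX idx_customer_accno_addr ON customer (acc_no, address);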
select top 100 acc_no, address
from customer
order by acc_no desc

Query with rownum got slow

I have a table with 21 million records, of which 20 million meet the criterion col1 = 'text'. I then started iteratively setting the value of col2 to a non-NULL value. After I had mutated 10 million records, the following query, which was fast in the beginning, got slow:
SELECT T_PK
FROM (SELECT T_PK
      FROM table
      WHERE col1 = 'text' AND col2 IS NULL
      ORDER BY T_PK DESC)
WHERE ROWNUM < 100;
I noticed that the query becomes fast again (fast meaning a couple of seconds, < 10 s) as soon as I remove any of the following: the DESC keyword, the whole ORDER BY T_PK DESC clause, or the whole outer query with the WHERE ROWNUM < 100 condition.
The execution plan (screenshot omitted here) shows an INDEX FULL SCAN DESCENDING performed on the PK index of the table. Besides the index on the PK, I have an index defined on col2.
What could be the reason that the query was fast at first and then got slow? How can I make the query fast regardless of how many records have already been set to a non-NULL value?
For this query:
SELECT T_PK
FROM (SELECT T_PK
      FROM table
      WHERE col1 = 'text' AND col2 IS NULL
      ORDER BY T_PK DESC
     ) t
WHERE ROWNUM < 100;
The optimal index is table(col1, col2, t_pk).
I think the problem is that the optimizer has a choice of two indexes -- either for the where clause (col1 and -- probably -- col2) or one on t_pk. If you have a single index that handles both clauses, then performance should improve.
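A sketch of creating that single composite index ("table" is the question's placeholder table name, and the index name is made up):

-- col1 and col2 support the WHERE clause; t_pk supports the ORDER BY,
-- so the top rows can be read off the index in sorted order.
CREATE INDEX ix_col1_col2_tpk ON table (col1, col2, t_pk);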
One reason that the DESC might make a difference is where the matching rows lie. If all the matching rows are in the first 100,000 rows of the table, then when you order descending, the query might have to throw out 20.9 million rows before finding a match.
I think Burleson explained this quite nicely:
http://www.dba-oracle.com/t_sql_tuning_rownum_equals_one.htm
Beware!
This use of rownum < can cause performance problems. Using rownum may change the all_rows optimizer mode for a query to first_rows, causing unexpected sub-optimal execution plans. One solution is to always include an all_rows hint when using rownum to perform a top-n query.
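Applied to the query from this question, the suggested hint would look like this (a sketch; whether it actually helps has to be verified in the execution plan):

SELECT /*+ ALL_ROWS */ T_PK
FROM (SELECT T_PK
      FROM table
      WHERE col1 = 'text' AND col2 IS NULL
      ORDER BY T_PK DESC)
WHERE ROWNUM < 100;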

Which of these queries is faster, when used with SQL Server?

I want to check whether a table contains a row or not. Which is faster?
IF EXISTS(SELECT * FROM TABLE)
or
IF EXISTS(SELECT TOP 1 * FROM TABLE)
There is no difference between the queries!
The columns in the select don't get evaluated.
If you recall logical query processing, the FROM clause is executed first; the SELECT clause is executed in the last step (actually ORDER BY is, but that is a cosmetic detail).
So when the FROM clause gets executed, rows are returned regardless of the column names.
You only have to add column names because otherwise you would get syntax errors.
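A classic way to convince yourself of this in SQL Server (mytable is a placeholder name): the select list inside EXISTS is never evaluated, so even a division by zero there does not raise an error:

-- If the select list were evaluated, this would fail with a
-- divide-by-zero error; it doesn't, because EXISTS only checks
-- whether any row exists.
IF EXISTS (SELECT 1/0 FROM mytable)
    PRINT 'mytable has at least one row';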
IF EXISTS(SELECT 1 FROM TABLE)
is faster.
There are some more suggestions, for example:
IF EXISTS(SELECT null FROM TABLE)
Obviously SELECT TOP 1 * FROM TABLE is faster, since the index scan is reduced to one row, the number of rows returned is one, and the estimated operation cost is also much less. But if there is only one row in the table, both queries will show the same operation cost.
SELECT *
FROM (SELECT TOP 1 *
      FROM ms_data) temp;

SELECT TOP 1 *
FROM ms_data;

Both of the above queries have the same operation cost.
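If you want to verify this yourself, here is a sketch using SQL Server's statistics output (ms_data is the table from the answer above):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

IF EXISTS (SELECT * FROM ms_data) PRINT 'has rows';
IF EXISTS (SELECT TOP 1 * FROM ms_data) PRINT 'has rows';

-- Compare the logical reads and elapsed times reported in the
-- Messages pane for the two statements.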

Limit number of rows from join, in Oracle

I apologize in advance for my long-winded question and if the formatting isn't up to par (newbie). Here goes.
I have a table MY_TABLE with the following schema -
MY_ID | TYPE | REC_COUNT
    1 | A    |         1
    1 | B    |         3
    2 | A    |         0
    2 | B    |         0
....
The first column corresponds to an ID, the second is some type, and the third is some count. NOTE that the MY_ID column is not the primary key; there could be many records having the same MY_ID.
I want to write a stored procedure which will take an array of IDs and return the subset of them that match the following criteria -
the ID should match the MY_ID field of at least one record in the table, and at least one matching record must satisfy TYPE != 'A' OR REC_COUNT != 0.
This is the procedure I came up with -
PROCEDURE get_id_subset(
  iIds         IN ID_ARRAY,
  oMatchingIds OUT NOCOPY ID_ARRAY
)
IS
BEGIN
  SELECT t.column_value
  BULK COLLECT INTO oMatchingIds
  FROM TABLE(CAST(iIds AS ID_ARRAY)) t
  WHERE EXISTS (
    SELECT /*+ NL_SJ */ 1
    FROM MY_TABLE m
    WHERE (m.my_id = t.column_value)
      AND (m.type != 'A' OR m.rec_count != 0)
  );
END get_id_subset;
But I really care about performance, and some IDs could match thousands of records in the table. There is an index on the MY_ID and TYPE columns but no index on the REC_COUNT column. So I was thinking: if there are more than 1000 rows with a matching MY_ID field, then I'll just return the ID without applying the TYPE and REC_COUNT predicates. Here's that version -
PROCEDURE get_id_subset(
  iIds         IN ID_ARRAY,
  oMatchingIds OUT NOCOPY ID_ARRAY
)
IS
BEGIN
  SELECT t.column_value
  BULK COLLECT INTO oMatchingIds
  FROM TABLE(CAST(iIds AS ID_ARRAY)) t, MY_TABLE m
  WHERE (m.my_id = t.column_value)
    AND ( ((SELECT COUNT(m.my_id) FROM m WHERE 1) >= 1000)
          OR EXISTS (m.type != 'F' OR m.rec_count != 0) );
END get_id_subset;
But this doesn't compile, I get the following error on the inner select -
PL/SQL: ORA-00936: missing expression
Is there another way of writing this? The inner select needs to work on the joined table.
And to clarify, I'm OK with the result set being different for this query. My assumption is that, since there is an index on the my_id column, doing COUNT(*) would be much cheaper than actually applying the rec_count predicate to tens of thousands of rows, since there is no index on that column. Am I wrong?
I don't see your second query as being much if any improvement over the first. At best, the first subquery has to hit 1000 matching records in order to determine if the count is less than 1000, so I don't think it will save lots of work. Also it changes the actual result, and it's not clear from your description if you're saying that's OK as long as it's more efficient. (And if it is OK, then the business logic is very unclear -- why do the other conditions matter at all, if they don't matter when there's lots of records?)
You ask, "will the group by be applied before or after the predicate". I'm not clear what part of the query you're talking about, but logically speaking the order is always
Where predicates
Group By
Having predicates
The optimizer can change the order in which things are actually evaluated, but the result must always be logically equivalent to the above order of evaluation (barring optimizer bugs).
1000s of records is really not that much. Have you actually encountered a case where performance of the first query is unacceptable?
For either query, it may be better to rewrite the correlated EXISTS subquery as a non-correlated IN subquery. You need to test this.
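For example, the EXISTS subquery from the first procedure could be rewritten as a non-correlated IN like this (a sketch; whether it is actually faster has to be tested):

SELECT t.column_value
BULK COLLECT INTO oMatchingIds
FROM TABLE(CAST(iIds AS ID_ARRAY)) t
WHERE t.column_value IN (SELECT m.my_id
                         FROM MY_TABLE m
                         WHERE m.type != 'A' OR m.rec_count != 0);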
You need to show actual execution plans to get more useful feedback.
Edit
For the kind of short-circuiting you're talking about, I think you need to rewrite your subquery (from the initial version of the query) like this (sorry, my first attempt at this wouldn't work because I tried to access a column from the top-level table in a sub-sub-query):
WHERE EXISTS (
  SELECT /*+ NL_SJ */ 1
  FROM MY_TABLE m
  WHERE (m.my_id = t.column_value)
    AND rownum <= 1000
  HAVING MAX(CASE WHEN m.type != 'A' OR m.rec_count != 0 THEN 1 ELSE NULL END) IS NOT NULL
      OR MAX(rownum) >= 1000
)
That should force it to hit no more than 1,000 records per id, then return a row if either at least one row matches the conditions on type and rec_count, or the 1,000-record limit was reached. If you view the execution plan, you should expect to see a COUNT STOPKEY operation, which shows that Oracle is going to stop running a query block after a certain number of rows are returned.

Please help me understand why a sub-query affects the main query's use of index

Here is the main query without a sub-query:
SELECT * FROM
mytable AS idx
WHERE
idx.ID IN (1,2,3)
AND idx.P1 = 'galleries';
The index on this table is id_path (ID,P1)
Everything is fine at this point: the index is used, 3 rows are examined, and 2 are returned. Without the index, 9 rows would have to be examined.
Now if I replace the list of IDs with a sub-query that returns exactly the same set of IDs, the main query still returns the correct rows, but it stops using the index and examines 9 rows as if the index never existed:
SELECT *
FROM mytable AS idx
WHERE idx.ID IN (SELECT idxrev.ID
                 FROM mytable AS idxrev
                 WHERE idxrev.ID IN (1,2,3))
  AND idx.P1 = 'galleries';
My question is: why does this happen, and what can I do to make the main query use the index as before? I tried adding USE INDEX (id_path), but that just made it even worse, doing a whole table scan.
SELECT *
FROM mytable AS idx
WHERE idx.ID IN
(
SELECT idxrev.ID
FROM mytable AS idxrev
WHERE idxrev.ID IN (1,2,3)
)
AND idx.P1 = 'galleries'
MySQL's only way to execute semi-joins is nested loops.
It needs to take every row of idx and check it against idxrev (using the indexes for that).
Of course, a better method in this case would be a hash semi-join, or simply reducing your query to the original one, but MySQL is just not capable of that.
To make the query use the index, just revert to your original query :)
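If you don't want to hardcode the list in the outer query, another common MySQL workaround is to turn the subquery into a derived table and join to it, since the derived table is materialized once instead of being re-checked per row (a sketch):

SELECT idx.*
FROM mytable AS idx
INNER JOIN (SELECT idxrev.ID
            FROM mytable AS idxrev
            WHERE idxrev.ID IN (1,2,3)) AS sub
        ON idx.ID = sub.ID
WHERE idx.P1 = 'galleries';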
That's one of the great mysteries of MySQL; it doesn't cope well with subqueries. You could try changing the IN to an EXISTS, which is sometimes faster. It looks a bit silly in this example because you still use the hardcoded list, but I think that's just for testing, right?
SELECT *
FROM mytable AS idx
WHERE EXISTS
      (SELECT idxrev.ID
       FROM mytable AS idxrev
       WHERE idxrev.ID = idx.ID
         AND idxrev.ID IN (1,2,3))
  AND idx.P1 = 'galleries';
If this doesn't help, maybe you could run two queries. First you get all the IDs and put them in a comma-separated list (using GROUP_CONCAT if you like). Then you build the second query using that value.
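A sketch of that two-step approach (GROUP_CONCAT is standard MySQL):

-- Step 1: collect the IDs as a comma-separated list
SELECT GROUP_CONCAT(idxrev.ID) AS id_list
FROM mytable AS idxrev
WHERE idxrev.ID IN (1,2,3);

-- Step 2: substitute the returned list (for example '1,2,3')
-- into the main query
SELECT *
FROM mytable AS idx
WHERE idx.ID IN (1,2,3)
  AND idx.P1 = 'galleries';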