SQL tuning: long-running query with ROWNUM

I have a million records in a database table with account number, address, and many other columns. I want the top 100 rows sorted in descending order. I used ROWNUM for this, but the query takes a long time to execute, since it first scans the full table, sorts it, and only then applies the ROWNUM filter.
What is the solution to minimize the query execution time?
For example:
select *
from (select acc_no, address
      from customer
      order by acc_no desc)
where rownum <= 100;

From past experience I found that TOP works best for this scenario. (Note: TOP is SQL Server syntax; Oracle does not support it, though Oracle 12c offers FETCH FIRST n ROWS ONLY.)
Also, you should always select only the columns you need and avoid using the wildcard (*).
SELECT TOP 100 [acc_no], [address] FROM [customer] ORDER BY [acc_no] DESC
Useful resources about TOP, LIMIT and even ROWNUM.
https://www.w3schools.com/sql/sql_top.asp
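For engines that use LIMIT instead of TOP or ROWNUM, the top-N shape is the same. A minimal sketch in SQLite (not Oracle; table and column names taken from the question, data made up) showing the equivalent query returning the highest account numbers first:

```python
import sqlite3

# Stand-in for the customer table; the ROWNUM <= 100 wrapper from the
# question corresponds to LIMIT here (3 rows instead of 100 for brevity).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (acc_no INTEGER, address TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(i, f"addr-{i}") for i in range(1, 1001)])

rows = conn.execute(
    "SELECT acc_no, address FROM customer ORDER BY acc_no DESC LIMIT 3"
).fetchall()
print(rows)  # highest acc_no values first
```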

Make sure there is an index on the acc_no column.
If you already have an index on acc_no, check the query execution plan to verify that it is actually being used.
To create a new index if one is not present, use one of the queries below:
create index idx1 on customer(acc_no);        -- if acc_no is not unique
create unique index idx1 on customer(acc_no); -- if acc_no is unique (a unique index is faster)
If the explain plan output shows "TABLE ACCESS FULL", the optimizer is not using the index.
Try a hint first (note that the index hint must name the table and belongs in the query block that references it):
select *
from (select /*+ index(customer idx1) */
             acc_no, address
      from customer
      order by acc_no desc)
where rownum <= 100;
If the query with the hint returns results quickly, you need to find out why the optimizer is deliberately ignoring your index. One probable reason is outdated statistics; refresh them (for example with DBMS_STATS.GATHER_TABLE_STATS).
Hope this helps.
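The before/after plan check described above can be reproduced on any engine. A sketch using SQLite's EXPLAIN QUERY PLAN as a stand-in for Oracle's explain plan (table and index names follow the answer; without the index you see an explicit sort step, with it you don't):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (acc_no INTEGER, address TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(i, f"addr-{i}") for i in range(1, 1001)])

query = "SELECT acc_no, address FROM customer ORDER BY acc_no DESC LIMIT 100"

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the human-readable detail in column 3.
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)   # full scan plus "USE TEMP B-TREE FOR ORDER BY"
conn.execute("CREATE INDEX idx1 ON customer(acc_no)")
after = plan(query)    # index scan, no separate sort step

print(before)
print(after)
```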

Consider getting your top account numbers in an inner query / inline view so that you only perform the joins on those 100 customer records. Otherwise you could be performing all the joins on the million-plus rows, then sorting the million-plus results to get the top 100. Something like this may work:
select .....
from customer
where customer.acc_no in (select acc_no
                          from (select inner_cust.acc_no
                                from customer inner_cust
                                order by inner_cust.acc_no desc)
                          where rownum <= 100)
and ...
Or, if you are using 12c, you can use FETCH FIRST 100 ROWS ONLY:
select .....
from customer
where customer.acc_no in (select inner_cust.acc_no
                          from customer inner_cust
                          order by inner_cust.acc_no desc
                          fetch first 100 rows only)
and ...
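Either variant prunes the join input down to the 100 keys before any further work happens. A minimal sketch of the keys-first pattern in SQLite (LIMIT standing in for ROWNUM / FETCH FIRST; made-up data, 3 keys instead of 100):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (acc_no INTEGER PRIMARY KEY, address TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(i, f"addr-{i}") for i in range(1, 501)])

# Inner query picks the top keys; the outer query only touches those rows.
rows = conn.execute("""
    SELECT c.acc_no, c.address
    FROM customer c
    WHERE c.acc_no IN (SELECT acc_no FROM customer
                       ORDER BY acc_no DESC LIMIT 3)
    ORDER BY c.acc_no DESC
""").fetchall()
print([r[0] for r in rows])  # top 3 account numbers
```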

This will give the result within ~100 ms, but MAKE SURE that there is an index on column ACC_NO. It can also be a combined index on ACC_NO plus other columns, but ACC_NO MUST be in the first position of the index. You have to see "INDEX RANGE SCAN" in the execution plan -- not "TABLE ACCESS FULL" and not "INDEX SKIP SCAN". You will probably also see nested loops in the execution plan (these fetch the ADDRESS values from the table). You can improve speed even more by creating a combined index on (ACC_NO, ADDRESS), in that order; in that case the Oracle engine does not have to read the table at all, because all the information is contained in the index. You can compare the two variants in the execution plan.
select acc_no, address
from (select acc_no, address
      from customer
      order by acc_no desc)
where rownum <= 100;
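The combined-index point is visible in plans on other engines too. A sketch in SQLite, where "COVERING INDEX" in the plan is the analogue of Oracle answering the query from the index alone, without touching the table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (acc_no INTEGER, address TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(i, f"addr-{i}") for i in range(1, 201)])

# Index contains every column the query needs, in (acc_no, address) order.
conn.execute("CREATE INDEX idx_cover ON customer(acc_no, address)")

plan = " ".join(r[3] for r in conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT acc_no, address FROM customer ORDER BY acc_no DESC LIMIT 100"))
print(plan)  # mentions a covering index: the table itself is never read
```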

Related

Informix - Efficiently find 10 most recent calls

I have a table that has data like calling number and timestamp. I'd like to find the ten most recent unique calls. This SQL query works:
SELECT first 10 t.originatordn
FROM (SELECT DISTINCT a.originatordn, a.startdatetime AS time
      FROM contactcalldetail a
      WHERE a.originatordn <> '') t
ORDER BY t.time DESC
The problem is this table has over 4 million records so it is very slow. Is there a better way to do this query?
Without an index I don't see how to make it fast. Therefore, I suggest creating an index if you don't have one already and running a simpler query that the optimizer can execute with an index-only scan:
create index ix1 on contactcalldetail (startdatetime, originatordn)

select first 10 distinct originatordn as calling_number
from contactcalldetail
where originatordn <> ''
order by startdatetime desc
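One wrinkle: DISTINCT on the caller alone, ordered by a timestamp column, is ambiguous when a caller appears several times. An alternative formulation that is unambiguous is to group per caller and order by each caller's latest call. A sketch in SQLite (table and column names from the question, tiny made-up data set):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contactcalldetail (originatordn TEXT, startdatetime TEXT)")
conn.executemany("INSERT INTO contactcalldetail VALUES (?, ?)", [
    ("1001", "2024-01-01 09:00"), ("1002", "2024-01-01 10:00"),
    ("1001", "2024-01-01 11:00"), ("", "2024-01-01 12:00"),
    ("1003", "2024-01-01 08:00"),
])

# One row per caller, ranked by that caller's most recent call.
rows = conn.execute("""
    SELECT originatordn, MAX(startdatetime) AS last_call
    FROM contactcalldetail
    WHERE originatordn <> ''
    GROUP BY originatordn
    ORDER BY last_call DESC
    LIMIT 10
""").fetchall()
print([r[0] for r in rows])  # most recently active callers first
```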

Query with rownum got slow

I have a table with 21 million records, of which 20 million meet the criterion col1 = 'text'. I then started to iteratively set the value of col2 to a non-NULL value. After I had mutated 10 million records, the following query, which was fast in the beginning, got slow:
SELECT T_PK
FROM (SELECT T_PK
      FROM table
      WHERE col1 = 'text' AND col2 IS NULL
      ORDER BY T_PK DESC)
WHERE ROWNUM < 100;
I noticed that as soon as I remove the DESC keyword, the whole ORDER BY T_PK DESC clause, or the whole outer query with the condition WHERE ROWNUM < 100, it is fast again (fast meaning a couple of seconds, < 10 s).
The execution plan shows an INDEX FULL SCAN DESCENDING on the PK of the table. Besides the index on the PK, I have an index defined on col2.
What could be the reason that the query was fast and then got slow? How can I make the query fast regardless of how many records are already set to non-null value?
For this query:
SELECT T_PK
FROM (SELECT T_PK
FROM table
WHERE col1= 'text' AND col2 IS NULL
ORDER BY T_PK DESC
) t
WHERE ROWNUM < 100;
The optimal index is table(col1, col2, t_pk).
I think the problem is that the optimizer has a choice of two indexes -- either for the where clause (col1 and -- probably -- col2) or one on t_pk. If you have a single index that handles both clauses, then performance should improve.
One reason that the DESC might make a difference is where the matching rows lie. If all the matching rows are in the first 100,000 rows of the table, then when you order descending, the query might have to throw out 20.9 million rows before finding a match.
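The effect of the suggested composite index can be sketched in SQLite (a stand-in only: unlike Oracle, SQLite indexes NULLs, so col2 IS NULL can be an index constraint here; the Oracle analogue is that the composite key makes the index usable for both the filter and the descending order):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (t_pk INTEGER, col1 TEXT, col2 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, 'text', NULL)",
                 [(i,) for i in range(1, 1001)])

# Composite index matching the answer: filter columns first, then the PK.
conn.execute("CREATE INDEX idx_c12pk ON t(col1, col2, t_pk)")

plan = " ".join(r[3] for r in conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT t_pk FROM t "
    "WHERE col1 = 'text' AND col2 IS NULL "
    "ORDER BY t_pk DESC LIMIT 99"))
print(plan)  # index search, no separate sort step
```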
I think Burleson explained this quite nicely:
http://www.dba-oracle.com/t_sql_tuning_rownum_equals_one.htm
Beware!
This use of ROWNUM < n can cause performance problems. Using ROWNUM may change the ALL_ROWS optimizer mode for a query to FIRST_ROWS, causing unexpected sub-optimal execution plans. One solution is to always include an ALL_ROWS hint when using ROWNUM to perform a top-n query.

Query using Rownum and order by clause does not use the index

I am using Oracle (Enterprise Edition 10g) and I have a query like this:
SELECT * FROM (
SELECT * FROM MyTable
ORDER BY MyColumn
) WHERE rownum <= 10;
MyColumn is indexed, however, Oracle is for some reason doing a full table scan before it cuts the first 10 rows. So for a table with 4 million records the above takes around 15 seconds.
Now consider this equivalent query:
SELECT MyTable.*
FROM
(SELECT rid
FROM
(SELECT rowid as rid
FROM MyTable
ORDER BY MyColumn
)
WHERE rownum <= 10
)
INNER JOIN MyTable
ON MyTable.rowid = rid
ORDER BY MyColumn;
Here Oracle scans the index and finds the top 10 rowids, and then uses nested loops to find the 10 records by rowid. This takes less than a second for a 4 million table.
My first question is why is the optimizer taking such an apparently bad decision for the first query above?
And my second and most important question is: is it possible to make the first query perform better? I have a specific need to use the first query as unmodified as possible. I am looking for something simpler than my second query above. Thank you!
Please note that for particular reasons I am unable to use the /*+ FIRST_ROWS(n) */ hint, or the ROW_NUMBER() OVER (ORDER BY column) construct.
If this is acceptable in your case, adding a WHERE ... IS NOT NULL clause will help the optimizer to use the index instead of doing a full table scan when using an ORDER BY clause:
SELECT * FROM (
SELECT * FROM MyTable
WHERE MyColumn IS NOT NULL
-- ^^^^^^^^^^^^^^^^^^^^
ORDER BY MyColumn
) WHERE rownum <= 10;
The rationale is that Oracle does not store rows in the index when all indexed columns are NULL. As your query was originally written, the optimizer decided on a full table scan: if there were fewer than 10 non-NULL values, it would have to retrieve some "NULL rows" to fill in the remaining rows. Apparently it is not smart enough to first check whether the index contains enough rows...
With the added WHERE MyColumn IS NOT NULL, you inform the optimizer that you don't want, under any circumstances, any row having NULL in MyColumn. So it can blindly use the index without worrying about hypothetical rows having NULL in MyColumn.
For the same reason, declaring the ORDER BY column as NOT NULL should prevent the optimizer from doing a full table scan. So, if you can change the schema, a cleaner option would be:
ALTER TABLE MyTable MODIFY (MyColumn NOT NULL);
See http://sqlfiddle.com/#!4/e3616/1 for various comparisons (click on view execution plan)

ROW_NUMBER() execution plan

Please consider this query:
SELECT num, *
FROM (SELECT OrderID, CustomerID, EmployeeID, OrderDate, RequiredDate,
             ShippedDate,
             ROW_NUMBER() OVER (ORDER BY OrderID) AS num
      FROM Orders) AS numbered
WHERE num BETWEEN 0 AND 100
When I execute this query, the execution plan shows a Clustered Index Scan feeding Segment and Sequence Project operators, followed by a Filter.
I want to know:
1) What steps does SQL Server 2008 take to evaluate ROW_NUMBER() in a query?
2) Why is the first step in the execution plan a Clustered Index Scan?
3) Why is the filtering cost only 2%? I mean, why doesn't SQL Server perform a table scan to get the appropriate data? Does ROW_NUMBER() cause an index to be created?
The Segment/Sequence Project portions of the plan relate to the use of ROW_NUMBER().
You have a clustered index scan because there is no WHERE clause on your inner SELECT, hence all rows of the table have to be returned.
The Filter relates to the WHERE clause on the outer SELECT.
That "Compute Scalar" part of the query is the row_number being created.
Because you're selecting every row from Orders, then numbering them, then selecting rows 1-100. That's a table scan (or in this case a clustered index scan) any way you slice it.
No, indexes aren't created on the fly. It has to check the rows because the set doesn't come back ordered in your subquery.
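The number-then-filter shape from the question can be sketched in SQLite, which supports the same window-function syntax since version 3.25 (tiny made-up Orders table standing in for the one in the question; rows 1-2 instead of 0-100):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, CustomerID TEXT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [(oid, f"C{oid}") for oid in (30, 10, 20)])

# ROW_NUMBER() numbers every row of the inner SELECT; the outer WHERE then
# keeps only the wanted slice -- the numbering itself touches all rows.
rows = conn.execute("""
    SELECT num, OrderID
    FROM (SELECT OrderID,
                 ROW_NUMBER() OVER (ORDER BY OrderID) AS num
          FROM Orders) AS numbered
    WHERE num BETWEEN 1 AND 2
    ORDER BY num
""").fetchall()
print(rows)  # [(1, 10), (2, 20)]
```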

How can I optimize this query?

I've got a bit of a nasty query with several subselects that are really slowing it down. I'm already caching the query, but the results of it changes often and the query results are meant to be shown on a high traffic page.
SELECT user_id, user_id AS uid,
       (SELECT correct_words
        FROM score
        WHERE user_id = `uid`
        ORDER BY correct_words DESC, incorrect_words ASC
        LIMIT 0, 1) AS correct_words,
       (SELECT incorrect_words
        FROM score
        WHERE user_id = `uid`
        ORDER BY correct_words DESC, incorrect_words ASC
        LIMIT 0, 1) AS incorrect_words
FROM score
WHERE user_id > 0
  AND DATE(date_tested) = DATE(NOW())
GROUP BY user_id
ORDER BY correct_words DESC, incorrect_words ASC
LIMIT 0, 7
The goal of the query is to pick out each user's top score for the day, but to show only the highest-scoring instance per user instead of all of their scores (so, for instance, if one user had 4 of the top 10 scores that day, I only want to show that user's top score and drop the rest).
Try as I might, I've yet to replicate the results of this query any other way. Right now its average run time is about 2 seconds, but I'm afraid that might increase greatly as the table gets bigger.
Any thoughts?
Try this: the subquery returns the result set of all the scores in the right order, and the outer query keeps the first occurrence for each user. When grouping in MySQL, columns that are not grouped on return the value from the first row encountered. (Note that MySQL does not guarantee this behavior, and the query is rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled.)
SELECT user_id, correct_words, incorrect_words
FROM (SELECT user_id, correct_words, incorrect_words
      FROM score
      WHERE user_id > 0
        AND DATE(date_tested) = DATE(NOW())
      ORDER BY correct_words DESC, incorrect_words ASC) AS ranked
GROUP BY user_id
LIMIT 0, 7
The subqueries for correct_words and incorrect_words could be really killing your performance. In the worst case, MySQL has to execute those queries for each row it considers (not each row that it returns!). Rather than using scalar subqueries, consider rewriting your query to use JOIN-variants as appropriate.
Additionally, filtering by DATE(date_tested)=DATE(NOW()) may be preventing MySQL from using an index. I don't believe any of the production versions of MySQL allow function-based indices.
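The sargability point can be demonstrated directly: wrapping the column in DATE() hides it from the index, while an equivalent half-open range lets the index be used. A sketch using SQLite's plan output as an illustration (MySQL behaves the same way here; data made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score (user_id INTEGER, date_tested TEXT)")
conn.executemany("INSERT INTO score VALUES (?, ?)",
                 [(i, f"2024-01-{(i % 28) + 1:02d} 12:00:00") for i in range(200)])
conn.execute("CREATE INDEX idx_date ON score(date_tested)")

def plan(sql):
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Function on the column: the index cannot be used, the table is scanned.
slow = plan("SELECT user_id FROM score WHERE DATE(date_tested) = '2024-01-05'")
# Equivalent half-open range on the bare column: index search.
fast = plan("SELECT user_id FROM score "
            "WHERE date_tested >= '2024-01-05' AND date_tested < '2024-01-06'")
print(slow)
print(fast)
```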
Make sure you have indices on all the columns you filter and order by. MySQL can make use of multi-column indices if the columns filtered or ordered by match your query, e.g. CREATE INDEX score_correct_incorrect_idx ON score ( correct_words DESC, incorrect_words ASC ); would be a candidate index, though MySQL may choose not to use it depending on the execution plan it creates and its estimates of table sizes.
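A version of the best-score-per-user query that does not rely on the undefined GROUP BY behavior is to rank each user's scores with a window function and keep only the best row per user (supported by MySQL 8.0; sketched here in SQLite, with the date filter omitted and made-up data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score (user_id INTEGER, correct_words INTEGER, "
             "incorrect_words INTEGER)")
conn.executemany("INSERT INTO score VALUES (?, ?, ?)", [
    (1, 50, 2), (1, 90, 1), (2, 70, 3), (3, 80, 0), (1, 60, 5),
])

# Rank each user's scores; rn = 1 is that user's single best score.
rows = conn.execute("""
    SELECT user_id, correct_words, incorrect_words
    FROM (SELECT user_id, correct_words, incorrect_words,
                 ROW_NUMBER() OVER (PARTITION BY user_id
                                    ORDER BY correct_words DESC,
                                             incorrect_words ASC) AS rn
          FROM score
          WHERE user_id > 0) AS ranked
    WHERE rn = 1
    ORDER BY correct_words DESC, incorrect_words ASC
    LIMIT 7
""").fetchall()
print(rows)  # one row per user, best scores first
```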