nested query with two tables takes more than 20 minutes - sql

Here is our query, which took 1229.206 seconds to execute (returning 8,310,286 rows):
SELECT t_01.object_uid
FROM HashTable t_01
WHERE t_01.object_uid IN (SELECT t_02.puid
                          FROM ObjectTable t_02
                          WHERE t_02.arev_category IN (48, 40))
Plan hash value: 1560846306
------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)|
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 780K(100)|
| 1 | NESTED LOOPS SEMI | | 7764K| 244M| 780K (1)|
| 2 | INDEX FULL SCAN | PIHashTable | 7764K| 111M| 4073 (1)|
|* 3 | TABLE ACCESS BY INDEX ROWID BATCHED| ObjectTable | 290M| 4986M| 1 (0)|
|* 4 | INDEX RANGE SCAN | PIOBJECTTABLE | 1 | | 1 (0)|
------------------------------------------------------------------------------------------------
table HashTable has 51154 blocks, last analyzed 2022/04/19 with 7764715 rows
index PIHashTable on HashTable (OBJECT_UID) last analyzed 2022/04/19 over 7764715 rows
table ObjectTable has 3327616 blocks, last analyzed 2022/05/02 with 290473386 rows
index PIPPOM_OBJECT on ObjectTable (PUID) last analyzed 2022/05/02 over 290473386 rows
Table ObjectTable has 290 million rows and HashTable has 7 million rows.
Any way to optimize this?

It's most likely going to take a while to return 8M+ rows. You should implement paging instead of returning all of the rows at once. It looks like you're using Oracle, so try this:
select *
from (
    SELECT t_01.object_uid, row_number() over (order by t_01.object_uid) r
    FROM HashTable t_01
    WHERE t_01.object_uid IN (
        SELECT t_02.puid FROM ObjectTable t_02 WHERE t_02.arev_category IN (48, 40)
    )
)
where r between 1 and 1000;
Then you can get rows 1 - 1000, then 1001 - 2000, and so on. This has the added benefit of using less memory and CPU on the server per query, and it also returns data sooner, so that you (or the user) can act on it while additional data is loaded in the background.
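If you're on Oracle 12c or later, the same paging can be written with the OFFSET/FETCH clause instead of wrapping ROW_NUMBER() yourself; this is a sketch of the same idea:
SELECT t_01.object_uid
FROM HashTable t_01
WHERE t_01.object_uid IN (
    SELECT t_02.puid FROM ObjectTable t_02 WHERE t_02.arev_category IN (48, 40)
)
ORDER BY t_01.object_uid
OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY;
Increase the OFFSET by 1000 for each subsequent page.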

If you implement this as a join and put an index on t_02.arev_category it will be very quick.
SELECT t_01.object_uid
FROM HashTable t_01
JOIN ObjectTable t_02 ON t_02.arev_category IN (48, 40) AND t_02.puid = t_01.object_uid
If the object table can contain the same puid for both categories 40 and 48 (a plain join would then return duplicate rows, unlike IN), do it like this:
SELECT t_01.object_uid
FROM HashTable t_01
JOIN (SELECT DISTINCT puid
      FROM ObjectTable
      WHERE arev_category IN (48, 40)
     ) t_02 ON t_02.puid = t_01.object_uid
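For completeness, the index mentioned above might look like this (the name is illustrative); including puid as the second column lets the subquery be answered from the index alone:
-- hypothetical index covering both the filter and the join column
CREATE INDEX ix_objecttable_cat_puid ON ObjectTable (arev_category, puid);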

Related

where column is null taking longer time to execute

I am executing a select statement like the one below, which takes more than 6 minutes to execute.
select * from table where col1 is null;
whereas:
select * from table;
returns results in a few seconds. The table contains 25 million records. No indexes are used; there is a composite PK, but not on the column used. The same query, when executed on a different table with 50 million records, returns results in a few seconds. Only this table poses a problem.
I rebuilt the table to check whether something was amiss, but I am still facing the same issue.
Can someone help me understand why it is taking so long?
datatype: VARCHAR2(40)
PLAN:
Plan hash value: 2838772322
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 794 | 60973 (16)| 00:00:03 |
|* 1 | TABLE ACCESS STORAGE FULL| table | 1 | 794 | 60973 (16)| 00:00:03 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("column" IS NULL)
filter("column" IS NULL)
select * from table;
The Oracle SQL Developer tool has a default setting to display only 50 records unless it has been manually changed. So the entire 25 million records are not fetched, as you don't need all the records for display.
select * from table where col1 is null;
But when you filter for null values, the entire set of 25 million rows has to be scanned to apply the filter and get your 81 records satisfying that predicate. Hence your second query takes longer.
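If this IS NULL query has to be fast, note that a plain B-tree index on col1 would not help either: Oracle creates no index entry when all indexed columns are NULL. A common workaround is a composite index with a constant as the second key, so that NULL rows are still indexed (a sketch; mytable and the index name are stand-ins for your actual names):
-- the constant 0 is never NULL, so rows with col1 IS NULL still get index entries
CREATE INDEX ix_mytable_col1 ON mytable (col1, 0);
With that in place, the optimizer can use an index range scan to find the few rows satisfying col1 IS NULL instead of scanning all 25 million rows.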

Oracle TOP optimizing

I'm having trouble optimizing a statement. The corresponding table (INTERVAL_TBL) contains about 11,000,000 rows, which causes the statement to take roughly 8 seconds on my test system. Even on a dedicated Oracle server (24 GB RAM, 17 GB DB size) it takes about 4 - 5 seconds.
SELECT
    ID, STATUS_ID, INTERVAL_ID, BEGIN_TS, END_TS, PT, DISPLAYTEXT, RSC
FROM
(
    SELECT
        INTERVAL_TBL.ID, INTERVAL_TBL.INTERVAL_ID, INTERVAL_TBL.STATUS_ID,
        INTERVAL_TBL.BEGIN_TS, INTERVAL_TBL.END_TS, INTERVAL_TBL.PT,
        ST_TBL.DISPLAYTEXT, ST_TBL.RSC,
        RANK() OVER (ORDER BY BEGIN_TS DESC) MY_RANK
    FROM INTERVAL_TBL
    INNER JOIN ST_TBL ON ST_TBL.STATUS_ID = INTERVAL_TBL.STATUS_ID
    WHERE ID = '<id>'
)
WHERE MY_RANK <= 10
First of all, I'd like to know if there is a way to optimize the statement (select the most recent rows, ordered by BEGIN_TS).
Second, I'd like to know if someone can suggest an index based on the statement.
EDIT:
Explain Plan:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 525K| 79M| 58469 (1)| 00:00:03 |
|* 1 | HASH JOIN | | 525K| 79M| 58469 (1)| 00:00:03 |
| 2 | TABLE ACCESS FULL| ST_TBL | 46 | 2438 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| INTERVAL_TBL | 525K| 52M| 58464 (1)| 00:00:03 |
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ST_TBL"."STATUS_ID"="INTERVAL_TBL"."STATUS_ID")
3 - filter("INTERVAL_TBL"."ID"='aef830a6-275b-4713-90da-9135f3f91a32')
Rows in INTERVAL_TBL: 10,673,122
Rows in ST_TBL: 46
Rows in joined subset: 10,673,122
Rows in joined subset with filter on ID: 530,073
Ideally it would get down to a few milliseconds. That's what the statement takes in MS SQL Server with 10,000,000 rows.
First of all, I would perform the inner join to ST_TBL in the outer scope; I got a speedup from this. It reduces the Cost, IO Cost and Bytes in my execution plan when comparing both variants.
Per the original query, columns DISPLAYTEXT and RSC come from ST_TBL, while PT comes from INTERVAL_TBL, so PT has to be carried through the inline view.
SELECT
    SRESULT.ID, SRESULT.STATUS_ID, SRESULT.INTERVAL_ID, SRESULT.BEGIN_TS,
    SRESULT.END_TS, SRESULT.PT, ST_TBL.DISPLAYTEXT, ST_TBL.RSC
FROM
(
    SELECT
        ID, INTERVAL_ID, STATUS_ID, BEGIN_TS, END_TS, PT,
        RANK() OVER (ORDER BY BEGIN_TS DESC) MY_RANK
    FROM INTERVAL_TBL
    WHERE ID = '<id>'
) SRESULT
INNER JOIN ST_TBL ON ST_TBL.STATUS_ID = SRESULT.STATUS_ID
WHERE MY_RANK <= 10
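Regarding the index question: the inner query filters on ID and orders by BEGIN_TS descending, so a composite index on those two columns should let Oracle pick the top 10 rows without a full scan (a sketch; the index name is illustrative):
CREATE INDEX IX_INTERVAL_ID_BEGIN ON INTERVAL_TBL (ID, BEGIN_TS);
Oracle can read this index in descending BEGIN_TS order within each ID, so the ranking needs no extra sort.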

Oracle SQL query running slow, full table scan on primary key, why?

I have a problem with a piece of code: I can't understand why the query below does a full table scan of the works table when wrk.cre_surr_id is the primary key. The stats on both tables are up to date. Below are the indexes on both tables.
TABLE INDEXES
WORKS
INDEX NAME       UNIQUE  LOGGING  COLUMN NAME          ORDER
WRK_I1           N       NO       LOGICALLY_DELETED_Y  Asc
WRK_ICE_WRK_KEY  N       YES      ICE_WRK_KEY          Asc
WRK_PK           Y       NO       CRE_SURR_ID          Asc
WRK_TUNECODE_UK  Y       NO       TUNECODE             Asc
TLE_TITLE_TOKENS
INDEX NAME       UNIQUE  LOGGING  COLUMN NAME          ORDER
TTT_I1           N       YES      TOKEN_TYPE,          Asc
                                  SEARCH_TOKEN,
                                  DN_WRK_CRE_SURR_ID
TTT_TLE_FK_1     N       YES      TLE_SURR_ID
The problem query is below. It has a cost of 245,876, which seems high. It's doing a FULL TABLE SCAN of the WORKS table, which has 21,938,384 rows, and an INDEX RANGE SCAN of the TLE_TITLE_TOKENS table, which has 19,923,002 rows. The explain plan also shows an INLIST ITERATOR, which I haven't a clue about, but I think it's to do with having "in ('E','N')" in my SQL query.
SELECT wrk.cre_surr_id
FROM works wrk,
tle_title_tokens ttt
WHERE ttt.dn_wrk_cre_surr_id = wrk.cre_surr_id
AND wrk.logically_deleted_y IS NULL
AND ttt.token_type in ('E','N')
AND ttt.search_token LIKE 'BELIEVE'||'%'
When I break the query down and do a simple select from the TLE_TITLE_TOKENS table I get 280,000 records back.
select ttt.dn_wrk_cre_surr_id
from tle_title_tokens ttt
where ttt.token_type in ('E','N')
and ttt.search_token LIKE 'BELIEVE'||'%'
How do I stop it doing a FULL TABLE SCAN on the WORKS table? I could put a hint on the query, but I would have thought Oracle would be clever enough to use the index without a hint.
Also, on the TLE_TITLE_TOKENS table, would it be better to create a function-based index on the SEARCH_TOKEN column, as users seem to do LIKE % searches on this field? What would that function-based index look like?
I'm running on an Oracle 11g database.
Thanks in advance for any answers.
First, rewrite the query using explicit JOIN syntax:
SELECT wrk.cre_surr_id
FROM tle_title_tokens ttt JOIN
works wrk
ON ttt.dn_wrk_cre_surr_id = wrk.cre_surr_id
WHERE wrk.logically_deleted_y IS NULL AND
ttt.token_type in ('E', 'N') AND
ttt.search_token LIKE 'BELIEVE'||'%';
You should be able to speed this query up by using indexes. It is not clear what the best index is. I would suggest tle_title_tokens(search_token, token_type, dn_wrk_cre_surr_id) and works(cre_surr_id, logically_deleted_y).
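In DDL form (index names are illustrative):
CREATE INDEX ttt_search_idx ON tle_title_tokens (search_token, token_type, dn_wrk_cre_surr_id);
CREATE INDEX wrk_surr_del_idx ON works (cre_surr_id, logically_deleted_y);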
Another possibility is to write the query using EXISTS, such as:
SELECT wrk.cre_surr_id
FROM works wrk
WHERE wrk.logically_deleted_y IS NULL AND
EXISTS (SELECT 1
FROM tle_title_tokens ttt
WHERE ttt.dn_wrk_cre_surr_id = wrk.cre_surr_id AND
ttt.token_type IN ('N', 'E') AND
ttt.search_token LIKE 'BELIEVE'||'%'
) ;
For this version, you want indexes on works(logically_deleted_y, cre_surr_id) and tle_title_tokens(dn_wrk_cre_surr_id, token_type, search_token).
try this:
SELECT /*+ leading(ttt) */ wrk.cre_surr_id
FROM works wrk,
tle_title_tokens ttt
WHERE ttt.dn_wrk_cre_surr_id = wrk.cre_surr_id
AND wrk.logically_deleted_y IS NULL
AND ttt.token_type in ('E','N')
AND ttt.search_token LIKE 'BELIEVE'||'%'
Out of the 19,923,002 rows in TLE_TITLE_TOKENS,
how many records have TOKEN_TYPE 'E', and how many have 'N'? Are there any other token types? If so, how many rows do 'E' and 'N' make up together?
If 'E' and 'N' together form a small part of the total records, then check whether histogram statistics are up to date for that column.
The execution plan depends on how many records are selected from TLE_TITLE_TOKENS out of the 20M records for the given filters.
I'm assuming these index definitions:
create index works_idx on works (cre_surr_id,logically_deleted_y);
create index title_tokens_idx on tle_title_tokens(search_token,token_type,dn_wrk_cre_surr_id);
There are typically two possible scenarios for executing the join:
NESTED LOOPS, which accesses the inner table WORKS via the index, but repeatedly, once per row of the outer table
HASH JOIN, which accesses WORKS with a FULL SCAN, but only once
It is not possible to say that one option is bad and the other good.
Nested loops are better if there are only a few rows in the outer table (few loops), but with an increasing number of records in the outer table (TOKENS) they get
slower and slower, and at some row count the HASH JOIN becomes better.
How do you see which execution plan is better? Simply force Oracle, using hints, to run both scenarios and compare the elapsed times.
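For example, something like this (a sketch using the standard leading/use_hash/use_nl hints; only the hints differ, the query text stays the same):
-- force the hash join variant
SELECT /*+ leading(ttt) use_hash(wrk) */ wrk.cre_surr_id
FROM works wrk, tle_title_tokens ttt
WHERE ttt.dn_wrk_cre_surr_id = wrk.cre_surr_id
  AND wrk.logically_deleted_y IS NULL
  AND ttt.token_type IN ('E','N')
  AND ttt.search_token LIKE 'BELIEVE'||'%';
-- force the nested loops variant
SELECT /*+ leading(ttt) use_nl(wrk) */ wrk.cre_surr_id
FROM works wrk, tle_title_tokens ttt
WHERE ttt.dn_wrk_cre_surr_id = wrk.cre_surr_id
  AND wrk.logically_deleted_y IS NULL
  AND ttt.token_type IN ('E','N')
  AND ttt.search_token LIKE 'BELIEVE'||'%';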
In your case you should see these two execution plans:
HASH JOIN
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 207K| 10M| | 2439 (1)| 00:00:30 |
|* 1 | HASH JOIN | | 207K| 10M| 7488K| 2439 (1)| 00:00:30 |
|* 2 | INDEX RANGE SCAN | TITLE_TOKENS_IDX | 207K| 5058K| | 29 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| WORKS | 893K| 22M| | 431 (2)| 00:00:06 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TTT"."DN_WRK_CRE_SURR_ID"="WRK"."CRE_SURR_ID")
2 - access("TTT"."SEARCH_TOKEN" LIKE 'BELIEVE%')
filter("TTT"."SEARCH_TOKEN" LIKE 'BELIEVE%' AND ("TTT"."TOKEN_TYPE"='E' OR
"TTT"."TOKEN_TYPE"='N'))
3 - filter("WRK"."LOGICALLY_DELETED_Y" IS NULL)
NESTED LOOPS
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 207K| 10M| 414K (1)| 01:22:56 |
| 1 | NESTED LOOPS | | 207K| 10M| 414K (1)| 01:22:56 |
|* 2 | INDEX RANGE SCAN| TITLE_TOKENS_IDX | 207K| 5058K| 29 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN| WORKS_IDX | 1 | 26 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("TTT"."SEARCH_TOKEN" LIKE 'BELIEVE%')
filter("TTT"."SEARCH_TOKEN" LIKE 'BELIEVE%' AND
("TTT"."TOKEN_TYPE"='E' OR "TTT"."TOKEN_TYPE"='N'))
3 - access("TTT"."DN_WRK_CRE_SURR_ID"="WRK"."CRE_SURR_ID" AND
"WRK"."LOGICALLY_DELETED_Y" IS NULL)
My guess is that (with 280K loops) the hash join (i.e. the FULL TABLE SCAN) will be better, but you may find that nested loops should be used.
In that case the optimizer doesn't correctly recognize the switching point between nested loops and hash join.
A common cause of this is wrong or missing system statistics or improper optimizer parameters.
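If system statistics are the suspect, regathering them is straightforward (a sketch; requires the appropriate privileges):
BEGIN
  -- 'NOWORKLOAD' gathers CPU speed and I/O timings without monitoring a real workload
  DBMS_STATS.GATHER_SYSTEM_STATS(gathering_mode => 'NOWORKLOAD');
END;
/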

Performance of a SQL query

I'm executing a command on a large table. It has around 7 million rows.
The command is like this:
select * from mytable;
Now I'm restricting the number of rows to around 3 million. I'm using this command:
select * from mytable where timest > add_months( sysdate, -12*4 )
I have an index on the timest column, but the costs are almost the same. I would expect them to decrease. What am I doing wrong?
Any clue?
Thank you in advance!
Here are the explain plans:
Using an index for 3 out of 7 million rows would most probably be even more expensive, so Oracle does a full table scan for both queries, which is IMO correct.
You may try a parallel FTS (Full Table Scan) - it should be faster, BUT it will put your Oracle server under higher load, so don't do it on heavily loaded multiuser DBs.
Here is an example:
select /*+full(t) parallel(t,4)*/ *
from mytable t
where timest > add_months( sysdate, -12*4 );
To select a very small number of records from a table, use an index. To select a non-trivial part, use partitioning.
In your case, effective access would be enabled by range partitioning on the timest column.
The big advantage is that only the relevant partitions are accessed.
Here is an example:
create table test(ts date, s varchar2(4000))
PARTITION BY RANGE (ts)
(PARTITION t1p1 VALUES LESS THAN (TO_DATE('2010-01-01', 'YYYY-MM-DD')),
PARTITION t1p2 VALUES LESS THAN (TO_DATE('2015-01-01', 'YYYY-MM-DD')),
PARTITION t1p4 VALUES LESS THAN (MAXVALUE)
);
Query
select * from test where ts < to_date('2009-01-01','yyyy-mm-dd');
will access only partition 1, i.e. only dates before '2010-01-01'.
See Pstart and Pstop in the execution plan:
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 10055 | 9 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 5 | 10055 | 9 (0)| 00:00:01 | 1 | 1 |
|* 2 | TABLE ACCESS FULL | TEST | 5 | 10055 | 9 (0)| 00:00:01 | 1 | 1 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("TS"<TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
There are (at least) two problems.
add_months( sysdate, -12*4 ) is a function, not a plain constant, which makes it harder for the optimizer to estimate how many rows will match.
Choosing 3 million out of 7 million rows via an index is not a good idea anyway. Yes, you (would) traverse the index tree quickly, yet for each row you have to go to the heap (because you need * = all columns). That means there is no sense in using this index.
Thus the index plays no role here.
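To illustrate the second point: if the query needed only the indexed column instead of *, Oracle could answer it from the index alone and the heap access would disappear, e.g.:
-- an index-only scan becomes possible because no other columns are needed
SELECT timest
FROM mytable
WHERE timest > add_months( sysdate, -12*4 );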

How to tune a range / interval query in Oracle?

I have a table A with intervals (COL1, COL2):
CREATE TABLE A (
COL1 NUMBER(15) NOT NULL,
COL2 NUMBER(15) NOT NULL,
VAL1 ...,
VAL2 ...
);
ALTER TABLE A ADD CONSTRAINT COL1_BEFORE_COL2 CHECK (COL1 <= COL2);
The intervals are guaranteed to be "exclusive", i.e. they will never overlap. In other words, this query yields no rows:
SELECT *
FROM (
SELECT
LEAD(COL1, 1) OVER (ORDER BY COL1) NEXT,
COL2
FROM A
)
WHERE COL2 >= NEXT;
There is currently an index on (COL1, COL2). Now, my query is the following:
SELECT /*+FIRST_ROWS(1)*/ *
FROM A
WHERE :some_value BETWEEN COL1 AND COL2
AND ROWNUM = 1
This performs well (less than a ms for millions of records in A) for low values of :some_value, because they're very selective on the index. But it performs quite badly (almost a second) for high values of :some_value because of a lower selectivity of the access predicate.
The execution plan seems good to me. As the existing index already fully covers the predicate, I get the expected INDEX RANGE SCAN:
------------------------------------------------------
| Id | Operation | Name | E-Rows |
------------------------------------------------------
| 0 | SELECT STATEMENT | | |
|* 1 | COUNT STOPKEY | | |
| 2 | TABLE ACCESS BY INDEX ROWID| A | 1 |
|* 3 | INDEX RANGE SCAN | A_PK | |
------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM=1)
3 - access("VAL2">=:some_value AND "VAL1"<=:some_value)
filter("VAL2">=:some_value)
In 3, it becomes obvious that the access predicate is selective only for low values of :some_value whereas for higher values, the filter operation "kicks in" on the index.
Is there any way to generally improve this query to be fast regardless of the value of :some_value? I can completely redesign the table if further normalisation is needed.
Your attempt is good, but misses a few crucial issues.
Let's start slowly. I'm assuming an index on COL1, and I actually don't mind if COL2 is included there as well.
Due to the constraints you have on your data (especially non-overlapping), you actually just want the first row with COL1 <= :some_value when you order by COL1 descending.
This is a classic Top-N query:
select *
FROM ( select *
from A
where col1 <= :some_value
order by col1 desc
)
where rownum <= 1;
Please note that you must use ORDER BY to get a definite sort order. As ROWNUM is assigned before the ORDER BY is applied, you must wrap the Top-N filter in an outer query.
That's almost done. The only reason we actually need to filter on COL2 too is to exclude records that don't fall into any range at all. E.g. if :some_value is 5 and you have this data:
COL1 | COL2
1 | 2
3 | 4 <-- you get this row
6 | 10
This row would be the correct result if COL2 were 5, but unfortunately, in this case the correct result of your query is [empty set]. That's the only reason we need to filter on COL2, like this:
select *
FROM ( select *
FROM ( select *
from A
where col1 <= :some_value
order by col1 desc
)
where rownum <= 1
)
WHERE col2 >= :some_value;
Your approach had several problems:
missing ORDER BY - dangerous in connection with a rownum filter!
applying the Top-N clause (rownum filter) too early. What if there is no result? The database reads the index until the end; the rownum filter (STOPKEY) never kicks in.
an optimizer glitch. With the between predicate, my 11g installation doesn't get the idea to read the index in descending order, so it was actually reading it from the beginning upwards until it found a COL2 value that matched or COL1 ran out of the range.
COL1 | COL2
  1  |   2            ^
  3  |   4            | (2) go up until the first match.
                      +----- (1) your intention was to start here
  6  |  10
What was actually happening was:
COL1 | COL2
  1  |   2            +----- start at the beginning of the index
  3  |   4            |      go down until the first match.
                      V
  6  |  10
Look at the execution plan of my query:
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 4 (0)| 00:00:01 |
|* 1 | VIEW | | 1 | 26 | 4 (0)| 00:00:01 |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | VIEW | | 2 | 52 | 4 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID | A | 50000 | 585K| 4 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN DESCENDING| SIMPLE | 2 | | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Note the INDEX RANGE SCAN **DESCENDING**.
Finally, why didn't I include COL2 in the index? It's a single-row Top-N query. You can save at most a single table access (irrespective of what the Rows estimate above says!). If you expect to find a row in most cases, you'll need to go to the table anyway for the other columns (probably), so you would not save ANYTHING, just consume space. Including COL2 will only improve performance if your query doesn't return anything at all!
Related:
How to use index efficienty in mysql query
I answered a very similar question about this years ago. Same solution.
Use The Index, Lukas!
I think, because the ranges do not intersect, you can define col1 as the primary key and execute the query like this:
SELECT *
FROM a
JOIN
(SELECT MAX (col1) AS col1
FROM a
WHERE col1 <= :somevalue) b
ON a.col1 = b.col1;
If there are gaps between the ranges, you will have to add:
Where col2 >= :somevalue
as the last line.
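Putting it together, the complete statement for the gap case looks like this:
SELECT *
FROM a
JOIN
(SELECT MAX (col1) AS col1
FROM a
WHERE col1 <= :somevalue) b
ON a.col1 = b.col1
WHERE a.col2 >= :somevalue;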
Execution Plan:
SELECT STATEMENT
  NESTED LOOPS
    VIEW
      SORT AGGREGATE
        FIRST ROW
          INDEX RANGE SCAN (MIN/MAX) PKU1
    TABLE ACCESS BY INDEX A
      INDEX UNIQUE SCAN PKU1
Maybe changing this heap table to an IOT (index-organized table) would give better performance.
I didn't generate sample data to test this, but you might want to give it a try.
ALTER TABLE A ADD COL3 NUMBER(15);
UPDATE A SET COL3 = COL2 - COL1;
Create index on COL3.
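For example (index name illustrative):
CREATE INDEX A_COL3_IDX ON A (COL3);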
SELECT /*+FIRST_ROWS(1)*/ *
FROM A
WHERE :some_value < COL3
AND ROWNUM = 1;