How to SELECT RDBMS data for different values of a partition column

I have an Oracle table as below:
CREATE TABLE "TABLE1"
(
"TABLE_ID" VARCHAR2(32 BYTE),
"TABLE_DATE" DATE,
"TABLE_NAME" VARCHAR2(2 BYTE)
)
PARTITION BY RANGE ("TABLE_DATE")
I gather this table's data is partitioned by the TABLE_DATE column.
How can I use this partitioning column to fetch data faster from this table, with a WHERE clause like the following?
SELECT * FROM TABLE1 PARTITION (P1) p
WHERE p.TABLE_DATE > (SYSDATE - 90) ;

You should change your partitioning to match your queries, not change your queries to match your partitioning. In most cases we shouldn't have to specify which partitions to read from. Oracle can automatically determine how to prune the partitions at run time.
For example, with this table:
create table table1
(
table_id varchar2(32 byte),
table_date date,
table_name varchar2(2 byte)
)
partition by range (table_date)
(
partition p1 values less than (date '2019-05-06'),
partition p2 values less than (maxvalue)
);
There's almost never a need to directly reference a partition in the query. It's extra work, and if we list the wrong partition name the query won't work correctly.
We can see partition pruning in action using EXPLAIN PLAN like this:
explain plan for
SELECT * FROM TABLE1 p
WHERE p.TABLE_DATE > (SYSDATE - 90) ;
select *
from table(dbms_xplan.display);
In the results we can see the partitioning in the Pstart and Pstop columns. The KEY means that the partition will be determined at run time. In this case, the start partition is based on the value of SYSDATE.
Plan hash value: 434062308
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 30 | 2 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE ITERATOR| | 1 | 30 | 2 (0)| 00:00:01 | KEY | 2 |
|* 2 | TABLE ACCESS FULL | TABLE1 | 1 | 30 | 2 (0)| 00:00:01 | KEY | 2 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("P"."TABLE_DATE">SYSDATE#!-90)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)

Related

Oracle - Hash Join Buffered Very Slow

I'm facing an issue while merging some data.
I have two tables:
CREATE TABLE tmp_table
(
TROWID ROWID NOT NULL
, NEW_FK1 NUMBER(10)
, NEW_FK2 NUMBER(10)
, CONSTRAINT TMP_TABLE_PK_1 PRIMARY KEY
(
TROWID
)
ENABLE
)
CREATE UNIQUE INDEX TMP_TABLE_PK_1 ON tmp_table (TROWID ASC)
CREATE TABLE my_table
(
M_ID NUMBER(10) NOT NULL
, M_FK1 NUMBER(10)
, M_FK2 NUMBER(10)
, M_START_DATE DATE NOT NULL
, M_END_DATE DATE
, M_DELETED NUMBER(1) NOT NULL
, M_CHECK1 NUMBER(1) NOT NULL
, M_CHECK2 NUMBER(1) NOT NULL
, M_CHECK3 NUMBER(1)
, M_CREATION_DATE DATE
, M_CREATION_USER NUMBER(10)
, M_UPDATE_DATE DATE
, M_UPDATE_USER NUMBER(10)
, CONSTRAINT MY_TABLE_PK_1 PRIMARY KEY
(
M_ID
)
ENABLE
)
CREATE UNIQUE INDEX MY_TABLE_PK_1 ON my_table (M_ID ASC)
CREATE INDEX MY_TABLE_IX_1 ON my_table (M_UPDATE_DATE ASC, M_FK2 ASC)
CREATE INDEX MY_TABLE_IX_2 ON my_table (M_FK1 ASC, M_FK2 ASC)
The tmp_table is a temporary table where I store only the records and information that will be updated in my_table; that is, tmp_table.TROWID is the rowid of the my_table row that should be merged.
In total, about 94M records should be merged, out of 540M rows in my_table.
The query:
MERGE /*+parallel*/ INTO my_table m
USING (SELECT /*+parallel*/ * FROM tmp_table) t
ON (m.rowid = t.TROWID)
WHEN MATCHED THEN
UPDATE SET m.M_FK1 = t.NEW_FK1 , m.M_FK2 = t.NEW_FK2 , m.M_UPDATE_DATE = trunc(sysdate)
, m.M_UPDATE_USER = 0 , m.M_CREATION_USER = 0
The execution plan is:
Operation | Table | Estimated Rows |
MERGE STATEMENT | | |
- MERGE | my_table | |
-- PX COORDINATOR | | |
--- PX SENDER | | |
---- PX SEND QC (RANDOM) | | 95M |
----- VIEW | | |
------ HASH JOIN BUFFERED | | 95M |
------- PX RECEIVE | | 95M |
-------- PX SEND HASH | | 95M |
--------- PX BLOCK ITERATOR | | 95M |
---------- TABLE ACCESS FULL | tmp_table | 95M |
------- PX RECEIVE | | 540M |
-------- PX SEND HASH | | 540M |
--------- PX BLOCK ITERATOR | | 540M |
---------- TABLE ACCESS FULL | my_table | 540M |
In the above plan the most expensive operation is the HASH JOIN BUFFERED.
The two full scans each take no more than 5-6 minutes, but after 2 hours the hash join had reached only 1% of its execution.
I have no idea why it requires that much time; any suggestions?
EDIT
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 94M| 9719M| | 3027K (2)| 10:05:29 |
| 1 | MERGE | my_table | | | | | |
| 2 | VIEW | | | | | | |
|* 3 | HASH JOIN | | 94M| 7109M| 3059M| 3027K (2)| 10:05:29 |
| 4 | TABLE ACCESS FULL| tmp_table | 94M| 1979M| | 100K (2)| 00:20:08 |
| 5 | TABLE ACCESS FULL| my_table | 630M| 33G| | 708K (3)| 02:21:48 |
-----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("tmp_table"."TROWID"="m".ROWID)
You could do a number of things. Please check whether they are beneficial for your situation, as mileage will vary.
1) Use only the columns of the target table you touch (by select or update):
MERGE
INTO (SELECT m_fk1, m_fk2, m_update_date, m_update_user, m_creation_user
FROM my_table) m
2) Use only the columns of the source table you need. In your case that's all columns, so there won't be any benefit:
MERGE
INTO (...) m
USING (SELECT trowid, new_fk1, new_fk2 FROM tmp_table) t
Both 1) and 2) will reduce the size of the storage needed for the hash join, and will enable the optimizer to use an index covering all the columns, if one is available.
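Combining 1) and 2) gives something like the following untested sketch (the ROWID alias rid is illustrative; note the source columns are NEW_FK1/NEW_FK2):
MERGE INTO (SELECT rowid AS rid,
                   m_fk1, m_fk2, m_update_date, m_update_user, m_creation_user
            FROM my_table) m
USING (SELECT trowid, new_fk1, new_fk2 FROM tmp_table) t
ON (m.rid = t.trowid)
WHEN MATCHED THEN UPDATE
  SET m.m_fk1 = t.new_fk1,
      m.m_fk2 = t.new_fk2,
      m.m_update_date = trunc(sysdate),
      m.m_update_user = 0,
      m.m_creation_user = 0;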
3) In your special case with ROWIDs, it seems to be very beneficial (at least in my tests) to sort the source table. If you sort by the rowids, you will likely update rows in the same physical block together, which may be more performant:
MERGE
INTO (...) m
USING (SELECT ... FROM tmp_table ORDER BY trowid) t
4) As your target table is quite large, I guess its tablespace is distributed over a number of datafiles. You can check this (via the ROWIDs stored in tmp_table) with the query
SELECT f, count(*) FROM (
SELECT dbms_rowid.rowid_relative_fno(trowid) as f from tmp_table
) GROUP BY f ORDER BY f;
If your target table uses more than a handful of datafiles, you could try to partition your temporary table by datafile:
CREATE TABLE tmp_table (
TROWID ROWID NOT NULL
, NEW_FK1 NUMBER(10)
, NEW_FK2 NUMBER(10)
, FNO NUMBER
) PARTITION BY RANGE(FNO) INTERVAL (1) (
PARTITION p0 VALUES LESS THAN (0)
);
You can fill the column FNO with the following expression:
dbms_rowid.rowid_relative_fno(rowid)
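For example, a minimal in-place back-fill:
-- Derive each target row's datafile number from its stored ROWID
UPDATE tmp_table
   SET fno = dbms_rowid.rowid_relative_fno(trowid);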
Now you can update datafile by datafile, reducing the required memory for the hash join. Get the list of file numbers with
SELECT DISTINCT fno FROM tmp_table;
14
15
16
17
and run the updates file by file:
MERGE
INTO (SELECT ... FROM my_table) m
USING (SELECT ... FROM tmp_table PARTITION FOR (14) ORDER BY trowid) t
and next PARTITION FOR (15) etc. The file numbers will obviously be different on your system.
5) Finally, try to use nested loops instead of a hash join. Usually the optimizer picks the better join plan, but I cannot resist trying it out:
MERGE /*+ USE_NL (m t) */
INTO (SELECT ... FROM my_table) m
USING (SELECT ... FROM tmp_table partition for (14) ORDER BY trowid) t
ON (m.rowid = t.TROWID)

How to make a schema dependent on another schema

I have a question and can't find the answer; can someone help me? Here is the situation:
I have a schema that is a template.
And I want to have 10 schemas based on this template.
But I want that every time I change the structure of the template schema, like adding a new column, the column is created in all the schemas related to the template schema.
Is this possible with Oracle?
As the others said, it is not possible in Oracle to do this by default. BUT, if you're on a recent version (12.2 and higher) and don't mind paying for the multitenant option, you can look into something called application containers. This trades your schemas in a single DB for the same schema in different PDBs. Application containers allow you to define the schema in a parent PDB (including tables, views, triggers, ...) and have every modification propagated to the PDBs (you sync each PDB when you want).
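A minimal sketch of that workflow, from memory (the application name template_app and the table are illustrative; check the multitenant docs for the exact syntax on your version):
-- In the application root: record the schema change as an application upgrade
ALTER PLUGGABLE DATABASE APPLICATION template_app BEGIN UPGRADE '1.0' TO '1.1';
ALTER TABLE template_owner.some_table ADD (new_column VARCHAR2(30));
ALTER PLUGGABLE DATABASE APPLICATION template_app END UPGRADE;

-- Later, in each application PDB, whenever you choose:
ALTER PLUGGABLE DATABASE APPLICATION template_app SYNC;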
But I want that every time I change the structure of the template schema, like adding a new column, the column is created in all the schemas related to the template schema.
Is this possible with Oracle?
No, it is not. You would need to separately create the column in the table owned by each individual user (a.k.a. schema).
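If the clone schemas follow a naming convention, you can at least script that per-schema DDL; a hedged sketch (the LIKE pattern, table, and column are hypothetical):
BEGIN
  -- Loop over every schema cloned from the template (hypothetical naming convention)
  FOR s IN (SELECT username FROM all_users WHERE username LIKE 'TEMPLATE_COPY_%')
  LOOP
    EXECUTE IMMEDIATE
      'ALTER TABLE "' || s.username || '".MY_TABLE ADD (NEW_COLUMN VARCHAR2(30))';
  END LOOP;
END;
/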
As Justin Cave suggested, your problem is practically screaming for Oracle Partitioning.
If you do not have Partitioning licensed, there is still the old (but free!) approach of making a partitioned view.
In this approach, you would keep your historical tables unchanged (i.e., don't go back and add new columns to them). Instead, you would create a partitioned view that includes each historical table (concatenated together via UNION ALL). The view definition can provide values for newer columns that did not exist in the original table that year.
A partitioned view also has the benefits of
Making it easy to report across multiple years
"Partition pruning" -- skipping tables that are not of interest in a given query
Here is a walk through of the approach:
Create tables for 2019 and 2020 data
CREATE TABLE matt_data_2019
( id NUMBER NOT NULL,
creation_date DATE NOT NULL,
data_column1 NUMBER,
data_column2 VARCHAR2(200),
CONSTRAINT matt_data_2019 PRIMARY KEY ( id ),
CONSTRAINT matt_data_2019_c1 CHECK ( creation_date BETWEEN to_date('01-JAN-2019','DD-MON-YYYY') AND to_date('01-JAN-2020','DD-MON-YYYY') - interval '1' second )
);
CREATE TABLE matt_data_2020
( id NUMBER NOT NULL,
creation_date DATE NOT NULL,
data_column1 NUMBER,
data_column2 VARCHAR2(200),
data_column3 DATE, -- This is new for 2020
CONSTRAINT matt_data_2020 PRIMARY KEY ( id ),
CONSTRAINT matt_data_2020_c1 CHECK ( creation_date BETWEEN to_date('01-JAN-2020','DD-MON-YYYY') AND to_date('01-JAN-2021','DD-MON-YYYY') - interval '1' second )
);
Notice there is a new column for 2020 that does not exist in 2019.
Put some test data in to ensure accurate test results...
INSERT INTO matt_data_2019 ( id, creation_date, data_column1, data_column2 )
SELECT rownum id,
to_date('01-JAN-2019','DD-MON-YYYY') + (dbms_random.value(0, 365*24*60*60-1) / (365*24*60*60)), -- Some random date in 2019
dbms_random.value(0,1000),
lpad('2019',200,'X')
FROM dual
CONNECT BY rownum <= 100000;
INSERT INTO matt_data_2020 ( id, creation_date, data_column1, data_column2, data_column3 )
SELECT rownum id,
to_date('01-JAN-2020','DD-MON-YYYY') + (dbms_random.value(0, 365*24*60*60-1) / (365*24*60*60)), -- Some random date in 2020
dbms_random.value(0,1000),
lpad('2020',200,'X'),
to_date('01-JAN-2021','DD-MON-YYYY') + (dbms_random.value(0, 365*24*60*60-1) / (365*24*60*60)) -- Some random date in 2021
FROM dual
CONNECT BY rownum <= 100000;
Gather statistics on both tables for accurate test results ...
EXEC DBMS_STATS.GATHER_TABLE_STATS(user,'MATT_DATA_2019');
EXEC DBMS_STATS.GATHER_TABLE_STATS(user,'MATT_DATA_2020');
Create a view that includes all the tables.
You would need to modify this view every time a new table was created.
CREATE OR REPLACE VIEW matt_data_v AS
SELECT 2019 source_year,
id,
creation_date,
data_column1,
data_column2,
NULL data_column3 -- data_column3 did not exist in 2019
FROM matt_data_2019
UNION ALL
SELECT 2020 source_year,
id,
creation_date,
data_column1,
data_column2,
data_column3 -- data_column3 is new in 2020
FROM matt_data_2020;
Check how Oracle will process a query specifying a single year
EXPLAIN PLAN SET STATEMENT_ID = 'MM' FOR SELECT * FROM MATT_DATA_V WHERE SOURCE_YEAR = 2020;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE','MM'));
Plan hash value: 393585474
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 110K| 15M| 620 (2)| 00:00:01 |
| 1 | VIEW | MATT_DATA_V | 110K| 15M| 620 (2)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
|* 3 | FILTER | | | | | |
| 4 | TABLE ACCESS FULL| MATT_DATA_2019 | 71238 | 9530K| 596 (2)| 00:00:01 |
| 5 | TABLE ACCESS FULL | MATT_DATA_2020 | 110K| 15M| 620 (2)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(NULL IS NOT NULL)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Hmmm, it looks like Oracle is still including the 2019 table...
... but it isn't. That NULL IS NOT NULL filter condition will cause Oracle to skip the 2019 table completely.
Prove that Oracle is skipping the 2019 table when we ask for 2020 data ...
alter session set statistics_level = ALL;
SELECT * FROM MATT_DATA_V WHERE SOURCE_YEAR = 2020;
-- Be sure to fetch entire result set (e.g., scroll to the end in SQL*Developer)
SELECT *
FROM TABLE (DBMS_XPLAN.display_cursor (null, null,
'ALLSTATS LAST'));
SQL_ID 1u3nwcnxs20jb, child number 0
-------------------------------------
SELECT * FROM MATT_DATA_V WHERE SOURCE_YEAR = 2020
Plan hash value: 393585474
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 100K|00:00:00.21 | 5417 |
| 1 | VIEW | MATT_DATA_V | 1 | 110K| 100K|00:00:00.21 | 5417 |
| 2 | UNION-ALL | | 1 | | 100K|00:00:00.17 | 5417 |
|* 3 | FILTER | | 1 | | 0 |00:00:00.01 | 0 |
| 4 | TABLE ACCESS FULL| MATT_DATA_2019 | 0 | 71238 | 0 |00:00:00.01 | 0 |
| 5 | TABLE ACCESS FULL | MATT_DATA_2020 | 1 | 110K| 100K|00:00:00.09 | 5417 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(NULL IS NOT NULL)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
The results above show how Oracle skips the 2019 table when we don't ask for it.

How to improve performance of a JOIN of two SCD2 tables in Oracle SQL

I have two tables, both using valid-from/valid-to logic. Table 1 looks like this:
ID | VALID_FROM | VALID_TO
1 | 01.01.2000 | 04.01.2000
1 | 04.01.2000 | 16.01.2000
1 | 16.01.2000 | 17.01.2000
1 | 17.01.2000 | 19.01.2000
2 | 03.02.2001 | 04.03.2001
2 | 04.03.2001 | 14.03.2001
2 | 14.03.2001 | 18.03.2001
while table 2 looks like this:
ID | VAR | VALID_FROM | VALID_TO
1 | 3 | 01.01.2000 | 17.01.2000
1 | 2 | 17.01.2000 | 19.01.2000
2 | 4 | 03.02.2001 | 14.03.2001
Table 1 has 132,195,791 rows and table 2 has 16,964,846.
The valid_from and valid_to dates of any observation in table 1 fall within one of the valid_from/valid_to windows shown in table 2.
I created primary keys for both tables over (ID, VALID_FROM).
I want to do an inner join like:
select t1.*,
t2.var
from t1 t1
inner join t2 t2
on t1.id = t2.id
and t1.valid_from >= t2.valid_from
and t1.valid_to <= t2.valid_to;
This join is really slow: I ran it for half a day without success. What can I do to increase performance in this particular case? Please note that I also want to left join the resulting table in later stages. Any help is highly appreciated.
EDIT
Obviously, the information I gave was less than generally desired here on the platform.
I use Oracle Database 12c Enterprise Edition
The example I gave is illustrative of the bigger problem at hand: I am concerned with joining information from different tables with different valid_from/valid_to dates. For this I first created a grid of the distinct values in the valid_from and valid_to variables of all the relevant tables. This grid is what I refer to here as table 1.
Results from the execution plan (I adjusted the column and table names to match the terminology used in my illustrative example):
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 465M| 23G| | 435K (3)| 00:00:18 |
|* 1 | HASH JOIN | | 465M| 23G| 695M| 435K (3)| 00:00:18 |
| 2 | TABLE ACCESS FULL| TABLE2 | 16M| 501M| | 22961 (2)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TABLE1 | 132M| 3025M| | 145K (2)| 00:00:06 |
--------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$58A6D7F6
2 - SEL$58A6D7F6 / T2#SEL$1
3 - SEL$58A6D7F6 / T1#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="T2"."ID")
filter("T1"."VALID_TO"<="T2"."VALID_TO" AND
"T1"."VALID_FROM">="T2"."VALID_FROM")
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) "T2"."ID"[VARCHAR2,20],
"T1"."ID"[VARCHAR2,20], "T1"."VALID_TO"[DATE,7],
"T2"."VAR"[VARCHAR2,20], "T2"."VALID_FROM"[DATE,7],
"T2"."VALID_TO"[DATE,7], "T1"."ID"[VARCHAR2,20],
"T1"."VALID_FROM"[DATE,7], "T1"."VALID_TO"[DATE,7], "T1"."VALID_FROM"[DATE,7]
2 - "T2"."ID"[VARCHAR2,20],
"T2"."VAR"[VARCHAR2,20], "T2"."VALID_FROM"[DATE,7],
"T2"."VALID_TO"[DATE,7]
3 - "T1"."ID"[VARCHAR2,20], "T1"."VALID_FROM"[DATE,7],
"T1"."VALID_TO"[DATE,7]
Note
-----
- this is an adaptive plan
A good practice is to ask first: what is the query expected to return?
Based on your WHERE predicate, it seems you are interested in all versions from table1 that fall entirely within a validity interval of table2. This may be intentional, but more commonly you need all versions that intersect between the two tables.
The second aspect is: do you need to see only the first few rows, or all rows from the join?
If you only want to see a few results, simply add AND t1.ID = nnnn to the WHERE clause to limit the query to some sample ID. If you have proper indexes (and there are not extremely many rows for that ID), you will get the result quickly, as a NESTED LOOPS join will kick in.
To produce the full result, you must consider all rows from both tables. No index will help you select all rows from a table; here a FULL TABLE SCAN is the best option.
To join large row sets, the best approach is a HASH JOIN. NESTED LOOPS (which you are probably using now) is quick for joining a few rows, but hangs on large row sets.
The smaller table (table2) is read into memory (hopefully) as a hash table, and the larger table (table1) is probed against this hash table to perform the join.
This is the execution plan you should look for:
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10T| 399T| | 190M(100)| 02:03:47 |
|* 1 | HASH JOIN | | 10T| 399T| 550M| 190M(100)| 02:03:47 |
| 2 | TABLE ACCESS FULL| SCD2 | 16M| 355M| | 39 (93)| 00:00:01 |
| 3 | TABLE ACCESS FULL| SCD1 | 132M| 2395M| | 211 (99)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="T2"."ID")
filter("T1"."VALID_FROM">="T2"."VALID_FROM" AND
"T1"."VALID_TO"<="T2"."VALID_TO")
Provided you are on an enterprise database, this should take you from days down to hours. Additionally, you can deploy the parallel option to get a further speed-up.
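For example, a hedged sketch (the hash-join hint and the parallel degree of 8 are illustrative; verify the plan you actually get):
SELECT /*+ USE_HASH(t1 t2) PARALLEL(8) */
       t1.*, t2.var
FROM t1
INNER JOIN t2
   ON t1.id = t2.id
  AND t1.valid_from >= t2.valid_from
  AND t1.valid_to <= t2.valid_to;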
Good luck!

Want to process 5000 records at a time, but the SELECT query is taking a long time in an Oracle database

Each time I want to process 5000 records, like below:
The first time I want to process rows 1 to 5000.
The second time I want to process rows 5001 to 10000.
The third time I want to process rows 10001 to 15000, and so on.
I don't want to use a procedure or PL/SQL; I will change the rnum values in my code to fetch each batch of 5000 records.
The given query takes 3 minutes to fetch the records from the 3 joined tables. How can I reduce the time it takes to fetch the records?
select * from (
SELECT to_number(AA.MARK_ID) as MARK_ID, AA.SUPP_ID as supplier_id, CC.supp_nm as SUPPLIER_NAME, CC.supp_typ as supplier_type,
CC.supp_lock_typ as supplier_lock_type, ROW_NUMBER() OVER (ORDER BY AA.MARK_ID) as rnum
from TABLE_A AA, TABLE_B BB, TABLE_C CC
WHERE
AA.MARK_ID=BB.MARK_ID AND
AA.SUPP_ID=CC.location_id AND
AA.char_id='160' AND
BB.VALUE_KEY=AA.VALUE_KEY AND
BB.VALUE_KEY=CC.VALUE_KEY
AND AA.VPR_ID IS NOT NULL)
where rnum >=10001 and rnum<=15000;
I have tried the scenarios below, but with no luck:
I tried the /*+ USE_NL(AA BB) */ hint.
I used EXISTS in the WHERE conditions, but it takes the same 3 minutes to fetch the records.
Below are the table details.
select count(*) from TABLE_B;
-----------------
2275
select count(*) from TABLE_A;
-----------------
2405276
select count(*) from TABLE_C;
-----------------
1269767
The total record count of my inner query is:
SELECT count(*)
from TABLE_A AA, TABLE_B BB, TABLE_C CC
WHERE
AA.MARK_ID=BB.MARK_ID AND
AA.SUPP_ID=CC.location_id AND
AA.char_id='160' AND
BB.VALUE_KEY=AA.VALUE_KEY AND
BB.VALUE_KEY=CC.VALUE_KEY
AND AA.VPR_ID IS NOT NULL;
-----------------
2027055
All the columns used in the WHERE conditions are indexed properly.
The explain plan for the given query is:
Plan hash value: 3726328503
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2082K| 182M| | 85175 (1)| 00:17:03 |
|* 1 | VIEW | | 2082K| 182M| | 85175 (1)| 00:17:03 |
|* 2 | WINDOW SORT PUSHED RANK | | 2082K| 166M| 200M| 85175 (1)| 00:17:03 |
|* 3 | HASH JOIN | | 2082K| 166M| | 44550 (1)| 00:08:55 |
| 4 | TABLE ACCESS FULL | TABLE_C | 1640 | 49200 | | 22 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 2082K| 107M| 27M| 44516 (1)| 00:08:55 |
|* 6 | VIEW | index$_join$_005 | 1274K| 13M| | 9790 (1)| 00:01:58 |
|* 7 | HASH JOIN | | | | | | |
| 8 | INLIST ITERATOR | | | | | | |
|* 9 | INDEX RANGE SCAN | TABLE_B_IN2 | 1274K| 13M| | 2371 (2)| 00:00:29 |
| 10 | INDEX FAST FULL SCAN| TABLE_B_IU1 | 1274K| 13M| | 4801 (1)| 00:00:58 |
|* 11 | TABLE ACCESS FULL | TABLE_A | 2356K| 96M| | 27174 (1)| 00:05:27 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RNUM">=10001 AND "RNUM"<=15000)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "A"."MARK_ID")<=15000)
3 - access("A"."SUPP_ID"="C"."LOC_ID" AND "A"."VALUE_KEY"="C"."VALUE_KEY")
5 - access("A"."MARK_ID"="A"."MARK_ID" AND "A"."VALUE_KEY"="A"."VALUE_KEY")
6 - filter("A"."MARK_CHN_IND"='C' OR "A"."MARK_CHN_IND"='D')
7 - access(ROWID=ROWID)
9 - access("A"."MARK_CHN_IND"='C' OR "A"."MARK_CHN_IND"='D')
11 - filter("A"."CHNL_ID"=160 AND "A"."VPR_ID" IS NOT NULL)
Could anyone please help me tune this query? I have been trying for the last 2 days.
Each query will take a long time because each query has to join and then sort all rows. The ROW_NUMBER analytic function can only return a result once the whole set has been read. This is highly inefficient. If the data set is large, you want to sort and hash-join only once.
You should fetch the whole set once, using batches of 5k rows. Alternatively, if you want to keep your existing code logic, you could store the result in a temporary table, for instance:
CREATE TABLE TMP AS <your above query>;
CREATE INDEX TMP_RNUM_IX ON TMP (rnum);
Then replace the query in your code with:
SELECT * FROM TMP WHERE rnum BETWEEN :x AND :y
Obviously, if your temp table is reused periodically, just create it once and delete the rows when done (or use a true temporary table).
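A minimal sketch of the temporary-table variant (the table name and column types are assumptions):
CREATE GLOBAL TEMPORARY TABLE tmp_results (
  mark_id            NUMBER,
  supplier_id        NUMBER,
  supplier_name      VARCHAR2(100),
  supplier_type      VARCHAR2(30),
  supplier_lock_type VARCHAR2(30),
  rnum               NUMBER
) ON COMMIT PRESERVE ROWS;

CREATE INDEX tmp_results_ix ON tmp_results (rnum);

-- Each run: populate tmp_results once with the ranked query, read batches with
-- WHERE rnum BETWEEN :x AND :y, then DELETE the rows (or end the session).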
How many unique MARK_ID values have you got in TABLE_A? I think you may get better performance if you limit the fetched ranges of records by MARK_ID instead of by the artificial row number, because the latter is obviously not sargable. Granted, you may not get exactly 5000 rows in each range, but I have a feeling that is not as important as the query performance. A sketch of this keyset approach follows.
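A hedged sketch, assuming MARK_ID is numeric and you are on Oracle 12c or later (for FETCH FIRST); :last_mark_id is the last key of the previous batch, 0 on the first run:
SELECT to_number(AA.MARK_ID) AS mark_id,
       AA.SUPP_ID            AS supplier_id,
       CC.supp_nm            AS supplier_name,
       CC.supp_typ           AS supplier_type,
       CC.supp_lock_typ      AS supplier_lock_type
FROM   TABLE_A AA
JOIN   TABLE_B BB ON  BB.MARK_ID   = AA.MARK_ID
                  AND BB.VALUE_KEY = AA.VALUE_KEY
JOIN   TABLE_C CC ON  CC.location_id = AA.SUPP_ID
                  AND CC.VALUE_KEY   = BB.VALUE_KEY
WHERE  AA.char_id = '160'
AND    AA.VPR_ID IS NOT NULL
AND    to_number(AA.MARK_ID) > :last_mark_id  -- start after the previous batch
ORDER  BY to_number(AA.MARK_ID)
FETCH FIRST 5000 ROWS ONLY;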
Firstly, giving obfuscated table names makes it nearly impossible to deduce anything about the data distributions and relationships between tables, so potential answerers are crippled from the start.
However, if every row in table_a matches one row in the other tables, then you can avoid some of the 200MB of temporary disk space usage that is probably crippling performance, by pushing the ranking down into an inline view or common table expression.
Monitor V$SQL_WORKAREA to check the exact amount of space being used for the window function, and if it is still excessive, consider modifying the memory management to increase the available sort area size (a sample query is sketched after the rewrite below).
Something like:
with cte_table_a as (
SELECT
to_number(MARK_ID) as MARK_ID,
SUPP_ID as supplier_id,
ROW_NUMBER() OVER (ORDER BY MARK_ID) as rnum
from
TABLE_A
where
char_id='160' and
VPR_ID IS NOT NULL)
select ...
from
cte_table_a aa,
TABLE_B BB,
TABLE_C CC
WHERE
aa.rnum >= 10001 and
aa.rnum <= 15000 and
AA.MARK_ID = BB.MARK_ID AND
AA.SUPP_ID = CC.location_id AND
BB.VALUE_KEY = AA.VALUE_KEY AND
BB.VALUE_KEY = CC.VALUE_KEY
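The V$SQL_WORKAREA check mentioned above might look like this hedged sketch (bind :sql_id to your statement's SQL_ID from V$SQL):
SELECT operation_type,
       policy,
       estimated_optimal_size,
       last_memory_used,
       last_tempseg_size
FROM   v$sql_workarea
WHERE  sql_id = :sql_id;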

Why won't Oracle use my index unless I tell it to?

I have an index:
CREATE INDEX BLAH ON EMPLOYEE(SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4));
and an SQL STATEMENT:
SELECT COUNT(*)
FROM (SELECT COUNT(*)
FROM EMPLOYEE
GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)
HAVING COUNT(*) > 100);
but it keeps doing a full table scan instead of using the index unless I add a hint.
EMPSHIRTNO is not the primary key, EMPNO is (which isn't used here).
Complex query
EXPLAIN PLAN FOR SELECT COUNT(*) FROM (SELECT COUNT(*) FROM EMPLOYEE
GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)
HAVING COUNT(*) > 100);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1712471557
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 24 (9)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | | |
| 2 | VIEW | | 497 | | 24 (9)| 00:00:01 |
|* 3 | FILTER | | | | | |
| 4 | HASH GROUP BY | | 497 | 2485 | 24 (9)| 00:00:01 |
| 5 | TABLE ACCESS FULL| EMPLOYEE | 9998 | 49990 | 22 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(COUNT(*)>100)
17 rows selected.
ANALYZE INDEX BLAH VALIDATE STRUCTURE;
SELECT BTREE_SPACE, USED_SPACE FROM INDEX_STATS;
BTREE_SPACE USED_SPACE
----------- ----------
176032 150274
Simple query:
EXPLAIN PLAN FOR SELECT * FROM EMPLOYEE;
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2913724801
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9998 | 439K| 23 (5)| 00:00:01 |
| 1 | TABLE ACCESS FULL| EMPLOYEE | 9998 | 439K| 23 (5)| 00:00:01 |
------------------------------------------------------------------------------
8 rows selected.
Maybe it is because the NOT NULL constraint is enforced via a CHECK constraint rather than being defined originally in the table creation statement? It will use the index when I do:
SELECT * FROM EMPLOYEE WHERE SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) = '1234';
For those suggesting that it needs to read all of the rows anyway (which I don't think it does, as it is counting), the index is not used for this either:
SELECT SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) FROM EMPLOYEE;
In fact, putting an index on EMPSHIRTNO and performing SELECT EMPSHIRTNO FROM EMPLOYEE; does not use the index either. I should point out that EMPSHIRTNO is not unique, there are duplicates in the table.
Because of the nature of your query, it needs to scan every row of the table anyway, so Oracle is probably deciding that a full table scan is the most efficient way to do this. Because it is using a HASH GROUP BY, there is no nasty sort at the end like in the Oracle 7 days.
First it gets the count per SUBSTR(...) of shirt number; it is this part of the query that has to scan the entire table:
SELECT COUNT(*)
FROM EMPLOYEE
GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)
Next, you want to discard the SUBSTR(...) groups where the count is <= 100. Oracle needs to scan all rows to verify this. Technically you could argue that once a group has reached 101 it doesn't need any more, but I don't think Oracle can work this out, especially as you are asking it for the total number in the SELECT COUNT(*) of the subquery.
HAVING COUNT(*) > 100);
So basically, to give you the answer you want, Oracle needs to scan every row in the table, so an index is no help with the filtering. Because it is using a hash GROUP BY, the index is no help with the grouping either. Using the index would therefore just slow your query down, which is why Oracle is not using it.
I think you may need to build a function-based index on SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4); functions in your SQL have a tendency to prevent the use of regular indexes on a column.
I believe @Codo is correct: Oracle cannot determine that the expression will always be non-null, and so must assume that some results may be NULL and therefore not be stored in the index.
(It seems like Oracle should be able to figure out that the expression is not nullable. In general, the chance of any random SUBSTR expression always being not null is probably very low; maybe Oracle just lumps all SUBSTR expressions together?)
You can make the index usable for your query with one of these work-arounds:
--bitmap index:
create bitmap index blah on employee(substr(to_char(empshirtno), 1, 4));
--multi-column index:
alter table employee add constraint blah primary key (id, empshirtno);
--indexed virtual column:
create table employee(empshirtno varchar2(10) not null
,empshirtno_for_index as (substr(empshirtno,1,4)) not null );
create index blah on employee(empshirtno_for_index);
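Whichever workaround you pick, re-check the plan afterwards; a minimal sketch using the virtual-column variant:
EXPLAIN PLAN FOR
SELECT COUNT(*)
FROM (SELECT COUNT(*)
      FROM employee
      GROUP BY empshirtno_for_index
      HAVING COUNT(*) > 100);

SELECT * FROM TABLE(dbms_xplan.display);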