I'm facing an issue while merging some data.
I have two tables:
CREATE TABLE tmp_table
(
TROWID ROWID NOT NULL
, NEW_FK1 NUMBER(10)
, NEW_FK2 NUMBER(10)
, CONSTRAINT TMP_TABLE_PK_1 PRIMARY KEY
(
TROWID
)
ENABLE
)
CREATE UNIQUE INDEX TMP_TABLE_PK_1 ON tmp_table (TROWID ASC)
CREATE TABLE my_table
(
M_ID NUMBER(10) NOT NULL
, M_FK1 NUMBER(10)
, M_FK2 NUMBER(10)
, M_START_DATE DATE NOT NULL
, M_END_DATE DATE
, M_DELETED NUMBER(1) NOT NULL
, M_CHECK1 NUMBER(1) NOT NULL
, M_CHECK2 NUMBER(1) NOT NULL
, M_CHECK3 NUMBER(1)
, M_CREATION_DATE DATE
, M_CREATION_USER NUMBER(10)
, M_UPDATE_DATE DATE
, M_UPDATE_USER NUMBER(10)
, CONSTRAINT MY_TABLE_PK_1 PRIMARY KEY
(
M_ID
)
ENABLE
)
CREATE UNIQUE INDEX MY_TABLE_PK_1 ON my_table (M_ID ASC)
CREATE INDEX MY_TABLE_IX_1 ON my_table (M_UPDATE_DATE ASC, M_FK2 ASC)
CREATE INDEX MY_TABLE_IX_2 ON my_table (M_FK1 ASC, M_FK2 ASC)
The tmp_table is a temporary table where I stored only the records and information that will be updated in my_table. That means tmp_table.TROWID is the rowid of the my_table row that should be merged.
About 94M records should be merged, out of a total of 540M rows in my_table.
The query:
MERGE /*+parallel*/ INTO my_table m
USING (SELECT /*+parallel*/ * FROM tmp_table) t
ON (m.rowid = t.TROWID)
WHEN MATCHED THEN
UPDATE SET m.M_FK1 = t.NEW_FK1 , m.M_FK2 = t.NEW_FK2 , m.M_UPDATE_DATE = trunc(sysdate)
, m.M_UPDATE_USER = 0 , m.M_CREATION_USER = 0
The execution plan is:
Operation | Table | Estimated Rows |
MERGE STATEMENT | | |
- MERGE | my_table | |
-- PX COORDINATOR | | |
--- PX SENDER | | |
---- PX SEND QC (RANDOM) | | 95M |
----- VIEW | | |
------ HASH JOIN BUFFERED | | 95M |
------- PX RECEIVE | | 95M |
-------- PX SEND HASH | | 95M |
--------- PX BLOCK ITERATOR | | 95M |
---------- TABLE ACCESS FULL | tmp_table | 95M |
------- PX RECEIVE | | 540M |
-------- PX SEND HASH | | 540M |
--------- PX BLOCK ITERATOR | | 540M |
---------- TABLE ACCESS FULL | my_table | 540M |
In the above plan the most expensive operation is the HASH JOIN BUFFERED.
The two full scans take no more than 5-6 minutes each, but after 2 hours the hash join had reached only 1% completion.
I have no idea why it takes that long; any suggestions?
EDIT
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 94M| 9719M| | 3027K (2)| 10:05:29 |
| 1 | MERGE | my_table | | | | | |
| 2 | VIEW | | | | | | |
|* 3 | HASH JOIN | | 94M| 7109M| 3059M| 3027K (2)| 10:05:29 |
| 4 | TABLE ACCESS FULL| tmp_table | 94M| 1979M| | 100K (2)| 00:20:08 |
| 5 | TABLE ACCESS FULL| my_table | 630M| 33G| | 708K (3)| 02:21:48 |
-----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("tmp_table"."TROWID"="m".ROWID)
You could do a number of things. Please check whether they are beneficial for your situation, as mileage will vary.
1) Use only the columns of the target table you touch (by select or update):
MERGE
INTO (SELECT m_fk1, m_fk2, m_update_date, m_update_user, m_creation_user
FROM my_table) m
2) Use only the columns of the source table you need. In your case that's all columns, so there won't be any benefit:
MERGE
INTO (...) m
USING (SELECT trowid, new_fk1, new_fk2 FROM tmp_table) t
Both 1) and 2) will reduce the size of the storage needed for a hash join and will enable the optimizer to use an index over all the columns if available.
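Putting 1) and 2) together, the full statement might look like this (a minimal sketch using the column names from the question):
MERGE /*+ parallel */
 INTO (SELECT m_fk1, m_fk2, m_update_date, m_update_user, m_creation_user
         FROM my_table) m
USING (SELECT trowid, new_fk1, new_fk2 FROM tmp_table) t
   ON (m.rowid = t.trowid)
 WHEN MATCHED THEN
UPDATE SET m.m_fk1 = t.new_fk1
         , m.m_fk2 = t.new_fk2
         , m.m_update_date = trunc(sysdate)
         , m.m_update_user = 0
         , m.m_creation_user = 0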
3) In your special case with ROWIDs, it seems to be very beneficial (at least in my tests) to sort the source table. If you sort the rowids, you will likely update rows in the same physical block together, which may be more performant:
MERGE
INTO (...) m
USING (SELECT ... FROM tmp_table ORDER BY trowid) t
4) As your target table is quite large, I guess its tablespace is distributed over a couple of datafiles. You can check this with the query
SELECT f, count(*) FROM (
SELECT dbms_rowid.rowid_relative_fno(trowid) as f from tmp_table
) GROUP BY f ORDER BY f;
If your target table uses more than a handful of datafiles, you could try to partition your temporary table by datafile:
CREATE TABLE tmp_table (
TROWID ROWID NOT NULL
, NEW_FK1 NUMBER(10)
, NEW_FK2 NUMBER(10)
, FNO NUMBER
) PARTITION BY RANGE(FNO) INTERVAL (1) (
PARTITION p0 VALUES LESS THAN (0)
);
You can fill the column FNO with the following expression:
dbms_rowid.rowid_relative_fno(rowid)
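For example, filling tmp_table might look like this (a sketch; the NEW_FK values shown are placeholders for whatever logic actually produces them):
INSERT /*+ append */ INTO tmp_table (trowid, new_fk1, new_fk2, fno)
SELECT m.rowid
     , m.m_fk1                                 -- placeholder: your new FK1 value
     , m.m_fk2                                 -- placeholder: your new FK2 value
     , dbms_rowid.rowid_relative_fno(m.rowid)  -- datafile number of the target row
  FROM my_table m;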
Now you can update datafile by datafile, reducing the required memory for the hash join. Get the list of file numbers with
SELECT DISTINCT fno FROM tmp_table;
14
15
16
17
and run the updates file by file:
MERGE
INTO (SELECT ... FROM my_table) m
USING (SELECT ... FROM tmp_table PARTITION FOR (14) ORDER BY trowid) t
and next PARTITION FOR (15) etc. The file numbers will obviously be different on your system.
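If there are many files, the per-file merges can be scripted; here is a hedged sketch using dynamic SQL, since PARTITION FOR () needs a literal value:
BEGIN
  FOR r IN (SELECT DISTINCT fno FROM tmp_table ORDER BY fno) LOOP
    EXECUTE IMMEDIATE
      'MERGE INTO (SELECT m_fk1, m_fk2, m_update_date,
                          m_update_user, m_creation_user
                     FROM my_table) m
        USING (SELECT trowid, new_fk1, new_fk2
                 FROM tmp_table PARTITION FOR (' || r.fno || ')
                ORDER BY trowid) t
           ON (m.rowid = t.trowid)
         WHEN MATCHED THEN UPDATE
          SET m.m_fk1 = t.new_fk1, m.m_fk2 = t.new_fk2,
              m.m_update_date = trunc(sysdate),
              m.m_update_user = 0, m.m_creation_user = 0';
    COMMIT;  -- commit per datafile to keep undo manageable
  END LOOP;
END;
/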
5) Finally, try to use nested loops instead of a hash join. Usually the optimizer picks the better join plan, but I cannot resist trying it out:
MERGE /*+ USE_NL (m t) */
INTO (SELECT ... FROM my_table) m
USING (SELECT ... FROM tmp_table partition for (14) ORDER BY trowid) t
ON (m.rowid = t.TROWID)
Related
I am trying to better understand how Oracle SQL determines distinct rows when executing code. I would expect that it works on one column at a time: any duplicates are set aside, and the next column is compared only on those rows. This would continue, with each step verifying whether the next column has duplicates and passing on only the rows that are still duplicated. Is this accurate? Additionally, is there a method it uses to determine which column to start on? Some columns are very likely to have duplicates while others might be keys that should not contain any (assuming duplication didn't occur on a join or union).
There are two basic methods:
Building a hash table of the unique values
Sorting the data & de-duplicating in the process
Hashing tends to be more efficient.
Sorting is preferred in situations including:
The query has an order by - the data has to be sorted anyway, so there's no point hashing and sorting
There's an index on the distinct expression - the database can walk through the index to find the unique values. If it's a unique index the de-duplication step can be skipped too as the values are guaranteed to be unique.
The optimizer may consider other factors too (such as the expected number of distinct values, number of nulls, total data set size, etc.). The exact details of when the optimizer chooses a sort over hashing can change from version to version.
Here are some examples:
-- distinct a non-indexed column => hash
select distinct salary from hr.employees;
select *
from table ( dbms_xplan.display_cursor ( format => 'BASIC' ) );
/*
----------------------------------------
| Id | Operation | Name |
----------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH UNIQUE | |
| 2 | TABLE ACCESS FULL| EMPLOYEES |
----------------------------------------
*/
-- order by & distinct => sort
select distinct salary from hr.employees
order by salary;
select *
from table ( dbms_xplan.display_cursor ( format => 'BASIC' ) );
/*
----------------------------------------
| Id | Operation | Name |
----------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT UNIQUE | |
| 2 | TABLE ACCESS FULL| EMPLOYEES |
----------------------------------------
*/
-- index on distinct column => sort
select distinct department_id from hr.employees;
select *
from table ( dbms_xplan.display_cursor ( format => 'BASIC' ) );
/*
------------------------------------------------
| Id | Operation | Name |
------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT UNIQUE NOSORT| |
| 2 | INDEX FULL SCAN | EMP_DEPARTMENT_IX |
------------------------------------------------
*/
-- distinct of unique/primary index => no sort/hash needed! Just read the index
select distinct employee_id from hr.employees;
select *
from table ( dbms_xplan.display_cursor ( format => 'BASIC' ) );
/*
------------------------------------------
| Id | Operation | Name |
------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | INDEX FULL SCAN | EMP_EMP_ID_PK |
------------------------------------------
*/
I have a table:
CREATE TABLE ard_signals
(id, val, str, date_val, dt);
This table records the values of all IDs' signals. There are around 950 unique IDs. At the moment, the table contains about 50 million rows with different values of these signals. Each ID can have only numeric values, string values, or date values.
I get the last value of each ID where, by condition, dt is less than or equal to an input date:
select id,
       max(val) keep (dense_rank last order by dt) as val,
       max(str) keep (dense_rank last order by dt) as str,
       max(date_val) keep (dense_rank last order by dt) as date_val,
       max(dt)
  from ard_signals
 where dt <= to_date(any_date)
 group by id;
I have an index on ID. At the moment, the query takes about 30 seconds. What optimizations can be made for this query?
EXPLAIN PLAN: with dt index
Example data (there are about 950-1000 rows like this):
| ID  | VAL | STR | DATE_VAL | DT                 |
|-----|-----|-----|----------|--------------------|
| 920 | 0   |     |          | 20.07.2022 9:59:11 |
| 490 |     | yes |          | 20.07.2022 9:40:01 |
| 565 | 233 |     |          | 20.07.2022 9:32:03 |
| 32  | 1   |     |          | 20.07.2022 9:50:01 |
TL;DR You need your application to maintain a table of distinct id values.
So, you want the last record for each group (distinct id value) in your table, without doing a full table scan. It seems like it should be easy to tell Oracle: iterate through the distinct values for id and then do an index lookup to get the last dt value for each id and then give me that row. But looks are deceiving here -- it's not easy at all, if it is even possible.
Think about what an index on (id) (or (id, dt)) has. It has a bunch of leaf blocks and a structure to find the highest value of id in each block. But Oracle can only use the index to get all the distinct id values by reading every leaf block in the index. So, we might find a way to trade our TABLE FULL SCAN for an INDEX FAST FULL SCAN for a marginal benefit, but it's not really what we're hoping for.
What about partitioning? Can't we create ARD_SIGNALS with PARTITION BY LIST (id) AUTOMATIC and use that? Now the data dictionary is guaranteed to have a separate partition for each distinct id value.
But again - think about what Oracle knows (from DBA_TAB_PARTITIONS) -- it knows what the highest partition key value is in each partition. It is true: for a list partitioned table, that highest value is guaranteed to be the only distinct value in the partition. But I think using that guarantee requires special optimizations that Oracle's CBO does not seem to make (yet).
So, unfortunately, you are going to need to modify your application to keep a parent table for ARD_SIGNALS that has a (single) row for each distinct id.
Even then, it's kind of difficult to get what we want. Because, again, we want Oracle to iterate through the distinct id values, then use the index to find the one with the highest dt for that id ... and then stop. So, we're looking for an execution plan that makes use of the INDEX RANGE SCAN (MIN/MAX) operation.
I find that tricky with joins, but not so hard with scalar subqueries. So, assuming we named our parent table ARD_IDS, we can start with this:
SELECT /*+ NO_UNNEST(#ssq) */ i.id,
( SELECT /*+ QB_NAME(ssq) */ max(dt)
FROM ard_signals s
WHERE s.id = i.id
AND s.dt <= to_date(trunc(SYSDATE) + 2+ 10/86400) -- replace with your date variable
) max_dt
FROM ard_ids i;
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 22 |
| 1 | SORT AGGREGATE | | 1000 | 1 | 1000 |00:00:00.02 | 3021 |
| 2 | FIRST ROW | | 1000 | 1 | 1000 |00:00:00.01 | 3021 |
|* 3 | INDEX RANGE SCAN (MIN/MAX)| ARD_SIGNALS_N1 | 1000 | 1 | 1000 |00:00:00.01 | 3021 |
| 4 | TABLE ACCESS FULL | ARD_IDS | 1 | 1000 | 1000 |00:00:00.01 | 22 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("S"."ID"=:B1 AND "S"."DT"<=TO_DATE(TO_CHAR(TRUNC(SYSDATE#!)+2+.000115740740740740740740740740740740740741)))
Note the use of hints to keep Oracle from merging the scalar subquery into the rest of the query and losing our desired access path.
Next, it is a matter of using those (id, max(dt)) combinations to look up the rows from the table to get the other column values. I came up with this; improvements may be possible (especially if (id, dt) is not as selective as I am assuming it is):
with k AS (
select /*+ NO_UNNEST(#ssq) */ i.id, ( SELECT /*+ QB_NAME(ssq) */ max(dt) FROM ard_signals s WHERE s.id = i.id AND s.dt <= to_date(trunc(SYSDATE) + 2+ 10/86400) ) max_dt
from ard_ids i
)
SELECT k.id,
max(val) keep ( dense_rank first order by dt desc, s.rowid ) val,
max(str) keep ( dense_rank first order by dt desc, s.rowid ) str,
max(date_val) keep ( dense_rank first order by dt desc, s.rowid ) date_val,
max(dt) keep ( dense_rank first order by dt desc, s.rowid ) dt
FROM k
INNER JOIN ard_signals s ON s.id = k.id AND s.dt = k.max_dt
GROUP BY k.id;
--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.04 | 7009 | | | |
| 1 | SORT GROUP BY | | 1 | 1000 | 1000 |00:00:00.04 | 7009 | 302K| 302K| 268K (0)|
| 2 | NESTED LOOPS | | 1 | 1005 | 1000 |00:00:00.04 | 7009 | | | |
| 3 | NESTED LOOPS | | 1 | 1005 | 1000 |00:00:00.03 | 6009 | | | |
| 4 | TABLE ACCESS FULL | ARD_IDS | 1 | 1000 | 1000 |00:00:00.01 | 3 | | | |
|* 5 | INDEX RANGE SCAN | ARD_SIGNALS_N1 | 1000 | 1 | 1000 |00:00:00.02 | 6006 | | | |
| 6 | SORT AGGREGATE | | 1000 | 1 | 1000 |00:00:00.02 | 3002 | | | |
| 7 | FIRST ROW | | 1000 | 1 | 1000 |00:00:00.01 | 3002 | | | |
|* 8 | INDEX RANGE SCAN (MIN/MAX) | ARD_SIGNALS_N1 | 1000 | 1 | 1000 |00:00:00.01 | 3002 | | | |
| 9 | TABLE ACCESS BY GLOBAL INDEX ROWID| ARD_SIGNALS | 1000 | 1 | 1000 |00:00:00.01 | 1000 | | | |
--------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("S"."ID"="I"."ID" AND "S"."DT"=)
8 - access("S"."ID"=:B1 AND "S"."DT"<=TO_DATE(TO_CHAR(TRUNC(SYSDATE#!)+2+.000115740740740740740740740740740740740741)))
... 7000 gets and 4/100ths of a second.
You want to see the latest entry per ID at a given time.
If you were usually interested in times at the beginning of the recordings, an index on the time could help limit the rows you have to work with. But I consider this highly unlikely.
It is much more likely that you are interested in the situation at more recent times. This means, for instance, that when looking for the latest entries until today, for one ID the latest entry may be found yesterday, while for another ID the latest entry may be from two years ago.
In my opinion, Oracle already chooses the best approach to deal with this: read the whole table sequentially.
If you have several CPUs at hand, parallel execution might speed up things:
select /*+ parallel(4) */ ...
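For instance, applied to the query from the question (a sketch; sysdate stands in for your input date):
select /*+ parallel(4) */
       id,
       max(val)      keep (dense_rank last order by dt) as val,
       max(str)      keep (dense_rank last order by dt) as str,
       max(date_val) keep (dense_rank last order by dt) as date_val,
       max(dt)       as dt
  from ard_signals
 where dt <= sysdate   -- replace with your input date
 group by id;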
If you really need this to be much faster, then you may want to consider an additional table with one row per ID, which gets a copy of the latest date and value via a trigger. That is, you'd introduce redundancy for a gain in speed, as is done in data warehouses.
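A minimal sketch of that approach, assuming a summary table named ARD_LATEST (a made-up name) and inserts as the only write path:
CREATE TABLE ard_latest (
  id       NUMBER PRIMARY KEY,
  val      NUMBER,
  str      VARCHAR2(4000),
  date_val DATE,
  dt       DATE
);

CREATE OR REPLACE TRIGGER trg_ard_latest
AFTER INSERT ON ard_signals
FOR EACH ROW
BEGIN
  -- Upsert, keeping only the row with the newest dt per id.
  MERGE INTO ard_latest l
  USING (SELECT :new.id AS id FROM dual) s
     ON (l.id = s.id)
   WHEN MATCHED THEN UPDATE
    SET l.val = :new.val, l.str = :new.str,
        l.date_val = :new.date_val, l.dt = :new.dt
    WHERE :new.dt >= l.dt
   WHEN NOT MATCHED THEN INSERT (id, val, str, date_val, dt)
    VALUES (:new.id, :new.val, :new.str, :new.date_val, :new.dt);
END;
/
Note that this only answers "latest as of now"; for an arbitrary input date you would still have to query the base table.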
I've got a question and I can't find the answer; can someone help me? Here is the situation:
I have a schema that is a template.
And I want to have 10 schemas based on this template.
But I want that every time I change the structure of the template schema, like adding a new column, the column is created in all the schemas related to the template schema.
Is this possible with Oracle?
As the others said, it is not possible in Oracle to do this by default. BUT, if you're on a recent version (12.2 and higher) and don't mind paying for the multitenant option, you can look into something called application containers. This trades your schemas in a single DB for the same schema in different PDBs. Application containers allow you to define the schema in a parent PDB (including tables, views, triggers, ...) and have every modification propagated to the PDBs (you sync each PDB when you want).
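A hedged sketch of the workflow (all names are made up; details vary by version):
-- From the CDB root: create the application root that will hold the template.
CREATE PLUGGABLE DATABASE app_root AS APPLICATION CONTAINER
  ADMIN USER app_admin IDENTIFIED BY secret;
ALTER PLUGGABLE DATABASE app_root OPEN;

-- Inside the application root: define (or later upgrade) the shared schema.
ALTER SESSION SET CONTAINER = app_root;
ALTER PLUGGABLE DATABASE APPLICATION my_app BEGIN INSTALL '1.0';
CREATE TABLE template_owner.some_table (id NUMBER)
  SHARING = METADATA;   -- shared definition; each application PDB keeps its own rows
ALTER PLUGGABLE DATABASE APPLICATION my_app END INSTALL '1.0';

-- In each application PDB: pull in the changes whenever you choose.
ALTER PLUGGABLE DATABASE APPLICATION my_app SYNC;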
But I want that every time I change the structure of the template schema, like adding a new column, the column is created in all the schemas related to the template schema.
Is this possible with Oracle?
No, it is not. You would need to separately create the column in the table owned by each individual user (a.k.a. schema).
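The change can at least be scripted; here is a hedged sketch (the table name SOME_TABLE and the template owner TEMPLATE are made up):
BEGIN
  FOR s IN (SELECT owner
              FROM all_tables
             WHERE table_name = 'SOME_TABLE'   -- hypothetical table name
               AND owner <> 'TEMPLATE') LOOP   -- hypothetical template schema
    EXECUTE IMMEDIATE 'ALTER TABLE "' || s.owner
                      || '".SOME_TABLE ADD (new_col NUMBER(10))';
  END LOOP;
END;
/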
As Justin Cave suggested, your problem is practically screaming for Oracle Partitioning.
If you do not have Partitioning licensed, there is still the old (but free!) approach of making a partitioned view.
In this approach, you would keep your historical tables unchanged (i.e., don't go back and add new columns to them). Instead, you would create a partitioned view that includes each historical table (concatenated together via UNION ALL). The view definition can provide values for newer columns that did not exist in the original table that year.
A partitioned view also has the benefits of
Making it easy to report across multiple years
"Partition pruning" -- skipping tables that are not of interest in a given query
Here is a walk through of the approach:
Create tables for 2019 and 2020 data
CREATE TABLE matt_data_2019
( id NUMBER NOT NULL,
creation_date DATE NOT NULL,
data_column1 NUMBER,
data_column2 VARCHAR2(200),
CONSTRAINT matt_data_2019 PRIMARY KEY ( id ),
CONSTRAINT matt_data_2019_c1 CHECK ( creation_date BETWEEN to_date('01-JAN-2019','DD-MON-YYYY') AND to_date('01-JAN-2020','DD-MON-YYYY') - interval '1' second )
);
CREATE TABLE matt_data_2020
( id NUMBER NOT NULL,
creation_date DATE NOT NULL,
data_column1 NUMBER,
data_column2 VARCHAR2(200),
data_column3 DATE, -- This is new for 2020
CONSTRAINT matt_data_2020 PRIMARY KEY ( id ),
CONSTRAINT matt_data_2020_c1 CHECK ( creation_date BETWEEN to_date('01-JAN-2020','DD-MON-YYYY') AND to_date('01-JAN-2021','DD-MON-YYYY') - interval '1' second )
);
Notice there is a new column for 2020 that does not exist in 2019.
Put some test data in to ensure accurate test results...
INSERT INTO matt_data_2019 ( id, creation_date, data_column1, data_column2 )
SELECT rownum id,
to_date('01-JAN-2019','DD-MON-YYYY') + (dbms_random.value(0, 365*24*60*60-1) / (365*24*60*60)), -- Some random date in 2019
dbms_random.value(0,1000),
lpad('2019',200,'X')
FROM dual
CONNECT BY rownum <= 100000;
INSERT INTO matt_data_2020 ( id, creation_date, data_column1, data_column2, data_column3 )
SELECT rownum id,
to_date('01-JAN-2020','DD-MON-YYYY') + (dbms_random.value(0, 365*24*60*60-1) / (365*24*60*60)), -- Some random date in 2020
dbms_random.value(0,1000),
lpad('2020',200,'X'),
to_date('01-JAN-2021','DD-MON-YYYY') + (dbms_random.value(0, 365*24*60*60-1) / (365*24*60*60)) -- Some random date in 2021
FROM dual
CONNECT BY rownum <= 100000;
Gather statistics on both tables for accurate test results ...
EXEC DBMS_STATS.GATHER_TABLE_STATS(user,'MATT_DATA_2019');
EXEC DBMS_STATS.GATHER_TABLE_STATS(user,'MATT_DATA_2020');
Create a view that includes all the tables.
You would need to modify this view every time a new table was created.
CREATE OR REPLACE VIEW matt_data_v AS
SELECT 2019 source_year,
id,
creation_date,
data_column1,
data_column2,
NULL data_column3 -- data_column3 did not exist in 2019
FROM matt_data_2019
UNION ALL
SELECT 2020 source_year,
id,
creation_date,
data_column1,
data_column2,
data_column3 -- data_column3 is new for 2020
FROM matt_data_2020;
Check how Oracle will process a query specifying a single year
EXPLAIN PLAN SET STATEMENT_ID = 'MM' FOR SELECT * FROM MATT_DATA_V WHERE SOURCE_YEAR = 2020
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE','MM'));
Plan hash value: 393585474
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 110K| 15M| 620 (2)| 00:00:01 |
| 1 | VIEW | MATT_DATA_V | 110K| 15M| 620 (2)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
|* 3 | FILTER | | | | | |
| 4 | TABLE ACCESS FULL| MATT_DATA_2019 | 71238 | 9530K| 596 (2)| 00:00:01 |
| 5 | TABLE ACCESS FULL | MATT_DATA_2020 | 110K| 15M| 620 (2)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(NULL IS NOT NULL)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Hmmm, it looks like Oracle is still including the 2019 table...
... but it isn't. That NULL IS NOT NULL filter condition will cause Oracle to skip the 2019 table completely.
Prove that Oracle is skipping the 2019 table when we ask for 2020 data ...
alter session set statistics_level = ALL;
SELECT * FROM MATT_DATA_V WHERE SOURCE_YEAR = 2020;
-- Be sure to fetch entire result set (e.g., scroll to the end in SQL*Developer)
SELECT *
FROM TABLE (DBMS_XPLAN.display_cursor (null, null,
'ALLSTATS LAST'));
SQL_ID 1u3nwcnxs20jb, child number 0
-------------------------------------
SELECT * FROM MATT_DATA_V WHERE SOURCE_YEAR = 2020
Plan hash value: 393585474
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 100K|00:00:00.21 | 5417 |
| 1 | VIEW | MATT_DATA_V | 1 | 110K| 100K|00:00:00.21 | 5417 |
| 2 | UNION-ALL | | 1 | | 100K|00:00:00.17 | 5417 |
|* 3 | FILTER | | 1 | | 0 |00:00:00.01 | 0 |
| 4 | TABLE ACCESS FULL| MATT_DATA_2019 | 0 | 71238 | 0 |00:00:00.01 | 0 |
| 5 | TABLE ACCESS FULL | MATT_DATA_2020 | 1 | 110K| 100K|00:00:00.09 | 5417 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(NULL IS NOT NULL)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
The results above show how Oracle skips the 2019 table when we don't ask for it.
I have an Oracle table as below:
CREATE TABLE "TABLE1"
(
"TABLE_ID" VARCHAR2(32 BYTE),
"TABLE_DATE" DATE,
"TABLE_NAME" VARCHAR2(2 BYTE)
)
PARTITION BY RANGE ("TABLE_DATE")
I guess this table has data partitioned by the TABLE_DATE column.
How can I use this partitioning column to fetch data faster from this table in a WHERE clause like ...
SELECT * FROM TABLE1 PARTITION (P1) p
WHERE p.TABLE_DATE > (SYSDATE - 90) ;
You should change your partitioning to match your queries, not change your queries to match your partitioning. In most cases we shouldn't have to specify which partitions to read from. Oracle can automatically determine how to prune the partitions at run time.
For example, with this table:
create table table1
(
table_id varchar2(32 byte),
table_date date,
table_name varchar2(2 byte)
)
partition by range (table_date)
(
partition p1 values less than (date '2019-05-06'),
partition p2 values less than (maxvalue)
);
There's almost never a need to directly reference a partition in the query. It's extra work, and if we list the wrong partition name the query won't work correctly.
We can see partition pruning in action using EXPLAIN PLAN like this:
explain plan for
SELECT * FROM TABLE1 p
WHERE p.TABLE_DATE > (SYSDATE - 90) ;
select *
from table(dbms_xplan.display);
In the results we can see the partitioning in the Pstart and Pstop columns. The KEY means that the partition will be determined at run time. In this case, the start partition is based on the value of SYSDATE.
Plan hash value: 434062308
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 30 | 2 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE ITERATOR| | 1 | 30 | 2 (0)| 00:00:01 | KEY | 2 |
|* 2 | TABLE ACCESS FULL | TABLE1 | 1 | 30 | 2 (0)| 00:00:01 | KEY | 2 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("P"."TABLE_DATE">SYSDATE#!-90)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
I need to optimize a procedure in Oracle SQL, mainly using indexes. This is the statement:
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number) IS
begin
FOR I IN (SELECT * FROM (SELECT * FROM observations ORDER BY DBMS_RANDOM.VALUE) WHERE ROWNUM <= cuantos)
LOOP
DELETE FROM OBSERVATIONS WHERE nplate=i.nplate AND odatetime=i.odatetime;
END LOOP;
end del_obs;
My plan was to create an index related to rownum since that appears to be what is used to do the deletes, but I don't know whether it will be worthwhile. The problem with this procedure is that its randomness causes a lot of consistent gets. Can anyone help me with this?? Thanks :)
Note: I cannot change the code, only make improvements afterwards
Use the ROWID pseudo-column to filter the rows:
CREATE OR REPLACE PROCEDURE DEL_OBS(
cuantos number
)
IS
BEGIN
DELETE FROM OBSERVATIONS
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM observations
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= cuantos
);
END del_obs;
If you have an index on the table then it can use an index fast full scan:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( id ) AS
SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= 50000;
Query 1: No Index:
DELETE FROM table_name
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM table_name
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= 10000
)
Execution Plan:
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
----------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 24 | 123 | 00:00:02 |
| 1 | DELETE | TABLE_NAME | | | | |
| 2 | NESTED LOOPS | | 1 | 24 | 123 | 00:00:02 |
| 3 | VIEW | VW_NSO_1 | 10000 | 120000 | 121 | 00:00:02 |
| 4 | SORT UNIQUE | | 1 | 120000 | | |
| * 5 | COUNT STOPKEY | | | | | |
| 6 | VIEW | | 19974 | 239688 | 121 | 00:00:02 |
| * 7 | SORT ORDER BY STOPKEY | | 19974 | 239688 | 121 | 00:00:02 |
| 8 | TABLE ACCESS FULL | TABLE_NAME | 19974 | 239688 | 25 | 00:00:01 |
| 9 | TABLE ACCESS BY USER ROWID | TABLE_NAME | 1 | 12 | 1 | 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(ROWNUM<=10000)
* 7 - filter(ROWNUM<=10000)
Query 2 Add an index:
ALTER TABLE table_name ADD CONSTRAINT tn__id__pk PRIMARY KEY ( id )
Query 3 With the index:
DELETE FROM table_name
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM table_name
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= 10000
)
Execution Plan:
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 37 | 13 | 00:00:01 |
| 1 | DELETE | TABLE_NAME | | | | |
| 2 | NESTED LOOPS | | 1 | 37 | 13 | 00:00:01 |
| 3 | VIEW | VW_NSO_1 | 9968 | 119616 | 11 | 00:00:01 |
| 4 | SORT UNIQUE | | 1 | 119616 | | |
| * 5 | COUNT STOPKEY | | | | | |
| 6 | VIEW | | 9968 | 119616 | 11 | 00:00:01 |
| * 7 | SORT ORDER BY STOPKEY | | 9968 | 119616 | 11 | 00:00:01 |
| 8 | INDEX FAST FULL SCAN | TN__ID__PK | 9968 | 119616 | 9 | 00:00:01 |
| 9 | TABLE ACCESS BY USER ROWID | TABLE_NAME | 1 | 25 | 1 | 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(ROWNUM<=10000)
* 7 - filter(ROWNUM<=10000)
If you cannot do it in a single SQL statement using ROWID then you can rewrite your existing procedure to use exactly the same queries but with the FORALL statement:
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number)
IS
  TYPE obs_tab_t IS TABLE OF observations%ROWTYPE;
  obs_tab obs_tab_t;
begin
  SELECT *
  BULK COLLECT INTO obs_tab
  FROM (
    SELECT * FROM observations ORDER BY DBMS_RANDOM.VALUE
  )
  WHERE ROWNUM <= cuantos;

  FORALL i IN 1 .. obs_tab.COUNT
    DELETE FROM OBSERVATIONS
    WHERE nplate = obs_tab(i).nplate
    AND odatetime = obs_tab(i).odatetime;
END del_obs;
What you definitely need is an index on OBSERVATIONS to allow the DELETE to use an index access:
CREATE INDEX cuantos ON OBSERVATIONS(nplate, odatetime);
The execution of the procedure will lead to one FULL TABLE SCAN of the OBSERVATIONS table and to one INDEX ACCESS for each deleted record.
For a limited number of deleted records it will behave similarly to the set-based DELETE proposed in the other answer; for a larger number of deleted records the elapsed time will scale linearly with the number of deletes.
For a non-trivial number of deleted records you must assume that the index is not completely in the buffer pool, so lots of disk access will be required. You'll end up with approximately 100 deleted rows per second.
In other words, deleting 100K rows will take about a quarter of an hour; deleting 1M rows will take about 2 3/4 hours.
You see that at this scale the first part of the task - the FULL SCAN of your table - is negligible; it takes only a few minutes. The only way to get an acceptable response time in this case is to switch the logic to a single DELETE statement, as proposed in the other answers.
This behavior is also described by the rule "Row by Row is Slow by Slow" (i.e. processing in a loop works fine, but only with a limited number of records).
You can do this using a single delete statement:
delete from observations o
where (o.nplate, o.odatetime) in (select nplate, odatetime
from (select o2.nplate, o2.odatetime
from observations o2
order by DBMS_RANDOM.VALUE
) o2
where rownum <= v_cuantos
);
This is often faster than executing multiple queries for each row being deleted.
Try this. I tested it on MSSQL and hope it will also work on Oracle. Please report back on the status.
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number) IS
begin
DELETE OBSERVATIONS FROM OBSERVATIONS
join (select * from OBSERVATIONS ORDER BY VALUE ) as i on
nplate=i.nplate AND
odatetime=i.odatetime AND
i.ROWNUM<=cuantos;
End DEL_OBS;
Since you say that nplate and odatetime are the primary key of observations, I am guessing the problem is here:
SELECT * FROM (
SELECT *
FROM observations
ORDER BY DBMS_RANDOM.VALUE)
WHERE ROWNUM<=cuantos;
There is no way to prevent that from performing a full scan of observations, plus a lot of sorting if that's a big table.
You need to change the code that runs. By far, the easiest way to change the code is to change the source code and recompile it.
However, there are ways to change the code that executes without changing the source code. Here are two:
(1) Use fine-grained access control (the DBMS_RLS package) to add a policy that detects whether you are in this procedure and, if so, adds a predicate to the observations table like this:
AND rowid IN
( SELECT obs_sample.rowid
FROM observations sample (0.05) obs_sample)
(2) Use DBMS_ADVANCED_REWRITE to rewrite your query changing:
FROM observations
.. to ..
FROM observations SAMPLE (0.05)
Using the text of your query in the re-write policy should prevent it from affecting other queries against the observations table.
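For what it's worth, a hedged sketch of declaring such an equivalence (the name OBS_SAMPLE_RW is made up; validate => FALSE is needed because the two statements return different rows, and query_rewrite_integrity must be relaxed accordingly):
BEGIN
  SYS.DBMS_ADVANCED_REWRITE.DECLARE_REWRITE_EQUIVALENCE(
    name             => 'OBS_SAMPLE_RW',
    source_stmt      => 'SELECT * FROM observations ORDER BY DBMS_RANDOM.VALUE',
    destination_stmt => 'SELECT * FROM observations SAMPLE (0.05) ORDER BY DBMS_RANDOM.VALUE',
    validate         => FALSE);
END;
/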
Neither of these are easy (at all), but can be worth a try if you are really stuck.