Got a question and I don't find the answer, can someone help me ? here is the situation :
I have a schema, that is a template.
And I want to have 10 schemas of this template.
But I want that everytime I change the structure of the template schema, like making a new column, the column is created in all the schemas related to the template schema.
is this possible with Oracle ?
As the others said, it is not possible in Oracle to do this by default. BUT, if your on the latest versions (12.2 and higher), and don't mind paying for the multitenant option, you can look into something called application containers. This will trade your schemas in a single DB for the same schema but in different PDBs. Application containers allows you to define the schema in a parent PDB (including tables, views, triggers, ....) and have every modification propagated to the PDBs (you sync each PDB when you want).
But I want that everytime I change the structure of the template schema, like making a new column, the column is created in all the schemas related to the template schema.
is this possible with Oracle ?
No, it is not. You would need to separately create the column in the table owned by each individual user (a.k.a. schema).
As Justin Cave suggested, your problem is practically screaming for Oracle Partitioning.
If you do not have Partitioning licensed, there is still the old (but free!) approach of making a partitioned view.
In this approach, you would keep your historical tables unchanged (i.e., don't go back and add new columns to them). Instead, you would create a partitioned view that includes each historical table (concatenated together via UNION ALL). The view definition can provide values for newer columns that did not exist in the original table that year.
A partitioned view also has the benefits of
Making it easy to report across multiple years
"Partition pruning" -- skipping tables that are not of interest in a given query
Here is a walk through of the approach:
Create tables for 2019 and 2020 data
CREATE TABLE matt_data_2019
( id NUMBER NOT NULL,
creation_date DATE NOT NULL,
data_column1 NUMBER,
data_column2 VARCHAR2(200),
CONSTRAINT matt_data_2019 PRIMARY KEY ( id ),
CONSTRAINT matt_data_2019_c1 CHECK ( creation_date BETWEEN to_date('01-JAN-2019','DD-MON-YYYY') AND to_date('01-JAN-2020','DD-MON-YYYY') - interval '1' second )
);
CREATE TABLE matt_data_2020
( id NUMBER NOT NULL,
creation_date DATE NOT NULL,
data_column1 NUMBER,
data_column2 VARCHAR2(200),
data_column3 DATE, -- This is new for 2020
CONSTRAINT matt_data_2020 PRIMARY KEY ( id ),
CONSTRAINT matt_data_2020_c1 CHECK ( creation_date BETWEEN to_date('01-JAN-2020','DD-MON-YYYY') AND to_date('01-JAN-2021','DD-MON-YYYY') - interval '1' second )
);
Notice there is a new column for 2020 that does not exist in 2019.
Put some test data in to ensure accurate test results...
INSERT INTO matt_data_2019 ( id, creation_date, data_column1, data_column2 )
SELECT rownum id,
to_date('01-JAN-2019','DD-MON-YYYY') + (dbms_random.value(0, 365*24*60*60-1) / (365*24*60*60)), -- Some random date in 2019
dbms_random.value(0,1000),
lpad('2019',200,'X')
FROM dual
CONNECT BY rownum <= 100000;
INSERT INTO matt_data_2020 ( id, creation_date, data_column1, data_column2, data_column3 )
SELECT rownum id,
to_date('01-JAN-2020','DD-MON-YYYY') + (dbms_random.value(0, 365*24*60*60-1) / (365*24*60*60)), -- Some random date in 2020
dbms_random.value(0,1000),
lpad('2020',200,'X'),
to_date('01-JAN-2021','DD-MON-YYYY') + (dbms_random.value(0, 365*24*60*60-1) / (365*24*60*60)) -- Some random date in 2021
FROM dual
CONNECT BY rownum <= 100000;
Gather statistics on both tables for accurate test results ...
EXEC DBMS_STATS.GATHER_TABLE_STATS(user,'MATT_DATA_2019');
EXEC DBMS_STATS.GATHER_TABLE_STATS(user,'MATT_DATA_2020');
Create a view that includes all the tables.
You would need to modify this view every time a new table was created.
CREATE OR REPLACE VIEW matt_data_v AS
SELECT 2019 source_year,
id,
creation_date,
data_column1,
data_column2,
NULL data_column3 -- data_column3 did not exist in 2019
FROM matt_data_2019
UNION ALL
SELECT 2020 source_year,
id,
creation_date,
data_column1,
data_column2,
data_column3 -- data_column3 did not exist in 2019
FROM matt_data_2020;
Check how Oracle will process a query specifying a single year
EXPLAIN PLAN SET STATEMENT_ID = 'MM' FOR SELECT * FROM MATT_DATA_V WHERE SOURCE_YEAR = 2020
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE','MM'));
Plan hash value: 393585474
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 110K| 15M| 620 (2)| 00:00:01 |
| 1 | VIEW | MATT_DATA_V | 110K| 15M| 620 (2)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
|* 3 | FILTER | | | | | |
| 4 | TABLE ACCESS FULL| MATT_DATA_2019 | 71238 | 9530K| 596 (2)| 00:00:01 |
| 5 | TABLE ACCESS FULL | MATT_DATA_2020 | 110K| 15M| 620 (2)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(NULL IS NOT NULL)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Hmmm, it looks like Oracle is still including the 2019 table...
... but it isn't. That NULL IS NOT NULL filter condition will cause Oracle to skip the 2019 table completely.
Prove that Oracle is skipping the 2019 table when we ask for 2020 data ...
alter session set statistics_level = ALL;
SELECT * FROM MATT_DATA_V WHERE SOURCE_YEAR = 2020;
-- Be sure to fetch entire result set (e.g., scroll to the end in SQL*Developer)
SELECT *
FROM TABLE (DBMS_XPLAN.display_cursor (null, null,
'ALLSTATS LAST'));
SQL_ID 1u3nwcnxs20jb, child number 0
-------------------------------------
SELECT * FROM MATT_DATA_V WHERE SOURCE_YEAR = 2020
Plan hash value: 393585474
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 100K|00:00:00.21 | 5417 |
| 1 | VIEW | MATT_DATA_V | 1 | 110K| 100K|00:00:00.21 | 5417 |
| 2 | UNION-ALL | | 1 | | 100K|00:00:00.17 | 5417 |
|* 3 | FILTER | | 1 | | 0 |00:00:00.01 | 0 |
| 4 | TABLE ACCESS FULL| MATT_DATA_2019 | 0 | 71238 | 0 |00:00:00.01 | 0 |
| 5 | TABLE ACCESS FULL | MATT_DATA_2020 | 1 | 110K| 100K|00:00:00.09 | 5417 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(NULL IS NOT NULL)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
The results above show how Oracle skips the 2019 table when we don't ask for it.
Related
I have two tables, both using valid to and valid from logic. Table 1 looks like this:
ID | VALID_FROM | VALID_TO
1 | 01.01.2000 | 04.01.2000
1 | 04.01.2000 | 16.01.2000
1 | 16.01.2000 | 17.01.2000
1 | 17.01.2000 | 19.01.2000
2 | 03.02.2001 | 04.04.2001
2 | 04.04.2001 | 14.03.2001
2 | 14.04.2001 | 18.03.2001
while table 2 looks like this:
ID | VAR | VALID_FROM | VALID_TO
1 | 3 | 01.01.2000 | 17.01.2000
1 | 2 | 17.01.2000 | 19.01.2000
2 | 4 | 03.02.2001 | 14.03.2001
Table 1 has 132,195,791 rows and table 2 has 16,964,846.
The valid from and valid to date of any observation in table 1 is within one or more valid from to valid to windows shown in table 2.
I created primary keys for both of them over ID and VALID_FROM
I want to do an inner join like:
select t1.*,
t2.var
from t1 t1
inner join t2 t2
on t1.id = t2.id
and t1.valid_from >= t2.valid_from
and t1.valid_to <= t2.valid_to;
This join is really slow. I ran it half a day without any success. What can I do to increase performance in this particular case? Please note that I also want to left join the resulting table in later stages. Any help is highly appreciated.
EDIT
Obviously, the information I gave was less then generally desired here on the platform.
I use Oracle Database 12c Enterprise Edition
The example I gave was illustrative for the bigger problem at hand. I
am concerned with joining information from different tables with
different valid_from / valid_to dates. For this I created a grid
first with the distinct values in the valid_from and valid_to
variables of all the relevant tables. This grid is what I refer here
to as table 1.
Results from the execution plan (I adjusted the column and table names to meet the terminology used in my illustrative example):
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 465M| 23G| | 435K (3)| 00:00:18 |
|* 1 | HASH JOIN | | 465M| 23G| 695M| 435K (3)| 00:00:18 |
| 2 | TABLE ACCESS FULL| TABLE2 | 16M| 501M| | 22961 (2)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TABLE1 | 132M| 3025M| | 145K (2)| 00:00:06 |
--------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$58A6D7F6
2 - SEL$58A6D7F6 / T2#SEL$1
3 - SEL$58A6D7F6 / T1#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="T2"."ID")
filter("T1"."VALID_TO"<="T2"."VALID_TO" AND
"T1"."VALID_FROM">="T2"."VALID_FROM")
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) "T2"."ID"[VARCHAR2,20],
"T1"."ID"[VARCHAR2,20], "T1"."VALID_TO"[DATE,7],
"T2"."VAR"[VARCHAR2,20], "T2"."VALID_FROM"[DATE,7],
"T2"."VALID_TO"[DATE,7], "T1"."ID"[VARCHAR2,20],
"T1"."VALID_FROM"[DATE,7], "T1"."VALID_TO"[DATE,7], "T1"."VALID_FROM"[DATE,7]
2 - "T2"."ID"[VARCHAR2,20],
"T2"."VAR"[VARCHAR2,20], "T2"."VALID_FROM"[DATE,7],
"T2"."VALID_TO"[DATE,7]
3 - "T1"."ID"[VARCHAR2,20], "T1"."VALID_FROM"[DATE,7],
"T1"."VALID_TO"[DATE,7]
Note
-----
- this is an adaptive plan
A good practice is to ask first: what is expected the query will return?
Base on your WHERE predicate is seems you are interested on all versions from table2 that are included in the validity interval of table1. This may be intention, but more common you need all versions that intersect between the tables.
The second aspect is, do you need to see few first rows or all rows from the join.
If you only want to see few results, simple add AND t1.ID = nnnn to the WHERE clause to limit to some sample ID. If you have proper indexes (and tehre are no expreme lot of rows with this ID), you will get the result quick as NESTED LOOP join will kick in.
To perform the the full result, you must consider all rows from both tables. No index will help you to select all rows from a table - here is the FULL TABLE SCAN the best option.
To join the large row sets the best approach is HASH JOIN. NESTED LOOPS (which you probably use now) are quick to join few rows, but hangs on large row sets.
The smaller table (table2) is red in memory (hopefully) as a hash table. The larger table (table1) is probed against this hash table toperform the join.
This is the execution plan you should look for
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10T| 399T| | 190M(100)| 02:03:47 |
|* 1 | HASH JOIN | | 10T| 399T| 550M| 190M(100)| 02:03:47 |
| 2 | TABLE ACCESS FULL| SCD2 | 16M| 355M| | 39 (93)| 00:00:01 |
| 3 | TABLE ACCESS FULL| SCD1 | 132M| 2395M| | 211 (99)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="T2"."ID")
filter("T1"."VALID_FROM">="T2"."VALID_FROM" AND
"T1"."VALID_TO"<="T2"."VALID_TO")
Provided you are on an enterprise database this should pass you from days to hours. Further you can deploy parallel option to get additional speed up.
Good luck!
I need to optimize a procedure in Oracle SQL, mainly using indexes. This is the statement:
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number) IS
begin
FOR I IN (SELECT * FROM (SELECT * FROM observations ORDER BY DBMS_RANDOM.VALUE)WHERE ROWNUM<=cuantos)
LOOP
DELETE FROM OBSERVATIONS WHERE nplate=i.nplate AND odatetime=i.odatetime;
END LOOP;
end del_obs;
My plan was to create an index related with rownum since it is what appears to be used to do the deletes. But I don't know if it is going to be worthy. The problem with this procedure is that its randomness causes a lot of consistent gets. Can anyone help me with this?? Thanks :)
Note: I cannot change the code, only make improvements afterwards
Use the ROWID pseudo-column to filter the columns:
CREATE OR REPLACE PROCEDURE DEL_OBS(
cuantos number
)
IS
BEGIN
DELETE FROM OBSERVATIONS
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM observations
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM < cuantos
);
END del_obs;
If you have an index on the table then it can use a index fast full scan:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( id ) AS
SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= 50000;
Query 1: No Index:
DELETE FROM table_name
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM table_name
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= 10000
)
Execution Plan:
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
----------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 24 | 123 | 00:00:02 |
| 1 | DELETE | TABLE_NAME | | | | |
| 2 | NESTED LOOPS | | 1 | 24 | 123 | 00:00:02 |
| 3 | VIEW | VW_NSO_1 | 10000 | 120000 | 121 | 00:00:02 |
| 4 | SORT UNIQUE | | 1 | 120000 | | |
| * 5 | COUNT STOPKEY | | | | | |
| 6 | VIEW | | 19974 | 239688 | 121 | 00:00:02 |
| * 7 | SORT ORDER BY STOPKEY | | 19974 | 239688 | 121 | 00:00:02 |
| 8 | TABLE ACCESS FULL | TABLE_NAME | 19974 | 239688 | 25 | 00:00:01 |
| 9 | TABLE ACCESS BY USER ROWID | TABLE_NAME | 1 | 12 | 1 | 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(ROWNUM<=10000)
* 7 - filter(ROWNUM<=10000)
Query 2 Add an index:
ALTER TABLE table_name ADD CONSTRAINT tn__id__pk PRIMARY KEY ( id )
Query 3 With the index:
DELETE FROM table_name
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM table_name
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= 10000
)
Execution Plan:
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 37 | 13 | 00:00:01 |
| 1 | DELETE | TABLE_NAME | | | | |
| 2 | NESTED LOOPS | | 1 | 37 | 13 | 00:00:01 |
| 3 | VIEW | VW_NSO_1 | 9968 | 119616 | 11 | 00:00:01 |
| 4 | SORT UNIQUE | | 1 | 119616 | | |
| * 5 | COUNT STOPKEY | | | | | |
| 6 | VIEW | | 9968 | 119616 | 11 | 00:00:01 |
| * 7 | SORT ORDER BY STOPKEY | | 9968 | 119616 | 11 | 00:00:01 |
| 8 | INDEX FAST FULL SCAN | TN__ID__PK | 9968 | 119616 | 9 | 00:00:01 |
| 9 | TABLE ACCESS BY USER ROWID | TABLE_NAME | 1 | 25 | 1 | 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(ROWNUM<=10000)
* 7 - filter(ROWNUM<=10000)
If you cannot do it in single SQL statement using ROWID then you can rewrite your existing procedure to use exactly the same queries but use the FORALL statement:
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number)
IS
TYPE obs_tab IS TABLE OF observations%ROWTYPE;
begin
SELECT *
BULK COLLECT INTO obs_tab
FROM (
SELECT * FROM observations ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM<=cuantos;
FORALL i IN 1 .. obs_tab.COUNT
DELETE FROM OBSERVATIONS
WHERE nplate = obs_tab(i).nplate
AND odatetime = obs_tab(i).odatetime;
END del_obs;
What you definitively need is an index on OBSERVATIONS to allow the DELETEwith an index access.
CREATE INDEX cuantos ON OBSERVATIONS(nplate, odatetime);
The execution of the procedure will lead to one FULL TABLE SCANot the OBSERVATIONS table and to one INDEX ACCESS for each deleted record.
For a limited number deleted recrods it will behave similar as the set DELETEproposed in other answer; for larger number of deleted records the elapsed time will linerary scale with the number of deletes.
For a non-trival number of deleted records you must assume that the index is not completely in the buffer pool and lots of disc access will be requried. So you'll end with approximately 100 deleted rows per second.
In other words to delete 100K rows it will take ca. 1/4 hour.
To delete 1M rows you need 2 3/4 of an hour.
You see while deleting in this scale the first part of the task - the FULL SCAN of your table is neglectable, it will take few minutes only. The only possibility to get acceptable response time in this case is to switch the logic to a single DELETEstatement as proposed in other answers.
This behavior is also called the rule: "Row by Row is Slow by Slow" (i.e. processing in a loop works fine, but only with a limited number of records).
You can do this using a single delete statement:
delete from observations o
where (o.nplate, o.odatetime) in (select nplace, odatetime
from (select o2.nplate, o2.odatetime
from observations o2
order by DBMS_RANDOM.VALUE
) o2
where rownum <= v_cuantos
);
This is often faster than executing multiple queries for each row being deleted.
Try this. test on MSSQL hopes so it will work also on Oracle. please remarks the status.
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number) IS
begin
DELETE OBSERVATIONS FROM OBSERVATIONS
join (select * from OBSERVATIONS ORDER BY VALUE ) as i on
nplate=i.nplate AND
odatetime=i.odatetime AND
i.ROWNUM<=cuantos;
End DEL_OBS;
Since you say that nplate and odatetime are the primary key of observations, then I am guessing the problem is here:
SELECT * FROM (
SELECT *
FROM observations
ORDER BY DBMS_RANDOM.VALUE)
WHERE ROWNUM<=cuantos;
There is no way to prevent that from performing a full scan of observations, plus a lot of sorting if that's a big table.
You need to change the code that runs. By far, the easiest way to change the code is to change the source code and recompile it.
However, there are ways to change the code that executes without changing the source code. Here are two:
(1) Use DBMS_FGAC to add a policy that detects whether you are in this procedure and, if so, add a predicate to the observations table like this:
AND rowid IN
( SELECT obs_sample.rowid
FROM observations sample (0.05) obs_sample)
(2) Use DBMS_ADVANCED_REWRITE to rewrite your query changing:
FROM observations
.. to ..
FROM observations SAMPLE (0.05)
Using the text of your query in the re-write policy should prevent it from affecting other queries against the observations table.
Neither of these are easy (at all), but can be worth a try if you are really stuck.
Each time i want to process 5000 records like below.
First time i want to process records from 1 to 5000 rows.
second time i want to process records from 5001 to 10000 rows.
third time i want to process records from 10001 to 15001 rows like wise
I dont want to go for procedure or PL/SQL. I will change the rnum values in my code to fetch the 5000 records.
The given query is taking 3 minutes to fetch the records from 3 joined tables. How can i reduced the time to fetch the records.
select * from (
SELECT to_number(AA.MARK_ID) as MARK_ID, AA.SUPP_ID as supplier_id, CC.supp_nm as SUPPLIER_NAME, CC.supp_typ as supplier_type,
CC.supp_lock_typ as supplier_lock_type, ROW_NUMBER() OVER (ORDER BY AA.MARK_ID) as rnum
from TABLE_A AA, TABLE_B BB, TABLE_C CC
WHERE
AA.MARK_ID=BB.MARK_ID AND
AA.SUPP_ID=CC.location_id AND
AA.char_id='160' AND
BB.VALUE_KEY=AA.VALUE_KEY AND
BB.VALUE_KEY=CC.VALUE_KEY
AND AA.VPR_ID IS NOT NULL)
where rnum >=10001 and rnum<=15000;
I have tried below scenario but no luck.
I have tried the /*+ USE_NL(AA BB) */ hints.
I used exists in the where conditions. but its taking the same 3 minutes to fetch the records.
Below is the table details.
select count(*) from TABLE_B;
-----------------
2275
select count(*) from TABLE_A;
-----------------
2405276
select count(*) from TABLE_C;
-----------------
1269767
Result of my inner query total records is
SELECT count(*)
from TABLE_A AA, TABLE_B BB, TABLE_C CC
WHERE
AA.MARK_ID=BB.MARK_ID AND
AA.SUPP_ID=CC.location_id AND
AA.char_id='160' AND
BB.VALUE_KEY=AA.VALUE_KEY AND
BB.VALUE_KEY=CC.VALUE_KEY
AND AA.VPR_ID IS NOT NULL;
-----------------
2027055
All the used columns in where conditions are indexed properly.
Explain Table for the given query is...
Plan hash value: 3726328503
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2082K| 182M| | 85175 (1)| 00:17:03 |
|* 1 | VIEW | | 2082K| 182M| | 85175 (1)| 00:17:03 |
|* 2 | WINDOW SORT PUSHED RANK | | 2082K| 166M| 200M| 85175 (1)| 00:17:03 |
|* 3 | HASH JOIN | | 2082K| 166M| | 44550 (1)| 00:08:55 |
| 4 | TABLE ACCESS FULL | TABLE_C | 1640 | 49200 | | 22 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 2082K| 107M| 27M| 44516 (1)| 00:08:55 |
|* 6 | VIEW | index$_join$_005 | 1274K| 13M| | 9790 (1)| 00:01:58 |
|* 7 | HASH JOIN | | | | | | |
| 8 | INLIST ITERATOR | | | | | | |
|* 9 | INDEX RANGE SCAN | TABLE_B_IN2 | 1274K| 13M| | 2371 (2)| 00:00:29 |
| 10 | INDEX FAST FULL SCAN| TABLE_B_IU1 | 1274K| 13M| | 4801 (1)| 00:00:58 |
|* 11 | TABLE ACCESS FULL | TABLE_A | 2356K| 96M| | 27174 (1)| 00:05:27 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RNUM">=10001 AND "RNUM"<=15000)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "A"."MARK_ID")<=15000)
3 - access("A"."SUPP_ID"="C"."LOC_ID" AND "A"."VALUE_KEY"="C"."VALUE_KEY")
5 - access("A"."MARK_ID"="A"."MARK_ID" AND "A"."VALUE_KEY"="A"."VALUE_KEY")
6 - filter("A"."MARK_CHN_IND"='C' OR "A"."MARK_CHN_IND"='D')
7 - access(ROWID=ROWID)
9 - access("A"."MARK_CHN_IND"='C' OR "A"."MARK_CHN_IND"='D')
11 - filter("A"."CHNL_ID"=160 AND "A"."VPR_ID" IS NOT NULL)
Could you please anyone help me on this to tune this query as i am trying from last 2 days?
Each query will take a long time because each query will have to join then sort all rows. The row_number analytic function can only return a result if the whole set has been read. This is highly inefficient. If the data set is large, you only want to sort and hash-join once.
You should fetch the whole set once, using batches of 5k rows. Alternatively, if you want to keep your existing code logic, you could store the result in a temporary table, for instance:
CREATE TABLE TMP AS <your above query>
CREATE INDEX ON TMP (rnum)
And then replace your query in your code by
SELECT * FROM TMP WHERE rnum BETWEEN :x AND :y
Obviously if your temp table is being reused periodically, just create it once and delete when done (or use a true temporary table).
How many unique MARK_ID values have you got in TABLE_A? I think you may get better performance if you limit the fetched ranges of records by MARK_ID instead of the artificial row number, because the latter is obviously not sargeable. Granted, you may not get exactly 5000 rows in each range but I have a feeling it's not as important as the query performance.
Firstly, giving obfuscated table names makes it nearly impossible to deduce anything about the data distributions and relationships between tables, so potential answerers are crippled from the start.
However, if every row in table_a matches one row in the other tables then you can avoid some of the usage of 200Mb of temporary disk space that is probably crippling performance by pushing the ranking down into an inline view or common table expression.
Monitor V$SQL_WORKAREA to check the exact amount of space being used for the window function, and if it is still excessive consider modifying the memory management to increase available sort area size.
Something like:
with cte_table_a as (
SELECT
to_number(MARK_ID) as MARK_ID,
SUPP_ID as supplier_id,
ROW_NUMBER() OVER (ORDER BY MARK_ID) as rnum
from
TABLE_A
where
char_id='160' and
VPR_ID IS NOT NULL)
select ...
from
cte_table_a aa,
TABLE_B BB,
TABLE_C CC
WHERE
aa.rnum >= 10001 and
aa.rnum <= 15000 and
AA.MARK_ID = BB.MARK_ID AND
AA.SUPP_ID = CC.location_id AND
BB.VALUE_KEY = AA.VALUE_KEY AND
BB.VALUE_KEY = CC.VALUE_KEY
I previously posted a question about joining two tables based on certain criteria over here
How to join two tables based on a timestamp (with variance of a few seconds)? (link doesn't have to be read)
I found that after creating indexes it works really fast. The snippet of my current code is
CREATE INDEX INDEXNAME1 ON TABLEA (CALL_DATE+5/86400);
CREATE INDEX INDEXNAME2 ON TABLEA (CALL_DATE+6/86400);
CREATE INDEX INDEXNAME3 ON TABLEB (NUMBER1,NUMBER2);
CREATE INDEX INDEXNAME4 ON TABLEA (NUMBER1,NUMBER2);
----
INSERT INTO AB_RECON (
SELECT A.*,B.* FROM TABLEB B FULL OUTER JOIN TABLEA A
ON B.NUMBER1=A.NUMBER1 AND B.NUMBER2=A.NUMBER2 AND
B.CALL_DATE-A.CALL_DATE IN (5/86400,6/86400);
----
DROP INDEX INDEXNAME1;
DROP INDEX INDEXNAME2;
DROP INDEX INDEXNAME3;
DROP INDEX INDEXNAME4;
Don't bother about the correctness of the code, it works. But the problem I'm facing is that the execution time is quite random. 90% of the time, the execution time is really quick (2-5 minutes) but sometimes (like right now its running for more than 20 minutes).
I know it might seem like "depends on the size of the tables" but on an average
TABLEA has 1.4 million records, and TABLEB has .9 million records. + or - a few ten thousands, not more.
I've run the following code (ran as SYS) to identify the queries currently running on the database along with elapsed time
select sess.sid, sess.serial#, sess.sql_id, sess.last_call_et as
EXECUTION_TIME,sq.sql_text from v$session sess,v$sql sq
where status = 'ACTIVE' and last_call_et > sysdate - (sysdate - (3/86400))
and username is not null and sess.sql_id=sq.sql_id;
And I get the following output
SID || SERIAL# || SQL_ID || EXECUTION_TIME || SQL_TEXT
246 || 51291 || dxa2sz103vt0g || 1256 || <my recon query pasted above>
I don't understand what's taking it so long cause from the looks of it, its the only active query. I'm not a DBA so I don't fully understand if there's something I'm missing.
Would appreciate it if some light could be shed into possible reasons/ solutions so that I can be point myself in the correct direction.
Additional information if required
Explain Plan
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 2386K| 530M| | 2395M (1)|999:59:59 |
| 1 | LOAD TABLE CONVENTIONAL | AB_RECON | | | | | |
| 2 | VIEW | | 2386K| 530M| | 2395M (1)|999:59:59 |
| 3 | UNION-ALL | | | | | | |
|* 4 | HASH JOIN RIGHT OUTER| | 1417K| 109M| 49M| 10143 (1)| 00:02:02 |
| 5 | TABLE ACCESS FULL | TABLEA | 968K| 38M| | 1753 (1)| 00:00:22 |
| 6 | TABLE ACCESS FULL | TABLEB | 1417K| 52M| | 2479 (1)| 00:00:30 |
|* 7 | FILTER | | | | | | |
| 8 | TABLE ACCESS FULL | TABLEA | 968K| 38M| | 1754 (1)| 00:00:22 |
|* 9 | TABLE ACCESS FULL | TABLEB | 1 | 29 | | 2479 (1)| 00:00:30 |
Oracle Edition
Oracle Database 11g Enterprise Edition 11.2.0.3.0 64-bit Production
Ok, I didn't really find an answer to my question but I did find a work around.
I've broken down what I wanted to do (basically a reconciliation between two sources of information) into three parts.
Matched in both source A and B
Missing in source B
Missing in source A
The queries I've used is given below. Overall it runs much faster. Will have to monitor its performance over several executions.
INSERT INTO AB_RECON (
SELECT M.*,I.* FROM TABLEA M, TABLEB I
WHERE M.ANUMBER=I.ANUMBER AND M.BNUMBER=I.BNUMBER
AND M.CALL_DATE-I.CALL_DATE B (5/86400,6/86400));
COMMIT;
INSERT INTO AB_RECON
(SELECT ANUMBER,BNUMBER,CALL_DATE,CALL_DURATION,REF_NO,NULL,NULL,NULL,NULL,NULL FROM
(SELECT * FROM TABLEA M WHERE NOT EXISTS
(SELECT ANUMBER,BNUMBER,CALL_DATE,CALL_DURATION,REF_NO FROM AB_RECON I
WHERE M.ANUMBER=I.ANUMBER AND M.BNUMBER=I.BNUMBER AND M.CALL_DATE=I.CALL_DATE
)
)
);
COMMIT;
INSERT INTO AB_RECON
(SELECT NULL,NULL,NULL,NULL,NULL,ANUMBER,BNUMBER,CALL_DATE,CALL_DURATION,REF_NO FROM
(SELECT * FROM TABLEB M WHERE NOT EXISTS
(SELECT ANUMBER,BNUMBER,CALL_DATE,CALL_DURATION,REF_NO FROM AB_RECON I
WHERE M.ANUMBER=I.ANUMBER AND M.BNUMBER=I.BNUMBER AND M.CALL_DATE=I.CALL_DATE
)
)
);
Honestly, I have no idea about the theory behind why this works faster. So my main question remains unresolved.
I have an index:
CREATE INDEX BLAH ON EMPLOYEE(SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4));
and an SQL STATEMENT:
SELECT COUNT(*)
FROM (SELECT COUNT(*)
FROM EMPLOYEE
GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)
HAVING COUNT(*) > 100);
but it keeps doing a full table scan instead of using the index unless I add a hint.
EMPSHIRTNO is not the primary key, EMPNO is (which isn't used here).
Complex query
EXPLAIN PLAN FOR SELECT COUNT(*) FROM (SELECT COUNT(*) FROM EMPLOYEE
GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)
HAVING COUNT(*) > 100);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1712471557
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 24 (9)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | | |
| 2 | VIEW | | 497 | | 24 (9)| 00:00:01 |
|* 3 | FILTER | | | | | |
----------------------------------------------------------------------------------
| 4 | HASH GROUP BY | | 497 | 2485 | 24 (9)| 00:00:01 |
| 5 | TABLE ACCESS FULL| EMPLOYEE | 9998 | 49990 | 22 (0)| 00:00:01||
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(COUNT(*)>100)
17 rows selected.
ANALYZE INDEX BLAH VALIDATE STRUCTURE;
SELECT BTREE_SPACE, USED_SPACE FROM INDEX_STATS;
BTREE_SPACE USED_SPACE
----------- ----------
176032 150274
Simple query:
EXPLAIN PLAN FOR SELECT * FROM EMPLOYEE;
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2913724801
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9998 | 439K| 23 (5)| 00:00:01 |
| 1 | TABLE ACCESS FULL| EMPLOYEE | 9998 | 439K| 23 (5)| 00:00:01 |
------------------------------------------------------------------------------
8 rows selected.
Maybe it is because the NOT NULL constraint is enforced via a CHECK constraint rather than being defined originally in the table creation statement? It will use the index when I do:
SELECT * FROM EMPLOYEE WHERE SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) = '1234';
For those suggesting that it needs to read all of the rows anyway (which I don't think it does as it is counting), the index is not used on this either:
SELECT SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) FROM EMPLOYEE;
In fact, putting an index on EMPSHIRTNO and performing SELECT EMPSHIRTNO FROM EMPLOYEE; does not use the index either. I should point out that EMPSHIRTNO is not unique, there are duplicates in the table.
Because of the nature of your query it needs to scan every row of the table anyway. So oracle is probably deciding that a full table scan is the most efficient way to do this. Because its using a HASH GROUP BY there is no nasty sort at the end like in oracle 7 days.
First get the count per SUBSTR(...) of shirt no. Its thus first part of the query which has to scan the entire table
SELECT COUNT(*)
FROM EMPLOYEE
GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)
Next you want to discard the SUBSTR(...) where the count is <= 100. Oracle needs to scan all rows to verify this. Technically you could argue that once it has 101 it doesn't need any more, but I don't think Oracle can work this out, especially as you are asking it what the total numer is in the SELECT COUNT(*) of the subquery.
HAVING COUNT(*) > 100);
So basically to give you the answer you want Oracle needs to scan every row in the table, so an index is no help on filtering. Because its using a hash group by, the index is no help on the grouping either. So to use the index would just slow your query down, which is why Oracle is not using it.
I think you may need to build a function-based index on SUBSTR(TO_CHAR(EMPSHIRTNO), 1,4); Functions in your SQL have a tendency to invalidate regular indexes on a column.
I believe #Codo is correct. Oracle cannot determine that the expression will always be non-null, and then must assume that some nulls may not
be stored in the index.
(It seems like Oracle should be able to figure out that the expression is not nullable. In general, the chance of any random SUBSTR expression always being
not null is probably very low, maybe Oracle just lumps all SUBSTR expressions together?)
You can make the index usable for your query with one of these work-arounds:
--bitmap index:
create bitmap index blah on employee(substr(to_char(empshirtno), 1, 4));
--multi-column index:
alter table employee add constraint blah primary key (id, empshirtno);
--indexed virtual column:
create table employee(empshirtno varchar2(10) not null
,empshirtno_for_index as (substr(empshirtno,1,4)) not null );
create index blah on employee(empshirtno_for_index);