I am trying to optimize the performance of a query. The problem is:
Say we have view X. And it has attr A, B, C.
C is a calculated field.
Say I want to try to optimize by querying "Select A,B from X where A = 'some condition'" then if I need C I can calculate that later on with the much smaller subset of data to enhance performance.
My question is will this help? Or does an Oracle view calculate C anyways when I make the initial query, regardless of whether I am querying for that attr or not? Therefore to optimize I would have to remove these calculated from the view?
The short answer is yes it helps to select only the limited column list from a view - both in means of the CPU (value calculation) and storage.
In cases when it is possible Oracle optimizer completely eliminates the view definition and merges it in the underlining tables.
So the view columns not referenced in the query are not accessed at all.
Here a simple example
create table t as
select rownum a, rownum b, rownum c from dual
connect by level = 10;
create or replace view v1 as
select a, b, c/0 c from t;
I'm simulation the complex calculation with a zero divide to see if the value is caclulated at all.
The best way to check is to run the query and to see the execution plan
select a, b from v1
where a = 1;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 26 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("A"=1)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "A"[NUMBER,22], "B"[NUMBER,22]
In the Project Information is visible, that only columns A and B are refrenced - so no calculation on the C column is done.
Even the other way around work; if you first materialize the view and afterward you make the row filtering. I'm simulating it with the folowing query - not ethat teh MATERALIZE hint materializes all the view rows in a temporary table, that is used in the query.
with vv as (
select /*+ MATERIALIZE */ a,b from v1)
select a, b from vv
where a = 1;
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 5 (0)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6605_5E8CE554 | | | | |
| 3 | TABLE ACCESS FULL | T | 1 | 26 | 3 (0)| 00:00:01 |
|* 4 | VIEW | | 1 | 26 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6605_5E8CE554 | 1 | 26 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("A"=1)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "A"[NUMBER,22], "B"[NUMBER,22]
2 - SYSDEF[4], SYSDEF[0], SYSDEF[1], SYSDEF[96], SYSDEF[0]
3 - "A"[NUMBER,22], "B"[NUMBER,22]
4 - "A"[NUMBER,22], "B"[NUMBER,22]
5 - "C0"[NUMBER,22], "C1"[NUMBER,22]
Again you see in the Projection Information the column C is not referenced.
What doesn't work and you should avoid is to materialize the view with a select * .. - this of course fails.
with vv as (
select /*+ MATERIALIZE */ * from v1)
select a, b from vv
where a = 1;
SQL is a descriptive language not a procedural language. The database optimizer actually determines the execution.
For your scenario, there are two reasonable options for the optimizer:
Filter the rows and then calculate the value after filtering.
Calculate the value first and then filter.
Oracle has a good optimizer, so I would expect it to consider both possibilities. It chooses the one that would have the best performance for your query.
If you still need c only for some of the rows returned, then delaying the computation might be worthwhile if it is really really expensive. However, that would be an unusual optimization.
Related
I have a table that stores the response from certain API.
It has 1.7 million rows.
pk is a kind of UnixTime(not exactly, but smilliar).
I call the API very frequently to see if the data had changed.
To check if the data had changed, I have to run this command:
SELECT 1
FROM RATE
WHERE REGDATE = '$apiReponseDate' --yymmddhhmmss
If the answer is False, that means the reponse had changed, and then I insert.
I have an INDEX on REGDATE, and I know this makes the table to do the binary search, not a full-search.
but I do know that in order to know if the data had updated, I only need to check the recent rows.
To me, using WHERE for the whole table seems an inefficient way.
Is there any good way to see if the data I got from the API response is already in DB or not?
I'm using Oracle, but that is not a main point because I'm thinking about searching the query's efficiency.
You may use index_desc hint and filter by rownum to access the table and read the most recent row. Then compare this value with the current API response.
Example is below for (default) ascending index. If an index is created as id desc, then you need to reverse the order of reading (specify index_asc hint).
create table t (
id not null,
val
) as
select level,
dbms_random.string('x', 1000)
from dual
connect by level < 5000
create unique index t_ix
on t(id)
select
/*+
index_desc(t (t.id))
gather_plan_statistics
*/
id,
substr(val, 1, 10)
from t
where rownum = 1
ID
SUBSTR(VAL,1,10)
4999
D0H3YOHB5E
select *
from dbms_xplan.display_cursor(
format => 'ALL ALLSTATS'
)
PLAN_TABLE_OUTPUT
:-----------------
SQL_ID 2ym2rg02qfmk4, child number 0
-------------------------------------
select /*+ index_desc(t (t.id)) gather_plan_statistics */
id, substr(val, 1, 10) from t where rownum = 1
Plan hash value: 1335626365
------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 3 (100)| | 1 |00:00:00.01 | 3 | 1 |
|* 1 | COUNT STOPKEY | | 1 | | | | | 1 |00:00:00.01 | 3 | 1 |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1 | 1005 | 3 (0)| 00:00:01 | 1 |00:00:00.01 | 3 | 1 |
| 3 | INDEX FULL SCAN DESCENDING | T_IX | 1 | 4999 | | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 2 | 1 |
------------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / "T"#"SEL$1"
3 - SEL$1 / "T"#"SEL$1"
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM=1)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "ID"[NUMBER,22], "VAL"[VARCHAR2,4000]
2 - "ID"[NUMBER,22], "VAL"[VARCHAR2,4000]
3 - "T".ROWID[ROWID,10], "ID"[NUMBER,22]
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------
2 - SEL$1 / "T"#"SEL$1"
- index_desc(t (t.id))
fiddle
Instead of doing a SELECT and then INSERT, which is two queries, then you could combine the two into a MERGE statement:
MERGE INTO rate r
USING DUAL ignore
ON r.redgate = '$apiReponseDate'
WHEN NOT MATCHED THEN
INSERT (col1, col2, col3, redgate)
VALUES ('value1', 'value2', 'value3', '$apiNewReponseDate');
This will prevent you having to use two round trips from the middle-tier to the database and do it all in a single query.
I have two tables, both using valid to and valid from logic. Table 1 looks like this:
ID | VALID_FROM | VALID_TO
1 | 01.01.2000 | 04.01.2000
1 | 04.01.2000 | 16.01.2000
1 | 16.01.2000 | 17.01.2000
1 | 17.01.2000 | 19.01.2000
2 | 03.02.2001 | 04.04.2001
2 | 04.04.2001 | 14.03.2001
2 | 14.04.2001 | 18.03.2001
while table 2 looks like this:
ID | VAR | VALID_FROM | VALID_TO
1 | 3 | 01.01.2000 | 17.01.2000
1 | 2 | 17.01.2000 | 19.01.2000
2 | 4 | 03.02.2001 | 14.03.2001
Table 1 has 132,195,791 rows and table 2 has 16,964,846.
The valid from and valid to date of any observation in table 1 is within one or more valid from to valid to windows shown in table 2.
I created primary keys for both of them over ID and VALID_FROM
I want to do an inner join like:
select t1.*,
t2.var
from t1 t1
inner join t2 t2
on t1.id = t2.id
and t1.valid_from >= t2.valid_from
and t1.valid_to <= t2.valid_to;
This join is really slow. I ran it half a day without any success. What can I do to increase performance in this particular case? Please note that I also want to left join the resulting table in later stages. Any help is highly appreciated.
EDIT
Obviously, the information I gave was less then generally desired here on the platform.
I use Oracle Database 12c Enterprise Edition
The example I gave was illustrative for the bigger problem at hand. I
am concerned with joining information from different tables with
different valid_from / valid_to dates. For this I created a grid
first with the distinct values in the valid_from and valid_to
variables of all the relevant tables. This grid is what I refer here
to as table 1.
Results from the execution plan (I adjusted the column and table names to meet the terminology used in my illustrative example):
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 465M| 23G| | 435K (3)| 00:00:18 |
|* 1 | HASH JOIN | | 465M| 23G| 695M| 435K (3)| 00:00:18 |
| 2 | TABLE ACCESS FULL| TABLE2 | 16M| 501M| | 22961 (2)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TABLE1 | 132M| 3025M| | 145K (2)| 00:00:06 |
--------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$58A6D7F6
2 - SEL$58A6D7F6 / T2#SEL$1
3 - SEL$58A6D7F6 / T1#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="T2"."ID")
filter("T1"."VALID_TO"<="T2"."VALID_TO" AND
"T1"."VALID_FROM">="T2"."VALID_FROM")
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) "T2"."ID"[VARCHAR2,20],
"T1"."ID"[VARCHAR2,20], "T1"."VALID_TO"[DATE,7],
"T2"."VAR"[VARCHAR2,20], "T2"."VALID_FROM"[DATE,7],
"T2"."VALID_TO"[DATE,7], "T1"."ID"[VARCHAR2,20],
"T1"."VALID_FROM"[DATE,7], "T1"."VALID_TO"[DATE,7], "T1"."VALID_FROM"[DATE,7]
2 - "T2"."ID"[VARCHAR2,20],
"T2"."VAR"[VARCHAR2,20], "T2"."VALID_FROM"[DATE,7],
"T2"."VALID_TO"[DATE,7]
3 - "T1"."ID"[VARCHAR2,20], "T1"."VALID_FROM"[DATE,7],
"T1"."VALID_TO"[DATE,7]
Note
-----
- this is an adaptive plan
A good practice is to ask first: what is expected the query will return?
Base on your WHERE predicate is seems you are interested on all versions from table2 that are included in the validity interval of table1. This may be intention, but more common you need all versions that intersect between the tables.
The second aspect is, do you need to see few first rows or all rows from the join.
If you only want to see few results, simple add AND t1.ID = nnnn to the WHERE clause to limit to some sample ID. If you have proper indexes (and tehre are no expreme lot of rows with this ID), you will get the result quick as NESTED LOOP join will kick in.
To perform the the full result, you must consider all rows from both tables. No index will help you to select all rows from a table - here is the FULL TABLE SCAN the best option.
To join the large row sets the best approach is HASH JOIN. NESTED LOOPS (which you probably use now) are quick to join few rows, but hangs on large row sets.
The smaller table (table2) is red in memory (hopefully) as a hash table. The larger table (table1) is probed against this hash table toperform the join.
This is the execution plan you should look for
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10T| 399T| | 190M(100)| 02:03:47 |
|* 1 | HASH JOIN | | 10T| 399T| 550M| 190M(100)| 02:03:47 |
| 2 | TABLE ACCESS FULL| SCD2 | 16M| 355M| | 39 (93)| 00:00:01 |
| 3 | TABLE ACCESS FULL| SCD1 | 132M| 2395M| | 211 (99)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="T2"."ID")
filter("T1"."VALID_FROM">="T2"."VALID_FROM" AND
"T1"."VALID_TO"<="T2"."VALID_TO")
Provided you are on an enterprise database this should pass you from days to hours. Further you can deploy parallel option to get additional speed up.
Good luck!
I'm wondering why cost of this query
select * from address a
left join name n on n.adress_id=a.id
where a.street='01';
is higher than
select * from address a
left join name n on n.adress_id=a.id
where a.street=N'01';
where address table looks like this
ID NUMBER
STREET VARCHAR2(255 CHAR)
POSTAL_CODE VARCHAR2(255 CHAR)
and name table looks like this
ID NUMBER
ADDRESS_ID NUMBER
NAME VARCHAR2(255 CHAR)
SURNAME VARCHAR2(255 CHAR)
These are costs returned by explain plan
Explain plan for '01'
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3591 | 1595K| 87 (0)| 00:00:02 |
| 1 | NESTED LOOPS OUTER | | 3591 | 1595K| 87 (0)| 00:00:02 |
|* 2 | TABLE ACCESS FULL | ADDRESS | 3 | 207 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| NAME | 1157 | 436K| 47 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | NAME_HSI | 1157 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."STREET"='01')
4 - access("N"."ADDRESS_ID"(+)="A"."ID")
Explain plan for N'01'
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 347 | 154K| 50 (0)| 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 347 | 154K| 50 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | ADDRESS | 1 | 69 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| NAME | 1157 | 436K| 47 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | NAME_HSI | 1157 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(SYS_OP_C2C("A"."STREET")=U'01')
4 - access("N"."ADDRESS_ID"(+)="A"."ID")
As you can see cost for N'01' query is lower than cost for '01'. Any idea why? N'01' needs additionally convert varchar to nvarchar so cost should be higher (SYS_OP_C2C()). The other question is why rows processed by N'01' query is lower than '01'?
[EDIT]
Table address has 30 rows.
Table name has 19669 rows.
SYS_OP_C2C is an internal function which does an implicit conversion of varchar2 to national character set using TO_NCHAR function. Thus, the filter completely changes as compared to the filter using normal comparison.
I am not sure about the reason why the number of rows are less, but I can guarantee it could be more too. Cost estimation won't be affected.
Let's try to see step-by-step in a test case.
SQL> CREATE TABLE t AS SELECT 'a'||LEVEL col FROM dual CONNECT BY LEVEL < 1000;
Table created.
SQL>
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE col = 'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 5 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
1 - filter("COL"='a10')
13 rows selected.
SQL>
So far so good. Since there is only one row with value as 'a10', optimizer estimated one row.
Let's see with the national characterset conversion.
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE col = N'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 10 | 50 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
1 - filter(SYS_OP_C2C("COL")=U'a10')
13 rows selected.
SQL>
What happened here? We can see filter(SYS_OP_C2C("COL")=U'a10'), which means an internal function is applied and it converts the varchar2 value to nvarchar2. The filter now found 10 rows.
This will also suppress any index usage, since now a function is applied on the column. We can tune it by creating a function-based index to avoid full table scan.
SQL> create index nchar_indx on t(to_nchar(col));
Index created.
SQL>
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE to_nchar(col) = N'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1400144832
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 10 | 50 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | NCHAR_INDX | 4 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
---------------------------------------------------
2 - access(SYS_OP_C2C("COL")=U'a10')
14 rows selected.
SQL>
However, will this make the execution plans similar? No. i think with two different charactersets , the filter will not be applied alike. Thus, the difference lies.
My research says,
Usually, such scenarios occur when the data coming via an application
is nvarchar2 type, but the table column is varchar2. Thus, Oracle
applies an internal function in the filter operation. My suggestion
is, to know your data well, so that you use similar data types during
design phase.
When worrying about explain plans, it matters whether there are current statistics on the tables. If the statistics do not represent the actual data reasonably well, then the optimizer will make mistakes and estimate cardinalities incorrectly.
You can check how long ago statistics were gathered by querying the data dictionary:
select table_name, last_analyzed
from user_tables
where table_name in ('ADDRESS','NAME');
You can gather statistics for the optimizer to use by calling DBMS_STATS:
begin
dbms_stats.gather_table_stats(user, 'ADDRESS');
dbms_stats.gather_table_stats(user, 'NAME');
end;
So perhaps after gathering statistics you will get different explain plans. Perhaps not.
The difference in your explain plans is primarily because the optimizer estimates how many rows it will find in address table differently in the two cases.
In the first case you have an equality predicate with same datatype - this is good and the optimizer can often estimate cardinality (row count) reasonably well for cases like this.
In the second case a function is applied to the column - this is often bad (unless you have function based indexes) and will force the optimizer to take a wild guess. That wild quess will be different in different versions of Oracle as the developers of the optimizer tries to improve upon it. Some versions the wild guess will simply be something like "I guess 5% of the number of rows in the table."
When comparing different datatypes, it is best to avoid implicit conversions, particularly when like this case the implicit conversion makes a function on the column rather than the literal. If you have cases where you get a value as datatype NVARCHAR2 and need to use it in a predicate like above, it can be a good idea to explicitly convert the value to the datatype of the column.
select * from address a
left join name n on n.adress_id=a.id
where a.street = CAST( N'01' AS VARCHAR2(255));
In this case with a literal it does not make sense, of course. Here you would just use your first query. But if it was a variable or function parameter, maybe you could have use cases for doing something like this.
As I can see the first query returns 3591 rows, the second one returns 347 rows. So Oracle needs less I/O operation that's why the cost is less.
Don't be confused with
N'01' needs additionally convert varchar to nvarchar
Oracle does one hard parse and then uses soft parse for the same queries. So the longer your oracle works the faster it becomes.
I want to index this query clause -- note that the text is static.
SELECT * FROM tbl where flags LIKE '%current_step: complete%'
To re-iterate, the current_step: complete never changes in the query. I want to build an index that will effectively pre-calculate this boolean value, thereby preventing full table scans...
I would prefer not to add a boolean column to store the pre-calculated value as this would necessitate code changes in the application....
If you don't want to change the query, and it isn't just an issue of nor changing the data maintenance (in which case a virtual column and/or index would do the job), you could use a materialised view that applies the filter, and let query rewrite take case of using that instead of the real table. Which may well be overkill but is an option.
The original plan for a mocked-up version:
explain plan for
SELECT * FROM tbl where flags LIKE '%current_step: complete%';
select * from table(dbms_xplan.display);
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 60 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TBL | 2 | 60 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("FLAGS" IS NOT NULL AND "FLAGS" LIKE '%current_step:
complete%')
A materialised view that will only hold the records your query is interested in (this is a simple example but you'd need to decide how to refresh and add a log if needed):
create materialized view mvw
enable query rewrite as
SELECT * FROM tbl where flags LIKE '%current_step: complete%';
Now your query hits the materialised view, thanks to query rewrite:
explain plan for
SELECT * FROM tbl where flags LIKE '%current_step: complete%';
select * from table(dbms_xplan.display);
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 60 | 3 (0)| 00:00:01 |
| 1 | MAT_VIEW REWRITE ACCESS FULL| MVW | 2 | 60 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
But any other query will still use the original table:
explain plan for
SELECT * FROM tbl where flags LIKE '%current_step: working%';
select * from table(dbms_xplan.display);
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TBL | 1 | 27 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("FLAGS" LIKE '%current_step: success%' AND "FLAGS" IS NOT
NULL)
Of course a virtual index would be simpler if you are allowed to modify the query...
A full text search index might be what you are looking for.
There are a few ways you can implement this:
Oracle has Oracle Text where you can define which type of full text index you want.
Lucene is a Java full text search framework.
Solr is a server product that provides full text search.
I would prefer not to add a boolean column to store the pre-calculated value as this would necessitate code changes in the application
There are two ways I can suggest :
1.
If you are on 11g and up, you could have a VIRTUAL COLUMN always generated as 1 when the value is complete else 0. All you need to do then :
select * from table where virtual_column = 1
To improve performance, you could have an index over it, which is equivalent to a function-based index.
2.
Update : Perhaps, I should be more clear with my second point : source
There are instances where Oracle will use an index to resolve a like with the pattern of '%text%'. If the query can be resolved without having to go back to the table (rowid lookup), the index may be chosen. Example:
select distinct first_nm from person where first_nm like '%EV%';
And in above case, Oracle will do an index fast full scan - a full scan of the smaller index.
Each time i want to process 5000 records like below.
First time i want to process records from 1 to 5000 rows.
second time i want to process records from 5001 to 10000 rows.
third time i want to process records from 10001 to 15001 rows like wise
I dont want to go for procedure or PL/SQL. I will change the rnum values in my code to fetch the 5000 records.
The given query is taking 3 minutes to fetch the records from 3 joined tables. How can i reduced the time to fetch the records.
select * from (
SELECT to_number(AA.MARK_ID) as MARK_ID, AA.SUPP_ID as supplier_id, CC.supp_nm as SUPPLIER_NAME, CC.supp_typ as supplier_type,
CC.supp_lock_typ as supplier_lock_type, ROW_NUMBER() OVER (ORDER BY AA.MARK_ID) as rnum
from TABLE_A AA, TABLE_B BB, TABLE_C CC
WHERE
AA.MARK_ID=BB.MARK_ID AND
AA.SUPP_ID=CC.location_id AND
AA.char_id='160' AND
BB.VALUE_KEY=AA.VALUE_KEY AND
BB.VALUE_KEY=CC.VALUE_KEY
AND AA.VPR_ID IS NOT NULL)
where rnum >=10001 and rnum<=15000;
I have tried below scenario but no luck.
I have tried the /*+ USE_NL(AA BB) */ hints.
I used exists in the where conditions. but its taking the same 3 minutes to fetch the records.
Below is the table details.
select count(*) from TABLE_B;
-----------------
2275
select count(*) from TABLE_A;
-----------------
2405276
select count(*) from TABLE_C;
-----------------
1269767
Result of my inner query total records is
SELECT count(*)
from TABLE_A AA, TABLE_B BB, TABLE_C CC
WHERE
AA.MARK_ID=BB.MARK_ID AND
AA.SUPP_ID=CC.location_id AND
AA.char_id='160' AND
BB.VALUE_KEY=AA.VALUE_KEY AND
BB.VALUE_KEY=CC.VALUE_KEY
AND AA.VPR_ID IS NOT NULL;
-----------------
2027055
All the used columns in where conditions are indexed properly.
Explain Table for the given query is...
Plan hash value: 3726328503
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2082K| 182M| | 85175 (1)| 00:17:03 |
|* 1 | VIEW | | 2082K| 182M| | 85175 (1)| 00:17:03 |
|* 2 | WINDOW SORT PUSHED RANK | | 2082K| 166M| 200M| 85175 (1)| 00:17:03 |
|* 3 | HASH JOIN | | 2082K| 166M| | 44550 (1)| 00:08:55 |
| 4 | TABLE ACCESS FULL | TABLE_C | 1640 | 49200 | | 22 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 2082K| 107M| 27M| 44516 (1)| 00:08:55 |
|* 6 | VIEW | index$_join$_005 | 1274K| 13M| | 9790 (1)| 00:01:58 |
|* 7 | HASH JOIN | | | | | | |
| 8 | INLIST ITERATOR | | | | | | |
|* 9 | INDEX RANGE SCAN | TABLE_B_IN2 | 1274K| 13M| | 2371 (2)| 00:00:29 |
| 10 | INDEX FAST FULL SCAN| TABLE_B_IU1 | 1274K| 13M| | 4801 (1)| 00:00:58 |
|* 11 | TABLE ACCESS FULL | TABLE_A | 2356K| 96M| | 27174 (1)| 00:05:27 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RNUM">=10001 AND "RNUM"<=15000)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "A"."MARK_ID")<=15000)
3 - access("A"."SUPP_ID"="C"."LOC_ID" AND "A"."VALUE_KEY"="C"."VALUE_KEY")
5 - access("A"."MARK_ID"="A"."MARK_ID" AND "A"."VALUE_KEY"="A"."VALUE_KEY")
6 - filter("A"."MARK_CHN_IND"='C' OR "A"."MARK_CHN_IND"='D')
7 - access(ROWID=ROWID)
9 - access("A"."MARK_CHN_IND"='C' OR "A"."MARK_CHN_IND"='D')
11 - filter("A"."CHNL_ID"=160 AND "A"."VPR_ID" IS NOT NULL)
Could you please anyone help me on this to tune this query as i am trying from last 2 days?
Each query will take a long time because each query will have to join then sort all rows. The row_number analytic function can only return a result if the whole set has been read. This is highly inefficient. If the data set is large, you only want to sort and hash-join once.
You should fetch the whole set once, using batches of 5k rows. Alternatively, if you want to keep your existing code logic, you could store the result in a temporary table, for instance:
CREATE TABLE TMP AS <your above query>
CREATE INDEX ON TMP (rnum)
And then replace your query in your code by
SELECT * FROM TMP WHERE rnum BETWEEN :x AND :y
Obviously if your temp table is being reused periodically, just create it once and delete when done (or use a true temporary table).
How many unique MARK_ID values have you got in TABLE_A? I think you may get better performance if you limit the fetched ranges of records by MARK_ID instead of the artificial row number, because the latter is obviously not sargeable. Granted, you may not get exactly 5000 rows in each range but I have a feeling it's not as important as the query performance.
Firstly, giving obfuscated table names makes it nearly impossible to deduce anything about the data distributions and relationships between tables, so potential answerers are crippled from the start.
However, if every row in table_a matches one row in the other tables then you can avoid some of the usage of 200Mb of temporary disk space that is probably crippling performance by pushing the ranking down into an inline view or common table expression.
Monitor V$SQL_WORKAREA to check the exact amount of space being used for the window function, and if it is still excessive consider modifying the memory management to increase available sort area size.
Something like:
with cte_table_a as (
SELECT
to_number(MARK_ID) as MARK_ID,
SUPP_ID as supplier_id,
ROW_NUMBER() OVER (ORDER BY MARK_ID) as rnum
from
TABLE_A
where
char_id='160' and
VPR_ID IS NOT NULL)
select ...
from
cte_table_a aa,
TABLE_B BB,
TABLE_C CC
WHERE
aa.rnum >= 10001 and
aa.rnum <= 15000 and
AA.MARK_ID = BB.MARK_ID AND
AA.SUPP_ID = CC.location_id AND
BB.VALUE_KEY = AA.VALUE_KEY AND
BB.VALUE_KEY = CC.VALUE_KEY