Which SQL statement is faster when counting NULLs?

I need to make a query to determine if 3 columns are not filled out. Should I make a column in the table just as a flag to note that the 3 columns are empty? My instinct tells me that I shouldn't make an extra column. I'm just wondering if I would get any performance boost from doing so. This is for an Oracle server.
select count(*) from my_table t where t.not_available = 1;
or
select count(*) from my_table t where t.col1 is null and t.col2 is null and t.col3 is null;

I think you are doing premature optimization.
Adding an extra column into a table increases the size of each record. This would typically mean that a table would occupy more space on disk. Large table sizes imply longer full table scans.
Adding indexes might help, but there is an associated cost with them. And if an index would help, you don't need to add another column, because Oracle supports function-based indexes: you can index on an expression.
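For example (a sketch of the function-based-index approach, using the column names from the question): the CASE expression below is NULL for every row that has any of the three columns filled, so those rows create no index entries and the index stays small.
create index my_table_nulls_idx on my_table
(case when col1 is null and col2 is null and col3 is null then 1 end);

select count(*)
from my_table t
where case when t.col1 is null and t.col2 is null and t.col3 is null then 1 end = 1;
The predicate has to match the indexed expression for the optimizer to consider this index.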
In most cases, your query is going to do a full table scan or full index scan, unless some of the conditions are rare.
In other words, to have a chance of really answering your question, you would need to understand:
The record layout
The distribution of values in the three columns
Any additional factors that might affect access, such as partitioned columns

Only when performance leaves you with no other choice should you resort to an extra redundant column. In this case, you should probably avoid it. Just introduce an index on (col1,col2,col3,1) if performance of this statement is too poor.
Here is an example of why putting the 4th constant value 1 in the index is probably a good idea.
First a table with 1000 rows, out of which only 1 row (456) has all three columns NULL:
SQL> create table my_table (id,col1,col2,col3,fill)
2 as
3 select level
4 , nullif(level,456)
5 , nullif(level,456)
6 , nullif(level,456)
7 , rpad('*',100,'*')
8 from dual
9 connect by level <= 1000
10 /
Table created.
A row with three NULLS is not indexed by the index below:
SQL> create index my_table_i1 on my_table(col1,col2,col3)
2 /
Index created.
so the query will use a full table scan in my test case (likely a full index scan on your primary key index in your case):
SQL> exec dbms_stats.gather_table_stats(user,'my_table')
PL/SQL procedure successfully completed.
SQL> set autotrace on
SQL> select count(*) from my_table t where t.col1 is null and t.col2 is null and t.col3 is null
2 /
COUNT(*)
----------
1
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 228900979
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 12 | | |
|* 2 | TABLE ACCESS FULL| MY_TABLE | 1 | 12 | 8 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T"."COL1" IS NULL AND "T"."COL2" IS NULL AND "T"."COL3"
IS NULL)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
37 consistent gets
0 physical reads
0 redo size
236 bytes sent via SQL*Net to client
247 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
But if I add a constant 1 to the index:
SQL> set autotrace off
SQL> drop index my_table_i1
2 /
Index dropped.
SQL> create index my_table_i2 on my_table(col1,col2,col3,1)
2 /
Index created.
SQL> exec dbms_stats.gather_table_stats(user,'my_table')
PL/SQL procedure successfully completed.
Then it will use the index and your statement will fly:
SQL> set autotrace on
SQL> select count(*) from my_table t where t.col1 is null and t.col2 is null and t.col3 is null
2 /
COUNT(*)
----------
1
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 623815834
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 12 | | |
|* 2 | INDEX RANGE SCAN| MY_TABLE_I2 | 1 | 12 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T"."COL1" IS NULL AND "T"."COL2" IS NULL AND "T"."COL3"
IS NULL)
filter("T"."COL2" IS NULL AND "T"."COL3" IS NULL)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
236 bytes sent via SQL*Net to client
247 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL execution speed depends on numerous factors you didn't list in your question.
Therefore I think you should check the execution plan on your specific server to get first-hand benchmarks of both situations.
See the Oracle documentation on EXPLAIN PLAN and DBMS_XPLAN.DISPLAY for how to display an execution plan in Oracle.
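For instance, a minimal sketch using the table and columns from the question:
explain plan for
select count(*) from my_table t
where t.col1 is null and t.col2 is null and t.col3 is null;

select * from table(dbms_xplan.display);
In SQL*Plus you can also SET AUTOTRACE ON before running the statement, as the first answer above does.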

If the column(s) in the WHERE clause allow the use of an index, then that version will most likely be the faster one. However, if no columns are indexed, then I would expect the first query to be superior.
But checking the plan is always the best way to know.

I would create an index on not_available and then query that.
CREATE INDEX index_name
ON table_name (not_available)

Something like this might help you:
select count(NVL2(t.col1||t.col2||t.col3, NULL, 1)) from my_table t;
The concatenation is NULL only when all three columns are NULL; NVL2 then yields 1 for those rows and NULL for all others, and COUNT ignores the NULLs.
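An equivalent, more explicit formulation (just a sketch) that doesn't rely on concatenation semantics:
select count(case when t.col1 is null and t.col2 is null and t.col3 is null then 1 end)
from my_table t;
Again, COUNT only counts the rows where the CASE produced a non-NULL value.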

Related

Execution plan of table in cluster expects one row when it should expect multiple

I have created a cluster and a table in the cluster with the following definitions:
create cluster roald_dahl_titles (
title varchar2(100)
);
create index i_roald_dahl_titles
on cluster roald_dahl_titles
;
create table ROALD_DAHL_NOVELS (
title varchar2(100),
published_year number
)
cluster roald_dahl_titles (title)
;
Notably, this index is not created with a unique constraint, and it's quite possible to insert duplicate values into the table ROALD_DAHL_NOVELS:
insert into roald_dahl_novels (title, published_year) values ('Esio Trot', 1990);
insert into roald_dahl_novels (title, published_year) values ('Esio Trot', 1990);
I then gather statistics on both the table and the index, and look at an execution plan that uses the index:
begin
dbms_stats.gather_table_stats(user, 'ROALD_DAHL_NOVELS');
dbms_stats.gather_INDEX_stats(user, 'I_ROALD_DAHL_TITLES');
end;
/
explain plan for
select published_year
from roald_dahl_novels
where title = 'Esio Trot';
select *
from table(dbms_xplan.display(format => 'ALL'));
The contents of the execution plan I find a bit confusing, though:
Plan hash value: 2187850431
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 28 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS CLUSTER| ROALD_DAHL_NOVELS | 2 | 28 | 1 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | I_ROALD_DAHL_TITLES | 1 | | 0 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1 / ROALD_DAHL_NOVELS#SEL$1
2 - SEL$1 / ROALD_DAHL_NOVELS#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("TITLE"='Esio Trot')
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "ROALD_DAHL_NOVELS".ROWID[ROWID,10], "TITLE"[VARCHAR2,100],
"PUBLISHED_YEAR"[NUMBER,22]
2 - "ROALD_DAHL_NOVELS".ROWID[ROWID,10]
As part of operation 2, it performs an index unique scan, which tells me that 'Esio Trot' is expected to appear only once in the cluster. The execution plan also says that for that operation, it expects to return only one row.
The column projection information tells me that it expects to return a single column (which will be a ROWID for the table ROALD_DAHL_NOVELS), so this tells me that the total number of ROWIDs returned from that operation will be 1 (1 row at 1 ROWID per row). Since each of the two rows in the table ROALD_DAHL_NOVELS has a different ROWID, then this operation can only be used to return a single row from the table.
When the TABLE ACCESS CLUSTER operation is performed, the execution plan then (correctly) expects two rows to be returned, which is what I find confusing. If these rows are being accessed by ROWID, then I would expect the previous operation to return (at least) two ROWIDs. If they are not being accessed by ROWID, I would not expect the previous operation to return any ROWIDs.
Also, in the TABLE ACCESS CLUSTER, the ROWID of the table ROALD_DAHL_NOVELS is listed in the column projection information section. I am not attempting to select the ROWID, so I would not expect it to be returned from that operation. If anywhere, I would expect it to be in the predicate information section.
Additional investigation
I tried inserting the same row into the table repeatedly, until it contained 65536 identical copies of the same row. After gathering stats and querying USER_INDEXES for the index I_ROALD_DAHL_TITLES, we got the following:
UNIQUENESS DISTINCT_KEYS AVG_DATA_BLOCKS_PER_KEY
UNIQUE 1 109
As I understand it, this tells us:
The index is unique, so we expect each key to appear once in the index
The index has only one distinct key ('Esio Trot'), so must have exactly one entry
The index expects our one key to match to several rows in the table, across 109 blocks
This seems paradoxical - for one key to match to several rows in the table would mean that there must be several entries in the index for that key (each matching to a different ROWID), which would contradict the index being unique.
When checking USER_EXTENTS, the index only uses a single extent of 65536 bytes, which is not enough space to hold information for each of the ROWIDs in the table.
It's not a bug.
Run this query in your database:
select UNIQUENESS from dba_indexes where index_name = upper('i_roald_dahl_titles');
UNIQUENES
---------
UNIQUE
The reason for this is that B-tree cluster indexes only store the database block address of the cluster block that stores that data -- it does not store full rowid values, like a normal index would.
So, while your various rows for title = 'Esio Trot' might have rowid values like:
select rowid row_id, title from roald_dahl_novels n;
ROW_ID TITLE
------------------ ----------------------------------------------------------------------------------------------------
ABocNnACmAABWsWAAL Esio Trot
ABocNnACmAABWsWAAM Esio Trot
ABocNnACmAABWsWAAN Esio Trot
The B-tree cluster index only stores one entry: "Esio Trot", with the corresponding database block address. You can confirm this in your database with:
select num_rows from dba_indexes where index_Name = 'I_ROALD_DAHL_TITLES';
NUM_ROWS
----------
1
That is why you are getting a UNIQUE SCAN reported: because that is what it is doing, as far as the index is concerned.
There is the same issue with the actual execution plan (tested in 19.5).
Maybe it is a limitation or bug of the displayed execution plan for cluster objects. I would ask this question on asktom.oracle.com to have some kind of official (and free) answer from Oracle.
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID f41cf1x2zdyyr, child number 0
-------------------------------------
select published_year from roald_dahl_novels where title = 'Esio
Trot'
Plan hash value: 2187850431
--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 1 (100)| | 2 |00:00:00.01 | 3 |
| 1 | TABLE ACCESS CLUSTER| ROALD_DAHL_NOVELS | 1 | 2 | 28 | 1 (0)| 00:00:01 | 2 |00:00:00.01 | 3 |
|* 2 | INDEX UNIQUE SCAN | I_ROALD_DAHL_TITLES | 1 | 1 | | 0 (0)| | 1 |00:00:00.01 | 1 |
--------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1 / ROALD_DAHL_NOVELS#SEL$1
2 - SEL$1 / ROALD_DAHL_NOVELS#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("TITLE"='Esio Trot')
Column Projection Information (identified by operation id):
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------
1 - "ROALD_DAHL_NOVELS".ROWID[ROWID,10], "TITLE"[VARCHAR2,100], "PUBLISHED_YEAR"[NUMBER,22]
2 - "ROALD_DAHL_NOVELS".ROWID[ROWID,10]
32 rows selected.

where column is null taking longer time to execute

I am executing a select statement like the one below, which is taking more than 6 minutes to execute.
select * from table where col1 is null;
whereas:
select * from table;
returns results in a few seconds. The table contains 25 million records. No indexes are used; there is a composite PK, but not on the column used. The same query, when executed on a different table with 50 million records, returns results in a few seconds. Only this table poses a problem.
I rebuilt the table to check whether something was amiss, but I'm still facing the same issue.
Can someone help me understand why it is taking so long?
datatype: VARCHAR2(40)
PLAN:
Plan hash value: 2838772322
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 794 | 60973 (16)| 00:00:03 |
|* 1 | TABLE ACCESS STORAGE FULL| table | 1 | 794 | 60973 (16)| 00:00:03 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("column" IS NULL)
filter("column" IS NULL)
select * from table;
The Oracle SQL Developer tool has a default setting to fetch only 50 records for display unless it has been manually changed. So the entire 25 million records will not be fetched, as you don't need all of them just to display the first page.
select * from table where col1 is null;
But when you filter for null values, the entire set of 25 million rows has to be scanned to apply the filter and find your 81 records satisfying that predicate. Hence the filtered query takes longer.
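A fairer comparison (a sketch; my_big_table stands in for your table name) is to make both statements read everything by counting instead of fetching the first screenful:
select count(*) from my_big_table;
select count(*) from my_big_table where col1 is null;
Both statements now have to process all 25 million rows (the unfiltered count may use an index instead of the table), so the timings become comparable.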

How to optimize query? Explain Plan

I have a table with 3 fields and I need to get all the values of those fields. I have the following query:
SELECT COM.FIELD1, COM.FIELD2, COM.FIELD3
FROM OWNER.TABLE_NAME COM
WHERE COM.FIELD1 <> V_FIELD
ORDER BY COM.FIELD3 ASC;
I want to optimize it. This is the explain plan:
Plan
SELECT STATEMENT CHOOSE Cost: 4 Bytes: 90 Cardinality: 6
2 SORT ORDER BY Cost: 4 Bytes: 90 Cardinality: 6
1 TABLE ACCESS FULL OWNER.TABLE_NAME Cost: 2 Bytes: 90 Cardinality: 6
Is there any way to avoid the TAF (Table Access Full)?
Thanks!
Since your WHERE condition is on the column FIELD1, an index on that column may help.
You may already have an index on that column. Even then, you will still see a full table access if the expected number of rows that don't have the value of V_FIELD in that column is sufficiently large.
The only case in which you will NOT see a full table access is when you have an index on that column, the vast majority (say, at least 80% to 90%) of rows in the table do have that value in FIELD1, statistics are up to date, and, perhaps, a histogram is in place (because in that case the distribution of values in FIELD1 would be very skewed).
I suppose that your table has a very large number of rows with a given key (let's call it 'B') and a very small number of rows with other keys.
Note that index access will work only for the condition FIELD1 <> 'B'; all other predicates will return the 'B' rows and are therefore not suitable for index access.
Note also that if you have more than one large key, index access will not work for the same reason: you will never get back only the few records where an index pays off.
As a starting point you can reformulate the predicate
FIELD1 <> V_FIELD
as
DECODE(FIELD1,V_FIELD,1,0) = 0
The DECODE returns 1 if FIELD1 = V_FIELD and 0 if FIELD1 <> V_FIELD.
This transformation allows you to define a function-based index on the DECODE expression.
Example
create table tt as
select
decode(mod(rownum,10000),1,'A','B') FIELD1
from dual connect by level <= 50000;
select field1, count(*) from tt group by field1;
FIELD1 COUNT(*)
------ ----------
A 5
B 49995
Function-based index
create index tti on tt(decode(field1,'B',1,0));
Use your large key for the index definition.
Access
To select FIELD1 <> 'B' use reformulated predicate decode(field1,'B',1,0) = 0
Which leads nicely to an index access:
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
SELECT * from tt where decode(field1,'B',1,0) = 0;
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 471 | 2355 | 24 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TT | 471 | 2355 | 24 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TTI | 188 | | 49 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(DECODE("FIELD1",'B',1,0)=0)
To select FIELD1 <> 'A' use reformulated predicate decode(field1,'A',1,0) = 0
Here you don't want index access as nearly the whole table is returned- and the CBO opens FULL TABLE SCAN.
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
SELECT * from tt where decode(field1,'A',1,0) = 0;
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 47066 | 94132 | 26 (4)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TT | 47066 | 94132 | 26 (4)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(DECODE("FIELD1",'A',1,0)=0)
Bind Variables
This will work the same way even if you use a bind variable in FIELD1 <> V_FIELD, provided you always pass the same value.
Bind variable peeking will evaluate the value at the first parse and generate the proper plan.
If you use more than one value for the bind variable (and therefore expect different plans for different values), you will want to learn about the adaptive cursor sharing feature.
The query is already optimized, don't spend any more time on it unless it's running noticeably slow. If you have a tuning checklist that says "avoid all full table scans" it might be time to change that checklist.
The cost of the full table scan is only 2. The exact meaning of the cost is tricky, and not always particularly helpful. But in this case it's probably safe to say that 2 means the full table scan will run quickly.
If the query is not running in less than a few microseconds, or is returning significantly more than the estimated 6 rows, then there may be a problem with the optimizer statistics. If that's the case, try gathering statistics like this:
begin
dbms_stats.gather_table_stats('OWNER', 'TABLE_NAME');
end;
/
As #symcbean pointed out, a full table scan is not always a bad thing. If a table is incredibly small, like this one might be, all the data may fit inside a single block. (Oracle accesses data by block(s)-at-a-time, where the block is usually 8KB of data.) When the data structures are trivially small there won't be any significant difference between using a table or an index.
Also, full table scans can use multi-block reads, whereas most index access paths use single-block reads. For reading a large percentage of data it's faster to read the whole thing with multi-block reads than reading it one-block-at-a-time with an index. Since this query only has a <> condition, it looks likely that this query will read a large percentage of data and a full table scan is optimal.

Efficient retrieval of overlapping IP range records via a single point

I have a table with millions of IP range records (start_num, end_num respectively) which I need to query via a single IP address in order to return all ranges which overlap that point. The query is essentially:
SELECT start_num
, end_num
, other_data_col
FROM ip_ranges
WHERE :query_ip BETWEEN start_num and end_num;
The table has 8 range partitions on start_num and has a local composite index on (start_num, end_num). Call it UNQ_RANGE_IDX. Statistics have been gathered on the table and index.
The query does an index range scan on the UNQ_RANGE_IDX index as expected and in some cases performs very well. The cases where it performs well are toward the bottom of the IP address space (e.g. something like 4.4.10.20), and performance is poor at the upper end (e.g. 200.2.2.2). I'm sure the problem is that at the lower end the optimizer can prune all the partitions above the one containing the applicable ranges, because the range partitioning on start_num provides the information needed to prune. When querying the top end of the IP spectrum, it can't prune the lower partitions and therefore incurs the I/O of reading the additional index partitions. This can be verified via the number of CR_BUFFER_GETS when tracing the execution.
In reality, the ranges satisfying the query won't be in any partition other than the one that query_ip falls in, or the one immediately below or above it, since no range is bigger than a class A and each partition covers many class A networks. I can make Oracle use that piece of information by specifying it in the where clause (see the sketch below), but is there a way to convey this type of information to Oracle via stats, histograms, or a custom/domain index? It seems there would be a common solution/approach to this sort of problem when searching for date ranges that cover a specific date as well.
I'm looking for solutions that use Oracle and its functionality to tackle this problem, but other solution types are appreciated. I've thought of a couple methods outside the scope of Oracle that would work, but I'm hoping for a better means of indexing, statistics gathering, or partitioning that will do the trick.
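For concreteness, the extra where-clause condition mentioned above could look like this (a sketch, assuming no range is wider than a class A, i.e. 2^24 addresses):
SELECT start_num
, end_num
, other_data_col
FROM ip_ranges
WHERE :query_ip BETWEEN start_num AND end_num
AND start_num >= :query_ip - 16777216; -- lower bound on start_num lets the optimizer prune partitions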
Requested Info:
CREATE TABLE IP_RANGES (
START_NUM NUMBER NOT NULL,
END_NUM NUMBER NOT NULL,
OTHER NUMBER NOT NULL,
CONSTRAINT START_LTE_END CHECK (START_NUM <= END_NUM)
)
PARTITION BY RANGE(START_NUM)
(
PARTITION part1 VALUES LESS THAN(1090519040) TABLESPACE USERS,
PARTITION part2 VALUES LESS THAN(1207959552) TABLESPACE USERS
....<snip>....
PARTITION part8 VALUES LESS THAN(MAXVALUE) TABLESPACE USERS
);
CREATE UNIQUE INDEX IP_RANGES_IDX ON IP_RANGES(START_NUM, END_NUM, OTHER) LOCAL NOLOGGING;
ALTER TABLE IP_RANGES ADD CONSTRAINT PK_IP_RANGE
PRIMARY KEY(START_NUM, END_NUM, OTHER) USING INDEX IP_RANGES_IDX;
There is nothing special about the cutoff values selected for the range partitions. They are simply A class addresses where the number of ranges per partition would equate to about 1M records.
I've had a similar problem in the past; the advantage I had was that my ranges were distinct. I've got several IP_RANGES tables, each for a specific context, and the largest is ~10 million or so records, unpartitioned.
Each of the tables I have is index-organized, with the primary key being (END_NUM, START_NUM). I've also got a unique index on (START_NUM, END_NUM), but it's not used in this case.
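Roughly, the structure is this (a sketch; the table and constraint names here are made up):
create table ip_ranges
( num_start number not null
, num_end number not null
, other_data_col number
, constraint ip_ranges_pk primary key (num_end, num_start)
)
organization index;

create unique index ip_ranges_uk on ip_ranges (num_start, num_end);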
Using a random IP address (1234567890), your query takes about 132k consistent gets.
The query below returns in between 4-10 consistent gets (depending on IP) on 10.2.0.4.
select *
from ip_ranges outr
where :ip_addr between outr.num_start and outr.num_end
and outr.num_end = (select /*+ no_unnest */
min(innr.num_end)
from ip_ranges innr
where innr.num_end >= :ip_addr);
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 70 | 6 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN | IP_RANGES_PK | 1 | 70 | 3 (0)| 00:00:01 |
| 2 | SORT AGGREGATE | | 1 | 7 | | |
| 3 | FIRST ROW | | 471K| 3223K| 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN (MIN/MAX)| IP_RANGES_PK | 471K| 3223K| 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("OUTR"."NUM_END"= (SELECT /*+ NO_UNNEST */ MIN("INNR"."NUM_END") FROM
"IP_RANGES" "INNR" WHERE "INNR"."NUM_END">=TO_NUMBER(:IP_ADDR)) AND
"OUTR"."NUM_START"<=TO_NUMBER(:IP_ADDR))
filter("OUTR"."NUM_END">=TO_NUMBER(:IP_ADDR))
4 - access("INNR"."NUM_END">=TO_NUMBER(:IP_ADDR))
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
968 bytes sent via SQL*Net to client
492 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
The NO_UNNEST hint is key; it tells Oracle to run that subquery once, not once for each row, and it gives an equality test for the index to use in the outer query.
I suggest you turn your 8 million row table into a bigger table.
Google's IP (for me, at the moment) is coming up as
"66.102.011.104"
You store one record as "66.102.011" with the respective range(s) that it falls in. In fact you store at least one record for every "aaa.bbb.ccc". You'll probably end up with a table maybe five times as big, but one where you can pinpoint the relevant record with just a few logical IOs each time, rather than the hundreds or thousands for a partition scan.
I suspect any data you have is going to be a little out of date anyway (as various authorities around the world issue/re-issue ranges), so regenerating adjustments for that table on a daily/weekly basis shouldn't be a big problem.
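A rough way to generate such a lookup table (a sketch; it assumes 32-bit IP numbers and that no range spans more than a class A, i.e. 65536 /24 blocks):
create table ip_c_blocks as
select trunc(r.start_num / 256) + (g.n - 1) as c_block -- the "aaa.bbb.ccc" prefix as a number
, r.start_num
, r.end_num
, r.other
from ip_ranges r
join (select level as n from dual connect by level <= 65536) g
on g.n <= trunc(r.end_num / 256) - trunc(r.start_num / 256) + 1;

create index ip_c_blocks_idx on ip_c_blocks (c_block);

select *
from ip_c_blocks
where c_block = trunc(:query_ip / 256)
and :query_ip between start_num and end_num;
The lookup pins the search to the /24 block containing the address, and the BETWEEN confirms the exact range.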
The problem I see is the local partitioned index; as you said, it looks like Oracle doesn't prune the partition list efficiently. Can you try a global index? Local partitioned indexes don't scale well for OLTP queries. In our environment we don't use any local partitioned indexes.
Would you please indicate if there are any uniform or ordered characteristics of your IP ranges? For example, I would normally expect IP ranges to lie on power-of-2 boundaries. Is that the case here, so we can assume that all ranges have an implicit net mask that starts with m ones followed by n zeroes where m + n = 32?
If so, there should be a way to exploit this knowledge and "step" into the ranges. Would it be possible to add an index on a calculated value with the count of the masked bits (0-32) or maybe the block size (1 to 2^32)?
At most 33 seeks (one per mask length from 0 to 32), using just the start_num, would be faster than a scan using BETWEEN start_num AND end_num.
Also, have you considered bit arithmetic as a possible means of checking for matches (again only if the ranges represent evenly-positioned chunks in power-of-2 sizes).
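If that power-of-2 assumption holds, the per-mask probing could look like this (a sketch; it assumes every range is an aligned CIDR block, so for a given prefix length m the candidate start_num and end_num are fully determined by the address):
select r.*
from (select level - 1 as m from dual connect by level <= 33) masks -- m = 0 .. 32
join ip_ranges r
on r.start_num = trunc(:query_ip / power(2, 32 - masks.m)) * power(2, 32 - masks.m)
and r.end_num = r.start_num + power(2, 32 - masks.m) - 1;
Each mask length contributes at most one candidate block, and the address lies inside that block by construction, so the whole lookup is a handful of index probes on (start_num, end_num) rather than a range scan.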
Firstly, what is your performance requirement?
Your partitions have definite start and end values which can be determined from ALL_TAB_PARTITIONS (or hard-coded) and used in a function (concept below, but you'd need to amend it to go one partition forward/back).
You should then be able to code
SELECT * FROM ip_ranges
WHERE :query_ip BETWEEN start_num and end_num
AND start_num between get_part_start(:query_ip) and get_part_end(:query_ip);
Which should be able to lock it down to specific partition(s). However if, as you suggest, you can only lock it down to three out of eight partitions, that is still going to be a BIG scan. I'm posting another, more radical answer as well which may be more appropriate.
create or replace function get_part_start (i_val in number)
return number deterministic is
  cursor c_1 is
    select high_value
    from all_tab_partitions
    where table_name = 'IP_RANGES'
    order by partition_position;  -- boundaries in partition order
  type tab_char is table of varchar2(20) index by pls_integer;
  type tab_num is table of number index by pls_integer;
  t_char tab_char;
  t_num tab_num;
  v_ind number;
begin
  open c_1;
  fetch c_1 bulk collect into t_char;
  close c_1;
  -- build an associative array keyed by the numeric partition high values
  for i in 1..t_char.last loop
    if t_char(i) != 'MAXVALUE' then
      t_num(to_number(t_char(i))) := null;
    end if;
  end loop;
  -- values outside the known boundaries
  if i_val > t_num.last then
    return t_num.last;
  elsif i_val < t_num.first then
    return 0;
  end if;
  -- walk the boundaries until we pass i_val
  v_ind := 0;
  while i_val >= t_num.next(v_ind) loop
    v_ind := t_num.next(v_ind);
    exit when v_ind is null;
  end loop;
  return v_ind;
end;
/
Your existing partitioning doesn't work, because Oracle is accessing the table's local index partitions by start_num, and it's got to check each one where there could be a match.
A different solution, assuming no ranges span a class A, would be to list partition by trunc(start_num / power(256,3)) -- the first octet. It may be worth breaking it out into a column (populated via trigger) and adding that as a filtering column to your query.
Your ~10m rows would then, assuming an even distribution, be spread out into about 40k rows per partition, which might be a lot faster to read through.
I ran the use case discussed below, assuming that no range spans a class A network.
create table ip_ranges
(start_num number not null,
end_num number not null,
start_first_octet number not null,
...
constraint start_lte_end check (start_num <= end_num),
constraint check_first_octet check (start_first_octet = trunc(start_num / 16777216) )
)
partition by list ( start_first_octet )
(
partition p_0 values (0),
partition p_1 values (1),
partition p_2 values (2),
...
partition p_255 values (255)
);
-- run data population script, ordered by start_num, end_num
create index ip_ranges_idx01 on ip_ranges (start_num, end_num) local;
begin
dbms_stats.gather_table_stats (ownname => user, tabname => 'IP_RANGES', cascade => true);
end;
/
Using the base query above still performs poorly, as it's unable to do effective partition elimination:
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25464 | 1840K| 845 (1)| 00:00:05 | | |
| 1 | PARTITION LIST ALL | | 25464 | 1840K| 845 (1)| 00:00:05 | 1 | 256 |
| 2 | TABLE ACCESS BY LOCAL INDEX ROWID| IP_RANGES | 25464 | 1840K| 845 (1)| 00:00:05 | 1 | 256 |
|* 3 | INDEX RANGE SCAN | IP_RANGES_IDX01 | 825 | | 833 (1)| 00:00:05 | 1 | 256 |
----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("END_NUM">=TO_NUMBER(:IP_ADDR) AND "START_NUM"<=TO_NUMBER(:IP_ADDR))
filter("END_NUM">=TO_NUMBER(:IP_ADDR))
Statistics
----------------------------------------------------------
15 recursive calls
0 db block gets
141278 consistent gets
94469 physical reads
0 redo size
1040 bytes sent via SQL*Net to client
492 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
However, if we add the condition to allow Oracle to focus on a single partition, it makes a huge difference:
SQL> select * from ip_ranges
2 where :ip_addr between start_num and end_num
3 and start_first_octet = trunc(:ip_addr / power(256,3));
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 183 | 13542 | 126 (2)| 00:00:01 | | |
| 1 | PARTITION LIST SINGLE | | 183 | 13542 | 126 (2)| 00:00:01 | KEY | KEY |
| 2 | TABLE ACCESS BY LOCAL INDEX ROWID| IP_RANGES | 183 | 13542 | 126 (2)| 00:00:01 | KEY | KEY |
|* 3 | INDEX RANGE SCAN | IP_RANGES_IDX01 | 3 | | 322 (1)| 00:00:02 | KEY | KEY |
----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("END_NUM">=TO_NUMBER(:IP_ADDR) AND "START_NUM"<=TO_NUMBER(:IP_ADDR))
filter("END_NUM">=TO_NUMBER(:IP_ADDR))
Statistics
----------------------------------------------------------
15 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
1040 bytes sent via SQL*Net to client
492 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

How to optimize an update SQL that runs on a Oracle table with 700M rows

UPDATE [TABLE] SET [FIELD]=0 WHERE [FIELD] IS NULL
[TABLE] is an Oracle database table with more than 700 million rows. I cancelled the SQL execution after it had been running for 6 hours.
Is there any SQL hint that could improve performance? Or any other solution to speed that up?
EDIT: This query will be run once and then never again.
First of all, is it a one-time query or a recurring query? If you only have to do it once, you may want to look into running the query in parallel. You will have to scan all rows anyway; you could either divide the workload yourself with ranges of ROWID (do-it-yourself parallelism) or use Oracle's built-in features.
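For the built-in route, DBMS_PARALLEL_EXECUTE (11gR2 and later) will chunk the table by ROWID for you; a sketch, with the task name and chunk size made up:
begin
dbms_parallel_execute.create_task('fix_nulls');
dbms_parallel_execute.create_chunks_by_rowid(
task_name => 'fix_nulls',
table_owner => user,
table_name => '[TABLE]',
by_row => true,
chunk_size => 100000);
dbms_parallel_execute.run_task(
task_name => 'fix_nulls',
sql_stmt => 'update [TABLE] set [FIELD] = 0 where [FIELD] is null and rowid between :start_id and :end_id',
language_flag => dbms_sql.native,
parallel_level => 4);
dbms_parallel_execute.drop_task('fix_nulls');
end;
/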
Assuming you want to run it frequently and want to optimize this query, the number of rows where the FIELD column is NULL will eventually be small compared to the total number of rows. In that case an index could speed things up. Oracle doesn't index rows whose indexed columns are all NULL, so a plain index on FIELD won't get used by your query (since you want to find all rows where FIELD is NULL).
Either:
create an index on (FIELD, 0); the constant 0 acts as a non-NULL second column, so every row of the table gets indexed.
create a function-based index on (CASE WHEN field IS NULL THEN 1 END); this will only index the rows where the field is NULL (the index would therefore be very compact). In that case you would have to rewrite your query:
UPDATE [TABLE] SET [FIELD]=0 WHERE (CASE WHEN field IS NULL THEN 1 END)=1
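As a concrete sketch of the two index options above (index names made up):
create index tab_field_idx on [TABLE] (field, 0);

create index tab_field_isnull_idx on [TABLE] (case when field is null then 1 end);
Only the second index stays small: rows where field is not NULL produce an entirely NULL key for it and are therefore not indexed at all.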
Edit:
Since this is a one-time scenario, you may want to use the PARALLEL hint:
SQL> EXPLAIN PLAN FOR
2 UPDATE /*+ PARALLEL(test_table 4)*/ test_table
3 SET field=0
4 WHERE field IS NULL;
Explained
SQL> select * from table( dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 4026746538
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 22793 | 289K| 12 (9)| 00:00:
| 1 | UPDATE | TEST_TABLE | | | |
| 2 | PX COORDINATOR | | | | |
| 3 | PX SEND QC (RANDOM)| :TQ10000 | 22793 | 289K| 12 (9)| 00:00:
| 4 | PX BLOCK ITERATOR | | 22793 | 289K| 12 (9)| 00:00:
|* 5 | TABLE ACCESS FULL| TEST_TABLE | 22793 | 289K| 12 (9)| 00:00:
--------------------------------------------------------------------------------
Are other users updating the same rows in the table at the same time?
If so, you could be hitting lots of concurrency issues (waiting for locks) and it may be worth breaking it into smaller transactions.
DECLARE
  v_cnt number := 1;
BEGIN
  WHILE v_cnt > 0 LOOP
    UPDATE [TABLE] SET [FIELD]=0 WHERE [FIELD] IS NULL AND ROWNUM < 50000;
    v_cnt := SQL%ROWCOUNT;
    COMMIT;
  END LOOP;
END;
/
The smaller the ROWNUM limit the less concurrency/locking issues you'll hit, but the more time you'll spend in table scanning.
Vincent already answered your question perfectly, but I'm curious about the "why" behind this action. Why are you updating all NULL's to 0?
Regards,
Rob.
Some suggestions:
Drop any indexes that contain FIELD before running your UPDATE statement, and then re-add them later.
Write a PL/SQL procedure to do this that commits after every 1000 or 10000 rows.
Hope this helps.
You could achieve the same result without updating by using an ALTER TABLE to set the column's DEFAULT value to 0.