Oracle performance using functions in where clause - sql

In a stored procedure (which has a date parameter named 'paramDate') I have a query like this one:
select id, name
from customer
where period_aded = to_char(paramDate,'mm/yyyy')
Will Oracle convert paramDate to a string for each row?
I was sure that Oracle wouldn't, but I was told that it will.
In fact I thought that if the function's parameter was constant (neither a field nor a value calculated inside the query), the result should always be the same, and that's why Oracle should perform this conversion only once.
Then I realized that I've sometimes executed DML statements inside functions, and perhaps this could cause the resulting value to change, even if it does not change for each row.
That would mean I should convert such values myself before adding them to the query.
Anyway, perhaps well-known (built-in) functions are evaluated once, or even my own functions would be too.
Anyway, again...
Will Oracle execute that to_char once, or will it do it for each row?
Thanks for your answers.

I do not think this is generally the case, as it would prevent an index from being used.
At least for built-in functions, Oracle should be able to figure out that it could evaluate it only once. (For user-defined functions, see below).
Here is a case where an index is being used (and the function is not evaluated for every row):
SQL> select id from tbl_table where id > to_char(sysdate, 'YYYY');
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 35 | 140 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| SYS_C004274 | 35 | 140 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID">TO_NUMBER(TO_CHAR(SYSDATE@!,'YYYY')))
For user-defined functions, check out this article. It mentions two ways to ensure that your function gets called only once:
Since Oracle 10.2, you can define the function as DETERMINISTIC.
On older versions you can re-phrase it to use "scalar subquery caching":
SELECT COUNT(*)
FROM EMPLOYEES
WHERE SALARY = (SELECT getValue(1) FROM DUAL);

Looking at write-ups on the DETERMINISTIC keyword (here is one, here is another), it was introduced to allow the developer to tell Oracle that the function will return the same value for the same input params. So if you want your functions to be called only once, and you can guarantee they will always return the same value for the same input params you can use the keyword DETERMINISTIC.
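For illustration, here is a minimal sketch of such a declaration (the function name format_period is made up; it simply wraps the to_char call from the question):
CREATE OR REPLACE FUNCTION format_period (p_date IN DATE)
RETURN VARCHAR2 DETERMINISTIC
AS
BEGIN
  -- the result depends only on the input, so Oracle is allowed to cache and reuse it
  RETURN TO_CHAR(p_date, 'mm/yyyy');
END;
/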
With regards to built-in functions like to_char, I defer to those who are better versed in the innards of Oracle to give you direction.

The concern about to_char does not ring a bell with me. However, in your PL/SQL you could have:
create or replace procedure ........ as
  some_variable varchar2(128);
begin
  some_variable := to_char(paramDate, 'mm/yyyy');
  -- and your query could read
  select id, name from customer where period_aded = some_variable;
  .
  .
  .
end;
/
Kt

Related

Why can't I use the column name as the alias when I operate with dates

Currently I am migrating a database from SQL Server to Spark using Hive SQL.
I had an issue when trying to convert a number to a date format. I found the answer is:
from_unixtime(unix_timestamp(cast(DATE as string) , 'dd-MM-yyyy'))
When I execute this query it brings me the data; notice that I put an alias different from the name of the column FECHA:
SELECT FROM_UNIXTIME(UNIX_TIMESTAMP(CAST(FECHA AS STRING ) ,'yyyyMMdd'), 'yyyy-MM-dd') AS FECHA_1
FROM reportes_hechos_avisos_diarios
LIMIT 1
| FECHA_1 |
| -------- |
| 2019-01-01 |
But when I use the same alias as the column name, it brings back inconsistent information:
SELECT FROM_UNIXTIME(UNIX_TIMESTAMP(CAST(FECHA AS STRING ) ,'yyyyMMdd'), 'yyyy-MM-dd') AS FECHA
FROM reportes_hechos_avisos_diarios
LIMIT 1
| FECHA |
| -------- |
| 2.019 |
I know the trivial answer is to use an alias that is not the same as the column name, but I have a Tableau implementation that feeds from this query, and it's complicated to change the columns because I would basically have to change the whole implementation, so I need to preserve the column name. This query works for me in SQL Server, but I don't know why it doesn't work in Hive.
PS: Thanks for your attention; this is the first question I've asked on Stack Overflow and my native language is not English, so sorry for any grammatical errors.
limit 1 without order by can produce non-deterministic results from run to run, because the order of rows is effectively random under parallel execution; some factors may affect it, but getting the same row is not guaranteed.
What is happening, I guess, is that you are receiving a different row each time, and the date is corrupted in that row, which is why a weird result is returned.
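As a sketch, one way to make the row choice deterministic is to order before limiting (a distinct alias is kept here purely to sidestep the naming clash):
SELECT FROM_UNIXTIME(UNIX_TIMESTAMP(CAST(FECHA AS STRING) ,'yyyyMMdd'), 'yyyy-MM-dd') AS FECHA_1
FROM reportes_hechos_avisos_diarios
ORDER BY FECHA_1
LIMIT 1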
Also, you can use another method of conversion:
select date(regexp_replace(cast(20200101 as string),'(\\d{4})(\\d{2})(\\d{2})','$1-$2-$3')) --put your column instead of constant.
Result:
2020-01-01

Index is not being used by optimizer

I have a query which is performing very badly due to a full scan of a table. I have checked the statistics and rebuilt the indexes, but it's not working.
SQL Statement:
select distinct NA_DIR_EMAIL d, NA_DIR_EMAIL r
from gcr_items , gcr_deals
where gcr_deals.GCR_DEALS_ID=gcr_items.GCR_DEALS_ID
and
gcr_deals.bu_id=:P0_BU_ID
and
decode(:P55_DIRECT,'ALL','Y',trim(upper(NA_ORG_OWNER_EMAIL)))=
decode(:P55_DIRECT,'ALL','Y',trim(upper(:P55_DIRECT)))
order by 1
Execution Plan :
Plan hash value: 3180018891
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8 | 00:11:42 |
| 1 | SORT ORDER BY | | 8 | 00:11:42 |
| 2 | HASH UNIQUE | | 8 | 00:11:42 |
|* 3 | HASH JOIN | | 7385 | 00:11:42 |
|* 4 | VIEW | index$_join$_002 | 10462 | 00:00:05 |
|* 5 | HASH JOIN | | | |
|* 6 | INDEX RANGE SCAN | GCR_DEALS_IDX12 | 10462 | 00:00:01 |
| 7 | INDEX FAST FULL SCAN| GCR_DEALS_IDX1 | 10462 | 00:00:06 |
|* 8 | TABLE ACCESS FULL | GCR_ITEMS | 7386 | 00:11:37 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("GCR_DEALS"."GCR_DEALS_ID"="GCR_ITEMS"."GCR_DEALS_ID")
4 - filter("GCR_DEALS"."BU_ID"=TO_NUMBER(:P0_BU_ID))
5 - access(ROWID=ROWID)
6 - access("GCR_DEALS"."BU_ID"=TO_NUMBER(:P0_BU_ID))
8 - filter(DECODE(:P55_DIRECT,'ALL','Y',TRIM(UPPER("NA_ORG_OWNER_EMAI
L")))=DECODE(:P55_DIRECT,'ALL','Y',TRIM(UPPER(:P55_DIRECT))))
First, a part of the condition in the WHERE clause must be decomposed (or "decompiled", or "reengineered") into a simpler form without the decode function, a form that the query optimizer can understand:
AND
decode(:P55_DIRECT,'ALL','Y',trim(upper(NA_ORG_OWNER_EMAIL)))=
decode(:P55_DIRECT,'ALL','Y',trim(upper(:P55_DIRECT)))
into:
AND (
:P55_DIRECT = 'ALL'
OR
trim(upper(:P55_DIRECT)) = trim(upper(NA_ORG_OWNER_EMAIL))
)
To find rows in the table based on values stored in the index, Oracle uses an access method named index scan; see this link for details:
https://docs.oracle.com/cd/B19306_01/server.102/b14211/optimops.htm#i52300
One of the most common access methods is the index range scan, described here:
https://docs.oracle.com/cd/B19306_01/server.102/b14211/optimops.htm#i45075
The documentation says (in the latter link) that:
The optimizer uses a range scan when it finds one or more leading
columns of an index specified in conditions, such as the following:
col1 = :b1
col1 < :b1
col1 > :b1
AND combination of the preceding conditions for leading columns in the
index
col1 like 'ASD%' (wild-card searches should not be in a leading position; the condition col1 like '%ASD' does not result in a range scan).
The above means that the optimizer is able to use the index only for query conditions that contain basic comparison operators: = < > <= >= LIKE, used to compare plain column names with simple values. What the documentation doesn't clearly say - you need to read between the lines to deduce it - is that when some function is used in the condition, in the form function( column_name ) or function( expression_involving_column_names ), then an index range scan cannot be used. In that case the database must evaluate the expression individually for each row in the table, and therefore must read all rows (perform a full table scan).
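For example (illustrative only, using a hypothetical employees table with an index on hire_date):
-- index range scan possible: plain column compared with a simple value
SELECT * FROM employees WHERE hire_date >= DATE '2020-01-01';
-- the function hides the column from the index: full table scan
SELECT * FROM employees WHERE TO_CHAR(hire_date, 'yyyy') = '2020';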
A short conclusion and a rule of thumb:
Functions in the WHERE clause can prevent the optimizer from using indexes.
If you see a function somewhere in the WHERE clause, treat it as a red light: STOP immediately and think three times about how this function impacts the query optimizer and the performance of your query, then try to rewrite the condition in a form that the optimizer is able to understand.
Now take a look at our rewritten condition:
AND (
:P55_DIRECT = 'ALL'
OR
trim(upper(:P55_DIRECT)) = trim(upper(NA_ORG_OWNER_EMAIL))
)
and STOP - there are still two functions, trim and upper, applied to a column named NA_ORG_OWNER_EMAIL. We need to think about how they can impact the query optimizer.
I assume that you have created a plain index on a single column: CREATE INDEX somename ON GCR_ITEMS( NA_ORG_OWNER_EMAIL ). If so, then the index contains only plain values of NA_ORG_OWNER_EMAIL.
But the query is trying to find trim(upper(NA_ORG_OWNER_EMAIL)) values, which are not stored in the index, so this index cannot be used in this case.
This condition requires a function-based index:
https://docs.oracle.com/cd/E11882_01/appdev.112/e41502/adfns_indexes.htm#ADFNS00505
CREATE INDEX somename ON GCR_ITEMS( trim( upper( NA_ORG_OWNER_EMAIL )))
Unfortunately even the function-based index will still not help, because the condition in the query is too general: if the value of :P55_DIRECT is 'ALL', the query must retrieve all rows from the table (perform a full table scan); otherwise it must use the index to search for the value.
This is because the query is planned (think of it as "compiled") by the query optimizer only once, during its first execution. The plan is then stored in the cache and reused for all further executions. The value of the parameter is not known in advance, so the plan must cover every possible case, and thus will always perform a full table scan.
In 12c there is a new feature, "adaptive query optimization":
https://docs.oracle.com/database/121/TGSQL/tgsql_optcncpt.htm#TGSQL94982
where the query optimizer analyses the query's parameters on each run, is able to detect that the plan is not optimal for some runtime parameter values, and can choose better "subplans" depending on the actual values ... but you must use 12c, and additionally pay for Enterprise Edition, because only that edition includes this feature. And it's still not certain whether the adaptive plan will work in this case or not.
What you can do without paying for 12c EE is to DIVIDE this general query into two separate variants, one for the case where :P55_DIRECT = 'ALL' and one for the remaining cases, and run the appropriate variant from the client (your application) depending on the value of this parameter (a sketch of the branching follows below).
A version for :P55_DIRECT = 'ALL', which will perform a full table scan:
where gcr_deals.GCR_DEALS_ID=gcr_items.GCR_DEALS_ID
and
gcr_deals.bu_id=:P0_BU_ID
order by 1
and a version for the other cases, which will use the function-based index:
where gcr_deals.GCR_DEALS_ID=gcr_items.GCR_DEALS_ID
and
gcr_deals.bu_id=:P0_BU_ID
and
trim(upper(:P55_DIRECT)) = trim(upper(NA_ORG_OWNER_EMAIL))
order by 1
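A minimal sketch of the branching (the SYS_REFCURSOR plumbing here is hypothetical; only the two WHERE clauses above come from the answer):
DECLARE
  v_cur SYS_REFCURSOR;
BEGIN
  IF :P55_DIRECT = 'ALL' THEN
    -- general case: full table scan is unavoidable and correct here
    OPEN v_cur FOR
      SELECT DISTINCT na_dir_email d, na_dir_email r
      FROM gcr_items, gcr_deals
      WHERE gcr_deals.gcr_deals_id = gcr_items.gcr_deals_id
        AND gcr_deals.bu_id = :P0_BU_ID
      ORDER BY 1;
  ELSE
    -- specific case: can use the function-based index
    OPEN v_cur FOR
      SELECT DISTINCT na_dir_email d, na_dir_email r
      FROM gcr_items, gcr_deals
      WHERE gcr_deals.gcr_deals_id = gcr_items.gcr_deals_id
        AND gcr_deals.bu_id = :P0_BU_ID
        AND trim(upper(:P55_DIRECT)) = trim(upper(na_org_owner_email))
      ORDER BY 1;
  END IF;
  -- fetch from v_cur as usual ...
END;
/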

Is it allowed to use LIKE for NUMBER datatype?

There is a procedure which tries to fetch details of one or more projects from the PROJECTS table.
The snippet goes here:
PROCEDURE GET_PROJECTS (
P_PROJECT_ID_LIKE IN VARCHAR2 DEFAULT '%',
P_SEPARATOR IN VARCHAR2 DEFAULT '-=-' )
AS
CURSOR PROJECTS_CURSOR IS
.....
WHERE
PROJECT_ID LIKE P_PROJECT_ID_LIKE
The concern is:
PROJECT_ID has a datatype - NUMBER.
P_PROJECT_ID_LIKE has a datatype - VARCHAR2.
I am wondering how LIKE can be used on PROJECT_ID?
It is working perfectly fine for
GET_PROJECTS('%','-=-');
GET_PROJECTS('28','-=-')
Any insight would be a great help!
An implicit type conversion will take place. Notice 1 - filter(TO_CHAR("N") LIKE 'asdf%') in the predicate information section.
15:13:51 (133)LKU#sandbox> create table t (n number);
Table created.
Elapsed: 00:00:00.10
15:14:22 (133)LKU#sandbox> select * from t where n like 'asdf%'
15:14:37 2
15:14:37 (133)LKU#sandbox> #xplan
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 13 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1 / T#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TO_CHAR("N") LIKE 'asdf%')
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]
In either case, it doesn't make much sense to filter identifiers using the like operator. If you want to get all values when a certain condition is met, then you should probably do it sort of this way:
where project_id = P_PROJECT_ID or P_PROJECT_ID = -1
using basically any numeric value that is not a valid project id.
As @be here now suggested, an implicit conversion will take place. This will work 99% of the time, but there are some caveats when reaching big numbers.
Take this scenario for example,
SQL> select to_char(power(2,140)) from dual;
TO_CHAR(POWER(2,140))
----------------------------------------
1.3937965749081639463459823920405226E+42
The number was converted to char with exponential notation, so some strings might not match.
If you don't reach these numbers you should be fine.
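For instance (illustrative, reusing the value above), the implicit TO_CHAR switches to scientific notation, so a pattern built from the leading digits no longer matches:
SELECT COUNT(*) FROM dual WHERE power(2,140) LIKE '1393%';
-- count is 0: the value is compared as '1.3937965749081639463459823920405226E+42'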
Although this is an Oracle question, take some advice from the Zen of Python:
Explicit is better than implicit.
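Applied to the cursor from the question, that advice would look something like this sketch (select list elided as in the original):
CURSOR PROJECTS_CURSOR IS
  SELECT .....
  FROM PROJECTS
  -- the conversion is now visible and under your control
  WHERE TO_CHAR(PROJECT_ID) LIKE P_PROJECT_ID_LIKE;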
I didn't check this in Oracle, but in SQL Server it is possible to use like against an INT attribute,
for example:
Create table mytable(
id int not null ,
name varchar(50))
then you can select with like:
select * from mytable where id like '123%'
This works because the INT value is implicitly converted to a character type for the comparison.
So, in conclusion, this might be possible in Oracle too; please check.

What is the maximum row count in Oracle's nested table?

CREATE TYPE nums_list AS TABLE OF NUMBER;
What is the maximum possible row count in an Oracle nested table?
UPDATE
CREATE TYPE nums_list AS TABLE OF NUMBER;
CREATE OR REPLACE FUNCTION generate_series(from_n NUMBER, to_n NUMBER)
RETURN nums_list AS
ret_table nums_list := nums_list();
BEGIN
FOR i IN from_n..to_n LOOP
ret_table.EXTEND;
ret_table(i) := i;
END LOOP;
RETURN ret_table;
END;
SELECT count(*) FROM TABLE ( generate_series(1,4555555) );
This gives the error: ORA-22813: operand value exceeds system limits (Object or Collection value was too large).
The range of subscripts for a nested table is 1..2**31, so you can have 2**31 elements in the collection. That limit hasn't changed since at least 8.1.6, though of course it might change in the future.
Just as an additional observation, it isn't the nested table itself that is too large or using too much memory. With an exception handler you can see that the error is not being thrown by your function. You can populate the same thing in an anonymous block:
DECLARE
ret_table nums_list := nums_list();
BEGIN
FOR i IN 1..4555555 LOOP
ret_table.EXTEND;
ret_table(i) := i;
END LOOP;
dbms_output.put_line(ret_table.count);
END;
/
anonymous block completed
4555555
And you can call your function from a block too:
DECLARE
ret_table nums_list;
BEGIN
ret_table := generate_series(1,4555555);
dbms_output.put_line(ret_table.count);
END;
/
anonymous block completed
4555555
It's only when you use it as table collection expression that you get an error:
SQL Error: ORA-22813: operand value exceeds system limits
22813. 00000 - "operand value exceeds system limits"
*Cause: Object or Collection value was too large. The size of the value
might have exceeded 30k in a SORT context, or the size might be
too big for available memory.
*Action: Choose another value and retry the operation.
The cause text refers to the SORT context, and a sort is being done by your query:
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 | 29 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 2 | | |
| 2 | COLLECTION ITERATOR PICKLER FETCH| GENERATE_SERIES | 8168 | 16336 | 29 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
As @a_horse_with_no_name suggested, you can avoid the problem by making your function pipelined:
CREATE OR REPLACE FUNCTION generate_series(from_n NUMBER, to_n NUMBER)
RETURN nums_list PIPELINED AS
BEGIN
FOR i IN from_n..to_n LOOP
PIPE ROW (i);
END LOOP;
RETURN;
END;
/
SELECT count(*) FROM TABLE ( generate_series(1,4555555) );
COUNT(*)
----------
4555555
That still does a SORT AGGREGATE but it doesn't seem to mind any more. Not really sure why it does that in either case; perhaps someone else will be able to explain what it's doing. (I'm doing this in an 11gR2 instance by the way; I don't have a 12c instance to verify the behaviour is the same, but your symptoms suggest it will be). Or maybe it isn't the SORT context that's the issue, perhaps it is available memory. In my environment your version seems to consistently work up to 4177918 elements - which doesn't seem to be a significant number, so is likely to be environment related?
But it depends how you intend to use the collection; from a PL/SQL context your original version might be more suitable.

Oracle <> , != , ^= operators

I want to know the difference between those operators, mainly their performance difference.
I have had a look at Difference between <> and != in SQL, but it has no performance-related information.
Then I found this on dba-oracle.com; it suggests that from 10.2 onwards the performance can be quite different.
I wonder why? Does != always perform better than <>?
NOTE: Our tests, and performance on the live system, show that changing from <> to != has a big impact on the time queries return in. I am here to ask WHY this is happening, not whether they are the same or not. I know semantically they are, but in reality they are different.
I have tested the performance of the different syntax for the not equal operator in Oracle. I have tried to eliminate all outside influence to the test.
I am using an 11.2.0.3 database. No other sessions are connected and the database was restarted before commencing the tests.
A schema was created with a single table and a sequence for the primary key
CREATE TABLE loadtest.load_test (
id NUMBER NOT NULL,
a VARCHAR2(1) NOT NULL,
n NUMBER(2) NOT NULL,
t TIMESTAMP NOT NULL
);
CREATE SEQUENCE loadtest.load_test_seq
START WITH 0
MINVALUE 0;
The table was indexed to improve the performance of the query.
ALTER TABLE loadtest.load_test
ADD CONSTRAINT pk_load_test
PRIMARY KEY (id)
USING INDEX;
CREATE INDEX loadtest.load_test_i1
ON loadtest.load_test (a, n);
Ten million rows were added to the table using the sequence, SYSDATE for the timestamp and random data via DBMS_RANDOM (A-Z) and (0-99) for the other two fields.
SELECT COUNT(*) FROM load_test;
COUNT(*)
----------
10000000
1 row selected.
The schema was analysed to provide good statistics.
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'LOADTEST', estimate_percent => NULL, cascade => TRUE);
The three simple queries are:-
SELECT a, COUNT(*) FROM load_test WHERE n <> 5 GROUP BY a ORDER BY a;
SELECT a, COUNT(*) FROM load_test WHERE n != 5 GROUP BY a ORDER BY a;
SELECT a, COUNT(*) FROM load_test WHERE n ^= 5 GROUP BY a ORDER BY a;
These are exactly the same with the exception of the syntax for the not-equals operator (not just <> and != but also ^=).
First each query is run without collecting the result in order to eliminate the effect of caching.
Next timing and autotrace were switched on to gather both the actual run time of the query and the execution plan.
SET TIMING ON
SET AUTOTRACE TRACE
Now the queries are run in turn. First up is <>
> SELECT a, COUNT(*) FROM load_test WHERE n <> 5 GROUP BY a ORDER BY a;
26 rows selected.
Elapsed: 00:00:02.12
Execution Plan
----------------------------------------------------------
Plan hash value: 2978325580
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 26 | 130 | 6626 (9)| 00:01:20 |
| 1 | SORT GROUP BY | | 26 | 130 | 6626 (9)| 00:01:20 |
|* 2 | INDEX FAST FULL SCAN| LOAD_TEST_I1 | 9898K| 47M| 6132 (2)| 00:01:14 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("N"<>5)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
22376 consistent gets
22353 physical reads
0 redo size
751 bytes sent via SQL*Net to client
459 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
26 rows processed
Next !=
> SELECT a, COUNT(*) FROM load_test WHERE n != 5 GROUP BY a ORDER BY a;
26 rows selected.
Elapsed: 00:00:02.13
Execution Plan
----------------------------------------------------------
Plan hash value: 2978325580
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 26 | 130 | 6626 (9)| 00:01:20 |
| 1 | SORT GROUP BY | | 26 | 130 | 6626 (9)| 00:01:20 |
|* 2 | INDEX FAST FULL SCAN| LOAD_TEST_I1 | 9898K| 47M| 6132 (2)| 00:01:14 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("N"<>5)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
22376 consistent gets
22353 physical reads
0 redo size
751 bytes sent via SQL*Net to client
459 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
26 rows processed
Lastly ^=
> SELECT a, COUNT(*) FROM load_test WHERE n ^= 5 GROUP BY a ORDER BY a;
26 rows selected.
Elapsed: 00:00:02.10
Execution Plan
----------------------------------------------------------
Plan hash value: 2978325580
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 26 | 130 | 6626 (9)| 00:01:20 |
| 1 | SORT GROUP BY | | 26 | 130 | 6626 (9)| 00:01:20 |
|* 2 | INDEX FAST FULL SCAN| LOAD_TEST_I1 | 9898K| 47M| 6132 (2)| 00:01:14 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("N"<>5)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
22376 consistent gets
22353 physical reads
0 redo size
751 bytes sent via SQL*Net to client
459 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
26 rows processed
The execution plan for the three queries is identical and the timings are 2.12, 2.13 and 2.10 seconds.
It should be noted that whichever syntax is used in the query, the execution plan always displays <>.
The tests were repeated ten times for each operator syntax. These are the timings:-
<>
2.09
2.13
2.12
2.10
2.07
2.09
2.10
2.13
2.13
2.10
!=
2.09
2.10
2.12
2.10
2.15
2.10
2.12
2.10
2.10
2.12
^=
2.09
2.16
2.10
2.09
2.07
2.16
2.12
2.12
2.09
2.07
Whilst there is some variance of a few hundredths of a second, it is not significant. The results for each of the three syntax choices are the same.
The syntax choices are parsed, optimised and are returned with the same effort in the same time. There is therefore no perceivable benefit from using one over another in this test.
"Ah BC", you say, "in my tests I believe there is a real difference and you can not prove it otherwise".
Yes, I say, that is perfectly true. You have not shown your tests, query, data or results. So I have nothing to say about your results. I have shown that, with all other things being equal, it doesn't matter which syntax you use.
"So why do I see that one is better in my tests?"
Good question. There are several possibilities:-
Your testing is flawed (you did not eliminate outside factors: other workload, caching, etc. You have given no information from which we can make an informed decision).
Your query is a special case (show me the query and we can discuss it).
Your data is a special case (perhaps - but how? We don't see that either).
There is some other outside influence.
I have shown via a documented and repeatable process that there is no benefit to using one syntax over another. I believe that <>, != and ^= are synonymous.
If you believe otherwise fine, so
a) show a documented example that I can try myself
and
b) use the syntax which you think is best. If I am correct and there is no difference it won't matter. If you are correct then cool, you have an improvement for very little work.
"But Burleson said it was better and I trust him more than you, Faroult, Lewis, Kyte and all those other bums."
Did he say it was better? I don't think so. He didn't provide any definitive example, test or result but only linked to someone saying that != was better and then quoted some of their post.
Show don't tell.
You reference the article on the Burleson site. Did you follow the link to the Oracle-L archive? And did you read the other emails replying to the email Burleson cites?
I don't think you did, otherwise you wouldn't have asked this question. Because there is no fundamental difference between != and <>. The original observation was almost certainly a fluke brought about by ambient conditions in the database. Read the responses from Jonathan Lewis and Stephane Faroult to understand more.
" Respect is not something a programmer need to have, its the basic
attitude any human being should have"
Up to a point. When we meet a stranger in the street then of course we should be courteous and treat them with respect.
But if that stranger wants me to design my database application in a specific way to "improve performance" then they should have a convincing explanation and some bulletproof test cases to back it up. An isolated anecdote from some random individual is not enough.
The writer of the article, although a book author and the purveyor of some useful information, does not have a good reputation for accuracy. In this case the article was merely a mention of one person's observations on a well-known Oracle mailing list. If you read through the responses you will see the assumptions of the post challenged, but no presumption of accuracy. Here are some excerpts:
Try running your query through explain plan (or autotrace) and see
what that says...
According to this, "!=" is considered to be the same as "<>"...
Jonathan Lewis
Jonathan Lewis is a well respected expert in the Oracle community.
Just out of curiosity... Does the query optimizer generate a different
execution plan for the two queries? Regards, Chris
.
Might it be bind variable peeking in action? The certain effect of
writing != instead of <> is to force a re-parse. If at the first
execution the values for :id were different and if you have an
histogram on claws_doc_id it could be a reason. And if you tell me
that claws_doc_id is the primary key, then I'll ask you what is the
purpose of counting, in particular when the query in the EXISTS clause
is uncorrelated with the outer query and will return the same result
whatever :id is. Looks like a polling query. The code surrounding it
must be interesting.
Stéphane Faroult
.
I'm pretty sure the lexical parse converts either != to <> or <> to
!=, but I'm not sure whether that affects whether the sql text will
match a stored outline.
.
Do the explain plans look the same? Same costs?
The following response is from the original poster.
Jonathan, Thank you for your answer. We did do an explain plan on
both versions of the statement and they were identical, which is what
is so puzzling about this. According to the documentation, the two
forms of not equal are the same (along with ^= and one other that I
can't type), so it makes no sense to me why there is any difference in
performance.
Scott Canaan
.
Not an all-inclusive little test, but it appears that at least in 10.1.0.2
it gets parsed into a "<>" for either (notice the filter line for each
plan)
.
Do you have any Stored Outline ? Stored Outlines do exact (literal)
matches so if you have one Stored Outline for, say, the SQL with a
"!=" and none for the SQL with a "<>" (or a vice versa), the Stored
Outline might be using hints ? (although, come to think of it, your
EXPLAIN PLAN should have shown the hints if executing a Stored Outline
?)
.
Have you tried going beyond just explain & autotrace and running a
full 10046 level 12 trace to see where the slower version is spending
its time? This might shed some light on the subject, plus - be sure
to verify that the explain plans are exactly the same in the 10046
trace file (not the ones generated with the EXPLAIN= option), and in
v$sqlplan. There are some "features" of autotrace and explain that
can cause it to not give you an accurate explain plan.
Regards, Brandon
.
Is the phenomenon totally reproducible ?
Did you check the filter_predicates and access_predicates of the plan,
or just the structure. I don't expect any difference, but a change in
predicate order can result in a significant change in CPU usage if you
are unlucky.
If there is no difference there, then enable rowsource statistics
(alter session set "_rowsource_execution_statistics"=true) and run the
queries, then grab the execution plan from V$sql_plan and join to
v$sql_plan_statistics to see if any of the figures about last_starts,
last_XXX_buffer_gets, last_disk_reads, last_elapsed_time give you a
clue about where the time went.
If you are on 10gR2 there is a /*+ gather_plan_statistics */ hint you
can use instead of the "alter session".
Regards Jonathan Lewis
At this point the thread dies and we see no further posts from the original poster, which leads me to believe that either the OP discovered an assumption they had made that was not true or did no further investigation.
I will also point out that if you do an explain plan or autotrace, you will see that the comparison is always displayed as <>.
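A quick way to see that normalisation for yourself (any table will do; employees is just a stand-in name here):
EXPLAIN PLAN FOR SELECT * FROM employees WHERE salary != 0;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- the Predicate Information section shows filter("SALARY"<>0), not !=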
Here is some test code. Increase the number of loop iterations if you like. You may see one side or the other come out a little higher depending on other activity on the server, but in no way will you see one operator come out consistently better than the other.
DROP TABLE t1;
DROP TABLE t2;
CREATE TABLE t1 AS (SELECT level c1 FROM dual CONNECT BY level <=144000);
CREATE TABLE t2 AS (SELECT level c1 FROM dual CONNECT BY level <=144000);
SET SERVEROUTPUT ON FORMAT WRAPPED
DECLARE
vStart Date;
vTotalA Number(10) := 0;
vTotalB Number(10) := 0;
vResult Number(10);
BEGIN
For vLoop In 1..10 Loop
vStart := sysdate;
For vLoop2 In 1..2000 Loop
SELECT count(*) INTO vResult FROM t1 WHERE t1.c1 = 777 AND EXISTS
(SELECT 1 FROM t2 WHERE t2.c1 <> 0);
End Loop;
vTotalA := vTotalA + ((sysdate - vStart)*24*60*60);
vStart := sysdate;
For vLoop2 In 1..2000 Loop
SELECT count(*) INTO vResult FROM t1 WHERE t1.c1 = 777 AND EXISTS
(SELECT 1 FROM t2 WHERE t2.c1 != 0);
End Loop;
vTotalB := vTotalB + ((sysdate - vStart)*24*60*60);
DBMS_Output.Put_Line('Total <>: ' || RPAD(vTotalA,8) || '!=: ' || vTotalB);
vTotalA := 0;
vTotalB := 0;
End Loop;
END;
/
A programmer will use !=
A DBA will use <>
If there is a different execution plan, it may be that there are differences in the query cache or statistics for each notation. But I don't really think that is so.
Edit:
What I mean above: in complex databases there can be some strange side effects. I don't know Oracle well enough, but I think there is a query compilation cache like in SQL Server 2008 R2.
If a query is compiled as a new query, the database optimiser calculates a new execution plan depending on the current statistics. If the statistics have changed, it can result in a different, possibly worse, plan.