IN vs OR in Oracle: which is faster? - sql

I'm developing an application that processes a lot of data in an Oracle database.
In some cases I have to fetch many objects based on a given list of conditions, and I use SELECT ... FROM ... WHERE ... IN (...), but an IN list accepts at most 1,000 items.
So I use OR expressions instead, but as far as I can observe, the query with OR is slower than the one with IN (for the same list of conditions). Is that right? And if so, how can I improve the speed of the query?

IN is preferable to OR: OR is a notoriously bad performer and can cause other issues that require careful use of parentheses in complex queries.
A better option than either IN or OR is to join to a table containing the values you want (or don't want). This comparison table can be derived, temporary, or one that already exists in your schema, as sketched below.
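As a rough sketch of joining to a derived table of the wanted values (the orders/user_id names are only illustrative, not taken from the question):
-- Join to a derived table listing the wanted values.
SELECT o.*
FROM   orders o
       JOIN (SELECT 1 AS id FROM dual UNION ALL
             SELECT 2       FROM dual UNION ALL
             SELECT 3       FROM dual) wanted
         ON wanted.id = o.user_id;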

In this scenario I would do this:
Create a one-column global temporary table
Populate this table with your list from the external source (doing that quickly is another whole discussion)
Do your query by joining the temporary table to the other table (consider dynamic sampling, as the temporary table will not have good statistics)
This means you can leave the work to the database and write a simple query (see the sketch below).
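A minimal sketch of that approach, assuming (purely for illustration) that the driving table is orders and you are filtering on user_id:
-- One-column global temporary table; rows are private to each session.
CREATE GLOBAL TEMPORARY TABLE gtt_ids (id NUMBER)
ON COMMIT DELETE ROWS;

-- Populate it from the external source, e.g. batched inserts from the client.
INSERT INTO gtt_ids (id) VALUES (:id);

-- Join instead of a huge IN list; the dynamic_sampling hint compensates
-- for the missing statistics on the temporary table.
SELECT /*+ dynamic_sampling(g 2) */ o.*
FROM   orders o
       JOIN gtt_ids g ON g.id = o.user_id;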

Oracle internally converts IN lists to lists of ORs anyway, so there should really be no performance difference. The only difference is that with IN, Oracle has to do that transformation, whereas with hand-written ORs it has a longer string to parse.
Here is how you test that.
CREATE TABLE my_test (id NUMBER);
SELECT 1
FROM my_test
WHERE id IN (1,2,3,4,5,6,7,8,9,10,
21,22,23,24,25,26,27,28,29,30,
31,32,33,34,35,36,37,38,39,40,
41,42,43,44,45,46,47,48,49,50,
51,52,53,54,55,56,57,58,59,60,
61,62,63,64,65,66,67,68,69,70,
71,72,73,74,75,76,77,78,79,80,
81,82,83,84,85,86,87,88,89,90,
91,92,93,94,95,96,97,98,99,100
);
SELECT sql_text, hash_value
FROM v$sql
WHERE sql_text LIKE '%my_test%';
SELECT operation, options, filter_predicates
FROM v$sql_plan
WHERE hash_value = '1181594990'; -- hash_value from previous query
SELECT STATEMENT
TABLE ACCESS FULL ("ID"=1 OR "ID"=2 OR "ID"=3 OR "ID"=4 OR "ID"=5
OR "ID"=6 OR "ID"=7 OR "ID"=8 OR "ID"=9 OR "ID"=10 OR "ID"=21 OR
"ID"=22 OR "ID"=23 OR "ID"=24 OR "ID"=25 OR "ID"=26 OR "ID"=27 OR
"ID"=28 OR "ID"=29 OR "ID"=30 OR "ID"=31 OR "ID"=32 OR "ID"=33 OR
"ID"=34 OR "ID"=35 OR "ID"=36 OR "ID"=37 OR "ID"=38 OR "ID"=39 OR
"ID"=40 OR "ID"=41 OR "ID"=42 OR "ID"=43 OR "ID"=44 OR "ID"=45 OR
"ID"=46 OR "ID"=47 OR "ID"=48 OR "ID"=49 OR "ID"=50 OR "ID"=51 OR
"ID"=52 OR "ID"=53 OR "ID"=54 OR "ID"=55 OR "ID"=56 OR "ID"=57 OR
"ID"=58 OR "ID"=59 OR "ID"=60 OR "ID"=61 OR "ID"=62 OR "ID"=63 OR
"ID"=64 OR "ID"=65 OR "ID"=66 OR "ID"=67 OR "ID"=68 OR "ID"=69 OR
"ID"=70 OR "ID"=71 OR "ID"=72 OR "ID"=73 OR "ID"=74 OR "ID"=75 OR
"ID"=76 OR "ID"=77 OR "ID"=78 OR "ID"=79 OR "ID"=80 OR "ID"=81 OR
"ID"=82 OR "ID"=83 OR "ID"=84 OR "ID"=85 OR "ID"=86 OR "ID"=87 OR
"ID"=88 OR "ID"=89 OR "ID"=90 OR "ID"=91 OR "ID"=92 OR "ID"=93 OR
"ID"=94 OR "ID"=95 OR "ID"=96 OR "ID"=97 OR "ID"=98 OR "ID"=99 OR
"ID"=100)

I would question the whole approach. The client of the stored procedure has to send 100,000 IDs. Where does the client get those IDs from? Sending such a large number of IDs as a procedure parameter is going to carry a significant cost anyway.

If you create the table with a primary key:
CREATE TABLE my_test (id NUMBER,
CONSTRAINT PK PRIMARY KEY (id));
and then run the same SELECT with the long IN list and retrieve the execution plan via its hash value as before, what you get is:
SELECT STATEMENT
INLIST ITERATOR
INDEX RANGE SCAN
This seems to imply that when you use an IN list against an indexed (primary key) column, Oracle keeps the list internally as an "INLIST", because that is more efficient to process than converting it to ORs, as happened with the un-indexed table.
I was using Oracle 10gR2 above.

Related

Poor performance of SQL query with Table Variable or User Defined Type

I have a SELECT query on a view that contains 500,000+ rows. Let's keep it simple:
SELECT * FROM dbo.Document WHERE MemberID = 578310
The query runs fast, ~0s
Let's rewrite it to work with a set of values, which reflects my needs better:
SELECT * FROM dbo.Document WHERE MemberID IN (578310)
This is just as fast, ~0s.
But now the set of IDs needs to be variable; let's define it as:
DECLARE @AuthorizedMembers TABLE
(
    MemberID BIGINT NOT NULL PRIMARY KEY, -- primary key
    UNIQUE NONCLUSTERED (MemberID) -- and an index, as if it could help...
);
INSERT INTO @AuthorizedMembers SELECT 578310
The set contains the same single value, but it is a table variable now. The performance of such a query drops to 2s, and in more complicated cases it goes as high as 25s or more, while with a fixed ID it stays around ~0s.
SELECT *
FROM dbo.Document
WHERE MemberID IN (SELECT MemberID FROM @AuthorizedMembers)
is just as bad as:
SELECT *
FROM dbo.Document
WHERE EXISTS (SELECT MemberID
              FROM @AuthorizedMembers
              WHERE [@AuthorizedMembers].MemberID = Document.MemberID)
or as bad as this:
SELECT *
FROM dbo.Document
INNER JOIN @AuthorizedMembers AS AM ON AM.MemberID = Document.MemberID
The performance is the same for all of the above, and always much worse than with a fixed value.
Dynamic SQL helps easily: building an nvarchar list like (id1,id2,id3) and constructing a fixed query with it keeps my query times at ~0s. But I would like to avoid dynamic SQL as much as possible, and if I do use it, I would like the statement text to stay the same regardless of the values (using parameters, which the method above does not allow).
Any ideas how to get the performance of the table variable close to that of a fixed list of values, or how to avoid building different dynamic SQL for each run?
P.S. I have tried the above with a user-defined type, with the same results.
Edit:
The results with a temporary table, defined as:
CREATE TABLE #AuthorizedMembers
(
MemberID BIGINT NOT NULL PRIMARY KEY
);
INSERT INTO #AuthorizedMembers SELECT 578310
have improved the execution time about 3x (13s -> 4s), which is still significantly slower than dynamic SQL (<1s).
Your options:
Use a temporary table instead of a TABLE variable
If you insist on using a TABLE variable, add OPTION(RECOMPILE) at the end of your query
Explanation:
When the compiler compiles your statement, the TABLE variable has no rows in it and therefore doesn't have the proper cardinalities. This results in an inefficient execution plan. OPTION(RECOMPILE) forces the statement to be recompiled when it is run. At that point the TABLE variable has rows in it and the compiler has better cardinalities to produce an execution plan.
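For example, the second option applied to the query from the question might look like this sketch:
DECLARE @AuthorizedMembers TABLE
(
    MemberID BIGINT NOT NULL PRIMARY KEY
);
INSERT INTO @AuthorizedMembers SELECT 578310;

-- OPTION (RECOMPILE) makes SQL Server build the plan at execution time,
-- when the table variable's actual row count is known.
SELECT *
FROM dbo.Document
WHERE MemberID IN (SELECT MemberID FROM @AuthorizedMembers)
OPTION (RECOMPILE);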
The general rule of thumb is to use temporary tables when operating on large datasets and table variables for small datasets with frequent updates. Personally I only very rarely use TABLE variables because they generally perform poorly.
I can recommend this answer on the question "What's the difference between temporary tables and table variables in SQL Server?" if you want an in-depth analysis on the differences.

PostgreSQL return select results AND add them to temporary table?

I want to select a set of rows and return them to the client, but I would also like to insert just the primary keys (integer id) from the result set into a temporary table for use in later joins in the same transaction.
This is for sync, where subsequent queries tend to involve a join on the results from earlier queries.
What's the most efficient way to do this?
I'm reluctant to execute the query twice, although it may well be fast if it is cached. An alternative is to store the entire result set in the temporary table and then select from the temporary table afterwards. That also seems wasteful (I only need the integer id in the temp table). I'd be happy if there were a SELECT INTO TEMP that also returned the results.
Currently the technique is to construct an array of the integer ids on the client side and use that in subsequent queries with IN. I'm hoping for something more efficient.
I'm guessing it could be done with stored procedures? But is there a way without that?
I think you can do this with the Postgres feature that allows data-modifying statements in CTEs. The more typical use of this feature is, say, to delete records from a table and insert them into a log table. However, it can be adapted to this purpose. Here is one possible method (I don't have Postgres on hand to test this):
with q as (
    <your query here>
),
t as (
    insert into temptable(pk)
    select pk
    from q
)
select *
from q;
Usually, you use the returning clause with the data modification queries in order to capture the data being modified.
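For instance, a more concrete sketch (the table and column names here are invented for illustration); the temp table has to exist before the CTE runs, so it is created first:
-- Session-local table that will hold just the primary keys.
CREATE TEMP TABLE temptable (pk integer);

WITH q AS (
    SELECT id AS pk, payload
    FROM   big_table
    WHERE  updated_at > now() - interval '1 day'
),
t AS (              -- data-modifying CTE: runs exactly once
    INSERT INTO temptable (pk)
    SELECT pk FROM q
)
SELECT * FROM q;    -- the same rows are returned to the client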

Creating temporary tables in SQL

I am trying to create a temporary table that selects only the data for a certain register_type. I wrote this query, but it does not work:
$ CREATE TABLE temp1
(Select
egauge.dataid,
egauge.register_type,
egauge.timestamp_localtime,
egauge.read_value_avg
from rawdata.egauge
where register_type like '%gen%'
order by dataid, timestamp_localtime ) $
I am using PostgreSQL.
Could you please tell me what is wrong with the query?
You probably want CREATE TABLE AS - also works for TEMPORARY (TEMP) tables:
CREATE TEMP TABLE temp1 AS
SELECT dataid
, register_type
, timestamp_localtime
, read_value_avg
FROM rawdata.egauge
WHERE register_type LIKE '%gen%'
ORDER BY dataid, timestamp_localtime;
This creates a temporary table and copies data into it. A static snapshot of the data, mind you. It's just like a regular table, but resides in RAM if temp_buffers is set high enough. It is only visible within the current session and dies at the end of it. When created with ON COMMIT DROP it dies at the end of the transaction.
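For example, a transaction-scoped variant of the table above (same query, only the ON COMMIT clause is added):
BEGIN;
CREATE TEMP TABLE temp1 ON COMMIT DROP AS
SELECT dataid, register_type, timestamp_localtime, read_value_avg
FROM   rawdata.egauge
WHERE  register_type LIKE '%gen%';
-- ... use temp1 in further statements ...
COMMIT;  -- temp1 disappears here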
Temp tables come first in the default schema search path, hiding other visible tables of the same name unless schema-qualified:
How does the search_path influence identifier resolution and the "current schema"
If you want dynamic, you would be looking for CREATE VIEW - a completely different story.
The SQL standard also defines, and Postgres also supports: SELECT INTO. But its use is discouraged:
It is best to use CREATE TABLE AS for this purpose in new code.
There is really no need for a second syntax variant, and SELECT INTO is used for assignment in plpgsql, where the SQL syntax is consequently not possible.
Related:
Combine two tables into a new one so that select rows from the other one are ignored
ERROR: input parameters after one with a default value must also have defaults in Postgres
CREATE TABLE LIKE (...) only copies the structure from another table and no data:
The LIKE clause specifies a table from which the new table
automatically copies all column names, their data types, and their
not-null constraints.
If you need a "temporary" table just for the purpose of a single query (and then discard it) a "derived table" in a CTE or a subquery comes with considerably less overhead (a sketch follows after the related links below):
Change the execution plan of query in postgresql manually?
Combine two SELECT queries in PostgreSQL
Reuse computed select value
Multiple CTE in single query
Update with results of another sql
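A minimal sketch of that CTE variant, reusing the query from the question:
WITH temp1 AS (
    SELECT dataid, register_type, timestamp_localtime, read_value_avg
    FROM   rawdata.egauge
    WHERE  register_type LIKE '%gen%'
)
SELECT *
FROM   temp1
ORDER  BY dataid, timestamp_localtime;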
http://www.postgresql.org/docs/9.2/static/sql-createtable.html
CREATE TEMP TABLE temp1 (LIKE ...);

minus vs delete where exists in oracle

I have a CREATE TABLE query which can be done using two methods (a create-as-select statement over thousands/millions of records):
First method:
create table as select some data minus (select data from other table)
OR
Second method: first I should create the table as
create table as select .....
and then
delete from ... where exists ...
I guess the second method is better. For which query is the cost lower? And why is the MINUS query not as fast as the second method?
EDIT:
I forgot to mention that the create statement has a join of two tables as well.
The MINUS is probably slow because it needs to sort the tables on disk in order to compare them.
Try rewriting the first query with NOT EXISTS instead of MINUS; it should be faster and will generate less REDO and UNDO (as a_horse_with_no_name mentioned). Of course, make sure that all the columns involved in the WHERE clauses are indexed!
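A rough sketch of the NOT EXISTS form (all table and column names here are invented for illustration):
-- Instead of: CREATE TABLE ... AS SELECT ... MINUS SELECT ...
CREATE TABLE target_tab AS
SELECT s.*
FROM   source_tab s
WHERE  NOT EXISTS (SELECT 1
                   FROM   other_tab o
                   WHERE  o.key_col = s.key_col);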
The second approach will write lots of records to disk and then remove them. In 9 out of 10 cases this takes far longer than filtering what you write in the first place.
So if the first one actually isn't faster, we need more information about the tables and statements involved.

SQL argument limit in Oracle

It appears that there is a limit of 1,000 arguments in an Oracle SQL IN list. I ran into this when generating queries such as...
select * from orders where user_id IN(large list of ids over 1000)
My workaround is to create a temporary table and insert the user IDs into it first, instead of issuing a query via JDBC that has a giant list of parameters in the IN.
Does anybody know of an easier workaround? Since we are using Hibernate, I wonder if it can do a similar workaround transparently.
An alternative approach would be to pass an array to the database and use a TABLE() function in the IN clause. This will probably perform better than a temporary table. It will certainly be more efficient than running multiple queries. But you will need to monitor PGA memory usage if you have a large number of sessions doing this stuff. Also, I'm not sure how easy it will be to wire this into Hibernate.
Note: TABLE() functions operate in the SQL engine, so they need us to declare a SQL type.
create or replace type tags_nt as table of varchar2(10);
/
The following sample populates an array with a couple of thousand random tags. It then uses the array in the IN clause of a query.
declare
    search_tags tags_nt;
    n pls_integer;
begin
    select name
    bulk collect into search_tags
    from ( select name
           from temp_tags
           order by dbms_random.value )
    where rownum <= 2000;

    select count(*)
    into n
    from big_table
    where name in ( select * from table(search_tags) );

    dbms_output.put_line('tags match '||n||' rows!');
end;
/
As long as the temporary table is a global temporary table (ie only visible to the session), this is the recommended way of doing things (and I'd go that route for anything more than a dozen arguments, let alone a thousand).
I'd wonder where/how you are building that list of 1,000 arguments. If it is a semi-permanent grouping (e.g. all employees based in a particular location), then that grouping should be in the database and the join done there. Databases are designed and built to do joins really quickly, much quicker than pulling a bunch of IDs back to the mid tier and then sending them back to the database.
select * from orders
where user_id in
(select user_id from users where location = :loc)
You can add additional predicates to split the list into chunks of 1000:
select * from orders where user_id IN (<first batch of 1000>)
OR user_id IN (<second batch of 1000>)
OR user_id IN ...
The comments regarding "if these IDs are in your database, use joins/correlation instead" hold true. However, if your list of IDs comes from elsewhere, like a SOLR result, you can get around the temp-table requirement by issuing multiple queries, each with no more than 1,000 IDs, and then merging the results in memory. If you place the initial list of IDs in a unique collection like a hash set, you can pop off 1,000 IDs at a time.