Optimizing an Oracle SQL query which uses IN clause extensively - sql

I maintain an application where I am trying to optimize an Oracle SQL query in which multiple IN clauses are used. This query is now a blocker: it takes nearly 3 minutes to execute and severely affects application performance. The query is called from Java code (JDBC) and looks like this:
Select distinct col1,col2,col3,.. colN from Table1
where 1=1 and not(col1 in (idsetone1,idsetone2,... idsetoneN)) or
(col1 in(idsettwo1,idsettwo2,...idsettwoN))....
(col1 in(idsetN1,idsetN2,...idsetNN))
The ID sets are retrieved from a different schema and therefore a JOIN between column1 of table 1 and ID sets is not possible. ID sets have grown over time with use of the application and currently they number more than 10,000 records.
How can I start optimizing this query?

I really doubt the statement "The ID sets are retrieved from a different schema and therefore a JOIN between column1 of table 1 and ID sets is not possible." Of course you can join the tables, provided you have select privileges on them.
Anyway, let's assume it is not possible for whatever reason. One solution could be to put all entries into a Nested Table first and then use that:
CREATE OR REPLACE TYPE NUMBER_TABLE_TYPE AS TABLE OF NUMBER;
Select distinct col1,col2,col3,.. colN from Table1
where 1=1
and (col1 NOT MEMBER OF NUMBER_TABLE_TYPE(idsetone1,idsetone2,... idsetoneN))
OR
(col1 MEMBER OF NUMBER_TABLE_TYPE(idsettwo1,idsettwo2,...idsettwoN))
Regarding the max. number of elements Oracle Documentation says: Because a nested table does not have a declared size, you can put as many elements in the constructor as necessary.
I don't know how seriously you can take this statement.

You should put all the items into one temporary table and do an explicit join:
Select your cols
from Table1
left join table_with_items
on table_with_items.id = Table1.col1
where table_with_items.id is null;
Also, that distinct suggests a problem in your business logic or in the architecture of the application. Why do you have duplicate ids? You should get rid of that distinct.
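The temporary-table anti-join above can be tried out end to end with a small sketch. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for Oracle, the function name rows_not_in_id_set and the sample rows are made up, and on Oracle you would load a global temporary table through JDBC batch inserts instead.

```python
import sqlite3

def rows_not_in_id_set(conn, excluded_ids):
    """Return Table1 rows whose col1 does not appear in excluded_ids."""
    cur = conn.cursor()
    # Session-private scratch table; the primary key gives the join an index.
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS table_with_items (id INTEGER PRIMARY KEY)"
    )
    cur.execute("DELETE FROM table_with_items")
    # Bulk-load the IDs as bind parameters instead of inlining them in the SQL text.
    cur.executemany(
        "INSERT OR IGNORE INTO table_with_items (id) VALUES (?)",
        ((i,) for i in excluded_ids),
    )
    # Anti-join: keep only the rows that found no partner in the ID table.
    cur.execute(
        """
        SELECT Table1.col1, Table1.col2
        FROM Table1
        LEFT JOIN table_with_items ON table_with_items.id = Table1.col1
        WHERE table_with_items.id IS NULL
        ORDER BY Table1.col1
        """
    )
    return cur.fetchall()

def demo():
    # Hypothetical sample data, only to make the sketch runnable.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (col1 INTEGER, col2 TEXT)")
    conn.executemany(
        "INSERT INTO Table1 VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
    )
    return rows_not_in_id_set(conn, [2, 4, 99])
```

The SQL text stays the same size no matter how many IDs there are, so the statement is parsed once and only the scratch table grows.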

Related

How to put more than 1million ID's using union All [duplicate]

I have comma delimited id's that I want to use in NOT IN clause..
I'm using oracle 11g.
select * from table where ID NOT IN (1,2,3,4,...,1001,1002,...)
results in
ORA-01795: maximum number of expressions in a list is 1000
I don't want to use a temp table. I am considering doing this:
select * from table1 where ID NOT IN (1,2,3,4,…,1000) AND
ID NOT IN (1001,1002,…,2000)
Is there any other better workaround to this issue?
You said you don't want to, but: use a temporary table. That's the correct solution here.
Query parsing is expensive in Oracle, and that's what you'll get when you put thousands of identifiers into a giant blob of SQL. Also, there are ill-defined limits on query length that you're going to hit. Doing an anti-JOIN against a table, on the other hand... Oracle is good at that. Bulk loading data into a table, Oracle is good at that too. Use a temp table.
Limiting IN to a thousand entries is a sanity check. The fact that you're hitting it means you're trying to do something insane.
Stepping back from the question: can you combine the SQL that produces those 1000+ IDs with this SQL? That would be the better way to simplify your queries.
It's insane.
But you can probably try to select from select:
SELECT * FROM
(SELECT * FROM table WHERE ID NOT IN (1,2,3,4,...,1000))
WHERE ID NOT IN (1001,1002,…,2000)
Make as many levels as you need.
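If the list really has to stay inside the statement, the groups of at most 1000 values (ANDed together as in the question, which filters the same rows as the nested form) can be generated rather than written by hand. The following is a rough sketch, not a recommendation: it uses Python with sqlite3 as a stand-in database, the helper name chunked_not_in and the table items are hypothetical, and the column name is pasted into the SQL text, so it must come from trusted code and never from user input.

```python
import sqlite3

def chunked_not_in(column, ids, chunk_size=1000):
    """Build 'col NOT IN (?, ...) AND col NOT IN (?, ...)' and its bind values."""
    ids = list(ids)
    if not ids:
        return "1=1", []  # nothing to exclude
    clauses = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        placeholders = ", ".join("?" * len(chunk))
        clauses.append(f"{column} NOT IN ({placeholders})")
    # The bind values are passed in the same order as the placeholders.
    return " AND ".join(clauses), ids

def demo():
    # Hypothetical table with IDs 1..8; a chunk size of 2 keeps the demo readable.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 9)])
    clause, params = chunked_not_in("id", [1, 2, 3, 4, 5], chunk_size=2)
    rows = conn.execute(
        f"SELECT id FROM items WHERE {clause} ORDER BY id", params
    ).fetchall()
    return [r[0] for r in rows]
```

This only works around the per-list limit. The statement still grows with the number of IDs, and drivers and databases have their own limits on statement length and on the number of bind variables.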
Use MINUS, the opposite of UNION:
SELECT * FROM TABLE
MINUS
SELECT T.* FROM TABLE T,TABLE2 T2 WHERE T.ID = T2.ID
This returns the rows of TABLE whose ID is not in TABLE2.

SELECT COUNT(*) ;

I have a database, database1, with two tables (Table 1, Table2) in it.
There are 3 rows in Table1 and 2 rows in Table2. Now if I execute the following SQL query SELECT COUNT(*); on database1, then the output is "1".
Does anyone have an idea what this "1" signifies?
The definition of the two tables is as below.
CREATE TABLE Table1
(
ID INT PRIMARY KEY,
NAME NVARCHAR(20)
)
CREATE TABLE Table2
(
ID INT PRIMARY KEY,
NAME NVARCHAR(20)
)
Normally all selects are of the form SELECT [columns, scalar computations on columns, grouped computations on columns, or scalar computations] FROM [table or joins of tables, etc]
Because this allows plain scalar computations we can do something like SELECT 1 + 1 FROM SomeTable and it will return a recordset with the value 2 for every row in the table SomeTable.
Now, if we didn't care about any table, but just wanted to do our scalar computation, we might want to do something like SELECT 1 + 1. This isn't allowed by the standard, but it is useful and most databases allow it (Oracle doesn't unless it's changed recently, at least it used to not).
Hence such bare SELECTs are treated as if they had a from clause which specified a table with one row and no column (impossible of course, but it does the trick). Hence SELECT 1 + 1 becomes SELECT 1 + 1 FROM ImaginaryTableWithOneRow which returns a single row with a single column with the value 2.
Mostly we don't think about this, we just get used to the fact that bare SELECTs give results and don't even think about the fact that there must be some one-row thing selected to return one row.
In doing SELECT COUNT(*) you did the equivalent of SELECT COUNT(*) FROM ImaginaryTableWithOneRow which of course returns 1.
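The same "imaginary one-row table" behaviour can be observed outside SQL Server. As a small check, assuming SQLite (through Python's built-in sqlite3 module) as the engine, which also accepts a SELECT with no FROM clause; the function name is made up for the illustration:

```python
import sqlite3

def bare_select_results():
    """Run FROM-less SELECTs and return what the engine reports."""
    conn = sqlite3.connect(":memory:")
    # A scalar computation over the implicit single row.
    one_plus_one = conn.execute("SELECT 1 + 1").fetchall()
    # Counting the rows of that implicit single-row input.
    count_all = conn.execute("SELECT COUNT(*)").fetchone()[0]
    # Filtering the implicit row away leaves nothing to count.
    count_filtered = conn.execute("SELECT COUNT(*) WHERE 1 = 0").fetchone()[0]
    return one_plus_one, count_all, count_filtered
```

No user table is involved at any point, which supports the reading that the 1 comes from the implicit single row and not from Table1 or Table2.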
Along similar lines the following also returns a result.
SELECT 'test'
WHERE EXISTS (SELECT *)
The explanation for that behavior (from this Connect item) also applies to your question.
In ANSI SQL, a SELECT statement without FROM clause is not permitted -
you need to specify a table source. So the statement "SELECT 'test'
WHERE EXISTS(SELECT *)" should give syntax error. This is the correct
behavior.
With respect to the SQL Server implementation, the FROM
clause is optional and it has always worked this way. So you can do
"SELECT 1" or "SELECT @v" and so on without requiring a table. In
other database systems, there is a dummy table called "DUAL" with one
row that is used to do such SELECT statements like "SELECT 1 FROM
dual;" or "SELECT @v FROM dual;". Now, coming to the EXISTS clause -
the project list doesn't matter in terms of the syntax or result of
the query and SELECT * is valid in a sub-query. Couple this with the
fact that we allow SELECT without FROM, you get the behavior that you
see. We could fix it but there is not much value in doing it and it
might break existing application code.
It's because you have executed select count(*) without specifying a table.
The count function returns the number of rows in the specified dataset. If you don't specify a table to select from, a single select will only ever return a single row - therefore count(*) will return 1. (In some versions of SQL, such as Oracle, you have to specify a table or similar database object; Oracle includes a dummy table (called DUAL) which can be selected from when no specific table is required.)
You wouldn't normally execute a select count(*) without specifying a table to query against. Your database server is probably giving you a count of "1" based on a default system table it is querying.
Try using
select count(*) from Table1
Without a table name it makes no sense.
Without a table name it always returns 1, whichever database you are in.
Since this is tagged SQL Server, MSDN states:
COUNT always returns an int data type value.
Also,
COUNT(*) returns the number of items in a group. This includes NULL
values and duplicates.
Thus, since you didn't provide a table to do a COUNT from, the default (assumption) is that it returns a 1.
COUNT function returns the number of rows as result. If you don't specify any table, it returns 1 by default. ie., COUNT(*), COUNT(1), COUNT(2), ... will return 1 always.
Select *
without a from clause is "Select ALL from the Universe" since you have filtered out nothing.
In your case, you are asking "How many universes?"
This is exactly how I would teach it. I would write on the board on the first day,
Select * and ask what it means. Answer: Give me the world.
And from there I would teach how to filter the universe down to something meaningful.
I must admit, I never thought of Select Count(*), which would make it more interesting but still brings back a true answer. We have only one world.
Without consulting Stephen Hawking, SQL will have to contend with only 1.
The result of the query is correct.

I would like to treat on the momentary table or inner table of oracle by making two or more values into a line.

I am Japanese and my English is poor, so please bear with me.
The following requirements are mandatory and cannot be changed:
All I know is the IDs of the rows I need, and there are many of them.
There are more than 500,000 of these IDs.
The rows must be fetched quickly and cheaply in a single SQL statement.
There is an index on id, and it is optimized.
The query I came up with uses these IDs as the search condition:
select *
from emp
where id in(1,5,7,8.....)
or id in(5000,5002....)
In the WHERE clause above, the IDs are split into IN lists of 1000 values each.
However, this method takes a very long time to process.
After investigating, I found that specifying the condition with EXISTS is faster than specifying it with IN.
To use EXISTS, the condition has to be written as a subquery.
However, I cannot work out how to write that subquery.
Could someone show me a good method?
Thank you for your consideration.
If my understanding is correct, you are trying to do this:
select * from emp where id in (list of several thousand values)
Because Oracle only supports lists of up to 1000 values in that construct, your code chains several IN lists with OR.
Suggested solution:
Create a global temporary table with an id field the same size as emp.id
Insert the id:s you want to find in this table
Join against this table in your select
create global temporary table temp_id (id number) on commit delete rows;
Your select code can be replaced by:
<code to insert the emp.id:s you want to search for>
select * from emp inner join temp_id tmp on emp.id = tmp.id;
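Once the IDs sit in a table, the EXISTS subquery the question asks about becomes easy to write, and it selects the same rows as the inner join. Here is a sketch under assumptions: Python's sqlite3 module stands in for Oracle (so a TEMP table replaces the global temporary table), the emp table has a hypothetical name column, and find_emps is an illustrative name.

```python
import sqlite3

def find_emps(conn, wanted_ids):
    """Look up emp rows by ID through a scratch table, via JOIN and via EXISTS."""
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS temp_id (id INTEGER PRIMARY KEY)")
    cur.execute("DELETE FROM temp_id")
    # One bulk insert with bind parameters replaces the long IN lists.
    cur.executemany(
        "INSERT OR IGNORE INTO temp_id (id) VALUES (?)",
        ((i,) for i in wanted_ids),
    )
    joined = cur.execute(
        "SELECT emp.id, emp.name FROM emp "
        "INNER JOIN temp_id tmp ON emp.id = tmp.id ORDER BY emp.id"
    ).fetchall()
    # The correlated subquery form: keep a row if a matching ID exists.
    via_exists = cur.execute(
        "SELECT e.id, e.name FROM emp e "
        "WHERE EXISTS (SELECT 1 FROM temp_id t WHERE t.id = e.id) ORDER BY e.id"
    ).fetchall()
    return joined, via_exists

def demo():
    # Hypothetical sample rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?)",
        [(1, "ann"), (5, "bob"), (7, "cho"), (9, "dee")],
    )
    return find_emps(conn, [5, 7, 5000])
```

Which of the two forms runs faster depends on the optimizer and the data, so it is worth comparing the execution plans on the real system.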

SQL WHERE ID IN (id1, id2, ..., idn)

I need to write a query to retrieve a big list of ids.
We do support many backends (MySQL, Firebird, SQLServer, Oracle, PostgreSQL ...) so I need to write a standard SQL.
The size of the id set could be big, the query would be generated programmatically. So, what is the best approach?
1) Writing a query using IN
SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)
My question here is. What happens if n is very big? Also, what about performance?
2) Writing a query using OR
SELECT * FROM TABLE WHERE ID = id1 OR ID = id2 OR ... OR ID = idn
I think that this approach does not have n limit, but what about performance if n is very big?
3) Writing a programmatic solution:
foreach (var id in myIdList)
{
var item = GetItemByQuery("SELECT * FROM TABLE WHERE ID = " + id);
myObjectList.Add(item);
}
We experienced some problems with this approach when the database server is queried over the network. Normally it is better to run one query that retrieves all results than to make a lot of small queries. Maybe I'm wrong.
What would be a correct solution for this problem?
Option 1 is the only good solution.
Why?
Option 2 does the same but you repeat the column name lots of times; additionally the SQL engine doesn't immediately know that you want to check if the value is one of the values in a fixed list. However, a good SQL engine could optimize it to have equal performance like with IN. There's still the readability issue though...
Option 3 is simply horrible performance-wise. It sends a query every loop and hammers the database with small queries. It also prevents it from using any optimizations for "value is one of those in a given list"
An alternative approach might be to use another table to contain id values. This other table can then be inner joined on your TABLE to constrain returned rows. This will have the major advantage that you won't need dynamic SQL (problematic at the best of times), and you won't have an infinitely long IN clause.
You would truncate this other table, insert your large number of rows, then perhaps create an index to aid the join performance. It would also let you detach the accumulation of these rows from the retrieval of data, perhaps giving you more options to tune performance.
Update: Although you could use a temporary table, I did not mean to imply that you must or even should. A permanent table used for temporary data is a common solution with merits beyond that described here.
What Ed Guiness suggested is really a performance booster. I had a query like this:
select * from table where id in (id1,id2.........long list)
What I did:
DECLARE @temp table(
ID int
)
insert into @temp
select * from dbo.fnSplitter('#idlist#')
Then I inner joined the temp table with the main table:
select * from table inner join @temp t on t.id = table.id
And performance improved drastically.
First option is definitely the best option.
SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)
However, if the list of ids is huge, say millions, you should process it in chunks like this:
Divide your list of Ids into chunks of a fixed size, say 100
The chunk size should be decided based upon the memory size of your server
Suppose you have 10000 Ids, you will have 10000/100 = 100 chunks
Process one chunk at a time, resulting in 100 database calls for the select
Why should you divide into chunks?
You avoid the memory overflow exceptions that are common in scenarios like yours.
You make a bounded number of database calls (one per chunk instead of one per id), which gives better performance than querying id by id.
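The chunking steps above could be sketched roughly as follows. This is an illustration under assumptions: Python with sqlite3 stands in for the real backend, the table items and its columns are hypothetical, and the chunk size of 100 is just the figure used in the answer.

```python
import sqlite3

def chunks(ids, size):
    """Yield consecutive slices of ids, each at most size long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def fetch_in_chunks(conn, ids, chunk_size=100):
    """Run one IN query per chunk and collect all matching rows."""
    rows, calls = [], 0
    for chunk in chunks(list(ids), chunk_size):
        placeholders = ", ".join("?" * len(chunk))
        cursor = conn.execute(
            f"SELECT id, name FROM items WHERE id IN ({placeholders})", chunk
        )
        rows.extend(cursor)
        calls += 1
    return rows, calls

def demo():
    # Hypothetical table with 500 rows, queried with 10000 candidate IDs.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(i, f"item{i}") for i in range(1, 501)],
    )
    return fetch_in_chunks(conn, range(1, 10001), chunk_size=100)
```

With a fixed chunk size, every full chunk produces identical SQL text with the same number of placeholders, which lets a server reuse its parsed statement.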
It has always worked like charm for me. Hope it would work for my fellow developers as well :)
Doing the SELECT * FROM MyTable where id in () command on an Azure SQL table with 500 million records resulted in a wait time of > 7min!
Doing this instead returned results immediately:
select b.id, a.* from MyTable a
join (values (250000), (2500001), (2600000)) as b(id)
ON a.id = b.id
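A similar join against an inline list of values can be tried on other engines too, though the syntax varies. As a sketch, assuming SQLite through Python's sqlite3 module, where the value list is named with a common table expression and not with the as b(id) form shown above; lookup_by_values and the sample rows are hypothetical:

```python
import sqlite3

def lookup_by_values(conn, ids):
    """Join MyTable against an inline VALUES list supplied as bind parameters."""
    ids = list(ids)
    if not ids:
        return []
    # One "(?)" row per id; the ids themselves travel as bind values.
    values = ", ".join("(?)" for _ in ids)
    sql = (
        f"WITH b(id) AS (VALUES {values}) "
        "SELECT b.id, a.name FROM MyTable a JOIN b ON a.id = b.id ORDER BY b.id"
    )
    return conn.execute(sql, ids).fetchall()

def demo():
    # Hypothetical sample rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO MyTable VALUES (?, ?)",
        [(i, f"n{i}") for i in range(1, 6)],
    )
    return lookup_by_values(conn, [2, 4, 42])
```

Whether this beats a plain IN list depends on the engine and its plan, so the timing reported above should not be assumed to carry over.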
Use a join.
In most database systems, IN (val1, val2, …) and a series of OR are optimized to the same plan.
The third way would be importing the list of values into a temporary table and join it which is more efficient in most systems, if there are lots of values.
You may want to read this article:
Passing parameters in MySQL: IN list vs. temporary table
I think you mean SQL Server, but on Oracle there is a hard limit on how many elements you can specify in an IN list: 1000.
Sample 3 would be the worst performer out of them all because you are hitting up the database countless times for no apparent reason.
Loading the data into a temp table and then joining on that would be by far the fastest. After that the IN should work slightly faster than the group of ORs.
For the 1st option:
Add the IDs to a temp table and inner join it with the main table.
CREATE TABLE #temp (id int)
INSERT INTO #temp (id)
SELECT t.column1 FROM (VALUES (1),(2),(3),...(10000)) AS t(column1)
Try this
SELECT Position_ID , Position_Name
FROM
position
WHERE Position_ID IN (6 ,7 ,8)
ORDER BY Position_Name

Optimize query that compares two tables with similar schema in different databases

I have two different tables with similar schemas in different databases. What is the best way to compare records between these two tables? I need to find:
records that exist in the first table and have no corresponding record in the second table, after filtering the first table with some where clauses.
So far I have come with this SQL construct:
Select t1_col1, t1_col2 from table1
where t1_col1=<condition> AND
t1_col2=<> AND
NOT EXISTS
(SELECT * FROM
table2
WHERE
t1_col1=t2_col1 AND
t1_col2=t2_col2)
Is there a better way to do this?
The above query seems fine, but I suspect it is doing a row-by-row comparison without first evaluating the conditions in the first part of the query, even though those conditions would reduce the result set considerably. Is this happening?
Just use the EXCEPT keyword:
Select t1_col1, t1_col2 from table1
where t1_col1=<condition> AND
t1_col2=<condition>
except
SELECT t2_col1, t2_col2 FROM table2
It returns any distinct values from the query to the left of the EXCEPT operand that are not also returned from the right query.
See MSDN for more information.
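To see how the NOT EXISTS form from the question and the EXCEPT form relate, both can be run against two separate databases in a small experiment. The following is a sketch under assumptions: Python's sqlite3 module with an attached in-memory database stands in for the two SQL Server databases, the sample rows are made up, and the single filter on t1_col2 is a placeholder for the real where clauses.

```python
import sqlite3

def compare_tables():
    """Return rows of table1 with no match in table2, computed two ways."""
    conn = sqlite3.connect(":memory:")
    # A second, independent in-memory database plays the role of the other database.
    conn.execute("ATTACH DATABASE ':memory:' AS db2")
    conn.execute("CREATE TABLE main.table1 (t1_col1 INTEGER, t1_col2 TEXT)")
    conn.execute("CREATE TABLE db2.table2 (t2_col1 INTEGER, t2_col2 TEXT)")
    conn.executemany(
        "INSERT INTO main.table1 VALUES (?, ?)",
        [(1, "x"), (2, "x"), (3, "y"), (4, "x")],
    )
    conn.executemany(
        "INSERT INTO db2.table2 VALUES (?, ?)", [(2, "x"), (3, "y")]
    )
    not_exists = conn.execute(
        """
        SELECT t1_col1, t1_col2 FROM main.table1
        WHERE t1_col2 = ?
          AND NOT EXISTS (SELECT 1 FROM db2.table2
                          WHERE t2_col1 = t1_col1 AND t2_col2 = t1_col2)
        ORDER BY t1_col1
        """,
        ("x",),
    ).fetchall()
    except_rows = conn.execute(
        """
        SELECT t1_col1, t1_col2 FROM main.table1 WHERE t1_col2 = ?
        EXCEPT
        SELECT t2_col1, t2_col2 FROM db2.table2
        ORDER BY 1
        """,
        ("x",),
    ).fetchall()
    return not_exists, except_rows
```

The two forms agree on this data, but they are not interchangeable in general: EXCEPT removes duplicate rows from its result and treats NULLs as equal when comparing, while NOT EXISTS with = does neither.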
If the data in both tables is expected to have the same primary key, you can use the IN keyword to filter out the rows that are not found in the other table. This could be the simplest way.
If you are open to third-party tools, you can try Redgate Data Compare; it's a very nice tool. Visual Studio 2010 Ultimate edition also has this feature.