I would like to treat on the momentary table or inner table of oracle by making two or more values into a line. - sql

Since I am a Japanese, I am poor at English.
Please understand the situation.
There is the following as indispensable requirements.
This requirement is unchangeable.
I know only ID of two or more values.
This number is over 500000.
It acquires early at low cost by 1 time of SQL.
The index is created by id and it is optimized.
The following SQL queries think of me by making these things into the method of taking as a search condition.
select *
from emp
where id in(1,5,7,8.....)
or id in(5000,5002....)
It divides 1000 affairs at a time by "in" after above where.
However, processing takes most time in case of this method.
As a result of investigating many things, it turned out that it is processing time earlier to specify conditions by "exists" rather than having specified conditions by "in".
In order to use "exists", you have to ask by a subquery.
However, it calls by a subquery well by what kind of method, or I cannot imagine.
Someone should teach a good method.
Thank you for your consideration.

If my understanding is correct, you are trying to do this:
select * from emp where emp in (list of several thousand values)
Because oracle only support lists of 1000 values in that construct your code uses a union.
Suggested solution:
Create a global temporary table with an id field the same size as emp.id
Insert the id:s you want to find in this table
Join against this table in your select
create global temporary table temp_id (id number) on commit delete rows;
Your select code can be replaced by:
<code to insert the emp.id:s you want to search for>
select * from emp inner join temp_id tmp on emp.id = temp_id.id;

Related

Optimizing an Oracle SQL query which uses IN clause extensively

I maintain an application where I am trying to optimize an Oracle SQL query wherein multiple IN clauses are used. This query is now a blocker as it hogs nearly 3 minutes of execution time and affects application performance severely.The query is called from Java code(JDBC) and looks like this :
Select disctinct col1,col2,col3,.. colN from Table1
where 1=1 and not(col1 in (idsetone1,idsetone2,... idsetoneN)) or
(col1 in(idsettwo1,idsettwo2,...idsettwoN))....
(col1 in(idsetN1,idsetN2,...idsetNN))
The ID sets are retrieved from a different schema and therefore a JOIN between column1 of table 1 and ID sets is not possible. ID sets have grown over time with use of the application and currently they number more than 10,000 records.
How can I start with optimizing this query ?
I really doupt about "The ID sets are retrieved from a different schema and therefore a JOIN between column1 of table 1 and ID sets is not possible." Of course you can join the tables, provided you got select privileges on it.
Anyway, let's assume it is not possible due to whatever reason. One solution could be to insert all entries first into a Nested Table and the use this one:
CREATE OR REPLACE TYPE NUMBER_TABLE_TYPE AS TABLE OF NUMBER;
Select disctinct col1,col2,col3,.. colN from Table1
where 1=1
and not (col1 NOT MEMBER OF (NUMBER_TABLE_TYPE(idsetone1,idsetone2,... idsetoneN))
OR
(col1 MEMBER OF NUMBER_TABLE_TYPE(idsettwo1,idsettwo2,...idsettwoN))
Regarding the max. number of elements Oracle Documentation says: Because a nested table does not have a declared size, you can put as many elements in the constructor as necessary.
I don't know how serious you can take this statement.
You should put all the items into one temporary table and to an explicit join:
Select your cols
from Table1
left join table_with_items
on table_with_items.id = Table1.col1
where table_with_items.id is null;
Also that distinct suggest a problem in your business logic or in the architecture of application. Why do you have duplicate ids? You should get rid of that distinct.

Updating Table Records in a Batch and Auditing it

Consider this Table:
Table: ORDER
Columns: id, order_num, order_date, order_status
This table has 1 million records. I want to update the order_status to value of '5', for a bunch (about 10,000) of order_num's that i will be reading from a input text file.
My SQL could be:
(A) update ORDER set order_status=5 where order_num in ('34343', '34454', '454545',...)
OR
(B) update ORDER set order_status=5 where order_num='34343'
I can loop over this update several times until I have covered my 10,000 order updates.
(Also note that i have few Child Tables of ORDER like ORDER_ITEMS, where similar status must be updated and information audited)
My problem is here is:
How can i Audit this update in a separate ORDER_AUDIT Table:
Order_Num: 34343 - Updated Successfully
Order_Num: 34454 - Order Not Found
Order_Num: 454545 - Updated Successfully
Order_Num: 45457 - Order Not Found
If i go for batch update as in (A), I cannot Audit at Order Level.
If i go for Single Order at at time update as in (B), I will have to loop 10,000 times - that may be quite slow - but I can Audit at Order level in this case.
Is there any other way?
First of all, build an external table over your "input text file". That way you can run a simple single UPDATE statement:
update ORDER
set order_status=5
where order_num in ( select col1 from ext_table order by col1)
Neat and efficient. (Sorting the sub-query is optional: it may improve the performance of the update but the key point is, we can treat external tables like regular tables and use the full panoply of the SELECT syntax on them.) Find out more.
Secondly use the RETURNING clause to capture the hits.
update ORDER
set order_status=5
where order_num in ( select col1 from ext_table order by col1)
returning order_num bulk collect into l_nums;
l_nums in this context is a PL/SQL collection of type number. The RETURNING clause will give you all the ORDER_NUM values for updated rows only. Find out more.
If you declare the type for l_nums as a SQL nested table object you can use it in further SQL statements for your auditing:
insert into order_audit
select 'Order_Num: '||to_char(t.column_value)||' - Updated Succesfully'
from table ( l_nums ) t
/
insert into order_audit
select 'Order_Num: '||to_char(col1)||' - Order Not Found'
from ext_table
minus
select * from table ( l_nums )
/
Notes on performance:
You don't say how many of the rows you have in the input text file will match. Perhaps you don't know (actually on re-reading it's not clear whether 10,000 is the number of rows in the file or the number of matching rows). Pl/SQL collections use private session memory, so very large collections can blow the PGA. However, you should be able to cope with ten thousand NUMBER instances without blinching.
My solution does require you to read the external table twice. This shouldn't be a problem. And it will certainly be way faster than dynamically assembling one hundred IN clauses of a thousand numbers and looping over each.
Note that update is often the slowest bulk operation known to man. There are ways of speeding them up, but those methods can get quite involved. However, if this is something you'll want to do often and performance becomes a sticking point you should read this OraFAQ article.
Use MERGE. Firstly load data into a temporary table called ORDER_UPD_TMP with only one column id. You can do it using SQLDeveloper import feature. Then use MERGE in order to udpate your base table:
MERGE INTO ORDER b
USING (
SELECT order_id
FROM ORDER_UPD_TMP
) e
ON (b.id = e.id)
WHEN MATCHED THEN
UPDATE SET b.status = 5
You can also update with a different status when records don't match. Check the documentation for more details:
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_9016.htm
I think the best way will be:
to import your file to the database first
then do few SQL UPDATE/INSERT queries in one transaction to update status for all orders and create audit records.

SQL Query Speed

I'm building a report that collates a huge amount of data, the data for the report has taken shape as a view which runs in about 2 to 9 seconds (which is acceptable). I also have a function that returns a set of ids which needs to filter the view:
select *
from vw_report
where employee_id in (select id from dbo.fnc_security(#personRanAsID))
The security function on its own runs in less than a second. However when I combine the two as I have above the query takes over 15 minutes.
Both the view and the security function do quite a lot of work so originally I thought it might be down to locking, I've tried no lock on the security function but it made no difference.
Any tips or tricks as to where I may be going wrong?
It may be worth noting that when I copy the result of the function into the in part of the statement:
select *
from vw_report
where employee_id in (123, 456, 789)
The speed increases back to 2 to 9 seconds.
Firstly, any extra background will help here...
- Do you have the code for the view and the function?
- Can you specify the schema and indexes used for the tables being referenced?
Without these, advise become difficult, but I'll have a stab...
1). You could change the IN clause to a Join.
2). You could specify WITH (NOEXPAND) on the view.
SELECT
*
FROM
vw_report WITH (NOEXPAND)
INNER JOIN
(select id from dbo.fnc_security(#personRanAsID)) AS security
ON security.id = vw_report.employee_id
Note: I'd try without NOEXPAND first.
The other option is that the combination of the indexes and the formulation of the view make it very hard for the optimiser to create a good execution plan. With the extra info I asked for above, this may be improvable.
It takes so much time because sub-select query executing for each row from vw_report while the second query doesn't. You should use something like:
select *
from vw_report r, (select id from dbo.fnc_security(#personRanAsID)) v
where r.employee_id = v.id
I ended up dumping the result from the security function into a temporary table and using the temporary table in my main query. Proved to be the fastest method.
e.g.:
create table #tempTable (id bigint)
select id
into #tempTable
from dbo.fnc_security(#personRanAsID)
select *
from vw_report
where id in (select id from #tempTable)

SQL WHERE ID IN (id1, id2, ..., idn)

I need to write a query to retrieve a big list of ids.
We do support many backends (MySQL, Firebird, SQLServer, Oracle, PostgreSQL ...) so I need to write a standard SQL.
The size of the id set could be big, the query would be generated programmatically. So, what is the best approach?
1) Writing a query using IN
SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)
My question here is. What happens if n is very big? Also, what about performance?
2) Writing a query using OR
SELECT * FROM TABLE WHERE ID = id1 OR ID = id2 OR ... OR ID = idn
I think that this approach does not have n limit, but what about performance if n is very big?
3) Writing a programmatic solution:
foreach (var id in myIdList)
{
var item = GetItemByQuery("SELECT * FROM TABLE WHERE ID = " + id);
myObjectList.Add(item);
}
We experienced some problems with this approach when the database server is queried over the network. Normally is better to do one query that retrieve all results versus making a lot of small queries. Maybe I'm wrong.
What would be a correct solution for this problem?
Option 1 is the only good solution.
Why?
Option 2 does the same but you repeat the column name lots of times; additionally the SQL engine doesn't immediately know that you want to check if the value is one of the values in a fixed list. However, a good SQL engine could optimize it to have equal performance like with IN. There's still the readability issue though...
Option 3 is simply horrible performance-wise. It sends a query every loop and hammers the database with small queries. It also prevents it from using any optimizations for "value is one of those in a given list"
An alternative approach might be to use another table to contain id values. This other table can then be inner joined on your TABLE to constrain returned rows. This will have the major advantage that you won't need dynamic SQL (problematic at the best of times), and you won't have an infinitely long IN clause.
You would truncate this other table, insert your large number of rows, then perhaps create an index to aid the join performance. It would also let you detach the accumulation of these rows from the retrieval of data, perhaps giving you more options to tune performance.
Update: Although you could use a temporary table, I did not mean to imply that you must or even should. A permanent table used for temporary data is a common solution with merits beyond that described here.
What Ed Guiness suggested is really a performance booster , I had a query like this
select * from table where id in (id1,id2.........long list)
what i did :
DECLARE #temp table(
ID int
)
insert into #temp
select * from dbo.fnSplitter('#idlist#')
Then inner joined the temp with main table :
select * from table inner join temp on temp.id = table.id
And performance improved drastically.
First option is definitely the best option.
SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)
However considering that the list of ids is very huge, say millions, you should consider chunk sizes like below:
Divide you list of Ids into chunks of fixed number, say 100
Chunk size should be decided based upon the memory size of your server
Suppose you have 10000 Ids, you will have 10000/100 = 100 chunks
Process one chunk at a time resulting in 100 database calls for select
Why should you divide into chunks?
You will never get memory overflow exception which is very common in scenarios like yours.
You will have optimized number of database calls resulting in better performance.
It has always worked like charm for me. Hope it would work for my fellow developers as well :)
Doing the SELECT * FROM MyTable where id in () command on an Azure SQL table with 500 million records resulted in a wait time of > 7min!
Doing this instead returned results immediately:
select b.id, a.* from MyTable a
join (values (250000), (2500001), (2600000)) as b(id)
ON a.id = b.id
Use a join.
In most database systems, IN (val1, val2, …) and a series of OR are optimized to the same plan.
The third way would be importing the list of values into a temporary table and join it which is more efficient in most systems, if there are lots of values.
You may want to read this articles:
Passing parameters in MySQL: IN list vs. temporary table
I think you mean SqlServer but on Oracle you have a hard limit how many IN elements you can specify: 1000.
Sample 3 would be the worst performer out of them all because you are hitting up the database countless times for no apparent reason.
Loading the data into a temp table and then joining on that would be by far the fastest. After that the IN should work slightly faster than the group of ORs.
For 1st option
Add IDs into temp table and add inner join with main table.
CREATE TABLE #temp (column int)
INSERT INTO #temp (column)
SELECT t.column1 FROM (VALUES (1),(2),(3),...(10000)) AS t(column1)
Try this
SELECT Position_ID , Position_Name
FROM
position
WHERE Position_ID IN (6 ,7 ,8)
ORDER BY Position_Name

SQL argument limit in Oracle

It appears that there is a limit of 1000 arguments in an Oracle SQL. I ran into this when generating queries such as....
select * from orders where user_id IN(large list of ids over 1000)
My workaround is to create a temporary table, insert the user ids into that first instead of issuing a query via JDBC that has a giant list of parameters in the IN.
Does anybody know of an easier workaround? Since we are using Hibernate I wonder if it automatically is able to do a similar workaround transparently.
An alternative approach would be to pass an array to the database and use a TABLE() function in the IN clause. This will probably perform better than a temporary table. It will certainly be more efficient than running multiple queries. But you will need to monitor PGA memory usage if you have a large number of sessions doing this stuff. Also, I'm not sure how easy it will be to wire this into Hibernate.
Note: TABLE() functions operate in the SQL engine, so they need us to declare a SQL type.
create or replace type tags_nt as table of varchar2(10);
/
The following sample populates an array with a couple of thousand random tags. It then uses the array in the IN clause of a query.
declare
search_tags tags_nt;
n pls_integer;
begin
select name
bulk collect into search_tags
from ( select name
from temp_tags
order by dbms_random.value )
where rownum <= 2000;
select count(*)
into n
from big_table
where name in ( select * from table (search_tags) );
dbms_output.put_line('tags match '||n||' rows!');
end;
/
As long as the temporary table is a global temporary table (ie only visible to the session), this is the recommended way of doing things (and I'd go that route for anything more than a dozen arguments, let alone a thousand).
I'd wonder where/how you are building that list of 1000 arguments. If this is a semi-permanent grouping (eg all employees based in a particular location) then that grouping should be in the database and the join done there. Databases are designed and built to do joins really quickly. Much quicker than pulling a bunch of id's back to the mid tier and then sending them back to the database.
select * from orders
where user_id in
(select user_id from users where location = :loc)
You can add additional predicates to split the list into chunks of 1000:
select * from orders where user_id IN (<first batch of 1000>)
OR user_id IN (<second batch of 1000>)
OR user_id IN ...
the comments regarding "if these IDs are in your database, use joins/correlation instead" hold true. However, if your list of IDs comes from elsewhere, like a SOLR result, you can get around the temp table requirement by issuing multiple queries, each with no more than 1000 ids present, and then merging the results of the query in memory. If you place the initial list of ids in a unique collection like a hashset, you can pop off 1000 ids at a time.