How can I speed up a joined update in SQL? My statement seems to run indefinitely - sql

I have two tables: a source table and a target table. The target table will have a subset of the columns of the source table. I need to update a single column in the target table by joining with the source table based on another column. The update statement is as follows:
UPDATE target_table tt
SET special_id = ( SELECT source_special_id
FROM source_table st
WHERE tt.another_id = st.another_id )
For some reason, this statement seems to run indefinitely. The inner select happens almost immediately when executed by itself. The table has roughly 50,000 records and its hosted on a powerful machine (resources are not an issue).
Am I doing this correctly? Any reasons the above wouldn't work in a timely manner? Any better way to do this?

Your initial query executes the inner subquery once for every row in the outer table. See if Oracle likes this better:
UPDATE target_table
SET special_id = st.source_special_id
FROM
target_table tt
INNER JOIN
source_table st
WHERE tt.another_id = st.another_id
(edited after posted query was corrected)
Add:
If the join syntax doesn't work on Oracle, how about:
UPDATE target_table
SET special_id = st.source_special_id
FROM
target_table tt, source_table st
WHERE tt.another_id = st.another_id
The point is to join the two tables rather than using the outer query syntax you are currently using.

Is there an index on source_table(another_id)? If not source_table will be fully scanned once for each row in target_table. This will take some time if target_table is big.
Is it possible for there to be no match in source_table for some target_table rows? If so, your update will set special_id to null for those rows. If you want to avoid that do this:
UPDATE target_table tt
SET special_id = ( SELECT source_special_id
FROM source_table st
WHERE tt.another_id = st.another_id )
WHERE EXISTS( SELECT NULL
FROM source_table st
WHERE tt.another_id = st.another_id );
If target_table.another_id was declared as a foreign key referencing source_table.another_id (unlikely in this scenario), this would work:
UPDATE
( SELECT tt.primary_key, tt.special_id, st.source_special_id
FROM tatget_table tt
JOIN source_table st ON st.another_id = tt.another_id
)
SET special_id = source_special_id;

Are you actually sure that it's running?
Have you looked for blocking locks? indefinitely is a long time and that's usually only achieved via something stalling execution.

Update: Ok, now that the query has been fixed -- I've done this exact thing many times, on unindexed tables well over 50K rows, and it worked fine in Oracle 10g and 9i. So something else is going on here; yes, you are calling for nested loops, but no, it shouldn't run forever, even so. What are the primary keys on these tables? Do you by any chance have multiple rows from the second table matching the same row for the first table? You could be trying to rewrite the whole table over and over, throwing the locking system into a fit.
Original Response
That statement doesn't really make sense -- you are telling it to update all the rows where ids match, to the same id (meaning, no change happens!).
I imagine the real statement looks a bit different?
Please also provide table schema information (primary keys for the 2 tables, any available indexes, etc).

Not sure what Oracle has available, but MS Sql Server has a tuning advisor that you can feed your queries into and it will give recommendation for adding indexes, etc... I would assume Oracle has something similar.
That would be the quickest way to pinpoint the issue.

I don't know Oracle, but MSSQLServer optimizer would have no problem converting the subquery into a join for you.
It sounds like you might be doing a data import against a short-lived or newly created table. It is easy to overlook indexing these kinds of tables. I'd make sure there's an index on sourcetable.anotherid - or a covering index on sourcetable.anotherid, sourcetable.specialid (order matters, anotherid should be first).
In cases such as these (query running unexpectedly for longer than 1 second). It is best to figure out how your environment provides query plans. Examine that plan and the problem will become clear.
You see, there is no such thing as "optimized sql code". Sql code is never executed - query plans are generated from the code and then those plans are executed.

Check that the statistics are up to date on the tables - see this question

I had the same problem, and got a "SQL command not properly ended" when trying Codewerks' answer in Oracle 11g. A bit of Googling turned up the Oracle MERGE statement, which I adapted as follows:
MERGE INTO target_table tt
USING source_table st
ON (tt.another_id = st.another_id)
WHEN MATCHED THEN
UPDATE SET tt.special_id = st.special_id;
If you're not sure all the values of another_id will be in source_table then you can use the WHEN NOT MATCHED THEN clause to handle that case.

If you have constraints that let Oracle know there is a 1-1 relationship when you join on "another_id", I think this might work well:
UPDATE ( SELECT tt.rowid tt_rowid, tt.another_id, tt.special_id, st.source_special_id
FROM target_table tt, source_table st
WHERE tt.another_id = st.other_id
ORDER BY tt.rowid )
SET special_id = source_special_id
Ordering by ROWID is important when you are updating and updatable view with a join.

Related

Update statement taking more than 24 hours for updating 32k rows

Below is the update statement that I am running for 32k times and it is taking more than 15 hours and running.
I have to update the value in table 2 for 32k different M_DISPLAY VALUES.
UPDATE TABLE_2 T2 SET T2.M_VALUE = 'COL_ANC'
WHERE EXISTS (SELECT 1 FROM TABLE_2 T1 WHERE TRIM(T1.M_DISPLAY) = 'ANCHORTST' AND T1.M_LABEL=T2.M_LABEL );
Am not sure Why is it taking such a long time as I have tuned the query,
I have copied 32000 update statements in a Update.sql file and running the SQL in command line.
Though it is updating the table, it is a neverending process
Please advice if I have gone wrong anywhere
Regards
Using FORALL
If you cannot rewrite the query to run a single bulk-update instead of 32k individual updates, you might still get lucky by using PL/SQL's FORALL. An example:
DECLARE
TYPE rec_t IS RECORD (
m_value table_2.m_value%TYPE,
m_display table_2.m_display%TYPE
);
TYPE tab_t IS TABLE OF rec_t;
data tab_t := tab_t();
BEGIN
-- Fill in data object. Replace this by whatever your logic for matching
-- m_value to m_display is
data.extend(1);
data(1).m_value := 'COL_ANC';
data(1).m_display := 'ANCHORTST';
-- Then, run the 32k updates using FORALL
FORALL i IN 1 .. data.COUNT
UPDATE table_2 t2
SET t2.m_value = data(i).m_value
WHERE EXISTS (
SELECT 1
FROM table_2 t1
WHERE trim(t1.m_display) = data(i).m_display
AND t1.m_label = t2.m_label
);
END;
/
Concurrency
If you're not the only process on the system, 32k updates in a single transaction can hurt. It's definitely worth committing a few thousand rows in sub-transactions to reduce concurrency effects with other processes that might read the same table while you're updating.
Bulk update
Really, the goal of any improvement should be bulk updating the entire data set in one go (or perhaps split in a few bulks, see concurrency).
If you had a staging table containing the update instructions:
CREATE TABLE update_instructions (
m_value VARCHAR2(..),
m_display VARCHAR2(..)
);
Then you could pull off something along the lines of:
MERGE INTO table_2 t2
USING (
SELECT u.*, t1.m_label
FROM update_instructions u
JOIN table_2 t1 ON trim(t1.m_display) = u.m_display
) t1
ON t2.m_label = t1.m_label
WHEN MATCHED THEN UPDATE SET t2.m_value = t1.m_value;
This should be even faster than FORALL (but might have more concurrency implications).
Indexing and data sanitisation
Of course, one thing that might definitely hurt you when running 32k individual update statements is the TRIM() function, which prevents using an index on M_DISPLAY efficiently. If you could sanitise your data so it doesn't need trimming first, that would definitely help. Otherwise, you could add a function based index just for the update (and then drop it again):
CREATE INDEX i ON table_2 (trim (m_display));
The query and subquery query the same table: TABLE_2. Assuming that M_LABEL is unique, the subquery returns 1s for all rows in TABLE_2 where M_DISPLAY is ANCHORTST. Then the update query updates the same (!) TABLE_2 for all 1s returned from subquery - so for all rows where M_DISPLAY is ANCHORTST.
Therefore, the query could be simplified, exploiting the fact that both update and select work on the same table - TABLE_2:
UPDATE TABLE_2 T2 SET T2.M_VALUE = 'COL_ANC' WHERE TRIM(T2.M_DISPLAY) = 'ANCHORTST'
If M_LABEL is not unique, then the above is not going to work - thanks to commentators for pointing that out!
For significantly faster execution:
Ensure that you have created an index on M_DISPLAY and M_LABEL columns that are in your WHERE clause.
Ensure that M_DISPLAY has a function based index. If it does not, then do not pass it to the TRIM function, because the function will prevent the database from using the index that you have created for the M_DISPLAY column. TRIM the data before storing in the table.
Thats it.
By the way, as has been mentioned, you shouldn't need 32k queries for meeting your objective. One will probably suffice. Look into query based update. As an example, see the accepted answer here:
Oracle SQL: Update a table with data from another table

I would like to treat on the momentary table or inner table of oracle by making two or more values into a line.

Since I am a Japanese, I am poor at English.
Please understand the situation.
There is the following as indispensable requirements.
This requirement is unchangeable.
I know only ID of two or more values.
This number is over 500000.
It acquires early at low cost by 1 time of SQL.
The index is created by id and it is optimized.
The following SQL queries think of me by making these things into the method of taking as a search condition.
select *
from emp
where id in(1,5,7,8.....)
or id in(5000,5002....)
It divides 1000 affairs at a time by "in" after above where.
However, processing takes most time in case of this method.
As a result of investigating many things, it turned out that it is processing time earlier to specify conditions by "exists" rather than having specified conditions by "in".
In order to use "exists", you have to ask by a subquery.
However, it calls by a subquery well by what kind of method, or I cannot imagine.
Someone should teach a good method.
Thank you for your consideration.
If my understanding is correct, you are trying to do this:
select * from emp where emp in (list of several thousand values)
Because oracle only support lists of 1000 values in that construct your code uses a union.
Suggested solution:
Create a global temporary table with an id field the same size as emp.id
Insert the id:s you want to find in this table
Join against this table in your select
create global temporary table temp_id (id number) on commit delete rows;
Your select code can be replaced by:
<code to insert the emp.id:s you want to search for>
select * from emp inner join temp_id tmp on emp.id = temp_id.id;

SQL WHERE ID IN (id1, id2, ..., idn)

I need to write a query to retrieve a big list of ids.
We do support many backends (MySQL, Firebird, SQLServer, Oracle, PostgreSQL ...) so I need to write a standard SQL.
The size of the id set could be big, the query would be generated programmatically. So, what is the best approach?
1) Writing a query using IN
SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)
My question here is. What happens if n is very big? Also, what about performance?
2) Writing a query using OR
SELECT * FROM TABLE WHERE ID = id1 OR ID = id2 OR ... OR ID = idn
I think that this approach does not have n limit, but what about performance if n is very big?
3) Writing a programmatic solution:
foreach (var id in myIdList)
{
var item = GetItemByQuery("SELECT * FROM TABLE WHERE ID = " + id);
myObjectList.Add(item);
}
We experienced some problems with this approach when the database server is queried over the network. Normally is better to do one query that retrieve all results versus making a lot of small queries. Maybe I'm wrong.
What would be a correct solution for this problem?
Option 1 is the only good solution.
Why?
Option 2 does the same but you repeat the column name lots of times; additionally the SQL engine doesn't immediately know that you want to check if the value is one of the values in a fixed list. However, a good SQL engine could optimize it to have equal performance like with IN. There's still the readability issue though...
Option 3 is simply horrible performance-wise. It sends a query every loop and hammers the database with small queries. It also prevents it from using any optimizations for "value is one of those in a given list"
An alternative approach might be to use another table to contain id values. This other table can then be inner joined on your TABLE to constrain returned rows. This will have the major advantage that you won't need dynamic SQL (problematic at the best of times), and you won't have an infinitely long IN clause.
You would truncate this other table, insert your large number of rows, then perhaps create an index to aid the join performance. It would also let you detach the accumulation of these rows from the retrieval of data, perhaps giving you more options to tune performance.
Update: Although you could use a temporary table, I did not mean to imply that you must or even should. A permanent table used for temporary data is a common solution with merits beyond that described here.
What Ed Guiness suggested is really a performance booster , I had a query like this
select * from table where id in (id1,id2.........long list)
what i did :
DECLARE #temp table(
ID int
)
insert into #temp
select * from dbo.fnSplitter('#idlist#')
Then inner joined the temp with main table :
select * from table inner join temp on temp.id = table.id
And performance improved drastically.
First option is definitely the best option.
SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)
However considering that the list of ids is very huge, say millions, you should consider chunk sizes like below:
Divide you list of Ids into chunks of fixed number, say 100
Chunk size should be decided based upon the memory size of your server
Suppose you have 10000 Ids, you will have 10000/100 = 100 chunks
Process one chunk at a time resulting in 100 database calls for select
Why should you divide into chunks?
You will never get memory overflow exception which is very common in scenarios like yours.
You will have optimized number of database calls resulting in better performance.
It has always worked like charm for me. Hope it would work for my fellow developers as well :)
Doing the SELECT * FROM MyTable where id in () command on an Azure SQL table with 500 million records resulted in a wait time of > 7min!
Doing this instead returned results immediately:
select b.id, a.* from MyTable a
join (values (250000), (2500001), (2600000)) as b(id)
ON a.id = b.id
Use a join.
In most database systems, IN (val1, val2, …) and a series of OR are optimized to the same plan.
The third way would be importing the list of values into a temporary table and join it which is more efficient in most systems, if there are lots of values.
You may want to read this articles:
Passing parameters in MySQL: IN list vs. temporary table
I think you mean SqlServer but on Oracle you have a hard limit how many IN elements you can specify: 1000.
Sample 3 would be the worst performer out of them all because you are hitting up the database countless times for no apparent reason.
Loading the data into a temp table and then joining on that would be by far the fastest. After that the IN should work slightly faster than the group of ORs.
For 1st option
Add IDs into temp table and add inner join with main table.
CREATE TABLE #temp (column int)
INSERT INTO #temp (column)
SELECT t.column1 FROM (VALUES (1),(2),(3),...(10000)) AS t(column1)
Try this
SELECT Position_ID , Position_Name
FROM
position
WHERE Position_ID IN (6 ,7 ,8)
ORDER BY Position_Name

How can I use "FOR UPDATE" with a JOIN on Oracle?

The answer to another SO question was to use this SQL query:
SELECT o.Id, o.attrib1, o.attrib2
FROM table1 o
JOIN (SELECT DISTINCT Id
FROM table1, table2, table3
WHERE ...) T1 ON o.id = T1.Id
Now I wonder how I can use this statement together with the keyword FOR UPDATE. If I simply append it to the query, Oracle will tell me:
ORA-02014: cannot select FOR UPDATE from view
Do I have to modify the query or is there a trick to do this with Oracle?
With MySql the statement works fine.
try:
select .....
from <choose your table>
where id in (<your join query here>) for UPDATE;
EDIT: that might seem a bit counter-intuitive bearing in mind the question you linked to (which asked how to dispense with an IN), but may still provide benefit if your join returns a restricted set. However, there is no workaround: the oracle exception is pretty self-explanatory; oracle doesn't know which rows to lock becasue of the DISTINCT. You could either leave out the DISTINCT or define everything in a view and then update that, if you wanted to, without the explicit lock: http://www.dba-oracle.com/t_ora_02014_cannot_select_for_update.htm
It may depend what you want to update. You can do
... FOR UPDATE OF o.attrib1
to tell it you're only interested in updating data from the main table, which seems likely to be the case; this means it will only try to lock that table and not worry about the implicit view in the join. (And you can still update multiple columns within that table, naming one still locks the whole row - though it will be clearer if you specify all the columns you want to update in the FOR UPDATE OF clause).
Don't know if that will work with MySQL though, which brings us back to Mark Byers point.

Performance of updating one big table using values from one small table

First, I know that the sql statement to update table_a using values from table_b is in the form of:
Oracle:
UPDATE table_a
SET (col1, col2) = (SELECT cola, colb
FROM table_b
WHERE table_a.key = table_b.key)
WHERE EXISTS (SELECT *
FROM table_b
WHERE table_a.key = table_b.key)
MySQL:
UPDATE table_a
INNER JOIN table_b ON table_a.key = table_b.key
SET table_a.col1 = table_b.cola,
table_a.col2 = table_b.colb
What I understand is the database engine will go through records in table_a and update them with values from matching records in table_b.
So, if I have 10 millions records in table_a and only 10 records in table_b:
Does that mean that the engine will do 10 millions iterations through table_a just to update 10 records? Are Oracle/MySQL/etc smart enough to do only 10 iterations through table_b?
Is there a way to force the engine to actually iterate through records in table_b instead of table_a to do the update? Is there an alternative syntax for the sql statement?
Assume that table_a.key and table_b.key are indexed.
Either engine should be smart enough to optimize the query based on the fact that there are only ten rows in table b. How the engine determines what to do is based factors like indexes and statistics.
If the "key" column is the primary key and/or is indexed, the engine will have to do very little work to run this query. It will basically already sort of "know" where the matching rows are, and look them up very quickly. It won't have to "iterate" at all.
If there is no index on the key column, the engine will have to to a "table scan" (roughly the equivalent of "iterate") to find the right values and match them up. This means it will have to scan through 10 million rows.
Do a little reading on what's called an Execution Plan. This is basically an explanation of what work the engine had to do in order to run your query (some databases show it in text only, some have the option of seeing it graphically). Learning how to interpret an Execution Plan will give you great insight into adding indexes to your tables and optimizing your queries.
Look these up if they don't work (it's been a while), but it's something like:
In MySQL, put the work "EXPLAIN" in front of your SELECT statement
In Oracle, run "SET AUTOTRACE ON" before you run your SELECT statement
I think the first (Oracle) query would be better written with a JOIN instead of a WHERE EXISTS. The engine may be smart enough to optimize it properly either way. Once you get the hang of interpreting an execution plan, you can run it both ways and see for yourself. :)
Okay I know answering own question is usually frowned upon but I already accepted another answer and won't unaccept it so meh here it is ..
I've discovered a much better alternative that I'd like to share it with anyone who encounters the same scenario: MERGE statement.
Apparently, newer Oracle versions introduced this MERGE statement which simply blows! Not only that the performance is so much better in most cases, the syntax is so simple and so make sense that I feel stupid for using the UPDATE statement! Here comes ..
MERGE INTO table_a
USING table_b
ON (table_a.key = table_b.key)
WHEN MATCHED THEN UPDATE SET
table_a.col1 = table_b.cola,
table_a.col2 = table_b.colb;
And what more is that I can also extend the statement to include INSERT action when table_a does not have matching records for some records in table_b:
MERGE INTO table_a
USING table_b
ON (table_a.key = table_b.key)
WHEN MATCHED THEN UPDATE SET
table_a.col1 = table_b.cola,
table_a.col2 = table_b.colb
WHEN NOT MATCHED THEN INSERT
(key, col1, col2)
VALUES (table_b.key, table_b.cola, table_b.colb);
This new statement type made my day the day I discovered it :)