I am writing a query in Oracle to update a column based on its own current value:
UPDATE TABLE SET A = 'CG-'||A
I have the data like:
COLUMN A
121
234
333
I need the data like:
COLUMN A
CG-121
CG-234
CG-333
Basically I am doing this for 30 million records and it's taking a lot of time. Is there any way I can optimize this query? If I create an index on column A, does that improve the performance?
You have the correct query:
UPDATE TABLE
SET A = 'CG-' || A;
Here are different options.
First, you can do this in batches, say 100,000 rows at a time. This limits the amount of undo each transaction generates.
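For instance, here is a minimal PL/SQL sketch of the batching idea, assuming a placeholder table name mytable and that unprocessed rows can be recognized because they don't yet start with 'CG-':
BEGIN
  LOOP
    UPDATE mytable
       SET a = 'CG-' || a
     WHERE a NOT LIKE 'CG-%'
       AND ROWNUM <= 100000;      -- one batch
    EXIT WHEN SQL%ROWCOUNT = 0;   -- nothing left to update
    COMMIT;                       -- keep undo per transaction small
  END LOOP;
  COMMIT;
END;
/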
Second, you can do a "replace" rather than update:
create table tempt as
select * from table;
truncate table "table";
insert into table ( . . . )
select 'CG-' || A, . . .
from tempt;
Third, you can use a virtual (generated) column and dispense with the update entirely (but only in more recent versions of Oracle).
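A hedged sketch of the virtual-column idea (table, column name and size are illustrative; the stored column A stays untouched and queries read the virtual column instead):
alter table mytable add (
  a_display varchar2(50)
    generated always as ('CG-' || a) virtual
);

select a, a_display from mytable;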
We have a table like:
mytable (pid, string_value, int_value)
This table has more than 20M rows in total. Now we have a feature that needs to mark all the rows in this table as invalid. So we need to update the columns to string_value = NULL and int_value = 0, which indicates an invalid row (we still want to keep the pid, as it is important to us).
So what is the best way?
I use the following SQL:
UPDATE Mytable
SET string_value = NULL,
int_value = 0;
but this query takes more than 4 minutes in my test environment. Is there a better way to do this?
Updating all the rows can be quite expensive. Often, it is faster to empty the table and reload it.
In generic SQL this looks like:
create table mytable_temp as
select pid
from mytable;
truncate table mytable; -- back it up first!
insert into mytable (pid, string_value, int_value)
select pid, null, 0
from mytable_temp;
The creation of the temporary table may use different syntax, depending on your database.
Updates can take a long time to complete. Another way of achieving this is with the following steps:
Add new columns with the values you need set as the default value
Drop the original columns
Rename the new columns with the names of the original columns.
You can then drop the default values on the new columns.
This needs to be tested, as different DBMSs allow different levels of table alteration (i.e. not all DBMSs allow dropping a default or dropping a column); a sketch of the approach is shown below.
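A hedged sketch of the add/drop/rename idea using SQL Server syntax as one example dialect, with the table from the question above; the varchar length and constraint name are illustrative:
-- a new nullable column is NULL for every existing row, which is the value we want
alter table mytable add string_value_new varchar(255) null;

-- NOT NULL plus a DEFAULT fills every existing row with 0
alter table mytable add int_value_new int not null
    constraint df_int_value_new default 0;

-- drop the original columns and rename the replacements
alter table mytable drop column string_value;
alter table mytable drop column int_value;
exec sp_rename 'mytable.string_value_new', 'string_value', 'COLUMN';
exec sp_rename 'mytable.int_value_new', 'int_value', 'COLUMN';

-- optionally drop the default afterwards
alter table mytable drop constraint df_int_value_new;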
I have a table which contains millions of rows.
I want to delete all the data which is over a week old based on the value of column last_updated.
So here are my two queries.
Approach 1:
Delete from A where to_date(last_updated, 'yyyy-mm-dd') < sysdate - 7;
Approach 2:
l_lastupdated varchar2(255) := to_char(sysdate-nvl(p_days,7),'YYYY-MM-DD');
insert into B(ID) select ID from A where LASTUPDATED < l_lastupdated;
delete from A where id in (select id from B);
Which one is better considering performance, safety and locking?
Assuming the delete removes a significant fraction of the data & millions of rows, approach three:
create table tmp as
  select * from A
  where to_date(last_updated, 'yyyy-mm-dd') >= sysdate - 7;  -- keep only the recent rows
drop table A;
rename tmp to A;
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:2345591157689
Obviously you'll need to copy over all the indexes, grants, etc. But online redefinition can help with this: https://oracle-base.com/articles/11g/online-table-redefinition-enhancements-11gr1
When you get to 12.2, there's another simpler option: a filtered move.
This is an alter table move operation, with an extra clause stating which rows you want to keep:
create table t (
c1 int
);
insert into t values ( 1 );
insert into t values ( 2 );
commit;
alter table t
move including rows where c1 > 1;
select * from t;
C1
2
While you're waiting to upgrade to 12.2+, and if you don't want to use the create-as-select method for some reason, then approach 1 is superior:
Both methods delete the same rows from A* => it's the same amount of work to do the delete
Option 1 has one statement; Option 2 has two statements; 2 > 1 => option 2 is more work
*Statement level consistency means you might get different results running the processes. Say another session tries to update an old row that your process will remove.
With just the delete, the update will be blocked until the delete finishes. At which point the row's gone, so the update does nothing.
Whereas if you do the insert first, the other session can update & commit the row after the insert has captured its ID but before the delete runs. So the update "succeeds". But the delete will then remove it! Which can lead to some unhappy customers...
Your stored date format seems suitable for proper sorting, so you could go the other way round and convert sysdate to a string:
--this is false today
select * from dual where '2019-06-05' < to_char(sysdate-7, 'YYYY-MM-DD');
--this is true today
select * from dual where '2019-05-05' < to_char(sysdate-7, 'YYYY-MM-DD');
So it would be:
Delete from A where last_updated < to_char(sysdate-7, 'yyyy-mm-dd');
It has the benefit that an existing index on last_updated (if there is any) can be used.
It has the disadvantage of relying on string/varchar ordering, which might change, e.g. with NLS changes (if I remember right), so in any case you should do a little testing first...
In the long term, you should of course alter the column to a proper DATE datatype, but I guess that doesn't help you right now ;)
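If you do get there later, a hedged sketch of that migration (column names are illustrative; test on a copy first):
alter table A add (last_updated_dt date);

update A
   set last_updated_dt = to_date(last_updated, 'yyyy-mm-dd');

alter table A drop column last_updated;
alter table A rename column last_updated_dt to last_updated;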
If you are trying to delete most of the rows in the table, I would advise you go with a different approach, namely:
create <new table name> as
select *
from <old table name>
where <predicates for the data you want to keep>;
then
drop table <old table name>;
and finally you can rename the new table to the old table's name.
You could always partition the new table (i.e. create the new table with a separate statement containing the partitioning clauses, and then have an insert as select into the new table from the old table).
That way, when you need to delete rows, it's a simple matter of dropping the relevant partition(s).
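A hedged sketch of that partitioned rebuild, assuming the column has been (or is being) converted to a proper DATE, with illustrative names and dates:
create table a_new (
  id           number,
  last_updated date
)
partition by range (last_updated)
interval (numtodsinterval(1, 'DAY'))
( partition p_initial values less than (date '2019-01-01') );

insert /*+ append */ into a_new (id, last_updated)
select id, to_date(last_updated, 'yyyy-mm-dd')
from A;

-- removing a day of old data is then just a metadata operation
alter table a_new drop partition for (date '2019-05-01');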
I need to do the following update query through a stored procedure:
UPDATE table1
SET name = @name  -- @name is the stored procedure input parameter
WHERE name IS NULL
Table1 has no indexes or keys, and 5 columns: 4 integers and 1 varchar (the updatable column 'name' is the varchar column).
About 15,000,000 rows have a NULL name and need updating. This takes about 50 minutes, which I think is too long.
I'm running an Azure SQL DB Standard S6 (400DTU's).
Can anyone give me advice on improving performance?
As you don't have any keys or indexes, I can suggest the following approach.
1- Create a new table using SELECT ... INTO (which will copy the data), as in the following query.
SELECT
    CASE
        WHEN NAME IS NULL THEN @name
        ELSE NAME
    END AS NAME,
    <other columns>
INTO dbo.newtable
FROM table1
2- Drop the old table
drop table table1
3- Rename the new table to table1
exec sp_rename 'dbo.newtable', 'table1'
Another approach is a batched update; sometimes you get better performance compared to one bulk update (you need to test by adjusting the batch size).
WHILE EXISTS (SELECT 1 FROM table1 WHERE name IS NULL)
BEGIN
    UPDATE TOP (10000) table1
    SET name = @name
    WHERE name IS NULL
END
Can you do it with the following method?
UPDATE table1
SET name = ISNULL(name, @name)
For NULL values it will update with @name, and the rest will be updated with their existing value.
No. You are updating 15,000,000 rows which is going to take a long time. Each update has overhead for finding the row and logging the value.
With so many rows to update, it is unlikely that the overhead is finding the rows. If you add an index on name, the update is going to actually have to update the index as well as updating the original values.
If your concern is locking the database, you can set up a loop where you do something like this over and over:
UPDATE TOP (100000) table1
SET name = @name  -- the stored procedure input parameter
WHERE name IS NULL;
100,000 rows should be about 30 seconds or so.
In this case, an index on name does help. Otherwise, each iteration of the loop would in essence be reading the entire table.
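A hedged sketch of such an index in SQL Server syntax; the filtered form is my own assumption and simply keeps the index limited to the rows that still need work:
CREATE INDEX IX_table1_name_null
    ON table1 (name)
    WHERE name IS NULL;   -- rows leave the index as soon as they are updated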
I have a database table with about 100 columns (bulky, I know). About half of these columns need to be updated iteratively to set Null or "" values to "TBD".
I compiled all 50-some columns which need to be updated into one update query, with Access SQL code that looked something like this:
UPDATE tablename
SET tablename.column1="TBD", tablename.column2="TBD", tablename.column3="TBD"....
WHERE tablename.column1 Is Null OR tablename.column1="" OR tablename.column2 Is Null OR tablename.column2="" OR tablename.column3 Is Null OR tablename.column3=""....
Two issues: This query with 50 columns receives a "query is too complex" error.
This query is also just functionally wrong...because I'm losing data within these columns due to the WHERE statement. Records that had values populated which I did not want to update are being updated because of the OR clause.
My question is how can I go about updating all of these columns and setting their null or empty values to a particular value (in this case, "TBD")?
I know that I can just use a select query to select the columns I need to update, run it, and just CTRL+H to find & replace "" to "TBD". However, I'm worried about the potential for this to introduce errors into my dataset. I also know I could also go through column by column and update these values via an update query. However, this would be quite time consuming with 50+ columns & the iterative updates which I need to run on the entire dataset.
I'm leaning towards this latter route. I am still wondering if there are any other scripted options which I can build into a query to overcome such an issue, and that leads me here to you.
Thank you!
You could just run 50 queries:
UPDATE table SET column1="TBD" WHERE column1 IS NULL OR column1 = "";
An optimization could be:
Create a temporary table which determines which rows actually would need an update: concatenate all column values such that a single NULL or empty value would result in a record in your temp table. This way you only have to scan the base table once.
Use the keys from that table to focus on those rows only.
Etc.
That is safe and only updates your empty values (whereas your previous query would have updated all columns unless you had checked every value first with an IFNULL).
This query style also does not run into the "query is too complex" issue.
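A hedged Access-SQL sketch of that temp-table step, assuming the table has a primary key column ID; in Access the "+" operator propagates Null through concatenation, so this flags rows with any Null column (empty strings would still need per-column checks):
SELECT ID INTO tmp_rows_to_fix
FROM tablename
WHERE (column1 + column2 + column3) IS NULL;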
You could issue one query as:
UPDATE tablename
SET column1 = iif(column1 is null or column1 = "", "TBD", column1),
column2 = iif(column2 is null or column2 = "", "TBD", column2),
. . .;
If you don't mind potentially updating all rows, you can leave out the where clause.
For performance, which option would be better for large data sets that are to be updated?
Using a CASE statement or Individual update queries?
CASE Example:
UPDATE tbl_name SET field_name =
CASE
WHEN condition_1 THEN 'Blah'
WHEN condition_2 THEN 'Foo'
WHEN condition_x THEN 123
ELSE 'bar'
END
Individual Query Example:
UPDATE tbl_name SET field_name = 'Blah' WHERE field_name = condition_1
UPDATE tbl_name SET field_name = 'Foo' WHERE field_name = condition_2
UPDATE tbl_name SET field_name = 123 WHERE field_name = condition_x
UPDATE tbl_name SET field_name = 'bar' WHERE field_name = condition_y
NOTE: About 300,000 records are going to be updated, and the CASE statement would have about 10,000 WHEN conditions. If using the individual queries, there would be about 10,000 of them as well.
The CASE version.
This is because there is a good chance you are altering the same row more than once with the individual statements. If row 10 has both condition_1 and condition_y then it will need to get read and altered twice. If you have a clustered index this means two clustered index updates on top of whatever the other field(s) that were modified were.
If you can do it as a single statement, each row will be read only once and it should run much quicker.
I changed a similar process about a year ago that used dozens of UPDATE statements in sequence to use a single UPDATE with CASE, and processing time dropped about 80%.
It seems logical to me that with the first option SQL Server will go through the table only once and, for each row, evaluate the conditions.
With the second, it will have to go through the whole table 4 times.
So, for a table with 1000 rows, with the first option we are talking about 1000 evaluations in the best case and 3000 in the worst case.
With the second we'll always have 4000 evaluations (4 full scans of 1000 rows).
So option 1 would be the faster one.
As pointed out by Mitch, try making a temp table and filling it with all the data you need; make a different temp table for each column (field) you want to change. You should also add an index to the temp table(s) for an added performance improvement.
This way your update statement becomes (more or less):
UPDATE tbl_name SET field_name = COALESCE((SELECT value FROM temp_tbl WHERE tbl_name.conditional_field = temp_tbl.condition_value), field_name),
field_name2 = COALESCE((SELECT value FROM temp_tbl2 WHERE tbl_name.conditional_field2 = temp_tbl2.condition_value), field_name2)
and so on..
This should give you good performance while scaling up for large volumes of updates at once.
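A hedged sketch of building one of those lookup tables (SQL Server syntax assumed; names, types and the mapping values are illustrative):
CREATE TABLE temp_tbl (
    condition_value VARCHAR(50) PRIMARY KEY,  -- the primary key doubles as the index
    value           VARCHAR(50) NOT NULL
);

INSERT INTO temp_tbl (condition_value, value)
VALUES ('old_value_1', 'Blah'),
       ('old_value_2', 'Foo');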