CASE vs Multiple UPDATE queries for large data sets - Performance

For performance what option would be better for large data sets that are to be updated?
Using a CASE statement or Individual update queries?
CASE Example:
UPDATE tbl_name SET field_name =
CASE
    WHEN condition_1 THEN 'Blah'
    WHEN condition_2 THEN 'Foo'
    WHEN condition_x THEN 123
    ELSE 'bar'
END
Individual Query Example:
UPDATE tbl_name SET field_name = 'Blah' WHERE field_name = condition_1
UPDATE tbl_name SET field_name = 'Foo' WHERE field_name = condition_2
UPDATE tbl_name SET field_name = 123 WHERE field_name = condition_x
UPDATE tbl_name SET field_name = 'bar' WHERE field_name = condition_y
NOTE: About 300,000 records are going to be updated, and the CASE statement would have about 10,000 WHEN conditions. If using individual queries, there would be about 10,000 of them as well.

The CASE version.
This is because there is a good chance you are altering the same row more than once with the individual statements. If row 10 matches both condition_1 and condition_y, it will need to be read and altered twice. If you have a clustered index, this means two clustered index updates, on top of the changes to whatever other fields were modified.
If you can do it as a single statement, each row will be read only once and it should run much quicker.
I changed a similar process about a year ago that used dozens of UPDATE statements in sequence to use a single UPDATE with CASE, and processing time dropped about 80%.

It seems logical to me that with the first option SQL Server will go through the table only once, evaluating the conditions for each row.
With the second, it will have to go through the whole table 4 times.
So for a table with 1,000 rows, the first option means 1,000 evaluations in the best case and 3,000 in the worst case.
The second will always require 4,000 evaluations.
So option 1 would be the faster.

As pointed out by Mitch, try making a temp table and filling it with all the data you need; make a separate temp table for each column (field) you want to change. You should also add an index to the temp table(s) for a further performance improvement.
This way your update statement becomes (more or less):
UPDATE tbl_name SET
    field_name = COALESCE((SELECT value FROM temp_tbl WHERE tbl_name.conditional_field = temp_tbl.condition_value), field_name),
    field_name2 = COALESCE((SELECT value FROM temp_tbl2 WHERE tbl_name.conditional_field2 = temp_tbl2.condition_value), field_name2)
and so on..
This should give you good performance while scaling up for large volumes of updates at once.
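For completeness, a rough sketch of the setup this answer assumes, written as a SQL Server temp table (the names #temp_tbl, condition_value and value are illustrative, and tbl_name is assumed to exist as in the question):
-- hypothetical mapping table: one row per condition/value pair
CREATE TABLE #temp_tbl (
    condition_value varchar(50) NOT NULL PRIMARY KEY, -- the primary key doubles as the index
    value varchar(50) NOT NULL
);
INSERT INTO #temp_tbl (condition_value, value)
VALUES ('condition_1', 'Blah'),
       ('condition_2', 'Foo');
-- one pass over tbl_name; rows with no match keep their current value via COALESCE
UPDATE tbl_name
SET field_name = COALESCE(
    (SELECT t.value FROM #temp_tbl t
     WHERE tbl_name.conditional_field = t.condition_value),
    field_name);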

Related

simple UPDATE query on large table has bad performance

I need to do the following update query through a stored procedure:
UPDATE table1
SET name = @name -- @name is the stored procedure input parameter
WHERE name IS NULL
Table1 has no indexes or keys; it has 5 columns: 4 integers and 1 varchar (the updatable column 'name' is the varchar).
About 15,000,000 rows have NULL in 'name' and need updating. This takes about 50 minutes, which I think is too long.
I'm running an Azure SQL DB Standard S6 (400 DTUs).
Can anyone give me any advice to improve performance?
As you don't have any keys or indexes, I can suggest the following approach.
1- Create a new table using SELECT ... INTO (which will copy the data), like the following query.
SELECT
    CASE
        WHEN NAME IS NULL THEN @name
        ELSE NAME
    END AS NAME,
    <other columns>
INTO dbo.newtable
FROM table1
2- Drop the old table:
DROP TABLE table1
3- Rename the new table to table1:
EXEC sp_rename 'dbo.newtable', 'table1'
Another approach can be a batch update; sometimes you get better performance compared to a single bulk update (you need to test and adjust the batch size).
WHILE EXISTS (SELECT 1 FROM table1 WHERE name IS NULL)
BEGIN
    UPDATE TOP (10000) table1
    SET name = @name
    WHERE name IS NULL
END
Can you do it with the following method?
UPDATE table1
SET name = ISNULL(name, @name)
For NULL values it will update with @name, and the rest will be updated with the same value.
No. You are updating 15,000,000 rows which is going to take a long time. Each update has overhead for finding the row and logging the value.
With so many rows to update, it is unlikely that the overhead is in finding the rows. If you add an index on name, the update will actually have to update the index as well as the original values.
If your concern is locking the database, you can set up a loop where you do something like this over and over:
UPDATE TOP (100000) table1
SET name = @name -- @name is the stored procedure input parameter
WHERE name IS NULL;
Updating 100,000 rows should take about 30 seconds or so.
In this case, an index on name does help. Otherwise, each iteration of the loop would in essence be reading the entire table.
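A minimal sketch of such a loop, assuming @name is the stored procedure's input parameter as in the question:
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (100000) table1
    SET name = @name
    WHERE name IS NULL;
    SET @rows = @@ROWCOUNT; -- reaches 0 once no NULL names remain, ending the loop
END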

Is there a better way to split work for simple updates?

I'm doing an MS SQL Server update on a table, and it's very simple: I'm replacing about 4 things with another 4 things.
update Table set Column1 = 'something new' where Column1 = 'something old';
update Table set Column1 = 'something new 2' where Column1 = 'something old 2';
update Table set Column1 = 'something new 3' where Column1 = 'something old 3';
update Table set Column1 = 'something new 4' where Column1 = 'something old 4';
That's really all there is to it. But this is a table with a huge number of records running in production, and the exact number is unknown before running the updates.
There is a timestamp column. And it's probably more important to update the most recent ones first.
But my question is probably a more practical one.
Is it best to partition this up by timestamp and let it run manually, or is there a better method of letting this run? I can also divide up the work by each update statement.
Or is there some way to put such a thing into the script itself?
I've tried looking at the plans for the queries, but it doesn't tell me the best way to split it up.
Use UPDATE TOP
You can update the data in chunks using a WHILE loop and the UPDATE TOP option:
WHILE 1 = 1
BEGIN
    UPDATE TOP (1000) tableToUpdate
    SET Column1 = 'something new'
    WHERE Column1 = 'something old';

    IF @@ROWCOUNT < 1000 BREAK
END
When @@ROWCOUNT is less than 1000 (the chunk size), all matching rows have been updated.
Note That, based on the official documentation:
The rows referenced in the TOP expression used with INSERT, UPDATE, or DELETE are not arranged in any order.
Update using TOP and ORDER BY
If you are looking to update data in a meaningful chronological order based on a timestamp, the official documentation mentions that:
If you must use TOP to apply updates in a meaningful chronology, you must use TOP together with ORDER BY in a subselect statement.
For example:
WHILE 1 = 1
BEGIN
    UPDATE tableToUpdate
    SET Column1 = 'something new'
    FROM (SELECT TOP 1000 ID
          FROM tableToUpdate
          WHERE Column1 = 'something old'
          ORDER BY TimeStamp DESC) tto
    WHERE tableToUpdate.ID = tto.ID;

    IF @@ROWCOUNT < 1000 BREAK
END
Other helpful links
UPDATE (Transact-SQL) - official documentation
How can I create a loop on an UPDATE statement that works until there is no row left to update?
Fastest way to update 120 Million records
How to update large table with millions of rows in SQL Server?
Updating rows in a large table in sql server

Update query in Access to update MANY columns from Null to value

I have a database table with about 100 columns (bulky, I know). For about half of these columns, I need to iteratively set Null or "" values to "TBD".
I compiled all 50-some columns that need updating into one update query, with Access SQL code that looked something like this...
UPDATE tablename
SET tablename.column1="TBD", tablename.column2="TBD", tablename.column3="TBD"....
WHERE tablename.column1 Is Null OR tablename.column1="" OR tablename.column2 Is Null OR tablename.column2="" OR tablename.column3 Is Null OR tablename.column3=""....
Two issues. First, this query with 50 columns hits a "query is too complex" error.
Second, the query is functionally wrong: I'm losing data within these columns because of the WHERE clause. Records with populated values that I did not want to update are being updated because of the OR conditions.
My question is how can I go about updating all of these columns and setting their null or empty values to a particular value (in this case, "TBD")?
I know that I can just use a select query to select the columns I need to update, run it, and just CTRL+H to find & replace "" to "TBD". However, I'm worried about the potential for this to introduce errors into my dataset. I also know I could also go through column by column and update these values via an update query. However, this would be quite time consuming with 50+ columns & the iterative updates which I need to run on the entire dataset.
I'm leaning towards this latter route. I am still wondering if there are any other scripted options which I can build into a query to overcome such an issue, and that leads me here to you.
Thank you!
You could just run 50 queries:
UPDATE table SET column1="TBD" WHERE column1 IS NULL OR column1 = "";
An optimization could be:
Create a temporary table which determines which rows would actually need an update: concatenate all column values such that a single Null or empty value results in a record in your temp table. This way you only have to scan the base table once.
Use the keys from that table to focus on those rows only.
Etc.
That is safe and only updates your empty values (whereas your previous query would have updated all columns unless you checked every value first with something like IIf).
This query style also does not run into the "query is too complex" issue.
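A rough sketch of the first step in Access SQL, assuming the table has an ID key and using illustrative column names. The + operator propagates Null, so one Null column nulls the whole expression; the empty-string case still needs explicit tests per column, so treat this purely as a starting point:
SELECT t.ID INTO tmpNeedsUpdate
FROM tablename AS t
WHERE (t.column1 + t.column2 + t.column3) IS NULL
   OR t.column1 = "" OR t.column2 = "" OR t.column3 = "";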
You could issue one query as:
UPDATE tablename
SET column1 = iif(column1 is null or column1 = "", "TBD", column1),
column2 = iif(column2 is null or column2 = "", "TBD", column2),
. . .;
If you don't mind potentially updating all rows, you can leave out the where clause.

Update if different/changed

Is it possible to perform an update statement in SQL, but only update if the updates are different?
For example, if in the database col1 = 'hello':
update table1 set col1 = 'hello'
should not perform any kind of update.
However:
update table1 set col1 = 'bye'
should perform an update.
During query compilation and execution, SQL Server does not take the time to figure out whether an UPDATE statement will actually change any values or not. It just performs the writes as expected, even if unnecessary.
In a scenario like
update table1 set col1 = 'hello'
you might think SQL Server won't do anything, but it will: it performs all of the writes necessary, just as if you'd actually changed the value. This occurs for the physical table (or clustered index) as well as for any non-clustered indexes defined on that column. It causes writes to the physical tables/indexes, recalculation of indexes, and transaction log writes. When working with large data sets, there are huge performance benefits to updating only the rows that will actually change.
If we want to avoid the overhead of these writes when they are not necessary, we have to devise a way to check whether an update is actually needed. One way would be to add something like WHERE col1 <> 'hello':
update table1 set col1 = 'hello' where col1 <> 'hello'
But this would not perform well in some cases, for example if you were updating multiple columns in a table with many rows while only a small subset of those rows would actually change. That is because of the need to filter on all of those columns, and non-equality predicates are generally not able to use index seeks.
But there is a much better alternative using a combination of an EXISTS clause with an EXCEPT clause. The idea is to compare the values in the destination row to the values in the matching source row to determine if an update is actually needed. Look at the modified query below and examine the additional query filter starting with EXISTS. Note how inside the EXISTS clause the SELECT statements have no FROM clause. That part is particularly important because this only adds on an additional constant scan and a filter operation in the query plan (the cost of both is trivial). So what you end up with is a very lightweight method for determining if an UPDATE is even needed in the first place, avoiding unnecessary write overhead.
update table1 set col1 = 'hello'
/* AVOID NET ZERO CHANGES */
where exists
(
/* DESTINATION */
select table1.col1
except
/* SOURCE */
select col1 = 'hello'
)
This looks overly complicated vs. checking for updates in a simple WHERE clause for the simple scenario in the original question, when you are updating one value for all rows in a table with a literal value. However, this technique works very well if you are updating multiple columns in a table and the source of your update is another query, and you want to minimize writes and transaction log entries. It also performs better than testing every field with <>.
A more complete example might be
update table1
set col1 = 'hello',
col2 = 'hello',
col3 = 'hello'
/* Only update rows from CustomerId 100, 101, 102 & 103 */
where table1.CustomerId IN (100, 101, 102, 103)
/* AVOID NET ZERO CHANGES */
and exists
(
/* DESTINATION */
select table1.col1,
table1.col2,
table1.col3
except
/* SOURCE */
select z.col1,
z.col2,
z.col3
from #anytemptableorsubquery z
where z.CustomerId = table1.CustomerId
)
The idea is not to perform any update if the new value is the same as the one in the DB right now:
WHERE col1 != @newValue
(obviously there should also be some Id field to identify the row)
WHERE Id = @Id AND col1 != @newValue
PS: Originally you wanted to update only if the value is 'bye', so you could just add AND col1 = 'bye', but I suspect that condition is redundant.
PS 2: (From a comment) Also note this won't update the value if col1 is NULL, so if NULL is a possibility, make it WHERE Id = @Id AND (col1 != @newValue OR col1 IS NULL).
If you want to change the field to 'hello' only if it is 'bye', use this:
UPDATE table1
SET col1 = 'hello'
WHERE col1 = 'bye'
If you want to update only if it is different that 'hello', use:
UPDATE table1
SET col1 = 'hello'
WHERE col1 <> 'hello'
Is there a reason for this strange approach? As Daniel commented, there is no special gain - except perhaps if you have thousands of rows with col1='hello'. Is that the case?
This is possible with a before-update trigger.
In this trigger you can compare the old values with the new values and cancel the update if they don't differ. But this will then lead to an error on the caller's side.
I don't know why you want to do this, but here are several possibilities:
Performance: There is no performance gain here, because the update would not only need to find the correct row but additionally compare the data.
Trigger: If you want the trigger to be fired only if there was a real change, you need to implement your trigger so that it compares all old values to the new values before doing anything, as sketched below.
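A rough sketch of such a trigger in T-SQL, assuming table1 has an Id key and reusing the FROM-less EXCEPT comparison shown earlier for null-safe inequality (all names are illustrative):
CREATE TRIGGER trg_table1_realchange ON table1
AFTER UPDATE
AS
BEGIN
    -- act only if at least one row's col1 actually changed
    IF EXISTS (
        SELECT i.col1
        FROM inserted i
        JOIN deleted d ON d.Id = i.Id
        WHERE EXISTS (SELECT i.col1 EXCEPT SELECT d.col1) -- null-safe "different"
    )
    BEGIN
        PRINT 'a real change happened'; -- placeholder for the actual work
    END
END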
CREATE OR REPLACE PROCEDURE stackoverflow([your_value] IN TYPE) AS
BEGIN
    UPDATE [your_table] t
    SET t.[your_column] = [your_value]
    WHERE t.[your_column] != [your_value];
    COMMIT;
EXCEPTION
    [YOUR_EXCEPTION];
END stackoverflow;
You need a unique key id in your table (let's suppose its value is 1) to do something like:
UPDATE table1 SET col1 = 'hello' WHERE id = 1 AND col1 != 'hello'
Old question, but none of the answers correctly address null values.
Using <> or != will get you into trouble when comparing values if there is a potential NULL in the new or old value. To safely update only when the value has changed, use the IS DISTINCT FROM operator in Postgres.
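A minimal sketch in Postgres, using the table and value from the question:
-- updates only rows where col1 differs from 'hello', treating NULL as different
UPDATE table1
SET col1 = 'hello'
WHERE col1 IS DISTINCT FROM 'hello';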
I think this should do the trick for ya...
create trigger [trigger_name] on [table_name]
for insert
AS
declare @new_val datatype, @id int;
select @new_val = i.column_name from inserted i;
select @id = i.Id from inserted i;
update table_name set column_name = @new_val
where table_name.Id = @id and column_name != @new_val;

select the rows affected by an update

If I have a table with these fields:
int: id_account
int: session
string: password
Now, for a login statement, I run this SQL UPDATE command:
UPDATE tbl_name
SET session = session + 1
WHERE id_account = 17 AND password = 'apple'
Then I check if a row was affected, and if one indeed was affected I know that the password was correct.
Next what I want to do is retrieve all the info of this affected row so I'll have the rest of the fields info.
I can use a simple SELECT statement, but I'm sure I'm missing something here; there must be a neater way you gurus know and are going to tell me about (:
Besides, this has bothered me since the first login SQL statement I ever wrote.
Is there any performance-wise way to combine the SELECT into the UPDATE, if the UPDATE did update a row?
Or am I better off keeping it simple with two statements? Atomicity isn't needed, so I might be better staying away from table locks, for example, no?
You should use the same WHERE clause for the SELECT. It will return the modified rows, because your UPDATE did not change any of the columns used for the lookup:
UPDATE tbl_name
SET session = session + 1
WHERE id_account = 17 AND password = 'apple';
SELECT *
FROM tbl_name
WHERE id_account = 17 AND password = 'apple';
A word of advice: never store passwords as plain text! Use a hash function, like this:
MD5('apple')
There is ROW_COUNT() (do read about the details in the docs).
Following up with a second SQL statement is OK and simple (which is always good), but it might unnecessarily stress the system.
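In MySQL, for example, a minimal sketch reusing the statement from the question:
UPDATE tbl_name
SET session = session + 1
WHERE id_account = 17 AND password = 'apple';
SELECT ROW_COUNT(); -- number of rows actually changed by the last statement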
This won't work for statements such as...
Update Table
Set Value = 'Something Else'
Where Value is Null
Select Value From Table
Where Value is Null
You would have changed the value with the update and would be unable to recover the affected records unless you stored them beforehand.
Select * Into #TempTable
From Table
Where Value is Null
Update Table
Set Value = 'Something Else'
Where Value is Null
Select Value, UniqueValue
From #TempTable TT
Join Table T
On TT.UniqueValue = T.UniqueValue
If you're lucky, you may be able to join the temp table's records to a unique field within Table to verify the update. This is just one small example of why it is important to be able to uniquely identify records.
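If you are on SQL Server, the OUTPUT clause sidesteps this problem by returning the affected rows from the UPDATE itself; a minimal sketch against the question's table (assuming SQL Server, which the question does not state):
UPDATE tbl_name
SET session = session + 1
OUTPUT inserted.id_account, inserted.session -- deleted.session would give the pre-update value
WHERE id_account = 17 AND password = 'apple';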
You can get the affected rows by just using @@ROWCOUNT:
select top (select @@ROWCOUNT) * from YourTable order by 1 desc