MERGE - UPDATE column values separately, based on logic in WHEN MATCHED block

MERGE - UPDATE column values separately, based on logic in WHEN MATCHED block - sql

Earlier today, I asked this question and got the answer I was looking for. Now I have a follow-up question:
What I want:
I want the MERGE to compare each column value, per row, in the target table against the corresponding value in the source table, and make any updates based on the logic separated by OR in the WHEN MATCHED AND block.
I am afraid that the code I've written (pictured below) will make the updates listed in the THEN UPDATE SET block if any of the logic separated by OR in the WHEN MATCHED AND block is true.
If my hunch is correct, do you have any suggestions for how to re-write the code to have it behave like I want it to behave?

Not having your data, and not wanting to re-type your query from an image, I created a sample that I think demonstrates what you want:
create table t (ID int not null,Col1 int null,Col2 int null)
create table s (ID int not null,Col1 int null,Col2 int null)
insert into t(ID,Col1,Col2) values (1,1,null),(2,null,2)
insert into s(ID,Col1,Col2) values (1,3,4),(2,5,6),(3,7,8)
;merge into t
using s
on t.ID = s.ID
when not matched then insert (ID,Col1,Col2) values (s.ID,s.Col1,s.Col2)
when matched then update
set Col1 = COALESCE(t.Col1,s.Col1),
Col2 = COALESCE(t.Col2,s.Col2)
;
select * from t
Result:
ID Col1 Col2
----------- ----------- -----------
1 1 4
2 5 2
3 7 8
Where the key is to use COALESCE to avoid updating a column value if it already has one (which I think is what you're trying to achieve)

I'm not sure I understand the question - do you mean... well, I'm not sure what you mean. Minus the extra trailing OR, you have two conditions. If either (or both) of these evaluate to TRUE, the target table will be updated by the THEN UPDATE
However, you are MATCHing on unique_key, and the first condition (s.unique_key IS NOT NULL AND t.unique_key IS NULL) will never be true, because if it were true then the records would not be matched. So the first part of the OR can be ignored.
Also, since the records are MATCHED on unique_key, it is completely redundant to update the target with the source value of unique_key - they are already the same.
Thus, as it is currently written, your MERGE is:
MERGE dbo.input311 AS T
USING dbo.input311staging AS S
ON S.unique_key = S.unique_key
WHEN NOT MATCHED BY TARGET THEN
INSERT
-- insert statement I'm too lazy to type
WHEN MATCHED AND s.created_date IS NOT NULL AND t.created_date IS NULL THEN
UPDATE SET t.created_date = s.created_date

Related

How to write a SQL update statement that will fail if it updates more than one row?

A common mistake when writing update statements is to forget the where clause, or to write it incorrectly, so that more rows than expected get updated. Is there a way to specify in the update statement itself that it should only update one row (and to fail if it would update more)?
Correcting an error in the number of rows updated requires thinking ahead - using a transaction, formatting it as a select first to check the number of rows - and then actually catching the error. It would be useful to be able to write in one place the expectation for the number of rows.

Combining a few facts, I found a working solution for Postgres.
A select will fail when comparing using = to a subquery that returns more than one row. (where x = (select ...))
Values can be returned from an update statement, using the returning clause. An update cannot be used as a subquery, but it can be used as a CTE, which can be used in a subquery.
Example:
create table foo (id int not null primary key, x int not null);
insert into foo (id, x) values (1,5), (2,5);
with updated as (update foo set x = 4 where x = 5 returning id)
select id from foo where id = (select id from updated);
The query containing the update fails with ERROR: more than one row returned by a subquery used as an expression, and the updates are not applied. If the update's where clause is adjusted to only match one row, the update succeeds.

DB2 SQL Statement for Inserting new data and updating existing data

I've found a lot of near misses on this question. So many similar, but not quite right, scenarios. No doubt my ignorance will shine here.
Using DB2 and a shred of knowledge, my scenario is as follows:
On a table, insert a row of data if a given value is not present in a given column, or update the corresponding row if the value is present.
I have a table
id, bigint, not nullable
ref,varchar, nullable
I am not sure if a MERGE is the correct path here as most examples and thorough discussions all seem to revolve around merging one table into another. I'm simply gathering user input and either adding it or updating it. It seems like it should be really simple.
I'm using jdbc and prepared statements to get this done.
Is MERGE the correct way to do this?
When testing my query in DB2 Control Center, I run up against
"No row was found for FETCH, UPDATE or DELETE; or the result of a
query is an empty table"
or a variety of other errors depending on how I structure my MERGE. Here's what I have presently.
merge into table1 as t1
using (select id from table1 group by id) as t2
on t1.id = t2.id
when matched then update set t1.ref = 'abc'
when not matched then insert (t1.id, t1.ref) values (123, 'abc');
If I were to instead compose an update followed by an insert; for new data the insert runs and the update fails, and for existing data they both succeed resulting in bad data in the table, e.g. two identical rows.
The desired result is if on initial use with the values:
id = 1
ref = a
a new row is added. On subsequent use if the values change to:
id = 1
ref = b
the row with id = 1 is updated. Subsequent uses would follow the same rules.
Please let me know how I can phrase this question better.
Update id is not an automatic incrementing key. It's an external key that will be unique but not every thing we are referencing will need a related row in the table I'm attempting to update. This table is rather unstructured on its own but is part of a larger data model.

I'm a bit puzzled by your query. Reading the text makes me suspect that you want something like this:
merge into table1 as t1
using ( values (123, 'abc') ) as t2 (id, ref)
on t1.id = t2.id
when matched then update
set t1.ref = t2.ref
when not matched then
insert (id, ref) values (t2.id, t2.ref);
Is that correct?

cannot insert value NULL into column error shows wrong column name

I've added a new column(NewValue) to my table which holds an int and allows nulls. Now I want to update the column but my insert statement only attempts to update the first column in the table not the one I specified.
I basically start with a temp table that I put my initial data into and it has two columns like this:
create table #tempTable
(
OldValue int,
NewValue int
)
I then do an insert into that table and based on the information NewValue can be null.
Example data in #tempTable:
OldValue NewValue
-------- --------
34556 8765432
34557 7654321
34558 null
Once that's complete I planned to insert NewValue into the primary table like so:
insert into myPrimaryTable(NewValue)
select tt.NewValue from #tempTable tt
left join myPrimaryTable mpt on mpt.Id = tt.OldValue
where tt.NewValue is not null
I only want the NewValue to insert into rows in myPrimaryTable where the Id matches the OldValue. However when I try to execute this code I get the following error:
Cannot insert the value NULL into column 'myCode', table 'myPrimaryTable'; column does not allow nulls. INSERT fails.
But I'm not trying to insert into 'myCode', I specified 'NewValue' as the column but it doesn't seem to see it. I've checked NewValue and it is set to allow int and is set to allow null and it does exist on the right table in the right database. The column 'myCode' is actually the second column in the table. Could someone please point me in the right direction with this error?
Thanks in advance.

INSERT always creates new rows, it never modifies existing rows. If you skip specifying a value for a column in an INSERT and that column has no DEFAULT bound to it and is not identity, that column will be NULL in the new row--thus your error. I believe you might be looking for an UPDATE instead of an INSERT.
Here's a potential query that might work for you:
UPDATE mpt
SET
mpt.NewValue = tt.NewValue
FROM
myPrimaryTable mpt
INNER JOIN #tempTable tt
ON mpt.Id = tt.OldValue -- really?
WHERE
tt.NewValue IS NOT NULL;
Note that I changed it to an INNER JOIN. A LEFT JOIN is clearly incorrect since you are filtering #tempTable for only rows with values, and don't want to update mpt where there is no match to tt--so LEFT JOIN expresses the wrong logical join type.
I put "really?" as a comment on the ON clause since I was wondering if OldValue is really an Id. It probably is--you know your table best. It just raised a mild red flag in my mind to see an Id column being compared to a column that does not have Id in its name (so if it is correct, I would suggest OldId as a better column choice than OldValue).
Also, I recommend that you never name a column just Id again--column names should be the same in every table in the database. Also, when it comes join time you will be more likely to make mistakes when your columns from different tables can coincide. It is much better to follow the format of SomethingId in the Something table, instead of just Id. Correspondingly, the suggested old column name would be OldSomethingId.

Update if different/changed

Is it possible to perform an update statement in sql, but only update if the updates are different?
for example
if in the database, col1 = "hello"
update table1 set col1 = 'hello'
should not perform any kind of update
however, if
update table1 set col1 = "bye"
this should perform an update.

During query compilation and execution, SQL Server does not take the time to figure out whether an UPDATE statement will actually change any values or not. It just performs the writes as expected, even if unnecessary.
In the scenario like
update table1 set col1 = 'hello'
you might think SQL won’t do anything, but it will – it will perform all of the writes necessary as if you’d actually changed the value. This occurs for both the physical table (or clustered index) as well as any non-clustered indexes defined on that column. This causes writes to the physical tables/indexes, recalculating of indexes and transaction log writes. When working with large data sets, there is huge performance benefits to only updating rows that will receive a change.
If we want to avoid the overhead of these writes when not necessary we have to devise a way to check for the need to be updated. One way to check for the need to update would be to add something like “where col <> 'hello'.
update table1 set col1 = 'hello' where col1 <> 'hello'
But this would not perform well in some cases, for example if you were updating multiple columns in a table with many rows and only a small subset of those rows would actually have their values changed. This is because of the need to then filter on all of those columns, and non-equality predicates are generally not able to use index seeks, and the overhead of table & index writes and transaction log entries as mentioned above.
But there is a much better alternative using a combination of an EXISTS clause with an EXCEPT clause. The idea is to compare the values in the destination row to the values in the matching source row to determine if an update is actually needed. Look at the modified query below and examine the additional query filter starting with EXISTS. Note how inside the EXISTS clause the SELECT statements have no FROM clause. That part is particularly important because this only adds on an additional constant scan and a filter operation in the query plan (the cost of both is trivial). So what you end up with is a very lightweight method for determining if an UPDATE is even needed in the first place, avoiding unnecessary write overhead.
update table1 set col1 = 'hello'
/* AVOID NET ZERO CHANGES */
where exists
(
/* DESTINATION */
select table1.col1
except
/* SOURCE */
select col1 = 'hello'
)
This looks overly complicated vs checking for updates in a simple WHERE clause for the simple scenerio in the original question when you are updating one value for all rows in a table with a literal value. However, this technique works very well if you are updating multiple columns in a table, and the source of your update is another query and you want to minimize writes and transaction logs entries. It also performs better than testing every field with <>.
A more complete example might be
update table1
set col1 = 'hello',
col2 = 'hello',
col3 = 'hello'
/* Only update rows from CustomerId 100, 101, 102 & 103 */
where table1.CustomerId IN (100, 101, 102, 103)
/* AVOID NET ZERO CHANGES */
and exists
(
/* DESTINATION */
select table1.col1
table1.col2
table1.col3
except
/* SOURCE */
select z.col1,
z.col2,
z.col3
from #anytemptableorsubquery z
where z.CustomerId = table1.CustomerId
)

The idea is to not perform any update if a new value is the same as in DB right now
WHERE col1 != #newValue
(obviously there is also should be some Id field to identify a row)
WHERE Id = #Id AND col1 != #newValue
PS: Originally you want to do update only if value is 'bye' so just add AND col1 = 'bye', but I feel that this is redundant, I just suppose
PS 2: (From a comment) Also note, this won't update the value if col1 is NULL, so if NULL is a possibility, make it WHERE Id = #Id AND (col1 != #newValue OR col1 IS NULL).

If you want to change the field to 'hello' only if it is 'bye', use this:
UPDATE table1
SET col1 = 'hello'
WHERE col1 = 'bye'
If you want to update only if it is different that 'hello', use:
UPDATE table1
SET col1 = 'hello'
WHERE col1 <> 'hello'
Is there a reason for this strange approach? As Daniel commented, there is no special gain - except perhaps if you have thousands of rows with col1='hello'. Is that the case?

This is possible with a before-update trigger.
In this trigger you can compare the old with the new values and cancel the update if they don't differ. But this will then lead to an error on the caller's site.
I don't know, why you want to do this, but here are several possibilities:
Performance: There is no performance gain here, because the update would not only need to find the correct row but additionally compare the data.
Trigger: If you want the trigger only to be fired if there was a real change, you need to implement your trigger like so, that it compares all old values to the new values before doing anything.

CREATE OR REPLACE PROCEDURE stackoverflow([your_value] IN TYPE) AS
BEGIN
UPDATE [your_table] t
SET t.[your_collumn] = [your_value]
WHERE t.[your_collumn] != [your_value];
COMMIT;
EXCEPTION
[YOUR_EXCEPTION];
END stackoverflow;

You need an unique key id in your table, (let's suppose it's value is 1) to do something like:
UPDATE table1 SET col1="hello" WHERE id=1 AND col1!="hello"

Old question but none of the answers correctly address null values.
Using <> or != will get you into trouble when comparing values for differences if there are is potential null in the new or old value to safely update only when changed use the is distinct from operator in Postgres. Read more about it here

I think this should do the trick for ya...
create trigger [trigger_name] on [table_name]
for insert
AS declare #new_val datatype,#id int;
select #new_val = i.column_name from inserted i;
select #id = i.Id from inserted i;
update table_name set column_name = #new_val
where table_name.Id = #id and column_name != #new_val;

Update function in TSQL trigger

I have a question about TSQL function Update. For example, I have a table with a field Name. If I check if the field Name is changed or not in a After Update trigger likes this:
if Update(Name)
Begin
-- process
End
Will the Update still return TRUE even if Name is not changed? The following update statement will update it with the same value:
SELECT #v_Name = Name From MyTable Where Id = 1;
Update MyTable Set Name = #v_Name where Id = 1;
If the Update() returns TRUE even the value of Name is not changed, do I have to compare the value in the inserted and deleted virtual tables to find out if the value is really changed?
By the way, the inserted and deleted are virtual tables and they may contain more than one rows of data if more than one rows of data are changed by one TSQL INSERT or UPDATE statement. In case of more than one records, are the count numbers of rows in inserted and deleted virtual tables the same and what is the real meaning of Update(Name) as TRUE? Does it mean that at least one is changed? Or does Update(Name) mean that the field of Name has been set by Update statement regardless if the value is changed?
The SQL server I use is Microsoft SQL 2005.

Triggers are tricky and you need to think in bulk when you're creating one. A trigger fires once for each UPDATE statement. If that UPDATE statement updates multiple rows, the trigger will still only fire once. The UPDATE() function returns true for a column when that column is included in the UPDATE statement. That function helps to improve the efficiency of triggers by allowing you to sidestep SQL logic when that column isn't even included in the update statement. It doesn't tell you if the value changed for a column in a given row.
Here's a sample table...
CREATE TABLE tblSample
(
SampleID INT PRIMARY KEY,
SampleName VARCHAR(10),
SampleNameLastChangedDateTime DATETIME,
Parent_SampleID INT
)
If the following SQL was used against this table:
UPDATE tblSample SET SampleName = 'hello'
..and an AFTER INSERT, UPDATE trigger was in effect, this particular SQL statement would always evaluate the UPDATE function as follows...
IF UPDATE(SampleName) --aways evaluates to TRUE
IF UPDATE(SampleID) --aways evaluates to FALSE
IF UPDATE(Parent_SampleID) --aways evaluates to FALSE
Note that UPDATE(SampleName) would always be true for this SQL statement, regardless of what the SampleName values were before. It returns true because the UPDATE statement includes the column SampleName in the SET section of that clause and not based on what the values were before or afterward. The UPDATE() function will not determine if the values changed. If you want to do actions based on whether the values are changed you're going to need to use SQL and compare the inserted and deleted rows.
Here's an approach to keeping a last updated column in sync:
--/*
IF OBJECT_ID('dbo.tgr_tblSample_InsertUpdate', 'TR') IS NOT NULL
DROP TRIGGER dbo.tgr_tblSample_InsertUpdate
GO
--*/
CREATE TRIGGER dbo.tgr_tblSample_InsertUpdate ON dbo.tblSample
AFTER INSERT, UPDATE
AS
BEGIN --Trigger
IF UPDATE(SampleName)
BEGIN
UPDATE tblSample SET
SampleNameLastChangedDateTime = CURRENT_TIMESTAMP
WHERE
SampleID IN (SELECT Inserted.SampleID
FROM Inserted LEFT JOIN Deleted ON Inserted.SampleID = Deleted.SampleID
WHERE COALESCE(Inserted.SampleName, '') <> COALESCE(Deleted.SampleName, ''))
END
END --Trigger
The logic to determine if the row was updated is in the WHERE clause above. That's the real check you need to do. My logic is using COALESCE to handle NULL values and INSERTS.
...
WHERE
SampleID IN (SELECT Inserted.SampleID
FROM Inserted LEFT JOIN Deleted ON Inserted.SampleID = Deleted.SampleID
WHERE COALESCE(Inserted.SampleName, '') <> COALESCE(Deleted.SampleName, ''))
Note that the IF UPDATE() check is used to help improve the efficiency of the trigger for when the SampleName column is NOT being updated. If a SQL statement updated the Parent_SampleID column for instance then that IF UPDATE(SampleName) check would help sidestep around the more complex logic in that IF statement when it doesn't need to run. Consider using UPDATE() when it's appropriate but not for the wrong reason.
Also realize that depending on your architecture, the UPDATE function may have no use to you. If your code architecture uses a middle-tier that always updates all columns in a row of a table with the values in the business object when the object is saved, the UPDATE() function in a trigger becomes useless. In that case, your code is likely always updating all the columns with every UPDATE statement issued from the middle-tier. That being the case, the UPDATE(columnname) function would always evaluate to true when your business objects are saved because all the column names are always included in the update statements. In that case, it would not be helpful to use UPDATE() in the trigger and would just be extra overhead in that trigger for a majority of the time.
Here's some SQL to play with the trigger above:
INSERT INTO tblSample
(
SampleID,
SampleName
)
SELECT 1, 'One'
UNION SELECT 2, 'Two'
UNION SELECT 3, 'Three'
GO
SELECT SampleID, SampleName, SampleNameLastChangedDateTime FROM tblSample
/*
SampleID SampleName SampleNameLastChangedDateTime
----------- ---------- -----------------------------
1 One 2010-10-27 14:52:42.567
2 Two 2010-10-27 14:52:42.567
3 Three 2010-10-27 14:52:42.567
*/
GO
INSERT INTO tblSample
(
SampleID,
SampleName
)
SELECT 4, 'Foo'
UNION SELECT 5, 'Five'
GO
SELECT SampleID, SampleName, SampleNameLastChangedDateTime FROM tblSample
/*
SampleID SampleName SampleNameLastChangedDateTime
----------- ---------- -----------------------------
1 One 2010-10-27 14:52:42.567
2 Two 2010-10-27 14:52:42.567
3 Three 2010-10-27 14:52:42.567
4 Foo 2010-10-27 14:52:42.587
5 Five 2010-10-27 14:52:42.587
*/
GO
UPDATE tblSample SET SampleName = 'Foo'
SELECT SampleID, SampleName, SampleNameLastChangedDateTime FROM tblSample
/*
SampleID SampleName SampleNameLastChangedDateTime
----------- ---------- -----------------------------
1 Foo 2010-10-27 14:52:42.657
2 Foo 2010-10-27 14:52:42.657
3 Foo 2010-10-27 14:52:42.657
4 Foo 2010-10-27 14:52:42.587
5 Foo 2010-10-27 14:52:42.657
*/
GO
UPDATE tblSample SET SampleName = 'Not Prime' WHERE SampleID IN (1,4)
SELECT SampleID, SampleName, SampleNameLastChangedDateTime FROM tblSample
/*
SampleID SampleName SampleNameLastChangedDateTime
----------- ---------- -----------------------------
1 Not Prime 2010-10-27 14:52:42.680
2 Foo 2010-10-27 14:52:42.657
3 Foo 2010-10-27 14:52:42.657
4 Not Prime 2010-10-27 14:52:42.680
5 Foo 2010-10-27 14:52:42.657
*/
--Clean up...
DROP TRIGGER dbo.tgr_tblSample_InsertUpdate
DROP TABLE tblSample
User GBN had suggested the following:
IF EXISTS (
SELECT
*
FROM
INSERTED I
JOIN
DELETED D ON I.key = D.key
WHERE
D.valuecol <> I.valuecol --watch for NULLs!
)
blah
GBN's suggestion of using an IF (EXISTS( ...clause and putting the logic in that IF statement if rows exist that were changed could work. That approach will fire for ALL rows included in the trigger even if only some of the rows were actually changed (which may be appropriate for your solution, but also may not be appropriate if you only want to do something to rows where the values changed.) If you need to do something to rows where an actual change has occurred, you need different logic in your SQL that he provided.
In my examples above, when the UPDATE tblSample SET SampleName = 'Foo' statement is issued and the fourth row is already 'foo', using GBN's approach to update a "last changed datetime" column would also update the fourth row, which would not be appropriate in this case.

UPDATE() can be true, even if it's the same value. I would not rely on it personally and would compare values.
Second, DELETED and INSERTED have the same number of rows.
The Update() function is not per row, but across all rows. Another reason not to use it.
More here in MSDN, however it's a bit sparse, really.
After comment:
IF EXISTS (
SELECT
*
FROM
INSERTED I
JOIN
DELETED D ON I.key = D.key
WHERE
D.valuecol <> I.valuecol --watch for NULLs!
)
blah

I agree the best way to determine if a column value has actually changed (as opposed to being updated with the same value) is to do a comparison of the column values in the deleted and inserted pseudo tables. However, this can be a real pain if you want to check more than a few columns.
Here's a trick I came across in some code I was maintaining (don't know the original author):
Use a UNION and a GROUP BY with a HAVING clause to determine which columns have changed.
eg, in the trigger, to get the ID's of the rows that have changed:
SELECT SampleID
FROM
(
SELECT SampleID, SampleName
FROM deleted
-- NOTE: UNION, not UNION ALL. UNION by itself removes duplicate
-- rows. UNION ALL includes duplicate rows.
UNION
SELECT SampleID, SampleName
FROM inserted
) x
GROUP BY SampleID
HAVING COUNT(*) > 1
This is too much work when you're only checking if a single column has changed. But if you're checking 10 or 20 columns the UNION method is a lot less work than
WHERE COALESCE(Inserted.Column1, '') <> COALESCE(Deleted.Column1, '')
OR COALESCE(Inserted.Column2, '') <> COALESCE(Deleted.Column2, '')
OR COALESCE(Inserted.Column3, '') <> COALESCE(Deleted.Column3, '')
OR ...

I think that the following code is better than the examples above because it focuses on just the columns you want to check in a concise and efficient manner.
It determines if a value has changed in only the columns specified. I have not investigated its performance compared with the other solutions but it is working well in my database.
It uses the EXCEPT set operator to return any rows from the left query that are not also found on the right query. This code can be used in INSERT and UPDATE triggers.
The "PrimaryKeyID" column is the primary key of the table (can be multiple columns) and is required to enable matching between the two sets.
-- Only do trigger logic if specific field values change.
IF EXISTS(SELECT PrimaryKeyID
,Column1
,Column7
,Column10
FROM inserted
EXCEPT
SELECT PrimaryKeyID
,Column1
,Column7
,Column10
FROM deleted ) -- Tests for modifications to fields that we are interested in
BEGIN
-- Put code here that does the work in the trigger
END
If you want to use the changed rows in subsequent trigger logic, I usually put the results of the EXCEPT query into a table variable that can be referenced later on.
I hope this is of interest :-)

The update trigger will fire on all update statements. the impacted rows are available within the trigger in the "inserted" and "deleted" tables. You can compare the old and new values by comparing the PK columns in the two tables (if you have a PK). The actual table remains unchanged till the trigger finishes execution.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas