How to write a SQL update statement that will fail if it updates more than one row?

A common mistake when writing update statements is to forget the where clause, or to write it incorrectly, so that more rows than expected get updated. Is there a way to specify in the update statement itself that it should only update one row (and to fail if it would update more)?
Guarding against this requires thinking ahead: using a transaction, running the statement as a select first to check the number of rows, and then actually catching the error. It would be useful to be able to state the expected number of rows in one place.

Combining a few facts, I found a working solution for Postgres.
A select will fail when comparing using = to a subquery that returns more than one row. (where x = (select ...))
Values can be returned from an update statement, using the returning clause. An update cannot be used as a subquery, but it can be used as a CTE, which can be used in a subquery.
Example:
create table foo (id int not null primary key, x int not null);
insert into foo (id, x) values (1,5), (2,5);
with updated as (update foo set x = 4 where x = 5 returning id)
select id from foo where id = (select id from updated);
The query containing the update fails with ERROR: more than one row returned by a subquery used as an expression, and the updates are not applied. If the update's where clause is adjusted to only match one row, the update succeeds.
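SQLite only allows a SELECT inside a WITH clause, so this CTE trick is specific to Postgres. The fallback the question mentions can be sketched with Python's built-in `sqlite3` module: run the UPDATE in a transaction, check how many rows it touched, and roll back otherwise. The helper name `update_exactly_one` is hypothetical. The sketch relies on `Cursor.rowcount`, which reports the number of rows modified by the last UPDATE.

```python
import sqlite3


def update_exactly_one(conn: sqlite3.Connection, sql: str, params=()) -> int:
    """Run an UPDATE; commit only if it modified exactly one row, else roll back and raise."""
    # In its default transaction mode, sqlite3 issues an implicit BEGIN before an UPDATE.
    cur = conn.execute(sql, params)
    if cur.rowcount != 1:
        count = cur.rowcount
        conn.rollback()  # undo the over-broad (or empty) update
        raise RuntimeError(f"expected to update 1 row, but the statement matched {count}")
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("create table foo (id integer not null primary key, x integer not null)")
conn.executemany("insert into foo (id, x) values (?, ?)", [(1, 5), (2, 5)])
conn.commit()

try:
    update_exactly_one(conn, "update foo set x = 4 where x = 5")  # matches both rows
except RuntimeError as err:
    print(err)  # expected to update 1 row, but the statement matched 2
print(conn.execute("select id, x from foo order by id").fetchall())  # [(1, 5), (2, 5)]

update_exactly_one(conn, "update foo set x = 4 where id = ?", (1,))
print(conn.execute("select id, x from foo order by id").fetchall())  # [(1, 4), (2, 5)]
```

Unlike the Postgres CTE, this check lives in application code rather than in the statement itself. However, it works on any driver that reports affected-row counts.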

Related

Update a row multiple times when performing a join

CREATE TABLE Replacements
(
OldVal nvarchar(max),
NewVal nvarchar(max)
);
CREATE TABLE Foo
(
Val nvarchar(max)
);
insert into Replacements values ('old1','new1');
insert into Replacements values ('old2','new2');
insert into Replacements values ('old3','new3');
insert into Replacements values ('old4','new4');
insert into Foo values ('old1');
insert into Foo values ('old3');
insert into Foo values ('old2;old4');
I have some data that may be delimited by semicolons. I need to join the data against a lookup table and replace the old values with the new ones. This works fine when the data is not delimited, but when it is delimited, only the first replacement is performed.
update f
set f.val = Replace(f.val, r.OldVal, r.NewVal)
from Foo f
inner join replacements r on (CHARINDEX(r.OldVal, f.Val) > 0);
select * from Foo;
Val
new1
new3
new2;old4
How can I perform multiple updates on the same row? Is there a better way to find and replace strings within delimited strings? The solution needs to be compatible back to SQL Server 2014.
http://sqlfiddle.com/#!18/c320d13/9
In SQL Server, a single UPDATE statement can modify each row of the target table at most once.
This means the expected solution is to execute the update multiple times. One hack for this is simply to script the update a fixed number of times:
update f
set f.val = Replace(f.val,r.OldVal,r.NewVal)
from
Foo f inner join replacements r
on
(CHARINDEX(r.OldVal,f.Val) > 0);
update f
set f.val = Replace(f.val,r.OldVal,r.NewVal)
from
Foo f inner join replacements r
on
(CHARINDEX(r.OldVal,f.Val) > 0);
http://sqlfiddle.com/#!18/c320d13/10
An alternative solution is to apply the update repeatedly until no replacements remain:
WHILE EXISTS (SELECT 1
from Foo f
inner join replacements r on (f.Val = r.OldVal or CHARINDEX(r.OldVal,f.Val) > 0))
BEGIN
update f
set f.val = Replace(f.val,r.OldVal,r.NewVal)
from
Foo f inner join replacements r
on
(CHARINDEX(r.OldVal,f.Val) > 0);
END
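The loop can be modelled in plain Python to show why repeated passes converge. This is only a simulation, not SQL Server itself. In each pass a row takes at most one matching replacement, as in a single UPDATE ... FROM. Which match wins is an assumption here (the first in list order), whereas SQL Server leaves it undefined. The `max_passes` cap is a hypothetical safeguard. It stops the case where a NewVal contains its own OldVal, which would make the WHILE loop above run forever.

```python
def single_pass(values, replacements):
    """Emulate one UPDATE ... FROM: each row takes at most one matching replacement."""
    changed = 0
    result = []
    for val in values:
        for old, new in replacements:
            # Like CHARINDEX(r.OldVal, f.Val) > 0; case-sensitive here,
            # whereas SQL Server's behaviour depends on the collation.
            if old in val:
                val = val.replace(old, new)  # like REPLACE(f.val, r.OldVal, r.NewVal)
                changed += 1
                break  # only one match per row per statement
        result.append(val)
    return result, changed


def apply_until_stable(values, replacements, max_passes=100):
    """Repeat passes, like the WHILE EXISTS loop, until no row matches any OldVal."""
    for _ in range(max_passes):
        values, changed = single_pass(values, replacements)
        if changed == 0:
            return values
    raise RuntimeError("replacements did not converge; does a NewVal contain its OldVal?")


replacements = [("old1", "new1"), ("old2", "new2"), ("old3", "new3"), ("old4", "new4")]
rows = ["old1", "old3", "old2;old4"]
print(single_pass(rows, replacements)[0])      # ['new1', 'new3', 'new2;old4']
print(apply_until_stable(rows, replacements))  # ['new1', 'new3', 'new2;new4']
```

The first pass reproduces the `new2;old4` result from the question, and the second pass finishes the job.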
**NOTE:**
This answer is based on the original post, which specified that the delimiter was unknown. With a known delimiter the UPDATE statement needs to be more specific, but ultimately the update will still need to be applied multiple times.
Other possible solutions include using a CURSOR or splitting the concatenated field, replacing the tokens then re-joining the tokens back into a delimited string.
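The split/replace/re-join alternative can be sketched in plain Python to show the idea. Each delimited token is looked up as a whole, so one pass handles every token. It also means `old1` cannot accidentally match inside a longer token such as `old10`, a pitfall of the CHARINDEX approach. The function name and the `;` default are assumptions for illustration. On SQL Server 2014, which predates STRING_SPLIT (added in 2016), the split step would need a hand-written splitter.

```python
def replace_tokens(value: str, lookup: dict, delimiter: str = ";") -> str:
    """Split a delimited string, replace whole tokens via the lookup, and re-join."""
    tokens = value.split(delimiter)
    # Tokens without an entry in the lookup are kept unchanged.
    return delimiter.join(lookup.get(token, token) for token in tokens)


lookup = {"old1": "new1", "old2": "new2", "old3": "new3", "old4": "new4"}
for val in ["old1", "old3", "old2;old4", "old10;old2"]:
    print(replace_tokens(val, lookup))
# new1
# new3
# new2;new4
# old10;new2
```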
The docs are a little vague about why this is needed. However, if a target row has multiple matches in your UPDATE's FROM clause, only one of the matches is applied, and there is no guarantee which one; the result is indeterminate:
UPDATE: Best Practices
Use caution when specifying the FROM clause to provide the criteria for the update operation. The results of an UPDATE statement are undefined if the statement includes a FROM clause that is not specified in such a way that only one value is available for each column occurrence that is updated, that is if the UPDATE statement is not deterministic. For example, in the UPDATE statement in the following script, both rows in Table1 meet the qualifications of the FROM clause in the UPDATE statement; but it is undefined which row from Table1 is used to update the row in Table2.
This has been discussed before on SO (SQL update same row multiple times). It is also explicitly disallowed in the MERGE statement, where it results in this error:
The MERGE statement attempted to UPDATE or DELETE the same row more than once.
This happens when a target row matches more than one source row.
A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times.
Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.

Update multiple rows in the target table from one row in the source table using a natural key

I need to update two columns in a FACT table using data from a dimension table. The challenge is that I don't have a primary key that matches between the two tables, so I have to use a natural key: two columns that together form a unique value. Besides, the source has a single record and the target has multiple records. If I do a MERGE I get
ORA-30926: unable to get a stable set of rows
and if I do an UPDATE I get another error. Please, I need help.
I tried this UPDATE statement:
UPDATE dw.target_table obc
SET
( obc.sail_key,
obc.durations ) = (
SELECT
sd.sail_key,
sd.durations
FROM
dw.source_table sd
WHERE
obc.code_1 = sd.code_2
AND obc.date_1 = sd.date_2
)
WHERE
obc.item NOT IN (
30,
40
)
AND obc.sail_key = 0
and OBC.load_date between to_date('01-12-2018','DD-MM-YYYY')
AND to_date ('31-12-2018','DD-MM-YYYY');
And I tried this MERGE statement:
MERGE INTO dw.target_table obc
USING ( SELECT distinct
code_2,date_2,durations,sail_key
FROM dw.source_table
) tb_dim
ON ( obc.code_1 = tb_dim.code_2
AND obc.date_1 = tb_dim.date_2 )
WHEN MATCHED THEN UPDATE SET obc.durations = tb_dim.durations,
obc.sail_key = tb_dim.sail_key
WHERE
obc.sail_key = 0
AND obc. NOT IN (
30,
40
)
AND obc.loaddate BETWEEN TO_DATE('01-01-2012','DD-MM-YYYY')
AND TO_DATE ('31-01-2012','DD-MM-YYYY');
ORA-30926: unable to get a stable set of rows
This means (code_2, date_2) is not a unique key of tb_dim. As a result, your USING subquery does not produce a set that matches just one of its rows to any row in obc. The MERGE therefore fails, because Oracle cannot determine which row from the USING subquery should be applied to the target. The DISTINCT does not help because it is applied to the whole projection, and it seems you have multiple different values of durations, sail_key for each permutation of code_2, date_2.
You don't say which error you get when you run your UPDATE, but presumably it's ORA-01779 or ORA-01427. Either would indicate that the subquery isn't returning exactly one row per joining key.
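Before rewriting the MERGE, it can help to list the natural keys that really are duplicated in the source. Below is a minimal sketch with Python's `sqlite3` module, using the question's `source_table` column names and invented sample rows. The GROUP BY ... HAVING query itself is plain SQL and should run unchanged against dw.source_table on Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table source_table (code_2 text, date_2 text, durations integer, sail_key integer)"
)
conn.executemany(
    "insert into source_table values (?, ?, ?, ?)",
    [
        ("A", "2018-12-01", 7, 10),
        ("A", "2018-12-01", 3, 20),  # same natural key as the row above
        ("B", "2018-12-02", 5, 30),
    ],
)

# Natural keys that map to more than one source row. On Oracle these are the
# keys that can make the MERGE raise ORA-30926 and the scalar subquery ORA-01427.
dupes = conn.execute(
    """
    select code_2, date_2, count(*) as n
    from source_table
    group by code_2, date_2
    having count(*) > 1
    """
).fetchall()
print(dupes)  # [('A', '2018-12-01', 2)]
```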
So how do you fix the situation? We cannot give you the correct solution because this is a failure of your data model or your specification. Solving requires an understanding of your business that we do not have. But generally you need to find an extra rule which reduces the USING subquery to a set. That is:
add a third key column which allows the ON clause to map one row in tb_dim to one row in obc; or
use the row_number() analytic function in the subquery to fake such a column, preferably ordering by a meaningful column such as a date; or
add a criterion to the WHERE clause of the subquery to remove duplicate values of code_2,date_2.
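The row_number() option can be sketched in SQLite via Python; window functions need SQLite 3.25 or later. The `created_at` column is hypothetical, standing in for whatever meaningful column decides which duplicate wins. SQLite has no MERGE, so the deduplicated set is applied with a correlated UPDATE instead. On Oracle the same inline view would go into the USING clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table source_table (code_2 text, date_2 text, durations integer,
                               sail_key integer, created_at text);  -- created_at is hypothetical
    create table target_table (code_1 text, date_1 text, durations integer, sail_key integer);
    insert into source_table values
        ('A', '2018-12-01', 7, 10, '2018-11-01'),
        ('A', '2018-12-01', 3, 20, '2018-11-15'),  -- newer duplicate of the same natural key
        ('B', '2018-12-02', 5, 30, '2018-11-20');
    insert into target_table values
        ('A', '2018-12-01', 0, 0), ('A', '2018-12-01', 0, 0), ('B', '2018-12-02', 0, 0);
    """
)

# d keeps exactly one source row per (code_2, date_2): the latest by created_at.
conn.execute(
    """
    with d as (
        select code_2, date_2, durations, sail_key
        from (select s.*,
                     row_number() over (partition by code_2, date_2
                                        order by created_at desc) as rn
              from source_table s)
        where rn = 1
    )
    update target_table
    set (durations, sail_key) = (select d.durations, d.sail_key from d
                                 where d.code_2 = target_table.code_1
                                   and d.date_2 = target_table.date_1)
    where sail_key = 0
      and exists (select 1 from d
                  where d.code_2 = target_table.code_1
                    and d.date_2 = target_table.date_1)
    """
)
conn.commit()
print(conn.execute("select * from target_table order by code_1").fetchall())
# [('A', '2018-12-01', 3, 20), ('A', '2018-12-01', 3, 20), ('B', '2018-12-02', 5, 30)]
```

The EXISTS guard keeps target rows without any source match from being overwritten with NULLs.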
Alternatively, if you don't care which particular values of durations,sail_key get applied you can use an aggregate:
USING (SELECT code_2
,date_2
,max(durations) as durations
,max(sail_key) as sail_key
FROM dw.source_table
group by code_2,date_2 ) tb_dim
Use whatever function makes sense to you.
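One point to keep in mind with the aggregate approach: each max() is computed independently. As a result, the durations and sail_key that get applied may come from different source rows. Here is a tiny SQLite demonstration with invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table source_table (code_2 text, date_2 text, durations integer, sail_key integer);
    insert into source_table values ('A', '2018-12-01', 7, 10), ('A', '2018-12-01', 3, 20);
    """
)
rows = conn.execute(
    """
    select code_2, date_2, max(durations) as durations, max(sail_key) as sail_key
    from source_table
    group by code_2, date_2
    """
).fetchall()
# durations = 7 comes from the first row, sail_key = 20 from the second:
# a combination that exists in neither source row.
print(rows)  # [('A', '2018-12-01', 7, 20)]
```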

Get Id from a conditional INSERT

For a table like this one:
CREATE TABLE Users(
id SERIAL PRIMARY KEY,
name TEXT UNIQUE
);
What would be the correct one-query insert for the following operation:
Given a user name, insert a new record and return the new id. But if the name already exists, just return the id.
I am aware of the new syntax within PostgreSQL 9.5 for ON CONFLICT(column) DO UPDATE/NOTHING, but I can't figure out how, if at all, it can help, given that I need the id to be returned.
It seems that RETURNING id and ON CONFLICT do not belong together.
The UPSERT implementation is hugely complex in order to be safe against concurrent write access. Take a look at this Postgres Wiki that served as a log during initial development. The Postgres hackers decided not to include "excluded" rows in the RETURNING clause for the first release in Postgres 9.5. They might build something in for the next release.
This is the crucial statement in the manual to explain your situation:
The syntax of the RETURNING list is identical to that of the output list of SELECT. Only rows that were successfully inserted or updated will be returned. For example, if a row was locked but not updated because an ON CONFLICT DO UPDATE ... WHERE clause condition was not satisfied, the row will not be returned.
For a single row to insert:
Without concurrent write load on the same table
WITH ins AS (
INSERT INTO users(name)
VALUES ('new_usr_name') -- input value
ON CONFLICT(name) DO NOTHING
RETURNING users.id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM users -- 2nd SELECT never executed if INSERT successful
WHERE name = 'new_usr_name' -- input value a 2nd time
LIMIT 1;
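The same get-or-create idea can be sketched in SQLite through Python's `sqlite3` module; SQLite 3.24+ accepts ON CONFLICT ... DO NOTHING. It uses two statements inside one transaction rather than a single CTE, because SQLite does not allow an INSERT inside WITH. The function name is hypothetical.

```python
import sqlite3


def get_or_create_user_id(conn: sqlite3.Connection, name: str) -> int:
    """Insert the name if it is new, then return its id either way."""
    # DO NOTHING silently skips the insert when UNIQUE(name) would be violated.
    conn.execute("insert into users (name) values (?) on conflict (name) do nothing", (name,))
    (user_id,) = conn.execute("select id from users where name = ?", (name,)).fetchone()
    conn.commit()
    return user_id


conn = sqlite3.connect(":memory:")
conn.execute("create table users (id integer primary key, name text unique)")
print(get_or_create_user_id(conn, "new_usr_name"))  # 1 (inserted)
print(get_or_create_user_id(conn, "new_usr_name"))  # 1 (already existed)
print(get_or_create_user_id(conn, "other_name"))    # 2
```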
With possible concurrent write load on the table
Consider this instead (for single row INSERT):
Is SELECT or INSERT in a function prone to race conditions?
To insert a set of rows:
How to use RETURNING with ON CONFLICT in PostgreSQL?
How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
All three with very detailed explanation.
For a single row insert and no update:
with i as (
insert into users (name)
select 'the name'
where not exists (
select 1
from users
where name = 'the name'
)
returning id
)
select id
from users
where name = 'the name'
union all
select id from i
What the manual says about the primary query and the WITH sub-queries:
The primary query and the WITH queries are all (notionally) executed at the same time
Although to me that sounds like "same snapshot", I'm not sure, since I don't know what "notionally" means in that context.
But there is also:
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot
If I understand correctly, that "same snapshot" part prevents a race condition. But again, I'm not sure whether "all the statements" refers only to the statements in the WITH sub-queries or also includes the main query. To avoid any doubt, move the select in the previous query into a WITH sub-query:
with s as (
select id
from users
where name = 'the name'
), i as (
insert into users (name)
select 'the name'
where not exists (select 1 from s)
returning id
)
select id from s
union all
select id from i

How to insert generated id into a results table

I have the following query
SELECT q.pol_id
FROM quot q
,fgn_clm_hist fch
WHERE q.quot_id = fch.quot_id
UNION
SELECT q.pol_id
FROM tdb2wccu.quot q
WHERE q.nr_prr_ls_yr_cov IS NOT NULL
For every row in that result set, I want to create a new row in another table (call it table1) and update pol_id in the quot table (from the above result set) with the generated primary key from the inserted row in table1.
table1 has two columns: id and timestamp.
I'm using DB2 10.1.
I've tried numerous things and have been unsuccessful for quite a while. Thanks!
Simple solution: create a new table for the result set of your query, which has an identity column in it. Then, after running your query, update the pol_id field with the newly generated ID in your result table.
Alternatively, you can do it more manually using the ROW_NUMBER() OLAP function, which I have often found convenient for creating IDs. For this it is convenient to use a stored procedure that does the following:
get the maximum old id from Table1 and write it into a variable old_max_id;
after generating the result set, write the row numbers into Table1, for example with something like
INSERT INTO TABLE1
SELECT ROW_NUMBER() OVER (ORDER BY <whatever-you-want>)
+ OLD_MAX_ID
, CURRENT TIMESTAMP
FROM (<here comes your SQL query>)
Either write the result set into a table or return a cursor to it. Here you should either use the same ROW_NUMBER statement as above or directly use the ID from Table1.
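The ROW_NUMBER() plus old-maximum idea can be sketched end to end in SQLite via Python; window functions need SQLite 3.25+. The temporary mapping table `id_map` and the sample rows are hypothetical. The table keeps the old-to-new pairs so that the same numbering is reused for both the INSERT and the UPDATE. Note that numbering from max(id) assumes no other session inserts into table1 at the same time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table table1 (id integer primary key, ts text not null);
    create table quot (quot_id integer primary key, pol_id integer);
    insert into table1 values (7, '2020-01-01 00:00:00');  -- existing row, so old max id = 7
    insert into quot values (1, 100), (2, 100), (3, 200), (4, 300);

    -- Number the result rows after the current maximum id. The inner SELECT is a
    -- hypothetical stand-in for the question's UNION query.
    create temp table id_map as
    select pol_id as old_pol_id,
           row_number() over (order by pol_id)
             + (select coalesce(max(id), 0) from table1) as new_id
    from (select distinct pol_id from quot where pol_id in (100, 300));

    -- One new table1 row per result row...
    insert into table1 (id, ts) select new_id, current_timestamp from id_map;

    -- ...and point quot.pol_id at the generated key.
    update quot
    set pol_id = (select new_id from id_map where old_pol_id = quot.pol_id)
    where pol_id in (select old_pol_id from id_map);
    """
)
print(conn.execute("select id from table1 order by id").fetchall())  # [(7,), (8,), (9,)]
print(conn.execute("select quot_id, pol_id from quot order by quot_id").fetchall())
# [(1, 8), (2, 8), (3, 200), (4, 9)]
```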

AFTER UPDATE trigger using an aggregate from the updated table

I'm having trouble with an update trigger. I want the trigger to set Quarterbacks.Yards equal to the sum of the wide receivers' yards for the same team.
create trigger T_Receiving_Passing
on NFL.Widereceivers
after update
as
begin
declare
#receivingyards int,
select #receivingyards = sum (Widereceivers.Yards) from NFL.Widereceivers
update NFL.Quarterbacks
set Quarterbacks.Yards = #receivingyards
where Quarterbacks.Team = Widereceivers.Team
end
At the end of the statement, Widereceivers.Team is underlined in red, and it is causing errors. I get this same error whenever I try to reference a column in another table without naming the table in a from clause. How can I fix this problem?
Ok, you should be able to do this without the SELECT statement or its variable and instead use a more complex UPDATE statement joining on the special inserted table which holds the new values from the update.
CREATE TRIGGER T_Receiving_Passing
ON NFL.Widereceivers
AFTER UPDATE
AS
UPDATE NFL.Quarterbacks
-- Get the SUM() in a subselect
SET Quarterbacks.Yards = (SELECT SUM(Yards) FROM NFL.Widereceivers WHERE Team = inserted.Team)
FROM
NFL.Quarterbacks
-- Join against the special inserted table
JOIN inserted ON Quarterbacks.Team = inserted.Team
GO
In your original attempt, you hoped to use a SELECT query first to populate a scalar variable. In an UPDATE statement however, you can use a subselect that returns exactly one column of exactly one row inside the SET clause to retrieve a new value.
Since your requirement was to use an aggregate SUM(), it isn't as straightforward as assigning a value directly from the inserted table, as in SET Yards = inserted.Yards. Instead, the subselect computes the aggregate sum limited to the Team of the inserted row.
As far as the inserted/deleted tables go, review the official documentation. I have not worked with SQL Server regularly for a few years but if I recall correctly, the inserted table must occur in the FROM clause which implies it will usually need to be JOINed in. In your UPDATE statement, inserted is needed in both the subselect and the outer query, so it was joined in the outer one.
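SQLite has no inserted/deleted tables. Its triggers run FOR EACH ROW and expose NEW and OLD instead. Below is a sketch of the same idea in SQLite, with table names that drop the NFL schema and invented sample data. It also recomputes the old team's total, in case a receiver changes team.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table widereceivers (name text primary key, team text, yards integer);
    create table quarterbacks (name text primary key, team text, yards integer);
    insert into widereceivers values ('wr1', 'A', 100), ('wr2', 'A', 50), ('wr3', 'B', 70);
    insert into quarterbacks values ('qb1', 'A', 150), ('qb2', 'B', 70);

    create trigger t_receiving_passing
    after update on widereceivers
    for each row
    begin
        -- Recompute totals for the old and new team; coalesce covers a team left with no receivers.
        update quarterbacks
        set yards = (select coalesce(sum(w.yards), 0)
                     from widereceivers w
                     where w.team = quarterbacks.team)
        where team in (old.team, new.team);
    end;
    """
)
conn.execute("update widereceivers set yards = 120 where name = 'wr1'")
print(conn.execute("select name, yards from quarterbacks order by name").fetchall())
# [('qb1', 170), ('qb2', 70)]
conn.execute("update widereceivers set team = 'A' where name = 'wr3'")  # wr3 moves from B to A
print(conn.execute("select name, yards from quarterbacks order by name").fetchall())
# [('qb1', 240), ('qb2', 0)]
```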