Update multiple rows in the target table from one row in the source table using a natural key - sql

I need to update two columns in a fact table using data from a dimension table. The challenge is that I don't have a primary key that matches both tables, so I have to use a natural key: two columns that together form a unique value. In addition, the source has a single record and the target has multiple records. If I do a MERGE I get
ORA-30926: unable to get a stable set of rows
and if I do an UPDATE I get another error. Please, I need help.
I tried this UPDATE statement:
UPDATE dw.target_table obc
SET ( obc.sail_key,
      obc.durations ) = (
        SELECT sd.sail_key,
               sd.durations
        FROM dw.source_table sd
        WHERE obc.code_1 = sd.code_2
          AND obc.date_1 = sd.date_2
      )
WHERE obc.item NOT IN ( 30, 40 )
  AND obc.sail_key = 0
  AND obc.load_date BETWEEN TO_DATE('01-12-2018','DD-MM-YYYY')
                        AND TO_DATE('31-12-2018','DD-MM-YYYY');
And I tried this MERGE statement:
MERGE INTO dw.target_table obc
USING ( SELECT DISTINCT
               code_2, date_2, durations, sail_key
        FROM dw.source_table
      ) tb_dim
ON ( obc.code_1 = tb_dim.code_2
     AND obc.date_1 = tb_dim.date_2 )
WHEN MATCHED THEN UPDATE
    SET obc.durations = tb_dim.durations,
        obc.sail_key = tb_dim.sail_key
    WHERE obc.sail_key = 0
      AND obc. NOT IN ( 30, 40 )
      AND obc.loaddate BETWEEN TO_DATE('01-01-2012','DD-MM-YYYY')
                           AND TO_DATE('31-01-2012','DD-MM-YYYY');

ORA-30926: unable to get a stable set of rows
This means (code_2,date_2) is not a unique key of tb_dim. Consequently your USING subquery does not produce a set which matches just one row to any row in obc, and the MERGE fails because Oracle cannot determine which row from the USING subquery should be applied to the target. The DISTINCT does not help because it is applied to the whole projection, and it seems you have multiple different values of durations,sail_key for each combination of code_2,date_2.
You don't say which error you get when you run your UPDATE, but presumably it's ORA-01779 or ORA-01427 (single-row subquery returns more than one row): something indicating that the subquery doesn't return exactly one row per joining key.
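To confirm that diagnosis, a query along these lines lists the key pairs that map to more than one distinct (durations, sail_key) combination. It is a minimal sketch that reuses the table and column names from the question; the alias variants is made up for illustration.

```sql
-- List every (code_2, date_2) pair that still has more than one row
-- after DISTINCT, i.e. the pairs that make the MERGE unstable.
SELECT code_2,
       date_2,
       COUNT(*) AS variants          -- hypothetical alias
FROM ( SELECT DISTINCT code_2, date_2, durations, sail_key
       FROM dw.source_table )
GROUP BY code_2, date_2
HAVING COUNT(*) > 1;
```

If this returns no rows, the key is unique and the cause of the error lies elsewhere.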
So how do you fix the situation? We cannot give you the correct solution because this is a failure of your data model or your specification. Solving requires an understanding of your business that we do not have. But generally you need to find an extra rule which reduces the USING subquery to a set. That is:
add a third key column which allows the ON clause to map one row in tb_dim to one row in obc; or
use row_number() analytic function in the subquery to fake such a column, preferably ordering by a meaningful column such as date; or
add a criterion to the WHERE clause of the subquery to remove duplicate values of code_2,date_2.
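The row_number() option could look roughly like the following. This is an untested sketch: the ORDER BY column is an assumption (sail_key is used only as a placeholder, since the question does not show a meaningful date column in the source table), and obc.item is taken from the UPDATE statement in the question.

```sql
MERGE INTO dw.target_table obc
USING ( SELECT code_2, date_2, durations, sail_key
        FROM ( SELECT sd.code_2,
                      sd.date_2,
                      sd.durations,
                      sd.sail_key,
                      -- Number the rows within each natural key; the ordering
                      -- column is a placeholder and should be a business rule.
                      ROW_NUMBER() OVER ( PARTITION BY sd.code_2, sd.date_2
                                          ORDER BY sd.sail_key DESC ) AS rn
               FROM dw.source_table sd )
        WHERE rn = 1                 -- keep exactly one row per key
      ) tb_dim
ON ( obc.code_1 = tb_dim.code_2
     AND obc.date_1 = tb_dim.date_2 )
WHEN MATCHED THEN UPDATE
    SET obc.durations = tb_dim.durations,
        obc.sail_key = tb_dim.sail_key
    WHERE obc.sail_key = 0
      AND obc.item NOT IN ( 30, 40 );
```

Unlike independent aggregates per column, this keeps durations and sail_key from the same source row.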
Alternatively, if you don't care which particular values of durations,sail_key get applied you can use an aggregate:
USING ( SELECT code_2
             , date_2
             , MAX(durations) AS durations
             , MAX(sail_key) AS sail_key
        FROM dw.source_table
        GROUP BY code_2, date_2 ) tb_dim
Use whatever function makes sense to you.

Related

How to write a SQL update statement that will fail if it updates more than one row?

A common mistake when writing update statements is to forget the where clause, or to write it incorrectly, so that more rows than expected get updated. Is there a way to specify in the update statement itself that it should only update one row (and to fail if it would update more)?
Correcting an error in the number of rows updated requires thinking ahead - using a transaction, formatting it as a select first to check the number of rows - and then actually catching the error. It would be useful to be able to write in one place the expectation for the number of rows.
Combining a few facts, I found a working solution for Postgres.
A select will fail when comparing using = to a subquery that returns more than one row. (where x = (select ...))
Values can be returned from an update statement, using the returning clause. An update cannot be used as a subquery, but it can be used as a CTE, which can be used in a subquery.
Example:
create table foo (id int not null primary key, x int not null);
insert into foo (id, x) values (1,5), (2,5);
with updated as (update foo set x = 4 where x = 5 returning id)
select id from foo where id = (select id from updated);
The query containing the update fails with ERROR: more than one row returned by a subquery used as an expression, and the updates are not applied. If the update's where clause is adjusted to only match one row, the update succeeds.
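For comparison, here is the same pattern with a where clause that matches a single row, using the foo table from the example above. It is only a sketch of the success case.

```sql
-- The CTE updates exactly one row, so the scalar subquery is valid
-- and the statement completes with the update applied.
with updated as (update foo set x = 4 where id = 1 returning id)
select id from foo where id = (select id from updated);
```

One caveat: if the update matches no rows at all, the subquery yields NULL rather than an error, so this guards only against updating more than one row, not against updating none.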

Bigquery Error: UPDATE/MERGE must match at most one source row for each target row

Just wondering if someone could help with the following error:
UPDATE/MERGE must match at most one source row for each target row
My query is as below:
UPDATE `sandbox.sellout` s
SET s.SKU_Label = TRIM(SKU_TEMP.SKU)
FROM (SELECT SKU, Old_SKU FROM `sandbox.ref_sku_temp`) SKU_TEMP
WHERE TRIM(SKU_TEMP.Old_SKU) = TRIM(s.SKU)
If a row in the table to be updated joins with more than one row from the FROM clause, then the query generates the following runtime error: UPDATE/MERGE must match at most one source row for each target row.
Data Manipulation Language Syntax.
It occurs because the source in the FROM clause contains duplicated rows with respect to the column you are joining on. If a row in the table to be updated joins with more than one row from the FROM clause, BigQuery returns this error.
Solution
Remove the duplicated rows from the source table and then perform the UPDATE/MERGE operation
Define Primary key in BigQuery target table to avoid data redundancy
The issue is with the duplicate rows in the source table. Please consider removing the dups and running the query again, or post the sample data here.
Add DISTINCT to your select statement like this:
UPDATE `sandbox.sellout` s
SET s.SKU_Label = TRIM(SKU_TEMP.SKU)
FROM (SELECT DISTINCT SKU, Old_SKU FROM `sandbox.ref_sku_temp`) SKU_TEMP
WHERE TRIM(SKU_TEMP.Old_SKU) = TRIM(s.SKU)
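DISTINCT only removes rows that are identical in both columns, so the error can persist when one Old_SKU maps to several SKU values, or when values differ only in whitespace. A sketch that collapses the source to one row per trimmed key is shown below; the aliases join_key and new_sku are invented for illustration, and MAX is an arbitrary tie-breaker that should be replaced by whatever rule fits the data.

```sql
UPDATE `sandbox.sellout` s
SET s.SKU_Label = SKU_TEMP.new_sku
FROM (
  -- One row per trimmed Old_SKU; MAX() picks one SKU when several exist.
  SELECT TRIM(Old_SKU) AS join_key,
         MAX(TRIM(SKU)) AS new_sku
  FROM `sandbox.ref_sku_temp`
  GROUP BY join_key
) SKU_TEMP
WHERE SKU_TEMP.join_key = TRIM(s.SKU)
```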

How to insert generated id into a results table

I have the following query
SELECT q.pol_id
FROM quot q
,fgn_clm_hist fch
WHERE q.quot_id = fch.quot_id
UNION
SELECT q.pol_id
FROM tdb2wccu.quot q
WHERE q.nr_prr_ls_yr_cov IS NOT NULL
For every row in that result set, I want to create a new row in another table (call it table1) and update pol_id in the quot table (from the above result set) with the generated primary key from the inserted row in table1.
table1 has two columns. id and timestamp.
I'm using db2 10.1.
I've tried numerous things and have been unsuccessful for quite a while. Thanks!
Simple solution: create a new table for the result set of your query, which has an identity column in it. Then, after running your query, update the pol_id field with the newly generated ID in your result table.
Alternatively, you can do it more manually by using the ROW_NUMBER() OLAP function, which I have often found convenient for creating IDs. For this it is convenient to use a stored procedure which does the following:
Get the maximum old id from table1 and write it into a variable old_max_id.
After generating the result set, write the row numbers into table1, for example with something like
INSERT INTO TABLE1
SELECT ROW_NUMBER() OVER (ORDER BY <whatever-you-want>)
       + OLD_MAX_ID
     , CURRENT TIMESTAMP
FROM (<here comes your SQL query>) AS t
Note that the window has no PARTITION BY: partitioning by a unique key would put each row in its own partition and number every row 1.
Either write the result set into a table or return a cursor to it. Here you should either use the same ROW_NUMBER expression as above or directly use the ID from table1.

AFTER UPDATE trigger using an aggregate from the updated table

I'm having trouble with an update trigger. I want the trigger to set Quarterbacks.Yards equal to the sum of Wide receiving yards if they're on the same team.
create trigger T_Receiving_Passing
on NFL.Widereceivers
after update
as
begin
    declare
        @receivingyards int,
    select @receivingyards = sum (Widereceivers.Yards) from NFL.Widereceivers
    update NFL.Quarterbacks
    set Quarterbacks.Yards = @receivingyards
    where Quarterbacks.Team = Widereceivers.Team
end
At the end of the statement, Widereceivers.Team is underlined in red, and it is causing errors. I get this same error whenever I try to reference a column in another table without naming the table in a from clause. How can I fix this problem?
Ok, you should be able to do this without the SELECT statement or its variable and instead use a more complex UPDATE statement joining on the special inserted table which holds the new values from the update.
CREATE TRIGGER T_Receiving_Passing
ON NFL.Widereceivers
AFTER UPDATE
AS
UPDATE NFL.Quarterbacks
    -- Get the SUM() in a subselect (schema-qualified so it resolves to NFL.Widereceivers)
    SET Quarterbacks.Yards = (SELECT SUM(Yards) FROM NFL.Widereceivers WHERE Team = inserted.Team)
FROM
    NFL.Quarterbacks
    -- Join against the special inserted table
    JOIN inserted ON Quarterbacks.Team = inserted.Team
GO
Here is a proof of concept
In your original attempt, you hoped to use a SELECT query first to populate a scalar variable. In an UPDATE statement however, you can use a subselect that returns exactly one column of exactly one row inside the SET clause to retrieve a new value.
Since your requirement was to use an aggregate SUM() it isn't as straightforward as assigning a value directly from the inserted like SET Yards = inserted.Yards. Instead, the subselect produces the aggregate sum limited to just the Team used in the inserted row.
As far as the inserted/deleted tables go, review the official documentation. I have not worked with SQL Server regularly for a few years but if I recall correctly, the inserted table must occur in the FROM clause which implies it will usually need to be JOINed in. In your UPDATE statement, inserted is needed in both the subselect and the outer query, so it was joined in the outer one.
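If an update can move a receiver from one team to another, the old team's total goes stale, because only the inserted table is consulted. A possible variant that also looks at the deleted table is sketched below. It is untested and rests on assumptions: CREATE OR ALTER needs SQL Server 2016 SP1 or later, and treating a team with no receivers as 0 yards is a choice made here, not something stated in the question.

```sql
CREATE OR ALTER TRIGGER NFL.T_Receiving_Passing
ON NFL.Widereceivers
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Recompute the total for every team touched by the update,
    -- whether it appears in the new rows (inserted) or the old rows (deleted).
    UPDATE q
    SET q.Yards = (SELECT COALESCE(SUM(w.Yards), 0)   -- assumption: no receivers means 0
                   FROM NFL.Widereceivers AS w
                   WHERE w.Team = q.Team)
    FROM NFL.Quarterbacks AS q
    WHERE q.Team IN (SELECT Team FROM inserted
                     UNION
                     SELECT Team FROM deleted);
END
GO
```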

MySQL -- mark all but 1 matching row

This is similar to this question, but it seems like some of the answers there aren't quite compatible with MySQL (or I'm not doing it right), and I'm having a heck of a time figuring out the changes I need. Apparently my SQL is rustier than I thought it was. I'm also looking to change a column value rather than delete, but I think at least that part is simple...
I have a table like:
rowid SERIAL
fingerprint TEXT
duplicate BOOLEAN
contents TEXT
created_date DATETIME
I want to set duplicate=true for all but the first (by created_date) of each group by fingerprint. It's easy to mark all of the rows with duplicate fingerprints as dupes. The part I'm getting stuck on is keeping the first.
One of the apps that populates the table does bulk loads of data, with multiple workers loading data from different sources, and the workers' data isn't necessarily partitioned by date, so it's a pain to try to mark these all as they come in (the first one inserted isn't necessarily the first one by date). Also, I already have a bunch of data in there I'll need to clean up either way. So I'd rather just have a relatively efficient query I can run after a bulk load to clean up than try to build it into that app.
Thanks!
MySQL needs to be explicitly told if the data you are grouping by is larger than 1024 bytes (see this link for details). So if your data in the fingerprint column is larger than 1024 bytes, you should set the max_sort_length variable (see this link for details about values allowed, and this link about how to set it) to a larger number so that the GROUP BY won't silently use only part of your data for grouping.
Once you're certain that MySQL will group your data properly, the following query will set the duplicate flag so that the first fingerprint record has duplicate set to FALSE/0 and any subsequent fingerprint records have duplicate set to TRUE/1:
UPDATE mytable m1
INNER JOIN (SELECT fingerprint
                 , MIN(rowid) AS minrow
            FROM mytable m2
            GROUP BY fingerprint) m3
    ON m1.fingerprint = m3.fingerprint
SET m1.duplicate = m3.minrow != m1.rowid;
Please keep in mind that this solution does not take NULLs into account and if it is possible for the fingerprint field to be NULL then you would need additional logic to handle that case.
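The query above keeps the row with the lowest rowid in each group, which is not necessarily the earliest by created_date. If MySQL 8.0 or later is available (an assumption, since window functions are not supported before that), a variant ordered by created_date might look like this untested sketch; the aliases ranked and rn are made up for illustration.

```sql
UPDATE mytable AS m
JOIN (
    -- Rank the rows within each fingerprint by creation time;
    -- rowid breaks ties between identical timestamps.
    SELECT rowid,
           ROW_NUMBER() OVER (PARTITION BY fingerprint
                              ORDER BY created_date, rowid) AS rn
    FROM mytable
) AS ranked ON ranked.rowid = m.rowid
SET m.duplicate = (ranked.rn > 1);   -- first row per group gets 0, the rest 1
```

Rows with a NULL fingerprint all land in the same partition here, so they would still need separate handling.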
How about a two-step approach, assuming you can go offline during a data load:
Mark every item as duplicate.
Select the earliest row from each group, and clear the duplicate flag.
Not elegant, but gets the job done.
Here's a funny way to do it:
SET @rowid := 0;
UPDATE mytable
SET duplicate = (rowid = @rowid),
    rowid = (@rowid:=rowid)
ORDER BY rowid, created_date;
First set a user variable to zero, assuming this is less than any rowid in your table.
Then use the MySQL UPDATE...ORDER BY feature to ensure that the rows are updated in order by rowid, then by created_date.
For each row, if the current rowid is not equal to the user variable @rowid, set duplicate to 0 (false). This will be true only on the first row encountered with a given value for rowid.
Then add a dummy set of rowid to its own value, setting @rowid to that value as a side effect.
As you UPDATE the next row, if it's a duplicate of the previous row, rowid will be equal to the user variable @rowid, and therefore duplicate will be set to 1 (true).
Edit: Now I have tested this, and I corrected a mistake in the line that sets duplicate.
Here's another way to do it, using MySQL's multi-table UPDATE syntax:
UPDATE mytable m1
JOIN mytable m2 ON (m1.rowid = m2.rowid AND m1.created_date < m2.created_date)
SET m2.duplicate = 1;
I don't know the MySQL syntax, but in T-SQL you just do:
UPDATE t1
SET duplicate = 1
FROM MyTable t1
WHERE rowid != (
    -- ASC so that the earliest row in each group is the one left unflagged
    SELECT TOP 1 rowid FROM MyTable t2
    WHERE t2.fingerprint = t1.fingerprint ORDER BY created_date ASC
)
That may have some syntax errors, as I'm just typing off the cuff/not able to test it, but that's the gist of it.
MySQL version (not tested):
UPDATE t1
SET duplicate = 1
FROM MyTable t1
WHERE rowid != (
    -- ASC so that the earliest row in each group is the one left unflagged
    SELECT rowid FROM MyTable t2
    WHERE t2.fingerprint = t1.fingerprint
    ORDER BY created_date ASC
    LIMIT 1
)
Untested...
UPDATE TheAnonymousTable
SET duplicate = TRUE
WHERE rowid NOT IN
(SELECT rowid
FROM (SELECT MIN(created_date) AS created_date, fingerprint
FROM TheAnonymousTable
GROUP BY fingerprint
) AS M,
TheAnonymousTable AS T
WHERE M.created_date = T.created_date
AND M.fingerprint = T.fingerprint
);
The logic is that the innermost query returns the earliest created_date for each distinct fingerprint as table alias M. The middle query determines the rowid value for each of those rows; it is a nuisance to have to do this (but necessary), and the code assumes that you won't get two records for the same fingerprint and timestamp. This gives you the rowid for the earliest record for each separate fingerprint. Then the outer query (the UPDATE) sets the 'duplicate' flag on all those rows where the rowid is not one of the earliest rows.
Some DBMS may be unhappy about doing (nested) sub-queries on the table being updated.