I need to do a merge statement in BigQuery using a classic flat table, having as target a table with nested and repeated fields, and I'm having trouble understanding how this is supposed to work. Google's examples use direct values, so the syntax here is not really clear to me.
Using this example:
CREATE OR REPLACE TABLE
mydataset.DIM_PERSONA (
IdPersona STRING,
Status STRING,
Properties ARRAY<STRUCT<
Id STRING,
Value STRING,
_loadingDate TIMESTAMP,
_lastModifiedDate TIMESTAMP
>>,
_loadingDate TIMESTAMP NOT NULL,
_lastModifiedDate TIMESTAMP
);
INSERT INTO mydataset.DIM_PERSONA
values
('A', 'KO', [('FamilyMembers', '2', CURRENT_TIMESTAMP(), TIMESTAMP(NULL))], CURRENT_TIMESTAMP(), TIMESTAMP(NULL)),
('B', 'KO', [('FamilyMembers', '4', CURRENT_TIMESTAMP(), TIMESTAMP(NULL)),('Pets', '1', CURRENT_TIMESTAMP(), NULL)], CURRENT_TIMESTAMP(), TIMESTAMP(NULL))
;
CREATE OR REPLACE TABLE
mydataset.PERSONA (
IdPersona STRING,
Status STRING,
IdProperty STRING,
Value STRING
);
INSERT INTO mydataset.PERSONA
VALUES('A', 'OK','Pets','3'),('B', 'OK','FamilyMembers','5'),('C', 'OK','Pets','2')
The goal is to:
Update IdPersona='A', adding a new element in Properties and
changing Status
Update IdPersona='B', updating the existent element
in Properties
Insert IdPersona='C'
This INSERT works:
MERGE INTO mydataset.DIM_PERSONA TRG
USING (
SELECT
IdPersona,
Status,
ARRAY(
SELECT AS STRUCT
IdProperty,
Value,
CURRENT_TIMESTAMP(),
TIMESTAMP(NULL)
) Properties,
CURRENT_TIMESTAMP(),
TIMESTAMP(NULL)
FROM mydataset.PERSONA
) SRC ON TRG.IdPersona=SRC.IdPersona
WHEN NOT MATCHED THEN
INSERT VALUES (IdPersona, Status, Properties, CURRENT_TIMESTAMP(), TIMESTAMP(NULL))
But I would like to build the nested/repeated fields in the INSERT clause, because for the UPDATE I would also need (I think) to do a "SELECT AS STRUCT * REPLACE" by comparing the values of TRG with SRC.
This doesn't work:
MERGE INTO mydataset.DIM_PERSONA TRG
USING (
SELECT
*
FROM mydataset.PERSONA
) SRC ON TRG.IdPersona=SRC.IdPersona
WHEN NOT MATCHED THEN
INSERT VALUES (
IdPersona,
Status,
ARRAY(
SELECT AS STRUCT
IdProperty,
Value,
CURRENT_TIMESTAMP(),
TIMESTAMP(NULL)
),
CURRENT_TIMESTAMP(),
TIMESTAMP(NULL)
)
I get "Correlated Subquery is unsupported in INSERT clause."
Even if I used the first option, I don't get how to reference TRG.properties in the UPDATE..
WHEN MATCHED THEN
UPDATE
SET Properties = ARRAY(
SELECT AS STRUCT p_SRC.*
REPLACE (IF(p_SRC.IdProperty=p_TRG.id AND p_SRC.Value<>p_TRG.Value,p_SRC.Value,p_TRG.Value) AS Value)
FROM SRC.Properties p_SRC, TRG.Properties p_TRG
)
Obv this is wrong though.
One way to solve this, as I see it, is to pre-join everything in the USING clause, therefore doing all the replacement there, but it feels very wrong for a merge statement.
Can anyone help me figure this out, please? :\
So, I wanted to share a possible solution, although I still hope there's another way.
As mentioned, I pre-compute what I need with a CTE and a FULL OUTER JOIN, therefore recreating the array of structs I need later on (tables will be relatively small so I can afford it).
MERGE INTO mydataset.DIM_PERSONA TRG
USING (
WITH NEW_PROPERTIES AS (
SELECT
COALESCE(idp,IdPersona) IdPersona,
ARRAY_AGG((
SELECT AS STRUCT
COALESCE(idpro,Id) IdProperty,
COALESCE(vl,Value) Value,
COALESCE(_loadingDate,CURRENT_TIMESTAMP) _loadingDate,
IF(idp=IdPersona,CURRENT_TIMESTAMP,TIMESTAMP(NULL)) _lastModifiedDate
)) Properties
FROM (
SELECT DIP.IdPersona, DIP.Status, DIP_PR.*, PER.IdPersona idp, PER.Status st, PER.IdProperty idpro, PER.Value vl
FROM `clean-yew-281811.mydataset.DIM_PERSONA` DIP
CROSS JOIN UNNEST(DIP.Properties) DIP_PR
FULL OUTER JOIN mydataset.PERSONA PER
ON DIP.IdPersona=PER.IdPersona
AND DIP_PR.Id=PER.IdProperty
)
GROUP BY IdPersona
)
SELECT
IdPersona,
'subquery to do here' Status,
NP.Properties
FROM (SELECT DISTINCT IdPersona FROM mydataset.PERSONA) PE
LEFT JOIN NEW_PROPERTIES NP USING (IdPersona)
) SRC ON TRG.IdPersona=SRC.IdPersona
WHEN NOT MATCHED THEN
INSERT VALUES (IdPersona, Status, Properties, CURRENT_TIMESTAMP(), TIMESTAMP(NULL))
WHEN MATCHED THEN
UPDATE
SET
TRG.Status = SRC.Status,
TRG.Properties = SRC.Properties,
TRG._lastModifiedDate = CURRENT_TIMESTAMP()
This works but I'm pretty much avoiding the syntax to update an array of structs, as what I'm doing is a rebuild and replace operation. Hopefully someone can suggest a better way.
Also, while you did not provide your desired output, I was able to create a query based on the objectives you described and your code and with the sample data you provided.
Following the below goals:
Update IdPersona='A', adding a new element in Properties and changing Status
Update IdPersona='B', updating the existent element in Properties
Insert IdPersona='C'
Instead of doing a replace and rebuild operation, I used:
MERGE;in order to perform the updates and insert the new rows, such as IdPersona = "C"
INSERT: within merge it is not possible to use INSERT with WHEN MATCHED. Thus, in order to add a new Property when IdPerson="A", this method was used after the MERGE operations.
CREATE TABLE: after using INSERT, the new Properties when IdPersona="A" are not aggregated, since we did not use WHEN MATCHED. So, the final table DM_PERSONA is replaced in order to aggregate properly the results.
LEFT JOIN: in order to add the fields _loadingDate and *_lastModifiedDate *, which are not aggregated into the ARRAY<STRUCT<>>.
Below is the query with the proper comments:
#first step update current values and insert new IdPersonas
MERGE sample.DIM_PERSONA_test2 T
USING sample.PERSONA_test2 S
ON T.IdPersona = S.IdPersona
#update A but not insert
WHEN MATCHED AND T.IdPersona ="A" THEN
UPDATE SET STATUS = "OK"
#update B
WHEN MATCHED AND T.IdPersona ="B" THEN
UPDATE SET Properties = [( S.IdPersona, S.IdProperty,TIMESTAMP(NULL), TIMESTAMP(NULL) )]
#insert what is not in the target table
WHEN NOT MATCHED THEN
INSERT(IdPersona, Status , Properties, _loadingDate, _lastModifiedDate ) VALUES (S.IdPersona, S.Status, [( IdProperty,Value, TIMESTAMP(NULL), TIMESTAMP(NULL))], CURRENT_TIMESTAMP(), TIMESTAMP(NULL));
#insert new values when IdPersona="A"
#you will see the result won't be aggregated properly
INSERT INTO sample.DIM_PERSONA_test2(IdPersona, Status , Properties, _loadingDate, _lastModifiedDate)
SELECT IdPersona, Status,[( IdProperty,Value, TIMESTAMP(NULL), TIMESTAMP(NULL))], CURRENT_TIMESTAMP(), TIMESTAMP(NULL) from sample.PERSONA_test2
where IdPersona = "A";
#replace the above table to recriate the ARRAY<STRUCT<>>
CREATE OR REPLACE TABLE sample.DIM_PERSONA_FINAL_test2 AS(
SELECT t1.*, t2._loadingDate,t2._lastModifiedDate
FROM( SELECT a.IdPersona,
a.Status,
ARRAY_AGG(STRUCT( Properties.Id as Id, Properties.Value as Value, Properties._loadingDate ,
Properties._lastModifiedDate AS _lastModifiedDate)) AS Properties
FROM sample.DIM_PERSONA_test2 a, UNNEST(Properties) as Properties
GROUP BY 1,2
ORDER BY a.IdPersona)t1 LEFT JOIN sample.DIM_PERSONA_test2 t2 USING(IdPersona)
)
And the output,
Notice that when updating the ARRAY<STRUCT<>>, the values are wrapped within [()]. Lastly, pay attention that there are two IdPersona="A" because _loadingDate is required, so it can not be NULL and due to the CURRENT_TIMESTAMP(), there are two different values for this field. Thus, two different records.
I'm trying to perform a merge into a target table in our Snowflake instance where the source data contains change data with a field denoting the at source DML operation i.e I=Insert,U=Update,D=Delete.
The problem is dealing with the fact the log (deltas) source might contain multiple updates for the same record. The merge I've constructed bombs out complaining about duplicate keys.
I'm struggling to think of a solution without going the likes of GROUP BY and MAX on the updates. I've done a similar setup with Oracle and the AND clause on the MATCH was enough.
MERGE INTO "DB"."SCHEMA"."TABLE" t
USING (
SELECT * FROM "DB"."SCHEMA"."TABLE_LOG"
ORDER BY RECORD_TIMESTAMP ASC
) s ON t.RECORD_KEY = s.RECORD_KEY
WHEN MATCHED AND s.RECORD_OPERATION = 'D' THEN DELETE
WHEN MATCHED AND s.RECORD_OPERATION = 'U' THEN UPDATE
SET t.ID=COALESCE(s.ID,t.ID),
t.CREATED_AT=COALESCE(s.CREATED_AT,t.CREATED_AT),
t.PRODUCT=COALESCE(s.PRODUCT,t.PRODUCT),
t.SHOP_ID=COALESCE(s.SHOP_ID,t.SHOP_ID),
t.UPDATED_AT=COALESCE(s.UPDATED_AT,t.UPDATED_AT)
WHEN NOT MATCHED AND s.RECORD_OPERATION = 'I' THEN
INSERT (RECORD_KEY, ID, CREATED_AT, PRODUCT,
SHOP_ID, UPDATED_AT)
VALUES (s.RECORD_KEY, s.ID, s.CREATED_AT, s.PRODUCT,
s.SHOP_ID, s.UPDATED_AT);
Is there a way to rewrite the above merge so that it works as is?
The Snowflake docs show the ability for the AND case predicate during the match clause, it sounds like you tried this and it's not working because of the duplicates, right?
https://docs.snowflake.net/manuals/sql-reference/sql/merge.html#matchedclause-for-updates-or-deletes
There is even an example there which is using the AND command:
merge into t1 using t2 on t1.t1key = t2.t2key
when matched and t2.marked = 1 then delete
when matched and t2.isnewstatus = 1 then update set val = t2.newval, status = t2.newstatus
when matched then update set val = t2.newval
when not matched then insert (val, status) values (t2.newval, t2.newstatus);
I think you are going to have to get the "last record" per key and use that as your update, or process these serially which will be pretty slow...
Another thing to look at would be to try to see if you can apply the last_value( ) function to each column, where you order by your timestamp and partition over your key. If you do that in your inline view, that might work.
I hope this helps, I have a feeling it won't help much...Rich
UPDATE:
I found the following: https://docs.snowflake.net/manuals/sql-reference/parameters.html#error-on-nondeterministic-merge
If you run the following command before your merge, I think you'll be OK (testing required of course):
ALTER SESSION SET ERROR_ON_NONDETERMINISTIC_MERGE=false;
Is there any way to do some kind of "WITH...UPDATE" action on SQL?
For example:
WITH changes AS
(...)
UPDATE table
SET id = changes.target
FROM table INNER JOIN changes ON table.id = changes.base
WHERE table.id = changes.base;
Some context information: What I'm trying to do is to generate a base/target list from a table and then use it to change values in another table (changing values equal to base into target)
Thanks!
You can use merge, with the equivalent of your with clause as the using clause, but because you're updating the field you're joining on you need to do a bit more work; this:
merge into t42
using (
select 1 as base, 10 as target
from dual
) changes
on (t42.id = changes.base)
when matched then
update set t42.id = changes.target;
.. gives error:
ORA-38104: Columns referenced in the ON Clause cannot be updated: "T42"."ID"
Of course, it depends a bit what you're doing in the CTE, but as long as you can join to your table withint that to get the rowid you can use that for the on clause instead:
merge into t42
using (
select t42.id as base, t42.id * 10 as target, t42.rowid as r_id
from t42
where id in (1, 2)
) changes
on (t42.rowid = changes.r_id)
when matched then
update set t42.id = changes.target;
If I create my t42 table with an id column and have rows with values 1, 2 and 3, this will update the first two to 10 and 20, and leave the third one alone.
SQL Fiddle demo.
It doesn't have to be rowid, it can be a real column if it uniquely identifies the row; normally that would be an id, which would normally never change (as a primary key), you just can't use it and update it at the same time.
I've been trying for a few hours (probably more than I needed to) to figure out the best way to write an update sql query that will dissallow duplicates on the column I am updating.
Meaning, if TableA.ColA already has a name 'TEST1', then when I'm changing another record, then I simply can't pick a value for ColA to be 'TEST1'.
It's pretty easy to simply just separate the query into a select, and use a server layer code that would allow conditional logic:
SELECT ID, NAME FROM TABLEA WHERE NAME = 'TEST1'
IF TableA.recordcount > 0 then
UPDATE SET NAME = 'TEST1' WHERE ID = 1234
END IF
But I'm more interested to see if these two queries can be combined into a single query.
I am using Oracle to figure things out, but I'd love to see a SQL Server query as well. I figured a MERGE statement can work, but for obvious reasons you can't have the clause:
..etc.. WHEN NOT MATCHED UPDATE SET ..etc.. WHERE ID = 1234
AND you can't update a column if it's mentioned in the join (oracle limitation but not limited to SQL Server)
ALSO, I know you can put a constraint on a column that prevents duplicate values, but I'd be interested to see if there is such a query that can do this without using constraint.
Here is an example start-up attempt on my end just to see what I can come up with (explanations on it failed is not necessary):
ERROR: ORA-01732: data manipulation operation not legal on this view
UPDATE (
SELECT d.NAME, ch.NAME FROM (
SELECT 'test1' AS NAME, '2722' AS ID
FROM DUAL
) d
LEFT JOIN TABLEA a
ON UPPER(a.name) = UPPER(d.name)
)
SET a.name = 'test2'
WHERE a.name is null and a.id = d.id
I have tried merge, but just gave up thinking it's not possible. I've also considered not exists (but I'd have to be careful since I might accidentally update every other record that doesn't match a criteria)
It should be straightforward:
update personnel
set personnel_number = 'xyz'
where person_id = 1001
and not exists (select * from personnel where personnel_number = 'xyz');
If I understand correctly, you want to conditionally update a field, assuming the value is not found. The following query does this. It should work in both SQL Server and Oracle:
update table1
set name = 'Test1'
where (select count(*) from table1 where name = 'Test1') > 0 and
id = 1234
I use a cursor to iterate through quite a big table. For each row I check if value from one column exists in other.
If the value exists, I would like to increase value column in that other table.
If not, I would like to insert there new row with value set to 1.
I check "if exists" by:
IF (SELECT COUNT(*) FROM otherTabe WHERE... > 1)
BEGIN
...
END
ELSE
BEGIN
...
END
I don't know how to get that row which was found and update value. I don't want to make another select.
How can I do this efficiently?
I assume that the method of checking described above isn't good for this case.
Depending on the size of your data and the actual condition, you have two basic approaches:
1) use MERGE
MERGE TOP (...) INTO table1
USING table2 ON table1.column = table2.column
WHEN MATCHED
THEN UPDATE SET table1.counter += 1
WHEN NOT MATCHED SOURCE
THEN INSERT (...) VALUES (...);
the TOP is needed because when you're doing a huge update like this (you mention the table is 'big', big is relative, but lets assume truly big, +100MM rows) you have to batch the updates, otherwise you'll overwhelm the transaction log with one single gigantic transaction.
2) use a cursor, as you are trying. Your original question can be easily solved, simply always update and then check the count of rows updated:
UPDATE table
SET column += 1
WHERE ...;
IF ##ROW_COUNT = 0
BEGIN
-- no match, insert new value
INSERT INTO (...) VALUES (...);
END
Note that this approach is dangerous though because of race conditions: there is nothing to prevent another thread from inserting the value concurrently, so you may end up with either duplicates or a constraint violation error (preferably the latter...).
This is just psuedo code because I have no idea of your table structure but I think you will understand... basically Update the columns you want then Insert the columns you need. A Cursor operation sounds unnecessary.
Update OtherTable
Set ColumnToIncrease = ColumnToIncrease + 1
FROM CurrentTable Where ColumnToCheckValue is not null
Insert Into OtherTable (ColumnToIncrease, Field1, Field2,...)
SELECT
1,
?
?
FROM CurrentTable Where ColumnToCheckValue is not null
Without a sample, I think this is the best I can do. Bottom line: you don't need a cursor. UPDATE where a match exists (INNER JOIN) and INSERT where one does not.
UPDATE otherTable
SET IncrementingColumn = IncrementingColumn + 1
FROM thisTable INNER JOIN otherTable ON thisTable.ID = otherTable.ID
INSERT INTO otherTable
(
ID
, IncrementingColumn
)
SELECT ID, 1
FROM thisTable
WHERE NOT EXISTS (SELECT *
FROM otherTable
WHERE thisTable.ID = otherTable.ID)
I think you'd be better off using a view for this -- then it's always up to date, no risk of mistakenly double/triple/etc counting:
CREATE VIEW vw_value_count AS
SELECT st.value,
COUNT(*) AS numValue
FROM SOME_TABLE st
GROUP BY st.value
But if you still want to use the INSERT/UPDATE approach:
IF EXISTS(SELECT NULL
FROM SOMETABLE WHERE ... > 1)
BEGIN
UPDATE TABLE
SET count = count + 1
WHERE value = #value
END
ELSE
BEGIN
INSERT INTO TABLE
(value, count)
VALUES
(#value, 1)
END
What about Update statement with inner join to perform +1, and Insert selected rows that do not exist in the first table.
Provide the tables schema and the columns you want to check and update so I can help.
Regards.