SQL set operation for updating the latest record

I am facing a problem and can't find any solution to it. I have a source table (T) where I receive data from the field. The data may contain duplicate records with a time stamp. My objective is to take the field data and store it in a final table (F) having the same structure.
Before inserting, I check whether the key field already exists in F. If yes, I update the record in F with the latest one from T; otherwise I insert the record into F from T. This works fine as long as there is no duplicate record in T. In case T has two records with the same key but different time stamps, it always inserts both records (and if the key is a primary key, the insert operation fails). I am using the following code for the operation:
IF EXISTS(SELECT * FROM [Final_Table] F, TMP_Source T WHERE T.IKEy =F.IKEY)
begin
print 'Update'
UPDATE [Final_Table]
SET [FULLNAME] = T.FULLNAME
,[FATHERNAME] = T.FATHERNAME
,[MOTHERNAME] = T.MOTHERNAME
,[SPOUSENAME] = T.SPOUSENAME
from TMP_Source T
WHERE Final_Table.IKEy = T.IKEy
and [Final_Table].[RCRD_CRN_DATE] < T.RCRD_CRN_DATE
--Print 'Update'
end
else
begin
INSERT INTO [Final_Table]
([IKEy],[FTIN],[FULLNAME],[FATHERNAME],[MOTHERNAME],[SPOUSENAME]
)
Select IKEy,FTIN,FULLNAME,FATHERNAME,MOTHERNAME,SPOUSENAME
from TMP_Source
end
The problem comes when my T table has entries like:
IKey RCRD_CRN_DATE ...
123 10-11-2013-12.20.30
123 10-11-2013-12.20.35
345 10-11-2013-01.10.10
All three are inserted in the F table.
Please help.

Remove all but the latest row per key as a first step (well, in a CTE) using ROW_NUMBER() before attempting the insert:
;WITH UniqueRows AS (
SELECT IKey, RCRD_CRN_DATE, FULLNAME, FATHERNAME, MOTHERNAME, SPOUSENAME, FTIN,
ROW_NUMBER() OVER (PARTITION BY IKey ORDER BY RCRD_CRN_DATE desc) as rn
FROM TMP_Source
)
MERGE INTO Final_Table t
USING (SELECT * FROM UniqueRows WHERE rn = 1) s
ON t.IKey = s.IKey
WHEN MATCHED THEN UPDATE
SET [FULLNAME] = s.FULLNAME
,[FATHERNAME] = s.FATHERNAME
,[MOTHERNAME] = s.MOTHERNAME
,[SPOUSENAME] = s.SPOUSENAME
WHEN NOT MATCHED THEN INSERT
([IKEy],[FTIN],[FULLNAME],[FATHERNAME],[MOTHERNAME],[SPOUSENAME]) VALUES
(s.IKEy,s.FTIN,s.FULLNAME,s.FATHERNAME,s.MOTHERNAME,s.SPOUSENAME);
(I may not have all the columns entirely correct, they seem to keep switching around in your question)
(As you may have noticed, I've also switched to using MERGE since it allows us to express everything as a single declarative statement rather than writing procedural code)
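Given the sample rows from the question, you can sanity-check what the CTE keeps before running the MERGE; only the rn = 1 rows feed the merge:

```sql
-- Preview: which row survives per IKey (latest RCRD_CRN_DATE wins)
SELECT IKey, RCRD_CRN_DATE,
       ROW_NUMBER() OVER (PARTITION BY IKey
                          ORDER BY RCRD_CRN_DATE DESC) AS rn
FROM TMP_Source;
-- For IKey 123 the 12.20.35 record gets rn = 1; the 12.20.30 record
-- gets rn = 2 and is filtered out by WHERE rn = 1.
```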


Conditional column in SQL Server based on another column

I'm looking for a solution to maintain a version column on my table based on another column.
I have a column "Document No" in my table. Every time I insert a new row with the same document no, I would like to increment the version column.
I know I can do it in the back-end, but that means I first have to read the table and then insert. My idea is to optimize performance and leave it to SQL Server.
Is it possible?
pk DocNo Version
---------------------
1 ABC 0
2 CBD 0
3 ABC 1
4 FGH 0
5 ABC 2
Assuming that you can parameterize your query (as in a stored procedure), AND your primary key is set to IDENTITY, you can use something along the lines of:
INSERT INTO TableA (DocNo, Version)
(SELECT TOP 1 'XYZ',ISNULL(MAX(Version)+1,0)
FROM TableA WHERE DocNo = 'XYZ')
I used 'XYZ' where you would place your parameter like:
INSERT INTO TableA (DocNo, Version)
(SELECT TOP 1 @DocNo, ISNULL(MAX(Version)+1, 0)
FROM TableA WHERE DocNo = @DocNo)
Stored Procedure Solution
CREATE PROCEDURE tableUpsert (@DocNo varchar(100))
AS
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN
IF EXISTS(SELECT * FROM dbo.YourTable WITH (UPDLOCK) WHERE DocNo = @DocNo)
UPDATE dbo.YourTable
SET Version = Version + 1
WHERE DocNo = @DocNo;
ELSE
INSERT dbo.YourTable(DocNo, Version)
VALUES(@DocNo, 1);
COMMIT
The code is pretty self-explanatory: if the record exists, you update it by incrementing the Version column; if it doesn't, you insert a new record with a default Version of 1. Note the use of UPDLOCK to ensure that only your specific process is updating the record at that moment.
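A hypothetical call sequence for the procedure sketched above:

```sql
EXEC tableUpsert @DocNo = 'ABC';  -- no row yet: inserts ABC, Version 1
EXEC tableUpsert @DocNo = 'ABC';  -- row exists: bumps ABC to Version 2
```

Note this keeps a single row per DocNo and bumps it in place, unlike the question's sample data, which stores one row per version.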
You can use an insert trigger. In the trigger, update Version by getting the last version for the same DocNo and incrementing it by 1:
update t
set Version = isnull(v.Version, 0) + 1
from inserted i
inner join mytable t on i.pk = t.pk
outer apply
(
select Version = max(Version)
from mytable x
where x.DocNo = i.DocNo
) v
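For completeness, a sketch of that statement wrapped in a full AFTER INSERT trigger. Table and column names (mytable, pk, DocNo, Version) are assumptions taken from the snippet; two adjustments are labeled in the comments: the MAX excludes the rows just inserted, and the fallback is -1 so that numbering starts at 0 like the sample data.

```sql
CREATE TRIGGER trg_mytable_version
ON mytable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE t
    SET Version = ISNULL(v.Version, -1) + 1   -- -1 + 1 = 0 for the first DocNo row
    FROM inserted i
    INNER JOIN mytable t ON i.pk = t.pk
    OUTER APPLY
    (
        -- highest existing version for this DocNo,
        -- excluding the row(s) being inserted right now
        SELECT Version = MAX(x.Version)
        FROM mytable x
        WHERE x.DocNo = i.DocNo
          AND x.pk <> i.pk
    ) v;
END
```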
Your version number is implicit in your data. Use the PK to derive it when you retrieve the data (or put the query in a view):
SELECT DocNo, ROW_NUMBER() OVER (PARTITION BY DocNo ORDER BY pk) AS version
FROM YourTable
ORDER BY DocNo
Relying on IDENTITY may give you gaps.
Relying on MAX(x)+1 may not always work, depending on your concurrency model.
Locking the table/column will introduce concurrency issues (which may be unimportant or trivial in your case).
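A sketch of the derived-version idea as a view. The table name, PK, and view name are assumptions; the `- 1` is an adjustment to make the numbering 0-based like the sample data:

```sql
CREATE VIEW VersionedDocs AS
SELECT pk,
       DocNo,
       -- position of this row among rows sharing its DocNo, in PK order
       ROW_NUMBER() OVER (PARTITION BY DocNo ORDER BY pk) - 1 AS Version
FROM YourTable;
```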

Merging deltas with duplicate keys

I'm trying to perform a merge into a target table in our Snowflake instance. The source data contains change data with a field denoting the at-source DML operation, i.e. I=Insert, U=Update, D=Delete.
The problem is that the log (deltas) source might contain multiple updates for the same record. The merge I've constructed bombs out complaining about duplicate keys.
I'm struggling to think of a solution that doesn't resort to the likes of GROUP BY and MAX on the updates. I've done a similar setup with Oracle, where the AND clause on the MATCH was enough.
MERGE INTO "DB"."SCHEMA"."TABLE" t
USING (
SELECT * FROM "DB"."SCHEMA"."TABLE_LOG"
ORDER BY RECORD_TIMESTAMP ASC
) s ON t.RECORD_KEY = s.RECORD_KEY
WHEN MATCHED AND s.RECORD_OPERATION = 'D' THEN DELETE
WHEN MATCHED AND s.RECORD_OPERATION = 'U' THEN UPDATE
SET t.ID=COALESCE(s.ID,t.ID),
t.CREATED_AT=COALESCE(s.CREATED_AT,t.CREATED_AT),
t.PRODUCT=COALESCE(s.PRODUCT,t.PRODUCT),
t.SHOP_ID=COALESCE(s.SHOP_ID,t.SHOP_ID),
t.UPDATED_AT=COALESCE(s.UPDATED_AT,t.UPDATED_AT)
WHEN NOT MATCHED AND s.RECORD_OPERATION = 'I' THEN
INSERT (RECORD_KEY, ID, CREATED_AT, PRODUCT,
SHOP_ID, UPDATED_AT)
VALUES (s.RECORD_KEY, s.ID, s.CREATED_AT, s.PRODUCT,
s.SHOP_ID, s.UPDATED_AT);
Is there a way to rewrite the above merge so that it works as is?
The Snowflake docs show the ability to add an AND case predicate to the match clause; it sounds like you tried this and it's not working because of the duplicates, right?
https://docs.snowflake.net/manuals/sql-reference/sql/merge.html#matchedclause-for-updates-or-deletes
There is even an example there which is using the AND command:
merge into t1 using t2 on t1.t1key = t2.t2key
when matched and t2.marked = 1 then delete
when matched and t2.isnewstatus = 1 then update set val = t2.newval, status = t2.newstatus
when matched then update set val = t2.newval
when not matched then insert (val, status) values (t2.newval, t2.newstatus);
I think you are going to have to get the "last record" per key and use that as your update, or process these serially, which will be pretty slow...
Another thing to look at would be applying the LAST_VALUE() window function to each column, ordering by your timestamp and partitioning over your key. If you do that in your inline view, that might work.
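A sketch of the "last record per key" idea, using Snowflake's QUALIFY clause to keep only the newest log row per RECORD_KEY before the merge (names taken from the question; column list abbreviated, untested):

```sql
MERGE INTO "DB"."SCHEMA"."TABLE" t
USING (
    SELECT *
    FROM "DB"."SCHEMA"."TABLE_LOG"
    -- keep only the most recent delta per key
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY RECORD_KEY
        ORDER BY RECORD_TIMESTAMP DESC) = 1
) s
ON t.RECORD_KEY = s.RECORD_KEY
WHEN MATCHED AND s.RECORD_OPERATION = 'D' THEN DELETE
WHEN MATCHED AND s.RECORD_OPERATION = 'U' THEN UPDATE
    SET t.PRODUCT    = COALESCE(s.PRODUCT, t.PRODUCT),
        t.UPDATED_AT = COALESCE(s.UPDATED_AT, t.UPDATED_AT)
WHEN NOT MATCHED AND s.RECORD_OPERATION = 'I' THEN
    INSERT (RECORD_KEY, PRODUCT, UPDATED_AT)
    VALUES (s.RECORD_KEY, s.PRODUCT, s.UPDATED_AT);
```

With at most one source row per key, the merge is deterministic and the duplicate-key error should go away.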
I hope this helps, I have a feeling it won't help much...Rich
UPDATE:
I found the following: https://docs.snowflake.net/manuals/sql-reference/parameters.html#error-on-nondeterministic-merge
If you run the following command before your merge, I think you'll be OK (testing required of course):
ALTER SESSION SET ERROR_ON_NONDETERMINISTIC_MERGE=false;

Update columns in DB2 using randomly chosen static values provided at runtime

I would like to update rows with values chosen randomly from a set of possible values.
Ideally I would be able to provide these values at runtime, using JdbcTemplate from a Java application.
Example:
In a table, the column "name" can contain any name. The goal is to run through the table and change all names to equal either "Bob" or "Alice".
I know this can be done by creating a SQL function. I tested that and it was fine, but I wonder if it is possible to just use a simple query?
This will not work; it seems the value is computed once and applied to all rows:
UPDATE test.table
SET first_name =
(SELECT a.name
FROM
(SELECT a.name, RAND() idx
FROM (VALUES('Alice'), ('Bob')) AS a(name) ORDER BY idx FETCH FIRST 1 ROW ONLY) as a)
;
I tried using MERGE INTO, but it won't even run (possible_names is not found in the SET query). I have yet to figure out why:
MERGE INTO test.table
USING
(SELECT
names.fname
FROM
(VALUES('Alice'), ('Bob'), ('Rob')) AS names(fname)) AS possible_names
ON ( test.table.first_name IS NOT NULL )
WHEN MATCHED THEN
UPDATE SET
-- select random name
first_name = (SELECT fname FROM possible_names ORDER BY idx FETCH FIRST 1 ROW ONLY)
;
EDIT: If possible, I would like to only focus on fields being updated and not depend on knowing primary keys and such.
Db2 seems to be optimizing away the subselect that returns your supposedly random name, materializing it only once, hence all rows in the target table receive the same value.
To force subselect execution for each row you need to somehow correlate it to the table being updated, for example:
UPDATE test.table
SET first_name =
(SELECT a.name
FROM (VALUES('Alice'), ('Bob')) AS a(name)
ORDER BY RAND(ASCII(SUBSTR(first_name, 1, 1)))
FETCH FIRST 1 ROW ONLY)
or may be even
UPDATE test.table
SET first_name =
(SELECT a.name
FROM (VALUES('Alice'), ('Bob')) AS a(name)
ORDER BY first_name, RAND()
FETCH FIRST 1 ROW ONLY)
Now that the result of subselect seems to depend on the value of the corresponding row in the target table, there's no choice but to execute it for each row.
If your table has a primary key, this would work. I've assumed the PK is column id.
UPDATE test.table t
SET first_name =
( SELECT name from
( SELECT *, ROW_NUMBER() OVER(PARTITION BY id ORDER BY R) AS RN FROM
( SELECT *, RAND() R
FROM test.table, TABLE(VALUES('Alice'), ('Bob')) AS d(name)
)
)
AS u
WHERE t.id = u.id and rn = 1
)
;
There might be a nicer/more efficient solution, but I'll leave that to others.
FYI I used the following DDL and data to test the above.
create table test.table(id int not null primary key, first_name varchar(32));
insert into test.table values (1,'Flo'),(2,'Fred'),(3,'Sue'),(4,'John'),(5,'Jim');

Need to update data from another database using db link

I have a table A with null dates (CREATED_ON_DT) in the BI database. I need to update those nulls with the right dates from the AFLDEV DB using a DB link, mtl_system_items_b@afldev. The common key is inventory_item_id in AFLDEV and integration_id in the BI DB. I have framed the following query but it does not work:
UPDATE w_product_d
SET w_product_d.CREATED_ON_DT = (SELECT min(creation_date)
FROM mtl_system_items_b@afldev B
where to_char(B.inventory_item_id)=w_product_d.integration_id
and B.organization_id = '102'
AND w_product_d.CREATED_ON_DT IS NULL
and w_product_d.integration_id in (SELECT T.integration_id
FROM (SELECT * FROM w_product_d ORDER BY w_product_d.integration_id )T
WHERE T.CREATED_ON_DT IS NULL)
);
If I run this query it updates all the dates to nulls, but I need the opposite to happen, i.e. replace the nulls with the right dates.
Please help me out with this! I am doing this in SQL Developer against an Oracle DB.
I think you've gotten all tied up between the rows you're updating and the rows you're using to update the column values with.
If you think about it, you're wanting to update rows in your w_product_d table where the created_on_dt is null, which means that your update statement will have a basic structure of:
update w_product_d wpd
set ...
where wpd.created_on_dt is null;
Once you have that, it's easy then to slot in the column you're updating and what you're updating it with:
update w_product_d wpd
set wpd.created_on_dt = (select min(creation_date)
from mtl_system_items_b@afldev b
where to_char(b.inventory_item_id) = wpd.integration_id)
where wpd.created_on_dt is null;

SQL pivoted table is read-only and cells can't be edited?

If I create a VIEW using this pivot table query, it isn't editable. The cells are read-only and give me the SQL2005 error: "No row was updated. The data in row 2 was not committed. Update or insert of view or function 'VIEWNAME' failed because it contains a derived or constant field."
Any ideas on how this could be solved OR is a pivot like this just never editable?
SELECT n_id,
MAX(CASE field WHEN 'fId' THEN c_metadata_value ELSE ' ' END) AS fId,
MAX(CASE field WHEN 'sID' THEN c_metadata_value ELSE ' ' END) AS sID,
MAX(CASE field WHEN 'NUMBER' THEN c_metadata_value ELSE ' ' END) AS NUMBER
FROM metadata
GROUP BY n_id
Assuming you have a unique constraint on (n_id, field), which means that at most one row can match, you can (in theory at least) use an INSTEAD OF trigger.
This would be easier with MERGE (but that is not available until SQL Server 2008), as you need to cover UPDATEs of existing data, INSERTs (where a NULL value is set to a non-NULL one), and DELETEs (where a non-NULL value is set to NULL).
One thing you would need to consider here is how to cope with UPDATEs that set all of the columns in a row to NULL. I did this while testing the code below and was quite confused for a minute or two until I realised it had deleted all the rows in the base table for that n_id (which meant the operation was not reversible via another UPDATE statement). This issue could be avoided by having the VIEW definition OUTER JOIN onto whatever table n_id is the PK of.
An example of the type of thing is below. You would also need to consider potential race conditions in the INSERT/DELETE code indicated, and whether you need some additional locking hints in there.
CREATE TRIGGER trig
ON pivoted
INSTEAD OF UPDATE
AS
BEGIN
SET nocount ON;
DECLARE @unpivoted TABLE (
n_id INT,
field VARCHAR(10),
c_metadata_value VARCHAR(10))
INSERT INTO @unpivoted (n_id, field, c_metadata_value)
SELECT n_id, col, data
FROM inserted UNPIVOT (data FOR col IN (fid, sid, [NUMBER]) ) AS unpvt
WHERE data IS NOT NULL
UPDATE m
SET m.c_metadata_value = u.c_metadata_value
FROM metadata m
JOIN @unpivoted u
ON u.n_id = m.n_id
AND u.field = m.field;
/*You need to consider race conditions below*/
DELETE FROM metadata
WHERE NOT EXISTS(SELECT *
FROM @unpivoted u
WHERE metadata.n_id = u.n_id
AND u.field = metadata.field)
INSERT INTO metadata
SELECT u.n_id,
u.field,
u.c_metadata_value
FROM @unpivoted u
WHERE NOT EXISTS (SELECT *
FROM metadata m
WHERE m.n_id = u.n_id
AND u.field = m.field)
END
You'll have to create a trigger on the view, because a direct update is not possible:
CREATE TRIGGER TrMyViewUpdate on MyView
INSTEAD OF UPDATE
AS
BEGIN
SET NOCOUNT ON;
UPDATE MyTable
SET ...
FROM INSERTED...
END