SQL Merge Statement not working with Batch Data - sql

I have created a stored procedure using MERGE statement as follows.
MERGE Tasks S
USING (SELECT id, name, field1, field2, field3 FROM Tasks_Temp) SS
ON S.id=SS.id
WHEN MATCHED
THEN UPDATE SET S.name=SS.name, S.field1=SS.field1, S.field2=SS.field2, S.field3=SS.field3
WHEN NOT MATCHED
THEN INSERT (id, name, field1, field2, field3) VALUES (SS.id, SS.name, SS.field1, SS.field2, SS.field3)
Tasks_Temp:
id | name  | field1    | field2 | field3
---+-------+-----------+--------+--------
1  | Task1 | NULL      | NULL   | STARTED
2  | Task2 | SUBMITTED | NULL   | STARTED
1  | Task1 | SUBMITTED | NULL   | STARTED
3  | Task3 | NULL      | NULL   | STARTED
1  | Task1 | APPROVED  | NULL   | STARTED
2  | Task2 | APPROVED  | NULL   | STARTED
Tasks (required):
id | name  | field1   | field2 | field3
---+-------+----------+--------+--------
1  | Task1 | APPROVED | NULL   | STARTED
2  | Task2 | APPROVED | NULL   | STARTED
3  | Task3 | NULL     | NULL   | STARTED
Tasks_Temp starts out empty and is then filled with a sequence of rows like the ones shown, where each id can appear several times as field1, field2 or field3 change. On the first run, the MERGE statement inserts every row into Tasks, duplicate ids included, so Tasks ends up as an exact copy of Tasks_Temp.
On the next run it fails with an error saying
Trying to update multiple records not allowed
Is MERGE unable to handle batch data like this? Even if I set aside picking the latest value for each field, MERGE still inserts as many rows as there are in Tasks_Temp. If it can't handle this, what are some better approaches?
I suspect that even if MERGE worked, it wouldn't necessarily take the most recent field values for an id.
If I had a timestamp column, how could I insert only the most recent record per id from Tasks_Temp into Tasks?

CREATE TABLE Tasks_Temp(
id INTEGER NOT NULL
,name VARCHAR(100) NOT NULL
,field1 VARCHAR(100)
,field2 VARCHAR(100)
,field3 VARCHAR(100) NOT NULL
);
INSERT INTO Tasks_Temp
(id,name,field1,field2,field3) VALUES
(1,'Task1',NULL,NULL,'STARTED'),
(2,'Task2','SUBMITTED',NULL,'STARTED'),
(1,'Task1','SUBMITTED',NULL,'STARTED'),
(3,'Task3',NULL,NULL,'STARTED'),
(1,'Task1','APPROVED',NULL,'STARTED'),
(2,'Task2','APPROVED',NULL,'STARTED');
As @Larnu said, you need a criterion for choosing among rows that share the same id. From my perspective it should be APPROVED > SUBMITTED > NULL for field1,
so I use ROW_NUMBER ordered alphabetically (with NULL mapped to 'xxx' so that it sorts last) and keep the best row per task from Tasks_Temp as follows:
SELECT id,
       name,
       field1,
       field2,
       field3
FROM   (SELECT id,
               name,
               field1,
               ROW_NUMBER()
                 OVER (
                   PARTITION BY name
                   ORDER BY IIF(field1 IS NULL, 'xxx', field1)) stats,
               field2,
               field3
        FROM   Tasks_Temp) a
WHERE  stats = 1
That gives the following table:
id | name  | field1   | field2 | field3
---+-------+----------+--------+--------
1  | Task1 | APPROVED | null   | STARTED
2  | Task2 | APPROVED | null   | STARTED
3  | Task3 | null     | null   | STARTED
Use the query above as the source subquery of the MERGE, so that each task reaches the MERGE only once:
MERGE Tasks S
USING (SELECT id,
              name,
              field1,
              field2,
              field3
       FROM   (SELECT id,
                      name,
                      field1,
                      ROW_NUMBER()
                        OVER (
                          PARTITION BY name
                          ORDER BY IIF(field1 IS NULL, 'xxx', field1)) stats,
                      field2,
                      field3
               FROM   Tasks_Temp) a
       WHERE  stats = 1) SS
ON S.id = SS.id
WHEN MATCHED
THEN UPDATE SET S.name = SS.name, S.field1 = SS.field1, S.field2 = SS.field2, S.field3 = SS.field3
WHEN NOT MATCHED
THEN INSERT (id, name, field1, field2, field3) VALUES (SS.id, SS.name, SS.field1, SS.field2, SS.field3);
dbfiddle
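For the timestamp variant asked about in the question, the same idea can be sketched by ordering on the timestamp instead of on field1. This is only an illustration under stated assumptions: the column updated_at and its values are hypothetical (the posted table has no timestamp), and the sketch runs on SQLite through Python's sqlite3 module, using SQLite's INSERT ... ON CONFLICT in place of MERGE. It needs a SQLite build with window functions and upsert support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tasks (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                    field1 TEXT, field2 TEXT, field3 TEXT NOT NULL);
-- updated_at is a hypothetical timestamp column, not part of the posted schema
CREATE TABLE Tasks_Temp (id INTEGER NOT NULL, name TEXT NOT NULL,
                         field1 TEXT, field2 TEXT, field3 TEXT NOT NULL,
                         updated_at TEXT NOT NULL);
INSERT INTO Tasks_Temp VALUES
 (1,'Task1',NULL,NULL,'STARTED','2024-01-01 10:00'),
 (2,'Task2','SUBMITTED',NULL,'STARTED','2024-01-01 10:01'),
 (1,'Task1','SUBMITTED',NULL,'STARTED','2024-01-01 10:02'),
 (3,'Task3',NULL,NULL,'STARTED','2024-01-01 10:03'),
 (1,'Task1','APPROVED',NULL,'STARTED','2024-01-01 10:04'),
 (2,'Task2','APPROVED',NULL,'STARTED','2024-01-01 10:05');
""")

# Keep only the newest row per id, then insert it or update the existing row.
UPSERT_LATEST = """
INSERT INTO Tasks (id, name, field1, field2, field3)
SELECT id, name, field1, field2, field3
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
      FROM Tasks_Temp t)
WHERE rn = 1
ON CONFLICT(id) DO UPDATE SET
    name = excluded.name, field1 = excluded.field1,
    field2 = excluded.field2, field3 = excluded.field3
"""

conn.execute(UPSERT_LATEST)  # first run: inserts one row per id
conn.execute(UPSERT_LATEST)  # second run: updates in place, no duplicates, no error
rows = conn.execute(
    "SELECT id, name, field1, field2, field3 FROM Tasks ORDER BY id"
).fetchall()
print(rows)
```

Because the source is reduced to one row per id before the write, running the statement repeatedly never presents the target with two source rows for the same id, which is the situation that made the original MERGE fail.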

Related

Skip Update of Table when no records present-SQL

I want to migrate data from an old table (table_old) to a new table (table_new).
table_old has the following columns.
recordseq
startdate
enddate
field1
field2
field3
recordnum
table_new has the following columns.
recordseq
startdate
enddate
field1
field2
field3
field4
field5
field6
recordnum
recordnum_old
Each recordseq can have multiple records, distinguished by recordnum. For example:
If a particular recordseq has two records, recordnum will hold 1 and 2.
The requirement is that, during migration, every two records in the old table end up in one record in the new table.
Here, data migration covers field1, field2, field3 and recordnum.
For example, if a particular seq has 4 records in the old table, the first record goes into field1, field2 and field3 of the 1st row of the new table, and the 2nd record goes into field4, field5 and field6 of that same row.
Similarly, the 3rd record in the old table goes into field1, field2 and field3 of the 2nd row of the new table, and the 4th record goes into field4, field5 and field6 of that 2nd row.
Some recordseqs may have only one record. Different seqs may have different numbers of records.
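The pairing rule described above can be sketched in Python to make the mapping explicit. This is an illustration, not the migration code: it assumes recordnum starts at 1 within each recordseq, the helper names target_slot and pair_records are hypothetical, and startdate and enddate are left out for brevity.

```python
def target_slot(old_recordnum):
    """Map a 1-based old recordnum to (new recordnum, target column group)."""
    new_recordnum = (old_recordnum + 1) // 2  # 1,2 -> 1; 3,4 -> 2; ...
    if old_recordnum % 2 == 1:
        columns = ("field1", "field2", "field3")  # odd records fill the first half
    else:
        columns = ("field4", "field5", "field6")  # even records fill the second half
    return new_recordnum, columns


def pair_records(old_rows):
    """Fold every two old rows of a recordseq into one new row."""
    merged = {}
    for row in sorted(old_rows, key=lambda r: (r["recordseq"], r["recordnum"])):
        new_recordnum, columns = target_slot(row["recordnum"])
        key = (row["recordseq"], new_recordnum)
        target = merged.setdefault(
            key, {"recordseq": row["recordseq"], "recordnum": new_recordnum}
        )
        for column, source in zip(columns, ("field1", "field2", "field3")):
            target[column] = row[source]
    return list(merged.values())


old_rows = [
    {"recordseq": "A", "recordnum": 1, "field1": "a1", "field2": "a2", "field3": "a3"},
    {"recordseq": "A", "recordnum": 2, "field1": "b1", "field2": "b2", "field3": "b3"},
    {"recordseq": "A", "recordnum": 3, "field1": "c1", "field2": "c2", "field3": "c3"},
    {"recordseq": "B", "recordnum": 1, "field1": "d1", "field2": "d2", "field3": "d3"},
]
new_rows = pair_records(old_rows)
for new_row in new_rows:
    print(new_row)
```

Only old rows that actually exist contribute values, so a recordseq with a single record yields one new row with field1 to field3 set and nothing else touched.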
In the new table, recordnum_old holds the recordnum value from the old table, and recordnum holds the new record number after migration.
So, as a first step, we inserted the recordseq, startdate, enddate and recordnum_old data,
with recordnum set to a constant such as 999, to be updated later to the actual recordnum.
After this we wrote the code below to migrate the remaining data, i.e. field1, field2 and field3, as explained above.
Do
Begin
DECLARE MAX_REC INT;
DECLARE REC_IDX INT;
Select max(recordnum) into MAX_REC FROM table_old;
REC_IDX := 0;
while (REC_IDX <= MAX_REC)
Do
IF (mod(REC_IDX, 2) = 0)
Then
-- even REC_IDX: fill field1..field3
Update table_new a set (a.recordnum, a.field1, a.field2, a.field3)
= (select cast(REC_IDX/2 as int), b.field1, b.field2, b.field3 from table_old b Where
a.recordseq = b.recordseq and a.startdate = b.startdate and a.enddate = b.enddate and
b.recordnum = REC_IDX) from table_new a where a.recordnum_old = cast(REC_IDX/2 as int);
ELSE
-- odd REC_IDX: fill field4..field6
Update table_new a set (a.recordnum, a.field4, a.field5, a.field6)
= (select cast(REC_IDX/2 as int), b.field1, b.field2, b.field3 from table_old b Where
a.recordseq = b.recordseq and a.startdate = b.startdate and a.enddate = b.enddate and
b.recordnum = REC_IDX) from table_new a where a.recordnum_old = cast(REC_IDX/2 as int);
END IF;
REC_IDX := REC_IDX + 1;
END WHILE;
End;
Once execution is finished, the new records can be filtered out by checking for records which don't have recordnum 999.
The main issue with this code arises when a particular seq has only one record and the max record number in the old table is 5.
On the first iteration, the first row of the old table is written to the first row of the new table with recordnum 1; on the 2nd iteration, recordnum is overwritten with NULL because there are no more records for this recordseq.
So I don't want to update the new table for a seq which has no more records in the old table while iterating through the while loop.
Can you please help me achieve this?

Will order by preserve?

create table source_table (id number);
insert into source_table values(3);
insert into source_table values(1);
insert into source_table values(2);
create table target_table (id number, seq_val number);
create sequence example_sequence;
insert into target_table
select id, example_sequence.nextval
from
    (select id from source_table order by id);
Is it officially guaranteed that ids with lower values in source_table will also get lower sequence values when inserted into target_table? In other words, is it guaranteed that the ordering given by the order by clause is preserved when inserting?
EDIT
The question is not: 'Are rows ordered in a table as such?' but rather 'Can we rely on the order by clause used in the subquery when inserting?'.
To even more closely illustrate this, the contents of the target table in the above example, after running the query like select * from target_table order by id would be:
ID | SEQ_VAL
1 1
2 2
3 3
Moreover, if i specified descending ordering when inserting like this:
insert into target_table
select id, example_sequence.nextval
from
    (select id from source_table order by id DESC);
The output of the same query from above would be:
ID | SEQ_VAL
1 3
2 2
3 1
Of that I'm sure, I have tested it multiple times. My question is 'Can I always rely on this ordering?'
Tables in a relational database are not ordered, and any apparent ordering in the result set of a cursor which lacks an ORDER BY is an artifact of data storage, is not guaranteed, and later actions on the table may cause this apparent ordering to change. If you want the results of a cursor to be ordered in a particular manner you MUST use an ORDER BY.
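If the goal is simply that lower ids receive lower numbers, one way to avoid depending on the order in which rows reach the insert is to compute the number with ROW_NUMBER() OVER (ORDER BY id), where the ordering is part of the expression itself. The sketch below swaps the sequence for ROW_NUMBER, which is not what the question uses, and runs on SQLite through Python's sqlite3 module purely for illustration; Oracle supports the same analytic function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_table (id INTEGER);
INSERT INTO source_table VALUES (3);
INSERT INTO source_table VALUES (1);
INSERT INTO source_table VALUES (2);
CREATE TABLE target_table (id INTEGER, seq_val INTEGER);

-- The numbering is defined by the window's ORDER BY,
-- not by the order in which rows happen to be read or inserted.
INSERT INTO target_table (id, seq_val)
SELECT id, ROW_NUMBER() OVER (ORDER BY id)
FROM source_table;
""")

rows = conn.execute("SELECT id, seq_val FROM target_table ORDER BY id").fetchall()
print(rows)
```

Note that this numbers the rows of one statement starting from 1; it does not continue from a previous run the way a sequence does.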

Return rows of a table that actually changed in an UPDATE

Using Postgres, I can perform an update statement and return the rows affected by the command.
UPDATE accounts
SET status = merge_accounts.status,
field1 = merge_accounts.field1,
field2 = merge_accounts.field2,
etc.
FROM merge_accounts WHERE merge_accounts.uid =accounts.uid
RETURNING accounts.*
This will give me a list of all records that matched the WHERE clause, however will not tell me which rows were actually updated by the operation.
In this simplified use-case it would of course be trivial to add another guard such as AND status != 'Closed'. However, my real-world use-case involves updating potentially dozens of fields from a merge table with 10,000+ rows, and I want to be able to detect which rows were actually changed and which are identical to their previous version. (The expectation is that very few rows will actually have changed.)
The best I've got so far is
UPDATE accounts
SET x=..., y=...
FROM accounts as old WHERE old.uid = accounts.uid
FROM merge_accounts WHERE merge_accounts.uid = accounts.uid
RETURNING accounts, old
Which will return a tuple of old and new rows that can then be diff'ed inside my Java codebase itself - however this requires significant additional network traffic and is potentially error prone.
The ideal scenario is to be able to have postgres return just the rows that actually had any values changed - is this possible?
Here on github is a more real world example of what I'm doing, incorporating some of the suggestions so far.
Using Postgres 9.1, but can use 9.4 if required. The requirements are effectively
Be able to perform an upsert of new data
Where we may only know the specific key/value pair to update on any given row
Get back a result containing just the rows that were actually changed by the upsert
Bonus - get a copy of the old records as well.
Since this question was opened I've gotten most of this working now, although I'm unsure if my approach is a good idea or not - it's a bit hacked together.
Only update rows that actually change
That saves expensive updates and expensive checks after the UPDATE.
To update every column with the new value provided (if anything changes):
UPDATE accounts a
SET (status, field1, field2) -- short syntax for ..
= (m.status, m.field1, m.field2) -- .. updating multiple columns
FROM merge_accounts m
WHERE m.uid = a.uid
AND (a.status IS DISTINCT FROM m.status OR
a.field1 IS DISTINCT FROM m.field1 OR
a.field2 IS DISTINCT FROM m.field2)
RETURNING a.*;
Due to PostgreSQL's MVCC model any change to a row writes a new row version. Updating a single column is almost as expensive as updating every column in the row at once. Rewriting the rest of the row comes at practically no cost, as soon as you have to update anything.
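The effect of the null-safe guard can be seen in a small runnable sketch. It uses SQLite through Python's sqlite3 module rather than PostgreSQL, so two things differ from the statement above: SQLite's IS NOT operator stands in for IS DISTINCT FROM, and the update is written with correlated subqueries instead of UPDATE ... FROM ... RETURNING. The sample rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (uid INTEGER PRIMARY KEY, status TEXT, field1 TEXT);
CREATE TABLE merge_accounts (uid INTEGER PRIMARY KEY, status TEXT, field1 TEXT);
INSERT INTO accounts VALUES (1, 'Open', NULL), (2, 'Open', 'a'), (3, 'Closed', 'b');
INSERT INTO merge_accounts VALUES (1, 'Open', NULL), (2, 'Closed', 'a'), (3, 'Closed', NULL);
""")

# Rows that really differ. IS NOT treats NULL as a comparable value,
# so NULL vs NULL counts as "same" and 'b' vs NULL counts as "different".
changed = [uid for (uid,) in conn.execute("""
    SELECT a.uid
    FROM accounts a JOIN merge_accounts m ON m.uid = a.uid
    WHERE a.status IS NOT m.status OR a.field1 IS NOT m.field1
    ORDER BY a.uid
""")]

# Update only those rows; unchanged rows are never rewritten.
cur = conn.execute("""
    UPDATE accounts
    SET status = (SELECT m.status FROM merge_accounts m WHERE m.uid = accounts.uid),
        field1 = (SELECT m.field1 FROM merge_accounts m WHERE m.uid = accounts.uid)
    WHERE EXISTS (SELECT 1 FROM merge_accounts m
                  WHERE m.uid = accounts.uid
                    AND (accounts.status IS NOT m.status
                         OR accounts.field1 IS NOT m.field1))
""")
updated = cur.rowcount
final = conn.execute("SELECT uid, status, field1 FROM accounts ORDER BY uid").fetchall()
print(changed, updated, final)
```

A plain <> comparison would miss uid 3 here, because 'b' <> NULL evaluates to NULL rather than true.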
Details:
How do I (or can I) SELECT DISTINCT on multiple columns?
UPDATE a whole row in PL/pgSQL
Shorthand for whole rows
If the row types of accounts and merge_accounts are identical and you want to adopt everything from merge_accounts into accounts, there is a shortcut comparing the whole row type:
UPDATE accounts a
SET (status, field1, field2)
= (m.status, m.field1, m.field2)
FROM merge_accounts m
WHERE a.uid = m.uid
AND m IS DISTINCT FROM a
RETURNING a.*;
This even works for NULL values. Details in the manual.
But it's not going to work for your home-grown solution where (quoting your comment):
merge_accounts is identical, save that all non-pk columns are array types
It requires compatible row types, i.e. each column shares the same data type or there is at least an implicit cast between the two types.
For your special case
UPDATE accounts a
SET (status, field1, field2)
= (COALESCE(m.status[1], a.status) -- default to original ..
, COALESCE(m.field1[1], a.field1) -- .. if m.column[1] IS NULL
, COALESCE(m.field2[1], a.field2))
FROM merge_accounts m
WHERE m.uid = a.uid
AND (m.status[1] IS NOT NULL AND a.status IS DISTINCT FROM m.status[1]
OR m.field1[1] IS NOT NULL AND a.field1 IS DISTINCT FROM m.field1[1]
OR m.field2[1] IS NOT NULL AND a.field2 IS DISTINCT FROM m.field2[1])
RETURNING a.*
m.status IS NOT NULL works if columns that shouldn't be updated are NULL in merge_accounts.
m.status <> '{}' if you operate with empty arrays.
m.status[1] IS NOT NULL covers both options.
Related:
Return pre-UPDATE column values using SQL only
If you aren't relying on side effects of the update, only update the records that need to change:
UPDATE accounts
SET status = merge_accounts.status,
field1 = merge_accounts.field1,
field2 = merge_accounts.field2,
etc.
FROM merge_accounts WHERE merge_accounts.uid = accounts.uid
AND NOT (accounts.status IS NOT DISTINCT FROM merge_accounts.status
AND accounts.field1 IS NOT DISTINCT FROM merge_accounts.field1
AND accounts.field2 IS NOT DISTINCT FROM merge_accounts.field2
)
RETURNING accounts.*
I would recommend using the information_schema.columns table to introspect the columns dynamically, and then use those within a plpgsql function to dynamically generate the UPDATE statement.
i.e. this DDL:
create table foo
(
id serial,
val integer,
name text
);
insert into foo (val, name) VALUES (10, 'foo'), (20, 'bar'), (30, 'baz');
And this query:
select column_name
from information_schema.columns
where table_name = 'foo'
order by ordinal_position;
will yield the columns for the table in the order that they were defined in the table DDL.
Essentially you would use the above SELECT within the function to dynamically build up your UPDATE statement by iterating over the results of the above SELECT in a FOR LOOP to dynamically build up both the SET and WHERE clauses.
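The string-building step can be sketched outside the database as well. The helper below is hypothetical and written in Python rather than plpgsql: it takes a column list such as the one returned by the information_schema query and assembles the SET and WHERE clauses. It assumes the table and column names come from a trusted source, since they are interpolated without quoting.

```python
def build_changed_only_update(target, source, key, columns):
    """Build an UPDATE that only touches rows whose non-key columns differ.

    Names are interpolated as-is, so they must be trusted identifiers.
    """
    cols = [c for c in columns if c != key]
    set_clause = ", ".join(f"{c} = s.{c}" for c in cols)
    diff_clause = " OR ".join(f"t.{c} IS DISTINCT FROM s.{c}" for c in cols)
    return (
        f"UPDATE {target} t SET {set_clause} "
        f"FROM {source} s "
        f"WHERE s.{key} = t.{key} AND ({diff_clause}) "
        f"RETURNING t.*"
    )


# Column names as they would come back from information_schema.columns for foo;
# foo_staging is a hypothetical source table with the same columns.
statement = build_changed_only_update("foo", "foo_staging", "id", ["id", "val", "name"])
print(statement)
```

The generated text is an ordinary PostgreSQL statement and can be executed from whatever client or function produced the column list.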
Some variation of this ?
SELECT * FROM old;
id | val
----+-----
1 | 1
2 | 2
4 | 5
5 | 1
6 | 2
SELECT * FROM new;
id | val
----+-----
1 | 2
2 | 2
3 | 2
5 | 1
6 | 1
SELECT * FROM old JOIN new ON old.id = new.id;
id | val | id | val
----+-----+----+-----
1 | 1 | 1 | 2
2 | 2 | 2 | 2
5 | 1 | 5 | 1
6 | 2 | 6 | 1
(4 rows)
WITH sel AS (
SELECT o.id , o.val FROM old o JOIN new n ON o.id=n.id ),
upd AS (
UPDATE old SET val = new.val FROM new WHERE new.id=old.id RETURNING old.* )
SELECT * from sel, upd WHERE sel.id = upd.id AND sel.val <> upd.val;
id | val | id | val
----+-----+----+-----
1 | 1 | 1 | 2
6 | 2 | 6 | 1
(2 rows)
Refer SO answer and read the entire discussion.
If you are updating a single table and want to know if the row is actually changed you can use this query:
with rows_affected as (
update mytable set (field1, field2, field3)=('value1', 'value2', 3) where id=1 returning *
)
select count(*)>0 as is_modified from rows_affected
join mytable on mytable.id=rows_affected.id
where rows_affected is distinct from mytable;
And you can wrap your existing queries into this one without the need to modify the actual update statements.

Changing the values in a column with a value from the same column

Is it possible in access to select a record from a given column, and update that same column with the record you selected?
For example:
----column----
test
(blank)
(blank)
(blank)
---------------
becomes
-----Column------
test
test
test
test
-------------------
Notice the blanks in the first table, and how in the second table those blanks were filled with the value that was in the first row. Is there a way to do this without having to specifically say "update to test"? I'm at a loss as to how to do this without telling Access that it needs to specifically update the blanks to "test".
This will do what you want:
UPDATE Table1, (SELECT TOP 1 Field1 As F FROM Table1 WHERE Field1 Is Not Null)
SET Field1 = F
WHERE Field1 Is Null
It handles several special cases safely:
If more than one row has a value in Field1, the value from the first such row is used.
The first row of the table need not have a value in Field1.
If none of the rows have a value in Field1, nothing bad will happen.
Before:
Field1 | Field2
-------+-------
       | apple
pet    | cat
       | dog
color  | red
       | blue
After:
Field1 | Field2
-------+-------
pet    | apple
pet    | cat
pet    | dog
color  | red
pet    | blue
Assuming you have:
a table MY_TABLE
a column COL_1
multiple records of which only one holds a value (not null) for COL_1
The update statement to fill COL_1 of all records with the non-null value :
update MY_TABLE
set COL_1 = (select COL_1 FROM MY_TABLE where COL_1 is not null)
where COL_1 IS NULL;

How to insert, update, delete when import data from table to table?

I have a query that I need to run more than once a day. This query imports data from one database into another.
The target table structure is:
Id Date Department Location PersonId Starttime EndTime State
1 2012-01-01 2 5 200 12:00:00.000 15:00:00.000 2
An application can also insert data into the target table. Records inserted by the application must not be updated, even when the same record exists in the source (temp) table with a different state.
To make this possible I have come up with a solution: I will add a new column to the target table holding a second state that I can check against.
Id Date Department Location PersonId Starttime EndTime State StateSource
1 2012-01-01 2 5 200 12:00:00.000 15:00:00.000 2 2
Some requirements:
If a record is added by the application, then StateSource will be NULL. This means the record must not be deleted, updated or inserted again from the source table.
If a record is updated by the application, then State and StateSource will be different. In this case I do not update the record.
I will update a record if the state in the source table differs from the state in the target table and, in the target table, State = StateSource.
I will insert a record when it does not exist in the target table. When the record already exists, I do not insert it (no matter whether it was added by the application or by my query on an earlier run).
I will delete records from the target when they no longer exist in my source table and State = StateSource.
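The requirements above amount to a small decision table, which can be sketched in Python before committing it to SQL. The function name sync_action and the dictionary layout are hypothetical; each row is represented only by the State and StateSource values the rules refer to, and None stands for a row that does not exist on that side.

```python
from typing import Optional


def sync_action(target: Optional[dict], source: Optional[dict]) -> str:
    """Decide what the import should do for one key, per the stated requirements."""
    if target is None:
        # Not in the target yet: insert if the source has it.
        return "insert" if source is not None else "skip"
    if target["StateSource"] is None:
        # Added by the application: never touch it.
        return "skip"
    if target["State"] != target["StateSource"]:
        # Changed by the application after import: the application's state leads.
        return "skip"
    if source is None:
        # Imported, untouched, and gone from the source: delete.
        return "delete"
    if source["State"] != target["State"]:
        # Imported, untouched, and the source state moved on: update.
        return "update"
    return "skip"


print(sync_action(None, {"State": 2}))                            # new source row
print(sync_action({"State": 2, "StateSource": None}, {"State": 3}))  # app-inserted row
print(sync_action({"State": 2, "StateSource": 2}, {"State": 3}))     # untouched import
print(sync_action({"State": 2, "StateSource": 2}, None))             # vanished from source
```

When the update case fires, State and StateSource both have to be set to the new source state, otherwise the row would look application-modified on the next run.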
I already have the following queries. I have decided to make 3 statements.
--Delete Statement first
Delete from t
from TargetTable t LEFT JOIN SourceTable s ON t.Id=s.Id
and t.Date=s.Date
and t.departments=s.Department
and t.PersonId=s.PersonId
and t.State=t.StateSource
--Delete only if the date no longer exists in the source table and this record was NOT
--changed by the application (t.State=t.StateSource)
--Update statement second
Update t
set t.State = s.State
From Targettable t INNER JOIN SourceTable s ON t.Id=s.Id
and t.Date=s.Date
and t.departments=s.Department
and t.PersonId=s.PersonId
The problem here is:
--when I have State 2 already in the targettable and in my sourcetable i have
--another state then the state in the targettable changes. This would not be the case.
--Insert statement third
insert into TargetTable (Id, Date, Department, Location, PersonId, Starttime, EndTime,State, StateSource)
select Id, Date, Department, Location, PersonId, Starttime, EndTime,State, StateSource
from SourceTable s
WHERE Date not in (select Date
from TargetTable t
where t.id=s.id
and t.PersonId=s.PersonId
and t.date=s.date
and t.department=s.department)
--I have no idea about how the insert should be because the application also can
--insert records. When a record exists then no insert. What to do with the State?
Remember that the states that are changed by the application are leading.
Can anyone help me with the desired result?
You could use a MERGE statement, something like this:
with target_T as (select * from UR_TARGET_TABLE
where statesource is not null) -- so that data inserted by the application is not changed
merge target_T as TARGET
using UR_SOURCE_TABLE as SOURCE
on SOURCE.id = TARGET.id -- is id unique? either way, put your primary key here
when matched and TARGET.state = TARGET.statesource then -- rows inserted/updated by the application are left alone
update set TARGET.state = SOURCE.state
,TARGET.statesource = SOURCE.state -- important: update both together, so the row stays distinguishable from an application update
--, other columns that you have to set...
-- add another WHEN MATCHED ... THEN UPDATE clause if something must change on data inserted/updated by the application
when not matched by TARGET then
insert (Id, Date, Department, Location, PersonId, Starttime, EndTime,State, StateSource)
values(SOURCE.Id, SOURCE.Date, SOURCE.Department, SOURCE.Location, SOURCE.PersonId, SOURCE.Starttime, SOURCE.EndTime,SOURCE.State, SOURCE.StateSource);
If you post a sample that declares your tables and inserts some data,
I could help more, with code that really works and not just a sketch.