Given a source table:
create table source_after (
binary_path varchar2(40),
hostname varchar2(40),
change_column varchar2(40),
flag varchar2(20) default 'open'
);
insert all
into source_after (binary_path,hostname,change_column) values ('java','b','DMZ')
into source_after (binary_path,hostname,change_column) values ('apache','c','drn')
into source_after (binary_path,hostname,change_column) values ('NEW','NEW','NEW')
select * from dual;
--------
binary_path  hostname  change_column  flag
java         b         DMZ            open
apache       c         drn            open
NEW          NEW       NEW            open
And a destination table:
create table destination (
binary_path varchar2(40),
hostname varchar2(40),
change_column varchar2(40),
flag varchar2(20)
);
insert all
into destination (binary_path,hostname,change_column) values ('python','a','drn')
into destination (binary_path,hostname,change_column) values ('java','b','drn')
into destination (binary_path,hostname,change_column) values ('apache','c','drn')
into destination (binary_path,hostname,change_column) values ('spark','d','drn')
select * from dual;
------
binary_path  hostname  change_column  flag
python       a         drn            null
java         b         drn            null
apache       c         drn            null
spark        d         drn            null
The primary key of both tables is the combination (binary_path, hostname). I want to merge the changes from source_after into destination.
These should be:
If the primary key in destination is present in source_after, I want to update change_column in destination with the value of source_after.
If the primary key in destination is not present in source_after, I want to mark the flag column as closed.
If the primary key in source_after is not present in destination, I want to insert that row from source_after into destination.
I have tried this:
merge into destination d
using (select * from source_after) s on (d.hostname = s.hostname and d.binary_path = s.binary_path)
when matched then update
set
d.change_column = s.change_column,
d.flag = s.flag
when not matched then insert
(d.binary_path,d.hostname,d.change_column,d.flag)
values
(s.binary_path,s.hostname,s.change_column,s.flag)
;
binary_path  hostname  change_column  flag
python       a         drn            null
java         b         DMZ            open
apache       c         drn            open
spark        d         drn            null
NEW          NEW       NEW            open
It solves problems 1 and 3, but not problem 2, which is marking the flag column as closed.
You can use a FULL OUTER JOIN in the USING clause and correlate on the ROWID pseudo-column for the destination between the USING clause and the target of the MERGE:
MERGE INTO destination d
USING (
SELECT d.ROWID AS rid,
s.*
FROM destination d
FULL OUTER JOIN source_after s
ON (d.hostname = s.hostname AND d.binary_path = s.binary_path)
) s
ON (s.rid = d.ROWID)
WHEN MATCHED THEN
UPDATE
SET d.change_column = COALESCE(s.change_column, d.change_column),
d.flag = COALESCE(s.flag, 'closed')
WHEN NOT MATCHED THEN
INSERT (d.binary_path,d.hostname,d.change_column,d.flag)
VALUES (s.binary_path,s.hostname,s.change_column,s.flag);
Which, for the sample data, changes the destination table to:
BINARY_PATH  HOSTNAME  CHANGE_COLUMN  FLAG
python       a         drn            closed
java         b         DMZ            open
apache       c         drn            open
spark        d         drn            closed
NEW          NEW       NEW            open
If I understood you correctly, that won't work in a single, plain MERGE:
If a row MATCHES, you can UPDATE it
If there's NO MATCH, you can INSERT it
You can't combine NO MATCH with UPDATE, which means you either write two statements, or build the 'closed' rows into the USING clause so that they match and get updated:
merge into destination d
using (
select s.binary_path, s.hostname, s.change_column, s.flag from source_after s
union all
select d.binary_path, d.hostname, d.change_column, 'closed' from destination d
where not exists(select 1 from source_after s where s.binary_path = d.binary_path and s.hostname = d.hostname)
) s
on (d.hostname = s.hostname and d.binary_path = s.binary_path)
when matched then update
set
d.change_column = s.change_column,
d.flag = s.flag
when not matched then insert
(d.binary_path,d.hostname,d.change_column,d.flag)
values
(s.binary_path,s.hostname,s.change_column,s.flag)
;
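For completeness, the literal two-statement version keeps a plain MERGE for cases 1 and 3 and follows it with an UPDATE for case 2; a sketch — run both inside one transaction so nobody sees the intermediate state:

```sql
-- Statement 1: update matches, insert new rows (cases 1 and 3)
merge into destination d
using source_after s
on (d.hostname = s.hostname and d.binary_path = s.binary_path)
when matched then update
  set d.change_column = s.change_column,
      d.flag = s.flag
when not matched then insert
  (d.binary_path, d.hostname, d.change_column, d.flag)
  values (s.binary_path, s.hostname, s.change_column, s.flag);

-- Statement 2: close rows that no longer appear in source_after (case 2)
update destination d
set flag = 'closed'
where not exists (
  select 1
  from source_after s
  where s.binary_path = d.binary_path
    and s.hostname = d.hostname
);
```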
Related
I have written a pipeline using Apache Beam and Google Dataflow that sends changes from a MongoDB to BigQuery. I have a BigQuery log table like ...
table           operation type                         timestamp
[all columns]   [insert / update / delete / replace]   timestamp
and a "normal" table without the operation and timestamp columns. My goal is to merge the source table (the log) into the target table. The problem is as follows: when the second-to-last entry for a field is not null but the last one is null, how can I detect this in the MERGE statement? For example, in other databases you can do something like
create function get_sec_last_value(id) as (
(
select as struct
*
from (
select
*,
row_number() over(order by timestamp desc) as number
from table
where id = id
) where number = 2
)
);
merge target trg
using source as src
on trg.id = src.id
...
update set id = case
when (get_sec_last_value(src.id).id is not null and src.id is not null) or (get_sec_last_value(src.id).id is null and src.id is not null) then src.id
when (get_sec_last_value(src.id).id is not null and src.id is null) or (get_sec_last_value(src.id).id is null and src.id is null) then null
end
...
Has anybody faced the same problem or has an idea how to solve it?
Thanks in advance
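If it helps: the second-to-last value can be computed directly in the MERGE source with LAG, avoiding the helper function entirely. A sketch in BigQuery syntax, where log_table, val and ts are hypothetical names; note that both branches of the CASE in the question actually reduce to src.id:

```sql
merge target trg
using (
  select
    id,
    val,                                                     -- last value written
    lag(val) over (partition by id order by ts) as prev_val  -- second-to-last value
  from log_table
  -- keep only the most recent log row per id
  qualify row_number() over (partition by id order by ts desc) = 1
) src
on trg.id = src.id
when matched then update set
  val = if(src.val is null and src.prev_val is not null,
           null,      -- the field was explicitly cleared by the last change
           src.val);
```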
I'm interested in adding a column to an existing table with a set of explicit values that should duplicate existing records (similar to common join constructs).
For example, say we're starting with a table with a single column:
CREATE TABLE #DEMO (
COLUMN_A NVARCHAR(100) NOT NULL,
PRIMARY KEY (COLUMN_A)
);
COLUMN_A
ACCOUNT_001
ACCOUNT_002
ACCOUNT_003
...and I want to add Column_B with row values of 'A', 'B', and 'C'. The end goal would be a table that looks like:
COLUMN_A     COLUMN_B
ACCOUNT_001  A
ACCOUNT_001  B
ACCOUNT_001  C
ACCOUNT_002  A
ACCOUNT_002  B
ACCOUNT_002  C
ACCOUNT_003  A
ACCOUNT_003  B
ACCOUNT_003  C
Is this possible? Bonus Points if there is a name or phrase for this you know of.
So I think you need a couple of steps: first insert the new rows, then update the existing ones:
alter table #demo add COLUMN_B char(1);

with x as (
  select * from (values('A'),('B')) x(B)
)
insert into #demo(COLUMN_A, COLUMN_B)
select COLUMN_A, B
from #DEMO cross join x;

update #DEMO set COLUMN_B = 'C'
where COLUMN_B is null;
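As for the bonus question: the pattern here is a CROSS JOIN (Cartesian product). If rebuilding the table is an option, the whole product can also be materialized in one statement; a sketch, where #DEMO2 is a hypothetical staging table:

```sql
-- Pair every existing COLUMN_A value with each of the three new values
SELECT d.COLUMN_A, v.COLUMN_B
INTO #DEMO2
FROM #DEMO AS d
CROSS JOIN (VALUES ('A'), ('B'), ('C')) AS v(COLUMN_B);
```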
I am currently working on a project where I have 2 SQL Server databases and need to copy new records into archive database and append with updated date. Example:
Existing DB: dbo.A.Category(Id, Name)
Copy new records (no duplicates) to:
Archive DB: dbo.B.Category(Id, Name, ArchiveDate)
How do I copy only changed records from the existing database to the archive database? This is in SQL Server.
You can use the EXCEPT operator for this. For example:
INSERT INTO archiveCategory (id,name,creationdate)
SELECT id,name,current_timestamp
FROM (
SELECT id,name
FROM myDB.dbo.category
EXCEPT
SELECT id,name
FROM archiveDB.dbo.category a
WHERE a.creationdate = (SELECT max(a2.creationdate) FROM archiveDB.dbo.category a2 WHERE a2.id = a.id)
) delta
You can achieve this with a MERGE statement.
I have made the following assumptions about what you're trying to achieve:
the [Id] column in dbo.A.Category contains unique values
the [Id] column in dbo.B.Category is not an identity column and values correspond to matching [Id] values in dbo.A.Category
you only care if updated [name] values in dbo.A.Category have been changed, not if they've been updated with the same value (e.g. not if 'Bob' is changed to 'Bob')
you do not want deleted rows from dbo.A.Category to be likewise deleted from dbo.B.Category
MERGE dbo.B.Category AS tgt
USING dbo.A.Category AS src
ON tgt.[Id] = src.[Id]
WHEN MATCHED
AND tgt.[Name] <> src.[Name]
THEN UPDATE
SET [Name] = src.[Name]
, [ArchiveDate] = SYSDATETIME()
WHEN NOT MATCHED BY TARGET
THEN INSERT ( [Id], [Name], [ArchiveDate] )
VALUES ( src.[id], src.[name], SYSDATETIME() ) ;
GO
I am new to SQL, and have been asked to create two new columns based on another column in Oracle SQL.
Here is what the data looks like: under each ID there is an IDSeq representing a sub-segment of that ID, each with a Start and End place.
I need to find the smallest IDSeq under each ID and take its corresponding start place; similarly, find the largest IDSeq under each ID and take its corresponding end place. Each unique ID has exactly one origin and one destination. I'd like to create two new columns (see below), Origin and Dest, to show the origin and destination place for each ID.
Really appreciate your help.
You can use a CASE expression, such as:
select
a.idseq, a.id, a.start, a.end,
case
when a.id = 'ABC' then 'X'
when a.id = 'BCD' then 'Q'
end as origin,
case
when a.id = 'ABC' then 'G'
when a.id = 'BCD' then 'Z'
end as dest
from
yourtablename a
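For arbitrary data, rather than hardcoded IDs, Oracle's FIRST/LAST aggregates can pick the start place of the smallest IDSeq and the end place of the largest IDSeq per ID. A sketch, assuming the Start and End columns are named start_place and end_place (START and END are reserved words):

```sql
select
  t.idseq, t.id, t.start_place, t.end_place,
  -- start place of the row with the smallest idseq within this id
  min(t.start_place) keep (dense_rank first order by t.idseq)
    over (partition by t.id) as origin,
  -- end place of the row with the largest idseq within this id
  max(t.end_place) keep (dense_rank last order by t.idseq)
    over (partition by t.id) as dest
from yourtablename t;
```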
I wrote this before seeing the Oracle tag. MySQL can't reference a temporary table more than once in the same query, hence all the extra temp tables; maybe you can avoid them in Oracle?
CREATE TEMPORARY TABLE tmp_sequence (
IDSeq INT NOT NULL AUTO_INCREMENT, range_id VARCHAR(3), range_start CHAR(1), range_end CHAR(1), origin CHAR(1), destination CHAR(1), PRIMARY KEY (IDSeq)
);
INSERT INTO tmp_sequence (range_id, range_start, range_end)
VALUES ('ABC', 'X', 'Y'), ('ABC', 'Y', 'H'), ('ABC', 'H','L'), ('ABC','L', 'G'),
('BCD','Q','D'), ('BCD','D','H'),('BCD','H','Z');
CREATE TEMPORARY TABLE tmp_min AS
SELECT MIN(IDSeq) min_id, range_id
FROM tmp_sequence
GROUP BY range_id;
CREATE TEMPORARY TABLE tmp_start AS
SELECT s.min_id, s.range_id, t.range_start
FROM tmp_sequence t
JOIN tmp_min s ON t.IDSeq = s.min_id
AND t.range_id = s.range_id;
UPDATE tmp_sequence t
JOIN tmp_start s ON t.range_id = s.range_id
SET origin = s.range_start;
CREATE TEMPORARY TABLE tmp_max AS
SELECT MAX(IDSeq) max_id, range_id
FROM tmp_sequence
GROUP BY range_id;
CREATE TEMPORARY TABLE tmp_end AS
SELECT s.max_id, s.range_id, t.range_end
FROM tmp_sequence t
JOIN tmp_max s ON t.IDSeq = s.max_id
AND t.range_id = s.range_id;
UPDATE tmp_sequence t
JOIN tmp_end s ON t.range_id = s.range_id
SET destination = s.range_end;
DROP TEMPORARY TABLE tmp_sequence;
DROP TEMPORARY TABLE tmp_min;
DROP TEMPORARY TABLE tmp_start;
DROP TEMPORARY TABLE tmp_max;
DROP TEMPORARY TABLE tmp_end;
I'm trying to figure out how to fill in values that are missing from one column with the non-missing values from other rows that have the same value on a given column. For instance, in the below example, I'd want all the "1" values to be equal to Bob and all of the "2" values to be equal to John
ID # | Name
-------|-----
1 | Bob
1 | (null)
1 | (null)
2 | John
2 | (null)
2 | (null)
EDIT: One caveat is that I'm using postgresql 8.4 with Greenplum and so correlated subqueries are not supported.
CREATE TABLE bobjohn
( ID INTEGER NOT NULL
, zname varchar
);
INSERT INTO bobjohn(id, zname) VALUES
(1,'Bob') ,(1, NULL) ,(1, NULL)
,(2,'John') ,(2, NULL) ,(2, NULL)
;
UPDATE bobjohn dst
SET zname = src.zname
FROM bobjohn src
WHERE dst.id = src.id
AND dst.zname IS NULL
AND src.zname IS NOT NULL
;
SELECT * FROM bobjohn;
NOTE: if more than one non-null name exists for a given id, this query will pick one of them arbitrarily (and it won't touch records for which no non-null name exists).
If you are on Postgres version 9 or later, you could use a CTE to fetch the source tuples (this is equivalent to a subquery, but easier to write and read, IMHO). The CTE also tackles the duplicate-values problem (in a rather crude way):
--
-- CTE's dont work in update queries for Postgres version below 9
--
WITH uniq AS (
SELECT DISTINCT id
-- if there are more than one names for a given Id: pick the lowest
, min(zname) as zname
FROM bobjohn
WHERE zname IS NOT NULL
GROUP BY id
)
UPDATE bobjohn dst
SET zname = src.zname
FROM uniq src
WHERE dst.id = src.id
AND dst.zname IS NULL
;
SELECT * FROM bobjohn;
UPDATE tbl
SET name = x.name
FROM (
SELECT DISTINCT ON (id) id, name
FROM tbl
WHERE name IS NOT NULL
ORDER BY id, name
) x
WHERE x.id = tbl.id
AND tbl.name IS NULL;
DISTINCT ON does the job alone. No need for additional aggregation.
In case of multiple values for name, the alphabetically first one (according to the current locale) is picked - that's what the ORDER BY id, name is for. If name is unambiguous you can omit that line.
Also, if there is at least one non-null value per id, you can omit WHERE name IS NOT NULL.
If you know for a fact that there are no conflicting values (multiple rows with the same ID but different, non-null names) then something like this will update the table appropriately:
UPDATE some_table AS t1
SET name = (
SELECT name
FROM some_table AS t2
WHERE t1.id = t2.id
AND name IS NOT NULL
LIMIT 1
)
WHERE name IS NULL;
If you only want to query the table and have this information filled in on the fly, you can use a similar query:
SELECT
t1.id,
(
SELECT name
FROM some_table AS t2
WHERE t1.id = t2.id
AND name IS NOT NULL
LIMIT 1
) AS name
FROM some_table AS t1;