How to compare 2 datasets in ADF dataflow and save the difference?


I need to compare 2 tables and filter rows that are not the same and mark the column value in which they are different.
TABLE 1
User   Number  Type  Value
User1  1       A     value2
User1  2       A     value3
User1  1       B     value4
User1  2       B     value5
User1  1       C     value6
User1  2       C     Value7
User1  1       D     Value8
User1  2       D     Value9
TABLE 2
User   Number  Type  Value
User1  1       A     value2
User1  2       B     value3
User1  1       B     value4
User1  2       B     value5
User1  1       A     value6
User1  2       C     Value7
User1  1       D     Value8
User1  2       A     Value9
The final table should look like this:
FINAL TABLE
User   Number  Type  Value   Type Change
User1  2       B     value3  from A to B
User1  1       A     value6  from C to A
User1  2       A     Value9  from D to A

You can join TABLE1 and TABLE2 first, then use a Filter to keep the rows whose Type is not the same. Next, create a Derived Column to generate the Type Change column. Finally, use a Select transformation to keep only the columns you need.
Details:
Create a Join transformation on TABLE1 and TABLE2.
Filter the rows whose Type is not the same. Expression: source2@Type != source1@Type
Add the Type Change column. Expression: concat('from ', source1@Type, ' to ', source2@Type)
Use a Select transformation to keep only the columns you need.
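The four steps above can be sketched outside ADF in plain Python to check the expected output. This is only an illustration, not ADF code: the function name `type_changes` is hypothetical, and it assumes the join key is the Value column, since Value is the only column that identifies a row uniquely in the sample tables (the answer does not state the join condition).

```python
# Rows are (User, Number, Type, Value), copied from TABLE 1 and TABLE 2.
table1 = [
    ("User1", 1, "A", "value2"), ("User1", 2, "A", "value3"),
    ("User1", 1, "B", "value4"), ("User1", 2, "B", "value5"),
    ("User1", 1, "C", "value6"), ("User1", 2, "C", "Value7"),
    ("User1", 1, "D", "Value8"), ("User1", 2, "D", "Value9"),
]
table2 = [
    ("User1", 1, "A", "value2"), ("User1", 2, "B", "value3"),
    ("User1", 1, "B", "value4"), ("User1", 2, "B", "value5"),
    ("User1", 1, "A", "value6"), ("User1", 2, "C", "Value7"),
    ("User1", 1, "D", "Value8"), ("User1", 2, "A", "Value9"),
]


def type_changes(source1, source2):
    """Join on Value, keep rows whose Type differs, add a Type Change column."""
    # Join step. ASSUMPTION: Value is unique and serves as the join key.
    old_type_by_value = {value: type_ for _, _, type_, value in source1}
    result = []
    for user, number, new_type, value in source2:
        old_type = old_type_by_value.get(value)
        # Filter step: skip unmatched rows and rows whose Type is unchanged.
        if old_type is None or old_type == new_type:
            continue
        # Derived Column + Select steps: build the output row.
        result.append(
            (user, number, new_type, value, f"from {old_type} to {new_type}")
        )
    return result


final_table = type_changes(table1, table2)
for row in final_table:
    print(row)
```

With the sample data this yields the three rows of the FINAL TABLE in the question.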

Related

SQL LEFT_JOIN table on itself while replacing column

I have the following data:
value1 value2 value3
qwe bird 1
qwe 2
qwe 3
asd dog 4
asd 5
And I would like the following data:
value1 value2 value3
qwe bird 1
qwe bird 2
qwe bird 3
asd dog 4
asd dog 5
To me it seems like a problem that can be fixed by left joining the table on itself, while replacing a column. Something like:
-- Selecting unique value1 and value2 combinations (call this selection_table)
SELECT value1, value2
FROM mytable
WHERE value2 != ''
GROUP BY value1, value2;
-- Left joining two tables
SELECT selection_table.value1, selection_table.value2
FROM selection_table
LEFT JOIN mytable
ON selection_table.value1 = mytable.value1;
Can I somehow make this entire operation in one call, such that I avoid having to make intermediate tables? When doing my left join, can I overwrite the value2 column?
Or do you have a more intelligent way to do this? I am sure there must be one :)

A simple solution applies a windowed aggregate:
SELECT value1
      ,max(value2) -- group maximum
         over (partition by value1) AS value2
      ,value3
from mytable;
This is usually more efficient than a solution based on a self-join.
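A quick way to try the windowed aggregate is to run it against an in-memory SQLite database from Python. This is a sketch under two assumptions: the missing value2 entries are stored as NULL (the question's own query tests for empty strings instead), and the SQLite library bundled with Python is version 3.25 or later, which is when window functions were added. The output alias `filled_value2` is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (value1 TEXT, value2 TEXT, value3 INTEGER)")
# Sample rows from the question; blank value2 cells are assumed to be NULL.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("qwe", "bird", 1),
        ("qwe", None, 2),
        ("qwe", None, 3),
        ("asd", "dog", 4),
        ("asd", None, 5),
    ],
)

# MAX() ignores NULLs, so each row receives the single non-NULL value2
# found in its value1 partition.
filled = conn.execute(
    """
    SELECT value1,
           MAX(value2) OVER (PARTITION BY value1) AS filled_value2,
           value3
    FROM mytable
    ORDER BY value3
    """
).fetchall()

for row in filled:
    print(row)
```

Note that this only works when each value1 group has at most one distinct non-NULL value2; otherwise every row in the group gets the largest one.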

In standard SQL, you can use lag() with the IGNORE NULLS option (coalesce() keeps the row's own value when it has one, since lag() only looks at preceding rows):
select t.*,
       coalesce(value2,
                lag(value2) ignore nulls over (partition by value1 order by value3)) as filled_value2
from t;
However, many databases do not implement this functionality. An alternative uses two levels of window functions: the first finds the most recent value3 that has a non-null value2, and the second spreads that value across the resulting group:
select t.*,
       max(value2) over (partition by value1, grp) as filled_value2
from (select t.*,
             max(case when value2 is not null then value3 end) over (partition by value1 order by value3) as grp
      from t
     ) t;
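The two-level approach can be checked the same way with Python's sqlite3 module. This sketch assumes missing values are NULL and that the bundled SQLite is 3.25 or later. The last two rows (`cat` at value3 6 and the blank at 7) are hypothetical additions, not part of the question's data; they are there to show that this approach fills forward from the most recent value rather than taking one maximum per value1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (value1 TEXT, value2 TEXT, value3 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("qwe", "bird", 1),
        ("qwe", None, 2),
        ("qwe", None, 3),
        ("asd", "dog", 4),
        ("asd", None, 5),
        ("asd", "cat", 6),  # hypothetical extra row
        ("asd", None, 7),   # hypothetical extra row
    ],
)

forward_filled = conn.execute(
    """
    SELECT value1,
           MAX(value2) OVER (PARTITION BY value1, grp) AS filled_value2,
           value3
    FROM (
        -- grp = value3 of the latest row so far that has a non-NULL value2
        SELECT t.*,
               MAX(CASE WHEN value2 IS NOT NULL THEN value3 END)
                   OVER (PARTITION BY value1 ORDER BY value3) AS grp
        FROM t
    ) AS s
    ORDER BY value3
    """
).fetchall()

for row in forward_filled:
    print(row)
```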

SQL Select Return Rows If Column Contain Value or Null Value Base on Key Columns

It was a bit difficult to describe my requirements in the title, so here is a sample table and the expected result.
I have a table (let's call it TBL_K) that looks like this:
KEY1 KEY2 VALUE1 VALUE2
abc 123 NULL NULL
abc 123 9999 1111
abc 123 9999 1111
ghd 123 NULL NULL
ghd 123 NULL NULL
tiy 134 4444 NULL
tiy 134 4444 NULL
hhh 981 NULL NULL
I want my Select statement to return this result:
KEY1 KEY2 VALUE1 VALUE2
abc 123 9999 1111
ghd 123 NULL NULL
tiy 134 4444 NULL
hhh 981 NULL NULL
I have come up with my own solution by creating two sub-tables with a left outer join, but I want to see if there are other ways of producing this result.

It seems you can simply use max():
select key1, key2, max(value1) as value1, max(value2) as value2
from TBL_K tk
group by key1, key2;
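As a quick check of the max() approach, the sample table can be loaded into an in-memory SQLite database from Python. This is a sketch that assumes the value columns are integers; the table and column names come from the question. The aggregate skips NULLs and returns NULL only when every value in the group is NULL, which is the behaviour the expected result needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TBL_K (key1 TEXT, key2 INTEGER, value1 INTEGER, value2 INTEGER)"
)
conn.executemany(
    "INSERT INTO TBL_K VALUES (?, ?, ?, ?)",
    [
        ("abc", 123, None, None),
        ("abc", 123, 9999, 1111),
        ("abc", 123, 9999, 1111),
        ("ghd", 123, None, None),
        ("ghd", 123, None, None),
        ("tiy", 134, 4444, None),
        ("tiy", 134, 4444, None),
        ("hhh", 981, None, None),
    ],
)

# One output row per (key1, key2); ORDER BY is added only to make the
# output order deterministic for this example.
collapsed = conn.execute(
    """
    SELECT key1, key2, MAX(value1) AS value1, MAX(value2) AS value2
    FROM TBL_K
    GROUP BY key1, key2
    ORDER BY key1
    """
).fetchall()

for row in collapsed:
    print(row)
```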

SELECT
    A.KEY1,
    A.KEY2,
    B.VALUE1,
    B.VALUE2
FROM
    (
        SELECT
            Z.KEY1,
            Z.KEY2,
            TRIM(Z.VALUE1) VALUE1,
            TRIM(Z.VALUE2) VALUE2
        FROM
            TBL_K Z
        WHERE
            TRIM(Z.VALUE1) IS NULL
        GROUP BY
            Z.KEY1,
            Z.KEY2,
            Z.VALUE1,
            Z.VALUE2
    ) A
    LEFT OUTER JOIN
    (
        SELECT
            Y.KEY1,
            Y.KEY2,
            TRIM(Y.VALUE1) VALUE1,
            TRIM(Y.VALUE2) VALUE2
        FROM
            TBL_K Y
        WHERE
            TRIM(Y.VALUE1) IS NOT NULL
        GROUP BY
            Y.KEY1,
            Y.KEY2,
            Y.VALUE1,
            Y.VALUE2
    ) B
    ON (A.KEY1 = B.KEY1
        AND A.KEY2 = B.KEY2)

Querying related records to xml field

I have to query a huge database. For every row, which contains over a hundred fields, there are related records in three related tables. Joining them would produce a recordset multiplied tens of times (depending on the number of related records). I thought about combining the fields from each related table into a single xml field.
For example:
Table1 (main table):
Id Field1 Field2 ...more fields go here
1 Value1 Value2
2 Value3 Value4
Table2 (one of the related tables)
Id ParentId Field3 Field4
1 1 Value5 Value6
2 1 Value7 Value8
3 2 Value9 Value10
I would like to get the following result:
Id Field1 Field2 XmlField1
1 Value1 Value2 XmlValue1
2 Value3 Value4 XmlValue2
Where XmlValue1 is as follows
<RelatedRecords>
  <RelatedRecord>
    <Field3>Value5</Field3>
    <Field4>Value6</Field4>
  </RelatedRecord>
  <RelatedRecord>
    <Field3>Value7</Field3>
    <Field4>Value8</Field4>
  </RelatedRecord>
</RelatedRecords>
And XmlValue2 is as follows
<RelatedRecords>
  <RelatedRecord>
    <Field3>Value9</Field3>
    <Field4>Value10</Field4>
  </RelatedRecord>
</RelatedRecords>
How to get the desired output?

Solved it myself. Here is the solution for others:
SELECT Id,
       Field1,
       Field2,
       (
           -- one <RelatedRecord> element per matching Table2 row
           SELECT Field3, Field4
           FROM Table2
           WHERE Table1.Id = Table2.ParentId
           FOR XML PATH('RelatedRecord'), ROOT('RelatedRecords')
       ) XmlField1
FROM Table1
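FOR XML PATH is SQL Server syntax. To make the shape of the result concrete, the sketch below reproduces the same grouping on the client side with Python's standard xml.etree.ElementTree module. It is not the author's method, only an illustration of what each XmlField1 value contains; the function name `related_records_xml` is hypothetical, and the rows are the sample data from the question.

```python
import xml.etree.ElementTree as ET

# (Id, Field1, Field2) rows of Table1 and (Id, ParentId, Field3, Field4)
# rows of Table2, taken from the question.
table1 = [(1, "Value1", "Value2"), (2, "Value3", "Value4")]
table2 = [
    (1, 1, "Value5", "Value6"),
    (2, 1, "Value7", "Value8"),
    (3, 2, "Value9", "Value10"),
]


def related_records_xml(parent_id, related_rows):
    """Serialize the related rows of one parent as a <RelatedRecords> element."""
    root = ET.Element("RelatedRecords")
    for _row_id, row_parent_id, field3, field4 in related_rows:
        if row_parent_id != parent_id:
            continue
        record = ET.SubElement(root, "RelatedRecord")
        ET.SubElement(record, "Field3").text = field3
        ET.SubElement(record, "Field4").text = field4
    # In this sketch a parent without related rows gives an empty element.
    return ET.tostring(root, encoding="unicode")


result = [
    (row_id, field1, field2, related_records_xml(row_id, table2))
    for row_id, field1, field2 in table1
]
for row in result:
    print(row)
```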

PostgreSQL 9.5 Insert data on a key with multiple occurences

I have 2 tables that represent the same type of data: one is in my DB and the other comes from my client's. Both his table and mine have an ID as PRIMARY KEY, but these IDs are completely unrelated.
There is a field (field1) that is common to the two tables, but this field is not always UNIQUE. In most cases each table has the same number of tuples for a given value of this field, but that is not necessarily the case. Here is an example to illustrate the situation:
My table :
id_mytable field1 field2 id_clients
1 aa1 null null
2 aa1 null null
3 aa1 null null
4 aa2 null null
5 aa2 null null
6 aa3 null null
7 aa4 null null
And the client's table:
id_clients field1 field2
9 aa1 value1
10 aa1 value2
11 aa2 value3
12 aa2 value4
13 aa2 value5
14 aa3 value6
15 aa4 value7
And the result I would like to get, sorted by the field1 :
id_mytable field1 field2 id_clients
1 aa1 value1 9
2 aa1 value2 10
3 aa1 null null
4 aa2 value3 11
5 aa2 value4 12
null aa2 value5 13
6 aa3 value6 14
7 aa4 value7 15
Notice the values left unfilled in the result table where my table and the client's table differ, and that a new row was inserted. The idea is to be able to fill my table with field2 and id_clients. So far I cannot figure out a way to achieve that; my guess is that, being a relative beginner, I am missing a DB concept...
And here is an online sample, with the code proposed by ttallierchio : http://rextester.com/LOLH81061
Thank you very much for your attention.

You want to use a full outer join. The reason is that you are not sure whether the data exists in either table, so you want to combine them and take the data if it exists in either one. I would just use row number to ensure uniqueness.
SELECT row_number() over(order by mt.field1),
coalesce(mt.field1,c.field1) as field1,
coalesce(mt.field2,c.field2) as field2,
c.id as id_clients
FROM mytable mt
FULL OUTER JOIN clients c
on mt.id = c.id and c.field1 = mt.field1;
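Because the question states that the two ID columns are unrelated, one way to read "use row number to ensure uniqueness" is to number the rows within each field1 value on both sides and pair them by that number. The plain-Python sketch below illustrates that reading; it is an interpretation, not the answer's exact query, and the function name `pair_by_occurrence` is hypothetical. Unpaired rows on either side are kept, as a full outer join would do.

```python
from collections import defaultdict
from itertools import zip_longest

# (id_mytable, field1) and (id_clients, field1, field2) rows from the question.
my_rows = [(1, "aa1"), (2, "aa1"), (3, "aa1"), (4, "aa2"),
           (5, "aa2"), (6, "aa3"), (7, "aa4")]
client_rows = [(9, "aa1", "value1"), (10, "aa1", "value2"),
               (11, "aa2", "value3"), (12, "aa2", "value4"),
               (13, "aa2", "value5"), (14, "aa3", "value6"),
               (15, "aa4", "value7")]


def pair_by_occurrence(mine, clients):
    """Pair the n-th row of each field1 group in one table with the n-th in the other."""
    mine_by_field = defaultdict(list)
    for id_mytable, field1 in sorted(mine):
        mine_by_field[field1].append(id_mytable)
    clients_by_field = defaultdict(list)
    for id_clients, field1, field2 in sorted(clients):
        clients_by_field[field1].append((id_clients, field2))

    merged = []
    for field1 in sorted(set(mine_by_field) | set(clients_by_field)):
        # zip_longest pads the shorter side with None, like a full outer join.
        for id_mytable, client in zip_longest(
            mine_by_field[field1], clients_by_field[field1]
        ):
            id_clients, field2 = client if client is not None else (None, None)
            merged.append((id_mytable, field1, field2, id_clients))
    return merged


merged_rows = pair_by_occurrence(my_rows, client_rows)
for row in merged_rows:
    print(row)
```

In SQL the same idea would be a row_number() partitioned by field1 in each table, followed by a full outer join on field1 and that row number.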

You can use a full outer join. The trick is calculating the new id:
select coalesce(mt.id,
                m.maxid + row_number() over (partition by mt.id order by ct.id)
               ) as newid,
       mt.field1, ct.field2, ct.id
from mytable mt full outer join
     clienttable ct
     on mt.id = ct.id and mt.field1 = ct.field1 cross join
     (select max(id) as maxid from mytable) as m;

SQL: Update one field from different table

Let's say I have these two tables:
table1
ID value1 value2
1 NULL NULL
2 NULL NULL
3 NULL NULL
table2
ID value3 value4
5 100 400
6 200 500
7 300 600
I need a SQL-statement to get value3 and value4 of table2 ID 7 into value1 and value2 of table1 ID 1.
How do I go about that?
Thanks

If all you need is to update two fields in a single row, you can do it with subqueries, like this:
update table1
set
  value1 = (select value3 from table2 where id=7)
, value2 = (select value4 from table2 where id=7)
where id=1;
To update more fields in related rows of two tables, use the UPDATE with JOIN syntax appropriate for your RDBMS.
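The subquery form above is portable enough to try in SQLite from Python. This is a minimal sketch with the question's sample data; the integer column types are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, value1 INTEGER, value2 INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, value3 INTEGER, value4 INTEGER);
    INSERT INTO table1 VALUES (1, NULL, NULL), (2, NULL, NULL), (3, NULL, NULL);
    INSERT INTO table2 VALUES (5, 100, 400), (6, 200, 500), (7, 300, 600);
    """
)

# Copy value3/value4 of table2 row 7 into value1/value2 of table1 row 1.
conn.execute(
    """
    UPDATE table1
    SET value1 = (SELECT value3 FROM table2 WHERE id = 7),
        value2 = (SELECT value4 FROM table2 WHERE id = 7)
    WHERE id = 1
    """
)

rows = conn.execute("SELECT id, value1, value2 FROM table1 ORDER BY id").fetchall()
for row in rows:
    print(row)
```

Only row 1 changes; rows 2 and 3 keep their NULL values because of the WHERE clause.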
