SQL LEFT_JOIN table on itself while replacing column

I have the following data:
value1  value2  value3
qwe     bird    1
qwe             2
qwe             3
asd     dog     4
asd             5
And I would like the following data:
value1  value2  value3
qwe     bird    1
qwe     bird    2
qwe     bird    3
asd     dog     4
asd     dog     5
It seems to me that this could be solved by left joining the table to itself on value1 while replacing the value2 column. Something like:
-- Select the unique value1/value2 combinations (stored as selection_table)
SELECT value1, value2
FROM mytable
WHERE value2 != ''
GROUP BY value1, value2;
-- Left join the two tables
SELECT selection_table.value1, selection_table.value2
FROM selection_table
LEFT JOIN mytable
ON selection_table.value1 = mytable.value1;
Can I somehow do this entire operation in a single query, so that I avoid creating intermediate tables? When doing my left join, can I overwrite the value2 column?
Or is there a smarter way to do this? I am sure there must be one :)

A simple solution applies a windowed aggregate:
SELECT value1,
       MAX(value2) OVER (PARTITION BY value1) AS value2, -- group maximum
       value3
FROM mytable;
This is usually more efficient than a solution based on a self-join.
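To try the windowed aggregate without a database server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the empty value2 cells are stored as NULL and that the bundled SQLite is 3.25 or newer (the first release with window functions); the table and column names are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (value1 TEXT, value2 TEXT, value3 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("qwe", "bird", 1),
        ("qwe", None, 2),   # assumption: missing value2 is NULL
        ("qwe", None, 3),
        ("asd", "dog", 4),
        ("asd", None, 5),
    ],
)

# MAX skips NULLs, so each row receives the one non-NULL value2 of its value1 group.
rows = conn.execute(
    """
    SELECT value1,
           MAX(value2) OVER (PARTITION BY value1) AS value2,
           value3
    FROM mytable
    ORDER BY value3
    """
).fetchall()

for row in rows:
    print(row)
```

This only works as intended when each value1 group has a single distinct non-NULL value2; with several, every row in the group gets the largest one.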

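In standard SQL, you can use lag() with ignore nulls. Because lag() looks only at preceding rows, combine it with coalesce() so that rows that already have a value keep it:
select t.*,
       coalesce(value2,
                lag(value2) ignore nulls over (partition by value1 order by value3)
               ) as value2_filled
from t;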
However, many databases do not implement this functionality. One alternative is two levels of window functions: the inner one finds, for each row, the most recent value3 that has a non-null value2 (grp), and the outer one spreads that row's value2 across the rows sharing the same grp:
select t.*,
       max(value2) over (partition by value1, grp) as value2_filled
from (select t.*,
             max(case when value2 is not null then value3 end)
                 over (partition by value1 order by value3) as grp
      from t
     ) t;
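The two-level approach differs from a plain group maximum when value2 changes partway through a group. The following sketch, again using Python's sqlite3 module (SQLite 3.25+ assumed), runs it on a slightly extended, hypothetical data set in which "qwe" switches from "bird" to "cat"; the extra rows are illustrative and not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (value1 TEXT, value2 TEXT, value3 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("qwe", "bird", 1),
        ("qwe", None, 2),
        ("qwe", "cat", 3),   # hypothetical: value changes inside the group
        ("qwe", None, 4),
        ("asd", "dog", 5),
        ("asd", None, 6),
    ],
)

# Inner query: grp = value3 of the latest row so far that has a non-NULL value2.
# Outer query: spread that row's value2 over all rows sharing (value1, grp).
filled = conn.execute(
    """
    SELECT value1,
           MAX(value2) OVER (PARTITION BY value1, grp) AS value2,
           value3
    FROM (SELECT value1, value2, value3,
                 MAX(CASE WHEN value2 IS NOT NULL THEN value3 END)
                     OVER (PARTITION BY value1 ORDER BY value3) AS grp
          FROM t) AS sub
    ORDER BY value3
    """
).fetchall()

for row in filled:
    print(row)
```

Each NULL is filled from the nearest preceding non-NULL row in the same value1 group, which is the behaviour lag() with ignore nulls would give where it is supported.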

Related

TSQL: Find duplicate values based on two database values

I would like to find duplicate entries based on the same "Article" AND "Warehouse" columns. I can't find an MSSQL query that returns the rows with differing "Value1" and "Value2" for the following table:
Article Value1 Value2 Warehouse
123 123 01.01.2021 1
123 456 02.12.2022 1
123 789 05.05.2024 1
123 123 01.01.2021 2
123 123 01.01.2021 3
456 123 01.01.2021 1
456 123 01.01.2021 1
456 123 01.01.2021 1
The result should be:
Article Value1 Value2 Warehouse
123 123 01.01.2021 1
123 456 02.12.2022 1
123 789 05.05.2024 1
EDIT: The Warehouse and Article are always different. In the result I want to see the rows with the same article and warehouse that have different entries in value1 and value2.
As you can see, article "123" in warehouse "1" has different entries in value1 and value2, so I'd like to get those rows in the result of the SQL query.
But article "456" has the SAME entries in value1 and value2 for warehouse 1, so I don't want those rows in the result.
Thank you very much for your help!

Use COUNT DISTINCT.
select *
from mytable t1
where exists
(
select null
from mytable t2
where t2.article = t1.article and t2.warehouse = t1.warehouse
having count(distinct value1) > 1 or count(distinct value2) > 1
)
order by article, warehouse, value1, value2;
(This would be more readable with an IN clause in my opinion, but SQL Server doesn't allow IN clauses on tuples like WHERE (article, warehouse) IN (...).)

Use exists:
select t.*
from t
where exists (select 1
from t t2
where t2.article = t.article and
t2.warehouse = t.warehouse and
(t2.value1 <> t.value1 or t2.value2 <> t.value2)
);
It is unclear from your question whether both values have to be different or either value. The above implements either value being different.
For performance, I would recommend an index on (article, warehouse, value1, value2).
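As a quick check of the EXISTS approach against the sample data, here is a small sketch using Python's sqlite3 module rather than SQL Server. The table name mytable and the index name are assumptions, and the dates are stored as plain text exactly as written in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (article INTEGER, value1 INTEGER, value2 TEXT, warehouse INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (123, 123, "01.01.2021", 1),
        (123, 456, "02.12.2022", 1),
        (123, 789, "05.05.2024", 1),
        (123, 123, "01.01.2021", 2),
        (123, 123, "01.01.2021", 3),
        (456, 123, "01.01.2021", 1),
        (456, 123, "01.01.2021", 1),
        (456, 123, "01.01.2021", 1),
    ],
)
# The index suggested above; the name idx_article_wh is made up for this sketch.
conn.execute(
    "CREATE INDEX idx_article_wh ON mytable (article, warehouse, value1, value2)"
)

# Keep a row if some other row with the same article and warehouse
# differs from it in value1 or value2.
differing = conn.execute(
    """
    SELECT t.article, t.value1, t.value2, t.warehouse
    FROM mytable t
    WHERE EXISTS (SELECT 1
                  FROM mytable t2
                  WHERE t2.article = t.article
                    AND t2.warehouse = t.warehouse
                    AND (t2.value1 <> t.value1 OR t2.value2 <> t.value2))
    ORDER BY t.article, t.warehouse, t.value1
    """
).fetchall()

for row in differing:
    print(row)
```

Only the three rows for article 123 in warehouse 1 come back; the identical rows for article 456 are filtered out because no row in that group differs from another.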

PostgreSQL 9.5 Insert data on a key with multiple occurrences

I have two tables that represent the same type of data: one is in my DB and the other comes from my client. Both tables have an ID as PRIMARY KEY, but those IDs are not related at all.
There is a field (field1) that is common to the two tables, but it is not always UNIQUE. In most cases each table has the same number of tuples for a given field1 value, but that is not necessarily the case. Here is an example to illustrate the situation:
My table :
id_mytable field1 field2 id_clients
1 aa1 null null
2 aa1 null null
3 aa1 null null
4 aa2 null null
5 aa2 null null
6 aa3 null null
7 aa4 null null
And the client's table:
id_clients field1 field2
9 aa1 value1
10 aa1 value2
11 aa2 value3
12 aa2 value4
13 aa2 value5
14 aa3 value6
15 aa4 value7
And the result I would like to get, sorted by the field1 :
id_mytable field1 field2 id_clients
1 aa1 value1 9
2 aa1 value2 10
3 aa1 null null
4 aa2 value3 11
5 aa2 value4 12
null aa2 value5 13
6 aa3 value6 14
7 aa4 value7 15
Notice the unfilled values in the result table where my table and the client's table differ, and that a new row was inserted. The idea is to be able to fill my table with field2 and id_clients. So far I cannot figure out a way to achieve that; my guess is that, as a relative beginner, I am missing a database concept...
And here is an online sample, with the code proposed by ttallierchio : http://rextester.com/LOLH81061
Thank you very much for your attention.

You want to use a full outer join. You cannot be sure that matching data exists in both tables, so you combine them and take the data from whichever side has it. I would just use row_number() to ensure uniqueness.
SELECT row_number() over(order by mt.field1),
coalesce(mt.field1,c.field1) as field1,
coalesce(mt.field2,c.field2) as field2,
c.id as id_clients
FROM mytable mt
FULL OUTER JOIN clients c
on mt.id = c.id and c.field1 = mt.field1;

You can use a full outer join. The trick is calculating the new id:
select coalesce(mt.id,
                m.maxid + row_number() over (partition by mt.id order by ct.id)
               ) as newid,
       mt.field1, ct.field2, ct.id
from mytable mt full outer join
     clienttable ct
     on mt.id = ct.id and mt.field1 = ct.field1 cross join
     (select max(id) as maxid from mytable) as m;
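Since the question states that the two id columns are unrelated, one possible variation is to pair rows by their position within each field1 group instead of by id. The sketch below is an illustration of that idea, not the method from either answer: it numbers rows per field1 with row_number() and matches on (field1, rn). It runs on Python's sqlite3 module, and because older SQLite builds lack FULL OUTER JOIN, the full join is emulated with two LEFT JOINs and UNION ALL. The table name clients and the alias rn are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id_mytable INTEGER, field1 TEXT)")
conn.execute("CREATE TABLE clients (id_clients INTEGER, field1 TEXT, field2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "aa1"), (2, "aa1"), (3, "aa1"), (4, "aa2"), (5, "aa2"), (6, "aa3"), (7, "aa4")],
)
conn.executemany(
    "INSERT INTO clients VALUES (?, ?, ?)",
    [(9, "aa1", "value1"), (10, "aa1", "value2"), (11, "aa2", "value3"),
     (12, "aa2", "value4"), (13, "aa2", "value5"), (14, "aa3", "value6"),
     (15, "aa4", "value7")],
)

paired = conn.execute(
    """
    WITH m AS (SELECT id_mytable, field1,
                      ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY id_mytable) AS rn
               FROM mytable),
         c AS (SELECT id_clients, field1, field2,
                      ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY id_clients) AS rn
               FROM clients)
    SELECT id_mytable, field1, field2, id_clients
    FROM (
        -- every row of mytable, with its positional partner if one exists
        SELECT m.id_mytable AS id_mytable, m.field1 AS field1,
               c.field2 AS field2, c.id_clients AS id_clients, m.rn AS rn
        FROM m LEFT JOIN c ON c.field1 = m.field1 AND c.rn = m.rn
        UNION ALL
        -- client rows that found no partner in mytable
        SELECT NULL, c.field1, c.field2, c.id_clients, c.rn
        FROM c LEFT JOIN m ON m.field1 = c.field1 AND m.rn = c.rn
        WHERE m.id_mytable IS NULL
    ) AS both_sides
    ORDER BY field1, rn
    """
).fetchall()

for row in paired:
    print(row)
```

On PostgreSQL the same pairing could presumably be written directly as a FULL OUTER JOIN between the two numbered subqueries on field1 and rn.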

Pivot duplicate column names and get all values for columns

EXPLANATION
Imagine that I have two tables: FormFields, which stores the column names to be pivoted as values, and FilledValues, which stores the values users filled in, each with a FormFieldId.
PROBLEM
As you can see in the SAMPLE DATA section below, the FormFields table has duplicate names with different IDs. After joining the tables, I need all values from FilledValues to be assigned to column names, not to IDs.
The OUTPUT sections below show more clearly what I need.
SAMPLE DATA
FormFields
ID Name GroupId
1 col1 1
2 col2 1
3 col3 1
4 col1 2
5 col2 2
6 col3 2
FilledValues
ID Name FormFieldId GroupID
1 a 2 1
2 b 3 1
3 c 1 1
4 d 4 2
5 e 6 2
6 f 5 2
OUTPUT FOR NOW
col1 col2 col3
c a b -- As you can see, it returns only the values for FormFieldId 1, 2 and 3
-- d, e and f are lost because the column names are duplicated under different IDs
DESIRED OUTPUT
col1 col2 col3
c a b
e f d
QUERY
SELECT * FROM
(
SELECT FF.Name AS NamePiv,
FV.Name AS Val1
FROM FormFields FF
JOIN FilledValues FV ON FF.Id = FV.FormFieldId
) x
PIVOT
(
MIN(Val1)
FOR NamePiv IN ([col1],[col2],[col3])
) piv
SQL FIDDLE
How can I produce the OUTPUT with the multiple rows?

Since you are using PIVOT the data is being aggregated so you only return one value for each column being grouped. You don't have any columns in your subquery that are unique and being used in the grouping aspect of PIVOT to return multiple rows. In order to do this you need some value. If you have a column with a unique value for each "group" then you would use that or you can use a windowing function like row_number().
row_number() will create a sequenced number for each FF.Name meaning if you have 2 col1 you will generate a 1 for a row and a 2 for another row. Once this is included in your subquery, you now have a unique value that is used when aggregating your data and you will return multiple rows:
SELECT [col1],[col2],[col3]
FROM
(
SELECT
FF.Name AS NamePiv,
FV.Name AS Val1,
rn = row_number() over(partition by ff.Name order by fv.Id)
FROM FormFields FF
JOIN FilledValues FV ON FF.Id = FV.FormFieldId
) x
PIVOT
(
MIN(Val1)
FOR NamePiv IN ([col1],[col2],[col3])
) piv;
See SQL Fiddle with Demo. The output is:
| col1 | col2 | col3 |
|------|------|------|
| c | a | b |
| e | f | d |

Just adding GroupId in Pivot source query will fix your problem
SELECT * FROM (
SELECT FF.Name AS NamePiv,
FV.Name AS Val1,
ff.groupid
FROM FormFields FF
JOIN FilledValues FV ON FF.Id = FV.FormFieldId
) x
PIVOT
(
MIN(Val1)
FOR NamePiv IN ([col1],[col2],[col3])
) piv
SQLFIDDLE DEMO

I would be inclined to do this with conditional aggregation:
select max(case when formfieldid % 3 = 1 then name end) as col1,
max(case when formfieldid % 3 = 2 then name end) as col2,
max(case when formfieldid % 3 = 0 then name end) as col3
from FilledValues
group by GroupID;
It is unclear what the rule is for assigning a value to a column. This uses the remainder, which works for your input data.
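If the remainder rule feels too tied to the particular ids, a variation on the same conditional aggregation is to join FilledValues to FormFields and test the field name instead. The sketch below tries this on the sample data with Python's sqlite3 module; SQLite has no PIVOT operator, so this is only the conditional-aggregation form, and selecting GroupID in the output is an addition for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FormFields (ID INTEGER, Name TEXT, GroupId INTEGER)")
conn.execute(
    "CREATE TABLE FilledValues (ID INTEGER, Name TEXT, FormFieldId INTEGER, GroupID INTEGER)"
)
conn.executemany(
    "INSERT INTO FormFields VALUES (?, ?, ?)",
    [(1, "col1", 1), (2, "col2", 1), (3, "col3", 1),
     (4, "col1", 2), (5, "col2", 2), (6, "col3", 2)],
)
conn.executemany(
    "INSERT INTO FilledValues VALUES (?, ?, ?, ?)",
    [(1, "a", 2, 1), (2, "b", 3, 1), (3, "c", 1, 1),
     (4, "d", 4, 2), (5, "e", 6, 2), (6, "f", 5, 2)],
)

# One output row per GroupID; each CASE picks the value whose form field
# carries the wanted column name.
pivoted = conn.execute(
    """
    SELECT fv.GroupID,
           MAX(CASE WHEN ff.Name = 'col1' THEN fv.Name END) AS col1,
           MAX(CASE WHEN ff.Name = 'col2' THEN fv.Name END) AS col2,
           MAX(CASE WHEN ff.Name = 'col3' THEN fv.Name END) AS col3
    FROM FilledValues fv
    JOIN FormFields ff ON ff.ID = fv.FormFieldId
    GROUP BY fv.GroupID
    ORDER BY fv.GroupID
    """
).fetchall()

for row in pivoted:
    print(row)
```

With the sample rows exactly as listed in the question, the second group comes out as d, f, e, because FormFieldId 4, 5 and 6 map to col1, col2 and col3 respectively.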

SQL: Update one field from different table

Let's say I have those two tables:
table1
ID value1 value2
1 NULL NULL
2 NULL NULL
3 NULL NULL
table2
ID value3 value4
5 100 400
6 200 500
7 300 600
I need a SQL-statement to get value3 and value4 of table2 ID 7 into value1 and value2 of table1 ID 1.
How do I go about that?
Thanks

If all you need is to update two fields in a single row, you can do it with subqueries, like this:
update table1
set
value1 = (select value3 from table2 where id=7)
, value2 = (select value4 from table2 where id=7)
where id=1
For updating more fields in related rows of two tables, use the UPDATE with JOIN syntax appropriate for your RDBMS.
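The subquery form of the update is portable enough to try in SQLite. Here is a minimal sketch using Python's sqlite3 module with the two sample tables from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER, value1 INTEGER, value2 INTEGER)")
conn.execute("CREATE TABLE table2 (ID INTEGER, value3 INTEGER, value4 INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                 [(1, None, None), (2, None, None), (3, None, None)])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                 [(5, 100, 400), (6, 200, 500), (7, 300, 600)])

# Copy value3/value4 of table2 row 7 into value1/value2 of table1 row 1.
conn.execute(
    """
    UPDATE table1
    SET value1 = (SELECT value3 FROM table2 WHERE ID = 7),
        value2 = (SELECT value4 FROM table2 WHERE ID = 7)
    WHERE ID = 1
    """
)

updated = conn.execute("SELECT ID, value1, value2 FROM table1 ORDER BY ID").fetchall()
for row in updated:
    print(row)
```

Each scalar subquery must return at most one row, which holds here as long as ID identifies a single row in table2.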

SQL query to find False Negatives w.r.t Data Matching or Entity Resolution

Suppose I have two different IDs assigned to the same record. For example:
RecordID | ID1 | ID2
--------------------
1 | X | A
2 | X | B
3 | Y | C
4 | Y | C
5 | Y | D
6 | Z | E
7 | Z | E
Now, I want to get the records where ID1 is assigned the same value whereas ID2 is assigned a different value.
For example, I want to get:
1, X, A
2, X, B
Here ID1 assigned X to both records, whereas ID2 assigned them A and B, two different values.
Is it possible to write a query in SQL or SQL server that will return such records?

You need to use a subquery where, for each row, you look through the table and see whether any other row matches certain criteria related to it.
In SQL, using the columns from the question and assuming the table is named YourTable:
select t1.RecordID,
       t1.ID1,
       t1.ID2
from YourTable t1
where exists (select 1
              from YourTable t2
              where t2.RecordID <> t1.RecordID
                and t2.ID1 = t1.ID1
                and t2.ID2 <> t1.ID2);

Assuming I'm understanding your requirements, I think all you need is an INNER JOIN:
SELECT DISTINCT T.*
FROM YourTable T
JOIN YourTable T2 ON T.ID1 = T2.ID1 AND T.ID2 <> T2.ID2
And here is the SQL Fiddle.
Please note, in this example it returns all rows from X and Y. X because of A and B; Y because of C and D. Is this correct?
Good luck.

I think you are looking for this:
SELECT RecordID, ID1, ID2
FROM yourtable
WHERE ID1 IN (SELECT ID1
FROM yourtable
GROUP BY ID1
HAVING COUNT(DISTINCT ID2)>1);
See fiddle here.

If this is SQL Server 2005+:
WITH minmax AS (
SELECT
*,
minID2 = MIN(ID2) OVER (PARTITION BY ID1),
maxID2 = MAX(ID2) OVER (PARTITION BY ID1)
FROM atable
)
SELECT
RecordID,
ID1,
ID2
FROM minmax
WHERE minID2 <> maxID2
;
In the minmax CTE, two more columns are added which hold minimum and maximum ID2 for every group of rows with the same ID1. The main query returns only those rows where the corresponding minimum ID2 doesn't match the maximum ID2.
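The min/max comparison can be tried outside SQL Server as well. The sketch below runs an equivalent query on Python's sqlite3 module (SQLite 3.25+ assumed for window functions); the column aliases use AS instead of the T-SQL `alias = expression` form, which SQLite does not accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (RecordID INTEGER, ID1 TEXT, ID2 TEXT)")
conn.executemany(
    "INSERT INTO atable VALUES (?, ?, ?)",
    [(1, "X", "A"), (2, "X", "B"), (3, "Y", "C"), (4, "Y", "C"),
     (5, "Y", "D"), (6, "Z", "E"), (7, "Z", "E")],
)

# A group of rows sharing ID1 has conflicting ID2 values exactly when
# its smallest and largest ID2 differ.
conflicts = conn.execute(
    """
    WITH minmax AS (
        SELECT RecordID, ID1, ID2,
               MIN(ID2) OVER (PARTITION BY ID1) AS minID2,
               MAX(ID2) OVER (PARTITION BY ID1) AS maxID2
        FROM atable
    )
    SELECT RecordID, ID1, ID2
    FROM minmax
    WHERE minID2 <> maxID2
    ORDER BY RecordID
    """
).fetchall()

for row in conflicts:
    print(row)
```

On the sample data this returns the Y rows as well as the X rows, since Y is paired with both C and D; only the Z rows, which agree on E, are dropped.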
