Microsoft SQL Server Conditional Joining based on 2 columns - sql

I am looking to join 3 tables that all hold the same data, except that one column has a different name in each (a different year for each of the 3 tables). The three tables look like the following. The goal: if a condition exists in table 1 and/or table 2, determine whether that condition also exists in table 3 for each individual id/condition. I'm currently left joining table 2 to table 1, but I'm aware that this misses any condition that exists in table 2 but not in table 1. Any help with this would be useful.
Table 1
id place Condition_2018
123 ABC flu
456 ABC heart attack
Table 2
id place Condition_2019
123 ABC flu
789 def copd
Table 3
id place Condition_2020
456 ABC heart attack
789 def copd
123 ABC flu
OUTPUT:
Table 2
id place Condition_2018 Condition_2019 Condition_2020
123 ABC flu flu flu
456 ABC heart attack null heart attack
789 def NULL copd copd
Thank you!

How about this (SQL Server syntax)...
SELECT
x.id
, x.place
, x.Condition_2018
, x.Condition_2019
, t3.Condition_2020
FROM (
SELECT
COALESCE(t1.id, t2.id) AS id
, COALESCE(t1.place, t2.place) AS place
, t1.Condition_2018
, t2.Condition_2019
FROM Table1 AS t1
FULL OUTER JOIN Table2 AS t2 ON t1.id = t2.id AND t1.place = t2.place
) AS x LEFT JOIN Table3 AS t3 ON x.id = t3.id AND x.place = t3.place

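If your database supports full join together with the using clause, you can just do the following (SQL Server supports full join but not using, so there you would spell out the on conditions and wrap id and place in coalesce()):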
select
id,
place,
t1.condition_2018,
t2.condition_2019,
t3.condition_2020
from table1 t1
full join table2 t2 using(id, place)
full join table3 t3 using(id, place)
Otherwise, it is a bit more complicated: union all and aggregation is one method:
select
id,
place,
max(condition_2018) condition_2018,
max(condition_2019) condition_2019,
max(condition_2020) condition_2020
from (
select id, place, condition_2018, null condition_2019, null condition_2020 from table1
union all
select id, place, null, condition_2019, null from table2
union all
select id, place, null, null, condition_2020 from table3
) t
group by id, place
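As a rough check of the union all plus aggregation idea, here is a minimal sketch that runs it against the question's sample rows. It assumes an in-memory SQLite database driven from Python purely for illustration (not SQL Server); the function name combine_conditions is hypothetical.

```python
import sqlite3


def combine_conditions():
    """Stack the three tables with UNION ALL, then collapse to one row per id/place."""
    # Assumption: in-memory SQLite stands in for the real database.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (id INTEGER, place TEXT, condition_2018 TEXT);
        CREATE TABLE table2 (id INTEGER, place TEXT, condition_2019 TEXT);
        CREATE TABLE table3 (id INTEGER, place TEXT, condition_2020 TEXT);
        INSERT INTO table1 VALUES (123, 'ABC', 'flu'), (456, 'ABC', 'heart attack');
        INSERT INTO table2 VALUES (123, 'ABC', 'flu'), (789, 'def', 'copd');
        INSERT INTO table3 VALUES (456, 'ABC', 'heart attack'), (789, 'def', 'copd'),
                                  (123, 'ABC', 'flu');
    """)
    rows = conn.execute("""
        SELECT id, place,
               MAX(condition_2018) AS condition_2018,
               MAX(condition_2019) AS condition_2019,
               MAX(condition_2020) AS condition_2020
        FROM (
            SELECT id, place, condition_2018,
                   NULL AS condition_2019, NULL AS condition_2020
            FROM table1
            UNION ALL
            SELECT id, place, NULL, condition_2019, NULL FROM table2
            UNION ALL
            SELECT id, place, NULL, NULL, condition_2020 FROM table3
        ) AS t
        GROUP BY id, place
        ORDER BY id
    """).fetchall()
    conn.close()
    return rows


for row in combine_conditions():
    print(row)
```

MAX() skips NULLs, so each year column keeps the single non-null value for that id/place, or stays NULL when the table had no row for it. Note that this form also returns rows that exist only in table3, which may or may not be what the question wants.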

You seem to want everything in Table3 and matches in the other two tables. That is just left joins:
select t3.id, t3.place,
t1.condition_2018, t2.condition_2019,
t3.condition_2020
from table3 t3 left join
table2 t2
on t3.id = t2.id and t3.place = t2.place left join
table1 t1
on t3.id = t1.id and t3.place = t1.place;

You need a full outer join of table1 and table2 and a left join to table3:
select
coalesce(t1.id, t2.id) id,
coalesce(t1.place, t2.place) place,
t1.Condition_2018,
t2.Condition_2019,
t3.Condition_2020
from table1 t1 full outer join table2 t2
on t2.id = t1.id
left join table3 t3
on t3.id = coalesce(t1.id, t2.id)
See the demo.
Results:
> id | place | Condition_2018 | Condition_2019 | Condition_2020
> --: | :---- | :------------- | :------------- | :-------------
> 123 | ABC | flu | flu | flu
> 456 | ABC | heart attack | null | heart attack
> 789 | def | null | copd | copd

Related

How to group total amount spend based on both ID and name

I have a table like this:
patientId | Units | Amount | PatientName
1234 | 1 | 20 |lisa
1111 | 5 | 10 |john
1234 | 10 | 200 |lisa
345 | 2 | 30 | xyz
I want to get the ID in one column, then the patient name, then the total amount spent by that patient across different items.
Please note that I got the patient name column above by joining 2 tables using ID as the key.
This is the query I use to get this table:
select t1.*,t2.name from table1 as t1 inner join table2 as t2
on t1.id = t2.id
Then, to add up the amounts, I am trying to use a group by clause, but that gives an error.
Please note that I cannot use a temp table here; I need to do this with a subquery. How can I do it?
Are you looking for group by?
select t1.patientid, t2.patientname, sum(t1.amount)
from table1 t1 join
table2 t2
on t1.id = t2.id
group by t1.patientid, t2.patientname;
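To see the group by approach produce one total per patient, here is a small sketch on the question's sample rows. The table names purchases and patients, their column names, and the function name are assumptions made for illustration (the thread never shows the real schema), and SQLite is used only so the example is runnable.

```python
import sqlite3


def total_spend_per_patient():
    """Join the amounts to the names, then sum the amount per patient."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one table with amounts, one with patient names.
    conn.executescript("""
        CREATE TABLE purchases (patient_id INTEGER, units INTEGER, amount INTEGER);
        CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, patient_name TEXT);
        INSERT INTO purchases VALUES (1234, 1, 20), (1111, 5, 10),
                                     (1234, 10, 200), (345, 2, 30);
        INSERT INTO patients VALUES (1234, 'lisa'), (1111, 'john'), (345, 'xyz');
    """)
    rows = conn.execute("""
        SELECT p.patient_id, n.patient_name, SUM(p.amount) AS total_amount
        FROM purchases AS p
        JOIN patients AS n ON n.patient_id = p.patient_id
        GROUP BY p.patient_id, n.patient_name
        ORDER BY p.patient_id
    """).fetchall()
    conn.close()
    return rows


for row in total_spend_per_patient():
    print(row)
```

Every non-aggregated column in the select list also appears in the group by, which is the usual cause of the error when select t1.* is combined with group by.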
select t1.id,
t2.name,
sum(t1.amount) as total_amount
from table1 t1
inner join table2 t2
on t1.id = t2.id
group by t1.id, t2.name
What are table1 and table2 like? What's the error message?

What is the correct way from performance perspective to match(replace) every value in every row in temp table using SQL Server 2016 or 2017?

I am wondering what should I use in SQL Server 2016 or 2017 (CTE, LOOP, JOINS, CURSOR, REPLACE, etc) to match (replace) every value in every row in temp table? What is the best solution from performance perspective?
Source Table
|id |id2|
| 1 | 2 |
| 2 | 1 |
| 1 | 1 |
| 2 | 2 |
Mapping Table
|id |newid|
| 1 | 3 |
| 2 | 4 |
Expected result
|id |id2|
| 3 | 4 |
| 4 | 3 |
| 3 | 3 |
| 4 | 4 |
You may join the second table to the first table twice:
WITH cte AS (
SELECT
t1.id AS id_old,
t1.id2 AS id2_old,
t2a.newid AS id_new,
t2b.newid AS id2_new
FROM table1 t1
LEFT JOIN table2 t2a
ON t1.id = t2a.id
LEFT JOIN table2 t2b
ON t1.id2 = t2b.id
)
UPDATE cte
SET
id_old = id_new,
id2_old = id2_new;
Not sure if you want just a select here, or maybe an update, or an insert into another table. In any case, the core logic I gave above should work for all these cases.
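The double join can be tried out as a plain select on the question's sample data. This is only a sketch: it assumes SQLite in place of SQL Server, and the names source_table, mapping_table, and remap_ids are hypothetical stand-ins for the real objects.

```python
import sqlite3


def remap_ids():
    """Look up both id columns in the mapping table by joining it twice."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source_table (id INTEGER, id2 INTEGER);
        CREATE TABLE mapping_table (id INTEGER PRIMARY KEY, newid INTEGER);
        INSERT INTO source_table VALUES (1, 2), (2, 1), (1, 1), (2, 2);
        INSERT INTO mapping_table VALUES (1, 3), (2, 4);
    """)
    rows = conn.execute("""
        SELECT m1.newid AS id, m2.newid AS id2
        FROM source_table AS s
        LEFT JOIN mapping_table AS m1 ON m1.id = s.id    -- maps the first column
        LEFT JOIN mapping_table AS m2 ON m2.id = s.id2   -- maps the second column
        ORDER BY s.rowid                                 -- keep insertion order
    """).fetchall()
    conn.close()
    return rows


print(remap_ids())
```

With left joins, a source value that has no entry in the mapping table comes back as NULL; switching to inner joins would drop such rows instead.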
You'd need to apply joins on update query. Something like this:
Update tblA set column1 = 'something', column2 = 'something'
from actualName tblA
inner join MappingTable tblB
on tblA.ID = tblB.ID
This query compares each row by id and, where the ids match, updates/replaces the column values as you desire. :)
Just join the mapping table twice
SELECT t1.newid as id, t2.newid as id2
FROM table1 t
INNER JOIN table2 t1 on t1.id = t.id
INNER JOIN table2 t2 on t2.id = t.id2
This may perform best of the solutions posted here, provided you have appropriate indexes:
select (select [newid] from MappingTable where id = [ST].[id]) [id],
(select [newid] from MappingTable where id = [ST].[id2]) [id2]
from SourceTable [ST]

Identify duplicates based on multiple columns

I want to identify duplicates in a db based on multiple columns from various tables. In the example below, 1 & 5 and 2 & 4 are duplicates, as all four columns have the same values. How do I identify such records using SQL? I have used group by ... having count > 1 when I had to identify duplicates based on a single column, but I am unsure how to identify them based on multiple columns. When I do group by ... having count > 1 on all 4 columns, #3 and #6 show up, but they are technically not duplicates per my requirement.
T1
ID | Col1 | Col2
---| --- | ---
1 | A | US
2 | B | FR
3 | C | AU
4 | B | FR
5 | A | US
6 | D | UK
T2
ID | Col1
---| ---
1 | Apple
1 | Kiwi
2 | Pear
3 | Banana
3 | Banana
4 | Pear
5 | Apple
T3
ID | Col1
---| ---
1 | Spinach
1 | Beets
2 | Celery
3 | Radish
4 | Celery
5 | Spinach
6 | Celery
6 | Celery
My expected result would be:
1 A US Apple Spinach
5 A US Apple Spinach
2 B FR Pear Celery
4 B FR Pear Celery
For your sample data, you can get the desired result by inner joining all three tables and filtering with a sub-query in the where clause that uses group by tA.Col1 having count(tA.Col1) > 1, as below.
SELECT t1.ID,
t1.Col1,
t1.Col2,
t2.Col1,
t3.Col1
FROM table1 t1
JOIN table2 t2 ON t1.ID = t2.ID
JOIN table3 t3 ON t1.ID = t3.ID
WHERE t1.Col1 IN
( SELECT tA.Col1
FROM table1 tA
GROUP BY tA.Col1
HAVING count(tA.Col1)>1)
ORDER BY t1.ID;
Result
ID Col1 Col2 Col1 Col1
-----------------------------------
1 A US Apple Spinach
2 B FR Pear Celery
4 B FR Pear Celery
5 A US Apple Spinach
You can check the demo here
Hope this will help.
The problem is your result set needs to include the ID column which is unique. So a straightforward GROUP BY ... HAVING won't cut it. This would work.
with cte as
( select t1.id
, t1.col1 as t1_col1
, t1.col2 as t1_col2
, t2.col1 as t2_col1
, t3.col1 as t3_col1
from t1
join t2 on t1.id = t2.id
join t3 on t1.id = t3.id
)
select cte.*
from cte
where (t1_col1, t1_col2, t2_col1, t3_col1) in
( select t1_col1, t1_col2, t2_col1, t3_col1
from cte
group by t1_col1, t1_col2, t2_col1, t3_col1 having count(*) > 1)
/
The use of the sub-query factoring syntax is optional, but I find it useful to signal that the subquery is used more than once in the query.
"I have encountered another scenario in the data, some of the IDs have same values in T2 and T3 and they are showing up as dups."
The duplicated IDs in the child tables cause Cartesian products in the joined subquery, which causes false positives in the main result set. Ideally you should be able to handle this by introducing additional filters on those tables to remove the unwanted rows. However, if the data quality is so poor that there are no valid rules you will have to fall back on distinct:
with cte as (
select t1.id
, t1.col1 as t1_col1
, t1.col2 as t1_col2
, t2.col1 as t2_col1
, t3.col1 as t3_col1
from t1
join ( select distinct id, col1 from t2) t2 on t1.id = t2.id
join ( select distinct id, col1 from t3) t3 on t1.id = t3.id
) ...
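A runnable sketch of this idea on the question's sample data follows. It is a variant, not the exact query above: it assumes SQLite rather than Oracle, and it joins back to the grouped subquery instead of using the tuple IN form. The function name find_duplicates is hypothetical.

```python
import sqlite3


def find_duplicates():
    """Return the rows whose four compared columns occur for more than one id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t1 (id INTEGER, col1 TEXT, col2 TEXT);
        CREATE TABLE t2 (id INTEGER, col1 TEXT);
        CREATE TABLE t3 (id INTEGER, col1 TEXT);
        INSERT INTO t1 VALUES (1,'A','US'), (2,'B','FR'), (3,'C','AU'),
                              (4,'B','FR'), (5,'A','US'), (6,'D','UK');
        INSERT INTO t2 VALUES (1,'Apple'), (1,'Kiwi'), (2,'Pear'), (3,'Banana'),
                              (3,'Banana'), (4,'Pear'), (5,'Apple');
        INSERT INTO t3 VALUES (1,'Spinach'), (1,'Beets'), (2,'Celery'), (3,'Radish'),
                              (4,'Celery'), (5,'Spinach'), (6,'Celery'), (6,'Celery');
    """)
    rows = conn.execute("""
        WITH joined AS (
            SELECT t1.id,
                   t1.col1 AS t1_col1, t1.col2 AS t1_col2,
                   t2.col1 AS t2_col1, t3.col1 AS t3_col1
            FROM t1
            -- DISTINCT removes repeated child rows so one id cannot match itself
            JOIN (SELECT DISTINCT id, col1 FROM t2) AS t2 ON t1.id = t2.id
            JOIN (SELECT DISTINCT id, col1 FROM t3) AS t3 ON t1.id = t3.id
        )
        SELECT j.id, j.t1_col1, j.t1_col2, j.t2_col1, j.t3_col1
        FROM joined AS j
        JOIN (
            SELECT t1_col1, t1_col2, t2_col1, t3_col1
            FROM joined
            GROUP BY t1_col1, t1_col2, t2_col1, t3_col1
            HAVING COUNT(*) > 1
        ) AS d
          ON  j.t1_col1 = d.t1_col1 AND j.t1_col2 = d.t1_col2
          AND j.t2_col1 = d.t2_col1 AND j.t3_col1 = d.t3_col1
        ORDER BY j.t1_col1, j.id
    """).fetchall()
    conn.close()
    return rows


for row in find_duplicates():
    print(row)
```

Without the two DISTINCT subqueries, id 3 (two identical Banana rows in T2) would count twice for the same column combination and be reported as a duplicate of itself.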
You can add all the columns on which you want to find duplicates to the group by clause, and then put the count condition in the having clause:
select t1.id,t1.col1,t2.col2,t2.col3,t3.col4 from t1 join t2 on t1.id=t2.id join t3 on t3.id=t1.id where (t1.col1,t2.col2,t2.col3,t3.col4) in (
select t1.col1,t2.col2,t2.col3,t3.col4
from t1 join t2 on t1.id=t2.id join t3 on t3.id=t1.id
group by t1.col1,t2.col2,t2.col3,t3.col4
having count(*) >1 )

how can i do the following query with Oracle SQL?

------------------
| **table 1** |
------------------
| 1 | 400 |
| 2 | 220 |
| 3 | 123 |
------------------
| **table 2** |
------------------
| 1 | 100 |
formula : table1 - table2 where table1.id=table2.id
------------------
| **Result** |
------------------
| 1 | 300 |
| 2 | 220 |
| 3 | 123 |
You want an outer join to get all rows from table_1 and the matching ones from table_2:
select t1.id, t1.val - coalesce(t2.val, 0) as result
from table_1 t1
left join table_2 t2 on t1.id = t2.id;
The coalesce(t2.val, 0) is necessary because the outer join returns null for rows where no matching id exists in table_2, and t1.val - null would yield null.
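The effect of coalesce on the unmatched rows can be checked with a short sketch using the question's numbers. It assumes SQLite instead of Oracle, plus the column names id and val taken from the query above; the function name subtract_matching is hypothetical.

```python
import sqlite3


def subtract_matching():
    """Subtract table_2.val from table_1.val, treating a missing match as 0."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_1 (id INTEGER, val INTEGER);
        CREATE TABLE table_2 (id INTEGER, val INTEGER);
        INSERT INTO table_1 VALUES (1, 400), (2, 220), (3, 123);
        INSERT INTO table_2 VALUES (1, 100);
    """)
    rows = conn.execute("""
        SELECT t1.id,
               t1.val - COALESCE(t2.val, 0) AS result,  -- NULL-safe subtraction
               t1.val - t2.val AS without_coalesce      -- NULL when unmatched
        FROM table_1 AS t1
        LEFT JOIN table_2 AS t2 ON t1.id = t2.id
        ORDER BY t1.id
    """).fetchall()
    conn.close()
    return rows


for row in subtract_matching():
    print(row)
```

The third column is included only to show the difference: for ids 2 and 3 it is NULL, while the coalesce version keeps the original value.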
select t1.id,
nvl2(t2.val,t1.val-t2.val,t1.val) val
from t1,t2
where t1.id=t2.id(+)
order by t1.id;
Try this
select t1.col1, t1.col2 - nvl(t2.col2, 0) as balance from
table1 t1 left join table2 t2 on t1.col1=t2.col1
I don't know the syntax in Oracle SQL, but I can give the solution in MySQL.
Consider the table with 2 columns:
id , value
SELECT table1.id, table1.value - table2.value
FROM table1, table2
WHERE table1.id=table2.id
UNION ALL
SELECT table1.id, table1.value
FROM table1
WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE table2.id = table1.id)
In some cases, using scalar subquery caching could give better performance. It is up to the developer to compare execution plans and decide which query is the most appropriate.
with t1 (id, num) as
(
select 1, 400 from dual union all
select 2, 220 from dual union all
select 3, 123 from dual
),
t2(id, num) as
(
select 1, 100 from dual
)
select id,
num - nvl((select num from t2 where t2.id = t1.id), 0) result
from t1;
This is just to show you a different technique for solving problems in which you try to get data from several tables, but some may not have matching rows.
Using outer join in this case is in my opinion more logical.

Joining two sql tables with a one to many relationship, but want the max of the second table

I am trying to join two tables: one holds unique features, and the second holds readings taken on several dates that relate to those features. I want all of the records in the first table plus the most recent reading. I was able to get the results I was looking for before adding the shape field, using this code:
SELECT
Table1.Name, Table1.ID, Table1.Shape,
Max(Table2.DATE) as Date
FROM
Table1
LEFT OUTER JOIN
Table2 ON Table1.ID = table2.ID
GROUP BY
Table1.Name, Table1.ID, Table1.Shape
The shape field is a geometry type and I get the error
'The type "Geometry" is not comparable. It can not be use in the Group By Clause'
So I need to go about it a different way, but not sure how.
Below is a sample of the two tables and the desired results.
Table1
Name| ID |Shape
AA1 | 1 | X
BA2 | 2 | Y
CA1 | 3 | Z
CA2 | 4 | Q
Table2
ID | Date
1 | 5/27/2013
1 | 6/27/2014
2 | 5/27/2013
2 | 6/27/2014
3 | 5/27/2013
3 | 6/27/2014
My Desired Result is
Name| ID |Shape |Date
AA1 | 1 | X | 6/27/2014
BA2 | 2 | Y | 6/27/2014
CA1 | 3 | Z | 6/27/2014
CA2 | 4 | Q | Null
You can do the aggregation on Table2 in a CTE, finding the MAX(DATE) for each ID, and then join that result to Table1:
WITH AggregatedTable2(ID, MaxDate) AS
(
SELECT
ID, MAX(DATE)
FROM
Table2
GROUP BY
ID
)
SELECT
t1.ID, t1.Name, t1.Shape, t2.MaxDate
FROM
Table1 t1
LEFT JOIN
AggregatedTable2 t2 ON t1.ID = t2.ID
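Here is a minimal sketch of the aggregate-first approach on the question's sample data. It assumes SQLite rather than SQL Server, so Shape is a plain text placeholder instead of a geometry value and dates are stored as ISO strings; the column name reading_date and the function name latest_readings are hypothetical.

```python
import sqlite3


def latest_readings():
    """Aggregate Table2 per ID first, then left join the result to Table1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Assumption: Shape is text here; in the question it is a geometry column.
        CREATE TABLE Table1 (Name TEXT, ID INTEGER, Shape TEXT);
        CREATE TABLE Table2 (ID INTEGER, reading_date TEXT);
        INSERT INTO Table1 VALUES ('AA1', 1, 'X'), ('BA2', 2, 'Y'),
                                  ('CA1', 3, 'Z'), ('CA2', 4, 'Q');
        INSERT INTO Table2 VALUES (1, '2013-05-27'), (1, '2014-06-27'),
                                  (2, '2013-05-27'), (2, '2014-06-27'),
                                  (3, '2013-05-27'), (3, '2014-06-27');
    """)
    rows = conn.execute("""
        WITH AggregatedTable2 (ID, MaxDate) AS (
            SELECT ID, MAX(reading_date)
            FROM Table2
            GROUP BY ID
        )
        SELECT t1.Name, t1.ID, t1.Shape, t2.MaxDate
        FROM Table1 AS t1
        LEFT JOIN AggregatedTable2 AS t2 ON t1.ID = t2.ID
        ORDER BY t1.ID
    """).fetchall()
    conn.close()
    return rows


for row in latest_readings():
    print(row)
```

Because the grouping happens only on Table2, the Shape column never appears in a group by, which is what sidesteps the "not comparable" error for the geometry type.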
Try casting geometry as a varchar.
Select Table1.Name, Table1.ID, cast(Table1.Shape as varchar(1)) AS Shape, Max(Table2.DATE) as Date
FROM Table1 LEFT OUTER JOIN
Table2 ON Table1.ID = table2.ID
Group By Table1.Name, Table1.ID, cast(Table1.Shape as varchar(1))
Try this:
-- A correlated subquery avoids putting the geometry column in GROUP BY
SELECT t1.Name
, t1.ID
, t1.Shape
, (SELECT MAX(t2.Date)
FROM Table2 AS t2
WHERE t2.ID = t1.ID) AS Date
FROM Table1 AS t1