SQL Server: Full Outer Join, On vs. Where - sql-server-2005

I've searched for this and can't find an answer. I apologize in advance, as I'm sure the answer is out there.
I'm working with a SQL Server 2005 DB, and I'm aware that the query below doesn't represent a normalized DB, since the numPlacements field is in both the detail and rollup table. I didn't create the DB.
The below SQL gives the expected result when the where clause is used. The expected result is all rows where a matching value is missing from either table, or the two values don't match.
However, if I comment out the WHERE clause and uncomment the final AND in the ON clause, it returns over 200k rows instead of the expected 120.
SELECT CASE WHEN A.ID is NULL THEN B.ID ELSE A.ID END,
A.numPlacements AS 'AnumPlacements',
B.numPlacements AS 'bnumPlacements',
B.numPlacements - A.numPlacements as 'Variance'
FROM (SELECT ID,
Sum(numPlacements) AS 'numPlacements'
FROM PlacementDetailLevel
GROUP BY ID) A
FULL OUTER JOIN (SELECT ID,
Sum(numPlacements) AS 'numPlacements'
FROM PlacementRollupLevel
GROUP BY ID) B
ON A.ID = B.ID
--AND B.numPlacements <> A.numPlacements
WHERE A.numPlacements <> B.numPlacements or A.numPlacements is null or B.numPlacements is null
Any ideas as to why?
More detail below based on ypercube's suggestion:
I created TableA and TableB. They look like this:
TableA
ID numPlacements
1 10
2 20
3 30
4 40
TableB
ID numPlacements
2 20
3 31
4 40
5 50
Note the differences: TableA has no #5, TableB has no #1, and #3 has a different numPlacements in each table.
SELECT CASE WHEN A.ID is NULL THEN B.ID ELSE A.ID END AS 'ID',
A.numPlacements AS 'AnumPlacements',
B.numPlacements AS 'BnumPlacements',
B.numPlacements - A.numPlacements as 'Variance'
FROM (SELECT ID,
Sum(numPlacements) AS 'numPlacements'
FROM TableA
GROUP BY ID) A
FULL OUTER JOIN (SELECT ID,
Sum(numPlacements) AS 'numPlacements'
FROM TableB
GROUP BY ID) B
ON A.ID = B.ID
The above produces exactly what I'd expect:
ID AnumPlacements BnumPlacements Variance
1 10 NULL NULL
2 20 20 0
3 30 31 1
4 40 40 0
5 NULL 50 NULL
Let's try adding the WHERE clause.
SELECT CASE WHEN A.ID is NULL THEN B.ID ELSE A.ID END AS 'ID',
A.numPlacements AS 'AnumPlacements',
B.numPlacements AS 'BnumPlacements',
B.numPlacements - A.numPlacements as 'Variance'
FROM (SELECT ID,
Sum(numPlacements) AS 'numPlacements'
FROM TableA
GROUP BY ID) A
FULL OUTER JOIN (SELECT ID,
Sum(numPlacements) AS 'numPlacements'
FROM TableB
GROUP BY ID) B
ON A.ID = B.ID
WHERE A.numPlacements <> B.numPlacements or A.numPlacements is null or B.numPlacements is null
With the where, we get the three expected rows:
ID AnumPlacements BnumPlacements Variance
1 10 NULL NULL
3 30 31 1
5 NULL 50 NULL
Let's try adding the AND.
SELECT CASE WHEN A.ID is NULL THEN B.ID ELSE A.ID END AS 'ID',
A.numPlacements AS 'AnumPlacements',
B.numPlacements AS 'BnumPlacements',
B.numPlacements - A.numPlacements as 'Variance'
FROM (SELECT ID,
Sum(numPlacements) AS 'numPlacements'
FROM TableA
GROUP BY ID) A
FULL OUTER JOIN (SELECT ID,
Sum(numPlacements) AS 'numPlacements'
FROM TableB
GROUP BY ID) B
ON A.ID = B.ID
AND B.numPlacements <> A.numPlacements
Now, if we try it with the above AND in the join, I would expect to get only row #3.
Instead, I get this:
ID AnumPlacements BnumPlacements Variance
1 10 NULL NULL
2 NULL 20 NULL
2 20 NULL NULL
3 30 31 1
4 NULL 40 NULL
4 40 NULL NULL
5 NULL 50 NULL
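One way to read the result above: in a FULL OUTER JOIN, the ON clause only decides which row pairs count as a match. Any row on either side that finds no match is still returned, padded with NULLs, whereas a WHERE clause filters after the join has produced its rows. The Python sketch below models that behavior for TableA and TableB. The function name full_outer_join is hypothetical, and this is a model of the join semantics, not a description of how SQL Server executes the query.

```python
# Sample data from TableA and TableB above: ID -> numPlacements.
table_a = {1: 10, 2: 20, 3: 30, 4: 40}
table_b = {2: 20, 3: 31, 4: 40, 5: 50}


def full_outer_join(a, b, on):
    """Model of FULL OUTER JOIN: matched pairs first, then NULL-padded leftovers."""
    rows = []
    matched_a, matched_b = set(), set()
    for a_id, a_num in a.items():
        for b_id, b_num in b.items():
            if on(a_id, a_num, b_id, b_num):
                rows.append((a_id, a_num, b_num))
                matched_a.add(a_id)
                matched_b.add(b_id)
    # Rows that matched nothing are preserved; None stands in for NULL.
    rows += [(i, n, None) for i, n in a.items() if i not in matched_a]
    rows += [(i, None, n) for i, n in b.items() if i not in matched_b]
    return sorted(rows, key=lambda r: (r[0], r[1] is not None))


# Variant 1: the inequality lives in the ON clause.
on_rows = full_outer_join(
    table_a, table_b,
    on=lambda a_id, a_num, b_id, b_num: a_id == b_id and a_num != b_num,
)

# Variant 2: join on ID only, then filter the joined rows (the WHERE clause).
joined = full_outer_join(
    table_a, table_b,
    on=lambda a_id, a_num, b_id, b_num: a_id == b_id,
)
where_rows = [
    (i, a_num, b_num) for i, a_num, b_num in joined
    if a_num is None or b_num is None or a_num != b_num
]

print(on_rows)     # seven rows: IDs 2 and 4 fail the ON test, so each side is NULL-padded
print(where_rows)  # three rows: IDs 1, 3 and 5
```

Under this model, IDs 2 and 4 appear twice in the ON variant because their rows no longer match each other, so each one is kept as an unmatched row. Getting only row #3 would call for an INNER JOIN with the inequality, or a WHERE filter without the IS NULL branches.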

Related

Create column that combines two columns by if/else statement in SQL

In SQL, I am trying to create a column id_main2 (after a right join) that equals the value of column id_main (coming from a) when it is not NULL, and the value of id (coming from b) when id_main is NULL.
Below is the join code followed by the desired output. How can I create this id_main2 column?
SELECT * FROM a
RIGHT JOIN b on a.id = b.id;
id_main id boy id girl id_main2
10 1 Alex 1 Alice 10
11 2 Bruce 2 Brunet 11
NULL NULL NULL 5 Emma 5
NULL NULL NULL 6 Fabia 6
I think you just want coalesce():
select a.*, b.*,
coalesce(a.id_main, b.id) as id_main2
from b
left join a on a.id = b.id;
I strongly prefer left join to right join, so I rearranged the tables in the from clause.
You can either use coalesce()
select
coalesce(a.id_main, b.id) as id_main2
from a
right join b on a.id = b.id;
or case when
select
case when a.id_main is not null then a.id_main
else b.id end as id_main2
from a
right join b on a.id = b.id;
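As a quick check of the COALESCE approach, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database. The column names boy and girl are taken from the desired output; the column types are assumptions. It uses the LEFT JOIN form so that it also runs on SQLite builds that predate RIGHT JOIN support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id_main INTEGER, id INTEGER, boy TEXT);
    CREATE TABLE b (id INTEGER, girl TEXT);
    INSERT INTO a VALUES (10, 1, 'Alex'), (11, 2, 'Bruce');
    INSERT INTO b VALUES (1, 'Alice'), (2, 'Brunet'), (5, 'Emma'), (6, 'Fabia');
""")

# Every row of b is kept; id_main2 falls back to b.id when a has no match.
rows = conn.execute("""
    SELECT b.id, b.girl, COALESCE(a.id_main, b.id) AS id_main2
    FROM b
    LEFT JOIN a ON a.id = b.id
    ORDER BY b.id
""").fetchall()

for row in rows:
    print(row)
```

The last column comes out as 10, 11, 5, 6, which matches the id_main2 column in the desired output.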

SQL: find duplicates in one column or another column

I'm having trouble writing a SQL query for the following requirement:
I have a table consisting of the columns: id, date(dd/mm/yyyy), phone and email. id is unique for each row in the table.
I need to find duplicate records by finding duplicates in the phone OR email columns, based on the date column.
That is, identify whether the email or phone in a record already exists on an earlier date. If so, mark it as a duplicate.
You could probably do something like this:
select a.id, a.date, a.phone, a.email,
case when b.phone is not null or c.email is not null then 'Duplicate' else 'Unique' end as flag
from table a
left join table b on (a.phone = b.phone and a.date > b.date)
left join table c on (a.email = c.email and a.date > c.date)
If you have dupes in the dataset across phone, email and date, this may return multiple rows, so you may need to use a subselect in the join.
For example
left join (select distinct phone, date from table) b on (a.phone = b.phone and a.date > b.date)
(Original answer above.)
Having thought about it some more: you'll get duplicate rows from the join whenever there are several previous instances of the phone or email.
This should work better:
select a.id, a.date, a.phone, a.email,
case when a.phone is null and a.email is null then null
when sum(case when b.phone is not null or c.email is not null then 1 else 0 end) > 0 then 'Duplicate' else 'Unique' end as flag
from table a
left join table b on (a.phone = b.phone and a.date > b.date)
left join table c on (a.email = c.email and a.date > c.date)
group by a.id, a.date, a.phone, a.email
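The grouped version can be tried out with a small sketch like the one below. The table name contacts, the column name seen_on, and the sample rows are all hypothetical, and the dates are stored as ISO strings (yyyy-mm-dd) on the assumption that they must compare in chronological order; dd/mm/yyyy text would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table; seen_on uses ISO dates so string comparison orders correctly.
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, seen_on TEXT, phone TEXT, email TEXT);
    INSERT INTO contacts VALUES
        (1, '2020-01-01', '111', 'a@x.com'),
        (2, '2020-01-02', '222', 'a@x.com'),
        (3, '2020-01-03', '111', 'c@x.com'),
        (4, '2020-01-04', '444', 'd@x.com'),
        (5, '2020-01-05', '111', 'a@x.com'),
        (6, '2020-01-06', NULL,  NULL);
""")

# Row 5 matches two earlier phones and two earlier emails, so the joins yield
# four rows for it; GROUP BY collapses them back to one flagged row.
query = """
    SELECT a.id,
           CASE WHEN a.phone IS NULL AND a.email IS NULL THEN NULL
                WHEN SUM(CASE WHEN b.phone IS NOT NULL OR c.email IS NOT NULL
                              THEN 1 ELSE 0 END) > 0 THEN 'Duplicate'
                ELSE 'Unique' END AS flag
    FROM contacts a
    LEFT JOIN contacts b ON a.phone = b.phone AND a.seen_on > b.seen_on
    LEFT JOIN contacts c ON a.email = c.email AND a.seen_on > c.seen_on
    GROUP BY a.id, a.phone, a.email
    ORDER BY a.id
"""
flags = dict(conn.execute(query).fetchall())
print(flags)
```

On this sample data, rows 2, 3 and 5 are flagged as duplicates, rows 1 and 4 as unique, and row 6 (no phone and no email) gets a NULL flag.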

SQL select query two rows combining

I have two tables called A and B. I wrote a SQL query to retrieve data from them, as follows:
SELECT A.No ,
A.ItemCode,
case when B.Year = 1 then SUM(B.Sale_Amount) end as "1Year",
case when B.Year = 2 then SUM(B.Sale_Amount) end as "2Year",
case when B.Year = 3 then SUM(B.Sale_Amount) end as "3Year"
FROM A
LEFT OUTER JOIN B ON B.ID = A.CODE
GROUP BY
A."No",
A."ItemCode",
B."Year"
This gives the following output:
But I need these three rows combined into two rows, as follows:
Here is the correct answer
SELECT A.No ,
A.ItemCode,
SUM(case when B.Year = 1 then B.Sale_Amount end) as [1Year],
SUM(case when B.Year = 2 then B.Sale_Amount end) as [2Year],
SUM(case when B.Year = 3 then B.Sale_Amount end) as [3Year]
FROM A
LEFT OUTER JOIN B ON B.ID = A.CODE
GROUP BY A.No,A.ItemCode
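The difference between the two queries is where the aggregate sits: wrapping SUM around the CASE lets the query group by item only, so each year lands in its own column of a single row. A minimal sketch, with made-up sample rows and assumed column types, run through SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample rows are invented for illustration; column types are assumptions.
    CREATE TABLE A ("No" INTEGER, ItemCode TEXT, CODE TEXT);
    CREATE TABLE B (ID TEXT, Year INTEGER, Sale_Amount INTEGER);
    INSERT INTO A VALUES (1, 'X', 'c1'), (2, 'Y', 'c2');
    INSERT INTO B VALUES ('c1', 1, 100), ('c1', 1, 50), ('c1', 2, 70), ('c2', 3, 30);
""")

# SUM(CASE ...) ignores the NULLs produced for non-matching years, so each
# item collapses to one row with one column per year.
query = '''
    SELECT A."No", A.ItemCode,
           SUM(CASE WHEN B.Year = 1 THEN B.Sale_Amount END) AS "1Year",
           SUM(CASE WHEN B.Year = 2 THEN B.Sale_Amount END) AS "2Year",
           SUM(CASE WHEN B.Year = 3 THEN B.Sale_Amount END) AS "3Year"
    FROM A
    LEFT OUTER JOIN B ON B.ID = A.CODE
    GROUP BY A."No", A.ItemCode
    ORDER BY A."No"
'''
pivot = conn.execute(query).fetchall()
for row in pivot:
    print(row)
```

Each item produces exactly one row here; a year with no sales comes back as NULL (None in Python) rather than as a separate row.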

JOIN ON clause with CASE statement depending on if field is NULL?

I'm trying to do a simple LEFT JOIN with tables with 2 IDs - basically an ID and Sub-ID. Each row has an ID, but not necessarily a Sub-ID. When a Sub-ID exists, I want to join based on that, if not join on the ID. I'd imagine something like
SELECT ...
FROM tablename a
LEFT JOIN tablename b
ON CASE WHEN SUB_ID IS NOT NULL THEN
a.SUB_ID = b.SUB_ID
ELSE
a.ID = b.ID END
AND
a.otherfield = b.otherfield
But I couldn't get anything like this to work, so instead I had to do 2 queries with a UNION (one that joined on SUB_ID WHERE SUB_ID IS NOT NULL and another that joined on ID WHERE SUB_ID IS NULL.) It worked but I can't imagine there isn't a way to do it. If it helps, my ID and SUB_ID values look like this:
ID SUB_ID
10000 NULL
10001 NULL
10001 10001-3
10001 10001-5
10014 NULL
Any suggestions on how to achieve this without doing a UNION? Thanks in advance!!
We can use COALESCE for this purpose:
SELECT ...
FROM tablename a
LEFT JOIN tablename b
ON COALESCE(a.SUB_ID,a.ID) = COALESCE(b.SUB_ID,b.ID)
COALESCE returns the first non-NULL argument, reading from the left.
Here is the code at SQL Fiddle
This should work for you.
SELECT ...
FROM tablename a
LEFT JOIN tablename b
ON ((b.SUB_ID IS NOT NULL AND a.SUB_ID = b.SUB_ID) OR
(a.ID = b.ID))
AND a.otherfield = b.otherfield
Interesting.
SELECT ...
FROM tablename a
LEFT JOIN tablename b
ON (
a.SUB_ID = b.SUB_ID
OR (a.SUB_ID IS NULL AND b.SUB_ID IS NULL AND a.ID = b.ID)
)
AND a.otherfield = b.otherfield
That might well work. It's NOT going to be fast, though.
Depending on the state of your data and what you want to achieve, you might want to change the join clause to
ON (
a.SUB_ID = b.SUB_ID
OR (a.SUB_ID IS NULL AND a.ID = b.ID)
OR (b.SUB_ID IS NULL AND a.ID = b.ID)
)
AND a.otherfield = b.otherfield
... instead.
You could JOIN the table twice:
SELECT ...
FROM tablename a
LEFT JOIN tablename b
ON a.SUB_ID = b.SUB_ID
AND a.otherfield = b.otherfield
LEFT JOIN tablename c
ON a.ID = c.ID
AND a.SUB_ID IS NULL
AND a.otherfield = c.otherfield
Then use ISNULL to get the columns, e.g.
ColumnName = ISNULL(b.ColumnName, c.ColumnName)
It depends on your indexes, but I suspect this may get optimised better than having a conditional join clause.
Try something like this:
SELECT *
FROM TABLENAME A
LEFT JOIN TABLENAME B
ON (A.SUB_ID IS NOT NULL AND A.SUB_ID = B.SUB_ID)
OR (A.SUB_ID IS NULL AND A.ID = B.ID)
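The conditional ON clauses above are not all equivalent, and the difference shows up when a row without a SUB_ID shares its ID with rows that have one. The sketch below compares two of them in SQLite. The table names left_t and right_t, the tag column, and the rows are hypothetical, loosely based on the ID and SUB_ID sample in the question, and the otherfield condition is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables standing in for the two sides of the join.
    CREATE TABLE left_t  (ID INTEGER, SUB_ID TEXT);
    CREATE TABLE right_t (ID INTEGER, SUB_ID TEXT, tag TEXT);
    INSERT INTO left_t VALUES (10000, NULL), (10001, NULL), (10001, '10001-3');
    INSERT INTO right_t VALUES
        (10000, NULL, 'r1'), (10001, NULL, 'r2'),
        (10001, '10001-3', 'r3'), (10001, '10001-5', 'r4');
""")


def run(on_clause):
    """Run the LEFT JOIN with the given ON condition and return all rows."""
    sql = f"""
        SELECT a.ID, a.SUB_ID, b.tag
        FROM left_t a
        LEFT JOIN right_t b ON {on_clause}
        ORDER BY a.ID, a.SUB_ID, b.tag
    """
    return conn.execute(sql).fetchall()


# Fall back to ID whenever the left row has no SUB_ID.
left_null_only = run(
    "(a.SUB_ID IS NOT NULL AND a.SUB_ID = b.SUB_ID) "
    "OR (a.SUB_ID IS NULL AND a.ID = b.ID)"
)

# Fall back to ID only when both sides have no SUB_ID.
both_null = run(
    "a.SUB_ID = b.SUB_ID "
    "OR (a.SUB_ID IS NULL AND b.SUB_ID IS NULL AND a.ID = b.ID)"
)

print(left_null_only)
print(both_null)
```

With the first condition, the row (10001, NULL) pairs with every right-hand row for ID 10001, including those that have a SUB_ID. With the second, it pairs only with the right-hand row whose SUB_ID is also NULL. Which behavior is wanted depends on the data, as one of the answers above notes.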

SQL "All" Functionality?

This is probably a really easy question, it's just very hard to google a word like "All".
SELECT a.Id, Max(b.Number)
FROM Table1 a
JOIN Table2 b
ON a.FK = b.Id
GROUP BY a.Id
But I want to add a where clause that specifies that all b.Id's linked to an a.FK must have values. So basically I don't want to select the a.Id grouping of b.Id's where any of those b.Id's are null. Hope I made that clear, let me know if I need to elaborate. Thanks.
Edit - For some clarification (Changed the query above as well):
Table1
Id FK
1 1
1 2
2 3
3 4
3 5
3 6
Table2
Id Number
1 1
2 NULL
3 10
4 20
5 30
6 40
I would want my query to show:
a.Id Max Number
2 10
3 40
(Notice that a.Id = 1 doesn't show up because one of the b.Number fields is null)
select t1.Id, max(Number) as [Max Number]
from Table1 t1
left join Table2 t2 ON t1.FK=t2.Id and t2.Number is not null
group by t1.Id
having count(distinct t1.FK) = count(distinct t2.Id)
Okay, you are asking a totally different question from the one I thought you were. I'm replacing my answer.
The way I would handle this is to join a to b twice -- once to get all matching rows in b, and a second join to search for rows in b where Number is null. If no such row exists, then we know they're all non-null.
SELECT a.Id, Max(b1.Number)
FROM Table1 a
JOIN Table2 b1 ON a.FK = b1.Id
LEFT OUTER JOIN Table2 b2 ON a.FK = b2.Id AND b2.Number IS NULL
WHERE b2.Id IS NULL
GROUP BY a.Id
b2.Id will be null only if no row is found where b2.Number is null.
SELECT a.Id, Max(b.Id)
FROM Table1 a
JOIN Table2 b
ON a.FK = b.Id
WHERE b.Id is NOT NULL
GROUP BY a.Id
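To compare the approaches against the expected output, the sketch below loads the question's Table1 and Table2 into SQLite and runs two of the queries above: the HAVING version and the version that joins Table2 twice. The column types are assumptions; the data is the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER, FK INTEGER);
    CREATE TABLE Table2 (Id INTEGER, Number INTEGER);
    INSERT INTO Table1 VALUES (1, 1), (1, 2), (2, 3), (3, 4), (3, 5), (3, 6);
    INSERT INTO Table2 VALUES (1, 1), (2, NULL), (3, 10), (4, 20), (5, 30), (6, 40);
""")

# Group-level test: keep a group only if every FK found a non-NULL Number.
having_rows = conn.execute("""
    SELECT t1.Id, MAX(t2.Number)
    FROM Table1 t1
    LEFT JOIN Table2 t2 ON t1.FK = t2.Id AND t2.Number IS NOT NULL
    GROUP BY t1.Id
    HAVING COUNT(DISTINCT t1.FK) = COUNT(DISTINCT t2.Id)
    ORDER BY t1.Id
""").fetchall()

# Row-level test: drops only the joined rows whose own Number is NULL.
anti_join_rows = conn.execute("""
    SELECT a.Id, MAX(b1.Number)
    FROM Table1 a
    JOIN Table2 b1 ON a.FK = b1.Id
    LEFT OUTER JOIN Table2 b2 ON a.FK = b2.Id AND b2.Number IS NULL
    WHERE b2.Id IS NULL
    GROUP BY a.Id
    ORDER BY a.Id
""").fetchall()

print(having_rows)
print(anti_join_rows)
```

On this data the HAVING query returns only Ids 2 and 3, as the question asks. The double-join query filters row by row, so Id 1 still appears (with a maximum of 1) because its row for FK = 1 survives the WHERE clause. Excluding the whole group needs a group-level condition such as the HAVING comparison.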