LEFT OUTER JOINs not acting as expected - sql

QUERY #1
SELECT
dbo.CLIENT.CLIENT_ID, dbo.CLIENT.GOC, dbo.SALES_UW_REGION.SALES_UNDERWRITING
FROM dbo.CLIENT LEFT OUTER JOIN
dbo.SALES_UW_REGION ON dbo.CLIENT.GOC = dbo.SALES_UW_REGION.GOC
WHERE (dbo.CLIENT.CLIENT_ID = 23721)
CLIENT_ID, GOC, SALES_UNDERWRITING
23721 332 Underwriting
23721 332 Sales
I can understand why the next query returns only one row: despite the LEFT OUTER JOIN, which ensures both CLIENT records are returned even if they are unmatched, the filter is applied AFTER the join, so the result set has only one row.
Query #2
SELECT
dbo.CLIENT.CLIENT_ID, dbo.CLIENT.GOC, dbo.SALES_UW_REGION.SALES_UNDERWRITING
FROM dbo.CLIENT LEFT OUTER JOIN
dbo.SALES_UW_REGION ON dbo.CLIENT.GOC = dbo.SALES_UW_REGION.GOC
WHERE (dbo.CLIENT.CLIENT_ID = 23721)
and SALES_UW_REGION.SALES_UNDERWRITING = 'Sales '
CLIENT_ID GOC SALES_UNDERWRITING
23721 332 Sales
However, when I moved the 'Sales' filter into the JOIN clause, I was surprised to see that still only one row is returned.
Query #3
SELECT
dbo.CLIENT.CLIENT_ID, dbo.CLIENT.GOC, dbo.SALES_UW_REGION.SALES_UNDERWRITING
FROM dbo.CLIENT LEFT OUTER JOIN
dbo.SALES_UW_REGION ON dbo.CLIENT.GOC = dbo.SALES_UW_REGION.GOC
and SALES_UW_REGION.SALES_UNDERWRITING = 'Sales '
WHERE (dbo.CLIENT.CLIENT_ID = 23721)
CLIENT_ID GOC SALES_UNDERWRITING
23721 332 Sales
I expected that, since the filter was part of the JOIN clause and the join was a LEFT OUTER, I would get 2 rows. In general, if a JOIN condition that involves an equality match on two columns, one from the LEFT table and one from the RIGHT table, is not met, a LEFT OUTER JOIN still returns the LEFT row with NULL values for the columns selected from the RIGHT. Why should it be any different if we are matching a value from the RIGHT table to a literal? Shouldn't the row from the LEFT table still be returned?
Man, I thought this was basic stuff that I had down....
Here's what you need to recreate what I did:
CREATE TABLE [dbo].[CLIENT](
[CLIENT_ID] [bigint] NOT NULL,
[GOC] [char](3) NULL
)
go
CREATE TABLE [dbo].[SALES_UW_REGION](
[GOC] [char](3) NOT NULL,
[SALES_UNDERWRITING] [varchar](12) NULL
)
go
INSERT INTO [dbo].[CLIENT]([CLIENT_ID], [GOC])
SELECT 23721, N'332'
go
INSERT INTO [dbo].[SALES_UW_REGION]([GOC], [SALES_UNDERWRITING])
SELECT N'332', N'Underwriting' UNION ALL
SELECT N'332', N'Sales'
go

You have one row in your CLIENT table for CLIENT_ID = 23721.
You're left joining the other table, which has multiple rows that match CLIENT_ID = 23721 (through GOC, using the join criteria in your first query), but when the join criteria are altered, one of those rows from the RIGHT table is excluded.
You're getting all records from your LEFT table regardless of whether they join to records in the RIGHT table, just as expected.

I think your confusion involves a misunderstanding of the first result set:
CLIENT_ID, GOC, SALES_UNDERWRITING
23721 332 Underwriting
23721 332 Sales
Though there are two rows in this result set, the results represent data from only one row in the Client table. The join condition allowed the single row from the Client table to match two rows in the Sales_UW_Region table, and so the data for that row from the Client table is duplicated in the result set. There is only one Client record here in the first place, evidenced by the fact that there is only one Client_ID, but the data for the record is shown twice: once for each matching record in Sales_UW_Region.
Later, when you include the and SALES_UW_REGION.SALES_UNDERWRITING = 'Sales ' condition as part of the join's ON clause, the original single record in the Client table only matches one record from the Sales_UW_Region table. The data for the row no longer needs to be duplicated, and so only one row is returned.
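The duplication described above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is self-contained), with the table and column names from the question and the filter value written without a trailing space:

```python
import sqlite3

# In-memory SQLite stand-in for the SQL Server tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CLIENT (CLIENT_ID INTEGER NOT NULL, GOC TEXT);
    CREATE TABLE SALES_UW_REGION (GOC TEXT NOT NULL, SALES_UNDERWRITING TEXT);
    INSERT INTO CLIENT VALUES (23721, '332');
    INSERT INTO SALES_UW_REGION VALUES ('332', 'Underwriting');
    INSERT INTO SALES_UW_REGION VALUES ('332', 'Sales');
""")

# There is exactly one CLIENT row to begin with.
client_count = conn.execute("SELECT COUNT(*) FROM CLIENT").fetchone()[0]

# Join on GOC only: the single client row matches two region rows,
# so its data appears twice in the result.
plain_join = conn.execute("""
    SELECT c.CLIENT_ID, c.GOC, r.SALES_UNDERWRITING
    FROM CLIENT AS c
    LEFT OUTER JOIN SALES_UW_REGION AS r ON c.GOC = r.GOC
    WHERE c.CLIENT_ID = 23721
    ORDER BY r.SALES_UNDERWRITING
""").fetchall()

# Extra condition in ON: the client row now matches only one region row.
filtered_join = conn.execute("""
    SELECT c.CLIENT_ID, c.GOC, r.SALES_UNDERWRITING
    FROM CLIENT AS c
    LEFT OUTER JOIN SALES_UW_REGION AS r
      ON c.GOC = r.GOC AND r.SALES_UNDERWRITING = 'Sales'
    WHERE c.CLIENT_ID = 23721
""").fetchall()

print(client_count)   # number of rows in CLIENT
print(plain_join)     # one client row repeated per matching region row
print(filtered_join)  # a single match, so a single row
```

The row count of the result follows the number of matches on the right side, not the number of rows in the left table.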

Your understanding is correct. If the filter is within the JOIN's ON clause, it will not remove rows from the first table.
I think you need to look elsewhere for your problem... Do you really mean to have a trailing space in your 'Sales ' constant?
Here's a SQL Fiddle to test this: http://sqlfiddle.com/#!2/bfe32/3/0
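To see the difference between filtering in ON and filtering in WHERE, the filter has to match no rows at all. The sketch below assumes a filter value of 'Marketing', which is hypothetical and chosen only because it is absent from the sample data, and it runs on SQLite through Python's sqlite3 module rather than on SQL Server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CLIENT (CLIENT_ID INTEGER NOT NULL, GOC TEXT);
    CREATE TABLE SALES_UW_REGION (GOC TEXT NOT NULL, SALES_UNDERWRITING TEXT);
    INSERT INTO CLIENT VALUES (23721, '332');
    INSERT INTO SALES_UW_REGION VALUES ('332', 'Underwriting');
    INSERT INTO SALES_UW_REGION VALUES ('332', 'Sales');
""")

# Filter inside ON: no region row qualifies, but the LEFT row survives
# and the right-hand column comes back as NULL (None in Python).
filter_in_on = conn.execute("""
    SELECT c.CLIENT_ID, c.GOC, r.SALES_UNDERWRITING
    FROM CLIENT AS c
    LEFT OUTER JOIN SALES_UW_REGION AS r
      ON c.GOC = r.GOC AND r.SALES_UNDERWRITING = 'Marketing'
    WHERE c.CLIENT_ID = 23721
""").fetchall()

# Same filter in WHERE: it runs after the join and discards every row.
filter_in_where = conn.execute("""
    SELECT c.CLIENT_ID, c.GOC, r.SALES_UNDERWRITING
    FROM CLIENT AS c
    LEFT OUTER JOIN SALES_UW_REGION AS r ON c.GOC = r.GOC
    WHERE c.CLIENT_ID = 23721
      AND r.SALES_UNDERWRITING = 'Marketing'
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

With the question's data the 'Sales' filter does find a match, which is why queries 2 and 3 look identical there.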

Related

Query with Left outer join and group by returning duplicates

To begin with, I have a table in my db that is fed with SalesForce info. When I run this example query it returns 2 rows:
select * from SalesForce_INT_Account__c where ID_SAP_BAYER__c = '3783513'
When I run this next query on the same table I obtain one of the rows, which is what I need:
SELECT MAX(ID_SAP_BAYER__c) FROM SalesForce_INT_Account__c where ID_SAP_BAYER__c = '3783513' GROUP BY ID_SAP_BAYER__c
Now, I have another table (PedidosEspecialesZarateCabeceras) which has a field (NroClienteDireccionEntrega) that I can match with the field I've been using in the SalesForce table (ID_SAP_BAYER__c). This table has a key that consists of just 1 field (NroPedido).
What I need to do is join these 2 tables to obtain a row from PedidosEspecialesZarateCabeceras with additional fields coming from the SalesForce table. If those additional fields are not available, they should come back as NULL values, so I'm using a LEFT OUTER JOIN.
The problem is that, since I have to match NroClienteDireccionEntrega and ID_SAP_BAYER__c and there are 2 rows in the SalesForce table with the same ID_SAP_BAYER__c, my query returns 2 duplicate rows from PedidosEspecialesZarateCabeceras (they both have the same NroPedido).
This is an example query that returns duplicates:
SELECT
cab.CUIT AS CUIT,
convert(nvarchar(4000), cab.NroPedido) AS NroPedido,
sales.BillingCity__c as Localidad,
sales.BillingState__c as IdProvincia,
sales.BillingState__c_Desc as Provincia,
sales.BillingStreet__c as Calle,
sales.Billing_Department__c as Distrito,
sales.Name as RazonSocial,
cab.NroCliente as ClienteId
FROM PedidosEspecialesZarateCabeceras AS cab WITH (NOLOCK)
LEFT OUTER JOIN
SalesForce_INT_Account__c AS sales WITH (NOLOCK) ON
cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
and sales.ID_SAP_BAYER__c in
( SELECT MAX(ID_SAP_BAYER__c)
FROM SalesForce_INT_Account__c
GROUP BY ID_SAP_BAYER__c
)
WHERE cab.NroPedido ='5320'
Even though the join has MAX and GROUP BY, this returns 2 duplicate rows with different SalesForce information (because of the 2 SalesForce rows with the same ID_SAP_BAYER__c), which I thought should not be possible.
What I need is for the left outer join in my query to pick only ONE of the SalesForce rows, to prevent the duplication that is happening right now. For some reason the SELECT MAX with the GROUP BY is not working.
Maybe I should try to join these tables in a different way. Can anyone give me some other ideas on how to join the two tables to return just 1 row? It doesn't matter if the SalesForce row that gets picked out of the 2 isn't the correct one; I just need it to pick one of them.
Your IN clause is not actually doing anything, since...
SELECT MAX(ID_SAP_BAYER__c)
FROM SalesForce_INT_Account__c
GROUP BY ID_SAP_BAYER__c
... returns all possible ID_SAP_BAYER__c values. (The GROUP BY says you want to return one row per unique ID_SAP_BAYER__c and then, since your MAX is operating on exactly one unique value per group, you simply return that value.)
You will want to change your query to operate on a value that is actually different between the two rows you are trying to differentiate (probably the MAX(ID) for the relevant ID_SAP_BAYER__c). Plus, you will want to link that inner query to your outer query.
You could probably do something like:
...
LEFT OUTER JOIN
SalesForce_INT_Account__c sales
ON cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
and sales.ID in
(
SELECT MAX(ID)
FROM SalesForce_INT_Account__c sales2
WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega
)
WHERE cab.NroPedido ='5320'
By using sales.ID in ... SELECT MAX(ID) ... instead of sales.ID_SAP_BAYER__c in ... SELECT MAX(ID_SAP_BAYER__c) ... this ensures you only match one of the two rows for that ID_SAP_BAYER__c. The WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega condition links the inner query to the outer query.
There are multiple ways of doing the above, especially if you don't care which of the relevant rows you match on. You can use the above as a starting point and make it match your preferred style.
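A minimal sketch of the difference between the two subqueries, run on SQLite through Python's sqlite3 module. The ID column and the sample rows (including order 5321 with no SalesForce match) are assumptions for illustration; only the table names and the join columns come from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PedidosEspecialesZarateCabeceras (
        NroPedido TEXT PRIMARY KEY,
        NroClienteDireccionEntrega TEXT
    );
    -- ID is a hypothetical column that differs between the two rows.
    CREATE TABLE SalesForce_INT_Account__c (
        ID INTEGER PRIMARY KEY,
        ID_SAP_BAYER__c TEXT,
        Name TEXT
    );
    INSERT INTO PedidosEspecialesZarateCabeceras VALUES ('5320', '3783513');
    INSERT INTO PedidosEspecialesZarateCabeceras VALUES ('5321', '999');
    INSERT INTO SalesForce_INT_Account__c VALUES (1, '3783513', 'Account A');
    INSERT INTO SalesForce_INT_Account__c VALUES (2, '3783513', 'Account B');
""")

# Original approach: the IN list holds every ID_SAP_BAYER__c value,
# so it filters nothing and both SalesForce rows still match.
uncorrelated = conn.execute("""
    SELECT cab.NroPedido, sales.ID, sales.Name
    FROM PedidosEspecialesZarateCabeceras AS cab
    LEFT OUTER JOIN SalesForce_INT_Account__c AS sales
      ON cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
     AND sales.ID_SAP_BAYER__c IN (
           SELECT MAX(ID_SAP_BAYER__c)
           FROM SalesForce_INT_Account__c
           GROUP BY ID_SAP_BAYER__c)
    WHERE cab.NroPedido = '5320'
    ORDER BY sales.ID
""").fetchall()

# Suggested approach: a correlated MAX(ID) keeps one row per client,
# and unmatched orders still come back with NULLs.
correlated = conn.execute("""
    SELECT cab.NroPedido, sales.ID, sales.Name
    FROM PedidosEspecialesZarateCabeceras AS cab
    LEFT OUTER JOIN SalesForce_INT_Account__c AS sales
      ON cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
     AND sales.ID IN (
           SELECT MAX(s2.ID)
           FROM SalesForce_INT_Account__c AS s2
           WHERE s2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega)
    ORDER BY cab.NroPedido
""").fetchall()

print(uncorrelated)
print(correlated)
```

Any column that is unique per SalesForce row would serve in place of the assumed ID.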
An alternative might be to use OUTER APPLY with TOP 1. Something like:
SELECT
...
FROM PedidosEspecialesZarateCabeceras AS cab
OUTER APPLY(
SELECT TOP 1 *
FROM SalesForce_INT_Account__c s1
WHERE cab.NroClienteDireccionEntrega = s1.ID_SAP_BAYER__c
) sales
WHERE cab.NroPedido ='5320'
Without an ORDER BY the match that TOP 1 chooses will be arbitrary, but I think that's what you want anyway. (If not, you could add an ORDER BY).

Explanation of code for right excluding join?

I just found a great page with Venn diagrams of different joins and the code for executing them:
http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
I used the "Right Excluding Join" from that page, which is illustrated there with a Venn diagram, in my query. Here is the code:
SELECT subjects.subject
FROM sold_subjects
RIGHT JOIN subjects
ON sold_subjects.subject = subjects.subject
WHERE sold_subjects.subject IS NULL
I am asking for an explanation of what this code actually does, particularly what happens in the last row. I understand that we are joining the two relations where they have the same subject, but what happens when we set subjects for one of the relations to NULL in the last row?
First, what do JOIN and RIGHT JOIN do?
The JOIN gets information from two tables and joins them according to rules you specify in the ON or WHERE clauses.
The JOIN modifiers, such as LEFT, INNER, OUTER and RIGHT, control the behavior your JOIN will have in case of unmatched records, that is, when no record in A matches a record in B according to the specified rules, and vice versa.
To understand this part, take table A as being the left table and table B as being the right one. When you have multiple joins, the right table in each join is the one whose name is immediately right of the JOIN command.
e.g. FROM a1 LEFT JOIN ... LEFT JOIN b
The b table is the right one and whatever comes before is the left one.
This is a summary of the modifiers' behavior:
LEFT: preserves unmatched records in the left table, discards those in the right table;
RIGHT: preserves unmatched records in the right table, discards those in the left table;
INNER: preserves only the records that are matched, discards unmatched from both tables;
FULL (or FULL OUTER): preserves all records from both tables, regardless of matches.
What is visually happening?
Imagine you have two simple tables with the same names as the ones in your query.
sold_subjects    subjects
subject          subject
1                1
2                4
3                5
4                6
When you RIGHT JOIN two tables, you create a third one that looks like this:
joined_table
sold_subjects.subject    subjects.subject
1                        1
4                        4
NULL                     5
NULL                     6
Please note that the subjects 2 and 3 are already gone in this subset.
When you add a WHERE clause with sold_subjects.subject IS NULL, you are only keeping the last two lines, where there was no match in sold_subjects.
The right join makes sure that you will keep all the records of the right table. If there is no match with the left table, then all the variables in the result originating from the left table will be null (because there is no match).
The where clause checks whether the value of lefttable.subject is null or not. If it's not null, then obviously the join succeeded. If it is null, then the join did not work, leaving this value blank. So this where clause will, per definition, return all the records of the right table that have no match in the left table, which is exactly what the venn diagram says!
This is a very common practice in SQL, and there are many use cases. For example: the left table is sales, the right table is customers, and you want to know all the customers without sales.
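The sample tables above can be checked with a short sketch on SQLite through Python's sqlite3 module. One assumption to note: it writes the query as a LEFT JOIN with the tables swapped instead of a RIGHT JOIN, because SQLite versions before 3.39 do not support RIGHT JOIN; the result is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sold_subjects (subject INTEGER NOT NULL);
    CREATE TABLE subjects (subject INTEGER NOT NULL);
    INSERT INTO sold_subjects VALUES (1), (2), (3), (4);
    INSERT INTO subjects VALUES (1), (4), (5), (6);
""")

# Every row of subjects is kept; sold_subjects columns are NULL
# (None in Python) where no match exists.
joined = conn.execute("""
    SELECT sold_subjects.subject, subjects.subject
    FROM subjects
    LEFT JOIN sold_subjects
      ON sold_subjects.subject = subjects.subject
    ORDER BY subjects.subject
""").fetchall()

# Keeping only the NULL side leaves the subjects that were never sold.
unsold = conn.execute("""
    SELECT subjects.subject
    FROM subjects
    LEFT JOIN sold_subjects
      ON sold_subjects.subject = subjects.subject
    WHERE sold_subjects.subject IS NULL
    ORDER BY subjects.subject
""").fetchall()

print(joined)
print(unsold)
```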
RIGHT JOIN is shorthand for RIGHT OUTER JOIN.
Consider the excellent explanation in the fine manual:
LEFT OUTER JOIN returns all rows in the qualified Cartesian product
(i.e., all combined rows that pass its join condition), plus one copy
of each row in the left-hand table for which there was no right-hand
row that passed the join condition. This left-hand row is extended to
the full width of the joined table by inserting null values for the
right-hand columns. Note that only the JOIN clause's own condition is
considered while deciding which rows have matches. Outer conditions
are applied afterwards.
Conversely, RIGHT OUTER JOIN returns all the joined rows, plus one row
for each unmatched right-hand row (extended with nulls on the left).
This is just a notational convenience, since you could convert it to a
LEFT OUTER JOIN by switching the left and right tables.
Bold emphasis mine. Your query is just one way to select rows that are not present in another table, with a shiny buzz word attached ("Right Excluding JOIN"). There are others:
Select rows which are not present in other table
Now, for the tricky part - or where you deviate from the original:
But what happens when we set subjects for one of the relations to NULL in the last row?
Your query has:
WHERE sold_subjects.subject IS NULL
Where the original says:
WHERE A.Key IS NULL
Key is supposed to imply NOT NULL. The query simply does not work if either of the underlying table columns sold_subjects.subject or subjects.subject can be NULL. There would be no way to disambiguate how the row qualified:
subjects.subject IS NULL and no row with NULL in sold_subjects.subject
subjects.subject IS NULL and some row with NULL in sold_subjects.subject
subjects.subject IS NOT NULL but no matching row in sold_subjects
If one of the linking columns can be NULL, and you want to treat NULL values like they were actual values (which they are not), i.e. match NULL to NULL, you could substitute with an anti-join using the NULL-safe operator IS NOT DISTINCT FROM:
SELECT s.subject
FROM subjects s
LEFT JOIN sold_subjects ss ON ss.subject IS NOT DISTINCT FROM s.subject
WHERE ss.subject IS NULL;
Also with shorter syntax, using the more commonly used LEFT JOIN, but otherwise identical. IS NOT DISTINCT FROM is often slower than a simple =, only use it where you need it. Typically, you join tables on key columns that are defined NOT NULL - implicitly (a PK column is NOT NULL automatically) or explicitly.

When joining 2 tables one table comes up null

I am joining 2 tables. From the first table I get all the relevant data, but from the second table I only get NULLs. There are no NULLs in either table. Can anyone tell me why this is happening?
select * from apmast
left join apitem
on apmast.fvendno + apmast.fccompany = apitem.fcinvkey
There is a problem with your ON clause that's resulting in you not getting matching records. A LEFT JOIN means that you should get all data from the left table and only the matching records from the right table, or else NULL where there are no matching records. The key to the join, however, is the ON clause. Make sure that apmast.fvendno + apmast.fccompany is actually equal to apitem.fcinvkey.
Here is an explanation of the types of joins, just in case you get stuck in the future.
INNER JOIN: gets only the rows that match in both the table in the FROM clause and the joined table.
LEFT OUTER JOIN: gets all the rows from the table specified in the FROM clause and only the rows that match in the joined table.
RIGHT OUTER JOIN: gets all the rows from the table specified in the JOIN clause and only the rows that match in the FROM clause.
FULL OUTER JOIN: gets all the rows from both tables.
SELF JOIN: used when you need to join a table back to itself to return data.
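One way to check whether the concatenated key really equals fcinvkey is to list the left rows that found no match and print the key they computed. The sketch below runs on SQLite through Python's sqlite3 module, so it uses || for concatenation where the question's dialect uses +. The sample values, and the idea that trailing padding is the cause, are assumptions for illustration only; the real cause in the question is unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE apmast (fvendno TEXT, fccompany TEXT);
    CREATE TABLE apitem (fcinvkey TEXT);
    -- Hypothetical data: fvendno carries two trailing spaces.
    INSERT INTO apmast VALUES ('V001  ', 'ACME');
    INSERT INTO apitem VALUES ('V001ACME');
""")

# Show the computed key, wrapped in brackets so stray spaces are visible,
# for every apmast row that matched nothing in apitem.
unmatched_keys = conn.execute("""
    SELECT '[' || m.fvendno || m.fccompany || ']' AS computed_key
    FROM apmast AS m
    LEFT JOIN apitem AS i
      ON m.fvendno || m.fccompany = i.fcinvkey
    WHERE i.fcinvkey IS NULL
""").fetchall()

# Trimming both parts before concatenating lets the row match.
trimmed_match = conn.execute("""
    SELECT i.fcinvkey
    FROM apmast AS m
    LEFT JOIN apitem AS i
      ON RTRIM(m.fvendno) || RTRIM(m.fccompany) = i.fcinvkey
""").fetchall()

print(unmatched_keys)
print(trimmed_match)
```

If the printed keys look identical to the values in fcinvkey, the mismatch is more likely a type or collation difference than padding.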

Inner join between two tables with same count values

I have been working on this issue for 2 days now.
I have two tables created by using SQL SELECT statements:
SELECT (
) Target
INNER JOIN
SELECT (
) Source
ON Join condition 1
AND Join condition 2
AND Join condition 3
AND Join condition 4
AND Join condition 5
The target table has count value of 10,000 records.
The source table has count value of 10,000 records.
but when I do an inner join between the two tables on the 5 join conditions
I get 9573 records.
I am basically trying to find a one to one match between source and target table. I feel every field from target matches every field in source.
Questions:
Why does my inner join give fewer records even though both tables have the same number of records?
If it is expected, how can I make sure I get the exact 10,000 records after the join condition?
1) An INNER JOIN only outputs rows where the joining columns match in both tables. So in your case, some rows may have no counterpart in the other table that satisfies, say, join condition 1, and those rows are filtered out.
2) As the other poster mentioned, a left join is one way. You need to decide which table, source or target, you want to use as your master, i.e. the one to start from and return all rows of. You then left join the remaining table on your conditions to add its columns wherever the join conditions match.
It's probably better if you give us the tables you are working on and the query and results you are trying to achieve.
There are some really good articles about the different joins out there. But it looks like you'd be interested in left joins. With a left join, if a record exists in Target but not in Source, it will not be dropped.
So, it would be:
SELECT(...) Target
LEFT OUTER JOIN
SELECT(...) Source
ON cond1 and cond2 and cond3 and cond4 and cond5
Give that a shot and let me know how it goes!
Sometimes you need to rely on logical analysis rather than feelings. Use this query to find the rows whose fields do not match, and then work out your next steps:
SELECT
Target.Col1,Source.Col1,
Target.Col2,Source.Col2,
Target.Col3,Source.Col3
FROM
(
) Target
FULL OUTER JOIN
(
) Source
ON Target.Col1=Source.Col1
AND Target.Col2=Source.Col2
AND Target.Col3=Source.Col3
WHERE (
Target.Col1 IS NULL
OR Source.Col1 IS NULL
OR Target.Col2 IS NULL
OR Source.Col2 IS NULL
OR Target.Col3 IS NULL
OR Source.Col3 IS NULL
)
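A small sketch of why equal row counts do not guarantee an equal join count, run on SQLite through Python's sqlite3 module. The Target and Source contents are made-up sample data, and the two LEFT JOINs stand in for the unmatched sides of a FULL OUTER JOIN, which older SQLite versions do not support:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Target (Col1 INTEGER, Col2 TEXT);
    CREATE TABLE Source (Col1 INTEGER, Col2 TEXT);
    INSERT INTO Target VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO Source VALUES (1, 'a'), (2, 'b'), (3, 'x'), (5, 'e');
""")

# Both tables hold 4 rows, but only 2 pairs satisfy every join condition.
inner_count = conn.execute("""
    SELECT COUNT(*)
    FROM Target AS t
    INNER JOIN Source AS s
      ON t.Col1 = s.Col1 AND t.Col2 = s.Col2
""").fetchone()[0]

# Rows in Target with no partner in Source.
target_only = conn.execute("""
    SELECT t.Col1, t.Col2
    FROM Target AS t
    LEFT JOIN Source AS s
      ON t.Col1 = s.Col1 AND t.Col2 = s.Col2
    WHERE s.Col1 IS NULL
    ORDER BY t.Col1
""").fetchall()

# Rows in Source with no partner in Target.
source_only = conn.execute("""
    SELECT s.Col1, s.Col2
    FROM Source AS s
    LEFT JOIN Target AS t
      ON t.Col1 = s.Col1 AND t.Col2 = s.Col2
    WHERE t.Col1 IS NULL
    ORDER BY s.Col1
""").fetchall()

print(inner_count)
print(target_only)
print(source_only)
```

Comparing the two unmatched lists side by side usually shows which join condition is failing, as with the row whose Col1 is 3 here.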

Contradiction Between Multiple Left Joins

I am trying to understand the following query which is automatically produced by some software library:
SELECT DISTINCT `t`.* FROM `teacher` AS `t`
LEFT JOIN `rel` AS `rel_profile`
ON `rel_profile`.`field_id` = 2319 AND `rel_profile`.`item_id` = `t`.`id`
LEFT JOIN `teacher_info` AS `profile`
ON `profile`.`id` = `rel_profile`.`related_item_id`
LEFT JOIN `rel` AS `rel_profile_city`
ON `rel_profile_city`.`field_id` = 2320 AND `rel_profile_city`.`item_id` = `profile`.`id`
WHERE `rel_profile_city`.`item_id` = 1
There are three left joins. I understand the first and second one. What I don't understand is the third left join:
LEFT JOIN `rel` AS `rel_profile_city`
ON `rel_profile_city`.`field_id` = 2320 AND `rel_profile_city`.`item_id` = `profile`.`id` WHERE `rel_profile_city`.`item_id` = 1
The table rel has already been used in the first left join:
LEFT JOIN `rel` AS `rel_profile`
ON `rel_profile`.`field_id` = 2319
Now, the same table is left joined again but this time the value of the joined field is different:
LEFT JOIN `rel` AS `rel_profile_city`
ON `rel_profile_city`.`field_id` = 2320
How come that these two joins do not contradict?
The query is using aliases:
`rel` AS `rel_profile`
Says to pretend that the table rel is actually a table called rel_profile. That alias is then used throughout the rest of the query. I'm not sure of MySQL, but on some other database systems, it's an error to refer to the table as rel from then onwards(*) (unless there's another join that re-introduces the table and doesn't provide an alias).
And joining to the same table multiple times is allowed - provided that the names (or aliases) are unique. This is useful when you're trying to construct a result that relies on the content of multiple rows from the same table, where the result should occupy a single row.
(*) "Then onwards" being in the order in which the clauses are processed, not the text order. E.g. you should use the alias in the SELECT clause because, even though it occurs earlier textually, it's (conceptually) processed after the FROM clause.
This query will show teacher rows that have associated rows in rel with field_id = 2319 OR field_id = 2320
They are not "contradicting" each other. Imagine you have a table of users, which holds the demographic and personal data of your users, and another table with the "relations" between users. In this "relations" table, you have columns UserId1 and UserId2. If you want a query that returns the data of both users, you'll need to do two JOINs with the Users table, one for each user column. This doesn't mean that they are contradicting each other.
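The users and relations example can be sketched as follows, using SQLite through Python's sqlite3 module. The table layout and the sample names are hypothetical; the point is only that each alias acts as an independent copy of the same table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE relations (user_id1 INTEGER, user_id2 INTEGER);
    INSERT INTO users VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO relations VALUES (1, 2), (3, 1);
""")

# users is joined twice under two aliases: u1 resolves user_id1 and
# u2 resolves user_id2, so each result row draws on two users rows.
pairs = conn.execute("""
    SELECT u1.name, u2.name
    FROM relations AS r
    JOIN users AS u1 ON u1.id = r.user_id1
    JOIN users AS u2 ON u2.id = r.user_id2
    ORDER BY r.user_id1
""").fetchall()

print(pairs)
```

The two joins to rel in the generated query work the same way: rel_profile and rel_profile_city each pick their own rows, so the conditions field_id = 2319 and field_id = 2320 never apply to the same row.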