Why is LEFT JOIN deleting rows? - sql

I have been using sql for a long time, but I am now working in Databricks and I am getting a very strange result. I have a table called block_durations with a set of ids (called block_ts), and I have another table called mergetable, which I want to left join to that table. Mergetable is indexed by acct_id and block_ts, so it has many different records for each block_ts. I want to keep the rows in block_durations that don't match, and if there are multiple matches in mergetable I want there to be multiple corresponding entries in the resulting join, as you would expect from a left join.
But this is not happening. In order to demonstrate this, I am showing the result of joining mergetable, after filtering for a single acct_id so that there is at most one match per block_ts.
select count(*) from mergetable where acct_id = '0xfbb1b73c4f0bda4f67dca266ce6ef42f520fbb98'
16579
select count(*) from block_durations
82817
select count(*) from
(
SELECT
mt.*,
bd.block_duration
FROM
block_durations bd
left outer JOIN mergetable mt
ON mt.block_ts = bd.block_ts
where acct_id='0xfbb1b73c4f0bda4f67dca266ce6ef42f520fbb98'
) countTable
16579
As you can see, even though there are >80000 records in block_durations, most of them are getting lost in the left join. Why is this happening? I thought the whole point of a left join is that the non-matching rows of the left table are kept. This is exactly the behavior I would expect from an inner join -- and indeed when I switch to an inner join nothing changes.
Could someone please help me figure out what's going on?
-Paul

All rows from left side of the join are preserved, but later on you run WHERE ... condition on that which removed rows not matching the condition.
Merge your WHERE condition into JOIN condition:
SELECT
mt.*,
bd.block_duration
FROM
block_durations bd
left outer JOIN mergetable mt
ON mt.block_ts = bd.block_ts AND acct_id='0xfbb1b73c4f0bda4f67dca266ce6ef42f520fbb98'
You can also filter mergetable before you run JOIN on the results:
SELECT
mt.*,
bd.block_duration
FROM
block_durations bd
left outer JOIN (SELECT * FROM mergetable WHERE acct_id='0xfbb1b73c4f0bda4f67dca266ce6ef42f520fbb98') mt
ON mt.block_ts = bd.block_ts

Related

Minus operation gives wrong answer

I am trying to get the result of the minus operation of join tables which means that I am finding unmatched records.
I tried:
SELECT count(*) FROM mp_v1 mp
left join cm_v1 sop
on mp.study_name=sop.study_name and
sop.site_id=sop.site_id
--where mp.study_name='1101'
MINUS
SELECT count(*) FROM iv_mpv1 mp
inner join cm sop
on mp.study_name=sop.study_name and
sop.site_id=sop.site_id
--where mp.study_name='1101'
output: the count of this gives me 171183251
but when I run the first query individually I get 171183251 for left outer join and 171070345 for inner join so the output needs to be 112906. I am not sure where my query is wrong. Could anyone please give your opinion.
If you want unmatched records you wouldn't use MINUS on the counts. The query would look more like:
SELECT COUNT(*)
FROM ((SELECT *
FROM mp_v1 mp LEFT JOIN
cm_v1 sop
USING (study_name, site_id)
) MINUS
(SELECT *
FROM iv_mpv1 mp LEFT JOIN
iv_cmv1 sop
USING (study_name, site_id)
)
) x;
Also note that MINUS removes duplicates, so if you have duplicates within each set of tables, then they only count as one row.
The SELECT * assumes that the tables have the same columns and compatible types -- which makes sense given the gist of the question. You may need to list the particular columns you care about.

Multiple joins on the same table, Results Not Returned if Join Field is NULL

SELECT organizations_organization.code as organization,
core_user.email as Created_By,
assinees.email as Assigned_To,
from tickets_ticket
JOIN organizations_organization on tickets_ticket.organization_id = organizations_organization.id
JOIN core_user on tickets_ticket.created_by_id = core_user.id
Left JOIN core_user as assinees on assinees.id = tickets_ticket.currently_assigned_to_id
In the above query, if tickets_ticket.currently_assigned_to_id is null then that that row from tickets_ticket is not returned
> Records In tickets_ticket = 109
> Returned Records = 4 (out of 109 4 row has value for currently_assigned_to_id rest 105 are null )
> Expected Records = 109 (with nulll set for Assigned_To)
Note I am trying to achieve multiple joins on the same table
LEFT JOIN can not kill output records,
your problem is here:
JOIN core_user on tickets_ticket.created_by_id = core_user.id
this join kills non-matching records
try
LEFT JOIN core_user on tickets_ticket.created_by_id = core_user.id
First, this is not the actual code you are running. There is a comma before the from clause that would cause a syntax error. If you have left out a where clause, then that would explain why you are seeing no rows.
When using left joins, conditions on the first table go in the where clause. Conditions on subsequent tables go in the on clause.
That said, a where clause may not be the problem. I would suggest using left joins from the first table onward -- along with table aliases:
select oo.code as organization, cu.email as Created_By, a.email as Assigned_To,
from tickets_ticket tt left join
organizations_organization oo
on tt.organization_id = oo.id left join
core_user cu
on tt.created_by_id = cu.id left join
core_user a
on a.id = tt.currently_assigned_to_id ;
I suspect that you have data in your data model that is unexpected -- perhaps bad organizations, perhaps bad created_by_id. Keep all the tickets to see what is missing.
That said, you should probably be including something like tt.id in the result set to identify the specific ticket.

Number of rows reduced after join

I have a query that gives me 759 rows:
Select
buildingID
,buildingAddress
,building_zip
From
BuildingTable
However, when I join the table to get a column from another table, the number of rows is reduced to 707
Select
buildingID
,buildingaddress
,building_zip
,b.surveyCost
From
BuildingTable as A
Inner Join SurveyTable as B
On a.buildingAddress = b.address
What is the best way to test which rows I lost and why? and how do I prevent this from happening? I was thinking that maybe some of the buildings don't have survey costs and therefore it was only showing me the ones that have costs, but I see some null values there so that was not the issue, I think.
If you need extra information let me know. Thanks
To find the rows you have lost, just replace the inner join with a left join and look for missing rows:
Select bt.*
From BuildingTable bt left join
SurveyTable st
on bt.buildingAddress = st.address
where st.address is null;
Note that rows can also be duplicated in both tables, so you might have more missing rows than you expect.
You can find the row which rows are lost usig a left join between original query and resulting query where the join is nulll
Select
buildingID
,buildingAddress
,building_zip
From
BuildingTable t1
left join (
Select
buildingID
,buildingaddress
,building_zip
,b.surveyCost
From
BuildingTable as A
Inner Join SurveyTable as B
On a.buildingAddress = b.address
) t2 on t1.buildingID = t2.buildingID
where t2.buildingID is null
but why .. depend on you knowledge of the data

JOIN Tables with all specified fields matching

I have two tables. I'm selecting all the values in the first table and trying to get the associated rows in the second table which match BOTH of the specified fields.
So in this example, I want only the rows in the CarsTable and the associated columns in the TrucksTable in which BOTH the Tires and Windows values match (if just one value matches, I don't want it). I'm not even certain a join is the correct operation. Any ideas?
SELECT * FROM CarsTable, TrucksTable
LEFT JOIN TrucksTable t1
ON
t1.Tires = Cars.Tires
LEFT JOIN Trucks t2
ON
t2.Windows = Cars.Windows
You need to do some research on joins...your from clause is all over the place. Carstable,truckstable as you have here is a cross join..and I don't understand what the left joins you are doing are. I think you only have two tables:
SELECT c.*, t.* FROM CarsTable c
inner join TrucksTable t
on t.tires = c.tires and t.windows = c.windows
inner join will function as a filter...if the record in carstable finds no matches in the truckstable by the keys listed, it won't appear. If you want all cartable rows even those that find no match, use left join (in this case, t.* will be null)

When I add a LEFT OUTER JOIN, the query returns only a few rows

The original query returns 160k rows. When I add the LEFT OUTER JOIN:
LEFT OUTER JOIN Table_Z Z WITH (NOLOCK) ON A.Id = Z.Id
the query returns only 150 rows. I'm not sure what I'm doing wrong.
All I need to do is add a column to the query, which will bring back a code from a different table. The code could be a number or a NULL. I still have to display NULL, hence the reason for the LEFT join. They should join on the "id" columns.
SELECT <lots of stuff> + the new column that I need (called "code").
FROM
dbo.Table_A A WITH (NOLOCK)
INNER JOIN
dbo.Table_B B WITH (NOLOCK) ON A.Id = B.Id AND A.version = B.version
--this is where I added the LEFT OUTER JOIN. with it, the query returns 150 rows, without it, 160k rows.
LEFT OUTER JOIN
Table_Z Z WITH (NOLOCK) ON A.Id = Z.Id
LEFT OUTER JOIN
Table_E E WITH (NOLOCK) ON A.agent = E.agent
LEFT OUTER JOIN
Table_D D WITH (NOLOCK) ON E.location = D.location
AND E.type = 'Organization'
AND D.af_type = 'agent_location'
INNER JOIN
(SELECT X , MAX(Version) AS MaxVersion
FROM LocalTable WITH (NOLOCK)
GROUP BY agemt) P ON E.agent = P.location AND E.Version = P.MaxVersion
Does anyone have any idea what could be causing the issue?
When you perform a LEFT OUTER JOIN between tables A and E, you are maintaining your original set of data from A. That is to say, there is no data, or lack of data, in table E that can reduce the number of rows in your query.
However, when you then perform an INNER JOIN between E and P at the bottom, you are indeed opening yourself up to the possibility of reducing the number of rows returned. This will treat your subsequent LEFT OUTER JOINs like INNER JOINs.
Now, without your exact schema and a set of data to test against, this may or may not be the exact issue you are experiencing. Still, as a general rule, always put your INNER JOINs before your OUTER JOINs. It can make writing queries like this much, much easier. Your most restrictive joins come first, and then you won't have to worry about breaking any of your outer joins later on.
As a quick fix, try changing your last join to P to a LEFT OUTER JOIN, just to see if the Z join works.
You have to be very careful once you start with LEFT JOINs.
Let's suppose this model: You have tables Products, Orders and Customers. Not all products necessarily have been ordered, but every order must have customer entered.
Task: Show all products, and if the product was ordered, list the ordering customers; i.e., product without orders will be shown as one row, product with 10 orders will have 10 rows in the resultset. This calls for a query designed around FROM Products LEFT JOIN Orders.
Now someone could think "OK, Customer is always entered into orders, so I can make inner join from orders to customers". Wrong. Since the table Customers is joined through left-joined table Orders, it has to be left-joined itself... otherwise the inner join will propagate into the previous level(s) and as a result, you will lose all products that have no orders.
That is, once you join any table using LEFT JOIN, any subsequent tables that are joined through this table, need to keep LEFT JOINs. But it does not mean that once you use LEFT JOIN, all joins have to be of that type... only those that are dependent on the first performed LEFT JOIN. It would be perfectly fine to INNER JOIN the table Products with another table Category for example, if you only want to see Products which have a category set.
(Answer is based on this answer: http://www.sqlservercentral.com/Forums/Topic247971-8-1.aspx -> last entry)