SQL/Presto: right join bigger than original table due to NULL

I need to right join two tables on three conditions, but the resulting table is bigger than either the left or the right table.
Left table a looks like this:
capacity  value  group_id  level_id  tags
100       3      a         ab        NULL
120       5      a         afb       lala
122       4      b         afg       hhh
122       6      c         adfg      NULL
Right table b (which has more rows than the left table) looks like this:
user  group_id  level_id  tags
adsf  a         ab        NULL
af    a         abf       df
sf    a         afb       lala
dsf   b         afg       hhh
sdf   c         adfg      NULL
I want to append the capacity and value columns to the right table b. I used the following query, but the result has more rows than the right table. I noticed that this seems to be due to the NULLs in tags in both the right and left tables, but I am wondering how to resolve it.
select a.capacity, a.value, b.*
from a
right join b
on a.group_id = b.group_id
and a.level_id = b.level_id
and a.tags = b.tags

"I noticed that it is due to the NULL in tags in both the right and left tables"
No, this is not the cause of the duplicates. In fact, NULL values fail the comparison, so you will not get a match at all if either value is NULL. That is, the row in b will be returned with NULL values for the columns from a.
If you want NULL values to match as being equal, then you need a NULL-safe comparison, and Presto supports the SQL-standard IS NOT DISTINCT FROM. I also strongly prefer left join over right join:
select a.capacity, a.value, b.*
from b left join
a
on a.group_id = b.group_id and
a.level_id = b.level_id and
a.tags is not distinct from b.tags;
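To see the difference between the two comparisons, here is a minimal sketch using Python's built-in sqlite3 module rather than Presto. It assumes SQLite's IS operator as a stand-in for IS NOT DISTINCT FROM (both treat two NULLs as equal) and loads only two rows from each of the question's tables.

```python
import sqlite3

# In-memory database with two rows from each of the question's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (capacity INT, value INT, group_id TEXT, level_id TEXT, tags TEXT);
    CREATE TABLE b (user TEXT, group_id TEXT, level_id TEXT, tags TEXT);
    INSERT INTO a VALUES (100, 3, 'a', 'ab', NULL), (120, 5, 'a', 'afb', 'lala');
    INSERT INTO b VALUES ('adsf', 'a', 'ab', NULL), ('sf', 'a', 'afb', 'lala');
""")

# {op} is filled with "=" or with SQLite's NULL-safe "IS"
# (an assumed stand-in for Presto's IS NOT DISTINCT FROM).
query = """
    SELECT b.user, a.capacity
    FROM b LEFT JOIN a
      ON a.group_id = b.group_id
     AND a.level_id = b.level_id
     AND a.tags {op} b.tags
    ORDER BY b.user
"""

plain = conn.execute(query.format(op="=")).fetchall()
null_safe = conn.execute(query.format(op="IS")).fetchall()

print(plain)      # the NULL-tag row of b gets no capacity
print(null_safe)  # the NULL-tag row of b is matched to the NULL-tag row of a
```

With "=", the row of b whose tags is NULL comes back with NULL for a.capacity; with the NULL-safe operator it picks up capacity 100. Neither variant adds rows here, because the keys in a are unique.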
If you are getting duplicates, it is because you have duplicates in a. You can check for this using:
select group_id, level_id, tags, count(*)
from a
group by group_id, level_id, tags
having count(*) >= 2;
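As a rough illustration of how duplicates in a inflate the result, the sketch below (Python's sqlite3, with SQLite's IS standing in for IS NOT DISTINCT FROM) adds a made-up second row to a with the same key as an existing one. The row (130, 7, 'b', 'afg', 'hhh') is not from the question; it is only there to trigger the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (capacity INT, value INT, group_id TEXT, level_id TEXT, tags TEXT);
    CREATE TABLE b (user TEXT, group_id TEXT, level_id TEXT, tags TEXT);
    -- The (130, 7, ...) row is hypothetical: it duplicates the key ('b', 'afg', 'hhh').
    INSERT INTO a VALUES (122, 4, 'b', 'afg', 'hhh'),
                         (130, 7, 'b', 'afg', 'hhh'),
                         (122, 6, 'c', 'adfg', NULL);
    INSERT INTO b VALUES ('dsf', 'b', 'afg', 'hhh'), ('sdf', 'c', 'adfg', NULL);
""")

b_rows = conn.execute("SELECT COUNT(*) FROM b").fetchone()[0]

# Each row of b is repeated once per matching row of a.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM b LEFT JOIN a
      ON a.group_id = b.group_id
     AND a.level_id = b.level_id
     AND a.tags IS b.tags
""").fetchone()[0]

# The duplicate check from the answer, run against a.
duplicate_keys = conn.execute("""
    SELECT group_id, level_id, tags, COUNT(*)
    FROM a
    GROUP BY group_id, level_id, tags
    HAVING COUNT(*) >= 2
""").fetchall()

print(b_rows, joined_rows)
print(duplicate_keys)
```

The join returns three rows for a two-row b, and the check reports the one key that appears twice in a.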

Related

JOIN query, SQL Server is dropping some rows of my first table

I have two tables, customer_details and address_details. I want to display customer details with their corresponding address, so I used a LEFT JOIN. But when I execute this query, SQL Server drops the rows where street_no in customer_details doesn't match street_no in address_details, and displays only the rows where street_no of customer_details = street_no of address_details. I need to display the complete customer_details table, and where street_no doesn't match it should display an empty string or anything. Am I doing anything wrong in my SQL join?
Table customer_details:
case_id  customer_name  mob_no     street_no
-------------------------------------------------
1        John           242342343  4324234234234
1        Rohan          343233333  43332
1        Ankit          234234233  2342332423433
1        Suresh         234234324  2342342342342
1        Ranjeet        343424323  32233
1        Ramu           234234333  2342342342343
Table address_details:
s_no  street_no      address       city     case_id
------------------------------------------------------
1     4324234234234  Roni road     Delhi    1
2     2342332423433  Natan street  Lucknow  1
3     2342342342342  Koliko road   Herdoi   1
SQL JOIN query:
select
a.*, b.address
from
customer_details a
left join
address_details b on a.street_no = b.street_no
where
b.case_id = 1

Now that it became clear that you used b.case_id=1, I will explain why it filters:
The LEFT JOIN itself returns some rows that contain all NULL values for table b in the result set, which is what you want and expect.
But by using WHERE b.case_id=1, the rows containing NULL values for table b are filtered out because none of them matches the condition (all those rows have b.case_id=NULL so they don't match).
It might work to instead use WHERE a.case_id=1, but we don't know if a.case_id and b.case_id are always the same value for matching rows (they might not be; and if they are always the same, then we just identified a potential redundancy).
There are two ways to fix this for sure.
(1) Move b.case_id = 1 into the left join condition:
left join address_details b on a.street_no = b.street_no and b.case_id = 1
(2) Keep b.case_id = 1 in the WHERE but also allow for NULLED-out b values:
left join address_details b on a.street_no = b.street_no
where b.case_id = 1
or b.street_no IS NULL
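Personally I'd go for (1) because it is the clearest way to express that you want to filter b on two conditions without affecting which rows of a are returned. Note that (2) is not strictly equivalent: a row of a whose only match in b has a different case_id is still filtered out.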
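The effect of the filter's position can be checked with a small sketch. It uses Python's sqlite3 instead of SQL Server and, as a simplification, only two customers and one address from the question (the mob_no, s_no and city columns are left out).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_details (case_id INT, customer_name TEXT, street_no TEXT);
    CREATE TABLE address_details (street_no TEXT, address TEXT, case_id INT);
    INSERT INTO customer_details VALUES (1, 'John', '4324234234234'), (1, 'Rohan', '43332');
    INSERT INTO address_details VALUES ('4324234234234', 'Roni road', 1);
""")

# Filter on b in WHERE: the NULL-extended row for Rohan is thrown away.
where_rows = conn.execute("""
    SELECT a.customer_name, b.address
    FROM customer_details a
    LEFT JOIN address_details b ON a.street_no = b.street_no
    WHERE b.case_id = 1
    ORDER BY a.customer_name
""").fetchall()

# Filter on b in ON, as in option (1): every customer is kept.
on_rows = conn.execute("""
    SELECT a.customer_name, b.address
    FROM customer_details a
    LEFT JOIN address_details b ON a.street_no = b.street_no AND b.case_id = 1
    ORDER BY a.customer_name
""").fetchall()

print(where_rows)
print(on_rows)
```

Only the ON variant keeps Rohan, with NULL (None in Python) as his address.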

I do think that Wilhelm Poggenpohl's answer is kind of right. You just need to change the last join condition from a.case_id=1 to b.case_id=1:
select a.* , b.address
from customer_details a
left join address_details b on a.street_no=b.street_no
and b.case_id=1
This query will show every row from customer_details and the corresponding address if there is a match on street_no and the address meets the condition case_id=1.

This is because of the where clause. Try this:
select a.* , b.address
from customer_details a
left join address_details b on a.street_no=b.street_no
and a.case_id=1

pad database out with NULL criteria

If I have the following sample table (ordered by ID):
ID  Date        Type
--  ----------  ----
1   01/01/2000  A
2   22/04/1995  A
2   14/02/2001  B
you can immediately see that ID=1 does not have a Type=B, but ID=2 does. What I want to do is fill in a line to show this:
ID  Date        Type
--  ----------  ----
1   01/01/2000  A
1   NULL        B
2   22/04/1995  A
2   14/02/2001  B
There could potentially be hundreds of different types (so I may end up inserting hundreds of rows per person if they lack hundreds of types).
Is there a general solution to do this?
Could I possibly outer join the table on itself and do it that way?

You can do this with a cross join to generate all the rows and a left join to get the actual data values:
select i.id, s.date, t.type
from (select distinct id from sample) i cross join
(select distinct type from sample) t left join
sample s
on s.id = i.id and
s.type = t.type;
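Here is a quick way to try that query on the three sample rows from the question, sketched with Python's sqlite3. The table name sample comes from the query above; the TEXT column types and the ORDER BY are assumptions added for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (id INT, date TEXT, type TEXT);
    INSERT INTO sample VALUES (1, '01/01/2000', 'A'),
                              (2, '22/04/1995', 'A'),
                              (2, '14/02/2001', 'B');
""")

# Every (id, type) combination comes from the cross join;
# the left join fills in the date where a real row exists.
padded = conn.execute("""
    SELECT i.id, s.date, t.type
    FROM (SELECT DISTINCT id FROM sample) i
    CROSS JOIN (SELECT DISTINCT type FROM sample) t
    LEFT JOIN sample s
      ON s.id = i.id AND s.type = t.type
    ORDER BY i.id, t.type
""").fetchall()

for row in padded:
    print(row)
```

The missing (1, B) combination shows up with a NULL date (None in Python), which is the padded output the question asks for.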

In my sql script LEFT JOIN is giving output like CROSS JOIN?

I have two tables like the following:
DailyData
Date        Id   CompanyName  CompanyPrice  CompanyId
21-12-2011  123  ABC corp     120           535
25-12-2011  352  Z Edge       101           444
25-12-2011  352  Z Edge       100           444
The primary key is `date` and `Id`.
ReportData
RId  Date        CompanyName  TodayPrice  CompanyId
1    25-12-2011  Z Edge       230         444
The primary key is only `RId`.
I have used the following LEFT JOIN on the two tables above:
Select a.date,a.companyname,a.CompanyPrice,b.TodayPrice
from DailyData a LEFT JOIN ReportData b ON
a.companyid= b.companyid where a.Date = '25-12-2011'
But instead of two records, it returns more than two (the same records multiple times).
Why is that?
Please help me correct my SQL query.
The expected output for the data above is:
date        companyname  companyprice  todaysprice
25-12-2011  Z Edge       101           230
25-12-2011  Z Edge       100           230

Your current query is missing a JOIN on the actual columns; as a result, you are getting a CROSS JOIN of all the rows that meet the date condition. You will want to use:
Select a.date,a.companyname,a.CompanyPrice,b.TodayPrice
from DailyData a
LEFT JOIN ReportData b
ON a.CompanyId= b.CompanyId
WHERE a.Date = '25-12-2011';

Your join condition [ ON a.Date = '25-12-2011' ] does not establish any condition on table b; therefore, every row in table b is joined to each row in table a with that date.
From looking at the two tables, it is not obvious whether they should be joined on date or on CompanyId.
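To make the cross-join effect visible, here is a small sketch with Python's sqlite3. ReportData has only one row in the question, so a second row (RId 2, for ABC corp, TodayPrice 150) is invented here purely to show the multiplication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DailyData (Date TEXT, Id INT, CompanyName TEXT, CompanyPrice INT, CompanyId INT);
    CREATE TABLE ReportData (RId INT, Date TEXT, CompanyName TEXT, TodayPrice INT, CompanyId INT);
    INSERT INTO DailyData VALUES ('21-12-2011', 123, 'ABC corp', 120, 535),
                                 ('25-12-2011', 352, 'Z Edge', 101, 444),
                                 ('25-12-2011', 352, 'Z Edge', 100, 444);
    -- RId 2 is hypothetical; the question only shows RId 1.
    INSERT INTO ReportData VALUES (1, '25-12-2011', 'Z Edge', 230, 444),
                                  (2, '25-12-2011', 'ABC corp', 150, 535);
""")

# ON mentions only table a: each 25-12-2011 row pairs with every row of b.
bad = conn.execute("""
    SELECT a.CompanyPrice, b.TodayPrice
    FROM DailyData a LEFT JOIN ReportData b
      ON a.Date = '25-12-2011'
""").fetchall()

# ON relates the two tables; the date filter goes in WHERE.
good = conn.execute("""
    SELECT a.CompanyPrice, b.TodayPrice
    FROM DailyData a LEFT JOIN ReportData b
      ON a.CompanyId = b.CompanyId
    WHERE a.Date = '25-12-2011'
    ORDER BY a.CompanyPrice DESC
""").fetchall()

print(len(bad))
print(good)
```

The first query returns five rows (two dated rows times two report rows, plus the 21-12-2011 row padded with NULL), while the second returns the two expected rows.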

I believe you need something like
Select a.date,a.companyname,a.CompanyPrice,b.TodayPrice
from DailyData a
LEFT JOIN ReportData b ON
(b.CompanyId = a.CompanyId )
WHERE a.Date = '25-12-2011'

Without LEFT and without the WHERE clause:
Select a.date,
a.companyname,a.CompanyPrice,b.TodayPrice
from DailyData a
JOIN ReportData b
ON a.CompanyId= b.CompanyId

SQL query construction: checking if query result is subset of another

Hi guys, I have a (legacy) table relation which works like this:
A has many B and B has many C;
A has many C as well.
Now I am having trouble coming up with SQL that will get me all B (the Id of B, to keep it simple) mapped to a certain A (by Id) AND any B whose collection of C is a subset of the Cs of that A.
I have failed to come up with decent SQL, especially for the second part, and was wondering if I can get any tips or suggestions on how to do that.
Thanks
EDIT:
Table A
Id |..
------------
1 |..
Table B
Id |..
--------------
2 |..
Table A_B_rel
A_id | B_id
-----------------
1 | 2
C is a strange table. The data of C (a single column) is actually just duplicated in two rel tables, one for A and one for B, so it's like this:
Table B_C_Table
B_Id| C_Value
-----------------
2 | 'Somevalue'
Table A_C_Table
A_Id| C_Value
-------------
1 | 'SomeValue'
So I am looking for the Bs whose C_Values are a subset of a certain A's C_Values.

Yes, the second part of your problem is a bit tricky. We've got B_C_Table on the one hand, and a subset of A_C_Table where A_ID is a specific ID, on the other.
Now, if we use an outer join, we'll be able to see which rows in B_C_Table have no match in A_C_Table:
SELECT *
FROM B_C_Table bc
LEFT JOIN A_C_Table ac ON bc.C_Value = ac.C_Value AND ac.A_ID = #A_ID
Note that it is important to put the ac.A_ID = #A_ID into the ON clause rather than into WHERE, because in the latter case we would be filtering out the rows of B_C_Table that have no match (their ac.A_ID is NULL), which is not what we want.
The next step (to achieving the final query) would be to group rows by B and count rows. Now, we will calculate both the total number of rows and the number of matching rows.
SELECT
bc.B_ID,
COUNT(*) AS TotalCount,
COUNT(ac.A_ID) AS MatchCount
FROM B_C_Table bc
LEFT JOIN A_C_Table ac ON bc.C_Value = ac.C_Value AND ac.A_ID = #A_ID
GROUP BY bc.B_ID
As you can see, to count matches, we simply count ac.A_ID values: in case of no match the corresponding column will be NULL and thus not counted. And if indeed some rows in B_C_Table do not match any rows in the subset of A_C_Table, we will see different values of TotalCount and MatchCount.
And that logically leads us towards the final step: comparing those counts. (For, obviously, if we can obtain values, we can also compare them.) But not in the WHERE clause, of course, because aggregate functions aren't allowed in WHERE. It's the HAVING clause that is used to compare values of grouped rows, including aggregated values too. So...
SELECT
bc.B_ID,
COUNT(*) AS TotalCount,
COUNT(ac.A_ID) AS MatchCount
FROM B_C_Table bc
LEFT JOIN A_C_Table ac ON bc.C_Value = ac.C_Value AND ac.A_ID = #A_ID
GROUP BY bc.B_ID
HAVING COUNT(*) = COUNT(ac.A_ID)
The count values aren't really needed, of course, and when you drop them you will be able to UNION the above query with the one selecting B_ID from A_B_rel:
SELECT B_ID
FROM A_B_rel
WHERE A_ID = #A_ID
UNION
SELECT bc.B_ID
FROM B_C_Table bc
LEFT JOIN A_C_Table ac ON bc.C_Value = ac.C_Value AND ac.A_ID = #A_ID
GROUP BY bc.B_ID
HAVING COUNT(*) = COUNT(ac.A_ID)
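The final query can be tried out with a sketch like the one below, written against Python's sqlite3. The rows are made up (the question shows only one row per table), and the named parameter :a_id stands in for the A id placeholder used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A_B_rel (A_id INT, B_id INT);
    CREATE TABLE A_C_Table (A_Id INT, C_Value TEXT);
    CREATE TABLE B_C_Table (B_Id INT, C_Value TEXT);
    -- Hypothetical data: A 1 owns C values x and y.
    INSERT INTO A_C_Table VALUES (1, 'x'), (1, 'y');
    INSERT INTO A_B_rel VALUES (1, 2);            -- B 2 is mapped to A 1 directly
    INSERT INTO B_C_Table VALUES (2, 'z'),        -- B 2: not a subset
                                 (3, 'x'),        -- B 3: subset of A 1's values
                                 (4, 'x'), (4, 'q');  -- B 4: 'q' is not in A 1
""")

rows = conn.execute("""
    SELECT B_id FROM A_B_rel WHERE A_id = :a_id
    UNION
    SELECT bc.B_Id
    FROM B_C_Table bc
    LEFT JOIN A_C_Table ac ON bc.C_Value = ac.C_Value AND ac.A_Id = :a_id
    GROUP BY bc.B_Id
    HAVING COUNT(*) = COUNT(ac.A_Id)
""", {"a_id": 1}).fetchall()

b_ids = sorted(r[0] for r in rows)
print(b_ids)
```

B 2 is returned because of the mapping and B 3 because all of its C values belong to A 1; B 4 is rejected since one of its two values has no match.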

Sounds like you need to think in terms of double negation, i.e. there should not exist any B_C that does not have a matching A_C (and I'm guessing there should be at least one B_C).
So, try something like
select B.B_id
from Table_B B
where exists (select 1 from B_C_Table BC
where BC.B_id = B.B_id)
and not exists (select 1 from B_C_Table BC
where BC.B_id = B.B_id
and not exists(select 1 from A_C_Table AC
join A_B_Rel ABR on AC.A_id = ABR.A_id
where ABR.B_id = B.B_id
and BC.C_Value = AC.C_Value))
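The double-negation idea can also be sketched for one fixed A id. This is a simplified variant, not the query above: it assumes the A id is passed in as the parameter :a_id instead of being looked up through A_B_Rel, and it runs on made-up rows in Python's sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A_C_Table (A_Id INT, C_Value TEXT);
    CREATE TABLE B_C_Table (B_Id INT, C_Value TEXT);
    -- Hypothetical data: A 1 owns C values x and y.
    INSERT INTO A_C_Table VALUES (1, 'x'), (1, 'y');
    INSERT INTO B_C_Table VALUES (2, 'z'), (3, 'x'), (4, 'x'), (4, 'q');
""")

# Keep a B only if there is no C value of that B
# which has no matching C value in the chosen A.
rows = conn.execute("""
    SELECT DISTINCT bc.B_Id
    FROM B_C_Table bc
    WHERE NOT EXISTS (
        SELECT 1 FROM B_C_Table bc2
        WHERE bc2.B_Id = bc.B_Id
          AND NOT EXISTS (
              SELECT 1 FROM A_C_Table ac
              WHERE ac.A_Id = :a_id
                AND ac.C_Value = bc2.C_Value))
""", {"a_id": 1}).fetchall()

subset_b_ids = sorted(r[0] for r in rows)
print(subset_b_ids)
```

Because the outer query reads from B_C_Table, a B with no C values at all never appears, which plays the role of the "at least one B_C" condition.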

Perhaps this is what you're looking for:
SELECT B_id
FROM A_B_rel
WHERE A_id = <A ID>
UNION
SELECT a.B_Id
FROM B_C_Table a
LEFT JOIN A_C_Table b ON a.C_Value = b.C_Value AND b.A_Id = <A ID>
GROUP BY a.B_Id
HAVING COUNT(CASE WHEN b.A_Id IS NULL THEN 1 END) = 0
The first SELECT gets all B's which are mapped to a particular A (<A ID> being the input parameter for the A ID), then we tack onto that result set any additional B's whose entire set of C_Value's are within the subset of the C_Value's of the particular A (again, <A ID> being the input parameter).

SQL DISTINCT INNER JOIN and LEFT JOIN and .NET C# PrimaryKey

I have 3 tables A, B, and C. I want to get the subId and text for each id. I also want to know IF the id has some eId linked to it.
I've used INNER JOIN on A and B and then LEFT JOINed that result with table C. My SQL string so far is:
SELECT DISTINCT A.id,A.subId, B.text, C.eId
FROM A
INNER JOIN B ON A.id=B.id
LEFT JOIN C ON A.id=C.id
WHERE B.text='something'
The problem is that C.eId has multiple entries for each id. So I'm getting output like this:
=================================
id | subId | text | eId
1  | e12   | etc  |
2  | e12   | etc  |
2  | t23   | etc  | p1111
3  | e12   | etc  |
4  | e12   | etc  | p1234
4  | e12   | etc  | p4325
I want to remove the lines like the last one ("4 e12 etc p4325") because I already know that 4, e12 has some other eId linked to it. I need id and subId to be PrimaryKeys.
How do I do this? DISTINCT worked until I added multiple id's to an eId.
Edit: I use MSSQL if that makes a difference.

I don't recall if it works this way, try it and tell me:
SELECT DISTINCT A.id,A.subId, B.text,
(select top 1 C.eId from C where C.id = A.id) AS eId
FROM A
INNER JOIN B ON A.id=B.id
WHERE B.text='etc'
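For what it's worth, the same idea can be checked outside MSSQL. The sketch below uses Python's sqlite3, where LIMIT 1 takes the place of TOP 1 (an assumption of this demo, since SQLite has no TOP), and adds an ORDER BY inside the subquery so the chosen eId is deterministic. Only ids 1 and 4 from the sample output are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INT, subId TEXT);
    CREATE TABLE B (id INT, text TEXT);
    CREATE TABLE C (id INT, eId TEXT);
    INSERT INTO A VALUES (1, 'e12'), (4, 'e12');
    INSERT INTO B VALUES (1, 'etc'), (4, 'etc');
    INSERT INTO C VALUES (4, 'p1234'), (4, 'p4325');
""")

# The correlated subquery returns at most one eId per id,
# so the join no longer multiplies rows when C has several matches.
rows = conn.execute("""
    SELECT DISTINCT A.id, A.subId, B.text,
           (SELECT C.eId FROM C WHERE C.id = A.id ORDER BY C.eId LIMIT 1) AS eId
    FROM A
    INNER JOIN B ON A.id = B.id
    WHERE B.text = 'etc'
    ORDER BY A.id
""").fetchall()

for row in rows:
    print(row)
```

Id 4 now appears once, with a single eId, and id 1 keeps a NULL eId because it has no entry in C.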
