SQL One-To-Many join issue

Let's say I have two tables in Access, TheLetters and TheNumbers. TheLetters has one column, TheLetter, and 4 records: A, B, C and D. TheNumbers holds many records for each TheLetters record and has two columns, [TheLetter] and [TheNumber]. See below:
TheLetters
[TheLetter]
A
B
C
D
TheNumbers
[TheLetter][TheNumber]
A 1
A 2
A 3
B 1
B 2
How do I write a query that returns one record for each TheLetters record, with the MAX TheNumber from the TheNumbers table, or blank if there's no match for TheLetter in TheNumbers? So I want my result set to be:
[TheLetters.TheLetter][TheNumbers.TheNumber]
A 3
B 2
C <NULL>
D <NULL>
I can get A,3 and B,2, but the query cuts out C and D because they have no match in TheNumbers. I've tried switching my joins around. I've also tried putting an IF in the WHERE clause that returns the record from TheNumbers if there is a match and blank otherwise, but I can't get the syntax right. Thanks for any help!

The key is to use a LEFT JOIN:
SELECT l.TheLetter, MAX(n.TheNumber)
FROM TheLetters l
LEFT JOIN TheNumbers n ON l.TheLetter = n.TheLetter
GROUP BY l.TheLetter
A left outer join returns all rows in the left table, returning data for any correlated rows in the right table, or a single row with the right table's columns set to NULL if there are no correlated rows.
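To see the difference between the two joins, here is a minimal sketch that rebuilds the sample tables in an in-memory SQLite database through Python's standard sqlite3 module. It assumes SQLite treats this LEFT JOIN and MAX the same way Access does; Access is not involved, and in Access the NULLs would appear as blank cells.

```python
import sqlite3

# In-memory SQLite stand-in for the Access tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TheLetters (TheLetter TEXT);
    CREATE TABLE TheNumbers (TheLetter TEXT, TheNumber INTEGER);
    INSERT INTO TheLetters VALUES ('A'), ('B'), ('C'), ('D');
    INSERT INTO TheNumbers VALUES ('A', 1), ('A', 2), ('A', 3), ('B', 1), ('B', 2);
""")

# LEFT JOIN keeps C and D; MAX over their all-NULL group yields NULL (None).
rows = conn.execute("""
    SELECT l.TheLetter, MAX(n.TheNumber)
    FROM TheLetters l
    LEFT JOIN TheNumbers n ON l.TheLetter = n.TheLetter
    GROUP BY l.TheLetter
    ORDER BY l.TheLetter
""").fetchall()

# The same query with INNER JOIN drops letters that have no match.
inner_rows = conn.execute("""
    SELECT l.TheLetter, MAX(n.TheNumber)
    FROM TheLetters l
    INNER JOIN TheNumbers n ON l.TheLetter = n.TheLetter
    GROUP BY l.TheLetter
    ORDER BY l.TheLetter
""").fetchall()

for letter, max_number in rows:
    print(letter, max_number)
```

The inner-join version reproduces the "A,3 - B,2 only" result from the question, and the left-join version adds the C and D rows with NULL.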


Related

Why does my SQL inner join return much more data than table 1?

I need to join three tables to get all the info I need. Table a has 70 million rows; after joining a with b, I got 40 million rows. But after I join table c, which has only 1.7 million rows, the result grows to 300 million rows.
In table c, the same pt_id and fi_id appear more than once: one pt_id can be linked to many different fi_id values, but each fi_id is linked to only one pt_id.
I'm wondering if there is any way to get rid of the duplicate rows, because I join table c only to get the pt_id.
Thanks for any help!
select c.pt_id,b.fi_id,a.zq_id
from a
inner join (select zq_id, fi_id from b) b
on a.zq_id = b.zq_id
inner join (select fi_id,pt_id from c) c
on b.fi_id = c.fi_id
You can use GROUP BY:
select c.pt_id,b.fi_id,a.zq_id
from a
inner join (select zq_id, fi_id from b) b
on a.zq_id = b.zq_id
inner join (select fi_id,pt_id from c) c
on b.fi_id = c.fi_id
group by c.pt_id,b.fi_id,a.zq_id
to remove all the duplicate rows (see the related question below):
How do I (or can I) SELECT DISTINCT on multiple columns?
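The growth comes from repeated (fi_id, pt_id) rows in c. Each a-b row is emitted once per matching c row. The sketch below shows this with hypothetical miniature tables in SQLite (the values are made up). It checks that GROUP BY on all selected columns removes the repeats. It also tries the variant of deduplicating c with SELECT DISTINCT inside the subquery, which shrinks the data before the join. That variant only removes duplicates coming from c, so the final GROUP BY is still needed if a or b also contain duplicates.

```python
import sqlite3

# Hypothetical miniature versions of tables a, b and c from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (zq_id INTEGER);
    CREATE TABLE b (zq_id INTEGER, fi_id INTEGER);
    CREATE TABLE c (fi_id INTEGER, pt_id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1, 10), (2, 20);
    -- Each (fi_id, pt_id) pair is stored three times in c.
    INSERT INTO c VALUES (10, 100), (10, 100), (10, 100),
                         (20, 100), (20, 100), (20, 100);
""")

join_sql = """
    SELECT c.pt_id, b.fi_id, a.zq_id
    FROM a
    INNER JOIN (SELECT zq_id, fi_id FROM b) b ON a.zq_id = b.zq_id
    INNER JOIN (SELECT fi_id, pt_id FROM c) c ON b.fi_id = c.fi_id
"""

# Every a-b row is repeated once per matching c row: 2 rows x 3 copies = 6.
raw = conn.execute(join_sql).fetchall()

# GROUP BY over all selected columns collapses the repeats.
grouped = conn.execute(join_sql + " GROUP BY c.pt_id, b.fi_id, a.zq_id").fetchall()

# Variant: deduplicate c before joining instead of after.
dedup_first = conn.execute("""
    SELECT c.pt_id, b.fi_id, a.zq_id
    FROM a
    INNER JOIN (SELECT zq_id, fi_id FROM b) b ON a.zq_id = b.zq_id
    INNER JOIN (SELECT DISTINCT fi_id, pt_id FROM c) c ON b.fi_id = c.fi_id
""").fetchall()

print(len(raw), len(grouped), len(dedup_first))
```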

Can anyone please explain this LEFT JOIN output? I am new to this

I have two tables, each with just one column:
p1 (id: 1, 1, null)
p2 (id: 1, 1, null, null)
When I do a left join
(select * from p1 left join p2 on p1.id = p2.id)
I get 5 rows
(1, 1, 1, 1, null)
I was expecting to get 4 rows.
Please explain why.
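The reason is that 4 of the rows come from ID = 1. The first 1 in p1 joins with the two rows in p2 that have the value 1, and the second 1 in p1 joins with the same 2 rows (2 × 2 = 4).
The 5th row comes from the NULL in p1. NULL = NULL is not true in SQL, so that row matches nothing in p2, and neither NULL in p2 ever matches. A left join still keeps the unmatched p1 row and puts NULL in p2's column.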
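The row count can be reproduced with a minimal sketch that uses SQLite through Python's sqlite3 module. It assumes your database follows standard NULL comparison semantics, which SQLite does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p1 (id INTEGER);
    CREATE TABLE p2 (id INTEGER);
    INSERT INTO p1 VALUES (1), (1), (NULL);
    INSERT INTO p2 VALUES (1), (1), (NULL), (NULL);
""")

# Each 1 in p1 matches both 1s in p2 (2 x 2 = 4 rows). The NULL in p1 matches
# nothing, because NULL = NULL is not true, but LEFT JOIN keeps it once.
rows = conn.execute("SELECT * FROM p1 LEFT JOIN p2 ON p1.id = p2.id").fetchall()
print(rows)
```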
A SQL left join returns every row of the left table matched against the right table on the join condition, plus all the unmatched rows of the left table.
A left join is quite simple when there are no duplicate rows.
As can be seen from the image, there are 5 records each in the left and right table and 3 of them match. So the left join returns each matched record once, plus the unmatched records of the left table, and the final answer is
Ronaldo,Messi,Zidane,Terry and Pogba
Now, what happens if there is a duplicate record in the right table, as in your case?
Since the right table now has 2 records each for Messi and Zidane, the left join returns two rows each for Messi and Zidane, so the final output is
Ronaldo,Messi,Messi,Zidane,Zidane,Terry and Pogba
Hope this helps clarify your doubt.

SQL - Searching a table using another table's column as criteria

I have table B with bcust (4-digit integer) and bdate (date) columns. I also have table C with ccust (4-digit integer) and cdate (date) columns. I want to show the records from table C where cdate occurred after bdate.
Maybe you're looking for this?
SELECT c.*
FROM c
INNER JOIN b
ON b.bcust = c.ccust
AND b.bdate < c.cdate;
I assumed that the records are linked via the bcust and ccust columns.
Although you did not mention how the records in the two tables are related, I assume they are related when bcust = ccust.
Then something like this should do what you want:
SELECT c.*
FROM tableC c
INNER JOIN tableB b ON c.ccust = b.bcust
WHERE c.cdate > b.bdate
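For an INNER JOIN, putting the date test in the ON clause, as the first answer does, or in the WHERE clause, as the second answer does, should return the same rows. Here is a minimal sketch that checks this in SQLite with hypothetical customer numbers and dates. It assumes dates are stored as ISO "YYYY-MM-DD" text, which compares correctly as strings in SQLite.

```python
import sqlite3

# Hypothetical sample data; the customer numbers and dates are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableB (bcust INTEGER, bdate TEXT);
    CREATE TABLE tableC (ccust INTEGER, cdate TEXT);
    INSERT INTO tableB VALUES (1001, '2020-03-01'), (1002, '2020-06-01');
    INSERT INTO tableC VALUES (1001, '2020-01-15'), (1001, '2020-04-10'),
                              (1002, '2020-07-20'), (1003, '2020-08-01');
""")

# Date comparison inside the ON clause...
in_on = conn.execute("""
    SELECT c.* FROM tableC c
    INNER JOIN tableB b ON b.bcust = c.ccust AND b.bdate < c.cdate
""").fetchall()

# ...and in the WHERE clause give the same rows for an INNER JOIN.
in_where = conn.execute("""
    SELECT c.* FROM tableC c
    INNER JOIN tableB b ON c.ccust = b.bcust
    WHERE c.cdate > b.bdate
""").fetchall()

print(sorted(in_on))
```

If a customer has several rows in table B, a row of C can appear once per matching B row. That is the same duplication effect discussed in the other threads here.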

Additional inner join modifying results of previous calculations

I am having issues using the COUNT() function in a SQL*Plus query. Say I run
SELECT B.ID, COUNT(S.BRANCH_ID) FROM BRANCH B
INNER JOIN STAFF S ON S.BRANCH_ID = B.ID
GROUP BY B.ID;
From this I get the results
b.id count
1 6
2 6
3 6
4 7
5 6
which is fine. However, if I add just one extra inner join, I get completely different and wrong results. For example:
SELECT B.ID, COUNT(S.BRANCH_ID) FROM BRANCH B
INNER JOIN STAFF S ON S.BRANCH_ID = B.ID
INNER JOIN TOOL_STOCK TS ON TS.BRANCH_ID = B.ID
GROUP BY B.ID;
Now the results I get are:
b.id count
1 96
2 96
3 96
4 112
5 96
Why is this and how do I stop it? Cheers!
Try
SELECT B.ID, COUNT(DISTINCT S.STAFF_ID) FROM BRANCH B
INNER JOIN STAFF S ON S.BRANCH_ID = B.ID
INNER JOIN TOOL_STOCK TS ON TS.BRANCH_ID = B.ID
GROUP BY B.ID;
replacing S.STAFF_ID with the primary key field from the STAFF table.
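Here is a minimal sketch of the effect and the fix in SQLite. The data is hypothetical: one branch with 6 STAFF rows and 16 TOOL_STOCK rows, chosen so that 6 × 16 reproduces the 96 from the question. STAFF_ID and TOOL_ID are assumed primary-key columns.

```python
import sqlite3

# Hypothetical schema and data; STAFF_ID and TOOL_ID are assumed key columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BRANCH (ID INTEGER PRIMARY KEY);
    CREATE TABLE STAFF (STAFF_ID INTEGER PRIMARY KEY, BRANCH_ID INTEGER);
    CREATE TABLE TOOL_STOCK (TOOL_ID INTEGER PRIMARY KEY, BRANCH_ID INTEGER);
    INSERT INTO BRANCH VALUES (1);
""")
conn.executemany("INSERT INTO STAFF (BRANCH_ID) VALUES (?)", [(1,)] * 6)
conn.executemany("INSERT INTO TOOL_STOCK (BRANCH_ID) VALUES (?)", [(1,)] * 16)

joins = """
    FROM BRANCH B
    INNER JOIN STAFF S ON S.BRANCH_ID = B.ID
    INNER JOIN TOOL_STOCK TS ON TS.BRANCH_ID = B.ID
    GROUP BY B.ID
"""

# Every staff row is paired with every tool row of the branch: 6 x 16 = 96.
naive = conn.execute("SELECT B.ID, COUNT(S.BRANCH_ID) " + joins).fetchall()

# Counting distinct staff keys ignores the repetition introduced by TOOL_STOCK.
fixed = conn.execute("SELECT B.ID, COUNT(DISTINCT S.STAFF_ID) " + joins).fetchall()

print(naive, fixed)
```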
Your problem is that COUNT counts the rows in each GROUP BY group after all the joins have been performed.
In your initial query you are finding the number of employees for each branch; in the second, the number of employees is multiplied by the number of stock items (96 = 6 × 16 and 112 = 7 × 16, which suggests 16 TOOL_STOCK rows per branch).
When you add the second join, you are counting every STAFF × TOOL_STOCK combination at each branch.
You will likely need to add a subquery if you want all the data returned, but only counts of one record type.
I think the key to your issue is, which are you actually trying to count?
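One way to write the subquery approach is to aggregate each child table separately and then join the one-row-per-branch results, so STAFF and TOOL_STOCK rows never multiply each other. Below is a sketch in SQLite with hypothetical data matching the counts from the question for branches 1 and 4. The STAFF_COUNT and TOOL_COUNT aliases are illustrative, and the SQL avoids SQLite-only syntax, though it has not been tested on Oracle.

```python
import sqlite3

# Hypothetical data: branch 1 has 6 staff, branch 4 has 7; both have 16 tools.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BRANCH (ID INTEGER PRIMARY KEY);
    CREATE TABLE STAFF (STAFF_ID INTEGER PRIMARY KEY, BRANCH_ID INTEGER);
    CREATE TABLE TOOL_STOCK (TOOL_ID INTEGER PRIMARY KEY, BRANCH_ID INTEGER);
    INSERT INTO BRANCH VALUES (1), (4);
""")
conn.executemany("INSERT INTO STAFF (BRANCH_ID) VALUES (?)", [(1,)] * 6 + [(4,)] * 7)
conn.executemany("INSERT INTO TOOL_STOCK (BRANCH_ID) VALUES (?)", [(1,)] * 16 + [(4,)] * 16)

# Each derived table has one row per branch, so the joins cannot fan out.
rows = conn.execute("""
    SELECT B.ID, S.STAFF_COUNT, TS.TOOL_COUNT
    FROM BRANCH B
    INNER JOIN (SELECT BRANCH_ID, COUNT(*) AS STAFF_COUNT
                FROM STAFF GROUP BY BRANCH_ID) S ON S.BRANCH_ID = B.ID
    INNER JOIN (SELECT BRANCH_ID, COUNT(*) AS TOOL_COUNT
                FROM TOOL_STOCK GROUP BY BRANCH_ID) TS ON TS.BRANCH_ID = B.ID
    ORDER BY B.ID
""").fetchall()
print(rows)
```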

Pig script: join tables with null values

I'd like to join 2 tables, and I'm a bit lost among the different kinds of joins.
A(a_name:chararray, a_number:int)
a 1
b 2
c
d 3
e
B(b_id:int, b_name:chararray)
1 one
2 two
3 three
I know that I need some sort of join, but with
AB = JOIN A by a_number, B by b_id;
C = FOREACH AB GENERATE
a_name,
b_name as a_number;
I get
a one
b two
d three
Instead of
a one
b two
c
d three
e
which I actually want.
How should I do this?
Edit:
OK, I tried a left join, but it doesn't keep the row order and instead returns
a one
b two
d three
c
e
Any workaround?
You are looking for a LEFT JOIN.
This keeps all rows on the left side of the relationship even if they don't appear on the right. Pig defaults to an inner JOIN, so it only keeps rows whose key appears on both sides.
This will now generate what you expect.
AB = JOIN A by a_number LEFT, B by b_id;
C = FOREACH AB GENERATE a_name, b_name AS a_number;
Also, you should be able to compact those two relations into:
AB = FOREACH (JOIN A by a_number LEFT, B by b_id)
GENERATE a_name, b_name AS a_number;
As far as I know, there is no option in JOIN to preserve the order of the left relation. However, you can RANK A beforehand and then, after the JOIN, ORDER by the rank number that RANK creates.
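The rank, join, reorder idea can be illustrated in plain Python. This is not Pig code, and the function name left_join_keep_order is hypothetical. The sketch numbers each tuple of A, left-joins on a_number = b_id with unmatched or missing keys yielding None, lets matched rows come out before unmatched ones as in the output above, and then sorts by the saved rank.

```python
# Relations A and B from the question; None stands for a missing a_number.
A = [("a", 1), ("b", 2), ("c", None), ("d", 3), ("e", None)]
B = [(1, "one"), (2, "two"), (3, "three")]


def left_join_keep_order(left, right):
    """Left-join left to right on left[1] == right[0], keeping left's order."""
    # RANK step: remember each left tuple's original position.
    ranked = list(enumerate(left, start=1))
    names_by_id = {}
    for b_id, b_name in right:
        names_by_id.setdefault(b_id, []).append(b_name)

    matched, unmatched = [], []
    for rank, (a_name, a_number) in ranked:
        # Null keys never match in a join, so None is treated as "no match".
        b_names = names_by_id.get(a_number) if a_number is not None else None
        if b_names:
            matched.extend((rank, a_name, b_name) for b_name in b_names)
        else:
            unmatched.append((rank, a_name, None))

    # Like the output in the question, matched rows come before unmatched ones.
    joined = matched + unmatched
    # ORDER step: sorting on the saved rank restores A's original order.
    joined.sort(key=lambda row: row[0])
    return [(a_name, b_name) for _, a_name, b_name in joined]


print(left_join_keep_order(A, B))
```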