Three table join where middle table has duplicate foreign keys - sql

I am working with a database structured like the illustration below (but with more columns). Each person has a unique person_id and alt_id. However, the only thing connecting table A to table C is table B, and table B has one or more rows for each person_id/alt_id pair.
I need to get rows with a person_id, its alt_id, and the associated shapes.
I could do this:
SELECT DISTINCT a.person_id, a.color, b.alt_id, c.shape
FROM a
JOIN b ON a.person_id = b.person_id
JOIN c ON b.alt_id = c.alt_id
However, that seems inefficient: it builds the Cartesian product of the rows from B and C that share an alt_id, and only then uses DISTINCT to narrow the results down. What is the most efficient way to write this query?
Table A
+-----------+-------+
| person_id | color |
+-----------+-------+
| 10 | red |
| 11 | blue |
| 12 | green |
+-----------+-------+
Table B
+-----------+--------+
| person_id | alt_id |
+-----------+--------+
| 10 | 225 |
| 10 | 225 |
| 11 | 226 |
| 11 | 226 |
| 11 | 226 |
| 12 | 227 |
+-----------+--------+
Table C
+--------+----------+
| alt_id | shape |
+--------+----------+
| 225 | square |
| 226 | circle |
| 226 | rhombus |
| 226 | ellipse |
| 227 | triangle |
+--------+----------+

Join to (select distinct * from b) b rather than just the base table b.
SELECT
a.person_id, a.color, b.alt_id, c.shape
FROM
a
INNER JOIN (select distinct * from b) b
ON a.person_id = b.person_id
INNER JOIN c
ON b.alt_id = c.alt_id

You can get a distinct list of values from b before you do your joins.
SELECT DISTINCT a.person_id, a.color, b.alt_id, c.shape
FROM a
JOIN (Select Distinct person_id, alt_id from b) b ON a.person_id = b.person_id
JOIN c ON b.alt_id = c.alt_id
Note that, depending on indexes and statistics, building a DISTINCT list first is not always a win. Check the actual execution plan to see how well this performs, especially if you have a lot of data.
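To see what deduplicating b before the join changes, here is a minimal sketch that loads the sample tables from the question into an in-memory SQLite database (SQLite and the Python wrapper are assumptions made purely for illustration; the question does not name a DBMS). It compares the row count of the plain join with the result of the deduplicated join.

```python
import sqlite3

# In-memory SQLite database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (person_id INTEGER, color TEXT);
    CREATE TABLE b (person_id INTEGER, alt_id INTEGER);
    CREATE TABLE c (alt_id INTEGER, shape TEXT);
    INSERT INTO a VALUES (10, 'red'), (11, 'blue'), (12, 'green');
    INSERT INTO b VALUES (10, 225), (10, 225), (11, 226), (11, 226), (11, 226), (12, 227);
    INSERT INTO c VALUES (225, 'square'), (226, 'circle'), (226, 'rhombus'),
                         (226, 'ellipse'), (227, 'triangle');
""")

# Plain join: every duplicate row in b is multiplied by its matches in c.
plain_join = conn.execute("""
    SELECT a.person_id, a.color, b.alt_id, c.shape
    FROM a
    JOIN b ON a.person_id = b.person_id
    JOIN c ON b.alt_id = c.alt_id
""").fetchall()

# Deduplicate b first, so the join only sees one row per (person_id, alt_id).
dedup_first = conn.execute("""
    SELECT a.person_id, a.color, b.alt_id, c.shape
    FROM a
    JOIN (SELECT DISTINCT person_id, alt_id FROM b) AS b ON a.person_id = b.person_id
    JOIN c ON b.alt_id = c.alt_id
    ORDER BY a.person_id, c.shape
""").fetchall()

print(len(plain_join), "rows before DISTINCT;", len(dedup_first), "rows when b is deduplicated first")
for row in dedup_first:
    print(row)
```

On this sample the deduplicated join returns the same set of rows as applying DISTINCT to the plain join. Which form is faster on real data still depends on the optimizer, so the execution plan remains the thing to check.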

You could use aggregation along with a common table expression (or a subquery, but a CTE might be neater). Taking MAX(b.alt_id) relies on each person having a single alt_id, as the question states:
WITH ab AS (
SELECT a.person_id, a.color, MAX(b.alt_id) AS alt_id
FROM a INNER JOIN b
ON a.person_id = b.person_id
GROUP BY a.person_id, a.color
)
SELECT ab.person_id, ab.color, ab.alt_id, c.shape
FROM ab INNER JOIN c ON ab.alt_id = c.alt_id;

Related

Join two tables with no relation postgres?

I have this statement:
SELECT t1.*, t2.* FROM
(SELECT m.* FROM microposts AS m) AS t1
FULL JOIN
(SELECT r.* FROM ratings AS r) AS t2
ON true
I am using Rails and running raw SQL against the database, but the output drops duplicate column names (e.g. user_id) from the second table, and it still returns values from the second table alongside rows of the first even though the two are unrelated. E.g.
+------+-----------+-------+--------+
| m.id | m.content | r.id | rating |
+------+-----------+-------+--------+
| 1 | "hello" | 10 | 5 |
+------+-----------+-------+--------+
There is no relation between tables m and r.
I would like output something like this:
+------+-----------+------+---------+
| m.id | m.content | r.id | rating |
+------+-----------+------+---------+
| 1 | "hello" | null | null |
| null | null | 5 | 4 |
| 2 | "gday" | null | null |
+------+-----------+------+---------+
....................... etc
This is a rather exotic way to write UNION ALL:
SELECT t1.*, t2.*
FROM
(SELECT m.* FROM microposts AS m) AS t1
FULL JOIN
(SELECT r.* FROM ratings AS r) AS t2
ON false
By contrast, ON true produces a Cartesian product.
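For comparison, the same stacked shape can be written directly with UNION ALL and NULL padding. The sketch below runs it in an in-memory SQLite database through Python; the two-column table layouts and the sample values are assumptions based on the desired output in the question, not the real Rails schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, reduced versions of the two unrelated tables.
    CREATE TABLE microposts (id INTEGER, content TEXT);
    CREATE TABLE ratings (id INTEGER, rating INTEGER);
    INSERT INTO microposts VALUES (1, 'hello'), (2, 'gday');
    INSERT INTO ratings VALUES (5, 4);
""")

# Each branch pads the other table's columns with NULL, which is the shape
# that a FULL JOIN with a never-true condition produces.
stacked = conn.execute("""
    SELECT m.id AS m_id, m.content AS m_content, NULL AS r_id, NULL AS rating
    FROM microposts AS m
    UNION ALL
    SELECT NULL, NULL, r.id, r.rating
    FROM ratings AS r
""").fetchall()

# An always-true join condition pairs every micropost with every rating.
cartesian = conn.execute("""
    SELECT m.id, m.content, r.id, r.rating
    FROM microposts AS m
    CROSS JOIN ratings AS r
""").fetchall()

for row in stacked:
    print(row)
print(len(cartesian), "rows in the Cartesian product")
```

Giving the columns explicit aliases (m_id, r_id) also avoids the clash between identically named columns that the question describes.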

select query joining two tables on a range

I have two tables:
Table A with columns
name | tag | price | ref
and Table B with columns:
id | time | min_ref | max_ref
I want the following query: take all columns from table A plus the id and time columns from table B, pairing a row from A with a row from B when A's ref value falls in the range (min_ref, max_ref). Example:
A
name | tag | price | ref
A | aaa | 78 | 456
B | bbb | 19 | 123
C | ccc | 5 | 789
B
id | time | min_ref | max_ref
0 | 26-01-2019 | 100 | 150
1 | 27-01-2019 | 450 | 525
2 | 25-01-2019 | 785 | 800
the query should return:
name | tag | price | ref | id | time
A | aaa | 78 | 456 | 1 | 27-01-2019
B | bbb | 19 | 123 | 0 | 26-01-2019
C | ccc | 5 | 789 | 2 | 25-01-2019
The notation (min_ref, max_ref) for ranges signifies exclusive bounds; inclusive bounds would be written [min_ref, max_ref].
So:
select a.*, b.id, b.time
from a
join b on a.ref > b.min_ref
and a.ref < b.max_ref;
The BETWEEN predicate treats both bounds as inclusive.
I think this is just a join:
select a.*, b.id, b.time
from a join
b
on a.ref between b.min_ref and b.max_ref;
You want a JOIN that combines rows from the two tables on an appropriate criterion. For instance:
SELECT a.name, a.tag, a.price, a.ref, b.id, b.time
FROM a
INNER JOIN b ON b.min_ref <= a.ref AND b.max_ref >= a.ref
The INNER JOIN finds matching rows from the two tables ON a specified criterion. In this case, the criterion is that a.ref lies between b.min_ref and b.max_ref.
You can also use the SQL BETWEEN operator to simplify the condition:
SELECT ...
FROM a
INNER JOIN b ON a.ref BETWEEN b.min_ref AND b.max_ref
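The difference between exclusive and inclusive bounds only shows up when a ref value sits exactly on a boundary. The sketch below loads the sample data into an in-memory SQLite database via Python and adds one hypothetical row, D with ref = 150, which is not in the question and exists only to hit the upper bound of the first range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (name TEXT, tag TEXT, price INTEGER, ref INTEGER);
    CREATE TABLE b (id INTEGER, time TEXT, min_ref INTEGER, max_ref INTEGER);
    INSERT INTO a VALUES ('A', 'aaa', 78, 456), ('B', 'bbb', 19, 123), ('C', 'ccc', 5, 789);
    -- Hypothetical extra row: ref equals max_ref of range 0.
    INSERT INTO a VALUES ('D', 'ddd', 1, 150);
    INSERT INTO b VALUES (0, '26-01-2019', 100, 150),
                         (1, '27-01-2019', 450, 525),
                         (2, '25-01-2019', 785, 800);
""")

# Exclusive bounds: a.ref must lie strictly inside (min_ref, max_ref).
exclusive = conn.execute("""
    SELECT a.name, a.tag, a.price, a.ref, b.id, b.time
    FROM a
    JOIN b ON a.ref > b.min_ref AND a.ref < b.max_ref
    ORDER BY a.name
""").fetchall()

# Inclusive bounds: BETWEEN also matches ref = min_ref and ref = max_ref.
inclusive = conn.execute("""
    SELECT a.name, a.tag, a.price, a.ref, b.id, b.time
    FROM a
    JOIN b ON a.ref BETWEEN b.min_ref AND b.max_ref
    ORDER BY a.name
""").fetchall()

print("exclusive:", exclusive)
print("inclusive:", inclusive)
```

Rows A, B and C match under both conditions; only the boundary row D depends on which form you choose.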

SQL nested loops are too costly

I have an sql query like this:
select w.name, c.address, b.salary, a.product, d.contract_amount
from w
left join c on c.id = w.id
left join b on b.id = w.id
left join a on a.id = w.id and a.date > sysdate-30
left join d on d.id = w.id
where w.id = '12345';
And its plan:
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 849 |18896868| 00:01:14 |
| 1 | NESTED LOOPS OUTER | | 1 | 849 |18896868| 00:01:14 |
| 2 | NESTED LOOPS OUTER | | 1 | 849 |18896868| 00:01:14 |
| 3 | NESTED LOOPS OUTER | | 1 | 670 |18896868| 00:01:14 |
| 4 | NESTED LOOPS OUTER | | 1 | 596 |18896868| 00:01:14 |
| 5 | TABLE ACCESS STORAGE FULL | w | 1 | 415 | 20 | 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID | c | 1 | 22 | 3 | 00:00:01 |
| 7 | INDEX UNIQUE SCAN |c_id_nd| 1 | | | 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID | b | 1 | 66 | 2 | 00:00:01 |
| 9 | INDEX UNIQUE SCAN |b_id_nd| 1 | | | 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID | a | 1 | 11 | 3 | 00:00:01 |
| 11 | INDEX UNIQUE |a_id_nd| 1 | | | 00:00:01 |
| 12 | TABLE ACCESS BY INDEX ROWID | d | 1 | 25 | 1 | 00:00:01 |
| 13 | INDEX UNIQUE |d_id_nd| 1 | | | 00:00:01 |
-----------------------------------------------------------------------------------
It currently runs for about 15-18 seconds, which is too long. I am new to tuning and don't know how to improve its performance. All tables have about 33-54 million rows, and all id columns are indexed. Statistics have been gathered for the tables, and I am not able to use the parallel hint.
What optimizations can I do?
For this query:
select w.name, c.address, b.salary, a.product, d.contract_amount
from w left join
c
on c.id = w.id left join
b
on b.id = w.id left join
a
on a.id = w.id and a.date > sysdate-30 left join
d
on d.id = w.id
where w.id = '12345';
You want indexes on w(id), c(id), b(id), a(id, date), and d(id).
I don't think there is anything wrong with your query; I suspect a bad execution plan was generated initially and is still sitting in the cache. You can rewrite the query in a different way and you will probably get a better plan (e.g. with CTEs). You can also try filtering by id before joining. Try something like this:
with
-- CTE names differ from the table names so they are not read as self-references
w1 as (select id, name from w where id = '12345')
,c1 as (select id, address from c where id = '12345')
,b1 as (select id, salary from b where id = '12345')
,a1 as (select id, product from a where id = '12345' and date > sysdate - 30)
,d1 as (select id, contract_amount from d where id = '12345')
select w1.name, c1.address, b1.salary, a1.product, d1.contract_amount
from w1
left join c1 on c1.id = w1.id
left join b1 on b1.id = w1.id
left join a1 on a1.id = w1.id
left join d1 on d1.id = w1.id
Or this:
with
W1 as (select w.id, w.name from w where w.id = '12345')
,W2 as (select w1.* , c.address from W1 left outer join C on w1.id = c.id)
,W3 as (select w2.*, b.salary from W2 left outer join B on w2.id = b.id)
,W4 as (select w3.*, a.product from W3 left outer join A on w3.id = a.id and a.date > sysdate - 30)
Select w4.*, d.contract_amount from W4 left outer join D on w4.id = d.id
The tables hold around 35 million records. Are they partitioned? If so, does the query benefit from partition pruning?
I think the problem lies in the cardinality estimator. With multiple left joins from what are probably master tables to detail tables, the estimated number of rows returned can be wrong, and a poor cardinality estimate may lead to a poor plan. I suggest trying the isolated selects proposed by Mike and comparing timings. I am not sure how well CTEs perform in Oracle, so I would recommend fully separate statements, even if you have to use temporary or in-memory tables: select from each table alone using your id value, put the results into a temporary table, then perform the final select on those temporary tables.
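The staging idea can be sketched as follows. This is only an illustration in an in-memory SQLite database driven from Python, not Oracle: the sample rows, the temporary table names (w_t, c_t, and so on) and the fixed cutoff date standing in for sysdate - 30 are all assumptions. It shows that staging each table by id and joining the staged rows gives the same result as the original query; it says nothing about Oracle timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE w (id TEXT, name TEXT);
    CREATE TABLE c (id TEXT, address TEXT);
    CREATE TABLE b (id TEXT, salary INTEGER);
    CREATE TABLE a (id TEXT, product TEXT, date TEXT);
    CREATE TABLE d (id TEXT, contract_amount INTEGER);
    -- Hypothetical sample rows; d deliberately has no row for id 12345.
    INSERT INTO w VALUES ('12345', 'Ann'), ('99999', 'Bob');
    INSERT INTO c VALUES ('12345', '1 Main St'), ('99999', '2 Side St');
    INSERT INTO b VALUES ('12345', 5000);
    INSERT INTO a VALUES ('12345', 'widget', '2024-01-20'), ('12345', 'gadget', '2023-01-01');
    INSERT INTO d VALUES ('99999', 70);
    -- Temporary staging tables, one per source table.
    CREATE TEMP TABLE w_t (id TEXT, name TEXT);
    CREATE TEMP TABLE c_t (id TEXT, address TEXT);
    CREATE TEMP TABLE b_t (id TEXT, salary INTEGER);
    CREATE TEMP TABLE a_t (id TEXT, product TEXT);
    CREATE TEMP TABLE d_t (id TEXT, contract_amount INTEGER);
""")

target_id = "12345"
cutoff = "2024-01-01"  # fixed stand-in for sysdate - 30, to keep the sketch deterministic

# Original shape: filter on w.id and let the joins find the matching rows.
direct = conn.execute("""
    SELECT w.name, c.address, b.salary, a.product, d.contract_amount
    FROM w
    LEFT JOIN c ON c.id = w.id
    LEFT JOIN b ON b.id = w.id
    LEFT JOIN a ON a.id = w.id AND a.date > ?
    LEFT JOIN d ON d.id = w.id
    WHERE w.id = ?
""", (cutoff, target_id)).fetchall()

# Staged shape: select from each table alone by id, then join the small tables.
conn.execute("INSERT INTO w_t SELECT id, name FROM w WHERE id = ?", (target_id,))
conn.execute("INSERT INTO c_t SELECT id, address FROM c WHERE id = ?", (target_id,))
conn.execute("INSERT INTO b_t SELECT id, salary FROM b WHERE id = ?", (target_id,))
conn.execute("INSERT INTO a_t SELECT id, product FROM a WHERE id = ? AND date > ?",
             (target_id, cutoff))
conn.execute("INSERT INTO d_t SELECT id, contract_amount FROM d WHERE id = ?", (target_id,))

staged = conn.execute("""
    SELECT w_t.name, c_t.address, b_t.salary, a_t.product, d_t.contract_amount
    FROM w_t
    LEFT JOIN c_t ON c_t.id = w_t.id
    LEFT JOIN b_t ON b_t.id = w_t.id
    LEFT JOIN a_t ON a_t.id = w_t.id
    LEFT JOIN d_t ON d_t.id = w_t.id
""").fetchall()

print(direct)
print(staged)
```

Each staging statement touches a single table with a single id predicate, which gives the optimizer little room to misestimate; whether that actually beats the single query has to be measured on the real system.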

Return multiple columns with 3 distinct fields in SQL query for Access DB

I am trying to write a query that returns multiple fields, keeping the first three as a distinct combination and returning the values from the most recently modified row. Some of the fields come from more than one table, and one of them has a True/False criterion too. The three distinct fields are needed because their combination is associated with the other returned values.
The tables look roughly as follows...
Table a:
ID | Sc | Country | TechID | VarA | ... | VarX(T/F) | LastModified
1 | 1 | AA | 1 | x | ... | T | 1-1-2017
2 | 1 | AA | 1 | z | ... | T | 1-1-2017
3 | 1 | AA | 2 | y | ... | T | 1-1-2018
4 | 1 | AB | 1 | u | ... | T | 1-1-2017
5 | 2 | AB | 2 | v | ... | T | 1-1-2018
6 | 3 | AB | 1 | w | ... | F | 1-1-2018
Table b:
TechID | TechName | Categ | Units
1 | Tech1 | Cat1 | M
2 | Tech2 | Cat2 | N
3 | Tech3 | Cat3 | P
The idea is that the query returns something like this (when the T/F criterion is met), where each combination of Sc-Country-Tech shows up only once, with the last modified row taking precedence:
Sc' | Country' |TechName'| Units | Cat | VarA... | LastModified |
1 | AA | 1 | ... | ... | ... | 1-1-2018
1 | AB | 2 | ... | ... | ... | 1-1-2017
2 | AB | 1 | ... | ... | ... | 1-1-2018
So far I've tried a few SQL lines to no avail. First, with Select DISTINCT but the option was too "all inclusive".
SELECT DISTINCT a.Sc, a.Country, b.TechName, b.Units, b.Cat, a.VarA,..,a.VarX, Max(a.LastModified) AS MaxOfLastModified
FROM a INNER JOIN (b INNER JOIN a ON b.TechName =
a.TechID) ON b.Cat = a.TechID
GROUP BY a.Sc, a.Country, b.TechName, b.Units, b.Cat, a.VarA,..,a.VarX
HAVING (((a.VarX)=True));
Also tried this but it prompts errors related to aggregate functions:
SELECT a.Sc, a.Country, b.TechID, b.Units, b.Cat, a.VarA,..,a.VarX, Max(a.LastModified) AS MaxOfLastModified
FROM a INNER JOIN (b INNER JOIN a ON b.TechName =
a.TechID) ON b.Cat = a.TechID
GROUP BY a.Sc, a.Country, a.TechID
HAVING (((a.VarX)=True));
Any thoughts/suggestions on how to go about this?? Any pointers to previous related answers are also much appreciated.
Thanks in advance! :)
EDIT (2017.09.29):
This certainly cleared things up a bit!
I managed to get the query going with some of the fields, only when calling fields from a single table with the following:
SELECT a.Sc, a.Country, a.Tech, a.LastModified, a.VarA
FROM a INNER JOIN (SELECT Sc, Country, Tech, max(LastModified) AS lm FROM a GROUP BY Sc, Country, Tech) AS dt ON (dt.lm=a.LastModified) AND (dt.Tech=a.Tech) AND (dt.Country=a.Country) AND (dt.Sc=a.Sc)
GROUP BY a.Sc, a.Country, a.Tech, a.LastModified, a.VarA, a.VarX
HAVING (((a.VarX)=Yes));
I'm still running into a syntax error on JOIN when trying to add fields from a lookup table using the INNER JOIN command as suggested. The code I tried looked something like:
SELECT a.Sc, a.Country, a.Tech, a.LastModified, a.VarA b.TechCategory
FROM a INNER JOIN (SELECT Sc, Country, Tech, max(LastModified) AS lm FROM a GROUP BY Sc, Country, Tech) AS dt ON (dt.lm=a.LastModified) AND (dt.Tech=a.Tech) AND (dt.Country=a.Country) AND (dt.Sc=a.Sc)
INNER JOIN b ON Tech.Category=a.Tech
GROUP BY a.Sc, a.Country, a.Tech, b.TechCategory, a.LastModified, a.VarA, a.VarX
HAVING (((a.VarX)=Yes));
Any additional pointers are much appreciated!
Use an aggregate query to get the maximum date for each combination of Sc, Country, and TechID, then use this as a subquery and join it back to tables a and b to get the data in your final query. Something like this:
select
a.Sc, a.Country, b.TechName,
b.Units, b.Category, a.VarA, a.LastModified
from
(Table_a as a
inner join (
select Sc, Country, TechID, max(LastModified) as lm
from Table_a
group by Sc, Country, TechID
) as dt on dt.Sc=a.Sc and dt.Country=a.Country and dt.TechID=a.TechID and dt.lm=a.LastModified)
inner join Table_b as b on b.TechID=a.TechID
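The pattern above (aggregate to find the latest date per group, then join back to pick up the remaining columns) can be tried on a small example. The sketch below uses an in-memory SQLite database from Python rather than Access, so the join parentheses that Access requires are dropped; the sample rows are hypothetical and use ISO dates so that MAX compares them correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_a (ID INTEGER, Sc INTEGER, Country TEXT, TechID INTEGER,
                          VarA TEXT, LastModified TEXT);
    CREATE TABLE Table_b (TechID INTEGER, TechName TEXT, Category TEXT, Units TEXT);
    -- Hypothetical rows: IDs 1 and 2 share (Sc, Country, TechID) but differ in date.
    INSERT INTO Table_a VALUES
        (1, 1, 'AA', 1, 'x', '2017-01-01'),
        (2, 1, 'AA', 1, 'z', '2018-01-01'),
        (3, 1, 'AB', 2, 'y', '2017-06-01');
    INSERT INTO Table_b VALUES (1, 'Tech1', 'Cat1', 'M'), (2, 'Tech2', 'Cat2', 'N');
""")

latest_rows = conn.execute("""
    SELECT a.Sc, a.Country, b.TechName, a.VarA, a.LastModified
    FROM Table_a AS a
    INNER JOIN (
        -- Latest modification date per (Sc, Country, TechID) combination.
        SELECT Sc, Country, TechID, MAX(LastModified) AS lm
        FROM Table_a
        GROUP BY Sc, Country, TechID
    ) AS dt
        ON dt.Sc = a.Sc AND dt.Country = a.Country
       AND dt.TechID = a.TechID AND dt.lm = a.LastModified
    INNER JOIN Table_b AS b ON b.TechID = a.TechID
    ORDER BY a.Sc, a.Country, b.TechName
""").fetchall()

for row in latest_rows:
    print(row)
```

Note that if two rows in the same group share the same latest LastModified value, both are returned, so a tie-breaker (for example the highest ID) would be needed to guarantee one row per combination.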

sql opposite of join

I have two tables which are combined via a MAP table
Table ANIMAL:
+------+--------------+
| id | description |
+------+--------------+
| 2 | Ape |
| 3 | Lion |
+------+--------------+
Table MAP:
+-----------+---------+
| animal_id | legs_id |
+-----------+---------+
| 2 | 11 |
+-----------+---------+
Table LEGS:
+------+--------------+
| id | legs |
+------+--------------+
| 10 | 4 |
| 11 | 2 |
+------+--------------+
I need the animals that have no map entry in the LEGS table, something like this:
!(select *
from ANIMAL as a
JOIN MAP as m ON (a.id = m.animal_id)
JOIN LEGS as l ON (m.legs_id = l.id) )
which should give me 'Lion' as result
Use a LEFT JOIN:
SELECT a.*
FROM animal a
LEFT JOIN Map b
On a.id = b.animal_id
WHERE b.animal_id IS NULL
SQLFiddle Demo
To learn more about joins, see the link below:
Visual Representation of SQL Joins
Try this:
SELECT a.*
FROM animal a
WHERE a.id NOT IN (SELECT animal_id FROM Map m JOIN Legs l
ON m.legs_id = l.id)
Select * from Animal A
left join Map M on A.id=M.animal_id
where M.animal_id is null;
Could you not do a simple query to return all animals with no associated Map record? I.e.
SELECT
*
FROM
Animal
WHERE
id NOT IN (SELECT animal_id FROM Map)
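The anti-join variants above can be checked against the sample data. The sketch below loads the three tables into an in-memory SQLite database from Python (an assumption for illustration; the question does not name a DBMS) and runs the LEFT JOIN ... IS NULL form, the NOT IN form, and a NOT EXISTS form, which is a third common way to write the same thing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE animal (id INTEGER, description TEXT);
    CREATE TABLE map (animal_id INTEGER, legs_id INTEGER);
    CREATE TABLE legs (id INTEGER, legs INTEGER);
    INSERT INTO animal VALUES (2, 'Ape'), (3, 'Lion');
    INSERT INTO map VALUES (2, 11);
    INSERT INTO legs VALUES (10, 4), (11, 2);
""")

# LEFT JOIN keeps every animal; unmatched ones have NULL in the map columns.
left_join_null = conn.execute("""
    SELECT a.id, a.description
    FROM animal AS a
    LEFT JOIN map AS m ON a.id = m.animal_id
    WHERE m.animal_id IS NULL
""").fetchall()

# NOT IN against the animals that do reach a legs row through map.
not_in = conn.execute("""
    SELECT a.id, a.description
    FROM animal AS a
    WHERE a.id NOT IN (SELECT m.animal_id
                       FROM map AS m
                       JOIN legs AS l ON m.legs_id = l.id)
""").fetchall()

# NOT EXISTS with a correlated subquery expresses the same condition.
not_exists = conn.execute("""
    SELECT a.id, a.description
    FROM animal AS a
    WHERE NOT EXISTS (SELECT 1
                      FROM map AS m
                      JOIN legs AS l ON m.legs_id = l.id
                      WHERE m.animal_id = a.id)
""").fetchall()

print(left_join_null, not_in, not_exists)
```

All three return only Lion here. One difference worth knowing: if map.animal_id can contain NULL, the NOT IN form returns no rows at all, while the LEFT JOIN and NOT EXISTS forms are unaffected.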