DISTINCT in JOIN

I have a question in Oracle SQL.
To simplify my problem, let's say I have two tables:
TAB1:
Usr  Fruit
1    A
1    A
1    A
1    C
1    C
2    A
2    A
2    E

TAB2:
Fruit  Calories
A      100
B      200
C      150
D      400
E      50
Note that TAB1 deliberately contains duplicate entries.
Now I want to know the calories for usr 1. But when I join both tables with
SELECT TAB2.calories from TAB1
JOIN TAB2 ON TAB1.Fruit = TAB2.Fruit
WHERE TAB1.Usr = 1;
I get duplicate results for the duplicate entries. I could of course use DISTINCT in the SELECT list, but is it possible to reduce the values to the distinct ones (A and C) directly in the join? I am sure that would improve performance on my (much larger) real tables.
Thanks!

I'm a big fan of the semi-join. For tables this small, it won't matter, but for larger tables it can make a big difference:
select
tab2.calories
from tab2
where exists (
select null
from tab1
where tab1.fruit = tab2.fruit and tab1.usr = 1
)
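To see the difference between the plain join and the semi-join on the sample data, here is a minimal sketch. It runs the two queries in an in-memory SQLite database through Python's sqlite3 module, which is an assumption made only so the example is self-contained (the question is about Oracle). It shows row counts, not performance.

```python
import sqlite3

# In-memory SQLite database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (usr INTEGER, fruit TEXT);
    CREATE TABLE tab2 (fruit TEXT, calories INTEGER);
    INSERT INTO tab1 VALUES
        (1,'A'),(1,'A'),(1,'A'),(1,'C'),(1,'C'),(2,'A'),(2,'A'),(2,'E');
    INSERT INTO tab2 VALUES
        ('A',100),('B',200),('C',150),('D',400),('E',50);
""")

# Plain join: one output row per matching row in tab1, so duplicates appear.
join_rows = conn.execute("""
    SELECT tab2.calories
    FROM tab1
    JOIN tab2 ON tab1.fruit = tab2.fruit
    WHERE tab1.usr = 1
    ORDER BY tab2.calories
""").fetchall()

# Semi-join: each tab2 row is returned at most once, however many
# tab1 rows match it.
semi_rows = conn.execute("""
    SELECT tab2.calories
    FROM tab2
    WHERE EXISTS (
        SELECT NULL
        FROM tab1
        WHERE tab1.fruit = tab2.fruit AND tab1.usr = 1
    )
    ORDER BY tab2.calories
""").fetchall()

print(join_rows)  # five rows: 100 three times, 150 twice
print(semi_rows)  # two rows: 100 and 150
```

Whether the semi-join is actually faster on the real tables depends on the optimizer and the available indexes, so it is worth comparing execution plans.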

Try this (note that Oracle does not accept AS before a table alias):
SELECT TAB2.calories
from (select distinct usr, fruit from TAB1) T1
JOIN TAB2 ON T1.Fruit = TAB2.Fruit
WHERE T1.Usr = 1;

You should do the distinct before the join:
select sum(tab2.calories) as TotalCalories
from (select distinct tab1.*
from tab1
) t1 join
tab2
on t1.fruit = tab2.fruit
where t1.usr = 1;
Also, to add the values, use an aggregation function.
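As a quick check of why the DISTINCT has to happen before the join when summing, here is a small sketch on the question's sample data. It uses an in-memory SQLite database via Python's sqlite3 module, which is an assumption for the sake of a runnable example (the question targets Oracle).

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (usr INTEGER, fruit TEXT);
    CREATE TABLE tab2 (fruit TEXT, calories INTEGER);
    INSERT INTO tab1 VALUES
        (1,'A'),(1,'A'),(1,'A'),(1,'C'),(1,'C'),(2,'A'),(2,'A'),(2,'E');
    INSERT INTO tab2 VALUES
        ('A',100),('B',200),('C',150),('D',400),('E',50);
""")

# Summing over the raw join counts every duplicate row in tab1.
naive_total = conn.execute("""
    SELECT SUM(tab2.calories)
    FROM tab1
    JOIN tab2 ON tab1.fruit = tab2.fruit
    WHERE tab1.usr = 1
""").fetchone()[0]

# De-duplicating tab1 in a derived table first counts each fruit once.
distinct_total = conn.execute("""
    SELECT SUM(tab2.calories)
    FROM (SELECT DISTINCT usr, fruit FROM tab1) t1
    JOIN tab2 ON t1.fruit = tab2.fruit
    WHERE t1.usr = 1
""").fetchone()[0]

print(naive_total)     # 3 * 100 + 2 * 150 = 600
print(distinct_total)  # 100 + 150 = 250
```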

Since you select nothing from TAB1, and you may have a useful index, I'd go for an IN instead of the join:
SELECT TAB2.calories
FROM TAB2
WHERE TAB2.Fruit IN ( SELECT TAB1.Fruit FROM TAB1 WHERE TAB1.Usr = 1)
I'm pretty sure this one will take longer, but you can still try it:
SELECT TAB2.calories
FROM TAB2
WHERE TAB2.Fruit IN ( SELECT DISTINCT TAB1.Fruit FROM TAB1 WHERE TAB1.Usr = 1)

Related

SQL join sum groupby where

I have the following two tables:
fiddle
As a result I need a list of 'name' values from tab1 where the sum of 'dur' is 10 or more. The two tables are connected through 'number' in tab1, which can appear in either column 'xxx' or column 'yyy' of tab2.
So the expected output should be: Jack, Anna
That is because the sum of 'dur' for Jack (1234) is 10 and the sum of 'dur' for Anna (7582) is 12.
So far I know how to get the sum based on the single column 'xxx':
SELECT tab1.name FROM tab1
INNER JOIN (
SELECT xxx, SUM(dur) AS total_dur
FROM tab2
GROUP BY xxx)
tab2 ON tab1.number=tab2.xxx
WHERE total_dur >=10
but how do I also consider the second 'yyy' column?

You can use OR in the join condition to check two columns. Using HAVING might also make the query a bit more readable.
SELECT name FROM tab1
JOIN tab2 ON xxx = number OR yyy = number
GROUP BY name
HAVING SUM(dur) >= 10
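Here is a minimal sketch of the OR join with HAVING. The original tables were only available in the linked fiddle, so the rows below are hypothetical: they are chosen so that Jack (1234) sums to 10 and Anna (7582) sums to 12, as stated in the question, and a made-up third person, Bob, stays below the threshold. SQLite via Python's sqlite3 module is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (name TEXT, number INTEGER);
    CREATE TABLE tab2 (xxx INTEGER, yyy INTEGER, dur INTEGER);
    -- Hypothetical rows; the real data lived in the linked fiddle.
    INSERT INTO tab1 VALUES ('Jack', 1234), ('Anna', 7582), ('Bob', 9999);
    INSERT INTO tab2 VALUES
        (1234, 7582, 4),   -- counts for Jack (xxx) and for Anna (yyy)
        (1234, 1111, 6),   -- counts for Jack
        (7582, 2222, 8),   -- counts for Anna
        (9999, 3333, 3);   -- counts for Bob
""")

# A tab2 row is matched to a person if their number is in xxx OR yyy.
names = conn.execute("""
    SELECT tab1.name
    FROM tab1
    JOIN tab2 ON tab2.xxx = tab1.number OR tab2.yyy = tab1.number
    GROUP BY tab1.name
    HAVING SUM(tab2.dur) >= 10
    ORDER BY tab1.name
""").fetchall()

print(names)  # Anna (4 + 8 = 12) and Jack (4 + 6 = 10); Bob (3) is filtered out
```

Note that grouping by name merges people who share a name; grouping by number as well would avoid that if names are not unique.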

You can unpivot the data. In Postgres, I would suggest a lateral join:
SELECT tab1.name
FROM tab1 INNER JOIN
(SELECT v.col, SUM(dur) AS total_dur
FROM tab2 CROSS JOIN LATERAL
(VALUES (tab2.xxx), (tab2.yyy)) v(col)
GROUP BY v.col
) tab2
ON tab1.number = tab2.col
WHERE total_dur >= 10;
Here is a db<>fiddle.

Join table and pick rows where for given id exists only one value

I don't know if the title is good, so let me illustrate.
I have two tables, and for this case I need to select the rows where the payment currency was ONLY EUR.
The correct document IDs would be: 2, 3, 4, 5
The real tables are much bigger, with 900k+ records.
Can you please suggest what the query should look like?

Use a correlated subquery with NOT EXISTS:
select distinct a.document_id from tablename a inner join tablename b on a.document_id=b.payment_docid
where not exists
(select 1 from tablename b1 where b1.payment_docid=b.payment_docid and b1.currency<>'EUR')

Try this query:
select payment_docId from MyTable
group by payment_docId
having max(currency) = 'EUR'
and min(currency) = 'EUR'
Alternatively, you could use having count(distinct currency) = 1 together with min or max.

Use a correlated subquery:
select t1.* from table2 as t1
where exists( select 1 from table2 t2 where t1.payment_docid=t2.payment_docid
having count(distinct currency)=1)
and currency='EUR'

It is possible to use INNER JOIN with the following conditions to get all rows:
SELECT
pd.payment_doc_id
, pd.currency
FROM DocTable dt
INNER JOIN PaymentDocs pd
ON dt.document_id = pd.payment_doc_id AND pd.currency IN ('EUR')
If you want distinct rows, you can add a GROUP BY:
SELECT
pd.payment_doc_id
, pd.currency
FROM DocTable dt
INNER JOIN PaymentDocs pd
ON dt.document_id = pd.payment_doc_id AND pd.currency IN ('EUR')
GROUP BY pd.payment_doc_id
, pd.currency

Aggregation is the only efficient way:
select doc_id
from table t
group by doc_id
having min(currency) = max(currency) and min(currency) = 'EUR';
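A small sketch of the aggregation approach follows. The question's tables were not included in the text, so the table name payments and its rows are hypothetical; they are chosen so that documents 2, 3, 4 and 5 are the ones paid only in EUR, matching the expected result. SQLite via Python's sqlite3 module is an assumption as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical payments table; names and rows are illustrative only.
    CREATE TABLE payments (payment_docid INTEGER, currency TEXT);
    INSERT INTO payments VALUES
        (1, 'EUR'), (1, 'USD'),   -- mixed currencies: excluded
        (2, 'EUR'),
        (3, 'EUR'), (3, 'EUR'),   -- several payments, all EUR: included
        (4, 'EUR'),
        (5, 'EUR'),
        (6, 'USD');               -- single currency, but not EUR: excluded
""")

# MIN = MAX means the group has exactly one distinct currency;
# the second condition pins that currency to EUR.
eur_only = conn.execute("""
    SELECT payment_docid
    FROM payments
    GROUP BY payment_docid
    HAVING MIN(currency) = MAX(currency) AND MIN(currency) = 'EUR'
    ORDER BY payment_docid
""").fetchall()

print(eur_only)  # documents 2, 3, 4 and 5
```

Rows with a NULL currency are ignored by MIN and MAX, so a document with EUR payments plus NULLs would still be returned by this query.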

Querying two tables to filter data using select case

I have two tables
Table 1 looks like this
ID Repeats
-----------
A 1
A 1
A 0
B 2
B 2
C 2
D 1
Table 2 looks like this
ID values
-----------
A 100
B 200
C 100
D 300
Using a view I need a result like this
ID values Repeats
-------------------
A 100 NA
B 200 2
C 100 2
D 300 1
That means I want each unique ID together with its values and Repeats. Repeats should display NA when there are multiple distinct Repeats values for a single ID, and the Repeats value itself when there is only one.
Initially I needed to display the max value of Repeats, so I tried the following view:
ALTER VIEW [dbo].[BookingView1]
AS
SELECT bv.*, bd2.Repeats FROM Table1 bv
JOIN
(
SELECT distinct bd.id, bd.Repeats FROM table2 bd
JOIN
(
SELECT Id, MAX(Repeats) AS MaxRepeatCount
FROM table2
GROUP BY Id
) bd1
ON bd.Id = bd1.Id
AND bd.Repeats = bd1.MaxRepeatCount
) bd2
ON bv.Id = bd2.Id;
This returns the correct result, but when I try to add the CASE it no longer returns unique IDs. Please help!

One method uses outer apply:
select t2.*, t1.repeats
from table2 t2 outer apply
(select (case when max(repeats) = min(repeats) then max(repeats)
else 'NA'
end) as repeats
from table1 t1
where t1.id = t2.id
) t1;
Two notes:
This assumes that repeats is a string. If it is a number, you need to cast it to a string.
It also assumes that repeats is never NULL.
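The MAX = MIN test in the CASE expression can be tried out on the question's sample data. OUTER APPLY is SQL Server syntax, so this sketch swaps in a correlated scalar subquery with the same CASE logic and runs it in SQLite through Python's sqlite3 module. Both the substitution and the column name val (used in place of values, which is a keyword) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id TEXT, repeats INTEGER);
    CREATE TABLE table2 (id TEXT, val INTEGER);  -- val stands in for "values"
    INSERT INTO table1 VALUES
        ('A',1),('A',1),('A',0),('B',2),('B',2),('C',2),('D',1);
    INSERT INTO table2 VALUES ('A',100),('B',200),('C',100),('D',300);
""")

# For each ID, MAX = MIN means every repeats value is the same, so it can
# be shown (cast to text, because the other CASE branch is the string 'NA').
rows = conn.execute("""
    SELECT t2.id,
           t2.val,
           (SELECT CASE WHEN MAX(t1.repeats) = MIN(t1.repeats)
                        THEN CAST(MAX(t1.repeats) AS TEXT)
                        ELSE 'NA'
                   END
            FROM table1 t1
            WHERE t1.id = t2.id) AS repeats
    FROM table2 t2
    ORDER BY t2.id
""").fetchall()

for row in rows:
    print(row)  # A is 'NA' (values 1 and 0); B, C, D show their single value
```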

For the sake of completeness, I'm including another approach that will work if repeats is NULL. However, Gordon's answer has a much simpler query plan and should be preferred.
Option 1 (Works with NULLs):
SELECT
t1.ID, t2.[Values],
CASE
WHEN COUNT(*) > 1 THEN 'NA'
ELSE CAST(MAX(Repeats) AS VARCHAR(2))
END Repeats
FROM (
SELECT DISTINCT t1.ID, t1.Repeats
FROM #table1 t1
) t1
LEFT OUTER JOIN #table2 t2
ON t1.ID = t2.ID
GROUP BY t1.ID, t2.[Values]
Option 2 (does not contain explicit subqueries, but does not work with NULLs):
SELECT DISTINCT
t1.ID,
t2.[Values],
CASE
WHEN COUNT(t1.Repeats) OVER (PARTITION BY COUNT(DISTINCT t1.Repeats), t1.ID) > 1 THEN 'NA'
ELSE CAST(t1.Repeats AS VARCHAR(2))
END Repeats
FROM #table1 t1
LEFT OUTER JOIN #table2 t2
ON t1.ID = t2.ID
GROUP BY t1.ID, t2.[Values], t1.Repeats
NOTE:
This may not give desired results if table2 has different values for the same ID.

LEFT JOIN on 3 tables to get a value

I'm trying to create a new interface for a database, but I don't know how to do what I want.
I have 3 tables :
- table1(id1, time, ...)
id11 ..
id12 ..
id13 ..
- table2(id2, price, ...)
id21 ..
id22 ..
id23 ..
- table1_table2(#id1, #id2, value)
id11, id22, 6
id11, id23, 10
id13, id22, 5
So I want to have something like this :
id11, id21, 0
id11, id22, 6
id11, id23, 10
id12, id21, 0
id12, id22, 0
id12, id23, 0
id13, id21, 0
id13, id22, 5
id13, id23, 0
I've tried lots of queries, but nothing worked.
Please help me ^^
EDIT: I'm using Access ( :'( ) 2007, and apparently it doesn't support CROSS JOIN...
I tried to use this: http://blog.jooq.org/2014/02/12/no-cross-join-in-ms-access/
but I still get a syntax error on the JOIN or the FROM.
EDIT 2: Here is my query (I'm French, so please don't mind the names ^^)
SELECT Chantier.id_chantier, Indicateur.id_indicateur, Indicateur_chantier.valeur
FROM ((Chantier INNER JOIN Indicateur ON (Chantier.id_chantier*0 = Indicateur.id_indicateur*0))
LEFT JOIN Indicateur_chantier ON ( (Chantier.id_chantier = Indicateur_chantier.id_chantier)
AND (Indicateur.id_indicateur = Indicateur_chantier.id_indicateur) ) )

You should first cross join table1 and table2 to produce their Cartesian product, and then left join table1_table2 to get the values where they exist:
SELECT t1.id1,t2.id2,ISNULL(t12.value,0)
FROM table1 t1
CROSS JOIN table2 t2
LEFT JOIN table1_table2 t12 on t12.id1=t1.id1 and t12.id2=t2.id2
Finally, use ISNULL to replace the NULL values with zeros.

The answer may vary by database, but this works in SQL Server. You need a CROSS JOIN to get every combination of table1 and table2, then a LEFT JOIN to return the pairs with values:
SELECT a.id1, b.id2, COALESCE(c.value,0)
FROM table1 a
CROSS JOIN table2 b
LEFT JOIN table3 c
ON a.id1 = c.id1
AND b.id2 = c.id2
Pairs without values would return NULL, so you can use COALESCE() to return 0 instead.
Demo: SQL Fiddle

In your question you say that Access "doesn't support CROSS JOIN". While it is true that Access SQL does not support
... FROM tableX CROSS JOIN tableY ...
you can perform a cross join in Access by simply using
... FROM tableX, tableY ...
In your case,
SELECT
crossjoin.id1,
crossjoin.id2,
Nz(table1_table2.value, 0) AS [value]
FROM
(
SELECT table1.id1, table2.id2
FROM table1, table2
) AS crossjoin
LEFT JOIN
table1_table2
ON table1_table2.id1 = crossjoin.id1
AND table1_table2.id2 = crossjoin.id2
ORDER BY crossjoin.id1, crossjoin.id2
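The cross-join-then-left-join pattern can be checked against the expected output in the question. This is a sketch in SQLite through Python's sqlite3 module, which is an assumption: SQLite accepts CROSS JOIN and COALESCE, whereas Access would need the comma join and Nz shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id1 TEXT);
    CREATE TABLE table2 (id2 TEXT);
    CREATE TABLE table1_table2 (id1 TEXT, id2 TEXT, value INTEGER);
    INSERT INTO table1 VALUES ('id11'), ('id12'), ('id13');
    INSERT INTO table2 VALUES ('id21'), ('id22'), ('id23');
    INSERT INTO table1_table2 VALUES
        ('id11','id22',6), ('id11','id23',10), ('id13','id22',5);
""")

# Step 1: the cross join builds every (id1, id2) pair.
# Step 2: the left join attaches a value where one exists.
# Step 3: COALESCE turns the remaining NULLs into 0.
rows = conn.execute("""
    SELECT t1.id1, t2.id2, COALESCE(t12.value, 0) AS value
    FROM table1 t1
    CROSS JOIN table2 t2
    LEFT JOIN table1_table2 t12
           ON t12.id1 = t1.id1 AND t12.id2 = t2.id2
    ORDER BY t1.id1, t2.id2
""").fetchall()

for row in rows:
    print(row)  # nine rows, one per pair, with 0 where no value is stored
```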

Multiple count based on dynamic criteria

I have two tables, and I want to compare the number of times each ID appears in them.
TAB1:
ID Sequence
A2D 1
A2D 2
A2D 3
A3D 1
TAB2:
ID Sequence
A2D 1
A2D 2
A3D 1
A3D 2
Now, for this example, I am trying to get this result:
ID Table1 Table2
A2D 3 2
A3D 1 2
I have tried the following code without any success:
SELECT R1.ID as ID, COUNT(R1.ID) as Table1,
COUNT(R2.ID) as Table2
FROM TAB1 AS R1, TAB2 AS R2
WHERE R1.ID = R2.ID
GROUP BY R1.ID
This one gave me wrong count values...
And this one simply crashes:
select
(
select count(*) as Table1
from TAB1
where ID = R1.ID
),(
select count(*) as Table2
from TAB2
where ID= R1.ID
)
FROM TAB1 AS R1
As you can see, though, I am trying to make my criteria dynamic. Most examples I found used hard-coded criteria. In my case, I want the query to take each ID from my first table, count the number of times it appears, do the same for the second table with the same ID, then move on to the next ID.
If my question lacks information or is confusing just ask me, I'll do my best to be more precise.
Thanks in advance !

Here I am using a UNION ALL as a subquery
SELECT ID, SUM(T1) AS Table1, SUM(T2) AS Table2
FROM
(SELECT ID, COUNT(ID) AS T1, 0 AS T2 FROM TAB1 GROUP BY ID
UNION ALL
SELECT ID, 0 AS T1, COUNT(ID) AS T2 FROM TAB2 GROUP BY ID)
GROUP BY ID
HAVING SUM(T1)>0 AND SUM(T2)>0
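Here is a minimal sketch of the UNION ALL approach on the question's sample data. It runs in SQLite through Python's sqlite3 module, which is an assumption (the thread appears to be about Access); the derived-table alias counts and the column name seq are illustrative names that are not in the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (id TEXT, seq INTEGER);
    CREATE TABLE tab2 (id TEXT, seq INTEGER);
    INSERT INTO tab1 VALUES ('A2D',1), ('A2D',2), ('A2D',3), ('A3D',1);
    INSERT INTO tab2 VALUES ('A2D',1), ('A2D',2), ('A3D',1), ('A3D',2);
""")

# Each table is counted on its own, with a 0 placeholder for the other
# table's column; the outer SUM then folds the two rows per ID into one.
rows = conn.execute("""
    SELECT id, SUM(t1) AS table1, SUM(t2) AS table2
    FROM (
        SELECT id, COUNT(id) AS t1, 0 AS t2 FROM tab1 GROUP BY id
        UNION ALL
        SELECT id, 0 AS t1, COUNT(id) AS t2 FROM tab2 GROUP BY id
    ) AS counts
    GROUP BY id
    HAVING SUM(t1) > 0 AND SUM(t2) > 0
    ORDER BY id
""").fetchall()

print(rows)  # A2D: 3 and 2; A3D: 1 and 2
```

Counting each table separately avoids the inflated numbers from the first attempt in the question, where joining the tables on ID multiplies the rows before they are counted. The HAVING clause keeps only IDs that appear in both tables; dropping it would also list IDs found in just one.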

I used a different approach, but unfortunately I have to use two queries; I still don't know if they can be combined. The first one just computes the counts for both tables and combines the results:
SELECT "Tab1" AS [Table], Tab1.ID, Count(*) AS Total
FROM Tab1
GROUP BY "Tab1", Tab1.ID
UNION SELECT "Tab2" AS [Table], Tab2.ID, Count(*) AS Total
FROM Tab2
GROUP BY "Tab2", Tab2.ID
Then, since Access supports pivot queries, you can run this on top of the first query (saved here as qrySums):
TRANSFORM Sum(qrySums.[Total]) AS Total
SELECT qrySums.[ID]
FROM qrySums
GROUP BY qrySums.[ID]
PIVOT qrySums.[Table];

Not sure if I understand your question, but you could try something like this:
SELECT DISTINCT t.ID,
(SELECT COUNT(ID) FROM TAB1 WHERE ID = t.ID) AS table1,
(SELECT COUNT(ID) FROM TAB2 WHERE ID = t.ID) AS table2
FROM TAB1 t

To get the desired results, I broke it down into two sub-queries (R1SQ and R2SQ) and a main UNION query - R1R2 that uses inner, left and right joins to include all row entries including those rows that do not appear in both tables:
R1SQ
SELECT R1.Builder, Count(R1.Builder) AS Table1
FROM R1
GROUP BY R1.Builder;
R2SQ
SELECT R2.Builder_E, Count(R2.Builder_E) AS Table2
FROM R2
GROUP BY R2.Builder_E;
R1R2
SELECT R1SQ.Builder, R1SQ.Table1, R2SQ.Table2
FROM R1SQ INNER JOIN R2SQ ON R1SQ.Builder = R2SQ.Builder_E
UNION
SELECT R1SQ.Builder, R1SQ.Table1, 0 AS Table2
FROM R1SQ LEFT JOIN R2SQ ON R1SQ.Builder = R2SQ.Builder_E
WHERE (((R2SQ.Builder_E) Is Null))
UNION
SELECT R2SQ.Builder_E, 0 AS Table1, R2SQ.Table2
FROM R1SQ RIGHT JOIN R2SQ ON R1SQ.Builder = R2SQ.Builder_E
WHERE (((R1SQ.Builder) Is Null))
ORDER BY R1SQ.Builder;
