SQL nested sums with multiple joins and group by

I'm not very familiar with SQL. I'm using Oracle. I ran into a problem with fields being summed too many times.
Here are the example tables:
A:
A_ID
A_NAME
B:
B_ID
A_ID
B_NAME
B_QTY
C:
C_ID
B_ID
C_QTY
So the data structure is A -> *B -> *C (one A has many Bs, and one B has many Cs).
I need to get the total quantities of Bs and Cs grouped by B_NAME and A_ID. For example:
A:
A_ID A_NAME
1 A1
B:
B_ID A_ID B_NAME B_QTY
1 1 B1 20
2 1 B1 5
3 1 B1 5
4 1 B2 5
C:
C_ID B_ID C_QTY
1 1 3
2 1 4
4 2 2
5 2 1
6 3 1
7 4 1
The expected result is:
A_ID A_NAME B_NAME B_QTY C_QTY
1 A1 B1 30 11
1 A1 B2 5 1
The 30 of B_QTY in the 1st line is result of 20 + 5 + 5.
The 11 of C_QTY in the 1st line is result of 3 + 4 + 2 + 1 + 1.
Here is my sql:
select a.A_ID,
a.A_NAME,
b.B_NAME,
sum(b.B_QTY),
sum(c.C_QTY)
from A a left outer join B b on b.A_ID = a.A_ID
left outer join C c on c.B_ID = b.B_ID
where a.XXXX = XXXXX
group by a.A_ID, a.A_NAME, b.B_NAME
order by a.A_ID, b.B_NAME;
So the problem is:
So the problem is that one B maps to multiple Cs, so its B_QTY is summed multiple times. I'm not very familiar with SQL, so I don't know if there is a simple way to make the sum distinct based on some field (B_ID in my example). Thank you!
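To make the double counting concrete, here is a sketch that loads the sample rows from the question and runs the same joins with no de-duplication (the placeholder WHERE is dropped). It is plain SQL assumed to run on an engine such as SQLite or PostgreSQL, and the lower-case names are only a convenience. Each B row is repeated once per matching C row before SUM sees it, so B_ID 1, which has two C rows, contributes 20 twice.

```sql
-- Sample data from the question (A -> *B -> *C)
CREATE TABLE a (a_id INT, a_name VARCHAR(10));
CREATE TABLE b (b_id INT, a_id INT, b_name VARCHAR(10), b_qty INT);
CREATE TABLE c (c_id INT, b_id INT, c_qty INT);
INSERT INTO a VALUES (1, 'A1');
INSERT INTO b VALUES (1, 1, 'B1', 20);
INSERT INTO b VALUES (2, 1, 'B1', 5);
INSERT INTO b VALUES (3, 1, 'B1', 5);
INSERT INTO b VALUES (4, 1, 'B2', 5);
INSERT INTO c VALUES (1, 1, 3);
INSERT INTO c VALUES (2, 1, 4);
INSERT INTO c VALUES (4, 2, 2);
INSERT INTO c VALUES (5, 2, 1);
INSERT INTO c VALUES (6, 3, 1);
INSERT INTO c VALUES (7, 4, 1);

-- Naive aggregation: B_ID 1 and 2 each appear twice, B_ID 3 once,
-- so B1 gets 20*2 + 5*2 + 5*1 = 55 instead of 30.
SELECT a.a_id, a.a_name, b.b_name,
       SUM(b.b_qty) AS b_qty,
       SUM(c.c_qty) AS c_qty
FROM a
LEFT JOIN b ON b.a_id = a.a_id
LEFT JOIN c ON c.b_id = b.b_id
GROUP BY a.a_id, a.a_name, b.b_name
ORDER BY a.a_id, b.b_name;
```

The C_QTY column still comes out right (11 for B1 and 1 for B2) because every C row appears exactly once; only the B side is multiplied.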

This can also be done like this:
WITH b2 AS
(SELECT b.*, sum(b.b_qty) over (partition BY b.a_id, b.b_name) b_qty_s
FROM b)
SELECT a.a_id, a.a_name, b2.b_name, b2.b_qty_s, sum(c.c_qty) c_qty_s
FROM a JOIN b2 ON a.a_id = b2.a_id
JOIN c ON b2.b_id = c.b_id
GROUP BY a.a_id,a.a_name, b2.b_name, b2.b_qty_s

You can also do like this:
SELECT DISTINCT A_ID,A_NAME,B_NAME,B_SUM,SUM(C_QTY) OVER(PARTITION BY A_ID,B_NAME) C_SUM
FROM (
SELECT A.A_ID,A_NAME,B_NAME,B_ID,SUM(B_QTY) OVER(PARTITION BY A.A_ID,B_NAME) B_SUM
FROM A JOIN B
ON A.A_ID=B.A_ID) T1
JOIN C
ON T1.B_ID=C.B_ID

The trick is that B_QTY was appearing in your results more than once, so summing it gave an artificially high value. Instead, run a subselect so that the quantities for each B_NAME are summed only once! Great question! :^D
A.B.Cade's answer is cool, but this solution will work for many databases. I've used this technique before with SQL Server, Oracle, and Informix.
Data/Schema:
create table a (A_ID int, A_NAME char(10));
create table b (B_ID int, A_ID int, B_NAME char(10), B_QTY int);
create table c (C_ID int, B_ID int, C_QTY int);
-- One dude
insert into a values (1,'Xiezi');
-- Four B rows: three named B1 and one named B2
insert into b values (1,1,'B1',20);
insert into b values (2,1,'B1',5);
insert into b values (3,1,'B1',5);
insert into b values (4,1,'B2',5);
-- Six C rows: two each for B_ID 1 and 2, one each for B_ID 3 and 4
insert into c values (1,1,3);
insert into c values (2,1,4);
insert into c values (4,2,2);
insert into c values (5,2,1);
insert into c values (6,3,1);
insert into c values (7,4,1);
SQL (The answer):
select a.A_ID,
a.A_NAME,
b.B_NAME,
(select sum(b2.B_QTY) from b b2 where b2.B_NAME = b.B_NAME and b2.A_ID = a.A_ID)
as sum_b_qty,
sum(c.C_QTY)
from a left outer join b on b.A_ID = a.A_ID
left outer join c on c.B_ID = b.B_ID
group by a.A_ID,
a.A_NAME,
b.B_NAME
order by a.A_ID
;
Output:
A_ID A_NAME B_NAME SUM_B_QTY SUM(C.C_QTY)
1 Xiezi B1 30 11
1 Xiezi B2 5 1
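A further option, not taken from the answers above, is to collapse C to one row per B_ID before joining it, so each B row meets at most one C row and nothing is counted twice. This is a sketch in plain SQL that recreates the question's sample data so it can run on its own (assuming an engine such as SQLite or PostgreSQL); the derived-table alias c_per_b is just an illustrative name.

```sql
-- Sample data from the question
CREATE TABLE a (a_id INT, a_name VARCHAR(10));
CREATE TABLE b (b_id INT, a_id INT, b_name VARCHAR(10), b_qty INT);
CREATE TABLE c (c_id INT, b_id INT, c_qty INT);
INSERT INTO a VALUES (1, 'A1');
INSERT INTO b VALUES (1, 1, 'B1', 20);
INSERT INTO b VALUES (2, 1, 'B1', 5);
INSERT INTO b VALUES (3, 1, 'B1', 5);
INSERT INTO b VALUES (4, 1, 'B2', 5);
INSERT INTO c VALUES (1, 1, 3);
INSERT INTO c VALUES (2, 1, 4);
INSERT INTO c VALUES (4, 2, 2);
INSERT INTO c VALUES (5, 2, 1);
INSERT INTO c VALUES (6, 3, 1);
INSERT INTO c VALUES (7, 4, 1);

-- Pre-aggregate C per B_ID, so the join to B cannot fan out.
SELECT a.a_id, a.a_name, b.b_name,
       SUM(b.b_qty) AS b_qty,
       SUM(c_per_b.c_qty) AS c_qty
FROM a
LEFT JOIN b ON b.a_id = a.a_id
LEFT JOIN (
    SELECT b_id, SUM(c_qty) AS c_qty
    FROM c
    GROUP BY b_id
) c_per_b ON c_per_b.b_id = b.b_id
GROUP BY a.a_id, a.a_name, b.b_name
ORDER BY a.a_id, b.b_name;
```

For the sample data this should give 30 / 11 for B1 and 5 / 1 for B2. Unlike the correlated subquery, it keeps LEFT JOIN semantics for B rows that have no C rows (their C_QTY simply contributes NULL to the sum).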

Related

How to get the number of occurrences in another table

If I have TABLE_A:
A_ID B_ID
1 6
2 6
3 7
4 7
5 7
and TABLE_B:
B_ID B_NAME
6 B1
7 B2
8 B3
9 B4
How can I find the number of occurrences of B_ID in TABLE_A? Something like this:
select B_ID, B_NAME, total_count from TABLE_B
Where "total_count" is the number of times B_ID is found in TABLE_A so that the result would be:
B_ID B_NAME total_count
6 B1 2
7 B2 3
8 B3 0
9 B4 0
Use group by, left join and count:
SELECT B.B_ID, B.B_NAME, COUNT(A.A_ID) AS total_count
FROM TABLE_B B
LEFT JOIN TABLE_A A
ON B.B_ID = A.B_ID
GROUP BY B.B_ID, B.B_NAME
This is quite a basic query in SQL, and should produce the same result on most if not all relational databases.
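The detail that makes the zeros come out is counting a column from TABLE_A rather than counting rows. A sketch with the question's sample data (plain SQL, assuming an engine such as SQLite or PostgreSQL) shows both side by side; the row_count column is added only for comparison.

```sql
CREATE TABLE table_a (a_id INT, b_id INT);
CREATE TABLE table_b (b_id INT, b_name VARCHAR(10));
INSERT INTO table_a VALUES (1, 6);
INSERT INTO table_a VALUES (2, 6);
INSERT INTO table_a VALUES (3, 7);
INSERT INTO table_a VALUES (4, 7);
INSERT INTO table_a VALUES (5, 7);
INSERT INTO table_b VALUES (6, 'B1');
INSERT INTO table_b VALUES (7, 'B2');
INSERT INTO table_b VALUES (8, 'B3');
INSERT INTO table_b VALUES (9, 'B4');

SELECT tb.b_id, tb.b_name,
       COUNT(ta.a_id) AS total_count, -- NULL a_id from an unmatched row is not counted
       COUNT(*)       AS row_count    -- counts the NULL-extended row, so 1 for B3 and B4
FROM table_b tb
LEFT JOIN table_a ta ON ta.b_id = tb.b_id
GROUP BY tb.b_id, tb.b_name
ORDER BY tb.b_id;
```

With COUNT(*) the unmatched B3 and B4 would report 1, because the LEFT JOIN still produces one row for each of them.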

Conditional Join in Oracle SQL

Consider the three tables below.
Table a
Col a Col b Col c
1 000 Actual data
1 001 Actual data
2 000 Actual data
3 000 Actual data
3 001 Actual data
3 002 Actual data
Table b
Col a Col b Col d
1 000 Actual data
1 001 Actual data
2 000 Actual data
Table c
Col a Col b Col d
3 000 Actual data
3 001 Actual data
3 002 Actual data
Table a is the parent table, and tables b and c are child tables; col a and col b are common to all three and are the columns to join on.
The join should work so that a row is searched for in table c only if it is not found in table b.
Desired:
col a col b col c col d
1 000 somedata moredata
1 001 somedata moredata
2 000 somedata moredata
3 000 somedata moredata
3 001 somedata moredata
3 002 somedata moredata
Currently I left join b to a and c to a, but I think every record in a is then searched in both b and c, which is less cost effective. So I want to fine-tune it so that c is searched only if the record does NOT exist in b.
What you really need is a way to "collect" all the rows from table B, and if there are none, then all the rows from table C. Doing the join to A is then standard.
Something like this should work. Make it a subquery and join to your first table.
select col_a, col_b, col_d
from table_b
union all
select col_a, col_b, col_d
from table_c
where (select count(*) from table_b) = 0
If table_b has at least one row, then nothing will be selected from table_c (because the where condition will be false for all rows in table_c). However, if table_b is empty, all the rows from table_c will be selected.
What you need to do is first create a union of the two tables B and C that keeps a C row only when B has no row with the same keys (if a key is in B, the C rows for it are ignored), and then join the result with Table A. Thus:
SELECT B.cola, B.colb, B.cold FROM B
UNION ALL
SELECT C.cola, C.colb, C.cold FROM C
WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.cola = C.cola AND B.colb = C.colb)
Now, using this result, you can join with Table A like:
SELECT A.cola, A.colb, A.colc, tmp.cold
FROM A
JOIN
( SELECT B.cola, B.colb, B.cold FROM B
UNION ALL
SELECT C.cola, C.colb, C.cold FROM C
WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.cola = C.cola AND B.colb = C.colb)
) tmp
ON A.cola = tmp.cola
AND A.colb = tmp.colb
Two left joins:
select a.*, b.*, c.*
from a
left join b
on a.cola=b.cola
and a.colb = b.colb
left join c
on a.cola=c.cola
and a.colb=c.colb
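Building on the two left joins, a CASE expression can take table b's value whenever b has a row for the key and fall back to table c otherwise. This is a sketch that assumes the spaced column names are really col_a, col_b, col_c and col_d, and that each key appears at most once per child table; the sample values 'from b' / 'from c' are made up to show where each row came from. Whether the engine actually skips probing c for rows already found in b is up to its optimizer, so this settles the result, not the cost.

```sql
CREATE TABLE a (col_a INT, col_b VARCHAR(3), col_c VARCHAR(20));
CREATE TABLE b (col_a INT, col_b VARCHAR(3), col_d VARCHAR(20));
CREATE TABLE c (col_a INT, col_b VARCHAR(3), col_d VARCHAR(20));
INSERT INTO a VALUES (1, '000', 'a data');
INSERT INTO a VALUES (1, '001', 'a data');
INSERT INTO a VALUES (2, '000', 'a data');
INSERT INTO a VALUES (3, '000', 'a data');
INSERT INTO a VALUES (3, '001', 'a data');
INSERT INTO a VALUES (3, '002', 'a data');
INSERT INTO b VALUES (1, '000', 'from b');
INSERT INTO b VALUES (1, '001', 'from b');
INSERT INTO b VALUES (2, '000', 'from b');
INSERT INTO c VALUES (3, '000', 'from c');
INSERT INTO c VALUES (3, '001', 'from c');
INSERT INTO c VALUES (3, '002', 'from c');

SELECT a.col_a, a.col_b, a.col_c,
       -- b.col_a is non-NULL exactly when b had a matching row
       CASE WHEN b.col_a IS NOT NULL THEN b.col_d ELSE c.col_d END AS col_d
FROM a
LEFT JOIN b ON b.col_a = a.col_a AND b.col_b = a.col_b
LEFT JOIN c ON c.col_a = a.col_a AND c.col_b = a.col_b
ORDER BY a.col_a, a.col_b;
```

Testing b.col_a instead of using COALESCE(b.col_d, c.col_d) keeps b's row authoritative even when its col_d happens to be NULL.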

SQL joining 2 tables without repeating values

I have 2 tables with a 1:n relationship.
I want to join them without repeating (duplicating) the values from the one table.
First, I have a table with budgets:
id name budget
1 John 1000
2 Kim 3000
And second I have a table of spendings:
id amount
1 112
1 145
1 211
The result should look like this:
id name budget amount
1 John 1000 112
1 null null 145
1 null null 211
2 Kim 3000 null
Output could also be: (this is not important)
id name budget amount
1 null null 112
1 John 1000 145
1 null null 211
2 Kim 3000 null
Is this possible with SQL?
Here is a join that repeats the values:
create temporary table a (id1 int,name varchar(10),budget int);
insert into a (id1,name,budget) values(1,'Maier',1000),(2,'Mueller',2000);
create temporary table if not exists b (id2 int,betrag int);
insert into b (id2,betrag) values(1,100),(1,133),(1,234);
select * from a left join b
on a.id1=b.id2
;
The keyword DISTINCT is used to eliminate duplicate rows from a query result:
select distinct b.id, b.name, b.budget, s.amount
from budgets b left join spendings s
on b.id = s.id;
You can also use a GROUP BY clause, which works similarly to DISTINCT in this case:
select b.id, b.name, b.budget, s.amount
from budgets b left join spendings s
on b.id = s.id
group by b.id, b.name, b.budget, s.amount;
create table a (id1 int,name varchar(10),budget int)
insert into a (id1,name,budget) values(1,'Maier',1000)
insert into a (id1,name,budget) values(2,'Mueller',2000)
create table b (id2 int,betrag int)
insert into b (id2,betrag) values(1,100)
insert into b (id2,betrag) values(1,133)
insert into b (id2,betrag) values(1,234)
insert into b (id2,betrag) values(2,300)
insert into b (id2,betrag) values(2,400)
select a.id1, CASE WHEN c.themin IS NOT NULL THEN a.name ELSE NULL END AS [name],
CASE WHEN c.themin IS NOT NULL THEN a.budget ELSE NULL END AS [budget],
b.*
from a
LEFT join b on a.id1=b.id2
LEFT OUTER JOIN (SELECT MIN(betrag) AS [themin], id2 FROM b GROUP BY id2) c ON a.id1 = c.id2 AND b.betrag = c.themin
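Where window functions are available (ROW_NUMBER is supported by most current engines, including SQLite 3.25+ and PostgreSQL), a variant of the same idea numbers the spending rows per budget and shows name and budget only on the first one. This is a sketch using the question's budgets/spendings data; ordering by amount to decide which row is "first" is an arbitrary assumption, and unlike the MIN(betrag) trick it cannot show the name twice when two spendings tie.

```sql
CREATE TABLE budgets (id INT, name VARCHAR(10), budget INT);
CREATE TABLE spendings (id INT, amount INT);
INSERT INTO budgets VALUES (1, 'John', 1000);
INSERT INTO budgets VALUES (2, 'Kim', 3000);
INSERT INTO spendings VALUES (1, 112);
INSERT INTO spendings VALUES (1, 145);
INSERT INTO spendings VALUES (1, 211);

SELECT id,
       CASE WHEN rn = 1 THEN name END   AS name,   -- NULL on repeated rows
       CASE WHEN rn = 1 THEN budget END AS budget,
       amount
FROM (
    SELECT bu.id, bu.name, bu.budget, s.amount,
           ROW_NUMBER() OVER (PARTITION BY bu.id ORDER BY s.amount) AS rn
    FROM budgets bu
    LEFT JOIN spendings s ON s.id = bu.id
) numbered
ORDER BY id, rn;
```

Kim has no spendings, so the LEFT JOIN yields a single row for her with rn = 1 and a NULL amount, which matches the last line of the desired output.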

Is it possible to left join two tables and have the right table supply each row no more than once?

Given this table structure:
Table A
ID AGE EDUCATION
1 23 3
2 25 6
3 22 5
Table B
ID AGE EDUCATION
1 26 4
2 24 6
3 21 3
I want to find all matches between the two tables where the age is within 2 and the education is within 2. However, I do not want to select any row from TableB more than once. Each row in B should be selected 0 or 1 times and each row in A should be selected one or more times (standard left join).
SELECT *
FROM TableA as A LEFT JOIN TableB as B ON
abs(A.age - B.age) <= 2 AND
abs(A.education - B.education) <= 2
A.ID A.AGE A.EDUCATION B.ID B.AGE B.EDUCATION
1 23 3 3 21 3
2 25 6 1 26 4
2 25 6 2 24 6
3 22 5 2 24 6
3 22 5 3 21 3
As you can see, the last two rows in the output have duplicated B.ID of 2 and 3 when compared to the entire result set. I'd like those rows to return as a single null match with A.ID = 3 since they were both matched to previous A values.
Desired output:
(note that for A.ID = 3, there is no match in B because all rows in B have already been joined to rows in A.)
A.ID A.AGE A.EDUCATION B.ID B.AGE B.EDUCATION
1 23 3 3 21 3
2 25 6 1 26 4
2 25 6 2 24 6
3 22 5 null null null
I can do this with a short program, but I'd like to solve the problem using a SQL query because it is not for me and I will not have the luxury of ever seeing the data or manipulating the environment.
Any ideas? Thanks
As @Joel Coehoorn said earlier, there has to be a mechanism that decides which (a,b) pairs sharing the same b are filtered out of the output. SQL is not great at letting you select ONE row when multiple rows match, so you need an aggregate query that filters out the records you don't want. Here the filtering is done by reducing all matching B IDs to the smallest one (or the largest; it doesn't really matter). Any function that returns one value from a set would do; min() and max() are just the most convenient. Once you have reduced the result to the (a,b) pairs you care about, you join against that result to pull in the rest of the table data.
select a.id a_id, a.age a_age, a.education a_e,
b.id b_id, b.age b_age, b.education b_e
from a left join
(
SELECT
a.id a_id, min(b.id) b_id from a,b where
abs(A.age - B.age) <= 2 AND
abs(A.education - B.education) <= 2
group by a.id
) g on a.id = g.a_id
left join b on b.id = g.b_id;
To my knowledge something like this is not possible with a simple select statement and joins because you need to know what has already been selected in order to eliminate duplicates.
You can however try something a little more like this:
DECLARE @JoinResults TABLE
(A_ID INT, A_Age INT, A_Education INT, B_ID INT, B_Age INT, B_Education INT);
INSERT INTO @JoinResults (A_ID, A_Age, A_Education)
SELECT ID, AGE, EDUCATION
FROM TableA;
DECLARE @i INT;
SET @i = 1;
--Assume that A_ID is incremental and no values missed
WHILE (@i <= (SELECT MAX(A_ID) FROM @JoinResults))
BEGIN
-- fill the B columns of row @i with the first matching B row not used yet
UPDATE jr
SET B_ID = SQ.ID,
B_Age = SQ.AGE,
B_Education = SQ.EDUCATION
FROM @JoinResults AS jr
CROSS JOIN (
SELECT TOP (1) ID, AGE, EDUCATION
FROM TableB b
WHERE (
abs((SELECT A_Age FROM @JoinResults WHERE A_ID = @i) - AGE) <= 2
AND abs((SELECT A_Education FROM @JoinResults WHERE A_ID = @i) - EDUCATION) <= 2
) AND (SELECT B_ID FROM @JoinResults WHERE B_ID = b.ID) IS NULL
ORDER BY ID
) AS SQ
WHERE jr.A_ID = @i;
SET @i = @i + 1;
END
SELECT * FROM @JoinResults;
NOTE: I do not currently have access to a database, so this is untested, and I am wary of 2 potential issues with it:
I am not sure how the update will react if there are no results
I am unsure if the IS NULL check is correct to eliminate the duplicates.
If these issues do arise let me know and I can help troubleshoot.
In SQL-Server, you can use the CROSS APPLY syntax:
SELECT
a.id, a.age, a.education,
b.id AS b_id, b.age AS b_age, b.education AS b_education
FROM tableB AS b
CROSS APPLY
( SELECT TOP (1) a.*
FROM tableA AS a
WHERE ABS(a.age - b.age) <= 2
AND ABS(a.education - b.education) <= 2
ORDER BY a.id -- your choice here
) AS a ;
Depending on the order you choose in the subquery, different rows from tableA will be selected.
Edit (after your update): But the above query will not show rows from A that have no matching rows in B, or rows that have matches but were not selected.
It could also be done with window functions but Access does not have them. Here is a query that I think will work in Access:
SELECT
a.id, a.age, a.education,
s.id AS b_id, s.age AS b_age, s.education AS b_education
FROM tableA AS a
LEFT JOIN
( SELECT
b.id, b.age, b.education, MIN(a.id) AS a_id
FROM tableB AS b
INNER JOIN tableA AS a
ON ABS(a.age - b.age) <= 2
AND ABS(a.education - b.education) <= 2
GROUP BY b.id, b.age, b.education
) AS s
ON a.id = s.a_id ;
I'm not sure if Access allows such a subquery but if it doesn't, you can define it as a "Query" and then use it in another.
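The same query can be tried outside Access with the sample data from the question. This is a sketch in plain SQL (assuming an engine such as SQLite or PostgreSQL, with the inner alias renamed a2 for readability). It assigns each B row to the lowest matching A id, which reproduces the desired output here; note that this is a greedy rule, not an optimal matching, so on other data it can leave an A row unmatched even when a different assignment would have matched it.

```sql
CREATE TABLE tableA (id INT, age INT, education INT);
CREATE TABLE tableB (id INT, age INT, education INT);
INSERT INTO tableA VALUES (1, 23, 3);
INSERT INTO tableA VALUES (2, 25, 6);
INSERT INTO tableA VALUES (3, 22, 5);
INSERT INTO tableB VALUES (1, 26, 4);
INSERT INTO tableB VALUES (2, 24, 6);
INSERT INTO tableB VALUES (3, 21, 3);

-- s: one row per B, paired with the lowest-id A row it matches,
-- so no B row can be attached to more than one A row.
SELECT a.id, a.age, a.education,
       s.id AS b_id, s.age AS b_age, s.education AS b_education
FROM tableA a
LEFT JOIN (
    SELECT b.id, b.age, b.education, MIN(a2.id) AS a_id
    FROM tableB b
    JOIN tableA a2
      ON ABS(a2.age - b.age) <= 2
     AND ABS(a2.education - b.education) <= 2
    GROUP BY b.id, b.age, b.education
) s ON s.a_id = a.id
ORDER BY a.id, s.id;
```

Working it by hand: B3 matches A1 and A3, B1 matches only A2, B2 matches A2 and A3, so A1 gets B3, A2 gets B1 and B2, and A3 gets NULLs.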
Use SELECT DISTINCT
SELECT DISTINCT A.id, A.age, A.education, B.age, B.education
FROM TableA as A LEFT JOIN TableB as B ON
abs(A.age - B.age) <= 2 AND
abs(A.education - B.education) <= 2

Get records using left outer join

I have three tables as given below
Table A
Id Name
1 A
2 B
3 C
4 D
5 E
Table B
Id AId CId
1 1 1
2 1 1
3 2 1
4 2 3
5 3 2
Table C
Id Name
1 x
2 y
3 z
4 w
5 v
Now I want all the records of Table A, together with the matching CId from Table B where CId = 1.
So the output should be like below :
Id Name CId
1 A 1
2 B 1
3 C 1
4 D Null
5 E Null
Can anyone help me please?
This does what you want:
SELECT
A.Id,
A.Name,
CASE B.CId WHEN 1 THEN 1 ELSE NULL END AS CId
FROM
A LEFT JOIN B ON A.Id = B.Id
This is not about LEFT JOINing. You could just as well do it with an INNER JOIN. If you don't want the 3 and 2 of column CId to appear, you would still have to filter with WHERE, and then the rows with Id 4 and 5 would not appear, which is not what you want.
EDIT:
Given this test data:
create table A (Id int, Name varchar(5));
insert into A values
(1, 'A'),
(2, 'B'),
(3, 'C'),
(4, 'D'),
(5, 'E');
create table B (Id int, AId int, CId int);
insert into B values
(1,1,1),
(2,1,1),
(3,2,1),
(4,2,3),
(5,3,2);
my query does not give a cartesian product. Read and try before downvoting. Anyway, it was not clear to me what you want to achieve, now I've joined on AId column and with this query:
SELECT DISTINCT
A.Id,
A.Name
, CASE
WHEN B.CId > 1 THEN 1
WHEN B.CId = 1 THEN 1
ELSE NULL END AS CId
FROM
A LEFT JOIN B ON A.Id = B.AId
and it also gives the right output, like the first before. If this is still not what you want, your test data is wrong or I absolutely don't get it.
Try something like this:
SELECT TableA.Id, TableA.Name, TableB.CId
FROM TableA
LEFT OUTER JOIN TableB ON TableA.Id = TableB.CId
WHERE TableB.CId = 1
Hope this helps.
Edit:
The output you want can be achieved if you match TableA's Id column with TableB's Id column, NOT TableB's CId column. Try the query below, which I tested on my PC; it gives output similar to what you need.
select TableA.Id, TableA.Name, TableB.CId
from TableA
left outer join TableB on TableA.Id = TableB.Id
and TableB.CId in
(
select TableB.CId
from TableB
left outer join TableC on TableB.CId = TableC.Id
WHERE TableB.CId = 1
)
group by TableA.Id, TableA.Name, TableB.CId
Please let me know if I guessed right. Check the column names.
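The last answer's edit works because the extra condition sits in the ON clause rather than in WHERE. A sketch of that idea in its simplest form, using the test data posted earlier and joining on B.Id as the expected output implies (plain SQL, assuming an engine such as SQLite or PostgreSQL):

```sql
CREATE TABLE A (Id INT, Name VARCHAR(5));
CREATE TABLE B (Id INT, AId INT, CId INT);
INSERT INTO A VALUES (1, 'A');
INSERT INTO A VALUES (2, 'B');
INSERT INTO A VALUES (3, 'C');
INSERT INTO A VALUES (4, 'D');
INSERT INTO A VALUES (5, 'E');
INSERT INTO B VALUES (1, 1, 1);
INSERT INTO B VALUES (2, 1, 1);
INSERT INTO B VALUES (3, 2, 1);
INSERT INTO B VALUES (4, 2, 3);
INSERT INTO B VALUES (5, 3, 2);

-- The CId test is part of the join condition, so A rows whose B row
-- has another CId are kept with a NULL instead of being filtered out.
SELECT A.Id, A.Name, B.CId
FROM A
LEFT JOIN B ON B.Id = A.Id AND B.CId = 1
ORDER BY A.Id;
```

Moving B.CId = 1 into a WHERE clause would drop Id 4 and 5 again, because their NULL CId fails the test; that is what happens in the first query of this answer. If you join on B.AId instead, A rows with several matching B rows (such as Id 1) come back more than once, so you would need DISTINCT or GROUP BY on top.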