Multiple joins with group by (Sum)


When using multiple JOINs, I want to get the sum of a column from each of the joined tables.
SELECT
A.*,
SUM(C.purchase_price) AS purchase_total,
SUM(D.sales_price) AS sales_total,
B.user_name
FROM
PROJECT AS A
LEFT JOIN
USER AS B ON A.user_idx = B.user_idx
LEFT JOIN
PURCHASE AS C ON A.project_idx = C.project_idx
LEFT JOIN
SALES AS D ON A.project_idx = D.project_idx
GROUP BY
????

You need to use subqueries, as follows:
SELECT A.project_idx,
A.project_name,
A.project_category,
sum(C.purchase_price) AS purchase_total,
sum(D.sales_price) AS sales_total,
B.user_name
FROM PROJECT AS A
LEFT JOIN USER AS B ON A.user_idx = B.user_idx
LEFT JOIN (select project_idx, sum(purchase_price) as purchase_price
from PURCHASE group by project_idx ) AS C ON A.project_idx = C.project_idx
LEFT JOIN (select project_idx, sum(sales_price) as sales_price
from SALES group by project_idx) AS D ON A.project_idx = D.project_idx
GROUP BY A.project_idx, A.project_name, A.project_category, B.user_name
If every project has a user, you can probably use an INNER JOIN between PROJECT and USER instead of the LEFT JOIN.

SELECT A.project_idx,
A.project_name,
A.project_category,
C.purchase_total,
D.sales_total,
B.user_name
FROM PROJECT AS A
LEFT JOIN USER AS B ON A.user_idx = B.user_idx
LEFT JOIN (select project_idx, sum(purchase_price) as purchase_total
from PURCHASE group by project_idx ) AS C ON A.project_idx = C.project_idx
LEFT JOIN (select project_idx, sum(sales_price) as sales_total
from SALES group by project_idx) AS D ON A.project_idx = D.project_idx
This works correctly on MS SQL Server.
Thanks to Popeye.

You are attempting to aggregate over two unrelated dimensions, and that throws off all the calculations.
Correlated subqueries are an alternative:
SELECT p.*,
(SELECT SUM(pu.purchase_price)
FROM PURCHASE pu
WHERE p.project_idx = pu.project_idx
) as purchase_total,
(SELECT SUM(s.sales_price)
FROM SALES s
WHERE p.project_idx = s.project_idx
) as sales_total,
u.user_name
FROM PROJECT p LEFT JOIN
USER u
ON p.user_idx = u.user_idx ;
Note that this uses meaningful table aliases so the query is easier to read. Arbitrary letters are really no better (and perhaps worse) than using the entire table name.
Correlated subqueries avoid the outer aggregation as well -- and let you select all the columns from the first table, which is what you want. They also often have better performance with the right indexes.
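To see why joining two unrelated child tables throws off the sums, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The sample rows are made up for illustration; only the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data: one project with two purchases and three sales.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project  (project_idx INTEGER PRIMARY KEY, project_name TEXT);
    CREATE TABLE purchase (project_idx INTEGER, purchase_price INTEGER);
    CREATE TABLE sales    (project_idx INTEGER, sales_price INTEGER);
    INSERT INTO project  VALUES (1, 'demo');
    INSERT INTO purchase VALUES (1, 10), (1, 20);
    INSERT INTO sales    VALUES (1, 100), (1, 200), (1, 300);
""")

# Joining both child tables first pairs every purchase with every sale
# (2 x 3 = 6 rows), so each sum is inflated.
naive = conn.execute("""
    SELECT SUM(c.purchase_price), SUM(d.sales_price)
    FROM project a
    LEFT JOIN purchase c ON a.project_idx = c.project_idx
    LEFT JOIN sales d ON a.project_idx = d.project_idx
    GROUP BY a.project_idx
""").fetchone()

# Correlated subqueries aggregate each child table on its own.
correct = conn.execute("""
    SELECT (SELECT SUM(pu.purchase_price) FROM purchase pu
            WHERE pu.project_idx = p.project_idx),
           (SELECT SUM(s.sales_price) FROM sales s
            WHERE s.project_idx = p.project_idx)
    FROM project p
""").fetchone()

print(naive)    # (90, 1200)
print(correct)  # (30, 600)
```

The purchase total is tripled (once per sale) and the sales total is doubled (once per purchase), which is the effect the answer describes.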

Related

Returning multiple aggregated columns from Subquery

I am trying to extend an existing query by aggregating some rows from another table.
It works when I only return one column like this:
Select DISTINCT
Contracts.id,
Contracts.beginTime,
Contracts.endTime,
Suppliers.name,
(SELECT COUNT(p.id) from production as p where p.id_contract = Contracts.id)
FROM Contracts
LEFT JOIN Suppliers on Contracts.id = Suppliers.id_contract
Then I tried to add another column for the aggregated volume:
Select DISTINCT
Contracts.id,
Contracts.beginTime,
Contracts.endTime,
Suppliers.name,
(SELECT COUNT(p.id), SUM(p.volume) from production as p where p.id_contract = Contracts.id)
FROM Contracts
LEFT JOIN Suppliers on Contracts.id = Suppliers.id_contract
However, this returns the following error:
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
I experimented a bit with the EXISTS keyword, but couldn't figure out how to make it work. Also I'm not sure whether this is the way to go in my case.
The desired output would be like so:
contract1Id, supplierInfoContract1, nrItemsContract1, sumVolumeContract1
contract2Id, supplierInfoContract2, nrItemsContract2, sumVolumeContract2

Instead of using DISTINCT and subqueries, use GROUP BY and normal joins to get the aggregates. And always use aliases, it will make your life easier:
SELECT
c.id,
c.beginTime,
c.endTime,
s.name,
COUNT(p.id) prod_count,
SUM(p.volume) prod_vol
FROM Contracts c
LEFT JOIN production p on p.id_contract = c.id
LEFT JOIN Suppliers s on c.id = s.id_contract
GROUP BY c.id, c.beginTime, c.endTime, s.name;
Another option is to APPLY the grouped up subquery:
SELECT DISTINCT
c.id,
c.beginTime,
c.endTime,
s.name,
p.prod_count,
p.prod_vol
FROM Contracts c
LEFT JOIN Suppliers s on c.id = s.id_contract
OUTER APPLY (
SELECT
COUNT(p.id) prod_count,
SUM(p.volume) prod_vol
FROM production p WHERE p.id_contract = c.id
GROUP BY ()
) p;
You can also use CROSS APPLY and leave out the GROUP BY (). That makes the subquery a scalar aggregate, which always returns exactly one row, so a contract with no production rows gets a count of 0 instead of NULL (SUM still returns NULL over no rows).
One last point: DISTINCT in a joined query is a bit of a code smell. It usually indicates the query writer wasn't thinking too hard about what the joined tables returned, and just wanted to get rid of duplicate rows.
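The difference between a scalar aggregate and a grouped aggregate over an empty set can be checked with a small sketch. It uses Python's sqlite3 module, which has neither APPLY nor GROUP BY (), so a plain GROUP BY id_contract stands in for the grouped case; that substitution and the sample rows are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE production (id INTEGER PRIMARY KEY, id_contract INTEGER, volume INTEGER);
    INSERT INTO production (id_contract, volume) VALUES (1, 5), (1, 7);
""")

# Scalar aggregate (no GROUP BY): always exactly one row, even with no matches.
scalar_rows = conn.execute(
    "SELECT COUNT(id), SUM(volume) FROM production WHERE id_contract = ?", (2,)
).fetchall()

# Grouped aggregate: no matching rows means no groups, so no output rows.
grouped_rows = conn.execute(
    "SELECT COUNT(id), SUM(volume) FROM production WHERE id_contract = ? "
    "GROUP BY id_contract", (2,)
).fetchall()

print(scalar_rows)   # [(0, None)]
print(grouped_rows)  # []
```

An empty result on the grouped side is what an outer join or OUTER APPLY turns into NULLs, while the scalar form yields a row with a count of 0.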

Use a separate subquery for each aggregate, like this:
Select DISTINCT Contracts.id, Contracts.beginTime, Contracts.endTime, Suppliers.name,
(SELECT COUNT(p.id) from production as p where p.id_contract = Contracts.id) as CNT,
(SELECT SUM(p.volume) from production as p where p.id_contract = Contracts.id) as VOLUME
FROM Contracts
LEFT JOIN Suppliers on Contracts.id = Suppliers.id_contract

I guess you can try to rework your query as
SELECT X.ID,X.beginTime,X.endTime,X.name,CR.CNTT,CR.TOTAL_VOLUME
FROM
(
Select DISTINCT Contracts.id, Contracts.beginTime, Contracts.endTime, Suppliers.name
FROM Contracts
LEFT JOIN Suppliers on Contracts.id = Suppliers.id_contract
)X
CROSS APPLY
(
SELECT COUNT(p.id)AS CNTT,SUM(p.volume) AS TOTAL_VOLUME
from production as p where p.id_contract = X.id
)CR

I reworked your query slightly by separating the subqueries.
Select DISTINCT
Contracts.id,
Contracts.beginTime,
Contracts.endTime,
Suppliers.name,
(SELECT COUNT(p.id) from production as p where p.id_contract = Contracts.id) count_id,
(SELECT SUM(p.volume) from production as p where p.id_contract = Contracts.id) sum_volume
FROM Contracts
LEFT JOIN Suppliers on Contracts.id = Suppliers.id_contract

SQL Server Circular Query

I have 4 tables, and I want to fetch records from all 4 and aggregate the values.
I have these tables:
I am expecting this output:
but I am getting this output, which is a Cartesian product:
It is multiplying the expenses and the allocation.
Here is my query:
select
a.NAME, b.P_NAME,
sum(a.DURATION) DURATION,
sum(b.[EXP]) EXPEN
from
(select
e.ID, a.P_ID, e.NAME, a.DURATION DURATION
from
EMPLOYEE e
inner join
ALLOCATION a ON e.ID = a.E_ID) a
inner join
(select
p.P_ID, e.E_ID, p.P_NAME, e.amt [EXP]
from
PROJECT p
inner join
EXPENSES e ON p.P_ID = e.P_ID) b ON a.ID = b.E_ID
and a.P_ID = b.P_ID
group by
a.NAME, b.P_NAME
Can anyone suggest how to fix this?

The following should work:
SELECT e.NAME,p.P_NAME,COALESCE(d.Duration,0),COALESCE(exp.Expen,0)
FROM
Employee e
CROSS JOIN
Project p
LEFT JOIN
(SELECT E_ID,P_ID,SUM(Duration) as Duration FROM Allocation
GROUP BY E_ID,P_ID) d
ON
e.ID = d.E_ID and
p.P_ID = d.P_ID
LEFT JOIN
(SELECT E_ID,P_ID,SUM(AMT) as Expen FROM Expenses
GROUP BY E_ID,P_ID) exp
ON
e.ID = exp.E_ID and
p.P_ID = exp.P_ID
WHERE
d.E_ID is not null or
exp.E_ID is not null
I've tried to write a query that will produce results where e.g. there are rows in Expenses but no rows in Allocations (or vice versa) for some particular E_ID,P_ID combination.
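Here is a small runnable sketch of that idea, using Python's sqlite3 module instead of SQL Server. The column names follow the question's query (EMPLOYEE.ID, PROJECT.P_NAME); the sample rows and the alias x are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project    (p_id INTEGER PRIMARY KEY, p_name TEXT);
    CREATE TABLE allocation (e_id INTEGER, p_id INTEGER, duration INTEGER);
    CREATE TABLE expenses   (e_id INTEGER, p_id INTEGER, amt INTEGER);
    INSERT INTO employee   VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO project    VALUES (10, 'Alpha'), (20, 'Beta');
    -- Ann/Alpha has two allocations and two expenses (the fan-out case).
    INSERT INTO allocation VALUES (1, 10, 3), (1, 10, 4);
    INSERT INTO expenses   VALUES (1, 10, 50), (1, 10, 60);
    -- Bob/Beta has an expense but no allocation.
    INSERT INTO expenses   VALUES (2, 20, 70);
""")

# Each child table is summed per (e_id, p_id) before it is joined, so the
# two sets of detail rows never multiply each other.
rows = conn.execute("""
    SELECT e.name, p.p_name,
           COALESCE(d.duration, 0), COALESCE(x.expen, 0)
    FROM employee e
    CROSS JOIN project p
    LEFT JOIN (SELECT e_id, p_id, SUM(duration) AS duration
               FROM allocation GROUP BY e_id, p_id) d
           ON d.e_id = e.id AND d.p_id = p.p_id
    LEFT JOIN (SELECT e_id, p_id, SUM(amt) AS expen
               FROM expenses GROUP BY e_id, p_id) x
           ON x.e_id = e.id AND x.p_id = p.p_id
    WHERE d.e_id IS NOT NULL OR x.e_id IS NOT NULL
    ORDER BY e.name, p.p_name
""").fetchall()

print(rows)  # [('Ann', 'Alpha', 7, 110), ('Bob', 'Beta', 0, 70)]
```

Ann/Alpha gets 7 and 110 rather than the doubled 14 and 220, and Bob/Beta still appears even though it has no allocation rows.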

Use a LEFT JOIN in the select query, joining all the tables on their common ID.

Hi, I got the result I wanted by modifying my original query.
The query above also works like a charm, but I made some changes to the original query and got the answer.
You just have to group by inside the inner queries and then join them; that way the join no longer produces a Cartesian product.
Here is the updated one:
select a.NAME,b.P_NAME,sum(a.DURATION) DURATION,sum(b.[EXP]) EXPEN from
(select e.ID,a.P_ID, e.NAME,sum(a.DURATION) DURATION from EMPLOYEE e inner join ALLOCATION a
ON e.ID=a.E_ID group by e.ID,e.NAME,a.P_ID) a
inner join
(select p.P_ID,e.E_ID, p.P_NAME,sum(e.amt) [EXP] from PROJECT p inner join EXPENSES e
ON p.P_ID=e.P_ID group by p.P_ID,p.P_NAME,e.E_ID) b
ON a.ID=b.e_ID and a.P_ID=b.P_ID group by a.NAME,b.P_NAME
It shows the correct output.

Query extensibility with WHERE EXISTS with a large table

The following query is designed to find the number of people who went to a hospital, the total number of people who went to a hospital, and then divide those two to find a percentage. The table Claims has more than two million rows and has the correct non-clustered index on patientid, admissiondate, and dischargedate. The query runs quickly enough, but I'm interested in how I could make it more usable. I would like to be able to add another code in the line where (hcpcs.hcpcs ='97001') and have the change in percentRehabNotHomeHealth be reflected in another column. Is this possible without writing a big, fat join statement where I join the results of the two queries together? I know that by adding the extra column the math won't look right, but I'm not worried about that at the moment. Desired sample output: http://imgur.com/BCLrd
database schema
select h.hospitalname
,count(*) as visitCounts
,hospitalcounts
,round(count(*)/cast(hospitalcounts as float) *100,2) as percentRehabNotHomeHealth
from Patient p
inner join statecounties as sc on sc.countycode = p.countycode
and sc.statecode = p.statecode
inner join hospitals as h on h.npi=p.hospitalnpi
inner join
--this join adds the hospitalCounts column
(
select h.hospitalname, count(*) as hospitalCounts
from hospitals as h
inner join patient as p on p.hospitalnpi=h.npi
where p.statecode='21' and h.statecode='21'
group by h.hospitalname
) as t on t.hospitalname=h.hospitalname
--this where exists clause gives the visitCounts column
where h.stateCode='21' and p.statecode='21'
and exists
(
select distinct p2.patientid
from Patient as p2
inner join Claims as c on c.patientid = p2.patientid
and c.admissiondate = p2.admissiondate
and c.dischargedate = p2.dischargedate
inner join hcpcs on hcpcs.hcpcs=c.hcpcs
inner join hospitals as h on h.npi=p2.hospitalnpi
where (hcpcs.hcpcs ='97001' or hcpcs.hcpcs='9339' or hcpcs.hcpcs='97002')
and p2.patientid=p.patientid
)
and hospitalcounts > 10
group by h.hospitalname, t.hospitalcounts
having count(*)>10

You might look into CTEs (common table expressions) to get what you need. A CTE would allow you to get summarized data and join that back to the detail on a common key. As an example, I modified your join on the subquery to be a CTE.
;with hospitalCounts as (
select h.hospitalname, count(*) as hospitalCounts
from hospitals as h
inner join patient as p on p.hospitalnpi=h.npi
where p.statecode='21' and h.statecode='21'
group by h.hospitalname
)
select h.hospitalname
,count(*) as visitCounts
,hospitalcounts
,round(count(*)/cast(hospitalcounts as float) *100,2) as percentRehabNotHomeHealth
from Patient p
inner join statecounties as sc on sc.countycode = p.countycode
and sc.statecode = p.statecode
inner join hospitals as h on h.npi=p.hospitalnpi
inner join hospitalCounts as t on t.hospitalname=h.hospitalname
--this where exists clause gives the visitCounts column
where h.stateCode='21' and p.statecode='21'
and exists
(
select p2.patientid
from Patient as p2
inner join Claims as c on c.patientid = p2.patientid
and c.admissiondate = p2.admissiondate
and c.dischargedate = p2.dischargedate
inner join hcpcs on hcpcs.hcpcs=c.hcpcs
inner join hospitals as h on h.npi=p2.hospitalnpi
where (hcpcs.hcpcs ='97001' or hcpcs.hcpcs='9339' or hcpcs.hcpcs='97002')
and p2.patientid=p.patientid
)
and hospitalcounts > 10
group by h.hospitalname, t.hospitalcounts
having count(*)>10
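As a cut-down illustration of summarizing in a CTE and joining it back to the detail rows, here is a sketch using Python's sqlite3 module. The rehab flag is a made-up stand-in for the EXISTS test against Claims, and the sample rows are invented; only the overall shape follows the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hospitals (npi INTEGER PRIMARY KEY, hospitalname TEXT);
    CREATE TABLE patient   (patientid INTEGER PRIMARY KEY, hospitalnpi INTEGER, rehab INTEGER);
    INSERT INTO hospitals VALUES (1, 'General'), (2, 'Mercy');
    INSERT INTO patient VALUES (1, 1, 1), (2, 1, 0), (3, 1, 1), (4, 1, 0), (5, 2, 1);
""")

# The CTE holds one summary row per hospital; the main query counts the
# filtered detail rows and joins the summary back on hospitalname.
rows = conn.execute("""
    WITH hospitalCounts AS (
        SELECT h.hospitalname, COUNT(*) AS hospitalcounts
        FROM hospitals h
        JOIN patient p ON p.hospitalnpi = h.npi
        GROUP BY h.hospitalname
    )
    SELECT h.hospitalname,
           COUNT(*) AS visitCounts,
           t.hospitalcounts,
           ROUND(COUNT(*) / CAST(t.hospitalcounts AS REAL) * 100, 2) AS pct
    FROM patient p
    JOIN hospitals h ON h.npi = p.hospitalnpi
    JOIN hospitalCounts t ON t.hospitalname = h.hospitalname
    WHERE p.rehab = 1
    GROUP BY h.hospitalname, t.hospitalcounts
    ORDER BY h.hospitalname
""").fetchall()

print(rows)  # [('General', 2, 4, 50.0), ('Mercy', 1, 1, 100.0)]
```

Note that the CTE needs an alias (t here) if the rest of the query refers to it as t.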

Using sum with a nested select

I'm using SQL Server. This statement lists my products per menu:
SELECT menuname, productname
FROM [web].[dbo].[tblMenus]
FULL OUTER JOIN [web].[dbo].[tblProductsRelMenus]
ON [tblMenus].Id = [tblProductsRelMenus].MenuId
FULL OUTER JOIN [web].[dbo].[tblProducts]
ON [tblProductsRelMenus].ProductId = [tblProducts].ProductId
LEFT JOIN [web].[dbo].[tblOrderDetails]
ON ([tblProducts].Id = [tblOrderDetails].ProductId)
GROUP BY [tblProducts].ProductName
Some products don't have menus and vice versa. I use the following to establish what has been sold of each product.
SELECT [tblProducts].ProductName, SUM([tblOrderDetails].Ammount) as amount
FROM [web].[dbo].[tblProducts]
LEFT JOIN [web].[dbo].[tblOrderDetails]
ON ([tblProducts].ProductId = [tblOrderDetails].ProductId)
GROUP BY [tblProducts].ProductName
What I want to do is complement the top table with an amount column. That is, I want a table with the same number of rows as in the first table above but with an amount value if it exists, otherwise null.
I can't figure out how to do this. Any suggestions?

If I am not missing anything, the second query could be simplified, then incorporated into the first query like this:
SELECT
m.menuname,
p.productname,
t.amount
FROM [web].[dbo].[tblMenus] m
FULL JOIN [web].[dbo].[tblProductsRelMenus] pm ON m.Id = pm.MenuId
FULL JOIN [web].[dbo].[tblProducts] p ON pm.ProductId = p.ProductId
LEFT JOIN (
SELECT ProductId, SUM(Amount) as amount
FROM [web].[dbo].[tblOrderDetails]
GROUP BY ProductId
) t ON p.ProductId = t.ProductId
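A quick way to check that the pre-summed derived table keeps the row count of the listing unchanged is the sketch below, written with Python's sqlite3 module. The table names are shortened, the data is invented, and LEFT JOINs starting from the products table replace the FULL JOINs (older SQLite versions lack FULL JOIN), so menus without products are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE menus        (id INTEGER PRIMARY KEY, menuname TEXT);
    CREATE TABLE products     (productid INTEGER PRIMARY KEY, productname TEXT);
    CREATE TABLE productmenus (menuid INTEGER, productid INTEGER);
    CREATE TABLE orderdetails (productid INTEGER, amount INTEGER);
    INSERT INTO menus VALUES (1, 'Breakfast'), (2, 'Lunch');
    INSERT INTO products VALUES (1, 'Coffee'), (2, 'Tea');
    INSERT INTO productmenus VALUES (1, 1), (2, 1);
    INSERT INTO orderdetails VALUES (1, 2), (1, 3);
""")

# The derived table t has at most one row per product, so joining it adds
# a column without adding or duplicating listing rows.
rows = conn.execute("""
    SELECT m.menuname, p.productname, t.amount
    FROM products p
    LEFT JOIN productmenus pm ON pm.productid = p.productid
    LEFT JOIN menus m ON m.id = pm.menuid
    LEFT JOIN (SELECT productid, SUM(amount) AS amount
               FROM orderdetails
               GROUP BY productid) t ON t.productid = p.productid
    ORDER BY p.productname, m.menuname
""").fetchall()

print(rows)
# [('Breakfast', 'Coffee', 5), ('Lunch', 'Coffee', 5), (None, 'Tea', None)]
```

Coffee appears once per menu with its total of 5, and Tea, which has no orders and no menu, still appears with a NULL amount.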

SQL Inner join division

I have an issue with my inner join division below. Oracle keeps reporting a missing right parenthesis even though I have already closed it. I need to get the names of the patients who have collected all items.
Select P.name
From ((((Select Patientid From Patient) As P
Inner Join (Select Accountno, Patientid From Account) As A1
on P.PatientID = A1.PatientID)
Inner Join (Select Accountno, Itemno From AccountType) As Al
On A1.Accountno = Al.Accountno)
Inner Join (Select Itemno From Item) As I
On Al.Itemno = I.Itemno)
Group By Al.Itemno
Having Count(*) >= (Select Count(*) FROM AccountType);

Here's a simpler approach that I believe is essentially equivalent:
select a.name
from Patient a
inner join Account b on a.PatientID = b.PatientID
inner join AccountType c on b.Accountno = c.Accountno
inner join Item d on c.Itemno = d.Itemno
group by c.Accountno, a.name
having Count(*) >= (Select Count(*) FROM AccountType);
This approach is a bit simpler, and it is much more likely to use the indexes on the tables: if you join derived tables built from subqueries, you may not get the benefit of the indexes that exist on the underlying physical tables.
I also usually alias table names using sequential letters -- 'a', 'b', 'c', 'd' as you can see. I find that when I'm writing complicated queries it makes it easier for me to follow. 'a' is the first table in the join, 'b' is the second, etc.

It sounds like you just want
SELECT p.name
FROM patient p
INNER JOIN account a ON (a.patientID = p.patientID)
INNER JOIN accountType accTyp ON (accTyp.accountNo = a.accountNo)
INNER JOIN item i ON (i.itemNo = accTyp.itemNo)
GROUP BY accTyp.itemNo
HAVING COUNT(*) = (SELECT COUNT(*)
FROM accountType);
Note that having an alias of A1 and an alias of Al is quite confusing. You want to pick more meaningful and more distinguishing aliases.
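For the underlying "collected all items" requirement, one common way to express relational division is to count distinct items per patient and compare that with the number of rows in Item. The sketch below is not the query from either answer; it uses Python's sqlite3 module with invented data, and it assumes that "collected" means a row in AccountType linking one of the patient's accounts to the item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient     (patientid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE account     (accountno INTEGER PRIMARY KEY, patientid INTEGER);
    CREATE TABLE item        (itemno INTEGER PRIMARY KEY);
    CREATE TABLE accounttype (accountno INTEGER, itemno INTEGER);
    INSERT INTO patient VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO account VALUES (100, 1), (200, 2);
    INSERT INTO item VALUES (1), (2);
    -- Ann collected both items; Bob collected item 1 twice.
    INSERT INTO accounttype VALUES (100, 1), (100, 2), (200, 1), (200, 1);
""")

# COUNT(DISTINCT ...) keeps repeated collections of the same item from
# being mistaken for collecting every item.
names = [row[0] for row in conn.execute("""
    SELECT p.name
    FROM patient p
    JOIN account a ON a.patientid = p.patientid
    JOIN accounttype t ON t.accountno = a.accountno
    GROUP BY p.patientid, p.name
    HAVING COUNT(DISTINCT t.itemno) = (SELECT COUNT(*) FROM item)
""")]

print(names)  # ['Ann']
```

Grouping by the patient, rather than by the item, is what makes the HAVING clause a per-patient test.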
