SQL 2 Left outer joins with Sum and Group By - sql

Looking for some guidance on this. I am attempting to run a report in my complaint management system: complaints by Year, Location, and Subcategory, showing totals for TotalCredits (child table) and TotalCwts (child table) as well as total ExternalRootCause (on the master table).
This is my SQL, but TotalCwts and TotalCredits are not being calculated correctly. Each value is added once for every child record rather than totalled once per master record.
SELECT
dbo.Complaints.Location,
YEAR(dbo.Complaints.ComDate) AS Year,
dbo.Complaints.ComplaintSubcategory,
COUNT(Distinct(dbo.Complaints.ComId)) AS CustomerComplaints,
SUM(DISTINCT CASE WHEN (dbo.Complaints.RootCauseSource = 'External' ) THEN 1 ELSE 0 END) as ExternalRootCause,
SUM(dbo.ComplaintProducts.Cwts) AS TotalCwts,
Coalesce(SUM(dbo.CreditDeductions.CreditAmount),0) AS TotalCredits
FROM dbo.Complaints
JOIN dbo.CustomerComplaints
ON dbo.Complaints.ComId = dbo.CustomerComplaints.ComId
LEFT OUTER JOIN dbo.CreditDeductions
ON dbo.Complaints.ComId = dbo.CreditDeductions.ComId
LEFT OUTER JOIN dbo.ComplaintProducts
ON dbo.Complaints.ComId = dbo.ComplaintProducts.ComId
WHERE
dbo.Complaints.Location = Coalesce(@Location,Location)
GROUP BY
YEAR(dbo.Complaints.ComDate),
dbo.Complaints.Location,
dbo.Complaints.ComplaintSubcategory
ORDER BY
[YEAR] desc,
dbo.Complaints.Location,
dbo.Complaints.ComplaintSubcategory
Data Results
Location | Year | Subcategory | Complaints | External RC | Total Cwts | Total Credits
---------------------------------------------------------------------------------------
Boston | 2016 | Documentation | 1 | 0 | 8 | 8.00
Data Should Read
Location | Year | Subcategory | Complaints | External RC | Total Cwts | Total Credits
---------------------------------------------------------------------------------------
Boston | 2016 | Documentation | 1 | 0 | 4 | 2.00
Above data reflects 1 complaint having 4 Product Records with 1cwt each and 2 credit records with 1.00 each.
What do I need to change in my query or should I approach this query a different way?

The problem is that the one complaint has 2 deductions and 4 products. When you join both child tables directly to the complaint, the result contains every deduction/product combination for that complaint, which gives the 8 rows (2 x 4) you're seeing.
One solution, which should work here, is to not join the Deduction and Product tables directly; instead, join to a subquery over each table that returns one row per complaint. In other words, replace:
LEFT OUTER JOIN dbo.CreditDeductions ON dbo.Complaints.ComId = dbo.CreditDeductions.ComId
LEFT OUTER JOIN dbo.ComplaintProducts ON dbo.Complaints.ComId = dbo.ComplaintProducts.ComId
...with this - showing the Deductions table only, you can work out the Products:
LEFT OUTER JOIN (
select ComId, count(*) CountDeductions, sum(CreditAmount) CreditAmount
from dbo.CreditDeductions
group by ComId
) d on d.ComId = Complaints.ComId
You'll have to change the references to dbo.CreditDeductions to just d (or whatever alias you choose).
Once you've done this for both tables, each subquery contributes one row per complaint, so the join yields one row per complaint containing the counts and totals from the two child tables.
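A minimal sketch of the effect, run against an in-memory SQLite database from Python rather than SQL Server. The table and column names follow the question, but the schema is cut down to the columns involved and the rows are made up to match the "4 products, 2 credits" example, so treat all of it as an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down versions of the tables in the question.
    CREATE TABLE Complaints (ComId INTEGER PRIMARY KEY, Location TEXT);
    CREATE TABLE CreditDeductions (ComId INTEGER, CreditAmount REAL);
    CREATE TABLE ComplaintProducts (ComId INTEGER, Cwts REAL);
    INSERT INTO Complaints VALUES (1, 'Boston');
    INSERT INTO CreditDeductions VALUES (1, 1.0), (1, 1.0);
    INSERT INTO ComplaintProducts VALUES (1, 1), (1, 1), (1, 1), (1, 1);
""")

# Joining both child tables directly yields 2 x 4 = 8 rows per complaint,
# so both sums are inflated.
naive = conn.execute("""
    SELECT c.Location, SUM(p.Cwts), SUM(d.CreditAmount)
    FROM Complaints c
    LEFT JOIN CreditDeductions d ON d.ComId = c.ComId
    LEFT JOIN ComplaintProducts p ON p.ComId = c.ComId
    GROUP BY c.Location
""").fetchall()

# Pre-aggregating each child table to one row per ComId removes the fan-out.
fixed = conn.execute("""
    SELECT c.Location,
           COALESCE(SUM(p.Cwts), 0),
           COALESCE(SUM(d.CreditAmount), 0)
    FROM Complaints c
    LEFT JOIN (SELECT ComId, SUM(CreditAmount) AS CreditAmount
               FROM CreditDeductions GROUP BY ComId) d ON d.ComId = c.ComId
    LEFT JOIN (SELECT ComId, SUM(Cwts) AS Cwts
               FROM ComplaintProducts GROUP BY ComId) p ON p.ComId = c.ComId
    GROUP BY c.Location
""").fetchall()

print(naive)  # inflated totals
print(fixed)  # one contribution per child row
```

The outer SUM is still needed because the report groups several complaints into one Location/Year/Subcategory row; the subqueries only guarantee that each complaint contributes its totals once.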

Related

Create multiple filtered result sets of a joined table for use in aggregate functions

I have a (heavily simplified) orders table, total being the dollar amount, containing:
| id | client_id | type | total |
|----|-----------|--------|-------|
| 1 | 1 | sale | 100 |
| 2 | 1 | refund | 100 |
| 3 | 1 | refund | 100 |
And clients table containing:
| id | name |
|----|------|
| 1 | test |
I am attempting to create a breakdown, by client, metrics about the total number of sales, refunds, sum of sales, sum of refunds etc.
To do this, I am querying the clients table and joining the orders table. The orders table contains both sales and refunds, specified by the type column.
My idea was to join the orders twice using subqueries and create aliases for those filtered tables. The aliases would then be used in aggregate functions to find the sum, average etc. I have tried many variations of joining the orders table twice to achieve this but it produces the same incorrect results. This query demonstrates this idea:
SELECT
clients.*,
SUM(sales.total) as total_sales,
SUM(refunds.total) as total_refunds,
AVG(sales.total) as avg_ticket,
COUNT(sales.*) as num_of_sales
FROM clients
LEFT JOIN (SELECT * FROM orders WHERE type = 'sale') as sales
ON sales.client_id = clients.id
LEFT JOIN (SELECT * FROM orders WHERE type = 'refund') as refunds
ON refunds.client_id = clients.id
GROUP BY clients.id
Result:
| id | name | total_sales | total_refunds | avg_ticket | num_of_sales |
|----|------|-------------|---------------|------------|--------------|
| 1 | test | 200 | 200 | 100 | 2 |
Expected result:
| id | name | total_sales | total_refunds | avg_ticket | num_of_sales |
|----|------|-------------|---------------|------------|--------------|
| 1 | test | 100 | 200 | 100 | 1 |
When the second join is included in the query, the rows returned from the first join are returned again with the second join. They are multiplied by the number of rows in the second join. It's clear my understanding of joining and/or subqueries is incomplete.
I understand that I can filter the orders table with each aggregate function. This produces correct results but seems inefficient:
SELECT
clients.*,
SUM(orders.total) FILTER (WHERE type = 'sale') as total_sales,
SUM(orders.total) FILTER (WHERE type = 'refund') as total_refunds,
AVG(orders.total) FILTER (WHERE type = 'sale') as avg_ticket,
COUNT(orders.*) FILTER (WHERE type = 'sale') as num_of_sales
FROM clients
LEFT JOIN orders
on orders.client_id = clients.id
GROUP BY clients.id
What is the appropriate way to create filtered and aliased versions of this joined table?
Also, what exactly is happening with my initial query, where the two subqueries are joined? I would expect them to be treated as separate subsets even though they are operating on the same (orders) table.
You should do the (filtered) aggregation once for all aggregates you want, and then join to the result of that. As your aggregation doesn't need any columns from the clients table, this can be done in a derived table. This is also typically faster than grouping the result of the join.
SELECT clients.*,
o.total_sales,
o.total_refunds,
o.avg_ticket,
o.num_of_sales
FROM clients
LEFT JOIN (
select client_id,
SUM(total) FILTER (WHERE type = 'sale') as total_sales,
SUM(total) FILTER (WHERE type = 'refund') as total_refunds,
AVG(total) FILTER (WHERE type = 'sale') as avg_ticket,
COUNT(*) FILTER (WHERE type = 'sale') as num_of_sales
from orders
group by client_id
) o on o.client_id = clients.id
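To see what the double join does before any grouping, and that aggregating first avoids it, here is a small sketch using Python's sqlite3 with the sample rows from the question. It is not PostgreSQL: CASE expressions stand in for FILTER so that it also runs on older SQLite builds, which is a substitution on my part rather than part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, client_id INTEGER,
                         type TEXT, total INTEGER);
    INSERT INTO clients VALUES (1, 'test');
    INSERT INTO orders VALUES (1, 1, 'sale', 100),
                              (2, 1, 'refund', 100),
                              (3, 1, 'refund', 100);
""")

# The two filtered joins are independent, so every sale row is paired with
# every refund row of the same client: 1 sale x 2 refunds = 2 rows.
joined_rows = conn.execute("""
    SELECT sales.id, refunds.id
    FROM clients
    LEFT JOIN (SELECT * FROM orders WHERE type = 'sale') AS sales
           ON sales.client_id = clients.id
    LEFT JOIN (SELECT * FROM orders WHERE type = 'refund') AS refunds
           ON refunds.client_id = clients.id
    ORDER BY sales.id, refunds.id
""").fetchall()

# Aggregate orders once per client, then join the one-row-per-client result.
summary = conn.execute("""
    SELECT clients.id, clients.name,
           o.total_sales, o.total_refunds, o.avg_ticket, o.num_of_sales
    FROM clients
    LEFT JOIN (
        SELECT client_id,
               SUM(CASE WHEN type = 'sale' THEN total END)   AS total_sales,
               SUM(CASE WHEN type = 'refund' THEN total END) AS total_refunds,
               AVG(CASE WHEN type = 'sale' THEN total END)   AS avg_ticket,
               COUNT(CASE WHEN type = 'sale' THEN 1 END)     AS num_of_sales
        FROM orders
        GROUP BY client_id
    ) AS o ON o.client_id = clients.id
""").fetchall()

print(joined_rows)
print(summary)
```

The first result shows the sale (order 1) appearing twice, once next to each refund, which is why SUM(sales.total) and COUNT(sales.*) doubled in the original query.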

INNER JOIN Need to use column value twice in results

I've put in the requisite 2+ hours of digging and not getting an answer.
I'd like to merge 3 SQL tables, where Table A and B share a column in common, and Table B and C share a column in common--Tables A and C do not.
For example:
Table A - entity_list
entity_id | entity_name | Other, irrelevant columns
Example:
1 | Microsoft |
2 | Google |
Table B - transaction_history
transaction_id | purchasing_entity | supplying_entity | other, irrelevant columns
Example:
1 | 2 | 1
Table C - transaction_details
transaction_id | amount_of_purchase | Other, irrelevant columns
1 | 5000000 |
Using INNER JOIN, I've been able to get a result where I can link entity_name to either purchasing_entity or supplying_entity. And then, in the results, rather than seeing the entity_id, I get the entity name. But I want to substitute the entity name for both purchasing and supplying entity.
My ideal results would look like this:
1 [transaction ID] | Microsoft | Google | 5000000
The closest I've come is:
1 [transaction ID] | Microsoft | 2 [Supplying Entity] | 5000000
To get there, I've done:
SELECT transaction_history.transaction_id,
entity_list.entity_name,
transaction_history.supplying_entity,
transaction_details.amount_of_purchase
FROM transaction_history
INNER JOIN entity_list
ON transaction_history.purchasing_entity=entity_list.entity_id
INNER JOIN transaction_details
ON transaction_history.transaction_id=transaction_details.transaction_id
I can't get entity_name to feed to both purchasing_entity and supplying_entity.
Join entity_list twice, once for each role, and give each join its own alias. Here is the query:
SELECT h.transaction_id, h.purchasing_entity, purchaser.entity_name, h.supplying_entity, supplier.entity_name, d.amount_of_purchase
FROM transaction_history h
INNER JOIN transaction_details d
ON h.transaction_id = d.transaction_id
INNER JOIN entity_list purchaser
ON h.purchasing_entity = purchaser.entity_id
INNER JOIN entity_list supplier
ON h.supplying_entity = supplier.entity_id
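As a quick check of the two-alias join, the following sketch loads the sample rows from the question into an in-memory SQLite database via Python. The column types and the output aliases purchaser_name and supplier_name are my own assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity_list (entity_id INTEGER PRIMARY KEY, entity_name TEXT);
    CREATE TABLE transaction_history (transaction_id INTEGER PRIMARY KEY,
                                      purchasing_entity INTEGER,
                                      supplying_entity INTEGER);
    CREATE TABLE transaction_details (transaction_id INTEGER PRIMARY KEY,
                                      amount_of_purchase INTEGER);
    INSERT INTO entity_list VALUES (1, 'Microsoft'), (2, 'Google');
    INSERT INTO transaction_history VALUES (1, 2, 1);
    INSERT INTO transaction_details VALUES (1, 5000000);
""")

# entity_list appears twice under different aliases, so each foreign key
# column is resolved to a name independently of the other.
rows = conn.execute("""
    SELECT h.transaction_id,
           purchaser.entity_name AS purchaser_name,
           supplier.entity_name  AS supplier_name,
           d.amount_of_purchase
    FROM transaction_history h
    INNER JOIN transaction_details d ON h.transaction_id = d.transaction_id
    INNER JOIN entity_list purchaser ON h.purchasing_entity = purchaser.entity_id
    INNER JOIN entity_list supplier  ON h.supplying_entity = supplier.entity_id
""").fetchall()

print(rows)
```

With the sample row (purchasing_entity = 2, supplying_entity = 1) the purchaser resolves to Google and the supplier to Microsoft.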

SQL select only highest date

For a project I want to generate a price list.
I want to get only the latest prices from each supplier for each article.
There are just those two tables.
Table articles
ARTNR | TXT | ACTIVE | SUPPLIER
------------------------------------------
10 | APPLE | Y | 10
20 | ORANGE | Y | 10
30 | KEYBOARD | N | 20
40 | ORANGE | Y | 20
50 | BANANA | Y | 10
60 | CHERRY | Y | 10
Table prices
ARTNR | PRCGRP | PRCDAT | PRICE
--------------------------------------
10 | 10 | 01-Aug-10 | 2.1
10 | 10 | 05-Aug-11 | 2.2
10 | 10 | 21-Aug-12 | 2.5
20 | 0 | 01-Aug-10 | 2.1
20 | 10 | 09-Aug-12 | 2.3
10 | 10 | 14-Aug-13 | 2.7
This is what I have so far:
SELECT
ARTICLES.[ARTNR], ARTICLES.[TXT], ARTICLES.[ACTIVE], ARTICLES.[SUPPLIER], PRICES.PRCGRP, PRICES.PRCDAT, PRICES.PRICE
FROM
ARTICLES INNER JOIN PRICES ON ARTICLES.ARTNR = PRICES.ARTNR
WHERE
(
(ARTICLES.[ACTIVE]="Y") AND
(ARTICLES.[SUPPLIER]=10) AND
(PRICES.PRCGRP=0) AND
(PRICES.PRCDAT=(SELECT MAX(PRCDAT) FROM PRICES as art WHERE art.ARTNR = PRICES.artnr) )
)
ORDER BY ARTICLES.ARTNR
;
It is okay to choose just one supplier each time, but I want the max price.
The problem is:
Lots of articles do not show up with the query above,
but I cannot figure out what is wrong.
I can see that they should be in the resultset when I leave out the subselect on max prcdat.
What is wrong?
Your subquery to get the latest price does not take the other conditions into account. When it picks the latest price, it may pick a price in another price group or for an article that is not active. When you then join that against the filtered list, which has no inactive articles and only prices in a single price group, you get no rows that exist in both.
Either you need to duplicate or - better - move your conditions inside the subquery to get the latest price under the conditions. I can't test against Access, but something like this should be possible if the SQL is not too limited:
SELECT a.artnr, a.txt, a.active, a.supplier, p.prcgrp, p.prcdat, p.price
FROM articles a INNER JOIN prices p ON a.ARTNR = p.ARTNR
JOIN (
SELECT a.artnr, MAX(p.prcdat) prcdat
FROM articles a JOIN prices p ON a.artnr = p.artnr
WHERE a.active='Y' AND a.supplier=10 AND p.prcgrp=10
GROUP BY a.artnr) z
ON a.artnr = z.artnr AND p.prcdat = z.prcdat
ORDER BY a.ARTNR
If the SQL support in Access won't allow a join with a subquery, you can just move the conditions inside your existing subquery, something like:
SELECT a.artnr, a.txt, a.active, a.supplier, p.prcgrp, p.prcdat, p.price
FROM articles a INNER JOIN prices p ON a.ARTNR = p.ARTNR
WHERE p.prcdat = (
SELECT MAX(p2.prcdat)
FROM articles a2 JOIN prices p2 ON a2.artnr = p2.artnr
WHERE a.artnr = a2.artnr AND a2.active='Y' AND a2.supplier=10 AND p2.prcgrp=10
)
ORDER BY a.ARTNR;
Note that due to limitations in identifying a unique price (no primary key in prices), the queries may give duplicates if several prices for the same article have the same prcdat. If that's a problem, you'll probably need to duplicate your conditions outside the subquery too.
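The difference is easy to reproduce on the sample data. The sketch below uses Python's sqlite3 instead of Access, stores the dates as ISO strings so that MAX compares them correctly, and filters on price group 0 as in the question; all of that is an assumption made for the demonstration. It is a variant rather than a copy of the queries above: the article conditions stay in the outer WHERE and only the price-group condition is repeated inside the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (artnr INTEGER, txt TEXT, active TEXT, supplier INTEGER);
    CREATE TABLE prices (artnr INTEGER, prcgrp INTEGER, prcdat TEXT, price REAL);
    INSERT INTO articles VALUES
        (10, 'APPLE', 'Y', 10), (20, 'ORANGE', 'Y', 10),
        (30, 'KEYBOARD', 'N', 20), (40, 'ORANGE', 'Y', 20),
        (50, 'BANANA', 'Y', 10), (60, 'CHERRY', 'Y', 10);
    INSERT INTO prices VALUES
        (10, 10, '2010-08-01', 2.1), (10, 10, '2011-08-05', 2.2),
        (10, 10, '2012-08-21', 2.5), (20, 0,  '2010-08-01', 2.1),
        (20, 10, '2012-08-09', 2.3), (10, 10, '2013-08-14', 2.7);
""")

# Original shape: the subquery finds the latest date over ALL price groups.
# For article 20 that is a group-10 price, which the outer filter rejects.
broken = conn.execute("""
    SELECT a.artnr, p.prcdat, p.price
    FROM articles a JOIN prices p ON a.artnr = p.artnr
    WHERE a.active = 'Y' AND a.supplier = 10 AND p.prcgrp = 0
      AND p.prcdat = (SELECT MAX(p2.prcdat) FROM prices p2
                      WHERE p2.artnr = p.artnr)
    ORDER BY a.artnr
""").fetchall()

# Same query with the price-group condition repeated inside the subquery,
# so the latest date is searched only among the prices that can match.
fixed = conn.execute("""
    SELECT a.artnr, p.prcdat, p.price
    FROM articles a JOIN prices p ON a.artnr = p.artnr
    WHERE a.active = 'Y' AND a.supplier = 10 AND p.prcgrp = 0
      AND p.prcdat = (SELECT MAX(p2.prcdat) FROM prices p2
                      WHERE p2.artnr = p.artnr AND p2.prcgrp = 0)
    ORDER BY a.artnr
""").fetchall()

print(broken)
print(fixed)
```

Keeping the conditions both outside and inside the subquery also covers the duplicate case mentioned in the note, because a price from another group that happens to share the same date is still filtered out by the outer WHERE.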

Perform right outer join with a condition for left table

I have two tables,
Student:
rollno | name
1 | Abc
2 | efg
3 | hij
4 | klm
Attendance:
name | date |status
Abc | 10-10-2013 | A
efg | 10-10-2013 | A
Abc | 11-10-2013 | A
hij | 25-10-2013 | A
My required output is:
Some query with where condition as "where date between '10-09-2013' and '13-10-2013' "
rollno| name |count
1 | Abc | 2
2 | efg | 1
3 | hij | 0
4 | klm | 0
I tried using:
SELECT p.rollno,p.name,case when s.statuss='A' then COUNT(p.rollno) else '0' end as count
from attendance s
right outer join student p
on s.rollno=p.rollno
where s.date between '10-09-2013' and '13-10-2013'
group by p.rollno,p.regno,p.name,s.statuss
order by p.rollno
And the Output is:
rollno| name |count
1 | Abc | 2
2 | efg | 1
I want the remaining values from the student table to also be appended. I have tried many different queries, but all have been unsuccessful. Is there a query that will return the required output above?
You need to move the criteria from the where to the join:
SELECT p.rollno,p.name,case when s.statuss='A' then COUNT(p.rollno) else 0 end as count
from attendance s
right outer join student p
on s.rollno=p.rollno
and s.date between '10-09-2013' and '13-10-2013'
group by p.rollno,p.regno,p.name,s.statuss
order by p.rollno;
At the moment even though you have an outer join, by referring to the outer table in the where clause you effectively turn it into an inner join. Where there is no match in attendance, s.Date will be NULL, and because NULL is not between '10-09-2013' and '13-10-2013' the rows are excluded.
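The effect of putting the date filter in WHERE versus ON can be sketched as follows, using Python's sqlite3. Several things here are assumptions: the attendance table is keyed by rollno (as the queries imply, although the sample table shows a name column), dates are stored as ISO strings, and the join is written as student LEFT JOIN attendance, which is equivalent to the RIGHT OUTER JOIN above but also runs on SQLite versions without RIGHT JOIN support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attendance (rollno INTEGER, date TEXT, status TEXT);
    INSERT INTO student VALUES (1, 'Abc'), (2, 'efg'), (3, 'hij'), (4, 'klm');
    INSERT INTO attendance VALUES
        (1, '2013-10-10', 'A'), (2, '2013-10-10', 'A'),
        (1, '2013-10-11', 'A'), (3, '2013-10-25', 'A');
""")

# Filter in WHERE: rows where s.date is NULL (no match) or out of range
# are discarded after the join, so students 3 and 4 disappear.
where_rollnos = [r[0] for r in conn.execute("""
    SELECT DISTINCT p.rollno
    FROM student p
    LEFT JOIN attendance s ON s.rollno = p.rollno
    WHERE s.date BETWEEN '2013-09-10' AND '2013-10-13'
    ORDER BY p.rollno
""")]

# Filter in ON: it only restricts which attendance rows can match;
# every student row is still preserved by the outer join.
on_rollnos = [r[0] for r in conn.execute("""
    SELECT DISTINCT p.rollno
    FROM student p
    LEFT JOIN attendance s
           ON s.rollno = p.rollno
          AND s.date BETWEEN '2013-09-10' AND '2013-10-13'
    ORDER BY p.rollno
""")]

print(where_rollnos)
print(on_rollnos)
```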
It is not apparent from the question, but I would imagine that what you are actually looking for is this. It appears you are just after a count of entries in attendance where statuss = 'A', by student:
SELECT p.rollno,
p.name,
COUNT(s.statuss) as count
from attendance s
right outer join student p
on s.rollno=p.rollno
and s.date between '10-09-2013' and '13-10-2013'
AND s.statuss = 'A'
group by p.rollno,p.regno,p.name
order by p.rollno;
I have removed s.statuss from the group by, and changed the count so that there is only one row per student, rather than one row per status per student. I have changed the column within the count to a column in the attendance table, to ensure that you get a count of 0 when there are no entries in attendance. If you use a column in student you will get a count of 1 even when there are no entries. Finally, since you are only interested in entries with statuss = 'A' I have also moved this to the join condition.
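The difference between counting a column from the attendance side and one from the student side shows up directly in a small test. This is a sketch with Python's sqlite3 under the same assumptions as the query: attendance is keyed by rollno, the status column is called status here, dates are ISO strings, and student LEFT JOIN attendance stands in for the equivalent RIGHT OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attendance (rollno INTEGER, date TEXT, status TEXT);
    INSERT INTO student VALUES (1, 'Abc'), (2, 'efg'), (3, 'hij'), (4, 'klm');
    INSERT INTO attendance VALUES
        (1, '2013-10-10', 'A'), (2, '2013-10-10', 'A'),
        (1, '2013-10-11', 'A'), (3, '2013-10-25', 'A');
""")

# COUNT(col) skips NULLs. For a student without matching attendance the
# outer join produces one row whose attendance columns are all NULL:
#   COUNT(s.status) -> 0, but COUNT(p.rollno) -> 1.
counts = conn.execute("""
    SELECT p.rollno,
           COUNT(s.status) AS count_attendance_col,
           COUNT(p.rollno) AS count_student_col
    FROM student p
    LEFT JOIN attendance s
           ON s.rollno = p.rollno
          AND s.date BETWEEN '2013-09-10' AND '2013-10-13'
          AND s.status = 'A'
    GROUP BY p.rollno
    ORDER BY p.rollno
""").fetchall()

print(counts)
```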
On one final note, it is advisable when using strings for dates to use the culture-insensitive format yyyyMMdd, as this is completely unambiguous: '20130201' is always 1st February and never 2nd January, whereas in your query '10-09-2013' could be 10th September or 9th October, depending on your settings.
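The ambiguity is not specific to SQL Server. As a small illustration in Python (the format strings are mine, standing in for a server's day-first or month-first setting), the same text parses to two different dates, while the yyyyMMdd form has only one reading:

```python
from datetime import datetime

ambiguous = "10-09-2013"

# The same string read with a day-first and a month-first convention.
as_day_first = datetime.strptime(ambiguous, "%d-%m-%Y").date()
as_month_first = datetime.strptime(ambiguous, "%m-%d-%Y").date()

# yyyyMMdd leaves no room for interpretation.
unambiguous = datetime.strptime("20130201", "%Y%m%d").date()

print(as_day_first, as_month_first, unambiguous)
```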

Summing cost by id that appears on multiple rows

SOLUTION
I solved it by simply doing the following.
SELECT table_size, sum(cost) as total_cost, sum(num_players) as num_players
FROM
(
SELECT table_size, cost, sum(tp.uid) as num_players
FROM tournament as t
LEFT JOIN takes_part AS tp ON tp.tid = t.tid
LEFT JOIN users as u on u.uid = tp.tid
JOIN attributes as a on a.aid = t.attrId
GROUP BY t.tid
) as res
GROUP BY table_size
I wasn't sure it would work, what with the other aggregate functions that I had to use in my real SQL, but it seems to be working OK. There may be problems in the future if I want to do other kinds of calculations, for instance a COUNT(DISTINCT tp.uid) over all tournaments. Still, in this case that is not all that important, so I am satisfied for now. Thank you all for your help.
UPDATE!!!
Here is a Fiddle that explains the problem:
http://www.sqlfiddle.com/#!2/e03ff/7
I want to get:
table_size | cost
-------------------------------
5 | 110
8 | 80
OLD POST
I'm sure that there is an easy solution to this that I'm just not seeing, but I can't seem to find a solution to it anywhere. What I'm trying to do is the following:
I need to sum 'costs' per tournament in a system. For other reasons, I've had to join with lots of other tables, making the same cost appear on multiple rows, like so:
id | name | cost | (hidden_id)
-----------------------------
0 | Abc | 100 | 1
1 | ASD | 100 | 1
2 | Das | 100 | 1
3 | Ads | 50 | 2
4 | Ads | 50 | 2
5 | Fsd | 0 | 3
6 | Ads | 0 | 3
7 | Dsa | 0 | 3
The costs in the table above are linked to an id value that is not necessarily selected by the SQL (this depends on what the user decides at runtime). What I want to get is the sum 100+50+0 = 150. Of course, if I just use SUM(cost) I will get a different answer. I tried using SUM(cost)/COUNT(*)*COUNT(tourney_ids) but this only gives the correct result under certain circumstances. A (very) simple form of the query looks like this:
SELECT SUM(cost) as tot_cost -- This will not work as it sums all rows where the sum appears.
FROM t
JOIN ta ON t.attr_id = ta.toaid
JOIN tr ON tr.toid = t.toid -- This row will cause multiple rows with same cost
GROUP BY *selected by user* -- This row enables the user to group by several attributes, such as weekday, hour or ids of different kinds.
UPDATE. A more correct SQL-query, perhaps:
SELECT
*some way to sum cost*
FROM tournament AS t
JOIN attribute AS ta ON t.attr_id = ta.toaid
JOIN registration AS tr ON tr.tourneyId = t.tourneyId
INNER JOIN pokerstuff as ga ON ta.game_attr_id = ga.gameId
LEFT JOIN people AS p ON p.userId = tr.userId
LEFT JOIN parttaking AS jlt ON (jlt.tourneyId = t.tourneyId AND tr.userId = jlt.userId)
LEFT JOIN (
SELECT t.tourneyId,
ta.a - (ta.b) - sum(c)*ta.cost AS cost
FROM tournament as t
JOIN attribute as ta ON (t.attr_id = ta.toaid)
JOIN registration tr ON (tr.tourneyId = t.tourneyId)
GROUP BY t.tourneyId, ta.b, ta.a
) as o on t.tourneyId = o.tourneyId
AND whereConditions
GROUP BY groupBySql
Description of the tables
tournament (tourneyId, name, attributeId)
attributes (attributeId, ..., gameid)
registration (userId, tourneyId, ...)
pokerstuff(gameid,...)
people(userId,...)
parttaking(userId, tourneyId,...)
Let's assume that we have the following (cost is actually calculated in a subquery, but since it's tied to tournament, I will treat it as an attribute here):
tournament:
tourneyId | name | cost
1 | MyTournament | 50
2 | MyTournament | 80
and
userId | tourneyId
1 | 1
2 | 1
3 | 1
4 | 1
1 | 2
4 | 2
The problem is rather simple. I need to be able to get the sum of the costs of the tournaments without counting a tournament more than once. The sum (and all other aggregates) will be dynamically grouped by the user.
A big problem is that many solutions that I've tried (such as SUM OVER...) would require that I group by certain attributes, and that I cannot do. The group by-clause must be completely decided by the user. The sum of the cost should sum over any group-by attributes, the only problem is of course the multiple rows in which the sum appears.
Does anyone have any good hints on what can be done?
Try the following:
select *selected by user*, sum(case rownum when 1 then a.cost end)
from
(
select
*selected by user*, cost,
row_number() over (partition by t.tid) as rownum
FROM t
JOIN ta ON t.attr_id = ta.toaid
JOIN tr ON tr.toid = t.toid
) a
group by *selected by user*
row_number numbers the rows that belong to the same tournament. When summing the costs we only consider the rows with a rownum of 1; all other rows are duplicates of that one as far as the cost is concerned.
In terms of the fiddle:
select table_size, sum(case rownum when 1 then a.cost end)
from
(
SELECT
table_size, cost,
row_number() over (partition by t.tid) as rownum
FROM tournament as t
LEFT JOIN takes_part AS tp ON tp.tid = t.tid
LEFT JOIN users as u on u.uid = tp.tid
JOIN attributes as a on a.aid = t.attrId
) a
group by table_size
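Here is a runnable sketch of the row_number idea using Python's sqlite3 (window functions need SQLite 3.25 or later). The fiddle's data is not reproduced in this post, so the rows below are hypothetical, chosen so that the per-size totals come out as the 110 and 80 asked for; the users join is left out because it does not affect the cost, and an ORDER BY is added inside the window because some databases require one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: three tournaments, two table sizes.
    CREATE TABLE attributes (aid INTEGER PRIMARY KEY, table_size INTEGER);
    CREATE TABLE tournament (tid INTEGER PRIMARY KEY, cost INTEGER, attrId INTEGER);
    CREATE TABLE takes_part (tid INTEGER, uid INTEGER);
    INSERT INTO attributes VALUES (1, 5), (2, 8);
    INSERT INTO tournament VALUES (1, 50, 1), (2, 60, 1), (3, 80, 2);
    INSERT INTO takes_part VALUES (1, 1), (1, 2), (1, 3),
                                  (2, 1), (2, 4),
                                  (3, 2), (3, 3);
""")

# Plain SUM counts each tournament's cost once per participant.
naive = conn.execute("""
    SELECT a.table_size, SUM(t.cost)
    FROM tournament t
    LEFT JOIN takes_part tp ON tp.tid = t.tid
    JOIN attributes a ON a.aid = t.attrId
    GROUP BY a.table_size ORDER BY a.table_size
""").fetchall()

# Number the rows within each tournament and add the cost only for row 1.
deduped = conn.execute("""
    SELECT table_size, SUM(CASE WHEN rownum = 1 THEN cost END)
    FROM (
        SELECT a.table_size, t.cost,
               ROW_NUMBER() OVER (PARTITION BY t.tid ORDER BY tp.uid) AS rownum
        FROM tournament t
        LEFT JOIN takes_part tp ON tp.tid = t.tid
        JOIN attributes a ON a.aid = t.attrId
    ) x
    GROUP BY table_size ORDER BY table_size
""").fetchall()

print(naive)
print(deduped)
```

One thing to keep in mind: each tournament's cost lands in whichever group contains its rownum 1 row. That is fine when the grouping columns are constant per tournament (as table_size is here), but not when the user groups by something that varies within a tournament, such as a player attribute.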
As the repeated costs are the same each time you can average them by their hidden id and do something like this:
WITH MrTable AS (
SELECT DISTINCT hidden_id, AVG(cost) OVER (PARTITION BY hidden_id) AS cost
FROM stuff
)
SELECT SUM(cost) FROM MrTable;
(Updated) Given that the cost currently returned is the total cost per tournament, you could include a fractional value of cost on each line of an inner select, such that the total of all those values adds up to the total cost (allowing for the fact that each given tournament's values may be appearing multiple times), then sum that fractional cost in your outer select, like so:
select table_size, sum(frac_cost) as agg_cost from
(SELECT a.table_size , cost / count(*) over (partition by t.tid) as frac_cost
FROM tournament as t
LEFT JOIN takes_part AS tp ON tp.tid = t.tid
LEFT JOIN users as u on u.uid = tp.uid
JOIN attributes as a on a.aid = t.attrId) sq
GROUP BY table_size
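A sketch of the fractional-cost idea with Python's sqlite3 (SQLite 3.25 or later for window functions). The rows are hypothetical, picked so the totals match the 110 and 80 from the question, and the users join is omitted since it does not change the row counts here. Note the multiplication by 1.0: in SQLite, as in PostgreSQL and SQL Server, dividing two integers truncates, whereas MySQL's / already returns a decimal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: three tournaments, two table sizes.
    CREATE TABLE attributes (aid INTEGER PRIMARY KEY, table_size INTEGER);
    CREATE TABLE tournament (tid INTEGER PRIMARY KEY, cost INTEGER, attrId INTEGER);
    CREATE TABLE takes_part (tid INTEGER, uid INTEGER);
    INSERT INTO attributes VALUES (1, 5), (2, 8);
    INSERT INTO tournament VALUES (1, 50, 1), (2, 60, 1), (3, 80, 2);
    INSERT INTO takes_part VALUES (1, 1), (1, 2), (1, 3),
                                  (2, 1), (2, 4),
                                  (3, 2), (3, 3);
""")

# Each joined row carries cost / (number of rows for that tournament), so
# the fractions of one tournament add back up to its cost.
rows = conn.execute("""
    SELECT table_size, SUM(frac_cost) AS agg_cost
    FROM (
        SELECT a.table_size,
               t.cost * 1.0 / COUNT(*) OVER (PARTITION BY t.tid) AS frac_cost
        FROM tournament t
        LEFT JOIN takes_part tp ON tp.tid = t.tid
        JOIN attributes a ON a.aid = t.attrId
    ) sq
    GROUP BY table_size ORDER BY table_size
""").fetchall()

# Fractions such as 50/3 are not exact in floating point, so round for display.
totals = [(size, round(cost, 6)) for size, cost in rows]
print(totals)
```

Unlike the "count only row 1" style of fix, the fractions are spread over all rows of a tournament, so the grand total stays correct even when the user groups by a column that varies within a tournament; each group then receives a proportional share of the cost.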