Using CTE to count number of rows in inner query - sql

I'm learning CTE and I've encounter an exercise on I cannot solve. It is not a homework, but an exercise from an online course I've taken to learn SQL. I'm interested in where I've made a mistake and some explanation so answering with only the correct code will not help me to learn CTE.
The task is to count projects that raised 100% to 150% of the minimum amount, and those that raised more than 150%.
I've written the following CTE:
WITH nice_proj AS
(SELECT project_id AS pid,
amount AS amount,
minimal_amount AS minimal
FROM donation d
INNER JOIN project p ON (d.project_id = p.id)
GROUP BY pid,
minimal,
amount
HAVING sum(amount) >= minimal_amount)
SELECT count(*) AS COUNT,
(CASE
WHEN sum(amount)/minimal <=1.5 THEN 'good projects'
ELSE 'great projects'
END) AS tag
FROM nice_proj
GROUP BY minimal;
The query returns nothing but it should produce something similar to:
+-------+----------------+
| count | tag |
+-------+----------------+
| 16 | good projects |
+-------+----------------+
| 7 | great projects |
+-------+----------------+
Please have a look at the tables (they are truncated):
donation
+----+------------+--------------+---------+------------+------------+
| id | project_id | supporter_id | amount | amount_eur | donated |
+----+------------+--------------+---------+------------+------------+
| 1 | 4 | 4 | 928.40 | 807.70 | 2016-09-07 |
+----+------------+--------------+---------+------------+------------+
| 2 | 8 | 18 | 384.38 | 334.41 | 2016-12-16 |
+----+------------+--------------+---------+------------+------------+
| 3 | 6 | 12 | 367.21 | 319.47 | 2016-01-21 |
+----+------------+--------------+---------+------------+------------+
| 4 | 2 | 19 | 108.62 | 94.50 | 2016-12-29 |
+----+------------+--------------+---------+------------+------------+
| 5 | 10 | 20 | 842.58 | 733.05 | 2016-11-30 |
+----+------------+--------------+---------+------------+------------+
| 6 | 4 | 15 | 653.76 | 568.77 | 2016-08-05 |
+----+------------+--------------+---------+------------+------------+
| 7 | 4 | 14 | 746.52 | 649.48 | 2016-08-03 |
+----+------------+--------------+---------+------------+------------+
| 8 | 10 | 3 | 962.36 | 837.25 | 2016-10-30 |
+----+------------+--------------+---------+------------+------------+
| 9 | 1 | 20 | 764.05 | 664.72 | 2016-08-24 |
+----+------------+--------------+---------+------------+------------+
| 10 | 10 | 4 | 1033.42 | 899.08 | 2016-02-26 |
+----+------------+--------------+---------+------------+------------+
| 11 | 5 | 6 | 571.90 | 497.55 | 2016-10-06 |
+----+------------+--------------+---------+------------+------------+
project
+----+------------+-----------+----------------+
| id | category | author_id | minimal_amount |
+----+------------+-----------+----------------+
| 1 | music | 1 | 1677 |
+----+------------+-----------+----------------+
| 2 | music | 5 | 21573 |
+----+------------+-----------+----------------+
| 3 | travelling | 2 | 4952 |
+----+------------+-----------+----------------+
| 4 | travelling | 5 | 3135 |
+----+------------+-----------+----------------+
| 5 | travelling | 2 | 8555 |
+----+------------+-----------+----------------+
| 6 | video | 4 | 6835 |
+----+------------+-----------+----------------+
| 7 | video | 4 | 7978 |
+----+------------+-----------+----------------+
| 8 | games | 1 | 4560 |
+----+------------+-----------+----------------+
| 9 | games | 2 | 4259 |
+----+------------+-----------+----------------+
| 10 | games | 1 | 5253 |
+----+------------+-----------+----------------+

My advice is to aggregate the donations table first, then compare it to the project table.
By doing this the join between donations and project is always 1:1. This in turn means you avoid having to group by "values" (minimal_amount), instead only grouping by "identifiers" (project_id).
WITH
donation_summary AS
(
SELECT
project_id,
SUM(amount) AS total_amount
FROM
donation
GROUP BY
project_id
)
SELECT
CASE WHEN d.total_amount <= p.minimal_amount * 1.5
THEN 'good projects'
ELSE 'great projects'
END
AS tag,
COUNT(*) AS project_count
FROM
donation_summary AS d
INNER JOIN
project AS p
ON p.id = d.project_id
WHERE
d.total_amount >= p.minimal_amount
GROUP BY
tag
That said, I'd normally use the following final query and get two columns rather than two rows...
SELECT
SUM(CASE WHEN d.total_amount <= p.minimal_amount * 1.5 THEN 1 ELSE 0 END) AS good_projects,
SUM(CASE WHEN d.total_amount > p.minimal_amount * 1.5 THEN 1 ELSE 0 END) AS great_projects
FROM
donation_summary AS d
INNER JOIN
project AS p
ON p.id = d.project_id
WHERE
d.total_amount >= p.minimal_amount

You need to remove amount from the grouping, this should return the expected result:
WITH nice_proj AS
(SELECT project_id AS pid,
sum(amount) AS amount,
minimal_amount AS minimal
FROM donation d
INNER JOIN project p ON (d.project_id = p.id)
GROUP BY pid,
minimal
HAVING sum(amount) >= minimal_amount)
SELECT count(*) AS COUNT,
(CASE
WHEN amount/minimal <=1.5 THEN 'good projects'
ELSE 'great projects'
END) AS tag
FROM nice_proj
GROUP BY tag;

Related

Sum with 3 tables to join

I have 3 tables. The link between the first and the second table is REQ_ID and the link between the second and the third table is ENC_ID. There is no direct link between the first and the third table.
INS_RCPT
+----+--------+------+----------+
| ID | REQ_ID | CURR | RCPT_AMT |
+----+--------+------+----------+
| 1 | 1 | USD | 100 |
| 2 | 2 | USD | 200 |
| 3 | 3 | USD | 300 |
+----+--------+------+----------+
ENC_LOG
+----+--------+--------+-------------+
| ID | REQ_ID | ENC_ID | ENC_LOG_AMT |
+----+--------+--------+-------------+
| 1 | 1 | 1 | 20 |
| 2 | 1 | 2 | 50 |
| 3 | 1 | 3 | 30 |
| 4 | 2 | 4 | 20 |
+----+--------+--------+-------------+
ENC_RCPT
+----+--------+--------------+
| ID | ENC_ID | ENC_RCPT_AMT |
+----+--------+--------------+
| 1 | 1 | 10 |
| 2 | 1 | 10 |
| 3 | 2 | 15 |
| 4 | 2 | 25 |
| 5 | 2 | 10 |
| 6 | 3 | 12 |
| 7 | 3 | 18 |
| 8 | 4 | 10 |
+----+--------+--------------+
I would like to have output as follows:
+----+--------+------+----------+-------------+--------------+
| ID | REQ_ID | CURR | RCPT_AMT | ENC_LOG_AMT | ENC_RCPT_AMT |
+----+--------+------+----------+-------------+--------------+
| 1 | 1 | USD | 100 | 100 | 100 |
| 2 | 2 | USD | 200 | 20 | 10 |
| 3 | 3 | USD | 300 | 0 | 0 |
+----+--------+------+----------+-------------+--------------+
I am using SQL Server to write this query. Any help is appreciated.
One approach would be to join the first table to two subqueries which compute the sums separately:
SELECT
ir.ID,
ir.REQ_ID,
ir.CURR,
ir.RCPT_AMT,
el.ENC_LOG_AMT,
er.ENC_RCPT_AMT
FROM INS_RCPT ir
LEFT JOIN
(
SELECT REQ_ID, SUM(ENC_LOG_AMT) AS ENC_LOG_AMT
FROM ENC_LOG
GROUP BY REQ_ID
) el
ON ir.REQ_ID = el.REQ_ID
LEFT JOIN
(
SELECT t1.REQ_ID, SUM(t2.ENC_RCPT_AMT) AS ENC_RCPT_AMT
FROM ENC_LOG t1
INNER JOIN ENC_RCPT t2 ON t1.ENC_ID = t2.ENC_ID
GROUP BY t1.REQ_ID
) er
ON ir.REQ_ID = er.REQ_ID
Demo
Note that your question includes a curve ball. The second subquery needs to return aggregates of the receipt table by REQ_ID, even though this field does not appear in that table. As a result, we actually need to join ENC_LOG to ENC_RCPT in that subquery, and then aggregate by REQ_ID.
You can try the below query. Also change the join from left to inner as per your requirement.
select a.id,a.req_id,a.curr,sum(a.rcpt_amt) rcpt_amt,sum(a.enc_log_amt) enc_log_amt,sum(c.enc_rcpt_amt) enc_rcpt_amt
from
(
select a.id id ,a.req_id req_id ,a.curr curr,sum(rcpt_amt) as rcpt_amt,sum(enc_log_amt) as enc_log_amt
from ins_rcpt a
left join enc_log b
on a.req_id=b.req_id
group by id,req_id,curr
) a
left join enc_rcpt c
on a.enc_id = c.enc_id
group by id,req_id,curr;

Is there an easier way to find the row with a max value?

I have a schema where these two tables exist (among others)
participation
+------+--------+------------------+
| movie| person | role |
+------+--------+------------------+
| 1 | 1 | "Regisseur" |
| 1 | 1 | "Schauspieler" |
| 1 | 2 | "Schauspielerin" |
| 2 | 3 | "Regisseur" |
| 3 | 4 | "Regisseur" |
| 3 | 5 | "Schauspieler" |
| 3 | 6 | "Schauspieler" |
| 4 | 7 | "Schauspielerin" |
| 4 | 8 | "Schauspieler" |
| 5 | 1 | "Schauspieler" |
| 5 | 8 | "Schauspieler" |
| 5 | 14 | "Schauspieler" |
+------+--------+------------------+
movie
+----+------------------------------+------+-----+
| id | title | year | fsk |
+----+------------------------------+------+-----+
| 1 | "Die Bruecke am Fluss" | 1995 | 12 |
| 2 | "101 Dalmatiner" | 1961 | 0 |
| 3 | "Vernetzt - Johnny Mnemonic" | 1995 | 16 |
| 4 | "Waehrend Du schliefst..." | 1995 | 6 |
| 5 | "Casper" | 1995 | 6 |
| 6 | "French Kiss" | 1995 | 6 |
| 7 | "Stadtgespraech" | 1995 | 12 |
| 8 | "Apollo 13" | 1995 | 6 |
| 9 | "Schlafes Bruder" | 1995 | 12 |
| 10 | "Assassins - Die Killer" | 1995 | 16 |
| 11 | "Braveheart" | 1995 | 16 |
| 12 | "Das Netz" | 1995 | 12 |
| 13 | "Free Willy 2" | 1995 | 6 |
+----+------------------------------+------+-----+
I want to get the movie with the highest number of people that participated. I figured out an SQL statement that actually does this, but looks super complicated. It looks like this:
SELECT titel
FROM movie.movie
JOIN (SELECT *
FROM (SELECT Max(count_person) AS max_count_person
FROM (SELECT movie,
Count(person) AS count_person
FROM movie.participation
GROUP BY movie) AS countPersons) AS
maxCountPersons
JOIN (SELECT movie,
Count(person) AS count_person
FROM movie.participation
GROUP BY movie) AS countPersons
ON maxCountPersons.max_count_person =
countPersons.count_person)
AS maxPersonsmovie
ON maxPersonsmovie.movie = movie.id
The main problem is, that I can't find an easier way to select the row with the highest value. If I simply could make a selection on the inner table and pick the row with the highest value on count_person without losing the information about the movie itself, this would look so much simpler. Is there a way to simplify this, or is this really the easiest way to do this?
Here is a way without subqueries:
SELECT m.title
FROM movie.movie m JOIN
movie.participation p
ON m.id = p.movie
GROUP BY m.title
ORDER BY COUNT(*) DESC
FETCH FIRST 1 ROW ONLY;
You can use LIMIT 1 instead of FETCH, if you prefer.
Note: In the event of ties, this only returns one value. That seems consistent with your question.
You can use rank window function to do this.
SELECT title
FROM (SELECT m.title,rank() over(order by count(p.person) desc) as rnk
FROM movie.movie m
LEFT JOIN movie.participation p ON m.id=p.movie
GROUP BY m.title
) t
WHERE rnk=1
SELECT title
FROM movie.movie
WHERE id = (SELECT movie
FROM movie.participation
GROUP BY movie
ORDER BY count(*) DESC
LIMIT 1);

Get the Id of the matched data from other table. No duplicates of ID from both tables

Here is my table A.
| Id | GroupId | StoreId | Amount |
| 1 | 20 | 7 | 15000 |
| 2 | 20 | 7 | 1230 |
| 3 | 20 | 7 | 14230 |
| 4 | 20 | 7 | 9540 |
| 5 | 20 | 7 | 24230 |
| 6 | 20 | 7 | 1230 |
| 7 | 20 | 7 | 1230 |
Here is my table B.
| Id | GroupId | StoreId | Credit |
| 12 | 20 | 7 | 1230 |
| 14 | 20 | 7 | 15000 |
| 15 | 20 | 7 | 14230 |
| 16 | 20 | 7 | 1230 |
| 17 | 20 | 7 | 7004 |
| 18 | 20 | 7 | 65523 |
I want to get this result without getting duplicate Id of both table.
I need to get the Id of table B and A where the Amount = Credit.
| A.ID | B.ID | Amount |
| 1 | 14 | 15000 |
| 2 | 12 | 1230 |
| 3 | 15 | 14230 |
| 4 | null | 9540 |
| 5 | null | 24230 |
| 6 | 16 | 1230 |
| 7 | null | 1230 |
My problem is when I have 2 or more same Amount in table A, I get duplicate ID of table B. which should be null. Please help me. Thank you.
I think you want a left join. But this is tricky because you have duplicate amounts, but you only want one to match. The solution is to use row_number():
select . . .
from (select a.*, row_number() over (partition by amount order by id) as seqnum
from a
) a left join
(select b.*, row_number() over (partition by credit order by id) as seqnum
from b
)b
on a.amount = b.credit and a.seqnum = b.seqnum;
Another approach, I think simplier and shorter :)
select ID [A.ID],
(select top 1 ID from TABLE_B where Credit = A.Amount) [B.ID],
Amount
from TABLE_A [A]

MS Access SQL getting results from different tables and sorting by date

i hope my description will be enough. i tried to remove all non-significant fields.
i have 5 tables (Customer, Invoice, Items, Invoice_Item, Payment):
Customer fields and sample date are:
+----+------+
| ID | Name |
+----+------+
| 1 | John |
| 2 | Mary |
+----+------+
Invoice fields and sample date are:
+----+-----------+----------+------+
| ID | Date | Customer | Tax |
+----+-----------+----------+------+
| 1 | 1.1.2017 | 1 | 0.10 |
| 2 | 2.1.2017 | 2 | 0.10 |
| 3 | 3.1.2017 | 1 | 0.10 |
| 4 | 3.1.2017 | 2 | 0.10 |
| 5 | 8.1.2017 | 1 | 0.10 |
| 6 | 11.1.2017 | 1 | 0.10 |
| 7 | 12.1.2017 | 2 | 0.10 |
| 8 | 13.1.2017 | 1 | 0.10 |
+----+-----------+----------+------+
Item fields and sample data are:
+----+--------+
| ID | Name |
+----+--------+
| 1 | Door |
| 2 | Window |
| 3 | Table |
| 4 | Chair |
+----+--------+
Invoice_Item fields and sample data are:
+------------+---------+--------+------------+
| Invoice_ID | Item_ID | Amount | Unit_Price |
+------------+---------+--------+------------+
| 1 | 1 | 4 | 10 |
| 1 | 2 | 2 | 20 |
| 1 | 3 | 1 | 30 |
| 1 | 4 | 2 | 40 |
| 2 | 1 | 1 | 10 |
| 2 | 3 | 1 | 15 |
| 2 | 4 | 2 | 12 |
| 3 | 3 | 4 | 15 |
| 4 | 1 | 1 | 10 |
| 4 | 2 | 20 | 30 |
| 4 | 3 | 15 | 30 |
| 5 | 1 | 4 | 10 |
| 5 | 2 | 2 | 20 |
| 5 | 3 | 1 | 30 |
| 5 | 4 | 2 | 40 |
| 6 | 1 | 1 | 10 |
| 6 | 3 | 1 | 15 |
| 6 | 4 | 2 | 12 |
| 7 | 3 | 4 | 15 |
| 8 | 1 | 1 | 10 |
| 8 | 2 | 20 | 30 |
| 8 | 3 | 15 | 30 |
+------------+---------+--------+------------+
The reason the price is in this table not in the item table is because it is customer specific price.
Payment fields are:
+----------+--------+-----------+
| Customer | Amount | Date |
+----------+--------+-----------+
| 1 | 40 | 3.1.2017 |
| 2 | 10 | 7.1.2017 |
| 1 | 60 | 10.1.2017 |
+----------+--------+-----------+
so my report should be combine all tables and sort by DATE (either from Invoice or Payment) for a certain customer.
so for e.g. for customer John (1) it should be like:
+------------+----------------+---------+-----------+
| Invoice_ID | Invoice_Amount | Payment | Date |
+------------+----------------+---------+-----------+
| 1 | 171 | - | 1.1.2017 |
| 3 | 54 | - | 3.1.2017 |
| - | - | 40 | 3.1.2017 |
| 5 | 171 | - | 8.1.2017 |
| - | 10 | 60 | 10.1.2017 |
| 6 | 44.1 | - | 11.1.2017 |
| 8 | 954 | - | 13.1.2017 |
+------------+----------------+---------+-----------+
it is sorted by date, Invoice amount is (sum of (Amount* unit price)) * (1-tax)
i started with union but then got lost.
here is my try:
SELECT Inv_ID as Num, SUM(Invoice_Items.II_Price*Invoice_Items.II_Amount) AS Amount, Inv_Date as Created
FROM Invoice INNER JOIN Invoice_Items ON Invoice.Inv_ID = Invoice_Items.II_Inv_ID
UNION ALL
SELECT Null as Num, P_Value as Amount, P_Date as Created
FROM Payments
ORDER BY created ASC
Your help is appreciated!
Thanks
You can generate the report you requested using the following SQL script:
SELECT CustomerID,Invoice_ID,Invoice_Amount,Payment,Date
FROM (
SELECT c.ID AS CustomerID, i.ID AS Invoice_ID, SUM((t.Amount * t.UnitPrice)*(1-i.tax)) AS Invoice_Amount, NULL AS Payment,i.Date
FROM (Customer c
LEFT JOIN Invoice i
ON c.ID = i.Customer)
LEFT JOIN Invoice_Item t
ON i.ID = t.Invoice_ID
GROUP BY c.ID, i.ID,i.Date
UNION
SELECT c.ID AS CustomerID,NULL AS Invoice_ID, NULL AS Invoice_Amount, p.Amount AS Payment, p.Date
FROM Customer c
INNER JOIN Payment p
ON c.ID = p.Customer ) a
ORDER BY CustomerID, Date, Payment ASC
Note: I've added CustomerID to the output so you know what customer the data corresponds to.
here is the Answer which worked for me, a bit corrected from #Catzeye Answer , which didnt show the second part of the Union.
SELECT c.ID AS CustomerID,NULL AS Invoice_ID, NULL AS Invoice_Amount, p.Amount AS Payment, p.Date
FROM Customer c
INNER JOIN Payment p
ON c.ID = p.Customer
UNION ALL
SELECT c.ID AS CustomerID, i.ID AS Invoice_ID, SUM((t.Amount * t.Unit_Price)*(1-i.tax)) AS Invoice_Amount, NULL AS Payment,i.Date
FROM (Customer c
INNER JOIN Invoice i
ON c.ID = i.Customer)
INNER JOIN Invoice_Item t
ON i.ID = t.Invoice_ID
GROUP BY c.ID, i.ID,i.Date
ORDER BY CustomerID, Date, Payment;

Sum data from two tables with different number of rows

There are 3 Tables (SorMaster, SorDetail, and InvWarehouse):
SorMaster:
+------------+
| SalesOrder |
+------------+
| 100 |
| 101 |
| 102 |
+------------+
SorDetail:
+------------+------------+---------------+
| SalesOrder | MStockCode | MBackOrderQty |
+------------+------------+---------------+
| 100 | PN-1 | 4 |
| 100 | PN-2 | 9 |
| 100 | PN-3 | 1 |
| 100 | PN-4 | 6 |
| 101 | PN-1 | 6 |
| 101 | PN-3 | 2 |
| 102 | PN-2 | 19 |
| 102 | PN-3 | 14 |
| 102 | PN-4 | 6 |
| 102 | PN-5 | 4 |
+------------+------------+---------------+
InvWarehouse:
+------------+-----------+-----------+
| MStockCode | Warehouse | QtyOnHand |
+------------+-----------+-----------+
| PN-1 | A | 1 |
| PN-2 | B | 9 |
| PN-3 | A | 0 |
| PN-4 | B | 1 |
| PN-1 | A | 0 |
| PN-3 | B | 5 |
| PN-2 | A | 9 |
| PN-3 | B | 4 |
| PN-4 | A | 6 |
| PN-5 | B | 0 |
+------------+-----------+-----------+
Desired Results:
+------------+-----------------+--------------+
| MStockCode | SumBackOrderQty | SumQtyOnHand |
+------------+-----------------+--------------+
| PN-1 | 10 | 10 |
| PN-2 | 28 | 1 |
| PN-3 | 17 | 5 |
| PN-4 | 12 | 13 |
| PN-5 | 11 | 6 |
+------------+-----------------+--------------+
I have been going around in circles with no end in sight. Seems like it should be simple but just can't wrap my head around it. The SumBackOrderQty obviously getting counted twice as the SumQtyOnHand is evaluated. To this point I have been doing the calculations in the PHP instead of the select statement but would like to clean things up a bit where possible.
Current query statement is:
SELECT SorDetail.MStockCode,
SUM(SorDetail.MBackOrderQty) AS 'SumMBackOrderQty',
SUM(InvWarehouse.QtyOnHand) AS 'SumQtyOnHand'
FROM SysproCompanyJ.dbo.SorMaster SorMaster,
SysproCompanyJ.dbo.SorDetail SorDetail LEFT OUTER JOIN SysproCompanyJ.dbo.InvWarehouse InvWarehouse
ON SorDetail.MStockCode = InvWarehouse.StockCode
WHERE SorMaster.SalesOrder = SorDetail.SalesOrder
AND SorMaster.ActiveFlag != 'N'
AND SorDetail.MBackOrderQty > '0'
AND SorDetail.MPrice > '0'
GROUP BY SorDetail.MStockCode
ORDER BY SorDetail.MStockCode ASC
Without providing the complete picture, in terms of your RDBMS, database schema, a description of the problem you're trying to solve and sample data that matches the aforementioned, the following is just an illustration of what a solution based on Barmar's comment could look like:
SELECT SD.MStockCode,
SD.SumBackOrderQty,
IW.SumQtyOnHand
FROM (SELECT MStockCode,
SUM(MBackOrderQty) AS `SumBackOrderQty`
FROM SorDetail
JOIN SorMaster ON SorDetail.SalesOrder=SorMaster.SalesOrder
WHERE SorMaster.ActiveFlag != 'N'
AND SorDetail.MBackOrderQty > 0
AND SorDetail.MPrice > 0
GROUP BY MStockCode) AS SD
LEFT JOIN (SELECT MStockCode,
SUM(QtyOnHand) AS `SumQtyOnHand`
FROM InvWarehouse
GROUP BY MStockCode) AS IW ON SD.MStockCode=IW.MStockCode
ORDER BY SD.MStockCode;
Here's one approach:
select MStockCode,
(select sum(MBackOrderQty) from sorDetail as T2
where T2.MStockCode = T1.MStockCode ) as SumBackOrderQty,
(select sum(QtyOnHand) from invWarehouse as T3
where T3.MStockCode = T1.MStockCode ) as SumQtyOnHand
from
(
select mstockcode from sorDetail
union
select mstockcode from invWarehouse
) as T1
In a fiddle here: http://sqlfiddle.com/#!9/fdaca/6
Though my SumQtyOnHand values don't match yours (as #Gordon pointed out).