SQL duplicating totals with GROUP BY

I'm trying to compare income/outgoings using a simple query, but for some reason, I'm getting duplicated data. This is the query I'm running:
SELECT
    Event.Name as "Event",
    Concat("£", round(sum(Ticket.Price), 2)) as "Ticket Sales",
    sum(Invoice.NetTotal) as "Invoice Costs",
    Concat("£", round(sum(Ticket.Price), 2) - round(sum(Invoice.NetTotal), 2)) as "Total Loss"
FROM Ticket
JOIN Event ON Ticket.EventID = Event.EventID
JOIN Invoice ON Event.EventID = Invoice.EventID
GROUP BY Event.EventID;
This is the result I'm getting:
+--------------------------+--------------+---------------+------------+
| Event                    | Ticket Sales | Invoice Costs | Total Loss |
+--------------------------+--------------+---------------+------------+
| Victorious Festival 2018 | £47.94       |          1800 | £-1752.06  |
+--------------------------+--------------+---------------+------------+
Despite there being only 2 items in the Invoice table, totaling £600,
and 3 relevant items in the Ticket table, totaling £23.97 (3 × £7.99):
+-----------+--------+---------+---------------+-------------+----------+------+
| InvoiceNo | ItemID | EventID | HireStartDate | HireEndDate | NetTotal | VAT  |
+-----------+--------+---------+---------------+-------------+----------+------+
|         1 |      1 |       1 | 2018-05-05    | 2018-05-06  |      500 |   20 |
|         2 |      2 |       1 | 2018-05-05    | 2018-05-06  |      100 |   20 |
+-----------+--------+---------+---------------+-------------+----------+------+
+----------+---------+-------+------------+------------+----------+
| TicketNo | EventID | Price | ValidFrom  | ValidTo    | Class    |
+----------+---------+-------+------------+------------+----------+
|        1 |       1 |  7.99 | 2018-05-05 | 2018-05-22 | Standard |
|        2 |       1 |  7.99 | 2018-05-05 | 2018-05-22 | Standard |
|        3 |       2 |    10 | 2018-04-28 | 2018-04-28 | Standard |
|        4 |       2 |    10 | 2018-04-28 | 2018-04-28 | Standard |
|        5 |       2 |    10 | 2018-04-28 | 2018-04-28 | Standard |
|        6 |       2 |    10 | 2018-04-28 | 2018-04-28 | Standard |
|        7 |       2 |    10 | 2018-04-28 | 2018-04-28 | Standard |
|        8 |       2 |    10 | 2018-04-28 | 2018-04-28 | Standard |
|        9 |       1 |  7.99 | 2018-05-05 | 2018-05-22 | Standard |
+----------+---------+-------+------------+------------+----------+

You have two different, independent dimensions. Joining Ticket and Invoice through Event pairs every ticket with every invoice for the same event, so each ticket total is counted once per invoice (×2 here, giving £47.94) and each invoice total once per ticket (×3, giving 1800). The best solution is to aggregate before joining:
SELECT e.Name as "Event",
       Concat("£", round(sum(t.Price), 2)) as "Ticket Sales",
       sum(i.NetTotal) as "Invoice Costs",
       Concat("£", round(sum(t.Price), 2) - round(sum(i.NetTotal), 2)) as "Total Loss"
FROM Event e JOIN
     (SELECT t.EventId, SUM(t.Price) as Price
      FROM Ticket t
      GROUP BY t.EventId
     ) t
     ON t.EventID = e.EventID JOIN
     (SELECT i.EventId, SUM(i.NetTotal) as NetTotal
      FROM Invoice i
      GROUP BY i.EventId
     ) i
     ON e.EventID = i.EventID
GROUP BY e.EventID;
Two comments. First, I don't really like grouping on EventId when it is not in the SELECT (I would prefer the event name there). But assuming EventId is the primary key of Event, this structure is fine: the id uniquely identifies each row in Event, so the name is well-defined.
Second, you might want to make the joins left joins, so you are including all events, even those that might be missing tickets or invoices.
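For example, a minimal sketch of that LEFT JOIN variant (untested; COALESCE is added here so an event with no tickets or no invoices shows 0 instead of NULL, and the outer GROUP BY can be dropped because each derived table already has one row per event):
SELECT e.Name as "Event",
       Concat("£", round(COALESCE(t.Price, 0), 2)) as "Ticket Sales",
       COALESCE(i.NetTotal, 0) as "Invoice Costs",
       Concat("£", round(COALESCE(t.Price, 0) - COALESCE(i.NetTotal, 0), 2)) as "Total Loss"
FROM Event e LEFT JOIN
     (SELECT EventID, SUM(Price) as Price
      FROM Ticket
      GROUP BY EventID
     ) t
     ON t.EventID = e.EventID LEFT JOIN
     (SELECT EventID, SUM(NetTotal) as NetTotal
      FROM Invoice
      GROUP BY EventID
     ) i
     ON i.EventID = e.EventID;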

Related

SQL some selections into one (or get two columns from one)

I'm using PostgreSQL and I have two tables (for example).
table1 contains store purchases; there are two store types, 'candy' and 'dental'. Each row records a customer's purchase at a particular store, and table2 maps each store to its type.
As a result I want, for each customer id, the money spent per store type together with the last date of purchase for that type. Money from candy stores should be summed from 2016 onward, but money from dental stores only from 2018 onward.
table1:
+----+---------+------------------+-------+
| id | store   | date of purchase | money |
+----+---------+------------------+-------+
|  1 | store 1 | 2016-01-01       |    10 |
|  1 | store 5 | 2018-01-01       |    50 |
|  2 | store 2 | 2017-01-20       |    10 |
|  2 | store 3 | 2019-02-20       |    15 |
|  3 | store 2 | 2017-02-02       |    20 |
|  3 | store 6 | 2019-01-01       |    60 |
|  1 | store 1 | 2015-01-01       |    20 |
+----+---------+------------------+-------+
table2:
+---------+--------+
| store   | type   |
+---------+--------+
| store 1 | candy  |
| store 2 | candy  |
| store 3 | candy  |
| store 4 | dental |
| store 5 | dental |
| store 6 | dental |
+---------+--------+
I want my query to return a table like this:
+----+---------------+-------------------+----------------+--------------------+
| id | money (candy) | last date (candy) | money (dental) | last date (dental) |
+----+---------------+-------------------+----------------+--------------------+
|  1 |            10 | 2016-01-01        |             50 | 2018-01-01         |
|  2 |            25 | 2019-02-20        |              - | -                  |
|  3 |            20 | 2017-02-02        |             60 | 2019-01-01         |
+----+---------------+-------------------+----------------+--------------------+
If I understand correctly, this is what you want to do (assuming the purchase-date column is named purchasedate; the filter conditions also apply the 2016/2018 start dates):
select t.id
     , sum(t.money)        filter (where st.type = 'candy'  and t.purchasedate >= date '2016-01-01') as candy_money
     , max(t.purchasedate) filter (where st.type = 'candy'  and t.purchasedate >= date '2016-01-01') as candy_last_date
     , sum(t.money)        filter (where st.type = 'dental' and t.purchasedate >= date '2018-01-01') as dental_money
     , max(t.purchasedate) filter (where st.type = 'dental' and t.purchasedate >= date '2018-01-01') as dental_last_date
from table1 t
join table2 st on t.store = st.store
group by t.id;
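As an aside, the aggregate FILTER clause is in the SQL standard but only a few databases implement it (PostgreSQL 9.4+, SQLite). A portable sketch of the same idea using conditional aggregation (the CASE returns NULL for non-matching rows, which sum and max ignore):
select t.id
     , sum(case when st.type = 'candy'  and t.purchasedate >= date '2016-01-01' then t.money        end) as candy_money
     , max(case when st.type = 'candy'  and t.purchasedate >= date '2016-01-01' then t.purchasedate end) as candy_last_date
     , sum(case when st.type = 'dental' and t.purchasedate >= date '2018-01-01' then t.money        end) as dental_money
     , max(case when st.type = 'dental' and t.purchasedate >= date '2018-01-01' then t.purchasedate end) as dental_last_date
from table1 t
join table2 st on t.store = st.store
group by t.id;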

Using CTE to count number of rows in inner query

I'm learning CTEs and I've encountered an exercise I cannot solve. It is not homework, but an exercise from an online course I took to learn SQL. I'm interested in where I've made a mistake, with some explanation, so answering with only the correct code will not help me learn CTEs.
The task is to count projects that raised 100% to 150% of the minimum amount, and those that raised more than 150%.
I've written the following CTE:
WITH nice_proj AS
(SELECT project_id AS pid,
amount AS amount,
minimal_amount AS minimal
FROM donation d
INNER JOIN project p ON (d.project_id = p.id)
GROUP BY pid,
minimal,
amount
HAVING sum(amount) >= minimal_amount)
SELECT count(*) AS COUNT,
(CASE
WHEN sum(amount)/minimal <=1.5 THEN 'good projects'
ELSE 'great projects'
END) AS tag
FROM nice_proj
GROUP BY minimal;
The query returns nothing but it should produce something similar to:
+-------+----------------+
| count | tag            |
+-------+----------------+
|    16 | good projects  |
|     7 | great projects |
+-------+----------------+
Please have a look at the tables (they are truncated):
donation
+----+------------+--------------+---------+------------+------------+
| id | project_id | supporter_id | amount  | amount_eur | donated    |
+----+------------+--------------+---------+------------+------------+
|  1 |          4 |            4 |  928.40 |     807.70 | 2016-09-07 |
|  2 |          8 |           18 |  384.38 |     334.41 | 2016-12-16 |
|  3 |          6 |           12 |  367.21 |     319.47 | 2016-01-21 |
|  4 |          2 |           19 |  108.62 |      94.50 | 2016-12-29 |
|  5 |         10 |           20 |  842.58 |     733.05 | 2016-11-30 |
|  6 |          4 |           15 |  653.76 |     568.77 | 2016-08-05 |
|  7 |          4 |           14 |  746.52 |     649.48 | 2016-08-03 |
|  8 |         10 |            3 |  962.36 |     837.25 | 2016-10-30 |
|  9 |          1 |           20 |  764.05 |     664.72 | 2016-08-24 |
| 10 |         10 |            4 | 1033.42 |     899.08 | 2016-02-26 |
| 11 |          5 |            6 |  571.90 |     497.55 | 2016-10-06 |
+----+------------+--------------+---------+------------+------------+
project
+----+------------+-----------+----------------+
| id | category   | author_id | minimal_amount |
+----+------------+-----------+----------------+
|  1 | music      |         1 |           1677 |
|  2 | music      |         5 |          21573 |
|  3 | travelling |         2 |           4952 |
|  4 | travelling |         5 |           3135 |
|  5 | travelling |         2 |           8555 |
|  6 | video      |         4 |           6835 |
|  7 | video      |         4 |           7978 |
|  8 | games      |         1 |           4560 |
|  9 | games      |         2 |           4259 |
| 10 | games      |         1 |           5253 |
+----+------------+-----------+----------------+
My advice is to aggregate the donation table first, then join it to the project table.
That way the join between donations and projects is always 1:1, which in turn means you avoid grouping by "values" (minimal_amount) and instead group only by "identifiers" (project_id).
WITH
donation_summary AS
(
SELECT
project_id,
SUM(amount) AS total_amount
FROM
donation
GROUP BY
project_id
)
SELECT
CASE WHEN d.total_amount <= p.minimal_amount * 1.5
THEN 'good projects'
ELSE 'great projects'
END
AS tag,
COUNT(*) AS project_count
FROM
donation_summary AS d
INNER JOIN
project AS p
ON p.id = d.project_id
WHERE
d.total_amount >= p.minimal_amount
GROUP BY
tag
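One caveat: GROUP BY tag relies on the database resolving SELECT aliases in the GROUP BY clause, which PostgreSQL and MySQL allow but standard SQL does not. On stricter engines, repeat the expression instead:
GROUP BY
    CASE WHEN d.total_amount <= p.minimal_amount * 1.5
         THEN 'good projects'
         ELSE 'great projects'
    END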
That said, I'd normally use the following final query and get two columns rather than two rows...
SELECT
SUM(CASE WHEN d.total_amount <= p.minimal_amount * 1.5 THEN 1 ELSE 0 END) AS good_projects,
SUM(CASE WHEN d.total_amount > p.minimal_amount * 1.5 THEN 1 ELSE 0 END) AS great_projects
FROM
donation_summary AS d
INNER JOIN
project AS p
ON p.id = d.project_id
WHERE
d.total_amount >= p.minimal_amount
You need to remove amount from the grouping. Because amount is a grouping column, each group in your CTE covers a single donation amount for a project rather than the project's whole donation total, so sum(amount) is just that one value and the HAVING clause filters every group out; that is why the query returns nothing. Summing amount and grouping only by pid and minimal should return the expected result:
WITH nice_proj AS (
    SELECT project_id AS pid,
           sum(amount) AS amount,
           minimal_amount AS minimal
    FROM donation d
    INNER JOIN project p ON (d.project_id = p.id)
    GROUP BY pid, minimal
    HAVING sum(amount) >= minimal_amount
)
SELECT count(*) AS count,
       (CASE
            WHEN amount / minimal <= 1.5 THEN 'good projects'
            ELSE 'great projects'
        END) AS tag
FROM nice_proj
GROUP BY tag;

Return Max Value Date for each group in Netezza SQL

+--------+---------+------+------------+------------+
| CASEID | USER_ID | TYPE | OPEN_DT    | CLOSED_DT  |
+--------+---------+------+------------+------------+
|      1 |    1000 | MA   | 2017-01-01 | 2017-01-07 |
|      2 |    1000 | MB   | 2017-07-15 | 2017-07-22 |
|      3 |    1000 | MA   | 2018-02-20 | NULL       |
|      8 |    1001 | MB   | 2017-05-18 | 2018-02-18 |
|      9 |    1001 | MA   | 2018-03-05 | 2018-04-01 |
|      7 |    1002 | MA   | 2018-06-01 | 2018-07-01 |
+--------+---------+------+------------+------------+
This is a snippet of my data set. I need a query that returns just the max(OPEN_DT) row for each USER_ID in Netezza SQL.
So, given the above, the results would be:
+--------+---------+------+------------+------------+
| CASEID | USER_ID | TYPE | OPEN_DT    | CLOSED_DT  |
+--------+---------+------+------------+------------+
|      3 |    1000 | MA   | 2018-02-20 | NULL       |
|      9 |    1001 | MA   | 2018-03-05 | 2018-04-01 |
|      7 |    1002 | MA   | 2018-06-01 | 2018-07-01 |
+--------+---------+------+------------+------------+
Any help is very much appreciated!
You can use a correlated subquery:
select t.*
from your_table t
where t.open_dt = (select max(t1.open_dt) from your_table t1 where t1.user_id = t.user_id);
You can also use row_number():
select t.*
from (select t.*, row_number() over (partition by user_id order by open_dt desc) as seq
      from your_table t
     ) t
where seq = 1;
However, if there are ties on open_dt, the correlated subquery will return every tied row, and row_number() will pick one of them arbitrarily; add an explicit tie-breaker (or a LIMIT with the correlated subquery) if you need a deterministic single row.
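For instance, a sketch of the row_number() version with a tie-breaker (assuming CASEID is unique, so the highest CASEID wins among rows sharing the max open_dt):
select t.*
from (select t.*,
             row_number() over (partition by user_id
                                order by open_dt desc, caseid desc) as seq
      from your_table t
     ) t
where seq = 1;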

SQL window excluding current group?

I'm trying to produce rolled-up summaries of the following data, both for the group in question and for everything excluding that group. I think this can be done with a window function, but I'm having trouble getting the syntax down (in my case, Hive SQL).
I want the following data to be aggregated:
+------------+---------+--------+
| date       | product | rating |
+------------+---------+--------+
| 2018-01-01 | A       |      1 |
| 2018-01-02 | A       |      3 |
| 2018-01-20 | A       |      4 |
| 2018-01-27 | A       |      5 |
| 2018-01-29 | A       |      4 |
| 2018-02-01 | A       |      5 |
| 2017-01-09 | B       |   NULL |
| 2017-01-12 | B       |      3 |
| 2017-01-15 | B       |      4 |
| 2017-01-28 | B       |      4 |
| 2017-07-21 | B       |      2 |
| 2017-09-21 | B       |      5 |
| 2017-09-13 | C       |      3 |
| 2017-09-14 | C       |      4 |
| 2017-09-15 | C       |      5 |
| 2017-09-16 | C       |      5 |
| 2018-04-01 | C       |      2 |
| 2018-01-13 | D       |      1 |
| 2018-01-14 | D       |      2 |
| 2018-01-24 | D       |      3 |
| 2018-01-31 | D       |      4 |
+------------+---------+--------+
Aggregated results:
+------+-------+---------+----+------------+------------------+----------+
| year | month | product | ct | avg_rating | avg_rating_other | other_ct |
+------+-------+---------+----+------------+------------------+----------+
| 2018 |     1 | A       |  5 |        3.4 |              2.5 |        4 |
| 2018 |     2 | A       |  1 |          5 |             NULL |        0 |
| 2017 |     1 | B       |  4 |  3.6666667 |             NULL |        0 |
| 2017 |     7 | B       |  1 |          2 |             NULL |        0 |
| 2017 |     9 | B       |  1 |          5 |             4.25 |        4 |
| 2017 |     9 | C       |  4 |       4.25 |                5 |        1 |
| 2018 |     4 | C       |  1 |          2 |             NULL |        0 |
| 2018 |     1 | D       |  4 |        2.5 |              3.4 |        5 |
+------+-------+---------+----+------------+------------------+----------+
I've also considered producing two aggregates, one including the product in question and one excluding it, but I'm having trouble creating the appropriate joining key.
You can do:
select year(date), month(date), product,
count(*) as ct, avg(rating) as avg_rating,
sum(count(*)) over (partition by year(date), month(date)) - count(*) as ct_other,
((sum(sum(rating)) over (partition by year(date), month(date)) - sum(rating)) /
(sum(count(*)) over (partition by year(date), month(date)) - count(*))
) as avg_other
from t
group by year(date), month(date), product;
The rating for the "other" group is a bit tricky. You need to add everything up over the whole month, subtract out the current group, and compute the average as that sum divided by that count. One caveat: count(*) also counts rows whose rating is NULL (such as B's 2017-01-09 row), while sum(rating) skips them, so NULL ratings in the "other" groups would deflate avg_other.
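A sketch of that adjustment (untested; identical to the query above except the denominator of the "other" average uses count(rating), which counts only non-NULL ratings; Hive returns NULL on division by zero, which covers months with no other products):
select year(date), month(date), product,
       count(*) as ct, avg(rating) as avg_rating,
       sum(count(*)) over (partition by year(date), month(date)) - count(*) as other_ct,
       ((sum(sum(rating)) over (partition by year(date), month(date)) - sum(rating)) /
        (sum(count(rating)) over (partition by year(date), month(date)) - count(rating))
       ) as avg_rating_other
from t
group by year(date), month(date), product;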

How do you join records one to one with multiple possible matches ...?

I have a table of transactions like the following
| ID | Trans Type | Date       | Qty | Total | Item Number | Work Order |
-------------------------------------------------------------------------
|  1 | Issue      | 11/27/2012 |   3 |  3.50 | NULL        |         10 |
|  2 | Issue      | 11/27/2012 |   3 |  3.50 | NULL        |         11 |
|  3 | Issue      | 11/25/2012 |   1 |  1.25 | NULL        |         12 |
|  4 | ID Issue   | 11/27/2012 |  -3 | -3.50 | 100         |       NULL |
|  5 | ID Issue   | 11/27/2012 |  -3 | -3.50 | 102         |       NULL |
|  6 | ID Issue   | 11/25/2012 |  -1 | -1.25 | 104         |       NULL |
These transactions are duplicates of one another, except that the 'Issue' rows carry a work order ID while the 'ID Issue' rows carry the item number. I would like to update the [Item Number] field of the 'Issue' transactions to include the item number. When I do a join on the Date, Qty, and Total I get something like this:
| ID | Trans Type | Date       | Qty | Total | Item Number | Work Order |
-------------------------------------------------------------------------
|  1 | Issue      | 11/27/2012 |   3 |  3.50 | 100         |         10 |
|  1 | Issue      | 11/27/2012 |   3 |  3.50 | 102         |         10 |
|  2 | Issue      | 11/27/2012 |   3 |  3.50 | 100         |         11 |
|  2 | Issue      | 11/27/2012 |   3 |  3.50 | 102         |         11 |
|  3 | Issue      | 11/25/2012 |   1 |  1.25 | 104         |         12 |
The duplicates are multiplied! I would like this:
| ID | Trans Type | Date       | Qty | Total | Item Number | Work Order |
-------------------------------------------------------------------------
|  1 | Issue      | 11/27/2012 |   3 |  3.50 | 100         |         10 |
|  2 | Issue      | 11/27/2012 |   3 |  3.50 | 102         |         11 |
|  3 | Issue      | 11/25/2012 |   1 |  1.25 | 104         |         12 |
Or this (the Item Numbers are swapped between the two matches):
| ID | Trans Type | Date       | Qty | Total | Item Number | Work Order |
-------------------------------------------------------------------------
|  1 | Issue      | 11/27/2012 |   3 |  3.50 | 102         |         10 |
|  2 | Issue      | 11/27/2012 |   3 |  3.50 | 100         |         11 |
|  3 | Issue      | 11/25/2012 |   1 |  1.25 | 104         |         12 |
Either would be fine. What would be a simple solution?
Use SELECT DISTINCT to filter identical results out, or partition your results and take the first item in each grouping.
UPDATE
Here's the code to illustrate the partition approach.
SELECT ID, [Trans Type], [Date], [Qty], [Total], [Item Number], [Work Order]
FROM (
    SELECT ID, [Trans Type], [Date], [Qty], [Total], [Item Number], [Work Order],
           ROW_NUMBER() OVER (PARTITION BY ID, [Trans Type], [Date], [Qty], [Total]
                              ORDER BY [Item Number]) AS ItemRank
    FROM YourTable
) AS SubQuery
WHERE ItemRank = 1
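Note that ItemRank = 1 assigns the same (lowest) Item Number to every Issue sharing a Date/Qty/Total, so it de-duplicates but does not give the one-to-one pairing shown in the desired output. A sketch of one way to get a true 1:1 match (untested; assumes SQL Server-style syntax, a table named YourTable, and that ID Issue rows carry negated Qty/Total as in the sample) is to rank each side within its match key and join on equal rank:
WITH issues AS (
    SELECT ID, [Date], [Qty], [Total], [Work Order],
           ROW_NUMBER() OVER (PARTITION BY [Date], [Qty], [Total] ORDER BY ID) AS rn
    FROM YourTable
    WHERE [Trans Type] = 'Issue'
),
id_issues AS (
    SELECT [Item Number], [Date], [Qty], [Total],
           ROW_NUMBER() OVER (PARTITION BY [Date], [Qty], [Total] ORDER BY ID) AS rn
    FROM YourTable
    WHERE [Trans Type] = 'ID Issue'
)
SELECT i.ID, i.[Date], i.[Qty], i.[Total], d.[Item Number], i.[Work Order]
FROM issues i
JOIN id_issues d
  ON d.[Date]  = i.[Date]
 AND d.[Qty]   = -i.[Qty]
 AND d.[Total] = -i.[Total]
 AND d.rn      = i.rn;
The same pairing can drive an UPDATE ... FROM to write the matched [Item Number] back onto the Issue rows.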