SQL | Aggregation

My objective is to get a table such as this:
Request:
Wards where the annual cost of drugs prescribed exceeds 25
ward_no | year | cost
w2 | 2007 | 34
w4 | 2007 | 160
w5 | 2006 | 26
w5 | 2007 | 33
I would input a picture but lack reputation points.
Here is what I have done so far:
select w.ward_no,
year(pn.start_date) as year,
pn.quantity * d.price as cost
from ward w,
patient pt,
prescription pn,
drug d
where w.ward_no = pt.ward_no
and
pn.drug_code = d.drug_code
and
pn.patient_id = pt.patient_id
group by w.ward_no,
year,
cost
having cost > 25
order by w.ward_no, year
My current output is such:
ward_no|year|cost
'w2' |2006|0.28
'w2' |2007|3.20
'w2' |2007|9.50
'w2' |2007|21.60
'w3' |2006|10.08
'w3' |2007|4.80
'w4' |2006|4.41
'w4' |2007|101.00
'w4' |2007|58.80
'w5' |2006|25.20
'w5' |2006|0.56
'w5' |2007|20.16
'w5' |2007|12.60
How would I reduce the output to a single row per ward_no and year (say 2006 or 2007) instead of several rows for each?
Any help would be very much appreciated. I am really stuck and have no clue what to do.

You need to group by ward and year, and sum up the costs:
select w.ward_no,
year(pn.start_date) as year,
sum(pn.quantity * d.price) as cost
from ward w
inner join patient pt
on pt.ward_no = w.ward_no
inner join prescription pn
on pn.patient_id = pt.patient_id
inner join drug d
on d.drug_code = pn.drug_code
group by w.ward_no,
year(pn.start_date)
having sum(pn.quantity * d.price) > 25
order by w.ward_no, year
What flavor of SQL is this supposed to be for?
Share and enjoy.
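The grouping described above can be tried out on the per-prescription costs from the question's current output. This is only a minimal sketch using Python's sqlite3 module: the flat table prescription_cost is a hypothetical stand-in for the four joined tables, and the year is stored directly because SQLite has no year() function.

```python
import sqlite3

# Per-prescription costs copied from the question's current output.
# The flat table "prescription_cost" is hypothetical; it stands in for
# the result of joining ward, patient, prescription and drug.
ROWS = [
    ("w2", 2006, 0.28), ("w2", 2007, 3.20), ("w2", 2007, 9.50), ("w2", 2007, 21.60),
    ("w3", 2006, 10.08), ("w3", 2007, 4.80),
    ("w4", 2006, 4.41), ("w4", 2007, 101.00), ("w4", 2007, 58.80),
    ("w5", 2006, 25.20), ("w5", 2006, 0.56), ("w5", 2007, 20.16), ("w5", 2007, 12.60),
]

def wards_over_threshold(threshold=25):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prescription_cost (ward_no TEXT, year INTEGER, cost REAL)")
    conn.executemany("INSERT INTO prescription_cost VALUES (?, ?, ?)", ROWS)
    result = conn.execute(
        """
        SELECT ward_no, year, ROUND(SUM(cost), 2) AS cost
        FROM prescription_cost
        GROUP BY ward_no, year      -- one row per ward and year
        HAVING SUM(cost) > ?        -- filter on the aggregated cost
        ORDER BY ward_no, year
        """,
        (threshold,),
    ).fetchall()
    conn.close()
    return result

annual = wards_over_threshold()
for row in annual:
    print(row)
```

The four ward/year pairs returned are the same ones listed in the requested table; the costs differ only by rounding.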

You are grouping by cost, so each distinct cost gets its own row. Remove cost from the grouping and sum it instead:
select w.ward_no,
year(pn.start_date) as year,
sum(pn.quantity * d.price) as cost
from ward w,
patient pt,
prescription pn,
drug d
where w.ward_no = pt.ward_no
and
pn.drug_code = d.drug_code
and
pn.patient_id = pt.patient_id
group by w.ward_no,
year
having sum(pn.quantity * d.price) > 25
order by w.ward_no, year

Related

SQL: How to return revenue for specific year

I would like to show the revenue for a specific year for all customers, regardless of whether they have revenue data for that year. (Where a customer has no data for the year, a filler like 'no data' would work.)
Sample Data looks like:
Table 1
Customer | Price | Quantity | Order Date
xxx | 12 | 5 | 1990/03/25
yyy | 15 | 7 | 1991/05/35
xxx | 34 | 2 | 1990/08/21
Desired Output would look a little something like this:
Customer | Revenue (for 1990)
xxx | 128
yyy | no data
Getting the total revenue for each would be:
SELECT Customer,
SUM(quantity*price) AS Revenue
But how would I go about listing it for a specific year for all customers (including customers that don't have data for that year)?

We can use a CTE or a sub-query to create a list of all customers and another to get all years, then cross join them and left join the result onto revenue.
This gives a row for each customer for each year. If you add a where clause filtering on y, you will get only the requested year.
CREATE TABLE revenue(
Customer varchar(10),
Price int,
Quantity int,
OrderDate date);
insert into revenue values
('xxx', 12,5,'2021-03-25'),
('yyy', 15,7,'2021-05-15'),
('xxx', 34,2,'2022-08-21');
with cust as
(select distinct customer c from revenue),
years as
(select distinct year(OrderDate) y from revenue)
select
y "year",
c customer ,
sum(price*quantity) revenue
from years
cross join cust
left join revenue r
on cust.c = r.customer and years.y = year(OrderDate)
group by
c,y,
year(OrderDate)
order by y,c
year | customer | revenue
---: | :------- | ------:
2021 | xxx | 60
2021 | yyy | 105
2022 | xxx | 68
2022 | yyy | null
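For readers without a database to hand, the same cross join plus left join can be run through Python's sqlite3 module. This is a sketch under a few assumptions: SQLite has no year() function, so strftime('%Y', ...) is swapped in, and the optional year parameter is a hypothetical addition that shows the where filter on y.

```python
import sqlite3

def revenue_by_customer_and_year(year=None):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revenue (Customer TEXT, Price INTEGER, Quantity INTEGER, OrderDate TEXT)"
    )
    # Same sample rows as the answer above.
    conn.executemany("INSERT INTO revenue VALUES (?, ?, ?, ?)", [
        ("xxx", 12, 5, "2021-03-25"),
        ("yyy", 15, 7, "2021-05-15"),
        ("xxx", 34, 2, "2022-08-21"),
    ])
    sql = """
        WITH cust AS (SELECT DISTINCT Customer AS c FROM revenue),
             years AS (SELECT DISTINCT CAST(strftime('%Y', OrderDate) AS INTEGER) AS y
                       FROM revenue)
        SELECT years.y, cust.c, SUM(r.Price * r.Quantity) AS revenue
        FROM years
        CROSS JOIN cust                       -- every customer for every year
        LEFT JOIN revenue r                   -- keep pairs that have no orders
               ON r.Customer = cust.c
              AND CAST(strftime('%Y', r.OrderDate) AS INTEGER) = years.y
        WHERE (:year IS NULL OR years.y = :year)
        GROUP BY years.y, cust.c
        ORDER BY years.y, cust.c
    """
    rows = conn.execute(sql, {"year": year}).fetchall()
    conn.close()
    return rows

all_rows = revenue_by_customer_and_year()
only_2022 = revenue_by_customer_and_year(2022)
print(all_rows)
print(only_2022)
```

A customer with no orders in a year comes back with None (SQL NULL) as the revenue, which the caller can replace with a filler such as 'no data'.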

You would simply use group by and do the sum in a subquery, then left join it to your customers table, i.e.:
select customers.Name, totals.Revenue
from Customers
Left join
( select customerId, sum(quantity*price) as revenue
from myTable
where year(orderDate) = 1990
group by customerId) totals on customers.CustomerId = totals.customerId;
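The subquery approach can be checked against the question's sample data with a short sqlite3 sketch. Several things here are assumptions: the Customers table and its ids are hypothetical, strftime replaces year() because SQLite lacks it, and the second order date is adjusted because the 1991/05/35 shown in the question is not a valid date.

```python
import sqlite3

def revenue_for_year(year):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE myTable (customerId INTEGER, price INTEGER,
                              quantity INTEGER, orderDate TEXT);
        INSERT INTO Customers VALUES (1, 'xxx'), (2, 'yyy');
        INSERT INTO myTable VALUES
            (1, 12, 5, '1990-03-25'),
            (2, 15, 7, '1991-05-25'),  -- question shows 1991/05/35; day changed (assumption)
            (1, 34, 2, '1990-08-21');
    """)
    rows = conn.execute("""
        SELECT c.Name, totals.revenue
        FROM Customers c
        LEFT JOIN (
            -- totals for the requested year only, one row per customer
            SELECT customerId, SUM(quantity * price) AS revenue
            FROM myTable
            WHERE CAST(strftime('%Y', orderDate) AS INTEGER) = ?
            GROUP BY customerId
        ) totals ON totals.customerId = c.CustomerId
        ORDER BY c.Name
    """, (year,)).fetchall()
    conn.close()
    # Replace the NULL from the left join with the filler the question asks for.
    return [(name, "no data" if rev is None else rev) for name, rev in rows]

report_1990 = revenue_for_year(1990)
print(report_1990)  # [('xxx', 128), ('yyy', 'no data')]
```

The filler is applied in Python rather than with COALESCE so that the revenue column keeps a single type on the SQL side.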

SQL sum values for each ID

I have a dataset about trains. It includes a table of customer information, in which a number represents an age group, along with the number of travellers in that age group.
The ID represents a location which has multiple departure times, which has multiple age groups.
The data looks something like this
StationID | Time of Departure | TravellerID | Amount of travellers
1 | 12:13 | 4001 | 30
1 | 12:13 | 4002 | 15
1 | 19:45 | 4001 | 10
1 | 19:45 | 4002 | 20
I want to sum the amount of travellers for each departure
I tried to code it this way:
SELECT StationID,[Time of Departure], sum(Amount)
FROM Train_Stations AS TS
INNER JOIN DepartureData AS DD
ON DD.FK_StationID = TS.PK_StationID
INNER JOIN CustomerInfo AS CI
ON CI.FK_StationID = TS.PK_StationID
GROUP BY StationID, [Time of Departure]
The result is like this:
StationID | Time of Departure | Amount
1 | 12:13 | 75
1 | 12:13 | 75
1 | 19:45 | 75
1 | 19:45 | 75
But I want it like this:
StationID | Time of Departure | Amount
1 | 12:13 | 45
1 | 19:45 | 30

It seems you are doing something different from what you describe. Based on the data you posted, the query is correct:
WITH CTE(StationID,DEPARTURE_TIME,TRAVELLERID,AMOUNT_OF_TRAVELLERS) AS
(
SELECT 1,CAST('12:13'AS TIME),4001,30 UNION ALL
SELECT 1,CAST('12:13'AS TIME),4002,15 UNION ALL
SELECT 1,CAST('19:45'AS TIME),4001,10 UNION ALL
SELECT 1,CAST('19:45'AS TIME),4002,20
)
SELECT C.StationID,C.DEPARTURE_TIME,SUM(AMOUNT_OF_TRAVELLERS)TOTAL_TRAVELLERS
FROM CTE AS C
GROUP BY C.StationID,C.DEPARTURE_TIME
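The same check can be reproduced outside SQL Server with Python's sqlite3 module. This is a minimal sketch: the single flat table departures and its column names are hypothetical, and it holds only the four rows from the question.

```python
import sqlite3

def travellers_per_departure():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE departures "
        "(StationID INTEGER, DepartureTime TEXT, TravellerID INTEGER, Amount INTEGER)"
    )
    # The four sample rows from the question.
    conn.executemany("INSERT INTO departures VALUES (?, ?, ?, ?)", [
        (1, "12:13", 4001, 30),
        (1, "12:13", 4002, 15),
        (1, "19:45", 4001, 10),
        (1, "19:45", 4002, 20),
    ])
    rows = conn.execute("""
        SELECT StationID, DepartureTime, SUM(Amount) AS total_travellers
        FROM departures
        GROUP BY StationID, DepartureTime
        ORDER BY StationID, DepartureTime
    """).fetchall()
    conn.close()
    return rows

totals = travellers_per_departure()
print(totals)  # [(1, '12:13', 45), (1, '19:45', 30)]
```

Since the grouping gives 45 and 30 on the flat data, a different result in the real query would have to come from how the three tables are joined.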

You should qualify the columns with the DD alias, for example DD.StationID. That should return the expected result.
SELECT DD.StationID,DD.[Time of Departure], sum(DD.Amount)
FROM Train_Stations AS TS
INNER JOIN DepartureData AS DD
ON DD.FK_StationID = TS.PK_StationID
INNER JOIN CustomerInfo AS CI
ON CI.FK_StationID = TS.PK_StationID
GROUP BY DD.StationID, DD.[Time of Departure]

Aggregate before and after a date column

I have two tables: db.transactions and db.salesman, which I would like to combine in order to create an output that has aggregated sales before each salesman's hire date and after each salesman's hire date.
select * from db.transactions
index | sales_rep | sales | trx_date
1 | Tom | 200 | 9/18/2020
2 | Jerry | 435 | 6/21/2020
3 | Patrick | 1400 | 4/30/2020
4 | Tom | 560 | 5/24/2020
5 | Francis | 240 | 1/2/2021
select * from db.salesman
index | sales_rep | hire_date
1 | Tom | 8/19/2020
2 | Jerry | 1/28/2020
3 | Patrick | 4/6/2020
4 | Francis | 9/4/2020
I would like to aggregate sales from db.transactions before and after each sales rep's hire date.
Expected output:
index | sales_rep | hire_date | agg_sales_before_hire_date | agg_sales_after_hire_date
1 | Tom | 8/19/2020 | 1200 | 5000
2 | Jerry | 1/28/2020 | 500 | 900
3 | Patrick | 4/6/2020 | 5000 | 300
4 | Francis | 9/4/2020 | 2900 | 1500
For a single sales rep, the query to calculate agg_sales_before_hire_date is probably something like:
select tx.sales_rep, sum(tx.sales)
from db.transactions tx
inner join db.salesman sm on sm.sales_rep = tx.sales_rep
where tx.trx_date < sm.hire_date and tx.sales_rep = 'Tom'
group by tx.sales_rep
I am using PostgreSQL. I am also open to the idea of doing it in Tableau or Python.

Using CROSS JOIN LATERAL
select
sa.sales_rep, sa.hire_date,
l.agg_sales_before_hire_date,
l.agg_sales_after_hire_date
from salesman sa
cross join lateral
(
select
sum(tx.sales) filter (where tx.trx_date < sa.hire_date) agg_sales_before_hire_date,
sum(tx.sales) filter (where tx.trx_date >= sa.hire_date) agg_sales_after_hire_date
from transactions tx
where tx.sales_rep = sa.sales_rep
) l;

Use conditional aggregation:
select tx.sales_rep,
sum(case when tx.trx_date < sm.hire_date then sales else 0 end) as before_sales,
sum(case when tx.trx_date >= sm.hire_date then sales else 0 end) as after_sales
from db.transactions tx inner join
db.salesman sm
on sm.sales_rep = tx.sales_rep
group by tx.sales_rep;
EDIT:
In Postgres, you would use filter for the logic:
select tx.sales_rep,
sum(sales) filter (where tx.trx_date < sm.hire_date) as before_sales,
sum(sales) filter (where tx.trx_date >= sm.hire_date) as after_sales
from db.transactions tx
inner join db.salesman sm on sm.sales_rep = tx.sales_rep
group by tx.sales_rep;
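Conditional aggregation can be tried on the question's sample rows with Python's sqlite3 module. This is a sketch, not the Postgres setup from the question: the tables are created without the db schema prefix, and the dates are rewritten in ISO format (an assumption) so that plain text comparison orders them correctly in SQLite.

```python
import sqlite3

def sales_before_and_after_hire():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE transactions (sales_rep TEXT, sales INTEGER, trx_date TEXT);
        CREATE TABLE salesman (sales_rep TEXT, hire_date TEXT);
        INSERT INTO transactions VALUES
            ('Tom', 200, '2020-09-18'), ('Jerry', 435, '2020-06-21'),
            ('Patrick', 1400, '2020-04-30'), ('Tom', 560, '2020-05-24'),
            ('Francis', 240, '2021-01-02');
        INSERT INTO salesman VALUES
            ('Tom', '2020-08-19'), ('Jerry', '2020-01-28'),
            ('Patrick', '2020-04-06'), ('Francis', '2020-09-04');
    """)
    rows = conn.execute("""
        SELECT sm.sales_rep,
               -- each CASE routes a sale into exactly one of the two sums
               SUM(CASE WHEN tx.trx_date <  sm.hire_date THEN tx.sales ELSE 0 END) AS before_sales,
               SUM(CASE WHEN tx.trx_date >= sm.hire_date THEN tx.sales ELSE 0 END) AS after_sales
        FROM salesman sm
        JOIN transactions tx ON tx.sales_rep = sm.sales_rep
        GROUP BY sm.sales_rep
        ORDER BY sm.sales_rep
    """).fetchall()
    conn.close()
    return rows

split = sales_before_and_after_hire()
for row in split:
    print(row)
```

With the five sample transactions, only Tom has a sale dated before his hire date (560); the expected-output figures in the question do not correspond to the sample rows, so they are not reproduced here.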

JOIN CTE's Grouped by Month/Year

I have multiple CTEs which result in the following common table structure:
Year | Month | Total_Purchases_Product_Line_X
These represent purchases grouped by month & year across several product lines.
Ex.)
SELECT * FROM cte_line_x
Year | Month | Total_Purchases_Product_Line_X
2018 | 01 | 256
2018 | 02 | 192
SELECT * FROM cte_line_y
Year | Month | Total_Purchases_Product_Line_Y
2018 | 01 | 76
2018 | 02 | 59
I'd like to create something like the following
Year | Month | Total_Purchases_Line_X | Total_Purchases_Line_Y | Total_Purchases_Line_Z
2018 | 01 | 256 | 76 |
2018 | 02 | 192 | 59 |
Where the total purchases of each product line is joined. However, I'm running into issues grouping the dates from each CTE after I have joined them together.
Here is what I've tried:
SELECT
cte_product_x.Month,
cte_product_x.Year,
cte_product_x.total as Total_X,
cte_product_y.total as Total_Y,
cte_product_z.total as Total_Z
FROM
cte_product_x
LEFT JOIN
cte_product_y ON
cte_product_y.year = cte_product_x.year
AND
cte_product_y.month = cte_product_x.month
LEFT JOIN
cte_product_z ON
cte_product_z.year = cte_product_x.year
AND
cte_product_z.month = cte_product_x.month
GROUP BY
cte_product_x.Month,
cte_product_x.Year
ORDER BY
cte_product_x.Month,
cte_product_x.Year
I tried changing my SELECT to:
SELECT
cte_product_x.Month,
cte_product_x.Year,
MAX(cte_product_x.total) as Total_X,
MAX(cte_product_y.total) as Total_Y,
MAX(cte_product_z.total) as Total_Z
However, it only worked for "Total_X". The counts for the other columns were the max value found for a grouped total for all months. I don't understand why.

Doesn't this work?
SELECT x.Month, x.Year, x.total as Total_X,
y.total as Total_Y, z.total as Total_Z
FROM cte_product_x x JOIN
cte_product_y y
ON y.year = x.year AND y.month = x.month JOIN
cte_product_z z
ON z.year = x.year AND z.month = x.month
ORDER BY x.Month, x.Year;
At least it works for your sample data.
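A plain join on year and month, with no GROUP BY, can be checked against the sample data using Python's sqlite3 module. This is only a sketch: ordinary tables named line_x and line_y (hypothetical names) stand in for the CTEs, and product line Z is left out because the question gives no data for it.

```python
import sqlite3

def combined_totals():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE line_x (year INTEGER, month INTEGER, total INTEGER);
        CREATE TABLE line_y (year INTEGER, month INTEGER, total INTEGER);
        INSERT INTO line_x VALUES (2018, 1, 256), (2018, 2, 192);
        INSERT INTO line_y VALUES (2018, 1, 76), (2018, 2, 59);
    """)
    rows = conn.execute("""
        SELECT x.year, x.month, x.total AS total_x, y.total AS total_y
        FROM line_x x
        JOIN line_y y
          ON y.year = x.year AND y.month = x.month   -- match on both keys
        ORDER BY x.year, x.month
    """).fetchall()
    conn.close()
    return rows

combined = combined_totals()
print(combined)  # [(2018, 1, 256, 76), (2018, 2, 192, 59)]
```

Because each input already has one row per year and month, the join produces one row per month and no further aggregation is needed. An inner join drops months missing from either side; a LEFT JOIN would keep them with NULL totals.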

SQL nested select and aliases

The Situation
I have a typical MS Access database containing information on Companies, Pay, Employees and Positions. Some of the tables are:
tbl_Report (Report_ID PK, Report_Year)
tbl_Employee (Employee_ID PK)
tbl_Pay (Pay_ID PK, Salary, Employee_ID FK, Report_ID FK)
tbl_Position (Position_ID PK, Position, Employee_ID FK, Report_ID FK)
I have a query that selects the salary for each position and year, to produce:
qry_Salary_by_Position_Year: (This query is parameterised to accept a 'Year').
Year | Salary | Position
------------------------
2014 | 100 | CEO
2013 | 200 | CEO
2014 | 300 | CFO
2014 | 200 | Chairman
2013 | 150 | CEO
etc.
I then use another query to extract the top x percent of salaries for a given position:
qry_Select_Top_25:
SELECT TOP 25 PERCENT Salary, Year, Position
FROM qry_Salary_by_Position_Year;
which gives something like:
Salary | Year | Position
------------------------
100 | 2014 | CEO
100 | 2014 | CFO
200 | 2014 | CFO
The Question
What I would like is a final table that displays the Max(25%), Max(50%), Max(75%), Max(X%) values, grouped by Position and Year, eg:
Year | Position | 25th | 50th | 75th
-------------------------------------
2013 | CEO | 10 | 30 | 75
2014 | CEO | 20 | 50 | 80
2014 | CFO | 15 | 30 | 90
2014 | Chairman | 20 | 25 | 30
I can do this for one percentile value using
SELECT Year, Position, Max(qry50.Salary) AS 50_Percentile
FROM (SELECT TOP 50 PERCENT qry_Salary_by_Position_Year.Salary, Year, Position
FROM qry_Salary_by_Position_Year) AS qry50
WHERE Position IN (SELECT DISTINCT Position FROM qry_Salary_by_Position_Year) AND Year IN (SELECT DISTINCT Year FROM qry_Salary_by_Position_Year)
GROUP BY Year, Position;
But I can't get my head around how to construct the query with the correct aliases etc. to add in the other percentage values as other columns. Does anyone have any suggestions/comments/questions?
Edit
I may have come up with a solution that I'm now checking:
SELECT qry.Year, qry.Position, Max(qry25.Salary) AS 25_Percentile, Max(qry50.Salary) AS 50_Percentile, Max(qry75.Salary) AS 75_Percentile, Max(qry100.Salary) AS 100_Percentile
FROM
((((qry_Salary_by_Position_Year qry
LEFT OUTER JOIN (SELECT TOP 50 PERCENT Salary, Year, Position FROM qry_Salary_by_Position_Year) AS qry50 ON qry.Year = qry50.Year AND qry.Position = qry50.Position)
LEFT OUTER JOIN (SELECT TOP 25 PERCENT Salary, Year, Position FROM qry_Salary_by_Position_Year) AS qry25 ON qry.Year = qry25.Year AND qry.Position = qry25.Position)
LEFT OUTER JOIN (SELECT TOP 75 PERCENT Salary, Year, Position FROM qry_Salary_by_Position_Year) AS qry75 ON qry.Year = qry75.Year AND qry.Position = qry75.Position)
LEFT OUTER JOIN (SELECT TOP 100 PERCENT Salary, Year, Position FROM qry_Salary_by_Position_Year) AS qry100 ON qry.Year = qry100.Year AND qry.Position = qry100.Position)
GROUP BY qry.Year, qry.Position

I think this is what I'm after:
SELECT qry.Year, qry.Position, Max(qry25.Salary) AS 25_Percentile, Max(qry50.Salary) AS 50_Percentile, Max(qry75.Salary) AS 75_Percentile, Max(qry100.Salary) AS 100_Percentile
FROM
((((qry_Salary_by_Position_Year qry
LEFT OUTER JOIN (SELECT TOP 50 PERCENT Salary, Year, Position FROM qry_Salary_by_Position_Year) AS qry50 ON qry.Year = qry50.Year AND qry.Position = qry50.Position)
LEFT OUTER JOIN (SELECT TOP 25 PERCENT Salary, Year, Position FROM qry_Salary_by_Position_Year) AS qry25 ON qry.Year = qry25.Year AND qry.Position = qry25.Position)
LEFT OUTER JOIN (SELECT TOP 75 PERCENT Salary, Year, Position FROM qry_Salary_by_Position_Year) AS qry75 ON qry.Year = qry75.Year AND qry.Position = qry75.Position)
LEFT OUTER JOIN (SELECT TOP 100 PERCENT Salary, Year, Position FROM qry_Salary_by_Position_Year) AS qry100 ON qry.Year = qry100.Year AND qry.Position = qry100.Position)
GROUP BY qry.Year, qry.Position
I was led to this solution by this answer to another post: https://stackoverflow.com/a/7855015/4002530
