PostgreSQL - Aggregate queries inside aggregate queries

I'm building a SELECT statement for a sales rep commission report based on PostgreSQL tables. I want it to show these columns:
-Customer No.
-Part No.
-Month-to-date Qty (MTD Qty)
-Year-to-date Qty (YTD Qty)
-Month-to-date Extended Selling Price (MTD Extended)
-Year-to-date Extended Selling Price (YTD Extended)
The data is in two tables:
Sales_History (one record per invoice and this table includes Cust. No. and Invoice Date)
Sales_History_Items (one record per part no. per invoice and this table includes Part No., Qty and Unit Price).
If I do a simple query that combines these two tables, this is what it looks like:
Date / Cust / Part / Qty / Unit Price
Apr 1 / ABC Co. / WIDGET / 5 / $11
Apr 4 / ABC Co. / WIDGET / 8 / $11.50
Apr 1 / ABC Co. / GADGET / 1 / $30
Apr 7 / XYZ Co. / WIDGET / 3 / $11.50
etc.
This is the final result I want (one line per customer per part):
Cust / Part / MTD Qty / MTD Sales / YTD Qty / YTD Sales
ABC Co. / WIDGET / 13 / $147 / 1500 / $16,975
ABC Co. / GADGET / 1 / $30 / 7 / $210
XYZ Co. / WIDGET / 3 / $34.50 / 18 / $203.40
I've come up with the SQL statement below so far, but it doesn't give me the extended selling price columns (committed_qty * unit_price per line, summed by cust no./part no.), and that's my problem:
with mtd as
(SELECT sales_history.cust_no, part_no, Sum(sales_history_items.committed_qty) AS MTDQty
FROM sales_history left JOIN sales_history_items
ON sales_history.invoice_no = sales_history_items.invoice_no where sales_history_items.part_no is not null and sales_history.invoice_date >= '2020-04-01' and sales_history.invoice_date <= '2020-04-30'
GROUP BY sales_history.cust_no, sales_history_items.part_no),
ytd as
(SELECT sales_history.cust_no, part_no, Sum(sales_history_items.committed_qty) AS YTDQty
FROM sales_history left JOIN sales_history_items
ON sales_history.invoice_no = sales_history_items.invoice_no where sales_history_items.part_no is not null and sales_history.invoice_date >= '2020-01-01' and sales_history.invoice_date <= '2020-12-31' GROUP BY sales_history.cust_no, sales_history_items.part_no),
mysummary as
(select MTDQty, YTDQty, coalesce(ytd.cust_no,mtd.cust_no) as cust_no,coalesce(ytd.part_no,mtd.part_no) as part_no
from ytd full outer join mtd on ytd.cust_no=mtd.cust_no and ytd.part_no=mtd.part_no)
select * from mysummary;
I believe I have to nest a couple more aggregate queries in here that group by cust_no, part_no, unit_price, and then sum those extended price totals (qty * unit_price) by cust_no, part_no.
Any assistance would be greatly appreciated. Thanks!

Do this in one go with filter expressions:
with params as (
select '2020-01-01'::date as year, 4 as month
)
SELECT h.cust_no, i.part_no,
SUM(i.committed_qty) AS YTDQty,
SUM(i.committed_qty * i.unit_price) as YTDSales,
SUM(i.committed_qty) FILTER
(WHERE extract('month' from h.invoice_date) = p.month) as MTDQty,
SUM(i.committed_qty * i.unit_price) FILTER
(WHERE extract('month' from h.invoice_date) = p.month) as MTDSales
FROM params p
CROSS JOIN sales_history h
LEFT JOIN sales_history_items i
ON i.invoice_no = h.invoice_no
WHERE i.part_no is not null
AND h.invoice_date >= p.year
AND h.invoice_date < p.year + interval '1 year'
GROUP BY h.cust_no, i.part_no
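To try the conditional-aggregation idea without a PostgreSQL server, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. SQLite has no `extract`, so dates are stored as ISO text and compared as half-open ranges. `SUM(CASE WHEN ... END)` is used as the portable equivalent of PostgreSQL's `FILTER (WHERE ...)`. Table and column names follow the question. The invoice numbers and the March invoice are hypothetical, added only so the YTD figures differ from MTD.

```python
import sqlite3

# One pass over the year; the month-to-date columns are restricted with CASE,
# which behaves like PostgreSQL's "SUM(...) FILTER (WHERE ...)".
REPORT_SQL = """
SELECT h.cust_no, i.part_no,
       SUM(i.committed_qty)                AS ytd_qty,
       SUM(i.committed_qty * i.unit_price) AS ytd_sales,
       SUM(CASE WHEN h.invoice_date >= :month_start AND h.invoice_date < :month_end
                THEN i.committed_qty END)                AS mtd_qty,
       SUM(CASE WHEN h.invoice_date >= :month_start AND h.invoice_date < :month_end
                THEN i.committed_qty * i.unit_price END) AS mtd_sales
FROM sales_history h
JOIN sales_history_items i ON i.invoice_no = h.invoice_no
WHERE h.invoice_date >= :year_start AND h.invoice_date < :year_end
GROUP BY h.cust_no, i.part_no
ORDER BY h.cust_no, i.part_no
"""


def build_demo_db():
    """Create both tables and load the question's sample rows (dates as ISO text)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales_history (invoice_no INTEGER, cust_no TEXT, invoice_date TEXT);
        CREATE TABLE sales_history_items (invoice_no INTEGER, part_no TEXT,
                                          committed_qty INTEGER, unit_price REAL);
    """)
    conn.executemany("INSERT INTO sales_history VALUES (?, ?, ?)", [
        (1, "ABC Co.", "2020-03-15"),  # hypothetical earlier invoice (YTD only)
        (2, "ABC Co.", "2020-04-01"),
        (3, "ABC Co.", "2020-04-04"),
        (4, "XYZ Co.", "2020-04-07"),
    ])
    conn.executemany("INSERT INTO sales_history_items VALUES (?, ?, ?, ?)", [
        (1, "WIDGET", 2, 11.00),
        (2, "WIDGET", 5, 11.00),
        (2, "GADGET", 1, 30.00),
        (3, "WIDGET", 8, 11.50),
        (4, "WIDGET", 3, 11.50),
    ])
    return conn


def mtd_ytd_report(conn, year_start, year_end, month_start, month_end):
    """Return (cust_no, part_no, ytd_qty, ytd_sales, mtd_qty, mtd_sales) rows."""
    params = {"year_start": year_start, "year_end": year_end,
              "month_start": month_start, "month_end": month_end}
    return conn.execute(REPORT_SQL, params).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in mtd_ytd_report(db, "2020-01-01", "2021-01-01", "2020-04-01", "2020-05-01"):
        print(row)
```

With April 2020 as the month, ABC Co./WIDGET comes out at MTD 13 / 147.0, matching the desired result in the question. The half-open ranges (`>=` start, `<` next start) also avoid losing rows on the last day if `invoice_date` ever carries a time component, which `<= '2020-04-30'` would.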

If I follow you correctly, you can do conditional aggregation:
select sh.cust_no, shi.part_no,
sum(shi.committed_qty) ytd_qty,
sum(shi.committed_qty * shi.unit_price) ytd_sales,
sum(shi.committed_qty) filter(where sh.invoice_date >= date_trunc('month', current_date)) mtd_qty,
sum(shi.committed_qty * shi.unit_price) filter(where sh.invoice_date >= date_trunc('month', current_date)) mtd_sales
from sales_history sh
left join sales_history_items shi on sh.invoice_no = shi.invoice_no
where shi.part_no is not null and sh.invoice_date >= date_trunc('year', current_date)
group by sh.cust_no, shi.part_no
The logic is to filter on the current year, and use simple aggregation to compute the "year to date" figures. To get the "month to date" columns, we can just filter the aggregate functions.

Related

Decluttering a SQL query

For a practice project I wrote the following query, and I was wondering if there is a way to make it more efficient than writing everything 12 times, something like a for loop for SQL.
CREATE TABLE temp (month INT, total_sales INT, market_share decimal(5,2), year_change decimal(5,2))
insert into temp (month)
Values (1)
UPDATE temp
SET total_sales = (
SELECT COUNT(purchases_2020.purchaseid)
FROM purchases_2020
JOIN categories ON purchases_2020.purchaseid = categories.purchase_id
WHERE (categories.category = 'whole milk' OR categories.category = 'yogurt' OR categories.category = 'domestic eggs') AND (purchases_2020.fulldate BETWEEN '2020-01-01' AND '2020-01-31')
)
WHERE month = 1
UPDATE temp
SET market_share = (
SELECT (SELECT 100 * COUNT(purchases_2020.purchaseid)
FROM purchases_2020
JOIN categories ON purchases_2020.purchaseid = categories.purchase_id
WHERE (categories.category = 'whole milk' OR categories.category = 'yogurt' OR categories.category = 'domestic eggs') AND (purchases_2020.fulldate BETWEEN '2020-01-01' AND '2020-01-31'))
* 1. /
(SELECT COUNT(purchases_2020.purchaseid)
FROM purchases_2020
WHERE purchases_2020.fulldate BETWEEN '2020-01-01' AND '2020-01-31')
)
WHERE month = 1
UPDATE temp
SET year_change = (
SELECT market_share -
(SELECT
(SELECT 100 * COUNT(purchases_2019.purchase_id)
FROM purchases_2019
JOIN categories ON purchases_2019.purchase_id = categories.purchase_id
WHERE (categories.category = 'whole milk' OR categories.category = 'yogurt' OR categories.category = 'domestic eggs') AND (purchases_2019.full_date BETWEEN '2019-01-01' AND '2019-01-31'))
* 1./
(SELECT COUNT(purchases_2019.purchase_id)
FROM purchases_2019
WHERE purchases_2019.full_date BETWEEN '2019-01-01' AND '2019-01-31'))
FROM temp
WHERE month = 1
)
WHERE month = 1
EDIT
I was given the 3 tables shown in the following database schema, and I'm trying to create a table with the total dairy sales for every month, the monthly market share of the dairy products, and the difference between the 2020 and the 2019 monthly market share (the year_change column).
There is also an arithmetic error somewhere: when checking the project I get the message "ResultSet does not contain the correct numeric values!" and I'm at my wits' end looking for it, but my priority is to declutter the query.
Your error message tells me that you are trying to run this from a reporting tool or a host language.
It also makes no sense to put the data into separate tables by year.
SQL is a declarative language that works with data as sets.
Instead of pushing the results into table temp, try writing a query like this:
with all_data as (
select p.fulldate, p.purchaseid, c.category,
extract(year from p.fulldate) as year,
extract(month from p.fulldate) as month
from purchases_2020 p
join categories c on c.purchase_id = p.purchaseid
union all
select p.full_date, p.purchase_id, c.category,
extract(year from p.full_date) as year,
extract(month from p.full_date) as month
from purchases_2019 p
join categories c on c.purchase_id = p.purchase_id
), kpis as (
select year, month,
count(purchaseid)
filter (where category in ('whole milk', 'yogurt', 'domestic eggs'))
as dairy_sales,
count(purchaseid) * 1.0 as total_sales
from all_data
group by year, month
)
select ty.month, ty.dairy_sales as total_sales,
100.0 * ty.dairy_sales / ty.total_sales as market_share,
100.0 * ( (ty.dairy_sales / ty.total_sales)
- (ly.dairy_sales / ly.total_sales)) as year_change
from kpis ty
join kpis ly
on (ly.year, ly.month) = (ty.year - 1, ty.month);
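Here is a minimal sketch of the same set-based approach (union both years, count per year and month, then self-join each month to the same month of the previous year), run against an in-memory SQLite database so it can be checked end to end. Two substitutions are needed for SQLite: `strftime` replaces `extract`, and `COUNT(CASE ... END)` stands in for `FILTER`. The 2019 table uses `purchase_id`/`full_date` as in the question. All rows are hypothetical, and the sketch assumes one `categories` row per purchase.

```python
import sqlite3

# Union both years, compute per-month KPIs, then join each month to the
# same month one year earlier.
KPI_SQL = """
WITH all_data AS (
    SELECT p.purchaseid AS purchase_id, c.category,
           CAST(strftime('%Y', p.fulldate) AS INTEGER) AS year,
           CAST(strftime('%m', p.fulldate) AS INTEGER) AS month
    FROM purchases_2020 p JOIN categories c ON c.purchase_id = p.purchaseid
    UNION ALL
    SELECT p.purchase_id, c.category,
           CAST(strftime('%Y', p.full_date) AS INTEGER),
           CAST(strftime('%m', p.full_date) AS INTEGER)
    FROM purchases_2019 p JOIN categories c ON c.purchase_id = p.purchase_id
), kpis AS (
    SELECT year, month,
           COUNT(CASE WHEN category IN ('whole milk', 'yogurt', 'domestic eggs')
                      THEN purchase_id END) AS dairy_sales,
           COUNT(purchase_id) * 1.0 AS total_sales  -- * 1.0 avoids integer division
    FROM all_data
    GROUP BY year, month
)
SELECT ty.month, ty.dairy_sales AS total_sales,
       100.0 * ty.dairy_sales / ty.total_sales AS market_share,
       100.0 * (ty.dairy_sales / ty.total_sales
                - ly.dairy_sales / ly.total_sales) AS year_change
FROM kpis ty
JOIN kpis ly ON ly.year = ty.year - 1 AND ly.month = ty.month
ORDER BY ty.month
"""


def build_demo_db():
    """Hypothetical rows for January 2020 and January 2019 only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE purchases_2020 (purchaseid INTEGER, fulldate TEXT);
        CREATE TABLE purchases_2019 (purchase_id INTEGER, full_date TEXT);
        CREATE TABLE categories (purchase_id INTEGER, category TEXT);
        INSERT INTO purchases_2020 VALUES
            (1, '2020-01-03'), (2, '2020-01-09'), (3, '2020-01-15'), (4, '2020-01-30');
        INSERT INTO purchases_2019 VALUES
            (11, '2019-01-02'), (12, '2019-01-08'), (13, '2019-01-12'),
            (14, '2019-01-20'), (15, '2019-01-31');
        INSERT INTO categories VALUES
            (1, 'whole milk'), (2, 'yogurt'), (3, 'beer'), (4, 'bread'),
            (11, 'domestic eggs'), (12, 'beer'), (13, 'bread'), (14, 'soda'), (15, 'chips');
    """)
    return conn


def monthly_kpis(conn):
    """Return (month, dairy_sales, market_share %, year_change in points) rows."""
    return conn.execute(KPI_SQL).fetchall()


if __name__ == "__main__":
    print(monthly_kpis(build_demo_db()))
```

With these rows, January has 2 dairy purchases out of 4 in 2020 (50%) against 1 out of 5 in 2019 (20%), so year_change is about 30 percentage points. One query replaces all twelve sets of UPDATE statements.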

calculate CAGR using SQL

I have a dataset that looks like this:
ADVERTISER YR REVENUE
---------------------------------
Altus Dental 2015 5560.00
Altus Dental 2016 48295.00
Altus Dental 2017 39920.00
I'm trying to find the CAGR by computing the year-over-year growth for each year and taking the average of those values, meaning:
CAGR = (((REVENUE(2016)/REVENUE(2015)) - 1) + ((REVENUE(2017)/REVENUE(2016)) - 1) ) / 2
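As a quick sanity check of the formula above, here is a plain-Python sketch that averages the yearly growth rates for any number of consecutive years. `average_yoy_growth` is a hypothetical helper, not something from the question. Note that this is the arithmetic mean of the yearly rates, as the question defines it, not the compound rate (end / start)^(1/n) - 1.

```python
def average_yoy_growth(revenue_by_year):
    """Average of the year-over-year rates rev[y] / rev[y-1] - 1 (consecutive years)."""
    years = sorted(revenue_by_year)
    rates = [revenue_by_year[cur] / revenue_by_year[prev] - 1
             for prev, cur in zip(years, years[1:])]
    return sum(rates) / len(rates)


# Sample rows for Altus Dental from the question.
altus = {2015: 5560.00, 2016: 48295.00, 2017: 39920.00}
print(round(average_yoy_growth(altus), 4))
```

For the sample rows the two yearly rates are about 7.686 and -0.173, so the average is roughly 3.756.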
Finally, I need output like this:
ADVERTISER CAGR
--------------------
Altus Dental 3.75
How can I accomplish this in SQL? Please help me find an effective solution.
Calculate the growth (revenue/prev_revenue - 1) for each year, then average it per advertiser (this assumes your DBMS supports the LAG window function):
select advertiser, avg(cagr) as CAGR
from
(
select advertiser, yr, revenue, revenue/prev_revenue - 1 as cagr
from
(select *, lag(revenue, 1) over
(partition by advertiser order by yr) as prev_revenue
from test ) t
) t1
group by advertiser
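The LAG query above can be tried as-is in SQLite 3.25 or later, which added window functions. This sketch loads the question's Altus Dental rows into an in-memory table named `test`, as the answer assumes. The first year has no previous revenue, so its rate is NULL, and AVG simply ignores it.

```python
import sqlite3

CAGR_SQL = """
SELECT advertiser, AVG(cagr) AS cagr
FROM (
    SELECT advertiser, revenue / prev_revenue - 1 AS cagr
    FROM (
        SELECT advertiser, yr, revenue,
               LAG(revenue, 1) OVER (PARTITION BY advertiser ORDER BY yr) AS prev_revenue
        FROM test
    ) t
) t1
GROUP BY advertiser
"""


def average_growth_by_advertiser(conn):
    """Run the LAG-based query and return (advertiser, average growth) rows."""
    return conn.execute(CAGR_SQL).fetchall()


conn = sqlite3.connect(":memory:")
# REAL revenue avoids integer division in revenue / prev_revenue.
conn.execute("CREATE TABLE test (advertiser TEXT, yr INTEGER, revenue REAL)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    ("Altus Dental", 2015, 5560.00),
    ("Altus Dental", 2016, 48295.00),
    ("Altus Dental", 2017, 39920.00),
])
print(average_growth_by_advertiser(conn))
```

If revenue were stored as an integer type, some databases would truncate `revenue / prev_revenue`, so multiplying by 1.0 or casting is worth considering there.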
Here is one way:
select t15.advertiser,
(((t16.revenue/t15.revenue) - 1) + ((t17.revenue/t16.revenue) - 1) ) / 2 as cagr
from t t15 join
t t16
on t15.advertiser = t16.advertiser and t15.yr = 2015 and t16.yr = 2016 join
t t17
on t15.advertiser = t17.advertiser and t17.yr = 2017
I'm assuming there won't be "holes" in the list of years. This should work for n years, and n advertisers:
SELECT advertiser,
SUM(revenue) / (COUNT(*) - 1) AS CAGR
FROM (SELECT advertiser,
COALESCE((revenue/revenue_old - 1), 0) as revenue
FROM (SELECT s.advertiser,
s.revenue,
LAG(s.revenue, 1) OVER(PARTITION BY s.advertiser
ORDER BY s.yr) AS revenue_old
FROM table_1 s) t1) t2
GROUP BY advertiser;

SQL Year over year growth percentage from data same query

How do I calculate the percentage difference from 2 different columns, calculated in that same query? Is it even possible?
This is what I have right now:
SELECT
Year(OrderDate) AS [Year],
Count(OrderID) AS TotalOrders,
Sum(Invoice.TotalPrice) AS TotalRevenue
FROM
Invoice
INNER JOIN Order
ON Invoice.InvoiceID = Order.InvoiceID
GROUP BY Year(OrderDate);
This produces a table with Year, TotalOrders and TotalRevenue columns.
Now I'd like to add one more column with the YoY growth, so that when 2016 comes around, the growth is there too.
EDIT:
I should clarify that I'd like to have, for example, next to
2015, 5, 246.28 -> 346.15942029% ((R2015 - R2014) / R2014 * 100)
If you save your existing query as qryBase, you can use it as the data source for another query to get what you want:
SELECT
q1.Year,
q1.TotalOrders,
q1.TotalRevenue,
IIf
(
q0.TotalRevenue Is Null,
Null,
((q1.TotalRevenue - q0.TotalRevenue) / q0.TotalRevenue) * 100
) AS YoY_growth
FROM
qryBase AS q1
LEFT JOIN qryBase AS q0
ON q1.Year = (q0.Year + 1);
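The same "join the query to itself shifted by one year" trick works outside Access too. In this minimal SQLite sketch, a CTE named `qry_base` stands in for the saved query `qryBase`, and the figures are borrowed from the UNION example in the next answer. SQLite propagates NULL through arithmetic, so the first year's growth comes back NULL without an IIf.

```python
import sqlite3

# qry_base plays the role of the saved Access query qryBase.
YOY_SQL = """
WITH qry_base(year, total_orders, total_revenue) AS (
    VALUES (2014, 2, 55.20), (2015, 2, 246.28), (2016, 7, 350.47)
)
SELECT q1.year, q1.total_orders, q1.total_revenue,
       (q1.total_revenue - q0.total_revenue) / q0.total_revenue * 100 AS yoy_growth
FROM qry_base AS q1
LEFT JOIN qry_base AS q0 ON q1.year = q0.year + 1  -- q0 is the previous year
ORDER BY q1.year
"""


def yoy_rows(conn):
    """Return (year, total_orders, total_revenue, yoy_growth %) rows."""
    return conn.execute(YOY_SQL).fetchall()


conn = sqlite3.connect(":memory:")
for row in yoy_rows(conn):
    print(row)
```

For 2015 this gives (246.28 - 55.20) / 55.20 * 100, about 346.16%, which is the figure the question asks for.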
Access may complain it "can't represent the join expression q1.Year = (q0.Year + 1) in Design View", but you can still edit the query in SQL View and it will work.
Is something like this what you are looking for?
Year Revenue Growth
2014 55
2015 246 4.47
2016 350 1.42
You could wrap the original query twice to get the numbers for both years.
select orders.year, orders.orders, orders.revenue,
(select (orders.revenue/subOrders.revenue)
from
(
--originalQuery or table link
) subOrders
where subOrders.year = (orders.year-1)
) as lastYear
from
(
--originalQuery or table link
) orders
Here's a quick example using a UNIONed inline table:
select orders.year, orders.orders, orders.revenue,
(select (orders.revenue/subOrders.revenue)
from
(
select 2014 as year, 2 as orders, 55.20 as revenue
union select 2015 as year, 2 as orders, 246.28 as revenue
union select 2016 as year, 7 as orders, 350.47 as revenue
) subOrders
where subOrders.year = (orders.year-1)
) as lastYear
from
(
select 2014 as year, 2 as orders, 55.20 as revenue
union select 2015 as year, 2 as orders, 246.28 as revenue
union select 2016 as year, 7 as orders, 350.47 as revenue
) orders

Using a column in sql join without adding it to group by clause

My actual table structures are much more complex but following are two simplified table definitions:
Table invoice
CREATE TABLE invoice (
id integer NOT NULL,
create_datetime timestamp with time zone NOT NULL,
total numeric(22,10) NOT NULL
);
id create_datetime total
----------------------------
100 2014-05-08 1000
Table payment_invoice
CREATE TABLE payment_invoice (
invoice_id integer,
amount numeric(22,10)
);
invoice_id amount
-------------------
100 100
100 200
100 150
I want to join the two tables above so that the selected data looks like this:
month total_invoice_count outstanding_balance
05/2014 1 550
The query I am using:
select
to_char(date_trunc('month', i.create_datetime), 'MM/YYYY') as month,
count(i.id) as total_invoice_count,
(sum(i.total) - sum(pi.amount)) as outstanding_balance
from invoice i
join payment_invoice pi on i.id=pi.invoice_id
group by date_trunc('month', i.create_datetime)
order by date_trunc('month', i.create_datetime);
The query above gives incorrect results: the join repeats the invoice row once per payment, so sum(i.total) - sum(pi.amount) returns (1000 + 1000 + 1000) - (100 + 200 + 150) = 2550.
I want it to return (1000) - (100 + 200 + 150) = 550
And I cannot change it to i.total - sum(pi.amount), because then I am forced to add i.total column to group by clause and that I don't want to do.
You need a single row per invoice, so aggregate payment_invoice first, ideally before you join.
When the whole table is selected, it's typically fastest to aggregate first and join later:
SELECT to_char(date_trunc('month', i.create_datetime), 'MM/YYYY') AS month
, count(*) AS total_invoice_count
, (sum(i.total) - COALESCE(sum(pi.paid), 0)) AS outstanding_balance
FROM invoice i
LEFT JOIN (
SELECT invoice_id AS id, sum(amount) AS paid
FROM payment_invoice pi
GROUP BY 1
) pi USING (id)
GROUP BY date_trunc('month', i.create_datetime)
ORDER BY date_trunc('month', i.create_datetime);
LEFT JOIN is essential here. You do not want to lose invoices that have no corresponding rows in payment_invoice (yet), which would happen with a plain JOIN.
Accordingly, use COALESCE() for the sum of payments, which might be NULL.
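To see both the fan-out problem and the aggregate-first fix on the question's data, here is a minimal SQLite sketch. `strftime('%m/%Y', ...)` stands in for PostgreSQL's `to_char`, and timestamps are stored as text. Invoice 101, which has no payments, is hypothetical and only there to exercise the LEFT JOIN and COALESCE.

```python
import sqlite3


def build_demo_db():
    """Question's invoice 100 and its payments, plus a hypothetical unpaid invoice."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE invoice (id INTEGER NOT NULL, create_datetime TEXT NOT NULL,
                              total NUMERIC NOT NULL);
        CREATE TABLE payment_invoice (invoice_id INTEGER, amount NUMERIC);
        INSERT INTO invoice VALUES (100, '2014-05-08 00:00:00', 1000);
        INSERT INTO invoice VALUES (101, '2014-06-02 00:00:00', 300);  -- no payments yet
        INSERT INTO payment_invoice VALUES (100, 100), (100, 200), (100, 150);
    """)
    return conn


# Join first, aggregate later: the invoice total is repeated once per payment.
NAIVE_SQL = """
SELECT strftime('%m/%Y', i.create_datetime) AS month,
       COUNT(i.id) AS total_invoice_count,
       SUM(i.total) - SUM(p.amount) AS outstanding_balance
FROM invoice i
JOIN payment_invoice p ON i.id = p.invoice_id
GROUP BY strftime('%m/%Y', i.create_datetime)
ORDER BY MIN(i.create_datetime)
"""

# Aggregate payments per invoice first, then LEFT JOIN and aggregate per month.
FIXED_SQL = """
SELECT strftime('%m/%Y', i.create_datetime) AS month,
       COUNT(*) AS total_invoice_count,
       SUM(i.total) - COALESCE(SUM(p.paid), 0) AS outstanding_balance
FROM invoice i
LEFT JOIN (
    SELECT invoice_id AS id, SUM(amount) AS paid
    FROM payment_invoice
    GROUP BY invoice_id
) p USING (id)
GROUP BY strftime('%m/%Y', i.create_datetime)
ORDER BY MIN(i.create_datetime)
"""

conn = build_demo_db()
print(conn.execute(NAIVE_SQL).fetchall())
print(conn.execute(FIXED_SQL).fetchall())
```

The naive query reports 2550 for 05/2014 (and drops the unpaid June invoice entirely). The aggregate-first version reports 550 for 05/2014 and 300 for 06/2014.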
Do the aggregation in two steps. First aggregate to a single line per invoice, then to a single line per month:
select
to_char(date_trunc('month', t.create_datetime), 'MM/YYYY') as month,
count(*) as total_invoice_count,
(sum(t.total) - sum(t.amount)) as outstanding_balance
from (
select i.create_datetime, i.total, sum(pi.amount) amount
from invoice i
join payment_invoice pi on i.id=pi.invoice_id
group by i.id, i.create_datetime, i.total
) t
group by date_trunc('month', t.create_datetime)
order by date_trunc('month', t.create_datetime);
SELECT TO_CHAR(invoice.create_datetime, 'MM/YYYY') as month,
COUNT(invoice.create_datetime) as total_invoice_count,
SUM(invoice.total - payments.sum_amount) as outstanding_balance
FROM invoice
JOIN
(
SELECT invoice_id, SUM(amount) AS sum_amount
FROM payment_invoice
GROUP BY invoice_id
) payments
ON invoice.id = payments.invoice_id
GROUP BY TO_CHAR(invoice.create_datetime, 'MM/YYYY')

sql query to calculate monthly growth percentage

I need to build a query with 4 columns (SQL Server 2005).
Column1: Product
Column2: Units sold
Column3: Growth from previous month (in %)
Column4: Growth from same month last year (in %)
In my table, months have custom integer values; for example, the most recent month is 146. The table also has a year column (e.g. 2011) and a month column (e.g. 7).
Is it possible to do this in one query, or do I need to start using temp tables etc.?
Appreciate any help.
thanks,
KS
KS,
To do this on the fly, you could use subqueries.
SELECT product, this_month.units_sold,
(this_month.sales-last_month.sales)*100/last_month.sales,
(this_month.sales-last_year.sales)*100/last_year.sales
FROM (SELECT product, SUM(units_sold) AS units_sold, SUM(sales) AS sales
FROM product WHERE month = 146 GROUP BY product) AS this_month,
(SELECT product, SUM(units_sold) AS units_sold, SUM(sales) AS sales
FROM product WHERE month = 145 GROUP BY product) AS last_month,
(SELECT product, SUM(units_sold) AS units_sold, SUM(sales) AS sales
FROM product WHERE month = 134 GROUP BY product) AS last_year
WHERE this_month.product = last_month.product
AND this_month.product = last_year.product
If a product was sold in one month but not in another, you will have to use a LEFT JOIN and check for NULL values, and guard against division by zero when last_month.sales or last_year.sales is 0.
I hope I got them all:
SELECT
Current_Month.product_name, units_sold_current_month,
(units_sold_current_month - units_sold_last_month) * 100.0 / units_sold_last_month prc_last_month,
(units_sold_current_month - units_sold_last_year) * 100.0 / units_sold_last_year prc_last_year
FROM
(SELECT product_id, product_name, sum(units_sold) units_sold_current_month FROM MyTable WHERE YEAR = 2011 AND MONTH = 7 GROUP BY product_id, product_name) Current_Month
JOIN
(SELECT product_id, sum(units_sold) units_sold_last_month FROM MyTable WHERE YEAR = 2011 AND MONTH = 6 GROUP BY product_id) Last_Month
ON Current_Month.product_id = Last_Month.product_id
JOIN
(SELECT product_id, sum(units_sold) units_sold_last_year FROM MyTable WHERE YEAR = 2010 AND MONTH = 7 GROUP BY product_id) Last_Year
ON Current_Month.product_id = Last_Year.product_id
I am partly guessing, since the table structure you describe is the result table, right? You will need a self-join on a month-to-previous-month basis:
SELECT <growth computation here>
FROM SALES s1 LEFT JOIN SALES s2 ON (s2.product = s1.product AND s2.month = s1.month - 1) -- last month join
LEFT JOIN SALES s3 ON (s3.product = s1.product AND s3.month = s1.month - 12) -- last year join
where <growth computation here> looks like
((s1.sales - s2.sales)/s2.sales * 100),
((s1.sales - s3.sales)/s3.sales * 100)
I use LEFT JOIN for months that have no previous months. Change your join conditions based on actual relations in month/year columns.
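A minimal, runnable sketch of the self-join approach, using SQLite in place of SQL Server 2005 (which has no LAG). It assumes the data has already been aggregated to one row per product per custom month index, as in the question's "146" numbering, so index - 1 is the previous month and index - 12 is the same month last year. The `sales` table, its `month_idx` column and all rows are hypothetical. NULLIF, available in SQL Server 2005 as well, turns a zero base into NULL instead of a division error.

```python
import sqlite3

# s2 = previous month, s3 = same month one year earlier (index - 12).
# LEFT JOIN keeps products with no history; their growth comes back NULL.
GROWTH_SQL = """
SELECT s1.product, s1.units_sold,
       (s1.units_sold - s2.units_sold) * 100.0 / NULLIF(s2.units_sold, 0) AS growth_prev_month_pct,
       (s1.units_sold - s3.units_sold) * 100.0 / NULLIF(s3.units_sold, 0) AS growth_last_year_pct
FROM sales s1
LEFT JOIN sales s2 ON s2.product = s1.product AND s2.month_idx = s1.month_idx - 1
LEFT JOIN sales s3 ON s3.product = s1.product AND s3.month_idx = s1.month_idx - 12
WHERE s1.month_idx = :current
ORDER BY s1.product
"""


def monthly_growth(conn, current_month_idx):
    """Return (product, units_sold, growth vs previous month %, growth vs last year %)."""
    return conn.execute(GROWTH_SQL, {"current": current_month_idx}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, month_idx INTEGER, units_sold INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("widget", 134, 50),   # same month last year
    ("widget", 145, 80),   # previous month
    ("widget", 146, 100),  # current month
    ("gadget", 146, 10),   # no history: both growth columns are NULL
])
print(monthly_growth(conn, 146))
```

Here widget grows 25% over the previous month and 100% over the same month last year, while gadget returns NULL for both growth columns.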