Finding market share and year change with SQL - sql

Here is the database schema.
The Case Problem:
What was the total number of purchases of dairy products for each month of 2020 (i.e., the total_sales)?
What was the total share of dairy products (out of all products purchased) for each month of 2020 (i.e., the market_share)?
For each month of 2020, what was the percentage increase or decrease in total monthly dairy purchases compared to the same month in 2019 (i.e., the year_change)?
The analysis is interested in these three categories (which they treat as dairy): 'whole milk', 'yogurt' and 'domestic eggs'.
The instruction:
Order your query by month in ascending order. Both month and total_sales should be expressed as integers, and market_share and year_change should be percentages rounded to two decimal places (e.g., 27.95% becomes 27.95).
Your query will need to return a table that resembles the following, including the same column names.
Here is my code:
with purchases_2019 as (SELECT p1.month as month,COUNT(p1.purchase_id) as count_2
FROM purchases_2019 as p1
LEFT JOIN categories as cat ON p1.purchase_id=cat.purchase_id
WHERE cat.category IN ('whole milk', 'yogurt' ,'domestic eggs')
GROUP BY p1.month
ORDER BY p1.month ASC),
purchases_2020 as ( SELECT to_char(CAST(p2.fulldate AS DATE),'MM')::int as month,
COUNT(p2.purchaseid) as total_sales,
ROUND((COUNT(p2.purchaseid)*100::numeric/18277)::numeric,2) as market_share
FROM purchases_2020 as p2
LEFT JOIN categories as cat ON p2.purchaseid=cat.purchase_id
WHERE cat.category IN ('whole milk', 'yogurt' ,'domestic eggs')
GROUP BY month
ORDER BY month ASC)
SELECT t2.month,t2.total_sales,t2.market_share,
ROUND(((t2.total_sales-t1.count_2)*100::numeric/t1.count_2) ,2) as year_change
FROM purchases_2020 as t2
INNER JOIN purchases_2019 as t1 ON t2.month=t1.month
The query returns a result, but it is still marked as the wrong answer. I have no idea why. Can you point me in the right direction? Thank you.

with p as
(select
extract(month from to_date(b.full_date, 'YYYY/MM/DD')) as "month",
sum(case when c.category in ('whole milk', 'yogurt', 'domestic eggs') then 1 else 0 end) as "old_sales"
from purchases_2019 b left join categories c
on b.purchase_id = c.purchase_id
group by 1
order by 1),
temp as
(select
extract(month from to_date(a.fulldate,'YYYY/MM/DD')) as "month",
sum(case when c.category in ('whole milk', 'yogurt', 'domestic eggs') then 1 else 0 end) as "total_sales",
round(100 * sum(case when c.category in ('whole milk', 'yogurt', 'domestic eggs') then 1 else 0 end)::numeric
/ count(a.purchaseid),2) as "market_share"
from
purchases_2020 a left join categories c
on a.purchaseid = c.purchase_id
group by 1
order by 1)
select
temp.month, total_sales, market_share,
round(100 * (total_sales - old_sales)::numeric / old_sales, 2) as "year_change"
from temp left join p on temp.month = p.month;

Why 18277?
This part:
ROUND((COUNT(p2.purchaseid)*100::numeric/18277)::numeric,2) as market_share
Could there be an error in the market_share calculation?
I think this code counts only the 3 categories and divides by a fixed number, but shouldn't market share be the 3-category sales divided by all sales in the same month?
Just an idea.
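To make the denominator point concrete, here is a minimal sketch that computes market_share against each month's own purchase count instead of a constant such as 18277. It runs on SQLite through Python's sqlite3 module as a stand-in for PostgreSQL (so strftime replaces to_char/extract), and the six sample rows are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical miniature data set; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases_2020 (purchaseid INTEGER, fulldate TEXT);
CREATE TABLE categories (purchase_id INTEGER, category TEXT);
INSERT INTO purchases_2020 VALUES
  (1, '2020-01-05'), (2, '2020-01-09'), (3, '2020-01-20'), (4, '2020-01-28'),
  (5, '2020-02-03'), (6, '2020-02-14');
INSERT INTO categories VALUES
  (1, 'whole milk'), (2, 'beef'), (3, 'soda'), (4, 'pork'),
  (5, 'yogurt'), (6, 'beef');
""")

# No WHERE filter on category: every purchase stays in the group, so COUNT(*)
# is the month's total and the CASE expression counts only the dairy rows.
query = """
SELECT CAST(strftime('%m', p.fulldate) AS INTEGER) AS month,
       SUM(CASE WHEN c.category IN ('whole milk', 'yogurt', 'domestic eggs')
                THEN 1 ELSE 0 END) AS total_sales,
       ROUND(100.0 * SUM(CASE WHEN c.category IN ('whole milk', 'yogurt', 'domestic eggs')
                              THEN 1 ELSE 0 END) / COUNT(*), 2) AS market_share
FROM purchases_2020 AS p
LEFT JOIN categories AS c ON c.purchase_id = p.purchaseid
GROUP BY month
ORDER BY month
"""
market_share_rows = conn.execute(query).fetchall()
for row in market_share_rows:
    print(row)
```

With this sample data January has 1 dairy purchase out of 4 and February 1 out of 2, so the shares differ by month even though the dairy counts are equal. A single hard-coded denominator cannot show that.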

Related

Issues with postgreSQL subqueries

I have the following chunk of code, in which I'm trying to count the sales of beef, chicken and pork in each month of the last year (I also need to determine the market share of the meats each month).
SELECT
CAST(EXTRACT('MONTH' FROM TO_TIMESTAMP(FULLDATE, 'YYYY-MM-DD')) AS INT) AS month
FROM purchases_2020
JOIN categories ON purchases_2020.purchaseid = categories.purchase_id
(
SELECT
COUNT (purchaseid) AS total_sales
FROM purchases_2020
JOIN categories ON purchases_2020.purchaseid = categories.purchase_id
WHERE category = 'whole milk' OR category = 'yogurt' OR category = 'domestic eggs'
GROUP BY month
) a
GROUP BY month
ORDER BY month
The expected result is shown in the following image, but I'm getting this error message (edited to add the exact text):
syntax error at or near "SELECT"
LINE 6: SELECT
^
[SQL: SELECT
CAST(EXTRACT('MONTH' FROM TO_TIMESTAMP(FULLDATE, 'YYYY-MM-DD')) AS INT) AS month
FROM purchases_2020
JOIN categories ON purchases_2020.purchaseid = categories.purchase_id
(
SELECT
COUNT (purchaseid) AS total_sales
FROM purchases_2020
JOIN categories ON purchases_2020.purchaseid = categories.purchase_id
WHERE category = 'whole milk' OR category = 'yogurt' OR category = 'domestic eggs'
GROUP BY month
) a
GROUP BY month
ORDER BY month
This is the data schema I'm working with.
EDIT
I'm aware I can query the total_sales like this:
SELECT
CAST(EXTRACT('MONTH' FROM TO_TIMESTAMP(FULLDATE, 'YYYY-MM-DD')) AS INT) AS month,
COUNT (purchaseid) AS total_sales
FROM purchases_2020
JOIN categories ON purchases_2020.purchaseid = categories.purchase_id
WHERE category = 'beef' OR category = 'pork' OR category = 'chicken'
GROUP BY month
ORDER BY month
But doing it like this prevents me from writing the market_share formula in the SELECT list, because the WHERE clause filters out the other categories instead of being confined to a subquery.
This query should give you the count of sales by month and category. I can't test it because I don't have the data.
SELECT
c.category,
EXTRACT('MONTH' FROM FULLDATE) AS month,
count(purchaseid) AS total_sales
FROM purchases_2020 p JOIN categories c ON p.purchaseid = c.purchase_id
WHERE category in ('beef','pork','chicken')
GROUP BY month,c.category
ORDER BY month,c.category;
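The syntax error in the question comes from placing a parenthesized SELECT after the JOIN without joining it to anything. As a hedged sketch of what the asker seems to be after, the filtered count can live in its own derived table and be joined to an unfiltered one on month. The example uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, and the sample rows and the alias names a, m and all_sales are invented.

```python
import sqlite3

# Hypothetical sample rows; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases_2020 (purchaseid INTEGER, fulldate TEXT);
CREATE TABLE categories (purchase_id INTEGER, category TEXT);
INSERT INTO purchases_2020 VALUES
  (1, '2020-01-03'), (2, '2020-01-17'), (3, '2020-02-02'),
  (4, '2020-02-11'), (5, '2020-02-21'), (6, '2020-02-27');
INSERT INTO categories VALUES
  (1, 'beef'), (2, 'soda'), (3, 'pork'),
  (4, 'chicken'), (5, 'bread'), (6, 'beef');
""")

# Two derived tables in FROM: "a" counts every purchase per month,
# "m" counts only the meat purchases. The WHERE filter stays inside "m".
query = """
SELECT a.month,
       m.total_sales,
       ROUND(100.0 * m.total_sales / a.all_sales, 2) AS market_share
FROM (
    SELECT CAST(strftime('%m', fulldate) AS INTEGER) AS month,
           COUNT(purchaseid) AS all_sales
    FROM purchases_2020
    GROUP BY month
) AS a
JOIN (
    SELECT CAST(strftime('%m', p.fulldate) AS INTEGER) AS month,
           COUNT(p.purchaseid) AS total_sales
    FROM purchases_2020 AS p
    JOIN categories AS c ON c.purchase_id = p.purchaseid
    WHERE c.category IN ('beef', 'pork', 'chicken')
    GROUP BY month
) AS m ON m.month = a.month
ORDER BY a.month
"""
meat_rows = conn.execute(query).fetchall()
for row in meat_rows:
    print(row)
```

The inner JOIN drops any month with no meat purchases at all; a LEFT JOIN from a to m with COALESCE(m.total_sales, 0) would keep such months.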

Decluttering a SQL query

For a practice project I wrote the following query, and I was wondering if there is a way to make it more efficient than writing everything 12 times, something like a for loop for SQL.
CREATE TABLE temp (month INT, total_sales INT, market_share decimal(5,2), year_change decimal(5,2))
insert into temp (month)
Values (1)
UPDATE temp
SET total_sales = (
SELECT COUNT(purchases_2020.purchaseid)
FROM purchases_2020
JOIN categories ON purchases_2020.purchaseid = categories.purchase_id
WHERE (categories.category = 'whole milk' OR categories.category = 'yogurt' OR categories.category = 'domestic eggs') AND (purchases_2020.fulldate BETWEEN '2020-01-01' AND '2020-01-31')
)
WHERE month = 1
UPDATE temp
SET market_share = (
SELECT (SELECT 100 * COUNT(purchases_2020.purchaseid)
FROM purchases_2020
JOIN categories ON purchases_2020.purchaseid = categories.purchase_id
WHERE (categories.category = 'whole milk' OR categories.category = 'yogurt' OR categories.category = 'domestic eggs') AND (purchases_2020.fulldate BETWEEN '2020-01-01' AND '2020-01-31'))
* 1. /
(SELECT COUNT(purchases_2020.purchaseid)
FROM purchases_2020
WHERE purchases_2020.fulldate BETWEEN '2020-01-01' AND '2020-01-31')
)
WHERE month = 1
UPDATE temp
SET year_change = (
SELECT market_share -
(SELECT
(SELECT 100 * COUNT(purchases_2019.purchase_id)
FROM purchases_2019
JOIN categories ON purchases_2019.purchase_id = categories.purchase_id
WHERE (categories.category = 'whole milk' OR categories.category = 'yogurt' OR categories.category = 'domestic eggs') AND (purchases_2019.full_date BETWEEN '2019-01-01' AND '2019-01-31'))
* 1./
(SELECT COUNT(purchases_2019.purchase_id)
FROM purchases_2019
WHERE purchases_2019.full_date BETWEEN '2019-01-01' AND '2019-01-31'))
FROM temp
WHERE month = 1
)
WHERE month = 1
EDIT
I was given the 3 tables represented in the following database schema, and I'm trying to create a table with the total sales of dairy every month, the monthly market share of the dairy products, and the difference between the 2020 monthly market share and the 2019 monthly market share (the year_change column).
There is also an arithmetic error somewhere: when checking the project I get the message "ResultSet does not contain the correct numeric values!" and I'm at my wits' end looking for it, but my priority is to declutter the query.
Your error message tells me that you are trying to run this from a reporting tool or a host language.
It also makes no sense to put the data into separate tables by years.
SQL is a declarative language that works with data as sets.
Instead of pushing the results into table temp, try writing a query like this:
with all_data as (
select p.fulldate, p.purchaseid, c.category,
extract(year from p.fulldate) as year,
extract(month from p.fulldate) as month
from purchases_2020 p
join categories c on c.purchase_id = p.purchaseid
union all
select p.full_date, p.purchase_id, c.category,
extract(year from p.full_date) as year,
extract(month from p.full_date) as month
from purchases_2019 p
join categories c on c.purchase_id = p.purchase_id
), kpis as (
select year, month,
count(purchaseid)
filter (where category in ('whole milk', 'yogurt', 'domestic eggs'))
as dairy_sales,
count(purchaseid) * 1.0 as total_sales
from all_data
group by year, month
)
select ty.month, ty.dairy_sales as total_sales,
100.0 * ty.dairy_sales / ty.total_sales as market_share,
100.0 * ( (ty.dairy_sales / ty.total_sales)
- (ly.dairy_sales / ly.total_sales)) as year_change
from kpis ty
join kpis ly
on (ly.year, ly.month) = (ty.year - 1, ty.month);
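Here is a small, runnable sketch of the same set-based idea: stack both years with UNION ALL, aggregate once per (year, month), then self-join on year - 1. It runs on SQLite via Python's sqlite3 module as a stand-in, so strftime and SUM(CASE ...) replace extract and FILTER, and the sample rows are invented. The column names share_point_change and pct_change are hypothetical; they show two different readings of "year change" side by side, since a percentage-point difference in share and a percent change in sales are not the same number.

```python
import sqlite3

# Hypothetical sample rows; the 2019 table uses purchase_id/full_date
# and the 2020 table uses purchaseid/fulldate, as in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases_2019 (purchase_id INTEGER, full_date TEXT);
CREATE TABLE purchases_2020 (purchaseid INTEGER, fulldate TEXT);
CREATE TABLE categories (purchase_id INTEGER, category TEXT);
INSERT INTO purchases_2019 VALUES
  (1, '2019-01-04'), (2, '2019-01-11'), (3, '2019-01-19'), (4, '2019-01-25');
INSERT INTO purchases_2020 VALUES
  (5, '2020-01-06'), (6, '2020-01-13'), (7, '2020-01-21'), (8, '2020-01-30');
INSERT INTO categories VALUES
  (1, 'whole milk'), (2, 'beef'), (3, 'soda'), (4, 'pork'),
  (5, 'yogurt'), (6, 'domestic eggs'), (7, 'beef'), (8, 'soda');
""")

query = """
WITH all_data AS (
    SELECT 2020 AS year,
           CAST(strftime('%m', p.fulldate) AS INTEGER) AS month,
           c.category
    FROM purchases_2020 AS p
    JOIN categories AS c ON c.purchase_id = p.purchaseid
    UNION ALL
    SELECT 2019,
           CAST(strftime('%m', p.full_date) AS INTEGER),
           c.category
    FROM purchases_2019 AS p
    JOIN categories AS c ON c.purchase_id = p.purchase_id
),
kpis AS (
    SELECT year, month,
           SUM(CASE WHEN category IN ('whole milk', 'yogurt', 'domestic eggs')
                    THEN 1 ELSE 0 END) AS dairy_sales,
           COUNT(*) AS all_sales
    FROM all_data
    GROUP BY year, month
)
SELECT ty.month,
       ty.dairy_sales AS total_sales,
       ROUND(100.0 * ty.dairy_sales / ty.all_sales, 2) AS market_share,
       -- difference between the two market shares, in percentage points
       ROUND(100.0 * ty.dairy_sales / ty.all_sales
             - 100.0 * ly.dairy_sales / ly.all_sales, 2) AS share_point_change,
       -- relative change in the dairy purchase count itself
       ROUND(100.0 * (ty.dairy_sales - ly.dairy_sales) / ly.dairy_sales, 2) AS pct_change
FROM kpis AS ty
JOIN kpis AS ly ON ly.year = ty.year - 1 AND ly.month = ty.month
ORDER BY ty.month
"""
yoy_rows = conn.execute(query).fetchall()
for row in yoy_rows:
    print(row)
```

If the checker expects one of these definitions and the query returns the other, the numbers will not match even though the query is otherwise sound, so it is worth confirming which one the exercise asks for.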

window function count aggregation

I have two quite complex queries here. For a given country (the Netherlands in this case), both should return the monthly usage over a total of 12 months and the percentage of the 12-month total that each month represents.
But the one where I use COUNT as a window function returns one more row than the one where I don't.
The query for the left side picture is:
;WITH MonthUsage AS
(
SELECT customer_id, country_name, [Month], [Year], SUM(ItemsPerMonth)
AS ItemsPerMonth
FROM (
SELECT cs.country_name, c.customer_id, YEAR(mol.date_watched) AS Year, MONTH(mol.date_watched) AS [Month], COUNT(*) AS ItemsPerMonth
FROM Customer c
JOIN Customer_Subscription CS ON C.customer_id = CS.customer_id
JOIN Movie_Order_Line mol ON mol.customer_id = c.customer_id
WHERE (mol.date_watched BETWEEN '2017-07-01' AND '2018-07-01') AND cs.country_name = 'Nederland'
GROUP BY c.customer_id,cs.country_name, YEAR(mol.date_watched),
MONTH(mol.date_watched)
UNION ALL
SELECT cs.country_name, c.customer_id, YEAR(sol.date_watched) AS Year,
MONTH(sol.date_watched), COUNT(*) AS ItemsPerMonth
FROM Customer c
JOIN Customer_Subscription CS ON C.customer_id = CS.customer_id
JOIN Show_Order_Line sol ON sol.customer_id = c.customer_id
WHERE sol.date_watched BETWEEN '2017-07-01' AND '2018-07-01'
GROUP BY c.customer_id, cs.country_name, YEAR(sol.date_watched),
MONTH(sol.date_watched)
) AS MonthItems
WHERE country_name = 'Nederland'
GROUP BY customer_id, country_name, [Month], [Year]
),
Months(MonthNumber) AS
(
SELECT 1
UNION ALL
SELECT MonthNumber + 1
FROM months
WHERE MonthNumber < 12
)
SELECT cmb.[Year], ISNULL(cmb.[Month], m.MonthNumber) AS [Month],
ISNULL(ItemsPerMonth, 0) AS ItemsPerMonth,
ISNULL(FORMAT(((CAST(ItemsPerMonth AS decimal) / CAST((
SELECT SUM(ItemsPerMonth)
FROM MonthUsage) AS decimal
))),'P0'), '0%') AS [PercentageOfTotal]
FROM MonthUsage cmb
JOIN Customer c ON c.customer_id = cmb.customer_id
JOIN Months m ON m.MonthNumber = cmb.[Month]
ORDER BY cmb.[Year] ASC, [Month] ASC
And the query for the right picture is:
;WITH MonthUsage AS (
SELECT *
FROM (
SELECT
YEAR(date_watched) AS [Year],
MONTH(date_watched) AS [Month],
COUNT(order_id) OVER(PARTITION BY CONCAT(YEAR(date_watched), MONTH(date_watched))) AS ItemsPerMonth
FROM Movie_Order_Line mov
JOIN Customer c ON c.customer_id = mov.customer_id
JOIN Customer_Subscription cs ON C.customer_id = cs.customer_id
WHERE date_watched BETWEEN '2017-07-01' AND '2018-07-01' AND cs.country_name = 'Nederland'
UNION ALL
SELECT YEAR(date_watched) AS [Year], MONTH(date_watched) AS [Month], COUNT(order_id) OVER(PARTITION BY CONCAT(YEAR(date_watched), MONTH(date_watched))) AS ItemsPerMonth
FROM Show_Order_Line sol
JOIN Customer c ON c.customer_id = sol.customer_id
JOIN Customer_Subscription cs ON C.customer_id = cs.customer_id
WHERE date_watched BETWEEN '2017-07-01' AND '2018-07-01' AND cs.country_name = 'Nederland'
) AS Combined
GROUP BY [YEAR], [Month], ItemsPerMonth
)
SELECT *, ISNULL(FORMAT(((CAST(ItemsPerMonth AS decimal) / CAST((SELECT
SUM(ItemsPerMonth) FROM MonthUsage) AS decimal))),'P0'), '0%') AS
[PercentageOfTotal]
FROM MonthUsage
ORDER BY [Year] ASC, [Month] ASC
I can't seem to figure out why I'm getting different results. Any help is much appreciated. Thank you in advance for your time.
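One possible source of the mismatch, not verified against the original data, is that the second query computes the window COUNT separately in each UNION ALL branch and then groups by the count value itself. The sketch below reproduces that pattern on invented rows, using SQLite through Python's sqlite3 module as a stand-in for SQL Server; the lower-case table names and the combined year-month key are simplifications.

```python
import sqlite3

# Hypothetical rows: July has 2 movies and 1 show, August has 1 of each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie_order_line (order_id INTEGER, date_watched TEXT);
CREATE TABLE show_order_line (order_id INTEGER, date_watched TEXT);
INSERT INTO movie_order_line VALUES (1, '2017-07-03'), (2, '2017-07-15'), (3, '2017-08-02');
INSERT INTO show_order_line VALUES (10, '2017-07-20'), (11, '2017-08-09');
""")

# Pattern from the second query: window count per branch, then
# GROUP BY month AND the count, which only removes exact duplicates.
windowed_query = """
SELECT month, items
FROM (
    SELECT strftime('%Y-%m', date_watched) AS month,
           COUNT(order_id) OVER (PARTITION BY strftime('%Y-%m', date_watched)) AS items
    FROM movie_order_line
    UNION ALL
    SELECT strftime('%Y-%m', date_watched),
           COUNT(order_id) OVER (PARTITION BY strftime('%Y-%m', date_watched))
    FROM show_order_line
) AS combined
GROUP BY month, items
ORDER BY month, items
"""

# Pattern from the first query: aggregate each branch, then SUM per month.
grouped_query = """
SELECT month, SUM(items) AS items
FROM (
    SELECT strftime('%Y-%m', date_watched) AS month, COUNT(*) AS items
    FROM movie_order_line
    GROUP BY strftime('%Y-%m', date_watched)
    UNION ALL
    SELECT strftime('%Y-%m', date_watched), COUNT(*)
    FROM show_order_line
    GROUP BY strftime('%Y-%m', date_watched)
) AS combined
GROUP BY month
ORDER BY month
"""
windowed_rows = conn.execute(windowed_query).fetchall()
grouped_rows = conn.execute(grouped_query).fetchall()
print(windowed_rows)
print(grouped_rows)
```

In this sample the windowed version yields two rows for July (movie count and show count differ) and a single, undercounted row for August (the two counts happen to be equal and collapse), while the grouped version yields one correct row per month.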

full outer join with group by query

Calculate profit of company by month, profit of company calculated like this:
sum of incoming minus outgoing.
Tables:
incoming(amount, month)
outgoing(amount, month)
Data types:
month integer, ranges 1-12
amount integer
I tried
SELECT month,tsum-bsum FROM
(SELECT month,SUM(amount) tsum FROM incoming GROUP BY month
FULL OUTER JOIN
(SELECT month,SUM(amount) bsum FROM outgoing GROUP BY month)
)
ON incoming.month=outgoing.month;
but month can be NULL after joining, which causes a problem with tsum-bsum.
Use coalesce():
SELECT coalesce(i.month, o.month) as month, i.tsum - o.bsum
FROM (SELECT month, SUM(amount) as tsum FROM incoming GROUP BY month) i
FULL OUTER JOIN
(SELECT month, SUM(amount) as bsum FROM outgoing GROUP BY month) o
ON i.month = o.month;
Note that the difference may be NULL if either value is NULL, so you may want:
SELECT coalesce(i.month, o.month) as month,
coalesce(i.tsum, 0) - coalesce(o.bsum, 0) as diff
FROM (SELECT month, SUM(amount) as tsum FROM incoming GROUP BY month) i
FULL OUTER JOIN
(SELECT month, SUM(amount) as bsum FROM outgoing GROUP BY month) o
ON i.month = o.month;
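Here is a sample query. Note that it joins the raw rows, so it is only correct when each table has at most one row per month; otherwise aggregate each table first, as above.
SELECT COALESCE(i.month, o.month) as month, sum(COALESCE(i.amount,0) - COALESCE(o.amount,0)) as profit
FROM
incoming as i
FULL OUTER JOIN
outgoing as o
ON i.month = o.month
GROUP BY COALESCE(i.month, o.month);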
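As an alternative to FULL OUTER JOIN (a different technique from the answers above, shown only as a sketch), the two tables can be stacked with UNION ALL, with outgoing amounts negated, and summed per month. No NULL handling is needed because a month missing from one table simply contributes no rows. The example runs on SQLite via Python's sqlite3 module with invented rows; the alias names flows and profit are hypothetical.

```python
import sqlite3

# Hypothetical rows: month 2 has no outgoing, month 3 has no incoming.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE incoming (amount INTEGER, month INTEGER);
CREATE TABLE outgoing (amount INTEGER, month INTEGER);
INSERT INTO incoming VALUES (100, 1), (50, 1), (70, 2);
INSERT INTO outgoing VALUES (30, 1), (20, 3);
""")

# Stack both tables, flipping the sign of outgoing amounts, then sum per month.
query = """
SELECT month, SUM(amount) AS profit
FROM (
    SELECT month, amount FROM incoming
    UNION ALL
    SELECT month, -amount FROM outgoing
) AS flows
GROUP BY month
ORDER BY month
"""
profit_rows = conn.execute(query).fetchall()
for row in profit_rows:
    print(row)
```

Month 1 comes out as 100 + 50 - 30, and the months present in only one table still appear, with a positive or negative profit respectively.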

How to return the most ordered item for each month

I am trying to return the most ordered product per month for the year 2007. I would like to see the name of the product, how many of them were ordered that month, and the month. I am using the AdventureWorks2012 database. I have tried a few different ways, but each time multiple products are returned for the same month instead of the one product that had the highest order quantity that month. Sorry if this is not clear. I am trying to test myself, so I make up my own questions and try to answer them. If anyone knows a site that has questions and answers like this so I can verify my results, that would be super helpful! Thanks for any help. Here is the farthest I have been able to get with the query.
WITH Ord2007Sum
AS (SELECT sum(od.orderqty) AS sorder,
od.productid,
oh.orderdate,
od.SalesOrderID
FROM Sales.SalesOrderDetail AS od
INNER JOIN
sales.SalesOrderHeader AS oh
ON od.SalesOrderID = oh.SalesOrderID
WHERE year(oh.OrderDate) = 2007
GROUP BY ProductID, oh.OrderDate, od.SalesOrderID)
SELECT max(sorder),
s.productid,
month(h.orderdate) AS morder --, s.salesorderid
FROM Ord2007Sum AS s
INNER JOIN
sales.SalesOrderheader AS h
ON s.OrderDate = h.OrderDate
GROUP BY s.ProductID, month(h.orderdate)
ORDER BY morder;
Make a CTE that groups our products by month and creates a sum
;WITH OrderRows AS
(
SELECT
od.ProductId,
MONTH(oh.OrderDate) SalesMonth,
SUM(od.orderqty) OVER (PARTITION BY od.ProductId, MONTH(oh.OrderDate) ORDER BY oh.OrderDate) ProdMonthSum
FROM SalesOrderDetail AS od
INNER JOIN SalesOrderHeader AS oh
ON od.SalesOrderID = oh.SalesOrderID
WHERE year(oh.OrderDate) = 2007
),
Make a simple numbers table to break out each month of the year
Months AS
(
SELECT 1 AS MonthNum UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8
UNION SELECT 9 UNION SELECT 10 UNION SELECT 11 UNION SELECT 12
)
We query our months table against the data and select the top product for each month based on the sum
SELECT
m.MonthNum,
d.ProductID,
d.ProdMonthSum
FROM Months m
OUTER APPLY
(
SELECT TOP 1 r.ProductID, r.ProdMonthSum
FROM OrderRows r
WHERE r.SalesMonth = m.MonthNum
ORDER BY ProdMonthSum DESC
) d
Your GROUP BY should not include oh.OrderDate and od.SalesOrderID, because that aggregates your data at the wrong level. You want the ProductID that sold the most per month, so the GROUP BY columns become ProductID, datepart(mm,oh.OrderDate). As Andrew suggested, the Row_Number function is useful in this case, as it lets you create a key that is ordered by month and summed quantity and that resets each month. Finally, the outer query limits the results to the first row (which has the highest quantity) for each month.
WITH Ord2007Sum
AS(
SELECT sum(od.orderqty) AS sorder,
od.productid,
datepart(mm,oh.OrderDate) AS 'Month',
row_number() over (partition by datepart(mm,oh.OrderDate)
Order by datepart(mm,oh.OrderDate)desc, sum(od.orderqty) desc) row
FROM Sales.SalesOrderDetail AS od
INNER JOIN
sales.SalesOrderHeader AS oh
ON od.SalesOrderID = oh.SalesOrderID
WHERE datepart(yyyy,oh.OrderDate) = 2007
GROUP BY ProductID, datepart(mm,oh.OrderDate)
)
SELECT productid,
sorder,
[month]
FROM Ord2007Sum
WHERE row =1
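The aggregate-then-rank approach can be tried on a toy data set. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for SQL Server and two simplified, hypothetical tables order_header and order_detail in place of the AdventureWorks ones; the rows are invented.

```python
import sqlite3

# Hypothetical, simplified stand-ins for SalesOrderHeader / SalesOrderDetail.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_header (sales_order_id INTEGER, order_date TEXT);
CREATE TABLE order_detail (sales_order_id INTEGER, product_id INTEGER, order_qty INTEGER);
INSERT INTO order_header VALUES (1, '2007-01-05'), (2, '2007-01-20'), (3, '2007-02-10');
INSERT INTO order_detail VALUES
  (1, 700, 3), (1, 701, 5), (2, 700, 4), (3, 701, 2), (3, 702, 6);
""")

# Step 1: total quantity per (month, product).
# Step 2: number the products within each month, largest total first.
# Step 3: keep only the first row of each month.
query = """
WITH monthly AS (
    SELECT CAST(strftime('%m', h.order_date) AS INTEGER) AS month,
           d.product_id,
           SUM(d.order_qty) AS total_qty
    FROM order_detail AS d
    JOIN order_header AS h ON h.sales_order_id = d.sales_order_id
    WHERE strftime('%Y', h.order_date) = '2007'
    GROUP BY month, d.product_id
),
ranked AS (
    SELECT month, product_id, total_qty,
           ROW_NUMBER() OVER (PARTITION BY month ORDER BY total_qty DESC) AS rn
    FROM monthly
)
SELECT month, product_id, total_qty
FROM ranked
WHERE rn = 1
ORDER BY month
"""
top_rows = conn.execute(query).fetchall()
for row in top_rows:
    print(row)
```

ROW_NUMBER picks one arbitrary winner when two products tie for a month; RANK() would return all tied products instead, if that is the desired behavior.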