Historic and avg data in sql - sql

I want to produce report data for the scenarios below, using the sample table provided (the real table in my database is very large):
Item sales for the same week in the prior year, and for the same day of the same week in the prior year.
Rolling 6-month average of weekly sales.
id | date_week | date_value | sales
Item1 | 2020/01-04 | 20200120 | 230
Item2 | 2020/06-03 | 20200608 | 23.0
Item3 | 2019/11-03 | 20191111 | null
Item4 | 2020/07-04 | 20200720 | 123
Item5 | 2019/08-01 | 20190729 | 456
Item6 | 2019/09-03 | 20190909 | 1234
Item7 | 2020/06-02 | 20200601 | 4556
Item8 | 2020/09-01 | 20200824 | 23
Item9 | 2021/09-02 | 20210906 | 1223
In the table above, date_week is year/month-week (so I can get the week number from it).
I am trying the query below to achieve this:
SELECT
DATEPART(week, date_value) AS Week,
id ,
sum(sales) AS sales
FROM table
WHERE date_value <= date_value
AND date_value < DATEADD(year, 1, date_value)
GROUP BY DATEPART(week, date_value), id
ORDER BY DATEPART(week, date_value);
Please suggest how I can achieve the scenarios I am looking for.

You can do both with a join. First I would split the date columns into Year | Month | Week | Day. If this is the only date format you have available, you can use the left() and right() functions, although it is always better to store dates in a datetime format.
After that, your query can look like:
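-- date_week is year/month-week, so the week key is parsed from it
-- (right(date_value,2) on a yyyymmdd value would return the day of the month)
SELECT t1.Year, t1.Month, t1.Week, t1.id, t1.sales, t2.sales AS Last_year_this_week_sales
FROM (
SELECT
cast(left(date_week,4) as int) AS Year,
cast(substring(date_week,6,2) as int) AS Month,
cast(right(date_week,2) as int) AS Week, -- week within the month
id,
sum(sales) AS sales
FROM table
GROUP BY left(date_week,4), substring(date_week,6,2),
right(date_week,2), id ) t1
LEFT JOIN (
SELECT
cast(left(date_week,4) as int) AS Year,
cast(substring(date_week,6,2) as int) AS Month,
cast(right(date_week,2) as int) AS Week,
id,
sum(sales) AS sales
FROM table
GROUP BY left(date_week,4), substring(date_week,6,2),
right(date_week,2), id ) t2
ON t1.Week = t2.Week AND t1.Month = t2.Month AND t2.Year = t1.Year - 1 AND t1.id = t2.id;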
I am assuming you want the results at id level. If not, remove id from the grouping and the join. You can do the same for the daily results by replacing week with day.
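A minimal sketch of the same-week, prior-year join, run from Python against an in-memory SQLite database so it is easy to try. The table name weekly_sales and the rows are made up for illustration, and the week key is parsed from date_week (year/month-week) as described in the question. SQLite's substr() stands in for left()/right() here.

```python
import sqlite3

# Hypothetical rows in the question's layout: (id, date_week, date_value, sales).
rows = [
    ("Item1", "2019/01-04", "20190121", 100),
    ("Item1", "2020/01-04", "20200120", 230),
    ("Item2", "2020/06-03", "20200608", 23),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weekly_sales (id TEXT, date_week TEXT, date_value TEXT, sales REAL)"
)
conn.executemany("INSERT INTO weekly_sales VALUES (?, ?, ?, ?)", rows)

# date_week is 'YYYY/MM-WW': split it into year, month and week-of-month,
# total the sales per item and week, then join each week to the same
# month/week of the previous year.
query = """
WITH weekly AS (
    SELECT id,
           CAST(substr(date_week, 1, 4) AS INTEGER) AS yr,
           CAST(substr(date_week, 6, 2) AS INTEGER) AS mon,
           CAST(substr(date_week, 9, 2) AS INTEGER) AS wk,
           SUM(sales) AS sales
    FROM weekly_sales
    GROUP BY id, yr, mon, wk
)
SELECT cur.id, cur.yr, cur.mon, cur.wk, cur.sales, prev.sales AS prior_year_sales
FROM weekly AS cur
LEFT JOIN weekly AS prev
       ON prev.id = cur.id
      AND prev.mon = cur.mon
      AND prev.wk = cur.wk
      AND prev.yr = cur.yr - 1
ORDER BY cur.id, cur.yr
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Rows with no matching week in the prior year keep a NULL (None in Python) in prior_year_sales, which is why a LEFT JOIN is used rather than an inner join.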
For the rolling average you can use a subquery, if what you need is the average of the weekly totals over the last 6 months. Again, if you need it at item level, keep id in your select and group by clauses.
If not, you can do:
SELECT avg(sales) AS Sales_avg
FROM
(SELECT
date_week, -- one group per year/month-week
sum(sales) AS sales
FROM table
WHERE date_value >= '20210101' -- replace with the date six months before your report date
GROUP BY date_week ) t1;
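To make the "average of weekly totals" step concrete, here is a small sketch using Python's sqlite3 module. The table name weekly_sales, the rows, and the fixed cutoff are all assumptions for illustration; in a real report the cutoff would be computed as six months before the report date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weekly_sales (id TEXT, date_week TEXT, date_value TEXT, sales REAL)"
)
conn.executemany(
    "INSERT INTO weekly_sales VALUES (?, ?, ?, ?)",
    [
        # Hypothetical rows: two items sell in the first week, one in the second.
        ("Item1", "2021/01-01", "20210104", 10),
        ("Item2", "2021/01-01", "20210105", 30),
        ("Item1", "2021/01-02", "20210111", 20),
        ("Item1", "2020/05-01", "20200504", 999),  # before the cutoff, ignored
    ],
)

cutoff = "20210101"  # yyyymmdd strings sort in date order, so >= works on text
query = """
SELECT AVG(week_total)
FROM (
    SELECT date_week, SUM(sales) AS week_total
    FROM weekly_sales
    WHERE date_value >= ?
    GROUP BY date_week
)
"""
(avg_weekly_sales,) = conn.execute(query, (cutoff,)).fetchone()
print(avg_weekly_sales)  # weekly totals are 40 and 20
```

The inner query produces one total per week, and the outer AVG then averages those totals rather than the individual rows.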

Related

Retrieve Customers with a Monthly Order Frequency greater than 4

I am trying to rewrite the query below so that it fetches all customers who placed 4 or more orders in each of the past three months.
Customer ID | Feb | Mar | Apr
0001 | 4 | 5 | 6
0002 | 3 | 2 | 4
0003 | 4 | 2 | 3
In the table above, only the customer with Customer ID 0001 should be picked, as they consistently have 4 or more orders in every month.
Below is the query I have written. It pulls all customers with an average purchase frequency of 4 in the last 90 days, but it does not check that there were 4 or more purchases in each of the last three months.
Query:
SELECT distinct lines.customer_id Customer_ID, (COUNT(lines.order_id)/90) PurchaseFrequency
from fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE LOWER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY Customer_ID
HAVING PurchaseFrequency >=4;
I tried to use window functions, but I am not sure whether they are needed in this case.
I would sum the orders per month instead of computing the average, and then retrieve the customers whose monthly count is 4 or more in each of the last three months.
I also think you should select the interval by calendar month, e.g. "month(CURRENT_DATE()) - 3", instead of using a 90-day window. You would need to handle the case where the current date falls in January, February or March, in which case the window reaches back into October, November and December of the previous year.
I'm not familiar with Google BigQuery so I can't write your query, but I hope this helps.
I found a solution to this using a WITH clause, as below:
WITH filtered_orders AS (
select
distinct customer_id ID,
extract(MONTH from date) Order_Month,
count(order_id) CountofOrders
from customer_order_lines lines
where EXTRACT(YEAR FROM date) = 2022 AND EXTRACT(MONTH FROM date) IN (2,3,4)
group by ID, Order_Month
having CountofOrders>=4)
select distinct ID
from filtered_orders
group by ID
having count(Order_Month) =3;
Hope this helps!
Another option is to first count the orders by month and then keep only the customers whose count meets your threshold in every month in which they purchased:
WITH ORDERS_BY_MONTH AS (
SELECT
DATE_TRUNC(lines.date, MONTH) PurchaseMonth,
lines.customer_id Customer_ID,
COUNT(lines.order_id) PurchaseFrequency
FROM fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE LOWER(product.country_code)= "in" -- compare against lowercase, since LOWER() is applied
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY PurchaseMonth, Customer_ID
)
SELECT
Customer_ID,
AVG(PurchaseFrequency) AvgPurchaseFrequency
FROM ORDERS_BY_MONTH
GROUP BY Customer_ID
HAVING COUNT(1) = COUNTIF(PurchaseFrequency >= 4)
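The "4 or more in every month" filter can be checked on the question's example table with a small sketch. This uses Python's sqlite3 rather than BigQuery, and the pre-aggregated table monthly_orders is a hypothetical stand-in for the per-month counts. The year 2022 is taken from the earlier answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE monthly_orders (customer_id TEXT, order_month TEXT, order_count INTEGER)"
)
# Monthly counts from the question's example table (Feb, Mar, Apr).
counts = {
    "0001": [4, 5, 6],
    "0002": [3, 2, 4],
    "0003": [4, 2, 3],
}
months = ["2022-02", "2022-03", "2022-04"]
for customer_id, per_month in counts.items():
    conn.executemany(
        "INSERT INTO monthly_orders VALUES (?, ?, ?)",
        [(customer_id, m, c) for m, c in zip(months, per_month)],
    )

# Keep customers who appear in all three months and whose lowest monthly
# count still meets the threshold.
query = """
SELECT customer_id
FROM monthly_orders
GROUP BY customer_id
HAVING COUNT(*) = 3 AND MIN(order_count) >= 4
ORDER BY customer_id
"""
consistent = [row[0] for row in conn.execute(query)]
print(consistent)  # ['0001']
```

Requiring COUNT(*) = 3 also excludes customers who placed no orders at all in one of the months, since such a month produces no row to test.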

Period over period SQL script?

I have a dataset with 2 columns: 'Date' and 'Total Sales'. My dates are 01-01-2021, 02-01-2021, and so on up to 12-01-2022. I want to add a "previous month" column that gives the total sales for the previous month in the same row as the current month (else null). For example, say I have 2 rows with dates 01-01-2021 and 02-01-2021, and total sales of $10 and $20 respectively. How can I create a column that shows the following:
Date |Sales | Previous Month Sales|
---------------------------------------------
01-01-2021 | $10 | null
02-01-2021 | $20 | $10
So on and so forth; this is my query:
CASE
WHEN `Date` > DATE_SUB(`Date`, INTERVAL 1 MONTH)
THEN `Monthly Sales`
ELSE 'null'
END
Thanks in advance
Domo's back end runs on a MySQL engine (from what I recall from the last time I touched Domo, in 2018).
I think this is just a SQL question, and I wonder if a simple window function would do the trick.
select Date,
Sales,
max (case when *month* = *this month -1* then Sales else null end) over (order by 1) as "Previous Month Sales"
from table
You just need to figure out how to break down the Date into the month based on whatever SQL dialect Domo uses nowadays.
Cheers
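One way to spell out the window-function idea is LAG(), sketched here with Python's sqlite3 (SQLite 3.25 or later). This assumes exactly one row per month with no gaps, ISO-formatted dates, and a hypothetical table named monthly_sales; the third row is made up. LAG() also exists in MySQL 8.0, but whether Domo's dialect supports it is not confirmed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (date TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [("2021-01-01", 10), ("2021-02-01", 20), ("2021-03-01", 35)],
)

# LAG(sales) returns the sales value of the preceding row in date order,
# and NULL for the first row, which has no predecessor.
query = """
SELECT date,
       sales,
       LAG(sales) OVER (ORDER BY date) AS previous_month_sales
FROM monthly_sales
ORDER BY date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

If months can be missing, LAG() would return the previous available row rather than the previous calendar month, and a self-join on the month is the safer choice.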
I think Domo supports a MySQL-like language, so you could do something like this:
with cte as
(
select date,
date + interval 1 month as next_month,
sales
from sales
)
select a.date,
a.sales as current_sales,
b.sales as prior_month_sales
from sales a
left join cte b
on b.next_month = a.date
order by a.date
I do this by joining the table to itself with a LEFT OUTER JOIN. The outer join keeps the row whose previous month is missing, with null values. The join condition matches each row's previous month to the month of the other copy; I compute the previous month with EOMONTH() so that it is always correct across a year boundary (for example, when the current month is January).
IF OBJECT_ID('TEMPDB..#TEMP') IS NOT NULL
DROP TABLE #TEMP
CREATE TABLE #TEMP(
[Date] DATE
,[Sales] INT
)
INSERT INTO #TEMP([Date],[Sales])
VALUES ('2020-12-20',50)
,('2021-01-20',100)
,('2021-02-20',200)
,('2021-03-20',300)
,('2021-04-20',400)
,('2021-05-20',500)
SELECT #TEMP.[Date]
,#TEMP.Sales
,TEMPII.Date [PREV M]
,TEMPII.Sales [PREV M SALES]
FROM #TEMP
LEFT OUTER JOIN #TEMP TEMPII
ON YEAR(EOMONTH(#TEMP.[Date],-1))*100+MONTH(EOMONTH(#TEMP.[Date],-1)) = YEAR(TEMPII.[Date])*100+MONTH(TEMPII.[Date])
ORDER BY #TEMP.[Date]

SQL join for comparing this year vs last year sales by store and by product

I'm trying to compare this year's sales against last year's, by store and by product.
The idea behind my SQL is to create a base table of rolling 24-month data and join it to itself on transaction date minus 1 year. This is somewhat complicated by the fact that my data is aggregated by date, by store and by product.
My code is below. My issue is that when I do a left join, the numbers for this year and last year don't match up. For example, the last-year sales shown for Feb-19 should equal the this-year sales shown for Feb-18, but I am not getting this result.
My guess is that last year has certain stores and products that are not available this year, but I have no idea how to resolve this. I tried a full join, but the numbers are also off. I'd appreciate any feedback!
-- extract sales by day, store, product
select business_date, store_code, product_code,
sum(sales) as sales
into #temp1
from sales_table
where business_date >= date_trunc('month', dateadd('month', -24, sysdate))
group by business_date, store_code, product_code;
-- compare this year against last year
select ty.*, ly.sales as sales_ly
into #temp2
from #temp1 ty left join #temp1 ly
on ty.product_code = ly.product_code
and ty.store_code = ly.store_code
and trunc(dateadd(year, -1, ty.business_date)) = ly.business_date;
-- check
select to_char(business_date, 'yyyymm'), sum(sales) ty, sum(sales_ly) as ly
from #temp2
group by to_char(business_date,'yyyymm')
order by 1;
Your join condition is trunc(dateadd(year, -1, ty.business_date)) = ly.business_date;
This compares the sales of a particular day this year with the same calendar day last year (22/02/2019 with 22/02/2018).
Do you have data for both of those days in your table? Aggregated sample data from your table would help in writing the query.
Query:
with sales_tab as (
select '2019/02/22' as date1, 123 as store, 456 as prod, 20 sales from dual
union
select '2019/02/23' as date1, 123 as store, 456 as prod, 30 as sales from dual
union
select '2019/02/24' as date1, 123 as store, 456 as prod, 40 sales from dual
union
select '2018/02/22' as date1, 123 as store, 456 as prod, 60 sales from dual
union
select '2018/02/23' as date1, 123 as store, 456 as prod, 70 as sales from dual
union
select '2018/02/25' as date1, 123 as store, 456 as prod, 80 sales from dual)
select
t1.date1, t2.date1, t1.store, t1.prod, t1.sales this_year, t2.sales prev_year
from sales_tab t1 left outer join sales_tab t2 on
(t1.store=t2.store
and t1.prod=t2.prod
and cast(substr(t1.date1,1,4) as int)-1=cast(substr(t2.date1, 1,4) as int)
and substr(t1.date1,6,2)=substr(t2.date1,6,2)
and substr(t1.date1,9,2)=substr(t2.date1,9,2)
);
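One possible reason for the mismatched totals can be seen with the sample rows above: a last-year row with no matching day this year never appears in a LEFT JOIN driven from this year's rows. The sketch below reproduces this with Python's sqlite3; it illustrates the effect on the sample data only and is not a diagnosis of the asker's actual table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_tab (date1 TEXT, store INTEGER, prod INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales_tab VALUES (?, ?, ?, ?)",
    [
        ("2019/02/22", 123, 456, 20),
        ("2019/02/23", 123, 456, 30),
        ("2019/02/24", 123, 456, 40),
        ("2018/02/22", 123, 456, 60),
        ("2018/02/23", 123, 456, 70),
        ("2018/02/25", 123, 456, 80),  # no 2019/02/25 row exists
    ],
)

# Last-year total as seen through the LEFT JOIN from this year's rows.
joined_ly = conn.execute("""
    SELECT SUM(ly.sales)
    FROM sales_tab AS ty
    LEFT JOIN sales_tab AS ly
           ON ly.store = ty.store
          AND ly.prod = ty.prod
          AND CAST(substr(ly.date1, 1, 4) AS INTEGER)
              = CAST(substr(ty.date1, 1, 4) AS INTEGER) - 1
          AND substr(ly.date1, 6) = substr(ty.date1, 6)
    WHERE ty.date1 LIKE '2019/%'
""").fetchone()[0]

# Last-year total computed directly from the 2018 rows.
actual_ly = conn.execute(
    "SELECT SUM(sales) FROM sales_tab WHERE date1 LIKE '2018/%'"
).fetchone()[0]

print(joined_ly, actual_ly)  # 130 210
```

The difference of 80 is the 2018/02/25 row, which has no counterpart in 2019. Aggregating both years to the reporting grain (for example month, store and product) before joining avoids losing such rows to day-level mismatches.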

Using Over & Partition in SQL

I have a SQL table with monthly prices for products from different categories (e.g. fruits, vegetables, dairy).
I'd like to calculate a running monthly average for a specific category and for all the products in the same month, in the same query.
So combining:
Select date, avg(price) group by date where category = 'Fruit'
Select date, avg(price) group by date
Is that possible to do using OVER & Partition (or any other way for that matter)
Edit
I am using MS SQL
My data is monthly, so I don't need to extract month-end dates. I can just group on date and I will get month-end data.
As an example, if my table looks like this:
| Date | Item | Category | Price |
| Jan | Banana | Fruit | 10 |
| Jan | Potato | Veg | 20 |
Then the output would be
| Date | Fruit Avg | Overall Avg |
| Jan | 10 | 15 |
Apologies in advance for mangling the tables, but that's for another thread.
Thanks
Why do you need OVER?
Select date, category , avg(price) group by date, category
Try to use
Select extract(year from date)||extract(month from date) as month,
category, avg(price) as avg_price
from sales
group by month, category
P.S. Some database systems do not allow the alias month in the GROUP BY list; in that case, replace month there with extract(year from date)||extract(month from date). The concatenation operator may also differ: replace the pipes ( || ) with a plus sign ( + ) where needed.
If you want one row per month and the data has multiple rows, then you need to aggregate:
select date_trunc('month', date) as month,
avg(price) as avg_per_month,
avg(avg(price)) over (order by date_trunc('month', date))
from t
where category = 'Fruit'
group by date_trunc('month', date)
order by date_trunc('month', date);
If you only have one row per month, then you can do:
select date, price,
avg(price) over (order by date)
from t
where category = 'Fruit'
order by date;
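To get the category average and the overall average in the same row, as in the question's expected output, conditional aggregation is one option. The sketch below runs it with Python's sqlite3 on the question's two example rows; the table name prices is an assumption. The same CASE expression inside AVG should also work in MS SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (date TEXT, item TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [("Jan", "Banana", "Fruit", 10), ("Jan", "Potato", "Veg", 20)],
)

# AVG ignores NULLs, so the CASE without ELSE restricts the first average
# to Fruit rows while the second average covers every row of the month.
query = """
SELECT date,
       AVG(CASE WHEN category = 'Fruit' THEN price END) AS fruit_avg,
       AVG(price) AS overall_avg
FROM prices
GROUP BY date
"""
result = conn.execute(query).fetchall()
print(result)  # [('Jan', 10.0, 15.0)]
```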

How to calculate average value based on duration between measurements?

I have data similar to this:
Price DateChanged Product
10 2012-01-01 A
12 2012-02-01 A
30 2012-03-01 A
10 2012-09-01 A
12 2013-01-01 A
110 2012-01-01 B
112 2012-02-01 B
130 2012-03-01 B
110 2012-09-01 B
112 2013-01-01 B
I want to calculate average value, but the challenge is this:
Looking at the first records: price 10 is valid for one month, price 12 is valid for one month, while price 30 is valid for six months.
So a basic average for product A, (10+12+30+10+12)/5, would be 14.8, while taking duration into account gives an average price of ~20.17.
What is the best approach to solve this?
I know I could create a sub-query with a row_number() to join against to calculate a duration, but is there a better way? SQL Server has powerful features like STDistance, so surely there is a function for this?
What you are looking for is called a weighted average, and AFAIK there is no built-in function in SQL Server that calculates it for you. However, it is not that hard to calculate by hand.
First, you need to find the weight of each data point, in this case, you need to find the duration of each price period. You might have some additional columns in your data that could enable easier lookup, but you could do it like this as well:
SELECT p1.Product, p1.Price, p1.DateChanged AS DateStart,
isnull(min(p2.DateChanged),getdate()) AS DateEnd
INTO #PricePlanStartEnd
FROM PricePlan p1
LEFT OUTER JOIN PricePlan p2
ON p1.DateChanged < p2.DateChanged
AND p1.Product =p2.Product
GROUP BY p1.Product, p1.Price, p1.DateChanged
ORDER BY p1.Product, p1.DateChanged
This creates a #PricePlanStartEnd temporary table that has the start and the end of each price period. I've used getdate() as the end of the current time period. If you need to just calculate an average up to the last price change, just use INNER JOIN instead of the LEFT OUTER JOIN.
After that you just need to divide the sum of (price * period) by the total length of the period, and get the answer.
Here is an SQL Fiddle with the calculation
Also, when you're working with months, remember that not all months are equal in length, so a price in effect for December was active longer than one in effect for February.
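The weighted average described above can be checked by hand with a short Python sketch. It uses product A's rows from the question and weights each price by whole months up to the last price change, as in the question's own example; the helper names are made up for illustration. Weighting by days instead would account for unequal month lengths.

```python
from datetime import date

# Price changes for product A from the question: (price, date it took effect).
changes = [
    (10, date(2012, 1, 1)),
    (12, date(2012, 2, 1)),
    (30, date(2012, 3, 1)),
    (10, date(2012, 9, 1)),
    (12, date(2013, 1, 1)),
]


def months_between(start, end):
    """Whole calendar months from start to end (both on the 1st here)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def weighted_average(changes):
    """Average price weighted by how long each price was in effect.

    The last price has no end date, so it only closes the previous period.
    """
    ordered = sorted(changes, key=lambda c: c[1])
    weighted_sum = 0
    total_months = 0
    for (price, start), (_, end) in zip(ordered, ordered[1:]):
        duration = months_between(start, end)
        weighted_sum += price * duration
        total_months += duration
    return weighted_sum / total_months


avg_a = weighted_average(changes)
print(round(avg_a, 4))  # (10*1 + 12*1 + 30*6 + 10*4) / 12
```

Dividing by the summed durations rather than a fixed number keeps the calculation valid when the price history does not span exactly twelve months.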
Using a CTE and row_number() to get the monthly average up to the last dateChanged (Fiddle demo):
;with cte as (
select product, dateChanged, price,
row_number() over (partition by product order by datechanged) rn
from x
)
select t1.product,
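sum(t1.price *1.0 * datediff(month, t1.dateChanged,t2.dateChanged)) / sum(datediff(month, t1.dateChanged,t2.dateChanged)) monthlyAvg -- divide by the total months covered (12 for this data)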
from cte t1 join cte t2 on t1.product = t2.product
and t1.rn +1 = t2.rn
group by t1.product
--Results
Product MonthlyAvg
A 20.166666
B 120.166666
Or, if you need the daily average up to today, use a LEFT JOIN (Fiddle demo):
;with cte as (
select product, dateChanged, price,
row_number() over (partition by product order by datechanged) rn
from x
)
select t1.product,
sum(t1.price *1.0 *
datediff(day, t1.dateChanged,isnull(t2.dateChanged,getdate())))/365 dailyAvg
from cte t1 left join cte t2 on t1.product = t2.product
and t1.rn +1 = t2.rn
group by t1.product
--Results
product dailyAvg
A 21.386301
B 130.975342