How can I calculate sales on the basis of date comparing the previous, current, and upcoming dates?
order date | total qty
------------------------------
02/01/2021 | 5
02/04/2021 | 10
02/06/2021 | 7
02/08/2021 | 10
02/10/2021 | 2
Your bucket column could be given by:
CONCAT(
DATE_PART('day', AGE('2021-02-01', orderdate))*7+1,
'-',
(DATE_PART('day', AGE('2021-02-01', orderdate))+1)*7,
' days'
)
Your cumulative total by:
SUM(total) OVER(PARTITION BY DATE_PART('day', AGE('2021-02-01', orderdate)) ORDER BY orderdate)
A SUM with an ORDER BY has an implied frame of "RANGE UNBOUNDED PRECEDING", which behaves like "ROWS UNBOUNDED PRECEDING" as long as the ORDER BY values are unique.
I presume you're starting your report somewhere (e.g. your front end does WHERE orderdate > x), so it can supply the min date for the functions too. If it doesn't, you might benefit from a CTE that fetches the min orderdate.
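To illustrate the implied frame, here is a minimal sketch that runs the question's sample rows through an in-memory SQLite database from Python. SQLite (3.25 or later) is only a stand-in for the real database, and the table name orders and column names orderdate/total are assumptions taken from the answer's snippets.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderdate TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2021-02-01", 5), ("2021-02-04", 10), ("2021-02-06", 7),
     ("2021-02-08", 10), ("2021-02-10", 2)],
)

# No explicit frame is given: with ORDER BY, the window runs from the
# first row up to the current row, which yields a running total.
running = conn.execute(
    """
    SELECT orderdate, SUM(total) OVER (ORDER BY orderdate) AS cumulative_total
    FROM orders
    ORDER BY orderdate
    """
).fetchall()

for row in running:
    print(row)
```

The last row carries the grand total of all five orders, because its frame covers every earlier row plus itself.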
I have a dataset in bigquery which contains order_date: DATE and customer_id.
order_date | CustomerID
2019-01-01 | 111
2019-02-01 | 112
2020-01-01 | 111
2020-02-01 | 113
2021-01-01 | 115
2021-02-01 | 119
I am trying to count distinct customer_id values between a month of the previous year and the same month of the current year. For example, from 2019-01-01 to 2020-01-01, then from 2019-02-01 to 2020-02-01, and then count who did not buy in the same period of the next year: 2020-01-01 to 2021-01-01, then 2020-02-01 to 2021-02-01.
The output I expect:
order_date| count distinct CustomerID|who not buy in the next period
2020-01-01| 5191 |250
2020-02-01| 4859 |500
2020-03-01| 3567 |349
..........| .... |......
The next periods shouldn't include the previous ones.
I tried the code below, but it does not work the way I need:
with customers as (
select distinct date_trunc(date(order_date),month) as dates,
CUSTOMER_WID
from t
where date(order_date) between '2018-01-01' and current_date()-1
)
select
dates,
customers_previous,
customers_next_period
from
(
select dates,
count(CUSTOMER_WID) as customers_previous,
count(case when customer_wid_next is null then 1 end) as customers_next_period,
from (
select prev.dates,
prev.CUSTOMER_WID,
next.dates as next_dates,
next.CUSTOMER_WID as customer_wid_next
from customers as prev
left join customers
as next on next.dates=date_add(prev.dates,interval 1 year)
and prev.CUSTOMER_WID=next.CUSTOMER_WID
) as t2
group by dates
)
order by 1,2
Thanks in advance.
If I understand correctly, you are trying to count values over a window of time, and for that I recommend using window functions (the BigQuery docs cover them, and there are good articles explaining how they work).
That said, my recommendation would be:
SELECT DISTINCT
periods,
COUNT(DISTINCT customer_id) OVER last_12mos AS count_customers_last_12_mos
FROM (
SELECT
order_date,
FORMAT_DATE('%Y%m', order_date) AS periods,
customer_id
FROM dataset
)
WINDOW last_12mos AS ( # window of last 12 months without current month
PARTITION BY periods ORDER BY periods DESC
ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING
)
# caveat: BigQuery may reject ORDER BY and a frame when DISTINCT is used in a window function, so treat this as a starting point
I believe from this you can build some customizations to improve the aggregations you want.
You can generate the periods using unnest(generate_date_array()). Then use joins to bring in the customers from the previous 12 months and the next 12 months. Finally, aggregate and count the customers:
select period,
count(distinct c_prev.customer_wid),
count(distinct c_next.customer_wid)
from unnest(generate_date_array(date '2020-01-01', date '2021-01-01', interval 1 month)) period join
customers c_prev
on c_prev.order_date <= period and
c_prev.order_date > date_add(period, interval -12 month) left join
customers c_next
on c_next.customer_wid = c_prev.customer_wid and
c_next.order_date > period and
c_next.order_date <= date_add(period, interval 12 month)
group by period;
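The same period-plus-joins idea can be tried on a small scale. The sketch below is an illustration only: it uses in-memory SQLite from Python instead of BigQuery, a recursive CTE in place of generate_date_array, and SQLite's date() modifiers in place of date_add. The rows are the question's sample data plus one made-up order (customer 111 on 2020-06-01) so that at least one customer returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (order_date TEXT, customer_wid INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("2019-01-01", 111), ("2019-02-01", 112), ("2020-01-01", 111),
     ("2020-02-01", 113), ("2021-01-01", 115), ("2021-02-01", 119),
     ("2020-06-01", 111)],  # hypothetical extra row, not in the question
)

rows = conn.execute(
    """
    WITH RECURSIVE periods(period) AS (
        SELECT '2020-01-01'
        UNION ALL
        SELECT date(period, '+1 month') FROM periods WHERE period < '2020-02-01'
    )
    SELECT p.period,
           COUNT(DISTINCT c_prev.customer_wid) AS customers_prev_12m,
           COUNT(DISTINCT c_next.customer_wid) AS customers_returning
    FROM periods p
    JOIN customers c_prev
      ON c_prev.order_date <= p.period
     AND c_prev.order_date > date(p.period, '-12 months')
    LEFT JOIN customers c_next
      ON c_next.customer_wid = c_prev.customer_wid
     AND c_next.order_date > p.period
     AND c_next.order_date <= date(p.period, '+12 months')
    GROUP BY p.period
    ORDER BY p.period
    """
).fetchall()

# Customers who did not buy in the next period = previous minus returning.
for period, prev_count, returning in rows:
    print(period, prev_count, prev_count - returning)
```

Subtracting the two counts gives the "who did not buy in the next period" column from the question.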
My table is currently looking like this:
+---------+---------------+------------+------------------+
| Segment | Product | Pre_Date | ON_Prepaid |
+---------+---------------+------------+------------------+
| RB | 01. Auto Loan | 2020-01-01 | 10645976180.0000 |
| RB | 01. Auto Loan | 2020-01-02 | 4489547174.0000 |
| RB | 01. Auto Loan | 2020-01-03 | 1853117000.0000 |
| RB | 01. Auto Loan | 2020-01-04 | 9350258448.0000 |
+---------+---------------+------------+------------------+
I'm trying to sum values of 'ON_Prepaid' over the course of 7 days, let's say from '2020-01-01' to '2020-01-07'.
Here is what I've tried
drop table if exists ##Prepay_summary_cash
select *,
[1W_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 1 following and 7 following),
[2W_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 8 following and 14 following),
[3W_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 15 following and 21 following),
[1M_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 22 following and 30 following),
[1.5M_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 31 following and 45 following),
[2M_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 46 following and 60 following),
[3M_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 61 following and 90 following),
[6M_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 91 following and 181 following)
into ##Prepay_summary_cash
from ##Prepay1
Things should be fine if the dates are continuous; however, there are some missing days in 'Pre_Date' (you know banks don't work on Sundays, etc.).
So I'm trying to work on something like
[1W] = SUM(ON_Prepaid) over (where Pre_date between dateadd(d,1,Pre_date) and dateadd(d,7,Pre_date))
Something like that. So if, say, there's no record on 2020-01-05, the result should only sum the 1st, 2nd, 3rd, 4th, 6th and 7th of Jan 2020, instead of 1, 2, 3, 4, 6, 7, 8 (8 because of "rows 7 following"). Or, for example, if records are missing over a span of 30 days, then all those 30 days should be summed as 0s, so 45 days should return only the value of 15 days.
I've looked all over the forum and the answers did not suffice. Can you please help me out, or link me to a thread where the problem has already been solved?
Thank you so much.
"Things should be fine if the dates are continuous"
Then make them continuous. Left join your real data (grouped so that it is one row per day) onto a calendar table (make one, or use a recursive CTE to generate a list of 360 dates from X onwards) and your query will work:
WITH d as
(
SELECT *
FROM
(
SELECT *
FROM cal
CROSS JOIN
(SELECT DISTINCT segment s, product p FROM ##Prepay1) x
) c
LEFT JOIN ##Prepay1 p
ON
c.d = p.pre_date AND
c.s = p.segment AND
c.p = p.product
WHERE
c.d BETWEEN '2020-01-01' AND '2021-01-01' -- date range on c.d not p.pre_date
)
--use d.d/s/p not d.pre_date/segment/product in your query (sometimes the latter are null)
select *,
[1W_Prepaid] = sum(ON_Prepaid) over (partition by s, p order by d.d rows between 1 following and 7 following),
...
CAL is just a table with a single column of dates (d), one per day, with no time component, extending n thousand days into the past and future.
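A small sketch of the calendar idea, run through in-memory SQLite from Python rather than SQL Server. Everything here is simplified on purpose: a single series (no segment/product), a made-up table prepay with small made-up amounts, a recursive CTE instead of a real calendar table, and a 3-day window instead of 7. 2020-01-05 is deliberately missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prepay (pre_date TEXT, on_prepaid INTEGER)")
conn.executemany(
    "INSERT INTO prepay VALUES (?, ?)",
    [("2020-01-01", 10), ("2020-01-02", 20), ("2020-01-03", 30),
     ("2020-01-04", 40), ("2020-01-06", 60), ("2020-01-07", 70),
     ("2020-01-08", 80)],  # hypothetical amounts; no row for 2020-01-05
)

# Naive version: ROWS counts rows, so the gap pulls in an extra day.
naive = dict(conn.execute(
    """
    SELECT pre_date,
           SUM(on_prepaid) OVER (ORDER BY pre_date
                                 ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING)
    FROM prepay
    """
).fetchall())

# Calendar version: one row per day, missing days contribute 0.
with_calendar = dict(conn.execute(
    """
    WITH RECURSIVE cal(d) AS (
        SELECT '2020-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM cal WHERE d < '2020-01-08'
    )
    SELECT cal.d,
           SUM(COALESCE(p.on_prepaid, 0)) OVER (
               ORDER BY cal.d ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING)
    FROM cal
    LEFT JOIN prepay p ON p.pre_date = cal.d
    """
).fetchall())

print(naive["2020-01-02"], with_calendar["2020-01-02"])
```

For 2020-01-02 the naive query sums the 3rd, 4th and 6th, while the calendar version sums the 3rd, 4th and the (empty) 5th.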
Note that months have a variable number of days, so 6M is a bit of a misnomer. It might be better to call the month ones 180D, 90D, etc.
Also note that your query divides your data into groups per row. If you want to sum up to 180 days after the date of the row, you need to pull a year's worth of data, so that on row 180 (June) you have the December data available to sum (December being 6 months from June).
If you then want to restrict your query to only show rows up to June (but including data summed from the 6 months after June), you need to wrap it all again in a subquery. You cannot "where between jan and jun" in the query that does the SUM ... OVER, because WHERE clauses are evaluated before window functions (doing so would remove the December data before it is summed).
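The effect of filtering before versus after the window function can be seen in a tiny sketch. It uses in-memory SQLite from Python as a stand-in for SQL Server, with a made-up table t of six consecutive days and a "next 2 rows" sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (d TEXT, v INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("2020-01-01", 10), ("2020-01-02", 20), ("2020-01-03", 30),
     ("2020-01-04", 40), ("2020-01-05", 50), ("2020-01-06", 60)],
)

# WHERE in the same query: later rows are removed before the window sum runs.
filtered_inside = conn.execute(
    """
    SELECT d, SUM(v) OVER (ORDER BY d ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING)
    FROM t
    WHERE d <= '2020-01-03'
    ORDER BY d
    """
).fetchall()

# WHERE in an outer query: the sums are computed over all rows first.
filtered_outside = conn.execute(
    """
    SELECT d, s
    FROM (
        SELECT d,
               SUM(v) OVER (ORDER BY d ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING) AS s
        FROM t
    ) AS w
    WHERE d <= '2020-01-03'
    ORDER BY d
    """
).fetchall()

print(filtered_inside)
print(filtered_outside)
```

Only the second form lets the row for 2020-01-03 see the values of the 4th and 5th.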
Some other databases make this easier; Oracle and Postgres spring to mind. They can perform a sum over a range where the other rows' values are within some distance of the current row's value. SQL Server only usefully supports distancing based on a row's index rather than its values (the distancing-based-on-values support is limited to "rows that have the same value", rather than "rows that have values n higher or lower than the current row"). I suppose the requirement could be met with a cross apply, or a correlated subquery in the select, though I'd be careful to check the performance:
SELECT *,
(SELECT SUM(tt.a) FROM x tt WHERE t.x = tt.x AND tt.y = t.y AND tt.z BETWEEN DATEADD(d, 1, t.z) AND DATEADD(d, 7, t.z)) AS [1W]
FROM
x t
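Here is a rough, runnable version of the correlated-subquery idea, using in-memory SQLite from Python in place of SQL Server (so date(x, '+N days') replaces DATEADD). The table prepay, its small amounts, and the 3-day window are made up for illustration; 2020-01-05 is deliberately missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prepay (pre_date TEXT, on_prepaid INTEGER)")
conn.executemany(
    "INSERT INTO prepay VALUES (?, ?)",
    [("2020-01-01", 10), ("2020-01-02", 20), ("2020-01-03", 30),
     ("2020-01-04", 40), ("2020-01-06", 60), ("2020-01-07", 70),
     ("2020-01-08", 80)],  # hypothetical amounts; no row for 2020-01-05
)

# The subquery selects by date value, so a missing day simply adds nothing.
sums = dict(conn.execute(
    """
    SELECT t.pre_date,
           (SELECT SUM(tt.on_prepaid)
            FROM prepay tt
            WHERE tt.pre_date BETWEEN date(t.pre_date, '+1 day')
                                  AND date(t.pre_date, '+3 days')) AS next_3d
    FROM prepay t
    ORDER BY t.pre_date
    """
).fetchall())

for day, total in sums.items():
    print(day, total)
```

No calendar table is needed here, at the cost of one subquery execution per row, which is why checking performance on real volumes matters.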
I have a table like this, with column names as Date of Sale and insurance Salesman Names -
Date of Sale | Salesman Name | Sale Amount
2021-03-01 | Jack | 40
2021-03-02 | Mark | 60
2021-03-03 | Sam | 30
2021-03-03 | Mark | 70
2021-03-02 | Sam | 100
I want to do a group by, using the date of sale. The next column should display the cumulative count of the sellers who have made the sale till that date. But same sellers shouldn't be considered again.
For example,
The following table is incorrect,
Date of Sale | Count(Salesman Name) | Sum(Sale Amount)
2021-03-01 | 1 | 40
2021-03-02 | 3 | 200
2021-03-03 | 5 | 300
The following table is correct,
Date of Sale | Count(Salesman Name) | Sum(Sale Amount)
2021-03-01 | 1 | 40
2021-03-02 | 3 | 200
2021-03-03 | 3 | 300
I am not sure how to frame the SQL query, because there are two conditions involved here: a cumulative count, while ignoring duplicates. I think the OVER clause along with ROWS UNBOUNDED PRECEDING may be of some use here? I would appreciate your help.
Edit - I have added the Sale Amount as a column. I need the cumulative sum for the Sales Amount also. But in this case , all the sale amounts should be considered unlike the salesman name case where only unique names were being considered.
One approach uses a self join and aggregation:
WITH cte AS (
SELECT t1.SaleDate,
COUNT(CASE WHEN t2.Salesman IS NULL THEN 1 END) AS cnt,
SUM(t1.SaleAmount) AS amt
FROM yourTable t1
LEFT JOIN yourTable t2
ON t2.Salesman = t1.Salesman AND
t2.SaleDate < t1.SaleDate
GROUP BY t1.SaleDate
)
SELECT
SaleDate,
SUM(cnt) OVER (ORDER BY SaleDate) AS NumSalesman,
SUM(amt) OVER (ORDER BY SaleDate) AS TotalAmount
FROM cte
ORDER BY SaleDate;
The logic in the CTE is that we try to find, for each salesman, an earlier record for the same salesman. If we can't find such a record, then we assume the record in question is the first appearance. Then we aggregate by date to get the counts per day, and finally take a rolling sum of counts in the outer query.
The best way to do this uses window functions to determine the first time a sales person appears. Then, you just want cumulative sums:
select saledate,
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by saledate) as num_salespersons,
sum(sum(sales)) over (order by saledate) as running_sales
from (select t.*,
row_number() over (partition by salesperson order by saledate) as seqnum
from t
) t
group by saledate
order by saledate;
Note that, in addition to being more concise, this should have much, much better performance than a solution that uses a self-join.
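As a quick check of the first-appearance approach, the sketch below runs it over the question's five sample rows in an in-memory SQLite database from Python. SQLite is only a convenient stand-in, and the table name sales and column names saledate/salesperson/amount are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (saledate TEXT, salesperson TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("2021-03-01", "Jack", 40), ("2021-03-02", "Mark", 60),
     ("2021-03-03", "Sam", 30), ("2021-03-03", "Mark", 70),
     ("2021-03-02", "Sam", 100)],
)

result = conn.execute(
    """
    SELECT saledate,
           -- inner SUM: new salespersons per day; outer SUM: running total
           SUM(SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END))
               OVER (ORDER BY saledate) AS num_salespersons,
           SUM(SUM(amount)) OVER (ORDER BY saledate) AS running_sales
    FROM (
        SELECT s.*,
               ROW_NUMBER() OVER (PARTITION BY salesperson
                                  ORDER BY saledate) AS seqnum
        FROM sales s
    ) AS t
    GROUP BY saledate
    ORDER BY saledate
    """
).fetchall()

for row in result:
    print(row)
```

The output matches the "correct" table from the question: the salesperson count stays at 3 on the last day while the sales total keeps growing.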
I need the variance for the last two months, and I am using the query below:
with Positions as
(
select
COUNT(DISTINCT A_SALE||B_SALE) As SALES,
TO_CHAR(DATE,'YYYY-MON') As Period
from ORDERS
where DATE between date '2020-02-01' and date '2020-02-29'
group by TO_CHAR(DATE,'YYYY-MON')
union all
select
COUNT(DISTINCT A_SALE||B_SALE) As SALES,
TO_CHAR(DATE,'YYYY-MON') As Period
from ORDERS
where DATE between date '2020-03-01' and date '2020-03-31'
group by TO_CHAR(DATE,'YYYY-MON')
)
select
SALES,
period,
case when to_char(round((SALES-lag(SALES,1, SALES) over (order by period desc))/ SALES*100,2), 'FM999999990D9999') <0
then to_char(round(abs( SALES-lag(SALES,1, SALES) over (order by period desc))/ SALES*100,2),'FM999999990D9999')||'%'||' (Increase) '
when to_char(round((SALES-lag(SALES,1,SALES) over (order by period desc))/SALES*100,2),'FM999999990D9999')>0
then to_char(round(abs(SALES-lag(SALES,1, SALES) over (order by period desc ))/SALES*100,2),'FM999999990D9999')||'%'||' (Decrease) '
END as variances
from Positions
order by Period;
I am getting output like this:
SALES | Period | variances
---------|------------------|--------------------
100 | 2020-FEB | 100%(Increase)
200 | 2020-MAR | NULL
I want the records to look like this, with the variance shown next to March instead of February, since we are looking at the variance for the latest month:
SALES | Period | variances
---------|------------------|--------------------
200 | 2020-MAR | 100%(Increase)
100 | 2020-FEB | NULL
I did not analyze the query in too much detail but you have one obvious flaw.
You change your period from a date to char.
That means when you apply your window functions your ordering will not work as expected.
A date ordered desc will look like (based on chronological ordering)
MAR - 2020
FEB - 2020
JAN - 2020
Text ordered desc will look like (based on alphabetical ordering)
MAR - 2020
JAN - 2020
FEB - 2020
That being said, you are comparing a 'good' case (FEB + MAR) where both the text ordering and date ordering will work the same way.
The implied ordering is ASCENDING. So at the end when you do
order by Period;
it will display February first and then March. If you do
order by Period DESC;
you will get March displayed first.
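The difference between text ordering and date ordering is easy to reproduce outside the database. This is a plain Python sketch, not Oracle: the labels mimic the 'YYYY-MON' format from the query, and the MONTHS lookup is a helper made up for this illustration.

```python
# Period labels in the same 'YYYY-MON' shape that TO_CHAR produces in the query.
labels = ["2020-JAN", "2020-FEB", "2020-MAR"]

# Hypothetical helper mapping, only covering the months used here.
MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3}


def as_date_key(label):
    """Turn '2020-MAR' into (2020, 3) so it sorts chronologically."""
    year, mon = label.split("-")
    return (int(year), MONTHS[mon])


# Sorting the strings themselves is alphabetical: M > J > F.
text_desc = sorted(labels, reverse=True)

# Sorting on the underlying date is chronological.
date_desc = sorted(labels, key=as_date_key, reverse=True)

print(text_desc)
print(date_desc)
```

With only FEB and MAR in play both orderings agree, which is why the problem does not show up in the two-month case.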
I've been working to build a distinct count and sum total of sales based on orders placed 4 to 180 days back for each day in the data table starting at Orders placed on day 181, then grouped by Month & Year, but have been unable to do it.
The end result would look something like the table below. Each order would show up multiple times, up to 176 times, but would be distinct for the given day (order 42999, placed on 10-01-2011 for example would be counted once on every day between 10-05-2011 and 2-01-2012 for example)
| OrdMonthYr | Grouped Order Count | Sum of Orders |
------------------------------------------------------
| 2011-06 | 140 | $450 |
| 2011-07 | 190 | $500 |
| 2011-08 | 250 | $600 |
------------------------------------------------------
The order count would take the total count of sales for a given day executed 4 to 180 days prior to that day (so March 1st, 2011 would have a distinct order count and order sum for orders placed between Nov 1st, 2010 and Feb 25th, 2011 as an example) followed by a function aggregating each of those totals up to month & year per the table above.
As I understand it, you want a cumulative sum and count over the previous 4 to 180 days, but it's not clear how it should be rolled up.
If so, you can use analytic functions. The next query calculates it:
select trunc(o.orderdate)
,count(*) over (order by trunc(o.orderdate) range between 180 PRECEDING AND 4 PRECEDING )
,sum(amount) over (order by trunc(o.orderdate) range between 180 PRECEDING AND 4 PRECEDING)
from orders o
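To see what a value-based RANGE frame does, here is a sketch using in-memory SQLite from Python rather than Oracle. It assumes SQLite 3.28 or later (needed for RANGE with numeric offsets) and orders by julianday(orderdate) so that the offsets are in days. The four orders are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderdate TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2011-01-01", 100), ("2011-01-03", 50),
     ("2011-01-06", 25), ("2011-07-04", 10)],  # hypothetical orders
)

# RANGE looks at the ordering value (days), not at row positions:
# each row sees orders placed 4 to 180 days before it.
windowed = conn.execute(
    """
    SELECT orderdate,
           COUNT(*) OVER w AS cnt,
           SUM(amount) OVER w AS sum_amount
    FROM orders
    WINDOW w AS (ORDER BY julianday(orderdate)
                 RANGE BETWEEN 180 PRECEDING AND 4 PRECEDING)
    ORDER BY orderdate
    """
).fetchall()

for row in windowed:
    print(row)
```

The order on 2011-01-03 is only three days before 2011-01-06, so it is left out of that row's window, while the order on 2011-01-01 is included.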
As for rolling up orders to the month: maybe you need to take the first of every month and get the count and sum for it. If so, you can just take one row for each month from the previous query:
select ord_date, cnt,sum_amount FRoM (
select trunc(o.orderdate) as ord_date
,count(*) over (order by trunc(o.orderdate) range between 180 PRECEDING AND 4 PRECEDING ) as cnt
,sum(amount) over (order by trunc(o.orderdate) range between 180 PRECEDING AND 4 PRECEDING) as sum_amount
,row_number() over (partition by trunc(o.orderdate) order by rowid) as RN
from orders o)
WHERE rn = 1
and ord_date = trunc(ord_date,'MM')
Does this get at what you want?
select d.dte as orderdate,
(select count(*)
from orders o
where o.orderdate between d.dte - 180 and d.dte - 4
) as cnt,
(select sum(amount)
from orders o
where o.orderdate between d.dte - 180 and d.dte - 4
) as amount
from (select distinct orderdate as dte from orders) d;