How to get total amount per previous weeks - sql

I have this table for example:
Date                | amount
--------------------+-------
2021-02-16T21:06:38 | 10
2021-02-16T21:07:01 | 5
2021-02-17T01:10:12 | -1
2021-02-19T12:00:00 | 3
2021-02-24T12:00:00 | 20
2021-02-25T12:00:00 | -1
I want the total amount of all previous weeks, per week. So the result in this case would be:
Date       | amount
-----------+-------
2021-02-15 | 0
2021-02-22 | 17
2021-03-01 | 36
Note: The dates are now the start of each week (Monday).
Any help with this would be greatly appreciated.

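Try this:
select week_date, sum(amount) over (order by week_date) as running_amount
from (
  -- date_trunc('week', ...) returns the Monday of the week; + 7 gives the start of the next week
  select date_trunc('week', date_)::date + 7 as "week_date",
         sum(amount) as "amount"
  from example
  group by 1
) t
order by week_date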
The query above only covers weeks that contain transaction records. If you want to cover all the missing weeks as well, try the query below:
with cte as (
  select date_trunc('week', date_)::date + 7 as "week_date",
         sum(amount) as "amount"
  from example
  group by 1
)
select t1."Date",
       coalesce(sum(cte.amount) over (order by t1."Date"), 0) as running_amount
from cte
right join (
  select generate_series(min(week_date) - interval '1 week', max(week_date), interval '1 week') as "Date"
  from cte
) t1 on cte.week_date = t1."Date"
order by t1."Date"
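A week-bucketing expression of the form date(date_) + cast(abs(extract(dow FROM date_) - 7) + 1 as int) is meant to map each timestamp to the Monday that starts the following week. The small Python check below is a sketch that mirrors PostgreSQL's dow numbering (Sunday = 0, an assumption about the target database) with the standard datetime module. It suggests that this formula lands one week too late for Sundays, whereas truncating to the Monday of the ISO week (what date_trunc('week', ...) returns in PostgreSQL) and adding 7 days does not. The helper names are hypothetical.

```python
from datetime import date, timedelta


def pg_dow(d: date) -> int:
    # PostgreSQL's extract(dow ...) numbers Sunday as 0 through Saturday as 6.
    return d.isoweekday() % 7


def dow_formula_bucket(d: date) -> date:
    # Mirrors: date(date_) + cast(abs(extract(dow FROM date_) - 7) + 1 as int)
    return d + timedelta(days=abs(pg_dow(d) - 7) + 1)


def next_week_start(d: date) -> date:
    # Monday of d's ISO week (what date_trunc('week', ...) yields), plus 7 days.
    return d - timedelta(days=d.weekday()) + timedelta(days=7)


# One full week, Monday 2021-02-15 through Sunday 2021-02-21.
for offset in range(7):
    d = date(2021, 2, 15) + timedelta(days=offset)
    print(d.strftime("%a"), d, dow_formula_bucket(d), next_week_start(d))
```

Monday through Saturday both approaches agree on 2021-02-22; only the Sunday row differs.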

Use generate_series() to generate the dates you want. Then use a left join to bring in the data, and aggregate with a cumulative sum:
select gs.week,
coalesce(sum(e.amount), 0) as week_amount,
sum(coalesce(sum(e.amount), 0)) over (order by gs.week) as running_amount
from generate_series('2021-02-15'::date, '2021-03-01'::date, interval '1 week') gs(week) left join
example e
on e.date < gs.week and
e.date >= gs.week - interval '1 week'
group by gs.week
order by gs.week;
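If you want to try the "generate every week, then cumulate" idea without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (for window functions) and a hypothetical table example(dt, amount). SQLite has no built-in generate_series, so a recursive CTE generates the weeks instead, and a ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING frame sums only the weeks before each row.

```python
import sqlite3

# Hypothetical schema: example(dt TEXT, amount INTEGER), dt as ISO-8601 text.
QUERY = """
WITH RECURSIVE
weekly AS (
    -- date(x, '-6 days', 'weekday 1') gives the Monday of x's week
    SELECT date(dt, '-6 days', 'weekday 1') AS week, SUM(amount) AS amount
    FROM example
    GROUP BY 1
),
bounds AS (
    -- first week with data, and the week after the last week with data
    SELECT MIN(week) AS first_week, date(MAX(week), '+7 days') AS last_week
    FROM weekly
),
weeks(week) AS (
    -- stands in for generate_series(): one row per Monday
    SELECT first_week FROM bounds
    UNION ALL
    SELECT date(week, '+7 days') FROM weeks, bounds WHERE week < last_week
)
SELECT w.week,
       -- sum of all earlier weeks only; NULL for the first week, hence COALESCE
       COALESCE(SUM(weekly.amount) OVER (
           ORDER BY w.week
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS previous_total
FROM weeks AS w
LEFT JOIN weekly ON weekly.week = w.week
ORDER BY w.week
"""


def previous_weeks_totals(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE example (dt TEXT, amount INTEGER)")
        con.executemany("INSERT INTO example VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


# Sample rows from the question.
rows = [
    ("2021-02-16T21:06:38", 10),
    ("2021-02-16T21:07:01", 5),
    ("2021-02-17T01:10:12", -1),
    ("2021-02-19T12:00:00", 3),
    ("2021-02-24T12:00:00", 20),
    ("2021-02-25T12:00:00", -1),
]
print(previous_weeks_totals(rows))
```

For the question's sample data this should reproduce the expected result: 0 for 2021-02-15, 17 for 2021-02-22 and 36 for 2021-03-01.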


Taking Count Based On Year and Month from Date Columns

I want to take a count based on a from date and a to date. Using the from and to dates, I am trying to get the year and month, and then count per month and year. Can someone suggest how I can implement this?
Database: Snowflake
You want to do more or less what the solution to this other question does,
but here, let me do all the work for you:
WITH data_table(start_date, end_date) as (
SELECT * from values
('2022-01-15'::date, '2022-02-12'::date),
('2021-12-25'::date, '2022-03-18'::date),
('2022-02-25'::date, '2022-03-06'::date),
('2021-10-20'::date, '2022-01-07'::date)
), large_range as (
SELECT row_number() over (order by null)-1 as rn
FROM table(generator(ROWCOUNT => 1000))
), pre_condition as (
SELECT
date_trunc('month', start_date) as month_start
,datediff('month', month_start, date_trunc('month', end_date)) as m
FROM data_table
)
SELECT
to_char(dateadd('month', r.rn, d.month_start),'MON-YY') as month_yr
,count(*) as count
FROM pre_condition as d
JOIN large_range as r ON r.rn <= d.m
GROUP BY 1;
MONTH_YR | COUNT
---------+------
Jan-22   | 3
Dec-21   | 2
Feb-22   | 3
Oct-21   | 1
Nov-21   | 1
Mar-22   | 2
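The same "expand each range into one row per month, then count" idea can be sketched outside Snowflake. The version below uses Python's sqlite3 module and swaps the generator()-based number table for a recursive CTE, since SQLite has no generator(). The table and column names follow the answer; the YYYY-MM labels (instead of MON-YY) are a simplification for this sketch.

```python
import sqlite3

QUERY = """
WITH RECURSIVE months(month_start, end_month) AS (
    -- one row per input range, starting at the first day of its start month
    SELECT date(start_date, 'start of month'), date(end_date, 'start of month')
    FROM data_table
    UNION ALL
    -- step forward one month at a time until the end month is reached
    SELECT date(month_start, '+1 month'), end_month
    FROM months
    WHERE month_start < end_month
)
SELECT strftime('%Y-%m', month_start) AS month_yr, COUNT(*) AS cnt
FROM months
GROUP BY month_yr
ORDER BY month_yr
"""


def count_per_month(ranges):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE data_table (start_date TEXT, end_date TEXT)")
        con.executemany("INSERT INTO data_table VALUES (?, ?)", ranges)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


# The four sample ranges from the Snowflake answer.
ranges = [
    ("2022-01-15", "2022-02-12"),
    ("2021-12-25", "2022-03-18"),
    ("2022-02-25", "2022-03-06"),
    ("2021-10-20", "2022-01-07"),
]
for month_yr, cnt in count_per_month(ranges):
    print(month_yr, cnt)
```

With these four ranges it should give the same counts as the Snowflake result, just ordered by month.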

How to get number of IDs in the current month that also appears in the previous three months in Snowflake - SQL

I have a table in Snowflake covering a time range of, for example, 2019-01 to 2020-01. An ID can appear multiple times, on any of the dates.
For example:
my_table: two columns dddate and id
dddate     | id
-----------+----
2019-02-03 | 607
2019-01-07 | 356
2019-08-06 | 491
2019-01-01 | 607
2019-12-17 | 529
2019-04-15 | 356
......
Is there a way to find the total number of IDs that appeared at least once in the current month and also appeared at least once in the previous three months, grouped by month, so that each month's count is shown from 2019-04 (the first month with three previous months of data in the table) through 2020-01?
I am thinking of some code like this:
WITH PREV_THREE AS (
SELECT
DATE_TRUNC('MONTH', dddate) AS MONTH,
ID AS CURR_ID
FROM my_table mt
INNER JOIN
(
(
SELECT
MONTH(DATEADD(DATE_TRUNC('MONTH', dddate), -1, GETDATE())) AS PREV_MONTH,
ID AS PREV_3_MON_ID
FROM my_table
)
UNION ALL
(
SELECT
MONTH(DATEADD(DATE_TRUNC('MONTH', dddate), -2, GETDATE())) AS PREV_MONTH,
ID AS PREV_3_MON_ID
FROM my_table
)
UNION ALL
(
SELECT
MONTH(DATEADD(DATE_TRUNC('MONTH', dddate), -3, GETDATE())) AS PREV_MONTH,
ID AS PREV_3_MON_ID
FROM my_table
)
) AS PREV_3_MON
ON mt.CURR_ID = PREV_3_MON.PREV_3_MON_ID
)
SELECT MONTH, COUNT(DISTINCT ID) AS COUNTER
FROM PREV_THREE
GROUP BY 1
ORDER BY 1
However, it returns an error and doesn't seem to work. Could anyone please help me with this? Thank you in advance!
You can use lag():
select distinct id
from (select t.*,
lag(dddate) over (partition by id order by dddate) as prev_dddate
from my_table t
) t
where dddate >= date_trunc('MONTH', current_date) and
prev_dddate < date_trunc('MONTH', current_date) and
prev_dddate >= date_trunc('MONTH', current_date) - interval '3 month';
You can do this for multiple months as:
select date_trunc('MONTH', dddate), count(distinct id)
from (select t.*,
lag(dddate) over (partition by id order by dddate) as prev_dddate
from my_table t
) t
where prev_dddate < date_trunc('MONTH', dddate) and
prev_dddate >= date_trunc('MONTH', dddate) - interval '3 month'
group by date_trunc('MONTH', dddate);
Even if an id appears multiple times in one month, one of those occurrences is the first in that month, and lag() on that row returns the id's most recent earlier date, which is exactly what the three-month check needs.
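Here is a minimal sketch of the per-month lag() query, run through Python's sqlite3 module (SQLite 3.25+ assumed for window functions). SQLite date modifiers stand in for date_trunc and interval arithmetic, and the sample rows are the ones listed in the question.

```python
import sqlite3

QUERY = """
SELECT strftime('%Y-%m', dddate) AS month, COUNT(DISTINCT id) AS counter
FROM (
    SELECT dddate, id,
           -- most recent earlier date on which the same id appeared
           LAG(dddate) OVER (PARTITION BY id ORDER BY dddate) AS prev_dddate
    FROM my_table
)
-- previous appearance falls in one of the three months before this row's month
WHERE prev_dddate < date(dddate, 'start of month')
  AND prev_dddate >= date(dddate, 'start of month', '-3 months')
GROUP BY month
ORDER BY month
"""


def ids_seen_in_previous_3_months(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE my_table (dddate TEXT, id INTEGER)")
        con.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


# Sample rows from the question.
rows = [
    ("2019-02-03", 607),
    ("2019-01-07", 356),
    ("2019-08-06", 491),
    ("2019-01-01", 607),
    ("2019-12-17", 529),
    ("2019-04-15", 356),
]
print(ids_seen_in_previous_3_months(rows))
```

To start the report at 2019-04, as the question asks, you would additionally keep only months >= '2019-04'.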

sql user retention calculation

I have a table, records, like this in Athena, with one row per user per month:
month   | id
--------+---
2020-05 | 1
2020-05 | 2
2020-05 | 5
2020-06 | 1
2020-06 | 5
2020-06 | 6
I need to calculate percentage = (users who appear in both the prior month and the current month) / (total users in the prior month).
In the example above, users 1 and 5 appear in both May and June, and May has 3 users in total, so the percentage should be 2/3 * 100.
with monthly_mau AS
(SELECT month as mauMonth,
date_format(date_add('month',1,cast(concat(month,'-01') AS date)), '%Y-%m') AS nextMonth,
count(distinct userid) AS monthly_mau
FROM records
GROUP BY month
ORDER BY month),
retention_mau AS
(SELECT
month,
count(distinct useridLeft) AS retention_mau
FROM (
(SELECT
userid as useridLeft,month as monthLeft,
date_format(date_add('month',1,cast(concat(month,'-01') AS date)), '%Y-%m') AS nextMonth
FROM records ) AS prior
INNER JOIN
(SELECT
month ,
userid
FROM records ) AS current
ON
prior.useridLeft = current.userid
AND prior.nextMonth = current.month )
WHERE userid is not null
GROUP BY month
ORDER BY month )
SELECT *, cast(retention_mau AS double)/cast(monthly_mau AS double)*100 AS retention_mau_percentage
FROM monthly_mau as m
INNER JOIN monthly_retention_mau AS r
ON m.nextMonth = r.month
order by r.month
This gives me a percentage of 100, which is not right. Any idea?
Hmmm . . . assuming you have one row per user per month, you can use window functions and conditional aggregation:
select month, count(*) as num_users,
sum(case when prev_month_date = date_add('month', -1, month_date) then 1 else 0 end) as both_months
from (select r.*,
cast(concat(month, '-01') AS date) as month_date,
lag(cast(concat(month, '-01') AS date)) over (partition by id order by month) as prev_month_date
from records r
) r
group by month;
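The query above returns the counts; the retention percentage then needs the previous month's total. Below is a minimal sketch of that last step, run through Python's sqlite3 module (SQLite 3.25+ assumed) on the question's sample data: a second LAG() over the per-month totals supplies the prior month's user count. It assumes one row per user per month and that every month in the range is present.

```python
import sqlite3

QUERY = """
WITH flagged AS (
    SELECT month, id,
           date(month || '-01') AS month_date,
           -- previous month in which the same user appeared
           LAG(date(month || '-01')) OVER (PARTITION BY id ORDER BY month) AS prev_month_date
    FROM records
),
per_month AS (
    SELECT month,
           COUNT(*) AS num_users,
           -- users whose previous appearance was exactly one month earlier
           SUM(CASE WHEN prev_month_date = date(month_date, '-1 month') THEN 1 ELSE 0 END) AS retained
    FROM flagged
    GROUP BY month
)
SELECT month, num_users, retained,
       -- divide by the prior month's total (assumes consecutive months)
       ROUND(100.0 * retained / LAG(num_users) OVER (ORDER BY month), 2) AS retention_pct
FROM per_month
ORDER BY month
"""


def retention(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE records (month TEXT, id INTEGER)")
        con.executemany("INSERT INTO records VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


# Sample rows from the question.
records = [
    ("2020-05", 1), ("2020-05", 2), ("2020-05", 5),
    ("2020-06", 1), ("2020-06", 5), ("2020-06", 6),
]
for row in retention(records):
    print(row)
```

The first month has no prior month, so its percentage comes back as NULL (None in Python); June should come out at 2/3 * 100, rounded to 66.67.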

SQL Get last 7 days from event date

The easiest way to explain what I need is to show it, so here it is:
Currently I have this query
select
date_
,count(*) as count_
from table
group by date_
which returns the following result:
Now I need a new column that shows the count of all the previous 7 days, relative to the row's date_.
So, if the row is for 29/06, I have to count all occurrences on that day (my query is already doing this) and also get all occurrences from 22/06 to 29/06.
The result should be something like this:
If you have values for all dates, without gaps, then you can use window functions with a rows frame:
select
date,
count(*) cnt,
sum(count(*)) over(order by date rows between 7 preceding and current row) cnt_d7
from mytable
group by date
order by date
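A ROWS frame counts rows, not days, so gaps in the dates make the frame span more calendar days than intended. One alternative, sketched here with Python's sqlite3 module, is a RANGE frame over a numeric day value (julianday), which measures distance in days; this needs SQLite 3.28 or later. PostgreSQL 11+ also supports RANGE frames with an offset (for a date column, RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW). The events table and its sample dates are made up for illustration.

```python
import sqlite3

QUERY = """
SELECT date_, count_,
       -- RANGE measures distance in days (julianday), so missing dates are fine;
       -- the frame covers the row's own day plus the 7 days before it
       SUM(count_) OVER (
           ORDER BY julianday(date_)
           RANGE BETWEEN 7 PRECEDING AND CURRENT ROW) AS rolling_count
FROM (SELECT date_, COUNT(*) AS count_ FROM events GROUP BY date_)
ORDER BY date_
"""


def rolling_counts(dates):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE events (date_ TEXT)")
        con.executemany("INSERT INTO events VALUES (?)", [(d,) for d in dates])
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


# Made-up events with gaps between dates.
events = (["2021-06-20"] * 2 + ["2021-06-22"] + ["2021-06-25"] * 3
          + ["2021-06-29"] + ["2021-07-01"] * 2)
for row in rolling_counts(events):
    print(row)
```

For the 2021-06-29 row, the frame covers 22/06 to 29/06, matching the range described in the question.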
You can try something like this:
select
date_,
count(*) as count_,
(select count(*)
from table as b
where b.date_ <= a.date_ and b.date_ > a.date_ - interval '7 days'
) as count7days_
from table as a
group by date_
If you have gaps, you can do a more complicated solution where you add and subtract the values:
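with t as (
  select date_, count(*) as count_
  from table
  group by date_
  union all
  select date_ + interval '8 day', -count(*) as count_
  from table
  group by date_
)
select date_,
       sum(sum(count_)) over (order by date_ rows between unbounded preceding and current row)
         - sum(case when count_ > 0 then count_ else 0 end) as count7days_
from t
group by date_
order by date_;
The subtracted term is there because you do not seem to want the current day in the cumulative amount. Only the positive (added) entries are subtracted, so that a count expiring on the same day (from exactly 8 days earlier) is not added back in.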
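The add-and-subtract idea can be checked with a small sketch in Python's sqlite3 module (SQLite 3.25+ assumed; the events table and dates are made up). Each day's count is added on its own date and subtracted 8 days later, so the running total covers the current day plus the 7 days before it; subtracting only the day's own positive count then leaves the previous 7 days. The sample includes a date exactly 8 days after another (2021-06-20 and 2021-06-28) to exercise that case, and the final WHERE keeps only dates that actually had events.

```python
import sqlite3

QUERY = """
WITH daily AS (
    SELECT date_, COUNT(*) AS count_ FROM events GROUP BY date_
),
t AS (
    SELECT date_, count_ FROM daily                     -- add each day's count
    UNION ALL
    SELECT date(date_, '+8 days'), -count_ FROM daily   -- remove it 8 days later
),
running AS (
    SELECT date_,
           SUM(CASE WHEN count_ > 0 THEN count_ ELSE 0 END) AS today,
           SUM(SUM(count_)) OVER (
               ORDER BY date_
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS window_total
    FROM t
    GROUP BY date_
)
SELECT date_, today, window_total - today AS previous_7_days
FROM running
WHERE today > 0   -- filter after the window is computed, not in HAVING
ORDER BY date_
"""


def previous_7_days(dates):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE events (date_ TEXT)")
        con.executemany("INSERT INTO events VALUES (?)", [(d,) for d in dates])
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


events = (["2021-06-20"] * 2 + ["2021-06-22"] + ["2021-06-25"] * 3
          + ["2021-06-28"] + ["2021-07-01"] * 2)
for row in previous_7_days(events):
    print(row)
```

On 2021-06-28 the previous 7 days (21/06 to 27/06) hold 1 + 3 = 4 events; subtracting the whole day's net sum instead would also pull in the 2 events from 2021-06-20.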
You can also use the nasty self-join approach . . . which should be okay for 7 days:
with t as (
select date_, count(*) as count_
from table
group by date_
)
select t.date_, t.count_, sum(tprev.count_)
from t left join
t tprev
on tprev.date_ >= t.date_ - interval '7 day' and
tprev.date_ < t.date_
group by t.date_, t.count_;
The performance will get worse and worse as "7" gets bigger.
Try a correlated subquery for the new column:
select
  a.date_ as groupdate,
  count(a.date_) as date_count,
  (select count(b.date_)
   from table as b
   where b.date_ <= a.date_ and b.date_ >= a.date_ - interval '7 day'
  ) as total7
from table as a
group by a.date_
order by a.date_

How to generate a date series to fill in absent dates in Google BigQuery?

I am trying to get the daily sum of sales from a Google BigQuery table. I used the following code for that:
select Day(InvoiceDate) date, Sum(InvoiceAmount) sales from test_gmail_com.sales
where year(InvoiceDate) = Year(current_date()) and
Month(InvoiceDate) = Month(current_date())
group by date order by date
The query above only gives the daily sum of sales for days that appear in the table. Some days may not have any sales; for those days I still need the date, with a sum of 0. So every month should have 30 or 31 rows with the sum of sales. See the example below: the 4th day of the month has no sales, so its sum should be 0.
date | sales
-----+------
1 | 259
-----+------
2 | 359
-----+------
3 | 45
-----+------
4 | 0
-----+------
5 | 156
Is it possible to do this in BigQuery? Basically, the date column should be a series from 1 to 28, 29, 30 or 31, depending on the month of the year.
Generating a list of dates and then joining whatever table you need on top of it seems easiest. I used GENERATE_DATE_ARRAY + UNNEST and it looks quite clean.
To generate a list of days (one day per row):
SELECT
*
FROM
UNNEST(GENERATE_DATE_ARRAY('2018-10-01', '2020-09-30', INTERVAL 1 DAY)) AS example
You can use the query below to generate all dates in a given range on the fly. In this example it is all dates from 2015-06-01 until CURRENT_DATE(); by changing those two values you can control which date range is generated.
SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS calendar_day
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
FROM (SELECT NULL)),h
)))
Now you can LEFT JOIN it with your table so that all dates are accounted for. See a possible example below:
SELECT
calendar_day,
IFNULL(sales, 0) AS sales
FROM (
SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS calendar_day
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
FROM (SELECT NULL)),h
)))
) AS all_dates
LEFT JOIN (
SELECT DATE(InvoiceDate) AS sale_day, SUM(InvoiceAmount) sales
FROM test_gmail_com.sales
WHERE YEAR(InvoiceDate) = YEAR(CURRENT_DATE()) AND
MONTH(InvoiceDate) = MONTH(CURRENT_DATE())
GROUP BY sale_day
) AS daily_sales
ON all_dates.calendar_day = daily_sales.sale_day
I need to get the previous month's sales.
The query below gives all days of the previous month:
SELECT DATE(DATE_ADD(DATE_ADD(DATE_ADD(CURRENT_DATE(), -1, "MONTH"), 1 - DAY(CURRENT_DATE()), "DAY"), pos - 1, "DAY")) AS calendar_day
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(DATE_ADD(CURRENT_DATE(), - DAY(CURRENT_DATE()), "DAY"), DATE_ADD(DATE_ADD(CURRENT_DATE(), -1, "MONTH"), 1 - DAY(CURRENT_DATE()), "DAY")), '.'),'') AS h
FROM (SELECT NULL)),h
)))
Using the Standard SQL dialect and the generate_array function to simplify the code:
WITH serialnum AS (
  SELECT
    sn
  FROM
    UNNEST(GENERATE_ARRAY(0,
      DATE_DIFF(DATE_ADD(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 MONTH),
                DATE_TRUNC(CURRENT_DATE(), MONTH),
                DAY) - 1)
    ) AS sn
), date_seq AS (
  SELECT
    DATE_ADD(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL sn DAY) AS this_day
  FROM
    serialnum
)
SELECT
  EXTRACT(DAY FROM d.this_day) AS date
  , SUM(IFNULL(s.InvoiceAmount, 0)) AS sales
FROM
  date_seq AS d
LEFT JOIN
  test_gmail_com.sales AS s
ON
  -- compare calendar dates, not a date with a day-of-month number
  d.this_day = DATE(s.InvoiceDate)
-- no WHERE on InvoiceDate: date_seq already limits the range to the current month,
-- and filtering on the joined table would drop the zero-sales days again
GROUP BY
  date
ORDER BY
  date
;
UPDATE
Or, simpler still, using the GENERATE_DATE_ARRAY function:
WITH date_seq AS (
  SELECT
    this_day
  FROM
    -- GENERATE_DATE_ARRAY returns an array, so UNNEST it into one row per day
    UNNEST(GENERATE_DATE_ARRAY(DATE_TRUNC(CURRENT_DATE(), MONTH),
                               DATE_ADD(DATE_ADD(DATE_TRUNC(CURRENT_DATE(), MONTH)
                                                 , INTERVAL 1 MONTH)
                                        , INTERVAL -1 DAY)
                               , INTERVAL 1 DAY)) AS this_day
)
SELECT
  EXTRACT(DAY FROM d.this_day) AS date
  , SUM(IFNULL(s.InvoiceAmount, 0)) AS sales
FROM
  date_seq AS d
LEFT JOIN
  test_gmail_com.sales AS s
ON
  d.this_day = DATE(s.InvoiceDate)
GROUP BY
  date
ORDER BY
  date
;
For these purposes it is practical to have a 'calendar' table, a table that just lists all the days within a certain range. For your specific question, it would suffice to have a table with the numbers 1 to 31. A quick way to get this table is to make a spreadsheet with these numbers, save it as a csv file and import this file into BigQuery as a table.
You then left outer join your result set onto this table, with ifnull(sales,0) as sales.
If you want the number of days per month (28 to 31) to be right, you basically have two options. Either you create a proper calendar table that covers several years and that you join on using year, month and day, or you use the simple table with numbers 1 to 31 and remove numbers based on the month and the year.
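A minimal sketch of the calendar-plus-left-join idea, using Python's sqlite3 module: a recursive CTE stands in for the calendar table and generates every day of one month, and COALESCE plays the role of ifnull(sales, 0). The table name sales, its columns invoice_date and amount, the month_start parameter and the sample invoices are all hypothetical.

```python
import sqlite3

QUERY = """
WITH RECURSIVE days(d) AS (
    SELECT date(:month_start)
    UNION ALL
    SELECT date(d, '+1 day') FROM days
    WHERE d < date(:month_start, '+1 month', '-1 day')   -- last day of the month
)
SELECT CAST(strftime('%d', days.d) AS INTEGER) AS day_of_month,
       COALESCE(SUM(s.amount), 0) AS sales                -- 0 for days without sales
FROM days
LEFT JOIN sales AS s ON date(s.invoice_date) = days.d
GROUP BY days.d
ORDER BY days.d
"""


def daily_sales(invoices, month_start):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE sales (invoice_date TEXT, amount INTEGER)")
        con.executemany("INSERT INTO sales VALUES (?, ?)", invoices)
        return con.execute(QUERY, {"month_start": month_start}).fetchall()
    finally:
        con.close()


# Hypothetical invoices; nothing on 2021-02-04, and one March invoice to be ignored.
invoices = [
    ("2021-02-01 10:00:00", 200),
    ("2021-02-01 15:30:00", 59),
    ("2021-02-02 09:00:00", 359),
    ("2021-02-03 12:00:00", 45),
    ("2021-02-05 08:00:00", 156),
    ("2021-03-01 09:00:00", 999),
]
for day, total in daily_sales(invoices, "2021-02-01")[:5]:
    print(day, total)
```

Because the calendar is generated per month, February 2021 yields 28 rows, which addresses the 28 to 31 day concern without a stored table.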
For Standard SQL
WITH
splitted AS (
SELECT
*
FROM
UNNEST( SPLIT(RPAD('',
1 + DATE_DIFF(CURRENT_DATE(), DATE("2015-06-01"), DAY),
'.'),''))),
with_row_numbers AS (
SELECT
ROW_NUMBER() OVER() AS pos,
*
FROM
splitted),
calendar_day AS (
SELECT
DATE_ADD(DATE("2015-06-01"), INTERVAL (pos - 1) DAY) AS day
FROM
with_row_numbers)
SELECT
*
FROM
calendar_day
ORDER BY
day DESC