SQL sum over a time interval / rows - sql

The following code
SELECT distinct DATE_PART('year',date) as year_date,
DATE_PART('month',date) as month_date,
count(prepare_first_buyer.person_id) as no_of_customers_month
FROM
(
SELECT DATE(bestelldatum) ,person_id
,ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY person_id)
FROM ani.bestellung
) prepare_first_buyer
WHERE row_number=1
GROUP BY DATE_PART('year',date),DATE_PART('month',date)
ORDER BY DATE_PART('year',date),DATE_PART('month',date)
gives this table back:
| year_date | month_date | no_of_customers_month |
|:--------- |:----------:| ---------------------:|
| 2017 | 1 | 2 |
| 2017 | 2 | 5 |
| 2017 | 3 | 4 |
| 2017 | 4 | 8 |
| 2017 | 5 | 1 |
| . | . | . |
| . | . | . |
where all three are numeric values.
I now need a new column in which I sum up all values of 'no_of_customers_month' for the 12 months back.
e.g.
| year_date | month_date | no_of_customers_month | sum_12mon |
|:--------- |:----------:| :--------------------:|----------:|
| 2019 | 1 | 2 | 23 |
where 23 is the sum from 2019-1 back to 2018-1 over 'no_of_customers_month'.
Thx for the help.

You can use window functions:
SELECT DATE_TRUNC('month', date) AS yyyymm,
COUNT(*) AS no_of_customers_month,
SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', date) RANGE BETWEEN '11 month' PRECEDING AND CURRENT ROW) AS sum_12mon
FROM (SELECT DATE(bestelldatum), person_id,
-- order by the purchase date so that row 1 is each person's first order
ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY bestelldatum)
FROM ani.bestellung
) b
WHERE row_number = 1
GROUP BY yyyymm
ORDER BY yyyymm;
Note: This uses date_trunc() to retrieve the year/month as a date, which allows the use of a RANGE frame. I also find a date more convenient than having the year and month in separate columns.
RANGE frames with an offset require PostgreSQL 11 or later. On older versions, assuming you have data for every month, you can use ROWS instead:
SELECT DATE_TRUNC('month', date) AS yyyymm,
COUNT(*) AS no_of_customers_month,
SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', date) ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS sum_12mon
FROM (SELECT DATE(bestelldatum), person_id,
-- order by the purchase date so that row 1 is each person's first order
ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY bestelldatum)
FROM ani.bestellung
) b
WHERE row_number = 1
GROUP BY yyyymm
ORDER BY yyyymm;
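The "data for each month" caveat can be seen in a small sketch. It is not the asker's schema: the `monthly` table, its columns, and the numbers are made up, and it runs on SQLite through Python's sqlite3 module (SQLite 3.28 or later is assumed for RANGE with an offset). Because SQLite has no interval offsets, the frame is ordered by a month index `yr * 12 + mon`.

```python
import sqlite3

# Hypothetical pre-aggregated monthly counts; note the gap between 2018-02 and 2019-01.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (yr INTEGER, mon INTEGER, n INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?, ?)",
    [(2018, 1, 2), (2018, 2, 5), (2019, 1, 4), (2019, 2, 8)],
)

rolling = conn.execute("""
    SELECT yr, mon, n,
           -- ROWS counts physical rows, so it reaches back across the gap
           SUM(n) OVER (ORDER BY yr * 12 + mon
                        ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS sum_rows,
           -- RANGE on the month index only includes months at most 11 back
           SUM(n) OVER (ORDER BY yr * 12 + mon
                        RANGE BETWEEN 11 PRECEDING AND CURRENT ROW) AS sum_range
    FROM monthly
    ORDER BY yr, mon
""").fetchall()

for row in rolling:
    print(row)
```

For 2019-01 the ROWS version still includes 2018-01 (sum 11), while the RANGE version drops it (sum 9), so the two only agree when no month is missing.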

Related

Filtering consecutive dates ranges using SQL Server

I want to filter categories that only have consecutive dates.
I will explain with an example.
My table is
| ID | Category | Date |
|--------------------|-----------------|---------------------|
| 1 | 1 | 01-04-2021 |
| 2 | 1 | 02-04-2021 |
| 3 | 2 | 01-03-2021 |
| 4 | 2 | 04-03-2021 |
| 5 | 2 | 01-02-2010 |
| 6 | 3 | 02-02-2010 |
| 7 | 3 | 03-02-2010 |
| 8 | 4 | 03-02-2010 |
Expected output:
| Category |
|----------------|
| 1 |
| 3 |
| 4 |
I would like to filter my data so that I only keep categories whose dates are all consecutive (a category with a single date also qualifies).
… for unique dates per category
select category
from mytable
group by category
having max(Date) = dateadd(day, count(*)-1, min(Date))
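A rough way to check this HAVING condition against the question's data is the sketch below. It assumes the dates are day-month-year, stores them as ISO strings in a column named `d`, and swaps `dateadd` for SQLite's `julianday()` because it runs through Python's sqlite3 module. As the answer says, it relies on dates being unique within a category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, category INTEGER, d TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, 1, "2021-04-01"), (2, 1, "2021-04-02"),
    (3, 2, "2021-03-01"), (4, 2, "2021-03-04"), (5, 2, "2010-02-01"),
    (6, 3, "2010-02-02"), (7, 3, "2010-02-03"),
    (8, 4, "2010-02-03"),
])

# n unique dates are consecutive exactly when max - min spans n - 1 days
consecutive = [row[0] for row in conn.execute("""
    SELECT category
    FROM mytable
    GROUP BY category
    HAVING CAST(julianday(MAX(d)) - julianday(MIN(d)) AS INTEGER) = COUNT(*) - 1
    ORDER BY category
""")]

print(consecutive)
```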
Here's one way. You'll have to maybe adjust it for your particular flavor of SQL.
WITH a AS (
SELECT
category,
-- start date first, then end date, so consecutive days give +1 rather than -1
DATEDIFF('days', LAG(date) OVER (PARTITION BY category ORDER BY date), date) AS days_apart
FROM tbl
),
b AS (
SELECT
category,
MAX(days_apart) AS max_days_apart
FROM a
GROUP BY 1
)
SELECT
category
FROM b
WHERE max_days_apart IS NULL OR max_days_apart = 1
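The same gap-based idea can be tried on SQLite, shown here as a sketch through Python's sqlite3 module. `julianday()` stands in for `DATEDIFF`, the column name `d` and the ISO date strings are assumptions, and the rows are the question's sample read as day-month-year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (category INTEGER, d TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [
    (1, "2021-04-01"), (1, "2021-04-02"),
    (2, "2021-03-01"), (2, "2021-03-04"), (2, "2010-02-01"),
    (3, "2010-02-02"), (3, "2010-02-03"),
    (4, "2010-02-03"),
])

# days_apart is NULL for the first row of each category and positive otherwise
max_gap_ok = [row[0] for row in conn.execute("""
    WITH gaps AS (
        SELECT category,
               julianday(d)
                 - julianday(LAG(d) OVER (PARTITION BY category ORDER BY d)) AS days_apart
        FROM tbl
    )
    SELECT category
    FROM gaps
    GROUP BY category
    HAVING MAX(days_apart) IS NULL OR MAX(days_apart) = 1
    ORDER BY category
""")]

print(max_gap_ok)
```

A category with one row has only a NULL gap, which is why the `IS NULL` branch is needed to keep category 4.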
select distinct category
from dates
where category not in (
select distinct category
from (
select category, [date],
row_number() over (partition by category order by [date]) as days_cnt,
min([date]) over (partition by category) as min_date
from dates
group by category, [date]
) as c
where c.[date]<>dateadd(d, c.days_cnt-1, c.min_date))
order by category
Categories where the sequence of dates is the same as the sequence of ids.
with cte as (
select [category],
row_number() over (partition by [category] order by [date], [id])
- row_number() over (partition by [category] order by [id]) drn
from mytable -- the answer omits the FROM clause; the table name is assumed
)
select [category]
from cte
group by [category]
having sum(abs(drn)) = 0;

Repeat rows cumulative

I have this table
| date | id | number |
|------------|----|--------|
| 2021/05/01 | 1 | 10 |
| 2021/05/02 | 2 | 20 |
| 2021/05/03 | 3 | 30 |
| 2021/05/04 | 1 | 20 |
I am trying to write a query to have this other table
| date | id | number |
|------------|----|--------|
| 2021/05/01 | 1 | 10 |
| 2021/05/02 | 1 | 10 |
| 2021/05/02 | 2 | 20 |
| 2021/05/03 | 1 | 10 |
| 2021/05/03 | 2 | 20 |
| 2021/05/03 | 3 | 30 |
| 2021/05/04 | 1 | 20 |
| 2021/05/04 | 2 | 20 |
| 2021/05/04 | 3 | 30 |
The idea is that each date should list all the distinct ids seen so far with their numbers, and if an id is repeated then only its latest value should be used.
One way is to expand out all the rows for each date. Then take the most recent value using qualify:
with t as (
select date '2021-05-01' as date, 1 as id, 10 as number union all
select date '2021-05-02' as date, 2 as id, 20 as number union all
select date '2021-05-03' as date, 3 as id, 30 as number union all
select date '2021-05-04' as date, 1 as id, 20 as number
)
select d.date, t.id, t.number
from t join
(select date
from (select min(date) as min_date, max(date) as max_date
from t
) tt cross join
unnest(generate_date_array(min_date, max_date, interval 1 day)) date
) d
on t.date <= d.date
where 1=1
qualify row_number() over (partition by d.date, t.id order by t.date desc) = 1
order by 1, 2, 3;
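On engines without QUALIFY or generate_date_array, the same expand-then-pick-latest idea can be written with a subquery. The sketch below is an adaptation and not the answer's BigQuery code: it runs on SQLite through Python's sqlite3 module, renames the columns to `dt` and `num`, and takes the dates from the table itself instead of generating a calendar, so it assumes every wanted date appears in the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dt TEXT, id INTEGER, num INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("2021-05-01", 1, 10), ("2021-05-02", 2, 20),
    ("2021-05-03", 3, 30), ("2021-05-04", 1, 20),
])

expanded = conn.execute("""
    SELECT dt, id, num
    FROM (
        SELECT d.dt, t.id, t.num,
               -- rank each id's rows per output date, newest first
               ROW_NUMBER() OVER (PARTITION BY d.dt, t.id ORDER BY t.dt DESC) AS rn
        FROM (SELECT DISTINCT dt FROM t) AS d
        JOIN t ON t.dt <= d.dt
    ) AS ranked
    WHERE rn = 1          -- plays the role of QUALIFY ... = 1
    ORDER BY dt, id
""").fetchall()

for row in expanded:
    print(row)
```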
A more efficient method doesn't generate all the rows and then filter them. Instead, it just generates the rows that are needed by generating the appropriate dates within each row. That requires a couple of window functions to get the "next" date for each id and the maximum date in the data:
with t as (
select date '2021-05-01' as date, 1 as id, 10 as number union all
select date '2021-05-02' as date, 2 as id, 20 as number union all
select date '2021-05-03' as date, 3 as id, 30 as number union all
select date '2021-05-04' as date, 1 as id, 20 as number
)
select date, t.id, t.number
from (select t.*,
date_add(lead(date) over (partition by id order by date), interval -1 day) as next_date,
max(date) over () as max_date
from t
) t cross join
unnest(generate_date_array(date, coalesce(next_date, max_date))) date
order by 1, 2, 3;
Consider the less verbose approach below:
select t1.date, t2.id, t2.number
from (
select *, array_agg(struct(date, id,number)) over(order by date) arr
from `project.dataset.table`
) t1, unnest(arr) t2
where true
qualify row_number() over (partition by t1.date, t2.id order by t2.date desc) = 1
# order by date, id
If applied to the sample data in your question, the output matches the expected table in the question.

Count distinct customers over rolling window partition

My question is similar to redshift: count distinct customers over window partition but I have a rolling window partition.
My query looks like this, but Redshift does not support DISTINCT inside a window COUNT:
select p_date, seconds_read,
count(distinct customer_id) over (order by p_date rows between unbounded preceding and current row) as total_cumulative_customer
from table_x
My goal is to calculate the total number of unique customers up to each date (hence the rolling window).
I tried the dense_rank() approach, but it fails because a window function cannot be used like this:
select p_date, max(total_cumulative_customer) over ()
(select p_date, seconds_read,
dense_rank() over (order by customer_id rows between unbounded preceding and current row) as total_cumulative_customer -- WILL FAIL HERE
from table_x
Any workaround or different approach would be helpful!
EDIT:
INPUT DATA sample
+------+----------+--------------+
| Cust | p_date | seconds_read |
+------+----------+--------------+
| 1 | 1-Jan-20 | 10 |
| 2 | 1-Jan-20 | 20 |
| 4 | 1-Jan-20 | 30 |
| 5 | 1-Jan-20 | 40 |
| 6 | 5-Jan-20 | 50 |
| 3 | 5-Jan-20 | 60 |
| 2 | 5-Jan-20 | 70 |
| 1 | 5-Jan-20 | 80 |
| 1 | 5-Jan-20 | 90 |
| 1 | 7-Jan-20 | 100 |
| 3 | 7-Jan-20 | 110 |
| 4 | 7-Jan-20 | 120 |
| 7 | 7-Jan-20 | 130 |
+------+----------+--------------+
Expected Output
+----------+--------------------------+------------------+--------------------------------------------+
| p_date | total_distinct_cum_cust | sum_seconds_read | Comment |
+----------+--------------------------+------------------+--------------------------------------------+
| 1-Jan-20 | 4 | 100 | total distinct cust = 4 i.e. 1,2,4,5 |
| 5-Jan-20 | 6 | 450 | total distinct cust = 6 i.e. 1,2,3,4,5,6 |
| 7-Jan-20 | 7                        | 910              | total distinct cust = 7 i.e. 1,2,3,4,5,6,7 |
+----------+--------------------------+------------------+--------------------------------------------+
For this operation:
select p_date, seconds_read,
count(distinct customer_id) over (order by p_date rows between unbounded preceding and current row) as total_cumulative_customer
from table_x;
You can do pretty much what you want with two levels of aggregation:
select min_p_date,
sum(count(*)) over (order by min_p_date rows between unbounded preceding and current row) as running_distinct_customers
from (select customer_id, min(p_date) as min_p_date
from table_x
group by customer_id
) c
group by min_p_date;
Summing the seconds read as well is a bit tricky, but you can use the same idea:
select p_date,
sum(sum(seconds_read)) over (order by p_date rows between unbounded preceding and current row) as seconds_read,
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by p_date rows between unbounded preceding and current row) as running_distinct_customers
from (select customer_id, p_date, seconds_read,
row_number() over (partition by customer_id order by p_date) as seqnum
from table_x
) c
group by p_date;
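As a check on the idea, here is a sketch that runs it over the question's sample rows on SQLite through Python's sqlite3 module. It is an adaptation, not the Redshift query itself: the dates are stored as ISO strings, and the per-day aggregation is moved into its own subquery instead of nesting an aggregate inside the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (customer_id INTEGER, p_date TEXT, seconds_read INTEGER)")
conn.executemany("INSERT INTO table_x VALUES (?, ?, ?)", [
    (1, "2020-01-01", 10), (2, "2020-01-01", 20), (4, "2020-01-01", 30),
    (5, "2020-01-01", 40), (6, "2020-01-05", 50), (3, "2020-01-05", 60),
    (2, "2020-01-05", 70), (1, "2020-01-05", 80), (1, "2020-01-05", 90),
    (1, "2020-01-07", 100), (3, "2020-01-07", 110), (4, "2020-01-07", 120),
    (7, "2020-01-07", 130),
])

running = conn.execute("""
    SELECT p_date,
           SUM(new_customers) OVER (ORDER BY p_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_distinct_cum_cust,
           SUM(day_seconds) OVER (ORDER BY p_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_seconds_read
    FROM (
        -- per day: total seconds, and customers whose first row falls on that day
        SELECT p_date,
               SUM(seconds_read) AS day_seconds,
               SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS new_customers
        FROM (
            SELECT customer_id, p_date, seconds_read,
                   ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY p_date) AS seqnum
            FROM table_x
        ) AS first_seen
        GROUP BY p_date
    ) AS per_day
    ORDER BY p_date
""").fetchall()

for row in running:
    print(row)
```

Each customer is counted once, on the date of their first row, so the running sum of those "new customer" counts equals the cumulative distinct count.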
One workaround uses a subquery:
select p_date, seconds_read,
(
select count(distinct t1.customer_id)
from table_x t1
where t1.p_date <= t.p_date
) as total_cumulative_customer
from table_x t
I'd like to add that you can also accomplish this with an explicit self join which is, in my opinion, more straightforward and readable than the subquery approaches described in the other answers.
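select
t1.p_date,
sum(t2.seconds_read) as sum_seconds_read,
count(distinct t2.customer_id) as distinct_cum_cust_totals
from
-- one row per date, so each t2 row is summed only once per date
(select distinct p_date from table_x) t1
join
table_x t2
on
t2.p_date <= t1.p_date
group by
t1.p_date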
Most query planners will reduce a correlated subquery like the ones in the solutions above to an efficient join like this, so either solution is usually fine. For the general case, though, I believe this is the better solution, since some engines (BigQuery, for example) reject certain correlated subqueries and force you to define the join explicitly in your query.
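One thing worth checking with a self join of this kind is row fan-out: if the left side has several rows per date, every one of them matches the same right-side rows, so SUM is multiplied while COUNT(DISTINCT ...) is not. The sketch below illustrates this on three made-up rows, using SQLite through Python's sqlite3 module; the data is hypothetical and chosen only to make the effect visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (customer_id INTEGER, p_date TEXT, seconds_read INTEGER)")
conn.executemany("INSERT INTO table_x VALUES (?, ?, ?)", [
    (1, "2020-01-01", 10), (2, "2020-01-01", 20), (1, "2020-01-05", 30),
])

query = """
    SELECT t1.p_date, SUM(t2.seconds_read), COUNT(DISTINCT t2.customer_id)
    FROM {left_side} t1
    JOIN table_x t2 ON t2.p_date <= t1.p_date
    GROUP BY t1.p_date
    ORDER BY t1.p_date
"""

# Left side is the raw table: two rows on 2020-01-01 double that day's sum.
naive = conn.execute(query.format(left_side="table_x")).fetchall()

# Left side is one row per date: each right-side row is summed once per date.
deduped = conn.execute(
    query.format(left_side="(SELECT DISTINCT p_date FROM table_x)")
).fetchall()

print(naive)
print(deduped)
```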

Changing a Select Query to a Count Distinct Query

I am using a Select query to select Members, a variable that serves as a unique identifier, and transaction date, a Date format (MM/DD/YYYY).
Select Members, transaction_date
FROM table WHERE Criteria = 'xxx'
Group by Members, transaction_date;
My ultimate aim is to count the # of unique members by month (i.e., a unique member in day 3, 6, 12 of a month is only counted once). I don't want to select any data, but rather run this calculation (count distinct by month) and output the calculation.
This will give distinct count per month.
select month,count(*) as distinct_Count_month
from
(
select members,to_char(transaction_date, 'YYYY-MM') as month
from table1
/* add your where condition */
group by members,to_char(transaction_date, 'YYYY-MM')
) a
group by month
So for this input
+---------+------------------+
| members | transaction_date |
+---------+------------------+
| 1 | 12/23/2015 |
| 1 | 11/23/2015 |
| 1 | 11/24/2015 |
| 2 | 11/24/2015 |
| 2 | 10/24/2015 |
+---------+------------------+
You will get this output
+----------+----------------------+
| month | distinct_count_month |
+----------+----------------------+
| 2015-10 | 1 |
| 2015-11 | 2 |
| 2015-12 | 1 |
+----------+----------------------+
You might want to try this. This might work.
SELECT REPLACE(CONVERT(DATE,transaction_date,101),'-','/') AS [DATE], COUNT(MEMBERS) AS [NO OF MEMBERS]
FROM BAR
WHERE REPLACE(CONVERT(DATE,transaction_date,101),'-','/') IN
(
SELECT REPLACE(CONVERT(DATE,transaction_date,101),'-','/')
FROM BAR
)
GROUP BY REPLACE(CONVERT(DATE,transaction_date,101),'-','/')
ORDER BY REPLACE(CONVERT(DATE,transaction_date,101),'-','/')
Use COUNT(DISTINCT members) and date_trunc('month', transaction_date) to retain timestamps for most calculations (and this can also help with ordering the result). to_char() can then be used to control the display format but it isn't required elsewhere.
SELECT
to_char(date_trunc('month', transaction_date), 'YYYY-MM')
, COUNT(DISTINCT members) AS distinct_Count_month
FROM table1
GROUP BY
date_trunc('month', transaction_date)
;
result sample:
| to_char | distinct_count_month |
|---------|----------------------|
| 2015-10 | 1 |
| 2015-11 | 2 |
| 2015-12 | 1 |
see: http://sqlfiddle.com/#!15/57294/2
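For readers without a Postgres instance at hand, the monthly distinct count can be tried on SQLite through Python's sqlite3 module. This sketch assumes the dates are stored as ISO strings and uses SQLite's `strftime('%Y-%m', ...)` where the answer uses date_trunc() and to_char(); the rows are the sample input shown earlier in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (members INTEGER, transaction_date TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    (1, "2015-12-23"), (1, "2015-11-23"), (1, "2015-11-24"),
    (2, "2015-11-24"), (2, "2015-10-24"),
])

per_month = conn.execute("""
    SELECT strftime('%Y-%m', transaction_date) AS month,
           COUNT(DISTINCT members)             AS distinct_count_month
    FROM table1
    GROUP BY strftime('%Y-%m', transaction_date)
    ORDER BY month
""").fetchall()

for row in per_month:
    print(row)
```

Member 1 appears twice in November but is counted once there, which is the behavior the question asks for.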

Select rows which repeat every month

I am trying to solve a task that looks simple at first glance.
I have a transactions table.
| name |entity_id| amount | date |
|--------|---------|--------|------------|
| Github | 1 | 4.80 | 01/01/2014 |
| itunes | 2 | 2.80 | 22/01/2014 |
| Github | 1 | 4.80 | 01/02/2014 |
| Foods | 3 | 24.80 | 01/02/2014 |
| amazon | 4 | 14.20 | 01/03/2014 |
| amazon | 4 | 14.20 | 01/04/2014 |
I have to select the rows that repeat every month on the same day with the same amount for an entity_id (subscriptions). Thanks for the help.
If your date column is created as a date type, you could use a recursive CTE to collect continuations and then eliminate duplicate rows with distinct on. (You should also rename that column, because it's a reserved name in SQL.)
with recursive recurring as (
select name, entity_id, amount, date as first_date, date as last_date, 0 as lvl
from transactions
union all
select r.name, r.entity_id, r.amount, r.first_date, t.date, r.lvl + 1
from recurring r
join transactions t
on row(t.name, t.entity_id, t.amount, t.date - interval '1' month)
= row(r.name, r.entity_id, r.amount, r.last_date)
)
select distinct on (name, entity_id, amount) *
from recurring
order by name, entity_id, amount, lvl desc
Group it by day of the month, for example:
select entity_id, amount, max(date), min(date), count(*)
from transactions
group by entity_id, amount, date_part('day', date)
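A quick way to try the group-by-day idea on the question's rows is the sketch below, run on SQLite through Python's sqlite3 module. The date column is renamed to `tx_date` and stored as ISO strings, `strftime('%d', ...)` replaces date_part, and a `HAVING COUNT(*) > 1` filter is added (an assumption on my part) so that one-off payments are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (name TEXT, entity_id INTEGER, amount REAL, tx_date TEXT)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", [
    ("Github", 1, 4.80, "2014-01-01"), ("itunes", 2, 2.80, "2014-01-22"),
    ("Github", 1, 4.80, "2014-02-01"), ("Foods", 3, 24.80, "2014-02-01"),
    ("amazon", 4, 14.20, "2014-03-01"), ("amazon", 4, 14.20, "2014-04-01"),
])

repeating = conn.execute("""
    SELECT name, entity_id, COUNT(*) AS occurrences,
           MIN(tx_date) AS first_seen, MAX(tx_date) AS last_seen
    FROM transactions
    GROUP BY name, entity_id, amount, strftime('%d', tx_date)
    HAVING COUNT(*) > 1        -- keep only charges seen more than once
    ORDER BY entity_id
""").fetchall()

for row in repeating:
    print(row)
```

This only checks that the same charge recurs on the same day of the month; it does not verify that the months are consecutive, which the recursive CTE answer does.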