Select rows which repeat every month - sql

I am trying to solve a task that looks simple at first glance.
I have a transactions table.
| name |entity_id| amount | date |
|--------|---------|--------|------------|
| Github | 1 | 4.80 | 01/01/2014 |
| itunes | 2 | 2.80 | 22/01/2014 |
| Github | 1 | 4.80 | 01/02/2014 |
| Foods | 3 | 24.80 | 01/02/2014 |
| amazon | 4 | 14.20 | 01/03/2014 |
| amazon | 4 | 14.20 | 01/04/2014 |
I have to select rows which repeat every month on the same day with the same amount for an entity_id (subscriptions). Thanks for the help.

If your date column is created as a date type, you could use a recursive CTE to collect continuations, then eliminate duplicate rows with distinct on.
(You should also rename that column, because date is a reserved word in standard SQL.)
with recursive recurring as (
select name, entity_id, amount, date as first_date, date as last_date, 0 as lvl
from transactions
union all
select r.name, r.entity_id, r.amount, r.first_date, t.date, r.lvl + 1
from recurring r
join transactions t
on row(t.name, t.entity_id, t.amount, t.date - interval '1' month)
= row(r.name, r.entity_id, r.amount, r.last_date)
)
select distinct on (name, entity_id, amount) *
from recurring
order by name, entity_id, amount, lvl desc

Group by the day of the month, for example:
select entity_id, amount, max(date), min(date), count(*)
from transactions
group by entity_id, amount, date_part('day', date)

Related

Repeat rows cumulative

I have this table
| date | id | number |
|------------|----|--------|
| 2021/05/01 | 1 | 10 |
| 2021/05/02 | 2 | 20 |
| 2021/05/03 | 3 | 30 |
| 2021/05/04 | 1 | 20 |
I am trying to write a query to have this other table
| date | id | number |
|------------|----|--------|
| 2021/05/01 | 1 | 10 |
| 2021/05/02 | 1 | 10 |
| 2021/05/02 | 2 | 20 |
| 2021/05/03 | 1 | 10 |
| 2021/05/03 | 2 | 20 |
| 2021/05/03 | 3 | 30 |
| 2021/05/04 | 1 | 20 |
| 2021/05/04 | 2 | 20 |
| 2021/05/04 | 3 | 30 |
The idea is that each date should have all the previous distinct ids with their numbers, and if an id is repeated then only the latest value should be used.
One way is to expand out all the rows for each date. Then take the most recent value using qualify:
with t as (
select date '2021-05-01' as date, 1 as id, 10 as number union all
select date '2021-05-02' as date, 2 as id, 20 as number union all
select date '2021-05-03' as date, 3 as id, 30 as number union all
select date '2021-05-04' as date, 1 as id, 20 as number
)
select d.date, t.id, t.number
from t join
(select date
from (select min(date) as min_date, max(date) as max_date
from t
) tt cross join
unnest(generate_date_array(min_date, max_date, interval 1 day)) date
) d
on t.date <= d.date
where 1=1
qualify row_number() over (partition by d.date, t.id order by t.date desc) = 1
order by 1, 2, 3;
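QUALIFY is BigQuery syntax. As a rough sketch of the same "expand, then keep the latest row" idea in an engine without it, the filter can move to an outer WHERE on a row_number() column. The example below uses Python's sqlite3 module (window functions need SQLite 3.25 or later). The names obs_date and as_of are assumptions, and it takes the dates from the table itself instead of generating a date array, which gives the same result here only because the sample has no missing days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (obs_date TEXT, id INTEGER, number INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2021-05-01", 1, 10),
        ("2021-05-02", 2, 20),
        ("2021-05-03", 3, 30),
        ("2021-05-04", 1, 20),
    ],
)

cumulative = conn.execute(
    """
    WITH dates AS (SELECT DISTINCT obs_date AS as_of FROM t),
    expanded AS (
        -- Every row is repeated for each date on or after its own date.
        SELECT d.as_of, t.id, t.number,
               ROW_NUMBER() OVER (
                   PARTITION BY d.as_of, t.id ORDER BY t.obs_date DESC
               ) AS rn
        FROM dates d
        JOIN t ON t.obs_date <= d.as_of
    )
    -- rn = 1 keeps the most recent value of each id as of each date.
    SELECT as_of, id, number FROM expanded WHERE rn = 1
    ORDER BY as_of, id
    """
).fetchall()
for row in cumulative:
    print(row)
```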
A more efficient method doesn't generate all the rows and then filter them. Instead, it just generates the rows that are needed by generating the appropriate dates within each row. That requires a couple of window functions to get the "next" date for each id and the maximum date in the data:
with t as (
select date '2021-05-01' as date, 1 as id, 10 as number union all
select date '2021-05-02' as date, 2 as id, 20 as number union all
select date '2021-05-03' as date, 3 as id, 30 as number union all
select date '2021-05-04' as date, 1 as id, 20 as number
)
select date, t.id, t.number
from (select t.*,
date_add(lead(date) over (partition by id order by date), interval -1 day) as next_date,
max(date) over () as max_date
from t
) t cross join
unnest(generate_date_array(date, coalesce(next_date, max_date))) date
order by 1, 2, 3;
Consider the less verbose approach below:
select t1.date, t2.id, t2.number
from (
select *, array_agg(struct(date, id,number)) over(order by date) arr
from `project.dataset.table`
) t1, unnest(arr) t2
where true
qualify row_number() over (partition by t1.date, t2.id order by t2.date desc) = 1
# order by date, id
Applied to the sample data in your question, the output matches the desired table.

SQL sum over a time interval / rows

The following code
SELECT distinct DATE_PART('year',date) as year_date,
DATE_PART('month',date) as month_date,
count(prepare_first_buyer.person_id) as no_of_customers_month
FROM
(
SELECT DATE(bestelldatum) ,person_id
,ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY person_id)
FROM ani.bestellung
) prepare_first_buyer
WHERE row_number=1
GROUP BY DATE_PART('year',date),DATE_PART('month',date)
ORDER BY DATE_PART('year',date),DATE_PART('month',date)
gives this table back:
| year_date | month_date | no_of_customers_month |
|:--------- |:----------:| ---------------------:|
| 2017 | 1 | 2 |
| 2017 | 2 | 5 |
| 2017 | 3 | 4 |
| 2017 | 4 | 8 |
| 2017 | 5 | 1 |
| . | . | . |
| . | . | . |
where all three are numeric values.
I now need a new column where I sum up all values from 'no_of_customers_month' for the 12 months back.
e.g.
| year_date | month_date | no_of_customers_month | sum_12mon |
|:--------- |:----------:| :--------------------:|----------:|
| 2019 | 1 | 2 | 23 |
where 23 is the sum from 2019-1 back to 2018-1 over 'no_of_customers_month'.
Thanks for the help.
You can use window functions:
SELECT DATE_TRUNC('month', date) as yyyymm,
COUNT(*) as no_of_customers_month,
SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', date) RANGE BETWEEN '11 month' PRECEDING AND CURRENT ROW)
FROM (SELECT DATE(bestelldatum), person_id,
ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY person_id)
FROM ani.bestellung
) b
WHERE row_number = 1
GROUP BY yyyymm
ORDER BY yyyymm;
Note: This uses date_trunc() to retrieve the year/month as a date, allowing the use of range(). I also find a date more convenient than having the year and month in separate columns.
Postgres versions before 11 don't support range window frames with an offset. Assuming you have data for every month, you can use rows instead:
SELECT DATE_TRUNC('month', date) as yyyymm,
COUNT(*) as no_of_customers_month,
SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', date) ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)
FROM (SELECT DATE(bestelldatum), person_id,
ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY person_id)
FROM ani.bestellung
) b
WHERE row_number = 1
GROUP BY yyyymm
ORDER BY yyyymm;
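To see the rows-based frame in action, here is a small sketch using Python's sqlite3 module (SQLite 3.25 or later) on made-up data. The table layout, the generated orders and the use of MIN(order_date) to pick each customer's first order (in place of the ROW_NUMBER() filter above) are all assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (person_id INTEGER, order_date TEXT)")

# One new customer per month for 14 consecutive months, starting January 2017.
rows = [(i, f"{2017 + i // 12}-{i % 12 + 1:02d}-15") for i in range(14)]
# A repeat order from customer 0; it must not count as a new customer.
rows.append((0, "2017-06-20"))
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

rolling = conn.execute(
    """
    WITH first_orders AS (
        SELECT person_id, MIN(order_date) AS first_date
        FROM orders
        GROUP BY person_id
    ),
    monthly AS (
        SELECT strftime('%Y-%m', first_date) AS yyyymm,
               COUNT(*) AS no_of_customers_month
        FROM first_orders
        GROUP BY yyyymm
    )
    SELECT yyyymm,
           no_of_customers_month,
           -- Current month plus the 11 rows before it; this is a 12-month
           -- window only if every month has a row.
           SUM(no_of_customers_month) OVER (
               ORDER BY yyyymm ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
           ) AS sum_12mon
    FROM monthly
    ORDER BY yyyymm
    """
).fetchall()
for row in rolling:
    print(row)
```

With one new customer per month, the running total climbs to 12 and then stays there as the oldest month drops out of the frame.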

Filling in missing balance and dates in table to track balance

I hope you can help me with this problem. I just started out with SQL using BigQuery, so my problem may seem a bit tedious.
So I have a table that basically records the date and balance whenever the balance changes. It looks somewhat like this:
+------------+-----------+------+---------+
| Date | seller_ID | Name | Balance |
+------------+-----------+------+---------+
| 2020-09-10 | 1 | John | 10 |
| 2020-09-13 | 1 | John | 8 |
| 2020-09-15 | 1 | John | 6 |
+------------+-----------+------+---------+
However, I need to create a new table with the daily balances that looks like this
+------------+-----------+------+---------+
| Date | seller_ID | Name | Balance |
+------------+-----------+------+---------+
| 2020-09-10 | 1 | John | 10 |
| 2020-09-11 | 1 | John | 10 |
| 2020-09-12 | 1 | John | 10 |
| 2020-09-13 | 1 | John | 8 |
| 2020-09-14 | 1 | John | 8 |
| 2020-09-15 | 1 | John | 6 |
+------------+-----------+------+---------+
I tried creating a separate table of all the dates between the first and final date, and then LEFT JOIN the original table with it but the resulting table isn't very helpful to draw from.
Does anyone have an idea of what to do in this case?
To fill a null value with the previous non-null value in BigQuery, you can use LAST_VALUE with IGNORE NULLS:
WITH test_table AS (
SELECT DATE '2020-09-10' AS Date, 1 AS seller_Id, 'John' AS Name, 10 AS Balance UNION ALL
SELECT '2020-09-13', 1, 'John' AS Name, 8 UNION ALL
SELECT '2020-09-15', 1, 'John' AS Name, 6
)
SELECT Date,
LAST_VALUE(seller_Id IGNORE NULLS) OVER (ORDER BY Date) AS seller_Id,
LAST_VALUE(Name IGNORE NULLS) OVER (ORDER BY Date) AS Name,
LAST_VALUE(Balance IGNORE NULLS) OVER (ORDER BY Date) AS Balance
FROM UNNEST(GENERATE_DATE_ARRAY('2020-09-10', '2020-09-15')) AS Date
LEFT JOIN test_table USING (Date)
ORDER BY Date
You can do this without window functions for the balance. The key is the window function only for the date:
WITH t AS (
SELECT DATE '2020-09-10' AS Date, 1 AS seller_Id, 'John' AS Name, 10 AS Balance UNION ALL
SELECT '2020-09-13', 1, 'John' AS Name, 8 UNION ALL
SELECT '2020-09-15', 1, 'John' AS Name, 6
),
tt as (
SELECT t.*, LEAD(date) OVER (PARTITION BY name ORDER BY date) as next_date
FROM t
)
SELECT dte, tt.name, tt.balance
FROM tt LEFT JOIN
UNNEST(GENERATE_DATE_ARRAY(tt.date, COALESCE(DATE_ADD(tt.next_date, INTERVAL - 1 DAY), DATE '2020-09-15'))) dte
ON true;
(Note: The ON clause is optional in this case. However, I am not a fan of having joins without ON -- unless it is a CROSS JOIN.)
This has two important advantages over Sergey's solution. The most important is that it will work for multiple names with different time periods.
The second advantage is that it is more efficient, because it is not using window functions to fetch values from previous rows.
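The "next date per row, then expand each row" idea can also be tried outside BigQuery. The sketch below uses Python's sqlite3 module (SQLite 3.25 or later) and swaps GENERATE_DATE_ARRAY for a recursive CTE that walks each row forward one day at a time. The table name balances and the column bal_date are assumptions, and the last row of each seller is kept as a single day instead of being extended to a fixed end date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balances (bal_date TEXT, seller_id INTEGER, name TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO balances VALUES (?, ?, ?, ?)",
    [
        ("2020-09-10", 1, "John", 10),
        ("2020-09-13", 1, "John", 8),
        ("2020-09-15", 1, "John", 6),
    ],
)

daily = conn.execute(
    """
    WITH RECURSIVE spans AS (
        -- For each balance row, find when the next change happens.
        SELECT bal_date, seller_id, name, balance,
               LEAD(bal_date) OVER (
                   PARTITION BY seller_id ORDER BY bal_date
               ) AS next_date
        FROM balances
    ),
    filled(d, seller_id, name, balance, last_d) AS (
        -- A balance is valid from its own date until the day before the next change.
        SELECT bal_date, seller_id, name, balance,
               COALESCE(date(next_date, '-1 day'), bal_date)
        FROM spans
        UNION ALL
        SELECT date(d, '+1 day'), seller_id, name, balance, last_d
        FROM filled
        WHERE d < last_d
    )
    SELECT d, seller_id, name, balance
    FROM filled
    ORDER BY seller_id, d
    """
).fetchall()
for row in daily:
    print(row)
```

Because the window is partitioned by seller_id, each seller is expanded over its own date range.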

Postgres: select query with group by clause on a range of dates

My table contains answers from repeatable questionnaires that can be filled in a range of 30 days and are scheduled every 60 days.
Therefore, the answers from a single instance of a questionnaire are spread over a date range that is always shorter than 30 days, and the first answer to the following repeatable questionnaire is at least 31 days after the last answer of the previous one.
How do I create a view that calculates a score (which is basically the sum of the answers of a single questionnaire) among the values whose dates are within 30 days of the start date (min date)?
Table raw_data
------------------------------------------------
user_name | question_id | answer | answer_date |
------------------------------------------------
user001 | 1 | 2 | 2019-02-04 |
user001 | 2 | 1 | 2019-02-04 |
user001 | 3 | 2 | 2019-02-05 |
user001 | 4 | 2 | 2019-02-05 |
user001 | 5 | 2 | 2019-02-09 |
user002 | 1 | 2 | 2019-01-09 |
user002 | 2 | 2 | 2019-01-10 |
user002 | 3 | 1 | 2019-02-01 |
user002 | 4 | 2 | 2019-02-01 |
user002 | 5 | 1 | 2019-02-01 |
user002 | 1 | 2 | 2019-03-11 |
user002 | 2 | 2 | 2019-03-11 |
user002 | 3 | 1 | 2019-03-12 |
user002 | 4 | 1 | 2019-03-13 |
user002 | 5 | 1 | 2019-03-14 |
Expected result
------------------------------
user_name | sum | start_date |
------------------------------
user001 | 9 | 2019-02-04 |
user002 | 8 | 2019-01-09 |
user002 | 7 | 2019-03-11 |
The solution I tried works for the first group only:
SELECT user_name, SUM(answer::int),
CASE
WHEN answer_date - MIN(answer_date) OVER (PARTITION BY user_name ORDER BY user_name ASC, answer_date ASC) < 30
THEN MIN(answer_date) OVER (PARTITION BY user_name ORDER BY user_name ASC, answer_date ASC)
ELSE answer_date END AS start_date
FROM public.raw_data
GROUP BY user_name, answer_date
It's a classical gaps-and-islands problem. You'll find a lot under the tag I added.
An optimized query for your case could look like:
SELECT user_name
, sum(answer)
, min(answer_date) AS start_date
FROM (
SELECT user_name, answer, answer_date
, count(*) FILTER (WHERE step) OVER (PARTITION BY user_name ORDER BY answer_date) AS grp
FROM (
SELECT user_name, answer, answer_date
, lag(answer_date) OVER (PARTITION BY user_name ORDER BY answer_date) < answer_date - 30 AS step
FROM raw_data
) sub1
) sub2
GROUP BY user_name, grp
ORDER BY user_name, start_date; -- ORDER BY optional
Closely related, with more explanation:
How to group timestamps into islands (based on arbitrary gap)?
Use lag() to find the gaps. Then a cumulative sum to assign a "question period" and then summarize:
select user_name, min(answer_date) as start_date, sum(answer)
from (select rd.*,
count(*) filter (where prev_ad is null or prev_ad < answer_date - interval '30 day') over (partition by user_name order by answer_date) as period
from (select rd.*,
lag(answer_date) over (partition by user_name order by answer_date) as prev_ad
from raw_data rd
) rd
) rd
group by user_name, period;
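The lag-then-running-sum pattern can be checked against the sample data. The following is a sketch using Python's sqlite3 module (SQLite 3.25 or later), not Postgres. It assumes answers are stored as integers and dates as ISO strings, and it uses SUM over a CASE expression in place of count(*) FILTER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_data (user_name TEXT, question_id INTEGER, answer INTEGER, answer_date TEXT)"
)
conn.executemany(
    "INSERT INTO raw_data VALUES (?, ?, ?, ?)",
    [
        ("user001", 1, 2, "2019-02-04"), ("user001", 2, 1, "2019-02-04"),
        ("user001", 3, 2, "2019-02-05"), ("user001", 4, 2, "2019-02-05"),
        ("user001", 5, 2, "2019-02-09"),
        ("user002", 1, 2, "2019-01-09"), ("user002", 2, 2, "2019-01-10"),
        ("user002", 3, 1, "2019-02-01"), ("user002", 4, 2, "2019-02-01"),
        ("user002", 5, 1, "2019-02-01"),
        ("user002", 1, 2, "2019-03-11"), ("user002", 2, 2, "2019-03-11"),
        ("user002", 3, 1, "2019-03-12"), ("user002", 4, 1, "2019-03-13"),
        ("user002", 5, 1, "2019-03-14"),
    ],
)

scores = conn.execute(
    """
    WITH lagged AS (
        SELECT user_name, answer, answer_date,
               LAG(answer_date) OVER (
                   PARTITION BY user_name ORDER BY answer_date
               ) AS prev_date
        FROM raw_data
    ),
    grouped AS (
        -- A new island starts at the first row or after a gap of more than 30 days;
        -- the running sum of those start flags numbers the islands per user.
        SELECT user_name, answer, answer_date,
               SUM(CASE WHEN prev_date IS NULL
                          OR julianday(answer_date) - julianday(prev_date) > 30
                        THEN 1 ELSE 0 END) OVER (
                   PARTITION BY user_name ORDER BY answer_date
               ) AS grp
        FROM lagged
    )
    SELECT user_name, SUM(answer) AS score, MIN(answer_date) AS start_date
    FROM grouped
    GROUP BY user_name, grp
    ORDER BY user_name, start_date
    """
).fetchall()
for row in scores:
    print(row)
```

Rows that share an answer_date are peers in the default window frame, so they always land in the same group.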
Thanks to @Gordon and to this answer, I eventually found the missing step to determine my groups on a date range basis.
I will use the following query to create a view and SUM the answers, grouping by grp2:
WITH query AS (
SELECT r.*,
SUM(CASE WHEN answer_date < prev_date + 30 THEN 0 ELSE 1 END) OVER (PARTITION BY user_name ORDER BY user_name ASC, answer_date ASC) AS grp
FROM (SELECT r.*,
LAG(answer_date) OVER (PARTITION BY user_name ORDER BY user_name ASC, answer_date ASC) AS prev_date
FROM raw_data r
) r
)
SELECT user_name, question_id, answer_date, answer, DENSE_RANK() OVER (ORDER BY user_name, grp) AS grp2
FROM query
You can use a query with the row_number() window function, as below:
with raw_data( user_name, question_id, answer, answer_date ) as
(
select 'user001',1,2, '2019-02-04' union all
select 'user001',2,1, '2019-02-04' union all
select 'user001',3,2, '2019-02-05' union all
select 'user001',4,2, '2019-02-05' union all
select 'user001',5,2, '2019-02-09' union all
select 'user002',1,2, '2019-01-09' union all
select 'user002',2,2, '2019-01-10' union all
select 'user002',3,1, '2019-02-01' union all
select 'user002',4,2, '2019-02-01' union all
select 'user002',5,1, '2019-02-01' union all
select 'user002',1,2, '2019-03-11' union all
select 'user002',2,2, '2019-03-11' union all
select 'user002',3,1, '2019-03-12' union all
select 'user002',4,1, '2019-03-13' union all
select 'user002',5,1, '2019-03-14'
)
select user_name, sum(answer) as sum, min(answer_date) as start_date
from
(
select row_number() over (partition by question_id order by user_name, answer_date) as rn,
t.*
from raw_data t
) t
group by user_name, rn
order by rn;
user_name sum start_date
--------- --- ----------
user001 9 2019-02-04
user002 8 2019-01-09
user002 7 2019-03-11

Changing a Select Query to a Count Distinct Query

I am using a Select query to select Members, a variable that serves as a unique identifier, and transaction date, a Date format (MM/DD/YYYY).
Select Members, transaction_date
FROM table WHERE Criteria = 'xxx'
Group by Members, transaction_date;
My ultimate aim is to count the # of unique members by month (i.e., a unique member in day 3, 6, 12 of a month is only counted once). I don't want to select any data, but rather run this calculation (count distinct by month) and output the calculation.
This will give the distinct count per month.
select month,count(*) as distinct_Count_month
from
(
select members,to_char(transaction_date, 'YYYY-MM') as month
from table1
/* add your where condition */
group by members,to_char(transaction_date, 'YYYY-MM')
) a
group by month
So for this input
+---------+------------------+
| members | transaction_date |
+---------+------------------+
| 1 | 12/23/2015 |
| 1 | 11/23/2015 |
| 1 | 11/24/2015 |
| 2 | 11/24/2015 |
| 2 | 10/24/2015 |
+---------+------------------+
You will get this output
+----------+----------------------+
| month | distinct_count_month |
+----------+----------------------+
| 2015-10 | 1 |
| 2015-11 | 2 |
| 2015-12 | 1 |
+----------+----------------------+
You might want to try this. This might work.
SELECT REPLACE(CONVERT(DATE,transaction_date,101),'-','/') AS [DATE], COUNT(MEMBERS) AS [NO OF MEMBERS]
FROM BAR
WHERE REPLACE(CONVERT(DATE,transaction_date,101),'-','/') IN
(
SELECT REPLACE(CONVERT(DATE,transaction_date,101),'-','/')
FROM BAR
)
GROUP BY REPLACE(CONVERT(DATE,transaction_date,101),'-','/')
ORDER BY REPLACE(CONVERT(DATE,transaction_date,101),'-','/')
Use COUNT(DISTINCT members) and date_trunc('month', transaction_date) to retain timestamps for most calculations (and this can also help with ordering the result). to_char() can then be used to control the display format but it isn't required elsewhere.
SELECT
to_char(date_trunc('month', transaction_date), 'YYYY-MM')
, COUNT(DISTINCT members) AS distinct_Count_month
FROM table1
GROUP BY
date_trunc('month', transaction_date)
;
result sample:
| to_char | distinct_count_month |
|---------|----------------------|
| 2015-10 | 1 |
| 2015-11 | 2 |
| 2015-12 | 1 |
see: http://sqlfiddle.com/#!15/57294/2
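For a quick local check of the COUNT(DISTINCT ...) per month approach without a Postgres instance, here is a sketch using Python's sqlite3 module. SQLite has no date_trunc(), so strftime('%Y-%m', ...) stands in for it, and the dates are stored as ISO strings; both are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (members INTEGER, transaction_date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        (1, "2015-12-23"),
        (1, "2015-11-23"),
        (1, "2015-11-24"),
        (2, "2015-11-24"),
        (2, "2015-10-24"),
    ],
)

# A member with several transactions in one month is counted once for that month.
monthly_members = conn.execute(
    """
    SELECT strftime('%Y-%m', transaction_date) AS month,
           COUNT(DISTINCT members) AS distinct_count_month
    FROM table1
    GROUP BY month
    ORDER BY month
    """
).fetchall()
print(monthly_members)
```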