Generate cyclic column variable conditional on lag of another variable postgreSQL - sql

So suppose I have a table as follows:
user date
a 10/15/2015
a 11/15/2015
a 12/15/2015
a 2/15/2015
b 1/15/2015
b 2/15/2015
b 4/15/2015
b 6/15/2015
I need to create three column variables (actually two, since I already figured out the time-lag variable): (1) one that counts the number of successive logins by month and restarts the counter if there is a lapse; (2) the number of days between logins (figured this one out); (3) a cycle count that increases by one whenever the counter resets. The resulting table should look as follows (I am using 30 days as a one-month span for illustrative purposes):
user date count timelapse cycle
a 10/15/2015 1 0 1
a 11/15/2015 2 30 1
a 12/15/2015 3 30 1
a 2/15/2015 1 60 2
b 1/15/2015 1 0 1
b 2/15/2015 2 30 1
b 4/15/2015 1 60 2
b 6/15/2015 1 60 3
Any ideas? I was able to get the count column to work - but I could not get it to reset when the timelapse was greater than 30. Since the cycle was conditional on two columns I was at a bit of a loss there.
Any help or ideas would be greatly appreciated.

Here is the idea. Use lag() to determine when a gap occurs. You can do this by truncating the date to the beginning of the month, for comparison purposes.
Then, do a cumulative sum of the gap flags. This provides the cycle column. The count is then row_number() using the cycle:
select t.*,
       row_number() over (partition by "user", cycle order by date) as count
from (select t.*,
             sum(IsGap) over (partition by "user" order by date) as cycle
      from (select "user", date,
                   -- "user" is a reserved word in PostgreSQL, so it has to be quoted
                   (case when date_trunc('month', date) =
                              date_trunc('month', lag(date) over (partition by "user" order by date) + interval '1 month')
                         then 0
                         else 1
                    end) as IsGap
            from t
           ) t
     ) t;
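A minimal sketch of the same three steps (gap flag, running sum, row number), run against the sample data in an in-memory SQLite database through Python's sqlite3 module rather than PostgreSQL. The table name logins, the columns usr and login_date, and the month_no helper are hypothetical names chosen for illustration. It assumes SQLite 3.25 or later (for window functions) and that the last login for user a was meant to be 2/15/2016.

```python
import sqlite3

# Sample logins as ISO dates. Assumption: the last row for user "a" is taken
# to be 2016-02-15 so that it sorts after 2015-12-15.
rows = [
    ("a", "2015-10-15"), ("a", "2015-11-15"), ("a", "2015-12-15"), ("a", "2016-02-15"),
    ("b", "2015-01-15"), ("b", "2015-02-15"), ("b", "2015-04-15"), ("b", "2015-06-15"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE logins (usr TEXT, login_date TEXT)")
con.executemany("INSERT INTO logins VALUES (?, ?)", rows)

query = """
WITH months AS (
    -- month_no numbers calendar months consecutively (year * 12 + month)
    SELECT usr, login_date,
           CAST(strftime('%Y', login_date) AS INTEGER) * 12
             + CAST(strftime('%m', login_date) AS INTEGER) AS month_no
    FROM logins
),
flagged AS (
    -- is_gap = 0 only when the previous login was in the preceding month;
    -- the first login per user has no predecessor, so it is flagged 1
    SELECT usr, login_date,
           CASE WHEN month_no - LAG(month_no) OVER (PARTITION BY usr ORDER BY login_date) = 1
                THEN 0 ELSE 1 END AS is_gap
    FROM months
),
cycles AS (
    -- the running sum of gap flags is the cycle number
    SELECT usr, login_date,
           SUM(is_gap) OVER (PARTITION BY usr ORDER BY login_date) AS cycle
    FROM flagged
)
SELECT usr, login_date,
       ROW_NUMBER() OVER (PARTITION BY usr, cycle ORDER BY login_date) AS cnt,
       cycle
FROM cycles
ORDER BY usr, login_date
"""

result = con.execute(query).fetchall()
for row in result:
    print(row)
```

On this data the cnt and cycle columns match the count and cycle columns of the table in the question.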

Related

Find Individuals who have purchased 10 times within a rolling 1 year period

So let's say I have 2 tables. One table is for consumers, and another is for sales.
Consumers
ID | Name | ...
1 | John Johns | ...
2 | Cathy Dans | ...
Sales
ID | consumer_id | purchase_date | ...
1 | 1 | 01/03/05 | ...
2 | 1 | 02/04/10 | ...
3 | 1 | 03/04/11 | ...
4 | 2 | 02/14/07 | ...
5 | 2 | 09/24/08 | ...
6 | 2 | 12/15/09 | ...
I want to find all instances of consumers who made more than 10 purchases within any 6 month rolling period.
SELECT
consumers.id
, COUNT(sales.id)
FROM
consumers
JOIN sales ON consumers.id = sales.consumer_id
GROUP BY
consumers.id
HAVING
COUNT(sales.id) >= 10
ORDER BY
COUNT(sales.id) DESC
So I have this code, which just gives me a list of consumers who have made more than 10 purchases ALL TIME. But how do I incorporate the rolling 6 month period logic?!
Any help or guidance on which functions can help me accomplish this would be appreciated!
You can use window functions to count the number of sales in a six-month period. Then just filter down to those consumers:
select distinct consumer_id
from (select s.*,
count(*) over (partition by consumer_id
order by purchase_date
range between current row and interval '6 month' following
) as six_month_count
from sales s
) s
where six_month_count > 10;
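For engines without interval-based window frames, the same idea can be sketched with a correlated subquery instead of the range frame used above. This is a swapped-in technique, not the answer's query. The example runs in SQLite through Python's sqlite3 module, the purchase data is made up for illustration, and dates are assumed to be stored as ISO strings so that they compare correctly as text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, consumer_id INTEGER, purchase_date TEXT)")

# Hypothetical data: consumer 1 buys 11 times within 11 days,
# consumer 2 buys 11 times but only once a month.
rows = [(1, f"2020-01-{day:02d}") for day in range(1, 12)]
rows += [(2, f"2020-{month:02d}-15") for month in range(1, 12)]
con.executemany("INSERT INTO sales (consumer_id, purchase_date) VALUES (?, ?)", rows)

query = """
SELECT DISTINCT s.consumer_id
FROM sales AS s
WHERE (
    -- count purchases in the six months starting at this purchase (inclusive)
    SELECT COUNT(*)
    FROM sales AS w
    WHERE w.consumer_id = s.consumer_id
      AND w.purchase_date >= s.purchase_date
      AND w.purchase_date <= date(s.purchase_date, '+6 months')
) > 10
ORDER BY s.consumer_id
"""

frequent_buyers = [row[0] for row in con.execute(query)]
print(frequent_buyers)
```

Every purchase is treated as the start of its own six-month window, which is enough to find any window containing more than 10 purchases, because such a window can always be slid forward to start at its first purchase.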

T-SQL filtering records based on dates and time difference with other records

I have a table for which I have to perform a rather complex filter: first a filter by date is applied, but then records from the previous and next days should be included if their time difference does not exceed 8 hours compared to its prev or next record (depending if the date is less or greater than filter date).
For those adjacent days the selection should stop at the first record that does not satisfy this condition.
This is how my raw data looks like:
Id | Desc | EntryDate
1 | Event type 1 | 2021-03-12 21:55:00.000
2 | Event type 1 | 2021-03-12 01:10:00.000
3 | Event type 1 | 2021-03-11 20:17:00.000
4 | Event type 1 | 2021-03-11 05:04:00.000
5 | Event type 1 | 2021-03-10 23:58:00.000
6 | Event type 1 | 2021-03-10 11:01:00.000
7 | Event type 1 | 2021-03-10 10:00:00.000
In this example set, if my filter date is '2021-03-11', my expected result set should be all records from that day plus adjacent records from 03-12 and 03-10 that satisfy the 8-hour condition. Note that the record with Id 7 is not included, because the record with Id 6 does not comply:
Id | EntryDate
2 | 2021-03-12 01:10:00.000
3 | 2021-03-11 20:17:00.000
4 | 2021-03-11 05:04:00.000
5 | 2021-03-10 23:58:00.000
I need advice on how to write this complex query.
This is a variant of gaps-and-islands. Define the difference between each record and the previous one, and then form groups based on those differences:
with e as (
      select t.*,
             -- a new group starts whenever the previous record is 8 hours or more earlier
             sum(case when prev_entrydate > dateadd(hour, -8, entrydate) then 0 else 1 end) over (order by entrydate) as grp
      from (select t.*,
                   lag(entrydate) over (order by entrydate) as prev_entrydate
            from t
           ) t
     )
select e.*
from e
where e.grp in (select e2.grp
                from e e2
                where cast(e2.entrydate as date) = @filterdate
               );
Note: I'm not sure exactly how filter date is applied. This assumes that it is any events on the entire day, which means that there might be multiple groups. If there is only one group (say the first group on the day), the query can be simplified a bit from a performance perspective.
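To check the grouping logic against the sample rows, here is a small sketch that runs the same gaps-and-islands idea in SQLite through Python's sqlite3 module instead of T-SQL. The table name events and the column entry_date are hypothetical, julianday() stands in for dateadd(), and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER, entry_date TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?)", [
    (1, "2021-03-12 21:55:00.000"),
    (2, "2021-03-12 01:10:00.000"),
    (3, "2021-03-11 20:17:00.000"),
    (4, "2021-03-11 05:04:00.000"),
    (5, "2021-03-10 23:58:00.000"),
    (6, "2021-03-10 11:01:00.000"),
    (7, "2021-03-10 10:00:00.000"),
])

query = """
WITH lagged AS (
    SELECT id, entry_date,
           LAG(entry_date) OVER (ORDER BY entry_date) AS prev_entry_date
    FROM events
),
grouped AS (
    -- start a new group unless the previous record is less than 8 hours earlier
    SELECT id, entry_date,
           SUM(CASE WHEN (julianday(entry_date) - julianday(prev_entry_date)) * 24 < 8
                    THEN 0 ELSE 1 END) OVER (ORDER BY entry_date) AS grp
    FROM lagged
)
SELECT id, entry_date
FROM grouped
-- keep every group that has at least one record on the filter date
WHERE grp IN (SELECT grp FROM grouped WHERE date(entry_date) = ?)
ORDER BY entry_date DESC
"""

selected = con.execute(query, ("2021-03-11",)).fetchall()
print([row[0] for row in selected])
```

On the sample data the records fall into the groups {7, 6}, {5, 4}, {3, 2} and {1}; the two groups touching 2021-03-11 give Ids 2, 3, 4 and 5, which is the expected result set from the question.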
declare @DateTime datetime = '2021-03-11'
select *
from t
where t.EntryDate between DATEADD(hour , -8 , @DateTime) and DATEADD(hour , 32 , @DateTime)

finding the number of days in between first 2 date point

This question seems quite difficult, so I wonder if I could get some advice here. I am trying to solve this with SQLite 3. My data looks like this:
customer | purchase date
1 | date 1
1 | date 2
1 | date 3
2 | date 4
2 | date 5
2 | date 6
2 | date 7
The number of rows per customer varies.
I just want to find out whether customer 1's 1st and 2nd purchase dates fall within a specific time period, and then repeat this for the other customers. Only the 1st and 2nd dates need to be considered.
Any help would be appreciated!
We can try using ROW_NUMBER here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY customer ORDER BY "purchase date") rn
FROM yourTable
)
SELECT
customer,
CAST(MAX(CASE WHEN rn = 2 THEN julianday("purchase date") END) -
MAX(CASE WHEN rn = 1 THEN julianday("purchase date") END) AS INTEGER) AS diff_in_days
FROM cte
GROUP BY
customer;
The idea here is to aggregate by customer and then take the date difference between the second and first purchase. ROW_NUMBER is used to find these first and second purchases, for each customer.
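As a sketch of how this could be extended to the "specific time period" part of the question, the variant below also reports whether both of the first two purchases fall inside a given period. It runs in SQLite through Python's sqlite3 module; the table purchases, the column purchase_date, the sample dates and the period bounds are all hypothetical, and dates are assumed to be ISO strings.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE purchases (customer INTEGER, purchase_date TEXT)")
con.executemany("INSERT INTO purchases VALUES (?, ?)", [
    (1, "2021-01-05"), (1, "2021-01-15"), (1, "2021-06-01"),
    (2, "2021-01-20"), (2, "2021-03-02"), (2, "2021-03-03"), (2, "2021-04-01"),
])

query = """
WITH ranked AS (
    SELECT customer, purchase_date,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY purchase_date) AS rn
    FROM purchases
),
first_two AS (
    -- pivot the first and second purchase of each customer into one row
    SELECT customer,
           MAX(CASE WHEN rn = 1 THEN purchase_date END) AS first_date,
           MAX(CASE WHEN rn = 2 THEN purchase_date END) AS second_date
    FROM ranked
    WHERE rn <= 2
    GROUP BY customer
)
SELECT customer,
       CAST(julianday(second_date) - julianday(first_date) AS INTEGER) AS diff_in_days,
       -- 1 if both purchases lie inside the period, 0 otherwise
       (first_date >= :period_start AND second_date <= :period_end) AS both_in_period
FROM first_two
ORDER BY customer
"""

period = {"period_start": "2021-01-01", "period_end": "2021-01-31"}
summary = con.execute(query, period).fetchall()
print(summary)
```

Customers with only one purchase get NULL for both computed columns, since there is no second date to compare against.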

SQL How to calculate Average time between Order Purchases? (do sql calculations based on next and previous row)

I have a simple table that contains the customer email, their order count (so if this is their 1st order, 3rd, 5th, etc), the date that order was created, the value of that order, and the total order count for that customer.
Here is what my table looks like
Email Order Date Value Total
r2n1w@gmail.com 1 12/1/2016 85 5
r2n1w@gmail.com 2 2/6/2017 125 5
r2n1w@gmail.com 3 2/17/2017 75 5
r2n1w@gmail.com 4 3/2/2017 65 5
r2n1w@gmail.com 5 3/20/2017 130 5
ation@gmail.com 1 2/12/2018 150 1
ylove@gmail.com 1 6/15/2018 36 3
ylove@gmail.com 2 7/16/2018 41 3
ylove@gmail.com 3 1/21/2019 140 3
keria@gmail.com 1 8/10/2018 54 2
keria@gmail.com 2 11/16/2018 65 2
What I want to do is calculate the average time between purchases for each customer. So let's take customer ylove. The first purchase is on 6/15/18. The next one is on 7/16/18, so that's 31 days, and the next purchase is on 1/21/2019, so that is 189 days. The average time between orders would be 110 days.
But I have no idea how to make SQL look at the next row and calculate based on that, but then restart when it reaches a new customer.
Here is my query to get that table:
SELECT
F.CustomerEmail
,F.OrderCountBase
,F.Date_Created
,F.Total
,F.TotalOrdersBase
FROM #FullBase F
ORDER BY f.CustomerEmail
If anyone can give me some suggestions, that would be greatly appreciated.
And then maybe I can calculate value differences (in percentage). So for example, ylove spent $36 on their first order and $41 on their second, which is a 14% increase. Their third order was $140, which is a 241% increase. So on average, this customer increased their order value by about 128%. Unrelated to SQL, but is this the correct way of calculating a metric like this?
Looking at your sample, you could try taking the difference between the min and max dates and dividing it by the number of intervals (total - 1):
select email, datediff(day, min(Order_Date), max(Order_Date))/(total-1) as avg_days
from your_table
group by email, total
To also handle customers with only one order:
select email,
case when total-1 > 0 then
datediff(day, min(Order_Date), max(Order_Date))/(total-1)
else datediff(day, min(Order_Date), max(Order_Date)) end as avg_days
from your_table
group by email, total
The simplest formulation is:
select email,
datediff(day, min(Order_Date), max(Order_Date)) / nullif(total-1, 0) as avg_days
from t
group by email, total;
You can see this is the case. Consider three orders with od1, od2, and od3 as the order dates. The average is:
( (od2 - od1) + (od3 - od2) ) / 2
Check the arithmetic:
--> ( od2 - od1 + od3 - od2 ) / 2
--> ( od3 - od1 ) / 2
This pretty obviously generalizes to more orders.
Hence the max() minus min().
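The telescoping argument can be checked numerically. The sketch below computes the average gap two ways, once from row-to-row differences with LAG() and once as (max - min) / (n - 1), and compares them. It runs in SQLite through Python's sqlite3 module rather than SQL Server; the table orders and its column names are hypothetical, the rows are taken from the sample in the question, and julianday() stands in for datediff().

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (email TEXT, order_date TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("ylove@gmail.com", "2018-06-15"),
    ("ylove@gmail.com", "2018-07-16"),
    ("ylove@gmail.com", "2019-01-21"),
    ("keria@gmail.com", "2018-08-10"),
    ("keria@gmail.com", "2018-11-16"),
    ("ation@gmail.com", "2018-02-12"),
])

# Method 1: difference to the previous order of the same customer, then average.
by_lag = """
WITH gaps AS (
    SELECT email,
           julianday(order_date)
             - julianday(LAG(order_date) OVER (PARTITION BY email ORDER BY order_date)) AS gap_days
    FROM orders
)
SELECT email, AVG(gap_days) FROM gaps GROUP BY email
"""

# Method 2: the telescoped form, (last - first) / (number of gaps).
by_min_max = """
SELECT email,
       (julianday(MAX(order_date)) - julianday(MIN(order_date))) / NULLIF(COUNT(*) - 1, 0)
FROM orders
GROUP BY email
"""

avg_by_lag = dict(con.execute(by_lag).fetchall())
avg_by_min_max = dict(con.execute(by_min_max).fetchall())
print(avg_by_lag)
print(avg_by_min_max)
```

Both methods agree, including for the single-order customer, where each yields NULL (None in Python) because there is no gap to average.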

SQL : How to count number of times each ID exists continuously from previous period

My SQL data set is like this;
Date firm_id
======================
2010-01 1
2010-01 2
2010-01 3
----------------------
2010-02 1
2010-02 2
----------------------
2010-03 1
2010-03 2
2010-03 3
----------------------
2010-04 1
2010-04 3
How can I create a variable, named firm_age, that represents how many consecutive periods each firm has existed up to the current one? Like this:
Date firm_id firm_age
=================================
2010-01 1 0
2010-01 2 0
2010-01 3 0
-----------------------------------
2010-02 1 1
2010-02 2 1
-----------------------------------
2010-03 1 2
2010-03 2 2
2010-03 3 0
-----------------------------------
2010-04 1 3
2010-04 3 1
Thank you
This is a use case for the PACK operator from "Time & Relational Theory", which is not supported, at least not directly, in SQL.
You are trying to find [for each given row of the table] the smallest month such that there does not exist any intervening month between that smallest month and the month of the given row such that the company of the given row did not exist at that intervening month. Given two months, assessing the [non-]existence of such an intervening month is relatively trivial, however, finding the smallest month that makes the condition true for all intervening months is another order (*). I wouldn't try to do this completely in plain SQL.
(*) Which set of months are you going to SELECT that "smallest month" from? You cannot rely on the fact that all months will be mentioned in your table, as there is always the slight theoretical possibility that in one particular month no companies existed at all. (This possibility also breaks any attack on the problem based on window functions and row_numbers.)
This is a gaps-and-islands problem. You want "islands" where the values are sequential. Then you want to enumerate them. You can use row_number() for this:
select t.*,
       -- subtract 1 so that the age starts at 0 in the first month of each island
       row_number() over (partition by firm_id, date - seqnum * interval '1 month'
                          order by date
                         ) - 1 as firm_age
from (select t.*,
             row_number() over (partition by firm_id order by date) as seqnum
      from t
     ) t;
Note that date functions are not standard across databases. This makes some assumptions about the data representation, but the idea for the processing should work in almost any database.
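One way to sidestep the date arithmetic is to turn each 'YYYY-MM' value into a consecutive month number first. The sketch below does that in SQLite through Python's sqlite3 module, using the sample data from the question. The table firms, the column period and the month_no helper are hypothetical names, the dates are assumed to be stored as 'YYYY-MM' text, and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE firms (period TEXT, firm_id INTEGER)")
con.executemany("INSERT INTO firms VALUES (?, ?)", [
    ("2010-01", 1), ("2010-01", 2), ("2010-01", 3),
    ("2010-02", 1), ("2010-02", 2),
    ("2010-03", 1), ("2010-03", 2), ("2010-03", 3),
    ("2010-04", 1), ("2010-04", 3),
])

query = """
WITH numbered AS (
    -- month_no = year * 12 + month, so consecutive months differ by exactly 1
    SELECT period, firm_id,
           CAST(substr(period, 1, 4) AS INTEGER) * 12
             + CAST(substr(period, 6, 2) AS INTEGER) AS month_no
    FROM firms
),
sequenced AS (
    SELECT period, firm_id, month_no,
           ROW_NUMBER() OVER (PARTITION BY firm_id ORDER BY month_no) AS seqnum
    FROM numbered
)
SELECT period, firm_id,
       -- month_no - seqnum is constant within a run of consecutive months,
       -- so it identifies the island; the age restarts at 0 in each island
       ROW_NUMBER() OVER (PARTITION BY firm_id, month_no - seqnum
                          ORDER BY month_no) - 1 AS firm_age
FROM sequenced
ORDER BY period, firm_id
"""

ages = con.execute(query).fetchall()
for row in ages:
    print(row)
```

Because the islands are built per firm from that firm's own rows, a month in which no firm appears at all simply shows up as a gap for every firm.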