Create interval from discrete dates - sql

I have a function which saves the current status of several objects and writes it to a table, which looks something like this:
ObjectId StatusId Date
1 10 2020-04-04 00:00:00.000
2 10 2020-04-04 00:00:00.000
1 11 2020-04-05 00:00:00.000
2 10 2020-04-05 00:00:00.000
1 10 2020-04-06 00:00:00.000
2 10 2020-04-06 00:00:00.000
I would like to make it an interval grouped by ObjectId and StatusId.
So for the above the preferred output would look like this:
ObjectId StatusId StartDate EndDate
1 10 2020-04-04 00:00:00.000 2020-04-04 00:00:00.000
1 11 2020-04-05 00:00:00.000 2020-04-05 00:00:00.000
1 10 2020-04-06 00:00:00.000 2020-04-06 00:00:00.000
2 10 2020-04-04 00:00:00.000 2020-04-06 00:00:00.000
Note that one object can have the same status on multiple occasions, but if it had a different status in between, it needs to be in a separate interval. So a simple GROUP BY with MAX(Date) doesn't work in my case.
Thanks in advance.

This is a form of gaps-and-islands. For this purpose, the difference of row numbers is probably the simplest method:
select objectid, statusid, min(date) as startdate, max(date) as enddate
from (select t.*,
row_number() over (partition by objectid order by date) as seqnum,
row_number() over (partition by objectid, statusid order by date) as seqnum_2
from t
) t
group by objectid, statusid, (seqnum - seqnum_2)
order by objectid, startdate;
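Why this works can be a little cumbersome to explain. In short, seqnum counts every row of an object, while seqnum_2 counts only that object's rows with the same status. As long as consecutive rows keep the same status, both go up by one, so their difference stays constant; it changes whenever rows with a different status come in between. If you look at the results of the subquery, you will see how the difference is constant for the groups you want to identify.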
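To see the constant difference for yourself, here is a minimal sketch that runs the same query against an in-memory SQLite database through Python's built-in `sqlite3` module (SQLite 3.25 or later is assumed, for window functions). The table name `t` and the sample rows come from the question; the helper names `connect`, `differences` and `islands` are made up for the example.

```python
import sqlite3

# Sample rows from the question: (ObjectId, StatusId, Date)
ROWS = [
    (1, 10, "2020-04-04 00:00:00.000"),
    (2, 10, "2020-04-04 00:00:00.000"),
    (1, 11, "2020-04-05 00:00:00.000"),
    (2, 10, "2020-04-05 00:00:00.000"),
    (1, 10, "2020-04-06 00:00:00.000"),
    (2, 10, "2020-04-06 00:00:00.000"),
]

# The answer's subquery: one row number over all of an object's rows,
# one over only the rows that share the same status.
SUBQUERY = """
SELECT objectid, statusid, date,
       ROW_NUMBER() OVER (PARTITION BY objectid ORDER BY date) AS seqnum,
       ROW_NUMBER() OVER (PARTITION BY objectid, statusid ORDER BY date) AS seqnum_2
FROM t
"""


def connect():
    """Create an in-memory table t holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (objectid INTEGER, statusid INTEGER, date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    return conn


def differences(conn):
    """Return (objectid, date, seqnum - seqnum_2) so the constant runs are visible."""
    sql = f"SELECT objectid, date, seqnum - seqnum_2 FROM ({SUBQUERY}) ORDER BY objectid, date"
    return conn.execute(sql).fetchall()


def islands(conn):
    """Collapse each run of an unchanged status into one start/end interval."""
    sql = f"""
    SELECT objectid, statusid, MIN(date) AS startdate, MAX(date) AS enddate
    FROM ({SUBQUERY}) s
    GROUP BY objectid, statusid, seqnum - seqnum_2
    ORDER BY objectid, startdate
    """
    return conn.execute(sql).fetchall()


conn = connect()
for row in differences(conn):
    print(row)
for row in islands(conn):
    print(row)
```

For ObjectId 1 the difference runs 0, 1, 1: the two status-10 rows share the same status but get different differences (0 and 1), which is exactly what keeps them in separate intervals.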

Related

How to find the number of events for the first 24 hours for each user id

I'm working on Snowflake to solve a problem. I wanted to find the number of events for the first 24 hours for each user id.
This is a snippet of the database table I'm working on. I modified the table and used a date format without the time for simplification purposes.
user_id client_event_time
1 2022-07-28
1 2022-07-29
1 2022-08-21
2 2022-07-29
2 2022-07-30
2 2022-08-03
I used the following approach to find the minimum event time per user_id.
SELECT user_id, client_event_time,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY client_event_time) row_number,
MIN(client_event_time) OVER (PARTITION BY user_id) MinEventTime
FROM Data
ORDER BY user_id, client_event_time;
user_id client_event_time row_number MinEventTime
1 2022-07-28 1 2022-07-28
1 2022-07-29 2 2022-07-28
1 2022-08-21 3 2022-07-28
2 2022-07-29 1 2022-07-29
2 2022-07-30 2 2022-07-29
2 2022-08-03 3 2022-07-29
Then I tried to find the difference between the minimum event time and client_event_time, and if the difference is less than or equal to 24, I counted the client_event_time.
with NewTable as (
(SELECT user_id,client_event_time, event_type,
row_number() over (partition by user_id order by CLIENT_EVENT_TIME) row_number,
MIN(client_event_time) OVER (PARTITION BY user_id) MinEventTime
FROM Data
ORDER BY user_id, client_event_time))
SELECT user_id,
COUNT(case when timestampdiff(hh, client_event_time, MinEventTime) <= 24 then 1 else 0 end) AS duration
FROM NEWTABLE
GROUP BY user_id
I got the following result:
user_id duration
1 3
2 3
I wanted to find the following result:
user_id duration
1 2
2 2
Could you please help me solve this problem? Thanks!
This looks like a problem for windowed functions! I like them a lot.
Here's your sample data:
DECLARE @table TABLE (user_id INT, client_event_time DATETIME)
INSERT INTO @table (user_id, client_event_time) VALUES
(1, '2022-07-28 13:30:00'),
(1, '2022-07-29 08:30:00'),
(1, '2022-08-21 12:34:56'),
(2, '2022-07-29 08:30:00'),
(2, '2022-07-30 13:30:00'),
(2, '2022-08-03 12:34:56')
I added some hours to it, so we can look at 24-hour windows more easily. For user_id 1 we can see they had 2 events within 24 hours of their first one (counting the first event itself). For user_id 2 there was only the first one. We can capture the first event time with a MIN OVER, alongside the actual datetimes.
SELECT user_id, MIN(client_event_time) OVER (PARTITION BY user_id) AS FirstEventDateTime, client_event_time
FROM @table
user_id FirstEventDateTime client_event_time
-------------------------------------------------------
1 2022-07-28 13:30:00.000 2022-07-28 13:30:00.000
1 2022-07-28 13:30:00.000 2022-07-29 08:30:00.000
1 2022-07-28 13:30:00.000 2022-08-21 12:34:56.000
2 2022-07-29 08:30:00.000 2022-07-29 08:30:00.000
2 2022-07-29 08:30:00.000 2022-07-30 13:30:00.000
2 2022-07-29 08:30:00.000 2022-08-03 12:34:56.000
Now that we have the first datetime and each row's datetime together in the result set, we can make a comparison:
SELECT user_id, MIN(client_event_time) OVER (PARTITION BY user_id) AS FirstEventDateTime, client_event_time,
-- DATEADD compares elapsed time; DATEDIFF(HOUR, ...) would count hour boundaries crossed instead
CASE WHEN client_event_time < DATEADD(HOUR, 24, MIN(client_event_time) OVER (PARTITION BY user_id)) THEN 1 ELSE 0 END AS EventsInFirst24Hours
FROM @table
user_id FirstEventDateTime client_event_time EventsInFirst24Hours
----------------------------------------------------------------------------
1 2022-07-28 13:30:00.000 2022-07-28 13:30:00.000 1
1 2022-07-28 13:30:00.000 2022-07-29 08:30:00.000 1
1 2022-07-28 13:30:00.000 2022-08-21 12:34:56.000 0
2 2022-07-29 08:30:00.000 2022-07-29 08:30:00.000 1
2 2022-07-29 08:30:00.000 2022-07-30 13:30:00.000 0
2 2022-07-29 08:30:00.000 2022-08-03 12:34:56.000 0
Now that we have an indicator telling us which events occurred in the first 24 hours, all we really need to do is sum it. However, SQL Server won't let us use a windowed function inside another aggregate, so we need to cheat and put it into a subquery.
SELECT user_id, SUM(EventsInFirst24Hours) AS CountOfEventsInFirst24Hours
FROM (
SELECT user_id, MIN(client_event_time) OVER (PARTITION BY user_id) AS FirstEventDateTime, client_event_time,
CASE WHEN client_event_time < DATEADD(HOUR, 24, MIN(client_event_time) OVER (PARTITION BY user_id)) THEN 1 ELSE 0 END AS EventsInFirst24Hours
FROM @table
) a
GROUP BY user_id
And that gets us to the result:
user_id CountOfEventsInFirst24Hours
-----------------------------------
1 2
2 1
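The same windowed-MIN-then-SUM pattern can be tried without SQL Server. This sketch runs it on SQLite through Python's `sqlite3` module (SQLite 3.25+ assumed). SQLite has no DATEADD or DATEDIFF, so each event is compared with `datetime(first_event, '+24 hours')`, which checks elapsed time. The table name `events` and the function name `count_first_24h` are assumptions for the example.

```python
import sqlite3

# The answer's sample data, with times added
EVENTS = [
    (1, "2022-07-28 13:30:00"),
    (1, "2022-07-29 08:30:00"),
    (1, "2022-08-21 12:34:56"),
    (2, "2022-07-29 08:30:00"),
    (2, "2022-07-30 13:30:00"),
    (2, "2022-08-03 12:34:56"),
]

# Inner query: flag each event against the user's first event (windowed MIN).
# Outer query: SUM the flags per user, since the answer notes SQL Server won't
# accept the window function inside SUM directly.
# Timestamps are 'YYYY-MM-DD HH:MM:SS' text, so string comparison orders them correctly.
QUERY = """
SELECT user_id, SUM(EventsInFirst24Hours) AS CountOfEventsInFirst24Hours
FROM (
    SELECT user_id,
           CASE WHEN client_event_time <
                     datetime(MIN(client_event_time) OVER (PARTITION BY user_id), '+24 hours')
                THEN 1 ELSE 0 END AS EventsInFirst24Hours
    FROM events
) a
GROUP BY user_id
ORDER BY user_id
"""


def count_first_24h(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, client_event_time TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn.execute(QUERY).fetchall()


print(count_first_24h(EVENTS))
```

Comparing against the first event plus 24 hours also counts an event 23h40m after the first one (for example 13:30 on one day and 13:10 the next), which an hour-boundary count would leave out.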
A little about what's going on with the windowed function:
MIN - the aggregation we want it to do. The common aggregate functions have windowed counterparts.
(client_event_time) - the value we want to do it to.
OVER (PARTITION BY user_id) - the window we want to set up. In this case we want to know the minimum datetime for each of the user_ids.
We can partition by as many columns as we'd like.
You can also use an ORDER BY with as many columns as you'd like, but that was not necessary here. Ex:
OVER (PARTITION BY column1, column2 ORDER BY column4, column5 DESC)
Partition (or group) by column1 and column2, and order by column4 ascending and column5 descending.
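To illustrate the ORDER BY part of the window, this sketch (SQLite via Python's `sqlite3`, 3.25+ assumed) adds ORDER BY to two window functions over the same sample data: a running count and a newest-first row number. The table name `events`, the aliases `running_count` and `newest_first`, and the function `window_demo` are made up for the example.

```python
import sqlite3

EVENTS = [
    (1, "2022-07-28 13:30:00"),
    (1, "2022-07-29 08:30:00"),
    (1, "2022-08-21 12:34:56"),
    (2, "2022-07-29 08:30:00"),
    (2, "2022-07-30 13:30:00"),
    (2, "2022-08-03 12:34:56"),
]

QUERY = """
SELECT user_id,
       client_event_time,
       -- ORDER BY inside OVER turns COUNT into a running count within each user_id
       COUNT(*) OVER (PARTITION BY user_id ORDER BY client_event_time) AS running_count,
       -- DESC numbers each user's newest event as 1
       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY client_event_time DESC) AS newest_first
FROM events
ORDER BY user_id, client_event_time
"""


def window_demo(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, client_event_time TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn.execute(QUERY).fetchall()


for row in window_demo(EVENTS):
    print(row)
```

Without ORDER BY, MIN and COUNT see the whole partition; with it, an aggregate like COUNT only sees the rows up to the current one.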
This is easier with QUALIFY:
with cte as
(select *
from data
qualify client_event_time <= min(client_event_time) over (partition by user_id) + interval '24 hours')
select user_id, count(*) as counts
from cte
group by user_id
If you want the count of events within 24 hours of the minimum event time, you can use a GROUP BY CTE that gives you the minimum event time for each user.
The rest is to get all the rows that fall within the time limit:
WITH min_data as
(SELECT user_id,MIN(client_event_time) mindate FROM data GROUP BY user_id)
SELECT d.user_id, COUNT(*)
FROM data d JOIN min_data md ON d.user_id = md.user_id WHERE client_event_time <= mindate + INTERVAL '24 hour'
GROUP BY d.user_id
ORDER BY d.user_id
user_id count
1 2
2 2
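Here is a small sketch of the GROUP BY CTE plus join on the question's date-only data, using SQLite via Python's `sqlite3` as a stand-in for Snowflake (SQLite has no INTERVAL literal, so `date(mindate, '+24 hours')` is used instead). The table name `data` follows the answer; the alias `event_count` and the function `count_within_24h` are assumptions.

```python
import sqlite3

# The question's simplified, date-only sample
DATA = [
    (1, "2022-07-28"), (1, "2022-07-29"), (1, "2022-08-21"),
    (2, "2022-07-29"), (2, "2022-07-30"), (2, "2022-08-03"),
]

QUERY = """
WITH min_data AS (
    SELECT user_id, MIN(client_event_time) AS mindate
    FROM data
    GROUP BY user_id
)
SELECT d.user_id, COUNT(*) AS event_count
FROM data d
JOIN min_data md ON d.user_id = md.user_id
-- date(x, '+24 hours') is the next day in 'YYYY-MM-DD' form, matching the stored dates
WHERE d.client_event_time <= date(md.mindate, '+24 hours')
GROUP BY d.user_id
ORDER BY d.user_id
"""


def count_within_24h(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (user_id INTEGER, client_event_time TEXT)")
    conn.executemany("INSERT INTO data VALUES (?, ?)", rows)
    return conn.execute(QUERY).fetchall()


print(count_within_24h(DATA))
```

Because the comparison is `<=`, an event exactly 24 hours after the first one (user_id 2's 2022-07-30) is counted, which is why both users get 2 with date-only data.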

sql query using time series

I have the below table in BigQuery:
Timestamp variant_id activity
2020-04-02 08:50 1 active
2020-04-03 07:39 1 not_active
2020-04-04 07:40 1 active
2020-04-05 10:22 2 active
2020-04-07 07:59 2 not_active
I want to query this subset of data to get the number of active variant per day.
If variant_id 1 is active on 2020-04-04, it is still active on the following dates (2020-04-05, 2020-04-06, ...) until the activity column becomes not_active. The goal is to count, for each day, the number of variant_ids whose activity is active, taking into account that each variant_id keeps the value of its last activity as of a given date.
For example, for this sample data the desired query must return:
Date activity_count
2020-04-02 1
2020-04-03 0
2020-04-04 1
2020-04-05 2
2020-04-06 2
2020-04-07 1
2020-04-08 1
2020-04-09 1
2020-04-10 1
Any help, please?
Consider the approach below:
select date, count(distinct if(activity = 'active', variant_id, null)) activity_count
from (
select date(timestamp) date, variant_id, activity,
lead(date(timestamp)) over(partition by variant_id order by timestamp) next_date
from your_table
), unnest(generate_date_array(date, ifnull(next_date - 1, '2020-04-10'))) date
group by date
If applied to the sample data in your question, the output matches the desired result above.
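BigQuery's `generate_date_array` and `unnest` are not available everywhere, so as a rough cross-check here is a sketch of the same carry-forward idea in SQLite (via Python's `sqlite3`, 3.25+ assumed): LEAD finds when each status is superseded, and a recursive CTE stands in for `generate_date_array`. The table name `your_table`, the column name `ts` (for Timestamp), and the hard-coded end date 2020-04-10 (taken from the answer) are assumptions.

```python
import sqlite3

# Sample from the question: (Timestamp, variant_id, activity)
ROWS = [
    ("2020-04-02 08:50", 1, "active"),
    ("2020-04-03 07:39", 1, "not_active"),
    ("2020-04-04 07:40", 1, "active"),
    ("2020-04-05 10:22", 2, "active"),
    ("2020-04-07 07:59", 2, "not_active"),
]

QUERY = """
WITH RECURSIVE events AS (
    -- each row's date plus the date of the same variant's next row
    SELECT date(ts) AS day, variant_id, activity,
           LEAD(date(ts)) OVER (PARTITION BY variant_id ORDER BY ts) AS next_day
    FROM your_table
),
spans AS (
    -- stand-in for generate_date_array(date, ifnull(next_date - 1, '2020-04-10'))
    SELECT day, variant_id, activity,
           COALESCE(date(next_day, '-1 day'), '2020-04-10') AS last_day
    FROM events
    UNION ALL
    SELECT date(day, '+1 day'), variant_id, activity, last_day
    FROM spans
    WHERE day < last_day
)
SELECT day, COUNT(DISTINCT CASE WHEN activity = 'active' THEN variant_id END) AS activity_count
FROM spans
GROUP BY day
ORDER BY day
"""


def daily_active_counts(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (ts TEXT, variant_id INTEGER, activity TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()


for day, count in daily_active_counts(ROWS):
    print(day, count)
```

Each row is expanded to every day until the day before that variant's next row, so a day always carries the variant's latest status; a not_active day still produces a row, which is why 2020-04-03 shows 0 rather than disappearing.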

SQL Select with grouping and replacing a column

I have a requirement to retrieve rows in a SELECT query where END_DATE is the EFFECTIVE_DATE of the next record with the same key (CARD_NBR in this case) minus 1 day.
I have tried using GROUP BY, but I am not able to get the desired output. Could someone please help guide me? The record with the most recent effective date should keep END_DATE as 9999-12-31.
Table:
CARD_NBR SERIEL_NO EFFECTIVE_DATE END_DATE
12345 1 2021-01-01 9999-12-31
12345 2 2021-01-25 9999-12-31
12345 3 2021-02-15 9999-12-31
67899 1 2021-03-01 9999-12-31
67899 2 2021-04-02 9999-12-31
67899 3 2021-05-24 9999-12-31
Output:
CARD_NBR SERIEL_NO EFFECTIVE_DATE END_DATE
12345 1 2021-01-01 2021-01-24
12345 2 2021-01-25 2021-02-14
12345 3 2021-02-15 9999-12-31
67899 1 2021-03-01 2021-04-01
67899 2 2021-04-02 2021-05-23
67899 3 2021-05-24 9999-12-31
You can use lead():
select t.*,
lead(effective_date - interval '1 day', 1, date '9999-12-31') over (partition by card_nbr order by effective_date) as imputed_end_date
from t;
Date manipulation is highly database-dependent, so this uses Standard SQL syntax. You can incorporate this into an UPDATE, but the best approach also depends on the database.
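As one database-specific rendering, here is a sketch in SQLite (via Python's `sqlite3`, 3.25+ assumed), which has no `interval` literal and uses `date(x, '-1 day')` instead. LEAD's third argument supplies 9999-12-31 for the latest record of each card. Column names follow the question; the table name `t` follows the answer, and `with_end_dates` is a made-up helper.

```python
import sqlite3

# (CARD_NBR, SERIEL_NO, EFFECTIVE_DATE) from the question's table
ROWS = [
    (12345, 1, "2021-01-01"), (12345, 2, "2021-01-25"), (12345, 3, "2021-02-15"),
    (67899, 1, "2021-03-01"), (67899, 2, "2021-04-02"), (67899, 3, "2021-05-24"),
]

QUERY = """
SELECT card_nbr, seriel_no, effective_date,
       -- next row's effective date minus one day; the third argument is used
       -- when there is no next row, so the latest record gets 9999-12-31
       LEAD(date(effective_date, '-1 day'), 1, '9999-12-31')
           OVER (PARTITION BY card_nbr ORDER BY effective_date) AS end_date
FROM t
ORDER BY card_nbr, effective_date
"""


def with_end_dates(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (card_nbr INTEGER, seriel_no INTEGER, effective_date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()


for row in with_end_dates(ROWS):
    print(row)
```

Following the rule in the question, the second 67899 row comes out as 2021-05-23, one day before the next effective date.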
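SQLite 3.25 and later support window functions, so you can use the code below to get your result (SRL_NO, START_DT and END_DT correspond to SERIEL_NO, EFFECTIVE_DATE and END_DATE in the question, and T1 is the table).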
SELECT A.CARD_NBR,
A.SRL_NO,
A.START_DT,
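-- next row's start date minus one day; the latest row keeps its END_DT (9999-12-31)
COALESCE(DATE(B.START_DT, '-1 day'), A.END_DT) AS END_DT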
FROM
(
SELECT A.CARD_NBR,
A.SRL_NO,
A.START_DT,
A.END_DT,
ROW_NUMBER() OVER(PARTITION BY A.CARD_NBR ORDER BY A.SRL_NO ASC) RNUM1
FROM T1 A
)A
LEFT JOIN
(
SELECT B.CARD_NBR,
B.SRL_NO,
B.START_DT,
B.END_DT,
ROW_NUMBER() OVER(PARTITION BY B.CARD_NBR ORDER BY B.SRL_NO ASC) RNUM1
FROM T1 B
)B
ON A.CARD_NBR=B.CARD_NBR
AND A.RNUM1+1=B.RNUM1

create a row index column beginning at -1

I need to create a row index column that begins at -1 so I can query the previous day's balance. My current query:
select TRANSDATE, sum(convert(float,AMOUNTMST-SETTLEAMOUNTMST)) as Balance
from [AX2cTestStage].[dbo].[CUSTTRANS_V]
group by TRANSDATE
order by TRANSDATE asc
TRANSDATE Balance
2019-04-12 00:00:00.000 -22591.47
2019-04-15 00:00:00.000 -394.95
2019-04-25 00:00:00.000 -1776
2019-04-26 00:00:00.000 -11973.84
2019-04-29 00:00:00.000 -24230.16
2019-05-02 00:00:00.000 -10695.39
This is what i need:
TRANSDATE Balance Row Index
2019-04-12 00:00:00.000 -22591.47 -1
2019-04-15 00:00:00.000 -394.95 0
2019-04-25 00:00:00.000 -1776 1
2019-04-26 00:00:00.000 -11973.84 2
2019-04-29 00:00:00.000 -24230.16 3
2019-05-02 00:00:00.000 -10695.39 4
I have tried to declare a variable as the row index
declare @row_num as int = -1
select TRANSDATE, sum(convert(float,AMOUNTMST-SETTLEAMOUNTMST)) as Balance, @row_num += 1 as [Row Index]
from [AX2cTestStage].[dbo].[CUSTTRANS_V]
group by TRANSDATE
I receive this error:
A SELECT statement that assigns a value to a variable must not be combined with data-retrieval operations.
After declaring a variable for each field, I still receive errors. Is there an easier way to accomplish this? Thanks
You can use ROW_NUMBER(). For example:
select
TRANSDATE,
sum(convert(float,AMOUNTMST-SETTLEAMOUNTMST)) as Balance,
-- ROW_NUMBER() starts at 1, so subtracting 2 makes the first row -1
row_number() over(order by TRANSDATE) - 2 as [Row Index]
from [AX2cTestStage].[dbo].[CUSTTRANS_V]
group by TRANSDATE
order by TRANSDATE
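Here is a small sketch of the same ROW_NUMBER() - 2 idea, run on SQLite via Python's `sqlite3` (3.25+ assumed) rather than SQL Server. The daily balances are computed in a derived table first and then numbered. The CUSTTRANS_V rows are hypothetical, chosen only so that the per-date sums reproduce the question's balances. The LAG column is an alternative that is not part of the answer: since the stated goal is the previous day's balance, LAG can read it directly.

```python
import sqlite3

# Hypothetical CUSTTRANS_V rows (TRANSDATE, AMOUNTMST, SETTLEAMOUNTMST) chosen so the
# per-date sums match the question's balances; 2019-04-25 is split into two rows.
ROWS = [
    ("2019-04-12", -22591.47, 0.0),
    ("2019-04-15", -394.95, 0.0),
    ("2019-04-25", -1000.0, 0.0),
    ("2019-04-25", -776.0, 0.0),
    ("2019-04-26", -11973.84, 0.0),
    ("2019-04-29", -24230.16, 0.0),
    ("2019-05-02", -10695.39, 0.0),
]

QUERY = """
SELECT TRANSDATE,
       Balance,
       -- ROW_NUMBER() starts at 1, so subtracting 2 makes the first row -1
       ROW_NUMBER() OVER (ORDER BY TRANSDATE) - 2 AS "Row Index",
       -- alternative, not part of the answer: the previous date's balance
       LAG(Balance) OVER (ORDER BY TRANSDATE) AS PrevBalance
FROM (
    SELECT TRANSDATE, SUM(AMOUNTMST - SETTLEAMOUNTMST) AS Balance
    FROM CUSTTRANS_V
    GROUP BY TRANSDATE
) daily
ORDER BY TRANSDATE
"""


def balances(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CUSTTRANS_V (TRANSDATE TEXT, AMOUNTMST REAL, SETTLEAMOUNTMST REAL)")
    conn.executemany("INSERT INTO CUSTTRANS_V VALUES (?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()


for row in balances(ROWS):
    print(row)
```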

Select next subsequent change of certain column in a new column

I have a table with a unique index on contracts of customers who live in houses. I want to know, per house, how many days pass between someone moving out (contract end date) and a new contract starting. For that, I want to know what the first next contract in that house will be, on the same row as the old contract for a (potentially different) customer.
This is what the table currently looks like (I select the top 10 here):
SELECT TOP 10
PMCCONTRACT.ACCOUNTNUM --Customer
,PMCCONTRACT.RENTALOBJECTID --House
,PMCCONTRACT.CONTRACTID --Contract & Unique index of the table
,PMCCONTRACT.VALIDFROM --Contract Start Date
,PMCCONTRACT.VALIDTO --Contract End Date
FROM PMCCONTRACT
This returns:
ACCOUNTNUM RENTALOBJECTID CONTRACTID VALIDFROM VALIDTO
101852 2488 HC000001 1994-03-01 00:00:00.000 NULL
101136 2489 HC000002 1920-01-01 00:00:00.000 NULL
101352 2491 HC000003 1996-09-16 00:00:00.000 NULL
100687 2492 HC000004 1984-11-01 00:00:00.000 NULL
105160 2499 HC000005 1975-05-02 00:00:00.000 2018-01-31 00:00:00.000
102821 2501 HC000006 1997-09-16 00:00:00.000 NULL
100731 2506 HC000007 1920-01-01 00:00:00.000 2018-11-15 00:00:00.000
102797 2508 HC000008 1998-02-01 00:00:00.000 NULL
102155 2512 HC000009 1981-09-01 00:00:00.000 NULL
102563 2515 HC000010 1965-10-17 00:00:00.000 2017-06-30 00:00:00.000
What I want is to show, based on RENTALOBJECTID, what the first next contract in that house was (so it is important that each CONTRACTID still appears only once in this table).
Below is the code I use to get it; however, it shows all the following contract changes for that specific RENTALOBJECTID (house).
SELECT --TOP 1000
PMCCONTRACT.CONTRACTID
,PMCCONTRACT.RENTALOBJECTID
,PMCCONTRACT.VALIDFROM
,PMCCONTRACT.VALIDTO
,P2.CONTRACTID AS 'FirstNextContractId'
,P2.VALIDFROM
,P2.VALIDTO
FROM PMCCONTRACT
LEFT JOIN PMCCONTRACT P2
ON PMCCONTRACT.RENTALOBJECTID = P2.RENTALOBJECTID
LEFT JOIN
(SELECT
RENTALOBJECTID,
MAX(CONTRACTID) AS CONTRACTID
FROM PMCCONTRACT
GROUP BY RENTALOBJECTID) X ON X.CONTRACTID = P2.CONTRACTID
WHERE P2.VALIDFROM > PMCCONTRACT.VALIDTO
This is what I get when I select only ContractID HC000028: it shows 2 rows, while I want it to show only the first row.
CONTRACTID RENTALOBJECTID VALIDFROM VALIDTO FirstNextContractId VALIDFROM2 VALIDTO2
HC000028 75 1995-01-01 00:00:00.000 2016-04-30 00:00:00.000 HC009990 2016-05-01 00:00:00.000 2018-11-25 00:00:00.000 --<< Only row I want to show
HC000028 75 1995-01-01 00:00:00.000 2016-04-30 00:00:00.000 HC025218 2018-11-26 00:00:00.000 1900-01-01 00:00:00.000 --Too far in the future
Kind regards,
Igor
It looks like a simple LEAD window function is enough. It returns the next row, as defined by partitioning and ordering clauses.
SELECT TOP 10
PMCCONTRACT.ACCOUNTNUM --Customer
,PMCCONTRACT.RENTALOBJECTID --House
,PMCCONTRACT.CONTRACTID --Contract & Unique index of the table
,PMCCONTRACT.VALIDFROM --Contract Start Date
,PMCCONTRACT.VALIDTO --Contract End Date
,LEAD(CONTRACTID) OVER (PARTITION BY RENTALOBJECTID ORDER BY VALIDFROM) AS NextContractID
,LEAD(VALIDFROM) OVER (PARTITION BY RENTALOBJECTID ORDER BY VALIDFROM) AS NextVALIDFROM
,LEAD(VALIDTO) OVER (PARTITION BY RENTALOBJECTID ORDER BY VALIDFROM) AS NextVALIDTO
FROM PMCCONTRACT
;
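Since the asker's real goal is the number of days between a move-out and the next contract, here is a sketch of feeding the LEAD result straight into a date difference. It runs on SQLite via Python's `sqlite3` (3.25+ assumed) with dates trimmed to YYYY-MM-DD. `julianday()` is SQLite's function; on SQL Server, `DATEDIFF(DAY, VALIDTO, <next VALIDFROM>)` would play the same role. The rows are house 75 from the question; `DaysVacant` and `vacancy` are made-up names.

```python
import sqlite3

# House 75 from the question: (CONTRACTID, RENTALOBJECTID, VALIDFROM, VALIDTO), dates trimmed
CONTRACTS = [
    ("HC000028", 75, "1995-01-01", "2016-04-30"),
    ("HC009990", 75, "2016-05-01", "2018-11-25"),
    ("HC025218", 75, "2018-11-26", "1900-01-01"),
]

QUERY = """
SELECT CONTRACTID,
       VALIDTO,
       LEAD(CONTRACTID) OVER (PARTITION BY RENTALOBJECTID ORDER BY VALIDFROM) AS NextContractID,
       LEAD(VALIDFROM) OVER (PARTITION BY RENTALOBJECTID ORDER BY VALIDFROM) AS NextVALIDFROM,
       -- days from move-out to the next contract's start; NULL when there is no next contract
       julianday(LEAD(VALIDFROM) OVER (PARTITION BY RENTALOBJECTID ORDER BY VALIDFROM))
           - julianday(VALIDTO) AS DaysVacant
FROM PMCCONTRACT
ORDER BY RENTALOBJECTID, VALIDFROM
"""


def vacancy(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE PMCCONTRACT (CONTRACTID TEXT, RENTALOBJECTID INTEGER, VALIDFROM TEXT, VALIDTO TEXT)"
    )
    conn.executemany("INSERT INTO PMCCONTRACT VALUES (?, ?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()


for row in vacancy(CONTRACTS):
    print(row)
```

Each contract appears exactly once, paired only with its immediate successor in the same house, which is the one-row-per-CONTRACTID result the question asks for.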