Analytical function range window for max date interval

I am trying to get the rows that fall within a 10-minute interval counted back from the latest date of each group or partition.
Pseudo code SQL:
Select
count(1) Over( partition by col1, col2, col3
Order by Col_Date Desc
Range Max(Col_Date) Between Max(Col_Date) - 10(24*60) ) col_upd
From
Table_1;
Rows that fall outside this range need to be assigned a number so that they can be flagged for deletion.
2014-01-05 01:20:00 -- Max date
2014-01-05 01:15:13
2014-01-05 01:12:13
2014-01-05 01:07:13 -- 1) these last two rows should be set for
2014-01-05 01:06:13 -- 2) delete or assign same id
Is there a way to do this with analytic functions?

You haven't given table structures, but if I make up a dummy table like:
create table t42 (id number, grp_id number, dt date);
insert into t42 values (1, 1, timestamp '2014-01-05 01:20:00');
insert into t42 values (2, 1, timestamp '2014-01-05 01:15:13');
insert into t42 values (3, 1, timestamp '2014-01-05 01:12:13');
insert into t42 values (4, 1, timestamp '2014-01-05 01:07:13');
insert into t42 values (5, 1, timestamp '2014-01-05 01:06:13');
Then this will give you the age of each row in the group compared to its (analytic) max:
select grp_id, id, dt, max(dt) over (partition by grp_id) - dt as age
from t42
order by id;
GRP_ID ID DT AGE
---------- ---------- ------------------- ------------
1 1 2014-01-05 01:20:00 0
1 2 2014-01-05 01:15:13 .00332175926
1 3 2014-01-05 01:12:13 .00540509259
1 4 2014-01-05 01:07:13 .00887731481
1 5 2014-01-05 01:06:13 .00957175926
You can use that as an inner query and keep only the rows that are more than 10 minutes older than the group's latest date:
select grp_id, id, dt
from (
select grp_id, id, dt, max(dt) over (partition by grp_id) - dt as age
from t42
)
where age > (10*60)/(24*60*60)
order by id;
GRP_ID ID DT
---------- ---------- -------------------
1 4 2014-01-05 01:07:13
1 5 2014-01-05 01:06:13
You can then use those rows to delete or update as needed. It's not clear from your question whether your group/partition is already being calculated by an inner query; if so, you can use that instead of my t42 table (changing column names etc., of course).
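To check the logic outside the database, here is a minimal Python sketch of the same idea: find the latest dt per group, then keep the rows that are more than 10 minutes older than it. The function name stale_ids and the tuple layout are my own assumptions, not part of the answer above.

```python
from datetime import datetime, timedelta

# Sample rows mirroring the t42 table: (id, grp_id, dt)
rows = [
    (1, 1, datetime(2014, 1, 5, 1, 20, 0)),
    (2, 1, datetime(2014, 1, 5, 1, 15, 13)),
    (3, 1, datetime(2014, 1, 5, 1, 12, 13)),
    (4, 1, datetime(2014, 1, 5, 1, 7, 13)),
    (5, 1, datetime(2014, 1, 5, 1, 6, 13)),
]


def stale_ids(rows, window=timedelta(minutes=10)):
    """Return the ids of rows more than `window` older than their group's latest dt."""
    # Counterpart of max(dt) over (partition by grp_id)
    latest = {}
    for _id, grp, dt in rows:
        if grp not in latest or dt > latest[grp]:
            latest[grp] = dt
    # Counterpart of the outer filter "where age > 10 minutes"
    return [_id for _id, grp, dt in rows if latest[grp] - dt > window]


print(stale_ids(rows))  # [4, 5]
```

The ids returned are the same two rows the SQL query selects for delete/update.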

Related

Extract the number of daily users from table

Given a start date and end date for every user I would like to count the daily number of users on the platform:
ID START      END
-- ---------- ----------
1  2022-12-01 2022-12-03
2  2022-12-01 2022-12-01
I want to get an output like this:
DATE       NUMBER
---------- ------
2022-12-01 2
2022-12-02 1
2022-12-03 1

Make a list of all the dates (generate_series) and count for each of them.
with the_table(id, dstart, dend) as
(
values
(1, '2022-12-01'::date, '2022-12-03'::date),
(2, '2022-12-01', '2022-12-01')
)
select d::date as "DATE",
(select count(*) from the_table where d between dstart and dend) as "NUMBER"
from generate_series('2022-12-01'::date,'2022-12-03'::date,interval '1 day') as d;
Alternative
with the_table(id,dstart,dend) as
(
values
(1, '2022-12-01'::date, '2022-12-03'::date),
(2, '2022-12-01', '2022-12-01')
),
d (id, dlogged) as
(
select id, generate_series(dstart,dend,interval '1 day')::date
from the_table
)
select dlogged as "DATE", count(*) as "NUMBER"
from d group by dlogged;
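For comparison, a small Python sketch of the second approach (expand each user's range into one entry per day, then count per day). The helper name daily_counts is hypothetical; the data is the same two rows as above.

```python
from collections import Counter
from datetime import date, timedelta

# Same two users as the_table: (id, dstart, dend)
the_table = [
    (1, date(2022, 12, 1), date(2022, 12, 3)),
    (2, date(2022, 12, 1), date(2022, 12, 1)),
]


def daily_counts(table):
    """Expand every [dstart, dend] range into days and count users per day."""
    counts = Counter()
    for _id, dstart, dend in table:
        day = dstart
        while day <= dend:  # both ends inclusive, like BETWEEN
            counts[day] += 1
            day += timedelta(days=1)
    return sorted(counts.items())


for day, number in daily_counts(the_table):
    print(day, number)
```

Like the GROUP BY variant, this only lists days on which at least one user was active. The first query, which drives the result from generate_series, also returns days with a count of 0.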

How to get min and max from 2 tables in SQL

I am trying to get the start date from the row with the minimum ID (ID = 1) and the end date from the row with the maximum ID (ID = 3), but I am not sure how to retrieve them. My data is as follows:
Table1 and Table2 are the source tables. I am trying to get output like the third table.
My requirement is to get the start date from the first record and the end date from the last record. The first and last records can be recognized by the ID field: the minimum ID marks the first record and the maximum ID marks the last.
Please help!

Here's one option, presuming you use Oracle (since you mention Oracle SQL Developer). The x inline view selects:
start_date, taken from the row with the lowest ID value for that name (i.e. first_value partition by name order by id)
end_date, taken from the row with the highest ID value for that name (i.e. first_value partition by name order by id DESC)
with
  -- sample data
  t1 (pid, name) as
    (select 123, 'xyz' from dual union all
     select 234, 'pqr' from dual
    ),
  t2 (id, name, start_date, end_date) as
    (select 1, 'xyz', date '2020-01-01', date '2020-07-20' from dual union all
     select 2, 'xyz', date '2020-02-01', date '2020-05-30' from dual union all
     select 3, 'xyz', date '2020-06-30', date '2020-07-30' from dual union all
     --
     select 1, 'pqr', date '2020-04-30', date '2020-09-30' from dual union all
     select 2, 'pqr', date '2020-05-30', date '2020-09-30' from dual union all
     select 3, 'pqr', date '2020-06-30', date '2020-07-01' from dual
    )
select a.pid,
       x.name,
       max(x.start_date) start_date,
       max(x.end_date) end_date
from t1 a join
  (
   -- start_date: always for the lowest T2.ID value row
   -- end_date : always for the highest T2.ID value row
   select b.name,
          first_value(b.start_date) over (partition by b.name order by b.id ) start_date,
          first_value(b.end_date) over (partition by b.name order by b.id desc) end_date
   from t2 b
  ) x
  on a.name = x.name
group by a.pid,
         x.name
order by a.pid;
PID NAME START_DATE END_DATE
---------- ---- ---------- ----------
123 xyz 01/01/2020 07/30/2020
234 pqr 04/30/2020 07/01/2020
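If it helps to see the rule in isolation, here is a rough Python sketch of what the two first_value calls compute: for each name, the start_date of the lowest-ID row and the end_date of the highest-ID row. The function name first_start_last_end is my own; the rows are the t2 sample data.

```python
from datetime import date

# Same sample data as t2: (id, name, start_date, end_date)
t2 = [
    (1, "xyz", date(2020, 1, 1), date(2020, 7, 20)),
    (2, "xyz", date(2020, 2, 1), date(2020, 5, 30)),
    (3, "xyz", date(2020, 6, 30), date(2020, 7, 30)),
    (1, "pqr", date(2020, 4, 30), date(2020, 9, 30)),
    (2, "pqr", date(2020, 5, 30), date(2020, 9, 30)),
    (3, "pqr", date(2020, 6, 30), date(2020, 7, 1)),
]


def first_start_last_end(rows):
    """Map name -> (start_date of lowest-id row, end_date of highest-id row)."""
    by_name = {}
    for row in rows:
        by_name.setdefault(row[1], []).append(row)
    result = {}
    for name, group in by_name.items():
        first = min(group, key=lambda r: r[0])  # order by id
        last = max(group, key=lambda r: r[0])   # order by id desc
        result[name] = (first[2], last[3])
    return result


print(first_start_last_end(t2))
```

Note that this is not the same as min(start_date) and max(end_date): for pqr the latest end date is 2020-09-30, but the highest-ID row ends on 2020-07-01, which is what the query returns.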

Column with exact date and time of recording

How can I store the exact date and time at which a new record is added to a database table?
SELECT * FROM test;
ID NAME DT
---------- ---------- ----------
1 Ana 01.01.2019 00:00:00
2 Ina 01.01.2019 00:00:00
I want the exact time at which each record was created, for example:
ID NAME DT
---------- ---------- ----------
1 Ana 01.01.2019 10:41:22
2 Ina 01.01.2019 10:45:17
CREATE TABLE test
(
Id NUMBER(10),
Name varchar2(10),
DT date
);

CREATE TABLE test
(
Id NUMBER(10),
Name varchar2(10),
DT date
);
Insert data:
INSERT INTO test values (1, 'Ana', sysdate);
INSERT INTO test values (2, 'Ina', sysdate);
COMMIT;
Query results:
SELECT id, name, TO_CHAR(dt, 'dd.mm.yyyy hh24:mi:ss') FROM test;
ID NAME TO_CHAR(DT,'DD.MM.Y
---------- ---------- -------------------
1 Ana 02.09.2019 10:07:18
2 Ina 02.09.2019 10:07:18

Oracle Setup:
CREATE TABLE table_name(
Id NUMBER(10),
Name VARCHAR2(10),
DT DATE
);
Option 1:
Use DATE and INTERVAL literals:
INSERT INTO table_name ( id, name, dt )
VALUES ( 1, 'Ana', DATE '2019-01-01' + INTERVAL '10:41:22' HOUR TO SECOND );
Option 2:
Use TO_DATE and convert from a string:
INSERT INTO table_name ( id, name, dt )
VALUES ( 2, 'Ina', TO_DATE( '2019-01-01 10:45:17', 'YYYY-MM-DD HH24:MI:SS' ) );
Option 3:
Use a TIMESTAMP literal:
INSERT INTO table_name ( id, name, dt )
VALUES ( 3, 'Ona', TIMESTAMP '2019-01-01 10:49:12' );
Option 4:
If you want the current date & time then use SYSDATE:
INSERT INTO table_name ( id, name, dt )
VALUES ( 4, 'Una', SYSDATE );
or CURRENT_DATE:
INSERT INTO table_name ( id, name, dt )
VALUES ( 4, 'Una', CURRENT_DATE );
Output:
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';
SELECT * FROM table_name;
or
SELECT id, name, TO_CHAR( dt, 'YYYY-MM-DD HH24:MI:SS' ) AS dt FROM table_name;
Outputs:
ID | NAME | DT
-: | :--- | :------------------
1 | Ana | 2019-01-01 10:41:22
2 | Ina | 2019-01-01 10:45:17
3 | Ona | 2019-01-01 10:49:12
4 | Una | 2019-09-02 09:22:28

Use TIMESTAMP:
CREATE TABLE test
(
Id NUMBER(10),
Name varchar2(10),
DT TIMESTAMP
);

Insert the current date along with your data, and use an analytic function to compute the time elapsed since the previous event.
CREATE TABLE so_test
(
Id NUMBER(10),
Name varchar2(10),
DT date
);
insert into so_test values (1, 'Scott', sysdate);
insert into so_test values (2, 'Andrii', sysdate);
insert into so_test values (3, 'Zaynul', sysdate);
select id, name, dt
,round((dt-LAG(dt) OVER (ORDER BY id))*86400) seconds_diff
from so_test
ID NAME DT SECONDS_DIFF
---------- ---------- ------------------- ------------
1 Scott 02-09-2019 16:02:47
2 Andrii 02-09-2019 16:02:58 11
3 Zaynul 02-09-2019 16:03:12 14
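As a sanity check on the arithmetic, here is a short Python sketch of what LAG does in that query: each row is compared with the previous row in id order, and the first row has no predecessor. The function name seconds_since_previous is hypothetical, and the timestamps are copied from the output above.

```python
from datetime import datetime

# Timestamps taken from the query output: (id, name, dt)
so_test = [
    (1, "Scott", datetime(2019, 9, 2, 16, 2, 47)),
    (2, "Andrii", datetime(2019, 9, 2, 16, 2, 58)),
    (3, "Zaynul", datetime(2019, 9, 2, 16, 3, 12)),
]


def seconds_since_previous(rows):
    """Return (id, name, seconds since the previous row), ordered by id."""
    result = []
    prev_dt = None
    for id_, name, dt in sorted(rows):  # ORDER BY id
        # The first row has no previous row, so the difference is None (NULL in SQL)
        diff = None if prev_dt is None else round((dt - prev_dt).total_seconds())
        result.append((id_, name, diff))
        prev_dt = dt
    return result


print(seconds_since_previous(so_test))
# [(1, 'Scott', None), (2, 'Andrii', 11), (3, 'Zaynul', 14)]
```

In Oracle the subtraction of two DATE values yields days, which is why the SQL multiplies by 86400 to get seconds.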

Compare dates in different columns and rows dynamically in SQL

I have a set of data like this.
Data
ID Start_dt End_dt
A 1/1/2010 12/31/2010
A 1/1/2011 12/31/2011
A 6/1/2012 12/31/2012
A 1/1/2014 12/31/2014
A 1/1/2016 10/31/2016
A 1/1/2018 12/31/2018
B 1/1/2016 2/29/2016
B 3/1/2016 10/31/2016
B 1/1/2017 7/31/2017
B 1/1/2019 12/31/9999
C 1/1/2017 12/31/2017
C 1/1/2017 12/31/2018
C 1/1/2019 12/31/9999
I need to create a query that looks at each member's row, compares the current Start_dt against the previous End_dt. If the difference is less than one year, treat those 2 records as one continuous enrollment and return the combined MIN Start_dt and MAX End_dt, and repeat that for all rows for each member. If the difference is >=1 year, treat that as separate enrollment.
Desired result
ID Start_dt End_dt
A 1/1/2010 12/31/2012
A 1/1/2014 12/31/2014
A 1/1/2016 10/31/2016
A 1/1/2018 12/31/2018
B 1/1/2016 7/31/2017
B 1/1/2019 12/31/9999
C 1/1/2017 12/31/9999
Here's a Create Table query:
if OBJECT_ID ('tempdb..#test1') is not null
drop table #test1
CREATE TABLE #test1 (
ID varchar(10),
Start_dt datetime,
End_dt datetime
);
INSERT INTO #test1 VALUES ('A', '1/1/2010', '12/31/2010')
,('A', '1/1/2011', '12/31/2011')
,('A', '6/1/2012', '12/31/2012')
,('A', '1/1/2014', '12/31/2014')
,('A', '1/1/2016', '10/31/2016')
,('A', '1/1/2018', '12/31/2018')
,('B', '1/1/2016', '2/29/2016')
,('B', '3/1/2016', '10/31/2016')
,('B', '1/1/2017', '7/31/2017')
,('B', '1/1/2019', '12/31/9999')
,('C', '1/1/2017', '12/31/2017')
,('C', '1/1/2017', '12/31/2018')
,('C', '1/1/2019', '12/31/9999')
I've been trying to solve this for days. I have tried self-joins and loops but have not found a good solution. Can someone help?
Thank you!

You can use lag() or a cumulative max() to get the previous end date, then compare it to the current start date.
When the gap is a year or more (or there is no previous row), a new group starts. Do a cumulative sum of these new-group flags to get a grouping id.
And the rest is aggregation:
select id, min(start_dt), max(end_dt)
from (select t1.*,
sum(case when prev_end_dt > dateadd(year, -1, start_dt) then 0 else 1 end) over
(partition by id order by start_dt) as grp
from (select t1.*,
max(end_dt) over (partition by id
order by start_dt
rows between unbounded preceding and 1 preceding
) as prev_end_dt
from #test1 t1
) t1
) t1
group by id, grp
order by id, min(start_dt);
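To make the grouping rule easier to follow, here is a hedged Python sketch of the same steps: walk each ID's rows in start order, track the cumulative maximum end date, and start a new enrollment whenever that maximum is not later than one year before the current start. The names merge_enrollments and one_year_before are my own, and the Feb 29 handling is an assumption meant to imitate dateadd(year, -1, ...).

```python
from datetime import date

# The sample data from the question: (ID, Start_dt, End_dt)
rows = [
    ("A", date(2010, 1, 1), date(2010, 12, 31)),
    ("A", date(2011, 1, 1), date(2011, 12, 31)),
    ("A", date(2012, 6, 1), date(2012, 12, 31)),
    ("A", date(2014, 1, 1), date(2014, 12, 31)),
    ("A", date(2016, 1, 1), date(2016, 10, 31)),
    ("A", date(2018, 1, 1), date(2018, 12, 31)),
    ("B", date(2016, 1, 1), date(2016, 2, 29)),
    ("B", date(2016, 3, 1), date(2016, 10, 31)),
    ("B", date(2017, 1, 1), date(2017, 7, 31)),
    ("B", date(2019, 1, 1), date(9999, 12, 31)),
    ("C", date(2017, 1, 1), date(2017, 12, 31)),
    ("C", date(2017, 1, 1), date(2018, 12, 31)),
    ("C", date(2019, 1, 1), date(9999, 12, 31)),
]


def one_year_before(d):
    """Rough stand-in for dateadd(year, -1, d); Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)


def merge_enrollments(rows):
    merged = []
    prev_end = {}  # ID -> cumulative max End_dt over the rows seen so far
    for id_, start, end in sorted(rows):  # partition by id order by start_dt
        seen = prev_end.get(id_)
        if seen is not None and seen > one_year_before(start):
            # Gap shorter than a year: extend the current enrollment
            _, s, e = merged[-1]
            merged[-1] = (id_, s, max(e, end))
        else:
            # First row for this ID, or a gap of a year or more: new enrollment
            merged.append((id_, start, end))
        prev_end[id_] = end if seen is None else max(seen, end)
    return merged


for row in merge_enrollments(rows):
    print(row)
```

On the sample data this yields four enrollments for A, two for B, and one for C.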

You could try this query
SELECT ID, StartDate, End_dt AS EndDate
FROM (
SELECT *
, LAG(End_dt) OVER(PARTITION BY ID ORDER BY ID, Start_dt, End_dt) AS PrevEnd
, DATEDIFF(DAY, LAG(End_dt) OVER(PARTITION BY ID ORDER BY ID, Start_dt, End_dt), Start_dt) AS DaysBreak
, (
CASE
WHEN DATEDIFF(DAY, LAG(End_dt) OVER(PARTITION BY ID ORDER BY ID, Start_dt, End_dt), Start_dt) > 365 THEN Start_dt
WHEN LAG(End_dt) OVER(PARTITION BY ID ORDER BY ID, Start_dt, End_dt) IS NULL THEN Start_dt
ELSE NULL
END
) AS StartDate
FROM #test1
) a
WHERE StartDate IS NOT NULL

Determine contiguous date intervals

I have the following table structure:
id int -- more like a group id, not unique in the table
AddedOn datetime -- when the record was added
For a specific id there is at most one record each day. I have to write a query that returns contiguous (at day level) date intervals for each id.
The expected result structure is:
id int
StartDate datetime
EndDate datetime
Note that the time part of AddedOn is available but it is not important here.
To make it clearer, here is some input data:
with data as
(
select * from
(
values
(0, getdate()), --dummy record used to infer column types
(1, '20150101'),
(1, '20150102'),
(1, '20150104'),
(1, '20150105'),
(1, '20150106'),
(2, '20150101'),
(2, '20150102'),
(2, '20150103'),
(2, '20150104'),
(2, '20150106'),
(2, '20150107'),
(3, '20150101'),
(3, '20150103'),
(3, '20150105'),
(3, '20150106'),
(3, '20150108'),
(3, '20150109'),
(3, '20150110')
) as d(id, AddedOn)
where id > 0 -- exclude dummy record
)
select * from data
And the expected result:
id StartDate EndDate
1 2015-01-01 2015-01-02
1 2015-01-04 2015-01-06
2 2015-01-01 2015-01-04
2 2015-01-06 2015-01-07
3 2015-01-01 2015-01-01
3 2015-01-03 2015-01-03
3 2015-01-05 2015-01-06
3 2015-01-08 2015-01-10
Although it looks like a common problem I couldn't find a similar enough question. Also I'm getting closer to a solution and I will post it when (and if) it works but I feel that there should be a more elegant one.

Here's an answer without any fancy joins, simply using GROUP BY and ROW_NUMBER, which is not only simpler but also more efficient.
WITH CTE_dayOfYear
AS
(
SELECT id,
AddedOn,
DATEDIFF(DAY,'20000101',AddedOn) dyID,
ROW_NUMBER() OVER (ORDER BY ID,AddedOn) row_num
FROM data
)
SELECT ID,
MIN(AddedOn) StartDate,
MAX(AddedOn) EndDate,
dyID-row_num AS groupID
FROM CTE_dayOfYear
GROUP BY ID,dyID - row_num
ORDER BY ID,2,3
The logic is that dyID is derived from the date, so it has gaps, while row_num has none. Every gap in dyID therefore changes the difference between dyID and row_num, and that difference stays constant within a run of consecutive days. I simply use the difference as my groupID.
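The trick can be sketched in a few lines of Python, assuming (as the question states) at most one record per id per day. The function name islands is hypothetical, and date.toordinal() plays the role of DATEDIFF(DAY, '20000101', AddedOn).

```python
from datetime import date
from itertools import groupby

# Part of the question's sample data: (id, AddedOn)
data = [
    (1, date(2015, 1, 1)), (1, date(2015, 1, 2)), (1, date(2015, 1, 4)),
    (1, date(2015, 1, 5)), (1, date(2015, 1, 6)),
    (2, date(2015, 1, 1)), (2, date(2015, 1, 2)), (2, date(2015, 1, 3)),
    (2, date(2015, 1, 4)), (2, date(2015, 1, 6)), (2, date(2015, 1, 7)),
]


def islands(data):
    """Group consecutive days per id using (day number - row number) as the key."""
    keyed = []
    # ROW_NUMBER() OVER (ORDER BY id, AddedOn)
    for row_num, (id_, added_on) in enumerate(sorted(data), start=1):
        keyed.append((id_, added_on.toordinal() - row_num, added_on))
    result = []
    # GROUP BY id, dyID - row_num; rows with equal keys are adjacent after sorting
    for (id_, _), group in groupby(keyed, key=lambda r: (r[0], r[1])):
        dates = [r[2] for r in group]
        result.append((id_, dates[0], dates[-1]))
    return result


for row in islands(data):
    print(row)
```

Because the row number is global rather than per id, the difference alone is not unique across ids, which is why id has to stay in the grouping key.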

In SQL Server 2008 this is a bit of a pain without the LEAD and LAG functions:
WITH data
AS ( SELECT * ,
ROW_NUMBER() OVER ( ORDER BY id, AddedOn ) AS rn
FROM ( VALUES ( 0, GETDATE()), --dummy record used to infer column types
( 1, '20150101'), ( 1, '20150102'), ( 1, '20150104'),
( 1, '20150105'), ( 1, '20150106'), ( 2, '20150101'),
( 2, '20150102'), ( 2, '20150103'), ( 2, '20150104'),
( 2, '20150106'), ( 2, '20150107'), ( 3, '20150101'),
( 3, '20150103'), ( 3, '20150105'), ( 3, '20150106'),
( 3, '20150108'), ( 3, '20150109'), ( 3, '20150110') )
AS d ( id, AddedOn )
WHERE id > 0 -- exclude dummy record
),
diff
AS ( SELECT d1.* ,
CASE WHEN ISNULL(DATEDIFF(dd, d2.AddedOn, d1.AddedOn),
1) = 1 THEN 0
ELSE 1
END AS diff
FROM data d1
LEFT JOIN data d2 ON d1.id = d2.id
AND d1.rn = d2.rn + 1
),
parts
AS ( SELECT * ,
( SELECT SUM(diff)
FROM diff d2
WHERE d2.rn <= d1.rn
) AS p
FROM diff d1
)
SELECT id ,
MIN(AddedOn) AS StartDate ,
MAX(AddedOn) AS EndDate
FROM parts
GROUP BY id ,
p
Output:
id StartDate EndDate
1 2015-01-01 00:00:00.000 2015-01-02 00:00:00.000
1 2015-01-04 00:00:00.000 2015-01-06 00:00:00.000
2 2015-01-01 00:00:00.000 2015-01-04 00:00:00.000
2 2015-01-06 00:00:00.000 2015-01-07 00:00:00.000
3 2015-01-01 00:00:00.000 2015-01-01 00:00:00.000
3 2015-01-03 00:00:00.000 2015-01-03 00:00:00.000
3 2015-01-05 00:00:00.000 2015-01-06 00:00:00.000
3 2015-01-08 00:00:00.000 2015-01-10 00:00:00.000
Walkthrough:
diff
This CTE returns data:
1 2015-01-01 00:00:00.000 1 0
1 2015-01-02 00:00:00.000 2 0
1 2015-01-04 00:00:00.000 3 1
1 2015-01-05 00:00:00.000 4 0
1 2015-01-06 00:00:00.000 5 0
The table is joined to itself to get the previous row for the same id. The difference in days between the current row and the previous row is then calculated: if it is 1 day (or there is no previous row), the flag is 0; otherwise it is 1.
parts
This CTE takes the result of the previous step and computes a cumulative sum of the new column (the sum of all its values from the first row up to the current row), which gives you the partitions to group by:
1 2015-01-01 00:00:00.000 1 0 0
1 2015-01-02 00:00:00.000 2 0 0
1 2015-01-04 00:00:00.000 3 1 1
1 2015-01-05 00:00:00.000 4 0 1
1 2015-01-06 00:00:00.000 5 0 1
2 2015-01-01 00:00:00.000 6 0 1
2 2015-01-02 00:00:00.000 7 0 1
2 2015-01-03 00:00:00.000 8 0 1
2 2015-01-04 00:00:00.000 9 0 1
2 2015-01-06 00:00:00.000 10 1 2
2 2015-01-07 00:00:00.000 11 0 2
3 2015-01-01 00:00:00.000 12 0 2
3 2015-01-03 00:00:00.000 13 1 3
The last step is just a grouping by ID and new column and picking min and max values for dates.
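The walkthrough above can be sketched in Python as a single pass: compute the 0/1 flag against the previous row of the same id, keep a running total of the flags, and group by (id, running total). The function name islands_by_running_sum is my own, and the sample uses ids 2 and 3 from the question.

```python
from datetime import date

# Ids 2 and 3 from the question's sample data: (id, AddedOn)
data = [
    (2, date(2015, 1, 1)), (2, date(2015, 1, 2)), (2, date(2015, 1, 3)),
    (2, date(2015, 1, 4)), (2, date(2015, 1, 6)), (2, date(2015, 1, 7)),
    (3, date(2015, 1, 1)), (3, date(2015, 1, 3)), (3, date(2015, 1, 5)),
    (3, date(2015, 1, 6)), (3, date(2015, 1, 8)), (3, date(2015, 1, 9)),
    (3, date(2015, 1, 10)),
]


def islands_by_running_sum(data):
    groups = {}  # (id, p) -> dates; dicts keep insertion order
    prev = None
    p = 0        # running total of the diff flags (the "parts" step)
    for id_, added_on in sorted(data):
        has_prev = prev is not None and prev[0] == id_
        # diff is 0 for the first row of an id or a gap of exactly 1 day, else 1
        diff = 1 if has_prev and (added_on - prev[1]).days != 1 else 0
        p += diff
        groups.setdefault((id_, p), []).append(added_on)
        prev = (id_, added_on)
    # GROUP BY id, p and take MIN/MAX of AddedOn
    return [(id_, min(ds), max(ds)) for (id_, _), ds in groups.items()]


for row in islands_by_running_sum(data):
    print(row)
```

The first row of id 3 gets flag 0, so it shares its running total with the last island of id 2. Grouping by id as well as p is what keeps the two apart.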

I took the "Islands Solution #3 from SQL MVP Deep Dives" solution from https://www.simple-talk.com/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/ and applied to your test data:
with
data as
(
select * from
(
values
(0, getdate()), --dummy record used to infer column types
(1, '20150101'),
(1, '20150102'),
(1, '20150104'),
(1, '20150105'),
(1, '20150106'),
(2, '20150101'),
(2, '20150102'),
(2, '20150103'),
(2, '20150104'),
(2, '20150106'),
(2, '20150107'),
(3, '20150101'),
(3, '20150103'),
(3, '20150105'),
(3, '20150106'),
(3, '20150108'),
(3, '20150109'),
(3, '20150110')
) as d(id, AddedOn)
where id > 0 -- exclude dummy record
)
,CTE_Seq
AS
(
SELECT
ID
,SeqNo
,SeqNo - ROW_NUMBER() OVER (PARTITION BY ID ORDER BY SeqNo) AS rn
FROM
data
CROSS APPLY
(
SELECT DATEDIFF(day, '20150101', AddedOn) AS SeqNo
) AS CA
)
SELECT
ID
,DATEADD(day, MIN(SeqNo), '20150101') AS StartDate
,DATEADD(day, MAX(SeqNo), '20150101') AS EndDate
FROM CTE_Seq
GROUP BY ID, rn
ORDER BY ID, StartDate;
Result set
ID StartDate EndDate
1 2015-01-01 00:00:00.000 2015-01-02 00:00:00.000
1 2015-01-04 00:00:00.000 2015-01-06 00:00:00.000
2 2015-01-01 00:00:00.000 2015-01-04 00:00:00.000
2 2015-01-06 00:00:00.000 2015-01-07 00:00:00.000
3 2015-01-01 00:00:00.000 2015-01-01 00:00:00.000
3 2015-01-03 00:00:00.000 2015-01-03 00:00:00.000
3 2015-01-05 00:00:00.000 2015-01-06 00:00:00.000
3 2015-01-08 00:00:00.000 2015-01-10 00:00:00.000
I'd recommend examining the intermediate results of CTE_Seq to understand how it actually works. Just put
select * from CTE_Seq
instead of the final SELECT ... GROUP BY .... You'll get this result set:
ID SeqNo rn
1 0 -1
1 1 -1
1 3 0
1 4 0
1 5 0
2 0 -1
2 1 -1
2 2 -1
2 3 -1
2 5 0
2 6 0
3 0 -1
3 2 0
3 4 1
3 5 1
3 7 2
3 8 2
3 9 2
Each date is converted into a sequence number by DATEDIFF(day, '20150101', AddedOn). ROW_NUMBER() generates a set of sequential numbers without gaps, so when these numbers are subtracted from a sequence with gaps the difference jumps/changes. The difference stays the same until the next gap, so in the final SELECT GROUP BY ID, rn brings all rows from the same island together.

Here is a simple solution that does not use analytics. I tend not to use analytics because I work with many different DBMSs; many don't (yet) have them implemented, and even those that do have different syntaxes. I am just in the habit of writing generic code whenever possible.
with
Data( ID, AddedOn )as(
select 1, convert( date, '20150101' ) union all
select 1, '20150102' union all
select 1, '20150104' union all
select 1, '20150105' union all
select 1, '20150106' union all
select 2, '20150101' union all
select 2, '20150102' union all
select 2, '20150103' union all
select 2, '20150104' union all
select 2, '20150106' union all
select 2, '20150107' union all
select 3, '20150101' union all
select 3, '20150103' union all
select 3, '20150105' union all
select 3, '20150106' union all
select 3, '20150108' union all
select 3, '20150109' union all
select 3, '20150110'
)
select d.ID, d.AddedOn StartDate, IsNull( d1.AddedOn, '99991231' ) EndDate
from Data d
left join Data d1
on d1.ID = d.ID
and d1.AddedOn =(
select Min( AddedOn )
from data
where ID = d.ID
and AddedOn > d.AddedOn );
In your situation I assume that ID and AddedOn form a composite PK and so are indexed. Thus, the query will run impressively fast even on very large tables.
Also, I used the outer join because it seemed like the last AddedOn date of each ID should be seen in the StartDate column. Instead of NULL I used a common MaxDate value. The NULL could work just as well as a "this is the latest StartDate row" flag.
Here is the output for ID=1:
ID StartDate EndDate
----------- ---------- ----------
1 2015-01-01 2015-01-02
1 2015-01-02 2015-01-04
1 2015-01-04 2015-01-05
1 2015-01-05 2015-01-06
1 2015-01-06 9999-12-31

I'd like to post my own solution too because it's yet another approach:
with data as
(
...
),
temp as
(
select d.id
,d.AddedOn
,dprev.AddedOn as PrevAddedOn
,dnext.AddedOn as NextAddedOn
FROM data d
left JOIN
data dprev on dprev.id = d.id
and dprev.AddedOn = dateadd(d, -1, d.AddedOn)
left JOIN
data dnext on dnext.id = d.id
and dnext.AddedOn = dateadd(d, 1, d.AddedOn)
),
starts AS
(
select id
,AddedOn
from temp
where PrevAddedOn is NULL
),
ends as
(
select id
,AddedOn
from temp
where NextAddedon is NULL
)
SELECT s.id as id
,s.AddedOn as StartDate
,(select min(e.AddedOn) from ends e where e.id = s.id and e.AddedOn >= s.AddedOn) as EndDate
from starts s
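Here is a small Python sketch of the same starts/ends idea, assuming AddedOn carries no time part (the equality joins above rely on that too). A date is a start when the previous day is missing for that id, an end when the next day is missing, and each start is paired with the earliest end on or after it. The function name islands_from_edges is hypothetical.

```python
from datetime import date, timedelta

# Id 3 from the question's sample data: (id, AddedOn)
data = [
    (3, date(2015, 1, 1)), (3, date(2015, 1, 3)), (3, date(2015, 1, 5)),
    (3, date(2015, 1, 6)), (3, date(2015, 1, 8)), (3, date(2015, 1, 9)),
    (3, date(2015, 1, 10)),
]


def islands_from_edges(data):
    present = set(data)
    one_day = timedelta(days=1)
    # starts: no record for the same id on the previous day
    starts = sorted(r for r in present if (r[0], r[1] - one_day) not in present)
    # ends: no record for the same id on the next day
    ends = sorted(r for r in present if (r[0], r[1] + one_day) not in present)
    # Pair each start with the earliest end of the same id that is not before it
    return [
        (id_, start, min(e for eid, e in ends if eid == id_ and e >= start))
        for id_, start in starts
    ]


for row in islands_from_edges(data):
    print(row)
```

Every island has exactly one start and one end, so the min() always finds a match.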
