Counting blocks of continuous sequences in SQL - sql

Let's suppose this situation:
CAR TIME
A 1300
A 1301
A 1302
A 1315
A 1316
A 1317
A 1319
A 1320
B 1321
B 1322
I'd like to generate another column enumerating each trip made by each car.
We consider that a new trip starts every time there is a discontinuity in TIME.
CAR TIME TRIP
A 1300 1
A 1301 1
A 1302 1
A 1315 2
A 1316 2
A 1317 2
A 1319 3
A 1320 3
B 1321 1
B 1322 1
Is there some SQL function to obtain this count?
Thanks in advance.

You seem to want a cumulative approach:
select t.*, dense_rank() over (partition by car order by grp1) as trp
from (select t.*, sum(case when grp > 1 then 1 else 0 end) over (partition by car order by time) as grp1
from (select t.*, coalesce((time - lag(time) over (partition by car order by time)), 1) as grp
from table t
) t
) t;

I would use row_number() and subtraction to define the groups, then dense_rank():
select t.*,
dense_rank() over (partition by car order by time - seqnum) as trip
from (select t.*, row_number() over (partition by car order by time) as seqnum
from t
) t;
I cannot readily think of any alternative that uses fewer than 2 window functions -- or that would likely be faster using joins and group bys.
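To see why `time - seqnum` identifies a trip, here is a minimal plain-Python sketch of the same idea. It is not the SQL itself: the function name `number_trips` is made up for illustration, and it assumes TIME is an integer that increases by exactly 1 between consecutive readings and is unique per car.

```python
from itertools import groupby

def number_trips(rows):
    """rows: iterable of (car, time) pairs. Returns (car, time, trip) tuples.

    Assumes integer times that are unique per car (hypothetical helper).
    """
    result = []
    for car, car_rows in groupby(sorted(rows), key=lambda r: r[0]):
        keys_seen = {}  # (time - seqnum) -> trip number, in first-seen order
        for seqnum, (_, time) in enumerate(car_rows, start=1):
            # Inside an unbroken run, time and seqnum both grow by 1,
            # so their difference stays constant; a gap makes it jump.
            key = time - seqnum
            trip = keys_seen.setdefault(key, len(keys_seen) + 1)
            result.append((car, time, trip))
    return result

sample = [("A", 1300), ("A", 1301), ("A", 1302), ("A", 1315), ("A", 1316),
          ("A", 1317), ("A", 1319), ("A", 1320), ("B", 1321), ("B", 1322)]
trips = number_trips(sample)
```

Because the difference never decreases as time advances, numbering the keys in first-seen order gives the same result as dense_rank() ordered by that difference.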

Here is how I'd solve this problem:
with grp as (
select row_number() over (partition by CAR order by TIME) rn, a.CAR, a.TIME
from test a
where not exists (select * from test b
where a.CAR=b.CAR
and to_date(b.TIME, 'YYYYmmDDHH24MI')+1/(24*60) = to_date(a.TIME, 'YYYYmmDDHH24MI'))
)
select t.CAR, t.TIME, (
select max(rn) from grp where t.CAR=grp.CAR and grp.TIME <= t.TIME
) as trip
from test t
The main idea is to select the start time of each trip (this is done in the CTE grp), then use the row number as the trip identifier.
Sample fiddle http://sqlfiddle.com/#!4/6a327/10

Another approach:
SELECT t.car, t.time, MIN(t3.time)
FROM test t, test t3
WHERE NOT EXISTS (SELECT 1
FROM test t2
WHERE t2.car = t.car
AND t2.time = t.time - 1)
AND t3.car = t.car
AND t3.time >= t.time
AND NOT EXISTS (SELECT 1
FROM test t4
WHERE t4.car = t3.car
AND t4.time = t3.time + 1)
GROUP BY t.car, t.time
ORDER BY 1, 2;
The first NOT EXISTS finds all the rows that have no row for the same car in the previous minute, that is, the rows that begin a period for a car.
The second NOT EXISTS finds the rows that have no following row for the same car, i.e. the rows that end a period. The MIN function then picks the earliest of these (filtered to be greater than or equal to the start of the period in question), which gives the end of each period.

Combining some of the other ideas, including trips crossing an hour boundary but without converting to a date (in case that is significantly slowing things down), and allowing for repeated times in the same trip:
-- CTE for sample data
with your_table (car, time) as (
select 'A', 201808151259 from dual -- extra row to go across hour
union all select 'A', 201808151300 from dual
union all select 'A', 201808151301 from dual
union all select 'A', 201808151302 from dual
union all select 'A', 201808151315 from dual
union all select 'A', 201808151316 from dual
union all select 'A', 201808151317 from dual
union all select 'A', 201808151319 from dual
union all select 'A', 201808151319 from dual -- extra row for duplicate time
union all select 'A', 201808151320 from dual
union all select 'B', 201808151321 from dual
union all select 'B', 201808151322 from dual
)
-- actual query
select car,
time,
dense_rank() over (partition by car order by trip_start) as trip
from (
select car,
time,
max(case when lag_time = time
or lag_time = time - case when mod(time, 100) = 00 then 41 else 1 end
then null else time end
) over (partition by car order by time) as trip_start
from (
select car,
time,
lag(time) over (partition by car order by time) as lag_time
from your_table
)
)
order by car, time;
which gets
CAR TIME TRIP
--- ------------ ------------
A 201808151259 1
A 201808151300 1
A 201808151301 1
A 201808151302 1
A 201808151315 2
A 201808151316 2
A 201808151317 2
A 201808151319 3
A 201808151319 3
A 201808151320 3
B 201808151321 1
B 201808151322 1
The innermost query just gets the original data and the previous time value for each row using lag().
The next query out finds the trip start by treating duplicate and adjacent times - including over an hour boundary, via the nested case expression - as null, and then finding the highest value so far, which ignores the just-generated nulls by default. All contiguous runs of times end up with the same trip-start time:
select car,
time,
max(case when lag_time = time
or lag_time = time - case when mod(time, 100) = 00 then 41 else 1 end
then null else time end
) over (partition by car order by time) as trip_start
from (
select car,
time,
lag(time) over (partition by car order by time) as lag_time
from your_table
)
order by car, time;
CAR TIME TRIP_START
--- ------------ ------------
A 201808151259 201808151259
A 201808151300 201808151259
A 201808151301 201808151259
A 201808151302 201808151259
A 201808151315 201808151315
A 201808151316 201808151315
A 201808151317 201808151315
A 201808151319 201808151319
A 201808151319 201808151319
A 201808151320 201808151319
B 201808151321 201808151321
B 201808151322 201808151321
The outermost query then uses dense_rank() to give the trips consecutive numbering based on their trip-start times.
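The two steps can be followed in a short plain-Python sketch. This is only an illustration under assumptions: the function names `trip_starts` and `dense_rank_trips` are hypothetical, and the input is one car's times already sorted, in the same YYYYMMDDHH24MI integer form as the sample data.

```python
def trip_starts(times):
    """times: sorted YYYYMMDDHHMI integers for a single car."""
    starts = []
    current_start = None
    prev = None
    for t in times:
        # Stepping back one minute from HH00 lands on (HH-1)59,
        # which is 41 less in this integer encoding (1300 - 41 = 1259).
        step_back = 41 if t % 100 == 0 else 1
        continues = prev is not None and (prev == t or prev == t - step_back)
        if not continues:
            current_start = t  # a new trip begins at this row
        starts.append(current_start)
        prev = t
    return starts

def dense_rank_trips(times):
    """Number the trips 1, 2, 3, ... in order of their start time."""
    starts = trip_starts(times)
    rank = {s: i for i, s in enumerate(sorted(set(starts)), start=1)}
    return [rank[s] for s in starts]

car_a = [201808151259, 201808151300, 201808151301, 201808151302,
         201808151315, 201808151316, 201808151317,
         201808151319, 201808151319, 201808151320]
trip_numbers = dense_rank_trips(car_a)
```

Like the query, this sketch only bridges an hour boundary. A trip that crosses midnight would be split, because subtracting 41 from a time ending in 0000 does not give the previous day's 2359.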

Related

SQL Query to find the Row with first change of data

UniqueId ITEM DATE
1 A 2022-01-01
2 A 2022-01-02
3 B 2022-01-03
4 B 2022-01-04
5 A 2022-01-05
6 A 2022-01-06
7 B 2022-01-07
8 B 2022-01-08
9 A 2022-01-09
10 A 2022-01-10
11 A 2022-01-11
I have the above table, where the item changes from A to B and then from B to A (and so on).
The most recent item in the table, based on the date, is A (the last row).
I need to find the date on which this last item (A) came into effect.
In this case, item A has been in effect from 2022-01-09 onwards (UniqueId 9).
How can I find the UniqueId or the date of the row where item A came into effect (row 9)?
Thank you.
with data as (
select *,
last_value(item) over (order by "date" rows between unbounded preceding and unbounded following) as last_item,
lag(item) over (order by "date") as prev_item
from T
)
select
max(case when item = last_item and item <> prev_item then "date" end) as max_date
from data;
or
with data as (
select *,
case when item <> lag(item) over (order by "date")
and item = last_value(item) over (order by "date" rows between unbounded preceding and unbounded following)
then 1 end as flag
from T
)
select max("date") as last_transition_date
from data
where flag = 1;
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=bd5f6398c0167d74c26a67fafac5225e
Supposing you need all the data:
with data as (
select *,
case when item <> lag(item) over (order by "date")
and item = last_value(item) over (order by "date" rows between unbounded preceding and unbounded following)
then 1 end as flag
from T
)
select *,
max(case when flag = 1 then "date" end) over () as last_transition_date
from data;
Getting a flag by comparing the current item with the previous item in time, using LAG(), is indeed the way.
But it is sufficient to take the highest date and the highest unique ID (both ascend together) among the rows where that flag is 1:
WITH
-- your input
indata(UniqueId,ITEM,DATE) AS (
SELECT 1,'A',DATE '2022-01-01'
UNION ALL SELECT 2,'A',DATE '2022-01-02'
UNION ALL SELECT 3,'B',DATE '2022-01-03'
UNION ALL SELECT 4,'B',DATE '2022-01-04'
UNION ALL SELECT 5,'A',DATE '2022-01-05'
UNION ALL SELECT 6,'A',DATE '2022-01-06'
UNION ALL SELECT 7,'B',DATE '2022-01-07'
UNION ALL SELECT 8,'B',DATE '2022-01-08'
UNION ALL SELECT 9,'A',DATE '2022-01-09'
UNION ALL SELECT 10,'A',DATE '2022-01-10'
UNION ALL SELECT 11,'A',DATE '2022-01-11'
)
-- real query starts here; replace following comma with "WITH"
,
w_change_ind AS (
SELECT
*
, CASE WHEN LAG(item) OVER(ORDER BY date) <> item
THEN 1
ELSE 0
END AS chg_ind
FROM indata
)
SELECT
MAX(uniqueid) AS uqid
, MAX(date) AS dt
FROM w_change_ind
WHERE chg_ind=1
;
-- out uqid | dt
-- out ------+------------
-- out 9 | 2022-01-09
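For readers who want to trace the logic outside the database, here is a minimal plain-Python sketch of the same "flag each change, keep the last one" idea. The function name `last_transition` is hypothetical, dates are written as ISO strings for brevity, and the rows are assumed to be sorted by date already.

```python
def last_transition(rows):
    """rows: list of (unique_id, item, date) sorted by date.

    Returns (unique_id, date) of the last row whose item differs from
    the previous row's item, or None if the item never changes.
    """
    last = None
    prev_item = None
    for i, (uid, item, dt) in enumerate(rows):
        # Mirrors LAG(item) <> item: the first row has no previous item,
        # so it is never flagged as a change.
        if i > 0 and item != prev_item:
            last = (uid, dt)
        prev_item = item
    return last

indata = [
    (1, "A", "2022-01-01"), (2, "A", "2022-01-02"), (3, "B", "2022-01-03"),
    (4, "B", "2022-01-04"), (5, "A", "2022-01-05"), (6, "A", "2022-01-06"),
    (7, "B", "2022-01-07"), (8, "B", "2022-01-08"), (9, "A", "2022-01-09"),
    (10, "A", "2022-01-10"), (11, "A", "2022-01-11"),
]
found = last_transition(indata)
```

The last flagged row is necessarily the start of the final run, which is why no separate comparison against the most recent item is needed.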
Based on your description, this is one way to do what you want.
select top 1 * from table1
where item ='A'
order by uniqueid desc
If this is not what you want, then you will have to provide additional information.

Oracle SQL - Return rows where there is at least one row in a group for the current month and there has been a change in class in a previous month

I am trying to output rows that meet the following conditions:
At least one row for the ClientID must be in the current month (only interested in the most recent row for the Client ID in that month)
The class in current month for the ClientID is different to the immediately previous row from an earlier month for the ClientID
My data can have multiple rows per client per month and I am only interested in the latest row per month per client.
Here is a sample of my data:
ID Client ID Class Date
14609 87415 C 04/DEC/18
13859 87415 X 16/AUG/18
11906 87415 C 27/FEB/17
10667 87415 B 23/JAN/17
14538 132595 D 03/DEC/18
14567 141805 C 04/DEC/18
14411 141805 A 27/NOV/18
Desired Output based on the above is:
ID Client ID Class Date
14609 87415 C 04/DEC/18
13859 87415 X 16/AUG/18
14567 141805 C 04/DEC/18
14411 141805 A 27/NOV/18
I have had multiple attempts at this with zero success. Any help would be greatly appreciated. My attempts have not been able to find the immediately previous row. :/
select * from
(
select drh.defaultriskhistid, drh.clientid, cv.description,
drh.updatetimestamp
from default_risk_history drh
inner join code_values cv on drh.defaultriskcodeid = cv.codevalueid
where
defaultriskhistid in
(select max(defaultriskhistid) from default_risk_history
group by clientid, ltrim(TO_CHAR(updatetimestamp,'mm-yyyy'),'0'))
) t
where
(
Select count(*) from default_risk_history drh1 where drh1.clientid =
t.clientid and ltrim(TO_CHAR(drh1.updatetimestamp,'mm-yyyy'),'0') =
ltrim(TO_CHAR(current_date,'mm-yyyy'),'0')
) >=1
order by clientid, updatetimestamp desc
You seem to want the two most recent rows, if they have different classes and the most recent one is in the current month. If so:
select t.*
from (select t.*,
max(date) over (partition by clientid) as max_date,
lag(class) over (partition by clientid order by date) as prev_class,
lead(class) over (partition by clientid order by date) as next_class,
row_number() over (partition by clientid order by date desc) as seqnum
from t
) t
where max_date >= trunc(sysdate, 'MON') and
( (seqnum = 1 and prev_class <> class) or
(seqnum = 2 and next_class <> class)
);
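The filter in that query can be restated as a small plain-Python sketch, which may make the two conditions easier to check against the sample data. The function name `changed_this_month` and the `month_start` parameter (standing in for trunc(sysdate, 'MON')) are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

def changed_this_month(rows, month_start):
    """rows: (row_id, client_id, cls, dt) tuples.

    For each client, returns the two most recent rows when the newest one
    falls on or after month_start and the two classes differ.
    """
    by_client = defaultdict(list)
    for row in rows:
        by_client[row[1]].append(row)
    result = []
    for client_id in sorted(by_client):
        newest_first = sorted(by_client[client_id], key=lambda r: r[3], reverse=True)
        if len(newest_first) < 2:
            continue  # no previous row to compare against
        latest, previous = newest_first[0], newest_first[1]
        if latest[3] >= month_start and latest[2] != previous[2]:
            result.extend([latest, previous])
    return result

sample = [
    (14609, 87415, "C", date(2018, 12, 4)),
    (13859, 87415, "X", date(2018, 8, 16)),
    (11906, 87415, "C", date(2017, 2, 27)),
    (10667, 87415, "B", date(2017, 1, 23)),
    (14538, 132595, "D", date(2018, 12, 3)),
    (14567, 141805, "C", date(2018, 12, 4)),
    (14411, 141805, "A", date(2018, 11, 27)),
]
picked = changed_this_month(sample, month_start=date(2018, 12, 1))
```

Client 132595 drops out because it has only one row, which matches the SQL: comparing a class with a NULL prev_class is never true.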
Here's one option:
SQL> alter session set nls_date_format = 'dd.mm.yyyy';
Session altered.
SQL> with test (client_id, class, datum) as
2 (select 87415, 'c', date '2018-12-04' from dual union all
3 select 87415, 'x', date '2018-08-16' from dual union all
4 select 87415, 'c', date '2017-02-27' from dual union all
5 select 87415, 'b', date '2017-01-23' from dual union all
6 --
7 select 132595, 'd', date '2018-12-03' from dual union all
8 select 141805, 'c', date '2018-12-04' from dual union all
9 select 141805, 'a', date '2018-11-27' from dual
10 ),
11 inter as
12 (select client_id,
13 class,
14 datum,
15 lag(class) over (partition by client_id order by datum desc) prev_class,
16 row_number() over (partition by client_id order by datum desc) rn
17 from test
18 )
19 select client_id, class, datum
20 from inter
21 where (class <> prev_class or prev_class is null)
22 and client_id in (select client_id from inter
23 group by client_id
24 having max(rn) >= 2
25 )
26 and rn <= 2
27 order by client_id, datum desc;
CLIENT_ID C DATUM
---------- - ----------
87415 c 04.12.2018
87415 x 16.08.2018
141805 c 04.12.2018
141805 a 27.11.2018
SQL>

Oracle SQL loop LEAD() through partition

I have a set that looks something like this
ID date_IN date_out
1 1/1/18 1/2/18
1 1/3/18 1/4/18
1 1/5/18 1/8/18
2 1/1/18 1/5/18
2 1/7/18 1/9/18
I began by
SELECT ID, date_IN, Date_out, lead(date_out) over ( partition by (ID)
order by ID) as next_out
From table
And get something like this...
ID date_IN date_out next_out
1 1/1/18 1/2/18 1/4/18
1 1/3/18 1/4/18 1/8/18
1 1/5/18 1/8/18 Null
2 1/1/18 1/5/18 1/9/18
2 1/7/18 1/9/18 Null
The problem I'm going to have is that in my actual data many of the IDs have A LOT of entries. The goal is to have all of the date_outs appear on one row per ID:
ID date_IN date_out next_out next_out1 etc. etc.
1 1/1/18 1/2/18 1/4/18 1/8/18 X X
2 1/1/18 1/5/18 1/7/18 X Null Null
Is there a way to loop lead() through the entire partition ordered by ID, drop everything but the first row, and then move on to the next ID?
Here is one approach, which assumes that you only expect to have a maximum of three date pairs per ID. You may assign a row number and then aggregate by ID:
WITH cte AS (
SELECT ID, date_IN, date_out,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date_IN) rn
FROM yourTable
)
SELECT
ID,
MAX(CASE WHEN rn = 1 THEN date_IN END) AS date_IN,
MAX(CASE WHEN rn = 1 THEN date_out END) AS date_out,
MAX(CASE WHEN rn = 2 THEN date_IN END) AS next_in_1,
MAX(CASE WHEN rn = 2 THEN date_out END) AS next_out_1,
MAX(CASE WHEN rn = 3 THEN date_IN END) AS next_in_2,
MAX(CASE WHEN rn = 3 THEN date_out END) AS next_out_2
FROM cte
GROUP BY ID
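The row-number-then-pivot idea can also be traced in plain Python. This is a hedged sketch rather than a translation of the query: `pivot_stays` and `max_pairs` are hypothetical names, and dates are ISO strings so that they sort chronologically.

```python
from collections import defaultdict

def pivot_stays(rows, max_pairs=3):
    """rows: (id, date_in, date_out) tuples.

    Returns {id: [in_1, out_1, in_2, out_2, ...]} padded with None up to
    max_pairs pairs, mirroring one MAX(CASE WHEN rn = k ...) per column.
    """
    by_id = defaultdict(list)
    for id_, date_in, date_out in rows:
        by_id[id_].append((date_in, date_out))
    result = {}
    for id_, pairs in by_id.items():
        pairs.sort()  # same order as ROW_NUMBER() ... ORDER BY date_IN
        flat = []
        for rn in range(max_pairs):
            flat.extend(pairs[rn] if rn < len(pairs) else (None, None))
        result[id_] = flat
    return result

stays = [
    (1, "2018-01-01", "2018-01-02"),
    (1, "2018-01-03", "2018-01-04"),
    (1, "2018-01-05", "2018-01-08"),
    (2, "2018-01-01", "2018-01-05"),
    (2, "2018-01-07", "2018-01-09"),
]
pivoted = pivot_stays(stays)
```

As with the SQL, the number of output columns is fixed up front, so any pairs beyond `max_pairs` are silently dropped.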
There is no need for a loop; use the offset argument of lead() instead. The following is lifted from the documentation:
offset
Optional. It is the physical offset from the current row in the table.
If this parameter is omitted, the default is 1.
For example:
lead(date_out) returns the value from the next row
lead(date_out, 2) returns the value from the 2nd row after the current row
lead(date_out, 3) returns the value from the 3rd row after the current row, and so on.
In your code, use the snippet below:
lead(date_out) over ( partition by (ID) order by date_IN) as next_out,
lead(date_out, 2) over ( partition by (ID) order by date_IN) as next_out2,
lead(date_out, 3) over ( partition by (ID) order by date_IN) as next_out3
WITH TAB AS(
SELECT 1 ID, CAST('2018/01/01' AS DATE) DATE_IN, CAST('2018/01/02' AS DATE) DATE_OUT FROM DUAL
UNION
SELECT 1, CAST('2018/01/03' AS DATE) , CAST('2018/01/04' AS DATE) FROM DUAL
UNION
SELECT 1, CAST('2018/01/05' AS DATE) , CAST('2018/01/08' AS DATE) FROM DUAL
UNION
SELECT 1, CAST('2018/01/09' AS DATE) , CAST('2018/01/10' AS DATE) FROM DUAL
UNION
SELECT 1, CAST('2018/01/11' AS DATE) , CAST('2018/01/12' AS DATE) FROM DUAL
UNION
SELECT 2, CAST('2018/01/01' AS DATE) , CAST('2018/01/05' AS DATE) FROM DUAL
UNION
SELECT 2, CAST('2018/01/07' AS DATE) , CAST('2018/01/09' AS DATE) FROM DUAL
) --select * from tab;
, LEAF_CALC AS( --CONNECTING THE DATE_OUTS
SELECT
ID
,SYS_CONNECT_BY_PATH(DATE_OUT, '$') HRCHY
, LEVEL LVL
, CONNECT_BY_ISLEAF ISLEAF
FROM TAB
CONNECT BY PRIOR DATE_OUT < DATE_IN
START WITH ID = 1
) --SELECT * FROM LEAF_CALC;
, DATA_SORT AS( --ADDING ALL DATE_OUTS IN 1 ROW
SELECT
P.ID, P.HRCHY
FROM LEAF_CALC P,
(SELECT ID, MAX(LVL) MAXLVL FROM
LEAF_CALC
GROUP BY ID) C
WHERE P.ID = C.ID
AND P.LVL = C.MAXLVL
)--SELECT * FROM DATA_SORT
--SEGREGATING ALL DATES USING REGEXP_SUBSTR
SELECT
ID
, REGEXP_SUBSTR(HRCHY, '[^$]+', 1, 1) DATE_IN
, REGEXP_SUBSTR(HRCHY, '[^$]+', 1, 2) NEXT_OUT
, REGEXP_SUBSTR(HRCHY, '[^$]+', 1, 3) NEXT_OUT2
, COALESCE(REGEXP_SUBSTR(HRCHY, '[^$]+', 1, 4), 'NA') NEXT_OUT3
, COALESCE(REGEXP_SUBSTR(HRCHY, '[^$]+', 1, 5), 'NA') NEXT_OUT4
FROM DATA_SORT;

SQL Connect clause - generate all data by dates

The data in my table is stored by effective date. Can you please help me with an Oracle SQL statement that replicates the 8/1 data onto 8/2, 8/3, 8/4 and repeats the 8/5 value after?
DATE VALUE1 VALUE2
8/1/2017 x 1
8/1/2017 x 2
8/7/2017 y 4
8/7/2017 x 3
Desired output :
DATE VALUE1 VALUE2
8/1/2017 x 1
8/1/2017 x 2
8/2/2017 x 1
8/2/2017 x 2
... repeat to 8/6
8/7/2017 y 4
8/7/2017 x 3
8/8/2017 y 4
8/8/2017 x 3
... repeat to sysdate - 1
Here is one way to do this. It's not the most elegant or efficient, but it is the most elementary way I could think of (short of really inefficient things like correlated subqueries which can't be unwound easily to joins).
In the first subquery, aliased as a, I create all the needed dates. In the second subquery, b, I create the date ranges for which we will need to repeat specific rows (in the test data, I allow the number of rows which must be repeated to be variable, to make one of the subtleties of the problem more evident).
With these in hand, it's easy to get the result by joining these two subqueries and the original data. Alas, this approach requires reading the base table three times; hopefully you don't have too much data to process.
with
inputs ( dt, val1, val2 ) as (
select date '2017-08-14', 'x', 1 from dual union all
select date '2017-08-14', 'x', 2 from dual union all
select date '2017-08-17', 'y', 4 from dual union all
select date '2017-08-17', 'x', 3 from dual union all
select date '2017-08-19', 'a', 5 from dual
)
-- End of simulated inputs (for testing purposes only, not part of the solution).
-- Use your actual table and column names in the SQL query below.
select a.dt, i.val1, i.val2
from (
select min_dt + level - 1 as dt
from ( select min(dt) as min_dt from inputs )
connect by level <= sysdate - min_dt
) a
join
(
select dt, lead(dt, 1, sysdate) over (order by dt) as lead_dt
from (select distinct dt from inputs)
) b
on a.dt >= b.dt and a.dt < b.lead_dt
join
inputs i on i.dt = b.dt
order by dt, val1, val2
;
Output:
DT VAL1 VAL2
---------- ---- ----
2017-08-14 x 1
2017-08-14 x 2
2017-08-15 x 1
2017-08-15 x 2
2017-08-16 x 1
2017-08-16 x 2
2017-08-17 x 3
2017-08-17 y 4
2017-08-18 x 3
2017-08-18 y 4
2017-08-19 a 5
2017-08-20 a 5
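The effect of the join (each calendar day picks up the rows of the latest effective date on or before it) can be sketched in plain Python. The names `fill_forward` and `end` are assumptions for illustration; `end` plays the role of sysdate and is exclusive, and the input is assumed to be non-empty.

```python
from datetime import date, timedelta

def fill_forward(rows, end):
    """rows: (dt, val1, val2) tuples; end: exclusive end date.

    Repeats each effective date's rows on every following day until the
    next effective date (or until end) is reached.
    """
    by_date = {}
    for dt, v1, v2 in rows:
        by_date.setdefault(dt, []).append((v1, v2))
    out = []
    current = None
    day = min(by_date)  # start at the earliest effective date
    while day < end:
        if day in by_date:
            current = by_date[day]  # a new effective date takes over
        for v1, v2 in sorted(current):
            out.append((day, v1, v2))
        day += timedelta(days=1)
    return out

inputs = [
    (date(2017, 8, 14), "x", 1), (date(2017, 8, 14), "x", 2),
    (date(2017, 8, 17), "y", 4), (date(2017, 8, 17), "x", 3),
    (date(2017, 8, 19), "a", 5),
]
filled = fill_forward(inputs, end=date(2017, 8, 21))
```

With `end` set to 2017-08-21 the sketch yields the same twelve rows as the output listed above.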
You want to make use of the LAST_VALUE analytic function, something like this:
select
fakedate,
CASE
WHEN flip=1 THEN
LAST_VALUE(yourvalue1rown1 IGNORE NULLS) OVER(ORDER BY fakedate)
ELSE
LAST_VALUE(yourvalue1rown2 IGNORE NULLS) OVER(ORDER BY fakedate)
END as lastvalue1,
CASE
WHEN flip=1 THEN
LAST_VALUE(yourvalue2rown1 IGNORE NULLS) OVER(ORDER BY fakedate)
ELSE
LAST_VALUE(yourvalue2rown2 IGNORE NULLS) OVER(ORDER BY fakedate)
END as lastvalue2
from
(select
fakedate, flip,
CASE WHEN rown = 1 THEN yourvalue1 END as yourvalue1rown1,
CASE WHEN rown = 2 THEN yourvalue1 END as yourvalue1rown2,
CASE WHEN rown = 1 THEN yourvalue2 END as yourvalue2rown1,
CASE WHEN rown = 2 THEN yourvalue2 END as yourvalue2rown2
from
(select (sysdate - 100) + trunc(rownum/2) fakedate, mod(rownum, 2)+1 as flip from dual connect by level <= 100) fakedates
left outer join
(select yt.*, row_number() over(partition by yourdate order by yourvalue1) as rown from yourtable yt) yourtable
on
fakedate = yourdate and flip = rown
)
You'll have to adjust the column names to match your table. You'll also have to adjust the 100 to reflect how many days back you need to go to reach the start of your date data.
Please note this is untested (SQLFiddle is having some Oracle issues for me at the moment), so if you get any syntax errors or other minor things you can't fix, comment and I'll address them.

SQL Server : group by sum of column

I need to aggregate data by one column which contains numeric data.
I have data like:
ID | Amount
---+-------
1 | 44
2 | 15
3 | 16
4 | 8
5 | 16
Result, which I expect is:
ID | Amount
---+-------
1 | 44
2 | 31
4 | 24
The query should walk the rows in ID order and group them so that the sum of Amount in each group does not exceed 32. A single amount greater than 32 should form a group of its own. The result should contain MIN(ID) and SUM(Amount) for each group; the sum can exceed 32 only when the group holds a single record.
The only way that I know how to accomplish this is using iteration (although in your case if you have enough single values over 32, then you might be able to use a more efficient approach).
Iteration in SQL Server queries is handled by recursive CTEs (once you forswear cursors):
with v as (
select *
from (values (1, 44), (2, 15), (3, 16), (4, 8), (5, 16) ) v(id, amount)
),
t as (
select v.*, row_number() over (order by id) as seqnum
from v
),
cte as (
select seqnum, id, amount, id as grp
from t
where seqnum = 1
union all
select t.seqnum, t.id,
(case when t.amount + cte.amount > 32 then t.amount else t.amount + cte.amount end) as amount,
(case when t.amount + cte.amount > 32 then t.id else cte.grp end) as grp
from cte join
t
on t.seqnum = cte.seqnum + 1
)
select grp, max(amount)
from cte
group by grp;
I should note that the use of max(amount) in the outer query assumes that the values are never negative. A slight modification can handle that situation.
Also, the intermediate result using t is not strictly necessary for the data you have provided. It ensures that the columns used in the join actually have no gaps.
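The iteration that the recursive CTE performs is a single pass with a running sum, which is easy to see in a plain-Python sketch. The function name `group_by_cap` and the `cap` parameter are hypothetical, and the rows are assumed to arrive ordered by id.

```python
def group_by_cap(rows, cap=32):
    """rows: (id, amount) pairs ordered by id.

    Returns (first_id, total) per group. A row joins the current group
    only if the running total stays at or below cap; otherwise it starts
    a new group, even when its own amount already exceeds cap.
    """
    groups = []  # each entry is [first_id, running_total]
    for id_, amount in rows:
        if groups and groups[-1][1] + amount <= cap:
            groups[-1][1] += amount
        else:
            groups.append([id_, amount])
    return [tuple(g) for g in groups]

data = [(1, 44), (2, 15), (3, 16), (4, 8), (5, 16)]
grouped = group_by_cap(data)
```

Each step depends on the total carried over from the previous row, which is why a plain window function cannot express it and the SQL version needs recursion.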
You can try this version: row numbers are assigned first, and each row is joined to the previous one in a recursive CTE. A new group starts whenever the running sum would exceed 32.
with rownums as (select t.*,row_number() over(order by id) as rnum from t)
,cte(rnum,id,amount,runningsum,grp) as (select rnum,id,amount,amount,1 from rownums where rnum=1
union all
select t.rnum,t.id,t.amount
,case when c.runningsum+t.amount > 32 then t.amount else c.runningsum+t.amount end
,case when c.runningsum+t.amount > 32 then t.id else c.grp end
from cte c
join rownums t on t.rnum=c.rnum+1
)
select grp as id,max(runningsum) as amount
from cte
group by grp