How to detect an interval between consecutive rows? - sql

Consider the following rows:
| Id | RecordedOn |
|----|------------|
| 1 | 9/3/19 11:15:00 |
| 2 | 9/3/19 11:15:01 |
| 3 | 9/3/19 11:15:02 |
| 4 | 9/3/19 11:18:55 |
| 5 | 9/3/19 11:18:01 |
As you can see, there are typically records every second, but from row 3 to row 4, there is a gap.
How do I find gaps like these? Preferably I'd like the starting and ending row of the gap, so 3, 4 in this case.

If you want both the before and after rows, use lag() and lead():
select t.*
from (select t.*,
lag(recordedon) over (order by recordedon) as prev_ro,
lead(recordedon) over (order by recordedon) as next_ro
from t
) t
where prev_ro < dateadd(second, -1, recordedon) or
next_ro > dateadd(second, 1, recordedon);
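To try the LAG()/LEAD() idea without SQL Server, here is a minimal sketch using Python's built-in sqlite3 module (assuming SQLite 3.25 or later, which is when window functions arrived). SQLite has no DATEADD, so the sketch assumes ISO-8601 text timestamps and uses datetime(x, '-1 second') in its place; the table name t follows the answer.

```python
import sqlite3

# Sample data from the question. Timestamps are stored as ISO-8601 text
# (an assumption for this sketch) so that SQLite's datetime() can shift them.
ROWS = [
    (1, "2019-09-03 11:15:00"),
    (2, "2019-09-03 11:15:01"),
    (3, "2019-09-03 11:15:02"),
    (4, "2019-09-03 11:18:55"),
    (5, "2019-09-03 11:18:01"),
]

# LAG/LEAD version; datetime(x, '-1 second') stands in for DATEADD(second, -1, x).
GAP_QUERY = """
SELECT id
FROM (SELECT t.*,
             LAG(recordedon)  OVER (ORDER BY recordedon) AS prev_ro,
             LEAD(recordedon) OVER (ORDER BY recordedon) AS next_ro
      FROM t) AS t
WHERE prev_ro < datetime(recordedon, '-1 second')
   OR next_ro > datetime(recordedon, '+1 second')
ORDER BY recordedon
"""


def gap_rows(rows):
    """Return the ids on either side of a gap longer than one second."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, recordedon TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return [row[0] for row in con.execute(GAP_QUERY)]


print(gap_rows(ROWS))  # [3, 5, 4]
```

Row 3 (before the big gap) and row 5 (after it) are returned, and so is row 4, because in the sample data row 5 is timestamped 54 seconds before row 4.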

SELECT *, DATEDIFF(second, previous, [RecordedOn]) as diff
FROM (
SELECT [Id], [RecordedOn], LAG([RecordedOn]) OVER (ORDER BY [RecordedOn]) previous
FROM t
) t
OUTPUT
| Id | RecordedOn | previous | diff |
|----|----------------------|----------------------|--------|
| 1 | 2019-09-03T11:15:00Z | (null) | (null) |
| 2 | 2019-09-03T11:15:01Z | 2019-09-03T11:15:00Z | 1 |
| 3 | 2019-09-03T11:15:02Z | 2019-09-03T11:15:01Z | 1 |
| 5 | 2019-09-03T11:18:01Z | 2019-09-03T11:15:02Z | 179 |
| 4 | 2019-09-03T11:18:55Z | 2019-09-03T11:18:01Z | 54 |
You can also use LAG() to get the previous Id if you need it.
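A sketch of the same LAG()/difference approach in SQLite via Python's sqlite3, this time also pulling the previous Id with LAG() and keeping only rows whose difference exceeds one second, so each result row names the start and end of a gap. Subtracting strftime('%s', ...) values (Unix-epoch seconds) is an assumed stand-in for DATEDIFF(second, ...), and the timestamps are assumed to be ISO-8601 text.

```python
import sqlite3

# Question's rows, with timestamps as ISO-8601 text (assumed format).
ROWS = [
    (1, "2019-09-03 11:15:00"),
    (2, "2019-09-03 11:15:01"),
    (3, "2019-09-03 11:15:02"),
    (4, "2019-09-03 11:18:55"),
    (5, "2019-09-03 11:18:01"),
]

# strftime('%s', x) yields epoch seconds as text; the CASTs make the
# subtraction an integer number of seconds, like DATEDIFF(second, ...).
DIFF_QUERY = """
SELECT prev_id, id, diff
FROM (SELECT id,
             recordedon,
             LAG(id) OVER (ORDER BY recordedon) AS prev_id,
             CAST(strftime('%s', recordedon) AS INTEGER)
               - CAST(strftime('%s', LAG(recordedon) OVER (ORDER BY recordedon)) AS INTEGER)
               AS diff
      FROM t) AS x
WHERE diff > 1          -- the first row has diff NULL and is dropped here
ORDER BY recordedon
"""


def gaps_with_prev_id(rows):
    """Return (previous id, id, seconds between them) for every gap > 1 s."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, recordedon TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return con.execute(DIFF_QUERY).fetchall()


print(gaps_with_prev_id(ROWS))  # [(3, 5, 179), (5, 4, 54)]
```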

You could self-join the table with a LEFT JOIN (the anti-join pattern) to list the records for which no record exists 1 second later, like this:
SELECT t.id
FROM mytable t
LEFT JOIN mytable t1 ON t1.RecordedOn = DATEADD(second, 1, t.RecordedOn)
WHERE t1.id IS NULL
Result:
| id |
| -: |
| 3 |
| 4 |
| 5 |
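The anti-join can be checked the same way. This sketch assumes SQLite (through Python's sqlite3) and ISO-8601 text timestamps with whole seconds: the join condition is an exact equality, so timestamps carrying fractional seconds would not match reliably. The latest row (Id 4) always appears, since nothing can follow it.

```python
import sqlite3

# Question's rows, with timestamps as ISO-8601 text (assumed format).
ROWS = [
    (1, "2019-09-03 11:15:00"),
    (2, "2019-09-03 11:15:01"),
    (3, "2019-09-03 11:15:02"),
    (4, "2019-09-03 11:18:55"),
    (5, "2019-09-03 11:18:01"),
]

# LEFT JOIN to the row exactly one second later; keep rows where none was found.
ANTI_JOIN_QUERY = """
SELECT t.id
FROM mytable AS t
LEFT JOIN mytable AS t1
       ON t1.recordedon = datetime(t.recordedon, '+1 second')
WHERE t1.id IS NULL
ORDER BY t.id
"""


def missing_next_second(rows):
    """Return ids that have no row exactly one second after them."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, recordedon TEXT)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    return [row[0] for row in con.execute(ANTI_JOIN_QUERY)]


print(missing_next_second(ROWS))  # [3, 4, 5]
```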

Related

Get some values from the table by selecting

I have a table:
| id | Number | Address |
|----|--------|---------|
| 1 | 0 | NULL |
| 1 | 1 | NULL |
| 1 | 2 | 50 |
| 1 | 3 | NULL |
| 2 | 0 | 10 |
| 3 | 1 | 30 |
| 3 | 2 | 20 |
| 3 | 3 | 20 |
| 4 | 0 | 75 |
| 4 | 1 | 22 |
| 4 | 2 | 30 |
| 5 | 0 | NULL |
I need to get the NUMBER of the last ADDRESS change for each ID.
I wrote this query:
select dh.id, dh.number from table dh where dh =
(select max(min(t.history)) from table t where t.id = dh.id group by t.address)
But this query does not correctly handle the case where the address first changes and then changes back to a previous value. For example, for id=1 the GROUP BY returns:
| Number |
| -------- |
| NULL |
| 50 |
I have been thinking about this query for several days, and I would be happy to receive any help.
You can do this using row_number() -- twice:
select t.id, min(number)
from (select t.*,
row_number() over (partition by id order by number desc) as seqnum1,
row_number() over (partition by id, address order by number desc) as seqnum2
from t
) t
where seqnum1 = seqnum2
group by id;
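What this does is number the rows by number in descending order twice: once per id, and once per id and address. The two sequence numbers are equal exactly for the rows in the most recent unbroken run of the same address (counting back from the highest number); as soon as an earlier row with a different address appears, seqnum1 pulls ahead of seqnum2 for that row and every row before it. The aggregation then pulls back the earliest row of that final run, which is where the last address change happened.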
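A sketch of the double ROW_NUMBER() answer run against the question's data in SQLite, through Python's sqlite3 (the table name t is an assumption). The expected mapping in the comment was traced by hand: for id 3, rows 3 and 2 share address 20 while row 1 has 30, so the result is 2.

```python
import sqlite3

# (id, number, address) rows from the question; None stands for NULL.
ROWS = [
    (1, 0, None), (1, 1, None), (1, 2, 50), (1, 3, None),
    (2, 0, 10),
    (3, 1, 30), (3, 2, 20), (3, 3, 20),
    (4, 0, 75), (4, 1, 22), (4, 2, 30),
    (5, 0, None),
]

# seqnum1 counts back per id, seqnum2 per (id, address); they agree only on
# the trailing run of the latest address, and MIN(number) is its first row.
QUERY = """
SELECT id, MIN(number) AS last_change
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY number DESC) AS seqnum1,
             ROW_NUMBER() OVER (PARTITION BY id, address ORDER BY number DESC) AS seqnum2
      FROM t) AS t
WHERE seqnum1 = seqnum2
GROUP BY id
ORDER BY id
"""


def last_change(rows):
    """Map each id to the number at which its address last changed."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, number INTEGER, address INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    return dict(con.execute(QUERY).fetchall())


print(last_change(ROWS))  # {1: 3, 2: 0, 3: 2, 4: 2, 5: 0}
```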
I answered my own question; in case anyone needs it, here is my solution:
select * from table dh1 where dh1.number = (
select max(x.number)
from (
select
dh2.id, dh2.number, dh2.address, lag(dh2.address) over(order by dh2.number asc) as prev
from table dh2 where dh1.id=dh2.id
) x
where NVL(x.address, 0) <> NVL(x.prev, 0)
);
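The self-answer's correlated subquery can also be written with LAG() partitioned by id. This SQLite sketch (via Python's sqlite3; the table name t is an assumption) swaps NVL(..., 0) for SQLite's null-safe IS NOT, since a sentinel of 0 would treat a genuine address of 0 the same as NULL. Like the original, it returns nothing for an id whose address is never anything but NULL (id 5 here), whereas the ROW_NUMBER() answer returns 0 for it.

```python
import sqlite3

# (id, number, address) rows from the question; None stands for NULL.
ROWS = [
    (1, 0, None), (1, 1, None), (1, 2, 50), (1, 3, None),
    (2, 0, 10),
    (3, 1, 30), (3, 2, 20), (3, 3, 20),
    (4, 0, 75), (4, 1, 22), (4, 2, 30),
    (5, 0, None),
]

# A row counts as a change when its address differs from the previous row's
# address within the same id; IS NOT treats two NULLs as equal.
QUERY = """
SELECT id, MAX(number) AS last_change
FROM (SELECT id, number, address,
             LAG(address) OVER (PARTITION BY id ORDER BY number) AS prev
      FROM t) AS x
WHERE address IS NOT prev
GROUP BY id
ORDER BY id
"""


def last_change_lag(rows):
    """Map each id to the highest number where the address changed."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, number INTEGER, address INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    return dict(con.execute(QUERY).fetchall())


print(last_change_lag(ROWS))  # {1: 3, 2: 0, 3: 2, 4: 2}
```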

SQL SERVER How to select the latest record in each group? [duplicate]

This question already has answers here:
Get top 1 row of each group
| ID | TimeStamp | Item |
|----|-----------|------|
| 1 | 0:00:20 | 0 |
| 1 | 0:00:40 | 1 |
| 1 | 0:01:00 | 1 |
| 2 | 0:01:20 | 1 |
| 2 | 0:01:40 | 0 |
| 2 | 0:02:00 | 1 |
| 3 | 0:02:20 | 1 |
| 3 | 0:02:40 | 1 |
| 3 | 0:03:00 | 0 |
I have this table and I would like to turn it into:
| ID | TimeStamp | Item |
|----|-----------|------|
| 1 | 0:01:00 | 1 |
| 2 | 0:02:00 | 1 |
| 3 | 0:03:00 | 0 |
Please advise, thank you!
A correlated subquery is often the fastest method:
select t.*
from t
where t.timestamp = (select max(t2.timestamp)
from t t2
where t2.id = t.id
);
For this, you want an index on (id, timestamp).
You can also use row_number():
select t.*
from (select t.*,
row_number() over (partition by id order by timestamp desc) as seqnum
from t
) t
where seqnum = 1;
This is typically a wee bit slower because it needs to assign the row number to every row, even those not being returned.
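A sketch comparing the correlated subquery and the ROW_NUMBER() version on the question's data in SQLite, via Python's sqlite3, including the suggested (id, timestamp) index. The TimeStamp column is renamed ts here (an assumption), and the H:MM:SS strings sort correctly only because they all have the same width.

```python
import sqlite3

# (id, ts, item) rows from the question; ts is an assumed name for TimeStamp.
ROWS = [
    (1, "0:00:20", 0), (1, "0:00:40", 1), (1, "0:01:00", 1),
    (2, "0:01:20", 1), (2, "0:01:40", 0), (2, "0:02:00", 1),
    (3, "0:02:20", 1), (3, "0:02:40", 1), (3, "0:03:00", 0),
]

CORRELATED = """
SELECT t.id, t.ts, t.item
FROM t
WHERE t.ts = (SELECT MAX(t2.ts) FROM t AS t2 WHERE t2.id = t.id)
ORDER BY t.id
"""

ROW_NUMBER = """
SELECT id, ts, item
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS seqnum
      FROM t) AS x
WHERE seqnum = 1
ORDER BY id
"""


def latest_per_id(rows, query):
    """Load the rows into an in-memory table and run one of the queries."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, ts TEXT, item INTEGER)")
    # Composite index that lets the correlated MAX() seek per id.
    con.execute("CREATE INDEX idx_t_id_ts ON t (id, ts)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    return con.execute(query).fetchall()


print(latest_per_id(ROWS, CORRELATED))
print(latest_per_id(ROWS, ROW_NUMBER))
```

Both queries should print [(1, '0:01:00', 1), (2, '0:02:00', 1), (3, '0:03:00', 0)].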
Partition by ID and order by TimeStamp descending, so that an analytic function in the subquery ranks the latest record of each group as 1; then keep only the rows ranked 1:
SELECT *
FROM
(
SELECT *,
DENSE_RANK() OVER (PARTITION BY ID ORDER BY TimeStamp DESC) AS dr
FROM t
) t
WHERE t.dr = 1
DENSE_RANK() is used here (rather than ROW_NUMBER()) so that records tied on the latest TimeStamp are all returned.
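To see what DENSE_RANK() changes, this sketch adds one hypothetical row that ties with Id 3's latest TimeStamp (it is not in the question's data). DENSE_RANK() returns both tied rows for Id 3, while ROW_NUMBER() keeps just one of them, chosen arbitrarily. It runs in SQLite through Python's sqlite3; the ts column name is an assumption.

```python
import sqlite3

# Question's rows plus one HYPOTHETICAL row tied with id 3's latest ts.
ROWS_WITH_TIE = [
    (1, "0:00:20", 0), (1, "0:00:40", 1), (1, "0:01:00", 1),
    (2, "0:01:20", 1), (2, "0:01:40", 0), (2, "0:02:00", 1),
    (3, "0:02:20", 1), (3, "0:02:40", 1), (3, "0:03:00", 0),
    (3, "0:03:00", 1),  # hypothetical tie, for illustration only
]

DENSE = """
SELECT id, ts, item
FROM (SELECT t.*, DENSE_RANK() OVER (PARTITION BY id ORDER BY ts DESC) AS dr
      FROM t) AS x
WHERE dr = 1
ORDER BY id, item
"""

ROWNUM = """
SELECT id, ts, item
FROM (SELECT t.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS seqnum
      FROM t) AS x
WHERE seqnum = 1
ORDER BY id, item
"""


def top_rows(rows, query):
    """Return the rows ranked first in each id by the given query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, ts TEXT, item INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    return con.execute(query).fetchall()


print(top_rows(ROWS_WITH_TIE, DENSE))   # four rows: id 3 appears twice
print(top_rows(ROWS_WITH_TIE, ROWNUM))  # three rows: one per id
```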

How do I select rows with maximum value?

Given this table, I want to retrieve, for each distinct url, the row with the maximum count. For this table the output should be: 'dell.html' 3, 'lenovo.html' 4, 'toshiba.html' 5
+----------------+-------+
| url | count |
+----------------+-------+
| 'dell.html' | 1 |
| 'dell.html' | 2 |
| 'dell.html' | 3 |
| 'lenovo.html' | 1 |
| 'lenovo.html' | 2 |
| 'lenovo.html' | 3 |
| 'lenovo.html' | 4 |
| 'toshiba.html' | 1 |
| 'toshiba.html' | 2 |
| 'toshiba.html' | 3 |
| 'toshiba.html' | 4 |
| 'toshiba.html' | 5 |
+----------------+-------+
What SQL query do I need to write to do this?
Try this query:
select url, max(count) as count
from table_name
group by url;
Use an aggregate function:
select max(count) ,url from table_name group by url
From your comments, it seems you need a correlated subquery:
select t1.* from table_name t1
where t1.count = (select max(count) from table_name t2 where t2.url=t1.url
)
If your SQLite version supports row_number() (window functions are available from SQLite 3.25.0), you can write the query like this:
select * from
(
select *,row_number() over(partition by url order by count desc) rn
from table_name
) a where a.rn=1
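Since the question mentions SQLite, the three answers can be checked directly with Python's sqlite3 module. This sketch assumes the table is named table_name as in the answers; count is double-quoted to keep it clearly a column and not the COUNT() function.

```python
import sqlite3

# (url, count) rows from the question, stored without the display quotes.
ROWS = (
    [("dell.html", n) for n in range(1, 4)]
    + [("lenovo.html", n) for n in range(1, 5)]
    + [("toshiba.html", n) for n in range(1, 6)]
)

GROUP_BY = 'SELECT url, MAX("count") FROM table_name GROUP BY url ORDER BY url'

CORRELATED = """
SELECT t1.url, t1."count"
FROM table_name AS t1
WHERE t1."count" = (SELECT MAX("count") FROM table_name AS t2 WHERE t2.url = t1.url)
ORDER BY t1.url
"""

ROW_NUMBER = """
SELECT url, "count"
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY url ORDER BY "count" DESC) AS rn
      FROM table_name) AS a
WHERE a.rn = 1
ORDER BY url
"""


def max_per_url(rows, query):
    """Load the rows into an in-memory table and run one of the queries."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE table_name (url TEXT, "count" INTEGER)')
    con.executemany("INSERT INTO table_name VALUES (?, ?)", rows)
    return con.execute(query).fetchall()


for q in (GROUP_BY, CORRELATED, ROW_NUMBER):
    print(max_per_url(ROWS, q))  # [('dell.html', 3), ('lenovo.html', 4), ('toshiba.html', 5)]
```

The GROUP BY form is enough when only url and count are needed; the other two return whole rows, which matters once the table has more columns.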

Calculating consecutive range of dates with a value in Hive

I want to know if it is possible to calculate the consecutive ranges of a specific value for a group of Ids and return the calculated value(s) for each one.
Given the following data:
+----+----------+--------+
| ID | DATE_KEY | CREDIT |
+----+----------+--------+
| 1 | 8091 | 0.9 |
| 1 | 8092 | 20 |
| 1 | 8095 | 0.22 |
| 1 | 8096 | 0.23 |
| 1 | 8098 | 0.23 |
| 2 | 8095 | 12 |
| 2 | 8096 | 18 |
| 2 | 8097 | 3 |
| 2 | 8098 | 0.25 |
+----+----------+--------+
I want the following output:
+----+-------------------------------+
| ID | RANGE_DAYS_CREDIT_LESS_THAN_1 |
+----+-------------------------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 1 |
| 2 | 1 |
+----+-------------------------------+
In this case, the ranges are runs of consecutive days with credit less than 1. If there is a gap in the date_key column, the range ends there, as in ID 1 between date keys 8096 and 8098.
Is it possible to do this with windowing functions in Hive?
Thanks in advance!
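You can do this with a running sum that classifies rows into groups: it increments by 1 every time a row with credit >= 1, or a jump of more than one day in date_key, is found (in date_key order). Thereafter it is just a GROUP BY on id and the group:
select id, count(*) as range_days_credit_lt_1
from (select t1.*
            -- start a new group whenever credit >= 1 or date_key skips a day
            ,sum(case when credit >= 1 or date_key - prev_key > 1 then 1 else 0 end)
               over (partition by id order by date_key) as grp
      from (select t.*
                  ,lag(date_key) over (partition by id order by date_key) as prev_key
            from tbl t
           ) t1
     ) t2
where credit < 1
group by id, grp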
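A sketch of the running-sum approach in SQLite (through Python's sqlite3; table name tbl as in the answer), where a new group starts whenever credit is at least 1 or date_key jumps by more than 1. The date_key check is what separates 8096 from 8098 for ID 1; grouping by id and the running sum then gives one row per range.

```python
import sqlite3

# (id, date_key, credit) rows from the question.
ROWS = [
    (1, 8091, 0.9), (1, 8092, 20), (1, 8095, 0.22), (1, 8096, 0.23), (1, 8098, 0.23),
    (2, 8095, 12), (2, 8096, 18), (2, 8097, 3), (2, 8098, 0.25),
]

# prev_key is NULL on each id's first row, so the CASE falls through to 0
# there unless credit >= 1.
QUERY = """
SELECT id, COUNT(*) AS range_days_credit_lt_1
FROM (SELECT id, date_key, credit,
             SUM(CASE WHEN credit >= 1 OR date_key - prev_key > 1 THEN 1 ELSE 0 END)
               OVER (PARTITION BY id ORDER BY date_key) AS grp
      FROM (SELECT t.*,
                   LAG(date_key) OVER (PARTITION BY id ORDER BY date_key) AS prev_key
            FROM tbl AS t) AS t1) AS t2
WHERE credit < 1
GROUP BY id, grp
ORDER BY id, MIN(date_key)
"""


def credit_ranges(rows):
    """Return (id, length) for each run of consecutive days with credit < 1."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id INTEGER, date_key INTEGER, credit REAL)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    return con.execute(QUERY).fetchall()


print(credit_ranges(ROWS))  # [(1, 1), (1, 2), (1, 1), (2, 1)]
```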
The key is to collapse each consecutive sequence and compute its length. I managed this in a relatively clumsy way:
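with t_test as
(
select num,row_number()over(order by num) as rn
from
(
select explode(array(1,3,4,5,6,9,10,15)) as num
) s
)
select length(sign)+1 from
(
select explode(continue_sign) as sign
from
(
-- both branches of if() must have the same type, hence the cast;
-- collect_list does not guarantee element order, so this relies on the join output staying in rn order
select split(concat_ws('',collect_list(if(d>1,'v',cast(d as string)))), 'v') as continue_sign
from
(
select t0.num-t1.num as d from t_test t0
join t_test t1 on t0.rn=t1.rn+1
) diffs
) merged
) pieces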
Get the previous number b in the sequence for each original number a;
Check whether a-b > 1, which means there is a "gap", and mark it as 'v';
Concatenate all the a-b values into one string, split it on 'v', and compute the length of each piece (plus 1).
To carry the ID column through, you would also need to encode the id in another string.
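The same "mark the gaps, then split" steps in plain Python, as a sketch assuming a sorted list of distinct integers (the Hive version likewise needs the differences to be collected in order):

```python
def run_lengths(nums):
    """Lengths of runs of consecutive integers in a sorted list of distinct ints."""
    if not nums:
        return []
    # Step 1: difference between each number and its predecessor.
    diffs = [b - a for a, b in zip(nums, nums[1:])]
    # Step 2: mark a gap (difference > 1) with 'v'; consecutive pairs become '1'.
    marks = "".join("v" if d > 1 else "1" for d in diffs)
    # Step 3: split on 'v'; a piece of k '1's is a run of k + 1 numbers.
    return [len(piece) + 1 for piece in marks.split("v")]


print(run_lengths([1, 3, 4, 5, 6, 9, 10, 15]))  # [1, 4, 2, 1]
```

For the sample array, the marks string is "v111v1v", which splits into "", "111", "1" and "", giving runs of 1, 4, 2 and 1.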

microsoft sql server - calculate return between every row and the last row

I have a table like the following:
+-------+--------------+
| Value | Date |
+-------+--------------+
| 14 | 10/11/2010 |
| 12 | 10/12/2010 |
| 12 | 10/13/2010 |
| 10 | 10/14/2010 |
| 8 | 10/15/2010 |
| 6 | 10/16/2010 |
| 4 | 10/17/2010 |
| 2 | 10/18/2010 |
+-------+--------------+
I would like to calculate the return (the quotient) between every row and the last row (the one with the latest date). E.g., for the row with date "10/16/2010", the result should be 6/2=3.
Hence, the resulting table should be
+-------+--------------+
| result| Date |
+-------+--------------+
| 7 | 10/11/2010 |
| 6 | 10/12/2010 |
| 6 | 10/13/2010 |
| 5 | 10/14/2010 |
| 4 | 10/15/2010 |
| 3 | 10/16/2010 |
| 2 | 10/17/2010 |
| 1 | 10/18/2010 |
+-------+--------------+
Is this possible? Thank you!
First get the value you want to divide by. Since that is always going to be a single row, you can just cross join to it and perform your division:
with maxdate as
(select max([Date]) as maxdate from table1),
divby as
(select
value as divby
from
table1
inner join maxdate md
on md.maxdate = table1.[date])
select
value / divby
,[date]
from
table1
cross join divby
To break it down a bit, the first CTE (cleverly named maxdate) gets the maximum date for the whole table. The second CTE (divby) gets the value (that you will be dividing by) for that max date. As long as you only get one row back from that, you can safely use a cross join, resulting in each row in your table being divided by that one value.
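A sketch of the CTE plus CROSS JOIN answer in SQLite via Python's sqlite3. It assumes ISO dates (yyyy-mm-dd) instead of the question's mm/dd/yyyy strings, because MAX() on text only finds the latest date when the text sorts chronologically. Both SQLite and SQL Server do integer division on integer columns, which is fine here since every value is a multiple of 2; multiply by 1.0 if fractional results are needed.

```python
import sqlite3

# (value, date) rows from the question, with dates rewritten as ISO text (assumed).
ROWS = [
    (14, "2010-10-11"), (12, "2010-10-12"), (12, "2010-10-13"), (10, "2010-10-14"),
    (8, "2010-10-15"), (6, "2010-10-16"), (4, "2010-10-17"), (2, "2010-10-18"),
]

# maxdate: latest date; divby: the value on that date (a single row),
# which is then cross joined onto every row of table1.
QUERY = """
WITH maxdate AS (SELECT MAX("date") AS maxdate FROM table1),
     divby AS (SELECT table1."value" AS divby
               FROM table1
               JOIN maxdate AS md ON md.maxdate = table1."date")
SELECT table1."value" / divby AS result, table1."date"
FROM table1
CROSS JOIN divby
ORDER BY table1."date"
"""


def returns(rows):
    """Divide every value by the value on the latest date."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE table1 ("value" INTEGER, "date" TEXT)')
    con.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    return con.execute(QUERY).fetchall()


print([r[0] for r in returns(ROWS)])  # [7, 6, 6, 5, 4, 3, 2, 1]
```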
Another possible solution is to JOIN the table to itself:
select (t1b.value / t1a.value) as result,
t1b.date from table1 t1a
join table1 t1b on t1a.date = (select max(date) from table1)
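Thanks for the fiddle, Andrew! This can also be done like this on SQL Server 2008 and above (fiddle: http://sqlfiddle.com/#!3/ecda1/11):
SELECT [Value] / MIN([Value]) OVER () AS result,
[Date]
FROM Table1
Note that MIN([Value]) only equals the value on the latest date here because the values decrease over time; with other data it divides by the smallest value instead of the last one.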
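A sketch contrasting MIN() OVER () with FIRST_VALUE() OVER (ORDER BY date DESC), which takes the value on the latest date whether or not it is the smallest (FIRST_VALUE needs SQL Server 2012 or later; SQLite is used here via Python's sqlite3). The HYPOTHETICAL rows are invented for illustration: their last value (3) is not the minimum (2), so the two expressions disagree, while on the question's data they agree.

```python
import sqlite3

# Question's data (ISO dates assumed) and an invented counterexample.
SAMPLE = [
    (14, "2010-10-11"), (12, "2010-10-12"), (12, "2010-10-13"), (10, "2010-10-14"),
    (8, "2010-10-15"), (6, "2010-10-16"), (4, "2010-10-17"), (2, "2010-10-18"),
]
HYPOTHETICAL = [(6, "2010-10-11"), (2, "2010-10-12"), (3, "2010-10-13")]

MIN_EXPR = 'MIN("value") OVER ()'
# With ORDER BY date DESC, the first row of every window is the latest date.
LAST_EXPR = 'FIRST_VALUE("value") OVER (ORDER BY "date" DESC)'


def ratios(rows, divisor_expr):
    """Divide each value by divisor_expr; * 1.0 avoids integer division."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE table1 ("value" INTEGER, "date" TEXT)')
    con.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    # divisor_expr is one of the fixed strings above, never user input.
    query = f'SELECT "value" * 1.0 / {divisor_expr} AS result FROM table1 ORDER BY "date"'
    return [r[0] for r in con.execute(query)]


print(ratios(HYPOTHETICAL, MIN_EXPR))                           # [3.0, 1.0, 1.5]
print([round(r, 3) for r in ratios(HYPOTHETICAL, LAST_EXPR)])   # [2.0, 0.667, 1.0]
```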