Vertica Analytic function to count instances in a window - sql

Let's say I have a dataset with two columns: ID and timestamp. My goal is to count return IDs that have at least n timestamps in any 30 day window.
Here is an example:
ID Timestamp
1 '2019-01-01'
2 '2019-02-01'
3 '2019-03-01'
1 '2019-01-02'
1 '2019-01-04'
1 '2019-01-17'
So, let's say I want to return a list of IDs that have 3 timestamps in any 30 day window.
Given above, my resultset would just be ID = 1. I'm thinking some kind of windowing function would accomplish this, but I'm not positive.
Any chance you could help me write a query that accomplishes this?

A relatively simple way to do this involves lag()/lead():
select t.*
from (select t.*,
lead(timestamp, 2) over (partition by id order by timestamp) as timestamp_2
from t
) t
where datediff(day, timestamp, timestamp_2) <= 30;
The lag() looks at the third timestamp in a series. The where checks if this is within 30 days of the original one. The result is rows where this occurs.
If you just want the ids, then:
select distinct id
from (select t.*,
lead(timestamp, 2) over (partition by id order by timestamp) as timestamp_2
from t
) t
where datediff(day, timestamp, timestamp_2) <= 30;

Related

SQL - Find the two closest date after a specific date

Dear Stack Overflow community,
I am looking for the patient id where the two consecutive dates after the very first one are less than 7 days.
So differences between 2nd and 1st date <= 7 days
and differences between 3rd and 2nd date <= 7 days
Example:
ID Date
1 9/8/2014
1 9/9/2014
1 9/10/2014
2 5/31/2014
2 7/20/2014
2 9/8/2014
For patient 1, the two dates following it are less than 7 days apart.
For patient 2 however, the following date are more than 7 days apart (50 days).
I am trying to write an SQL query that just output the patient id "1".
Thanks for your help :)
You want to use lead(), but this is complicated because you want this only for the first three rows. I think I would go for:
select t.*
from (select t.*,
lead(date, 1) over (partition by id order by date) as next_date,
lead(date, 2) over (partition by id order by date) as next_date_2,
row_number() over (partition by id order by date) as seqnum
from t
) t
where seqnum = 1 and
next_date <= date + interval '7' day and
next_date2 <= next_date + interval '7' day;
You can try using window function lag()
select * from
(
select id,date,lag(date) over(order by date) as prevdate
from tablename
)A where datediff(day,date,prevdate)<=7

How to take only one entry from a table based on an offset to a date column value

I have a requirement to get values from a table based on an offset conditions on a date column.
Say for eg: for the below attached table, if there is any dates that comes close within 15 days based on effectivedate column I should return only the first one.
So my expected result would be as below:
Here for A1234 policy, it returns 6/18/16 entry and skipped 6/12/16 entry as the offset between these 2 dates is within 15 days and I took the latest one from the list.
If you want to group rows together that are within 15 days of each other, then you have a variant of the gaps-and-islands problem. I would recommend lag() and cumulative sum for this version:
select polno, min(effectivedate), max(expirationdate)
from (select t.*,
sum(case when prev_ed >= dateadd(day, -15, effectivedate)
then 1 else 0
end) over (partition by polno order by effectivedate) as grp
from (select t.*,
lag(expirationdate) over (partition by polno order by effectivedate) as prev_ed
from t
) t
) t
group by polno, grp;

How to query database for rows from next 5 days

How can I make a query in SQL Server to query for all rows for the next 5 days.
The problem is that it has to be days with records, so the next 5 days, might become something like, Today, Tomorrow, some day in next month, etc...
Basically I want to query the database for the records for the next non empty X days.
The table has a column called Date, which is what I want to filter.
Why not split the search into 2 queries. First one searches for the date part, the second uses that result to search for records IN the dates returned by the first query.
#Anagha is close, just a little modification and it is OK.
SELECT *
FROM TABLE
WHERE DATE IN (
SELECT DISTINCT TOP 5 DATE
FROM TABLE
WHERE DATE >= referenceDate
ORDER BY DATE
)
You can use following SQL query where 5 different dates are fetched at first then all rows for those selected dates are displayed
declare #n int = 5;
select *
from myData
where
datecol in (
SELECT distinct top (#n) cast(datecol as date) as datecol
FROM myData
WHERE datecol >= '20180101'
ORDER BY datecol
)
Try this:
select date from table where date in (select distinct top 5 date
from table where date >= getdate() order by date)
If your values are dates, you can use `dense_rank():
select t.*
from (select t.*, dense_rank() over (order by datecol) as seqnum
from t
where datecol >= cast(getdate() as date)
) t
where seqnum <= 5;
If the column has a time component and you still want to define days by midnight-to-midnight (as suggested by the question), just convert to date:
select t.*
from (select t.*,
dense_rank() over (order by cast(datetimecol as date)) as seqnum
from t
where datetimecol >= cast(getdate() as date)
) t
where seqnum <= 5;

How to select most dense 1 min in Oracle

I have table with time stamp column tmstmp, this table contains log of certain events. I need to find out the max number events which occurred within 1 min interval.
Please read carefully! I do NOT want to extract the time stamps minute fraction and sum like this:
select count(*), TO_CHAR(tmstmp,'MI')
from log_table
group by TO_CHAR(tmstmp,'MI')
order by TO_CHAR(tmstmp,'MI');
It needs to take 1st record and then look ahead until it selects all records within 1 min from the 1st and sum number of records, then take 2nd and do the same etc..
And as the result there must be a recordset of (sum, starting timestamp).
Anyone has a snippet of code somewhere and care to share please?
Analytic function with a logical window can provide this information directly:
select l.tmstmp,
count(*) over (order by tmstmp range between current row and interval '59.999999' second following) cnt
from log_table l
order by 1
;
TMSTMP CNT
--------------------------- ----------
01.01.16 00:00:00,000000000 4
01.01.16 00:00:10,000000000 4
01.01.16 00:00:15,000000000 3
01.01.16 00:00:20,000000000 2
01.01.16 00:01:00,000000000 3
01.01.16 00:01:40,000000000 2
01.01.16 00:01:50,000000000 1
Please adjust the interval length for your precision. It must be the highest possible value below 1 minute.
To get the maximal minute use the subquery (and don't forget you may receive more that one record - with the MAX count):
with tst as (
select l.tmstmp,
count(*) over (order by tmstmp range between current row and interval '59.999999' second following) cnt
from log_table l)
select * from tst where cnt = (select max(cnt) from tst);
TMSTMP CNT
--------------------------- ----------
01.01.16 00:00:00,000000000 4
01.01.16 00:00:10,000000000 4
I think you can achieve your goal using a subquery in SELECT statement, as follow:
SELECT tmstmp, (
SELECT COUNT(*)
FROM log_table t2
WHERE t2.tmstmp >= t.tmstmp AND t2.tmstmp < t.tmstmp + 1 / (24*60)
) AS events
FROM log_table t;
One method uses a join and aggregation:
select t.*
from (select l.tmstmp, count(*)
from log_table l join
log_table l2
on l2.tmstmp >= l.tmstmp and
l2.tmstmp < l.tmstmp + interval '1' minute
group by l.tmpstmp
order by count(*) desc
) t
where rownum = 1;
Note: This assumes that tmstmp is unique on each row. If this is not true, then the subquery should be aggregating by some column that is unique.
EDIT:
For large data, there is a more efficient way that makes use of cumulative sums:
select tmstamp - interval 1 minute as starttm, tmstamp as endtm, cumulative
from (select tmstamp, sum(inc) over (order by tmstamp) as cumulative
from (select tmstamp, 1 as inc from log_table union all
select tmstamp + interval '1' day, -1 as inc from log_table
) t
order by sum(inc) over (order by tmstamp) desc
) t
where rownum = 1;

Counting an already counted column in SQL (db2)

I'm pretty new to SQL and have this problem:
I have a filled table with a date column and other not interesting columns.
date | name | name2
2015-03-20 | peter | pan
2015-03-20 | john | wick
2015-03-18 | harry | potter
What im doing right now is counting everything for a date
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
what i want to do now is counting the resulting lines and only returning them if there are less then 10 resulting lines.
What i tried so far is surrounding the whole query with a temp table and the counting everything which gives me the number of resulting lines (yeah)
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
)
select count(*)
from temp_count
What is still missing the check if the number is smaller then 10.
I was searching in this Forum and came across some "having" structs to use, but that forced me to use a "group by", which i can't.
I was thinking about something like this :
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
)
select *
from temp_count
having count(*) < 10
maybe im too tired to think of an easy solution, but i can't solve this so far
Edit: A picture for clarification since my english is horrible
http://imgur.com/1O6zwoh
I want to see the 2 columned results ONLY IF there are less then 10 rows overall
I think you just need to move your having clause to the inner query so that it is paired with the GROUP BY:
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
having count(*) < 10
)
select *
from temp_count
If what you want is to know whether the total # of records (after grouping), are returned, then you could do this:
with temp_count (date, counter) as
(
select date, counter=count(*)
from testtable
where date >= current date - 10 days
group by date
)
select date, counter
from (
select date, counter, rseq=row_number() over (order by date)
from temp_count
) x
group by date, counter
having max(rseq) >= 10
This will return 0 rows if there are less than 10 total, and will deliver ALL the results if there are 10 or more (you can just get the first 10 rows if needed with this also).
In your temp_count table, you can filter results with the WHERE clause:
with temp_count (date, counter) as
(
select date, count(distinct date)
from testtable
where date >= current date - 10 days
group by date
)
select *
from temp_count
where counter < 10
Something like:
with t(dt, rn, cnt) as (
select dt, row_number() over (order by dt) as rn
, count(1) as cnt
from testtable
where dt >= current date - 10 days
group by dt
)
select dt, cnt
from t where 10 >= (select max(rn) from t);
will do what you want (I think)