Datediff between multiple rows for certain ranges - sql

My DB Table has a data set with datetime values.
How can I return a result set that gives the datediff between the smallest and the highest date of each range, where a range only continues as long as the datediff between two consecutive values is not larger than 5 minutes?
Date
2018-01-01 00:00:00
2018-01-01 00:01:00
2018-01-01 00:02:00
2018-01-01 00:03:00
2018-01-01 00:04:00
2018-01-01 00:13:00
2018-01-01 00:14:00
2018-01-01 00:15:00
2018-01-01 00:19:00
2018-01-01 00:54:00
2018-01-01 00:59:00
2018-01-01 01:00:00
Result set should look like this:
Ranges(min)
5
4
1
2
What would be an approach for that query?

You can put breaks in whenever there is a gap of more than 5 minutes. Then accumulate the number of breaks to define a group and aggregate:
select min(dte), max(dte), count(*) as cnt
from (select t.*,
             sum(isbreak) over (order by dte) as grp
      from (select t.*,
                   (case when lag(dte) over (order by dte) >= dateadd(minute, -5, dte)
                         then 0 else 1
                    end) as isbreak
            from t
           ) t
     ) t
group by grp;
For some reason (not clear to me right now), I thought your question involved SQL Server, so it uses that syntax. lag() is ANSI standard functionality and available in most databases; date arithmetic does vary among databases.
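To make the break-and-accumulate idea easier to follow, here is a minimal Python sketch of the same logic outside the database. The function name `range_lengths` and the sample timestamps in the usage note are made up for illustration, and it assumes that a gap of exactly 5 minutes still belongs to the same range.

```python
from datetime import datetime, timedelta

def range_lengths(timestamps, max_gap=timedelta(minutes=5)):
    """Split timestamps into runs whose neighbours are at most max_gap apart.

    Returns a list of (start, end, minutes) tuples, one per run.
    """
    runs = []
    for ts in sorted(timestamps):
        if runs and ts - runs[-1][1] <= max_gap:
            runs[-1][1] = ts       # gap is small enough: extend the current run
        else:
            runs.append([ts, ts])  # first row or gap too large: start a new run
    return [(start, end, int((end - start).total_seconds() // 60))
            for start, end in runs]
```

For example, the minutes 0, 3, 8, 20 and 21 of the same hour would form two runs, one from minute 0 to minute 8 and one from minute 20 to minute 21.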

Related

Oracle SQL - Select users between two date by month

I am learning SQL and I was wondering how to select active users by month, depending on their starting and ending date (both timestamp(6)). My table looks like this:
Cust_Num | Start_Date | End_Date
1 | 2018-01-01 | 2019-01-01
2 | 2018-01-01 | NULL
3 | 2019-01-01 | 2019-06-01
4 | 2017-01-01 | 2019-03-01
So, counting the active users by month, I should have an output like:
As of. | Count
2018-06-01 | 3
...
2019-02-01 | 3
2019-07-01 | 1
So far, I do a manual operation by entering each month:
Select
201906,
count(distinct a.cust_num)
From
active_users a
Where
to_date('20190630','yyyymmdd') between a.start_date and nvl(a.end_date, date '9999-12-31')
union all
Select
201905,
count(distinct a.cust_num)
From
active_users a
Where
to_date('20190531','yyyymmdd') between a.start_date and nvl(a.end_date, date '9999-12-31')
union all
...
This is not very efficient or sustainable if I want to cover 10 years, i.e. 120 months.
Any help is welcome. Thanks a lot!
This query shows the active-user-count effective as-of the end of the month.
How it works:
Convert each input row (with StartDate and EndDate values) into two rows that represent the points in time when the active-user-count incremented (on StartDate) and decremented (on EndDate). We need to convert NULL to a far-off date value because SQL Server sorts NULL values before, not after, non-NULL values in ascending order.
This makes your data look like this:
OnThisDate Change
2018-01-01 1
2019-01-01 -1
2018-01-01 1
9999-12-31 -1
2019-01-01 1
2019-06-01 -1
2017-01-01 1
2019-03-01 -1
Then we simply SUM OVER the Change values (after sorting) to get the active-user-count as of that specific date:
So first, sort by OnThisDate:
OnThisDate Change
2017-01-01 1
2018-01-01 1
2018-01-01 1
2019-01-01 1
2019-01-01 -1
2019-03-01 -1
2019-06-01 -1
9999-12-31 -1
Then SUM OVER:
OnThisDate ActiveCount
2017-01-01 1
2018-01-01 2
2018-01-01 3
2019-01-01 4
2019-01-01 3
2019-03-01 2
2019-06-01 1
9999-12-31 0
Then we PARTITION (not group!) the rows by month and sort them by their date so we can identify the last ActiveCount row for that month (this actually happens in the WHERE of the outermost query, using ROW_NUMBER() and COUNT() for each month PARTITION):
OnThisDate ActiveCount IsLastInMonth
2017-01-01 1 1
2018-01-01 2 0
2018-01-01 3 1
2019-01-01 4 0
2019-01-01 3 1
2019-03-01 2 1
2019-06-01 1 1
9999-12-31 0 1
Then filter on that where IsLastInMonth = 1 (actually, where ROW_NUMBER() = COUNT(*) inside each PARTITION) to give us the final output data:
At-end-of-month Active-count
2017-01 1
2018-01 3
2019-01 3
2019-03 2
2019-06 1
9999-12 0
This does result in "gaps" in the result-set because the At-end-of-month column only shows rows where the Active-count value actually changed rather than including all possible calendar months - but that's ideal (as far as I'm concerned) because it excludes redundant data. Filling in the gaps can be done inside your application code by simply repeating output rows for each additional month until it reaches the next At-end-of-month value.
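As a rough idea of what that gap-filling could look like in application code, here is a small Python sketch. The names `next_month` and `fill_months` and the `through` parameter are hypothetical, and it assumes the query results arrive as ((year, month), count) pairs.

```python
def next_month(ym):
    """Return the (year, month) pair that follows ym."""
    year, month = ym
    return (year + 1, 1) if month == 12 else (year, month + 1)

def fill_months(changes, through):
    """Repeat each count for every calendar month until the next change.

    changes: list of ((year, month), count) pairs, one per month with a change.
    through: last (year, month) to emit, inclusive; later changes are ignored.
    """
    changes = sorted(changes)
    filled = []
    if not changes:
        return filled
    current = changes[0][0]
    idx = 0
    count = None
    while current <= through:
        if idx < len(changes) and changes[idx][0] == current:
            count = changes[idx][1]   # a change row exists for this month
            idx += 1
        filled.append((current, count))
        current = next_month(current)
    return filled
```

Passing an explicit `through` month also keeps the far-off 9999-12 sentinel row from generating thousands of filler months.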
Here's the query using T-SQL on SQL Server (I don't have access to Oracle right now). And here's the SQLFiddle I used to come to a solution: http://sqlfiddle.com/#!18/ad68b7/24
SELECT
OtdYear,
OtdMonth,
ActiveCount
FROM
(
-- This query adds columns to indicate which row is the last-row-in-month ( where RowInMonth == RowsInMonth )
SELECT
OnThisDate,
OtdYear,
OtdMonth,
ROW_NUMBER() OVER ( PARTITION BY OtdYear, OtdMonth ORDER BY OnThisDate ) AS RowInMonth,
COUNT(*) OVER ( PARTITION BY OtdYear, OtdMonth ) AS RowsInMonth,
ActiveCount
FROM
(
SELECT
OnThisDate,
YEAR( OnThisDate ) AS OtdYear,
MONTH( OnThisDate ) AS OtdMonth,
SUM( [Change] ) OVER ( ORDER BY OnThisDate ASC ) AS ActiveCount
FROM
(
SELECT
StartDate AS [OnThisDate],
1 AS [Change]
FROM
tbl
UNION ALL
SELECT
ISNULL( EndDate, DATEFROMPARTS( 9999, 12, 31 ) ) AS [OnThisDate],
-1 AS [Change]
FROM
tbl
) AS sq1
) AS sq2
) AS sq3
WHERE
RowInMonth = RowsInMonth
ORDER BY
OtdYear,
OtdMonth
This query can be flattened into fewer nested queries by using aggregate and window functions directly instead of using aliases (like OtdYear, ActiveCount, etc) but that would make the query much harder to understand.
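If it helps to see the +1/-1 idea without the nesting, the same steps can be sketched in a few lines of Python. This is only an illustration of the logic, not a replacement for the query; `month_end_counts` and `FAR_FUTURE` are names I made up, and rows are assumed to be (start_date, end_date_or_None) pairs.

```python
from datetime import date

FAR_FUTURE = date(9999, 12, 31)  # stand-in for a NULL end date

def month_end_counts(rows):
    """Return {(year, month): active count after the last change in that month}."""
    events = []
    for start, end in rows:
        events.append((start, 1))                 # user becomes active
        events.append((end or FAR_FUTURE, -1))    # user stops being active
    events.sort(key=lambda event: event[0])

    active = 0
    result = {}
    for day, change in events:
        active += change
        # Later events in the same month overwrite earlier ones, so the
        # value left behind is the count after the month's last change.
        result[(day.year, day.month)] = active
    return result
```

With the four sample customers this should reproduce the final table above, including the 9999-12 row with a count of 0.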
I have written a query that returns a row for every month, from the minimum start date in the table to the maximum end date.
You can restrict the range by adding a condition to the WHERE clause.
-- table creation
CREATE TABLE ACTIVE_USERS (CUST_NUM NUMBER, START_DATE DATE, END_DATE DATE);
-- data creation
INSERT INTO ACTIVE_USERS
SELECT * FROM
(
SELECT 1, DATE '2018-01-01', DATE '2019-01-01' FROM DUAL UNION ALL
SELECT 2, DATE '2018-01-01', NULL FROM DUAL UNION ALL
SELECT 3, DATE '2019-01-01', DATE '2019-06-01' FROM DUAL UNION ALL
SELECT 4, DATE '2017-01-01', DATE '2019-03-01' FROM DUAL
);
-- data in the actual table
SELECT * FROM ACTIVE_USERS ORDER BY CUST_NUM;
CUST_NUM START_DATE END_DATE
---------- ---------- ----------
1 2018-01-01 2019-01-01
2 2018-01-01
3 2019-01-01 2019-06-01
4 2017-01-01 2019-03-01
Query to fetch desired result
WITH CTE ( START_DATE, END_DATE ) AS
(
SELECT
ADD_MONTHS( START_DATE, LEVEL - 1 ),
ADD_MONTHS( START_DATE, LEVEL ) - 1
FROM
(
SELECT
MIN( START_DATE ) AS START_DATE,
MAX( END_DATE ) AS END_DATE
FROM
ACTIVE_USERS
)
CONNECT BY LEVEL <= CEIL( MONTHS_BETWEEN( END_DATE, START_DATE ) ) + 1
)
--
--
SELECT
C.START_DATE,
COUNT(1) AS CNT
FROM
CTE C
JOIN ACTIVE_USERS D ON
(
C.END_DATE BETWEEN
D.START_DATE
AND
CASE
WHEN D.END_DATE IS NOT NULL THEN D.END_DATE
ELSE C.END_DATE
END
)
GROUP BY
C.START_DATE
ORDER BY
C.START_DATE;
-- output --
START_DATE CNT
---------- ----------
2017-01-01 1
2017-02-01 1
2017-03-01 1
2017-04-01 1
2017-05-01 1
2017-06-01 1
2017-07-01 1
2017-08-01 1
2017-09-01 1
2017-10-01 1
2017-11-01 1
2017-12-01 1
2018-01-01 3
2018-02-01 3
2018-03-01 3
2018-04-01 3
2018-05-01 3
2018-06-01 3
2018-07-01 3
2018-08-01 3
2018-09-01 3
2018-10-01 3
2018-11-01 3
2018-12-01 3
2019-01-01 3
2019-02-01 3
2019-03-01 2
2019-04-01 2
2019-05-01 2
2019-06-01 1
30 rows selected.
Cheers!!

SQL - How can I count distinct IDs for each day within the last 7 days?

So, I'm trying to get the number of distinct users that registered sales in the last 7 days, for each day. Here's a sample of the table I have:
ID Date
1 2018-01-01
2 2018-01-02
3 2018-01-03
3 2018-01-04
2 2018-01-05
4 2018-01-06
5 2018-01-07
2 2018-01-08
Here's the outcome that I'd expect:
Distinct IDs Date
1 2018-01-01
2 2018-01-02
3 2018-01-03
3 2018-01-04
3 2018-01-05
4 2018-01-06
5 2018-01-07
4 2018-01-08
It's as if I were counting the distinct IDs in groups made up of each date and the 6 days before it. Any ideas?
Probably the simplest method is to use a correlated subquery:
select t.date,
(select count(distinct t2.id)
from t t2
where t2.date >= t.date - interval '6 day' and t2.date <= t.date
) as uniques_7day
from (select distinct date
from t
) t;
Note that you haven't specified the database, so this uses ANSI/ISO standard SQL syntax.
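For checking the query's results by hand, the same window logic can be sketched in Python. The function name `rolling_distinct` is hypothetical, and rows are assumed to be (id, date) pairs.

```python
from datetime import date, timedelta

def rolling_distinct(rows, days=7):
    """Count distinct ids in the window of `days` days ending on each date.

    rows: iterable of (id, date) pairs.
    Returns a list of (date, distinct_id_count), sorted by date.
    """
    rows = list(rows)
    result = []
    for day in sorted({d for _, d in rows}):
        window_start = day - timedelta(days=days - 1)
        # Same condition as the correlated subquery:
        # t2.date >= t.date - 6 days and t2.date <= t.date
        ids = {i for i, d in rows if window_start <= d <= day}
        result.append((day, len(ids)))
    return result
```

Like the correlated subquery, this rescans the rows for every date, so it is meant for verification on small samples rather than for large tables.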

SQL Query to only pull one record for first day of month, for every month

I have a table that has one record per day. E.g. (this is just the date col of the table)
2018-07-08 03:00:00
2018-07-07 03:00:00
2018-07-06 03:00:00
2018-07-05 03:00:00
2018-07-04 03:00:00
2018-07-03 03:00:00
2018-07-02 03:00:00
2018-07-01 03:00:00
2018-06-30 03:00:00
2018-06-29 03:00:00
This data goes back a few years
I want to pull just the first day of month record, for all months in the table.
What is the SQL to do that?
(On SQL Server 2014)
I would use the day() function:
select t.*
from t
where day(t.MyDate) = 1;
Neither this nor datepart() is ANSI/ISO-standard, but there are other databases that support day(). The standard function is extract(day from t.MyDate).
If you want the first record in the table for each month -- but for some months, that might not be day 1 -- then you can use row_number(). One method is:
select top (1) with ties t.*
from t
order by row_number() over (partition by year(mydate), month(mydate) order by day(mydate) asc);
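The "first record per month" rule that row_number() implements here can be sketched in Python as follows. The name `first_per_month` is made up for illustration, and the input is assumed to be a list of datetime values.

```python
from datetime import datetime

def first_per_month(timestamps):
    """Return the earliest timestamp of each (year, month), in date order."""
    first = {}
    for ts in sorted(timestamps):
        # setdefault keeps the first value stored for each month, and
        # because the input is sorted, that value is the earliest one.
        first.setdefault((ts.year, ts.month), ts)
    return list(first.values())
```

If a month has no record on day 1, this returns whichever record comes first in that month, which is the behaviour described above.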
If all your time values are the same, all you need is to select every row where DATEPART returns the first day of the month.
select * from dbo.MyTable mt where DATEPART(day, mt.MyDate) = 1
It will work if you have one row per day. Of course, you will need to use DISTINCT or an aggregation if you have more than one row per day.
You can use row_number() function :
select *
from (select *, row_number() over (partition by datepart(year, date), datepart(month, date) order by datepart(day, date)) seq
from table
) t
where seq = 1;
Perhaps you also need year in partition clause.
Though this has been answered, you can use DATEFROMPARTS in MS SQL as well.
create table #temp (dates date)
insert into #temp values ('2018-01-02'),('2018-01-05'), ('2018-01-09'), ('2018-01-10')
select * from #temp
dates
2018-01-02
2018-01-05
2018-01-09
2018-01-10
You can use this to get beginning of the month
select DATEFROMPARTS(year(dates), month(dates), 01) Beginningofmonth from #temp
group by DATEFROMPARTS(year(dates), month(dates), 01)
Output:
Beginningofmonth
2018-01-01

PostgreSQL group by with interval

Well, I have a seemingly simple set of data but it gives me a lot of trouble.
This is an example of what my data look like:
quantity price1 price2 date
100 1 0 2018-01-01 10:00:00
200 1 0 2018-01-02 10:00:00
50 5 0 2018-01-02 11:00:00
100 1 1 2018-01-03 10:00:00
100 1 1 2018-01-03 11:00:00
300 1 0 2018-01-03 12:00:00
I need to sum up "quantity" column grouped by "price1" and "price2" and it would be very easy but I need to take into account time changes of "price1" and "price2". Data is sorted by "date".
What I need is the last row to be not grouped with the first two although it has the same values for "price1" and "price2". Also I need to get minimal and maximal date of each interval.
The end result should look like this:
quantity price1 price2 dateStart dateEnd
300 1 0 2018-01-01 10:00:00 2018-01-02 10:00:00
50 5 0 2018-01-02 11:00:00 2018-01-02 11:00:00
200 1 1 2018-01-03 10:00:00 2018-01-03 11:00:00
300 1 0 2018-01-03 12:00:00 2018-01-03 12:00:00
Any suggestions for a SQL query?
This is a gaps-and-islands problem. Use the following code:
select sum(quantity), price1, price2, min(date) dateStart, max(date) dateend
from
(
select *,
row_number() over (order by date) -
row_number() over (partition by price1, price2 order by date) grp
from data
) t
group by price1, price2, grp
order by dateStart
The solution is based on an identification of consecutive sequences of price1 and price2, which is done by a creation of the grp column. Once you isolate the consecutive sequences then you do a simple group by using grp as well.
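To see why the difference of the two row numbers stays constant within a consecutive sequence, here is a small Python sketch that computes the same grp value by hand. The function name `islands` is hypothetical, and rows are assumed to be (quantity, price1, price2, date) tuples already ordered by date.

```python
from collections import defaultdict

def islands(rows):
    """Sum quantity per consecutive run of equal (price1, price2).

    Returns tuples of (quantity, price1, price2, date_start, date_end),
    ordered by date_start.
    """
    seen = defaultdict(int)   # row number within each (price1, price2) partition
    groups = {}
    for overall, (qty, p1, p2, dt) in enumerate(rows, start=1):
        seen[(p1, p2)] += 1
        grp = overall - seen[(p1, p2)]   # row_number() over all - row_number() per partition
        key = (p1, p2, grp)
        if key not in groups:
            groups[key] = [0, p1, p2, dt, dt]
        groups[key][0] += qty
        groups[key][4] = dt              # rows arrive in date order, so this is the max
    return sorted((tuple(g) for g in groups.values()), key=lambda g: g[3])
```

For the sample data, the first two rows get grp 0 and the last row gets grp 3, so they stay separate even though their prices match.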
I modified the accepted answer slightly to handle the case where two adjacent rows have exactly the same "date" value. I added a second sort key so that the rows are ordered deterministically (my table has an "oid" column):
select sum(quantity), price1, price2, min(date) dateStart, max(date) dateend
from
(
select *,
row_number() over (order by date, oid) -
row_number() over (partition by price1, price2 order by date, oid) grp
from data
) t
group by price1, price2, grp
order by dateStart

Find longest streak in sqlite

please help me with getting streaks data. I have table of goal achievements
Table test
dt
2017-01-01
2017-01-02
2017-01-03 // 3 days, end of streak
2017-02-10 // 1 day
2017-02-15
2017-02-16
2017-02-17
2017-02-18 //4 days
I tried this in MySQL
Select dt, (select count(*) from test as t1 where t1.dt < t2.dt and datediff(t2.dt,t1.dt) =1) as str
from test as t2
And got
Dt str
2017-01-01 0
2017-01-02 1
2017-01-03 2
2017-02-10 0
2017-02-15 0
2017-02-16 1
2017-02-17 2
2017-02-18 3
Is it possible to get something like this
Dt. Str
2017-01-03 3
2017-02-10 1
2017-02-18 4
And get Max of it?
You can subtract the row number (i.e. the number of rows with a date <= the current row's date) from the current row's date to classify consecutive rows with a one-day difference into the same group. Then it is just a grouping operation to calculate the count.
select max(dt) as dt, count(*) as streak
from (select t1.dt
,date(t1.dt,-(select count(*) from t t2 where t2.dt<=t1.dt)||' day') as grp
from t t1
) t
group by grp
Run the inner query to see how groups are assigned.
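To try the query quickly and also pick out the longest streak, a minimal sketch with Python's built-in sqlite3 module could look like this. The helper name `streaks` is made up, the table is called t as in the query above, and the dates are assumed to be unique ISO-formatted strings.

```python
import sqlite3

def streaks(dates):
    """Return (last_day_of_streak, streak_length) rows for the given dates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (dt TEXT)")
    conn.executemany("INSERT INTO t (dt) VALUES (?)", [(d,) for d in dates])
    rows = conn.execute(
        """
        SELECT max(dt) AS dt, count(*) AS streak
        FROM (SELECT t1.dt,
                     -- date minus its row number: constant within a streak
                     date(t1.dt,
                          -(SELECT count(*) FROM t t2 WHERE t2.dt <= t1.dt) || ' day'
                     ) AS grp
              FROM t t1) s
        GROUP BY grp
        ORDER BY grp
        """
    ).fetchall()
    conn.close()
    return rows

def longest_streak(dates):
    """Return the (last_day, length) row with the greatest length, or None."""
    return max(streaks(dates), key=lambda row: row[1], default=None)
```

The maximum could also be taken in SQL by wrapping the query in another select; doing it in Python just keeps the sketch short.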