SQL: Select minimum date difference for each group

Any hints on how to get the minimum difference between start time and end time per guid, given the following data in Microsoft SQL Server 2014:
id| start time | guid | end time
1 | 2015-04-05 12:00 | a | 2015-04-05 12:30
2 | 2015-04-05 12:10 | a | 2015-04-05 12:15
3 | 2015-04-05 12:20 | a | 2015-04-05 12:30
4 | 2015-04-05 12:30 | b | 2015-04-05 12:35
5 | 2015-04-05 12:40 | b | 2015-04-05 12:55
6 | 2015-04-05 12:50 | c | 2015-04-05 12:55
7 | 2015-04-05 13:00 | c | 2015-04-05 13:25
The output I am looking for is:
id | start time | guid | end time
2 | 2015-04-05 12:10 | a | 2015-04-05 12:15
4 | 2015-04-05 12:30 | b | 2015-04-05 12:35
6 | 2015-04-05 12:50 | c | 2015-04-05 12:55
I have tried grouping by guid and using the DateDiff function, but it didn't work.

Try the query below:
;with CTE as(
select id, sttime, guid, endtime,
row_number() over (partition by guid order by datediff(ss, sttime, endtime)) as rowid
from tablename
) select * from CTE where rowid = 1

This answer looks a bit like Indra's answer, but there is a significant difference: it does not use datediff, which fails if any two dates are more than approximately 68 years (2147483647 seconds) apart. It also fixes some other issues.
;WITH CTE as
(
SELECT
id, start_time, guid, end_time,
row_number() over (partition by guid order by end_time - start_time) rn
FROM
tablename
)
SELECT
id, start_time, guid, end_time
FROM CTE
WHERE rn = 1
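
The overflow limit mentioned in this answer can be checked with quick arithmetic. This is only a sketch in Python, assuming DATEDIFF returns a signed 32-bit integer and using an average year of 365.25 days (my assumption, not something from the answer):

```python
# Largest value a signed 32-bit integer can hold.
INT_MAX = 2**31 - 1

# Average year length in seconds (365.25 days is an approximation).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# How many years of seconds fit before a 32-bit second count overflows.
years = INT_MAX / SECONDS_PER_YEAR
print(f"{INT_MAX} seconds is about {years:.1f} years")
```

For differences measured in minutes or hours the limit is much further away, so the overflow only matters when counting seconds (or smaller units) across very distant dates.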

WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY GUID ORDER BY DATEDIFF(SS,STARTTIME,ENDTIME) ASC) AS RN
FROM YOURTABLE
)
SELECT * FROM CTE WHERE RN=1

Use NOT EXISTS to return a row only if no other row with the same guid has a smaller difference:
select id, start_time, guid, end_time
from tablename t1
where not exists
(select 1 from tablename t2
where t2.guid = t1.guid
and datediff(second, t2.start_time, t2.end_time) < datediff(second, t1.start_time, t1.end_time))
Note: I don't know SQL Server well, so double-check the datediff calls above. If several rows in a group tie for the minimum, all of them are returned.
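
For anyone who wants to try the NOT EXISTS idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is an assumption-laden stand-in: SQLite has no DATEDIFF, so the duration is computed with strftime('%s', ...), and the table and column names are just the ones used in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (id INTEGER, start_time TEXT, guid TEXT, end_time TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        (1, "2015-04-05 12:00", "a", "2015-04-05 12:30"),
        (2, "2015-04-05 12:10", "a", "2015-04-05 12:15"),
        (3, "2015-04-05 12:20", "a", "2015-04-05 12:30"),
        (4, "2015-04-05 12:30", "b", "2015-04-05 12:35"),
        (5, "2015-04-05 12:40", "b", "2015-04-05 12:55"),
        (6, "2015-04-05 12:50", "c", "2015-04-05 12:55"),
        (7, "2015-04-05 13:00", "c", "2015-04-05 13:25"),
    ],
)

# Keep a row only when no other row of the same guid has a shorter duration.
# strftime('%s', x) yields Unix seconds in SQLite (stand-in for DATEDIFF).
QUERY = """
SELECT id, start_time, guid, end_time
FROM tablename t1
WHERE NOT EXISTS (
    SELECT 1
    FROM tablename t2
    WHERE t2.guid = t1.guid
      AND (CAST(strftime('%s', t2.end_time) AS INTEGER)
           - CAST(strftime('%s', t2.start_time) AS INTEGER))
        < (CAST(strftime('%s', t1.end_time) AS INTEGER)
           - CAST(strftime('%s', t1.start_time) AS INTEGER))
)
ORDER BY id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the sample data this keeps one row per guid, matching the output requested in the question.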

Related

SQL: Sum timestamp intervals of the same day

I'm setting up a new SQL query to summarize records from an employee attendance table. The records are downloaded from a fingerprint or RFID sensor and stored in a single table. I want to get the number of hours spent at the workplace.
Everything works fine if the employee comes in and leaves the workplace once a day. The device generates 2 records in the table, and it is easy to identify the entrance and exit times.
But I can't figure out how to handle the case where the person comes in, takes a break (leaves the workplace), and then comes back in until the exit time.
Assume that there is always an even number of records (arrival and exit timestamps) for each interval. Also, an employee never checks in on one day and leaves on the next.
I have the following query. Remember: it only gets the minimum timestamp (arrival time) and the maximum timestamp (leave time).
SELECT Hs.Userid, Name, [Date], Entrance, [Exit], Hours, edited FROM
(SELECT Userid,
CONVERT(VARCHAR, CONVERT(TIME, min(Checktime))) AS Entrance,
CONVERT(VARCHAR, CONVERT(TIME, max(Checktime))) AS [Exit],
CONVERT(VARCHAR, CONVERT(TIME, max(Checktime)-min(CheckTime))) AS Hours,
CONVERT(VARCHAR, CONVERT(DATE, CheckTime)) AS [Date],
COUNT(*) AS Regs,
SUM(edited) AS edited FROM attendance
WHERE CONVERT(DATE, CheckTime) < CONVERT(DATE, GETDATE())
GROUP BY Userid, CONVERT(DATE, CheckTime)) AS Hs
INNER JOIN Userinfo
ON Userinfo.Userid = Hs.Userid
ORDER BY [Date] DESC, Name ASC;
For example, if the table has the following records:
id | Logid | Userid | CheckTime | edited
1 | 10 | 1 | 2019-06-18 8:00:00 | 0
2 | 11 | 1 | 2019-06-18 12:00:00 | 0
3 | 12 | 1 | 2019-06-18 15:00:00 | 0
4 | 13 | 1 | 2019-06-18 17:00:00 | 0
5 | 14 | 2 | 2019-06-18 8:00:00 | 0
6 | 15 | 2 | 2019-06-18 17:00:00 | 0
What I get:
Userid | Name | Date | Entrance | Exit | Hours | edited
1 | Gandalf | 2019-06-18 | 8:00:00 | 17:00:00 | 9:00:00 | 0
2 | Frodo | 2019-06-18 | 8:00:00 | 17:00:00 | 9:00:00 | 0
What I need:
Userid | Name | Date | Entrance | Exit | Hours | edited
1 | Gandalf | 2019-06-18 | 8:00:00 | 17:00:00 | 6:00:00 | 0
2 | Frodo | 2019-06-18 | 8:00:00 | 17:00:00 | 9:00:00 | 0
The total time is calculated as (12:00:00 - 8:00:00) + (17:00:00 - 15:00:00).
The "Entrance" and "Exit" columns are not really necessary in this case.
Do you have any idea how I can solve this? Thank you very much!
This assumes the records come in enter/exit pairs, and it handles multiple breaks.
with cte as (
SELECT *, ROW_NUMBER() OVER (PARTITION BY [Userid], cast ([CheckTime] as Date)
ORDER BY [CheckTime]) as rn
FROM Table1 t1
)
SELECT c1.[Userid],
cast (c1.[CheckTime] as Date) as the_day,
SUM (DATEDIFF (hh, c1.[CheckTime], c2.[CheckTime])) as total_hours
FROM cte c1
JOIN cte c2
ON c1.rn = c2.rn -1
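AND c1.[Userid] = c2.[Userid]
AND cast (c1.[CheckTime] as Date) = cast (c2.[CheckTime] as Date) -- rn restarts every day, so match the day too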
AND c1.rn % 2 = 1
GROUP BY c1.[Userid],
cast (c1.[CheckTime] as Date) ;
OUTPUT
| Userid | the_day | total_hours |
|--------|------------|-------------|
| 1 | 2019-06-18 | 6 |
| 2 | 2019-06-18 | 9 |
NOTE:
General syntax for DATEDIFF:
DATEDIFF(datepart, start_date, end_date)
Be aware that DATEDIFF counts the datepart boundaries crossed between the two values and returns that count as an integer.
So for 08:00 and 09:30 with hh as the datepart you still get 1 hour. It may be better to use mi and divide by 60.
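
To make the rounding issue in the note concrete, here is a small Python emulation. It is only a sketch: datediff_hours is a hypothetical helper that mimics how DATEDIFF with the hour datepart counts hour boundaries crossed, it does not call SQL Server.

```python
from datetime import datetime


def datediff_hours(start, end):
    """Emulate DATEDIFF(hour, start, end): count hour boundaries crossed."""
    def truncate(dt):
        return dt.replace(minute=0, second=0, microsecond=0)

    return int((truncate(end) - truncate(start)).total_seconds() // 3600)


def elapsed_hours(start, end):
    """Elapsed time as fractional hours (like minutes divided by 60.0)."""
    return (end - start).total_seconds() / 3600


long_stay = (datetime(2019, 6, 18, 8, 0), datetime(2019, 6, 18, 9, 30))
short_stay = (datetime(2019, 6, 18, 8, 59), datetime(2019, 6, 18, 9, 1))

# A 90-minute stay and a 2-minute stay both cross exactly one hour boundary.
print(datediff_hours(*long_stay), datediff_hours(*short_stay))
print(elapsed_hours(*long_stay))
```

Both intervals report the same boundary count even though one lasts 90 minutes and the other 2 minutes, which is why summing minutes or seconds and converting afterwards is the safer choice.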
Perfect! Juan Carlos's solution works great!
I'm posting this because I've edited some of his code to match the original post requirements.
The code is essentially the same; I've only changed or added a few lines:
with cte as (
SELECT *, ROW_NUMBER() OVER (PARTITION BY [Userid], cast ([CheckTime] as Date)
ORDER BY [CheckTime]) as rn
FROM Table1 t1
WHERE CAST(CheckTime AS DATE) = '2019-06-18' -- Filter by specific date
)
SELECT c1.[Userid],
cast (c1.[CheckTime] as Date) as the_day,
-- Return time as HH:MM
CONVERT(VARCHAR, SUM (DATEDIFF (SECOND , c1.[CheckTime], c2.[CheckTime]))/3600) + ':' + right('00' + CONVERT(VARCHAR, CONVERT(FLOAT, (SUM (DATEDIFF (SECOND , c1.[CheckTime], c2.[CheckTime]))/60) - ((SUM (DATEDIFF (SECOND , c1.[CheckTime], c2.[CheckTime]))/3600)*60))),2) as total_time
FROM cte c1
JOIN cte c2
ON c1.rn = c2.rn -1
AND c1.[Userid] = c2.[Userid]
AND c1.rn % 2 = 1
GROUP BY c1.[Userid],
cast (c1.[CheckTime] as Date);
This query returns:
| Userid | the_day | total_time |
|--------|------------|-------------|
| 1 | 2019-06-18 | 6:00 |
| 2 | 2019-06-18 | 9:00 |
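
The pairing logic behind both queries (sort the check times per user and day, pair odd rows with the following even rows, sum the gaps) can be sketched outside SQL as well. This is a minimal Python version under the same assumption as the question, namely an even number of records per user and day; the names CHECKS, worked_time and as_hh_mm are made up for the illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# (Userid, CheckTime) pairs taken from the sample table in the question.
CHECKS = [
    (1, "2019-06-18 08:00:00"),
    (1, "2019-06-18 12:00:00"),
    (1, "2019-06-18 15:00:00"),
    (1, "2019-06-18 17:00:00"),
    (2, "2019-06-18 08:00:00"),
    (2, "2019-06-18 17:00:00"),
]


def worked_time(checks):
    """Sum (exit - entry) over consecutive pairs per user and day."""
    by_user_day = defaultdict(list)
    for userid, stamp in checks:
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_user_day[(userid, ts.date())].append(ts)

    totals = {}
    for key, stamps in by_user_day.items():
        stamps.sort()
        total = timedelta()
        # Items 0, 2, 4, ... are entries; items 1, 3, 5, ... are exits.
        for entry, exit_ in zip(stamps[0::2], stamps[1::2]):
            total += exit_ - entry
        totals[key] = total
    return totals


def as_hh_mm(delta):
    """Format a timedelta as H:MM."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}:{minutes % 60:02d}"


report = {
    (user, day.isoformat()): as_hh_mm(total)
    for (user, day), total in worked_time(CHECKS).items()
}
print(report)
```

If a day has an odd number of records, zip silently drops the unmatched entry, which is the same behaviour as the join on rn = rn - 1 in the SQL.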

creating complete historical timeline from overlapping intervals

I have the table below, which contains a code, from, to and hours. The problem is that the intervals have overlapping dates. Instead, I want to create a complete historical timeline: when the code is identical and the intervals overlap, the hours should be summed, as in the desired result.
Table:
+------+------------+------------+-------+
| code | from       | to         | hours |
+------+------------+------------+-------+
| 1    | 2013-05-01 | 2013-09-30 | 37    |
| 1    | 2013-05-01 | 2014-02-28 | 10    |
| 1    | 2013-10-01 | 9999-12-31 | 5     |
+------+------------+------------+-------+
Desired result:
+------+------------+------------+-------+
| code | from       | to         | hours |
+------+------------+------------+-------+
| 1    | 2013-05-01 | 2013-09-30 | 47    |
| 1    | 2013-10-01 | 2014-02-28 | 15    |
| 1    | 2014-03-01 | 9999-12-31 | 5     |
+------+------------+------------+-------+
Oracle Setup:
CREATE TABLE Table1 ( code, "FROM", "TO", hours ) AS
SELECT 1, DATE '2013-05-01', DATE '2013-09-30', 37 FROM DUAL UNION ALL
SELECT 1, DATE '2013-05-01', DATE '2014-02-28', 10 FROM DUAL UNION ALL
SELECT 1, DATE '2013-10-01', DATE '9999-12-31', 5 FROM DUAL;
Query:
SELECT *
FROM (
SELECT code,
dt AS "FROM",
LEAD( dt ) OVER ( PARTITION BY code ORDER BY dt ASC, value DESC, ROWNUM ) AS "TO",
hours
FROM (
SELECT code,
dt,
SUM( hours * value ) OVER ( PARTITION BY code ORDER BY dt ASC, VALUE DESC ) AS hours,
value
FROM table1
UNPIVOT ( dt FOR value IN ( "FROM" AS 1, "TO" AS -1 ) )
)
)
WHERE "FROM" + 1 < "TO";
Results:
CODE FROM TO HOURS
---- ---------- ---------- -----
1 2013-05-01 2013-09-30 47
1 2013-10-01 2014-02-28 15
1 2014-02-28 9999-12-31 5
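
The idea behind the query (turn every interval into a +hours event at its start and a -hours event at its end, then keep a running sum between consecutive event dates) can also be sketched in Python. This is not a translation of the Oracle query, just an illustration for a single code, and it assumes the "to" dates are inclusive, so each interval ends on the day after its "to" date. Day ordinals are used because adding a day to 9999-12-31 overflows Python's date type.

```python
from collections import defaultdict
from datetime import date

# (from, to, hours) rows for code 1, taken from the question.
ROWS = [
    (date(2013, 5, 1), date(2013, 9, 30), 37),
    (date(2013, 5, 1), date(2014, 2, 28), 10),
    (date(2013, 10, 1), date(9999, 12, 31), 5),
]


def merge_timeline(rows):
    """Build non-overlapping segments with the summed hours of each one."""
    deltas = defaultdict(int)
    for start, end, hours in rows:
        deltas[start.toordinal()] += hours    # interval opens on its start day
        deltas[end.toordinal() + 1] -= hours  # and closes the day after its end

    points = sorted(deltas)
    running = 0
    segments = []
    for left, right in zip(points, points[1:]):
        running += deltas[left]
        if running:  # skip gaps where no interval is active
            segments.append(
                (date.fromordinal(left), date.fromordinal(right - 1), running)
            )
    return segments


timeline = [
    (start.isoformat(), end.isoformat(), hours)
    for start, end, hours in merge_timeline(ROWS)
]
for segment in timeline:
    print(segment)
```

Under the inclusive-date assumption the last segment starts on 2014-03-01, the day after the second interval ends. Adjacent segments that happen to have the same total are not merged in this sketch.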

Split rows on different days if summing hours value to given day exceeds midnight

I have a structure like this
+-----+-----+------------+----------+------+----------------------+---+
| Row | id | date | time | hour | description | |
+-----+-----+------------+----------+------+----------------------+---+
| 1 | foo | 2018-03-02 | 19:00:00 | 8 | across single day | |
| 2 | bar | 2018-03-02 | 23:00:00 | 1 | end at midnight | |
| 3 | qux | 2018-03-02 | 10:00:00 | 3 | inside single day | |
| 4 | quz | 2018-03-02 | 23:15:00 | 2 | with minutes | |
+-----+-----+------------+----------+------+----------------------+---+
(I added the description column only to give context; it is not needed for the analysis.)
Here is the statement to generate the table:
WITH table AS (
SELECT "foo" as id, CURRENT_dATE() AS date, TIME(19,0,0) AS time,8 AS hour
UNION ALL
SELECT "bar", CURRENT_dATE(), TIME(23,0,0), 1
UNION ALL
SELECT "qux", CURRENT_dATE(), TIME(10,0,0), 3
UNION ALL
SELECT "quz", CURRENT_dATE(), TIME(23,15,0), 2
)
SELECT * FROM table
After adding the hour value to the given time, I need to split the row into multiple rows if the sum runs into the next day.
Jumps across multiple days, like +27 hours, do NOT need to be considered (this should simplify the scenario).
My initial idea was to add the hours value to a datetime field, in order to obtain the start and end limits of the interval:
SELECT
id,
DATETIME(date, time) AS date_start,
DATETIME_ADD(DATETIME(date, time), INTERVAL hour HOUR) AS date_end
FROM table
here is the result
+-----+-----+---------------------+---------------------+---+
| Row | id | date_start | date_end | |
+-----+-----+---------------------+---------------------+---+
| 1 | foo | 2018-03-02T19:00:00 | 2018-03-03T03:00:00 | |
| 2 | bar | 2018-03-02T23:00:00 | 2018-03-03T00:00:00 | |
| 3 | qux | 2018-03-02T10:00:00 | 2018-03-02T13:00:00 | |
| 4 | quz | 2018-03-02T23:15:00 | 2018-03-03T01:15:00 | |
+-----+-----+---------------------+---------------------+---+
but now I'm stuck on how to proceed considering the existing interval.
Starting from this table, the rows should be split if the day changes, like this:
+-----+-----+------------+-------------+----------+-------+--+
| Row | id | date | hour_start | hour_end | hours | |
+-----+-----+------------+-------------+----------+-------+--+
| 1 | foo | 2018-03-02 | 19:00:00 | 00:00:00 | 5 | |
| 2 | foo | 2018-03-03 | 00:00:00 | 03:00:00 | 3 | |
| 3 | bar | 2018-03-02 | 23:00:00 | 00:00:00 | 1 | |
| 4 | qux | 2018-03-02 | 10:00:00 | 13:00:00 | 3 | |
| 5 | quz | 2018-03-02 | 23:15:00 | 00:00:00 | 0.75 | |
| 6 | quz | 2018-03-03 | 00:00:00 | 01:15:00 | 1.25 | |
+-----+-----+------------+-------------+----------+-------+--+
I tried to adapt the solution from a similar, already answered question, but I was unable to make it handle the day component as well.
My whole final scenario will include both this approach and the other one analyzed in the other question (split on single days and then split on given breaks of hours), but I can approach these 2 themes separately, first query split with day (this question) and then split on time breaks (other question)
Interesting problem ... I tried the following:
Create a second table containing all the new rows that start at midnight
UNION ALL it with the source table while correcting the hours of the old rows accordingly
Commented result:
WITH table AS (
SELECT "foo" as id, CURRENT_dATE() AS date, TIME(19,0,0) AS time,8 AS hour
UNION ALL
SELECT "bar", CURRENT_dATE(), TIME(23,0,0), 1
UNION ALL
SELECT "qux", CURRENT_dATE(), TIME(10,0,0), 3
)
,table2 AS (
SELECT
id,
-- create datetime, add hours, then cast as date again
CAST( datetime_add( datetime(date, time), INTERVAL hour HOUR) AS date) date,
time(0,0,0) AS time -- losing minutes and seconds
-- subtract the hours up to midnight
,hour - (24-EXTRACT(HOUR FROM time)) hour
FROM
table
WHERE
date != CAST( datetime_add( datetime(date,time), INTERVAL hour HOUR) AS date) )
SELECT
id
,date
,time
-- correct hour if midnight split
,IF(EXTRACT(hour from time)+hour > 24,24-EXTRACT(hour from time),hour) hour
FROM
table
UNION ALL
SELECT
*
FROM
table2
Hope it makes sense.
Of course, if you need to consider jumps over multiple days, the correction fails :)
Here is a possible solution I came up with, starting from @Martin Weitzmann's approach.
I handled 2 different cases:
ids where there is a "jump" to the next day
ids which stay within the same day
and a final UNION ALL of the two results
I forgot to mention the first time that the hours value of the input value can be float (portion of hours) so I added that too.
#standardSQL
WITH
input AS (
-- change of day
SELECT "bap" as id, CURRENT_dATE() AS date, TIME(19,0,0) AS time, 8.0 AS hour UNION ALL
-- end at midnight
SELECT "bar", CURRENT_dATE(), TIME(23,0,0), 1.0 UNION ALL
-- inside single day
SELECT "foo", CURRENT_dATE(), TIME(10,0,0), 3.0 UNION ALL
-- change of day with minutes and float hours
SELECT "qux", CURRENT_dATE(), TIME(23,15,0), 2.5 UNION ALL
-- start from midnight
SELECT "quz",CURRENT_dATE(), TIME(0,0,0), 4.5
),
-- Calculate end_date and end_time summing hours value
table AS (
SELECT
id,
date AS start_date,
time AS start_time,
EXTRACT(DATE FROM DATETIME_ADD(DATETIME(date,time), INTERVAL CAST(hour*3600 AS INT64) SECOND)) AS end_date,
EXTRACT(TIME FROM DATETIME_ADD(DATETIME(date,time), INTERVAL CAST(hour*3600 AS INT64) SECOND)) AS end_time
FROM input
),
-- portion that starts at start_time and ends at midnight
start_to_midnight AS (
SELECT
id,
start_time,
start_date,
TIME(23,59,59) as end_time,
start_date as end_date
FROM
table
WHERE end_date > start_date
),
-- portion that starts at midnight and ends at end_time
midnight_to_end AS (
SELECT
id,
TIME(0,0,0) as start_time,
end_date as start_date,
end_time,
end_date
FROM
table
WHERE
end_date > start_date
-- Avoid rows that start at 0:0:0 and end at 0:0:0 (original row ends at 0:0:0)
AND end_time != TIME(0,0,0)
)
-- Union of the 3 tables
SELECT
id,
start_date,
start_time,
end_time
FROM (
SELECT id, start_time, end_time, start_date FROM table WHERE start_date = end_date
UNION ALL
SELECT id, start_time, end_time, start_date FROM start_to_midnight
UNION ALL
SELECT id, start_time, end_time, start_date FROM midnight_to_end
)
ORDER BY id,start_date,start_time
Here is the resulting output:
+-----+-----+------------+------------+----------+---+
| Row | id | start_date | start_time | end_time | |
+-----+-----+------------+------------+----------+---+
| 1 | bap | 2018-03-03 | 19:00:00 | 23:59:59 | |
| 2 | bap | 2018-03-04 | 00:00:00 | 03:00:00 | |
| 3 | bar | 2018-03-03 | 23:00:00 | 23:59:59 | |
| 4 | foo | 2018-03-03 | 10:00:00 | 13:00:00 | |
| 5 | qux | 2018-03-03 | 23:15:00 | 23:59:59 | |
| 6 | qux | 2018-03-04 | 00:00:00 | 01:45:00 | |
| 7 | quz | 2018-03-03 | 00:00:00 | 04:30:00 | |
+-----+-----+------------+------------+----------+---+
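
For comparison, the splitting rule itself is small once start and end are real datetimes. Below is a minimal Python sketch, not BigQuery SQL, assuming (as in the question) that an interval crosses at most one midnight; split_at_midnight is a made-up helper name. Unlike the query above it ends the first piece exactly at midnight rather than at 23:59:59, so the hours of both pieces add up to the original value.

```python
from datetime import datetime, time, timedelta


def split_at_midnight(start, hours):
    """Split [start, start + hours) at the first midnight after start.

    Assumes the interval crosses at most one midnight (no multi-day jumps).
    Returns a list of (segment_start, segment_end, segment_hours) tuples.
    """
    end = start + timedelta(hours=hours)
    midnight = datetime.combine(start.date() + timedelta(days=1), time.min)
    if end <= midnight:
        pieces = [(start, end)]  # stays within the day or ends at midnight
    else:
        pieces = [(start, midnight), (midnight, end)]
    return [(a, b, (b - a).total_seconds() / 3600) for a, b in pieces]


# Rows from the question: foo crosses midnight, bar ends exactly at midnight,
# quz crosses midnight starting at a quarter past the hour.
foo = split_at_midnight(datetime(2018, 3, 2, 19, 0), 8)
bar = split_at_midnight(datetime(2018, 3, 2, 23, 0), 1)
quz = split_at_midnight(datetime(2018, 3, 2, 23, 15), 2)

for seg_start, seg_end, seg_hours in foo + bar + quz:
    print(seg_start, seg_end, seg_hours)
```

Handling jumps over several days would need a loop that keeps cutting at each following midnight, which this sketch deliberately leaves out.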

Cumulative open subscriptions with start_date and end_date on Redshift

I am trying to write a query that will allow me to count the number of active subscriptions by day in Redshift.
I have the following table:
sub_id | start_date | end_date
---------------------------------------
20001 | 2017-09-01 | NULL
20002 | 2017-08-01 | 2017-08-29
20003 | 2016-01-01 | 2017-04-25
20004 | 2016-07-01 | 2017-09-03
I would like to be able to state, for each date between two dates how many subscriptions are active, such that:
date | active_subs
------------------------
2016-06-30 | 1
2016-07-01 | 2
... |
2017-04-24 | 2
2017-04-25 | 1
... |
2017-07-31 | 1
2017-08-01 | 2
... |
2017-08-28 | 2
2017-08-29 | 1
2017-08-30 | 1
2017-08-31 | 1
2017-09-01 | 2
2017-09-02 | 2
2017-09-03 | 1
I have a reference table from which a query can draw 1 row per day with the table name of date and the relevant column being date.ref_date (in the YYYY-MM-DD format)
Should I write this query using window functions, or is there a better way?
Thanks
If I understood you correctly, you need neither window functions, nor joins (except to the date table), nor a cumulative count. You can do this:
SELECT t.dateCol,
COUNT(s.sub_id) as active_subs
FROM dateTable t
LEFT JOIN YourTable s
ON(t.dateCol between s.start_date
AND COALESCE(s.end_date,<Put A late date here>))
GROUP BY t.dateCol
I would do this as:
with cte as (
select start_date as dte, 1 as inc
from t
union all
select coalesce(end_date, current_date), -1 as inc
from t
)
select dte,
sum(sum(inc)) over (order by dte)
from cte
group by dte
order by dte;
There may be off-by-one errors, depending on whether you count stops on the date given or on the next day.
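
The off-by-one question can be settled against the expected output in the question: the count drops on the end_date itself (for example 2017-04-25 shows 1), so a subscription is not counted on its end date. Here is a rough Python sketch of the same +1/-1 running-sum idea under that assumption; SUBS and daily_active are names invented for the example.

```python
from collections import Counter
from datetime import date, timedelta

# (sub_id, start_date, end_date) rows from the question; None means still open.
SUBS = [
    (20001, date(2017, 9, 1), None),
    (20002, date(2017, 8, 1), date(2017, 8, 29)),
    (20003, date(2016, 1, 1), date(2017, 4, 25)),
    (20004, date(2016, 7, 1), date(2017, 9, 3)),
]


def daily_active(subs, first_day, last_day):
    """Running sum of +1 at each start and -1 at each end, reported per day."""
    deltas = Counter()
    for _sub_id, start, end in subs:
        deltas[start] += 1
        if end is not None:
            deltas[end] -= 1  # assumption: not active on the end date itself

    # Subscriptions that changed state before the reporting window opens.
    running = sum(change for day, change in deltas.items() if day < first_day)

    counts = {}
    day = first_day
    while day <= last_day:
        running += deltas.get(day, 0)
        counts[day] = running
        day += timedelta(days=1)
    return counts


counts = daily_active(SUBS, date(2016, 6, 30), date(2017, 9, 3))
print(counts[date(2017, 4, 24)], counts[date(2017, 4, 25)])
```

If subscriptions should instead count as active on their end date, the -1 would be applied on the following day.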

TSQL query help structuring results

I have a table with the following columns:
timestamp | value | desc
example of the data:
2014-01-27 10:00:00.000 | 100 | 101
2014-01-27 10:00:00.000 | 105 | 101
2014-01-27 11:00:00.000 | 160 | 101
2014-01-27 12:00:00.000 | 200 | 101
...
...
2014-01-28 10:00:00.000 | 226 | 101
2014-01-28 10:00:00.000 | 325 | 101
2014-01-28 11:00:00.000 | 145 | 101
what I would like to obtain is a grouping by the hour part but without merging the period interval.
So that the result will be like this (in the select I will pass a date interval and a condition on the description, like desc = '101'):
Structure:
hour | count
Data:
10 | 2 (referring to the 20140127)
11 | 1 (referring to the 20140127)
12 | 1 (referring to the 20140127)
...
...
10 | 2 (referring to the 20140128)
11 | 1 (referring to the 20140128)
I thought about using a cursor but I was wondering if it is possible to achieve this result without it.
I'm using SQL server 2012 SP1.
Thanks for your attention.
Bye,
F.
Try this:-
SELECT Count(*) AS [Count],
Datepart(hour, timestamp) AS [Hour]
FROM yourtable
GROUP BY CONVERT(DATE, timestamp),
Datepart(hour, timestamp)
ORDER BY CONVERT(DATE, timestamp),
Datepart(hour, timestamp)
You may use this. It should work:
SELECT DATEPART(hh,timestamp), COUNT(*)
FROM tablename
WHERE [desc] = 'yourvalue'
GROUP BY
DATETIMEFROMPARTS (YEAR(timestamp),MONTH(timestamp),DAY(timestamp),0,0,0,0),
DATEPART(hh,timestamp)
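
What both answers rely on is grouping by the pair (day, hour) rather than by the hour alone, so the same hour on different days is not merged. A small Python sketch of that grouping, using the sample rows from the question (with the milliseconds dropped for brevity; ROWS and count_per_day_hour are made-up names):

```python
from collections import Counter
from datetime import datetime

# (timestamp, value, desc) rows from the question.
ROWS = [
    ("2014-01-27 10:00:00", 100, "101"),
    ("2014-01-27 10:00:00", 105, "101"),
    ("2014-01-27 11:00:00", 160, "101"),
    ("2014-01-27 12:00:00", 200, "101"),
    ("2014-01-28 10:00:00", 226, "101"),
    ("2014-01-28 10:00:00", 325, "101"),
    ("2014-01-28 11:00:00", 145, "101"),
]


def count_per_day_hour(rows, desc):
    """Count rows per (day, hour) for one description value."""
    counts = Counter()
    for stamp, _value, row_desc in rows:
        if row_desc == desc:
            ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
            # Key on both the day and the hour, like the GROUP BY above.
            counts[(ts.date().isoformat(), ts.hour)] += 1
    return sorted(counts.items())


result = count_per_day_hour(ROWS, "101")
for (day, hour), count in result:
    print(hour, count, f"(referring to {day})")
```

Grouping by the hour alone would merge the 10 o'clock rows of both days into a single count of 4, which is what the question wants to avoid.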