How to subtract two consecutive rows in MS SQL Server?

A table looks like:
id | location | datetime
------| ---------| --------
CD123 | loc001 | 2010-10-21 13:30:15
ZY123 | loc001 | 2010-10-21 13:40:15
YU333 | loc001 | 2010-10-21 13:41:00
AB456 | loc002 | 2011-1-21 14:30:30
FG121 | loc002 | 2011-1-21 14:31:00
BN010 | loc002 | 2011-1-21 14:32:00
Assume the table has been sorted by ascending datetime. I am trying to find the elapsed time (in seconds) between two consecutive rows within a location.
The result table is supposed to be:
| location | elapsed
| loc001 | 600
| loc001 | 45
| loc002 | 30
| loc002 | 60
Since the id is randomly generated, it is difficult to write something like a.id = b.id + 1 in a query. And only rows within the same location are subtracted consecutively, not across different locations.
How should I write a query in MS SQL Server to accomplish it?

In SQL Server 2012 and later you can use LEAD or LAG:
SELECT
    location,
    DATEDIFF(SECOND, DateTime,
             LEAD(DateTime, 1) OVER (PARTITION BY location ORDER BY DateTime)) AS Elapsed
FROM
    tableName

with Result as
(
    Select *, ROW_NUMBER() Over (order by location, datetime) RowID
    from table_name
)
Select R1.location, DATEDIFF(SECOND, R2.datetime, R1.datetime)
from Result R1
Inner join Result R2
    on (R1.RowID = R2.RowID + 1 and R1.location = R2.location)

You have two options:
Add a new row number column and then self join it on that number, e.g. Table2.[New ID] = Table1.[New ID] - 1. You can then do the subtraction between the two joined rows (e.g. DATEDIFF between their datetime values).
Use the LAG function, which is a shortcut for the above method, as long as you are using SQL Server 2012+; a minimal sketch is shown below.
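For example, here is a minimal sketch of the LAG approach against the table in the question, assuming it is named EventLog (adjust the table and column names to your schema):
-- Sketch only: assumes a table EventLog(id, location, datetime)
SELECT
    location,
    DATEDIFF(SECOND,
             LAG([datetime]) OVER (PARTITION BY location ORDER BY [datetime]),
             [datetime]) AS elapsed
FROM EventLog
-- The first row in each location has no previous row, so its elapsed value is NULL.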

You can try this way:
select s.location,
       s.datetime,
       datediff(ss, s.datetime, s.next_datetime) as elapsed
from (
      select location,
             datetime,
             lead(datetime) over (partition by location order by datetime) next_datetime
      from Table1
     ) s
where s.next_datetime is not null
order by s.location,
         s.datetime desc

Create a CTE and use LEAD to get datetime and next_datetime in the same row, then calculate the difference with DATEDIFF using this CTE:
WITH cte
AS
(
    SELECT location
         , datetime
         , LEAD(datetime, 1) OVER (PARTITION BY location ORDER BY datetime ASC) next_datetime
    FROM tbl
)
SELECT location
     , DATEDIFF(ss, datetime, next_datetime) Elapsed
FROM cte

Related

SQL procedure to show how many hours a worker has worked

+-----------+-------------------------------+-------+
| Worker ID | Time(MM/DD/YYYY Hour:Min:Sec) | InOut |
+-----------+-------------------------------+-------+
| 1 | 12/04/2017 10:00:00 | In |
| 2 | 12/04/2017 10:00:00 | In |
| 2 | 12/04/2017 18:40:02 | Out |
| 3 | 12/04/2017 10:00:00 | In |
| 1 | 12/04/2017 12:01:00 | Out |
| 3 | 12/04/2017 19:40:05 | Out |
+-----------+-------------------------------+-------+
Hi! I have a problem with my project and I thought some of you might be able to help me. I have a table like the one above; it simply records workers clocking in and out of the company. I need to write a procedure that takes a worker ID and a date as IN parameters and shows how many hours and minutes that worker worked that day. Thanks for the help.
Yeah, I had to do a number of queries like this at my old job. Here's the approach I used, and it worked out pretty well:
For each "Out" record, get the MAX(TIME) on "In" records with a time earlier than the OUT record
Does that make sense? You're basically joining the table against itself, looking for the record that represents the "clock in" time for any particular "clock out" time.
So here's the backbone:
select
    *
    , (
        SELECT MAX(tim) from #tempTable subQ
        where subQ.id = main.id
          and subQ.tim <= main.tim
          and subQ.InOut = 'In'
      ) as correspondingInTime
from #tempTable main
where InOut = 'Out'
... from here, you can get the data you need, either by manipulating the query above or by using it as a subquery itself (which is my favored way of doing it) - something like:
select id as workerID, sum(DATEDIFF(s, correspondingInTime, tim)) as totalSecondsWorked
from
(
    select
        *
        , (
            SELECT MAX(tim) from #tempTable subQ
            where subQ.id = main.id
              and subQ.tim <= main.tim
              and subQ.InOut = 'In'
          ) correspondingInTime
    from #tempTable main
    where InOut = 'Out'
) mainQuery
group by id
EDIT: Removed the 'as' before correspondingInTime, because Oracle doesn't allow 'as' in table aliasing.
Maybe something similar to
select sum( time1 - prev_time1 ) from (
    select InOut, time1,
           lag(time1) over (partition by worker_id order by time1) prev_time1,
           lag(InOut) over (partition by worker_id order by time1) prev_inOut
    from MyTABLE
    where time1 between trunc(:date1) and trunc( :date1 + 1 )
      and worker_id = :workerId
) t1
where InOut = 'Out' and prev_InOut = 'In'
would work.
:workerId and :date1 are bind variables to constrain the result to one worker and one date, as required.
I'm fairly certain Oracle allows you to use CROSS APPLY these days.
SELECT [Worker ID], yt.Time - ca.Time
FROM YourTable yt
CROSS APPLY (SELECT MAX(Time) AS Time
FROM YourTable
WHERE [Worker ID] = yt.[Worker ID] AND Time < yt.Time AND InOut = 'In') ca
WHERE yt.InOut = 'Out'
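Since the question asks for hours and minutes rather than a raw date difference, here is a hedged T-SQL variation of the query above that splits the elapsed time into hours and minutes (same YourTable assumed; minute precision):
-- Sketch only: same YourTable as above; integer division and modulo split the total minutes
SELECT yt.[Worker ID],
       DATEDIFF(MINUTE, ca.Time, yt.Time) / 60 AS HoursWorked,
       DATEDIFF(MINUTE, ca.Time, yt.Time) % 60 AS MinutesWorked
FROM YourTable yt
CROSS APPLY (SELECT MAX(Time) AS Time
             FROM YourTable
             WHERE [Worker ID] = yt.[Worker ID] AND Time < yt.Time AND InOut = 'In') ca
WHERE yt.InOut = 'Out'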

SQL grouping by datetime with a maximum difference of x minutes

I have a problem with grouping my dataset in MS SQL Server.
My table looks like
# | CustomerID | SalesDate | Turnover
---| ---------- | ------------------- | ---------
1 | 1 | 2016-08-09 12:15:00 | 22.50
2 | 1 | 2016-08-09 12:17:00 | 10.00
3 | 1 | 2016-08-09 12:58:00 | 12.00
4 | 1 | 2016-08-09 13:01:00 | 55.00
5 | 1 | 2016-08-09 23:59:00 | 10.00
6 | 1 | 2016-08-10 00:02:00 | 5.00
Now I want to group the rows where the SalesDate difference to the next row is at most 5 minutes.
So that row 1 & 2, 3 & 4 and 5 & 6 are each one group.
My approach was getting the minutes with the DATEPART() function and divide the result by 5:
(DATEPART(MINUTE, SalesDate) / 5)
For row 1 and 2 the result would be 3 and grouping here would work perfectly.
But for the other rows where there is a change in the hour or even in the day part of the SalesDate, the result cannot be used for grouping.
So this is where I'm stuck. I would really appreciate it if someone could point me in the right direction.
You want to group adjacent transactions based on the timing between them. The idea is to assign some sort of grouping identifier, and then use that for aggregation.
Here is an approach:
Identify group starts using lag() and date arithmetic.
Do a cumulative sum of the group starts to identify each group.
Aggregate
The query looks like this:
select customerid, min(salesdate), max(salesdate), sum(turnover)
from (select t.*,
sum(case when salesdate > dateadd(minute, 5, prev_salesdate)
then 1 else 0
end) over (partition by customerid order by salesdate) as grp
from (select t.*,
lag(salesdate) over (partition by customerid order by salesdate) as prev_salesdate
from t
) t
) t
group by customerid, grp;
EDIT
Thanks to @JoeFarrell for pointing out that I have answered the wrong question. The OP is looking for dynamic time differences between rows, but this approach creates fixed boundaries.
Original Answer
You could create a time table. This is a table that contains one record for each second of the day, together with a grouping column that you can use to perform GROUP BYs on.
CREATE TABLE [Time]
(
TimeId TIME(0) PRIMARY KEY,
TimeGroup TIME
)
;
-- You could use a loop here instead.
INSERT INTO [Time]
(
TimeId,
TimeGroup
)
VALUES
('00:00:00', '00:00:00'), -- First group starts here.
('00:00:01', '00:00:00'),
('00:00:02', '00:00:00'),
('00:00:03', '00:00:00'),
...
('00:04:59', '00:00:00'),
('00:05:00', '00:05:00'), -- Second group starts here.
('00:05:01', '00:05:00')
;
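A hedged sketch of the loop alternative mentioned in the comment above, assuming the same [Time] table and 5-minute groups:
-- Sketch only: fill one row per second of the day with a WHILE loop instead of typing every value
DECLARE @t time(0) = '00:00:00';
DECLARE @i int = 0;
WHILE @i < 86400
BEGIN
    INSERT INTO [Time] (TimeId, TimeGroup)
    VALUES (@t,
            -- round the time of day down to the start of its 5-minute bucket
            DATEADD(MINUTE,
                    (DATEDIFF(MINUTE, CAST('00:00:00' AS time(0)), @t) / 5) * 5,
                    CAST('00:00:00' AS time(0))));
    SET @t = DATEADD(SECOND, 1, @t);
    SET @i += 1;
END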
The approach works best when:
You need to reuse your custom grouping in several different queries.
You have two or more custom groups you often use.
Once populated you can simply join to the table and output the desired result.
/* Using the time table.
*/
SELECT
t.TimeGroup,
SUM(Turnover) AS SumOfTurnover
FROM
Sales AS s
INNER JOIN [Time] AS t ON t.TimeId = CAST(s.SalesDate AS Time(0))
GROUP BY
t.TimeGroup
;

Aggregating multiple rows more than once

I've got a set of data which has a type column and a created_at time column. I've already got a query which is pulling the relevant data from the database, and this is the data that is returned.
type | created_at | row_num
-----------------------------------------------------
"ordersPage" | "2015-07-21 11:32:40.568+12" | 1
"getQuote" | "2015-07-21 15:49:47.072+12" | 2
"completeBrief" | "2015-07-23 01:00:15.341+12" | 3
"sendBrief" | "2015-07-24 08:59:42.41+12" | 4
"sendQuote" | "2015-07-24 18:43:15.967+12" | 5
"acceptQuote" | "2015-08-03 04:40:20.573+12" | 6
The row number is returned by the standard ROW_NUMBER() window function in Postgres:
ROW_NUMBER() OVER (ORDER BY created_at ASC) AS row_num
What I want to do is somehow aggregate this data to get a time distance between every pair of consecutive events, so the output data might look something like this:
type_1 | type_2 | time_distance
--------------------------------------------------------
"ordersPage" | "getQuote" | 123423.3423
"getQuote" | "completeBrief" | 123423.3423
"completeBrief" | "sendBrief" | 123423.3423
"sendBrief" | "sendQuote" | 123423.3423
"sendQuote" | "acceptQuote" | 123423.3423
The time distance would be a float in milliseconds. In other queries I've been using something like this to get time differences:
EXTRACT(EPOCH FROM (MAX(events.created_at) - MIN(events.created_at)))
But this time I need it for every pair of events in the sequential order of row_num, so I need the difference for (1,2), (2,3), (3,4)...
Any ideas if this is possible? It also doesn't have to be exact: I can deal with duplicates, and with the type_1 and type_2 columns returning an existing row in a different order. I just need a way to at least get the values above.
What about a self join? It would look like this:
SELECT
      t1.type
    , t2.type
    , EXTRACT(EPOCH FROM (t1.created_at - t2.created_at)) AS time_diff
FROM your_table t1
INNER JOIN your_table t2
    ON t1.row_num = t2.row_num + 1
You can use the LAG window function to compare the current value with the previous:
with
t(type,created_at) as (
values
('ordersPage', '2015-07-21 11:32:40.568+12'::timestamptz),
('getQuote', '2015-07-21 15:49:47.072+12'),
('completeBrief', '2015-07-23 01:00:15.341+12'),
('sendBrief', '2015-07-24 08:59:42.41+12'),
('sendQuote', '2015-07-24 18:43:15.967+12'),
('acceptQuote', '2015-08-03 04:40:20.573+12'))
select *, EXTRACT(EPOCH FROM created_at - lag(created_at) over (order by created_at))
from t
order by created_at
select type_1,
type_2,
created_at_2-created_at_1 as time_distance
from
(select
type type_1,
lead(type,1) over (order by row_num) type_2,
created_at created_at_1,
lead(created_at,1) over (order by row_num) created_at_2
from table_name) temp
where type_2 is not null
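If you need the numeric millisecond value the question mentions instead of an interval, a small variation of the query above (same table_name and row_num columns assumed) can wrap the difference in EXTRACT(EPOCH FROM ...):
-- Sketch only: EPOCH gives seconds as a number, so multiply by 1000 for milliseconds
select type_1,
       type_2,
       EXTRACT(EPOCH FROM (created_at_2 - created_at_1)) * 1000 as time_distance_ms
from (select type type_1,
             lead(type, 1)       over (order by row_num) type_2,
             created_at          created_at_1,
             lead(created_at, 1) over (order by row_num) created_at_2
      from table_name) temp
where type_2 is not null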

Count and name content from a SQL Server table

I have a table which is structured like this:
+-----+-------------+-------------------------+
| id | name | timestamp |
+-----+-------------+-------------------------+
| 1 | someName | 2016-04-20 09:41:41.213 |
| 2 | someName | 2016-04-20 09:42:41.213 |
| 3 | anotherName | 2016-04-20 09:43:41.213 |
| ... | ... | ... |
+-----+-------------+-------------------------+
Now, I am trying to create a query which selects all timestamps since time x and counts the number of times the same name occurs in the result.
As an example, if we applied this query to the table above, with 2016-04-20 09:40:41.213 as the date from which it should start counting, the result should look like this:
+-------------+-------+
| name | count |
+-------------+-------+
| someName | 2 |
| anotherName | 1 |
+-------------+-------+
What I have accomplished so far is the following query, which gives me the names, but not their count:
WITH screenshots AS
(
SELECT * FROM SavedScreenshotsLog
WHERE timestamp > '2016-04-20 09:40:41.213'
)
SELECT s.name
FROM SavedScreenshotsLog s
INNER JOIN screenshots sc ON sc.name = s.name AND sc.timestamp = s.timestamp
ORDER BY s.name
I have browsed through Stack Overflow but was not able to find a solution which fits my needs, and as I am not very experienced with SQL, I am out of ideas.
You mention one table in your question, and then show a query with two tables. That makes it hard to follow the question.
What you are asking for is a simple aggregation:
SELECT name, COUNT(*)
FROM SavedScreenshotsLog
WHERE timestamp > '2016-04-20 09:40:41.213'
GROUP BY name
ORDER BY COUNT(*) DESC;
EDIT:
If you want "0" values, you can use conditional aggregation:
SELECT name,
SUM(CASE WHEN timestamp > '2016-04-20 09:40:41.213' THEN 1 ELSE 0 END) as cnt
FROM SavedScreenshotsLog
GROUP BY name
ORDER BY cnt DESC;
Note that this will run slower because there is no filter on the dates prior to aggregation.
CREATE TABLE #TEST (name varchar(100), dt datetime)
INSERT INTO #TEST VALUES ('someName','2016-04-20 09:41:41.213')
INSERT INTO #TEST VALUES ('someName','2016-04-20 09:42:41.213')
INSERT INTO #TEST VALUES ('anotherName','2016-04-20 09:43:41.213')
declare @YourDatetime datetime = '2016-04-20 09:41:41.213'
SELECT name, count(dt)
FROM #TEST
WHERE dt >= @YourDatetime
GROUP BY name
I've posted this answer because the query above can generate errors when converting the string in the WHERE clause into a datetime, depending on the format of the datetime; passing the value as a datetime variable avoids the issue.
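As a further guard against format issues, the literal can be written in ISO 8601 form and converted explicitly with a style code; a small sketch using the same #TEST table:
-- Sketch only: ISO 8601 with the 'T' separator (style 126) is parsed regardless of language settings
SELECT name, count(dt)
FROM #TEST
WHERE dt >= CONVERT(datetime, '2016-04-20T09:41:41.213', 126)
GROUP BY name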

SQL - Select unique rows from a group of results

I have racked my brain on this problem for quite some time. I've also reviewed other questions but was unsuccessful.
The problem is that I have a list of results/table with multiple rows and the following columns:
| REGISTRATION | ID | DATE | UNITTYPE
| 005DTHGP | 172 | 2007-09-11 | MBio
| 005DTHGP | 1966 | 2006-09-12 | Tracker
| 013DTHGP | 2281 | 2006-11-01 | Tracker
| 013DTHGP | 2712 | 2008-05-30 | MBio
| 017DTNGP | 2404 | 2006-10-20 | Tracker
| 017DTNGP | 508 | 2007-11-10 | MBio
I am trying to select rows with unique REGISTRATIONs where the DATE is the latest (max). The IDs are not proportional to the DATE, meaning the ID could be a low value yet the DATE higher than in the other matching row, and vice versa. Therefore I can't use MAX() on both the DATE and ID, and grouping just doesn't seem to work.
The results I want are as follows:
| REGISTRATION | ID | DATE | UNITTYPE
| 005DTHGP | 172 | 2007-09-11 | MBio
| 013DTHGP | 2712 | 2008-05-30 | MBio
| 017DTNGP | 508 | 2007-11-10 | MBio
PLEASE HELP!!!?!?!?!?!?!?
You want embedded queries (derived tables), which not all SQL dialects support. In T-SQL you'd have something like:
select r.registration, r.recent, t.id, t.unittype
from (
      select registration, max([date]) recent
      from #tmp
      group by registration
     ) r
left outer join #tmp t
    on r.recent = t.[date]
   and r.registration = t.registration
TSQL:
declare #R table
(
Registration varchar(16),
ID int,
Date datetime,
UnitType varchar(16)
)
insert into #R values ('A','1','20090824','A')
insert into #R values ('A','2','20090825','B')
select R.Registration,R.ID,R.UnitType,R.Date from #R R
inner join
(select Registration,Max(Date) as Date from #R group by Registration) M
on R.Registration = M.Registration and R.Date = M.Date
This can be inefficient if you have thousands of rows in your table depending upon how the query is executed (i.e. if it is a rowscan and then a select per row).
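If that join proves slow, a hedged alternative (SQL Server 2005+ assumed, same #R table as above) is to number the rows per registration and keep only the newest one:
-- Sketch only: ROW_NUMBER keeps exactly one (the latest) row per registration in a single pass
select Registration, ID, Date, UnitType
from (
      select *,
             row_number() over (partition by Registration order by Date desc) as rn
      from #R
     ) ranked
where rn = 1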
In PostgreSQL, and assuming your data is indexed so that a sort isn't needed (or there are so few rows you don't mind a sort):
select distinct on (registration) * from whatever order by registration, "date" desc;
Taking each row in registration and descending date order, you will get the latest date for each registration first. DISTINCT ON throws away the duplicate registrations that follow.
select registration, ID, date, unittype
from your_table
where (registration, date) IN (select registration, max(date)
                               from your_table
                               group by registration)
This should work in MySQL:
SELECT registration, id, date, unittype
FROM table_name,
     (SELECT registration AS temp_reg, MAX(date) AS temp_date
      FROM table_name GROUP BY registration) AS temp_table
WHERE registration = temp_reg AND date = temp_date
The idea is to use a subquery in the FROM clause which produces one row per registration containing the latest date (the grouped fields); then use that registration and date in the WHERE clause to fetch the other fields of the matching row.