How to group consecutive rows together in SQL by multiple columns - sql

I have rows in a query that return something like:
Date User Time Location Service Count
1/1/2018 Nick 12:00 Location A X 1
1/1/2018 Nick 12:01 Location A Y 1
1/1/2018 John 12:02 Location B Z 1
1/1/2018 Harry 12:03 Location A X 1
1/1/2018 Harry 12:04 Location A X 1
1/1/2018 Harry 12:05 Location B Y 1
1/1/2018 Harry 12:06 Location B X 1
1/1/2018 Nick 12:07 Location A X 1
1/1/2018 Nick 12:08 Location A Y 1
where the query returns locations visited by a user and a count of picks done from each location. Results are sorted by user and time ascending. I need to group them so that CONSECUTIVE rows with the same User and Location are combined, with a SUM of the Count column and a comma-separated list of unique values from the Service column. The final result should look something like this:
Date User Start Time End Time Location Service Count
1/1/2018 Nick 12:00 12:01 Location A X,Y 2
1/1/2018 John 12:02 12:02 Location B Z 1
1/1/2018 Harry 12:03 12:04 Location A X 2
1/1/2018 Harry 12:05 12:06 Location B X,Y 2
1/1/2018 Nick 12:07 12:08 Location A X,Y 2
I'm not sure where to start. Maybe lag or partition clauses? hoping an SQL guru can help here...

This is a gaps and islands problem. One method for solving it uses row_number():
select Date, User, min(Time) as start_time, max(time) as end_time,
Location,
listagg(Service, ',') within group (order by service),
count(*) as cnt
from (select t.*,
row_number() over (partition by date order by time) as seqnum,
row_number() over (partition by user, date, location order by time) as seqnum_2
from t
) t
group by Date, User, Location, (seqnum - seqnum_2);
It is a bit tricky to explain how this works. My suggestion is to run the subquery and you will see how the difference of row numbers defines the groups that you are looking for.
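To see the row-number difference at work, here is a minimal sketch that runs the same idea against the question's sample rows. It assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module rather than the asker's database, drops the Date column because every sample row shares it, and renames the columns to the hypothetical usr and tm to avoid keyword clashes.

```python
import sqlite3

# Sample rows from the question: (user, time, location, service).
rows = [
    ("Nick", "12:00", "Location A", "X"),
    ("Nick", "12:01", "Location A", "Y"),
    ("John", "12:02", "Location B", "Z"),
    ("Harry", "12:03", "Location A", "X"),
    ("Harry", "12:04", "Location A", "X"),
    ("Harry", "12:05", "Location B", "Y"),
    ("Harry", "12:06", "Location B", "X"),
    ("Nick", "12:07", "Location A", "X"),
    ("Nick", "12:08", "Location A", "Y"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (usr text, tm text, location text, service text)")
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

# Step 1: the subquery on its own. Within one run of consecutive rows for the
# same user and location, both row numbers grow by 1, so their difference
# stays constant.
inner = conn.execute("""
    select usr, tm, location,
           row_number() over (order by tm) as seqnum,
           row_number() over (partition by usr, location order by tm) as seqnum_2
    from t
    order by tm
""").fetchall()
diffs = [(usr, tm, location, seqnum - seqnum_2)
         for usr, tm, location, seqnum, seqnum_2 in inner]
for row in diffs:
    print(row)

# Step 2: group on that difference together with user and location. The
# difference alone is not unique (two different islands can share a value),
# which is why user and location stay in the group by.
islands = conn.execute("""
    select usr, min(tm) as start_time, max(tm) as end_time, location,
           count(*) as cnt
    from (
        select t.*,
               row_number() over (order by tm) as seqnum,
               row_number() over (partition by usr, location order by tm) as seqnum_2
        from t
    ) s
    group by usr, location, (seqnum - seqnum_2)
    order by start_time
""").fetchall()
for row in islands:
    print(row)
```

Printing the first result shows the difference column holding steady inside each island, and the second result has one row per island with its start and end time.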

Use lag to get the user and location values of the previous row. Then use a running sum to start a new group whenever the user or location changes. Finally, aggregate on the resulting group, user, location and date.
select Date, User, min(Time) as start_time, max(time) as end_time, Location,
listagg(Service, ',') within group (order by Service),
count(*) as cnt
from (select Date, User, Time, Location, Service,
sum(case when prev_location=location and prev_user=user then 0 else 1 end) over(order by date,time) as grp
from (select Date, User, Time, Location, Service,
lag(Location) over(order by date,time) as prev_location,
lag(User) over(order by date,time) as prev_user
from t
) t
) t
group by Date, User, Location, grp;
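The lag plus running-sum approach can be tried out with the sketch below. It is an illustration under assumptions, not the answer's exact query: it uses SQLite (3.25+) via Python's sqlite3, the hypothetical column names usr and tm, no Date column, and group_concat(distinct ...) as a stand-in for listagg, which SQLite does not have.

```python
import sqlite3

# Sample rows from the question: (user, time, location, service).
rows = [
    ("Nick", "12:00", "Location A", "X"),
    ("Nick", "12:01", "Location A", "Y"),
    ("John", "12:02", "Location B", "Z"),
    ("Harry", "12:03", "Location A", "X"),
    ("Harry", "12:04", "Location A", "X"),
    ("Harry", "12:05", "Location B", "Y"),
    ("Harry", "12:06", "Location B", "X"),
    ("Nick", "12:07", "Location A", "X"),
    ("Nick", "12:08", "Location A", "Y"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (usr text, tm text, location text, service text)")
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

grouped = conn.execute("""
    select usr, min(tm) as start_time, max(tm) as end_time, location,
           group_concat(distinct service) as services,  -- stand-in for listagg
           count(*) as cnt
    from (
        -- Running sum: adds 1 whenever user or location differs from the
        -- previous row (the first row has no previous row, so it also adds 1).
        select usr, tm, location, service,
               sum(case when prev_location = location and prev_usr = usr
                        then 0 else 1 end) over (order by tm) as grp
        from (
            select usr, tm, location, service,
                   lag(location) over (order by tm) as prev_location,
                   lag(usr) over (order by tm) as prev_usr
            from t
        ) with_prev
    ) with_grp
    group by usr, location, grp
    order by start_time
""").fetchall()

for row in grouped:
    print(row)
```

The order of values inside the concatenated service list is not guaranteed by SQLite, so it may need sorting on the client if a fixed order matters.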

Related

Add a counter based consecutive dates

I have an employee table with the employee name and the dates when the employee was on leave. My task is to identify employees who have taken 3 or 5 consecutive days of leave. I tried to add a row_number, but it would not restart correctly based on the consecutive dates. The desired counter I am after is shown below. Any suggestions please?
Employee Leave Date Desired Counter
John 25-Jan-20 1
John 26-Jan-20 2
John 27-Jan-20 3
John 28-Jan-20 4
John 15-Mar-20 1
John 16-Mar-20 2
Mary 12-Feb-20 1
Mary 13-Feb-20 2
Mary 20-Apr-20 1
This is a gaps-and-islands problem: each island represents consecutive days of leave, and you want to enumerate the rows of each island.
Here is an approach that uses the date difference against a monotonically increasing counter to build the groups:
select t.*,
row_number() over(
partition by employee, dateadd(day, -rn, leave_date)
order by leave_date
) counter
from (
select t.*,
row_number() over(partition by employee order by leave_date) rn
from mytable t
) t
order by employee, leave_date
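A minimal sketch of the same date-minus-counter idea, run on the question's sample data. It assumes SQLite (3.25+) via Python's sqlite3, so SQL Server's dateadd(day, -rn, leave_date) is replaced by SQLite's date(leave_date, '-N days') modifier, and the dates are stored as ISO strings.

```python
import sqlite3

# Sample data from the question, with dates rewritten as ISO strings.
leave = [
    ("John", "2020-01-25"), ("John", "2020-01-26"),
    ("John", "2020-01-27"), ("John", "2020-01-28"),
    ("John", "2020-03-15"), ("John", "2020-03-16"),
    ("Mary", "2020-02-12"), ("Mary", "2020-02-13"),
    ("Mary", "2020-04-20"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (employee text, leave_date text)")
conn.executemany("insert into mytable values (?, ?)", leave)

# Subtracting the per-employee row number from the date gives the same
# anchor date for every row of a consecutive run, so that anchor date can
# serve as the partition key for the restarting counter.
result = conn.execute("""
    select employee, leave_date,
           row_number() over (
               partition by employee, date(leave_date, '-' || rn || ' days')
               order by leave_date
           ) as counter
    from (
        select employee, leave_date,
               row_number() over (partition by employee order by leave_date) as rn
        from mytable
    ) numbered
    order by employee, leave_date
""").fetchall()

for row in result:
    print(row)
```

For example, John's four January dates all map to the anchor 2020-01-24, while his two March dates map to 2020-03-10, so the counter restarts at 1 in March.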

Grouping sets of data in Oracle SQL

I have been trying to separate groups in data stored in my Oracle database for more accurate analysis.
Current Output
Time Location
10:00 A111
11:00 A112
12:00 S111
13:00 S234
17:00 A234
18:00 S747
19:00 A878
Desired Output
Time Location Group Number
10:00 A111 1
11:00 A112 1
12:00 S111 1
13:00 S234 1
17:00 A234 2
18:00 S747 2
19:00 A878 3
I have been trying to use over and partition by to assign the values, but I can only get the number to increment on every row, not only on a change. I also tried using lag, but I struggled to make use of it.
I only need the value in the second column to start from 1 and increment when the first letter of field 1 changes (using substr).
This is my attempt using row_number but I am far off I think. There would be a time column in the output as well not shown above.
select event_time, st_location, Row_Number() over(partition by
SUBSTR(location,1,1) order
by event_time)
as groupnumber from pic
Any help would be really appreciated!
Edit:
Time Location Group Number
10:00 A-10112 1
11:00 A-10421 1
12:00 ST-10621 1
13:00 ST-23412 1
17:00 A-19112 2
18:00 ST-74712 2
19:00 A-87812 3
It is a gaps-and-islands problem. Use the following code:
select location,
dense_rank() over (partition by SUBSTR(location,1,1) order by grp)
from
(
select (row_number() over (order by time)) -
(row_number() over (partition by SUBSTR(location,1,1) order by time)) grp,
location,
time
from data
) t
order by time
The main idea is in the subquery which isolates consecutive sequences of items (computation of grp column). The rest is simple once you have the grp column.
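As a rough check of the grp computation, the sketch below runs the same query shape against the question's first sample. It assumes SQLite (3.25+) via Python's sqlite3 instead of Oracle, and names the time column event_time, as in the asker's own attempt.

```python
import sqlite3

# First sample from the question: (time, location).
events = [
    ("10:00", "A111"), ("11:00", "A112"), ("12:00", "S111"), ("13:00", "S234"),
    ("17:00", "A234"), ("18:00", "S747"), ("19:00", "A878"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table data (event_time text, location text)")
conn.executemany("insert into data values (?, ?)", events)

# grp is constant within each consecutive run of the same first letter and
# increases from one run to the next for that letter, so dense_rank over grp
# numbers the runs per letter.
numbered = conn.execute("""
    select event_time, location,
           dense_rank() over (
               partition by substr(location, 1, 1) order by grp
           ) as group_number
    from (
        select event_time, location,
               row_number() over (order by event_time)
               - row_number() over (
                     partition by substr(location, 1, 1) order by event_time
                 ) as grp
        from data
    ) runs
    order by event_time
""").fetchall()

for row in numbered:
    print(row)
```

On this sample the group numbers come out as 1, 1, 1, 1, 2, 2, 3, matching the desired output in the question.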
select DENSE_RANK() over(partition by SUBSTR("location",1,1) ORDER BY SUBSTR("location",1,2))
as Rownumber,
"location" from Table1;
Demo
http://sqlfiddle.com/#!4/21120/16

Get MAX count but keep the repeated calculated value if highest

I have the following table, I am using SQL Server 2008
BayNo FixDateTime FixType
1 04/05/2015 16:15:00 tyre change
1 12/05/2015 00:15:00 oil change
1 12/05/2015 08:15:00 engine tuning
1 04/05/2016 08:11:00 car tuning
2 13/05/2015 19:30:00 puncture
2 14/05/2015 08:00:00 light repair
2 15/05/2015 10:30:00 super op
2 20/05/2015 12:30:00 wiper change
2 12/05/2016 09:30:00 denting
2 12/05/2016 10:30:00 wiper repair
2 12/06/2016 10:30:00 exhaust repair
4 12/05/2016 05:30:00 stereo unlock
4 17/05/2016 15:05:00 door handle repair
On any given day, I need to find the highest number of fixes made in a given bay, and if that highest number is repeated, each occurrence should also appear in the result set.
So I would like to see the result set as follows:
BayNo FixDateTime noOfFixes
1 12/05/2015 00:15:00 2
2 12/05/2016 09:30:00 2
4 12/05/2016 05:30:00 1
4 17/05/2016 15:05:00 1
I managed to get the counts for each, but I am struggling to get the max and keep the repeated highest calculated value. Can someone help please?
Use window functions.
Get the count for each day by bayno and also find the min fixdatetime for each day per bayno.
Then use dense_rank to compute the highest ranked row for each bayno based on the number of fixes.
Finally get the highest ranked rows.
select distinct bayno,minfixdatetime,no_of_fixes
from (
select bayno,minfixdatetime,no_of_fixes
,dense_rank() over(partition by bayno order by no_of_fixes desc) rnk
from (
select t.*,
count(*) over(partition by bayno,cast(fixdatetime as date)) no_of_fixes,
min(fixdatetime) over(partition by bayno,cast(fixdatetime as date)) minfixdatetime
from tablename t
) x
) y
where rnk = 1
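The three steps can be sketched as follows on the question's data. This is an illustration under assumptions: SQLite (3.25+) via Python's sqlite3 rather than SQL Server 2008, a hypothetical table name fixes, timestamps rewritten as ISO strings (reading the originals as dd/mm/yyyy), and SQLite's date() in place of cast(... as date).

```python
import sqlite3

# Question data as (bayno, fixdatetime); the FixType column is omitted.
fixes = [
    (1, "2015-05-04 16:15:00"), (1, "2015-05-12 00:15:00"),
    (1, "2015-05-12 08:15:00"), (1, "2016-05-04 08:11:00"),
    (2, "2015-05-13 19:30:00"), (2, "2015-05-14 08:00:00"),
    (2, "2015-05-15 10:30:00"), (2, "2015-05-20 12:30:00"),
    (2, "2016-05-12 09:30:00"), (2, "2016-05-12 10:30:00"),
    (2, "2016-06-12 10:30:00"),
    (4, "2016-05-12 05:30:00"), (4, "2016-05-17 15:05:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table fixes (bayno integer, fixdatetime text)")
conn.executemany("insert into fixes values (?, ?)", fixes)

top_days = conn.execute("""
    select distinct bayno, minfixdatetime, no_of_fixes
    from (
        -- Step 2: rank each bay's days by their fix count; ties share a rank.
        select bayno, minfixdatetime, no_of_fixes,
               dense_rank() over (
                   partition by bayno order by no_of_fixes desc
               ) as rnk
        from (
            -- Step 1: per bay and day, the fix count and earliest fix time.
            select bayno,
                   count(*) over (partition by bayno, date(fixdatetime)) as no_of_fixes,
                   min(fixdatetime) over (partition by bayno, date(fixdatetime)) as minfixdatetime
            from fixes
        ) per_day
    ) ranked
    where rnk = 1  -- Step 3: keep only the top-ranked days.
    order by bayno, minfixdatetime
""").fetchall()

for row in top_days:
    print(row)
```

Bay 4 has two days tied at one fix each, so both rows survive the rnk = 1 filter, which is the repeated-value behaviour the question asks for.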
You are looking for rank() or dense_rank(). I would write the query like this:
select bayno, thedate, numFixes
from (select bayno, cast(fixdatetime as date) as thedate,
count(*) as numFixes,
rank() over (partition by bayno order by count(*) desc) as seqnum
from t
group by bayno, cast(fixdatetime as date)
) b
where seqnum = 1;
Note that this returns the date in question. The date does not have a time component.

Getting a row with two group by constraints

I have a table
TIMESTAMP ID Name
5/30/2016 11:45 1 Ben
5/30/2016 11:45 2 Ben
5/30/2016 23:15 2 Ben
5/30/2016 7:30 1 Peter
5/30/2016 6:05 1 Peter
5/30/2016 14:40 2 May
5/30/2016 1:05 1 May
Now, I need to get the MIN timestamp for each distinct Name.
Then if there are more than one MIN entry, choose the one with the MAX ID.
So the result should be
TIMESTAMP ID Name
5/30/2016 11:45 2 Ben
5/30/2016 6:05 1 Peter
5/30/2016 1:05 1 May
I tried using the query below:
SELECT MIN(TIMESTAMP),NAME FROM TBLSAMPLE WHERE TIMESTAMP BETWEEN TO_DATE('5/30/2016', 'MM/DD/YYYY' ) AND TO_DATE('5/30/2016', 'MM/DD/YYYY' ) + 1
GROUP BY NAME
and I could get the minimum time. But once I add in MAX(ID), the result returns an entry that does not match any of the rows.
Any help is really appreciated.
You can do this with row_number():
select t.*
from (select t.*,
row_number() over (partition by name order by timestamp asc, id desc) as seqnum
from tblsample t
) t
where seqnum = 1;
Your question doesn't specify a condition on the dates. But if you want to add a where clause, then add it to the subquery.
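A small sketch of this row_number() pattern on the question's rows. It assumes SQLite (3.25+) via Python's sqlite3, stores the timestamps as ISO strings so that text ordering matches time ordering, and uses the hypothetical column name ts in place of TIMESTAMP.

```python
import sqlite3

# Question data as (ts, id, name), timestamps rewritten as ISO strings.
sample = [
    ("2016-05-30 11:45", 1, "Ben"),
    ("2016-05-30 11:45", 2, "Ben"),
    ("2016-05-30 23:15", 2, "Ben"),
    ("2016-05-30 07:30", 1, "Peter"),
    ("2016-05-30 06:05", 1, "Peter"),
    ("2016-05-30 14:40", 2, "May"),
    ("2016-05-30 01:05", 1, "May"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table tblsample (ts text, id integer, name text)")
conn.executemany("insert into tblsample values (?, ?, ?)", sample)

# Within each name, the earliest timestamp sorts first and, among equal
# timestamps, the highest id sorts first; row number 1 is the wanted row.
picked = conn.execute("""
    select ts, id, name
    from (
        select t.*,
               row_number() over (
                   partition by name order by ts asc, id desc
               ) as seqnum
        from tblsample t
    ) ranked
    where seqnum = 1
    order by name
""").fetchall()

for row in picked:
    print(row)
```

Ben has two rows at 11:45, and the id desc tie-breaker selects the one with ID 2, as in the expected result.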

Compare two rows in SQL Server and return only one row

I have a table (trips) that has response data with columns:
TripDate
Job
Address
DispatchDateTime
OnSceneDateTime
Vehicle
Often two vehicles will respond to the same address on the same date, and I need to find the one that was there first.
I've tried this:
SELECT
TripDate,
Job,
Vehicle,
DispatchDateTime,
(SELECT min(OnSceneDateTime)
FROM Trips AS FirstOnScene
WHERE AllTrips.TripDate = FirstOnScene.TripDate
AND AllTrips.Address = FirstOnScene.Address) AS FirstOnScene
FROM
Trips AS AllTrips
But I still get both records returned, and both have the same FirstOnScene time.
How do I get only THE record, with its DispatchDateTime and OnSceneDateTime, and not the row of the trip that was on scene second?
Here are a few example rows from the table:
2016-01-01 0169-a 150 Main St 2016-01-01 16:52 2016-01-01 16:59 Truck 1
2016-01-01 0171-a 150 Main St 2016-01-01 16:53 2016-01-01 17:05 Truck 2
2016-01-01 0190-a 29 Spring St 2016-01-01 17:19 2016-01-01 17:30 Truck 5
2016-01-02 0111-a 8 Fist St 2016-01-02 09:30 2016-01-02 09:40 Truck 1
2016-01-02 0112-a 8 Fist St 2016-01-02 09:32 2016-01-02 09:38 Truck 2
In the above examples I need to return the first, third, and last row of that data set.
Here is a total shot in the dark based on the sparse information provided. I don't really know what defines a given incident so you can adjust the partition accordingly.
with sortedValues as
(
select TripDate
, Job
, Vehicle
, OnSceneDateTime
, ROW_NUMBER() over(partition by TripDate, Address order by OnSceneDateTime) as RowNum
from Trips
)
select TripDate
, Job
, Vehicle
, OnSceneDateTime
from sortedValues
where RowNum = 1
You can just filter the rows down by selecting only the MIN OnSceneDateTime like below:
SELECT TripDate, Job, Vehicle, DispatchDateTime,OnSceneDateTime FirstOnScene
FROM Trips as AllTrips
WHERE AllTrips.OnSceneDateTime = (SELECT MIN(OnSceneDateTime)
FROM Trips as FirstOnScene
WHERE AllTrips.TripDate = FirstOnScene.TripDate
and AllTrips.Address = FirstOnScene.Address
)
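The correlated-subquery filter can be checked against the question's example rows with the sketch below. It assumes SQLite via Python's sqlite3 in place of SQL Server and treats the combination of TripDate and Address as what identifies an incident, as the query above does.

```python
import sqlite3

# Example rows from the question:
# (TripDate, Job, Address, DispatchDateTime, OnSceneDateTime, Vehicle)
trips = [
    ("2016-01-01", "0169-a", "150 Main St", "2016-01-01 16:52", "2016-01-01 16:59", "Truck 1"),
    ("2016-01-01", "0171-a", "150 Main St", "2016-01-01 16:53", "2016-01-01 17:05", "Truck 2"),
    ("2016-01-01", "0190-a", "29 Spring St", "2016-01-01 17:19", "2016-01-01 17:30", "Truck 5"),
    ("2016-01-02", "0111-a", "8 Fist St", "2016-01-02 09:30", "2016-01-02 09:40", "Truck 1"),
    ("2016-01-02", "0112-a", "8 Fist St", "2016-01-02 09:32", "2016-01-02 09:38", "Truck 2"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table Trips (
        TripDate text, Job text, Address text,
        DispatchDateTime text, OnSceneDateTime text, Vehicle text
    )
""")
conn.executemany("insert into Trips values (?, ?, ?, ?, ?, ?)", trips)

# Keep a row only if its on-scene time equals the earliest on-scene time
# among all trips to the same address on the same date.
first_on_scene = conn.execute("""
    select TripDate, Job, Vehicle, DispatchDateTime, OnSceneDateTime
    from Trips as AllTrips
    where AllTrips.OnSceneDateTime = (
        select min(OnSceneDateTime)
        from Trips as FirstOnScene
        where AllTrips.TripDate = FirstOnScene.TripDate
          and AllTrips.Address = FirstOnScene.Address
    )
    order by OnSceneDateTime
""").fetchall()

for row in first_on_scene:
    print(row)
```

One property of this form worth knowing: if two vehicles share the exact same earliest on-scene time, both rows are returned, whereas a row_number() filter would keep only one of them.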
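How about using an ORDER BY on OnSceneDateTime and then LIMIT 1? Note that this returns a single row overall rather than one per address, and that SQL Server spells it TOP 1 instead of LIMIT 1. A simplified version looks like this: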
SELECT TripDate, Job, Vehicle, DispatchDateTime, OnSceneDateTime FROM trips ORDER BY OnSceneDateTime LIMIT 1