Grouping sets of data in Oracle SQL

I have been trying to separate groups in the data stored in my Oracle database for more accurate analysis.
Current Output
Time Location
10:00 A111
11:00 A112
12:00 S111
13:00 S234
17:00 A234
18:00 S747
19:00 A878
Desired Output
Time Location Group Number
10:00 A111 1
11:00 A112 1
12:00 S111 1
13:00 S234 1
17:00 A234 2
18:00 S747 2
19:00 A878 3
I have been trying to use OVER and PARTITION BY to assign the values, but I can only get the number to increment on every row, not only when the value changes. I also tried using LAG but struggled to make use of it.
I only need the group number to start from 1 and increment when the first letter of the location changes (using SUBSTR).
This is my attempt using ROW_NUMBER, but I think I am far off. There would be a time column in the output as well.
select event_time, st_location,
       Row_Number() over (partition by SUBSTR(st_location,1,1)
                          order by event_time) as groupnumber
from pic
Any help would be really appreciated!
Edit:
Time Location Group Number
10:00 A-10112 1
11:00 A-10421 1
12:00 ST-10621 1
13:00 ST-23412 1
17:00 A-19112 2
18:00 ST-74712 2
19:00 A-87812 3

This is a gaps-and-islands problem. Use the following query:
select time, location,
       dense_rank() over (partition by SUBSTR(location,1,1) order by grp) as group_number
from
(
  select (row_number() over (order by time)) -
         (row_number() over (partition by SUBSTR(location,1,1) order by time)) grp,
         location,
         time
  from data
) t
order by time
The key idea is in the subquery, which isolates consecutive runs of rows that share the same first letter (the grp column). Once you have grp, the rest is simple: DENSE_RANK numbers those runs within each letter.
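A minimal sketch, assuming Python's built-in sqlite3 module with SQLite 3.25 or later (the first release with window functions), runs the same query on the sample rows from the question. SQLite stands in for Oracle here and the column types are assumptions; the query logic is the answer's.

```python
import sqlite3

# Sample rows from the question: (time, location).
ROWS = [
    ("10:00", "A111"), ("11:00", "A112"), ("12:00", "S111"),
    ("13:00", "S234"), ("17:00", "A234"), ("18:00", "S747"),
    ("19:00", "A878"),
]

QUERY = """
select time, location,
       dense_rank() over (partition by substr(location, 1, 1) order by grp) as group_number
from (
    -- grp stays constant within a run of consecutive rows sharing the first letter
    select row_number() over (order by time)
         - row_number() over (partition by substr(location, 1, 1) order by time) as grp,
           location,
           time
    from data
) t
order by time
"""


def group_numbers(rows):
    """Load rows into an in-memory table and return (time, location, group_number)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table data (time text, location text)")
        con.executemany("insert into data values (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in group_numbers(ROWS):
    print(row)
```

The group numbers come out as 1, 1, 1, 1, 2, 2, 3, matching the desired output. Each letter's runs are numbered separately, which is why the S rows start again at 1.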

select DENSE_RANK() over (partition by SUBSTR("location",1,1) ORDER BY SUBSTR("location",1,2)) as Rownumber,
       "location"
from Table1;
Demo: http://sqlfiddle.com/#!4/21120/16

Related

How to group consecutive rows together in SQL by multiple columns

I have a query that returns rows like these:
Date User Time Location Service Count
1/1/2018 Nick 12:00 Location A X 1
1/1/2018 Nick 12:01 Location A Y 1
1/1/2018 John 12:02 Location B Z 1
1/1/2018 Harry 12:03 Location A X 1
1/1/2018 Harry 12:04 Location A X 1
1/1/2018 Harry 12:05 Location B Y 1
1/1/2018 Harry 12:06 Location B X 1
1/1/2018 Nick 12:07 Location A X 1
1/1/2018 Nick 12:08 Location A Y 1
The query returns the locations visited by a user and a count of picks made at each location. Results are sorted by user and time, ascending. I need to group the data so that CONSECUTIVE rows with the same User and Location are combined into one row, with a SUM of the Count column and a comma-separated list of the unique values in the Service column. The final result should look something like this:
Date User Start Time End Time Location Service Count
1/1/2018 Nick 12:00 12:01 Location A X,Y 2
1/1/2018 John 12:02 12:02 Location B Z 1
1/1/2018 Harry 12:03 12:04 Location A X 2
1/1/2018 Harry 12:05 12:06 Location B X,Y 2
1/1/2018 Nick 12:07 12:08 Location A X,Y 2
I'm not sure where to start. Maybe LAG or PARTITION BY clauses? Hoping an SQL guru can help here.
This is a gaps and islands problem. One method for solving it uses row_number():
select Date, User, min(Time) as start_time, max(Time) as end_time,
       Location,
       listagg(Service, ',') within group (order by Service) as services,
       count(*) as cnt
from (select t.*,
             row_number() over (partition by Date order by Time) as seqnum,
             row_number() over (partition by User, Date, Location order by Time) as seqnum_2
      from t
     ) t
group by Date, User, Location, (seqnum - seqnum_2);
It is a bit tricky to explain how this works. I suggest running the subquery on its own: you will see how the difference between the two row numbers defines the groups you are looking for.
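This sketch follows that suggestion. It prints the subquery next to the grouped result, using Python's sqlite3 module as a stand-in database (SQLite 3.25+ assumed). The table and column names (picks, day, usr, tm, cnt) are hypothetical and were chosen to avoid keywords. Dates are rewritten as ISO strings, and SQLite's group_concat(distinct ...) replaces listagg, so the order of the service names is not guaranteed. Summing cnt instead of using count(*) gives the SUM of Count that the question asks for.

```python
import sqlite3

# The question's rows as (day, usr, tm, location, service, cnt); names are hypothetical.
ROWS = [
    ("2018-01-01", "Nick", "12:00", "Location A", "X", 1),
    ("2018-01-01", "Nick", "12:01", "Location A", "Y", 1),
    ("2018-01-01", "John", "12:02", "Location B", "Z", 1),
    ("2018-01-01", "Harry", "12:03", "Location A", "X", 1),
    ("2018-01-01", "Harry", "12:04", "Location A", "X", 1),
    ("2018-01-01", "Harry", "12:05", "Location B", "Y", 1),
    ("2018-01-01", "Harry", "12:06", "Location B", "X", 1),
    ("2018-01-01", "Nick", "12:07", "Location A", "X", 1),
    ("2018-01-01", "Nick", "12:08", "Location A", "Y", 1),
]

# seqnum numbers all rows of the day; seqnum_2 numbers rows per (user, location).
# Their difference stays constant while the same user stays at the same location.
SUBQUERY = """
select t.*,
       row_number() over (partition by day order by tm) as seqnum,
       row_number() over (partition by day, usr, location order by tm) as seqnum_2
from picks t
"""

QUERY = f"""
select day, usr, min(tm) as start_time, max(tm) as end_time, location,
       group_concat(distinct service) as services,
       sum(cnt) as total
from ({SUBQUERY}) s
group by day, usr, location, seqnum - seqnum_2
order by start_time
"""


def connect():
    """Create an in-memory database holding the sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute("create table picks (day text, usr text, tm text,"
                " location text, service text, cnt integer)")
    con.executemany("insert into picks values (?, ?, ?, ?, ?, ?)", ROWS)
    return con


con = connect()
for row in con.execute(SUBQUERY + " order by tm"):
    print(row)  # original columns plus seqnum, seqnum_2
for row in con.execute(QUERY):
    print(row)  # one row per island
con.close()
```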
Use LAG to get the user and location of the previous row. Then use a running sum to start a new group whenever the user or the location changes. Finally, aggregate on the group, user, location and date.
select Date, User, min(Time) as start_time, max(Time) as end_time, Location,
       listagg(Service, ',') within group (order by Service) as services,
       count(*) as cnt
from (select Date, User, Time, Location, Service,
             sum(case when prev_location = Location and prev_user = User then 0 else 1 end)
               over (order by Date, Time) as grp
      from (select Date, User, Time, Location, Service,
                   lag(Location) over (order by Date, Time) as prev_location,
                   lag(User) over (order by Date, Time) as prev_user
            from t
           ) t
     ) t
group by Date, User, Location, grp;
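Here is a sketch of the LAG-and-running-sum version. It runs in SQLite through Python's sqlite3 module (3.25+ assumed), with hypothetical renamed columns (day, usr, tm, cnt), ISO dates, and group_concat standing in for listagg.

```python
import sqlite3

# The question's rows as (day, usr, tm, location, service, cnt); names are hypothetical.
ROWS = [
    ("2018-01-01", "Nick", "12:00", "Location A", "X", 1),
    ("2018-01-01", "Nick", "12:01", "Location A", "Y", 1),
    ("2018-01-01", "John", "12:02", "Location B", "Z", 1),
    ("2018-01-01", "Harry", "12:03", "Location A", "X", 1),
    ("2018-01-01", "Harry", "12:04", "Location A", "X", 1),
    ("2018-01-01", "Harry", "12:05", "Location B", "Y", 1),
    ("2018-01-01", "Harry", "12:06", "Location B", "X", 1),
    ("2018-01-01", "Nick", "12:07", "Location A", "X", 1),
    ("2018-01-01", "Nick", "12:08", "Location A", "Y", 1),
]

QUERY = """
select day, usr, min(tm) as start_time, max(tm) as end_time, location,
       group_concat(distinct service) as services,
       sum(cnt) as total
from (
    -- running sum: add 1 whenever user or location differs from the previous row
    select day, usr, tm, location, service, cnt,
           sum(case when prev_location = location and prev_usr = usr then 0 else 1 end)
               over (order by day, tm) as grp
    from (
        select day, usr, tm, location, service, cnt,
               lag(location) over (order by day, tm) as prev_location,
               lag(usr) over (order by day, tm) as prev_usr
        from picks
    ) a
) b
group by day, usr, location, grp
order by start_time
"""


def islands(rows):
    """Return one row per run of consecutive rows with the same user and location."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table picks (day text, usr text, tm text,"
                    " location text, service text, cnt integer)")
        con.executemany("insert into picks values (?, ?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in islands(ROWS):
    print(row)
```

On the first row, LAG returns NULL, so the equality test is not true and the CASE yields 1. That means the first row opens group 1.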

I need to calculate the time between dates in different rows (PL/SQL)

I have a table where I store all status changes and the time each change was made. When I search for an order number in this table, I get the dates of all its changes, but what I really want is the time (hours/minutes) that the order spent in each status.
The table looks like this:
ID_ORDER | Status | Date
1 | Waiting | 27/09/2017 12:00:00
1 | Late | 27/09/2017 14:00:00
1 | In progress | 28/09/2017 08:00:00
1 | Validating | 30/09/2017 14:00:00
1 | Completed | 30/09/2017 14:00:00
Thanks!
Use lead():
select t.*,
(lead(date) over (partition by id_order order by date) - date) as time_in_order
from t;
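In Oracle, subtracting one DATE from another gives a number of days, so multiplying the difference by 24 gives hours (and by 24 * 60 gives minutes). Below is a minimal sketch of the same LEAD idea in SQLite through Python's sqlite3 module (3.25+ assumed), where julianday() does the day arithmetic. The table name status_log, the column name dt, and the rowid tie-breaker are assumptions. The tie-breaker is needed because Validating and Completed share a timestamp, and it relies on insertion order.

```python
import sqlite3

# Status history from the question, with dates rewritten in ISO format.
ROWS = [
    (1, "Waiting", "2017-09-27 12:00:00"),
    (1, "Late", "2017-09-27 14:00:00"),
    (1, "In progress", "2017-09-28 08:00:00"),
    (1, "Validating", "2017-09-30 14:00:00"),
    (1, "Completed", "2017-09-30 14:00:00"),
]

QUERY = """
select id_order, status, dt,
       -- julianday() differences are in days; * 24 converts them to hours
       round((julianday(next_dt) - julianday(dt)) * 24, 2) as hours_in_status
from (
    select id_order, status, dt, rowid as seq,
           -- rowid breaks ties between rows with the same timestamp (insertion order)
           lead(dt) over (partition by id_order order by dt, rowid) as next_dt
    from status_log
) s
order by id_order, dt, seq
"""


def hours_in_status(rows):
    """Return (id_order, status, dt, hours until the next status change)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table status_log (id_order integer, status text, dt text)")
        con.executemany("insert into status_log values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in hours_in_status(ROWS):
    print(row)
```

The last status of each order has no following row, so LEAD returns NULL and the duration is NULL too.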

Giving a common value to groups of consecutive hours in SQL

I am using Netezza.
Let's say I have a table with two fields: one field is a timestamp corresponding to every hour in the day, the other is an indicator for whether or not a patient took an antacid during the hour. The table looks as follows:
Timestamp Antacid?
11/23/2016 08:00 1
11/23/2016 09:00 1
11/23/2016 10:00 1
11/23/2016 11:00 0
11/23/2016 12:00 0
11/23/2016 13:00 1
11/23/2016 14:00 1
11/23/2016 15:00 0
Is there a way to assign a common partition value to each set of consecutive hour intervals? Something like this...
Timestamp Antacid? Group
11/23/2016 08:00 1 1
11/23/2016 09:00 1 1
11/23/2016 10:00 1 1
11/23/2016 11:00 0 NULL
11/23/2016 12:00 0 NULL
11/23/2016 13:00 1 2
11/23/2016 14:00 1 2
11/23/2016 15:00 0 NULL
I would ultimately like to figure out the start date and end date for all consecutive hours of antacid usage (so the start and end dates for the first group would be 11/23/2016 08:00 and 11/23/2016 10:00 respectively, and the start/end dates for the second group would be 11/23/2016 13:00 and 11/23/2016 14:00, respectively). I have done this before with consecutive days using extract(epoch from date - row_number()) but I'm not sure how to handle hours.
I assume this has to be done for each patient (id in the query below). You can use:
select id,antacid,min(dt) startdate,max(dt) enddate from (
select t.*,
-row_number() over(partition by id,antacid order by dt)
+ row_number() over(partition by id order by dt) grp
from t
) x
where antacid = 1
group by id,antacid,grp
order by 1,3
The inner query gives you the consecutive groups of 0s and 1s in antacid for a given patient id. Because you only need the start and end dates for antacid = 1, you can filter with a WHERE clause.
Add partition by date if this has to be done for each day.
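Here is a sketch of the same query on the question's sample data. It runs in SQLite via Python's sqlite3 module (3.25+ assumed) rather than Netezza. The patient id 1, the column name dt, and the ISO timestamps are assumptions.

```python
import sqlite3

# (id, dt, antacid) for one hypothetical patient, id = 1.
ROWS = [
    (1, "2016-11-23 08:00", 1), (1, "2016-11-23 09:00", 1),
    (1, "2016-11-23 10:00", 1), (1, "2016-11-23 11:00", 0),
    (1, "2016-11-23 12:00", 0), (1, "2016-11-23 13:00", 1),
    (1, "2016-11-23 14:00", 1), (1, "2016-11-23 15:00", 0),
]

QUERY = """
select id, antacid, min(dt) as startdate, max(dt) as enddate
from (
    -- the difference of row numbers is constant within a run of equal antacid values
    select t.*,
           row_number() over (partition by id order by dt)
         - row_number() over (partition by id, antacid order by dt) as grp
    from t
) x
where antacid = 1
group by id, antacid, grp
order by 1, 3
"""


def antacid_runs(rows):
    """Return (id, antacid, startdate, enddate) for each run of antacid = 1."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table t (id integer, dt text, antacid integer)")
        con.executemany("insert into t values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(antacid_runs(ROWS))
```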
Edit: to group rows only when consecutive rows are exactly one hour apart:
select id,antacid,min(dt) startdate,max(dt) enddate from (
select t.*,
--replace dateadd with the equivalent Netezza date arithmetic; the aim is to subtract row_number() hours from dt
dateadd(hour,-row_number() over(partition by id,antacid order by dt),dt) grp
from t
) x
where antacid = 1
group by id,antacid,grp
order by 1,3
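The hour-based variant can be sketched the same way. SQLite has no dateadd, so this sketch assumes that datetime() with an '-N hours' modifier is an adequate stand-in for subtracting row_number() hours. The input below is hypothetical: it is the question's data with the 11:00 and 12:00 rows removed. With that input, the row-number difference alone would merge 08:00 to 14:00 into one run. The shifted timestamp keeps 08:00 to 10:00 and 13:00 to 14:00 apart.

```python
import sqlite3

# Hypothetical input: the question's rows minus 11:00 and 12:00, leaving a gap in the hours.
ROWS = [
    (1, "2016-11-23 08:00", 1), (1, "2016-11-23 09:00", 1),
    (1, "2016-11-23 10:00", 1), (1, "2016-11-23 13:00", 1),
    (1, "2016-11-23 14:00", 1), (1, "2016-11-23 15:00", 0),
]

QUERY = """
select id, min(dt) as startdate, max(dt) as enddate
from (
    -- dt minus rn hours is the same timestamp for every row of an unbroken hourly run
    select id, dt, antacid, datetime(dt, '-' || rn || ' hours') as grp
    from (
        select t.*, row_number() over (partition by id, antacid order by dt) as rn
        from t
    ) r
) x
where antacid = 1
group by id, grp
order by id, startdate
"""


def hourly_runs(rows):
    """Return (id, startdate, enddate) for each unbroken hourly run of antacid = 1."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table t (id integer, dt text, antacid integer)")
        con.executemany("insert into t values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(hourly_runs(ROWS))
```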

Get MAX count but keep the repeated calculated value if highest

I have the following table (I am using SQL Server 2008):
BayNo FixDateTime FixType
1 04/05/2015 16:15:00 tyre change
1 12/05/2015 00:15:00 oil change
1 12/05/2015 08:15:00 engine tuning
1 04/05/2016 08:11:00 car tuning
2 13/05/2015 19:30:00 puncture
2 14/05/2015 08:00:00 light repair
2 15/05/2015 10:30:00 super op
2 20/05/2015 12:30:00 wiper change
2 12/05/2016 09:30:00 denting
2 12/05/2016 10:30:00 wiper repair
2 12/06/2016 10:30:00 exhaust repair
4 12/05/2016 05:30:00 stereo unlock
4 17/05/2016 15:05:00 door handle repair
For each bay number, I need to find the highest number of fixes made on any single day. If that highest count occurs on more than one day, every one of those days should also appear in the result set.
So I would like to see the following result set:
BayNo FixDateTime noOfFixes
1 12/05/2015 00:15:00 2
2 12/05/2016 09:30:00 2
4 12/05/2016 05:30:00 1
4 17/05/2016 15:05:00 1
I managed to get the counts for each day, but I am struggling to get the maximum while keeping every day that ties for the highest value. Can someone help, please?
Use window functions.
Get the count for each day by bayno and also find the min fixdatetime for each day per bayno.
Then use dense_rank to compute the highest ranked row for each bayno based on the number of fixes.
Finally get the highest ranked rows.
select distinct bayno,minfixdatetime,no_of_fixes
from (
select bayno,minfixdatetime,no_of_fixes
,dense_rank() over(partition by bayno order by no_of_fixes desc) rnk
from (
select t.*,
count(*) over(partition by bayno,cast(fixdatetime as date)) no_of_fixes,
min(fixdatetime) over(partition by bayno,cast(fixdatetime as date)) minfixdatetime
from tablename t
) x
) y
where rnk = 1
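Here is a sketch of the same steps against the question's data. It runs in SQLite through Python's sqlite3 module (3.25+ assumed) instead of SQL Server. SQLite's date() stands in for cast(fixdatetime as date), and the table name fixes and the ISO timestamps are assumptions.

```python
import sqlite3

# (BayNo, FixDateTime, FixType) from the question, with dates rewritten in ISO format.
ROWS = [
    (1, "2015-05-04 16:15:00", "tyre change"),
    (1, "2015-05-12 00:15:00", "oil change"),
    (1, "2015-05-12 08:15:00", "engine tuning"),
    (1, "2016-05-04 08:11:00", "car tuning"),
    (2, "2015-05-13 19:30:00", "puncture"),
    (2, "2015-05-14 08:00:00", "light repair"),
    (2, "2015-05-15 10:30:00", "super op"),
    (2, "2015-05-20 12:30:00", "wiper change"),
    (2, "2016-05-12 09:30:00", "denting"),
    (2, "2016-05-12 10:30:00", "wiper repair"),
    (2, "2016-06-12 10:30:00", "exhaust repair"),
    (4, "2016-05-12 05:30:00", "stereo unlock"),
    (4, "2016-05-17 15:05:00", "door handle repair"),
]

QUERY = """
select distinct bayno, minfixdatetime, no_of_fixes
from (
    -- rank each bay's days by their fix count; tied counts share a rank
    select bayno, minfixdatetime, no_of_fixes,
           dense_rank() over (partition by bayno order by no_of_fixes desc) as rnk
    from (
        -- per-row count and earliest fix for that row's bay and calendar day
        select t.*,
               count(*) over (partition by bayno, date(fixdatetime)) as no_of_fixes,
               min(fixdatetime) over (partition by bayno, date(fixdatetime)) as minfixdatetime
        from fixes t
    ) x
) y
where rnk = 1
order by bayno, minfixdatetime
"""


def best_days(rows):
    """Return (bayno, first fix of the day, fixes that day) for each bay's busiest days."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table fixes (bayno integer, fixdatetime text, fixtype text)")
        con.executemany("insert into fixes values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in best_days(ROWS):
    print(row)
```

dense_rank gives tied counts the same rank, which is why bay 4 keeps both of its one-fix days. distinct then collapses the per-row duplicates; for example, bay 1's two fixes on 2015-05-12 would otherwise produce the same output row twice.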
You are looking for rank() or dense_rank(). I would write the query like this:
select bayno, thedate, numFixes
from (select bayno, cast(fixdatetime as date) as thedate,
             count(*) as numFixes,
             rank() over (partition by bayno order by count(*) desc) as seqnum
      from t
      group by bayno, cast(fixdatetime as date)
     ) b
where seqnum = 1;
Note that this returns the date in question. The date does not have a time component.

SQL Time Packing of Islands

I have an SQL table similar to this:
EmpNo StartTime EndTime
------------------------------------------
1 7:00 7:30
1 7:15 7:45
1 13:40 15:00
2 8:00 14:00
2 8:30 9:00
3 10:30 14:30
I've seen a lot of examples where you can find the gaps between everything, and a lot of examples where you can pack overlaps for everything. But I want to be able to separate these out by user.
Sadly, I need a pure SQL solution.
Ultimately, I would like to return:
EmpNo StartTime EndTime
------------------------------------------
1 7:00 7:45
1 13:40 15:00
2 8:00 14:00
3 10:30 14:30
It seems simple enough, but I have spent the last day trying to figure it out and have come up with very little. No column here will ever be NULL, and you can assume there could be duplicates or gaps of 0.
I know this is the classic islands problem, but the solutions I have seen so far don't handle keeping separate IDs grouped very well.
"Pure SQL" would surely support the lag(), lead(), and cumulative sum functions because these are part of the standard. Here is a solution using standard SQL:
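select EmpNo, min(StartTime) as StartTime, max(EndTime) as EndTime
from (select t.*, sum(StartGroup) over (partition by EmpNo order by StartTime) as grp
      from (select t.*,
                   -- compare with the latest EndTime seen so far for this EmpNo, not only the previous
                   -- row's, so an interval nested inside an earlier, longer one stays in its island
                   (case when StartTime <= max(EndTime) over (partition by EmpNo order by StartTime
                                                              rows between unbounded preceding and 1 preceding)
                         then 0
                         else 1
                    end) as StartGroup
            from mytable t  -- replace mytable with your table name
           ) t
     ) t
group by EmpNo, grp;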
If your database doesn't support these, you can implement the same logic using correlated subqueries.
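Here is a minimal sketch of this approach, run in SQLite via Python's sqlite3 module (3.25+ assumed). It compares each StartTime with the running maximum of all earlier EndTime values for the same EmpNo, using a ROWS frame that stops one row before the current one, rather than with only the previous row's EndTime. The table name shifts, the zero-padded times (so that text comparison orders them correctly), and the extra nested row are assumptions for illustration.

```python
import sqlite3

# Sample shifts from the question, zero-padded so text comparison orders times correctly.
ROWS = [
    (1, "07:00", "07:30"), (1, "07:15", "07:45"), (1, "13:40", "15:00"),
    (2, "08:00", "14:00"), (2, "08:30", "09:00"), (3, "10:30", "14:30"),
]
# Hypothetical extra row nested inside EmpNo 2's 08:00-14:00 shift.
NESTED = [(2, "10:00", "11:00")]

QUERY = """
select EmpNo, min(StartTime) as start_time, max(EndTime) as end_time
from (
    -- running count of island starts gives the island number per EmpNo
    select EmpNo, StartTime, EndTime,
           sum(StartGroup) over (partition by EmpNo order by StartTime) as grp
    from (
        -- a row starts a new island if it begins after every earlier interval has ended
        select EmpNo, StartTime, EndTime,
               case when StartTime <= prev_max_end then 0 else 1 end as StartGroup
        from (
            select s.*,
                   max(EndTime) over (partition by EmpNo order by StartTime
                                      rows between unbounded preceding and 1 preceding)
                       as prev_max_end
            from shifts s
        ) a
    ) b
) c
group by EmpNo, grp
order by 1, 2
"""


def pack(rows):
    """Merge overlapping or touching intervals per EmpNo."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table shifts (EmpNo integer, StartTime text, EndTime text)")
        con.executemany("insert into shifts values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in pack(ROWS + NESTED):
    print(row)
```

Comparing only against the previous row's EndTime (a plain LAG) would split the nested 10:00 to 11:00 row into its own island, because the row just before it ends at 09:00.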