SQL Oracle - Find all combinations of events possible based on date

I have a table with the following information:
Data sample (Table 1):
palletNumber |event |recordDate
1            |A     |01/11/2015 01:00
1            |B     |01/11/2015 02:00
1            |C     |01/11/2015 03:00
1            |D     |01/11/2015 04:00
2            |A     |01/11/2015 01:10
2            |C     |01/11/2015 01:15
2            |E     |01/11/2015 01:20
I want to select all the pairs of consecutive events that appear in the table, taking the events in recordDate order within each palletNumber. I tried various statements with ROW_NUMBER() and OVER (PARTITION BY ...), but none of them got me close to what I am looking for. Any direction on where to go?
This would be the output table for example:
Table 2:
event1 |event2
A      |B
B      |C
C      |D
A      |C
C      |E
Thanks,

You can get the previous or next event using lag() or lead():
select event,
lead(event) over (partition by palletnumber order by recorddate) as next_event
from datasample;
If you want to eliminate duplicates, I would be inclined to use group by, because this also gives the ability to count the number of times that each pair appears. The last event of each pallet has no next event, so lead() returns NULL for it; the where clause drops those rows:
select event, next_event, count(*) as cnt
from (select event,
lead(event) over (partition by palletnumber order by recorddate) as next_event
from datasample
) ds
where next_event is not null
group by event, next_event;
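To see the lead() approach run end to end, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database (an assumption made only so the example is runnable; the question targets Oracle, and SQLite needs version 3.25 or later for window functions). The timestamps are rewritten in ISO form so that text ordering matches time ordering, and the table and column names follow the answer above.

```python
import sqlite3

# Sample rows from the question. The dates are rewritten as ISO strings
# (an assumption about the original format) so they sort chronologically.
rows = [
    (1, "A", "2015-11-01 01:00"),
    (1, "B", "2015-11-01 02:00"),
    (1, "C", "2015-11-01 03:00"),
    (1, "D", "2015-11-01 04:00"),
    (2, "A", "2015-11-01 01:10"),
    (2, "C", "2015-11-01 01:15"),
    (2, "E", "2015-11-01 01:20"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasample (palletnumber INTEGER, event TEXT, recorddate TEXT)"
)
conn.executemany("INSERT INTO datasample VALUES (?, ?, ?)", rows)

# Pair each event with the next one on the same pallet, then count each pair.
pairs = conn.execute(
    """
    SELECT event, next_event, COUNT(*) AS cnt
    FROM (SELECT event,
                 LEAD(event) OVER (PARTITION BY palletnumber
                                   ORDER BY recorddate) AS next_event
          FROM datasample) ds
    WHERE next_event IS NOT NULL
    GROUP BY event, next_event
    ORDER BY event, next_event
    """
).fetchall()

for event1, event2, cnt in pairs:
    print(event1, event2, cnt)
```

The five pairs it prints are the same five rows as Table 2 in the question, each with a count of 1.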

Use CASE expressions:
select case when palletNumber = 1 then event else null end as event1,
case when palletNumber = 2 then event else null end as event2,
recordDate
from table1
Then you can work with the data using lead/lag or max() / group by to get it in one row.
Assuming pallets 1 and 2 each have only one record per date:
select recordDate, max (event1), max (event2)
from ( select case when palletNumber = 1 then event else null end as event1,
case when palletNumber = 2 then event else null end as event2,
recordDate
from table1
order by recordDate) tab2
group by recordDate

Related

How to increment a parent group number when the child window has incrementing values?

I am using Spark SQL 3.2.0
Please see the DB Fiddle link for a simplified example of my dataset and desired outcome.
In abstract, I have a dataset with a series of related events that can be grouped by their time order and event number. When ordering by time and event number, every time the event number resets to 1, you're looking at a new set of events.
I understand how to use row_number() or dense_rank() to increment event_group_number where sub_event_number = 1, but I'm uncertain how to make the rows where sub_event_number > 1 take on the correct event_group_number.
I'm currently doing the following:
case
when sub_event_number = 1 and is_event_type
then row_number() over (partition by context_id, event_id, sub_event_number order by is_event_type asc, start_time asc) - 1
else null
end as event_group_number
I'd be grateful for any help, and I'm happy to answer any questions.
It seems you're looking for a cumulative conditional sum:
SELECT context_id,
event_id,
start_time,
NULLIF(
SUM(CASE WHEN sub_event_number = 1 THEN 1 ELSE 0 END) OVER(
PARTITION BY context_id, event_id
ORDER BY is_event_type, start_time) - 1,
0
) AS event_group_number
FROM foobar
ORDER BY context_id, event_id, is_event_type, start_time
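The core of the cumulative conditional sum can be tried on a small made-up dataset. The sketch below is an illustration only: it runs on an in-memory SQLite database rather than Spark SQL, the rows are hypothetical, and it keeps just context_id, start_time and sub_event_number (dropping event_id, is_event_type and the NULLIF adjustment) so that the running count is easy to follow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foobar (context_id TEXT, start_time INTEGER, sub_event_number INTEGER)"
)
# Hypothetical rows: sub_event_number restarts at 1 whenever a new group begins.
conn.executemany(
    "INSERT INTO foobar VALUES (?, ?, ?)",
    [("c1", 1, 1), ("c1", 2, 2), ("c1", 3, 3),
     ("c1", 4, 1), ("c1", 5, 2),
     ("c1", 6, 1)],
)

# Every row with sub_event_number = 1 contributes 1 to a running total,
# so all rows up to the next reset share the same group number.
groups = conn.execute(
    """
    SELECT start_time,
           sub_event_number,
           SUM(CASE WHEN sub_event_number = 1 THEN 1 ELSE 0 END)
               OVER (PARTITION BY context_id ORDER BY start_time) AS event_group_number
    FROM foobar
    ORDER BY start_time
    """
).fetchall()

for row in groups:
    print(row)
```

Rows 1 to 3 land in group 1, rows 4 and 5 in group 2, and row 6 in group 3, which is the behaviour the question asks for on the rows where sub_event_number > 1.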

Create partitions based on column values in sql

I am very new to SQL and query writing, and after a lot of trying I am asking for help.
As shown in the picture, I want to partition the data on is_late = 1 and show its count (that is, 2), while also capturing the value of last_status where is_late = 0, with everything displayed in a single row.
The task is to calculate how many times the rider was late and the time he took from the first occurrence of the estimated time to last_status.
Desired output:
You can use the following query:
SELECT
rider_id,
task_created_time,
expected_time_to_arrive,
is_late,
last_status,
task_count,
CONVERT(VARCHAR(5), DATEADD(MINUTE, DATEDIFF(MINUTE, expected_time_to_arrive, last_status), 0), 114) AS time_delayed
FROM
(SELECT
rider_id,
task_created_time,
expected_time_to_arrive,
is_late,
-- no ORDER BY: aggregate over the rider's whole partition
SUM(CASE WHEN is_late = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY rider_id) AS task_count,
-- order by creation time so that num = 1 is deterministically the rider's first task
ROW_NUMBER() OVER(PARTITION BY rider_id ORDER BY task_created_time) AS num,
MAX(last_status) OVER(PARTITION BY rider_id) AS last_status
FROM myTestTable) t
WHERE num = 1

MSSQL - Delete duplicate rows using common column values

I haven't used SQL in quite a while, so I'm a bit lost here. I want to find rows with duplicate values in the "Duration" and "date" columns and remove them from the query results. Within each set of duplicates I need to keep the row where Status = "Transfer", since these rows hold more information about the call and how it was routed through our system.
I want to use this for a dashboard, which would include counting the total number of calls from that query, which is why I cannot have both.
Here's the (Simplified) code used:
SELECT status, user, duration, phonenumber, date
FROM (SELECT * FROM view_InboundPhoneCalls) as Phonecalls
WHERE date>=DATEADD(dd, -15, getdate())
--GROUP BY duration
Which gives something of the sort:
Status   |User          |Duration |phonenumber                       |date
Received |Receptionnist |00:34:03 |from: +1234567890                 |2021-09-30 16:01:57
Received |Receptionnist |00:03:12 |from: +9876543210                 |2021-09-30 16:02:40
Transfer |User1         |00:05:12 |+14161654965;Receptionnist;User1  |2021-09-30 16:01:57
Received |Receptionnist |00:05:12 |from: +14161654965                |2021-09-30 16:01:57
The end result would be something like this:
Status   |User          |Duration |phonenumber                       |date
Received |Receptionnist |00:34:03 |from: +1234567890                 |2021-09-30 16:01:57
Received |Receptionnist |00:03:12 |from: +9876543210                 |2021-09-30 16:02:40
Transfer |Receptionnist |00:05:12 |+14161654965;Receptionnist;User1  |2021-09-30 16:01:57
The normal "trick" is to detect duplicates first. One of the easier ways is a CTE (Common Table Expression) along with the ROW_NUMBER() function.
Part One - Mark the duplicates
WITH
cte_Sorted_List
(
status, usertype, duration, phonenumber, dated, duplicate_check
)
AS
( -- only use required fields to speed up
-- [user] is bracketed because USER is a reserved word in T-SQL
SELECT status, [user], duration, phonenumber, date,
-- marks depend on correct columns!
Row_Number() OVER
( -- partition only by the columns that define a duplicate;
-- user and phonenumber differ between the Transfer and Received rows
PARTITION BY date, duration
-- with correct sort order
-- bit of a hack: 'Transfer' sorts after 'Received', so DESC puts it first
-- logic: mark records to show as row number 1 in duplicate list
ORDER BY status DESC
) AS duplicate_check
FROM view_InboundPhoneCalls
-- and lose all unnecessary data
WHERE date>=DATEADD(dd, -15, getdate())
)
Part two - show relevant rows
SELECT
status, usertype, duration, phonenumber, dated
FROM
cte_Sorted_List
WHERE
Duplicate_Check = 1
;
The CTE extracts the required fields in a single pass, and the final SELECT then reads only the rows marked 1. The two parts form one statement, so they must be run together.
You could go for a blacklist, say with a CTE, then filter out the undesired rows.
Something like:
WITH Blacklist ([date], [duration]) AS (
SELECT [date], [duration] FROM view_InboundPhoneCalls
GROUP BY [date], [duration]
Having count(*) > 1
)
SELECT Phonecalls.status, Phonecalls.[user], Phonecalls.duration, Phonecalls.phonenumber, Phonecalls.[date]
FROM
(SELECT * FROM view_InboundPhoneCalls) as Phonecalls
LEFT JOIN
Blacklist
ON Phonecalls.[date] = Blacklist.[date]
AND Phonecalls.[duration] = Blacklist.[duration]
Where
Blacklist.[date] is null
Or
(Blacklist.[date] is not null AND Phonecalls.[Status] = 'Transfer')
You can use row-numbering for this, along with a custom ordering. There is no need for any joins.
SELECT status, [user], duration, phonenumber, date
FROM (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY duration, date
ORDER BY CASE WHEN Status = 'Transfer' THEN 1 ELSE 2 END)
FROM view_InboundPhoneCalls
WHERE date >= DATEADD(day, -15, getdate())
) as Phonecalls
WHERE rn = 1
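As a quick check of the row-numbering approach, the sketch below runs the same idea against the four sample rows from the question. It is an illustration under stated assumptions: it uses an in-memory SQLite database instead of SQL Server, a hypothetical table name calls in place of view_InboundPhoneCalls, double-quoted identifiers instead of brackets, and it leaves out the 15-day date filter so the result does not depend on the current date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE calls (status TEXT, "user" TEXT, duration TEXT, '
    'phonenumber TEXT, "date" TEXT)'
)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?, ?)",
    [
        ("Received", "Receptionnist", "00:34:03", "from: +1234567890", "2021-09-30 16:01:57"),
        ("Received", "Receptionnist", "00:03:12", "from: +9876543210", "2021-09-30 16:02:40"),
        ("Transfer", "User1", "00:05:12", "+14161654965;Receptionnist;User1", "2021-09-30 16:01:57"),
        ("Received", "Receptionnist", "00:05:12", "from: +14161654965", "2021-09-30 16:01:57"),
    ],
)

# Number the rows inside each (duration, date) group, Transfer rows first,
# and keep only the first row of every group.
kept = conn.execute(
    """
    SELECT status, "user", duration, phonenumber, "date"
    FROM (SELECT *,
                 ROW_NUMBER() OVER (
                     PARTITION BY duration, "date"
                     ORDER BY CASE WHEN status = 'Transfer' THEN 1 ELSE 2 END
                 ) AS rn
          FROM calls) AS phonecalls
    WHERE rn = 1
    ORDER BY duration
    """
).fetchall()

for row in kept:
    print(row)
```

Three rows survive, and for the 00:05:12 call it is the Transfer row that is kept. Note that the user column of that row is still User1, as in the source data.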

How can I identify start and end of uninterrupted sequences?

I have a list of events sorted by TITLE and TIME e.g.:
TITLE |TIME
A |11:59
A |12:00
A |12:01
A |12:02
A |12:03
B |12:04
B |12:05
B |12:06
B |12:07
B |12:14
B |12:15
B |12:16
I want to calculate START and END of sequences. Sequence is a set of events in which minutes follow each other without gaps for same TITLE, e.g.:
TITLE |START |END
A |11:59 |12:03
B |12:04 |12:07
B |12:14 |12:16
Assuming all the window functions are supported, you can do this with lag and a running sum to assign groups based on a 1 minute time difference.
select title,min(time) as start_time,max(time) as end_time
from (select title,time,sum(col) over(partition by title order by time) as grp
from (select title,time,
case when time - lag(time) over(partition by title order by time) = 1
/*adjust this calculation so it tests for a 1 minute difference in your database*/
then 0 else 1 end as col
from tbl
) t
) t
group by title,grp
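The lag-plus-running-sum technique can be checked against the question's data with the following sketch. It assumes an in-memory SQLite database and stores each time as an integer number of minutes since midnight in a hypothetical column named minute, which sidesteps the database-specific time arithmetic mentioned in the comment above.

```python
import sqlite3


def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def to_hhmm(total):
    """Convert minutes since midnight back to 'HH:MM'."""
    return f"{total // 60:02d}:{total % 60:02d}"


events = [("A", t) for t in ("11:59", "12:00", "12:01", "12:02", "12:03")]
events += [("B", t) for t in ("12:04", "12:05", "12:06", "12:07", "12:14", "12:15", "12:16")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (title TEXT, minute INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)", [(title, to_minutes(t)) for title, t in events]
)

# is_start is 1 on the first row of every sequence (no previous row, or a gap);
# the running sum of is_start then labels each sequence with its own number.
result = conn.execute(
    """
    SELECT title, MIN(minute) AS start_minute, MAX(minute) AS end_minute
    FROM (SELECT title, minute,
                 SUM(is_start) OVER (PARTITION BY title ORDER BY minute) AS grp
          FROM (SELECT title, minute,
                       CASE WHEN minute - LAG(minute) OVER (PARTITION BY title
                                                            ORDER BY minute) = 1
                            THEN 0 ELSE 1 END AS is_start
                FROM tbl) flagged) grouped
    GROUP BY title, grp
    ORDER BY title, start_minute
    """
).fetchall()

islands = [(title, to_hhmm(start), to_hhmm(end)) for title, start, end in result]
for island in islands:
    print(island)
```

This reproduces the three rows of the desired output: A from 11:59 to 12:03, and B from 12:04 to 12:07 and from 12:14 to 12:16.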
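Another way is to subtract row_number() from the time. Within an uninterrupted sequence both values grow by 1 per row, so the difference stays constant and can serve as the group key: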
select title,min(time),max(time)
from (
select title,time,
time-row_number() over(partition by title order by time) as grp
/*change this calculation to subtract row_number from time*/
from tbl
) t
group by title,grp
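To make the constant-difference idea concrete, this sketch prints the intermediate grp value for title B before aggregating. As before, it is only an illustration: it assumes an in-memory SQLite database and a hypothetical integer column minute holding minutes since midnight.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (title TEXT, minute INTEGER)")
# 11:59..12:03 for A; 12:04..12:07 and 12:14..12:16 for B, as minutes since midnight.
a_minutes = range(719, 724)
b_minutes = list(range(724, 728)) + list(range(734, 737))
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("A", m) for m in a_minutes] + [("B", m) for m in b_minutes],
)

keyed_sql = """
    SELECT title, minute,
           minute - ROW_NUMBER() OVER (PARTITION BY title ORDER BY minute) AS grp
    FROM tbl
"""

# The difference stays fixed inside a run and jumps when a gap appears.
b_keys = conn.execute(
    f"SELECT minute, grp FROM ({keyed_sql}) k WHERE title = 'B' ORDER BY minute"
).fetchall()
print(b_keys)

islands = conn.execute(
    f"""
    SELECT title, MIN(minute) AS start_minute, MAX(minute) AS end_minute
    FROM ({keyed_sql}) k
    GROUP BY title, grp
    ORDER BY title, start_minute
    """
).fetchall()
print(islands)
```

For B the key is 723 for the first four rows and 729 for the last three, so grouping by title and grp splits B into its two sequences. Unlike the lag() version, this variant relies on there being no duplicate times within a title.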

SQL Server - Conditionally Increment a Counter

What I'm looking to do is create grouped sequences for continuous date ranges. Take the following sample data:
Person|BeginDate |EndDate
A |1/1/2015 |1/31/2015
A |2/1/2015 |2/28/2015
A |4/1/2015 |4/30/2015
A |5/1/2015 |5/31/2015
B |1/1/2015 |1/30/2015
B |8/1/2015 |8/30/2015
B |9/1/2015 |9/30/2015
If BeginDate in the current row is >1 day from the EndDate in the previous row then increment the counter by 1, otherwise assign the counter's current value. The sequencing would look like :
Person|BeginDate |EndDate |Sequence
A |1/1/2015 |1/31/2015|1
A |2/1/2015 |2/28/2015|1
A |4/1/2015 |4/30/2015|2
A |5/1/2015 |5/31/2015|2
B |1/1/2015 |1/30/2015|1
B |8/1/2015 |8/30/2015|2
B |9/1/2015 |9/30/2015|2
Partitioned and reset for each person.
For your testing :
CREATE TABLE ##SequencingTest(
Person char(1)
,BeginDate date
,EndDate date)
INSERT INTO ##SequencingTest
VALUES
('A','1/1/2015','1/31/2015')
,('A','2/1/2015','2/28/2015')
,('A','4/1/2015','4/30/2015')
,('A','5/1/2015','5/31/2015')
,('B','1/1/2015','1/30/2015')
,('B','8/15/2015','8/31/2015')
,('B','9/1/2015','9/30/2015')
You can do this with lag() and then a cumulative sum:
select t.*,
sum(flag) over (partition by person order by begindate) as sequence
from (select t.*,
(case when datediff(day, lag(endDate) over (partition by person order by begindate), begindate) < 2
then 0
else 1
end) as flag
from t
) t;
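The sketch below runs the same lag() plus cumulative sum logic on the rows from the question's INSERT statement. It is a hedged illustration, not the SQL Server query itself: it uses an in-memory SQLite database, ISO date strings, and julianday() subtraction as a stand-in for datediff(day, ...), which is an assumption of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (person TEXT, begindate TEXT, enddate TEXT)")
# Rows from the question's INSERT statement, with dates rewritten in ISO form.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("A", "2015-01-01", "2015-01-31"),
        ("A", "2015-02-01", "2015-02-28"),
        ("A", "2015-04-01", "2015-04-30"),
        ("A", "2015-05-01", "2015-05-31"),
        ("B", "2015-01-01", "2015-01-30"),
        ("B", "2015-08-15", "2015-08-31"),
        ("B", "2015-09-01", "2015-09-30"),
    ],
)

# flag is 1 when the row starts a new range: either there is no previous row
# (the comparison with NULL falls through to ELSE) or the gap is 2+ days.
rows = conn.execute(
    """
    SELECT person, begindate, enddate,
           SUM(flag) OVER (PARTITION BY person ORDER BY begindate) AS sequence
    FROM (SELECT person, begindate, enddate,
                 CASE WHEN julianday(begindate)
                           - julianday(LAG(enddate) OVER (PARTITION BY person
                                                          ORDER BY begindate)) < 2
                      THEN 0 ELSE 1 END AS flag
          FROM t) flagged
    ORDER BY person, begindate
    """
).fetchall()

sequences = [row[3] for row in rows]
for row in rows:
    print(row)
```

Person A gets sequences 1, 1, 2, 2 and person B gets 1, 2, 2. The first row of each person starts at 1 because lag() returns NULL there, so the CASE expression takes its ELSE branch.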
If the continuous end dates are always 1 day before the next start date you could do something really primitive like this:
SELECT S1.Person, S1.BeginDate, S1.EndDate, SUM(S2.Cntr) AS Sequence
FROM Sequencing S1
INNER JOIN (SELECT Person, BeginDate,
CASE WHEN EXISTS (SELECT Person FROM Sequencing S2 WHERE S2.[EndDate] =
DATEADD(d, -1, S1.[BeginDate]) AND S2.Person = S1.Person) THEN 0 ELSE 1 END AS Cntr
FROM [Sequencing] S1
) S2
ON S1.Person = S2.Person
AND S1.BeginDate >= S2.BeginDate
GROUP BY S1.Person, S1.BeginDate, S1.EndDate
ORDER BY S1.Person, S1.BeginDate, S1.EndDate
Note I think you meant to say '1/31/2015' and '8/31/2015' as end dates to work with your example.
Also, @GordonLinoff's answer is probably better. I simply do not have the version of SQL Server to test it with.