USING RECURSION IN SQL TO FIND THE LOCATION OF AN EVENT - sql

I am trying to determine the location of an event from 2 tables using SQL. I believe that recursion is necessary for this solution, and I'm a bit foggy on that. Any help you can provide would be appreciated.
The first table is the movement table. It shows when and where a person moved. I also included a movement rank, although I'm not sure it's necessary for a solution. A person may move back and forth between buildings.
The second table is the event table. It shows a person and a time of that person's event.
What I want to be able to do is find, for each event in the event_table, the location of the person. The solution would look something like this:
Thank you so much for your time and consideration.

This does it. This leaves NULL in the column if there is no movement before that time.
SELECT et.*, (
SELECT to_location
FROM movement_table
WHERE person_id=et.person_id
AND movement_time < et.event_time
ORDER BY movement_time desc
LIMIT 1) AS event_location
FROM event_table et;
Output (from sqlite):
10|2021-07-20 11:30:00|
20|2021-06-29 10:29:00|
20|2021-07-02 04:30:00|BUILDING A
20|2021-07-04 15:46:00|BUILDING B
40|2021-07-07 23:59:00|BUILDING C
50|2021-07-13 23:05:00|BUILDING D
50|2021-07-17 09:37:00|BUILDING D

If I understand correctly, the location for an event is the TO_LOCATION column of the latest movement before the event in movement_table. So you need to join the tables and pick, for the same person, the row whose MOVEMENT_TIME is the max(MOVEMENT_TIME) that is < (or <=) EVENT_TIME. I cannot give exact SQL, but it will be something like this (events with no earlier movement are dropped by this join):
Select e.person_id, e.event_time, m.to_location
From event_table e, movement_table m
Where m.person_id = e.person_id
And m.movement_time =
(Select max(m1.movement_time)
From movement_table m1
Where m1.person_id = e.person_id
And m1.movement_time < e.event_time)

Basically, a person is in a place for a period of time. You really just need the time period -- to get the end time use lead(). The rest is a join:
select e.*, m.*
from event_table e left join
(select m.*,
lead(movement_time) over (partition by person_id order by movement_time) as next_movement_time
from movement_table m
) m
on e.person_id = m.person_id and
e.event_time >= m.movement_time and
(e.event_time < m.next_movement_time or m.next_movement_time is null);
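To try the lead() idea end to end, here is a minimal sketch that runs the same kind of join in SQLite through Python's sqlite3 module. The sample rows are made up for illustration (the asker's tables are not reproduced here), the table and column names are taken from the other answer's query, and it assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movement_table (person_id INTEGER, movement_time TEXT, to_location TEXT);
CREATE TABLE event_table (person_id INTEGER, event_time TEXT);
-- Hypothetical sample rows, not the asker's data.
INSERT INTO movement_table VALUES
  (20, '2021-07-01 08:00:00', 'BUILDING A'),
  (20, '2021-07-03 09:00:00', 'BUILDING B'),
  (50, '2021-07-10 12:00:00', 'BUILDING D');
INSERT INTO event_table VALUES
  (10, '2021-07-20 11:30:00'),
  (20, '2021-06-29 10:29:00'),
  (20, '2021-07-02 04:30:00'),
  (20, '2021-07-04 15:46:00'),
  (50, '2021-07-13 23:05:00');
""")

# Each movement becomes a half-open interval [movement_time, next_movement_time).
# An event belongs to the interval that contains its timestamp.
query = """
SELECT e.person_id, e.event_time, m.to_location
FROM event_table e
LEFT JOIN (
    SELECT person_id, movement_time, to_location,
           LEAD(movement_time) OVER (
               PARTITION BY person_id ORDER BY movement_time
           ) AS next_movement_time
    FROM movement_table
) m
  ON e.person_id = m.person_id
 AND e.event_time >= m.movement_time
 AND (e.event_time < m.next_movement_time OR m.next_movement_time IS NULL)
ORDER BY e.person_id, e.event_time
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Events that happen before a person's first movement (or for a person with no movements at all) come back with a NULL location because of the left join.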

Related

SQL Event Rows to Columns timestamp

First post! I'm a SQL newbie and am trying to query a huge data set into something manageable.
The data below is for a doctor's office. I have the appointment ID (I don't care about the patient name for this), which can have a few different associated events. I want to show all of those events as timestamp columns, taking the most recent one if there are multiples (the patient rescheduled).
From there, I'll datediff to get the different breakdowns, but I'm not sure how to get there. I've been searching and must be using the wrong terms, so if this has been answered elsewhere, please link me and don't use your time to explain.
Thanks for your help!
Use something resembling the following pattern.
SELECT s1.id "ID",
s2.time "Scheduled",
...
s6.time "Depart Office",
datediff(minute, s3.time, s4.time) "Wait Time"
FROM (SELECT DISTINCT
id
FROM elbat) s1
LEFT JOIN (SELECT id,
time
FROM elbat s2
WHERE event = 'Scheduled') s2
ON s2.id = s1.id
...
LEFT JOIN (SELECT id,
time
FROM elbat
WHERE event = 'Depart Office') s6
ON s6.id = s1.id;
I chose the SQL Server datediff() syntax. You may need to rewrite it for your DBMS. If you have a patient table, you should also replace (SELECT DISTINCT id FROM elbat) with that patient table. You might also need to come up with a method for multiple appointments for a patient. Include e.g. the day in the joins (or any other attribute with which the different appointments can be distinguished).
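For what it's worth, here is a small runnable sketch of that join-per-event pattern, using SQLite through Python's sqlite3 module. It is only an illustration under some assumptions: the rows are invented, the only event names used are the two that appear in the query above, MAX(time) with GROUP BY is my addition to pick the most recent row when an event repeats, and the minutes are computed with strftime('%s', ...) because SQLite has no datediff().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE elbat (id INTEGER, event TEXT, time TEXT);
-- Hypothetical rows: appointment 1 was scheduled twice (rescheduled).
INSERT INTO elbat VALUES
  (1, 'Scheduled',     '2021-03-01 09:00:00'),
  (1, 'Scheduled',     '2021-03-02 10:00:00'),
  (1, 'Depart Office', '2021-03-02 10:45:00'),
  (2, 'Scheduled',     '2021-03-05 14:00:00');
""")

# One derived table per event type; MAX(time) keeps the latest row per id.
query = """
SELECT s1.id,
       s2.time AS scheduled,
       s6.time AS depart_office,
       (CAST(strftime('%s', s6.time) AS INTEGER)
        - CAST(strftime('%s', s2.time) AS INTEGER)) / 60 AS minutes_between
FROM (SELECT DISTINCT id FROM elbat) s1
LEFT JOIN (SELECT id, MAX(time) AS time
           FROM elbat
           WHERE event = 'Scheduled'
           GROUP BY id) s2 ON s2.id = s1.id
LEFT JOIN (SELECT id, MAX(time) AS time
           FROM elbat
           WHERE event = 'Depart Office'
           GROUP BY id) s6 ON s6.id = s1.id
ORDER BY s1.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Appointment 2 has no 'Depart Office' row, so both that column and the computed difference stay NULL.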

Refer to another table and return data adjacent to Max() result

I have the following two tables:
Using SQL Server 2012, I want to know the INTERVAL from the Hourly table where the MaxWaitTime and Split match what comes from the Daily table for each day. I am assuming I need to use a window function here, but I can't figure out the right answer.
There may be times where MaxWaitTime is 0 for an entire day, and thus all rows from the hourly table match. In this scenario, I would prefer a Null answer, but the earliest INTERVAL for that day would be fine.
There will also be times where multiple INTERVALs have the same wait time. In this scenario the first INTERVAL where the MaxWaitTime is present that day should be returned.
You can use outer apply if you want at most one match:
select d.*, h.interval as maxinterval
from daily d outer apply
(select top 1 h.*
from hourly h
where convert(date, h.interval) = d.row_date and
h.split = d.split and
h.maxwaittime = d.maxwaittime
order by h.interval asc
) h;
If you want NULL for multiple matches, you can do something similar:
select d.*, h.maxinterval
from daily d outer apply
(select max(h.interval) as maxinterval
from hourly h
where convert(date, h.interval) = d.row_date and
h.split = d.split and
h.maxwaittime = d.maxwaittime
group by h.maxwaittime, h.split
having count(*) = 1
) h;
Looks like a simple left join should work between the tables. I'm simply going by the data shown above...
The query should look something like this. If the join fails, then a NULL will be returned. Give it a go..
select daily.* ,hourly.callsoffered, hourly.interval as maxinterval
from daily
left join hourly
on convert(date,hourly.interval) = daily.row_date
and hourly.split = daily.split
and hourly.maxwaittime = daily.maxwaittime

Can't Make Crosstab Query on a query containing SubQuery

I have a query that contains a subquery to calculate the interval between departure and arrival time from my table "Timetable".
This query works fine, but when I try to execute it from the crosstab, I get an error that it cannot find table "a", which is the alias I used for "Timetable".
SELECT a.VesselID, a.MovementID, a.MovementTime, (SELECT TOP 1
Timetable.MovementTime
FROM Timetable
WHERE (((Timetable.MovementID)="Arrival") AND
((Timetable.VesselID)=a.[VesselID]) AND ((Timetable.MovementTime)>a.[MovementTime]))
ORDER BY Timetable.MovementTime) AS Arrival1,
DateDiff('h',[a].[MovementTime],[Arrival1]) AS [Interval]
FROM Timetable AS a INNER JOIN Timetable ON a.ID = Timetable.ID
WHERE (((a.MovementID)="Departure"));
I think this question is very similar, and the solution is to split my query as @DHW said, but I couldn't do that.
This is my attempt at splitting:
[Departure_Query]
SELECT Timetable.VesselID, Timetable.MovementTime AS mymov,
Timetable.MovementID
FROM Timetable
WHERE (((Timetable.MovementID)="Departure"));
[Main]
SELECT Timetable.MovementTime, Timetable.MovementID, Timetable.VesselID, Departure_Query.mymov, DateDiff('h',[mymov],[MovementTime]) AS [Interval]
FROM Timetable INNER JOIN Departure_Query ON Timetable.VesselID = Departure_Query.VesselID
WHERE (((Timetable.MovementTime)>[Departure_Query].[mymov]) AND ((Timetable.MovementID)="Arrival") AND ((Timetable.VesselID)=[Departure_Query].[VesselID]))
ORDER BY Timetable.MovementTime;
I think the problem is:
In the working query I could put SELECT TOP 1, but in the split attempt I don't know where to put it.
Update: Actually, right now I want to split it anyway, because when I try to build a report on top of it, Access tells me that it can't do grouping on this field.
Anyway, this is my crosstab attempt:
TRANSFORM DateDiff('h',[a].[MovementTime],[Arrival1]) AS [Interval]
SELECT a.MovementTime
FROM Timetable AS a INNER JOIN Timetable ON a.ID = Timetable.ID
WHERE (((a.MovementID)="Departure"))
GROUP BY a.MovementID, a.MovementTime, (SELECT TOP 1 Timetable.MovementTime
FROM Timetable
WHERE (((Timetable.MovementID)="Arrival") AND ((Timetable.VesselID)=a.[VesselID]) AND ((Timetable.MovementTime)>a.[MovementTime]))
ORDER BY Timetable.MovementTime)
PIVOT a.VesselID;
Consider a crosstab with a domain aggregate, DMin(), to replace the subquery:
TRANSFORM DateDiff('h', main.[MovementTime], main.[Arrival1]) AS [Interval]
SELECT main.MovementID, main.MovementTime
FROM
(SELECT t.VesselID, t.MovementID, t.MovementTime,
DMin("MovementTime", "Timetable", "MovementID = 'Arrival'
AND VesselID = " & t.VesselID & "
AND MovementTime > #" & t.MovementTime & "#") As Arrival1
FROM Timetable AS t
WHERE (((t.MovementID) = 'Departure'))
) As main
GROUP BY main.MovementID, main.MovementTime
PIVOT main.VesselID;
Thank you @Parfait and @June7. I am adding this answer so anyone in the future can benefit from this problem.
The Problem
I figured out the problem: the query was subtracting every earlier departure date for a given vessel from each arrival.
For example, Vessel 1 departed 6/1, 6/3, 6/6 and arrived 6/2, 6/2, 6/8. So for the last day it was computing 6/8-6/6, 6/8-6/3, and 6/8-6/1. Of course, only the first one (6/8-6/6) is the right one.
The Solution
SELECT Min(Timetable.MovementTime) AS MinOfMovementTime, Departure_Query.mymov AS DeptDate, Min(DateDiff('h',[mymov],[MovementTime])) AS WorkingH, Timetable.MovementID, Timetable.VesselID
FROM Timetable LEFT JOIN Departure_Query ON Timetable.VesselID = Departure_Query.VesselID
WHERE (((Timetable.MovementID)="Arrival") AND ((Timetable.VesselID)=[Departure_Query].[VesselID]) AND ((Timetable.MovementTime)>[mymov]))
GROUP BY Departure_Query.mymov, Timetable.MovementID, Timetable.VesselID
ORDER BY Min(Timetable.MovementTime);
The only change here is Min(DateDiff('h',[mymov],[MovementTime])), which returns only the smallest difference, and that corresponds to the latest departure date.

Calculating Dates

I have this problem: list customers with their next scheduled recurring appointment, which is either yearly, monthly, or quarterly.
The tables/columns I have are:
customer
    customer_ID
service
    customer_ID
    service_RecID
Resource
    service_RecID
    Recurrence_RecID
    Date_Time_Start
Recurrence
    Recurrence_RecID
    RecurType
    RecurInterval
    DaysOfWeek
    AbsDayNbr
    SelectInterval
It is modeled such that when the schedule is set up, Date_Time_Start is the date when the first recurring appointment took place. Ex.
Recurrence_RecID = 10
RecurType = m (could be y, or d as well for yearly or daily)
RecurInterval = 6 (if recurType = y, this would mean every 6 years)
Given that the system generates these nightly, how would I write a query to calculate the next scheduled appointment for each customer? I originally thought of using Resource.Date_Time_Start and just cycling through until a variable nextAppointment >= today(), but is it good practice to run loops in SQL?
If any more info is needed, let me know. Thank you very much!
Edit: I will make a sqlfiddle.
I would suggest using a sub-query as opposed to looping. More efficient that way. This may not be exact but something like...
SELECT
*
FROM
(
SELECT
customer.customer_id,
service.service_RecID,
Resource.Date_Time_Start,
Recurrence.Recurrence_RecID,
RecurType,
RecurInterval,
DaysOfWeek,
AbsDayNbr,
SelectInterval,
NextAppointmentDate=
CASE
WHEN RecurType='m' THEN DATEADD(MONTH,RecurInterval,Resource.Date_Time_Start)
WHEN RecurType='y' THEN DATEADD(YEAR,RecurInterval,Resource.Date_Time_Start)
ELSE
NULL
END
FROM
Recurrence
INNER JOIN Resource ON Resource.Recurrence_RecID=Recurrence.Recurrence_RecID
INNER JOIN service ON service.service_RecID=Resource.service_RecID
INNER JOIN customer ON customer.customer_ID=service.customer_ID
)AS X
WHERE
NextAppointmentDate>=GETDATE()
ORDER BY Fields...
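If the first appointment lies far in the past, adding a single interval will not reach today, but the next occurrence can still be computed arithmetically instead of by cycling. Here is a rough Python sketch of that arithmetic, just to show the idea. The function names add_months and next_appointment are made up for this example, the 'm'/'y'/'d' codes follow the RecurType values described in the question, and clipping to the last day of a short month is an assumption about how the scheduler behaves.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Return d shifted by a number of months, clipping the day if needed."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_appointment(start, recur_type, interval, today):
    """First occurrence on or after today, without stepping one at a time."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if start >= today:
        return start
    if recur_type == "d":
        elapsed_days = (today - start).days
        steps = -(-elapsed_days // interval)  # ceiling division
        return start + timedelta(days=steps * interval)
    if recur_type == "m":
        step_months = interval
    elif recur_type == "y":
        step_months = interval * 12
    else:
        raise ValueError("unknown recurrence type: %r" % recur_type)
    # Whole months between the two dates, ignoring the day of month.
    elapsed_months = (today.year - start.year) * 12 + (today.month - start.month)
    steps = elapsed_months // step_months
    candidate = add_months(start, steps * step_months)
    if candidate < today:
        # The occurrence in the current period has already passed.
        candidate = add_months(start, (steps + 1) * step_months)
    return candidate


print(next_appointment(date(2020, 1, 15), "m", 6, date(2021, 7, 20)))
```

The same floor-then-adjust logic could be expressed in SQL with DATEDIFF and DATEADD, but that would need checking against the actual DBMS.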

How to use min() in where/having clause (to avoid subquery) in Hive/SQL

I have a large table of events. Per user, I want to count the occurrences of type A events before the earliest type B event.
I am searching for an elegant query. Hive is used, so I can't do subqueries.
Timestamp Type User
... A X
... A X
... B X
... A X
... A X
... A Y
... A Y
... A Y
... B Y
... A Y
Wanted Result:
User Count_Type_A
X 2
Y 3
I could get the "cut-off" timestamp by doing:
Select User, min(Timestamp)
Where Type=B
Group BY User;
But then how can I use that information inside the next query where I want to do something like:
SELECT User, count(Timestamp)
WHERE Type=A AND Timestamp<min(User.Timestamp_Type_B)
GROUP BY User;
My only idea so far is to determine the cut-off timestamps first, then do a join with all type A events, and then select from the resulting table, but that feels wrong and would look ugly.
I'm also considering the possibility that this is the wrong type of problem/analysis for Hive and that I should consider hand-written map-reduce or pig instead.
Please help me by pointing in the right direction.
First Update:
In response to Cilvic's first comment to this answer, I've adjusted my query to the following based on workarounds suggested in the comments found at https://issues.apache.org/jira/browse/HIVE-556:
SELECT [User], COUNT([Timestamp]) AS [Before_First_B_Count]
FROM [Dataset] main
CROSS JOIN (SELECT [User], min([Timestamp]) [First_B_TS] FROM [Dataset]
WHERE [Type] = 'B'
GROUP BY [User]) sub
WHERE main.[Type] = 'A'
AND (sub.[User] = main.[User])
AND (main.[Timestamp] < sub.[First_B_TS])
GROUP BY main.[User]
Original:
Give this a shot:
SELECT [User], COUNT([Timestamp]) AS [Before_First_B_Count]
FROM [Dataset] main
JOIN (SELECT [User], min([Timestamp]) [First_B_TS] FROM [Dataset]
WHERE [Type] = 'B'
GROUP BY [User]) sub
ON (sub.[User] = main.[User]) AND (main.[Timestamp] < sub.[First_B_TS])
WHERE main.[Type] = 'A'
GROUP BY main.[User]
I did my best to follow hive syntax. Let me know if you have any questions. I would like to know why you wish/need to avoid a subquery.
In general, I +1 coge.soft's solution. Here it is again for your reference:
SELECT [User], COUNT([Timestamp]) AS [Before_First_B_Count]
FROM [Dataset] main
JOIN (SELECT [User], min([Timestamp]) [First_B_TS] FROM [Dataset]
WHERE [Type] = 'B'
GROUP BY [User]) sub
ON (sub.[User] = main.[User]) AND (main.[Timestamp] < sub.[First_B_TS])
WHERE main.[Type] = 'A'
GROUP BY main.[User]
However, a couple things to note:
What happens when there are no B events? Assuming you would want to count all the A events per user in that case an inner join as specified in the solution wouldn't work since there would be no entry for that user in the sub table. You would need to change to a left outer join for that.
The solution also does 2 passes over the data - one to populate the sub table, other to join the sub table with the main table. Depending on your notion of performance and efficiency, there is an alternative where you could do this by a single pass of data. You can distribute the data by user using Hive's distribute by functionality and write a custom reducer that would do your count calculation in your favorite language using Hive's transform functionality.
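To make the first note concrete, here is a small sketch of the left outer join variant, run in SQLite through Python's sqlite3 module rather than in Hive, so treat it as an illustration of the logic only. The column names user_id, event_type and ts are placeholders, the integer timestamps are invented, and user Z (who has no B event) is a hypothetical addition to the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (ts INTEGER, event_type TEXT, user_id TEXT);
-- X and Y follow the ordering in the question; Z is a made-up user with no B.
INSERT INTO dataset VALUES
  (1, 'A', 'X'), (2, 'A', 'X'), (3, 'B', 'X'), (4, 'A', 'X'), (5, 'A', 'X'),
  (1, 'A', 'Y'), (2, 'A', 'Y'), (3, 'A', 'Y'), (4, 'B', 'Y'), (5, 'A', 'Y'),
  (1, 'A', 'Z'), (2, 'A', 'Z');
""")

# LEFT JOIN keeps users without any B event; for them first_b_ts is NULL,
# so every A event is counted.
query = """
SELECT main.user_id, COUNT(main.ts) AS before_first_b_count
FROM dataset main
LEFT JOIN (SELECT user_id, MIN(ts) AS first_b_ts
           FROM dataset
           WHERE event_type = 'B'
           GROUP BY user_id) sub
  ON sub.user_id = main.user_id
WHERE main.event_type = 'A'
  AND (sub.first_b_ts IS NULL OR main.ts < sub.first_b_ts)
GROUP BY main.user_id
ORDER BY main.user_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With an inner join, Z would disappear from the result. Note that a user whose A events all come after the first B still produces no row in either variant.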