How can I include today's after-midnight departures in schedules using GTFS?

I started working with GTFS and immediately ran into a big problem with my SQL query:
SELECT *, ( some columns AS shortcuts )
FROM stop_times
LEFT JOIN trips ON stop_times.trip_id = trips.trip_id
WHERE trips.max_sequence != stop_times.stop_sequence
AND stop_id IN( $incodes )
AND trips.service_id IN ( $service_ids )
AND ( departure_time >= $time )
AND ( trips.end_time >= $time )
AND ( trips.start_time <= $time_plus_3hrs )
GROUP BY t,l,sm
ORDER BY t ASC, l DESC
LIMIT 14
This should show departures from a given stop in the next 3 hours.
It works, but as midnight approaches (e.g. at 23:50) it catches only today's departures. After midnight it catches only the new day's departures, and the previous day's departures are missing, because they are stored relative to the previous service day, with a departure_time such as "24:05" rather than "00:05".
Is it possible to use something lighter than a UNION of the same query for the adjacent day?
If I use UNION, how can I ORDER the departures so that LIMIT trims them correctly?
The trips.start_time and trips.end_time columns are auxiliary columns I added to speed up the query: they hold the arrival_time at the first stop (stop_sequence 1) and the departure_time at the last stop (the maximum stop_sequence) of each trip.
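Helper columns like these can be filled once per trip from stop_times instead of being recomputed in every query. A minimal sketch using Python's built-in sqlite3 module, assuming a stripped-down schema with only the columns named in the question and invented sample rows. It also assumes times are zero-padded 'HH:MM:SS' text, which is what keeps text comparison in time order (GTFS also accepts 'H:MM:SS' before 10:00, which would need padding first).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, stripped-down tables and sample data for illustration only.
conn.executescript("""
CREATE TABLE trips (trip_id TEXT PRIMARY KEY, service_id TEXT,
                    max_sequence INTEGER, start_time TEXT, end_time TEXT);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER,
                         arrival_time TEXT, departure_time TEXT);
INSERT INTO trips (trip_id, service_id) VALUES ('T1', 'WK'), ('T2', 'WK');
INSERT INTO stop_times VALUES
  ('T1', 'A', 1, '23:40:00', '23:41:00'),
  ('T1', 'B', 2, '23:55:00', '23:56:00'),
  ('T1', 'C', 3, '24:10:00', '24:10:00'),
  ('T2', 'A', 1, '08:00:00', '08:00:00'),
  ('T2', 'C', 2, '08:20:00', '08:21:00');
""")

# Fill the helper columns: the last stop_sequence, the arrival time at the
# first stop, and the departure time at the last stop of each trip.
conn.execute("""
UPDATE trips SET
  max_sequence = (SELECT MAX(stop_sequence) FROM stop_times s
                  WHERE s.trip_id = trips.trip_id),
  start_time   = (SELECT arrival_time FROM stop_times s
                  WHERE s.trip_id = trips.trip_id
                  ORDER BY stop_sequence LIMIT 1),
  end_time     = (SELECT departure_time FROM stop_times s
                  WHERE s.trip_id = trips.trip_id
                  ORDER BY stop_sequence DESC LIMIT 1)
""")

rows = conn.execute(
    "SELECT trip_id, max_sequence, start_time, end_time FROM trips ORDER BY trip_id"
).fetchall()
print(rows)  # [('T1', 3, '23:40:00', '24:10:00'), ('T2', 2, '08:00:00', '08:21:00')]
```

Running this once after each feed import keeps the per-request query cheap.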

Using UNION to combine a query for each day is going to be your best bet, unless you want to issue two completely separate queries and merge the results in your application. The contortions required to do all this with a single SELECT statement (assuming it's even possible) would not be worth the effort.
Part of the complexity here is that the set of active service IDs can vary between consecutive days, so a distinct set must be used for each one. (For a suggestion of how to build this set in SQL using a subquery and table join, see my answer to "How do I use calendar exceptions to generate accurate schedules using GTFS?".)
More complexity arises from the fact that the results for each day must be treated differently: for the result set to be ordered correctly, we need to subtract twenty-four hours from all of (and only) yesterday's times.
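To make the 24-hour shift concrete: GTFS stop times are counted from the start of the service day (strictly, from "noon minus 12h", which equals midnight except on daylight-saving change days), so a trip on yesterday's service that leaves at 00:05 today is stored as 24:05:00. A minimal Python sketch of the rebasing, with hypothetical helper names:

```python
SECONDS_PER_DAY = 24 * 3600

def gtfs_seconds(hms: str) -> int:
    """Convert a GTFS 'HH:MM:SS' (or 'H:MM:SS') time to seconds; hours may be >= 24."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def on_todays_clock(hms: str, service_day_offset: int) -> int:
    """Rebase a stop time onto today's clock (hypothetical helper).

    service_day_offset is -1 for yesterday's service and 0 for today's, so
    yesterday's times lose exactly 24 hours and today's are unchanged.
    """
    return gtfs_seconds(hms) + service_day_offset * SECONDS_PER_DAY

departures = [
    ("today 00:30", on_todays_clock("00:30:00", 0)),
    ("yesterday 24:05", on_todays_clock("24:05:00", -1)),
]
departures.sort(key=lambda d: d[1])
print(departures)  # [('yesterday 24:05', 300), ('today 00:30', 1800)]
```

Once both days share one clock, a plain ascending sort interleaves them correctly.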
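Try a query like this, following the "pseudo-SQL" in your question and assuming you are using MySQL/MariaDB. The ORDER BY and LIMIT after the last SELECT apply to the combined result of the UNION, which takes care of your second question: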
SELECT *, SUBTIME(departure_time, '24:00:00') AS t, ...
FROM stop_times
LEFT JOIN trips ON stop_times.trip_id = trips.trip_id
WHERE trips.max_sequence != stop_times.stop_sequence
AND stop_id IN ( $incodes )
AND trips.service_id IN ( $yesterdays_service_ids )
AND ( departure_time >= ADDTIME($time, '24:00:00') )
AND ( trips.end_time >= ADDTIME($time, '24:00:00') )
AND ( trips.start_time <= ADDTIME($time_plus_3hrs, '24:00:00') )
GROUP BY t, l, sm
UNION
SELECT *, departure_time AS t, ...
FROM stop_times
LEFT JOIN trips ON stop_times.trip_id = trips.trip_id
WHERE trips.max_sequence != stop_times.stop_sequence
AND stop_id IN ( $incodes )
AND trips.service_id IN ( $todays_service_ids )
AND ( departure_time >= $time )
AND ( trips.end_time >= $time )
AND ( trips.start_time <= $time_plus_3hrs )
GROUP BY t, l, sm
ORDER BY t ASC, l DESC
LIMIT 14
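The ordering behaviour can be checked end to end with Python's built-in sqlite3 module. SQLite has no SUBTIME/ADDTIME, so this sketch assumes times are stored as seconds since the start of the service day instead of GTFS text, keeps only the columns needed to show the ordering, and uses invented service IDs ('SAT' for yesterday, 'SUN' for today). It uses UNION ALL, since the two halves are not expected to overlap and UNION's duplicate elimination would be extra work.

```python
import sqlite3

DAY = 86400
conn = sqlite3.connect(":memory:")
# Hypothetical sample data; times are seconds since the start of the service day.
conn.executescript("""
CREATE TABLE trips (trip_id TEXT PRIMARY KEY, service_id TEXT);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, departure_time INTEGER);
INSERT INTO trips VALUES ('late', 'SAT'), ('early', 'SUN'), ('noon', 'SUN');
INSERT INTO stop_times VALUES
  ('late',  'S1', 86400 + 5 * 60),  -- 24:05:00 on Saturday's service
  ('early', 'S1', 30 * 60),         -- 00:30:00 on Sunday's service
  ('noon',  'S1', 12 * 3600);       -- 12:00:00, outside the window
""")

now = 0                    # 00:00 on Sunday
window_end = now + 3 * 3600

query = """
SELECT st.trip_id, st.departure_time - :day AS t   -- yesterday: shift back 24h
FROM stop_times st JOIN trips tr ON st.trip_id = tr.trip_id
WHERE st.stop_id = :stop AND tr.service_id IN ('SAT')
  AND st.departure_time BETWEEN :now + :day AND :win_end + :day
UNION ALL
SELECT st.trip_id, st.departure_time AS t           -- today: unchanged
FROM stop_times st JOIN trips tr ON st.trip_id = tr.trip_id
WHERE st.stop_id = :stop AND tr.service_id IN ('SUN')
  AND st.departure_time BETWEEN :now AND :win_end
ORDER BY t            -- applies to the combined result
LIMIT 14
"""
rows = conn.execute(query, {"day": DAY, "stop": "S1",
                            "now": now, "win_end": window_end}).fetchall()
print(rows)  # [('late', 300), ('early', 1800)]
```

The 24:05 departure from yesterday's service comes first, ahead of today's 00:30, which is exactly what the shifted t column is for.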


Calculate time span between two specific statuses in the database for each ID

I have a table in the database that records status updates for each of my vehicles. I want to calculate how many days each vehicle spends between two specific statuses, 'Maintenance' and 'Ready'.
My table looks something like this,
and I want the result to look like this, showing only the number of days a vehicle spends in maintenance before becoming ready on a specific day.
The code I've written looks like this:
drop table if exists #temps1
select
VehicleId,
json_value(VehiclesHistoryStatusID.text,'$.en') as VehiclesHistoryStatus,
VehiclesHistory.CreationTime,
datediff(day, VehiclesHistory.CreationTime ,
lead(VehiclesHistory.CreationTime ) over (order by VehiclesHistory.CreationTime ) ) as days,
lag(json_value(VehiclesHistoryStatusID.text,'$.en')) over (order by VehiclesHistory.CreationTime) as PrevStatus,
case
when (lag(json_value(VehiclesHistoryStatusID.text,'$.en')) over (order by VehiclesHistory.CreationTime) <> json_value(VehiclesHistoryStatusID.text,'$.en')) THEN datediff(day, VehiclesHistory.CreationTime , (lag(VehiclesHistory.CreationTime ) over (order by VehiclesHistory.CreationTime ))) else 0 end as testing
into #temps1
from fleet.VehicleHistory VehiclesHistory
left join Fleet.Lookups as VehiclesHistoryStatusID on VehiclesHistoryStatusID.Id = VehiclesHistory.StatusId
where (year(VehiclesHistory.CreationTime) > 2021 and (VehiclesHistory.StatusId = 140 Or VehiclesHistory.StatusId = 144) )
group by VehiclesHistory.VehicleId ,VehiclesHistory.CreationTime , VehiclesHistoryStatusID.text
order by VehicleId desc
drop table if exists #temps2
select * into #temps2 from #temps1 where testing <> 0
select * from #temps2
Try this:
SELECT innerQ.VehichleID,innerQ.CreationDate,innerQ.Status
,SUM(DATEDIFF(DAY,innerQ.PrevMaintenance,innerQ.CreationDate)) AS DayDuration
FROM
(
SELECT t1.VehichleID,t1.CreationDate,t1.Status,
(SELECT top(1) t2.CreationDate FROM dbo.Test t2
WHERE t1.VehichleID=t2.VehichleID
AND t2.CreationDate<t1.CreationDate
AND t2.Status='Maintenance'
ORDER BY t2.CreationDate Desc) AS PrevMaintenance
FROM
dbo.Test t1 WHERE t1.Status='Ready'
) innerQ
WHERE innerQ.PrevMaintenance IS NOT NULL
GROUP BY innerQ.VehichleID,innerQ.CreationDate,innerQ.Status
In this query, the innermost query first finds the most recent 'Maintenance' date before each 'Ready' date (if one exists). The outer query then calculates the time span in days with DATEDIFF and sums the spans for each vehicle and 'Ready' date.
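The same "latest earlier 'Maintenance' for each 'Ready'" pattern can be tried in SQLite through Python's built-in sqlite3 module. This sketch swaps TOP (1) for ORDER BY ... LIMIT 1 and DATEDIFF(DAY, ...) for a julianday() difference, uses an invented VehicleStatus table spelled with the question's VehicleId, and drops the SUM/GROUP BY, which assumes there are no duplicate (vehicle, date, status) rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for illustration.
conn.executescript("""
CREATE TABLE VehicleStatus (VehicleId INTEGER, CreationDate TEXT, Status TEXT);
INSERT INTO VehicleStatus VALUES
  (1, '2022-01-01', 'Maintenance'),
  (1, '2022-01-04', 'Ready'),
  (1, '2022-02-10', 'Maintenance'),
  (1, '2022-02-20', 'Ready'),
  (2, '2022-03-01', 'Ready');   -- no earlier Maintenance, so it is dropped
""")

rows = conn.execute("""
SELECT VehicleId, CreationDate, Status,
       CAST(julianday(CreationDate) - julianday(PrevMaintenance) AS INTEGER) AS DayDuration
FROM (
    SELECT t1.VehicleId, t1.CreationDate, t1.Status,
           (SELECT t2.CreationDate FROM VehicleStatus t2
            WHERE t2.VehicleId = t1.VehicleId
              AND t2.CreationDate < t1.CreationDate
              AND t2.Status = 'Maintenance'
            ORDER BY t2.CreationDate DESC
            LIMIT 1) AS PrevMaintenance   -- latest Maintenance before this Ready
    FROM VehicleStatus t1
    WHERE t1.Status = 'Ready'
) AS innerQ
WHERE PrevMaintenance IS NOT NULL
ORDER BY VehicleId, CreationDate
""").fetchall()
print(rows)  # [(1, '2022-01-04', 'Ready', 3), (1, '2022-02-20', 'Ready', 10)]
```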

Using a date field for matching SQL Query

I'm having a bit of an issue wrapping my head around the logic of this changing dimension. I would like to associate these two tables below. I need to match the Cost - Period fact table to the cost dimension based on the Id and the effective date.
As you can see, if a period's month and year are later than the effective date of its associated Cost dimension row, the period should adopt that row's value. Once a new effective date is entered into the dimension, that value should be used for every period after that date going forward.
EDIT: I apologize for the lack of detail, but the Cost dimension will actually have a unique index value, and the changing fields to match on would be Resource, Project and Cost. I tried to adapt the query you provided to my fields, but I'm getting incorrect output.
FYI, naming convention change: EngagementId is Id, Resource is ConsultantId, and Project is ProjectId.
I've updated the images below, and here is my query:
,_cte(HoursWorked, HoursBilled, Month, Year, EngagementId, ConsultantId, ConsultantName, ProjectId, ProjectName, ProjectRetainer, RoleId, Role, Rate, ConsultantRetainer, Salary, amount, EffectiveDate)
as
(
select sum(t.Duration), 0, Month(t.StartDate), Year(t.StartDate), t.EngagementId, c.ConsultantId, c.ConsultantName, c.ProjectId, c.ProjectName, c.ProjectRetainer, c.RoleId, c.Role, c.Rate, c.ConsultantRetainer,
c.Salary, 0, c.EffectiveDate
from timesheet t
left join Engagement c on t.EngagementId = c.EngagementId and Month(c.EffectiveDate) = Month(t.EndDate) and Year(c.EffectiveDate) = Year(t.EndDate)
group by Month(t.StartDate), Year(t.StartDate), t.EngagementId, c.ConsultantName, c.ConsultantId, c.ProjectId, c.ProjectName, c.ProjectRetainer, c.RoleId, c.Role, c.Rate, c.ConsultantRetainer,
c.Salary, c.EffectiveDate
)
select * from _cte where EffectiveDate is not null
union
select _cte.HoursWorked, _cte.HoursBilled, _cte.Month, _cte.Year, _cte.EngagementId, _cte.ConsultantId, _cte.ConsultantName, _cte.ProjectId, _Cte.ProjectName, _cte.ProjectRetainer, _cte.RoleId, _cte.Role, sub.Rate, _cte.ConsultantRetainer,_cte.Salary, _cte.amount, sub.EffectiveDate
from _cte
outer apply (
select top 1 EffectiveDate, Rate
from Engagement e
where e.ConsultantId = _cte.ConsultantId and e.ProjectId = _cte.ProjectId and e.RoleId = _cte.RoleId
and Month(e.EffectiveDate) < _cte.Month and Year(e.EffectiveDate) < _cte.Year
order by EffectiveDate desc
) sub
where _cte.EffectiveDate is null
Example:
I'm struggling with writing the query that goes along with this. At first I attempted to partition by greatest date. However, when I executed the join I got the highest effective date for every single period (even those prior to the effective date).
Is this something that can be accomplished in a query or should I be focusing on incremental updates of the destination table so that any effective date / time period in the past is left alone?
Any tips would be great!
Thanks,
Channing
Try this one:
; with _CTE as(
select p.* , c.EffectiveDate, c.Cost
from period p
left join CostDimension c on p.id = c.id and p.Month = DATEPART(month, c.EffectiveDate) and p.year = DATEPART (year, EffectiveDate)
)
select * from _CTE Where EffectiveDate is not null
Union
select _CTE.id, _CTE.Month, _CTE.Year, sub.EffectiveDate, sub.Cost
from _CTE
outer apply (select top 1 EffectiveDate, Cost
from CostDimension as cd
where cd.Id = _CTE.id and cd.EffectiveDate < DATETIMEFROMPARTS(_CTE.Year, _CTE.Month, 1, 0, 0, 0, 0)
order by EffectiveDate desc
) sub
where _Cte.EffectiveDate is null
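The rule from the question (each period takes the cost with the latest effective date on or before that period) can also be written as a single correlated lookup. A minimal SQLite sketch through Python's sqlite3 module, with hypothetical Period and CostDimension tables: SQLite has no OUTER APPLY, so a scalar subquery with LIMIT 1 stands in for it, and, as in the answer's first branch, an effective date anywhere inside a month is assumed to apply to that month. Comparing whole dates, as the answer does with DATETIMEFROMPARTS, avoids testing month and year separately (with both Month < and Year < required, a January 2021 effective date would never count for June 2021).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and data for illustration.
conn.executescript("""
CREATE TABLE CostDimension (Id INTEGER, EffectiveDate TEXT, Cost REAL);
CREATE TABLE Period (Id INTEGER, Year INTEGER, Month INTEGER);
INSERT INTO CostDimension VALUES (1, '2021-01-15', 100), (1, '2021-04-01', 120);
INSERT INTO Period VALUES (1, 2020, 12), (1, 2021, 1), (1, 2021, 3),
                          (1, 2021, 4), (1, 2021, 6);
""")

rows = conn.execute("""
WITH p AS (
    SELECT Id, Year, Month,
           printf('%04d-%02d-01', Year, Month) AS PeriodStart
    FROM Period
)
SELECT p.Id, p.Year, p.Month,
       (SELECT c.Cost FROM CostDimension c
        WHERE c.Id = p.Id
          -- any effective date before the first day of the NEXT month counts
          AND c.EffectiveDate < date(p.PeriodStart, '+1 month')
        ORDER BY c.EffectiveDate DESC
        LIMIT 1) AS Cost
FROM p
ORDER BY p.Year, p.Month
""").fetchall()
print(rows)
# [(1, 2020, 12, None), (1, 2021, 1, 100.0), (1, 2021, 3, 100.0),
#  (1, 2021, 4, 120.0), (1, 2021, 6, 120.0)]
```

Periods before the first effective date get NULL, which makes missing dimension rows easy to spot.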

Counting concurrent records based on startdate and enddate columns

The table structure:
StaffingRecords
PersonnelId int
GroupId int
StaffingStartDateTime datetime
StaffingEndDateTime datetime
How can I get a list of staffing records, given a date and a group id that employees belong to, where the count of present employees fell below a threshold, say, 3, at any minute of the day?
The way my brain works, I would call a stored proc repeatedly with each minute of the day, but of course this would be horribly inefficient:
SELECT COUNT(PersonnelId)
FROM DailyRosters
WHERE GroupId=@GroupId
AND StaffingStartTime <= @TimeParam
AND StaffingEndTime > @TimeParam
GROUP BY GroupId
HAVING COUNT(PersonnelId) < 3
Edit: If it helps to refine the question, employees may come and go throughout the day. Personnel may have a staffing record from 0800 - 0815, and another from 1000 - 1045, for example.
Here is a solution where I find all of the distinct start and end times, and then query to see how many other people are clocked in at each of those times. Every time the answer is less than 4, you know you are understaffed at that time, and presumably until the NEXT start time.
with meaningfulDtms(meaningfulTime, timeType, group_id)
as
(
select distinct StaffingStartTime , 'start' as timeType, group_id
from DailyRosters
union
select distinct StaffingEndTime , 'end' as timeType, group_id
from DailyRosters
)
select COUNT(*), meaningfulDtms.group_id, meaningfulDtms.meaningfulTime
from DailyRosters dr
inner join meaningfulDtms on dr.group_id = meaningfulDtms.group_id
and (
(dr.StaffingStartTime < meaningfulDtms.meaningfulTime
and dr.StaffingEndTime >= meaningfulDtms.meaningfulTime
and meaningfulDtms.timeType = 'start')
OR
(dr.StaffingStartTime <= meaningfulDtms.meaningfulTime
and dr.StaffingEndTime > meaningfulDtms.meaningfulTime
and meaningfulDtms.timeType = 'end')
)
group by meaningfulDtms.group_id, meaningfulDtms.meaningfulTime
having COUNT(*) < 4
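Because headcount can only change at a start or end time, checking just those instants is enough, and each result holds until the next boundary. A minimal SQLite variant of that idea (not the answer's exact query), run through Python's sqlite3 module: it uses a single half-open [start, end) rule for every boundary and a LEFT JOIN so that moments with nobody on duty are reported with a count of 0. Times are assumed to be zero-padded 'HH:MM' text, the threshold is the question's 3, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical roster: person 2 works 08:00-08:15 and again 10:00-10:45.
conn.executescript("""
CREATE TABLE DailyRosters (PersonnelId INTEGER, GroupId INTEGER,
                           StaffingStartTime TEXT, StaffingEndTime TEXT);
INSERT INTO DailyRosters VALUES
  (1, 7, '08:00', '12:00'),
  (2, 7, '08:00', '08:15'),
  (2, 7, '10:00', '10:45'),
  (3, 7, '09:00', '12:00');
""")

rows = conn.execute("""
WITH boundaries(GroupId, t) AS (    -- headcount can only change here
    SELECT GroupId, StaffingStartTime FROM DailyRosters
    UNION
    SELECT GroupId, StaffingEndTime FROM DailyRosters
)
SELECT b.GroupId, b.t, COUNT(dr.PersonnelId) AS OnDuty
FROM boundaries b
LEFT JOIN DailyRosters dr          -- LEFT JOIN keeps zero-staff moments
  ON dr.GroupId = b.GroupId
 AND dr.StaffingStartTime <= b.t   -- present on [start, end)
 AND dr.StaffingEndTime > b.t
GROUP BY b.GroupId, b.t
HAVING COUNT(dr.PersonnelId) < 3
ORDER BY b.GroupId, b.t
""").fetchall()
print(rows)
# [(7, '08:00', 2), (7, '08:15', 1), (7, '09:00', 2), (7, '10:45', 2), (7, '12:00', 0)]
```

Only the 10:00 boundary (three people on duty) is missing from the output, so the group is understaffed everywhere except 10:00 to 10:45.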
Create a table with all the minutes in the day, with dt as the PK.
It will have 1,440 rows.
This first query will not give you a count of zero (no staff):
select allMinutes.dt, worktime.grpID, count(distinct(worktime.personID))
from allMinutes
join worktime
on allMinutes.dt >= worktime.start
and allMinutes.dt < worktime.end
group by allMinutes.dt, worktime.grpID
having count(distinct(worktime.personID)) < 3
To include minutes with zero staff, I think the best way is a master table of grpID values,
but I am not sure about this one:
select allMinutes.dt, grpMaster.grpID, count(distinct(worktime.personID))
from grpMaster
cross join allMinutes
left join worktime
on allMinutes.dt >= worktime.start
and allMinutes.dt < worktime.end
and worktime.grpID = grpMaster.grpID
group by allMinutes.dt, grpMaster.grpID
having count(distinct(worktime.personID)) < 3
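The 1,440-row minutes table does not have to be stored; it can be generated on the fly with a recursive CTE. A minimal SQLite sketch of the grpMaster query through Python's sqlite3 module, assuming shift times are stored as integer minutes after midnight (hypothetical start_min/end_min columns) and counting a person as present on [start, end). The roster rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grpMaster (grpID INTEGER PRIMARY KEY);
-- start_min/end_min are minutes after midnight; a shift covers [start_min, end_min)
CREATE TABLE worktime (personID INTEGER, grpID INTEGER,
                       start_min INTEGER, end_min INTEGER);
INSERT INTO grpMaster VALUES (7);
INSERT INTO worktime VALUES
  (1, 7, 480, 720),   -- 08:00-12:00
  (2, 7, 480, 495),   -- 08:00-08:15
  (2, 7, 600, 645),   -- 10:00-10:45
  (3, 7, 540, 720);   -- 09:00-12:00
""")

rows = conn.execute("""
WITH RECURSIVE allMinutes(dt) AS (   -- minutes 0 .. 1439
    SELECT 0
    UNION ALL
    SELECT dt + 1 FROM allMinutes WHERE dt < 1439
)
SELECT allMinutes.dt, grpMaster.grpID,
       COUNT(DISTINCT worktime.personID) AS staff
FROM grpMaster
CROSS JOIN allMinutes
LEFT JOIN worktime
  ON worktime.grpID = grpMaster.grpID
 AND allMinutes.dt >= worktime.start_min
 AND allMinutes.dt < worktime.end_min
GROUP BY allMinutes.dt, grpMaster.grpID
HAVING COUNT(DISTINCT worktime.personID) < 3
ORDER BY grpMaster.grpID, allMinutes.dt
""").fetchall()
print(len(rows))  # 1395: only 10:00-10:44 has three people on duty
```

COUNT(DISTINCT ...) ignores the NULLs produced by the LEFT JOIN, which is why empty minutes come back with a count of 0.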

How do I use calendar exceptions to generate accurate schedules using GTFS?

I'm having trouble figuring out the GTFS query to obtain the next 20 schedules for a given stop ID and a given direction.
I know the stop ID, the trip direction ID, the time (now) and the date (today).
I wrote:
SELECT DISTINCT ST.departure_time FROM stop_times ST
JOIN trips T ON T._id = ST.trip_id
JOIN calendar C ON C._id = T.service_id
JOIN calendar_dates CD on CD.service_id = T.service_id
WHERE ST.stop_id = 3377699724118483
AND T.direction_id = 0
AND ST.departure_time >= '16:00:00'
AND
(
( C.start_date <= 20140607 AND C.end_date >= 20140607 AND C.saturday = 1 ) -- regular service today
AND ( ( CD.date != 20140607 ) -- no exception today
OR ( CD.date = 20140607 AND CD.exception_type = 1 ) -- or ADDED exception today
)
)
ORDER BY ST.departure_time LIMIT 20
This results in no record being found.
If I remove the last part, dealing with the CD table (i.e. the removed or added exceptions), it works perfectly fine.
So I think I'm writing the check on the exceptions incorrectly.
As written in the comments above, I want to check that
today is in a regular service (from checking the calendar table)
there is no removal exception for today (otherwise the trips corresponding to this service ID are not included in the computation)
if there is an added exception for today, the corresponding trips are included in the computation
Can you help me with that?
I'm fairly certain it's not possible to do what you're trying to do with only a single SELECT statement, due to the design of the calendar and calendar_dates tables.
What I do is use a second, inner query to build the set of active service IDs on the requested date, then join the outer query against this set to include only results relevant for that date. Try this:
SELECT DISTINCT ST.departure_time FROM stop_times ST
JOIN trips T ON T._id = ST.trip_id
JOIN (SELECT _id FROM calendar
WHERE start_date <= 20140607
AND end_date >= 20140607
AND saturday = 1
UNION
SELECT service_id FROM calendar_dates
WHERE date = 20140607
AND exception_type = 1
EXCEPT
SELECT service_id FROM calendar_dates
WHERE date = 20140607
AND exception_type = 2
) ASI ON ASI._id = T.service_id
WHERE ST.stop_id = 3377699724118483
AND T.direction_id = 0
AND ST.departure_time >= '16:00:00'
ORDER BY ST.departure_time
LIMIT 20
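The active-service-ID subquery can be wrapped in a small helper and exercised with Python's sqlite3 module. This sketch follows the GTFS column names (service_id, date, exception_type, where 1 means added and 2 means removed) rather than the question's _id, picks the weekday column from the date, and uses invented services. In SQLite, UNION and EXCEPT group from left to right, so removals are subtracted after additions have been merged in.

```python
import sqlite3
from datetime import date

WEEKDAY_COLUMNS = ("monday", "tuesday", "wednesday", "thursday",
                   "friday", "saturday", "sunday")

def active_service_ids(conn, day: date) -> set:
    """Service IDs running on `day`: regular calendar service plus added
    exceptions (type 1), minus removed exceptions (type 2)."""
    ymd = int(day.strftime("%Y%m%d"))
    weekday = WEEKDAY_COLUMNS[day.weekday()]  # from a fixed tuple, safe to inline
    sql = f"""
        SELECT service_id FROM calendar
        WHERE start_date <= :d AND end_date >= :d AND {weekday} = 1
        UNION
        SELECT service_id FROM calendar_dates WHERE date = :d AND exception_type = 1
        EXCEPT
        SELECT service_id FROM calendar_dates WHERE date = :d AND exception_type = 2
    """
    return {row[0] for row in conn.execute(sql, {"d": ymd})}

conn = sqlite3.connect(":memory:")
# Hypothetical feed data for illustration.
conn.executescript("""
CREATE TABLE calendar (service_id TEXT, monday INT, tuesday INT, wednesday INT,
  thursday INT, friday INT, saturday INT, sunday INT, start_date INT, end_date INT);
CREATE TABLE calendar_dates (service_id TEXT, date INT, exception_type INT);
INSERT INTO calendar VALUES
  ('WEEKDAY', 1, 1, 1, 1, 1, 0, 0, 20140101, 20141231),
  ('SAT',     0, 0, 0, 0, 0, 1, 0, 20140101, 20141231),
  ('SAT_ALT', 0, 0, 0, 0, 0, 1, 0, 20140101, 20141231);
INSERT INTO calendar_dates VALUES
  ('SAT_ALT', 20140607, 2),   -- removed on Saturday 7 June 2014
  ('FESTIVAL', 20140607, 1);  -- added on Saturday 7 June 2014
""")

print(sorted(active_service_ids(conn, date(2014, 6, 7))))  # ['FESTIVAL', 'SAT']
```

The resulting set (or the same SQL used as a derived table) is what the outer stop_times query joins against.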

What can I do to speed up this SQL Query?

Here is some detail. I tried to make a SQLFiddle, but I kept getting errors with my variables. This works in SQL Server 2008. My question is: how can I make my query faster? I know I'm doing a number of things wrong here (repeated nested queries), and I'm hoping someone can take a look and help me get this down from its 30-minute execution time! :-S
The basic idea behind the query is that in the game I want to find all players who haven't moved 5 units for a period of time, who have fired whilst standing still, and who did not fire in the 60 minutes before they stopped moving.
The query works, but it's the AND NOT EXISTS clause that is slowing things down to a crawl; before I added it, the query took 16 seconds to run! 16 seconds is still a long time, so any other improvements would be appreciated, but for now, with this being my own POC game (just throwing bits and pieces together), 16 seconds is acceptable...
DECLARE @n INT , @DistanceLimit INT
SELECT @n = 2 , @DistanceLimit = 5;
WITH partitioned
AS ( SELECT * ,
CASE WHEN Distance < @DistanceLimit THEN 1
ELSE 0
END AS PartitionID
FROM EntityStateEvent
WHERE ExerciseID = '8B50D860-6C4E-11E1-8E70-0025648E65EC'
),
sequenced
AS ( SELECT ROW_NUMBER() OVER ( PARTITION BY PlayerID ORDER BY EventTime ) AS MasterSeqID ,
ROW_NUMBER() OVER ( PARTITION BY PlayerID, PartitionID ORDER BY EventTime ) AS PartIDSeqID ,
*
FROM partitioned
),
filter
AS ( SELECT MasterSeqID - PartIDSeqID AS GroupID ,
MIN(MasterSeqID) AS GroupFirstMastSeqID ,
MAX(MasterSeqID) AS GroupFinalMastSeqID ,
PlayerID
FROM sequenced
WHERE PartitionID = 1
GROUP BY PlayerID ,
MasterSeqID - PartIDSeqID
HAVING COUNT(*) >= @n
)
SELECT
DISTINCT ( sequenced.PlayerID ) ,
MIN(sequenced.EventTime) AS StartTime ,
MAX(sequenced.EventTime) AS EndTime ,
DATEDIFF(minute, MIN(sequenced.EventTime),
MAX(sequenced.EventTime)) AS StaticTime ,
Player.Designation AS 'Player'
FROM filter
INNER JOIN sequenced ON sequenced.PlayerID = filter.PlayerID
AND sequenced.MasterSeqID >= filter.GroupFirstMastSeqID
AND sequenced.MasterSeqID <= filter.GroupFinalMastSeqID
INNER JOIN Events ON Events.FiringPlayerID = sequenced.PlayerID
INNER JOIN Player ON Player.PlayerID = sequenced.PlayerID
AND Player.Force = 'FR'
AND NOT EXISTS ( SELECT *
FROM Events
WHERE Events.FiringPlayerID = Player.PlayerID
GROUP BY Events.FiringTime
HAVING Events.FiringTime BETWEEN DATEADD(minute,
-60,
( SELECT
MIN(s.EventTime)
FROM
sequenced s
WHERE
s.PlayerID = filter.PlayerID
AND s.MasterSeqID >= filter.GroupFirstMastSeqID
AND s.MasterSeqID <= filter.GroupFinalMastSeqID
))
AND
( SELECT
MIN(s.EventTime)
FROM
sequenced s
WHERE
s.PlayerID = filter.PlayerID
AND s.MasterSeqID >= filter.GroupFirstMastSeqID
AND s.MasterSeqID <= filter.GroupFinalMastSeqID
) )
INNER JOIN Player HitPlayer ON HitPlayer.PlayerID = Events.HitPlayerID
WHERE HitPlayer.[FORCE] = 'HO'
GROUP BY GroupID ,
sequenced.PlayerID ,
Events.FiringPlayerID ,
Events.FiringTime ,
Player.Designation
HAVING DATEDIFF(minute, MIN(sequenced.EventTime),
MAX(sequenced.EventTime)) > 5
AND Events.FiringTime BETWEEN MIN(sequenced.EventTime)
AND MAX(sequenced.EventTime)
ORDER BY StartTime
The first thing I'd do is materialize the sequenced CTE, since it is referenced four times in the query.
This would mean moving some code around and using #temp tables in place of the chain of CTEs. It should also perform much better (potentially by an order of magnitude), since you can cluster #temp tables and create useful indexes for the JOINs.
See this SQLFiddle that shows that CTEs can be evaluated many times, once for each reference.
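#temp tables and clustered indexes are SQL Server features, but the materialize-once idea can be sketched in SQLite with a TEMP table: the window functions run once when the table is created, and an ordinary index then serves every later reference. A minimal sketch through Python's sqlite3 module with an invented, stripped-down EntityStateEvent table (ROW_NUMBER() needs SQLite 3.25 or later); it hardcodes the distance limit of 5 and computes only the stationary "islands" of at least 2 events, not the full firing logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical movement events; Distance < 5 counts as "not moving".
conn.executescript("""
CREATE TABLE EntityStateEvent (PlayerID INTEGER, EventTime TEXT, Distance REAL);
INSERT INTO EntityStateEvent VALUES
  (1, '10:00', 9), (1, '10:01', 1), (1, '10:02', 2), (1, '10:03', 1),
  (1, '10:04', 8), (1, '10:05', 1),
  (2, '10:00', 1), (2, '10:01', 1);
""")

# Materialize 'sequenced' once; every later query reads the stored rows
# instead of re-evaluating the window functions per reference.
conn.execute("""
CREATE TEMP TABLE sequenced AS
SELECT *,
       ROW_NUMBER() OVER (PARTITION BY PlayerID ORDER BY EventTime) AS MasterSeqID,
       ROW_NUMBER() OVER (PARTITION BY PlayerID, PartitionID ORDER BY EventTime) AS PartIDSeqID
FROM (SELECT *, CASE WHEN Distance < 5 THEN 1 ELSE 0 END AS PartitionID
      FROM EntityStateEvent)
""")
conn.execute("CREATE INDEX ix_sequenced ON sequenced (PlayerID, MasterSeqID)")

MIN_RUN = 2  # plays the role of @n
islands = conn.execute("""
SELECT PlayerID, MIN(EventTime) AS StartTime, MAX(EventTime) AS EndTime
FROM sequenced
WHERE PartitionID = 1
GROUP BY PlayerID, MasterSeqID - PartIDSeqID
HAVING COUNT(*) >= ?
ORDER BY PlayerID, StartTime
""", (MIN_RUN,)).fetchall()
print(islands)  # [(1, '10:01', '10:03'), (2, '10:00', '10:01')]
```

Player 1's lone stationary event at 10:05 is filtered out by the minimum run length; the same temp table can then be joined to the firing events as often as needed without recomputation.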