SQL: Joining sequential events based on time stamps

I have a table containing information about buses driving around a city. Each record represents an event where a bus arrives at a bus stop, with the bus id, stop id, arrival time (military time in seconds), and departure time (military time in seconds). If I can join each event to the subsequent event, then I can compute the time each bus spends driving between stops by subtracting the departure time at stop 1 from the arrival time at stop 2.
But how can I perform this join? How can I easily find the soonest arrival time after a given departure time?
Edit: I am using SQL Server 2012.

Use the LEAD function, which returns values from the subsequent row based on a specified ordering.
select t.*,
       lead(arrival_time)   over (partition by busname order by arrival_time) as next_stop_arrival,
       lead(departure_time) over (partition by busname order by arrival_time) as next_stop_departure
from tablename t
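To get the driving time itself, subtract each stop's departure time from the next stop's arrival time. A minimal sketch wrapping the query above (same placeholder table and column names as the answer; since the times are seconds past midnight, plain subtraction works for same-day trips):

select s.*,
       s.next_stop_arrival - s.departure_time as drive_seconds  -- null for a bus's last stop
from (
    select t.*,
           lead(arrival_time) over (partition by busname order by arrival_time) as next_stop_arrival
    from tablename t
) s;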

Related

Firebird time-sequence data with gaps

I'm using Firebird 2.5 through Delphi 10.2/FireDAC. I have a table of historical data from industrial devices, one row per device at one-minute intervals. I need to be able to retrieve an arbitrary time window of records on an arbitrary interval span (every minute, every 3 minutes, every 15 minutes, etc.), which then gets used to generate a trending grid display or a report. This example query, pulling records every 3 minutes for a two-day window for 2 devices, does most of what I need:
select *
from avc_history
where avcid in (11, 12)
  and tstamp >= '6/26/2022'
  and tstamp < '6/27/2022'
  and mod(extract(minute from tstamp), 3) = 0
order by tstamp, avcid;
The problem with this is that, if device #11 has no rows for part of the time span, there is a mismatch in the number of rows returned for each device. I need to return a row for each device in each minute slot regardless of whether a row with that timestamp actually exists in the table; the other fields of the row can be null as long as the tstamp field has the timestamp and the avcid field has the device's ID.
I can fix this up in code, but that's a memory and time hog, so I want to avoid it if possible. Is there a way to structure a query (or, if necessary, a stored procedure) to do what I need? Or would some capability in the FireDAC library that I'm not aware of make this easier?
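One common approach (a sketch, not from the original thread) is to generate the time slots with a recursive CTE, cross join the device IDs, and left join the history so missing readings come back as null rows. Firebird caps CTE recursion at 1024 levels, so this one-day window at 3-minute steps (480 slots) fits; longer windows would need a helper table or a stored procedure:

with recursive slots as (
    select cast('2022-06-26 00:00:00' as timestamp) as slot
    from rdb$database
    union all
    select dateadd(3 minute to slot)
    from slots
    where dateadd(3 minute to slot) < cast('2022-06-27 00:00:00' as timestamp)
)
select s.slot as tstamp, d.avcid, h.*  /* h columns are null where no reading exists */
from slots s
cross join (select 11 as avcid from rdb$database
            union all
            select 12 from rdb$database) d
left join avc_history h
  on h.avcid = d.avcid
 and h.tstamp = s.slot
order by s.slot, d.avcid;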

Running Total Based on State Change and Date Difference

I want to do a running total based on state and date difference.
I have machines that insert a row of data every few milliseconds providing their current state (Off-line, Stopped, Ready, Active & Error), some machine data, and a timestamp.
These machines can run for a few minutes or a few days, so using a fixed date range doesn't work for the current status duration.
An example of the data is:
RowID, MachineID, Status, TimeStamp
1, Machine1, Active, 27/04/2022 10:00:00.050
I want to pick up the current status, which I do by taking the top 1 entry for the machine ID, ordered by RowID descending.
If the current status is Active, I want to know how long it has been in that state. The machine could have been active for a few minutes or a few days, so I want to perform a date diff from the latest entry back to the first entry where the status changed to Active.
All advice is welcomed, and thanks for reading my post.
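One standard technique for this (a sketch, not from the original thread; it assumes SQL Server 2012+ and a hypothetical table name MachineLog with the columns shown above) is to use LAG to isolate the rows where the status changed, then date-diff from the start of the current run to the machine's latest entry:

with ordered as (
    select MachineID, Status, [TimeStamp], RowID,
           lag(Status) over (partition by MachineID order by RowID) as PrevStatus
    from MachineLog
),
runs as (
    -- keep only rows where the status changed; rn = 1 is the start of the current run
    select MachineID, Status, [TimeStamp],
           row_number() over (partition by MachineID order by RowID desc) as rn
    from ordered
    where PrevStatus is null or PrevStatus <> Status
)
select r.MachineID,
       r.Status,
       r.[TimeStamp] as StatusSince,
       datediff(second, r.[TimeStamp], m.LastEntry) as SecondsInStatus
from runs r
join (select MachineID, max([TimeStamp]) as LastEntry
      from MachineLog
      group by MachineID) m
  on m.MachineID = r.MachineID
where r.rn = 1;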

How can I optimally modify this BigQuery query to retrieve the latest available data

I have the following query. It first runs a sub-select against a table that is partitioned (sharded) by sample_date_time, filtering on a date range in the WHERE clause that is passed in via parameters. The final SELECT then returns the data.
The query currently returns data for the latest complete hour (from the previous hour's starting boundary to the end of that hour). I want to adapt it to instead get the latest hour of data that contains any sample, up to a maximum of approximately 5 hours ago. The query can't use anything that invalidates the BigQuery cache within any given hour (e.g. I can't use a date function that gets the current date). The table data only updates every hour.
I'm thinking maybe I need to select the max sample_date_time in the initial sub-select, over a range of the last 5 hours. I could pass the hourly end boundary of the current time as a parameter, but I'm not seeing how to limit the date range for retrieving the MAX, and then use that max to derive the start and end of the most recent hour that has any data.
WITH data AS (
    SELECT
        created_date_time,
        sample_date_time,
        station,
        channel,
        value
    FROM my.mart
    WHERE sample_date_time BETWEEN '2019-07-23 04:00:00.000000+00:00'
                               AND '2019-07-23 04:59:59.999000+00:00'
      AND station = '[my_guid]'
)
SELECT sample_date_time, station, channel, value
FROM data
ORDER BY value DESC, channel ASC, sample_date_time DESC
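One way to approach this (a sketch, not from the original thread; @max_end is an assumed query parameter holding the hourly boundary the asker mentions passing in) is to take the MAX sample time over the trailing 5-hour window, truncate it to the hour, and then select only that hour:

WITH latest AS (
  -- the most recent hour (within the last 5 hours) that contains any sample
  SELECT TIMESTAMP_TRUNC(MAX(sample_date_time), HOUR) AS hour_start
  FROM my.mart
  WHERE sample_date_time BETWEEN TIMESTAMP_SUB(@max_end, INTERVAL 5 HOUR) AND @max_end
    AND station = '[my_guid]'
)
SELECT m.sample_date_time, m.station, m.channel, m.value
FROM my.mart m, latest
WHERE m.sample_date_time >= latest.hour_start
  AND m.sample_date_time < TIMESTAMP_ADD(latest.hour_start, INTERVAL 1 HOUR)
  AND m.station = '[my_guid]'
ORDER BY m.value DESC, m.channel ASC, m.sample_date_time DESC

Because @max_end is an hourly boundary rather than a call to CURRENT_TIMESTAMP(), the query text stays stable within any given hour, which keeps the BigQuery cache usable.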

Getting random and last value from group in SQL

I have a SQL table containing train schedules. The table looks something like this:
Schedule
--------
TrainNumber
LegID
DepartureTime
DepartureStation
ArrivalTime
ArrivalStation
My real database contains several tables, but for this question only the one above is relevant. Different train numbers can have different numbers of legs. Based on a departure station chosen by a user, I want to output all upcoming routes from that station.
The output must contain the departure time and the final arrival station, but I don't want to include the legs in between. Can anyone guide me in the right direction on how I can achieve this? I tried using a MAX statement but didn't quite get it to work the way I wanted.
Also, there can be multiple departures by the same train number on the same day.
You would need to use the combination (DepartureTime + TrainNumber) as the key to your query, get the maximum arrival time given that combination of values, and then find out what the corresponding ArrivalStation is. So you could do an inner join between the Schedule and a grouped version of itself, i.e.
SELECT
    TrainTableA.TrainNumber,
    TrainTableA.DepartureTime,
    ArrivalStation
FROM
    (SELECT /* all upcoming train routes for the given station */
        TrainNumber,
        DepartureTime,
        ArrivalTime,
        ArrivalStation
     FROM Schedule
     WHERE DepartureStation = givenStation
    ) AS TrainTableA
    INNER JOIN
    (SELECT /* just the last station for each departure */
        TrainNumber,
        DepartureTime,
        MAX(ArrivalTime) AS LastArrival
     FROM Schedule
     GROUP BY TrainNumber, DepartureTime
    ) AS TrainTableB
    ON TrainTableA.TrainNumber = TrainTableB.TrainNumber
    AND TrainTableA.DepartureTime = TrainTableB.DepartureTime
    AND TrainTableA.ArrivalTime = TrainTableB.LastArrival
I can't quite tell from the question whether you have a universal indicator of route sequence, so I used MAX(ArrivalTime). You could also use MAX(LegID) if each LegID is greater than the one before it. I also assumed that ArrivalTime includes the date, so 1:00 AM on the next day is still later than 10:00 PM on the same day. So, of course, adjust to taste.

SQL: Calculating system load statistics

I have a table like this that stores messages coming through a system:
Message
-------
ID (bigint)
CreateDate (datetime)
Data (varchar(255))
I've been asked to calculate the messages saved per second at peak load. The only data I really have to work with is the CreateDate. The load on the system is not constant, there are times when we get a ton of traffic, and times when we get little traffic. I'm thinking there are two parts to this problem: 1. Determine ranges of time that are considered peak load, 2. Calculate the average messages per second during these times.
Is this the right approach? Are there things in SQL that can help with this? Any tips would be greatly appreciated.
I agree, you have to figure out what peak load is before you can start to create reports on it.
The first thing I would do is decide how to define peak load, e.g. am I going to look at an hour-by-hour breakdown?
Next I would group by the CreateDate formatted to whole seconds (no milliseconds), and as part of the group by compute an average based on the number of records.
I don't think you'd need to know the peak hours in advance; you can find them with SQL by wrapping the full query and selecting the top 20 entries, for example:
select top 20 *
from (
[...load query here...]
) qry
order by LoadPerSecond desc
This answer had a good lesson about averages. You can calculate the load per second by looking at the load per hour, and dividing by 3600.
To get a first glimpse of the load for the last week, you could try (Sql Server syntax):
select datepart(dy, createdate) as DayOfYear,
       datepart(hour, createdate) as Hour,
       count(*)/3600.0 as LoadPerSecond
from message
where CreateDate > dateadd(day, -7, getdate())
group by datepart(dy, createdate), datepart(hour, createdate)
To find the peak load per minute:
select max(MessagesPerMinute)
from (
    select count(*) as MessagesPerMinute
    from message
    where CreateDate > dateadd(day, -7, getdate())
    group by datepart(dy, createdate), datepart(hour, createdate), datepart(minute, createdate)
) as peaks
Grouping by datepart(dy, ...) is an easy way to distinguish between days without worrying about month borders. It works until you select more than a year back, but that would be unusual for performance queries.
Warning: these will run slowly!
This will group your data into "second" buckets and list them from the most activity to least:
SELECT CONVERT(char(19), CreateDate, 120) AS CreateDateBucket,
       COUNT(*) AS CountOf
FROM Message
GROUP BY CONVERT(char(19), CreateDate, 120)
ORDER BY 2 DESC
This will group your data into "minute" buckets and list them from the most activity to least:
SELECT LEFT(CONVERT(char(19), CreateDate, 120), 16) AS CreateDateBucket,
       COUNT(*) AS CountOf
FROM Message
GROUP BY LEFT(CONVERT(char(19), CreateDate, 120), 16)
ORDER BY 2 DESC
I'd take those values and calculate whatever figures they're asking for.
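For instance, a sketch of one way to turn the one-second buckets into a single peak-load number (averaging the top 20 busiest seconds is an assumption, not something the question specifies):

SELECT AVG(CAST(CountOf AS float)) AS PeakMessagesPerSecond
FROM (
    SELECT TOP 20 COUNT(*) AS CountOf   -- the 20 busiest one-second buckets
    FROM Message
    GROUP BY CONVERT(char(19), CreateDate, 120)
    ORDER BY COUNT(*) DESC
) AS busiest;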