Getting random and last value from group in SQL

I have a SQL table containing train schedules. The table looks something like this:
Schedule
TrainNumber
LegID
DepartureTime
DepartureStation
ArrivalTime
ArrivalStation
My real database contains several tables, but only the one above is relevant to this question. Different train numbers can have different numbers of legs. Based on a departure station chosen by a user, I want to output all upcoming routes from that station.
The output must contain the departure time and the arrival station, but I don't want to include the legs in between. Can anyone guide me in the right direction on how to achieve this? I tried using a MAX statement but didn't quite get it to work the way I wanted.
Also, there can be multiple departures by the same train number on the same day.

You would need to use the combination (DepartureTime + TrainNumber) as the key to your query, get the maximum arrival time given that combination of values, and then find out what the corresponding ArrivalStation is. So you could do an inner join between the Schedule and a grouped version of itself, i.e.
SELECT
    TrainTableA.TrainNumber
    ,TrainTableA.DepartureTime
    ,ArrivalStation
FROM
    (SELECT /* all upcoming train routes for given station */
        TrainNumber
        ,DepartureTime
        ,ArrivalTime
        ,ArrivalStation
    FROM
        Schedule
    WHERE DepartureStation = givenStation
    ) as TrainTableA
INNER JOIN
    (SELECT /* Just the last station for each departure */
        TrainNumber
        ,DepartureTime
        ,Max(ArrivalTime) as a
    FROM
        Schedule
    GROUP BY
        TrainNumber
        ,DepartureTime
    ) as TrainTableB
ON
    TrainTableA.TrainNumber = TrainTableB.TrainNumber
    AND TrainTableA.DepartureTime = TrainTableB.DepartureTime
    AND TrainTableA.ArrivalTime = TrainTableB.a
I can't quite tell from the question whether you have a universal indicator of the route sequence, so I used max(ArrivalTime). You could also use max(LegID) if each LegID is greater than the one before it. I also assumed that ArrivalTime includes the date, so that 1:00 AM on the next day still sorts later than 10:00 PM on the day before. So, of course, adjust to taste.
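To make the join-to-a-grouped-copy pattern concrete, here is a minimal sketch run against SQLite from Python. It is not the asker's real schema or data: the station names, timestamps and the LastArrival alias are made up, and it assumes every row belonging to one departure repeats the same TrainNumber and DepartureTime. If each leg stores its own departure time instead, the grouping key would have to be some run identifier rather than DepartureTime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Schedule (
    TrainNumber INTEGER, LegID INTEGER,
    DepartureTime TEXT, DepartureStation TEXT,
    ArrivalTime TEXT, ArrivalStation TEXT
);
-- Hypothetical rows: every row of one departure repeats its DepartureTime.
INSERT INTO Schedule VALUES
    (10, 1, '2024-01-01 08:00', 'A', '2024-01-01 09:00', 'B'),
    (10, 2, '2024-01-01 08:00', 'A', '2024-01-01 11:00', 'C'),
    (10, 3, '2024-01-01 08:00', 'A', '2024-01-02 01:00', 'D'),
    (10, 1, '2024-01-01 16:00', 'A', '2024-01-01 17:00', 'B'),
    (10, 2, '2024-01-01 16:00', 'A', '2024-01-01 19:00', 'C'),
    (20, 1, '2024-01-01 09:30', 'E', '2024-01-01 12:00', 'A');
""")

query = """
SELECT a.TrainNumber, a.DepartureTime, a.ArrivalStation
FROM Schedule AS a
INNER JOIN (
    -- latest arrival for each (train, departure) combination
    SELECT TrainNumber, DepartureTime, MAX(ArrivalTime) AS LastArrival
    FROM Schedule
    GROUP BY TrainNumber, DepartureTime
) AS b
    ON  a.TrainNumber   = b.TrainNumber
    AND a.DepartureTime = b.DepartureTime
    AND a.ArrivalTime   = b.LastArrival
WHERE a.DepartureStation = ?
ORDER BY a.DepartureTime
"""
routes = conn.execute(query, ("A",)).fetchall()
print(routes)
```

With this sample data, the two departures of train 10 from station A each come back once, paired with their final stop. The timestamps are stored as ISO-style text, which is why the next-day arrival still compares as later.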

Related

Capture a Value in SQL

I want to capture the last value recorded right before a certain time was recorded. In Healthcare terms I need the max flowsheet value 0-10 that was recorded right before a pain medication was given.
I can add the max(flowsheet recorded time) but I am not sure how to add in the time of the medication so I get the max value that was recorded.
We know little about your database, so here is the general approach. You want to look at rows before the medication (where mydate < medication_date). Of these rows, you say you want the maximum flowsheet value (max(flowsheet)).
Furthermore, in your request comments you say the medication_date is in another table.
Putting these things together we get something like:
select max(flowsheet)
from mytable
where mydate < (select medication_date from medication);
Well, the medication table won't really have just one row with the global medication date. So let's assume both tables refer to patients, and you want the information for a particular patient. This would be something like this:
select max(flowsheet)
from mytable t
where patient_id = 12345
  and mydate <
      (
        select medication_date
        from medication m
        where m.patient_id = t.patient_id
      );
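If "last value recorded" means the most recent reading rather than the highest score, max(flowsheet) is not quite the same thing. A minimal sketch of the "most recent row before the medication time" variant follows, run against SQLite from Python. The table and column names are the placeholders used above, the sample rows are invented, and LIMIT is SQLite syntax (other databases use TOP or FETCH FIRST). It also copes with a patient having several medication rows, which a bare scalar subquery would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (patient_id INTEGER, mydate TEXT, flowsheet INTEGER);
CREATE TABLE medication (patient_id INTEGER, medication_date TEXT);
-- Hypothetical readings: the highest score (7) is not the latest one (4).
INSERT INTO mytable VALUES
    (12345, '2021-01-01 08:00', 7),
    (12345, '2021-01-01 09:30', 4),
    (12345, '2021-01-01 11:00', 9);
INSERT INTO medication VALUES (12345, '2021-01-01 10:00');
""")

query = """
SELECT m.patient_id,
       m.medication_date,
       (SELECT t.flowsheet
        FROM mytable t
        WHERE t.patient_id = m.patient_id
          AND t.mydate < m.medication_date
        ORDER BY t.mydate DESC   -- newest reading first
        LIMIT 1) AS last_score_before
FROM medication m
WHERE m.patient_id = ?
"""
rows = conn.execute(query, (12345,)).fetchall()
print(rows)
```

Here the reading taken at 09:30 is returned for the 10:00 medication, whereas max(flowsheet) over the same rows would return 7.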

Count combination + frequency

I have the following assignment. I believe my guess is correct, but I couldn't find anything confirming my assumption about how frequency relates to the COUNT function.
What was the most popular bike route (start/end station combination) in DC’s bike-share program over the first 3 months of 2012? How many times was the route taken?
• duration seconds: duration of the ride (in seconds)
• start time, end time: datetimes representing the beginning and end of the ride
• start station, end station: name of the start and end stations for the ride
This is the code I wrote. I wanted to check whether my guess is correct that "most popular route" means the most frequent one, and that COUNT over the start/end station combination gives it.
I would appreciate it if someone could confirm that my guess is right.
SELECT start_station, end_station, count(*) AS ct_route_taken
FROM tutorial.dc_bikeshare_q1_2012
GROUP BY start_station, end_station
ORDER BY ct_route_taken DESC
LIMIT 1;
Just count(*).
The name of the table would indicate that we need no WHERE clause.
If that's misleading and it covers a greater time interval, add a (proper!) WHERE clause like this:
WHERE start_time >= '2012-01-01'
AND start_time < '2012-04-01'
Your query would eliminate most of '2012-03-31', since start_time is supposed to be a "datetime" type. Depending on which type exactly and where the date of "the first 3 months" is supposed to be located, we might need to adjust for time zone also.
See:
https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_BETWEEN_.28especially_with_timestamps.29
Ignoring time zones altogether in Rails and PostgreSQL
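The difference between the half-open range and BETWEEN can be seen in a small sketch, run here against SQLite from Python. The rides table and its rows are hypothetical, and timestamps are stored as ISO-style text, which happens to compare the same way a real timestamp column would in this case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rides (start_time TEXT, start_station TEXT, end_station TEXT);
-- Hypothetical rides; two of them fall outside the first quarter.
INSERT INTO rides VALUES
    ('2012-01-05 08:00:00', 'A', 'B'),
    ('2012-02-10 09:00:00', 'A', 'B'),
    ('2012-03-31 18:30:00', 'A', 'B'),
    ('2012-02-11 09:00:00', 'B', 'C'),
    ('2012-04-01 00:00:00', 'B', 'C'),
    ('2012-04-02 10:00:00', 'B', 'C');
""")

# Most popular route, using the half-open range from the answer.
top_route = conn.execute("""
    SELECT start_station, end_station, count(*) AS ct_route_taken
    FROM rides
    WHERE start_time >= '2012-01-01'
      AND start_time <  '2012-04-01'
    GROUP BY start_station, end_station
    ORDER BY ct_route_taken DESC
    LIMIT 1
""").fetchone()

# Same quarter with BETWEEN: the ride on the evening of March 31 is lost.
between_count = conn.execute("""
    SELECT count(*) FROM rides
    WHERE start_time BETWEEN '2012-01-01' AND '2012-03-31'
""").fetchone()[0]

half_open_count = conn.execute("""
    SELECT count(*) FROM rides
    WHERE start_time >= '2012-01-01' AND start_time < '2012-04-01'
""").fetchone()[0]

print(top_route, between_count, half_open_count)
```

The half-open range counts all four first-quarter rides, while BETWEEN stops at midnight at the start of March 31 and counts only three.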
From the description, the query looks OK, provided the start and end station names are written the same way for every station. Without looking at the table data, however, it is difficult to confirm.

TSQL query to find latest (current) record from period column when there are past present and future records

edited as requested:
My apologies. I've been dealing with this a bit and it's well and truly in my head, but not for the reader.
We have multiple records in table A which have multiple entries in the Period column. Say it's like a football schedule. Teams will have multiple dates/times in the Period column.
When we run query:
We want records selected for the most recent games only.
We don't want the earlier games.
We don't want the games "scheduled" and not yet played.
"Last game played" i.e. Period for teams are often on different days.
Table like:
Team Period
Reds 2021020508:00
Reds 2021011107:00
City 2021030507:00
Reds 2021032607:00
City 2021041607:00
Reds 2021050707:00
When I run query, I want to see the records for last game played regardless of date. So if I run the query on 27 Mar 2021, I want:
City 2021030507:00
Reds 2021032607:00
Keep in mind I used the above as an easily understandable example. In my case I have 1000s of "Teams" each of which may have 100+ different date entries in the Period column and I would like the solution to be applicable regardless of number of records, dates, or when the query is run.
What can I do?
Thanks!
So this gives you your desired output using the sample data, does it fulfil your requirement?
create table x (Team varchar(10), period varchar(20))
insert into x values
('Reds','2021020508:00'),
('Reds','2021011107:00'),
('City','2021030507:00'),
('Reds','2021032607:00'),
('City','2021041607:00'),
('Reds','2021050707:00')
select Team, Max(period) LastPeriod
from x
where period <= Format(GetDate(), 'yyyyMMddHH:mm') -- HH = 24-hour clock; hh would give a 12-hour value
group by Team
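The same idea can be tried outside SQL Server. Below is a minimal sketch using SQLite from Python with the sample rows above; the fixed as_of value stands in for GetDate() so the result is reproducible, and %H is the 24-hour field in Python's strftime.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE x (Team TEXT, period TEXT);
INSERT INTO x VALUES
    ('Reds', '2021020508:00'),
    ('Reds', '2021011107:00'),
    ('City', '2021030507:00'),
    ('Reds', '2021032607:00'),
    ('City', '2021041607:00'),
    ('Reds', '2021050707:00');
""")

# Pretend the query runs at midnight on 27 Mar 2021 (stand-in for GetDate()).
as_of = datetime(2021, 3, 27).strftime("%Y%m%d%H:%M")

last_played = conn.execute("""
    SELECT Team, MAX(period) AS LastPeriod
    FROM x
    WHERE period <= ?      -- text comparison works because of the fixed-width format
    GROUP BY Team
    ORDER BY Team
""", (as_of,)).fetchall()
print(last_played)
```

Because the period strings are fixed-width and ordered year, month, day, hour, minute, comparing them as text gives the same order as comparing the dates they encode.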
Your string-formatted dates sort chronologically as text, so I think this would work (TOP 2 matches the two teams in the sample data):
SELECT TOP 2 *
FROM tableA
WHERE period <= FORMAT( GETDATE(), 'yyyyMMddHH:mm' )
ORDER BY period DESC
Perhaps you want:
where period = (select max(t2.period) from t t2)
This returns all rows with the last period in the table.

Use SQL to ensure I have data for each day of a certain time period

I'm looking to select only one data point from each date in my report. I want to ensure each day is accounted for and has at least one row of information, because we had to do a few different things to move a large data file into our data warehouse (import one large Google Sheet for some of the data, use Python for daily pulls of the rest) and I want to make sure no date was left out. The data runs from last summer through now. I could use COUNT(DISTINCT ...) to check that the number of distinct days matches the number of days between the first data point and yesterday (the latest data point), but I want to verify that each individual day is accounted for. I should mention that I am in BigQuery. An example of the created_at style is: 2021-02-09 17:05:44.583 UTC
This is what I have so far:
SELECT FIRST(created_at)
FROM 'large_table'
ORDER BY created_at
I know FIRST is probably not the best function for this case, and it currently just grabs the very first data point in created_at, but it serves as a jumping-off point.
You can use aggregation:
select any_value(lt).*
from large_table lt
group by created_at
order by min(created_at);
Note: This assumes that created_at is a date -- or at least only has one value per date. You might need to convert it to a date:
select any_value(lt).*
from large_table lt
group by date(created_at)
order by min(created_at);
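Grouping by date gives one row per day that exists, but it does not by itself show which days are absent. One possible way to check for gaps is sketched below with SQLite and Python rather than BigQuery. The sample rows are invented, and substr(created_at, 1, 10) is a stand-in for date(created_at) that relies on the timestamp text starting with YYYY-MM-DD.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE large_table (created_at TEXT);
-- Hypothetical rows: 2021-02-11 has no data at all.
INSERT INTO large_table VALUES
    ('2021-02-09 17:05:44.583 UTC'),
    ('2021-02-09 18:00:00.000 UTC'),
    ('2021-02-10 08:15:00.000 UTC'),
    ('2021-02-12 09:30:00.000 UTC');
""")

# One row per calendar day that has at least one record.
days_present = [
    date.fromisoformat(row[0])
    for row in conn.execute("""
        SELECT substr(created_at, 1, 10) AS day
        FROM large_table
        GROUP BY substr(created_at, 1, 10)
        ORDER BY day
    """)
]

# Every calendar day between the first and last day seen.
first_day, last_day = days_present[0], days_present[-1]
span = (last_day - first_day).days
expected = [first_day + timedelta(days=i) for i in range(span + 1)]

missing_days = sorted(set(expected) - set(days_present))
print(missing_days)
```

An empty missing_days list would mean every day between the first and last record has at least one row.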
The BigQuery equivalent of the query in your question:
SELECT created_at
FROM `large_table`
ORDER BY created_at
LIMIT 1

SQL how to implement if and else by checking column value

The table below contains customer reservations. Each customer gets one record in this table when they check in, and on their last day the record's checkout_date field is updated with the current time.
The Table
Now I need to extract all customers spending nights.
The Query
SELECT reservations.customerid, reservations.roomno, rooms.rate,
reservations.checkin_date, reservations.billed_nights, reservations.status,
DateDiff("d",reservations.checkin_date,Date())
+Abs(DateDiff("s",#12/30/1899 14:30:0#,Time())>0) AS Due_nights
FROM reservations, rooms
WHERE reservations.roomno=rooms.roomno;
What I need is this: if the customer has the checkout status, due nights should be calculated as checkout_date minus checkin_date instead of using the current date. Also, if the customer has a checkout date, there is no need to add the extra night from the 14:30 check.
My current query result is below. My computer time is 14:39, so it adds 1 to every row.
Since you want to calculate the due nights up to the checkout date, and up to the current date for customers who are still checked in, I would suggest using an Immediate If (IIf).
The condition to check is the status column: if it is 'checkout', use checkout_date, otherwise use Now(). Something like:
SELECT
reservations.customerid,
reservations.roomno,
rooms.rate,
reservations.checkin_date,
reservations.billed_nights,
reservations.status,
DateDiff("d", checkin_date, IIF(status = 'checkout', checkout_date, Now())) As DueNights
FROM
reservations
INNER JOIN
rooms
ON reservations.roomno = rooms.roomno;
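As you might have noticed, I used an explicit JOIN instead of listing both tables in the FROM clause and matching them in WHERE. It keeps the join condition separate from any filters and is generally the preferred style. Hope this helps!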
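The query above leaves out the 14:30 rule from the question. A minimal sketch of how both conditions could be combined is shown below, using SQLite from Python instead of Access, with CASE in place of IIf. The sample rows, the 'checkin' status value and the fixed "now" timestamp are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reservations (
    customerid INTEGER, checkin_date TEXT, checkout_date TEXT, status TEXT
);
-- Hypothetical rows: customer 1 has left, customer 2 is still in the room.
INSERT INTO reservations VALUES
    (1, '2014-03-01 10:00:00', '2014-03-04 09:00:00', 'checkout'),
    (2, '2014-03-03 16:00:00', NULL,                  'checkin');
""")

now = "2014-03-05 14:39:00"  # fixed stand-in for Now()

query = """
SELECT customerid,
       -- whole calendar days between check-in and the chosen end date
       CAST(julianday(date(CASE WHEN status = 'checkout'
                                THEN checkout_date ELSE :now END))
            - julianday(date(checkin_date)) AS INTEGER)
       -- extra night only for guests still checked in after 14:30
       + CASE WHEN status <> 'checkout' AND time(:now) > '14:30:00'
              THEN 1 ELSE 0 END AS due_nights
FROM reservations
ORDER BY customerid
"""
due_nights = conn.execute(query, {"now": now}).fetchall()
print(due_nights)
```

The checked-out customer is billed from check-in date to checkout date with no extra night, while the customer still in the room is billed up to the current date plus one night because the current time is past 14:30.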