I'm learning some basic SQL and was toying around with some data to see what I can do. However, I'm a bit stuck on how to proceed with my current problem.
I have a table that looks like this:
RouteID  SegmentID  Order  BeginTime  EndTime   CarID
1        45         1      10:00:30   10:01:00  1
1        46         2      10:01:00   10:01:30  1
1        47         3      10:01:30   10:02:00  1
2        50         1      10:05:00   10:05:30  1
2        49         2      10:06:00   10:06:30  1
3        900        1      20:01:00   20:01:30  2
As you can see, we have a bunch of cars and the routes they have driven. A route is simply a sequence of traveled road segments. Furthermore, we have the timestamps of entering and leaving each segment.
The goal
What I want to do is to create an easy origin-destination analysis. I'm given a list of segments that belong to location A and similarly, I'm given a list of segments belonging to location B.
My solution
Alright, this is easy. I have made a new table that for each route lists the first and last segment. Using this table, it is then super easy to get the answer.
The problem
The routes themselves are not generated as they should be. In the above table, for example, you'll see that car 1 has two routes. The first route ends at 10:02:00, whereas the second route begins at 10:05:00. As this time difference is less than 5 minutes, I'd very much like to consider this as 'one route'. I'm not necessarily interested in generating a new table where we glue all routes of the same car together provided that they lie within 5 minutes of each other (although I'd be very interested in the type of queries one would have to write to accomplish this).
For now, I'd be very happy to get a clue/answer on how to somehow glue these routes together and to make a table where the origin and destination of each route is listed and routes of the same car with at most 5 minutes in between are considered as one route.
I thank you in advance. Although I had some close attempts, a proper solution seems beyond my current (very basic) SQL-skills.
You can compare consecutive rows (ordered by time) to flag where a new route starts for a car, and then take a running sum of those flags to get a virtual route number within each car. For example:
select routeID, SegmentID, [Order], BeginTime, EndTime, CarID,
       -- virtual route nbr within a car: running count of route starts
       sum(newRouteFlag) over(partition by CarID order by BeginTime) rtNmbr
from (
    select routeID, SegmentID, [Order], BeginTime, EndTime, CarID,
           -- 0 = continues the previous row's route, 1 = starts a new one;
           -- the gap is measured from the end of the previous segment,
           -- and the first row of a car has no previous row, so it gets 1
           case when lag(routeID) over(partition by CarID order by BeginTime) = routeID
                  or dateadd(minute, 5, lag(EndTime) over(partition by CarID order by BeginTime)) >= BeginTime
                then 0 else 1 end newRouteFlag
    from tbl
) t
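To get from the virtual route number to the origin and destination the question asks for, one more step is needed. The following is a minimal sketch, not the answerer's code: it ports the running-sum idea above to SQLite through Python's `sqlite3` module so that it can be run as is. The CTE names, the `prevEnd` column, and the 300-second threshold parameter are assumptions made for illustration, and the time-only strings mean that routes crossing midnight are not handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE tbl (RouteID INT, SegmentID INT, "Order" INT, '
    "BeginTime TEXT, EndTime TEXT, CarID INT)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 45, 1, "10:00:30", "10:01:00", 1),
        (1, 46, 2, "10:01:00", "10:01:30", 1),
        (1, 47, 3, "10:01:30", "10:02:00", 1),
        (2, 50, 1, "10:05:00", "10:05:30", 1),
        (2, 49, 2, "10:06:00", "10:06:30", 1),
        (3, 900, 1, "20:01:00", "20:01:30", 2),
    ],
)

ORIGIN_DESTINATION_SQL = """
WITH with_prev AS (
    -- end time of the previous segment driven by the same car
    SELECT *,
           LAG(EndTime) OVER (PARTITION BY CarID ORDER BY BeginTime) AS prevEnd
    FROM tbl
),
flagged AS (
    -- 1 when a segment starts a new merged route, 0 when it continues one
    SELECT *,
           CASE
               WHEN prevEnd IS NULL THEN 1
               WHEN CAST(strftime('%s', BeginTime) AS INTEGER)
                  - CAST(strftime('%s', prevEnd) AS INTEGER) > :max_gap THEN 1
               ELSE 0
           END AS newRouteFlag
    FROM with_prev
),
numbered AS (
    -- running sum of the flags = merged route number within the car
    SELECT *,
           SUM(newRouteFlag) OVER (PARTITION BY CarID ORDER BY BeginTime) AS rtNmbr
    FROM flagged
)
SELECT DISTINCT
       CarID,
       rtNmbr,
       FIRST_VALUE(SegmentID) OVER (
           PARTITION BY CarID, rtNmbr ORDER BY BeginTime
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS origin,
       LAST_VALUE(SegmentID) OVER (
           PARTITION BY CarID, rtNmbr ORDER BY BeginTime
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS destination
FROM numbered
ORDER BY CarID, rtNmbr
"""

# "at most 5 minutes in between" is taken to mean a gap of at most 300 seconds
routes = conn.execute(ORIGIN_DESTINATION_SQL, {"max_gap": 300}).fetchall()
for car_id, route_nbr, origin, destination in routes:
    print(car_id, route_nbr, origin, destination)
```

On the sample data, the two routes of car 1 collapse into one merged route from segment 45 to segment 49. The explicit window frame matters for `LAST_VALUE`: with the default frame it would return the current row's segment rather than the last one of the route.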
Related
I'm using a dataset with fields "virtual_time" and "store_visited" and the data shows a user's activity pattern at different locations during different timestamps.
The problem is that sometimes the user is at the same location, but the dataset contains several records of that place with slightly different timestamps.
I'm trying to group those near-identical timestamps together per location, so that the data makes better sense to me and I can later work out how much time the user spent at each place.
For instance when I type:
SELECT DISTINCT virtual_time, store_visited
FROM public.consumer
WHERE user = 'e63a9'
ORDER BY 1;
I get back something like:
Store_visited virtual_time
1 M&S 2017-09-16 17:52:06
2 WholeFoods 2017-09-16 18:26:17
3 WholeFoods 2017-09-16 18:26:19
4 WholeFoods 2017-09-16 18:26:20
5 OysterRooms 2017-09-18 13:31:39
But I'd like to filter out the duplicate stores visited in rows 3 and 4, as they show the same location with a time difference of only 2 seconds and 1 second.
Ideally filtering it would show something like:
Store_visited virtual_time
1 M&S 2017-09-16 17:52:06
2 WholeFoods 2017-09-16 18:26:17
5 OysterRooms 2017-09-18 13:31:39
So that it's easier to distinguish the different timestamps at different stores.
Hope that makes some sense. Any help would be GREATLY appreciated!
If you have any questions, please let me know!
Many thanks
You could round the timestamps to minutes:
select distinct store_visited, date_trunc('minute', virtual_time) as virtual_time
from consumer
order by 2;
This is the fastest but not very accurate solution. A better one is to check differences between consecutive rows and skip those which fall within a specific range. Use the window function lag():
select store_visited, virtual_time
from (
    select
        store_visited, virtual_time,
        -- true when the previous record for the same store is less than 10 seconds older
        coalesce(virtual_time - lag(virtual_time) over w < '10 seconds', false) as negligible
    from consumer
    window w as (partition by store_visited order by virtual_time)
) s
where not negligible
order by 2;
store_visited | virtual_time
---------------+---------------------
M&S | 2017-09-16 17:52:06
WholeFoods | 2017-09-16 18:26:17
OysterRooms | 2017-09-18 13:31:39
(3 rows)
This is a gaps-and-islands problem. You can solve it by using the row_number() function.
From the documentation:
number of the current row within its partition, counting from 1
select
    store_visited,
    virtual_time
from
    (select
        store_visited,
        virtual_time,
        -- 1 for the earliest record of each store
        row_number() over(partition by store_visited order by virtual_time asc) as vt
    from
        tbl) as new
where
    vt = 1
order by
    virtual_time;
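One caveat about the row_number() approach above: it keeps only the earliest record per store, so a genuine return visit to the same store would be dropped as well. The sketch below compares it with a lag()-based filter on SQLite via Python's `sqlite3`. The extra WholeFoods row on 2017-09-17 is a hypothetical return visit added for illustration, and the 10-second threshold and the `gap_seconds` column name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE consumer (store_visited TEXT, virtual_time TEXT)")
conn.executemany(
    "INSERT INTO consumer VALUES (?, ?)",
    [
        ("M&S", "2017-09-16 17:52:06"),
        ("WholeFoods", "2017-09-16 18:26:17"),
        ("WholeFoods", "2017-09-16 18:26:19"),
        ("WholeFoods", "2017-09-16 18:26:20"),
        ("WholeFoods", "2017-09-17 09:00:00"),  # hypothetical return visit
        ("OysterRooms", "2017-09-18 13:31:39"),
    ],
)

# Keeps only the first record of each store (the row_number() idea).
FIRST_ONLY_SQL = """
SELECT store_visited, virtual_time
FROM (
    SELECT store_visited, virtual_time,
           ROW_NUMBER() OVER (PARTITION BY store_visited ORDER BY virtual_time) AS vt
    FROM consumer
) AS ranked
WHERE vt = 1
ORDER BY virtual_time
"""

# Keeps a record unless the previous record for the same store is
# less than :min_gap seconds older (the lag() idea).
GAP_SQL = """
SELECT store_visited, virtual_time
FROM (
    SELECT store_visited, virtual_time,
           CAST(strftime('%s', virtual_time) AS INTEGER)
         - CAST(strftime('%s', LAG(virtual_time) OVER (
               PARTITION BY store_visited ORDER BY virtual_time)) AS INTEGER)
           AS gap_seconds
    FROM consumer
) AS gaps
WHERE gap_seconds IS NULL OR gap_seconds >= :min_gap
ORDER BY virtual_time
"""

first_only = conn.execute(FIRST_ONLY_SQL).fetchall()
by_gap = conn.execute(GAP_SQL, {"min_gap": 10}).fetchall()
print(len(first_only), "rows with row_number():", first_only)
print(len(by_gap), "rows with lag():", by_gap)
```

With this data the row_number() query returns three rows and loses the return visit, while the lag()-based query returns four. Note that the lag() filter compares each record only with its immediate predecessor, so a long chain of records that are each a few seconds apart collapses into its first record.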
I tried reading a lot of date-comparison questions that I found here on Stack Overflow and elsewhere on the internet, but I wasn't able to find a solution.
I have the following table (Trips):
VehicleID DriverID xID CheckIn CheckOut DateHour
462 257 7 1 0 16/12/2017 20:40:00
462 257 7 0 1 19/12/2017 10:05:00
5032 3746 11 1 0 02/10/2017 07:00:00
5032 3746 11 0 1 06/10/2017 17:00:00
When my company receives a traffic ticket, I want to compare the date from the ticket with the whole block of dates from the table "Trips". Each block starts with CheckIn = 1 and finishes with CheckOut = 1, so this way I will know which driver was responsible for the ticket through the DriverID.
For example: the traffic ticket date and time are 17/12/2017 08:00:00 and the vehicle is the one with id = 462. I'll insert this date and time in a field in our system to look up automatically which driver was driving that car at that moment; we won't use the ticket table yet. Looking at my example, I know it should return DriverID = 257, but there are a lot of trips with the same vehicle and different drivers. The major problem is how I can compare the date and hour from the ticket with the range of dates from the trips, since I have to consider 1 trip = 2 lines in the table.
Unfortunately I can't change the way this table was created, because we need these 2 lines, CheckIn and CheckOut, separately.
Any thoughts or directions?
Thank you for your attention
select t1.VehicleID
      ,t1.DriverID
      ,t1.xID
      ,t1.DateHour as Checkin
      ,t2.DateHour as Checkout
from trips as t1 join trips as t2 -- self join trips to get both start and end in a single row
  on t1.VehicleID = t2.VehicleID -- add all columns
 and t1.DriverID = t2.DriverID   -- which define
 and t1.xID = t2.xID             -- a unique trip
 and t1.Checkin = 1              -- start
 and t2.Checkout = 1             -- end
join tickets                     -- now join tickets
  on tickets.trafficDateHour between t1.DateHour and t2.DateHour
 and tickets.VehicleID = t1.VehicleID -- assumes tickets also stores the vehicle; otherwise trips of other vehicles match too
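The pairing of check-in and check-out rows can also be done with a window function instead of a self join. The following is a rough sketch on SQLite via Python's `sqlite3`, under several assumptions that are not in the question: `DateHour` is stored as an ISO-8601 string (so string comparison matches time order), check-in and check-out rows strictly alternate per vehicle, and the helper name `driver_at` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Trips (VehicleID INT, DriverID INT, xID INT, "
    "CheckIn INT, CheckOut INT, DateHour TEXT)"
)
conn.executemany(
    "INSERT INTO Trips VALUES (?, ?, ?, ?, ?, ?)",
    [
        (462, 257, 7, 1, 0, "2017-12-16 20:40:00"),
        (462, 257, 7, 0, 1, "2017-12-19 10:05:00"),
        (5032, 3746, 11, 1, 0, "2017-10-02 07:00:00"),
        (5032, 3746, 11, 0, 1, "2017-10-06 17:00:00"),
    ],
)

DRIVER_SQL = """
WITH paired AS (
    -- attach to every row the timestamp of the vehicle's next row;
    -- for a check-in row that next row is assumed to be its check-out
    SELECT VehicleID, DriverID, CheckIn,
           DateHour AS checkin_at,
           LEAD(DateHour) OVER (PARTITION BY VehicleID ORDER BY DateHour) AS checkout_at
    FROM Trips
)
SELECT DriverID
FROM paired
WHERE CheckIn = 1
  AND VehicleID = ?
  AND ? BETWEEN checkin_at AND checkout_at
"""


def driver_at(vehicle_id, ticket_time):
    """Return the DriverID holding the vehicle at ticket_time, or None."""
    row = conn.execute(DRIVER_SQL, (vehicle_id, ticket_time)).fetchone()
    return row[0] if row else None


print(driver_at(462, "2017-12-17 08:00:00"))
```

A trip that has a check-in but no check-out yet gets a NULL `checkout_at` and therefore never matches. Because the lookup is an ordinary range predicate, the same CTE could be joined to a table of tickets to resolve many tickets in one query.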
I didn't make sample tables, this will not run as is, but something like this should do it for you:
SELECT *
FROM tickets, trips
WHERE
trips.datehour in (
SELECT trips.datehour
FROM tickets, trips
WHERE
tickets.ticket_date < trips.datehour AND
trips.checkin = 0
) AND
tickets.ticket_date > trips.datehour AND
trips.checkin = 1
If you are running this for a specific date as described in the comment above, it will work. If you are trying to run it for a set of ticket dates all at once, you'll require recursion. Recursion is a different beast depending on your flavor of SQL.
I have difficulties formulating my issue.
I have a view which returns the results below. I need to add a column to the view that assigns the same number to both flights of a round trip.
Flt_No |From_Airport |To_Airport |Dep_Date       |RequiredResult
124    |LCA          |CDG        |10/19/14 5:00  |1
125    |CDG          |LCA        |10/19/14 10:00 |1
197    |LCA          |BCN        |10/4/12 5:00   |2
198    |BCN          |LCA        |10/4/12 11:00  |2
501    |LCA          |HER        |15/8/12 12:05  |3
502    |HER          |LCA        |15/8/12 15:15  |3
I.e. flight 124 is going from Larnaca to CDG, and flight 125 is going back from CDG to Larnaca - they both have to have the same identifier.
The two flights of a round trip will always have consecutive flight numbers.
I have a bunch of conditions which I won't write now.
Omitting hours is not an option, they're important.
I was thinking dense_rank() but I don't know how to create one identifier for 2 flights with different numbers, please help.
If your data is similar to the sample data posted, then the following query should give the required result:
SELECT *,
DENSE_RANK() OVER (ORDER BY CASE
WHEN From_Airport < To_Airport THEN From_Airport
ELSE To_Airport
END)
FROM mytable
Join conditions are not limited to simple equality. Assuming {Flight No, Departure, Destination} is unique on any one day, a self join should do it:
select whatever
from flights outbound
inner join flights inbound
    on outbound.flt_no + 1 = inbound.flt_no
   and cast(outbound.dep_date as date) = cast(inbound.dep_date as date)
   and outbound.From_Airport = inbound.To_Airport
   and outbound.To_Airport = inbound.From_Airport
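To turn the self join into the requested identifier column, the matched pairs can be ranked by their outbound flight number. Below is a minimal sketch on SQLite via Python's `sqlite3`. It assumes `Dep_Date` is stored as an ISO-8601 string, and the names `pairs`, `out_no`, `in_no`, and `pair_id` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (Flt_No INT, From_Airport TEXT, "
    "To_Airport TEXT, Dep_Date TEXT)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?)",
    [
        (124, "LCA", "CDG", "2014-10-19 05:00"),
        (125, "CDG", "LCA", "2014-10-19 10:00"),
        (197, "LCA", "BCN", "2012-10-04 05:00"),
        (198, "BCN", "LCA", "2012-10-04 11:00"),
        (501, "LCA", "HER", "2012-08-15 12:05"),
        (502, "HER", "LCA", "2012-08-15 15:15"),
    ],
)

PAIR_SQL = """
WITH pairs AS (
    -- one row per round trip: outbound flight number and its return flight
    SELECT o.Flt_No AS out_no, i.Flt_No AS in_no
    FROM flights AS o
    JOIN flights AS i
      ON i.Flt_No = o.Flt_No + 1
     AND date(i.Dep_Date) = date(o.Dep_Date)
     AND i.From_Airport = o.To_Airport
     AND i.To_Airport = o.From_Airport
)
SELECT f.Flt_No,
       -- both legs share the outbound number, so they get the same rank
       DENSE_RANK() OVER (ORDER BY p.out_no) AS pair_id
FROM flights AS f
JOIN pairs AS p ON f.Flt_No IN (p.out_no, p.in_no)
ORDER BY f.Flt_No
"""

pair_ids = conn.execute(PAIR_SQL).fetchall()
for flt_no, pair_id in pair_ids:
    print(flt_no, pair_id)
```

Flights without a matching return leg drop out because of the inner join; a LEFT JOIN from `flights` to `pairs` would keep them with a NULL identifier. Ranking by outbound flight number reproduces the RequiredResult column for the sample rows, but the numbering rule itself is a guess.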
I have a person's flight history and want to find their most frequent route. All flights are stored as a single row in a table, even return trips where a->b will be in one row and b->a will be in another.
I need to identify where two legs equate to a route; for example:
This person has flown 16 times in total
New York to Paris 2 times (Flight key: JFKCDG)
Paris to New York 2 times (Flight Key: CDGJFK)
New York to London 3 times (Flight Key: JFKLHR)
Currently I don't know a way to group the first two above as a 'Route' and therefore any query I write considers JFKLHR to be the most frequent route (6 times between NY and London) even though I can see from the data that this person has flown between NY and Paris a total of 10 times
Sample Table:
User ID¦Flight Key
-------------------
1 ¦JFKCDG
1 ¦JFKCDG
1 ¦CDGJFK
1 ¦CDGJFK
1 ¦JFKLHR
1 ¦JFKLHR
1 ¦JFKLHR
Expected Output
User ID¦Flight Key¦Count
------------------------
1 ¦JFKCDGJFK ¦4
Building on the clever idea in the answer by #fancyPants, you can use string functions to compare each leg of a route and patch together a full return trip.
I believe this query should work. The first part of the common table expression turns those flights that are round trips into three parts (src-dst-src) and the second part returns those that are one way (as src-dst).
with flights_cte as (
    -- legs that have a connecting leg: build a three-part key (src-dst-src)
    select
        USERID,
        case when left(flightkey,3) > right(flightkey,3)
            then concat(flightkey, left(flightkey,3))
            else concat(right(flightkey,3), flightkey)
        end as flightkey,
        count(*) count
    from flights f
    where exists (
        select 1 from flights where right(f.flightkey,3) = left(flightKey,3)
    )
    group by
        userid,
        case
            when left(flightkey,3) > right(flightkey,3)
            then concat(flightkey, left(flightkey,3))
            else concat(right(flightkey,3), flightkey)
        end
    union all
    -- one-way legs: keep the key as is (src-dst)
    select userid, FlightKey, count(*)
    from flights f
    where not exists (
        select 1 from flights where right(f.flightkey,3) = left(flightKey,3)
    )
    group by UserID, FlightKey
)
select flights_cte.userid, flights_cte.flightkey, flights_cte.count
from flights_cte
join (select userid, max(count) _max_count from flights_cte group by userid) _max
    on flights_cte.UserID = _max.UserID and flights_cte.count = _max_count
A sample SQL Fiddle gives this output:
| USERID | FLIGHTKEY | COUNT |
|--------|-----------|-------|
| 1 | JFKCDGJFK | 4 |
Assuming routes are not a single row, otherwise you wouldn't be asking.. (although I would guess that the whole route is in some other table, maybe reservation-related)
Guessing the first step is to group this data by person and flights that compose a 'route'. I have an article called T-SQL: Identify bad dates in a time series where the time series can be modified to detect gaps between legs of over a day (guess) to differentiate routes. Second step would be to convert legs into route, i.e. JFK-CDG and CDG-JFK to single value JFK-CDG-JFK.
Then it would be a single query, counting the above single value route, and ORDER BY that count.
Good luck.
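The counting step described above can be sketched with a direction-independent key. This is a simpler variant than the three-part key used in the other answer, not a reproduction of it: each six-letter flight key is split into its two airport codes, which are put in alphabetical order so that JFKCDG and CDGJFK land in the same group. It runs on SQLite via Python's `sqlite3`, and the names `legs`, `routes`, and `legs_flown` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (UserID INT, FlightKey TEXT)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [
        (1, "JFKCDG"),
        (1, "JFKCDG"),
        (1, "CDGJFK"),
        (1, "CDGJFK"),
        (1, "JFKLHR"),
        (1, "JFKLHR"),
        (1, "JFKLHR"),
    ],
)

ROUTE_COUNT_SQL = """
WITH legs AS (
    -- split the key into its two three-letter airport codes
    SELECT UserID,
           substr(FlightKey, 1, 3) AS a,
           substr(FlightKey, 4, 3) AS b
    FROM flights
),
routes AS (
    -- two-argument MIN/MAX are scalar functions in SQLite: order the pair alphabetically
    SELECT UserID, MIN(a, b) || MAX(a, b) AS route
    FROM legs
)
SELECT UserID, route, COUNT(*) AS legs_flown
FROM routes
GROUP BY UserID, route
ORDER BY UserID, legs_flown DESC, route
"""

route_counts = conn.execute(ROUTE_COUNT_SQL).fetchall()
for user_id, route, legs_flown in route_counts:
    print(user_id, route, legs_flown)
```

For the sample table the New York/Paris pair comes out on top with 4 legs, ahead of New York/London with 3. The key is only a grouping label (here CDGJFK), so it would need reformatting if the JFKCDGJFK presentation from the expected output is required.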
I have a database that is used to track the location of physical objects; let's call them widgets. It has an audit trail table that tracks when a widget is put in a location, and when it leaves a location (and where it went after that).
So conceptually it looks like this
Widget ID   Date          Old Location   New Location
1           01-Oct-2013   NULL           101
1           03-Oct-2013   101            108
1           08-Oct-2013   108            101
2           01-Oct-2013   NULL           101
2           02-Oct-2013   101            103
3           12-Oct-2013   NULL           101
I want to be able to query a list of which widgets were in location 101 between a start and end date, such as 08-09 Oct 2013, this should be widget 1 but not widget 2 or 3.
I'm not sure how to cover all these cases. I can pull a list of widgets that were moved in before the end, and a list of widgets that were moved out before the start, but that would also eliminate widget 1, as it leaves and comes back.
I think I need to convert this to a table with widget, location, entry date and exit date, but I'm not sure how to do that.
EDIT: As pointed out, My data was wrong, I've updated to make the question the 8th to 9th (it was the 4th to 5th). So Widget 1 is the only widget in location 101 in that period.
Try something like this:
select *
from
    (select "Widget ID" id,
            "New Location" loc,
            "Date" start_date,
            -- the widget's next move ends the stay; with no later move it is still there
            lead("Date", 1, sysdate) over (partition by "Widget ID" order by "Date") end_date
     from widgets) t
where t.loc = 101
  and start_date < <<your_ending_date>> and end_date > <<your_starting_date>>
Here is a sqlfiddle demo (note that I changed your data a little bit).
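The same interval idea can be tried out on SQLite via Python's `sqlite3`. This is a sketch under assumptions rather than a port of the Oracle query: the table and column names (`widget_moves`, `moved_on`, and so on) are made up, dates are stored as ISO-8601 strings, an open-ended stay is closed with the sentinel '9999-12-31', and a widget that arrives on the last day of the period is counted as present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE widget_moves (widget_id INT, moved_on TEXT, "
    "old_location INT, new_location INT)"
)
conn.executemany(
    "INSERT INTO widget_moves VALUES (?, ?, ?, ?)",
    [
        (1, "2013-10-01", None, 101),
        (1, "2013-10-03", 101, 108),
        (1, "2013-10-08", 108, 101),
        (2, "2013-10-01", None, 101),
        (2, "2013-10-02", 101, 103),
        (3, "2013-10-12", None, 101),
    ],
)

STAYS_SQL = """
WITH stays AS (
    -- one row per stay: the widget's next move marks the end of the stay
    SELECT widget_id,
           new_location AS loc,
           moved_on AS start_date,
           LEAD(moved_on, 1, '9999-12-31') OVER (
               PARTITION BY widget_id ORDER BY moved_on) AS end_date
    FROM widget_moves
)
SELECT DISTINCT widget_id
FROM stays
WHERE loc = :loc
  AND start_date <= :period_end    -- arrived no later than the period's end
  AND end_date > :period_start     -- and left after the period's start
ORDER BY widget_id
"""


def widgets_in(loc, period_start, period_end):
    """Return the IDs of widgets present in loc at some point in the period."""
    params = {"loc": loc, "period_start": period_start, "period_end": period_end}
    return [row[0] for row in conn.execute(STAYS_SQL, params)]


print(widgets_in(101, "2013-10-08", "2013-10-09"))
```

The two range conditions are the usual interval-overlap test. With day-level dates, whether a widget that leaves exactly on the first day of the period should count is a judgment call; here it does not.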
So you need the last state of each widget within the period.
You probably need a subselect that selects all rows for the widgets between the dates, groups them by ID, orders them by Date descending, and takes the top 1, so that you know each widget's last state within the period.
UPDATE according to new conditions
I want to know if the widget was in the location at any time during the period
Select the distinct IDs and add an EXISTS subselect that checks whether the result set contains a row with the current ID, a date within the period, and new location = X. This tells you which items came to the store at least once.