How can I group flight legs together into routes for counting? - sql

I have a person's flight history and want to find their most frequent route. Each flight is stored as a single row in a table, so for a return trip a->b will be in one row and b->a will be in another.
I need to identify where two legs equate to a route; for example:
This person has flown 7 times in total:
New York to Paris 2 times (Flight key: JFKCDG)
Paris to New York 2 times (Flight key: CDGJFK)
New York to London 3 times (Flight key: JFKLHR)
Currently I don't know a way to group the first two above as a 'route', so any query I write considers JFKLHR to be the most frequent route (3 times between NY and London), even though I can see from the data that this person has flown between NY and Paris a total of 4 times.
Sample Table:
User ID¦Flight Key
-------------------
1 ¦JFKCDG
1 ¦JFKCDG
1 ¦CDGJFK
1 ¦CDGJFK
1 ¦JFKLHR
1 ¦JFKLHR
1 ¦JFKLHR
Expected Output
User ID¦Flight Key¦Count
------------------------
1 ¦JFKCDGJFK ¦4

Building on the clever idea in the answer by @fancyPants: you can use string functions to compare each leg of a route and patch together a full return trip.
I believe this query should work. The first part of the common table expression turns those flights that are round trips into three parts (src-dst-src) and the second part returns those that are one way (as src-dst).
with flights_cte as (
    -- legs that have a matching return leg for the same user: build a src-dst-src key
    select
        userid,
        case when left(flightkey,3) > right(flightkey,3)
            then concat(flightkey, left(flightkey,3))
            else concat(right(flightkey,3), flightkey)
        end as flightkey,
        count(*) as count
    from flights f
    where exists (
        select 1 from flights r
        where r.userid = f.userid
          and r.flightkey = concat(right(f.flightkey,3), left(f.flightkey,3))
    )
    group by
        userid,
        case
            when left(flightkey,3) > right(flightkey,3)
            then concat(flightkey, left(flightkey,3))
            else concat(right(flightkey,3), flightkey)
        end
    union all
    -- one-way legs keep their src-dst key
    select userid, flightkey, count(*)
    from flights f
    where not exists (
        select 1 from flights r
        where r.userid = f.userid
          and r.flightkey = concat(right(f.flightkey,3), left(f.flightkey,3))
    )
    group by userid, flightkey
)
select flights_cte.userid, flights_cte.flightkey, flights_cte.count
from flights_cte
join (select userid, max(count) as _max_count from flights_cte group by userid) _max
    on flights_cte.userid = _max.userid and flights_cte.count = _max_count
A sample SQL Fiddle gives this output:
| USERID | FLIGHTKEY | COUNT |
|--------|-----------|-------|
| 1 | JFKCDGJFK | 4 |
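A simpler variant of the same string-function idea, sketched here rather than taken from the answer: sort the two airport codes of each leg alphabetically so that both directions collapse into one key, then count legs per key. It does not build the JFKCDGJFK label and it treats one-way legs as direction-independent too, so it is a different technique from the query above. The sketch runs on SQLite through Python's sqlite3 module (an assumption, SQLite has substr and || instead of left, right and concat) and uses the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (userid INTEGER, flightkey TEXT)")
legs = ["JFKCDG", "JFKCDG", "CDGJFK", "CDGJFK", "JFKLHR", "JFKLHR", "JFKLHR"]
conn.executemany("INSERT INTO flights VALUES (1, ?)", [(k,) for k in legs])

# Put the alphabetically smaller airport code first so JFKCDG and CDGJFK
# map to the same pair, then count the legs flown on each pair.
query = """
SELECT userid,
       min(substr(flightkey, 1, 3), substr(flightkey, 4, 3)) ||
       max(substr(flightkey, 1, 3), substr(flightkey, 4, 3)) AS pair,
       count(*) AS legs
FROM flights
GROUP BY userid, pair
ORDER BY legs DESC, pair
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first row is the busiest pair for the user; with the sample data that is the New York/Paris pair with 4 legs.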

Assuming routes are not stored as a single row, otherwise you wouldn't be asking (although I would guess that the whole route is in some other table, maybe reservation-related).
I'm guessing the first step is to group this data by person and by the flights that compose a 'route'. I have an article called "T-SQL: Identify bad dates in a time series"; its time-series approach can be modified to detect gaps of over a day (a guess) between legs to differentiate routes. The second step would be to convert legs into a route, i.e. JFK-CDG and CDG-JFK into the single value JFK-CDG-JFK.
Then it would be a single query, counting that single-value route and using ORDER BY on the count.
Good luck.

Related

How to mix sql consults to make conditions to another one

I've the following tables
series_trailers:
ID EPISODEID CONTENT AUTHOR
-----------------------------
1 122383 url1 Peter
2 9999 url2 Ana
3 923822 stuff Jhon
4 122384 url3 Drake
series_episodes:
ID TITLE SERIESID
--------------------------------
122383 Episode 1 23
9999 Somethingweird 87
923822 Randomtitle 512
122384 Episode 2 23
series:
ID TITLE
-------------------
23 Stranger Things
87 Seriesname
512 Sometrashseries
As you can see there are three tables: one with the series info, one with the series' episodes and another one which contains urls that redirect to the episodes' trailers. I'd like to get the latest rows from series_trailers but without repeating the series they belong to.
I've tried SELECT DISTINCT EPISODEID FROM series_trailers ORDER BY id DESC, but there are two rows whose episodes belong to the same series, so I get the series Stranger Things twice. Summing up, I'd like to display the latest series with new urls, but I don't want duplicated series (that's what I get with the SQL above).
EDIT: What I'm supposed to get:
Last updated series:
Stranger Things
Seriesname
Sometrashseries
What I'd get with my sql code:
Stranger Things
Seriesname
Sometrashseries
Stranger Things (again)
If I understood correctly, here is the latest trailer for each series (latest as in the highest series_trailers ID, so most likely the one added last).
WITH MostRecentTrailers
AS (
    SELECT MAX(st.ID) AS TRAILERID
        ,s.ID AS SERIESID
        ,s.TITLE AS SERIESTITLE
    FROM series_trailers st
    JOIN series_episodes se ON se.ID = st.EPISODEID
    JOIN series s ON s.ID = se.SERIESID
    GROUP BY s.ID
        ,s.TITLE
    )
SELECT *
FROM MostRecentTrailers mrt
JOIN series_trailers st ON st.ID = mrt.TRAILERID
ORDER BY mrt.SERIESID DESC
Let me know if that does it for ya.
Edit: Fixed some typo mistakes.
This gives you the trailers for the episode with the highest ID in each series. This answer is based on the assumption that the episode with the highest ID is the latest one.
select id, content from series_trailers where episodeid in
(select max(id)
from series_episodes
group by seriesid)
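To check the "one row per series" idea against the sample data, here is a minimal sketch that runs a grouped query on SQLite through Python's sqlite3 module. The lower-case column names, the choice of SQLite, and the assumption that episode 923822 belongs to series 512 (so that all three expected titles appear) are mine, not the asker's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE series (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE series_episodes (id INTEGER PRIMARY KEY, title TEXT, seriesid INTEGER);
CREATE TABLE series_trailers (id INTEGER PRIMARY KEY, episodeid INTEGER,
                              content TEXT, author TEXT);
INSERT INTO series VALUES
  (23, 'Stranger Things'), (87, 'Seriesname'), (512, 'Sometrashseries');
-- Assumption: episode 923822 is linked to series 512.
INSERT INTO series_episodes VALUES
  (122383, 'Episode 1', 23), (9999, 'Somethingweird', 87),
  (923822, 'Randomtitle', 512), (122384, 'Episode 2', 23);
INSERT INTO series_trailers VALUES
  (1, 122383, 'url1', 'Peter'), (2, 9999, 'url2', 'Ana'),
  (3, 923822, 'stuff', 'Jhon'), (4, 122384, 'url3', 'Drake');
""")

# One row per series, keeping only the highest (newest) trailer ID,
# with the most recently updated series first.
query = """
SELECT s.title, max(st.id) AS latest_trailer_id
FROM series_trailers st
JOIN series_episodes se ON se.id = st.episodeid
JOIN series s ON s.id = se.seriesid
GROUP BY s.id, s.title
ORDER BY latest_trailer_id DESC
"""
rows = conn.execute(query).fetchall()
for title, trailer_id in rows:
    print(title, trailer_id)
```

Each series appears once, so Stranger Things is no longer repeated even though two of its episodes have trailers.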

The best way to keep count data in postgres

I need to create statistics for some aggregate data split by day.
For example:
select
(select count(*) from bananas) as bananas_count,
(select count(*) from apples) as apples_count,
(select count(*) from bananas where color = 'yellow') as yellow_bananas_count;
Obviously I will get:
 bananas_count | apples_count | yellow_bananas_count
---------------+--------------+----------------------
           123 |          321 |                   15
But I need that data grouped by day; we need to know how many bananas we had yesterday.
My first thought was to create a view, but in that case I would not be able to split by date (or I don't know how to do it).
I need a performant, database-side implementation of this task.

GROUP BY and aggregate function query

I am looking at making a simple leader board for a time trial. A member may perform many time trials, but I only want their fastest result to be displayed. My table columns are as follows:
Members { ID (PK), Forename, Surname }
TimeTrials { ID (PK), MemberID, Date, Time, Distance }
An example dataset would be:
Forename | Surname | Date     | Time | Distance
Bill     | Smith   | 01-01-11 | 1.14 | 100
Dave     | Jones   | 04-09-11 | 2.33 | 100
Bill     | Smith   | 02-03-11 | 1.1  | 100
My resulting answer from the example above would be:
Forename | Surname | Date     | Time | Distance
Bill     | Smith   | 02-03-11 | 1.1  | 100
Dave     | Jones   | 04-09-11 | 2.33 | 100
I have this so far, but Access complains that I am not using Date as part of an aggregate function:
SELECT Members.Forename, Members.Surname, Min(TimeTrials.Time) AS MinOfTime, TimeTrials.Date
FROM Members
INNER JOIN TimeTrials ON Members.ID = TimeTrials.MemberID
GROUP BY Members.Forename, Members.Surname, TimeTrials.Distance
HAVING TimeTrials.Distance = 100
ORDER BY MIN(TimeTrials.Time);
If I remove the Date from the SELECT, the query works (without the date). I have tried using FIRST on TimeTrials.Date, but that returns the first date, which is normally incorrect.
Obviously putting the Date as part of the GROUP BY would not return the result set that I am after.
Make this task easier on yourself by starting with a smaller piece of the problem. First get the minimum Time from TimeTrials for each combination of MemberID and Distance.
SELECT
tt.MemberID,
tt.Distance,
Min(tt.Time) AS MinOfTime
FROM TimeTrials AS tt
GROUP BY
tt.MemberID,
tt.Distance;
Assuming that SQL is correct, use it in a subquery which you join back to TimeTrials again.
SELECT tt2.*
FROM
TimeTrials AS tt2
INNER JOIN
(
SELECT
tt.MemberID,
tt.Distance,
Min(tt.Time) AS MinOfTime
FROM TimeTrials AS tt
GROUP BY
tt.MemberID,
tt.Distance
) AS sub
ON
tt2.MemberID = sub.MemberID
AND tt2.Distance = sub.Distance
AND tt2.Time = sub.MinOfTime
WHERE tt2.Distance = 100
ORDER BY tt2.Time;
Finally, you can join that query to Members to get Forename and Surname. Your question shows you already know how to do that, so I'll leave it for you. :-)
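The last step that is left open above could look like the sketch below. It runs the same subquery through Python's sqlite3 module with the three sample rows from the question; the SQLite schema, the integer IDs and the text dates are assumptions made for illustration, and Access itself typically wants multiple joins wrapped in parentheses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Members (ID INTEGER PRIMARY KEY, Forename TEXT, Surname TEXT);
CREATE TABLE TimeTrials (ID INTEGER PRIMARY KEY, MemberID INTEGER,
                         Date TEXT, Time REAL, Distance INTEGER);
INSERT INTO Members VALUES (1, 'Bill', 'Smith'), (2, 'Dave', 'Jones');
INSERT INTO TimeTrials (MemberID, Date, Time, Distance) VALUES
  (1, '01-01-11', 1.14, 100), (2, '04-09-11', 2.33, 100), (1, '02-03-11', 1.1, 100);
""")

# Fastest time per member and distance, joined back to TimeTrials to
# recover the matching Date, then joined to Members for the names.
query = """
SELECT m.Forename, m.Surname, tt2.Date, tt2.Time, tt2.Distance
FROM TimeTrials AS tt2
INNER JOIN (
    SELECT tt.MemberID, tt.Distance, min(tt.Time) AS MinOfTime
    FROM TimeTrials AS tt
    GROUP BY tt.MemberID, tt.Distance
) AS sub
    ON tt2.MemberID = sub.MemberID
   AND tt2.Distance = sub.Distance
   AND tt2.Time = sub.MinOfTime
INNER JOIN Members AS m ON m.ID = tt2.MemberID
WHERE tt2.Distance = 100
ORDER BY tt2.Time
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A member who ties their own best time would appear once per tied row, since the join matches every trial whose time equals the minimum.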

JavaDB: get ordered records in the subquery

I have the following table, "COMPANY_BY_NEWS_REPUTATION", in my JavaDB database (this is some random data just to represent the structure)
COMPANY | NEWS_HASH | REPUTATION | DATE
-------------------------------------------------------------------
Company A | 14676757 | 0.12345 | 2011-05-19 15:43:28.0
Company B | 454564556 | 0.78956 | 2011-05-24 18:44:28.0
Company C | 454564556 | 0.78956 | 2011-05-24 18:44:28.0
Company A | -7874564 | 0.12345 | 2011-05-19 15:43:28.0
One news_hash may relate to several companies while a company can relate to several news_hashes as well. Reputation and date are bound to the news_hash.
What I need to do is calculate the average reputation of the last 5 news items for every company. In order to do that, I somehow feel that I need to use 'order by' and 'offset' in a subquery, as shown in the code below.
select COMPANY, avg(REPUTATION) from
(select * from COMPANY_BY_NEWS_REPUTATION order by "DATE" desc
offset 0 rows fetch next 5 row only) as TR group by COMPANY;
However, JavaDB allows neither ORDER BY, nor OFFSET in a subquery. Could anyone suggest a working solution for my problem please?
Which version of JavaDB are you using? According to the chapter TableSubquery in the JavaDB documentation, table subqueries do support order by and fetch next, at least in version 10.6.2.1.
Given that subqueries can be ordered and the size of the result set can be limited, the following (untested) query might do what you want:
select COMPANY, (select avg(REPUTATION)
                 from (select REPUTATION
                       from COMPANY_BY_NEWS_REPUTATION
                       where COMPANY = TR.COMPANY
                       order by "DATE" desc
                       fetch first 5 rows only) as LAST_FIVE)
from (select distinct COMPANY
      from COMPANY_BY_NEWS_REPUTATION) as TR
This query retrieves all distinct company names from COMPANY_BY_NEWS_REPUTATION, then retrieves the average of the last five reputation rows for each company. I have no idea whether it will perform sufficiently; that will likely depend on the size of your data set and what indexes you have in place.
If you have a list of unique company names in another table, you can use that instead of the select distinct ... subquery to retrieve the companies for which to calculate averages.
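The shape of that query can be tried out on a small data set. The sketch below is not JavaDB: it uses SQLite through Python's sqlite3 module, where LIMIT 5 stands in for fetch first 5 rows only, and the table name news, the integer day column and the sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (company TEXT, reputation REAL, day INTEGER)")

# Hypothetical data: company A has six news items (days 1..6), B has two.
data = [("A", float(r), d) for d, r in enumerate([10, 1, 2, 3, 4, 5], start=1)]
data += [("B", 2.0, 1), ("B", 6.0, 2)]
conn.executemany("INSERT INTO news VALUES (?, ?, ?)", data)

# For each distinct company, average only its five most recent rows.
query = """
SELECT tr.company,
       (SELECT avg(reputation)
        FROM (SELECT reputation
              FROM news
              WHERE company = tr.company
              ORDER BY day DESC
              LIMIT 5) AS last_five) AS avg_reputation
FROM (SELECT DISTINCT company FROM news) AS tr
ORDER BY tr.company
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Company A's oldest item (reputation 10) falls outside the five most recent rows, so it does not affect A's average; company B has fewer than five rows, so all of them are averaged.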

Postgresql (Rails 3) merge rows on column (same table)

First, I've been using mysql for forever and am now upgrading to postgresql. The sql syntax is much stricter and some behavior different, thus my question.
I've been searching around for how to merge rows in a postgresql query on a table such as
id | name | amount
0 | foo | 12
1 | bar | 10
2 | bar | 13
3 | foo | 20
and get
name | amount
foo | 32
bar | 23
The closest I've found is Merge duplicate records into 1 records with the same table and table fields
sql returning duplicates of 'name':
scope :tallied, lambda { group(:name, :amount).select("charges.name AS name,
SUM(charges.amount) AS amount,
COUNT(*) AS tally").order("name, amount desc") }
What I need is
scope :tallied, lambda { group(:name, :amount).select("DISTINCT ON(charges.name) charges.name AS name,
SUM(charges.amount) AS amount,
COUNT(*) AS tally").order("name, amount desc") }
except that, rather than returning the first row of a given name, it should return a mash of all rows with a given name (amounts added)
In mysql, appending .group(:name) (not needing the initial group) to the select would work as expected.
This seems like an everyday sort of task which should be easy. What would be a simple way of doing this? Please point me on the right path.
P.S. I'm trying to learn here (so are others), don't just throw sql in my face, please explain it.
I've no idea what RoR is doing in the background, but I'm guessing that group(:name, :amount) will run a query that groups by name, amount. The one you're looking for is group by name:
select name, sum(amount) as amount, count(*) as tally
from charges
group by name
If you append amount to the group by clause, the query will do just that -- i.e. count(*) would return the number of times each amount appears per name, and the sum() would return that number times that amount.
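The difference between the two groupings is easy to see on the four sample rows. This is a small sketch using Python's sqlite3 module rather than PostgreSQL or Rails (an assumption made so it runs anywhere), and the alias total is introduced here to avoid reusing the column name amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE charges (id INTEGER PRIMARY KEY, name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO charges VALUES (?, ?, ?)",
    [(0, "foo", 12), (1, "bar", 10), (2, "bar", 13), (3, "foo", 20)],
)

# Grouping by name alone merges all rows that share a name.
by_name = conn.execute(
    "SELECT name, sum(amount) AS total, count(*) AS tally "
    "FROM charges GROUP BY name ORDER BY total DESC"
).fetchall()

# Grouping by name and amount keeps one group per distinct (name, amount) pair.
by_name_amount = conn.execute(
    "SELECT name, sum(amount) AS total, count(*) AS tally "
    "FROM charges GROUP BY name, amount ORDER BY name, total"
).fetchall()

print(by_name)
print(by_name_amount)
```

In the Rails scope this would presumably correspond to calling group(:name) alone instead of group(:name, :amount).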