GROUP BY and aggregate function query - sql

I am looking at making a simple leaderboard for a time trial. A member may perform many time trials, but I only want their fastest result to be displayed. My table columns are as follows:
Members { ID (PK), Forename, Surname }
TimeTrials { ID (PK), MemberID, Date, Time, Distance }
An example dataset would be:
Forename | Surname | Date | Time | Distance
Bill | Smith | 01-01-11 | 1.14 | 100
Dave | Jones | 04-09-11 | 2.33 | 100
Bill | Smith | 02-03-11 | 1.1 | 100
My resulting answer from the example above would be:
Forename | Surname | Date | Time | Distance
Bill | Smith | 02-03-11 | 1.1 | 100
Dave | Jones | 04-09-11 | 2.33 | 100
I have this so far, but Access complains that I am not using Date as part of an aggregate function:
SELECT Members.Forename, Members.Surname, Min(TimeTrials.Time) AS MinOfTime, TimeTrials.Date
FROM Members
INNER JOIN TimeTrials ON Members.ID = TimeTrials.MemberID
GROUP BY Members.Forename, Members.Surname, TimeTrials.Distance
HAVING TimeTrials.Distance = 100
ORDER BY MIN(TimeTrials.Time);
If I remove the Date from the SELECT, the query works (without the date). I have tried using FIRST on TimeTrials.Date, but that returns the first date in the group, which is usually not the date of the fastest time.
Obviously, putting Date in the GROUP BY would not return the result set I am after.

Make this task easier on yourself by starting with a smaller piece of the problem. First get the minimum Time from TimeTrials for each combination of MemberID and Distance.
SELECT
tt.MemberID,
tt.Distance,
Min(tt.Time) AS MinOfTime
FROM TimeTrials AS tt
GROUP BY
tt.MemberID,
tt.Distance;
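To see what this first step produces, here is a minimal sketch that runs the same GROUP BY against an in-memory SQLite database loaded with the three sample rows from the question. SQLite stands in for Access here, and the MemberID values (1 for Bill Smith, 2 for Dave Jones) are assumptions, since the question only shows names. The ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# In-memory stand-in for the Access TimeTrials table (SQLite, not Access).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE TimeTrials (ID INTEGER PRIMARY KEY, MemberID INTEGER, '
    '"Date" TEXT, "Time" REAL, Distance INTEGER)'
)
# Sample rows from the question; MemberID 1 = Bill Smith, 2 = Dave Jones (assumed).
conn.executemany(
    'INSERT INTO TimeTrials (MemberID, "Date", "Time", Distance) VALUES (?, ?, ?, ?)',
    [(1, "01-01-11", 1.14, 100), (2, "04-09-11", 2.33, 100), (1, "02-03-11", 1.1, 100)],
)

# Step one: the fastest time for each MemberID/Distance combination.
step_one = conn.execute(
    """
    SELECT tt.MemberID, tt.Distance, MIN(tt."Time") AS MinOfTime
    FROM TimeTrials AS tt
    GROUP BY tt.MemberID, tt.Distance
    ORDER BY tt.MemberID
    """
).fetchall()
print(step_one)  # [(1, 100, 1.1), (2, 100, 2.33)]
```

Note that Date is not part of this result; recovering it is the job of the join in the next step.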
Assuming that SQL is correct, use it in a subquery which you join back to TimeTrials again.
SELECT tt2.*
FROM
TimeTrials AS tt2
INNER JOIN
(
SELECT
tt.MemberID,
tt.Distance,
Min(tt.Time) AS MinOfTime
FROM TimeTrials AS tt
GROUP BY
tt.MemberID,
tt.Distance
) AS sub
ON
tt2.MemberID = sub.MemberID
AND tt2.Distance = sub.Distance
AND tt2.Time = sub.MinOfTime
WHERE tt2.Distance = 100
ORDER BY tt2.Time;
Finally, you can join that query to Members to get Forename and Surname. Your question shows you already know how to do that, so I'll leave it for you. :-)
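For reference, here is one way that last join could look, sketched against SQLite with the question's sample data (the Members IDs 1 and 2 are assumed). The query is the answer's subquery join with an extra INNER JOIN to Members; in Access itself, a query with two INNER JOINs generally needs the first join wrapped in parentheses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Members (ID INTEGER PRIMARY KEY, Forename TEXT, Surname TEXT);
    CREATE TABLE TimeTrials (ID INTEGER PRIMARY KEY, MemberID INTEGER,
                             "Date" TEXT, "Time" REAL, Distance INTEGER);
    INSERT INTO Members VALUES (1, 'Bill', 'Smith'), (2, 'Dave', 'Jones');
    INSERT INTO TimeTrials (MemberID, "Date", "Time", Distance) VALUES
        (1, '01-01-11', 1.14, 100),
        (2, '04-09-11', 2.33, 100),
        (1, '02-03-11', 1.1, 100);
    """
)

# Join the per-member minimum back to TimeTrials to recover the Date of that run,
# then join Members for the names.
leaderboard = conn.execute(
    """
    SELECT m.Forename, m.Surname, tt2."Date", tt2."Time", tt2.Distance
    FROM TimeTrials AS tt2
    INNER JOIN (
        SELECT tt.MemberID, tt.Distance, MIN(tt."Time") AS MinOfTime
        FROM TimeTrials AS tt
        GROUP BY tt.MemberID, tt.Distance
    ) AS sub
        ON tt2.MemberID = sub.MemberID
       AND tt2.Distance = sub.Distance
       AND tt2."Time" = sub.MinOfTime
    INNER JOIN Members AS m ON m.ID = tt2.MemberID
    WHERE tt2.Distance = 100
    ORDER BY tt2."Time"
    """
).fetchall()
for row in leaderboard:
    print(row)
# ('Bill', 'Smith', '02-03-11', 1.1, 100)
# ('Dave', 'Jones', '04-09-11', 2.33, 100)
```

One consequence of joining on the minimum time: if a member has two runs with exactly the same best time at a distance, both rows are returned.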

Related

How do I calculate multiple sums and averages in a single query?

I am in a SQL class and struggling with one of the questions. We are using the AdventureWorksDW2014 database in SQL Server and this is the problem I'm stuck on:
Write a query that will return the employee key, first name, middle name, last name, total sales, and average amount per sale for every employee who has made sales to resellers. All monetary values should be rounded to two decimal places. Names should appear as a single record as "Last, First Middle." Sort the results by total sales (highest first), then by average amount per sale (highest first), then by employee name.
I have no problem selecting the EmployeeKey, nor with using concat and formatting the name as instructed. After exploring the data, it is clear that the employee information will need to come from the DimEmployee table and the sales figures from the FactResellerSales table, and I am able to complete the inner join between the tables with no problem. I also know how to use the sum and avg functions to calculate the total and average for an individual employee, but they only calculate for one employee at a time and return a single result. The part I'm hung up on is creating the columns for the calculated sums and averages for each employee. The result needs a single column that shows each employee's total sales and a single column that shows each employee's average amount per sale, along with the other requested information for each employee. So far, I have run
select distinct EmployeeKey
from FactResellerSales
to determine which employee keys are associated with sales, and it shows that there are 17. I attempted to construct the query using a subquery for each employee in the FROM clause,
(select EmployeeKey, sum (SalesAmount) as TotalSalesByEmp, avg (SalesAmount)
as AvgPerSaleByEmp
from FactResellerSales
where EmployeeKey = 272)
thinking that, even though it would be time-consuming to write 17 subqueries, I could ultimately draw the requested data from them into the main query. But when I try to test the subquery, I get this error message: "Msg 8120, Level 16, State 1, Line 359
Column 'FactResellerSales.EmployeeKey' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause". I can't leave out the EmployeeKey, though, as I need it as the linking field for the inner join. My query so far (including the aliases I will use for the other fields in the ORDER BY clause) is:
USE AdventureWorksDW2014
select e.EmployeeKey,
concat (e.LastName, ', ' + e.FirstName, ' ' + e.MiddleName) as EmployeeName
from FactResellerSales as s
inner join DimEmployee as e
on s.EmployeeKey = e.EmployeeKey
order by TotalSalesByEmp desc, AvgPerSaleByEmp desc, EmployeeName
I just need to figure out how to add the other two fields.
I've already described what the results I need should look like, but since that is apparently not good enough for some people, I will try to give an example. Apologies if the formatting is weird in the transition (I promise it looks right as I'm typing it).
| EmployeeKey | EmployeeName | TotalSalesByEmp | AvgPerSaleByEmp |
| 282 | Mitchell, Linda C | 10367007.43 | 1458.70 |
| 283 | Carson, Jillian | 10065803.54 | 1286.36 |
| 281 | Blythe, Michael G | 9293903.01 | 1314.74 |
| 272 | Jiang, Stephen Y | 1092123.86 | 1378.94 |
Please help.
Simply run your aggregation with GROUP BY on the employee details; that calculates the total and average reseller sales for each of the 17 employees in one query. Wrapping SUM and AVG in ROUND(..., 2) covers the requirement that monetary values be rounded to two decimal places:
USE AdventureWorksDW2014
select e.EmployeeKey,
concat(e.LastName, ', ' + e.FirstName, ' ' + e.MiddleName) as EmployeeName,
round(sum(s.SalesAmount), 2) as TotalSalesByEmp,
round(avg(s.SalesAmount), 2) as AvgPerSaleByEmp
from FactResellerSales as s
inner join DimEmployee as e
on s.EmployeeKey = e.EmployeeKey
group by e.EmployeeKey,
e.LastName,
e.FirstName,
e.MiddleName
order by TotalSalesByEmp desc,
AvgPerSaleByEmp desc,
EmployeeName
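The key point is that GROUP BY produces one aggregate row per employee, so no per-employee subqueries are needed. Below is a small sketch of the same query shape run in SQLite on a few hypothetical rows (not real AdventureWorks data). SQLite has no SQL Server-style concat with `+`, so the name uses `||` plus COALESCE to mimic the trick in the answer: `' ' + MiddleName` becomes NULL when there is no middle name, and that piece is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DimEmployee (EmployeeKey INTEGER PRIMARY KEY, FirstName TEXT,
                              MiddleName TEXT, LastName TEXT);
    CREATE TABLE FactResellerSales (EmployeeKey INTEGER, SalesAmount REAL);
    -- Hypothetical sample rows, not real AdventureWorks data.
    INSERT INTO DimEmployee VALUES (1, 'Ann', 'B', 'Lee'),
                                   (2, 'Tom', NULL, 'Ray'),
                                   (3, 'Zoe', NULL, 'Fox');  -- no sales
    INSERT INTO FactResellerSales VALUES (1, 100.25), (1, 50.5), (1, 10.0), (2, 300.0);
    """
)

totals = conn.execute(
    """
    SELECT e.EmployeeKey,
           -- ' ' || NULL is NULL, so COALESCE drops the space when MiddleName is missing
           e.LastName || ', ' || e.FirstName || COALESCE(' ' || e.MiddleName, '') AS EmployeeName,
           ROUND(SUM(s.SalesAmount), 2) AS TotalSalesByEmp,
           ROUND(AVG(s.SalesAmount), 2) AS AvgPerSaleByEmp
    FROM FactResellerSales AS s
    INNER JOIN DimEmployee AS e ON s.EmployeeKey = e.EmployeeKey
    GROUP BY e.EmployeeKey, e.LastName, e.FirstName, e.MiddleName
    ORDER BY TotalSalesByEmp DESC, AvgPerSaleByEmp DESC, EmployeeName
    """
).fetchall()
for row in totals:
    print(row)
```

Employee 3 has no sales rows, so the inner join leaves them out, which matches the "every employee who has made sales to resellers" wording.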

Is there a way to select results after a certain id in an order list?

I'm trying to implement a cursor-based paginating list based off of data from a Postgres database.
As an example, say I have a table with the following columns:
id | firstname | lastname
I want to paginate this data, which would be pretty simple if I only ever wanted to sort it by the id, but in my case, I want the option to sort by last name, and there's guaranteed to be multiple people with the same last name.
If I have a select statement like follows:
SELECT * FROM people
ORDER BY lastname ASC;
In that case, I could make my encoded cursor contain information about the lastname so I could pick up where I left off, but since there will be multiple users with the same last name, this will be buggy. Is there a way in SQL to only get the results after a certain id in an ordered list where it is not the column by which the results are sorted?
Example results from the select statement:
1 | John | Doe
4 | John | Price
2 | Joe | White
6 | Jim | White
3 | Sam | White
5 | Sally | Young
If I wanted a page size of 3, I couldn't add WHERE lastname >= :lastname, as I'd have duplicate data in the list, since it would return ids 2, 6, and 3 on that call. In my case, it would be helpful if I could add something like AFTER id = 6 to my query, so that it skips everything until it finds that id in the ordered list.
Yes. If I understand correctly:
select t.*
from t
where (lastname, id) > (select t2.lastname, t2.id
from t t2
where t2.id = ?
)
order by t.lastname, t.id;
I think I would add firstname into the mix, but it is the same idea.
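Here is a minimal keyset-pagination sketch of the same row-value idea, run in SQLite (which supports row values since 3.15) on the question's six rows. As an assumption, the cursor stores the (lastname, id) of the last row shown, instead of looking it up by id with a subquery. Ordering by both columns makes the order total, so ties on lastname are broken by id; that is why the White rows come out as 2, 3, 6 here rather than in the 2, 6, 3 order shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "John", "Doe"), (4, "John", "Price"), (2, "Joe", "White"),
     (6, "Jim", "White"), (3, "Sam", "White"), (5, "Sally", "Young")],
)


def fetch_page(cursor_key=None, page_size=3):
    """Return one page of ids ordered by (lastname, id) and the cursor for the next page.

    cursor_key is the (lastname, id) of the last row on the previous page,
    or None for the first page.
    """
    if cursor_key is None:
        rows = conn.execute(
            "SELECT id, lastname FROM people ORDER BY lastname, id LIMIT ?",
            (page_size,),
        ).fetchall()
    else:
        # Row-value comparison: strictly after the cursor in (lastname, id) order.
        rows = conn.execute(
            "SELECT id, lastname FROM people "
            "WHERE (lastname, id) > (?, ?) "
            "ORDER BY lastname, id LIMIT ?",
            (*cursor_key, page_size),
        ).fetchall()
    next_key = (rows[-1][1], rows[-1][0]) if rows else None
    return [r[0] for r in rows], next_key


page1, key = fetch_page()
page2, key = fetch_page(key)
print(page1, page2)  # [1, 4, 2] [3, 6, 5]
```

Because each page starts strictly after the previous page's last (lastname, id), no row is repeated or skipped, even with several Whites.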
Limit and offset are used for pagination e.g.:
SELECT id, lastname, firstname FROM people
ORDER BY lastname, firstname, id
OFFSET 0
LIMIT 10
This returns the first 10 rows; to retrieve the next page, set the offset to 10.
Here is the documentation:
https://www.postgresql.org/docs/9.6/static/queries-limit.html
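A minimal sketch of the LIMIT/OFFSET approach, using SQLite and the question's sample rows, with the page number turned into an offset (page_number * page_size). One difference worth knowing: SQLite requires LIMIT before OFFSET, while PostgreSQL accepts either order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "John", "Doe"), (4, "John", "Price"), (2, "Joe", "White"),
     (6, "Jim", "White"), (3, "Sam", "White"), (5, "Sally", "Young")],
)


def fetch_offset_page(page_number, page_size=3):
    """Return the ids on a zero-based page, ordered by lastname, firstname, id."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT id FROM people ORDER BY lastname, firstname, id LIMIT ? OFFSET ?",
            (page_size, page_number * page_size),
        )
    ]


print(fetch_offset_page(0), fetch_offset_page(1))  # [1, 4, 6] [2, 3, 5]
```

The trade-off compared with a cursor on (lastname, id): if rows are inserted or deleted between requests, offsets shift, so a page can repeat or skip rows.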

How to combine SQL queries so that one sets the conditions for another

I have the following tables:
series_trailers:
ID EPISODEID CONTENT AUTHOR
-----------------------------
1 122383 url1 Peter
2 9999 url2 Ana
3 923822 stuff Jhon
4 122384 url3 Drake
series_episodes:
ID TITLE SERIESID
--------------------------------
122383 Episode 1 23
9999 Somethingweird 87
923822 Randomtitle 52
122384 Episode 2 23
series:
ID TITLE
-------------------
23 Stranger Things
87 Seriesname
512 Sometrashseries
As you can see, there are three tables: one with the series info, one with the series' episodes, and another that contains URLs pointing to the episodes' trailers. I'd like to get the latest rows from series_trailers without repeating the series they belong to.
I've tried SELECT DISTINCT EPISODEID FROM series_trailers ORDER BY id DESC, but two of the rows belong to episodes of the same series, so I get Stranger Things twice. In short, I'd like to display the series with the latest new URLs without getting duplicate series (which is what I get with the SQL above).
EDIT: What I'm supposed to get:
Last updated series:
Stranger Things
Seriesname
Sometrashseries
What I'd get with my sql code:
Stranger Things
Seriesname
Sometrashseries
Stranger Things (again)
If I understood correctly, this gets the latest trailer for each series (latest meaning the highest series_trailers ID, so most likely the one added last).
WITH MostRecentTrailers
AS (
SELECT MAX(st.ID) "TRAILERID"
,s.ID "SERIESID"
,s.TITLE "SERIESTITLE"
FROM series_trailers st
JOIN series_episodes se ON se.ID = st.EPISODEID
JOIN series s ON s.ID = se.SERIESID
GROUP BY s.ID
,s.TITLE
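)
-- ORDER BY belongs on the outer query; SQL Server rejects ORDER BY inside a CTE (without TOP)
SELECT *
FROM MostRecentTrailers mrt
JOIN series_trailers st ON st.ID = mrt.TRAILERID
ORDER BY mrt.TRAILERID DESC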
Let me know if that does it for ya.
Edit: Fixed some typo mistakes.
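A quick way to check this approach is to run it in SQLite with the sample rows exactly as posted. Note that episode 923822 points at SERIESID 52, which does not exist in the series table (the question lists 512), so the inner joins drop it and Sometrashseries does not appear with this data. The outer SELECT lists a few columns instead of `*` only to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE series (ID INTEGER PRIMARY KEY, TITLE TEXT);
    CREATE TABLE series_episodes (ID INTEGER PRIMARY KEY, TITLE TEXT, SERIESID INTEGER);
    CREATE TABLE series_trailers (ID INTEGER PRIMARY KEY, EPISODEID INTEGER,
                                  CONTENT TEXT, AUTHOR TEXT);
    INSERT INTO series VALUES (23, 'Stranger Things'), (87, 'Seriesname'),
                              (512, 'Sometrashseries');
    INSERT INTO series_episodes VALUES (122383, 'Episode 1', 23), (9999, 'Somethingweird', 87),
                                       (923822, 'Randomtitle', 52), (122384, 'Episode 2', 23);
    INSERT INTO series_trailers VALUES (1, 122383, 'url1', 'Peter'), (2, 9999, 'url2', 'Ana'),
                                       (3, 923822, 'stuff', 'Jhon'), (4, 122384, 'url3', 'Drake');
    """
)

# Highest trailer ID per series, joined back to fetch that trailer's row,
# newest series first.
latest = conn.execute(
    """
    WITH MostRecentTrailers AS (
        SELECT MAX(st.ID) AS TRAILERID, s.ID AS SERIESID, s.TITLE AS SERIESTITLE
        FROM series_trailers st
        JOIN series_episodes se ON se.ID = st.EPISODEID
        JOIN series s ON s.ID = se.SERIESID
        GROUP BY s.ID, s.TITLE
    )
    SELECT mrt.SERIESTITLE, st.ID, st.CONTENT
    FROM MostRecentTrailers mrt
    JOIN series_trailers st ON st.ID = mrt.TRAILERID
    ORDER BY mrt.TRAILERID DESC
    """
).fetchall()
print(latest)  # [('Stranger Things', 4, 'url3'), ('Seriesname', 2, 'url2')]
```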
This gives you the trailers for the episode with the highest ID in each series. This answer is based on the assumption that the episode with the highest ID is the latest one.
select id, content from series_trailers where episodeid in
(select max(id)
from series_episodes
group by seriesid)
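A sketch of this IN-subquery approach in SQLite, using the table and column names from the question (series_trailers, EPISODEID) and the posted sample rows; the ORDER BY is added only for predictable output. Because it never joins to the series table, episode 923822 (SERIESID 52) is still included, and if that latest episode had several trailers, all of them would be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE series_episodes (ID INTEGER PRIMARY KEY, TITLE TEXT, SERIESID INTEGER);
    CREATE TABLE series_trailers (ID INTEGER PRIMARY KEY, EPISODEID INTEGER,
                                  CONTENT TEXT, AUTHOR TEXT);
    INSERT INTO series_episodes VALUES (122383, 'Episode 1', 23), (9999, 'Somethingweird', 87),
                                       (923822, 'Randomtitle', 52), (122384, 'Episode 2', 23);
    INSERT INTO series_trailers VALUES (1, 122383, 'url1', 'Peter'), (2, 9999, 'url2', 'Ana'),
                                       (3, 923822, 'stuff', 'Jhon'), (4, 122384, 'url3', 'Drake');
    """
)

# Trailers whose episode is the highest-ID episode of its series.
trailers = conn.execute(
    """
    SELECT id, content FROM series_trailers
    WHERE episodeid IN (SELECT MAX(id) FROM series_episodes GROUP BY seriesid)
    ORDER BY id
    """
).fetchall()
print(trailers)  # [(2, 'url2'), (3, 'stuff'), (4, 'url3')]
```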

How can I group flight legs together into routes for counting?

I have a person's flight history and want to find their most frequent route. All flights are stored as a single row in a table, even return trips where a->b will be in one row and b->a will be in another.
I need to identify where two legs equate to a route; for example:
This person has flown 16 times in total
New York to Paris 2 times (Flight key: JFKCDG)
Paris to New York 2 times (Flight Key: CDGJFK)
New York to London 3 times (Flight Key: JFKLHR)
Currently I don't know a way to group the first two above as a 'Route', so any query I write considers JFKLHR to be the most frequent route (6 times between NY and London), even though I can see from the data that this person has flown between NY and Paris a total of 10 times.
Sample Table:
User ID | Flight Key
--------|-----------
1 | JFKCDG
1 | JFKCDG
1 | CDGJFK
1 | CDGJFK
1 | JFKLHR
1 | JFKLHR
1 | JFKLHR
Expected output:
User ID | Flight Key | Count
--------|------------|------
1 | JFKCDGJFK | 4
Building on the clever idea in the answer by fancyPants, you can use string functions to compare each leg of a route and patch together a full return trip.
I believe this query should work. The first part of the common table expression turns those flights that are round trips into three parts (src-dst-src) and the second part returns those that are one way (as src-dst).
with flights_cte as (
select
USERID,
case when left(flightkey,3) > right(flightkey,3)
then concat(flightkey, left(flightkey,3))
else concat(right(flightkey,3), flightkey)
end as flightkey,
count(*) count
from flights f
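where exists (
-- round trip: the same user also flew the reverse leg (dst -> src)
select 1 from flights f2
where f2.userid = f.userid
and left(f2.flightkey,3) = right(f.flightkey,3)
and right(f2.flightkey,3) = left(f.flightkey,3)
)
group by
userid,
case
when left(flightkey,3) > right(flightkey,3)
then concat(flightkey, left(flightkey,3))
else concat(right(flightkey,3), flightkey)
end
union all
select userid, FlightKey, count(*)
from flights f
where not exists (
select 1 from flights f2
where f2.userid = f.userid
and left(f2.flightkey,3) = right(f.flightkey,3)
and right(f2.flightkey,3) = left(f.flightkey,3)
)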
group by UserID, FlightKey
)
select flights_cte.userid, flights_cte.flightkey, flights_cte.count
from flights_cte
join (select userid, max(count) _max_count from flights_cte group by userid) _max
on flights_cte.UserID=_max.UserID and flights_cte.count = _max_count
A sample SQL Fiddle gives this output:
| USERID | FLIGHTKEY | COUNT |
|--------|-----------|-------|
| 1 | JFKCDGJFK | 4 |
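To try the idea without SQL Server, here is a sketch of the same CTE in SQLite on the sample table. SQLite has no LEFT/RIGHT/CONCAT, so `substr(k, 1, 3)`, `substr(k, 4, 3)` and `||` stand in for them (this assumes every flight key is exactly six characters). As an adjustment to the answer's query, a leg counts as part of a round trip only when the same user also flew the exact reverse leg.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (userid INTEGER, flightkey TEXT)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [(1, "JFKCDG"), (1, "JFKCDG"), (1, "CDGJFK"), (1, "CDGJFK"),
     (1, "JFKLHR"), (1, "JFKLHR"), (1, "JFKLHR")],
)

# Correlated check: does this user also have the reverse leg (dst -> src)?
REVERSE_LEG = """
    SELECT 1 FROM flights f2
    WHERE f2.userid = f.userid
      AND substr(f2.flightkey, 1, 3) = substr(f.flightkey, 4, 3)
      AND substr(f2.flightkey, 4, 3) = substr(f.flightkey, 1, 3)
"""

query = f"""
WITH flights_cte AS (
    -- Round trips: both directions map to the same src-dst-src key.
    SELECT userid,
           CASE WHEN substr(flightkey, 1, 3) > substr(flightkey, 4, 3)
                THEN flightkey || substr(flightkey, 1, 3)
                ELSE substr(flightkey, 4, 3) || flightkey
           END AS route,
           COUNT(*) AS n
    FROM flights f
    WHERE EXISTS ({REVERSE_LEG})
    GROUP BY userid, route
    UNION ALL
    -- One-way legs keep their original key.
    SELECT userid, flightkey, COUNT(*)
    FROM flights f
    WHERE NOT EXISTS ({REVERSE_LEG})
    GROUP BY userid, flightkey
)
SELECT c.userid, c.route, c.n
FROM flights_cte c
JOIN (SELECT userid, MAX(n) AS max_n FROM flights_cte GROUP BY userid) m
  ON c.userid = m.userid AND c.n = m.max_n
"""
most_frequent = conn.execute(query).fetchall()
print(most_frequent)  # [(1, 'JFKCDGJFK', 4)]
```

Comparing the two airport codes and always putting the alphabetically greater one first is what makes JFKCDG and CDGJFK collapse to the same JFKCDGJFK key.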
Assuming routes are not stored as a single row (otherwise you wouldn't be asking), although I would guess that the whole route is in some other table, maybe one related to reservations.
I'm guessing the first step is to group this data by person and by the flights that make up a 'route'. I have an article called T-SQL: Identify bad dates in a time series, where the time series logic can be modified to detect gaps between legs of more than a day (a guess) to separate routes. The second step would be to convert the legs into a route, i.e. turn JFK-CDG and CDG-JFK into the single value JFK-CDG-JFK.
Then it would be a single query that counts those single-value routes and orders by that count.
Good luck.

SQL: SUM of MAX values WHERE date1 <= date2 returns "wrong" results

Hi stackoverflow users
I'm having a bit of a problem trying to combine SUM, MAX and WHERE in one query and after an intense Google search (my search engine skills usually don't fail me) you are my last hope to understand and fix the following issue.
My goal is to count people in a certain period of time, and because a person can visit more than once in that period, I'm using MAX. Because I record each party as a string with one letter per person, m for male and f for female (for statistical purposes), CHAR_LENGTH returns the number of people I need.
SELECT SUM(max_pers) AS "People"
FROM (
SELECT "guests"."id", MAX(CHAR_LENGTH("guests"."gender")) AS "max_pers"
FROM "guests"
GROUP BY "guests"."id")
So far, so good. But now, as stated before, I'd like to only count the guests which visited in a certain time interval (for statistic purposes as well).
SELECT "statistic"."id", SUM(max_pers) AS "People"
FROM (
SELECT "guests"."id", MAX(CHAR_LENGTH("guests"."gender")) AS "max_pers"
FROM "guests"
GROUP BY "guests"."id"),
"statistic", "guests"
WHERE ( "guests"."arrival" <= "statistic"."from" AND "guests"."departure" >= "statistic"."to")
GROUP BY "statistic"."id"
Where x is the desired result, this query returns:
x * (x+1)
So if the result should be 3, it's 12. If it should be 5, it's 30, etc.
I could probably solve this algebraically, but I'd rather understand what I'm doing wrong and learn from it.
Thanks in advance and I'm certainly going to answer all further questions.
PS: I'm using LibreOffice Base.
EDIT: An example
guests table:
ID | arrival | departure | gender |
10 | 1.1.14 | 10.1.14 | mf |
10 | 15.1.14 | 17.1.14 | m |
11 | 5.1.14 | 6.1.14 | m |
12 | 10.2.14 | 24.2.14 | f |
13 | 27.2.14 | 28.2.14 | mmmmmf |
statistic table:
ID | from | to | name |
1 | 1.1.14 | 31.1.14 |January | expected result: 3
2 | 1.2.14 | 28.2.14 |February| expected result: 7
MAX(...) is the wrong function: You want COUNT(DISTINCT ...).
Add proper join syntax, simplify (and remove unnecessary quotes) and this should work:
SELECT s.id, COUNT(DISTINCT g.id) AS People
FROM statistic s
LEFT JOIN guests g ON g.arrival <= s."from" AND g.departure >= s."too"
GROUP BY s.id
Note: Using a LEFT join means you'll get a result of zero for statistic ids that have no guests. If you would rather have no row at all, remove the LEFT keyword.
You have a very strange data structure. In any case, I think you want:
SELECT g.stat_id AS id, sum(numpersons) AS People
FROM (select s.id AS stat_id, g.id, max(char_length(g.gender)) as numpersons
from guests g join
statistic s
on g.arrival <= s."from" AND g.departure >= s."too"
group by s.id, g.id
) g
GROUP BY g.stat_id;
Thanks for all your inputs. I wasn't familiar with JOIN but it was necessary to solve my problem.
Since my database is designed in German, I made quite a big mistake while translating it, and I'm sorry if this caused confusion.
Selecting guests.id and later grouping by guests.id wouldn't make any sense, since the id is unique. What I actually wanted to do is select and group by guests.adr_id, which links a visiting guest to an address database.
The correct solution to my problem is the following code:
SELECT statname, SUM(numpers) FROM (
SELECT statistic.name AS statname, guests.adr_id, MAX( CHAR_LENGTH( guests.gender ) ) AS numpers
FROM guests
JOIN statistic ON (guests.arrival <= statistic.too AND guests.departure >= statistic.from )
GROUP BY guests.adr_id, statistic.name ) AS sub
GROUP BY statname
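The working solution can be checked against the example data from the question. The sketch below uses SQLite, not LibreOffice Base/HSQLDB, so a few things are assumptions: dates are stored as ISO text so they compare correctly as strings, the reserved-word columns from/to are renamed date_from/date_to, the question's guest ID column is treated as adr_id, and SQLite's `length()` plays the role of CHAR_LENGTH. The join keeps every visit that overlaps the period, and the per-address MAX stops repeat visits from being counted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE guests (adr_id INTEGER, arrival TEXT, departure TEXT, gender TEXT);
    CREATE TABLE statistic (id INTEGER PRIMARY KEY, date_from TEXT, date_to TEXT, name TEXT);
    INSERT INTO guests VALUES
        (10, '2014-01-01', '2014-01-10', 'mf'),
        (10, '2014-01-15', '2014-01-17', 'm'),
        (11, '2014-01-05', '2014-01-06', 'm'),
        (12, '2014-02-10', '2014-02-24', 'f'),
        (13, '2014-02-27', '2014-02-28', 'mmmmmf');
    INSERT INTO statistic VALUES
        (1, '2014-01-01', '2014-01-31', 'January'),
        (2, '2014-02-01', '2014-02-28', 'February');
    """
)

people_per_period = conn.execute(
    """
    SELECT statname, SUM(numpers) AS people
    FROM (
        -- One row per period and address: the largest party that address brought.
        SELECT s.id AS stat_id, s.name AS statname, g.adr_id,
               MAX(length(g.gender)) AS numpers
        FROM guests g
        JOIN statistic s
          ON g.arrival <= s.date_to AND g.departure >= s.date_from
        GROUP BY s.id, s.name, g.adr_id
    ) AS per_address
    GROUP BY stat_id, statname
    ORDER BY stat_id
    """
).fetchall()
print(people_per_period)  # [('January', 3), ('February', 7)]
```

These match the expected results of 3 for January and 7 for February from the example.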
I also realize that my database structure is a mess, but I built it by learning as I went and haven't found time to rewrite it yet. Next time I post, I'll do better.