Aggregate function (SUM) on 5 newest rows in table - sql

This is a perplexing SQL problem (at least to me) involving GROUP BY and aggregates... would love any help.
I'm working on a site that logs information about bike rides and riders. We have a table which contains a rider id, ride date, and ride distance. I want to display a table with the latest rides and their distances, as well as the total distance for each of those riders. Here is my SQL and output (where id is rider id):
+--------+---------------------+----------+
| id | dated | distance |
+--------+---------------------+----------+
| 101240 | 2012-11-30 00:00:00 | 250 |
| 101332 | 2012-11-22 00:00:00 | 31 |
| 101313 | 2012-11-21 00:00:00 | 15 |
| 101319 | 2012-11-21 00:00:00 | 25 |
| 101320 | 2012-11-21 00:00:00 | 56 |
+--------+---------------------+----------+
This is easy to get with:
SELECT id, dated, distance FROM rides ORDER BY dated DESC LIMIT 5
What I can't seem to figure out is getting each rider's cumulative total alongside these most recent rides... Basically:
SELECT sum(distance) FROM rides GROUP BY id
Is it possible to handle all this in SQL without having to do something programmatic? I've tried doing some subqueries and JOINS but to no avail yet!
Thanks in advance SO community.

Duh, I should have known my data schema a little better. I had been trying to work with the wrong id column, which was actually a serialized row id and not a rider's id. A working version of the SQL (on MySQL) is:
SELECT r.rider, rr.dated, rr.distance, i.firstname, i.lastname, sum(r.distance)
FROM rides r
INNER JOIN (SELECT rider, distance, dated FROM rides ORDER BY dated DESC LIMIT 5) rr ON r.rider = rr.rider
INNER JOIN riders i ON r.rider = i.id
GROUP BY r.rider ORDER BY rr.dated DESC;
This returns:
+-------+---------------------+----------+-----------+----------+-----------------+
| rider | dated | distance | firstname | lastname | sum(r.distance) |
+-------+---------------------+----------+-----------+----------+-----------------+
| 3304 | 2012-11-30 00:00:00 | 250 | venkatesh | ss | 250 |
| 647 | 2012-11-22 00:00:00 | 31 | ralph | suelzle | 22726 |
| 2822 | 2012-11-21 00:00:00 | 15 | humberto | calderon | 10421 |
| 2339 | 2012-11-21 00:00:00 | 25 | Judy | Rutter | 8545 |
| 1452 | 2012-11-21 00:00:00 | 56 | Fred | Stearley | 64366 |
+-------+---------------------+----------+-----------+----------+-----------------+
Thanks for your answers!

Would something like this work? BTW, what SQL server are you using?
SELECT sum(distance)
FROM (SELECT distance FROM rides ORDER BY dated DESC LIMIT 5) AS latest
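For comparison, here is a minimal sketch of another way to get the five latest rides together with each rider's all-time total: pre-aggregate the totals in a derived table and join it to the five latest rides, so a rider who appears twice among the latest rides is not collapsed by GROUP BY. It runs against SQLite through Python's sqlite3 module rather than MySQL, and the sample rows, the rr/t aliases, and the total_distance column name are made up for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample data (not the asker's real rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rides (id INTEGER PRIMARY KEY, rider INTEGER, dated TEXT, distance INTEGER);
INSERT INTO rides (rider, dated, distance) VALUES
    (1, '2012-11-30', 250),
    (2, '2012-11-22', 31),
    (1, '2012-11-21', 15),
    (3, '2012-11-20', 25),
    (4, '2012-11-19', 56),
    (2, '2012-10-01', 100),
    (4, '2012-09-01', 5),
    (5, '2012-08-01', 7);
""")

query = """
SELECT rr.rider, rr.dated, rr.distance, t.total_distance
FROM (SELECT id, rider, dated, distance
      FROM rides
      ORDER BY dated DESC, id DESC
      LIMIT 5) AS rr                       -- the five latest rides
JOIN (SELECT rider, SUM(distance) AS total_distance
      FROM rides
      GROUP BY rider) AS t                 -- all-time total per rider
  ON t.rider = rr.rider
ORDER BY rr.dated DESC, rr.id DESC
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Rider 1 appears twice in the output here, once per recent ride, each time with the same total.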

Related

Select only record until timestamp from another table

I have three tables.
The first one is Device table
+----------+------+
| DeviceId | Type |
+----------+------+
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
+----------+------+
The second one is History table - data received by different devices.
+----------+-------------+--------------------+
| DeviceId | Temperature | TimeStamp |
+----------+-------------+--------------------+
| 1 | 31 | 15.08.2020 1:42:00 |
| 2 | 100 | 15.08.2020 1:42:01 |
| 2 | 40 | 15.08.2020 1:43:00 |
| 1 | 32 | 15.08.2020 1:44:00 |
| 1 | 34 | 15.08.2020 1:45:00 |
| 3 | 20 | 15.08.2020 1:46:00 |
| 2 | 45 | 15.08.2020 1:47:00 |
+----------+-------------+--------------------+
The third one is DeviceStatusHistory table
+----------+---------+--------------------+
| DeviceId | State | TimeStamp |
+----------+---------+--------------------+
| 1 | 1(OK) | 15.08.2020 1:42:00 |
| 2 | 1(OK) | 15.08.2020 1:43:00 |
| 1 | 1(OK) | 15.08.2020 1:44:00 |
| 1 | 0(FAIL) | 15.08.2020 1:44:30 |
| 1 | 0(FAIL) | 15.08.2020 1:46:00 |
| 2 | 0(FAIL) | 15.08.2020 1:46:10 |
+----------+---------+--------------------+
I want to select the last temperature of each device, but take into account only those history records that occur before the device's first failure.
Since device1 starts failing from 15.08.2020 1:44:30, I don't want its records that go after that timestamp.
The same for the device2.
So as a final result, I want to have only data of all devices until they get first FAIL status:
+----------+-------------+--------------------+
| DeviceId | Temperature | TimeStamp |
+----------+-------------+--------------------+
| 2 | 40 | 15.08.2020 1:43:00 |
| 1 | 32 | 15.08.2020 1:44:00 |
| 3 | 20 | 15.08.2020 1:46:00 |
+----------+-------------+--------------------+
I can select an appropriate history only if device failed at least once:
SELECT * FROM Device D
CROSS APPLY
(SELECT TOP 1 * FROM History H
WHERE D.Id = H.DeviceId
and H.DeviceTimeStamp <
(select MIN(UpdatedOn) from DeviceStatusHistory Y where [State]=0 and DeviceId=D.Id)
ORDER BY H.DeviceTimeStamp desc) X
ORDER BY D.Id;
The problem is, if a device never fails, I don't get its history at all.
Update:
My idea is to use something like this
SELECT * FROM DeviceHardwarePart HP
CROSS APPLY
(SELECT TOP 1 * FROM History H
WHERE HP.Id = H.DeviceId
and H.DeviceTimeStamp <
(select ISNULL((select MIN(UpdatedOn) from DeviceMetadataPart where [State]=0 and DeviceId=HP.Id),
cast('12/31/9999 23:59:59.997' as datetime)))
ORDER BY H.DeviceTimeStamp desc) X
ORDER BY HP.Id;
I'm not sure whether it is a good solution
You can use COALESCE: coalesce(min(UpdatedOn), cast('9999-12-31 23:59:59' as datetime)). This ensures you always have an upper bound for your select instead of NULL.
I will treat this as a two-part problem:
First, I will find the time at which each device first failed; if it has never failed, I will use a large placeholder value such as some timestamp in 2099.
Once I have that, I can simply join with the History table and take the latest value before the failure timestamp.
For the first part, I guess there can be several approaches. Off the top of my head, something like the below should work:
select device_id, coalesce(min(failed_timestamps), cast('2099-01-01 01:01:01' as timestamp)) as failed_at
from (select device_id, case when state = 0 then timestamp else null end as failed_timestamps from DeviceStatusHistory) as X
group by device_id
This gives us the minimum of failed timestamp for a particular device, and an arbitrary large value for the devices which have never failed.
I guess after this the solution is straightforward.
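The two steps described in this thread (find each device's first failure with a far-future fallback, then take the latest history row before it) can be sketched end to end as below. This is only an illustration on SQLite via Python's sqlite3 module, not SQL Server: the snake_case table and column names and the first_fail CTE are my own, and the timestamps from the question are rewritten in ISO format so that they compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE device (device_id INTEGER, type INTEGER);
CREATE TABLE history (device_id INTEGER, temperature INTEGER, ts TEXT);
CREATE TABLE device_status_history (device_id INTEGER, state INTEGER, ts TEXT);

INSERT INTO device VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO history VALUES
    (1, 31,  '2020-08-15 01:42:00'),
    (2, 100, '2020-08-15 01:42:01'),
    (2, 40,  '2020-08-15 01:43:00'),
    (1, 32,  '2020-08-15 01:44:00'),
    (1, 34,  '2020-08-15 01:45:00'),
    (3, 20,  '2020-08-15 01:46:00'),
    (2, 45,  '2020-08-15 01:47:00');
INSERT INTO device_status_history VALUES
    (1, 1, '2020-08-15 01:42:00'),
    (2, 1, '2020-08-15 01:43:00'),
    (1, 1, '2020-08-15 01:44:00'),
    (1, 0, '2020-08-15 01:44:30'),
    (1, 0, '2020-08-15 01:46:00'),
    (2, 0, '2020-08-15 01:46:10');
""")

query = """
WITH first_fail AS (
    -- Step 1: first failure per device; never-failed devices get a far-future bound.
    SELECT d.device_id,
           COALESCE(MIN(s.ts), '9999-12-31 23:59:59') AS failed_at
    FROM device d
    LEFT JOIN device_status_history s
           ON s.device_id = d.device_id AND s.state = 0
    GROUP BY d.device_id
)
-- Step 2: latest history row strictly before that bound.
SELECT h.device_id, h.temperature, h.ts
FROM history h
JOIN first_fail f ON f.device_id = h.device_id
WHERE h.ts = (SELECT MAX(h2.ts)
              FROM history h2
              WHERE h2.device_id = h.device_id
                AND h2.ts < f.failed_at)
ORDER BY h.device_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN is what keeps device 3 in the result: it has no FAIL rows, so MIN returns NULL and COALESCE substitutes the far-future bound.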

SQL: tricky question for finding lockout dates

Hope you can help. We have a table with two columns Customer_ID and Trip_Date. The customer receives 15% off on their first visit and on every visit where they haven't received the 15% off offer in the past thirty days. How do I write a single SQL query that finds all days where a customer received 15% off?
The table looks like this
+-------------+----------+
| Customer_ID | date     |
+-------------+----------+
| 1           | 01-01-17 |
| 1           | 01-17-17 |
| 1           | 02-04-17 |
| 1           | 03-01-17 |
| 1           | 03-15-17 |
| 1           | 04-29-17 |
| 1           | 05-18-17 |
+-------------+----------+
The desired output would look like this:
+-------------+----------+-------------------+
| Customer_ID | date     | received_discount |
+-------------+----------+-------------------+
| 1           | 01-01-17 | 1                 |
| 1           | 01-17-17 | 0                 |
| 1           | 02-04-17 | 1                 |
| 1           | 03-01-17 | 0                 |
| 1           | 03-15-17 | 1                 |
| 1           | 04-29-17 | 1                 |
| 1           | 05-18-17 | 0                 |
+-------------+----------+-------------------+
We are doing this work in Netezza. I can't think of a way using just window functions, only using recursion and looping. Is there some clever trick that I'm missing?
Thanks in advance,
GF
You didn't tell us what your backend is, nor did you give us some sample data and expected output or a sensible data schema :( This is an example based on a guess at the schema, using PostgreSQL as the backend (it would be too messy as a comment):
(I think you have Customer_Id, Trip_Date and LocationId in trips table?)
select * from trips t1
where not exists (
select * from trips t2
where t1.Customer_id = t2.Customer_id and
t1.Trip_Date > t2.Trip_Date
and t1.Trip_date - t2.Trip_Date < 30
);
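Because each discount decision depends on when the previous discount was granted, the rule is inherently sequential. A recursive query that walks each customer's trips in date order is one way to express it in a single statement. The sketch below is only an illustration on SQLite via Python's sqlite3 module, not Netezza, and I have not checked whether Netezza accepts this form. Assumptions: dates are stored in ISO format, a customer has at most one trip per day, and "past thirty days" means a gap of more than 30 days since the last discounted trip. The CTE names ordered and walk are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (customer_id INTEGER, trip_date TEXT);
INSERT INTO trips VALUES
    (1, '2017-01-01'), (1, '2017-01-17'), (1, '2017-02-04'),
    (1, '2017-03-01'), (1, '2017-03-15'), (1, '2017-04-29'),
    (1, '2017-05-18');
""")

query = """
WITH RECURSIVE
ordered AS (
    -- Number each customer's trips in date order (assumes one trip per day).
    SELECT t.customer_id, t.trip_date,
           (SELECT COUNT(*) FROM trips t2
            WHERE t2.customer_id = t.customer_id
              AND t2.trip_date <= t.trip_date) AS rn
    FROM trips t
),
walk(customer_id, trip_date, rn, received_discount, last_discount) AS (
    -- The first trip always gets the discount.
    SELECT customer_id, trip_date, rn, 1, trip_date
    FROM ordered
    WHERE rn = 1
    UNION ALL
    -- Each later trip is compared with the date of the last discounted trip.
    SELECT o.customer_id, o.trip_date, o.rn,
           CASE WHEN julianday(o.trip_date) - julianday(w.last_discount) > 30
                THEN 1 ELSE 0 END,
           CASE WHEN julianday(o.trip_date) - julianday(w.last_discount) > 30
                THEN o.trip_date ELSE w.last_discount END
    FROM walk w
    JOIN ordered o ON o.customer_id = w.customer_id AND o.rn = w.rn + 1
)
SELECT customer_id, trip_date, received_discount
FROM walk
ORDER BY customer_id, trip_date
"""

rows = conn.execute(query).fetchall()
flags = [r[2] for r in rows]
print(flags)
```

On the sample data from the question this reproduces the received_discount column of the desired output.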

Select rows where one column is within a day of another column

I have two tables from a site similar to SO: one with posts, and one with up/down votes for each post. I would like to select all votes cast on the day that a post was modified.
My tables layout is as seen below:
Posts:
-----------------------------------------------
| post_id | post_author | modification_date |
-----------------------------------------------
| 0 | David | 2012-02-25 05:37:34 |
| 1 | David | 2012-02-20 10:13:24 |
| 2 | Matt | 2012-03-27 09:34:33 |
| 3 | Peter | 2012-04-11 19:56:17 |
| ... | ... | ... |
-----------------------------------------------
Votes (each vote is only counted at the end of the day for anonymity):
-------------------------------------------
| vote_id | post_id | vote_date |
-------------------------------------------
| 0 | 0 | 2012-01-13 00:00:00 |
| 1 | 0 | 2012-02-26 00:00:00 |
| 2 | 0 | 2012-02-26 00:00:00 |
| 3 | 0 | 2012-04-12 00:00:00 |
| 4 | 1 | 2012-02-21 00:00:00 |
| ... | ... | ... |
-------------------------------------------
What I want to achieve:
-----------------------------------
| post_id | post_author | vote_id |
-----------------------------------
| 0 | David | 1 |
| 0 | David | 2 |
| 1 | David | 4 |
| ... | ... | ... |
-----------------------------------
I have been able to write the following, but it selects all votes on the day before the post modification, not on the same day (so, in this example, an empty table):
SELECT Posts.post_id, Posts.post_author, Votes.vote_id
FROM Posts
LEFT JOIN Votes ON Posts.post_id = Votes.post_id
WHERE CAST(Posts.modification_date AS DATE) = Votes.vote_date;
How can I fix it so the WHERE clause takes the day before Votes.vote_date? Or, if not possible, is there another way?
Depending on which type of database you are using (SQL, Oracle, etc.), to take the previous day's votes you can usually just subtract 1 from the date and it will subtract exactly 1 day:
Where Cast(Posts.modification_date - 1 as Date) = Votes.vote_date
or if modification_date is already in date format just:
Where Posts.modification_date - 1 = Votes.vote_date
If you have a site similar to Stack Overflow, then perhaps you also use SQL Server:
SELECT p.post_id, p.post_author, v.vote_id
FROM Posts p LEFT JOIN
Votes v
ON p.post_id = v.post_id
WHERE CAST(DATEDIFF(day, -1, p.modification_date) AS DATE) = v.vote_date;
Different databases have different ways of subtracting one day. If this doesn't work, then your database has something similar.
I found another solution, which is to add a day to Posts.modification_date:
...
WHERE CAST(CEILING(CAST(p.modification_date AS FLOAT)) AS datetime) = v.vote_date
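The expected output in the question pairs each post with votes dated the day after its modification date (votes are stamped at the end of the day). A minimal sketch of that "add one day to the modification date" comparison follows, using SQLite through Python's sqlite3 module. SQLite's date() function and its '+1 day' modifier are specific to SQLite; other databases spell date arithmetic differently. An inner join is used because posts without matching votes do not appear in the desired result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (post_id INTEGER, post_author TEXT, modification_date TEXT);
CREATE TABLE votes (vote_id INTEGER, post_id INTEGER, vote_date TEXT);

INSERT INTO posts VALUES
    (0, 'David', '2012-02-25 05:37:34'),
    (1, 'David', '2012-02-20 10:13:24'),
    (2, 'Matt',  '2012-03-27 09:34:33'),
    (3, 'Peter', '2012-04-11 19:56:17');
INSERT INTO votes VALUES
    (0, 0, '2012-01-13 00:00:00'),
    (1, 0, '2012-02-26 00:00:00'),
    (2, 0, '2012-02-26 00:00:00'),
    (3, 0, '2012-04-12 00:00:00'),
    (4, 1, '2012-02-21 00:00:00');
""")

query = """
SELECT p.post_id, p.post_author, v.vote_id
FROM posts p
JOIN votes v ON v.post_id = p.post_id
-- A vote stamped at midnight on day D+1 belongs to modification day D.
WHERE date(v.vote_date) = date(p.modification_date, '+1 day')
ORDER BY p.post_id, v.vote_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```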

HOW TO: SQL Server select distinct field based on max value in other field

id tmpname date_used tkt_nr
---|---------|------------------|--------|
1 | template| 04/03/2009 16:10 | 00011 |
2 | templat1| 04/03/2009 16:11 | 00011 |
5 | templat2| 04/03/2009 16:12 | 00011 |
3 | diffname| 03/03/2009 15:11 | 00022 |
4 | diffname| 03/03/2009 16:12 | 00022 |
6 | another | 03/03/2009 16:13 | NULL |
7 | somethin| 24/12/2008 11:12 | 00023 |
8 | name | 01/01/2009 12:12 | 00026 |
I would like to have the result:
id tmpname date_used tkt_nr
---|---------|------------------|--------|
5 | templat2| 04/03/2009 16:12 | 00011 |
4 | diffname| 03/03/2009 16:12 | 00022 |
7 | somethin| 24/12/2008 11:12 | 00023 |
8 | name | 01/01/2009 12:12 | 00026 |
So what I'm looking for is to have distinct tkt_nr values excluding NULL, based on the max value of datetime.
I have tried several options but always failed
SELECT *
FROM templateFeedback a
JOIN (
SELECT ticket_number, MAX(date_used) date_used
FROM templateFeedback
GROUP BY ticket_number
) b
ON a.ticket_number = b.ticket_number AND a.date_used = b.date_used
I would appreciate any help. Unfortunately I need the code to be compatible with SQL Server.
I've stopped doing things this way since I discovered windowing functions. Too often, there are two records with the same timestamp and I get two records in the result set. Here's the code for T-SQL; it is similar for Oracle. I don't think MySQL supports this yet.
Select id, tmpname, date_used, tkt_nr
From
(
Select id, tmpname, date_used, tkt_nr,
row_num = Row_Number() Over (Partition by tkt_nr Order by date_used desc)
From templateFeedback
Where tkt_nr is not null
) x
Where row_num = 1
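To try the ROW_NUMBER approach without a SQL Server instance, here is a small sketch on SQLite through Python's sqlite3 module. It assumes SQLite 3.25 or newer, where window functions are available. The column names follow the sample table in the question (tkt_nr), dates are rewritten in ISO format so they sort correctly as text, and the extra "id DESC" tie-breaker is my own addition to make the pick deterministic when two rows share a timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE templateFeedback (id INTEGER, tmpname TEXT, date_used TEXT, tkt_nr TEXT);
INSERT INTO templateFeedback VALUES
    (1, 'template', '2009-03-04 16:10', '00011'),
    (2, 'templat1', '2009-03-04 16:11', '00011'),
    (5, 'templat2', '2009-03-04 16:12', '00011'),
    (3, 'diffname', '2009-03-03 15:11', '00022'),
    (4, 'diffname', '2009-03-03 16:12', '00022'),
    (6, 'another',  '2009-03-03 16:13', NULL),
    (7, 'somethin', '2008-12-24 11:12', '00023'),
    (8, 'name',     '2009-01-01 12:12', '00026');
""")

query = """
SELECT id, tmpname, date_used, tkt_nr
FROM (
    SELECT id, tmpname, date_used, tkt_nr,
           ROW_NUMBER() OVER (PARTITION BY tkt_nr
                              ORDER BY date_used DESC, id DESC) AS row_num
    FROM templateFeedback
    WHERE tkt_nr IS NOT NULL        -- the question excludes NULL ticket numbers
) AS x
WHERE row_num = 1                   -- newest row per ticket number
ORDER BY tkt_nr
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```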

Get the highest odds from the last update

I have these tables in a PostgreSQL database:
bookmakers
-----------------------
| id | name |
-----------------------
| 1 | Unibet |
-----------------------
| 2 | 888 |
-----------------------
odds
---------------------------------------------------------------------
| id | odds_type | odds_index | bookmaker_id | created_at |
---------------------------------------------------------------------
| 1 | 1 | 1.55 | 1 | 2012-06-02 10:30 |
---------------------------------------------------------------------
| 2 | 2 | 3.22 | 2 | 2012-06-02 10:30 |
---------------------------------------------------------------------
| 3 | X | 3.00 | 1 | 2012-06-02 10:30 |
---------------------------------------------------------------------
| 4 | 2 | 1.25 | 1 | 2012-05-27 09:30 |
---------------------------------------------------------------------
| 5 | 1 | 2.30 | 2 | 2012-05-27 09:30 |
---------------------------------------------------------------------
| 6 | X | 2.00 | 2 | 2012-05-27 09:30 |
---------------------------------------------------------------------
What I am trying to query is the following:
Give me the 1/X/2 odds from the latest update (created_at) from ALL bookmakers and from that last update, give me the highest odds for each odds_type ('1', '2', 'X').
On my website I display them as:
Best odds right now: 1 | X | 2
--------------------
2.30 | 3.00 | 3.22
I have to first get the latest, because the odds from the update from yesterday are no longer valid. Then from that last update, I have - in this case - 2 odds from 2 different bookmakers, so I need to get the best one for type '1','2','X'.
Pseudo SQL would be something like:
SELECT MAX(odds_index) WHERE odds_type = '1' ORDER BY created_at DESC, odds_index DESC
But that doesn't work, because I would always get the latest odds (and not the highest/best from those latest)
I hope I'm making sense.
Subqueries to the rescue!
select o1.odds_type, max(o1.odds_index)
from odds o1
inner join (select odds_type, max(created_at) as created_at
from odds group by odds_type) o2
on o1.odds_type = o2.odds_type
and o1.created_at = o2.created_at
group by o1.odds_type
SQLFiddle: http://sqlfiddle.com/#!3/47df4/3
Your words "from the last update" contradict your example. Here are two methods.
To get from last update, how about getting the max created_at date aka last update and then using it for the rest.
declare @max_date datetime
select @max_date = max(created_at) from odds
select odds_type, odds_index
from odds
where created_at = @max_date
Or to match your example
select odds_type, odds_index
from odds
group by odds_type
having created_at = max(created_at)
Note: Different DBMS give different results depending on the select columns and whether there are more columns than in the group by clause.
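One reading that reproduces the "2.30 | 3.00 | 3.22" example from the question is: take each bookmaker's latest odds per odds_type, then take the maximum across bookmakers. That interpretation is my assumption, not something the asker confirmed. A minimal sketch on SQLite through Python's sqlite3 module follows (the join itself is plain SQL and should carry over to PostgreSQL). Row 7 is a hypothetical stale entry added only to show that outdated odds are ignored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE odds (id INTEGER, odds_type TEXT, odds_index REAL,
                   bookmaker_id INTEGER, created_at TEXT);
INSERT INTO odds VALUES
    (1, '1', 1.55, 1, '2012-06-02 10:30'),
    (2, '2', 3.22, 2, '2012-06-02 10:30'),
    (3, 'X', 3.00, 1, '2012-06-02 10:30'),
    (4, '2', 1.25, 1, '2012-05-27 09:30'),
    (5, '1', 2.30, 2, '2012-05-27 09:30'),
    (6, 'X', 2.00, 2, '2012-05-27 09:30'),
    (7, '1', 9.99, 2, '2012-05-20 09:30');  -- hypothetical stale row, must be ignored
""")

query = """
SELECT o.odds_type, MAX(o.odds_index) AS best_odds
FROM odds o
JOIN (SELECT bookmaker_id, odds_type, MAX(created_at) AS created_at
      FROM odds
      GROUP BY bookmaker_id, odds_type) AS latest   -- newest update per bookmaker and type
  ON latest.bookmaker_id = o.bookmaker_id
 AND latest.odds_type = o.odds_type
 AND latest.created_at = o.created_at
GROUP BY o.odds_type
ORDER BY o.odds_type
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If "latest update" instead means the single newest created_at across the whole table, the derived table would be replaced by a filter on MAX(created_at), and the result for type '1' would be 1.55 rather than 2.30.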