Get the highest odds from the last update - sql

I have these tables in a PostgreSQL database:
bookmakers
-----------------------
| id | name |
-----------------------
| 1 | Unibet |
-----------------------
| 2 | 888 |
-----------------------
odds
---------------------------------------------------------------------
| id | odds_type | odds_index | bookmaker_id | created_at |
---------------------------------------------------------------------
| 1 | 1 | 1.55 | 1 | 2012-06-02 10:30 |
---------------------------------------------------------------------
| 2 | 2 | 3.22 | 2 | 2012-06-02 10:30 |
---------------------------------------------------------------------
| 3 | X | 3.00 | 1 | 2012-06-02 10:30 |
---------------------------------------------------------------------
| 4 | 2 | 1.25 | 1 | 2012-05-27 09:30 |
---------------------------------------------------------------------
| 5 | 1 | 2.30 | 2 | 2012-05-27 09:30 |
---------------------------------------------------------------------
| 6 | X | 2.00 | 2 | 2012-05-27 09:30 |
---------------------------------------------------------------------
What I am trying to query is the following:
Give me the 1/X/2 odds from the latest update (created_at) from ALL bookmakers and from that last update, give me the highest odds for each odds_type ('1', '2', 'X').
On my website I display them as:
Best odds right now: 1 | X | 2
--------------------
2.30 | 3.00 | 3.22
I have to first get the latest, because the odds from the update from yesterday are no longer valid. Then from that last update, I have - in this case - 2 odds from 2 different bookmakers, so I need to get the best one for type '1','2','X'.
Pseudo SQL would be something like:
SELECT MAX(odds_index) WHERE odds_type = '1' ORDER BY created_at DESC, odds_index DESC
But that doesn't work, because I would always get the latest odds (and not the highest/best from those latest)
I hope I'm making sense.

Subqueries to the rescue!
select o1.odds_type, max(o1.odds_index)
from odds o1
inner join (select odds_type, max(created_at) as created_at
from odds group by odds_type) o2
on o1.odds_type = o2.odds_type
and o1.created_at = o2.created_at
group by o1.odds_type
SQLFiddle: http://sqlfiddle.com/#!3/47df4/3

Your words "from the last update" contradict your example. Here are two methods.
To get the odds from the last update, first fetch the maximum created_at (that is, the last update) and then filter on it (T-SQL syntax):
declare @max_date datetime
select @max_date = max(created_at) from odds
select odds_type, odds_index
from odds
where created_at = @max_date
Or to match your example
select odds_type, odds_index
from odds
group by odds_type
having created_at = max(created_at)
Note: DBMSs differ in how they handle selected columns that are not in the GROUP BY clause. PostgreSQL, for one, rejects this query because odds_index and created_at are neither grouped nor aggregated.
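The two readings of "last update" discussed above can be compared side by side. The following is only a sketch: it loads the question's sample rows into an in-memory SQLite database (standing in for PostgreSQL), and the variable names and the per-bookmaker interpretation are assumptions, not something the asker confirmed.

```python
import sqlite3

# In-memory copy of the question's odds table (SQLite stands in for PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE odds (
    id INTEGER PRIMARY KEY, odds_type TEXT, odds_index REAL,
    bookmaker_id INTEGER, created_at TEXT
);
INSERT INTO odds VALUES
    (1, '1', 1.55, 1, '2012-06-02 10:30'),
    (2, '2', 3.22, 2, '2012-06-02 10:30'),
    (3, 'X', 3.00, 1, '2012-06-02 10:30'),
    (4, '2', 1.25, 1, '2012-05-27 09:30'),
    (5, '1', 2.30, 2, '2012-05-27 09:30'),
    (6, 'X', 2.00, 2, '2012-05-27 09:30');
""")

# Reading A: only rows carrying the single most recent created_at count.
LATEST_UPDATE_SQL = """
SELECT odds_type, MAX(odds_index)
FROM odds
WHERE created_at = (SELECT MAX(created_at) FROM odds)
GROUP BY odds_type
"""

# Reading B (assumed): each bookmaker's most recent row per odds_type counts.
LATEST_PER_BOOKMAKER_SQL = """
SELECT o.odds_type, MAX(o.odds_index)
FROM odds o
JOIN (SELECT bookmaker_id, odds_type, MAX(created_at) AS created_at
      FROM odds
      GROUP BY bookmaker_id, odds_type) latest
  ON o.bookmaker_id = latest.bookmaker_id
 AND o.odds_type = latest.odds_type
 AND o.created_at = latest.created_at
GROUP BY o.odds_type
"""

strict_latest = dict(conn.execute(LATEST_UPDATE_SQL).fetchall())
per_bookmaker = dict(conn.execute(LATEST_PER_BOOKMAKER_SQL).fetchall())
print("latest update only:  ", strict_latest)
print("latest per bookmaker:", per_bookmaker)
```

On the sample rows, reading B gives 2.30, 3.00 and 3.22 for '1', 'X' and '2', which are the figures shown in the question, while reading A gives 1.55 for '1'. Both queries use only standard joins and aggregates, so they should carry over to PostgreSQL unchanged.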

Related

Select only record until timestamp from another table

I have three tables.
The first one is Device table
+----------+------+
| DeviceId | Type |
+----------+------+
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
+----------+------+
The second one is History table - data received by different devices.
+----------+-------------+--------------------+
| DeviceId | Temperature | TimeStamp |
+----------+-------------+--------------------+
| 1 | 31 | 15.08.2020 1:42:00 |
| 2 | 100 | 15.08.2020 1:42:01 |
| 2 | 40 | 15.08.2020 1:43:00 |
| 1 | 32 | 15.08.2020 1:44:00 |
| 1 | 34 | 15.08.2020 1:45:00 |
| 3 | 20 | 15.08.2020 1:46:00 |
| 2 | 45 | 15.08.2020 1:47:00 |
+----------+-------------+--------------------+
The third one is DeviceStatusHistory table
+----------+---------+--------------------+
| DeviceId | State | TimeStamp |
+----------+---------+--------------------+
| 1 | 1(OK) | 15.08.2020 1:42:00 |
| 2 | 1(OK) | 15.08.2020 1:43:00 |
| 1 | 1(OK) | 15.08.2020 1:44:00 |
| 1 | 0(FAIL) | 15.08.2020 1:44:30 |
| 1 | 0(FAIL) | 15.08.2020 1:46:00 |
| 2 | 0(FAIL) | 15.08.2020 1:46:10 |
+----------+---------+--------------------+
I want to select the last temperature of each device, but take into account only those history records that occur before the device's first failure.
Since device1 starts failing from 15.08.2020 1:44:30, I don't want its records that go after that timestamp.
The same for the device2.
So as a final result, I want to have only data of all devices until they get first FAIL status:
+----------+-------------+--------------------+
| DeviceId | Temperature | TimeStamp |
+----------+-------------+--------------------+
| 2 | 40 | 15.08.2020 1:43:00 |
| 1 | 32 | 15.08.2020 1:44:00 |
| 3 | 20 | 15.08.2020 1:46:00 |
+----------+-------------+--------------------+
I can select an appropriate history only if device failed at least once:
SELECT * FROM Device D
CROSS APPLY
(SELECT TOP 1 * FROM History H
WHERE D.Id = H.DeviceId
and H.DeviceTimeStamp <
(select MIN(UpdatedOn) from DeviceStatusHistory Y where [State]=0 and DeviceId=D.Id)
ORDER BY H.DeviceTimeStamp desc) X
ORDER BY D.Id;
The problem is that if a device never fails, I don't get its history at all.
Update:
My idea is to use something like this
SELECT * FROM DeviceHardwarePart HP
CROSS APPLY
(SELECT TOP 1 * FROM History H
WHERE HP.Id = H.DeviceId
and H.DeviceTimeStamp <
(select ISNULL((select MIN(UpdatedOn) from DeviceMetadataPart where [State]=0 and DeviceId=HP.Id),
cast('12/31/9999 23:59:59.997' as datetime)))
ORDER BY H.DeviceTimeStamp desc) X
ORDER BY HP.Id;
I'm not sure whether it is a good solution
You can use COALESCE: coalesce(min(UpdatedOn), cast('9999-12-31 23:59:59' as datetime)). This ensures the comparison always has an upper bound instead of NULL.
I would treat this as a two-part problem:
1. Find the time at which each device first failed; if it has never failed, substitute a large value, such as a timestamp in 2099.
2. With that in hand, join with the History table and take the latest value before the failure timestamp.
There are several ways to get step 1. Off the top of my head, something like the following should work:
select device_id, coalesce(min(failed_timestamps), cast('2099-01-01 01:01:01' as datetime)) as failed_at
from (select device_id, case when state = 0 then timestamp else null end as failed_timestamps from DeviceStatusHistory) as X
group by device_id
This gives the earliest failure timestamp for each device, and an arbitrarily large value for devices that have never failed. Note that a device with no rows at all in the status table (device 3 in the example) does not appear in this result, so step 2 needs a left join that applies the same fallback value.
After this, the rest of the solution is straightforward.
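Both steps can be put together in one query. The sketch below is an illustration rather than a tested SQL Server solution: it runs against an in-memory SQLite database loaded with the question's sample data, stores the timestamps as ISO strings so that they sort correctly as text, and uses the table and column names from the question's table listings (not the ones in the asker's query). The CTE name cutoff is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Device (DeviceId INTEGER PRIMARY KEY, Type INTEGER);
CREATE TABLE History (DeviceId INTEGER, Temperature INTEGER, TimeStamp TEXT);
CREATE TABLE DeviceStatusHistory (DeviceId INTEGER, State INTEGER, TimeStamp TEXT);

INSERT INTO Device VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO History VALUES
    (1, 31,  '2020-08-15 01:42:00'),
    (2, 100, '2020-08-15 01:42:01'),
    (2, 40,  '2020-08-15 01:43:00'),
    (1, 32,  '2020-08-15 01:44:00'),
    (1, 34,  '2020-08-15 01:45:00'),
    (3, 20,  '2020-08-15 01:46:00'),
    (2, 45,  '2020-08-15 01:47:00');
-- State: 1 = OK, 0 = FAIL
INSERT INTO DeviceStatusHistory VALUES
    (1, 1, '2020-08-15 01:42:00'),
    (2, 1, '2020-08-15 01:43:00'),
    (1, 1, '2020-08-15 01:44:00'),
    (1, 0, '2020-08-15 01:44:30'),
    (1, 0, '2020-08-15 01:46:00'),
    (2, 0, '2020-08-15 01:46:10');
""")

LAST_READING_BEFORE_FAILURE_SQL = """
WITH cutoff AS (
    -- Step 1: first failure per device; the LEFT JOIN keeps devices that
    -- never failed, and COALESCE gives them a far-future upper bound.
    SELECT d.DeviceId,
           COALESCE(MIN(s.TimeStamp), '9999-12-31 23:59:59') AS failed_at
    FROM Device d
    LEFT JOIN DeviceStatusHistory s
      ON s.DeviceId = d.DeviceId AND s.State = 0
    GROUP BY d.DeviceId
)
-- Step 2: latest history row strictly before that cutoff.
SELECT h.DeviceId, h.Temperature, h.TimeStamp
FROM History h
JOIN cutoff c ON c.DeviceId = h.DeviceId
WHERE h.TimeStamp = (SELECT MAX(h2.TimeStamp)
                     FROM History h2
                     WHERE h2.DeviceId = h.DeviceId
                       AND h2.TimeStamp < c.failed_at)
ORDER BY h.DeviceId
"""

last_readings = conn.execute(LAST_READING_BEFORE_FAILURE_SQL).fetchall()
for row in last_readings:
    print(row)
```

Starting the CTE from Device rather than from the status table is what keeps device 3, which has no status rows at all, in the result.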

Logic to read multiple rows in a table where flag = 'Y'

Consider the following scenario. I have a Customer table, which includes RowStart and EndDate logic, thus writing a new row every time a field value is updated.
Relevant fields in this table are:
RowStartDate
RowEndDate
CustomerNumber
EmployeeFlag
For this, I'd like to write a query that returns an employee's period of tenure (EmploymentStartDate and EmploymentEndDate), i.e. the RowStartDate when EmployeeFlag first became 'Y', and then the first RowStartDate where EmployeeFlag changed to 'N' (ordered, of course, by RowStartDate ascending). An additional complexity is that the flag may change between Y and N multiple times for a single person, as they may become staff, resign, and then be employed again at a later date.
Example table structure is:
| CustomerNo | StaffFlag | RowStartDate | RowEndDate |
| ---------- | --------- | ------------ | ---------- |
| 12 | N | 2019-01-01 | 2019-01-14 |
| 12 | N | 2019-01-14 | 2019-03-02 |
| 12 | Y | 2019-03-02 | 2019-10-12 |
| 01 | Y | 2020-03-13 | NULL |
| 12 | N | 2019-10-12 | 2020-01-01 |
| 12 | Y | 2020-01-01 | NULL |
Output could be something like
| CustomerNo | StaffStartDate | StaffEndDate |
| ---------- | -------------- | ------------ |
| 12 | 2019-03-02 | 2019-10-12 |
| 01 | 2020-03-13 | NULL |
| 12 | 2020-01-01 | NULL |
Any ideas on how I might be able to solve this would be really appreciated.
Make sure you order the rows by ID and by dates:
select *
from yourtable
order by CustomerNumber asc,
EmployeeFlag desc,
RowStartDate asc,
RowEndDate asc
This gives you a list of all changes over time per employee.
Subsequently, you want to map two rows into a single row with two columns (two dates mapped into overall start and end date). Others have done this using the lead() function. For details please have a look here: Merging every two rows of data in a column in SQL Server
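One way to do that pairing is with lag() and lead(). The sketch below is a hedged illustration, not the linked answer's code: it uses an in-memory SQLite database (window functions need SQLite 3.25 or later), the column names from the example table in the question, and made-up CTE names (flagged, changes). The same window functions exist in SQL Server and PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustomerNo TEXT, StaffFlag TEXT, RowStartDate TEXT, RowEndDate TEXT
);
INSERT INTO Customer VALUES
    ('12', 'N', '2019-01-01', '2019-01-14'),
    ('12', 'N', '2019-01-14', '2019-03-02'),
    ('12', 'Y', '2019-03-02', '2019-10-12'),
    ('01', 'Y', '2020-03-13', NULL),
    ('12', 'N', '2019-10-12', '2020-01-01'),
    ('12', 'Y', '2020-01-01', NULL);
""")

TENURE_SQL = """
WITH flagged AS (
    -- Attach the previous row's flag for the same customer.
    SELECT CustomerNo, StaffFlag, RowStartDate,
           LAG(StaffFlag) OVER (PARTITION BY CustomerNo
                                ORDER BY RowStartDate) AS prev_flag
    FROM Customer
),
changes AS (
    -- Keep only rows where the flag changed (or the customer's first row),
    -- then look ahead to the date of the next change.
    SELECT CustomerNo, StaffFlag, RowStartDate,
           LEAD(RowStartDate) OVER (PARTITION BY CustomerNo
                                    ORDER BY RowStartDate) AS next_change
    FROM flagged
    WHERE prev_flag IS NULL OR prev_flag <> StaffFlag
)
SELECT CustomerNo,
       RowStartDate AS StaffStartDate,
       next_change  AS StaffEndDate   -- NULL while still employed
FROM changes
WHERE StaffFlag = 'Y'
ORDER BY RowStartDate
"""

tenures = conn.execute(TENURE_SQL).fetchall()
for row in tenures:
    print(row)
```

Filtering to change rows before applying lead() collapses consecutive rows with the same flag, so a run of several 'Y' rows is reported as a single period.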

How to select the latest date for each group by number?

I've been stuck on this question for a while, and I was wondering if the community would be able to direct me in the right direction?
I have some tag IDs that need to be grouped, with exceptions (column: deleted) that need to be retained in the results. After that, for each grouped tag ID, I need to select the row with the latest date. How can I do this? An example is below:
ID | TAG_ID | DATE | DELETED
1 | 300 | 05/01/20 | null
2 | 300 | 03/01/20 | 04/01/20
3 | 400 | 06/01/20 | null
4 | 400 | 05/01/20 | null
5 | 400 | 04/01/20 | null
6 | 500 | 03/01/20 | null
7 | 500 | 02/01/20 | null
I am trying to reach this outcome:
ID | TAG_ID | DATE | DELETED
1 | 300 | 05/01/20 | null
2 | 300 | 03/01/20 | 04/01/20
3 | 400 | 06/01/20 | null
6 | 500 | 03/01/20 | null
So, firstly if there is a date in the "DELETED" column, I would like the row to be present. Secondly, for each unique tag ID, I would like the row with the latest "DATE" to be present.
Hopefully this question is clear. Would appreciate your feedback and help! A big thanks in advance.
Your results seem to be something like this:
select t.*
from (select t.*,
row_number() over (partition by tag_id, deleted order by date desc) as seqnum
from t
) t
where seqnum = 1 or deleted is not null;
This takes one row where deleted is null -- the most recent row. It also keeps each row where deleted is not null.
You need 2 conditions combined with OR in the WHERE clause:
the 1st is deleted is not null, or
the 2nd that there isn't any other row with the same tag_id and date later than the current row's date, meaning that the current row's date is the latest:
select t.* from tablename t
where t.deleted is not null
or not exists (
select 1 from tablename
where tag_id = t.tag_id and date > t.date
)
See the demo.
Results:
| id | tag_id | date | deleted |
| --- | ------ | ---------- | -------- |
| 1 | 300 | 2020-05-01 | |
| 2 | 300 | 2020-03-01 | 04/01/20 |
| 3 | 400 | 2020-06-01 | |
| 6 | 500 | 2020-03-01 | |
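For anyone who wants to try the NOT EXISTS version locally, here is a small sketch using an in-memory SQLite database. The table name tags and the column name tag_date are placeholders chosen for this example, and the dates are stored as ISO strings (read as month/day/year, as in the results above) so that they compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (id INTEGER PRIMARY KEY, tag_id INTEGER,
                   tag_date TEXT, deleted TEXT);
INSERT INTO tags VALUES
    (1, 300, '2020-05-01', NULL),
    (2, 300, '2020-03-01', '2020-04-01'),
    (3, 400, '2020-06-01', NULL),
    (4, 400, '2020-05-01', NULL),
    (5, 400, '2020-04-01', NULL),
    (6, 500, '2020-03-01', NULL),
    (7, 500, '2020-02-01', NULL);
""")

KEEP_SQL = """
SELECT t.id
FROM tags t
WHERE t.deleted IS NOT NULL          -- always keep deleted rows
   OR NOT EXISTS (                   -- otherwise keep only the latest row per tag
        SELECT 1 FROM tags other
        WHERE other.tag_id = t.tag_id
          AND other.tag_date > t.tag_date)
ORDER BY t.id
"""

kept_ids = [row[0] for row in conn.execute(KEEP_SQL)]
print(kept_ids)
```

With the sample rows this keeps ids 1, 2, 3 and 6, the same rows as the expected outcome in the question.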

Union in outer query

I'm attempting to combine multiple rows using a UNION but I need to pull in additional data as well. My thought was to use a UNION in the outer query but I can't seem to make it work. Or am I going about this all wrong?
The data I have is like this:
+------+------+-------+---------+---------+
| ID | Time | Total | Weekday | Weekend |
+------+------+-------+---------+---------+
| 1001 | AM | 5 | 5 | 0 |
| 1001 | AM | 2 | 0 | 2 |
| 1001 | AM | 4 | 1 | 3 |
| 1001 | AM | 5 | 3 | 2 |
| 1001 | PM | 5 | 3 | 2 |
| 1001 | PM | 5 | 5 | 0 |
| 1002 | PM | 4 | 2 | 2 |
| 1002 | PM | 3 | 3 | 0 |
| 1002 | PM | 1 | 0 | 1 |
+------+------+-------+---------+---------+
What I want to see is like this:
+------+---------+------+-------+
| ID | DayType | Time | Tasks |
+------+---------+------+-------+
| 1001 | Weekday | AM | 9 |
| 1001 | Weekend | AM | 7 |
| 1001 | Weekday | PM | 8 |
| 1001 | Weekend | PM | 2 |
| 1002 | Weekday | PM | 5 |
| 1002 | Weekend | PM | 3 |
+------+---------+------+-------+
The closest I've come so far is using UNION statement like the following:
SELECT * FROM
(
SELECT Weekday, 'Weekday' as 'DayType' FROM t1
UNION
SELECT Weekend, 'Weekend' as 'DayType' FROM t1
) AS X
Which results in something like the following:
+---------+---------+
| Weekday | DayType |
+---------+---------+
| 2 | Weekend |
| 0 | Weekday |
| 2 | Weekday |
| 0 | Weekend |
| 10 | Weekday |
+---------+---------+
I don't see any rhyme or reason as to what the numbers are under the 'Weekday' column, I suspect they're being grouped somehow. And of course there are several other columns missing, but since I can't put a large scope in the outer query with this as inner one, I can't figure out how to pull those in. Help is greatly appreciated.
It looks like you want to union all a pair of aggregation queries that use sum() and group by id, time, one for Weekday and one for Weekend:
select Id, DayType = 'Weekend', [time], Tasks=sum(Weekend)
from t
group by id, [time]
union all
select Id, DayType = 'Weekday', [time], Tasks=sum(Weekday)
from t
group by id, [time]
Try with this
select ID, 'Weekday' as DayType, Time, sum(Weekday)
from t1
group by ID, Time
union all
select ID, 'Weekend', Time, sum(Weekend)
from t1
group by ID, Time
order by 1, 3, 2
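The UNION ALL approach can be checked against the sample data with a quick script. This is a sketch that assumes the table is called t1, as in the question, and runs on an in-memory SQLite database rather than SQL Server; the Tasks alias comes from the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (ID INTEGER, "Time" TEXT, Total INTEGER,
                 Weekday INTEGER, Weekend INTEGER);
INSERT INTO t1 VALUES
    (1001, 'AM', 5, 5, 0),
    (1001, 'AM', 2, 0, 2),
    (1001, 'AM', 4, 1, 3),
    (1001, 'AM', 5, 3, 2),
    (1001, 'PM', 5, 3, 2),
    (1001, 'PM', 5, 5, 0),
    (1002, 'PM', 4, 2, 2),
    (1002, 'PM', 3, 3, 0),
    (1002, 'PM', 1, 0, 1);
""")

UNPIVOT_SQL = """
SELECT ID, 'Weekday' AS DayType, "Time", SUM(Weekday) AS Tasks
FROM t1
GROUP BY ID, "Time"
UNION ALL
SELECT ID, 'Weekend', "Time", SUM(Weekend)
FROM t1
GROUP BY ID, "Time"
ORDER BY 1, 3, 2   -- ID, then Time, then DayType
"""

task_rows = conn.execute(UNPIVOT_SQL).fetchall()
for row in task_rows:
    print(row)
```

UNION ALL matters here: a plain UNION removes duplicate rows, which is likely why the attempt in the question appeared to lose values.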
Not tested, but it should do the trick. It may require 2 proc sql steps for the calculation, one for summing and one for the case when statements. If you have extra lines, just use a max statement and group by ID, Time, type_day.
Proc sql; create table want as select ID, Time,
sum(weekday) as weekdayTask,
sum(weekend) as weekendTask,
case when calculated weekdaytask>0 then weekdaytask
when calculated weekendtask>0 then weekendtask else .
end as Task,
case when calculated weekdaytask>0 then "Weekday"
when calculated weekendtask>0 then "Weekend"
end as Day_Type
from have
group by ID, Time
;quit;
Proc sql; create table want2 as select ID, Time, Day_Type, Task
from want
;quit;

Select rows where one column is within a day of another column

I have two tables from a site similar to SO: one with posts, and one with up/down votes for each post. I would like to select all votes cast on the day that a post was modified.
My tables layout is as seen below:
Posts:
-----------------------------------------------
| post_id | post_author | modification_date |
-----------------------------------------------
| 0 | David | 2012-02-25 05:37:34 |
| 1 | David | 2012-02-20 10:13:24 |
| 2 | Matt | 2012-03-27 09:34:33 |
| 3 | Peter | 2012-04-11 19:56:17 |
| ... | ... | ... |
-----------------------------------------------
Votes (each vote is only counted at the end of the day for anonymity):
-------------------------------------------
| vote_id | post_id | vote_date |
-------------------------------------------
| 0 | 0 | 2012-01-13 00:00:00 |
| 1 | 0 | 2012-02-26 00:00:00 |
| 2 | 0 | 2012-02-26 00:00:00 |
| 3 | 0 | 2012-04-12 00:00:00 |
| 4 | 1 | 2012-02-21 00:00:00 |
| ... | ... | ... |
-------------------------------------------
What I want to achieve:
-----------------------------------
| post_id | post_author | vote_id |
-----------------------------------
| 0 | David | 1 |
| 0 | David | 2 |
| 1 | David | 4 |
| ... | ... | ... |
-----------------------------------
I have been able to write the following, but it selects all votes on the day before the post modification, not on the same day (so, in this example, an empty table):
SELECT Posts.post_id, Posts.post_author, Votes.vote_id
FROM Posts
LEFT JOIN Votes ON Posts.post_id = Votes.post_id
WHERE CAST(Posts.modification_date AS DATE) = Votes.vote_date;
How can I fix it so the WHERE clause takes the day before Votes.vote_date? Or, if not possible, is there another way?
Depending on which database you are using (SQL Server, Oracle, etc.), you can usually take the previous day by subtracting 1 from a date, which subtracts exactly one day. Since each vote is dated the day after it was cast, subtract from vote_date:
Where Cast(Posts.modification_date as Date) = Votes.vote_date - 1
or, if modification_date is already in date format, just:
Where Posts.modification_date = Votes.vote_date - 1
If you have a site similar to Stack Overflow, then perhaps you also use SQL Server:
SELECT p.post_id, p.post_author, v.vote_id
FROM Posts p LEFT JOIN
Votes v
ON p.post_id = v.post_id
WHERE CAST(p.modification_date AS DATE) = DATEADD(day, -1, v.vote_date);
Different databases have different ways of subtracting one day. If this doesn't work, then your database has something similar.
I found another solution, which is to add a day to Posts.modification_date:
...
WHERE CAST(CEILING(CAST(p.modification_date AS FLOAT)) AS datetime) = v.vote_date
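The "add a day to the modification date" idea can be tried out on the sample rows. The sketch below uses an in-memory SQLite database, where the one-day shift is written with SQLite's date() function and its '+1 day' modifier; other databases spell this differently, so treat the date arithmetic as SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (post_id INTEGER PRIMARY KEY, post_author TEXT,
                    modification_date TEXT);
CREATE TABLE Votes (vote_id INTEGER PRIMARY KEY, post_id INTEGER,
                    vote_date TEXT);
INSERT INTO Posts VALUES
    (0, 'David', '2012-02-25 05:37:34'),
    (1, 'David', '2012-02-20 10:13:24'),
    (2, 'Matt',  '2012-03-27 09:34:33'),
    (3, 'Peter', '2012-04-11 19:56:17');
INSERT INTO Votes VALUES
    (0, 0, '2012-01-13 00:00:00'),
    (1, 0, '2012-02-26 00:00:00'),
    (2, 0, '2012-02-26 00:00:00'),
    (3, 0, '2012-04-12 00:00:00'),
    (4, 1, '2012-02-21 00:00:00');
""")

# Votes are stamped at midnight after the day they were cast, so a vote cast
# on the modification day carries the following day's date.
SAME_DAY_VOTES_SQL = """
SELECT p.post_id, p.post_author, v.vote_id
FROM Posts p
JOIN Votes v ON v.post_id = p.post_id
WHERE date(p.modification_date, '+1 day') = date(v.vote_date)
ORDER BY p.post_id, v.vote_id
"""

same_day_votes = conn.execute(SAME_DAY_VOTES_SQL).fetchall()
for row in same_day_votes:
    print(row)
```

An inner join is used because the WHERE condition on Votes would discard the unmatched rows of a LEFT JOIN anyway.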