Getting last_value MariaDB SQL - sql

I have this data in the table:
internal_id  match_id  company_id  market_id  selection_id  odds_value  update_date
1442         8483075   66          1          1             100         2021-01-04 18:58:19
1            8483075   66          1          1             10          2021-01-04 18:57:19
2            8483075   66          1          2             19          2021-01-04 18:57:19
3            8483075   66          1          3             1.08        2021-01-04 18:57:19
I'm trying to get last value of odds_value from whole table for each combination of match_id + company_id + market_id + selection_id based on update_date.
I wrote this query which is not working:
SELECT
odds.`internal_id`,
odds.`match_id`,
odds.`company_id`,
odds.`market_id`,
odds.`selection_id`,
odds.`update_date`,
odds.`odd_value`,
LAST_VALUE (odds.`odd_value`) OVER (PARTITION BY odds.`internal_id`, odds.`match_id`, odds.`company_id`, odds.`market_id`, odds.`selection_id` ORDER BY odds.`update_date` DESC) AS last_value
FROM
`odds`
LEFT JOIN `matches` ON matches.match_id = odds.match_id
WHERE
odds.match_id = 8483075
and odds.company_id = 66
GROUP BY
odds.match_id,
odds.company_id,
odds.market_id,
odds.selection_id
For match_id=8483075 & market_id=1 and selection_id=1 I'm getting odd_value 10 instead of 100.
What am I doing wrong? Or is there a better way to get this (for example by using internal_id, where a higher value means a more recent row)?

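LAST_VALUE() behaves in a surprising way. When the window has an ORDER BY, the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The frame therefore ends at the current row (plus any ties), so LAST_VALUE() returns the value from that row rather than from the end of the partition.
The simplest fix is to always use FIRST_VALUE() and reverse the sort when needed. I'm also fixing the PARTITION BY to match the description in your question; internal_id must not be part of it, because in your sample data it is different on every row, which puts each row in its own partition:
FIRST_VALUE(odds.odd_value) OVER (PARTITION BY odds.match_id, odds.company_id, odds.market_id, odds.selection_id
ORDER BY odds.update_date DESC
) AS last_value
Note that you already sort in descending order, so the most recent row comes first in each partition. That is why FIRST_VALUE(), not LAST_VALUE(), is the function that picks it.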
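To see the default-frame behaviour in isolation, here is a minimal sketch that loads the four sample rows into an in-memory SQLite database and compares the two functions over the same window. SQLite (3.25 or later) is only a stand-in for MariaDB, chosen so the example runs anywhere; the column aliases are made up for illustration.

```python
import sqlite3

# SQLite stands in for MariaDB here (an assumption made for portability).
# Both use RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW as the default
# frame when the window has an ORDER BY.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE odds (
        internal_id INTEGER, match_id INTEGER, company_id INTEGER,
        market_id INTEGER, selection_id INTEGER,
        odd_value REAL, update_date TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO odds VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1442, 8483075, 66, 1, 1, 100, "2021-01-04 18:58:19"),
        (1, 8483075, 66, 1, 1, 10, "2021-01-04 18:57:19"),
        (2, 8483075, 66, 1, 2, 19, "2021-01-04 18:57:19"),
        (3, 8483075, 66, 1, 3, 1.08, "2021-01-04 18:57:19"),
    ],
)

# Same window for both functions: newest row first within each combination.
rows = conn.execute(
    """
    SELECT internal_id,
           LAST_VALUE(odd_value)  OVER w AS last_with_default_frame,
           FIRST_VALUE(odd_value) OVER w AS first_with_desc_sort
    FROM odds
    WINDOW w AS (
        PARTITION BY match_id, company_id, market_id, selection_id
        ORDER BY update_date DESC
    )
    ORDER BY internal_id
    """
).fetchall()

for row in rows:
    print(row)
```

For internal_id 1 the LAST_VALUE() column is 10 (the row's own value, because the frame stops at the current row), while the FIRST_VALUE() column is 100, the most recent odd for that combination.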

Related

Efficient (linear time) nested queries in SQL

From this table:
events
id  event_date  event_score
12  2020-04-10  null
13  2020-04-11  null
13  2020-04-14  8
13  2020-04-13  6
12  2020-04-15  null
14  2020-04-16  null
14  2020-04-17  null
14  2020-04-18  11
14  2020-04-19  null
14  2020-04-20  null
14  2020-04-22  null
12  2020-04-25  null
14  2020-04-30  null
I'm trying to get this result
results
id  first_score  last_score
12  null         null
13  6            8
14  11           11
One way to do that is through this query:
SELECT
DISTINCT id,
(
SELECT event_score
FROM events AS subquery
WHERE final_table.id=subquery.id
AND event_score IS NOT NULL
ORDER BY event_date
LIMIT 1
) AS `first score`,
(
SELECT event_score
FROM events AS subquery
WHERE final_table.id=subquery.id
AND event_score IS NOT NULL
ORDER BY event_date DESC
LIMIT 1
) AS `last score`
FROM sensors.events as final_table
but I suspect this takes quadratic time O(n*n) to compute. I know it can be done in linear time O(n) with Python but does anyone know how to do it in linear time with SQL?
The table is in MariaDB/MySQL
If you are running MariaDB 10.2.2 or higher, you could address this as a gaps-and-islands problem. The idea is to count how many non-null values appear on the preceding and following rows. We can then filter on the first non-null value in both directions, using conditional aggregation:
select id,
max(case when grp_asc = 1 then event_score end) as first_score,
max(case when grp_desc = 1 then event_score end) as last_score
from (
select e.*,
count(event_score) over(partition by id order by event_date ) as grp_asc,
count(event_score) over(partition by id order by event_date desc) as grp_desc
from events e
) e
group by id
order by id
I cannot assess the time complexity of this algorithm, but I would suspect that this should run faster than your original query, that requires executing two subqueries per distinct id.
Demo on DB Fiddle:
id | first_score | last_score
-: | ----------: | ---------:
12 | null | null
13 | 6 | 8
14 | 11 | 11
With an index on (id, event_date, event_score), this should be quite fast:
SELECT id,
(SELECT event_score
FROM events AS subquery
WHERE final_table.id = subquery.id AND event_score IS NOT NULL
ORDER BY event_date
LIMIT 1
) AS `first score`,
(SELECT event_score
FROM events AS subquery
WHERE final_table.id=subquery.id AND event_score IS NOT NULL
ORDER BY event_date DESC
LIMIT 1
) AS `last score`
FROM (SELECT DISTINCT e.id
FROM sensors.events e
) as final_table;
Note that this moves the SELECT DISTINCT to a subquery. This is to ensure that MariaDB does not actually use a "distinct" algorithm for the SELECT DISTINCT -- the other columns would probably cause that to happen.
However, this is O(n log n) because the subqueries need to sort a small amount of data for each id -- as well as using an index to get to the right place.
I cannot think of a way to do this O(n) in SQL. I'm pretty sure the following constructs are all O(n log n):
Using an index for each row.
Sorting any portion of the data.
Using any window function with an order by -- although the sort might be avoided if there is just the right index.
But, SQL queries are still fast, particularly with indexes.
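For comparison, the single-pass idea the question alludes to can be sketched in Python. This is not SQL and not something the database does for you; it is only an illustration, assuming the rows are available as (id, event_date, event_score) tuples with ISO-formatted date strings. The function name is hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# (id, event_date as an ISO string, event_score or None)
Event = Tuple[int, str, Optional[int]]


def first_last_scores(
    events: Iterable[Event],
) -> Dict[int, Tuple[Optional[int], Optional[int]]]:
    """Return {id: (first_score, last_score)} using one pass and no sorting."""
    # id -> [earliest (date, score), latest (date, score)]; None until seen.
    best: Dict[int, list] = {}
    for event_id, event_date, score in events:
        slot = best.setdefault(event_id, [None, None])
        if score is None:
            continue  # null scores never become first or last
        # ISO date strings compare correctly as plain strings.
        if slot[0] is None or event_date < slot[0][0]:
            slot[0] = (event_date, score)
        if slot[1] is None or event_date > slot[1][0]:
            slot[1] = (event_date, score)
    return {
        event_id: (lo[1] if lo else None, hi[1] if hi else None)
        for event_id, (lo, hi) in best.items()
    }


sample = [
    (12, "2020-04-10", None), (13, "2020-04-11", None),
    (13, "2020-04-14", 8), (13, "2020-04-13", 6),
    (12, "2020-04-15", None), (14, "2020-04-16", None),
    (14, "2020-04-17", None), (14, "2020-04-18", 11),
    (14, "2020-04-19", None), (14, "2020-04-20", None),
    (14, "2020-04-22", None), (12, "2020-04-25", None),
    (14, "2020-04-30", None),
]
print(first_last_scores(sample))
```

Each row is touched once and the dictionary lookups are constant time on average, so the pass is linear in the number of rows. The input does not need to be sorted, because the code tracks the minimum and maximum date per id instead.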

Can't get the cumulative sum(running total) within a group in SQL Server

I'm trying to get a running total within a group but my current code just gives me an aggregate sum.
For example, my data looks like this
ID ShiftNum Status Type Rate HourlyWage Hours Total_Amount
12542 1 Full A 1 12.5 40 500
12542 1 Full A 1 12.5 35 420
12542 2 Full A 1 10 40 400
12542 2 Full B 1.2 10 40 480
17842 1 Full A 1 11 27 297
17842 1 Full B 1.3 11 30 429
And what I want is a running total within the same ID, Shift Number, and Status. For example, I want something like this as my final result
ID ShiftNum Status Type Rate HourlyWage Hours Total_Amount Running_Tot
12542 1 Full A 1 12.5 40 500 500
12542 1 Full A 1 12.5 35 420 920
12542 2 Full A 1 10 40 400 400
12542 2 Full B 1.2 10 40 480 880
17842 1 Full A 1 11 27 297 297
17842 1 Full B 1.3 11 30 429 726
However, my current code just gives me the total sum within each group. For example, 920, 920 for row 1&2. Here's my code.
Select a.*,
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status ORDER BY ID, ShiftNum, Status) as Running_Tot
from table a
How do I fix my code to get the final result I want?
You need an ordering column that uniquely identifies each row. There is not an obvious one in your data, but something like this:
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status ORDER BY hours) as Running_Tot
Or:
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status
ORDER BY (SELECT NULL)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as Running_Tot
The problem you are facing is because the ORDER BY keys have ties. The default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Note the RANGE. That means that all rows with ties are combined.
Also note that there is no utility to including the PARTITION BY keys in the ORDER BY (well . . . there is one exception in SQL Server if you don't care about the ordering, then including a key can be a handy short-cut). The ordering occurs within a partition.
If your rows can have exact duplicates, I would first suggest that you add a primary key. But, in the meantime, you could use:
with a as (
select a.*,
row_number() over (order by id, shiftnum, status) as seqnum
from tablea a
)
Select a.*,
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status ORDER BY seqnum) as Running_Tot
from a;
The ordering will be arbitrary, but it will at least accumulate.
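The effect of ties on the default RANGE frame can be reproduced with a small sketch. It uses an in-memory SQLite database as a stand-in for SQL Server (an assumption for portability; the default frame is the same), a hypothetical row_id column as the unique ordering key, and it sums Total_Amount because that is the column the expected Running_Tot values in the question correspond to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE shifts (
        row_id INTEGER PRIMARY KEY,  -- hypothetical unique key, auto-assigned
        id INTEGER, shift_num INTEGER, status TEXT, total_amount INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO shifts (id, shift_num, status, total_amount) VALUES (?, ?, ?, ?)",
    [
        (12542, 1, "Full", 500),
        (12542, 1, "Full", 420),
        (12542, 2, "Full", 400),
        (12542, 2, "Full", 480),
        (17842, 1, "Full", 297),
        (17842, 1, "Full", 429),
    ],
)

totals = conn.execute(
    """
    SELECT row_id,
           -- ORDER BY a partition key: every row in the partition is a tie,
           -- so the default RANGE frame covers the whole partition.
           SUM(total_amount) OVER (PARTITION BY id, shift_num, status
                                   ORDER BY id) AS tied_total,
           -- ORDER BY a unique key: no ties, so the sum accumulates row by row.
           SUM(total_amount) OVER (PARTITION BY id, shift_num, status
                                   ORDER BY row_id) AS running_total
    FROM shifts
    ORDER BY row_id
    """
).fetchall()

for row in totals:
    print(row)
```

The tied_total column repeats the group sum (920, 920, 880, ...), which is the symptom described in the question, while running_total accumulates (500, 920, 400, 880, 297, 726).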

PostgreSQL - Show the last N rows for each sorted group

With respect to the following posts:
Retrieving the last record in each group - MySQL
Grouped LIMIT in PostgreSQL: show the first N rows for each group?
Postgresql limit by N groups
I wrote a query to find the latest 3 entries of the last 3 groups of log events partitioned by day with a maximum of 9 total entries and I managed to gather the following data from a postgresql log table:
The query I used to get them is the following:
SELECT
*
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY created_at::date ORDER BY created_at::time DESC) AS row_number,
DENSE_RANK() OVER (ORDER BY created_at::date DESC) AS group_number,
l.*
FROM
logs l
WHERE
account_id = 1) subquery
WHERE
subquery.row_number <= 3
AND group_number <= 3
LIMIT 9;
However I'm missing one last step: The results are "grouped" by day in descending order (which is good) but within each group the ordering by time doesn't seem to work.
Effectively the expected order should be (displaying only each row's id):
| EXISTING ROW ORDERING | EXPECTED ROW ORDERING |
| --------------------- | --------------------- |
| 52                    | 56                    |
| 53                    | 53                    |
| 56                    | 52                    |
| 46                    | 48                    |
| 48                    | 47                    |
| 47                    | 46                    |
| 30                    | 30                    |
| 31                    | 31                    |
| 32                    | 32                    |
Any ideas? Thanks.
If you want the data in a particular order, then you need to have an order by. SQL tables and result sets represent unordered sets. The only exception is when the outermost query has an order by.
So:
order by created_at desc
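A minimal sketch of where that order by goes, run against an in-memory SQLite database as a stand-in for PostgreSQL. The log rows and timestamps are invented for illustration (only the ids echo the question), and date(created_at) replaces the PostgreSQL cast created_at::date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [
        # Hypothetical timestamps; inserted out of order on purpose.
        (52, "2021-03-03 08:00:00"),
        (46, "2021-03-02 08:00:00"),
        (56, "2021-03-03 12:00:00"),
        (30, "2021-03-01 10:00:00"),
        (48, "2021-03-02 10:00:00"),
        (50, "2021-03-03 07:00:00"),  # 4th entry of its day, filtered out
        (53, "2021-03-03 09:00:00"),
        (47, "2021-03-02 09:00:00"),
    ],
)

ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM (
            SELECT id, created_at,
                   ROW_NUMBER() OVER (PARTITION BY date(created_at)
                                      ORDER BY created_at DESC) AS rn,
                   DENSE_RANK() OVER (ORDER BY date(created_at) DESC) AS grp
            FROM logs
        ) AS sub
        WHERE rn <= 3 AND grp <= 3
        ORDER BY created_at DESC  -- the outermost ORDER BY fixes the row order
        LIMIT 9
        """
    )
]
print(ids)
```

The ORDER BY inside the window definitions only decides how rows are numbered; the order of the final result comes from the ORDER BY on the outermost query, which also makes the LIMIT deterministic.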

Get last entry from each user in database

I have a Postgresql database, and I'm having trouble getting my query right, even though this seems like a common problem.
My table looks like this:
CREATE TABLE orders (
account_id INTEGER,
order_id INTEGER,
ts TIMESTAMP DEFAULT NOW()
)
Everytime there is a new order, I use it to link the account_id and order_id.
Now my problem is that I want to get a list that has the last order (by looking at ts) for each account.
For example, if my data is:
account_id order_id ts
5 178 July 1
5 129 July 6
4 190 July 1
4 181 July 9
3 348 July 1
3 578 July 4
3 198 July 1
3 270 July 12
Then I'd like the query to return only the last row for each account:
account_id order_id ts
5 129 July 6
4 181 July 9
3 270 July 12
I've tried GROUP BY account_id, and I can use that to get the MAX(ts) for each account, but then I have no way to get the associated order_id. I've also tried sub-queries, but I just can't seem to get it right.
Thanks!
select distinct on (account_id) *
from orders
order by account_id, ts desc
https://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT:
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
The row_number() window function can help:
select account_id, order_id, ts
from (select account_id, order_id, ts,
row_number() over(partition by account_id order by ts desc) as rn
from orders) t
where rn = 1
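The row_number() approach can be tried out with the sample data from the question. This sketch uses an in-memory SQLite database as a stand-in for PostgreSQL, and the year in the timestamps is an assumption, since the question only gives month and day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (account_id INTEGER, order_id INTEGER, ts TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        # The year 2023 is assumed; the question only says "July 1", etc.
        (5, 178, "2023-07-01"),
        (5, 129, "2023-07-06"),
        (4, 190, "2023-07-01"),
        (4, 181, "2023-07-09"),
        (3, 348, "2023-07-01"),
        (3, 578, "2023-07-04"),
        (3, 198, "2023-07-01"),
        (3, 270, "2023-07-12"),
    ],
)

latest = conn.execute(
    """
    SELECT account_id, order_id, ts
    FROM (
        SELECT account_id, order_id, ts,
               -- rn = 1 marks the newest order within each account
               ROW_NUMBER() OVER (PARTITION BY account_id
                                  ORDER BY ts DESC) AS rn
        FROM orders
    ) AS t
    WHERE rn = 1
    ORDER BY account_id DESC
    """
).fetchall()

for row in latest:
    print(row)
```

If two orders of the same account share the latest ts, row_number() picks one of them arbitrarily; adding a tie-breaker such as order_id to the window's ORDER BY would make the choice deterministic.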

Sorting of Date Format in Oracle

I am using a query in Oracle which gives the result below (it's a kind of month-wise transaction report):
Month Total Submitted Approved
--------------------------------------
DEC-14 2 2 0
APR-15 17 12 5
SEP-14 1 1 0
FEB-15 7 4 3
JUL-15 1 1 0
JAN-15 18 4 14
MAR-15 2 1 1
OCT-14 2 (null) (null)
JUN-15 136 91 45
JUN-14 1 1 0
MAY-15 179 63 116
I want to get the result in a sorted format, like JUN-14,SEP-14,OCT-14,DEC-14,JAN-15....so on. Thanks in advance.
Use order by date_column asc, where date_column is the column that holds the date, to get the oldest month first as in your example.
Use desc to order in descending order.
If the month column is stored as a character string, convert it to a date before sorting, using a format mask that matches the displayed values:
select * from table_name
order by to_char(to_date(month,'MON-YY'),'yy') asc,to_char(to_date(month,'MON-YY'),'mm') asc
If it is a date column:
select * from table_name
order by to_char(month,'yy') asc,to_char(month,'mm') asc
I assumed that you are using the following to display the month column:
TO_char(hiredate,'mon-yy')
If so, sorting is easy: order by the underlying date column rather than the formatted string.
select your column list from table order by source_date_column asc;