select maximum score grouped by date, display full datetime - sql

Gday, I have a table that shows a series of scores and datetimes those scores occurred.
I'd like to select the maximum of these scores for each day, but display the datetime that the score occurred.
I am using an Oracle database (10g) and the table is structured like so:
scoredatetime score (integer)
---------------------------------------
01-jan-09 00:10:00 10
01-jan-09 01:00:00 11
01-jan-09 04:00:01 9
...
I'd like to be able to present the results such the above becomes:
01-jan-09 01:00:00 11
This following query gets me halfway there.. but not all the way.
select
trunc(t.scoredatetime), max(t.score)
from
mytable t
group by
trunc(t.scoredatetime)
I cannot join on score only because the same high score may have been achieved multiple times throughout the day.
I appreciate your help!
Simon Edwards

with mytableRanked(d,scoredatetime,score,rk) as (
select
scoredatetime,
score,
row_number() over (
partition by trunc(scoredatetime)
order by score desc, scoredatetime desc
)
from mytable
)
select
scoredatetime,
score
from mytableRanked
where rk = 1
order by date desc
In the case of multiple high scores within a day, this returns the row corresponding to the one that occurred latest in the day. If you want to see all highest scores in a day, remove scoredatetime desc from the order by specification in the row_number window.
Alternatively, you can do this (it will list ties of high score for a date):
select
scoredatetime,
score
from mytable
where not exists (
select *
from mytable as M2
where trunc(M2.scoredatetime) = trunc(mytable.scoredatetime)
and M2.score > mytable.scoredatetime
)
order by scoredatetime desc

First of all, you did not yet specify what should happen if two or more rows within the same day contain an equal high score.
Two possible answers to that question:
1) Just select one of the scoredatetime's, it doesn't matter which one
In this case don't use self joins or analytics as you see in the other answers, because there is a special aggregate function that can do your job more efficient. An example:
SQL> create table mytable (scoredatetime,score)
2 as
3 select to_date('01-jan-2009 00:10:00','dd-mon-yyyy hh24:mi:ss'), 10 from dual union all
4 select to_date('01-jan-2009 01:00:00','dd-mon-yyyy hh24:mi:ss'), 11 from dual union all
5 select to_date('01-jan-2009 04:00:00','dd-mon-yyyy hh24:mi:ss'), 9 from dual union all
6 select to_date('02-jan-2009 00:10:00','dd-mon-yyyy hh24:mi:ss'), 1 from dual union all
7 select to_date('02-jan-2009 01:00:00','dd-mon-yyyy hh24:mi:ss'), 1 from dual union all
8 select to_date('02-jan-2009 04:00:00','dd-mon-yyyy hh24:mi:ss'), 0 from dual
9 /
Table created.
SQL> select max(scoredatetime) keep (dense_rank last order by score) scoredatetime
2 , max(score)
3 from mytable
4 group by trunc(scoredatetime,'dd')
5 /
SCOREDATETIME MAX(SCORE)
------------------- ----------
01-01-2009 01:00:00 11
02-01-2009 01:00:00 1
2 rows selected.
2) Select all records with the maximum score.
In this case you need analytics with a RANK or DENSE_RANK function. An example:
SQL> select scoredatetime
2 , score
3 from ( select scoredatetime
4 , score
5 , rank() over (partition by trunc(scoredatetime,'dd') order by score desc) rnk
6 from mytable
7 )
8 where rnk = 1
9 /
SCOREDATETIME SCORE
------------------- ----------
01-01-2009 01:00:00 11
02-01-2009 00:10:00 1
02-01-2009 01:00:00 1
3 rows selected.
Regards,
Rob.

You might need two SELECT statements to pull this off: the first to collect the truncated date and associated max score, and the second to pull in the actual datetime values associated with the score.
Try:
SELECT T.ScoreDateTime, T.Score
FROM
(
SELECT
TRUNC(T.ScoreDateTime) ScoreDate, MAX(T.score) BestScore
FROM
MyTable T
GROUP BY
TRUNC(T.ScoreDateTime)
) ByDate
INNER JOIN MyTable T
ON TRUNC(T.ScoreDateTime) = ByDate.ScoreDate and T.Score = ByDate.BestScore
ORDER BY T.ScoreDateTime DESC
This will pull in best score ties as well.
For a version which selects only the most recently-posted high score for each day:
SELECT T.ScoreDateTime, T.Score
FROM
(
SELECT
TRUNC(T.ScoreDateTime) ScoreDate,
MAX(T.score) BestScore,
MAX(T.ScoreDateTime) BestScoreTime
FROM
MyTable T
GROUP BY
TRUNC(T.ScoreDateTime)
) ByDate
INNER JOIN MyTable T
ON T.ScoreDateTime = ByDate.BestScoreTime and T.Score = ByDate.BestScore
ORDER BY T.ScoreDateTime DESC
This may produce multiple records per date if two different scores were posted at exactly the same time.

Related

SQL: How to create a daily view based on different time intervals using SQL logic?

Here is an example:
Id|price|Date
1|2|2022-05-21
1|3|2022-06-15
1|2.5|2022-06-19
Needs to look like this:
Id|Date|price
1|2022-05-21|2
1|2022-05-22|2
1|2022-05-23|2
...
1|2022-06-15|3
1|2022-06-16|3
1|2022-06-17|3
1|2022-06-18|3
1|2022-06-19|2.5
1|2022-06-20|2.5
...
Until today
1|2022-08-30|2.5
I tried using the lag(price) over (partition by id order by date)
But i can't get it right.
I'm not familiar with Azure, but it looks like you need to use a calendar table, or generate missing dates using a recursive CTE.
To get started with a recursive CTE, you can generate line numbers for each id (assuming multiple id values) in the source data ordered by date. These rows with row number equal to 1 (with the minimum date value for the corresponding id) will be used as the starting point for the recursion. Then you can use the DATEADD function to generate the row for the next day. To use the price values ​​from the original data, you can use a subquery to get the price for this new date, and if there is no such value (no row for this date), use the previous price value from CTE (use the COALESCE function for this).
For SQL Server query can look like this
WITH cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATEADD(d, 1, cte.date),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATEADD(d, 1, cte.date)),
cte.price
)
FROM cte
WHERE DATEADD(d, 1, cte.date) <= GETDATE()
)
SELECT * FROM cte
ORDER BY id, date
OPTION (MAXRECURSION 0)
Note that I added OPTION (MAXRECURSION 0) to make the recursion run through all the steps, since the default value is 100, this is not enough to complete the recursion.
db<>fiddle here
The same approach for MySQL (you need MySQL of version 8.0 to use CTE)
WITH RECURSIVE cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATE_ADD(cte.date, interval 1 day),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATE_ADD(cte.date, interval 1 day)),
cte.price
)
FROM cte
WHERE DATE_ADD(cte.date, interval 1 day) <= NOW()
)
SELECT * FROM cte
ORDER BY id, date
db<>fiddle here
Both queries produces the same results, the only difference is the use of the engine's specific date functions.
For MySQL versions below 8.0, you can use a calendar table since you don't have CTE support and can't generate the required date range.
Assuming there is a column in the calendar table to store date values ​​(let's call it date for simplicity) you can use the CROSS JOIN operator to generate date ranges for the id values in your table that will match existing dates. Then you can use a subquery to get the latest price value from the table which is stored for the corresponding date or before it.
So the query would be like this
SELECT
d.id,
d.date,
(SELECT
price
FROM tbl
WHERE tbl.id = d.id AND tbl.date <= d.date
ORDER BY tbl.date DESC
LIMIT 1
) price
FROM (
SELECT
t.id,
c.date
FROM calendar c
CROSS JOIN (SELECT DISTINCT id FROM tbl) t
WHERE c.date BETWEEN (
SELECT
MIN(date) min_date
FROM tbl
WHERE tbl.id = t.id
)
AND NOW()
) d
ORDER BY id, date
Using my pseudo-calendar table with date values ranging from 2022-05-20 to 2022-05-30 and source data in that range, like so
id
price
date
1
2
2022-05-21
1
3
2022-05-25
1
2.5
2022-05-28
2
10
2022-05-25
2
100
2022-05-30
the query produces following results
id
date
price
1
2022-05-21
2
1
2022-05-22
2
1
2022-05-23
2
1
2022-05-24
2
1
2022-05-25
3
1
2022-05-26
3
1
2022-05-27
3
1
2022-05-28
2.5
1
2022-05-29
2.5
1
2022-05-30
2.5
2
2022-05-25
10
2
2022-05-26
10
2
2022-05-27
10
2
2022-05-28
10
2
2022-05-29
10
2
2022-05-30
100
db<>fiddle here

Select 2 most recent dates from the table - SQL

I have a table like below
ID_NUMBER
SALEDATA
SALEAMOUNT
1
2020-09-07
47,000
2
2020-03-25
51,470
3
2021-06-12
32,000
4
2018-10-12
37,560
I want to select the rows with the 2 most recent dates only. So my desired output would be like below
ID_NUMBER
SALEDATA
SALEAMOUNT
1
2020-09-07
47,000
3
2021-06-12
32,000
Can someone please guide me on where would i start with this in SQL? I tried using MAX() but it is only giving me the most recent.
Thank you!
In Standard SQL, you would use:
select t.*
from t
order by saledata desc
offset 0 row fetch first 2 row only;
Not all databases support fetch first. It might be spelled limit or select top or something else, depending on your database.
Another option, with the rank analytic function. Sample data till line #7, query begins at line #9. See comments within code.
SQL> with test (id_number, saledata, saleamount) as
2 -- sample data
3 (select 1, date '2020-09-07', 47000 from dual union all
4 select 2, date '2020-03-25', 51470 from dual union all
5 select 3, date '2021-06-12', 32000 from dual union all
6 select 4, date '2018-10-12', 37560 from dual
7 )
8 -- sort them by date in descending order, fetch the first two rows
9 select id_number, saledata, saleamount
10 from (select t.*,
11 rank() over (order by saledata desc) rn
12 from test t
13 )
14 where rn <= 2
15 order by saledata;
ID_NUMBER SALEDATA SALEAMOUNT
---------- ---------- ----------
1 2020-09-07 47000
3 2021-06-12 32000
SQL>
select top 2 from your data set order by the saledata column descending

Oracle SQL recursive adding values

I have the following data in the table
Period Total_amount R_total
01/01/20 2 2
01/02/20 5 null
01/03/20 3 null
01/04/20 8 null
01/05/20 31 null
Based on the above data I would like to have the following situation.
Period Total_amount R_total
01/01/20 2 2
01/02/20 5 3
01/03/20 3 0
01/04/20 8 8
01/05/20 31 23
Additional data
01/06/20 21 0 (previously it would be -2)
01/07/20 25 25
01/08/20 29 4
Pattern to the additional data is:
if total_amount < previous(r_total) then 0
Based on the filled data, we can spot the pattern is:
R_total = total_amount - previous(R_total)
Could you please help me out with this issue?
As Gordon Linoff suspected, it is possible to solve this problem with analytic functions. The benefit is that the query will likely be much faster. The price to pay for that benefit is that you need to do a bit of math beforehand (before ever thinking about "programming" and "computers").
A bit of elementary arithmetic shows that R_TOTAL is an alternating sum of TOTAL_AMOUNT. This can be arranged easily by using ROW_NUMBER() (to get the signs) and then an analytic SUM(), as shown below.
Table setup:
create table sample_data (period, total_amount) as
select to_date('01/01/20', 'mm/dd/rr'), 2 from dual union all
select to_date('01/02/20', 'mm/dd/rr'), 5 from dual union all
select to_date('01/03/20', 'mm/dd/rr'), 3 from dual union all
select to_date('01/04/20', 'mm/dd/rr'), 8 from dual union all
select to_date('01/05/20', 'mm/dd/rr'), 31 from dual
;
Query and result:
with
prep (period, total_amount, sgn) as (
select period, total_amount,
case mod(row_number() over (order by period), 2) when 0 then 1 else -1 end
from sample_data
)
select period, total_amount,
sgn * sum(sgn * total_amount) over (order by period) as r_total
from prep
;
PERIOD TOTAL_AMOUNT R_TOTAL
-------- ------------ ----------
01/01/20 2 2
01/02/20 5 3
01/03/20 3 0
01/04/20 8 8
01/05/20 31 23
This may be possible with window functions, but the simplest method is probably a recursive CTE:
with t as (
select t.*, row_number() over (order by period) as seqnum
from yourtable t
),
cte(period, total_amount, r_amount, seqnum) as (
select period, total_amount, r_amount, seqnum
from t
where seqnum = 1
union all
select t.period, t.total_amount, t.total_amount - cte.r_amount, t.seqnum
from cte join
t
on t.seqnum = cte.seqnum + 1
)
select *
from cte;
This question explicitly talks about "recursively" adding values. If you want to solve this using another mechanism, you might explain the logic in detail and ask if there is a non-recursive CTE solution.

Do partial row in BigQuery to get last data and order by id

i want to get last id and their rank (based on order by date_update asc and then order by again by id desc ) and show id and rank of id. i do the query like below:
SELECT id as data,
RANK() OVER (ORDER BY date_update) AS rank
FROM `test.sample`
ORDER BY id DESC
LIMIT 1
and it's work for other table but didn't work some table with large data and get notice:
Resources exceeded during query execution: The query could not be executed in the allotted memory.
i have done read Troubleshooting Error Big Query
and try to remove ORDER BY but still can't running, what should i do ?
sample data:
id date_update
22 2019-10-04
14 2019-10-01
24 2019-10-03
13 2019-10-02
process :
Rank() Over (Order by date_update)
id date_update rank
14 2019-10-01 1
13 2019-10-02 2
24 2019-10-03 3
22 2019-10-04 4
order by id desc based on above
id date_update rank
24 2019-10-03 3
22 2019-10-04 4
14 2019-10-01 1
13 2019-10-02 2
this is the expected result:
id rank
24 3
You can use the query below. It basically finds the row with max ID (latest ID), then queries the source table again using date_value of max id row as a filter.
WITH
`test.sample` AS
(
select 22 AS id, DATE('2019-10-04') as date_update union all
select 14 AS id, DATE('2019-10-01') as date_update union all
select 24 AS id, DATE('2019-10-03') as date_update union all
select 13 AS id, DATE('2019-10-02') as date_update
),
max_id_row AS
(
SELECT ARRAY_AGG(STRUCT(id, date_update) ORDER BY id DESC LIMIT 1)[OFFSET(0)] vals
FROM `test.sample`
)
SELECT m.vals.id, m.vals.date_update, COUNT(*) as rank
FROM `test.sample` as t
JOIN max_id_row as m
ON t.date_update <= m.vals.date_update
GROUP BY 1,2
Below is for BigQuery Standard SQL and should scale to whatever "large" data you have
#standardSQL
SELECT b.id, COUNT(1) + 1 AS `rank`
FROM `project.dataset.table` a
JOIN (
SELECT ARRAY_AGG(STRUCT(id, date_update) ORDER BY id DESC LIMIT 1)[OFFSET(0)].*
FROM `project.dataset.table`
) b
ON a.date_update < b.date_update
GROUP BY id
If to apply for sample data in your question -
WITH `project.dataset.table` AS (
SELECT 22 id, DATE '2019-10-04' date_update UNION ALL
SELECT 14, '2019-10-01' UNION ALL
SELECT 24, '2019-10-03' UNION ALL
SELECT 13, '2019-10-02'
)
result is
Row id rank
1 24 3
The "trick" here is in changing focus from not scalable code with non or badly parallelized operations (RANK) to something that is as simple as COUNT'ing
So, your case (at least as it is presented in question's "process" section) can be rephrased as finding number of rows before the day with highest id - that simple - thus above simple query. Obviously adding "1" to that count gives you exactly what would RANK gave you if worked

subtract and add between columns and rows

I have some data look like this
id date total amount adj amount
1 2017-01-02 100 50
1 2017-01-02 50 0
2 2017-01-15 100 35
2 2017-01-15 35 0
3 2017-01-30 120 50
3 2017-01-30 -120 -50
3 2017-01-30 100 50
3 2017-01-30 50 0
3 2017-01-30 60 40
the output should look like, I have no clue how to do the subtraction between rows and columns.
id date due amount
1 2017-01-02 0
2 2017-01-15 0
3 2017-01-30 40
here is my current code, but it only works on maybe 1 and 2 but definitely not working for 3.
the logic for this part is to find the due amount between each entry for each id. for example, id 1 has two entry, total amount 100, then he paid 50, so the adj amount is 50, and the second entry, the total amount is 50, he paid 50, te adj amount is 0. so id 1 due amount is 0 in the end.
id 3 who has 5 entries, first there is entry show the total amount for ID 3 is 120 and he paid 70, so the adj amount is 50, but the first entry is a mistake, so all amount revised. then the third entry shows the total amount is 100, ID 3 paid 50, so the adj amount is 50. then the fourth entry shows the total amount is 50, ID 3 also paid 50, so the adj amount is 0. and the fifth entry shows that the total amount is 60, and ID 3 paid 20, so the adj amount is 40. so in final, ID 3 due amount is 40;
select distinct a.id,
a.date,
case when a.date=b.date and a.total_amount = b.adj_amount then a.adj_amount
when a.date=b.date and a.total_amount <> b.adj_amount then ABS(a.adj_amount + b.adj_amount)
else a.adj_amount
end as due_amount
from table a,
table b
where a.id=b.id;
I just wonder if there has any function which can do this kind of calculation between rows and columns.
Use GROUP BY and SUM().
SELECT the_date, SUM(due_amount)
FROM tab
GROUP BY the_date;
Something like this could work - if the transactions can be ordered. Note that I've renamed some of the columns to help clarify their meaning. I've also added a trans_seq_num column to indicate the order of a customer's transactions on a particular date. I think you're looking for the amount that the customer still owes as of their last payment.
WITH sample (id, trans_seq_num, some_date, starting_balance, ending_balance) AS
(
SELECT '1',1,'2017-01-02','100','50' FROM dual UNION ALL
SELECT '1',2,'2017-01-02','50','0' FROM dual UNION ALL
SELECT '2',1,'2017-01-15','35','0' FROM dual UNION ALL
SELECT '2',2,'2017-01-15','100','35' FROM dual UNION ALL
SELECT '3',1,'2017-01-30','120','50' FROM dual UNION ALL
SELECT '3',2,'2017-01-30','-120','-50' FROM dual UNION ALL
SELECT '3',3,'2017-01-30','100','50' FROM dual UNION ALL
SELECT '3',4,'2017-01-30','50','0' FROM dual UNION ALL
SELECT '3',5,'2017-01-30','60','40' FROM dual
)
SELECT DISTINCT id,
some_date,
LAST_VALUE(ending_balance) OVER (PARTITION BY id ORDER BY trans_seq_num RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) day_balance
FROM sample
ORDER BY 1,2,3;
ID SOME_DATE AMOUNT_DUE
----- --------------- ---------------
1 2017-01-02 0
2 2017-01-15 35
3 2017-01-30 40
The others already said: you should have any way of numbering rows. Simple sequence will do the job. With such unique column solution is trivial, we only find last row for each id.
But you have no order. Here is my try which looks OK so far and may temporary help:
with q as (
select table_a.*,
row_number() over (partition by id, date_, total_amount, adj_amount
order by null) rn
from table_a),
t as (
select a.*,
row_number() over (partition by id, date_, total_amount
order by null) r1,
row_number() over (partition by id, date_, adj_amount
order by null) r2
from q a
where not exists (
select 1 from q b
where a.id = b.id and a.date_ = b.date_ and a.rn = b.rn
and a.total_amount = -b.total_amount and a.adj_amount = -b.adj_amount))
select id, date_, max(adj_amount) due
from t
where connect_by_isleaf = 1
connect by prior id = id and prior date_ = date_
and prior adj_amount = total_amount and prior r2 = r1
group by id, date_;
dbfiddle
First I eliminate mistakes. Subquery t does this, it is simple not exists with added row_number to handle properly multiple cases ( like (120, 50) => (-120, -50) and again (120, 50) ).
Data is cleared so we can recursively find connected rows by previous adj_amount = total_amount. We have to use row_numbers again to handle identical rows (60, 40) => (40, 0) => (60, 40) again.
Then only leaves are taken and finally max value of these leaves which should contain orphaned non zero value if such exists for each id. You can add connect_by_path() clause to see if connection works properly.
Hierarchical queries are slower than others, so if your table is big, be warned. Filter data at first, if needed.
This query works for your examples and some others which I imagined and tested. But even if it works you should add ordering column (if possible) and have guaranteed, simple way to obtain correct results.