Select 2 most recent dates from the table - SQL

I have a table like below
ID_NUMBER  SALEDATA    SALEAMOUNT
1          2020-09-07  47,000
2          2020-03-25  51,470
3          2021-06-12  32,000
4          2018-10-12  37,560
I want to select only the rows with the 2 most recent dates. So my desired output would be like below
ID_NUMBER  SALEDATA    SALEAMOUNT
1          2020-09-07  47,000
3          2021-06-12  32,000
Can someone please guide me on where I would start with this in SQL? I tried using MAX(), but it only gives me the single most recent date.
Thank you!

In Standard SQL, you would use:
select t.*
from t
order by saledata desc
offset 0 row fetch first 2 row only;
Not all databases support fetch first. It might be spelled limit or select top or something else, depending on your database.
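As a rough illustration of the "spelled limit" variant, here is a minimal sketch that runs the same idea against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in chosen so the example is runnable; it is not the asker's database, and the dates are stored as ISO-8601 text, which is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id_number INTEGER, saledata TEXT, saleamount INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2020-09-07", 47000),
        (2, "2020-03-25", 51470),
        (3, "2021-06-12", 32000),
        (4, "2018-10-12", 37560),
    ],
)

# ISO-8601 date strings sort chronologically, so ordering the text column works.
top_two = conn.execute(
    "SELECT id_number, saledata, saleamount FROM t ORDER BY saledata DESC LIMIT 2"
).fetchall()
print(top_two)
```

The rows come back newest first; wrap the query in an outer select with its own order by if the original order matters.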

Another option, with the rank analytic function. Sample data till line #7, query begins at line #9. See comments within code.
SQL> with test (id_number, saledata, saleamount) as
2 -- sample data
3 (select 1, date '2020-09-07', 47000 from dual union all
4 select 2, date '2020-03-25', 51470 from dual union all
5 select 3, date '2021-06-12', 32000 from dual union all
6 select 4, date '2018-10-12', 37560 from dual
7 )
8 -- sort them by date in descending order, fetch the first two rows
9 select id_number, saledata, saleamount
10 from (select t.*,
11 rank() over (order by saledata desc) rn
12 from test t
13 )
14 where rn <= 2
15 order by saledata;
ID_NUMBER SALEDATA SALEAMOUNT
---------- ---------- ----------
1 2020-09-07 47000
3 2021-06-12 32000
SQL>
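One thing worth knowing about this approach: rank() assigns the same rank to tied dates, so the filter rn <= 2 can return more than two rows, while row_number() always returns exactly two. The sketch below shows the difference using SQLite through Python's sqlite3 module (a stand-in engine, assuming SQLite 3.25 or later for window functions). The fifth row is hypothetical and was added only to create a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id_number INTEGER, saledata TEXT, saleamount INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        (1, "2020-09-07", 47000),
        (2, "2020-03-25", 51470),
        (3, "2021-06-12", 32000),
        (4, "2018-10-12", 37560),
        (5, "2020-09-07", 12345),  # hypothetical row that ties with id 1
    ],
)

def top_two(func):
    """Return ids whose rank (per the given window function) is 1 or 2."""
    sql = f"""
        SELECT id_number FROM (
            SELECT t.*, {func}() OVER (ORDER BY saledata DESC) AS rn
            FROM test t
        )
        WHERE rn <= 2
        ORDER BY id_number
    """
    return [row[0] for row in conn.execute(sql)]

rank_ids = top_two("rank")              # ties share a rank, so three ids qualify
row_number_ids = top_two("row_number")  # exactly two ids, tie broken arbitrarily
print(rank_ids, row_number_ids)
```

Which behaviour is right depends on whether "the 2 most recent dates" means two rows or two distinct dates.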

Use select top 2 on your data set, ordered by the saledata column descending.

Related

Oracle: Get latest value from a group by query, among other aggregations

I have a group by query returning avg and max from a set of records. I need to return a new column with the latest value of a column ("records") based on another column ("dates").
This query
with x as (select 'A' process, 10 records, sysdate-5 dates from dual union all
select 'A' process, 20 records, sysdate-4 dates from dual union all
select 'A' process, 30 records, sysdate-3 dates from dual union all
select 'B' process, 25 records, sysdate-2 dates from dual union all
select 'B' process, 15 records, sysdate-1 dates from dual)
select process,
avg(records) avgu,
max(records) maxu
from x
group by process
order by 1
returns:
PROCESS  AVG  MAX
A        20   30
B        20   25
I need a new column (LATEST) with the latest value of records based on dates, keeping the old columns too:
PROCESS  MAX  LATEST
A        30   30
B        25   15
I'm playing with some window functions like RANK OVER PARTITION but I can't get the desired outcome in a single query.
Thank you in advance for any idea.
Here's one option:
Sample data:
SQL> with x as (
       select 'A' process, 10 records, sysdate-5 dates from dual union all
       select 'A', 20, sysdate-4 from dual union all
       select 'A', 30, sysdate-3 from dual union all
       select 'B', 25, sysdate-2 from dual union all
       select 'B', 15, sysdate-1 from dual),
Query begins here: first find the latest value per each process, then - in the final query - aggregate required values.
     temp as
       (select process,
               records,
               dates,
               first_value(records) over (partition by process order by dates desc) latest
        from x
       )
     select process,
            avg(records) avgu,
            max(records) maxu,
            max(latest) latest
     from temp
     group by process
     order by 1;
P AVGU MAXU LATEST
- ---------- ---------- ----------
A 20 30 30
B 20 25 15
SQL>
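A short sketch of why the query above uses first_value with a descending order rather than last_value with an ascending one: with the default window frame, last_value only looks as far as the current row, so it simply echoes each row's own value. The example uses SQLite through Python's sqlite3 module as a stand-in for Oracle (assuming SQLite 3.25 or later), and fixed dates replace the sysdate offsets; both are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (process TEXT, records INTEGER, dates TEXT)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?)",
    [
        ("A", 10, "2024-01-01"),
        ("A", 20, "2024-01-02"),
        ("A", 30, "2024-01-03"),
        ("B", 25, "2024-01-04"),
        ("B", 15, "2024-01-05"),
    ],
)

def latest_per_process(window_expr):
    """Aggregate the per-row window value the same way the answer does."""
    sql = f"""
        WITH temp AS (
            SELECT process, {window_expr} AS latest FROM x
        )
        SELECT process, MAX(latest) FROM temp GROUP BY process ORDER BY process
    """
    return conn.execute(sql).fetchall()

# Descending order: the first row of every frame is the newest record.
good = latest_per_process(
    "first_value(records) OVER (PARTITION BY process ORDER BY dates DESC)"
)
# Ascending order with the default frame: last_value is just the current row.
naive = latest_per_process(
    "last_value(records) OVER (PARTITION BY process ORDER BY dates)"
)
print(good, naive)
```

For process B the naive version reports 25 (the maximum) instead of 15 (the latest), which is the wrong answer for the LATEST column.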

Do partial row in BigQuery to get last data and order by id

I want to get the last id (the highest id) and its rank, where the rank is computed by ordering on date_update ascending, and show that id together with its rank. I wrote the query below:
SELECT id as data,
RANK() OVER (ORDER BY date_update) AS rank
FROM `test.sample`
ORDER BY id DESC
LIMIT 1
It works for other tables, but it fails on some tables with large data, with this error:
Resources exceeded during query execution: The query could not be executed in the allotted memory.
I have read Troubleshooting Error Big Query
and tried removing the ORDER BY, but the query still does not run. What should I do?
sample data:
id date_update
22 2019-10-04
14 2019-10-01
24 2019-10-03
13 2019-10-02
process :
Rank() Over (Order by date_update)
id date_update rank
14 2019-10-01 1
13 2019-10-02 2
24 2019-10-03 3
22 2019-10-04 4
order by id desc based on above
id date_update rank
24 2019-10-03 3
22 2019-10-04 4
14 2019-10-01 1
13 2019-10-02 2
this is the expected result:
id rank
24 3
You can use the query below. It basically finds the row with the max ID (latest ID), then queries the source table again, using the date_update of that row as a filter.
WITH
`test.sample` AS
(
select 22 AS id, DATE('2019-10-04') as date_update union all
select 14 AS id, DATE('2019-10-01') as date_update union all
select 24 AS id, DATE('2019-10-03') as date_update union all
select 13 AS id, DATE('2019-10-02') as date_update
),
max_id_row AS
(
SELECT ARRAY_AGG(STRUCT(id, date_update) ORDER BY id DESC LIMIT 1)[OFFSET(0)] vals
FROM `test.sample`
)
SELECT m.vals.id, m.vals.date_update, COUNT(*) as rank
FROM `test.sample` as t
JOIN max_id_row as m
ON t.date_update <= m.vals.date_update
GROUP BY 1,2
Below is for BigQuery Standard SQL and should scale to whatever "large" data you have
#standardSQL
SELECT b.id, COUNT(1) + 1 AS `rank`
FROM `project.dataset.table` a
JOIN (
SELECT ARRAY_AGG(STRUCT(id, date_update) ORDER BY id DESC LIMIT 1)[OFFSET(0)].*
FROM `project.dataset.table`
) b
ON a.date_update < b.date_update
GROUP BY id
Applied to the sample data in your question:
WITH `project.dataset.table` AS (
SELECT 22 id, DATE '2019-10-04' date_update UNION ALL
SELECT 14, '2019-10-01' UNION ALL
SELECT 24, '2019-10-03' UNION ALL
SELECT 13, '2019-10-02'
)
result is
Row id rank
1 24 3
The "trick" here is to shift focus from non-scalable code, with operations that parallelize badly or not at all (RANK), to something as simple as COUNT'ing.
So your case (at least as it is presented in the "process" section of the question) can be rephrased as finding the number of rows before the day with the highest id, hence the simple query above. Adding 1 to that count gives you exactly what RANK would have given you had it worked.
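To check that the counting approach matches RANK on the question's sample data, here is a small sketch using SQLite through Python's sqlite3 module. SQLite is only a runnable stand-in for BigQuery (assuming SQLite 3.25 or later for RANK), and the table name sample is an assumption. The LEFT JOIN is a small deviation from the answer so that the latest id still gets rank 1 when no earlier rows exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER, date_update TEXT)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [(22, "2019-10-04"), (14, "2019-10-01"), (24, "2019-10-03"), (13, "2019-10-02")],
)

# Window-function version: rank every row, then keep the highest id.
via_rank = conn.execute("""
    SELECT id, rnk FROM (
        SELECT id, RANK() OVER (ORDER BY date_update) AS rnk FROM sample
    )
    ORDER BY id DESC LIMIT 1
""").fetchone()

# Counting version: find the highest id, count strictly earlier rows, add 1.
via_count = conn.execute("""
    SELECT b.id, COUNT(a.id) + 1
    FROM (SELECT id, date_update FROM sample ORDER BY id DESC LIMIT 1) AS b
    LEFT JOIN sample AS a ON a.date_update < b.date_update
    GROUP BY b.id
""").fetchone()
print(via_rank, via_count)
```

Both versions report id 24 with rank 3. The counting version only needs one aggregate instead of a global sort of every row.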

Oracle PARTITION BY GROUPING_ID with SUM

I'm trying to implement a simple data warehouse analytic query, dealing with 'YEAR_VALUE', 'MONTH_VALUE' and a 'INVOICE_COST'
SELECT YEAR_VALUE, MONTH_VALUE, SUM (INVOICE_VALUE) AS TOTAL_INVOICE,
RANK () OVER (PARTITION BY GROUPING_ID (YEAR_VALUE, MONTH_VALUE) ORDER BY SUM (INVOICE_VALUE) DESC) AS YEAR_RANK,
RANK () OVER (PARTITION BY YEAR_VALUE, GROUPING_ID (MONTH_VALUE) ORDER BY SUM (INVOICE_VALUE) DESC) AS MONTH_RANK
FROM FACT_WH
JOIN TIME_WH ON TIME_WH.TIME_ID = FACT_WH.TIME_ID
GROUP BY (YEAR_VALUE, MONTH_VALUE);
The output is: [image: query output]
'YEAR_RANK' should express a year's total invoice value compared to other years: 2016 has a YEAR_RANK=1 and 2015 has a YEAR_RANK=2.
The problem is that 'YEAR_RANK' has the values 1,2,3,4,5 when it should be 1,1,2,2,1.
I can't find the problem in my code. It may be in line #2; I have tried everything and wasted much time already.
Thanks in advance.
A good approach, especially when the query is complex and/or delivers confusing results, is to divide the whole query into subqueries, each solving a particular task.
In your case I'd recommend first tackling the join of the fact and dimension tables and the group by on year and month to calculate the total_invoice.
You get results such as
YEAR_VALUE MONTH_VALUE TOTAL_INVIOCE
---------- ----------- -------------
2016 3 29960
2016 1 10700
2015 11 5100
2015 8 1680
2016 2 800
Note that you don't need any GROUP BY extension such as GROUPING_ID; you can solve everything using analytic functions.
In the next step (using the previous result as a factored subquery) you calculate the year and month totals using the analytic version of SUM.
In the last step you calculate the RANK. Note that for the year you need a DENSE_RANK; otherwise you get 'skipped' ranks such as 1,3 (due to repeated records for one year).
The year_rank is not partitioned at all; the month_rank is partitioned on YEAR, as you order the months within a year.
with data as (
-- perform join and group by in this subquery
select 2016 year_value, 3 month_value, 29960 total_invioce from dual union all
select 2016 year_value, 1 month_value, 10700 total_invioce from dual union all
select 2015 year_value, 11 month_value, 5100 total_invioce from dual union all
select 2015 year_value, 8 month_value, 1680 total_invioce from dual union all
select 2016 year_value, 2 month_value, 800 total_invioce from dual),
year_month as (
-- perform year and month summary here
select
year_value, month_value, total_invioce,
sum(total_invioce) over (partition by year_value) total_invoice_year,
sum(total_invioce) over (partition by year_value, month_value) total_invoice_month
from data
)
-- perform ranking here
select year_value, month_value, total_invioce,
dense_rank() OVER (ORDER BY total_invoice_year DESC) year_rank,
rank() OVER (partition by year_value ORDER BY total_invoice_month DESC) month_rank
from year_month
order by total_invioce desc;
YEAR_VALUE MONTH_VALUE TOTAL_INVIOCE YEAR_RANK MONTH_RANK
---------- ----------- ------------- ---------- ----------
2016 3 29960 1 1
2016 1 10700 1 2
2015 11 5100 2 1
2015 8 1680 2 2
2016 2 800 1 3
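To make the DENSE_RANK point concrete, the sketch below ranks the same five rows by their year total with both RANK and DENSE_RANK. It uses SQLite through Python's sqlite3 module as a stand-in for Oracle (assuming SQLite 3.25 or later); the table name invoices and the corrected column spelling total_invoice are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (year_value INTEGER, month_value INTEGER, total_invoice INTEGER)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(2016, 3, 29960), (2016, 1, 10700), (2015, 11, 5100), (2015, 8, 1680), (2016, 2, 800)],
)

rows = conn.execute("""
    WITH year_month AS (
        SELECT year_value, month_value, total_invoice,
               SUM(total_invoice) OVER (PARTITION BY year_value) AS total_year
        FROM invoices
    )
    SELECT year_value, month_value,
           RANK()       OVER (ORDER BY total_year DESC) AS plain_rank,
           DENSE_RANK() OVER (ORDER BY total_year DESC) AS dense
    FROM year_month
    ORDER BY total_invoice DESC
""").fetchall()
for row in rows:
    print(row)
```

With RANK, the three 2016 rows tie at 1 and the 2015 rows jump to 4; DENSE_RANK gives the 2015 rows the expected 2.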

How to make a time dependent distribution in SQL?

I have an SQL table in which I keep project information coming from Primavera.
Suppose that I have columns for Start Date, End Date, Duration, and Total Qty as shown below.
How can I distribute Total Qty over months using this information? What kind of additional columns or SQL queries do I need in order to get a correct monthly distribution?
Thanks in advance.
Columns in order:
itemname,quantity,startdate,duration,enddate
item1 -- 108 -- 2013-03-25 -- 720 -- 2013-07-26
item2 -- 640 -- 2013-03-25 -- 720 -- 2013-07-26
.
.
I think the key is to break the records apart by month. Here is an example of how to do it:
with months as (
select 1 as mon union all select 2 union all select 3 union all
select 4 as mon union all select 5 union all select 6 union all
select 7 as mon union all select 8 union all select 9 union all
select 10 as mon union all select 11 union all select 12
)
select t.itemname, m.mon, t.quantity / t.nummonths
from (select t.*, (month(enddate) - month(startdate) + 1) as nummonths
from t
) t join
months m
on month(t.startDate) <= m.mon and
month(t.endDate) >= m.mon;
This works because all the months are within the same year -- as in your example. You are quite vague on how the split should be calculated. So, I assumed that every month from the start to the end gets an equal amount.
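The equal-split assumption can be tried out with the sketch below, which runs against SQLite through Python's sqlite3 module. SQLite is a stand-in for the asker's unnamed database: it has no month() function, so strftime('%m', ...) is used instead, and a recursive CTE generates the twelve month numbers. The table name t and the multiplication by 1.0 (to avoid integer division) are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (itemname TEXT, quantity INTEGER, startdate TEXT, enddate TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("item1", 108, "2013-03-25", "2013-07-26"),
        ("item2", 640, "2013-03-25", "2013-07-26"),
    ],
)

rows = conn.execute("""
    WITH RECURSIVE months(mon) AS (
        SELECT 1 UNION ALL SELECT mon + 1 FROM months WHERE mon < 12
    ),
    spans AS (
        SELECT itemname, quantity,
               CAST(strftime('%m', startdate) AS INTEGER) AS first_mon,
               CAST(strftime('%m', enddate)   AS INTEGER) AS last_mon
        FROM t
    )
    -- One output row per item and month; each month gets an equal share.
    SELECT s.itemname, m.mon, 1.0 * s.quantity / (s.last_mon - s.first_mon + 1)
    FROM spans s
    JOIN months m ON m.mon BETWEEN s.first_mon AND s.last_mon
    ORDER BY s.itemname, m.mon
""").fetchall()
for row in rows:
    print(row)
```

Each item spans months 3 through 7, so it yields five rows. Like the query above, this only holds when start and end dates fall in the same year.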

select maximum score grouped by date, display full datetime

Gday, I have a table that shows a series of scores and the datetimes at which those scores occurred.
I'd like to select the maximum of these scores for each day, but display the datetime that the score occurred.
I am using an Oracle database (10g) and the table is structured like so:
scoredatetime score (integer)
---------------------------------------
01-jan-09 00:10:00 10
01-jan-09 01:00:00 11
01-jan-09 04:00:01 9
...
I'd like to be able to present the results such that the above becomes:
01-jan-09 01:00:00 11
The following query gets me halfway there, but not all the way.
select
trunc(t.scoredatetime), max(t.score)
from
mytable t
group by
trunc(t.scoredatetime)
I cannot join on score only because the same high score may have been achieved multiple times throughout the day.
I appreciate your help!
Simon Edwards
with mytableRanked as (
select
scoredatetime,
score,
row_number() over (
partition by trunc(scoredatetime)
order by score desc, scoredatetime desc
) rk
from mytable
)
select
scoredatetime,
score
from mytableRanked
where rk = 1
order by scoredatetime desc
In the case of multiple high scores within a day, this returns the row corresponding to the one that occurred latest in the day. If you want to see all highest scores in a day, replace row_number with rank and remove scoredatetime desc from the order by specification in the window.
Alternatively, you can do this (it will list ties of high score for a date):
select
scoredatetime,
score
from mytable
where not exists (
select *
from mytable M2
where trunc(M2.scoredatetime) = trunc(mytable.scoredatetime)
and M2.score > mytable.score
)
order by scoredatetime desc
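The NOT EXISTS form can be tried out with the sketch below, which keeps a row only when no other row on the same day has a strictly higher score. It runs on SQLite through Python's sqlite3 module as a stand-in for Oracle, so date() replaces trunc(). The six sample rows, including the tie on the second day, are hypothetical data chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (scoredatetime TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("2009-01-01 00:10:00", 10),
        ("2009-01-01 01:00:00", 11),
        ("2009-01-01 04:00:00", 9),
        ("2009-01-02 00:10:00", 1),
        ("2009-01-02 01:00:00", 1),
        ("2009-01-02 04:00:00", 0),
    ],
)

best_per_day = conn.execute("""
    SELECT scoredatetime, score
    FROM mytable
    WHERE NOT EXISTS (
        SELECT 1
        FROM mytable m2
        WHERE date(m2.scoredatetime) = date(mytable.scoredatetime)
          AND m2.score > mytable.score
    )
    ORDER BY scoredatetime DESC
""").fetchall()
for row in best_per_day:
    print(row)
```

Because the comparison is strict, both rows that tie for the best score on the second day are returned.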
First of all, you did not yet specify what should happen if two or more rows within the same day contain an equal high score.
Two possible answers to that question:
1) Just select one of the scoredatetimes; it doesn't matter which one.
In this case don't use self joins or analytics as you see in the other answers, because there is a special aggregate function that can do the job more efficiently. An example:
SQL> create table mytable (scoredatetime,score)
     as
     select to_date('01-jan-2009 00:10:00','dd-mon-yyyy hh24:mi:ss'), 10 from dual union all
     select to_date('01-jan-2009 01:00:00','dd-mon-yyyy hh24:mi:ss'), 11 from dual union all
     select to_date('01-jan-2009 04:00:00','dd-mon-yyyy hh24:mi:ss'), 9 from dual union all
     select to_date('02-jan-2009 00:10:00','dd-mon-yyyy hh24:mi:ss'), 1 from dual union all
     select to_date('02-jan-2009 01:00:00','dd-mon-yyyy hh24:mi:ss'), 1 from dual union all
     select to_date('02-jan-2009 04:00:00','dd-mon-yyyy hh24:mi:ss'), 0 from dual
     /
Table created.
SQL> select max(scoredatetime) keep (dense_rank last order by score) scoredatetime
          , max(score)
       from mytable
      group by trunc(scoredatetime,'dd')
     /
SCOREDATETIME MAX(SCORE)
------------------- ----------
01-01-2009 01:00:00 11
02-01-2009 01:00:00 1
2 rows selected.
2) Select all records with the maximum score.
In this case you need analytics with a RANK or DENSE_RANK function. An example:
SQL> select scoredatetime
          , score
       from ( select scoredatetime
                   , score
                   , rank() over (partition by trunc(scoredatetime,'dd') order by score desc) rnk
                from mytable
            )
      where rnk = 1
     /
SCOREDATETIME SCORE
------------------- ----------
01-01-2009 01:00:00 11
02-01-2009 00:10:00 1
02-01-2009 01:00:00 1
3 rows selected.
Regards,
Rob.
You might need two SELECT statements to pull this off: the first to collect the truncated date and associated max score, and the second to pull in the actual datetime values associated with the score.
Try:
SELECT T.ScoreDateTime, T.Score
FROM
(
SELECT
TRUNC(T.ScoreDateTime) ScoreDate, MAX(T.score) BestScore
FROM
MyTable T
GROUP BY
TRUNC(T.ScoreDateTime)
) ByDate
INNER JOIN MyTable T
ON TRUNC(T.ScoreDateTime) = ByDate.ScoreDate and T.Score = ByDate.BestScore
ORDER BY T.ScoreDateTime DESC
This will pull in best score ties as well.
For a version which selects only the most recently-posted high score for each day, join on the best score as before, then keep the latest datetime among the matching rows:
SELECT MAX(T.ScoreDateTime) ScoreDateTime, ByDate.BestScore Score
FROM
(
SELECT
TRUNC(T.ScoreDateTime) ScoreDate,
MAX(T.score) BestScore
FROM
MyTable T
GROUP BY
TRUNC(T.ScoreDateTime)
) ByDate
INNER JOIN MyTable T
ON TRUNC(T.ScoreDateTime) = ByDate.ScoreDate and T.Score = ByDate.BestScore
GROUP BY ByDate.ScoreDate, ByDate.BestScore
ORDER BY 1 DESC
This returns one record per date. Taking MAX(T.ScoreDateTime) in the inner query instead would pick the last score of the day, which is not necessarily the best one.