Find the nth greatest value per group in SQL - sql

I'm trying to find the nth greatest value in each group in a table; is there an efficient way to do this in SQL? (specifically Google BigQuery, if that's relevant)
For example, suppose we had a table sales with two fields, customer_id and amount, where each record corresponds to the sale of an item to a customer for a given amount. If I wanted the top sale to each customer, I could do
SELECT customer_id, MAX(amount) top_amount
FROM sales
GROUP BY customer_id;
If I instead wanted the 5th greatest value for each customer, is there an efficient/idiomatic way to do that in SQL?

Consider the approach below:
SELECT customer_id, array_agg(amount order by amount desc limit 5)[safe_offset(4)] top_5th_amount
FROM sales
GROUP BY customer_id;

Another option is the nth_value() window function:
SELECT distinct customer_id,
nth_value(amount, 5) over win top_5th_amount
FROM sales
window win as (partition by customer_id order by amount desc rows between unbounded preceding and unbounded following )
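To see why the explicit frame matters, here is a minimal sketch that runs the same kind of query through Python's sqlite3 module. SQLite stands in for BigQuery here (an assumption; it needs SQLite 3.25+ for window functions), and the sample rows and the helper name nth_value_per_customer are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (customer_id, amount).
SALES = [(1, 50), (1, 50), (1, 40), (1, 30), (2, 20), (2, 10), (3, 5)]

FULL_FRAME = "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"


def nth_value_per_customer(rows, n, frame=FULL_FRAME):
    """Return distinct (customer_id, nth greatest amount) pairs via nth_value()."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (customer_id INTEGER, amount INTEGER)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    query = f"""
        SELECT DISTINCT customer_id,
               nth_value(amount, {int(n)}) OVER win AS nth_amount
        FROM sales
        WINDOW win AS (PARTITION BY customer_id ORDER BY amount DESC {frame})
    """
    result = con.execute(query).fetchall()
    con.close()
    # Sort in Python so the placement of NULL (None) does not depend on the engine.
    return sorted(result, key=lambda r: (r[0], r[1] is not None, r[1] or 0))


# Full frame: one row per customer; NULL when the customer has fewer than n sales.
full = nth_value_per_customer(SALES, 2)

# Default frame (ends at the current row): early rows cannot "see" the nth row yet,
# so DISTINCT leaves an extra NULL row for customer 2.
default = nth_value_per_customer(SALES, 2, frame="")
```

With the full frame every row in a partition gets the same answer, so DISTINCT collapses it to one row per customer. Without it, customer 2 shows up twice in this sample.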

You can use qualify:
select s.*
from sales s
where 1=1
qualify row_number() over (partition by customer_id order by amount desc) = 5;
Note: Your question is unclear on how to handle tied amounts. This treats them as separate amounts (so the 5th could be the same as the 1st). If you want the 5th largest distinct value, use dense_rank() instead.
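The tie-handling difference is easy to check on a small table. The sketch below uses Python's sqlite3 module as a stand-in engine (an assumption; SQLite has no QUALIFY, so the filter goes in an outer WHERE), with made-up data and a hypothetical helper nth_amounts.

```python
import sqlite3

# Hypothetical sample data: customer 1 has a tie at the top (50, 50).
SALES = [(1, 50), (1, 50), (1, 40), (1, 30), (2, 20), (2, 10)]


def nth_amounts(rows, n, distinct=False):
    """Return (customer_id, amount) for the nth greatest amount per customer.

    distinct=False ranks with row_number() (ties count as separate rows);
    distinct=True ranks with dense_rank() (ties share one position).
    """
    rank_fn = "dense_rank" if distinct else "row_number"
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (customer_id INTEGER, amount INTEGER)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    query = f"""
        SELECT DISTINCT customer_id, amount
        FROM (
            SELECT customer_id, amount,
                   {rank_fn}() OVER (PARTITION BY customer_id
                                     ORDER BY amount DESC) AS pos
            FROM sales
        )
        WHERE pos = ?
        ORDER BY customer_id
    """
    result = con.execute(query, (n,)).fetchall()
    con.close()
    return result


second_by_row = nth_amounts(SALES, 2)                   # ties are separate rows
second_distinct = nth_amounts(SALES, 2, distinct=True)  # ties share a rank
```

For customer 1 the second row is still 50 under row_number(), while the second distinct value under dense_rank() is 40. The DISTINCT in the outer query is there because dense_rank() can put several tied rows at the same position.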

Related

Oracle SQL Return First & Last Value From Different Columns By Partition

I need help with a query that will return a single record per partition in the below dataset. I used the DENSE_RANK to get the order and first/last position within each partition, but the problem is that I need to get a single record for each EMPLOYEE ITEM_ID combination which contains:
MIN(START) which is date type with time
SUM(DURATION) which is a number type signifying seconds of activity
MIN ranked value from INIT_STATUS
MAX ranked value from FIN_STATUS
Here is the initial data table, the same data table ordered with rank, and the desired result at the end (see image below):
Also, here is the code used to get the ordered table with rank values:
SELECT T.*,
DENSE_RANK() OVER (PARTITION BY T.EMPLOYEE, T.ITEM_ID ORDER BY T.START) AS D_RANK
FROM TEST_DATA T
ORDER BY T.EMPLOYEE, T.ITEM_ID, T.START;
Use the first/last option (keep ... dense_rank first/last) to find the statuses. The rest is classic aggregation:
select employee, item_id, min(start_), sum(duration),
max(init_status) keep (dense_rank first order by start_),
max(fin_status) keep (dense_rank last order by start_)
from test_data t
group by employee, item_id
order by employee, item_id;
start is a reserved word, so I used start_ for my test.
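KEEP (DENSE_RANK FIRST/LAST) is Oracle-specific. For engines without it, a rough equivalent can be built from FIRST_VALUE/LAST_VALUE window functions. The sketch below runs that variant through Python's sqlite3 module (SQLite 3.25+ assumed); the rows are invented, and it assumes start_ is unique within each employee/item_id group, since the window version does not break ties the way max(...) keep (...) does.

```python
import sqlite3

# Hypothetical rows: (employee, item_id, start_, duration, init_status, fin_status),
# deliberately inserted out of chronological order.
ROWS = [
    ("A", 1, "2020-01-01 09:00", 30, "OPEN", "DONE"),
    ("A", 1, "2020-01-01 08:00", 60, "NEW", "OPEN"),
    ("A", 2, "2020-01-02 10:00", 15, "NEW", "DONE"),
]

QUERY = """
    SELECT DISTINCT employee, item_id,
           MIN(start_)   OVER grp AS first_start,
           SUM(duration) OVER grp AS total_duration,
           FIRST_VALUE(init_status) OVER ord AS init_status,
           LAST_VALUE(fin_status)   OVER ord AS fin_status
    FROM test_data
    WINDOW grp AS (PARTITION BY employee, item_id),
           ord AS (PARTITION BY employee, item_id ORDER BY start_
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ORDER BY employee, item_id
"""


def summarize(rows):
    """Return one summary row per (employee, item_id)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE test_data (employee TEXT, item_id INTEGER, start_ TEXT, "
        "duration INTEGER, init_status TEXT, fin_status TEXT)"
    )
    con.executemany("INSERT INTO test_data VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


summary = summarize(ROWS)
```

The full frame on the ord window is what lets LAST_VALUE look at the whole group rather than stopping at the current row.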

Oracle SQL rank query

I would like help with one of my queries.
Here's the requirement:-
I have to report the records whose difference between current review date and last review date is between than 365 days and more than 455 days. However, the catch here is that my customer table has just one column for the annual review date. So I have to check the historical table to find the current annual review date which in the below example is 30/04/2019 and the last review date is 30/04/2018.
How do I get just 1 line item for each record?
Below is what my table looks like. RNK is a calculated column that determines the rank of each record; the remaining columns come from the table. Please help! I use Oracle 12c.
You may use the row_number() analytic function for your rnk column, as in the following select statement:
select row_number() over (partition by annual_review_date order by update_date) as rnk,
t.*
from tab t;
If I understand correctly, you can use dense_rank():
select t.id, max(annual_review_dt) as latest_ard,
min(annual_review_dt) as prev_ard
from (select t.*,
dense_rank() over (partition by id order by annual_review_dt desc) as seqnum
from t
) t
where seqnum in (1, 2)
group by t.id;
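Here is a small runnable sketch of the dense_rank() idea, ranking in descending date order and grouping by id so that each record comes back as a single line. It uses Python's sqlite3 module in place of Oracle (an assumption), with invented history rows stored as ISO date strings.

```python
import sqlite3

# Hypothetical history rows: (id, annual_review_dt). The repeated 2019 date
# mimics a record that was updated more than once in the same review cycle.
HISTORY = [
    (1, "2017-04-30"), (1, "2018-04-30"), (1, "2019-04-30"), (1, "2019-04-30"),
    (2, "2019-01-15"),
]

QUERY = """
    SELECT id,
           MAX(annual_review_dt) AS latest_ard,
           MIN(annual_review_dt) AS prev_ard
    FROM (
        SELECT id, annual_review_dt,
               DENSE_RANK() OVER (PARTITION BY id
                                  ORDER BY annual_review_dt DESC) AS seqnum
        FROM t
    )
    WHERE seqnum IN (1, 2)   -- keep only the two most recent distinct dates
    GROUP BY id
    ORDER BY id
"""


def latest_two_reviews(rows):
    """Return (id, latest review date, previous review date) per id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, annual_review_dt TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


reviews = latest_two_reviews(HISTORY)
```

An id with only one review date gets the same value in both columns, so a gap filter on top of this would need to decide how to treat that case.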

Finding max of transaction_date for corresponding code

I'm trying to find out how to find max of transaction_date per EAN_code
My table looks like:
Transaction_Date EAN_Code
09/04/2018 3029440000286
09/04/2018 3029440000286
08/04/2018 5000128221139
14/04/2018 5000128221139
08/04/2018 5000128221139
10/04/2018 5000128221108
Essentially, for each item in the list we want to pull out the latest date on which it was sold, i.e. one row per product with its last sale date.
Both columns have non distinct values.
Simply do a GROUP BY. Use MAX() to get the latest date for each product.
select EAN_Code, max(Transaction_Date)
from tablename
group by EAN_Code
You could use ROW_NUMBER/RANK:
SELECT *
FROM (SELECT *,ROW_NUMBER() OVER(PARTITION BY Ean_Code
ORDER BY Transaction_Date DESC) AS rn
FROM table_name) s
WHERE s.rn = 1;
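Both answers give the same result for this question. The sketch below checks that on the sample rows, using Python's sqlite3 module (an assumed stand-in engine) and a hypothetical table name sales_log. The dates are rewritten as ISO strings, because MAX() on dd/mm/yyyy text would compare lexically rather than chronologically.

```python
import sqlite3

# The question's sample rows, with dates converted to ISO format (yyyy-mm-dd).
ROWS = [
    ("2018-04-09", "3029440000286"),
    ("2018-04-09", "3029440000286"),
    ("2018-04-08", "5000128221139"),
    ("2018-04-14", "5000128221139"),
    ("2018-04-08", "5000128221139"),
    ("2018-04-10", "5000128221108"),
]

GROUP_BY_QUERY = """
    SELECT ean_code, MAX(transaction_date)
    FROM sales_log
    GROUP BY ean_code
    ORDER BY ean_code
"""

ROW_NUMBER_QUERY = """
    SELECT ean_code, transaction_date
    FROM (
        SELECT ean_code, transaction_date,
               ROW_NUMBER() OVER (PARTITION BY ean_code
                                  ORDER BY transaction_date DESC) AS rn
        FROM sales_log
    )
    WHERE rn = 1
    ORDER BY ean_code
"""


def run(query, rows):
    """Load rows into an in-memory table and run one of the queries above."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_log (transaction_date TEXT, ean_code TEXT)")
    con.executemany("INSERT INTO sales_log VALUES (?, ?)", rows)
    result = con.execute(query).fetchall()
    con.close()
    return result


by_group = run(GROUP_BY_QUERY, ROWS)
by_row_number = run(ROW_NUMBER_QUERY, ROWS)
```

GROUP BY is the simpler choice when only the date is needed. The ROW_NUMBER() form earns its keep when other columns from the latest row have to come along.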

I need the Top 10 results from table

I need to get the Top 10 results for each Region, Market and Name along with those with highest counts (Gaps). There are 4 Regions with 1 to N Markets. I can get the Top 10 but cannot figure out how to do this without using a Union for every Market. Any ideas on how do this?
SELECT DISTINCT TOP 10
Region, Market, Name, Gaps
FROM
TableName
ORDER BY
Region, Market, Gaps DESC
One approach would be to use a CTE (Common Table Expression) if you're on SQL Server 2005 or newer (you aren't specific enough in that regard).
With this CTE, you can partition your data by some criteria - i.e. your Region, Market, Name - and have SQL Server number all your rows starting at 1 for each of those "partitions", ordered by some criteria.
So try something like this:
;WITH RegionsMarkets AS
(
SELECT
Region, Market, Name, Gaps,
RN = ROW_NUMBER() OVER(PARTITION BY Region, Market, Name ORDER BY Gaps DESC)
FROM
dbo.TableName
)
SELECT
Region, Market, Name, Gaps
FROM
RegionsMarkets
WHERE
RN <= 10
Here, I am selecting only the first 10 entries for each "partition" (i.e. for each Region, Market, Name tuple), ordered by Gaps in descending order.
With this, you get the top 10 rows for each (Region, Market, Name) tuple. Is that close to what you're looking for?
I think you want row_number():
select t.*
from (select t.*,
row_number() over (partition by region, market order by gaps desc) as seqnum
from tablename t
) t
where seqnum <= 10;
I am not sure if you want name in the partition by clause. If you have more than one name within a market, that may be what you are looking for. (Hint: Sample data and desired results can really help clarify a question.)
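As a quick check of the per-market version, here is a sketch using Python's sqlite3 module instead of SQL Server (an assumption), with invented rows and a top-2 cutoff so the output stays short. The helper name top_n_per_market is hypothetical.

```python
import sqlite3

# Hypothetical rows: (region, market, name, gaps).
ROWS = [
    ("East", "NYC", "a", 9), ("East", "NYC", "b", 7), ("East", "NYC", "c", 3),
    ("East", "BOS", "d", 5),
    ("West", "LA", "e", 8), ("West", "LA", "f", 6), ("West", "LA", "g", 4),
]


def top_n_per_market(rows, n):
    """Return the n rows with the highest gaps within each (region, market)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tablename (region TEXT, market TEXT, name TEXT, gaps INTEGER)"
    )
    con.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT region, market, name, gaps
        FROM (
            SELECT region, market, name, gaps,
                   ROW_NUMBER() OVER (PARTITION BY region, market
                                      ORDER BY gaps DESC) AS seqnum
            FROM tablename
        )
        WHERE seqnum <= ?
        ORDER BY region, market, gaps DESC
    """
    result = con.execute(query, (n,)).fetchall()
    con.close()
    return result


top2 = top_n_per_market(ROWS, 2)
```

Adding name to the PARTITION BY would restart the numbering for every name, which in this sample would return every row, since each name appears only once.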

Select values with duplicate max values sql

I have a table made up of dates and sales totals for the particular date. I would like to be able to query the table and select the following: max sales, the date associated with the max sale figure, sum of all sales, and the minimum date in the table. One additional complication is that there are duplicate max values. I don't care which max value is chosen but I just want one at random. This is for Oracle.
Here is what I tried. It was using a sub query.
Select sales, date, min(date), sum(sales) from table
Where sales = (select distinct(max(sales)) from table)
select
max(sales),
max(date_) keep (dense_rank first order by sales desc),
sum(sales),
min(date_)
from
table_
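The query above relies on Oracle's keep (dense_rank first ...) syntax. On engines without it, one possible equivalent is to compute the totals as window functions over the whole table and then keep the single top row. The sketch below does that with Python's sqlite3 module (an assumed stand-in engine, SQLite 3.25+), using made-up data with a duplicated maximum.

```python
import sqlite3

# Hypothetical rows: (date_, sales). The maximum of 300 occurs on two dates.
ROWS = [
    ("2020-01-01", 100),
    ("2020-01-02", 300),
    ("2020-01-03", 300),
    ("2020-01-04", 50),
]

QUERY = """
    SELECT sales, date_,
           SUM(sales) OVER () AS total_sales,   -- computed over all rows
           MIN(date_) OVER () AS first_date     -- computed over all rows
    FROM table_
    ORDER BY sales DESC, date_ DESC             -- tie-break: latest date wins
    LIMIT 1
"""


def max_sale_summary(rows):
    """Return (max sales, a date with max sales, total sales, earliest date)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table_ (date_ TEXT, sales INTEGER)")
    con.executemany("INSERT INTO table_ VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchone()
    con.close()
    return result


summary = max_sale_summary(ROWS)
```

The window functions are evaluated before ORDER BY and LIMIT are applied, so the sum and minimum still cover every row. The date_ DESC tie-break mirrors max(date_) in the Oracle version; any deterministic tie-break would satisfy "I don't care which one".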