SQL Finding five largest numbers instead of one Max in a table

I have a table and need to run a query that contains some aggregate functions such as maximum, average, standard deviation, and so on,
but instead of a single maximum it should return the five largest values.
The simplified query looks something like this:
SELECT OSI_KEY , MAX(VALUE) , AVG(VALUE) , STDDEV(VALUE), variance(VALUE)
FROM DATA_VALUES_5MIN_6_2013
GROUP BY OSI_KEY
ORDER BY OSI_KEY
And I need some magical ;) query like this:
SELECT OSI_KEY , MAX1(VALUE) ,MAX2(VALUE) ,MAX3(VALUE) ,MAX4(VALUE) , MAX5(VALUE) ,
AVG(VALUE) , STDDEV(VALUE), variance(VALUE)
FROM DATA_VALUES_5MIN_6_2013
GROUP BY OSI_KEY
ORDER BY OSI_KEY
I appreciate your considerations.

Oracle has an NTH_VALUE() function. Unfortunately, it is only an analytic function and not an aggregate function, so it cannot be used directly with GROUP BY. This leads to the strange construct of SELECT DISTINCT with a bunch of analytic functions:
SELECT DISTINCT OSI_KEY,
MAX(VALUE) OVER (PARTITION BY OSI_KEY),
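NTH_VALUE(VALUE, 2) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_2,
NTH_VALUE(VALUE, 3) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_3,
NTH_VALUE(VALUE, 4) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_4,
NTH_VALUE(VALUE, 5) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_5,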
AVG(VALUE) OVER (PARTITION BY OSI_KEY),
STDDEV(VALUE) OVER (PARTITION BY OSI_KEY),
variance(VALUE) OVER (PARTITION BY OSI_KEY)
FROM DATA_VALUES_5MIN_6_2013
ORDER BY OSI_KEY;
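One detail worth checking with NTH_VALUE is the window frame: when the OVER clause has an ORDER BY and no explicit frame, the default frame ends at the current row, so the first row of each partition cannot yet "see" a second value. The sketch below illustrates this with SQLite through Python's sqlite3 module (an assumption made only so the example runs on its own; it needs SQLite 3.25 or later, and the table name and data are made up). Oracle documents the same default frame, but the behaviour is worth confirming on your own version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_values (osi_key INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO data_values VALUES (?, ?)",
    [(1, 60), (1, 50), (1, 40)],
)

def second_largest(frame):
    """Return the distinct (osi_key, max_2) rows for a given frame clause."""
    query = f"""
        SELECT DISTINCT osi_key,
               NTH_VALUE(value, 2) OVER (
                   PARTITION BY osi_key ORDER BY value DESC {frame}
               ) AS max_2
        FROM data_values
    """
    return set(conn.execute(query).fetchall())

# Default frame: the top row sees only itself, so it yields NULL.
default_frame = second_largest("")
# Explicit full-partition frame: every row sees the same second value.
full_frame = second_largest(
    "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"
)

print(default_frame)
print(full_frame)
```

With the default frame, SELECT DISTINCT returns more than one row per key; with the explicit frame it collapses to a single row per key.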
You can also do this using conditional aggregation, with a row_number() or dense_rank() in a subquery.
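A minimal sketch of that conditional-aggregation idea, run against SQLite through Python's sqlite3 module so it is self-contained (SQLite 3.25 or later is assumed for window functions). The table name data_values and the sample rows are invented for illustration; STDDEV and VARIANCE are left out because SQLite does not provide them, but in Oracle they would sit next to AVG in the outer SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_values (osi_key INTEGER, value INTEGER)")
rows = [(1, v) for v in (10, 50, 30, 20, 40, 60)] + [(2, v) for v in (5, 7)]
conn.executemany("INSERT INTO data_values VALUES (?, ?)", rows)

# Number the rows of each key from largest to smallest, then pivot the
# first five positions into columns with conditional aggregation.
query = """
SELECT osi_key,
       MAX(CASE WHEN rn = 1 THEN value END) AS max_1,
       MAX(CASE WHEN rn = 2 THEN value END) AS max_2,
       MAX(CASE WHEN rn = 3 THEN value END) AS max_3,
       MAX(CASE WHEN rn = 4 THEN value END) AS max_4,
       MAX(CASE WHEN rn = 5 THEN value END) AS max_5,
       AVG(value) AS avg_value
FROM (
    SELECT osi_key, value,
           ROW_NUMBER() OVER (PARTITION BY osi_key ORDER BY value DESC) AS rn
    FROM data_values
) ranked
GROUP BY osi_key
ORDER BY osi_key
"""
top5 = conn.execute(query).fetchall()
for row in top5:
    print(row)
```

Keys with fewer than five rows get NULL in the missing positions. Swapping ROW_NUMBER() for DENSE_RANK() would return the five largest distinct values instead of the five largest rows.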

SELECT OSI_KEY, MaxValue FROM (
SELECT OSI_KEY, MAX(VALUE) AS MaxValue FROM DATA_VALUES_5MIN_6_2013 GROUP BY OSI_KEY
)
ORDER BY MaxValue DESC
FETCH FIRST 5 ROWS ONLY;

Related

The SQL equivalent of pandas sort, group by, count and first

I have a table that looks like:
I need to determine what the top 3 most common viewplanes are captured when first scanning a new patient (I believe the patients are indicated by the subject_label column).
In Pandas, this looks like:
df.sort_values('datetime').groupby('subject_label').first().viewplane
In SQL, I have tried:
WITH added_row_number
(SELECT
*,
ROW_NUMBER() OVER(PARTITION BY subject_label ORDER BY datetime ASC) AS row_number
FROM image_list_csv)
SELECT lower(viewplane),
COUNT(lower(viewplane)) OVER (ORDER BY datetime ASC) AS running_total
FROM added_row_number
WHERE ROW_NUMBER = 1
ORDER BY running_total DESC;
Which gives:
I have also tried:
WITH added_row_number AS ( SELECT
*,
ROW_NUMBER() OVER(PARTITION BY subject_label, datetime ORDER BY datetime DESC) AS row_number FROM image_list_csv ) SELECT
LOWER(viewplane), datetime FROM added_row_number WHERE row_number = 1;
Which gives:

SQL Server : using CTE row partition to serialize sequential timestamps

I think I just need a little help with this: is there a way to incrementally count steps in SQL using some type of CTE row partition? I'm using SQL Server 2008, so I can't use the LAG function.
In the example below, I am trying to calculate the Step Number as pictured: for each unique ITEM in my table (in this case G43251), the process Step_Number is calculated from the Date (timestamp) and the process type. Rows with the same timestamp and process_type should both get the same Step_Number, as there are other fields that can cause the timestamp to repeat.
Right now I am playing around with the query below and wondering whether I could fit in some kind of DISTINCT timestamp approach, so that it doesn't count each row as something new.
WITH cte AS
(
SELECT
*,
ROW_NUMBER() OVER (ORDER BY Timestamp_Posted DESC)
- ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Timestamp_Posted Desc) rn
FROM
#t1
)
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Item, rn ORDER BY Timestamp_Posted DESC) rn2
FROM
cte
ORDER BY
Timestamp_Posted DESC

Use dense_rank() instead of row_number(); rows with the same Timestamp_Posted and Process_Type then share a Step_Number:
SELECT *, dense_rank() OVER(Partition By Item ORDER BY Timestamp_Posted, Process_Type ) Step_Number
FROM #t1
ORDER BY Timestamp_Posted DESC
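To see the difference between the two functions on duplicate timestamps, here is a small sketch using SQLite through Python's sqlite3 module (chosen only so it runs standalone; SQLite 3.25 or later is assumed). The table t1 and its rows are hypothetical stand-ins for the #t1 temp table in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (item TEXT, timestamp_posted TEXT, process_type TEXT)"
)
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        ("G43251", "2020-01-01 08:00", "CUT"),
        ("G43251", "2020-01-01 08:00", "CUT"),   # repeated timestamp and type
        ("G43251", "2020-01-01 09:00", "WELD"),
        ("G43251", "2020-01-01 10:00", "PAINT"),
        ("H1", "2020-01-02 08:00", "CUT"),
    ],
)

# DENSE_RANK gives tied rows the same number without leaving gaps;
# ROW_NUMBER always increments, even for identical sort keys.
query = """
SELECT item,
       DENSE_RANK() OVER (PARTITION BY item
                          ORDER BY timestamp_posted, process_type) AS step_number,
       ROW_NUMBER() OVER (PARTITION BY item
                          ORDER BY timestamp_posted, process_type) AS row_num
FROM t1
ORDER BY item, row_num
"""
rows = conn.execute(query).fetchall()
step_numbers = [r[1] for r in rows]
row_numbers = [r[2] for r in rows]
print(step_numbers)
print(row_numbers)
```

The two duplicate rows share step 1 under DENSE_RANK, while ROW_NUMBER counts them as 1 and 2.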

Ranking Over Row_Number in SQL

I am posting sample data below.
What I have is a row number generated from the Date and Name columns (using the ROW_NUMBER function). What I need now is another derived column called Group_Num that assigns a number to each group (3 in this case). Can this be achieved, considering that my Name column repeats but the Date column value changes?
Thanks in advance.

Check this.
We can achieve this using ROW_NUMBER(), LAG() and SUM().
select
Date,
Name,
Row_number()over( partition by Group_Num order by ROwiD ) Row_Num,
Group_Num
from
(
SELECT * ,
SUM(R) OVER(ORDER BY RowID) Group_Num
FROM
(
select *,
Case When
lag(name ) OVER (ORDER BY RowID ) = name
then 0 else 1 end as R
from
(
select DATE,NAME,
row_number() OVER ( ORDER BY (select 1)) AS 'RowID'
from #TableName
)A
)B
)C
order by ROwiD

You can use DENSE_RANK:
SELECT Date
, Name
, ROW_NUMBER() OVER(Partition By Name Order By Name, Date) as Row_Num
, DENSE_RANK() Over(order by Name) as Group_Num
FROM #Table
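The two answers number groups differently when a name comes back after another name has intervened. The sketch below contrasts them on made-up data, using SQLite through Python's sqlite3 module (an assumption for the sake of a runnable example; SQLite 3.25 or later is needed). It orders by the date column, which is also an assumption, since the sample data from the question is not available here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (date TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2020-01-01", "A"),
        ("2020-01-02", "A"),
        ("2020-01-03", "B"),
        ("2020-01-04", "A"),   # "A" returns after "B"
    ],
)

# is_new flags every row whose name differs from the previous row's name;
# a running SUM of that flag numbers each consecutive run separately.
# DENSE_RANK over name gives all rows of one name the same number.
query = """
SELECT date, name,
       SUM(is_new) OVER (ORDER BY date) AS run_group,
       DENSE_RANK() OVER (ORDER BY name) AS name_group
FROM (
    SELECT date, name,
           CASE WHEN LAG(name) OVER (ORDER BY date) = name
                THEN 0 ELSE 1 END AS is_new
    FROM events
) flagged
ORDER BY date
"""
rows = conn.execute(query).fetchall()
run_groups = [r[2] for r in rows]
name_groups = [r[3] for r in rows]
print(run_groups)
print(name_groups)
```

If each name should form exactly one group, DENSE_RANK is the shorter route; if every consecutive run of a name should be its own group, the LAG and SUM approach is the one that separates them.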

Limit result set in sql window function

Assume I would like to rewrite the following aggregate query
select id, max(hittime)
from status
group by id
using an aggregate windowing function like
select id, max(hittime) over(partition by id order by hittime desc) from status
How can I specify that I am only interested in the first result within the partition?
EDIT: I was thinking that there might be a solution with [ RANGE | ROWS ] BETWEEN frame_start AND frame_end. I want to get not only max(hittime) but also the second, third, and so on.

I think what you need is a ranking function, either ROW_NUMBER or DENSE_RANK depending on how you want to handle ties.
select id, hittime
from (
select id, hittime,
dense_rank() over(partition by id order by hittime desc) as ranking
from status
) as x
where ranking = 1; --to get max hittime
--where ranking <=2; --max and second largest
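How ties are handled is easiest to see side by side. This is a rough sketch on invented rows for a status table, run with SQLite through Python's sqlite3 module purely so that it is self-contained (SQLite 3.25 or later assumed); the helper top_n is hypothetical and not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status (id INTEGER, hittime INTEGER)")
conn.executemany(
    "INSERT INTO status VALUES (?, ?)",
    [(1, 100), (1, 100), (1, 90), (1, 80), (2, 50)],
)

def top_n(ranking_function, n):
    """Rows whose rank within their id is at most n."""
    # ranking_function is interpolated, so only pass trusted names.
    query = f"""
        SELECT id, hittime
        FROM (
            SELECT id, hittime,
                   {ranking_function}() OVER (
                       PARTITION BY id ORDER BY hittime DESC
                   ) AS ranking
            FROM status
        ) AS x
        WHERE ranking <= ?
        ORDER BY id, hittime DESC
    """
    return conn.execute(query, (n,)).fetchall()

by_dense_rank = top_n("DENSE_RANK", 2)   # two largest distinct values per id
by_row_number = top_n("ROW_NUMBER", 2)   # at most two rows per id
print(by_dense_rank)
print(by_row_number)
```

DENSE_RANK keeps both tied rows at 100 and still includes 90 as the second largest value, whereas ROW_NUMBER stops after two rows.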

Use the DISTINCT keyword.
select DISTINCT id, max(hittime) over(partition by id order by hittime desc) from status

calculate minutes between dates and get top 10

So I have a table that holds two different dates and I am selecting the minutes difference between:
select customerID, customers.telNumber,
sum(round((enddate - startdate) * 1440)) over (partition by telNumber) total_mins
from table;
And after that I want to get only the top 5 that have the highest amount of minutes, something like
rank() over (partition by total_mins order by total_mins)
How would one go about doing that?

Something like this should work for you:
SELECT *
FROM (
SELECT customerId, telNumber, total_mins, rank() over (order by total_mins desc) rnk
FROM (
SELECT customerId,telNumber,
sum(round((enddate - startdate) * 1440)) over (partition by telNumber) total_mins
FROM YourTable
) t
) t
WHERE rnk <= 10
This will get you ties, so it could return more than 10 rows. If you only want to return 10 rows, use ROW_NUMBER() instead of RANK().
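A variation on the same idea, sketched with SQLite through Python's sqlite3 module so that it runs on its own (SQLite 3.25 or later assumed). The calls table, its minutes column (standing in for the (enddate - startdate) * 1440 expression) and the helper top_numbers are all hypothetical. It totals with GROUP BY first, so each telephone number appears once before ranking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (tel_number TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("A", 30), ("A", 30), ("B", 50), ("C", 25), ("C", 25), ("D", 10)],
)

def top_numbers(rank_column, n):
    """Telephone numbers whose rank_column ('rnk' or 'rn') is at most n."""
    # rank_column is interpolated, so only pass the two known aliases.
    query = f"""
        SELECT tel_number, total_mins
        FROM (
            SELECT tel_number, total_mins,
                   RANK() OVER (ORDER BY total_mins DESC) AS rnk,
                   ROW_NUMBER() OVER (ORDER BY total_mins DESC,
                                      tel_number) AS rn
            FROM (
                SELECT tel_number, SUM(minutes) AS total_mins
                FROM calls
                GROUP BY tel_number
            ) totals
        ) ranked
        WHERE {rank_column} <= ?
        ORDER BY rn
    """
    return conn.execute(query, (n,)).fetchall()

with_ties = top_numbers("rnk", 2)     # RANK keeps every tied row
exactly_n = top_numbers("rn", 2)      # ROW_NUMBER caps the row count
print(with_ties)
print(exactly_n)
```

The extra tel_number in the ROW_NUMBER ordering is only there to make the tie-break deterministic.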

I would add to sgeddes's example that the choice between rank() and row_number() matters: rank() may return the same rank value for several rows, or even all of them, whereas row_number() is always unique. To cap the result at a fixed number of rows, I'd filter on row_number() in the WHERE clause, not rank().
