I have a table and need to run a query that contains aggregate functions such as maximum, average, standard deviation, and so on.
However, instead of a single maximum I need to return the five largest values.
The simplified query looks like this:
SELECT OSI_KEY , MAX(VALUE) , AVG(VALUE) , STDDEV(VALUE), variance(VALUE)
FROM DATA_VALUES_5MIN_6_2013
GROUP BY OSI_KEY
ORDER BY OSI_KEY
What I need is some magical ;) query like this:
SELECT OSI_KEY , MAX1(VALUE) ,MAX2(VALUE) ,MAX3(VALUE) ,MAX4(VALUE) , MAX5(VALUE) ,
AVG(VALUE) , STDDEV(VALUE), variance(VALUE)
FROM DATA_VALUES_5MIN_6_2013
GROUP BY OSI_KEY
ORDER BY OSI_KEY
I appreciate your considerations.
Oracle has an NTH_VALUE() function. Unfortunately, it is only an analytic function and not an aggregate function, so it cannot be combined with GROUP BY. This leads to the strange construct of SELECT DISTINCT with a bunch of analytic functions:
SELECT DISTINCT OSI_KEY,
MAX(VALUE) OVER (PARTITION BY OSI_KEY),
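NTH_VALUE(VALUE, 2) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_2,
NTH_VALUE(VALUE, 3) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_3,
NTH_VALUE(VALUE, 4) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_4,
NTH_VALUE(VALUE, 5) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_5,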
AVG(VALUE) OVER (PARTITION BY OSI_KEY),
STDDEV(VALUE) OVER (PARTITION BY OSI_KEY),
variance(VALUE) OVER (PARTITION BY OSI_KEY)
FROM DATA_VALUES_5MIN_6_2013
ORDER BY OSI_KEY;
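As a rough illustration of the NTH_VALUE() approach, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite (3.25 or later, for window functions) only stands in for Oracle here, and the table name data_values and its rows are made up for the example. The frame is spelled out explicitly because, once ORDER BY is present, the default frame ends at the current row, so NTH_VALUE() would otherwise return NULL on the earliest rows of each partition and DISTINCT would no longer collapse them.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_values (osi_key INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO data_values VALUES (?, ?)",
    [(1, 10), (1, 40), (1, 30), (1, 20), (2, 7), (2, 9)],
)

query = """
SELECT DISTINCT osi_key,
       MAX(value) OVER w_all AS max_1,
       NTH_VALUE(value, 2) OVER w_desc AS max_2,
       NTH_VALUE(value, 3) OVER w_desc AS max_3,
       AVG(value) OVER w_all AS avg_value
FROM data_values
WINDOW w_all AS (PARTITION BY osi_key),
       -- whole-partition frame so every row sees the same nth value
       w_desc AS (PARTITION BY osi_key ORDER BY value DESC
                  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY osi_key
"""
top_rows = conn.execute(query).fetchall()
print(top_rows)
```

Key 2 has only two values in this sample, so its third-largest column comes back as NULL (None in Python).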
You can also do this using conditional aggregation, with a row_number() or dense_rank() in a subquery.
SELECT OSI_KEY, MaxValue FROM (
SELECT OSI_KEY, MAX(value) AS MaxValue FROM DATA_VALUES_5MIN_6_2013 GROUP BY OSI_KEY
)
ORDER BY MaxValue DESC
FETCH FIRST 5 ROWS ONLY;
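The conditional-aggregation idea mentioned above (row_number() in a subquery, then one CASE expression per rank) might look roughly like the following. This is only a sketch: it runs on an in-memory SQLite database from Python rather than on Oracle, the table data_values and its rows are invented, and only the top three values are shown to keep it short.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_values (osi_key INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO data_values VALUES (?, ?)",
    [(1, 10), (1, 40), (1, 30), (1, 20), (2, 7), (2, 9)],
)

query = """
SELECT osi_key,
       MAX(CASE WHEN rn = 1 THEN value END) AS max_1,
       MAX(CASE WHEN rn = 2 THEN value END) AS max_2,
       MAX(CASE WHEN rn = 3 THEN value END) AS max_3,
       AVG(value) AS avg_value
FROM (
    -- number the rows of each key from largest to smallest value
    SELECT osi_key, value,
           ROW_NUMBER() OVER (PARTITION BY osi_key ORDER BY value DESC) AS rn
    FROM data_values
) AS ranked
GROUP BY osi_key
ORDER BY osi_key
"""
cond_rows = conn.execute(query).fetchall()
print(cond_rows)
```

Because this is an ordinary GROUP BY, the regular aggregates (AVG, STDDEV, and so on) can sit next to the ranked columns without SELECT DISTINCT. Swapping ROW_NUMBER() for DENSE_RANK() would return the largest distinct values instead of the largest rows.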
Related
I have a table that looks like:
I need to determine what the top 3 most common viewplanes are captured when first scanning a new patient (I believe the patients are indicated by the subject_label column).
In Pandas, this looks like:
df.sort_values('datetime').groupby('subject_label').first().viewplane
In SQL, I have tried:
WITH added_row_number AS
(SELECT
*,
ROW_NUMBER() OVER(PARTITION BY subject_label ORDER BY datetime ASC) AS row_number
FROM image_list_csv)
SELECT lower(viewplane),
COUNT(lower(viewplane)) OVER (ORDER BY datetime ASC) AS running_total
FROM added_row_number
WHERE ROW_NUMBER = 1
ORDER BY running_total DESC;
Which gives:
I have also tried:
WITH added_row_number AS (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY subject_label, datetime ORDER BY datetime DESC) AS row_number
FROM image_list_csv)
SELECT LOWER(viewplane), datetime
FROM added_row_number
WHERE row_number = 1;
Which gives:
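For comparison, one way to get from "first scan per patient" to "three most common viewplanes" is to keep only the row numbered 1 per subject_label and then count with a plain GROUP BY rather than a windowed COUNT. The sketch below is an assumption about the intended result, not a confirmed answer: it uses an in-memory SQLite database from Python, a made-up table image_list with invented rows, and LIMIT, which other dialects spell as TOP or FETCH FIRST.

```python
import sqlite3

# Hypothetical rows: (subject_label, datetime, viewplane)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image_list (subject_label TEXT, datetime TEXT, viewplane TEXT)"
)
conn.executemany(
    "INSERT INTO image_list VALUES (?, ?, ?)",
    [
        ("p1", "2020-01-01 10:00", "A4C"),
        ("p1", "2020-01-01 10:05", "PLAX"),
        ("p2", "2020-01-02 09:00", "a4c"),
        ("p3", "2020-01-03 09:00", "PLAX"),
        ("p4", "2020-01-04 09:00", "A4C"),
        ("p5", "2020-01-05 09:00", "plax"),
        ("p5", "2020-01-05 09:10", "PSAX"),
        ("p6", "2020-01-06 09:00", "PSAX"),
    ],
)

query = """
WITH first_scan AS (
    SELECT subject_label,
           LOWER(viewplane) AS viewplane,
           ROW_NUMBER() OVER (PARTITION BY subject_label
                              ORDER BY datetime) AS rn
    FROM image_list
)
SELECT viewplane, COUNT(*) AS n_patients
FROM first_scan
WHERE rn = 1                 -- earliest scan of each patient only
GROUP BY viewplane
ORDER BY n_patients DESC, viewplane
LIMIT 3
"""
top_viewplanes = conn.execute(query).fetchall()
print(top_viewplanes)
```

Partitioning by subject_label alone matters here: adding datetime to the PARTITION BY makes almost every row its own partition, so every row gets row number 1.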
I think I just need a little help with this but is there a way to incrementally count steps in SQL using some type of CTE row partition? I'm using SQL Server 2008 so won't be able to use the LAG function.
Below, I am trying to calculate the Step Number as pictured, where for each unique ITEM in my table (in this case G43251) the process Step_Number is derived from the Date (timestamp) and the process type. Rows with the same timestamp and process_type should get the same Step_Number, because there are other fields that can cause the timestamp to repeat.
Right now I am playing around with the query below and wondering whether I could fit in some kind of DISTINCT timestamp logic, so that it doesn't count each row as a new step.
WITH cte AS
(
SELECT
*,
ROW_NUMBER() OVER (ORDER BY Timestamp_Posted DESC)
- ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Timestamp_Posted Desc) rn
FROM
#t1
)
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Item, rn ORDER BY Timestamp_Posted DESC) rn2
FROM
cte
ORDER BY
Timestamp_Posted DESC
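Use dense_rank() instead of row_number(). Rows that share the same timestamp and process type then receive the same step number, and the numbering has no gaps: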
SELECT *, dense_rank() OVER(Partition By Item ORDER BY Timestamp_Posted, Process_Type ) Step_Number
FROM #t1
ORDER BY Timestamp_Posted DESC
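To see the effect of dense_rank() on duplicated timestamps, here is a small sketch using an in-memory SQLite database from Python. SQLite only stands in for SQL Server, and the table t1 and its rows (including the process type names) are invented for illustration.

```python
import sqlite3

# Hypothetical rows: (item, timestamp_posted, process_type)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (item TEXT, timestamp_posted TEXT, process_type TEXT)"
)
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        ("G43251", "2020-01-01 08:00", "Receive"),
        ("G43251", "2020-01-01 08:00", "Receive"),  # repeated timestamp
        ("G43251", "2020-01-02 09:00", "Move"),
        ("G43251", "2020-01-03 10:00", "Ship"),
        ("H00001", "2020-01-01 08:00", "Receive"),
    ],
)

query = """
SELECT item, timestamp_posted, process_type,
       DENSE_RANK() OVER (PARTITION BY item
                          ORDER BY timestamp_posted, process_type) AS step_number
FROM t1
ORDER BY item, timestamp_posted
"""
steps = conn.execute(query).fetchall()
step_numbers = [row[3] for row in steps]
print(step_numbers)
```

The two identical G43251 rows share step 1, the next distinct timestamp becomes step 2, and numbering restarts for the second item.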
I am posting a sample data below.
What I have is a row number which generated a number based on Date and Name columns (achieved using ROW_NUMBER function). What I need now, is another derived column called Group_Num which creates a number for each group (3 in this case). Can this be achieved considering the fact that my Name column repeats but the Date column value changes?
Thanks in advance.
Check this.
We can achieve this using ROW_NUMBER(), LAG() and SUM().
select
Date,
Name,
Row_number()over( partition by Group_Num order by ROwiD ) Row_Num,
Group_Num
from
(
SELECT * ,
SUM(R) OVER(ORDER BY RowID) Group_Num
FROM
(
select *,
Case When
lag(name ) OVER (ORDER BY RowID ) = name
then 0 else 1 end as R
from
(
select DATE,NAME,
row_number() OVER ( ORDER BY (select 1)) AS 'RowID'
from #TableName
)A
)B
)C
order by ROwiD
OutPut :
You can use DENSE_RANK:
SELECT Date
, Name
, ROW_NUMBER() OVER(Partition By Name Order By Name, Date) as Row_Num
, DENSE_RANK() Over(order by Name) as Group_Num
FROM #Table
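The two answers above behave differently when a name comes back after another name has intervened. The following sketch, which assumes a made-up table sample with invented rows and runs on an in-memory SQLite database from Python instead of SQL Server, computes both group numbers side by side: the LAG() plus running SUM() version starts a new group at every change of name, while DENSE_RANK() ordered by name gives every occurrence of a name the same number.

```python
import sqlite3

# Hypothetical rows: (obs_date, name); 'A' reappears after 'B'
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (obs_date TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [
        ("2020-01-01", "A"),
        ("2020-01-02", "A"),
        ("2020-01-03", "B"),
        ("2020-01-04", "A"),
    ],
)

query = """
SELECT obs_date, name,
       SUM(is_new) OVER (ORDER BY obs_date) AS island_group,
       DENSE_RANK() OVER (ORDER BY name)    AS name_group
FROM (
    -- flag rows whose name differs from the previous row's name
    SELECT obs_date, name,
           CASE WHEN LAG(name) OVER (ORDER BY obs_date) = name
                THEN 0 ELSE 1 END AS is_new
    FROM sample
) AS flagged
ORDER BY obs_date
"""
groups = conn.execute(query).fetchall()
print(groups)
```

If each name forms exactly one contiguous block, both columns agree and the shorter DENSE_RANK() query is enough.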
Assume I would like to rewrite the following aggregate query
select id, max(hittime)
from status
group by id
using an aggregate windowing function like
select id, max(hittime) over(partition by id order by hittime desc) from status
How can I specify, that I am only interested in the first result within the partition?
EDIT: I was thinking that there might be a solution with [ RANGE | ROWS ] BETWEEN frame_start AND frame_end. I want to get not only max(hittime) but also the second, third, and so on.
I think what you need is a ranking function, either ROW_NUMBER or DENSE_RANK depending on how you want to handle ties.
select id, hittime
from (
select id, hittime,
dense_rank() over(partition by id order by hittime desc) as ranking
from status
) as x
where ranking = 1; --to get max hittime
--where ranking <=2; --max and second largest
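A minimal runnable sketch of the ranking approach, using an in-memory SQLite database from Python with invented rows in a status table, shows how dense_rank() treats ties when asking for the two largest hittime values per id:

```python
import sqlite3

# Hypothetical rows: (id, hittime); id 1 has a tie at the top
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status (id INTEGER, hittime INTEGER)")
conn.executemany(
    "INSERT INTO status VALUES (?, ?)",
    [(1, 10), (1, 30), (1, 30), (1, 20), (2, 5)],
)

query = """
SELECT id, hittime
FROM (
    SELECT id, hittime,
           DENSE_RANK() OVER (PARTITION BY id ORDER BY hittime DESC) AS ranking
    FROM status
) AS x
WHERE ranking <= 2          -- max and second-largest distinct value
ORDER BY id, hittime DESC
"""
top_two = conn.execute(query).fetchall()
print(top_two)
```

Both tied rows with hittime 30 are returned along with 20, so id 1 yields three rows. With ROW_NUMBER() instead, exactly two rows per id would come back and one of the tied rows would take the second slot.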
Use the DISTINCT keyword:
select DISTINCT id, max(hittime) over(partition by id order by hittime desc) from status
So I have a table that holds two different dates and I am selecting the minutes difference between:
select customerID, customers.telNumber,
sum(round((enddate - startdate) * 1440)) over (partition by telNumber) total_mins
from table;
And after that I want to get only the top 5 that have the highest amount of minutes, something like
rank() over (partition by total_mins order by total_mins)
How would one go about doing that?
Something like this should work for you:
SELECT *
FROM (
SELECT customerId, telNumber, rank() over (order by total_mins desc) rnk
FROM (
SELECT customerId,telNumber,
sum(round((enddate - startdate) * 1440)) over (partition by telNumber) total_mins
FROM YourTable
) t
) t
WHERE rnk <= 5
This will get you ties, so it could return more than 5 rows. If you only want to return 5 rows, use ROW_NUMBER() instead of RANK().
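The difference between the two functions is easiest to see on a small tie. The sketch below is an illustration under assumptions: it uses an in-memory SQLite database from Python instead of Oracle, a hypothetical calls table that already stores minutes per call, and a top 2 rather than a larger cutoff so the sample stays short.

```python
import sqlite3

# Hypothetical rows: (tel_number, mins); B and C tie on total minutes
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (tel_number TEXT, mins INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("A", 60), ("A", 40), ("B", 80), ("C", 50), ("C", 30), ("D", 10)],
)

base = """
WITH totals AS (
    SELECT tel_number, SUM(mins) AS total_mins
    FROM calls
    GROUP BY tel_number
),
ranked AS (
    SELECT tel_number, total_mins,
           RANK()       OVER (ORDER BY total_mins DESC)             AS rnk,
           ROW_NUMBER() OVER (ORDER BY total_mins DESC, tel_number) AS rn
    FROM totals
)
SELECT tel_number, total_mins
FROM ranked
WHERE {col} <= 2
ORDER BY total_mins DESC, tel_number
"""
# {col} is filled from fixed strings only, never from user input
by_rank = conn.execute(base.format(col="rnk")).fetchall()
by_row_number = conn.execute(base.format(col="rn")).fetchall()
print(by_rank)
print(by_row_number)
```

Filtering on rnk keeps both tied numbers and returns three rows, while filtering on rn returns exactly two. The extra tel_number key in the ROW_NUMBER() ordering is a design choice that makes the tie-break deterministic.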
I would add to sgeddes's example that it helps to compute both rank() and row_number(): rank() can return the same value for several rows (or even all of them), whereas row_number() is always unique. I'd filter on row_number() in the WHERE clause, not on rank().