calculate minutes between dates and get top 10 - sql

I have a table that holds two dates per row, and I am selecting the difference between them in minutes:
select customerID, customers.telNumber,
sum(round((enddate - startdate) * 1440)) over (partition by telNumber) total_mins
from table;
After that I want to get only the top 5 rows that have the highest number of minutes, something like
rank() over (partition by total_mins order by total_mins)
How would one go about doing that?

Something like this should work for you:
SELECT *
FROM (
SELECT customerId, telNumber, rank() over (order by total_mins desc) rnk
FROM (
SELECT customerId,telNumber,
sum(round((enddate - startdate) * 1440)) over (partition by telNumber) total_mins
FROM YourTable
) t
) t
WHERE rnk <= 10
This will get you ties, so it could return more than 10 rows. If you only want to return 10 rows, use ROW_NUMBER() instead of RANK().

I would add to sgeddes's example that the choice between rank() and row_number() matters: rank() can return the same value for several rows (or even all of them) when they tie, whereas row_number() is always distinct. To get exactly 10 rows, I'd filter on row_number() in the WHERE clause, not rank().
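To make the tie behaviour concrete, here is a minimal sketch. It assumes SQLite 3.25+ (through Python's sqlite3 module) as a stand-in for the original database, and the table `totals` with its four rows is made up for illustration. It keeps the "top 2" with each function:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated data: one row per telNumber.
conn.execute("CREATE TABLE totals (telNumber TEXT, total_mins INTEGER)")
conn.executemany(
    "INSERT INTO totals VALUES (?, ?)",
    [("A", 50), ("B", 40), ("C", 40), ("D", 30)],
)

# RANK() gives B and C the same rank, so "rnk <= 2" keeps three rows.
rank_sql = """
SELECT telNumber, total_mins
FROM (SELECT telNumber, total_mins,
             RANK() OVER (ORDER BY total_mins DESC) AS rnk
      FROM totals) t
WHERE rnk <= 2
ORDER BY total_mins DESC, telNumber
"""

# ROW_NUMBER() is always distinct; telNumber is added as a tie-breaker
# so that the choice between B and C is deterministic.
row_number_sql = """
SELECT telNumber, total_mins
FROM (SELECT telNumber, total_mins,
             ROW_NUMBER() OVER (ORDER BY total_mins DESC, telNumber) AS rn
      FROM totals) t
WHERE rn <= 2
ORDER BY total_mins DESC, telNumber
"""

rank_rows = conn.execute(rank_sql).fetchall()
row_number_rows = conn.execute(row_number_sql).fetchall()
print(rank_rows)        # three rows: A, B and C
print(row_number_rows)  # two rows: A and B
```

Without a tie-breaker column in the ORDER BY, which of the tied rows ROW_NUMBER() keeps is up to the database.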

Related

Get last two rows from a row_number() window function in snowflake

Hopefully, someone can help me...
I'm trying to get the last two values from a row_number() window function. Let's say my results contain row numbers up to 6, for example. How would it be possible to get the rows where the row number is 5 and 6?
Let me know if it can be done with another window function or in another way.
Kind regards,
Using QUALIFY:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(ORDER BY ... DESC) <= 2;
This approach could be further extended to get two rows per each partition:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ... DESC) <= 2;
You can use top with order by desc like:
select top 2 row_number() over([partition by] [order by]) as rn
from table
order by rn desc
I'd say @Shmiel's answer is the formal and elegant way. Just in case, the same thing written with a CTE would be:
WITH CTE AS
(SELECT product,
user_id,
ROW_NUMBER() OVER (PARTITION BY user_id order by product desc)
as RN
FROM Mytable)
SELECT product, user_id
FROM CTE
WHERE RN < 3;
Order by [order_condition] with "desc", then filter on RN (the row number) to select as many rows as you want.
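The CTE approach above can be tried outside Snowflake as well. The following is a rough sketch, assuming SQLite 3.25+ through Python's sqlite3 module; the rows inserted into `Mytable` are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Mytable (product TEXT, user_id INTEGER)")
# Made-up sample data: user 1 has three products, user 2 has one.
conn.executemany(
    "INSERT INTO Mytable VALUES (?, ?)",
    [("a", 1), ("b", 1), ("c", 1), ("x", 2)],
)

# Number the rows of each user from the end (descending order), then keep
# the first two numbers, i.e. the last two rows of each partition.
sql = """
WITH CTE AS (
    SELECT product,
           user_id,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY product DESC) AS RN
    FROM Mytable
)
SELECT product, user_id
FROM CTE
WHERE RN < 3
ORDER BY user_id, product DESC
"""
last_two = conn.execute(sql).fetchall()
print(last_two)
```

A partition with fewer than two rows (user 2 here) simply returns what it has.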

SQL filter query results based on analytic function

I'd like to find an efficient way to filter my RANK() OVER function in SQL.
I have the following query:
SELECT
base.ITEM_SKU_NBR,
RANK() OVER (ORDER BY SUM(base.NET_SLS_AMT) DESC) AS SLS_rank,
RANK() OVER (ORDER BY COUNT(DISTINCT base.txn_id) DESC) AS txn_rank
FROM
`my_table` base
GROUP BY
1
Which returns this result set:
Now I'd like to filter for items where the SLS_rank is < 10 OR the txn_rank is < 10. Ideally I'd like to do this in the HAVING clause, like this:
SELECT
base.ITEM_SKU_NBR,
RANK() OVER (ORDER BY SUM(base.NET_SLS_AMT) DESC) AS SLS_rank,
RANK() OVER (ORDER BY COUNT(DISTINCT base.txn_id) DESC) AS txn_rank
FROM
`my_table` base
GROUP BY
1
HAVING
SLS_rank < 10 OR txn_rank < 10
But bigquery throws an error:
Column SLS_rank contains an analytic function, which is not allowed in HAVING clause at [9:8]
The only option I can think of is to create this as a separate table and selecting from there, but that doesn't seem very pretty. Any other ideas on how to do this?
Update June 2021.
BigQuery announced support for the QUALIFY clause on the 10th of May, 2021.
The QUALIFY clause filters the results of analytic functions. An analytic function is required to be present in the QUALIFY clause or the SELECT list.
What you need can be achieved with QUALIFY in the following way:
SELECT
base.ITEM_SKU_NBR,
RANK() OVER (ORDER BY SUM(base.NET_SLS_AMT) DESC) AS SLS_rank,
RANK() OVER (ORDER BY COUNT(DISTINCT base.txn_id) DESC) AS txn_rank
FROM `my_table` base
GROUP BY 1
QUALIFY SLS_rank < 10 OR txn_rank < 10
Find more examples in the documentation.
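Alternatively, without QUALIFY, wrap the query in a subquery and filter on the ranks in the outer WHERE clause:
SELECT * FROM (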
SELECT
base.ITEM_SKU_NBR,
RANK() OVER (ORDER BY SUM(base.NET_SLS_AMT) DESC) AS SLS_rank,
RANK() OVER (ORDER BY COUNT(DISTINCT base.txn_id) DESC) AS txn_rank
FROM `my_table` base
GROUP BY 1
)
WHERE SLS_rank < 10 OR txn_rank < 10
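For engines without QUALIFY, the subquery idea can be sketched as follows. This is only an illustration under assumptions: it runs on SQLite 3.25+ via Python's sqlite3 module rather than BigQuery, the sample rows are invented, the CTE names `totals` and `ranked` are hypothetical, and the aggregation is split into its own CTE purely for readability. The threshold is 2 instead of 10 because the sample has only three items:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (ITEM_SKU_NBR INTEGER, NET_SLS_AMT REAL, txn_id INTEGER)"
)
# Made-up data: SKU 1 has the highest sales, SKU 2 the most transactions.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 100.0, 1), (2, 10.0, 2), (2, 10.0, 3), (2, 10.0, 4), (3, 20.0, 5)],
)

sql = """
WITH totals AS (            -- step 1: aggregate per item
    SELECT ITEM_SKU_NBR,
           SUM(NET_SLS_AMT) AS sls,
           COUNT(DISTINCT txn_id) AS txns
    FROM my_table
    GROUP BY ITEM_SKU_NBR
),
ranked AS (                 -- step 2: rank the aggregates
    SELECT ITEM_SKU_NBR,
           RANK() OVER (ORDER BY sls DESC) AS SLS_rank,
           RANK() OVER (ORDER BY txns DESC) AS txn_rank
    FROM totals
)
SELECT ITEM_SKU_NBR, SLS_rank, txn_rank   -- step 3: filter on the ranks
FROM ranked
WHERE SLS_rank < ? OR txn_rank < ?
ORDER BY ITEM_SKU_NBR
"""
top_items = conn.execute(sql, (2, 2)).fetchall()
print(top_items)
```

The ranks are ordinary columns by the time the outer WHERE runs, which is why the filter is allowed there but not in HAVING.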

Finding consecutive patterns (with SQL)

A table consecutive in PostgreSQL:
Each se_id has an idx from 0 up to 100 (here 0 to 9).
The search pattern:
SELECT *
FROM consecutive
WHERE val_3_bool = 1
AND val_1_dur > 4100 AND val_1_dur < 5900
Now I'm looking for the longest consecutive appearance of this pattern
for each p_id - and the AVG of the counted val_1_dur.
Is it possible to calculate this in pure SQL?
One method is the difference of row numbers approach to get the sequences for each:
select pid, count(*) as in_a_row, sum(val1_dur) as dur
from (select t.*,
row_number() over (partition by pid order by idx) as seqnum,
row_number() over (partition by pid, val3_bool order by idx) as seqnum_d
from consecutive t
) t
group by (seqnum - seqnum_d), pid, val3_bool;
If you are looking specifically for "1" values, then add where val3_bool = 1 to the outer query. To understand why this works, I would suggest that you stare at the results of the subquery, so you can understand why the difference defines the consecutive values.
You can then get the max using distinct on:
select distinct on (pid) t.*
from (select pid, count(*) as in_a_row, sum(val1_dur) as dur
from (select t.*,
row_number() over (partition by pid order by idx) as seqnum,
row_number() over (partition by pid, val3_bool order by idx) as seqnum_d
from consecutive t
) t
group by (seqnum - seqnum_d), pid, val3_bool
) t
order by pid, in_a_row desc;
The distinct on does not require an additional level of subquery, but I think that makes the logic clearer.
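To see why the difference of row numbers identifies runs, here is a small sketch. It assumes SQLite 3.25+ through Python's sqlite3 module instead of PostgreSQL, uses the column names from the answer (`pid`, `idx`, `val1_dur`, `val3_bool`), and the rows are invented. SQLite has no DISTINCT ON, so the longest run per `pid` is picked with another ROW_NUMBER(); AVG is used because the question asks for the average duration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE consecutive (pid INTEGER, idx INTEGER, val1_dur INTEGER, val3_bool INTEGER)"
)
# Made-up data. pid 1 has runs of 1s at idx 0-1 and idx 3-5; pid 2 has one at idx 1.
conn.executemany(
    "INSERT INTO consecutive VALUES (?, ?, ?, ?)",
    [(1, 0, 10, 1), (1, 1, 20, 1), (1, 2, 99, 0), (1, 3, 30, 1), (1, 4, 40, 1), (1, 5, 50, 1),
     (2, 0, 5, 0), (2, 1, 7, 1), (2, 2, 9, 0)],
)

sql = """
WITH numbered AS (
    SELECT pid, idx, val1_dur, val3_bool,
           ROW_NUMBER() OVER (PARTITION BY pid ORDER BY idx) AS seqnum,
           ROW_NUMBER() OVER (PARTITION BY pid, val3_bool ORDER BY idx) AS seqnum_d
    FROM consecutive
),
runs AS (
    -- seqnum - seqnum_d is constant within one unbroken run of equal values
    SELECT pid, COUNT(*) AS in_a_row, AVG(val1_dur) AS avg_dur
    FROM numbered
    WHERE val3_bool = 1
    GROUP BY pid, seqnum - seqnum_d
),
best AS (
    SELECT runs.*,
           ROW_NUMBER() OVER (PARTITION BY pid ORDER BY in_a_row DESC) AS rn
    FROM runs
)
SELECT pid, in_a_row, avg_dur
FROM best
WHERE rn = 1
ORDER BY pid
"""
longest_runs = conn.execute(sql).fetchall()
print(longest_runs)
```

For pid 1 the two runs get differences 0 and 1, so they fall into separate groups even though both consist of rows with val3_bool = 1.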
Window functions such as lag() and lead() let you compare a row with the previous and the next one.
https://community.modeanalytics.com/sql/tutorial/sql-window-functions/
https://www.postgresql.org/docs/current/static/tutorial-window.html
As seen on How to compare the current row with next and previous row in PostgreSQL? and Filtering by window function result in Postgresql

SQL Finding five largest numbers instead of one Max in a table

I have a table and I need to run a query that contains some aggregate functions such as maximum, average, standard deviation, ...
but instead of a single maximum it should return the 5 largest values.
The simplified query is something like this:
SELECT OSI_KEY , MAX(VALUE) , AVG(VALUE) , STDDEV(VALUE), variance(VALUE)
FROM DATA_VALUES_5MIN_6_2013
GROUP BY OSI_KEY
ORDER BY OSI_KEY
and I need some Magical ;) Query like this:
SELECT OSI_KEY , MAX1(VALUE) ,MAX2(VALUE) ,MAX3(VALUE) ,MAX4(VALUE) , MAX5(VALUE) ,
AVG(VALUE) , STDDEV(VALUE), variance(VALUE)
FROM DATA_VALUES_5MIN_6_2013
GROUP BY OSI_KEY
ORDER BY OSI_KEY
I appreciate your considerations.
Oracle has an NTH_VALUE() function. Unfortunately, it is only an analytic function and not an aggregate function. This leads to the strange construct of SELECT DISTINCT with a bunch of analytic functions:
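-- The explicit frame lets every row of a partition see the same nth value;
-- with the default frame the first rows would get NULL and DISTINCT would not collapse them.
SELECT DISTINCT OSI_KEY,
MAX(VALUE) OVER (PARTITION BY OSI_KEY),
NTH_VALUE(VALUE, 2) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_2,
NTH_VALUE(VALUE, 3) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_3,
NTH_VALUE(VALUE, 4) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_4,
NTH_VALUE(VALUE, 5) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_5,
AVG(VALUE) OVER (PARTITION BY OSI_KEY),
STDDEV(VALUE) OVER (PARTITION BY OSI_KEY),
variance(VALUE) OVER (PARTITION BY OSI_KEY)
FROM DATA_VALUES_5MIN_6_2013
ORDER BY OSI_KEY;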
You can also do this using conditional aggregation, with a row_number() or dense_rank() in a subquery.
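The conditional-aggregation variant mentioned above might look like the sketch below. It is an illustration under assumptions: it runs on SQLite 3.25+ via Python's sqlite3 module rather than Oracle, the table is shortened to a hypothetical `DATA_VALUES`, the rows are invented, and only AVG is kept from the other aggregates because SQLite has no built-in STDDEV:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DATA_VALUES (OSI_KEY INTEGER, VALUE INTEGER)")
# Made-up data: key 1 has the values 1..7, key 2 has only two values.
rows = [(1, v) for v in range(1, 8)] + [(2, 10), (2, 20)]
conn.executemany("INSERT INTO DATA_VALUES VALUES (?, ?)", rows)

# One MAX(CASE ...) per wanted position: each picks the value whose
# row number (by descending VALUE within the key) equals that position.
max_cols = ",\n       ".join(
    f"MAX(CASE WHEN rn = {i} THEN VALUE END) AS MAX_{i}" for i in range(1, 6)
)
sql = f"""
SELECT OSI_KEY,
       {max_cols},
       AVG(VALUE) AS AVG_VALUE
FROM (SELECT OSI_KEY, VALUE,
             ROW_NUMBER() OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC) AS rn
      FROM DATA_VALUES) numbered
GROUP BY OSI_KEY
ORDER BY OSI_KEY
"""
top_five = conn.execute(sql).fetchall()
print(top_five)
```

With row_number(), duplicate values occupy separate positions; dense_rank() would instead return the five largest distinct values. Keys with fewer than five rows get NULL in the missing positions.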
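If what you need is instead the five OSI_KEYs with the largest maximum, order by the per-key maximum and keep the first five rows (FETCH FIRST requires Oracle 12c or later):
SELECT OSI_KEY, MaxValue FROM (
SELECT OSI_KEY, MAX(value) AS MaxValue FROM DATA_VALUES_5MIN_6_2013 GROUP BY OSI_KEY
)
ORDER BY MaxValue DESC
FETCH FIRST 5 ROWS ONLY;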

Need help to find the middle row using Row_Number

SELECT median.spaid
,median.total
,ROW_NUMBER() OVER (
ORDER BY median.total
) AS row
FROM (
SELECT SpaID
,COUNT(1) AS Total
FROM dbo.[Order](NOLOCK)
WHERE DateCreated BETWEEN '04-01-2014'
AND '04-30-2014'
GROUP BY SpaID
) AS median
ORDER BY median.total
My issue here is that I need to find the middle row for column "Total" using Row_number. I need to find which "SpaID" is linked to the middle row of the "Total" column.
This is a shot in the dark based on very sparse details but I think you are looking for something like this.
with numberedResults as
(
select spaid
, ROW_NUMBER() over(order by count(*)) as RowNum
from [order]
where DateCreated between '20140401' AND '20140430'
group by SpaID
)
, Medians as
(
select (MAX(RowNum) + 1) / 2 as Median
, MAX(RowNum) as TotalCount
from numberedResults
)
select *
from numberedResults r
join Medians m on m.Median = r.RowNum
I would suggest not relying on ROW_NUMBER in your query, because its results can be unpredictable when several rows tie on the ORDER BY value. I understand the query below looks bulky; the challenge is that the "median" is the middle of the grouped rows. Here's the query I believe should work for you:
SELECT SpaID, d FROM
(SELECT SpaID,
DENSE_RANK() OVER (ORDER BY COUNT(1)) AS d
FROM dbo.[Order]
WHERE DateCreated BETWEEN '04-01-2014'
AND '04-30-2014'
GROUP BY SpaID) ranked
WHERE d =
(SELECT ROUND(MAX(d) / 2.0, 0)
FROM (SELECT DENSE_RANK() OVER (ORDER BY COUNT(1)) AS d
FROM dbo.[Order]
WHERE DateCreated BETWEEN '04-01-2014'
AND '04-30-2014'
GROUP BY SpaID) r)
Here is one method of finding the median:
SELECT o.*
FROM (SELECT SpaID, COUNT(*) AS Total,
ROW_NUMBER() OVER (ORDER BY COUNT(*)) as seqnum,
COUNT(*) OVER () as cnt
FROM dbo.[Order](NOLOCK)
WHERE DateCreated BETWEEN '2014-04-01' AND '2014-04-30'
GROUP BY SpaID
) o
WHERE 2*o.seqnum IN (cnt, cnt + 1);
This is approximate when you have an even number of rows. You are looking for the exact row id, so you have to choose either the one before or after the median (which is between two rows).
Note: You should express date constants using the ISO standard formats, either YYYYMMDD or YYYY-MM-DD. The first is the safest way in SQL Server (although I personally prefer the hyphens for readability).
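As a quick check of the row-number approach to the middle row, here is a minimal sketch. It assumes SQLite 3.25+ via Python's sqlite3 module instead of SQL Server, a simplified hypothetical `orders` table with made-up rows, and the condition `2 * seqnum IN (cnt, cnt + 1)`, which picks the exact middle for an odd number of groups and the lower of the two middle rows for an even number:

```python
import sqlite3

def middle_spa(order_counts):
    """Return (SpaID, Total) of the middle row; order_counts maps SpaID -> number of orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (SpaID INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?)",
        [(spa,) for spa, n in order_counts.items() for _ in range(n)],
    )
    sql = """
    WITH totals AS (
        SELECT SpaID, COUNT(*) AS Total
        FROM orders
        GROUP BY SpaID
    ),
    numbered AS (
        SELECT SpaID, Total,
               -- SpaID breaks ties so the numbering is deterministic
               ROW_NUMBER() OVER (ORDER BY Total, SpaID) AS seqnum,
               COUNT(*) OVER () AS cnt
        FROM totals
    )
    SELECT SpaID, Total
    FROM numbered
    WHERE 2 * seqnum IN (cnt, cnt + 1)
    """
    return conn.execute(sql).fetchall()

odd_result = middle_spa({1: 1, 2: 3, 3: 5})         # three groups
even_result = middle_spa({1: 1, 2: 3, 3: 5, 4: 7})  # four groups
print(odd_result, even_result)
```

To pick the upper middle row for even counts instead, the condition could be changed to `2 * seqnum IN (cnt + 1, cnt + 2)`.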