Get last two rows from a row_number() window function in snowflake - sql

Hopefully, someone can help me...
I'm trying to get the last two values from a row_number() window function. Let's say my results contain row numbers up to 6, for example. How would it be possible to get the rows where the row number is 5 and 6?
Let me know if it can be done with another window function or in another way.
Kind regards,

Using QUALIFY:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(ORDER BY ... DESC) <= 2;
This approach could be further extended to get two rows per each partition:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ... DESC) <= 2;

You can use top with order by desc like:
select top 2 row_number() over([partition by] [order by]) as rn
from table
order by rn desc

I'd say #Shmiel is the formal and elegant way, just in case, would be the same as :
WITH CTE AS
(SELECT product,
user_id,
ROW_NUMBER() OVER (PARTITION BY user_id order by product desc)
as RN
FROM Mytable)
SELECT product, user_id
FROM CTE
WHERE RN < 3;
You will use order by [order_condition] with "desc". And then you will use RN(row number) to select as many rows as you want

Related

Presto: Filtering by Row Number

Can anyone help me to translate Teradata SQL QUALIFY ROW_NUMBER() OVER into Presto:
SELECT *
FROM table1
QUALIFY ROW_NUMBER() OVER(ORDER BY id DESC) > 5000000
AND ROW_NUMBER() OVER(ORDER BY id DESC) <= 10000000;
Or provide some suggestions how to extract large datasets by row filtering.
As far as I understand there is no direct analog for QUALIFY clause in PrestoSQL/Trino. You can just use window function in the WHERE clause. Something like this:
SELECT *
FROM table1
WHERE ROW_NUMBER() OVER(ORDER BY id DESC) BETWEEN 5000001
AND 10000000;

SQL Server : using CTE row partition to serialize sequential timestamps

I think I just need a little help with this but is there a way to incrementally count steps in SQL using some type of CTE row partition? I'm using SQL Server 2008 so won't be able to use the LAG function.
In the below, I am trying to find a way to calculate the Step Number as pictured below where for each unique ITEM in my table, in this case G43251, it calculates the process Step_Number based on the Date (timestamp) and the process type. For those with the same timestamp & process_type, it would label them both as the same Step_Number as there other fields that could cause the timestamp to repeat twice.
Right now I am playing around with this below and seeing how maybe I could fit in a DISTINCT timestamp methodology ? So that it doesn't count each row as something new.
WITH cte AS
(
SELECT
*,
ROW_NUMBER() OVER (ORDER BY Timestamp_Posted DESC)
- ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Timestamp_Posted Desc) rn
FROM
#t1
)
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Item, rn ORDER BY Timestamp_Posted DESC) rn2
FROM
cte
ORDER BY
Timestamp_Posted DESC
Please use dense_rank() instead of row_number()
SELECT *, dense_rank() OVER(Partition By Item ORDER BY Timestamp_Posted, Process_Type ) Step_Number
FROM #t1
ORDER BY Timestamp_Posted DESC

Multiple records not repeated

I have a table called TABLE_SCREW where I want to get the latest records for each code.
For example, in the table below you should obtain the records with ids 3 and 7.
I am a newbie in sql and I hope you can help me.
You could use:
SELECT TOP 1 WITH TIES *
FROM TABLE_SCREW
ORDER BY ROW_NUMBER() OVER(PARTITION BY CODE ORDER BY Date DESC);
Another approach(may have better performance):
SELECT * -- here * should be replaced with actual column names
FROM (SELECT *,ROW_NUMBER() OVER(PARTITION BY CODE ORDER BY Date DESC) AS rn
FROM TABLE_SCREW) sub
WHERE sub.rn = 1;

How to write a derived query in Netezza SQL?

I need to query the data for inviteid based. For each inviteid I need to have the top 5 IDs and ID Descriptions.
I see that the query I wrote is taking all the time in the world to fetch. I didn't notice an error or anything wrong with it.
The code is:
SELECT count(distinct ID),
IDdesc,
inviteid,
A
FROM (
SELECT
ID,
IDdesc,
inviteid,
RANK() OVER(order by invtypeid asc ) A
FROM Fact_s
--WHERE dateid ='26012013'
GROUP BY invteid,IDdesc,ID
ORDER BY invteid,IDdesc,ID
) B
WHERE A <=5
GROUP BY A, IDDESC, inviteid
ORDER BY A
I'm not sure I understood you requirement completely, but as far as I can tell the group by in the derived table is not necessary (just as the order by as Mark mentioned) because you are using a window function.
And you probably want row_number() instead of rank() in there.
Including the result of rank() in the outer query seems dubious as well.
So this leads to the following statement:
SELECT count(distinct ID),
IDdesc,
inviteid
FROM (
SELECT ID,
IDdesc,
inviteid,
row_number() OVER (order by invtypeid asc ) as rn
FROM Fact_s
) B
WHERE rn <= 5
GROUP BY IDDESC, inviteid;

calculate minutes between dates and get top 10

So I have a table that holds two different dates and I am selecting the minutes difference between:
select customerID, customers.telNumber,
sum(round((enddate - startdate) * 1440)) over (partition by telNumber) total_mins
from table;
And after that I want to get only the top 5 that have the highest amount of minutes, something like
rank() over (partition by total_mins order by total_mins)
How would one go about doing that?
Something like this should work for you:
SELECT *
FROM (
SELECT customerId, telNumber, rank() over (order by total_mins) rnk
FROM (
SELECT customerId,telNumber,
sum(round((enddate - startdate) * 1440)) over (partition by telNumber) total_mins
FROM YourTable
) t
) t
WHERE rnk <= 10
This will get you ties, so it could return more than 10 rows. If you only want to return 10 rows, use ROW_NUMBER() instead of RANK().
SQL Fiddle Demo
I would add to sgeddes's example that the combination of rank() and row_number() is the best as rank() may return the same rank values for all or few rows. But row_number() will always be different. I'd use row_number() in Where clause, not rank().