Presto: Filtering by Row Number - sql

Can anyone help me translate the Teradata SQL QUALIFY ROW_NUMBER() OVER clause into Presto:
SELECT *
FROM table1
QUALIFY ROW_NUMBER() OVER(ORDER BY id DESC) > 5000000
AND ROW_NUMBER() OVER(ORDER BY id DESC) <= 10000000;
Or provide some suggestions on how to extract large datasets by row filtering.

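As far as I understand, there is no direct analog of the QUALIFY clause in PrestoSQL/Trino. Window functions are not allowed in the WHERE clause either, so compute the row number in a subquery and filter on it in the outer query (the result will also contain the extra rn column). Something like this:
SELECT *
FROM (
SELECT t.*, ROW_NUMBER() OVER(ORDER BY id DESC) AS rn
FROM table1 t
) numbered
WHERE rn BETWEEN 5000001 AND 10000000;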
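For pulling a large table in chunks, another option worth considering is to skip and limit rows directly instead of numbering them. The following is only a sketch and assumes a Trino/Presto version that supports the OFFSET clause; it should return rows 5000001 through 10000000 in id DESC order, like the original QUALIFY query, provided id is unique so that the ordering is deterministic.

```sql
-- Sketch only: assumes the engine version supports OFFSET.
-- Skips the first 5,000,000 rows (by id descending) and returns the next 5,000,000.
SELECT *
FROM table1
ORDER BY id DESC
OFFSET 5000000
LIMIT 5000000;
```

Either way the engine still has to sort the whole table for every chunk, so if id is indexed or partitioned in the source, filtering on id ranges directly may be cheaper.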

Related

Get last two rows from a row_number() window function in snowflake

Hopefully, someone can help me...
I'm trying to get the last two values from a row_number() window function. Let's say my results contain row numbers up to 6, for example. How would it be possible to get the rows where the row number is 5 and 6?
Let me know if it can be done with another window function or in another way.
Kind regards,
Using QUALIFY:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(ORDER BY ... DESC) <= 2;
This approach could be further extended to get two rows per each partition:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ... DESC) <= 2;
You can use top with order by desc like:
select top 2 row_number() over([partition by] [order by]) as rn
from table
order by rn desc
I'd say @Shmiel's answer is the formal and elegant way; just in case, it would be the same as:
WITH CTE AS
(SELECT product,
user_id,
ROW_NUMBER() OVER (PARTITION BY user_id order by product desc)
as RN
FROM Mytable)
SELECT product, user_id
FROM CTE
WHERE RN < 3;
Use order by [order_condition] with "desc", and then filter on RN (the row number) to select as many rows as you want.

SQL Server : using CTE row partition to serialize sequential timestamps

I think I just need a little help with this but is there a way to incrementally count steps in SQL using some type of CTE row partition? I'm using SQL Server 2008 so won't be able to use the LAG function.
Below, I am trying to find a way to calculate the Step Number as pictured, where for each unique ITEM in my table (in this case G43251) it calculates the process Step_Number based on the Date (timestamp) and the process type. Rows with the same timestamp and process_type should both get the same Step_Number, as there are other fields that could cause the timestamp to repeat.
Right now I am playing around with this below and seeing how maybe I could fit in a DISTINCT timestamp methodology ? So that it doesn't count each row as something new.
WITH cte AS
(
SELECT
*,
ROW_NUMBER() OVER (ORDER BY Timestamp_Posted DESC)
- ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Timestamp_Posted Desc) rn
FROM
#t1
)
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Item, rn ORDER BY Timestamp_Posted DESC) rn2
FROM
cte
ORDER BY
Timestamp_Posted DESC
Please use dense_rank() instead of row_number()
SELECT *, dense_rank() OVER(Partition By Item ORDER BY Timestamp_Posted, Process_Type ) Step_Number
FROM #t1
ORDER BY Timestamp_Posted DESC
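To see why dense_rank() fits here, consider a small self-contained sketch. The sample rows below are hypothetical (only the item G43251 and the column names come from the question); the two rows that share a timestamp and process type get the same step, and the numbering continues without gaps.

```sql
-- Hypothetical sample data; column names follow the question.
SELECT Item, Timestamp_Posted, Process_Type,
       DENSE_RANK() OVER (PARTITION BY Item
                          ORDER BY Timestamp_Posted, Process_Type) AS Step_Number
FROM (VALUES
    ('G43251', '2020-01-01 08:00', 'Receive'),
    ('G43251', '2020-01-01 08:00', 'Receive'),  -- tie: same step as the row above
    ('G43251', '2020-01-01 09:30', 'Inspect'),
    ('G43251', '2020-01-02 10:15', 'Ship')
) AS t (Item, Timestamp_Posted, Process_Type)
ORDER BY Timestamp_Posted;
-- Step_Number comes out as 1, 1, 2, 3
```

With row_number() the tied rows would be numbered 1 and 2, and with rank() the third row would jump to 3, which is why dense_rank() is the one that matches the requested Step_Number.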

SQL filter query results based on analytic function

I'd like to find an efficient way to filter my RANK() OVER function in SQL.
I have the following query:
SELECT
base.ITEM_SKU_NBR,
RANK() OVER (ORDER BY SUM(base.NET_SLS_AMT) DESC) AS SLS_rank,
RANK() OVER (ORDER BY COUNT(DISTINCT base.txn_id) DESC) AS txn_rank
FROM
`my_table` base
GROUP BY
1
Which returns this result set:
Now I'd like to filter for items where the SLS_rank is < 10 OR the txn_rank is < 10. Ideally I'd like to do this in the HAVING clause, like this:
SELECT
base.ITEM_SKU_NBR,
RANK() OVER (ORDER BY SUM(base.NET_SLS_AMT) DESC) AS SLS_rank,
RANK() OVER (ORDER BY COUNT(DISTINCT base.txn_id) DESC) AS txn_rank
FROM
`my_table` base
GROUP BY
1
HAVING
SLS_rank < 10 OR txn_rank < 10
But bigquery throws an error:
Column SLS_rank contains an analytic function, which is not allowed in HAVING clause at [9:8]
The only option I can think of is to create this as a separate table and select from there, but that doesn't seem very pretty. Any other ideas on how to do this?
Update June 2021.
BigQuery announced support for the QUALIFY clause on the 10th of May, 2021.
The QUALIFY clause filters the results of analytic functions. An analytic function is required to be present in the QUALIFY clause or the SELECT list.
What you need can be achieved with QUALIFY in the following way:
SELECT
base.ITEM_SKU_NBR,
RANK() OVER (ORDER BY SUM(base.NET_SLS_AMT) DESC) AS SLS_rank,
RANK() OVER (ORDER BY COUNT(DISTINCT base.txn_id) DESC) AS txn_rank
FROM `my_table` base
GROUP BY 1
QUALIFY SLS_rank < 10 OR txn_rank < 10
Find more examples in the documentation.
SELECT * FROM (
SELECT
base.ITEM_SKU_NBR,
RANK() OVER (ORDER BY SUM(base.NET_SLS_AMT) DESC) AS SLS_rank,
RANK() OVER (ORDER BY COUNT(DISTINCT base.txn_id) DESC) AS txn_rank
FROM `my_table` base
GROUP BY 1
)
WHERE SLS_rank < 300 OR txn_rank < 300

BigQuery select row_number() over (order by tablename) from dbc.tables

I am translating a very large CTE Teradata query and got stuck at the following portion, which is its own subquery and is being cross joined into a much larger subquery.
How can I translate this query into Bigquery?
(select row_number() over (order by tablename) subsequent_month from dbc.tables qualify row_number() over (order by tablename) <= 24)
Thoughts guys?
Below is for BigQuery Standard SQL
#standardSQL
SELECT subsequent_month FROM (
SELECT ROW_NUMBER() OVER (ORDER BY tablename) subsequent_month
FROM dbc.tables
) WHERE subsequent_month <= 24
You need a subquery for the qualify:
select subsequent_month
from (select row_number() over (order by tablename) as subsequent_month
from dbc.tables
) t
where subsequent_month <= 24;
In Teradata, qualify is a "where" clause that works on window functions. It is analogous to having which works on aggregation functions.
This simply returns 24 rows with consecutive numbers from 1 to 24.
So translate it to a similar query against any table in BigQuery, or use an existing numbers table.
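Since only the numbers 1 through 24 are needed, one BigQuery option that avoids reading any table at all is to generate them. This is a minimal sketch using the Standard SQL GENERATE_ARRAY function with UNNEST; the column alias is taken from the original query.

```sql
#standardSQL
-- Produces 24 rows with subsequent_month = 1, 2, ..., 24 without scanning a table.
SELECT subsequent_month
FROM UNNEST(GENERATE_ARRAY(1, 24)) AS subsequent_month
```

This can be dropped into the cross join in place of the dbc.tables subquery, assuming nothing else in the larger query depends on dbc.tables itself.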

GBQ window function AND arithmetic operations

Does anyone know if it is possible to do any arithmetic operation on a result derived using GBQ window functions?
For example, can I increase row_number by 100 (some number) using pseudocode like this:
SELECT 100 + ROW_NUMBER() OVER (PARTITION BY X ORDER BY x_id DESC) increased_row_num
FROM Table1
...
You will need to use a subquery for that
SELECT 100 + row_num AS increased_row_num FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY X ORDER BY x_id DESC) AS row_num
FROM Table1
)
but I had hoped that there is another solution
With BigQuery Standard SQL, the expected functionality now works as is
#standardSQL
SELECT 100 + ROW_NUMBER() OVER (PARTITION BY X ORDER BY x_id DESC) increased_row_num
FROM Table1
See Enabling Standard SQL and Migrating from legacy SQL