BigQuery/SQL: Select first row of each group - sql

I am trying to select the first row of each group. Eg: below table I would want to keep product for www/edf/ and cate for www/abc/.
I have tried multiple ways, eg: ROW_NUMBER() OVER(PARTITION BY [...]) but somehow not getting expected outputs. The challenge for me here is that category does not have numbers as values, otherwise I can filter it down using max or min.
SELECT landing_page, ROW_NUMBER OVER ( PARTITION BY landing_page ORDER BY Page_Type DESC) AS ROW_NUM
from `xxxx.TEST.draft`
However I got this error: OVER keyword must follow a function call
Appreciate any help!
Landing_page
Page_type
www/edf/
product
www/edf/
home
www/abc/
cate
www/abc/
home

I believe you are looking for the function [FIRST_VALUE][1]?
SELECT
landing_page,
FIRST_VALUE(URL)
OVER ( PARTITION BY landing_page ORDER BY Page_Type DESC) AS first_url
FROM `xxxx.TEST.draft`

I answered my own question, sharing the code in case anyone needs:
SELECT session_id,landing_page, Page_Type, ROW_NUMBER () OVER (PARTITION BY session_id,landing_page ORDER BY Page_Type DESC) AS ROW_NUM
from `xxx._TEST.draft`
order by session_id desc

Related

Get last two rows from a row_number() window function in snowflake

Hopefully, someone can help me...
I'm trying to get the last two values from a row_number() window function. Let's say my results contain row numbers up to 6, for example. How would it be possible to get the rows where the row number is 5 and 6?
Let me know if it can be done with another window function or in another way.
Kind regards,
Using QUALIFY:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(ORDER BY ... DESC) <= 2;
This approach could be further extended to get two rows per each partition:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ... DESC) <= 2;
You can use top with order by desc like:
select top 2 row_number() over([partition by] [order by]) as rn
from table
order by rn desc
I'd say #Shmiel is the formal and elegant way, just in case, would be the same as :
WITH CTE AS
(SELECT product,
user_id,
ROW_NUMBER() OVER (PARTITION BY user_id order by product desc)
as RN
FROM Mytable)
SELECT product, user_id
FROM CTE
WHERE RN < 3;
You will use order by [order_condition] with "desc". And then you will use RN(row number) to select as many rows as you want

SQL Server : using CTE row partition to serialize sequential timestamps

I think I just need a little help with this but is there a way to incrementally count steps in SQL using some type of CTE row partition? I'm using SQL Server 2008 so won't be able to use the LAG function.
In the below, I am trying to find a way to calculate the Step Number as pictured below where for each unique ITEM in my table, in this case G43251, it calculates the process Step_Number based on the Date (timestamp) and the process type. For those with the same timestamp & process_type, it would label them both as the same Step_Number as there other fields that could cause the timestamp to repeat twice.
Right now I am playing around with this below and seeing how maybe I could fit in a DISTINCT timestamp methodology ? So that it doesn't count each row as something new.
WITH cte AS
(
SELECT
*,
ROW_NUMBER() OVER (ORDER BY Timestamp_Posted DESC)
- ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Timestamp_Posted Desc) rn
FROM
#t1
)
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Item, rn ORDER BY Timestamp_Posted DESC) rn2
FROM
cte
ORDER BY
Timestamp_Posted DESC
Please use dense_rank() instead of row_number()
SELECT *, dense_rank() OVER(Partition By Item ORDER BY Timestamp_Posted, Process_Type ) Step_Number
FROM #t1
ORDER BY Timestamp_Posted DESC

Rank() based on column entries while the data is ordered by date

I'm trying to use dense_rank() function over the pagename column after the data is ordered by time_id.
Expected output in rank column, rn, is: [1,2,2,3,4].
Currently I wrote it as:
with tbl2 as
(select UID, pagename, date_id, time_id, source--, dense_rank() over(partition by UID order by pagename) as rn
from tbl1
order by time_id)
select *, dense_rank() over(partition by UID order by time_id, pagename) as rn
from tbl2
Any help would be appreciated
Edit 1: What I am trying to achieve here is to give ranks, as per the user on-screen action flow, to the pages that are visited. Suppose if the same page 'A' is visited back after visiting a different page 'B' then the ranks for these page visits A, B, A will be 1,2,3 (note that the same page A has different ranks 1 & 3)
step-by-step demo:db<>fiddle
SELECT
*,
SUM(is_diff) OVER (ORDER BY date_id, time_id, page)
FROM (
SELECT
*,
CASE WHEN page = lag(page) over (order by date_id, time_id) THEN 0 ELSE 1 END as is_diff
FROM mytable
)s
This looks exactly like a problem I asked some years ago: Window functions: PARTITION BY one column after ORDER BY another
You want to execute a window function on columns (uuid, page) but want to keep the current order which is given by unrelated columns (date_id, time_id).
The problem is, that PARTITION BY orders the records before the ORDER BY clause. So, it defines the primary order and this is not expected.
Once I found a solution for that. I adapted it to your used case. Please read the explanation over there: https://stackoverflow.com/a/52439794/3984221
Interesting part: Your special rank() case is not explicitly required in the query, because my solution creates that out-of-the-box ("by accident" so-to-speak ;) ).
Hmmm . . . If you want the pages ordered by their earliest time, then use two levels of window functions:
select t.*,
dense_rank() over (partition by uid order by min_rn, pagename) as ranking
from (select t.*,
min(rn) over (partition by uid, pagename) as min_rn
from t
) t
Note: This uses rn as a convenient shortcut because the date/time is split into two columns. You can also combine them:
select t.*,
dense_rank() over (partition by uid order by min_dt, pagename) as ranking
from (select t.*,
min(date_id || time_id) over (partition by uid, pagename) as min_dt
from t
) t;
Note: This solution is different from S_man's. On your sample data, they do the same thing. However, if the user returns to a page, then his gives page a new ranking. This gives the page the same ranking as the first time it appears. It is not clear what you really want.
You can use DENSE_RANK() like this for your requirment,
SELECT
u_id,
page_name,
date_id,
time_id,
source,
DENSE_RANK()
OVER (
PARTITION BY page_name
ORDER BY u_id DESC
) rn
FROM ( SELECT * FROM tbl1 ORDER BY time_id ) AS result;

Is there a way to group rankings in SQL Teradata?

I am trying to get the ranking or grouping to count like in the custom_ranking column:
I want it to count the rank like in the row custom_ranking, but everything I keep trying is counting it in the current_ranking row.
I am currently using this:
,row_number() OVER (partition by custID, propID ORDER BY trans_type desc, record_date desc) AS RANKING
Based on your sample data, this would be:
dense_rank() over (partition by custid order by propid)

Limit result set in sql window function

Assume I would like to rewrite the following aggregate query
select id, max(hittime)
from status
group by id
using an aggregate windowing function like
select id, max(hittime) over(partition by id order by hittime desc) from status
How can I specify, that I am only interested in the first result within the partition?
EDIT: I was thinking that there might be a solution with [ RANGE | ROWS ] BETWEEN frame_start AND frame_end. What to get not only max(hittime) but also the second, third ...
I think what you need is a ranking function, either ROW_NUMBER or DENSE_RANK depending on how you want to handle ties.
select id, hittime
from (
select id, hittime,
dense_rank() over(partition by id order by hittime desc) as ranking
from status
) as x
where ranking = 1; --to get max hittime
--where ranking <=2; --max and second largest
Use distinct statement.
select DISTINCT id, max(hittime) over(partition by id order by hittime desc) from status