SQL most recent using row_number() over partition - sql

I'm working with some web clicks data and am looking for the most recent page_name each user_id visited (by timestamp). With the code below, user_id is repeated and every page_name is shown, numbered in descending timestamp order. However, I would like only the rows where recent_click = 1. The finished query will be used as a subquery in a larger query.
Here is my current code:
SELECT user_id,
page_name,
row_number() over(partition by session_id order by ts desc) as recent_click
from clicks_data;
user_id | page_name | recent_click
--------+-------------+--------------
0001 | login | 1
0001 | login | 2
0002 | home | 1

You should be able to move your query to a subquery and add where criteria:
SELECT user_id, page_name, recent_click
FROM (
SELECT user_id,
page_name,
row_number() over (partition by session_id order by ts desc) as recent_click
from clicks_data
) T
WHERE recent_click = 1

You should move the row_number() function into a subquery and then filter it in the outer query.
Something like this:
SELECT * FROM (
SELECT
[user_id]
,[page_name]
,ROW_NUMBER() OVER (PARTITION BY [session_id]
ORDER BY [ts] DESC) AS [recent_click]
FROM [clicks_data]
)x
WHERE [recent_click] = 1
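To check the filter quickly, here is a minimal sketch that runs the same subquery pattern against an in-memory SQLite database from Python. The sample rows, the ts values and the choice of SQLite (3.25 or later, for window functions) are assumptions for illustration; the question does not name its database. The sketch partitions by user_id, because the question asks for the latest page per user, whereas the queries above partition by session_id.

```python
import sqlite3

# In-memory database with made-up sample rows (assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clicks_data (user_id TEXT, session_id TEXT, page_name TEXT, ts TEXT)"
)
conn.executemany(
    "INSERT INTO clicks_data VALUES (?, ?, ?, ?)",
    [
        ("0001", "s1", "home", "2020-01-01 10:00:00"),
        ("0001", "s1", "login", "2020-01-01 10:05:00"),
        ("0002", "s2", "home", "2020-01-01 11:00:00"),
    ],
)

# Number each user's clicks from newest to oldest, then keep only row 1.
latest_click_sql = """
SELECT user_id, page_name
FROM (
    SELECT user_id,
           page_name,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts DESC) AS recent_click
    FROM clicks_data
) t
WHERE recent_click = 1
ORDER BY user_id
"""
latest_clicks = conn.execute(latest_click_sql).fetchall()
print(latest_clicks)
```

The window function cannot be filtered in the same SELECT that defines it, which is why the WHERE clause sits in the outer query.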

Related

Distinct particular field in select query

I have table with below sample values.
|Id|Keyword|insertedon|
|:-|:------|:---------|
|1 | abcd | 13/12/20 |
|2 | cdef | 14/12/20 |
|3 | abcd | 14/12/20 |
|4 | defg | 14/12/20 |
From the above table I need the distinct keywords, ordered by insertedon descending.
I need the 5 most recent results.
Expected Result:
defg
abcd
cdef
Please let me know how to achieve this.
You get the top 5 results with TOP(5) in SQL Server. You'd order the keywords by their last insertedon date:
select top(5) keyword
from mytable
group by keyword
order by max(insertedon) desc;
If you are looking for the latest entries based on the insertedon column, you can use a group by clause, something like this:
select keyword, max(insertedon)
from mytable
group by keyword
order by 2 desc
You can just use select distinct:
select distinct keyword
from t;
If you wanted a full row, you could use row_number():
select t.*
from (select t.*,
row_number() over (partition by keyword order by newid()) as seqnum
from t
) t
where seqnum = 1;
EDIT:
For the edited version, you can use:
select distinct keyword
from (select top (5) keyword
from t
order by insertedon desc
) k
Give a row number based on the descending order of the date column and then select the row with row number 1.
Query
;with cte as(
select [keyword], [rn] = row_number() over(
partition by [keyword]
order by [insertedon] desc, [id] desc
)
from [mytable] -- table name assumed; replace with your own
)
select [keyword] from cte
where [rn] = 1;
You can use the analytical functions as follows:
select t.* from
(select t.*,
row_number() over (partition by keyword order by insertedon desc) as rn,
Dense_rank() over (order by insertedon desc) as dr
from t ) t where rn = 1 and dr <= 5;
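The group-by approach can be tried out with a small sketch in Python and SQLite. Several things here are assumptions: the table name t, the ISO date strings (the sample dates are read as dd/mm/yy), the tie-break on id for keywords that share a date, and LIMIT 5 standing in for SQL Server's TOP (5).

```python
import sqlite3

# In-memory copy of the sample table; dates rewritten as ISO strings so they sort correctly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, keyword TEXT, insertedon TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "abcd", "2020-12-13"),
        (2, "cdef", "2020-12-14"),
        (3, "abcd", "2020-12-14"),
        (4, "defg", "2020-12-14"),
    ],
)

# One row per keyword, newest first; MAX(id) breaks ties on the same date.
recent_keywords_sql = """
SELECT keyword
FROM t
GROUP BY keyword
ORDER BY MAX(insertedon) DESC, MAX(id) DESC
LIMIT 5
"""
recent_keywords = [row[0] for row in conn.execute(recent_keywords_sql)]
print(recent_keywords)
```

Without a tie-break, keywords inserted on the same date may come back in any order, so a second sort key is worth adding when the date column has no time part.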

How to count repeating values in a column in PostgreSQL?

Hi I have a table like below, and I want to count the repeating values in the status column. I don't want to calculate the overall duplicate values. For example, I just want to count how many "Offline" appears until the value changes to "Idle".
This is the result I wanted. Thank you.
This is often called gaps-and-islands.
One way to do it is with two sequences of row numbers.
Examine each intermediate result of the query to understand how it works.
WITH
CTE_rn
AS
(
SELECT
status
,dt
,ROW_NUMBER() OVER (ORDER BY dt) as rn1
,ROW_NUMBER() OVER (PARTITION BY status ORDER BY dt) as rn2
FROM
T
)
SELECT
status
,COUNT(*) AS cnt
FROM
CTE_rn
GROUP BY
status
,rn1-rn2
ORDER BY
min(dt)
;
Result
| status | cnt |
|---------|-----|
| offline | 2 |
| idle | 1 |
| offline | 2 |
| idle | 1 |
WITH
cte1 AS ( SELECT status,
"date",
workstation,
CASE WHEN status = LAG(status) OVER (PARTITION BY workstation ORDER BY "date")
THEN 0
ELSE 1 END changed
FROM test ),
cte2 AS ( SELECT status,
"date",
workstation,
SUM(changed) OVER (PARTITION BY workstation ORDER BY "date") group_num
FROM cte1 )
SELECT status, COUNT(*) "count", workstation, MIN("date") "from", MAX("date") "till"
FROM cte2
GROUP BY group_num, status, workstation;
fiddle
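To see why the two row-number sequences isolate each run, here is a minimal sketch of the first query in Python with an in-memory SQLite database (3.25 or later). The six sample rows and their dt values are made up for illustration, since the original table is not shown.

```python
import sqlite3

# Made-up status log: two offline rows, one idle, two offline, one idle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (status TEXT, dt TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [
        ("offline", "2021-01-01 00:00"),
        ("offline", "2021-01-01 01:00"),
        ("idle", "2021-01-01 02:00"),
        ("offline", "2021-01-01 03:00"),
        ("offline", "2021-01-01 04:00"),
        ("idle", "2021-01-01 05:00"),
    ],
)

# rn1 counts all rows, rn2 counts rows per status; their difference is
# constant within one consecutive run of the same status.
islands_sql = """
WITH CTE_rn AS (
    SELECT status,
           dt,
           ROW_NUMBER() OVER (ORDER BY dt) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY status ORDER BY dt) AS rn2
    FROM T
)
SELECT status, COUNT(*) AS cnt
FROM CTE_rn
GROUP BY status, rn1 - rn2
ORDER BY MIN(dt)
"""
islands = conn.execute(islands_sql).fetchall()
print(islands)
```

Selecting rn1, rn2 and rn1 - rn2 from the CTE on their own is a useful way to inspect the intermediate result before grouping.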

redshift: how to find row_number after grouping and aggregating?

Suppose I have a table of customer purchases ("my_table") like this:
--------------------------------------
customerid | date_of_purchase | price
-----------|------------------|-------
1 | 2019-09-20 | 20.23
2 | 2019-09-21 | 1.99
1 | 2019-09-21 | 123.34
...
I'd like to be able to find the nth highest spending customer in this table (say n = 5). So I tried this:
with cte as (
select customerid, sum(price) as total_pay,
row_number() over (partition by customerid order by total_pay desc) as rn
from my_table group by customerid order by total_pay desc)
select * from cte where rn = 5;
But this gives me nonsense results. For some reason rn doesn't seem to be unique (for example there are a bunch of customers with rn = 1). I don't understand why. Isn't rn supposed to be just a row number?
Remove the partition by in the definition of row_number():
with cte as (
select customerid, sum(price) as total_pay,
row_number() over (order by total_pay desc) as rn
from my_table
group by customerid
)
select *
from cte
where rn = 5;
You are already aggregating by customerid, so each customer has only one row. Partitioning by customerid therefore puts every row in its own partition, and rn is always 1.
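The difference between the two window definitions can be seen side by side in a small sketch using Python and SQLite rather than Redshift. Customer 3 and its purchase are made up for illustration, and the aggregation is placed in a subquery so the example does not rely on referencing a column alias inside the window clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (customerid INTEGER, date_of_purchase TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "2019-09-20", 20.23),
        (2, "2019-09-21", 1.99),
        (1, "2019-09-21", 123.34),
        (3, "2019-09-22", 50.00),  # hypothetical extra customer
    ],
)

# rn ranks customers by total spend; rn_partitioned repeats the mistake
# from the question and restarts the count for every customer.
ranked_sql = """
SELECT customerid,
       total_pay,
       ROW_NUMBER() OVER (ORDER BY total_pay DESC) AS rn,
       ROW_NUMBER() OVER (PARTITION BY customerid ORDER BY total_pay DESC) AS rn_partitioned
FROM (
    SELECT customerid, SUM(price) AS total_pay
    FROM my_table
    GROUP BY customerid
) totals
ORDER BY rn
"""
ranked = conn.execute(ranked_sql).fetchall()
for row in ranked:
    print(row)
```

Filtering on rn picks the nth highest spender, while filtering on rn_partitioned = 1 would return every customer.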

Min Date from one column multiple rows

My apologies, I should have added every column and complete problem not just portion.
I have a table A which stores all invoices issued (type 1) and payments received (type 4) from clients. Sometimes a client pays in 2-3 installments. I want to find the date difference between the invoice being issued and the last payment collected for that invoice. My data looks like this
a.cltid | A.Invnum | A.Cash | A.Date     | a.type | a.status
--------|----------|--------|------------|--------|---------
70      | 112      | -200   | 2012-03-01 | 4      | P
70      | 112      | -500   | 2012-03-12 | 4      | P
90      | 124      | -550   | 2012-01-20 | 4      | P
70      | 112      | 700    | 2012-02-20 | 1      | p
55      | 101      | 50     | 2012-01-15 | 1      | d
90      | 124      | 550    | 2012-01-15 | 1      | P
I am running
Select *, Datediff(dd,T.date,P.date)
from (select a.cltid, a.invnumber,a.cash, min(a.date)date
from table.A as A
where a.status<>'d' and a.type=1
group by a.cltid, a.invnumber,a.cash)T
join
Select *
from (select a.cltid, a.invnumber,a.cash, min(a.date)date
from table.A as A
where a.status<>'d' and a.type=4
group by a.cltid, a.invnumber,a.cash)P
on
T.invnumb=P.invnumber and T.cltid=P.cltid
How can I make it work? So it shows me
70|112|-500|2012-03-12|4|P 70|112|700|2012-02-20|1|p|22
90|124|-550|2012-01-20|4|P 90|124|550|2012-01-15|1|P|5
You can use row_number() to number the rows within each cltid by descending date, then keep the first row for each cltid, which is the row with the latest date:
select *
from (
select A.*,
row_number() over (
partition by a.cltid order by a.date desc
) rn
from table.A as A
) t
where rn = 1;
It will return one row (with latest date) for each client. If you want to return all the rows which have latest date, use rank() instead.
Use a ranking function to get all the columns:
select a.*
from (select a.*,
row_number() over (partition by cltid order by date desc) as seqnum
from a
) a
where seqnum = 1;
Use aggregation if you only want the date. The issue with your query is that the group by clause has too many columns:
select a.cltid, max(a.date) as date
from table.A as A
group by a.cltid;
Your query also uses min(), which returns the earliest date rather than the latest.
There are many ways to do this. Here are some of them:
test setup: http://rextester.com/VGUY60367
common table expression version, using row_number():
with cte as (
select *
, rn = row_number() over (
partition by cltid, Invnum
order by [date] desc
)
from a
)
select cltid, Invnum, Cash, [date]
from cte
where rn = 1
cross apply version:
select distinct
a.cltid
, a.Invnum
, x.Cash
, x.[date]
from a
cross apply (
select top 1
cltid, Invnum
, [date]
, Cash
from a as i
where i.cltid =a.cltid
and i.Invnum=a.Invnum
order by i.[date] desc
) as x;
top with ties version:
select top 1 with ties
*
from a
order by
row_number() over (
partition by cltid, Invnum
order by [date] desc
)
all return:
+-------+--------+---------------------+------+
| cltid | Invnum | date | Cash |
+-------+--------+---------------------+------+
| 70 | 112 | 12.03.2012 00:00:00 | -500 |
| 90 | 124 | 20.01.2012 00:00:00 | -550 |
+-------+--------+---------------------+------+
You can get the latest date for each invoice with aggregation:
Select
a.cltid, a.invnumber, max(a.date) [date]
from
YourTable a
-- group only by the invoice key; grouping by cash or date would return every row
group by
a.cltid, a.invnumber
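None of the queries above computes the day difference the question asks for, so here is a minimal sketch of one way to do it, written in Python with SQLite instead of SQL Server. The column name dt (in place of date), the lowercase table name a, and julianday() standing in for DATEDIFF are all assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE a (cltid INTEGER, invnum INTEGER, cash INTEGER, "
    "dt TEXT, type INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?, ?, ?, ?)",
    [
        (70, 112, -200, "2012-03-01", 4, "P"),
        (70, 112, -500, "2012-03-12", 4, "P"),
        (90, 124, -550, "2012-01-20", 4, "P"),
        (70, 112, 700, "2012-02-20", 1, "p"),
        (55, 101, 50, "2012-01-15", 1, "d"),
        (90, 124, 550, "2012-01-15", 1, "P"),
    ],
)

# Rank payments (type 4) per invoice from newest to oldest, then join the
# newest one to its invoice row (type 1) and subtract the two dates.
days_sql = """
WITH last_payment AS (
    SELECT cltid, invnum, cash, dt,
           ROW_NUMBER() OVER (PARTITION BY cltid, invnum ORDER BY dt DESC) AS rn
    FROM a
    WHERE type = 4 AND status <> 'd'
)
SELECT i.cltid,
       i.invnum,
       i.dt AS invoice_date,
       p.dt AS last_payment_date,
       CAST(julianday(p.dt) - julianday(i.dt) AS INTEGER) AS days_to_pay
FROM a AS i
JOIN last_payment AS p
  ON p.cltid = i.cltid AND p.invnum = i.invnum AND p.rn = 1
WHERE i.type = 1 AND i.status <> 'd'
ORDER BY i.cltid
"""
days_to_pay = conn.execute(days_sql).fetchall()
for row in days_to_pay:
    print(row)
```

The inner join drops invoices that have no payment yet; a left join would keep them with an empty payment date.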

Getting the First and Last Row Using ROW_NUMBER and PARTITION BY

Sample Input
Name | Value | Timestamp
-----|-------|-----------------
One | 1 | 2016-01-01 02:00
Two | 3 | 2016-01-01 03:00
One | 2 | 2016-01-02 02:00
Two | 4 | 2016-01-03 04:00
Desired Output
Name | Value | EarliestTimestamp | LatestTimestamp
-----|-------|-------------------|-----------------
One | 2 | 2016-01-01 02:00 | 2016-01-02 02:00
Two | 4 | 2016-01-01 03:00 | 2016-01-03 04:00
Attempted Query
I am trying to use ROW_NUMBER() and PARTITION BY to get the latest Name and Value but I would also like the earliest and latest Timestamp value:
SELECT
t.Name,
t.Value,
t.????????? AS EarliestTimestamp,
t.Timestamp AS LatestTimestamp
FROM
(SELECT
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TIMESTAMP DESC) AS RowNumber,
Name,
Value
Timestamp) t
WHERE t.RowNumber = 1
This can be done using window functions min and max.
select distinct name,
min(timestamp) over(partition by name), max(timestamp) over(partition by name)
from tablename
Example
Edit: Based on the comments
select t.name,t.value,t1.earliest,t1.latest
from t
join (select distinct name,
min(tm) over(partition by name) earliest, max(tm) over(partition by name) latest
from t) t1 on t1.name = t.name and t1.latest = t.tm
Edit: Another approach is using the first_value window function, which would eliminate the need for a sub-query and join.
select distinct
name,
first_value(value) over(partition by name order by tm desc) as latest_value,
min(tm) over(partition by name) earliest,
-- or first_value can be used
-- first_value(tm) over(partition by name order by tm)
max(tm) over(partition by name) latest
-- or first_value can be used
-- first_value(tm) over(partition by name order by tm desc)
from t
If I'm understanding your question correctly, here's one option using the row_number function twice. Then to get them on the same row, you can use conditional aggregation.
This should be close:
SELECT
t.Name,
t.Value,
max(case when t.minrn = 1 then t.timestamp end) AS EarliestTimestamp,
max(case when t.maxrn = 1 then t.timestamp end) AS LatestTimestamp
FROM
(SELECT
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TIMESTAMP) as minrn,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TIMESTAMP DESC) as maxrn,
Name,
Value,
Timestamp
FROM YourTable) t
WHERE t.minrn = 1 or t.maxrn = 1
GROUP BY t.Name, t.Value
Use MIN(Timestamp) OVER (PARTITION BY Name) in addition to the ROW_NUMBER() column, like so:
SELECT
t.Name,
t.Value,
t.EarliestTimestamp AS EarliestTimestamp,
t.Timestamp AS LatestTimestamp
FROM
(SELECT
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TIMESTAMP DESC) AS RowNumber,
MIN(Timestamp) OVER (PARTITION BY Name) AS EarliestTimestamp, -- added column
Name,
Value,
Timestamp
FROM YourTable) t -- table name assumed; the question's query omits it
WHERE t.RowNumber = 1
You can use MIN and MAX functions + OUTER APPLY:
SELECT t.Name,
p.[Value],
MIN(t.[Timestamp]) as EarliestTimestamp ,
MAX(t.[Timestamp]) as LatestTimestamp
FROM Table1 t
OUTER APPLY (SELECT TOP 1 * FROM Table1 WHERE t.Name = Name ORDER BY [Timestamp] DESC) p
GROUP BY t.Name, p.[Value]
Output:
Name Value EarliestTimestamp LatestTimestamp
One 2 2016-01-01 02:00 2016-01-02 02:00
Two 4 2016-01-01 03:00 2016-01-03 04:00
If I understood your question, use the row_number() function together with a windowed min() as follows:
SELECT
t.Name,
t.Value,
t.EarliestTimestamp,
t.Timestamp AS LatestTimestamp
FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Timestamp DESC) AS RowNumber,
-- compute the minimum here, before the outer filter removes the earlier rows
MIN(Timestamp) OVER (PARTITION BY Name) AS EarliestTimestamp,
Name,
Value,
Timestamp
FROM YourTable) t
WHERE t.RowNumber = 1
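Think simple. Note that MAX(t.Value) matches the latest Value only because Value increases with Timestamp in the sample data.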
select
t.Name,
MAX(t.Value),
MIN(t.Timestamp),
MAX(t.Timestamp)
FROM
t
group by
t.Name
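The row_number() plus windowed min() approach can be checked against the sample input with a short sketch in Python and SQLite (3.25 or later). The table name YourTable is a placeholder, as in the answers above, and the timestamps are stored as text, which is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Name TEXT, Value INTEGER, Timestamp TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        ("One", 1, "2016-01-01 02:00"),
        ("Two", 3, "2016-01-01 03:00"),
        ("One", 2, "2016-01-02 02:00"),
        ("Two", 4, "2016-01-03 04:00"),
    ],
)

# Both window functions run over the full partition inside the subquery;
# the outer filter then keeps only the newest row per Name.
first_last_sql = """
SELECT Name, Value, EarliestTimestamp, Timestamp AS LatestTimestamp
FROM (
    SELECT Name,
           Value,
           Timestamp,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Timestamp DESC) AS RowNumber,
           MIN(Timestamp) OVER (PARTITION BY Name) AS EarliestTimestamp
    FROM YourTable
) t
WHERE RowNumber = 1
ORDER BY Name
"""
first_last = conn.execute(first_last_sql).fetchall()
for row in first_last:
    print(row)
```

Because Value is taken from the row with RowNumber = 1, it is the latest value for each Name even when values do not increase over time.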