Rank function not limiting to first 5 ranked values (PostgreSQL window functions)

I am trying to get the data for the last five sessions for a number of products. I am using the query below, but it does not limit the output to the first 5; it shows all sessions with a rank column.
Could someone help me troubleshoot this so that it filters and shows only the first 5?
Select
sessionid,
productid,
processed_nos,
rank()
OVER (
PARTITION BY productid
ORDER BY sessionid Asc
ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
) AS per_session_rank
from stats_stable

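You need to use a derived table to be able to filter on the result of RANK. Window functions are evaluated after the WHERE clause, so the rank cannot be referenced in the WHERE of the same query level, and the ROWS BETWEEN frame has no effect on rank():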
select *
from
(
Select
sessionid,
productid,
processed_nos,
rank()
OVER (PARTITION BY productid
ORDER BY sessionid Asc
) AS per_session_rank
from stats_stable
) as dt
WHERE per_session_rank <= 5
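The derived-table approach above can be tried out with a small sketch. It uses Python's sqlite3 module with an in-memory database as a stand-in for PostgreSQL (an assumption for illustration; window functions need SQLite 3.25 or later), and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stats_stable (sessionid INTEGER, productid INTEGER, processed_nos INTEGER)"
)
# Hypothetical sample data: product 1 has 7 sessions, product 2 has 3.
sample = [(s, 1, s * 10) for s in range(1, 8)] + [(s, 2, s * 10) for s in range(1, 4)]
conn.executemany("INSERT INTO stats_stable VALUES (?, ?, ?)", sample)

# The rank is computed in the inner query and filtered in the outer one.
query = """
SELECT sessionid, productid, processed_nos, per_session_rank
FROM (
    SELECT sessionid, productid, processed_nos,
           rank() OVER (PARTITION BY productid ORDER BY sessionid ASC) AS per_session_rank
    FROM stats_stable
) AS dt
WHERE per_session_rank <= 5
ORDER BY productid, sessionid
"""
top5 = conn.execute(query).fetchall()
for row in top5:
    print(row)
```

Product 1 is cut off at five sessions, while product 2 keeps all three because it never reaches rank 5.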

Related

How to display in Big Query ONLY duplicated records?

To view records without duplicated ones, I use this SQL
SELECT * EXCEPT(row_number)
FROM (SELECT*,ROW_NUMBER() OVER (PARTITION BY orderid) row_number
FROM `TABLE`)
WHERE row_number = 1
What is the best practice to display only duplicated records from a single table?
The following is for BigQuery Standard SQL.
Personally, I prefer not to rely on ROW_NUMBER() whenever possible, because with large volumes of data it tends to lead to a "Resources exceeded" error.
So, from my experience, I would recommend the options below.
To view records for those orderid values with only one entry:
#standardSQL
SELECT AS VALUE ANY_VALUE(t)
FROM `project.dataset.table` t
GROUP BY orderid
HAVING COUNT(1) = 1
To view records for those orderid values with more than one entry:
#standardSQL
SELECT * EXCEPT(flag) FROM (
SELECT *, COUNT(1) OVER(PARTITION BY orderid) > 1 flag
FROM `project.dataset.table`
)
WHERE flag
Note: under the hood, COUNT(1) OVER() can be calculated using as many workers as are available, while ROW_NUMBER() OVER() requires all the respective data to be moved to one worker (hence the resource-related issue).
Or:
#standardSQL
SELECT *
FROM `project.dataset.table`
WHERE orderid IN (
SELECT orderid FROM `project.dataset.table`
GROUP BY orderid HAVING COUNT(1) > 1
)
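As a rough check that the windowed COUNT and the IN subquery select the same duplicated rows, here is a minimal sketch. It runs on in-memory SQLite through Python's sqlite3 rather than BigQuery (an assumption for illustration; `SELECT * EXCEPT` is BigQuery-specific, so the columns are listed explicitly), and the `orders` table and its rows are hypothetical.

```python
import sqlite3

# SQLite is used only to make the sketch runnable; it is not BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderid INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")],
)

# Option 1: count rows per orderid with a window function, keep counts above 1.
windowed = conn.execute("""
    SELECT orderid, item FROM (
        SELECT *, COUNT(1) OVER (PARTITION BY orderid) AS cnt FROM orders
    ) AS counted
    WHERE cnt > 1
    ORDER BY orderid, item
""").fetchall()

# Option 2: find duplicated orderid values with GROUP BY / HAVING, then filter.
grouped = conn.execute("""
    SELECT orderid, item FROM orders
    WHERE orderid IN (
        SELECT orderid FROM orders GROUP BY orderid HAVING COUNT(1) > 1
    )
    ORDER BY orderid, item
""").fetchall()

print(windowed)
print(grouped)
```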
Why not just change the row_number filter? You have partitioned by orderid, creating partitions of duplicates, numbered the records, and taken only the first element to remove the duplicates. If you instead take only row_number = 2, you get only elements from partitions with at least 2 elements, i.e. only duplicates.
SELECT * EXCEPT(row_number)
FROM (SELECT*,ROW_NUMBER() OVER (PARTITION BY orderid) row_number
FROM `TABLE`)
WHERE row_number = 2
Note: using row_number = 2 gives you only one row per duplicated orderid. If you use row_number > 1, the result may itself contain duplicates (for example, if the original table had 3 identical rows).
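The difference between the two filters in the note above can be seen in a short sketch. It assumes in-memory SQLite via Python's sqlite3 instead of BigQuery, a hypothetical `orders` table, and an added ORDER BY inside the window so the numbering is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderid INTEGER, item TEXT)")
# Hypothetical data: orderid 1 is unique, 2 appears twice, 3 appears three times.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")],
)

numbered = """
    SELECT orderid, item,
           ROW_NUMBER() OVER (PARTITION BY orderid ORDER BY item) AS rn
    FROM orders
"""

# rn = 2 returns exactly one row for each orderid that has duplicates.
second_only = conn.execute(
    f"SELECT orderid FROM ({numbered}) AS n WHERE rn = 2 ORDER BY orderid"
).fetchall()

# rn > 1 returns every row after the first, so orderid 3 shows up twice.
beyond_first = conn.execute(
    f"SELECT orderid FROM ({numbered}) AS n WHERE rn > 1 ORDER BY orderid"
).fetchall()

print(second_only)
print(beyond_first)
```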
You can display the duplicated rows by showing only rows with row_number greater than 1.
select
* except(row_number)
from (
select
*, row_number() over (partition by ) as row_number
from `TABLE`)
where row_number > 1
If your table has no primary key column, you have to define the key yourself. Assuming my table contains 12 columns in BigQuery, I have not found anything shorter than:
SELECT *, sum(1) as rowcount
FROM `TABLE`
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
HAVING rowcount>1;

Sum of five lowest values

How can I find the sum of the lowest five points in the Point column, grouped by ID?
[Table: image not available]
The desired results should be:
[Results: image not available]
I have no idea where to start.
Thanks
select a.ID, SUM(a.points) from(select ID , points,row_number() over
(partition by ID order by POINTS) as rownum_returned from your_table) a where
a.rownum_returned<6 group by a.ID;
See the documentation for the row_number() function for details.
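To illustrate the row_number() query above, here is a minimal sketch on in-memory SQLite through Python's sqlite3 (an assumption, since the question does not name its database). The rows in `your_table` are hypothetical, because the original table image is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (ID TEXT, points INTEGER)")
# Hypothetical data: A has seven values, B has only two.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("A", 5), ("A", 3), ("A", 9), ("A", 1), ("A", 7), ("A", 2), ("A", 8),
     ("B", 4), ("B", 6)],
)

# Number the rows per ID from lowest to highest, keep the first five, then sum.
lowest_five_sums = conn.execute("""
    SELECT a.ID, SUM(a.points)
    FROM (
        SELECT ID, points,
               row_number() OVER (PARTITION BY ID ORDER BY points) AS rownum_returned
        FROM your_table
    ) AS a
    WHERE a.rownum_returned < 6
    GROUP BY a.ID
    ORDER BY a.ID
""").fetchall()

print(lowest_five_sums)  # A: 1+2+3+5+7 = 18, B: 4+6 = 10
```

An ID with fewer than five rows, like B here, simply contributes all of its values.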
If I had to do that, I would solve it with a subquery.
In SQL Server, I would write a subquery that retrieves the 5 lowest points:
Select Top 5 id, point from table
Order by point asc
Note: the keyword TOP limits the result to the first 5 rows.
Note 2: order by point asc orders the result with the lowest values first.
Now I use that query as a subquery to complete the task:
Select id, sum (point) from
(Select top 5 id,point from table order by point asc) t group by id
Be aware that this sums the five lowest rows across the whole table, not the five lowest per id; a per-id result needs a partitioned row_number() as in the previous answer.

Selecting the newest records for 6 unique columns

I have a table of 6 currency conversions that is updated almost daily. Unfortunately, the software inserts new rows rather than updating the existing ones. My previous SELECT was as follows:
SELECT FROM_CURRENCY_ID, XCHG_RATE
FROM
(
SELECT TOP 6 FROM_CURRENCY_ID, XCHG_RATE
FROM SHARED_CURRENCY_EXCHANGE
WHERE NOT FROM_CURRENCY_ID = 'CAD'
ORDER BY RECORD_CREATED desc
) t
ORDER BY FROM_CURRENCY_ID
The issue now is that some records got updated while others didn't, so my query returns duplicate values for one of the currencies and nothing for another. I need it to output the 6 unique FROM_CURRENCY_IDs and their XCHG_RATE with the newest RECORD_CREATED dates.
I've been trying a GROUP BY to exclude the duplicate rows, with no luck.
with x as
(select row_number() over(partition by from_currency_id order by record_created desc) rn, * from shared_currency_exchange)
select from_currency_id, xchg_rate from x
where rn = 1
This gives the most recent record for each currency row number 1, and you can then filter the CTE on rn = 1.
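A small sketch of the same idea follows. It runs on in-memory SQLite via Python's sqlite3 rather than SQL Server (an assumption for illustration), uses hypothetical rates and dates, and also carries over the CAD exclusion from the question's original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shared_currency_exchange "
    "(from_currency_id TEXT, xchg_rate REAL, record_created TEXT)"
)
# Hypothetical rows: USD and GBP were inserted twice, EUR only once.
conn.executemany(
    "INSERT INTO shared_currency_exchange VALUES (?, ?, ?)",
    [("USD", 1.30, "2024-01-01"), ("USD", 1.35, "2024-01-03"),
     ("EUR", 1.45, "2024-01-02"), ("CAD", 1.00, "2024-01-03"),
     ("GBP", 1.70, "2024-01-01"), ("GBP", 1.72, "2024-01-02")],
)

# rn = 1 marks the newest row within each currency.
latest_rates = conn.execute("""
    WITH x AS (
        SELECT row_number() OVER (
                   PARTITION BY from_currency_id ORDER BY record_created DESC
               ) AS rn,
               from_currency_id, xchg_rate
        FROM shared_currency_exchange
        WHERE from_currency_id <> 'CAD'
    )
    SELECT from_currency_id, xchg_rate
    FROM x
    WHERE rn = 1
    ORDER BY from_currency_id
""").fetchall()

print(latest_rates)
```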

Selecting 5 Most Recent Records Of Each Group

The below statement retrieves the top 2 records within each group in SQL Server. It works correctly, however as you can see it doesn't scale at all. I mean that if I wanted to retrieve the top 5 or 10 records instead of just 2, you can see how this query statement would grow very quickly.
How can I convert this query into something that returns the same records, but that I can quickly change it to return the top 5 or 10 records within each group instead, rather than just 2? (i.e. I want to just tell it to return the top 5 within each group, rather than having 5 unions as the below format would require)
Thanks!
WITH tSub
as (SELECT CustomerID,
TransactionTypeID,
Max(EventDate) as EventDate,
Max(TransactionID) as TransactionID
FROM Transactions
WHERE ParentTransactionID is NULL
Group By CustomerID,
TransactionTypeID)
SELECT *
from tSub
UNION
SELECT t.CustomerID,
t.TransactionTypeID,
Max(t.EventDate) as EventDate,
Max(t.TransactionID) as TransactionID
FROM Transactions t
WHERE t.TransactionID NOT IN (SELECT tSub.TransactionID
FROM tSub)
and ParentTransactionID is NULL
Group By CustomerID,
TransactionTypeID
Use PARTITION BY to solve this type of problem:
select <columns> from
(select <columns>, ROW_NUMBER() over (PARTITION by <GroupColumn> order by <OrderColumn> desc)
as rownum from YourTable) ut where ut.rownum<=5
This partitions the result on the grouping column, orders each partition by the ordering column (here EventDate, descending so the most recent come first), and then selects the entries with rownum<=5. You can change the value 5 to get the top n most recent entries of each group.
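The template above might be exercised as follows, with the group size passed as a query parameter so that switching from 2 to 5 or 10 needs no change to the SQL. This is a sketch under assumptions: in-memory SQLite via Python's sqlite3 instead of SQL Server, a simplified `Transactions` table grouped only by CustomerID, hypothetical rows, and a hypothetical helper named `top_n_per_customer`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions (TransactionID INTEGER, CustomerID INTEGER, EventDate TEXT)"
)
# Hypothetical data: customer 100 has four transactions, customer 200 has two.
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?)",
    [(1, 100, "2024-01-01"), (2, 100, "2024-01-05"),
     (3, 100, "2024-01-09"), (4, 100, "2024-01-12"),
     (5, 200, "2024-01-03"), (6, 200, "2024-01-04")],
)

def top_n_per_customer(n):
    """Return the n most recent (CustomerID, TransactionID) pairs per customer."""
    return conn.execute("""
        SELECT CustomerID, TransactionID FROM (
            SELECT CustomerID, TransactionID,
                   ROW_NUMBER() OVER (
                       PARTITION BY CustomerID ORDER BY EventDate DESC
                   ) AS rownum
            FROM Transactions
        ) AS ut
        WHERE ut.rownum <= ?
        ORDER BY CustomerID, ut.rownum
    """, (n,)).fetchall()

top2 = top_n_per_customer(2)
top3 = top_n_per_customer(3)
print(top2)
print(top3)
```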

How do I use ROW_NUMBER()?

I want to use ROW_NUMBER() to get...
1. The max(ROW_NUMBER()), or I guess this would also be the count of all rows.
I tried doing:
SELECT max(ROW_NUMBER() OVER(ORDER BY UserId)) FROM Users
but it didn't seem to work...
2. The ROW_NUMBER() for a given piece of information, i.e. if I have a name and I want to know what row the name came from.
I assume it would be something similar to what I tried for #1:
SELECT ROW_NUMBER() OVER(ORDER BY UserId) From Users WHERE UserName='Joe'
but this didn't work either...
Any ideas?
For the first question, why not just use?
SELECT COUNT(*) FROM myTable
to get the count.
And for the second question, the primary key of the row is what should be used to identify a particular row. Don't try and use the row number for that.
If you returned Row_Number() in your main query,
SELECT ROW_NUMBER() OVER (Order by Id) AS RowNumber, Field1, Field2, Field3
FROM User
Then, when you want to go 5 rows back, you can take the current row number and use the following query to find the row at CurrentRow - 5:
SELECT us.Id
FROM (SELECT ROW_NUMBER() OVER (ORDER BY id) AS Row, Id
FROM User ) us
WHERE Row = CurrentRow - 5
Though I agree with others that you could use count() to get the total number of rows, here is how you can use row_number():
To get the total number of rows:
with temp as (
select row_number() over (order by id) as rownum
from table_name
)
select max(rownum) from temp
To get the row numbers where name is Matt:
with temp as (
select name, row_number() over (order by id) as rownum
from table_name
)
select rownum from temp where name like 'Matt'
You can further use min(rownum) or max(rownum) to get the first or last row for Matt respectively.
These were very simple implementations of row_number(). You can use it for more complex grouping. Check out my response on Advanced grouping without using a sub query
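The two CTE queries from this answer can be run side by side in a short sketch. It assumes in-memory SQLite via Python's sqlite3 in place of SQL Server, hypothetical rows in `table_name`, and a CTE renamed to `numbered` for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, name TEXT)")
# Hypothetical data: Matt appears in the 2nd and 4th positions by id.
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(10, "Ann"), (20, "Matt"), (30, "Bob"), (40, "Matt")],
)

# The highest row number equals the total row count.
total_rows = conn.execute("""
    WITH numbered AS (
        SELECT row_number() OVER (ORDER BY id) AS rownum FROM table_name
    )
    SELECT max(rownum) FROM numbered
""").fetchone()[0]

# Number all rows first, then look up the rows for one name.
matt_rows = [r[0] for r in conn.execute("""
    WITH numbered AS (
        SELECT name, row_number() OVER (ORDER BY id) AS rownum FROM table_name
    )
    SELECT rownum FROM numbered WHERE name = 'Matt' ORDER BY rownum
""")]

print(total_rows, matt_rows)
```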
If you need the table's total row count, there is an alternative to SELECT COUNT(*).
Because SELECT COUNT(*) scans the table to return the row count, it can take a very long time for a large table. In that case you can use the sysindexes system table instead. It has a ROWS column that contains the total row count for each table in your database. You can use the following select statement:
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND indid < 2
This can drastically reduce the time your query takes, although the value comes from metadata and may not exactly match COUNT(*).
You can use this to get the first record that matches a WHERE clause:
SELECT TOP(1) * , ROW_NUMBER() OVER(ORDER BY UserId) AS rownum
FROM Users
WHERE UserName = 'Joe'
ORDER BY rownum ASC
ROW_NUMBER() returns a unique number for each row, starting with 1. You can use it by simply writing:
ROW_NUMBER() OVER (ORDER BY Column_Name DESC) AS RowNumber
This may not be directly related to the question, but I found it useful when using ROW_NUMBER and no particular ordering is needed:
SELECT *,
ROW_NUMBER() OVER (ORDER BY (SELECT 100)) AS Any_ID
FROM #Any_Table
select
Ml.Hid,
ml.blockid,
row_number() over (partition by ml.blockid order by Ml.Hid desc) as rownumber,
H.HNAME
from MIT_LeadBechmarkHamletwise ML
join [MT.HAMLE] h on ML.Hid=h.HID
SELECT num, UserName FROM
(SELECT UserName, ROW_NUMBER() OVER(ORDER BY UserId) AS num
From Users) AS numbered
WHERE UserName='Joe'
You can use ROW_NUMBER to limit a query result.
Example:
SELECT * FROM (
select row_number() OVER (order by createtime desc) AS ROWINDEX,*
from TABLENAME ) TB
WHERE TB.ROWINDEX between 1 and 10
With the above query, I get page 1 of the results from TABLENAME.
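The paging query generalises to any page by computing the ROWINDEX bounds from a page number. The sketch below assumes in-memory SQLite via Python's sqlite3 instead of SQL Server, a hypothetical 25-row `TABLENAME`, and a hypothetical helper named `fetch_page`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLENAME (id INTEGER, createtime INTEGER)")
# Hypothetical data: 25 rows whose createtime grows with id.
conn.executemany("INSERT INTO TABLENAME VALUES (?, ?)", [(i, i) for i in range(1, 26)])

def fetch_page(page_no, page_size=10):
    """Return the ids on the given 1-based page, newest first."""
    first = (page_no - 1) * page_size + 1  # row numbers start at 1, not 0
    last = page_no * page_size
    return [r[0] for r in conn.execute("""
        SELECT id FROM (
            SELECT row_number() OVER (ORDER BY createtime DESC) AS ROWINDEX, id
            FROM TABLENAME
        ) AS TB
        WHERE TB.ROWINDEX BETWEEN ? AND ?
        ORDER BY TB.ROWINDEX
    """, (first, last))]

page1 = fetch_page(1)
page3 = fetch_page(3)
print(page1)
print(page3)
```

The last page is simply shorter when the row count is not a multiple of the page size.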
If you absolutely want to use ROW_NUMBER for this (instead of count(*)) you can always use:
SELECT TOP 1 ROW_NUMBER() OVER (ORDER BY Id)
FROM USERS
ORDER BY ROW_NUMBER() OVER (ORDER BY Id) DESC
You need to create a virtual table (a common table expression) using WITH ... AS, as shown in the query below.
Using this virtual table, you can perform CRUD operations with respect to row_number.
QUERY:
WITH numbered AS
(SELECT row_number() OVER(ORDER BY UserId) rn, * FROM Users)
SELECT * FROM numbered WHERE UserName='Joe'
You can use INSERT, UPDATE or DELETE in the last statement instead of SELECT.
The SQL ROW_NUMBER() function sorts data rows and assigns an order number to each row in the record set. It is used to number rows, for example to identify the top 10 rows with the highest order amount, or to identify each customer's highest-amount order.
If you want to sort the dataset and number the rows separately within each category, use ROW_NUMBER() with a PARTITION BY clause. For example, to rank each customer's orders among themselves when the dataset contains all orders:
SELECT
SalesOrderNumber,
CustomerId,
SubTotal,
ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY SubTotal DESC) rn
FROM Sales.SalesOrderHeader
But as I understand it, you want to calculate the number of rows grouped by a column. For instance, if you want to see the count of all orders of the related customer as a separate column beside the order info, you can use the COUNT() aggregate function with a PARTITION BY clause.
For example:
SELECT
SalesOrderNumber,
CustomerId,
COUNT(*) OVER (PARTITION BY CustomerId) CustomerOrderCount
FROM Sales.SalesOrderHeader
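The two partitioned queries in this answer can be combined into one result for comparison. This is a sketch on in-memory SQLite via Python's sqlite3 (an assumption; the original targets SQL Server), with a hypothetical unqualified `SalesOrderHeader` table and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesOrderHeader "
    "(SalesOrderNumber TEXT, CustomerId INTEGER, SubTotal REAL)"
)
# Hypothetical data: customer 1 has two orders, customer 2 has one.
conn.executemany(
    "INSERT INTO SalesOrderHeader VALUES (?, ?, ?)",
    [("SO1", 1, 100.0), ("SO2", 1, 300.0), ("SO3", 2, 50.0)],
)

# rn ranks each customer's orders by amount; CustomerOrderCount repeats
# the per-customer total on every row of that customer.
orders = conn.execute("""
    SELECT SalesOrderNumber,
           CustomerId,
           ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY SubTotal DESC) AS rn,
           COUNT(*) OVER (PARTITION BY CustomerId) AS CustomerOrderCount
    FROM SalesOrderHeader
    ORDER BY CustomerId, rn
""").fetchall()

for row in orders:
    print(row)
```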
This query:
SELECT ROW_NUMBER() OVER(ORDER BY UserId) From Users WHERE UserName='Joe'
will return one row for every row where the UserName is 'Joe', or no rows if there is no UserName='Joe'.
They will be listed in order of UserId, and the row_number field will start at 1 and increment for however many rows contain UserName='Joe', because the WHERE clause is applied before the rows are numbered.
If it does not work for you, then your WHERE clause has an issue or there is no UserId column in the table. Check the spelling of both fields, UserId and UserName.
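The effect of filtering before versus after numbering can be shown with a minimal sketch. It assumes in-memory SQLite via Python's sqlite3 in place of SQL Server and a hypothetical three-row `Users` table in which Joe is the third user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserId INTEGER, UserName TEXT)")
# Hypothetical data: Joe is the third user by UserId.
conn.executemany(
    "INSERT INTO Users VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Joe")]
)

# WHERE runs first, so only Joe's row is numbered and it gets number 1.
filtered_first = conn.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY UserId)
    FROM Users
    WHERE UserName = 'Joe'
""").fetchall()

# Numbering in a derived table first keeps Joe's position in the full table.
numbered_first = conn.execute("""
    SELECT num, UserName FROM (
        SELECT UserName, ROW_NUMBER() OVER (ORDER BY UserId) AS num
        FROM Users
    ) AS numbered
    WHERE UserName = 'Joe'
""").fetchall()

print(filtered_first)
print(numbered_first)
```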