How do I use ROW_NUMBER()? - sql

I want to use ROW_NUMBER() to get:
1. The max(ROW_NUMBER()), or I guess this would also be the count of all rows.
I tried doing:
SELECT max(ROW_NUMBER() OVER(ORDER BY UserId)) FROM Users
but it didn't seem to work...
2. The ROW_NUMBER() for a given piece of information, i.e. if I have a name and I want to know what row the name came from.
I assume it would be something similar to what I tried for #1:
SELECT ROW_NUMBER() OVER(ORDER BY UserId) From Users WHERE UserName='Joe'
but this didn't work either...
Any Ideas?

For the first question, why not just use?
SELECT COUNT(*) FROM myTable
to get the count.
And for the second question, the primary key of the row is what should be used to identify a particular row. Don't try and use the row number for that.
If you returned Row_Number() in your main query,
SELECT ROW_NUMBER() OVER (ORDER BY Id) AS RowNumber, Field1, Field2, Field3
FROM [User]
then when you want to go 5 rows back, you can take the current row number and use the following query to find the row at CurrentRow - 5:
SELECT us.Id
FROM (SELECT ROW_NUMBER() OVER (ORDER BY Id) AS Row, Id
FROM [User]) us
WHERE Row = CurrentRow - 5

Though I agree with others that you could use count() to get the total number of rows, here is how you can use row_number():
To get the total number of rows:
with temp as (
select row_number() over (order by id) as rownum
from table_name
)
select max(rownum) from temp
To get the row numbers where name is Matt:
with temp as (
select name, row_number() over (order by id) as rownum
from table_name
)
select rownum from temp where name like 'Matt'
You can further use min(rownum) or max(rownum) to get the first or last row for Matt respectively.
These were very simple implementations of row_number(). You can use it for more complex grouping. Check out my response on Advanced grouping without using a sub query
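As a rough check of the two CTE queries above, here is a minimal sketch that runs them against an in-memory SQLite database from Python. SQLite (3.25 or later, for window functions) is only a stand-in for SQL Server here, the sample rows are invented, and the CTE is renamed numbered_rows for the illustration.

```python
import sqlite3

# Hypothetical sample data; table_name, id and name mirror the answer's placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO table_name (id, name) VALUES (?, ?)",
    [(10, "Ann"), (20, "Matt"), (30, "Bob"), (40, "Matt")],
)

# Shared CTE prefix: number every row by id.
numbered = """
    WITH numbered_rows AS (
        SELECT name, ROW_NUMBER() OVER (ORDER BY id) AS rownum
        FROM table_name
    )
"""

# The largest row number is the same value COUNT(*) returns.
max_rownum = conn.execute(
    numbered + "SELECT MAX(rownum) FROM numbered_rows"
).fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM table_name").fetchone()[0]

# Row numbers of every row whose name is Matt.
matt_rows = [
    row[0]
    for row in conn.execute(
        numbered
        + "SELECT rownum FROM numbered_rows WHERE name = 'Matt' ORDER BY rownum"
    )
]

print(max_rownum, total, matt_rows)
```

Because the filter on name is applied outside the CTE, the row numbers still refer to positions in the whole table.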

If you need the table's total row count, there is an alternative to the SELECT COUNT(*) statement.
Because SELECT COUNT(*) makes a full table scan to return the row count, it can take a very long time on a large table. You can use the sysindexes system table instead in this case. It has a ROWS column that contains the total row count for each table in your database. You can use the following select statement:
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND indid < 2
This will drastically reduce the time your query takes.

You can use this to get the first record that matches a WHERE clause:
SELECT TOP(1) * , ROW_NUMBER() OVER(ORDER BY UserId) AS rownum
FROM Users
WHERE UserName = 'Joe'
ORDER BY rownum ASC

ROW_NUMBER() returns a unique number for each row, starting with 1. You can use it by simply writing (with the column name unquoted, since a string literal is a constant and cannot be used to order the window):
ROW_NUMBER() OVER (ORDER BY Column_Name DESC) AS RowNum

May not be related to the question here. But I found it could be useful when using ROW_NUMBER -
SELECT *,
ROW_NUMBER() OVER (ORDER BY (SELECT 100)) AS Any_ID
FROM #Any_Table

select
Ml.Hid,
ml.blockid,
row_number() over (partition by ml.blockid order by Ml.Hid desc) as rownumber,
H.HNAME
from MIT_LeadBechmarkHamletwise ML
join [MT.HAMLE] h on ML.Hid=h.HID

SELECT num, UserName FROM
(SELECT UserName, ROW_NUMBER() OVER(ORDER BY UserId) AS num
From Users) AS numbered
WHERE UserName='Joe'

You can use ROW_NUMBER to limit the query result.
Example:
SELECT * FROM (
select row_number() OVER (order by createtime desc) AS ROWINDEX,*
from TABLENAME ) TB
WHERE TB.ROWINDEX BETWEEN 1 AND 10
With the above query, I get page 1 of the results from TABLENAME.
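To go beyond page 1, the row-number bounds can be derived from a page number and a page size. The following is a minimal sketch of that idea, run against SQLite through Python as a stand-in for SQL Server; the items table, its columns, and the fetch_page helper are hypothetical names made up for this example.

```python
import sqlite3

# Hypothetical table with 25 rows; a larger createtime means a newer row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, createtime INTEGER)")
conn.executemany(
    "INSERT INTO items (id, createtime) VALUES (?, ?)",
    [(i, 1000 + i) for i in range(1, 26)],
)


def fetch_page(conn, page, page_size):
    """Return the ids on a 1-based page, newest first."""
    first = (page - 1) * page_size + 1  # row numbers start at 1
    last = page * page_size
    rows = conn.execute(
        """
        SELECT id FROM (
            SELECT id, ROW_NUMBER() OVER (ORDER BY createtime DESC) AS rowindex
            FROM items
        ) AS tb
        WHERE tb.rowindex BETWEEN ? AND ?
        ORDER BY tb.rowindex
        """,
        (first, last),
    ).fetchall()
    return [row[0] for row in rows]


page1 = fetch_page(conn, 1, 10)
page3 = fetch_page(conn, 3, 10)
print(page1)
print(page3)
```

The last page is simply shorter when the row count is not a multiple of the page size.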

If you absolutely want to use ROW_NUMBER for this (instead of count(*)) you can always use:
SELECT TOP 1 ROW_NUMBER() OVER (ORDER BY Id)
FROM USERS
ORDER BY ROW_NUMBER() OVER (ORDER BY Id) DESC

You need to create a virtual table (a common table expression) with WITH ... AS, as shown in the query below.
Using this virtual table, you can perform CRUD operations with respect to row_number.
QUERY:
WITH numbered AS
(SELECT row_number() OVER(ORDER BY UserId) rn, * FROM Users)
SELECT * FROM numbered WHERE UserName='Joe'
You can use INSERT, UPDATE or DELETE in the last statement instead of SELECT.

The SQL Row_Number() function sorts the rows of a record set and assigns an order number to each of them. So it is used to number rows, for example to identify the top 10 rows with the highest order amount, or to identify each customer's order with the highest amount.
If you want to sort the dataset and number each row by separating them into categories, use Row_Number() with the Partition By clause. For example, sorting the orders of each customer within that customer's own group, where the dataset contains all orders.
SELECT
SalesOrderNumber,
CustomerId,
SubTotal,
ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY SubTotal DESC) rn
FROM Sales.SalesOrderHeader
But as I understand it, you want to calculate the number of rows grouped by a column. To visualize the requirement: if you want to see the count of all orders of the related customer as a separate column beside the order info, you can use the COUNT() aggregate function with the Partition By clause.
For example,
SELECT
SalesOrderNumber,
CustomerId,
COUNT(*) OVER (PARTITION BY CustomerId) CustomerOrderCount
FROM Sales.SalesOrderHeader
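The two PARTITION BY queries above can be combined and tried out on a few rows. This is a small sketch only: it uses SQLite through Python rather than SQL Server, and the orders table and its values are invented stand-ins for Sales.SalesOrderHeader.

```python
import sqlite3

# Hypothetical stand-in for Sales.SalesOrderHeader with five orders.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_no INTEGER, customer_id TEXT, subtotal INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "A", 50), (2, "A", 80), (3, "B", 20), (4, "A", 10), (5, "B", 90)],
)

# rn restarts at 1 for each customer; cnt repeats the customer's order count.
rows = conn.execute(
    """
    SELECT order_no,
           customer_id,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY subtotal DESC) AS rn,
           COUNT(*) OVER (PARTITION BY customer_id) AS cnt
    FROM orders
    ORDER BY customer_id, rn
    """
).fetchall()

# Keeping only rn = 1 gives each customer's largest order.
top_orders = [(order_no, cust) for order_no, cust, rn, cnt in rows if rn == 1]

for row in rows:
    print(row)
print(top_orders)
```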

This query:
SELECT ROW_NUMBER() OVER(ORDER BY UserId) From Users WHERE UserName='Joe'
will return a row number for every row where the UserName is 'Joe', unless no row has UserName='Joe'.
The rows will be listed in order of UserId, and the row_number field will start at 1 and increment for however many rows contain UserName='Joe'.
If it does not work for you, then your WHERE clause has an issue or there is no UserId column in the table. Check the spelling of both fields, UserId and UserName.
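The reason the filtered query always starts at 1 is that WHERE runs before the window function, so only the 'Joe' rows get numbered. A minimal sketch of the difference, using SQLite through Python as a stand-in for SQL Server and a made-up four-row Users table:

```python
import sqlite3

# Hypothetical Users table; Joe is the third row by UserId.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserId INTEGER PRIMARY KEY, UserName TEXT)")
conn.executemany(
    "INSERT INTO Users (UserId, UserName) VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Joe"), (4, "Zed")],
)

# WHERE is applied first, so only Joe's row is numbered.
filtered_first = conn.execute(
    "SELECT ROW_NUMBER() OVER (ORDER BY UserId) FROM Users WHERE UserName = 'Joe'"
).fetchone()[0]

# Numbering every row in a subquery and filtering afterwards keeps the position.
numbered_first = conn.execute(
    """
    SELECT num FROM (
        SELECT UserName, ROW_NUMBER() OVER (ORDER BY UserId) AS num
        FROM Users
    ) AS numbered
    WHERE UserName = 'Joe'
    """
).fetchone()[0]

print(filtered_first, numbered_first)
```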

Related

Find the second largest value with Groupings

In SQL Server, I am attempting to pull the second latest NOTE_ENTRY_DT_TIME (items highlighted in screenshot). With the query written below it still pulls the latest date (I believe it's because of the grouping but the grouping is required to join later). What is the best method to achieve this?
SELECT
hop.ACCOUNT_ID,
MAX(hop.NOTE_ENTRY_DT_TIME) AS latest_noteid
FROM
NOTES hop
WHERE
hop.GEN_YN IS NULL
AND hop.NOTE_ENTRY_DT_TIME < (SELECT MAX(hope.NOTE_ENTRY_DT_TIME)
FROM NOTES hope
WHERE hop.GEN_YN IS NULL)
GROUP BY
hop.ACCOUNT_ID
Data sample in the table:
One of the "easier" ways to get the Nth row in a group is to use a CTE and ROW_NUMBER:
WITH CTE AS(
SELECT Account_ID,
Note_Entry_Dt_Time,
ROW_NUMBER() OVER (PARTITION BY Account_ID ORDER BY Note_Entry_Dt_Time DESC) AS RN
FROM dbo.YourTable)
SELECT Account_ID,
Note_Entry_Dt_Time
FROM CTE
WHERE RN = 2;
Of course, if an ACCOUNT_ID only has 1 row, then it will not be returned in the result set.
The OP's statement "The row will not always be 2." from the comments conflicts with their statement "I am attempting to pull the second latest NOTE_ENTRY_DT_TIME" in the question. At a best guess, this means that the OP has rows with the same date, any of which could be the "latest" date. If so, they would simply need to replace ROW_NUMBER with DENSE_RANK. Their sample data, however, doesn't suggest this is the case.
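To illustrate the difference between ROW_NUMBER and DENSE_RANK when the latest date is duplicated, here is a small sketch run against SQLite from Python (a stand-in for SQL Server). The notes rows and the second_latest helper are invented for this example.

```python
import sqlite3

# Hypothetical notes: account 1 has its latest date twice, account 2 has one row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (account_id INTEGER, note_entry_dt_time TEXT)")
conn.executemany(
    "INSERT INTO notes VALUES (?, ?)",
    [
        (1, "2020-01-03"),
        (1, "2020-01-03"),
        (1, "2020-01-02"),
        (1, "2020-01-01"),
        (2, "2020-02-01"),
    ],
)


def second_latest(ranking):
    """Run the CTE with ROW_NUMBER or DENSE_RANK and keep the rows ranked 2."""
    sql = f"""
        WITH cte AS (
            SELECT account_id,
                   note_entry_dt_time,
                   {ranking}() OVER (PARTITION BY account_id
                                     ORDER BY note_entry_dt_time DESC) AS rn
            FROM notes
        )
        SELECT account_id, note_entry_dt_time FROM cte WHERE rn = 2
    """
    return conn.execute(sql).fetchall()


by_row_number = second_latest("ROW_NUMBER")  # ties get different numbers
by_dense_rank = second_latest("DENSE_RANK")  # ties share a rank

print(by_row_number)
print(by_dense_rank)
```

With ROW_NUMBER the duplicate of the latest date is returned as the "second" row, while DENSE_RANK returns the second distinct date. Account 2 has only one row, so it appears in neither result.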
You can use window functions:
select *
from (
select
n.*,
row_number() over(partition by account_id order by note_entry_dt_time desc) rn
from notes n
) t
where rn = 2

Removing duplicate rows based on one column same values but keep one record

SQL Server Version
Remove all dupe rows (row 3 thru 18) with service_date = '2018-08-29 13:05:00.000' but keep the oldest row (row 2) and of course keep row 1 since its different service_date. Don't mind the create_timestamp or document_file since it's the same customer. Any idea?
In SQL Server, we can try deleting using a CTE:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY service_date ORDER BY create_timestamp) rn
FROM yourTable
)
DELETE
FROM cte
WHERE rn > 1;
The strategy here is to assign a row number to each group of records sharing the same service_date, with 1 being assigned to the oldest record in that group. Then, we can phrase the delete by just targeting all records which have a row number greater than 1.
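The same strategy can be tried on a tiny data set. Note the swap in this sketch: SQLite (used here through Python) cannot delete through a CTE the way SQL Server can, so the row numbers are computed in a subquery and the delete targets the matching ids. The visits table, its id column, and the sample rows are assumptions made for the illustration.

```python
import sqlite3

# Hypothetical rows: ids 2-4 share a service_date, id 2 being the oldest of them.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits ("
    "id INTEGER PRIMARY KEY, service_date TEXT, create_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        (1, "2018-08-28 09:00", "2018-08-28 10:00"),
        (2, "2018-08-29 13:05", "2018-08-29 14:00"),
        (3, "2018-08-29 13:05", "2018-08-29 14:01"),
        (4, "2018-08-29 13:05", "2018-08-29 14:02"),
    ],
)

# Number the rows within each service_date, oldest first, and delete rn > 1.
conn.execute(
    """
    DELETE FROM visits
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY service_date
                                      ORDER BY create_timestamp) AS rn
            FROM visits
        )
        WHERE rn > 1
    )
    """
)

remaining = [row[0] for row in conn.execute("SELECT id FROM visits ORDER BY id")]
print(remaining)
```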
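You don't need PARTITION BY for this data. Please use the query below for efficient performance; I have tested it and it works fine. Note that it keeps the two oldest rows overall, so it only suits data shaped like the sample.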
with result as
(
select *, row_number() over(order by create_timestamp) as Row_To_Delete from TableName
)
delete from result where result.Row_To_Delete>2
I think you will want to remove this data on a per-customer basis.
I mean, if the customers are different, you will want to keep the entries even when they fall on the same date.
If so, you will need to add the customer column to the PARTITION BY clause used to identify duplicate rows.
By copying and modifying Tim's solution, you can try the following:
;WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY customer, service_date ORDER BY create_timestamp) rn
FROM yourTable
)
DELETE
FROM cte
WHERE rn > 1;

Extract and concatenate the same field from multiple records in big query

I would like to be able to extract one field from multiple records from within a single table. For example, assuming I have a schema as follows
userId, eventTimestamp, theField
And what I want to do is concatenate all instances of the field 'theField' into a single string for a given userId, ordered by eventTimestamp. And for an extra wrinkle, let's say I only want to include the fifty oldest records.
My first attempt was to try something like:
SELECT
userId,
eventTimestamp,
LEAD(theField,0) OVER (PARTITION BY userId ORDER BY eventTimestamp) AS step0,
LEAD(theField,1) OVER (PARTITION BY userId ORDER BY eventTimestamp) AS step1,
....,
LEAD(theField,50) OVER (PARTITION BY userId ORDER BY eventTimestamp) AS step50,
And then the next step was to wrap that first step up in another SELECT statement as follows:
SELECT userId, eventTimestamp, CONCAT(STRING(step0), STRING(step1),...,STRING(step50)) as concatenatedString
FROM [whateverDataset.whateverTable],
GROUP BY
userId, eventTimestamp
This approach doesn't work though because if I have more than 50 steps (which I do), then I end up getting multiple rows for each of those outer SELECT statements, basically N-50 rows, where N = the total number of records for a particular userId. A 'solution' to this would be to have a HAVING statement in the inner SELECT statement to limit itself to only reporting the first 50 records, but overall this seems like a rather cumbersome solution. In non-BigQuery variants of SQL the GROUP_CONCAT seems to be a good way to go forward, but it either doesn't work here or I lack the creativity to get it to work. Anyone have any suggestions?
Thanks,
Brad
For BigQuery Legacy SQL:
SELECT
userid, GROUP_CONCAT(theField) AS Fields
FROM (
SELECT
userid, eventTimestamp, theField,
ROW_NUMBER() OVER(PARTITION BY userid ORDER BY eventTimestamp) AS pos
FROM YourTable
ORDER BY eventTimestamp
)
WHERE pos < 51
GROUP BY userid
Please note: the inner ORDER BY does not guarantee the order of theField in GROUP_CONCAT. But, so far, in all practical cases I have seen, the order is preserved. So, test carefully.
For BigQuery Standard SQL:
Don't forget to uncheck Use Legacy SQL checkbox under Show Options
SELECT
userid,
(SELECT STRING_AGG(fields) FROM t.fields) AS fields
FROM (
SELECT
userid,
ARRAY(SELECT theField FROM t.fields ORDER BY eventTimestamp) fields
FROM (
SELECT
userid,
ARRAY_AGG(STRUCT(theField, eventTimestamp)) fields
FROM (
SELECT
userid,
eventTimestamp,
theField,
ROW_NUMBER() OVER(PARTITION BY userid ORDER BY eventTimestamp) AS pos
FROM YourTable
)
WHERE pos < 51
GROUP BY userid
) t
) t

Return rows between a specific range, with one select statement

I'm looking for an expression like this (using SQL Server 2008)
SELECT TOP 10 columName FROM tableName
But instead of that I need the values between 10 and 20. And I wonder if there is a way of doing it using only one SELECT statement.
For example this is useless:
SELECT columName FROM
(SELECT ROW_NUMBER() OVER(ORDER BY someId) AS RowNum, * FROM tableName) AS alias
WHERE RowNum BETWEEN 10 AND 20
Because the select inside brackets is already returning all the results, and I'm looking to avoid that, due to performance.
Use SQL Server 2012 to fetch/skip! Note that OFFSET ... FETCH requires an ORDER BY:
SELECT SalesOrderID, SalesOrderDetailID, ProductID, OrderQty, UnitPrice, LineTotal
FROM AdventureWorks2012.Sales.SalesOrderDetail
ORDER BY SalesOrderID, SalesOrderDetailID
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY;
There's nothing better than what you're describing for older versions of SQL Server. You could use a CTE, but it is unlikely to make a difference.
WITH NumberedMyTable AS
(
SELECT
Id,
Value,
ROW_NUMBER() OVER (ORDER BY Id) AS RowNumber
FROM
MyTable
)
SELECT
Id,
Value
FROM
NumberedMyTable
WHERE
RowNumber BETWEEN @From AND @To
or, you can remove the top 10 rows and then get the next 10 rows, but I doubt anyone would want to do that.
There is a trick with row_number that does not involve sorting all the rows.
Try this:
SELECT columName
FROM (SELECT ROW_NUMBER() OVER(ORDER BY (select NULL as noorder)) AS RowNum, *
FROM tableName
) as alias
WHERE RowNum BETWEEN 10 AND 20
You cannot use a constant in the order by. However, you can use an expression that evaluates to a constant. SQL Server recognizes this and just returns the rows as encountered, properly enumerated.
Why do you think SQL Server would evaluate the entire inner query? Assuming your sort column is indexed, it'll just read the first 20 values. If you're really nervous you could do this:
Select
Id
From (
Select Top 20 -- note top 20
Row_Number() Over(Order By Id) As RowNum,
Id
From
dbo.Test
Order By
Id
) As alias
Where
RowNum Between 10 And 20
Order By
Id
but I'm pretty sure the query plan is the same either way.
(Really) Fixed as per Aaron's comment.
http://sqlfiddle.com/#!3/db162/6
One more option
SELECT TOP(11) columName
FROM dbo.tableName
ORDER BY
CASE WHEN ROW_NUMBER() OVER (ORDER BY someId) BETWEEN 10 AND 20
THEN ROW_NUMBER() OVER (ORDER BY someId) ELSE NULL END DESC
You could create a temp table that is ordered the way you want like:
SELECT ROW_NUMBER() OVER(ORDER BY someId) AS RowNum, * FROM tableName
into ##tempTable
...
That way you have an ordered list of rows and can just query by row number the subsequent times, instead of doing the inner query multiple times.

How to enumerate returned rows in SQL?

I was wondering if it would be possible to enumerate returned rows. Not according to any column content but just yielding a sequential integer index. E.g.
select ?, count(*) as usercount from users group by age
would return something along the lines:
1 12
2 78
3 4
4 42
it is for https://data.stackexchange.com/
try:
SELECT
ROW_NUMBER() OVER(ORDER BY age) AS RowNumber
,count(*) as usercount
from users
group by age
If it's Oracle, use rownum.
SELECT SOMETABLE.*, ROWNUM RN
FROM SOMETABLE
WHERE SOMETABLE.SOMECOLUMN = :SOMEVALUE
ORDER BY SOMETABLE.SOMEOTHERCOLUMN;
The final answer will entirely depend on what database you're using.
For MySql:
SELECT @row := @row + 1 as row FROM anytable a, (SELECT @row := 0) r
Use the ROW_NUMBER function available in SQL Server:
SELECT
ROW_NUMBER() OVER (ORDER BY age) AS RowNumber, COUNT(*) AS usercount
FROM users
GROUP BY age
How you'd do that depends on your database server. In SQL Server, you could use row_number():
select row_number() over (order by age)
, age
, count(*) as usercount
from users
group by
age
order by
age
But it's often easier and faster to use client side row numbers.
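Both approaches can be compared side by side. The sketch below numbers the grouped rows once in SQL and once on the client with Python's enumerate; it runs against SQLite rather than SQL Server, and the users rows are made up for the example.

```python
import sqlite3

# Hypothetical users table with three distinct ages.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (age INTEGER)")
conn.executemany(
    "INSERT INTO users (age) VALUES (?)",
    [(30,), (30,), (25,), (40,), (25,), (30,)],
)

# Server side: the window function numbers the rows produced by GROUP BY.
server_side = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (ORDER BY age) AS row_num, COUNT(*) AS usercount
    FROM users
    GROUP BY age
    ORDER BY age
    """
).fetchall()

# Client side: fetch only the counts and number them while reading the cursor.
cursor = conn.execute(
    "SELECT COUNT(*) AS usercount FROM users GROUP BY age ORDER BY age"
)
client_side = [(index, row[0]) for index, row in enumerate(cursor, start=1)]

print(server_side)
print(client_side)
```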
for Mysql
set @row:=0;
select @row:=@row+1 as row, a.* from table_name as a;
In contrast to the majority of other answers, and in accordance with the OP's actual question, to
enumerate returned rows (...) NOT according to any column content
but rather in the order the query returns them, one could use a dummy ordinal expression in the ORDER BY of a ROW_NUMBER function, e.g.
ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS row_num
where one could actually use anything as the argument to the SELECT statement, like SELECT 100, SELECT 'A', SELECT NULL, etc.
This way, the rows are numbered in whatever order the engine happens to return them. That often matches the order the data was added to the table, but it is not guaranteed.