I'm trying to implement pagination in a query that is built using information from a view, and I need to use the row_number() function over a column when I don't know which table it is from.
SELECT * FROM (
SELECT class.ID as ID, user.ID as USERID, row_number() over (ORDER BY
ID desc) as row_number FROM class, user
) out_q WHERE row_number > #startrow ORDER BY row_number
The problem is that I only have the result column name (ID or USERID) that came from a previous query. If I execute this query, it will raise the error 'Ambiguous column name "ID"'. Is there a way to specify that I'm referencing the column ID that is being selected and not from a different table?
Is it possible to specify an alias to the query result itself?
I have already tried the following,
SELECT TOP 30 * FROM (
SELECT *, row_number() over (ORDER BY ID desc) as row_number FROM(
SELECT class.ID as ID, user.ID as USERID FROM class, user
) in_q
) out_q WHERE row_number > #startrow ORDER BY row_number
It works, but the SGBD gets confused on which query plan it has to use, because of the small row goal present in the outer query and the big set of results returned by the inner query, when #startrow is a small number, the query executes in less than one second, when it is a big number the query takes minutes to execute.
Your problem is the id in the row_number itself. If you want a stable sort, then include both ids:
SELECT *
FROM (SELECT class.ID as ID, user.ID as USERID,
row_number() over (ORDER BY class.ID desc, user.id) as row_number
FROM class CROSS JOIN user
) out_q
WHERE row_number > #startrow
ORDER BY row_number;
I assume the cartesian product is intentional. Sometimes, this indicates an error in the query. In general, I would advise you to avoid using commas in the from clause. If you do want a cartesian product, then be explicit by using CROSS JOIN.
You could try using the option you already tried, then use the OPTIMIZE FOR hint.
OPTION ( OPTIMIZE FOR (#startrow = 100000) );
See a description of the hint in MSDN docs here: https://msdn.microsoft.com/en-us/library/ms181714.aspx.
Related
In SQL Server, I am attempting to pull the second latest NOTE_ENTRY_DT_TIME (items highlighted in screenshot). With the query written below it still pulls the latest date (I believe it's because of the grouping but the grouping is required to join later). What is the best method to achieve this?
SELECT
hop.ACCOUNT_ID,
MAX(hop.NOTE_ENTRY_DT_TIME) AS latest_noteid
FROM
NOTES hop
WHERE
hop.GEN_YN IS NULL
AND hop.NOTE_ENTRY_DT_TIME < (SELECT MAX(hope.NOTE_ENTRY_DT_TIME)
FROM NOTES hope
WHERE hop.GEN_YN IS NULL)
GROUP BY
hop.ACCOUNT_ID
Data sample in the table:
One of the "easier" ways to get the Nth row in a group is to use a CTE and ROW_NUMBER:
WITH CTE AS(
SELECT Account_ID,
Note_Entry_Dt_Time,
ROW_NUMBER() OVER (PARTITION BY AccountID ORDER BY Note_Entry_Dt_Time DESC) AS RN
FROM dbo.YourTable)
SELECT Account_ID,
Note_Entry_Dt_Time
FROM CTE
WHERE RN = 2;
Of course, if an ACCOUNT_ID only has 1 row, then it will not be returned in the result set.
The OP's statement "The row will not always be 2." from the comments conflicts with their statement "I am attempting to pull the second latest NOTE_ENTRY_DT_TIME" in the question. At a best guess, this means that the OP has rows with the same date, that could be the "latest" date. If so, then would simply need to replace ROW_NUMBER with DENSE_RANK. Their sampple data, however, doesn't suggest this is the case.
You can use window functions:
select *
from (
select
n.*,
row_number() over(partition by account_id order by note_entry_dt_time desc) rn
from notes n
) t
where rn = 2
I have a table like as shown below
What I would like to do is get the minimum of each subject. Though I am able to do this with row_number function, I would like to do this with groupby and min() approach. But it doesn't work.
row_number approach - works fine
SELECT * FROM (select subject_id,value,id,min_time,max_time,time_1,
row_number() OVER (PARTITION BY subject_id ORDER BY value) AS rank
from table A) WHERE RANK = 1
min() approach - doesn't work
select subject_id,id,min_time,max_time,time_1,min(value) from table A
GROUP BY SUBJECT_ID,id
As you can see just the two columns (subject_id and id) is enough to group the items together. They will help differentiate the group. But why am I not able to use the other columns in select clause. If I use the other columns, I may not get the expected output because time_1 has different values.
I expect my output to be like as shown below
In BigQuery you can use aggregation for this:
SELECT ARRAY_AGG(a ORDER BY value LIMIT 1)[SAFE_OFFSET(1)].*
FROM table A
GROUP BY SUBJECT_ID;
This uses ARRAY_AGG() to aggregate each record (the a in the argument list). ARRAY_AGG() allows you to order the result (by value) and to limit the size of the array. The latter is important for performance.
After you concatenate the arrays, you want the first element. The .* transforms the record referred to by a to the component columns.
I'm not sure why you don't want to use ROW_NUMBER(). If the problem is the lingering rank column, you an easily remove it:
SELECT a.* EXCEPT (rank)
FROM (SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY subject_id ORDER BY value) AS rank
FROM A
) a
WHERE RANK = 1;
Are you looking for something like below-
SELECT
A.subject_id,
A.id,
A.min_time,
A.max_time,
A.time_1,
A.value
FROM table A
INNER JOIN(
SELECT subject_id, MIN(value) Value
FROM table
GROUP BY subject_id
) B ON A.subject_id = B.subject_id
AND A.Value = B.Value
If you do not required to select Time_1 column's value, this following query will work (As I can see values in column min_time and max_time is same for the same group)-
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
--A.time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time
Finally, the best approach is if you can apply something like CAST(Time_1 AS DATE) on your time column. This will consider only the date part regardless of the time part. The query will be
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE) Time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE)
-- Make sure the syntax of CAST AS DATE
-- in BigQuery is as I written here or bit different.
Below is for BigQuery Standard SQL and is most efficient way for such cases like in your question
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY value LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY subject_id
Using ROW_NUMBER is not efficient and in many cases lead to Resources exceeded error.
Note: self join is also very ineffective way of achieving your objective
A bit late to the party, but here is a cte-based approach which made sense to me:
with mins as (
select subject_id, id, min(value) as min_value
from table
group by subject_id, id
)
select distinct t.subject_id, t.id, t.time_1, t.min_time, t.max_time, m.min_value
from table t
join mins m on m.subject_id = t.subject_id and m.id = t.id
Title says it all, why can't I use a windowed function in a where clause in SQL Server?
This query makes perfect sense:
select id, sales_person_id, product_type, product_id, sale_amount
from Sales_Log
where 1 = row_number() over(partition by sales_person_id, product_type, product_id order by sale_amount desc)
But it doesn't work. Is there a better way than a CTE/Subquery?
EDIT
For what its worth this is the query with a CTE:
with Best_Sales as (
select id, sales_person_id, product_type, product_id, sale_amount, row_number() over (partition by sales_person_id, product_type, product_id order by sales_amount desc) rank
from Sales_log
)
select id, sales_person_id, product_type, product_id, sale_amount
from Best_Sales
where rank = 1
EDIT
+1 for the answers showing with a subquery, but really I'm looking for the reasoning behind not being able to use windowing functions in where clauses.
why can't I use a windowed function in a where clause in SQL Server?
One answer, though not particularly informative, is because the spec says that you can't.
See the article by Itzik Ben Gan - Logical Query Processing: What It Is And What It Means to You and in particular the image here. Window functions are evaluated at the time of the SELECT on the result set remaining after all the WHERE/JOIN/GROUP BY/HAVING clauses have been dealt with (step 5.1).
really I'm looking for the reasoning behind not being able to use
windowing functions in where clauses.
The reason that they are not allowed in the WHERE clause is that it would create ambiguity. Stealing Itzik Ben Gan's example from High-Performance T-SQL Using Window Functions (p.25)
Suppose your table was
CREATE TABLE T1
(
col1 CHAR(1) PRIMARY KEY
)
INSERT INTO T1 VALUES('A'),('B'),('C'),('D'),('E'),('F')
And your query
SELECT col1
FROM T1
WHERE ROW_NUMBER() OVER (ORDER BY col1) <= 3
AND col1 > 'B'
What would be the right result? Would you expect that the col1 > 'B' predicate ran before or after the row numbering?
There is no need for CTE, just use the windowing function in a subquery:
select id, sales_person_id, product_type, product_id, sale_amount
from
(
select id, sales_person_id, product_type, product_id, sale_amount,
row_number() over(partition by sales_person_id, product_type, product_id order by sale_amount desc) rn
from Sales_Log
) sl
where rn = 1
Edit, moving my comment to the answer.
Windowing functions are not performed until the data is actually selected which is after the WHERE clause. So if you try to use a row_number in a WHERE clause the value is not yet assigned.
"All-at-once operation" means that all expressions in the same
logical query process phase are evaluated logically at the same time.
And great chapter Impact on Window Functions:
Suppose you have:
CREATE TABLE #Test ( Id INT) ;
INSERT INTO #Test VALUES ( 1001 ), ( 1002 ) ;
SELECT Id
FROM #Test
WHERE Id = 1002
AND ROW_NUMBER() OVER(ORDER BY Id) = 1;
All-at-Once operations tell us these two conditions evaluated logically at the same point of time. Therefore, SQL Server can
evaluate conditions in WHERE clause in arbitrary order, based on
estimated execution plan. So the main question here is which condition
evaluates first.
Case 1:
If ( Id = 1002 ) is first, then if ( ROW_NUMBER() OVER(ORDER BY Id) = 1 )
Result: 1002
Case 2:
If ( ROW_NUMBER() OVER(ORDER BY Id) = 1 ), then check if ( Id = 1002 )
Result: empty
So we have a paradox.
This example shows why we cannot use Window Functions in WHERE clause.
You can think more about this and find why Window Functions are
allowed to be used just in SELECT and ORDER BY clauses!
Addendum
Teradata supports QUALIFY clause:
Filters results of a previously computed ordered analytical function according to user‑specified search conditions.
SELECT Id
FROM #Test
WHERE Id = 1002
QUALIFY ROW_NUMBER() OVER(ORDER BY Id) = 1;
Snowflake - Qualify
QUALIFY does with window functions what HAVING does with aggregate functions and GROUP BY clauses.
In the execution order of a query, QUALIFY is therefore evaluated after window functions are computed. Typically, a SELECT statement’s clauses are evaluated in the order shown below:
From
Where
Group by
Having
Window
QUALIFY
Distinct
Order by
Limit
Databricks - QUALIFY clasue
Filters the results of window functions. To use QUALIFY, at least one window function is required to be present in the SELECT list or the QUALIFY clause.
You don't necessarily need to use a CTE, you can query the result set after using row_number()
select row, id, sales_person_id, product_type, product_id, sale_amount
from (
select
row_number() over(partition by sales_person_id,
product_type, product_id order by sale_amount desc) AS row,
id, sales_person_id, product_type, product_id, sale_amount
from Sales_Log
) a
where row = 1
It's an old thread, but I'll try to answer specifically the question expressed in the topic.
Why no windowed functions in where clauses?
SELECT statement has following main clauses specified in keyed-in order:
SELECT DISTINCT TOP list
FROM JOIN ON / APPLY / PIVOT / UNPIVOT
WHERE
GROUP BY WITH CUBE / WITH ROLLUP
HAVING
ORDER BY
OFFSET-FETCH
Logical Query Processing Order, or Binding Order, is conceptual interpretation order, it defines the correctness of the query. This order determines when the objects defined in one step are made available to the clauses in subsequent steps.
----- Relational result
1. FROM
1.1. ON JOIN / APPLY / PIVOT / UNPIVOT
2. WHERE
3. GROUP BY
3.1. WITH CUBE / WITH ROLLUP
4. HAVING
---- After the HAVING step the Underlying Query Result is ready
5. SELECT
5.1. SELECT list
5.2. DISTINCT
----- Relational result
----- Non-relational result (a cursor)
6. ORDER BY
7. TOP / OFFSET-FETCH
----- Non-relational result (a cursor)
For example, if the query processor can bind to (access) the tables or views defined in the FROM clause, these objects and their columns are made available to all subsequent steps.
Conversely, all clauses preceding the SELECT clause cannot reference any column aliases or derived columns defined in SELECT clause. However, those columns can be referenced by subsequent clauses such as the ORDER BY clause.
OVER clause determines the partitioning and ordering of a row set before the associated window function is applied. That is, the OVER clause defines a window or user-specified set of rows within an Underlying Query Result set and window function computes result against that window.
Msg 4108, Level 15, State 1, …
Windowed functions can only appear in the SELECT or ORDER BY clauses.
The reason behind is because the way how Logical Query Processing works in T-SQL. Since the underlying query result is established only when logical query processing reaches the SELECT step 5.1. (that is, after processing the FROM, WHERE, GROUP BY and HAVING steps), window functions are allowed only in the SELECT and ORDER BY clauses of the query.
Note to mention, window functions are still part of relational layer even Relational Model doesn't deal with ordered data. The result after the SELECT step 5.1. with any window function is still relational.
Also, speaking strictly, the reason why window function are not allowed in the WHERE clause is not because it would create ambiguity, but because the order how Logical Query Processing processes SELECT statement in T-SQL.
Links: here, here and here
Finally, there's the old-fashioned, pre-SQL Server 2005 way, with a correlated subquery:
select *
from Sales_Log sl
where sl.id = (
Select Top 1 id
from Sales_Log sl2
where sales_person_id = sl.sales_person_id
and product_type = sl.product_type
and product_id = sl.product_id
order by sale_amount desc
)
I give you this for completeness, merely.
Basically first "WHERE" clause condition is read by sql and the same column/value id looked into the table but in table row_num=1 is not there still. Hence it will not work.
Thats the reason we will use parentheses first and after that we will write the WHERE clause.
Yes unfortunately when you do a windowed function SQL gets mad at you even if your where predicate is legitimate. You make a cte or nested select having the value in your select statement, then reference your CTE or nested select with that value later. Simple example that should be self explanatory. If you really HATE cte's for some performance issue on doing a large data set you can always drop to temp table or table variable.
declare #Person table ( PersonID int identity, PersonName varchar(8));
insert into #Person values ('Brett'),('John');
declare #Orders table ( OrderID int identity, PersonID int, OrderName varchar(8));
insert into #Orders values (1, 'Hat'),(1,'Shirt'),(1, 'Shoes'),(2,'Shirt'),(2, 'Shoes');
--Select
-- p.PersonName
--, o.OrderName
--, row_number() over(partition by o.PersonID order by o.OrderID)
--from #Person p
-- join #Orders o on p.PersonID = o.PersonID
--where row_number() over(partition by o.PersonID order by o.orderID) = 2
-- yields:
--Msg 4108, Level 15, State 1, Line 15
--Windowed functions can only appear in the SELECT or ORDER BY clauses.
;
with a as
(
Select
p.PersonName
, o.OrderName
, row_number() over(partition by o.PersonID order by o.OrderID) as rnk
from #Person p
join #Orders o on p.PersonID = o.PersonID
)
select *
from a
where rnk >= 2 -- only orders after the first one.
I am creating an SP which gives some result by applying distinct on it, now I want to implement sever side paging, so I tried using Row_number on distinct result like:
WITH CTE AS
(
SELECT ROW_NUMBER() OVER(ORDER BY tblA.TeamName DESC)
as Row,tblA.TeamId,tblA.TeamName,tblA.CompId,tblA.CompName,tblA.Title,tblA.Thumbnail,tblA.Rank,tblA.CountryId,tblA.CountryName
FROM
(
--The table query starts with SELECT
)tblA
)
SELECT CTE.* FROM CTE
WHERE CTE.Row BETWEEN #StartRowIndex AND #StartRowIndex+#NumRows-1
ORDER BY CTE.CountryName
but rows are first assigned RowNumber then distinct get applied that is why I am getting duplicate values, how to get distinct rows first then get row numbers for the same.
Any solution on this? Am I missing something?
need answer ASAP.
thanks in advance!
Don't you need to add "partition by" to your ROW_NUMBER statement?
ROW_NUMBER() OVER(Partition by ___, ___, ORDER BY tblA.TeamName DESC)
In the blank spaces, place the column names you would like to create a new row number for. Duplicates will receive a number that is NOT 1 so you might not need the distinct.
To gather the unique values you could write a subquery where the stored procedure only grabs the rows with a 1 in them.
select * from
(
your code
) where row = 1
Hope that helps.
I'm not sure why you're doing this:
WHERE CTE.Row BETWEEN #StartRowIndex AND #StartRowIndex+#NumRows-1
I want to use the ROW_NUMBER() to get...
To get the max(ROW_NUMBER()) --> Or i guess this would also be the count of all rows
I tried doing:
SELECT max(ROW_NUMBER() OVER(ORDER BY UserId)) FROM Users
but it didn't seem to work...
To get ROW_NUMBER() using a given piece of information, ie. if I have a name and I want to know what row the name came from.
I assume it would be something similar to what I tried for #1
SELECT ROW_NUMBER() OVER(ORDER BY UserId) From Users WHERE UserName='Joe'
but this didn't work either...
Any Ideas?
For the first question, why not just use?
SELECT COUNT(*) FROM myTable
to get the count.
And for the second question, the primary key of the row is what should be used to identify a particular row. Don't try and use the row number for that.
If you returned Row_Number() in your main query,
SELECT ROW_NUMBER() OVER (Order by Id) AS RowNumber, Field1, Field2, Field3
FROM User
Then when you want to go 5 rows back then you can take the current row number and use the following query to determine the row with currentrow -5
SELECT us.Id
FROM (SELECT ROW_NUMBER() OVER (ORDER BY id) AS Row, Id
FROM User ) us
WHERE Row = CurrentRow - 5
Though I agree with others that you could use count() to get the total number of rows, here is how you can use the row_count():
To get the total no of rows:
with temp as (
select row_number() over (order by id) as rownum
from table_name
)
select max(rownum) from temp
To get the row numbers where name is Matt:
with temp as (
select name, row_number() over (order by id) as rownum
from table_name
)
select rownum from temp where name like 'Matt'
You can further use min(rownum) or max(rownum) to get the first or last row for Matt respectively.
These were very simple implementations of row_number(). You can use it for more complex grouping. Check out my response on Advanced grouping without using a sub query
If you need to return the table's total row count, you can use an alternative way to the SELECT COUNT(*) statement.
Because SELECT COUNT(*) makes a full table scan to return the row count, it can take very long time for a large table. You can use the sysindexes system table instead in this case. There is a ROWS column that contains the total row count for each table in your database. You can use the following select statement:
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND indid < 2
This will drastically reduce the time your query takes.
You can use this for get first record where has clause
SELECT TOP(1) * , ROW_NUMBER() OVER(ORDER BY UserId) AS rownum
FROM Users
WHERE UserName = 'Joe'
ORDER BY rownum ASC
ROW_NUMBER() returns a unique number for each row starting with 1. You can easily use this by simply writing:
ROW_NUMBER() OVER (ORDER BY 'Column_Name' DESC) as ROW_NUMBER
May not be related to the question here. But I found it could be useful when using ROW_NUMBER -
SELECT *,
ROW_NUMBER() OVER (ORDER BY (SELECT 100)) AS Any_ID
FROM #Any_Table
select
Ml.Hid,
ml.blockid,
row_number() over (partition by ml.blockid order by Ml.Hid desc) as rownumber,
H.HNAME
from MIT_LeadBechmarkHamletwise ML
join [MT.HAMLE] h on ML.Hid=h.HID
SELECT num, UserName FROM
(SELECT UserName, ROW_NUMBER() OVER(ORDER BY UserId) AS num
From Users) AS numbered
WHERE UserName='Joe'
You can use Row_Number for limit query result.
Example:
SELECT * FROM (
select row_number() OVER (order by createtime desc) AS ROWINDEX,*
from TABLENAME ) TB
WHERE TB.ROWINDEX between 0 and 10
--
With above query, I will get PAGE 1 of results from TABLENAME.
If you absolutely want to use ROW_NUMBER for this (instead of count(*)) you can always use:
SELECT TOP 1 ROW_NUMBER() OVER (ORDER BY Id)
FROM USERS
ORDER BY ROW_NUMBER() OVER (ORDER BY Id) DESC
Need to create virtual table by using WITH table AS, which is mention in given Query.
By using this virtual table, you can perform CRUD operation w.r.t row_number.
QUERY:
WITH table AS
-
(SELECT row_number() OVER(ORDER BY UserId) rn, * FROM Users)
-
SELECT * FROM table WHERE UserName='Joe'
-
You can use INSERT, UPDATE or DELETE in last sentence by in spite of SELECT.
SQL Row_Number() function is to sort and assign an order number to data rows in related record set. So it is used to number rows, for example to identify the top 10 rows which have the highest order amount or identify the order of each customer which is the highest amount, etc.
If you want to sort the dataset and number each row by seperating them into categories we use Row_Number() with Partition By clause. For example, sorting orders of each customer within itself where the dataset contains all orders, etc.
SELECT
SalesOrderNumber,
CustomerId,
SubTotal,
ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY SubTotal DESC) rn
FROM Sales.SalesOrderHeader
But as I understand you want to calculate the number of rows of grouped by a column. To visualize the requirement, if you want to see the count of all orders of the related customer as a seperate column besides order info, you can use COUNT() aggregation function with Partition By clause
For example,
SELECT
SalesOrderNumber,
CustomerId,
COUNT(*) OVER (PARTITION BY CustomerId) CustomerOrderCount
FROM Sales.SalesOrderHeader
This query:
SELECT ROW_NUMBER() OVER(ORDER BY UserId) From Users WHERE UserName='Joe'
will return all rows where the UserName is 'Joe' UNLESS you have no UserName='Joe'
They will be listed in order of UserID and the row_number field will start with 1 and increment however many rows contain UserName='Joe'
If it does not work for you then your WHERE command has an issue OR there is no UserID in the table. Check spelling for both fields UserID and UserName.