Remove ORDER BY clause from PARTITION BY clause?

Is there a way I can reduce the impact of the 'ORDER BY lro_pid' clause in the OVER portion of the inner query below?
SELECT *
FROM (SELECT a.*,
Row_Number() over (PARTITION BY search_point_type
ORDER BY lro_pid) spt_rank
FROM lro_search_point a
ORDER BY spt_rank)
WHERE spt_rank = 1;
I don't care to order this result within the partition since I want to order it by a different variable entirely. lro_pid is an indexed column, but this still seems like a waste of resources as it currently stands. (Perhaps there is a way to limit the ordering to a range of a single row?? Hopefully no time/energy would be spent on sorting within the partition at all)

A couple of things to try:
Can you e.g. ORDER BY 'constant' in the OVER clause?
If ordering by a constant is not permitted, how about ORDER BY (lro_pid * 0)?
I'm not an Oracle expert (MSSQL is more my thing) - hence questions to answer your question!

Using a constant in the analytic ORDER BY as @Will A suggested appears to be the fastest method.
The optimizer still performs a sort, but it's faster than sorting a column.
Also, you probably want to remove the second ORDER BY, or at least move it to the outer query.
Below is my test case:
--Create table, index, and dummy data.
create table lro_search_point(search_point_type number, lro_pid number, column1 number
,column2 number, column3 number);
create index lro_search_point_idx on lro_search_point(lro_pid);
insert /*+ append */ into lro_search_point
select mod(level, 10), level, level, level, level from dual connect by level <= 100000;
commit;
--Original version. Averages 0.53 seconds.
SELECT * FROM
(
SELECT a.*, Row_Number() over (PARTITION BY search_point_type ORDER BY lro_pid) spt_rank
FROM lro_search_point a
ORDER BY spt_rank
)
WHERE spt_rank=1;
--Sort by constant. Averages 0.33 seconds.
--This query and the one above have the same explain plan, basically it's
--SELECT/VIEW/SORT ORDER BY/WINDOW SORT PUSHED RANK/TABLE ACCESS FULL.
SELECT * FROM
(
SELECT a.*, Row_Number() over (PARTITION BY search_point_type ORDER BY -1) spt_rank
FROM lro_search_point a
ORDER BY spt_rank
)
WHERE spt_rank=1;
--Remove the ORDER BY (or at least move it to the outer query). Averages 0.27 seconds.
SELECT * FROM
(
SELECT a.*, Row_Number() over (PARTITION BY search_point_type ORDER BY -1) spt_rank
FROM lro_search_point a
)
WHERE spt_rank=1;
--Replace analytic with aggregate functions, averages 0.28 seconds.
--This idea is the whole reason I did this, but turns out it's no faster. *sigh*
--Plan is SELECT/SORT GROUP BY/TABLE ACCESS FULL.
--Note I'm using KEEP instead of just regular MIN.
--I assume that you want the values from the same row.
SELECT a.search_point_type
,min(lro_pid) keep (dense_rank first order by -1)
,min(column1) keep (dense_rank first order by -1)
,min(column2) keep (dense_rank first order by -1)
,min(column3) keep (dense_rank first order by -1)
FROM lro_search_point a
group by a.search_point_type;

To effectively omit the ORDER BY clause, you could use ORDER BY rownum.

Related

sql server lag function where condition

In the database, I have values that normally increase relative to the previous value (+1000, +2000, +3000, ...). Sometimes they decrease instead of increasing, so I wrote a listing query to find those cases.
select NUMBER-lag(NUMBER) over (ORDER BY DATE_TIME) AS 'DIFF'
from exampleTable with(nolock)
WHERE CONDITION1='abcdef' AND DATE_TIME >='20220801'
This works: I export the result to Excel, filter it, and find the rows that are less than 0. But can I add that condition directly to the WHERE part of the SQL?
I tried HAVING because DIFF is not a regular column, and that didn't work either.
AND (NUMBER-lag(NUMBER) over (ORDER BY DATE_TIME))<0
ORDER BY DATE_TIME ASC
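A window function cannot be filtered in the same query level, so compute it in a CTE and filter outside. Basically it is like this: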
;WITH CTE AS (
select NUMBER-lag(NUMBER) over (ORDER BY DATE_TIME) AS'DIFF' from exampleTable with(nolock)
WHERE CONDITION1='abcdef' AND DATE_TIME >='20220801'
)
SELECT * FROM CTE WHERE DIFF <0
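If you want to try the CTE approach without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a SQLite build of 3.25 or newer (needed for window functions); the table name example_table, the sample rows, and the extra date_time output column are made up for illustration, and SQLite has no with(nolock) hint.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example_table (date_time TEXT, condition1 TEXT, number INTEGER)"
)
conn.executemany(
    "INSERT INTO example_table VALUES (?, ?, ?)",
    [
        ("2022-08-01", "abcdef", 1000),
        ("2022-08-02", "abcdef", 2000),
        ("2022-08-03", "abcdef", 1500),  # the value decreases here
        ("2022-08-04", "abcdef", 3000),
        ("2022-08-05", "other", 10),     # excluded by the condition1 filter
    ],
)

# LAG is computed inside the CTE; the outer query filters on its result.
query = """
WITH cte AS (
    SELECT date_time,
           number - LAG(number) OVER (ORDER BY date_time) AS diff
    FROM example_table
    WHERE condition1 = 'abcdef' AND date_time >= '2022-08-01'
)
SELECT date_time, diff FROM cte WHERE diff < 0
"""
negative_diffs = conn.execute(query).fetchall()
print(negative_diffs)
```

The first row has no previous value, so its diff is NULL and the diff < 0 filter drops it automatically.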

MSSQL: Why won't ROW_NUMBER give me expected results?

I have a table with a datetime field ("time") and an int field ("index")
Please see the query and the picture below. I want ROW_NUMBER to count from 1 when the index changes, also if the index value exists in previous rows. The red text indicates the output that I want to get from the query. How can I modify the query to give me the expected results?
The query:
select rv.[time], rv.[index], ROW_NUMBER() OVER(PARTITION BY rv.[index] ORDER BY rv.[time], rv.[index] ASC) AS Row#
from
tbl rv
This is a gaps-and-islands problem. You need to identify groups of adjacent rows. In this case, I think the simplest method is the difference of row numbers:
select rv.*,
row_number() over (partition by index, (seqnum - seqnum_2) order by time) as row_num
from (select t.*,
row_number() over (order by time) as seqnum,
row_number() over (partition by index order by time) as seqnum_2
from tbl t
) rv;
Why this works is a little tricky to explain. If you look at the results of the subquery, you will see how the difference between the two row number values identifies adjacent values that are the same.
Also, you should not use names like time and index for columns, because these are keywords in SQL. I have not escaped the names in the above query. I encourage you to give your columns and tables names that do not need to be escaped.
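To see why the difference of row numbers identifies the islands, here is a small runnable sketch with Python's sqlite3 module (assuming SQLite 3.25 or newer for window functions). The column names ts and idx and the six sample rows are hypothetical stand-ins for time and index, chosen so that index value 1 appears in two separate runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ts INTEGER, idx INTEGER)")
# idx goes 1, 1, 2, 2, 1, 1: the value 1 forms two separate islands.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 2), (5, 1), (6, 1)],
)

# seqnum - seqnum_2 is constant within each run of adjacent equal idx values,
# so (idx, seqnum - seqnum_2) identifies one island.
query = """
SELECT ts, idx,
       ROW_NUMBER() OVER (PARTITION BY idx, seqnum - seqnum_2 ORDER BY ts) AS row_num
FROM (
    SELECT ts, idx,
           ROW_NUMBER() OVER (ORDER BY ts) AS seqnum,
           ROW_NUMBER() OVER (PARTITION BY idx ORDER BY ts) AS seqnum_2
    FROM tbl
) AS rv
ORDER BY ts
"""
islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
```

With this data the difference is 0 for the first run of idx 1 and 2 for the second run, so the two runs land in different partitions and the numbering restarts at 1.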

Select all but last row in Oracle SQL

I want to pull all rows except the last one in Oracle SQL
My database is like this
Prikey - Auto_increment
common - varchar
miles - int
So I want to sum all rows except the last row ordered by primary key grouped by common. That means for each distinct common, the miles will be summed (except for the last one)
Note: the question was changed after this answer was posted. The first two queries work for the original question. The last query (in the addendum) works for the updated question.
This should do the trick, though it will be a bit slow for larger tables:
SELECT prikey, authnum FROM myTable
WHERE prikey <> (SELECT MAX(prikey) FROM myTable)
ORDER BY prikey
This query is longer, but for a large table it should be faster. I'll leave it to you to decide:
SELECT * FROM (
SELECT
prikey,
authnum,
ROW_NUMBER() OVER (ORDER BY prikey DESC) AS RowRank
FROM myTable)
WHERE RowRank <> 1
ORDER BY prikey
Addendum There was an update to the question; here's the updated answer.
SELECT
common,
SUM(miles)
FROM (
SELECT
common,
miles,
ROW_NUMBER() OVER (PARTITION BY common ORDER BY prikey DESC) AS RowRank
FROM myTable
)
WHERE RowRank <> 1
GROUP BY common
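A quick way to sanity-check the addendum query is to run it against a few rows. The sketch below uses Python's sqlite3 module (assuming SQLite 3.25 or newer); the sample rows are invented, the inline view gets an alias, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (prikey INTEGER PRIMARY KEY, common TEXT, miles INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        (1, "a", 10),
        (2, "a", 20),
        (3, "a", 30),  # last row for 'a', excluded from the sum
        (4, "b", 5),
        (5, "b", 7),   # last row for 'b', excluded from the sum
        (6, "c", 9),   # only row for 'c', so 'c' drops out entirely
    ],
)

# RowRank = 1 marks the row with the highest prikey in each group.
query = """
SELECT common, SUM(miles)
FROM (
    SELECT common, miles,
           ROW_NUMBER() OVER (PARTITION BY common ORDER BY prikey DESC) AS RowRank
    FROM myTable
) AS ranked
WHERE RowRank <> 1
GROUP BY common
ORDER BY common
"""
totals = conn.execute(query).fetchall()
print(totals)
```

Note that a group with a single row produces no output row at all, which may or may not be what you want.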
Looks like I am a little too late, but here is my contribution. It is similar to Ed Gibbs' first solution, but instead of calculating the max id for each row in the table and then comparing, I get it once using an inline view.
SELECT d1.prikey,
d1.authnum
FROM myTable d1,
(SELECT MAX(prikey) prikey FROM myTable) d2
WHERE d1.prikey != d2.prikey
At least I think this is more efficient if you want to go without the use of Analytics.
Here is a query (T-SQL syntax) to retrieve all the records in the table except the first row and the last row:
select * from table_name
where primary_id_column not in
(
select top 1 primary_id_column from table_name order by primary_id_column asc
)
and
primary_id_column not in
(
select top 1 primary_id_column from table_name order by primary_id_column desc
)

Why no windowed functions in where clauses?

Title says it all, why can't I use a windowed function in a where clause in SQL Server?
This query makes perfect sense:
select id, sales_person_id, product_type, product_id, sale_amount
from Sales_Log
where 1 = row_number() over(partition by sales_person_id, product_type, product_id order by sale_amount desc)
But it doesn't work. Is there a better way than a CTE/Subquery?
EDIT
For what it's worth, this is the query with a CTE:
with Best_Sales as (
select id, sales_person_id, product_type, product_id, sale_amount, row_number() over (partition by sales_person_id, product_type, product_id order by sale_amount desc) rank
from Sales_log
)
select id, sales_person_id, product_type, product_id, sale_amount
from Best_Sales
where rank = 1
EDIT
+1 for the answers showing with a subquery, but really I'm looking for the reasoning behind not being able to use windowing functions in where clauses.
why can't I use a windowed function in a where clause in SQL Server?
One answer, though not particularly informative, is because the spec says that you can't.
See the article by Itzik Ben Gan - Logical Query Processing: What It Is And What It Means to You and in particular the image here. Window functions are evaluated at the time of the SELECT on the result set remaining after all the WHERE/JOIN/GROUP BY/HAVING clauses have been dealt with (step 5.1).
really I'm looking for the reasoning behind not being able to use
windowing functions in where clauses.
The reason that they are not allowed in the WHERE clause is that it would create ambiguity. Stealing Itzik Ben Gan's example from High-Performance T-SQL Using Window Functions (p.25)
Suppose your table was
CREATE TABLE T1
(
col1 CHAR(1) PRIMARY KEY
)
INSERT INTO T1 VALUES('A'),('B'),('C'),('D'),('E'),('F')
And your query
SELECT col1
FROM T1
WHERE ROW_NUMBER() OVER (ORDER BY col1) <= 3
AND col1 > 'B'
What would be the right result? Would you expect that the col1 > 'B' predicate ran before or after the row numbering?
There is no need for CTE, just use the windowing function in a subquery:
select id, sales_person_id, product_type, product_id, sale_amount
from
(
select id, sales_person_id, product_type, product_id, sale_amount,
row_number() over(partition by sales_person_id, product_type, product_id order by sale_amount desc) rn
from Sales_Log
) sl
where rn = 1
Edit, moving my comment to the answer.
Windowing functions are not performed until the data is actually selected which is after the WHERE clause. So if you try to use a row_number in a WHERE clause the value is not yet assigned.
"All-at-once operation" means that all expressions in the same
logical query process phase are evaluated logically at the same time.
And great chapter Impact on Window Functions:
Suppose you have:
CREATE TABLE #Test ( Id INT) ;
INSERT INTO #Test VALUES ( 1001 ), ( 1002 ) ;
SELECT Id
FROM #Test
WHERE Id = 1002
AND ROW_NUMBER() OVER(ORDER BY Id) = 1;
All-at-Once operations tell us these two conditions are evaluated logically at the same point of time. Therefore, SQL Server can
evaluate conditions in WHERE clause in arbitrary order, based on
estimated execution plan. So the main question here is which condition
evaluates first.
Case 1:
If ( Id = 1002 ) is first, then if ( ROW_NUMBER() OVER(ORDER BY Id) = 1 )
Result: 1002
Case 2:
If ( ROW_NUMBER() OVER(ORDER BY Id) = 1 ), then check if ( Id = 1002 )
Result: empty
So we have a paradox.
This example shows why we cannot use Window Functions in WHERE clause.
You can think more about this and find why Window Functions are
allowed to be used just in SELECT and ORDER BY clauses!
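The two cases can be made concrete by writing each evaluation order explicitly as a subquery. This is only a sketch using Python's sqlite3 module (assuming SQLite 3.25 or newer); the table name test and the variable names are hypothetical, and the point is simply that the two orders give different answers for the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER)")
conn.executemany("INSERT INTO test VALUES (?)", [(1001,), (1002,)])

# Case 1: apply Id = 1002 first, then number the surviving rows.
filter_then_number = conn.execute("""
    SELECT id
    FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
          FROM test
          WHERE id = 1002) AS t
    WHERE rn = 1
""").fetchall()

# Case 2: number all rows first, then apply both conditions.
number_then_filter = conn.execute("""
    SELECT id
    FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
          FROM test) AS t
    WHERE rn = 1 AND id = 1002
""").fetchall()

print(filter_then_number)  # row 1002 is numbered 1 after filtering
print(number_then_filter)  # row number 1 belongs to 1001, so nothing matches
```

Because the subquery form forces you to pick one of the two orders, the ambiguity disappears.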
Addendum
Teradata supports QUALIFY clause:
Filters results of a previously computed ordered analytical function according to user‑specified search conditions.
SELECT Id
FROM #Test
WHERE Id = 1002
QUALIFY ROW_NUMBER() OVER(ORDER BY Id) = 1;
Snowflake - Qualify
QUALIFY does with window functions what HAVING does with aggregate functions and GROUP BY clauses.
In the execution order of a query, QUALIFY is therefore evaluated after window functions are computed. Typically, a SELECT statement’s clauses are evaluated in the order shown below:
From
Where
Group by
Having
Window
QUALIFY
Distinct
Order by
Limit
Databricks - QUALIFY clause
Filters the results of window functions. To use QUALIFY, at least one window function is required to be present in the SELECT list or the QUALIFY clause.
You don't necessarily need to use a CTE, you can query the result set after using row_number()
select row, id, sales_person_id, product_type, product_id, sale_amount
from (
select
row_number() over(partition by sales_person_id,
product_type, product_id order by sale_amount desc) AS row,
id, sales_person_id, product_type, product_id, sale_amount
from Sales_Log
) a
where row = 1
It's an old thread, but I'll try to answer specifically the question expressed in the topic.
Why no windowed functions in where clauses?
The SELECT statement has the following main clauses, listed in keyed-in order:
SELECT DISTINCT TOP list
FROM JOIN ON / APPLY / PIVOT / UNPIVOT
WHERE
GROUP BY WITH CUBE / WITH ROLLUP
HAVING
ORDER BY
OFFSET-FETCH
Logical Query Processing Order, or Binding Order, is the conceptual interpretation order; it defines the correctness of the query. This order determines when the objects defined in one step are made available to the clauses in subsequent steps.
----- Relational result
1. FROM
1.1. ON JOIN / APPLY / PIVOT / UNPIVOT
2. WHERE
3. GROUP BY
3.1. WITH CUBE / WITH ROLLUP
4. HAVING
---- After the HAVING step the Underlying Query Result is ready
5. SELECT
5.1. SELECT list
5.2. DISTINCT
----- Relational result
----- Non-relational result (a cursor)
6. ORDER BY
7. TOP / OFFSET-FETCH
----- Non-relational result (a cursor)
For example, if the query processor can bind to (access) the tables or views defined in the FROM clause, these objects and their columns are made available to all subsequent steps.
Conversely, all clauses preceding the SELECT clause cannot reference any column aliases or derived columns defined in SELECT clause. However, those columns can be referenced by subsequent clauses such as the ORDER BY clause.
The OVER clause determines the partitioning and ordering of a row set before the associated window function is applied. That is, the OVER clause defines a window or user-specified set of rows within an Underlying Query Result set, and the window function computes its result against that window.
Msg 4108, Level 15, State 1, …
Windowed functions can only appear in the SELECT or ORDER BY clauses.
The reason lies in how Logical Query Processing works in T-SQL. Since the underlying query result is established only when logical query processing reaches the SELECT step 5.1 (that is, after processing the FROM, WHERE, GROUP BY and HAVING steps), window functions are allowed only in the SELECT and ORDER BY clauses of the query.
Note that window functions are still part of the relational layer, even though the Relational Model doesn't deal with ordered data. The result after the SELECT step 5.1 with any window function is still relational.
Also, strictly speaking, the reason why window functions are not allowed in the WHERE clause is not that they would create ambiguity, but the order in which Logical Query Processing processes the SELECT statement in T-SQL.
Finally, there's the old-fashioned, pre-SQL Server 2005 way, with a correlated subquery:
select *
from Sales_Log sl
where sl.id = (
Select Top 1 id
from Sales_Log sl2
where sales_person_id = sl.sales_person_id
and product_type = sl.product_type
and product_id = sl.product_id
order by sale_amount desc
)
I give you this for completeness, merely.
Basically, SQL evaluates the WHERE clause condition first and looks for the referenced column/value in the table, but at that point row_num = 1 does not exist yet. Hence it will not work.
That's the reason we first wrap the query in parentheses (a subquery) and write the WHERE clause after it.
Yes, unfortunately, when you use a windowed function in a WHERE clause, SQL Server raises an error even if your predicate looks legitimate. Instead, make a CTE or nested select that has the value in its select list, then reference that value from the CTE or nested select later. Here is a simple example that should be self-explanatory. If you really hate CTEs because of a performance issue on a large data set, you can always fall back to a temp table or table variable.
declare @Person table ( PersonID int identity, PersonName varchar(8));
insert into @Person values ('Brett'),('John');
declare @Orders table ( OrderID int identity, PersonID int, OrderName varchar(8));
insert into @Orders values (1, 'Hat'),(1,'Shirt'),(1, 'Shoes'),(2,'Shirt'),(2, 'Shoes');
--Select
-- p.PersonName
--, o.OrderName
--, row_number() over(partition by o.PersonID order by o.OrderID)
--from @Person p
-- join @Orders o on p.PersonID = o.PersonID
--where row_number() over(partition by o.PersonID order by o.orderID) = 2
-- yields:
--Msg 4108, Level 15, State 1, Line 15
--Windowed functions can only appear in the SELECT or ORDER BY clauses.
;
with a as
(
Select
p.PersonName
, o.OrderName
, row_number() over(partition by o.PersonID order by o.OrderID) as rnk
from @Person p
join @Orders o on p.PersonID = o.PersonID
)
select *
from a
where rnk >= 2 -- only orders after the first one.

Return rows between a specific range, with one select statement

I'm looking for an expression like this (using SQL Server 2008):
SELECT TOP 10 columName FROM tableName
But instead of that I need the values between 10 and 20. And I wonder if there is a way of doing it using only one SELECT statement.
For example this is useless:
SELECT columName FROM
(SELECT ROW_NUMBER() OVER(ORDER BY someId) AS RowNum, * FROM tableName) AS alias
WHERE RowNum BETWEEN 10 AND 20
Because the select inside brackets is already returning all the results, and I'm looking to avoid that, due to performance.
Use SQL Server 2012 to fetch/skip!
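SELECT SalesOrderID, SalesOrderDetailID, ProductID, OrderQty, UnitPrice, LineTotal
FROM AdventureWorks2012.Sales.SalesOrderDetail
ORDER BY SalesOrderID, SalesOrderDetailID -- OFFSET/FETCH requires an ORDER BY; these sort columns are an assumption
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY;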
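For comparison, here is a rough sketch showing that skip/fetch paging and the ROW_NUMBER filter return the same page. It runs on Python's sqlite3 module (assuming SQLite 3.25 or newer), so it uses SQLite's LIMIT ... OFFSET in place of SQL Server's OFFSET ... FETCH; the table t and its 30 rows are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, i * 10) for i in range(1, 31)])

# Paging with ROW_NUMBER in a derived table: rows 11 through 20.
page_row_number = conn.execute("""
    SELECT id
    FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM t) AS numbered
    WHERE rn BETWEEN 11 AND 20
    ORDER BY id
""").fetchall()

# Paging with skip/fetch: skip 10 rows, return the next 10.
page_offset = conn.execute(
    "SELECT id FROM t ORDER BY id LIMIT 10 OFFSET 10"
).fetchall()

print(page_row_number == page_offset)
```

Note that skipping 10 rows and fetching 10 corresponds to row numbers 11 to 20, whereas BETWEEN 10 AND 20 would return 11 rows.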
There's nothing better than you're describing for older versions of sql server. Maybe use CTE, but unlikely to make a difference.
WITH NumberedMyTable AS
(
SELECT
Id,
Value,
ROW_NUMBER() OVER (ORDER BY Id) AS RowNumber
FROM
MyTable
)
SELECT
Id,
Value
FROM
NumberedMyTable
WHERE
RowNumber BETWEEN @From AND @To
Or you can remove the top 10 rows and then get the next 10 rows, but I doubt anyone would want to do that.
There is a trick with row_number that does not involve sorting all the rows.
Try this:
SELECT columName
FROM (SELECT ROW_NUMBER() OVER(ORDER BY (select NULL as noorder)) AS RowNum, *
FROM tableName
) as alias
WHERE RowNum BETWEEN 10 AND 20
You cannot use a constant in the order by. However, you can use an expression that evaluates to a constant. SQL Server recognizes this and just returns the rows as encountered, properly enumerated.
Why do you think SQL Server would evaluate the entire inner query? Assuming your sort column is indexed, it'll just read the first 20 values. If you're really nervous you could do this:
Select
Id
From (
Select Top 20 -- note top 20
Row_Number() Over(Order By Id) As RowNum,
Id
From
dbo.Test
Order By
Id
) As alias
Where
RowNum Between 10 And 20
Order By
Id
but I'm pretty sure the query plan is the same either way.
(Really) Fixed as per Aaron's comment.
http://sqlfiddle.com/#!3/db162/6
One more option
SELECT TOP(11) columName
FROM dbo.tableName
ORDER BY
CASE WHEN ROW_NUMBER() OVER (ORDER BY someId) BETWEEN 10 AND 20
THEN ROW_NUMBER() OVER (ORDER BY someId) ELSE NULL END DESC
You could create a temp table that is numbered the way you want, like:
SELECT ROW_NUMBER() OVER(ORDER BY someId) AS RowNum, *
INTO ##tempTable
FROM tableName
...
That way you have a numbered list of rows and can just query by row number on subsequent calls instead of running the inner query multiple times.