Migrating Oracle-specific SQL to PostgreSQL

Oracle:
select RANK () OVER (PARTITION BY EQUIP_UNIT_INIT_CODE ORDER BY EQUIP_UNIT_INIT_CODE, ROWNUM) from CAR_SEARCH_GTT;
Postgres: ?
Issue: PostgreSQL has no ROWNUM. If we use row_number() over () in place of ROWNUM, a PSQLException is thrown:
ERROR: window functions are not allowed in window definitions
Question: How to convert the query above to PostgreSQL?

Using a non-deterministic ROWNUM makes sense if you do not want RANK() numbers repeated in the case of ties.
That said, as @a_horse_with_no_name said, it makes no sense whatsoever to ORDER BY the same column that you PARTITION BY.
Please try this:
with numbered as (
select *, row_number() over () as rnum
from CAR_SEARCH_GTT
)
select RANK () OVER (PARTITION BY EQUIP_UNIT_INIT_CODE
ORDER BY EQUIP_UNIT_INIT_CODE, rnum)
from numbered;
If CAR_SEARCH_GTT has a primary key named id, then you can do something like this:
select RANK () OVER (PARTITION BY EQUIP_UNIT_INIT_CODE
ORDER BY EQUIP_UNIT_INIT_CODE, id)
from CAR_SEARCH_GTT;
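As a rough illustration of the CTE approach above, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for PostgreSQL only so that the example is self-contained (it assumes SQLite 3.25 or later, which added window functions), and the sample codes are made up. Because rnum is unique, RANK() produces no ties inside a partition:

```python
import sqlite3

def rank_without_ties(codes):
    """Return (code, rank) pairs; ranks are unique within each code."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table car_search_gtt (equip_unit_init_code text)")
    conn.executemany("insert into car_search_gtt values (?)",
                     [(c,) for c in codes])
    rows = conn.execute("""
        with numbered as (
            -- rnum plays the role of Oracle's ROWNUM: arbitrary but unique
            select *, row_number() over () as rnum
            from car_search_gtt
        )
        select equip_unit_init_code,
               rank() over (partition by equip_unit_init_code
                            order by rnum) as rnk
        from numbered
        order by equip_unit_init_code, rnk
    """).fetchall()
    conn.close()
    return rows

print(rank_without_ties(["ABC", "XYZ", "ABC", "ABC", "XYZ"]))
# [('ABC', 1), ('ABC', 2), ('ABC', 3), ('XYZ', 1), ('XYZ', 2)]
```

Which physical row receives which rank is still arbitrary; only the absence of ties is guaranteed.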

SQL tables represent unordered sets. Without an explicit sort, the rows are in arbitrary order. If the sort keys are not unique, then ties are in an arbitrary order.
Then, you don't need to repeat the PARTITION BY key in the ORDER BY for window functions. You could write what you want in Oracle as:
select RANK() OVER (PARTITION BY EQUIP_UNIT_INIT_CODE ORDER BY ROWNUM)
from CAR_SEARCH_GTT;
Despite the rownum, the ordering is arbitrary. Oracle makes no guarantee on the ordering within each group.
The one effect of rownum is to have a different value on each row. Hence, there are no ties. This is more clearly expressed using ROW_NUMBER():
select ROW_NUMBER() OVER (PARTITION BY EQUIP_UNIT_INIT_CODE ORDER BY ROWNUM)
from CAR_SEARCH_GTT;
Oracle requires an ORDER BY clause for ROW_NUMBER(), so any expression can go there.
In Postgres, you can do the same thing by removing the ORDER BY -- Postgres extends the syntax of ROW_NUMBER. So the equivalent is:
select ROW_NUMBER() OVER (PARTITION BY EQUIP_UNIT_INIT_CODE)
from CAR_SEARCH_GTT;
In both cases, you might have an appropriate key for ordering -- perhaps an identity column or creation date.
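To make the point about ties concrete, the following sketch contrasts RANK() ordered by the partition key with ROW_NUMBER() and no ORDER BY. It uses an in-memory SQLite database via Python's sqlite3 module as a stand-in for PostgreSQL (an assumption for the sake of a runnable example; SQLite 3.25 or later is needed), with made-up sample codes:

```python
import sqlite3

def rank_vs_row_number(codes):
    """Return sorted (code, rank, row_number) tuples for each input row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table car_search_gtt (equip_unit_init_code text)")
    conn.executemany("insert into car_search_gtt values (?)",
                     [(c,) for c in codes])
    rows = conn.execute("""
        select equip_unit_init_code,
               -- ordering by the partition key: every row is a peer (tie)
               rank() over (partition by equip_unit_init_code
                            order by equip_unit_init_code) as rnk,
               -- no ORDER BY: arbitrary order, but no ties
               row_number() over (partition by equip_unit_init_code) as rn
        from car_search_gtt
    """).fetchall()
    conn.close()
    return sorted(rows)

print(rank_vs_row_number(["ABC", "XYZ", "ABC", "ABC", "XYZ"]))
# [('ABC', 1, 1), ('ABC', 1, 2), ('ABC', 1, 3), ('XYZ', 1, 1), ('XYZ', 1, 2)]
```

RANK() returns 1 for every row because all rows in a partition compare equal, while ROW_NUMBER() enumerates them.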

Related

MSSQL: Why won't ROW_NUMBER give me expected results?

I have a table with a datetime field ("time") and an int field ("index")
Please see the query and the picture below. I want ROW_NUMBER to count from 1 when the index changes, also if the index value exists in previous rows. The red text indicates the output that I want to get from the query. How can I modify the query to give me the expected results?
The query:
select rv.[time], rv.[index], ROW_NUMBER() OVER(PARTITION BY rv.[index] ORDER BY rv.[time], rv.[index] ASC) AS Row#
from
tbl rv
This is a gaps-and-islands problem. You need to identify groups of adjacent rows. In this case, I think the simplest method is the difference of row numbers:
select rv.*,
row_number() over (partition by index, (seqnum - seqnum_2) order by time) as row_num
from (select t.*,
row_number() over (order by time) as seqnum,
row_number() over (partition by index order by time) as seqnum_2
from tbl t
) rv;
Why this works is a little tricky to explain. If you look at the results of the subquery, you will see how the difference between the two row number values identifies adjacent values that are the same.
Also, you should not use names like time and index for columns, because these are keywords in SQL. I have not escaped the names in the above query. I encourage you to give your columns and tables names that do not need to be escaped.
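The difference-of-row-numbers idea may be easier to follow with the intermediate value visible. The sketch below is only an illustration: it uses an in-memory SQLite database through Python's sqlite3 module instead of SQL Server (SQLite 3.25 or later assumed), the column names ts and idx are hypothetical replacements for time and index, and the data is invented. The diff column stays constant across each run of adjacent rows with the same idx, so (idx, diff) identifies one island:

```python
import sqlite3

def islands(rows):
    """rows: (ts, idx) pairs. Returns (ts, idx, diff, row_num) ordered by ts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tbl (ts integer, idx integer)")
    conn.executemany("insert into tbl values (?, ?)", rows)
    out = conn.execute("""
        select ts, idx,
               seqnum - seqnum_2 as diff,
               row_number() over (partition by idx, seqnum - seqnum_2
                                  order by ts) as row_num
        from (select t.*,
                     row_number() over (order by ts) as seqnum,
                     row_number() over (partition by idx order by ts) as seqnum_2
              from tbl t
             ) rv
        order by ts
    """).fetchall()
    conn.close()
    return out

sample = [(1, 10), (2, 10), (3, 20), (4, 20), (5, 10), (6, 10), (7, 10)]
for row in islands(sample):
    print(row)
# (1, 10, 0, 1)
# (2, 10, 0, 2)
# (3, 20, 2, 1)
# (4, 20, 2, 2)
# (5, 10, 2, 1)
# (6, 10, 2, 2)
# (7, 10, 2, 3)
```

The second run of idx = 10 gets diff = 2 rather than 0, which is what restarts the count at 1 even though the value appeared earlier.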

Sequence within a partition in SQL server

I have been looking around for 2 days and have not been able to figure this one out. Using the dataset below and SQL Server 2016, I would like to get the row number of each row by 'id' and 'cat', ordered by 'date' ascending, but with the sequence reset whenever a different value appears in the 'cat' column for the same 'id' (see rows in green). Any help would be appreciated.
This is a gaps and islands problem. The simplest solution in this case is probably a difference of row numbers:
select t.*,
row_number() over (partition by id, cat, seqnum - seqnum_c order by date) as row_num
from (select t.*,
row_number() over (partition by id order by date) as seqnum,
row_number() over (partition by id, cat order by date) as seqnum_c
from t
) t;
Why this works is a bit tricky to explain. But, if you look at the sequence numbers in the subquery, you'll see that the difference defines the groups you want to define.
Note: This assumes that the date column provides a stable sort. You seem to have duplicates in the column. If there really are duplicates and you have no secondary column for sorting, then try rank() or dense_rank() instead of row_number().
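For the per-id variant, a small sketch may help show that the sequence restarts both when cat changes and when id changes. It runs on an in-memory SQLite database through Python's sqlite3 module rather than SQL Server (SQLite 3.25 or later assumed); the column name dt is a hypothetical stand-in for date, and the rows are invented with unique dates per id so that the sort is stable:

```python
import sqlite3

def sequence_per_id_and_cat(rows):
    """rows: (id, dt, cat). Returns (id, dt, cat, row_num) ordered by id, dt."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id integer, dt text, cat text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    out = conn.execute("""
        select id, dt, cat,
               row_number() over (partition by id, cat, seqnum - seqnum_c
                                  order by dt) as row_num
        from (select t.*,
                     row_number() over (partition by id order by dt) as seqnum,
                     row_number() over (partition by id, cat order by dt) as seqnum_c
              from t
             ) s
        order by id, dt
    """).fetchall()
    conn.close()
    return out

sample = [
    (1, "2020-01-01", "A"), (1, "2020-01-02", "A"),
    (1, "2020-01-03", "B"), (1, "2020-01-04", "A"),
    (2, "2020-01-01", "A"), (2, "2020-01-02", "B"), (2, "2020-01-03", "B"),
]
print([r[3] for r in sequence_per_id_and_cat(sample)])
# [1, 2, 1, 1, 1, 1, 2]
```

For id 1 the cat value A on 2020-01-04 starts again at 1 because a B row sits between it and the earlier A rows.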

Grouping based on ROW_Number of each group

I have a requirement to group rows based on the row_number within each group. Please see the image.
SQL queries represent unordered sets. So, the distinction between the two groups for 47641 is undefined.
You can define a query that will assign a group that has exactly one fiberid for each scname. When there are multiples, the assignment is arbitrary.
To do so, you can use dense_rank():
select t.*,
(dense_rank() over (order by scname) - 1 +
row_number() over (partition by scname, fiberid order by fiberid)
) as grp
from t;
If you do have an ordering for the rows then a more stable assignment can be calculated.

Row_Number over (partition by...) all columns

I have a view with something like 150 columns and I want to add an Id column to that view. Is it possible to avoid writing all the column names in the over (partition by... ) statement?
something like this:
row_number over (partition by *) As ID?
If you want to add a row number to the view, don't you just want an order by with no partition?
If so, you can use one of the following, depending on the database:
select row_number() over ()                        -- e.g. PostgreSQL
select row_number() over (order by NULL)           -- e.g. Oracle
select row_number() over (order by (select NULL))  -- e.g. SQL Server
Your approach would be enumerating identical rows, not providing a row number over all rows.
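The difference between the two readings can be sketched with a tiny view. This is only an illustration using an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25 or later assumed); the table base, the view v, and its two columns are hypothetical. An empty OVER () numbers every row of the view, while partitioning by all columns only counts duplicates of the same row:

```python
import sqlite3

def number_view_rows(rows):
    """rows: (a, b) pairs. Returns (a, b, rid, dup_no) for each view row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table base (a text, b integer)")
    conn.executemany("insert into base values (?, ?)", rows)
    conn.execute("create view v as select a, b from base")
    out = conn.execute("""
        select a, b,
               row_number() over () as rid,                    -- id over all rows
               row_number() over (partition by a, b) as dup_no -- restarts per distinct row
        from v
    """).fetchall()
    conn.close()
    return out

numbered = number_view_rows([("x", 1), ("x", 1), ("y", 2)])
print(sorted(r[2] for r in numbered))
# [1, 2, 3]
print(sorted((a, b, d) for a, b, _, d in numbered))
# [('x', 1, 1), ('x', 1, 2), ('y', 2, 1)]
```

Only rid is usable as a row identifier; dup_no would be 1 for every row of a view with no duplicate rows.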

Remove ORDER BY clause from PARTITION BY clause?

Is there a way I can reduce the impact of the 'ORDER BY lro_pid' clause in the OVER portion of the inner query below?
SELECT *
FROM (SELECT a.*,
Row_Number() over (PARTITION BY search_point_type
ORDER BY lro_pid) spt_rank
FROM lro_search_point a
ORDER BY spt_rank)
WHERE spt_rank = 1;
I don't care to order this result within the partition since I want to order it by a different variable entirely. lro_pid is an indexed column, but this still seems like a waste of resources as it currently stands. (Perhaps there is a way to limit the ordering to a range of a single row?? Hopefully no time/energy would be spent on sorting within the partition at all)
A couple of things to try:
Can you e.g. ORDER BY 'constant' in the OVER clause?
If ordering by a constant is not permitted, how about ORDER BY (lro_pid * 0)?
I'm not an Oracle expert (MSSQL is more my thing) - hence questions to answer your question!
Using a constant in the analytic ORDER BY as @Will A suggested appears to be the fastest method.
The optimizer still performs a sort, but it's faster than sorting a column.
Also, you probably want to remove the second ORDER BY, or at least move it to the outer query.
Below is my test case:
--Create table, index, and dummy data.
create table lro_search_point(search_point_type number, lro_pid number, column1 number
,column2 number, column3 number);
create index lro_search_point_idx on lro_search_point(lro_pid);
insert /*+ append */ into lro_search_point
select mod(level, 10), level, level, level, level from dual connect by level <= 100000;
commit;
--Original version. Averages 0.53 seconds.
SELECT * FROM
(
SELECT a.*, Row_Number() over (PARTITION BY search_point_type ORDER BY lro_pid) spt_rank
FROM lro_search_point a
ORDER BY spt_rank
)
WHERE spt_rank=1;
--Sort by constant. Averages 0.33 seconds.
--This query and the one above have the same explain plan, basically it's
--SELECT/VIEW/SORT ORDER BY/WINDOW SORT PUSHED RANK/TABLE ACCESS FULL.
SELECT * FROM
(
SELECT a.*, Row_Number() over (PARTITION BY search_point_type ORDER BY -1) spt_rank
FROM lro_search_point a
ORDER BY spt_rank
)
WHERE spt_rank=1;
--Remove the ORDER BY (or at least move it to the outer query). Averages 0.27 seconds.
SELECT * FROM
(
SELECT a.*, Row_Number() over (PARTITION BY search_point_type ORDER BY -1) spt_rank
FROM lro_search_point a
)
WHERE spt_rank=1;
--Replace analytic with aggregate functions, averages 0.28 seconds.
--This idea is the whole reason I did this, but turns out it's no faster. *sigh*
--Plan is SELECT/SORT GROUP BY/TABLE ACCESS FULL.
--Note I'm using KEEP instead of just regular MIN.
--I assume that you want the values from the same row.
SELECT a.search_point_type
,min(lro_pid) keep (dense_rank first order by -1)
,min(column1) keep (dense_rank first order by -1)
,min(column2) keep (dense_rank first order by -1)
,min(column3) keep (dense_rank first order by -1)
FROM lro_search_point a
group by a.search_point_type;
To effectively omit the ORDER BY clause, you could use ORDER BY rownum.