Grouping based on ROW_Number of each group - sql

I have a requirement to group rows based on the row_number of each group. Please see the image.

SQL queries represent unordered sets. So, the distinction between the two groups for 47641 is undefined.
You can define a query that will assign a group that has exactly one fiberid for each scname. When there are multiples, the assignment is arbitrary.
To do so, you can use dense_rank():
select t.*,
(dense_rank() over (order by scname) - 1 +
row_number() over (partition by scname, fiberid order by fiberid)
) as grp
from t;
If you do have an ordering for the rows then a more stable assignment can be calculated.

Related

Migrating Oracle specific sql to postgresql

Oracle:
select RANK () OVER (PARTITION BY EQUIP_UNIT_INIT_CODE ORDER BY EQUIP_UNIT_INIT_CODE, ROWNUM) from CAR_SEARCH_GTT;
Postgres: ?
Issue: there is no ROWNUM in PostgreSQL. If we use row_number() over () in place of ROWNUM, a PSQLException is thrown:
ERROR: window functions are not allowed in window definitions
Question: How to convert the query above to PostgreSQL?
Using a non-deterministic ROWNUM makes sense if you do not want RANK() numbers repeated in the case of ties.
That said, as @a_horse_with_no_name said, it makes no sense whatsoever to ORDER BY the same column that you PARTITION BY.
Please try this:
with numbered as (
select *, row_number() over () as rnum
from CAR_SEARCH_GTT
)
select RANK () OVER (PARTITION BY EQUIP_UNIT_INIT_CODE
ORDER BY EQUIP_UNIT_INIT_CODE, rnum)
from numbered;
If there is a PK on CAR_SEARCH_GTT as id, then you can do something like this:
select RANK () OVER (PARTITION BY EQUIP_UNIT_INIT_CODE
ORDER BY EQUIP_UNIT_INIT_CODE, id)
from CAR_SEARCH_GTT;
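To see why the tie-breaker matters, here is a minimal sketch that runs both orderings side by side. It uses Python's built-in sqlite3 module as a stand-in for Postgres (window functions need SQLite 3.25 or later), and the sample rows and the id primary key are assumptions made up for illustration:

```python
import sqlite3


def rank_with_and_without_tiebreaker():
    """Compare RANK() ordered by the partition key vs. by a unique key."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical miniature of CAR_SEARCH_GTT with an assumed PK column "id".
    conn.execute(
        "CREATE TABLE car_search_gtt ("
        "id INTEGER PRIMARY KEY, equip_unit_init_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO car_search_gtt (id, equip_unit_init_code) VALUES (?, ?)",
        [(1, "AAA"), (2, "AAA"), (3, "BBB"), (4, "AAA")],
    )
    rows = conn.execute(
        """
        SELECT id,
               equip_unit_init_code,
               -- Ordering by the partition key: every row in a partition ties.
               RANK() OVER (PARTITION BY equip_unit_init_code
                            ORDER BY equip_unit_init_code) AS rank_ties,
               -- Ordering by a unique key: no ties, so no repeated ranks.
               RANK() OVER (PARTITION BY equip_unit_init_code
                            ORDER BY id) AS rank_unique
        FROM car_search_gtt
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return rows


for row in rank_with_and_without_tiebreaker():
    print(row)
```

With only the partition key in the ORDER BY, every row gets rank 1; adding the unique column is what makes the ranks distinct within each code.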
SQL tables represent unordered sets. Without an explicit sort, the rows are in arbitrary order. If the sort keys are not unique, then ties are in an arbitrary order.
Also, you don't need to repeat the PARTITION BY key in the ORDER BY for window functions. You could write what you want in Oracle as:
select RANK() OVER (PARTITION BY EQUIP_UNIT_INIT_CODE ORDER BY ROWNUM)
from CAR_SEARCH_GTT;
Despite the rownum, the ordering is arbitrary. Oracle makes no guarantee on the ordering within each group.
The one effect of rownum is to have a different value on each row. Hence, there are no ties. This is more clearly expressed using ROW_NUMBER():
select ROW_NUMBER() OVER (PARTITION BY EQUIP_UNIT_INIT_CODE ORDER BY ROWNUM)
from CAR_SEARCH_GTT;
Oracle requires an ORDER BY clause, so anything can go there.
In Postgres, you can do the same thing by removing the ORDER BY -- Postgres extends the syntax of ROW_NUMBER. So the equivalent is:
select ROW_NUMBER() OVER (PARTITION BY EQUIP_UNIT_INIT_CODE)
from CAR_SEARCH_GTT;
In both cases, you might have an appropriate key for ordering -- perhaps an identity column or creation date.
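As a rough check of the ORDER BY-less form, the sketch below runs it through Python's sqlite3 module, which accepts the same syntax (SQLite 3.25 or later). SQLite is only a stand-in for Postgres here, and the sample rows are invented for illustration:

```python
import sqlite3


def number_rows_per_code():
    """Number rows within each code using ROW_NUMBER() without ORDER BY."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical miniature of CAR_SEARCH_GTT.
    conn.execute("CREATE TABLE car_search_gtt (equip_unit_init_code TEXT)")
    conn.executemany(
        "INSERT INTO car_search_gtt (equip_unit_init_code) VALUES (?)",
        [("AAA",), ("BBB",), ("AAA",), ("AAA",)],
    )
    rows = conn.execute(
        """
        SELECT equip_unit_init_code,
               ROW_NUMBER() OVER (PARTITION BY equip_unit_init_code) AS rn
        FROM car_search_gtt
        """
    ).fetchall()
    conn.close()

    # Collect the numbers handed out per code. Which row gets which number
    # is arbitrary, so only the set of numbers per code is meaningful.
    numbers = {}
    for code, rn in rows:
        numbers.setdefault(code, []).append(rn)
    return {code: sorted(values) for code, values in numbers.items()}


print(number_rows_per_code())
```

Each code gets the numbers 1 through n with no repeats, which is the only thing the ROWNUM version guaranteed in the first place.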

Window Function- Dense_Rank and Row_Number difference

If I use the dense_rank window function below, it gives me the output I want: it orders the transaction dates in ascending order and assigns rank 1 to the earliest:
select p.billing_cycle_in_months, avg(t.days)
from (
select *,
datediff(day,transaction_settled_at, transaction_refunded_at) as days,
dense_rank() over (partition by signup_id order by transaction_settled_at asc) as rank
from transactions
) t
join signups s on s.signup_id = t.signup_id
join plans p on p.id = s.plan_id
where datediff(year,s.started_at, current_date) > 1 and t.rank = 1
group by p.billing_cycle_in_months
Would I get essentially the same result using the row_number window function ranked over the same date column (transaction_settled_at asc)?
Basically, grouped by billing cycle, I want to rank the earliest day as 1. I just wanted to clarify whether row_number would give me the same result in this case.
Thanks
In your query, the difference between using dense_rank() and row_number() is that the former allows top ties, while the latter does not.
So if two (or more) records have the same, earliest, transaction_settled_at for a given signup_id, then condition dense_rank() ... = 1 will keep them both, while row_number() will select an undefined record out of the two.
If there is no risk of ties, both functions will produce the same resulting dataset in your context.
To reduce the possibility of ties, you can also add additional sorting criteria to the order by clause of the window function:
dense_rank() over (
partition by signup_id
order by transaction_settled_at, some_other_column desc, some_more_column
)
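The tie behaviour is easy to reproduce. The following is a small sketch using Python's sqlite3 module (SQLite 3.25 or later) with a made-up transactions table; the id column and the sample dates are assumptions, and only signup_id and transaction_settled_at come from the question:

```python
import sqlite3

QUERY = """
    SELECT signup_id, COUNT(*) AS kept
    FROM (
        SELECT signup_id,
               {fn}() OVER (PARTITION BY signup_id
                            ORDER BY transaction_settled_at) AS rnk
        FROM transactions
    ) AS ranked
    WHERE rnk = 1
    GROUP BY signup_id
    ORDER BY signup_id
"""


def rows_kept_per_signup():
    """Count the rows that survive `rnk = 1` for each ranking function."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions ("
        "id INTEGER PRIMARY KEY, signup_id INTEGER, transaction_settled_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [
            (1, 10, "2020-01-01"),  # tied for earliest within signup 10
            (2, 10, "2020-01-01"),  # tied for earliest within signup 10
            (3, 10, "2020-02-01"),
            (4, 20, "2020-03-01"),
        ],
    )
    # Only the function name is substituted; it is a fixed literal, not input.
    dense = conn.execute(QUERY.format(fn="DENSE_RANK")).fetchall()
    rownum = conn.execute(QUERY.format(fn="ROW_NUMBER")).fetchall()
    conn.close()
    return dense, rownum


dense, rownum = rows_kept_per_signup()
print("dense_rank:", dense)
print("row_number:", rownum)
```

For signup 10, dense_rank keeps both tied rows while row_number keeps exactly one of them, so the average computed in the outer query can differ between the two.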

Group By equal data inside the column aggregated by arrays

I have a dataset with three columns and I need to group it while keeping the "arrays" as small groups ordered by date:
Expected output:
This is a gaps-and-islands problem, most easily solved with the difference of row numbers:
select type, count(*), min(date_status), max(date_status)
from (select t.*,
row_number() over (order by date_status) as seqnum,
row_number() over (partition by type order by date_status) as seqnum_t
from t
) t
group by type, (seqnum - seqnum_t)
order by min(date_status);
Why this works is a little tricky to explain. I find that if someone looks at the results of the subquery, that person will usually see how the difference of the two row number columns identifies groups of adjacent types.
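To make that concrete, here is a sketch that prints the subquery's output next to the grouped result, using Python's sqlite3 module (SQLite 3.25 or later). The six sample rows are invented, since the original data was not included; the column names follow the query above:

```python
import sqlite3

NUMBERED = """
    SELECT type, date_status,
           ROW_NUMBER() OVER (ORDER BY date_status) AS seqnum,
           ROW_NUMBER() OVER (PARTITION BY type ORDER BY date_status) AS seqnum_t
    FROM t
"""


def islands():
    """Return (subquery rows, grouped islands) for a small sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (type TEXT, date_status TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [
            ("A", "2021-01-01"),
            ("A", "2021-01-02"),
            ("B", "2021-01-03"),
            ("A", "2021-01-04"),
            ("B", "2021-01-05"),
            ("B", "2021-01-06"),
        ],
    )
    inner = conn.execute(NUMBERED + " ORDER BY date_status").fetchall()
    grouped = conn.execute(
        "SELECT type, COUNT(*), MIN(date_status), MAX(date_status) "
        "FROM (" + NUMBERED + ") AS numbered "
        "GROUP BY type, (seqnum - seqnum_t) "
        "ORDER BY MIN(date_status)"
    ).fetchall()
    conn.close()
    return inner, grouped


inner, grouped = islands()
for type_, date_status, seqnum, seqnum_t in inner:
    # The difference stays constant while consecutive rows share a type.
    print(type_, date_status, seqnum, seqnum_t, seqnum - seqnum_t)
print(grouped)
```

Within a run of adjacent rows of the same type, both counters advance by one per row, so their difference stays fixed; a row of another type in between advances only the overall counter, which shifts the difference for the next run.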

Sequence within a partition in SQL server

I have been looking around for 2 days and have not been able to figure this one out. Using the dataset below and SQL Server 2016, I would like to get the row number of each row by 'id' and 'cat', ordered by 'date' in ascending order. However, the sequence should reset whenever a different value in the 'cat' column is found for the same 'id' (see rows in green). Any help would be appreciated.
This is a gaps and islands problem. The simplest solution in this case is probably a difference of row numbers:
select t.*,
row_number() over (partition by id, cat, seqnum - seqnum_c order by date) as row_num
from (select t.*,
row_number() over (partition by id order by date) as seqnum,
row_number() over (partition by id, cat order by date) as seqnum_c
from t
) t;
Why this works is a bit tricky to explain. But, if you look at the sequence numbers in the subquery, you'll see that the difference defines the groups you want to define.
Note: This assumes that the date column provides a stable sort. You seem to have duplicates in the column. If there really are duplicates and you have no secondary column for sorting, then try rank() or dense_rank() instead of row_number().
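Here is a small sketch of the reset behaviour, run through Python's sqlite3 module (SQLite 3.25 or later) rather than SQL Server. The five sample rows are assumptions with unique dates, so the stable-sort caveat above does not come into play:

```python
import sqlite3


def numbered_with_reset():
    """Number rows per (id, cat), restarting when cat changes within an id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, cat TEXT, date TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [
            (1, "x", "2020-01-01"),
            (1, "x", "2020-01-02"),
            (1, "y", "2020-01-03"),  # cat changes: numbering restarts
            (1, "x", "2020-01-04"),  # back to x: restarts again
            (1, "x", "2020-01-05"),
        ],
    )
    rows = conn.execute(
        """
        SELECT id, cat, date,
               ROW_NUMBER() OVER (PARTITION BY id, cat, seqnum - seqnum_c
                                  ORDER BY date) AS row_num
        FROM (SELECT id, cat, date,
                     ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS seqnum,
                     ROW_NUMBER() OVER (PARTITION BY id, cat ORDER BY date) AS seqnum_c
              FROM t) AS numbered
        ORDER BY id, date
        """
    ).fetchall()
    conn.close()
    return rows


for row in numbered_with_reset():
    print(row)
```

The second run of 'x' does not continue from 2, because its difference of row numbers differs from that of the first run and therefore lands in a separate partition.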

How to get 10 of the results in every group of a table using Hive sql?

I have a table
I want to group the data by class, then pick two rows from each class, whether sorted or not.
then get results like this.
How to write the sql?
Use row_number():
select t.*
from (select t.*, row_number() over (partition by class order by class) as seqnum
from t
) t
where seqnum <= 2;
If you want two particular rows -- such as the two highest scoring or lowest scoring -- then adjust the order by clause.
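For instance, picking the two highest-scoring rows per class might look like the sketch below. It runs on Python's sqlite3 module (SQLite 3.25 or later) instead of Hive, and the name and score columns as well as the sample rows are hypothetical, since the original table was not shown:

```python
import sqlite3


def top_two_per_class():
    """Return the two highest-scoring rows of each class."""
    conn = sqlite3.connect(":memory:")
    # "name" and "score" are assumed columns for illustration only.
    conn.execute("CREATE TABLE t (class TEXT, name TEXT, score INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [
            ("a", "p1", 90),
            ("a", "p2", 70),
            ("a", "p3", 80),
            ("b", "p4", 60),
            ("b", "p5", 65),
            ("c", "p6", 50),
        ],
    )
    rows = conn.execute(
        """
        SELECT class, name, seqnum
        FROM (SELECT class, name,
                     ROW_NUMBER() OVER (PARTITION BY class
                                        ORDER BY score DESC) AS seqnum
              FROM t) AS numbered
        WHERE seqnum <= 2
        ORDER BY class, seqnum
        """
    ).fetchall()
    conn.close()
    return rows


for row in top_two_per_class():
    print(row)
```

A class with fewer than two rows simply returns what it has; changing the 2 in the filter changes how many rows are kept per class.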