SQL first order, then partition in over clause - sql

I want to partition over a sorted table. Is there a way to do that?
I am using SQL Server 2016.
Input Table:
|---------|-----------------|-----------|------------|
| prod | sortcolumn | type | value |
|---------|-----------------|-----------|------------|
| X | 1 | P | 12 |
| X | 2 | P | 23 |
| X | 3 | E | 34 |
| X | 4 | P | 45 |
| X | 5 | E | 56 |
| X | 6 | E | 67 |
| Y | 1 | P | 78 |
|---------|-----------------|-----------|------------|
Desired Output
|---------|-----------------|-----------|------------|------------|
| prod | sortcolumn | type | value | rowNr |
|---------|-----------------|-----------|------------|------------|
| X | 1 | P | 12 | 1 |
| X | 2 | P | 23 | 2 |
| X | 3 | E | 34 | 1 |
| X | 4 | P | 45 | 1 |
| X | 5 | E | 56 | 1 |
| X | 6 | E | 67 | 2 |
| Y | 1 | P | 78 | 1 |
|---------|-----------------|-----------|------------|------------|
I am this far:
SELECT
table.*,
ROW_NUMBER() OVER(PARTITION BY table.prod, table.type ORDER BY table.sortColumn) rowNr
FROM table
But this does not restart the row number on the 4th row, since it is the same prod and type.
How can I restart the numbering on every prod change and also on every type change along the sort order, even if the type changes back to a value it already had previously? Is this even possible with ROW_NUMBER, or do I have to work with LEAD, LAG and CASE expressions (which would probably make it very slow, right)?
Thanks!

This is a gaps and islands problem. You can use the following query:
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY prod ORDER BY sortcolumn)
-
ROW_NUMBER() OVER (PARTITION BY prod, type ORDER BY sortcolumn) AS grp
FROM mytable t
to get:
prod sortcolumn type value grp
----------------------------------------
X 1 P 12 0
X 2 P 23 0
X 3 E 34 2
X 4 P 45 1
X 5 E 56 3
X 6 E 67 3
Y 1 P 78 0
Now, field grp can be used for partitioning:
;WITH IslandsCTE AS (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY prod ORDER BY sortcolumn)
-
ROW_NUMBER() OVER (PARTITION BY prod, type ORDER BY sortcolumn) AS grp
FROM mytable t
)
SELECT prod, sortcolumn, type, value,
ROW_NUMBER() OVER (PARTITION BY prod, type, grp ORDER BY sortcolumn) AS rowNr
FROM IslandsCTE
ORDER BY prod, sortcolumn
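To check the query above against the sample data without a SQL Server instance, here is a minimal sketch that runs the same SQL through Python's built-in sqlite3 module. SQLite is only a stand-in (an assumption, and it needs SQLite 3.25 or later for window functions); the table name mytable is taken from the answer.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (prod TEXT, sortcolumn INTEGER, type TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [("X", 1, "P", 12), ("X", 2, "P", 23), ("X", 3, "E", 34), ("X", 4, "P", 45),
     ("X", 5, "E", 56), ("X", 6, "E", 67), ("Y", 1, "P", 78)],
)

query = """
WITH IslandsCTE AS (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY prod ORDER BY sortcolumn)
         - ROW_NUMBER() OVER (PARTITION BY prod, type ORDER BY sortcolumn) AS grp
    FROM mytable t
)
SELECT prod, sortcolumn, type, value,
       ROW_NUMBER() OVER (PARTITION BY prod, type, grp ORDER BY sortcolumn) AS rowNr
FROM IslandsCTE
ORDER BY prod, sortcolumn
"""
rows = conn.execute(query).fetchall()

# Last column is rowNr; it restarts whenever prod or type changes along sortcolumn.
row_numbers = [r[4] for r in rows]
print(row_numbers)
```

The rowNr column restarts on the 4th row, which is the behaviour the question asks for.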

This is a classic 'islands' problem, in that you need to find the 'islands' of records related by prod and type, but without grouping together all records matching on prod and type.
Here's one way this is typically solved. Set up:
CREATE TABLE #t (
prod varchar(1),
sortcolumn int,
type varchar(1),
value int
);
INSERT #t VALUES
('X', 1, 'P', 12),
('X', 2, 'P', 23),
('X', 3, 'E', 34),
('X', 4, 'P', 45),
('X', 5, 'E', 56),
('X', 6, 'E', 67),
('Y', 1, 'P', 78)
;
Get some row numbers in place:
;WITH numbered AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY prod, type ORDER BY sortcolumn) as rnX,
ROW_NUMBER() OVER (PARTITION BY prod ORDER BY sortcolumn) as rn
FROM
#t
)
numbered now looks like this:
prod sortcolumn type value rnX rn
---- ----------- ---- ----------- -------------------- --------------------
X 1 P 12 1 1
X 2 P 23 2 2
X 3 E 34 1 3
X 4 P 45 3 4
X 5 E 56 2 5
X 6 E 67 3 6
Y 1 P 78 1 1
Why is this useful? Well, look at the difference between rnX and rn:
prod sortcolumn type value rnX rn rn - rnX
---- ----------- ---- ----------- -------------------- -------------------- --------------------
X 1 P 12 1 1 0
X 2 P 23 2 2 0
X 3 E 34 1 3 2
X 4 P 45 3 4 1
X 5 E 56 2 5 3
X 6 E 67 3 6 3
Y 1 P 78 1 1 0
As you can see, each 'group' shares a rn - rnX value, and this changes from one group to the next.
So now if we partition by prod, type, and group number, then number within that:
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY prod, type, rn - rnX ORDER BY sortcolumn) rowNr
FROM
numbered
ORDER BY
prod, sortcolumn
we're done:
prod sortcolumn type value rnX rn rowNr
---- ----------- ---- ----------- -------------------- -------------------- --------------------
X 1 P 12 1 1 1
X 2 P 23 2 2 2
X 3 E 34 1 3 1
X 4 P 45 3 4 1
X 5 E 56 2 5 1
X 6 E 67 3 6 2
Y 1 P 78 1 1 1
Related reading: Things SQL needs: SERIES()
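The question also asks about LAG. It is not required, but for comparison here is a sketch of that variant: flag each row whose type differs from the previous row within the same prod, then take a running SUM of the flags to get an island number. This is an alternative, not the method from the answers above. It is run through Python's sqlite3 as a stand-in for SQL Server (assumption: SQLite 3.25 or later), it assumes sortcolumn is unique within each prod, and the names flagged, is_start and island are made up for illustration. Whether it is faster or slower than the row-number difference is something to measure on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for SQL Server (assumption)
conn.execute(
    "CREATE TABLE mytable (prod TEXT, sortcolumn INTEGER, type TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [("X", 1, "P", 12), ("X", 2, "P", 23), ("X", 3, "E", 34), ("X", 4, "P", 45),
     ("X", 5, "E", 56), ("X", 6, "E", 67), ("Y", 1, "P", 78)],
)

query = """
WITH flagged AS (
    -- is_start = 1 on the first row of a prod and whenever type changes
    SELECT *,
           CASE WHEN LAG(type) OVER (PARTITION BY prod ORDER BY sortcolumn) = type
                THEN 0 ELSE 1 END AS is_start
    FROM mytable
),
islands AS (
    -- running total of the flags numbers the islands within each prod
    SELECT *,
           SUM(is_start) OVER (PARTITION BY prod ORDER BY sortcolumn) AS island
    FROM flagged
)
SELECT prod, sortcolumn, type, value,
       ROW_NUMBER() OVER (PARTITION BY prod, island ORDER BY sortcolumn) AS rowNr
FROM islands
ORDER BY prod, sortcolumn
"""
lag_rows = conn.execute(query).fetchall()
lag_row_numbers = [r[4] for r in lag_rows]
print(lag_row_numbers)
```

On the sample data this yields the same rowNr values as the row-number difference approach.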

Try this
select prod, sortcolumn, type, value, row_number() over (partition by prod, sortcolumn, type order by value) rowNr
from table_name

Related

Get sum over a column value that is determined by two other column values in the same table

I have the following table MY_TABLE
ID | SEQ | TYPE | VAL
1 | 2 | A | 100
1 | 3 | A | 100
1 | 2 | B | 200
1 | 3 | A | 100
1 | 3 | B | 200
2 | 25 | X | 100
2 | 24 | Y | 200
2 | 24 | X | 300
2 | 25 | Y | 400
2 | 25 | X | 50
Here in MY_TABLE, each ID has a set of SEQ values and TYPE values. For each ID, I want the sum of VAL per TYPE over the rows whose SEQ equals that ID's max(SEQ).
Expected output:
ID| SEQ | TYPE | SUM(VAL)
1 | 3 | A | 200 <- 100 + 100
1 | 3 | B | 200
2 | 25 | X | 150 <- 100 + 50
2 | 25 | Y | 400
What I tried:
-- this sub query finds the max(seq) for each ID
with max_seq as (
select id, max(seq) max_seq
from my_table t
group by id)
-- select query on my_table
select
bd.id,
bd.seq,
bd.type,
sum(bd.val)
from my_table bd
-- joining on id-max_seq pair
inner join max_seq
on
(max_seq.id = bd.id)
and
(max_seq.max_seq = bd.seq)
-- sum(val) per ID, MAX(SEQ), TYPE
group by bd.id, bd.seq, bd.type;
Question:
The above query works well for smaller tables but gets slower when the table is bigger. Is there an efficient way of getting this output? (Maybe without using two joins on the same table with a sub query?)
You could avoid the self-join by using a subquery which gets a ranking for each row based on the id and seq:
select id, seq, type, sum(val)
from (
select id, seq, type, val, rank() over (partition by id order by seq desc) as rnk
from my_table
)
where rnk = 1
group by id, seq, type
order by id, seq, type;
ID SEQ T SUM(VAL)
---------- ---------- - ----------
1 3 A 200
1 3 B 200
2 25 X 150
2 25 Y 400
Because of the order by seq desc, the rnk value is 1 for the highest seq for each id. The outer query then just filters on rnk = 1, limiting the output and the aggregation to those lowest-rank (highest-seq) rows.
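A small sketch of why RANK() is used here rather than ROW_NUMBER(): RANK() gives every row tied on the highest seq the value 1, so all of them reach the SUM, while ROW_NUMBER() would keep only one row per id. It runs the answer's query through Python's sqlite3 as a stand-in for Oracle (assumption: SQLite 3.25 or later); the helper name run is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for Oracle (assumption)
conn.execute("CREATE TABLE my_table (id INTEGER, seq INTEGER, type TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [(1, 2, "A", 100), (1, 3, "A", 100), (1, 2, "B", 200), (1, 3, "A", 100),
     (1, 3, "B", 200), (2, 25, "X", 100), (2, 24, "Y", 200), (2, 24, "X", 300),
     (2, 25, "Y", 400), (2, 25, "X", 50)],
)

template = """
SELECT id, seq, type, SUM(val)
FROM (
    SELECT id, seq, type, val,
           {ranker}() OVER (PARTITION BY id ORDER BY seq DESC) AS rnk
    FROM my_table
)
WHERE rnk = 1
GROUP BY id, seq, type
ORDER BY id, seq, type
"""

def run(ranker):
    # ranker is the name of the window function to plug into the query
    return conn.execute(template.format(ranker=ranker)).fetchall()

by_rank = run("RANK")              # all rows tied on max(seq) survive the filter
by_row_number = run("ROW_NUMBER")  # only one arbitrary row per id survives

print(by_rank)
print(len(by_row_number))
```

With RANK() the four expected rows come back; with ROW_NUMBER() only two rows remain, one per id, so the sums would be wrong.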

Query to count occurrences of a particular column value

Let's say I have a table with the following value
1
1
1
2
2
2
3
3
3
1
1
1
2
2
2
I need to get output like this, which counts each occurrence of a
particular value
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
1 1
1 2
1 3
2 1
2 2
2 3
NB: This is a sample table. The actual table is complex, with lots of rows and columns, and the query contains some more conditions.
If the number repeats over different "islands" then you need to calculate a value to maintain those islands first (grpnum). That first step can be undertaken by subtracting a raw top-to-bottom row number (raw_rownum) from a partitioned row number. That result gives each "island" a reference unique to that island that can then be used to partition a subsequent row number. As each order by can disturb the outcome I find it necessary to use individual steps and to pass the prior calculation up so it may be reused.
MS SQL Server 2014 Schema Setup:
CREATE TABLE Table1 ([num] int);
INSERT INTO Table1 ([num])
VALUES (1),(1),(1),(2),(2),(2),(3),(3),(3),(1),(1),(1),(2),(2),(2);
Query 1:
select
num
, row_number() over(partition by (grpnum + num) order by raw_rownum) rn
, grpnum + num island_num
from (
select
num
, raw_rownum - row_number() over(partition by num order by raw_rownum) grpnum
, raw_rownum
from (
select
num
, row_number() over(order by (select null)) as raw_rownum
from table1
) r
) d
;
Results:
| num | rn | island_num |
|-----|----|------------|
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 1 |
| 2 | 1 | 5 |
| 2 | 2 | 5 |
| 2 | 3 | 5 |
| 1 | 1 | 7 |
| 1 | 2 | 7 |
| 1 | 3 | 7 |
| 3 | 1 | 9 |
| 3 | 2 | 9 |
| 3 | 3 | 9 |
| 2 | 1 | 11 |
| 2 | 2 | 11 |
| 2 | 3 | 11 |
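One caveat worth checking on real data: partitioning by the single value grpnum + num can merge two different islands when their sums happen to coincide. The sketch below shows this on a small made-up sequence (3, 3, 3, 2, 1) and compares it with partitioning by num and grpnum as separate keys. It uses Python's sqlite3 as a stand-in for SQL Server (assumption: SQLite 3.25 or later) and adds a hypothetical id column so the row order is explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for SQL Server (assumption)
# id is a hypothetical surrogate key that records the intended row order.
conn.execute("CREATE TABLE Table1 (id INTEGER PRIMARY KEY, num INTEGER)")
conn.executemany("INSERT INTO Table1 (num) VALUES (?)", [(n,) for n in [3, 3, 3, 2, 1]])

template = """
SELECT num,
       ROW_NUMBER() OVER (PARTITION BY {key} ORDER BY raw_rownum) AS rn
FROM (
    SELECT num, raw_rownum,
           raw_rownum - ROW_NUMBER() OVER (PARTITION BY num ORDER BY raw_rownum) AS grpnum
    FROM (
        SELECT num, ROW_NUMBER() OVER (ORDER BY id) AS raw_rownum
        FROM Table1
    ) r
) d
ORDER BY raw_rownum
"""

def numbering(key):
    # key is the PARTITION BY expression used for the final row number
    return [rn for _, rn in conn.execute(template.format(key=key))]

summed = numbering("grpnum + num")   # the 2 (3 + 2) and the 1 (4 + 1) both land in partition 5
separate = numbering("num, grpnum")  # each island keeps its own partition

print(summed)
print(separate)
```

Here the lone 2 and the lone 1 are numbered 1, 2 under the summed key, but 1, 1 when num and grpnum are kept as separate partition keys.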
SQL Server provides the ROW_NUMBER() function:
select ID, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) RN FROM <TABLE_NAME>
EDIT :
select * , case when (row_number() over (order by (select 1))) %3 = 0 then 3 else
(row_number() over (order by (select 1))) %3 end [rn] from table
I think there is a problem with your sample, in that you have an implied order but not an explicit one. There is no guarantee that the database will keep and store the values the way you have them listed, so there has to be some inherent/explicit ordering mechanism to tell the database to give those values back exactly the way you listed.
For example, if you did this:
update test
set val = val + 2
where val < 3
You would find your select * no longer comes back the way you expected.
You indicated your actual table was huge, so I assume you have something like this you can use. There should be something in the table to indicate the order you want... a timestamp, perhaps, or maybe a surrogate key.
That said, assuming you have something like that and can leverage it, I believe a series of windowing functions would work.
with rowed as (
select
val,
case
when lag (val, 1, -1) over (order by 1) = val then 0
else 1
end as idx,
row_number() over (order by 1) as rn -- fix this once you have your order
from
test
),
partitioned as (
select
val, rn,
sum (idx) over (order by rn) as instance
from rowed
)
select
val, instance, count (1) over (partition by instance order by rn)
from
partitioned
This example orders by the way they are listed in the database, but you would want to change the row_number function to accommodate whatever your real ordering mechanism is.
1 1 1
1 1 2
1 1 3
2 2 1
2 2 2
2 2 3
3 3 1
3 3 2
3 3 3
1 4 1
1 4 2
1 4 3
2 5 1
2 5 2
2 5 3

Merge multiple rows in SQL with tie breaking on primary key

I have a table with data like the following
key | A | B | C
---------------------------
1 | x | 0 | 1
2 | x | 2 | 0
3 | x | NULL | 4
4 | y | 7 | 1
5 | y | 3 | NULL
6 | z | NULL | 4
And I want to merge the rows together based on column A with largest primary key being the 'tie breaker' between values that are not NULL
Result
key | A | B | C
---------------------------
1 | x | 2 | 4
2 | y | 3 | 1
3 | z | NULL | 4
What would be the best way to achieve this assuming my data is actually 40 columns and 1 million rows with an unknown level of duplications?
Using ROW_NUMBER and conditional aggregation:
WITH cte AS(
SELECT *,
rnB = ROW_NUMBER() OVER(PARTITION BY A ORDER BY CASE WHEN B IS NULL THEN 0 ELSE 1 END DESC, [key] DESC),
rnC = ROW_NUMBER() OVER(PARTITION BY A ORDER BY CASE WHEN C IS NULL THEN 0 ELSE 1 END DESC, [key] DESC)
FROM tbl
)
SELECT
[key] = ROW_NUMBER() OVER(ORDER BY A),
A,
B = MAX(CASE WHEN rnB = 1 THEN B END),
C = MAX(CASE WHEN rnC = 1 THEN C END)
FROM cte
GROUP BY A
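To try the idea on the sample rows, here is a sketch that ports the query to SQLite through Python's sqlite3 (assumption: SQLite 3.25 or later). The T-SQL alias = expression syntax is rewritten with AS, the aggregation is moved into a derived table, and the output names new_key, merged_B and merged_C are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for SQL Server (assumption)
conn.execute('CREATE TABLE tbl ("key" INTEGER PRIMARY KEY, A TEXT, B INTEGER, C INTEGER)')
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [(1, "x", 0, 1), (2, "x", 2, 0), (3, "x", None, 4),
     (4, "y", 7, 1), (5, "y", 3, None), (6, "z", None, 4)],
)

query = """
WITH cte AS (
    -- rnB / rnC = 1 marks, per A, the non-NULL value with the largest key
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY A
               ORDER BY CASE WHEN B IS NULL THEN 0 ELSE 1 END DESC, "key" DESC) AS rnB,
           ROW_NUMBER() OVER (PARTITION BY A
               ORDER BY CASE WHEN C IS NULL THEN 0 ELSE 1 END DESC, "key" DESC) AS rnC
    FROM tbl
)
SELECT ROW_NUMBER() OVER (ORDER BY A) AS new_key, A, merged_B, merged_C
FROM (
    SELECT A,
           MAX(CASE WHEN rnB = 1 THEN B END) AS merged_B,
           MAX(CASE WHEN rnC = 1 THEN C END) AS merged_C
    FROM cte
    GROUP BY A
) merged
ORDER BY A
"""
merged_rows = conn.execute(query).fetchall()
print(merged_rows)
```

With 40 columns, each column needs its own ROW_NUMBER and MAX(CASE ...) pair, so generating the SQL from the column list may be more practical than writing it by hand.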

Select N Rows With Mixed Values

I have a table with columns like
insertTimeStamp, port, data
1 , 20 , 'aaa'
2 , 20 , 'aba'
3 , 20 , '3aa'
4 , 20 , 'aab'
2 , 21 , 'aza'
5 , 21 , 'aha'
8 , 21 , 'aaa'
15 , 22 , '2aa'
Now I need N Rows (Say 4) from that table, ordered asc by insertTimeStamp.
But if possible, I want to get them from different ports.
So the result should be:
1 , 20 , 'aaa'
2 , 20 , 'aba'
2 , 21 , 'aza'
15 , 22 , '2aa'
If there are not enough different values in port I would like select the remaining ones with the lowest insertTimeStamp.
As you can see, I create a group_id so that group_id = 1 marks the row with the smallest insertTimeStamp for each port.
The second field is time_id, so the ORDER BY first takes all the rows with group_id = 1 and then brings in the remaining rows (group_id 2, 3, 4, ...) from any port in timestamp order.
SELECT *
FROM (
SELECT *,
row_number() over (partition by "port" order by "insertTimeStamp") group_id,
row_number() over (order by "insertTimeStamp") time_id
FROM Table1 T
) as T
ORDER BY CASE
WHEN group_id = 1 THEN group_id
ELSE time_id
END
LIMIT 4
OUTPUT
| insertTimeStamp | port | data | group_id | time_id |
|-----------------|------|------|----------|---------|
| 1 | 20 | aaa | 1 | 1 |
| 2 | 21 | aza | 1 | 3 |
| 15 | 22 | 2aa | 1 | 8 |
| 2 | 20 | aba | 2 | 2 |
Use row_number():
select *
from (
select insertTimeStamp, port, data
from (
select *, row_number() over (partition by port order by insertTimeStamp) rn
from a_table
) alias
order by rn, insertTimeStamp
limit 4
) alias
order by 1, 2;
inserttimestamp | port | data
-----------------+------+------
1 | 20 | aaa
2 | 20 | aba
2 | 21 | aza
15 | 22 | 2aa
(4 rows)
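The two answers agree for N = 4 but can differ for larger N. The sketch below runs both orderings on the sample data through Python's sqlite3 (a stand-in for the original database, assuming SQLite 3.25 or later). The table name a_table comes from the second answer; the names first_then_oldest, round_robin and pick are invented for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a_table (insertTimeStamp INTEGER, port INTEGER, data TEXT)")
conn.executemany(
    "INSERT INTO a_table VALUES (?, ?, ?)",
    [(1, 20, "aaa"), (2, 20, "aba"), (3, 20, "3aa"), (4, 20, "aab"),
     (2, 21, "aza"), (5, 21, "aha"), (8, 21, "aaa"), (15, 22, "2aa")],
)

numbered = """
    SELECT insertTimeStamp, port, data,
           ROW_NUMBER() OVER (PARTITION BY port ORDER BY insertTimeStamp) AS group_id,
           ROW_NUMBER() OVER (ORDER BY insertTimeStamp) AS time_id
    FROM a_table
"""

# First answer: one row per port, then the oldest remaining rows from any port.
first_then_oldest = f"""
    SELECT insertTimeStamp, port, data FROM ({numbered}) t
    ORDER BY CASE WHEN group_id = 1 THEN group_id ELSE time_id END
    LIMIT ?
"""

# Second answer: all first rows, then all second rows, and so on (round-robin).
round_robin = f"""
    SELECT insertTimeStamp, port, data FROM ({numbered}) t
    ORDER BY group_id, insertTimeStamp
    LIMIT ?
"""

def pick(sql, n):
    # Return the chosen rows as a set, since only membership is compared here.
    return set(conn.execute(sql, (n,)).fetchall())

print(pick(first_then_oldest, 5) - pick(round_robin, 5))
print(pick(round_robin, 5) - pick(first_then_oldest, 5))
```

For N = 5 the first ordering adds (3, 20, '3aa') while the second adds (5, 21, 'aha'), so which one fits depends on how "the remaining ones with the lowest insertTimeStamp" is meant.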

Partitions with number of rows oracle

I have a view in an Oracle DB; it looks as follows:
id | type | numrows
----|--------|----------
1 | S | 2
2 | L | 3
3 | S | 2
4 | S | 2
5 | L | 3
6 | S | 2
7 | L | 3
8 | S | 2
9 | L | 3
10 | L | 3
The idea is: if TYPE is 'S' then return 2 rows (randomly), and if TYPE is 'L' then return 3 rows (randomly).
Example:
id | type | numrows
----|--------|----------
1 | S | 2
3 | S | 2
2 | L | 3
5 | L | 3
7 | L | 3
You need to tell Oracle how to pick the 3 rows or 2 rows. One idea is to fabricate a row number:
select id, type, numrows
from
(select
id,
type,
numrows,
row_number() over (partition by type order by type) rnk --fabricated
from table)
where
(type = 'S' and rnk <= 2 )
or
(type = 'L' and rnk <= 3 );
You can order by anything you want in that analytic function. For example, you can order by dbms_random.random() for random choices.
If your column numrows is correct and that's the number of rows you want to get then the where clause is simpler:
select id, type, numrows
from
(select
id,
type,
numrows,
row_number() over (partition by type order by dbms_random.random()) rnk --fabricated
from table)
where
rnk <= numrows;
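A sketch of the same pattern that can be run locally: Python's sqlite3 stands in for Oracle (assumption: SQLite 3.25 or later), SQLite's random() replaces dbms_random.random(), and sample_view is a hypothetical table name used in place of the view. Which ids come back varies from run to run, so the check only counts rows per type.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")  # stand-in for Oracle (assumption)
conn.execute("CREATE TABLE sample_view (id INTEGER, type TEXT, numrows INTEGER)")
conn.executemany(
    "INSERT INTO sample_view VALUES (?, ?, ?)",
    [(1, "S", 2), (2, "L", 3), (3, "S", 2), (4, "S", 2), (5, "L", 3),
     (6, "S", 2), (7, "L", 3), (8, "S", 2), (9, "L", 3), (10, "L", 3)],
)

query = """
SELECT id, type, numrows
FROM (
    SELECT id, type, numrows,
           -- random() shuffles the rows inside each type before numbering them
           ROW_NUMBER() OVER (PARTITION BY type ORDER BY random()) AS rnk
    FROM sample_view
)
WHERE rnk <= numrows
"""
sampled = conn.execute(query).fetchall()

# Count how many rows were kept for each type.
per_type = Counter(row[1] for row in sampled)
print(per_type)
```

Each type keeps exactly as many rows as its numrows value, provided the view holds at least that many rows of the type.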