How to Rank By Partition with island and gap issue - sql

Is it possible to rank items by partition without using a CTE?
Expected Table
item value ID
A 10 1
A 20 1
B 30 2
B 40 2
C 50 3
C 60 3
A 70 4
A 80 4
Giving each partition an ID would let the aggregate functions work the way I want:
item MIN MAX ID
A 10 20 1
B 30 40 2
C 50 60 3
A 70 80 4
SQL Version: Microsoft SQL Server 2017

Assuming that the value column provides the intended ordering of the records which we see in your question above, we can try using the difference in row numbers method here. Your problem is a type of gaps and islands problem.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY value) rn1,
ROW_NUMBER() OVER (PARTITION BY item ORDER BY value) rn2
FROM yourTable
)
SELECT item, MIN(value) AS [MIN], MAX(value) AS [MAX], MIN(ID) AS ID
FROM cte
GROUP BY item, rn1 - rn2
ORDER BY MIN(value);
If you don't want to use a CTE here, for whatever reason, you may simply inline the SQL code in the CTE into the bottom query, as a subquery:
SELECT item, MIN(value) AS [MIN], MAX(value) AS [MAX], MIN(ID) AS ID
FROM
(
SELECT *, ROW_NUMBER() OVER (ORDER BY value) rn1,
ROW_NUMBER() OVER (PARTITION BY item ORDER BY value) rn2
FROM yourTable
) t
GROUP BY item, rn1 - rn2
ORDER BY MIN(value);
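To see why the difference rn1 - rn2 isolates each island, here is a minimal Python sketch that reproduces the arithmetic by hand on the sample data from the question. The function name and the in-memory list are assumptions made for illustration only; they are not part of the SQL above.

```python
from collections import defaultdict

# Sample rows from the question as (item, value) pairs.
rows = [("A", 10), ("A", 20), ("B", 30), ("B", 40),
        ("C", 50), ("C", 60), ("A", 70), ("A", 80)]

def islands_by_row_number_difference(rows):
    """Group consecutive runs of the same item using rn1 - rn2."""
    per_item_counter = defaultdict(int)
    groups = {}  # (item, rn1 - rn2) -> values; dicts preserve insertion order
    # rn1 mimics ROW_NUMBER() OVER (ORDER BY value)
    for rn1, (item, value) in enumerate(sorted(rows, key=lambda r: r[1]), start=1):
        # rn2 mimics ROW_NUMBER() OVER (PARTITION BY item ORDER BY value)
        per_item_counter[item] += 1
        rn2 = per_item_counter[item]
        groups.setdefault((item, rn1 - rn2), []).append(value)
    # One output row per island: item, MIN(value), MAX(value)
    return [(item, min(vals), max(vals)) for (item, _), vals in groups.items()]

island_result = islands_by_row_number_difference(rows)
for row in island_result:
    print(row)
```

Note that the C rows and the second run of A rows both end up with a difference of 4, which is why the SQL groups by item as well as by rn1 - rn2.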

You can generate group IDs by comparing each row's item with the previous row's item, which the LAG function provides, and then use GROUP BY to get the minimum and maximum value of each item group.
SELECT
item,
MIN(value) AS "min",
MAX(value) AS "max",
group_id + 1 AS id
FROM (
SELECT
*,
SUM(CASE WHEN item = prev_item THEN 0 ELSE 1 END) OVER (ORDER BY value) AS group_id
FROM (
SELECT
*,
LAG(item, 1, item) OVER (ORDER BY value) AS prev_item
FROM t
) items
) groups
GROUP BY item, group_id
Query produces output
item min max id
A 10 20 1
B 30 40 2
C 50 60 3
A 70 80 4
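If you want to try the LAG approach without a SQL Server instance, the sketch below runs an equivalent query against an in-memory SQLite database from Python. This assumes a SQLite build of 3.25 or later (window function support); the column aliases min_value and max_value, the subquery alias grouped, and the final ORDER BY are changes made for this illustration.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (item TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t (item, value) VALUES (?, ?)",
    [("A", 10), ("A", 20), ("B", 30), ("B", 40),
     ("C", 50), ("C", 60), ("A", 70), ("A", 80)],
)

query = """
SELECT item,
       MIN(value) AS min_value,
       MAX(value) AS max_value,
       group_id + 1 AS id
FROM (
    -- Running count of "item changed" flags gives one id per island.
    SELECT items.*,
           SUM(CASE WHEN item = prev_item THEN 0 ELSE 1 END)
               OVER (ORDER BY value) AS group_id
    FROM (
        -- The default argument makes the first row compare equal to itself.
        SELECT t.*,
               LAG(item, 1, item) OVER (ORDER BY value) AS prev_item
        FROM t
    ) AS items
) AS grouped
GROUP BY item, group_id
ORDER BY group_id
"""

lag_result = conn.execute(query).fetchall()
for row in lag_result:
    print(row)
```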

Related

How to get longest consecutive same value?

How to get the rows of the longest consecutive same value?
Table Learning:
rowID values
1 1
2 1
3 0
4 0
5 0
6 1
7 0
8 1
9 1
10 1
The longest run of consecutive values is for value 1 (rowID 8-10, since the run at rowID 1-2 has length 2 and the run at rowID 6-6 has length 1). How can I query the actual rows of that run (not just the rowStart and rowEnd values), like:
rowID values
8 1
9 1
10 1
And how can I get the longest consecutive runs for both 1 and 0?
DB Fiddle
I think that the simplest approach is to use a window count to define the islands. Then to get the "longest" island, we just need to aggregate, sort and limit:
select min(valueid) grp_start, max(valueid) grp_end
from (select t.*, sum(value = 0) over(order by valueid) grp from testing t) t
where value = 1
group by grp
order by count(*) desc limit 1
In the DB Fiddle that you provided, the query returns:
grp_start grp_end
8 10
This is a gaps and islands problem, and one approach is to use the difference in row numbers method:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY rowID) rn1,
ROW_NUMBER() OVER (PARTITION BY values ORDER BY rowID) rn2
FROM yourTable
),
cte2 AS (
SELECT *,
MIN(rowID) OVER (PARTITION BY values, rn1 - rn2) AS minRowID,
MAX(rowID) OVER (PARTITION BY values, rn1 - rn2) AS maxRowID
FROM cte
),
cte3 AS (
SELECT *, RANK() OVER (PARTITION BY values ORDER BY maxRowID - minRowID DESC) rnk
FROM cte2
)
SELECT rowID, values
FROM cte3
WHERE rnk = 1
ORDER BY values, rowID;
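As a cross-check of what the query is expected to return, here is a small Python sketch that finds the rows of the longest run for each value using itertools.groupby on the sample table. The function name is hypothetical, and unlike the RANK() in the SQL, which returns every run tied for longest, this sketch keeps only the earliest run on a tie.

```python
from itertools import groupby

# Sample table from the question as (rowID, value) pairs.
learning_rows = [(1, 1), (2, 1), (3, 0), (4, 0), (5, 0),
                 (6, 1), (7, 0), (8, 1), (9, 1), (10, 1)]

def longest_runs(rows):
    """Return, per value, the rows of its longest consecutive run."""
    best = {}
    # groupby yields one group per run of adjacent equal values (an island).
    for value, run in groupby(sorted(rows), key=lambda r: r[1]):
        run = list(run)
        # Strict comparison: on a tie the earliest run is kept.
        if len(run) > len(best.get(value, [])):
            best[value] = run
    return best

runs = longest_runs(learning_rows)
for value in sorted(runs):
    print(value, runs[value])
```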

How would I extract only the latest week from a select over statement in Hiveql?

I need some help. I've created a query that keeps a running total of whether an element returns a 1 or 0 against a specific measure, with the running total returning to 0 whenever the measure is 0. Example below:
year_week element measure running_total
2020_40 A 1 1
2020_41 A 1 2
2020_42 A 1 3
2020_43 A 0 0
2020_44 A 1 1
2020_45 A 1 2
2020_40 B 1 1
2020_41 B 1 2
2020_42 B 1 3
2020_43 B 1 4
2020_44 B 1 5
2020_45 B 1 6
The above is achieved using this query:
SELECT element,
year_week,
measure,
SUM(measure) OVER (PARTITION BY element, flag_sum ORDER BY year_week ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM (
SELECT *,
SUM(measure_flag) OVER (PARTITION BY element ORDER BY year_week ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS flag_sum
FROM (
SELECT *,
CASE WHEN measure = 1 THEN 0 ELSE 1 END AS measure_flag
FROM database.table ) x ) y
This is great and works, but I'd like to return only the latest week's data for each element. So in the above example it would be:
year_week element measure running_total
2020_45 A 1 2
2020_45 B 1 6
Essentially I need to keep the logic the same but limit the returned data set. I've attempted this, but it changes the result from the correct running total to a 1 or 0.
Any help is greatly appreciated!
You can add another level of nesting, and filter the latest record per element with row_number().
I would suggest:
select element, year_week, measure, running_total
from (
select t.*,
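-- every group except the first (grp = 0) starts with a measure = 0 row, which must not be counted
row_number() over(partition by element, grp order by year_week) - case when grp = 0 then 0 else 1 end as running_total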
from (
select t.*,
sum(1 - measure) over(partition by element order by year_week) as grp,
row_number() over(partition by element order by year_week desc) as rn
from mytable t
) t
) t
where rn = 1
I simplified the query a little, considering the fact that measure has values 0 and 1 only, as shown in your sample data. If that's not the case, then:
select element, year_week, measure, running_total
from (
select t.*,
sum(measure) over(partition by element, grp order by year_week) as running_total
from (
select t.*,
sum(case when measure = 0 then 1 else 0 end) over(partition by element order by year_week) as grp,
row_number() over(partition by element order by year_week desc) as rn
from mytable t
) t
) t
where rn = 1
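The important point is that the rn = 1 filter is applied outside the subquery that computes the running total, so the window still sees every week. The sketch below checks this on the sample data using an in-memory SQLite database from Python rather than Hive. That substitution is an assumption made so the example is self-contained (SQLite 3.25 or later is needed for window functions), and the subquery aliases m, s and r are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Hive here (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (year_week TEXT, element TEXT, measure INTEGER)")
sample = [("2020_%d" % week, "A", m) for week, m in zip(range(40, 46), [1, 1, 1, 0, 1, 1])]
sample += [("2020_%d" % week, "B", 1) for week in range(40, 46)]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", sample)

query = """
SELECT element, year_week, measure, running_total
FROM (
    -- Running total restarts in every (element, grp) partition.
    SELECT s.*,
           SUM(measure) OVER (PARTITION BY element, grp ORDER BY year_week) AS running_total
    FROM (
        -- grp increases by one at each measure = 0 row; rn = 1 marks the latest week.
        SELECT m.*,
               SUM(CASE WHEN measure = 0 THEN 1 ELSE 0 END)
                   OVER (PARTITION BY element ORDER BY year_week) AS grp,
               ROW_NUMBER() OVER (PARTITION BY element ORDER BY year_week DESC) AS rn
        FROM mytable AS m
    ) AS s
) AS r
WHERE rn = 1
ORDER BY element
"""

latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```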

How to find the numbers that have more than two consecutive appearances?

The source table:
id num
-------------------
1 1
2 1
3 1
4 2
5 2
6 1
The output (numbers appearing at least 2 times):
num times
--------------
1 3
2 2
Based on the additional logic defined in the comments, it appears this is what you're after:
WITH YourTable AS(
SELECT V.id,
V.num
FROM (VALUES(1,1),
(2,1),
(3,1),
(4,2),
(5,2),
(6,1),
(7,1))V(id,num)), --Added extra row due to logic defined in comments
Grps AS(
SELECT YT.id,
YT.num,
ROW_NUMBER() OVER (ORDER BY id) -
ROW_NUMBER() OVER (PARTITION BY Num ORDER BY id) AS Grp
FROM YourTable YT),
Counts AS(
SELECT num,
COUNT(num) AS Times
FROM grps
GROUP BY grp,
num)
SELECT num,
MAX(times) AS times
FROM Counts
GROUP BY num;
This uses a CTE and ROW_NUMBER to define the groups, and then an additional CTE to get the COUNT per group. Finally you can then get the MAX COUNT per num.
I would address this with a gaps-and-islands technique:
select num, max(cnt)
from (
select num, count(*) cnt
from (
select
id,
num,
row_number() over(order by id) rn1,
row_number() over(partition by num order by id) rn2
from mytable
) t
group by num, rn1 - rn2
) t
group by num
The innermost query computes row numbers over the whole table and within num groups; the difference between the row numbers identifies the group of adjacent records that each record belongs to (you can run that subquery independently and follow how the difference evolves to understand more).
Then, the next level counts the number of records in each group of adjacent records. The outermost query takes the maximum count of adjacent records for each num.
Demo on DB Fiddle:
num | (No column name)
--: | ---------------:
1 | 3
2 | 2
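For comparison, the same "longest run per num" logic can be sketched in a few lines of Python with itertools.groupby. The function name and the min_times parameter are assumptions for illustration; the filter reflects the "at least 2 times" wording in the question, which the SQL above does not apply.

```python
from itertools import groupby

# Source table from the question as (id, num) pairs.
source_rows = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 1)]

def max_consecutive(rows, min_times=2):
    """Longest run of adjacent equal nums, keeping nums with a run of at least min_times."""
    longest = {}
    # Each groupby group is one island of adjacent rows with the same num.
    for num, run in groupby(sorted(rows), key=lambda r: r[1]):
        length = sum(1 for _ in run)
        longest[num] = max(longest.get(num, 0), length)
    return {num: n for num, n in longest.items() if n >= min_times}

times = max_consecutive(source_rows)
print(times)
```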
this will work for you
select num,count(num) times from Tabl
group by num

Select TOP 2 values for each group

I'm having a problem getting only the TOP 2 values for each group (the groups are in a column).
Example :
ID Group Value
1 A 30
2 A 150
3 A 40
4 A 70
5 B 0
6 B 100
7 B 90
I expect my output to be
ID Group Value
1 A 150
2 A 70
3 B 100
4 B 90
Simply, for each group I want just 2 rows with the highest Value
Most databases support the ANSI standard row_number() function. You would use it as:
-- group is a reserved word, so it is quoted as an identifier
select "group", value
from (select t.*,
row_number() over (partition by "group" order by value desc) as seqnum
from t
) t
where seqnum <= 2;
To set the id you can use row_number() in the outer query:
select row_number() over (order by "group", value desc) as id,
"group", value
from (select t.*,
row_number() over (partition by "group" order by value desc) as seqnum
from t
) t
where seqnum <= 2;
However, changing the id seems suspicious.
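Here is a self-contained way to try the row_number() filter on the sample data, using an in-memory SQLite database from Python. The choice of SQLite (3.25 or later), the subquery alias ranked, and the final ORDER BY are assumptions for this sketch; the reserved word group is quoted as an identifier.

```python
import sqlite3

# In-memory SQLite is used only to make the example runnable (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (id INTEGER, "group" TEXT, value INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "A", 30), (2, "A", 150), (3, "A", 40), (4, "A", 70),
     (5, "B", 0), (6, "B", 100), (7, "B", 90)],
)

query = """
SELECT "group", value
FROM (
    -- seqnum is 1 for the highest value in each group, 2 for the next, and so on.
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY "group" ORDER BY value DESC) AS seqnum
    FROM t
) AS ranked
WHERE seqnum <= 2
ORDER BY "group", value DESC
"""

top_two = conn.execute(query).fetchall()
for row in top_two:
    print(row)
```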
You can use a CTE with the ROW_NUMBER() ranking function.
Here is a query to get your result:
;WITH cte AS
( SELECT [Group], value,
ROW_NUMBER() OVER (PARTITION BY [Group] ORDER BY value DESC) AS rn
FROM test
)
SELECT [Group], value FROM cte
WHERE rn <= 2
ORDER BY [Group], value DESC

SQL rank grouping variation

I'm trying to achieve the following "rank" result given the original dataset composed by the column ID and CODE.
id code rank
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 C 3
7 C 3
8 C 3
9 A 4
10 A 4
Using the DENSE_RANK function over the CODE column, I get the following result (with the A code getting the same rank value even after "the break" between the rows):
id code rank
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 C 3
7 C 3
8 C 3
9 A 1
10 A 1
Is it possible to achieve the results as shown in the first (example) table, with the A code changing rank when there is a separation between the group formed by id: 1-2-3 and the one formed by id: 9-10 without using a cursor?
Thanks
You want to find sequences of values and give them a rank. You can do this with a difference of row numbers approach. The following assigns a different number to each grouping:
select o.*, dense_rank() over (order by grp, code)
from (select o.*,
(row_number() over (order by id) -
row_number() over (partition by code order by id)
) as grp
from original o
) o;
If you want the assignment in the same order as the original data, then you can order by the id, but that requires an additional window function:
select o.*, dense_rank() over (order by minid) as therank
from (select o.*, min(id) over (partition by grp, code) as minid
from (select o.*,
(row_number() over (order by id) -
row_number() over (partition by code order by id)
) as grp
from original o
) o
) o;
Use a running SUM of a flag that is 0 when the current row's code is the same as the previous row's and 1 otherwise. This works from SQL Server 2012 onward.
WITH CTE AS (
SELECT id, code,
CASE Code WHEN LAG(CODE) OVER (ORDER BY id) THEN 0 ELSE 1 END AS Diff
FROM Table1)
SELECT id, code, SUM(Diff) OVER (ORDER BY id) FROM CTE
Please also see similar question at How to make row numbering with ordering, partitioning and grouping
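The running-sum idea is easy to follow outside SQL as well. Below is a minimal Python sketch, under the assumption that the rows are processed in id order; the function name and the sentinel object are hypothetical and only serve the illustration.

```python
# Original dataset from the question as (id, code) pairs.
original = [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "B"),
            (6, "C"), (7, "C"), (8, "C"), (9, "A"), (10, "A")]

def run_ranks(rows):
    """Assign a rank that increases by one every time the code changes."""
    ranked = []
    rank = 0
    previous = object()  # sentinel that never equals a real code
    for row_id, code in sorted(rows):
        if code != previous:   # same test as the CASE ... LAG(...) flag
            rank += 1          # running SUM of the flag
        ranked.append((row_id, code, rank))
        previous = code
    return ranked

ranked_rows = run_ranks(original)
for row in ranked_rows:
    print(row)
```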