I have this data:
row_id type value
1 a 1
2 a 2
3 a 3
4 a 5 --note that type a, value 4 is missing
5 a 6
6 a 7
7 b 1
8 b 2
9 b 3
10 b 4
11 b 5 --note that type b is missing no values from 1 to 5
12 c 1
13 c 3 --note that type c, value 2 is missing
I want to find the minimum and maximum values for each consecutive "run" within each type. That is, I want to return
row_id type group_num min_value max_value
1 a 1 1 3
2 a 2 5 7
3 b 1 1 5
4 c 1 1 1
5 c 2 3 3
I am a fairly experienced SQL user, but I've never solved this problem. Obviously I know how to get the overall minimum and maximum for each type, using GROUP, MIN, and MAX, but I'm really at a loss for these local minima and maxima. I haven't found anything on other questions that answers my question.
I'm using PLSQL Developer with Oracle 11g. Thanks!
This is a gaps-and-islands problem. You can use an analytic-function trick to find the chains of contiguous values for each type:
select type,
min(value) as min_value,
max(value) as max_value
from (
select type, value,
dense_rank() over (partition by type order by value)
- dense_rank() over (partition by null order by value) as chain
from your_table
)
group by type, chain
order by type, min(value);
The inner query uses the difference between the ranking of the values within the type and within the entire result set to create the 'chain' number. The outer query just uses that for the grouping.
SQL Fiddle including the result of the inner query.
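One caveat with the difference-of-ranks idea: the second dense_rank() only exposes a gap if some other type happens to contain the missing value, which is true for the sample data but may not hold in general. A variant that subtracts the per-type rank from the value itself avoids that dependency. The following is only a minimal sketch of that variant, run through Python's built-in sqlite3 module so it can be tried without an Oracle instance (it assumes SQLite 3.25 or later for window functions; the helper name find_runs is made up for illustration, and the table name your_table is taken from the answer above):

```python
import sqlite3

# Sample data from the question: (type, value)
ROWS = [
    ("a", 1), ("a", 2), ("a", 3), ("a", 5), ("a", 6), ("a", 7),
    ("b", 1), ("b", 2), ("b", 3), ("b", 4), ("b", 5),
    ("c", 1), ("c", 3),
]

# value - dense_rank() is constant inside a run of consecutive values
# and changes whenever a value is skipped within the type.
QUERY = """
select type, min(value) as min_value, max(value) as max_value
from (
    select type, value,
           value - dense_rank() over (partition by type order by value) as chain
    from your_table
)
group by type, chain
order by type, min_value
"""


def find_runs(rows):
    """Load rows into an in-memory table and return (type, min, max) per run."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table your_table (type text, value integer)")
    conn.executemany("insert into your_table values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


runs = find_runs(ROWS)
for run in runs:
    print(run)
```

With the sample data this should give the five runs listed in the question. The same inner expression should also work in Oracle, though that has not been verified here.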
This is one way to achieve the result you require:
with step_1 as (
select w.type,
w.value,
w.value - row_number() over (partition by w.type order by w.row_id) as grp
from window_test w
), step_2 as (
select x.type,
x.value,
dense_rank() over (partition by x.type order by x.grp) as grp
from step_1 x
)
select rank() over (order by y.type, y.grp) as row_id,
y.type,
y.grp as group_num,
min(y.value) as min_val,
max(y.value) as max_val
from step_2 y
group by y.type, y.grp
order by 1;
Related
I have these values in a table column (select a from tab):
a
1
2
3
4
5
6
7
15
16
18
Using a variable=3, how can I create column b, starting with min(a), with the following values:
a b
1 1
2 1
3 1
4 4
5 4
6 4
7 7
15 15
16 15
18 18
Something like: for each a (in order), keep the same b while a stays within 3 values of the group's start; otherwise reset b to the current a.
Thanks,
AAWNSD
I think you want window functions and groups of three based on arithmetic on a:
select a,
min(a) over (partition by ceiling(a / 3.0)) as b
from tab;
Here is a db<>fiddle.
Hmmm . . . I realize that the above returns "16" for the last row rather than 18. My above interpretation may not be correct. You may be saying that you want groups -- once they start -- to never exceed the group starting value plus 2.
If so, one approach is a recursive CTE:
with recursive tt as (
select a, row_number() over (order by a) as seqnum
from tab
),
cte as (
select a, seqnum, a as grp
from tt
where seqnum = 1
union all
select tt.a, tt.seqnum,
(case when tt.a <= grp + 2 then grp else tt.a end)
from cte join
tt
on tt.seqnum = cte.seqnum + 1
)
select *
from cte;
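The recursion above walks the ordered values one at a time and carries the current group start along. The same logic can be sketched as a plain loop in Python, which may make the reset rule easier to follow. This is an illustration only: the function name assign_group_starts and the width parameter are assumptions, not part of the answer.

```python
def assign_group_starts(values, width=3):
    """Return (a, b) pairs where b is the start of the group containing a.

    A group opened at `start` covers start .. start + width - 1,
    matching the `tt.a <= grp + 2` test in the recursive CTE for width=3.
    """
    pairs = []
    start = None
    for a in sorted(values):
        # Open a new group when a falls beyond the current group's range.
        if start is None or a > start + width - 1:
            start = a
        pairs.append((a, start))
    return pairs


result = assign_group_starts([1, 2, 3, 4, 5, 6, 7, 15, 16, 18])
for a, b in result:
    print(a, b)
```

Because each group start depends on the previous one, this is inherently sequential, which is why the SQL version needs recursion rather than a single window function.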
I have the following example dataset:
ID Value Row index (for reference purposes only, does not need to exist in final output)
a 4 1
a 7 2
a 12 3
a 12 4
a 13 5
b 1 6
b 2 7
b 3 8
b 4 9
b 5 10
I would like to write a SQL script that, starting from the first row per ID and ordering ascending by [Value], returns the next row whose Value is at least N greater than that of the previously returned row. An example of the final table for N = 3 should look like the following:
ID Value Row index
a 4 1
a 7 2
a 12 3
b 1 6
b 4 9
Can this script be written in a vectorised manner? Or must a loop be utilised? Any advice would be greatly appreciated. Thanks!
SQL tables represent unordered sets. There is no definition of "previous" value, unless you have a column that specifies the ordering. With such a column, you can use lag():
select t.*
from (select t.*,
lag(value) over (partition by id order by <ordering column>) as prev_value
from t
) t
where prev_value is null or prev_value <= value - 3;
EDIT:
I think I misunderstood what you want to do. You seem to want to start with the first row for each id. Then get the next row that is 3 or higher in value. Then hold onto that value and get the next that is 3 or higher than that. And so on.
You can do this in SQL using a recursive CTE:
with ts as (
select distinct t.id, t.value, dense_rank() over (partition by id order by value) as seqnum
from t
),
cte as (
select id, value, value as grp_value, 1 as within_seqnum, seqnum
from ts
where seqnum = 1
union all
select ts.id, ts.value,
(case when ts.value >= cte.grp_value + 3 then ts.value else cte.grp_value end),
(case when ts.value >= cte.grp_value + 3 then 1 else cte.within_seqnum + 1 end),
ts.seqnum
from cte join
ts
on ts.id = cte.id and ts.seqnum = cte.seqnum + 1
)
select *
from cte
where within_seqnum = 1
order by id, value;
Here is a db<>fiddle.
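For comparison, the rule the recursive CTE implements (keep a row only if its value is at least N above the last kept value for the same id) can be sketched as a short loop in Python. The function name pick_rows and the tuple layout are assumptions made for this illustration.

```python
from collections import defaultdict


def pick_rows(rows, n=3):
    """rows: iterable of (id, value, row_index) tuples.

    For each id, walk the rows in ascending value order and keep a row
    only when its value is >= the last kept value + n.
    """
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[0]].append(row)

    kept = []
    for key in sorted(by_id):
        last_value = None
        # Ties on value are broken by row index so the first duplicate wins.
        for row in sorted(by_id[key], key=lambda r: (r[1], r[2])):
            if last_value is None or row[1] >= last_value + n:
                kept.append(row)
                last_value = row[1]
    return kept


sample = [
    ("a", 4, 1), ("a", 7, 2), ("a", 12, 3), ("a", 12, 4), ("a", 13, 5),
    ("b", 1, 6), ("b", 2, 7), ("b", 3, 8), ("b", 4, 9), ("b", 5, 10),
]
selected = pick_rows(sample, n=3)
for row in selected:
    print(row)
```

Each decision depends on the previously kept row, not merely the previous row, which is why lag() alone is not enough and the SQL needs a recursive step.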
I have the following table, called Bucket:
bucketId bucketSpace totalItemsOnBuckets
1 1 21
1 2 21
1 3 21
1 4 21
2 1 9
2 2 9
2 3 9
I'm trying to produce the following output using NTILE
bucketId bucketSpace bucketSpaceItems totalItemsOnBuckets
1 1 5 21
1 2 5 21
1 3 5 21
1 4 6 21
2 1 3 9
2 2 3 9
2 3 3 9
As you can see, the value passed to NTILE is a column within the row.
I have tried several options
select
b.*,
NTILE(b.totalItemsOnBuckets) OVER(PARTITION BY bucketId order by bucketId, bucketSpace) AS bucketSpaceItems
from
Bucket b
but all of them give me the error:
The reference to column "columnName" is not allowed in an argument to the NTILE function. Only references to columns at an outer scope or standalone expressions and subqueries are allowed here.
Using the error as a hint, I have tried subqueries, cross join, but all of them return the same error.
http://sqlfiddle.com/#!18/6decd/2
NTILE() only allows constants (for some definition of "constant") for the number of tiles. But you can easily calculate it with other window functions:
select ceiling(row_number() over (partition by bucketid order by bucketSpace) * totalItemsOnBuckets * 1.0 /
count(*) over (partition by bucketid)
) as tile
from Bucket;
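Note that this expression produces a running (cumulative) item count per bucketSpace rather than the number of items in each space; the per-space figure is the difference between consecutive cumulative values. A small Python sketch of that arithmetic, under the assumption that this is the intended reading (the function name items_per_space and the round_up flag are made up for illustration):

```python
def items_per_space(total_items, spaces, round_up=True):
    """Split total_items across `spaces` slots via cumulative rounding.

    round_up=True mirrors CEILING(row_number * total / count) from the query;
    round_up=False uses the floor instead, which pushes the remainder to the
    later slots.
    """
    cumulative = []
    for k in range(1, spaces + 1):
        if round_up:
            # Integer ceiling of k * total_items / spaces.
            cumulative.append(-(-k * total_items // spaces))
        else:
            cumulative.append(k * total_items // spaces)
    # Per-slot count = this cumulative value minus the previous one.
    previous = [0] + cumulative[:-1]
    return [c - p for c, p in zip(cumulative, previous)]


print(items_per_space(21, 4))                  # ceiling variant
print(items_per_space(21, 4, round_up=False))  # floor variant
print(items_per_space(9, 3))
```

For 21 items over 4 spaces the ceiling variant gives 6, 5, 5, 5, while the floor variant gives 5, 5, 5, 6, which is the ordering shown in the question's desired output. In SQL the differencing step would be a lag() over the same partition and ordering.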
I ended up using Gordon Linoff's answer, with some modifications.
;with count_per_bucket as
(
select
b.*,
ceiling(row_number() over (partition by bucketid order by bucketSpace) * totalItemsOnBuckets * 1.0 / count(*) over (partition by bucketid)) as cumulativeCount
from
#bucket b
),
count_per_bucket_analisys as
(
select
*,
lead(cumulativeCount,1,0) over (partition by bucketid order by bucketSpace) as nextCumulativeCount,
lag(cumulativeCount,1,0) over (partition by bucketid order by bucketSpace) as previousCumulativeCount
from
count_per_bucket
)
select
*,
ItemsPerBucket =
CASE
WHEN previousCumulativeCount = 0 THEN cumulativeCount
WHEN nextCumulativeCount = 0 THEN cumulativeCount - previousCumulativeCount
ELSE nextCumulativeCount - cumulativeCount
END
from count_per_bucket_analisys
Below is my table
a 1
a 2
a 1
b 1
a 2
a 2
b 3
b 2
a 1
My Expected output is
a 4
b 1
a 4
b 5
a 1
I want them to be grouped if they are in sequence.
If your dbms supports window functions, you can use the row_number difference to assign the same group to consecutive values (which are the same) in one column. After assigning the groups, it is easy to sum the values for each group.
select col1,sum(col2)
from (select t.*,
row_number() over(order by someid)
- row_number() over(partition by col1 order by someid) as grp
from tablename t
) x
group by col1,grp
Replace tablename, col1,col2,someid with the appropriate column names. someid should be the column to be ordered by.
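To see why the difference of the two row numbers works, it can help to run the query on the sample data. The sketch below does that with Python's built-in sqlite3 module (assuming SQLite 3.25 or later for window functions). The someid values 1 to 9 are invented here to give the rows an order, and the final order by min(someid) is an addition so the groups come back in their original sequence.

```python
import sqlite3

# (someid, col1, col2): someid is an assumed ordering column
ROWS = [
    (1, "a", 1), (2, "a", 2), (3, "a", 1),
    (4, "b", 1),
    (5, "a", 2), (6, "a", 2),
    (7, "b", 3), (8, "b", 2),
    (9, "a", 1),
]

# Within a run of equal col1 values both row numbers advance by 1 per row,
# so their difference stays constant; it changes once another value intervenes.
QUERY = """
select col1, sum(col2) as total
from (
    select t.*,
           row_number() over (order by someid)
           - row_number() over (partition by col1 order by someid) as grp
    from tablename t
) x
group by col1, grp
order by min(someid)
"""


def sum_consecutive(rows):
    """Return (col1, sum of col2) for each run of consecutive equal col1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tablename (someid integer, col1 text, col2 integer)")
    conn.executemany("insert into tablename values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


totals = sum_consecutive(ROWS)
for row in totals:
    print(row)
```

Note that grp on its own is not unique across different col1 values, which is why the outer query groups by both col1 and grp.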
I'm trying to achieve the following "rank" result, given an original dataset consisting of the columns ID and CODE.
id code rank
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 C 3
7 C 3
8 C 3
9 A 4
10 A 4
Using the DENSE_RANK function over the CODE column, I get the following result (with the A code getting the same rank value even after "the break" between the rows):
id code rank
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 C 3
7 C 3
8 C 3
9 A 1
10 A 1
Is it possible to achieve the results as shown in the first (example) table, with the A code changing rank when there is a separation between the group formed by id: 1-2-3 and the one formed by id: 9-10 without using a cursor?
Thanks
You want to find sequences of values and give them a rank. You can do this with a difference of row numbers approach. The following assigns a different number to each grouping:
select o.*, dense_rank() over (order by grp, code)
from (select o.*,
(row_number() over (order by id) -
row_number() over (partition by code order by id)
) as grp
from original o
) o;
If you want the assignment in the same order as the original data, then you can order by the id, but that requires an additional window function:
select o.*, dense_rank() over (order by minid) as therank
from (select o.*, min(id) over (partition by grp, code) as minid
from (select o.*,
(row_number() over (order by id) -
row_number() over (partition by code order by id)
) as grp
from original o
) o
) o;
Flag each row with 1 if its code differs from the previous row's and 0 if it is the same, then take a running SUM of that flag. Works from SQL Server 2012.
WITH CTE AS (
SELECT id, code,
CASE Code WHEN LAG(CODE) OVER (ORDER BY id) THEN 0 ELSE 1 END AS Diff
FROM Table1)
SELECT id, code, SUM(Diff) OVER (ORDER BY id) FROM CTE
Please also see similar question at How to make row numbering with ordering, partitioning and grouping
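The running-sum approach can be tried on the sample data with Python's built-in sqlite3 module. This is a sketch under a few assumptions: SQLite 3.25 or later for window functions, the table name Table1 from the answer above, and a CASE written as a comparison (case when code = lag(...)) rather than the simple-CASE form, which should behave the same way here. The alias island_rank is invented for the example.

```python
import sqlite3

ROWS = [
    (1, "A"), (2, "A"), (3, "A"),
    (4, "B"), (5, "B"),
    (6, "C"), (7, "C"), (8, "C"),
    (9, "A"), (10, "A"),
]

# diff is 1 on the first row (lag is NULL, so the comparison is not true)
# and on every row whose code differs from the previous one; the running
# sum of diff therefore increases by 1 at the start of each new run.
QUERY = """
with cte as (
    select id, code,
           case when code = lag(code) over (order by id) then 0 else 1 end as diff
    from Table1
)
select id, code, sum(diff) over (order by id) as island_rank
from cte
order by id
"""


def rank_runs(rows):
    """Return (id, code, rank) where rank increments at each change of code."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table Table1 (id integer, code text)")
    conn.executemany("insert into Table1 values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


ranked = rank_runs(ROWS)
for row in ranked:
    print(row)
```

Unlike the difference-of-row-numbers queries, this version numbers the runs in id order directly, so no extra min(id) step is needed.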