PSQL, adding a "step increasing" column - sql

have this values in a table column select a from tab:
a
1
2
3
4
5
6
7
15
16
18
Using a variable=3, how can create column b starting with min(a) and with the following values:
a
b
1
1
2
1
3
1
4
4
5
4
6
4
7
7
15
15
17
15
18
18
something like: for each a (ordered) maintain the value at most for 3, otherwise reset.
Thanks,
AAWNSD

I think you want window functions and groups of three based on arithmetic on a:
select a,
min(a) over (partition by ceiling(a / 3.0)) as b
from tab;
Here is a db<>fiddle.
Hmmm . . . I realize that the above returns "16" for the last row rather than 18. My above interpretation may not be correct. You may be saying that you want groups -- once they start -- to never exceed the group starting value plus 2.
If so, one approach is a recursive CTE:
with recursive tt as (
select a, row_number() over (order by a) as seqnum
from tab
),
cte as (
select a, seqnum, a as grp
from tt
where seqnum = 1
union all
select tt.a, tt.seqnum,
(case when tt.a <= grp + 2 then grp else tt.a end)
from cte join
tt
on tt.seqnum = cte.seqnum + 1
)
select *
from cte;

Related

Is it possible to use a aggregate function over partition by as a case condition in SQL?

Problem statement is to calculate median from a table that has two columns. One specifying a number and the other column specifying the frequency of the number.
For e.g.
Table "Numbers":
Num
Freq
1
3
2
3
This median needs to be found for the flattened array with values:
1,1,1,2,2,2
Query:
with ct1 as
(select num,frequency, sum(frequency) over(order by num) as sf from numbers o)
select case when count(num) over(order by num) = 1 then num
when count(num) over (order by num) > 1 then sum(num)/2 end median
from ct1 b where sf <= (select max(sf)/2 from ct1) or (sf-frequency) <= (select max(sf)/2 from ct1)
Is it not possible to use count(num) over(order by num) as the condition in the case statement?
Find the relevant row / 2 rows based of the accumulated frequencies, and take the average of num.
The example and Fiddle will also show you the
computations leading to the result.
If you already know that num is unique, rowid can be removed from the ORDER BY clauses
with
t1 as
(
select t.*
,nvl(sum(freq) over (order by num,rowid rows between unbounded preceding and 1 preceding),0) as freq_acc_sum_1
,sum(freq) over (order by num, rowid) as freq_acc_sum_2
,sum(freq) over () as freq_sum
from t
)
select t1.*
,case
when freq_sum/2 between freq_acc_sum_1 and freq_acc_sum_2
then 'V'
end as relevant_record
from t1
order by num, rowid
Fiddle
Example:
ID
NUM
FREQ
FREQ_ACC_SUM_1
FREQ_ACC_SUM_2
FREQ_SUM
RELEVANT_RECORD
7
8
1
0
1
18
5
10
1
1
2
18
1
29
3
2
5
18
6
31
1
5
6
18
3
33
2
6
8
18
4
41
1
8
9
18
V
9
49
2
9
11
18
V
2
52
1
11
12
18
8
56
3
12
15
18
10
92
3
15
18
18
MEDIAN
45
Fiddle for 1M records
You can find the one (or two) middle value(s) and then average:
SELECT AVG(num) AS median
FROM (
SELECT num,
freq,
SUM(freq) OVER (ORDER BY num) AS cum_freq,
(SUM(freq) OVER () + 1)/2 AS median_freq
FROM table_name
)
WHERE cum_freq - freq < median_freq
AND median_freq < cum_freq + 1
Or, expand the values using a LATERAL join to a hierarchical query and then use the MEDIAN function:
SELECT MEDIAN(num) AS median
FROM table_name t
CROSS JOIN LATERAL (
SELECT LEVEL
FROM DUAL
WHERE freq > 0
CONNECT BY LEVEL <= freq
)
Which, for the sample data:
CREATE TABLE table_name (Num, Freq) AS
SELECT 1, 3 FROM DUAL UNION ALL
SELECT 2, 3 FROM DUAL;
Outputs:
MEDIAN
1.5
(Note: For your sample data, there are 6 items, an even number, so the MEDIAN will be half way between the value of 3rd and 4rd items; so half way between 1 and 2 = 1.5.)
db<>fiddle here

SQL query to partition rows into groups where lag (difference between rows) is greater than some value

Suppose I have a table like
id
1
3
4
10
12
19
and I'd like to group the ids (in sorted order) into the same group if they differ by 5 or less, and a new group if they differ by 6 or more. So the output would be:
id
group
1
1
3
1
4
1
10
2
12
2
19
3
Is this possible in SQL? It will be a query in Trino, and I see they have commands like lag and partition. Has anyone made a query like this that can help out?
You can use a cte with lead:
with cte(id, l1) as (
select t.id, abs(coalesce(lead(t.id) over (order by t.id), 0) - t.id) < 6 from tbl t
)
select c.id, (select sum(c1.id < c.id and c1.l1 = 0) from cte c1) + 1 from cte c

(SQL) Per ID, starting from the first row, return all successive rows with a value N greater than the prior returned row

I have the following example dataset:
ID
Value
Row index (for reference purposes only, does not need to exist in final output)
a
4
1
a
7
2
a
12
3
a
12
4
a
13
5
b
1
6
b
2
7
b
3
8
b
4
9
b
5
10
I would like to write a SQL script that returns the next row which has a Value of N or more than the previously returned row starting from the first row per ID and ordered ascending by [Value]. An example of the final table for N = 3 should look like the following:
ID
Value
Row index
a
4
1
a
7
2
a
12
3
b
1
6
b
4
9
Can this script be written in a vectorised manner? Or must a loop be utilised? Any advice would be greatly appreciated. Thanks!
SQL tables represent unordered sets. There is no definition of "previous" value, unless you have a column that specifies the ordering. With such a column, you can use lag():
select t.*
from (select t.*,
lag(value) over (partition by id order by <ordering column>) as prev_value
from t
) t
where prev_value is null or prev_value <= value - 3;
EDIT:
I think I misunderstood what you want to do. You seem to want to start with the first row for each id. Then get the next row that is 3 or higher in value. Then hold onto that value and get the next that is 3 or higher than that. And so on.
You can do this in SQL using a recursive CTE:
with ts as (
select distinct t.id, t.value, dense_rank() over (partition by id order by value) as seqnum
from t
),
cte as (
select id, value, value as grp_value, 1 as within_seqnum, seqnum
from ts
where seqnum = 1
union all
select ts.id, ts.value,
(case when ts.value >= cte.grp_value + 3 then ts.value else cte.grp_value end),
(case when ts.value >= cte.grp_value + 3 then 1 else cte.within_seqnum + 1 end),
ts.seqnum
from cte join
ts
on ts.id = cte.id and ts.seqnum = cte.seqnum + 1
)
select *
from cte
where within_seqnum = 1
order by id, value;
Here is a db<>fiddle.

Smarter GROUP BY

Consider Table like this.
I will call it Test
Id A B C D
1 1 1 8 25
2 1 2 5 35
3 1 3 2 75
4 2 2 2 45
5 3 2 5 26
Now I want rows with max 'Id' Grouped by 'A'
Id A B C D
3 1 3 2 75
4 2 2 2 45
5 3 2 5 26
-
--Work, but I do not want
SELECT MAX(Id), A FROM Test GROUP BY A
--I want but do not work
SELECT MAX(Id), A, B, C, D FROM Test GROUP BY A
--Work but I do not want
SELECT MAX(Id), A, B, C, D FROM Test GROUP BY A, B, C, D
--Work and I want
SELECT old.Id, old.A, new.B, new.C, new.D
FROM(
SELECT
MAX(Id) AS Id, A
FROM
Test GROUP BY A
)old
JOIN Test new
ON old.Id = new.Id
Is there a better way to write last query without join
Most databases support window functions:
select *
from (
select *, row_number() over (partition by a order by id desc) rn
from test
) t
where rn = 1
Most DBMS now support Common Table Expressions (CTE). You can use one.
;with maxa as (
select row_number() over(partition by a order by id desc) rn,
id,a,b,c,d from test
)
select id,a,b,c,d
from maxa
where rn=1

How to get change points in oracle select query?

How can I select change points from this data set
1 0
2 0
3 0
4 100
5 100
6 100
7 100
8 0
9 0
10 0
11 100
12 100
13 0
14 0
15 0
I want this result
4 7 100
11 12 100
This query based on analytic functions lag() and lead() gives expected output:
select id, nid, point
from (
select id, point, p1, lead(id) over (order by id) nid
from (
select id, point,
decode(lag(point) over (order by id), point, 0, 1) p1,
decode(lead(point) over (order by id), point, 0, 2) p2
from test)
where p1<>0 or p2<>0)
where p1=1 and point<>0
SQLFiddle
Edit: You may want to change line 3 in case there only one row for changing point:
...
select id, point, p1,
case when p1=1 and p2=2 then id else lead(id) over (order by id) end nid
...
It would be simple to use ROW_NUMBER analytic function, MIN and MAX.
This is a frequently asked question about finding the interval/series of values and skip the gaps. I like the word given to it as Tabibitosan method by Aketi Jyuuzou.
For example,
SQL> SELECT MIN(A),
2 MAX(A),
3 b
4 FROM
5 ( SELECT a,b, a-Row_Number() over(order by a) AS rn FROM t WHERE b <> 0
6 )
7 GROUP BY rn,
8 b
9 ORDER BY MIN(a);
MIN(A) MAX(A) B
---------- ---------- ----------
4 7 100
11 12 100
SQL>