I've imported data ("Amount" and "Narration") from a spreadsheet into a table and need help with a query to group consecutive records according to their "Narration", for example:
Expected output:
line_no amount narration calc_group <-Not part of table
----------------------------------------
1 10 Reason 1 1
2 -10 Reason 1 1
3 5 Reason 2 2
4 5 Reason 2 2
5 -10 Reason 2 2
6 -8 Reason 1 3
7 8 Reason 1 3
8 11 Reason 1 3
9 99 Reason 3 4
10 -99 Reason 3 4
I've tried some analytical functions:
select line_no, amount, narration,
first_value (line_no) over
(partition by narration order by line_no) "calc_group"
from test
order by line_no
But that does not work because the narration of lines 6 to 8 is the same as that of lines 1 and 2, so all five rows land in the same partition:
line_no amount narration calc_group
----------------------------------------
1 10 Reason 1 1
2 -10 Reason 1 1
3 5 Reason 2 3
4 5 Reason 2 3
5 -10 Reason 2 3
6 -8 Reason 1 1
7 8 Reason 1 1
8 11 Reason 1 1
9 99 Reason 3 4
10 -99 Reason 3 4
UPDATE
I've managed to do it using the lag analytic function and a sequence. It's not very elegant, but it works. There should be a better way, so please comment!
create or replace function get_next_test_seq
return number
as
begin
return test_seq.nextval;
end get_next_test_seq;
create or replace function get_curr_test_seq
return number
as
begin
return test_seq.currval;
end get_curr_test_seq;
update test
set group_no =
(with cte1
as (select line_no, amount, narration,
lag (narration) over (order by line_no) prev_narration, group_no
from test
order by line_no),
cte2
as (select line_no, amount, narration, group_no,
case when prev_narration is null or prev_narration <> narration then get_next_test_seq else get_curr_test_seq end new_group_no
from cte1)
select new_group_no
from cte2
where cte2.line_no = test.line_no);
UPDATE 2
I'm satisfied with the better accepted answer. Thanks kordiko!
Try this query:
SELECT line_no,
amount,
narration,
SUM( x ) OVER ( ORDER BY line_no
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as calc_group
FROM (
SELECT t.*,
CASE lag( narration ) OVER (order by line_no )
WHEN narration THEN 0
ELSE 1 END x
FROM test t
)
ORDER BY line_no
demo --> http://www.sqlfiddle.com/#!4/6d7aa/9
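To try the query above without an Oracle instance, here is a minimal sketch that runs it against the question's sample rows in an in-memory SQLite database from Python. It assumes SQLite 3.25 or newer (the first version with window functions); the names ROWS, QUERY and calc_groups are only for illustration.

```python
import sqlite3

# Sample rows copied from the question: (line_no, amount, narration).
ROWS = [
    (1, 10, "Reason 1"), (2, -10, "Reason 1"),
    (3, 5, "Reason 2"), (4, 5, "Reason 2"), (5, -10, "Reason 2"),
    (6, -8, "Reason 1"), (7, 8, "Reason 1"), (8, 11, "Reason 1"),
    (9, 99, "Reason 3"), (10, -99, "Reason 3"),
]

QUERY = """
SELECT line_no,
       SUM(x) OVER (ORDER BY line_no
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS calc_group
FROM (
    SELECT t.*,
           -- x = 1 marks the first row of a new run. On the very first row
           -- LAG() is NULL, which matches no WHEN branch and falls to ELSE 1.
           CASE LAG(narration) OVER (ORDER BY line_no)
               WHEN narration THEN 0
               ELSE 1
           END AS x
    FROM test t
)
ORDER BY line_no
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (line_no INTEGER, amount INTEGER, narration TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", ROWS)

# One group number per row, in line_no order.
calc_groups = [group for _line_no, group in conn.execute(QUERY)]
print(calc_groups)
```

The running sum of the 0/1 flags is what turns "a new run starts here" into a group number, so lines 6 to 8 get their own group even though their narration repeats an earlier one.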
Related
Query: SELECT (row_number() OVER ()) as grp, * from tbl
Edit: the rows below are returned by a pgrouting shortest path function and it does have a sequence.
seq grp id
1 1 8
2 2 3
3 3 2
4 4 null
5 5 324
6 6 82
7 7 89
8 8 null
9 9 1
10 10 2
11 11 90
12 12 null
How do I make it so that the grp column is only incremented after a null value in id, while keeping the same order of rows?
seq grp id
1 1 8
2 1 3
3 1 2
4 1 null
5 2 324
6 2 82
7 2 89
8 2 null
9 3 1
10 3 2
11 3 90
12 3 null
Using a cumulative SUM aggregation is a possible approach:
SELECT
SUM( -- 2
CASE WHEN id IS NULL THEN 1 ELSE 0 END -- 1
) OVER (ORDER BY seq) as grp,
id
FROM mytable
1. If the current (ordered!) value is NULL, make it 1, else 0. Now you have a run of zeros, delimited by a 1 at each NULL record.
2. Sum these values cumulatively, using SUM() as a window function. The sum increases at each NULL record.
This yields:
0 8
0 3
0 2
1 null
1 324
1 82
1 89
2 null
2 1
2 2
2 90
3 null
As you can see, each group starts with a NULL record, but you expect each group to end with one.
This can be achieved by adding another window function, LAG(), which shifts each id to the next row:
SELECT
    SUM(
        CASE WHEN prev_id IS NULL THEN 1 ELSE 0 END
    ) OVER (ORDER BY seq) as grp,
    id
FROM (
    SELECT
        -- LAG() returns the id of the previous row (NULL for the first row)
        LAG(id) OVER (ORDER BY seq) as prev_id,
        seq,
        id
    FROM mytable
) s
The result is your expected one:
1 8
1 3
1 2
1 null
2 324
2 82
2 89
2 null
3 1
3 2
3 90
3 null
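As a cross-check, here is a small sketch that runs the LAG() version in SQLite from Python and compares it with a variant that needs no subquery: it counts the NULLs in the rows strictly before the current one by ending the window frame at 1 PRECEDING. The frame-based variant is my own alternative, not part of the answer above, and the sketch assumes SQLite 3.25+ and the table name mytable from the answer.

```python
import sqlite3

# (seq, id) rows copied from the question; None stands for SQL NULL.
SEQ_IDS = [
    (1, 8), (2, 3), (3, 2), (4, None),
    (5, 324), (6, 82), (7, 89), (8, None),
    (9, 1), (10, 2), (11, 90), (12, None),
]

# The approach from the answer: shift id down one row, then sum the flags.
LAG_QUERY = """
SELECT SUM(CASE WHEN prev_id IS NULL THEN 1 ELSE 0 END)
           OVER (ORDER BY seq) AS grp
FROM (SELECT LAG(id) OVER (ORDER BY seq) AS prev_id, seq, id FROM mytable) s
ORDER BY seq
"""

# Alternative (an assumption of this sketch): count NULLs in earlier rows only.
# The frame is empty on the first row, so SUM() is NULL there and COALESCE
# turns it into 0 before adding 1.
FRAME_QUERY = """
SELECT 1 + COALESCE(
           SUM(CASE WHEN id IS NULL THEN 1 ELSE 0 END)
               OVER (ORDER BY seq
                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
           0) AS grp
FROM mytable
ORDER BY seq
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (seq INTEGER, id INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", SEQ_IDS)

lag_groups = [row[0] for row in conn.execute(LAG_QUERY)]
frame_groups = [row[0] for row in conn.execute(FRAME_QUERY)]
print(lag_groups)
print(frame_groups)
```

Both queries should agree on this data; which one reads better is a matter of taste.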
I have a list of stock transactions and I am using Over(Partition By) to calculate the running totals (positions) by security. Over time a holding in a particular security can be long, short or flat. I am trying to find an efficient way to extract only the transactions relating to the current position for each security.
I have created a simplified sqlfiddle to show what I have so far. The CTE query generates the running total for each security (code_id) and identifies when the holdings are long (L), short (S) or flat (F). What I need is to group and number matching contiguous values of L, S or F for each code_id.
What I have so far is this:
; WITH RunningTotals as
(
SELECT
*,
RunningTotal = sum(qty) OVER (Partition By code_id Order By id)
FROM
TradeData
), LongShortFlat as
(
SELECT
*,
LSF = CASE
WHEN RunningTotal > 0 THEN 'L'
WHEN RunningTotal < 0 THEN 'S'
ELSE 'F'
END
FROM
RunningTotals
)
SELECT
*
FROM
LongShortFlat r
I think what I need to do is create a GroupNum column by applying a row_number for each group of L, S and F within each code_id so the results look like this:
id code_id qty RunningTotal LSF GroupNum
1 1 5 5 L 1
2 1 2 7 L 1
3 1 7 14 L 1
4 1 -3 11 L 1
5 1 -5 6 L 1
6 1 -6 0 F 2
7 1 5 5 L 3
8 1 5 10 L 3
9 1 -2 8 L 3
10 1 -4 4 L 3
11 2 5 5 L 1
12 2 3 8 L 1
13 2 -4 4 L 1
14 2 -2 2 L 1
15 2 -2 0 F 2
16 2 6 6 L 3
17 2 -5 1 L 3
18 2 -5 -4 S 4
19 2 2 -2 S 4
20 2 4 2 L 5
21 2 -5 -3 S 6
22 2 -2 -5 S 6
23 3 5 5 L 1
24 3 2 7 L 1
25 3 1 8 L 1
I am struggling to generate the GroupNum column.
Thanks in advance for your help.
[Revised]
Sorry about that, I read your question too quickly. I came up with a solution using a recursive common table expression (below), then saw that you've worked out a solution using LAG. I'll post my revised query anyway, for posterity. Either way, the resulting query is (imho) pretty ugly.
;WITH cteBaseAgg
as (
-- Build the "sum increases over time" data
SELECT
row_number() over (partition by td.code_id order by td.code_id, td.Id) RecurseKey
,td.code_id
,td.id
,td.qty
,sum(tdPrior.qty) RunningTotal
,case
when sum(tdPrior.qty) > 0 then 'L'
when sum(tdPrior.qty) < 0 then 'S'
else 'F'
end LSF
from dbo.TradeData td
inner join dbo.TradeData tdPrior
on tdPrior.code_id = td.code_id -- All for this code_id
and tdPrior.id <= td.Id -- For this and any prior Ids
group by
td.code_id
,td.id
,td.qty
)
,cteRecurse
as (
-- "Set" the first row for each code_id
SELECT
RecurseKey
,code_id
,id
,qty
,RunningTotal
,LSF
,1 GroupNum
from cteBaseAgg
where RecurseKey = 1
-- For each successive row in each set, check whether GroupNum needs to be incremented
UNION ALL SELECT
agg.RecurseKey
,agg.code_id
,agg.id
,agg.qty
,agg.RunningTotal
,agg.LSF
,rec.GroupNum + case when rec.LSF = agg.LSF then 0 else 1 end
from cteBaseAgg agg
inner join cteRecurse rec
on rec.code_id = agg.code_id
and agg.RecurseKey - 1 = rec.RecurseKey
)
-- Show results
SELECT
id
,code_id
,qty
,RunningTotal
,LSF
,GroupNum
from cteRecurse
order by
code_id
,id
Sorry for making this question a bit more complicated than it needed to be but for the sake of closure I have found a solution using the lag function.
In order to achieve what I wanted I continued my cte above with the following:
, a as
(
SELECT
*,
Lag(LSF, 1, LSF) OVER(Partition By code_id ORDER BY id) AS prev_LSF,
Lag(code_id, 1, code_id) OVER(Partition By code_id ORDER BY id) AS prev_code
FROM
LongShortFlat
), b as
(
SELECT
id,
LSF,
code_id,
Sum(CASE
WHEN LSF <> prev_LSF AND code_id = prev_code
THEN 1
ELSE 0
END) OVER(Partition By code_id ORDER BY id) AS grp
FROM
a
)
select * from b order by id
Here is the updated sqlfiddle.
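For anyone who wants to experiment with the lag approach outside SQL Server, here is a condensed sketch of the same idea in SQLite, driven from Python. It is not the exact query above: it compares LSF with LAG(LSF) without a default value, so the first row of each code_id counts as a change and numbering starts at 1 as in the desired GroupNum column. The table name trade_data and the CTE names are assumptions of the sketch; SQLite 3.25+ is assumed.

```python
import sqlite3

# (id, code_id, qty) rows taken from the table in the question.
TRADES = [
    (1, 1, 5), (2, 1, 2), (3, 1, 7), (4, 1, -3), (5, 1, -5),
    (6, 1, -6), (7, 1, 5), (8, 1, 5), (9, 1, -2), (10, 1, -4),
    (11, 2, 5), (12, 2, 3), (13, 2, -4), (14, 2, -2), (15, 2, -2), (16, 2, 6),
    (17, 2, -5), (18, 2, -5), (19, 2, 2), (20, 2, 4), (21, 2, -5), (22, 2, -2),
    (23, 3, 5), (24, 3, 2), (25, 3, 1),
]

QUERY = """
WITH running AS (
    SELECT id, code_id, qty,
           SUM(qty) OVER (PARTITION BY code_id ORDER BY id) AS running_total
    FROM trade_data
), classified AS (
    SELECT *,
           CASE WHEN running_total > 0 THEN 'L'
                WHEN running_total < 0 THEN 'S'
                ELSE 'F' END AS lsf
    FROM running
), flagged AS (
    -- is_new = 1 when the position type differs from the previous row of the
    -- same code_id; LAG() is NULL on the first row, which also yields 1.
    SELECT *,
           CASE WHEN lsf = LAG(lsf) OVER (PARTITION BY code_id ORDER BY id)
                THEN 0 ELSE 1 END AS is_new
    FROM classified
)
SELECT id, code_id, running_total, lsf,
       SUM(is_new) OVER (PARTITION BY code_id ORDER BY id) AS group_num
FROM flagged
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trade_data (id INTEGER, code_id INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO trade_data VALUES (?, ?, ?)", TRADES)

results = conn.execute(QUERY).fetchall()
group_nums = [row[4] for row in results]
for row in results:
    print(row)
```

Because every window is partitioned by code_id, no separate check on the previous code_id is needed in this version.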
I want to take the max value of each partitioned block and find the correlating id(in the same row). I then want to use the singular show_id as the 'winner' and bool_flag all rows in the same partition with a matching show_id.
I am having trouble implementing this, especially the window function-- I have hit multiple issues saying that the subquery is not supported, or "must appear in the GROUP BY clause or be used in an aggregate function sql"
subQ1 as (
select subQ0.*,
case
when show_id =
(select id from (select show_id, max(rn_max_0)
over (partition by tv_id, show_id)))
then 1
else 0
end as winner_flag
from subQ0
)
What I have:
tv_id show_id partition_count
1 42 1
1 42 2
1 42 3
1 7 1
2 12 1
2 12 2
2 12 3
2 27 1
What I want:
tv_id show_id partition_count flag
1 42 1 1
1 42 2 1
1 42 3 1
1 7 1 0
2 12 1 1
2 12 2 1
2 12 3 1
2 27 1 0
Because tv_id 1 has the most connections to show_id 42, those rows get flagged.
Ideally, something similar to SQL select only rows with max value on a column, but the partitions and grouping have led to issues. This dataset also has billions of rows so a union would be a nightmare.
Thanks in advance!
For each tv_id, you seem to want the show_id that appears the most. If so:
select s.*,
(case when cnt = max(cnt) over (partition by tv_id)
then 1 else 0
end) as flag
from (select s.*, count(*) over (partition by tv_id, show_id) as cnt
from subQ0 s
) s;
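A quick way to check this against the sample rows is the sketch below, which runs the same two-level window query in SQLite from Python. The table name subq0 stands in for the asker's subQ0, the aliases are my own, and SQLite 3.25+ is assumed. Note that if two show_ids tie for the highest count within a tv_id, both get flagged.

```python
import sqlite3

# (tv_id, show_id, partition_count) rows copied from "What I have".
SHOWS = [
    (1, 42, 1), (1, 42, 2), (1, 42, 3), (1, 7, 1),
    (2, 12, 1), (2, 12, 2), (2, 12, 3), (2, 27, 1),
]

QUERY = """
SELECT tv_id, show_id, partition_count,
       -- Flag the rows whose per-show count equals the best count for the tv_id.
       CASE WHEN cnt = MAX(cnt) OVER (PARTITION BY tv_id)
            THEN 1 ELSE 0 END AS flag
FROM (SELECT q.*,
             COUNT(*) OVER (PARTITION BY tv_id, show_id) AS cnt
      FROM subq0 q) counted
ORDER BY tv_id, show_id, partition_count
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subq0 (tv_id INTEGER, show_id INTEGER, partition_count INTEGER)"
)
conn.executemany("INSERT INTO subq0 VALUES (?, ?, ?)", SHOWS)

# Map (tv_id, show_id) to the set of flags seen for that pair.
flags = {}
for tv_id, show_id, _count, flag in conn.execute(QUERY):
    flags.setdefault((tv_id, show_id), set()).add(flag)
print(flags)
```

Each (tv_id, show_id) pair should carry a single flag value, since cnt is constant within the pair.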
Can I somehow assign a new group to a row when a value in a column changes in T-SQL?
I would be grateful if you could provide a solution that works for an unlimited number of repeating values, without CTEs or functions. I made a solution that works with 100 consecutive identical numbers (with coalesce(lag() over (), lag() over (), lag() over ())), but it is too bulky, and I cannot make a solution for the case with an unlimited number of consecutive identical numbers.
Data
id somevalue
1 0
2 1
3 1
4 0
5 0
6 1
7 1
8 1
9 0
10 0
11 1
12 0
13 1
14 1
15 0
16 0
Expected
id somevalue group
1 0 1
2 1 2
3 1 2
4 0 3
5 0 3
6 1 4
7 1 4
8 1 4
9 0 5
10 0 5
11 1 6
12 0 7
13 1 8
14 1 8
15 0 9
16 0 9
If you just want a group identifier, you can use:
select t.*,
min(id) over (partition by somevalue, seqnum - seqnum_1) as grp
from (select t.*,
row_number() over (order by id) as seqnum,
row_number() over (partition by somevalue order by id) as seqnum_1
from t
) t;
If you want them enumerated . . . well, you can enumerate the id above using dense_rank(). Or you can use lag() and a cumulative sum:
select t.*,
sum(case when somevalue = prev_sv then 0 else 1 end) over (order by id) as grp
from (select t.*,
lag(somevalue) over (order by id) as prev_sv
from t
) t;
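The row-number difference trick can be hard to see at first, so here is a sketch that runs it on the question's data in SQLite from Python and then enumerates the islands with dense_rank(), as suggested above. Window functions cannot be nested directly, so the sketch adds one more subquery level; the aliases numbered, islands and grp_start are my own, and SQLite 3.25+ is assumed.

```python
import sqlite3

# somevalue for id 1..16, copied from the question.
VALUES = [0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0]

QUERY = """
SELECT id, somevalue,
       DENSE_RANK() OVER (ORDER BY grp_start) AS grp
FROM (
    -- Within one somevalue, seqnum - seqnum_1 stays constant along a run of
    -- consecutive rows and changes when the run is interrupted.
    SELECT id, somevalue,
           MIN(id) OVER (PARTITION BY somevalue, seqnum - seqnum_1) AS grp_start
    FROM (
        SELECT id, somevalue,
               ROW_NUMBER() OVER (ORDER BY id) AS seqnum,
               ROW_NUMBER() OVER (PARTITION BY somevalue ORDER BY id) AS seqnum_1
        FROM t
    ) numbered
) islands
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, somevalue INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", list(enumerate(VALUES, start=1)))

groups = [grp for _id, _value, grp in conn.execute(QUERY)]
print(groups)
```

The first run's smallest id becomes rank 1, the second run's becomes rank 2, and so on, which gives consecutive group numbers in id order.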
Here's a different approach:
First I created a view to provide the group increment on each row:
create view increments as
select
n2.id,n2.somevalue,
case when n1.somevalue=n2.somevalue then 0 else 1 end as increment
from
(select 0 as id,1 as somevalue union all select * from mytable) n1
join mytable n2
on n2.id = n1.id+1
Then I used this view to produce the group values as cumulative sums of the increments:
select id, somevalue,
(select sum(increment) from increments i1 where i1.id <= i2.id)
from increments i2
How can I select change points from this data set?
1 0
2 0
3 0
4 100
5 100
6 100
7 100
8 0
9 0
10 0
11 100
12 100
13 0
14 0
15 0
I want this result
4 7 100
11 12 100
This query based on analytic functions lag() and lead() gives expected output:
select id, nid, point
from (
select id, point, p1, lead(id) over (order by id) nid
from (
select id, point,
decode(lag(point) over (order by id), point, 0, 1) p1,
decode(lead(point) over (order by id), point, 0, 2) p2
from test)
where p1<>0 or p2<>0)
where p1=1 and point<>0
Edit: You may want to change line 3 of the query in case there is only one row for a change point:
...
select id, point, p1,
case when p1=1 and p2=2 then id else lead(id) over (order by id) end nid
...
It is simple to do with the ROW_NUMBER analytic function together with MIN and MAX.
This is a frequently asked question about finding intervals (series) of consecutive values while skipping the gaps. I like the name Aketi Jyuuzou gave to this technique: the Tabibitosan method.
For example,
SQL> SELECT MIN(a),
            MAX(a),
            b
     FROM (SELECT a, b, a - ROW_NUMBER() OVER (ORDER BY a) AS rn
           FROM t
           WHERE b <> 0)
     GROUP BY rn, b
     ORDER BY MIN(a);
MIN(A) MAX(A) B
---------- ---------- ----------
4 7 100
11 12 100
SQL>
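Here is a small sketch of the same Tabibitosan query on the question's data, run in SQLite from Python. It uses the question's column names (id, point) and table name (test) instead of a, b and t, and assumes SQLite 3.25+. The trick depends on id being consecutive integers: within an unbroken run, id minus the row number stays constant, so it can serve as the grouping key.

```python
import sqlite3

# point for id 1..15, copied from the question.
POINTS = [0, 0, 0, 100, 100, 100, 100, 0, 0, 0, 100, 100, 0, 0, 0]

QUERY = """
SELECT MIN(id) AS first_id, MAX(id) AS last_id, point
FROM (
    -- Only non-zero rows are numbered, so a gap in id shifts the difference.
    SELECT id, point,
           id - ROW_NUMBER() OVER (ORDER BY id) AS grp
    FROM test
    WHERE point <> 0
)
GROUP BY grp, point
ORDER BY MIN(id)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, point INTEGER)")
conn.executemany("INSERT INTO test VALUES (?, ?)", list(enumerate(POINTS, start=1)))

intervals = conn.execute(QUERY).fetchall()
print(intervals)
```

If the ids had gaps of their own, a run would be split at each gap, so in that case it would be safer to number the rows first and use that number in place of id.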