Assign rownumber in SQL grouped on value and n rows per rownumber

I am trying to generate a report with 3 rows per page for each order number using the following SQL.
As you can see from the results the fields Actual & Expected do not match up.
Any help would be appreciated.
set nocount on
CREATE TABLE #Orders (Expected int, OrderNumber INT, OrderDetailsNumber int)
Insert into #orders values (0,1,1)
Insert into #orders values (0,1,2)
Insert into #orders values (0,1,3)
Insert into #orders values (1,1,4)
Insert into #orders values (2,2,5)
Insert into #orders values (2,2,6)
Insert into #orders values (2,2,7)
Insert into #orders values (3,2,8)
Insert into #orders values (3,2,9)
select cast(((row_number() over( order by OrderNumber)) -1) /3 as int) as [Actual]
,*
from #orders
Actual Expected OrderNumber OrderDetailsNumber
----------- ----------- ----------- ------------------
0 0 1 1
0 0 1 2
0 0 1 3
1 1 1 4
1 2 2 5
1 2 2 6
2 2 2 7
2 3 2 8
2 3 2 9

Right, after a couple of edits I have the final answer:
SELECT DENSE_RANK() OVER (Order BY OrderNumber, floor(RowNumber/3)) - 1 AS Actual,
Expected,
OrderNumber,
OrderDetailsNumber
FROM
(
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY OrderNumber
ORDER BY OrderDetailsNumber
) - 1 AS RowNumber
FROM #Orders
) RowNumberTable
Gives the result (with extra rows for testing):
Actual Expected OrderNumber OrderDetailsNumber
-------------------- ----------- ----------- ------------------
0 0 1 1
0 0 1 2
0 0 1 3
1 1 1 4
1 1 1 12
2 2 2 5
2 2 2 6
2 2 2 7
3 3 2 8
3 3 2 9
3 4 2 11
4 3 2 27
5 5 3 10
The result is only deterministic when OrderDetailsNumber is unique within each order, because it is the ordering key for ROW_NUMBER().
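To check the paging logic without a SQL Server instance, here is a minimal sketch that runs the same idea on SQLite through Python's sqlite3 module. The table name orders, the alias rn, and the choice of SQLite are assumptions for illustration; integer division stands in for floor(), and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (Expected INT, OrderNumber INT, OrderDetailsNumber INT)"
)
# The nine sample rows from the question.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(0, 1, 1), (0, 1, 2), (0, 1, 3), (1, 1, 4),
     (2, 2, 5), (2, 2, 6), (2, 2, 7), (3, 2, 8), (3, 2, 9)],
)

query = """
SELECT DENSE_RANK() OVER (ORDER BY OrderNumber, rn / 3) - 1 AS Actual,
       Expected,
       OrderNumber,
       OrderDetailsNumber
FROM (
    -- Zero-based position of each detail row inside its own order.
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY OrderNumber
               ORDER BY OrderDetailsNumber
           ) - 1 AS rn
    FROM orders
) AS numbered
ORDER BY OrderNumber, OrderDetailsNumber
"""
rows = conn.execute(query).fetchall()

# Rows where the computed page differs from the page the question expects.
mismatches = [r for r in rows if r[0] != r[1]]
```

With the question's original nine rows, mismatches should come back empty, since a new page starts both after every third row and whenever the order number changes.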
Edit
I've now got the complete code working. However, it depends on OrderDetailsNumber reflecting the intended row order, which is fragile, so please test and edit as required.
Edit 2
I've put the 'golfed' version in the main answer.
WITH FirstCTE AS
(
SELECT
OrderNumber,
OrderDetailsNumber,
Expected,
ROW_NUMBER() OVER (
PARTITION BY OrderNumber
ORDER BY OrderDetailsNumber
) - 1 AS RowNumber
FROM #Orders
)
, SecondCTE AS
(
SELECT OrderDetailsNumber as odn,
floor(RowNumber/3) as page_for_order_number,
DENSE_RANK() OVER (Order BY OrderNumber, floor(RowNumber/3)) - 1 AS Actual
FROM FirstCTE
)
SELECT c2.page_for_order_number,
c1.RowNumber,
C2.Actual,
c1.Expected,
c1.OrderNumber,
c1.OrderDetailsNumber
FROM FirstCTE AS c1
INNER JOIN SecondCTE AS c2
on c2.odn = c1.OrderDetailsNumber

This strikes me as a bit of a hack, but it works...
Divide the row_number() by 3, and use CEILING to get the smallest integer greater than or equal to the result of that division.
select row_number() over( order by OrderNumber) as [Actual],
cast (row_number() over(order by ordernumber) as decimal(5,1)) / 3,
CEILING(cast (row_number() over(order by ordernumber) as decimal(5,1)) / 3) as PG_NBR,
*
from #orders
EDIT: Dang it, can never get results to line up. The 3rd column in the result set is your "page number".
Which yields:
Actual (No column name) PG_NBR Expected OrderNumber OrderDetailsNumber
1 0.333333 1 0 1 1
2 0.666666 1 0 1 2
3 1.000000 1 0 1 3
4 1.333333 2 1 1 4
5 1.666666 2 2 2 5
6 2.000000 2 2 2 6
7 2.333333 3 2 2 7
8 2.666666 3 3 2 8
9 3.000000 3 3 2 9

Related

How to give group numbers based on a condition in sql

Here is my table. I am using Snowflake
CREATE TABLE testx
(
c_order int,
c_condition varchar(10)
);
INSERT INTO testx
VALUES (1, 'and'), (2, 'and'), (3, 'or'), (4, 'and'), (5, 'or');
SELECT * FROM testx
c_order c_condition
--------------------
1 and
2 and
3 or
4 and
5 or
I am trying to write a query that assigns group numbers so that consecutive 'and' rows share the same group number, and each 'or' row increases the group number. The c_order sequence must also be preserved.
Here is the expected result:
c_order c_condition group_no
-------------------------------
1 and 1
2 and 1
3 or 2
4 and 2
5 or 3
I have tried using dense_rank like this:
SELECT
*,
DENSE_RANK() OVER (ORDER BY c_condition)
FROM testx
But it doesn't return exactly what I want. Can somebody please help?
The idea is to give each 'or' row its own group number and to let every other row inherit the group number of the closest preceding 'or' row (or 1 if there is none).
In the CTE we select only the 'or' rows and number them with ROW_NUMBER(), adding 1 unless the very first row of the table is itself an 'or'.
Main query -
with temp_cte as
(
select c_order,
case -- to check if 'or' is the first row or not
when (select min(c_order) from testx where c_condition='or') =
(select min(c_order) from testx)
then row_number() over (order by c_order)
else
row_number() over (order by c_order)+1
end rn
from testx, table(generator(rowcount=>1))
where c_condition='or'
)
select x.c_order, x.c_condition,
case
when x.c_order = w.c_order
then w.rn
when x.c_order > (select min(c_order) from temp_cte)
then (select max(rn) from temp_cte where c_order < x.c_order)
else 1
end seq1
from testx x left join temp_cte w
on x.c_order = w.c_order
order by x.c_order;
Output -
C_ORDER C_CONDITION SEQ1
------------------------
1 and 1
2 and 1
3 or 2
4 or 3
5 and 3
6 and 3
7 or 4
8 and 4
9 or 5
For data-set
select * from testx;
C_ORDER C_CONDITION
-------------------
1 and
2 and
3 or
4 or
5 and
6 and
7 or
8 and
9 or
Or, just use CONDITIONAL_TRUE_EVENT:
with data(C_ORDER,C_CONDITION) as(
select * from values
(1,'and'),
(2,'and'),
(3,'or'),
(4,'or'),
(5,'and'),
(6,'and'),
(7,'or'),
(8,'and'),
(9,'or')
)select c_order, c_condition,
conditional_true_event(c_condition='or') over (order by c_order) grp
from data;
C_ORDER C_CONDITION GRP
-----------------------
1 and 0
2 and 0
3 or 1
4 or 2
5 and 2
6 and 2
7 or 3
8 and 3
9 or 4
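On engines that lack CONDITIONAL_TRUE_EVENT, the same running count can be approximated with a windowed SUM over a CASE expression. The sketch below is an assumption-laden illustration rather than Snowflake code: it runs on SQLite (3.25 or later) via Python's sqlite3 module, uses the five rows from the question, and adds 1 so the numbering starts at 1 as in the expected result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testx (c_order INT, c_condition TEXT)")
conn.executemany(
    "INSERT INTO testx VALUES (?, ?)",
    [(1, "and"), (2, "and"), (3, "or"), (4, "and"), (5, "or")],
)

query = """
SELECT c_order,
       c_condition,
       -- Running count of 'or' rows seen so far, shifted to start at 1.
       1 + SUM(CASE WHEN c_condition = 'or' THEN 1 ELSE 0 END)
               OVER (ORDER BY c_order) AS group_no
FROM testx
ORDER BY c_order
"""
groups = conn.execute(query).fetchall()
```

Because c_order is unique, the default window frame ends at the current row, so each 'or' row is already counted in its own group number.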

Select top rows until value in specific column has appeared twice

I have the following query where I am trying to select all records, ordered by date, until the second time EmailApproved = 1 is found. The second record where EmailApproved = 1 should not be selected.
create table #Test (id int, EmailApproved bit, Created datetime);
insert into #Test (id, EmailApproved, Created)
values
(1,0,'2011-03-07 03:58:58.423')
, (2,0,'2011-02-21 04:55:52.103')
, (3,0,'2011-01-29 13:24:02.103')
, (4,1,'2010-10-12 14:41:54.217')
, (5,0,'2010-10-12 14:34:15.903')
, (6,0,'2010-10-12 10:10:19.123')
, (7,1,'2010-08-27 12:07:16.073')
, (8,1,'2010-08-25 12:15:49.413')
, (9,0,'2010-08-25 12:14:51.970')
, (10,1,'2010-04-12 16:43:44.777');
select *
, case when Row1 = Row2 then 1 else 0 end Row1EqualsRow2
from (
select id, EmailApproved, Created
, row_number() over (partition by EmailApproved order by Created desc) Row1
, row_number() over (order by Created desc) Row2
from #Test
) X
--where Row1 = Row2
order by Created desc;
Which produces the following results:
id EmailApproved Created Row1 Row2 Row1EqualsRow2
1 0 2011-03-07 03:58:58.423 1 1 1
2 0 2011-02-21 04:55:52.103 2 2 1
3 0 2011-01-29 13:24:02.103 3 3 1
4 1 2010-10-12 14:41:54.217 1 4 0
5 0 2010-10-12 14:34:15.903 4 5 0
6 0 2010-10-12 10:10:19.123 5 6 0
7 1 2010-08-27 12:07:16.073 2 7 0
8 1 2010-08-25 12:15:49.413 3 8 0
9 0 2010-08-25 12:14:51.970 6 9 0
10 1 2010-04-12 16:43:44.777 4 10 0
What I actually want is:
id EmailApproved Created Row1 Row2 Row1EqualsRow2
1 0 2011-03-07 03:58:58.423 1 1 1
2 0 2011-02-21 04:55:52.103 2 2 1
3 0 2011-01-29 13:24:02.103 3 3 1
4 1 2010-10-12 14:41:54.217 1 4 0
5 0 2010-10-12 14:34:15.903 4 5 0
6 0 2010-10-12 10:10:19.123 5 6 0
Note: Row1, Row2 & Row1EqualsRow2 are just working columns to show my calculations.
Steps:
1) Create a row number, rn, over all rows, in case id is not in sequence.
2) Create a row number, approv_rn, partitioned by EmailApproved, so we know when EmailApproved = 1 occurs for the second time.
3) Use an outer apply to find the row number of the second instance of EmailApproved = 1.
4) In the where clause, filter out all rows whose row number is >= the value found in step 3.
5) If there are only 1 or 0 rows with EmailApproved = 1, the outer apply returns null, in which case all available rows are returned.
with test as
(
select *,
rn = row_number() over (order by Created desc),
approv_rn = row_number() over (partition by EmailApproved
order by Created desc)
from #Test
)
select *
from test t
outer apply
(
select x.rn
from test x
where x.EmailApproved = 1
and x.approv_rn = 2
) x
where t.rn < x.rn or x.rn is null
order by t.Created desc;
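A different way to express the same cut-off, shown here only as a sketch and not as the method above, is a running count of approved rows: keep every row for which fewer than two approvals have been seen so far. The example assumes SQLite 3.25 or later driven from Python's sqlite3 module, a table named test, and timestamps stored as ISO text so that they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INT, EmailApproved INT, Created TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, 0, "2011-03-07 03:58:58.423"), (2, 0, "2011-02-21 04:55:52.103"),
     (3, 0, "2011-01-29 13:24:02.103"), (4, 1, "2010-10-12 14:41:54.217"),
     (5, 0, "2010-10-12 14:34:15.903"), (6, 0, "2010-10-12 10:10:19.123"),
     (7, 1, "2010-08-27 12:07:16.073"), (8, 1, "2010-08-25 12:15:49.413"),
     (9, 0, "2010-08-25 12:14:51.970"), (10, 1, "2010-04-12 16:43:44.777")],
)

query = """
SELECT id
FROM (
    SELECT id,
           Created,
           -- Number of approved rows from the newest row down to this one.
           SUM(EmailApproved) OVER (ORDER BY Created DESC) AS approved_so_far
    FROM test
) AS counted
WHERE approved_so_far < 2
ORDER BY Created DESC
"""
kept_ids = [row[0] for row in conn.execute(query)]
```

When fewer than two approved rows exist, the count never reaches 2 and all rows are kept, which matches step 5 above.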

Group by values that are each multiple of number

This is the table t. I want to group it every time the TotalQty >= 5n (let n = group). i.e. once the TotalQty >= 5n I want to sum together the qty from n-1 to n.
ID DateCreated CurrQty
1 01-20-2020 1
2 01-21-2020 4
3 01-22-2020 3
4 01-23-2020 3
5 01-25-2020 1
6 02-13-2020 3
7 02-16-2020 2
With this query I can get pretty close, but it doesn't consider the previous "valid" TotalQty + 5:
select DateCreated, CurrQty, TotalQty
, ceiling(TotalQty/5.0) GroupNum
from
(
select DateCreated, CurrQty
, SUM(CurrQty) OVER (ORDER BY DateCreated ROWS BETWEEN UNBOUNDED PRECEDING AND 0 PRECEDING) TotalQty
from t
) t2
ID DateCreated CurrQty TotalQty GroupNum
1 01-20-2020 1 1 1
2 01-21-2020 4 5 1
3 01-22-2020 3 8 2
4 01-23-2020 3 11 3
5 01-25-2020 1 12 3
6 02-13-2020 3 15 3
7 02-16-2020 2 17 4
---
How do I get this result?
ID DateCreated CurrQty TotalQty GroupNum
1 01-20-2020 1 1 1
2 01-21-2020 4 5 1
3 01-22-2020 3 8 2
4 01-23-2020 3 11 2 (from ID2, 11 >= (5+5))
5 01-25-2020 1 12 3
6 02-13-2020 3 15 3
7 02-16-2020 2 17 3 (from ID4, 17 >= (11+5))
And so on, the next group would be until 17+5 = 22
You need to use a recursive CTE for this:
with cte as (
select id, datecreated, currqty, currqty as totalqty, 1 as groupnum
from t
where id = 1
union all
select t.id, t.datecreated, t.currqty,
(case when cte.totalqty >= 5 then t.currqty else t.currqty + cte.totalqty end),
(case when cte.totalqty >= 5 then groupnum + 1 else groupnum end)
from cte join
t
on t.id = cte.id + 1
)
select *
from cte;
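The recursive logic can be tried out on the sample data with the sketch below. It assumes SQLite (which needs the RECURSIVE keyword) driven from Python's sqlite3 module, and the column name groupqty is a hypothetical rename of the per-group running sum that the query above calls totalqty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INT, datecreated TEXT, currqty INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "2020-01-20", 1), (2, "2020-01-21", 4), (3, "2020-01-22", 3),
     (4, "2020-01-23", 3), (5, "2020-01-25", 1), (6, "2020-02-13", 3),
     (7, "2020-02-16", 2)],
)

query = """
WITH RECURSIVE cte (id, currqty, groupqty, groupnum) AS (
    SELECT id, currqty, currqty, 1
    FROM t
    WHERE id = 1
    UNION ALL
    SELECT t.id,
           t.currqty,
           -- Restart the sum once the previous row closed its group.
           CASE WHEN cte.groupqty >= 5 THEN t.currqty
                ELSE t.currqty + cte.groupqty END,
           CASE WHEN cte.groupqty >= 5 THEN cte.groupnum + 1
                ELSE cte.groupnum END
    FROM cte
    JOIN t ON t.id = cte.id + 1
)
SELECT id, groupnum FROM cte ORDER BY id
"""
group_by_id = conn.execute(query).fetchall()
```

The recursion is what lets each threshold depend on the total at which the previous group closed. A fixed-bucket expression over the overall running total, such as the ceiling(TotalQty/5.0) attempt in the question, places the boundaries at 5, 10, 15 and so on regardless of where groups actually ended. Like the original, this sketch relies on id values being consecutive.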
EDIT:
Hold on. I think the answer is simpler.
select t.*,
1 + ceiling((totalqty - currqty + 1) / 5.0)
from (select t.*,
sum(currqty) over (order by datecreated) as totalqty
from t
) t;

skip consecutive rows after specific value

Note: I have a working query, but am looking for optimisations to use it on large tables.
Suppose I have a table like this:
id session_id value
1 5 7
2 5 1
3 5 1
4 5 12
5 5 1
6 5 1
7 5 1
8 6 7
9 6 1
10 6 3
11 6 1
12 7 7
13 8 1
14 8 2
15 8 3
I want the id's of all rows with value 1 with one exception:
skip groups with value 1 that directly follow a value 7 within the same session_id.
Basically I would look for groups of value 1 that directly follow a value 7, limited by the session_id, and ignore those groups. I then show all the remaining value 1 rows.
The desired output showing the id's:
5
6
7
11
13
I took some inspiration from this post and ended up with this code:
create table #req_data (
id int primary key identity,
session_id int,
value int
)
insert into #req_data(session_id, value) values (5, 7)
insert into #req_data(session_id, value) values (5, 1) -- preceded by value 7 in same session, should be ignored
insert into #req_data(session_id, value) values (5, 1) -- ignore this one too
insert into #req_data(session_id, value) values (5, 12)
insert into #req_data(session_id, value) values (5, 1) -- preceded by value != 7, show this
insert into #req_data(session_id, value) values (5, 1) -- show this too
insert into #req_data(session_id, value) values (5, 1) -- show this too
insert into #req_data(session_id, value) values (6, 7)
insert into #req_data(session_id, value) values (6, 1) -- preceded by value 7 in same session, should be ignored
insert into #req_data(session_id, value) values (6, 3)
insert into #req_data(session_id, value) values (6, 1) -- preceded by value != 7, show this
insert into #req_data(session_id, value) values (7, 7)
insert into #req_data(session_id, value) values (8, 1) -- new session_id, show this
insert into #req_data(session_id, value) values (8, 2)
insert into #req_data(session_id, value) values (8, 3)
select id
from (
select session_id, id, max(skip) over (partition by grp) as 'skip'
from (
select tWithGroups.*,
( row_number() over (partition by session_id order by id) - row_number() over (partition by value order by id) ) as grp
from (
select session_id, id, value,
case
when lag(value) over (partition by session_id order by id) = 7
then 1
else 0
end as 'skip'
from #req_data
) as tWithGroups
) as tWithSkipField
where tWithSkipField.value = 1
) as tYetAnotherOutput
where skip != 1
order by id
This gives the desired result, but with 4 select blocks I think it's way too inefficient to use on large tables.
Is there a cleaner, faster way to do this?
The following should work well for this.
WITH
cte_ControlValue AS (
SELECT
rd.id, rd.session_id, rd.value,
ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
FROM
#req_data rd
CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)
)
SELECT
cv.id, cv.session_id, cv.value
FROM
cte_ControlValue cv
WHERE
cv.value = 1
AND cv.ControlValue <> 7;
Results...
id session_id value
----------- ----------- -----------
5 5 1
6 5 1
7 5 1
11 6 1
13 8 1
Edit: How and why it works...
The basic premise is taken from Itzik Ben-Gan's "The Last non NULL Puzzle".
Essentially, we are relying on 2 different behaviors that most people don't usually think about...
1) NULL + anything = NULL.
2) You can CAST or CONVERT an INT into a fixed length BINARY data type and it will continue to sort as an INT (as opposed to sorting like a text string).
This is easier to see when the intermediate steps are added to the query in the CTE...
SELECT
rd.id, rd.session_id, rd.value,
bv.BinVal,
SmearedBinVal = MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id),
SecondHalfAsINT = CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT),
ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
FROM
#req_data rd
CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)
Results...
id session_id value BinVal SmearedBinVal SecondHalfAsINT ControlValue
----------- ----------- ----------- ------------------ ------------------ --------------- ------------
1 5 7 0x0000000100000007 0x0000000100000007 7 7
2 5 1 NULL 0x0000000100000007 7 7
3 5 1 NULL 0x0000000100000007 7 7
4 5 12 0x000000040000000C 0x000000040000000C 12 12
5 5 1 NULL 0x000000040000000C 12 12
6 5 1 NULL 0x000000040000000C 12 12
7 5 1 NULL 0x000000040000000C 12 12
8 6 7 0x0000000800000007 0x0000000800000007 7 7
9 6 1 NULL 0x0000000800000007 7 7
10 6 3 0x0000000A00000003 0x0000000A00000003 3 3
11 6 1 NULL 0x0000000A00000003 3 3
12 7 7 0x0000000C00000007 0x0000000C00000007 7 7
13 8 1 NULL NULL NULL 999
14 8 2 0x0000000E00000002 0x0000000E00000002 2 2
15 8 3 0x0000000F00000003 0x0000000F00000003 3 3
Looking at the BinVal column, we see an 8 byte hex value for all rows where [value] <> 1 and NULLs where [value] = 1... The 1st 4 bytes are the id (used for ordering) and the 2nd 4 bytes are [value] (used to set the "previous non-1 value", or to set the whole thing to NULL).
The 2nd step is to "smear" the non-NULL values into the NULLs using the window framed MAX function, partitioned by session_id and ordered by id.
The 3rd step is to parse out the last 4 bytes and convert them back to an INT data type (SecondHalfAsINT) and deal with any nulls that result from not having any non-1 preceding value (ControlValue).
Since we can't reference a windowed function in the WHERE clause, we have to throw the query into a CTE (a derived table would work just as well) so that we can use the new ControlValue in the where clause.
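The three steps can also be followed outside the database. The sketch below mimics them in plain Python under stated assumptions: struct.pack with big-endian 4-byte integers stands in for CAST(... AS BINARY(4)), a running maximum stands in for the windowed MAX, and the function name control_values is hypothetical. It relies on the rows being sorted by session and id, and on ids being non-negative so that byte order matches numeric order.

```python
import struct
from itertools import groupby

# (id, session_id, value) rows from the question, already ordered by id.
rows = [
    (1, 5, 7), (2, 5, 1), (3, 5, 1), (4, 5, 12), (5, 5, 1), (6, 5, 1),
    (7, 5, 1), (8, 6, 7), (9, 6, 1), (10, 6, 3), (11, 6, 1), (12, 7, 7),
    (13, 8, 1), (14, 8, 2), (15, 8, 3),
]

def control_values(data):
    """Return (id, value, control) where control is the last non-1 value."""
    result = []
    for _, session_rows in groupby(data, key=lambda r: r[1]):
        smeared = None  # running MAX of the packed values in this session
        for row_id, _session_id, value in session_rows:
            # Step 1: pack id + value into 8 bytes, or None when value is 1.
            bin_val = None if value == 1 else struct.pack(">ii", row_id, value)
            # Step 2: "smear" the latest non-None packed value forward.
            if bin_val is not None and (smeared is None or bin_val > smeared):
                smeared = bin_val
            # Step 3: unpack the last 4 bytes, defaulting to 999.
            control = 999 if smeared is None else struct.unpack(">i", smeared[4:])[0]
            result.append((row_id, value, control))
    return result

kept = [row_id for row_id, value, control in control_values(rows)
        if value == 1 and control != 7]
```

Because the id occupies the leading bytes, the maximum packed value is always the one from the most recent non-1 row, which is why MAX behaves like "last non-NULL" here.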
SELECT CRow.id
FROM #req_data AS CRow
CROSS APPLY (SELECT MAX(id) AS id FROM #req_data PRev WHERE PRev.Id < CRow.id AND PRev.session_id = CRow.session_id AND PRev.value <> 1 ) MaxPRow
LEFT JOIN #req_data AS PRow ON MaxPRow.id = PRow.id
WHERE CRow.value = 1 AND ISNULL(PRow.value,1) <> 7
You can use the following query:
select id, session_id, value,
coalesce(sum(case when value <> 1 then 1 end)
over (partition by session_id order by id), 0) as grp
from #req_data
to get:
id session_id value grp
----------------------------
1 5 7 1
2 5 1 1
3 5 1 1
4 5 12 2
5 5 1 2
6 5 1 2
7 5 1 2
8 6 7 1
9 6 1 1
10 6 3 2
11 6 1 2
12 7 7 1
13 8 1 0
14 8 2 1
15 8 3 2
So, this query detects islands of consecutive 1 records that belong to the same group, as specified by the first preceding row with value <> 1.
You can use a window function once more to detect all 7 islands. If you wrap this in a second cte, then you can finally get the desired result by filtering out all 7 islands:
;with session_islands as (
select id, session_id, value,
coalesce(sum(case when value <> 1 then 1 end)
over (partition by session_id order by id), 0) as grp
from #req_data
), islands_with_7 as (
select id, grp, value,
count(case when value = 7 then 1 end)
over (partition by session_id, grp) as cnt_7
from session_islands
)
select id
from islands_with_7
where cnt_7 = 0 and value = 1
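For anyone who wants to try the island approach without SQL Server, here is a sketch that runs essentially the same two CTEs on SQLite (3.25 or later) through Python's sqlite3 module. The plain table name req_data and the explicit id values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE req_data (id INT, session_id INT, value INT)")
conn.executemany(
    "INSERT INTO req_data VALUES (?, ?, ?)",
    [(1, 5, 7), (2, 5, 1), (3, 5, 1), (4, 5, 12), (5, 5, 1), (6, 5, 1),
     (7, 5, 1), (8, 6, 7), (9, 6, 1), (10, 6, 3), (11, 6, 1), (12, 7, 7),
     (13, 8, 1), (14, 8, 2), (15, 8, 3)],
)

query = """
WITH session_islands AS (
    -- grp increases at every non-1 row, so trailing 1s share its number.
    SELECT id, session_id, value,
           COALESCE(SUM(CASE WHEN value <> 1 THEN 1 END)
                    OVER (PARTITION BY session_id ORDER BY id), 0) AS grp
    FROM req_data
), islands_with_7 AS (
    -- Count the 7s inside each island; only the leading row can be one.
    SELECT id, value,
           COUNT(CASE WHEN value = 7 THEN 1 END)
               OVER (PARTITION BY session_id, grp) AS cnt_7
    FROM session_islands
)
SELECT id
FROM islands_with_7
WHERE cnt_7 = 0 AND value = 1
ORDER BY id
"""
ids = [row[0] for row in conn.execute(query)]
```

Row 13 lands in island 0 of its session because no non-1 value precedes it, so it is kept.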

Oracle SQL query to group consecutive records

I've imported data ("Amount" and "Narration") from a spreadsheet into a table and need help with a query to group consecutive records according to their "Narration", for example:
Expected output:
line_no amount narration calc_group <-Not part of table
----------------------------------------
1 10 Reason 1 1
2 -10 Reason 1 1
3 5 Reason 2 2
4 5 Reason 2 2
5 -10 Reason 2 2
6 -8 Reason 1 3
7 8 Reason 1 3
8 11 Reason 1 3
9 99 Reason 3 4
10 -99 Reason 3 4
I've tried some analytical functions:
select line_no, amount, narration,
first_value (line_no) over
(partition by narration order by line_no) "calc_group"
from test
order by line_no
But that does not work because the Narration of line 6 to 8 is the same as line 1 and 2.
line_no amount narration calc_group
----------------------------------------
1 10 Reason 1 1
2 -10 Reason 1 1
3 5 Reason 2 3
4 5 Reason 2 3
5 -10 Reason 2 3
6 -8 Reason 1 1
7 8 Reason 1 1
8 11 Reason 1 1
9 99 Reason 3 4
10 -99 Reason 3 4
UPDATE
I've managed to do it using lag analytical function and sequences, not very elegant but it works. There should be a better way, please comment!
create or replace function get_next_test_seq
return number
as
begin
return test_seq.nextval;
end get_next_test_seq;
create or replace function get_curr_test_seq
return number
as
begin
return test_seq.currval;
end get_curr_test_seq;
update test
set group_no =
(with cte1
as (select line_no, amount, narration,
lag (narration) over (order by line_no) prev_narration, group_no
from test
order by line_no),
cte2
as (select line_no, amount, narration, group_no,
case when prev_narration is null or prev_narration <> narration then get_next_test_seq else get_curr_test_seq end new_group_no
from cte1)
select new_group_no
from cte2
where cte2.line_no = test.line_no);
UPDATE 2
I'm satisfied with the better accepted answer. Thanks kordiko!
Try this query:
SELECT line_no,
amount,
narration,
SUM( x ) OVER ( ORDER BY line_no
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as calc_group
FROM (
SELECT t.*,
CASE lag( narration ) OVER (order by line_no )
WHEN narration THEN 0
ELSE 1 END x
FROM test t
)
ORDER BY line_no
demo --> http://www.sqlfiddle.com/#!4/6d7aa/9
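The same change-flag plus running-sum pattern is portable to other engines. As a minimal sketch, assuming SQLite 3.25 or later via Python's sqlite3 module instead of Oracle, and a table test filled with the ten sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (line_no INT, amount INT, narration TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, 10, "Reason 1"), (2, -10, "Reason 1"), (3, 5, "Reason 2"),
     (4, 5, "Reason 2"), (5, -10, "Reason 2"), (6, -8, "Reason 1"),
     (7, 8, "Reason 1"), (8, 11, "Reason 1"), (9, 99, "Reason 3"),
     (10, -99, "Reason 3")],
)

query = """
SELECT line_no,
       -- Summing the change flags numbers the consecutive runs 1, 2, 3, ...
       SUM(x) OVER (ORDER BY line_no
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS calc_group
FROM (
    SELECT t.*,
           -- 1 when the narration differs from the previous line (or is first).
           CASE LAG(narration) OVER (ORDER BY line_no)
               WHEN narration THEN 0
               ELSE 1 END AS x
    FROM test t
) AS flagged
ORDER BY line_no
"""
calc_groups = [row[1] for row in conn.execute(query)]
```

The first row gets flag 1 because LAG returns NULL there and a simple CASE never matches NULL, so the numbering starts at 1 without special handling.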