Find the longest sequence of consecutive increasing numbers in SQL

For this example say I have a table with two fields, AREA varchar(30) and OrderNumber INT.
The table has the following data
AREA | OrderNumber
Fontana | 32
Fontana | 42
Fontana | 76
Fontana | 12
Fontana | 3
Fontana | 99
RC | 32
RC | 1
RC | 8
RC | 9
RC | 4
The result I would like to return is, for each area, the length of the longest run of increasing consecutive values. For Fontana it is 3 (32, 42, 76). For RC it is 2 (8, 9).
AREA | LongestLength
Fontana | 3
RC | 2
How would I do this on MS Sql 2005?
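For reference, a minimal setup script for the sample data (note that the answers below also order by an eventtime column that is not listed above, so it is assumed here with arbitrary increasing values):
CREATE TABLE Table1 (AREA VARCHAR(30), OrderNumber INT, eventtime DATETIME);
INSERT INTO Table1 (AREA, OrderNumber, eventtime)
SELECT 'Fontana', 32, '20080101' UNION ALL
SELECT 'Fontana', 42, '20080102' UNION ALL
SELECT 'Fontana', 76, '20080103' UNION ALL
SELECT 'Fontana', 12, '20080104' UNION ALL
SELECT 'Fontana', 3, '20080105' UNION ALL
SELECT 'Fontana', 99, '20080106' UNION ALL
SELECT 'RC', 32, '20080107' UNION ALL
SELECT 'RC', 1, '20080108' UNION ALL
SELECT 'RC', 8, '20080109' UNION ALL
SELECT 'RC', 9, '20080110' UNION ALL
SELECT 'RC', 4, '20080111';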

One way is to use a recursive CTE that steps over each row. If the row meets the criteria (increasing order number for the same area), you increase the chain length by one. If it doesn't, you start a new chain:
; with numbered as
(
select row_number() over (order by area, eventtime) rn
, *
from Table1
)
, recurse as
(
select rn
, area
, OrderNumber
, 1 as ChainLength
from numbered
where rn = 1
union all
select cur.rn
, cur.area
, cur.OrderNumber
, case
when cur.area = prev.area
and cur.OrderNumber > prev.OrderNumber
then prev.ChainLength + 1
else 1
end
from recurse prev
join numbered cur
on prev.rn + 1 = cur.rn
)
select area
, max(ChainLength)
from recurse
group by
area
Live example at SQL Fiddle.
An alternative way is to use a query that finds "breaks": rows that start a new run because they do not continue an increasing sequence of order numbers for the same area. The number of rows between two consecutive breaks is the chain length.
; with numbered as
(
select row_number() over (order by area, eventtime) rn
, *
from Table1 t1
)
-- Select rows that break an increasing chain
, breaks as
(
select row_number() over (order by cur.rn) rn2
, cur.rn
, cur.Area
from numbered cur
left join
numbered prev
on cur.rn = prev.rn + 1
where cur.OrderNumber <= prev.OrderNumber
or cur.Area <> prev.Area
or prev.Area is null
)
-- Add a final break after the last row
, breaks2 as
(
select *
from breaks
union all
select count(*) + 1
, max(rn) + 1
, null
from breaks
)
select series_start.area
, max(series_end.rn - series_start.rn)
from breaks2 series_start
join breaks2 series_end
on series_end.rn2 = series_start.rn2 + 1
group by
series_start.area
Live example at SQL Fiddle.

You do not explain why RC's longest sequence does not include 1 while Fontana's does include 32. I take it the 1 is excluded because it is a decrease: it comes after 32. Fontana's 32, however, is the very first item in its group, and there are two possible explanations for why it counts as an increase: either simply because it is the group's first item, or because it is also positive (i.e. treated as if it came after 0 and is therefore an increase).
For the purposes of this answer, I'm assuming the latter, i.e. a group's first item counts as an increase if it is positive. The script below implements the following idea:
1. Enumerate the rows in every AREA group in the order of the eventtime column (which you nearly forgot to mention).
2. Join the enumerated set to itself to link every row with its predecessor.
3. Get the sign of the difference between the row's value and its preceding value (defaulting the latter to 0). At this point the problem turns into a gaps-and-islands one.
4. Partition every AREA group by the signs determined in #3 and enumerate every subgroup's rows.
5. Find the difference between the row numbers from #1 and those found in #4. Together with AREA, that difference identifies individual streaks.
6. Finally, group the results by AREA, the sign from #3 and the difference from #5, count the rows, and take the maximum count per AREA.
I implemented the above like this:
WITH enumerated AS (
SELECT
*,
row = ROW_NUMBER() OVER (PARTITION BY AREA ORDER BY eventtime)
FROM atable
),
signed AS (
SELECT
this.eventtime,
this.AREA,
this.row,
sgn = SIGN(this.OrderNumber - COALESCE(last.OrderNumber, 0))
FROM enumerated AS this
LEFT JOIN enumerated AS last
ON this.AREA = last.AREA
AND this.row = last.row + 1
),
partitioned AS (
SELECT
AREA,
sgn,
grp = row - ROW_NUMBER() OVER (PARTITION BY AREA, sgn ORDER BY eventtime)
FROM signed
)
SELECT DISTINCT
AREA,
LongestIncSeq = MAX(COUNT(*)) OVER (PARTITION BY AREA)
FROM partitioned
WHERE sgn = 1
GROUP BY
AREA,
grp
;
A SQL Fiddle demo can be found here.

You can do some math with ROW_NUMBER() to figure out where you have consecutive items.
Here's the code sample:
;WITH rownums AS
(
SELECT [area],
ROW_NUMBER() OVER(PARTITION BY [area] ORDER BY [ordernumber]) AS rid1,
ROW_NUMBER() OVER(PARTITION BY [area] ORDER BY [eventtime]) AS rid2
FROM SomeTable
),
differences AS
(
SELECT [area],
[calc] = rid1 - rid2
FROM rownums
),
summation AS
(
SELECT [area], [calc], COUNT(*) AS lengths
FROM differences
GROUP BY [area], [calc]
)
SELECT summation.[area], MAX(lengths) AS LongestLength
FROM differences
JOIN summation
ON differences.[calc] = summation.[calc]
AND differences.[area] = summation.[area]
GROUP BY summation.[area]
So if I generate one set of row numbers ordered by ordernumber and another set ordered by event time, the difference between the two numbers stays constant for as long as the two orderings agree.
You can then count rows grouped by that difference and pull the largest count per area to get what you need.
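To make this concrete, here is the intermediate result for the Fontana rows, assuming their event-time order matches the listing in the question (rid1 ranks by ordernumber, rid2 by event time, calc = rid1 - rid2):
OrderNumber | rid1 | rid2 | calc
32 | 3 | 1 | 2
42 | 4 | 2 | 2
76 | 5 | 3 | 2
12 | 2 | 4 | -2
3 | 1 | 5 | -4
99 | 6 | 6 | 0
The three increasing rows 32, 42, 76 all share calc = 2, so that group counts 3 rows, which is the LongestLength reported for Fontana.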

Related

Split column value into separate columns based on length

I have multiple comma-separated values in one column, with a size of up to 20000 characters, and I want to split that column into several columns of at most 2000 characters each: the first new column takes up to 2000 characters, and if the value is longer than 2000 characters, the rest goes into the second column, and so on.
Each new column should still be meaningful, i.e. it should only break at a comma and hold at most 2000 characters.
So far I have only managed to split the row-level value into columns, but without respecting the 2000-character limit and the comma boundaries.
Could you please help me with this?
siddesh, although this question lacks just about everything, I want to point some things out and help you (as you are an inexperienced SO user):
First I set up a minimal reproducible example. This is on you the next time.
I'll start with a declared table with some rows inserted.
We on SO can copy and paste this into our environment, which makes it easy to answer.
DECLARE #tbl TABLE(ID INT IDENTITY, YourCSVString VARCHAR(MAX));
INSERT INTO #tbl VALUES('1 this is long text, 2 some second fragment, 3 third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf ')
,('1 this is other long text, 2 some second fragment to show that this works with tabular data, 3 again a third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf ');
--This is, what you actually need:
SELECT fkID = t.ID
,B.fragmentPosition
,B.fragmentContent
,C.framgentLength
FROM #tbl t
CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVString,N',','","'),'"]')) A
CROSS APPLY(VALUES(A.[key],TRIM(A.[value]))) B(fragmentPosition,fragmentContent)
CROSS APPLY(VALUES(LEN(B.fragmentContent))) C(framgentLength);
The result should be stored within a physical table, where the fkID points to the ID of the original row and the fragmentPosition stores the order. fkID and fragmentPosition should be a combined unique key.
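A minimal sketch of such a target table might look like this (the table and constraint names are made up):
CREATE TABLE dbo.CsvFragment
(
    fkID             INT            NOT NULL, -- ID of the original row
    fragmentPosition INT            NOT NULL, -- order of the fragment within the original value
    fragmentContent  NVARCHAR(2000)     NULL,
    CONSTRAINT UQ_CsvFragment UNIQUE (fkID, fragmentPosition)
);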
If you really want to do what you are suggesting in your question (not recommended!), you can try something along these lines:
DECLARE #maxPerColumn INT=75; --You can set the portion's max size, in your case 2000.
WITH cte AS
(
SELECT fkID = t.ID
,B.fragmentPosition
,B.fragmentContent
,C.framgentLength
FROM #tbl t
CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVString,N',','","'),'"]')) A
CROSS APPLY(VALUES(A.[key],TRIM(A.[value]))) B(fragmentPosition,fragmentContent)
CROSS APPLY(VALUES(LEN(B.fragmentContent))) C(framgentLength)
)
,recCTE AS
(
SELECT *
,countPerColumn = 1
,columnCounter = 1
,sumLength = LEN(fragmentContent)
,growingString = CAST(fragmentContent AS NVARCHAR(MAX))
FROM cte WHERE fragmentPosition=0
UNION ALL
SELECT r.fkID
,cte.fragmentPosition
,cte.fragmentContent
,cte.framgentLength
,CASE WHEN A.newSumLength>#maxPerColumn THEN 1 ELSE r.countPerColumn + 1 END
,r.columnCounter + CASE WHEN A.newSumLength>#maxPerColumn THEN 1 ELSE 0 END
,CASE WHEN A.newSumLength>#maxPerColumn THEN LEN(cte.fragmentContent) ELSE newSumLength END
,CASE WHEN A.newSumLength>#maxPerColumn THEN cte.fragmentContent ELSE CONCAT(r.growingString,N', ',cte.fragmentContent) END
FROM cte
INNER JOIN recCTE r ON r.fkID=cte.fkID AND r.fragmentPosition+1=cte.fragmentPosition
CROSS APPLY(VALUES(r.sumLength+LEN(cte.fragmentContent))) A(newSumLength)
)
SELECT TOP 1 WITH TIES
fkID
,growingString
,LEN(growingString)
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY fkID,columnCounter ORDER BY countPerColumn DESC );
The result
fkID pos Content
1 2 1 this is long text, 2 some second fragment, 3 third fragment
1 4 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf
2 0 1 this is other long text
2 1 2 some second fragment to show that this works with tabular data
2 3 3 again a third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k
2 4 5 halksjfh asdkf
The idea in short:
The first cte does the splitting (as above)
The recursive cte will iterate down the string and do the magic.
The final SELECT uses a hack with TOP 1 WITH TIES together with an ORDER BY ROW_NUMBER() OVER(...). This will return the highest intermediate result only.
Hint: Don't do this...
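To see the TOP 1 WITH TIES trick from the final SELECT in isolation, here is a minimal sketch with a made-up table: every row that ranks first within its partition gets ROW_NUMBER() = 1, so those rows tie on the ORDER BY expression and TOP 1 WITH TIES keeps exactly one row per group.
DECLARE @demo TABLE(grp INT, val INT);
INSERT INTO @demo VALUES (1, 10), (1, 20), (2, 5), (2, 7);

SELECT TOP 1 WITH TIES grp, val
FROM @demo
ORDER BY ROW_NUMBER() OVER (PARTITION BY grp ORDER BY val DESC);
-- Returns (1, 20) and (2, 7): the row with the highest val per grp.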
UPDATE
Just for fun:
You can replace the final SELECT with this
,getPortions AS
(
SELECT TOP 1 WITH TIES
fkID
,fragmentPosition
,growingString
,LEN(growingString) portionLength
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY fkID,columnCounter ORDER BY countPerColumn DESC )
)
SELECT p.*
FROM
(
SELECT fkID
,CONCAT(N'col',ROW_NUMBER() OVER(PARTITION BY fkID ORDER BY fragmentPosition)) AS ColumnName
,growingString
FROM getPortions
) t
PIVOT(MAX(growingString) FOR ColumnName IN(col1,col2,col3,col4,col5)) p;
This will return exactly what you are asking for.
But - as said before - this is against all rules of best practice...

Count Number SQL where not exist

Is there a way to get the numbers that do not exist in the table?
Example
product_number
1
2
3
5
I want only the number 4 as the result, because it's a free product number.
The problem is that CONNECT BY ROWNUM doesn't work; it runs out of memory.
You can use lead():
select coalesce(min(product_number), 0) + 1
from (select t.*, lead(product_number) over (order by product_number) as next_pn
from t
) t
where next_pn <> product_number + 1;
Oracle will use an index on (product_number) if one is available.
Something like this?
select r
from
(
  Select Rownum r -- Generate all numbers from 1 to max
  From dual
  Connect By Rownum <= (select max(product_number) from products)
)
where r not in
(
  select product_number from products
)
Since you didn't provide a sample of real product numbers, but claim that CONNECT BY runs out of memory, you probably have very large product numbers.
So the numbers that need to be checked should be restricted to the range known to potentially exist, namely between the minimum and maximum product numbers. Once those are known, we can generate that range and check it against the product numbers to see which numbers exist and, in this case, which don't. So:
with lh as
(select min(product_number) l
, max(product_number) h
from products
)
, range (pn) as
(select product_number pn
from products
where product_number = (select l from lh)
union all
select pn + 1
from range
where pn + 1 <= (select h from lh)
)
select pn available_product_number
from range
where not exists
( select null
from products
where pn = product_number
)
order by pn;

Find Range in Sequence

I have a table #NumberRange. It has a start and an end number. I have to find out which ranges are in sequence.
Declare #NumberRange table
(
Id int primary key,
ItemId int,
[start] int,
[end] int
)
INSERT INTO #NumberRange
VALUES
(1,1,1,10),
(2,1,11,20),
(3,1,21,30),
(4,1,40,50),
(5,1,51,60),
(6,1,61,70),
(7,1,80,90),
(8,1,100,200)
Expected Result:
Note: The Result column is calculated from continuity: ranges that are continuous, i.e. 1-10, 11-20, 21-30, all get Result = 1. Then 40-50 is not continuous (the previous row ends with 30 while the next row starts with 40), so its Result becomes 2, and so on.
Row 4 ends with 50 and row 5 starts with 51, so they are continuous; the Result there would be 3 because it has to be differentiated from Result 1...
I have used the LEAD function but did not get the expected result. Can someone please help me get it?
Workaround:
select
*,
[Diff] = [Lead] - [end],
[Result] = Rank() OVER (PARTITION BY ([Lead] - [end]) ORDER BY Id)
from
(select
id, [start], [end], LEAD([start]) over (order by id) as [Lead]
from
#NumberRange) Z
order by
id
Use lag() to determine where the groups start. Then a cumulative sum to enumerate them:
select nr.*,
sum(case when startr = prev_endr + 1 then 0 else 1 end) over (partition by itemid order by startr) as grp
from (select nr.*, lag(endr) over (partition by itemid order by startr) as prev_endr
from numberrange nr
) nr;
Here is a db<>fiddle.
This answer assumes that ids 4 and 5 are continuous, which makes sense based on the rest of the question.
Your expected result is not clear, and I have the same questions that were asked in the comments, but I think what you want to do is something similar to this:
select N1.*,
       case when N1.[end] + 1 = N2.[start] then 1 else 2 end as Result
from #NumberRange N1
inner join #NumberRange N2 on N1.Id = N2.Id - 1

SQL Implementation with Sliding-Window functions or Recursive CTEs

I have a problem that is very easy to solve in C# code, for example, but I have no idea how to write it as a SQL query with recursive CTEs or sliding-window functions.
Here is the situation: let's say I have a table with 3 columns (ID, Date, Amount), and here is some data:
ID Date Amount
-----------------------
1 01.01.2016 -500
2 01.02.2016 1000
3 01.03.2016 -200
4 01.04.2016 300
5 01.05.2016 500
6 01.06.2016 1000
7 01.07.2016 -100
8 01.08.2016 200
The result I want to get from the table is this (ID, Amount .... Order By Date):
ID Amount
-----------------------
2 300
4 300
5 500
6 900
8 200
The idea is to distribute the amounts into installments, for each client separately; the catch is that when a negative amount comes into play, you need to remove that amount from the last installment. I don't know how clear I am, so here is an example:
Let's say I have 3 invoices for one client with amounts 500, 200, -300.
If I start distributing these invoices, I first distribute the amount 500, then 200. But when I come to the third one, -300, I need to remove it from the last invoice. In other words, 200 - 300 = -100, so the amount from the second invoice disappears, but there is still -100 that needs to be subtracted from the first invoice. So 500 - 100 = 400. The result I need is one row (the first invoice with amount 400).
Another example is when the first invoice has a negative amount (-500, 300, 500).
In this case, the first invoice (-500) makes the second disappear, and another 200 is subtracted from the third. So the result is the third invoice with amount 300.
This is something like a stack implementation in a programming language, but I need to do it with sliding-window functions in SQL Server.
I need an implementation with sliding-window functions or recursive CTEs.
Not with cycles ...
Thanks.
OK, I think this is what you want. There are two recursive queries: one for the upward propagation and a second one for the downward propagation.
with your_data_rn as
(
select *, row_number() over (order by date) rn
from your_data
), rcte_up(id, date, ammount, rn, running) as
(
select *, ammount as running
from your_data_rn
union all
select d.*,
d.ammount + rcte_up.running
from your_data_rn d
join rcte_up on rcte_up.running < 0 and d.rn = rcte_up.rn - 1
), data2 as
(
select id, date, min(running) ammount,
row_number() over (order by date) rn
from rcte_up
group by id, date, rn
having min(running) > 0 or rn = 1
), rcte_down(id, date, ammount, rn, running) as
(
select *, ammount as running
from data2
union all
select d.*, d.ammount + rcte_down.running
from data2 d
join rcte_down on rcte_down.running < 0 and d.rn = rcte_down.rn + 1
)
select id, date, min(running) ammount
from rcte_down
group by id, date
having min(running) > 0
demo
I can imagine that you use just the upward propagation and do the downward propagation of the first row in some procedural language. The downward propagation is a single scan through the first few rows, so the recursive query may be a hammer on a mosquito.
I added a client ID to the table for a more general solution. Then I implemented a stack stored as XML in a query field and emulated a program loop with a recursive CTE:
with Data as( -- Numbering rows for iteration on CTE
select Client, id, Amount,
cast(row_number() over(partition by Client order by Date) as int) n
from TabW
),
CTE(Client, n, stack) as( -- Recursive CTE
select Client, 1, cast(NULL as xml) from Data where n=1
UNION ALL
select D.Client, D.n+1, (
-- Stack operations to process current row (D)
select row_number() over(order by n) n,
-- Use calculated amount in first positive and oldest stack cell
-- Else preserve value stored in stack
case when n=1 or (n=0 and last=1) then new else Amount end Amount,
-- Set ID in stack cell for positive and new data
case when n=1 and D.Amount>0 then D.id else id end id
from (
select Y.Amount, Y.id, new,
-- Count positive stack entries
sum(case when new<=0 or (n=0 and Amount<0) then 0 else 1 end) over (order by n) n,
row_number() over(order by n desc) last -- mark oldest stack cell by 1
from (
select X.*,sum(Amount) over(order by n) new
from (
select case when C.stack.value('(/row/#Amount)[1]','int')<0 then -1 else 0 end n,
D.Amount, D.id -- Data from new record
union all -- And expand current stack in XML to table
select node.value('#n','int') n, node.value('#Amount','int'), node.value('#id','int')
from C.stack.nodes('//row') N(node)
) X
) Y where n>=0 -- Suppress new cell if the stack contained a negative amount
) Z
where n>0 or (n=0 and last=1)
for xml raw, type
)
from Data D, CTE C
where D.n=C.n and D.Client=C.Client
) -- Expand stack into result table
select CTE.Client, node.value('#id','int') id, node.value('#Amount','int')
from CTE join (select Client, max(n) max_n from Data group by Client) X on CTE.Client=X.Client and CTE.n=X.max_n+1
cross apply stack.nodes('//row') N(node)
order by CTE.Client, node.value('#n','int') desc
Test on sqlfiddle.com
I think this method is slower than @RadimBača's. It is shown to demonstrate the possibility of implementing a sequential algorithm in SQL.

Joining next Sequential Row

I am planning an SQL statement right now and would need someone to look over my thoughts.
This is my Table:
id stat period
--- ------- --------
1 10 1/1/2008
2 25 2/1/2008
3 5 3/1/2008
4 15 4/1/2008
5 30 5/1/2008
6 9 6/1/2008
7 22 7/1/2008
8 29 8/1/2008
Create Table
CREATE TABLE tbstats
(
id INT IDENTITY(1, 1) PRIMARY KEY,
stat INT NOT NULL,
period DATETIME NOT NULL
)
go
INSERT INTO tbstats
(stat,period)
SELECT 10,CONVERT(DATETIME, '20080101')
UNION ALL
SELECT 25,CONVERT(DATETIME, '20080102')
UNION ALL
SELECT 5,CONVERT(DATETIME, '20080103')
UNION ALL
SELECT 15,CONVERT(DATETIME, '20080104')
UNION ALL
SELECT 30,CONVERT(DATETIME, '20080105')
UNION ALL
SELECT 9,CONVERT(DATETIME, '20080106')
UNION ALL
SELECT 22,CONVERT(DATETIME, '20080107')
UNION ALL
SELECT 29,CONVERT(DATETIME, '20080108')
go
I want to calculate the difference between each statistic and the next, and then calculate the mean value of the 'gaps.'
Thoughts:
I need to join each record with its subsequent row. I can do that using the ever-flexible joining syntax, thanks to the fact that I know the id field is an integer sequence with no gaps.
By aliasing the table I could incorporate it into the SQL query twice, then join them together in a staggered fashion by adding 1 to the id of the first aliased table. The first record in the table has an id of 1; 1 + 1 = 2, so it should join to the row with an id of 2 in the second aliased table. And so on.
Now I would simply subtract one from the other.
Then I would use the ABS function to ensure that I always get positive integers as a result of the subtraction regardless of which side of the expression is the higher figure.
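A minimal sketch of the self-join approach described above (assuming, as stated, that id is a gapless integer sequence):
SELECT t1.id,
       t2.id AS next_id,
       ABS(t2.stat - t1.stat) AS gap
FROM tbstats t1
JOIN tbstats t2
  ON t2.id = t1.id + 1;
-- Wrapping this in a subquery and taking AVG(CAST(gap AS FLOAT)) gives the mean gap.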
Is there an easier way to achieve what I want?
The lead analytic function should do the trick:
SELECT period, stat, stat - LEAD(stat) OVER (ORDER BY period) AS gap
FROM tbstats
The average value of the gaps (treating them as signed differences) can be calculated as the difference between the last value and the first value divided by one less than the number of elements, because the consecutive differences telescope: their sum is simply the last value minus the first.
select sum(case when seqnum = num then stat else - stat end) / (max(num) - 1)
from (select period, row_number() over (order by period) as seqnum,
count(*) over () as num
from tbstats
) t
where seqnum = num or seqnum = 1;
Of course, you can also do the calculation using lead(), but this will also work in SQL Server 2005 and 2008.
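For completeness, here is a sketch of the lead() variant (it requires SQL Server 2012 or later, where LEAD is available):
select avg(cast(next_stat - stat as float)) as mean_gap
from (select stat,
             lead(stat) over (order by period) as next_stat
      from tbstats
     ) t
where next_stat is not null;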
You can also achieve this by using a join:
SELECT t1.period,
t1.stat,
t1.stat - t2.stat gap
FROM #tbstats t1
LEFT JOIN #tbstats t2
ON t1.id + 1 = t2.id
To calculate the difference between each statistic and the next, LEAD() and LAG() may be the simplest option. You provide an ORDER BY, and LEAD(something) returns the next something and LAG(something) returns the previous something in the given order.
select
x.id thisStatId,
LAG(x.id) OVER (ORDER BY x.id) lastStatId,
x.stat thisStatValue,
LAG(x.stat) OVER (ORDER BY x.id) lastStatValue,
x.stat - LAG(x.stat) OVER (ORDER BY x.id) diff
from tbStats x
