How can I get the maximum sequential in number range? - sql

I have the specific result below in a select:
1 2
1 3
1 5
1 6
1 9
1 10
1 11
1 13
1 14
1 16
1 18
1 20
1 23
1 24
1 25
What I want to find is the longest increasing-by-one chain that occurs in the results.
For example, I know that 3 is the maximum length sequence in this number range, coming from the last 3 results (23,24,25 being 3 in a row).

A sequence will have the property that the difference between the number and a sequential ordering will be constant. In most dialects of SQL, you have a function called row_number(), which assigns sequential numbers.
We can use this observation to solve your problem:
select (num - seqnum), count(*) as NumInSequence
from (select t.*, row_number() over (order by num) as seqnum
from t
) t
group by (num - seqnum)
This gives every sequence. To get the max, either use max() with a subquery or some version of limit/top. In SQL Server, for instance, you can do:
select top 1 count(*) as NumInSequence
from (select t.*, row_number() over (order by num) as seqnum
from t
) t
group by (num - seqnum)
order by NumInSequence desc
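For anyone who wants to trace the logic outside the database, here is a minimal Python sketch of the same difference trick. It is only an illustration: `longest_run` is a made-up helper name, and it assumes the values are integers (duplicates are dropped first, which the SQL above does not do).

```python
from itertools import groupby


def longest_run(nums):
    """Length of the longest run of consecutive integers in nums."""
    ordered = sorted(set(nums))
    # Pair each value with (value - position); position plays the role of row_number().
    keyed = [(num - seqnum, num) for seqnum, num in enumerate(ordered, start=1)]
    # Values in the same run share the same difference, so group on it.
    run_lengths = [len(list(group)) for _, group in groupby(keyed, key=lambda p: p[0])]
    return max(run_lengths, default=0)


nums = [2, 3, 5, 6, 9, 10, 11, 13, 14, 16, 18, 20, 23, 24, 25]
print(longest_run(nums))  # 3
```

The difference never decreases as the sorted values grow, so grouping adjacent equal differences is enough here.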

Using this article as the main query:
http://www.xaprb.com/blog/2006/03/22/find-contiguous-ranges-with-sql/
Just add a column that calculates the difference (plus one, because both endpoints count) and select the MAX().
SELECT MAX(seq.end - seq.start + 1)
FROM (
select l.id as start,
(
select min(a.id) as id
from sequence as a
left outer join sequence as b on a.id = b.id - 1
where b.id is null
and a.id >= l.id
) as end
from sequence as l
left outer join sequence as r on r.id = l.id - 1
where r.id is null
) AS seq
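A rough Python sketch of what that query computes, in case it helps to see the idea without the self-joins. The name `contiguous_ranges` is hypothetical, and the sketch assumes integer ids: a range starts at a value with no predecessor and ends at a value with no successor.

```python
def contiguous_ranges(ids):
    """Return (start, end) pairs for every run of consecutive integers."""
    present = set(ids)
    # A start has no id - 1 in the set; an end has no id + 1 in the set.
    starts = sorted(i for i in present if i - 1 not in present)
    ends = sorted(i for i in present if i + 1 not in present)
    # The k-th start always pairs with the k-th end.
    return list(zip(starts, ends))


ids = [2, 3, 5, 6, 9, 10, 11, 13, 14, 16, 18, 20, 23, 24, 25]
ranges = contiguous_ranges(ids)
# end - start is one less than the number of values in the range.
longest = max(end - start + 1 for start, end in ranges)
print(longest)  # 3
```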

@Gordon gave a brilliant and more terse answer. However, I think a recursive implementation may be useful as well. Here's a very useful article on recursive CTEs: http://msdn.microsoft.com/en-us/library/ms186243(v=sql.105).aspx
-- This first CTE is unnecessary because you presumably already have
-- your data. But I wanted to include it to make it easier to test.
WITH myNumbers AS (
SELECT *
FROM (
VALUES
(2),
(3),
(5),
(6),
(9),
(10),
(11),
(13),
(14),
(16),
(18),
(20),
(23),
(24),
(25)
) AS x (num)
),
-- To get my sequences I recurse until there is no num + 1 in my set
mySequences AS (
-- Anchor member definition: Create the first invocation
SELECT v.num, 0 AS iteration, v.num AS previous, v.num AS start
FROM myNumbers v
UNION ALL
-- Recursive member definition: Recurse until value + 1 does not exist
SELECT s.num + 1, s.iteration + 1 AS iteration, s.num AS previous, s.start
FROM mySequences s -- Notice that we can reference the CTE within itself
JOIN myNumbers v
ON v.num = s.num + 1
)
-- I must increment by 1 because I chose to start my recursion at 0
SELECT MAX(iteration + 1)
FROM mySequences
That recursive query is similar to writing
public int GetSequenceLength(int start, int iteration, int[] myNumbers)
{
if (myNumbers.Contains(start + 1))
{
return GetSequenceLength(start + 1, iteration + 1, myNumbers);
}
return iteration;
}
foreach (var myNumber in myNumbers)
{
var sequenceLength = GetSequenceLength(myNumber, 0, myNumbers) + 1;
Console.WriteLine(myNumber + " : " + sequenceLength);
}

Related

Find missed max and min value in a sequence of numbers

For example, I have a sequence of numbers: {1, 2, 5, 7}.
I need to find the smallest and the biggest one, which are missed in this sequence (min=3 and max=6 for this example). Values can also be negative.
Here is my solution, but it doesn't pass on extra checking database (Wrong number of records (less by 1)), so I can't say what is exactly wrong. I also tried versions with LEFT OUTER JOIN and EXCEPT predicates - same problem. Please, help me to improve my solution.
WITH AA AS (SELECT MAX(Q_ID) MX
FROM UTQ),
BB AS (SELECT MIN(Q_ID) CODE
FROM UTQ
UNION ALL
SELECT CODE + 1
FROM BB
WHERE CODE < (SELECT MX
FROM AA)
)
SELECT MIN(CODE) MIN_RES, MAX(CODE) MAX_RES
FROM BB
WHERE CODE NOT IN (SELECT Q_ID
FROM UTQ)
One method is not exists:
select min(q_id + 1)
from utq
where not exists (select 1 from utq utq2 where utq2.q_id = utq.q_id + 1)
union all
select max(q_id - 1)
from utq
where not exists (select 1 from utq utq2 where utq2.q_id = utq.q_id - 1);
You can also use lead() and lag():
select min(case when next_q_id <> q_id + 1 then q_id + 1 end),
max(case when prev_q_id <> q_id - 1 then q_id - 1 end)
from (select utq.*,
lag(q_id) over (order by q_id) as prev_q_id,
lead(q_id) over (order by q_id) as next_q_id
from utq
) utq;
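The neighbour comparison behind the lead()/lag() version can be sketched in a few lines of Python. This is only an illustration with a hypothetical helper name (`min_max_missing`), and it assumes integer ids.

```python
def min_max_missing(ids):
    """Return (smallest, largest) integer missing between min(ids) and max(ids), or None."""
    ordered = sorted(set(ids))
    # Keep neighbouring pairs with a gap between them (the lead()/lag() comparison).
    gaps = [(low, high) for low, high in zip(ordered, ordered[1:]) if high - low > 1]
    if not gaps:
        return None
    # Smallest missing value follows the first gap's lower bound;
    # largest missing value precedes the last gap's upper bound.
    return gaps[0][0] + 1, gaps[-1][1] - 1


print(min_max_missing([1, 2, 5, 7]))  # (3, 6)
```

Negative values need no special handling, since only the differences between neighbours matter.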
A tally-based method seems like a good approach here, especially if the sequences are large.
The first CTE summarizes the maximum and minimum q_id's in the test table. The second CTE selects the missing integers by generating the complete sequence (using the fnNumbers tvf) between the minimum and maximum q_id values and comparing WHERE NOT EXISTS to the original sequence. Something like this.
numbers function
create function [dbo].[fnNumbers](
@zero_or_one bit,
@n bigint)
returns table with schemabinding as return
with n(n) as (select null from (values (1),(2),(3),(4)) n(n))
select 0 n where @zero_or_one = 0
union all
select top(@n) row_number() over(order by (select null)) n
from n na, n nb, n nc, n nd, n ne, n nf, n ng, n nh,
n ni, n nj, n nk, n nl, n nm, n np, n nq, n nr;
data and query
drop table if exists #seq;
go
create table #seq(
q_id int unique not null);
insert #seq values (1),(2),(5),(7);
with
max_min_cte(max_q, min_q) as (
select max(q_id), min(q_id)
from #seq),
missing_cte(q_id) as (
select mm.min_q+fn.n
from max_min_cte mm
cross apply dbo.fnNumbers(0, mm.max_q-mm.min_q) fn
where not exists (select 1
from #seq s
where (mm.min_q+fn.n)=s.q_id))
select max(q_id) max_missing, min(q_id) min_missing
from missing_cte;
output
max_missing min_missing
6 3
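The same tally idea, sketched in Python for comparison: generate every integer between the minimum and maximum, then keep the ones that are not in the original set. The function name `missing_by_tally` is made up for this example.

```python
def missing_by_tally(ids):
    """List every integer between min(ids) and max(ids) that is absent from ids."""
    present = set(ids)
    if not present:
        return []
    # range() plays the role of the numbers function; the membership test is the NOT EXISTS.
    return [n for n in range(min(present), max(present) + 1) if n not in present]


missing = missing_by_tally([1, 2, 5, 7])
print(max(missing), min(missing))  # 6 3
```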
You can try like following using LEAD
SELECT MIN(Q_ID + 1) AS MinValue
,MAX(NQ_ID - 1) AS MaxValue
FROM (
SELECT *,LEAD(Q_ID) OVER (ORDER BY Q_ID) NQ_ID
FROM (VALUES (1),(2),(5),(7)) v(Q_ID)
) t
WHERE NQ_ID - Q_ID <> 1

SQL counting the number of ones in sequence

I have the following table and, as you can see, the ids are not the same, so I can't use GROUP BY. I need to count all the ones that are in sequence, like from id 9 to 13 and from id 20 to 23. How do I do it?
Here's a solution with LAG and LEAD.
;WITH StackValues AS
(
SELECT
T.*,
PreviousStatus = LAG(T.Status, 1, 0) OVER (ORDER BY T.ID ASC),
NextStatus = LEAD(T.Status, 1, 0) OVER (ORDER BY T.ID ASC)
FROM
#YourTable AS T
),
ValuesToSum AS
(
SELECT
L.*,
ValueToSum = CASE
WHEN L.Status = 1 AND L.PreviousStatus = 1 AND L.NextStatus = 0 THEN 1
ELSE 0 END
FROM
StackValues AS L
)
SELECT
Total = SUM(V.ValueToSum)
FROM
ValuesToSum AS V
LAG gives you the value from N rows back (N = 1 in this example), while LEAD gives you the value from N rows ahead (N = 1 in this example). The query generates another column (ValueToSum) based on the previous and next values and sums its result.
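To make the CASE condition concrete, here is a small Python sketch that applies the same rule to a list of statuses ordered by id. The sample data is invented (the original table is not shown), and `count_runs_of_ones` is a hypothetical name. As written, the rule flags the last row of each run of ones that has at least two rows, so a lone 1 is not counted.

```python
def count_runs_of_ones(statuses):
    """Count runs of 1s of length >= 2, using previous/next values like LAG/LEAD."""
    total = 0
    for i, status in enumerate(statuses):
        # Default to 0 at the edges, matching LAG(..., 1, 0) and LEAD(..., 1, 0).
        previous = statuses[i - 1] if i > 0 else 0
        following = statuses[i + 1] if i + 1 < len(statuses) else 0
        if status == 1 and previous == 1 and following == 0:
            total += 1
    return total


# Invented sample: one run of three, a lone 1, and one run of two.
print(count_runs_of_ones([0, 1, 1, 1, 0, 1, 0, 1, 1]))  # 2
```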

Alphanumeric sort on nvarchar(50) column

I am trying to write a query that will return data sorted by an alphanumeric column, Code.
Below is my query:
SELECT *
FROM <<TableName>>
CROSS APPLY (SELECT PATINDEX('[A-Z, a-z][0-9]%', [Code]),
CHARINDEX('', [Code]) ) ca(PatPos, SpacePos)
CROSS APPLY (SELECT CONVERT(INTEGER, CASE WHEN ca.PatPos = 1 THEN
SUBSTRING([Code], 2,ISNULL(NULLIF(ca.SpacePos,0)-2, 8000)) ELSE NULL END),
CASE WHEN ca.PatPos = 1 THEN LEFT([Code],
ISNULL(NULLIF(ca.SpacePos,0)-0,1)) ELSE [Code] END) ca2(OrderBy2, OrderBy1)
WHERE [TypeID] = '1'
OUTPUT:
FFS1
FFS2
...
FFS12
FFS1.1
FFS1.2
...
FFS1.1E
FFS1.1R
...
FFS12.1
FFS12.2
FFS.12.1E
FFS12.1R
FFS12.2E
FFS12.2R
DESIRED OUTPUT:
FFS1
FFS1.1
FFS1.1E
FFS1.1R
....
FFS12
FFS12.1
FFS12.1E
FFS12.1R
What am I missing or overlooking?
EDIT:
Let me try to detail the table contents a little better. There are records for FFS1 - FFS12. Those are broken into X subs, i.e., FFS1.1 - FFS1.X to FFS12.1 - FFS12.X. The E and the R was not a typo, each sub record has two codes associated with it: FFS1.1E & FFS1.1R.
Additionally I tried using ORDER BY but it sorted as
FFS1
...
FFS10
FFS2
This will work for any count of parts separated by dots. The sorting is alphanumerical for each part separately.
DECLARE @YourValues TABLE(ID INT IDENTITY, SomeVal VARCHAR(100));
INSERT INTO @YourValues VALUES
('FFS1')
,('FFS2')
,('FFS12')
,('FFS1.1')
,('FFS1.2')
,('FFS1.1E')
,('FFS1.1R')
,('FFS12.1')
,('FFS12.2')
,('FFS.12.1E')
,('FFS12.1R')
,('FFS12.2E')
,('FFS12.2R');
--The query
WITH Splittable AS
(
SELECT ID
,SomeVal
,CAST(N'<x>' + REPLACE(SomeVal,'.','</x><x>') + N'</x>' AS XML) AS Casted
FROM @YourValues
)
,Parted AS
(
SELECT Splittable.*
,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS PartNmbr
,A.part.value(N'text()[1]','nvarchar(max)') AS Part
FROM Splittable
CROSS APPLY Splittable.Casted.nodes(N'/x') AS A(part)
)
,AddSortCrit AS
(
SELECT ID
,SomeVal
,(SELECT LEFT(x.Part + REPLICATE(' ',10),10) AS [*]
FROM Parted AS x
WHERE x.ID=Parted.ID
ORDER BY PartNmbr
FOR XML PATH('')
) AS SortColumn
FROM Parted
GROUP BY ID,SomeVal
)
SELECT ID
,SomeVal
FROM AddSortCrit
ORDER BY SortColumn;
The result
ID SomeVal
10 FFS.12.1E
1 FFS1
4 FFS1.1
6 FFS1.1E
7 FFS1.1R
5 FFS1.2
3 FFS12
8 FFS12.1
11 FFS12.1R
9 FFS12.2
12 FFS12.2E
13 FFS12.2R
2 FFS2
Some explanation:
The first CTE transforms your codes to XML, which allows each part to be addressed separately.
The second CTE returns each part together with a number.
The third CTE re-concatenates your code, but each part is padded to a length of 10 characters.
The final SELECT uses this new single-string-per-row in the ORDER BY.
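The split, pad and re-concatenate steps can be sketched in Python as follows. This is an illustration only: `sort_key` is a hypothetical helper, and Python compares strings by code point while SQL Server uses the column's collation, so ordering could differ for other data.

```python
def sort_key(code, width=10):
    """Split on dots and pad (or cut) each part to a fixed width, then re-join."""
    # Mirrors LEFT(Part + REPLICATE(' ', 10), 10) for each part.
    return "".join((part + " " * width)[:width] for part in code.split("."))


codes = [
    "FFS1", "FFS2", "FFS12", "FFS1.1", "FFS1.2", "FFS1.1E", "FFS1.1R",
    "FFS12.1", "FFS12.2", "FFS.12.1E", "FFS12.1R", "FFS12.2E", "FFS12.2R",
]
ordered = sorted(codes, key=sort_key)
for code in ordered:
    print(code)
```

Because every part occupies the same width, a shorter part such as `FFS1` sorts before a longer one such as `FFS12`, and the comparison then moves on to the next part.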
Final hint:
This design is bad! You should not store these values in concatenated strings... Store them in separate columns and fiddle them together just for the output/presentation layer. Doing so avoids this rather ugly fiddle...

Find overlapping range in PL/SQL

Sample data below
id start end
a 1 3
a 5 6
a 8 9
b 2 4
b 6 7
b 9 10
c 2 4
c 6 7
c 9 10
I'm trying to come up with a query that will return all the overlap start-end inclusive between a, b, and c (but extendable to more). So the expected data will look like the following
start end
2 3
6 6
9 9
The only way I can picture this is with a custom aggregate function that tracks the current valid intervals and then computes the new intervals during the iterate phase. However, I can't see this approach being practical when working with large datasets. So if some bright mind out there has a query or some built-in function that I'm not aware of, I would greatly appreciate the help.
You can do this using aggregation and a join. Assuming no internal overlaps for "a" and "b":
select greatest(ta.start, tb.start) as start,
least(ta.end, tb.end) as end
from t ta join
t tb
on ta.start <= tb.end and ta.end >= tb.start and
ta.id = 'a' and tb.id = 'b';
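As a sanity check of the join condition, here is a short Python sketch of the same pairwise test for two ids. The name `pairwise_overlaps` is hypothetical, and the ranges are inclusive integer pairs taken from the sample data.

```python
def pairwise_overlaps(a_ranges, b_ranges):
    """Intersect every range of a with every range of b (inclusive bounds)."""
    return sorted(
        (max(a_start, b_start), min(a_end, b_end))   # greatest(...) / least(...)
        for a_start, a_end in a_ranges
        for b_start, b_end in b_ranges
        if a_start <= b_end and a_end >= b_start      # the overlap test from the ON clause
    )


a = [(1, 3), (5, 6), (8, 9)]
b = [(2, 4), (6, 7), (9, 10)]
print(pairwise_overlaps(a, b))  # [(2, 3), (6, 6), (9, 9)]
```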
This is a lot uglier and more complex than Gordon's solution, but I think it gives the expected answer better and should extend to work with more ids:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
),
SEQS(N,START_RANK,END_RANK) AS (
SELECT N,
CASE WHEN IS_START=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_START ORDER BY N) ELSE 0 END START_RANK, --ASSIGN A RANK TO EACH RANGE START
CASE WHEN IS_END=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_END ORDER BY N) ELSE 0 END END_RANK --ASSIGN A RANK TO EACH RANGE END
FROM (
SELECT N,
CASE WHEN NVL(LAG(N) OVER (ORDER BY N),N) + 1 <> N THEN 1 ELSE 0 END IS_START, --MARK N AS A RANGE START
CASE WHEN NVL(LEAD(N) OVER (ORDER BY N),N) -1 <> N THEN 1 ELSE 0 END IS_END /* MARK N AS A RANGE END */
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
) WHERE IS_START + IS_END > 0
)
SELECT STARTS.N "START",ENDS.N "END" FROM SEQS STARTS
JOIN SEQS ENDS ON (STARTS.START_RANK=ENDS.END_RANK AND STARTS.N <= ENDS.N) ORDER BY "START"; --MATCH CORRESPONDING RANGE START/END VALUES
First we generate all the numbers between the smallest start value and the largest end value.
Then we find the numbers that are included in all the provided "id" ranges by joining our generated numbers to the ranges, and selecting each number "n" that appears once for each "id".
Then we determine whether each of these values "n" starts or ends a range. To determine that, for each N we say:
If the previous value of N does not exist or is not 1 less than current N, current N starts a range. If the next value of N does not exist or is not 1 greater than current N, current N ends a range.
Next, we assign a "rank" to each start and end value so we can match them up.
Finally, we self-join where the ranks match (and where the start <= the end) to get our result.
EDIT: After some searching, I came across this question which shows a better way to find the start/ends and refactored the query to:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
)
SELECT MIN(N) "START",MAX(N) "END" FROM (
SELECT N,ROW_NUMBER() OVER (ORDER BY N)-N GRP_ID
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
)
GROUP BY GRP_ID ORDER BY "START";
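For comparison, the same steps can be sketched in Python for any number of ids. This is an illustrative sketch, not a translation of the query: `common_ranges` is a made-up name, it assumes small integer ranges, and it collects the covered numbers per id in sets rather than counting joined rows.

```python
from itertools import groupby


def common_ranges(rows):
    """rows: iterable of (id, start, end) with inclusive integer bounds."""
    covered = {}
    for key, start, end in rows:
        covered.setdefault(key, set()).update(range(start, end + 1))
    if not covered:
        return []
    # Numbers that fall inside some range of every id.
    shared = sorted(set.intersection(*covered.values()))
    result = []
    # Consecutive numbers share the same (value - position), as with ROW_NUMBER() - N.
    for _, group in groupby(enumerate(shared), key=lambda p: p[1] - p[0]):
        members = [n for _, n in group]
        result.append((members[0], members[-1]))
    return result


rows = [
    ("a", 1, 3), ("a", 5, 6), ("a", 8, 9),
    ("b", 2, 4), ("b", 6, 7), ("b", 9, 10),
    ("c", 2, 4), ("c", 6, 7), ("c", 9, 10),
]
print(common_ranges(rows))  # [(2, 3), (6, 6), (9, 9)]
```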

Select Random Numbers from a list

This is my query.
SELECT TOP 2 NUM
FROM QT_PIVOT
WHERE NUM BETWEEN 1 AND 45
ORDER BY NEWID()
I'm selecting 2 random numbers from a list, but I don't want these numbers to be consecutive.
Sometimes the result is
NUM
----
2
3
And I don't want this
Thanks , and sorry for my English u.u
Basically the same as the second approach Gordon uses, except that it does not use the lag function and therefore will work on SQL Server 2008.
WITH Data AS(
SELECT *, RowNum = ROW_NUMBER() OVER (ORDER BY NEWID())
FROM sys.objects AS O
),
r AS(
SELECT TOP 1 *, SkipRow = 0
FROM Data
WHERE Data.RowNum = 1
UNION ALL
SELECT d.*, SkipRow = CASE WHEN d.object_id BETWEEN r.object_id -2 AND r.object_id + 2 THEN 1 ELSE 0 END
FROM r
JOIN Data AS D
ON r.RowNum + 1 = D.RowNum
)
SELECT TOP 2 * FROM R
WHERE R.SkipRow = 0
One approach is to select the first number, and then select an appropriate second number:
WITH r AS (
SELECT TOP 1 num
FROM QT_PIVOT
WHERE NUM BETWEEN 1 AND 45
ORDER BY NEWId()
)
select num
from r
union all
select top 1 q.num
from qt_pivot q join
r
on q.num not in (r.num, r.num - 1, r.num + 1)
where q.num between 1 and 45
order by newid();
Another approach (if you had SQL Server 2012+) would use lag() to remove any possibilities that do not meet the conditions:
WITH r AS (
SELECT num, row_number() over (order by newid()) as seqnum
FROM QT_PIVOT
WHERE NUM BETWEEN 1 AND 45
)
SELECT r.num
FROM (SELECT r.*, LAG(num) OVER (ORDER BY seqnum) as prevnum
FROM r
) r
WHERE prevnum is null or
prevnum not in (num - 1, num + 1);
EDIT:
The first approach doesn't work, because SQL Server always re-evaluates CTEs, and there is not even a hint to fix this problem. Here is an alternative approach, that will ensure that values are not consecutive:
WITH r as (
SELECT (1 + abs(checksum(newid())) % 45) as r1,
(2 + abs(checksum(newid())) % 42) as r2
)
SELECT q.num
FROM QT_PIVOT q cross join r
WHERE q.num = r.r1 or
q.num = 1 + (r.r1 - 1 + r.r2) % 45;
This calculates two random numbers. The first is a random position from 1 to 45. The second is an allowable offset from 2 to 43 (hence the "2" and "42"), applied with wrap-around, to guarantee that the numbers are not adjacent.
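The position-plus-offset idea is easy to check outside SQL. Below is a minimal Python sketch under stated assumptions: the numbers are 1 to 45, the range is treated as circular (so 1 and 45 also count as adjacent), and the offset is drawn from 2 to n - 2. The function name `two_non_adjacent` is hypothetical.

```python
import random


def two_non_adjacent(n=45, rng=random):
    """Pick two numbers in 1..n that are neither equal nor adjacent."""
    first = rng.randint(1, n)
    # Offsets 0, 1 and n - 1 would land on the same or a neighbouring number.
    offset = rng.randint(2, n - 2)
    second = 1 + (first - 1 + offset) % n
    return first, second


first, second = two_non_adjacent()
print(first, second)
```

Treating the range as circular keeps the arithmetic simple, at the cost of never returning the pair 1 and 45.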