Find missed max and min value in a sequence of numbers - sql

For example, I have a sequence of numbers: {1, 2, 5, 7}.
I need to find the smallest and the biggest one, which are missed in this sequence (min=3 and max=6 for this example). Values can also be negative.
Here is my solution, but it doesn't pass on extra checking database (Wrong number of records (less by 1)), so I can't say what is exactly wrong. I also tried versions with LEFT OUTER JOIN and EXCEPT predicates - same problem. Please, help me to improve my solution.
WITH AA AS (SELECT MAX(Q_ID) MX
FROM UTQ),
BB AS (SELECT MIN(Q_ID) CODE
FROM UTQ
UNION ALL
SELECT CODE + 1
FROM BB
WHERE CODE < (SELECT MX
FROM AA)
)
SELECT MIN(CODE) MIN_RES, MAX(CODE) MAX_RES
FROM BB
WHERE CODE NOT IN (SELECT Q_ID
FROM UTQ)

One method is not exists:
select min(q_id + 1)
from utq
where not exists (select 1 from utq utq2 where utq2.q_id = utq.id + 1)
union all
select max(q_id - 1)
from utq
where not exists (select 1 from utq utq2 where utq2.q_id = utq.id - 1);
You can also use lead() and lag():
select min(case when next_q_id <> q_id + 1 then q_id + 1 end),
max(case when prev_q_id <> q_id - 1 then q_id - 1 end)
from (select utq.*,
lag(q_id) over (order by q_id) as prev_q_id,
lead(q_id) over (order by q_id) as next_q_id
from utq
) utq;

A tally based method seems like a good approach here. Especially if the sequences are large.
The first CTE summarizes the maximum and minimum q_id's in the test table. The second CTE selects the missing integers by generating the complete sequence (using the fnNumbers tvf) between the minimum and maximum q_id values and comparing WHERE NOT EXISTS to the original sequence. Something like this.
numbers function
create function [dbo].[fnNumbers](
#zero_or_one bit,
#n bigint)
returns table with schemabinding as return
with n(n) as (select null from (values (1),(2),(3),(4)) n(n))
select 0 n where #zero_or_one = 0
union all
select top(#n) row_number() over(order by (select null)) n
from n na, n nb, n nc, n nd, n ne, n nf, n ng, n nh,
n ni, n nj, n nk, n nl, n nm, n np, n nq, n nr;
data and query
drop table if exists #seq;
go
create table #seq(
q_id int unique not null);
insert #seq values (1),(2),(5),(7);
with
max_min_cte(max_q, min_q) as (
select max(q_id), min(q_id)
from #seq),
missing_cte(q_id) as (
select mm.min_q+fn.n
from max_min_cte mm
cross apply dbo.fnNumbers(0, mm.max_q-mm.min_q) fn
where not exists (select 1
from #seq s
where (mm.min_q+fn.n)=s.q_id))
select max(q_id) max_missing, min(q_id) min_missing
from missing_cte;
output
max_missing min_missing
6 3

You can try like following using LEAD
SELECT MIN(Q_ID + 1) AS MinValue
,MAX(Q_ID + 1) AS MaxValue
FROM (
SELECT *,LEAD(Q_ID) OVER (ORDER BY Q_ID) NQ_ID
FROM (VALUES (1),(2),(5),(7)) v(Q_ID)
) t
WHERE NQ_ID - Q_ID <> 1

Related

Find overlapping range in PL/SQL

Sample data below
id start end
a 1 3
a 5 6
a 8 9
b 2 4
b 6 7
b 9 10
c 2 4
c 6 7
c 9 10
I'm trying to come up with a query that will return all the overlap start-end inclusive between a, b, and c (but extendable to more). So the expected data will look like the following
start end
2 3
6 6
9 9
The only way I can picture this is with a custom aggregate function that tracks the current valid intervals then computes the new intervals during the iterate phase. However I can't see this approach being practical when working with large datasets. So if some bright mind out there have a query or some innate function that I'm not aware of I would greatly appreciate the help.
You can do this using aggregation and a join. Assuming no internal overlaps for "a" and "b":
select greatest(ta.start, tb.start) as start,
least(ta.end, tb.end) as end
from t ta join
t tb
on ta.start <= tb.end and ta.end >= tb.start and
ta.id = 'a' and tb.id = 'b';
This is a lot uglier and more complex than Gordon's solution, but I think it gives the expected answer better and should extend to work with more ids:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
),
SEQS(N,START_RANK,END_RANK) AS (
SELECT N,
CASE WHEN IS_START=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_START ORDER BY N) ELSE 0 END START_RANK, --ASSIGN A RANK TO EACH RANGE START
CASE WHEN IS_END=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_END ORDER BY N) ELSE 0 END END_RANK --ASSIGN A RANK TO EACH RANGE END
FROM (
SELECT N,
CASE WHEN NVL(LAG(N) OVER (ORDER BY N),N) + 1 <> N THEN 1 ELSE 0 END IS_START, --MARK N AS A RANGE START
CASE WHEN NVL(LEAD(N) OVER (ORDER BY N),N) -1 <> N THEN 1 ELSE 0 END IS_END /* MARK N AS A RANGE END */
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
) WHERE IS_START + IS_END > 0
)
SELECT STARTS.N "START",ENDS.N "END" FROM SEQS STARTS
JOIN SEQS ENDS ON (STARTS.START_RANK=ENDS.END_RANK AND STARTS.N <= ENDS.N) ORDER BY "START"; --MATCH CORRESPONDING RANGE START/END VALUES
First we generate all the numbers between the smallest start value and the largest end value.
Then we find the numbers that are included in all the provided "id" ranges by joining our generated numbers to the ranges, and selecting each number "n" that appears once for each "id".
Then we determine whether each of these values "n" starts or ends a range. To determine that, for each N we say:
If the previous value of N does not exist or is not 1 less than current N, current N starts a range. If the next value of N does not exist or is not 1 greater than current N, current N ends a range.
Next, we assign a "rank" to each start and end value so we can match them up.
Finally, we self-join where the ranks match (and where the start <= the end) to get our result.
EDIT: After some searching, I came across this question which shows a better way to find the start/ends and refactored the query to:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
)
SELECT MIN(N) "START",MAX(N) "END" FROM (
SELECT N,ROW_NUMBER() OVER (ORDER BY N)-N GRP_ID
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
)
GROUP BY GRP_ID ORDER BY "START";

Select Random Numbers from a list

This is my query.
SELECT TOP 2 NUM
FROM QT_PIVOT
WHERE NUM BETWEEN 1 AND 45
ORDER BY NEWID()
I'm selecting 2 random numbers from a list but I don't want that these numbers to be continuous
Sometimes the result is
NUM
----
2
3
And I don't want this
Thanks , and sorry for my English u.u
Basically the same as the 2nd approach Gordon uses except it lacks the use of the lag function and therefor will work on SQL-2008.
WITH Data AS(
SELECT *, RowNum = ROW_NUMBER() OVER (ORDER BY NEWID())
FROM sys.objects AS O
),
r AS(
SELECT TOP 1 *, SkipRow = 0
FROM Data
WHERE Data.RowNum = 1
UNION ALL
SELECT d.*, SkipRow = CASE WHEN d.object_id BETWEEN r.object_id -2 AND r.object_id + 2 THEN 1 ELSE 0 END
FROM r
JOIN Data AS D
ON r.RowNum + 1 = D.RowNum
)
SELECT TOP 2 * FROM R
WHERE R.SkipRow = 0
One approach is to select the first number, and then select an appropriate second number:
WITH r AS (
SELECT TOP 1 num
FROM QT_PIVOT
WHERE NUM BETWEEN 1 AND 45
ORDER BY NEWId()
)
select num
from r
union all
select top 1 q.num
from qt_pivot q join
r
on q.num not in (r.num, r.num - 1, r.num + 1)
where q.num between 1 and 45
order by newid();
Another approach (if you had SQL Server 2012+) would use lag() to remove any possibilities that do not meet the conditions:
WITH r AS (
SELECT num, row_number() over (order by newid()) as seqnum
FROM QT_PIVOT
WHERE NUM BETWEEN 1 AND 45
)
SELECT r.num
FROM (SELECT r.*, LAG(num) OVER (ORDER BY seqnum) as prevnum
FROM r
) r
WHERE prevnum is null or
prevnum not in (num - 1, num + 1);
EDIT:
The first approach doesn't work, because SQL Server always re-evaluates CTEs, and there is not even a hint to fix this problem. Here is an alternative approach, that will ensure that values are not consecutive:
WITH r as (
SELECT (1 + checksum(newid()) * 45) as r1,
(2 + checksum(newid()) * 43) as r2
)
SELECT q.num
FROM QT_PIVOT q
WHERE q.num = r.r1 or
q.num = 1 + (r.r1 + r.r2) % 45;
This calculates a two random numbers. The first is a random position. The second is an allowable offset (hence the "2" and "43") to guarantee that the numbers are not adjacent.

A query to SELECT a number range

I am having the following problem.
I would like to select a currency value from a database which will act as a default value on the top result of the query (this part is already done and is not a part of my main problem).
I want to use a query that kind of looks like this:
SELECT valkurs, valkurs 'vk'
FROM xx
WHERE valkod='EUR' AND foretagkod=300
UNION
--(My problem is that i can't find out what to write here)
My problem is that I would like to attach a range of values from 1.0 to 20.0 with 0.1 in incremental steps to the original query mentioned above.
An example output can look like this:
8.88, 8.88
1.0, 1.0
1.1, 1.1
1.2, 1.2
...
20.0, 20.0
Is it possible anyhow?
Due to implementation issues this has to be done in a query...
You can use the system table Master..spt_values to generate a sequential list:
SELECT Number = CAST(1 + (Number / 10.0) AS DECIMAL(4, 1)),
Number2 = CAST(1 + (Number / 10.0) AS DECIMAL(4, 1))
FROM Master..spt_values
WHERE Type = 'P'
AND Number BETWEEN 0 AND 200
So to combine in the correct order with your current query I would use:
SELECT valkurs, VK = valkurs
FROM ( SELECT valkurs, SortOrder = 0
FROM xx
WHERE valkod = 'EUR'
AND foretagkod = 300
UNION ALL
SELECT valkurs = CAST(1 + (Number / 10.0) AS DECIMAL(4, 1)), SortOrder = 1
FROM Master..spt_values
WHERE Type = 'P'
AND Number BETWEEN 0 AND 190
) T
ORDER BY T.SortOrder, t.valkurs;
ADDENDUM
There are some that do not advocate the use of Master..spt_values due to the fact that it is not documented, so it could be removed from future versions of sql-server. If this is a major concern you can use ROW_NUMBER() to generate a sequential list (using any table with enough rows as the source, I have gone for sys.all_objects):
SELECT valkurs, VK = valkurs,
FROM ( SELECT valkurs, SortOrder = 0
FROM xx
WHERE valkod = 'EUR'
AND foretagkod = 300
UNION ALL
SELECT TOP 191
valkurs = 1 + ((ROW_NUMBER() OVER(ORDER BY object_id) - 1) / 10.0),
SortOrder = 1
FROM sys.all_objects
) T
ORDER BY T.SortOrder, t.valkurs;
Old, but I think some people will benefit from my answer, which is a much better implementation than the accepted answer
WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 10
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b), -- 10*10
e3(n) AS (SELECT 1 FROM e1 CROSS JOIN e2), -- 10*100
numbers as (SELECT n = ROW_NUMBER() OVER (ORDER BY n)/10.0
FROM e3)
select n, n from numbers
where n between 1 and 20

How can I get the maximum sequential in number range?

I have the specific result above in a select:
1 2
1 3
1 5
1 6
1 9
1 10
1 11
1 13
1 14
1 16
1 18
1 20
1 23
1 24
1 25
What I want to find is the longest increasing-by-one chain that occurs in the results.
For example, I know that 3 is the maximum length sequence in this number range, coming from the last 3 results (23,24,25 being 3 in a row).
A sequence will have the property that the difference between the number and a sequential ordering will be constant. In most dialects of SQL, you have a function called row_number(), which assigns sequential numbers.
We can use this observation to solve your problem:
select (num - seqnum), count(*) as NumInSequence
from (select t.*, row_number() over (order by num) as seqnum
from t
) t
group by (num - seqnum)
This gives every sequence. To get the max, either use max() with a subquery or some version of limit/top. In SQL Server, for instance, you can do:
select top 1 count(*) as NumInSequence
from (select t.*, row_number() over (order by num) as seqnum
from t
) t
group by (num - seqnum)
order by NumInSQuence desc
Using this article as the main query:
http://www.xaprb.com/blog/2006/03/22/find-contiguous-ranges-with-sql/
Just add a column that calculates the difference and select the MAX().
SELECT MAX(seq.end - seq.start)
FROM (
select l.id as start,
(
select min(a.id) as id
from sequence as a
left outer join sequence as b on a.id = b.id - 1
where b.id is null
and a.id >= l.id
) as end,
from sequence as l
left outer join sequence as r on r.id = l.id - 1
where r.id is null;
) AS seq
#Gordon gave a brilliant and more terse answer. However, I think a recursive implementation may be useful as well. Here's a very useful article on recursive CTEs: http://msdn.microsoft.com/en-us/library/ms186243(v=sql.105).aspx
-- This first CTE is unnecessary because you presumably already have
-- your data. But I wanted to include it to make it easier test.
WITH myNumbers AS (
SELECT *
FROM (
VALUES
(2),
(3),
(5),
(6),
(9),
(10),
(11),
(13),
(14),
(16),
(18),
(20),
(23),
(24),
(25)
) AS x (num)
),
-- To get my sequences I recurse until there is no num + 1 in my set
mySequences AS (
-- Anchor member definition: Create the first invocation
SELECT v.num, 0 AS iteration, v.num AS previous, v.num AS start
FROM myNumbers v
UNION ALL
-- Recursive member definition: Recurse until value + 1 does not exist
SELECT s.num + 1, s.iteration + 1 AS iteration, s.num AS previous, s.start
FROM mySequences s -- Notice that we can reference the CTE within itself
JOIN myNumbers v
ON v.num = s.num + 1
)
-- I must increment by 1 because I chose to start my recursion at 0
SELECT MAX(iteration + 1)
FROM mySequences
That recursive query is similar to writing
public int GetSequenceLength(int start, int iteration, int[] myNumbers)
{
if (myNumbers.Contains(start + 1))
{
return GetSequenceLength(start + 1, iteration + 1, myNumbers);
}
return iteration;
}
foreach (var myNumber in myNumbers)
{
var sequenceLength = GetSequenceLength(myNumber, 0, myNumbers) + 1;
Console.WriteLine(myNumber + " : " + sequenceLength);
}

SQL: how to get all the distinct characters in a column, across all rows

Is there an elegant way in SQL Server to find all the distinct characters in a single varchar(50) column, across all rows?
Bonus points if it can be done without cursors :)
For example, say my data contains 3 rows:
productname
-----------
product1
widget2
nicknack3
The distinct inventory of characters would be "productwigenka123"
Here's a query that returns each character as a separate row, along with the number of occurrences. Assuming your table is called 'Products'
WITH ProductChars(aChar, remain) AS (
SELECT LEFT(productName,1), RIGHT(productName, LEN(productName)-1)
FROM Products WHERE LEN(productName)>0
UNION ALL
SELECT LEFT(remain,1), RIGHT(remain, LEN(remain)-1) FROM ProductChars
WHERE LEN(remain)>0
)
SELECT aChar, COUNT(*) FROM ProductChars
GROUP BY aChar
To combine them all to a single row, (as stated in the question), change the final SELECT to
SELECT aChar AS [text()] FROM
(SELECT DISTINCT aChar FROM ProductChars) base
FOR XML PATH('')
The above uses a nice hack I found here, which emulates the GROUP_CONCAT from MySQL.
The first level of recursion is unrolled so that the query doesn't return empty strings in the output.
Use this (shall work on any CTE-capable RDBMS):
select x.v into prod from (values('product1'),('widget2'),('nicknack3')) as x(v);
Test Query:
with a as
(
select v, '' as x, 0 as n from prod
union all
select v, substring(v,n+1,1) as x, n+1 as n from a where n < len(v)
)
select v, x, n from a -- where n > 0
order by v, n
option (maxrecursion 0)
Final Query:
with a as
(
select v, '' as x, 0 as n from prod
union all
select v, substring(v,n+1,1) as x, n+1 as n from a where n < len(v)
)
select distinct x from a where n > 0
order by x
option (maxrecursion 0)
Oracle version:
with a(v,x,n) as
(
select v, '' as x, 0 as n from prod
union all
select v, substr(v,n+1,1) as x, n+1 as n from a where n < length(v)
)
select distinct x from a where n > 0
Given that your column is varchar, it means it can only store characters from codes 0 to 255, on whatever code page you have. If you only use the 32-128 ASCII code range, then you can simply see if you have any of the characters 32-128, one by one. The following query does that, looking in sys.objects.name:
with cteDigits as (
select 0 as Number
union all select 1 as Number
union all select 2 as Number
union all select 3 as Number
union all select 4 as Number
union all select 5 as Number
union all select 6 as Number
union all select 7 as Number
union all select 8 as Number
union all select 9 as Number)
, cteNumbers as (
select U.Number + T.Number*10 + H.Number*100 as Number
from cteDigits U
cross join cteDigits T
cross join cteDigits H)
, cteChars as (
select CHAR(Number) as Char
from cteNumbers
where Number between 32 and 128)
select cteChars.Char as [*]
from cteChars
cross apply (
select top(1) *
from sys.objects
where CHARINDEX(cteChars.Char, name, 0) > 0) as o
for xml path('');
If you have a Numbers or Tally table which contains a sequential list of integers you can do something like:
Select Distinct '' + Substring(Products.ProductName, N.Value, 1)
From dbo.Numbers As N
Cross Join dbo.Products
Where N.Value <= Len(Products.ProductName)
For Xml Path('')
If you are using SQL Server 2005 and beyond, you can generate your Numbers table on the fly using a CTE:
With Numbers As
(
Select Row_Number() Over ( Order By c1.object_id ) As Value
From sys.columns As c1
Cross Join sys.columns As c2
)
Select Distinct '' + Substring(Products.ProductName, N.Value, 1)
From Numbers As N
Cross Join dbo.Products
Where N.Value <= Len(Products.ProductName)
For Xml Path('')
Building on mdma's answer, this version gives you a single string, but decodes some of the changes that FOR XML will make, like & -> &.
WITH ProductChars(aChar, remain) AS (
SELECT LEFT(productName,1), RIGHT(productName, LEN(productName)-1)
FROM Products WHERE LEN(productName)>0
UNION ALL
SELECT LEFT(remain,1), RIGHT(remain, LEN(remain)-1) FROM ProductChars
WHERE LEN(remain)>0
)
SELECT STUFF((
SELECT N'' + aChar AS [text()]
FROM (SELECT DISTINCT aChar FROM Chars) base
ORDER BY aChar
FOR XML PATH, TYPE).value(N'.[1]', N'nvarchar(max)'),1, 1, N'')
-- Allow for a lot of recursion. Set to 0 for infinite recursion
OPTION (MAXRECURSION 365)