Returning the lowest integer not in a list in SQL - sql

Supposed you have a table T(A) with only positive integers allowed, like:
1,1,2,3,4,5,6,7,8,9,11,12,13,14,15,16,17,18
In the above example, the result is 10. We always can use ORDER BY and DISTINCT to sort and remove duplicates. However, to find the lowest integer not in the list, I came up with the following SQL query:
select list.x + 1
from (select x from (select distinct a as x from T order by a)) as list, T
where list.x + 1 not in T limit 1;
My idea is start a counter and 1, check if that counter is in list: if it is, return it, otherwise increment and look again. However, I have to start that counter as 1, and then increment. That query works most of the cases, by there are some corner cases like in 1. How can I accomplish that in SQL or should I go about a completely different direction to solve this problem?

Because SQL works on sets, the intermediate SELECT DISTINCT a AS x FROM t ORDER BY a is redundant.
The basic technique of looking for a gap in a column of integers is to find where the current entry plus 1 does not exist. This requires a self-join of some sort.
Your query is not far off, but I think it can be simplified to:
SELECT MIN(a) + 1
FROM t
WHERE a + 1 NOT IN (SELECT a FROM t)
The NOT IN acts as a sort of self-join. This won't produce anything from an empty table, but should be OK otherwise.

SQL Fiddle
select min(y.a) as a
from
t x
right join
(
select a + 1 as a from t
union
select 1
) y on y.a = x.a
where x.a is null
It will work even in an empty table

SELECT min(t.a) - 1
FROM t
LEFT JOIN t t1 ON t1.a = t.a - 1
WHERE t1.a IS NULL
AND t.a > 1; -- exclude 0
This finds the smallest number greater than 1, where the next-smaller number is not in the same table. That missing number is returned.
This works even for a missing 1. There are multiple answers checking in the opposite direction. All of them would fail with a missing 1.
SQL Fiddle.

You can do the following, although you may also want to define a range - in which case you might need a couple of UNIONs
SELECT x.id+1
FROM my_table x
LEFT
JOIN my_table y
ON x.id+1 = y.id
WHERE y.id IS NULL
ORDER
BY x.id LIMIT 1;

You can always create a table with all of the numbers from 1 to X and then join that table with the table you are comparing. Then just find the TOP value in your SELECT statement that isn't present in the table you are comparing
SELECT TOP 1 table_with_all_numbers.number, table_with_missing_numbers.number
FROM table_with_all_numbers
LEFT JOIN table_with_missing_numbers
ON table_with_missing_numbers.number = table_with_all_numbers.number
WHERE table_with_missing_numbers.number IS NULL
ORDER BY table_with_all_numbers.number ASC;

In SQLite 3.8.3 or later, you can use a recursive common table expression to create a counter.
Here, we stop counting when we find a value not in the table:
WITH RECURSIVE counter(c) AS (
SELECT 1
UNION ALL
SELECT c + 1 FROM counter WHERE c IN t)
SELECT max(c) FROM counter;
(This works for an empty table or a missing 1.)

This query ranks (starting from rank 1) each distinct number in ascending order and selects the lowest rank that's less than its number. If no rank is lower than its number (i.e. there are no gaps in the table) the query returns the max number + 1.
select coalesce(min(number),1) from (
select min(cnt) number
from (
select
number,
(select count(*) from (select distinct number from numbers) b where b.number <= a.number) as cnt
from (select distinct number from numbers) a
) t1 where number > cnt
union
select max(number) + 1 number from numbers
) t1
http://sqlfiddle.com/#!7/720cc/3

Just another method, using EXCEPT this time:
SELECT a + 1 AS missing FROM T
EXCEPT
SELECT a FROM T
ORDER BY missing
LIMIT 1;

Related

How to cast SELECT result to a typed value

Is there any common practice to use SELECT result as a typed value, e.g. for functions arguments?
Could it be something like this?
func((SELECT number FROM numbers WHERE user_id = 1 LIMIT 1).number::numeric)
I thought about CURSOR for such a task but I'm not really sure. Thank you for any advice!
I'm using PostgreSQL so if there is any specific solution feel free to share.
Use the FROM clause or a common table expression:
SELECT func(a.x, b.y)
FROM (SELECT ... LIMIT 1) AS a
CROSS JOIN (SELECT ... LIMIT 1) AS b;
or
WITH a AS (SELECT ... LIMIT 1),
b AS (SELECT ... LIMIT 1)
SELECT func(a.x, b.y)
FROM a CROSS JOIN b;
Doesn't this do what you want?
func( (SELECT number FROM numbers WHERE user_id = 1 LIMIT 1)::numeric )
The subquery is returning (at most) one value so it is a scalar subquery and is equivalent to a scalar reference in the query.
That said, there are many other ways to express this. For instance:
func( (SELECT number::numeric FROM numbers WHERE user_id = 1 LIMIT 1) )
or:
(SELECT func(number::numeric) FROM numbers WHERE user_id = 1 LIMIT 1)
or moving to the from clause and using a lateral join. Or calculating the result in a CTE or subquery.

SQL to find missing numbers in sequence starting from min?

I have found several examples of SQL queries which will find missing numbers in a sequence. For example this one:
Select T1.val+1
from table T1
where not exists(select val from table T2 where T2.val = T1.val + 1);
This will only find gaps in an existing sequence. I would like to find gaps in a sequence starting from a minimum.
For example, if the values in my sequence are 2, 4 then the query above will return 3,5.
I would like to specify that my sequence must start at 0 so I would like the query to return 0,1,3,5.
How can I add the minimum value to my query?
A few answers to questions below:
There is no maximum, only a minimum
The DB is oracle
This is quite easy in Postgres:
select x.i as missing_sequence_value
from (
select i
from generate_series(0,5) i -- 0,5 are the lower and upper bounds you want
) x
left join the_table t on t.val = x.i
where t.val is null;
SQLFiddle: http://www.sqlfiddle.com/#!15/acb07/1
Edit
The Oracle solution is a bit more complex because generating the numbers requires a workaround
with numbers as (
select level - 1 as val
from dual
connect by level <= (select max(val) + 2 from the_table) -- this is the maximum
), number_range as (
select val
from numbers
where val >= 0 -- this is the minimum
)
select nr.val as missing_sequence_value
from number_range nr
left join the_table t on t.val = nr.val
where t.val is null;
SQLFiddle: http://www.sqlfiddle.com/#!4/71584/4
The idea (in both cases) is to generate a list of numbers you are interested in (from 0 to 5) and then doing an outer join against the values in your table. The rows where the outer join does not return something from your table (that's the condition where t.val is null) are the values that are missing.
The Oracle solution requires two common table expressions ("CTE", the "with" things) because you can't add a where level >= x in the first CTE that generates the numbers.
Note that the connect by level <= ... is relies on an undocumented (and unsupported) way of using connect by. But so many people are using that to get a "number generator" that I doubt that Oracle will actually remove this.
If you have to option to use a common table expression you can generate a sequence of numbers and use that as the source of numbers.
The variables #start and #end defines the range of numbers (you could easily use max(val) from yourtable as end instead).
This example is for MS SQL Server (but CTEs is a SQL 99 feature supported by many databases):
declare #start int, #end int
select #start=0, #end=5
;With sequence(num) as
(
select #start as num
union all
select num + 1
from sequence
where num < #end
)
select * from sequence seq
where not exists(select val from YourTable where YourTable.val = seq.num)
Option (MaxRecursion 1000)
Sample SQL Fiddle
Just presenting 2 small variations to suggestions above, both are for Oracle.
First is a small variation to one presented by jpw using a recursive Common Table Expression (CTE); simply to demonstrate that this technique is also available in Oracle.
WITH
seq (val)
AS (
SELECT 0 FROM dual
UNION ALL
SELECT val + 1
FROM seq
WHERE val < (
SELECT MAX(val) FROM the_table -- note below
)
)
SELECT
seq.val AS missing_sequence_value
FROM seq
LEFT JOIN the_table t
ON seq.val = t.val
WHERE t.val IS NULL
ORDER BY
missing_sequence_value
;
This variation at SQLfiddle
Notable difference to SQL Server: you can use a subquery to limit the recursion
Also, Oracle documentation often refers to Subquery Factoring e.g. subquery_factoring_clause::= instead of CTE
Second is a variation on the use of connect by level as used by a_horse_with_no_name
Level is a pseudo column available in Oracle's hierarchical queries and the root of a hierarchy is 1. When using connect by level, by default this will commence at 1
For this variation I just wished to demonstrate that it does not need to be coupled with CTE's at all, and hence the syntax can be quite concise.
SELECT
seq.val AS missing_sequence_value
FROM (
SELECT
level - 1 AS val
FROM dual
CONNECT BY LEVEL <= (SELECT max(val) FROM the_table)
) seq
LEFT JOIN the_table t
ON seq.val = t.val
WHERE t.val IS NULL
ORDER BY
missing_sequence_value
;
This variation at SQLfiddle

Ordering a SQL query based on the value in a column determining the value of another column in the next row

My table looks like this:
Value Previous Next
37 NULL 42
42 37 3
3 42 79
79 3 NULL
Except, that the table is all out of order. (There are no duplicates, so that is not an issue.) I was wondering if there was any way to make a query that would order the output, basically saying "Next row 'value' = this row 'next'" as it's shown above ?
I have no control over the database and how this data is stored. I am just trying to retrieve it and organize it. SQL Server I believe 2008.
I realize that this wouldn't be difficult to reorganize afterwards, but I was just curious if I could write a query that just did that out of the box so I wouldn't have to worry about it.
This should do what you need:
WITH CTE AS (
SELECT YourTable.*, 0 Depth
FROM YourTable
WHERE Previous IS NULL
UNION ALL
SELECT YourTable.*, Depth + 1
FROM YourTable JOIN CTE
ON YourTable.Value = CTE.Next
)
SELECT * FROM CTE
ORDER BY Depth;
[SQL Fiddle] (Referential integrity and indexes omitted for brevity.)
We use a recursive common table expression (CTE) to travel from the head of the list (WHERE Previous IS NULL) to the trailing nodes (ON YourTable.Value = CTE.Next) and at the same time memorize the depth of the recursion that was needed to reach the current node (in Depth).
In the end, we simply sort by the depth of recursion that was needed to reach each of the nodes (ORDER BY Depth).
Use a recursive query, with the one i list here you can have multiple paths along your linked list:
with cte (Value, Previous, Next, Level)
as
(
select Value, Previous, Next, 0 as Level
from data
where Previous is null
union all
select d.Value, d.Previous, d.Next, Level + 1
from data d
inner join cte c on d.Previous = c.Value
)
select * from cte
fiddle here
If you are using Oracle, try Starts with- connect by
select ... start with initial-condition connect by
nocycle recursive-condition;
EDIT: For SQL-Server, use WITH syntax as below:
WITH rec(value, previous, next) AS
(SELECT value, previous, next
FROM table1
WHERE previous is null
UNION ALL
SELECT nextRec.value, nextRec.previous, nextRec.next
FROM table1 as nextRec, rec
WHERE rec.next = nextRec.value)
SELECT value, previous, next FROM rec;
One way to do this is with a join:
select t.*
from t left outer join
t tnext
on t.next = tnext.val
order by tnext.value
However, won't this do?
select t.*
from t
order by t.next
Something like this should work:
With Parent As (
Select
Value,
Previous,
Next
From
table
Where
Previous Is Null
Union All
Select
t.Value,
t.Previous,
t.Next
From
table t
Inner Join
Parent
On Parent.Next = t.Value
)
Select
*
From
Parent
Example

T-SQL to find length of time a particular value is in-range

I have a table in SQL Server where, for each row r at time t, I would like to find the first t + i for some function of r where abs(f(r, t + i) - f(r, t)) > epsilon.
I can imagine doing this with two cursors, but this seems highly inefficient.
Any of the T-SQL gurus out there have any advice?
select a.t, b.t --- and what other columns you need
from tbl a -- implicitly each row of table
cross apply (
select top(1) * -- the first, going upwards along b.t
from tbl b
where a.t < b.t -- look for records with later t than the source row
and <here's your function between a row and b row>
order by b.t asc
) x
I'm not a big fan of correlated subqueries. But, it seems useful in this case. The following code returns the minimum "sequence number" of the first row after the given row subject to your condition:
with t as (
select t.*, f(r, t) as fval, row_number() over (order by <ordering>) as seqnum
from table t
)
select t.*,
(select min(t2.seqnum)
from t t2
where t2.seqnum > t.seqnum and
abs(t2.fval - t.fval) > <epsilon>
) as next_seqnum
from t
To make this work, you need to specify <ordering> and <epsilon>. is how you know the order of the rows (t would be a good guess, if I had to guess).

How can I select adjacent rows to an arbitrary row (in sql or postgresql)?

I want to select some rows based on certain criteria, and then take one entry from that set and the 5 rows before it and after it.
Now, I can do this numerically if there is a primary key on the table, (e.g. primary keys that are numerically 5 less than the target row's key and 5 more than the target row's key).
So select the row with the primary key of 7 and the nearby rows:
select primary_key from table where primary_key > (7-5) order by primary_key limit 11;
2
3
4
5
6
-=7=-
8
9
10
11
12
But if I select only certain rows to begin with, I lose that numeric method of using primary keys (and that was assuming the keys didn't have any gaps in their order anyway), and need another way to get the closest rows before and after a certain targeted row.
The primary key output of such a select might look more random and thus less succeptable to mathematical locating (since some results would be filtered, out, e.g. with a where active=1):
select primary_key from table where primary_key > (34-5)
order by primary_key where active=1 limit 11;
30
-=34=-
80
83
100
113
125
126
127
128
129
Note how due to the gaps in the primary keys caused by the example where condition (for example becaseu there are many inactive items), I'm no longer getting the closest 5 above and 5 below, instead I'm getting the closest 1 below and the closest 9 above, instead.
There's a lot of ways to do it if you run two queries with a programming language, but here's one way to do it in one SQL query:
(SELECT * FROM table WHERE id >= 34 AND active = 1 ORDER BY id ASC LIMIT 6)
UNION
(SELECT * FROM table WHERE id < 34 AND active = 1 ORDER BY id DESC LIMIT 5)
ORDER BY id ASC
This would return the 5 rows above, the target row, and 5 rows below.
Here's another way to do it with analytic functions lead and lag. It would be nice if we could use analytic functions in the WHERE clause. So instead you need to use subqueries or CTE's. Here's an example that will work with the pagila sample database.
WITH base AS (
SELECT lag(customer_id, 5) OVER (ORDER BY customer_id) lag,
lead(customer_id, 5) OVER (ORDER BY customer_id) lead,
c.*
FROM customer c
WHERE c.active = 1
AND c.last_name LIKE 'B%'
)
SELECT base.* FROM base
JOIN (
-- Select the center row, coalesce so it still works if there aren't
-- 5 rows in front or behind
SELECT COALESCE(lag, 0) AS lag, COALESCE(lead, 99999) AS lead
FROM base WHERE customer_id = 280
) sub ON base.customer_id BETWEEN sub.lag AND sub.lead
The problem with sgriffinusa's solution is that you don't know which row_number your center row will end up being. He assumed it will be row 30.
For similar query I use analytic functions without CTE. Something like:
select ...,
LEAD(gm.id) OVER (ORDER BY Cit DESC) as leadId,
LEAD(gm.id, 2) OVER (ORDER BY Cit DESC) as leadId2,
LAG(gm.id) OVER (ORDER BY Cit DESC) as lagId,
LAG(gm.id, 2) OVER (ORDER BY Cit DESC) as lagId2
...
where id = 25912
or leadId = 25912 or leadId2 = 25912
or lagId = 25912 or lagId2 = 25912
such query works more faster for me than CTE with join (answer from Scott Bailey). But of course less elegant
You could do this utilizing row_number() (available as of 8.4). This may not be the correct syntax (not familiar with postgresql), but hopefully the idea will be illustrated:
SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY primary_key) AS r, *
FROM table
WHERE active=1) t
WHERE 25 < r and r < 35
This will generate a first column having sequential numbers. You can use this to identify the single row and the rows above and below it.
If you wanted to do it in a 'relationally pure' way, you could write a query that sorted and numbered the rows. Like:
select (
select count(*) from employees b
where b.name < a.name
) as idx, name
from employees a
order by name
Then use that as a common table expression. Write a select which filters it down to the rows you're interested in, then join it back onto itself using a criterion that the index of the right-hand copy of the table is no more than k larger or smaller than the index of the row on the left. Project over just the rows on the right. Like:
with numbered_emps as (
select (
select count(*)
from employees b
where b.name < a.name
) as idx, name
from employees a
order by name
)
select b.*
from numbered_emps a, numbered_emps b
where a.name like '% Smith' -- this is your main selection criterion
and ((b.idx - a.idx) between -5 and 5) -- this is your adjacency fuzzy-join criterion
What could be simpler!
I'd imagine the row-number based solutions will be faster, though.