Finding gaps in a column with SQL Server - sql

I have a table with an int column that is not the primary key. The table has thousands of records.
I'd like to find the missing ids.
I have this data:
1
2
3
4
6
8
11
14
I'd like to get this as the result: 5, 7, 9, 10, 12, 13
Do you know how I can do this?
Thanks,

It is easier to get this as ranges:
select (col + 1) as first_missing, (next_col - 1) as last_missing
from (select t.*, lead(col) over (order by col) as next_col
from t
) t
where next_col <> col + 1;
If you actually want this as a list, I would suggest a recursive CTE:
with cte as (
select t.col, lead(col) over (order by col) as next_col, 1 as lev
from t
union all
select cte.col + 1, next_col, lev + 1
from cte
where col + 1 < next_col
)
select cte.col
from cte
where lev > 1;
Note: SQL Server limits recursion to 100 levels by default, so if a gap can span more than 100 values, you will need OPTION (MAXRECURSION 0).
Here is a db<>fiddle.
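For anyone who wants to check the logic outside the database, here is a minimal Python sketch of the same two steps: pair each value with its successor (which is what lead() does), then expand each gap into individual values (which is what the recursive CTE does). The function names are made up for illustration, the data is the list from the question, and the sketch assumes duplicates are collapsed first.

```python
def missing_ranges(values):
    """Return (first_missing, last_missing) for each gap, like the lead() query."""
    ordered = sorted(set(values))  # assumption: duplicates are collapsed first
    # zip(ordered, ordered[1:]) pairs each value with the next one, as lead() does
    return [(cur + 1, nxt - 1)
            for cur, nxt in zip(ordered, ordered[1:])
            if nxt != cur + 1]


def missing_values(values):
    """Expand each gap range into individual values, like the recursive CTE."""
    return [v
            for first, last in missing_ranges(values)
            for v in range(first, last + 1)]


data = [1, 2, 3, 4, 6, 8, 11, 14]
print(missing_ranges(data))  # [(5, 5), (7, 7), (9, 10), (12, 13)]
print(missing_values(data))  # [5, 7, 9, 10, 12, 13]
```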

Assuming mytab is your table, the relevant column is mycol, and the potential values are 1-10,000:
with t(i) as (select 1 union all select i+1 from t where i<10)
,all_values(mycol) as (select row_number() over (order by (select null)) from t t0,t t1,t t2, t t3)
select *
from all_values a left join mytab t on a.mycol = t.mycol
where t.mycol is null
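The same "generate every candidate value, then keep the ones with no match" idea can be tried quickly with Python's built-in sqlite3 module. This is only a sketch under a few assumptions: it uses SQLite rather than SQL Server, it builds the candidates with a recursive CTE instead of the cross join above, and the upper bound of 14 is simply the largest value in the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytab (mycol INTEGER)")
conn.executemany("INSERT INTO mytab (mycol) VALUES (?)",
                 [(v,) for v in (1, 2, 3, 4, 6, 8, 11, 14)])

# Generate 1..14 with a recursive CTE, then keep the values with no match in mytab.
sql = """
WITH RECURSIVE all_values(mycol) AS (
    SELECT 1
    UNION ALL
    SELECT mycol + 1 FROM all_values WHERE mycol < 14
)
SELECT a.mycol
FROM all_values AS a
LEFT JOIN mytab AS t ON a.mycol = t.mycol
WHERE t.mycol IS NULL
ORDER BY a.mycol
"""
missing = [row[0] for row in conn.execute(sql)]
print(missing)  # [5, 7, 9, 10, 12, 13]
```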

Related

Find all records within x units of each other

I have a table like this:
CREATE TABLE t(idx integer primary key, value integer);
INSERT INTO t(idx, value)
VALUES
(1, 1),
(2, 2),
(3, 3),
(4, 6),
(5, 7),
(6, 12)
I would like to return all the groups of records where the values are within 2 of each other, with an associated group label as a new column by which to identify them.
I thought perhaps a recursive query might be suitable...but my sql-fu is lacking.
You can use a recursive CTE:
with recursive tt as (
select t.*, row_number() over (order by idx) as seqnum
from t
),
cte as (
select idx, value, value as grp,
seqnum, 1 as lev
from tt
where seqnum = 1
union all
select tt.idx, tt.value,
(case when tt.value > grp + 2 then tt.value else cte.grp end),
tt.seqnum, 1 + lev
from cte join
tt
on tt.seqnum = cte.seqnum + 1
)
select *
from cte;
Here is a db<>fiddle. Note that this added a row with the value of "4" to show that the first four rows are split into two groups.
I assume you want to group rows so that any two values in a group differ by at most 2. In that case you are right, a recursive query is the solution. At each level of recursion, the bounds of the next group are computed. The groups are disjoint, so the final step joins the original table to the computed groups and groups by the group number. Db fiddle here.
with recursive r (minv,maxv,level) as (
select min(t.value), min(t.value) + 2, 1
from t
union all
select minv, maxv, level from (
select t.value as minv, t.value + 2 as maxv, r.level + 1 as level, row_number() over (order by t.value) rn
from r
join t on t.value > r.maxv
) x where x.rn = 1
)
select r.level
, format('ids from %s to %s', min(t.idx), max(t.idx)) as id_label
, format('values from %s to %s', min(t.value), max(t.value)) as value_label
from t join r on t.value between r.minv and r.maxv
group by r.level
order by r.level
(The inner query in the recursive part only serves to limit the newly added rows to one. The simpler select min(t.value), min(t.value) + 2 is not possible because aggregate functions are not allowed in the recursive part; the analytic function is a workaround.)
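To make the grouping rule concrete, here is a small Python sketch of what both recursive queries compute, as I read them: a group starts at the smallest value not yet assigned and covers everything up to that start plus 2. The function name and the width parameter are illustrative only.

```python
def group_within(values, width=2):
    """Greedy grouping: each group starts at the smallest unassigned value
    and takes every value up to start + width."""
    groups = []
    for v in sorted(values):
        if groups and v <= groups[-1][0] + width:
            groups[-1].append(v)   # still within the current group's bounds
        else:
            groups.append([v])     # v becomes the start of a new group
    return groups


print(group_within([1, 2, 3, 6, 7, 12]))     # [[1, 2, 3], [6, 7], [12]]
print(group_within([1, 2, 3, 4, 6, 7, 12]))  # [[1, 2, 3], [4, 6], [7], [12]]
```

The second call shows the effect mentioned in the first answer: adding a 4 splits the first four values into two groups.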

SQL for storing numbers from cold to hot for specific range?

We have a table that looks like this: date, val1, val2, val3, val4, val5
For a given row, val1 through val5 are unique and between 1 and 37.
Using T-SQL, how can I list the numbers 1-37 from cold to hot, with their frequency, for a given date range?
Sample Output (NOT ACTUAL): Numbers by frequency, ascending:
36=0, 2=1, 5=1, 7=3, 34=5, 30=6, etc.
With a recursive CTE create the dataset 1-37 and then UNION ALL to create a dataset with all the numbers in the table.
Join the 2 datasets and group by the number and aggregate:
with cte(n) as (
select 1 union all select (cte.n + 1) n from cte where cte.n < 37
)
select
cte.n, count(t.number) counter
from cte left join (
select date, val1 number from tablename union all
select date, val2 from tablename union all
select date, val3 from tablename union all
select date, val4 from tablename union all
select date, val5 from tablename
) t on t.number = cte.n and t.date between '2019-05-01' and '2019-05-31'
group by cte.n
order by counter, cte.n
Generate a table of 37 numbers and left join your data:
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
numbers(N) AS (
SELECT TOP (37) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
select n.N, count(t.val)
from numbers N
left join (
select dt, val
from
-- your table here
( values
('2017-01-01', 22, 23, 4, 22, 5)
) myTable (dt, val1, val2,val3,val4,val5)
-- end of your table
cross apply (
values (val1),(val2),(val3),(val4),(val5)
) t(val)
) t on t.val = n.N
group by n.N
order by n.N;
You need to generate a list of 37 numbers (a recursive CTE is handy for this).
Then you can use a join if the values are unique in each row:
with n(n) as (
select 1 as n union all
select (n.n + 1) as n
from n
where n.n < 37
)
)
select n.n, count(t.id) counter
from n left join
t
on n.n in (t.val1, t.val2, t.val3, t.val4, t.val5)
group by n.n;
If the numbers can be repeated within a row, then the above only counts the row once (it really counts matching rows rather than matching values). If you want them counted separately, then unpivot. For this, I recommend apply:
select n.n, count(v.val) counter
from n left join
(t cross apply
(values (t.val1), (t.val2), (t.val3), (t.val4), (t.val5)
) v(val)
)
on n.n = v.val
group by n.n;
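If it helps to see the expected result shape, here is a rough Python equivalent of the approach the answers share: unpivot val1..val5 into single values, count them within the date range, and list every number from 1 to 37 so that numbers never drawn show up with a count of 0. The rows and the function name are invented for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical rows: (date, (val1, val2, val3, val4, val5))
rows = [
    (date(2019, 5, 1), (1, 2, 3, 4, 5)),
    (date(2019, 5, 15), (1, 2, 3, 10, 11)),
    (date(2019, 6, 1), (1, 36, 37, 20, 21)),  # outside the range used below
]


def cold_to_hot(rows, start, end, low=1, high=37):
    """Return (number, frequency) pairs, least frequent first."""
    # Unpivot and count only the rows inside the date range
    counts = Counter(v for d, vals in rows if start <= d <= end for v in vals)
    # Every number in low..high is listed, including those with a count of 0
    return sorted(((n, counts[n]) for n in range(low, high + 1)),
                  key=lambda pair: (pair[1], pair[0]))


result = cold_to_hot(rows, date(2019, 5, 1), date(2019, 5, 31))
print(result[:3])   # [(6, 0), (7, 0), (8, 0)]
print(result[-3:])  # [(1, 2), (2, 2), (3, 2)]
```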

Oracle: Need to fetch the rows exponentially

I got the query below from another post; it selects 100 rows from every 2000 rows.
Like this: 1-100,2001-2100,4001-4100,6001-6100,8001-8100 and so on.
SELECT * FROM (SELECT t.*,ROWNUM AS rn FROM(SELECT * FROM your_table ORDER BY your_condition) t)WHERE MOD( rn - 1, 2000 ) < 100;
Now I want to select my data exponentially, so that it selects 100 rows from the first 1000 rows, then from the next 2000 rows, then from the next 4000 rows.
Like this: 1-100,2000-2100,4000-4100,8000-8100,16000-16100 and so on.
The idea is to scan rows with a specific pattern.
You asked this in a comment on your previous question and I answered there...
SELECT *
FROM (
SELECT t.*,
ROWNUM AS rn -- Secondly, assign a row number to the ordered rows
FROM (
SELECT *
FROM your_table
ORDER BY your_condition -- First, order the data
) t
)
WHERE rn - POWER( -- Finally, filter the top 100.
2,
TRUNC( CAST( LOG( 2, CEIL( rn / 1000 ) ) AS NUMBER(20,4) ) )
) * 1000 + 1000 <= 100
This will take the first 100 rows from the groups 1-1000, 1001-3000, 3001-7000, 7001-15000, etc.
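The filter is easy to sanity-check outside Oracle. The Python sketch below mirrors the WHERE clause with exact integer arithmetic (bit_length stands in for TRUNC(LOG(2, ...)), which sidesteps floating-point rounding) and collapses the selected row numbers into ranges. The helper names are mine, not part of the query.

```python
def selected(rn):
    """Mirror of: rn - POWER(2, TRUNC(LOG(2, CEIL(rn / 1000)))) * 1000 + 1000 <= 100"""
    blocks = -(-rn // 1000)              # CEIL(rn / 1000) using integer division
    exponent = blocks.bit_length() - 1   # floor(log2(blocks)), computed exactly
    return rn - (2 ** exponent) * 1000 + 1000 <= 100


def runs(numbers):
    """Collapse an ascending list of integers into (start, end) ranges."""
    out = []
    for n in numbers:
        if out and n == out[-1][1] + 1:
            out[-1][1] = n
        else:
            out.append([n, n])
    return [tuple(r) for r in out]


picked = [rn for rn in range(1, 20001) if selected(rn)]
print(runs(picked))
# [(1, 100), (1001, 1100), (3001, 3100), (7001, 7100), (15001, 15100)]
```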
Or, to get the rows:
1-100,2000-2100,4000-4100,8000-8100,16000-16100, 32000-32100 and so on.
Then:
WHERE CASE -- Finally, filter the top 100.
WHEN rn <= 2000 THEN rn
ELSE rn - POWER(
2,
TRUNC( CAST( LOG( 2, CEIL( rn / 1000 - 1 ) ) AS NUMBER(20,4) ) )
) * 1000
END <= 100
You could use the POWER function and a simple hierarchical query, then join it with your table. Here is an example with the all_objects view:
with rng as (select 0 num from dual union all
select 1000 * power(2, level) from dual connect by level < 10 )
select *
from (select row_number() over (order by object_name) rn, object_name from all_objects)
join rng on rn between num + 1 and num + 100
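To see which row numbers that join keeps, here is a short Python sketch that builds the same list of starting offsets (0, then 1000 * 2^level for levels 1 to 9) and turns each into a 100-row window. The variable and function names are illustrative.

```python
# Offsets produced by the rng subquery: 0, then 1000 * 2^level for level 1..9
starts = [0] + [1000 * 2 ** level for level in range(1, 10)]

# The join condition "rn between num + 1 and num + 100" as explicit windows
windows = [(s + 1, s + 100) for s in starts]


def keep(rn):
    """True when the row number falls inside one of the windows."""
    return any(lo <= rn <= hi for lo, hi in windows)


print(windows[:4])  # [(1, 100), (2001, 2100), (4001, 4100), (8001, 8100)]
print(keep(2050), keep(3000))  # True False
```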
From what you describe, you can use logs to define the groups. This is probably close enough to what you want:
select t.*
from (select t.*,
row_number() over (partition by floor(log(2, floor(1 + (seqnum - 1) / 1000) ))
order by col
) as seqnum_2
from (select t.*, row_number() over (order by col) as seqnum
from t
) t
) t
where seqnum_2 <= 100;
The difference from your description is that the first group is 1-999, 1000-1999, and so on.

Pull out first index record in each REPEATING group ordered by index

I have this table with this data
CREATE TABLE #tbl
(
IDX INTEGER,
VAL VARCHAR(50)
)
--Inserted values for testing
INSERT INTO #tbl(IDX, VAL) VALUES(1,'A')
INSERT INTO #tbl(IDX, VAL) VALUES(2,'A')
INSERT INTO #tbl(IDX, VAL) VALUES(3,'A')
INSERT INTO #tbl(IDX, VAL) VALUES(4,'B')
INSERT INTO #tbl(IDX, VAL) VALUES(5,'B')
INSERT INTO #tbl(IDX, VAL) VALUES(6,'B')
INSERT INTO #tbl(IDX, VAL) VALUES(7,'A')
INSERT INTO #tbl(IDX, VAL) VALUES(8,'A')
INSERT INTO #tbl(IDX, VAL) VALUES(9,'A')
INSERT INTO #tbl(IDX, VAL) VALUES(10,'C')
INSERT INTO #tbl(IDX, VAL) VALUES(11,'C')
INSERT INTO #tbl(IDX, VAL) VALUES(12,'A')
INSERT INTO #tbl(IDX, VAL) VALUES(13,'A')
--INSERT INTO #tbl(IDX, VAL) VALUES(14,'A') -- this line has bad binary code
INSERT INTO #tbl(IDX, VAL) VALUES(14,'A') -- replace with this line and it works
INSERT INTO #tbl(IDX, VAL) VALUES(15,'D')
INSERT INTO #tbl(IDX, VAL) VALUES(16,'D')
Select * From #tbl -- to see what you have inserted...
The output I'm looking for is the FIRST and LAST Idx for each group of consecutive Vals, after ordering by Idx. Note that Vals may repeat! Also, Idx may not be in ascending order in the table as it is in the insert statements. No cursors please!
i.e.
Val First Last
=================
A 1 3
B 4 6
A 7 9
C 10 11
A 12 14
D 15 16
If the idx values are guaranteed to be sequential, then try this:
Select f.val, f.idx first, l.idx last
From #tbl f
join #tbl l
on l.val = f.val
and l.idx > f.idx
and not exists
(Select * from #tbl
Where val = f.val
and idx = l.idx + 1)
and not exists
(Select * from #tbl
Where val = f.val
and idx = f.idx - 1)
and not exists
(Select * from #tbl
Where val <> f.val
and idx Between f.idx and l.idx)
order by f.idx
If the idx values are not sequential, then it needs to be a bit more complex:
Select f.val, f.idx first, l.idx last
From #tbl f
join #tbl l
on l.val = f.val
and l.idx > f.idx
and not exists
(Select * from #tbl
Where val = f.val
and idx = (select Min(idx)
from #tbl
where idx > l.idx))
and not exists
(Select * from #tbl
Where val = f.val
and idx = (select Max(idx)
from #tbl
where idx < f.idx))
and not exists
(Select * from #tbl
Where val <> f.val
and idx Between f.idx and l.idx)
order by f.idx
SQL Server 2012
In SQL Server 2012, you can use a sequence of CTEs with the lag/lead analytic functions, as below (fiddle here). The code does not assume any type or sequence for idx, and finds the first and last occurrence of val within each window.
;with cte as
(
select val, idx,
ROW_NUMBER() over(order by (select 0)) as urn --row_number without ordering
from #tbl),
cte1 as
(
select urn, val, idx,
lag(val, 1) over(order by urn) as prevval,
lead(val, 1) over(order by urn) as nextval
from cte
),
cte2 as
(
select val, idx, ROW_NUMBER() over(order by (select 0)) as orn,
(ROW_NUMBER() over(order by (select 0))+1)/2 as prn from cte1
where (prevval <> nextval or prevval is null or nextval is null)
),
cte3 as
(
select val, FIRST_VALUE(idx) over(partition by prn order by prn) as firstidx,
LAST_VALUE(idx) over(partition by prn order by prn) as lastidx, orn
from cte2
),
cte4 as
(
select val, firstidx, lastidx, min(orn) as rn
from cte3
group by val, firstidx, lastidx
)
select val, firstidx, lastidx
from cte4
order by rn;
SQL Server 2008
In SQL Server 2008, the code is a bit more tortured due to the lack of the lag/lead analytic functions (fiddle here). Here too, the code does not assume any type or sequence for idx, and finds the first and last occurrence of val within each window.
;with cte as
(
select val, idx, ROW_NUMBER() over(order by (select 0)) as urn
from #tbl),
cte1 as
(
select m.urn, m.val, m.idx,
_lag.val as prevval, _lead.val as nextval
from cte as m
left join cte as _lag
on _lag.urn = m.urn-1
left join cte AS _lead
on _lead.urn = m.urn+1),
cte2 as
(
select val, idx, ROW_NUMBER() over(order by (select 0)) as orn,
(ROW_NUMBER() over(order by (select 0))+1)/2 as prn from cte1
where (prevval <> nextval or prevval is null or nextval is null)),
cte3 as
( select *, ROW_NUMBER() over(partition by prn order by orn) as rownum
from cte2),
cte4 as
(select o.val, (select i.idx from cte3 as i where i.rownum = 1 and i.prn = o.prn)
as firstidx,
(select i.idx from cte3 as i where i.rownum = 2 and i.prn = o.prn) as lastidx,
o.orn from cte3 as o),
cte5 as (
select val, firstidx, lastidx, min(orn) as rn
from cte4
group by val, firstidx, lastidx
)
select val, firstidx, lastidx
from cte5
order by rn;
Note:
Both solutions rely on the assumption that the database engine preserves insertion order, although a relational database does not guarantee any row order without an ORDER BY.
One way to do it, at least on SQL Server 2008 without any special functionality, is to introduce a helper table and a helper variable.
Whether that's actually possible for you as is (given other requirements) I don't know, but it might put you on a path to a solution, and it does meet your current requirements of no cursors and no lead/lag.
So basically what I do is make a helper table and a helper grouping variable:
(sorry about the naming)
CREATE TABLE #grp
(
idx INTEGER ,
val VARCHAR(50) ,
gidx INT
)
DECLARE @gidx INT = 1
INSERT INTO #grp
( idx, val, gidx )
SELECT idx ,
val ,
0
FROM #tbl AS t
I populate this with the values from your source table #tbl.
Then I do an update trick to assign a value to gidx based on when VAL changes value:
UPDATE g
SET @gidx = gidx = CASE WHEN val <> ISNULL(( SELECT val
FROM #grp AS g2
WHERE g2.idx = g.idx - 1
), val) THEN @gidx + 1
ELSE @gidx
END
FROM #grp AS g
What this does is assign a value of 1 to gidx until VAL changes, then it assigns gidx + 1, which is also stored in the @gidx variable, and so on.
This gives you the following usable result:
idx val gidx
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 B 2
7 A 3
8 A 3
9 A 3
10 C 4
11 C 4
12 A 5
13 A 5
14 A 5
15 D 6
16 D 6
Notice that gidx now is a grouping factor.
Then it's a simple matter of extracting the data with a sub select:
SELECT ( SELECT TOP 1
VAL
FROM #GRP g3
WHERE g2.gidx = g3.gidx
) AS Val ,
MIN(idx) AS First ,
MAX(idx) AS Last
FROM #grp AS g2
GROUP BY gidx
This yields the result:
A 1 3
B 4 6
A 7 9
C 10 11
A 12 14
D 15 16
Fiddler link
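As a cross-check of the update trick, here is a small Python sketch of the same idea: walk the rows in idx order, bump a running group number every time val changes, then take the first and last idx per group. The function names are illustrative, and unlike the UPDATE, the sketch sorts by idx explicitly rather than relying on the order in which rows happen to be processed.

```python
rows = [(1, 'A'), (2, 'A'), (3, 'A'), (4, 'B'), (5, 'B'), (6, 'B'),
        (7, 'A'), (8, 'A'), (9, 'A'), (10, 'C'), (11, 'C'),
        (12, 'A'), (13, 'A'), (14, 'A'), (15, 'D'), (16, 'D')]


def label_runs(rows):
    """Return (idx, val, gidx); gidx goes up by one each time val changes."""
    labelled, gidx, prev = [], 1, None
    for idx, val in sorted(rows):
        if prev is not None and val != prev:
            gidx += 1
        labelled.append((idx, val, gidx))
        prev = val
    return labelled


def first_last(labelled):
    """Return (val, first_idx, last_idx) per gidx, in gidx order."""
    summary = {}
    for idx, val, gidx in labelled:
        first = summary.get(gidx, (val, idx, idx))[1]
        summary[gidx] = (val, first, idx)  # rows arrive in idx order, so idx is the latest
    return list(summary.values())


for line in first_last(label_runs(rows)):
    print(line)
# ('A', 1, 3) ('B', 4, 6) ('A', 7, 9) ('C', 10, 11) ('A', 12, 14) ('D', 15, 16), one per line
```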
I'm assuming that IDX values are unique. If they can also be assumed to start from 1 and have no gaps, as in your example, you could try the following SQL Server 2005+ solution:
WITH partitioned AS (
SELECT
IDX, Val,
grp = IDX - ROW_NUMBER() OVER (PARTITION BY Val ORDER BY IDX ASC)
FROM #tbl
)
SELECT
Val,
FirstIDX = MIN(IDX),
LastIDX = MAX(IDX)
FROM partitioned
GROUP BY
Val, grp
ORDER BY
FirstIDX
;
If IDX values may have gaps and/or may start from a value other than 1, you could use the following modification of the above instead:
WITH partitioned AS (
SELECT
IDX, Val,
grp = ROW_NUMBER() OVER ( ORDER BY IDX ASC)
- ROW_NUMBER() OVER (PARTITION BY Val ORDER BY IDX ASC)
FROM #tbl
)
SELECT
Val,
FirstIDX = MIN(IDX),
LastIDX = MAX(IDX)
FROM partitioned
GROUP BY
Val, grp
ORDER BY
FirstIDX
;
Note: If you end up using either of these queries, please make sure the statement preceding the query is delimited with a semicolon, particularly if you are using SQL Server 2008 or later version.
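The reason the difference of two row numbers works as a group key may be easier to see in plain code. In this Python sketch (names and sample data are made up, with deliberately gapped IDX values), the overall position and the per-Val position both advance by one while a run continues, so their difference stays constant within a run and changes whenever another Val interrupts it.

```python
from collections import defaultdict


def islands(rows):
    """Return (val, first_idx, last_idx) for each run of equal val, ordered by first_idx."""
    per_val = defaultdict(int)   # ROW_NUMBER() OVER (PARTITION BY Val ORDER BY IDX)
    groups = {}
    for overall, (idx, val) in enumerate(sorted(rows), start=1):  # ROW_NUMBER() OVER (ORDER BY IDX)
        per_val[val] += 1
        key = (val, overall - per_val[val])   # constant within one run of val
        first, _ = groups.get(key, (idx, idx))
        groups[key] = (first, idx)
    return sorted(((val, first, last) for (val, _), (first, last) in groups.items()),
                  key=lambda item: item[1])


sample = [(10, 'A'), (20, 'A'), (35, 'B'), (40, 'A'), (41, 'A'), (50, 'B')]
print(islands(sample))
# [('A', 10, 20), ('B', 35, 35), ('A', 40, 41), ('B', 50, 50)]
```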

Find all integer gaps in SQL

I have a database which is used to store information about different matches for a game that I pull in from an external source. Due to a few issues, there are occasional gaps (which could be anywhere from 1 missing ID to a few hundred) in the database. I want to have the program pull in the data for the missing games, but I need to get that list first.
Here is the format of the table:
id (pk-identity) | GameID (int) | etc. | etc.
I had thought of writing a program to run through a loop and query for each GameID starting at 1, but it seems like there should be a more efficient way to get the missing numbers.
Is there an easy and efficient way, using SQL Server, to find all the missing numbers from the range?
The idea is to look at where the gaps start. Let me assume you are using SQL Server 2012, and so have the lag() and lead() functions. The following gets the next id:
select t.*, lead(id) over (order by id) as nextid
from t;
If there is a gap, then nextid <> id+1. You can now characterize the gaps using where:
select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*, lead(id) over (order by id) as nextid
from t
) t
where nextid <> id+1;
EDIT:
Without the lead(), I would do the same thing with a correlated subquery:
select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*,
(select top 1 id
from t t2
where t2.id > t.id
order by t2.id
) as nextid
from t
) t
where nextid <> id+1;
Assuming the id is a primary key on the table (or even that it just has an index), both methods should have reasonable performance.
Numbers table!
CREATE TABLE dbo.numbers (
number int NOT NULL
)
ALTER TABLE dbo.numbers
ADD
CONSTRAINT pk_numbers PRIMARY KEY CLUSTERED (number)
WITH FILLFACTOR = 100
GO
INSERT INTO dbo.numbers (number)
SELECT (a.number * 256) + b.number As number
FROM (
SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number <= 255
) As a
CROSS
JOIN (
SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number <= 255
) As b
GO
Then you can perform an OUTER JOIN or NOT EXISTS between your two tables and find the gaps...
SELECT *
FROM dbo.numbers
WHERE NOT EXISTS (
SELECT *
FROM your_table
WHERE id = numbers.number
)
-- OR
SELECT *
FROM dbo.numbers
LEFT
JOIN your_table
ON your_table.id = numbers.number
WHERE your_table.id IS NULL
I like the "gaps and islands" approach. It goes a little something like this:
WITH Islands AS (
SELECT GameId, GameID - ROW_NUMBER() OVER (ORDER BY GameID) AS [IslandID]
FROM dbo.yourTable
)
SELECT MIN(GameID), MAX(GameID)
FROM Islands
GROUP BY IslandID
That query will get you the list of contiguous ranges. From there, you can self-join that result set (on successive IslandIDs) to get the gaps. There is a bit of work in getting the IslandIDs themselves to be contiguous though. So, extending the above query:
WITH
cte1 AS (
SELECT GameId, GameId - ROW_NUMBER() OVER (ORDER BY GameId) AS [rn]
FROM dbo.yourTable
)
, cte2 AS (
SELECT [rn], MIN(GameId) AS [Start], MAX(GameId) AS [End]
FROM cte1
GROUP BY [rn]
)
,Islands AS (
SELECT ROW_NUMBER() OVER (ORDER BY [rn]) AS IslandId, [Start], [End]
from cte2
)
SELECT a.[End] + 1 AS [GapStart], b.[Start] - 1 AS [GapEnd]
FROM Islands AS a
LEFT JOIN Islands AS b
ON a.IslandID + 1 = b.IslandID
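Here is a minimal Python sketch of the two stages described above: find the islands using "value minus row number" as the group key, then read the gaps off consecutive islands. The function names and sample IDs are illustrative. One difference from the SQL: the LEFT JOIN also returns a final row for the last island with a NULL GapEnd, while the sketch only pairs islands that have a successor.

```python
def islands(ids):
    """Contiguous ranges, using id - row_number as the island key."""
    groups = {}
    for rn, gid in enumerate(sorted(set(ids)), start=1):
        key = gid - rn                      # constant while the ids are consecutive
        start, _ = groups.get(key, (gid, gid))
        groups[key] = (start, gid)
    return sorted(groups.values())


def gaps(ids):
    """(gap_start, gap_end) between each island and the next one."""
    found = islands(ids)
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(found, found[1:])]


game_ids = [1, 2, 3, 8, 9, 10, 11, 15, 16, 17, 18]
print(islands(game_ids))  # [(1, 3), (8, 11), (15, 18)]
print(gaps(game_ids))     # [(4, 7), (12, 14)]
```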
SELECT * FROM #tab1
id col1
----------- --------------------
1 a
2 a
3 a
8 a
9 a
10 a
11 a
15 a
16 a
17 a
18 a
WITH cte (id,nextId) as
(SELECT t.id, (SELECT TOP 1 t1.id FROM #tab1 t1 WHERE t1.id > t.id ORDER BY t1.id) AS nextId FROM #tab1 t)
SELECT id AS 'GapStart', nextId AS 'GapEnd' FROM cte
WHERE id + 1 <> nextId
GapStart GapEnd
----------- -----------
3 8
11 15
Try this (as written it covers IDs 0 through 99,999; if you need more, add another cross join and multiplier to the Numbers CTE below):
;WITH Digits AS (
select Digit
from ( values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) as t(Digit))
,Numbers AS (
select u.Digit
+ t.Digit*10
+ h.Digit*100
+ th.Digit*1000
+ tth.Digit*10000
--Add 100000, 1000000 multipliers if required here.
as myId
from Digits u
cross join Digits t
cross join Digits h
cross join Digits th
cross join Digits tth
--Add the cross join for higher numbers
)
SELECT myId
FROM Numbers
WHERE myId NOT IN (SELECT GameId FROM YourTable)
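The digit cross join is just positional notation: one copy of 0-9 per decimal place, combined in every possible way. A short Python sketch with three places (so 0 through 999) shows the idea. The game_ids set is made-up sample data, and the range check is my addition so that 0 and values above the largest known ID are not reported.

```python
from itertools import product

digits = range(10)
# Cross join of three digit sets (hundreds, tens, units) gives 0..999
numbers = [h * 100 + t * 10 + u for h, t, u in product(digits, repeat=3)]

game_ids = {1, 2, 3, 5, 9, 10}   # hypothetical GameID values
upper = max(game_ids)

# Equivalent of "WHERE myId NOT IN (...)", limited to 1..upper
missing = [n for n in numbers if 1 <= n <= upper and n not in game_ids]
print(missing)  # [4, 6, 7, 8]
```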
Problem: we need to find the gap range in id field
SELECT * FROM #tab1
id col1
----------- --------------------
1 a
2 a
3 a
8 a
9 a
10 a
11 a
15 a
16 a
17 a
18 a
Solution
WITH cte (id,nextId) as
(SELECT t.id, (SELECT TOP 1 t1.id FROM #tab1 t1 WHERE t1.id > t.id ORDER BY t1.id) AS nextId FROM #tab1 t)
SELECT id + 1 AS 'GapStart', nextId - 1 AS 'GapEnd' FROM cte
WHERE id + 1 <> nextId
Output
GapStart GapEnd
----------- -----------
4 7
12 14