How to find a number that isn't there? - sql

We have a custom number field on a training record, the number is recorded sequentially but there are gaps. How do I find those gaps? Consider this pseudocode
SELECT MIN(X)
FROM DUAL
WHERE X BETWEEN 1 AND 999999
AND X NOT IN (SELECT AG_TRNID
FROM PS_TRAINING
WHERE AG_TRNID = X)
This doesn't work, "X" is unknown.
Thanks!
Bruce

This answer assumes that you're using Oracle.
The way to solve this is to create a resultset that contains all of the numbers in your range, then join to that. The way to do this is to use a recursive query:
SELECT LEVEL AS x
FROM DUAL
CONNECT BY LEVEL <= 999999
CONNECT BY is Oracle-specific syntax that tells the query to run recursively as long as the predicate is true. level is a pseudo-column that only exists in queries that use CONNECT BY that indicates the level of recursion. The end result is that this query will run the query against dual 999,999 times, each time being a level deeper in the recursion.
Given this method of generating numbers, plugging it into the query you tried earlier is pretty trivial:
SELECT MIN (x)
FROM (SELECT LEVEL AS x
FROM DUAL
CONNECT BY LEVEL <= 999999)
WHERE x NOT IN (SELECT ag_trnid
FROM ps_training
WHERE ag_trnid = x)

Two quick examples. The first is your classic Gaps-and-Islands, and the second will use an ad-hoc tally table to identify the missings elements via a LEFT JOIN.
The following was created in SQL Server, but if your database supports the window functions, it should be a small task to adapt.
Gaps and Islands
Declare #YourTable table (X int)
Insert Into #YourTable values
(1),(2),(3),(5),(6),(10)
Select X1 = min(X)
,X2 = max(X)
From (
Select *
,Grp = X - Row_Number() over (Order by X)
From #YourTable
) A
Group By Grp
Returns
X1 X2
1 3
5 6
10 10
Ad-hoc Tally Table
Declare #YourTable table (X int)
Insert Into #YourTable values
(1),(2),(3),(5),(6),(10)
;with cte0(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cteN(N) As (Select Row_Number() over (Order By (Select NULL)) From cte0 N1, cte0 N2, cte0 N3, cte0 N4, cte0 N5, cte0 N6) -- 1 Million
Select N
From cteN
Left Join #YourTable on X=N
Where X is Null
and N<=(Select max(X) from #YourTable)
Order By N
Returns
N
4
7
8
9

Create all numbers from min(ag_trnid) to max(ag_trnid). From these remove the existing numbers:
with nums(num, maxnum) as
(
select min(ag_trnid) as num, max(ag_trnid) as maxnum from ps_training
union all
select num + 1, maxnum from nums where num < maxnum
)
select num from nums
minus
select ag_trnid from ps_training;
Start with 1 (or 0 for that matter) instead of min(ag_trnid), if you consider numbers before the minimum ag_trnid gaps, too.

I think there's no other way besides create a table's id to find the gaps
declare #idMin bigint
declare #idMax bigint
set #idMin = (select min(AG_TRNID) from PS_TRAINING)
set #idMax = (select max(AG_TRNID) from PS_TRAINING)
create table numbers
(id bigint)
while (#idMax>#idMin)
begin
insert into numbers values(#idMin)
set #idMin=#idMin+1
end
select id from numbers
where id not in (SELECT AG_TRNID FROM PS_TRAINING)

You can create a "virtual" table that contains all numbers from 1 to 999999 then remove those that are present in your table:
select level
from dual
connect by level <= 999999
minus
select ag_trnid
from ps_training;
The above will output all gaps, not just the first one.
The connect by level <= 999999 is an undocumented trick to generate all numbers from 1 to 999999.

Here's one way of finding the start and end numbers of the gaps:
WITH sample_data AS (SELECT 1 ID FROM dual UNION ALL
SELECT 2 ID FROM dual UNION ALL
SELECT 4 ID FROM dual UNION ALL
SELECT 5 ID FROM dual UNION ALL
SELECT 8 ID FROM dual UNION ALL
SELECT 10 ID FROM dual UNION ALL
SELECT 20 ID FROM dual UNION ALL
SELECT 22 ID FROM dual UNION ALL
SELECT 23 ID FROM dual UNION ALL
SELECT 27 ID FROM dual UNION ALL
SELECT 28 ID FROM dual)
-- end of mimicking data in a table called "sample_data"
-- see SQL below:
SELECT prev_id + 1 first_number_in_gap,
ID - 1 last_number_in_gap
FROM (SELECT ID,
LAG(ID, 1, ID - 1) OVER (ORDER BY ID) prev_id
FROM sample_data)
WHERE ID - prev_id > 1;
FIRST_NUMBER_IN_GAP LAST_NUMBER_IN_GAP
------------------- ------------------
3 3
6 7
9 9
11 19
21 21
24 26

In the end I created a table with all possible values and select the minimum from it that doesn't exist in the base table. It works well. I had hoped for a sql statement to avoid creating yet another object but couldn't.
Yes this is Oracle, I neglected to mention.
Thank you all for the suggestions and assistance, it is much appreciated!

Related

Insert unique value multiple times depending on variable column

I have the following columns in my table:
PromotionID | NumberOfCodes
1 10
2 5
I need to insert the PromotionID into a new SQL table a number of times - depending on the value in NumberOfCodes.
So given the above, my result set should read:
PromotionID
1
1
1
1
1
1
1
1
1
1
2
2
2
2
2
You need recursive cte :
with r_cte as (
select t.PromotionID, 1 as start, NumberOfCodes
from table t
union all
select id, start + 1, NumberOfCodes
from r_cte
where start < NumberOfCodes
)
insert into table (PromotionID)
select PromotionID
from r_cte
order by PromotionID;
Default recursion level 100, use query hint option(maxrecursion 0) if you have NumberOfCodes more.
I prefer a tally for such things. Once you start getting to larger row sets, the performance of an rCTe degrades very quickly:
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2) --Up to 100 rows. Add more cross joins to N for more rows
SELECT YT.PromotionID
FROM dbo.YourTable YT
JOIN Tally T ON YT.NumberOfCodes >= T.I;
One option uses a recursive common table expression to generate the rows from the source table, that you can then insert in the target table:
with cte as (
select promotion_id, number_of_codes n from sourcetable
union all select promotion_id, n - 1 from mytable where n > 1
)
insert into targettable (promotion_id)
select promotion_id from cte

Query to return dynamic number of rows using SQL in SQL Server 2012

I have a unique requirement to return number of result rows in multiples of 10. Example, if actual data rows are 3, I must add another 7 blank rows to make it 10. If actual data rows are 16, I must add another 4 blank rows to make it 20, and so on.
Without using a procedure, is it possible to achieve this using SELECT statement?
The blank rows can simply contain NULL values or spaces or zeroes.
You can assume any simple query for data rows; the objective is to understand how to return rows dynamically in multiples of 10.
Example:
Select EmpName FROM Employees
If there are 3 employees, I should still return 10 rows, with the balance 7 rows containing either NULL value or blanks.
I am using SQL Server 2012.
This is very raw idea how it can be achieved:
WITH data(r) AS (
SELECT 1 r FROM dual
UNION ALL
SELECT r+1 r FROM data WHERE r < 10
)
SELECT sd.*
FROM data d
left join some_data sd on d.r = sd.id
This is dual table structure:
create table dual (dummy varchar(1));
insert into dual values ('x');
Fiddle: http://sqlfiddle.com/#!6/5ffcc/4
One of the possible options is this:
WITH data(r) AS (
SELECT 1 r FROM dual
UNION ALL
SELECT r+1 r FROM data WHERE r < 10
)
SELECT sd.*
FROM
(select r, row_number() over (order by r) rn from data) d
left join (
select id, name, row_number() over (order by id) rn from some_data sd
) sd
on d.rn = sd.rn
The obvious disadvantages of this colutions:
'r' value generation rule most probably is not as simple in your
case.
Number of rows must be known before query execution.
But maybe it will help you to find better solution.
Here's another, fairly easy, way to handle it...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
EmpID INT NOT NULL,
EmpName VARCHAR(20) NOT NULL
);
INSERT #TestData(EmpID, EmpName) VALUES
(47, 'Bob'),(33, 'Mary'), (88, 'Sue');
-- data as it exists...
SELECT
td.EmpID,
td.EmpName
FROM
#TestData td;
-- the desired output...
WITH
cte_AddRN AS (
SELECT
td.EmpID,
td.EmpName,
RN = ROW_NUMBER() OVER (ORDER BY td.EmpName)
FROM
#TestData td
),
cte_TenRows AS (
SELECT n.RN FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10) ) n (RN)
)
SELECT
ar.EmpID,
ar.EmpName
FROM
cte_TenRows tr
LEFT JOIN cte_AddRN ar
ON tr.RN = ar.RN
ORDER BY
tr.RN;
Results...
-- data as it exists...
EmpID EmpName
----------- --------------------
47 Bob
33 Mary
88 Sue
-- the desired output...
EmpID EmpName
----------- --------------------
47 Bob
33 Mary
88 Sue
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
Based on the above 2 answers, here is what I did:
WITH DATA AS
(SELECT EmpName FROM Employees),
DataSummary AS
(SELECT COUNT(*) AS NumDataRows FROM DATA),
ReqdDataRows AS
(SELECT CEILING(NumDataRows/10.0)*10 AS NumRowsReqd FROM DataSummary),
FillerRows AS
(
SELECT 1 AS SLNO, '00000' AS FillerCol
UNION ALL
SELECT 2 AS SLNO, '00000' AS FillerCol
UNION ALL
SELECT 3 AS SLNO, '00000' AS FillerCol
UNION ALL
SELECT 4 AS SLNO, '00000' AS FillerCol
UNION ALL
SELECT 5 AS SLNO, '00000' AS FillerCol
UNION ALL
SELECT 6 AS SLNO, '00000' AS FillerCol
UNION ALL
SELECT 7 AS SLNO, '00000' AS FillerCol
UNION ALL
SELECT 8 AS SLNO, '00000' AS FillerCol
UNION ALL
SELECT 9 AS SLNO, '00000' AS FillerCol
UNION ALL
SELECT 10 AS SLNO, '00000' AS FillerCol
)
SELECT * FROM DATA
--UNION ALL
--SELECT CONVERT(VARCHAR(10), NumDataRows) FROM DataSummary
--UNION ALL
--SELECT CONVERT(VARCHAR(10), NumRowsReqd) FROM ReqdDataRows
UNION ALL
SELECT FillerCol FROM FillerRows
WHERE (SELECT NumDataRows FROM DataSummary) + SLNO <= (SELECT NumRowsReqd FROM ReqdDataRows)
This gives me the output what I want. This avoids use of ROW_NUMBER and ORDERing. The table FillerRows can be further simplified using SELECT * FROM (VALUES...), and the 2nd and 3rd table DataSummary and ReqdDataRows can be merged into a single SELECT statement.
This approach is a step by step approach and easy to understand and debug, like:
Get the actual data rows
Get count of the data rows
Calculate required no. data rows
UNION the actual data rows with filler rows
Any suggestions on further simplifying this are welcome.

Selecting a sequence in SQL

There seems to be a few blog posts on this topic but the solutions really are not so intuitive. Surely there's a "Canonical" way?
I'm using Teradata SQL.
How would I select
A range of number
A date range
E.g.
SELECT 1:10 AS Nums
SELECT 1-1-2010:5-1-2014 AS Dates1
The result would be 10 rows (1 - 10) in the first SELECT query and ~(365 * 3.5) rows in the second?
The "canonical" way to do this in SQL is using recursive CTEs, which the more recent versions of Teradata support.
For your first example:
with recursive nums(n) as (
select 1 as n
union all
select n + 1
from nums
where n < 10
)
select *
from nums;
You can do something similar for dates.
EDIT:
You can also do this by using row_number() and an existing table:
with nums(n) as (
select n
from (select row_number() over (order by col) as n
from ExstingTable t
) t
where n <= 10
)
select *
from nums;
ExistingTable is just any table with enough rows. The best choice of col is the primary key.
with digits(n) as (
select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
select 6 union all select 7 union all select 8 union all select 9 union all select 10
)
select *
from digits;
If your version of Teradata supports multiple CTEs, you can build on the above:
with digits(n) as (
select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
select 6 union all select 7 union all select 8 union all select 9 union all select 10
),
nums(n) as (
select d1.n*100 + d2.n*10 + d3.n
from digits d1 cross join digits d2 cross join digits d3
)
select *
from nums;
In Teradata you can use the existing sys_calendar to get those dates:
SELECT calendar_date
FROM sys_calendar.CALENDAR
WHERE calendar_date BETWEEN DATE '2010-01-01' AND DATE '2014-05-01';
Note:
DATE '2010-01-01' is the only recommended way to write a date in Teradata
There's probably another custom calendar for the specific business needs of your company, too. Everyone will have access rights to it.
You might also use this for the range of numbers:
SELECT day_of_calendar
FROM sys_calendar.CALENDAR
WHERE day_of_calendar BETWEEN 1 AND 10;
But you should check Explain to see if the estimated number of rows is correct. sys_calendar is a kind of template and day_of_calendar is a calculated column, so no statistics exists on that and Explain will return an estimated number of 14683 (20 percent of the number of rows in that table) instead of 10. If you use it in additional joins the optimizer might do a bad plan based on that totally wrong number.
Note:
If you use sys_calendar you are limited to a maximum of 73414 rows, dates between 1900-01-01 and 2100-12-31 and numbers between 1 and 73414, your business calendar might vary.
Gordon Linoff's recursive query is not really efficient in Teradata, as it's a sequential row-by-row processing in a parallel database (each loop is an "all-AMPs step" in Explain) and the optimizer doesn't know how many rows will be returned.
If you need those ranges regularly you might consider creating a numbers table, I usually got one with a million rows or I use my calendar with the full range of 10000 years :-)
--DROP TABLE nums;
CREATE TABLE nums(n INT NOT NULL PRIMARY KEY CHECK (n BETWEEN 0 AND 999999));
INSERT INTO Nums
WITH cte(n) AS
(
SELECT day_of_calendar - 1
FROM sys_calendar.CALENDAR
WHERE day_of_calendar BETWEEN 1 AND 1000
)
SELECT
t1.n +
t2.n * 1000
FROM cte t1 CROSS JOIN cte t2;
COLLECT STATISTICS COLUMN(n) ON Nums;
The COLLECT STATS is the most important step to get correct estimates.
Now it's a simple
SELECT n FROM nums WHERE n BETWEEN 1 AND 10;
There's also a nice UDF on GitHub for creating sequences which is easy to use:
SELECT DATE '2010-01-01' + SEQUENCE
FROM TABLE(gen_sequence(0,DATE '2014-05-01' - DATE '2010-01-01')) AS t;
SELECT SEQUENCE
FROM TABLE(gen_sequence(1,10)) AS t;
But it's usually hard to convince your DBA to install any C-UDFs and the number of rows returned is unknown again.
sequence 1 to 10
sel sum (1) over (ROWS UNBOUNDED PRECEDING) as seq_val
from sys_calendar.CALENDAR
qualify row_number () over (order by 1)<=10

how to count each characters in a string and display each instance in a table format

my table master_schedule
cable_no
D110772
D110773
D110774
D110775
D110776
D110777
D110778
D110779
D110880
I would like to create a loop so that each character in the string counted and displayed
D 9
1 18
2 1
3 1
AND SO ON .......
how can i modify in these sql query mentioned below:
select (sum(LEN(cable_no) - LEN(REPLACE(cable_no, 'D', '')))*2) as FERRUL_qtyx2
from MASTER_schedule
Something like this:
select substring(cable_no, n.n, 1) as letter, count(*) as cnt
from FERRUL_qtyx2 t cross join
(select 1 as n union all select 2 union all select 3 union all select 4 union all
select 5 union all select 6 union all select 7
) n
group by substring(cable_no, n.n, 1);
This creates a sequence of numbers n up to the length of the string. It then uses cross join and substring() to extract the nth character of each cable_no.
In general, this will be faster than doing a union all seven times. The union all approach will typically scan the table 7 times. This will scan the table only once.
you can use recursive common table expression:
with cte(symbol, cable_no) as (
select
left(cable_no, 1), right(cable_no, len(cable_no) - 1)
from Table1
union all
select
left(cable_no, 1), right(cable_no, len(cable_no) - 1)
from cte
where cable_no <> ''
)
select symbol, count(*)
from cte
group by symbol
=> sql fiddle demo
Another approach (made after Gordon Linoff solution):
;with cte(n) as (
select 1
union all
select n + 1 from cte where n < 7
)
select substring(t.cable_no, n.n, 1) as letter, count(*) as cnt
from #Table1 as t
cross join cte as n
group by substring(t.cable_no, n.n, 1);
=> sql fiddle demo

SQL Select 'n' records without a Table

Is there a way of selecting a specific number of rows without creating a table. e.g. if i use the following:
SELECT 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
It will give me 10 across, I want 10 New Rows.
Thanks
You can use a recursive CTE to generate an arbitrary sequence of numbers in T-SQL like so:
DECLARE #start INT = 1;
DECLARE #end INT = 10;
WITH numbers AS (
SELECT #start AS number
UNION ALL
SELECT number + 1
FROM numbers
WHERE number < #end
)
SELECT *
FROM numbers
OPTION (MAXRECURSION 0);
If you have a fixed number of rows, you can try:
SELECT 1
UNION
SELECT 2
UNION
SELECT 3
UNION
SELECT 4
UNION
SELECT 5
UNION
SELECT 6
UNION
SELECT 7
UNION
SELECT 8
UNION
SELECT 9
UNION
SELECT 10
This is a good way if you need a long list (so you don't need lots of UNIONstatements:
WITH CTE_Numbers AS (
SELECT n = 1
UNION ALL
SELECT n + 1 FROM CTE_Numbers WHERE n < 10
)
SELECT n FROM CTE_Numbers
The Recursive CTE approach - is realy good.
Be just aware of performance difference. Let's play with a million of records:
Recursive CTE approach. Duration = 14 seconds
declare #start int = 1;
declare #end int = 999999;
with numbers as
(
select #start as number
union all
select number + 1 from numbers where number < #end
)
select * from numbers option(maxrecursion 0);
Union All + Cross Join approach. Duration = 6 seconds
with N(n) as
(
select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all select 1
)
select top 999999
row_number() over(order by (select 1)) as number
from
N n1, N n2, N n3, N n4, N n5, N n6;
Table Value Constructor + Cross Join approach. Duration = 6 seconds
(if SQL Server >= 2008)
with N as
(
select n from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) t(n)
)
select top 999999
row_number() over(order by (select 1)) as number
from
N n1, N n2, N n3, N n4, N n5, N n6;
Recursive CTE + Cross Join approach. :) Duration = 6 seconds
with N(n) as
(
select 1
union all
select n + 1 from N where n < 10
)
select top 999999
row_number() over(order by (select 1)) as number
from
N n1, N n2, N n3, N n4, N n5, N n6;
We will get more amazing effect if we try to INSERT result into a table variable:
INSERT INTO with Recursive CTE approach. Duration = 17 seconds
declare #R table (Id int primary key clustered);
with numbers as
(
select 1 as number
union all
select number + 1 from numbers where number < 999999
)
insert into #R
select * from numbers option(maxrecursion 0);
INSERT INTO with Cross Join approach. Duration = 1 second
declare #C table (Id int primary key clustered);
with N as
(
select n from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) t(n)
)
insert into #C
select top 999999
row_number() over(order by (select 1)) as number
from
N n1, N n2, N n3, N n4, N n5, N n6;
Here is an interesting article about Tally Tables
SELECT 1
UNION
SELECT 2
UNION
...
UNION
SELECT 10 ;
Using spt_values table:
SELECT TOP (1000) n = ROW_NUMBER() OVER (ORDER BY number)
FROM [master]..spt_values ORDER BY n;
Or if the value needed is less than 1k:
SELECT DISTINCT n = number FROM master..[spt_values] WHERE number BETWEEN 1 AND 1000;
This is a table that is used by internal stored procedures for various purposes. Its use online seems to be quite prevalent, even though it is undocumented, unsupported, it may disappear one day, and because it only contains a finite, non-unique, and non-contiguous set of values. There are 2,164 unique and 2,508 total values in SQL Server 2008 R2; in 2012 there are 2,167 unique and 2,515 total. This includes duplicates, negative values, and even if using DISTINCT, plenty of gaps once you get beyond the number 2,048. So the workaround is to use ROW_NUMBER() to generate a contiguous sequence, starting at 1, based on the values in the table.
In addition, to aid more values than 2k records, you could join the table with itself, but in common cases, that table itself is enough.
Performance wise, it shouldn't be too bad (generating a million records, it took 10 seconds on my laptop), and the query is quite easy to read.
Source: http://sqlperformance.com/2013/01/t-sql-queries/generate-a-set-1
Using PIVOT (for some cases it would be overkill)
DECLARE #Items TABLE(a int, b int, c int, d int, e int);
INSERT INTO #Items
VALUES(1, 2, 3, 4, 5)
SELECT Items
FROM #Items as p
UNPIVOT
(Items FOR Seq IN
([a], [b], [c], [d], [e]) ) AS unpvt
;WITH nums AS
(SELECT 1 AS value
UNION ALL
SELECT value + 1 AS value
FROM nums
WHERE nums.value <= 99)
SELECT *
FROM nums
Using GENERATE_SERIES - SQL Server 2022
Generates a series of numbers within a given interval. The interval and the step between series values are defined by the user.
SELECT value
FROM GENERATE_SERIES(START = 1, STOP = 10);