get list of unused ids - sql

We have current table named Article:
id
name
1
artikel_a
2
artikel_b
3
artikel_c
id is a numeric(5, 0)
Its very important that similar articles have very similar IDs, so my client wants to see list of all the possible (currently unused) id numbers when he creates a new article record. That way they can look at a range that fits for current article creation.
How can I do this in SQL Server?

One possible solution
Declare #YourTable Table ([id] int,[name] varchar(50)) Insert Into #YourTable Values
(1,'aaa')
,(2,'bbb')
,(3,'ccc')
,(25,'ddd')
,(50,'eee')
Select R1 = min(N)
,R2 = max(N)
From (
Select N
,Grp = N-row_number() over (order by N)
From (
Select Top 99999 N=Row_Number() Over (Order By (Select NULL))
From master..spt_values n1, master..spt_values n2
) src
where not exists (Select 1 from #YourTable where N=id)
) A
Group By Grp
Results of Available IDs
R1 R2
4 24
26 49
51 99999
Note:
Subquery A will give you a long list of open ID's

Related

Count length of consecutive duplicate values for each id

I have a table as shown in the screenshot (first two columns) and I need to create a column like the last one. I'm trying to calculate the length of each sequence of consecutive values for each id.
For this, the last column is required. I played around with
row_number() over (partition by id, value)
but did not have much success, since the circled number was (quite predictably) computed as 2 instead of 1.
Please help!
First of all, we need to have a way to defined how the rows are ordered. For example, in your sample data there is not way to be sure that 'first' row (1, 1) will be always displayed before the 'second' row (1,0).
That's why in my sample data I have added an identity column. In your real case, the details can be order by row ID, date column or something else, but you need to ensure the rows can be sorted via unique criteria.
So, the task is pretty simple:
calculate trigger switch - when value is changed
calculate groups
calculate rows
That's it. I have used common table expression and leave all columns in order to be easy for you to understand the logic. You are free to break this in separate statements and remove some of the columns.
DECLARE #DataSource TABLE
(
[RowID] INT IDENTITY(1, 1)
,[ID]INT
,[value] INT
);
INSERT INTO #DataSource ([ID], [value])
VALUES (1, 1)
,(1, 0)
,(1, 0)
,(1, 1)
,(1, 1)
,(1, 1)
--
,(2, 0)
,(2, 1)
,(2, 0)
,(2, 0);
WITH DataSourceWithSwitch AS
(
SELECT *
,IIF(LAG([value]) OVER (PARTITION BY [ID] ORDER BY [RowID]) = [value], 0, 1) AS [Switch]
FROM #DataSource
), DataSourceWithGroup AS
(
SELECT *
,SUM([Switch]) OVER (PARTITION BY [ID] ORDER BY [RowID]) AS [Group]
FROM DataSourceWithSwitch
)
SELECT *
,ROW_NUMBER() OVER (PARTITION BY [ID], [Group] ORDER BY [RowID]) AS [GroupRowID]
FROM DataSourceWithGroup
ORDER BY [RowID];
You want results that are dependent on actual data ordering in the data source. In SQL you operate on relations, sometimes on ordered set of relations rows. Your desired end result is not well-defined in terms of SQL, unless you introduce an additional column in your source table, over which your data is ordered (e.g. auto-increment or some timestamp column).
Note: this answers the original question and doesn't take into account additional timestamp column mentioned in the comment. I'm not updating my answer since there is already an accepted answer.
One way to solve it could be through a recursive CTE:
create table #tmp (i int identity,id int, value int, rn int);
insert into #tmp (id,value) VALUES
(1,1),(1,0),(1,0),(1,1),(1,1),(1,1),
(2,0),(2,1),(2,0),(2,0);
WITH numbered AS (
SELECT i,id,value, 1 seq FROM #tmp WHERE i=1 UNION ALL
SELECT a.i,a.id,a.value, CASE WHEN a.id=b.id AND a.value=b.value THEN b.seq+1 ELSE 1 END
FROM #tmp a INNER JOIN numbered b ON a.i=b.i+1
)
SELECT * FROM numbered -- OPTION (MAXRECURSION 1000)
This will return the following:
i id value seq
1 1 1 1
2 1 0 1
3 1 0 2
4 1 1 1
5 1 1 2
6 1 1 3
7 2 0 1
8 2 1 1
9 2 0 1
10 2 0 2
See my little demo here: https://rextester.com/ZZEIU93657
A prerequisite for the CTE to work is a sequenced table (e. g. a table with an identitycolumn in it) as a source. In my example I introduced the column i for this. As a starting point I need to find the first entry of the source table. In my case this was the entry with i=1.
For a longer source table you might run into a recursion-limit error as the default for MAXRECURSION is 100. In this case you should uncomment the OPTION setting behind my SELECT clause above. You can either set it to a higher value (like shown) or switch it off completely by setting it to 0.
IMHO, this is easier to do with cursor and loop.
may be there is a way to do the job with selfjoin
declare #t table (id int, val int)
insert into #t (id, val)
select 1 as id, 1 as val
union all select 1, 0
union all select 1, 0
union all select 1, 1
union all select 1, 1
union all select 1, 1
;with cte1 (id , val , num ) as
(
select id, val, row_number() over (ORDER BY (SELECT 1)) as num from #t
)
, cte2 (id, val, num, N) as
(
select id, val, num, 1 from cte1 where num = 1
union all
select t1.id, t1.val, t1.num,
case when t1.id=t2.id and t1.val=t2.val then t2.N + 1 else 1 end
from cte1 t1 inner join cte2 t2 on t1.num = t2.num + 1 where t1.num > 1
)
select * from cte2

I need to be able to generate non-repetitive 8 character random alphanumeric for 2.5 million records

I need to be able to apply unique 8 character strings per row on a table that has almost 2.5 million records.
I have tried this:
UPDATE MyTable
SET [UniqueID]=SUBSTRING(CONVERT(varchar(255), NEWID()), 1, 8)
Which works, but when I check the uniqueness of the ID's, I receive duplicates
SELECT [UniqueID], COUNT([UniqueID])
FROM NicoleW_CQ_2019_Audi_CR_Always_On_2019_T1_EM
GROUP BY [UniqueID]
HAVING COUNT([UniqueID]) > 1
I really would just like to update the table, as above, with just a simple line of code, if possible.
Here's a way that uses a temporary table to assure the uniqueness
Create and fill a #temporary table with unique random 8 character codes.
The SQL below uses a FOR XML trick to generate the codes in BASE62 : [A-Za-z0-9]
Examples : 8Phs7ZYl, ugCKtPqT, U9soG39q
A GUID only uses the characters [0-9A-F].
For 8 characters that can generate 16^8 = 4294967296 combinations.
While with BASE62 there are 62^8 = 2.183401056e014 combinations.
So the odds that a duplicate is generated are significantly lower with BASE62.
The temp table should have an equal of larger amount of records than the destination table.
This example only generates 100000 codes. But you get the idea.
IF OBJECT_ID('tempdb..#tmpRandoms') IS NOT NULL DROP TABLE #tmpRandoms;
CREATE TABLE #tmpRandoms (
ID INT PRIMARY KEY IDENTITY(1,1),
[UniqueID] varchar(8),
CONSTRAINT UC_tmpRandoms_UniqueID UNIQUE ([UniqueID])
);
WITH DIGITS AS
(
select n
from (values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(n)
),
NUMS AS
(
select (d5.n*10000 + d4.n*1000 + d3.n*100 + d2.n * 10 + d1.n) as n
from DIGITS d1
cross join DIGITS d2
cross join DIGITS d3
cross join DIGITS d4
cross join DIGITS d5
)
INSERT INTO #tmpRandoms ([UniqueID])
SELECT DISTINCT LEFT(REPLACE(REPLACE((select CAST(NEWID() as varbinary(16)), n FOR XML PATH(''), BINARY BASE64),'+',''),'/',''), 8) AS [UniqueID]
FROM NUMS;
Then update your table with it
WITH CTE AS
(
SELECT ROW_NUMBER() OVER (ORDER BY ID) AS RN, [UniqueID]
FROM YourTable
)
UPDATE t
SET t.[UniqueID] = tmp.[UniqueID]
FROM CTE t
JOIN #tmpRandoms tmp ON tmp.ID = t.RN;
A test on rextester here
Can you just use numbers and assign a randomish value?
with toupdate as (
select t.*,
row_number() over (order by newid()) as random_enough
from mytable t
)
update toupdate
set UniqueID = right(concat('00000000', random_enough), 8);
See: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/a289ed64-2038-415e-9f5d-ae84e50fe702/generate-random-string-of-length-5-az09?forum=transactsql
Alter: DECLARE #s char(5) and SELECT TOP (5) c1 to fix length you want.

Guarantee random inserting

I am trying to pregenerate some alphanumeric strings and insert the result into a table. The length of string will be 5. Example: a5r67. Basically I want to generate some readable strings for customers so they can access their orders like
www.example.com/order/a5r67. Now I have a select statement:
;WITH
cte1 AS(SELECT * FROM (VALUES('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z')) AS v(t)),
cte2 AS(SELECT * FROM (VALUES('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z')) AS v(t)),
cte3 AS(SELECT * FROM (VALUES('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z')) AS v(t)),
cte4 AS(SELECT * FROM (VALUES('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z')) AS v(t)),
cte5 AS(SELECT * FROM (VALUES('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z')) AS v(t))
INSERT INTO ProductHandles(ID, Used)
SELECT cte1.t + cte2.t + cte3.t + cte4.t + cte5.t, 0
FROM cte1
CROSS JOIN cte2
CROSS JOIN cte3
CROSS JOIN cte4
CROSS JOIN cte5
Now the problem is I need to write something like this to get a value from the table:
SELECT TOP 1 ID
FROM ProductHandles
WHERE Used = 0
I will have index on the Used column so it will be fast. The problem with this is that it comes with order:
00000
00001
00002
...
I know that I can order by NEWID(), but that will be much slower. I know that there is no guarantee of ordering unless we specify Order By clause. What is needed is opposite. I need guaranteed chaos, but not by ordering by NEWID() each time customer creates order.
I am going to use it like:
WITH cte as (
SELECT TOP 1 * FROM ProductHandles WHERE Used = 0
--I don't want to order by newid() here as it will be slow
)
UPDATE cte
SET Used = 1
OUTPUT INSERTED.ID
If you add an identity column to the table, and use order by newid() when inserting the records (that will be slow but it's a one time thing that's being done offline from what I understand) then you can use order by on the identity column to select the records in the order they where inserted to the table.
From the Limitations and Restrictions part of the INSERT page in Microsoft Docs:
INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted.
This means that by doing this you are effectively making the identity column ordered by the same random order the rows where selected in the insert...select statement.
Also, there is no need to repeat the same cte 5 times - you are already repeating the cross apply:
CREATE TABLE ProductHandles(sort int identity(1,1), ID char(5), used bit)
;WITH
cte AS(SELECT * FROM (VALUES('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z')) AS v(t))
INSERT INTO ProductHandles(ID, Used)
SELECT a.t + b.t + c.t + d.t + e.t, 0
FROM cte a
CROSS JOIN cte b
CROSS JOIN cte c
CROSS JOIN cte d
CROSS JOIN cte e
ORDER BY NEWID()
Then the cte can have an order by clause that guarantees the same random order as the rows returned from the select statement populating this table:
WITH cte as (
SELECT TOP 1 *
FROM ProductHandles
WHERE Used = 0
ORDER BY sort
)
UPDATE cte
SET Used = 1
OUTPUT INSERTED.ID
You can see a live demo on rextester. (with only digits since it's taking too long otherwise)
Here's a slightly different option...
Rather than trying to generate all possible values in a single sitting, you could simply generate a million or two at a time and generate more as they get used up.
Using this approach, you drastically reduce the the initial creation time and eliminate the need to maintain the massive table of values, the majority of which, that will never be used.
CREATE TABLE dbo.ProductHandles (
rid INT NOT NULL
CONSTRAINT pk_ProductHandles
PRIMARY KEY CLUSTERED,
ID_Value CHAR(5) NOT NULL
CONSTRAINT uq_ProductHandles_IDValue
UNIQUE WITH (IGNORE_DUP_KEY = ON), -- prevents the insertion of duplicate values w/o generating any errors.
Used BIT NOT NULL
CONSTRAINT df_ProductHandles_Used
DEFAULT (0)
);
-- Create a filtered index to help facilitate fast searches
-- of unused values.
CREATE NONCLUSTERED INDEX ixf_ProductHandles_Used_rid
ON dbo.ProductHandles (Used, rid)
INCLUDE(ID_Value)
WHERE Used = 0;
--==========================================================
WHILE 1 = 1 -- The while loop will attempt to insert new rows, in 1M blocks, until required minimum of unused values are available.
BEGIN
IF (SELECT COUNT(*) FROM dbo.ProductHandles ph WHERE ph.Used = 0) > 1000000 -- the minimum num of unused ID's you want to keep on hand.
BEGIN
BREAK;
END;
ELSE
BEGIN
WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
cte_Tally (n) AS (
SELECT TOP (1000000) -- Sets the "block size" of each insert attempt.
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
cte_n3 a CROSS JOIN cte_n3 b
)
INSERT dbo.ProductHandles (rid, ID_Value, Used)
SELECT
t.n + ISNULL((SELECT MAX(ph.rid) FROM dbo.ProductHandles ph), 0),
CONCAT(ISNULL(c1.char_1, n1.num_1), ISNULL(c2.char_2, n2.num_2), ISNULL(c3.char_3, n3.num_3), ISNULL(c4.char_4, n4.num_4), ISNULL(c5.char_5, n5.num_5)),
0
FROM
cte_Tally t
-- for each of the 5 positions, randomly generate numbers between 0 & 36.
-- 0-9 are left as numbers.
-- 10 - 36 are converted to lower cased letters.
CROSS APPLY ( VALUES (ABS(CHECKSUM(NEWID())) % 36) ) n1 (num_1)
CROSS APPLY ( VALUES (CHAR(CASE WHEN n1.num_1 > 9 THEN n1.num_1 + 87 END)) ) c1 (char_1)
CROSS APPLY ( VALUES (ABS(CHECKSUM(NEWID())) % 36) ) n2 (num_2)
CROSS APPLY ( VALUES (CHAR(CASE WHEN n2.num_2 > 9 THEN n2.num_2 + 87 END)) ) c2 (char_2)
CROSS APPLY ( VALUES (ABS(CHECKSUM(NEWID())) % 36) ) n3 (num_3)
CROSS APPLY ( VALUES (CHAR(CASE WHEN n3.num_3 > 9 THEN n3.num_3 + 87 END)) ) c3 (char_3)
CROSS APPLY ( VALUES (ABS(CHECKSUM(NEWID())) % 36) ) n4 (num_4)
CROSS APPLY ( VALUES (CHAR(CASE WHEN n4.num_4 > 9 THEN n4.num_4 + 87 END)) ) c4 (char_4)
CROSS APPLY ( VALUES (ABS(CHECKSUM(NEWID())) % 36) ) n5 (num_5)
CROSS APPLY ( VALUES (CHAR(CASE WHEN n5.num_5 > 9 THEN n5.num_5 + 87 END)) ) c5 (char_5);
END;
END;
After the initial creation, move the code in the WHILE loop to a stored procedure and schedule it to automatically run on a periodic basis.
If I'm understanding this right, It looks like your attempting to separate the URL/visible data from the DB record ID, as most apps use, and provide something that is not directly related to an ID field that the user will see. NEWID() does allow control of the number of characters so you could generate a smaller field with a smaller index. Or just use a portion of the full NEWID()
SELECT CONVERT(varchar(255), NEWID())
SELECT SUBSTRING(CONVERT(varchar(40), NEWID()),0,5)
You might also want to look at a checksum field, I don't know if its faster on indexing though. You could get crazier by combining random NEWID() with a checksum across 2 or 3 fields.
SELECT BINARY_CHECKSUM(5 ,'EP30461105',1)

Group rows by a certain identifier and update a group id column to track which group they belong

There is a table where the group id column needs to be populated with a generated incremental number like the following example:
name batch
a 1
b 1
c 1
d 2
e 2
f 2
g 3
h 3
i 3
j 4
k 4
Order does not matter as long as the groups have the same number of elements
Looking for some ideas how this can be achieved.
What i was thinking is to build a stored procedure that iterates through the result set.
Also i have this "pseudo code" that i'm working with, but obviously has issues, and also does not do the update part just kind of selects and I was thinking to create a temp table which seems like an overkill
SELECT
name,
batch = (ROW_NUMBER() OVER(ORDER BY name)) % ((Select COUNT(*) from [abc].[dbo].[cdg]) / 30))
FROM
[abc].[dbo].[cdg] x
Try this Integer division method
Declare #param int = 3
SELECT name,
( ( Row_number()OVER(ORDER BY name) - 1 ) / #param ) + 1 as Batch
FROM tablename
If you want to derive the #param from table count then use Count(*) Over()
SELECT name,
( ( Row_number()OVER(ORDER BY name) - 1 ) / (Count(*) Over()/30) ) + 1 as Batch
FROM tablename
To update the batch column
;with update_cte as
(
SELECT name,
Batch,
( ( Row_number()OVER(ORDER BY name) - 1 ) / #param ) + 1 as gen_seq
FROM tablename
)
Update update_cte set Batch = gen_seq
Are you looking for something like this. You can use NTILE function.
drop table if exists dbo.Tile;
create table dbo.Tile (
Chr char(1)
);
insert into dbo.Tile (Chr)
values ('a'), ('b'), ('c'), ('d'), ('e')
, ('f'), ('g'), ('h'), ('i'), ('j')
, ('k');
declare #Tile int
set #Tile = ceiling((select count(t.Chr) from dbo.Tile t) / 3.)
select
*
, ntile(#Tile) over (order by t.Chr) as batch
from dbo.Tile t

Sorting two dimensional table by reordering rows and columns

Is it possible and if it is, how to sort two dimensional table, by reordering columns and rows, and using only these two operations, that table's biggest numbers are concentrated in top-left corner?
Any help would be very greatly appreciated.
For example, we can use this table:
Column 1 Column 2 Column 3
Row 1 2 4 5
Row 2 3 2 6
Row 3 7 2 6
Result I think would be this, but I am not sure:
Column 1 Column 2 Column 3
Row 1 7 6 2
Row 2 3 6 2
Row 3 2 5 4
For now, I only thought about summing rows and columns and sorting them to left-top descending.
More of a MatLab guy when it comes to matrix manipulations, but perhaps this may help.
Here we use a TVF to create a dynamic EAV structure. If you can't use a function, it is a small matter to go in-line.
Also, the final pivot can be dynamic if needed
Example
Declare #YourTable table (Column1 int,Column2 int,Column3 int)
Insert Into #YourTable values
(2,4,5),
(3,2,6),
(7,2,6)
;with cte as (
Select RowNr=Dense_Rank() over (Order By RowTotal Desc,Entity )
,ColNr=Dense_Rank() over (Order By ColTotal Desc,Attribute)
,Value
From (
Select *
,RowTotal = max(cast(Value as float)) over(Partition By Entity)
,ColTotal = max(cast(Value as float)) over(Partition By Attribute)
From [dbo].[udf-EAV]((Select RN=Row_Number() over (Order By (Select null)),* From #YourTable for XML RAW))
) A
)
Select [1] Col1,[2] Col2,[3] Col3
From cte
Pivot (max(Value) For [ColNr] in ([1],[2],[3]) ) p
Returns
Col1 Col2 Col3
7 6 2
3 6 2
2 5 4
The UDF if Interested
CREATE FUNCTION [dbo].[udf-EAV](#XML xml)
Returns Table
As
Return (
with cteKey(k) as (Select Top 1 xAtt.value('local-name(.)','varchar(100)') From #XML.nodes('/row') As A(xRow) Cross Apply A.xRow.nodes('./#*') As B(xAtt))
Select Entity = xRow.value('#*[1]','varchar(50)')
,Attribute = xAtt.value('local-name(.)','varchar(100)')
,Value = xAtt.value('.','varchar(max)')
From #XML.nodes('/row') As A(xRow)
Cross Apply A.xRow.nodes('./#*') As B(xAtt)
Where xAtt.value('local-name(.)','varchar(100)') Not In (Select k From cteKey)
)
-- Notes: First Field in Query will be the Entity
-- Select * From [dbo].[udf-EAV]((Select UTCDate=GetUTCDate(),* From sys.dm_os_sys_info for XML RAW))