I need to generate a unique 13 digit number.
Can Sql Server generate this number for me somehow if I create a table with the 13 digit number as a primary key?
Update
I want the number to look like a random number, so not an autoincrementing number.
It has to be 13 digits, it shouldn't be auto-incrementing, and it should be unique. The number shouldn't have many zeros in it, but it can contain digits from 0-9.
This number should look like a credit card number, so no trailing zeros.
My suggestion would be to have an identity column on the table that auto-increments. Then, define your value based on this. A simple way would be:
create table t (
tId int identity(1, 1) not null,
. . .
myId as cast(cast(rand(tId)*10000000000000 as bigint) as varchar(13))
)
This shows it as a computed column. Of course, you can assign the value when each row is created. This is not guaranteed to produce different results, but it is highly, highly unlikely that you would see a collision.
The following alternative is also not guaranteed, but might work:
create table t (
tId varchar(13) default cast(cast(rand(checksum(getdate())) * 10000000000000 as bigint) as varchar(13)),
. . .
)
EDIT:
The chance of a collision is a bit higher than I expected -- my intuition on 13-digit hash codes is, I guess, not what it should be.
In any case, there are two sources of collisions. The first is the random number generator producing the same value. To handle that, just make the assumption that the random number generator in conjunction with checksum() really is random. So, the question is: What is the chance of two random numbers less than 10,000,000,000,000 being the same value? I'll let interested parties search the web for a formula to calculate this.
If you generate 1,000 numbers, then the probability is basically 0% that any two would be the same. That is, you are safe for the first 1,000 numbers if you assume they are distinct. Here is a summary:
Numbers generated | Chance of a collision
------------------+----------------------
1,000             | 0.0000%
10,000            | 0.0005%
100,000           | 0.0500%
1,000,000         | 4.8771%
10,000,000        | 99.3262%
Up to a few hundred thousand values, you are probably pretty safe. When you get into the millions -- even the low millions -- the chance of collision increases substantially.
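For anyone who would rather not search for the formula: the usual birthday-problem approximation for n values drawn uniformly from N possibilities is p = 1 - exp(-n(n-1)/(2N)). The query below is a minimal sketch that evaluates it for N = 10,000,000,000,000. It assumes the generator really is uniform, and the column aliases are only illustrative. It should reproduce the percentages listed above, up to rounding.

```sql
-- Birthday-problem approximation: p = 1 - exp(-n(n-1) / (2N)),
-- with N = 1e13 possible values (assumes a uniform random generator).
SELECT v.n AS numbers_generated,
       CAST(ROUND(100 * (1 - EXP(-v.n * (v.n - 1) / (2 * 1e13))), 4)
            AS decimal(9, 4)) AS collision_pct
FROM (VALUES (1000.0), (10000.0), (100000.0), (1000000.0), (10000000.0)) AS v(n);
```

Changing the list in the VALUES clause gives the estimate for any other volume.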
At some point, if you want lots and lots of unique values, you are going to have to create a table that contains the unique values and a process for choosing a value not in the table.
As John Barça points out: Do not use this method for cat photos on Facebook.
Just create a Guid (select newid() ) & parse it... remove the { and the '-' & do a length of 13 select on it.
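One caveat: a GUID is hexadecimal, so 13 characters cut out of it can contain the letters A-F. The sketch below is a variant rather than the string parsing described above: it reinterprets the first 8 bytes of NEWID() as a bigint and folds that into the 13-digit range. The alias candidate_id is illustrative, and uniqueness still has to be enforced separately (for example with a primary key).

```sql
-- Variant (an assumption, not the string-parsing method described above):
-- take 8 bytes of a GUID as a bigint and fold them into
-- 1000000000000..9999999999999, so the result always has 13 digits.
SELECT CAST(1000000000000
            + ABS(CAST(CAST(NEWID() AS binary(8)) AS bigint) % 9000000000000)
       AS bigint) AS candidate_id;
```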
I use a similar method to generate random/unique table names in a reporting system. It might do the trick for you. Just adjust the multiplier to impact the final integer length.
SELECT CONVERT(BIGINT,RAND()*10000000000000)
As a table...
CREATE TABLE #test (testID INT,
UniqueID AS CONVERT(BIGINT,RAND()*10000000000000))
INSERT INTO #test (testID)
SELECT 1
SELECT * FROM #TEST
DROP TABLE #test
Just insert a value into testID (can be 1 every time) and a new UniqueID will generate. You should have a primary key on any production table though.
NB: While the chances of a duplicate ever happening are very small, it could still happen.
SELECT CEILING(RAND()*9999999999999)
Something like this might work:
use [chamomile];
go
if object_id(N'[utility].[table_01]', N'U') is not null
drop table [utility].[table_01];
go
if object_id(N'[utility].[generate_random_sequence]', N'FN') is not null
drop function [utility].[generate_random_sequence];
go
/*
select [utility].[generate_random_sequence] (rand());
*/
create function [utility].[generate_random_sequence] (
@random [float])
returns [bigint]
as
begin
-- Map @random (0 <= @random < 1) straight into the 13-digit range
-- 1000000000000..9999999999999. Retrying in a loop with the same @random
-- could never produce a different value, so no loop is used.
declare @return [bigint] = 1000000000000 + floor(@random * 9000000000000);
return @return;
end;
go
if object_id(N'[utility].[table_01]', N'U') is not null
drop table [utility].[table_01];
go
create table [utility].[table_01] (
[my_id] as [utility].[generate_random_sequence] (rand())
, [flower] [sysname]
);
go
insert into [utility].[table_01]
([flower])
values (N'rose');
select *
from [utility].[table_01];
Related
I am using the function below to generate a random number between 0 and 99999999999.
CREATE VIEW [dbo].[rndView]
AS
SELECT RAND() rndResult
GO
ALTER function [dbo].[RandomPass]()
RETURNS NUMERIC(18,0)
as
begin
DECLARE @RETURN NUMERIC(18,0)
DECLARE @Upper NUMERIC(18,0);
DECLARE @Lower NUMERIC(18,0);
DECLARE @Random float;
SELECT @Random = rndResult
FROM rndView
SET @Lower = 0
SET @Upper = 99999999999
set @RETURN= (ROUND(((@Upper - @Lower -1) * @Random + @Lower), 0))
return @RETURN
end;
However, I need to make sure that the returned number has never been used before in the same app. In .net I would create a while loop and keep looping until the returned value is not found in a table that stores previously used values. Is there a way to achieve the same result directly in SQL, ideally without using loops? If there is no way to do that without loops, I think it would still be more efficient to do it in an SQL function rather than having a loop in .net performing a number of query requests.
You will need to store the used values in a table, and use a recursive query to generate the next value.
The answer depends on the RDBMS you are using.
Below are two examples, in PostgreSQL and MS SQL Server, that would solve your problem.
PostgreSQL
First, create a table that will hold your consumed ids :
CREATE TABLE consumed_ids (
id BIGINT PRIMARY KEY NOT NULL
);
The PRIMARY KEY is not strictly necessary, but it will generate an index, which will speed up the next query, and ensure that two equal ids are never generated.
Then, use the following query to obtain a new id :
WITH RECURSIVE T AS (
SELECT 1 AS n, FLOOR(RANDOM() * 100000000000) AS v
UNION ALL
SELECT n + 1, FLOOR(RANDOM() * 100000000000)
FROM T
WHERE EXISTS(SELECT * FROM consumed_ids WHERE id = v)
)
INSERT INTO consumed_ids
SELECT v
FROM T
ORDER BY n DESC
LIMIT 1
RETURNING id;
The logic is that as long as the (last) id generated is already consumed, we generate a new id. The column n of the CTE is there only to retrieve the last generated id at the end, but you may also use it to limit the number of generated random numbers (for example, give up if n > 10).
(tested using PostgreSQL 12.4)
MS SQL Server
First, create a table that will hold your consumed ids :
CREATE TABLE consumed_ids (
id BIGINT PRIMARY KEY NOT NULL
);
Then, use the following query to obtain a new id :
WITH T AS (
SELECT 1 AS n, FLOOR(RAND() * 100000000000) AS v
UNION ALL
SELECT n + 1, FLOOR(RAND() * 100000000000)
FROM T
WHERE EXISTS(SELECT * FROM consumed_ids WHERE id = v)
)
INSERT INTO consumed_ids (id)
OUTPUT Inserted.id
SELECT TOP 1 v
FROM T
ORDER BY n DESC;
(tested using MS SQL Server 2019).
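Note, however, that MS SQL Server will give up after 100 tries by default (the default MAXRECURSION limit, which can be changed with an OPTION (MAXRECURSION n) query hint).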
There is no such thing as a random number without replacement.
You need to store the numbers that have already been used in a table, which I would define as:
create table used_random_numbers (
number decimal(11, 0) primary key
);
Then, when you create a new number, insert it into the table.
In the part of the code that generates a number, use a while loop. Within the while loop, check that the number doesn't exist.
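A minimal sketch of that loop in T-SQL, assuming the used_random_numbers table defined above and an 11-digit range. The variable name @candidate is illustrative.

```sql
-- Draw a candidate, and redraw while it is already in the table.
DECLARE @candidate decimal(11, 0) = FLOOR(RAND() * 100000000000);

WHILE EXISTS (SELECT 1 FROM used_random_numbers WHERE number = @candidate)
BEGIN
    SET @candidate = FLOOR(RAND() * 100000000000);
END;

-- Remember the number so it is never handed out again.
INSERT INTO used_random_numbers (number) VALUES (@candidate);

SELECT @candidate AS new_number;
```

If several sessions run this at the same time, two of them can pick the same free number. The primary key then rejects the second insert, so the caller should be prepared to retry.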
Now, there are some things you can do to make this more efficient as the numbers grow larger -- and ways that don't require remembering all the previous values.
First, perhaps UUID/GUID is sufficient. This is the industry standard for a "random" id -- although it is a HEX string rather than a number in most databases. The exact syntax depends on the database.
Another approach is to have an 11 digit number. The first or last 10 digits could be the Unix epoch time (seconds since 1970-01-01) -- either explicitly or under some transformation so the value "looks" random. The additional digit would then be a random digit. Of course, you could extend this to minutes or days so you have more random digits.
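As a rough sketch of the time-based variant in T-SQL, assuming the plain (untransformed) epoch seconds are acceptable as the first 10 digits. The variable names are illustrative.

```sql
-- 10-digit Unix epoch seconds followed by one random digit (11 digits total).
-- DATEDIFF returns an int, so this form only works until 2038.
DECLARE @epoch bigint = DATEDIFF(SECOND, '1970-01-01', GETUTCDATE());
DECLARE @id bigint = @epoch * 10 + CAST(FLOOR(RAND() * 10) AS bigint);

SELECT @id AS time_based_id;
```

With a single random digit, two ids generated within the same second collide one time in ten, so a uniqueness check is still needed unless ids are created less often than once per second.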
My situation is like that :
I have these tables:
CREATE TABLE [dbo].[HeaderResultPulser]
(
[Id] BIGINT IDENTITY (1, 1) NOT NULL,
[ReportNumber] CHAR(255) NOT NULL,
[ReportDescription] CHAR(255) NOT NULL,
[CatalogNumber] NCHAR(255) NOT NULL,
[WorkerName] NCHAR(255) DEFAULT ('') NOT NULL,
[LastCalibrationDate] DATETIME NOT NULL,
[NextCalibrationDate] DATETIME NOT NULL,
[MachineNumber] INT NOT NULL,
[EditTime] DATETIME NOT NULL,
[Age] NCHAR(255) DEFAULT ((1)) NOT NULL,
[Current] INT DEFAULT ((-1)) NOT NULL,
[Time] BIGINT DEFAULT ((-1)) NOT NULL,
[MachineName] NVARCHAR(MAX) DEFAULT ('') NOT NULL,
[BatchNumber] NVARCHAR(MAX) DEFAULT ('') NOT NULL,
CONSTRAINT [PK_HeaderResultPulser]
PRIMARY KEY CLUSTERED ([Id] ASC)
);
CREATE TABLE [dbo].[ResultPulser]
(
[Id] BIGINT IDENTITY (1, 1) NOT NULL,
[ReportNumber] CHAR(255) NOT NULL,
[BatchNumber] CHAR(255) NOT NULL,
[DateTime] DATETIME NOT NULL,
[Ocv] FLOAT(53) NOT NULL,
[OcvMin] FLOAT(53) NOT NULL,
[OcvMax] FLOAT(53) NOT NULL,
[Ccv] FLOAT(53) NOT NULL,
[CcvMin] FLOAT(53) NOT NULL,
[CcvMax] FLOAT(53) NOT NULL,
[Delta] BIGINT NOT NULL,
[DeltaMin] BIGINT NOT NULL,
[DeltaMax] BIGINT NOT NULL,
[CurrentFail] BIT DEFAULT ((0)) NOT NULL,
[NumberInTest] INT NOT NULL
);
For every row in HeaderResultPulser I have multiple rows in ResultPulser.
My key for getting the list of data in ResultPulser is [HeaderResultPulser].[ReportNumber]; many rows share the same [ResultPulser].[ReportNumber],
and each of them has a different [ResultPulser].[NumberInTest] value.
For example: in the ResultPulser table the data can look like this:
ReportNumber | NumberInTest
-------------+-------------
0000006211 | 1
0000006211 | 2
0000006211 | 3
0000006211 | 4
0000006211 | 5
0000006211 | 6
0000006212 | 1
0000006212 | 2
0000006212 | 3
0000006212 | 4
0000006212 | 5
NumberInTest can reach 200, 500, 10000, and sometimes even more.
The report number column contains two parts: the first 7 characters are the machine number and the rest is an incrementing number.
For example, 0000006212 is [0000006][212] == [the machine number][the incrementing number]
My query for example :
select
[HeaderResultPulser].[ReportNumber],
max(NumberInTest) as TotalCells
from
ResultPulser, HeaderResultPulser
where
((([ResultPulser].[ReportNumber] like '0000006%' and
CONVERT(INT, SUBSTRING([ResultPulser].[ReportNumber], 8, LEN([ResultPulser].[ReportNumber]))) BETWEEN '211' AND '815')
and ([HeaderResultPulser].[ReportNumber] = [ResultPulser].[ReportNumber])))
group by
[HeaderResultPulser].[ReportNumber]
What I actually want is all the rows for machine number 0000006 whose incrementing number is between 211 and 815 (inclusive).
This query takes about 6-7 seconds.
There is a lot of data: ResultPulser holds hundreds of millions to billions of rows (and will keep growing), and HeaderResultPulser can hold tens of thousands of rows.
The select itself only returns a few hundred rows, or one to two thousand in the worst case, but computing max(NumberInTest) has to read up to a few million rows in ResultPulser.
Is there any way to optimize my query? Or, with this much data, is this simply how long it has to take?
The way you are doing joins is no longer standard. It's also hard to read, and dangerous if you ever need to use left joins. Instead of joining this way:
select *
from T1, T2
where T1.column = T2.column
Use ANSI-92 join syntax instead:
select *
from T1
join T2 on T1.column = T2.column
You said that your "key" was ReportNumber. Why isn't that declared in your schema? It sounds like you want a unique constraint on HeaderResultPulser.ReportNumber, and a foreign key on the ResultPulser table, such that ReportNumber references HeaderResultPulser (ReportNumber).
Since your report number column seems to contain two different values, your table is not in First Normal Form. This is making things difficult for you. Why not split the two parts of the "report number" into two different columns when the data is entered? This will significantly improve your query performance, because you no longer need to perform an expression against the data in the table at query time to separate the ReportNumber into atomic values.
Your comment says that the first 7 characters of the ReportNumber are the MachineNumber. But you already have MachineNumber in the HeaderResultPulser table. So why not just add a separate column for Increment? If you still need ReportNumber to exist as a column, you can make it a calculated column, as the concatenation of MachineNumber and Increment.
If you don't want to touch the "existing" schema, we can do a similar thing in reverse. Your query will not be completely sargable unless you can do something to the schema, because you have to perform some kind of expression on the data in the ReportNumber column. But maybe you have the option to use a calculated column to do this up front:
alter table HeaderResultPulser
add Increment as right(rtrim(ReportNumber), len(ReportNumber) - 7); -- rtrim first: CHAR(255) is space-padded
Now we have the increment as a column in its own right. But it's still being calculated at query time, because it's not persisted. We can make it persisted:
alter table HeaderResultPulser
add Increment as right(rtrim(ReportNumber), len(ReportNumber) - 7) persisted;
We can also index a computed column. Since your required expression is deterministic and precise (see Indexes on Computed Columns), we don't actually have to mark it as persisted:
alter table HeaderResultPulser
add Increment as right(rtrim(ReportNumber), len(ReportNumber) - 7);
create index ix_headerresultpulser_increment on HeaderResultPulser(Increment);
You could do a similar set of operations to create the Increment and MachineNumber on the ResultPulser table. If you always want to use both values, create an index on the combination of (MachineNumber, Increment).
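A sketch of what that could look like, assuming every ReportNumber has at least 8 characters and that the first 7 are the machine number. The index name is illustrative, and both columns are strings here, not integers.

```sql
-- Split ReportNumber on ResultPulser into two computed columns.
-- len() ignores the trailing padding of CHAR(255), so Increment has no spaces.
alter table ResultPulser
add MachineNumber as left(ReportNumber, 7),
    Increment as substring(ReportNumber, 8, len(ReportNumber) - 7);

-- Composite index for queries that filter on both parts.
create index ix_resultpulser_machine_increment
on ResultPulser (MachineNumber, Increment);
```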
The biggest performance gain might be eliminating the outer group by by using a correlated subquery or lateral join:
select hrp.[ReportNumber],
(select max(rp.NumberInTest)
from ResultPulser rp
where rp.ReportNumber = hrp.ReportNumber and
right(rtrim(rp.ReportNumber), 3) between '211' and '815'
) as TotalCells
from HeaderResultPulser hrp
where hrp.ReportNumber like '0000006%';
Your logic looks like it only wants the last three characters of the ReportNumber, so I simplified the logic. I'm not 100% sure that is the case -- it just seems reasonable. Regardless, there is no need to convert the values to integers and then compare as strings. And similar logic can be used even for longer report numbers.
You also want an index on ResultPulser(ReportNumber, NumberInTest) :
create index idx_resultpulser_reportnumber_numberintest on ResultPulser(ReportNumber, NumberInTest)
EDIT:
Actually, I notice that the report number matches between the two tables. So this seems simplest:
select hrp.[ReportNumber],
(select max(rp.NumberInTest)
from ResultPulser rp
where rp.ReportNumber = hrp.ReportNumber
) as TotalCells
from HeaderResultPulser hrp
where hrp.ReportNumber >= '0000006211' and
hrp.ReportNumber <= '0000006815';
You still want to be sure you have the above index on ResultPulser.
If the ReportNumber is not a fixed 10 digits, then you can use:
where hrp.ReportNumber >= '0000006211' and
hrp.ReportNumber <= '0000006815' and
len(hrp.ReportNumber) = 10
This should also use the index and return exactly what you want.
Performance optimization of any query depends on many factors, including the environment in which you host and run it. Hardware and software both play an important part in optimizing heavy database queries. In your case you can look into the following things:
Use ANSI-92 JOIN syntax instead of the implicit comma join
e.g.
select *
from T1
join T2 on T1.column = T2.column
Put indexes on columns like
[ReportNumber]
[NumberInTest]
Note: You may need an index on each column used in the join that is not a primary key.
Remember that MAX is expensive when it has to scan many rows without a supporting index, and that could be the main problem in your query.
Finally, you can look further into optimizing your query using the following online tool, where you can specify your actual query and the environment you are using:
https://www.eversql.com/
Hope it helps.
If you really want to optimize performance, I propose adding a bit of logic beyond SQL structures.
Is it possible for a particular value of ReportNumber to be present in table ResultPulser but not in table HeaderResultPulser? If not, as I suppose, there is no reason to join table HeaderResultPulser.
Then, I propose taking advantage of the fact that the condition on ReportNumber can be expressed equivalently without splitting it into substrings. For your example, the condition
([ResultPulser].[ReportNumber] like '0000006%' and
CONVERT(INT, SUBSTRING([ResultPulser].[ReportNumber], 8,
LEN([ResultPulser].[ReportNumber]))) BETWEEN '211' AND '815')
is equivalent to:
([ResultPulser].[ReportNumber] BETWEEN '0000006211' and '0000006815')
So the proposal is:
Create index on table ResultPulser(ReportNumber, NumberInTest)
Use selections similar to this:
select ReportNumber, max(NumberInTest) as TotalCells
from ResultPulser
where
ReportNumber BETWEEN '0000006211' and '0000006815'
group by
ReportNumber
(Please, add brackets or double quotes and capitalizations as necessary for MS SQL Server and your taste)
I would expect a good database to execute this query with index-only access, which would be optimal from an execution point of view.
Performance depends not only on the execution path, but also on setup and hardware. Please make sure that your database has enough cache and fast disk access. Concurrent load is also very important.
Simply splitting the field ReportNumber into [the machine number] and [the incrementing number] will probably not improve the performance of the query in the form I proposed. But it may be very convenient for other forms of access (other WHERE clauses), and it will reflect the structure of the case. Even more important, it will release you from the imposed limits. Currently, you have 3 digits for [the incrementing number]. Are you sure it will never be necessary to have more than 999 of them for a single [the machine number]?
Why does the field ReportNumber have type char(255) when only 10 characters are used? char(255) has a fixed length, so it is a terrible waste of space; only database compression can help. The space used has a strong influence on performance. Please consider the above remark about the database cache.
If both these fields, [the machine number] and [the incrementing number], are integers, why not split ReportNumber and use an integer type for them?
Side remark: The field names suggest that you are looking for the total number of rows in table ResultPulser that belong to a single entry in table HeaderResultPulser. The proposed query will deliver this only if the numbers in NumberInTest are consecutive, without gaps. If this is not guaranteed, you have to count them rather than seek the maximum.
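For the counting case, a sketch of the same query with count(*) in place of max(NumberInTest), under the same assumption of fixed 10-character report numbers:

```sql
-- Counts the rows per report instead of relying on NumberInTest having no gaps.
select ReportNumber, count(*) as TotalCells
from ResultPulser
where ReportNumber between '0000006211' and '0000006815'
group by ReportNumber;
```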
I need to write a view that does a merge insert/update.
When inserting, I need to insert id.
This id is also being inserted by random number generator in another program ( which I can't change ).
I wanted to do max(id) + 1, but not sure if that is a good idea. Could you suggest any better solutions for this problem?
or
How about using with id as ( dbms_random .... ) do a
select * from table where id = ?
if row is not found, I will insert this id otherwise, I will generate another random and do a select.
if this is for a Primary Key - then how about generating negative numbers for your part of the app (using a sequence) and leaving the random number wizardry in the positives...
You can use rand() function for your request with random number !
Enjoy,
remontees
I have a table that contains a column that has all NULL values. I would like to populate this column with a random number from a given set of numbers.
The set of given numbers will be generated from a SELECT statement that select these numbers from some other table.
E.G:
UPDATE tableA
SET someColumnName = SomeRandomNumberFromSet(SELECT number from tb_Numbers)
How do I accomplish this using MSSQL 2008?
The following isn't particularly efficient but works. The view is required to get around the "Invalid use of a side-effecting operator 'newid' within a function." error. The UDF is assumed to be non deterministic so will always be re-evaluated for each row.
This will avoid any problems with SQL Server adding spools to the plan and replaying earlier results.
If the number of rows to update (or numbers in the set) was much larger I wouldn't use this method.
CREATE VIEW dbo.OneNumber
AS
SELECT TOP 1 number
FROM master..spt_values
ORDER BY NEWID()
GO
CREATE FUNCTION dbo.PickNumber ()
RETURNS int
AS
BEGIN
RETURN (SELECT number FROM dbo.OneNumber)
END
GO
DECLARE @tableA TABLE (someColumnName INTEGER)
INSERT INTO @tableA VALUES (2), (2), (2), (2), (2)
UPDATE @tableA
SET someColumnName = dbo.PickNumber()
SELECT * FROM @tableA
I asked a similar question a long time ago, and got a few different options.
Is this a good or bad way of generating random numbers for each record?
Once you can generate a random number from 1 to n, you can use it to choose the Xth item from your list. (The easiest way is to have a sequential id on your set of legitimate values.)
I have a T-SQL routine that copies user information from one table 'Radius' to another 'Tags'. However, as the rows are transferred, I would also like to include a unique randomly generated code in the INSERT (3 chars long). The code is generated by the WHILE loop below. Any way to do this?
INSERT Tags (UserID, JobID, Code)
SELECT UserID, @JobID, ?????
FROM Radius
Unique random code generator:
WHILE EXISTS (SELECT * FROM Tags WHERE Code = @code)
BEGIN
select @code=@code+char(n) from
(
select top 3 number as n from master..spt_values
where type='p' and (number between 48 and 57 or number between 65 and 90)
order by newid()
) as chars
END
CLARIFICATION: The reason for doing this is that I want to keep the random code generation logic at the level of the SQL stack. Implementing this in the app code would require me to check the db every time a potential random code is generated to see if it is unique. As the number of code records increases, so will the number of calls to the db, because it becomes more likely that several duplicate codes are generated before a unique one is found.
Part One, Generate a table with all possible values
DECLARE @i int
CREATE TABLE #AllChars(value CHAR(1))
SET @i=48
WHILE @i<=57
BEGIN
INSERT INTO #Allchars(value) VALUES(CHAR(@i))
SET @i=@i+1
END
SET @i=65
WHILE @i<=90
BEGIN
INSERT INTO #Allchars(value) VALUES(CHAR(@i))
SET @i=@i+1
END
CREATE TABLE AllCodes(value CHAR(3),
CONSTRAINT PK_AllChars PRIMARY KEY CLUSTERED(value))
INSERT INTO AllCodes(value)
SELECT AllChars1.Value+AllChars2.Value+AllChars3.Value
FROM #AllChars AS AllChars1,#AllChars AS AllChars2,#AllChars AS AllChars3
This is a one-off operation and takes around 1 second to run on SQL Azure. Now that you have all possible values in a table, any future insert becomes something along the lines of:
SELECT
RadiusTable.UserID,
RadiusTable.JobID,
IDTable.Value
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY UserID,JobID) As RadiusRow,
UserID,JobID
FROM Radius
) AS RadiusTable INNER JOIN
(
SELECT ROW_NUMBER() OVER (ORDER BY newID()) As IDRow,
Value
FROM AllCodes
) AS IDTable ON RadiusTable.RadiusRow = IDTable.IDRow
Before going with any of these schemes you had better be certain that you are not going to have more than 46656 rows in your table otherwise you will run out of unique ID Values.
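Put together as the actual INSERT, the idea might look like the sketch below. It adds one thing the query above does not do: it skips codes that are already present in Tags, so repeated runs do not reuse a code. The @JobID value and its int type are assumptions for illustration.

```sql
DECLARE @JobID int = 1;  -- hypothetical job id; the real routine receives it from the caller

INSERT INTO Tags (UserID, JobID, Code)
SELECT r.UserID, @JobID, c.value
FROM (
    -- Number the rows to copy.
    SELECT UserID, ROW_NUMBER() OVER (ORDER BY UserID) AS rn
    FROM Radius
) AS r
INNER JOIN (
    -- Number the still-unused codes in random order.
    SELECT ac.value, ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn
    FROM AllCodes AS ac
    WHERE NOT EXISTS (SELECT 1 FROM Tags AS t WHERE t.Code = ac.value)
) AS c ON c.rn = r.rn;
```

Because this is an inner join, any Radius rows beyond the number of free codes are silently left out, so it is worth comparing the row counts before relying on it.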
I do not know if this is possible and suitable for your situation, but to me it seems that a scalar-valued function would be a solution.
Well, let me start over then.
This seems kind of ugly but it might work: newid() inside sql server function
The accepted answer that is.
Ah, been there, done that too. The problem with this is that I am using T-SQL stored procedures that are called by ASP.NET. Where would I put the CREATE VIEW statement? I can't add it to the function file.