I have a table that contains a column that has all NULL values. I would like to populate this column with a random number from a given set of numbers.
The set of numbers will be generated by a SELECT statement that selects them from another table, e.g.:
UPDATE tableA
SET someColumnName = SomeRandomNumberFromSet(SELECT number from tb_Numbers)
How do I accomplish this using MSSQL 2008?
The following isn't particularly efficient, but it works. The view is required to get around the "Invalid use of a side-effecting operator 'newid' within a function." error. The UDF is assumed to be non-deterministic, so it will be re-evaluated for each row.
This will avoid any problems with SQL Server adding spools to the plan and replaying earlier results.
If the number of rows to update (or numbers in the set) were much larger, I wouldn't use this method.
CREATE VIEW dbo.OneNumber
AS
SELECT TOP 1 number
FROM master..spt_values
ORDER BY NEWID()
GO
CREATE FUNCTION dbo.PickNumber ()
RETURNS int
AS
BEGIN
RETURN (SELECT number FROM dbo.OneNumber)
END
GO
DECLARE @tableA TABLE (someColumnName INTEGER)
INSERT INTO @tableA VALUES (2), (2), (2), (2), (2)
UPDATE @tableA
SET someColumnName = dbo.PickNumber()
SELECT * FROM @tableA
I asked a similar question a long time ago, and got a few different options.
Is this a good or bad way of generating random numbers for each record?
Once you can generate a random number from 1 to n, you can use it to choose the Xth item from your list. (The easiest way is to have a sequential id on your set of legitimate values.)
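A minimal sketch of that idea in T-SQL, assuming the tb_Numbers table from the question and picking a single value at random:
DECLARE @n int = (SELECT COUNT(*) FROM tb_Numbers);
DECLARE @x int = 1 + ABS(CHECKSUM(NEWID())) % @n; -- random position in 1..n
SELECT number
FROM (SELECT number, ROW_NUMBER() OVER (ORDER BY number) AS rn -- the sequential id
      FROM tb_Numbers) AS t
WHERE rn = @x;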
I am using the function below to generate a random number between 0 and 99999999999.
CREATE VIEW [dbo].[rndView]
AS
SELECT RAND() rndResult
GO
ALTER function [dbo].[RandomPass]()
RETURNS NUMERIC(18,0)
as
begin
DECLARE @RETURN NUMERIC(18,0)
DECLARE @Upper NUMERIC(18,0);
DECLARE @Lower NUMERIC(18,0);
DECLARE @Random float;
SELECT @Random = rndResult
FROM rndView
SET @Lower = 0
SET @Upper = 99999999999
set @RETURN = (ROUND(((@Upper - @Lower - 1) * @Random + @Lower), 0))
return @RETURN
end;
However, I need to make sure that the returned number has never been used before in the same app. In .net I would create a while loop and keep looping until the returned value is not found in a table that stores previously used values. Is there a way to achieve the same result directly in SQL, ideally without using loops? If there is no way to do that without loops, I think it would still be more efficient to do it in an SQL function rather than having a loop in .net performing a number of query requests.
You will need to store the used values in a table and use a recursive query to generate the next value.
The answer depends on the RDBMS you are using.
Below are two examples, in PostgreSQL and MS SQL Server, that would solve your problem.
PostgreSQL
First, create a table that will hold your consumed ids:
CREATE TABLE consumed_ids (
id BIGINT PRIMARY KEY NOT NULL
);
The PRIMARY KEY is not strictly necessary, but it will generate an index (which speeds up the next query) and ensure that two equal ids are never generated.
Then, use the following query to obtain a new id:
WITH RECURSIVE T AS (
SELECT 1 AS n, FLOOR(RANDOM() * 100000000000) AS v
UNION ALL
SELECT n + 1, FLOOR(RANDOM() * 100000000000)
FROM T
WHERE EXISTS(SELECT * FROM consumed_ids WHERE id = v)
)
INSERT INTO consumed_ids
SELECT v
FROM T
ORDER BY n DESC
LIMIT 1
RETURNING id;
The logic is that as long as the (last) id generated is already consumed, we generate a new id. The column n of the CTE is there only to retrieve the last generated id at the end, but you may also use it to limit the number of generated random numbers (for example, give up if n > 10).
(tested using PostgreSQL 12.4)
MS SQL Server
First, create a table that will hold your consumed ids:
CREATE TABLE consumed_ids (
id BIGINT PRIMARY KEY NOT NULL
);
Then, use the following query to obtain a new id:
WITH T AS (
SELECT 1 AS n, FLOOR(RAND() * 100000000000) AS v
UNION ALL
SELECT n + 1, FLOOR(RAND() * 100000000000)
FROM T
WHERE EXISTS(SELECT * FROM consumed_ids WHERE id = v)
)
INSERT INTO consumed_ids (id)
OUTPUT Inserted.id
SELECT TOP 1 v
FROM T
ORDER BY n DESC;
(tested using MS SQL Server 2019).
Note, however, that MS SQL Server will give up after 100 tries by default.
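That ceiling is the default MAXRECURSION limit; if you expect more retries than that, the same statement can carry a query hint to raise it, for example:
WITH T AS (
SELECT 1 AS n, FLOOR(RAND() * 100000000000) AS v
UNION ALL
SELECT n + 1, FLOOR(RAND() * 100000000000)
FROM T
WHERE EXISTS(SELECT * FROM consumed_ids WHERE id = v)
)
INSERT INTO consumed_ids (id)
OUTPUT Inserted.id
SELECT TOP 1 v
FROM T
ORDER BY n DESC
OPTION (MAXRECURSION 1000);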
There is no such thing as a random number without replacement.
You need to store the numbers that have already been used in a table, which I would define as:
create table used_random_numbers (
number decimal(11, 0) primary key
);
Then, when you create a new number, insert it into the table.
In the part of the code that generates a number, use a while loop. Within the while loop, check that the number doesn't exist.
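A minimal sketch of that loop in T-SQL, assuming the used_random_numbers table above and the 0 to 99,999,999,999 range from the question:
DECLARE @candidate decimal(11, 0);
WHILE 1 = 1
BEGIN
    SET @candidate = FLOOR(RAND() * 100000000000);
    IF NOT EXISTS (SELECT 1 FROM used_random_numbers WHERE number = @candidate)
    BEGIN
        INSERT INTO used_random_numbers (number) VALUES (@candidate);
        BREAK; -- @candidate is now reserved and can be returned to the caller
    END
END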
Now, there are some things you can do to make this more efficient as the numbers grow larger -- and ways that don't require remembering all the previous values.
First, perhaps UUID/GUID is sufficient. This is the industry standard for a "random" id -- although it is a HEX string rather than a number in most databases. The exact syntax depends on the database.
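For example:
SELECT NEWID(); -- SQL Server: returns a uniqueidentifier
-- In PostgreSQL 13+ the equivalent built-in is gen_random_uuid().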
Another approach is to have an 11 digit number. The first or last 10 digits could be the Unix epoch time (seconds since 1970-01-01) -- either explicitly or under some transformation so the value "looks" random. The additional digit would then be a random digit. Of course, you could extend this to minutes or days so you have more random digits.
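A hedged sketch of that idea in T-SQL (the exact digit layout is up to you):
-- epoch seconds (currently 10 digits) times 10, plus one random digit = an 11-digit value
SELECT CAST(DATEDIFF(SECOND, '1970-01-01', GETUTCDATE()) AS bigint) * 10
       + ABS(CHECKSUM(NEWID())) % 10 AS candidate_number;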
My current objective is to build a dynamic SELECT with a WHERE clause. We have a reference table that contains the reference column values and the table name, and we build a dynamic query against the other table(s) from these references. For example:
select * from {table} where {pk1} in (...) and {pk2} in {...}
There is a problem with this generated query: some rows are returned in the result that shouldn't be, because the two IN lists are applied independently rather than as pairs of values. So I changed the SQL generation to this:
select * from {table} where ( {pk1}=(value1) and {pk2}=(value2) ) or ( {pk1}=(value3) and {pk2}=(value4) ) ...)
This solves the problem, but the execution time is much worse than the "IN" version (about 10-20 times slower), and the query string is also much bigger.
We cannot use table-valued parameters because we cannot create a new TYPE for each table dynamically; the columns are not of the same types and order, and new tables may be added in the future.
So, what is the best practice to do this?
Regards
I think you can use a temp table for the "IN" clause, like this:
IF (OBJECT_ID('tempdb..#tid') IS NULL)
BEGIN
CREATE TABLE #tid( Id BIGINT);
INSERT INTO #tid (Id) VALUES (1),
(2),
(3),
(4),
(5),
(6),
(7),
(8),
--......
(10000);
END
select * from {table} where {pk1} in (SELECT Id FROM #tid);
This usage is faster than IN (1, 2, 3, 4, ..., 10000).
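The same idea extends to the composite keys in the question; a hedged sketch, where {table}, {pk1} and {pk2} are the question's placeholders and #keys is a hypothetical temp table holding the requested pairs:
CREATE TABLE #keys (pk1 BIGINT, pk2 BIGINT);
INSERT INTO #keys (pk1, pk2) VALUES (1, 2), (3, 4); -- the (value1, value2), (value3, value4) pairs
SELECT t.*
FROM {table} AS t
INNER JOIN #keys AS k ON k.pk1 = t.{pk1} AND k.pk2 = t.{pk2};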
I need to generate a unique 13 digit number.
Can Sql Server generate this number for me somehow if I create a table with the 13 digit number as a primary key?
Update
I want the number to look like a random number, so not an autoincrementing number.
It has to be 13 digits, it shouldn't be auto-incrementing, and it should be unique. The number shouldn't have many zeros in it, but it can contain digits from 0-9.
This number should look like a credit card number, so no trailing zeros.
My suggestion would be to have an identity column on the table that auto-increments. Then, define your value based on this. A simple way would be:
create table t (
tId int identity(1, 1) not null,
. . .
myId as cast(rand(tId)*10000000000000 as varchar(13))
)
This shows it as a computed column. Of course, you can assign the value when each row is created. This is not guaranteed to produce different results, but it is highly, highly unlikely that you would see a collision.
The following alternative is also not guaranteed, but might work:
create table t (
tId varchar(13) default cast(cast(rand(checksum(getdate())) * 10000000000000 as bigint) as varchar(13)),
. . .
)
EDIT:
The chance of a collision is a bit higher than I expected -- my intuition on 13-digit hash codes is, I guess, not what it should be.
In any case, there are two sources of collisions. The first is the random number generator producing the same value. To handle that, just make the assumption that the random number generator in conjunction with checksum() really is random. So, the question is: What is the chance of two random numbers less than 10,000,000,000,000 being the same value? I'll let interested parties search the web for a formula to calculate this.
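(For reference, the figures below follow from the standard birthday-problem approximation: with N = 10,000,000,000,000 possible values and n generated numbers, the collision probability is roughly p ≈ 1 - e^(-n²/(2N)).)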
If you generate 1,000 numbers, the probability is basically 0% that any two would be the same; that is, you are safe for the first 1,000 numbers if you assume they are distinct. Here is a summary:
numbers generated    collision probability
1,000                 0.0000%
10,000                0.0005%
100,000               0.0500%
1,000,000             4.8771%
10,000,000           99.3262%
Up to a few hundred thousand values, you are probably pretty safe. When you get into the millions -- even the low millions -- the chance of collision increases substantially.
At some point, if you want lots and lots of unique values, you are going to have to create a table that contains the unique values and a process for choosing a value not in the table.
As John Barça points out: Do not use this method for cat photos on Facebook.
Just create a GUID (SELECT NEWID()) and parse it: remove the { and the '-' characters and take 13 characters from it.
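A hedged sketch of that suggestion (note that the result is 13 hex characters, not a purely numeric value):
SELECT LEFT(REPLACE(REPLACE(CAST(NEWID() AS char(36)), '{', ''), '-', ''), 13);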
I use a similar method to generate random/unique table names in a reporting system. It might do the trick for you. Just adjust the multiplier to impact the final integer length.
SELECT CONVERT(BIGINT,RAND()*10000000000000)
As a table...
CREATE TABLE #test (testID INT,
UniqueID AS CONVERT(BIGINT,RAND()*10000000000000))
INSERT INTO #test (testID)
SELECT 1
SELECT * FROM #TEST
DROP TABLE #test
Just insert a value into testID (it can be 1 every time) and a new UniqueID will be generated. You should have a primary key on any production table, though.
NB: While the chances of a duplicate ever happening are very small, it could still happen.
SELECT CEILING(RAND()*9999999999999)
something like this might work:
use [chamomile];
go
if object_id(N'[utility].[table_01]', N'U') is not null
drop table [utility].[table_01];
go
if object_id(N'[utility].[generate_random_sequence]', N'FN') is not null
drop function [utility].[generate_random_sequence];
go
/*
select [utility].[generate_random_sequence] (rand());
*/
create function [utility].[generate_random_sequence] (
@random [float])
returns [bigint]
as
begin
declare @return [bigint] = ceiling(@random * 9999999999999);
while @return > 9999999999999
or @return < 1000000000000
set @return = ceiling(@random * 9999999999999);
return @return;
end;
go
if object_id(N'[utility].[table_01]', N'U') is not null
drop table [utility].[table_01];
go
create table [utility].[table_01] (
[my_id] as [utility].[generate_random_sequence] (rand())
, [flower] [sysname]
);
go
insert into [utility].[table_01]
([flower])
values (N'rose');
select *
from [utility].[table_01];
I have a T-SQL routine that copies user information from one table, 'Radius', to another, 'Tags'. However, as the rows are transferred, I would also like to include a unique randomly generated code (3 chars long) in the INSERT. The code is generated by the WHILE loop below. Any way to do this?
INSERT Tags (UserID, JobID, Code)
SELECT UserID, @JobID, ?????
FROM Radius
Unique random code generator:
WHILE EXISTS (SELECT * FROM Tags WHERE Code = @code)
BEGIN
set @code='' -- start over with a fresh 3-character candidate
select @code=@code+char(n) from
(
select top 3 number as n from master..spt_values
where type='p' and (number between 48 and 57 or number between 65 and 90)
order by newid()
) t
END
CLARIFICATION: The reason for doing this is that I want to keep the random code generation logic at the level of the SQL stack. Implementing this in the app code would require me to check the db every time a potential random code is generated to see if it is unique. As the number of code records increases, so will the number of calls to the db, since the probability increases that duplicate codes will be generated before a unique one is found.
Part One, Generate a table with all possible values
DECLARE @i int
CREATE TABLE #AllChars(value CHAR(1))
SET @i=48
WHILE @i<=57
BEGIN
INSERT INTO #AllChars(value) VALUES(CHAR(@i))
SET @i=@i+1
END
SET @i=65
WHILE @i<=90
BEGIN
INSERT INTO #AllChars(value) VALUES(CHAR(@i))
SET @i=@i+1
END
CREATE TABLE AllCodes(value CHAR(3),
CONSTRAINT PK_AllChars PRIMARY KEY CLUSTERED(value))
INSERT INTO AllCodes(value)
SELECT AllChars1.Value+AllChars2.Value+AllChars3.Value
FROM #AllChars AS AllChars1,#AllChars AS AllChars2,#AllChars AS AllChars3
This is a one-off operation and takes around 1 second to run on SQL Azure. Now that you have all possible values in a table, any future inserts become something along the lines of:
SELECT
RadiusTable.UserID,
RadiusTable.JobID,
IDTable.Value
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY UserID,JobID) As RadiusRow,
UserID,JobID
FROM Radius
) AS RadiusTable INNER JOIN
(
SELECT ROW_NUMBER() OVER (ORDER BY newID()) As IDRow,
Value
FROM AllCodes
) AS IDTable ON RadiusTable.RadiusRow = IDTable.IDRow
Before going with any of these schemes, you had better be certain that you are not going to have more than 46,656 rows in your table (36 possible characters in each of 3 positions gives 36³ = 46,656 codes); otherwise you will run out of unique ID values.
I do not know if this is possible and suitable for your situation, but to me it seems that a scalar-valued function would be a solution.
Well, let me start over then.
This seems kind of ugly but it might work: newid() inside sql server function
The accepted answer, that is.
Ah, been there, done that too. The problem with this is that I am using T-SQL stored procedures that are called from ASP.NET. Where would I put the CREATE VIEW statement? I can't add it to the function file.
I'm phrasing the question title poorly as I'm not sure what to call what I'm trying to do but it really should be simple.
I've a link / join table with two ID columns. I want to run a check before saving new rows to the table.
The user can save attributes through a webpage, but I need to check that the same combination doesn't exist before saving it. With one record it's easy: you just check whether that attributeId is already in the table, and if it is, don't allow them to save it again.
However, if the user chooses a combination of that attribute and another one then they should be allowed to save it.
Here's an image of what I mean:
So if a user now tried to save an attribute with an ID of 1 it will stop them, but I need it to also stop them if they tried IDs of 1 and 10, so long as both 1 and 10 had the same productAttributeId.
I'm confusing this in my explanation but I'm hoping the image will clarify what I need to do.
This should be simple so I presume I'm missing something.
If I understand the question properly, you want to prevent the combination of AttributeId and ProductAttributeId from being reused. If that's the case, simply make them a combined primary key, which is by nature UNIQUE.
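For example (the table and column names below are assumed from the rest of this answer; adjust them to your schema):
ALTER TABLE MyJoinTable
ADD CONSTRAINT PK_MyJoinTable PRIMARY KEY (AttributeId, ProductAttributeId);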
If that's not feasible, create a stored procedure that runs a query against the join for instances of the AttributeId. If the query returns 0 instances, insert the row.
Here's some light code to present the idea (may need to be modified to work with your database):
DECLARE @instances int
SELECT @instances = COUNT(1) FROM MyJoinTable WHERE AttributeId = @RequestedID
IF @instances = 0
BEGIN
INSERT INTO MyJoinTable ...
END
You can control your inserts via a stored procedure. My understanding is that
users can select a combination of Attributes, such as
just 1
1 and 10 together
1,4,5,10 (4 attributes)
These need to enter the table as a single "batch" against a (new?) productAttributeId
So if (1,10) was chosen, this needs to be blocked because 1-2 and 10-2 already exist.
What I suggest
The stored procedure should take the attributes as a single list, e.g. '1,2,3' (comma separated, no spaces, just integers)
You can then use a string splitting UDF or an inline XML trick (as shown below) to break it into rows of a derived table.
Test table
create table attrib (attributeid int, productattributeid int)
insert attrib select 1,1
insert attrib select 1,2
insert attrib select 10,2
Here I use a variable, but you can incorporate it as an SP input param:
declare @t nvarchar(max) set @t = '1,2,10'
select top(1)
t.productattributeid,
count(t.productattributeid) count_attrib,
count(*) over () count_input
from (select convert(xml,'<a>' + replace(@t,',','</a><a>') + '</a>') x) x
cross apply x.x.nodes('a') n(c)
cross apply (select n.c.value('.','int')) a(attributeid)
left join attrib t on t.attributeid = a.attributeid
group by t.productattributeid
order by count_attrib desc
Output
productattributeid count_attrib count_input
2 2 3
The 1st column gives you the productattributeid that has the most matches
The 2nd column gives you how many attributes were matched using the same productattributeid
The 3rd column is how many attributes exist in the input
If you compare the last 2 columns:
if the counts match, you can use that productattributeid to attach to the product, since it already has all these attributes
if they don't match, you need to do an insert to create a new combination