Generate random numbers, letters or characters within a range - sql

I'm in the middle of a data anonymization for SQL Server.
I have these 3 formulas that help me create what I want:
SELECT CHAR(cast((90 - 65) * rand() + 65 AS INTEGER)) -- only letters
SELECT CAST((128 - 48) * RAND() + 48 AS INTEGER) -- only numbers
SELECT CHAR(CAST((128 - 48) * RAND() + 48 AS INTEGER)) -- letters, numbers, symbols
However, each of these can only create 1 number, 1 letter or 1 symbol.
I want the freedom to create a random string or number of any length I choose: for example 3 or 5 digits, 3 or 5 letters, or 3 or 5 characters drawn from digits, letters and symbols.
I also have found something very close to what I want:
SELECT LEFT(CAST(NEWID() AS VARCHAR(100)), 3) -- letters and numbers
This is a very smart idea because it uses NEWID(), which lets me create a random sequence of numbers and letters of the length I want (3 in this case). But symbols are missing.
I need 3 different SELECT:
One for numbers only
One for letters only
One for numbers, letters and symbols
With the freedom of choice about the length of the data.

Some work required for a complete solution but here's the workings of an idea you might want to experiment with further, if you still need it:
declare @type varchar(10)='letters', @length tinyint=5;
with chars as (
select top(59) 31 + Row_Number() over (order by (select 1)) n from master.dbo.spt_values
), s as (
select top (@length) Char(n.n) c
from chars n
where @type='all'
or (@type='symbols' and n between 33 and 47)
or (@type='numbers' and n between 48 and 57)
or (@type='letters' and n between 65 and 90)
order by newid()
)
select String_Agg(s.c,'')
from s

Recursive query might work with rand() function:
declare @desiredlength tinyint=5;
With builder As (
Select *
From (Values (0, '', 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')) initial (length, randstr, pool)
Union All
Select length+1, randstr + substring(pool,cast(rand()*len(pool)+1 AS int),1), pool
From builder
Where length<@desiredlength
)
Select randstr From builder
Where length=@desiredlength
rand() in a single select returns the same random number in each row of a select. But here in a recursive select you're in a grey area where each recursion might be treated like a separate query.
Obviously you can tailor the pool definition to be any character set you want and the rest of the code will choose from whatever's there.
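As a cross-check outside the database, the pool-based idea from both answers can be sketched in Python. This is only an illustration under assumptions: the function name `random_string` and the `kind` labels are hypothetical, and the pools simply mirror the ASCII ranges used above (digits 48 to 57, uppercase letters 65 to 90, symbols 33 to 47).

```python
import random
import string

# Hypothetical pools mirroring the ASCII ranges used in the T-SQL above.
POOLS = {
    "numbers": string.digits,                            # ASCII 48-57
    "letters": string.ascii_uppercase,                   # ASCII 65-90
    "symbols": "".join(chr(n) for n in range(33, 48)),   # ASCII 33-47
}
POOLS["all"] = POOLS["numbers"] + POOLS["letters"] + POOLS["symbols"]


def random_string(kind: str, length: int) -> str:
    """Return `length` characters drawn independently from the chosen pool."""
    pool = POOLS[kind]
    # Sampling with replacement, so a character may repeat.
    return "".join(random.choice(pool) for _ in range(length))


print(random_string("letters", 5))
print(random_string("numbers", 3))
print(random_string("all", 5))
```

Sampling with replacement differs from the `top (...) order by newid()` query, which picks each character at most once and so cannot return more characters than the pool contains.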


How can I separate a string in BigQuery into multiple columns without breaking up distinct words?

I'm trying to separate a string into two columns, but only if the total string's length is larger than 25 characters. If it's shorter than 25 characters, then I want it on the 2nd column only. If it's longer than 25, then I want the first part of the string to be in the 1st column and the second part of the string to be in the 2nd column.
Here's the kicker... I don't want words to be broken up. So if the total length of the string is 26, I know that I'll need two columns, but I need to figure out where to splice up the string so that only complete words are represented in each column.
For example, the string is "Transportation Project Manager". Since it has over 25 characters, I want the first column to say "Transportation Project" and the second column to say "Manager". "Transportation Project" has less than 25 characters but I want it to stop there since there isn't another complete word that would fit within the 25 character limit.
Another example- The string is "Caseworker I". Since it's less than 25 characters, I want the whole string to be represented in column 2.
Thank you for your time!
To split a string into 2 columns while respecting a defined maximum length (following the logic you described), we can use a JavaScript user-defined function (UDF) in BigQuery together with the built-in function LENGTH.
First, the string is analysed. If the character immediately after the maximum length is a white space, the string is split at that position. Otherwise, the characters are checked one by one, counting backwards, until a white space is found, and the string is split there. This procedure prevents the function from breaking up a word, and the first part never exceeds the maximum allowed length.
Below is the query with some sample data,
CREATE TEMP FUNCTION split_str_1(s string,len int64)
RETURNS string
LANGUAGE js AS """
var len_aux = len, prev = 0;
//first part of the string within the threshold
output = [];
//the rest of the string without the first part
output2 = [];
//if the next character in the string is a whitespace, then split the string
if(s[len_aux++] == ' ') {
output.push(s.substring(prev,len_aux));
output2.push(s.substring(len_aux,s.length));
}
else{
do {
if(s.substring(len_aux - 1, len_aux) == ' ')
{
output.push(s.substring(prev,len_aux));
prev = len_aux;
output2.push(s.substring(prev,s.length));
break;
}len_aux--;
} while(len_aux > prev)
}
//outputting the first part of the string
return output[0];
""";
CREATE TEMP FUNCTION split_str_2(s string,len int64)
RETURNS string
LANGUAGE js AS """
var len_aux = len, prev = 0;
//first part of the string within the threshold
output = [];
//the rest of the string without the first part
output2 = [];
//if the next character in the string is a whitespace, then split the string
if(s[len_aux++] == ' ') {
output.push(s.substring(prev,len_aux));
//the remainder starts after the space, not at the beginning of the string
output2.push(s.substring(len_aux,s.length));
}
else{
do {
if(s.substring(len_aux - 1, len_aux) == ' ')
{
output.push(s.substring(prev,len_aux));
prev = len_aux;
output2.push(s.substring(prev,s.length));
break;
}len_aux--;
} while(len_aux > prev)
}
//outputting the second part of the string
return output2[0];
""";
WITH data AS (
SELECT "Trying to split a string with more than 25 characters length" AS str UNION ALL
SELECT "Trying to split" AS str
)
SELECT str,
IF(LENGTH(str)>25, split_str_1(str,25), null) as column_1,
CASE WHEN LENGTH(str)>25 THEN split_str_2(str,25) ELSE str END AS column_2
FROM data
And the output,
Notice that there are 2 JavaScript UDFs: the first returns the first part of the string and the second returns the second part, when the string is longer than 25 characters. The maximum allowed length is passed as an argument, but it could instead be hard-coded inside the UDF as len=25.
I think your angle of attack should be to find the last space within the first 25 characters and then split on that.
Using phrases from the other submitted answers as sample data:
with sample_data as(
select 'Transportation Project Manager' as phrase union all
select 'Caseworker I'as phrase union all
select "This's 25 characters long" as phrase union all
select "This's 25 characters long (not!)" as phrase union all
select 'Antidisestablishmentarianist' as phrase union all
select 'Trying to split a string with more than 25 characters in length' as phrase union all
select 'Trying to split' as phrase
),
temp as (
select
phrase,
length(phrase) as phrase_len,
-- Find the last space within the first 25 characters
-- by reversing those 25 characters and locating the first space
25-strpos(reverse(substr(phrase,1,25)),' ') as first_space_before_25
from sample_data
)
select
phrase,
phrase_len,
first_space_before_25,
case when phrase_len <= 25 or first_space_before_25 = 25 then null
when phrase_len > 25 then substr(phrase,1,first_space_before_25)
else null
end as col1,
case when phrase_len <= 25 or first_space_before_25 = 25 then phrase
when phrase_len > 25 then substr(phrase,first_space_before_25+1, phrase_len)
else null
end as col2
from temp
I think this gets you pretty close using basic sql string manipulation. You might need/want to clean this up a bit depending on if you want col2 to start with a space or be trimmed, and depending on your cutoff point (you mentioned less than 25 and greater than 25, but not exactly 25).
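To make the expected results easy to check by hand, the same "split at the last space that still fits" rule can be sketched in Python. This is an illustration under assumptions, not a BigQuery solution: the name `split_phrase` is hypothetical, the limit is treated as inclusive (a first part of exactly 25 characters is allowed), and the separating space is dropped rather than kept in either column.

```python
from typing import Optional, Tuple


def split_phrase(phrase: str, limit: int = 25) -> Tuple[Optional[str], str]:
    """Split `phrase` into (col1, col2) without breaking a word."""
    if len(phrase) <= limit:
        # Short phrases go entirely into the second column.
        return None, phrase
    # Last space whose index is <= limit, so the text before it
    # has at most `limit` characters.
    cut = phrase.rfind(" ", 0, limit + 1)
    if cut <= 0:
        # No usable space: the first word alone exceeds the limit.
        return None, phrase
    return phrase[:cut], phrase[cut + 1:]


print(split_phrase("Transportation Project Manager"))
print(split_phrase("Caseworker I"))
print(split_phrase("Antidisestablishmentarianist"))
```

Unlike the query above, this sketch also accepts a space sitting directly after the 25th character, so "This's 25 characters long (not!)" splits into a 25-character first part and "(not!)".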
Below is for BigQuery Standard SQL
#standardSQL
SELECT phrase,
IF(IFNULL(cut, len ) >= len, NULL, SUBSTR(phrase, 1, cut)) col1,
IF(IFNULL(cut, len ) >= len, phrase, SUBSTR(phrase, cut + 1)) col2
FROM (
SELECT phrase, LENGTH(phrase) len,
(
SELECT cut FROM (
SELECT -1 + SUM(LENGTH(word) + 1) OVER(ORDER BY OFFSET) AS cut
FROM UNNEST(SPLIT(phrase, ' ')) word WITH OFFSET
)
WHERE cut <= 25
ORDER BY cut DESC
LIMIT 1
) cut
FROM `project.dataset.table`
)
You can test and play with the above using sample data (nicely provided in other answers), as in the example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'Transportation Project Manager' AS phrase UNION ALL
SELECT 'Caseworker I' UNION ALL
SELECT "This's 25 characters long" UNION ALL
SELECT "This's 25 characters long (not!)" UNION ALL
SELECT 'Antidisestablishmentarianist' UNION ALL
SELECT 'Trying to split a string with more than 25 characters in length' UNION ALL
SELECT 'Trying to split'
)
SELECT phrase,
IF(IFNULL(cut, len ) >= len, NULL, SUBSTR(phrase, 1, cut)) col1,
IF(IFNULL(cut, len ) >= len, phrase, SUBSTR(phrase, cut + 1)) col2
FROM (
SELECT phrase, LENGTH(phrase) len,
(
SELECT cut FROM (
SELECT -1 + SUM(LENGTH(word) + 1) OVER(ORDER BY OFFSET) AS cut
FROM UNNEST(SPLIT(phrase, ' ')) word WITH OFFSET
)
WHERE cut <= 25
ORDER BY cut DESC
LIMIT 1
) cut
FROM `project.dataset.table`
)
with output
Row | phrase | col1 | col2
1 | Transportation Project Manager | Transportation Project | Manager
2 | Caseworker I | null | Caseworker I
3 | This's 25 characters long | null | This's 25 characters long
4 | This's 25 characters long (not!) | This's 25 characters long | (not!)
5 | Antidisestablishmentarianist | null | Antidisestablishmentarianist
6 | Trying to split a string with more than 25 characters in length | Trying to split a string | with more than 25 characters in length
7 | Trying to split | null | Trying to split
Note: if you want to get rid of leading (in col2) and trailing (in col1) spaces - you can just add TRIM() to handle this little extra logic
Wow, this is a great interview question! Here's what I came up with:
WITH sample_data AS
(
SELECT 'Transportation Project Manager' AS phrase
UNION ALL
SELECT 'Caseworker I' AS phrase
UNION ALL
SELECT "This's 25 characters long" AS phrase
UNION ALL
SELECT "This's 25 characters long (not!)" AS phrase
UNION ALL
SELECT 'Antidisestablishmentarianist' AS phrase
),
unnested_words AS --Make a dataset with one row per "word" per phrase
(
SELECT
*,
--To preserve the spaces for character counts, prepend one to every word but the first
CASE WHEN i = 0 THEN '' ELSE ' ' END || word AS word_with_space
FROM
sample_data
CROSS JOIN
UNNEST(SPLIT(phrase, ' ')) AS word WITH OFFSET AS i
),
with_word_length AS
(
SELECT
*,
--This doesn't need its own CTE, but done here for clarity
LENGTH(word_with_space) AS word_length
FROM
unnested_words
),
running_sum AS --Mark when the total character length exceeds 25
(
SELECT
*,
SUM(word_length) OVER (PARTITION BY phrase ORDER BY i) <= 25 AS is_first_25
FROM
with_word_length
),
by_subphrase AS --Make a subphrase of words in the first 25, and one for any others
(
SELECT
phrase,
ARRAY_TO_STRING(ARRAY_AGG(word_with_space ORDER BY i), '') AS subphrase
FROM
running_sum
GROUP BY
phrase, is_first_25
),
by_phrase AS --Put subphrases into an array (back to one row per phrase)
(
SELECT
phrase, ARRAY_AGG(subphrase) AS subphrases
FROM
by_subphrase
GROUP BY
1
)
SELECT
phrase,
--Break the array of subphrases into columns per your rules
CASE WHEN ARRAY_LENGTH(subphrases) = 1 THEN subphrases[OFFSET(0)] ELSE subphrases[OFFSET(1)] END,
CASE WHEN ARRAY_LENGTH(subphrases) = 1 THEN NULL ELSE subphrases[OFFSET(0)] END
FROM
by_phrase
Not very pretty but gets it done.

SQL procedure algorithm generating numbers in HEX (using strings)

I'm trying to write an algorithm in SQL (a stored procedure) that generates numbers in a non-decimal number system (hex will do here) and writes them to a file. The only constraint is that I must use only strings to do it.
I have declared my set of characters, '0123456789ABCDEF', and now I need a loop in which I generate the next element and save it to the file.
Maybe you are looking for a decimal -> hex conversion?
with data (n) as (
values (90)
union all
select n + 1
from data
where n < 100
),
alphabet (a) as (
values ('0123456789ABCDEF')
),
dechex (orig,n,hx) as (
select n, n / 16,cast(substr(a,mod(n, 16) + 1, 1) as varchar(10))
from data
cross join alphabet
union all
select orig,n / 16,substr(a,mod(n, 16) + 1, 1) concat hx
from dechex
cross join alphabet
where n > 0
)
select orig,hx
from dechex
where n = 0
order by orig
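The repeated divide-by-16 logic of the recursive query can be sketched in Python for checking the expected output. This is only an illustration: the name `to_base` is hypothetical, and it assumes non-negative integers, as in the sample range 90 to 100.

```python
ALPHABET = "0123456789ABCDEF"


def to_base(n: int, alphabet: str = ALPHABET) -> str:
    """Convert a non-negative integer to a string in base len(alphabet)."""
    base = len(alphabet)
    # First (least significant) digit, as in the anchor member of the CTE.
    digits = alphabet[n % base]
    n //= base
    while n > 0:
        # Prepend the next digit, as in the recursive member.
        digits = alphabet[n % base] + digits
        n //= base
    return digits


for value in range(90, 101):
    print(value, to_base(value))
```

Because the digits come from the alphabet string, swapping in a different alphabet (for example '01234567') gives another base without changing the loop.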

how to repeat characters in a string

I'm trying to link two tables, one has an 'EntityRef' that's made of four alpha characters and a sequential number...
EntityRef
=========
SWIT1
LIVE32
KIRB48
MEHM38
BRAD192
The table that I'm trying to link to stores the reference in a 15 character field where the 4 alphas are at the start and the numbers are at the end but with zeros in between to make up the 15 characters...
EntityRef
=========
SWIT00000000001
LIVE00000000032
So, to get these to link, my options are to either remove the zeros on one field or add the zeros on the other.
I've gone for the latter as it seems to be a simpler approach and eliminates the risk of getting into problems if the numeric element contains a zero.
So, the alpha is always the first 4 characters, the number is the remainder, and 15 minus the LEN() of the EntityRef is the number of zeros that I need to insert...
left(entityref,4) as 'Alpha',
right(entityref,len(EntityRef)-4) as 'Numeric',
15-len(EntityRef) as 'No.of Zeros'
Alpha Numeric No.of Zeros
===== ======= ===========
SWIT 1 10
LIVE 32 9
KIRB 48 9
MEHM 38 9
MALL 36 9
So, I need to concatenate the three elements but I don't know how to create the string of zeros to the specified length...how do I do that??
Concat(Alpha, '0'*[No. of Zeros], Numeric)
What is the correct way to repeat a character a specified number of times?
You can use string manipulation. In this case:
LEFT() to get the alpha portion.
REPLICATE() to get the zeros.
STUFF() to get the number.
The query:
select left(val, 4) + replicate('0', 15 - len(val)) + stuff(val, 1, 4, '')
from (values ('SWIT1'), ('ABC12345')) v(val)
You may try left padding with zeroes:
SELECT
LEFT(EntityRef, 4) +
RIGHT('00000000000' + SUBSTRING(ISNULL(EntityRef,''), 5, 30), 11) AS EntityRef
FROM yourTable;
Alternatively, cast the numeric part to an integer and join on the result:
select *
from t1 inner join t2
on concat(left(t2.EntityRef, 4), cast(right(t2.EntityRef, 11) as bigint)) = t1.EntityRef
I found the answer as soon as I posted the question (sometimes it helps you think it through!).
(left(entityref,4) + replicate('0',15-len(EntityRef)) +
right(entityref,len(EntityRef)-4)),
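For checking the expected keys by hand, the same padding rule can be sketched in Python. This is an illustration under assumptions: the name `pad_entity_ref` is hypothetical, and it assumes the first 4 characters are always the alpha part and the total width is 15, as described in the question.

```python
def pad_entity_ref(entity_ref: str, width: int = 15, alpha_len: int = 4) -> str:
    """Left-pad the numeric part with zeros so the whole reference is `width` long."""
    alpha = entity_ref[:alpha_len]
    numeric = entity_ref[alpha_len:]
    # rjust plays the role of REPLICATE('0', 15 - LEN(EntityRef)) in the T-SQL.
    return alpha + numeric.rjust(width - alpha_len, "0")


for ref in ["SWIT1", "LIVE32", "KIRB48", "MEHM38", "BRAD192"]:
    print(ref, pad_entity_ref(ref))
```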

How to auto generate a ID with random numbers in sql server

For example I had a column named 'ID'
I want to get the output as
ID
---
ABCXX708
ABCXX976
ABCXX654
ABCXX081
In short ABCXX should be common for every row but the remaining 3 numbers should be random and integer..
with t (n) as (select 0 union all select n+1 from t where n <100)
select 'ABC'
+ format(n,'00')
+ cast(cast(rand(cast(newid() as varbinary(100)))*10 as int) as char(1))
from t
Alternative solution
with t (n) as (select 0 union all select n+1 from t where n <100)
select 'ABC'
+ right ('0' + cast(n as varchar(2)),2)
+ cast(cast(rand(cast(newid() as varbinary(100)))*10 as int) as char(1))
from t
You can write like this
select 'ABCXX'+CAST(FLOOR(RAND()*(1000-100)+100) as varchar(3)) 'id'
With the RAND() function you can get random numbers. For the 'ABCXX' prefix you can keep your existing logic.
SELECT CAST(RAND()*10.00 AS INT)
RAND() returns a float value from 0 up to (but not including) 1 each time the statement runs. To get a single digit, multiply by 10 and cast to INT to drop the decimal part.
Reference: MSDN
Since SQL Server 2012 you have FORMAT function and SEQUENCE object. Hence the below query will work.
First you need to create a Sequence object.
CREATE SEQUENCE DemopSeq
START WITH 1
INCREMENT BY 1;
Then the following query will generate results as per your requirement.
SELECT CONCAT('ABC',FORMAT(NEXT VALUE FOR DemopSeq, '00'),ABS(Checksum(NewID()) % 10))
Hope this helps.
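The target format from the question (a fixed prefix plus three random digits, leading zeros included) can be sketched in Python to show what the output should look like. This is an illustration only: the name `random_id` is hypothetical, and nothing here guarantees that the generated IDs are unique.

```python
import random


def random_id(prefix: str = "ABCXX") -> str:
    """Return the prefix followed by three random digits, e.g. ABCXX081."""
    # randrange(1000) yields 0..999; :03d keeps leading zeros.
    return f"{prefix}{random.randrange(1000):03d}"


for _ in range(4):
    print(random_id())
```

Zero-padding matters here: a range of 100 to 999, as in the `FLOOR(RAND()*(1000-100)+100)` variant, can never produce a value such as 081 from the sample output.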

Selecting Strings With Alphabetized Characters - In SQL Server 2008 R2

This is a recreational pursuit, and is not homework. If you value academic challenges, please read on.
A radio quiz show had a segment requesting listeners to call in with words that have their characters in alphabetical order, e.g. "aim", "abbot", "celt", "deft", etc. I got these few examples by a quick Notepad++ (NPP) inspection of a Scrabble dictionary word list.
I'm looking for an elegant way in T-SQL to determine if a word qualifies for the list, i.e. all its letters are in alpha order, case insensitive.
It seemed to me that there should be some kind of T-SQL algorithm possible that will do a SELECT on a table of English words and return the complete list of all words in the Scrabble dictionary that meet the spec. I've spent considerable time looking at regex strings, but haven't hit on anything that comes even remotely close. I've thought about the obvious looping scenario, but abandoned it for now as "inelegant". I'm looking for your ideas that will obtain the qualifying word list,
preferably using
- a REGEX expression
- a tally-table-based approach
- a scalar UDF that returns 1 if the input word meets the requirement, else 0.
- Other, only limited by your creativity.
But preferably NOT using
- a looping structure
- a recursive solution
- a CLR solution
Assumptions/observations:
1. A "word" is defined here as two or more characters. My dictionary shows 55 2-character words, of which only 28 qualify.
2. No word will have more than two consecutive characters that are identical. (If you find one, please point it out.)
3. At 21 characters, "electroencephalograms" is the longest word in my Scrabble dictionary
(though why that word is in the Scrabble dictionary escapes me--the board is only a 15-by-15 grid.)
Consider 21 as the upper limit on word length.
4. All words LIKE 'Z%' can be dismissed because all you can create is {'Z','ZZ', ... , 'ZZZ...Z'}.
5. As the dictionary's words' initial character proceeds through the alphabet, fewer words will qualify.
6. As the word lengths get longer, fewer words will qualify.
7. I suspect that fewer than 0.2% of my dictionary's 60,387 words will qualify.
For example, I've tried NPP regex searches like "^a[a-z][b-z][b-z][c-z][c-z][d-z][d-z][e-z]" for 9-letter words starting with "a", but the character-by-character alphabetic enforcement is not handled properly. This search will return "abilities" which fails the test with the "i" that follows the "l".
There are several free Scrabble word lists available on the web, but Phil Factor gives a really interesting treatment of T-SQL/Scrabble considerations at https://www.simple-talk.com/sql/t-sql-programming/the-sql-of-scrabble-and-rapping/ which is where I got my word list.
Care to give it a shot?
Split the word into individual characters using a numbers table. Use the numbers as one set of indices. Use ROW_NUMBER to create another set. Compare the two sets of indices to see if they match for every character. If they do, the letters in the word are in alphabetical order.
DECLARE @Word varchar(100) = 'abbot';
WITH indexed AS (
SELECT
Index1 = n.Number,
Index2 = ROW_NUMBER() OVER (ORDER BY x.Letter, n.Number),
x.Letter
FROM
dbo.Numbers AS n
CROSS APPLY
(SELECT SUBSTRING(@Word, n.Number, 1)) AS x (Letter)
WHERE
n.Number BETWEEN 1 AND LEN(@Word)
)
SELECT
Conclusion = CASE COUNT(NULLIF(Index1, Index2))
WHEN 0 THEN 'Alphabetical'
ELSE 'Not alphabetical'
END
FROM
indexed
;
The NULLIF(Index1, Index2) expression does the comparison: it returns NULL if the arguments are equal, otherwise it returns the value of Index1. If all indices match, all the results will be NULL and COUNT will return 0, which means the order of letters in the word was alphabetical.
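The two-index comparison can be sketched in Python to check individual words by hand. This is an illustration under assumptions: the name `is_alphabetical` is hypothetical, and lowercasing stands in for a case-insensitive collation.

```python
def is_alphabetical(word: str) -> bool:
    """Return True when the letters of `word` are already in alphabetical order."""
    letters = word.lower()
    # Index1: the original positions 0..n-1.
    original = list(range(len(letters)))
    # Index2: the same positions after sorting by (letter, original position),
    # which mirrors ROW_NUMBER() OVER (ORDER BY Letter, Number).
    by_letter = sorted(original, key=lambda i: (letters[i], i))
    # The word is alphabetical exactly when sorting changes nothing.
    return by_letter == original


for word in ["aim", "abbot", "celt", "deft", "abilities", "david"]:
    print(word, is_alphabetical(word))
```

Using the original position as a tie-breaker keeps repeated letters such as the "bb" in "abbot" from being reported as out of order.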
I did something similar to Andriy. I created a numbers table with values 1-21. I use it to create one set of data with the individual letters ordered by index and a second set ordered alphabetically. I join the sets to each other on the letter and the number, then count the nulls. Anything over 0 means the word is not in order.
DECLARE @word VARCHAR(21)
SET @word = 'abbot'
SELECT Count(1)
FROM (SELECT Substring(@word, number, 1) AS Letter,
Row_number() OVER ( ORDER BY number) AS letterNum
FROM numbers
WHERE number <= CONVERT(INT, Len(@word))) a
LEFT OUTER JOIN (SELECT Substring(@word, number, 1) AS letter,
Row_number() OVER ( ORDER BY Substring(@word, number, 1)) AS letterNum
FROM numbers
WHERE number <= CONVERT(INT, Len(@word))) b
ON a.letternum = b.letternum
AND a.letter = b.letter
WHERE b.letter IS NULL
Interesting idea...
Here's my take on it. This returns a list of words that are in order, but you could easily return 1 instead.
DECLARE @WORDS TABLE (VAL VARCHAR(MAX))
INSERT INTO @WORDS (VAL)
VALUES ('AIM'), ('ABBOT'), ('CELT'), ('DAVID')
;WITH CHARS
AS
(
SELECT VAL AS SOURCEWORD, UPPER(VAL) AS EVALWORD, ASCII(LEFT(UPPER(VAL),1)) AS ASCIICODE, RIGHT(UPPER(VAL),LEN(VAL)-1) AS REMAINS, 1 AS ROWID, 1 AS INORDER, LEN(VAL) AS WORDLENGTH
FROM @WORDS
UNION ALL
SELECT SOURCEWORD, REMAINS, ASCII(LEFT(REMAINS,1)), RIGHT(REMAINS,LEN(REMAINS)-1), ROWID+1, INORDER+CASE WHEN ASCII(LEFT(REMAINS,1)) >= ASCIICODE THEN 1 ELSE 0 END AS INORDER, WORDLENGTH
FROM CHARS
WHERE LEN(REMAINS)>=1
),
ONLYINORDER
AS
(
SELECT *
FROM CHARS
WHERE ROWID=WORDLENGTH AND INORDER=WORDLENGTH
)
SELECT SOURCEWORD
FROM ONLYINORDER
Here it is as a UDF:
CREATE FUNCTION dbo.AlphabetSoup (@Word VARCHAR(MAX))
RETURNS BIT
AS
BEGIN
SET @WORD = UPPER(@WORD)
DECLARE @RESULT INT
;WITH CHARS
AS
(
SELECT @WORD AS SOURCEWORD,
@WORD AS EVALWORD,
ASCII(LEFT(@WORD,1)) AS ASCIICODE,
RIGHT(@WORD,LEN(@WORD)-1) AS REMAINS,
1 AS ROWID,
1 AS INORDER,
LEN(@WORD) AS WORDLENGTH
UNION ALL
SELECT SOURCEWORD,
REMAINS,
ASCII(LEFT(REMAINS,1)),
RIGHT(REMAINS,LEN(REMAINS)-1),
ROWID+1,
INORDER+CASE WHEN ASCII(LEFT(REMAINS,1)) >= ASCIICODE THEN 1 ELSE 0 END AS INORDER,
WORDLENGTH
FROM CHARS
WHERE LEN(REMAINS)>=1
),
ONLYINORDER
AS
(
SELECT 1 AS RESULT
FROM CHARS
WHERE ROWID=WORDLENGTH AND INORDER=WORDLENGTH
UNION
SELECT 0
FROM CHARS
WHERE NOT (ROWID=WORDLENGTH AND INORDER=WORDLENGTH)
)
-- ONLYINORDER can hold both 1 and 0, so take MAX: 1 only if the final row qualified
SELECT @RESULT = MAX(RESULT) FROM ONLYINORDER
RETURN @RESULT
END