Randomize integers in a string - sql

I am attempting to randomize all integers in a string.
E.g. "Transferred to account 123456789" randomized into "Transferred to account 256829876"
I already have a slow solution in PL/SQL that loops through each character of the string individually. If the character's ASCII value is between 48 and 57 (the digits 0 to 9), I replace it with a random digit.
In SQL I have gotten this far:
select regexp_replace('Transferred to account 05172262116','[0-9]',
floor(dbms_random.value(0, 10)))
from dual;
However, this does not give me the expected result, because every digit is replaced with the same single value (e.g. 'Transferred to account 55555555555').
Is it possible to achieve what I am looking for via use of SQL?
Thanks.

If you know the numbers are always 11 digits, you can explicitly look for that:
select regexp_replace('Transferred to account 05172262116','[0-9]{11}', floor(dbms_random.value(10000000000, 99999999999)))
from dual;
Otherwise, you can replace each run of digits with a random integer, but the replacement may not have the same length as the original:
select regexp_replace('Transferred to account 05172262116','[0-9]+', floor(dbms_random.value(10000000000, 99999999999)))
from dual;
As a note: things like account numbers are often removed using translate(), but this produces a fixed string:
select translate('Transferred to account 05172262116', ' 0123456789', ' ##########')
from dual;
(And you can do the same thing with regexp_replace().)

This answer may be viewed as a cop-out, but I would argue that information as sensitive as an account number should not be shown in any form, even if the digits are randomly permuted. So, I recommend just completely masking the account number using e.g.
SELECT
REGEXP_REPLACE('Transferred to account 05172262116', '[0-9]', '*')
FROM dual;
Even the above presents some security risk, because it shows the same number of * as there are digits in the account number. But, it is often the case, e.g. with credit cards or account numbers at a given bank, that all account numbers have the same length anyway.

The problem is that the replacement expression is evaluated only once, before regexp_replace() runs, so you get a single random value that is then substituted for every digit. To do this correctly you would have to loop through each character and generate a new random value for each one.
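A pure-SQL version of that per-character loop might look like the sketch below. It assumes Oracle 11gR2 or later (for LISTAGG) and a single input row; the inline view and the alias s are only there for illustration. CONNECT BY LEVEL produces one row per character, each digit gets its own call to dbms_random.value, and LISTAGG reassembles the characters in their original order.

```sql
-- Sketch only: handles one input string; the alias "s" is illustrative.
select listagg(
         case
           when regexp_like(substr(s, level, 1), '[0-9]')
             then to_char(floor(dbms_random.value(0, 10)))  -- fresh random digit 0..9
           else substr(s, level, 1)                          -- keep non-digits as they are
         end
       ) within group (order by level) as randomized
from (select 'Transferred to account 05172262116' as s from dual)
connect by level <= length(s);
```

Applying this to many rows at once needs extra CONNECT BY conditions (or a lateral join) to keep the rows apart, which is left out here.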

You could use translate() with a single 10-digit random number:
select translate('Transferred to account 05172262116',
'1234567890',
floor(dbms_random.value(1000000000, 10000000000))) from dual;
TRANSLATE('TRANSFERREDTOACCOUNT051
----------------------------------
Transferred to account 81677787668
It will work with any number of digits anywhere in the string, and preserves the original length (number of digits) of the replaced value. It maps an original digit to the same (random) digit each time, at least within that string. (If you apply the same translate across multiple source rows at once, they will get different mappings, as dbms_random is non-deterministic.)
with t (s) as (
select 'Transferred to account 05172262116' from dual
union all
select 'Transferred to account 05172262116' from dual
)
select s, translate(s,
'1234567890',
floor(dbms_random.value(1000000000, 10000000000))) from t;
S TRANSLATE(S,'1234567890',FLOOR(DBM
---------------------------------- ----------------------------------
Transferred to account 05172262116 Transferred to account 57238858225
Transferred to account 05172262116 Transferred to account 95587747554
Each digit in your original string is translated to the corresponding digit in the random number. For instance, the first output above came from the generated random number 6703187918. The first digit of your original string was 0; that's the 10th digit of the second argument to translate(), so you get the 10th digit of the (random) replacement string, which is the third argument to that function - which is 8. The second digit in your string is 5, which is the 5th digit in the second argument; so you get the 5th digit in the third argument - which is 1. And so on.
It's arguable whether this is random enough, I suppose, but the main goal is presumably to stop you reconstructing the original value from the replacement. You could potentially learn something about the shape of the original value by looking for repetitions in the new one; but as the random value could contain repeated digits too, that doesn't get you very far.
For instance, in the example above the replacement has a row of three consecutive 7s, so you might think the original has three consecutive identical digits too - but it didn't. The random value had two positions - 2nd and 7th - which both mapped to 7 in the new string, and you can't tell which of those mappings was applied. (So even if you knew the random value you couldn't get back to the original, in this case anyway - it won't always have repeated numbers, of course.)
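To make the mapping concrete, the same call can be run with the random number from the walkthrough written in as a fixed literal. This is only a reproduction aid, not something you would do in practice, since a fixed string gives a fixed mapping.

```sql
-- Fixed replacement string instead of a random one, so the result is reproducible.
select translate('Transferred to account 05172262116',
                 '1234567890',
                 '6703187918') as translated
from dual;
-- Digit mapping: 1->6, 2->7, 3->0, 4->3, 5->1, 6->8, 7->7, 8->9, 9->1, 0->8
-- Result: Transferred to account 81677787668
```

Both 2 and 7 map to 7 here, which is why the original 7, 2, 2 turns into the run 777.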

Related

Number of consecutive digits in a column string

I am trying to count the number of consecutive digits that appear in a string column. Let me give an example to illustrate what I am trying to do. Suppose I have a table called email:
email
lucas1234#gmail.com
fer12#gmail.com
lupal#gmail.com
carlos1perez222#gmail.com
my expected output would be
email count_cons_digits
lucas1234#gmail.com 4
fer12#gmail.com 2
lupal#gmail.com 0
carlos1perez222#gmail.com 3
You could use a regex replacement with length trick:
SELECT email,
LENGTH(email) - LENGTH(REGEXP_REPLACE(email, '[0-9]{2,}', '')) AS count_cons_digits
FROM yourTable;
Note that this answer assumes that there would be at most one segment of a given email string having continuous digits. If not, and there could be more than one, then you would need to define what happens in that case.

How to extract a number part of a field using regex_substr function?

I need to extract the numerical part of values in a column (varchar) if there exists a number in the value.
ColumnA has values like ABC, M365, J344, MCT etc.
I would like to check the entire value from second position and if is a number I would like to extract it, for instance,
a. M365, from 2nd position 365 is a number so I would like to return this substring.
b. M3AB, from 2nd position 3AB is not a number so I would not want to return this substring.
I tried regexp_substr('M365', '[0-9]', 2), but this is not what I want: it only returns the character in the second position, not the entire substring.
This seems to do what you want:
select regexp_substr(substr(x, 2), '^\d+$')
substr(x, 2) drops the first character, and the anchored pattern '^\d+$' requires everything that remains to be digits; otherwise the result is NULL.
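A quick way to check this against the values from the question is the sketch below. The inline table t and its column x are stand-ins for the real table and column.

```sql
-- Sample values taken from the question; t and x are illustrative names.
with t (x) as (
  select 'ABC'  from dual union all
  select 'M365' from dual union all
  select 'J344' from dual union all
  select 'MCT'  from dual union all
  select 'M3AB' from dual
)
select x,
       regexp_substr(substr(x, 2), '^\d+$') as num_part
from t;
-- M365 -> 365, J344 -> 344; ABC, MCT and M3AB -> NULL
```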
[0-9] only matches a single digit. You want to match a whole run of digits, so you need the '+' quantifier. For more info, visit:
https://www.techonthenet.com/oracle/functions/regexp_substr.php
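The following code should work for you. Note that for M3AB it returns 3 rather than nothing, so it only fits if partial matches are acceptable.
regexp_substr('M365', '[0-9]+', 2)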

Returning text within a string in Oracle SQL

I have string data in a column of a table that will contain a monetary amount in it somewhere.
E.G. the column may contain something like:
"Dave once paid £50.00 to a lottery syndicate"
"Total Investment Returns for the fund in 2017 came to £150,964.39"
How can I search for the occurrence of the '£' sign and then return the number that occurs after it?
Thanks
Here is one way. The search expression is a bit complicated because it must allow for thousand separators and decimal points, all optional. It assumes "Western" usage of thousands separators - it would have to be modified slightly to allow for Lakh (Indian) notation, for example. It will produce NULL when there is no pound sign, or if there is a pound sign not immediately followed by at least one digit. (So it will have to be modified slightly if you allow things like £.60 instead of £0.60.) You can also capture just the amount (without the currency symbol) if desired - that is also a slight modification to the use of REGEXP_SUBSTR (use capture groups).
The biggest change would be needed if you may have more than one amount per input row.
with
inputs ( str ) as (
select 'Dave once paid £50.00 to a lottery syndicate.' from dual union all
select 'Total Returns in 2017 came to £150,964.39.' from dual
)
-- End of simulated inputs (for testing purposes only, not part of the solution).
-- Use your actual table and column names in the SQL query below.
select str, regexp_substr(str, '£\d{1,3}(,?\d{3})*(\.\d+)?') as amount
from inputs
;
STR AMOUNT
--------------------------------------------- -----------
Dave once paid £50.00 to a lottery syndicate. £50.00
Total Returns in 2017 came to £150,964.39. £150,964.39
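If a row may contain more than one amount, one possible approach is to walk through the matches using the occurrence argument of REGEXP_SUBSTR. The sketch below assumes Oracle 11g or later (for REGEXP_COUNT) and a single input row; the sample sentence is invented for illustration.

```sql
-- Sketch only: one input row; the sample text is made up.
with inputs (str) as (
  select 'Paid £50.00 in May and £1,200.50 in June.' from dual
)
select level as occurrence,
       regexp_substr(str, '£\d{1,3}(,?\d{3})*(\.\d+)?', 1, level) as amount
from inputs
connect by level <= regexp_count(str, '£\d{1,3}(,?\d{3})*(\.\d+)?');
-- occurrence 1 -> £50.00, occurrence 2 -> £1,200.50
```

With several input rows the CONNECT BY clause needs additional conditions to keep the rows separate.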
Edit
In a Comment below, the OP asked how to obtain just the amount, without the currency symbol. The easiest way is to use capture groups directly in the REGEXP_SUBSTR() function. The version below uses all six arguments to the function: as before the first is the input string and the second is the search pattern. The third and fourth are the starting position and the occurrence (both always equal to 1 for this problem). The fifth, NULL, is for some special options we don't need. The sixth argument is relevant: 1 means return the first capture group, i.e. the part of the search pattern included in the first pair of matching parentheses (counting from left to right). Notice the additional pair of parentheses in the search pattern, to isolate the amount from the pound symbol:
regexp_substr(str, '£(\d{1,3}(,?\d{3})*(\.\d+)?)', 1, 1, null, 1)
Edit #2
To extract the amount in NUMBER data type, it is not necessary to remove the pound sign; the TO_NUMBER() function can handle it. Instead, the substring that is just the pound sign followed by the amount must be wrapped within TO_NUMBER(), using the proper format model and explicit currency symbol:
to_number(regexp_substr(str, '£\d{1,3}(,?\d{3})*(\.\d+)?'),
'L999,999,999,999,999.000000', 'nls_currency=£')
Just make sure to include enough digits on the right of the decimal point to accommodate all possible amounts. (Too many digits in the format model is never a problem.)

Not selecting Max Value

I am using the query
select max(entry_no) from tbl_Invmaster
but it's giving me 9 as the answer, although the max value is 10.
You probably have the numbers in a VARCHAR column. Values in such columns are compared in alphabetical order, character by character, so '9' sorts after '10'. Explanation from the link:
To determine which of two strings comes first in alphabetical order, their first letters are compared. If they differ, then the string whose first letter comes earlier in the alphabet is the one which comes first in alphabetical order. If the first letters are the same, then the second letters are compared, and so on. If a position is reached where one string has no more letters to compare while the other does, then the first (shorter) string is deemed to come first in alphabetical order.
Your best solution is not to store numbers in VARCHAR columns but instead use the appropriate type, eg INT. That way your query would return the correct result.
If that is not an option for you, you could CAST the column to an integer type. Eg in SQL Server you would write:
select max(CAST(entry_no AS INT)) from tbl_Invmaster
In Oracle, the equivalent uses TO_NUMBER():
select max( to_number( entry_no )) from tbl_invmaster
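The difference is easy to reproduce with two values. The sketch below uses Oracle syntax (dual, TO_NUMBER) and a made-up inline table t purely for illustration; on other databases the CAST form does the same job.

```sql
-- t is an illustrative stand-in for tbl_Invmaster with a VARCHAR entry_no.
with t (entry_no) as (
  select '9'  from dual union all
  select '10' from dual
)
select max(entry_no)            as max_as_string,  -- '9': compared character by character
       max(to_number(entry_no)) as max_as_number   -- 10: compared numerically
from t;
```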

TOAD SQL - LPAD Number, but also add decimals?

Basically what I'm trying to do is add '000' to a number (between 5-8 characters in length) and make the whole numbers have decimals.
What I came up with is:
SELECT DISTINCT
'000' || TO_CHAR(Blah, '9,999,999.99') AS "Data"
FROM Blah database
While this does what I ideally want, there is a gap between the zeroes of either 3 or 4 depending on the number. Obviously I don't want the gap there. Where am I going astray?
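TO_CHAR pads its result with leading blanks up to the width of the format model, which is where the gap comes from. Use trim(to_char(x, '9,999,999.99')) to remove it.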
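As a sketch, here is the TRIM approach next to an alternative that is not part of the answer above: the FM format modifier, which suppresses the padding inside TO_CHAR itself. The value 12345 is just a sample; note the .00 in the FM version, since FM would otherwise also drop trailing zeros after the decimal point.

```sql
-- 12345 is an arbitrary sample value.
select '000' || trim(to_char(12345, '9,999,999.99')) as with_trim,
       '000' || to_char(12345, 'FM9,999,999.00')     as with_fm
from dual;
-- Both columns: 00012,345.00
```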