How to anonymize data in SQL? - sql

I need to anonymize a variable in SQL data (VAR NAME = "ArId").
The variable contains 10 digits + 1 letter + 2 digits. I need to randomize the first 10 digits and keep the letter + the last two digits.
I have tried the rand() function, but this randomizes the whole value.
SELECT TOP 1000 *
FROM [XXXXXXXXXXX].[XXXXXXXXXX].[XXXXX.TEST]
I have only loaded the data.
EDIT (from "answer"):
I have tried: UPDATE someTable
SET someColumn = CONCAT(CAST(RAND() * 10000000000 as BIGINT), RIGHT(someColumn, 3))
However, as I am totally new to SQL, I don't know how to make this work. I put someColumn = the new column name for the variable I am creating, and RIGHT(someColumn) = the column I am changing. When I do that, I get a message that the RIGHT function requires 2 arguments.
Example for Zohar: I have a variable containing, for example, 1724981628R01. For all the values in this variable I would like to randomize the first 10 digits and keep the last three characters (R01). How can I do that?

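A couple of things. First, your conversion to a bigint does not guarantee that the result has the right number of characters: any value below 1000000000 comes out with fewer than 10 digits.
Second, rand() without a seed is evaluated once per query, so every row gets the same value. Seeding it per row with CHECKSUM(NEWID()) avoids that. Note that rand() returns a float between 0 and 1, so it has to be scaled up before formatting. Try this version:
UPDATE someTable
SET someColumn = CONCAT(
        FORMAT(FLOOR(RAND(CHECKSUM(NEWID())) * 10000000000), '0000000000'),
        RIGHT(someColumn, 3)
    );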
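The per-row logic of that statement (a fresh, zero-padded 10-digit prefix followed by the original last three characters) can be sketched outside the database. This is only a Python illustration of the idea, not T-SQL; the function name anonymize_id is made up for the example.

```python
import secrets


def anonymize_id(value: str) -> str:
    """Replace the first 10 digits with random digits and keep the last 3 characters."""
    suffix = value[-3:]                   # same role as RIGHT(someColumn, 3)
    prefix = secrets.randbelow(10**10)    # random integer from 0 to 9999999999
    return f"{prefix:010d}{suffix}"       # zero-pad so the prefix is always 10 digits


if __name__ == "__main__":
    # Prints a 13-character value that still ends in R01.
    print(anonymize_id("1724981628R01"))
```

The zero-padding is the part that the plain bigint cast was missing: without it, a small random number yields a shorter ID.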

Related

Take % out of an SQL column

I'm trying to convert 60% -> 60 of all the columns in a table. I have tried this, but it does not work because % is an SQL operator.
UPDATE host_info set host_response_rate = replace(host_response_rate,'%', '');
But I get all the values to be NULL...
I'm using postgresql
You can use LEFT with a negative length, which returns everything except the last n characters of a string. With -1, it drops the last character (the % sign).
SELECT LEFT(host_response_rate, -1)
FROM ...

selecting rows depending on the first digit of an integer in a column

Using SQL in PostgreSQL I need to select all the rows from my table called "crop" when the first digit of the integer numbers in column "field_id" is 7.
select *
from crop
where (left (field_id,1) = 7)
First, you know that the column is a number, so I would be inclined to explicitly convert it, whichever approach you use:
where left(field_id::text, 1) = '7'
where field_id::text like '7%'
The conversion to text makes explicit what is happening; without it, Postgres has no left() or like that accepts an integer.
More importantly, if the value has a fixed number of digits, then I would suggest using a numeric range; something like this (for six-digit values):
where field_id >= 700000 and field_id < 800000
This makes it easier for Postgres to use an index on the column.
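To see why the range version only works for a fixed number of digits, here is a small Python sketch comparing the two checks. The six-digit width is an assumption taken from the example range above, and the function names are hypothetical.

```python
def starts_with_7_text(field_id: int) -> bool:
    """Text check: the first character of the decimal representation is 7."""
    return str(field_id).startswith("7")


def starts_with_7_range(field_id: int) -> bool:
    """Range check: only valid when every value has exactly six digits."""
    return 700000 <= field_id < 800000


if __name__ == "__main__":
    # Both checks agree for a six-digit value, but not for a shorter one.
    print(starts_with_7_text(712345), starts_with_7_range(712345))
    print(starts_with_7_text(71), starts_with_7_range(71))
```

If the column holds values of varying length, the text comparison is the safer of the two.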
Try with cast, like this:
select *
from crop
where cast(substring(cast(field_id as varchar(5)),1,1) as int) = 7
Replace the 5 in varchar(5) with the maximum number of digits in your integer.

update multiple rows with random 9 digit number using rand() function

I am trying to update multiple rows with random 9 digit number using the following code.
UPDATE SGT_EMPLOYER
SET SSN = (CONVERT(NUMERIC(10,0),RAND() * 899999999) + 100000000)
WHERE EMPLOYER_ACCOUNT_ID = 123456789;
Expected result: the query should update 300 rows with 300 random 9 digit numbers.
Actual: the query updates all 300 rows with the same number, because the RAND() function is executed only once.
Please help. Thank You.
As you already figured out yourself, RAND is a run-time constant function in SQL Server. It means that it is called once per statement and the generated value is used for each affected row.
There are other functions that are called for each row. Often people use NEWID usually together with CHECKSUM as a substitute for a random number, but I would not recommend it because the distribution of such random numbers is likely to be poor.
There is a good function specifically designed to generate random numbers: CRYPT_GEN_RANDOM. It has been available since at least SQL Server 2008.
It generates a given number of random bytes.
In your case it would be convenient to have a random number as a float value in the range of [0;1], same as the value returned by RAND.
So, CRYPT_GEN_RANDOM(4) generates 4 random bytes as varbinary.
Convert them to int (a signed 32-bit value), divide by the maximum value of an unsigned 32-bit integer (4294967295) and add 0.5 to shift the range from [-0.5;+0.5] to [0;1]:
(CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5)
Your query becomes:
UPDATE SGT_EMPLOYER
SET SSN =
CONVERT(NUMERIC(10,0),
(CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) * 899999999.0 + 100000000.0)
WHERE EMPLOYER_ACCOUNT_ID = 123456789;
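The arithmetic can be checked outside SQL Server. The sketch below reproduces the same mapping in Python: os.urandom stands in for CRYPT_GEN_RANDOM, round() stands in for the conversion to NUMERIC(10,0), and the function names are invented for the example.

```python
import os

UINT32_MAX = 4294967295.0  # maximum value of an unsigned 32-bit integer


def unit_from_bytes(raw: bytes) -> float:
    """Map 4 bytes, read as a signed 32-bit int, to roughly [0, 1]."""
    signed = int.from_bytes(raw, byteorder="big", signed=True)
    return signed / UINT32_MAX + 0.5


def nine_digit(raw: bytes) -> int:
    """Scale the [0, 1] value into the 9-digit range 100000000..999999999."""
    return round(unit_from_bytes(raw) * 899999999.0 + 100000000.0)


if __name__ == "__main__":
    print(nine_digit(os.urandom(4)))  # a different 9-digit number on each call
```

Feeding in the smallest and largest signed 32-bit values shows that the result stays within 9 digits at both ends of the range.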
Yes, rand() is executed only once, before the rows are updated, not once per updated row.
You can use a stored procedure that updates the rows one at a time with (CONVERT(NUMERIC(10,0),RAND() * 899999999) + 100000000).
Sean Lange is 100% correct. However, if you want to quickly mask your SSN, perhaps the following using HashBytes() may help.
Example
Declare @Table table (SSN varchar(25))
Insert into @Table values
('070-99-12345'),
('123-45-67890')
Select SSN
,AsInt = abs(cast(HashBytes('MD5', SSN) as int))
From @Table
Returns
SSN AsInt
070-99-12345 508860145
123-45-67890 843256257

How to insert the same random number for another column's value?

Within my table 'SERVICE_TICKET' are two columns, namely 'Defect_Description' and 'Defect_Description_Code'.
I'd like to populate the second column with random numbers between 1000000 and 9999999 (7-digit numbers). However, the random number should be the same for equal values within the first column. So, for example, if 'Defect_Description' = 'microphone for hands-free device', the 'Defect_Description_Code' should always equal the same arbitrary number, e.g. '8374917'.
I came up with the following expression, but this creates a different number for each 'Defect_Description'. What do I need to change in order to get the same number for each of these?
UPDATE dbo.SERVICE_TICKET
SET Defect_Description_Code =
CASE Defect_Description
WHEN 'microphone for hands-free device' THEN (ABS(CHECKSUM(NewId())) % 1111111 + 9999999)
ELSE '-'
END
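I think you want to avoid newid() in this case, because it generates a new value on every call, so rows with the same description get different numbers. I would recommend simply using Defect_Description itself.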
The following query also fixes the logic to get the 7 digit number:
UPDATE dbo.SERVICE_TICKET
SET Defect_Description_Code = ABS(CHECKSUM(Defect_Description)) % 9000000 + 1000000;
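The idea is that a checksum of the text is deterministic, so equal descriptions always map to the same code. A rough Python sketch of the same mapping follows; zlib.crc32 is only a stand-in for CHECKSUM (it is a different algorithm and produces different numbers), and defect_code is a hypothetical name.

```python
import zlib


def defect_code(description: str) -> int:
    """Derive a stable 7-digit code from the description text."""
    # crc32 is a stand-in for SQL Server's CHECKSUM, not the same algorithm.
    checksum = zlib.crc32(description.encode("utf-8"))  # non-negative in Python 3
    return checksum % 9000000 + 1000000                 # always 1000000..9999999


if __name__ == "__main__":
    text = "microphone for hands-free device"
    # The same text always yields the same code.
    print(defect_code(text) == defect_code(text))
```

As with any checksum, two different descriptions can end up with the same code, so this is a convenience rather than a guaranteed unique key.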

SQL update to a table based on a flag word?

I've got a field in my DB that's an arbitrary value on a per-row basis, and I'd like to add X to this. I'd only like to add X if a flag word (held as an int in this row) has the 2nd and 10th bits set true. Is it possible to create an SQL statement to do this for every row in the table? Or do I have to iterate through my entire table?
Using MySQL (5.5)
Bonus points question: I say add X based on a flag, but there's also a scaling factor. For example, based on a value of bits 20-12 interpreted as a short unsigned integer, I'd really like to assign:
value = value + ('X' * thatShort * (bit2 and bit10));
In MS SQL:
update MyTable
set Field1 = Field1 + 'X'
where Field2 & 0x202 = 0x202
[EDIT]
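value = value + (X * ((field & 0x1FF800) >> 11))
0x1FF800 is the mask for the scaling bits; its lowest set bit is bit 11 (counting from 0), so adjust it if your bit positions differ.
>> 11 shifts the masked value right so that the scaling field starts at bit 0.
The parentheses around (field & 0x1FF800) matter, because >> binds more tightly than & in MySQL.
Since you are already filtering on bit2 and bit10 being set, (bit2 and bit10) is 1 for every updated row and can be left out of the product.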
Hope this will answer your question. Not sure how you are going to grant 'bonus points' though :).
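The masking and shifting can be tried out in isolation. Below is a small Python sketch under these assumptions: the flag mask 0x202 and the scaling mask 0x1FF800 are taken from the answer, a shift of 11 is used because that is the lowest set bit of 0x1FF800, and the function name adjusted is made up.

```python
FLAG_MASK = 0x202       # bits 2 and 10, counting from 1
SCALE_MASK = 0x1FF800   # scaling field, as given in the answer
SCALE_SHIFT = 11        # position of the lowest set bit in SCALE_MASK


def adjusted(value: int, flags: int, x: int) -> int:
    """Add x times the scaling field, but only when both flag bits are set."""
    if (flags & FLAG_MASK) != FLAG_MASK:
        return value                             # row is left unchanged
    scale = (flags & SCALE_MASK) >> SCALE_SHIFT  # mask first, then shift
    return value + x * scale


if __name__ == "__main__":
    flags = 0x202 | (3 << 11)       # both flag bits set, scaling field = 3
    print(adjusted(10, flags, 5))   # 10 + 5 * 3
```

In the SQL version the if corresponds to the WHERE clause, so the whole table is handled by a single UPDATE with no need to iterate row by row.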