SQL: Replacing dates contained within a text string

I am using SQL Server Management Studio 2012. I work with medical records and need to de-identify reports. The reports are structured in a table with columns Report_Date, Report_Subject, Report_Text, etc... The string I need to update is in report_text and there are ~700,000 records.
So if I have:
"patient had an EKG on 04/09/2012"
I need to replace that with:
"patient had an EKG on [DEIDENTIFIED]"
I tried
UPDATE table
SET Report_Text = REPLACE(Report_Text, '____/___/____', '[DEIDENTIFIED]')
because I need to replace anything in there that looks like a date. It runs but doesn't actually replace anything, apparently because REPLACE treats the _ characters as literal underscores rather than as wildcards.
Any recommendations on this? Thanks in advance!

You can use PATINDEX to find the position of a date, then SUBSTRING to extract it and REPLACE to swap it out.
Since there may be multiple dates in the text, you have to run a WHILE loop until all of them are replaced.
The SQL below works for all dates in the form MM/DD/YYYY (two-digit month and day, four-digit year, separated by slashes).
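-- Loop while any row still contains something shaped like MM/DD/YYYY
WHILE EXISTS( SELECT 1 FROM dbo.MyTable WHERE PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',Report_Text) > 0 )
BEGIN
    -- Each pass replaces the first date found in every matching row
    -- (and any identical copies of that date in the same row)
    UPDATE t
    SET Report_Text = REPLACE(Report_Text, DateToBeReplaced, '[DEIDENTIFIED]')
    FROM ( SELECT * ,
                  -- The 10 characters starting at the match are the date itself
                  SUBSTRING(Report_Text,PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',Report_Text), 10) AS DateToBeReplaced
           FROM dbo.MyTable AS a
           WHERE PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',Report_Text) > 0
         ) AS t
END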
I have tested the above SQL on a dummy table with a few rows. I don't know how it will scale for your data, but I recommend giving it a try.
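To see why the loop may need several passes, here is a small Python model of the same logic. It is only an illustration, not T-SQL: the helper names `first_date` and `deidentify` are made up for this sketch, and the regular expression stands in for the PATINDEX pattern.

```python
import re

# Stand-in for the PATINDEX pattern: two digits, slash, two digits, slash, four digits
DATE_SHAPE = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")

def first_date(text):
    """Return the first MM/DD/YYYY-shaped substring, or None (hypothetical helper)."""
    match = DATE_SHAPE.search(text)
    return match.group(0) if match else None

def deidentify(text):
    """Mimic the WHILE loop: one distinct date is replaced per pass."""
    passes = 0
    date = first_date(text)
    while date is not None:
        # Like REPLACE, this swaps every copy of that exact date string
        text = text.replace(date, "[DEIDENTIFIED]")
        passes += 1
        date = first_date(text)
    return text, passes

result, passes = deidentify("EKG on 04/09/2012, follow-up on 05/10/2012")
print(result)
print(passes)
```

The loop ends because the placeholder contains no digits, so it can never match the pattern again. Note that a date written as 4/9/2012 is not matched by this shape at all.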

To keep it simple, assume that a digit marks an identifying element in the string, so look for the position of the first digit and the position of the last digit. I'm not sure this will apply to your entire set of records, but here is the code.
I created two test strings: the one you supplied and one with the date at the beginning of the string.
Declare @tstString varchar(100)
Set @tstString = 'patient had an EKG on 04/09/2012'
-- Second test case; comment this line out to test the first string
Set @tstString = '04/09/2012 EKG for patient'
Select @tstString
    -- Position of the first digit
    ,PATINDEX('%[0-9]%',@tstString)
    -- Position of the last digit
    ,LEN(@tstString) - PATINDEX('%[0-9]%',REVERSE(@tstString)) + 1
    ,CASE
        -- No digits in the string: return it unchanged
        WHEN PATINDEX('%[0-9]%',@tstString) = 0 THEN @tstString
        -- A digit is the first character: find the last digit and drop everything up to it
        WHEN PATINDEX('%[0-9]%',@tstString) = 1 THEN
            CONCAT('[DEIDENTIFIED]',SUBSTRING(@tstString, LEN(@tstString)-PATINDEX('%[0-9]%',REVERSE(@tstString))+2,LEN(@tstString)))
        -- Otherwise keep the string up to the first digit
        ELSE CONCAT(SUBSTRING(@tstString,1,PATINDEX('%[0-9]%',@tstString)-1),'[DEIDENTIFIED]')
    END AS 'newString'
As you can see, this is messy in SQL.
I would rather achieve this with a parser service and move the data with SSIS and call the service.
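As a rough idea of what such an external parser could look like, here is a minimal Python sketch using the standard `re` module. The function name `scrub_dates` and the exact pattern are assumptions for illustration; real reports will likely contain other date formats that this does not catch.

```python
import re

# Assumed shape: 1-2 digit month, 1-2 digit day, 2 or 4 digit year, with / or - separators
DATE_PATTERN = re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-](?:\d{4}|\d{2})\b")

def scrub_dates(report_text):
    """Replace every date-shaped token with a placeholder (hypothetical helper)."""
    return DATE_PATTERN.sub("[DEIDENTIFIED]", report_text)

print(scrub_dates("patient had an EKG on 04/09/2012"))
print(scrub_dates("04/09/2012 EKG for patient, repeat on 4-9-12"))
```

Unlike the first-digit/last-digit approach, a pattern like this leaves other numbers in the text (dosages, counts) untouched.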

Related

How can I replace the first two characters of every row in a specific column using SQL?

I'm working with a table that has a column called ticket number, which contains several rows of data. All of the values in the column begin with J2. I'd like to change the first two characters of every value to A3. How can I use SQL to do this? I'm familiar with the REPLACE function:
SELECT REPLACE ([ticket number],'J2','A3')
But clearly the example above will not work, since it will change all of the J2 occurrences to A3 while I need to replace the first one at the beginning of ticket number. Any help would be appreciated.
Ticket Number
J2F4T45T
J2J3J3J2
J25TGYHJ2
J2FFJ2J2
J2MG8NGJ2
The desired result should be:
Ticket Number
A3F4T45T
A3J3J3J2
A35TGYHJ2
A3FFJ2J2
A3MG8NGJ2
Not sure if this is what you are looking for, but you could use the RIGHT function to get everything except the first two characters of ticket_number:
SELECT 'A3' + RIGHT(ticket_number ,len(ticket_number)-2)
And if you need to update the table you could try something like this:
UPDATE ticket
set ticket_number = 'A3' + RIGHT(ticket_number ,len(ticket_number)-2)
Another method uses STUFF to overwrite the first two characters:
UPDATE ticket
set ticket_number = STUFF(ticket_number,1,2,'A3')
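The difference between overwriting the prefix and replacing every occurrence can be sketched in Python. This is only a model of the idea, and `swap_prefix` is a made-up name for this example:

```python
def swap_prefix(ticket, new_prefix="A3"):
    """Overwrite the first two characters, like STUFF(ticket_number, 1, 2, 'A3')."""
    return new_prefix + ticket[2:]

tickets = ["J2F4T45T", "J2J3J3J2", "J25TGYHJ2"]
print([swap_prefix(t) for t in tickets])

# A plain replace, by contrast, would also rewrite the later J2 occurrences
print("J2J3J3J2".replace("J2", "A3"))
```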

Replace a range of values in an SQL table to a single value

I am trying to replace a range of values with a string. I know how to do it with the replace function but that, as far as I know, requires them to be done one at a time.
Is there a way to select a range of values, for example (1-200), and replace them with a singular string value say "BLANK"?
I have tried WHEN, THEN and SET but get a syntax error near WHEN or SET as I try these.
Base Code Idea
Select DATA
WHEN DATA >= 1 THEN 'BLANK'
WHEN DATA <200 THEN 'BLANK'
END
FROM DATABANK
Thanks!
Is this what you want?
select data,
case when data not between 1 and 200 then data end as new_data
from databank
What this does is take the integer value of data, and replace any value that's in the 1-200 range with null values, while leaving other values unchanged. The result goes into column new_data.
The assumption here is that data is a number - so the alternative value has to be consistent with that datatype (string 'BLANK' isn't): I went for null, which is consistent with any datatype, and is the default value returned by a case expression when no branch matches.
If you wanted something else, say 0, you would do:
select data,
case when data between 1 and 200 then 0 else data end as new_data
from databank
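If the literal text 'BLANK' is really what is wanted, every value in the result has to be text, not just the replaced ones. A minimal Python sketch of that idea, with `label_range` as a hypothetical helper name:

```python
def label_range(value, low=1, high=200, label="BLANK"):
    """Return the label for values in [low, high], else the value as text."""
    # Both branches return str so the result "column" keeps a single type
    return label if low <= value <= high else str(value)

data = [0, 1, 150, 200, 201]
print([label_range(v) for v in data])
```

Both ends of the range are included here, matching the way BETWEEN treats its bounds.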

Update Query to get rid of "AM" & "PM" in string

I wrote a basic update query:
Update WA SET WA.Time_Updated = Replace(Time_Updated, 'PM', ' ');
to which I don't get any real error message other than
Microsoft can't update 251 records etc due to type conversion error
There are 5000 records in there. I have the date column as Date/Time and all my other columns (non-dates) as Short Text. The query just does not update anything in the table and keeps it as it previously was. Any ideas?
Just convert your text times to Date values:
Select *, TimeValue([Time_Updated]) As TimeUpdated From WA
Then, when you display TimeUpdated, format the value as you like.
You can deal with the imported structure as it is.
Consider:
Hour("12:03:00 PM") + Minute("12:03:00 PM")/60 + Second("12:03:00 PM")/3600
This calculates to 12.05
So don't change the raw data, calculate in query. Just use your field name in place of the static value in the expression.
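The same arithmetic can be checked outside the database. Here is a small Python sketch, assuming the text always looks like `12:03:00 PM`; `decimal_hours` is a name invented for this example:

```python
from datetime import datetime

def decimal_hours(time_text):
    """Parse text like '12:03:00 PM' and return the time of day in decimal hours."""
    t = datetime.strptime(time_text, "%I:%M:%S %p")
    # Same formula as Hour + Minute/60 + Second/3600
    return t.hour + t.minute / 60 + t.second / 3600

print(decimal_hours("12:03:00 PM"))
```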

SQL Server: How to select rows which contain value comprising of only one digit

I am trying to write a SQL query that only returns rows where a specific column (let's say 'amount' column) contains numbers comprising of only one digit, e.g. only '1's (1111111...) or only '2's (2222222...), etc.
In addition, 'amount' column contains numbers with decimal points as well and these kind of values should also be returned, e.g. 1111.11, 2222.22, etc
If you want to make the query generic, so that you don't have to specify each possible digit, you could change the WHERE clause to the following:
WHERE LEN(REPLACE(REPLACE(amount,LEFT(amount,1),''),'.','')) = 0
This always uses the first digit as the comparison for the rest of the string.
If you are using SQL Server, then you can try this script:
SELECT *
FROM (
SELECT CAST(amount AS VARCHAR(30)) AS amount
FROM TableName
)t
WHERE LEN(REPLACE(REPLACE(amount,'1',''),'.','')) = 0 OR
LEN(REPLACE(REPLACE(amount,'2',''),'.','')) = 0
I tried it like this; replace 1111111 with your column name:
Select replace(Str(1111111, 12, 2),0,left(11111,1))
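The "remove the first character everywhere and see if anything is left" idea can be sketched in Python as follows. The function name is hypothetical, and the guard against an empty string is an addition that the SQL versions above do not have:

```python
def is_single_digit_repeated(amount):
    """True if, ignoring the decimal point, only one distinct character appears."""
    digits = str(amount).replace(".", "")
    # Strip every copy of the first character; nothing should remain
    return digits != "" and digits.replace(digits[0], "") == ""

for value in ["1111111", "2222.22", "1212.12", "33.3"]:
    print(value, is_single_digit_repeated(value))
```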

Hard request SQL Server 2008

I have a pretty hard request to do in SQL Server 2008, but I'm not able to do the whole thing.
I have two kinds of records:
16HENFC******** (8 numbers after more 'FC')
16HEN******* (7 numbers after more 'EN')
I have to select the * (which are in fact numbers), and add a 0 at the beginning of the second form of record so that every selected value is 8 characters long.
Then I have to insert the result into an empty table.
I think I did the first part, which is:
SUBSTRING(SELECT mycolumn1 FROM mytable1 WHERE mycolumn1 LIKE '16HENFC%', 5, 8) ;
In summary,
I have those records in my column :
'16HENFC071052'
'16HEN5130026'
I want to select them and transform them so that I can insert these values into another column:
'05130026'
'FC071052'
[EDIT]=>
CREATE TABLE nom_de_la_table
(
colonne1 VARCHAR(250),
colonne2 VARCHAR(250)
)
INSERT INTO nom_de_la_table (colonne1)
VALUES
('16HEN5138745'),
('16HENFC071052v2'),
('16HENFC78942878'),
('16HEN4830026'),
('16HEN7815934'),
('16HENFC74859422'),
('16HEN9687326'),
('16HENFC74889639'),
('16HEN9798556');
[etc...]
So there are two different types of records, and I want to insert the result of what you did first with just two records into another column, but for all 956 records of my table. This is the result for the two examples:
'05130026'
'FC071052'
Left-filling a string is a relatively easy request. Here's an example:
select right(replicate('0',8) + right(test,len(test)-len('16HEN')),8)
from (
    select '16HENFC071052' as test
    union all
    select '16HEN5130026' as test
) z
Use REPLICATE to build a string of as many zeros as the final length you want (8 here). Append your desired string to it; in this case, slice the prefix off by taking the right X characters, where X = len(target) - len(prefix). Finally, take the rightmost characters of the whole string, as many as your desired length.
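The three steps can be traced with a short Python sketch. It mirrors the logic only; `normalize` and its parameters are names made up for this example:

```python
def normalize(record, prefix="16HEN", width=8):
    """Drop the prefix, left-fill with zeros, keep the last `width` characters."""
    tail = record[len(prefix):]           # like RIGHT(test, LEN(test) - LEN('16HEN'))
    return ("0" * width + tail)[-width:]  # like RIGHT(REPLICATE('0', 8) + ..., 8)

print(normalize("16HENFC071052"))
print(normalize("16HEN5130026"))
```

A tail that is already 8 characters long comes back unchanged, while a 7-character tail gains one leading zero.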