Search and Replace Serialized DB Dump - sql

I am moving a database from one server to another and it contains a lot of serialized data. So I am wondering:
Is it possible to use a regex to replace all occurrences like the following (and similar)
s:22:\"http://somedomain.com/\"
s:26:\"http://somedomain.com/abc/\"
s:29:\"http://somedomain.com/abcdef/\"
to
s:27:\"http://someOtherdomain.com/\"
s:31:\"http://someOtherdomain.com/abc/\"
s:34:\"http://someOtherdomain.com/abcdef/\"

If the values in that column all have the same layout, with the length (22, 26, 29, ...) always at the same position from the beginning of the string, then in SQL Server you can use REPLACE and SUBSTRING with CHARINDEX to do that:
DECLARE @s VARCHAR(50);
DECLARE @sub INT;
SET @s = 's:22:\"http://somedomain.com/\"';
SET @sub = CONVERT(INT, SUBSTRING(@s, CHARINDEX(':', @s) + 1, 2));
SELECT REPLACE(REPLACE(@s, 'somedomain', 'someOtherdomain'), @sub, @sub + 5);
So s:number:\"http://somedomain.com/\" will become s:number + 5:\"http://someOtherdomain.com/\".
If you want to run an UPDATE against that table you can write it this way:
UPDATE #t
SET s = REPLACE(REPLACE(s, 'somedomain', 'someOtherdomain'),
CONVERT(INT, SUBSTRING(s, CHARINDEX(':', s) + 1, 2)),
CONVERT(INT, SUBSTRING(s, CHARINDEX(':', s) + 1, 2)) + 5);
This query searches for somedomain and replaces it with someOtherdomain, then takes the number between the first two colons, converts it to INT, and replaces it with the same number + 5 (the length difference between the two domain names). This is how your data should look after you run the query:
s:27:\"http://someOtherdomain.com/\"
s:31:\"http://someOtherdomain.com/abc/\"
s:34:\"http://someOtherdomain.com/abcdef/\"
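The fixed + 5 offset only works for this particular pair of domain names, and replacing the number as text could also hit the same digits elsewhere in the value. As a rough sketch, assuming each value holds exactly one serialized string of the form s:N:\"...\", contains only single-byte characters, and uses PHP-style serialization where N is a byte count, the prefix can instead be rebuilt from the measured length of the new payload. The variable names are illustrative and not from the original answer:

```sql
-- Sketch: recompute the serialized length instead of adding a constant.
-- Assumes one s:N:\"...\" string per value and single-byte (ASCII) content.
DECLARE @s VARCHAR(200) = 's:26:\"http://somedomain.com/abc/\"';

-- Position of the backslash that opens the payload (first \" in the value).
DECLARE @open INT = CHARINDEX('\"', @s);

-- The payload sits between the opening \" and the closing \" (2 characters each).
DECLARE @payload VARCHAR(200) = SUBSTRING(@s, @open + 2, LEN(@s) - @open - 3);

DECLARE @new VARCHAR(200) = REPLACE(@payload, 'somedomain', 'someOtherdomain');

-- DATALENGTH counts bytes; for ASCII data that equals the character count.
SELECT 's:' + CAST(DATALENGTH(@new) AS VARCHAR(10)) + ':\"' + @new + '\"' AS fixed;
-- s:31:\"http://someOtherdomain.com/abc/\"
```

For multi-byte data the byte count reported by DATALENGTH depends on the column's collation and may not match what the application expects, so it is worth checking a few values by hand before running anything like this against a whole table.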

Related

Returning the column value even if the special character isn't present using SUBSTRING & CHARINDEX

I'm using SQL and trying to show all the data in a column before a special character like <.
I've used this SQL:
SUBSTRING(ACTIVITY.Name, 0, CHARINDEX('<', ACTIVITY.Name, 2)) AS ActivityIdentifier
It works absolutely fine when there is a < in the column, but when one isn't present I get no result. I need to be able to return the column value even if the character isn't present.
I looked at RTRIM, LEFT and LEN functions but as my Activity Name can be different lengths, they didn't seem to fit.
I'd appreciate any advice.
I think the simplest fix is to append a '<' to the string:
SUBSTRING(ACTIVITY.Name, 1, CHARINDEX('<', ACTIVITY.Name + '<', 2)) as ActivityIdentifier
If you don't want the '<' in the result, then:
SUBSTRING(ACTIVITY.Name, 1, CHARINDEX('<', ACTIVITY.Name + '<', 2) - 1) as ActivityIdentifier
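A minimal sketch of the same sentinel idea on sample rows, written with LEFT instead of SUBSTRING. The table variable @Activity and its data are made up for illustration, and the search starts at position 1 rather than 2:

```sql
-- Hypothetical sample data standing in for the ACTIVITY table.
DECLARE @Activity TABLE (Name NVARCHAR(100));
INSERT INTO @Activity (Name)
VALUES (N'Inspection<42>'), (N'Maintenance');

-- Appending '<' guarantees CHARINDEX always finds a match,
-- so rows without the character come back unchanged.
SELECT Name,
       LEFT(Name, CHARINDEX('<', Name + '<') - 1) AS ActivityIdentifier
FROM @Activity;
-- Inspection<42>  ->  Inspection
-- Maintenance     ->  Maintenance
```

A value that starts with '<' yields an empty string here, which may or may not be what you want.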
You could also write a stored function that would handle this - since it's just operating on data passed in, and not doing any "hidden" data access in the background, it should be fairly well behaved in terms of performance.
Try this:
CREATE OR ALTER FUNCTION dbo.TrimSpecialChar
(@Input NVARCHAR(500), @SpecialChar NCHAR(1))
RETURNS NVARCHAR(500)
AS
BEGIN
DECLARE @Result NVARCHAR(500);
-- if "special char" is not found - just return input
DECLARE @SpecCharIx INT = CHARINDEX(@SpecialChar, @Input);
IF (@SpecCharIx = 0)
SET @Result = @Input;
ELSE
SET @Result = SUBSTRING(@Input, 1, @SpecCharIx-1);
RETURN @Result;
END
You can then call it like this:
SELECT dbo.TrimSpecialChar(N'Testinput without special characters', N'<')
should return back the whole input, while
SELECT dbo.TrimSpecialChar(N'Testinput with the < special characters', N'<')
would just return
Testinput with the
You can simply add a CASE expression:
(case when charindex('<', ACTIVITY.Name, 2) > 0
then SUBSTRING(ACTIVITY.Name, 1, CHARINDEX('<', ACTIVITY.Name, 2) - 1)
else ACTIVITY.Name
end) as ActivityIdentifier

How to identify and redact all instances of a matching pattern in T-SQL

I have a requirement to run a function over certain fields to identify and redact any numbers which are 5 digits or longer, ensuring all but the last 4 digits are replaced with *
For example: "Some text with 12345 and 1234 and 12345678" would become "Some text with *2345 and 1234 and ****5678"
I've used PATINDEX to identify the starting character of the pattern:
PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', TEST_TEXT)
I can recursively call that to get the starting character of all the occurrences, but I'm struggling with the actual redaction.
Does anyone have any pointers on how this can be done? I know to use REPLACE to insert the *s where they need to be, it's just the identification of what I should actually be replacing I'm struggling with.
I could do it in a program, but I need it to be T-SQL (it can be a function if needed).
Any tips greatly appreciated!
You can do this using the built-in functions of SQL Server. All of the functions used in this example are available in SQL Server 2008 and later.
DECLARE @String VARCHAR(500) = 'Example Input: 1234567890, 1234, 12345, 123456, 1234567, 123asd456'
DECLARE @StartPos INT = 1, @EndPos INT = 1;
DECLARE @Input VARCHAR(500) = ISNULL(@String, '') + ' '; --Sets the input and adds a control character at the end to make the loop easier.
DECLARE @OutputString VARCHAR(500) = ''; --Initialize an empty string to avoid NULL concatenation
WHILE (@StartPos <> 0)
BEGIN
SET @StartPos = PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @Input);
IF @StartPos <> 0
BEGIN
SET @OutputString += SUBSTRING(@Input, 1, @StartPos - 1); --Separate all contents before the first occurrence of our filter
SET @Input = SUBSTRING(@Input, @StartPos, 500); --Cut from the match to the end of the string. The length only needs to be at least the remaining string length.
SET @EndPos = (PATINDEX('%[0-9][0-9][0-9][0-9][^0-9]%', @Input)); --First occurrence of 4 digits followed by a non-digit.
SET @Input = STUFF(@Input, 1, (@EndPos - 1), REPLICATE('*', (@EndPos - 1))); --@EndPos - 1 is the number of characters we want to replace.
END
END
SET @OutputString += @Input; --Append the last element
SET @OutputString = LEFT(@OutputString, LEN(@OutputString)); --LEN ignores trailing spaces, so this drops the control character
SELECT @OutputString;
Which outputs the following:
Example Input: ******7890, 1234, *2345, **3456, ***4567, 123asd456
This entire code could also be made as a function since it only requires an input text.
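As a sketch of that idea, the loop can be wrapped in a scalar function. The name dbo.RedactLongNumbers is hypothetical, and the 500-character limit is simply carried over from the example above:

```sql
CREATE FUNCTION dbo.RedactLongNumbers (@String VARCHAR(500))
RETURNS VARCHAR(500)
AS
BEGIN
    DECLARE @StartPos INT = 1, @EndPos INT;
    -- One extra character of room for the trailing control space.
    DECLARE @Input VARCHAR(501) = ISNULL(@String, '') + ' ';
    DECLARE @Output VARCHAR(501) = '';

    WHILE @StartPos <> 0
    BEGIN
        -- Next run of at least 5 digits, or 0 when none is left.
        SET @StartPos = PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @Input);
        IF @StartPos <> 0
        BEGIN
            -- Move everything before the run to the output.
            SET @Output += SUBSTRING(@Input, 1, @StartPos - 1);
            SET @Input = SUBSTRING(@Input, @StartPos, 501);
            -- Start of the last 4 digits of the run.
            SET @EndPos = PATINDEX('%[0-9][0-9][0-9][0-9][^0-9]%', @Input);
            -- Mask every digit before those last 4.
            SET @Input = STUFF(@Input, 1, @EndPos - 1, REPLICATE('*', @EndPos - 1));
        END
    END

    -- RTRIM removes the control space (and any other trailing spaces).
    RETURN RTRIM(@Output + @Input);
END
GO

SELECT dbo.RedactLongNumbers('Some text with 12345 and 1234 and 12345678');
-- Some text with *2345 and 1234 and ****5678
```

Scalar functions that loop like this can be slow when applied to many rows, so it is worth measuring on realistic data before using it in a large UPDATE.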
A dirty solution with a recursive CTE:
DECLARE
@tags nvarchar(max) = N'Some text with 12345 and 1234 and 12345678';
WITH Process (s, i)
as
(
SELECT @tags, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @tags)
UNION ALL
SELECT value, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', value)
FROM
-- replace the first digit of the first run of 5+ digits with *, then search again
(SELECT SUBSTRING(s,0,i)+'*'+SUBSTRING(s,i+1,len(s)) value
FROM Process
WHERE i >0) calc
)
SELECT * FROM Process
WHERE i=0
I think a better solution is to add a CLR function to MS SQL Server to handle regular expressions.
sql-clr/RegEx
Here is an option using DelimitedSplit8K_LEAD, which can be found here: https://www.sqlservercentral.com/articles/reaping-the-benefits-of-the-window-functions-in-t-sql-2 This is an extension of Jeff Moden's splitter that is even a little bit faster than the original. The big advantage this splitter has over most of the others is that it returns the ordinal position of each element. One caveat is that I am splitting on a space, based on your sample data. If you have numbers crammed in the middle of other characters, this will ignore them. That may be good or bad depending on your specific requirements.
declare @Something varchar(100) = 'Some text with 12345 and 1234 and 12345678';
with MyCTE as
(
select x.ItemNumber
, Result = isnull(case when TRY_CONVERT(bigint, x.Item) is not null then isnull(replicate('*', len(convert(varchar(20), TRY_CONVERT(bigint, x.Item))) - 4), '') + right(convert(varchar(20), TRY_CONVERT(bigint, x.Item)), 4) end, x.Item)
from dbo.DelimitedSplit8K_LEAD(@Something, ' ') x
)
select Output = stuff((select ' ' + Result
from MyCTE
order by ItemNumber
FOR XML PATH('')), 1, 1, '')
This produces: Some text with *2345 and 1234 and ****5678

SQL Server query to delete text from text column

I have a SQL Server database with a table feedback that contains a text column comment. In that column I have tag data, for example
This is my record <tag>Random characters are here</tag> with information.
How do I write a query to update all of these records to remove the <tag></tag> and all of the text in between?
I'd like to write this to a different 'temporary' table to first verify the changes and then update the original table.
I am running SQL Server 2014 Express.
Thank you
Here is a function to remove tags..
CREATE FUNCTION [dbo].[RemoveTag](@text NVARCHAR(MAX), @tag as nvarchar(max))
RETURNS NVARCHAR(MAX)
AS
BEGIN
declare @startTagIndex as int
declare @endTagIndex as int
set @startTagIndex = CHARINDEX('<' + @tag + '>', @text)
if(@startTagIndex > 0) BEGIN
set @endTagIndex = CHARINDEX('</' + @tag + '>', @text, @startTagIndex)
if(@endTagIndex > 0) BEGIN
return LEFT(@text, @startTagIndex - 1) + RIGHT(@text, len(@text) - len(@tag) - @endTagIndex - 2)
END
END
return @text
END
Later you can use it like:
Update table set field = dbo.RemoveTag(field, 'tag')
If you want to write fields to other table then:
CREATE TABLE dbo.OtherTable (
OtherField nvarchar(MAX) NOT NULL
)
GO
INSERT INTO OtherTable (OtherField)
SELECT dbo.RemoveTag(field, 'tag') from table
This makes a lot of assumptions about the format of your string, but if they're valid then this is very simple:
left(s, charindex('<tag>', s) - 1) +
substring(s, charindex('</tag>', s) + 6, len(s))
Obviously we're basically assuming that the search strings appear only once and in the correct order. There's also an assumption that there will be matches. Also, I used len(s) as an easy upper bound on the number of characters to take from the right. You could just hard-code something appropriate if you felt like it since SQL Server doesn't error for going past the end. s is just a stand in for your char column.
http://sqlfiddle.com/#!3/771a3/8
Not sure if extra whitespace is going to be an issue so you might want to trim and add a space character in the middle.
rtrim(left(s, charindex('<tag>', s) - 1)) + ' ' +
ltrim(substring(s, charindex('</tag>', s) + 6, len(s)))
You can use CHARINDEX to find where your tags start and stop, SUBSTRING to get all text between < and >, and REPLACE to swap out the substring for ''.
Select Field,
Substring(FIELD, charindex('<', Field), CHARINDEX('>', Field,
(CHARINDEX('>', FIELD)) + 1) - charindex('<', Field)+1) as ToRemove,
replace (Field, Substring(FIELD, charindex('<', Field), CHARINDEX('>',
Field, (CHARINDEX('>', FIELD)) + 1) - charindex('<', Field)+1), '')
as FinalResult
from TableName
The output will be three columns, Field, ToRemove and FinalResult, but nothing will actually be updated.
I think the only way this will fail is if you have nested tags. <b><i>sometext</i></b>
To actually make the change:
Update #TableName set Field = replace (Field, Substring(FIELD, charindex('<', Field), CHARINDEX('>', Field, (CHARINDEX('>', FIELD)) + 1) - charindex('<', Field)+1), '')
Tested on SQL Server 2012.
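The question also asks to stage the changes in a temporary table before touching the original. A rough sketch of that workflow follows, assuming the table has a key column (called id here, which is hypothetical) and that comment is VARCHAR(MAX); a column of the legacy text type would likely need a CAST to VARCHAR(MAX) first. The #feedback table only stands in for the real one:

```sql
-- Hypothetical stand-in for the real feedback table; assumes a key column id.
CREATE TABLE #feedback (id INT PRIMARY KEY, comment VARCHAR(MAX));
INSERT INTO #feedback (id, comment)
VALUES (1, 'This is my record <tag>Random characters are here</tag> with information.'),
       (2, 'No tag in this row.');

-- Step 1: stage old and new values side by side,
-- only for rows that contain a complete <tag>...</tag> pair.
SELECT id,
       comment AS old_comment,
       STUFF(comment,
             CHARINDEX('<tag>', comment),
             CHARINDEX('</tag>', comment) + 6 - CHARINDEX('<tag>', comment),
             '') AS new_comment
INTO #preview
FROM #feedback
WHERE CHARINDEX('<tag>', comment) > 0
  AND CHARINDEX('</tag>', comment) > CHARINDEX('<tag>', comment);

SELECT * FROM #preview;  -- inspect before changing anything

-- Step 2: once the preview looks right, apply it to the original rows.
UPDATE f
SET comment = p.new_comment
FROM #feedback AS f
JOIN #preview AS p ON p.id = f.id;
```

Like the answers above, this removes only the first tag pair in each row and leaves the surrounding whitespace as it was.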

SQL SERVER 2008 - Returning a portion of text using SUBSTRING AND CHARINDEX. Need to return all text UNTIL a specific char

I have a column called 'response' that contains lots of data about a person.
I'd like to return only the info after a specific string.
But using the method below I sometimes (when people have <100 IQ) get the | that comes directly after the required number.
I'd like any characters after 'PersonIQ=' but only up to the pipe.
I'm not sure of the best way to achieve this.
Query speed is a concern and my idea of nested CASE is likely not the best solution.
Any advice appreciated. Thanks
substring(response,(charindex('PersonIQ=',response)+9),3)
This is my suggestion:
declare @s varchar(200) = 'aaa=bbb|cc=d|PersonIQ=99|e=f|1=2'
declare @iq varchar(10) = 'PersonIQ='
declare @pipe varchar(1) = '|'
select substring(@s,
charindex(@iq, @s) + len(@iq),
charindex(@pipe, @s, charindex(@iq, @s)) - (charindex(@iq, @s) + len(@iq))
)
Instead of the 3 in your formula, you should calculate the length between @iq and @pipe with the last part of the formula, charindex(@pipe, @s, charindex(@iq, @s)) - (charindex(@iq, @s) + len(@iq)), which gets the index of the first @pipe after @iq and then subtracts the index where the IQ value starts.
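If PersonIQ can be the last field, there is no pipe after it and the length above goes negative. A small sketch of one way around that, assuming the key is always present in the string: append a pipe to the search target so a terminator is always found. The variable names @key and @start are illustrative:

```sql
DECLARE @s VARCHAR(200) = 'aaa=bbb|cc=d|PersonIQ=99';   -- no trailing pipe
DECLARE @key VARCHAR(10) = 'PersonIQ=';

-- First character of the value, right after the key.
DECLARE @start INT = CHARINDEX(@key, @s) + LEN(@key);

-- Searching @s + '|' guarantees a pipe at or after @start.
SELECT SUBSTRING(@s, @start, CHARINDEX('|', @s + '|', @start) - @start) AS PersonIQ;
-- 99
```

When the key is missing, CHARINDEX returns 0 and the result is meaningless, so a CASE guard on CHARINDEX(@key, @s) > 0 would be needed for data where that can happen.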
Assuming there's always a pipe, you could do this:
substring(stuff(response,1,charindex('PersonIQ=',response)-1,''),1,charindex('|',stuff(response,1,charindex('PersonIQ=',response)-1,''))-1)
Or, you could convert your string to xml and reference PersonIQ directly, e.g.:
--assuming your string looks something like this..
declare @s varchar(max) = 'asdaf=xxx|PersonIQ=100|xxx=yyy'
select convert(xml, '<x ' + replace(replace(@s, '=', '='''), '|', ''' ') + '''/>').value('(/x/@PersonIQ)[1]','int')

T-SQL SUBSTRING at certain places

I have the following example.
DECLARE @String varchar(100) = 'GAME_20131011_Set - SET_20131012_Game'
SELECT SUBSTRING(@String,0,CHARINDEX('_',@String))
SELECT SUBSTRING(@String,CHARINDEX('- ',@String),CHARINDEX('_',@String))
I want to get the words 'GAME' and 'SET' (the first word before the first '_' on each side of ' - ').
I am getting 'GAME' but having trouble with 'SET'.
UPDATE: 'GAME' and 'SET' are just examples; those words may vary.
DECLARE @String1 varchar(100) = 'GAME_20131011_Set - SET_20131012_Game' -- Looking for 'GAME' and 'SET'
DECLARE @String2 varchar(100) = 'GAMEE_20131011_Set - SETT_20131012_Game' -- Looking for 'GAMEE' and 'SETT'
DECLARE @String3 varchar(100) = 'GAMEEEEE_20131011_Set - SETTEEEEEEEE_20131012_Game' -- Looking for 'GAMEEEEE' and 'SETTEEEEEEEE'
As long as your two parts will always be separated by a specific character (- in your example), you could try splitting on that value:
DECLARE @String varchar(100) = 'GAME_20131011_Set - SET_20131012_Game'
DECLARE @Left varchar(100),
@Right varchar(100)
-- split into two strings based on a delimiter
SELECT @Left = RTRIM(SUBSTRING(@String, 0, CHARINDEX('-',@String)))
SELECT @Right = LTRIM(SUBSTRING(@String, CHARINDEX('-',@String)+1, LEN(@String)))
-- handle the strings individually
SELECT SUBSTRING(@Left, 0, CHARINDEX('_', @Left))
SELECT SUBSTRING(@Right, 0, CHARINDEX('_', @Right))
-- Outputs:
-- GAME
-- SET
Here's a SQLFiddle example of this: http://sqlfiddle.com/#!3/d41d8/22594
The issue that you are running into with your original query is that you are specifying CHARINDEX('- ', @String) for your start index, which will include the - in any substring starting at that point. Also, with CHARINDEX('_', @String) for your length parameter, you will always end up with the index of the first _ character in the string.
By splitting the original string in two, you avoid these problems.
Try this
SELECT SUBSTRING(@String,0,CHARINDEX('_',@String))
SELECT SUBSTRING(@String,CHARINDEX('- ',@String)+1, CHARINDEX('_',@String)-1)
CHARINDEX takes an optional third parameter that says which position in the string to start the search from. You could roll this into one statement, but it's easier to read with three:
Declare @start int = charindex('-', @String) + 2;
Declare @end int = charindex('_', @String, @start);
Select substring(@String, @start, @end - @start);
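The same start-position idea can be sketched as a single query that returns both words for several inputs at once. The sample strings come from the question; the aliases v, p and pos are illustrative, and every input is assumed to contain ' - ' and an underscore on each side of it:

```sql
SELECT v.s,
       -- Everything before the first underscore.
       LEFT(v.s, CHARINDEX('_', v.s) - 1) AS FirstWord,
       -- From just after ' - ' up to the next underscore.
       SUBSTRING(v.s, p.pos, CHARINDEX('_', v.s, p.pos) - p.pos) AS SecondWord
FROM (VALUES ('GAME_20131011_Set - SET_20131012_Game'),
             ('GAMEE_20131011_Set - SETT_20131012_Game')) AS v(s)
-- pos is the first character after the ' - ' separator.
CROSS APPLY (SELECT CHARINDEX(' - ', v.s) + 3 AS pos) AS p;
-- GAME  / SET
-- GAMEE / SETT
```

CROSS APPLY is used here only to name the start position once instead of repeating the CHARINDEX expression.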