Get part of the string between 2 different strings - sql

I'm using SQL-Server 2008 R2.
First of all, I know that storing strings like this is very bad practice, but as a SQL developer I can't change it: third-party software generates the output and inserts it into the database in this form.
Explanation
Sample value looks like:
Name: 'Document No. 996'
Unique No: 'A 54 x. 488sCHU'
No 2: 'RF123456789'
String 'This is dynamic text' value 'test' wrong data
Values 'ETC1 ETC2'.
Note: this is 1 value (1 column, 1 row)
As you can see above, each line has the same structure: a label such as Name, then a colon, then a value in single quotes (for example a document number), followed by a line break, and so on.
What I need (desired results)
I need to extract from that string this part: String 'This is dynamic text'.
This part always starts with the word String, followed by one space and then some text in single quotes.
So it looks like I have to extract the text between two delimiters: the first is String ' and the second is the next '.
I probably need SUBSTRING and CHARINDEX, but I haven't been able to get it to work.
What I've tried
Here is the sample data and what I've tried, without success:
DECLARE @c varchar(100)
SET @c = 'Name: ''Document No. 996''
Unique No: ''A 54 x. 488sCHU''
No 2: ''RF123456789''
String ''This is dynamic text'' value ''test'' wrong data
Values ''ETC1 ETC2''.'
SELECT SUBSTRING(STUFF(@c, 1, CHARINDEX('String ''',@c), ''), 0, CHARINDEX('''', STUFF(@c, 1, CHARINDEX('String ''',@c), '')))

You can use this:
DECLARE @c varchar(1000)
SET @c = 'Name: ''Document No. 996''
Unique No: ''A 54 x. 488sCHU''
No 2: ''RF123456789''
String ''This is dynamic text'' value ''test'' wrong data
Values ''ETC1 ETC2''.'
SELECT SUBSTRING( @c, CHARINDEX('String ''',@c) , CHARINDEX('''', @c, CHARINDEX('String ''',@c)+8 ) - CHARINDEX('String ''',@c)+1)
Result:
String 'This is dynamic text'

DECLARE @c varchar(255) --100 will truncate your string
SET @c = 'Name: ''Document No. 996''
Unique No: ''A 54 x. 488sCHU''
No 2: ''RF123456789''
String ''This is dynamic text'' value ''test'' wrong data
Values ''ETC1 ETC2''.'
Here is the solution split into two parts for easier understanding. The first part finds the substring that starts right after the String ' marker and runs to the end of the original string. We store it in @c1 so it can be reused. The second part finds the next ' within @c1 and cuts off everything to the right of it.
DECLARE @c1 varchar(255)
SELECT @c1 = SUBSTRING(@c, CHARINDEX('String ''',@c) + 8, 255)
--This is dynamic text' value 'test' wrong data Values 'ETC1 ETC2'.
SELECT LEFT(@c1, CHARINDEX('''',@c1) - 1)
--This is dynamic text
All put together in a single query:
SELECT LEFT(SUBSTRING(@c, CHARINDEX('String ''',@c) + 8, 255), CHARINDEX('''',SUBSTRING(@c, CHARINDEX('String ''',@c) + 8, 255)) - 1)
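The queries above assume the String ' marker and its closing quote are always present. If CHARINDEX returns 0, the LEFT/SUBSTRING arithmetic either fails or returns the wrong slice. The following is a minimal sketch of a guarded variant that returns NULL instead; the variable names @marker, @start and @stop are illustrative and not from the original post, and the sample value is shortened.

```sql
DECLARE @c varchar(255);
SET @c = 'No 2: ''RF123456789''
String ''This is dynamic text'' value ''test'' wrong data';

DECLARE @marker varchar(20), @start int, @stop int;
SET @marker = 'String ''';                    -- the text that precedes the value

SET @start = CHARINDEX(@marker, @c);          -- 0 when the marker is missing
SET @stop  = CASE WHEN @start > 0
                  THEN CHARINDEX('''', @c, @start + LEN(@marker))  -- closing quote
                  ELSE 0
             END;

-- Only slice when both delimiters were found; otherwise the CASE yields NULL
SELECT CASE WHEN @start > 0 AND @stop > 0
            THEN SUBSTRING(@c, @start + LEN(@marker), @stop - @start - LEN(@marker))
       END AS Extracted;
```

For the sample value this should return This is dynamic text. To keep the String ' prefix as in the desired result, start the SUBSTRING at @start and extend the length up to @stop.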

Not sure, but you may be looking for something like the following:
DECLARE @DATA NVARCHAR(MAX);
SET @DATA = 'Name: ''Document No. 996''
Unique No: ''A 54 x. 488sCHU''
No 2: ''RF123456789''
String ''This is dynamic text'' value ''test'' wrong data
Values ''ETC1 ETC2''.';
SELECT SUBSTRING(SUBSTRING(@DATA, CHARINDEX('String', @DATA), CHARINDEX('Values', @DATA)-CHARINDEX('String', @DATA)), 1, CHARINDEX('''', SUBSTRING(SUBSTRING(@DATA, CHARINDEX('String', @DATA), CHARINDEX('Values', @DATA)-CHARINDEX('String', @DATA)), CHARINDEX('''', SUBSTRING(@DATA, CHARINDEX('String', @DATA), CHARINDEX('Values', @DATA)-CHARINDEX('String', @DATA)))+1, LEN(SUBSTRING(@DATA, CHARINDEX('String', @DATA), CHARINDEX('Values', @DATA)-CHARINDEX('String', @DATA)))))+CHARINDEX('''', SUBSTRING(@DATA, CHARINDEX('String', @DATA), CHARINDEX('Values', @DATA)-CHARINDEX('String', @DATA))));
Result :
String 'This is dynamic text'

Related

How to identify and redact all instances of a matching pattern in T-SQL

I have a requirement to run a function over certain fields to identify and redact any numbers which are 5 digits or longer, ensuring all but the last 4 digits are replaced with *
For example: "Some text with 12345 and 1234 and 12345678" would become "Some text with *2345 and 1234 and ****5678"
I've used PATINDEX to identify the starting character of the pattern:
PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', TEST_TEXT)
I can recursively call that to get the starting character of all the occurrences, but I'm struggling with the actual redaction.
Does anyone have any pointers on how this can be done? I know to use REPLACE to insert the *s where they need to be, it's just the identification of what I should actually be replacing I'm struggling with.
I could do it in application code, but I need it to be T-SQL (it can be a function if needed).
Any tips greatly appreciated!
You can do this using SQL Server's built-in functions. All of the functions used in this example are available in SQL Server 2008 and later.
DECLARE @String VARCHAR(500) = 'Example Input: 1234567890, 1234, 12345, 123456, 1234567, 123asd456'
DECLARE @StartPos INT = 1, @EndPos INT = 1;
DECLARE @Input VARCHAR(500) = ISNULL(@String, '') + ' '; --Sets the input and adds a control character at the end to make the loop easier.
DECLARE @OutputString VARCHAR(500) = ''; --Initialize an empty string to avoid NULL concatenation
WHILE (@StartPos <> 0)
BEGIN
    SET @StartPos = PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @Input);
    IF @StartPos <> 0
    BEGIN
        SET @OutputString += SUBSTRING(@Input, 1, @StartPos - 1); --Copy everything before the first occurrence of the pattern
        SET @Input = SUBSTRING(@Input, @StartPos, 500); --Keep everything from the match to the end; the length only has to be at least the remaining length.
        SET @EndPos = (PATINDEX('%[0-9][0-9][0-9][0-9][^0-9]%', @Input)); --First occurrence of 4 digits followed by a non-digit.
        SET @Input = STUFF(@Input, 1, (@EndPos - 1), REPLICATE('*', (@EndPos - 1))); --@EndPos - 1 is the number of characters to replace.
    END
END
SET @OutputString += @Input; --Append the last element
SET @OutputString = LEFT(@OutputString, LEN(@OutputString)) --LEN ignores trailing spaces, so this drops the control character added above
SELECT @OutputString;
Which outputs the following:
Example Input: ******7890, 1234, *2345, **3456, ***4567, 123asd456
This entire code could also be made as a function since it only requires an input text.
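As a rough illustration of that last remark, the loop could be wrapped in a scalar function along these lines. This is a sketch, not a tested implementation: the name dbo.RedactLongNumbers is hypothetical, the 500-character limit is carried over from the example, and the working variables are declared one character wider so the trailing sentinel is never truncated.

```sql
-- Hypothetical function name; same algorithm as the loop above.
CREATE FUNCTION dbo.RedactLongNumbers (@String varchar(500))
RETURNS varchar(500)
AS
BEGIN
    DECLARE @StartPos int, @EndPos int;
    DECLARE @Input varchar(501), @Output varchar(501);

    SET @StartPos = 1;
    SET @Input = ISNULL(@String, '') + ' ';   -- trailing sentinel so every digit run ends in a non-digit
    SET @Output = '';

    WHILE @StartPos <> 0
    BEGIN
        SET @StartPos = PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @Input);
        IF @StartPos <> 0
        BEGIN
            -- Move the text before the digit run to the output
            SET @Output = @Output + SUBSTRING(@Input, 1, @StartPos - 1);
            SET @Input = SUBSTRING(@Input, @StartPos, 501);
            -- Position of the last 4 digits of the run
            SET @EndPos = PATINDEX('%[0-9][0-9][0-9][0-9][^0-9]%', @Input);
            SET @Input = STUFF(@Input, 1, @EndPos - 1, REPLICATE('*', @EndPos - 1));
        END
    END

    SET @Output = @Output + @Input;
    -- Drop only the sentinel; DATALENGTH counts trailing spaces, LEN does not
    RETURN LEFT(@Output, DATALENGTH(@Output) - 1);
END
GO

SELECT dbo.RedactLongNumbers('Some text with 12345 and 1234 and 12345678') AS Redacted;
```

With the sample sentence from the question this should return Some text with *2345 and 1234 and ****5678. Using DATALENGTH to remove the sentinel assumes a single-byte varchar collation.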
A dirty solution with a recursive CTE:
DECLARE @tags nvarchar(max) = N'Some text with 12345 and 1234 and 12345678';
WITH Process (s, i)
AS
(
    SELECT @tags, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @tags)
    UNION ALL
    SELECT value, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', value)
    FROM
    (SELECT SUBSTRING(s,0,i)+'*'+SUBSTRING(s,i+1,len(s)) value
     FROM Process
     WHERE i > 0) calc
    -- each step replaces the first digit of the first run of 5+ digits with *,
    -- so a run shrinks until only its last 4 digits remain
)
SELECT * FROM Process
WHERE i=0
I think a better solution is to add a CLR function to SQL Server to handle regular expressions:
sql-clr/RegEx
Here is an option using DelimitedSplit8K_LEAD, which can be found here: https://www.sqlservercentral.com/articles/reaping-the-benefits-of-the-window-functions-in-t-sql-2
It is an extension of Jeff Moden's splitter that is even a little faster than the original. The big advantage this splitter has over most others is that it returns the ordinal position of each element. One caveat: I am splitting on a space, based on your sample data. Numbers embedded in the middle of other characters will be ignored. That may be good or bad depending on your specific requirements.
declare @Something varchar(100) = 'Some text with 12345 and 1234 and 12345678';
with MyCTE as
(
    select x.ItemNumber
        , Result = isnull(case when TRY_CONVERT(bigint, x.Item) is not null then isnull(replicate('*', len(convert(varchar(20), TRY_CONVERT(bigint, x.Item))) - 4), '') + right(convert(varchar(20), TRY_CONVERT(bigint, x.Item)), 4) end, x.Item)
    from dbo.DelimitedSplit8K_LEAD(@Something, ' ') x
)
select Output = stuff((select ' ' + Result
    from MyCTE
    order by ItemNumber
    FOR XML PATH('')), 1, 1, '')
This produces: Some text with *2345 and 1234 and ****5678

Remove all characters not like desired value

I'm using SQL Server 2012 and I have an issue with certain values. I want to extract a specific value from a string (this applies to the entire column) and return only that value.
One value is SS44\\230433\586, another is 230084android, and a third is orderno 239578.
The common denominator is that all the numbers start with 23 and are 6 characters long. Everything else has to be removed from the string. I tried RTRIM and LTRIM, but that didn't give me the desired output.
I'm not sure how to do this without regex.
You can use PATINDEX to find the start of the number and SUBSTRING to get the next 6 characters:
declare @Value varchar(50) = 'SS44\\230433\586'
select substring(@Value, patindex('%23%', @Value), 6)
If you want to be more careful with the search, you can make PATINDEX check that the next 4 characters are digits:
patindex('%23[0-9][0-9][0-9][0-9]%', @Value)
Finally, you can store the returned position and check whether there was a match:
declare @Value varchar(50) = 'SS44\\230433\586'
declare @StartIndex int
set @StartIndex = patindex('%23[0-9][0-9][0-9][0-9]%', @Value)
select IIF(@StartIndex > 0, substring(@Value, @StartIndex, 6), null)
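Since the question is about a whole column, here is a small sketch of the same PATINDEX check applied per row. The inline VALUES list stands in for the real table, and the aliases v, p, Value and OrderNo are made up for the illustration; the last row is an extra non-matching value added to show the NULL case.

```sql
SELECT v.Value,
       -- NULL when no 6-digit number starting with 23 is found
       CASE WHEN p.StartIndex > 0
            THEN SUBSTRING(v.Value, p.StartIndex, 6)
       END AS OrderNo
FROM (VALUES ('SS44\\230433\586'),
             ('230084android'),
             ('orderno 239578'),
             ('no match 12')) AS v (Value)
CROSS APPLY (SELECT PATINDEX('%23[0-9][0-9][0-9][0-9]%', v.Value) AS StartIndex) AS p;
```

This should yield 230433, 230084, 239578 and NULL for the four rows. Note that the pattern also matches a 23 that sits inside a longer run of digits, so it is only as strict as the data allows.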

SQL Server query to remove the last word from a string

There's already an answer for this question in SO with a MySQL tag. So I just decided to make your lives easier and put the answer below for SQL Server users. Always happy to see different answers perhaps with a better performance.
Happy coding!
SELECT SUBSTRING(@YourString, 1, LEN(@YourString) - CHARINDEX(' ', REVERSE(@YourString)))
Edit: Make sure @YourString is trimmed first, as Alex M has pointed out:
SET @YourString = LTRIM(RTRIM(@YourString))
Just an addition to answers.
The doc for LEN function in MSSQL:
LEN excludes trailing blanks. If that is a problem, consider using the DATALENGTH (Transact-SQL) function which does not trim the string. If processing a unicode string, DATALENGTH will return twice the number of characters.
The problem with the answers here is that trailing spaces are not accounted for.
SELECT SUBSTRING(@YourString, 1, LEN(@YourString) - CHARINDEX(' ', REVERSE(@YourString)))
For example, here are a few inputs for which the accepted answer (above, for reference) returns the wrong result:
INPUT -> RESULT
'abcd ' -> 'abc' --last character removed
'abcd 123 ' -> 'abcd 12' --only the last character removed
To account for these cases, the string needs to be trimmed (this removes the last word from a phrase of 2 or more words):
SELECT SUBSTRING(RTRIM(@YourString), 1, LEN(@YourString) - CHARINDEX(' ', REVERSE(RTRIM(LTRIM(@YourString)))))
The reversed string is trimmed on both sides to account for leading as well as trailing spaces.
Or alternatively, just trim the input itself.
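A short sketch of both points, using a made-up sample value: first the difference between LEN and DATALENGTH on a string with trailing spaces, then the "trim the input itself" variant.

```sql
DECLARE @YourString varchar(100);
SET @YourString = '  abcd 123  ';   -- two leading and two trailing spaces

-- LEN ignores the trailing spaces, DATALENGTH counts every byte
SELECT LEN(@YourString) AS LenResult,                -- 10
       DATALENGTH(@YourString) AS DataLengthResult;  -- 12

-- Trim first, then cut at the last space
SET @YourString = LTRIM(RTRIM(@YourString));
SELECT LEFT(@YourString,
            LEN(@YourString) - CHARINDEX(' ', REVERSE(@YourString))) AS WithoutLastWord;  -- abcd
```

When the trimmed string contains no space at all, CHARINDEX returns 0 and the whole string comes back unchanged.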
DECLARE @Sentence VARCHAR(MAX) = 'Hi This is Pavan Kumar'
SELECT SUBSTRING(@Sentence, 1, CHARINDEX(' ', @Sentence) - 1) AS [First Word],
REVERSE(SUBSTRING(REVERSE(@Sentence), 1,
CHARINDEX(' ', REVERSE(@Sentence)) - 1)) AS [Last Word]
DECLARE @String VARCHAR(MAX) = 'One two three four'
SELECT LEFT(@String,LEN(@String)-CHARINDEX(' ', REVERSE(@String),0)+1)
All the answers so far are actually about removing a character, not a word as the OP wanted.
In my case I was building a dynamic SQL statement with UNION'd SELECT statements and wanted to remove the last UNION:
DECLARE @sql NVARCHAR(MAX) = ''
/* populate @sql with something like this:
SELECT 1 FROM dbo.T1 WHERE condition
UNION
SELECT 1 FROM dbo.T2 WHERE condition
UNION
SELECT 1 FROM dbo.T3 WHERE condition
UNION
SELECT 1 FROM dbo.T4 WHERE condition
UNION
*/
-- remove the last UNION
SET @sql = SUBSTRING(@sql, 1, LEN(@sql) - PATINDEX(REVERSE('%UNION%'), REVERSE(@sql)) - LEN('UNION'))
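The position arithmetic in that statement is easy to get off by one, so here is a minimal sketch that spells it out on a tiny sample. The variable names @word, @pos and @len are illustrative; DATALENGTH / 2 is used instead of LEN so trailing spaces in an nvarchar value do not skew the count.

```sql
DECLARE @sql nvarchar(max), @word nvarchar(20), @pos bigint, @len bigint;

SET @sql = N'SELECT 1 FROM dbo.T1' + NCHAR(10) + N'UNION' + NCHAR(10)
         + N'SELECT 1 FROM dbo.T2' + NCHAR(10) + N'UNION' + NCHAR(10);
SET @word = N'UNION';

SET @len = DATALENGTH(@sql) / 2;                       -- character count, trailing spaces included
SET @pos = CHARINDEX(REVERSE(@word), REVERSE(@sql));   -- offset of the last occurrence, counted from the end

-- Keep everything before the last occurrence; leave the string alone if the word is absent
SELECT CASE WHEN @pos = 0 THEN @sql
            ELSE LEFT(@sql, @len - @pos - LEN(@word) + 1)
       END AS WithoutLastUnion;
```

Everything from the last UNION onwards is dropped, including any text that follows it, so the result here ends with the line break after dbo.T2.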
SELECT LEFT(username, LEN(username) - CHARINDEX('/', REVERSE(username)) + 1)
FROM Login_tbl
UPDATE Login_tbl
SET username = LEFT(username, LEN(username) - CHARINDEX('/', REVERSE(username)) + 1)

Substring with conditional statement

tl;dr
I don't understand how to conditionally change the length parameter of SUBSTRING(..)
Short enough, did read
I've got a text field in a SQL table that I want to retrieve a substring from.
There is a specific part of the text I'm having trouble extracting, because I cannot guarantee what text follows it.
For example, I have:
... Tracking Code : /a/delimited/string AttributeW : ValueW ...
And
... Tracking Code : /a/different/delimited/string A random string ...
From those I want /a/delimited/string and /a/different/delimited/string, respectively.
My current SQL looks something like:
DECLARE @TrackingStartStr VARCHAR(50), @TrackingEndStr VARCHAR(50)
SET @TrackingStartStr = 'Tracking Code :'
SET @TrackingEndStr = 'Some string that indicates the text is about to end'
SELECT
    AField
    ,RTRIM(LTRIM(Substring(CAST([Body] AS VARCHAR(MAX))
        ,Charindex(@TrackingStartStr,CAST([Body] AS VARCHAR(MAX))) + LEN(@TrackingStartStr)
        ,charindex(@TrackingEndStr,CAST([Body] AS VARCHAR(MAX))) - (Charindex(@TrackingStartStr,CAST([Body] AS VARCHAR(MAX))) + LEN(@TrackingStartStr))
    ))) AS TrackingCode
FROM tbl_stupidTextTable
I don't know how to conditionally change what @TrackingEndStr is for each row.
Try:
select substring(stringfield,
charindex('/', stringfield, 1),
charindex(' ',
stringfield,
charindex('/', stringfield, 1)) -
charindex('/', stringfield, 1)) as val
from tbl
SQL Fiddle demo: http://sqlfiddle.com/#!6/cebab/16/0
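The query above looks for the first / anywhere in the field. If the body can contain a slash before the tracking code, one option is to start the search after the Tracking Code : marker. The sketch below assumes the marker is always present and that the code itself contains no spaces; @Body, @Marker, @Start and @End are illustrative names.

```sql
DECLARE @Body varchar(max), @Marker varchar(50), @Start int, @End int;

SET @Body = '... Tracking Code : /a/different/delimited/string A random string ...';
SET @Marker = 'Tracking Code :';

-- First slash at or after the end of the marker
SET @Start = CHARINDEX('/', @Body, CHARINDEX(@Marker, @Body) + LEN(@Marker));

-- First space after that slash; the appended space covers a code at the very end
SET @End = CHARINDEX(' ', @Body + ' ', @Start);

SELECT SUBSTRING(@Body, @Start, @End - @Start) AS TrackingCode;
```

For the sample body this should return /a/different/delimited/string, with no per-row end string needed.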

Search and Replace Serialized DB Dump

I am moving a database from one server to an other and have lots of serialized data in there. So, I am wondering:
Is it possible to use regex to replace all occurrences like the following (and similar)
s:22:\"http://somedomain.com/\"
s:26:\"http://somedomain.com/abc/\"
s:29:\"http://somedomain.com/abcdef/\"
to
s:27:\"http://someOtherdomain.com/\"
s:31:\"http://someOtherdomain.com/abc/\"
s:34:\"http://someOtherdomain.com/abcdef/\"
If the values in that column all have the same layout, with the length prefix (22, 26, 29, ...) at the same position from the beginning of the string, then in SQL Server you can use REPLACE and SUBSTRING with CHARINDEX to do this:
DECLARE @s VARCHAR(50);
DECLARE @sub INT;
SET @s = 's:22:\"http://somedomain.com/\"';
SET @sub = CONVERT(INT, SUBSTRING(@s, CHARINDEX(':', @s) + 1, 2));
SELECT REPLACE(REPLACE(@s, 'somedomain', 'someOtherdomain'), @sub, @sub + 5);
So s:number:\"http://somedomain.com/\" will become s:number + 5:\"http://someOtherdomain.com/\".
If you want to run an UPDATE against that table you can write it this way:
UPDATE #t
SET s = REPLACE(REPLACE(s, 'somedomain', 'someOtherdomain'),
CONVERT(INT, SUBSTRING(s, CHARINDEX(':', s) + 1, 2)),
CONVERT(INT, SUBSTRING(s, CHARINDEX(':', s) + 1, 2)) + 5);
This query searches for occurrences of somedomain and replaces them with someOtherdomain, then takes the number between the first two colons, converts it to INT, and replaces it with the same number + 5. After you run the query above, your data should look like this:
s:27:\"http://someOtherdomain.com/\"
s:31:\"http://someOtherdomain.com/abc/\"
s:34:\"http://someOtherdomain.com/abcdef/\"
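One thing to watch with the nested REPLACE is that the number is replaced wherever it appears in the string, not only in the length prefix, and that the prefix is assumed to be exactly two digits. A possible alternative, sketched here under the assumption that each value holds a single serialized string with one occurrence of the domain, is to rewrite the prefix by position with STUFF and derive the offset from the two domain names. The variable names are illustrative.

```sql
DECLARE @s varchar(200), @old varchar(50), @new varchar(50);
DECLARE @p1 int, @p2 int, @len int;

SET @s = 's:26:\"http://somedomain.com/abc/\"';
SET @old = 'somedomain';
SET @new = 'someOtherdomain';

SET @p1 = CHARINDEX(':', @s);              -- first colon
SET @p2 = CHARINDEX(':', @s, @p1 + 1);     -- second colon
SET @len = CONVERT(int, SUBSTRING(@s, @p1 + 1, @p2 - @p1 - 1));  -- declared length

-- Rewrite the length prefix first (positions are still valid), then the domain;
-- values that do not contain the old domain are returned unchanged
SELECT CASE WHEN CHARINDEX(@old, @s) = 0 THEN @s
            ELSE REPLACE(
                     STUFF(@s, @p1 + 1, @p2 - @p1 - 1,
                           CONVERT(varchar(10), @len + LEN(@new) - LEN(@old))),
                     @old, @new)
       END AS Rewritten;
```

For the sample value this should return s:31:\"http://someOtherdomain.com/abc/\".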