Trim String After Keyword - sql

I have a column that contains status changes, but I don't want to return the whole string. Is there any way to return just a part of a string after a certain keyword? Every value of the column is in the format of From X to Y where X and Y could be a single word or multiple words. I've looked at the substring and trim functions, but those seem to require knowledge of how many spaces you want to keep.
Edit: I want to keep part Y from the status and get rid of 'From X to'.

You can use a combination of Charindex and Substring and Len to do it.
Try this:
select SUBSTRING(field,charindex('keyword',field), LEN('keyword'))
For example, this finds flop and extracts it wherever it appears in the field:
select SUBSTRING('bullflop',charindex('flop','bullflop'), LEN('flop'))
EDIT:
To get the remainder of the string, pass LEN(field) as the length:
declare @field varchar(200)
set @field = 'this is bullflop and other such junk'
select SUBSTRING(@field,charindex('flop',@field), LEN(@field) )
EDIT 2:
Now I understand, here is a quick and dirty version...
declare @field varchar(200)
set @field = 'From X to Y'
select Replace(SUBSTRING(@field,charindex('to ',@field), LEN(@field) ), 'to ','')
Returns:
Y
EDIT 3:
Cory is right, this is cleaner.
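declare @field varchar(200) = 'From X to Y'
declare @keyword varchar(200) = 'to '
-- DATALENGTH counts the keyword's trailing space; LEN would not, which leaves a leading space in the result
select SUBSTRING(@field,charindex(@keyword,@field) + DATALENGTH(@keyword), LEN(@field) )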
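One caveat when the keyword ends with a space, as 'to ' does here: LEN does not count trailing spaces, while DATALENGTH returns the number of bytes, which equals the number of characters for a varchar value. A minimal check using only built-in functions:

```sql
SELECT LEN('to ')        AS LenResult,         -- 2: the trailing space is not counted
       DATALENGTH('to ') AS DataLengthResult;  -- 3: bytes in the varchar literal
```

With LEN as the offset, the extracted value would start at the space in front of Y. Wrapping the result in LTRIM, or using DATALENGTH for a varchar keyword, avoids that.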

Other answers are fine, but I like the STUFF() function and it doesn't seem to be well-known, so here's another option:
DECLARE @field VARCHAR(50) = 'From Authorized to Auth Not Needed'
,@keyword VARCHAR(50) = ' to '
SELECT STUFF(@field,1,CHARINDEX(@keyword,@field)+LEN(@keyword),'')
STUFF() is like SUBSTRING() and REPLACE() combined: you feed it a string, a start position, and a length, and it replaces that span with anything you like or, in your case, with nothing ('').
From MSDN:
STUFF ( character_expression , start , length , replaceWith_expression )
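As a small illustration of that signature (the literals are made up for the example), STUFF can either swap a span for new text or delete it outright:

```sql
-- Replace 3 characters starting at position 2 with 'XY'
SELECT STUFF('abcdef', 2, 3, 'XY') AS Replaced;  -- aXYef

-- Delete the first 3 characters by replacing them with an empty string
SELECT STUFF('abcdef', 1, 3, '')   AS Deleted;   -- def
```

The second form is the one used above: everything up to and including the keyword is replaced with ''.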

You can combine a few string functions to do what you want:
DECLARE @Field varchar(100) = 'From A to Z'
DECLARE @Keyword varchar(100) = 'to'
-- Method 1 (Find the keyword, then take the remainder of the string)
SELECT LTRIM(SUBSTRING(@Field,
CHARINDEX(@Keyword, @Field, 0) + LEN(@Keyword), LEN(@Field)))
EDIT:
-- Method 2 (Take from the right the characters up to the keyword)
SELECT RIGHT(@Field, LEN(@Field) - CHARINDEX(@Keyword, @Field, 0) - LEN(@Keyword))
Produces:
'Z'
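The answers above all work on a single variable. Applied to a column, it may help to handle rows where the keyword is missing, since CHARINDEX returns 0 in that case. The following is only a sketch: the inline VALUES list stands in for the real table, and the names StatusText and NewStatus are hypothetical.

```sql
SELECT s.StatusText,
       CASE WHEN p.Pos > 0
            -- 4 is the length of ' to ', so extraction starts right after it
            THEN SUBSTRING(s.StatusText, p.Pos + 4, LEN(s.StatusText))
            ELSE s.StatusText               -- keyword not found: return the value unchanged
       END AS NewStatus
FROM (VALUES ('From Authorized to Auth Not Needed'),
             ('From A to Z'),
             ('No change')) AS s(StatusText)
CROSS APPLY (SELECT CHARINDEX(' to ', s.StatusText) AS Pos) AS p;
-- NewStatus: 'Auth Not Needed', 'Z', 'No change'
```

Searching for ' to ' with surrounding spaces avoids matching "to" inside a word such as "Auto", but if X itself contains the phrase " to ", the first occurrence still wins.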

Related

How to identify and redact all instances of a matching pattern in T-SQL

I have a requirement to run a function over certain fields to identify and redact any numbers which are 5 digits or longer, ensuring all but the last 4 digits are replaced with *
For example: "Some text with 12345 and 1234 and 12345678" would become "Some text with *2345 and 1234 and ****5678"
I've used PATINDEX to identify the starting character of the pattern:
PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', TEST_TEXT)
I can recursively call that to get the starting character of all the occurrences, but I'm struggling with the actual redaction.
Does anyone have any pointers on how this can be done? I know to use REPLACE to insert the *s where they need to be, it's just the identification of what I should actually be replacing I'm struggling with.
I could do it in application code, but I need it to be T-SQL (it can be a function if needed).
Any tips greatly appreciated!
You can do this with SQL Server's built-in functions. Everything used in this example is available in SQL Server 2008 and later.
DECLARE @String VARCHAR(500) = 'Example Input: 1234567890, 1234, 12345, 123456, 1234567, 123asd456'
DECLARE @StartPos INT = 1, @EndPos INT = 1;
DECLARE @Input VARCHAR(500) = ISNULL(@String, '') + ' '; --Sets the input and appends a trailing space as a sentinel, so a number at the very end is still followed by a non-digit.
DECLARE @OutputString VARCHAR(500) = ''; --Initialize to an empty string so concatenation never yields NULL
WHILE (@StartPos <> 0)
BEGIN
SET @StartPos = PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @Input);
IF @StartPos <> 0
BEGIN
SET @OutputString += SUBSTRING(@Input, 1, @StartPos - 1); --Separate all contents before the first occurrence of our filter
SET @Input = SUBSTRING(@Input, @StartPos, 500); --Cut the string from the match to the end. The length must be at least the original string length so that everything is kept.
SET @EndPos = (PATINDEX('%[0-9][0-9][0-9][0-9][^0-9]%', @Input)); --First occurrence of 4 digits followed by a non-digit.
SET @Input = STUFF(@Input, 1, (@EndPos - 1), REPLICATE('*', (@EndPos - 1))); --@EndPos - 1 gives us the number of chars we want to replace.
END
END
SET @OutputString += @Input; --Append the last element
SET @OutputString = LEFT(@OutputString, LEN(@OutputString)) --LEN ignores trailing spaces, so this drops the sentinel space
SELECT @OutputString;
Which outputs the following:
Example Input: ******7890, 1234, *2345, **3456, ***4567, 123asd456
This entire code could also be made as a function since it only requires an input text.
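A sketch of what that function might look like. The name dbo.RedactLongNumbers and the 4000-character limit are assumptions for illustration, and a '|' sentinel is used here instead of a trailing space so that it can be removed without relying on how LEN treats trailing spaces.

```sql
-- Hypothetical wrapper around the loop shown above
CREATE FUNCTION dbo.RedactLongNumbers (@String VARCHAR(4000))
RETURNS VARCHAR(4000)
AS
BEGIN
    -- Non-digit sentinel so a number at the very end still has a terminator
    DECLARE @Input  VARCHAR(4001) = ISNULL(@String, '') + '|';
    DECLARE @Output VARCHAR(4001) = '';
    DECLARE @StartPos INT = PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @Input);
    DECLARE @EndPos INT;

    WHILE @StartPos > 0
    BEGIN
        -- Copy everything before the run of digits unchanged
        SET @Output += LEFT(@Input, @StartPos - 1);
        SET @Input = SUBSTRING(@Input, @StartPos, 4001);
        -- Position of the last four digits of this run
        SET @EndPos = PATINDEX('%[0-9][0-9][0-9][0-9][^0-9]%', @Input);
        -- Mask every digit before those last four
        SET @Input = STUFF(@Input, 1, @EndPos - 1, REPLICATE('*', @EndPos - 1));
        SET @StartPos = PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @Input);
    END;

    SET @Output += @Input;
    RETURN LEFT(@Output, LEN(@Output) - 1);  -- drop the sentinel
END;
```

Usage, assuming the function was created as above: SELECT dbo.RedactLongNumbers('Some text with 12345 and 1234 and 12345678') should return Some text with *2345 and 1234 and ****5678.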
A dirty solution with a recursive CTE:
DECLARE
@tags nvarchar(max) = N'Some text with 12345 and 1234 and 12345678';
WITH Process (s, i)
as
(
SELECT @tags, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @tags)
UNION ALL
SELECT value, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', value)
FROM
-- each step replaces the first digit of the leftmost run of 5+ digits with '*'
(SELECT SUBSTRING(s,0,i)+'*'+SUBSTRING(s,i+1,len(s)) value
FROM Process
WHERE i >0) calc
)
SELECT * FROM Process
WHERE i=0
I think a better solution is to add a CLR function to MS SQL Server to handle regular expressions (see sql-clr/RegEx).
Here is an option using DelimitedSplit8K_LEAD, which can be found here: https://www.sqlservercentral.com/articles/reaping-the-benefits-of-the-window-functions-in-t-sql-2
This is an extension of Jeff Moden's splitter that is even a little faster than the original. The big advantage this splitter has over most others is that it returns the ordinal position of each element. One caveat: I am splitting on a space, based on your sample data. If you have numbers crammed into the middle of other characters, this will ignore them. That may be good or bad depending on your specific requirements.
declare @Something varchar(100) = 'Some text with 12345 and 1234 and 12345678';
with MyCTE as
(
select x.ItemNumber
, Result = isnull(case when TRY_CONVERT(bigint, x.Item) is not null then isnull(replicate('*', len(convert(varchar(20), TRY_CONVERT(bigint, x.Item))) - 4), '') + right(convert(varchar(20), TRY_CONVERT(bigint, x.Item)), 4) end, x.Item)
from dbo.DelimitedSplit8K_LEAD(@Something, ' ') x
)
select [Output] = stuff((select ' ' + Result
from MyCTE
order by ItemNumber
FOR XML PATH('')), 1, 1, '')
This produces: Some text with *2345 and 1234 and ****5678

SSMS replace all commas outside of quotation marks in string

I've written the following function in SSMS to replace any commas that are outside of quotation marks with ||||:
CREATE FUNCTION dbo.fixqualifier (@string nvarchar(max))
returns nvarchar(max)
as begin
DECLARE @STRINGTOPAD NVARCHAR(MAX)
DECLARE @position int = 1,@newstring nvarchar(max) ='',@QUOTATIONMODE INT = 0
WHILE(LEN(@string)>0)
BEGIN
SET @STRINGTOPAD = SUBSTRING(@string,0,IIF(@STRING LIKE '%"%',CHARINDEX('"',@string),LEN(@STRING)))
SET @newstring = @newstring + IIF(@QUOTATIONMODE = 1, REPLACE(@STRINGTOPAD,',','||||'),@STRINGTOPAD)
SET @QUOTATIONMODE = IIF(@QUOTATIONMODE = 1,0,1)
set @string = SUBSTRING(@string,1+IIF(@STRING LIKE '%"%',CHARINDEX('"',@string),LEN(@STRING)),LEN(@string))
END
return @newstring
end
The idea is for the function to find the first ", replace all ',' before that then switch to quotation mode 1 so it knows to not replace the , until it changes back to quotation mode 0 when it hits the 2nd " and so on.
so for example the string:
qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl
would become:
qwer||||tyu||||io||||asd||||"edffs,asdfgh"||||"jjkzx"||||kl
It works as expected, but it's really inefficient when run over several thousand rows.
Is there a better way of doing this, or at least a way to speed the function up?
A simple trick using modulus:
DECLARE @VAR VARCHAR(100) = 'qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl'
,@OUTPUT VARCHAR(100) = '';
SELECT @OUTPUT = @OUTPUT + CASE WHEN (LEN(@OUTPUT) - LEN(REPLACE(@OUTPUT, '"', ''))) % 2 = 0
THEN REPLACE(VAL, ',', '||||') ELSE VAL END
FROM (
SELECT SUBSTRING(@VAR, NUMBER, 1) VAL
FROM master.dbo.spt_values
WHERE type = 'P'
AND NUMBER BETWEEN 1 AND LEN(@VAR)
) A
PRINT @OUTPUT
Result:
qwer||||tyu||||io||||asd||||"edffs,asdfgh"||||"jjkzx"||||kl
The expression LEN(@OUTPUT) - LEN(REPLACE(@OUTPUT, '"', '')) gives the number of " characters seen so far. Take that count modulo 2: if the result is zero, the count is even, so the current character is outside quotes and a comma can be replaced; otherwise the character is kept as it is.
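To see the parity idea in isolation, here is a minimal sketch that evaluates the same expression on a few hand-picked prefixes of the sample string (the column names are made up for the example):

```sql
SELECT p.Prefix,
       LEN(p.Prefix) - LEN(REPLACE(p.Prefix, '"', ''))       AS QuoteCount,
       (LEN(p.Prefix) - LEN(REPLACE(p.Prefix, '"', ''))) % 2 AS InsideQuotes
FROM (VALUES ('qwer,tyu'),            -- 0 quotes: outside, InsideQuotes = 0
             ('asd,"edffs'),          -- 1 quote:  inside,  InsideQuotes = 1
             ('asd,"edffs,asdfgh"')   -- 2 quotes: outside, InsideQuotes = 0
     ) AS p(Prefix);
```

Because LEN ignores trailing spaces, the count can be off for a prefix in which only spaces surround a quote (for example ' "'). Using DATALENGTH on varchar data is one way around that if such input is possible.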
This uses DelimitedSplit8K and completely avoids RBAR methods (such as a WHILE loop or @Variable = @Variable + ..., which is a hidden form of RBAR).
It first splits on the quotation marks, and then on the commas wherever the string isn't quoted. Finally, it puts the strings back together using the "old" STUFF and FOR XML PATH method:
USE Sandbox;
DECLARE @String varchar(8000) = 'qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl';
WITH Splits AS(
SELECT QS.ItemNumber AS QuoteNumber, CS.ItemNumber AS CommaNumber, ISNULL(CS.Item, '"' + QS.Item + '"') AS DelimitedItem
FROM dbo.DelimitedSplit8K(@String,'"') QS
OUTER APPLY (SELECT *
FROM dbo.DelimitedSplit8K(QS.Item,',')
WHERE QS.ItemNumber % 2 = 1) CS
WHERE QS.Item <> ',')
-- the third STUFF argument is 4 so that the whole leading '||||' is removed
SELECT STUFF((SELECT '||||' + S.DelimitedItem
FROM Splits S
ORDER BY S.QuoteNumber, S.CommaNumber
FOR XML PATH('')),1,4,'') AS DelimitedList;
(Note, DelimitedSplit8K does not accept more than 8,000 characters. If you have more than that, SQL Server is really not the right tool. STRING_SPLIT does not provide the ordinal position, so you would be unable to guarantee the rebuild order with it.)
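On SQL Server 2022 and Azure SQL Database, STRING_SPLIT accepts an optional third argument, enable_ordinal, which adds an ordinal column. Assuming one of those versions is available, the same split-on-quotes idea can be sketched without a custom splitter. This is an alternative illustration, not the method from the answer above.

```sql
DECLARE @String varchar(8000) = 'qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl';

SELECT STRING_AGG(
           CASE WHEN q.ordinal % 2 = 1
                THEN REPLACE(q.value, ',', '||||')  -- odd pieces lie outside quotes
                ELSE q.value                        -- even pieces were quoted: keep commas
           END, '"') WITHIN GROUP (ORDER BY q.ordinal) AS DelimitedList
FROM STRING_SPLIT(@String, '"', 1) AS q;
-- qwer||||tyu||||io||||asd||||"edffs,asdfgh"||||"jjkzx"||||kl
```

Splitting on the quote character makes every odd-numbered piece unquoted text and every even-numbered piece quoted text, so rejoining the pieces with '"' restores the original quotes.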

T-SQL How to create function that compares string, checks difference, and do special function

First, sorry for my English. Second, I'm learning T-SQL.
Goal:
I want to get the difference between two strings, then check which column the difference is in. If the difference is in the first column, do one thing; if it is in the second column, do something else.
What I'm actually doing
The 'messages' column is a string containing a list of IDs. I replace every '#' with ', ' and delete the last ',', which gives me the ActualID and BeforeID columns. See below:
DECLARE @string VARCHAR(512);
DECLARE @string2 VARCHAR(512);
DECLARE @string3 VARCHAR(512);
SET @string = '41#42#43#44#45#46#47#48#49#50#51#52#53#54#55#56#57#58#59#';
SET @string2 = REPLACE((SELECT messages FROM USERS WHERE userid = 4), '#', ', ' )
SET @string3 = left(@string2, len(@string2) - 1);
SET @string2 = REPLACE(@string, '#', ', ' )
SET @string = left(@string2, len(@string2) - 1);
SELECT @string3 as ActualID, @string as BeforeID
So now I want to compare BeforeID with ActualID. For example:
In BeforeID we have 1, 2, 3 / In ActualID 1, 2, 3, 4
In the example above, 4 was added. So, if an element was added, I want to add it to @AddedElements.
If 4, 5, 7 were added, then SELECT @AddedElements as AddedElements should return 4, 5, 7 (with commas).
But that's not all.
If BeforeID = 1, 5, 10, 14 and ActualID = 1, 5, 14, I want the element that is in BeforeID but not in ActualID to be added to @DeletedElements.
So SELECT @DeletedElements as DeletedElements should return 10.
Added and deleted elements should be returned in one result. I mean, the full result I want to get is
SELECT @AddedElements as AddedElements, @DeletedElements as DeletedElements
Is it possible? If so, how do I do it?
First of all, I have to start by saying that this is just poor design; but having said that, I've also found myself in all kinds of situations where I couldn't change the way things worked, only try to make them work better in the current configuration. Therefore, I recommend something like this:
1: Create a UDF (User-Defined Function) that can handle splitting the strings and returning them in table-formed data that you can work with:
CREATE FUNCTION [dbo].[UDF_StringDelimiter]
/*********************************************************
** Takes Parameter "LIST" and transforms it for use **
** to select individual values or ranges of values. **
** **
** EX: 'This,is,a,test' = 'This' 'Is' 'A' 'Test' **
*********************************************************/
(
@LIST VARCHAR(8000)
,@DELIMITER VARCHAR(255)
)
RETURNS @TABLE TABLE
(
[RowID] INT IDENTITY
,[Value] VARCHAR(255)
)
WITH SCHEMABINDING
AS
BEGIN
DECLARE
@LISTLENGTH AS SMALLINT
,@LISTCURSOR AS SMALLINT
,@VALUE AS VARCHAR(255)
;
SELECT
@LISTLENGTH = LEN(@LIST) - LEN(REPLACE(@LIST,@DELIMITER,'')) + 1
,@LISTCURSOR = 1
,@VALUE = ''
;
WHILE @LISTCURSOR <= @LISTLENGTH
BEGIN
INSERT INTO @TABLE (Value)
SELECT
CASE
WHEN @LISTCURSOR < @LISTLENGTH
THEN SUBSTRING(@LIST,1,PATINDEX('%' + @DELIMITER + '%',@LIST) - 1)
ELSE SUBSTRING(@LIST,1,LEN(@LIST))
END
;
SET @LIST = STUFF(@LIST,1,PATINDEX('%' + @DELIMITER + '%',@LIST),'')
;
SET @LISTCURSOR = @LISTCURSOR + 1
;
END
;
RETURN
;
END
;
2: Consider dropping the whole "Switching out commas" thing, because it's pointless - the function I've written here takes two arguments: The string itself, and the delimiter (the mini-string that separates the individual strings within the big string, in your case '#') Then you just have to do a couple of quick comparisons to find out what was added and what was deleted.
DECLARE
@AddedElements VARCHAR(255) = ''
,@DeletedElements VARCHAR(255) = ''
,@ActualID VARCHAR(255) = '41#42#43#44#45#46#47#48#49#50#51#52#53#54#55#56#57#58#59#'
,@BeforeID VARCHAR(255) = '41#42#43#44#45#46#47#48#50#51#52#53#54#55#56#57#58#59#60#'
;
SET @AddedElements = @AddedElements +
SUBSTRING(
(
SELECT ', ' + Value
FROM dbo.UDF_StringDelimiter(@ActualID,'#')
WHERE Value NOT IN
(
SELECT Value
FROM dbo.UDF_StringDelimiter(@BeforeID,'#')
)
GROUP BY ', ' + Value
FOR XML PATH('')
)
,3,255)
;
SET @DeletedElements = @DeletedElements +
SUBSTRING(
(
SELECT ', ' + Value
FROM dbo.UDF_StringDelimiter(@BeforeID,'#')
WHERE Value NOT IN
(
SELECT Value
FROM dbo.UDF_StringDelimiter(@ActualID,'#')
)
GROUP BY ', ' + Value
FOR XML PATH('')
)
,3,255)
;
SELECT @AddedElements AS AddedElements,@DeletedElements AS DeletedElements
;
Using this method, if you add a value to @ActualID that does not exist in @BeforeID, it will show up in @AddedElements.
Likewise, if you remove an element from @ActualID that previously existed in @BeforeID, it will show up in @DeletedElements.
All of this assumes, of course, that the dynamic string (the one really being compared here) is @ActualID. I operated on the understanding that @BeforeID is a value stored in the DB, and @ActualID is a dynamic string being passed in from... somewhere. If this is wrong, let me know and I'll change the approach accordingly.
Quick note: It's important to me to point out that this is just one way of dealing with a situation like this, and I'm sure there are better ways; but with the information I have, it's the best I could come up with without spending too much time and energy on it.
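As one such alternative, on SQL Server 2017 or later the built-in STRING_SPLIT and STRING_AGG functions can stand in for the UDF. This is only a sketch with shortened sample lists, not a drop-in replacement for the code above:

```sql
DECLARE @ActualID VARCHAR(255) = '41#42#43#49#',
        @BeforeID VARCHAR(255) = '41#42#43#60#';

SELECT
    -- IDs present now but not before
    (SELECT STRING_AGG(d.value, ', ') WITHIN GROUP (ORDER BY d.value)
     FROM (SELECT value FROM STRING_SPLIT(@ActualID, '#') WHERE value <> ''
           EXCEPT
           SELECT value FROM STRING_SPLIT(@BeforeID, '#')) AS d) AS AddedElements,
    -- IDs present before but not now
    (SELECT STRING_AGG(d.value, ', ') WITHIN GROUP (ORDER BY d.value)
     FROM (SELECT value FROM STRING_SPLIT(@BeforeID, '#') WHERE value <> ''
           EXCEPT
           SELECT value FROM STRING_SPLIT(@ActualID, '#')) AS d) AS DeletedElements;
-- AddedElements = 49, DeletedElements = 60
```

The value <> '' filter drops the empty element produced by the trailing '#'. The values are ordered as text, and each column is NULL when there is no difference in that direction.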

Replace every alpha character with itself + wildcard in string SQL Server

My goal is to create a query that will search for results related to a specific keyword.
Say in a database we had the word cat.
Regardless of whether the user types C a t, C.A.T. or Cat, I want to find a result related to the search. As long as the alphanumeric characters are in the correct sequence, that is all that matters.
Say in the database we have these 4 records
cat
c/a/t
c.a.t
c. at
If the user types in C#$*(&A T I'd like to get all 4 results.
What I have written so far in my query is a function that strips any non-alphanumeric characters from the input string.
What can I do to replace each alphanumeric character with itself and add a wildcard at the end?
For every alpha character my input would look similar to this
C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%
Actually, that search string will return only one record from this table: the row with 'c.a.t '.
This is because the expression C%[^a-zA-Z0-9]%A does not mean there can't be any alpha-numeric chars between C and A.
What it actually means is there should be at least one non alpha-numeric value between C and A.
Moreover, it will return incorrect values as well - a value like 'c u a s e t ' will be returned.
You need to change your where clause to something like this:
WHERE column LIKE '%C%A%T%'
AND column NOT LIKE '%C%[a-zA-Z0-9]%A%[a-zA-Z0-9]%T%'
This way, if c, a, and t appear in the correct order, the first condition resolves to true, and if there are no other alphanumeric characters between c, a, and t, the second condition resolves to true.
Here is a test script, where you can see for yourself what I mean:
DECLARE @T AS TABLE
(
a varchar(20)
)
INSERT INTO @T VALUES
('cat'),
('c/a/t'),
('c.a.t '),
('c. at'),
('c u a s e t ')
-- Incorrect where clause
SELECT *
FROM @T
WHERE a LIKE 'C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%'
-- correct where clause
SELECT *
FROM @T
WHERE a LIKE '%C%A%T%'
AND a NOT LIKE '%C%[a-zA-Z0-9]%A%[a-zA-Z0-9]%T%'
And since I had some spare time, here is a script to create both the like and the not like patterns from the input string:
DECLARE @INPUT varchar(100) = '#*# c %^&# a ^&*$&* t (*&(%!##$'
DECLARE @Index int = 1,
@CurrentChar char(1),
@Like varchar(100),
@NotLike varchar(100) = '%'
-- <= so that the last character of the input is examined as well
WHILE @Index <= LEN(@Input)
BEGIN
SET @CurrentChar = SUBSTRING(@INPUT, @Index, 1)
IF PATINDEX('%[^a-zA-Z0-9]%', @CurrentChar) = 0
BEGIN
SET @NotLike = @NotLike + @CurrentChar + '%[a-zA-Z0-9]%'
END
SET @Index = @Index + 1
END
-- strip the trailing '[a-zA-Z0-9]%' (12 characters), then derive the LIKE pattern from it
SELECT @NotLike = LEFT(@NotLike, LEN(@NotLike) - 12),
@Like = REPLACE(@NotLike, '%[a-zA-Z0-9]%', '%')
SELECT *
FROM @T
WHERE a LIKE @Like
AND a NOT LIKE @NotLike
You can loop through your (cleaned) search string and append the expression you want to each letter. In my example, @builtString should be what you want to use further on, if I understood correctly.
declare @cleanSearch as nvarchar(10) = 'CAT'
declare @builtString as nvarchar(100) = ''
WHILE LEN(@cleanSearch) > 0 -- loop until you deplete the search string
BEGIN
SET @builtString = @builtString + substring(@cleanSearch,1,1) + '%[^a-zA-Z0-9]%' -- append the letter plus the LIKE wildcard pattern
SET @cleanSearch = right(@cleanSearch, len(@cleanSearch) - 1) -- remove first letter of the search string
END
SELECT @builtString --will look like C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%
SELECT @cleanSearch --@cleanSearch is now empty

Check a word starting with specific string [SQL Server]

I am trying to search a string like Dhaka is the capital of Bangladesh, which contains six words. If my search text is cap (the start of the word capital), I want the starting index of the search text in the string (14 here). If the search text is contained in the string but is not at the start of any word, I want 0. Please take a look at the test cases for better understanding.
What I tried
DECLARE @SearchText VARCHAR(20),
@Str VARCHAR(MAX),
@Result INT
SET @Str = 'Dhaka is the capital of Bangladesh'
SET @SearchText = 'cap'
SET @Result = CASE WHEN @Str LIKE @SearchText + '%'
OR @Str LIKE '% ' + @SearchText + '%'
THEN CHARINDEX(@SearchText, @Str)
ELSE 0 END
PRINT @Result -- print 14 here
In my case, I need to generate @Str with another SQL function. Here, @Str has to be generated 3 times, which is costly (I think). Is there any way to generate @Str only once? (Is that possible using PATINDEX?)
Note: the CASE expression appears in the WHERE clause of my original query, so it is not possible to store the @Str value in a variable and then use it in the WHERE clause.
Test Case
Search Text: Dhaka, Result: 1
Search Text: tal, Result: 0
Search Text: Mirpur, Result: 0
Search Text: isthe, Result: 0
Search Text: is the, Result: 7
Search Text: Dhaka Capital, Result: 0
Simply add a leading space to the strings to ensure that you always find only the beginning of a word:
DECLARE @SearchText VARCHAR(20),
@Str VARCHAR(MAX),
@Result INT
SET @Str = 'Dhaka is the capital of Bangladesh'
SET @SearchText = 'Dhaka Capital'
SET @Result = CHARINDEX(' ' + @SearchText, ' ' + @Str)
PRINT @Result -- prints 0 here; with @SearchText = 'cap' it prints 14
I have tested the above query against your test cases and it seems to work.
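For reference, the test cases from the question can be run in one statement. The inline VALUES list is just a convenience for this sketch, and 'cap' is added from the question's first example:

```sql
DECLARE @Str VARCHAR(MAX) = 'Dhaka is the capital of Bangladesh';

SELECT t.SearchText,
       -- the space in front of both strings forces a match at the start of a word
       CHARINDEX(' ' + t.SearchText, ' ' + @Str) AS Result
FROM (VALUES ('Dhaka'),          -- 1
             ('tal'),            -- 0
             ('Mirpur'),         -- 0
             ('isthe'),          -- 0
             ('is the'),         -- 7
             ('Dhaka Capital'),  -- 0
             ('cap')             -- 14
     ) AS t(SearchText);
```

Because both strings are shifted by the same leading space, the position of the matched space in the padded string equals the position of the word's first letter in the original string, so no adjustment is needed.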
To compute the function only once per row in a SELECT, make it a table-valued function. Or, if that is impossible for some reason, use CROSS APPLY:
SELECT .. a, b
FROM ..
CROSS APPLY (SELECT my_scalar_fn(a,b) as Str) arg
WHERE CASE WHEN arg.Str LIKE SearchText + '%'
OR arg.Str LIKE '% ' + SearchText + '%'
THEN CHARINDEX(SearchText, arg.Str)
ELSE 0 END > 0 -- assumed comparison; the original query's condition is not shown