Replace every alpha character with itself + wildcard in string SQL Server

Replace every alpha character with itself + wildcard in string SQL Server - sql

My goal is to create a query that will search for results related to a specific keyword.
Say in a database we had the word cat.
Regardless of if the user types C a t, C.A.T. or Cat I want to find a result related to the search as long as the alpha numeric characters are in the correct sequence that is all that matters
Say in the database we have these 4 records
cat
c/a/t
c.a.t
c. at
If the user types in C#$*(&A T I'd like to get all 4 results.
What I have written so far in my query is a function that strips any non-alphanumeric characters from the input string.
What can I do to replace each alphanumeric character with itself and add a wildcard at the end?
For every alpha character my input would look similar to this
C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%

Actually, that search string will return only one record from this table: the row with 'c.a.t '.
This is because the expression C%[^a-zA-Z0-9]%A does not mean there can't be any alpha-numeric chars between C and A.
What it actually means is there should be at least one non alpha-numeric value between C and A.
Moreover, it will return incorrect values as well - a value like 'c u a s e t ' will be returned.
You need to change your where clause to something like this:
WHERE column LIKE '%C%A%T%'
AND column NOT LIKE '%C%[a-zA-Z0-9]%A%[a-zA-Z0-9]%T%'
This way, if you have cat in the correct order, the first row will resolve to true, and if there are no other alpha-numeric chars between c, a, and t the second row will resolve to true.
Here is a test script, where you can see for yourself what I mean:
DECLARE #T AS TABLE
(
a varchar(20)
)
INSERT INTO #T VALUES
('cat'),
('c/a/t'),
('c.a.t '),
('c. at'),
('c u a s e t ')
-- Incorrect where clause
SELECT *
FROM #T
WHERE a LIKE 'C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%'
-- correct where clause
SELECT *
FROM #T
WHERE a LIKE '%C%A%T%'
AND a NOT LIKE '%C%[a-zA-Z0-9]%A%[a-zA-Z0-9]%T%'
You can also see it in action in this link.
And since I had some spare time, here is a script to create both the like and the not like patterns from the input string:
DECLARE #INPUT varchar(100) = '#*# c %^&# a ^&*$&* t (*&(%!##$'
DECLARE #Index int = 1,
#CurrentChar char(1),
#Like varchar(100),
#NotLike varchar(100) = '%'
WHILE #Index < LEN(#Input)
BEGIN
SET #CurrentChar = SUBSTRING(#INPUT, #Index, 1)
IF PATINDEX('%[^a-zA-Z0-9]%', #CurrentChar) = 0
BEGIN
SET #NotLike = #NotLike + #CurrentChar + '%[a-zA-Z0-9]%'
END
SET #Index = #Index + 1
END
SELECT #NotLike = LEFT(#NotLike, LEN(#NotLike) - 12),
#Like = REPLACE(#NotLike, '%[a-zA-Z0-9]%', '%')
SELECT *
FROM #T
WHERE a LIKE #Like
AND a NOT LIKE #NotLike

You can recursively go through your (cleaned) search string and to each letter add the expression you would like. In my example #builtString should be what you would like to use further on, if I understood correctly.
declare #cleanSearch as nvarchar(10) = 'CAT'
declare #builtString as nvarchar(100) = ''
WHILE LEN(#cleanSearch) > 0 -- loop until you deplete the search string
BEGIN
SET #builtString = #builtString + substring(#cleanSearch,1,1) + '%[^a-zA-Z0-9]%' -- append the letter plus regular expression
SET #cleanSearch = right(#cleanSearch, len(#cleanSearch) - 1) -- remove first letter of the search string
END
SELECT #builtString --will look like C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%
SELECT #cleanSearch --#cleanSearch is now empty

Related

How to identify and redact all instances of a matching pattern in T-SQL

I have a requirement to run a function over certain fields to identify and redact any numbers which are 5 digits or longer, ensuring all but the last 4 digits are replaced with *
For example: "Some text with 12345 and 1234 and 12345678" would become "Some text with *2345 and 1234 and ****5678"
I've used PATINDEX to identify the the starting character of the pattern:
PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', TEST_TEXT)
I can recursively call that to get the starting character of all the occurrences, but I'm struggling with the actual redaction.
Does anyone have any pointers on how this can be done? I know to use REPLACE to insert the *s where they need to be, it's just the identification of what I should actually be replacing I'm struggling with.
Could do it on a program, but I need it to be T-SQL (can be a function if needed).
Any tips greatly appreciated!

You can do this using the built in functions of SQL Server. All of which used in this example are present in SQL Server 2008 and higher.
DECLARE #String VARCHAR(500) = 'Example Input: 1234567890, 1234, 12345, 123456, 1234567, 123asd456'
DECLARE #StartPos INT = 1, #EndPos INT = 1;
DECLARE #Input VARCHAR(500) = ISNULL(#String, '') + ' '; --Sets input field and adds a control character at the end to make the loop easier.
DECLARE #OutputString VARCHAR(500) = ''; --Initalize an empty string to avoid string null errors
WHILE (#StartPOS <> 0)
BEGIN
SET #StartPOS = PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', #Input);
IF #StartPOS <> 0
BEGIN
SET #OutputString += SUBSTRING(#Input, 1, #StartPOS - 1); --Seperate all contents before the first occurance of our filter
SET #Input = SUBSTRING(#Input, #StartPOS, 500); --Cut the entire string to the end. Last value must be greater than the original string length to simply cut it all.
SET #EndPos = (PATINDEX('%[0-9][0-9][0-9][0-9][^0-9]%', #Input)); --First occurance of 4 numbers with a not number behind it.
SET #Input = STUFF(#Input, 1, (#EndPos - 1), REPLICATE('*', (#EndPos - 1))); --#EndPos - 1 gives us the amount of chars we want to replace.
END
END
SET #OutputString += #Input; --Append the last element
SET #OutputString = LEFT(#OutputString, LEN(#OutputString))
SELECT #OutputString;
Which outputs the following:
Example Input: ******7890, 1234, *2345, **3456, ***4567, 123asd456
This entire code could also be made as a function since it only requires an input text.

A dirty solution with recursive CTE
DECLARE
#tags nvarchar(max) = N'Some text with 12345 and 1234 and 12345678',
#c nchar(1) = N' ';
;
WITH Process (s, i)
as
(
SELECT #tags, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', #tags)
UNION ALL
SELECT value, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', value)
FROM
(SELECT SUBSTRING(s,0,i)+'*'+SUBSTRING(s,i+4,len(s)) value
FROM Process
WHERE i >0) calc
-- we surround the value and the string with leading/trailing ,
-- so that cloth isn't a false positive for clothing
)
SELECT * FROM Process
WHERE i=0
I think a better solution it's to add clr function in Ms SQL Server to manage regexp.
sql-clr/RegEx

Here is an option using the DelimitedSplit8K_LEAD which can be found here. https://www.sqlservercentral.com/articles/reaping-the-benefits-of-the-window-functions-in-t-sql-2 This is an extension of Jeff Moden's splitter that is even a little bit faster than the original. The big advantage this splitter has over most of the others is that it returns the ordinal position of each element. One caveat to this is that I am using a space to split on based on your sample data. If you had numbers crammed in the middle of other characters this will ignore them. That may be good or bad depending on you specific requirements.
declare #Something varchar(100) = 'Some text with 12345 and 1234 and 12345678';
with MyCTE as
(
select x.ItemNumber
, Result = isnull(case when TRY_CONVERT(bigint, x.Item) is not null then isnull(replicate('*', len(convert(varchar(20), TRY_CONVERT(bigint, x.Item))) - 4), '') + right(convert(varchar(20), TRY_CONVERT(bigint, x.Item)), 4) end, x.Item)
from dbo.DelimitedSplit8K_LEAD(#Something, ' ') x
)
select Output = stuff((select ' ' + Result
from MyCTE
order by ItemNumber
FOR XML PATH('')), 1, 1, '')
This produces: Some text with *2345 and 1234 and ****5678

Joining on numeric part of string

It's been a while...I'd like to get your advice on the most efficient way to join on only the number part of a field that may be prefixed and/or suffixed with up to 2 letters. Here's a simplified snippet of what I'm trying to do:
SELECT a, b, c
FROM table 1 t1
LEFT JOIN table 2 t2 ON t1.PolicyCode = t2.sPolicyID,
Where t2.sPolicyID could begin and/or end with up to 2 letters. Some examples: TG73100, S7286674, 2344506R, etc. We only want to join to just its numeric part in between the letters, i.e. 73100, 7286674 or 2344506 from the examples.
Could someone please advise on a simple way of doing this?

Here is one way:
LEFT JOIN table 2 t2 ON t1.PolicyCode =
LEFT(SUBSTRING(t2.sPolicyID, PATINDEX('%[0-9]%', t2.sPolicyID), 50),
PATINDEX('%[^0-9]%',
SUBSTRING(t2.sPolicyID, PATINDEX('%[0-9]%', t2.sPolicyID), 50)
+ 'a') -1)
To break this down, there are 4 main parts.
1: Find the position of the first number with PATINDEX:
DECLARE #spolicyID VARCHAR(20) = 'xx123123xx'
SELECT PATINDEX('%[0-9]%', #spolicyID)
--Returns 3
2: Use SUBSTRING() to cut off everything before the first letter:
DECLARE #spolicyID VARCHAR(20) = 'xx123123xx'
SELECT SUBSTRING(#spolicyID, PATINDEX('%[0-9]%', #spolicyID), 50)
--Returns 123123xx
If we hardcoded the 3 that we know is returned from the first part, it would look like this:
DECLARE #spolicyID VARCHAR(20) = 'xx123123xx'
SELECT SUBSTRING(#spolicyID, 3), 50)
--50 is the number of characters to extract, set to something
--higher than the max string length to be safe
Of course, we don't want to hardcode it since it can change, but that makes seeing the different functions a bit easier.
3: Find the position of the next letter using PATINDEX again:
DECLARE #spolicyID VARCHAR(20) = 'xx123123xx'
SELECT PATINDEX('%[^0-9]%', SUBSTRING(#spolicyID, PATINDEX('%[0-9]%', #spolicyID), 50) + 'a')
--Returns 7 since it is looking at 123123xx
--The first x is in the 7th position
Note that we added an a onto the string. This is because if we had a string with no letters at the end, it would throw an error as the length 0 would be returned to SUBSTRING. You could add any letter or letters to the end and it would work, we are just making sure there is at least one. Try removing the + 'a' and using a string like xx123123 to see the error.
If we hardcoded the 123123xx from step 2 it would look like this (again just for easy example):
DECLARE #spolicyID VARCHAR(20) = 'xx123123xx'
SELECT PATINDEX('%[^0-9]%', '123123xx' + 'a')
4: Use LEFT() to return everything before the trailing letters, leaving us with only the numbers in between:
DECLARE #spolicyID VARCHAR(20) = 'xx123123xx'
LEFT(SUBSTRING(#spolicyID, PATINDEX('%[0-9]%', #spolicyID), 50),PATINDEX('%[^0-9]%', SUBSTRING(#spolicyID, PATINDEX('%[0-9]%', #spolicyID), 50) + 'a') -1)
--Need to add `-1` because step 3 PATINDEX returns 7
--as the position of first trailing letter, and
--we want the 6 characters before that
And again hardcoded from step 2 and 3 for easy viewing:
DECLARE #spolicyID VARCHAR(20) = 'xx123123xx'
LEFT('123123xx', 7-1)

Extract number between two substrings in sql

I had a previous question and it got me started but now I'm needing help completing this. Previous question = How to search a string and return only numeric value?
Basically I have a table with one of the columns containing a very long XML string. There's a number I want to extract near the end. A sample of the number would be this...
<SendDocument DocumentID="1234567">true</SendDocument>
So I want to use substrings to find the first part = true so that Im only left with the number.
What Ive tried so far is this:
SELECT SUBSTRING(xml_column, CHARINDEX('>true</SendDocument>', xml_column) - CHARINDEX('<SendDocument',xml_column) +10087,9)
The above gives me the results but its far from being correct. My concern is that, what if the number grows from 7 digits to 8 digits, or 9 or 10?
In the previous question I was helped with this:
SELECT SUBSTRING(cip_msg, CHARINDEX('<SendDocument',cip_msg)+26,7)
and thats how I got started but I wanted to alter so that I could subtract the last portion and just be left with the numbers.
So again, first part of the string that contains the digits, find the two substrings around the digits and remove them and retrieve just the digits no matter the length.
Thank you all

You should be able to setup your SUBSTRING() so that both the starting and ending positions are variable. That way the length of the number itself doesn't matter.
From the sound of it, the starting position you want is right After the "true"
The starting position would be:
CHARINDEX('<SendDocument DocumentID=', xml_column) + 25
((adding 25 because I think CHARINDEX gives you the position at the beginning of the string you are searching for))
Length would be:
CHARINDEX('>true</SendDocument>',xml_column) - CHARINDEX('<SendDocument DocumentID=', xml_column)+25
((Position of the ending text minus the position of the start text))
So, how about something along the lines of:
SELECT SUBSTRING(xml_column, CHARINDEX('<SendDocument DocumentID=', xml_column)+25,(CHARINDEX('>true</SendDocument>',xml_column) - CHARINDEX('<SendDocument DocumentID=', xml_column)+25))

Have you tried working directly with the xml type? Like below:
DECLARE #TempXmlTable TABLE
(XmlElement xml )
INSERT INTO #TempXmlTable
select Convert(xml,'<SendDocument DocumentID="1234567">true</SendDocument>')
SELECT
element.value('./#DocumentID', 'varchar(50)') as DocumentID
FROM
#TempXmlTable CROSS APPLY
XmlElement.nodes('//.') AS DocumentID(element)
WHERE element.value('./#DocumentID', 'varchar(50)') is not null
If you just want to work with this as a string you can do the following:
DECLARE #SearchString varchar(max) = '<SendDocument DocumentID="1234567">true</SendDocument>'
DECLARE #Start int = (select CHARINDEX('DocumentID="',#SearchString)) + 12 -- 12 Character search pattern
DECLARE #End int = (select CHARINDEX('">', #SearchString)) - #Start --Find End Characters and subtract start position
SELECT SUBSTRING(#SearchString,#Start,#End)
Below is the extended version of parsing an XML document string. In the example below, I create a copy of a PLSQL function called INSTR, the MS SQL database does not have this by default. The function will allow me to search strings at a designated starting position. In addition, I'm parsing a sample XML string into a variable temp table into lines and only looking at lines that match my search criteria. This is because there may be many elements with the words DocumentID and I'll want to find all of them. See below:
IF EXISTS (select * from sys.objects where name = 'INSTR' and type = 'FN')
DROP FUNCTION [dbo].[INSTR]
GO
CREATE FUNCTION [dbo].[INSTR] (#String VARCHAR(8000), #SearchStr VARCHAR(255), #Start INT, #Occurrence INT)
RETURNS INT
AS
BEGIN
DECLARE #Found INT = #Occurrence,
#Position INT = #Start;
WHILE 1=1
BEGIN
-- Find the next occurrence
SET #Position = CHARINDEX(#SearchStr, #String, #Position);
-- Nothing found
IF #Position IS NULL OR #Position = 0
RETURN #Position;
-- The required occurrence found
IF #Found = 1
BREAK;
-- Prepare to find another one occurrence
SET #Found = #Found - 1;
SET #Position = #Position + 1;
END
RETURN #Position;
END
GO
--Assuming well formated xml
DECLARE #XmlStringDocument varchar(max) = '<SomeTag Attrib1="5">
<SendDocument DocumentID="1234567">true</SendDocument>
<SendDocument DocumentID="1234568">true</SendDocument>
</SomeTag>'
--Split Lines on this element tag
DECLARE #SplitOn nvarchar(25) = '</SendDocument>'
--Let's hold all lines in Temp variable table
DECLARE #XmlStringLines TABLE
(
Value nvarchar(100)
)
While (Charindex(#SplitOn,#XmlStringDocument)>0)
Begin
Insert Into #XmlStringLines (value)
Select
Value = ltrim(rtrim(Substring(#XmlStringDocument,1,Charindex(#SplitOn,#XmlStringDocument)-1)))
Set #XmlStringDocument = Substring(#XmlStringDocument,Charindex(#SplitOn,#XmlStringDocument)+len(#SplitOn),len(#XmlStringDocument))
End
Insert Into #XmlStringLines (Value)
Select Value = ltrim(rtrim(#XmlStringDocument))
--Now we have a table with multple lines find all Document IDs
SELECT
StartPosition = CHARINDEX('DocumentID="',Value) + 12,
--Now lets use the INSTR function to find the first instance of '">' after our search string
EndPosition = dbo.INSTR(Value,'">',( CHARINDEX('DocumentID="',Value)) + 12,1),
--Now that we know the start and end lets use substring
Value = SUBSTRING(value,(
-- Start Position
CHARINDEX('DocumentID="',Value)) + 12,
--End Position Minus Start Position
dbo.INSTR(Value,'">',( CHARINDEX('DocumentID="',Value)) + 12,1) - (CHARINDEX('DocumentID="',Value) + 12))
FROM
#XmlStringLines
WHERE Value like '%DocumentID%' --Only care about lines with a document id

create sql view from comma separated values

T-sql question:
I need help to build a join from 2 tables, where on one of the tables I have aggregated data (comma separated values).
I have a table - Users where I have 3 columns: UserId, DefaultLanguage and OtherLanguages.
The table looks like this:
UserId | DefaultLanguage | OtherLanguages
---------------------------------------------
1 | en | NULL
2 | en | it, fr
3 | fr | en, it
4 | en | sp
and so on.
I have another table where I have the association between language code (en, fr, ro, it, sp) and language name:
LangCode | LanguageName
-------------------------
en | English
fr | French
it | Italian
sp | Spanish
and so on.
I want to create a view like this:
UserId | DefaultLanguage | OtherLanguages
---------------------------------------------
1 | English | NULL
2 | English | Italian, French
3 | French | English, Italian
4 | English | Spanish
and so on.
In short, I need a view where the language code is replaced by language name.
Any help, please?

Several solutions of course you can recreate all table change the data structure.
1. If all the language are 2 digits:
select t1.UserId, t2.LanguageName,
ISNULL( t3.LanguageName, '') + ISNULL(', '+t4.LanguageName, '') + ISNULL( ', '+t5.LanguageName, '') OtherLanguages
from Table1 t1
inner join Table2 t2 on t1.DefaultLanguage = t2.LangCode
left join Table2 t3 on Left(t1.OtherLanguages,2) = t3.LangCode
left join Table2 t4 on CASE WHEN len(Replace(t1.OtherLanguages, ' ', '')) > 3 THEN
SUBSTRING( Replace(t1.OtherLanguages, ' ', ''), 4, 2) ELSE null END = t4.LangCode
left join Table2 t5 on CASE WHEN len(Replace(t1.OtherLanguages, ' ', '')) > 6 THEN
SUBSTRING( Replace(t1.OtherLanguages, ' ', ''), 7, 2) ELSE null END = t5.LangCode
Use user-define function:
CREATE FUNCTION [dbo].[func_GetLanguageName] (#pLanguageList varchar(max))
RETURNS varchar(max) AS
BEGIN
Declare #aLanguageList varchar(max) = #pLanguageList
Declare #aLangCode varchar(max) = null
Declare #aReturnName varchar(max) = null
WHILE LEN(#aLanguageList) > 0
BEGIN
IF PATINDEX('%,%',#aLanguageList) > 0
BEGIN
SET #aLangCode = RTRIM(LTRIM(SUBSTRING(#aLanguageList, 0, PATINDEX('%,%',#aLanguageList))))
SET #aLanguageList = LTRIM(SUBSTRING(#aLanguageList, LEN(#aLangCode + ',') + 1,LEN(#aLanguageList)))
END
ELSE
BEGIN
SET #aLangCode = #aLanguageList
SET #aLanguageList = NULL
END
Select #aReturnName = ISNULL( #aReturnName + ', ' , '') + LanguageName from Table2 where LangCode=#aLangCode
END
RETURN(#aReturnName)
END
and use select
select UserId, dbo.func_GetLanguageName(DefaultLanguage)DefaultLanguage, dbo.func_GetLanguageName(OtherLanguages) OtherLanguages from table1

Best practice would dictate not to have this type of comma delimited
data in a column...
Since you stated in comments that the schema cannot be changed, the next best thing is a function. This can be used in a select query in-line.
SQL is notoriously slow with string manipulation. Here is an interesting article on the topic. There are many SQL "string split" functions out there. They all generally split a comma delimited string and return a table.
For this specific use-case, you actually need a scalar-valued
function (a function which returns one value) rather than a
table-valued function (one which returns a table of values).
Below is a modified such function, which returns a scalar value in place of the original comma delimited string of language codes.
The comments explain what is happening line by line.
The gist is that you must loop through the input string keeping track of the last comma location, extract each code, lookup the full language from the languages table, and then return the output as a comma-delimited string.
Language codes to languages function:
Create Function [dbo].fn_languageCodeToFull
( #Input Varchar(100) )
Returns Varchar(1000)
As
Begin
-- To address null input, based on the example you provided, we set the output to NULL if there is no input
If #Input = '' Or #Input Is Null
Return Null
Declare
#CodeLength int, -- constant for code length to avoid hardcoded "magic numbers"
#Output varchar(1000), -- will contain the final comma delimited string of full languages
#LastIndex int, -- tracks the location of the input we are searching as we loop over the string
#CurrentCode varchar(2), -- for code readability, we extract each language code to this variable
#CurrentLanguage varchar(50), -- for code readability, we store the full language in this variable
#IndexIncrement int -- constant to increment the search index by 1 at each iteration
-- ensuring the loop moves forward
Set #LastIndex = 0 -- seed the index, so we begin to search at 0 index
Set #CodeLength = 2 -- ISO language codes are always 2 characters in length
Set #Output = '' -- seed with empty string to avoid NULL when concatenating
Set #IndexIncrement = 1 -- again avoiding hardcoded values...
-- We will loop until we have gone to or beyond the length of the input string
While #LastIndex < len(#Input)
Begin
-- Set the index of each comma (charindex is 1-based)
Set #LastIndex = CHARINDEX(',', #Input, #LastIndex)
-- When we get to the last item, CharIndex will return 0 when it does not find a comma.
-- To pull the last item, we will artificially set #LastIndex to be 1 greater than the input string
-- This will allow the code following this line to be unaltered for this scenario
If #LastIndex = 0 set #LastIndex = len(#Input) + 1 -- account for 1-based index of substring
-- Extract the code prior to the current comma that charindex has identified
Set #CurrentCode = substring(#Input, #LastIndex - #CodeLength, #CodeLength)
-- Do a lookup to get the language for the current code
Set #CurrentLanguage = (Select LanguageName From languages Where code = #CurrentCode)
-- Only add comma after first language to ensure no extra comma will be present in Output
If #LastIndex > 3 Set #Output = #Output + ','
-- Here we build the Output string with the language
Set #Output = #Output + #CurrentLanguage
-- Finally, we increment #LastIndex by 1 to avoid loop on first instance of comma
Set #LastIndex = #LastIndex + #IndexIncrement
End
Return #Output
End
Then your view would simply do something like:
Sample view using the function:
Create View vw_UserLanguages
As
Select
UserId,
dbo.fn_languageCodeToFull(DefaultLanguage) as DefaultLanguage,
dbo.fn_languageCodeToFull(OtherLanguages) as OtherLanguages,
From UserLanguageCodes -- you do not provide a name so I made one up
Note that the function will work whether there are commas or not, so there is no need to join the Languages table here as you can just have the function do all the work in this case.

One quick and dirty solution would be to use a nested REPLACE command but that could result in a very complex statement a bit long winded, especially if you have more than five languages.
As an example:
SELECT [UserId],[DefaultLanguage],
CASE
WHEN [OtherLanguages] IS NULL THEN ''
ELSE REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE([OtherLanguages],
'en','English'),
'fr','French'),
'it','Italian'),
'ro','Romulan'), --Probably not the intended language ;-)
'sp','Spanish')
END as [OtherLanguages]
FROM YourTable
Personally, I'd create a scalar function, again using the REPLACE command, but you can then check the number of languages present and add a counter so that you're not doing unnecessary lookups.
SELECT [UserId],[DefaultLanguage],
CASE
WHEN [OtherLanguages] IS NULL THEN ''
WHEN [OtherLanguages] = '' THEN ''
ELSE do_function_name([OtherLanguages])
END as [OtherLanguages]
FROM YourTable
It might not be good practice but there are times when it is more efficient to store multiple values in a single field but accept that when you do, it will slow down the way you handle that data.

Trim String After Keyword

I have a column that contains status changes, but I don't want to return the whole string. Is there any way to return just a part of a string after a certain keyword? Every value of the column is in the format of From X to Y where X and Y could be a single word or multiple words. I've looked at the substring and trim functions, but those seem to require knowledge of how many spaces you want to keep.
Edit: I want to keep part Y from the status and get rid of 'From X to'.

You can use a combination of Charindex and Substring and Len to do it.
Try this:
select SUBSTRING(field,charindex('keyword',field), LEN('keyword'))
So this will find Flop and extract it wherever it is in the field
select SUBSTRING('bullflop',charindex('flop','bullflop'), LEN('flop'))
EDIT:
To get the remainder then just set LEN to the field LEN(field)
declare #field varchar(200)
set #field = 'this is bullflop and other such junk'
select SUBSTRING(#field,charindex('flop',#field), LEN(#field) )
EDIT 2:
Now I understand, here is a quick and dirty version...
declare #field varchar(200)
set #field = 'From X to Y'
select Replace(SUBSTRING(#field,charindex('to ',#field), LEN(#field) ), 'to ','')
Returns:
Y
EDIT 3:
Cory is right, this is cleaner.
declare #field varchar(200) = 'From X to Y'
declare #keyword varchar(200) = 'to '
select SUBSTRING(#field,charindex(#keyword,#field) + LEN(#keyword), LEN(#field) )

Other answers are fine, but I like the STUFF() function and it doesn't seem to be well-known, so here's another option:
DECLARE #field VARCHAR(50) = 'From Authorized to Auth Not Needed'
,#keyword VARCHAR(50) = ' to '
SELECT STUFF(#field,1,CHARINDEX(#keyword,#field)+LEN(#keyword),'')
STUFF() is like SUBSTRING() and REPLACE() combined, you feed it a string, a start position and a length, and can replace that with anything or in your case, nothing ''.
From MSDN:
STUFF ( character_expression , start , length , replaceWith_expression )

You can combine a few string functions to do what you want:
DECLARE #Field varchar(100) = 'From A to Z'
DECLARE #Keyword varchar(100) = 'to'
-- Method 1 (Find the keyword, then take the remainder of the string)
SELECT LTRIM(SUBSTRING(#Field,
CHARINDEX(#Keyword, #Field, 0) + LEN(#Keyword), LEN(#Field)))
EDIT:
-- Method 2 (Take from the right the characters up to the keyword)
SELECT RIGHT(#Field, LEN(#Field) - CHARINDEX(#Keyword, #Field, 0) - LEN(#Keyword))
Produces:
'Z'

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Replace every alpha character with itself + wildcard in string SQL Server - sql

Related

How to identify and redact all instances of a matching pattern in T-SQL

Joining on numeric part of string

Extract number between two substrings in sql

create sql view from comma separated values

Trim String After Keyword

Categories

Resources