Trim Only the comma separated Numbers without trimming the character appended Numbers - sql

I have a column '<InstructorID>' which may contain data like "79,instr1,inst2,13" and so on.
The following code gives me a result like "791213":
declare @InstructorID varchar(100)
set @InstructorID = (select InstructorID from CourseSession where CourseSessionNum=262)
WHILE PATINDEX('%[^0-9]%', @InstructorID) > 0
BEGIN
    SET @InstructorID = STUFF(@InstructorID, PATINDEX('%[^0-9]%', @InstructorID), 1, '')
END
select @InstructorID
I need the output to be like this: "79,13"
i.e. numbers attached to characters should not appear in the output.
P.S.: I need to achieve this using SQL only. Unfortunately I'm unable to use regex, which would have made this task much easier.

I agree with others that your problem would seem to be indicative of a mistake in your data design.
However, accepting that you cannot change the design, the following would allow you to achieve what you are looking for:
DECLARE @InstructorID VARCHAR(100)
DECLARE @Part VARCHAR(100)
DECLARE @Pos INT
DECLARE @Return VARCHAR(100)
SET @InstructorID = '79,instr1,inst2,13'
SET @Return = ''
-- Continue until InstructorID is empty
WHILE (LEN(@InstructorID) > 0)
BEGIN
    -- Get the position of the next comma, or the end of InstructorID if there are no more
    SET @Pos = CHARINDEX(',', @InstructorID)
    IF (@Pos = 0)
        SET @Pos = LEN(@InstructorID)
    -- Get the next part of the text (including its comma) and shorten InstructorID
    SET @Part = SUBSTRING(@InstructorID, 1, @Pos)
    SET @InstructorID = RIGHT(@InstructorID, LEN(@InstructorID) - @Pos)
    -- Check that the part is numeric
    IF (ISNUMERIC(@Part) = 1)
        SET @Return = @Return + @Part
END
-- Trim trailing comma (if any)
IF (RIGHT(@Return, 1) = ',')
    SET @Return = LEFT(@Return, LEN(@Return) - 1)
PRINT @Return
Essentially, this loops through @InstructorID, extracting the parts of text between commas.
If a part is numeric, it is added to the output text. I am PRINTing the text, but you could SELECT it or use it however you wish.
Obviously, where I have SET @InstructorID = '79,instr1,inst2,13', you should substitute your SELECT statement.
This code can be placed into a UDF if preferred, although as I say, your data format seems less than ideal.
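As a rough sketch of the UDF idea, the loop could be wrapped like this. The function name dbo.KeepNumericParts and its variable names are made up for illustration, and the sketch swaps ISNUMERIC for a digits-only LIKE test, because ISNUMERIC also accepts values such as '1e5' or '$'. Treat it as a starting point under those assumptions rather than a drop-in solution.

```sql
-- Hypothetical helper: keeps only the comma-separated parts made up entirely of digits.
CREATE FUNCTION dbo.KeepNumericParts (@List VARCHAR(100))
RETURNS VARCHAR(100)
AS
BEGIN
    -- Sentinel comma so that every part, including the last, ends with one
    DECLARE @Rest   VARCHAR(101) = @List + ',';
    DECLARE @Part   VARCHAR(100);
    DECLARE @Pos    INT;
    DECLARE @Return VARCHAR(100) = '';

    WHILE CHARINDEX(',', @Rest) > 0
    BEGIN
        SET @Pos  = CHARINDEX(',', @Rest);
        SET @Part = LTRIM(RTRIM(LEFT(@Rest, @Pos - 1)));
        SET @Rest = SUBSTRING(@Rest, @Pos + 1, 101);

        -- Keep the part only if it is non-empty and contains no non-digit character
        IF @Part <> '' AND @Part NOT LIKE '%[^0-9]%'
            SET @Return = @Return
                        + CASE WHEN @Return = '' THEN '' ELSE ',' END
                        + @Part;
    END

    RETURN @Return;
END
GO

-- Example call against the table from the question
SELECT dbo.KeepNumericParts(InstructorID) AS NumericInstructorIDs
FROM CourseSession
WHERE CourseSessionNum = 262;
```

For the sample value '79,instr1,inst2,13' this returns '79,13'. A NULL input skips the loop and yields an empty string.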

Related

sql Return string between two characters

I want to know a flexible way to extract the string between two '-'. The issue is that '-' may or may not exist in the source string. I have the following code which works fine when '-' exists twice in the source string, marking the start and end of the extracted string. But it throws an error "Invalid length parameter passed to the LEFT or SUBSTRING function" when there is only one '-' or none at all or if the string is blank. Can someone please help? Thanks
declare @string varchar(100) = 'BLAH90-ExtractThis-WOW'
SELECT SUBSTRING(@string,CHARINDEX('-',@string)+1, CHARINDEX('-',@string,CHARINDEX('-',@string)+1) -CHARINDEX('-',@string)-1) as My_String
Desired Output: ExtractThis
If there is one dash only e.g. 'BLAH90-ExtractThisWOW' then the output should be everything after the first dash i.e. ExtractThisWOW. If there are no dashes then the string will have a blank space instead e.g. 'BLAH90 ExtractThisWOW' and should return everything after the blank space i.e. ExtractThisWOW.
You can try something like this.
When there is no dash, it starts after the space if there is one, or takes the whole string if not.
Otherwise it checks whether there is only one dash or two.
declare @string varchar(100) = 'BLAH90-ExtractThis-WOW'
declare @dash_pos integer = CHARINDEX('-',@string)
SELECT CASE
    WHEN @dash_pos = 0 THEN
        RIGHT(@string,LEN(@string)-CHARINDEX(' ',@string))
    ELSE (
        CASE
            -- the first dash is also the last dash: only one dash in the string
            WHEN @dash_pos = LEN(@string)-CHARINDEX('-',REVERSE(@string))+1
                THEN RIGHT(@string,LEN(@string)-@dash_pos)
            ELSE SUBSTRING(@string,@dash_pos+1, CHARINDEX('-',@string,@dash_pos+1) -
                @dash_pos -1)
        END
    )
END as My_String
Try this. If there are two dashes, it'll take what is inside. If there is only one or none, it'll keep the original string.
declare @string varchar(100) = 'BLAH-90ExtractThisWOW'
declare @dash_index1 int = case when @string like '%-%' then CHARINDEX('-', @string) else -1 end
declare @dash_index2 int = case when @string like '%-%' then len(@string) - CHARINDEX('-', reverse(@string)) + 1 else -1 end
SELECT case
    when @dash_index1 <> @dash_index2 then SUBSTRING(@string,CHARINDEX('-',@string)+1, CHARINDEX('-',@string,CHARINDEX('-',@string)+1) -CHARINDEX('-',@string)-1)
    else @string end
as My_String
Take your existing code:
declare @string varchar(100) = 'BLAH90-ExtractThis-WOW'
SELECT SUBSTRING(@string,CHARINDEX('-',@string)+1, CHARINDEX('-',@string,CHARINDEX('-',@string)+1) -CHARINDEX('-',@string)-1) as My_String
insert one line, like so:
declare @string varchar(100) = 'BLAH90-ExtractThis-WOW'
SET @string = @string + '--'
SELECT SUBSTRING(@string,CHARINDEX('-',@string)+1, CHARINDEX('-',@string,CHARINDEX('-',@string)+1) -CHARINDEX('-',@string)-1) as My_String
and you're done. (If NULL, you will get NULL returned. Also, this will return all data based on the FIRST dash found in the string, regardless of however many dashes are in the string.)
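The one-dash and no-dash cases from the question can also be folded into a single SUBSTRING call. The following is only a sketch: @start and @end are names introduced here, and it assumes the fallback delimiter is a single space, as described in the question. The idea is to fall back to the space when no dash exists, and to search for the closing dash in a copy of the string with one dash appended, so that a closing position is always found.

```sql
DECLARE @string VARCHAR(100) = 'BLAH90-ExtractThisWOW';

-- Start delimiter: the first dash, or the first space if there is no dash
DECLARE @start INT = CHARINDEX('-', @string);
IF @start = 0
    SET @start = CHARINDEX(' ', @string);

-- End delimiter: the next dash after @start; the appended dash guarantees a match
DECLARE @end INT = CHARINDEX('-', @string + '-', @start + 1);

SELECT SUBSTRING(@string, @start + 1, @end - @start - 1) AS My_String;
```

With 'BLAH90-ExtractThis-WOW' this gives ExtractThis; with 'BLAH90-ExtractThisWOW' or 'BLAH90 ExtractThisWOW' it gives ExtractThisWOW; an empty string gives an empty string. If the string contains neither a dash nor a space, the whole string is returned.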

How to identify and redact all instances of a matching pattern in T-SQL

I have a requirement to run a function over certain fields to identify and redact any numbers which are 5 digits or longer, ensuring all but the last 4 digits are replaced with *
For example: "Some text with 12345 and 1234 and 12345678" would become "Some text with *2345 and 1234 and ****5678"
I've used PATINDEX to identify the starting character of the pattern:
PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', TEST_TEXT)
I can recursively call that to get the starting character of all the occurrences, but I'm struggling with the actual redaction.
Does anyone have any pointers on how this can be done? I know to use REPLACE to insert the *s where they need to be, it's just the identification of what I should actually be replacing I'm struggling with.
Could do it on a program, but I need it to be T-SQL (can be a function if needed).
Any tips greatly appreciated!
You can do this using the built-in functions of SQL Server. All of the functions used in this example are available in SQL Server 2008 and higher.
DECLARE @String VARCHAR(500) = 'Example Input: 1234567890, 1234, 12345, 123456, 1234567, 123asd456'
DECLARE @StartPos INT = 1, @EndPos INT = 1;
DECLARE @Input VARCHAR(500) = ISNULL(@String, '') + ' '; --Sets input field and adds a control character at the end to make the loop easier.
DECLARE @OutputString VARCHAR(500) = ''; --Initialize an empty string to avoid string null errors
WHILE (@StartPos <> 0)
BEGIN
    SET @StartPos = PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @Input);
    IF @StartPos <> 0
    BEGIN
        SET @OutputString += SUBSTRING(@Input, 1, @StartPos - 1); --Separate all contents before the first occurrence of our filter
        SET @Input = SUBSTRING(@Input, @StartPos, 500); --Cut the entire string to the end. Last value must be at least the original string length to simply cut it all.
        SET @EndPos = (PATINDEX('%[0-9][0-9][0-9][0-9][^0-9]%', @Input)); --First occurrence of 4 digits followed by a non-digit.
        SET @Input = STUFF(@Input, 1, (@EndPos - 1), REPLICATE('*', (@EndPos - 1))); --@EndPos - 1 gives us the number of chars we want to replace.
    END
END
SET @OutputString += @Input; --Append the last element
SET @OutputString = LEFT(@OutputString, LEN(@OutputString)) --LEN ignores trailing spaces, so this drops the control character added above
SELECT @OutputString;
Which outputs the following:
Example Input: ******7890, 1234, *2345, **3456, ***4567, 123asd456
This entire code could also be made as a function since it only requires an input text.
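A possible shape for such a function is sketched below. The name dbo.RedactLongNumbers and the '|' sentinel are assumptions made for this example, not part of the code above. It follows the same two PATINDEX patterns, but appends each masked number to the output as soon as it is found instead of rewriting the input in place.

```sql
-- Hypothetical function: masks all but the last 4 digits of any run of 5 or more digits.
CREATE FUNCTION dbo.RedactLongNumbers (@Text VARCHAR(500))
RETURNS VARCHAR(500)
AS
BEGIN
    -- Non-digit sentinel so a number at the very end still has a terminator
    DECLARE @Rest   VARCHAR(501) = ISNULL(@Text, '') + '|';
    DECLARE @Output VARCHAR(501) = '';
    DECLARE @Start  INT = PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @Rest);
    DECLARE @Stars  INT;

    WHILE @Start > 0
    BEGIN
        -- Copy everything before the run of digits unchanged
        SET @Output = @Output + LEFT(@Rest, @Start - 1);
        SET @Rest   = SUBSTRING(@Rest, @Start, 501);

        -- The last 4 digits of the run start at this position,
        -- so position - 1 is the number of digits to mask
        SET @Stars  = PATINDEX('%[0-9][0-9][0-9][0-9][^0-9]%', @Rest) - 1;

        -- Emit the mask plus the 4 kept digits, then continue after them
        SET @Output = @Output + REPLICATE('*', @Stars) + SUBSTRING(@Rest, @Stars + 1, 4);
        SET @Rest   = SUBSTRING(@Rest, @Stars + 5, 501);
        SET @Start  = PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @Rest);
    END

    SET @Output = @Output + @Rest;
    RETURN LEFT(@Output, LEN(@Output) - 1);   -- drop the sentinel
END
GO

SELECT dbo.RedactLongNumbers('Some text with 12345 and 1234 and 12345678') AS Redacted;
```

For the sample sentence from the question this returns "Some text with *2345 and 1234 and ****5678".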
A dirty solution with a recursive CTE:
DECLARE
    @tags nvarchar(max) = N'Some text with 12345 and 1234 and 12345678';
WITH Process (s, i)
as
(
    SELECT @tags, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @tags)
    UNION ALL
    SELECT value, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', value)
    FROM
        -- each step masks the first digit of the first remaining run of 5 or more digits
        (SELECT SUBSTRING(s,0,i)+'*'+SUBSTRING(s,i+1,len(s)) value
        FROM Process
        WHERE i > 0) calc
)
SELECT * FROM Process
WHERE i=0
I think a better solution is to add a CLR function to MS SQL Server to handle regular expressions (see sql-clr/RegEx).
Here is an option using DelimitedSplit8K_LEAD, which can be found here: https://www.sqlservercentral.com/articles/reaping-the-benefits-of-the-window-functions-in-t-sql-2 This is an extension of Jeff Moden's splitter that is even a little bit faster than the original. The big advantage this splitter has over most of the others is that it returns the ordinal position of each element. One caveat is that I am splitting on a space, based on your sample data. If you had numbers crammed in the middle of other characters, this will ignore them. That may be good or bad depending on your specific requirements.
declare @Something varchar(100) = 'Some text with 12345 and 1234 and 12345678';
with MyCTE as
(
select x.ItemNumber
, Result = isnull(case when TRY_CONVERT(bigint, x.Item) is not null then isnull(replicate('*', len(convert(varchar(20), TRY_CONVERT(bigint, x.Item))) - 4), '') + right(convert(varchar(20), TRY_CONVERT(bigint, x.Item)), 4) end, x.Item)
from dbo.DelimitedSplit8K_LEAD(#Something, ' ') x
)
select Output = stuff((select ' ' + Result
from MyCTE
order by ItemNumber
FOR XML PATH('')), 1, 1, '')
This produces: Some text with *2345 and 1234 and ****5678

Extract number between two substrings in sql

I had a previous question and it got me started but now I'm needing help completing this. Previous question = How to search a string and return only numeric value?
Basically I have a table with one of the columns containing a very long XML string. There's a number I want to extract near the end. A sample of the number would be this...
<SendDocument DocumentID="1234567">true</SendDocument>
So I want to use substrings to strip the text before the number and the trailing >true</SendDocument>, so that I'm only left with the number.
What I've tried so far is this:
SELECT SUBSTRING(xml_column, CHARINDEX('>true</SendDocument>', xml_column) - CHARINDEX('<SendDocument',xml_column) +10087,9)
The above gives me results, but it's far from being correct. My concern is: what if the number grows from 7 digits to 8, 9, or 10?
In the previous question I was helped with this:
SELECT SUBSTRING(cip_msg, CHARINDEX('<SendDocument',cip_msg)+26,7)
and that's how I got started, but I wanted to alter it so that I could subtract the last portion and just be left with the numbers.
So again, first part of the string that contains the digits, find the two substrings around the digits and remove them and retrieve just the digits no matter the length.
Thank you all
You should be able to setup your SUBSTRING() so that both the starting and ending positions are variable. That way the length of the number itself doesn't matter.
From the sound of it, the starting position you want is right after <SendDocument DocumentID=" (including the opening quote).
The starting position would be:
CHARINDEX('<SendDocument DocumentID="', xml_column) + 26
(adding 26 because CHARINDEX gives you the position of the beginning of the string you are searching for, and the search string is 26 characters long)
Length would be:
CHARINDEX('">true</SendDocument>', xml_column) - (CHARINDEX('<SendDocument DocumentID="', xml_column) + 26)
(position of the ending text minus the starting position)
So, how about something along the lines of:
SELECT SUBSTRING(xml_column, CHARINDEX('<SendDocument DocumentID="', xml_column) + 26, CHARINDEX('">true</SendDocument>', xml_column) - (CHARINDEX('<SendDocument DocumentID="', xml_column) + 26))
Have you tried working directly with the xml type? Like below:
DECLARE @TempXmlTable TABLE
(XmlElement xml)
INSERT INTO @TempXmlTable
select Convert(xml,'<SendDocument DocumentID="1234567">true</SendDocument>')
SELECT
    element.value('./@DocumentID', 'varchar(50)') as DocumentID
FROM
    @TempXmlTable CROSS APPLY
    XmlElement.nodes('//.') AS DocumentID(element)
WHERE element.value('./@DocumentID', 'varchar(50)') is not null
If you just want to work with this as a string you can do the following:
DECLARE @SearchString varchar(max) = '<SendDocument DocumentID="1234567">true</SendDocument>'
DECLARE @Start int = (select CHARINDEX('DocumentID="',@SearchString)) + 12 -- 12 Character search pattern
DECLARE @End int = (select CHARINDEX('">', @SearchString)) - @Start --Find End Characters and subtract start position
SELECT SUBSTRING(@SearchString,@Start,@End)
Below is an extended version that parses an XML document string. In the example, I create a copy of the PL/SQL function INSTR, which MS SQL Server does not have by default. The function lets me search a string from a designated starting position. In addition, I split a sample XML string into lines in a table variable and only look at the lines that match my search criteria. This is because there may be many elements containing DocumentID, and I want to find all of them. See below:
IF EXISTS (select * from sys.objects where name = 'INSTR' and type = 'FN')
DROP FUNCTION [dbo].[INSTR]
GO
CREATE FUNCTION [dbo].[INSTR] (@String VARCHAR(8000), @SearchStr VARCHAR(255), @Start INT, @Occurrence INT)
RETURNS INT
AS
BEGIN
    DECLARE @Found INT = @Occurrence,
        @Position INT = @Start;
    WHILE 1=1
    BEGIN
        -- Find the next occurrence
        SET @Position = CHARINDEX(@SearchStr, @String, @Position);
        -- Nothing found
        IF @Position IS NULL OR @Position = 0
            RETURN @Position;
        -- The required occurrence found
        IF @Found = 1
            BREAK;
        -- Prepare to find another occurrence
        SET @Found = @Found - 1;
        SET @Position = @Position + 1;
    END
    RETURN @Position;
END
GO
--Assuming well-formed xml
DECLARE @XmlStringDocument varchar(max) = '<SomeTag Attrib1="5">
<SendDocument DocumentID="1234567">true</SendDocument>
<SendDocument DocumentID="1234568">true</SendDocument>
</SomeTag>'
--Split Lines on this element tag
DECLARE @SplitOn nvarchar(25) = '</SendDocument>'
--Let's hold all lines in Temp variable table
DECLARE @XmlStringLines TABLE
(
    Value nvarchar(100)
)
While (Charindex(@SplitOn,@XmlStringDocument)>0)
Begin
    Insert Into @XmlStringLines (value)
    Select
        Value = ltrim(rtrim(Substring(@XmlStringDocument,1,Charindex(@SplitOn,@XmlStringDocument)-1)))
    Set @XmlStringDocument = Substring(@XmlStringDocument,Charindex(@SplitOn,@XmlStringDocument)+len(@SplitOn),len(@XmlStringDocument))
End
Insert Into @XmlStringLines (Value)
Select Value = ltrim(rtrim(@XmlStringDocument))
--Now we have a table with multiple lines; find all Document IDs
SELECT
    StartPosition = CHARINDEX('DocumentID="',Value) + 12,
    --Now let's use the INSTR function to find the first instance of '">' after our search string
    EndPosition = dbo.INSTR(Value,'">',( CHARINDEX('DocumentID="',Value)) + 12,1),
    --Now that we know the start and end, let's use substring
    Value = SUBSTRING(value,(
        -- Start Position
        CHARINDEX('DocumentID="',Value)) + 12,
        --End Position Minus Start Position
        dbo.INSTR(Value,'">',( CHARINDEX('DocumentID="',Value)) + 12,1) - (CHARINDEX('DocumentID="',Value) + 12))
FROM
    @XmlStringLines
WHERE Value like '%DocumentID%' --Only care about lines with a document id
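If the string really is well-formed XML, the same list of IDs could probably be obtained without any string splitting by casting it to the xml type and using the nodes() and value() methods. This is only a sketch: the variable name @doc and the aliases t and n are chosen for the example, and the XPath assumes the IDs always sit in a DocumentID attribute of a SendDocument element.

```sql
DECLARE @doc XML = '<SomeTag Attrib1="5">
<SendDocument DocumentID="1234567">true</SendDocument>
<SendDocument DocumentID="1234568">true</SendDocument>
</SomeTag>';

-- One row per SendDocument element that carries a DocumentID attribute
SELECT n.value('@DocumentID', 'varchar(50)') AS DocumentID
FROM @doc.nodes('//SendDocument[@DocumentID]') AS t(n);
```

This returns one row per matching element (1234567 and 1234568 for the sample document), regardless of how many digits each ID has.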

Replace every alpha character with itself + wildcard in string SQL Server

My goal is to create a query that will search for results related to a specific keyword.
Say in a database we had the word cat.
Regardless of whether the user types C a t, C.A.T. or Cat, I want to find a result related to the search. As long as the alphanumeric characters are in the correct sequence, that is all that matters.
Say in the database we have these 4 records
cat
c/a/t
c.a.t
c. at
If the user types in C#$*(&A T I'd like to get all 4 results.
What I have written so far in my query is a function that strips any non-alphanumeric characters from the input string.
What can I do to replace each alphanumeric character with itself and add a wildcard at the end?
For every alpha character my input would look similar to this
C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%
Actually, that search string will return only one record from this table: the row with 'c.a.t '.
This is because the expression C%[^a-zA-Z0-9]%A does not mean there can't be any alpha-numeric chars between C and A.
What it actually means is there should be at least one non alpha-numeric value between C and A.
Moreover, it will return incorrect values as well - a value like 'c u a s e t ' will be returned.
You need to change your where clause to something like this:
WHERE column LIKE '%C%A%T%'
AND column NOT LIKE '%C%[a-zA-Z0-9]%A%[a-zA-Z0-9]%T%'
This way, if c, a, and t appear in the correct order, the first condition resolves to true, and if there are no other alphanumeric characters between c, a, and t, the second condition resolves to true.
Here is a test script, where you can see for yourself what I mean:
DECLARE @T AS TABLE
(
    a varchar(20)
)
INSERT INTO @T VALUES
('cat'),
('c/a/t'),
('c.a.t '),
('c. at'),
('c u a s e t ')
-- Incorrect where clause
SELECT *
FROM @T
WHERE a LIKE 'C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%'
-- correct where clause
SELECT *
FROM @T
WHERE a LIKE '%C%A%T%'
AND a NOT LIKE '%C%[a-zA-Z0-9]%A%[a-zA-Z0-9]%T%'
You can also see it in action in this link.
And since I had some spare time, here is a script to create both the like and the not like patterns from the input string:
DECLARE @Input varchar(100) = '#*# c %^&# a ^&*$&* t (*&(%!##$'
DECLARE @Index int = 1,
    @CurrentChar char(1),
    @Like varchar(100),
    @NotLike varchar(100) = '%'
-- <= so that the last character of the input is examined as well
WHILE @Index <= LEN(@Input)
BEGIN
    SET @CurrentChar = SUBSTRING(@Input, @Index, 1)
    -- Only alphanumeric characters become part of the pattern
    IF PATINDEX('%[^a-zA-Z0-9]%', @CurrentChar) = 0
    BEGIN
        SET @NotLike = @NotLike + @CurrentChar + '%[a-zA-Z0-9]%'
    END
    SET @Index = @Index + 1
END
-- Drop the trailing '[a-zA-Z0-9]%' (12 characters), then derive the plain LIKE pattern
SELECT @NotLike = LEFT(@NotLike, LEN(@NotLike) - 12),
    @Like = REPLACE(@NotLike, '%[a-zA-Z0-9]%', '%')
SELECT *
FROM @T
WHERE a LIKE @Like
AND a NOT LIKE @NotLike
You can loop through your (cleaned) search string and append the expression you would like to each letter. In my example, @builtString should be what you would like to use further on, if I understood correctly.
declare @cleanSearch as nvarchar(10) = 'CAT'
declare @builtString as nvarchar(100) = ''
WHILE LEN(@cleanSearch) > 0 -- loop until you deplete the search string
BEGIN
    SET @builtString = @builtString + substring(@cleanSearch,1,1) + '%[^a-zA-Z0-9]%' -- append the letter plus the LIKE pattern
    SET @cleanSearch = right(@cleanSearch, len(@cleanSearch) - 1) -- remove first letter of the search string
END
SELECT @builtString --will look like C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%
SELECT @cleanSearch --@cleanSearch is now empty

Other approach for handling this TSQL text manipulation

I have this following data:
0297144600-4799 0297485500-5599
Based on observation, 0297485500-5599 always starts at character position 31 from the left, so extracting it by position is easy.
But I would like to anticipate the case where the data looks like the line below, which means the fixed position is no longer valid:
0297144600-4799 0297485500-5599 0297485600-5699
As you can see, I guess the first approach would be to split by one blank space (" "), but since the number of spaces is unknown (it varies), how do I take this approach? Is there any method to find the spaces in between and shrink them into one blank space (" ")?
BTW, it needs to be done in T-SQL (MS SQL 2005), unfortunately, because it's for SSIS :(
I am open to your ideas/suggestions.
Thanks
I have updated my answer a bit, now that I know the number pattern will not always match. This code assumes the sequences will begin and end with a number and be separated by any number of spaces.
DECLARE @input nvarchar(max)
DECLARE @pattern nvarchar(max)
DECLARE @answer nvarchar(max)
DECLARE @pos int
SET @input = ' 0297144623423400-4799 5615618131201561561 0297485600-5699 '
-- Make sure our search string has whitespace at the end for our pattern to match
SET @input = @input + ' '
-- Find anything that starts and ends with a number
WHILE PATINDEX('%[0-9]%[0-9] %', @input) > 0
BEGIN
    -- Trim off the leading whitespace
    SET @input = LTRIM(@input)
    -- Find the end of the sequence by finding a space
    SET @pos = PATINDEX('% %', @input)
    -- Get the result out now that we know where it is
    SET @answer = SUBSTRING(@input, 0, @pos)
    SELECT [Result] = @answer
    -- Remove the result off the front of the string so we can continue parsing
    SET @input = SUBSTRING(@input, LEN(@answer) + 1, 8096)
END
Assuming you're processing one line at a time, you can also try this:
DECLARE @InputString nvarchar(max)
SET @InputString = '0297144600-4799      0297485500-5599   0297485600-5699'
BEGIN
    WHILE CHARINDEX('  ',@InputString) > 0 -- Checking for double spaces
        SET @InputString =
            REPLACE(@InputString,'  ',' ') -- Replace 2 spaces with 1 space
END
PRINT @InputString
(taken directly from SQLUSA, fnRemoveMultipleSpaces1)
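A loop-free variant of the same idea is the nested REPLACE trick, sketched below. It assumes the characters '<' and '>' never occur in the data, since they are used as temporary markers. Every space becomes '<>', each '><' pair that forms between neighbouring markers is deleted, and the single '<>' left per run is turned back into one space. DECLARE and SET are kept separate so that it also runs on SQL Server 2005.

```sql
DECLARE @InputString NVARCHAR(MAX);
SET @InputString = N'0297144600-4799      0297485500-5599   0297485600-5699';

-- 'a   b' -> 'a<><><>b' -> 'a<>b' -> 'a b'
SELECT REPLACE(
           REPLACE(
               REPLACE(@InputString, N' ', N'<>'),
               N'><', N''),
           N'<>', N' ') AS Collapsed;
```

Because it is a single expression, it can be applied directly to a column in a SELECT, which may be handier in an SSIS source query than a WHILE loop.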