Remove all characters not like desired value - sql

I'm using SQL Server 2012 and need to extract a specific value from a string column, discarding everything else in the string.
Sample values are SS44\\230433\586, 230084android, and orderno 239578.
The common denominator is that every number I want starts with 23 and is 6 characters long. All other characters have to be removed from the string. I tried RTRIM and LTRIM, but they didn't give me the desired output.
I'm not sure how to do this without regex.

You can use PATINDEX to find the start of the number and SUBSTRING to get the next 6 characters:
declare @Value varchar(50) = 'SS44\\230433\586'
select substring(@Value, patindex('%23%', @Value), 6)
If you want to be a bit more careful with the search, you can have PATINDEX also check that the 4 characters after 23 are digits:
patindex('%23[0-9][0-9][0-9][0-9]%', @Value)
Finally, you can store the returned position and check whether there was a match:
declare @Value varchar(50) = 'SS44\\230433\586'
declare @StartIndex int
set @StartIndex = patindex('%23[0-9][0-9][0-9][0-9]%', @Value)
select IIF(@StartIndex > 0, substring(@Value, @StartIndex, 6), null)
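To apply the same idea to a whole column rather than a single variable, something like the following sketch could work. The table variable @Samples, its column RawValue, and the extra rows 'AB23X230433' and 'no match here' are made up for illustration; the other three values are the ones from the question.

```sql
-- Hypothetical sample data standing in for the real column
DECLARE @Samples TABLE (RawValue varchar(50));
INSERT INTO @Samples (RawValue)
VALUES ('SS44\\230433\586'), ('230084android'), ('orderno 239578'),
       ('AB23X230433'), ('no match here');

SELECT s.RawValue,
       -- Loose pattern: first "23" anywhere, even if no digits follow it
       SUBSTRING(s.RawValue, NULLIF(PATINDEX('%23%', s.RawValue), 0), 6) AS LooseMatch,
       -- Strict pattern: "23" followed by four digits; NULL when nothing matches
       CASE WHEN p.StartIndex > 0
            THEN SUBSTRING(s.RawValue, p.StartIndex, 6)
       END AS StrictMatch
FROM @Samples AS s
CROSS APPLY (SELECT PATINDEX('%23[0-9][0-9][0-9][0-9]%', s.RawValue) AS StartIndex) AS p;
-- StrictMatch: 230433, 230084, 239578, 230433, NULL
-- LooseMatch for 'AB23X230433' is '23X230', which is why the stricter pattern helps
```

The CROSS APPLY is only there so the PATINDEX expression is written once; repeating it inline would give the same result.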

Related

Joining on numeric part of string

It's been a while...I'd like to get your advice on the most efficient way to join on only the number part of a field that may be prefixed and/or suffixed with up to 2 letters. Here's a simplified snippet of what I'm trying to do:
SELECT a, b, c
FROM table1 t1
LEFT JOIN table2 t2 ON t1.PolicyCode = t2.sPolicyID
Here t2.sPolicyID could begin and/or end with up to 2 letters. Some examples: TG73100, S7286674, 2344506R, etc. We only want to join on the numeric part in between the letters, i.e. 73100, 7286674 or 2344506 from the examples.
Could someone please advise on a simple way of doing this?
Here is one way:
LEFT JOIN table2 t2 ON t1.PolicyCode =
LEFT(SUBSTRING(t2.sPolicyID, PATINDEX('%[0-9]%', t2.sPolicyID), 50),
PATINDEX('%[^0-9]%',
SUBSTRING(t2.sPolicyID, PATINDEX('%[0-9]%', t2.sPolicyID), 50)
+ 'a') -1)
To break this down, there are 4 main parts.
1: Find the position of the first number with PATINDEX:
DECLARE @spolicyID VARCHAR(20) = 'xx123123xx'
SELECT PATINDEX('%[0-9]%', @spolicyID)
--Returns 3
2: Use SUBSTRING() to cut off everything before the first number:
DECLARE @spolicyID VARCHAR(20) = 'xx123123xx'
SELECT SUBSTRING(@spolicyID, PATINDEX('%[0-9]%', @spolicyID), 50)
--Returns 123123xx
If we hardcoded the 3 that we know is returned from the first part, it would look like this:
DECLARE @spolicyID VARCHAR(20) = 'xx123123xx'
SELECT SUBSTRING(@spolicyID, 3, 50)
--50 is the number of characters to extract, set to something
--higher than the max string length to be safe
Of course, we don't want to hardcode it since it can change, but that makes seeing the different functions a bit easier.
3: Find the position of the next letter using PATINDEX again:
DECLARE @spolicyID VARCHAR(20) = 'xx123123xx'
SELECT PATINDEX('%[^0-9]%', SUBSTRING(@spolicyID, PATINDEX('%[0-9]%', @spolicyID), 50) + 'a')
--Returns 7 since it is looking at 123123xx
--The first x is in the 7th position
Note that we added an a onto the string. This is because if the string had no letters at the end, PATINDEX would return 0 and LEFT would be given a length of -1, which throws an error. You could add any letter or letters to the end and it would work, we are just making sure there is at least one. Try removing the + 'a' and using a string like xx123123 to see the error.
If we hardcoded the 123123xx from step 2 it would look like this (again just for easy example):
DECLARE @spolicyID VARCHAR(20) = 'xx123123xx'
SELECT PATINDEX('%[^0-9]%', '123123xx' + 'a')
4: Use LEFT() to return everything before the trailing letters, leaving us with only the numbers in between:
DECLARE @spolicyID VARCHAR(20) = 'xx123123xx'
SELECT LEFT(SUBSTRING(@spolicyID, PATINDEX('%[0-9]%', @spolicyID), 50),PATINDEX('%[^0-9]%', SUBSTRING(@spolicyID, PATINDEX('%[0-9]%', @spolicyID), 50) + 'a') -1)
--Need to add `-1` because step 3 PATINDEX returns 7
--as the position of first trailing letter, and
--we want the 6 characters before that
And again hardcoded from step 2 and 3 for easy viewing:
DECLARE @spolicyID VARCHAR(20) = 'xx123123xx'
SELECT LEFT('123123xx', 7-1)
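The four steps can also be written so that the "tail" from step 2 is computed once and reused, which may be easier to read than the nested form. This is only a sketch: the table variable @Policies and the aliases Tail and NumericPart are invented for the example, and the sample IDs are the ones mentioned in the question plus one with no trailing letters.

```sql
-- Hypothetical stand-in for the real table with sPolicyID values
DECLARE @Policies TABLE (sPolicyID varchar(20));
INSERT INTO @Policies (sPolicyID)
VALUES ('TG73100'), ('S7286674'), ('2344506R'), ('xx123123');

SELECT p.sPolicyID,
       -- Steps 3 and 4: position of the first non-digit in the tail, minus 1
       LEFT(t.Tail, PATINDEX('%[^0-9]%', t.Tail + 'a') - 1) AS NumericPart
FROM @Policies AS p
-- Steps 1 and 2: everything from the first digit onwards
CROSS APPLY (SELECT SUBSTRING(p.sPolicyID,
                              PATINDEX('%[0-9]%', p.sPolicyID), 50) AS Tail) AS t;
-- NumericPart: 73100, 7286674, 2344506, 123123
```

The NumericPart expression could then be used on the right-hand side of the join condition. Note that a join on a computed expression like this generally cannot use an index on sPolicyID.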

Remove characters before and after string SQL

I would like to remove all characters before and after a string in the select statement.
In the example below I would like to remove everything before and including /Supply> and after and including >/
Note the remaining part will be a fixed number of characters.
Any help would be much appreciated
Eg.
abs/Supply>hhfhjgglldppprrr>/llllllldsfsjhfhhhfdhudfhfhdhdfhfsd
Would become:
hhfhjgglldppprrr
If your input always has exactly two instances of ">" you could use PARSENAME.
declare @SomeValue varchar(100) = 'abs/Supply>hhfhjgglldppprrr>/llllllldsfsjhfhhhfdhudfhfhdhdfhfsd'
select PARSENAME(replace(@SomeValue, '>', '.'), 2)
This will not work correctly if your data also contains any periods (.). We can deal with that if needed with a couple of replace statements. Still very simple and easy to maintain with the same caveat of exactly 2 >.
declare @SomeOtherValue varchar(100) = 'abs/Supply>hhfhjgg.lldppprrr>/llllllldsfsjhfhhhfdhudfhfhdhdfhfsd'
select replace(PARSENAME(replace(replace(@SomeOtherValue, '.', '~!##'), '>', '.'), 2), '~!##', '.')
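To see why the "exactly two >" caveat matters, here is a small sketch with made-up inputs. PARSENAME counts parts from the right and only understands names of up to four parts, so extra delimiters either shift which piece is returned or produce NULL.

```sql
-- Two delimiters: three parts, piece 2 is the middle one
SELECT PARSENAME(REPLACE('a>b>c', '>', '.'), 2) AS TwoDelims;       -- b

-- Three delimiters: four parts, piece 2 is now the third from the left
SELECT PARSENAME(REPLACE('a>b>c>d', '>', '.'), 2) AS ThreeDelims;   -- c

-- Four delimiters: five parts is not a valid object name, so the result is NULL
SELECT PARSENAME(REPLACE('a>b>c>d>e', '>', '.'), 2) AS FourDelims;  -- NULL
```

PARSENAME is designed for object names, so it is also worth checking its documented length limits against your data before relying on it for long strings.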
You can use PATINDEX() to identify the position of the patterns you are looking for (/Supply> and >/) then remove them based on the length of the string:
SELECT LEFT(RIGHT(col,LEN(col) - PATINDEX('%/Supply>%',col) -7), PATINDEX('%>/%', RIGHT(col,LEN(col) - PATINDEX('%/Supply>%',col) -7))-1)
Simply replace col in the above with your column name.
Example below with test string abs/Supply>keep>/remove
First remove everything before and including /Supply>:
SELECT RIGHT('abs/Supply>keep>/remove',LEN('abs/Supply>keep>/remove') - PATINDEX('%/Supply>%','abs/Supply>keep>/remove') -7)
This will give keep>/remove
Then remove everything after and including >/:
SELECT LEFT('keep>/remove',PATINDEX('%>/%','keep>/remove') - 1)
This will give keep, the part of the string you want.
Here is the combined version, same as above, just includes the test string instead of col so you can run it easily:
SELECT LEFT(RIGHT('abs/Supply>keep>/remove',LEN('abs/Supply>keep>/remove') - PATINDEX('%/Supply>%','abs/Supply>keep>/remove') -7), PATINDEX('%>/%', RIGHT('abs/Supply>keep>/remove',LEN('abs/Supply>keep>/remove') - PATINDEX('%/Supply>%','abs/Supply>keep>/remove') -7))-1)
This will give keep. You can also replace the string above with the one in your question, I just used a different test string because it is shorter and makes the code more readable.
Try this:
DECLARE @inputStr VARCHAR(max)= 'abs/Supply>hhfhjgglldppprrr>/llllllldsfsjhfhhhfdhudfhfhdhdfhfsd'
DECLARE @startString VARCHAR(100)='/Supply>'
DECLARE @EndString VARCHAR(100)='>/'
DECLARE @LenStartString INT = LEN(@startString)
DECLARE @TempInputString VARCHAR(max)='';
DECLARE @StartIndex INT
DECLARE @EndIndex INT
-- Number of leading characters to strip: everything up to and including the start string
SELECT @StartIndex = CHARINDEX(@startString,@inputStr)+@LenStartString-1
SELECT @TempInputString = STUFF(@inputStr, 1, @StartIndex, '')
SELECT SUBSTRING(@TempInputString,0,CHARINDEX(@EndString,@TempInputString))
In a single line:
DECLARE @inputStr VARCHAR(max)= 'abs/Supply>hhfhjgglldppprrr>/llllllldsfsjhfhhhfdhudfhfhdhdfhfsd'
DECLARE @startString VARCHAR(100)='/Supply>'
DECLARE @EndString VARCHAR(100)='>/'
SELECT SUBSTRING(STUFF(@inputStr, 1, CHARINDEX(@startString,@inputStr)+LEN(@startString)-1, ''),0,CHARINDEX(@EndString,STUFF(@inputStr, 1,CHARINDEX(@startString,@inputStr)+LEN(@startString)-1, '')))
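For a column rather than a variable, a CHARINDEX-based sketch along the same lines might look like this. The table variable @Rows and the aliases StartPos, EndPos and Extracted are hypothetical; the third row is added only to show what happens when the markers are missing.

```sql
-- Hypothetical sample rows; col stands in for the real column
DECLARE @Rows TABLE (col varchar(200));
INSERT INTO @Rows (col)
VALUES ('abs/Supply>hhfhjgglldppprrr>/llllllldsfsjhfhhhfdhudfhfhdhdfhfsd'),
       ('abs/Supply>keep>/remove'),
       ('no markers here');

SELECT r.col,
       SUBSTRING(r.col, s.StartPos, e.EndPos - s.StartPos) AS Extracted
FROM @Rows AS r
-- First character after '/Supply>'; NULL when the start marker is absent
CROSS APPLY (SELECT NULLIF(CHARINDEX('/Supply>', r.col), 0)
                    + LEN('/Supply>') AS StartPos) AS s
-- Position of '>/' at or after StartPos; NULL when the end marker is absent
CROSS APPLY (SELECT NULLIF(CHARINDEX('>/', r.col, COALESCE(s.StartPos, 1)), 0)
                    AS EndPos) AS e;
-- Extracted: hhfhjgglldppprrr, keep, NULL
```

Turning "not found" into NULL with NULLIF means rows without both markers come back as NULL instead of raising an invalid-length error.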

Extract number between two substrings in sql

I had a previous question and it got me started but now I'm needing help completing this. Previous question = How to search a string and return only numeric value?
Basically I have a table with one of the columns containing a very long XML string. There's a number I want to extract near the end. A sample of the number would be this...
<SendDocument DocumentID="1234567">true</SendDocument>
So I want to use substrings to find the text before the number and the text after it (>true</SendDocument>) so that I'm left with only the number.
What I've tried so far is this:
SELECT SUBSTRING(xml_column, CHARINDEX('>true</SendDocument>', xml_column) - CHARINDEX('<SendDocument',xml_column) +10087,9)
The above gives me results, but it's far from being correct. My concern is: what if the number grows from 7 digits to 8 digits, or 9 or 10?
In the previous question I was helped with this:
SELECT SUBSTRING(cip_msg, CHARINDEX('<SendDocument',cip_msg)+26,7)
and that's how I got started, but I wanted to alter it so that I could subtract the last portion and be left with just the numbers.
So again, first part of the string that contains the digits, find the two substrings around the digits and remove them and retrieve just the digits no matter the length.
Thank you all
You should be able to set up your SUBSTRING() so that both the starting position and the length are variable. That way the length of the number itself doesn't matter.
From the sound of it, the starting position you want is right after <SendDocument DocumentID="
The starting position would be:
CHARINDEX('<SendDocument DocumentID="', xml_column) + 26
(adding 26 because CHARINDEX gives you the position of the beginning of the string you are searching for, and that search string is 26 characters long)
Length would be:
CHARINDEX('">true</SendDocument>', xml_column) - (CHARINDEX('<SendDocument DocumentID="', xml_column) + 26)
(position of the ending text minus the starting position)
So, how about something along the lines of:
SELECT SUBSTRING(xml_column, CHARINDEX('<SendDocument DocumentID="', xml_column) + 26, CHARINDEX('">true</SendDocument>', xml_column) - (CHARINDEX('<SendDocument DocumentID="', xml_column) + 26))
Have you tried working directly with the xml type? Like below:
DECLARE @TempXmlTable TABLE
(XmlElement xml )
INSERT INTO @TempXmlTable
select Convert(xml,'<SendDocument DocumentID="1234567">true</SendDocument>')
SELECT
element.value('./@DocumentID', 'varchar(50)') as DocumentID
FROM
@TempXmlTable CROSS APPLY
XmlElement.nodes('//.') AS DocumentID(element)
WHERE element.value('./@DocumentID', 'varchar(50)') is not null
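If the XML can be held in an xml variable or column, a narrower path that targets only the SendDocument elements avoids the IS NOT NULL filter. This is a sketch with a made-up two-element document; the variable name @Doc and the aliases d and n are arbitrary.

```sql
-- Hypothetical document with two SendDocument elements
DECLARE @Doc xml = N'<SomeTag Attrib1="5">
  <SendDocument DocumentID="1234567">true</SendDocument>
  <SendDocument DocumentID="1234568">true</SendDocument>
</SomeTag>';

-- One row per SendDocument element, reading its DocumentID attribute
SELECT d.n.value('@DocumentID', 'varchar(50)') AS DocumentID
FROM @Doc.nodes('//SendDocument') AS d(n);
-- Returns two rows: 1234567 and 1234568
```

Because the value is read as an attribute rather than by character position, it does not matter whether the number has 7 digits or 10.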
If you just want to work with this as a string you can do the following:
DECLARE @SearchString varchar(max) = '<SendDocument DocumentID="1234567">true</SendDocument>'
DECLARE @Start int = (select CHARINDEX('DocumentID="',@SearchString)) + 12 -- 12 Character search pattern
DECLARE @End int = (select CHARINDEX('">', @SearchString)) - @Start --Find End Characters and subtract start position
SELECT SUBSTRING(@SearchString,@Start,@End)
Below is the extended version of parsing an XML document string. In the example below, I create a copy of a PLSQL function called INSTR, the MS SQL database does not have this by default. The function will allow me to search strings at a designated starting position. In addition, I'm parsing a sample XML string into a variable temp table into lines and only looking at lines that match my search criteria. This is because there may be many elements with the words DocumentID and I'll want to find all of them. See below:
IF EXISTS (select * from sys.objects where name = 'INSTR' and type = 'FN')
DROP FUNCTION [dbo].[INSTR]
GO
CREATE FUNCTION [dbo].[INSTR] (@String VARCHAR(8000), @SearchStr VARCHAR(255), @Start INT, @Occurrence INT)
RETURNS INT
AS
BEGIN
DECLARE @Found INT = @Occurrence,
@Position INT = @Start;
WHILE 1=1
BEGIN
-- Find the next occurrence
SET @Position = CHARINDEX(@SearchStr, @String, @Position);
-- Nothing found
IF @Position IS NULL OR @Position = 0
RETURN @Position;
-- The required occurrence found
IF @Found = 1
BREAK;
-- Prepare to find another occurrence
SET @Found = @Found - 1;
SET @Position = @Position + 1;
END
RETURN @Position;
END
GO
--Assuming well-formed XML
DECLARE @XmlStringDocument varchar(max) = '<SomeTag Attrib1="5">
<SendDocument DocumentID="1234567">true</SendDocument>
<SendDocument DocumentID="1234568">true</SendDocument>
</SomeTag>'
--Split Lines on this element tag
DECLARE @SplitOn nvarchar(25) = '</SendDocument>'
--Let's hold all lines in Temp variable table
DECLARE @XmlStringLines TABLE
(
Value nvarchar(100)
)
While (Charindex(@SplitOn,@XmlStringDocument)>0)
Begin
Insert Into @XmlStringLines (value)
Select
Value = ltrim(rtrim(Substring(@XmlStringDocument,1,Charindex(@SplitOn,@XmlStringDocument)-1)))
Set @XmlStringDocument = Substring(@XmlStringDocument,Charindex(@SplitOn,@XmlStringDocument)+len(@SplitOn),len(@XmlStringDocument))
End
Insert Into @XmlStringLines (Value)
Select Value = ltrim(rtrim(@XmlStringDocument))
--Now we have a table with multiple lines, find all Document IDs
SELECT
StartPosition = CHARINDEX('DocumentID="',Value) + 12,
--Now lets use the INSTR function to find the first instance of '">' after our search string
EndPosition = dbo.INSTR(Value,'">',( CHARINDEX('DocumentID="',Value)) + 12,1),
--Now that we know the start and end lets use substring
Value = SUBSTRING(value,(
-- Start Position
CHARINDEX('DocumentID="',Value)) + 12,
--End Position Minus Start Position
dbo.INSTR(Value,'">',( CHARINDEX('DocumentID="',Value)) + 12,1) - (CHARINDEX('DocumentID="',Value) + 12))
FROM
@XmlStringLines
WHERE Value like '%DocumentID%' --Only care about lines with a document id

Simple Explanation for PATINDEX

I have been reading up on PATINDEX, attempting to understand the what and the why. I understand that when using the wildcards it will return an INT indicating where the character(s) appear/start. So:
SELECT PATINDEX('%b%', '123b') -- returns 4
However I am looking to see if someone can explain the reason as to why you would use this in a simple(ish) way. I have read some other forums but it just is not sinking in to be honest.
Are you asking for realistic use-cases? I can think of two, real-life use-cases that I've had at work where PATINDEX() was my best option.
I had to import a text-file and parse it for INSERT INTO later on. But these files sometimes had numbers in this format: 00000-59. If you try CAST('00000-59' AS INT) you'll get an error. So I needed code that would parse 00000-59 to -59 but also 00000159 to 159 etc. The - could be anywhere, or it could simply not be there at all. This is what I did:
DECLARE @my_var VARCHAR(255) = '00000-59', @my_int INT
SET @my_var = STUFF(@my_var, 1, PATINDEX('%[^0]%', @my_var)-1, '')
SET @my_int = CAST(@my_var AS INT)
[^0] in this case means "any character that isn't a 0". So PATINDEX() tells me when the 0's end, regardless of whether that's because of a - or a number.
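As a quick check of that behaviour, the same expression can be run against a few inputs at once. The list of values below is illustrative; only 00000-59 and 00000159 come from the description above, and the aliases v, Raw and Parsed are arbitrary.

```sql
SELECT v.Raw,
       -- Strip everything before the first character that is not a 0, then cast
       CAST(STUFF(v.Raw, 1, PATINDEX('%[^0]%', v.Raw) - 1, '') AS int) AS Parsed
FROM (VALUES ('00000-59'), ('00000159'), ('42')) AS v(Raw);
-- Parsed: -59, 159, 42
```

One edge case to be aware of: a value made up only of zeros has no non-zero character, so PATINDEX returns 0 and STUFF is given a length of -1. According to the STUFF documentation that yields NULL rather than 0, so such values may need separate handling.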
The second use-case I've had was checking whether an IBAN number was correct. In order to do that, any letters in the IBAN need to be changed to a corresponding number (A=10, B=11, etc...). I did something like this (incomplete but you get the idea):
SET @i = PATINDEX('%[^0-9]%', @IBAN)
WHILE @i <> 0 BEGIN
SET @num = UNICODE(SUBSTRING(@IBAN, @i, 1))-55
SET @IBAN = STUFF(@IBAN, @i, 1, CAST(@num AS VARCHAR(2)))
SET @i = PATINDEX('%[^0-9]%', @IBAN)
END
So again, I'm not concerned with finding exactly the letter A or B etc. I'm just finding anything that isn't a number and converting it.
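A self-contained version of that loop, with the declarations filled in, could look like the sketch below. The input 'GB82WEST' is just a short made-up fragment, and the code assumes the string contains only digits and uppercase A-Z (hence the UPPER call); any other character would not map to a valid two-digit number. It covers only the letter-to-number step, not a full IBAN check.

```sql
DECLARE @IBAN varchar(100) = UPPER('GB82WEST');  -- hypothetical fragment, not a real IBAN
DECLARE @i int = PATINDEX('%[^0-9]%', @IBAN);
DECLARE @num int;

WHILE @i <> 0
BEGIN
    -- 'A' is code 65, so subtracting 55 maps A=10, B=11, ..., Z=35
    SET @num = UNICODE(SUBSTRING(@IBAN, @i, 1)) - 55;
    -- Replace the single letter with its two-digit number
    SET @IBAN = STUFF(@IBAN, @i, 1, CAST(@num AS varchar(2)));
    -- Look for the next non-digit
    SET @i = PATINDEX('%[^0-9]%', @IBAN);
END

SELECT @IBAN AS Converted;  -- 16118232142829
```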
PATINDEX is roughly equivalent to CHARINDEX except that it returns the position of a pattern instead of a literal substring. Examples:
Check if a string contains at least one digit:
SELECT PATINDEX('%[0-9]%', 'Hello') -- 0
SELECT PATINDEX('%[0-9]%', 'H3110') -- 2
Extract numeric portion from a string:
SELECT SUBSTRING('12345', PATINDEX('%[0-9]%', '12345'), 100) -- 12345
SELECT SUBSTRING('x2345', PATINDEX('%[0-9]%', 'x2345'), 100) -- 2345
SELECT SUBSTRING('xx345', PATINDEX('%[0-9]%', 'xx345'), 100) -- 345
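The difference from CHARINDEX shows up as soon as the search term contains a character class. A small side-by-side sketch (the column aliases are arbitrary):

```sql
SELECT CHARINDEX('b', '123b')        AS CharIdx,        -- 4
       PATINDEX('%b%', '123b')       AS PatIdx,         -- 4
       -- CHARINDEX treats the brackets as literal text, so nothing is found
       CHARINDEX('[0-9]', 'H3110')   AS LiteralSearch,  -- 0
       -- PATINDEX treats [0-9] as "any digit"
       PATINDEX('%[0-9]%', 'H3110')  AS PatternSearch;  -- 2
```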
Quoted from PATINDEX (Transact-SQL)
The following example uses % and _ wildcards to find the position at
which the pattern 'en', followed by any one character and 'ure' starts
in the specified string (index starts at 1):
SELECT PATINDEX('%en_ure%', 'please ensure the door is locked');
Here is the result set.
8
You'd use the PATINDEX function when you want to know at which character position a pattern begins in an expression of a valid text or character data type.

Split string won't work when passing a single item unless it has a comma after it. Why?

I have a split string function like so:
ALTER Function [dbo].[fnParmSplitter]
(@Parm Varchar(100) )
Returns @tblSplit Table (Parm Int)
AS
BEGIN
-- Append comma
SET @Parm = @Parm + ','
-- Indexes to keep position
Declare @pos1 int
Declare @pos2 int
-- Start from first character
SET @pos1 = 1
SET @pos2 = 1
WHILE @pos1 < LEN(@Parm)
BEGIN
SET @pos1 = CHARINDEX(',', @Parm, @pos1)
INSERT @tblSplit SELECT CAST(Substring(@Parm, @pos2, @pos1-@pos2) AS int)
-- Go to next non-comma character
SET @pos2 = @pos1 + 1
-- Search from the next character
SET @pos1 = @pos1 + 1
END
RETURN
END
It works if I send it a string of items like this: 1,2,3 but if I try to send it a single item (3) it doesn't work unless I put in a comma after 3 (3,). I cannot figure out why, though I'm sure it's staring me in the face.
Looks like you need your insert to read:
INSERT @tblSplit SELECT CAST(Substring(@Parm, @pos2, @pos1-@pos2) AS int)
The last parameter is the length so cannot be negative. This gives for:
select * from [dbo].[fnParmSplitter]('1')
select * from [dbo].[fnParmSplitter]('1,2')
select * from [dbo].[fnParmSplitter]('1,')
The response:
Parm
-----------
1
(1 row(s) affected)
Parm
-----------
1
2
(2 row(s) affected)
Parm
-----------
1
(1 row(s) affected)
There is a typo in the substring statement:
substring(@Parm, @pos2, -@pos2)
should read
substring(@Parm, @pos2, @pos1 - @pos2)
Example SQLFiddle
However, what you have in the question wouldn't work for any input. So I suspect there is something else going on.
The line causing your problem is:
INSERT @tblSplit SELECT CAST(Substring(@Parm, @pos2, -@pos2) AS int)
Replace the above line with:
INSERT @tblSplit SELECT CAST(Substring(@Parm, @pos2, @pos1 - @pos2) AS int)
EDIT: I misunderstood the original question. I thought the OP wanted to know why the line about the comma was necessary to the function, rather than why some strings being passed in weren't working. Planning to delete once I know the conversation in the comments is over.
As part of your function, when you pass in a string with no commas you are effectively trying to run:
SELECT SUBSTRING(@Parm, 1, 0-1)
That passes a negative length to SUBSTRING(), and it errors out.
Your @pos1 and @pos2 variables both start out with a positive value, but the first line of your WHILE loop is setting @pos1 to zero (the CHARINDEX() function returned 0 because there was no comma in your search string).
DECLARE @Parm VARCHAR(100)
SET @Parm = 'value'
SELECT CHARINDEX(',', @Parm, 1) -- returns zero, indicating no comma found
SET @Parm = @Parm + ','
SELECT CHARINDEX(',', @Parm, 1) -- returns 6, the position of a comma in the string.
You then subtract @pos2 (with a value of 1) from @pos1 (with a value of 0), and ask the SUBSTRING() function to return -1 characters from @Parm starting with the first character. This isn't possible, and you receive an error message.
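One way to make the loop robust against a CHARINDEX result of 0 is to drive it directly off that result instead of the string length. The batch below is a sketch of that idea rather than a drop-in replacement for the function: it uses a table variable named @Result (a made-up name) in place of the function's return table, and it skips empty items, which is an assumption about the desired behaviour.

```sql
DECLARE @Parm varchar(100) = '3';        -- try '1,2,3' or '3,' as well
DECLARE @Result TABLE (Parm int);
DECLARE @pos1 int, @pos2 int = 1;

-- Append a comma so the last item is terminated like the others
SET @Parm = @Parm + ',';
SET @pos1 = CHARINDEX(',', @Parm, @pos2);

-- Loop only while another comma was actually found
WHILE @pos1 > 0
BEGIN
    -- Skip empty items such as the one produced by a trailing comma
    IF @pos1 > @pos2
        INSERT INTO @Result (Parm)
        SELECT CAST(SUBSTRING(@Parm, @pos2, @pos1 - @pos2) AS int);

    -- Continue searching after the comma just handled
    SET @pos2 = @pos1 + 1;
    SET @pos1 = CHARINDEX(',', @Parm, @pos2);
END

SELECT Parm FROM @Result;  -- one row: 3
```

Because the length passed to SUBSTRING is only computed when @pos1 is greater than @pos2, it can never be negative.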