A SQL Query to select a string between two known strings - sql

I need a SQL query to get the value between two known strings (the returned value should start and end with these two strings).
An example.
"All I knew was that the dog had been very bad and required harsh punishment immediately regardless of what anyone else thought."
In this case the known strings are "the dog" and "immediately". So my query should return "the dog had been very bad and required harsh punishment immediately"
I've come up with this so far but to no avail:
SELECT SUBSTRING(@Text, CHARINDEX('the dog', @Text), CHARINDEX('immediately', @Text))
@Text being the variable containing the main string.
Can someone please help me with where I'm going wrong?

The problem is that the third argument of your SUBSTRING is a length, not an end position, so passing the second index directly also counts everything before the first index.
You need to subtract the first index from your second index, and then add the length of the ending string so that it is included, to make this work.
SELECT SUBSTRING(@Text, CHARINDEX('the dog', @Text)
, CHARINDEX('immediately', @Text) - CHARINDEX('the dog', @Text) + LEN('immediately'))

I think what Evan meant was this:
SELECT SUBSTRING(@Text, CHARINDEX(@First, @Text) + LEN(@First),
CHARINDEX(@Second, @Text) - CHARINDEX(@First, @Text) - LEN(@First))
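As a hedged sketch (variable names are illustrative; SQL Server 2008 or later is assumed for the inline DECLARE initialisers), both variants above can be computed from the same two positions. Here the end marker is searched for only after the start marker, and NULL is returned instead of an "Invalid length parameter" error when either marker is missing:

```sql
-- Sketch: text between two markers, with (Inclusive) and without (Exclusive) the markers.
DECLARE @Text   varchar(max) = 'All I knew was that the dog had been very bad and required harsh punishment immediately regardless of what anyone else thought.';
DECLARE @First  varchar(100) = 'the dog';
DECLARE @Second varchar(100) = 'immediately';

-- Position of the start marker (0 if it is missing).
DECLARE @StartPos int = CHARINDEX(@First, @Text);

-- Look for the end marker only after the start marker (third CHARINDEX argument),
-- so an earlier occurrence of @Second cannot produce a negative length.
DECLARE @EndPos int = CASE WHEN @StartPos > 0
                           THEN CHARINDEX(@Second, @Text, @StartPos + LEN(@First))
                           ELSE 0 END;

DECLARE @Inclusive varchar(max), @Exclusive varchar(max);

IF @StartPos > 0 AND @EndPos > 0
BEGIN
    -- SUBSTRING's third argument is a length: end position - start position (+/- marker lengths).
    SET @Inclusive = SUBSTRING(@Text, @StartPos, @EndPos - @StartPos + LEN(@Second));
    SET @Exclusive = SUBSTRING(@Text, @StartPos + LEN(@First), @EndPos - @StartPos - LEN(@First));
END;

-- Both columns stay NULL when a marker is missing.
SELECT Inclusive = @Inclusive, Exclusive = @Exclusive;
```

Because LEN ignores trailing spaces, markers that end in a space would need DATALENGTH instead.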

For example, suppose you have a string delimited by the character $:
String :
aaaaa$bbbbb$ccccc
Code:
SELECT SUBSTRING('aaaaa$bbbbb$ccccc',CHARINDEX('$','aaaaa$bbbbb$ccccc')+1, CHARINDEX('$','aaaaa$bbbbb$ccccc',CHARINDEX('$','aaaaa$bbbbb$ccccc')+1) -CHARINDEX('$','aaaaa$bbbbb$ccccc')-1) as My_String
Output:
bbbbb

You need to adjust the LENGTH argument of the SUBSTRING: you were passing the position of the ending string instead of the number of characters to return.
Try something like this:
declare @TEXT varchar(200)
declare @ST varchar(200)
declare @EN varchar(200)
set @ST = 'the dog'
set @EN = 'immediately'
set @TEXT = 'All I knew was that the dog had been very bad and required harsh punishment immediately regardless of what anyone else thought.'
SELECT SUBSTRING(@TEXT, CHARINDEX(@ST, @TEXT), (CHARINDEX(@EN, @TEXT)+LEN(@EN))-CHARINDEX(@ST, @TEXT))
Of course, you may need to adjust it a bit.

I had a similar need to parse out a set of parameters stored within an IIS logs' csUriQuery field, which looked like this: id=3598308&user=AD\user&parameter=1&listing=No needed in this format.
I ended up creating a user-defined function that returns the string between two markers, with the following assumptions:
If the starting occurrence is not found, a NULL is returned, and
If the ending occurrence is not found, the rest of the string is returned
Here's the code:
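CREATE FUNCTION dbo.str_between(@col varchar(max), @start varchar(50), @end varchar(50))
RETURNS varchar(max)
WITH EXECUTE AS CALLER
AS
BEGIN
-- stuff(@col, 1, charindex(@start, @col)-1, '') drops everything before @start; it yields NULL
-- when @start is missing (negative length), which makes the whole result NULL.
-- If @end is missing, nullif/isnull fall back to the length of the rest of the string.
RETURN substring(@col, charindex(@start, @col) + len(@start),
isnull(nullif(charindex(@end, stuff(@col, 1, charindex(@start, @col)-1, '')),0),
len(stuff(@col, 1, charindex(@start, @col)-1, ''))+1) - len(@start)-1);
END;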
GO
For the above question, the usage is as follows:
DECLARE @a VARCHAR(MAX) = 'All I knew was that the dog had been very bad and required harsh punishment immediately regardless of what anyone else thought.'
SELECT dbo.str_between(@a, 'the dog', 'immediately')
-- Yields ' had been very bad and required harsh punishment '
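The same two assumptions can also be written inline, which lets the logic run over a whole column without calling a scalar function per row. This is a minimal sketch, not the author's function: @Log is a hypothetical table variable standing in for the IIS log table, and 'user=' / '&' are example markers.

```sql
-- Sketch: inline "string between" with the same rules as dbo.str_between:
-- start marker missing => NULL; end marker missing => rest of the string.
DECLARE @Log TABLE (csUriQuery varchar(400));
INSERT INTO @Log (csUriQuery) VALUES
    ('id=3598308&user=AD\user&parameter=1&listing=No'),  -- both markers present
    ('id=7&user=bob'),                                     -- end marker missing
    ('id=42');                                             -- start marker missing

DECLARE @StartMarker varchar(50) = 'user=', @EndMarker varchar(50) = '&';
DECLARE @Result TABLE (csUriQuery varchar(400), UserValue varchar(400));

INSERT INTO @Result (csUriQuery, UserValue)
SELECT l.csUriQuery,
       -- A NULL start position makes SUBSTRING return NULL.
       -- A missing end marker falls back to LEN + 1, i.e. the end of the string.
       SUBSTRING(l.csUriQuery, s.pos,
                 ISNULL(NULLIF(e.pos, 0), LEN(l.csUriQuery) + 1) - s.pos)
FROM @Log AS l
-- First character after the start marker (NULL if the marker is missing).
CROSS APPLY (SELECT pos = NULLIF(CHARINDEX(@StartMarker, l.csUriQuery), 0) + LEN(@StartMarker)) AS s
-- End marker, searched for only after the start marker.
CROSS APPLY (SELECT pos = CHARINDEX(@EndMarker, l.csUriQuery, ISNULL(s.pos, 1))) AS e;

SELECT csUriQuery, UserValue FROM @Result;
```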

Try this, replacing '[' and ']' with your own single-character delimiters:
SELECT SUBSTRING(@TEXT,CHARINDEX('[',@TEXT)+1,(CHARINDEX(']',@TEXT)-CHARINDEX('[',@TEXT))-1)

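I have a feeling you might need SQL Server's PATINDEX() function. Check this out:
Usage on PATINDEX() function
So maybe (the third argument of SUBSTRING is a length, so the start position has to be subtracted):
SELECT SUBSTRING(@TEXT, PATINDEX('%the dog%', @TEXT), PATINDEX('%immediately%', @TEXT) - PATINDEX('%the dog%', @TEXT) + LEN('immediately'))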

SELECT
SUBSTRING( '123@yahoo.com', charindex('@','123@yahoo.com',1) + 1, charindex('.','123@yahoo.com',1) - charindex('@','123@yahoo.com',1) - 1 )

DECLARE @Text VARCHAR(MAX), @First VARCHAR(MAX), @Second VARCHAR(MAX)
SET @Text = 'All I knew was that the dog had been very bad and required harsh punishment immediately regardless of what anyone else thought.'
SET @First = 'the dog'
SET @Second = 'immediately'
SELECT SUBSTRING(@Text, CHARINDEX(@First, @Text),
CHARINDEX(@Second, @Text) - CHARINDEX(@First, @Text) + LEN(@Second))

You're getting the starting position of 'punishment immediately', but passing that in as the length parameter for your substring.
You would need to subtract the starting position of 'the dog' from the CHARINDEX of 'punishment immediately', and then add the length of the 'punishment immediately' string to your third parameter. This would then give you the correct text.
Here's some rough, hacky code to illustrate the process:
DECLARE @text VARCHAR(MAX)
SET @text = 'All I knew was that the dog had been very bad and required harsh punishment immediately regardless of what anyone else thought.'
DECLARE @start INT
SELECT @start = CHARINDEX('the dog',@text)
DECLARE @endLen INT
SELECT @endLen = LEN('immediately')
DECLARE @end INT
SELECT @end = CHARINDEX('immediately',@text)
SET @end = @end - @start + @endLen
SELECT @end
SELECT SUBSTRING(@text,@start,@end)
Result: the dog had been very bad and required harsh punishment immediately

Among the many options is to create a simple function. It keeps your code cleaner, gives you the ability to handle errors if the start or end marker/string is not present, and can optionally trim leading and trailing whitespace.
SELECT dbo.GetStringBetweenMarkers('123456789', '234', '78', 0, 1)
Yields:
56
--Code to create the function
USE [xxxx_YourDB_xxxx]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[GetStringBetweenMarkers] (@FullString varchar(max), @StartMarker varchar(500), @EndMarker varchar(500), @TrimLRWhiteSpace bit, @ReportErrorInResult bit)
RETURNS varchar(max)
AS
BEGIN
--Purpose is to simply return the string between 2 string markers. ew 2022-11-06
--Will perform a LTRIM and RTRIM if @TrimLRWhiteSpace = 1
--Will report errors of either marker not being found in the RETURNed string if @ReportErrorInResult = 1.
-- When @ReportErrorInResult = 0, if the start marker isn't found, will return everything from the start of the @FullString to the left of the end marker.
-- When @ReportErrorInResult = 0, if the end marker isn't found, SQL will return an error of "Invalid length parameter passed to the LEFT or SUBSTRING function."
DECLARE @ReturnString VARCHAR(max) = ''
DECLARE @StartOfStartMarker INT = CHARINDEX(@StartMarker, @FullString)
DECLARE @StartOfTarget INT = CHARINDEX(@StartMarker, @FullString) + LEN(@StartMarker)
DECLARE @EndOfTarget INT = CHARINDEX(@EndMarker, @FullString, @StartOfTarget)
--If a marker wasn't found, put that into the returned string
IF @ReportErrorInResult = 1
BEGIN
IF @EndOfTarget = 0 SET @ReturnString = '[ERROR: EndMarker not found.]'
IF @StartOfStartMarker = 0 SET @ReturnString = '[ERROR: StartMarker not found.]'
IF @StartOfStartMarker = 0 AND @EndOfTarget = 0 SET @ReturnString = '[ERROR: Both StartMarker and EndMarker not found.]'
END
--If not reporting errors, and start marker not found (i.e. CHARINDEX = 0) we would start our string at the LEN(@StartMarker).
-- This would give an odd result. Best to just provide from 0, i.e. the start of the @FullString.
IF @ReportErrorInResult = 0 AND @StartOfStartMarker = 0 SET @StartOfTarget = 0
--Main action
IF @ReturnString = '' SET @ReturnString = SUBSTRING(@FullString, @StartOfTarget, @EndOfTarget - @StartOfTarget)
IF @TrimLRWhiteSpace = 1 SET @ReturnString = LTRIM(RTRIM(@ReturnString))
RETURN @ReturnString
--Examples
-- SELECT '>' + dbo.GetStringBetweenMarkers('123456789','234','78',0,1) + '<' AS 'Result-Returns what is in between markers w/ white space'
-- SELECT '>' + dbo.GetStringBetweenMarkers('1234 56 789','234','78',0,1) + '<' AS 'Result-Without trimming white space'
-- SELECT '>' + dbo.GetStringBetweenMarkers('1234 56 789','234','78',1,1) + '<' AS 'Result-Will trim white space with a @TrimLRWhiteSpace = 1'
-- SELECT '>' + dbo.GetStringBetweenMarkers('abcdefgh','ABC','FG',0,1) + '<' AS 'Result-Not Case Sensitive'
-- SELECT '>' + dbo.GetStringBetweenMarkers('abc_de_fgh','_','_',0,1) + '<' AS 'Result-Using the same marker for start and end'
--Errors are returned if start or end marker are not found
-- SELECT '>' + dbo.GetStringBetweenMarkers('1234 56789','zz','78',0,1) + '<' AS 'Result-Start not found'
-- SELECT '>' + dbo.GetStringBetweenMarkers('1234 56789','234','zz',0,1) + '<' AS 'Result-End not found'
-- SELECT '>' + dbo.GetStringBetweenMarkers('1234 56789','zz','zz',0,1) + '<' AS 'Result-Neither found'
--If @ReportErrorInResult = 0
-- SELECT '>' + dbo.GetStringBetweenMarkers('123456789','zz','78',0,0) + '<' AS 'Result-Start not found-Returns from the start of the @FullString'
-- SELECT '>' + dbo.GetStringBetweenMarkers('123456789','34','zz',0,0) + '<' AS 'Result-End not found-should get "Invalid length parameter passed to the LEFT or SUBSTRING function."'
END
GO

For Oracle, whose INSTR accepts a start position and an occurrence number:
SELECT SUBSTR('aaaaa$bbbbb$ccccc', INSTR('aaaaa$bbbbb$ccccc','$',1,1)+1, INSTR('aaaaa$bbbbb$ccccc','$',1,2) - INSTR('aaaaa$bbbbb$ccccc','$',1,1) - 1) AS My_String FROM dual

Hope this helps:
I declared a variable so that, if any changes need to be made, they only have to be made once.
declare @line varchar(100)
set @line ='Email_i-Julie@mail.com'
select SUBSTRING(@line ,(charindex('-',@line)+1), CHARINDEX('@',@line)-charindex('-',@line)-1)

I needed to split (099) 0000111 into (099) | 0000111, i.e. two different columns.
SELECT
SUBSTRING(Phone, CHARINDEX('(', Phone) + 0, (2 + ((LEN(Phone)) - CHARINDEX(')', REVERSE(Phone))) - CHARINDEX('(', Phone))) AS CodePhone,
LTRIM(SUBSTRING(Phone, CHARINDEX(')', Phone) + 1, LEN(Phone))) AS NumberPhone
FROM
Suppliers
WHERE
Phone LIKE '%(%)%'
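The REVERSE call above locates the last ')'. When a number contains only one pair of parentheses, a possibly simpler sketch can use the first ')' directly. @Suppliers is a hypothetical table variable standing in for the Suppliers table, and the second sample row is made up:

```sql
-- Sketch: split a phone number into the area code (with parentheses) and the rest.
-- Assumes a single '(...)' group per value.
DECLARE @Suppliers TABLE (Phone varchar(30));
INSERT INTO @Suppliers (Phone) VALUES ('(099) 0000111'), ('(0212) 555 1234');

DECLARE @Split TABLE (Phone varchar(30), CodePhone varchar(30), NumberPhone varchar(30));

INSERT INTO @Split (Phone, CodePhone, NumberPhone)
SELECT Phone,
       -- length = position of ')' - position of '(' + 1, so both parentheses are kept
       SUBSTRING(Phone, CHARINDEX('(', Phone), CHARINDEX(')', Phone) - CHARINDEX('(', Phone) + 1),
       -- everything after ')', without the leading space
       LTRIM(SUBSTRING(Phone, CHARINDEX(')', Phone) + 1, LEN(Phone)))
FROM @Suppliers
WHERE Phone LIKE '%(%)%';

SELECT Phone, CodePhone, NumberPhone FROM @Split;
```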

DECLARE @text VARCHAR(MAX)
SET @text = 'All I knew was that the dog had been very bad and required harsh punishment immediately regardless of what anyone else thought.'
DECLARE @pretext AS nvarchar(100) = 'the dog'
DECLARE @posttext AS nvarchar(100) = 'immediately'
SELECT
CASE
WHEN CHARINDEX(@posttext, @text) - (CHARINDEX(@pretext, @text) + len(@pretext)) < 0
THEN ''
ELSE SUBSTRING(@text,
CHARINDEX(@pretext, @text) + LEN(@pretext),
CHARINDEX(@posttext, @text) - (CHARINDEX(@pretext, @text) + LEN(@pretext)))
END AS betweentext

I'm a few years late, but here's what I did to get the string between two different characters, which also still returns the substring in the event that the ending character isn't found:
BEGIN
DECLARE @TEXT AS VARCHAR(20)
SET @TEXT='E101465445454-1'
-- Intermediate values, for inspection
SELECT SUBSTRING(@TEXT, CHARINDEX('E', @TEXT)+1, CHARINDEX('-',@TEXT)) as 'STR',
CAST(CHARINDEX('E', @TEXT)+1 AS INT) as 'val1', CAST(CHARINDEX('-', @TEXT) AS INT) as 'val2',
(CAST(CHARINDEX('-',@TEXT) AS INT) - CAST(CHARINDEX('E',@TEXT)+1 AS INT)) as 'SUBTR', LEN(@TEXT) as 'LEN'
-- Final result: up to '-' if present, otherwise the rest of the string
SELECT CASE WHEN (CHARINDEX('-', @TEXT) > 0) THEN
SUBSTRING(@TEXT, CHARINDEX('E', @TEXT)+1, (CAST(CHARINDEX('-',@TEXT) AS INT) - CAST(CHARINDEX('E',@TEXT)+1 AS INT)))
ELSE
SUBSTRING(@TEXT, CHARINDEX('E', @TEXT)+1,LEN(@TEXT)- CHARINDEX('E', @TEXT))
END
END
Try it and comment with any improvements, or let me know whether it does the job.

select substring(@string, charindex(@first, @string) + len(@first), charindex(@second, @string) - (charindex(@first, @string) + len(@first)))

Suppose we have a string DUMMY_DATA_CODE_FILE and want to find the substring between the 2nd and 3rd underscore (_). Then we can use a query like this:
select SUBSTRING('DUMMY_DATA_CODE_FILE',charindex('_', 'DUMMY_DATA_CODE_FILE', (charindex('_','DUMMY_DATA_CODE_FILE', 1))+1)+1, (charindex('_', 'DUMMY_DATA_CODE_FILE', (charindex('_','DUMMY_DATA_CODE_FILE', (charindex('_','DUMMY_DATA_CODE_FILE', 1))+1))+1)- charindex('_', 'DUMMY_DATA_CODE_FILE', (charindex('_','DUMMY_DATA_CODE_FILE', 1))+1)-1)) as Code
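The nested CHARINDEX calls above recompute the same positions several times. A hedged alternative computes each underscore position once with CROSS APPLY, passing the previous position as CHARINDEX's third (start_location) argument, and returns NULL rather than raising an error when there are fewer than three underscores:

```sql
-- Sketch: text between the 2nd and 3rd underscore, computing each position once.
DECLARE @s varchar(100) = 'DUMMY_DATA_CODE_FILE';
DECLARE @Code varchar(100);

SELECT @Code = CASE WHEN p2.pos > 0 AND p3.pos > 0
                    THEN SUBSTRING(@s, p2.pos + 1, p3.pos - p2.pos - 1)
               END                                                -- NULL if fewer than 3 underscores
FROM (SELECT pos = CHARINDEX('_', @s)) AS p1                       -- 1st underscore
CROSS APPLY (SELECT pos = CHARINDEX('_', @s, p1.pos + 1)) AS p2    -- 2nd, searched after the 1st
CROSS APPLY (SELECT pos = CHARINDEX('_', @s, p2.pos + 1)) AS p3;   -- 3rd, searched after the 2nd

SELECT Code = @Code;
```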

Related

How to identify and redact all instances of a matching pattern in T-SQL

I have a requirement to run a function over certain fields to identify and redact any numbers which are 5 digits or longer, ensuring all but the last 4 digits are replaced with *
For example: "Some text with 12345 and 1234 and 12345678" would become "Some text with *2345 and 1234 and ****5678"
I've used PATINDEX to identify the starting character of the pattern:
PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', TEST_TEXT)
I can recursively call that to get the starting character of all the occurrences, but I'm struggling with the actual redaction.
Does anyone have any pointers on how this can be done? I know to use REPLACE to insert the *s where they need to be, it's just the identification of what I should actually be replacing I'm struggling with.
I could do it in application code, but I need it to be T-SQL (it can be a function if needed).
Any tips greatly appreciated!
You can do this using SQL Server's built-in functions, all of which are available in SQL Server 2008 and later.
DECLARE @String VARCHAR(500) = 'Example Input: 1234567890, 1234, 12345, 123456, 1234567, 123asd456'
DECLARE @StartPos INT = 1, @EndPos INT = 1;
DECLARE @Input VARCHAR(500) = ISNULL(@String, '') + ' '; --Sets the input and appends a trailing space as a control character, so every digit run is followed by a non-digit.
DECLARE @OutputString VARCHAR(500) = ''; --Initialize an empty string so concatenation never yields NULL
WHILE (@StartPos <> 0)
BEGIN
SET @StartPos = PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @Input);
IF @StartPos <> 0
BEGIN
SET @OutputString += SUBSTRING(@Input, 1, @StartPos - 1); --Separate all contents before the first occurrence of our filter
SET @Input = SUBSTRING(@Input, @StartPos, 500); --Cut the string from that point to the end. The last value must be greater than the original string length to keep everything.
SET @EndPos = (PATINDEX('%[0-9][0-9][0-9][0-9][^0-9]%', @Input)); --First occurrence of 4 digits followed by a non-digit.
SET @Input = STUFF(@Input, 1, (@EndPos - 1), REPLICATE('*', (@EndPos - 1))); --@EndPos - 1 gives us the number of chars we want to replace.
END
END
SET @OutputString += @Input; --Append the last element
SET @OutputString = LEFT(@OutputString, LEN(@OutputString)) --Drop the trailing space added above (LEN ignores trailing spaces)
SELECT @OutputString;
Which outputs the following:
Example Input: ******7890, 1234, *2345, **3456, ***4567, 123asd456
This entire code could also be made as a function since it only requires an input text.
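As an alternative to the PATINDEX loop, a character-by-character sketch can collect each run of digits and mask it once the run ends. It assumes varchar input without meaningful trailing spaces (LEN ignores them) and could be wrapped in a scalar function the same way:

```sql
-- Sketch: mask all but the last 4 digits of every run of 5 or more digits.
DECLARE @Input  varchar(500) = 'Some text with 12345 and 1234 and 12345678';
DECLARE @Output varchar(500) = '';
DECLARE @Run    varchar(500) = '';   -- current run of consecutive digits
DECLARE @i int = 1;
DECLARE @ch varchar(1);              -- varchar, so '' past the end is not padded to ' '

-- Walk one position past the end so the final run is flushed too.
WHILE @i <= LEN(@Input) + 1
BEGIN
    SET @ch = SUBSTRING(@Input, @i, 1);
    IF @ch LIKE '[0-9]'
        SET @Run += @ch
    ELSE
    BEGIN
        -- The run has ended: mask it if it is 5+ digits long, then emit it.
        IF LEN(@Run) >= 5
            SET @Output += REPLICATE('*', LEN(@Run) - 4) + RIGHT(@Run, 4)
        ELSE
            SET @Output += @Run
        SET @Output += @ch           -- the non-digit that ended the run ('' past the end)
        SET @Run = ''
    END
    SET @i += 1
END;

SELECT @Output;
```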
A dirty solution with a recursive CTE:
DECLARE
@tags nvarchar(max) = N'Some text with 12345 and 1234 and 12345678',
@c nchar(1) = N' ';
;
WITH Process (s, i)
as
(
SELECT @tags, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @tags)
UNION ALL
SELECT value, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', value)
FROM
(SELECT SUBSTRING(s,0,i)+'*'+SUBSTRING(s,i+1,len(s)) value
FROM Process
WHERE i >0) calc
-- each pass replaces the first digit of the leftmost run of 5+ digits with *,
-- until no run of 5 or more digits remains (i = 0)
)
SELECT * FROM Process
WHERE i=0
I think a better solution is to add a CLR function to SQL Server to handle regular expressions.
sql-clr/RegEx
Here is an option using DelimitedSplit8K_LEAD, which can be found here: https://www.sqlservercentral.com/articles/reaping-the-benefits-of-the-window-functions-in-t-sql-2 This is an extension of Jeff Moden's splitter that is even a little bit faster than the original. The big advantage this splitter has over most of the others is that it returns the ordinal position of each element. One caveat is that I am splitting on a space, based on your sample data. If you had numbers crammed in the middle of other characters, this will ignore them. That may be good or bad depending on your specific requirements.
declare @Something varchar(100) = 'Some text with 12345 and 1234 and 12345678';
with MyCTE as
(
select x.ItemNumber
, Result = isnull(case when TRY_CONVERT(bigint, x.Item) is not null then isnull(replicate('*', len(convert(varchar(20), TRY_CONVERT(bigint, x.Item))) - 4), '') + right(convert(varchar(20), TRY_CONVERT(bigint, x.Item)), 4) end, x.Item)
from dbo.DelimitedSplit8K_LEAD(@Something, ' ') x
)
select Output = stuff((select ' ' + Result
from MyCTE
order by ItemNumber
FOR XML PATH('')), 1, 1, '')
This produces: Some text with *2345 and 1234 and ****5678

T-SQL SUBSTRING at certain places

I have the following example.
DECLARE @String varchar(100) = 'GAME_20131011_Set - SET_20131012_Game'
SELECT SUBSTRING(@String,0,CHARINDEX('_',@String))
SELECT SUBSTRING(@String,CHARINDEX('- ',@String),CHARINDEX('_',@String))
I want to get the words 'GAME' and 'SET' (the first word before the first '_' on each side of ' - ').
I am getting 'GAME' but having trouble with 'SET'.
UPDATE: 'GAME' and 'SET' are just examples; those words may vary.
DECLARE @String1 varchar(100) = 'GAME_20131011_Set - SET_20131012_Game' -- Looking for 'GAME' and 'SET'
DECLARE @String2 varchar(100) = 'GAMEE_20131011_Set - SETT_20131012_Game' -- Looking for 'GAMEE' and 'SETT'
DECLARE @String3 varchar(100) = 'GAMEEEEE_20131011_Set - SETTEEEEEEEE_20131012_Game' -- Looking for 'GAMEEEEE' and 'SETTEEEEEEEE'
As long as your two parts will always be separated by a specific character (- in your example), you could try splitting on that value:
DECLARE @String varchar(100) = 'GAME_20131011_Set - SET_20131012_Game'
DECLARE @Left varchar(100),
@Right varchar(100)
-- split into two strings based on a delimiter
SELECT @Left = RTRIM(SUBSTRING(@String, 0, CHARINDEX('-',@String)))
SELECT @Right = LTRIM(SUBSTRING(@String, CHARINDEX('-',@String)+1, LEN(@String)))
-- handle the strings individually
SELECT SUBSTRING(@Left, 0, CHARINDEX('_', @Left))
SELECT SUBSTRING(@Right, 0, CHARINDEX('_', @Right))
-- Outputs:
-- GAME
-- SET
Here's a SQLFiddle example of this: http://sqlfiddle.com/#!3/d41d8/22594
The issue that you are running into with your original query is that you are specifying CHARINDEX('- ', @String) for your start index, which will include - in any substring starting at that point. Also, with CHARINDEX('_', @String) for your length parameter, you will always end up with the index of the first _ character in the string.
By splitting the original string in two, you avoid these problems.
Try this
SELECT SUBSTRING(@String,0,CHARINDEX('_',@String))
SELECT SUBSTRING(@String,CHARINDEX('- ',@String)+1, CHARINDEX('_',@String)-1)
CHARINDEX takes an optional third parameter that specifies the position in the string to start the search from. You could roll this into one statement, but it's easier to read as three:
Declare @start int = charindex('-', @String) + 2;
Declare @end int = charindex('_', @String, @start);
Select substring(@String, @start, @end - @start);
Example SQLFiddle
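Applied to all three sample strings from the update, the start-position approach might look like the sketch below. It assumes every value contains ' - ' and an '_' after each word; otherwise LEFT/SUBSTRING receive a negative length and raise an error:

```sql
-- Sketch: word before the first '_' on each side of ' - ', for words of any length.
DECLARE @Samples TABLE (s varchar(100));
INSERT INTO @Samples (s) VALUES
    ('GAME_20131011_Set - SET_20131012_Game'),
    ('GAMEE_20131011_Set - SETT_20131012_Game'),
    ('GAMEEEEE_20131011_Set - SETTEEEEEEEE_20131012_Game');

DECLARE @Parsed TABLE (s varchar(100), LeftWord varchar(100), RightWord varchar(100));

INSERT INTO @Parsed (s, LeftWord, RightWord)
SELECT x.s,
       LEFT(x.s, CHARINDEX('_', x.s) - 1),                          -- word before the first '_'
       SUBSTRING(x.s, r.pos, CHARINDEX('_', x.s, r.pos) - r.pos)    -- word after ' - '
FROM @Samples AS x
CROSS APPLY (SELECT pos = CHARINDEX(' - ', x.s) + 3) AS r;          -- first character after ' - '

SELECT s, LeftWord, RightWord FROM @Parsed;
```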

TSQL count uppercase and lowercase letters

Hi, I am writing some T-SQL practice code that counts the vowels, lowercase letters and uppercase letters in a string. My code works for vowels, but somehow it counts all letters as lowercase. Here is my code:
DECLARE @name VARCHAR(200) ='Abc Efg Hij'
DECLARE @i int = 1
DECLARE @numVowels int = 0
DECLARE @numLower int = 0
DECLARE @numUpper int = 0
WHILE @i <= LEN(@name)
BEGIN
IF PATINDEX('%' + LOWER(SUBSTRING(@name, @i, 1)) + '%', 'aeiou') > 0
BEGIN
SET @numVowels += 1
END
IF SUBSTRING(@name, @i, 1) BETWEEN 'a' AND 'z'
BEGIN
SET @numLower += 1
END
ELSE IF SUBSTRING(@name, @i, 1) BETWEEN 'A' AND 'Z'
BEGIN
SET @numUpper += 1
END
PRINT SUBSTRING(@name, @i, 1)
SET @i +=1
END
PRINT 'There are ' + CAST((@numVowels) AS VARCHAR(200)) + ' vowels'
PRINT 'There are ' + CAST((@numLower) AS VARCHAR(200)) + ' lower-case letters'
PRINT 'There are ' + CAST((@numUpper) AS VARCHAR(200)) + ' upper-case letters'
Please help, Thanks
Your issue is related to collation, but a few tests raised more questions in my mind than they answered. First, to make your code work, you just have to replace both occurrences of:
IF SUBSTRING(@name, @i, 1) BETWEEN ...
with
IF SUBSTRING(@name, @i, 1) COLLATE Latin1_General_BIN BETWEEN ...
Forcing a binary collation makes SQL Server compare character codes, so 'a'-'z' and 'A'-'Z' become separate ranges instead of 'a' and 'A' being treated as equal.
Now the questions raised in my mind are:
Why doesn't it work with a case-sensitive collation like Latin1_General_CS_AS (9 lower-case, 0 upper-case)? This was my first try: I expected your problem to be caused by a case-insensitive collation, so I expected it to be solved with a case-sensitive one.
Why does it partially work with SQL_Latin1_General_CP1_CS_AS (8 lower-case, 1 upper-case)? Only the first 'A' is not considered a lower-case character, and I don't know why.
That's all I can get from the top of my .NET developer's head. If you are looking for more information, maybe someone else here or at https://dba.stackexchange.com/ can help.
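A plausible explanation for the two open questions, offered as a hedge rather than a verified answer: Windows collations such as Latin1_General_CS_AS appear to sort each lowercase letter just before its uppercase form (a, A, b, B, ..., z, Z), so every letter except 'Z' falls between 'a' and 'z'; SQL collations such as SQL_Latin1_General_CP1_CS_AS sort uppercase first (A, a, B, b, ...), so only 'A' falls outside that range, which matches the 8/1 split above. Comparing character codes sidesteps collation ordering entirely, as in this sketch (it assumes varchar data whose letters are plain ASCII):

```sql
-- Sketch: count letters by character code instead of by collation order.
DECLARE @name varchar(200) = 'Abc Efg Hij';
DECLARE @i int = 1, @code int, @numLower int = 0, @numUpper int = 0;

WHILE @i <= LEN(@name)
BEGIN
    SET @code = ASCII(SUBSTRING(@name, @i, 1))   -- 65-90 = 'A'-'Z', 97-122 = 'a'-'z'
    IF @code BETWEEN 97 AND 122
        SET @numLower += 1
    ELSE IF @code BETWEEN 65 AND 90
        SET @numUpper += 1
    SET @i += 1                                  -- runs on every iteration
END;

SELECT LowerCase = @numLower, UpperCase = @numUpper;
```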

Extract a number from String in SQL

I have the following string:
"FLEETWOOD DESIGNS 535353110XXXXX" (The X's are actually numbers I just wanted to hide them here)
Does anyone know how I can search through strings in SQL and extract numbers that are longer than, let's say, 10 characters?
This is quite an old post, but it might help someone else. I was searching for a user-defined function in SQL Server to extract only the numbers from a given string, and, surprisingly, I could not find exactly what I was looking for.
Here is the code of a function to "Extract a number from string in SQL" (valid for SQL Server). It is taken from Pinal Dave's fantastic blog; I've modified it only to return NULL if a NULL value is passed to the function.
CREATE FUNCTION [dbo].[ExtractInteger](@String VARCHAR(2000))
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @Count INT
DECLARE @IntNumbers VARCHAR(1000)
SET @Count = 0
SET @IntNumbers = ''
IF @String IS NULL
RETURN NULL;
WHILE @Count <= LEN(@String)
BEGIN
IF SUBSTRING(@String,@Count,1) >= '0' AND SUBSTRING(@String,@Count,1) <= '9'
BEGIN
SET @IntNumbers = @IntNumbers + SUBSTRING(@String,@Count,1)
END
SET @Count = @Count + 1
END
RETURN @IntNumbers
END
Tests
select '"' + dbo.ExtractInteger('1a2b3c4d5e6f7g8h9i') + '"'
GO
select '"' + dbo.ExtractInteger('abcdefghi') + '"'
GO
select '"' + dbo.ExtractInteger(NULL) + '"'
GO
select '"' + dbo.ExtractInteger('') + '"'
GO
Results
"123456789"
""
NULL
""
You don't mention the DB engine, so we don't know what features are available...
If regular expressions are available, then a pattern like \d{10,} would match numbers with 10 or more digits.
In MySQL, REGEXP can only return true or false (0 or 1), so you'd have to use some ugly hack like
SELECT
LEAST(
INSTR(field,'0'),
INSTR(field,'1'),
INSTR(field,'2'),
INSTR(field,'3'),
INSTR(field,'4'),
INSTR(field,'5'),
INSTR(field,'6'),
INSTR(field,'7'),
INSTR(field,'8'),
INSTR(field,'9')
) AS startPos,
REVERSE(field) AS backward,
LEAST(
INSTR(backward,'0'),
INSTR(backward,'1'),
INSTR(backward,'2'),
INSTR(backward,'3'),
INSTR(backward,'4'),
INSTR(backward,'5'),
INSTR(backward,'6'),
INSTR(backward,'7'),
INSTR(backward,'8'),
INSTR(backward,'9')
) AS endPos,
SUBSTRING(field, startPos, endPos - startPos + 1)
FROM tab
WHERE(field REGEXP '[0-9]{10,}')
but this isn't perfect: it would extract a false substring for a string like "ABC 9 A 1234567891", not to mention that it is probably so slooooow that it would be faster to go through the data by hand.
SUBSTRING('FLEETWOOD DESIGNS 535353110XXXXX', 19, 32)
You could also use LEN() to get the length of the string itself. If you know the length of the serial number, you can subtract it from the string's length (and add 1) to get the start index of the substring.
It could be done like this
Declare @X varchar(100)
Select @X= 'Here is where15234Numbers'
--
Select @X= SubString(@X,PATINDEX('%[0-9]%',@X),Len(@X))
Select @X= SubString(@X,0,PATINDEX('%[^0-9]%',@X))
--// show result
Select @X
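For the original question (runs longer than about 10 characters), PATINDEX can look for ten consecutive digit classes. This is a hedged sketch: the trailing digits in the sample are made-up placeholders for the hidden 'XXXXX', and the threshold should be 11 if "greater than 10" is meant strictly:

```sql
-- Sketch: extract the first run of at least 10 consecutive digits.
DECLARE @s varchar(200) = 'FLEETWOOD DESIGNS 5353531101234';        -- placeholder digits
DECLARE @Pattern varchar(100) = '%' + REPLICATE('[0-9]', 10) + '%';  -- ten digits in a row
DECLARE @Start int = PATINDEX(@Pattern, @s);
DECLARE @Rest varchar(200) = CASE WHEN @Start > 0 THEN SUBSTRING(@s, @Start, LEN(@s)) END;

-- Appending a non-digit guarantees PATINDEX finds where the digit run ends.
DECLARE @Number varchar(200) = LEFT(@Rest, PATINDEX('%[^0-9]%', @Rest + 'x') - 1);

SELECT Number = @Number;   -- NULL when no run of 10+ digits exists
```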

Replace null character in a string in sql

I need to replace a null character in a SQL string, but I can't seem to find the right command to achieve this. I have used replace(myString, '\0', ''), but this seems not to work. Any help would be great.
The trick that works is to COLLATE your value to Latin1_General_BIN before using REPLACE and also use nchar(0x00) COLLATE Latin1_General_BIN for string_pattern.
REPLACE ( string_expression , string_pattern , string_replacement )
select
[Terminated] = N'123' + nchar(0) + N'567'
,[Replaced with -] = REPLACE((N'123' + nchar(0) + N'567') COLLATE Latin1_General_BIN
, nchar(0x00) COLLATE Latin1_General_BIN
,'-')
,[Removed] = REPLACE((N'123' + nchar(0) + N'567') COLLATE Latin1_General_BIN
, nchar(0x00) COLLATE Latin1_General_BIN
,'')
Here is the result (use Output To Text):
Terminated Replaced with - Removed
---------- ----------------- --------
123 567 123-567 123567
Use this:
REPLACE(myString, char(0), '')
These functions remove null characters from Unicode strings, at least in SQL Server 2008.
-- Remove all null characters
CREATE FUNCTION RemoveNulls(@s nvarchar(max))
RETURNS nvarchar(max)
AS
BEGIN
DECLARE @r nvarchar(max);
SET @r = REPLACE(@s COLLATE Latin1_General_BIN, NCHAR(0), N'');
RETURN @r;
END
GO
-- Remove all characters from the first null character
CREATE FUNCTION TrimNull(@s nvarchar(max))
RETURNS nvarchar(max)
AS
BEGIN
DECLARE @r nvarchar(max);
DECLARE @i int = CHARINDEX(NCHAR(0), @s COLLATE Latin1_General_BIN);
IF @i = 0
SET @r = @s;
ELSE
SET @r = SUBSTRING(@s, 1, @i - 1);
RETURN @r;
END
GO
-- Example usage
DECLARE @s nvarchar(10) = N'Test' + NCHAR(0) + N'!';
SELECT dbo.RemoveNulls(@s), dbo.TrimNull(@s);
--> Test!, Test
In my case, fields from ODBC were padded to 8000 characters with null and TrimNull was much faster than RemoveNulls.
For latin characters:
select REPLACE('Ho'+CHAR(0)+'mer' COLLATE SQL_Latin1_General_CP1_CS_AS, CHAR(0), '')
For russian characters:
select REPLACE(('Го'+CHAR(0)+'мер') COLLATE Cyrillic_General_BIN , CHAR(0), '')
If you only have ASCII (Char/VarChar) strings, then this will work, as @DyingCactus suggests:
REPLACE(myString, Char(0x00), '')
However, if you are dealing with Null-Terminated Strings and are trying to fix or convert to something like XML, and your data is Unicode (nChar/nVarChar), then use this:
(CASE WHEN UNICODE(SUBSTRING(myString, LEN(myString), 1)) = 0x0000
THEN SUBSTRING(myString, 1, LEN(myString) - 1)
ELSE myString END)
This works for both ASCII (Char/VarChar) and Unicode (nChar/nVarChar).
Note
Using the Replace() function with Char(0) or nChar(0) will NOT work for Unicode (nChar/nVarChar).
It's a bug in the SQL Server Replace() function.
You could cast as VarChar, then use Replace(), but then you would lose any special Unicode/Non-ASCII characters you might have intended to keep.
Otherwise you wouldn't have used the Unicode datatype (that takes up twice as much space to store your data) in the first place.
If you have Null-Characters mixed in with your Unicode strings (and not only at the end), and, for the purposes of your query, maintaining Unicode-specific characters are unimportant, then as a last resort you could use this :
(CASE WHEN myString LIKE (N'%' + nCHAR(0x0000) + N'%')--Has Null-Character(s).
THEN REPLACE(CAST(myString as VarChar(MAX)), Char(0x00), '')--Cast as ASCII
ELSE myString END)--Else, leave as Unicode to preserve Unicode-Only chars.
I'm not completely sure what is wrong with your strings, but here are some things to try. Are you using varchar? Please edit your question with more details.
If you have NULL characters within a string:
declare @x varchar(10)
set @x='123'+char(0)+'456'
SELECT @x AS Has_NULL_in_it, REPLACE(@x, char(0), '') AS Has_NULL_removed
OUTPUT:
Has_NULL_in_it Has_NULL_removed
-------------- ----------------
123 456 123456
(1 row(s) affected)
If you can't tell which character is in the string, try this (ASCII):
DECLARE @y varchar(10),@c int
set @y='123'+char(0)+'456'
set @c=0
WHILE @c<LEN(@y)
BEGIN
SET @c=@c+1
PRINT CONVERT(varchar(5),@c)+' - '+SUBSTRING(@y,@c,1)+' - CHAR('+CONVERT(varchar(5),ASCII(SUBSTRING(@y,@c,1)))+')'
END
OUTPUT:
1 - 1 - CHAR(49)
2 - 2 - CHAR(50)
3 - 3 - CHAR(51)
4 - - CHAR(0)
5 - 4 - CHAR(52)
6 - 5 - CHAR(53)
7 - 6 - CHAR(54)
Try this (Unicode):
DECLARE @y nvarchar(10),@c int
set @y='123'+char(0)+'456'
set @c=0
WHILE @c<LEN(@y)
BEGIN
SET @c=@c+1
PRINT CONVERT(nvarchar(5),@c)+' - '+SUBSTRING(@y,@c,1)+' - UNICODE('+CONVERT(nvarchar(5),UNICODE(SUBSTRING(@y,@c,1)))+')'
END
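The same diagnostic can be done without a loop by joining against a small numbers list. This sketch builds the numbers 1 to 100 inline (enough for this short sample) and uses DATALENGTH / 2 instead of LEN, assuming nvarchar data without surrogate pairs:

```sql
-- Sketch: set-based listing of each character's position and code point.
DECLARE @y nvarchar(10) = N'123' + NCHAR(0) + N'456';
DECLARE @Chars TABLE (Position int, CodePoint int);

WITH Digits AS (
    SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(d)
),
Numbers AS (                                  -- 1..100
    SELECT n = t.d * 10 + u.d + 1 FROM Digits AS t CROSS JOIN Digits AS u
)
INSERT INTO @Chars (Position, CodePoint)
SELECT n, UNICODE(SUBSTRING(@y, n, 1))
FROM Numbers
WHERE n <= DATALENGTH(@y) / 2;                -- nvarchar stores 2 bytes per character

SELECT Position, CodePoint FROM @Chars ORDER BY Position;   -- CodePoint 0 marks a null character
```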
If you have strings that are completely NULL:
declare @z varchar(10)
set @z=NULL
select @z AS IS_NULL, ISNULL(@z,'') AS NULL_Removed
OUTPUT:
IS_NULL NULL_Removed
---------- ------------
NULL
(1 row(s) affected)
If you are concatenating values to get your string, use IsNull(value, replacement) to avoid having null values, or SET CONCAT_NULL_YIELDS_NULL OFF (deprecated) so that concatenating a NULL does not turn the whole string into NULL.
We had the same problem: Ending \0 character in nvarchar fields and unable to replace it with any of the REPLACE variants proposed (SQL Server 2008). When using
LEFT(Bar, LEN(Bar)-1)
it cut off the last regular character together with the \0 !
Our solution now to correct the fields is (as weird as it may seem on a first glimpse):
UPDATE Foo
SET Bar = LEFT(Bar, LEN(Bar))
WHERE RIGHT(Bar, 1) = CHAR(0)
Examples resolved
CREATE FUNCTION dbo.F_ReplaceNullChar( @STR NVARCHAR(MAX) )
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @i INT=0
DECLARE @RET NVARCHAR(MAX)=''
WHILE @i<LEN(@STR)
BEGIN
SET @i=@i+1
IF UNICODE(SUBSTRING(@STR,@i,1)) <> 0x0000
SET @RET=@RET+SUBSTRING(@STR,@i,1)
END
RETURN @RET
END
GO
SELECT LEN(mycol) lenbefore,mycol,
LEN( dbo.F_ReplaceNullChar(mycol)) lenafter, dbo.F_ReplaceNullChar(mycol) mycolafter
FROM mytab
select zz.xx
, replace(zz.xx, '', '')
from (
select
t.string_with_null,
(
select s.string_with_null+''
from TABLE_1 s
where s.token_hash = t.token_hash
for xml path('')
) xx
from TABLE_1 t(nolock)
)zz