I need a SQL query to get the value between two known strings (the returned value should start and end with these two strings).
An example.
"All I knew was that the dog had been very bad and required harsh punishment immediately regardless of what anyone else thought."
In this case the known strings are "the dog" and "immediately". So my query should return "the dog had been very bad and required harsh punishment immediately"
I've come up with this so far but to no avail:
SELECT SUBSTRING(@Text, CHARINDEX('the dog', @Text), CHARINDEX('immediately', @Text))
@Text being the variable containing the main string.
Can someone please help me with where I'm going wrong?
The problem is that the third argument of SUBSTRING is a length, not an end position, so passing the index of the end marker also counts everything before the start marker.
You need to subtract the first index from the second index, then add the length of the end marker so that it is included in the result.
SELECT SUBSTRING(@Text, CHARINDEX('the dog', @Text)
, CHARINDEX('immediately', @Text) - CHARINDEX('the dog', @Text) + LEN('immediately'))
I think what Evan meant was this:
SELECT SUBSTRING(@Text, CHARINDEX(@First, @Text) + LEN(@First),
CHARINDEX(@Second, @Text) - CHARINDEX(@First, @Text) - LEN(@First))
An example is this: You have a string and the character $
String :
aaaaa$bbbbb$ccccc
Code:
SELECT SUBSTRING('aaaaa$bbbbb$ccccc',CHARINDEX('$','aaaaa$bbbbb$ccccc')+1, CHARINDEX('$','aaaaa$bbbbb$ccccc',CHARINDEX('$','aaaaa$bbbbb$ccccc')+1) -CHARINDEX('$','aaaaa$bbbbb$ccccc')-1) as My_String
Output:
bbbbb
You need to adjust for the LENGTH in the SUBSTRING. You were pointing it to the END of the 'ending string'.
Try something like this:
declare @TEXT varchar(200)
declare @ST varchar(200)
declare @EN varchar(200)
set @ST = 'the dog'
set @EN = 'immediately'
set @TEXT = 'All I knew was that the dog had been very bad and required harsh punishment immediately regardless of what anyone else thought.'
SELECT SUBSTRING(@Text, CHARINDEX(@ST, @Text), (CHARINDEX(@EN, @Text)+LEN(@EN))-CHARINDEX(@ST, @Text))
Of course, you may need to adjust it a bit.
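One adjustment worth considering: if the end marker can also appear before the start marker, a plain CHARINDEX for the end marker finds the wrong occurrence and the length goes negative. A minimal sketch, assuming SQL Server 2008 or later (for DECLARE with an initializer); the sample text and the @StartPos/@EndPos names are made up for illustration. The third argument of CHARINDEX makes the search for the end marker begin after the start marker:

```sql
-- Hypothetical sample: 'immediately' occurs both before and after 'the dog'.
DECLARE @Text VARCHAR(200) = 'Fix immediately: the dog had been very bad and required harsh punishment immediately.';
DECLARE @ST VARCHAR(200) = 'the dog';
DECLARE @EN VARCHAR(200) = 'immediately';

DECLARE @StartPos INT = CHARINDEX(@ST, @Text);
-- Only look for the end marker after the start marker (0 means "not found").
DECLARE @EndPos INT = CASE WHEN @StartPos > 0
                           THEN CHARINDEX(@EN, @Text, @StartPos + LEN(@ST))
                           ELSE 0
                      END;

-- Returns NULL when either marker is missing, otherwise the text
-- from the start marker through the end of the end marker.
SELECT CASE WHEN @StartPos > 0 AND @EndPos > 0
            THEN SUBSTRING(@Text, @StartPos, @EndPos + LEN(@EN) - @StartPos)
       END AS Result;
-- Result: the dog had been very bad and required harsh punishment immediately
```

Returning NULL for a missing marker is just one choice here; returning an empty string or the rest of the text would work equally well depending on what the caller expects.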
I had a similar need to parse out a set of parameters stored within an IIS logs' csUriQuery field, which looked like this: id=3598308&user=AD\user&parameter=1&listing=No needed in this format.
I ended up creating a User-defined function to accomplish a string between, with the following assumptions:
If the starting occurrence is not found, a NULL is returned, and
If the ending occurrence is not found, the rest of the string is returned
Here's the code:
CREATE FUNCTION dbo.str_between(@col varchar(max), @start varchar(50), @end varchar(50))
RETURNS varchar(max)
WITH EXECUTE AS CALLER
AS
BEGIN
RETURN substring(@col, charindex(@start, @col) + len(@start),
isnull(nullif(charindex(@end, stuff(@col, 1, charindex(@start, @col)-1, '')),0),
len(stuff(@col, 1, charindex(@start, @col)-1, ''))+1) - len(@start)-1);
END;
GO
For the above question, the usage is as follows:
DECLARE @a VARCHAR(MAX) = 'All I knew was that the dog had been very bad and required harsh punishment immediately regardless of what anyone else thought.'
SELECT dbo.str_between(@a, 'the dog', 'immediately')
-- Yields ' had been very bad and required harsh punishment '
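The two assumptions listed above (NULL when the start is missing, rest of the string when the end is missing) can also be sketched inline, without creating a function. This is only an illustration under assumptions: the query strings, the q/a/b aliases and the marker values are made up, and it relies on the VALUES table constructor and CROSS APPLY, so SQL Server 2008 or later.

```sql
-- Hypothetical markers and sample rows, for illustration only.
DECLARE @start VARCHAR(50) = 'user=';
DECLARE @end   VARCHAR(50) = '&';

SELECT q.col,
       CASE
           WHEN a.s = 0 THEN NULL                                              -- start marker missing
           WHEN b.e = 0 THEN SUBSTRING(q.col, a.s + LEN(@start), LEN(q.col))   -- end marker missing: rest of string
           ELSE SUBSTRING(q.col, a.s + LEN(@start), b.e - a.s - LEN(@start))   -- text strictly between the markers
       END AS between_text
FROM (VALUES ('id=1&user=AD\bob&listing=No'),
             ('id=1&user=AD\bob'),
             ('id=1')) AS q(col)
-- s: position of the start marker (0 if absent)
CROSS APPLY (SELECT CHARINDEX(@start, q.col) AS s) AS a
-- e: position of the first end marker after the start marker (0 if absent)
CROSS APPLY (SELECT CASE WHEN a.s = 0 THEN 0
                         ELSE CHARINDEX(@end, q.col, a.s + LEN(@start))
                    END AS e) AS b;
-- between_text: AD\bob, AD\bob, NULL
```

Note that LEN ignores trailing spaces, so a marker that ends in a space would need DATALENGTH or similar instead.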
Try this and replace '[' & ']' with your string
SELECT SUBSTRING(@TEXT,CHARINDEX('[',@TEXT)+1,(CHARINDEX(']',@TEXT)-CHARINDEX('[',@TEXT))-1)
I have a feeling you might need SQL Server's PATINDEX() function. Check this out:
Usage on Patindex() function
So maybe:
SELECT SUBSTRING(@TEXT, PATINDEX('%the dog%', @TEXT), PATINDEX('%immediately%', @TEXT) - PATINDEX('%the dog%', @TEXT) + LEN('immediately'))
SELECT
SUBSTRING( '123@yahoo.com', charindex('@','123@yahoo.com',1) + 1, charindex('.','123@yahoo.com',1) - charindex('@','123@yahoo.com',1) - 1 )
DECLARE @Text VARCHAR(MAX), @First VARCHAR(MAX), @Second VARCHAR(MAX)
SET @Text = 'All I knew was that the dog had been very bad and required harsh punishment immediately regardless of what anyone else thought.'
SET @First = 'the dog'
SET @Second = 'immediately'
SELECT SUBSTRING(@Text, CHARINDEX(@First, @Text),
CHARINDEX(@Second, @Text) - CHARINDEX(@First, @Text) + LEN(@Second))
You're getting the starting position of 'immediately', but passing that in as the length parameter for your substring.
You would need to subtract the starting position of 'the dog' from the charindex of 'immediately', and then add the length of the 'immediately' string to your third parameter. This would then give you the correct text.
Here's some rough, hacky code to illustrate the process:
DECLARE @text VARCHAR(MAX)
SET @text = 'All I knew was that the dog had been very bad and required harsh punishment immediately regardless of what anyone else thought.'
DECLARE @start INT
SELECT @start = CHARINDEX('the dog',@text)
DECLARE @endLen INT
SELECT @endLen = LEN('immediately')
DECLARE @end INT
SELECT @end = CHARINDEX('immediately',@text)
SET @end = @end - @start + @endLen
SELECT @end
SELECT SUBSTRING(@text,@start,@end)
Result: the dog had been very bad and required harsh punishment immediately
One of the many options is to create a simple function.
It can keep your code cleaner.
It gives you the ability to handle errors if the start or end marker/string is not present.
This function also allows trimming leading and trailing whitespace as an option.
SELECT dbo.GetStringBetweenMarkers('123456789', '234', '78', 0, 1)
Yields:
56
--Code to create the function
USE [xxxx_YourDB_xxxx]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[GetStringBetweenMarkers] (@FullString varchar(max), @StartMarker varchar(500), @EndMarker varchar(500), @TrimLRWhiteSpace bit, @ReportErrorInResult bit)
RETURNS varchar(max)
AS
BEGIN
--Purpose is to simply return the string between 2 string markers. ew 2022-11-06
--Will perform a LTRIM and RTRIM if @TrimLRWhiteSpace = 1
--Will report errors of either marker not being found in the RETURNed string if @ReportErrorInResult = 1.
-- When @ReportErrorInResult = 0, if the start marker isn't found, will return everything from the start of the @FullString to the left of the end marker.
-- When @ReportErrorInResult = 0, if the end marker isn't found, SQL will return an error of "Invalid length parameter passed to the LEFT or SUBSTRING function."
DECLARE @ReturnString VARCHAR(max) = ''
DECLARE @StartOfStartMarker INT = CHARINDEX(@StartMarker, @FullString)
DECLARE @StartOfTarget INT = CHARINDEX(@StartMarker, @FullString) + LEN(@StartMarker)
DECLARE @EndOfTarget INT = CHARINDEX(@EndMarker, @FullString, @StartOfTarget)
--If a marker wasn't found, put that into the returned string.
IF @ReportErrorInResult = 1
BEGIN
IF @EndOfTarget = 0 SET @ReturnString = '[ERROR: EndMarker not found.]'
IF @StartOfStartMarker = 0 SET @ReturnString = '[ERROR: StartMarker not found.]'
IF @StartOfStartMarker = 0 AND @EndOfTarget = 0 SET @ReturnString = '[ERROR: Both StartMarker and EndMarker not found.]'
END
--If not reporting errors, and start marker not found (i.e. CHARINDEX = 0) we would start our string at the LEN(@StartMarker).
-- This would give an odd result. Best to just provide from 0, i.e. the start of the @FullString.
IF @ReportErrorInResult = 0 AND @StartOfStartMarker = 0 SET @StartOfTarget = 0
--Main action
IF @ReturnString = '' SET @ReturnString = SUBSTRING(@FullString, @StartOfTarget, @EndOfTarget - @StartOfTarget)
IF @TrimLRWhiteSpace = 1 SET @ReturnString = LTRIM(RTRIM(@ReturnString))
RETURN @ReturnString
--Examples
-- SELECT '>' + dbo.GetStringBetweenMarkers('123456789','234','78',0,1) + '<' AS 'Result-Returns what is in between markers w/ white space'
-- SELECT '>' + dbo.GetStringBetweenMarkers('1234 56 789','234','78',0,1) + '<' AS 'Result-Without trimming white space'
-- SELECT '>' + dbo.GetStringBetweenMarkers('1234 56 789','234','78',1,1) + '<' AS 'Result-Will trim white space with a @TrimLRWhiteSpace = 1'
-- SELECT '>' + dbo.GetStringBetweenMarkers('abcdefgh','ABC','FG',0,1) + '<' AS 'Result-Not Case Sensitive'
-- SELECT '>' + dbo.GetStringBetweenMarkers('abc_de_fgh','_','_',0,1) + '<' AS 'Result-Using the same marker for start and end'
--Errors are returned if start or end marker are not found
-- SELECT '>' + dbo.GetStringBetweenMarkers('1234 56789','zz','78',0,1) + '<' AS 'Result-Start not found'
-- SELECT '>' + dbo.GetStringBetweenMarkers('1234 56789','234','zz',0,1) + '<' AS 'Result-End not found'
-- SELECT '>' + dbo.GetStringBetweenMarkers('1234 56789','zz','zz',0,1) + '<' AS 'Result-Neither found'
--If @ReportErrorInResult = 0
-- SELECT '>' + dbo.GetStringBetweenMarkers('123456789','zz','78',0,0) + '<' AS 'Result-Start not found-Returns from the start of the @FullString'
-- SELECT '>' + dbo.GetStringBetweenMarkers('123456789','34','zz',0,0) + '<' AS 'Result-End not found-should get "Invalid length parameter passed to the LEFT or SUBSTRING function."'
END
GO
SELECT SUBSTRING('aaaaa$bbbbb$ccccc',instr('aaaaa$bbbbb$ccccc','$',1,1)+1, instr('aaaaa$bbbbb$ccccc','$',1,2)-1 -instr('aaaaa$bbbbb$ccccc','$',1,1)) as My_String
Hope this helps:
I declared a variable so that any change needs to be made in only one place.
declare @line varchar(100)
set @line ='Email_i-Julie@mail.com'
select SUBSTRING(@line ,(charindex('-',@line)+1), CHARINDEX('@',@line)-charindex('-',@line)-1)
I needed to split (099) 0000111 into (099) | 0000111, as two separate columns.
SELECT
SUBSTRING(Phone, CHARINDEX('(', Phone) + 0, (2 + ((LEN(Phone)) - CHARINDEX(')', REVERSE(Phone))) - CHARINDEX('(', Phone))) AS CodePhone,
LTRIM(SUBSTRING(Phone, CHARINDEX(')', Phone) + 1, LEN(Phone))) AS NumberPhone
FROM
Suppliers
WHERE
Phone LIKE '%(%)%'
DECLARE @text VARCHAR(MAX)
SET @text = 'All I knew was that the dog had been very bad and required harsh punishment immediately regardless of what anyone else thought.'
DECLARE @pretext AS nvarchar(100) = 'the dog'
DECLARE @posttext AS nvarchar(100) = 'immediately'
SELECT
CASE
WHEN CHARINDEX(@posttext, @Text) - (CHARINDEX(@pretext, @Text) + len(@pretext)) < 0
THEN ''
ELSE SUBSTRING(@Text,
CHARINDEX(@pretext, @Text) + LEN(@pretext),
CHARINDEX(@posttext, @Text) - (CHARINDEX(@pretext, @Text) + LEN(@pretext)))
END AS betweentext
I'm a few years behind, but here's what I did to get the string between two different characters and, in the event the ending character isn't found, to still return the rest of the string:
BEGIN
DECLARE @TEXT AS VARCHAR(20)
SET @TEXT='E101465445454-1'
SELECT SUBSTRING(@TEXT, CHARINDEX('E', @TEXT)+1, CHARINDEX('-',@TEXT)) as 'STR',
CAST(CHARINDEX('E', @TEXT)+1 AS INT) as 'val1', CAST(CHARINDEX('-', @TEXT) AS INT) as 'val2',
(CAST(CHARINDEX('-',@TEXT) AS INT) - CAST(CHARINDEX('E',@TEXT)+1 AS INT)) as 'SUBTR', LEN(@TEXT) as 'LEN'
SELECT CASE WHEN (CHARINDEX('-', @TEXT) > 0) THEN
SUBSTRING(@TEXT, CHARINDEX('E', @TEXT)+1, (CAST(CHARINDEX('-',@TEXT) AS INT) - CAST(CHARINDEX('E',@TEXT)+1 AS INT)))
ELSE
SUBSTRING(@TEXT, CHARINDEX('E', @TEXT)+1,LEN(@TEXT)- CHARINDEX('E', @TEXT))
END
END
Try it and comment for any improvements or if it does the job
select substring(@string, charindex(@first, @string) + len(@first), charindex(@second, @string) - (charindex(@first, @string) + len(@first)))
Suppose we have the string DUMMY_DATA_CODE_FILE and want the substring between the 2nd and 3rd underscore (_). We can use a query like this:
select SUBSTRING('DUMMY_DATA_CODE_FILE',charindex('_', 'DUMMY_DATA_CODE_FILE', (charindex('_','DUMMY_DATA_CODE_FILE', 1))+1)+1, (charindex('_', 'DUMMY_DATA_CODE_FILE', (charindex('_','DUMMY_DATA_CODE_FILE', (charindex('_','DUMMY_DATA_CODE_FILE', 1))+1))+1)- charindex('_', 'DUMMY_DATA_CODE_FILE', (charindex('_','DUMMY_DATA_CODE_FILE', 1))+1)-1)) as Code
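The nested CHARINDEX calls are hard to read, so here is the same idea sketched with one variable per underscore position. The variable names (@s, @p1, @p2, @p3) are only illustrative, and DECLARE with an initializer assumes SQL Server 2008 or later.

```sql
DECLARE @s VARCHAR(100) = 'DUMMY_DATA_CODE_FILE';

-- Each search starts one character after the previous underscore.
DECLARE @p1 INT = CHARINDEX('_', @s);           -- 1st underscore, position 6
DECLARE @p2 INT = CHARINDEX('_', @s, @p1 + 1);  -- 2nd underscore, position 11
DECLARE @p3 INT = CHARINDEX('_', @s, @p2 + 1);  -- 3rd underscore, position 16

-- Start just after the 2nd underscore; length is the gap between 2nd and 3rd.
SELECT SUBSTRING(@s, @p2 + 1, @p3 - @p2 - 1) AS Code;
-- Code: CODE
```

If the string might contain fewer than three underscores, @p3 would be 0 and the length negative, so a CASE guard would be needed in that situation.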
I have the following string:
"FLEETWOOD DESIGNS 535353110XXXXX" (The X's are actually numbers I just wanted to hide them here)
Does anyone know how I can search through strings in SQL and extract numbers that are longer than, let's say, 10 characters?
This is quite an old post, but this might help someone else. I was searching for a user-defined function in SQL Server to extract only the numbers from a given string and, surprisingly, could not find exactly what I was looking for.
Here is the code of a function to "Extract a number from string in SQL" (valid for SQL Server). It is taken from Pinal Dave's fantastic blog; I've modified it only to return NULL if a NULL value is passed to the function.
CREATE FUNCTION [dbo].[ExtractInteger](@String VARCHAR(2000))
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @Count INT
DECLARE @IntNumbers VARCHAR(1000)
SET @Count = 1 -- SUBSTRING positions are 1-based
SET @IntNumbers = ''
IF @String IS NULL
RETURN NULL;
WHILE @Count <= LEN(@String)
BEGIN
IF SUBSTRING(@String,@Count,1) >= '0' AND SUBSTRING(@String,@Count,1) <= '9'
BEGIN
SET @IntNumbers = @IntNumbers + SUBSTRING(@String,@Count,1)
END
SET @Count = @Count + 1
END
RETURN @IntNumbers
END
Tests
select '"' + dbo.ExtractInteger('1a2b3c4d5e6f7g8h9i') + '"'
GO
select '"' + dbo.ExtractInteger('abcdefghi') + '"'
GO
select '"' + dbo.ExtractInteger(NULL) + '"'
GO
select '"' + dbo.ExtractInteger('') + '"'
GO
Results
"123456789"
""
NULL
""
You don't mention the DB engine, so we don't know what features are available...
If regular expressions are available, then a pattern like \d{10,} would match numbers with 10 or more digits.
In MySQL, REGEXP can only return true or false (1 or 0), so you'd have to use some ugly hack like
SELECT
LEAST(
INSTR(field,'0'),
INSTR(field,'1'),
INSTR(field,'2'),
INSTR(field,'3'),
INSTR(field,'4'),
INSTR(field,'5'),
INSTR(field,'6'),
INSTR(field,'7'),
INSTR(field,'8'),
INSTR(field,'9')
) AS startPos,
REVERSE(field) AS backward,
LEAST(
INSTR(backward,'0'),
INSTR(backward,'1'),
INSTR(backward,'2'),
INSTR(backward,'3'),
INSTR(backward,'4'),
INSTR(backward,'5'),
INSTR(backward,'6'),
INSTR(backward,'7'),
INSTR(backward,'8'),
INSTR(backward,'9')
) AS endPos,
SUBSTRING(field, startPos, endPos - startPos + 1)
FROM tab
WHERE(field REGEXP '[0-9]{10,}')
but this isn't perfect: it would extract the wrong substring for a string like "ABC 9 A 1234567891", not to mention that it is probably so slow that it would be faster to go through the data by hand.
SUBSTRING('FLEETWOOD DESIGNS 535353110XXXXX', 19, 14)
You could also use LEN() to get the length of the string itself. If you know the serial number length, you can just subtract that from the end index to get your start index of the substring.
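As a rough sketch of the LEN() idea, assuming the serial number always sits at the end of the string and its length (14 here) is known in advance; @s and @serialLen are names made up for the example:

```sql
DECLARE @s VARCHAR(100) = 'FLEETWOOD DESIGNS 535353110XXXXX';
DECLARE @serialLen INT = 14;  -- assumed known length of the trailing number

-- Start index = total length - serial length + 1 (here 32 - 14 + 1 = 19).
SELECT SUBSTRING(@s, LEN(@s) - @serialLen + 1, @serialLen) AS Serial,
       RIGHT(@s, @serialLen)                               AS SerialViaRight;
-- Both columns: 535353110XXXXX
```

RIGHT() does the same subtraction internally, so it is the shorter option when the number is always the last thing in the string.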
It could be done like this
Declare @X varchar(100)
Select @X= 'Here is where15234Numbers'
--
Select @X= SubString(@X,PATINDEX('%[0-9]%',@X),Len(@X))
Select @X= SubString(@X,0,PATINDEX('%[^0-9]%',@X))
--// show result
Select @X
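To pick out only a run of at least 10 digits, as the question asks, the same PATINDEX approach can be extended by repeating the [0-9] class. This is a minimal sketch, not a tested production solution; the sample string and the @minDigits, @pattern, @start and @tail names are assumptions for illustration.

```sql
DECLARE @s VARCHAR(100) = 'ABC 9 A 1234567891 tail';
DECLARE @minDigits INT = 10;

-- Builds '%[0-9][0-9]...[0-9]%' with @minDigits digit classes.
DECLARE @pattern VARCHAR(200) = '%' + REPLICATE('[0-9]', @minDigits) + '%';

-- Position of the first run of at least @minDigits digits (0 if none).
DECLARE @start INT = PATINDEX(@pattern, @s);

-- Everything from that position to the end of the string.
DECLARE @tail VARCHAR(100) = CASE WHEN @start > 0 THEN SUBSTRING(@s, @start, LEN(@s)) END;

-- Cut the tail at its first non-digit; the appended 'x' guarantees one exists.
SELECT CASE WHEN @start > 0
            THEN LEFT(@tail, PATINDEX('%[^0-9]%', @tail + 'x') - 1)
       END AS LongNumber;
-- LongNumber: 1234567891
```

The lone "9" earlier in the string is skipped because it is not followed by nine more digits, and the query returns NULL when no run is long enough.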
I need to replace a null character in a SQL string, but I can't seem to find the right command to achieve this. I have used replace(myString, '\0', ''), but this does not seem to work. Any help would be great.
The trick that works is to COLLATE your value to Latin1_General_BIN before using REPLACE and also use nchar(0x00) COLLATE Latin1_General_BIN for string_pattern.
REPLACE ( string_expression , string_pattern , string_replacement )
select
[Terminated] = N'123' + nchar(0) + N'567'
,[Replaced with -] = REPLACE((N'123' + nchar(0) + N'567') COLLATE Latin1_General_BIN
, nchar(0x00) COLLATE Latin1_General_BIN
,'-')
,[Removed] = REPLACE((N'123' + nchar(0) + N'567') COLLATE Latin1_General_BIN
, nchar(0x00) COLLATE Latin1_General_BIN
,'')
Here is the result (use Output To Text):
Terminated Replaced with - Removed
---------- ----------------- --------
123 567    123-567           123567
Use this:
REPLACE(myString, char(0), '')
These functions remove null characters from Unicode strings, at least in SQL Server 2008.
-- Remove all null characters
CREATE FUNCTION RemoveNulls(@s nvarchar(max))
RETURNS nvarchar(max)
AS
BEGIN
DECLARE @r nvarchar(max);
SET @r = REPLACE(@s COLLATE Latin1_General_BIN, NCHAR(0), N'');
RETURN @r;
END
GO
-- Remove all characters from the first null character
CREATE FUNCTION TrimNull(@s nvarchar(max))
RETURNS nvarchar(max)
AS
BEGIN
DECLARE @r nvarchar(max);
DECLARE @i int = CHARINDEX(NCHAR(0), @s COLLATE Latin1_General_BIN);
IF @i = 0
SET @r = @s;
ELSE
SET @r = SUBSTRING(@s, 1, @i - 1);
RETURN @r;
END
GO
-- Example usage
DECLARE @s nvarchar(10) = N'Test' + NCHAR(0) + N'!';
SELECT dbo.RemoveNulls(@s), dbo.TrimNull(@s);
--> Test!, Test
In my case, fields from ODBC were padded to 8000 characters with null and TrimNull was much faster than RemoveNulls.
For Latin characters:
select REPLACE('Ho'+CHAR(0)+'mer' COLLATE SQL_Latin1_General_CP1_CS_AS, CHAR(0), '')
For Russian characters:
select REPLACE(('Го'+CHAR(0)+'мер') COLLATE Cyrillic_General_BIN , CHAR(0), '')
If you only have ASCII (Char/VarChar) strings then this will work, as @DyingCactus suggests:
REPLACE(myString, Char(0x00), '')
However, if you are dealing with Null-Terminated Strings and are trying to fix or convert to something like XML, and your data is Unicode (nChar/nVarChar), then use this:
(CASE WHEN UNICODE(SUBSTRING(myString, LEN(myString), 1)) = 0x0000
THEN SUBSTRING(myString, 1, LEN(myString) - 1)
ELSE myString END)
This works for both ASCII (Char/VarChar) and Unicode (nChar/nVarChar).
Note
Using the Replace() function with Char(0) or nChar(0) will NOT work for Unicode (nChar/nVarChar).
It's a bug in the SQL Server Replace() function.
You could cast as VarChar, then use Replace(), but then you would lose any special Unicode/Non-ASCII characters you might have intended to keep.
Otherwise you wouldn't have used the Unicode datatype (that takes up twice as much space to store your data) in the first place.
If you have Null-Characters mixed in with your Unicode strings (and not only at the end), and, for the purposes of your query, maintaining Unicode-specific characters is unimportant, then as a last resort you could use this:
(CASE WHEN myString LIKE (N'%' + nCHAR(0x0000) + N'%')--Has Null-Character(s).
THEN REPLACE(CAST(myString as VarChar(MAX)), Char(0x00), '')--Cast as ASCII
ELSE myString END)--Else, leave as Unicode to preserve Unicode-Only chars.
I'm not completely sure what is wrong with your strings, but here are some things to try. Are you using varchar? Please edit the question with more details.
If you have NULL characters within a string:
declare @x varchar(10)
set @x='123'+char(0)+'456'
SELECT @x AS Has_NULL_in_it, REPLACE(@x, char(0), '') AS Has_NULL_removed
OUTPUT:
Has_NULL_in_it Has_NULL_removed
-------------- ----------------
123 456 123456
(1 row(s) affected)
If you can't tell which character is within the string, try this for ASCII:
DECLARE @y varchar(10),@c int
set @y='123'+char(0)+'456'
set @c=0
WHILE @c<LEN(@y)
BEGIN
SET @c=@c+1
PRINT CONVERT(varchar(5),@c)+' - '+SUBSTRING(@y,@c,1)+' - CHAR('+CONVERT(varchar(5),ASCII(SUBSTRING(@y,@c,1)))+')'
END
OUTPUT:
1 - 1 - CHAR(49)
2 - 2 - CHAR(50)
3 - 3 - CHAR(51)
4 - - CHAR(0)
5 - 4 - CHAR(52)
6 - 5 - CHAR(53)
7 - 6 - CHAR(54)
Try this for Unicode:
DECLARE @y nvarchar(10),@c int
set @y='123'+char(0)+'456'
set @c=0
WHILE @c<LEN(@y)
BEGIN
SET @c=@c+1
PRINT CONVERT(nvarchar(5),@c)+' - '+SUBSTRING(@y,@c,1)+' - UNICODE('+CONVERT(nvarchar(5),UNICODE(SUBSTRING(@y,@c,1)))+')'
END
If you have strings that are completely NULL:
declare @z varchar(10)
set @z=NULL
select @z AS IS_NULL, ISNULL(@Z,'') AS NULL_Removed
OUTPUT:
IS_NULL NULL_Removed
---------- ------------
NULL
(1 row(s) affected)
If you are concatenating values to get your string, use IsNull(value, replacement) to avoid having null values, or set CONCAT_NULL_YIELDS_NULL OFF so that concatenating a null value does not turn the whole result into NULL.
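A small sketch of the IsNull approach, assuming default session settings; the variable names and values are made up for the example:

```sql
DECLARE @first  VARCHAR(20) = 'Jane';
DECLARE @middle VARCHAR(20) = NULL;   -- the missing piece
DECLARE @last   VARCHAR(20) = 'Doe';

SELECT
    -- A single NULL operand makes the whole + concatenation NULL by default.
    @first + ' ' + @middle + ' ' + @last             AS PlainConcat,
    -- ISNULL substitutes an empty string, so the other parts survive.
    @first + ' ' + ISNULL(@middle, '') + ' ' + @last AS WithIsNull;
```

On SQL Server 2012 and later, the CONCAT() function is another option, since it treats NULL arguments as empty strings.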
We had the same problem: Ending \0 character in nvarchar fields and unable to replace it with any of the REPLACE variants proposed (SQL Server 2008). When using
LEFT(Bar, LEN(Bar)-1)
it cut off the last regular character together with the \0!
Our solution to correct the fields is now (as weird as it may seem at first glance):
UPDATE Foo
SET Bar = LEFT(Bar, LEN(Bar))
WHERE RIGHT(Bar, 1) = CHAR(0)
Examples resolved
CREATE FUNCTION dbo.F_ReplaceNullChar( @STR NVARCHAR(MAX) )
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @i INT=0
DECLARE @RET NVARCHAR(MAX)=''
WHILE @I<LEN(@STR)
BEGIN
SET @i=@i+1
IF UNICODE(SUBSTRING(@STR,@i,1)) <> 0x0000
SET @RET=@RET+SUBSTRING(@STR,@i,1)
END
RETURN @RET
END
GO
SELECT LEN(mycol) lenbefore,mycol,
LEN( dbo.F_ReplaceNullChar(mycol)) lenafter, dbo.F_ReplaceNullChar(mycol) mycolafter
FROM mytab
select zz.xx
, replace(zz.xx, '&#x0;', '')
from (
select
t.string_with_null,
(
select s.string_with_null+''
from TABLE_1 s
where s.token_hash = t.token_hash
for xml path('')
) xx
from TABLE_1 t(nolock)
)zz