TSQL - Extract text between two words


I found some information on the site but cannot make it work correctly. I have a text field [User] that contains USER: John.Smith SessionId: (there is a space after USER: and one after the name).
Everything I have tried removes either the first section or the last one, but never both, or it fails with the message: Invalid length parameter passed to the LEFT or SUBSTRING function.
I want to extract the name John.Smith from that field.
If possible, I do not want to declare any tables.
Thanks

Why not use replace()?
select replace(replace(col, 'USER: ', ''), ' SessionId:', '')

If open to a TVF
Example
Select A.ID
,B.*
From YourTable A
Cross Apply [dbo].[tvf-Str-Extract](SomeCol,'USER:','SessionId:') B
Returns
ID RetSeq RetVal
1 1 John.Smith
The Function if Interested
CREATE FUNCTION [dbo].[tvf-Str-Extract] (@String varchar(max),@Delim1 varchar(100),@Delim2 varchar(100))
Returns Table
As
Return (
Select RetSeq = row_number() over (order by RetSeq)
,RetVal = left(RetVal,charindex(@Delim2,RetVal)-1)
From (
Select RetSeq = row_number() over (order by 1/0)
,RetVal = ltrim(rtrim(B.i.value('(./text())[1]', 'varchar(max)')))
From ( values (convert(xml,'<x>' + replace((Select replace(@String,@Delim1,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>').query('.'))) as A(XMLData)
Cross Apply XMLData.nodes('x') AS B(i)
) C1
Where charindex(@Delim2,RetVal)>1
)
/*
Declare @String varchar(max) = 'Dear [[FirstName]] [[LastName]], ...'
Select * From [dbo].[tvf-Str-Extract] (@String,'[[',']]')
*/

I got SUBSTRING() to work:
SUBSTRING([User], 7, CHARINDEX(' SessionId', [User]) - 7)
Where:
7 = the position of the first character after "USER: " (6 characters, including the space)
CHARINDEX(' SessionId', [User]) gives the position of the space just before "SessionId"
Subtracting 7 from that position gives the length of the name
[User] needs the brackets because USER is a reserved word in T-SQL
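The "Invalid length parameter" error from the question usually means a marker was not found, so the computed length went negative. Below is a minimal sketch that guards against that case. It assumes the value starts with "USER: " whenever a name is present; the @User variable and its sample value are made up for illustration.

```sql
-- Hypothetical sample value; in a real query, use the [User] column instead of @User.
DECLARE @User varchar(200) = 'USER: John.Smith SessionId: ';

SELECT SUBSTRING(
           @User,
           7,  -- first character after 'USER: '
           -- NULLIF turns "marker not found" (0) into NULL, so SUBSTRING
           -- returns NULL instead of raising the invalid-length error.
           NULLIF(CHARINDEX(' SessionId', @User), 0) - 7
       ) AS UserName;
```

With the sample value this returns John.Smith. For a row that does not contain " SessionId" it returns NULL rather than failing.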

Related

Extract string between two characters in a string

I have a set of strings that has datetime values and I would like to extract them. I am not sure if this is even possible using T-SQL.
CREATE TABLE #Temp (
BLOB_NM VARCHAR(100)
);
INSERT INTO #Temp
SELECT 'products_country_20200528102030.txt'
UNION ALL
SELECT 'products_territory_20190528102030.txt'
UNION ALL
SELECT 'products_country_2020-05-20_20200528102030.txt'
;
Expected Results:
20200528102030
20190528102030
20200528102030

For this dataset, string functions should do it:
select blob_nm, substring(blob_nm, len(blob_nm) - 17, 14) res from #temp
The idea is to count backwards from the end of the string and capture the 14 characters that precede the extension (represented by the last 4 characters of the string).
Demo on DB Fiddle:
blob_nm | res
:--------------------------------------------- | :-------------
products_country_20200528102030.txt | 20200528102030
products_territory_20190528102030.txt | 20190528102030
products_country_2020-05-20_20200528102030.txt | 20200528102030

If interested in a helper function... I created this TVF because I was tired of extracting portions of strings (left, right, charindex, reverse, substring, etc.)
Example
Select *
From #Temp A
Cross Apply [dbo].[tvf-Str-Extract](Blob_NM,'_','.') B
Returns
BLOB_NM RetSeq RetVal
products_country_20200528102030.txt 1 20200528102030
products_territory_20190528102030.txt 1 20190528102030
products_country_2020-05-20_20200528102030.txt 1 20200528102030
The Function if Interested
CREATE FUNCTION [dbo].[tvf-Str-Extract] (@String varchar(max),@Delim1 varchar(100),@Delim2 varchar(100))
Returns Table
As
Return (
Select RetSeq = row_number() over (order by RetSeq)
,RetVal = left(RetVal,charindex(@Delim2,RetVal)-1)
From (
Select RetSeq = row_number() over (order by 1/0)
,RetVal = ltrim(rtrim(B.i.value('(./text())[1]', 'varchar(max)')))
From ( values (convert(xml,'<x>' + replace((Select replace(@String,@Delim1,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>').query('.'))) as A(XMLData)
Cross Apply XMLData.nodes('x') AS B(i)
) C1
Where charindex(@Delim2,RetVal)>1
)

I am assuming that:
File extensions are not always 3 characters long
The date/time value is always 14 characters long
Try this:
select
CONVERT(DATETIME, STUFF(STUFF(STUFF(left(right(BLOB_NM, charindex('_', reverse(BLOB_NM) + '_') - 1), 14),13,0,':'),11,0,':'),9,0,' ')) as Result
from #Temp

Split string and fold in T-SQL

Is it possible to split a delimited string and then 'fold' the delimited parts such that the result is a string containing all possible 'paths'? I'm looking to purely use built-in functions if possible without resorting to recursive CTEs, etc.
This is a common functional pattern known as scan/fold. Wondering if T-SQL has a similar pattern.
Example
FOLD('A|B|C|D') = '[A],[A|B],[A|B|C],[A|B|C|D]'
EDIT: The order of the substrings must remain the same in the result. The target SQL version is Azure SQL.

If you have SQL Server 2017 you can use STRING_AGG and STRING_SPLIT
declare @text VARCHAR(MAX) ='A|B|C|D'
declare @result VARCHAR(MAX) = ''
SELECT @result = @result + ',[' + STRING_AGG(X.value, '|') + ']' FROM
STRING_SPLIT(@text ,'|') X
INNER JOIN STRING_SPLIT(@text ,'|') Y
ON X.value <= Y.Value
GROUP BY Y.Value
SET @result = STUFF(@result,1,1,'')
print @result
Result:
[A],[A|B],[A|B|C],[A|B|C|D]

As I note in the comments, STRING_SPLIT has a big caveat in the documentation:
The order is not guaranteed to match the order of the substrings in the input string.
As a result, you're safer using a function that gives you the ordinal position. In this case I use DelimitedSplit8K_LEAD and assume you are using SQL Server 2017+:
DECLARE @YourString varchar(20) = 'A|B|C|D';
WITH Splits AS(
SELECT DS.ItemNumber,
DS.Item
FROM dbo.DelimitedSplit8K_LEAD(@YourString,'|') DS),
Groups AS(
SELECT S1.ItemNumber,
CONCAT('[',STRING_AGG(S2.Item,'|') WITHIN GROUP (ORDER BY S2.ItemNumber),']') AS Agg
FROM Splits S1
JOIN Splits S2 ON S1.ItemNumber >= S2.ItemNumber
GROUP BY S1.ItemNumber)
SELECT STRING_AGG(Agg,',') WITHIN GROUP (ORDER BY ItemNumber)
FROM Groups;
If you aren't on SQL Server 2017+, you'll need to use the "old" FOR XML PATH (and STUFF) method.
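Since the question targets Azure SQL, the same query can be sketched without a custom splitter. This assumes a platform where STRING_SPLIT accepts the optional third enable_ordinal argument (Azure SQL Database, or SQL Server 2022 and later). The column aliases Item and ItemNumber are chosen here only to mirror the query above.

```sql
DECLARE @YourString varchar(max) = 'A|B|C|D';

WITH Splits AS (
    -- enable_ordinal = 1 adds an [ordinal] column holding each item's 1-based position
    SELECT value   AS Item,
           ordinal AS ItemNumber
    FROM STRING_SPLIT(@YourString, '|', 1)
),
Groups AS (
    -- For every item, aggregate it together with all items that precede it
    SELECT S1.ItemNumber,
           CONCAT('[', STRING_AGG(S2.Item, '|') WITHIN GROUP (ORDER BY S2.ItemNumber), ']') AS Agg
    FROM Splits S1
    JOIN Splits S2 ON S1.ItemNumber >= S2.ItemNumber
    GROUP BY S1.ItemNumber
)
SELECT STRING_AGG(Agg, ',') WITHIN GROUP (ORDER BY ItemNumber) AS Folded
FROM Groups;
```

Because both aggregations order by the ordinal rather than by the item text, the result does not depend on the items happening to sort alphabetically.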

Just in case you don't want (or can't use) that SUPER DelimitedSplit8K_LEAD, here is an XML approach that will maintain the sequence
Example
Declare @S varchar(max) = 'A|B|C|D'
;with cte as (
Select RetSeq = row_number() over (order by 1/0)
,RetVal = ltrim(rtrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>' + replace((Select replace(@S,'|','§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
), cte1 as (
Select *
,Comb ='['+stuff((select '|' +RetVal From cte Where RetSeq<=A.RetSeq Order By RetSeq For XML Path ('')),1,1,'') +']'
From cte A
Group By RetSeq,RetVal
)
Select NewValue = stuff((select ',' +Comb From cte1 Order By RetSeq For XML Path ('')),1,1,'')
Returns
NewValue
[A],[A|B],[A|B|C],[A|B|C|D]

Starting with v2016 there is JSON, which allows for a position-safe splitter using a JSON array. The path can be built with a recursive CTE:
DECLARE @yourString VARCHAR(MAX) ='A|B|C|D';
WITH cte AS
(
SELECT A.[key] AS ItmIndex
,A.[value] AS ItmVal
FROM OPENJSON(CONCAT('["',REPLACE(@yourString,'|','","'),'"]')) A
)
,rcte AS
(
SELECT ItmIndex, ItmVal
,CAST(ItmVal AS VARCHAR(MAX)) AS Result
FROM cte
WHERE ItmIndex=0
UNION ALL
SELECT cte.ItmIndex, cte.ItmVal
,CAST(CONCAT(rcte.Result,'|',cte.ItmVal) AS VARCHAR(MAX))
FROM cte
INNER JOIN rcte ON cte.ItmIndex=rcte.ItmIndex+1
)
SELECT * FROM rcte;
The idea in short:
The first cte transforms your string into a set with a guaranteed sort order (unlike STRING_SPLIT()).
The second cte starts with the array's index 0 and then traverses the list, adding each item to the growing string.
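The recursive CTE returns one row per path, so the rows still have to be concatenated to get the single string from the question. A sketch of that last step follows. It assumes STRING_AGG is available (SQL Server 2017+ or Azure SQL) and that the items contain no characters that need JSON escaping, such as double quotes or backslashes. The Folded alias is made up for illustration.

```sql
DECLARE @yourString varchar(max) = 'A|B|C|D';

WITH cte AS
(
    -- OPENJSON returns [key] as text, so cast it to int for arithmetic and ordering
    SELECT CAST(A.[key] AS int) AS ItmIndex
          ,A.[value]            AS ItmVal
    FROM OPENJSON(CONCAT('["', REPLACE(@yourString, '|', '","'), '"]')) A
)
,rcte AS
(
    -- Anchor: the first item on its own
    SELECT ItmIndex, CAST(ItmVal AS varchar(max)) AS Result
    FROM cte
    WHERE ItmIndex = 0
    UNION ALL
    -- Recursive step: append the next item to the path built so far
    SELECT cte.ItmIndex, CAST(CONCAT(rcte.Result, '|', cte.ItmVal) AS varchar(max))
    FROM cte
    INNER JOIN rcte ON cte.ItmIndex = rcte.ItmIndex + 1
)
SELECT STRING_AGG(CONCAT('[', Result, ']'), ',') WITHIN GROUP (ORDER BY ItmIndex) AS Folded
FROM rcte;
```

For 'A|B|C|D' this yields [A],[A|B],[A|B|C],[A|B|C|D]. Lists with more than 100 items would also need an OPTION (MAXRECURSION n) hint, because the default recursion limit is 100.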

Select string in between two strings

I need to output a string in between two strings. The problem is sometimes one of the two reference strings will be missing. If the first reference string is not missing and the second reference string is missing, I want to output from the first reference string to the end of the string. If the first reference string is missing, I want to output null or blank.
I saw a similar post but it included the reference strings. In my case, I do not want to include the reference strings.
SELECT SUBSTRING(@Text, CHARINDEX('1stRefStr', @Text)
, CHARINDEX('2ndRefStr',@text) - CHARINDEX('1stRefStr', @Text) + Len('2ndRefStr'))
Example:
Patient: A Date: 1/1/1 Message: Hi Message Sent To: B
1st string reference is "Message:"
2nd string reference is "Message Sent To:"
Expected Result:
Hi

If you don't mind a helper function.
Being a TVF, it is easy to incorporate into a CROSS APPLY if your data is in a table.
I modified a split/parse function to accept two non-like delimiters.
Example
Declare @Text varchar(max) = 'Patient: A Date: 1/1/1 Message: Hi Message Sent To: B'
Select *
From [dbo].[tvf-Str-Extract](@Text,'Message:','Message Sent') A
Returns
RetSeq RetVal
1 Hi
The Function
CREATE FUNCTION [dbo].[tvf-Str-Extract] (@String varchar(max),@Delim1 varchar(100),@Delim2 varchar(100))
Returns Table
As
Return (
Select RetSeq = row_number() over (order by RetSeq)
,RetVal = left(RetVal,charindex(@Delim2,RetVal)-1)
From (
Select RetSeq = row_number() over (order by 1/0)
,RetVal = ltrim(rtrim(B.i.value('(./text())[1]', 'varchar(max)')))
From ( values (convert(xml,'<x>' + replace((Select replace(@String,@Delim1,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>').query('.'))) as A(XMLData)
Cross Apply XMLData.nodes('x') AS B(i)
) C1
Where charindex(@Delim2,RetVal)>1
)
Update As A CROSS APPLY
Declare @YourTable table (ID int,SomeCol varchar(max))
Insert Into @YourTable values
(1,'Patient: A Date: 1/1/1 Message: Hi Message Sent To: B')
Select A.ID
,B.*
From @YourTable A
Cross Apply (
Select RetSeq = row_number() over (order by RetSeq)
,RetVal = left(RetVal,charindex('Message Sent',RetVal)-1)
From (
Select RetSeq = row_number() over (order by 1/0)
,RetVal = ltrim(rtrim(B.i.value('(./text())[1]', 'varchar(max)')))
From ( values (convert(xml,'<x>' + replace((Select replace(SomeCol,'Message:','§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>').query('.'))) as A(XMLData)
Cross Apply XMLData.nodes('x') AS B(i)
) C1
Where charindex('Message Sent',RetVal)>1
) B
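The helper above returns no row when the second delimiter is absent (its WHERE clause requires it), whereas the question wants the rest of the string in that case. Here is a sketch that covers the missing-reference cases with built-in functions only. The variable names @Ref1 and @Ref2 and the aliases StartPos and EndPos are made up for illustration.

```sql
DECLARE @Text varchar(max) = 'Patient: A Date: 1/1/1 Message: Hi Message Sent To: B';
DECLARE @Ref1 varchar(100) = 'Message:';          -- first reference string
DECLARE @Ref2 varchar(100) = 'Message Sent To:';  -- second reference string

SELECT LTRIM(RTRIM(
           CASE
               -- First reference missing: return NULL
               WHEN P.StartPos IS NULL THEN NULL
               -- Second reference missing: take everything up to the end of the string
               WHEN P.EndPos = 0 THEN SUBSTRING(@Text, P.StartPos, LEN(@Text))
               -- Both present: take the text between them
               ELSE SUBSTRING(@Text, P.StartPos, P.EndPos - P.StartPos)
           END)) AS Result
FROM (
    -- Position just after the first reference, or NULL when it is not found
    SELECT NULLIF(CHARINDEX(@Ref1, @Text), 0) + LEN(@Ref1) AS StartPos
) S
CROSS APPLY (
    -- Search for the second reference only after the first one
    SELECT S.StartPos,
           CHARINDEX(@Ref2, @Text, ISNULL(S.StartPos, 1)) AS EndPos
) P;
```

With the sample text this returns Hi. Against a table, the same expression can be used by replacing @Text with the column name.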

Pad zeros for nvarchar column in table

I want to add a leading zero to single-digit values only (the values separated by the dot and the one after TU).
Input:
1.3.45 TU 3
1.2.5 TU 8
Expected Output:
01034503
01020508
Current query:
select REPLACE(
replace(
replace(@Column,'TU','') -- remove TU
,'.','' -- remove dot
)
,' ','') -- remove space
from Table;
Current Output:
13453
1258

If SQL Server, you can use a Split/Parse function to normalize the string
Declare @YourTable Table (YourField varchar(25))
Insert Into @YourTable values
('1.3.45 TU 3'),
('1.2.5 TU 8')
Select A.*
,NewField = B.String
From @YourTable A
Cross Apply (
Select String = ltrim((Select cast(RetVal as varchar(25))
From (Select RetSeq,RetVal=Right('00'+RetVal,2)
From [dbo].[udf-Str-Parse](replace(YourField,' ','.'),'.')
Where Try_Convert(int,RetVal)>=0 ) A
For XML Path ('')))
) B
Returns
YourField NewField
1.3.45 TU 3 01034503
1.2.5 TU 8 01020508
The UDF if needed
CREATE FUNCTION [dbo].[udf-Str-Parse] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>'+ Replace(@String,@Delimiter,'</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
);
--Select * from [dbo].[udf-Str-Parse]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')

Where are the zeros? You want something like this:
select ('0' + -- initial zero
replace(replace(replace(@Column, 'TU', '' -- remove TU
), '.', '0' -- replace dot with zero
), ' ', ''
) -- remove space
)
from Table;
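Note that replacing every dot with a zero also inserts a zero after two-digit values: for '1.3.45 TU 3' the query above produces 01030453 rather than the expected 01034503. Here is a sketch that pads each value separately using the built-in PARSENAME. It assumes every row has exactly four values, that each value has at most two digits, and that the separator before the last value is exactly ' TU '. The table variable and column names are borrowed from the other answer for illustration.

```sql
DECLARE @YourTable table (YourField varchar(25));
INSERT INTO @YourTable VALUES ('1.3.45 TU 3'), ('1.2.5 TU 8');

SELECT T.YourField,
       -- PARSENAME counts parts from the right, so part 4 is the leftmost value.
       -- RIGHT('0' + part, 2) pads single digits and leaves two-digit values as they are.
       RIGHT('0' + PARSENAME(N.Dotted, 4), 2)
     + RIGHT('0' + PARSENAME(N.Dotted, 3), 2)
     + RIGHT('0' + PARSENAME(N.Dotted, 2), 2)
     + RIGHT('0' + PARSENAME(N.Dotted, 1), 2) AS NewField
FROM @YourTable T
-- Turn '1.3.45 TU 3' into '1.3.45.3' so that every value is a dot-separated part
CROSS APPLY (SELECT REPLACE(T.YourField, ' TU ', '.') AS Dotted) N;
```

For the two sample rows this returns 01034503 and 01020508. PARSENAME handles at most four parts, so longer strings would need a split function instead.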

Line break or Carriage return in a Delimited Field in Sql

I have an email column that stores more than 10 emails in each row. Now I want to write a query that puts each email on a separate line, e.g.:
hay@line.com
u@y.com
live.gmail.com
How do I write this?

If you mean rows of data, any Parse/Split function will do if you don't have SQL Server 2016. Otherwise use REPLACE(), as JohnHC mentioned.
Declare @YourTable table (ID int,Emails varchar(max))
Insert Into @YourTable values
(1,'hay@line.com,u@y.com,live.gmail.com')
Select A.ID
,EMail=B.RetVal
From @YourTable A
Cross Apply [dbo].[udf-Str-Parse](A.EMails,',') B
Returns
ID EMail
1 hay@line.com
1 u@y.com
1 live.gmail.com
Or Simply
Select * from [dbo].[udf-Str-Parse]('hay@line.com,u@y.com,live.gmail.com',',')
Returns
RetSeq RetVal
1 hay@line.com
2 u@y.com
3 live.gmail.com
The Function if Needed
CREATE FUNCTION [dbo].[udf-Str-Parse] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>'+ Replace(@String,@Delimiter,'</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
);
--Select * from [dbo].[udf-Str-Parse]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')

Use Replace()
select replace(MyEmailField, '<CurrentDelimiter>', char(13)) as NewEmail
from MyTable
