Extract email address from string using tsql


I'm trying to extract email addresses from an existing comments field and put them into their own column. The string may be something like "this is an example comment with an email address of someemail@domain.org" or just the email address itself, "someemail@domain.org".
I figure the best approach would be to find the index of the '@' symbol and search in both directions until either the end of the string is hit or there is a space. Can anyone help me out with this implementation?

I know wewesthemenace already answered the question, but his/her solution seems overly complicated. Why concatenate the left and right sides of the email address? I'd rather just find the beginning and the end of the email address and then use SUBSTRING to return it, like so:
My table:
DECLARE @Table TABLE (comment NVARCHAR(50));
INSERT INTO @Table
VALUES ('blah MyEmailAddress@domain.org'), --At the end
('blah MyEmailAddress@domain.org blah blah'), --In the middle
('MyEmailAddress@domain.org blah'), --At the beginning
('no email');
Actual query:
SELECT comment,
CASE
WHEN CHARINDEX('@',comment) = 0 THEN NULL
ELSE SUBSTRING(comment,beginningOfEmail,endOfEmail-beginningOfEmail)
END email
FROM @Table
-- end of the email: first space at or after the '@' (the appended space guarantees a match)
CROSS APPLY (SELECT CHARINDEX(' ',comment + ' ',CHARINDEX('@',comment))) AS A(endOfEmail)
-- start of the email: find the space before the '@' in the reversed, space-prefixed string and map it back
-- (DATALENGTH/2 is the character count of an NVARCHAR value, trailing spaces included)
CROSS APPLY (SELECT DATALENGTH(comment)/2 - CHARINDEX(' ',REVERSE(' ' + comment),CHARINDEX('@',REVERSE(' ' + comment))) + 2) AS B(beginningOfEmail)
Results:
comment                                   email
----------------------------------------  -------------------------
blah MyEmailAddress@domain.org            MyEmailAddress@domain.org
blah MyEmailAddress@domain.org blah blah  MyEmailAddress@domain.org
MyEmailAddress@domain.org blah            MyEmailAddress@domain.org
no email                                  NULL
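To see how the two CROSS APPLY expressions arrive at those positions, the sketch below traces them for a single value held in a variable (the variable names are illustrative, not part of the answer). It assumes an NVARCHAR value, which is why DATALENGTH is halved: DATALENGTH counts bytes, two per NVARCHAR character, and unlike LEN it does not ignore trailing spaces.

```sql
-- Trace the answer's two CROSS APPLY calculations for one sample comment.
DECLARE @comment NVARCHAR(50) = N'blah MyEmailAddress@domain.org blah blah';

-- End of the email: first space at or after the '@'.
-- Appending ' ' guarantees a match when the email ends the string.
DECLARE @endOfEmail INT = CHARINDEX(' ', @comment + ' ', CHARINDEX('@', @comment));

-- Start of the email: in the reversed, space-prefixed string, find the '@',
-- then the next space after it (the space just before the email),
-- and map that position back onto the original string.
DECLARE @beginningOfEmail INT =
    DATALENGTH(@comment) / 2
    - CHARINDEX(' ', REVERSE(' ' + @comment), CHARINDEX('@', REVERSE(' ' + @comment)))
    + 2;

SELECT @endOfEmail AS endOfEmail,             -- 31
       @beginningOfEmail AS beginningOfEmail, -- 6
       SUBSTRING(@comment, @beginningOfEmail, @endOfEmail - @beginningOfEmail) AS email; -- MyEmailAddress@domain.org
```

For this value the email occupies positions 6 through 30, so SUBSTRING is asked for 31 - 6 = 25 characters.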

You can search for '@' in the string. Then take the strings to the LEFT and RIGHT of the '@'. REVERSE the LEFT side, take the SUBSTRING up to the first ' ', then REVERSE it again to restore its original order. The same principle applies to the RIGHT side, without the REVERSE.
Example string: 'some text someemail@domain.org some text'
1. LEFT = 'some text someemail'
2. RIGHT = '@domain.org some text'
3. Reverse LEFT = 'liameemos txet emos'
4. SUBSTRING up to the first space = 'liameemos'
5. REVERSE(4) = 'someemail'
6. SUBSTRING(2) up to the first space = '@domain.org'
7. Combine 5 and 6 = 'someemail@domain.org'
Your query would be:
;WITH CteEmail(email) AS(
SELECT 'someemail@domain.org' UNION ALL
SELECT 'some text someemail@domain.org some text' UNION ALL
SELECT 'no email'
)
,CteStrings AS(
SELECT
[Left] = LEFT(email, CHARINDEX('@', email, 0) - 1),
Reverse_Left = REVERSE(LEFT(email, CHARINDEX('@', email, 0) - 1)),
-- everything from the '@' to the end of the string
[Right] = RIGHT(email, LEN(email) - CHARINDEX('@', email, 0) + 1)
FROM CteEmail
WHERE email LIKE '%@%'
)
SELECT *,
REVERSE(
SUBSTRING(Reverse_Left, 0,
CASE
WHEN CHARINDEX(' ', Reverse_Left, 0) = 0 THEN LEN(Reverse_Left) + 1
ELSE CHARINDEX(' ', Reverse_Left, 0)
END
)
)
+
SUBSTRING([Right], 0,
CASE
WHEN CHARINDEX(' ', [Right], 0) = 0 THEN LEN([Right]) + 1
ELSE CHARINDEX(' ', [Right], 0)
END
)
FROM CteStrings
Sample Data:
email
----------------------------------------
someemail@domain.org
some text someemail@domain.org some text
no email
Result
---------------------
someemail@domain.org
someemail@domain.org

Stephan's answer is great when looking for a single email address in each row.
However, I ran into this error when trying to get multiple email addresses from each row:
Invalid length parameter passed to the LEFT or SUBSTRING function
I used this answer from DBA Stack Exchange to get all of the positions of '@' inside the string. It uses a table-valued function that returns one row per occurrence of a pattern in the string, holding that occurrence's position. I also had to modify the CROSS APPLY expressions to handle multiple email addresses.
My table:
DECLARE @Table TABLE (comment VARCHAR(500));
INSERT INTO @Table (comment)
VALUES ('blah blah My.EmailAddress@domain.org more blah someemailaddress@domain.com even more blah asdf@gmail.com'),
('blah hello.world@domain.org more'),
('no email')
Table-valued function:
CREATE FUNCTION dbo.fnFindPatternLocation
(
@string NVARCHAR(MAX),
@term NVARCHAR(255)
)
RETURNS TABLE
AS
RETURN
(
SELECT pos = Number - LEN(@term)
FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(@string, Number,
CHARINDEX(@term, @string + @term, Number) - Number)))
FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
FROM sys.all_objects) AS n(Number)
WHERE Number > 1 AND Number <= CONVERT(INT, LEN(@string))
AND SUBSTRING(@term + @string, Number, LEN(@term)) = @term
) AS y);
GO
Query:
SELECT comment, pos, SUBSTRING(comment,beginningOfEmail,endOfEmail-beginningOfEmail) AS email
FROM @Table
CROSS APPLY (SELECT pos FROM dbo.fnFindPatternLocation(comment, '@')) AS A(pos)
CROSS APPLY (SELECT CHARINDEX(' ',comment + ' ', pos)) AS B(endOfEmail)
CROSS APPLY (SELECT pos - CHARINDEX(' ', REVERSE(SUBSTRING(comment, 1, pos))) + 2) AS C(beginningOfEmail)
Results:
comment
---------------------------------------------------------------------------------------------------------
blah blah My.EmailAddress@domain.org more blah someemailaddress@domain.com even more blah asdf@gmail.com
blah blah My.EmailAddress@domain.org more blah someemailaddress@domain.com even more blah asdf@gmail.com
blah blah My.EmailAddress@domain.org more blah someemailaddress@domain.com even more blah asdf@gmail.com
blah hello.world@domain.org more
pos email
--- ------------------------------
26  My.EmailAddress@domain.org
64  someemailaddress@domain.com
95  asdf@gmail.com
17  hello.world@domain.org

DECLARE @t TABLE (row_id INT, email VARCHAR(100))
INSERT @t (row_id, email)
VALUES (1, 'drgkls<ivan@gvi.ru>, info@gvi.com, @ dgh507-16-65@'),
(2, 'hjshfkjshfj@kjs.kjsehf herwfjewr@kjsd.com adjfhja@.com u3483dhj@hhb@.dfj'),
(3, 'kjsdghfjs4254.23detygh@jhjdfg.dgb лдоврывплдоо isgfsi@ klsdfksdl@,dd.com')
DECLARE @pat VARCHAR(100) = '%[^a-z0-9@._ ]%';
WITH f AS (
SELECT row_id,
CAST(' ' + email + ' ' AS VARCHAR(102)) email,
SUBSTRING(email, PATINDEX(@pat, email), 1) bad,
PATINDEX(@pat, email) pat
FROM @t
UNION ALL
SELECT row_id,
CAST(REPLACE(email, bad, ' ') AS VARCHAR(102)),
SUBSTRING(REPLACE(email, bad, ' '), PATINDEX(@pat, REPLACE(email, bad, ' ')), 1) bad,
PATINDEX(@pat, REPLACE(email, bad, ' '))
FROM f
WHERE PATINDEX(@pat, email) > 0
),
s AS
(
SELECT row_id,
email, PATINDEX('%@%', email) pos
FROM f
WHERE pat = 0
AND PATINDEX('%@%', email) > 0
UNION ALL
SELECT row_id,
SUBSTRING(email, pos + 1, 102),
PATINDEX('%@%', SUBSTRING(email, pos + 1, 102))
FROM s
WHERE PATINDEX('%@%', SUBSTRING(email, pos + 1, 102)) > 0
)
SELECT row_id, o1 + pp
FROM s
CROSS APPLY (SELECT REVERSE(LEFT(email, pos -1)) s1) x
CROSS APPLY (SELECT CHARINDEX(' ', s1) i1) y
CROSS APPLY (SELECT REVERSE(LEFT(s1, i1 -1)) o1 WHERE i1 > 0) z
CROSS APPLY (SELECT CHARINDEX(' ', email, pos) i2) e
CROSS APPLY (SELECT SUBSTRING(email, pos, i2 -pos) pp WHERE i2 > pos + 1) q
WHERE LEN(o1) > 1
AND CHARINDEX('.', pp) > 0
AND PATINDEX('%@%@%', pp) = 0
AND PATINDEX('%@.%', pp) = 0
AND PATINDEX('%.', pp) = 0

This single line would also work (a bit of a long line, though, lol):
declare @a varchar(100)
set @a = 'a asfd saasd asdfgh@asd.com wqe z zx cxzc '
select substring(substring(@a,0,charindex('@',@a)),len(substring(@a,0,charindex('@',@a)))-charindex(' ',reverse(substring(@a,0,charindex('@',@a))))+2,len(substring(@a,0,charindex('@',@a)))) + substring(substring(@a,charindex('@',@a),len(@a)),0,charindex(' ',substring(@a,charindex('@',@a),len(@a))))

For strings that contain newline characters, I modified Felix's answer to use PATINDEX to search for the first control character or space, rather than only a space.
I also had to modify the Right field to subtract the correct amount of text.
WITH CteEmail(email) AS(
SELECT 'example string with new lines
Email: some.example@email.address.com
(first email address - should be returned)
Email: another@test.co.uk
(other email addresses should be ignored)
more example text' UNION ALL
SELECT 'Email: some.example@email.address.com' UNION ALL
SELECT 'someemail@domain.org' UNION ALL
SELECT 'some text someemail@domain.org some text' UNION ALL
SELECT 'no email'
)
,CteStrings AS(
SELECT
[Left] = LEFT(email, CHARINDEX('@', email, 0) - 1),
Reverse_Left = REVERSE(LEFT(email, CHARINDEX('@', email, 0) - 1)),
[Right] = RIGHT(email, LEN(email) - CHARINDEX('@', email, 0) + 1 )
FROM CteEmail
WHERE email LIKE '%@%'
)
SELECT *,
REVERSE(
SUBSTRING(Reverse_Left, 0,
CASE
-- '[' + CHAR(0) + '- ]' is a range intended to cover the control characters up to and including the space
WHEN PATINDEX('%[' + CHAR(0)+'- ]%', Reverse_Left) = 0 THEN LEN(Reverse_Left) + 1
ELSE PATINDEX('%[' + CHAR(0)+'- ]%', Reverse_Left)
END
)
)
+
SUBSTRING([Right], 0,
CASE
WHEN PATINDEX('%[' + CHAR(0)+'- ]%', [Right]) = 0 THEN LEN([Right]) + 1
ELSE PATINDEX('%[' + CHAR(0)+'- ]%', [Right])
END
)
FROM CteStrings
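The CHAR(0)-to-space range above depends on how the collation orders characters, since LIKE and PATINDEX ranges follow collation order. As an alternative sketch (not part of the answer above), you can list the whitespace characters explicitly in the class, here tab, LF, CR and space, so no range is involved. The sample string and variable names are hypothetical.

```sql
-- Hypothetical sample: the address is followed by CR+LF instead of a space.
DECLARE @s VARCHAR(200) = 'Email: some.example@email.address.com' + CHAR(13) + CHAR(10) + 'more text';

-- Character class listing tab, LF, CR and space explicitly (no range).
DECLARE @ws VARCHAR(10) = '%[' + CHAR(9) + CHAR(10) + CHAR(13) + ' ]%';

DECLARE @at INT = CHARINDEX('@', @s);
DECLARE @beforeAtReversed VARCHAR(200) = REVERSE(LEFT(@s, @at - 1));
DECLARE @afterAt VARCHAR(200) = SUBSTRING(@s, @at, LEN(@s));

-- Appending a tab (a class member) guarantees PATINDEX finds a match,
-- so each length means "up to the first whitespace, or the whole piece".
DECLARE @localLen INT = PATINDEX(@ws, @beforeAtReversed + CHAR(9)) - 1;
DECLARE @domainLen INT = PATINDEX(@ws, @afterAt + CHAR(9)) - 1;

DECLARE @email VARCHAR(200) = REVERSE(LEFT(@beforeAtReversed, @localLen)) + LEFT(@afterAt, @domainLen);
SELECT @email AS email; -- some.example@email.address.com
```

Appending a tab rather than a space avoids any question of trailing spaces being ignored.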

If you need it in a function, then this works for me:
CREATE FUNCTION [dbo].[extractEmail]
(
@input nvarchar(500)
)
RETURNS nvarchar(100)
AS
BEGIN
DECLARE @atPosition int
DECLARE @firstRelevantSpace int
DECLARE @name nvarchar(100)
DECLARE @secondRelevantSpace int
DECLARE @everythingAfterAt nvarchar(500)
DECLARE @domain nvarchar(100)
DECLARE @email nvarchar(100) = ''
IF CHARINDEX('@', @input,0) > 0
BEGIN
-- prefix a space so there is always a space before the local part
SET @input = ' ' + @input
SET @atPosition = CHARINDEX('@', @input, 0)
SET @firstRelevantSpace = CHARINDEX(' ',REVERSE(LEFT(@input, @atPosition - 1)))
SET @name = REVERSE(LEFT(REVERSE(LEFT(@input, @atPosition - 1)),@firstRelevantSpace-1))
SET @everythingAfterAt = SUBSTRING(@input, @atPosition,len(@input)-@atPosition+1)
SET @secondRelevantSpace = CHARINDEX(' ',@everythingAfterAt)
IF @secondRelevantSpace = 0
SET @domain = @everythingAfterAt
ELSE
-- stop before the space so it is not included in the result
SET @domain = LEFT(@everythingAfterAt, @secondRelevantSpace - 1)
SET @email = @name + @domain
END
RETURN @email
END

Using Cymorg's function: I ran into an issue where my data included CR/LF characters, and it prevented the function from working 100% of the time. It was tough to figure out because, when I used the function in a SELECT statement, it would occasionally return incorrect results. If I copied the offending text from my query results and invoked the function with PRINT, passing the text in quotes, it would work fine. Inconceivable!
After much trial and error, I used REPLACE to replace the CR/LF characters with spaces and, huzzah! I am an excellent guesser.
select dbo.extractEmail(replace(replace(MyColumn,CHAR(10),' '),CHAR(13),' ')) as AsYouWish from FacilityContacts

Related

Select only characters in SQL

I have strings in a database like this:
firstname.lastname@email.com
And I only need the characters that appear after the @ symbol and before the (.) symbol, i.e. (email) in the example above.
I am trying to find a simple way to do this in SQL.

Do this:
use [your_db_name];
go
create table dbo.test
(
string varchar(max) null
)
insert into dbo.test values ('firstname.lastname@email.com')
select
string,
substring(
string,
charindex('@', string, 0) + 1,
charindex('.', string, charindex('@', string, 0)) - charindex('@', string, 0) - 1
) as you_need
from dbo.test

String manipulations are such a pain in SQL Server. Here is one method:
select t.*,
left(en.emailname, charindex('.', en.emailname + '.') - 1)
from t outer apply
(select stuff(email, 1, charindex('@', email + '@'), '') as emailname) en;
Note that in the charindex() calls, the character being searched for is appended to the end of the string. This lets the code run without errors even for malformed emails: it returns an empty string when there is no '@', and everything after the '@' when there is no '.' after it.
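The sketch below (hypothetical sample rows in a table variable named @t) shows what appending the searched-for character buys: CHARINDEX on the padded value never returns 0, so STUFF and LEFT always receive valid arguments, and a value with no '@' simply yields an empty string.

```sql
-- Hypothetical sample rows, including one without '@' and one without '.'.
DECLARE @t TABLE (email VARCHAR(100));
INSERT INTO @t (email)
VALUES ('firstname.lastname@email.com'), ('no-at-sign'), ('user@localhost');

SELECT t.email,
       CHARINDEX('@', t.email)       AS plain_pos,  -- 0 when there is no '@'
       CHARINDEX('@', t.email + '@') AS padded_pos, -- never 0
       LEFT(en.emailname, CHARINDEX('.', en.emailname + '.') - 1) AS domain_name
FROM @t AS t
OUTER APPLY (SELECT STUFF(t.email, 1, CHARINDEX('@', t.email + '@'), '') AS emailname) AS en;
-- domain_name for the three rows: 'email', '' and 'localhost'
```

For 'no-at-sign', STUFF is asked to delete 11 characters from a 10-character string, so it deletes everything and returns an empty string.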

DECLARE @col char(200)
set @col = 'firstname.lastname@email.com'
-- the trailing - 4 assumes the text after the domain name is four characters long, e.g. '.com'
SELECT SUBSTRING(@col, LEN(LEFT(@col, CHARINDEX ('@', @col))) + 1, LEN(@col) - LEN(LEFT(@col, CHARINDEX ('@', @col))) - LEN(RIGHT(@col, LEN(@col) - CHARINDEX ('.', @col))) - 4);

DECLARE @str varchar(50) = 'firstname.lastname@email.com';
SELECT LEFT(
RIGHT(@str, LEN(@str) - CHARINDEX('@', @str))
,CHARINDEX('.', RIGHT(@str, LEN(@str) - CHARINDEX('@', @str))
) - 1) AS OUTPUT
The query above returns only the domain name from the email address. It can also be applied to a column in a table.

Try this:
DECLARE @Text varchar(100)
SET @Text = 'firstname.lastname@email.com'
SELECT SUBSTRING(STUFF(@Text, 1, CHARINDEX('@',@Text), ''), 0,
CHARINDEX('.', STUFF(@Text, 1, CHARINDEX('@',@Text), '')))
Result:
email

DECLARE @myStr varchar(100) = 'firstname.lastname@email.com'
SELECT
SUBSTRING(SUBSTRING(@myStr,CHARINDEX('@',@myStr)+1,LEN(@myStr)-CHARINDEX('@',@myStr)+1),0,CHARINDEX('.',SUBSTRING(@myStr,CHARINDEX('@',@myStr)+1,LEN(@myStr)-CHARINDEX('@',@myStr)+1)))
That can be useful, but I really recommend building a CLR user-defined function in C# or Visual Basic; it could be much faster than this.

Use charindex, len and reverse to find the positions of the @ and the last dot,
and substring to get the name based on those positions:
create table test (id int identity(1,1), email varchar(60));
insert into test (email) values
('jane.doe@email.com'),
('not an email'),
('@invalid.email.xxx'),
('john.doe@longer.domainname.net');
select *,
(case
when email like '[a-z]%@%.%'
then substring(email,
charindex('#',email)+1,
len(email) - charindex('#',email) - charindex('.',reverse(email))
)
end) as email_domain_without_extension
from test;
The CASE WHEN is used to return NULL when it's not an email (instead of an empty string).

How to iterate over a string in one line in SQL

I am writing a query that roughly has this structure:
SELECT Name, <calculated-value> as Version FROM <tables>
This calculated value needs to work like so: I have a varchar column 'Name' that could contain something like 'ABC' and I want to convert each letter into ASCII, and append them back together to form '65.66.67' in this example. (An empty string should return '0') Is there any way to do this?
My approach wasn't very good, but up to 5 characters I could do the following:
SELECT
CASE WHEN LEN(Name) = 0 THEN '0'
ELSE CAST(ASCII(SUBSTRING(Name, 1, 1)) as varchar(max)) +
CASE WHEN LEN(Name) = 1 THEN ''
ELSE '.' + CAST(ASCII(SUBSTRING(Name, 2, 1)) as varchar(max)) +
CASE WHEN LEN(Name) = 2 THEN ''
ELSE '.' + CAST(ASCII(SUBSTRING(Name, 3, 1)) as varchar(max)) +
CASE WHEN LEN(Name) = 3 THEN ''
ELSE '.' + CAST(ASCII(SUBSTRING(Name, 4, 1)) as varchar(max)) +
CASE WHEN LEN(Name) = 4 THEN ''
ELSE '.' + CAST(ASCII(SUBSTRING(Name, 5, 1)) as varchar(max))
END
END
END
END
END AS MyColumn
FROM <tables>
Is there a better way to do this? Ideally a method that can take any length of string?
Either that or can I cast letters into a hierarchyid datatype? I need to create things like 1/2/a/bc/4// or whatever, but hierarchyid doesn't support that. So instead I'm trying to convert it to 1/2/97/98.99/4/0 so I can convert and maintain the correct order. This column is only used for sorting.
Thanks for any help!

One method is a recursive CTE:
with cte as (
select Name, 1 as lev,
cast(ascii(substring(name, 1, 1)) as varchar(max)) as ascii_name
from t
union all
select Name, lev + 1,
ascii_name + '.' + cast(ascii(substring(name, lev + 1, 1)) as varchar(max))
from cte
where len(Name) > lev
)
-- keep only the last level for each name; an empty name yields NULL, shown as '0'
select Name, coalesce(ascii_name, '0') as ascii_name
from cte
where lev >= len(Name);

Another option is an ad-hoc tally table with a CROSS APPLY:
Declare @YourTable table (Name varchar(25))
Insert Into @YourTable values
('ABC'),
('Jack'),
('Jill'),
('')
Select A.Name
,Version = isnull(B.String,'0')
From @YourTable A
Cross Apply (
Select String=Stuff((Select '.' +cast(S as varchar(5))
From (Select Top (len(A.Name))
S=ASCII(substring(A.Name,Row_Number() Over (Order By (Select NULL)),1))
From master..spt_values ) S
For XML Path ('')),1,1,'')
) B
Returns
Name  Version
ABC   65.66.67
Jack  74.97.99.107
Jill  74.105.108.108
      0

T-SQL split string based on delimiter

I have some data that I would like to split based on a delimiter that may or may not exist.
Example data:
John/Smith
Jane/Doe
Steve
Bob/Johnson
I am using the following code to split this data into First and Last names:
SELECT SUBSTRING(myColumn, 1, CHARINDEX('/', myColumn)-1) AS FirstName,
SUBSTRING(myColumn, CHARINDEX('/', myColumn) + 1, 1000) AS LastName
FROM MyTable
The results I would like:
FirstName---LastName
John--------Smith
Jane--------Doe
Steve-------NULL
Bob---------Johnson
This code works just fine as long as all the rows have the anticipated delimiter, but errors out when a row does not:
"Invalid length parameter passed to the LEFT or SUBSTRING function."
How can I re-write this to work properly?

Maybe this will help you:
SELECT SUBSTRING(myColumn, 1, CASE CHARINDEX('/', myColumn)
WHEN 0
THEN LEN(myColumn)
ELSE CHARINDEX('/', myColumn) - 1
END) AS FirstName
,SUBSTRING(myColumn, CASE CHARINDEX('/', myColumn)
WHEN 0
THEN LEN(myColumn) + 1
ELSE CHARINDEX('/', myColumn) + 1
END, 1000) AS LastName
FROM MyTable

For those on SQL Server 2016+, use the built-in STRING_SPLIT function.
E.g.:
DECLARE @tags NVARCHAR(400) = 'clothing,road,,touring,bike'
SELECT value
FROM STRING_SPLIT(@tags, ',')
WHERE RTRIM(value) <> '';
Reference: https://msdn.microsoft.com/en-nz/library/mt684588.aspx

Try filtering to only the rows that contain the delimiter and working on those, like this:
SELECT SUBSTRING(myColumn, 1, CHARINDEX('/', myColumn)-1) AS FirstName,
SUBSTRING(myColumn, CHARINDEX('/', myColumn) + 1, 1000) AS LastName
FROM MyTable
WHERE CHARINDEX('/', myColumn) > 0
Or
SELECT SUBSTRING(myColumn, 1, CHARINDEX('/', myColumn)-1) AS FirstName,
SUBSTRING(myColumn, CHARINDEX('/', myColumn) + 1, 1000) AS LastName
FROM MyTable
WHERE myColumn LIKE '%/%'

SELECT CASE
WHEN CHARINDEX('/', myColumn, 0) = 0
THEN myColumn
ELSE LEFT(myColumn, CHARINDEX('/', myColumn, 0)-1)
END AS FirstName
,CASE
WHEN CHARINDEX('/', myColumn, 0) = 0
THEN ''
ELSE RIGHT(myColumn, CHARINDEX('/', REVERSE(myColumn), 0)-1)
END AS LastName
FROM MyTable

CREATE FUNCTION [dbo].[split_string](
@delimited NVARCHAR(MAX),
@delimiter NVARCHAR(100)
) RETURNS @t TABLE (id INT IDENTITY(1,1), val NVARCHAR(MAX))
AS
BEGIN
-- Turn 'a,b,c' into '<t>a</t><t>b</t><t>c</t>' and shred it with nodes().
-- Note: input containing XML special characters such as & or < causes an XML parsing error.
DECLARE @xml XML
SET @xml = N'<t>' + REPLACE(@delimited,@delimiter,'</t><t>') + '</t>'
INSERT INTO @t(val)
SELECT r.value('.','nvarchar(MAX)') as item
FROM @xml.nodes('/t') as records(r)
RETURN
END

I just wanted to give an alternative way to split a string with multiple delimiters, in case you are using a SQL Server version older than 2016.
The general idea is to split out all of the characters in the string, determine the positions of the delimiters, then obtain substrings relative to the delimiters. Here is a sample:
-- Sample data
DECLARE @testTable TABLE (
TestString VARCHAR(50)
)
INSERT INTO @testTable VALUES
('Teststring,1,2,3')
,('Test')
DECLARE @delimiter VARCHAR(1) = ','
-- Generate numbers with which we can enumerate
;WITH Numbers AS (
SELECT 1 AS N
UNION ALL
SELECT N + 1
FROM Numbers
WHERE N < 255
),
-- Enumerate letters in the string and select only the delimiters
Letters AS (
SELECT n.N
, SUBSTRING(t.TestString, n.N, 1) AS Letter
, t.TestString
, ROW_NUMBER() OVER ( PARTITION BY t.TestString
ORDER BY n.N
) AS Delimiter_Number
FROM Numbers n
INNER JOIN @testTable t
ON n.N <= LEN(t.TestString)
WHERE SUBSTRING(t.TestString, n.N, 1) = @delimiter
UNION
-- Include 0th position to "delimit" the start of the string
SELECT 0
, NULL
, t.TestString
, 0
FROM @testTable t
)
-- Obtain substrings based on delimiter positions
SELECT t.TestString
, ds.Delimiter_Number + 1 AS Position
, SUBSTRING(t.TestString, ds.N + 1, ISNULL(de.N, LEN(t.TestString) + 1) - ds.N - 1) AS Delimited_Substring
FROM @testTable t
LEFT JOIN Letters ds
ON t.TestString = ds.TestString
LEFT JOIN Letters de
ON t.TestString = de.TestString
AND ds.Delimiter_Number + 1 = de.Delimiter_Number
OPTION (MAXRECURSION 0)

The examples above work fine when there is only one delimiter, but they don't scale well to multiple delimiters. Note that this will only work on SQL Server 2016 and above.
/*Some Sample Data*/
DECLARE @mytable TABLE ([id] VARCHAR(10), [name] VARCHAR(1000));
INSERT INTO @mytable
VALUES ('1','John/Smith'),('2','Jane/Doe'), ('3','Steve'), ('4','Bob/Johnson')
/*Split based on delimiter*/
-- Caveat: STRING_SPLIT does not guarantee the order of its output rows, so RN may not follow the original order
SELECT P.id, [1] 'FirstName', [2] 'LastName', [3] 'Col3', [4] 'Col4'
FROM(
SELECT A.id, X1.VALUE, ROW_NUMBER() OVER (PARTITION BY A.id ORDER BY A.id) RN
FROM @mytable A
CROSS APPLY STRING_SPLIT(A.name, '/') X1
) A
PIVOT (MAX(A.[VALUE]) FOR A.RN IN ([1],[2],[3],[4],[5])) P

These all helped me get to this. I am still on 2012, but now have something quick that lets me split a string, even if the string has a varying number of delimiters, and grab the nth substring from it. It's quick, too. I know this post is old, but it took me forever to find something, so hopefully this will help someone else.
CREATE FUNCTION [dbo].[SplitsByIndex]
(@separator VARCHAR(20) = ' ',
@string VARCHAR(MAX),
@position INT
)
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @results TABLE
(id INT IDENTITY(1, 1),
chrs VARCHAR(8000)
);
DECLARE @outResult VARCHAR(8000);
WITH X(N)
AS (SELECT 'Table1'
FROM(VALUES(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) T(C)),
Y(N)
AS (SELECT 'Table2'
FROM X A1,
X A2,
X A3,
X A4,
X A5,
X A6,
X A7,
X A8), -- Up to 16^8 = 4 billion
T(N)
AS (SELECT TOP (ISNULL(LEN(@string), 0)) ROW_NUMBER() OVER(
ORDER BY
(
SELECT NULL
)) - 1 N
FROM Y),
Delim(Pos)
AS (SELECT t.N
FROM T
WHERE(SUBSTRING(@string, t.N, LEN(@separator + 'x') - 1) LIKE @separator
OR t.N = 0)),
Separated(value)
AS (SELECT SUBSTRING(@string, d.Pos + LEN(@separator + 'x') - 1, LEAD(d.Pos, 1, 2147483647) OVER(
ORDER BY
(
SELECT NULL
))-d.Pos - LEN(@separator))
FROM Delim d
WHERE @string IS NOT NULL)
INSERT INTO @results(chrs)
SELECT s.value
FROM Separated s
WHERE s.value <> @separator;
SELECT @outResult =
(
SELECT chrs
FROM @results
WHERE id = @position
);
RETURN @outResult;
END;
This can be used like this:
SELECT [dbo].[SplitsByIndex](' ',fieldname,2)
from tablename

I would protect the substring operation by always appending a delimiter to the string being searched. This makes the parsing much simpler: your code can rely on finding the delimiter and does not need to cope with special cases.
SELECT SUBSTRING(myColumn, 1, CHARINDEX('/', myColumn + '/') - 1) AS FirstName,
SUBSTRING(myColumn, CHARINDEX('/', myColumn + '/') + 1, 1000) AS LastName
FROM MyTable
It eliminates edge cases and conditionals: a row without a delimiter simply gets an empty LastName.
Always add an extra delimiter at the end, and the problem case is no longer a problem.
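Here is a self-contained sketch of the append-a-delimiter idea run against the question's sample rows, held in a hypothetical table variable @MyTable. The delimiter is appended only inside CHARINDEX, so the extracted parts never pick up the extra '/'. Wrapping LastName in NULLIF is an addition that turns the empty string produced for 'Steve' into the NULL the question asks for.

```sql
-- The question's sample rows; 'Steve' has no delimiter.
DECLARE @MyTable TABLE (myColumn VARCHAR(100));
INSERT INTO @MyTable (myColumn)
VALUES ('John/Smith'), ('Jane/Doe'), ('Steve'), ('Bob/Johnson');

SELECT myColumn,
       -- CHARINDEX always finds a '/' in myColumn + '/', so the length is never -1
       SUBSTRING(myColumn, 1, CHARINDEX('/', myColumn + '/') - 1) AS FirstName,
       -- for 'Steve' the start position is past the end, so SUBSTRING returns ''
       NULLIF(SUBSTRING(myColumn, CHARINDEX('/', myColumn + '/') + 1, 1000), '') AS LastName
FROM @MyTable;
```

SUBSTRING does not raise an error when the start position is beyond the end of the string; it just returns an empty string, which is what makes the single-expression form safe.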

How to get the nth string in any generic word or sentence with a space delimiter

How do I get the nth word in a sentence or a set of strings with a space delimiter?
Sorry for the change in the requirement. Thank you.

By using instr.
select substr(help, 1, instr(help,' ') - 1)
from ( select 'hello my name is...' as help
from dual )
instr(help,' ') returns the position of the first occurrence of the second argument within the first; the position points at the searched-for character itself, i.e. at the first ' ' in the string 'hello my name is...'.
substr(help, 1, instr(help,' ') - 1) then takes the input string from the first character up to the position returned by instr(...). I subtract one so that the space isn't included.
For the nth occurrence, just change this slightly:
instr(help,' ',1,n) is the position of the nth occurrence of ' ', counting from the first character. You then need the position of the next occurrence, instr(help,' ',1,n + 1), and finally the difference between the two, so you know how far to go in your substr(...). As you're looking for the nth word, this breaks down when n is 1, and you have to deal with that case, like so:
select substr( help
, decode( n
, 1, 1
, instr(help, ' ', 1, n - 1) + 1
)
, decode( n
, 1, instr(help, ' ', 1, n ) - 1
, instr(help, ' ', 1, n) - instr(help, ' ', 1, n - 1) - 1
)
)
from ( select 'hello my name is...' as help
from dual )
This will also break down for the last word, where there is no nth space and instr returns 0. As you can see, this is getting ridiculous, so you might want to consider using regular expressions:
select regexp_substr(help, '[^[:space:]]+', 1, n )
from ( select 'hello my name is...' as help
from dual )

Try this. An example of getting the 4th word:
select names from (
select
regexp_substr('I want my two dollars','[^ ]+', 1, level) as names,
rownum as nth
from dual
connect by regexp_substr('I want my two dollars', '[^ ]+', 1, level) is not null
)
where nth = 4;
The inner query is converting the space-delimited string into a set of rows. The outer query is grabbing the nth item from the set.

Try something like
WITH q AS (SELECT 'ABCD EFGH IJKL' AS A_STRING FROM DUAL)
SELECT SUBSTR(A_STRING, 1, INSTR(A_STRING, ' ')-1)
FROM q
Share and enjoy.
And here's the solution for the revised question:
WITH q AS (SELECT 'ABCD EFGH IJKL' AS A_STRING, 3 AS OCCURRENCE FROM DUAL)
SELECT SUBSTR(A_STRING,
CASE
WHEN OCCURRENCE=1 THEN 1
ELSE INSTR(A_STRING, ' ', 1, OCCURRENCE-1)+1
END,
CASE
WHEN INSTR(A_STRING, ' ', 1, OCCURRENCE) = 0 THEN LENGTH(A_STRING)
ELSE INSTR(A_STRING, ' ', 1, OCCURRENCE) - CASE
WHEN OCCURRENCE=1 THEN 0
ELSE INSTR(A_STRING, ' ', 1, OCCURRENCE-1)
END - 1
END)
FROM q;
Share and enjoy.

CREATE PROC spGetCharactersInAStrings
(
@S VARCHAR(100) = '^1402 WSN NI^AMLAB^tev^e^^rtS htimS 0055518',
@Char VARCHAR(100) = '8'
)
AS
-- exec spGetCharactersInAStrings '^1402 WSN NI^AMLAB^tev^e^^rtS htimS 0055518', '5'
BEGIN
DECLARE @i INT = 1,
@c INT,
@pos INT = 0,
@NewStr VARCHAR(100)
DECLARE @D TABLE
(
ID INT IDENTITY(1, 1),
String VARCHAR(100),
Position INT
)
SELECT @c = LEN(@S), @NewStr = @S
WHILE @i <= @c
BEGIN
-- position of the next occurrence of @Char in the remaining text (0 if none);
-- a direct CHARINDEX call avoids building dynamic SQL from the input text
SET @i = CHARINDEX(@Char, @NewStr)
IF @i > 0
BEGIN
SET @pos = @pos + @i
-- drop everything up to and including the match
SET @NewStr = SUBSTRING(@NewStr, @i + 1, LEN(@S))
--SELECT @NewStr '@NewStr', @pos '@pos'
INSERT INTO @D
SELECT @NewStr, @pos
SET @i = @i + 1
END
ELSE
BREAK
END
SELECT * FROM @D
END

If you're using MySQL and cannot use the four-argument instr function or regexp_substr, you can do it this way:
select substring_index(substring_index(help, ' ', 2), ' ', -1)
from (select 'hello my name is...' as help) h
Result: "my".
Replace "2" in the code above with the number of the word you want.

If you are using SQL Server 2016+, you can take advantage of the STRING_SPLIT function. It returns the pieces of the string as rows, and to get the nth value you can use the ROW_NUMBER() window function.
There is a little trick here: you don't really want to order by anything, so you have to "cheat" the ROW_NUMBER function with ORDER BY (SELECT 1), which numbers the rows in whatever order STRING_SPLIT() returns them. Be aware that Microsoft does not guarantee that this order matches the order of the substrings in the input.
Below is a code snippet that finds the third word of the string:
Declare @_intPart INT = 3; -- change the nth word here; counting starts at 1, not 0
SELECT value FROM(
SELECT value,
ROW_NUMBER()OVER(ORDER BY (SELECT 1)) AS rowno
FROM STRING_SPLIT('hello world this is amazing', ' ')
) AS o1 WHERE o1.rowno = @_intPart;
You can also make a scalar function to retrieve values.
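On SQL Server 2022 and later (and Azure SQL Database), STRING_SPLIT accepts an optional third argument, enable_ordinal. Passing 1 adds an ordinal column holding each piece's 1-based position in the input, which removes the reliance on row order. A minimal sketch, assuming one of those versions and a database compatibility level of at least 130 (which STRING_SPLIT itself requires):

```sql
-- Requires SQL Server 2022+ or Azure SQL Database: the third argument (enable_ordinal = 1)
-- adds an "ordinal" column with each piece's 1-based position in the input string.
DECLARE @_intPart INT = 3; -- which word to return, counting from 1

SELECT value
FROM STRING_SPLIT('hello world this is amazing', ' ', 1)
WHERE ordinal = @_intPart; -- returns 'this'
```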

Using PATINDEX to find varying length patterns in T-SQL

I'm looking to pull floats out of some varchars, using PATINDEX() to spot them. I know in each varchar string, I'm only interested in the first float that exists, but they might have different lengths.
e.g.
'some text 456.09 other text'
'even more text 98273.453 la la la'
I would normally match these with a regex
"[0-9]+[.][0-9]+"
However, I can't find an equivalent of the + operator that PATINDEX accepts. So they would need to be matched (respectively) with:
'[0-9][0-9][0-9].[0-9][0-9]' and '[0-9][0-9][0-9][0-9][0-9].[0-9][0-9][0-9]'
Is there any way to match both of these example varchars with one single valid PATINDEX pattern?

I blogged about this a while ago.
Extracting numbers with SQL server
Declare @Temp Table(Data VarChar(100))
Insert Into @Temp Values('some text 456.09 other text')
Insert Into @Temp Values('even more text 98273.453 la la la')
Insert Into @Temp Values('There are no numbers in this one')
-- Start at the first digit, '.' or '-', then keep characters up to the first one that is not
-- (the appended 'X' guarantees the second PatIndex finds a match)
Select Left(
SubString(Data, PatIndex('%[0-9.-]%', Data), 8000),
PatIndex('%[^0-9.-]%', SubString(Data, PatIndex('%[0-9.-]%', Data), 8000) + 'X')-1)
From @Temp

Wildcards.
SELECT PATINDEX('%[0-9]%[0-9].[0-9]%[0-9]%','some text 456.09 other text')
SELECT PATINDEX('%[0-9]%[0-9].[0-9]%[0-9]%','even more text 98273.453 la la la')

Yes, you need to use the CLR to get regex support. But if PATINDEX does not do what you need, regular expressions were designed for exactly that.
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx

This should be checked for robustness (what if you only have an int, for example?), but it should put you on the right track:
if exists (select routine_name from information_schema.routines where routine_name = 'GetFirstFloat')
drop function GetFirstFloat
go
create function GetFirstFloat (@string varchar(max))
returns float
as
begin
declare @float varchar(max)
declare @pos int
-- start at the first digit
select @pos = patindex('%[0-9]%', @string)
select @float = ''
-- append characters while they look numeric (ISNUMERIC also accepts single characters such as '.', ',' and '$')
while isnumeric(substring(@string, @pos, 1)) = 1
begin
select @float = @float + substring(@string, @pos, 1)
select @pos = @pos + 1
end
return cast(@float as float)
end
go
select dbo.GetFirstFloat('this is a string containing pi 3.14159216 and another non float 3 followed by a new float 5.41 and that''s it')
select dbo.GetFirstFloat('this is a string with no float')
select dbo.GetFirstFloat('this is another string with an int 3')

Given that the pattern is going to vary in length, you're going to have a rough time getting this to work with PATINDEX. There is another post I wrote, which I've modified to accomplish what you're trying to do here. Will this work for you?
CREATE TABLE #nums (n INT)
DECLARE @i INT
SET @i = 1
WHILE @i < 8000
BEGIN
INSERT #nums VALUES(@i)
SET @i = @i + 1
END
CREATE TABLE #tmp (
id INT IDENTITY(1,1) not null,
words VARCHAR(MAX) null
)
INSERT INTO #tmp
VALUES('I''m looking for a number, regardless of length, even 23.258 long'),('Maybe even pi which roughly 3.14159265358,'),('or possibly something else that isn''t a number')
UPDATE #tmp SET words = REPLACE(words, ',',' ')
-- split each row into words; numbering by ID, n keeps the words of a row in their original order
;WITH CTE AS (SELECT ROW_NUMBER() OVER (ORDER BY ID, n) AS rownum, ID, NULLIF(SUBSTRING(' ' + words + ' ' , n , CHARINDEX(' ' , ' ' + words + ' ' , n) - n) , '') AS word
FROM #nums, #tmp
WHERE n <= LEN(' ' + words + ' ') AND SUBSTRING(' ' + words + ' ' , n - 1, 1) = ' '
AND CHARINDEX(' ' , ' ' + words + ' ' , n) - n > 0),
ids AS (SELECT ID, MIN(rownum) AS rownum FROM CTE WHERE ISNUMERIC(word) = 1 GROUP BY id)
SELECT CTE.rownum, cte.id, cte.word
FROM CTE, ids WHERE cte.id = ids.id AND cte.rownum = ids.rownum
The explanation and origin of the code are covered in more detail in the original post.

PATINDEX is not powerful enough to do that. You should use regular expressions.
SQL Server has supported regular expressions through CLR integration since SQL Server 2005.
