Split a String based on a specific pattern of characters - sql

Basically, I would like to split a long text field after each date into separate rows, one per date. The source field "Notes" is a single long running text field holding multiple comments entered over time, each with a distinct date. Initially, I tried splitting on the '-' after the date, which works to some degree, except where there are dashes elsewhere in the text. So I'm thinking of something that splits on each instance of a date (mm/dd/yy). One issue is that the length is not consistent, meaning it could be:
'm/d/yy-' or 'mm/dd/yy-' or 'mm/d/yy-'
Example Data > 'Notes' Column:
3/30/16-Had a meeting 2/5/16-LVM 10/5/15-Spoke to customer
*A single cell could have multiple dates and comments in it
Looking for end result like this:
Date      Value
3/30/16   Had a meeting
2/5/16    LVM
10/5/15   Spoke to customer
I am using something basic like the below, but wondering if I can get a little more sophisticated with the STRING_SPLIT
SELECT NOTES, VALUE
FROM SRC_TABLE
CROSS APPLY STRING_SPLIT(NOTES, '-')
Appreciate any insights or ideas!

Hmmm. I think this does what you want:
select t.*, s.*
from src_table t outer apply
     (select value as value_date
      from string_split(t.notes, '-') s
      where s.value like '%/%/%' and s.value not like '%/%/%/%'
     ) s;
EDIT:
If you just want to split on the first -, you can use:
select left(notes, charindex('-', notes + '-') - 1),
stuff(notes, 1, charindex('-', notes + '-'), '')
from src_table;
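If dashes can also appear inside the comments, one alternative is to look for the date tokens themselves rather than for '-'. The following is only a sketch under some assumptions: it works on a single sample value held in a variable (@notes is a made-up name), it uses sys.all_objects purely as a convenient row source for numbering positions, and it treats a date as m/d/yy, m/dd/yy, mm/d/yy or mm/dd/yy immediately followed by '-'. LEAD needs SQL Server 2012 or later.

```sql
DECLARE @notes varchar(200) =
    '3/30/16-Had a meeting 2/5/16-LVM 10/5/15-Spoke to customer';

WITH positions AS (      -- one row per character position in @notes
    SELECT TOP (LEN(@notes))
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS pos
    FROM sys.all_objects
    ORDER BY pos
),
starts AS (              -- positions where a date token followed by '-' begins
    SELECT pos
    FROM positions
    WHERE SUBSTRING(@notes, pos - 1, 1) NOT LIKE '[0-9/]'   -- not in the middle of a number or date
      AND (   SUBSTRING(@notes, pos, 9) LIKE '[0-9]/[0-9]/[0-9][0-9]-%'
           OR SUBSTRING(@notes, pos, 9) LIKE '[0-9]/[0-9][0-9]/[0-9][0-9]-%'
           OR SUBSTRING(@notes, pos, 9) LIKE '[0-9][0-9]/[0-9]/[0-9][0-9]-%'
           OR SUBSTRING(@notes, pos, 9) LIKE '[0-9][0-9]/[0-9][0-9]/[0-9][0-9]-%')
),
segments AS (            -- each segment runs up to the next date start (or the end)
    SELECT pos,
           SUBSTRING(@notes, pos,
                     LEAD(pos, 1, LEN(@notes) + 1) OVER (ORDER BY pos) - pos) AS seg
    FROM starts
)
SELECT LEFT(seg, CHARINDEX('-', seg) - 1)            AS note_date,
       RTRIM(STUFF(seg, 1, CHARINDEX('-', seg), '')) AS note_text
FROM segments
ORDER BY pos;
```

For the sample string this is meant to return one row per date, with the comment text in the second column. Against a real table the same logic would have to be wrapped in a CROSS APPLY per row, and a comment that itself contains something shaped like "1/2/16-" would still be split there.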

Related

Retrieve text between two periods in a value

I’ve been spinning around a bit on how to accomplish this in SQL DW. I need to extract the text between two periods in a returned value. So my value returned for Result is:
I’m trying to extract the values between period 1 and 2, so the red portion above:
The values will be a wide variety of lengths.
I’ve got this code:
substring(Result,charindex('.',Result)+1,3) as ResultMid
that results in this:
My problem is I’m not sure how to get to a variable length to return so that I can pull the full value between the two periods. Would someone happen to know how I can accomplish this?
Thx,
Joe
We can build on your current attempt:
substring(
result,
charindex('.', result) + 1,
charindex('.', result, charindex('.', result) + 1) - charindex('.', result) - 1
)
Rationale: you already have the first two arguments to substring() right. The third argument defines the number of characters to capture. For this, we compute the position of the next dot (.) with the expression charindex('.', result, charindex('.', result) + 1). Then we subtract the position of the first dot from that value, and 1 more so that neither dot is counted, which gives us the number of characters to capture.
Demo on DB Fiddle:
result | result_mid
:----------------------- | :---------
sam.pdc.sys.paas.l.com | pdc
sm.ridl.sys.paas.m.com | ridl
s.sandbox.sys.paas.g.com | sandbox
If you are dealing with up to 128 characters per delimited part of the string, try parsename as below. Otherwise, GMB has a pretty solid solution up there.
select *, parsename(left(result,charindex('.',result,charindex('.',result)+1)-1),1) as mid
from your_table;
Another method that you can easily modify to extract 3rd, 4th...(hopefully not too remote) part of the string using cross apply.
select result, mid
from your_table t1
cross apply (select charindex('.',result) as i1) t2
cross apply (select charindex('.',result,(i1 + 1)) as i2) t3
cross apply (select substring(result,(i1+1),(i2-i1-1)) as mid) t4;
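As a rough illustration of the "easily modify" point, here is how the same chain might be extended by one more step to pull the third part. The inline sample row and the names i3 and part3 are my own additions, not part of the answer above, and the VALUES row source assumes SQL Server syntax; on other engines it may need to be replaced by a real table.

```sql
-- Sketch: extend the CROSS APPLY chain by one step to reach the 3rd part.
-- The sample row and the names i3 / part3 are illustrative assumptions.
select result, part3
from (values ('sam.pdc.sys.paas.l.com')) t1(result)
cross apply (select charindex('.', result) as i1) t2                 -- 1st dot
cross apply (select charindex('.', result, i1 + 1) as i2) t3         -- 2nd dot
cross apply (select charindex('.', result, i2 + 1) as i3) t4         -- 3rd dot
cross apply (select substring(result, i2 + 1, i3 - i2 - 1) as part3) t5;
```

For this sample row part3 should come back as sys. A value with too few dots can make the length negative and raise an error, so real data may need a guard.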

get sub string in between mix symbols

I want to get a substring; my output should look like gmail, outlook, Skype.
My string values are:
'abc#gmail.com'
'cde.nitish#yahoo.com'
'xyz.vijay#sarvang.com.com'
Something like this. As you can see, the values have variable length and mix the symbols '.' and '#'.
The string values are stored in a table named tbl_Data, in a column named Mail_ID.
I am using SQL Server 2012.
I use CHARINDEX to get the substring:
select SUBSTRING(Mail_ID, CHARINDEX('#',MAil_ID)+1, (CHARINDEX('.',MAil_ID) - (CHARINDEX('#', Mail_ID)+1)))
from tbl_data
And i want my output like:
'gmail'
'yahoo'
'sarvang'
Please help me, I am a newbie in SQL Server.
This is my solution. I first get the position of the '#', and then get the position of the '.' in the string prior to it (the '#'). Then I can use those results to get the appropriate substring:
SELECT V.YourString,
SUBSTRING(V.YourString,D.I,A.I - D.I) AS StringPart
FROM (VALUES('abc#gmail.com'),
('cde.nitish#yahoo.com'),
('xyz.vijay#sarvang.com.com'))V(YourString)
CROSS APPLY(VALUES(CHARINDEX('#',V.YourString)))A(I) --Get position of # to not repeat logic
CROSS APPLY(VALUES(CHARINDEX('.',LEFT(V.YourString,A.I))+1))D(I) --Get position of . to not repeat logic
Note for value of 'abc.def.steve#... it would return 'def.steve'; however, we don't have such an example so I don't know what the correct return value would be.
I'm posting this as a new answer, as the OP moved the goal posts from the original answer. My initial answer was based on their original question, not their "new" one, and it seems silly to remove an answer that was correct at the time:
SELECT V.YourString,
SUBSTRING(V.YourString,A.I, D.I - A.I) AS StringPart
FROM (VALUES('abc#gmail.com'),
('cde.nitish#yahoo.com'),
('xyz.vijay#sarvang.com.com'))V(YourString)
CROSS APPLY(VALUES(CHARINDEX('#',V.YourString)+1))A(I)
CROSS APPLY(VALUES(CHARINDEX('.',V.YourString,A.I)))D(I);
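One thing this version leaves open is what happens to values that have no '#', or no '.' after it: CHARINDEX returns 0, and the query then either errors on a negative length or returns the wrong piece. A possible guard, sketched below with two extra made-up sample values, is to turn those zeros into NULL with NULLIF so that such rows yield NULL instead.

```sql
-- Sketch: same approach, with NULLIF guards for missing '#' or missing '.'.
-- 'no-hash.example' and 'abc#localhost' are illustrative values only.
SELECT V.YourString,
       SUBSTRING(V.YourString, A.I, NULLIF(D.I, 0) - A.I) AS StringPart
FROM (VALUES('abc#gmail.com'),
            ('no-hash.example'),
            ('abc#localhost'))V(YourString)
     CROSS APPLY(VALUES(NULLIF(CHARINDEX('#',V.YourString), 0) + 1))A(I) --NULL when there is no #
     CROSS APPLY(VALUES(CHARINDEX('.',V.YourString,A.I)))D(I);           --0 when no . follows the #
```

Only the first row should produce a value (gmail); the other two are expected to come back as NULL, because SUBSTRING returns NULL when its start or length is NULL.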
This answers the original version of the question.
This may be simplest with a case expression to detect if there is a period before the '#':
select (case when email like '%.%#%'
then stuff(left(email, charindex('#', email) - 1), 1, charindex('.', email), '')
else left(email, charindex('#', email) - 1)
end)
from (values ('abc#gmail.com'), ('cde.nitish#yahoo.com'), ('xyz.vijay#sarvang.com.com')) v(email)
I created a temp table with your data and wrote the query below; it worked:
CREATE TABLE #T
(
DATA NVARCHAR(50)
)
INSERT INTO #T
VALUES('abc#gmail.com'),
('cde.nitish#yahoo.com'),
('xyz.vijay#sarvang.com.com')
SELECT *,LEFT(RIGHT(DATA,LEN(DATA)-CHARINDEX('#',DATA,1)),CHARINDEX('.',RIGHT(DATA,LEN(DATA)-CHARINDEX('#',DATA,1)),1)-1)
FROM #t
And this is the output of my T-SQL:
abc#gmail.com gmail
cde.nitish#yahoo.com yahoo
xyz.vijay#sarvang.com.com sarvang

TSQL extract part of string with regex

I would like to make a script that iterates over the records of a table with a cursor,
extracts from a column value formatted like "yyy://xx/bb/147011"
only the final number, 147011, and puts this value in a variable.
Is it possible to do something like that?
Many thanks.
You don't need a cursor for this. You can just use a query. The following gets everything after the last /:
select right(str, charindex('/', reverse(str)) - 1 )
from (values ('yyy://xx/bb/147011')) v(str)
It does not specifically check if it is a number, but that can be added as well.
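A minimal sketch of how such a check might be added, assuming "number" means digits only. The second sample row and the names tail and last_number are made up for illustration.

```sql
-- Sketch: keep the text after the last '/' only when it consists of digits.
-- The second sample row and the column names are illustrative assumptions.
select v.str,
       case when t.tail <> '' and t.tail not like '%[^0-9]%'
            then t.tail
       end as last_number
from (values ('yyy://xx/bb/147011'), ('yyy://xx/bb/abc')) v(str)
cross apply (select right(v.str, charindex('/', reverse(v.str)) - 1) as tail) t;
```

The first row should give 147011 and the second NULL. Like the query above, this assumes every value contains at least one '/'. To capture a single result in a variable, the same expression can be assigned with SELECT @your_variable = ... instead of being returned as a column.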
You can also use the below query.
SELECT RIGHT(RTRIM('yyy://xx/bb/147011'),
CHARINDEX('/', REVERSE('/' + RTRIM('yyy://xx/bb/147011'))) - 1) AS LastWord
If the number always starts at the first digit in the string and runs to the end, as in the sample data, then you can do:
SELECT t.*, SUBSTRING(t.col, PATINDEX('%[0-9]%', t.col), LEN(t.col))
FROM your_table t;

How can I extract part of a string with different lengths and insert it into a table?

I uploaded some data from an Excel sheet to a table in SQL. I would like to take part of the string that I inserted into the column PPRName and insert it into another table, [Verify].
The data in the column when inserted looks like this:
August 2018 [ NW: Construction MTP021 - Building and Civil Construction: Masonry NQF 3 ]
I want to insert this part of the string :
NW: Construction MTP021 - Building and Civil Construction: Masonry NQF 3
into another table [Verify] for every PPR Name in the PPRName column. The names of the PPRs vary in length but all come in same format.
I would also like to extract the August 2018 and cast it as a date and insert into my table [Verify].
I am not sure how to use Charindex and Substrings to achieve this.
I tried this, but no data was returned:
select SUBSTRING([PPR_Caption],charindex('[',[PPR_Caption]),charindex([PPR_Caption],']'))
FROM [dbo].[PPRS]
You incorrectly use the 2nd CHARINDEX and you incorrectly use the SUBSTRING commands.
SELECT SUBSTRING(PPR_Caption, CHARINDEX('[', PPR_Caption) + 1, CHARINDEX(']', PPR_Caption) - CHARINDEX('[', PPR_Caption) - 1)
FROM PPRS
SUBSTRING takes a start position and a length, not a start and an end point. To get the length, take the end position and subtract the start position (then subtract 1 more so that the closing bracket itself is excluded).
In your 2nd CHARINDEX you switched the string to search in and the string to look for.
String operations like this are cumbersome in SQL Server.
Try this:
select replace(v2.str_rest, ' ]', '') as name, cast(str_start as date) as dte
from (values ('August 2018 [ NW: Construction MTP021 - Building and Civil Construction: Masonry NQF 3 ]')
) v(str) cross apply
(values (stuff(v.str, 1, charindex('[', str) + 1, ''), substring(v.str, 1, charindex('[', str) -1))
) v2(str_rest, str_start);
SQL Server is pretty good about guessing formats for converting dates, so it will actually convert the date without the day of the month.
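To tie this back to the insert into [Verify], here is a rough sketch of an INSERT ... SELECT. The temp tables #PPRS and #Verify only stand in for the real tables, and the target column names PPRName and PPRDate are assumptions, since the question does not show the layout of [Verify]. The date conversion also assumes an English session language so that 'August 2018' is recognised.

```sql
-- Sketch only: #PPRS and #Verify stand in for the real tables, and the
-- column names PPRName / PPRDate in #Verify are assumptions.
CREATE TABLE #PPRS   (PPR_Caption nvarchar(200));
CREATE TABLE #Verify (PPRName nvarchar(200), PPRDate date);

INSERT INTO #PPRS (PPR_Caption)
VALUES (N'August 2018 [ NW: Construction MTP021 - Building and Civil Construction: Masonry NQF 3 ]');

INSERT INTO #Verify (PPRName, PPRDate)
SELECT LTRIM(RTRIM(SUBSTRING(p.PPR_Caption, b.o + 1, b.c - b.o - 1))),  -- text between [ and ]
       CAST(LEFT(p.PPR_Caption, b.o - 1) AS date)                       -- text before [ as a date
FROM #PPRS p
CROSS APPLY (SELECT CHARINDEX('[', p.PPR_Caption) AS o,
                    CHARINDEX(']', p.PPR_Caption) AS c) b
WHERE b.o > 0 AND b.c > b.o;   -- only rows with both brackets, in order

SELECT PPRName, PPRDate FROM #Verify;
```

If some captions might not start with a recognisable month and year, TRY_CONVERT (SQL Server 2012 and later) could be used in place of CAST so that those rows get a NULL date rather than failing the whole insert.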

Sort varchar datatype with numeric characters

SQL SERVER 2005
SQL Sorting :
Datatype varchar
Should sort by
1.aaaa
5.xx
11.bbbbbb
12
15.
how can i get this sorting order
Wrong
1.aaaa
11.bbbbbb
12
15.
5.xx
On Oracle, this would work.
SELECT
*
FROM
table
ORDER BY
to_number(regexp_substr(COLUMN,'^[0-9]+')),
regexp_substr(column,'\..*');
You could do this by calculating a column based on what's on the left hand side of the period('.').
However this method will be very difficult to make robust enough to use in a production system, unless you can make a lot of assertions about the content of the strings.
Also handling strings without periods could cause some grief
with r as (
    select '1.aaaa' as string
    union select '5.xx'
    union select '11.bbbbbb'
    union select '12'
    union select '15.' )
select *
from r
order by
    CONVERT(int, left(r.string, case when ( CHARINDEX('.', r.string)-1 < 1)
                                     then LEN(r.string)
                                     else CHARINDEX('.', r.string)-1 end )),
    r.string
If all the entries have this form, you could split them into two parts and sort by these, for example like this:
ORDER BY
CONVERT(INT, SUBSTRING(fieldname, 1, CHARINDEX('.', fieldname + '.') - 1)), -- digits before the first '.', or the whole value if there is none
SUBSTRING(fieldname, CHARINDEX('.', fieldname) + 1, LEN(fieldname))
This should do a numeric sort on the part before the . and an alphanumeric sort for the part after the ., but may need some tuning, as I haven't actually tried it.
Another way (and faster) might be to create computed columns that contain the part before the . and after the . and sort by them.
A third way (if you can't create computed columns) could be to create a view over the table that has two additional columns with the respective parts of the field and then do the select on that view.
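The computed-column idea could look roughly like the sketch below. The table #items and the column names sort_num and sort_txt are made up for illustration, and the sketch assumes that whatever precedes the first '.' is always an integer; anything else would make the conversion fail. It avoids multi-row VALUES so that it should also run on SQL Server 2005.

```sql
-- Sketch of the computed-column idea; #items, sort_num and sort_txt are
-- made-up names. Assumes the text before the first '.' is always an integer.
CREATE TABLE #items (
    fieldname varchar(50) NOT NULL,
    -- numeric part: everything before the first '.', or the whole value if there is none
    sort_num AS CONVERT(int, LEFT(fieldname, CHARINDEX('.', fieldname + '.') - 1)),
    -- text part: everything after the first '.', or empty if there is none
    sort_txt AS SUBSTRING(fieldname, CHARINDEX('.', fieldname + '.') + 1, 50)
);

INSERT INTO #items (fieldname)
SELECT '1.aaaa' UNION ALL SELECT '5.xx' UNION ALL SELECT '11.bbbbbb'
UNION ALL SELECT '12' UNION ALL SELECT '15.';

SELECT fieldname
FROM #items
ORDER BY sort_num, sort_txt;
```

With the sample values from the question this should list 1.aaaa, 5.xx, 11.bbbbbb, 12, 15. in that order. The same two expressions could equally be exposed as columns of a view if the table itself cannot be changed.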