Search between the third and fourth occurrence in a string - sql

I have a string that contains some text:
Example:
XPTOP XPTOP 4WS00632 BLACK VERNIS
I want to extract only
4WS00632
and ignore everything else.
I tried indexing the spaces: using charindex to find all the spaces, then a substring to start at x and end at y.
But no luck. Sometimes it returns "4WS0063" or "P 4WS00632".
The data is not consistent; only the "XPTOP XPTOP" prefix is somewhat consistent.
For instance, "4WS00632" has 8 characters, but it might have 9 or 12.
So I really need to capture everything between the 3rd space and the 4th space.

Yet another option is with a bit of JSON
Example or dbFiddle
Select A.SomeCol
,NewVal = JSON_VALUE('["'+replace(SomeCol,' ','","')+'"]','$[2]')
From YourTable A
Results
SomeCol                            NewVal
XPTOP XPTOP 4WS00632 BLACK VERNIS  4WS00632

Try this technique:
DECLARE @text NVARCHAR(MAX) = 'XPTOP XPTOP 4WS00632 BLACK VERNIS';
DECLARE @text_xml XML = '<a>' + REPLACE(@text, ' ', '</a><a>')+ '</a>';
WITH DataSource (element_pos, element_text) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY T.c)
,T.c.value('(.)[1]', 'nvarchar(128)')
FROM @text_xml.nodes('a') T(c)
)
SELECT element_text
FROM DataSource
WHERE element_pos = 3;
The idea is to split the string by space and order the strings by their position. Then simply get the 3rd one.
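The same split-and-pick-by-position idea can be sketched without XML, assuming SQL Server 2022 or Azure SQL Database, where STRING_SPLIT accepts a third enable_ordinal argument and returns an ordinal column. On older versions this call does not compile, so treat it as an illustration rather than a drop-in replacement:

```sql
DECLARE @text NVARCHAR(4000) = N'XPTOP XPTOP 4WS00632 BLACK VERNIS';

-- enable_ordinal = 1 adds an "ordinal" column holding each piece's 1-based position
SELECT s.value
FROM STRING_SPLIT(@text, N' ', 1) AS s
WHERE s.ordinal = 3;   -- third space-delimited piece
```

Consecutive spaces produce empty pieces that still consume an ordinal, so this assumes single-space separators.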

One method is string_split() along with some like comparisons:
select t.*, s.value
from t cross apply
(select s.value
from string_split(t.str, ' ') s
where t.str like concat('% % % ', s.value, ' %') and
t.str not like concat('% % % % ', s.value, ' %')
) s

This is one way of using a combination of charindex and stuff.
It could be done in a single statement but I broke it into a couple of logical steps to make it clearer.
create table #test (col1 varchar(100));
insert into #test values ('XPTOP XPTOP 4WS00632 BLACK VERNIS');
with s1(s) as (select Stuff(col1,1,CharIndex(' ',col1),'') from #test), -- drop the first word
s2(s) as (select Stuff(s,1, CharIndex(' ',s),'') from s1) -- drop the second word
select Left(s, CharIndex(' ', s + ' ') - 1) -- keep up to the next space, without the trailing space
from s2;

Here is a solution using charindex and substring:
Declare @testTable Table (TestString varchar(100));
Insert Into @testTable (TestString)
Values ('XPTOP XPTOP 4WS00632 BLACK VERNIS')
, ('XYZ XYZ JW2B0037VK3 S17 SENAPE BLACK');
Select tt.TestString
, part1 = ltrim(substring(v.TestString, 1, p01.pos - 2))
, part2 = ltrim(substring(v.TestString, p01.pos, p02.pos - p01.pos - 1))
, part3 = ltrim(substring(v.TestString, p02.pos, p03.pos - p02.pos - 1))
From @testTable As tt
Cross Apply (Values (concat(tt.TestString, replicate(' ', 3)))) As v(TestString)
Cross Apply (Values (charindex(' ', v.TestString, 1) + 1)) As p01(pos)
Cross Apply (Values (charindex(' ', v.TestString, p01.pos) + 1)) As p02(pos)
Cross Apply (Values (charindex(' ', v.TestString, p02.pos) + 1)) As p03(pos)

Related

How do I combine a substring and trim right in SQL

I am trying to extract the data between two underscore characters. In some situations, the 2nd underscore may not exist.
MyFld
P_36840
U_216137
C_203134_H
C_203134_W
I tried this:
substring(i.[MyFld],
CHARINDEX ('_',i.[MyFld])+1,len(i.[MyFld])
-CHARINDEX ('_',i.[MyFld])
) [DerivedPrimaryKey]
And I get this:
DerivedPrimaryKey
36840
216137
203134_H
203134_W
https://dbfiddle.uk/uPKC6oX4
I want to remove the second underscore and data that follows it. I'm trying to combine it with a trim right, but I'm unsure where to start.
How can I do this?
We can start by simplifying what you have so far. I will also add enough to make this a complete query, so we can see it in context for later steps:
SELECT
right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)) [DerivedPrimaryKey]
FROM I
With this much done, we can now use it as the source for removing the trailing portion of the field:
SELECT
reverse(substring(reverse(step1)
, charindex('_', reverse(step1))+1
, len(step1)
)) [DerivedPrimaryKey]
FROM (
SELECT right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)) [step1]
FROM I
) T
Notice the layer of nesting. You can, of course, remove the nesting, but it means replicating the entire inner expression every time you see step1 (good thing I took the time to simplify it):
SELECT
reverse(substring(reverse(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)))
, charindex('_', reverse(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld))))+1
, len(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)))
))
FROM I
And now back to just the expression:
reverse(substring(reverse(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)))
, charindex('_', reverse(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld))))+1
, len(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)))
))
See it work here:
https://dbfiddle.uk/nFO4Vwhm
There is also this alternate expression that saves one function call:
left( right(i.MyFld,len(i.MyFld)-charindex('_',i.MyFld)),
coalesce(
nullif(
charindex('_',
right(i.MyFld,len(i.MyFld)-charindex('_',i.MyFld))
) -1, -1
),
len( right(i.MyFld,len(i.MyFld)-charindex('_',i.MyFld)) )
)
)
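The repetition of the inner expression in the un-nested versions can also be avoided by naming it once with CROSS APPLY (VALUES ...). The following is a minimal sketch of that idea, not part of the answer above; the inline VALUES list stands in for the real table, and the alias s(step1) is an illustrative name:

```sql
SELECT i.MyFld,
       -- NULLIF turns "no second underscore" (0) into NULL, so COALESCE falls back to the full length
       DerivedPrimaryKey = LEFT(s.step1,
                                COALESCE(NULLIF(CHARINDEX('_', s.step1), 0) - 1, LEN(s.step1)))
FROM (VALUES ('P_36840'), ('U_216137'), ('C_203134_H'), ('C_203134_W')) AS i(MyFld)
-- step1 = everything after the first underscore, computed once per row
CROSS APPLY (VALUES (RIGHT(i.MyFld, LEN(i.MyFld) - CHARINDEX('_', i.MyFld)))) AS s(step1);
```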
Just two more options. One uses parsename(), provided your data does not have more than 4 segments. The second uses a JSON array.
Example
Declare @YourTable Table ([MyFld] varchar(50)) Insert Into @YourTable Values
('P_36840')
,('U_216137')
,('C_203134_H')
,('C_203134_W')
Select *
,UsingParseName = reverse(parsename(reverse(replace(MyFld,'_','.')),2))
,UsingJSONValue = json_value('["'+replace(MyFld,'_','","')+'"]','$[1]')
From @YourTable
Results
MyFld       UsingParseName  UsingJSONValue
P_36840     36840           36840
U_216137    216137          216137
C_203134_H  203134          203134
C_203134_W  203134          203134
We can do this:
Declare @testData Table ([MyFld] varchar(50));
Insert Into @testData (MyFld)
Values ('P_36840')
, ('U_216137')
, ('C_203134_H')
, ('C_203134_W');
Select *
, second_element = substring(v.MyFld, p1.pos, p2.pos - p1.pos - 1)
From @testData As td
Cross Apply (Values (concat(td.MyFld, '__'))) As v(MyFld) -- Make sure we have at least 2 delimiters
Cross Apply (Values (charindex('_', v.MyFld, 1) + 1)) As p1(pos) -- First Position
Cross Apply (Values (charindex('_', v.MyFld, p1.pos) + 1)) As p2(pos) -- Second Position
If you actually have a fixed number of characters in the first element, then it could be simplified to:
Select *
, second_element = substring(v.MyFld, 3, charindex('_', v.MyFld, 4) - 3)
From @testData td
Cross Apply (Values (concat(td.MyFld, '_'))) As v(MyFld)
Often I try to fake out SQL if an expected character isn't always present and I don't need the resulting value:
SELECT SUBSTRING(field_Calculated, 1, CHARINDEX('_', field_Calculated) - 1)
FROM (SELECT SUBSTRING(MyFld, CHARINDEX('_', MyFld) + 1, LEN(MyFld)) + '_' As field_Calculated
FROM MyTable) T
I think this is clear, but I really like the ParseName solution @JohnCappalletti suggests.
If it's only ever one numeric value you can use string_split:
SELECT * FROM MyTable
CROSS APPLY string_split(MyFld, '_')
WHERE ISNUMERIC(value) = 1
Either way you have to be careful of the data before deciding the best approach.
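As one example of why the data matters: ISNUMERIC returns 1 for some strings that are not plain integers (for instance a lone '$' or '.'). A stricter variant of the string_split filter, sketched here on the assumption that the wanted piece is always an integer and that SQL Server 2016+ is available (for STRING_SPLIT), uses TRY_CONVERT. The inline VALUES list stands in for MyTable:

```sql
SELECT v.MyFld, s.value
FROM (VALUES ('P_36840'), ('U_216137'), ('C_203134_H'), ('C_203134_W')) AS v(MyFld)
CROSS APPLY STRING_SPLIT(v.MyFld, '_') AS s
-- TRY_CONVERT yields NULL instead of raising an error when the piece is not an integer
WHERE TRY_CONVERT(INT, s.value) IS NOT NULL;
```

An empty piece converts to 0 rather than NULL, so values with doubled or trailing underscores would need an extra `s.value <> ''` check.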
Your data:
Declare @Table Table ([MyFld] varchar(100))
Insert Into @Table
([MyFld] ) Values
('P_36840')
,('U_216137')
,('C_203134_H')
,('C_203134_W')
Use SubString, Left and PatIndex:
select
Left(
SubString(
[MyFld],
PatIndex('%[0-9.-]%', [MyFld]),
8000
),
PatIndex(
'%[^0-9.-]%',
SubString(
[MyFld],
PatIndex('%[0-9.-]%', [MyFld]),
8000
) + 'X'
)-1
) as DerivedPrimaryKey
from
@Table

Best way to pad section of this string with 0s

Here are 2 examples of what the strings currently look like:
6731-121-1
9552-3-1
This is what I want them to look like after padding:
0006731-121-1
0009552-003-1
So I want the part before the first '-' left-padded with zeros to 7 digits, and the part between the first and second '-' left-padded with zeros to 3 digits.
What would be the best way to accomplish this in a SQL SELECT statement?
SELECT RIGHT('0000000'
+ ISNULL(
LEFT(OE.exception_id, CHARINDEX('-', OE.exception_id)
- 1) ,
''
) ,7) + '-'
+ SUBSTRING(OE.exception_id, CHARINDEX('-', ( OE.exception_id )), 10) exception_id
ParseName() could be an option here
Example
Declare @YourTable Table ([YourCol] varchar(50))
Insert Into @YourTable Values
('6731-121-1')
,('9552-3-1')
Select *
,NewVal = right('0000000'+parsename(replace(YourCol,'-','.'),3),7)
+'-'
+right('000'+parsename(replace(YourCol,'-','.'),2),3)
+'-'
+parsename(replace(YourCol,'-','.'),1)
From @YourTable
Returns
YourCol     NewVal
6731-121-1  0006731-121-1
9552-3-1    0009552-003-1
In situations with more than 3 periods
Example: '1.2.3.4.5'
Or any value is empty
3 examples: '1..3', '1.2.3.', '.2'
Parsename will return null for all values. You will need to split the column using a different method.
Here is an alternative to parsename:
DECLARE @table table(col varchar(100))
INSERT @table values('6731-121-1'),('9552-3-1')
SELECT
col,
REPLICATE('0', 8-x) + STUFF(col, x+1, 0,REPLICATE('0', 4 - (y-x))) newcol
FROM @table
CROSS APPLY
(SELECT CHARINDEX('-', col) x) x
CROSS APPLY
(SELECT CHARINDEX('-', col + '-', x+1) y) y
col         newcol
6731-121-1  0006731-121-1
9552-3-1    0009552-003-1

How to get the middle word using Substring via Charindex of Second Position?

Basically, I want to get the middle word, using the second occurrence of the same character (in this case, the dash "-").
This is the sample input:
declare @word nvarchar(max)
set @word = 'Technical Materials - Conversion - Team Dashboard'
There are three parts in this sentence, separated by the dash '-'.
The first part is 'Technical Materials', which I am able to get using:
SELECT LTRIM(RTRIM(SUBSTRING(@word, 0, CHARINDEX('-', @word, 0))))
The last part is 'Team Dashboard', which I am able to get using:
SELECT CASE WHEN LEN(@word) - LEN(REPLACE(@word, '-', '')) = 1
THEN NULL
ELSE
RIGHT(@word,CHARINDEX('-', REVERSE(@word))-1)
END
The problem is that I am having a hard time getting the middle part, which is 'Conversion' in this example.
If the format is fixed, you can use PARSENAME to get the result you expect:
DECLARE @Word AS NVARCHAR(MAX) = 'Technical Materials - Conversion - Team Dashboard'
SELECT PARSENAME(REPLACE(@Word, '-', '.'), 2)
If you want to trim the extra spaces, then:
SELECT LTRIM(RTRIM(PARSENAME(REPLACE(@Word, '-', '.'), 2)))
Try this query:
SELECT
SUBSTRING(@word,
CHARINDEX('-', @word) + 2,
CHARINDEX('-', @word, CHARINDEX('-', @word) + 1) -
CHARINDEX('-', @word) - 3)
FROM yourTable
The general strategy here is to use SUBSTRING(), which requires the starting and ending positions of the middle string in question. We can use CHARINDEX to find both the first and second dash in the string. From this, we can compute the positions of the middle substring we want.
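The position arithmetic described here can be made easier to follow by naming the two dash positions first. This is a sketch under the assumption that a missing dash should produce NULL instead of an error; the aliases first_dash and second_dash are illustrative names, not part of the answer above:

```sql
DECLARE @word NVARCHAR(MAX) = N'Technical Materials - Conversion - Team Dashboard';

SELECT middle_part =
       -- Only call SUBSTRING when both dashes exist; otherwise the length would go negative
       CASE WHEN p.first_dash > 0 AND p.second_dash > 0
            THEN LTRIM(RTRIM(SUBSTRING(@word,
                                       p.first_dash + 1,
                                       p.second_dash - p.first_dash - 1)))
       END
FROM (VALUES (CHARINDEX('-', @word))) AS f(first_dash)
-- Search for the second dash starting one character after the first
CROSS APPLY (VALUES (f.first_dash,
                     CHARINDEX('-', @word, f.first_dash + 1))) AS p(first_dash, second_dash);
```

Trimming with LTRIM/RTRIM replaces the hard-coded +2 and -3 offsets, so the expression no longer depends on there being exactly one space on each side of the dash.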
This will find the text between the first 2 occurrences of '-':
DECLARE @word nvarchar(max)
SET @word = 'Technical Materials - Conversion - Team Dashboard'
SELECT SUBSTRING(x, 0, charindex('-', x))
FROM (values(stuff(@word, 1, charindex('-', @word), ''))) x(x)
This will find the middle element. In case of an even number of elements, it will pick the first of the 2 middle elements:
DECLARE @word nvarchar(max)
SET @word = 'Technical Materials - Conversion - Team Dashboard'
;WITH CTE(txt, rn, cnt) as
(
SELECT
t.c.value('.', 'VARCHAR(2000)'),
row_number() over (order by (select 1)), count(*) over()
FROM (
SELECT x = CAST('<t>' +
REPLACE(@word, ' - ', '</t><t>') + '</t>' AS XML)
) a
CROSS APPLY x.nodes('/t') t(c)
)
SELECT txt
FROM CTE
WHERE (cnt+1) / 2 = rn

Pad zeros before first hyphen, remove spaces, and add BA and IN

I have data as below
98-45.3A-22
104-44.0A-23
00983-29.1-22
01757-42.5A-22
04968-37.3A2-23
I am looking for the output below in SQL Server:
00098-BA45.3A-IN-22
00104-BA44.0A-IN-23
00983-BA29.1-IN-22
01757-BA42.5A-IN-22
04968-BA37.3A2-IN-23
I split the string into parts to cope with tricky data templates. This should work even when the tail is not a dash followed by 2 digits:
WITH Src AS
(
SELECT * FROM (VALUES
('98-45.3A-22'),
('104-44.0A-23'),
('00983-29.1-22'),
('01757-42.5A-22'),
('04968-37.3A2-23')
) T(X)
), Parts AS
(
SELECT *,
RIGHT('00000'+SUBSTRING(X, 1, CHARINDEX('-',X, 1)-1),5) Front,
'BA'+SUBSTRING(X, CHARINDEX('-',X, 1)+1, 2) BA,
SUBSTRING(X, PATINDEX('%.%',X), LEN(X)-CHARINDEX('-', REVERSE(X), 1)-PATINDEX('%.%',X)+1) P,
SUBSTRING(X, LEN(X)-CHARINDEX('-', REVERSE(X), 1)+1, LEN(X)) En
FROM Src
)
SELECT Front+'-'+BA+P+'-IN'+En
FROM Parts
It returns:
00098-BA45.3A-IN-22
00104-BA44.0A-IN-23
00983-BA29.1-IN-22
01757-BA42.5A-IN-22
04968-BA37.3A2-IN-23
Try this:
DECLARE @String VARCHAR(100) = '98-45.3A-22'
SELECT ISNULL(REPLICATE('0',6 - CHARINDEX('-',@String)),'') -- Add leading Zeros
+ STUFF(
STUFF(@String,CHARINDEX('-',@String),1,'-BA'), -- Add 'BA'
CHARINDEX('-',@String,CHARINDEX('-',@String)+1)+2, -- 2 additional for the character 'BA'
1,'-IN-') -- Replace the second '-' with '-IN-'
What if there are more than 5 digits before the first hyphen and you want to remove the extra leading zeros to get back to 5 digits?
DECLARE @String VARCHAR(100) = '0000098-45.3A-22'
SELECT CASE WHEN CHARINDEX('-',@String) <= 6
THEN ISNULL(REPLICATE('0',6 - CHARINDEX('-',@String)),'') -- Add leading Zeros
+ STUFF(
STUFF( @String,CHARINDEX('-',@String),1,'-BA'), -- Add 'BA'
CHARINDEX('-',@String,CHARINDEX('-',@String)+1)+2, -- 2 additional for the character 'BA'
1,'-IN-') -- Replace the second '-' with '-IN-'
ELSE STUFF(
STUFF(
STUFF(@String,CHARINDEX('-',@String),1,'-BA'), -- Add 'BA'
CHARINDEX('-',@String,CHARINDEX('-',@String)+1)+2, -- 2 additional for the character 'BA'
1,'-IN-'), -- Replace the second '-' with '-IN-'
1, CHARINDEX('-',@String) - 6, '' -- remove extra leading Zeros
)
END
Making assumptions that the format is consistent (e.g. always ends with "-" + 2 characters....)
DECLARE @Data TABLE (Col1 VARCHAR(100))
INSERT @Data ( Col1 )
SELECT Col1
FROM (
VALUES ('98-45.3A-22'), ('104-44.0A-23'),
('00983-29.1-22'), ('01757-42.5A-22'),
('04968-37.3A2-23')
) x (Col1)
SELECT RIGHT('0000' + LEFT(Col1, CHARINDEX('-', Col1) - 1), 5)
+ '-BA' + SUBSTRING(Col1, CHARINDEX('-', Col1) + 1, CHARINDEX('.', Col1) - CHARINDEX('-', Col1))
+ SUBSTRING(Col1, CHARINDEX('.', Col1) + 1, LEN(Col1) - CHARINDEX('.', Col1) - 3)
+ '-IN-' + RIGHT(Col1, 2)
FROM @Data
It's not ideal IMO to do this string manipulation all the time in SQL. You could shift it out to your presentation layer, or store the pre-formatted value in the db to save the cost of this every time.
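One way to store the pre-formatted value, sketched here as an assumption rather than as the answerer's design, is a persisted computed column, so the string work happens on write instead of on every read. The table and column names (dbo.Parts, Col1Formatted) are hypothetical, and the expression is the one from the query above:

```sql
-- Hypothetical table; dbo.Parts and Col1Formatted are illustrative names only.
CREATE TABLE dbo.Parts
(
    Col1 VARCHAR(100) NOT NULL,
    -- Evaluated when a row is written and stored with it.
    -- Assumes the 'digits-xx.x-yy' shape: a value without '-' or '.' makes the write fail.
    Col1Formatted AS
    (
        RIGHT('0000' + LEFT(Col1, CHARINDEX('-', Col1) - 1), 5)
        + '-BA' + SUBSTRING(Col1, CHARINDEX('-', Col1) + 1, CHARINDEX('.', Col1) - CHARINDEX('-', Col1))
        + SUBSTRING(Col1, CHARINDEX('.', Col1) + 1, LEN(Col1) - CHARINDEX('.', Col1) - 3)
        + '-IN-' + RIGHT(Col1, 2)
    ) PERSISTED
);

INSERT dbo.Parts (Col1) VALUES ('98-45.3A-22'), ('04968-37.3A2-23');

SELECT Col1, Col1Formatted FROM dbo.Parts;
```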
Use REPLICATE and CHARINDEX:
REPLICATE repeats a string value a specified number of times.
CHARINDEX returns the starting position of the first occurrence of one string inside another.
Declare @Data AS VARCHAR(50)='98-45.3A-22'
SELECT REPLICATE('0',6-CHARINDEX('-',@Data)) + @Data
SELECT
SUBSTRING
(
(REPLICATE('0',6-CHARINDEX('-',@Data)) +@Data)
,0
,6
)
+'-'+'BA'+ CAST('<x>' + REPLACE(@Data,'-','</x><x>') + '</x>' AS XML).value('/x[2]','varchar(max)')
+'-'+ 'IN'+ '-' + CAST('<x>' + REPLACE(@Data,'-','</x><x>') + '</x>' AS XML).value('/x[3]','varchar(max)')
Alternatively, using PARSENAME(), you can use this query:
WITH t AS (
SELECT
PARSENAME(REPLACE(REPLACE(s, '.', '###'), '-', '.'), 3) AS p1,
REPLACE(PARSENAME(REPLACE(REPLACE(s, '.', '###'), '-', '.'), 2), '###', '.') AS p2,
PARSENAME(REPLACE(REPLACE(s, '.', '###'), '-', '.'), 1) AS p3
FROM yourTable)
SELECT RIGHT('00000' + p1, 5) + '-BA' + p2 + '-IN-' + p3
FROM t;

Extract string after second / and before -

I have a field that holds an account code. I've managed to extract the first 2 parts OK but I'm struggling with the last 2.
The field data is as follows:
812330/50110/0-0
812330/50110/BDG001-0
812330/50110/0-X001
I need to get the string between the second "/" and the "-", and the string after the "-". Both fields have variable lengths, so I would be looking to output 0 and 0 on the first record, BDG001 and 0 on the second record, and 0 and X001 on the third record.
Any help much appreciated, thanks.
You can use CHARINDEX and LEFT/RIGHT:
CREATE TABLE #tab(col VARCHAR(1000));
INSERT INTO #tab VALUES ('812330/50110/0-0'),('812330/50110/BDG001-0'),
('812330/50110/0-X001');
WITH cte AS
(
SELECT
col,
r = RIGHT(col, CHARINDEX('/', REVERSE(col))-1)
FROM #tab
)
SELECT col,
r,
sub1 = LEFT(r, CHARINDEX('-', r)-1),
sub2 = RIGHT(r, LEN(r) - CHARINDEX('-', r))
FROM cte;
EDIT:
or even simpler:
SELECT
col
,sub1 = SUBSTRING(col,
LEN(col) - CHARINDEX('/', REVERSE(col)) + 2,
CHARINDEX('/', REVERSE(col)) -CHARINDEX('-', REVERSE(col))-1)
,sub2 = RIGHT(col, CHARINDEX('-', REVERSE(col))-1)
FROM #tab;
EDIT 2:
Using PARSENAME SQL SERVER 2012+ (if your data does not contain .):
SELECT
col,
sub1 = PARSENAME(REPLACE(REPLACE(col, '/', '.'), '-', '.'), 2),
sub2 = PARSENAME(REPLACE(REPLACE(col, '/', '.'), '-', '.'), 1)
FROM #tab;
...Or you can do this, so you only go from left side to right, so you don't need to count from the end in case you have more '/' or '-' signs:
SELECT
SUBSTRING(columnName, CHARINDEX('/' , columnName, CHARINDEX('/' , columnName) + 1) + 1,
CHARINDEX('-', columnName) - CHARINDEX('/' , columnName, CHARINDEX('/' , columnName) + 1) - 1) AS FirstPart,
SUBSTRING(columnName, CHARINDEX('-' , columnName) + 1, LEN(columnName)) AS LastPart
FROM table_name
One method is to download a split() function off the web and use it. However, the values end up in separate rows, not separate columns. An alternative is a series of nested subqueries, CTEs, or outer applies:
select t.*, p1.part1, p12.part2, p12.part3
from [table] t outer apply
(select left(t.field, charindex('/', t.field + '/') - 1) as part1, -- text before the first '/'
substring(t.field, charindex('/', t.field) + 1, len(t.field)) as rest1 -- everything after it
) p1 outer apply
(select left(p1.rest1, charindex('/', p1.rest1 + '/') - 1) as part2, -- text before the second '/'
substring(p1.rest1, charindex('/', p1.rest1) + 1, len(p1.rest1)) as part3
) p12
where t.field like '%/%/%';
The where clause guarantees that the field value is in the right format. Otherwise, you need to start sprinkling the code with case expressions to handle misformatted data.
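For simple cases, a lighter alternative to full case expressions is NULLIF, which turns the "not found" result of charindex (0) into NULL so the surrounding function returns NULL instead of raising an error. This is a sketch of that idiom only, with an inline VALUES list and a deliberately malformed second row standing in for real data:

```sql
SELECT v.field,
       -- NULLIF(..., 0) makes a missing '/' yield NULL instead of a negative length
       part1 = LEFT(v.field, NULLIF(CHARINDEX('/', v.field), 0) - 1),
       rest1 = SUBSTRING(v.field, NULLIF(CHARINDEX('/', v.field), 0) + 1, LEN(v.field))
FROM (VALUES ('812330/50110/0-0'), ('no slash here')) AS v(field);
```

The first row yields '812330' and '50110/0-0'; the malformed row yields NULL in both columns.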