How do I combine a substring and trim right in SQL

I am trying to extract the data between two underscore characters. In some situations, the 2nd underscore may not exist.
MyFld
P_36840
U_216137
C_203134_H
C_203134_W
I tried this:
substring(i.[MyFld],
CHARINDEX ('_',i.[MyFld])+1,len(i.[MyFld])
-CHARINDEX ('_',i.[MyFld])
) [DerivedPrimaryKey]
And I get this:
DerivedPrimaryKey
36840
216137
203134_H
203134_W
https://dbfiddle.uk/uPKC6oX4
I want to remove the second underscore and data that follows it. I'm trying to combine it with a trim right, but I'm unsure where to start.
How can I do this?

We can start by simplifying what you have so far. I will also add enough to make this a complete query, so we can see it in context for later steps:
SELECT
right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)) [DerivedPrimaryKey]
FROM I
With this much done, we can now use it as the source for removing the trailing portion of the field:
SELECT
reverse(substring(reverse(step1)
, charindex('_', reverse(step1))+1
, len(step1)
)) [DerivedPrimaryKey]
FROM (
SELECT right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)) [step1]
FROM I
) T
Notice the layer of nesting. You can, of course, remove the nesting, but it means replicating the entire inner expression every time you see step1 (good thing I took the time to simplify it):
SELECT
reverse(substring(reverse(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)))
, charindex('_', reverse(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld))))+1
, len(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)))
))
FROM I
And now back to just the expression:
reverse(substring(reverse(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)))
, charindex('_', reverse(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld))))+1
, len(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)))
))
See it work here:
https://dbfiddle.uk/nFO4Vwhm
There is also this alternate expression that saves one function call:
left( right(i.MyFld,len(i.MyFld)-charindex('_',i.MyFld)),
coalesce(
nullif(
charindex('_',
right(i.MyFld,len(i.MyFld)-charindex('_',i.MyFld))
) -1, -1
),
len( right(i.MyFld,len(i.MyFld)-charindex('_',i.MyFld)) )
)
)
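A third option, instead of nesting or repeating step1, is to compute it once per row with CROSS APPLY (VALUES ...) and refer to it by name. A minimal sketch applying the LEFT/NULLIF expression that way; the table variable @I is a hypothetical stand-in for the question's table I, loaded with the sample values:

```sql
-- @I is a hypothetical stand-in for the question's table I
DECLARE @I TABLE (MyFld varchar(50));
INSERT INTO @I (MyFld)
VALUES ('P_36840'), ('U_216137'), ('C_203134_H'), ('C_203134_W');

DECLARE @Result TABLE (MyFld varchar(50), DerivedPrimaryKey varchar(50));

INSERT INTO @Result (MyFld, DerivedPrimaryKey)
SELECT i.MyFld,
       -- Keep the text before the next underscore; if there is none,
       -- NULLIF turns -1 into NULL and COALESCE falls back to the full length
       LEFT(s.step1, COALESCE(NULLIF(CHARINDEX('_', s.step1) - 1, -1), LEN(s.step1)))
FROM @I AS i
-- step1 = everything after the first underscore, computed once per row
CROSS APPLY (VALUES (RIGHT(i.MyFld, LEN(i.MyFld) - CHARINDEX('_', i.MyFld)))) AS s(step1);

SELECT MyFld, DerivedPrimaryKey FROM @Result;
```

The APPLY is evaluated per row, so this behaves like the nested version while keeping the expression readable.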

Just two more options: one using PARSENAME(), provided your data has no more than 4 segments (and no periods of its own), and a second using a JSON array (JSON_VALUE needs SQL Server 2016+).
Example
Declare @YourTable Table ([MyFld] varchar(50))
Insert Into @YourTable Values
('P_36840')
,('U_216137')
,('C_203134_H')
,('C_203134_W')
Select *
,UsingParseName = reverse(parsename(reverse(replace(MyFld,'_','.')),2))
,UsingJSONValue = json_value('["'+replace(MyFld,'_','","')+'"]','$[1]')
From @YourTable
Results
MyFld UsingParseName UsingJSONValue
P_36840 36840 36840
U_216137 216137 216137
C_203134_H 203134 203134
C_203134_W 203134 203134
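The same JSON array can also be fed to OPENJSON, which returns every segment together with its zero-based position in the key column, so any segment can be picked, not just the second. A sketch assuming SQL Server 2016+ at database compatibility level 130 or higher, and (as with JSON_VALUE) that MyFld never contains characters that need JSON escaping such as " or \; @Segments is an illustrative name:

```sql
DECLARE @YourTable TABLE (MyFld varchar(50));
INSERT INTO @YourTable (MyFld)
VALUES ('P_36840'), ('U_216137'), ('C_203134_H'), ('C_203134_W');

DECLARE @Segments TABLE (MyFld varchar(50), SegmentIndex int, Segment nvarchar(50));

INSERT INTO @Segments (MyFld, SegmentIndex, Segment)
SELECT t.MyFld, CAST(j.[key] AS int), j.[value]
FROM @YourTable AS t
-- 'C_203134_H' becomes the JSON array ["C","203134","H"];
-- OPENJSON returns one row per element, with the zero-based index in [key]
CROSS APPLY OPENJSON(CONCAT(N'["', REPLACE(t.MyFld, '_', '","'), N'"]')) AS j;

-- Index 1 is the segment between the first and second underscore
SELECT MyFld, Segment AS DerivedPrimaryKey
FROM @Segments
WHERE SegmentIndex = 1;
```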

We can do this:
Declare @testData Table ([MyFld] varchar(50));
Insert Into @testData (MyFld)
Values ('P_36840')
, ('U_216137')
, ('C_203134_H')
, ('C_203134_W');
Select *
, second_element = substring(v.MyFld, p1.pos, p2.pos - p1.pos - 1)
From @testData As td
Cross Apply (Values (concat(td.MyFld, '__'))) As v(MyFld) -- Make sure we have at least 2 delimiters
Cross Apply (Values (charindex('_', v.MyFld, 1) + 1)) As p1(pos) -- Just past the 1st underscore
Cross Apply (Values (charindex('_', v.MyFld, p1.pos) + 1)) As p2(pos) -- Just past the 2nd underscore
If the first element is always exactly one character long (as in your sample data), then it could be simplified to:
Select *
, second_element = substring(v.MyFld, 3, charindex('_', v.MyFld, 4) - 3)
From @testData td
Cross Apply (Values (concat(td.MyFld, '_'))) As v(MyFld) -- Make sure there is a trailing delimiter

Often I try to fake out SQL when an expected character isn't always present, by appending it myself; the appended character never ends up in the resulting value:
SELECT SUBSTRING(field_Calculated, 1, CHARINDEX('_', field_Calculated) - 1)
FROM (SELECT SUBSTRING(MyFld, CHARINDEX('_', MyFld) + 1, LEN(MyFld)) + '_' As field_Calculated
FROM MyTable) T
I think this is clear, but I really like the PARSENAME solution @JohnCappalletti suggests.
If there is only ever one numeric segment, you can use STRING_SPLIT:
SELECT * FROM MyTable
CROSS APPLY string_split(MyFld, '_')
WHERE ISNUMERIC(value) = 1
Either way you have to be careful of the data before deciding the best approach.
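One reason for that caution: ISNUMERIC also returns 1 for some strings that are not plain numbers, such as '+', '-' and currency symbols like '$'. If the key is always made of digits only, a LIKE pattern is a stricter filter. A sketch assuming SQL Server 2016+ (for STRING_SPLIT), with a hypothetical @MyTable loaded with the sample values:

```sql
-- @MyTable is a hypothetical stand-in for MyTable
DECLARE @MyTable TABLE (MyFld varchar(50));
INSERT INTO @MyTable (MyFld)
VALUES ('P_36840'), ('U_216137'), ('C_203134_H'), ('C_203134_W');

DECLARE @Result TABLE (MyFld varchar(50), DerivedPrimaryKey varchar(50));

INSERT INTO @Result (MyFld, DerivedPrimaryKey)
SELECT t.MyFld, s.value
FROM @MyTable AS t
CROSS APPLY STRING_SPLIT(t.MyFld, '_') AS s
-- Keep only non-empty segments made up entirely of the characters 0-9
WHERE s.value <> ''
  AND s.value NOT LIKE '%[^0-9]%';

SELECT MyFld, DerivedPrimaryKey FROM @Result;
```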

Your data:
Declare @Table Table ([MyFld] varchar(100))
Insert Into @Table
([MyFld] ) Values
('P_36840')
,('U_216137')
,('C_203134_H')
,('C_203134_W')
Use SUBSTRING, LEFT and PATINDEX to take the first run of numeric characters (digits, '.' or '-'):
select
Left(
SubString(
[MyFld],
PatIndex('%[0-9.-]%', [MyFld]),
8000
),
PatIndex(
'%[^0-9.-]%',
SubString(
[MyFld],
PatIndex('%[0-9.-]%', [MyFld]),
8000
) + 'X'
)-1
) as DerivedPrimaryKey
from
@Table

Related

Is it possible to find the first occurrence of a string that's NOT within a set of delimiters in SQL Server 2016+?

I have a column in a SQL Server table that has strings of varying lengths. I need to find the position of the first occurrence of the string , -- that's not enclosed in single quotes or square brackets.
For example, in the following two strings, I've bolded the portion I would like to get the position of. Notice that in the first string, the first time , -- appears on its own (not between single-quote or square-bracket delimiters) is at position 14, and in the second string it's at position 18.
'a, --'[, --]**, --**[, --]
[a, --b]aaaaaaa_ **, --**', --'
Also I should mention that , -- itself could appear multiple times in the string.
Here's a simple query that shows the strings and my desired output.
SELECT
t.string, t.desired_pos
FROM
(VALUES (N'''a, --''[, --], --[, --]', 14),
(N'[a, —-b]aaaaaaa_ , --'', --''', 18)) t(string, desired_pos)
Is there any way to accomplish this using a SELECT query (or multiple) without using a function?
Thank you in advance!
I've tried variations of SUBSTRING, CHARINDEX, and even some CROSS APPLYs but I can't seem to get the result I'm looking for.
Before I write down my solution, I must warn you: DON'T USE IT. Use a function, or do this in some other language. This code is probably buggy.
It doesn't handle things like escaped quotes, etc.
The idea is to first mask the content inside brackets [] and quotes '' and then just do a "simple" charindex.
To mask the delimited parts, I'm using a recursive CTE that loops over every pair of matching quotes or brackets and replaces the pair and its content with a placeholder string of the same length, so the positions don't change.
One important point is that quotes and brackets might be embedded in each other, so you have to try both variants and choose the one that starts earliest.
WITH CTE AS (
SELECT *
FROM
(VALUES (N'''a, --''[, --], --[, --]', 14),
(N'[a, —-b]aaaaaaa_ , --'', --''', 18)) t(string, desired_pos)
)
, cte2 AS (
select x.start
, x.finish
, case when x.start > 0 THEN STUFF(string, x.start, x.finish - x.start + 1, REPLICATE('a', x.finish - x.start + 1)) ELSE string END AS newString
, 1 as level
, string as orig
, desired_pos
from cte
CROSS APPLY (
SELECT *
, ROW_NUMBER() OVER(ORDER BY case when start > 0 THEN 0 ELSE 1 END, start) AS sortorder
FROM (
SELECT charindex('[', string) AS start
, charindex(']', string) AS finish
UNION ALL
SELECT charindex('''', string) AS startQ
, charindex('''', string, charindex('''', string) + 1) AS finishQ
) x
) x
WHERE x.sortorder = 1
UNION ALL
select x.start
, x.finish
, STUFF(newString, x.start, x.finish - x.start + 1, REPLICATE('a', x.finish - x.start + 1))
, level + 1
, orig
, desired_pos
from cte2
CROSS APPLY (
SELECT *
, ROW_NUMBER() OVER(ORDER BY case when start > 0 THEN 0 ELSE 1 END, start) AS sortorder
FROM (
SELECT charindex('[', newString) AS start
, charindex(']', newString) AS finish
UNION ALL
SELECT charindex('''', newString) AS startQ
, charindex('''', newString, charindex('''', newString) + 1) AS finishQ
) x
) x
WHERE x.sortorder = 1
AND x.start > 0
AND cte2.start > 0 -- Must have been a match
)
SELECT PATINDEX('%, --%', newString), *
from (
select *, row_number() over(partition by orig order by level desc) AS sort
from cte2
) x
where x.sort = 1
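Try this approach. I'm replacing the substrings you don't need with another string of the same length, then looking for the position of the string you're interested in. Note that it only masks the two exact patterns ', --' and [, --], and it uses shorter test strings than the ones in the question, so the expected positions are 13 and 16.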
SELECT string, desired_pos,
CHARINDEX(', --', REPLACE(REPLACE(string, ''', --''', '******'), '[, --]', '******')
) start_index
FROM (VALUES (N''', --''[, --], --[, --]', 13),
(N'[, --]aaaaaaa_ , --'', --''', 16)) t(string, desired_pos)
I don't know whether a C# solution makes sense for you, but this class for CSV (in the Microsoft.VisualBasic.FileIO namespace) is a nice little parser: TextFieldParser
Then you just define Delimiters etc., and assuming the input is escaped consistently, all is good.
I'm late to the game here, but this kind of thing is simple in SQL Server when leveraging NGrams8k, a community-written inline table-valued function (it is not built in, so it has to be created first; below it is referenced as samd.NGrams8k). You don't need REGEX, a CLR, or C#. Furthermore, NGrams8k will be the fastest by far; in 8 years nobody has produced anything remotely as fast. This code will also be faster and far less complex than a recursive CTE solution (recursive CTEs are almost always slow in SQL Server).
;--==== Sample Data
DECLARE @T Table (String VARCHAR(100))
INSERT @T
VALUES (N'''a, --''[, --], --[, --]'),
(N'[a, —-b]aaaaaaa_ , --'', --''');
;--==== Solution
SELECT
t.String, ng.Position
FROM @T AS t
CROSS APPLY (VALUES(REPLACE(t.String,'[',CHAR(1)))) AS f(S)
CROSS APPLY samd.NGrams8k(f.S,4) AS ng
CROSS APPLY (VALUES(SUBSTRING(f.S,ng.Position-2,7))) AS g(String)
WHERE ng.Token = ', --'
AND g.String NOT LIKE '%''%''%'
AND g.String NOT LIKE '%'+CHAR(1)+'%]%';
Results:
String Position
----------------------------- --------------------
'a, --'[, --], --[, --] 14
[a, —-b]aaaaaaa_ , --', --' 18
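Since the question asks for a solution without a function, here is a sketch of the same idea with an inline tally in place of NGrams8k: test every position where ', --' starts, and keep the first one whose prefix contains an even number of single quotes and as many '[' as ']'. It assumes quotes and brackets are never nested inside each other or escaped, and that strings are at most 1,000 characters (the size of the tally). It uses the second string as written in the question text ([a, --b] with two hyphens), which also exercises the bracket check.

```sql
DECLARE @T TABLE (string nvarchar(100), desired_pos int);
INSERT INTO @T (string, desired_pos)
VALUES (N'''a, --''[, --], --[, --]', 14),
       (N'[a, --b]aaaaaaa_ , --'', --''', 18);

DECLARE @Result TABLE (string nvarchar(100), desired_pos int, found_pos int);

WITH Digits AS (
    SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(d)
),
Tally AS ( -- the numbers 1 to 1000
    SELECT a.d * 100 + b.d * 10 + c.d + 1 AS n
    FROM Digits AS a CROSS JOIN Digits AS b CROSS JOIN Digits AS c
)
INSERT INTO @Result (string, desired_pos, found_pos)
SELECT t.string, t.desired_pos, MIN(tl.n)
FROM @T AS t
JOIN Tally AS tl
  ON tl.n <= LEN(t.string)
 AND SUBSTRING(t.string, tl.n, 4) = N', --'                 -- every candidate start position
CROSS APPLY (VALUES (LEFT(t.string, tl.n - 1))) AS p(prefix) -- text before the candidate
CROSS APPLY (VALUES (
    -- DATALENGTH counts bytes (2 per nvarchar character) and, unlike LEN, keeps trailing spaces
    (DATALENGTH(p.prefix) - DATALENGTH(REPLACE(p.prefix, N'''', N''))) / 2,
    (DATALENGTH(p.prefix) - DATALENGTH(REPLACE(p.prefix, N'[', N''))) / 2,
    (DATALENGTH(p.prefix) - DATALENGTH(REPLACE(p.prefix, N']', N''))) / 2
)) AS cnt(quotes, opens, closes)
WHERE cnt.quotes % 2 = 0        -- not inside '...'
  AND cnt.opens = cnt.closes    -- not inside [...]
GROUP BY t.string, t.desired_pos;

SELECT string, desired_pos, found_pos FROM @Result;
```

Counting delimiters in the prefix avoids recursion entirely; the trade-off is that it cannot cope with nesting or escaping, which the recursive CTE answer also warns about.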

SQL - Split Concatenated String into Columns after ;

I really need help with SQL. I have a database where you can find values like abc;defg;hi, all in just 1 column. I want to create 2 more columns which hold the split values.
For example:
Before:
Value01: abc;defg;hi
After:
Value01: abc,
Value02: defg,
Value03: hi
--Another Example would be this:--
Before:
Value01: abcd;efg;
After:
Value01: abcd,
Value02: efg,
Value03: null
So 3 new values should always be created. I hope you understand my question!
Greetings
You can use string_split() (it returns one row per element rather than separate columns):
select nullif(s.value, '')
from string_split(@value, ';') s
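To get the three columns the question asks for, the split values need their positions. On SQL Server 2022 and Azure SQL Database, STRING_SPLIT accepts a third enable_ordinal argument that adds an ordinal column, and conditional aggregation can turn that into Value01..Value03. A sketch under that version assumption; the Id column and the table variable names are illustrative:

```sql
-- Requires SQL Server 2022 or Azure SQL Database (enable_ordinal argument)
DECLARE @testData TABLE (Id int, Value01 varchar(255));
INSERT INTO @testData (Id, Value01)
VALUES (1, 'abc;defg;hi'), (2, 'abcd;efg;');

DECLARE @Result TABLE (Id int, Value01 varchar(255), Value02 varchar(255), Value03 varchar(255));

INSERT INTO @Result (Id, Value01, Value02, Value03)
SELECT td.Id,
       -- NULLIF turns an empty trailing element into NULL, as the question asks
       MAX(CASE WHEN s.ordinal = 1 THEN NULLIF(s.value, '') END),
       MAX(CASE WHEN s.ordinal = 2 THEN NULLIF(s.value, '') END),
       MAX(CASE WHEN s.ordinal = 3 THEN NULLIF(s.value, '') END)
FROM @testData AS td
CROSS APPLY STRING_SPLIT(td.Value01, ';', 1) AS s  -- third argument 1 = enable_ordinal
GROUP BY td.Id;

SELECT Id, Value01, Value02, Value03 FROM @Result ORDER BY Id;
```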
Use a CTE (WITH) together with XML:
WITH CTE
AS
(
SELECT [xml_val] = CAST('<t>' + REPLACE(SomeValue,';','</t><t>') + '</t>' AS XML)
FROM #yourTable
)
SELECT [SomeValue] = col.value('.','VARCHAR(100)')
FROM CTE
CROSS APPLY [xml_val].nodes('/t') CA(col)
Here is a solution that should be fairly performant:
--==== Sample Data
Declare @testData Table (Value01 varchar(255));
Insert Into @testData (Value01)
Values ('abc;defg;hi'), ('abcd;efg;');
--==== Solution using sample data from above (NULLIF turns missing elements into NULL)
Select *
, Value01 = nullif(substring(td.Value01, 1, p1.pos - 2), '')
, Value02 = nullif(substring(td.Value01, p1.pos, p2.pos - p1.pos - 1), '')
, Value03 = nullif(substring(td.Value01, p2.pos, p3.pos - p2.pos - 1), '')
From @testData td
Cross Apply (Values (concat(td.Value01, ';;;'))) As v(Value01) --Make sure we have delimiters
Cross Apply (Values (charindex(';', v.Value01, 1) + 1)) As p1(pos)
Cross Apply (Values (charindex(';', v.Value01, p1.pos) + 1)) As p2(pos)
Cross Apply (Values (charindex(';', v.Value01, p2.pos) + 1)) As p3(pos) --End of string/element
You could also create a function that parses out the elements and then just call the function using cross apply.
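A minimal sketch of that idea: an inline table-valued function wrapping the same CHARINDEX logic as the query above, with NULLIF so missing elements come back as NULL. The name dbo.SplitThree is hypothetical; CREATE OR ALTER needs SQL Server 2016 SP1 or later, and GO is the SSMS/sqlcmd batch separator:

```sql
-- dbo.SplitThree is a hypothetical name; returns the first three ';'-separated elements
CREATE OR ALTER FUNCTION dbo.SplitThree (@value varchar(255))
RETURNS TABLE
AS
RETURN
    SELECT Value01 = NULLIF(SUBSTRING(@value, 1, p1.pos - 2), ''),
           Value02 = NULLIF(SUBSTRING(@value, p1.pos, p2.pos - p1.pos - 1), ''),
           Value03 = NULLIF(SUBSTRING(@value, p2.pos, p3.pos - p2.pos - 1), '')
    FROM (VALUES (CONCAT(@value, ';;;'))) AS v(padded)                      -- guarantee three delimiters
    CROSS APPLY (VALUES (CHARINDEX(';', v.padded, 1) + 1)) AS p1(pos)       -- start of element 2
    CROSS APPLY (VALUES (CHARINDEX(';', v.padded, p1.pos) + 1)) AS p2(pos)  -- start of element 3
    CROSS APPLY (VALUES (CHARINDEX(';', v.padded, p2.pos) + 1)) AS p3(pos); -- just past element 3
GO

DECLARE @testData TABLE (Value01 varchar(255));
INSERT INTO @testData (Value01) VALUES ('abc;defg;hi'), ('abcd;efg;');

SELECT td.Value01 AS Original, s.Value01, s.Value02, s.Value03
FROM @testData AS td
CROSS APPLY dbo.SplitThree(td.Value01) AS s;
```

Keeping it an inline (single RETURN SELECT) function rather than a multi-statement one lets the optimizer expand it into the calling query.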

How to pull out information from a long string of data

I have this data point:
455-U-202007302233,455-L-202007302233,422-U-202008011052,422-L-202008011052,857-U-202008041142,857-L-202008061215
Column: ,[t810str]
How would I be able to modify column [t810str] in order to pull out the last comma set before 857?
Desired Result = 422-L-202008011052
First you need to implement some kind of splitter that respects ordinal position (STRING_SPLIT does not, unless you are on SQL Server 2022 or later and use its enable_ordinal argument). I'm therefore going to make use of DelimitedSplit8k_LEAD, a community-written function you need to create first. Then you can split the value and use LAG to get the prior value. Finally, you can filter on where the item has a value LIKE '857%' but the previous one does not:
WITH CTE AS(
SELECT DS.Item,
LAG(DS.Item) OVER (PARTITION BY YourColumn ORDER BY DS.itemNumber) AS PrevItem
FROM (VALUES('455-U-202007302233,455-L-202007302233,422-U-202008011052,422-L-202008011052,857-U-202008041142,857-L-202008061215'))V(YourColumn)
CROSS APPLY dbo.DelimitedSplit8K_LEAD(V.YourColumn,',') DS)
SELECT C.PrevItem
FROM CTE C
WHERE C.Item LIKE '857%'
AND C.PrevItem NOT LIKE '857%';
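On SQL Server 2022 or later (or Azure SQL Database), STRING_SPLIT's enable_ordinal argument supplies the position, so the same LAG approach works without installing DelimitedSplit8K_LEAD. A sketch under that version assumption; @t810 is an illustrative table holding the question's value in a t810str column:

```sql
DECLARE @t810 TABLE (t810str varchar(8000));
INSERT INTO @t810 (t810str)
VALUES ('455-U-202007302233,455-L-202007302233,422-U-202008011052,422-L-202008011052,857-U-202008041142,857-L-202008061215');

DECLARE @Result TABLE (t810str varchar(8000), PrevItem varchar(8000));

WITH Items AS (
    SELECT t.t810str,
           s.value AS Item,
           -- ordinal keeps the original left-to-right order of the items
           LAG(s.value) OVER (PARTITION BY t.t810str ORDER BY s.ordinal) AS PrevItem
    FROM @t810 AS t
    CROSS APPLY STRING_SPLIT(t.t810str, ',', 1) AS s
)
INSERT INTO @Result (t810str, PrevItem)
SELECT t810str, PrevItem
FROM Items
WHERE Item LIKE '857%'
  AND PrevItem NOT LIKE '857%';

SELECT PrevItem FROM @Result;
```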
Based on your data and the assumption that items are 18 characters (your data do not indicate otherwise):
DECLARE @t AS NVARCHAR(255) = '455-U-202007302233,455-L-202007302233,422-U-202008011052,422-L-202008011052,857-U-202008041142,857-L-202008061215';
SELECT RIGHT(LEFT(@t,CHARINDEX(',857',@t)-1),18)
Using CROSS APPLY (which you could also rewrite as a CTE or a subquery for readability): this removes everything from the first occurrence of ,857 onward and then grabs the last item that's left. So even if you have multiple 857 items and delimited values of varying length, this should work:
select *, right(remind , charindex (',' ,reverse(remind))-1)
from t t1
cross apply (select stuff(col, charindex(',857',col), len(col),'') as remind) t2
Another solution uses a recursive CTE:
DECLARE @Var VARCHAR(200) = '455-U-202007302233,455-L-202007302233,422-U-202008011052,422-L-202008011052,857-U-202008041142,857-L-202008061215';
WITH CTE AS
(
SELECT 0 N, LEFT(@Var, CHARINDEX(',', @Var)-1) Part,
RIGHT(@Var, LEN(@Var) - CHARINDEX(',', @Var)) Remind
UNION ALL
SELECT N + 1,
LEFT(Remind, CHARINDEX(',', Remind) - 1),
RIGHT(Remind, LEN(Remind) - CHARINDEX(',', Remind))
FROM CTE
WHERE CHARINDEX(',', Remind) <> 0
)
SELECT TOP 1 Part
FROM CTE
WHERE LEFT(Remind, 3) = '857'
ORDER BY N;
Implemented with plain string functions (and allowing your data items to have variable length), it might look a bit confusing, which is why I'd prefer @Larnu's answer:
DECLARE @string VARCHAR(2000) = '455-U-202007302233,455-L-202007302233,422-U-202008011052,422-L-202008011052,857-U-202008041142,857-L-202008061215'
SELECT SUBSTRING(@string, CHARINDEX(',857',@string) - CHARINDEX(',', REVERSE( LEFT(@string, PATINDEX('%,857%',@string) - 1)) ) + 1, CHARINDEX(',', REVERSE( LEFT(@string, PATINDEX('%,857%',@string) - 1)))-1 )
The parts of the latter, separated:
DECLARE @string VARCHAR(2000) = '455-U-202007302233,455-L-202007302233,422-U-202008011052,422-L-202008011052,857-U-202008041142,857-L-202008061215'
SELECT CHARINDEX(',857',@string)
SELECT LEFT(@string, PATINDEX('%,857%',@string) - 1)
SELECT REVERSE( LEFT(@string, PATINDEX('%,857%',@string) - 1) )
SELECT CHARINDEX(',', REVERSE( LEFT(@string, PATINDEX('%,857%',@string) - 1)) )

SQL Get string between second and third underscore

I need to extract a certain string from a column in a table as part of an SSIS package.
The contents of the column is formatted like this "TST_AB1_ABC123456_TEST".
I need to get the string between the second and 3rd "_", e.g. "ABC123456" without changing too much of the package so would rather do it in 1 SQL command if possible.
I've tried a few different methods using SUBSTRING, REVERSE and CHARINDEX but can't figure out how to get just that string.
Using the base string functions:
SELECT
SUBSTRING(col,
CHARINDEX('_', col, CHARINDEX('_', col) + 1) + 1,
CHARINDEX('_', col, CHARINDEX('_', col, CHARINDEX('_', col) + 1) + 1) -
CHARINDEX('_', col, CHARINDEX('_', col) + 1) - 1)
FROM yourTable;
In notes format, the above call to SUBSTRING is saying:
SELECT
SUBSTRING(<your column>,
<starting at one past the second underscore>,
<for a length of the number of characters in between the 2nd and 3rd
underscore>)
FROM yourTable;
On other databases there are functions which can handle the above more gracefully, such as MySQL's SUBSTRING_INDEX, Postgres's SPLIT_PART and Oracle's REGEXP_SUBSTR. More recent versions of SQL Server (2016+) do have a STRING_SPLIT function, which could be used here, but it does not guarantee the order of the resulting parts (SQL Server 2022 added an optional enable_ordinal argument that returns each part's position).
If your column values always have exactly 4 parts (and contain no periods), you can use the PARSENAME() function like this:
DECLARE @MyString VARCHAR(100)
SET @MyString = 'TST_AB1_ABC123456_TEST';
SELECT PARSENAME(REPLACE(@MyString, '_', '.'), 2)
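PARSENAME is built for object names of at most four parts, so once the replaced string has five or more parts it returns NULL rather than the segment you want. A quick check of that limit, using an illustrative table variable:

```sql
DECLARE @Checks TABLE (MyString varchar(100), Part2 nvarchar(128));

INSERT INTO @Checks (MyString, Part2)
SELECT v.MyString, PARSENAME(REPLACE(v.MyString, '_', '.'), 2)
FROM (VALUES ('TST_AB1_ABC123456_TEST'),   -- four parts: returns ABC123456
             ('TST_AB1_ABC123456_TEST_X')  -- five parts: returns NULL
     ) AS v(MyString);

SELECT MyString, Part2 FROM @Checks;
```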
You could also do this using CROSS APPLY. I added a WHERE clause to filter out strings without 3 underscores, which would otherwise cause an error. In LIKE, _ is a single-character wildcard, so it has to be written as [_] to match a literal underscore, and the WHERE clause has to come after the APPLYs:
with your_table as (select 'TST_AB1_ABC123456_TEST' as txt1)
select txt1, txt2
from your_table t1
cross apply (select charindex( '_', txt1) as i1) t2 -- locate the 1st underscore
cross apply (select charindex( '_', txt1, (i1 + 1)) as i2 ) t3 -- then the 2nd
cross apply (select charindex( '_', txt1, (i2 + 1)) as i3 ) t4 -- then the 3rd
cross apply (select substring( txt1,(i2+1), (i3-i2-1)) as txt2) t5 -- between 2nd & 3rd
where txt1 like '%[_]%[_]%[_]%' -- at least 3 literal underscores
Outputs
+------------------------+-----------+
| txt1 | txt2 |
+------------------------+-----------+
| TST_AB1_ABC123456_TEST | ABC123456 |
+------------------------+-----------+

Select text from between two characters in string

I have data in a database; an example of the data is below:
folder/subfolder/file/doc
folder/subfolder/doc
How do I get the first instance of characters from between the '/' characters?
I want to extract 'folder/subfolder'.
I have tried the following, but it's not what I need; this gets 'folder/':
LEFT([Cat], CHARINDEX('/', [Cat]) ) as 'doc_cat',
and the below gets the last part
RIGHT([Cat], CHARINDEX('/', [Cat]) ) as 'doc_cat2',
I want to get the first and second parts of the string.
Here is one method:
select left(doc_cat_1, charindex('/', doc_cat_1) - 1)
from t cross apply
(select stuff(cat, 1, charindex('/', cat), '') as doc_cat_1
) v1;
The string handling capabilities of SQL Server are pretty lousy. Apply at least makes it easier to handle intermediate results.
You can use LEFT and CHARINDEX:
LEFT([Cat],charindex('/',[Cat],charindex('/',[Cat])+1)-1) AS 'doc_cat'
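As written, that expression raises an error for a value with only one '/', because the inner search returns 0 and LEFT receives a length of -1. Appending a '/' before searching guarantees a second '/' exists, so such values come back whole instead. A sketch with a hypothetical @Docs table that includes a single-'/' row:

```sql
DECLARE @Docs TABLE (Cat varchar(100));
INSERT INTO @Docs (Cat)
VALUES ('folder/subfolder/file/doc'), ('folder/subfolder/doc'), ('folder/subfolder');

DECLARE @Result TABLE (Cat varchar(100), doc_cat varchar(100));

INSERT INTO @Result (Cat, doc_cat)
SELECT Cat,
       -- Searching Cat + '/' means the second '/' is always found,
       -- so LEFT never gets a negative length
       LEFT(Cat, CHARINDEX('/', Cat + '/', CHARINDEX('/', Cat) + 1) - 1)
FROM @Docs;

SELECT Cat, doc_cat FROM @Result;
```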
One more way to accomplish this, using XML:
declare @s table(patterns nvarchar(100))
insert into @s
values ('folder/subfolder/file/doc'), ('folder/subfolder/doc'),('folder/subfolder')
select cast(concat('<x>', REPLACE(patterns, '/', '</x><x>'), '</x>') as xml).value('/x[1]','varchar(100)') + '/'
+ cast(concat('<x>', REPLACE(patterns, '/', '</x><x>'), '</x>') as xml).value('/x[2]','varchar(100)')
from @s
If you're on SQL Server 2016 or newer, you could use STRING_SPLIT():
WITH cte AS (
SELECT cat, value, ROW_NUMBER() OVER (PARTITION BY cat ORDER BY cat) rn
FROM someTable CROSS APPLY
STRING_SPLIT(cat,'/')
)
SELECT cat, value FROM cte WHERE rn = 2;
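The advantage here is that rn could be any number you need. Be aware, though, that STRING_SPLIT does not guarantee the order of its output, and ORDER BY cat is the same for every row in a partition, so rn is not guaranteed to match the position in the path.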
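Because STRING_SPLIT's output order is not guaranteed, numbering the pieces with ROW_NUMBER can pick the wrong one. On SQL Server 2022 or later (or Azure SQL Database), the enable_ordinal argument returns each piece's real position, which can be filtered directly. A sketch under that version assumption, with a hypothetical @someTable holding the sample paths:

```sql
-- Requires SQL Server 2022 or Azure SQL Database (enable_ordinal argument)
DECLARE @someTable TABLE (cat varchar(100));
INSERT INTO @someTable (cat)
VALUES ('folder/subfolder/file/doc'), ('folder/subfolder/doc');

DECLARE @Result TABLE (cat varchar(100), value varchar(100));

INSERT INTO @Result (cat, value)
SELECT t.cat, s.value
FROM @someTable AS t
CROSS APPLY STRING_SPLIT(t.cat, '/', 1) AS s  -- 1 = enable_ordinal
WHERE s.ordinal = 2;                           -- any position you need

SELECT cat, value FROM @Result;
```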