How to solve this string algorithm? - sql

So here is the challenge:
I need to update the strings in a specific column of my table, following the pattern shown in the examples below:
Example 1:
From: texttext/21/812/21a
To: texttext-21a81221
Example 2:
From: texttext/6/163/38a
To: texttext-38a1636
"texttext" length may vary and may contain slashes (/) as well.
Also, block 2 and block 3 of the numbers can't have 2 digits.
So,
texttext/53/7a/a2
must turn into:
texttext-0a207a53
I'm using SQL Server 2008.
I appreciate your efforts to help me.
Thanks!

I'm not sure what you mean by "block 2 and block 3 of numbers can't have 2 digits".
Anyway, the query below should give you the idea:
WITH data AS
(
select Value, REVERSE(Value) AS ReverseValue from (values
('texttext/21/812/21a'), ('texttext/6/163/38a'), ('texttext/53/7a/a2'), ('text/t/e/xt/53/7a/a2')
)t(Value)
), split AS
(
select
Value, ReverseValue,
reverse(substring(ReverseValue, 1, P1.Pos - 1)) AS Forth,
reverse(substring(ReverseValue, P1.Pos + 1, P2.Pos - P1.Pos - 1)) AS Third,
reverse(substring(ReverseValue, P2.Pos + 1, P3.Pos - P2.Pos - 1)) AS Second,
reverse(substring(ReverseValue, p3.Pos + 1, len(ReverseValue))) AS First
from data
cross apply (select (charindex('/', ReverseValue))) as P1(Pos)
cross apply (select (charindex('/', ReverseValue, P1.Pos+1))) as P2(Pos)
cross apply (select (charindex('/', ReverseValue, P2.Pos+1))) as P3(Pos)
)
select Value, First + '-' + Forth + Third + Second AS NewValue from split
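If the "can't have 2 digits" rule means that the last two blocks are left-padded with zeros to three characters, the final select can be adapted along these lines. That reading is only an assumption inferred from the third example in the question ('texttext/53/7a/a2' becoming 'texttext-0a207a53'), and the column names Prefix, Block2, Block3 and Block4 are made up for this sketch:

```sql
-- Sketch only: assumes the last two blocks must be at least 3 characters wide,
-- which is inferred from 'texttext/53/7a/a2' -> 'texttext-0a207a53'.
WITH data AS
(
    SELECT Value, REVERSE(Value) AS ReverseValue
    FROM (VALUES ('texttext/21/812/21a'), ('texttext/6/163/38a'), ('texttext/53/7a/a2')) t(Value)
), split AS
(
    SELECT Value,
           REVERSE(SUBSTRING(ReverseValue, 1, P1.Pos - 1))                   AS Block4,
           REVERSE(SUBSTRING(ReverseValue, P1.Pos + 1, P2.Pos - P1.Pos - 1)) AS Block3,
           REVERSE(SUBSTRING(ReverseValue, P2.Pos + 1, P3.Pos - P2.Pos - 1)) AS Block2,
           REVERSE(SUBSTRING(ReverseValue, P3.Pos + 1, LEN(ReverseValue)))   AS Prefix
    FROM data
    CROSS APPLY (SELECT CHARINDEX('/', ReverseValue)) AS P1(Pos)
    CROSS APPLY (SELECT CHARINDEX('/', ReverseValue, P1.Pos + 1)) AS P2(Pos)
    CROSS APPLY (SELECT CHARINDEX('/', ReverseValue, P2.Pos + 1)) AS P3(Pos)
)
SELECT Value,
       Prefix + '-'
       -- pad only when shorter than 3 characters, so longer blocks are not truncated
       + CASE WHEN LEN(Block4) < 3 THEN RIGHT('000' + Block4, 3) ELSE Block4 END
       + CASE WHEN LEN(Block3) < 3 THEN RIGHT('000' + Block3, 3) ELSE Block3 END
       + Block2 AS NewValue
FROM split;
```

With this padding, the third row comes out as texttext-0a207a53, matching the question, while the first two rows are unchanged from the unpadded query.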

Related

How do I combine a substring and trim right in SQL

I am trying to extract the data between two underscore characters. In some situations, the 2nd underscore may not exist.
MyFld
P_36840
U_216137
C_203134_H
C_203134_W
I tried this:
substring(i.[MyFld],
CHARINDEX ('_',i.[MyFld])+1,len(i.[MyFld])
-CHARINDEX ('_',i.[MyFld])
) [DerivedPrimaryKey]
And I get this:
DerivedPrimaryKey
36840
216137
203134_H
203134_W
https://dbfiddle.uk/uPKC6oX4
I want to remove the second underscore and data that follows it. I'm trying to combine it with a trim right, but I'm unsure where to start.
How can I do this?
We can start by simplifying what you have so far. I will also add enough to make this a complete query, so we can see it in context for later steps:
SELECT
right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)) [DerivedPrimaryKey]
FROM I
With this much done, we can now use it as the source for removing the trailing portion of the field:
SELECT
reverse(substring(reverse(step1)
, charindex('_', reverse(step1))+1
, len(step1)
)) [DerivedPrimaryKey]
FROM (
SELECT right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)) [step1]
FROM I
) T
Notice the layer of nesting. You can, of course, remove the nesting, but it means replicating the entire inner expression every time you see step1 (good thing I took the time to simplify it):
SELECT
reverse(substring(reverse(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)))
, charindex('_', reverse(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld))))+1
, len(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)))
))
FROM I
And now back to just the expression:
reverse(substring(reverse(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)))
, charindex('_', reverse(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld))))+1
, len(right(i.MyFld, len(i.MyFld) - charindex('_', i.MyFld)))
))
See it work here:
https://dbfiddle.uk/nFO4Vwhm
There is also this alternate expression that saves one function call:
left( right(i.MyFld,len(i.MyFld)-charindex('_',i.MyFld)),
coalesce(
nullif(
charindex('_',
right(i.MyFld,len(i.MyFld)-charindex('_',i.MyFld))
) -1, -1
),
len( right(i.MyFld,len(i.MyFld)-charindex('_',i.MyFld)) )
)
)
Just two more options: one uses parsename(), provided your data does not have more than 4 segments; the second uses a JSON array.
Example
Declare @YourTable Table ([MyFld] varchar(50))
Insert Into @YourTable Values
('P_36840')
,('U_216137')
,('C_203134_H')
,('C_203134_W')
Select *
,UsingParseName = reverse(parsename(reverse(replace(MyFld,'_','.')),2))
,UsingJSONValue = json_value('["'+replace(MyFld,'_','","')+'"]','$[1]')
From @YourTable
Results
MyFld       UsingParseName  UsingJSONValue
P_36840     36840           36840
U_216137    216137          216137
C_203134_H  203134          203134
C_203134_W  203134          203134
We can do this:
Declare @testData Table ([MyFld] varchar(50));
Insert Into @testData (MyFld)
Values ('P_36840')
, ('U_216137')
, ('C_203134_H')
, ('C_203134_W');
Select *
, second_element = substring(v.MyFld, p1.pos, p2.pos - p1.pos - 1)
From @testData As td
Cross Apply (Values (concat(td.MyFld, '__'))) As v(MyFld) -- Make sure we have at least 2 delimiters
Cross Apply (Values (charindex('_', v.MyFld, 1) + 1)) As p1(pos) -- First Position
Cross Apply (Values (charindex('_', v.MyFld, p1.pos) + 1)) As p2(pos) -- Second Position
If you actually have a fixed number of characters in the first element, then it could be simplified to:
Select *
, second_element = substring(v.MyFld, 3, charindex('_', v.MyFld, 4) - 3)
From @testData td
Cross Apply (Values (concat(td.MyFld, '_'))) As v(MyFld)
Often I try to fake out SQL if an expected character isn't always present and I don't need the resulting value:
SELECT SUBSTRING(field_Calculated, 1, CHARINDEX('_', field_Calculated) - 1)
FROM (SELECT SUBSTRING(MyFld, CHARINDEX('_', MyFld) + 1, LEN(MyFld)) + '_' As field_Calculated
FROM MyTable) T
I think this is clear, but I really like the ParseName solution @JohnCappalletti suggests.
If it's only ever one numeric value you can use string_split:
SELECT * FROM MyTable
CROSS APPLY string_split(MyFld, '_')
WHERE ISNUMERIC(value) = 1
Either way you have to be careful of the data before deciding the best approach.
Your data:
Declare @Table Table ([MyFld] varchar(100))
Insert Into @Table
([MyFld] ) Values
('P_36840')
,('U_216137')
,('C_203134_H')
,('C_203134_W')
Use SubString, Left and PatIndex:
select
Left(
SubString(
[MyFld],
PatIndex('%[0-9.-]%', [MyFld]),
8000
),
PatIndex(
'%[^0-9.-]%',
SubString(
[MyFld],
PatIndex('%[0-9.-]%', [MyFld]),
8000
) + 'X'
)-1
) as DerivedPrimaryKey
from
@Table

SQL Server - Extract text from a given string field

What is the best method to extract the text from the string field below?
I am trying to extract the ProjectID numbers (91, 108, 250) below but am struggling because the ProjectIDs are either 2 or 3 digits long and appear in different parts of the string.
Row Parameter
1 ProjectID=91&GroupID=250&ParentID=1
2 ProjectID=108&GroupID=250&ParentID=35
3 GroupID=1080&ProjectID=250&ParentID=43
4 ProjectID=250
Any help would be much appreciated.
SQL Server is kind of lousy on string functionality. Here is one method:
select left(v1.p, patindex('%[^0-9]%', v1.p + ' ') - 1)
from (values ('ProjectID=91&GroupID=250&ParentID=1'),
('ProjectID=108&GroupID=250&ParentID=35'),
('GroupID=1080&ProjectID=250&ParentID=43'),
('ProjectID=250')
) v(parameter) cross apply
(values (stuff(v.parameter, 1, charindex('ProjectID=', v.parameter) + 9, ''))
) v1(p);
Or split the string and look for a match:
select stuff(s.value, 1, 10, '')
from (values ('ProjectID=91&GroupID=250&ParentID=1'),
('ProjectID=108&GroupID=250&ParentID=35'),
('GroupID=1080&ProjectID=250&ParentID=43'),
('ProjectID=250')
) t(parameter) cross apply
string_split(t.parameter, '&') s
where s.value like 'ProjectId=%';
Here is a db<>fiddle.
SELECT
substring ( @Parameter,
CHARINDEX('ProjectID', @Parameter) + 10,
-- append '&' so a ProjectID at the end of the string still finds a terminator
CHARINDEX('&', @Parameter + '&', CHARINDEX('ProjectID', @Parameter)) -
(CHARINDEX('ProjectID', @Parameter) + 10))
from table

SQL Get string between second and third underscore

I need to extract a certain string from a column in a table as part of an SSIS package.
The contents of the column is formatted like this "TST_AB1_ABC123456_TEST".
I need to get the string between the second and 3rd "_", e.g. "ABC123456" without changing too much of the package so would rather do it in 1 SQL command if possible.
I've tried a few different methods using SUBSTRING, REVERSE and CHARINDEX but can't figure out how to get just that string.
Using the base string functions:
SELECT
SUBSTRING(col,
CHARINDEX('_', col, CHARINDEX('_', col) + 1) + 1,
CHARINDEX('_', col, CHARINDEX('_', col, CHARINDEX('_', col) + 1) + 1) -
CHARINDEX('_', col, CHARINDEX('_', col) + 1) - 1)
FROM yourTable;
In notes format, the above call to SUBSTRING is saying:
SELECT
SUBSTRING(<your column>,
<starting at one past the second underscore>,
<for a length of the number of characters in between the 2nd and 3rd
underscore>)
FROM yourTable;
On other databases, such as Postgres and Oracle, there are substring index and regex functions which can handle the above more gracefully. Actually, more recent versions of SQL Server have a STRING_SPLIT function, which could be used here, but it does not maintain the order of the resulting parts.
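One way to split while keeping the position of each part is to turn the string into a JSON array and read it with OPENJSON, which exposes the array index as [key]. This is only a sketch: it assumes SQL Server 2016 or later (OPENJSON and STRING_ESCAPE are not available before that), and the variable name and sample value are just for illustration:

```sql
-- Sketch only: requires SQL Server 2016+ (database compatibility level 130 or higher).
DECLARE @MyString varchar(100) = 'TST_AB1_ABC123456_TEST';

SELECT j.[value] AS third_part
FROM OPENJSON(
         -- build '["TST","AB1","ABC123456","TEST"]'; STRING_ESCAPE guards against
         -- quotes or backslashes in the data breaking the JSON
         '["' + REPLACE(STRING_ESCAPE(@MyString, 'json'), '_', '","') + '"]'
     ) AS j
WHERE j.[key] = '2';  -- array keys are zero-based, so '2' is the third part
```

For the sample value this returns ABC123456. Strings with fewer than three parts simply return no row rather than raising an error.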
If your column values always have 4 parts you can use the PARSENAME() function like this.
DECLARE @MyString VARCHAR(100)
SET @MyString = 'TST_AB1_ABC123456_TEST';
SELECT PARSENAME(REPLACE(@MyString, '_', '.'), 2)
You could also do this using Cross Apply. I added in a where clause to make sure you don't get an error resulting from strings without 3 underscores
with your_table as (select 'TST_AB1_ABC123456_TEST' as txt1)
select txt1, txt2
from your_table t1
cross apply (select charindex( '_', txt1) as i1) t2 -- locate the 1st underscore
cross apply (select charindex( '_', txt1, (i1 + 1)) as i2 ) t3 -- then the 2nd
cross apply (select charindex( '_', txt1, (i2 + 1)) as i3 ) t4 -- then the 3rd
cross apply (select substring( txt1,(i2+1), (i3-i2-1)) as txt2) t5 -- between 2nd & 3rd
where txt1 like '%[_]%[_]%[_]%' -- [_] matches a literal underscore; a bare _ is a single-character wildcard
Outputs
+------------------------+-----------+
| txt1 | txt2 |
+------------------------+-----------+
| TST_AB1_ABC123456_TEST | ABC123456 |
+------------------------+-----------+

How to work with reverse and split in a select in SQL Server 2008?

I found the query below very useful for an application I'm working on. However, I want to replace the hard-coded values with a select from a table.
WITH data AS
(
select Value, REVERSE(Value) AS ReverseValue from (values
('texttext/21/812/21a'), ('texttext/6/163/38a'), ('texttext/53/7a/a2'), ('text/t/e/xt/53/7a/a2')
)t(Value)
), split AS
(
select
Value, ReverseValue,
reverse(substring(ReverseValue, 1, P1.Pos - 1)) AS Forth,
reverse(substring(ReverseValue, P1.Pos + 1, P2.Pos - P1.Pos - 1)) AS Third,
reverse(substring(ReverseValue, P2.Pos + 1, P3.Pos - P2.Pos - 1)) AS Second,
reverse(substring(ReverseValue, p3.Pos + 1, len(ReverseValue))) AS First
from data
cross apply (select (charindex('/', ReverseValue))) as P1(Pos)
cross apply (select (charindex('/', ReverseValue, P1.Pos+1))) as P2(Pos)
cross apply (select (charindex('/', ReverseValue, P2.Pos+1))) as P3(Pos)
)
select Value, First + '-' + Forth + Third + Second AS NewValue from split
So instead of (values ('texttext/21/812/21a') (...) I want something like (select myfield from mytable). Any ideas on how to do that? Thanks.
Just replace the values in the first CTE:
WITH data AS (
select myfield AS Value, REVERSE(myfield) AS ReverseValue -- alias as Value so the later CTE and final select still resolve
from mytable
),
. . .
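If the goal is to write the result back to the table rather than just select it, the same position logic can be moved into an UPDATE with CROSS APPLY. This is only a sketch: the table variable @mytable and its sample rows are hypothetical stand-ins for the real table, and it assumes every row contains at least three slashes (otherwise SUBSTRING receives a negative length and raises an error).

```sql
-- Sketch only: @mytable and myfield are hypothetical stand-ins for the real table and column.
DECLARE @mytable TABLE (myfield varchar(100));
INSERT INTO @mytable (myfield)
VALUES ('texttext/21/812/21a'), ('texttext/6/163/38a'), ('text/t/e/xt/53/7a/a2');

UPDATE t
SET myfield =
      REVERSE(SUBSTRING(r.rv, p3.pos + 1, LEN(r.rv)))             -- leading text
    + '-'
    + REVERSE(SUBSTRING(r.rv, 1, p1.pos - 1))                     -- fourth block
    + REVERSE(SUBSTRING(r.rv, p1.pos + 1, p2.pos - p1.pos - 1))   -- third block
    + REVERSE(SUBSTRING(r.rv, p2.pos + 1, p3.pos - p2.pos - 1))   -- second block
FROM @mytable AS t
CROSS APPLY (SELECT REVERSE(t.myfield) AS rv) AS r                -- work from the right-hand end
CROSS APPLY (SELECT CHARINDEX('/', r.rv) AS pos) AS p1            -- last slash in the original
CROSS APPLY (SELECT CHARINDEX('/', r.rv, p1.pos + 1) AS pos) AS p2
CROSS APPLY (SELECT CHARINDEX('/', r.rv, p2.pos + 1) AS pos) AS p3;

SELECT myfield FROM @mytable;  -- inspect the rewritten values
```

Running it against a copy of the data first is a sensible precaution, since the update overwrites the original strings.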

Splitting a column into multiple columns in SQL Server

I am working with a column called FullName which stores a lengthy value.
The format is something like
'Microsoft.SQL.Server.20XX.DBFile:ABC.edf.com;XXXX_XXX_XXX;master;1;1'
'SQLVersion.DBFile:Hostname;InstanceName;Dummy;Dummy;Dummy'
What I want is to split FullName into SQLVersion, Hostname and InstanceName.
I have searched several threads about splitting values that are separated by a dot or a colon, but those are slightly different from my case.
You can use the following trick with cross apply. I think the code is self explanatory and doesn't need elaboration:
create table t(v varchar(200))
insert into t values
('Microsoft.SQL.Server.20XX.DBFile:ABC.edf.com;XXXX_XXX_XXX;master;1;1'),
('SQLVersion.DBFile:Hostname;InstanceName;Dummy;Dummy;Dummy')
select substring(v, 1, c1.i1 - 1) as SqlVersion,
substring(v, c1.i1 + 1, c2.i2 - c1.i1 - 1) as HostName,
substring(v, c2.i2 + 1, c3.i3 - c2.i2 - 1) as InstanceName
from t
cross apply(select charindex(':', t.v) as i1 ) c1
cross apply(select charindex(';', t.v, c1.i1 + 1) as i2) c2
cross apply(select charindex(';', t.v, c2.i2 + 1) as i3) c3
In case it is not clear: in the first cross apply I select the index of the : symbol. In the second cross apply I select the index of the first ; symbol after the index from the first cross apply. In the third, the index of the next ; symbol after the index from the second cross apply. In the main select I just use those indexes to grab the needed portions of the string.
Fiddle here http://sqlfiddle.com/#!3/d79b45/16
I agree with the comment above that separate columns would be preferred, if possible. However, I believe this parsing logic is doing what you want:
declare
@test nvarchar(200)
, @versionStart int
, @versionLength int
, @hostStart int
, @hostLength int
, @instanceStart int
, @instanceLength int
SET @test = 'Microsoft.SQL.Server.20XX.DBFile:ABC.edf.com;XXXX_XXX_XXX;master;1;1'
SET @versionStart = PATINDEX('%Microsoft.SQL.Server.%', @test) + 21
SET @versionLength = CHARINDEX('.', @test, @versionStart) - @versionStart
SET @hostStart = PATINDEX('%.DBFile:%', @test) + 8
SET @hostLength = CHARINDEX(';', @test, @hostStart) - @hostStart
SET @instanceStart = CHARINDEX(';', @test, @hostStart + @hostLength) + 1
SET @instanceLength = CHARINDEX(';', @test, @instanceStart) - @instanceStart
select
SUBSTRING(@test, @versionStart, @versionLength) AS Version -- use the computed length instead of a hard-coded 4
, SUBSTRING(@test, @hostStart, @hostLength) AS HostName
, SUBSTRING(@test, @instanceStart, @instanceLength) AS InstanceName