SQL String - remove a substring between 2 occurrences

I have some strings like :
0#11001#2017#308
0#1#2018#327
0#200510#2020#3022
I need to remove the year value (between last 2 '#').
So the results should be like :
0#11001#308
0#1#327
0#200510#3022
Is there a simple query to do it?

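UPDATE
    yourTable
SET
    -- remove everything from the 2nd separator up to (not including) the 3rd
    col = STUFF(yourTable.col, atSign2.pos, atSign3.pos - atSign2.pos, '')
FROM
    yourTable
OUTER APPLY
    (SELECT CHARINDEX('#', yourTable.col, 0)) AS atSign1(pos)
OUTER APPLY
    (SELECT CHARINDEX('#', yourTable.col, atSign1.pos+1)) AS atSign2(pos)
OUTER APPLY
    (SELECT CHARINDEX('#', yourTable.col, atSign2.pos+1)) AS atSign3(pos)
WHERE
    -- skip rows with fewer than three separators, where STUFF would return NULL
    atSign2.pos > 0 AND atSign3.pos > 0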

The best thing to do here would be to normalize your data, and stop storing multiple data points embedded in a single string, in a single column. That being said, if you must proceed, we can try using the base string functions to splice together the update:
UPDATE yourTable
SET col = LEFT(col, CHARINDEX('#', col, CHARINDEX('#', col) + 1)) +
REVERSE(LEFT(REVERSE(col), CHARINDEX('#', REVERSE(col)) - 1));
Note that this answer assumes that every record would have three # separators in it. If not, then we would have to add additional logic.
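One way to add that additional logic is to count the separators and leave any row that does not have exactly three untouched. The sketch below is only an illustration: the table variable @demo and its sample rows are made up, and the CASE guard is one of several possible ways to protect the LEFT/REVERSE arithmetic from rows where CHARINDEX returns 0.

```sql
-- Hypothetical sample data; @demo is not part of the original question.
DECLARE @demo TABLE (col VARCHAR(50));
INSERT INTO @demo (col)
VALUES ('0#11001#2017#308'), ('0#1#2018#327'), ('no-separators'), ('1#2');

UPDATE @demo
SET col = CASE
              -- LEN(col) minus LEN(col without '#') = number of separators
              WHEN LEN(col) - LEN(REPLACE(col, '#', '')) = 3
              THEN LEFT(col, CHARINDEX('#', col, CHARINDEX('#', col) + 1)) +
                   REVERSE(LEFT(REVERSE(col), CHARINDEX('#', REVERSE(col)) - 1))
              ELSE col   -- anything else is left as it was
          END;

-- The two well-formed rows become 0#11001#308 and 0#1#327;
-- 'no-separators' and '1#2' are unchanged.
SELECT col FROM @demo;
```

Putting the check inside a CASE expression, rather than only in a WHERE clause, keeps the guard and the string arithmetic in the same expression.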

DECLARE @STRING VARCHAR(50) = '0#200510#2020#3022'
DECLARE @NEED_TO_REMOVE VARCHAR(50) = PARSENAME(REPLACE(@STRING,'#','.'),2) + '#'
SELECT REPLACE(@STRING,@NEED_TO_REMOVE,'') --result: 0#200510#3022

Related

How to pull out information from a long string of data

I have this data point:
455-U-202007302233,455-L-202007302233,422-U-202008011052,422-L-202008011052,857-U-202008041142,857-L-202008061215
Column: ,[t810str]
How would I be able to modify column [t810str] in order to pull out the last comma-separated item before the first 857 entry?
Desired Result = 422-L-202008011052
First you need to implement some kind of splitter that respects ordinal position (STRING_SPLIT does not). I'm therefore going to make use of DelimitedSplit8k_LEAD. Then you can split the value, and use LAG to get the prior value. Finally you can filter on where the item has a value LIKE '857%' but the previous does not:
WITH CTE AS(
SELECT DS.Item,
LAG(DS.Item) OVER (PARTITION BY YourColumn ORDER BY DS.itemNumber) AS PrevItem
FROM (VALUES('455-U-202007302233,455-L-202007302233,422-U-202008011052,422-L-202008011052,857-U-202008041142,857-L-202008061215'))V(YourColumn)
CROSS APPLY dbo.DelimitedSplit8K_LEAD(V.YourColumn,',') DS)
SELECT C.PrevItem
FROM CTE C
WHERE C.Item LIKE '857%'
AND C.PrevItem NOT LIKE '857%';
Based on your data and the assumption that items are 18 characters (your data do not indicate otherwise):
DECLARE @t AS NVARCHAR(255) = '455-U-202007302233,455-L-202007302233,422-U-202008011052,422-L-202008011052,857-U-202008041142,857-L-202008061215';
SELECT RIGHT(LEFT(@t,CHARINDEX(',857',@t)-1),18)
Using cross apply (which you can also rewrite using a CTE or a subquery for readability). This removes everything from the first occurrence of ,857 onward and then grabs the last item that's left. So even if you have multiple 857 entries and delimited strings of varying length, this should work:
select *, right(remind , charindex (',' ,reverse(remind))-1)
from t t1
cross apply (select stuff(col, charindex(',857',col), len(col),'') as remind) t2
Another solution uses a recursive CTE:
DECLARE @Var VARCHAR(200) = '455-U-202007302233,455-L-202007302233,422-U-202008011052,422-L-202008011052,857-U-202008041142,857-L-202008061215';
WITH CTE AS
(
SELECT 0 N, LEFT(@Var, CHARINDEX(',', @Var)-1) Part,
RIGHT(@Var, LEN(@Var) - CHARINDEX(',', @Var)) Remind
UNION ALL
SELECT N + 1,
LEFT(Remind, CHARINDEX(',', Remind) - 1),
RIGHT(Remind, LEN(Remind) - CHARINDEX(',', Remind))
FROM CTE
WHERE CHARINDEX(',', Remind) <> 0
)
SELECT TOP 1 Part
FROM CTE
WHERE LEFT(Remind, 3) = '857'
ORDER BY N;
Implemented with string functions only (and assuming your data items can have variable length), it might look a bit confusing, so I'd prefer @Larnu's answer:
DECLARE @string VARCHAR(2000) = '455-U-202007302233,455-L-202007302233,422-U-202008011052,422-L-202008011052,857-U-202008041142,857-L-202008061215'
SELECT SUBSTRING(@string, CHARINDEX(',857',@string) - CHARINDEX(',', REVERSE( LEFT(@string, PATINDEX('%,857%',@string) - 1)) ) + 1, CHARINDEX(',', REVERSE( LEFT(@string, PATINDEX('%,857%',@string) - 1)))-1 )
Parts of the latter separated:
DECLARE @string VARCHAR(2000) = '455-U-202007302233,455-L-202007302233,422-U-202008011052,422-L-202008011052,857-U-202008041142,857-L-202008061215'
SELECT CHARINDEX(',857',@string)
SELECT LEFT(@string, PATINDEX('%,857%',@string) - 1)
SELECT REVERSE( LEFT(@string, PATINDEX('%,857%',@string) - 1) )
SELECT CHARINDEX(',', REVERSE( LEFT(@string, PATINDEX('%,857%',@string) - 1)) )

SQL Get string between second and third underscore

I need to extract a certain string from a column in a table as part of an SSIS package.
The contents of the column are formatted like this: "TST_AB1_ABC123456_TEST".
I need to get the string between the second and 3rd "_", e.g. "ABC123456" without changing too much of the package so would rather do it in 1 SQL command if possible.
I've tried a few different methods using SUBSTRING, REVERSE and CHARINDEX but can't figure out how to get just that string.
Using the base string functions:
SELECT
SUBSTRING(col,
CHARINDEX('_', col, CHARINDEX('_', col) + 1) + 1,
CHARINDEX('_', col, CHARINDEX('_', col, CHARINDEX('_', col) + 1) + 1) -
CHARINDEX('_', col, CHARINDEX('_', col) + 1) - 1)
FROM yourTable;
In notes format, the above call to SUBSTRING is saying:
SELECT
SUBSTRING(<your column>,
<starting at one past the second underscore>,
<for a length of the number of characters in between the 2nd and 3rd
underscore>)
FROM yourTable;
On other databases there are functions which can handle the above more gracefully, for example SUBSTRING_INDEX in MySQL, split_part in Postgres, and REGEXP_SUBSTR in Oracle. More recent versions of SQL Server (2016 and later) have a STRING_SPLIT function, which could be used here, but by default it does not guarantee the order of the resulting parts; SQL Server 2022 added an optional enable_ordinal argument that returns each part's position.
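If you are on SQL Server 2022 or later (an assumption, since the question does not state a version), STRING_SPLIT accepts a third enable_ordinal argument, which sidesteps the ordering problem mentioned above. A minimal sketch, reusing the sample value from the question:

```sql
-- Assumes SQL Server 2022+ (or Azure SQL), where enable_ordinal is available.
DECLARE @MyString VARCHAR(100) = 'TST_AB1_ABC123456_TEST';

SELECT s.value
FROM STRING_SPLIT(@MyString, '_', 1) AS s   -- 1 = also return an ordinal column
WHERE s.ordinal = 3;                        -- 3rd part = text between the 2nd and 3rd underscore
-- returns ABC123456
```

On older versions the ordinal column does not exist, so the CHARINDEX-based query remains the safer choice there.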
If your column values always have 4 parts you can use the PARSENAME() function like this.
DECLARE @MyString VARCHAR(100)
SET @MyString = 'TST_AB1_ABC123456_TEST';
SELECT PARSENAME(REPLACE(@MyString, '_', '.'), 2)
You could also do this using CROSS APPLY. The WHERE clause is there to skip strings with fewer than 3 underscores, which would otherwise pass a negative length to SUBSTRING. It has to come after the APPLY clauses, and the underscores are written as [_] because a bare _ is a single-character wildcard in LIKE:
with your_table as (select 'TST_AB1_ABC123456_TEST' as txt1)
select txt1, txt2
from your_table t1
cross apply (select charindex( '_', txt1) as i1) t2 -- locate the 1st underscore
cross apply (select charindex( '_', txt1, (i1 + 1)) as i2 ) t3 -- then the 2nd
cross apply (select charindex( '_', txt1, (i2 + 1)) as i3 ) t4 -- then the 3rd
cross apply (select substring( txt1,(i2+1), (i3-i2-1)) as txt2) t5 -- between 2nd & 3rd
where txt1 like '%[_]%[_]%[_]%'
Outputs
+------------------------+-----------+
| txt1 | txt2 |
+------------------------+-----------+
| TST_AB1_ABC123456_TEST | ABC123456 |
+------------------------+-----------+

SQL Server remove part of string between characters

I have emails that look like this:
john.doe.946a9979-2951-4852-9e79-ad03eb0c1e5d@gmail.com
I am trying to get this output:
john.doe@gmail.com
I have this so far.... it's close.
SELECT
Caller = REPLACE(Caller,
SUBSTRING(Caller,
CHARINDEX('.', Caller),
CASE WHEN CHARINDEX('@', Caller, CHARINDEX('.', Caller)) > 1 THEN
CHARINDEX('@', Caller, CHARINDEX('.', Caller)) - CHARINDEX('.', Caller)
ELSE
LEN(Caller)
END ) , '')
FROM
some.table
Hmmm. I suspect the string you want to remove is fixed in length (a dot plus a 36-character GUID, so 37 characters). So how about:
select stuff(caller, charindex('@',caller ) - 37, 37, '')
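If not every address is guaranteed to carry the suffix, the fixed-length assumption can be checked before anything is removed. The sketch below is only one possible guard: the table variable @emails, the second sample address, and the pattern variables are made up for illustration, and the pattern assumes the suffix is a dot followed by a standard hyphenated GUID.

```sql
-- Hypothetical sample data for illustration.
DECLARE @emails TABLE (Caller VARCHAR(200));
INSERT INTO @emails (Caller)
VALUES ('john.doe.946a9979-2951-4852-9e79-ad03eb0c1e5d@gmail.com'),
       ('jane.roe@example.com');

-- LIKE pattern for '.' + 8-4-4-4-12 hexadecimal digits (37 characters in total).
DECLARE @hex VARCHAR(20) = '[0-9a-fA-F]';
DECLARE @guidPattern VARCHAR(500) =
    '.' + REPLICATE(@hex, 8) + '-' + REPLICATE(@hex, 4) + '-' +
    REPLICATE(@hex, 4) + '-' + REPLICATE(@hex, 4) + '-' + REPLICATE(@hex, 12);

SELECT Caller,
       CASE
           -- only strip when the 37 characters before '@' really look like '.<guid>'
           WHEN SUBSTRING(Caller, CHARINDEX('@', Caller) - 37, 37) LIKE @guidPattern
           THEN STUFF(Caller, CHARINDEX('@', Caller) - 37, 37, '')
           ELSE Caller
       END AS Cleaned
FROM @emails;
-- Cleaned: john.doe@gmail.com and jane.roe@example.com
```

Addresses without the suffix fall through to the ELSE branch and are returned unchanged.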
If you have SQL Server 2016 or later, with R Services installed and external scripts enabled, you can use R code to do it. I don't know how it performs on larger data, so be wary if it's a production environment. Also be warned that regex isn't my strongest area, so you may want to check that portion.
DECLARE @dummyScript NVARCHAR(1000) = '
SELECT * FROM
(VALUES (''john.doe.946a9979-2951-4852-9e79-ad03eb0c1e5d@gmail.com''),
(''jane whoever 1234453-534343@yahoo.com'') ) t (Email)
'
DECLARE @myRcode NVARCHAR(600)
SET @myRcode = 'OutputDataset <- data.frame(Email_Cleaned = gsub("[0-9]+.+@", "@", InputDataSet$Email)) '
DECLARE @CleanedTable TABLE (Email_Cleaned VARCHAR(500))
INSERT INTO @CleanedTable
EXEC sp_execute_external_script
@language = N'R'
,@script = @myRcode
,@input_data_1 = @dummyScript
,@input_data_1_name = N'InputDataSet'
,@output_data_1_name = N'OutputDataset'
SELECT * FROM @CleanedTable
Here is a simpler method using the LEFT(), RIGHT(), and CHARINDEX() functions:
DECLARE @Caller VARCHAR(MAX) = 'john.doe.946a9979-2951-4852-9e79-ad03eb0c1e5d@gmail.com'
SELECT
LEFT(@Caller, CHARINDEX('.', @Caller, CHARINDEX('.', @Caller) + 1) - 1) + RIGHT(@Caller, LEN(@Caller) - CHARINDEX('@', @Caller) + 1) Email
The left side gets all the characters before the second dot, and the right side gets the characters from the @ sign to the end of the string.
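Try something like the following. PATINDEX needs % wildcards around the pattern to find it inside the string, and the WHERE clause skips rows that have no ".digit" segment before the @:
SELECT
REPLACE(CALLER,
Substring(CALLER,
PATINDEX('%.[0-9]%@%', CALLER),
CHARINDEX('@', CALLER) - PATINDEX('%.[0-9]%@%', CALLER) + 1)
,'@')
From yourTable
WHERE CALLER LIKE '%.[0-9]%@%'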

How do I select a substring from two different patindex?

I have many different types of string, but they all follow the two same patterns:
ABC123-S-XYZ789
ABC123-P-XYZ789
QUESTION 1:
I know how I can extract the first part: ABC123
But how do I extract the second part??? XYZ789
QUESTION 2:
I can't tell beforehand if the string follows the -S- pattern or the -P- pattern, it can be different each time. Anyone who know how I can solve this?
Thanks! / Sophie
You can try the following code (@a stands for your column or variable):
SELECT CASE WHEN @a LIKE '%-S-%' THEN RIGHT(@a, LEN(@a) - CHARINDEX('-S-', @a) - 2)
WHEN @a LIKE '%-P-%' THEN RIGHT(@a, LEN(@a) - CHARINDEX('-P-', @a) - 2)
ELSE NULL END AS 'ColName'
FROM tablename
Is this what you need?
DECLARE @Input VARCHAR(100) = 'ABC123-S-XYZ789'
SELECT
FirstPart = SUBSTRING(
@Input,
1,
CHARINDEX('-', @Input) - 1),
SecondPart = SUBSTRING(
@Input,
LEN(@Input) - CHARINDEX('-', REVERSE(@Input)) + 2,
100),
Pattern = CASE
WHEN @Input LIKE '%-S-%' THEN 'S'
WHEN @Input LIKE '%-P-%' THEN 'P' END
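Both questions can also be answered from a single position lookup, since PATINDEX accepts a character class for the middle letter. This is only a sketch under the assumption that the separator is always a dash, one of the letters S or P, and another dash; the derived-table alias p and the column name pos are made up for illustration.

```sql
DECLARE @Input VARCHAR(100) = 'ABC123-P-XYZ789';

SELECT
    LEFT(@Input, p.pos - 1)                AS FirstPart,   -- text before '-P-' or '-S-'
    SUBSTRING(@Input, p.pos + 1, 1)        AS Pattern,     -- the single letter between the dashes
    SUBSTRING(@Input, p.pos + 3, LEN(@Input)) AS SecondPart -- everything after the separator
FROM (
    -- NULLIF turns "not found" (0) into NULL, so the expressions above
    -- return NULL instead of failing on a negative length
    SELECT NULLIF(PATINDEX('%-[SP]-%', @Input), 0) AS pos
) AS p;
-- FirstPart = ABC123, Pattern = P, SecondPart = XYZ789
```

Under a case-insensitive collation the class [SP] also matches lowercase s and p.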
You can use parsename() if the string has always this kind of parts such as ABC123-S-XYZ789
select col, parsename(replace(col, '-', '.'), 1)
However, parsename() only handles up to four dot-separated parts and returns NULL otherwise. If that does not fit your data, you can use reverse():
select col, reverse(left(reverse(col), charindex('-', reverse(col))-1))
If you're using SQL Server 2016 or newer, you can use STRING_SPLIT
CREATE TABLE #temp (string VARCHAR(100));
INSERT #temp VALUES ('ABC123-S-XYZ789'),('ABC123-P-XYZ789');
SELECT *, ROW_NUMBER() OVER (PARTITION BY string ORDER BY string)
FROM #temp t
CROSS APPLY STRING_SPLIT(t.string, '-');
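I can't tell beforehand if the string follows the -S- pattern or the -P- pattern
You can then use a CTE to get a specific part of the string. Be aware that STRING_SPLIT does not guarantee the order of its output rows, and ordering ROW_NUMBER() by the string itself does not fix that, so verify which part rn = 2 actually returns on your data: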
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY string ORDER BY string) rn
FROM #temp t
CROSS APPLY STRING_SPLIT(t.string, '-')
)
SELECT * FROM cte WHERE rn = 2

How to use substring in SQL Server

Suppose I have this query.
SELECT
proj.refno [Reference No.],
proj.projname [NNNN],
TotalCost= '$' + CONVERT(NVARCHAR(100),cast(ROUND((cast(ship.volfinish as int) * data.price)/1000,2) as decimal(5,2)))
FROM
projects proj
INNER JOIN
projdata data ON proj.controlno = data.controlno
INNER JOIN
shipment ship ON data.ctrlno = ship.dctrlno
WHERE
proj.refno IN ('item1', 'item2','item3')
ORDER BY
proj.refno
with this output:
Reference No. NNNN TotalCost
GR-NFS52 abc123 StudentsTitle123 (NNNN: xxxxxxxxxxxxx) $215.45
GR-PFS53 def456 StudentsTitle456 (NNNN: xxxxxxxxxxxxx) $259.55
GR-SSFS43 ghi789 StudentsTitle789 (NNNN: xxxxxxxxxxxxx) $242.35
How can I use the substring function on the NNNN column to get the output below? I'm not familiar with T-SQL.
NNNN
xxxxxxxxxxxxx
xxxxxxxxxxxxx
xxxxxxxxxxxxx
Assuming you have a pattern like NNNN: xxxxxxxxxxx) in your strings, you can extract this number with some simple string manipulation using charindex and substring:
declare @str nvarchar(max)
select @str = 'Students (NNNN: 9781410314291)'
select substring(@str,
charindex('NNNN:', @str) + 6,
charindex(')', @str, charindex('NNNN:', @str)) - charindex('NNNN:', @str) - 6)
Here we first find the position of the NNNN: substring, then the position of the first occurrence of the closing bracket ) after this substring, and take the part of the string between these positions, skipping the 6 characters of the "NNNN: " label. That is exactly the number you need.
In your particular case you can use outer apply in the select query to make it more readable, so that the charindex('NNNN:', proj.projname) expression is not copy-pasted several times:
select
proj.refno [Reference No.],
substring(proj.projname,
CALC.pos_from + 6,
charindex(')', proj.projname, CALC.pos_from) - CALC.pos_from - 6) as [NNNN],
....
FROM projects proj
.....
outer apply (select charindex('NNNN:', proj.projname) as pos_from) as CALC
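To try the idea without the real tables, here is a self-contained sketch. The table variable @projects, its sample rows, and the second apply alias CALC2 are made up for illustration; the start position is pos_from + 6 to skip the "NNNN: " label, and the CASE guard is an added assumption about how rows without the marker should be treated (they return NULL).

```sql
-- Hypothetical stand-in for the projects table; only the two columns used here.
DECLARE @projects TABLE (refno VARCHAR(20), projname NVARCHAR(200));
INSERT INTO @projects (refno, projname)
VALUES ('item1', N'StudentsTitle123 (NNNN: 9781410314291)'),
       ('item2', N'Title without the marker');

SELECT
    proj.refno AS [Reference No.],
    CASE
        -- marker found, and a closing bracket at least 6 characters after it
        WHEN CALC.pos_from > 0 AND CALC2.pos_to >= CALC.pos_from + 6
        THEN SUBSTRING(proj.projname,
                       CALC.pos_from + 6,
                       CALC2.pos_to - CALC.pos_from - 6)
    END AS [NNNN]
FROM @projects AS proj
OUTER APPLY (SELECT CHARINDEX('NNNN:', proj.projname) AS pos_from) AS CALC
OUTER APPLY (SELECT CHARINDEX(')', proj.projname, CALC.pos_from) AS pos_to) AS CALC2;
-- item1 -> 9781410314291, item2 -> NULL
```

Computing the closing-bracket position in its own apply keeps the SUBSTRING call short and lets the guard reuse both positions.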
Try this:
DECLARE @str nvarchar(max) = 'Novels for Students, vol. 52 (ISBN: 9781410314291)'
SELECT
REPLACE(STUFF(@str, 1, PATINDEX('% '+REPLICATE('[0-9]', 13) + '%', @str), ''), ')', '')
Result:
9781410314291