Extracting from text using tsql


I have the following string format in a SQL table column:
[CID]: 267 [MID]: 319A [Name]: RJR
How can I extract only the value of MID (here 319A) in a SELECT query, so that I can use it in a join?
I copied and pasted the value, and it looks like there are \n (newline) characters after each value.
Thanks in advance

You may try this one.
declare @t varchar(100);
set @t = '[CID]: 267 [MID]: 319A [Name]: RJR';
select ltrim(rtrim(substring(@t, charindex('[MID]:', @t) + 6, charindex('[Name]', @t) - (charindex('[MID]:', @t) + 6))))
---------------------------------------------------------
319A
ltrim and rtrim strip the spaces around the 319A value.
You can try it without them first if you like.
Cheers
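Since the question mentions newline characters and a join, here is a minimal sketch of the same SUBSTRING/CHARINDEX idea applied per row. The table variable @notes and its columns are hypothetical stand-ins for the real table. The sketch assumes the tags are always spelled [MID]: and [Name], and it turns CR/LF into spaces before measuring positions.

```sql
-- Hypothetical sample data; the real column would come from your own table.
DECLARE @notes TABLE (id INT, txt VARCHAR(200));
INSERT INTO @notes VALUES
    (1, '[CID]: 267' + CHAR(10) + '[MID]: 319A' + CHAR(10) + '[Name]: RJR'),
    (2, 'Garbage that will not match');

SELECT n.id, x.MID
FROM @notes AS n
-- Replace carriage returns and line feeds with spaces
CROSS APPLY (SELECT REPLACE(REPLACE(n.txt, CHAR(13), ' '), CHAR(10), ' ') AS s) AS c
-- Positions of the two tags (0 when a tag is missing)
CROSS APPLY (SELECT CHARINDEX('[MID]:', c.s) AS p1,
                    CHARINDEX('[Name]', c.s) AS p2) AS p
-- Only call SUBSTRING when both tags were found in the expected order
CROSS APPLY (SELECT CASE WHEN p.p1 > 0 AND p.p2 > p.p1
                         THEN LTRIM(RTRIM(SUBSTRING(c.s, p.p1 + 6, p.p2 - p.p1 - 6)))
                    END AS MID) AS x;
```

Row 1 should come back as 319A and row 2 as NULL. The x.MID column can then be used in an ON clause against the other table.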

http://www.simple-talk.com/sql/t-sql-programming/tsql-regular-expression-workbench/
shows how to add regex support to SQL Server.
A regex (tested on Rubular) to get you started:
\[MID\]: (.*) \[Name]:

Not clean at ALL, but if you need it in SQL, here you go:
Use
SUBSTRING ( value_expression , start_expression , length_expression )
and
LOCATE( string1, string2 [, start] )
together (LOCATE is the MySQL function; in SQL Server the equivalent is CHARINDEX, which takes the same arguments in the same order):
SUBSTRING(INPUT,
LOCATE('MID]: ', INPUT) + 6,
LOCATE('[Name]', INPUT) - (LOCATE('MID]: ', INPUT) + 6))
It also depends on where this is taking place. If it is in a batch process, I would export those fields with an ID, write a Perl one-liner that extracts them, and then load them back into the DB. It would be much faster than using these functions.
If it is a screen event, then I suggest breaking them into 3 columns instead; it'll actually save you space.

Don't really think you need all these trimming and substring-ing functions.
USE tempdb;
GO
CREATE TABLE #t1
(
a INT,
b VARCHAR(64)
);
INSERT #t1 SELECT 1, '[CID]: 267 [MID]: 319A [Name]: RJR'
UNION ALL SELECT 2, '[CID]: 26232 [MID]: 229dd5A [Name]: RJ'
UNION ALL SELECT 3, 'Garbage that will not match';
CREATE TABLE #t2
(
c INT,
d VARCHAR(32)
);
INSERT #t2 SELECT 4, '319A'
UNION ALL SELECT 5, '229dd5A'
UNION ALL SELECT 6, 'NO MATCH';
SELECT t1.a, t1.b, t2.c, t2.d
FROM #t1 AS t1
INNER JOIN #t2 AS t2
ON t1.b LIKE '%`[MID`]: ' + t2.d + ' %' ESCAPE '`'
GO
DROP TABLE #t1, #t2;
If you have no idea how many spaces might be between [MID]: and the start of your value, or the end of your value and the start of the next [, and assuming there are no spaces in the values you want to match, you could use:
ON REPLACE(t1.b, ' ', '') LIKE '%`[MID`]:' + t2.d + '`[%' ESCAPE '`'

Related

Can SQL STRING_SPLIT use two (or more) separators?

I have a list of drug names that are stored in various upper and lower case combinations. I need to capitalize the first letter of each word in a string, excluding certain words. The string is separated by spaces, but can also be separated by a forward slash.
The following code works:
create table #exclusionlist (word varchar(25))
create table #drugnames (drugname varchar(50))
insert into #exclusionlist values ('ER')
insert into #exclusionlist values ('HCL')
insert into #drugnames values ('DRUGNAME ER')
insert into #drugnames values ('drugname hcl')
insert into #drugnames values ('ONEDRUG/OTHERDRUG')
select 'Product Name' = drugname
, 'Product Name 2' = STUFF((SELECT ' ' +
case when value in (select word from #exclusionlist) then upper(value)
else upper(left(value, 1)) + lower(substring(value, 2, len(value))) end
from STRING_SPLIT(drugname, ' ')
FOR XML PATH('')) ,1,1,'')
from #drugnames
The output looks like this:
Drugname ER
Drugname HCL
Onedrug/otherdrug
How can I get that last one to look like this:
Onedrug/Otherdrug
I did try STRING_SPLIT(replace(drugname, '/', ' '), ' '), but that obviously replaces the slash with a space. And if the slash is at the end of the string, as in ONEDRUG/OTHERDRUG/, the result looks like Onedrug Otherdrug
It's possible that the string may end in a forward slash due to the field only holding N number of characters. When data gets inserted into the table, only the first N characters of the drug name are inserted. If that Nth character is a slash, the string will end in a slash.

You can use a CASE expression for the separator parameter to the STRING_SPLIT function.
In the code below, a common table expression (CTE) uses STRING_SPLIT to split out all the drugname words and capitalize the first letter of each word as appropriate.
The CTE results are joined back together with STRING_AGG, once per separator, and the two result sets are combined with UNION. Note that the separator parameter for STRING_AGG cannot be an expression. Using a CASE expression results in this error:
Msg 8733, Level 16, State 1, Line 14
Separator parameter for STRING_AGG must be a string literal or variable.
For SQL Server, you would need to be on SQL 2017 or greater for the STRING_AGG function. (I added an id/identity column to the #drugnames temp table to assist with grouping.)
DROP TABLE IF EXISTS #exclusionlist;
DROP TABLE IF EXISTS #drugnames;
CREATE TABLE #exclusionlist (word VARCHAR(25))
CREATE TABLE #drugnames (id INT IDENTITY, drugname VARCHAR(50))
INSERT INTO #exclusionlist VALUES ('ER')
INSERT INTO #exclusionlist VALUES ('HCL')
INSERT INTO #drugnames VALUES ('DRUGNAME ER')
INSERT INTO #drugnames VALUES ('drugname hcl')
INSERT INTO #drugnames VALUES ('ONEDRUG/OTHERDRUG')
;WITH SomeStuff AS
(
select d.id, d.drugname,
sp.value AS SplitValue, el.word AS ExclusionListSpaceWord,
COALESCE(el.word,
UPPER(LEFT(sp.value, 1)) + LOWER(RIGHT(sp.value, LEN(sp.value) - 1))
) AS CapitalizedWord
from #drugnames d
CROSS APPLY STRING_SPLIT(d.drugname, CASE WHEN d.drugname LIKE '%/%' THEN '/' ELSE ' ' END ) sp
LEFT JOIN #exclusionlist el
ON el.word = sp.value
)
SELECT ss.id, STRING_AGG(ss.CapitalizedWord, '/') AS ReconstructedDrugname
FROM SomeStuff ss
WHERE ss.drugname LIKE '%/%'
GROUP BY ss.id
UNION
SELECT ss.id, STRING_AGG(ss.CapitalizedWord, ' ') AS ReconstructedDrugname
FROM SomeStuff ss
WHERE ss.drugname LIKE '% %'
GROUP BY ss.id
Output:
id ReconstructedDrugname
----------- ----------------------
1 Drugname ER
2 Drugname HCL
3 Onedrug/Otherdrug

If (hopefully) you are using SQL 2017+, you can compact the logic somewhat. First, do a single string split by using TRANSLATE to create a common separator. Then apply the exclude-word criteria followed by the upper-case criteria, and re-aggregate, noting which separator to use.
select *
from #drugnames
outer apply (
select case
when max(sep)=' ' then String_Agg(word,' ')
else String_Agg(word,'/') end NewName
from (
select
case
when exists (select * from #exclusionlist x where x.word = value)
then Upper(value)
else Stuff(Lower(value),1,1,Upper(Left(value,1)))
end word, Iif(drugname like '%/%','/',' ') sep
from String_Split(Translate(drugname,' /','**'),'*')
)w
)new;
Note - according to the documentation, string_split does not guarantee the ordering of the values. In practice, I've never seen the order change, and since you're already using the function I'm using it here too. There are plenty of ways to split the string while retaining an ordering (using JSON, for example) should it ever prove necessary.
If you are still on 2016 then the string_agg can just be replaced with the for xml implementation you're already using.
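As a sketch of the JSON route mentioned in the note above: OPENJSON returns the array index in its [key] column, so the word order can be made explicit. This assumes SQL Server 2016+ (compatibility level 130 or higher) and that the drug names contain no double quotes or backslashes; the variable name is only for illustration.

```sql
DECLARE @drugname VARCHAR(50) = 'ONEDRUG/OTHERDRUG ER';

-- Turn both separators into '","' and wrap the result as a JSON array,
-- e.g. ["ONEDRUG","OTHERDRUG","ER"]; [key] is the zero-based array index.
SELECT CAST(j.[key] AS INT) AS ordinal,
       j.[value]            AS word
FROM OPENJSON('["' + REPLACE(REPLACE(@drugname, '/', ' '), ' ', '","') + '"]') AS j
ORDER BY ordinal;
```

This loses the information about which separator stood between two words, so the original separator still has to be tracked separately, as the queries above do.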

If the "/" is replaced with " /", the split can still happen on the space,
and the extra space can be removed afterwards.
SELECT
[Product Name] = d.drugname
, [Product Name 2] = ca.drugname2
FROM #drugnames d
CROSS APPLY (
SELECT
REPLACE(LTRIM(x.value('(./text())[1]','VARCHAR(MAX)')),' /','/') AS drugname2
FROM
(
SELECT ' '+
CASE
WHEN e.word IS NOT NULL THEN e.word
WHEN s.value LIKE '/%'
THEN STUFF(LOWER(s.value),1,2,UPPER(LEFT(s.value,2)))
ELSE STUFF(LOWER(s.value),1,1,UPPER(LEFT(s.value,1)))
END
FROM STRING_SPLIT(REPLACE(d.drugname,'/',' /'),' ') s
LEFT JOIN #exclusionlist e
ON e.word = s.value
FOR XML PATH(''), TYPE
) q(x)
) ca;
Product Name       Product Name 2
-----------------  -----------------
DRUGNAME ER        Drugname ER
drugname hcl       Drugname HCL
ONEDRUG/OTHERDRUG  Onedrug/Otherdrug

How to SELECT string between second and third instance of ",,"?

I am trying to get string between second and third instance of ",," using SQL SELECT.
Apparently functions substring and charindex are useful, and I have tried them but the problem is that I need the string between those specific ",,"s and the length of the strings between them can change.
Can't find working example anywhere.
Here is an example:
Table: test
Column: Column1
Row1: cat1,,cat2,,cat3,,cat4,,cat5
Row2: dogger1,,dogger2,,dogger3,,dogger4,,dogger5
Result: cat3dogger3
Here is my closest attempt, it works if the strings are same length every time, but they aren't:
SELECT SUBSTRING(column1,LEN(LEFT(column1,CHARINDEX(',,', column1,12)+2)),LEN(column1) - LEN(LEFT(column1,CHARINDEX(',,', column1,20)+2)) - LEN(RIGHT(column1,CHARINDEX(',,', (REVERSE(column1)))))) AS column1
FROM testi

Just repeat sub-string 3 times, each time moving onto the next ",," e.g.
select
-- Substring till the third ',,'
substring(z.col1, 1, patindex('%,,%',z.col1)-1)
from (values ('cat1,,cat2,,cat3,,cat4,,cat5'),('dogger1,,dogger2,,dogger3,,dogger4,,dogger5')) x (col1)
-- Substring from the first ',,'
cross apply (values (substring(x.col1,patindex('%,,%',x.col1)+2,len(x.col1)))) y (col1)
-- Substring from the second ',,'
cross apply (values (substring(y.col1,patindex('%,,%',y.col1)+2,len(y.col1)))) z (col1);
And just to reiterate, this is a terrible way to store data, so the best solution is to store it properly.

Here is an alternative solution using charindex. The basic idea is the same as in Dale K's answer, but instead of cutting the string, we specify the start_location for the search by using the third, optional parameter of charindex. This way, we get the location of each separator and can slice each value out of the main string.
declare @vtest table (column1 varchar(200))
insert into @vtest ( column1 ) values('dogger1,,dogger2,,dogger3,,dogger4,,dogger5')
insert into @vtest ( column1 ) values('cat1,,cat2,,cat3,,cat4,,cat5')
declare @separator char(2) = ',,'
select
t.column1
, FI.FirstInstance
, SI.SecondInstance
, TI.ThirdInstance
, iif(TI.ThirdInstance is not null, substring(t.column1, SI.SecondInstance + 2, TI.ThirdInstance - SI.SecondInstance - 2), null)
from
@vtest t
cross apply (select nullif(charindex(@separator, t.column1), 0) FirstInstance) FI
cross apply (select nullif(charindex(@separator, t.column1, FI.FirstInstance + 2), 0) SecondInstance) SI
cross apply (select nullif(charindex(@separator, t.column1, SI.SecondInstance + 2), 0) ThirdInstance) TI
For transparency, I saved the separator string in a variable.
By default, charindex returns 0 if the search string is not present, so I replace that with NULL by using nullif.

IMHO, SQL Server 2016 and its JSON support is the best option here.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, Tokens VARCHAR(500));
INSERT INTO @tbl VALUES
('cat1,,cat2,,cat3,,cat4,,cat5'),
('dogger1,,dogger2,,dogger3,,dogger4,,dogger5');
-- DDL and sample data population, end
WITH rs AS
(
SELECT *
, '["' + REPLACE(Tokens
, ',,', '","')
+ '"]' AS jsondata
FROM @tbl
)
SELECT rs.ID, rs.Tokens
, JSON_VALUE(jsondata, '$[2]') AS ThirdToken
FROM rs;
Output
+----+---------------------------------------------+------------+
| ID | Tokens | ThirdToken |
+----+---------------------------------------------+------------+
| 1 | cat1,,cat2,,cat3,,cat4,,cat5 | cat3 |
| 2 | dogger1,,dogger2,,dogger3,,dogger4,,dogger5 | dogger3 |
+----+---------------------------------------------+------------+

It's the same as @"Yitzhak Khabinsky"'s answer, but I think it looks clearer.
WITH CTE_Data
AS(
SELECT 'cat1,,cat2,,cat3,,cat4,,cat5' AS [String]
UNION
SELECT 'dogger1,,dogger2,,dogger3,,dogger4,,dogger5' AS [String]
)
SELECT
A.[String]
,Value3 = JSON_VALUE('["'+ REPLACE(A.[String], ',,', '","') + '"]', '$[2]')
FROM CTE_Data AS A

Remove Characters in a String in SQL

I have a column u_manualdoc which contains values like CGY DR# 7405. I want to remove the CGY DR# part.
Here's the code:
select u_manualdoc, cardcode, cardname from ODLN
I want only the 7405 number. Thanks!

Try this:
--sample data you provided in comments
declare @tbl table(codes varchar(20))
insert into @tbl values
('CGY PST - 58277') , ('CGY RMC PST # 58083'), ('CGY DR # 7443'), ('CSI # 1304'), ('PO# 0568 , 0570'), ('CGY DR# 7446')
--actual query that you can apply to your table
select SUBSTRING(codes, PATINDEX('%[0-9]%', codes), len(codes)) from @tbl
The key point here is to use patindex, which searches for a pattern and returns the index where the pattern first occurs. I specified %[0-9]%, which matches any digit, so it returns the position of the first digit. That position is the starting point we pass to substring. The third parameter of substring is the length; since we want the rest of the string, passing len(codes) makes sure that we get all of it :)
Applying to your naming:
select SUBSTRING(u_manualdoc, PATINDEX('%[0-9]%', u_manualdoc), len(u_manualdoc)),
cardcode,
cardname
from ODLN
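The query above returns everything from the first digit onward, so a value such as 'PO# 0568 , 0570' comes back as '0568 , 0570'. If only the first run of digits is wanted, one possible variation is to apply a second PATINDEX to find where the digits stop. This is a sketch on a cut-down copy of the sample data, and the aliases are made up for illustration.

```sql
DECLARE @tbl TABLE (codes VARCHAR(20));
INSERT INTO @tbl VALUES ('CGY DR# 7446'), ('PO# 0568 , 0570'), ('NO DIGITS');

SELECT t.codes,
       -- Appending 'x' guarantees a non-digit exists, so PATINDEX never returns 0 here
       LEFT(r.rest, PATINDEX('%[^0-9]%', r.rest + 'x') - 1) AS first_number
FROM @tbl AS t
-- Position of the first digit, or NULL when the string has no digits
CROSS APPLY (SELECT NULLIF(PATINDEX('%[0-9]%', t.codes), 0) AS first_digit) AS d
-- Everything from the first digit to the end (NULL when there is no digit)
CROSS APPLY (SELECT SUBSTRING(t.codes, d.first_digit, LEN(t.codes)) AS rest) AS r;
```

For these three rows the expected values are 7446, 0568 and NULL.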

You can use the string functions charindex, len and substring to get it.
See the code below; ltrim removes the space that follows the #.
select LTRIM(SUBSTRING(u_manualdoc, CHARINDEX('#', u_manualdoc) + 1, LEN(u_manualdoc) - CHARINDEX('#', u_manualdoc)))
from ODLN

EDIT
In addition to the other answers, you can use this simple method:
select
substring(
u_manualdoc,
len(u_manualdoc) - patindex('%[^0-9]%', reverse(u_manualdoc)) + 2,
len(u_manualdoc)
),
cardcode, cardname
from ODLN
In this example, patindex finds the first non-digit (as specified by [^0-9]) from the right side of the string, and the query then uses that position as the starting point of the substring.
This will work on all of your sample strings (including 'PO# 0568 , 0570 CGY DR# 7446').
Or use SQL Server Regex, which lets you use more powerful regular expressions within your queries.

Try this:
DECLARE @table TABLE(DirtyCol VARCHAR(100));
INSERT INTO @table
VALUES('AB ABCDE # 123'), ('ABCDE# 123'), ('AB: ABC# 123 AB: ABC# 123'), ('AB#'), ('AB # 1 000 000'), ('AB # 1`234`567'), ('AB # (9)(876)(543)');
WITH tally
AS (
SELECT TOP (100) N = ROW_NUMBER() OVER(ORDER BY @@spid)
FROM sys.all_columns),
data
AS (
SELECT DirtyCol,
Col
FROM @table
CROSS APPLY
(
SELECT
(
SELECT C+''
FROM
(
SELECT N,
SUBSTRING(DirtyCol, N, 1) C
FROM tally
WHERE N <= DATALENGTH(DirtyCol)
) [1]
WHERE C BETWEEN '0' AND '9'
ORDER BY N FOR XML PATH('')
)
) p(Col)
WHERE p.Col IS NOT NULL)
SELECT DirtyCol,
CAST(Col AS INT) IntCol
FROM data;

Dynamic Comma Separated string into different column

Could someone please help me with this strange scenario? I have data as given below.
DECLARE @TABLE TABLE
(
ID INT,
PHONE001 VARCHAR(500)
)
INSERT @TABLE
SELECT 1,'01323840261,01323844711' UNION ALL
SELECT 2,'' UNION ALL
SELECT 3,',01476862000' UNION ALL
SELECT 4,'01233625418,1223822583,125985' UNION ALL
SELECT 5,'2089840022,9.99021E+13'
I am trying to put each comma-separated value into a separate column. The maximum number of columns depends on the row with the most comma-separated values.
Expected Output
1|01323840261|01323844711|''
2|''|''|''
3|01476862000|''|''|
4|01233625418|1223822583|125985|
5|2089840022|9.99021E+13|''|

Try:
select id,
T.c.value('t[1]','varchar(50)') as col1,
T.c.value('t[2]','varchar(50)') as col2,
T.c.value('t[3]','varchar(50)') as col3
from (select id, cast('<t>' + replace(PHONE001, ',', '</t><t>') + '</t>' as xml) x
from @TABLE) a
cross apply x.nodes('.') T(c)
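The query above hard-codes three columns. To check how many columns the data actually needs, one simple approach is to count the commas per row. This is a sketch on a cut-down copy of the question's sample data and is not part of the original answer.

```sql
DECLARE @TABLE TABLE (ID INT, PHONE001 VARCHAR(500));
INSERT @TABLE
SELECT 1, '01323840261,01323844711' UNION ALL
SELECT 2, '' UNION ALL
SELECT 4, '01233625418,1223822583,125985';

-- Values per row = number of commas + 1; the maximum is the column count needed
SELECT MAX(LEN(PHONE001) - LEN(REPLACE(PHONE001, ',', '')) + 1) AS MaxParts
FROM @TABLE;
```

For these rows MaxParts is 3. If that number can grow, the column list would have to be generated with dynamic SQL, which is beyond this sketch.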

case statement to delete extra spaces only when there is some

I replace all blanks with # using this:
SELECT *, REPLACE(NAME,' ','#') AS NAME2
which results in miss#test#blogs############## (a different number of #s depending on the length of the name).
I then delete all # signs after the name using this:
select *, substring(Name2,0,charindex('##',Name2)) as name3
which then gives my desired result of, for example, MISS#test#blogs
However, some rows weren't giving this result; they are null. This is because, annoyingly, some rows in the sheet I have read in don't have the spaces after the name.
Is there a CASE statement I can use so that it only deletes # signs after the name if they are there in the first place?
Thanks

The function rtrim can be used to remove trailing spaces. For example:
select replace(rtrim('miss test blogs '),' ','#')
-->
'miss#test#blogs'
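To compare the RTRIM approach with the CASE expression the question asks for, here is a small sketch. The table variable @names and the aliases are made up for illustration, and the first sample value carries trailing spaces on purpose.

```sql
DECLARE @names TABLE (NAME VARCHAR(50));
INSERT INTO @names VALUES ('miss test blogs          '), ('joe public');

SELECT n.NAME,
       -- Trim first, then replace: no trailing # signs are ever produced
       REPLACE(RTRIM(n.NAME), ' ', '#') AS via_rtrim,
       -- CASE variant: only cut at '##' when it is actually present
       CASE WHEN CHARINDEX('##', x.NAME2) > 0
            THEN LEFT(x.NAME2, CHARINDEX('##', x.NAME2) - 1)
            ELSE x.NAME2
       END AS via_case
FROM @names AS n
CROSS APPLY (SELECT REPLACE(n.NAME, ' ', '#') AS NAME2) AS x;
```

Both columns should give miss#test#blogs and joe#public here. The CASE variant would still leave a single trailing # when a name ends in exactly one space, which is one reason the RTRIM version is the simpler choice.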

Try this:
-- Note: the '  ' literals below contain two spaces
Declare @t table (name varchar(100),title varchar(100),forename varchar(100))
insert into @t
values('a  b   c','dasdh  dsalkdk   asdhl','asd dfg  sd')
SELECT REPLACE(REPLACE(REPLACE(LTRIM(RTRIM(name)),'  ',' '+CHAR(7)),CHAR(7)+' ','') ,CHAR(7),'') AS Name,
REPLACE(REPLACE(REPLACE(LTRIM(RTRIM(title)),'  ',' '+CHAR(7)),CHAR(7)+' ','') ,CHAR(7),'') AS title,
REPLACE(REPLACE(REPLACE(LTRIM(RTRIM(forename)),'  ',' '+CHAR(7)),CHAR(7)+' ','') ,CHAR(7),'') AS forename
FROM @t WHERE
(CHARINDEX('  ',NAME) > 0 or CHARINDEX('  ',title) > 0 or CHARINDEX('  ',forename) > 0)

select name2, left(name2,len(name2)+1-patindex('%[^#]%',reverse(name2)+'.'))
from (
SELECT *, REPLACE(NAME,' ','#') AS NAME2
from t
) x;
For posterity, sample table:
create table t (name varchar(100));
insert t select 'name#name#ne###'
union all select '#name#name'
union all select 'name name hi '
union all select 'joe public'
union all select ''
union all select 'joe'
union all select 'joe '
union all select null
union all select ' leading spaces'
union all select ' leading trailing ';

I don't quite understand the question, but if the problem is that there are no spaces after some names, can't you do this first:
SELECT *, REPLACE(NAME+' ',' ','#') AS NAME2
i.e., add a space to all names right off the bat?

I had this same problem some days ago.
There's actually a quick way to remove the spaces from both the beginning and the end of a string. In SQL Server, you can use RTRIM and LTRIM for this. The first one removes spaces from the right side and the second removes them from the left. But if your scenario may also have more than one space in the middle of the string, I suggest you take a look at this post on SQL Server Central: http://www.sqlservercentral.com/articles/T-SQL/68378/
There the script's author explains, in detail, a good solution for this situation.
