Shortening a REPLACE statement with RegEx? - sql

Below is the SQL query I am using to get some information. One of the returned columns is XML. I want to read this XML and parse out the needed ID from between the <ABC> tags. The query below does that, but I am looking for a cleaner way of doing it (if one exists):
SELECT
tblAT.*,
tblA.*,
tblEM.[Custom] AS fullXML,
REPLACE(
REPLACE(
CONVERT(
VARCHAR(MAX),
tblEM.[Custom].query('/Ind/ABC')
)
, '<ABC>'
, ''
)
,'</ABC>'
,''
) AS ABC
FROM
ATable AS tblA
JOIN
LLink AS tblL
ON tblL.A_AID = tblA.AID
JOIN
AssetsT AS tblAT
ON tblAT.AID = tblL.BAID
JOIN
ExternalMetadata AS tblEM
ON tblEM.AID = tblA.AID
WHERE
tblAT.ATID = 12
AND
tblA.AID = 30610
AND
tblA.CreatedDate > '2021-05-11 08:58:00'
The XML structure looks like this:
<Ind>
<ABC>some value here</ABC>
</Ind>
The part:
REPLACE(
REPLACE(
CONVERT(
VARCHAR(MAX),
tblEM.[Custom].query('/Ind/ABC')
)
, '<ABC>'
, ''
)
,'</ABC>'
,''
) AS ABC
is what I want to replace with a simpler way of removing the <> tags from the beginning and the end of the XML.
I was hoping to do some kind of regex replace using /<[^>]*>/g to shorten the query.
I am using SQL Server version 13.0.5103.6.
So is there any way of cleaning up the REPLACE part of the query?

No need for any RegEx and/or multiple REPLACE() calls.
The XML data type can be handled easily with XQuery.
Check it out:
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, [Custom] XML);
INSERT INTO @tbl ([Custom]) VALUES
(N'<Ind>
<ABC>some value here</ABC>
</Ind>');
-- DDL and sample data population, end
SELECT *
, [Custom].value('(/Ind/ABC/text())[1]', 'varchar(30)') AS result
FROM @tbl;
Output
+----+---------------------------------------+-----------------+
| ID | Custom | result |
+----+---------------------------------------+-----------------+
| 1 | <Ind><ABC>some value here</ABC></Ind> | some value here |
+----+---------------------------------------+-----------------+
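As a rough side-by-side, the sketch below runs the nested REPLACE() expression from the question next to the .value() call from this answer. The table variable and the second row (which has no ABC element) are made up for illustration. When the path matches nothing, .value() yields NULL.

```sql
-- Hypothetical sample data; the second row deliberately lacks an <ABC> element.
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, [Custom] XML);
INSERT INTO @tbl ([Custom]) VALUES
(N'<Ind><ABC>some value here</ABC></Ind>'),
(N'<Ind><XYZ>no ABC element</XYZ></Ind>');

SELECT ID
     -- Original approach: serialize the node, then strip the tags as text.
     , REPLACE(REPLACE(CONVERT(VARCHAR(MAX), [Custom].query('/Ind/ABC')), '<ABC>', ''), '</ABC>', '') AS via_replace
     -- XQuery approach: read the first text node directly.
     , [Custom].value('(/Ind/ABC/text())[1]', 'VARCHAR(MAX)') AS via_value
FROM @tbl;
```

In the original query, only the nested REPLACE() expression needs to change; the joins and the WHERE clause can stay as they are.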

Related

Extract string using SQL Server 2012

I have a string in the form of
<div>#FIRST#12345#</div>
How do I extract the number part from this string using T-SQL in SQL Server 2012? Note the number has variable length
Using just t-sql string functions you can try:
create table t(col varchar(50))
insert into t select '<div>#FIRST#12345#</div>'
insert into t select '<div>#THIRD#543#</div>'
insert into t select '<div>#SECOND#3690123#</div>'
select col,
case when p1.v=0 or p2.v <= p1.v then ''
else Substring(col, p1.v, p2.v-p1.v)
end ExtractedNumber
from t
cross apply(values(CharIndex('#',col,7) + 1))p1(v)
cross apply(values(CharIndex('#',col, p1.v + 1)))p2(v)
Caveat: this doesn't handle any edge cases and assumes the data is exactly as described.
Shooting from the hip due to a missing minimal reproducible example.
Assuming that it is an XML data type column.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, xmldata XML);
INSERT INTO @tbl (xmldata) VALUES
('<div>#FIRST#12345#</div>'),
('<div>#FIRST#770770#</div>');
-- DDL and sample data population, end
SELECT t.*
, LEFT(x, CHARINDEX('#', x) - 1) AS Result
FROM @tbl t
CROSS APPLY xmldata.nodes('/div/text()') AS t1(c)
CROSS APPLY (SELECT REPLACE(c.value('.', 'VARCHAR(100)'), '#FIRST#' ,'')) AS t2(x);
Output
+----+---------------------------+--------+
| ID | xmldata | Result |
+----+---------------------------+--------+
| 1 | <div>#FIRST#12345#</div> | 12345 |
| 2 | <div>#FIRST#770770#</div> | 770770 |
+----+---------------------------+--------+

Replace a specific character with blank

How can I replace 'a' with a blank?
Name               ID
----------------------------------
b,c,d,e,abb,a      1
b,c,d,a,e,abb      2
a,b,c,d,a,e,abb    3
One way to do it would be to add a , to the beginning and end of each Name, then replace every occurrence of ',a,' with ',', then trim the leading and trailing , from the result:
update table_name
set Name = trim(',' from replace(concat(',', Name, ','), ',a,', ','));
Or if you just want to do a select without changing the rows:
select trim(',' from replace(concat(',', Name, ','), ',a,', ',')) as Name, ID
from table_name;
To address @Iptr's comment: if there can be consecutive a values, such as a,a,..., you could use STRING_SPLIT to turn the comma-separated values into rows, filter out the rows where the value is a, then use STRING_AGG with GROUP BY to rebuild the comma-separated list:
select ID, STRING_AGG(u.Value, ',') as Name
from table_name
cross apply STRING_SPLIT (Name, ',') u
where Value <> 'a'
group by ID
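To see why consecutive a tokens matter, here is a small sketch that runs the padded REPLACE() approach from above on one row from the question and one made-up row. The table variable and the 'a,a,b' row are assumptions for illustration. REPLACE() does not rescan overlapping matches, so the second ',a,' is missed because its leading comma was already consumed by the first match.

```sql
-- Hypothetical sample data; row 2 has two consecutive 'a' tokens.
DECLARE @t TABLE (ID INT PRIMARY KEY, Name VARCHAR(100));
INSERT INTO @t (ID, Name) VALUES
(1, 'b,c,d,e,abb,a'),
(2, 'a,a,b');

SELECT ID
     , Name
     -- Row 1 becomes 'b,c,d,e,abb'; row 2 becomes 'a,b' (one 'a' survives).
     , TRIM(',' FROM REPLACE(CONCAT(',', Name, ','), ',a,', ',')) AS via_replace
FROM @t;
```

Note that TRIM(... FROM ...) and STRING_AGG both require SQL Server 2017 or later, and that STRING_AGG without a WITHIN GROUP clause does not guarantee the original token order.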
Here is a solution based on tokenization via XML/XQuery.
It will work starting from SQL Server 2012 onwards.
Steps:
We are tokenizing a string of tokens via XML.
XQuery FLWOR expression is filtering out the 'a' token.
Reverting it back to a string of tokens.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, tokens VARCHAR(1000));
INSERT INTO @tbl (tokens) VALUES
('b,c,d,e,abb,a'),
('b,c,d,a,e,abb'),
('a,b,c,d,a,e,abb');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = ',';
SELECT t.*
, REPLACE(c.query('
for $x in /root/r/text()
return if ($x = "a") then ()
else data($x)
').value('.', 'VARCHAR(MAX)'), SPACE(1), @separator) AS Result
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(tokens, @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t1(c);
Output
+----+-----------------+-------------+
| ID | tokens | Result |
+----+-----------------+-------------+
| 1 | b,c,d,e,abb,a | b,c,d,e,abb |
| 2 | b,c,d,a,e,abb | b,c,d,e,abb |
| 3 | a,b,c,d,a,e,abb | b,c,d,e,abb |
+----+-----------------+-------------+
Try the following:
select Replace(name, N'a', N'') as RepName , ID from yourTable
Try this.
SELECT ID,Name, REPLACE(Name, 'a', ' ')
FROM tableName;

How to SELECT string between second and third instance of ",,"?

I am trying to get string between second and third instance of ",," using SQL SELECT.
Apparently functions substring and charindex are useful, and I have tried them but the problem is that I need the string between those specific ",,"s and the length of the strings between them can change.
Can't find working example anywhere.
Here is an example:
Table: test
Column: Column1
Row1: cat1,,cat2,,cat3,,cat4,,cat5
Row2: dogger1,,dogger2,,dogger3,,dogger4,,dogger5
Result: cat3dogger3
Here is my closest attempt, it works if the strings are same length every time, but they aren't:
SELECT SUBSTRING(column1,LEN(LEFT(column1,CHARINDEX(',,', column1,12)+2)),LEN(column1) - LEN(LEFT(column1,CHARINDEX(',,', column1,20)+2)) - LEN(RIGHT(column1,CHARINDEX(',,', (REVERSE(column1)))))) AS column1
FROM testi
Just repeat sub-string 3 times, each time moving onto the next ",," e.g.
select
-- Substring till the third ',,'
substring(z.col1, 1, patindex('%,,%',z.col1)-1)
from (values ('cat1,,cat2,,cat3,,cat4,,cat5'),('dogger1,,dogger2,,dogger3,,dogger4,,dogger5')) x (col1)
-- Substring from the first ',,'
cross apply (values (substring(x.col1,patindex('%,,%',x.col1)+2,len(x.col1)))) y (col1)
-- Substring from the second ',,'
cross apply (values (substring(y.col1,patindex('%,,%',y.col1)+2,len(y.col1)))) z (col1);
And just to reiterate, this is a terrible way to store data, so the best solution is to store it properly.
Here is an alternative solution using charindex. The basic idea is the same as in Dale K's answer, but instead of cutting the string, we specify the start_location for the search using the optional third parameter of charindex. This way, we get the location of each separator and can slice each value out of the main string.
declare @vtest table (column1 varchar(200))
insert into @vtest ( column1 ) values('dogger1,,dogger2,,dogger3,,dogger4,,dogger5')
insert into @vtest ( column1 ) values('cat1,,cat2,,cat3,,cat4,,cat5')
declare @separator char(2) = ',,'
select
t.column1
, FI.FirstInstance
, SI.SecondInstance
, TI.ThirdInstance
, iif(TI.ThirdInstance is not null, substring(t.column1, SI.SecondInstance + 2, TI.ThirdInstance - SI.SecondInstance - 2), null)
from
@vtest t
cross apply (select nullif(charindex(@separator, t.column1), 0) FirstInstance) FI
cross apply (select nullif(charindex(@separator, t.column1, FI.FirstInstance + 2), 0) SecondInstance) SI
cross apply (select nullif(charindex(@separator, t.column1, SI.SecondInstance + 2), 0) ThirdInstance) TI
For transparency, I saved the separator string in a variable.
By default, charindex returns 0 if the search string is not present, so I replace that 0 with null by using nullif.
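As a minimal illustration of the two building blocks used above, the sketch below calls charindex with and without a start location and wraps a failed search in nullif. The string literals are made up for the example.

```sql
-- Positions are 1-based; ',,' first occurs at position 5 in this literal.
SELECT CHARINDEX(',,', 'cat1,,cat2,,cat3')            AS first_hit   -- 5
     -- Start searching after the first separator (5 + 2 = 7).
     , CHARINDEX(',,', 'cat1,,cat2,,cat3', 5 + 2)     AS second_hit  -- 11
     -- No separator present: CHARINDEX gives 0, NULLIF turns that into NULL.
     , NULLIF(CHARINDEX(',,', 'cat1'), 0)             AS no_hit;     -- NULL
```

Because arithmetic on NULL stays NULL, a missing separator propagates through the later cross apply steps instead of producing a misleading position.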
IMHO, SQL Server 2016 and its JSON support is the best option here.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, Tokens VARCHAR(500));
INSERT INTO @tbl VALUES
('cat1,,cat2,,cat3,,cat4,,cat5'),
('dogger1,,dogger2,,dogger3,,dogger4,,dogger5');
-- DDL and sample data population, end
WITH rs AS
(
SELECT *
, '["' + REPLACE(Tokens
, ',,', '","')
+ '"]' AS jsondata
FROM @tbl
)
SELECT rs.ID, rs.Tokens
, JSON_VALUE(jsondata, '$[2]') AS ThirdToken
FROM rs;
Output
+----+---------------------------------------------+------------+
| ID | Tokens | ThirdToken |
+----+---------------------------------------------+------------+
| 1 | cat1,,cat2,,cat3,,cat4,,cat5 | cat3 |
| 2 | dogger1,,dogger2,,dogger3,,dogger4,,dogger5 | dogger3 |
+----+---------------------------------------------+------------+
It's the same as @Yitzhak Khabinsky's answer, but I think it looks clearer:
WITH CTE_Data
AS(
SELECT 'cat1,,cat2,,cat3,,cat4,,cat5' AS [String]
UNION
SELECT 'dogger1,,dogger2,,dogger3,,dogger4,,dogger5' AS [String]
)
SELECT
A.[String]
,Value3 = JSON_VALUE('["'+ REPLACE(A.[String], ',,', '","') + '"]', '$[2]')
FROM CTE_Data AS A
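One thing to watch with the JSON approach: it builds a JSON array by string concatenation, so a token containing a double quote or a backslash would produce invalid JSON. A possible guard, sketched below under the assumption that such characters can occur in the data, is to run the input through STRING_ESCAPE (also available from SQL Server 2016) first. The second sample row is made up for illustration.

```sql
-- Hypothetical sample data; row 2 contains double quotes inside a token.
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, Tokens VARCHAR(500));
INSERT INTO @tbl (Tokens) VALUES
('cat1,,cat2,,cat3,,cat4,,cat5'),
('a,,b,,say "hi",,d');

SELECT ID
     , Tokens
     -- Escape JSON-special characters, then turn ',,' into array separators.
     , JSON_VALUE('["' + REPLACE(STRING_ESCAPE(Tokens, 'json'), ',,', '","') + '"]', '$[2]') AS ThirdToken
FROM @tbl;
```

JSON_VALUE unescapes the value it returns, so the third token of the second row comes back with its original double quotes.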

String_Split on delimiter '.' SQL Server

I have an issue when parsing out a particular field of data, and I'm at a block on how to solve it, so I'm hoping I can gain some insight on how to solve it.
I have a field being brought in, [ItemCategory], that contains instances like...
Instance: TennisShoes.Laces
Instance: HikingBoot-Dr.Marten.Laces
(I cannot change the delimiter from '.' to '|' as I don't control the source)
the code being used to separate the instances is as follows:
SELECT
[Program] = LTRIM(RTRIM(LEFT(c.[ItemCategory], CHARINDEX('.',c.[ItemCategory] + '.') - 1)))
,[Category] = LTRIM(RTRIM(RIGHT(c.[ItemCategory],LEN(c.[ItemCategory]) - CHARINDEX('.',c.[ItemCategory]))))
So my issue is that when the HikingBoot-Dr.Marten.Laces instance passes through the code, it becomes:
[Program] = HikingBoot-Dr
[Category] = Marten.Laces
How would I make it ignore the first '.' and delimit on the second '.', while still maintaining correctness for the first instance?
Thank you for your time.. any advice is helpful.
Give this one a try for grabbing the end.
RIGHT(c.[ItemCategory], CHARINDEX(REVERSE('.'), REVERSE(c.[ItemCategory])) -1)
I would suggest revisiting how you are storing this data, if you can, as it is flawed and will continue to give you challenges.
But that aside, this solution assumes "Category" will not include a period and the data will always end with .category
With a few tweaks to what you had started, we use REVERSE() to determine the length of "Category", which RIGHT() then extracts. For "Program", we subtract that length (plus the period) from the total length and pass the result to LEFT().
DECLARE @testdata TABLE
(
[sampledata] NVARCHAR(100)
);
INSERT INTO @testdata (
[sampledata]
)
VALUES ( N'TennisShoes.Laces' )
, ( N'HikingBoot-Dr.Marten.Laces' );
SELECT LEFT([sampledata], LEN([sampledata]) - CHARINDEX('.', REVERSE([sampledata]))) AS [Program]
,RIGHT([sampledata], CHARINDEX('.', REVERSE([sampledata])) -1) AS [Category]
FROM @testdata;
You can also use SUBSTRING() along with REVERSE().
For category, reverse the data, find the first period, parse the
value and reverse it back.
For Program, reverse the data, go 1 past the first period to the end
and reverse it back.
DECLARE @testdata TABLE
(
[sampledata] NVARCHAR(100)
);
INSERT INTO @testdata (
[sampledata]
)
VALUES ( N'TennisShoes.Laces' )
, ( N'HikingBoot-Dr.Marten.Laces' );
SELECT REVERSE(SUBSTRING(REVERSE([sampledata]), CHARINDEX('.', REVERSE([sampledata])) + 1, LEN([sampledata]))) AS [Program]
, REVERSE(SUBSTRING(REVERSE([sampledata]), 1, CHARINDEX('.', REVERSE([sampledata])) - 1)) AS [Category]
FROM @testdata;
Both give you the following results:
Program Category
--------------------- ----------
TennisShoes Laces
HikingBoot-Dr.Marten Laces
If you need to select only last part after the ., then you can reverse the string, find charindex and do left and right with that position:
with s as (
select 'TennisShoes.Laces' as inst union
select 'HikingBoot-Dr.Marten.Laces' union
select 'Test'
)
, pos as (
select
s.*,
charindex('.', reverse(inst)) as pos
from s
)
select
ltrim(rtrim(left(inst, len(inst) - pos))) as program,
ltrim(rtrim(right(inst, nullif(pos - 1, -1)))) as category
from pos
program | category
:------------------- | :-------
HikingBoot-Dr.Marten | Laces
TennisShoes | Laces
Test | null

Can I replace substrings in a formula stored in a string in SQL?

I need to replace values within a formula stored as a string in SQL.
Example formulas stored in a column:
'=AA+BB/DC'
'=-(AA+CC)'
'=AA/BB+DD'
I have values for AA, BB etc. stored in another table.
Can I find and replace 'AA', 'BB' and so forth from within the formulas with numeric values to actually calculate the formula?
I assume I also need to replace the arithmetic operators ('+' , '/') from strings to actual signs, and if so is there a way to do it?
Desired Result
Assuming: AA = 10, BB = 20, DC = 5
I would need
'=AA+BB/DC' converted to 10+20/5 and a final output of 14
Please note that formulas can change in the future so I would need something resilient to that.
Thank you!
Okay, so this is a real hack, but I was intrigued by your question. You could turn my example into a function and then refactor it to your specific needs.
Note: using TRANSLATE requires SQL Server 2017. This could be a deal-breaker for you right there. TRANSLATE simplifies the replacement process greatly.
This example is just that--an example. A hack. Performance issues are unknown. You still need to do your due diligence with testing.
-- Create a mock-up of the values table/data.
DECLARE @Values TABLE ( [key] VARCHAR(2), [val] INT );
INSERT INTO @Values ( [key], [val] ) VALUES
( 'AA', 10 ), ( 'BB', 20 ), ( 'CC', 6 ), ( 'DC', 5 );
-- Variable passed in to function.
DECLARE @formula VARCHAR(255) = '=(AA+BB)/DC';
-- Remove unnecessary mathematical characters from the formula values.
DECLARE @vals VARCHAR(255) = REPLACE ( TRANSLATE ( @formula, '=()', '___' ), '_', '' );
-- Remove any leading mathematical operators from @vals.
WHILE PATINDEX ( '[A-Z]', LEFT ( @vals, 1 ) ) = 0
SET @vals = SUBSTRING ( @vals, 2, LEN ( @vals ) );
-- Use SQL hack to replace placeholder values with actual values...
SELECT @formula = REPLACE ( @formula, fx.key_val, v.val )
FROM (
SELECT
[value] AS [key_val],
ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) ) AS [key_id]
FROM STRING_SPLIT ( TRANSLATE ( @vals, '+/*-', ',,,,' ), ',' )
) AS fx
INNER JOIN @Values v
ON fx.[key_val] = v.[key]
ORDER BY
fx.[key_id]
-- Return updated formula.
SELECT @formula AS RevisedFormula;
-- Return the result (remove the equals sign).
SET @formula = FORMATMESSAGE ( 'SELECT %s AS FormulaResult;', REPLACE ( @formula, '=', '' ) );
EXEC ( @formula );
SELECT @formula AS RevisedFormula; returns:
+----------------+
| RevisedFormula |
+----------------+
| =(10+20)/5 |
+----------------+
The last part of my example uses EXEC to do the math. You cannot use EXEC in a function.
-- Return the result (remove the equals sign).
SET @formula = FORMATMESSAGE ( 'SELECT %s AS FormulaResult;', REPLACE ( @formula, '=', '' ) );
EXEC ( @formula );
Returns
+---------------+
| FormulaResult |
+---------------+
| 6 |
+---------------+
Changing the formula value to =-(AA+CC) returns:
+----------------+
| RevisedFormula |
+----------------+
| =-(10+6) |
+----------------+
+---------------+
| FormulaResult |
+---------------+
| -16 |
+---------------+
It's worth paying attention to operator precedence in your formulas. Your original example of =AA+BB/DC returns 14, not the 6 that may have been expected. I changed the formula to =(AA+BB)/DC for my example.
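The precedence point can be checked with plain literals. The sketch below uses the values assumed in the question (AA = 10, BB = 20, DC = 5) and adds one extra case, which is my own illustration rather than part of the original formulas: because the values table stores INT, a division that does not come out even is truncated.

```sql
-- Illustrative literals only; AA = 10, BB = 20, DC = 5 as in the question.
SELECT 10 + 20 / 5   AS NoParens      -- division binds tighter: 10 + 4 = 14
     , (10 + 20) / 5 AS WithParens    -- 30 / 5 = 6
     , 10 / 4        AS IntDivision   -- INT / INT truncates to 2
     , 10 / 4.0      AS DecDivision;  -- a decimal operand keeps the fraction
```

If fractional results matter, one option is to store the values as DECIMAL instead of INT so that the substituted literals carry a decimal point.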