How to remove links from text with SQL - sql

I need to clean up a database by removing links from tables. So for a column entry like this:
Thank you for the important information<br />Read More Here<br /> This is great.
I need to remove the entire link, so it would end up like this:
Thank you for the important information<br /><br /> This is great.
Is there a way to do this with a single UPDATE statement?
For extra credit, is there a way to remove the HTML semantics from the link, while leaving the content in the text?

Just find the start and end of the link (from '<a href=' to '</a>') and replace that substring with a single space.
declare @StringToFix varchar(500)
set @StringToFix = 'Thank you for the important information<br /><a href="http://www.cnn.com">Read More Here</a><br /> This is great.'
select REPLACE(
@StringToFix
, Substring(@StringToFix
, CHARINDEX('<a href=', @StringToFix) -- Starting point
-- Length: end point - starting point, plus 4 characters for '</a>' itself
, CHARINDEX('</a>', @StringToFix)
- CHARINDEX('<a href=', @StringToFix) + 4 )
, ' '
) as ResultField

If all the links are written in a very consistent way, then you can just use a regex replace of
'\<a href.*?\</a\>'
to an empty string.
I don't have a SQL Server instance handy, but the query in Oracle would look something like:
update table
set col1 = REGEXP_REPLACE(col1,'\<a href.*?\</a\>', '', 1, 0, 'in');
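To check what that pattern does before running an UPDATE, the same non-greedy match can be tried outside the database. This is only a sketch in Python, not the Oracle call itself; the function name `remove_links` is made up for illustration, and the sample string reuses the one from the first answer.

```python
import re

# Non-greedy match from "<a href" up to the first closing "</a>".
# IGNORECASE and DOTALL play the role of the 'i' and 'n' match flags above.
LINK = re.compile(r"<a href.*?</a>", re.IGNORECASE | re.DOTALL)

def remove_links(text: str) -> str:
    """Delete every <a href ...>...</a> span, including the link text."""
    return LINK.sub("", text)

sample = ('Thank you for the important information<br />'
          '<a href="http://www.cnn.com">Read More Here</a><br /> This is great.')
print(remove_links(sample))
```

Because the match is non-greedy, two links in the same column value are removed separately and the text between them is kept.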

I want to share my SQL script that removes the <a href> tags from the text but leaves the anchor text.
Source text:
Visit Google, then Bing
Result text:
Visit Google, then Bing
MS SQL CODE:
declare @str nvarchar(max) = 'Visit Google, then Bing'
declare @aStart int = charindex('<a ', @str)
declare @aStartTagEnd int = charindex('>', @str, @aStart)
DECLARE @result nvarchar(max) = @str;
set @result = replace(@result, '</a>', '')
select @result
WHILE (@aStart > 0 and @aStartTagEnd > 0)
BEGIN
declare @rep1 nvarchar(max) = substring(@result, @aStart, @aStartTagEnd + 1 - @aStart)
set @result = replace(@result, @rep1, '')
set @aStart = charindex('<a ', @result)
set @aStartTagEnd = charindex('>', @result, @aStart)
END
select @result
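The idea of the script (drop the opening and closing anchor tags, keep what is between them) can be sketched in a few lines of Python for comparison. The URLs in the sample and the name `unlink` are hypothetical, chosen only for this illustration.

```python
import re

# Matches an opening <a ...> tag or a closing </a> tag, but not e.g. <abbr>.
ANCHOR_TAG = re.compile(r"</?a\b[^>]*>", re.IGNORECASE)

def unlink(text: str) -> str:
    """Remove anchor tags while leaving the anchor text in place."""
    return ANCHOR_TAG.sub("", text)

# Hypothetical sample; the URLs are placeholders.
sample = ('Visit <a href="https://example.com/one">Google</a>, '
          'then <a href="https://example.com/two">Bing</a>')
print(unlink(sample))
```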

Related

Replace the multiple values between 2 characters in azure sql

In Azure SQL, I'm attempting to delete any text that is present between the < and > characters to my column in my table
Sample text:
The best part is that. < br >Note:< br >< u> reading
:< /u> < span style="font-family: calibri,sans-serif; font-size: 11pt;"> moral stories from an early age
< b>not only helps your child.< /b>< br>< u>in
learning important: < /u>< /span>< span style="font-family: calibri;
">life lessons but it also helps, in language development.< /span>< ./span>
Output:
The best part is that. reading: moral stories from an early age not only helps your child in learning important: life lessons but it also helps in language development.
I tried the query below, but it only works for short comment text:
SELECT [Comments],REPLACE([Comments], SUBSTRING([Comments], CHARINDEX('<', [Comments]), CHARINDEX('>', [Comments]) - CHARINDEX('<', [Comments]) + 1),'') AS result
FROM table
I used an input table named check_1 and inserted the sample data into it.
This query removes only the first tag pattern it finds.
SELECT [Comments],REPLACE([Comments], SUBSTRING([Comments], CHARINDEX('<', [Comments]), CHARINDEX('>', [Comments]) - CHARINDEX('<', [Comments]) + 1),'') AS result
FROM check_1
In order to remove all string patterns beginning with '<' and ending with '>' in the text, a user defined function with a while loop is created.
CREATE FUNCTION [dbo].[udf_removetags] (@input_text VARCHAR(MAX)) RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @pos_1 INT
DECLARE @pos_n INT
DECLARE @Length INT
SET @pos_1 = CHARINDEX('<',@input_text)
SET @pos_n = CHARINDEX('>',@input_text,CHARINDEX('<',@input_text))
SET @Length = (@pos_n - @pos_1) + 1
WHILE @pos_1 > 0 AND @pos_n > 0 AND @Length > 0
BEGIN
SET @input_text = replace(@input_text,substring(@input_text,@pos_1,@Length),'')
SET @pos_1 = CHARINDEX('<',@input_text)
SET @pos_n = CHARINDEX('>',@input_text,CHARINDEX('<',@input_text))
SET @Length = (@pos_n - @pos_1) + 1
END
RETURN @input_text
END
select [dbo].[udf_removetags](comments) as result from check_1
Output String:
The best part is that. Note: reading : moral stories from an early age not only helps your child.in learning important: life lessons but it also helps, in language development.
You can also use STUFF (see the Microsoft documentation on STUFF) in place of the REPLACE + SUBSTRING combination.
In the user-defined function, replace the line
SET @input_text = replace(@input_text,substring(@input_text,@pos_1,@Length),'')
with the line
SET @input_text = STUFF(@input_text,@pos_1,@Length,'')
The result will be the same.
According to https://learn.microsoft.com/../azure/../regexp_replace Azure supports REGEXP_REPLACE.
This means it should be possible to replace all '<...>' by '' via
select regexp_replace(comments, '<[^>]*>', '') from mytable;
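The loop-based function and the regular expression should agree on well-formed input. A quick way to convince yourself is to run both ideas side by side outside the database; the following Python sketch does that on part of the sample text. The function names are made up for this illustration and it is not a replacement for testing on the real column.

```python
import re

def strip_tags_loop(text: str) -> str:
    """Mirror of the WHILE loop: cut from the first '<' to the next '>'."""
    start = text.find("<")
    while start != -1:
        end = text.find(">", start)
        if end == -1:
            break  # unmatched '<': leave the remainder untouched
        text = text[:start] + text[end + 1:]
        start = text.find("<")
    return text

def strip_tags_regex(text: str) -> str:
    """Same effect with the pattern used in the REGEXP_REPLACE call."""
    return re.sub(r"<[^>]*>", "", text)

sample = "The best part is that. < br >Note:< br >< u> reading :< /u>"
print(strip_tags_loop(sample))
print(strip_tags_regex(sample))
```

Both variations leave a lone `<` without a closing `>` in place, which matches how the T-SQL loop stops when CHARINDEX cannot find a closing bracket.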

SQL Server: concatenating or replacing, which one is better (faster)?

I have to generate a very long procedure every time for a reporting system, so I created a template for the procedure and replace the parts that need to change. I could also build it with CONCAT or + instead.
For example:
set @query = '... and (
--@InnerQueries
)'
set @query = replace(@query,'--@InnerQueries',@otherValues)
vs
set @query += ' and exists (...)'
if(@xxx is not null)
set @query += 'and not exists (...)'
The REPLACE approach is more readable and maintainable for me, but for the sake of optimization, what about CONCAT and appending the strings together?
With REPLACE there is a lot of searching but less string creation,
and with CONCAT there is a lot of string creation but no searching.
So, any ideas?
I assume you're talking about using CONCAT or REPLACE to build a SQL statement and then run it. If you'll ultimately process fewer than 100 replacements, I'd go with REPLACE rather than CONCAT because it's more readable.
If, however, you're talking about using CONCAT/REPLACE to create report output data, and you will e.g. be carrying out 100 REPLACE operations per row on a million rows, I'd go the CONCAT route.
Update 2:
There could be something missing here.
If I change the first variable, @sourceText_Replace,
to a value of 8000 characters and keep appending to it:
set @sourceText_Replace += '8000 character length'
set @sourceText_Replace += @sourceText_Replace
set @sourceText_Replace += @sourceText_Replace
set @sourceText_Replace += @sourceText_Replace
set @sourceText_Replace += @sourceText_Replace
set @sourceText_Replace += @sourceText_Replace
set @sourceText_Replace += @sourceText_Replace
it works fine, even up to a length of 16384017 characters.
So any idea here is as good as mine.
Original answer:
To summarize (if I didn't make any mistakes):
If you are searching in a long text, don't even think about using REPLACE: it took seconds, not milliseconds. For CONCAT, the text length obviously makes no difference.
In the code below, in the first try (small text) I just used the variables' default values and did not append to them,
but in the second try (long text) I appended the result from the previous loop run.
For the long text I did not bother to run the loop more than 20 times, because it took minutes.
smallText: set @destSmallText_Replace =
longText: set @destSmallText_Replace +=
Here is the code for the test:
SET NOCOUNT ON
drop table if exists #tempReplace
drop table if exists #tempConcat
create table #tempReplace
(
[txt] nvarchar(max) not null
)
create table #tempConcat
(
[txt] nvarchar(max) not null
)
declare @sourceText_Replace nvarchar(max) = 'small1 text to replace @textToBeReplaced after param text'
declare @text_Replace nvarchar(max) = @sourceText_Replace
declare @textToSearch nvarchar(max) = '@textToBeReplaced'
declare @textToReplace nvarchar(max) = 'textToBeReplaced'
declare @concat_Start nvarchar(max) = 'small1 text to replace'
declare @concat_End nvarchar(max) = 'after param text'
declare @text_Concat nvarchar(max) = @concat_Start
declare @whileCounter int =0
declare @maxCounter int = 5
declare @startTime datetime = getdate();
declare @endTime datetime = getdate();
begin
set @startTime = getDate();
while(@whileCounter <=@maxCounter)
begin
--long text
set @text_Replace += replace(@sourceText_Replace,@textToSearch,@textToReplace + convert(nvarchar(10), @whileCounter)) + @textToSearch
--small text
--set @text_Replace = replace(@sourceText_Replace,@textToSearch,@textToReplace + convert(nvarchar(10), @whileCounter)) + @textToSearch
--print @destSmallText_Replace
insert into #tempReplace values(@text_Replace)
set @whileCounter+=1
end
set @endTime = getDate();
-- DATEDIFF gives the full elapsed time; subtracting DATEPART(millisecond, ...) values compares only the fractional second
print 'passedTime ' + Convert(nvarchar(20), DATEDIFF(millisecond, @startTime, @endTime))
end
begin
set @whileCounter = 0;
set @startTime = getDate();
while(@whileCounter <=@maxCounter)
begin
set @text_Concat += concat(@concat_Start,@textToReplace + convert(nvarchar(10), @whileCounter),@concat_End) + @textToSearch
--print @sourceSmallText_Concat
insert into #tempConcat values(@text_Concat)
set @whileCounter+=1
end
set @endTime = getDate();
print 'passedTime ' + Convert(nvarchar(20), DATEDIFF(millisecond, @startTime, @endTime))
end
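One thing to watch when timing runs like this: elapsed time has to come from the full difference between the two timestamps, not from the difference of their millisecond components. The millisecond component wraps around every second, so comparing only that part can even give a negative number. A small Python sketch with made-up timestamps shows the gap between the two calculations.

```python
from datetime import datetime, timedelta

# Hypothetical start and end times, 1.2 seconds apart.
start = datetime(2019, 1, 22, 10, 0, 0, 900000)
end = datetime(2019, 1, 22, 10, 0, 2, 100000)

def ms_part(t: datetime) -> int:
    """Millisecond component only, like DATEPART(millisecond, ...)."""
    return t.microsecond // 1000

part_diff = ms_part(end) - ms_part(start)            # compares fractions of a second only
elapsed_ms = (end - start) // timedelta(milliseconds=1)  # full elapsed time, like DATEDIFF

print(part_diff, elapsed_ms)
```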

Sparx EA Heatmap: Combine none or multiple results from a selects subquery into a single comma-separated value

I'm using Sparx EA 14.x with a file-based repository and will move to a SQL Server based one soon. I'm currently creating a base template-level model, to be used later with real customer data in the SQL Server based repository.
I have created Tagged Values (type=RefGUIDList), e.g. for adding relations to existing business processes in my data elements. Existing business processes can be selected from a list, and their .ea_guid is stored as the value of the tagged value.
I have created a HeatMap chart with the SQL below attached.
The SQL works fine if the tagged value has only one business process selected; the problem is that if I add more processes, there are no results.
SELECT (SELECT t_object.Name FROM t_object
WHERE t_object.ea_guid = tv.Value) AS Series,
t_object.Alias AS GroupName, Packages.Name
FROM t_object,
t_package RootPackage,
t_package Packages,
t_objectproperties tv
WHERE RootPackage.Name = 'Data elements' AND
Packages.Parent_ID = RootPackage.Package_ID AND
t_object.Package_ID = Packages.Package_ID AND
t_object.Object_ID = tv.Object_ID AND
tv.Property = 'APM:Prosesses'
One solution I have been looking at would be to concatenate the names of the listed business processes and show the result.
I'm aware that the SQL Server dialect differs from that of the current Access-based repository.
The problem was that several ea_guid values were stored in a single t_objectproperties.Value, and I needed the corresponding t_object.Name values as a comma-separated list.
What worked in my case was to first move the repository to SQL Server and then create a function like the one below:
CREATE FUNCTION fnSplitString
(
@string NVARCHAR(1000),
@delimiter CHAR(1)
)
RETURNS VARCHAR(1000) AS
BEGIN
DECLARE @csvObjectname VARCHAR(1000)
DECLARE @start INT, @end INT
SELECT @start = 1, @end = CHARINDEX(@delimiter, @string)
WHILE @start < LEN(@string) + 1 BEGIN
IF @end = 0
SET @end = LEN(@string) + 1
SELECT @csvObjectname = COALESCE(@csvObjectname + ', ', '') +
COALESCE(t_Object.Name,'')
FROM t_Object
WHERE t_Object.ea_guid = SUBSTRING(@string, @start, @end - @start)
SET @start = @end + 1
SET @end = CHARINDEX(@delimiter, @string, @start)
END
RETURN @csvObjectname
END
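What the function does can be summarized as: split the stored value on the delimiter, look up each GUID, and join the names that were found. A rough Python sketch of that logic follows; the dictionary stands in for t_object, and the GUIDs, process names and the function name are all hypothetical.

```python
def names_for_guid_list(value: str, names_by_guid: dict, delimiter: str = ",") -> str:
    """Look up each GUID in a delimited list and join the names found."""
    names = [names_by_guid[guid]
             for guid in value.split(delimiter)
             if guid in names_by_guid]  # unknown GUIDs are skipped, as with the WHERE clause
    return ", ".join(names)

# Hypothetical data standing in for t_object (ea_guid -> Name).
processes = {"{GUID-1}": "Order handling", "{GUID-2}": "Invoicing"}
print(names_for_guid_list("{GUID-1},{GUID-2}", processes))
```

One difference from the T-SQL version: when nothing matches, this sketch returns an empty string, whereas the function above returns NULL.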

Error Handling for numbers of delimiters when extracting substrings

Situation: I have a column where each cell can have up to 5 delimiters. However, it's possible that there are none.
Objective: How do I handle errors such as:
Invalid length parameter passed to the LEFT or SUBSTRING function.
in the case that it cannot find the specified delimiter.
Query:
declare @text VARCHAR(111) = 'abc-def-geeee-ifjf-zzz'
declare @start1 as int
declare @start2 as int
declare @start3 as int
declare @start4 as int
declare @start_index_reverse as int
set @start1 = CHARINDEX('-',@text,1)
set @start2 = CHARINDEX('-',@text,charindex('-',@text,1)+1)
set @start3 = CHARINDEX('-',@text,charindex('-',@text,CHARINDEX('-',@text,1)+1)+1)
set @start4 = CHARINDEX('-',@text,charindex('-',@text,CHARINDEX('-',@text,CHARINDEX('-',@text,1)+1)+1)+1)
set @start_index_reverse = CHARINDEX('-',REVERSE(@text),1)
select
LEFT(@text,@start1-1) AS Frst,
SUBSTRING(@text,@start1+1,@start2-@start1-1) AS Scnd,
SUBSTRING(@text,@start2+1,@start3-@start2-1) AS Third,
SUBSTRING(@text,@start3+1,@start4-@start3-1) AS Fourth,
RIGHT(@text,@start_index_reverse-1) AS Lst
In this case my variable contains all four delimiters, so the query works, but if I removed one '-' it would break.
XML support in SQL Server brings about some unintentional but useful tricks. Converting this string to XML allows for some parsing that is far less messy than native string handling, which is very far from awesome.
DECLARE @test varchar(111) = 'abc-def-ghi-jkl-mnop'; -- try also with 'abc-def'
;WITH n(x) AS
(
SELECT CONVERT(xml, '<x>' + REPLACE(@test, '-', '</x><x>') + '</x>')
)
SELECT
Frst = x.value('/x[1]','varchar(111)'),
Scnd = x.value('/x[2]','varchar(111)'),
Thrd = x.value('/x[3]','varchar(111)'),
Frth = x.value('/x[4]','varchar(111)'),
Ffth = x.value('/x[5]','varchar(111)')
FROM n;
For a table it's almost identical:
DECLARE @foo TABLE ( col varchar(111) );
INSERT @foo(col) VALUES('abc-def-ghi-jkl-mnop'),('abc'),('def-ghi');
;WITH n(x) AS
(
SELECT CONVERT(xml, '<x>' + REPLACE(col, '-', '</x><x>') + '</x>')
FROM @foo
)
SELECT
Frst = x.value('/x[1]','varchar(111)'),
Scnd = x.value('/x[2]','varchar(111)'),
Thrd = x.value('/x[3]','varchar(111)'),
Frth = x.value('/x[4]','varchar(111)'),
Ffth = x.value('/x[5]','varchar(111)')
FROM n;
Add a test before your last SELECT,
then decide how to handle the other case (when one of the start positions is 0).
You can also look at loop-based approaches to splitting a string in SQL Server,
which can handle any number of delimiters.
if @start1>0 and @start2>0 and @start3>0 and @start4>0
select LEFT(@text,@start1-1) AS Frst,
SUBSTRING(@text,@start1+1,@start2-@start1-1) AS Scnd,
SUBSTRING(@text,@start2+1,@start3-@start2-1) AS Third,
SUBSTRING(@text,@start3+1,@start4-@start3-1) AS Fourth,
RIGHT(@text,@start_index_reverse-1) AS Lst
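The behaviour both answers are aiming for is "split on the delimiter and pad the missing parts with NULL instead of failing". As a point of comparison, here is that behaviour as a short Python sketch; the name `split_fixed` and the default of five parts are assumptions taken from the example in the question.

```python
def split_fixed(text: str, delimiter: str = "-", parts: int = 5) -> list:
    """Split text and pad with None so the result always has `parts` items."""
    pieces = text.split(delimiter)[:parts]   # ignore anything beyond the last wanted part
    return pieces + [None] * (parts - len(pieces))

for value in ("abc-def-ghi-jkl-mnop", "abc", "def-ghi"):
    print(split_fixed(value))
```

A value with no delimiter at all simply yields the whole string as the first part, which is the case that makes the SUBSTRING version fail with an invalid length.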

How can I write this SQL while loop code to get an XML results in one line instead of 3 separate lines?

I'm trying to get all this XML result in one line instead of 3 for each column
DECLARE @ii INT = 10;
DECLARE @String1 NVARCHAR(4000);
SET @String1 = '';
WHILE(@ii <= 18)
BEGIN
SET @String1 = (@String1 + 'SELECT LoanNumber = ''Complaint'+CONVERT(VARCHAR(2),@ii)+'-Call1'' , LoanStatus=''Compliants'' , LoanStatusDate = CAST(GETDATE() AS DATE)
UNION
SELECT LoanNumber = ''Complaint'+CONVERT(VARCHAR(2),@ii)+'-Call2'', LoanStatus=''Compliants'' , LoanStatusDate = CAST(GETDATE() AS DATE)
UNION
SELECT LoanNumber = ''Complaint'+CONVERT(VARCHAR(2),@ii)+'-Call3'', LoanStatus=''Compliants'' , LoanStatusDate = CAST(GETDATE() AS DATE)')
IF @ii != 18
SET @String1 = @String1 + ' UNION '
ELSE
SET @String1 = @String1 + 'FOR XML PATH (''Loan''),ROOT(''Loans'') '
SET @ii = @ii+1
END
EXEC sp_executesql @String1
I want something like this:
<Loans>
<LoanNumber>Complaint10-Call1<LoanStatus>Compliants<LoanStatusDate>2019-01-18
</Loan>
<Loan>
<LoanNumber>Complaint10-Call2 <LoanStatus>Compliants<LoanStatusDate>2019-01-18
</Loan>
<Loan>
<LoanNumber>Complaint10-Call3<LoanStatus>Compliants<LoanStatusDate>2019-01-18
</Loan>
Instead of the result that you get when you execute the code I provided. I appreciate your help.
This might be a wild guess, but I have the feeling that I understand what this is about:
if you run the code you will see the result. no input data is needed .
I just want the structure of the xml outcome to all be on one line for
one set of each loop
Your provided code leads to this:
<Loans>
<Loan>
<LoanNumber>Complaint10-Call1</LoanNumber>
<LoanStatus>Compliants</LoanStatus>
<LoanStatusDate>2019-01-22</LoanStatusDate>
</Loan>
<Loan>
<LoanNumber>Complaint10-Call2</LoanNumber>
<LoanStatus>Compliants</LoanStatus>
<LoanStatusDate>2019-01-22</LoanStatusDate>
</Loan>
<!-- more of them-->
</Loans>
This is perfectly okay, valid XML.
But you want the result
outcome to all be on one line for one set of each loop
Something like this?
<Loans>
<Loan>
<LoanNumber>Complaint10-Call1</LoanNumber><LoanStatus>Compliants</LoanStatus><LoanStatusDate>2019-01-22</LoanStatusDate>
</Loan>
<!-- more of them-->
</Loans>
There is a big misconception I think... XML is not the thing you see. The same XML can look quite differently, without any semantic difference:
Check this out:
DECLARE @xmltable table(SomeXml XML)
INSERT INTO @xmltable VALUES
--the whole in one line
('<root><a>test</a><a>test2</a></root>')
--all <a>s in one line
,('<root>
<a>test</a><a>test2</a>
</root>')
--each element in one line
,('<root>
<a>test</a>
<a>test2</a>
</root>')
--white space going wild...
,('<root>
<a>test</a>
<a>test2</a>
</root>');
--now check the results
SELECT * FROM @xmltable;
This means: How the XML appears is a matter of the interpreter. The same XML opened with another tool might appear differently. Dealing with XML means dealing with data but not with format... The actual format has no meaning and should not matter at all...
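The same point can be checked with any XML parser. As a small sketch outside SQL Server, the Python snippet below parses a one-line and a pretty-printed version of the same document and compares the elements they contain; the helper name `element_values` is made up for this example.

```python
import xml.etree.ElementTree as ET

compact = "<root><a>test</a><a>test2</a></root>"
pretty = """<root>
  <a>test</a>
  <a>test2</a>
</root>"""

def element_values(xml_text: str) -> list:
    """Return (tag, text) for each child of the root element."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]

# Line breaks and indentation sit between the elements, not inside them,
# so both documents carry the same data.
print(element_values(compact) == element_values(pretty))
```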
Starting with SQL-Server 2016 you might have a look at JSON, if you need a tiny format:
DECLARE @somedata table(SomeValue VARCHAR(100),SomeStatus VARCHAR(100),SomeDate DATE);
INSERT INTO @somedata VALUES
('Complaint10-Call1','Complaints','2019-01-22')
,('Complaint10-Call2','Complaints','2019-01-22')
,('Complaint10-Call3','Complaints','2019-01-22');
SELECT * FROM @somedata FOR JSON PATH;
The result comes in one line:
[{"SomeValue":"Complaint10-Call1","SomeStatus":"Complaints","SomeDate":"2019-01-22"},{"SomeValue":"Complaint10-Call2","SomeStatus":"Complaints","SomeDate":"2019-01-22"},{"SomeValue":"Complaint10-Call3","SomeStatus":"Complaints","SomeDate":"2019-01-22"}]