SQL Server query to delete text from text column - sql

I have a SQL Server database with a table feedback that contains a text column comment. In that column I have tag data, for example
This is my record <tag>Random characters are here</tag> with information.
How do I write a query to update all of these records to remove the <tag></tag> and all of the text in between?
I'd like to write this to a different 'temporary' table to first verify the changes and then update the original table.
I am running SQL Server 2014 Express.
Thank you

Here is a function to remove tags:
CREATE FUNCTION [dbo].[RemoveTag](@text NVARCHAR(MAX), @tag as nvarchar(max))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    declare @startTagIndex as int
    declare @endTagIndex as int
    -- position of the opening tag, e.g. <tag>
    set @startTagIndex = CHARINDEX('<' + @tag + '>', @text)
    if(@startTagIndex > 0) BEGIN
        -- position of the first closing tag after the opening tag
        set @endTagIndex = CHARINDEX('</' + @tag + '>', @text, @startTagIndex)
        if(@endTagIndex > 0) BEGIN
            -- keep the text before the opening tag and after the closing tag
            return LEFT(@text, @startTagIndex - 1) + RIGHT(@text, len(@text) - len(@tag) - @endTagIndex - 2)
        END
    END
    -- no complete tag pair found: return the input unchanged
    return @text
END
Later you can use it like:
Update table set field = dbo.RemoveTag(field, 'tag')
If you want to write fields to other table then:
CREATE TABLE dbo.OtherTable (
OtherField nvarchar(MAX) NOT NULL
)
GO
INSERT INTO OtherTable (OtherField)
SELECT dbo.RemoveTag(field, 'tag') from table

I'm making a lot of assumptions about the format of your string. But if they're valid then this is very simple:
left(s, charindex('<tag>', s) - 1) +
substring(s, charindex('</tag>', s) + 6, len(s))
Obviously we're basically assuming that the search strings appear only once and in the correct order. There's also an assumption that there will be matches. Also, I used len(s) as an easy upper bound on the number of characters to take from the right. You could just hard-code something appropriate if you felt like it since SQL Server doesn't error for going past the end. s is just a stand in for your char column.
http://sqlfiddle.com/#!3/771a3/8
Not sure if extra whitespace is going to be an issue so you might want to trim and add a space character in the middle.
rtrim(left(s, charindex('<tag>', s) - 1)) + ' ' +
ltrim(substring(s, charindex('</tag>', s) + 6, len(s)))
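The expressions above assume every row contains a matching pair of tags. As a minimal sketch of how a row without tags could be left unchanged, the same positions can be fed to STUFF inside a CASE. The variable names @s, @open and @close are illustrative only, and the sample string is the one from the question:

```sql
-- Sketch only: @s stands in for the comment column.
DECLARE @s NVARCHAR(MAX) = N'This is my record <tag>Random characters are here</tag> with information.';
DECLARE @open INT = CHARINDEX('<tag>', @s);            -- 0 when there is no opening tag
DECLARE @close INT = CHARINDEX('</tag>', @s, @open);   -- first closing tag at or after @open

SELECT CASE
           WHEN @open > 0 AND @close > 0
               -- delete everything from '<' of <tag> through '>' of </tag>
               THEN STUFF(@s, @open, @close + 6 - @open, '')
           ELSE @s                                      -- no complete pair: keep the text as-is
       END AS Cleaned;
```

This still handles only the first tag pair in the string, and it leaves the spaces on both sides of the removed tag in place.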

You can use CHARINDEX to find where your tags start and stop, SUBSTRING to get all text between < and >, and REPLACE to swap out the substring for ''.
Select Field,
Substring(FIELD, charindex('<', Field), CHARINDEX('>', Field,
(CHARINDEX('>', FIELD)) + 1) - charindex('<', Field)+1) as ToRemove,
replace (Field, Substring(FIELD, charindex('<', Field), CHARINDEX('>',
Field, (CHARINDEX('>', FIELD)) + 1) - charindex('<', Field)+1), '')
as FinalResult
from TableName
The output will be three columns, Field, ToRemove and FinalResult, but nothing will actually be updated.
I think the only way this will fail is if you have nested tags. <b><i>sometext</i></b>
To actually make the change:
Update #TableName set Field = replace (Field, Substring(FIELD, charindex('<', Field), CHARINDEX('>', Field, (CHARINDEX('>', FIELD)) + 1) - charindex('<', Field)+1), '')
Tested on SQL Server 2012.

Related

Returning the column value even if the special character isn't present using SUBSTRING & CHARINDEX

I'm using SQL and trying to show all the data in a column before a special character like <.
I've used this SQL:
SUBSTRING(ACTIVITY.Name, 0, CHARINDEX('<', ACTIVITY.Name, 2)) AS ActivityIdentifier
It works absolutely fine when there is a < in the column, but when one isn't present I get no result. I need to be able to return the column value even if the character isn't present.
I looked at RTRIM, LEFT and LEN functions but as my Activity Name can be different lengths, they didn't seem to fit.
I'd appreciate any advice.
I think the simplest fix is to append a '<' to the string:
SUBSTRING(ACTIVITY.Name, 1, CHARINDEX('<', ACTIVITY.Name + '<', 2)) as ActivityIdentifier
If you don't want the '<' in the result, then:
SUBSTRING(ACTIVITY.Name, 1, CHARINDEX('<', ACTIVITY.Name + '<', 2) - 1) as ActivityIdentifier
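To see why appending the '<' works, here is a small sketch that runs the second expression against two made-up values, one with the character and one without. The derived table v and its sample names are assumptions for illustration, not part of the original schema:

```sql
-- Sketch only: v(Name) stands in for ACTIVITY.Name.
SELECT v.Name,
       -- the appended '<' guarantees CHARINDEX always finds a match,
       -- so the length passed to SUBSTRING is never negative
       SUBSTRING(v.Name, 1, CHARINDEX('<', v.Name + '<', 2) - 1) AS ActivityIdentifier
FROM (VALUES ('Alpha<beta>'), ('NoMarker')) AS v(Name);
```

For the first row the match is the real '<', so only the text before it comes back. For the second row the only match is the appended character, one position past the end, so the whole value comes back.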
You could also write a stored function that would handle this - since it's just operating on data passed in, and not doing any "hidden" data access in the background, it should be fairly well behaved in terms of performance.
Try this:
CREATE OR ALTER FUNCTION dbo.TrimSpecialChar
(@Input NVARCHAR(500), @SpecialChar NCHAR(1))
RETURNS NVARCHAR(500)
AS
BEGIN
    DECLARE @Result NVARCHAR(500);
    -- if "special char" is not found - just return input
    DECLARE @SpecCharIx INT = CHARINDEX(@SpecialChar, @Input);
    IF (@SpecCharIx = 0)
        SET @Result = @Input;
    ELSE
        SET @Result = SUBSTRING(@Input, 1, @SpecCharIx-1);
    RETURN @Result;
END
You can then call it like this:
SELECT dbo.TrimSpecialChar(N'Testinput without special characters', N'<')
should return back the whole input, while
SELECT dbo.TrimSpecialChar(N'Testinput with the < special characters', N'<')
would just return
Testinput with the
You can simply add a case expression:
(case when charindex('<', ACTIVITY.Name, 2) > 0
then SUBSTRING(ACTIVITY.Name, 1, CHARINDEX('<', ACTIVITY.Name, 2) - 1)
else ACTIVITY.Name
end) as ActivityIdentifier

SQL SERVER 2008 - Returning a portion of text using SUBSTRING AND CHARINDEX. Need to return all text UNTIL a specific char

I have a column called 'response' that contains lots of data about a person.
I'd like to only return the info after a specific string
But, using the method below I sometimes (when people have <100 IQ) get the | that comes directly after the required number..
I'd like any characters after the 'PersonIQ=' but only before the pipe.
I'm not sure of the best way to achieve this.
Query speed is a concern and my idea of nested CASE is likely not the best solution.
Any advice appreciated. Thanks
substring(response,(charindex('PersonIQ=',response)+9),3)
This is my suggestion:
declare @s varchar(200) = 'aaa=bbb|cc=d|PersonIQ=99|e=f|1=2'
declare @iq varchar(10) = 'PersonIQ='
declare @pipe varchar(1) = '|'
select substring(@s,
charindex(@iq, @s) + len(@iq),
charindex(@pipe, @s, charindex(@iq, @s)) - (charindex(@iq, @s) + len(@iq))
)
Instead of the 3 in your formula, you should calculate the distance between @iq and @pipe with the last part of the formula, charindex(@pipe, @s, charindex(@iq, @s)) - (charindex(@iq, @s) + len(@iq)), which gets the index of the first @pipe after @iq and then subtracts the index where the IQ value starts.
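One case the formula above does not cover is a PersonIQ entry that is the last item in the string, with no pipe after it. A hedged sketch of one way to handle that is to append a pipe before searching, so a terminator is always found. The names @key and @start are illustrative, and the sketch assumes 'PersonIQ=' is present in the string:

```sql
-- Sketch only: the value is the last entry, so no '|' follows it.
DECLARE @s VARCHAR(200) = 'aaa=bbb|cc=d|PersonIQ=99';
DECLARE @key VARCHAR(10) = 'PersonIQ=';
-- first character of the value, right after the key
DECLARE @start INT = CHARINDEX(@key, @s) + LEN(@key);

-- searching in @s + '|' guarantees a pipe at or after @start
SELECT SUBSTRING(@s, @start, CHARINDEX('|', @s + '|', @start) - @start) AS PersonIQ;
```

If the key can be missing from some rows, the expression would need an extra CASE on CHARINDEX(@key, @s) > 0, since @start is meaningless in that case.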
Assuming there's always a pipe, you could do this:
substring(stuff(response,1,charindex('PersonIQ=',response)-1,''),1,charindex('|',stuff(response,1,charindex('PersonIQ=',response)-1,''))-1)
Or, you could convert your string to xml and reference PersonIQ directly, e.g.:
--assuming your string looks something like this..
declare @s varchar(max) = 'asdaf=xxx|PersonIQ=100|xxx=yyy'
select convert(xml, '<x ' + replace(replace(@s, '=', '='''), '|', ''' ') + '''/>').value('(/x/@PersonIQ)[1]','int')

Substring with conditional statement

tl;dr
I don't understand how to conditionally change the length parameter of SUBSTRING(..)
Short enough, did read
I've got a text field in a sql table that I want to retrieve a substring from
There is a specific part of text I am having trouble retrieving a substring from, because I cannot guarantee the next string.
For example, I have:
... Tracking Code : /a/delimited/string AttributeW : ValueW ...
And
... Tracking Code : /a/different/delimited/string A random string ...
From both of those I want /a/delimited/string and /a/different/delimited/string respectively.
My current sql looks something like:
DECLARE @TrackingStartStr VARCHAR(50), @TrackingEndStr VARCHAR(50)
SET @TrackingStartStr = 'Tracking Code :'
SET @TrackingEndStr = 'Some string that indicates the text is about to end'
SELECT
AField
,RTRIM(LTRIM(Substring(CAST([Body] AS VARCHAR(MAX))
,Charindex(@TrackingStartStr,CAST([Body] AS VARCHAR(MAX))) + LEN(@TrackingStartStr)
,charindex(@TrackingEndStr,CAST([Body] AS VARCHAR(MAX))) - (Charindex(@TrackingStartStr,CAST([Body] AS VARCHAR(MAX))) + LEN(@TrackingStartStr))
))) AS TrackingCode
From tbl_stupidTextTable
I don't know how to conditionally change what @TrackingEndStr is for each row.
Try:
select substring(stringfield,
charindex('/', stringfield, 1),
charindex(' ',
stringfield,
charindex('/', stringfield, 1)) -
charindex('/', stringfield, 1)) as val
from tbl
SQL Fiddle demo: http://sqlfiddle.com/#!6/cebab/16/0

Extracting text between two characters in SQL

I am trying to extract the text between two characters using T-SQL. I have been able to write a query that pulls information close to what I want, but for some reason I am not getting what I am expecting (surprise, surprise). I could really use a little help refining it. I am trying to extract the part of the table name that is located between two [ ]. An example of the column data is as follows (this is a table that records all changes made to the database, so the column text is basically SQL statements):
ALTER TABLE [TABLENAME].[MYTABLE] ADD
[VIP_CUSTOMER] [int] NULL
I am trying to extract part of the table name, in this example I just want 'MYTABLE'
Right now I am using:
select SUBSTRING(db.Event_Text, CHARINDEX('.', db.Event_Text) + 2, (CHARINDEX(']', db.Event_Text)) - CHARINDEX('', db.Event_Text) + Len(']')) as OBJName
FROM DBA_AUDIT_EVENT DB
WHERE DATABASE_NAME = 'XYZ'
But when I use this, I don't always get the results needed. Sometimes I get 'MYTABLE] ADD', sometimes I get the part of the name I want, and sometimes, depending on the length of the table name, I only get the first part of the name with the end cut off. Is there any way to get this right, or is there a better way of writing it? Any help would be greatly appreciated. Thanks in advance.
Long, but here's a formula using the brackets:
Declare @text varchar(200);
Select @text='ALTER TABLE [TABLENAME].[MYTABLE] ADD [VIP_CUSTOMER] [int] NULL';
Select SUBSTRING(@text,
CHARINDEX('[', @text, CHARINDEX('[', @text) + 1 ) +1,
CHARINDEX(']', @text, CHARINDEX('[', @text, CHARINDEX('[', @text) + 1 ) ) -
CHARINDEX('[', @text, CHARINDEX('[', @text) + 1 ) - 1 );
Replace @text with your column name.
Give this a shot:
select SUBSTRING(db.Event_Text, CHARINDEX('.', db.Event_Text) + 2
, CHARINDEX(']', db.Event_Text, CHARINDEX('.', db.Event_Text)) - CHARINDEX('.', db.Event_Text) - 2) as OBJName
FROM DBA_AUDIT_EVENT DB
WHERE DATABASE_NAME = 'XYZ'
This is a pretty ugly way to get the length, but I've used something like this before:
select SUBSTRING(db.Event_Text,
CHARINDEX('.', db.Event_Text) + 2,
charindex('] ADD',db.Event_Text) - CHARINDEX('.',db.Event_Text)-2)
Give it a try, it may work for you.

Search and Replace Serialized DB Dump

I am moving a database from one server to another and have lots of serialized data in there. So, I am wondering:
Is it possible to use regex to replace all occurrences like the following (and similar)
s:22:\"http://somedomain.com/\"
s:26:\"http://somedomain.com/abc/\"
s:29:\"http://somedomain.com/abcdef/\"
to
s:27:\"http://someOtherdomain.com/\"
s:31:\"http://someOtherdomain.com/abc/\"
s:34:\"http://someOtherdomain.com/abcdef/\"
If the column that holds this data always has the same layout, with the length numbers (22, 26, 29, ...) at the same position from the beginning of the string, then in SQL Server you can use REPLACE and SUBSTRING with CHARINDEX to do that:
DECLARE @s VARCHAR(50);
DECLARE @sub INT;
SET @s = 's:22:\"http://somedomain.com/\"';
SET @sub = CONVERT(INT, SUBSTRING(@s, CHARINDEX(':', @s) + 1, 2));
SELECT REPLACE(REPLACE(@s, 'somedomain', 'someOtherdomain'), @sub, @sub + 5);
So s:number:\"http://somedomain.com/\" will become s:number + 5:\"http://someOtherdomain.com/\".
If you want to run an UPDATE against that table you can write it this way:
UPDATE #t
SET s = REPLACE(REPLACE(s, 'somedomain', 'someOtherdomain'),
CONVERT(INT, SUBSTRING(s, CHARINDEX(':', s) + 1, 2)),
CONVERT(INT, SUBSTRING(s, CHARINDEX(':', s) + 1, 2)) + 5);
This query searches for occurrences of somedomain and replaces them with someOtherdomain, then takes the number between the first two colons, converts it to INT, and replaces it with the same number + 5. Your data should look like this after you run the previous query:
s:27:\"http://someOtherdomain.com/\"
s:31:\"http://someOtherdomain.com/abc/\"
s:34:\"http://someOtherdomain.com/abcdef/\"
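One caveat with the REPLACE-based approach is that it swaps every occurrence of the two-digit number, including any that happen to appear inside the URL itself. A minimal sketch of a position-based alternative uses STUFF to rewrite only the digits between the first two colons. The names @first, @second and @len are illustrative, and the sketch assumes the domain appears once per string, so the length grows by exactly 5:

```sql
-- Sketch only: rewrite just the length prefix, then swap the domain.
DECLARE @s VARCHAR(100) = 's:22:\"http://somedomain.com/\"';
DECLARE @first INT = CHARINDEX(':', @s);                 -- colon before the length
DECLARE @second INT = CHARINDEX(':', @s, @first + 1);    -- colon after the length
DECLARE @len INT = CONVERT(INT, SUBSTRING(@s, @first + 1, @second - @first - 1));

SELECT REPLACE(
           -- overwrite only the digits between the two colons
           STUFF(@s, @first + 1, @second - @first - 1, CONVERT(VARCHAR(10), @len + 5)),
           'somedomain', 'someOtherdomain') AS Fixed;
```

Because the digit span is measured between the colons, this also copes with length values that are not exactly two digits long.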