Replace() on a field with line breaks in it? - sql-server-2005

So I have a field that's basically storing an entire XML file per row, complete with line breaks, and I need to remove some text from close to three hundred rows. The replace() function doesn't find the offending text no matter what I do, and all I can find by searching is a bunch of people trying to remove the line breaks themselves. I don't see any reason that replace() wouldn't work, so I must be formatting it wrong somehow. Help?
Edit: Here's an example of what I mean in broad terms:
<script>...</script><dependencies>...</dependencies><bunch of other stuff></bunch of other stuff><labels><label description="Field2" languagecode="1033" /></labels><events><event name="onchange" application="false" active="true"><script><![field2.DataValue = (some equation);
</script><dependencies /></event></events><a bunch more stuff></a bunch more stuff>
I need to just remove everything between the events tags. So my sql code is this:
replace(fieldname, '<events><event name="onchange" application="false" active="true"><script><![field2.DataValue = (some equation);
</script><dependencies /></event></events>', '')
I've tried it like that, and I've tried it all on one line, and I've tried using char(10) where the line breaks are supposed to be, and nothing.

Nathan's answer was close. Since this question is the first thing that came up from a search I wanted to add a solution for my problem.
select replace(field,CHAR(13)+CHAR(10),' ')
I replaced the line break with a space in case there was no break. It may be that you want to always replace it with nothing, in which case '' should be used instead of ' '.
Hope this helps someone else and they don't have to click the second link in the results from the search engine.

This worked for me on SQL Server 2012:
UPDATE YourTable
SET YourCol = REPLACE(YourCol, CHAR(13) + CHAR(10), '')

If your column is an xml typed column, you can use the delete method on the column to remove the events nodes. See http://msdn.microsoft.com/en-us/library/ms190254(v=SQL.90).aspx for more info.

Try two simple tests.
Try the replace on an XML string that has no double quotes (or single quotes) but does have CRLFs. Does it work? If yes, you need to escape the quote marks.
Try the replace on an XML string that has no CRLFs. Does it work? If yes, use two nested replace() calls: an inner one for the CRLFs only, then an outer one for the string in question.
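The nested-replace idea from the second test can be sketched outside the database. The Python snippet below is only an illustration: the sample strings and the helper name `remove_fragment` are made up for this example. It normalizes line endings in both the stored text and the search text before removing the fragment, so a CRLF/LF mismatch cannot hide the match.

```python
def remove_fragment(stored: str, fragment: str) -> str:
    """Remove `fragment` from `stored`, ignoring CRLF vs LF differences."""
    # Inner step: reduce every CRLF to a bare LF in both strings.
    stored_lf = stored.replace("\r\n", "\n")
    fragment_lf = fragment.replace("\r\n", "\n")
    # Outer step: remove the fragment from the normalized text.
    return stored_lf.replace(fragment_lf, "")


# Hypothetical sample data: the stored value uses CRLF, the search text uses LF.
stored = "<labels /><events><script>x = 1;\r\n</script></events><more />"
fragment = "<events><script>x = 1;\n</script></events>"

# A plain replace finds nothing because the line endings differ.
print(stored.replace(fragment, "") == stored)
print(remove_fragment(stored, fragment))
```

Note that this sketch leaves the remaining text with LF line endings; if the stored format must stay CRLF, convert back afterwards.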

A lot of people do not remember that a Windows line break is two characters
(Char 13 \r, followed by Char 10 \n).
Replace both, in that order, and you should be good.
SELECT
REPLACE(field, CHAR(13)+CHAR(10), '')
FROM Blah..
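The order of the two characters matters, because a replace function looks for the exact sequence. A small Python illustration, assuming a Windows-style line break (the sample text is invented):

```python
text = "line one\r\nline two"  # Windows line break: CR (13) then LF (10)

# Code points of the line break, in the order they are stored.
print([ord(c) for c in "\r\n"])

wrong_order = text.replace("\n\r", "")  # LF+CR never occurs in this text
right_order = text.replace("\r\n", "")  # CR+LF matches the stored sequence

print(wrong_order == text)
print(right_order)
```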

Related

How to replace some characters after a specific character to another specific character in one big sql line in notepad++

I have a big SQL file with thousands of users, something like this:
('someone1#mydomain.com','{SSHA512}JWHCqHzazH2vGneLPfhMKkoAamzvxdNCWYOlhZ+uDx36jHdoMXwQmbEemvUMn7ZG6c9+22noXjjb2hAb99/5A/slscDJPKav','','en_US','maildir','Maildir','/home/vmail','vmail1','mydomain.com/someone1/',0,'mydomain.com','','','normal','',0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,NULL,'1970-01-01 01:01:01',0,'',NULL,NULL,'2020-03-19 13:15:58','2015-08-03 06:11:53','2020-03-19 13:15:58','9999-12-31 00:00:00',1'someone1'),
('someone2#mydomain.com','{SSHA512}UoMeyocmdC2DxM0S7B4WFdjnCNuvkngzzLus33h9nugKVlvdhlcboKmMDDuAkCHEyLBUgf8DicKWFPJVS7EOF/ytv27MQ3Ch','','en_US','maildir','Maildir','/home/vmail','vmail1','mydomain.com/someone2/',0,'mydomain.com','','','normal','',0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,NULL,'1970-01-01 01:01:01',0,'',NULL,NULL,'2015-12-17 12:27:35','2015-08-03 06:44:10','2021-06-08 06:55:33','9999-12-31 00:00:00',1'someone2'),
('someone3#mydomain.com','{SSHA512}A6ToCf4OfP3XNEU9ngEmGN/LDquH9+s9Qxme3SoJaDyVvxiWpnwwTiAALSdnmhIxDB2VQK0zhdF+jP8ARvh0N3IDL0Xv/KmL','','en_US','maildir','Maildir','/home/vmail','vmail1','mydomain.com/someone3/',0,'mydomain.com','','','normal','',0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,NULL,'1970-01-01 01:01:01',0,'',NULL,NULL,'2018-04-03 12:31:09','2015-08-03 06:50:01','2018-04-03 12:31:18','9999-12-31 00:00:00',1'someone3'),
('someone4#mydomain.com','{SSHA512}t7/JbUPQ+rtKeRTgWRH6KlETr2JsqYORBOZouzOzs4Wo6YfHYLoy0m+U4kZXk+AeNgMep2hGZSodPZdK2l2bn9MhOKHOuF/L','','en_US','maildir','Maildir','/home/vmail','vmail1','mydomain.com/someone4/',0,'mydomain.com','','','normal',''0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,NULL,'1970-01-01 01:01:01',0,'',NULL,NULL,'2020-03-18 07:48:26','2016-11-14 06:59:04','2021-06-08 05:54:28',9999-12-31 00:00:00',1'someone4')
Now I need to delete the last word ('someone1', 'someone2', 'someone3', 'someone4') for every user, the one adjoining the 1. It should look like
....9999-12-31 00:00:00',1)
not like in original
....9999-12-31 00:00:00',1'someone1')
....9999-12-31 00:00:00',1'someone2')
etc
But don't forget that they are not on different lines. All of this is in one big line, which is why I am asking for your help. Thanks a lot.
It seems (from your examples) that the rows do not contain any parentheses except their start and end characters. So you can search for one quotation mark ', a number of letters and/or digits, one quotation mark ', and then ).
To do this:
Open the Replace window in Notepad++ with the Ctrl+H shortcut
In the Search Mode section select Regular expression
Write '[a-zA-Z0-9]*?[-,_,.]*?[a-zA-Z0-9]*?[-,_,.]*?[a-zA-Z0-9]*?[-,_,.]*?[a-zA-Z0-9]*?'\) in the Find what box
Write \) in the Replace with box (the match already includes both quotation marks, so only the closing parenthesis should be put back)
Click the Replace All button.
This works if user names consist of letters or digits, separated by _, -, or . at most 3 times.
Be sure that you keep a copy of the original file as a backup. Also be aware that this regular expression may match unrelated parts if any row contains a closing parenthesis anywhere other than at its end.
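The same search-and-replace can be tried on a small sample before touching the real file. The Python sketch below is only an illustration of the idea, not Notepad++ itself: the two shortened rows are made up, and the pattern is a simplified variant (one character class instead of the repeated groups).

```python
import re

# Two shortened, made-up rows on a single line, in the shape shown in the question.
line = (
    "('a#mydomain.com','9999-12-31 00:00:00',1'someone1'),"
    "('b#mydomain.com','9999-12-31 00:00:00',1'some.one-2')"
)

# A quoted run of letters, digits, '.', '_' or '-' immediately before ')'.
pattern = re.compile(r"'[A-Za-z0-9._-]+'\)")

# Put back only the closing parenthesis.
cleaned = pattern.sub(")", line)
print(cleaned)
```

The date strings are not touched because they contain spaces and colons and are followed by a comma rather than a closing parenthesis.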

Replacing characters in strings. Intersystems cache SQL

I recently reached out to this community for assistance on how to remove a specific character from the very beginning of a string and at the end of a string. In my case, the character I needed removed was an ampersand. Here is the code I used that resolved my issue:
select substr((rpu.userrole), 2, length(rpu.userrole) - 2) AS UserRole
However, now I am left with strings like this after the very first and last ampersand have been removed:
BachelorLvlProvider&ShortTermAccess&WrkflwBachelorLvl
As you can see, there are anywhere between zero and several ampersands separating these role positions. Cache seems to have a lot of functions to concatenate strings, but I am not having any luck finding functions to replace characters in a string. There is a "$replace" function, but I believe it only works in ObjectScript.
Can anyone assist me in replacing all ampersands, regardless of how many there are in each string, with the literal ', ' ? I need to separate these with a single comma and one space. I included the tick marks as those are what I use in the code for my strings.
Any assistance would be greatly appreciated.
Thanks!
You can use REPLACE in SQL as well (Caché/Ensemble and IRIS).
SELECT REPLACE('BachelorLvlProvider&ShortTermAccess&WrkflwBachelorLvl','&',',') as UserRole
will give
BachelorLvlProvider,ShortTermAccess,WrkflwBachelorLvl
My test below works perfectly well:
SELECT REPLACE(substr(('&BachelorLvlProvider&ShortTermAccess&WrkflwBachelorLvl&'), 2, length('&BachelorLvlProvider&ShortTermAccess&WrkflwBachelorLvl&') - 2),'&',',') as UserRole
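For comparison, here is the same two-step logic as a Python sketch (an illustration only; the helper name `format_roles` is made up). It uses ', ' as the separator, since the question asks for a comma followed by one space.

```python
def format_roles(raw: str) -> str:
    """Strip the outer ampersands, then turn the inner ones into ', '."""
    # Drop the first and last character, as substr(x, 2, length(x) - 2) does.
    inner = raw[1:-1]
    # Replace every remaining ampersand with a comma and one space.
    return inner.replace("&", ", ")


print(format_roles("&BachelorLvlProvider&ShortTermAccess&WrkflwBachelorLvl&"))
```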

How to include apostrophe in character set for REGEXP_SUBSTR()

The IBM i implementation of regex uses apostrophes (instead of e.g. slashes) to delimit a regex string, i.e.:
... where REGEXP_SUBSTR(MYFIELD,'myregex_expression')
If I try to use an apostrophe inside a [group] within the expression, it always errors - presumably thinking I am giving a closing quote. I have tried:
- escaping it: \'
- doubling it: '' (and tripling)
No joy. I cannot find anything relevant in the IBM SQL manual or by google search.
I really need this to, for instance, allow names like O'Leary.
Thanks to Wiktor Stribizew for the answer in his comment.
There are a couple of "gotchas" for anyone who might land on this question with the same problem. The first is that you have to give the (presumably Unicode) hex value rather than the EBCDIC value that you would use, e.g. in ordinary interactive SQL on the IBM i. So in this case it really is \x27 and not \x7D for an apostrophe. Presumably this is because the REGEXP_ ... functions are working through Unicode even for EBCDIC data.
The second thing is that it would seem that the hex value cannot be the last one in the set. So this works:
^[A-Z0-9_\+\x27-]+ ... etc.
But this doesn't
^[A-Z0-9_\+-\x27]+ ... etc.
I don't know how to highlight text within a code sample, so I draw your attention to the fact that the hyphen is last in the first sample and second-to-last in the second sample.
If anyone knows why it has to not be last, I'd be interested to know. [edit: see Wiktor's answer for the reason]
btw, using double quotes as the string delimiter with an apostrophe in the set didn't work in this context.
A single quote can be defined with the \x27 notation:
^[A-Z0-9_+\x27-]+
          ^^^^
Note that when you use a hyphen in the character class/bracket expression, when used in between some chars it forms a range between those symbols. When you used ^[A-Z0-9_\+-\x27]+ you defined a range between + and ', which is an invalid range as the + comes after ' in the Unicode table.
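The reversed-range problem can be reproduced in other regex engines too. The sketch below uses Python's `re` module, which is an assumption for illustration and not the engine behind the IBM i REGEXP_ functions, but it rejects the second pattern for the same reason.

```python
import re

good = r"^[A-Z0-9_\+\x27-]+"  # hyphen last: treated as a literal '-'
bad = r"^[A-Z0-9_\+-\x27]+"   # range from '+' (0x2B) down to "'" (0x27)

# '+' has the higher code point, so '+' to "'" is a reversed range.
print(hex(ord("+")), hex(ord("'")))

# The working pattern accepts a name containing an apostrophe.
print(re.match(good, "O'LEARY").group())

try:
    re.compile(bad)
    rejected = False
except re.error:
    rejected = True
print(rejected)
```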

Need to remove GO from string but only if followed or preceded by hidden character or space

I have a string (=SQL query) and I need to remove all GO commands.
That could be done simply like this: REPLACE(<columnname>,'GO','') but strings like 'Be gone!' will suddenly look like 'Be ne!'
So my idea is to use something like this:
REPLACE(<columnname>,'GO' + <hidden character>,'')
But how to do that?
If returns are also a problem, you'll have to nest replace like:
REPLACE(REPLACE(<columnname>,'GO ',''), CHAR(13)+CHAR(10), '').
Note this replaces char(13)+char(10), which is a Windows return (Carriage Return followed by Line Feed). If you (also) have Carriage Returns or Line Feeds without the other, you'll have to correct for that. If you have a combination of possible line endings, you'll have to nest replace even further. This should be the general pattern, though.
replace ([columnA], 'GO' + char(13),'') seems to do the trick.
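If the script can be processed outside SQL, another option is to drop only the lines that consist of GO and nothing else. This is a minimal Python sketch under that assumption (the helper name `strip_go_lines` and the sample script are made up; it does not handle forms such as `GO 5`):

```python
def strip_go_lines(script: str) -> str:
    """Drop lines that consist solely of GO (any case, optional spaces)."""
    kept = [
        line
        for line in script.splitlines(keepends=True)
        if line.strip().upper() != "GO"
    ]
    return "".join(kept)


sample = "SELECT 1\r\nGO\r\nSELECT 'Be gone!'\r\ngo\r\n"
print(repr(strip_go_lines(sample)))
```

Because whole lines are compared, text such as 'Be gone!' is left alone, and the line endings of the remaining lines are preserved.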

Unable to replace Char(63) by SQL query

I have some rows in a table that contain an unusual character. When I use ascii() or unicode() for that character, it returns 63. But when I try this -
update MyTable
set MyColumn = replace(MyColumn,char(63),'')
it does not replace. The unusual character still exists after the replace function. Char(63) incidentally looks like a question mark.
For example, my string is 'ddd#dd ddd', where # is my unusual character, and
select unicode('#')
returns 63. But this code
declare @str nvarchar(10) = 'ddd#dd ddd'
declare @char char(1) = char(unicode('#'))
set @str = replace(@str,@char,'')
is working!
Any ideas how to resolve this?
Additional information:
select ascii('�') returns 63, and so does select ascii('?'). Finally select char(63) returns ? and not the diamond-question-mark.
When this character is pasted into Excel or a text editor, it looks like a space, but in an SQL Server Query window (and, apparently, here on StackOverflow as well), it looks like a diamond containing a question mark.
Not only does char(63) look like a '?', it is actually a '?'.
(As a simple test, make sure num lock is on, hold down the alt key and type '63' on the number pad. You can have all sorts of fun this way: try alt-205, then alt-206 and alt-205 again: ═╬═)
It's possible, however, that the '?' you are seeing isn't a char(63), and is instead a sign of a character that SQL Server doesn't know how to display.
What do you get when you run:
select ascii(substring('[yourstring]',[pos],1));
--or
select unicode(substring('[yourstring]',[pos],1));
Where [yourstring] is your string and [pos] is the position of your char in the string
EDIT
From your comment it seems like it is a question mark. Have you tried:
replace(MyColumn,'?','')
EDIT2
Out of interest, what does the following do for you:
replace(replace(MyColumn,char(146),''),char(63),'')
char(63) is a question mark. It sounds like these "unusual" characters are displayed as a question mark, but are not actually characters with char code 63.
If this is the case, then removing occurrences of char(63) (aka '?') will of course have no effect on these "unusual" characters.
I believe you actually didn't have issues with literally CHAR(63), because that should be just a normal character and you should be able to properly work with it.
What I think happened is that, by mistake, a Unicode character (for example, a Cyrillic "А") was inserted into the table, and either your:
columns setup,
the SQL code,
or the passed in parameters
were not prepared for that.
In this case, the sign might be visible to you as ?, and the ASCII() function would actually give 63, but you should really use UNICODE() to figure out its real code.
Let me give a specific example that I have hit multiple times: issues
with the Cyrillic "А", which looks identical to the Latin one, but has
a Unicode code point of 1040.
If you apply the non-Unicode ASCII function to that 1040 character,
you get code 63, which is not its real code: under a typical Latin code page
the character cannot be represented, so it is converted to a literal '?' first.
Actually, run this to make the differences in my example obvious:
SELECT NCHAR(65) AS Latin_A, NCHAR(1040) Cyrilic_A, ASCII(NCHAR(1040)) Latin_A_Code, UNICODE(NCHAR(1040)) Cyrilic_A_Code;
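The same effect can be seen outside SQL Server. In the Python sketch below, the `cp1252` codec stands in for a non-Unicode Latin code page (an assumption made for illustration; your column's collation may use a different one):

```python
cyrillic_a = "\u0410"  # Cyrillic capital A, code point 1040
latin_a = "A"          # Latin capital A, code point 65

# A single-byte Western code page cannot represent U+0410, so the
# encoder substitutes a literal question mark.
narrowed = cyrillic_a.encode("cp1252", errors="replace")

print(ord(latin_a), ord(cyrillic_a))
print(narrowed, narrowed[0])

# Removing '?' from the original Unicode text changes nothing.
text = "ddd" + cyrillic_a + "dd"
print(text.replace("?", "") == text)
```

This is why a lookup through the non-Unicode path reports 63 while a replace of '?' leaves the real character in place.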
That seemingly empty string, which shows up as '?' in a substring, gives an ASCII value of 63.
It's a zero-width space, which gets appended if you copy data from a UI and insert it into the database.
To replace it, you can use the query below:
set MyColumn = replace(MyColumn,NCHAR(8203),'')
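To find out which code point is really hiding in a value before choosing what to replace, it can help to list every non-ASCII character with its position. A minimal Python sketch (the helper name `non_ascii_code_points` and the sample value are made up):

```python
def non_ascii_code_points(value: str) -> list:
    """Return (1-based position, code point) for every non-ASCII character."""
    return [(i, ord(ch)) for i, ch in enumerate(value, start=1) if ord(ch) > 127]


value = "ddd\u200bdd ddd"  # made-up sample containing a zero-width space

# Shows where the hidden character is and what its real code point is.
print(non_ascii_code_points(value))

# Remove exactly that code point (8203 is the zero-width space).
cleaned = value.replace(chr(8203), "")
print(cleaned)
```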
It's an older question, but I've run into this problem as well. I found the solution somewhere else on internet, but I thought it would be good to share it here as well. Have a good day.
Replace(YourString, nchar(65533) COLLATE Latin1_General_BIN2, '')
This should work as well, if the character is at the start of the value:
UPDATE MyTable
SET [FieldName] = SUBSTRING([FieldName], 2, LEN([FieldName]))
WHERE ASCII([FieldName]) = 63