Replacing characters in strings in InterSystems Caché SQL

I recently reached out to this community for assistance on how to remove a specific character from the very beginning of a string and at the end of a string. In my case, the character I needed removed was an ampersand. Here is the code I used that resolved my issue:
select substr((rpu.userrole), 2, length(rpu.userrole) - 2) AS UserRole
However, now I am left with strings like this after the very first and last ampersand have been removed:
BachelorLvlProvider&ShortTermAccess&WrkflwBachelorLvl
As you can see, there are anywhere between zero and several ampersands separating these role positions. Caché seems to have a lot of functions to concatenate strings, but I am not having any luck finding functions to replace characters in a string. There is a "$replace" function, but I believe it only works in ObjectScript.
Can anyone assist me in replacing all ampersands, regardless of how many there are in each string, with the literal ', '? I need to separate these with a single comma and one space. I included the tick marks as those are what I use in the code for my strings.
Any assistance would be greatly appreciated.
Thanks!

You can use REPLACE in SQL as well (Caché/Ensemble and IRIS). With ',' as the replacement (pass ', ' instead if you want a space after each comma):
SELECT REPLACE('BachelorLvlProvider&ShortTermAccess&WrkflwBachelorLvl','&',',') as UserRole
will give
BachelorLvlProvider,ShortTermAccess,WrkflwBachelorLvl

My test below works perfectly well :
SELECT REPLACE(substr(('&BachelorLvlProvider&ShortTermAccess&WrkflwBachelorLvl&'), 2, length('&BachelorLvlProvider&ShortTermAccess&WrkflwBachelorLvl&') - 2),'&',',') as UserRole
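For anyone who wants to try the combined expression without a Caché instance, here is a minimal sketch that runs the same SUBSTR/LENGTH/REPLACE combination through Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption, not part of the original answer), the helper name clean_roles is made up, and it uses ', ' as the replacement, as the question asked.

```python
import sqlite3


def clean_roles(raw: str) -> str:
    """Strip the first and last character, then turn every '&' into ', '.

    Runs the SQL in an in-memory SQLite database, which is only a
    stand-in for Caché/IRIS (assumption).
    """
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute(
            "SELECT REPLACE(SUBSTR(?, 2, LENGTH(?) - 2), '&', ', ')",
            (raw, raw),
        ).fetchone()
        return row[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(clean_roles("&BachelorLvlProvider&ShortTermAccess&WrkflwBachelorLvl&"))
```

The string is passed as a bound parameter twice, once for SUBSTR and once for LENGTH, mirroring how the column is referenced twice in the original query.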

Related

How to resolve the invalid number in Oracle

I am facing an issue with some data that starts with a strange character before the number 5.
How can I find all of these characters and remove them?
5,AX,AMEX,0,0,0,0,0,0,0,
DM,BSHB,0,0,0,0,0,0,0,MC,
BSHB,1,323.50,0,0,0,0,1,P1,
BSHB,81,7819.25,0,0,0,0,81,
VC,BSHB,5,212.95,0,0,0,0,5
What do you recommend to resolve this issue? I get the data from a specific source, so I cannot change anything, but I am trying to mask it in the view.
regexp_replace can always help to find or replace/remove any characters you want.
For example, if you want to delete all characters except alphanumeric, space, comma and dot:
regexp_replace(t.str,'[^ ,.[:alnum:]]')
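The same "delete everything outside an allowed set" idea can be sketched in Python to check which characters survive. This is only an illustration: the function name and the sample input are hypothetical, and the class is restricted to ASCII letters and digits, whereas Oracle's [:alnum:] may also accept non-ASCII letters depending on the session's language settings.

```python
import re

# Anything that is NOT a space, comma, dot, ASCII letter or ASCII digit.
UNWANTED = re.compile(r"[^ ,.A-Za-z0-9]")


def strip_unwanted(text: str) -> str:
    """Remove every character outside the allowed set."""
    return UNWANTED.sub("", text)


if __name__ == "__main__":
    # Hypothetical row with an invisible character before the leading 5.
    sample = "\ufeff5,AX,AMEX,0,0,0,0,0,0,0,"
    print(repr(strip_unwanted(sample)))
```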

How to include apostrophe in character set for REGEXP_SUBSTR()

The IBM i implementation of regex uses apostrophes (instead of e.g. slashes) to delimit a regex string, i.e.:
... where REGEXP_SUBSTR(MYFIELD,'myregex_expression')
If I try to use an apostrophe inside a [group] within the expression, it always errors - presumably thinking I am giving a closing quote. I have tried:
- escaping it: \'
- doubling it: '' (and tripling)
No joy. I cannot find anything relevant in the IBM SQL manual or by google search.
I really need this to, for instance, allow names like O'Leary.
Thanks to Wiktor Stribizew for the answer in his comment.
There are a couple of "gotchas" for anyone who might land on this question with the same problem. The first is that you have to give the (presumably Unicode) hex value rather than the EBCDIC value that you would use, e.g. in ordinary interactive SQL on the IBM i. So in this case it really is \x27 and not \x7D for an apostrophe. Presumably this is because the REGEXP_ ... functions are working through Unicode even for EBCDIC data.
The second thing is that it would seem that the hex value cannot be the last one in the set. So this works:
^[A-Z0-9_\+\x27-]+ ... etc.
But this doesn't
^[A-Z0-9_\+-\x27]+ ... etc.
I don't know how to highlight text within a code sample, so I draw your attention to the fact that the hyphen is last in the first sample and second-to-last in the second sample.
If anyone knows why it has to not be last, I'd be interested to know. [edit: see Wiktor's answer for the reason]
btw, using double quotes as the string delimiter with an apostrophe in the set didn't work in this context.
A single quote can be defined with the \x27 notation:
^[A-Z0-9_+\x27-]+
          ^^^^
Note that when you use a hyphen in the character class/bracket expression, when used in between some chars it forms a range between those symbols. When you used ^[A-Z0-9_\+-\x27]+ you defined a range between + and ', which is an invalid range as the + comes after ' in the Unicode table.
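The range rule can be checked outside IBM i as well. The sketch below uses Python's re module, which is an assumption made purely for illustration (the regex engine on IBM i is a different implementation), but it treats a hyphen between two characters in a class the same way: as a range that must run from a lower to a higher code point.

```python
import re

GOOD = r"^[A-Z0-9_+\x27-]+$"   # hyphen last: a literal hyphen
BAD = r"^[A-Z0-9_\+-\x27]+$"   # hyphen between + (0x2B) and ' (0x27): a reversed range


def compiles(pattern: str) -> bool:
    """Return True if the pattern is accepted by the regex compiler."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    print(compiles(GOOD), compiles(BAD))
    print(bool(re.match(GOOD, "O'LEARY")))
```

Putting the hyphen first or last in the class, or escaping it, keeps it from being read as a range operator.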

Write regex for pattern like W00001

I am new to Regular Expressions and any help is highly appreciated.
Pattern like W00000,W00001,W00002,W00004
Must begin with W
Each string before comma must be six characters
String can only be repeated four times
Comma in between
Must not begin or end with comma
I tried below pattern and some others, like (^[W]{1}\d{5}){1,4}'), and none of them work correctly:
Select 'X' from dual Where REGEXP_LIKE ('W12342','(^[W]{1}\d{5})(?<!,)$')
My understanding is that the OP is saying the match should fail if the string begins or ends with a comma, not just that the preceding or trailing commas shouldn't match, so anchors are needed. Also, based on the regex he attempted, I infer that a single group, such as W00000, should match. So, I think the regex should be this, if the characters following the W must always be digits:
^W[:digit:]{5}(,W[:digit:]{5}){0,3}$
Or this, if they can be something other than digits:
^W[^,]{5}(,W[^,]{5}){0,3}$
UPDATE:
The OP posted the following comment:
I am on Oracle 11g and [:digit:] doesn't work. When I replace it with [0-9] it then works fine.
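According to the documentation, Oracle 11g conforms to the POSIX regex standard and supports POSIX character classes such as [:digit:], but a POSIX class is only valid inside a bracket expression, so it has to be written [[:digit:]]. The bare [:digit:] in my first pattern above is read as an ordinary set of the characters ':', 'd', 'i', 'g' and 't', which explains the failure. I also noticed in the docs that Oracle 11g supports Perl-style backslash character class abbreviations, which I didn't think was the case when I originally wrote this answer. In that case, the following should work: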
^W\d{5}(,W\d{5}){0,3}$
Well in that case, you can do this:
(W[^,]{5},){3}W[^,]{5}
If I understood correctly, this should do it!
^W[0-9]{5}(,W[0-9]{5}){0,3}$
One W12345 pattern, optionally followed by one to three ,W12345 blocks.
Edit1: Adding ^$ to fail if there is a comma
Edit2: Fix class, since it fails on Oracle 11g
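To check the requirements against a few inputs, here is a small sketch of the anchored pattern in Python. The language and the names CODE_LIST and is_valid are assumptions for illustration only; in Oracle the pattern would go into REGEXP_LIKE. fullmatch plays the role of the ^ and $ anchors.

```python
import re

# One W + five digits, optionally followed by up to three more ",Wddddd" groups.
CODE_LIST = re.compile(r"W[0-9]{5}(?:,W[0-9]{5}){0,3}")


def is_valid(value: str) -> bool:
    """True only if the whole string is one to four comma-separated W codes."""
    return CODE_LIST.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("W12342", "W00000,W00001,W00002,W00004", ",W00000", "W00000,"):
        print(candidate, is_valid(candidate))
```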

Unable to replace Char(63) by SQL query

I am having some rows in table with some unusual character. When I use ascii() or unicode() for that character, it returns 63. But when I try this -
update MyTable
set MyColumn = replace(MyColumn,char(63),'')
it does not replace. The unusual character still exists after the replace function. Char(63) incidentally looks like a question mark.
For example, my string is 'ddd#dd ddd', where # stands for my unusual character, and
select unicode('#')
returns 63. But this code
declare @str nvarchar(10) = 'ddd#dd ddd'
declare @char char(1) = char(unicode('#'))
set @str = replace(@str,@char,'')
is working!
Any ideas how to resolve this?
Additional information:
select ascii('�') returns 63, and so does select ascii('?'). Finally select char(63) returns ? and not the diamond-question-mark.
When this character is pasted into Excel or a text editor, it looks like a space, but in an SQL Server Query window (and, apparently, here on StackOverflow as well), it looks like a diamond containing a question mark.
Not only does char(63) look like a '?', it is actually a '?'.
(As a simple test, make sure Num Lock is on, hold down the Alt key and type '63' on the number pad. You can have all sorts of fun this way: try Alt-205, then Alt-206 and Alt-205 again: ═╬═)
It's possible that the '?' you are seeing isn't a char(63), however, and is instead indicative of a character that SQL Server doesn't know how to display.
What do you get when you run:
select ascii(substring('[yourstring]',[pos],1));
--or
select unicode(substring('[yourstring]',[pos],1));
Where [yourstring] is your string and [pos] is the position of your char in the string
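If the value can be pulled into a client program, the same position-by-position inspection can be done there. The following is a minimal sketch in Python (an assumption, as is the helper name describe_chars) that reports the real code point of every character outside the printable ASCII range, using 1-based positions like SUBSTRING.

```python
def describe_chars(text: str) -> list:
    """Return (1-based position, escaped character, code point) for every
    character outside the printable ASCII range (32..126)."""
    return [
        (pos, ch.encode("unicode_escape").decode("ascii"), ord(ch))
        for pos, ch in enumerate(text, start=1)
        if ord(ch) < 32 or ord(ch) > 126
    ]


if __name__ == "__main__":
    # Hypothetical value with an invisible character at position 4.
    print(describe_chars("ddd\u200bdd ddd"))
```

A genuine question mark has code point 63 and is not reported; anything that is reported is the real culprit.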
EDIT
From your comment it seems like it is a question mark. Have you tried:
replace(MyColumn,'?','')
EDIT2
Out of interest, what does the following do for you:
replace(replace(MyColumn,char(146),''),char(63),'')
char(63) is a question mark. It sounds like these "unusual" characters are displayed as a question mark, but are not actually characters with char code 63.
If this is the case, then removing occurrences of char(63) (aka '?') will of course have no effect on these "unusual" characters.
I believe you actually didn't have issues with literally CHAR(63), because that should be just a normal character and you should be able to properly work with it.
What I think happened is that, by mistake, a Unicode character (for example, a Cyrillic "А") was inserted into the table, and either your:
columns setup,
the SQL code,
or the passed in parameters
were not prepared for that.
In this case, the character might be visible to you as ?, and the ASCII() function would actually return 63 for it, but you should really use UNICODE() to figure out its real code.
Let me give a specific example that I have run into multiple times: issues
with that Cyrillic "А", which looks identical to the Latin one but has
a Unicode code point of 1040.
If you use the non-Unicode ASCII function on that character,
you get code 63, which is misleading: the character cannot be represented
in a non-Unicode code page such as Latin1, so it is converted to '?' first.
Actually, run this to make the differences in my example obvious:
SELECT NCHAR(65) AS Latin_A, NCHAR(1040) Cyrilic_A, ASCII(NCHAR(1040)) Latin_A_Code, UNICODE(NCHAR(1040)) Cyrilic_A_Code;
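The effect can be reproduced outside SQL Server. The sketch below is written in Python and assumes the Windows-1252 code page as a stand-in for a Latin1 collation; narrow_code is a hypothetical helper. A character the code page cannot represent is replaced by '?', so its narrow code comes back as 63 while its real code point is 1040.

```python
def narrow_code(ch: str, codepage: str = "cp1252") -> int:
    """Code of the character after conversion to a single-byte code page.

    Characters the code page cannot represent are replaced with '?'.
    """
    return ch.encode(codepage, errors="replace")[0]


if __name__ == "__main__":
    latin_a = "A"            # U+0041
    cyrillic_a = "\u0410"    # U+0410, looks identical to the Latin A
    print(narrow_code(latin_a), ord(latin_a))
    print(narrow_code(cyrillic_a), ord(cyrillic_a))
```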
The seemingly empty character that shows up as '?' in a substring gives an ASCII value of 63.
It is a zero-width space, which gets appended if you copy data from a UI and insert it into the database.
To replace it, you can use the query below:
set MyColumn = replace(MyColumn,NCHAR(8203),'')
It's an older question, but I've run into this problem as well. I found the solution somewhere else on internet, but I thought it would be good to share it here as well. Have a good day.
Replace(YourString, nchar(65533) COLLATE Latin1_General_BIN2, '')
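Both code points mentioned in these answers, 8203 (zero-width space) and 65533 (the replacement character shown as a diamond with a question mark), can also be stripped on the client side. This is a minimal sketch in Python; the language and the names INVISIBLE and strip_invisible are assumptions for illustration. A real question mark is left alone.

```python
# Zero-width space (U+200B) and replacement character (U+FFFD).
INVISIBLE = (chr(8203), chr(65533))


def strip_invisible(text: str) -> str:
    """Remove the listed code points and leave everything else untouched."""
    for ch in INVISIBLE:
        text = text.replace(ch, "")
    return text


if __name__ == "__main__":
    print(repr(strip_invisible("ddd\u200bdd\ufffd ddd")))
```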
This should work as well:
UPDATE [TableName]
SET [FieldName] = SUBSTRING([FieldName], 2, LEN([FieldName]))
WHERE ASCII([FieldName]) = 63

Replace() on a field with line breaks in it?

So I have a field that's basically storing an entire XML file per row, complete with line breaks, and I need to remove some text from close to three hundred rows. The replace() function doesn't find the offending text no matter what I do, and all I can find by searching is a bunch of people trying to remove the line breaks themselves. I don't see any reason that replace() just wouldn't work, so I must be formatting it wrong somehow. Help?
Edit: Here's an example of what I mean in broad terms:
<script>...</script><dependencies>...</dependencies><bunch of other stuff></bunch of other stuff><labels><label description="Field2" languagecode="1033" /></labels><events><event name="onchange" application="false" active="true"><script><![field2.DataValue = (some equation);
</script><dependencies /></event></events><a bunch more stuff></a bunch more stuff>
I need to just remove everything between the events tags. So my sql code is this:
replace(fieldname, '<events><event name="onchange" application="false" active="true"><script><![field2.DataValue = (some equation);
</script><dependencies /></event></events>', '')
I've tried it like that, and I've tried it all on one line, and I've tried using char(10) where the line breaks are supposed to be, and nothing.
Nathan's answer was close. Since this question is the first thing that came up from a search I wanted to add a solution for my problem.
select replace(field,CHAR(13)+CHAR(10),' ')
I replaced the line break with a space in case there was no break. It may be that you want to always replace it with nothing, in which case '' should be used instead of ' '.
Hope this helps someone else and they don't have to click the second link in the results from the search engine.
Worked for me on SQL Server 2012:
UPDATE YourTable
SET YourCol = REPLACE(YourCol, CHAR(13) + CHAR(10), '')
If your column is an xml typed column, you can use the delete method on the column to remove the events nodes. See http://msdn.microsoft.com/en-us/library/ms190254(v=SQL.90).aspx for more info.
Try two simple tests.
Try the replace on an XML string that has no double quotes (or single quotes) but does have CRLFs. Does it work? If yes, you need to escape the quote marks.
Try the replace on an XML string that has no CRLFs. Does it work? Great. If yes, use two nested replace() calls: an inner one for the CRLFs only, then a second, outer replace for the string in question.
A lot of people do not remember that a Windows line break is two characters
(Char 13 \r, followed by Char 10 \n).
Replace both, in that order, and you should be good.
SELECT
REPLACE(field, CHAR(13)+CHAR(10), '')
FROM Blah..
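The order of the two characters matters, which is easy to demonstrate. The sketch below uses Python purely as an illustration (the language and the helper name remove_line_breaks are assumptions): searching for line feed followed by carriage return finds nothing in text that uses the usual carriage return plus line feed order, while removing each character on its own covers every variant.

```python
def remove_line_breaks(text: str) -> str:
    """Remove CR and LF individually so CRLF, lone LF and lone CR are all covered."""
    return text.replace("\r", "").replace("\n", "")


if __name__ == "__main__":
    xml = "<script>a = 1;\r\n</script>"
    # Wrong order: LF followed by CR never occurs here, so nothing changes.
    print(xml.replace("\n\r", "") == xml)
    # Correct order, and the order-independent variant.
    print(repr(xml.replace("\r\n", "")))
    print(repr(remove_line_breaks(xml)))
```

In SQL the order-independent variant corresponds to two nested REPLACE calls, one per character.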