Square bracket in regexp_replace pattern - sql

I want to use REGEXP_REPLACE, but I don't know how to include a square bracket inside the square brackets of my pattern. I also want to keep the square brackets around [XXXX], so I need to add them to the pattern somehow. Thanks.
Right now the output is this:
MSD_40001_ME_SPE__XXXX__Technical__Specification_REV9_(2021_05_27)_xls
but I want it to look like this:
MSD_40001_ME_SPE_[XXXX]_Technical__Specification_REV9_(2021_05_27)_xls
I tried the escape character \ but it did not work.

You could try this regex pattern: [^][a-z_A-Z0-9()]
SELECT REGEXP_REPLACE('MSD_40001_ME_SPE_[XXXX]_Technical_%Specification_REV#9_(2021_05_27)_xls', '[^][a-z_A-Z0-9()]', '_')
FROM DUAL
To specify a right bracket (]) in the bracket expression, place it first in the list (after the initial circumflex (^), if any).
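For a quick check outside the database, the same bracket expression can be tried in Python. This is only a sketch: Python's re module is not Oracle's POSIX engine, but it follows the same rule that a ] placed first in the list (after the ^) is a literal. The variable names are illustrative.

```python
import re

# Negated bracket expression: "]" comes first (right after "^"), so it is a
# literal member of the list, followed by "[", letters, digits, "_", "(" and ")".
PATTERN = r"[^][a-z_A-Z0-9()]"

source = "MSD_40001_ME_SPE_[XXXX]_Technical_%Specification_REV#9_(2021_05_27)_xls"

# Every character that is NOT in the list is replaced by an underscore.
cleaned = re.sub(PATTERN, "_", source)
print(cleaned)
```

The square brackets around XXXX survive, while % and # each become an underscore.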

From Regexp.Info,
One key syntactic difference is that the backslash is NOT a metacharacter in a POSIX bracket expression. So in POSIX, the regular expression [\d] matches a \ or a d. To match a ], put it as the first character after the opening [ or the negating ^. To match a -, put it right before the closing ]. To match a ^, put it before the final literal - or the closing ]. Put together, []\d^-] matches ], \, d, ^ or -.

How about this different take on the problem? Match one or more of the following: a space OR a dash OR a percent sign OR a dollar sign OR a hash sign OR a period (escaping the characters that have special regex meaning), and replace each run with a single underscore. Because of the + quantifier, this also takes care of the double underscore after "Technical".
with tbl(data) as (
select 'MSD 40001-ME-SPE-[XXXX] Technical%$Specification#REV9 (2021.05.27).xls' from dual
)
select regexp_replace(data, '( |\-|%|\$|#|\.)+', '_') fixed
from tbl;
FIXED
---------------------------------------------------------------------
MSD_40001_ME_SPE_[XXXX]_Technical_Specification_REV9_(2021_05_27)_xls
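The same substitution can be tried in Python as a rough cross-check. This is a sketch that assumes Python's re module as a stand-in for Oracle's engine. The bracket form in the second pattern is an alternative spelling not given in the answer, with the dash placed last so that it is a literal.

```python
import re

source = "MSD 40001-ME-SPE-[XXXX] Technical%$Specification#REV9 (2021.05.27).xls"

# Alternation form, as in the query above.
alternation = r"( |\-|%|\$|#|\.)+"
# Equivalent bracket form: "-" goes last so it is not read as a range operator.
bracket = r"[ %$#.-]+"

# Each run of unwanted characters collapses into a single underscore.
fixed_alt = re.sub(alternation, "_", source)
fixed_set = re.sub(bracket, "_", source)
print(fixed_alt)
```

Both patterns give the same result here, and the run %$ becomes one underscore rather than two.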

Related

new lines are not getting eliminated

I'm trying to remove newlines and similar characters using regexp_replace. But when I open the result in the query result window, I can still see the new lines in the text. Even when I copy the result, the newline characters are still there. See the output below, which I copied straight from the result.
Below is my query:
select regexp_replace('abc123
/n
CHAR(10)
头疼,'||CHR(10)||'allo','[^[:alpha:][:digit:][ \t]]','') from dual;
I kept the / only to test characters.
Output:
abc123
/n
CHAR(10)
头疼,
allo
How to remove the new lines from the text?
Expected output:
abc123 /nCHAR(10)头疼,allo

There are two mistakes in your code. One of them causes the issue you noticed.
First, in a bracket expression, in Oracle regular expressions (which follow the POSIX standard), there are no escape sequences. You probably meant \t as the escape sequence for tab, but within the bracket expression it is just a backslash and the letter t. (Note also that in Oracle regular expressions, there are no escape sequences like \t and \n anyway. If you must preserve tabs, it can be done, but not like that.)
Second, regardless of this, you include two character classes, [:alpha:] and [:digit:], and also [ \t] in the (negated) bracket expression. The last one is not a character class, so the [ as well as the space, the backslash and the letter t are interpreted as literal characters - they stand in for themselves. The closing bracket, on the other hand, has special meaning. The first of your two closing brackets is interpreted as the end of the bracket expression; and the second closing bracket is interpreted as being an additional, literal character that must be matched! Since there is no such literal closing bracket anywhere in the string, nothing is replaced.
To fix both mistakes, replace [ \t] with the [:blank:] character class, which consists exactly of space and tab. (And, note that [:alpha:][:digit:] can be written more compactly as [:alnum:].)
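The second mistake can be reproduced in miniature. The sketch below uses Python's re module as a stand-in, which is an assumption: it is not Oracle's engine and has no [:alpha:]-style classes, so plain ranges are used instead. It does parse the stray [ and the two ] characters in the same way.

```python
import re

text = "abc123\nallo"

# Parsed as the set [^a-z0-9[ ] followed by a literal "]":
# a match needs a disallowed character AND a "]" right after it.
broken = r"[^a-z0-9[ ]]"
# One negated set that allows letters, digits and the space.
fixed = r"[^a-z0-9 ]"

unchanged = re.sub(broken, "", text)             # no "]" in the text, so no match
joined = re.sub(fixed, "", text)                 # the newline is removed
with_bracket = re.sub(broken, "", "abc\n]def")   # here "\n]" does match
```

The broken pattern only fires when a literal ] follows the unwanted character, which is why the original query replaced nothing.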

Regex to find period in square brackets

I am parsing some SQL statements and have found places where the SELECT statement may be:
SELECT [tblCustomer].[FirstName], [tblCustomer.LastName], [tblOrder].[Order_No]
and as you see the second column has a . inside the square brackets. This is acceptable in Access SQL but not SQL Server. I'm trying to build a RegEx to identify when there is a . inside square brackets and replace it with ].[
I've tried: \[.+?\](?![\.]) which will get me a period inside square brackets but it doesn't stop searching when it finds the closing bracket.
I'm using ECMAScript to be compatible with VBA and I don't have concerns about nested brackets.
Example: https://regex101.com/r/Inxhdg/1/

You can use:
Search for: (\[\w+)\.(?=\w+])
Replace with: $1].[
Details:
(\[\w+) - Group 1 ($1): [ and then any one or more letters, digits, or underscores
\. - a dot
(?=\w+]) - a positive lookahead that requires one or more letters, digits or underscores and then a ] char immediately to the right of the current location.
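As a quick way to try the search and replace, here is a sketch in Python rather than VBA (an assumption made only for testing; the pattern itself is unchanged). Python writes the group reference as \1 where the ECMAScript-style replacement uses $1.

```python
import re

sql = "SELECT [tblCustomer].[FirstName], [tblCustomer.LastName], [tblOrder].[Order_No]"

# Group 1 keeps "[name"; the lookahead checks for "name]" without consuming it.
pattern = re.compile(r"(\[\w+)\.(?=\w+])")

# Re-emit group 1, then close the first identifier and open the second.
fixed = pattern.sub(r"\1].[", sql)
print(fixed)
```

Only the dot inside [tblCustomer.LastName] is rewritten; the dots between already-bracketed names are left alone because they are preceded by ] rather than a word character.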

sql regexp string end with ".0"

I want to check whether a positive number string ends with ".0", so I wrote the following SQL:
select '12310' REGEXP '^[0-9]*\.0$'. However, the result is true. I wonder why, since I put "\" before "." to escape it.
So I wrote another one, select '1231.0' REGEXP '^[0-9]\d*\.0$', but this time the result is false.
Could anyone tell me the right pattern?

A dot (.) in a regexp has a special meaning (any character) and must be escaped if you want a literal dot:
select '12310' REGEXP '^[0-9]*\\.0$';
Result:
false
Use a double backslash to escape special characters in Hive. A single backslash has special meaning in string literals and is used for characters like \073 (semicolon), \n (newline), \t (tab), etc. This is why you need a double backslash for escaping. Likewise, for the digit character class use \\d:
hive> select '12310.0' REGEXP '^\\d*?\\.0$';
OK
true
Also, characters inside square brackets do not need double-backslash escaping: [.] can be used instead of \\.
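The effect of the lost backslash can be sketched in Python by writing out the pattern as the regex engine would receive it. The assumption here, consistent with the results reported in the question, is that the string literal drops a single backslash before . or d. The helper name matches is hypothetical.

```python
import re

def matches(pattern: str, value: str) -> bool:
    """Unanchored search, similar in spirit to a SQL REGEXP predicate."""
    return re.search(pattern, value) is not None

# What the engine sees if the string literal consumes the single backslash.
seen_single = "^[0-9]*.0$"        # from '^[0-9]*\.0$': the dot is a wildcard
seen_backslash_d = "^[0-9]d*.0$"  # from '^[0-9]\d*\.0$': d is a plain letter
# What it sees when the backslash is doubled in the literal.
seen_double = r"^[0-9]*\.0$"      # from '^[0-9]*\\.0$': a literal dot
# Bracket form, which needs no backslash at all.
seen_bracket = "^[0-9]*[.]0$"

print(matches(seen_single, "12310"), matches(seen_double, "12310"))
```

Under that assumption, the first query matches because the dot is a wildcard, and the second fails because the pattern asks for the letter d.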

If you know it is a number string, why not just use:
select ( val like '%.0' )
You need a regular expression if you want to validate that the string has digits everywhere else. But if you only need to check the last two characters, like is sufficient.
As for your question, . is a wildcard in regular expressions. It matches any character.

Oracle regexp_like pattern with POSIX character class

I have usernames that follow this pattern:
ref_2_34_aaa_dos
ref_2_34_bbb_dos
How can I use regexp_like for this?
SELECT username FROM all_users WHERE regexp_like(username, '^ref_2_34_[:alpha:]_dos$')
does not work. Nor can I use ESCAPE '\' with regexp_like; it gives a syntax error.

You need to put the POSIX character class [:alpha:] into a bracket expression (i.e. [...]) and apply a + quantifier to it:
regexp_like(username, '^ref_2_34_[[:alpha:]]+_dos$')
The + quantifier means there can be 1 or more letters between the second-to-last and the last underscore.
If your string may have no user name at that location (it is empty), and you want to get those entries too, you would need to replace the plus with the * quantifier that matches zero or more occurrences of the quantified subpattern.
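The difference between the two quantifiers can be sketched in Python. Python's re module has no POSIX character classes, so [A-Za-z] is used here as a stand-in for [[:alpha:]]. That substitution and the extra sample usernames are assumptions made for illustration.

```python
import re

# [A-Za-z] stands in for [[:alpha:]] (ASCII letters only).
one_or_more = re.compile(r"^ref_2_34_[A-Za-z]+_dos$")
zero_or_more = re.compile(r"^ref_2_34_[A-Za-z]*_dos$")

# The first two come from the question; the last two are made-up edge cases.
usernames = ["ref_2_34_aaa_dos", "ref_2_34_bbb_dos", "ref_2_34__dos", "ref_2_34_a1_dos"]

with_plus = [u for u in usernames if one_or_more.search(u)]
with_star = [u for u in usernames if zero_or_more.search(u)]
print(with_plus)
print(with_star)
```

With +, the entry with an empty name part is rejected; with *, it is accepted. The entry containing a digit fails both, since only letters are allowed in that position.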
Since comments require some more clarifications, here is some bracket expression and POSIX character class reference.
Bracket expressions
Matches any single character in the list within the brackets. The following operators are allowed within the list, but other metacharacters included are treated as literals:
Range operator: -
POSIX character class: [: :]
POSIX collation element: [. .]
POSIX character equivalence class: [= =]
A dash (-) is a literal when it occurs first or last in the list, or as an ending range point in a range expression, as in [#--]. A right bracket (]) is treated as a literal if it occurs first in the list.
POSIX Character Class (can be a part of the bracket expression):
[:class:] - Matches any character belonging to the specified POSIX character class. You can use this operator to search for characters with specific formatting such as uppercase characters, or you can search for special characters such as digits or punctuation characters. The full set of POSIX character classes is supported.
...
The expression [[:upper:]]+ searches for one or more consecutive uppercase characters.
Bracket expressions can be considered a kind of a "container" construct for multiple atoms, that, as a whole regex unit, matches some class of characters you defined. If you need to match a <, or >, or letters, you may combine them into 1 bracket expression [<>[:alpha:]]. To match zero or more of <, > or letters, add a * quantifier after ]: [<>[:alpha:]]*.
Or, to imitate a trailing word boundary, one might use [^_[:alnum:]] (say, in a ($|[^_[:alnum:]]) pattern) that matches any character but a _, digits and letters ([:alnum:] matches alphanumerical symbols).

Why does this space matter in this regular expression?

Why does the space make all the difference?
select * from beds where id~'.*Extra large.* (Red).*';
and
select * from beds where id~'.*Extra large.*(Red).*';
The first one returned nothing and the second acted as I wanted. An example of what I want matched is:
"Extra large" (Red) {2012 model}
I thought the first would work since there is a space after (Red)?
EDIT: Even if I escape the brackets with '\' I still can't have a space there.

The problem is that you have not escaped your brackets around "Red". Your regex should be:
'.*Extra large.* \(Red\).*'
This makes the brackets literal brackets, but without escaping them they create a regex group (and not characters to be matched).
Your first regex grouped the characters Red and required a space to precede that group Red, so it would match "... Red...", but there is a bracket in your input before Red, so it doesn't match.
Your second regex accepts any character(s) (via .*) before Red, so it matches.

This is because you're not escaping the ().
The brackets around "Red" create a group and are not included in the match. This is the reason why the regexp without the whitespace works.
The .* in the regexp without the whitespace matches " (, then comes Red and after that ) {2012 model}. The brackets are matched by the .* operators.
The .* in the regexp with the whitespace matches " and the ( is not included in the pattern.
So the right pattern would be this:
.*Extra large.*\(Red\).*
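The three variants discussed in these answers can be compared side by side. The sketch below uses Python's re.search as a stand-in for the ~ operator (both are unanchored searches); the helper name matches is hypothetical.

```python
import re

bed_id = '"Extra large" (Red) {2012 model}'

def matches(pattern: str) -> bool:
    # re.search looks for the pattern anywhere in the string.
    return re.search(pattern, bed_id) is not None

# Unescaped group with a space: needs the text " Red", but the input has " (Red".
group_with_space = matches(r".*Extra large.* (Red).*")
# Unescaped group, no space: the preceding .* absorbs the quote, space and "(".
group_no_space = matches(r".*Extra large.*(Red).*")
# Escaped parentheses with a space: matches the literal text " (Red)".
escaped_with_space = matches(r".*Extra large.* \(Red\).*")

print(group_with_space, group_no_space, escaped_with_space)
```

Only the first variant fails, which matches the behavior described in the question.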