Oracle regexp_like pattern with POSIX character class - sql

I have username in this pattern
ref_2_34_aaa_dos
ref_2_34_bbb_dos
How can I use regexp_like for this?
SELECT username FROM all_users WHERE regexp_like(username, '^ref_2_34_[:alpha:]_dos$')
does not work. Nor can I use ESCAPE '\' with regexp_like. it would give a syntax error.

You need to put the POSIX character class [:alpha:] into a bracket expression (i.e. [...]) and apply a + quantifier to it:
regexp_like(username, '^ref_2_34_[[:alpha:]]+_dos$')
The + quantifier means there can be 1 or more letters between the last but one and last underscores.
If your string may have no user name at that location (it is empty), and you want to get those entries too, you would need to replace the plus with the * quantifier that matches zero or more occurrences of the quantified subpattern.
Since comments require some more clarifications, here is some bracket expression and POSIX character class reference.
Bracket expressions
Matches any single character in the list within the brackets. The following operators are allowed within the list, but other metacharacters included are treated as literals:
Range operator: -
POSIX character class: [: :]
POSIX collation element: [. .]
POSIX character equivalence class: [= =]
A dash (-) is a literal when it occurs first or last in the list, or as an ending range point in a range expression, as in [#--]. A right bracket (]) is treated as a literal if it occurs first in the list.
POSIX Character Class (can be a part of the bracket expression):
[:class:] - Matches any character belonging to the specified POSIX character class. You can use this operator to search for characters with specific formatting such as uppercase characters, or you can search for special characters such as digits or punctuation characters. The full set of POSIX character classes is supported.
...
The expression [[:upper:]]+ searches for one or more consecutive uppercase characters.
Bracket expressions can be considered a kind of a "container" construct for multiple atoms, that, as a whole regex unit, matches some class of characters you defined. If you need to match a <, or >, or letters, you may combine them into 1 bracket expression [<>[:alpha:]]. To match zero or more of <, > or letters, add a * quantifier after ]: [<>[:alpha:]]*.
Or, to imitate a trailing word boundary, one might use [^_[:alnum:]] (say, in a ($|[^_[:alnum:]]) pattern) that matches any character but a _, digits and letters ([:alnum:] matches alphanumerical symbols).

Related

new lines are not getting eliminated

I'm trying to replace newline etc kind of values using regexp_replace. But when I open the result in query result window, I can still see the new lines in the text. Even when I copy the result, I can see new line characters. See output for example, I just copied from the result.
Below is my query
select regexp_replace('abc123
/n
CHAR(10)
头疼,'||CHR(10)||'allo','[^[:alpha:][:digit:][ \t]]','') from dual;
/ I just kept for testing characters.
Output:
abc123
/n
CHAR(10)
头疼,
allo
How to remove the new lines from the text?
Expected output:
abc123 /nCHAR(10)头疼,allo
There are two mistakes in your code. One of them causes the issue you noticed.
First, in a bracket expression, in Oracle regular expressions (which follow the POSIX standard), there are no escape sequences. You probably meant \t as escape sequence for tab - within the bracket expression. (Note also that in Oracle regular expressions, there are no escape sequences like \t and \n anyway. If you must preserve tabs, it can be done, but not like that.)
Second, regardless of this, you include two character classes, [:alpha:] and [:digit:], and also [ \t] in the (negated) bracket expression. The last one is not a character class, so the [ as well as the space, the backslash and the letter t are interpreted as literal characters - they stand in for themselves. The closing bracket, on the other hand, has special meaning. The first of your two closing brackets is interpreted as the end of the bracket expression; and the second closing bracket is interpreted as being an additional, literal character that must be matched! Since there is no such literal closing bracket anywhere in the string, nothing is replaced.
To fix both mistakes, replace [ \t] with the [:blank:] character class, which consists exactly of space and tab. (And, note that [:alpha:][:digit:] can be written more compactly as [:alnum:].)

REGEXP_REPLACE explanation

Hi may i know what does the below query means?
REGEXP_REPLACE(number,'[^'' ''-/0-9:-#A-Z''[''-`a-z{-~]', 'xy') ext_number
part 1
In terms of explaining what the function function call is doing:
It is a function call to analyse an input string 'number' with a regex (2nd argument) and replace any parts of the string which match a specific string. As for the name after the parenthesis I am not sure, but the documentation for the function is here
part 2
Sorry to be writing a question within an answer here but I cannot respond in comments yet (not enough rep)
Does this regex work? Unless sql uses different syntax this would appear to be a non-functional regex. There are some red flags, e.g:
The entire regex is wrapped in square parenthesis, indicating a set of characters but seems to predominantly hold an expression
There is a range indicator between a single quote and a character (invalid range: if a dash was required in the match it should be escaped with a '\' (backslash))
One set of square brackets is never closed
After some minor tweaks this regex is valid syntax:
^'' ''\-\/0-9:-#A-Z''[''-a-z{-~]`, but does not match anything I can think of, it is important to know what string is being examined/what the context is for the program in order to identify what the regex might be attempting to do
It seems like it is meant to replaces all ASCII control characters in the column or variable number with xy.
[] encloses a class of characters. Any character in that class matches. [^] negates that, hence all characters match, that are not in the class.
- is a range operator, e.g. a-z means all characters from a to z, like abc...xyz.
It seams like characters enclosed in ' should be escaped (The second ' is to escape the ' in the string itself.) At least this would make some sense. (But for none of the DBMS I found having a regexp_replace() function (Postgres, Oracle, DB2, MariaDB, MySQL), I found something in the docs, that would indicate this escape mechanism. They all use \, but maybe I missed something? Unfortunately you didn't tag which DBMS you're actually using!)
Now if you take an ASCII table you'll see, that the ranges in the expression make up all printable characters (counting space as printable) in groups from space to /, 0 to 9, : to #, etc.. Actually it might have been shorter to express it as '' ''-~, space to ~.
Given the negation, all these don't match. The ones left are from NUL to US and DEL. These match and get replaced by xy one by one.

Oracle REGEXP_LIKE doesn't work as expected

I was testing a regular expression in Oracle SQL and found something I could not understand:
-- NO MATCH
SELECT 1 FROM DUAL WHERE REGEXP_LIKE ('Professor Frank', '(^|\s)Prof[^\s]*(\s|$)');
Above doesn't match, while the following matches:
-- MATCH
SELECT 1 FROM DUAL WHERE REGEXP_LIKE ('Professor Frank', '(^|\s)Prof\S*(\s|$)');
In other regex flavors, It will be like \bProf[^\s]*\b versus \bProf\S*\b and have similar results. Note: Oracle SQL regex does not have \b or word boundary.
Question: Why don't [^\s]* and \S* work the same way in Oracle SQL?
I notice if I remove the (\s|$) at the end, the first regex will match.
In Oracle regular expressions, \s is indeed the escape sequence for a space, but NOT in a matching character set (that is, [.....], or [^....] for excluding one character). In a matching character set, only two characters have a special meaning, - for ranges and ] for closing the set enumeration. They can't be escaped; if needed in the matching set, ] must always be the first character right after the opening [ (it is the ONLY position in which a closing ] stands for itself as a character, and does not denote the end of the matching set), and - must be first or last (best to leave it always to the end of the matching set) - anywhere else it is seen as a range marker. To include (or exclude, if using the [^.....] syntax) a space, just type an actual physical space in the matching set.
Edit: What I said above is not entirely right. There is another special character in a matching set, namely ^. If it is used in the first position, it means "match any character OTHER THAN." In any other position it stands for itself. For example, '[^^]' will match any single character OTHER THAN ^ (the first ^ has special meaning, the second stands in for itself). And, a closing bracket ] stands for itself if it is the second character in brackets, if the first character is ^ (with its SPECIAL meaning). That is, to match any single character OTHER THAN ], we can use the matching pattern '[^]]'.

escape in a select statement

In the following sql, what the use of escape is ?
select * from dual where dummy like 'funny&_' escape '&';
SQL*Plus ask for the value of _ whether escape is specified or not.
The purpose of the escape clause is to stop the wildcard characters (eg. % or _) from being considered as wildcards, as per the documentation
The reason why you're being prompted for the value of _ is because you're using &, which is also usually the character used to prompt for a substitution variable.
To stop the latter from happening, you could:
change to a different escape character
prior to running your statement, run set define off if you're using SQL*Plus (or as a script in a GUI, eg. Toad) or turn off the substitution variable prompting if you're using a GUI.
change the define character to something different by running set define <character>
The escape character is used to indicate that the underscore should be matched as an actual character, rather than as a single-character wildcard. This is explained in the documentation.
You can include the actual characters % or _ in the pattern by using the ESCAPE clause, which identifies the escape character. If the escape character precedes the character % or _ in the pattern, then Oracle interprets this character literally in the pattern rather than as a special pattern-matching character.
If you didn't have the escape clause then the underscore would match any single character, so where dummy like 'funny_' would match 'funnyA', 'funnyB', etc. and not just an actual underscore.
The escape character you've chosen is & which is the default SQL*Plus client substitution variable marker. It has nothing to do with the escape clause, and using that is causing the &_ part of the pattern to be interpreted as a substitution variable called _, hence your being prompted. As it isn't related, the escape clause has no effect on that.
The simplest thing is probably to choose a different escape character. If you want to use that specific escape character and not be prompted, disable or change the substitution character:
set define off
select * from dual where dummy like 'funny&_' escape '&';
set define on
That will then match rows where dummy contains exactly the string 'funny_'. (It's therefore equivalent to where dummy = 'funny_', as there are no unescaped wildcards, making the like pattern matching redundant). It will not match any that start with that pattern (it's sort of like using regexp_like with start and end anchors, and you might be expecting it to work as if you hadn't supplied anchors, but it doesn't). You would need to add a % wildcard for that:
set define off
select * from dual where dummy like 'funny&_%' escape '&';
set define on
And if you want to match any that don't start with funny_ but have it somewhere in the middle of the value, you would need to add another wildcard before it too:
set define off
select * from dual where dummy like '%funny&_%' escape '&';
set define on
You haven't shown any sample data or expected results to it isn't clear which pattern you need.
SQL Fiddle doesn't have substitution variables but here's an example showing how those three patterns match various values.
The syntax for the SQL LIKE Condition is:
expression LIKE pattern [ ESCAPE 'escape_character' ]
Parameters or Arguments
expression : A character expression such as a column or field.
pattern : A character expression that contains pattern matching. The patterns that you can choose from are:
Wildcard | Explanation
---------+-------------
% | Allows you to match any string of any length (including zero length)
_ | Allows you to match on a single character
escape_character: Optional. It allows you to test for literal instances of a wildcard character such as % or _.
Source : http://www.techonthenet.com/sql/like.php

why ldap search return all results when using %?

When I search one ldap server using the following filter
(cn=%*)
It return all results under the base dn? LDAP treat '%' specially? But I haven't found any description about it.
What is your directory server ?
Are you sure tha '%' is not replace by your command line interpreter or your compiler ?
According to RFC2254 % is not a special character
If a value should contain any of the following characters
Character ASCII value
---------------------------
* 0x2a
( 0x28
) 0x29
\ 0x5c
NUL 0x00
the character must be encoded as the backslash '\' character (ASCII
0x5c) followed by the two hexadecimal digits representing the ASCII
value of the encoded character. The case of the two hexadecimal
digits is not significant.
This simple escaping mechanism eliminates filter-parsing ambiguities
and allows any filter that can be represented in LDAP to be
represented as a NUL-terminated string. Other characters besides the
ones listed above may be escaped using this mechanism, for example,
non-printing characters.
For example, the filter checking whether the "cn" attribute contained
a value with the character "" anywhere in it would be represented as
"(cn=\2a*)".
Note that although both the substring and present productions in the
grammar above can produce the "attr=*" construct, this construct is
used only to denote a presence filter.