why ldap search return all results when using %? - ldap

When I search one ldap server using the following filter
(cn=%*)
It return all results under the base dn? LDAP treat '%' specially? But I haven't found any description about it.

What is your directory server ?
Are you sure tha '%' is not replace by your command line interpreter or your compiler ?
According to RFC2254 % is not a special character
If a value should contain any of the following characters
Character ASCII value
---------------------------
* 0x2a
( 0x28
) 0x29
\ 0x5c
NUL 0x00
the character must be encoded as the backslash '\' character (ASCII
0x5c) followed by the two hexadecimal digits representing the ASCII
value of the encoded character. The case of the two hexadecimal
digits is not significant.
This simple escaping mechanism eliminates filter-parsing ambiguities
and allows any filter that can be represented in LDAP to be
represented as a NUL-terminated string. Other characters besides the
ones listed above may be escaped using this mechanism, for example,
non-printing characters.
For example, the filter checking whether the "cn" attribute contained
a value with the character "" anywhere in it would be represented as
"(cn=\2a*)".
Note that although both the substring and present productions in the
grammar above can produce the "attr=*" construct, this construct is
used only to denote a presence filter.

Related

regex capture middle of url

I'm trying to figure out the base regex to capture the middle of a google url out of a sql database.
For example, a few links:
https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234
https://www.google.com/cars/?year=2014&model=jeep+cherokee+crossover&id=6789
What would be the regex to capture the text to get dodge+durango , or jeep+cherokee+crossover ? (It's alright that the + still be in there.)
My Attempts:
1)
\b[=.]\W\b\w{5}\b[+.]?\w{7}
, but this clearly does not work as this is a hard coded scenario that would only work like something for the dodge durango example. (would extract "dodge+durango)
2) Using positive lookback ,
[^+]( ?=&id )
but I am not fully sure how to use this, as this only grabs one character behind the & symbol.
How can I extract a string of (potentially) any length with any amount of + delimeters between the "model=" and "&id" boundaries?
seems like you could use regexp_replace and access match groups:
regexp_replace(input, 'model=(.*?)([&\\s]|$)', E'\\1')
from here:
The regexp_replace function provides substitution of new text for
substrings that match POSIX regular expression patterns. It has the
syntax regexp_replace(source, pattern, replacement [, flags ]). The
source string is returned unchanged if there is no match to the
pattern. If there is a match, the source string is returned with the
replacement string substituted for the matching substring. The
replacement string can contain \n, where n is 1 through 9, to indicate
that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be
inserted. Write \ if you need to put a literal backslash in the
replacement text. The flags parameter is an optional text string
containing zero or more single-letter flags that change the function's
behavior. Flag i specifies case-insensitive matching, while flag g
specifies replacement of each matching substring rather than only the
first one
I may be misunderstanding, but if you want to get the model, just select everything between model= and the ampersand (&).
regexp_matches(input, 'model=([^&]*)')
model=: Match literally
([^&]*): Capture
[^&]*: Anything that isn't an ampersand
*: Unlimited times

Oracle regexp_like pattern with POSIX character class

I have username in this pattern
ref_2_34_aaa_dos
ref_2_34_bbb_dos
How can I use regexp_like for this?
SELECT username FROM all_users WHERE regexp_like(username, '^ref_2_34_[:alpha:]_dos$')
does not work. Nor can I use ESCAPE '\' with regexp_like. it would give a syntax error.
You need to put the POSIX character class [:alpha:] into a bracket expression (i.e. [...]) and apply a + quantifier to it:
regexp_like(username, '^ref_2_34_[[:alpha:]]+_dos$')
The + quantifier means there can be 1 or more letters between the last but one and last underscores.
If your string may have no user name at that location (it is empty), and you want to get those entries too, you would need to replace the plus with the * quantifier that matches zero or more occurrences of the quantified subpattern.
Since comments require some more clarifications, here is some bracket expression and POSIX character class reference.
Bracket expressions
Matches any single character in the list within the brackets. The following operators are allowed within the list, but other metacharacters included are treated as literals:
Range operator: -
POSIX character class: [: :]
POSIX collation element: [. .]
POSIX character equivalence class: [= =]
A dash (-) is a literal when it occurs first or last in the list, or as an ending range point in a range expression, as in [#--]. A right bracket (]) is treated as a literal if it occurs first in the list.
POSIX Character Class (can be a part of the bracket expression):
[:class:] - Matches any character belonging to the specified POSIX character class. You can use this operator to search for characters with specific formatting such as uppercase characters, or you can search for special characters such as digits or punctuation characters. The full set of POSIX character classes is supported.
...
The expression [[:upper:]]+ searches for one or more consecutive uppercase characters.
Bracket expressions can be considered a kind of a "container" construct for multiple atoms, that, as a whole regex unit, matches some class of characters you defined. If you need to match a <, or >, or letters, you may combine them into 1 bracket expression [<>[:alpha:]]. To match zero or more of <, > or letters, add a * quantifier after ]: [<>[:alpha:]]*.
Or, to imitate a trailing word boundary, one might use [^_[:alnum:]] (say, in a ($|[^_[:alnum:]]) pattern) that matches any character but a _, digits and letters ([:alnum:] matches alphanumerical symbols).

Regular Expression for alphanumeric and some special characters not adjacent

I would like to have a regular expression to make an Oracle SQL REGEXP_LIKE query that checks
if a string starts with one alphanumeric character
if the string ends with one alphanumeric character
if the "body" of the string contains only alphanumeric character OR these authorized characters (written) : hyphen (dash), dot, apostrophe, space
if the authorised characters are NOT adjacent (to avoid something like "he--'''l..'-lo")
I started with this :
^[a-zA-Z0-9]+(a-zA-Z0-9\-\.'|([^\-\.'])\1)*[a-zA-Z0-9]$
I used backslash to escape assuming that dot and hyphen are metacharacters
I think this is what you want:
^[a-zA-Z0-9]+([-.' ][a-zA-Z0-9]|[a-zA-Z0-9])*\w?$
It looks for
at least 1 alphanumeric (alnum),
followed by
either an authorized character followed by an alphanumeric or just an alphanumeric, repeated any number of times (including 0).
optionally followed by
an alnum
This meets your specification. I'm not sure if starts with one alnum and ends with one alnum means that there must be at least 2 alnums, or if they can be the same. If there must be at least 2 of them, remove the last ? (which make the last alnum optional).
Regards
assuming you meant "authorised characters are NOT adjacent to each other"
try something along these lines
^[a-zA-Z0-9]+([a-zA-Z0-9]+[\-\.' ]?)*[a-zA-Z0-9]$
so that the repeating middle part always has one alphanumeric character followed by zero to one special characters.

Approximate search with openldap

I am trying to write a search that queries our directory server running openldap.
The users are going to be searching using the first or last name of the person they're interested in.
I found a problem with accented characters (like áéíóú), because first and last names are written in Spanish, so while the proper way is Pérez it can be written for the sake of the search as Perez, without the accent.
If I use '(cn=*Perez*)' I get only the non-accented results.
If I use '(cn=*Pérez*)' I get only accented results.
If I use '(cn=~Perez)' I get weird results (or at least nothing I can use, because while the results contain both Perez and Pérez ocurrences, I also get some results that apparently have nothing to do with the query...
In Spanish this happens quite a lot... be it lazyness, be it whatever you want to call it, the fact is that for this kind of thing people tend NOT to write the accents because it's assumend all these searches work with both options (I guess since Google allowes it, everybody assumes it's supposed to work that way).
Other than updating the database and removing all accents and trimming them on the query... can you think of another solution?
You have your ~ and = swapped above. It should be (cn~=Perez). I still don't know how well that will work. Soundex has always been strange. Since many attributes are multi-valued including cn you could store a second value on the attribute that has the extended characters converted to their base versions. You would at least have the original value to still go off of when you needed it. You could also get real fancy and prefix the converted value with something and use the valuesReturnFilter to filter it out from your results.
#Sample object
dn:cn=Pérez,ou=x,dc=y
cn:Pérez
cn:{stripped}Perez
sn:Pérez
#etc.
Then modify your query to use an or expression.
(|(cn=Pérez)(cn={stripped}Perez))
And you would include a valuesReturnFilter that looked like
(!(cn={stripped}*))
See RFC3876 http://www.networksorcery.com/enp/rfc/rfc3876.txt for details. The method for adding a request control varies by what platform/library you are using to access the directory.
Search filters ("queries") are specified by RFC2254.
Encoding:
RFC2254
actually requires filters (indirectly defined) to be an
OCTET STRING, i.e. ASCII 8-byte String:
AttributeValue is OCTET STRING,
MatchingRuleId
and AttributeDescription
are LDAPString, LDAPString is an OCTET STRING.
The standard on escaping: Use "<ASCII HEX NUMBER>" to replace special characters
(https://www.rfc-editor.org/rfc/rfc4515#page-4, examples https://www.rfc-editor.org/rfc/rfc4515#page-5).
Quote:
The <valueencoding> rule ensures that the entire filter string is a
valid UTF-8 string and provides that the octets that represent the
ASCII characters "*" (ASCII 0x2a), "(" (ASCII 0x28), ")" (ASCII
0x29), "\" (ASCII 0x5c), and NUL (ASCII 0x00) are
represented as a backslash "\" (ASCII 0x5c) followed by the two hexadecimal digits
representing the value of the encoded octet.
Additionally, you should probably replace all characters that semantically modify the filter (RFC 4515's grammar gives a list), and do a Regex replace of non-ASCII characters with wildcards (*) to be sure. This will also help you with characters like "é".

RegEx to find % symbols in a string that don't form the start of a legal two-digit escape sequence?

I would like a regular expression to find the %s in the source string that don't form the start of a valid two-hex-digit escaped character (defined as a % followed by exactly two hexadecimal digits, upper or lower case) that can be used to replace only these % symbols with %25.
(The motivation is to make the best guess attempt to create legally escaped strings from strings of various origins that may be legally percent escaped and may not, and may even be a mixture of the two, without damaging the data intent if the original string was already correctly encoded, e.g. by blanket re-encoding).
Here's an example input string.
He%20has%20a%2050%%20chance%20of%20living%2C%20but%20there%27s%20only%20a%2025%%20chance%20of%20that.
This doesn't conform to any encoding standard because it is a mix of valid escaped characters eg. %20 and two loose percentage symbols. I'd like to convert those %s to %25s.
My progress so far is to identify a regex %[0-9a-z]{2} that finds the % symbols that are legal but I can't work out how to modify it to find the ones that aren't legal.
%(?![0-9a-fA-F]{2})
Should do the trick. Use a look-ahead to find a % NOT followed by a valid two-digit hexadecimal value then replace the found % symbol with your %25 replacement.
(Hopefully this works with (presumably) NSRegularExpression, or whatever you're using)
%(?![a-fA-F0-9]{2})
That's a percent followed by a negative lookahead for two hex digits.