I want to check if an email address fits a pattern:
- Only letters, numbers, and '.' or '_' symbols.
- The last part (e.g. .com) must contain between 2 and 4 letters.
This is my regex: '[a-zA-Z0-9._]+@[a-zA-Z0-9._]+.[a-zA-Z]{2,4}'
The problem is that it accepts symbols like %, and .commmm is accepted as the last part. How could I solve it?
There are actually two main problems here:
You are using an unescaped . outside the character class, which matches any character (except a newline).
You are not using the anchors ^ and $, so the pattern may match a substring inside a larger string.
Use
'^[a-zA-Z0-9._]+@[a-zA-Z0-9._]+[.][a-zA-Z]{2,4}$'
(the changes are the ^ at the start, the [.] before the last part, and the $ at the end).
When you place a . inside a pair of square brackets, it matches a literal period.
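You can see both problems by running the asker's pattern and the anchored one side by side. This is a minimal sketch using Python's re module. Python is my choice here, not something from the thread, and the `check` helper and sample addresses are hypothetical.

```python
import re

# The asker's original pattern: unanchored, with a bare "." before the last part
ORIGINAL = r"[a-zA-Z0-9._]+@[a-zA-Z0-9._]+.[a-zA-Z]{2,4}"
# The answer's pattern: anchored, with the period inside a character class
FIXED = r"^[a-zA-Z0-9._]+@[a-zA-Z0-9._]+[.][a-zA-Z]{2,4}$"


def check(pattern, address):
    # re.search scans the whole string, so an unanchored pattern may match a substring
    return re.search(pattern, address) is not None


for address in ["john.doe@example.com", "jo%hn@example.com", "john@example.commmm"]:
    print(f"{address:25} original={check(ORIGINAL, address)} fixed={check(FIXED, address)}")
```

The original pattern accepts "jo%hn@example.com" because it finds the substring "hn@example.com". It also accepts ".commmm" because the bare . can consume one of the letters. The anchored version rejects both.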
I think you need ^ and $ to anchor the beginning and end of the string. The . before the last part must also be escaped (\.) so that it matches only a literal period:
'^[a-zA-Z0-9._]+@[a-zA-Z0-9._]+\.[a-zA-Z]{2,4}$'
You might want to adjust the rules slightly so that neither the local part nor the domain can start with a period:
'^\w[a-zA-Z0-9._]*@\w[a-zA-Z0-9._]*\.[a-zA-Z]{2,4}$'
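The stricter rule can be checked against a few hypothetical addresses with a small Python sketch. The language and the sample data are my own assumptions. re.ASCII is passed because Python's \w otherwise also matches non-ASCII letters and digits.

```python
import re

# Both the local part and the domain must start with a word character (a-z, A-Z, 0-9, _)
STRICT = re.compile(r"^\w[a-zA-Z0-9._]*@\w[a-zA-Z0-9._]*\.[a-zA-Z]{2,4}$", re.ASCII)

samples = {
    "john.doe@example.com": True,
    ".john@example.com": False,    # local part starts with a period
    "john@.example.com": False,    # domain starts with a period
    "john@example.commmm": False,  # last part longer than 4 letters
}

for address, expected in samples.items():
    assert bool(STRICT.match(address)) == expected, address
```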
I am a complete regex noob, so please bear with me. I tried to Google this but haven't found an answer yet.
What would be an appropriate way of writing a regular expression that matches files starting with a dot, such as .buildpath or .htaccess?
Thanks a lot!
In most regex languages, ^\. or ^[.] will match a leading dot.
The ^ matches the beginning of a string in most languages. The following matches a leading period; you need to append your filename expression to it:
^\.
Likewise, $ will match the end of a string.
You may need to substitute \ with your language's escape character. Under PowerShell, the regex I use is: ^(\.)+\/
Test case:
"../NameOfFile.txt" -match '^(\.)+\/'
works, while
"_./NameOfFile.txt" -match '^(\.)+\/'
does not.
Naturally, you may ask what is happening here.
The (\.) matches a literal ., and the + after the group repeats it one or more times.
Finally, the \/ matches a literal /, so the leading dots must be followed by a path separator, as in ../NameOfFile.txt.
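For a quick check outside PowerShell, here is the same test translated to Python. This is a sketch under my own assumptions: Python instead of PowerShell, and a hypothetical `matches` helper standing in for -match.

```python
import re

# Python equivalent of the PowerShell pattern ^(\.)+\/ ; a raw string passes the
# backslashes to the regex engine unchanged, as a single-quoted PowerShell string does
LEADING_DOTS_THEN_SLASH = re.compile(r"^(\.)+\/")


def matches(path):
    # re.search mirrors -match, which looks for the pattern anywhere in the string
    return LEADING_DOTS_THEN_SLASH.search(path) is not None


print(matches("../NameOfFile.txt"))  # True
print(matches("_./NameOfFile.txt"))  # False
```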
It depends a bit on the regular expression library you use, but you can do something like this:
^\.\w+
The ^ anchors the match to the beginning of the string, the \. matches a literal period (since an unescaped . in a regular expression typically matches any character), and \w+ matches 1 or more "word" characters (alphanumeric plus _).
See the perlre documentation for more info on Perl-style regular expressions and their syntax.
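Here is a small illustration of ^\.\w+ on a hypothetical list of names, using Python's re (my choice of library). The pattern only has to match a prefix, so a name like .git-blame is accepted even though "-" is not a word character. A bare ".." is rejected because \w+ needs at least one word character after the dot.

```python
import re

DOTFILE = re.compile(r"^\.\w+")

names = [".htaccess", ".buildpath", "readme.txt", "..", ".git-blame"]
# re.match already anchors at the start; the ^ keeps the pattern portable to other engines
dotfiles = [n for n in names if DOTFILE.match(n)]
print(dotfiles)  # ['.htaccess', '.buildpath', '.git-blame']
```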
It depends on what characters are legal in a filename, which depends on the OS and filesystem.
For example, in Windows that would be:
^\.[^<>:"/\\\|\?\*\x00-\x1f]+$
The above expression means:
Match a string starting with the literal character .
followed by one or more characters, up to the end of the string, none of which is invalid in a Windows filename (< > : " / \ | ? * and the control characters \x00-\x1f).
I used this as a reference for which characters are disallowed in filenames.
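Here is a sketch of the Windows-oriented pattern in Python, with hypothetical sample names. fullmatch is used because Python's $ also matches just before a trailing newline, and that would let ".abc\n" through.

```python
import re

# The answer's pattern, copied into a raw string
WINDOWS_DOTFILE = re.compile(r'^\.[^<>:"/\\\|\?\*\x00-\x1f]+$')

for name in [".htaccess", ".my config", ".bad|name", ".", ".tab\tname"]:
    # fullmatch requires the whole name to be consumed, including any trailing newline
    print(repr(name), WINDOWS_DOTFILE.fullmatch(name) is not None)
```

Spaces are allowed, but "|", a lone "." and control characters such as a tab (\x09) are rejected.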
To match a string starting with a dot in Java, you can write a simple expression (the backslash is doubled because it appears inside a Java string literal):
^\\..*
^ means the regular expression is matched from the start of the string
\. means the string starts with a literal "."
.* means the dot is followed by zero or more characters
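The Java source "^\\..*" reaches the regex engine as ^\..*, which in a Python raw string needs only one backslash. In this Python sketch (my substitution for Java), fullmatch stands in for Java's String.matches(), which must consume the whole input. Note that a lone "." also matches, because .* may match zero characters.

```python
import re

STARTS_WITH_DOT = re.compile(r"^\..*")

for name in [".htaccess", ".", "file.txt"]:
    print(name, STARTS_WITH_DOT.fullmatch(name) is not None)
```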
We have a problem with a regular expression in Hive.
We need to exclude numbers that start with +37 or 0037 (these should give a false result in regexp_like), and we only want numbers without letters or spaces.
We're trying with this one:
regexp_like(tel_number,'^\+37|^0037+[a-zA-ZÀÈÌÒÙ ]')
but it doesn't work.
Edit: we want the SELECT to return true (valid number) or false.
To exclude numbers that start with +01, +001 or +0001 and contain only digits, without spaces or letters:
... WHERE tel_number NOT rlike '^\\+0{1,3}1\\d+$'
In Hive, special characters like + and character classes like \d should be escaped with a double backslash: \\+ and \\d.
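To see which rows the WHERE clause keeps, the pattern can be emulated in Python. The sample numbers are hypothetical. The Hive literal '^\\+0{1,3}1\\d+$' reaches the regex engine as ^\+0{1,3}1\d+$, which a Python raw string writes with single backslashes. Hive's RLIKE is true if the pattern matches any substring, so re.search is the closer analogue; the anchors make it a whole-string test anyway.

```python
import re

EXCLUDED = re.compile(r"^\+0{1,3}1\d+$")

numbers = ["+0123456", "+00123456", "+000123456", "+0000123456", "+01 23456", "+4412345"]
# Emulates: WHERE tel_number NOT rlike '^\\+0{1,3}1\\d+$'
kept = [n for n in numbers if not EXCLUDED.search(n)]
print(kept)  # ['+0000123456', '+01 23456', '+4412345']
```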
The general question is whether you want to describe a malformed telephone number in your regex and exclude everything that matches the pattern, or describe a well-formed telephone number and include everything that matches the pattern.
Which way to go depends on your scenario. From what I understand of your requirements, describing a well-formed telephone number and adding "not starting with 0037 or +37" as a condition could be a good approach.
The pattern would be like this:
Your number can start with either + or 00: ^(\+|00)
It cannot be followed by 37, which in regex can be expressed by the following set of alternatives:
a. It is followed first by a 3 then by anything but 7: 3[0-689]
b. It is followed first by anything but 3 then by any number: [0-24-9]\d
After that there is a sequence of numbers of undefined length (at least one) until the end of the string: \d+$
Putting everything together:
^(\+|00)(3[0-689]|[0-24-9]\d)\d+$
You can play with this regex here and see if this fits your needs: https://regex101.com/r/KK5rjE/3
Note: as leftjoin pointed out, to use this regex in Hive you might need to additionally escape the backslashes \ in the pattern.
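Here is a quick Python check of the combined pattern on a few made-up numbers. Python stands in for Hive, so the backslashes are single, not doubled.

```python
import re

WELL_FORMED = re.compile(r"^(\+|00)(3[0-689]|[0-24-9]\d)\d+$")

cases = ["+3912345678", "00391234", "+37612345", "0037612345", "+39 123456", "3912345"]
results = [WELL_FORMED.search(c) is not None for c in cases]
print(results)  # [True, True, False, False, False, False]
```

Numbers with +37 or 0037 fail the second group. Embedded spaces fail \d+$. A number without a leading + or 00 also fails, because the prefix is required by design.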
You can use
regexp_like(tel_number,'^(?!\\+37|0037)\\+?\\d+$')
See the regex demo. Details:
^ - start of string
(?!\+37|0037) - a negative lookahead that fails the match if there is +37 or 0037 immediately to the right of the current location
\+? - an optional + sign
\d+ - one or more digits
$ - end of string.
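The negative lookahead can be tried in Python with single backslashes in a raw string. The test numbers are hypothetical. One consequence worth checking against your data: a number such as 3712345 with no + or 00 prefix still passes, because the lookahead only rejects the literal prefixes +37 and 0037.

```python
import re

VALID = re.compile(r"^(?!\+37|0037)\+?\d+$")

cases = ["+3912345", "0039123", "+37123", "0037123", "123 456", "12a45", "3712345"]
results = [VALID.search(c) is not None for c in cases]
print(results)  # [True, True, False, False, False, False, True]
```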
I am using ANTLR to parse a language that uses the colon both as a comment indicator and as part of the := ('becomes equal to') assignment. For example, in the line
Index := 2 :Set Index
I need to recognize the first part as an assignment statement and the text after the second colon as a comment. Currently I do this using the rule:
COMMENT : ':'+ ~[:='\r\n']*;
This seems to work OK except when the colon is immediately followed by a newline, e.g. in the line
Index := 2 :
the newline occurs immediately after the second colon. In this case the comment is not recognized and the rest of the code is not parsed in the correct context. If there is a single space after the second colon the line is parsed correctly.
I expected the '\r\n' to cope with this, but it only seems to work if there is at least one character after the comment symbol. Have I missed something in the rule?
The square brackets denote a set of characters written without any quotes. Hence your '\r\n' literal doesn't work there (you should have got a warning that the apostrophe is included more than once in the character set).
Define the comment like this instead:
COMMENT: ':'+ ~[:=\n\r]*;
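The following Python sketch does not run ANTLR. It roughly imitates the ANTLR 4 lexer's rule of taking the longest match, with the first-listed rule winning ties, to show that the corrected COMMENT rule also produces a token for a colon directly before a newline, while := is still lexed as an assignment. Only COMMENT comes from the thread. ASSIGN, ID, INT and WS are hypothetical rules added so the sample lines can be tokenized.

```python
import re

# Only COMMENT is from the thread; the other rules are hypothetical
TOKEN_RULES = [
    ("ASSIGN", re.compile(r":=")),
    ("COMMENT", re.compile(r":+[^:=\n\r]*")),  # regex analogue of  ':'+ ~[:=\n\r]*
    ("ID", re.compile(r"[A-Za-z]+")),
    ("INT", re.compile(r"[0-9]+")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text):
    """Longest match wins; on equal length the earlier rule wins (like ANTLR 4)."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, pattern in TOKEN_RULES:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        name, m = best
        if name != "WS":  # skip whitespace, as a lexer's skip command would
            tokens.append((name, m.group()))
        pos = m.end()
    return tokens


print(tokenize("Index := 2 :Set Index\n"))
print(tokenize("Index := 2 :\n"))
```

At ":=" the COMMENT rule can only match the single ":" (since "=" is excluded from its set), so the two-character ASSIGN match is longer and wins.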
I'm looking for a regular expression to match the pattern xxxx/xxxxU, where x can be 0-9 and the "U" at the end is optional.
Valid examples: 1111/1111, 1111/1111U
Invalid examples: 1111/1111Z, 111/1111
I got as far as '[0-9]{4}/[0-9]{4}', but I'm not sure how to handle the optional "U" at the end.
Always match numbers-slash-numbers, with the U included if present:
[0-9]{4}/[0-9]{4}U?
or if you replace the number ranges with \d (for a digit character):
\d{4}/\d{4}U?
The ? means zero or one of the preceding character, so zero or one U. Test it
The entire string 1111/2222U will be matched, while the match for 1111/2222Z will include the digits-slash-digits part but not the Z.
Only match if string ends in a digit or U:
If a string fragment ending in any letter other than U is not to be matched at all, try something like:
^\d{4}/\d{4}U?$
which matches if the numbers-slash-numbers plus optional U is the only content in the string (test it) or
\d{4}/\d{4}U?(\s|$)
which matches if the numbers-slash-numbers plus optional U is followed by either a white space character (included in the match) or the end of the string. (Test it.)
(Note: the "test it" links show the "/" between the numbers escaped with "\" [e.g. "\/"], which that implementation requires. I'm not familiar with Oracle's regex syntax, so this may not be required on that platform.)
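The three variants behave differently on a trailing Z. Here is a side-by-side in Python, a language I chose for illustration, with hypothetical inputs. In Python's re the / between the numbers needs no escaping.

```python
import re

UNANCHORED = re.compile(r"\d{4}/\d{4}U?")
WHOLE_STRING = re.compile(r"^\d{4}/\d{4}U?$")
FOLLOWED_BY_SPACE_OR_END = re.compile(r"\d{4}/\d{4}U?(\s|$)")

print(UNANCHORED.search("1111/2222Z").group())                          # 1111/2222
print(WHOLE_STRING.search("1111/2222Z"))                                # None
print(repr(FOLLOWED_BY_SPACE_OR_END.search("1111/2222U next").group()))  # '1111/2222U '
```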
I'd use '[0-9]{4}/[0-9]{4}U?' or '[0-9]{4}/[0-9]{4}U{0,1}'
Found at: http://docs.oracle.com/cd/B12037_01/appdev.101/b10795/adfns_re.htm
You could try this expression:
[0-9]{4}/[0-9]{4}U?
The ? means: optional (0 or 1). Have a look at this useful regex overview table.
Thanks for all your comments, but there were some cases which your answers did not cover.
'^[0-9]{4}/[0-9]{4}U?$'
This works for all of my cases above.
I am trying to construct a regular expression to extract the text from the following variations:
NSLocalizedString(@"TEXT")
NSLocalizedStringFromTable(@"TEXT")
NSLocalizedStringWithDefaultValue(@"TEXT")
...
The goal is to extract TEXT. I have been able to construct a regex for each individual function or macro, e.g., (?<=NSLocalizedString)\(@"(.*?)". However, I am looking for a solution that does the job no matter what the name of the function is, as long as it starts with NSLocalizedString.
I assumed it was as simple as (?<=NSLocalizedString\w+)\(@"(.*?)", but that doesn't seem to do the trick.
How about this one?
/NSLocalizedString\w*\(@"(.*)"\)/
Explanation:
NSLocalizedString   'NSLocalizedString'
\w*                 word characters (a-z, A-Z, 0-9, _) (0 or more times, matching the most amount possible)
\(                  '('
@"                  '@"'
(                   group and capture to \1:
  .*                any character except \n (0 or more times, matching the most amount possible)
)                   end of \1
"                   '"'
\)                  ')'
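The "matching the most amount possible" notes matter when a line holds more than one call. This small Python check uses my choice of language and made-up sample lines. It shows the greedy .* running past the first closing quote, while the lazy .*? (as in the asker's own attempt) stops there.

```python
import re

GREEDY = re.compile(r'NSLocalizedString\w*\(@"(.*)"\)')
LAZY = re.compile(r'NSLocalizedString\w*\(@"(.*?)"\)')

line = 'NSLocalizedStringFromTable(@"Title")'
two_calls = 'a = NSLocalizedString(@"OK"); b = NSLocalizedString(@"Cancel");'

print(GREEDY.findall(line))       # ['Title']
print(GREEDY.findall(two_calls))  # ['OK"); b = NSLocalizedString(@"Cancel']
print(LAZY.findall(two_calls))    # ['OK', 'Cancel']
```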
The only reason your regex doesn't work is that the regex engine doesn't support variable-length lookbehinds. The (?<=NSLocalizedString\w+) is variable-length, so it can't be used.
Firstly, it needs to be \w* rather than \w+, so that your first example string (with nothing after NSLocalizedString) matches.
If you move the \w* outside the lookbehind, (?<=NSLocalizedString)\w*, it will work just fine.
Alternatively, since you have to use a capturing group to grab the text value anyway, there's no need for the lookbehind at all. Change the (?<= to (?: and it becomes a non-capturing group (which can be variable-length); then just grab your text value from group 1.
Your attempt was:
(?<=NSLocalizedString\w+)\(@"(.*?)"
Either of these minor changes should make it work:
(?<=NSLocalizedString)\w*\(@"(.*?)"
(?:NSLocalizedString\w*)\(@"(.*?)"
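Python's built-in re module happens to share this limitation, so it makes a convenient test bed. This is a sketch in Python, not Objective-C, with a made-up input string. The variable-length lookbehind is rejected at compile time, and both suggested fixes extract the text from group 1.

```python
import re

text = 'NSLocalizedStringWithDefaultValue(@"Hello")'

# Like the engine in the question, Python's re rejects a variable-length lookbehind
try:
    re.compile(r'(?<=NSLocalizedString\w+)\(@"(.*?)"')
except re.error as exc:
    print("rejected:", exc)

# The two fixes from the answer
fix_lookbehind = re.compile(r'(?<=NSLocalizedString)\w*\(@"(.*?)"')
fix_noncapturing = re.compile(r'(?:NSLocalizedString\w*)\(@"(.*?)"')

print(fix_lookbehind.search(text).group(1))    # Hello
print(fix_noncapturing.search(text).group(1))  # Hello
```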
The following is actually not supported in Objective-C:
The solution that will extract exactly TEXT without using any groups is:
NSLocalizedString\w*\(@"\K[^"]*
It avoids the need for a lookbehind (which can't be used for reasons I explain below) by using \K, which discards everything matched before it from the reported match.