I want to write regex that:
pass string starting with /scripts/my_app/ (/scripts/my_app/xxxx.sh or /scripts/my_app/yyyyy/xxxz.sh)
exclude another strings starting with /scripts/ (/scripts/another_string/xxxx.sh or /scripts/some_string/yyyyy/xxxz.sh)
pass any others strings like /xxxx/yyyy/zzzzz/qqqq.sh or /aaaa/bbbb/ccc.sh
Something like this ^[(\/scripts\/my_app\/)|(?!\/scripts\/)].* doesn't work.
You can use
^/(?:scripts/my_app/|(?!scripts/)).*
Explanation
^/ Start of string and match /
(?: Non capture group
scripts/my_app/ Match literally
| Or
(?!scripts/) Negative lookahead, assert not scripts/
) Close non capture group
.* Match 0+ times any character
Regex demo
Another option is using a double negative lookahead
^/(?!scripts/(?!my_app/)).*
Regex demo
Related
I am a complete Reg-exp noob, so please bear with me. Tried to google this, but haven't found it yet.
What would be an appropriate way of writing a Regular expression matching files starting with a dot, such as .buildpath or .htaccess?
Thanks a lot!
In most regex languages, ^\. or ^[.] will match a leading dot.
The ^ matches the beginning of a string in most languages. This will match a leading .. You need to add your filename expression to it.
^\.
Likewise, $ will match the end of a string.
You may need to substitute the \ for the respective language escape character. However, under Powershell the Regex I use is: ^(\.)+\/
Test case:
"../NameOfFile.txt" -match '^(\\.)+\\\/'
works, while
"_./NameOfFile.txt" -match '^(\\.)+\\\/'
does not.
Naturally, you may ask, well what is happening here?
The (\\.) searches for the literal . followed by a +, which matches the previous character at least once or more times.
Finally, the \\\/ ensures that it conforms to a Window file path.
It depends a bit on the regular expression library you use, but you can do something like this:
^\.\w+
The ^ anchors the match to the beginning of the string, the \. matches a literal period (since an unescaped . in a regular expression typically matches any character), and \w+ matches 1 or more "word" characters (alphanumeric plus _).
See the perlre documentation for more info on Perl-style regular expressions and their syntax.
It depends on what characters are legal in a filename, which depends on the OS and filesystem.
For example, in Windows that would be:
^\.[^<>:"/\\\|\?\*\x00-\x1f]+$
The above expression means:
Match a string starting with the literal character .
Followed by at least one character which is not one of (whole class of invalid chars follows)
I used this as reference regarding which chars are disallowed in filenames.
To match the string starting with dot in java you will have to write a simple expression
^\\..*
^ means regular expression is to be matched from start of string
\. means it will start with string literal "."
.* means dot will be followed by 0 or more characters
We have a problem with a regular expression on hive.
We need to exclude the numbers with +37 or 0037 at the beginning of the record (it could be a false result on the regex like) and without letters or space.
We're trying with this one:
regexp_like(tel_number,'^\+37|^0037+[a-zA-ZÀÈÌÒÙ ]')
but it doesn't work.
Edit: we want it to come out from the select as true (correct number) or false.
To exclude numbers which start with +01 0r +001 or +0001 and having only digits without spaces or letters:
... WHERE tel_number NOT rlike '^\\+0{1,3}1\\d+$'
Special characters like + and character classes like \d in Hive should be escaped using double-slash: \\+ and \\d.
The general question is, if you want to describe a malformed telephone number in your regex and exclude everything that matches the pattern or if you want to describe a well-formed telephone number and include everything that matches the pattern.
Which way to go, depends on your scenario. From what I understand of your requirements, adding "not starting with 0037 or +37" as a condition to a well-formed telephone number could be a good approach.
The pattern would be like this:
Your number can start with either + or 00: ^(\+|00)
It cannot be followed by a 37 which in regex can be expressed by the following set of alternatives:
a. It is followed first by a 3 then by anything but 7: 3[0-689]
b. It is followed first by anything but 3 then by any number: [0-24-9]\d
After that there is a sequence of numbers of undefined length (at least one) until the end of the string: \d+$
Putting everything together:
^(\+|00)(3[0-689]|[0-24-9]\d)\d+$
You can play with this regex here and see if this fits your needs: https://regex101.com/r/KK5rjE/3
Note: as leftjoin has pointed out: To use this regex in hive you might need to additionally escape the backslashes \ in the pattern.
You can use
regexp_like(tel_number,'^(?!\\+37|0037)\\+?\\d+$')
See the regex demo. Details:
^ - start of string
(?!\+37|0037) - a negative lookahead that fails the match if there is +37 or 0037 immediately to the right of the current location
\+? - an optional + sign
\d+ - one or more digits
$ - end of string.
I have two types of URL's which I would need to clean, they look like this:
["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
["//www.xxx.com/se/car?p_color_car=White?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
The outcome I want is;
SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"
I want to remove the brackets and everything up to SE, the URLS differ so I want to remove:
First URL
["//xxx.com/se/something?
Second URL:
["//www.xxx.com/se/car?p_color_car=White?
I can't get my head around it,I've tried this .*\/ . But it will still keep strings I don't want such as:
(1 url) =
something?
(2 url) car?p_color_car=White?
You can use
regexp_replace(FinalUrls, r'.*\?|"\]$', '')
See the regex demo
Details
.*\? - any zero or more chars other than line breakchars, as many as possible and then ? char
| - or
"\]$ - a "] substring at the end of the string.
Mind the regexp_replace syntax, you can't omit the replacement argument, see reference:
REGEXP_REPLACE(value, regexp, replacement)
Returns a STRING where all substrings of value that match regular
expression regexp are replaced with replacement.
You can use backslashed-escaped digits (\1 to \9) within the
replacement argument to insert text matching the corresponding
parenthesized group in the regexp pattern. Use \0 to refer to the
entire matching text.
I would like to write a regular expression that starts with the string "wp" and ends with the string "php" to locate a file in a directory. How do I do it?
Example file: wp-comments-post.php
This should do it for you ^wp.*php$
Matches
wp-comments-post.php
wp.something.php
wp.php
Doesn't match
something-wp.php
wp.php.txt
^wp.*\.php$ Should do the trick.
The .* means "any character, repeated 0 or more times". The next . is escaped because it's a special character, and you want a literal period (".php"). Don't forget that if you're typing this in as a literal string in something like C#, Java, etc., you need to escape the backslash because it's a special character in many literal strings.
Example:
ajshdjashdjashdlasdlhdlSTARTasdasdsdaasdENDaknsdklansdlknaldknaaklsdn
1) START\w*END
return: STARTasdasdsdaasdEND - will give you words between START and END
2) START\d*END
return: START12121212END - will give you numbers between START and END
3) START\d*_\d*END
return: START1212_1212END - will give you numbers between START and END having _
I want to do a find/replace on the string matching (abcdefg)1 with \12, but it doesn't work. How do I do it?
Add a leading zero to the backreference:
\012
References
Example: Regular Expression Matching a Valid Date
How do you get a literal digit after a backreference in the search pattern