Regular Expression Match Filenames - sql

I have a small SSIS package in which I am trying to match file names with a VB Regex and delete the matching files. My regex looks like this, ^RegZStmntAdj.[A-z0-9_].\.txt$, and I am trying to figure out why it won't match any of the files in the directory. I believe this is valid syntax.
RegZStmntAdj2_07272011.txt
RegZStmntAdj1_07272011.txt
RegZStmntAdj2_07272011.txxt
New Text Document.txt
If I run the regex with ^RegZStmntAdj.*.\.txt$, it matches the correct files and deletes them. I know * works, but I would like to learn to make more precise Regular Expressions.
RegZStmntAdj2_07272011.txt
RegZStmntAdj1_07272011.txt

"^RegZStmntAdj.[A-z0-9_].\.txt$" matches
literal RegZStmntAdj at BOS
one character (except \n)
one character of the A-z, 0-9, and _ set
one character (except \n)
a dot
literal txt at EOS
but your typical infix "2_07272011" surely has more than 3 characters. Try
"^RegZStmntAdj[A-Za-z0-9_]+\.txt$" instead.

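To see the difference between the two patterns, here is a minimal Python sketch (Python's re module is used only for illustration; the question itself uses VB Regex, and the variable names are made up). It runs both patterns over the file names from the question.

```python
import re

# File names taken from the question.
names = [
    "RegZStmntAdj2_07272011.txt",
    "RegZStmntAdj1_07272011.txt",
    "RegZStmntAdj2_07272011.txxt",
    "New Text Document.txt",
]

# Original pattern: allows exactly three characters between the prefix and ".txt".
original = re.compile(r"^RegZStmntAdj.[A-z0-9_].\.txt$")
# Suggested fix: one or more letters, digits or underscores in the middle.
fixed = re.compile(r"^RegZStmntAdj[A-Za-z0-9_]+\.txt$")

original_hits = [n for n in names if original.search(n)]
fixed_hits = [n for n in names if fixed.search(n)]

print(original_hits)
print(fixed_hits)
```

Note also that the range A-z covers more than letters: it includes the punctuation characters that sit between Z and a in ASCII, which is why A-Za-z is the safer spelling.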
Try the following Regex:
^RegZStmntAdj.[\w_]{9}\.txt$
I've used \w, which matches word characters (roughly A-Za-z0-9 plus the underscore, so the explicit _ in the class is redundant but harmless), and told it to match 9 characters so that it will match the _<date> part of your filename. Your class matched only a single character there (i.e. the underscore).
Using Powershell to verify:
PS> $test = "^RegZStmntAdj.[\w_]{9}\.txt$"
PS> "RegZStmntAdj2_07272011.txt" -match $test
True
PS> "RegZStmntAdj1_07272011.txt" -match $test
True
PS> "RegZStmntAdj2_07272011.txxt" -match $test
False # (Correct, as the extension contains two x's)
PS> "New Text Document.txt" -match $test
False # (Correct as nowhere near a match!!)
To make your regex even more precise, you could use ^RegZStmntAdj\d_[\d]{8}\.txt$, which translates to:
A string starting with "RegZStmntAdj", then a digit, then an
underscore, then 8 digits, then ending in ".txt"
which I believe is what you are looking for.
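As a rough illustration of the stricter pattern in a delete step, here is a Python sketch. The function name delete_matching and the throwaway temporary directory are assumptions made for the demo, not part of the SSIS package.

```python
import re
import tempfile
from pathlib import Path

# Prefix, one digit, underscore, eight digits, then ".txt".
STRICT = re.compile(r"^RegZStmntAdj\d_\d{8}\.txt$")

def delete_matching(directory):
    """Delete files whose names match STRICT; return the deleted names, sorted."""
    deleted = []
    for path in directory.iterdir():
        if path.is_file() and STRICT.match(path.name):
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)

# Demo in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in [
        "RegZStmntAdj2_07272011.txt",
        "RegZStmntAdj1_07272011.txt",
        "RegZStmntAdj2_07272011.txxt",
        "New Text Document.txt",
    ]:
        (root / name).touch()
    deleted = delete_matching(root)
    remaining = sorted(p.name for p in root.iterdir())

print(deleted)
print(remaining)
```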

Related

Trino regexp_replace this character in the beginning but not in the middle Trino [duplicate]

I am a complete Reg-exp noob, so please bear with me. Tried to google this, but haven't found it yet.
What would be an appropriate way of writing a Regular expression matching files starting with a dot, such as .buildpath or .htaccess?
Thanks a lot!
In most regex languages, ^\. or ^[.] will match a leading dot.
The ^ matches the beginning of a string in most languages, so the pattern below matches a leading dot. You need to append your filename expression to it.
^\.
Likewise, $ will match the end of a string.
You may need to substitute the \ for the respective language escape character. Under PowerShell, where a single-quoted string needs no extra escaping, the regex I use is: ^(\.)+\/
Test case:
"../NameOfFile.txt" -match '^(\.)+\/'
works, while
"_./NameOfFile.txt" -match '^(\.)+\/'
does not.
Naturally, you may ask what is happening here.
The (\.) matches a literal ., and the + that follows repeats that group one or more times.
Finally, the \/ matches a literal forward slash, so the string must start with one or more dots followed by a path separator (Windows accepts / as a separator too).
It depends a bit on the regular expression library you use, but you can do something like this:
^\.\w+
The ^ anchors the match to the beginning of the string, the \. matches a literal period (since an unescaped . in a regular expression typically matches any character), and \w+ matches 1 or more "word" characters (alphanumeric plus _).
See the perlre documentation for more info on Perl-style regular expressions and their syntax.
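A quick way to try this pattern is a short Python sketch (Python's re syntax is close enough to Perl's for this case; the sample names are invented):

```python
import re

# A literal dot at the start, followed by one or more word characters.
LEADING_DOT = re.compile(r"^\.\w+")

names = [".buildpath", ".htaccess", "index.html", "..", "notes.txt"]
dotfiles = [n for n in names if LEADING_DOT.match(n)]

print(dotfiles)
```

The ".." entry is rejected because the character after the first dot is not a word character.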
It depends on what characters are legal in a filename, which depends on the OS and filesystem.
For example, in Windows that would be:
^\.[^<>:"/\\\|\?\*\x00-\x1f]+$
The above expression means:
Match a string starting with the literal character .
Followed by at least one character which is not one of (whole class of invalid chars follows)
I used this as reference regarding which chars are disallowed in filenames.
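Here is a small Python sketch of the same check. The helper name is_windows_dotfile is hypothetical, and fullmatch is used so that a trailing newline cannot slip past the $ anchor.

```python
import re

# Leading dot, then one or more characters that are not < > : " / \ | ? *
# and not ASCII control characters.
DOTFILE = re.compile(r'^\.[^<>:"/\\|?*\x00-\x1f]+$')

def is_windows_dotfile(name):
    """Return True if name starts with a dot and contains only allowed characters."""
    return DOTFILE.fullmatch(name) is not None

print(is_windows_dotfile(".htaccess"))
print(is_windows_dotfile(".build:path"))
```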
To match a string starting with a dot in Java, you can write a simple expression:
^\\..*
^ means the match is anchored at the start of the string
\\. is how \. is written inside a Java string literal; it matches a literal "."
.* means the dot will be followed by 0 or more characters

Use variable as key word for grep function

I have successfully used this syntax to assign and locate keywords. I'd like to take the same approach but instead of knowing the word in advance like in the example below, I want to pass the word/expression as a variable, for example replacing options, with a var. How would I do that?
phrase <- "(options) ([^ ]+)"
You can use paste0:
phrase <- paste0("(", optionsVar, ") ([^ ]+)")
Also, note that [^ ]+ matches one or more chars other than spaces, but you can probably replace it with \\S+, one or more non-whitespace chars. Same with the literal space: \\s or \\s+ might prove more flexible.
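The same idea, sketched in Python for comparison (the thread itself is about R; build_phrase is a made-up helper name). The keyword is escaped here on the assumption that it should be matched literally; drop re.escape if the variable is meant to hold a regular expression.

```python
import re

def build_phrase(keyword):
    """Build '(keyword) (\\S+)' from a variable keyword."""
    # re.escape keeps characters such as "." or "(" in the keyword literal.
    return "(" + re.escape(keyword) + r") (\S+)"

pattern = re.compile(build_phrase("options"))
match = pattern.search("set options verbose now")

print(match.group(1), match.group(2))
```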

Accessing parts of match in Perl 6

When I use a named regex, I can print its contents:
my regex rgx { \w\w };
my $string = 'abcd';
$string ~~ / <rgx> /;
say $<rgx>; # 「ab」
But if I want to match with :g or :ex adverb, so there is more than one match, it doesn't work. The following
my regex rgx { \w\w };
my $string = 'abcd';
$string ~~ m:g/ <rgx> /;
say $<rgx>; # incorrect
gives an error:
Type List does not support associative indexing.
in block <unit> at test1.p6 line 5
How should I modify my code?
UPD: Based on @piojo's explanation, I modified the last line as follows and that solved my problem:
say $/[$_]<rgx> for ^$/.elems;
The following would be easier, but for some reason it doesn't work:
say $_<rgx> for $/; # incorrect
It seems like :g and :overlap are special cases: if your match is repeated within the regex, like / <rgx>* /, then you would access the matches as $<rgx>[0], $<rgx>[1], etc. But in this case, the engine is doing the whole match more than once. So you can access those matches through the top-level match variable, $/. In fact, $<foo> is just a shortcut for $/<foo>.
So based on the error message, we know that in this case, $/ is a list. So we can access your matches as $/[0]<rgx> and $/[1]<rgx>.
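For readers coming from other languages, a loose Python analogue of "a list of match objects, each with its own named capture" looks like this. It only illustrates the shape of the result and is not Perl 6 code.

```python
import re

# Each global match is its own match object with a named group "rgx".
matches = list(re.finditer(r"(?P<rgx>\w\w)", "abcd"))
pairs = [m.group("rgx") for m in matches]

print(pairs)
```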

sed replacing without untouching a string

I'm trying to replace, in all lines within files, occurrences of:
/var/www/webxyz/html
with
/home/webxyz/public_html
The string webxyz is variable, e.g. web1 or web232.
So only the parts before and after webxyz should be replaced.
I tried this without success:
sed -i 's/"var/www/web*/html"/"home/web*/public_html"/g'
I also want this to check and replace in all files, including subdirectories and their files;
the * operator doesn't work.
Within a regular expression, you'll need to escape the delimiting character that surrounds it, in your case the /. But you can also use a different delimiter like ,:
sed -i 's,"var/www/web*/html","home/web*/public_html",g'
But to get it working as intended, you'll also need to remove the " characters and replace the web* (sed doesn't understand globbing wildcards; in a regex, b* means zero or more b characters) with something like this:
sed -i 's,var/www/web\([^/]*\)/html,home/web\1/public_html,g'
Here \([^/]*\) is used to match anything after web except a /. The matching string is then referenced by \1 in the replacement part.
Here is what your replacement operation should look like (see sed for more info):
sed -i 's/var\/www\(\/.*\/\)html/home\1public_html/g'
Note that \(...\) is a grouping, and specifies a "match variable" which shows up in the replacement side as \1. Note also the .* which says "match any single character (dot) zero or more times (star)". Note further that your / characters must be escaped so that they are not treated as part of the sed control structure.
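The capture-and-reuse idea can be tried outside sed as well. Below is a minimal Python sketch of the same substitution on a single line of text; the function name rewrite and the sample paths are assumptions for illustration, and walking the directory tree is left out.

```python
import re

# Capture whatever follows "web" up to the next slash, then reuse it as \1.
PATTERN = re.compile(r"/var/www/web([^/]*)/html")

def rewrite(line):
    """Move /var/www/webNNN/html to /home/webNNN/public_html, keeping NNN."""
    return PATTERN.sub(r"/home/web\1/public_html", line)

print(rewrite("DocumentRoot /var/www/web232/html/shop"))
print(rewrite("/var/www/other/html"))
```

The second call leaves the line unchanged because the directory does not start with web.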

Quoting -replace & variables

This is in response to my previous question:
PowerShell: -replace, regex and ($) dollar sign woes
My question is: why do these 2 lines of code have different output:
'abc' -replace 'a(\w)', '$1'
'abc' -replace 'a(\w)', "$1"
AND according to the 2 articles below, why doesn't the variable '$1' in single quotes get used as a literal string? Everything in single quotes should be treated as a literal text string, right?
http://www.computerperformance.co.uk/powershell/powershell_quotes.htm
http://blogs.msdn.com/b/powershell/archive/2006/07/15/variable-expansion-in-strings-and-herestrings.aspx
When you use single quotes you tell PowerShell to use a string literal meaning everything between the opening and closing quote is to be interpreted literally.
When you use double quotes, PowerShell will interpret specific characters inside the double quotes.
See get-help about_quoting_rules.
The dollar sign has a special meaning in regular expressions and in PowerShell. You want to use the single quotes if you intend the dollar sign to be used as the regular expression.
In your example the regex a(\w) is matching the letter 'a' and then a word character captured in back reference #1. So when you replace with $1 you are replacing the matched text ab with back reference match b. So you get bc.
In your second example with using double quotes PowerShell interprets "$1" as a string with the variable $1 inside. You don't have a variable named $1 so it's null. So the regex replaced ab with null which is why you only get c.
In your second line:
'abc' -replace 'a(\w)', "$1"
Powershell replaces the $1 before it gets to the regex replace operation, as others have stated. You can avoid that replacement by using a backtick, as in:
'abc' -replace 'a(\w)', "`$1"
Thus, if you had a string in a variable $prefix which you wanted to include in the replacement string, you could use it in the double quotes like this:
'abc' -replace 'a(\w)', "$prefix`$1"
The '$1' is a regex backreference. It's created by the regex match, and it only exists within the context of that replace operation. It is not a powershell variable.
"$1" will be interpreted as a Powershell variable. If no variable called $1 exists, the replacement value will be null.
I cannot comment or upvote, so I am posting this as an answer: David Rogers' answer worked for me. I needed to use both a regex backreference and a PowerShell variable in a regex replace.
I needed to understand what the backtick did before I implemented it; here is the explanation: the backtick is PowerShell's escape character.
My use case
$new = "AAA"
"REPORT.TEST998.TXT" -Replace '^([^.]+)\.([^.]+)([^.]{3})\.', "`$1.`$2$new."
Result
REPORT.TESTAAA.TXT
Alternatives
Format string
"REPORT.TEST998.TXT" -Replace '^([^.]+)\.([^.]+)([^.]{3})\.', ('$1.$2{0}.' -f $new)
Comments
as per https://get-powershellblog.blogspot.com/2017/07/bye-bye-backtick-natural-line.html I'll probably use the format string method to avoid the use of backticks.
Here's the powershell 7 version where you don't have to deal with a single quoted $1, with a script block as the second argument, replacing 'ab' with 'b':
'abc' -replace 'a(\w)', {$_.groups[1]}
bc
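For comparison, Python has a similar "function as replacement" form, which also avoids mixing backreference syntax with variable interpolation. This is only an analogue sketched in Python, not PowerShell, and it reuses the REPORT.TEST998.TXT use case from the earlier answer.

```python
import re

# Plain backreference: "ab" is replaced by the captured "b".
simple = re.sub(r"a(\w)", r"\1", "abc")

# Callable replacement: groups come from the match object, and the
# ordinary variable `new` is used directly, so nothing needs escaping.
new = "AAA"
pattern = re.compile(r"^([^.]+)\.([^.]+)([^.]{3})\.")
result = pattern.sub(
    lambda m: f"{m.group(1)}.{m.group(2)}{new}.",
    "REPORT.TEST998.TXT",
)

print(simple)
print(result)
```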