This is in response to my previous question:
PowerShell: -replace, regex and ($) dollar sign woes
My question is: why do these 2 lines of code have different output:
'abc' -replace 'a(\w)', '$1'
'abc' -replace 'a(\w)', "$1"
And, according to the two articles below, why isn't '$1' in single quotes treated as a literal string? Everything in single quotes should be treated as a literal text string, right?
http://www.computerperformance.co.uk/powershell/powershell_quotes.htm
http://blogs.msdn.com/b/powershell/archive/2006/07/15/variable-expansion-in-strings-and-herestrings.aspx
When you use single quotes you tell PowerShell to use a string literal meaning everything between the opening and closing quote is to be interpreted literally.
When you use double quotes, PowerShell will interpret specific characters inside the double quotes.
See Get-Help about_Quoting_Rules.
The dollar sign has a special meaning both in regular expressions and in PowerShell. Use single quotes when you want the dollar sign to reach the regex engine unchanged.
In your first example the regex a(\w) matches the letter 'a' followed by one word character, which is captured as back reference #1. So when you replace with '$1' you are replacing the matched text ab with the captured b, and you get bc.
In your second example, with double quotes, PowerShell interprets "$1" as a string containing the variable $1. You don't have a variable named $1, so it expands to an empty string. The regex therefore replaces ab with nothing, which is why you only get c.
In your second line:
'abc' -replace 'a(\w)', "$1"
PowerShell expands the $1 before it gets to the regex replace operation, as others have stated. You can avoid that expansion by escaping the dollar sign with a backtick, as in:
'abc' -replace 'a(\w)', "`$1"
Thus, if you had a string in a variable $prefix which you wanted to include in the replacement string, you could use it in the double quotes like this:
'abc' -replace 'a(\w)', "$prefix`$1"
The '$1' is a regex backreference. It's created by the regex match, and it only exists within the context of that replace operation. It is not a PowerShell variable.
"$1" will be interpreted as a PowerShell variable. If no variable called $1 exists, the replacement value will be an empty string.
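The same two-layer behaviour can be sketched outside PowerShell. The snippet below uses JavaScript purely as an illustration (the variable names are hypothetical). JavaScript's replace also treats $1 as a backreference, and passing an empty string simulates what the regex engine receives once PowerShell has expanded an undefined $1.

```javascript
// '$1' reaches the regex engine untouched and is expanded to capture group 1.
const viaBackreference = 'abc'.replace(/a(\w)/, '$1');

// Simulates the double-quoted PowerShell case: the host language has already
// turned the undefined variable into an empty string before the replace runs.
const viaEmptyValue = 'abc'.replace(/a(\w)/, '');

// Mixing a host-language value with a backreference. In a JavaScript template
// literal only ${...} is interpolated, so the bare $1 survives for the engine.
const prefix = 'X'; // hypothetical value
const mixed = 'abc'.replace(/a(\w)/, `${prefix}$1`);

console.log(viaBackreference, viaEmptyValue, mixed); // bc c Xbc
```

Unlike PowerShell's double-quoted strings, a JavaScript template literal does not need the backreference escaped, which is why no backtick-style escape appears here.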
Since I cannot comment or upvote: David Rogers' answer worked for me. I needed to use both a RegEx backreference and a PowerShell variable in a RegEx replace.
I needed to understand what the backtick did before I used it. Here is the explanation: the backtick is PowerShell's escape character.
My use case
$new = "AAA"
"REPORT.TEST998.TXT" -Replace '^([^.]+)\.([^.]+)([^.]{3})\.', "`$1.`$2$new."
Result
REPORT.TESTAAA.TXT
Alternatives
Format string
"REPORT.TEST998.TXT" -Replace '^([^.]+)\.([^.]+)([^.]{3})\.', ('$1.$2{0}.' -f $new)
Comments
As per https://get-powershellblog.blogspot.com/2017/07/bye-bye-backtick-natural-line.html, I'll probably use the format string method to avoid the use of backticks.
Here's the PowerShell 7 version, where a script block as the second argument saves you from dealing with a single-quoted $1; it replaces 'ab' with 'b':
'abc' -replace 'a(\w)', {$_.groups[1]}
bc
Related
I scanned a document into Kotlin. It contains words, numbers, values, and so on, but I only want the values that start with a $ and have 2 decimal places after the . (that is, the prices). Do I use a combination of substring and other string parsing?
Edit: I have looked into Regex, and the problem I am having now is that I am using this line
val reg = Regex("\$([0-9]*\.[0-9]*)")
to grab all the prices; however, the \. portion is reported as an invalid escape. In other languages this works just fine.
You have to use a double backslash (\\) instead of a single one (\). That's because \ is an escape character both in Regex and in Kotlin/Java strings. So when \ appears in a String, Kotlin expects it to be followed by a character that needs to be escaped. But you aren't trying to escape a String's character; you're trying to escape a Regex character. So you have to escape the backslash itself with another backslash, so that the backslash is part of the computed String value and can be understood by Regex.
You also need a double backslash before your dollar sign for it to behave correctly. Technically, I think it should be a triple backslash (\\\$), because $ is a special character in both Kotlin and Regex and you want to escape it in both. However, Kotlin only treats $ as the start of a string template when an identifier or { follows it, so the double escape happens to work here. Rather than rely on that, I would use the triple escape.
val reg = Regex("\\\$([0-9]*\\.[0-9]*)")
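The string-level versus regex-level escaping described above is not specific to Kotlin. As a rough sketch in JavaScript (not Kotlin, and with a hypothetical sample text), the pattern below is built from a string literal, so each regex backslash is doubled. It also assumes the stricter "exactly two decimals" shape from the question instead of the looser [0-9]* form.

```javascript
// String level: "\\$" stores the two characters \$ ; regex level: \$ is a literal dollar.
const priceSource = "\\$([0-9]+\\.[0-9]{2})";
const priceRegex = new RegExp(priceSource, "g");

// Hypothetical scanned text, for illustration only.
const scanned = "Total $12.50, tax $1.05, id 12345, rate 3.5";

// matchAll yields one match object per price; index 0 is the whole match.
const prices = [...scanned.matchAll(priceRegex)].map((m) => m[0]);

console.log(prices); // [ '$12.50', '$1.05' ]
```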
I have successfully used this syntax to assign and locate keywords. I'd like to take the same approach but instead of knowing the word in advance like in the example below, I want to pass the word/expression as a variable, for example replacing options, with a var. How would I do that?
phrase <- "(options) ([^ ]+)"
You can use paste0:
phrase <- paste0("(", optionsVar, ") ([^ ]+)")
Also, note that [^ ]+ matches one or more chars other than spaces, but you can probably replace it with \\S+, one or more non-whitespace chars. Same with the literal space: \\s or \\s+ might prove more flexible.
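The same idea, building the pattern by concatenating a variable into the string, can be sketched in JavaScript rather than R. The sample sentence and the name `optionsVar` are assumptions for illustration. The sketch also assumes the variable contains no regex metacharacters, which would need escaping first.

```javascript
// The keyword arrives in a variable instead of being hard-coded in the pattern.
const optionsVar = "options";

// "\\S+" in the string literal becomes \S+ in the pattern (one or more non-whitespace chars).
const phrase = new RegExp("(" + optionsVar + ") (\\S+)");

// Hypothetical input text.
const found = "set options verbose now".match(phrase);

console.log(found[1], found[2]); // options verbose
```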
I tried many ways to get a single backslash from an executed (I don't mean an input from html).
I can get special characters such as tab, new line and many others, then escape them to \\t or \\n or \\(some other character), but I cannot get a single backslash when a non-special character follows it.
I don't want something like:
str = "\apple"; // I want this, to return:
console.log(str); // \apple
and if I try to get character at 0 then I get a instead of \.
(See ES2015 update at the end of the answer.)
You've tagged your question both string and regex.
In JavaScript, the backslash has special meaning both in string literals and in regular expressions. If you want an actual backslash in the string or regex, you have to write two: \\.
The following string starts with one backslash; the first one you see in the literal is an escape character starting an escape sequence. The \\ escape sequence tells the parser to put a single backslash in the string:
var str = "\\I have one backslash";
The following regular expression will match a single backslash (not two); again, the first one you see in the literal is an escape character starting an escape sequence. The \\ escape sequence tells the parser to put a single backslash character in the regular expression pattern:
var rex = /\\/;
If you're using a string to create a regular expression (rather than using a regular expression literal as I did above), note that you're dealing with two levels: The string level, and the regular expression level. So to create a regular expression using a string that matches a single backslash, you end up using four:
// Matches *one* backslash
var rex = new RegExp("\\\\");
That's because first, you're writing a string literal, but you want to actually put backslashes in the resulting string, so you do that with \\ for each one backslash you want. But your regex also requires two \\ for every one real backslash you want, and so it needs to see two backslashes in the string. Hence, a total of four. This is one of the reasons I avoid using new RegExp(string) whenever I can; I get confused easily. :-)
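A small sketch of the two levels, with hypothetical variable names, makes the counting easier to check. It compares the regex literal with the string-built one and runs both against a three-character sample.

```javascript
// String level: two backslashes typed, one stored.
const oneBackslash = "\\";

// Regex literal: \\ in the pattern matches one backslash.
const literalRex = /\\/;

// String-built regex: four typed, two stored in the string, one matched.
const stringRex = new RegExp("\\\\");

// Sample holding the characters a, \, b.
const sample = "a\\b";

const samePattern = stringRex.source === literalRex.source;
const replaced = sample.replace(stringRex, "/");

console.log(oneBackslash.length, sample.length, samePattern, replaced); // 1 3 true a/b
```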
ES2015 and ES2018 update
Fast-forward to 2015, and as Dolphin_Wood points out the new ES2015 standard gives us template literals, tag functions, and the String.raw function:
// Yes, this unlikely-looking syntax is actually valid ES2015
let str = String.raw`\apple`;
str ends up having the characters \, a, p, p, l, and e in it. Just be careful there are no ${ in your template literal, since ${ starts a substitution in a template literal. E.g.:
let foo = "bar";
let str = String.raw`\apple${foo}`;
...ends up being \applebar.
Try String.raw method:
str = String.raw`\apple` // "\apple"
Reference here: String.raw()
\ is an escape character. When followed by a non-special character it doesn't become a literal \; instead, you have to double it: \\.
console.log("\apple"); //-> "apple"
console.log("\\apple"); //-> "\apple"
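With ordinary string literals there is no way to get the original, raw string definition or to create a literal string without escape characters (the ES2015 String.raw tag mentioned in the other answers is the exception).
Please try the code below (Java); it works for me, and the output contains the backslash: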
String sss="dfsdf\\dfds";
System.out.println(sss);
I'm trying to replace all lines within files that contain:
/var/www/webxyz/html
to
/home/webxyz/public_html
The string webxyz is variable, e.g. web1 or web232.
So only the parts before and after webxyz should be replaced.
I tried this without success:
sed -i 's/"var/www/web*/html"/"home/web*/public_html"/g'
I also want this to check and replace in all files, including subdirectories and the files in them.
The * operator doesn't work.
Within a regular expression, you'll need to escape the delimiting character that surrounds it, in your case the /. But you can also use a different delimiter, such as a comma:
sed -i 's,"var/www/web*/html","home/web*/public_html",g'
But to get it working as intended, you'll also need to remove the " and replace the b* (sed doesn't understand globbing wildcards) with something like this:
sed -i 's,var/www/web\([^/]*\)/html,home/web\1/public_html,g'
Here \([^/]*\) is used to match anything after web except a /. The matching string is then referenced by \1 in the replacement part.
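The capture-and-reuse idea is not tied to sed. A rough JavaScript equivalent is sketched below, where the group is referenced as $1 instead of \1 and the input line is a made-up example. It is only an illustration of the technique, not a replacement for the sed command.

```javascript
// Hypothetical line from one of the files to be rewritten.
const line = "DocumentRoot /var/www/web232/html";

// ([^\/]*) captures whatever follows "web" up to the next slash.
const pattern = /\/var\/www\/web([^\/]*)\/html/g;

// $1 re-inserts the captured part, like \1 in the sed replacement.
const rewritten = line.replace(pattern, "/home/web$1/public_html");

console.log(rewritten); // DocumentRoot /home/web232/public_html
```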
Here is what your replacement operation should look like (see sed for more info):
sed -i 's/var\/www\(\/.*\/\)html/home\1public_html/g'
Note that \(...\) is a grouping, and specifies a "match variable" which shows up in the replacement side as \1. Note also the .* which says "match any single character (dot) zero or more times (star)". Note further that your / characters must be escaped so that they are not treated as part of the sed control structure.
I have a small SSIS package in which I am trying to match a file name with a VB Regex and delete that file. My regex looks like this, ^RegZStmntAdj.[A-z0-9_].\.txt$, and I am trying to figure out why it won't match any of the files in the directory. This is valid syntax, if I am thinking correctly.
RegZStmntAdj2_07272011.txt
RegZStmntAdj1_07272011.txt
RegZStmntAdj2_07272011.txxt
New Text Document.txt
If I run the regex with ^RegZStmntAdj.*.\.txt$, it matches the correct files and deletes them. I know * works, but I would like to learn to make more precise Regular Expressions.
RegZStmntAdj2_07272011.txt
RegZStmntAdj1_07272011.txt
"^RegZStmntAdj.[A-z0-9_].\.txt$" matches
literal RegZStmntAdj at BOS
one character (except \n)
one character of the A-z, 0-9, and _ set
one character (except \n)
a dot
literal txt at EOS
but your typical infix "2_07272011" surely has more than 3 characters. Try
"^RegZStmntAdj[A-Za-z0-9_]+\.txt$" instead.
Try the following Regex:
^RegZStmntAdj.[\w_]{9}\.txt$
I've used \w, which for ASCII input is the same as [A-Za-z0-9_] (so the extra _ in the class is redundant), and told it to match 9 characters so that it will match the _<date> part of your filename. You were only matching the first character from there (i.e. the underscore).
Using Powershell to verify:
PS> $test = "^RegZStmntAdj.[\w_]{9}\.txt$"
PS> "RegZStmntAdj2_07272011.txt" -match $test
True
PS> "RegZStmntAdj1_07272011.txt" -match $test
True
PS> "RegZStmntAdj2_07272011.txxt" -match $test
False # (Correct as contains 2 "xx"s in extension)
PS> "New Text Document.txt" -match $test
False # (Correct as nowhere near a match!!)
To make your regex even more precise, you could use ^RegZStmntAdj\d_[\d]{8}\.txt$, which translates to:
A string starting with "RegZStmntAdj", then a digit, then an
underscore, then 8 digits, then ending in ".txt"
which I believe is what you are looking for.
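As a quick cross-check outside PowerShell, the original and the stricter pattern can be run over the four file names from the question. This is a JavaScript sketch with hypothetical variable names. It assumes JavaScript and .NET treat these particular constructs the same way, which holds for the ASCII names used here.

```javascript
// File names taken from the question.
const files = [
  "RegZStmntAdj2_07272011.txt",
  "RegZStmntAdj1_07272011.txt",
  "RegZStmntAdj2_07272011.txxt",
  "New Text Document.txt",
];

// Original pattern: allows only three characters between the prefix and ".txt".
const original = /^RegZStmntAdj.[A-z0-9_].\.txt$/;

// Stricter pattern: one digit, an underscore, then exactly eight digits.
const precise = /^RegZStmntAdj\d_\d{8}\.txt$/;

const matchedByOriginal = files.filter((f) => original.test(f));
const matchedByPrecise = files.filter((f) => precise.test(f));

console.log(matchedByOriginal.length, matchedByPrecise); // 0, then the first two names
```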