Use variable as key word for grep function - variables

I have successfully used the syntax below to define and locate keywords. I'd like to take the same approach, but instead of knowing the word in advance as in the example, I want to pass the word/expression as a variable, for example replacing options with a variable. How would I do that?
phrase <- "(options) ([^ ]+)"

You can use paste0:
phrase <- paste0("(", optionsVar, ") ([^ ]+)")
Also, note that [^ ]+ matches one or more chars other than spaces, but you can probably replace it with \\S+, one or more non-whitespace chars. Same with the literal space: \\s or \\s+ might prove more flexible.
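For comparison, here is a rough sketch of the same idea in Python rather than R. The variable name options_var and the helper build_phrase are hypothetical. re.escape is added on the assumption that the keyword should be matched literally; drop it if the variable is itself meant to be a regular expression.

```python
import re

def build_phrase(keyword: str) -> str:
    """Return a pattern that captures the keyword and the token after it."""
    # re.escape protects regex metacharacters in the keyword (assumption:
    # the keyword is plain text, not a regex of its own).
    return "(" + re.escape(keyword) + r")\s+(\S+)"

options_var = "options"  # hypothetical variable holding the keyword
phrase = build_phrase(options_var)

match = re.search(phrase, "set options verbose now")
found = match.groups() if match else None
print(found)
```

The pattern uses \s+ and \S+ as suggested above instead of a literal space and [^ ]+.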

Trino regexp_replace this character in the beginning but not in the middle Trino [duplicate]

I am a complete Reg-exp noob, so please bear with me. Tried to google this, but haven't found it yet.
What would be an appropriate way of writing a Regular expression matching files starting with a dot, such as .buildpath or .htaccess?
Thanks a lot!
In most regex languages, ^\. or ^[.] will match a leading dot.
The ^ matches the beginning of a string in most languages. This will match a leading .. You need to add your filename expression to it.
^\.
Likewise, $ will match the end of a string.
You may need to substitute the \ with the escape character of your language. Under PowerShell, the regex I use is: ^(\.)+\/
Test case:
"../NameOfFile.txt" -match '^(\.)+\/'
works, while
"_./NameOfFile.txt" -match '^(\.)+\/'
does not.
Naturally, you may ask what is happening here.
The (\.) matches a literal ., and the + that follows repeats the preceding group one or more times.
Finally, the \/ requires a / after the leading dots, as in a relative path such as ./ or ../
It depends a bit on the regular expression library you use, but you can do something like this:
^\.\w+
The ^ anchors the match to the beginning of the string, the \. matches a literal period (since an unescaped . in a regular expression typically matches any character), and \w+ matches 1 or more "word" characters (alphanumeric plus _).
See the perlre documentation for more info on Perl-style regular expressions and their syntax.
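As a small illustration, the pattern above can be tried in Python, whose re module follows Perl-style syntax for these constructs. The file names in the list are made up for the example.

```python
import re

# ^ anchors at the start, \. is a literal dot, \w+ is one or more word chars.
DOTFILE = re.compile(r"^\.\w+")

names = [".buildpath", ".htaccess", "README", "a.txt", "."]
dotfiles = [name for name in names if DOTFILE.match(name)]
print(dotfiles)
```

A lone "." is rejected because \w+ requires at least one word character after the dot.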
It depends on what characters are legal in a filename, which depends on the OS and filesystem.
For example, in Windows that would be:
^\.[^<>:"/\\\|\?\*\x00-\x1f]+$
The above expression means:
Match a string starting with the literal character .
Followed by at least one character which is not one of (whole class of invalid chars follows)
I used this as reference regarding which chars are disallowed in filenames.
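A minimal sketch of the Windows-oriented pattern in Python follows. The helper name is_windows_dotfile is hypothetical, and the character class is taken from the expression above with the unnecessary escapes inside the brackets removed.

```python
import re

# Leading dot, then one or more characters that are not in the disallowed set.
WINDOWS_DOTFILE = re.compile(r'^\.[^<>:"/\\|?*\x00-\x1f]+$')

def is_windows_dotfile(name: str) -> bool:
    """True if name starts with a dot and contains no disallowed character."""
    return WINDOWS_DOTFILE.match(name) is not None

for candidate in [".htaccess", ".bad:name", ".", "htaccess"]:
    print(candidate, is_windows_dotfile(candidate))
```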
To match a string starting with a dot in Java, you can write a simple expression:
^\\..*
^ means the regular expression is matched from the start of the string
\. (written \\. inside a Java string literal) means it will start with a literal "."
.* means the dot will be followed by 0 or more characters

Why awk does not remove BOM from the middle of a line?

I try to use awk to remove all byte order marks from a file (I have many of them):
awk '{sub(/\xEF\xBB\xBF/,"")}{print}' f1.txt > f2.txt
It seems to remove all the BOMs that are in the beginning of the line but those in the middle are not removed. I can verify that by:
grep -U $'\xEF\xBB\xBF' f2.txt
Grep returns me one line where BOM is in the middle.
As mentioned, sub() only replaces the leftmost match, so if a global replacement is what you're after, gsub() (or gensub(), which is specific to GNU awk) is the way to go.
sub(regexp, replacement [, target])
Search target, which is treated as a string, for the leftmost, longest
substring matched by the regular expression regexp. Modify the entire
string by replacing the matched text with replacement. The modified
string becomes the new value of target. Return the number of
substitutions made (zero or one).
gsub(regexp, replacement [, target])
Search target for all of the longest, leftmost, nonoverlapping
matching substrings it can find and replace them with replacement. The
‘g’ in gsub() stands for “global,” which means replace everywhere.
gensub(regexp, replacement, how [, target])
Search the target string target for matches of the regular expression
regexp. If how is a string beginning with ‘g’ or ‘G’ (short for
“global”), then replace all matches of regexp with replacement.
Otherwise, "how" is treated as a number indicating which match of regexp
to replace. gensub() is a general substitution function. Its purpose is to provide more features than the standard sub() and gsub() functions.
There's tons more helpful information and examples linked below:
↳ The GNU Awk User's Guide: String Functions / 9.1.3 String-Manipulation Functions
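The difference between replacing the first match and replacing every match can be sketched in Python, where the count argument of re.sub plays a similar role. This is an analogy, not awk itself, and the sample line is invented for the example.

```python
import re

BOM = b"\xef\xbb\xbf"
line = BOM + b"first" + BOM + b"second"  # one BOM at the start, one in the middle

# count=1 replaces only the leftmost match, comparable to awk's sub().
first_only = re.sub(re.escape(BOM), b"", line, count=1)

# Without count, every non-overlapping match is replaced, comparable to gsub().
everywhere = re.sub(re.escape(BOM), b"", line)

print(first_only)
print(everywhere)
```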

Regular expression to match specific variations of function

I am trying to construct a regular expression to find the text of the following variations.
NSLocalizedString(#"TEXT")
NSLocalizedStringFromTable(#"TEXT")
NSLocalizedStringWithDefaultValue(#"TEXT")
...
The goal is to extract TEXT. I have been able to construct a regex for each individual function or macro, e.g., (?<=NSLocalizedString)\(#"(.*?)". However, I am looking for a solution that does the job no matter what the name of the function as long as it starts with NSLocalizedString.
I assumed it was as simple as (?<=NSLocalizedString\w+)\(#"(.*?)", but that doesn't seem to do the trick.
How about this one?
/NSLocalizedString\w*\(#"(.*)"\)/
Explanation:
NSLocalizedString    'NSLocalizedString'
\w*                  word characters (a-z, A-Z, 0-9, _) (0 or more times (matching the most amount possible))
\(                   '('
#"                   '#"'
(                    group and capture to \1:
.*                   any character except \n (0 or more times (matching the most amount possible))
)                    end of \1
"                    '"'
\)                   ')'
The only reason your regex doesn't work is because the regex engine doesn't support variable length lookbehinds. The (?<=NSLocalizedString\w+) is variable length so can't be used.
Firstly it needs to be \w* not \w+, to allow your first example string to match.
If you move the \w* outside the lookbehind (?<=NSLocalizedString)\w* it will work just fine.
Alternatively, since you have to use a capturing group to grab the text value anyway, there's no need for the lookbehind at all. Change the (?<= to a (?: and it becomes a non-capturing group (which can be variable length), and then just grab your text value from group 1.
Your attempt was:
(?<=NSLocalizedString\w+)\(#"(.*?)"
Both of these minor changes should make it work:
(?<=NSLocalizedString)\w*\(#"(.*?)"
(?:NSLocalizedString\w*)\(#"(.*?)"
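Python's re module happens to have the same fixed-width lookbehind restriction, so it can serve as a quick sketch of both the failure and the non-capturing-group fix. This says nothing definitive about the engine the question was asked for, and the sample source text is made up.

```python
import re

source = '''
NSLocalizedString(#"Hello")
NSLocalizedStringFromTable(#"World")
NSLocalizedStringWithDefaultValue(#"Again")
'''

# The original attempt: a lookbehind containing \w+ is variable length,
# which Python's re rejects at compile time.
try:
    re.compile(r'(?<=NSLocalizedString\w+)\(#"(.*?)"')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# The fix: a non-capturing group may be variable length; TEXT is in group 1.
pattern = re.compile(r'(?:NSLocalizedString\w*)\(#"(.*?)"')
texts = pattern.findall(source)

print(lookbehind_ok)
print(texts)
```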
The following is actually not supported in Objective-C, but in engines that support \K (such as PCRE), a solution that extracts exactly TEXT without using any groups is:
NSLocalizedString\w*\(#"\K[^"]*
It avoids the need for a lookbehind (which can't be used here because it would have to be variable length) by using \K, which drops everything matched before it from the reported match.

sed replacing without untouching a string

I'm trying to replace, in every line of my files, each occurrence of:
/var/www/webxyz/html
with
/home/webxyz/public_html
The string webxyz is variable, like web1 or web232.
So only the parts before and after webxyz should be replaced.
I tried this without success:
sed -i 's/"var/www/web*/html"/"home/web*/public_html"/g'
I also want this to check and replace in all files (including subdirectories and their files);
the * operator doesn't work.
Within a regular expression, you’ll need to escape the delimiting character that surrounds it, in your case the /. But you can also use a different delimiter, like ,:
sed -i 's,"var/www/web*/html","home/web*/public_html",g'
But to get it working as intended, you’ll also need to remove the " and replace the b* (sed doesn’t understand globbing wildcards) with something like this:
sed -i 's,var/www/web\([^/]*\)/html,home/web\1/public_html,g'
Here \([^/]*\) is used to match anything after web except a /. The matching string is then referenced by \1 in the replacement part.
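The capture-and-backreference idea can be sketched in Python as well. The function name rewrite_path is hypothetical, and the pattern is the sed expression above translated to Python's syntax, where groups are written ( ) instead of \( \).

```python
import re

def rewrite_path(line: str) -> str:
    """Move var/www/web<anything>/html to home/web<anything>/public_html."""
    # ([^/]*) captures whatever follows "web" up to the next slash;
    # \1 puts that captured text back into the replacement.
    return re.sub(r"var/www/web([^/]*)/html", r"home/web\1/public_html", line)

print(rewrite_path("/var/www/web232/html/index.php"))
print(rewrite_path("/var/www/webxyz/html"))
```

Lines that do not contain the pattern are returned unchanged.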
Here is what your replacement operation should look like (see sed for more info):
sed -i 's/var\/www\(\/.*\/\)html/home\1public_html/g'
Note that \(...\) is a grouping, and specifies a "match variable" which shows up in the replacement side as \1. Note also the .* which says "match any single character (dot) zero or more times (star)". Note further that your / characters must be escaped so that they are not treated as part of the sed control structure.

Quoting -replace & variables

This is in response to my previous question:
PowerShell: -replace, regex and ($) dollar sign woes
My question is: why do these 2 lines of code have different output:
'abc' -replace 'a(\w)', '$1'
'abc' -replace 'a(\w)', "$1"
AND according to the 2 articles below, why doesn't the variable '$1' in single quotes get used as a literal string? Everything in single quotes should be treated as a literal text string, right?
http://www.computerperformance.co.uk/powershell/powershell_quotes.htm
http://blogs.msdn.com/b/powershell/archive/2006/07/15/variable-expansion-in-strings-and-herestrings.aspx
When you use single quotes you tell PowerShell to use a string literal meaning everything between the opening and closing quote is to be interpreted literally.
When you use double quotes, PowerShell will interpret specific characters inside the double quotes.
See Get-Help about_Quoting_Rules.
The dollar sign has a special meaning in regular expressions and in PowerShell. You want to use the single quotes if you intend the dollar sign to be used as the regular expression.
In your example the regex a(\w) is matching the letter 'a' and then a word character captured in back reference #1. So when you replace with $1 you are replacing the matched text ab with back reference match b. So you get bc.
In your second example with using double quotes PowerShell interprets "$1" as a string with the variable $1 inside. You don't have a variable named $1 so it's null. So the regex replaced ab with null which is why you only get c.
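A loosely analogous effect exists in Python, shown here only as a sketch of the general principle that the language processes the string before the regex engine sees it. In Python the mechanism is string-literal escapes rather than variable expansion, so this is an analogy and not a description of PowerShell.

```python
import re

# Raw string: the two characters \ and 1 reach re.sub, which reads them
# as a reference to capture group 1.
with_backref = re.sub(r"a(\w)", r"\1", "abc")

# Ordinary string: Python turns "\1" into the single character chr(1)
# before re.sub runs, so no group reference is left.
without_backref = re.sub(r"a(\w)", "\1", "abc")

print(repr(with_backref))
print(repr(without_backref))
```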
In your second line:
'abc' -replace 'a(\w)', "$1"
Powershell replaces the $1 before it gets to the regex replace operation, as others have stated. You can avoid that replacement by using a backtick, as in:
'abc' -replace 'a(\w)', "`$1"
Thus, if you had a string in a variable $prefix which you wanted to include in the replacement string, you could use it in the double quotes like this:
'abc' -replace 'a(\w)', "$prefix`$1"
The '$1' is a regex backreference. It's created by the regex match, and it only exists within the context of that replace operation. It is not a powershell variable.
"$1" will be interpreted as a Powershell variable. If no variable called $1 exists, the replacement value will be null.
Since I cannot comment or upvote, David Rogers' answer worked for me. I needed to use both a RegEx backreference and a PowerShell variable in a RegEx replace.
I needed to understand what the backtick did before I implemented it, here is the explanation: backtick is Powershell's escape character.
My usecase
$new = "AAA"
"REPORT.TEST998.TXT" -Replace '^([^.]+)\.([^.]+)([^.]{3})\.', "`$1.`$2$new."
Result
REPORT.TESTAAA.TXT
Alternatives
Format string
"REPORT.TEST998.TXT" -Replace '^([^.]+)\.([^.]+)([^.]{3})\.', ('$1.$2{0}.' -f $new)
Comments
as per https://get-powershellblog.blogspot.com/2017/07/bye-bye-backtick-natural-line.html I'll probably use the format string method to avoid the use of backticks.
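For readers coming from Python, the same use case (a backreference combined with a variable in the replacement) might look like the sketch below. It reuses the file name and the $new value from the example above. The \g<2> form is chosen so that a value starting with a digit cannot be misread as part of the group number.

```python
import re

new = "AAA"
pattern = r"^([^.]+)\.([^.]+)([^.]{3})\."

# \g<1> and \g<2> are group references; the variable is concatenated between them
# and the trailing dot. Assumes `new` contains no backslashes.
replacement = r"\g<1>.\g<2>" + new + "."

result = re.sub(pattern, replacement, "REPORT.TEST998.TXT")
print(result)
```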
Here's the powershell 7 version where you don't have to deal with a single quoted $1, with a script block as the second argument, replacing 'ab' with 'b':
'abc' -replace 'a(\w)', {$_.groups[1]}
bc
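Python offers a comparable option: re.sub accepts a function as the replacement, which receives the match object. The sketch below mirrors the 'abc' example; the second call is an extra illustration that is not in the original.

```python
import re

# The callable receives the match; returning group 1 replaces 'ab' with 'b'.
result = re.sub(r"a(\w)", lambda m: m.group(1), "abc")

# Because it is ordinary code, the captured text can also be transformed.
shouted = re.sub(r"a(\w)", lambda m: m.group(1).upper(), "abc")

print(result)
print(shouted)
```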