${If} $9 == $8
The above NSIS script does a case-insensitive string comparison. How can I make the comparison case-sensitive?
You should use the case-sensitive operator: S==
From the LogicLib documentation:
Case-sensitive string tests (using System.dll):
a S== b; a S!= b
Related
I am facing a case where I need to transform a string into an integer equivalent with gawk 5.
This transformation must be deterministic.
My first, naive approach is to convert each letter of the string to its position in the Latin alphabet and then concatenate the results into a string.
For example:
my_string = "AB"
A = 1
B = 2
my_int=12
However, this has several downsides:
Very long strings may generate an integer that exceeds the maximum integer size.
What should I do with special characters, symbols, etc.?
This requires me to keep a table of each character's position in the alphabet.
So, basically, it's a no go.
What is a good and robust method to generate an integer from a string with gawk 5?
PS: Some will comment that gawk may not be the right tool for this, and they may be right; I am aware of that. But this is for a personal project that should include only awk if possible ;)
If your string contains only ASCII characters and no newlines, and if you use GNU awk, the following simply converts each character into its 3-digit ASCII code:
$ echo "abc" | awk -vFS= '
BEGIN {for(i=0;i<128;i++) c[sprintf("%c",i)]=i}
{for(i=1;i<=NF;i++) printf("%03d",c[$i])}'
097098099
Of course this expands the string by a factor of 3, which can be sub-optimal. If you know that your string contains only ASCII characters in the 32-127 range you can reduce this factor to 2:
$ echo "abc" | awk -vFS= '
BEGIN {for(i=32;i<128;i++) c[sprintf("%c",i)]=i-32}
{for(i=1;i<=NF;i++) printf("%02d",c[$i])}'
656667
In the GNU Awk User's Guide, I went through section 6.2.2, String Concatenation, and found some interesting insights:
Because string concatenation does not have an explicit operator, it is often necessary to ensure that it happens at the right time by using parentheses to enclose the items to concatenate.
Then, I was quite surprised to read the following:
Parentheses should be used around concatenation in all but the most common contexts, such as on the righthand side of ‘=’. Be careful about the kinds of expressions used in string concatenation. In particular, the order of evaluation of expressions used for concatenation is undefined in the awk language. Consider this example:
BEGIN {
a = "don't"
print (a " " (a = "panic"))
}
It is not defined whether the second assignment to a happens before or after the value of a is retrieved for producing the concatenated value. The result could be either ‘don't panic’, or ‘panic panic’.
In my GNU Awk 5.0.0, for instance, it behaves like this, reading the value of a before performing the assignment:
$ gawk 'BEGIN {a = "dont"; print (a " " (a = "panic"))}'
dont panic
However, I wonder: why isn't the order of evaluation of expressions defined? What are the benefits of having "undefined" outputs that may vary depending on the version of Awk you are running?
This particular example is about expressions with side effects. Traditionally, in C and in awk syntax (which is closely inspired by C), assignments are allowed inside expressions. How those expressions are then evaluated is left up to the implementation, which also leaves each implementation free to evaluate operands in whatever order suits it.
Leaving something unspecified is meant to discourage people from using potentially confusing or ambiguous language constructs. But that assumes they are aware of the lack of specification.
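A minimal sketch of the usual workaround, assuming you want a result that does not depend on the awk implementation: perform the assignment in its own statement so that nothing inside the concatenation has a side effect. The helper name explicit_order is hypothetical.

```shell
#!/bin/sh
# Sketch: split the side effect out of the concatenation so the result
# no longer depends on the (unspecified) evaluation order.
# explicit_order is a hypothetical helper name used only for this example.
explicit_order() {
    awk 'BEGIN {
        a = "dont"
        old = a             # read the old value in its own statement...
        a = "panic"         # ...then assign in a separate statement
        print (old " " a)   # always "dont panic"

        print (a " " a)     # if "panic panic" is wanted, say so explicitly
    }'
}

explicit_order
```

Both lines now print the same thing in every awk, because each operand is a plain variable read.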
I do not know how to pass a regular expression as an argument to a function.
If I pass a string, it works.
I have the following awk file:
#!/usr/bin/awk -f
function find(name){
for(i=0;i<NF;i++)if($(i+1)~name)print $(i+1)
}
{
find("mysql")
}
I do something like
$ ./fct.awk <(echo "$str")
This works OK.
But when I call it like this in the awk file:
{
find(/mysql/)
}
This does not work.
What am I doing wrong?
Thanks,
Eric J.
You cannot (should not) pass a regex constant to a user-defined function. You have to use a dynamic regex in this case, like find("mysql").
If you do find(/mysql/), awk actually evaluates find($0 ~ /mysql/), so it passes 0 or 1 to your find() function.
See this question for details:
awk variable assignment statement explanation needed
Also:
http://www.gnu.org/software/gawk/manual/gawk.html#Using-Constant-Regexps
(section 6.1.2, Using Regular Expression Constants)
The corresponding gawk warning reads:
warning: regexp constant for parameter #1 yields boolean value
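A small sketch to see this for yourself, assuming a POSIX awk that treats a bare regexp constant in an expression as a match against $0: the function simply prints what it receives. The helper names show_arg and show are hypothetical.

```shell
#!/bin/sh
# Sketch: show what a user-defined function actually receives when it is
# called with a regexp constant. show_arg is a hypothetical helper name.
show_arg() {
    printf '%s\n' 'foo mysql' 'foo bar' | awk '
    function show(x) { print "received:", x }
    { show(/mysql/) }    # evaluated as show($0 ~ /mysql/)
    ' 2>/dev/null        # discard the gawk warning about a boolean value
}

show_arg
```

The first line contains mysql and the second does not, so the function receives 1 and then 0, never the regex itself.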
The regex gets evaluated (matching against $0) before it's passed to the function. You have to use strings.
Note: make sure you do proper escaping: http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps
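A minimal sketch of the string approach with escaping, assuming the regex is written directly in the awk program: inside an awk string literal "\\." becomes \. once awk reads it, so the regex engine sees a literal dot, while a bare "." still matches any character. The helper name demo_find and the sample fields are hypothetical; the extra parameter i is the usual awk idiom for a local variable.

```shell
#!/bin/sh
# Sketch: pass the regex to a user-defined function as a string (a dynamic
# regex). Inside an awk string literal "\\." becomes \. so the regex engine
# sees an escaped, literal dot. demo_find is a hypothetical helper name.
demo_find() {
    printf '%s\n' 'my.cnf mysql myXcnf' | awk '
    function find(re,    i) {   # extra parameter i makes the loop index local
        for (i = 1; i <= NF; i++)
            if ($i ~ re)
                print $i
    }
    {
        find("^my\\.cnf$")   # literal dot: matches only my.cnf
        find("^my.cnf$")     # bare dot: matches my.cnf and myXcnf
    }'
}

demo_find
```

The first call prints my.cnf; the second prints my.cnf and myXcnf.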
If you use GNU awk, you can pass a regular expression as a user-defined function parameter.
You have to write your regex as @/.../.
In your example, you would use it like this:
function find(regex){
for(i=1;i<=NF;i++)
if($i ~ regex)
print $i
}
{
find(@/mysql/)
}
It's called a strongly typed regexp constant, and it's available since GNU awk version 4.2 (Oct 2017).
Example here.
Use quotes and treat the regex as a string. This way it works in mawk, mawk2, and GNU gawk. But you'll also need to double the backslashes, since turning the regex into a string literal consumes one level of backslashes right away.
In your example, just find("mysql") will suffice.
You can actually pass any regex you wish this way, without being confined to GNU gawk, as long as you're willing to write it as a string rather than with the @/../ syntax others have mentioned. This is where the number of backslashes makes a difference.
You can even build a regex out of arbitrary bytes, preferably via octal codes. If you use "\342\234\234" as a regex, awk converts the escapes into the actual bytes in the regex before matching.
While there's nothing wrong with that approach, if you want to be 100% safe and prefer not to have arbitrary bytes flying around, write it as
"[\\342][\\234][\\234]" ----> ✜
Once initially read by awk to create an internal representation, it'll look like this:
[\342][\234][\234]
which still matches the same bytes you want (in this case, a cross-looking dingbat). In gawk's Unicode-aware mode, however, this produces annoying warnings, because it puts non-ASCII bytes directly inside square brackets. For that use case,
"\\342\\234\\234" ------(eqv to )---> /\342\234\234/
keeps gawk happy and quiet. Lately I've been filling the gaps in my own code by writing regexes that mimic all the Unicode script classes that Perl enjoys.
This is in response to my previous question:
PowerShell: -replace, regex and ($) dollar sign woes
My question is: why do these two lines of code produce different output?
'abc' -replace 'a(\w)', '$1'
'abc' -replace 'a(\w)', "$1"
And according to the two articles below, why isn't '$1' in single quotes treated as a literal string? Everything in single quotes should be treated as literal text, right?
http://www.computerperformance.co.uk/powershell/powershell_quotes.htm
http://blogs.msdn.com/b/powershell/archive/2006/07/15/variable-expansion-in-strings-and-herestrings.aspx
When you use single quotes you tell PowerShell to use a string literal meaning everything between the opening and closing quote is to be interpreted literally.
When you use double quotes, PowerShell will interpret specific characters inside the double quotes.
See Get-Help about_Quoting_Rules.
The dollar sign has a special meaning both in regular expressions and in PowerShell. Use single quotes if you intend the dollar sign to be interpreted by the regular expression engine.
In your example the regex a(\w) is matching the letter 'a' and then a word character captured in back reference #1. So when you replace with $1 you are replacing the matched text ab with back reference match b. So you get bc.
In your second example, which uses double quotes, PowerShell interprets "$1" as a string containing the variable $1. You don't have a variable named $1, so it's null and the replacement string is empty. So the regex replaced ab with nothing, which is why you only get c.
In your second line:
'abc' -replace 'a(\w)', "$1"
PowerShell expands the $1 before it gets to the regex replace operation, as others have stated. You can avoid that expansion by using a backtick, as in:
'abc' -replace 'a(\w)', "`$1"
Thus, if you had a string in a variable $prefix which you wanted to include in the replacement string, you could use it in the double quotes like this:
'abc' -replace 'a(\w)', "$prefix`$1"
The '$1' is a regex backreference. It's created by the regex match, and it only exists within the context of that replace operation. It is not a PowerShell variable.
"$1" will be interpreted as a PowerShell variable. If no variable called $1 exists, the replacement value will be null.
Since I cannot comment or upvote: David Rogers' answer worked for me. I needed to use both a regex backreference and a PowerShell variable in a regex replace.
I needed to understand what the backtick did before I implemented it. Here is the explanation: the backtick is PowerShell's escape character.
My use case
$new = "AAA"
"REPORT.TEST998.TXT" -Replace '^([^.]+)\.([^.]+)([^.]{3})\.', "`$1.`$2$new."
Result
REPORT.TESTAAA.TXT
Alternatives
Format string
"REPORT.TEST998.TXT" -Replace '^([^.]+)\.([^.]+)([^.]{3})\.', ('$1.$2{0}.' -f $new)
Comments
As per https://get-powershellblog.blogspot.com/2017/07/bye-bye-backtick-natural-line.html, I'll probably use the format string method to avoid backticks.
Here's the PowerShell 7 version, where you don't have to deal with a single-quoted $1: pass a script block as the second argument, here replacing 'ab' with 'b':
'abc' -replace 'a(\w)', {$_.groups[1]}
bc
I came across code similar to the following in an Oracle stored procedure:
SELECT * FROM hr.employees WHERE REGEXP_LIKE(FIRST_NAME, '\A'||:iValue||'\Z', 'c');
And I am not sure what the \A and \Z do.
From what I can glean from the Oracle documentation, I think that they simply suppress the meaning of special characters in the iValue parameter. If so, the above must be equivalent to
SELECT * FROM hr.employees WHERE FIRST_NAME=:iValue;
Can anyone confirm this? Empirically this seems to be the case.
I think that in the past they wanted case insensitive searching so the 'c' was an 'i' before. So in this case we do not need to use the REGEXP_LIKE function any more and can replace it with an equals.
\A matches the position at the beginning of the string.
\Z matches the position at the end of the string or before a newline at the end of the string.
\z matches the position at the end of the string.
These are independent of multiline mode, unlike ^ and $.
Example:
foo\Z would match on foo\n, but foo\z would not match on foo\n.
See Oracle reference.
If || is used for string concatenation, then it's not the same as a simple string comparison, since the value of :iValue is still interpreted as a regex. (Also, I'm not sure how Oracle treats case sensitivity when using =; MySQL ignores case by default when comparing strings.)
\A matches the very start of input.
\Z matches the very end of input (or the position just before a final newline).
Check out regular-expressions.info, which is a great regex resource