Test two variables in the same script - variables

Question :
I'm breaking my teeths under this:
awk -v full=wc -v empty=wp '{
...blablabla...
if ($4==full) stop=yes
if (stop==yes && $4==empty) exit
...blablabla...
}'
The code works well (I mean, I get an output) if I don't declare the two variables full and empty at the beginning, and use instead the values of these in the script. If I only use the first variable in the script, I get the same output. But if I only use the second variable, I get no output at all.

Consider what happens when the variables are expanded. A portion of the awk body goes from:
' ... if ($4=='$full') ...'
to
' ... if ($4==wc) ...'
Since wc is a "bare word", awk thinks it is a variable and substitutes it's value (empty string), so you get this:
' ... if ($4=="") ...'
When you're building your awk script, you need to be aware of quoting strings in awk. You need:
' ... if ($4=="'$full'") ...'
However, it is much more elegant to pass values with awk's -v option as you have done.

Related

Replacing columns of a CSV with a string using awk and gsub

I have an input csv file that looks something like:
Name,Index,Location,ID,Message
Alexis,10,Punggol,4090b43,Production 4090b43
Scott,20,Bedok,bfb34d3,Prevent
Ronald,30,one-north,86defac,Difference 86defac
Cindy,40,Punggol,40d0ced,Central
Eric,50,one-north,aeff08d,Military aeff08d
David,60,Bedok,5d1152d,Study
And I want to write a bash shell script using awk and gsub to replace 6-7 alpha numeric character long strings under the ID column with "xxxxx", with the output in a separate .csv file.
Right now I've got:
#!/bin/bash
awk -F ',' -v OFS=',' '{gsub(/^([a-zA-Z0-9]){6,7}/g, "xxxxx", $4);}1' input.csv > output.csv
But the output from I'm getting from running bash myscript.sh input.csv doesn't make any sense. The output.csv file looks like:
Name,Index,Location,ID,Message
Alexis,10,Punggol,4xxxxx9xxxxxb43,Production 4090b43
Scott,20,Bedok,bfb34d3,Prevent
Ronald,30,one-north,86defac,Difference 86defac
Cindy,40,Punggol,4xxxxxdxxxxxced,Central
Eric,50,one-north,aeffxxxxx8d,Military aeff08d
David,60,Bedok,5d1152d,Study
but the expected output csv should look like:
Name,Index,Location,ID,Message
Alexis,10,Punggol,xxxxx,Production 4090b43
Scott,20,Bedok,xxxxx,Prevent
Ronald,30,one-north,xxxxx,Difference 86defac
Cindy,40,Punggol,xxxxx,Central
Eric,50,one-north,xxxxx,Military aeff08d
David,60,Bedok,xxxxx,Study
With your shown sample, please try the following code:
awk -F ',[[:space:]]+' -v OFS=',\t' '
{
sub(/^([a-zA-Z0-9]){6,7}$/, "xxxxx", $4)
$1=$1
}
1
' Input_file | column -t -s $'\t'
Explanation: Setting field separator as comma, space(s), then setting output field separator as comma tab here. Then substituting from starting to till end of value(6 to 7 occurrences) of alphanumeric(s) with xxxxx in 4th field. Finally printing current line. Then sending output of awk program to column command to make it as per shown sample of OP.
EDIT: In case your Input_file is separated by only , as per edited samples now, then try following.
awk -F ',' -v OFS=',' '
{
sub(/^([a-zA-Z0-9]){6,7}$/, "xxxxx", $4)
}
1
' Input_file
Note: OP has installed latest version of awk from older version and these codes helped.
The short version to your answer would be the following:
$ awk 'BEGIN{FS=OFS=","}(FNR>1){$4="xxxxxx"}1' file
This will replace all entries in column 4 by "xxxxxx".
If you only want to change the first 6 to 7 characters of column 4 (and not if there are only 5 of them, there are a couple of ways:
$ awk 'BEGIN{FS=OFS=","}(FNR>1)&&(length($4)>5){$4="xxxxxx" substr($4,8)}1' file
$ awk 'BEGIN{FS=OFS=","}(FNR>1)&&{sub(/.......?/,"xxxxxx",$4)}1' file
Here, we will replace 123456abcde into xxxxxxabcde
Why is your script failing:
Besides the fact that the approach is wrong, I'll try to explain what the following command does: gsub(/([a-zA-Z0-9]){6,7}/g,"xxxxx",$4)
The notation /abc/g is valid awk syntax, but it does not do what you expect it to do. The notation /abc/ is an ERE-token (an extended regular expression). The notation g is, at this point, nothing more than an undefined variable which defaults to an empty string or zero, depending on its usage. awk will now try to execute the operation /abc/g by first executing /abc/ which means: if my current record ($0) matches the regular expression "abc", return 1 otherwise return 0. So it converts /abc/g into 0g which means to concatenate the content of g to the number 0. For this, it will convert the number 0 to a string "0" and concatenate it with the empty string g. In the end, your gsub command is equivalent to gsub("0","xxxxx",$4) and means to replace all the ZERO's by "xxxxx".
Why are you getting always gsub("0","xxxxx",$4) and never gsub("1","xxxxx",$4). The reason is that your initial regular expression never matches anything in the full record/line ($0). Your reguar expression reads /^([a-zA-Z0-9]){6,7}/, and while there are lines that start with 6 or 7 characters, it is likely that your awk does not recognize the extended regular expression notation '{m,n}' which makes it fail. If you use gnu awk, the output would be different when using -re-interval which in old versions of GNU awk is not enabled by default.
I tried to find why your code behave like that, for simplicty sake I made example concering only gsub you have used:
awk 'BEGIN{id="4090b43"}END{gsub(/^([a-zA-Z0-9]){6,7}/g, "xxxxx", id);print id}' emptyfile.txt
output is
4xxxxx9xxxxxb43
after removing g in first argument
awk 'BEGIN{id="4090b43"}END{gsub(/^([a-zA-Z0-9]){6,7}/, "xxxxx", id);print id}' emptyfile.txt
output is
xxxxx
So regular expression followed by g caused malfunction. I was unable to find relevant passage in GNU AWK manual what g after / is supposed to do.
(tested in gawk 4.2.1)

Using shell variables in gensub in Awk

I tried to answer a question asked here
How do I sed only matched grep line
as below
awk -F":" -v M="Mary Jane" -v A="Runs" -v S="Sleeps" '{OFS=":" ; print gensub(/(M):(A)/,"\\1;S","g")}'
but it did not work so I am guessing gensub is not able to recognize the shell variables M, A and S passed to awk , Is there a way to use shell variables in gensub in awk ?
What you're experiencing is not because you're using gensub(), it's to do with regexps. There are regexp literals, e.g. $0~/foo/, and then there are dynamic regexps which are strings that are converted to regexps when they are evaluated, e.g. $0~"foo" or {var="foo"} $0~var. If you want to use a variable in ANY regexp context then you need a dynamic regexp and so that is the syntax you need to use, not the regexp literal syntax. See https://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps.
The correct syntax for what you wrote would be:
awk -F":" -v M="Mary Jane" -v A="Runs" -v S="Sleeps" '{OFS=":" ; print gensub("("M"):"A,"\\1;"S,"g")}'
but that has some semantic issues (e.g. partial matches and setting OFS for every line instead of once when it's never even used and almost certainly unnecessarily using "g" instead of 1 and why use regexps at all when you're really trying to do exact matches on strings)

awk: setting environment variables directly from within an awk script

first post here, but been a lurker for ages. i have googled for ages, but cant find what i want (many abigious topic subjects which dont request what the topic suggests it does ...). not new to awk or scripting, just a little rusty :)
i'm trying to write an awk script which will set shell env values as it runs - for another bash script to pick up and use later on. i cannot simply use stdout from awk to report this value i want setting (i.e. "export whatever=awk cmd here"), as thats already directed to a 'results file' which the awkscript is creating (plus i have more than one variable to export in the final code anyway).
As an example test script, to demo my issue:
echo $MYSCRIPT_RESULT # returns nothing, not currently set
echo | awk -f scriptfile.awk # do whatever, setting MYSCRIPT_RESULT as we go
echo $MYSCRIPT_RESULT # desired: returns the env value set in scriptfile.awk
within scriptfile.awk, i have tried (without sucess)
1/) building and executing an adhoc string directly:
{
cmdline="export MYSCRIPT_RESULT=1"
cmdline
}
2/) using the system function:
{
cmdline="export MYSCRIPT_RESULT=1"
system(cmdline)
}
... but these do not work. I suspect that these 2 commands are creating a subshell within the shell awk is executing from, and doing what i ask (proven by touching files as a test), but once the "cmd"/system calls have completed, the subshell dies, unfortunatley taking whatever i have set with it - so my env setting changes dont stick from "the caller of awk"'s perspective.
so my question is, how do you actually set env variables within awk directly, so that a calling process can access these env values after awk execution has completed? is it actually possible?
other than the adhoc/system ways above, which i have proven fail for me, i cannot see how this could be done (other than writing these values to a 'random' file somewhere to be picked up and read by the calling script, which imo is a little dirty anyway), hence, help!
all ideas/suggestions/comments welcomed!
You cannot change the environment of your parent process. If
MYSCRIPT_RESULT=$(awk stuff)
is unacceptable, what you are asking cannot be done.
You can also use something like is described at
Set variable in current shell from awk
unset var
var=99
declare $( echo "foobar" | awk '/foo/ {tmp="17"} END {print "var="tmp}' )
echo "var=$var"
var=
The awk END clause is essential otherwise if there are no matches to the pattern declare dumps the current environment to stdout and doesn't change the content of your variable.
Multiple values can be set by separating them with spaces.
declare a=1 b=2
echo -e "a=$a\nb=$b"
NOTE: declare is bash only, for other shells, use eval with the same syntax.
You can do this, but it's a bit of a kludge. Since awk does not allow redirection to a file descriptor, you can use a fifo or a regular file:
$ mkfifo fifo
$ echo MYSCRIPT_RESULT=1 | awk '{ print > "fifo" }' &
$ IFS== read var value < fifo
$ eval export $var=$value
It's not really necessary to split the var and value; you could just as easily have awk print the "export" and just eval the output directly.
I found a good answer. Encapsulate averything in a subshell!
The comand declare works as below:
#Creates 3 variables
declare var1=1 var2=2 var3=3
ex1:
#Exactly the same as above
$(awk 'BEGIN{var="declare "}{var=var"var1=1 var2=2 var3=3"}END{print var}')
I found some really interesting uses for this technique. In the next exemple I have several partitions with labels. I create variables using the labels as variable names and the device name as variable values.
ex2:
#Partition data
lsblk -o NAME,LABEL
NAME LABEL
sda
├─sda1
├─sda2
├─sda5 System
├─sda6 Data
└─sda7 Arch
#Creates a subshell to execute the text
$(\
#Pipe lsblk to awk
lsblk -o NAME,LABEL | awk \
#Initiate the variable with the text for the declare command
'BEGIN{txt="declare "}'\
#Filters devices with labels Arch or Data
'/Data|Arch/'\
#Concatenate txt with itself plus text for the variables(name and value)
#substr eliminates the special caracters before the device name
'{txt=txt$2"="substr($1,3)" "}'\
#AWK prints the text and the subshell execute as a command
'END{print txt}'\
)
The end result of this is 2 variables: Data with value sda6 and Arch with value sda7.
The same exemple in a single line.
$(lsblk -o NAME,LABEL | awk 'BEGIN{txt="declare "}/Data|Arch/{txt=txt$2"="substr($1,3)" "}END{print txt}')

Does awk print all if field variable doesn't exist?

I am trying to understand some scripts that I have inherited and make use of awk. In one of the scripts are these lines:
report=`<make call to Java class that generates a report`
report=`echo $report|awk '{print $5}'`
The report generated in line 1 has data like this:
ABC1234:0123456789:ABCDE
ABC4321:9876543210:EDCBA
...
The awk generated report is the same as the original one.
There is no 5th field in the report since there is no whitespace and a different field separator has not been defined. I know that using $0 will return all fields. Does specifying a field that doesn't exist do the same?
No:
echo "1 2 3"|awk '{print $5}'
The above prints nothing. Don't know why it is behaving like you are specifying. If you were to use " instead of ', then it would print because $5 would be expanded by shell, but as written it should not.
Something is wrong with your test.
The expected awk behavior in this case is to print a blank line for each input line, and that's what I see when I run with either the 1TA or gawk.

Shell variable interpreted wrongly in awk

In following code I am trying to pass shell varibale to awk. But when I try to run it as a.sh foo_bar the output printed is "foo is not declared" and when I run it as a.sh bar_bar the output printed is " foo is declared" . Is there a bug in awk or I am doing something wrong here?
I am using gawk-3.0.3.
#!/bin/awk
model=$1
awk ' {
match("'$model'", /foo/)
ismodel=substr("'$model'", RSTART, RLENGTH)
if ( ismodel != foo ) {
print " foo is not declared"
} else {
print " foo is declared"
}
}
' dummy
dummy is file with single blank line.
Thanks,
You should use AWK's variable passing instead of complex quoting:
awk -v awkvar=$shellvar 'BEGIN {print awkvar}'
Your script is written as a shell script, but you have an AWK shebang line. You could change that to #!/bin/sh.
This is not a bug, but an error in your code. The problematic line is:
if ( ismodel != foo ) {
Here foo should be "foo". Right now you are comparing with an empty variable. This gives false when you have a match, and true when you have no match. So the problem is not the way you use the shell variables.
But as the other answerers have said, the preferred way of passing arguments to awk is by using the -v switch. This will also work when you decide to put your awk script in a separate file and prevents all kind of quoting issues.
I'm also not sure about your usage of a dummy file. Is this just for the example? Otherwise you should omit the file and put all your code in the BEGIN {} block.
use -v option to pass in variable from the shell
awk -v model="$1" '{
match(model, /foo/)
.....
}
' dummy