How to expand awk variables within the code? - awk

Assuming that I passed some variables to the awk script:
$AWK -f script.awk -v var01="foo" var02="bar"
And inside the script I match some pattern:
# pattern01 var01
/pattern01/ {
if (??? == "foo") print
}
I want to expand the variable named by "$2" ("var01") to its given value.
I have been trying with gawk and it seems to be able to expand variables in the following way:
print $$x
But this, for some reason, doesn't work in the first example, and I also need to keep POSIX compatibility. Is it possible to expand the variable in the given example?
(Note: I specifically want this behavior (if possible), so I don't want workarounds with other tools or shell expansion)
Equivalent in shell:
file01:
foobar
some random text
pattern01 var01
more random text...
code.sh:
#!/bin/sh
var01="Hello"
x="$(grep '^pattern01' file01 | awk '{print $2}')"
eval echo "$"$x # prints Hello

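Using POSIX awk, there is no way to look up the value of a variable by its name. Instead, consider using an array to store the values. Not the most elegant, but the array lookup itself is portable. (The -e option used below to inject the BEGIN block is a GNU awk extension; with other awks, put the BEGIN block in a file of its own and pass it with an extra -f.)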
$AWK -e 'BEGIN { v["var01"] = "foo" ; v["var02"] = "bar" }' -f script.awk
script.awk
# pattern01 var01
/pattern01/ {
if ( v[$2] == "foo") print
}
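A minimal sketch of the same array lookup without -e. It assumes the values contain no spaces and that the names arrive in a single variable called pairs, which is a hypothetical name and not part of the original question:

```shell
# Hypothetical input that mirrors the question's file01.
out=$(printf '%s\n' 'foobar' 'pattern01 var01' 'pattern01 var02' |
awk -v pairs='var01=foo var02=bar' '
BEGIN {
    # Build the lookup table v[name] = value from the "name=value" pairs.
    n = split(pairs, item, " ")
    for (i = 1; i <= n; i++) {
        eq = index(item[i], "=")
        v[substr(item[i], 1, eq - 1)] = substr(item[i], eq + 1)
    }
}
# Same test as in script.awk: look the name in $2 up in the table.
/pattern01/ { if (v[$2] == "foo") print }
')
printf '%s\n' "$out"
```

Only the line whose second field names a variable holding "foo" is printed. Everything here is plain POSIX awk, so it should behave the same in gawk, mawk and similar implementations.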
If you know that you will be using a recent GNU AWK version, and are OK with using extensions, you can use the SYMTAB array. From the man page:
SYMTAB An array whose indices are the names of all currently
defined global variables and arrays in the program. The array may be
used for indirect access to read or write the value of a variable:
foo = 5
SYMTAB["foo"] = 4
print foo # prints 4
$AWK -f script.awk -v var01="foo" var02="bar"
script.awk
# pattern01 var01
/pattern01/ {
if ( SYMTAB[$2] == "foo") print
}
Both approaches eliminate the need to create environment variables, which may have an impact on other programs and may be hard to scale.

I have found one solution, by setting the variable as part of the environment and then calling the special variable "ENVIRON" with the name (as it acts as a dictionary):
# pattern01 var01
/pattern01/ {
if (ENVIRON[$2] == "foo") print
}
I think that by manually creating the dictionary in the BEGIN block, the same behaviour could be achieved without making use of the environment.
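A small sketch of the ENVIRON approach, under the assumption that the variables only need to exist for the awk process itself. Prefixing the command with assignments puts them in awk's environment without exporting them in the calling shell; the input lines are made up for illustration:

```shell
# var01 and var02 are placed in the environment of this awk process only.
out=$(printf '%s\n' 'pattern01 var01' 'pattern01 var02' |
var01=foo var02=bar awk '
/pattern01/ { if (ENVIRON[$2] == "foo") print }
')
printf '%s\n' "$out"
```

This keeps the rest of the shell session clean, which addresses part of the concern about environment variables affecting other programs.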

Can you try this (bash only, since ${!x} is bash's indirect expansion):
var01="Hello"
x="$(grep '^pattern01' file01 | awk '{print $2}')"
echo ${!x}
hope this helps..
Thanks,

Related

How can I send the output of an AWK script to a file?

Within an AWK script, I'm needing to send the output of the script to a file while also printing it to the terminal. Is there a nice and tidy way I can do this without having a copy of every print redirect to the file?
I'm not particularly good at making SSCCE examples but here's my attempt at demonstrating my problem;
BEGIN{
print "This is an awk script"
# I don't want to have to do this for every print
print "This is an awk script" > "thisiswhack.out"
}
{
# data manip. stuff here
# ...
printf "%s %s %s\n", blah, blah, blah
# I don't want to have to do this for every print again
printf "%s %s %s\n", blah, blah, blah >> "thisiswhack.out"
}
END{
print "Yay we're done!"
# Seriously, there has to be a better way to do this within the script
print "Yay we're done!" >> "thisiswhack.out"
}
Surely there must be a way to send the entire output of the script to an output file within the script itself, right?
The command to duplicate streams is tee, and we can use it inside awk:
awk '
BEGIN {tee = "tee out.txt"}
{print | tee}' in.txt
This invokes tee with the file argument out.txt, and opens a stream to this command.
The stream (and therefore tee) remains open until awk exits, or close(tee) is called.
Every time print | tee is used, the data is printed to that stream. tee then writes this data both to the file out.txt and to stdout.
The | command feature is POSIX awk. Also the tee variable isn't compulsory (you can use the string).
Of course, we can use tee outside awk too: awk ... | tee out.txt.
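A runnable sketch of the print | tee idea with an explicit close(). The temporary file from mktemp and the variable name cmd are assumptions made for the example:

```shell
tmp=$(mktemp)    # stands in for out.txt

# Every print goes through one tee process, which copies it to the file
# and to awk's standard output (captured here in $out).
out=$(printf '%s\n' one two |
awk -v cmd="tee '$tmp'" '
{ print toupper($0) | cmd }
END { close(cmd) }    # flush the pipe and wait for tee to finish
')

copy=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```

After the run, the captured output and the file contents are the same two uppercase lines.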
GNU AWK's redirection (a standard awk feature) allows sending output to a command rather than a file, so I suggest taking advantage of it:
awk 'BEGIN{command="tee output.txt"}{print tolower($0) | command}' input.txt
Note: I use tolower($0) for demonstration purposes. I redirect print into the tee command, which writes to both the named file and standard output, so you should get a lowercase version of input.txt written to output.txt and to standard output.
If you are not confined to a single awk invocation, you might alternatively use tee outside, like so:
awk '{print tolower($0)}' input.txt | tee output.txt
awk '
function prtf(str) {
printf "%s", str > "thisiswhack.out"
printf "%s", str
fflush()
}
function prt(str) {
prtf( str ORS )
}
{
# to print adding a newline at the end:
prt( "foo" )
# to print as-is without adding a newline:
prtf( sprintf("%s, %s, %d", $2, "bar", 17) )
}
' file
In the above we are not spawning a subshell to call any other command so it's efficient, and we're using fflush() after every print to ensure both output streams (stdout and the extra file) don't get out of sync with respect to each other (e.g. stdout displays less text than the file or vice-versa if the command is killed).
The above always overwrites the contents of "thisiswhack.out" with whatever the script outputs. If you want to append instead then change > to >>. If you want the option of doing both, introduce a variable (which I've named prtappend below) to control it which you can set on the command line, e.g. change:
printf "%s", str > "thisiswhack.out"
to:
printf "%s", str >> "thisiswhack.out"
and add:
BEGIN {
if ( !prtappend ) {
printf "" > "thisiswhack.out"
}
}
then if you do awk -v prtappend=1 '...' it'll append to thisiswhack.out instead of overwriting it.
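Putting the pieces above together, a sketch might look like the following. The logfile variable, the run helper and the temporary file are hypothetical names introduced only for this example:

```shell
log=$(mktemp)    # stands in for thisiswhack.out

run() {
    # $1 is the prtappend flag: 0 overwrites the log, 1 appends to it.
    printf 'x foo\n' | awk -v logfile="$log" -v prtappend="$1" '
    function prtf(str) {
        printf "%s", str >> logfile
        printf "%s", str
        fflush()
    }
    function prt(str) { prtf(str ORS) }
    # Truncate the log once at startup unless appending was requested.
    BEGIN { if (!prtappend) printf "" > logfile }
    { prt($2) }'
}

printf 'old\n' > "$log"
run 1 > /dev/null
appended=$(cat "$log")       # the old line followed by the new one
run 0 > /dev/null
overwritten=$(cat "$log")    # only the new line
rm -f "$log"
```

With prtappend=1 the existing contents are kept; with prtappend=0 the BEGIN block empties the file before anything is written.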
Of course, the better approach if you're on a Unix system is to have your awk script called from a shell script with its output piped to tee, e.g.:
#!/usr/bin/env bash
awk '
{
print "foo"
printf "%s, %s, %d", $2, "bar", 17
}
' "${@:--}" |
tee 'thisiswhack.out'
Note that this is one more example of why you should not call awk from a shebang.

AWK:Convert columns to rows with condition (create list ) [duplicate]

I have a tab-delimited file with three columns (excerpt):
AC147602.5_FG004 IPR000146 Fructose-1,6-bisphosphatase class 1/Sedoheputulose-1,7-bisphosphatase
AC147602.5_FG004 IPR023079 Sedoheptulose-1,7-bisphosphatase
AC148152.3_FG001 IPR002110 Ankyrin repeat
AC148152.3_FG001 IPR026961 PGG domain
and I'd like to get this using bash:
AC147602.5_FG004 IPR000146 Fructose-1,6-bisphosphatase class 1/Sedoheputulose-1,7-bisphosphatase IPR023079 Sedoheptulose-1,7-bisphosphatase
AC148152.3_FG001 IPR002110 Ankyrin repeat IPR026961 PGG domain
So if the ID in the first column is the same in several lines, it should produce one line for that ID with all the other parts of those lines joined. In the example it will give a two-row file.
give this one-liner a try:
awk -F'\t' -v OFS='\t' '{x=$1;$1="";a[x]=a[x]$0}END{for(x in a)print x,a[x]}' file
For whatever reason, the awk solution does not work for me in cygwin. So I used Perl instead. It joins around a tab character and separates line by \n
cat FILENAME | perl -e 'foreach $Line (<STDIN>) { @Cols=($Line=~/^\s*(\d+)\s*(.*?)\s*$/); push(@{$Link{$Cols[0]}}, $Cols[1]); } foreach $List (values %Link) { print join("\t", @{$List})."\n"; }'
Whether the one-liner works depends on the file size (and on awk's memory limits).
If the file is too big, sorting it first reduces what awk has to keep in memory to a single key and its pending output line.
A classical version that prints after each group, modifying the whole line:
sort YourFile \
| awk '
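# Same key as the previous line: strip the key and append the rest.
NR > 1 && Last == $1 { sub( /^[^[:blank:]]*[[:blank:]]+/, ""); C = C " " $0; next}
# New key: flush the finished group, then start a new one.
NR > 1 { print C}
{ Last = $1; C = $0}
END { if (NR) print C}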
'
Another version using field and pre-print but less "human readable"
sort YourFile \
| awk '
Last != $1 {printf( "%s%s", (NR > 1 ? "\n" : ""), Last = $1)}
{for( i = 2; i <= NF; i++) printf( " %s", $i)}
END {if (NR) print ""}
'
A pure bash version. It has no additional dependencies, but requires bash 4.0 or above (2009) for associative array support.
All on one line:
{ declare -A merged; merged=(); while IFS=$'\t' read -r key value; do merged[$key]="${merged[$key]}"$'\t'"$value"; done; for key in "${!merged[@]}"; do echo "$key${merged[$key]}"; done; } < INPUT_FILE.tsv
Readable and commented equivalent:
{
# Define `merged` as an empty associative array.
declare -A merged
merged=()
# Read tab-separated lines. Any leftover fields also end up in `value`.
while IFS=$'\t' read -r key value
do
# Append to any value that's already there, separated by a tab.
merged[$key]="${merged[$key]}"$'\t'"$value"
done
# Loop over the input keys. Note that the order is arbitrary;
# pipe through `sort` if you want a predictable order.
for key in "${!merged[@]}"
do
# Each value is prefixed with a tab, so no need for a tab here.
echo "$key${merged[$key]}"
done
} < INPUT_FILE.tsv

How to use variables in awk scripts

I am having trouble in using variables in awk scripts.
myvariable = tolower(substr($1,0,2)) tolower(substr($2,0,8))
so I can use $myvariable in the script instead of using the above every time.
I have tried, but it prints the whole line, with nothing cut from the strings.
Thanks.
Awk is different from Linux shell scripting: you don't need to put "$" in front of variable names. In awk, "$" is special in its own way; for example, it is used to reference a field or the whole record.
If you are declaring your variable, declare it like this:
myvariable = tolower(substr($1,0,2)) tolower(substr($2,0,8))
that is, drop the $ in front of your awk variable inside your awk statement.
If you have a variable declared in your Linux shell and you want to use it in your awk script, you can assign your shell variable to an awk variable like this:
awk -v awkvariable="$myshellvariable" '{...commands....}'
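A short sketch of the advice above, assuming the goal was the first 2 characters of $1 followed by the first 8 characters of $2. Note that substr() counts from 1 in awk; a start of 0 is handled differently by different implementations, so the example starts at 1. The input name is made up:

```shell
out=$(echo 'Jonathan Livingston' | awk '{
    # Start at position 1 to take the leading characters of each field.
    myvariable = tolower(substr($1, 1, 2)) tolower(substr($2, 1, 8))
    print myvariable    # no "$": myvariable is a plain awk variable
}')
printf '%s\n' "$out"
```

Writing $myvariable here would instead ask awk for the field whose number is the value of myvariable, which is why the original attempt printed the whole line.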
AWK has three "types" of variables:
one for accessing fields ($1, $2, $3, ...)
a bunch of system variables (NR, NF, FS, RS, ...)
user-defined variables (my_var = 123, i = "hello", ...)
To print a variable, you simply write its name:
my_var = "john doe";
print my_var;
print NF; # Print Number of Fields
e.g.
echo john was here | awk '{print NF}' # 3
The interesting part is that you can mix system/user-defined variables with field variables ($1, ...) like so:
my_var = 2;
print $my_var; # print second field
print $NF; # Print last field (Using the "Number of Fields" variable)
eg
echo john was here | awk '{print $NF}' # here
echo john was here | awk '{print $(NF-1)}' # was

Can we use shell variables in awk?

Can we use shell variables in AWK like $VAR instead of $1, $2? For example:
UL=(AKHIL:AKHIL_NEW,SWATHI:SWATHI_NEW)
NUSR=`echo ${UL[*]}|awk -F, '{print NF}'`
echo $NUSR
echo ${UL[*]}|awk -F, '{print $NUSR}'
I am an Oracle DBA and we get a lot of import requests. I'm trying to automate them with a script. The script will find out the users in the dump and prompt for the users into which the dump needs to be loaded.
Suppose the dump has two users, AKHIL and SWATHI (there can be many users in the dump, and I may want to import more of them). I want to import the dump into the new users AKHIL_NEW and SWATHI_NEW. So the input to be read is something like AKHIL:AKHIL_NEW,SWATHI:SWATHI_NEW.
First, I need to find the number of users to be created; then I need to get the new users, i.e. AKHIL_NEW,SWATHI_NEW, from the input we have given, so that I can connect to the database, create the new users and then import. I'm not copying the entire code: I just copied the code from where it accepts the input users.
UL=(AKHIL:AKHIL_NEW,SWATHI:SWATHI_NEW) ## it can be many users like USER1:USER1_NEW,USER2:USER2_NEW,USER3:USER3_NEW..
NUSR=`echo ${UL[*]}|awk -F, '{print NF}'` #finding number of fields or users
y=1
while [ $y -le $NUSR ] ; do
USER=`echo ${UL[*]}|awk -F, -v NUSR=$y '{print $NUSR}' |awk -F: '{print $2}'` #getting Users to created AKHIL_NEW and SWATHI_NEW and passing to SQLPLUS
if [[ $USER = SCPO* ]]; then
TBS=SCPODATA
else
if [[ $USER = WWF* ]]; then
TBS=WWFDATA
else
if [[ $USER = STSC* ]]; then
TBS=SCPODATA
else
if [[ $USER = CSM* ]]; then
TBS=CSMDATA
else
if [[ $USER = TMM* ]]; then
TBS=TMDATA
else
if [[ $USER = IGP* ]]; then
TBS=IGPDATA
fi
fi
fi
fi
fi
fi
sqlplus -s '/ as sysdba' <<EOF # CREATING the USERS in the database
CREATE USER $USER IDENTIFIED BY $USER DEFAULT TABLESPACE $TBS TEMPORARY TABLESPACE TEMP QUOTA 0K on SYSTEM QUOTA UNLIMITED ON $TBS;
GRANT
CONNECT,
CREATE TABLE,
CREATE VIEW,
CREATE SYNONYM,
CREATE SEQUENCE,
CREATE DATABASE LINK,
RESOURCE,
SELECT_CATALOG_ROLE
to $USER;
EOF
y=`expr $y + 1`
done
impdp system/manager DIRECTORY=DATA_PUMP DUMPFILE=imp.dp logfile=impdp.log SCHEMAS=AKHIL,SWATHI REMAP_SCHEMA=${UL[*]}
In the last impdp command I need to get the original users in the dump, i.e. AKHIL,SWATHI, using the variables.
Yes, you can use the shell variables inside awk. There are a bunch of ways of doing it, but my favorite is to define a variable with the -v flag:
$ echo | awk -v my_var=4 '{print "My var is " my_var}'
My var is 4
Just pass the shell variable as a parameter to the -v flag. For example, if you have this variable:
$ VAR=3
$ echo $VAR
3
Use it this way:
$ echo | awk -v env_var="$VAR" '{print "The value of VAR is " env_var}'
The value of VAR is 3
Of course, you can give the same name, but the $ will not be necessary:
$ echo | awk -v VAR="$VAR" '{print "The value of VAR is " VAR}'
The value of VAR is 3
A note about the $ in awk: unlike bash, Perl, PHP etc., it is not part of the variable's name but instead an operator.
Awk and Gawk provide the ENVIRON associative array that holds all exported environment variables. So in your awk script you can use ENVIRON["VarName"] to get the value of VarName, provided that VarName has been exported before running awk.
Note ENVIRON is a predefined awk variable NOT a shell environment variable.
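One practical difference between the two mechanisms is worth a sketch: a value assigned with -v has its backslash escape sequences interpreted by awk, while a value read from ENVIRON arrives unchanged. The variable name raw is hypothetical:

```shell
raw='a\tb'    # a literal backslash followed by "t", not a tab

# -v: awk interprets the escape sequence, so \t becomes a real tab.
via_v=$(awk -v x="$raw" 'BEGIN { print x }')

# ENVIRON: the value is passed through the environment untouched.
via_env=$(raw="$raw" awk 'BEGIN { print ENVIRON["raw"] }')

printf '%s\n' "$via_v" "$via_env"
```

For values that may contain backslashes (Windows paths, regular expressions), ENVIRON is therefore often the safer choice.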
Since I don't have enough reputation to comment on the other answers I have to include them here!
The earlier answer showing $ENVIRON is incorrect - that syntax would be expanded by the shell, and probably result in expanding to nothing.
Furthermore, the earlier comments about C not being able to access environment variables are wrong. Contrary to what is said above, C (and C++) can access environment variables using the getenv("VarName") function. Many other languages provide similar access (e.g., Java: System.getenv(), Python: os.environ, Haskell: System.Environment, ...). Note that in all cases a program works on its own copy of the environment: it can read the variables (and change its own copy), but it cannot pass a changed value back to the calling script.
There are two ways to pass variables to awk: one way is defining the variable in a command line argument:
$ echo ${UL[*]}|awk -F, -v NUSR=$NUSR '{print $NUSR}'
SWATHI:SWATHI_NEW
Another way is converting the shell variable to an environment variable using export, and reading the environment variable from the ENVIRON array:
$ export NUSR
$ echo ${UL[*]}|awk -F, '{print $ENVIRON["NUSR"]}'
SWATHI:SWATHI_NEW
Update 2016: The OP has comma-separated data and wants to extract an item given its index. The index is in the shell variable NUSR. The value of NUSR is passed to awk, and awk's dollar operator extracts the item.
Note that it would be simpler to declare UL as an array of more than one element, and do the extraction in bash, and take awk out of the equation completely. This however uses 0-based indexing.
UL=(AKHIL:AKHIL_NEW SWATHI:SWATHI_NEW)
NUSR=1
echo ${UL[NUSR]} # prints SWATHI:SWATHI_NEW
There is another way, but it could cause immense confusion:
$ VarName="howdy" ; echo | awk '{print "Just saying '$VarName'"}'
Just saying howdy
$
So you are temporarily exiting the single quote environment (which would normally prevent the shell from interpreting '$') to interpret the variable and then going back into it. It has the virtue of being relatively brief.
Not sure if I understand your question.
But let's say we have a variable number=3 and we want to use it instead of $3; in awk we can do that with the following code:
results="100 Mbits/sec 110 Mbits/sec 90 Mbits/sec"
number=3
speed=$(echo $results | awk '{print '"\$${number}"'}')
so the speed variable will get the value 110.
Hope this helps.
No. You can pass the value of a shell variable to an awk script just like you can pass the value of a shell variable to a C program but you cannot access a shell variable in an awk script any more than you could access a shell variable in a C program. Like C, awk is not shell. See question 24 in the comp.unix.shell FAQ at cfajohnson.com/shell/cus-faq-2.html#Q24.
One way to write your code would be:
UL="AKHIL:AKHIL_NEW,SWATHI:SWATHI_NEW"
NUSR=$(awk -F, -v ul="$UL" 'BEGIN{print split(ul, parts, FS); exit}')
echo "$NUSR"
echo "$UL" | awk -F, -v nusr="$NUSR" '{print $nusr}' # could have just done print $NF
but since your original starting point:
UL=(AKHIL:AKHIL_NEW,SWATHI:SWATHI_NEW)
was declaring UL as an array with just one entry, you might want to rethink whatever it is you're trying to do as you may have completely the wrong approach.
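As a sketch of one such rethink, assuming UL is kept as a plain comma-separated string of OLD:NEW pairs, a single awk call can produce the list of new users (for the CREATE USER loop) and another the list of original users (for SCHEMAS=). The names new_users, old_users, pair and list are hypothetical:

```shell
UL="AKHIL:AKHIL_NEW,SWATHI:SWATHI_NEW"

# One new user per line, ready for a "while read" loop.
new_users=$(echo "$UL" | awk -F, '{
    for (i = 1; i <= NF; i++) { split($i, pair, ":"); print pair[2] }
}')

# The original users joined with commas, as impdp's SCHEMAS= expects.
old_users=$(echo "$UL" | awk -F, '{
    for (i = 1; i <= NF; i++) {
        split($i, pair, ":")
        list = list (i > 1 ? "," : "") pair[1]
    }
    print list
}')

printf '%s\n' "$new_users" "$old_users"
```

This avoids counting fields first and then calling awk once per index from the shell loop.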

Shell variable interpreted wrongly in awk

In the following code I am trying to pass a shell variable to awk. But when I run it as a.sh foo_bar the output printed is "foo is not declared", and when I run it as a.sh bar_bar the output printed is " foo is declared". Is there a bug in awk or am I doing something wrong here?
I am using gawk-3.0.3.
#!/bin/awk
model=$1
awk ' {
match("'$model'", /foo/)
ismodel=substr("'$model'", RSTART, RLENGTH)
if ( ismodel != foo ) {
print " foo is not declared"
} else {
print " foo is declared"
}
}
' dummy
dummy is file with single blank line.
Thanks,
You should use AWK's variable passing instead of complex quoting:
awk -v awkvar="$shellvar" 'BEGIN {print awkvar}'
Your script is written as a shell script, but you have an AWK shebang line. You could change that to #!/bin/sh.
This is not a bug, but an error in your code. The problematic line is:
if ( ismodel != foo ) {
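Here foo should be "foo". Right now you are comparing with an empty (uninitialized) variable. The condition is therefore true when there is a match (ismodel is "foo", which differs from the empty string) and false when there is none (both sides are empty), which is exactly the inverted output you see. So the problem is not the way you use the shell variables.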
But as the other answerers have said, the preferred way of passing arguments to awk is by using the -v switch. This will also work when you decide to put your awk script in a separate file and prevents all kind of quoting issues.
I'm also not sure about your usage of a dummy file. Is this just for the example? Otherwise you should omit the file and put all your code in the BEGIN {} block.
use -v option to pass in variable from the shell
awk -v model="$1" '{
match(model, /foo/)
.....
}
' dummy
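Combining the suggestions from the answers above (a shell script rather than an awk shebang, -v for the value, a quoted "foo" and a BEGIN block instead of the dummy file), a minimal sketch could look like this. The function name check_model is hypothetical, and the test is simplified to the return value of match():

```shell
check_model() {
    # The model name is passed with -v; no input file is needed
    # because all the work happens in BEGIN.
    awk -v model="$1" 'BEGIN {
        if (match(model, /foo/))
            print "foo is declared"
        else
            print "foo is not declared"
    }'
}

with_foo=$(check_model foo_bar)
without_foo=$(check_model bar_bar)
printf '%s\n' "$with_foo" "$without_foo"
```

In a.sh itself, the body of check_model would be called with "$1" from the script's own arguments.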