AWK - explanation for an example

I have a file named 1:
1
2
3
This command prints:
$ awk 'BEGIN{system("cat " 1)}'
1
2
3
This one prints the same:
$ awk 'BEGIN{system( "cat '\''" 1 "'\''") }'
1
2
3
I changed the file name from "1" to "one".
I now have a file named one:
1
2
3
Now the commands do not work:
$ awk 'BEGIN{system("cat " one)}'
(the command hangs: with no file argument, cat waits for input on standard input)
This one fails:
$ awk 'BEGIN{system( "cat '\''" one "'\''") }'
cat: : No such file or directory
Why do the commands not work now?
Thank you for the explanation.

In your examples, awk interprets 1 as the numeric literal 1 and one as a variable that has no value. That means system is trying to execute cat with no file argument, or cat ''.
Try this:
awk 'BEGIN{one = "one" ; system("cat " one)}'
or:
awk 'BEGIN{system("cat one")}'

If your file name is a number, you can write the number in system() directly; awk converts it to a string internally. You can even write an expression that evaluates to the number, for example:
system("cat "4-3) also works for your "1" case. To see the difference, name a file "1+1": system("cat "1+1) won't work, because the expression evaluates to 2 and cat complains that file "2" doesn't exist.
For the "one" example: the one in your system() call is not the string "one" but a variable named one. In awk, a variable that has never been assigned defaults to the empty string (or 0 in a numeric context).

In awk, 1 has the value 1, and the variable named one is an empty string. So "cat " 1 is the string cat 1, but "cat " one is just cat followed by a space. The expression "cat '\''" one "'\''" produces the string cat '', so you are passing the empty string as the first argument to cat.
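To make the difference visible without running cat at all, the sketch below prints the command string that would be handed to system(), wrapped in brackets so a trailing space shows up. The file name one and the variable name f are only illustrative; the last line shows passing the file name in with -v, which is another way to give the variable a value.

```shell
#!/bin/sh
# Illustrative setup: a file named "one" containing three lines.
printf '1\n2\n3\n' > one

# Uninitialized variable: the command string is "cat " with a trailing space.
awk 'BEGIN { printf "[%s]\n", "cat " one }'     # prints [cat ]

# Arithmetic binds tighter than concatenation, so this builds "cat 2".
awk 'BEGIN { printf "[%s]\n", "cat " 1+1 }'     # prints [cat 2]

# A string literal, or a value passed in with -v, gives the real file name.
awk 'BEGIN { printf "[%s]\n", "cat " "one" }'   # prints [cat one]
awk -v f=one 'BEGIN { system("cat " f) }'        # prints 1, 2 and 3 on separate lines
```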

Related

print whole variable contents if the number of lines is greater than N

How can I print all lines if a certain condition matches?
Example:
echo "$ip"
this is a sample line
another line
one more
last one
If this file has more than 3 lines then print the whole variable.
I tried:
echo $ip| awk 'NR==4'
last one
echo $ip|awk 'NR>3{print}'
last one
echo $ip|awk 'NR==12{} {print}'
this is a sample line
another line
one more
last one
echo $ip| awk 'END{x=NR} x>4{print}'
What I need to achieve:
If this file has more than 3 lines, then print the whole file. I can do this using wc and bash, but I need a one-liner.
The right way to do this (no echo, no pipe, no loops, etc.):
$ awk -v ip="$ip" 'BEGIN{if (gsub(RS,"&",ip)>2) print ip}'
this is a sample line
another line
one more
last one
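Why this works: gsub() returns the number of substitutions it made, so replacing every RS (a newline by default) with itself ("&") counts the newlines while leaving ip unchanged. Because $(...) strips the trailing newline from command output, a variable holding N lines contains N-1 newlines, which is why the test is >2 for "more than 3 lines". A minimal sketch with the threshold passed in as a variable (n is an illustrative name, not part of the answer):

```shell
#!/bin/sh
# Sample value from the question (no trailing newline, as with $(...)).
ip='this is a sample line
another line
one more
last one'

# n is a hypothetical threshold: print ip only if it has more than n lines.
# newline count + 1 = line count for a string without a trailing newline.
awk -v ip="$ip" -v n=3 'BEGIN { if (gsub(RS, "&", ip) + 1 > n) print ip }'

# With n=4 the condition is false and nothing is printed.
awk -v ip="$ip" -v n=4 'BEGIN { if (gsub(RS, "&", ip) + 1 > n) print ip }'
```

One caveat: -v interprets backslash escape sequences in the assigned value, so if $ip can contain backslashes they will be altered.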
You can use Awk as follows,
echo "$ip" | awk '{a[$0]; next}END{ if (NR>3) { for(i in a) print i }}'
one more
another line
this is a sample line
last one
You can also make the value 3 configurable through an awk variable:
echo "$ip" | awk -v count=3 '{a[$0]; next}END{ if (NR>count) { for(i in a) print i }}'
The idea is to store the contents of each line with {a[$0]; next} as it is processed; by the time the END block is reached, the NR variable holds the line count of the string or file. Then print the lines if the condition matches, i.e. the number of lines is greater than 3 or whatever value you configure.
And always remember to double-quote variables in bash so they are not subjected to word splitting by the shell.
Using James Brown's useful comment below to preserve the order of lines (and to keep duplicate lines, which a[$0] would merge into a single key), do
echo "$ip" | awk -v count=3 '{a[NR]=$0; next}END{if(NR>count)for(i=1;i<=NR;i++)print a[i]}'
this is a sample line
another line
one more
last one
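The double quotes matter here, and they explain why the unquoted echo $ip attempts in the question cannot see separate lines: without quotes the shell splits the value into words and echo joins them with single spaces. A quick check, reusing the sample text from the question:

```shell
#!/bin/sh
ip='this is a sample line
another line
one more
last one'

# Unquoted: the shell word-splits $ip and echo joins the words with single
# spaces, so awk receives one line.
echo $ip | awk 'END { print NR }'     # prints 1

# Quoted: the embedded newlines are preserved, so awk receives four lines.
echo "$ip" | awk 'END { print NR }'   # prints 4
```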
Another approach in awk. First, the test files:
$ cat 3
1
2
3
$ cat 4
1
2
3
4
Code:
$ awk 'NR<4{b=b (NR==1?"":ORS)$0;next} b{print b;b=""}1' 3 # look ma, no lines
[this line left intentionally blank. no wait!]
$ awk 'NR<4{b=b (NR==1?"":ORS)$0;next} b{print b;b=""}1' 4
1
2
3
4
Explained:
NR<4 { # for the first 3 records
b=b (NR==1?"":ORS) $0 # buffer them to b with ORS delimiter
next # proceed to next record
}
b { # if the buffer has records (only at NR==4, since b is reset below)
print b # output buffer
b="" # and reset it
}1 # print all records after that
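The buffering approach can be generalized to an arbitrary threshold. This is a sketch, not the answer's exact code: n is a hypothetical -v variable (assumed to be at least 1), and the buffer is flushed when line n+1 arrives instead of when b is non-empty, so with n=1 an empty first line is not lost.

```shell
#!/bin/sh
# Test files as above: "3" has three lines, "4" has four.
printf '1\n2\n3\n' > 3
printf '1\n2\n3\n4\n' > 4

# Buffer the first n lines; when line n+1 arrives, flush the buffer,
# then the trailing 1 prints every remaining line.
for f in 3 4; do
  awk -v n=3 'NR <= n    { b = b (NR == 1 ? "" : ORS) $0; next }
              NR == n+1  { print b }
              1' "$f"
done
```

For file 3 nothing is printed; for file 4 all four lines are printed.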

Word Count using AWK

I have a file like the one below:
this is a sample file
this file will be used for testing
I want to count the words using AWK.
The expected output is:
this 2
is 1
a 1
sample 1
file 2
will 1
be 1
used 1
for 1
I have written the AWK command below, but I am getting some errors:
cat anyfile.txt|awk -F" "'{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}'
It works fine for me:
awk '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' testfile
used 1
this 2
be 1
a 1
for 1
testing 1
file 2
will 1
sample 1
is 1
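PS: you do not need to set -F" ", since the default field separator already splits on any run of blanks. In your command, the missing space between -F" " and the program also glued them into a single argument, so awk received no program text at all, which is what caused the errors.
PS2: do not use cat to feed programs that can read files themselves, like awk.
You can pipe the output through sort to order it by count: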
awk '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' testfile | sort -k 2 -n
a 1
be 1
for 1
is 1
sample 1
testing 1
used 1
will 1
file 2
this 2
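If you want the most frequent words first, sort can order by the count field in reverse and break ties alphabetically. A sketch, assuming the two-line sample file from the question; LC_ALL=C is set only to make the tie order independent of the locale:

```shell
#!/bin/sh
# Recreate the sample file from the question.
printf 'this is a sample file\nthis file will be used for testing\n' > testfile

# Count words, then sort by count (numeric, descending); ties by word.
awk '{ for (i = 1; i <= NF; i++) a[$i]++ } END { for (k in a) print k, a[k] }' testfile |
  LC_ALL=C sort -k2,2nr -k1,1
```

This prints file 2 and this 2 first, followed by the words that occur once in alphabetical order.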
Instead of looping over each line and saving every word in an array ({for(i=1;i<=NF;i++) a[$i]++}), you can use gawk, which supports a multi-character (regular expression) RS (record separator), and save each record in the array as follows (it may be a little faster):
gawk '{a[$0]++} END{for (k in a) print k,a[k]}' RS='[[:space:]]+' file
Output:
used 1
this 2
be 1
a 1
for 1
testing 1
file 2
will 1
sample 1
is 1
In the gawk command above, the regular expression [[:space:]]+ (one or more whitespace characters, including newlines) is defined as the record separator, so each record is a single word.
Here is Perl code that produces similar output to Jotne's awk solution, sorted alphabetically by word:
perl -ne 'for (split /\s+/, $_){ $w{$_}++ }; END{ for $key (sort keys %w) { print "$key $w{$key}\n"}}' testfile
$_ is the current line, which is split based on whitespace /\s+/
Each word is then put into $_
The %w hash stores the number of occurrences of each word
After the entire file is processed, the END{} block is run
The keys of the %w hash are sorted alphabetically
Each word $key and number of occurrences $w{$key} is printed

Extracting block of data from a file

I have a problem, which surely can be solved with an awk one-liner.
I want to split an existing data file, which consists of blocks of data into separate files.
The datafile has the following form:
1 1
1 2
1 3

2 1
2 2
2 3

3 1
3 2
3 3
And I want to store every single block of data in a separate file, named, for example, "1.dat", "2.dat", "3.dat", ...
The problem is that the blocks do not start at fixed line numbers; they are just delimited by two "new lines".
Thanks in advance,
Jürgen
This should get you started:
awk '{ print > (++i ".dat") }' RS= file.txt
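If by two "new lines" you mean exactly two newline characters:
awk '{ print > (++i ".dat") }' RS="\n\n" file.txt
See how the results differ? Setting a null RS (the first example) puts awk in paragraph mode, where one or more blank lines separate records and leading and trailing newlines are ignored, so it is probably what you're looking for. A multi-character RS such as "\n\n" works in gawk and mawk, but POSIX leaves its behavior unspecified.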
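When a file contains many blocks, some awk implementations limit how many output files can be open at once, so it is common to close() each file after writing it. A sketch of the paragraph-mode split with that change, assuming the sample data with blank lines between blocks (the variable name out is illustrative):

```shell
#!/bin/sh
# Sample input: three blocks separated by blank lines.
printf '1 1\n1 2\n1 3\n\n2 1\n2 2\n2 3\n\n3 1\n3 2\n3 3\n' > file.txt

# RS="" (paragraph mode): each blank-line-separated block is one record.
# Each record goes to its own file, which is closed right after writing.
awk 'BEGIN { RS = "" } { out = ++i ".dat"; print > out; close(out) }' file.txt
```

This creates 1.dat, 2.dat and 3.dat, each holding one three-line block.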
Another approach, if every line in a block shares the same first field:
awk 'NF != 0 {print > ($1 ".dat")}' file.txt

awk - assign variables from multiple sources

I want to print output that combines the contents of multiple files.
Example:
grep -v 'word' FILE_A | awk -v var1="string1" -v var2="string2" -v var3="string3" \
'{ print $1 var1 var2 var3}'
I can do this with the command above: assign specific strings to variables and print them along with the grep'ed content.
However, if string1/2/3 are long, it becomes cumbersome to assign them on the command line.
Question:
If I write string1/2/3 as separate lines in a File_B, how can I assign those lines to variables?
Example:
cat File_B
string1
string2
string3
Why not put it all in an awk script instead of just the variables?
$ cat script.awk
!/word/ {
var1="string1"
var2="string2"
var3="string3longlonglonglonglonglong"
print $1,var1,var2,var3
}
$ cat file
word no match
match1
word no match
match2 match 123
$ awk -f script.awk file
match1 string1 string2 string3longlonglonglonglonglong
match2 string1 string2 string3longlonglonglonglonglong
You never need to combine grep and awk: here the !/word/ pattern does the job of grep -v 'word'.
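To answer the question as asked (reading the strings from File_B rather than hard-coding them), a common awk idiom is to read File_B first and store its lines in an array while NR==FNR, which is only true for the first file. A sketch using the sample data above; the array name v is illustrative:

```shell
#!/bin/sh
# Illustrative inputs; File_B holds one string per line.
printf 'string1\nstring2\nstring3\n' > File_B
printf 'word no match\nmatch1\nword no match\nmatch2 match 123\n' > FILE_A

# First file (NR==FNR): store each line of File_B in v[].
# Second file: skip lines matching /word/ (like grep -v) and print.
awk 'NR == FNR { v[FNR] = $0; next }
     !/word/   { print $1, v[1], v[2], v[3] }' File_B FILE_A
```

Caveat: if File_B is empty, NR==FNR is also true for every line of FILE_A, so this idiom assumes the first file has at least one line.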

How to strictly compare two long numeric string in awk

I wrote an awk script to process some data and found the result unexpected.
I found that the root cause is that the following string comparison is not correct:
echo "59558711052462309110012 59558711052462313120012"|awk '{print $1;print $2;print ($1==$2)?"eq":"ne"}'
The result is
59558711052462309110012
59558711052462313120012
eq
I guess the reason is that awk treats the two numeric strings as numbers and truncates them before comparing.
My question is: how can I strictly compare the two strings in awk?
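Both fields look numeric, so awk compares them as double-precision floating-point numbers, which keep only about 15-17 significant decimal digits; these two 23-digit values end up equal. Force a string comparison by telling awk that at least one of the operands IS a string, by concatenating that operand with the null string: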
echo "59558711052462309110012 59558711052462313120012"|
awk '{print $1;print $2;print ($1""==$2)?"eq":"ne"}'
59558711052462309110012
59558711052462313120012
ne
Ed Morton's solution already fails when a plus sign precedes one of the numbers, all else being equal:
echo '59558711052462309110012' | mawk '1; 1' |
mawk '($++NF = (!_?_="+":_=__) $!__)^__' | ...
... | awk ' { print $0, "\f\r\t" ( ($1""==$2) ? "eq" : "ne" ) }'
1 59558711052462309110012 +59558711052462309110012
ne
2 59558711052462309110012 59558711052462309110012
eq
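If the numbers should compare equal despite a leading + or leading zeros, one option (a different technique from the answer above) is to normalize both strings and compare them without ever converting to floating point: a longer digit string is larger, and equal-length digit strings compare correctly as strings. A sketch, assuming each operand is an optional + followed only by decimal digits (no minus sign, no decimal point); norm is a hypothetical helper name:

```shell
#!/bin/sh
printf '%s\n' \
  '59558711052462309110012 59558711052462313120012' \
  '+59558711052462309110012 59558711052462309110012' \
  '0059558711052462309110012 59558711052462309110012' |
awk '
# norm(): hypothetical helper that drops a leading "+" and leading zeros.
# Appending "" forces a string result, so later comparisons are string
# comparisons even when sub() made no change.
function norm(s) {
    sub(/^\+/, "", s)
    sub(/^0+/, "", s)
    return (s == "" ? "0" : s "")
}
{
    a = norm($1); b = norm($2)
    if (length(a) != length(b))
        print (length(a) < length(b) ? "lt" : "gt")
    else
        print (a < b ? "lt" : (a > b ? "gt" : "eq"))
}'
```

For the three sample pairs this prints lt, eq and eq.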