Awk: iterate through several arrays in a for loop

I have created an awk program to go through the columns of a file, count each distinct word, and then output the totals into separate files:
awk -F"$delim" '{Field_Arr1[$1]++; Field_Arr2[$2]++; Field_Arr3[$3]++; Field_Arr4[$4]++};
END{\
# output fields
out_field1="top_field1"
out_field2="top_field2"
out_field3="top_field3"
out_field4="top_field4"
for( i=1; i <= NF; i++)
{
for (element in Field_Arr$i)
{
print element"\t"Field_Arr$i[element] >>out_field$i;
}
}
}' inputfile
But I don't know the appropriate syntax to make the for loop iterate through Field_Arr1, Field_Arr2, Field_Arr3 and Field_Arr4.
I have tried using: i, $i, ${i}, {i}, "$i", and "i".
Am I taking the wrong approach, or is there a way to turn Field_Arr$i into Field_Arr1..4?
Thanks for the advice.

awk variables don't work that way: a name like Field_Arr$i is not assembled at run time. You'll have to refer to each array individually by name, or use a fake multidimensional array and split the key back into its components, something along the lines of:
{Field_Arr[1, $1]++; Field_Arr[2, $2]++; Field_Arr[3, $3]++; Field_Arr[4, $4]++}
END {
for (elt in Field_Arr) {
split(elt, ec, SUBSEP)
print ec[2] "\t" Field_Arr[elt] >> ("top_field" ec[1])
}
}
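A self-contained sketch of the same idea, assuming a hypothetical comma-delimited file named sample.txt and looping over every column with NF instead of naming the columns one by one. It writes with > rather than >>: within one awk run, > truncates a file only when it is first opened, so rerunning the script does not pile up old counts.

```shell
# Hypothetical sample input: two comma-separated columns.
printf 'x,y\nx,z\nw,y\n' > sample.txt

awk -F',' '
{
    # One "fake multidimensional" array keyed by (column number, word).
    for (i = 1; i <= NF; i++)
        Field_Arr[i, $i]++
}
END {
    for (elt in Field_Arr) {
        split(elt, ec, SUBSEP)   # ec[1] = column number, ec[2] = word
        print ec[2] "\t" Field_Arr[elt] > ("top_field" ec[1])
    }
}' sample.txt
```

Afterwards top_field1 holds x with count 2 and w with count 1, and top_field2 holds y 2 and z 1. The line order inside each file is unspecified, because for (elt in ...) visits keys in no defined order.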

To count the frequencies for each column (3 in my example), try this:
# Print list of word frequencies
function p_array(t, a,    i) {
print t
for (i in a) {
print i, a[i]
}
}
{
c1[$1]++
c2[$2]++
c3[$3]++
}
END {
p_array("1st col",c1)
p_array("2nd col",c2)
p_array("3rd col",c3)
}

Else syntax error when nesting array formula

I am receiving a syntax error on "else" for this awk script:
{for (i=8;i<=NF;i+=3)
{if ($0~"=>") # if-else statement designed to flag file / directory transfers
print "=> flag,"$1"," $2","$3","$4 ","$5","$6","$7"," $(i)","$(i+1)","$(i+2);
{split ($(i+2), array, "/");
for (x in array)
{j++;
a[j] =j;
printf (array[x] ",");}
printf ("%s\n", "");}
else
print "no => flag,"$1"," $2","$3","$4 ","$5","$6","$7"," $(i)","$(i+1)","$(i+2)
}
}
Can't figure out why. If I delete the array block (starting with split()), all is well. But I need to scan the contents of $(i+2), so cutting it does me no good.
Also, if anyone can point me to a good guide on interpreting awk error messages, that would be great.
Thanks for your advice.
EDIT: here is the above script laid out with sensible formatting:
{
for (i=8;i<=NF;i+=3) {
if ($0~"=>") # if-else statement designed to flag file / directory transfers
print "=> flag,"$1"," $2","$3","$4 ","$5","$6","$7"," $(i)","$(i+1)","$(i+2);
{
split ($(i+2), array, "/");
for (x in array) {
j++;
a[j] =j;
printf (array[x] ",");
}
printf ("%s\n", "");
}
else
print "no => flag,"$1"," $2","$3","$4 ","$5","$6","$7"," $(i)","$(i+1)","$(i+2)
}
}
First things first: since you didn't post any sample input or expected output, I haven't tested this at all. Please try the following; I assume you are running it as an .awk script (awk -f). These are mostly syntax/cosmetic changes rather than changes to the logic, since no background was given on the problem.
BEGIN{
OFS=","
}
{
for (i=8;i<=NF;i+=3){
if ($0~/=>/){
print "=> flag,"$1,$2,$3,$4,$5,$6,$7,$(i),$(i+1),$(i+2)
n = split($(i+2), array, "/")
for (x = 1; x <= n; x++) {   # numeric loop keeps the path components in order
j++
a[j] = j
printf "%s,", array[x]       # data must not be used as the printf format
}
print ""
}
else{
print "no => flag",$1,$2,$3,$4,$5,$6,$7,$(i),$(i+1),$(i+2)
}
}
}
Problems fixed in the OP's attempt:
The cause of the syntax error: without braces, the if controls only the single print statement after it. The { split ... } block that follows is a separate statement, so the else no longer directly follows its if and awk rejects it. Putting the print and the split block inside one pair of braces fixes it.
Opening curly braces (which start the body of a for loop or if condition with multiple statements) are placed at the end of the line that opens the loop or condition rather than on the next line, for readability; I did this for the for loop and the if condition.
Since you are matching against a regular expression, I changed $0~"=>" to $0~/=>/.
Added a BEGIN section that sets OFS (the output field separator) to ",", so you don't need to print "," literals between variables; separating them with a plain , does the trick.
Fixed the indentation, so it is clear where each loop and condition begins and ends.
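A minimal reproduction of the rule behind the fix, with made-up input: the body of an if must be a single statement or one { } block, and the else must come right after that body. Here both statements share one block, so the else still pairs with the if. The function name flag is only for illustration.

```shell
# Hypothetical helper: flags lines that contain "=>".
flag() {
    awk '{
        if ($0 ~ /=>/) {
            # Both statements are inside one block, so the else below still
            # belongs to this if. If these braces were removed, the second
            # print would end the if and the else would be a syntax error.
            print "flag: " $0
            print "fields: " NF
        } else
            print "no flag: " $0
    }'
}

printf 'a => b\nplain\n' | flag
```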

Awk how to negate a condition

I'm trying to compute some stuff in awk and, at the end, print the result in the order of the input. For each line, I check whether it has already been seen. If not, I add it to the array and also store it in an order array.
{
if (! $0 in seen) {
seen[$0] = 1
order[o++] = $0
}
} END {
for (i=0; i<o; i++)
printf "%s\n", order[i]
}
You can try it with
printf 'a\nb\na\nc\nb\na\n' | awk script_above
It prints nothing. If I print the variable o at the end, it shows that its value is still 0. What am I doing wrong?
You just need to add parens to get the right operator precedence*:
# a.awk
{
if (!($0 in seen)) {
seen[$0] = 1
order[o++] = $0
}
}
END {
for (i=0; i<o; i++)
printf "%s\n", order[i]
}
Test:
$ awk -f a.awk file
a
b
c
* (The unary ! binds more tightly than the in operator: https://www.gnu.org/software/gawk/manual/html_node/Precedence.html)
What you are trying to do is the shell way; awk has an idiom for checking whether an element is already part of an array. Try the following:
printf 'a\nb\na\nc\nb\na\n' | awk '
!seen[$0]++ {
order[o++] = $0
}
END {
for (i=0; i<o; i++)
printf "%s\n", order[i]
}'
Here !seen[$0]++ checks whether seen[$0] is still 0, i.e. whether the current line has not been seen before; if so, the condition is true and the BLOCK (where your next statements are) runs. The ++ then increments that element's counter by 1 (it is a post-increment, so the check sees the old value), which makes the !seen[$0]++ condition false the next time the same line appears.
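Because a pattern with no action prints the current line, the same idiom also works on its own when the only goal is to print each distinct line once, in input order; no order array or END block is needed. A small sketch (the function name dedup is just for illustration):

```shell
# Print each distinct line once, in order of first appearance.
# seen[$0]++ yields the old count (0 the first time), so !seen[$0]++ is
# true only on a line's first occurrence; with no action, awk prints it.
dedup() {
    awk '!seen[$0]++' "$@"
}

printf 'a\nb\na\nc\nb\na\n' | dedup
```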

awk to reorder lines in output file

In the awk below I am printing out specific tags from the input. However, I cannot seem to get line 2 of the current output to be line 1. It looks like the output is ordered this way because of the way the input is formatted, and I cannot seem to change that in the awk. Thank you :).
input
"barcodedSamples": {"MEV37": {"barcodeSampleInfo": {"IonXpress_007": {"controlSequenceType": "", "expName": "R_2016_09_20_12_47_36_user_S5-00580-7-Medexome",
awk
awk -F"[]\":{}, ]*" '
{for (i=1; i<NF; i++) {if ($i =="expName") print $(i+1)
if ($i =="barcodeSampleInfo") print $(i+1) " " $(i-1)
}
}
' input
current output
IonXpress_007 MEV37
R_2016_09_20_12_47_36_user_S5-00580-7-Medexome
desired output
R_2016_09_20_12_47_36_user_S5-00580-7-Medexome
IonXpress_007 MEV37
With jq:
INPUT FILE
{
"barcodedSamples" : {
"MEV37" : {
"barcodeSampleInfo" : {
"IonXpress_007" : {
"controlSequenceType" : "",
"expName" : "R_2016_09_20_12_47_36_user_S5-00580-7-Medexome"
}
}
}
}
}
COMMAND
% jq '.barcodedSamples.MEV37.barcodeSampleInfo.IonXpress_007.expName' file.json
OUTPUT
"R_2016_09_20_12_47_36_user_S5-00580-7-Medexome"
Or with Node.js:
% node
> j = { "barcodedSamples": {"MEV37": {"barcodeSampleInfo": {"IonXpress_007": {"controlSequenceType": "", "expName": "R_2016_09_20_12_47_36_user_S5-00580-7-Medexome"}}}}}
{ barcodedSamples: { MEV37: { barcodeSampleInfo: [Object] } } }
> console.log(j.barcodedSamples.MEV37.barcodeSampleInfo.IonXpress_007.expName)
R_2016_09_20_12_47_36_user_S5-00580-7-Medexome
NOTE: now that you understand how to access any part, just modify this a bit to fit your needs.
You can create one or more arrays in the BEGIN block. While processing lines, do not print; instead, append to these arrays in the order you want. Then print out these arrays in the END block.
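A sketch of that idea applied to the question's own field scan. Assumptions: one record per line as in the sample, and the separator regex changed from * to + so it can never match an empty string. The function name reorder is hypothetical.

```shell
# Hypothetical wrapper around the question's loop: collect first, print in END.
reorder() {
    awk -F'[]":{}, ]+' '
    {
        # Scan the fields, but only remember the values; print nothing yet.
        for (i = 1; i < NF; i++) {
            if ($i == "expName")           names[NR] = $(i+1)
            if ($i == "barcodeSampleInfo") infos[NR] = $(i+1) " " $(i-1)
        }
    }
    END {
        # Print in the desired order: expName first, then the sample info.
        for (n = 1; n <= NR; n++) {
            if (n in names) print names[n]
            if (n in infos) print infos[n]
        }
    }' "$@"
}

printf '%s\n' '"barcodedSamples": {"MEV37": {"barcodeSampleInfo": {"IonXpress_007": {"controlSequenceType": "", "expName": "R_2016_09_20_12_47_36_user_S5-00580-7-Medexome",' | reorder
```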
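Or, if the layout is always exactly like the sample line, split on double quotes and pick the fields by position:
awk -F\" '{print $(NF - 1)"\n" $8,$4}' file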
R_2016_09_20_12_47_36_user_S5-00580-7-Medexome
IonXpress_007 MEV37

count occurrence of value in multiple fields independently (awk)

I have seen numerous posts achieving this task for individual fields, but I am struggling to apply it to multiple fields separately.
input:
group1|apple|orange|lemon
group1|apple|kiwi|banana
group1|orange|cherry|lemon
group1|apple|orange|pear
(The real file has many more fields, so I need to use a loop to process each field.)
output:
Field|Fruit|Count
2|apple|3
2|orange|1
3|orange|2
3|kiwi|1
3|cherry|1
4|lemon|2
4|banana|1
4|pear|1
What I tried so far, which returns the combined count across all fields instead:
awk '
BEGIN{FS=OFS="|"; print "Field|Fruit|Count"}
{
for(i=2; i<=NF; i++){
a[$i]=$i
count[$i]++
}
}
END{
for(j in count) print j OFS count[j]
}'
Use the field number as part of the key in the count array.
awk '
BEGIN{FS=OFS="|"; print "Field|Fruit|Count"}
{
for (i = 2; i <= NF; i++) {
count[i OFS $i]++;
}
}
END {
for (j in count) {
print j, count[j];
}
}'
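Note that for (j in count) visits keys in an unspecified order, so the lines may not come out grouped by field as in the desired output. One way to get a reproducible order, sketched under the assumption that each field's values should be listed in order of first appearance: remember every new value as it is seen, then walk the field numbers in END. The name count_fields is hypothetical.

```shell
count_fields() {
    awk '
    BEGIN { FS = OFS = "|"; print "Field|Fruit|Count" }
    {
        for (i = 2; i <= NF; i++) {
            key = i OFS $i
            # First time this value appears in field i: record its position.
            if (!(key in count))
                order[i, ++n[i]] = $i
            count[key]++
            if (i > maxf) maxf = i
        }
    }
    END {
        # Fields in numeric order, values in first-seen order.
        for (i = 2; i <= maxf; i++)
            for (k = 1; k <= n[i]; k++)
                print i, order[i, k], count[i OFS order[i, k]]
    }' "$@"
}

printf 'group1|apple|orange|lemon\ngroup1|apple|kiwi|banana\ngroup1|orange|cherry|lemon\ngroup1|apple|orange|pear\n' | count_fields
```

With the sample input this reproduces the desired output line for line.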

Awk reverse both lines and words

I'm new to programming. I have to use awk to reverse all the lines of a file, as well as all the words in those lines, and print them out.
"File1" to reverse:
aa bb cc
foo do as
And the output for "File1" should be this:
as do foo
cc bb aa
I tried this for word reverse in each line:
for (i=NF; i>1; i--) printf("%s ",$i); printf("%s\n",$1)
but if I want to print the reversed lines I have to do this
{a[NR]=$0
}END{for(i=NR; i; i--) print a[i]}
I need to use two files, with this command in the terminal:
awk -f commandFile FileToBePrinted
The problem is that I'm a beginner at all this, and I don't know how to combine those two.
Thanks!
Kev's solution appears to reverse the characters in each word. Your example output doesn't show that, but his key point is to use a function.
You have the code you need, you just need to rearrange it a little.
cat file1
aa bb cc
foo do as
cat commandFile
function reverse( line ) {
n=split(line, tmpLine)
for (j=n; j>0; j--) {
printf("%s ",tmpLine[j] )
}
}
# main loop
{ a[NR]=$0 }
# print reversed array
END{ for(i=NR; i>0; i--) printf( "%s\n", reverse(a[i]) ) }
Running
awk -f commandFile file1
output
as do foo
cc bb aa
I made a couple of minor changes. Using n=split(line, tmpLine) ... print tmpLine[j] is a common way to break up a line passed to a function. Inside a function, $1, $2, ... still refer to the fields of the current input record, not to the argument (your a[i] value), so I switched to split() and tmpLine[j]. Also, awk variables are global unless they are listed as extra function parameters, so the i from the END section was the same i used inside reverse; I renamed the loop variable to j to avoid the clash.
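A sketch of the same commandFile using awk's usual idiom for local variables: extra parameters listed after the real ones (by convention set off with extra spaces) are local to the function, so n, j and tmpLine cannot clash with variables elsewhere. This variant also returns the reversed line instead of printing it, which avoids the trailing space. The sample files are created inline for illustration.

```shell
cat > commandFile <<'EOF'
# Return the words of line in reverse order, separated by single spaces.
# Parameters after "line" are locals (awk has no other way to declare them).
function reverse(line,    n, j, tmpLine, out) {
    n = split(line, tmpLine)
    out = tmpLine[n]
    for (j = n - 1; j > 0; j--)
        out = out " " tmpLine[j]
    return out
}
# main loop: remember every line
{ a[NR] = $0 }
# print the remembered lines last to first, each with its words reversed
END { for (i = NR; i > 0; i--) print reverse(a[i]) }
EOF

printf 'aa bb cc\nfoo do as\n' > file1
awk -f commandFile file1
```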
I had to figure out a few things, so below is the debug version that I used.
If you have access to gawk, you'll do well to learn its built-in debugger (gawk -D). If you're using awk/gawk/nawk on systems without a debugger, this is one method for understanding what is happening in your code. If you're redirecting your program's output to a file or pipe, AND your system supports the "/dev/stderr" notation, you could print the debug lines there, i.e.
#dbg print "#dbg0:line=" line > "/dev/stderr"
Some systems have other notations for accessing stderr, so if you'll be doing this much, it is worthwhile to find out what is available.
cat commandFile.debug
function reverse( line ) {
n=split(line, tmpLine)
#dbg print "#dbg0:line=" line
#dbg print "#dbg1:n=" n "\tj=" j "\ttmpLine[j]=" tmpLine[j]
for (j=n; j>0; j--) {
#dbg print "#dbg2:n=" n "\tj=" j "\ttmpLine[j]=" tmpLine[j]
printf("%s ",tmpLine[j] )
}
}
# main loop
{ a[NR]=$0 }
# print reversed array
#dbg END{ print "AT END"; for(i=NR; i>0; i--) printf( "#dbg4:i=%d\t%s\n%s\n", i, a[i] , reverse(a[i]) ) }
END{ for(i=NR; i>0; i--) printf( "%s\n", reverse(a[i]) ) }
I hope this helps.
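This version reverses the characters of each line (not only the word order), building the output in reverse line order:
awk '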
{
x=reverse($0) (x?"\n":"") x
}
END{
print x
}
function reverse(s, p,    i)
{
for(i=length(s);i>0;i--)
p=p substr(s,i,1)
return p
}' File1
Use tac to reverse the lines of the file, then awk to reverse the words:
tac filename | awk '{for (i=NF; i>1; i--) printf("%s ",$i); printf("%s\n",$1)}'
You could also use something like Perl that has list-reversing functions built-in:
tac filename | perl -lane 'print join(" ", reverse(@F))'