What happens when you delete an array element in awk? - awk

I wrote the following code:
awk -F"\t" '{
a[1]=1; a[2]=2; a[3]=3; a[4]=4; a[5]=5;
delete a[4];
print "len", length(a);
for( i =1; i<=length(a); i++)
print i"\t"a[i]
for( i in a)
print i"\t"a[i]
}' -
And the output is:
len 4
1 1
2 2
3 3
4
5 5
4
5 5
1 1
2 2
3 3
My question is: since I have deleted the 4th element and the length of the array a has become 4, why are there still 5 elements when I print the array, with the value of the 4th element blank? Does that indicate that 'delete' only deletes the value and the corresponding index remains?

Remove the middle for loop and you'll see what's happening:
$ echo x | awk -F"\t" '{
a[1]=1; a[2]=2; a[3]=3; a[4]=4; a[5]=5;
delete a[4];
print "len", length(a);
for( i in a)
print i"\t"a[i]
}'
len 4
2 2
3 3
5 5
1 1
The delete is working as you expect, removing the array element with index 4, leaving 4 elements with indices 1, 2, 3, and 5. (Even though you are using numeric indices, it's still an associative array, and the old a[5] does not become accessible as a[4]; it's still a[5].)
The reason you're seeing five elements in your example is the middle for loop:
for( i =1; i<=length(a); i++)
print i"\t"a[i]
Simply by referring to a[4] in that print statement, you recreate an element of the a array with that index and an empty value.
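You can see that side effect in isolation with a minimal BEGIN-block sketch (my own illustration, not code from the question): merely reading an index is enough to create it.
$ awk 'BEGIN {
a[1]=1; a[2]=2; a[3]=3
delete a[2]
if (2 in a) print "index 2 still exists"; else print "index 2 is gone"
x = a[2]      # merely reading a[2] recreates it, with an empty value
if (2 in a) print "index 2 is back"; else print "index 2 is gone"
print "len", length(a)
}'
index 2 is gone
index 2 is back
len 3
If you only want to test whether an index exists without creating it, use the in operator as above rather than referring to a[i] directly.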


Calculate average and write it to another file [duplicate]

I have a list of students with ID and marks, and I need to make another one with their average marks. main_list:
#name surname student_index_number course_group_id lecturer_id list_of_marks
athos musketeer 1 1 1 3,4,5,3.5
porthos musketeer 2 1 1 2,5,3.5
aramis musketeer 3 2 2 2,1,4,5
And I have this script
awk '{ n = split($6, a, ","); total=0; for (v in a) total += a[v]; print total / n }' main_list
But I don't want to print it, I want to write it to another file with the average marks. The final content should be like this, average_list:
athos musketeer 1 1 1 3.875
porthos musketeer 2 1 1 3.5
aramis musketeer 3 2 2 3
Could you please try the following once.
while read first second third fourth fifth sixth
do
  if [[ "$first" =~ (^#) ]]   ## skip the commented header line
  then
    continue
  fi
  count="${sixth//[^,]}"                               ## strip everything except the commas
  val=$(echo "(${#count}+1)" | bc)                     ## number of marks = number of commas + 1
  new_val=$(echo "scale=2; (${sixth//,/+})/$val" | bc) ## sum the marks and divide by the count
  echo "$first $second $third $fourth $fifth $new_val"
done < "Input_file" > "Output_file"
Building on your own attempt, try the following.
awk '{ n = split($6, a, ","); total=0; for (v in a) total += a[v]; print $1,$2,$3,$4,$5,total / n }' Input_file > "Output_file"
With awk:
awk '{n=split($NF,array,","); NF--; sum=0; for(i=1; i<=n; i++){sum+=array[i]} print $0,sum/n}' file
Split the last field ($NF) on , into an array (array); n contains the number of elements. Reduce the number of columns in the current line by one (NF--). Add up the array contents with a for loop, then output the rest of the current line ($0) and the result (sum/n).
Output:
athos musketeer 1 1 1 3.875
porthos musketeer 2 1 1 3.5
aramis musketeer 3 2 2 3
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
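If you want the result to land in a file rather than on the terminal (assuming the names main_list and average_list from the question), redirect the output of the same command:
$ awk '{n=split($NF,array,","); NF--; sum=0; for(i=1; i<=n; i++){sum+=array[i]} print $0,sum/n}' main_list > average_list
Alternatively, redirect inside awk by writing print $0,sum/n > "average_list".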

Counting the number of lines in each column

Is it possible to count the number of lines in each column of a file? For example, I've been trying to use awk to separate columns on the semi-colon symbol, specify each column individually and use the wc command to count any and all occurrences within that column.
For the below command I am trying to find the number of items in column 3 without counting blank lines. Unfortunately, this command just counts the entire file. I could move the column to a different file and count that file but I just want to know if there is a much quicker way of going about this?
awk -F ';' '{print $3}' file.txt | wc -l
Data file format
; 1 ; 2 ; 3 ; 4 ; 5 ; 6 ;
; 3 ; 4 ; 5 ; 6 ; ; 4 ;
; ; 3 ; 5 ; 6 ; 9 ; 8 ;
; 1 ; 6 ; 3 ; ; ; 4 ;
; 2 ; 3 ; ; 3 ; ; 5 ;
Example output wanted
Column 1 = 4 (i.e. its four non-empty entries are 1, 3, 1, 2)
Column 2 = 5
Column 3 = 4
Column 4 = 4
Column 5 = 2
Column 6 = 5
Keep separate counts for each field using an array, then print the totals when you're done:
$ awk -F' *; *' '{ for (i = 2; i < NF; ++i) if ($i != "") ++count[i] }
END { for (i = 2; i < NF; ++i) print "Column", i-1, "=", count[i] }' file
Column 1 = 4
Column 2 = 5
Column 3 = 4
Column 4 = 4
Column 5 = 2
Column 6 = 5
Set the field separator to consume the semicolons as well as any surrounding spaces.
Loop through each field (except the first and last ones, which are always empty) and increment a counter for non-empty fields.
It would be tempting to use if ($i), but this would fail for a column containing a 0 (see the comparison sketch below).
Print the counts in the END block, offsetting by -1 to start from 1 instead of 2.
One assumption made here is that the number of columns in each line is uniform throughout the file, so that NF from the last line can safely be used in the END block.
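If the number of columns might vary from line to line, one possible hedge (my sketch, not part of the answer above) is to remember the widest NF seen and use that in the END block:
$ awk -F' *; *' '{ if (NF > maxnf) maxnf = NF
for (i = 2; i < NF; ++i) if ($i != "") ++count[i] }
END { for (i = 2; i < maxnf; ++i) print "Column", i-1, "=", count[i]+0 }' file
The +0 forces an empty count to print as 0 for columns that never had a value.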
A slight variation, using a simpler field separator:
$ awk -F';' '{ for (i = 2; i < NF; ++i) count[i] += ($i ~ /[^ ]/) }
END { for (i = 2; i < NF; ++i) print "Column", i-1, "=", count[i] }' file
$i ~ /[^ ]/ is equal to 1 if any non-space characters exist in the ith field, 0 otherwise.
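To compare the three possible tests side by side, here is a small sketch on a made-up row containing a 0, an empty field, and a 7 (the row is only illustrative):
$ echo '; 0 ; ; 7 ;' | awk -F' *; *' '{ for (i = 2; i < NF; ++i) print "column", i-1, "bare:", ($i ? 1 : 0), "non-empty:", ($i != "" ? 1 : 0), "regex:", ($i ~ /[^ ]/) }'
column 1 bare: 0 non-empty: 1 regex: 1
column 2 bare: 0 non-empty: 0 regex: 0
column 3 bare: 1 non-empty: 1 regex: 1
The bare if ($i)-style test misses the 0 in column 1, while the other two count it.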

Use awk to find all columns which contain values above and below specified numbers?

I would like an Awk command where I can search a large file for columns which contain numbers both below 3 and above 5. It also needs to skip the first column.
e.g. for the following file
1 2 6 2
2 1 7 3
3 2 5 4
4 2 8 7
5 2 6 8
6 1 9 9
In this case, only column 4 is a match, as it is the only column with values above 5 and below 3 (except for column 1, which we skip).
Currently, I have this code:
awk '{for (i=2; i<=NF; i++) {if ($i < 3 && $i > 5) {print i}}}'
But this only reads one row at a time (so never makes a match). I want to search all of the rows, but I am unable to work out how this is done.
Ideally the output would simply be the column number. So for this example, simply '4'.
Many thanks.
Could you please try the following and let me know if this helps you.
awk '{for(i=1;i<=NF;i++){if($i<3){col[i]++};if($i>5){col1[i]++}}} END{for(j in col){if(col[j]>=1 && col1[j]>=1){print j}}}' Input_file
If you want to start searching from the second column, then change i=1 to i=2 in the above code.
EDIT: Adding a non-one-liner form of the solution too now.
awk '
{
  for(i=1;i<=NF;i++){
    if($i<3) { col[i]++ }     ## count, per column, values below 3
    if($i>5) { col1[i]++ }    ## count, per column, values above 5
  }
}
END{
  for(j in col){
    if(col[j]>=1 && col1[j]>=1){ print j }   ## column seen on both sides: print its number
  }
}' Input_file
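For the sample input shown in the question (saved here as Input_file) and with i=2 so that the first column is skipped, the output is just the matching column number, since only column 4 has at least one value below 3 and at least one above 5:
awk '{for(i=2;i<=NF;i++){if($i<3){col[i]++};if($i>5){col1[i]++}}} END{for(j in col){if(col[j]>=1 && col1[j]>=1){print j}}}' Input_file
4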

move certain columns to end using awk

I have a large tab-delimited file with 1000 columns. I want to rearrange it so that certain columns are moved to the end.
Could anyone help using awk?
Example input:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Move columns 5,6,7,8 to the end.
Output:
1 2 3 4 9 10 11 12 13 14 15 16 17 18 19 20 5 6 7 8
This prints columns 1 to a, then b to the last, and then columns a+1 to b-1:
$ awk -v a=4 -v b=9 '{for (i=1;i<=NF;i+=i==a?b-a:1) {printf "%s\t",$i};for (i=a+1;i<b;i++) {printf "%s\t",$i};print""}' file
1 2 3 4 9 10 11 12 13 14 15 16 17 18 19 20 5 6 7 8
The columns are moved in this way for every line in the input file, however many lines there are.
How it works
-v a=4 -v b=9
This defines the variables a and b which determine the limits on which columns will be moved.
for (i=1;i<=NF;i+=i==a?b-a:1) {printf "%s\t",$i}
This prints all columns except the ones from a+1 to b-1.
In this loop, i is incremented by one, except when i==a, in which case it is incremented by b-a so as to skip over the columns to be moved. This is done with awk's ternary operator:
i += i==a ? b-a : 1
+= simply means "add to." i==a ? b-a : 1 is the ternary expression. The value that it returns depends on whether i==a is true or false. If it is true, the value between the ? and the : is returned. If it is false, the value after the : is returned.
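As a tiny standalone check (values taken from the command above), when i equals a the increment becomes b-a and the counter jumps from column 4 straight to column 9:
$ awk 'BEGIN { a=4; b=9; i=4; i += i==a ? b-a : 1; print i }'
9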
for (i=a+1;i<b;i++) {printf "%s\t",$i}
This prints columns a+1 to b-1.
print""
This prints a newline character to end the line.
Alternative solution that avoids printf
This approach assembles the output into the variable out and then prints with a plain print command, avoiding printf and the need for percent signs:
awk -v a=4 -v b=9 '{out="";for (i=1;i<=NF;i+=i==a?b-a:1) out=out $i"\t";for (i=a+1;i<b;i++) out=out $i "\t";print out}' file
One way to rearrange 2 columns ($5 becomes $20 and $20 becomes $5) while the rest stay unchanged:
$ awk '{x=$5; $5=$20; $20=x; print}' file.txt
For 4 columns (two pair swaps):
$ awk '{
x=$5; $5=$20; $20=x;
y=$9; $9=$10; $10=y;
print
}' file.txt
My approach:
awk 'BEGIN{ f[5];f[6];f[7];f[8] } \
{ for(i=1;i<=NF;i++) if(!(i in f)) printf "%s\t", $i; \
for(c in f) printf "%s\t", $c; printf "\n"} ' file
It's split into 3 parts:
The BEGIN{} part determines which fields should be moved to the end: the indexes of the array f are the fields to move. In the example they are 5, 6, 7 and 8.
Cycle through every field (it doesn't matter if there are 1000 fields or more) and check whether it is in the array. If not, print it.
Now we need the skipped fields. Cycle through the f array and print those values. (Note that the order of a for (c in f) loop is unspecified, so the moved fields are not guaranteed to come out in 5 6 7 8 order in every awk; see the sketch below.)
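If the moved columns must come out in exactly the 5 6 7 8 order on every awk, a possible variant (my sketch, not part of the answer above) is to pass the list in a variable and loop over it in order instead of using for(c in f):
awk -v move="5,6,7,8" 'BEGIN{ n=split(move,m,","); for(k=1;k<=n;k++) f[m[k]] } \
{ for(i=1;i<=NF;i++) if(!(i in f)) printf "%s\t", $i; \
for(k=1;k<=n;k++) printf "%s\t", $(m[k]); printf "\n"} ' file
Because split preserves the order of the list, the moved fields are printed exactly as given in move.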
Another way in awk
Swap fields A through B with the last B-A+1 fields
awk -vA=4 -vB=8 '{x=B-A;for(i=A;i<=B;i++){y=$i;$i=$(t=(NF-x--));$t=y}}1' file
Put N fields from the end into position A
awk -vA=3 -vB=8 '{split($0,a," ");x=A++;while(x++<B)$x=a[NF-(B-x)];while(B++<NF)$B=a[A++]}1' file

awk array initialization at each line fails

I have a log file where each line contains some numbers separated by ,. I just wanted to do some operation with each number. It seemed easy with awk, but somehow I got stuck. The array which I'm using to split each line is getting initialized only once, at the first line. After that the array is not getting cleared. split is supposed to clear the array first and then use it; I have even used delete array. But still the problem persists. Any help will be appreciated.
Below is an example.
This is a sample file
[[bash_prompt$]]$ cat log
1,2,3
2
9
1,4
5,7
7
8
6,2
This is what I'm getting:
[[bash_prompt$]]$ awk '{print "New Line " $1; delete a; split($1,a,","); for(var in a){ print "Array Element " var; } }' log
New Line 1,2,3
Array Element 1
Array Element 2
Array Element 3
New Line 2
Array Element 1
New Line 9
Array Element 1
New Line 1,4
Array Element 1
Array Element 2
New Line 5,7
Array Element 1
Array Element 2
New Line 7
Array Element 1
New Line 8
Array Element 1
New Line 6,2
Array Element 1
Array Element 2
But below is what I am expecting:
[[bash_prompt$]]$ awk '{print "New Line " $1; delete a; split($1,a,","); for(var in a){ print "Array Element " var; } }' log
New Line 1,2,3
Array Element 1
Array Element 2
Array Element 3
New Line 2
Array Element 2
New Line 9
Array Element 9
New Line 1,4
Array Element 1
Array Element 4
New Line 5,7
Array Element 5
Array Element 7
New Line 7
Array Element 7
New Line 8
Array Element 8
New Line 6,2
Array Element 6
Array Element 2
Mine is GNU Awk 3.1.5. I have also found another way using a combination of shell and awk (running awk individually on each line), but I want to know what I am doing wrong.
for (var in array) sets var to the index of the next element of the array, so var is the index, not the contents. Use
print "Array Element " a[var]
You don't need to use split(); just set the field separator:
awk -F, '{print "New Line",$0;for(i=1;i<=NF;i++)print "Array Element",$i}' log
New Line 1,2,3
Array Element 1
Array Element 2
Array Element 3
New Line 2
Array Element 2
New Line 9
Array Element 9
New Line 1,4
Array Element 1
Array Element 4
New Line 5,7
Array Element 5
Array Element 7
New Line 7
Array Element 7
New Line 8
Array Element 8
New Line 6,2
Array Element 6
Array Element 2