This question already has answers here:
Average marks from list
(3 answers)
Closed 3 years ago.
I have a list of students with ID and marks, and I need to make another one with their average marks. main_list:
#name surname student_index_number course_group_id lecturer_id list_of_marks
athos musketeer 1 1 1 3,4,5,3.5
porthos musketeer 2 1 1 2,5,3.5
aramis musketeer 3 2 2 2,1,4,5
And I have this script
awk '{ n = split($6, a, ","); total=0; for (v in a) total += a[v]; print total / n }' main_list
But I don't want to print it, I want to write it in other file called average marks. Final content should be like this, average_list:
athos musketeer 1 1 1 3.875
porthos musketeer 2 1 1 3.5
aramis musketeer 3 2 2 3
Could you please try following once.
while read first second third fourth fifth sixth
do
if [[ "$first" =~ (^#) ]]
then
continue
fi
count="${sixth//[^,]}"
val=$(echo "(${#count}+1)" | bc)
new_val=$(echo "scale=2; (${sixth//,/+})/$val" | bc)
echo "$first $second $third $fourth $fifth $new_val"
done < "Input_file" > "Output_file"
With your attempt try following.
awk '{ n = split($6, a, ","); total=0; for (v in a) total += a[v]; print $1,$2,$3,$4,$5,total / n }' Input_file > "Output_file"
With awk:
awk '{n=split($NF,array,","); NF--; sum=0; for(i=1; i<=n; i++){sum+=array[i]} print $0,sum/n}' file
Split last field ($NF) with , to an array (array). n contains number of elements. Reduce number of columns in current line by one (NF--). Add up array content with for loop and output rest of current line ($0) and result (sum/n)
Output:
athos musketeer 1 1 1 3.875
porthos musketeer 2 1 1 3.5
aramis musketeer 3 2 2 3
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
Related
Currently i am using a awk script which compares numbers in non sequential order and prints the difference . It works pretty well for numbers but if i have alphanumeric characters , it doesn't seem to work well
In its current state , apart from simply comparing the numbers it does 2 things additionally :
Currently it accounts for the zeros before a number or character and compares the absolute values only ignoring zeros before a number or character
Currently If the same number or character occurs multiple times in both files , it outputs the additional occurance
i just want the script to work well for alphanumeric characters as well as currently it only seem to work well with plain numbers. Can someone please edit the script to have the desired output while also considering the above 2 conditions
Current script
awk '{k=$0+0}
NR==FNR {a[k]++; next}
!(k in a && a[k]-->0);
END {for(k in a) while(a[k]-->0) print k}' file1 file2
Example below
cat file1
1
01
001
8
2B
12
13C
027B
0027B
cat file2
1
2
08
12
13C
02B
9
27B
Expected output/result
1
1
2
9
27B
Explanation of expected output
In file1 : "1" , "01" , "001" evaluates to 1 * 3 times
In file 2 : "1" is present only once
Hence "1" is present twice in result ( 3-1 times )
"2" and "9" are exclusively present in file2 , So obviously both simply form part of output
In file1 : '027B" , "0027B" evaluates to 27B * 2 times
In file 2 - "27B" is present only once
Hence '27B" is present once in result ( 2 -1 times )
Explanation of matched items ( ones not forming part of expected output )
"8" from file1 ( line 4 )is matched with "08" from file2 ( line 3)
"12" from file1 ( line 6) is matched with "12" from file2 ( line 4)
"13C" from file1 (line 7 ) is matched with "13C" from file2 ( line 5 )
"2B" from file1 ( line 5 ) is matched with "02B" from file2 ( line 6 )
Lastly the order of items in expected output should be in ascending order like shown in my above example, lets say if the eg above had 3 in expected output it should read vertically as 1 1 2 3 9 27B
It should be enough to remove leading zeros when forming the key (with a special case for zero values like 0000):
/^0+$/ { k = 0 }
/[^0]/ { k = $0; sub(/^0*/, "", k) }
NR==FNR {a[k]++; next}
!(k in a && a[k]-->0);
END {for(k in a) while(a[k]-->0) print k}
$ awk -f a.awk file1 file2
2
9
27B
1
1
RE-EDIT
If you just want the values sorted numerically, pipe into sort:
$ awk -f a.awk file1 file2 | sort -n
1
1
2
3
4
5
9
27B
To output in the order as found in file2, you can remember the order in another array and then do all the printing in the END block. This version will output the values in the order of file2, with any values only in file1 printed last.
/^0+$/ { k = 0 }
/[^0]/ { k = $0; sub(/^0*/, "", k) }
NR==FNR {a[k]++; next}
{ b[FNR] = k }
!(k in a && a[k]--) { a[k] = 1 }
END {
for (i=1; i<=FNR; ++i) {
k = b[i]
while(a[k]-->0) print k
}
for (k in a) {
while(a[k]-->0) print k
}
}
$ awk -f a.awk file1 file2
1
1
2
9
27B
3
4
5
Here my file.dat
1 A 1 4
2 2 4
3 4 4
3 7 B
1 U 2
Running awk '{print $2}' file.dat gives:
A
2
4
7
U
But I would like to keep the empty field:
A
4
U
How to do it?
I must add that between :
column 1 and 2 there is 3 whitespaces field separator
column 2 and 3 and between column 3 and 4 one whitespace field separator
So in column 2 there are 2 fields missing (lines 2 and 4) and in column 4
there are also 2 fields missing (lines 3 and 5)
If this isn't all you need:
$ awk -F'[ ]' '{print $4}' file
A
4
U
then edit your question to provide a more truly representative example and clearer requirements.
If the input is fixed-width columns, you can use substr to extract the slice you want. I have assumed that you want a single character at index 5:
awk '{ print(substr($0,5,1)) }' file
Your awk code is missing field separators.
Your example file doesn't clearly show what that field separator is.
From observation your file appears to have 5 columns.
You need to determine what your field separator is first.
This example code expects \t which means <TAB> as the field separator.
awk -F'\t' '{print $3}' OFS='\t' file.dat
This outputs the 3rd column from the file. This is the 'read in' field separator -F'\t' and OFS='\t' is the 'read out'.
A
4
U
For GNU awk. It processes the file twice. On the first time it examines all records for which string indexes have only space and considers continuous space sequences as separator strings building up FIELDWIDTHS variable. On the second time it uses that for fixed width processing of the data.
a[i]:s get valus 0/1 and h (header) with this input will be 100010101 and that leads to FIELDWIDTHS="4 2 2 1":
1 A 1 4
2 2 4
3 4 4
3 7 B
1 U 2
| | | |
100010101 - while(match(h,/10*/))
\ /|/|/|
4 2 2 1
Script:
$ awk '
NR==FNR {
for(i=1;i<=length;i++) # all record chars
a[i]=((a[i]!~/^(0|)$/) || substr($0,i,1)!=" ") # keep track of all space places
if(--i>m)
m=i # max record length...
next
}
BEGINFILE {
if(NR!=0) { # only do this once
for(i=1;i<=m;i++) # ... used here
h=h a[i] # h=100010101
while(match(h,/10*/)) { # build FIELDWIDTHS
FIELDWIDTHS=FIELDWIDTHS " " RLENGTH # qnd
h=substr(h,RSTART+RLENGTH)
}
}
}
{
print $2 # and output
}' file file
And output:
A
4
U
You need to trim off the space from the fields, though.
I would like an Awk command where I can search a large file for columns which contain numbers both below 3 and above 5. It also needs to skip the first column.
e.g. for the following file
1 2 6 2
2 1 7 3
3 2 5 4
4 2 8 7
5 2 6 8
6 1 9 9
In this case, only column 4 is a match, as it is the only column with values above 5 and below 3 (except for column 1, which we skip).
Currently, I have this code:
awk '{for (i=2; i<=NF; i++) {if ($i < 3 && $i > 5) {print i}}}'
But this only reads one row at a time (so never makes a match). I want to search all of the rows, but I am unable to work out how this is done.
Ideally the output would simply be the column number. So for this example, simply '4'.
Many thanks.
Could you please try following and let me know if this helps you.
awk '{for(i=1;i<=NF;i++){if($i<3){col[i]++};if($i>5){col1[i]++}}} END{for(j in col){if(col[j]>=1 && col1[j]>=1){print j}}}' Input_file
If you want to start searching from second column then change i=1 to i=2 in above code.
EDIT: Adding a non-one liner form of solution too now.
awk '
{
for(i=1;i<=NF;i++){
if($i<3) { col[i]++ };
if($i>5) { col1[i]++}}
}
END{
for(j in col){
if(col[j]>=1 && col1[j]>=1){ print j }}
}' Input_file
I have a file as follows:
5 6
7 8
12 15
Using awk, how can I find the distance between the second column of one line with the first column of the next line. In this case, distance between 6 and 7 and 8 and 12 and print as follows, distance of first line set to zero:
5 6 0
7 8 1
12 15 4
awk '{print $0, (NR>1?$1-p:0); p=$2}' file
try:
awk 'NR==1{val=$2;print $0,"0";next} {print $0,$1-val;val=$2}' Input_file
Adding explanation now too successfully.
Checking for NR==1(when first line of Input_file) is there, then create a variable named val tp second field of the Input_file and then print the current line with "0" then do next(which will skip all further statements). Then printing the current line along with $1-val's value and then assigning the value of variable of val to $2 of the current line then.
Short awk approach:
awk 'NR==1{ $3=0 }NR>1{ $3=$1-p }{ p=$2 }1' file
The output:
5 6 0
7 8 1
12 15 4
p=$2 - capture the 2nd field value (p - considered as previous line value)
I have large tab delimited file with 1000 columns. I want to rearrange so that certain columns have to be moved to the end.
Could anyone help using awk
Example input:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Move columns 5,6,7,8 to the end.
Output:
1 2 3 4 9 10 11 12 13 14 15 16 17 18 19 20 5 6 7 8
This prints columns 1 to a, then b to the last, and then columns a+1 to b-1:
$ awk -v a=4 -v b=9 '{for (i=1;i<=NF;i+=i==a?b-a:1) {printf "%s\t",$i};for (i=a+1;i<b;i++) {printf "%s\t",$i};print""}' file
1 2 3 4 9 10 11 12 13 14 15 16
17 18 19 20 5 6 7 8
The columns are moved in this way for every line in the input file, however many lines there are.
How it works
-v a=4 -v b=9
This defines the variables a and b which determine the limits on which columns will be moved.
for (i=1;i<=NF;i+=i==a?b-a:1) {printf "%s\t",$i}
This prints all columns except the ones from a+1 to b-1.
In this loop, i is incremented by one except when i==a in which case it is incremented by b-a so as to skip over the columns to be moved. This is done with awk's ternary statement:
i += i==a ? b-a : 1
+= simply means "add to." i==a ? b-a : 1 is the ternary statement. The value that it returns depends on whether i==a is true or false. If it is true, the value before the colon is returned. If it is false, the value after the colon is returned.
for (i=a+1;i<b;i++) {printf "%s\t",$i}
This prints columns a+1 to b-1.
print""
This prints a newline character to end the line.
Alternative solution that avoids printf
This approach assembles the output into the variable out and then prints with a plain print command, avoiding printf and the need for percent signs:
awk -v a=4 -v b=9 '{out="";for (i=1;i<=NF;i+=i==a?b-a:1) out=out $i"\t";for (i=a+1;i<b;i++) out=out $i "\t";print out}' file
One way to rearrange 2 columns ($5 become $20 and $20 become $5) the rest stay unchanged :
$ awk '{x=$5; $5=$20; $20=x; print}' file.txt
for 4 columns :
$ awk '{
x=$5; $5=$20; $9=x;
y=$9; $9=$10; $10=y;
print
}' file.txt
My approach:
awk 'BEGIN{ f[5];f[6];f[7];f[8] } \
{ for(i=1;i<=NF;i++) if(!(i in f)) printf "%s\t", $i; \
for(c in f) printf "%s\t", $c; printf "\n"} ' file
It's splitted in 3 parts:
The BEGIN{} part determines which field should be moved to the end. The indexes of the array f are moved. In the example it's 5, 6, 7 and 8.
Cycle trough every field (doesn't matter if there are 1000 fields or more) and check if they are in the array. If not print them.
Now we need the skipped fields. Cycle trough the f array and print those values.
Another way in awk
Switch last A-B with last N fields
awk -vA=4 -vB=8 '{x=B-A;for(i=A;i<=B;i++){y=$i;$i=$(t=(NF-x--));$t=y}}1' file
Put N rows from end into positon A
awk -vA=3 -vB=8 '{split($0,a," ");x=A++;while(x++<B)$x=a[NF-(B-x)];while(B++<NF)$B=a[A++]}1' file