awk cumulative sum in one dimension

Good afternoon,
I would like to compute a cumulative sum along each column and along each line in awk.
My input file is:
1 2 3 4
2 5 6 7
2 3 6 5
1 2 1 2
The result I would like per column is:
1 2 3 4
3 7 9 11
5 10 15 16
6 12 16 18
6 12 16 18
And per line:
1 3 5 9 9
2 7 13 20 20
2 5 11 16 16
1 3 4 6 6
I computed the total per column with:
awk '{ for (i=1; i<=NF; ++i) sum[i] += $i}; END { for (i in sum) printf "%s ", sum[i]; printf "\n"; }' test.txt # sum
And per line:
awk '
BEGIN {FS=OFS=" "}
{
    sum=0; n=0
    for(i=1;i<=NF;i++)
        {sum+=$i; ++n}
    print $0,"sum:"sum,"count:"n,"avg:"sum/n
}' test.txt
But I would like to print the running sums on every line and in every column, not just the totals.
Do you have an idea?

It looks like you have all the correct information available; all you are missing are the print statements.
Is this what you are looking for?
accumulated sum of the columns:
% cat foo
1 2 3 4
2 5 6 7
2 3 6 5
1 2 1 2
% awk '{ for (i=1; i<=NF; ++i) {sum[i]+=$i; $i=sum[i] }; print $0}' foo
1 2 3 4
3 7 9 11
5 10 15 16
6 12 16 18
accumulated sum of the rows:
% cat foo
1 2 3 4
2 5 6 7
2 3 6 5
1 2 1 2
% awk '{ sum=0; for (i=1; i<=NF; ++i) {sum+=$i; $i=sum }; print $0}' foo
1 3 6 10
2 7 13 20
2 5 11 16
1 3 4 6
Both of these rely on the following:
each variable has the value 0 by default (when used numerically);
the field $i is replaced with the running sum, which also rebuilds $0;
the full line is then reprinted with print $0.
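As a minimal sketch of those points, the two one-liners can be folded into a single script. The shell function name cumsum and the dim variable are assumptions made for this illustration and are not part of the answer above.

```shell
# cumsum and dim are hypothetical names used only for this sketch.
# dim=col accumulates down each column; any other value accumulates along each row.
cumsum() {
  awk -v dim="$1" '{
    run = 0                                   # per-row accumulator, reset on every line
    for (i = 1; i <= NF; i++) {
      if (dim == "col") { col[i] += $i; $i = col[i] }   # col[i] starts out as 0
      else              { run += $i;    $i = run }
    }
    print                                     # $0 was rebuilt by the field assignments
  }'
}

printf '1 2 3 4\n2 5 6 7\n' | cumsum col
printf '1 2 3 4\n2 5 6 7\n' | cumsum row
```

The function reads standard input, so it can also be called as cumsum col < foo.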

row sums with repeated last element
$ awk '{s=0; for(i=1;i<=NF;i++) $i=s+=$i; $i=s}1' file
1 3 6 10 10
2 7 13 20 20
2 5 11 16 16
1 3 4 6 6
After the loop, i has been incremented to NF+1, so $i=s appends the sum as an extra field, and the trailing 1 prints the line with that extra field.
column sums with repeated last row
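The same row sums can be written with the extra field made explicit, which may be easier to read than relying on the loop variable ending at NF+1. This is only a sketch; row_cumsum_total is a hypothetical function name, and the sample input is taken from the question.

```shell
# row_cumsum_total is a hypothetical name used only for this sketch.
row_cumsum_total() {
  awk '{
    s = 0
    for (i = 1; i <= NF; i++) { s += $i; $i = s }   # running sum replaces each field
    $(NF + 1) = s                                    # append the row total as a new last field
    print
  }'
}

printf '1 2 3 4\n2 5 6 7\n' | row_cumsum_total
```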
$ awk '{for(i=1;i<=NF;i++) c[i]=$i+=c[i]}1; END{print}' file
1 2 3 4
3 7 9 11
5 10 15 16
6 12 16 18
6 12 16 18
END{print} repeats the last row
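END{print} works because $0 still holds the last record inside END. POSIX specifies this and current awks implement it, but some older implementations reportedly did not keep it. A sketch that avoids the dependency by saving the row in a variable (col_cumsum_repeat and last are names chosen for this example):

```shell
# col_cumsum_repeat and last are hypothetical names used only for this sketch.
col_cumsum_repeat() {
  awk '{
    for (i = 1; i <= NF; i++) { c[i] += $i; $i = c[i] }   # running column sums
    last = $0                                             # remember the rebuilt row
    print
  }
  END { if (NR) print last }'                             # repeat it, unless input was empty
}

printf '1 2 3 4\n2 5 6 7\n' | col_cumsum_repeat
```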
P.S. Your math seems to be wrong for the row sums.

Related

Merging multiple files with null values using AWK

Sorry, I am posting this again because I messed up my earlier post.
I am interested in joining multiple files (e.g., file1, file2, file3, ...) on matching values in column 1 to get the desired output below, with n marking missing values. I would appreciate any help.
file1:
A 2 3 4
B 3 7 8
C 4 6 9
file2:
A 7 6 3
C 2 4 7
D 1 6 4
file3:
A 3 2 7
B 4 7 3
E 3 6 8
Output:
A 2 3 4 7 6 3 3 2 7
B 3 7 8 n n n 4 7 3
C 4 6 9 2 4 7 n n n
D n n n 1 6 4 n n n
E n n n n n n 3 6 8

Here is one for awk. Tested with GNU awk, mawk, original-awk (i.e. awk 20121220) and Busybox awk:
$ awk '
function nn(c,    b,i) {
    if(c)
        for(i=1;i<=c;i++)
            b=b "n "
    return b
}
FNR==1{nf+=(NF-1)}
{
    for(i=2;i<=NF;i++)
        b[$1]=b[$1] $i OFS
    a[$1]=a[$1] (n[$1]<(nf-NF+1)?nn(nf-NF+1-n[$1]):"") b[$1]
    n[$1]=nf+0
    delete b[$1]
}
END{
    for(i in a)
        print i,a[i] (n[i]<(nf)?nn(nf-n[i]):"")
}' file1 file2 file3
Output:
A 2 3 4 7 6 3 3 2 7
B 3 7 8 n n n 4 7 3
C 4 6 9 2 4 7 n n n
D n n n 1 6 4 n n n
E n n n n n n 3 6 8
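For comparison, here is a different sketch of the same join, not the method used above. It stores each file's columns per key and pads in END, assuming every line of a file has the same number of columns and each key appears at most once per file. The names merge_pad, width, cell and keys are hypothetical. Keys are printed in first-seen order rather than in the unspecified order of for (i in a).

```shell
# merge_pad, width, cell and keys are hypothetical names used only for this sketch.
merge_pad() {
  awk '
  FNR == 1 { nfiles++; width[nfiles] = NF - 1 }        # data columns in the current file
  {
    if (!($1 in seen)) { seen[$1] = 1; keys[++nkeys] = $1 }
    row = ""
    for (i = 2; i <= NF; i++) row = row OFS $i
    cell[$1, nfiles] = row                             # columns of this key in this file
  }
  END {
    for (k = 1; k <= nkeys; k++) {
      line = keys[k]
      for (f = 1; f <= nfiles; f++) {
        if ((keys[k], f) in cell) {
          line = line cell[keys[k], f]
        } else {
          for (i = 1; i <= width[f]; i++) line = line OFS "n"   # pad the missing file
        }
      }
      print line
    }
  }' "$@"
}
```

It would be called as merge_pad file1 file2 file3. Note that an empty input file is skipped entirely, since it never triggers FNR == 1.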

Variation with repetition for rows of file

I have a file
4 5 6 6
1 7 5 5
7 0 2 1
7 8 0 6
and I would like to produce files that contain randomly chosen rows from this file, with repetition. The outputs could be, for instance:
4 5 6 6
1 7 5 5
1 7 5 5
7 8 0 6
7 8 0 6
1 7 5 5
1 7 5 5
7 8 0 6
I mean that some of the rows will appear in the output several times and some of the rows not at all. Is it possible to produce a list of random numbers with repetition and choose rows from the input according to it? Is it possible in awk, or is some other language more appropriate?

If this isn't all you need:
$ shuf -n $(wc -l < file) -r file
4 5 6 6
7 8 0 6
1 7 5 5
1 7 5 5
then edit your question to clarify your requirements.

I'm not sure exactly what is meant by repetition here, but here is a way using just randomness:
$ awk -v seed=$RANDOM '{                # set the random seed externally
    a[NR]=$0                            # hash records to a
}
END {
    srand(seed)
    for(i=1;i<=4;i++)                   # 4 is the number of records to output
        print a[int(1+rand()*NR)]       # get a random array element and output it
}' file
An example of output:
7 8 0 6
7 8 0 6
7 8 0 6
1 7 5 5
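The hard-coded 4 can be replaced by NR so that the output always has as many rows as the input. A minimal sketch, with resample as a hypothetical function name and the seed passed as its first argument:

```shell
# resample is a hypothetical helper: prints as many random lines as the input has.
resample() {
  awk -v seed="$1" '
  { a[NR] = $0 }                              # keep every record in memory
  END {
    srand(seed)
    for (i = 1; i <= NR; i++)
      print a[int(rand() * NR) + 1]           # index in 1..NR, repeats allowed
  }'
}

printf '4 5 6 6\n1 7 5 5\n7 0 2 1\n7 8 0 6\n' | resample 42
```

Because the seed is explicit, the same seed reproduces the same sample with a given awk implementation; in bash, passing "$RANDOM" gives a different sample on each run.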

You could also do this with coreutils shuf and sed, e.g.:
n=$(wc -l < infile)
shuf -n $n -i 1-$n -r | sed 's/$/p/' | sed -nf - infile
Output example:
4 5 6 6
4 5 6 6
1 7 5 5
1 7 5 5

Pad column with n zeros and trim excess values

For example, the original data file
file.org :
1 2 3 4 5
6 7 8 9 0
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
Insert three data points (0) at the top of column 2, shifting the existing values down and dropping the values that fall off the end.
The output file should look like this:
file.out :
1 0 3 4 5
6 0 8 9 0
11 0 13 14 15
16 2 18 19 20
21 7 23 24 25
Please help.

The following awk will do the trick:
awk -v n=3 '{a[NR]=$2; $2=a[NR-n]+0}1' file

$ awk -v n=3 '{x=$2; $2=a[NR%n]+0; a[NR%n]=x} 1' file
1 0 3 4 5
6 0 8 9 0
11 0 13 14 15
16 2 18 19 20
21 7 23 24 25
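The one-liner above keeps only the last n values of column 2 in a small ring buffer indexed by NR%n: each row takes the value stored n rows earlier and leaves its own value in that slot. A sketch that spells this out with the column number as a parameter; delay_column, col, slot and held are hypothetical names, and n is assumed to be at least 1.

```shell
# delay_column is a hypothetical helper. Usage: delay_column COL N < file
delay_column() {
  awk -v col="$1" -v n="$2" '{
    slot = NR % n              # ring-buffer position for this row
    held = $col                # value to hand on to the row n lines below
    $col = buf[slot] + 0       # value from n rows earlier, or 0 for the first n rows
    buf[slot] = held
    print
  }'
}

printf '1 2 3\n4 5 6\n7 8 9\n' | delay_column 2 1
```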

If you want to try Perl,
$ cat file.orig
1 2 3 4 5
6 7 8 9 0
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
$ perl -lane ' BEGIN { push(@t,0,0,0) } push(@t,$F[1]);$F[1]=shift @t; print join(" ",@F) ' file.orig
1 0 3 4 5
6 0 8 9 0
11 0 13 14 15
16 2 18 19 20
21 7 23 24 25
$

EDIT: Since the OP has edited the question, I am adding a solution for the new question.
awk -v count=3 '++val<=count{a[val]=$2;$2=0} val>count{if(++re_count<=count){$2=a[re_count]}} 1' Input_file
Output will be as follows.
1 0 3 4 5
6 0 8 9 0
11 0 13 14 15
16 2 18 19 20
21 7 23 24 25
Could you please try the following.
awk -v count=5 '
BEGIN{
  OFS="\t"
}
$2{
  val=(val?val ORS OFS:OFS)$2
  $2=0
  occ++
  $1=$1
}
1
END{
  while(++occ<=count){
    print OFS 0
  }
  print val
}' Input_file
Output will be as follows.
1 0 3 4 5
6 0 8 9 0
11 0 13 14 15
0
0
2
7
12

How to improve this example?

$ seq 12 | awk '{ if(NR%2) { print $0, (NR+1)/2 } else { print $0, NR/2} }'
1 1
2 1
3 2
4 2
5 3
6 3
7 4
8 4
9 5
10 5
11 6
12 6
How do I change the above command? I want to print:
1 1
2 1
3 1
4 2
5 2
6 2
7 3
8 3
9 3
10 4
11 4
12 4

Hmm. Just applying your algorithm to 3:
seq 12|awk '{if((NR%3)==1) { print $0, (NR+2)/3 } else if ((NR%3)==2) { print $0, (NR+1)/3 } else { print $0, NR/3} }'
But I'm sure there is also a shorter algo...

awk '{print $0, int((NR+2)/3)}'
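The constant 2 is the group size minus one, so the expression generalizes to int((NR + g - 1) / g), which is the ceiling of NR/g. A small sketch with the group size passed in (group_index and g are hypothetical names):

```shell
# group_index is a hypothetical helper. Usage: group_index GROUPSIZE < file
group_index() {
  awk -v g="$1" '{ print $0, int((NR + g - 1) / g) }'   # ceiling of NR/g
}

seq 12 | group_index 3
```

With g=2 this gives the same numbering as the original if/else command in the question.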

AWK - removal of the same fields on the basis of the "$1"

I have a file1:
6
3
6
9
2
6
This command prints the result:
awk 'NR==1{a=$1};$0!=a' file1
3
9
2
Now I have file2:
6 1 2 3 4 5
3 3 4 4 4 6
6 5 2 2 5 1
9 1 3 5 4 1
2 5 6 4 8 5
6 1 5 2 3 1
I want to do the same thing with file2, column by column: drop every value that equals the first value of its column. The result should be:
3 3 4 4 5 6
9 5 3 2 8 1
2 5 6 5 3 1
5 4 1
2
I want to do it in awk. Thank you for your help.

AWK is not really suited for what you are trying to do, since it is made for processing rows one at a time, while you are trying to shift numbers up and down between different rows. That said, this monster should do what you want:
awk 'NR==1{nc=NF;for(i=1;i<=nc;i++)a[i]=$i}{for(i=1;i<=nc;i++){if($i!=a[i]){v[m[i]++,i]=$i;if(m[i]>nl)nl=m[i]}}}END{for(l=0;l<nl;l++){for(i=1;i<=nc;i++){if(l<m[i]){printf("%d ", v[l,i])}else{printf(" ")}}printf("\n")}}'
If, on the other hand, your matrix of numbers had been transposed, this task would have been far simpler:
awk '{for(i=2;i<=NF;i++)if($i!=$1)printf(" %d",$i);printf("\n")}'
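To illustrate the transposed case, each input line in the sketch below is one column of file2 (the first two columns, transposed by hand). It builds the output as a string instead of using printf("%d"), so non-integer values would pass through unchanged. drop_first_value is a hypothetical name.

```shell
# drop_first_value is a hypothetical helper: for each line, prints the fields
# after the first that differ from the first field.
drop_first_value() {
  awk '{
    out = ""
    for (i = 2; i <= NF; i++)
      if ($i != $1) out = out " " $i     # keep values that differ from the first one
    print out
  }'
}

printf '6 3 6 9 2 6\n1 3 5 1 5 1\n' | drop_first_value
```

Each output line keeps the leading space of the original one-liner.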
