Reading and summing columns from files - sum

I have 2 tables, each with 2 columns (id, value). All the tables have the same number of rows (one per id). I want to sum the value column across all the tables for each id.
Can anyone help me do that with Fortran?
Below is part of the code I wrote, but in line 19 the summation is not computed correctly.
integer :: id,idmax,nf,files,write_id
real, allocatable, dimension(:) :: val,vct_val,sum_files
idmax=5
nfiles=2
allocate (vct_val(idmax))
allocate(val(nfiles))
allocate (sum_files(idmax))
nf=0 !nf is the number of the file
do nf = 1,nfiles
  open(10,file=fname_daily)
  do id=1, idmax
    read(10,'(i6,1x,f16.2)') write_id,val(nf)
    vct_val(id)=val(nf)
    print *, 'read file ', write_id, vct_val(id), val(nf)
  end do !for all rows(id)
enddo !for all nfiles
do id=1, idmax
  print *, 'file1 sum', write_id, val(1)
  print *, 'file2 sum', write_id, val(2)
  sum_files(id)=val(1)+val(2)
end do !for all rows(id)
write(20,'(i6,A,f20.2)') (write_id, tab, sum_files(write_id), write_id=1, idmax)
Below is an example of file 1 and file 2, whose "value" columns I want to sum for each id.
file1
"id" "value"
1 51000.00
2 1612000.00
3 1520000.00
4 1520000.00
5 1134000.00
file2
"id" "value"
1 35740000.00
2 24460000.00
3 21800000.00
4 54280000.00
5 42530000.00
The first print (line 14) gives the following values, which are correct:
read file 1 51000.0000 51000.0000
read file 2 1612000.00 1612000.00
read file 3 1520000.00 1520000.00
read file 4 1520000.00 1520000.00
read file 5 1134000.00 1134000.00
read file 1 35740000.0 35740000.0
read file 2 24460000.0 24460000.0
read file 3 21800000.0 21800000.0
read file 4 54280000.0 54280000.0
read file 5 42530000.0 42530000.0
However, the second print (line 18) gives only the values for id 5:
file1 sum 5 1134000.00
file2 sum 5 42530000.0
file1 sum 5 1134000.00
file2 sum 5 42530000.0
file1 sum 5 1134000.00
file2 sum 5 42530000.0
file1 sum 5 1134000.00
file2 sum 5 42530000.0
file1 sum 5 1134000.00
file2 sum 5 42530000.0
file1 sum 5 1134000.00
file2 sum 5 42530000.0
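A note on the symptom: val has one slot per file, so each row read overwrites the previous one, and vct_val(id) is overwritten again by the second file. After the read loop only the id 5 values (and write_id = 5) are left, which is what the second print shows. This is not a Fortran fix, but as a minimal sketch of the per-id accumulation itself, here is the same sum in awk (the tool used in the related questions below). It assumes each file starts with one header line; the helper name sum_by_id and the temporary file names are made up for the example.

```shell
# Hypothetical helper: sum column 2 per id across all files given as arguments.
sum_by_id() {
    awk '
        FNR == 1     { next }                 # assumption: line 1 of each file is a header
        !($1 in sum) { order[++n] = $1 }      # remember ids in first-seen order
                     { sum[$1] += $2 }        # keep one running total per id
        END {
            for (i = 1; i <= n; i++)
                printf "%d\t%.2f\n", order[i], sum[order[i]]
        }
    ' "$@"
}

# Sample data copied from the question.
dir=$(mktemp -d)
printf '"id" "value"\n1 51000.00\n2 1612000.00\n3 1520000.00\n4 1520000.00\n5 1134000.00\n' > "$dir/file1"
printf '"id" "value"\n1 35740000.00\n2 24460000.00\n3 21800000.00\n4 54280000.00\n5 42530000.00\n' > "$dir/file2"

sum_by_id "$dir/file1" "$dir/file2"
```

The point of the sketch is that the running total is indexed by id, so it survives from one file to the next instead of being overwritten on every read.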

Related

read columns from several file and print them in individual columns

I have several text files, each containing several columns of numbers, e.g.:
5 10 6
6 20 1
7 30 4
8 40 3
9 23 1
4 13 6
I want to collect the second column of every file and place them side by side as separate columns. I used this code; it works, but it prints all the second columns in a single column.
awk '{print $3}' > outfile
How can I print each one in its own column?
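$ awk '{a[FNR]=(FNR in a)?a[FNR] OFS $2:$2; if(FNR>n) n=FNR}
END {for(i=1;i<=n;i++) print a[i]}' file1 file2 ... > outfile
This assumes all files have the same number of lines; otherwise the alignment will be off. Here n tracks the line count of the longest file; NR would not do as the loop bound because it counts lines across all files.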
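If the files might differ in length, one option is to index each value by (line, file) and pad the gaps. The following is only a sketch under that assumption: the helper name paste_col2, the "-" placeholder, and the way the sample rows are split into two files are all made up for illustration, and it assumes no input file is empty.

```shell
# Hypothetical helper: print column 2 of each file side by side,
# padding with "-" where a file has fewer lines than the longest one.
paste_col2() {
    awk '
        FNR == 1      { nfile++ }              # count input files (assumes none is empty)
                      { cell[FNR, nfile] = $2 } # value at (line number, file number)
        FNR > maxline { maxline = FNR }        # length of the longest file so far
        END {
            for (i = 1; i <= maxline; i++) {
                line = ""
                for (f = 1; f <= nfile; f++) {
                    v = ((i, f) in cell) ? cell[i, f] : "-"
                    sep = (f > 1) ? OFS : ""
                    line = line sep v
                }
                print line
            }
        }
    ' "$@"
}

# Made-up split of the sample rows into a 3-line and a 2-line file.
dir=$(mktemp -d)
printf '5 10 6\n6 20 1\n7 30 4\n' > "$dir/a.txt"
printf '8 40 3\n9 23 1\n'         > "$dir/b.txt"

paste_col2 "$dir/a.txt" "$dir/b.txt"
```

With these two files the third output row gets a "-" in the second column, so the columns stay aligned with their source files.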

Find the ratio among columns

I have some input files of the following format:
File1.txt   File2.txt   File3.txt
1 2         1 6         1 20
2 3         2 9         2 21
3 7         3 14        3 28
Now I need to produce a single new file with three columns using AWK. The first column stays the same; it is identical across the three files (just an ordinal number).
For the 2nd and 3rd columns of the new file, I need the 2nd column of the second file divided by the 2nd column of the first file, and the 2nd column of the third file divided by the 2nd column of the first file. In other words, the 2nd columns of the 2nd and 3rd files are each divided by the 2nd column of the first file.
e.g.:
Result.txt
1 3 10
2 3 7
3 2 4
Use a multidimensional array to store the values. Note that ARGIND (the index of the current file in ARGV) is a GNU awk extension:
awk 'FNR==NR {a[$1]=$2; next}
     {b[$1,ARGIND]=$2/a[$1]}
     END {for (i in a)
              print i,b[i,2],b[i,3]
     }' f1 f2 f3
Test
$ awk 'FNR==NR {a[$1]=$2; next} {b[$1,ARGIND]=$2/a[$1]} END {for (i in a) print i,b[i,2],b[i,3]}' f1 f2 f3
1 3 10
2 3 7
3 2 4
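For awks other than GNU awk, the same computation can be sketched with a hand-rolled file counter in place of ARGIND. This is an illustration, not the answerer's code: the helper name ratio_to_first is made up, it assumes every key in the later files also appears in the first file with a nonzero value, and it prints rows in the order of the first file rather than relying on for-in order.

```shell
# Hypothetical helper: divide column 2 of files 2..N by column 2 of file 1.
ratio_to_first() {
    awk '
        FNR == 1   { nfile++ }                         # portable stand-in for ARGIND
        nfile == 1 { base[$1] = $2; key[++n] = $1; next }
                   { ratio[$1, nfile] = $2 / base[$1] } # assumes base[$1] exists and is nonzero
        END {
            for (i = 1; i <= n; i++) {
                line = key[i]
                for (f = 2; f <= nfile; f++)
                    line = line OFS ratio[key[i], f]
                print line
            }
        }
    ' "$@"
}

# Sample data copied from the question.
dir=$(mktemp -d)
printf '1 2\n2 3\n3 7\n'    > "$dir/f1"
printf '1 6\n2 9\n3 14\n'   > "$dir/f2"
printf '1 20\n2 21\n3 28\n' > "$dir/f3"

ratio_to_first "$dir/f1" "$dir/f2" "$dir/f3"
```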

AWK: Comparing two different columns in two files

I have these two files
File1:
9 8 6 8 5 2
2 1 7 0 6 1
3 2 3 4 4 6
File2: (which has over 4 million lines)
MN 1 0
JK 2 0
AL 3 90
CA 4 83
MK 5 54
HI 6 490
I want to compare field 6 of file1 with field 2 of file2. If they match, append field 3 of file2 to the end of that line of file1.
I've looked at other solutions but I can't get it to work correctly.
Desired output:
9 8 6 8 5 2 0
2 1 7 0 6 1 0
3 2 3 4 4 6 490
My attempt:
awk 'NR==FNR{a[$2]=$2;next}a[$6]{print $0,a[$6]}' file2 file1
The program just hangs after that.
To print all lines in file1 with match if available:
$ awk 'FNR==NR{a[$2]=$3;next;} {print $0,a[$6];}' file2 file1
9 8 6 8 5 2 0
2 1 7 0 6 1 0
3 2 3 4 4 6 490
To print only the lines that have a match:
$ awk 'NR==FNR{a[$2]=$3;next} $6 in a {print $0,a[$6]}' file2 file1
9 8 6 8 5 2 0
2 1 7 0 6 1 0
3 2 3 4 4 6 490
Note that I replaced a[$2]=$2 with a[$2]=$3 and changed the test a[$6] (which is false if the value is zero) to $6 in a.
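The difference between the two tests is easy to see on the sample data from the question. The snippet below is a small demonstration, with temporary file names made up for the example: it runs the same lookup once with the truthiness test a[$6] and once with the membership test $6 in a.

```shell
# Sample data copied from the question.
dir=$(mktemp -d)
printf '9 8 6 8 5 2\n2 1 7 0 6 1\n3 2 3 4 4 6\n' > "$dir/file1"
printf 'MN 1 0\nJK 2 0\nAL 3 90\nCA 4 83\nMK 5 54\nHI 6 490\n' > "$dir/file2"

# Truthiness test: a stored value of 0 counts as false, so those matches are dropped.
by_value=$(awk 'NR==FNR { a[$2]=$3; next } a[$6] { print $0, a[$6] }' "$dir/file2" "$dir/file1")

# Membership test: true whenever the key exists, whatever value is stored.
by_key=$(awk 'NR==FNR { a[$2]=$3; next } $6 in a { print $0, a[$6] }' "$dir/file2" "$dir/file1")

printf 'a[$6] test:\n%s\n' "$by_value"
printf '$6 in a test:\n%s\n' "$by_key"
```

Only the 490 row survives the first test, while all three rows pass the second. A side effect worth knowing is that merely referencing a[$6] creates that element if it did not exist, which $6 in a avoids.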
Your own attempt has two bugs, as noted in @John1024's answer:
You use field 2 as both key and value in a, where you should be storing field 3 as the value (since you want to keep it for later), i.e., it should be a[$2] = $3.
The test a[$6] is false when the value in a is zero, even if it exists. The correct test is $6 in a.
Hence:
awk 'NR==FNR { a[$2]=$3; next } $6 in a {print $0, a[$6] }' file2 file1
However, there might be better approaches, but it is not clear from your specifications. For instance, you say that file2 has over 4 million lines, but it is unknown if there are also that many unique values for field 2. If yes, then a will also have that many entries in memory. And, you don't specify how long file1 is, or if its order must be preserved for output, or if every line (even without matches in file2) should be output.
If it is the case that file1 has many fewer lines than file2 has unique values for field 2, and only matching lines need to be output, and order does not need to be preserved, then you might wish to read file1 first…

Concatenate files based off unique titles in their first column

I have many files that are of two column format with a label in the first column and a number in the second column. The number is positive (never zero):
AGS 3
KET 45
WEGWET 12
FEW 56
Within each file, the labels are not repeated.
I would like to concatenate these many files into one file with many+1 columns, such that the first column contains the unique set of all labels across all files, and the remaining columns contain the number for each label in each file. If a label does not exist in a certain file (and hence has no number), it should default to zero. For instance, if the second file contains this:
AGS 5
KET 14
KJV 2
FEW 3
then the final output would look like:
AGS 3 5
KET 45 14
WEGWET 12 0
KJV 0 2
FEW 56 3
I am new to Linux, and have been playing around with sed and awk, but realize this probably requires multiple steps...
*Edit note: I had to change it from just 2 files to many files. Even though my example only shows 2 files, I would like to do this in case of >2 files as well. Thank you...
Here is one way using awk:
awk '
NR==FNR {a[$1]=$0;next}
{
    print (($1 in a)?a[$1] FS $2: $1 FS "0" FS $2)
    delete a[$1]
}
END{
    for (x in a) print a[x],"0"
}' file1 file2 | column -t
AGS 3 5
KET 45 14
KJV 0 2
FEW 56 3
WEGWET 12 0
You read file1 into an array indexed by column 1 and assign the entire line as its value.
For file2, check whether column 1 is present in the array. If it is, print the value from file1 along with the value from file2. If it is not, print 0 as the value for file1.
Delete the array element as we go along to get only what was unique in file1.
In the END block print what was unique in file1 and print 0 for file2.
Pipe the output to column -t for pretty format.
Assuming that your data are in files named file1 and file2:
$ awk 'FNR==NR {a[$1]=$2; b[$1]=0; next} {a[$1]+=0; b[$1]=$2} END{for (x in b) {printf "%-15s%3s%3s\n",x,a[x],b[x]}}' file1 file2
KJV 0 2
WEGWET 12 0
KET 45 14
AGS 3 5
FEW 56 3
To understand the above, we have to understand an awk trick.
In awk, NR is the number of records (lines) processed so far and FNR is the number of records processed in the current file. Consequently, the condition FNR==NR is true only while we are processing the first file. In that case, the associative array a gets all the values from the first file and the associative array b gets placeholder values, i.e. zero. When we process the second file, its values go into array b, and we make sure that array a has at least a placeholder value of zero. When we are done with the second file, the data is printed.
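To watch the two counters side by side, here is a small trace over two short sample files (the contents and the temporary directory are made up for the example). It prints the file name, NR, FNR, and whether FNR==NR holds on each line.

```shell
dir=$(mktemp -d)
printf 'AGS 3\nKET 45\n' > "$dir/file1"
printf 'AGS 5\nKJV 2\n'  > "$dir/file2"

# NR keeps counting across files; FNR restarts at 1 for each new file.
trace=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, (FNR == NR ? "first" : "later") }' file1 file2)
printf '%s\n' "$trace"
```

One caveat: if the first file is empty, FNR==NR is also true for every line of the second file, so the trick silently treats the second file as the first.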
More than two files using GNU Awk
I created a file3:
$ cat file3
AGS 3
KET 45
WEGWET 12
FEW 56
AGS 17
ABC 100
The awk program extended to work with any number of files is:
$ awk 'FNR==1 {n+=1} {a[$1][n]=$2} END{for (x in a) {printf "%-15s",x; for (i=1;i<=n;i++) {printf "%5s",a[x][i]};print ""}}' file1 file2 file3
KJV                     2
ABC                        100
WEGWET            12        12
KET               45   14   45
AGS                3    5   17
FEW               56    3   56
This code works by creating a file counter. We know that we are in a new file every time FNR is 1, at which point the counter n is incremented. For every line we encounter, we put the data in a 2-D array. The first dimension of a is the label and the second is the number of the file in which we encountered it. In the end, we just loop over all the labels and all the files, from 1 to n, and print the data.
More than 2 files without GNU Awk
Without requiring GNU's awk, we can solve the problem using simulated two-dimensional arrays:
$ awk 'FNR==1 {n+=1} {b[$1]=1; a[$1,":",n]=$2} END{for (x in b) {printf "%-15s",x; for (i=1;i<=n;i++) {q=a[x,":",i]+0; printf "%5s",q};print ""}}' file1 file2 file3
KJV 0 2 0
ABC 0 0 100
WEGWET 12 0 12
KET 45 14 45
AGS 3 5 17
FEW 56 3 56

Count the repetitions of an element from a file with awk

I have a one-column file containing only integers, such as
1
1
4
3
3
2
I want to count how many times each number appears in the file. The output file should be:
1 2
2 1
3 2
4 1
Thanks
Try this line:
awk '{a[$0]++}END{for(x in a)print x,a[x]}' file
awk '{tot[$0]++} END{for (n in tot) {print n,tot[n]}} ' numbers
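Note that for (x in a) visits the keys in an unspecified order, so neither one-liner guarantees the sorted listing shown in the question. A minimal sketch that adds the ordering by piping through sort -n (the helper name count_values and the temporary file are made up for the example):

```shell
# Hypothetical helper: count occurrences of each line, sorted numerically by value.
count_values() {
    awk '{ count[$0]++ } END { for (v in count) print v, count[v] }' "$@" | sort -n
}

# Sample data copied from the question.
dir=$(mktemp -d)
printf '1\n1\n4\n3\n3\n2\n' > "$dir/numbers"

count_values "$dir/numbers"
```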