I need to read a file and store columns 1 and 4, look up each column 1 value in a second file and store that file's column 4, and then subtract column 4 of file 1 from column 4 of file 2. Can you help me? Column 4 is in seconds.
The two files contain the following headers.
ID, origin, destination, time
I need to get the first ID in file 1, and look in file 2.
For example, take ID 37 from file 1 and look it up in file 2. When I find it, I need the ID 37 time in file 1 to be subtracted from the ID 37 time in file 2.
I need the sum of subtraction times.
I'm wondering whether awk is the right solution.
File 01
37 33 44 602.04
39 32 13 602.20
File 02
37 44 44 602.184852493
39 13 13 602.263704529
Output
0,2
One possibility to consider is splitting the task up into two parts - joining the two files based on that common field, and then doing the math. It avoids having to store part of every line from one file in memory all at once, which is nice if they're big.
The following assumes that (a) both files are sorted on the first column, which join requires, and (b) the columns are separated by whitespace (spaces or tabs), which is join's default:
$ join -j1 -o '1.4 2.4' file1.txt file2.txt | awk '{total+=$2-$1} END {print total}'
0.208557
The join command merges the two files on common lines and prints out just the numbers you want to subtract, which are piped to awk to do the actual math.
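If the inputs are not already sorted, they can be sorted on the key first. The following is a minimal sketch rather than a definitive recipe: the temporary directory and the sorted1.txt/sorted2.txt names are made up for illustration, file 1 is deliberately written out of order, and -1 1 -2 1 is used as the portable spelling of "join on field 1 of both files".

```shell
dir=$(mktemp -d)

# Sample data from the question; file 1 is intentionally unsorted here.
printf '39 32 13 602.20\n37 33 44 602.04\n' > "$dir/file1.txt"
printf '37 44 44 602.184852493\n39 13 13 602.263704529\n' > "$dir/file2.txt"

# join expects both inputs sorted on the join field, using the same collation.
LC_ALL=C sort -k1,1 "$dir/file1.txt" > "$dir/sorted1.txt"
LC_ALL=C sort -k1,1 "$dir/file2.txt" > "$dir/sorted2.txt"

# Emit only the two time columns, then sum (file 2 time - file 1 time).
total=$(LC_ALL=C join -1 1 -2 1 -o 1.4,2.4 "$dir/sorted1.txt" "$dir/sorted2.txt" |
        awk '{ total += $2 - $1 } END { print total }')

echo "$total"   # 0.208557 for the sample data
rm -rf "$dir"
```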
Edit: Or all in awk:
$ awk 'NR==FNR { f1[$1]=$4; next }
$1 in f1 { total += $4 - f1[$1] }
END { print total }' file1.txt file2.txt
0.208557
This stores the IDs and times from the first file in an associative array. Then, for each line in file 2, if that line's ID exists in the array, it adds the difference of the times to the total. Finally, it prints the total after reading both files.
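The expected output in the question (0,2) looks like the total rounded to one decimal place. Assuming that is the intent, a sketch of the same awk approach with printf doing the rounding could look like this. The array name t1 and the temporary files are illustrative only, and the result uses a decimal point rather than a comma.

```shell
dir=$(mktemp -d)
printf '37 33 44 602.04\n39 32 13 602.20\n' > "$dir/file1.txt"
printf '37 44 44 602.184852493\n39 13 13 602.263704529\n' > "$dir/file2.txt"

rounded=$(LC_ALL=C awk '
    NR == FNR { t1[$1] = $4; next }          # first file: remember time per ID
    $1 in t1  { total += $4 - t1[$1] }       # second file: add the difference
    END       { printf "%.1f\n", total }     # round to one decimal place
' "$dir/file1.txt" "$dir/file2.txt")

echo "$rounded"
rm -rf "$dir"
```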
f1.col4 - f2.col4:
awk 'NR==FNR{a[$1]=$4;next}{$4=($1 in a)?a[$1]-$4:$4}7' f1 f2
The output looks like:
37 44 44 -0.144852
39 13 13 -0.0637045
41 44 44 -0.0642587
44 13 13 -0.0196296
45 44 44 -0.0145357
47 13 13 -0.0142594
If you want f2.col4 - f1.col4 instead, use $4-a[$1] in the above code. You then get:
37 44 44 0.144852
39 13 13 0.0637045
41 44 44 0.0642587
44 13 13 0.0196296
45 44 44 0.0145357
47 13 13 0.0142594
I want to plot the second, third, and fourth columns counted from the right using Gnuplot. In awk, we can use $(NF-1). But in Gnuplot, I am not sure how to designate a column counted from the right with 'using'.
Is it possible to use awk within Gnuplot to plot the 3rd column from the right against the 4th column from the right? Or do we need a shell script for this?
I have a lot of long text files to plot, so I cannot rewrite each file with awk into a new text file and then plot that with Gnuplot; it would be too time-consuming. I want Gnuplot itself to plot the 2nd, 3rd, and 4th columns from the right.
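No need for awk. Run stats once to get the number of columns; limiting it to the first row with every ::0::0 should keep it fast even on long files. With the seven-column sample data below, ColFromRight(3) is column 5 and ColFromRight(4) is column 4. Try the following complete example: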
Code:
### plotting columns from right
reset session
$Data <<EOD
11 21 31 41 51 61 71
12 22 32 42 52 62 72
13 23 33 43 53 63 73
14 24 34 44 54 64 74
15 25 35 45 55 65 75
16 26 36 46 56 66 76
17 27 37 47 57 67 77
EOD
stats $Data u 0 every ::0::0 nooutput
ColMax = STATS_columns
ColFromRight(col) = column(ColMax-col+1)
plot $Data u (ColFromRight(3)):(ColFromRight(4)) w lp pt 7
### end of code
You can use STATS_columns (set by a preceding stats command) for the number of columns and use it in your plot,
e.g.
nf = int(STATS_columns)
plot 'data.dat' using 1:nf-4
I have a data set of:
32
33
34
35
34
32
29
28
27
25
29
32
34
35
36
28
27
28
28
I would like to be able to find out how many numbers in a row are above 32. For example an output like:
5
4
where 5 is the length of the first run of values above 32, and 4 is the length of the second run. I have been trying to do this in awk, but so far all I get is the combined count, i.e. 9 for all values above 32.
Any help would be much appreciated.
awk to the rescue! Either your expected output is not consistent with the input, or I misunderstood the problem. This computes the length of each run of consecutive values greater than 31:
$ awk '$1>31{c++; next} c{print c; c=0} END{if(c) print c}' file
6
4
The END block is needed for the case where the last run extends to the last line of the file.
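To experiment with where the boundary sits, the threshold can be passed in as a variable. This is a minimal sketch; t is a name chosen here for illustration. With t=32 and a strict comparison the sample data yields runs of 4 and 3, while t=31 reproduces the 6 and 4 shown above.

```shell
# Sample values from the question, one per line.
data=$(printf '%s\n' 32 33 34 35 34 32 29 28 27 25 29 32 34 35 36 28 27 28 28)

# t is the threshold; a run is a stretch of consecutive values strictly above t.
runs=$(printf '%s\n' "$data" | awk -v t=32 '
    $1 > t { c++; next }          # still inside a run: extend it
    c      { print c; c = 0 }     # run just ended: report its length
    END    { if (c) print c }     # run that reaches the last line
')

printf '%s\n' "$runs"
```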
I have a file named input.txt which contains student data in the format StudentName#ClassName#SchoolName#Subject1Marks#Subject2Marks.
Shriii#First#ADCET#95#90
Chaitraliii#Second#ADCET#80#75
Shubhangi#First#ADCET#75#70
Tushar#Second#RIT#80#79
Prathamesh#First#RIT#88#63
The output should contain the records of students whose average is more than 90, with the average shown in a new column.
The expected output is Shriii|First|ADCET|95|90|92.5. I tried various approaches but could not generate the expected output.
awk to the rescue!
$ awk -F# -v OFS='|' '{$(NF+1)=($(NF-1)+$NF)/2}1' file
Shriii|First|ADCET|95|90|92.5
Chaitraliii|Second|ADCET|80|75|77.5
Shubhangi|First|ADCET|75|70|72.5
Tushar|Second|RIT|80|79|79.5
Prathamesh|First|RIT|88|63|75.5
Sukrut|Second|KIT|91|90|90.5
Or pipe the result to column to get aligned output:
... | column -ts'|'
Shriii       First   ADCET  95  90  92.5
Chaitraliii  Second  ADCET  80  75  77.5
Shubhangi    First   ADCET  75  70  72.5
Tushar       Second  RIT    80  79  79.5
Prathamesh   First   RIT    88  63  75.5
Sukrut       Second  KIT    91  90  90.5
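The command above appends the average to every record. The question also asks for only the students whose average is above 90, so a filtered variant might look like the sketch below. It uses only the five sample records listed in the question, and the variable name avg is chosen here for illustration.

```shell
# Sample records from the question.
input=$(printf '%s\n' \
    'Shriii#First#ADCET#95#90' \
    'Chaitraliii#Second#ADCET#80#75' \
    'Shubhangi#First#ADCET#75#70' \
    'Tushar#Second#RIT#80#79' \
    'Prathamesh#First#RIT#88#63')

result=$(printf '%s\n' "$input" | awk -F'#' -v OFS='|' '
    { avg = ($(NF-1) + $NF) / 2 }         # average of the last two fields
    avg > 90 { $(NF+1) = avg; print }     # append it, print only if above 90
')

printf '%s\n' "$result"
```

Assigning to $(NF+1) makes awk rebuild the record with OFS, which is what turns the # separators into | in the printed line.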
I have a tab-separated file:
header text with many lines
V F A B
10 30 26 42
14 33 25 45
16 32 23 43
18 37 22 48
I want to swap the 3rd and 4th columns. I'm using:
awk '
BEGIN {
RS = "\n";
OFS="\t";
record=0;
};
record {
a = $4;
$4 = $3;
$3 = a;
};
$1=="V" {
record=1
};
{
print $0
};
'
Instead of just changing the position of the columns, column 3 also has the line break of the original 4th column:
header text with many lines
V F A B
10 30 42
26
14 33 45
25
16 32 43
23
18 37 48
22
How can I prevent this and get the following instead?
header text with many lines
V F A B
10 30 42 26
14 33 45 25
16 32 43 23
18 37 48 22
Could you please try the following? It uses the usual method of swapping two values through a temporary variable: store the 3rd field in a variable, copy the 4th field into the 3rd, and then assign the saved value to the 4th.
awk 'FNR==1{print;next} {val=$3;$3=$4;$4=val} 1' OFS="\t" Input_file
Or, this messy sed:
sed -E 's/([[:digit:]]+)([[:blank:]]+)([[:digit:]]+)([[:space:]]*)$/\3\2\1\4/' file
#         ^^^^^^^^^^^^^^              ^^^^^^^^^^^^^^
#         3rd column    tab           4th column    optional whitespace
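A line break that travels with the original 4th column is the kind of symptom you might see if the file has Windows (CRLF) line endings, because the last field then carries a trailing carriage return. That is an assumption about the asker's file, not something stated in the question. Under that assumption, a minimal sketch that strips the carriage return first, and swaps only the rows after the V F A B line, might look like this (the shortened sample data and the names seen and t are made up for illustration):

```shell
f=$(mktemp)

# Hypothetical sample with CRLF line endings (an assumption about the input).
printf 'header text with many lines\r\nV\tF\tA\tB\r\n10\t30\t26\t42\r\n14\t33\t25\t45\r\n' > "$f"

swapped=$(awk 'BEGIN { FS = OFS = "\t" }
    { sub(/\r$/, "") }                  # drop a trailing carriage return, if any
    seen { t = $3; $3 = $4; $4 = t }    # swap only rows after the column names
    $1 == "V" { seen = 1 }              # the V F A B line marks the start of data
    { print }' "$f")

printf '%s\n' "$swapped"
rm -f "$f"
```

Placing the swap rule before the $1 == "V" rule means the column-name line itself is printed unchanged.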
I have file1 as the result of a first operation. It has the following structure:
201 12 0.298231 8.8942
206 13 -0.079795 0.6367
101 34 0.86348 0.7456
301 15 0.215355 4.6378
303 16 0.244734 5.9895
and file2 as a result of a different operation and has the same type of structure.
File 2 sample
204 60 -0.246038 6.0535
304 83 -0.246209 6.0619
101 34 -0.456629 6.0826
211 36 -0.247003 6.1011
305 83 -0.247134 6.1075
206 46 -0.247485 6.1249
210 39 -0.248066 6.1537
107 41 -0.248201 6.1603
102 20 -0.248542 6.1773
I would like to select the (field 1, field 2) pairs whose field 3 value in file1 is higher than a threshold (0.8), and then, among those pairs, keep the ones whose field 3 value in file2 is higher than another threshold (0.4, applied to abs(x)).
Note that although files 1 and 2 have the same structure fields 1 and 2 values are not the same (not the same number of lines etc..)
Can you do this with awk?
desired output
101 34
If you combine awk with unix commands you can do the following
sort file1.txt > sorted1.txt
sort file2.txt > sorted2.txt
Sorting lets you use join on the first field (which I assume is unique). In the joined output, field 3 of file1 is $3 and field 3 of file2 is $6. Using awk you can write the following:
join sorted1.txt sorted2.txt | awk 'function abs(value){return (value<0?-value:value);} $3 >=0.8 && abs($6) >=0.4 {print $1"\t"$2}'
In essence, the awk script first defines a function for absolute values, then prints fields 1 and 2 of each line that meets the criteria you described for $3 and $6 (formerly field 3 of file1 and file2 respectively).
Hope this helps...
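If field 1 alone is not unique, or you would rather avoid sorting, the same selection can be sketched in awk alone by keying on fields 1 and 2 together. This is one possible approach rather than the answer's method. The names keep, t1 and t2 are chosen here for illustration, the comparisons are strict as in the question's wording, and only a few of the sample rows are reproduced.

```shell
dir=$(mktemp -d)

# A few rows from the question's samples.
printf '%s\n' '201 12 0.298231 8.8942' '206 13 -0.079795 0.6367' \
              '101 34 0.86348 0.7456' '301 15 0.215355 4.6378' > "$dir/file1"
printf '%s\n' '204 60 -0.246038 6.0535' '101 34 -0.456629 6.0826' \
              '206 46 -0.247485 6.1249' '102 20 -0.248542 6.1773' > "$dir/file2"

pairs=$(awk -v t1=0.8 -v t2=0.4 '
    function abs(x) { return x < 0 ? -x : x }
    NR == FNR { if ($3 > t1) keep[$1, $2] = 1; next }       # file1: pairs above t1
    (($1, $2) in keep) && abs($3) > t2 { print $1, $2 }     # file2: test the same pair
' "$dir/file1" "$dir/file2")

printf '%s\n' "$pairs"
rm -rf "$dir"
```

For these rows only the pair 101 34 passes both thresholds, which matches the desired output in the question.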