I have a data set of:
32
33
34
35
34
32
29
28
27
25
29
32
34
35
36
28
27
28
28
I would like to be able to find out how many numbers in a row are above 32. For example an output like:
5
4
where 5 is the length of the first run of values above 32, and 4 is the length of the second run. I have been trying to do this in awk, but so far all I get is the combined count, i.e. 9 for all values above 32.
Any help would be much appreciated.
awk to the rescue! I think your output is not consistent with the input, or I misunderstood the problem. This computes the length of each run of consecutive values >31:
$ awk '$1>31{c++; next} c{print c; c=0} END{if(c) print c}' file
6
4
The END block is needed in case the last run extends to the last line of the file.
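If the comparison should be strictly "above 32", as the question words it, the threshold can be passed in as a variable. The following is only a sketch: the temporary file and the variable names t and runs are made up for illustration, and the data is the sample from the question.

```shell
# Write the sample data from the question to a temporary file.
data=$(mktemp)
printf '%s\n' 32 33 34 35 34 32 29 28 27 25 29 32 34 35 36 28 27 28 28 > "$data"

# Count each run of consecutive values strictly greater than t.
runs=$(awk -v t=32 '
  $1 > t { c++; next }         # still inside a run: extend it
  c      { print c; c = 0 }    # run just ended: report and reset
  END    { if (c) print c }    # run that reaches the last line
' "$data")
printf '%s\n' "$runs"
```

With t=32 the sample gives 4 and 3; with t=31 it gives 6 and 4, as in the answer above.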
I wish to plot the second, third, and fourth columns from the right using Gnuplot. In awk, we can use $(NF-1). But in Gnuplot, I am not sure how to designate a column counted from the right with 'using'.
Is it possible to use awk in Gnuplot to plot the 3rd column from the right against the 4th column from the right? Or do we need a shell script for this?
I have a lot of long text files to plot, so I cannot rewrite each file with awk into a new text file and then plot it with Gnuplot. That would be too time-consuming. I want Gnuplot itself to plot the 2nd, 3rd, and 4th columns from the right.
No need for awk. If you run stats, you can limit it to the first row with every ::0::0, so it should be pretty fast. Try the following complete example:
Code:
### plotting columns from right
reset session
$Data <<EOD
11 21 31 41 51 61 71
12 22 32 42 52 62 72
13 23 33 43 53 63 73
14 24 34 44 54 64 74
15 25 35 45 55 65 75
16 26 36 46 56 66 76
17 27 37 47 57 67 77
EOD
stats $Data u 0 every ::0::0 nooutput
ColMax = STATS_columns
ColFromRight(col) = column(ColMax-col+1)
plot $Data u (ColFromRight(3)):(ColFromRight(4)) w lp pt 7
### end of code
You can use STATS_columns (set by a preceding stats command) for the number of columns and use it in your plot, e.g.
nf = int(STATS_columns)
plot 'data.dat' using 1:nf-4
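For comparison, the awk route the question asks about does not require rewriting files either, because gnuplot can read the output of a command when the file name starts with "<". Below is a rough sketch of the awk part only, run on two rows of the sample data in a temporary file; the variable names are placeholders.

```shell
# Two rows of the sample data in a temporary file.
data=$(mktemp)
printf '%s\n' '11 21 31 41 51 61 71' '12 22 32 42 52 62 72' > "$data"

# $(NF-2) is the 3rd column from the right, $(NF-3) the 4th.
cols=$(awk '{ print $(NF-2), $(NF-3) }' "$data")
printf '%s\n' "$cols"
```

Inside gnuplot the same command could be used as plot "< awk '{print $(NF-2), $(NF-3)}' file.dat" using 1:2, assuming a build that supports reading from pipes. The STATS_columns approach avoids the extra process per file.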
I need to read a file and store columns 1 and 4, look up each ID from column 1 in a second file, and subtract column 4 of file 1 from column 4 of file 2. Can you help me? Column 4 is in seconds.
The two files contain the following headers.
ID, origin, destination, time
I need to take the first ID in file 1 and look it up in file 2.
For example, take ID 37 from file 1 and look for it in file 2. When I find it, I need the ID 37 time in file 1 to be subtracted from the ID 37 time in file 2.
I need the sum of these differences.
I am wondering whether awk is the right solution.
File 01
37 33 44 602.04
39 32 13 602.20
File 02
37 44 44 602.184852493
39 13 13 602.263704529
Output
0,2
One possibility to consider is splitting the task up into two parts - joining the two files based on that common field, and then doing the math. It avoids having to store part of every line from one file in memory all at once, which is nice if they're big.
The following assumes that a) the files are sorted on the first column, and b) the columns are separated by blanks (spaces or tabs), which is join's default:
$ join -j1 -o '1.4 2.4' file1.txt file2.txt | awk '{total+=$2-$1} END {print total}'
0.208557
The join command merges the two files on common lines and prints out just the numbers you want to subtract, which are piped to awk to do the actual math.
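If the files are not already sorted, they can be sorted first. The sketch below assumes the two-row sample from the question (with file 1 deliberately out of order), uses temporary files with made-up names, and forces the C locale so that sort and join agree on the ordering. It uses the POSIX spelling -1 1 -2 1 instead of -j1.

```shell
export LC_ALL=C   # same collation for sort and join, '.' as decimal point

dir=$(mktemp -d)
printf '%s\n' '39 32 13 602.20' '37 33 44 602.04' > "$dir/file1.txt"
printf '%s\n' '37 44 44 602.184852493' '39 13 13 602.263704529' > "$dir/file2.txt"

# join expects both inputs sorted on the join field.
sort -k1,1 "$dir/file1.txt" > "$dir/f1.sorted"
sort -k1,1 "$dir/file2.txt" > "$dir/f2.sorted"

# Print only the two time columns, then sum (file2 time - file1 time).
total=$(join -1 1 -2 1 -o '1.4 2.4' "$dir/f1.sorted" "$dir/f2.sorted" |
  awk '{ total += $2 - $1 } END { print total }')
printf '%s\n' "$total"
```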
Edit: Or all in awk:
$ awk 'NR==FNR { f1[$1]=$4; next }
$1 in f1 { total += $4 - f1[$1] }
END { print total }' file1.txt file2.txt
0.208557
This stores the IDs and times from the first file in an associative array, and then for each line in file 2, if that line's ID exists in the array, adds the difference of the times to the total. Finally, it prints the total after reading all of file 2.
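To check the result, it can help to print each difference before the total. This is only a sketch along the same lines: the array name t1, the variable names, and the %.6f format are choices made here, not part of the answer.

```shell
dir=$(mktemp -d)
printf '%s\n' '37 33 44 602.04' '39 32 13 602.20' > "$dir/file1.txt"
printf '%s\n' '37 44 44 602.184852493' '39 13 13 602.263704529' > "$dir/file2.txt"

report=$(awk '
  NR == FNR { t1[$1] = $4; next }            # first file: remember the time per ID
  $1 in t1  { d = $4 - t1[$1]; total += d    # second file: difference for known IDs
              printf "%s %.6f\n", $1, d }
  END       { printf "total %.6f\n", total }
' "$dir/file1.txt" "$dir/file2.txt")
printf '%s\n' "$report"
```

For the two sample rows this prints 0.144852 for ID 37, 0.063705 for ID 39, and a total of 0.208557.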
f1.col4 - f2.col4:
awk 'NR==FNR{a[$1]=$4;next}{$4=($1 in a)?a[$1]-$4:$4}7' f1 f2
The output looks like:
37 44 44 -0.144852
39 13 13 -0.0637045
41 44 44 -0.0642587
44 13 13 -0.0196296
45 44 44 -0.0145357
47 13 13 -0.014259
If you want f2.col4 - f1.col4 instead, use $4-a[$1] in the code above, and you get:
37 44 44 0.144852
39 13 13 0.0637045
41 44 44 0.0642587
44 13 13 0.0196296
45 44 44 0.0145357
47 13 13 0.0142594
I have a file separated by \t.
header text with many lines
V F A B
10 30 26 42
14 33 25 45
16 32 23 43
18 37 22 48
I want to swap the 3rd and 4th columns. I'm using
awk '
BEGIN {
RS = "\n";
OFS="\t";
record=0;
};
record {
a = $4;
$4 = $3;
$3 = a;
};
$1=="V" {
record=1
};
{
print $0
};
'
}
Instead of just swapping the columns, column 3 also picks up the line break of the original 4th column:
header text with many lines
V F A B
10 30 42
26
14 33 45
25
16 32 43
23
18 37 48
22
How can I prevent this and get the following instead?
header text with many lines
V F A B
10 30 42 26
14 33 45 25
16 32 43 23
18 37 48 22
Could you please try the following? It uses the usual method of swapping two values through a temporary variable: store the 3rd field in a variable, assign the 4th field to the 3rd, and then assign the stored value to the 4th.
awk 'FNR==1{print;next} {val=$3;$3=$4;$4=val} 1' OFS="\t" Input_file
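A line break that travels with the 4th column often means the file has Windows (CRLF) line endings, so the last field ends in a carriage return. That is only a guess about the asker's file. Under that assumption, the sketch below strips a trailing carriage return before swapping, and it swaps only the rows after the "V F A B" header, as the question's script intends. The temporary file and the flag name seen are made up for illustration.

```shell
# Sample with CRLF line endings (an assumption about the real file).
f=$(mktemp)
printf 'header text with many lines\r\nV\tF\tA\tB\r\n10\t30\t26\t42\r\n14\t33\t25\t45\r\n' > "$f"

out=$(awk 'BEGIN { FS = OFS = "\t" }
  { sub(/\r$/, "") }                   # drop a trailing carriage return, if present
  seen { t = $3; $3 = $4; $4 = t }     # swap columns 3 and 4 in data rows only
  $1 == "V" { seen = 1 }               # data rows start after the column header
  { print }' "$f")
printf '%s\n' "$out"
```

The header lines pass through unchanged and the data rows come out as 10 30 42 26 and 14 33 45 25, tab-separated.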
Or, this messy sed:
sed -E 's/([[:digit:]]+)([[:blank:]]+)([[:digit:]]+)([[:space:]]*)$/\3\2\1\4/' file
# ^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^
# 3rd column tab 4th column optional whitespace
I have the following file:
61 12451
61 13451
61 14451
61 15415
12 48469
12 78456
12 47845
32 45778
32 48745
32 47845
32 52448
32 87451
The output I want is shown below. The values in the first column are replaced by the order in which they first occur: the 61s become 1 because they come first, the 12s become 2, and so on. The second column counts up within each group, starting one above the group number, because these are pairwise comparisons and 1 to 1 is ignored. So the four 61 rows get 2 to 5 in the second column, and so on for the rest.
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
3 6
3 7
3 8
Any suggestion on how to achieve this with AWK? Thanks!
It could be written in one awk command like this
awk '{a[NR]=$1;b[NR]=$2;c[NR]=$1;d[NR]=$2} END {for(i=1; i<=NR; i++){if(i==1){c[i]=1;d[i]=2}else if(a[i]==a[i-1]){c[i]=c[i-1];d[i]=1+d[i-1]}else{c[i]=1+c[i-1];d[i]=c[i]+1}print c[i],d[i]}}' pairwise.txt > output.txt
Here a and b are the arrays that hold the first and second columns of the file. The new values are stored in arrays c and d as the first and second columns and are printed to the output file.
Not sure if this one-liner helps:
awk '$1!=p{++i;j=i+1}{print i,j++;p=$1}' file
At least it gives the desired output.
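The one-liner is easier to follow with longer names. This is the same logic spelled out as a sketch; group, partner, and prev are names chosen here, and the input is the sample from the question in a temporary file.

```shell
f=$(mktemp)
printf '%s\n' '61 12451' '61 13451' '61 14451' '61 15415' \
              '12 48469' '12 78456' '12 47845' \
              '32 45778' '32 48745' '32 47845' '32 52448' '32 87451' > "$f"

out=$(awk '
  $1 != prev { group++; partner = group + 1 }   # new key: next group number, partner restarts
             { print group, partner++; prev = $1 }
' "$f")
printf '%s\n' "$out"
```

This relies on equal keys being adjacent in the input; a key that reappears later would be numbered as a new group.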
I have three files!
coord.xvg
veloc.xvg
force.xvg
Each of these files has many lines of numbers, let's say 10000.
I would like to construct a script that
opens the three files
reads columns
and make arithmetic operations between them for every line.
For example
if every file has 4 words
coord.xvg >> Time x y z
veloc.xvg >> Time vx vy vz
force.xvg >> Time fx fy fz
and c,v,f stands for coord.xvg, veloc.xvg,force.xvg
if I write the operation 2*v*v+c*f*c the output should be
column1 Column2 Column3 Column4
Time 2*vx*vx+cx*fx*cx 2*vy*vy+cy*fy*cy 2*vz*vz+cz*fz*cz
I have found the following on the internet:
awk '{
{ getline < "coord.xvg" ; if (FNR==90307) for(i=1;i<=2;i+=1) c=$i}
{ getline < "veloc.xvg" ; if (FNR==90307) for(i=1;i<=2;i+=1) v=$i}
{ getline < "force.xvg" ; if (FNR==90307) for(i=1;i<=2;i+=1) f=$i}
}
END {print c+v+f}' coord.xvg
which is meant for my files, where I want to begin reading after 90307 lines.
But it didn't help me much, as it returns only the last values of every variable.
Any thoughts?
Something to get you started, if I understood you correctly:
$ cat *.xvg
Time 1 2 3
Time 4 5 6
Time 7 8 9
Time 10 11 12
Time 13 14 15
Time 16 17 18
Time 19 20 21
Time 22 23 24
Time 25 26 27
The following awk-script
{
# read one line from each file for every input record, so the three streams stay in step
getline < "coord.xvg" ; c1=$2;c2=$3;c3=$4
getline < "veloc.xvg" ; v1=$2;v2=$3;v3=$4
getline < "force.xvg" ; f1=$2;f2=$3;f3=$4
if (FNR>=1) {
print c1,c2,c3,v1,v2,v3,f1,f2,f3
print $1, c1+v1+f1, c2+v2+f2, c3+v3+f3
}
}
reads a line from each of the files and puts the data in variables,
as can be seen here:
$ awk -f s.awk coord.xvg
1 2 3 19 20 21 10 11 12
Time 30 33 36
4 5 6 22 23 24 13 14 15
Time 39 42 45
7 8 9 25 26 27 16 17 18
Time 48 51 54
The if (FNR>=1) part controls which lines are displayed. Counting starts at 1; change this to match your needs. The actual calculations I leave to you :-)
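An alternative that avoids getline is to glue the three files together line by line with paste and let awk work on the combined record. The sketch below applies the question's example operation 2*v*v+c*f*c to the sample files above; the start variable, standing in for the first line to process, and the temporary directory are assumptions made for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' 'Time 1 2 3' 'Time 4 5 6' 'Time 7 8 9' > "$dir/coord.xvg"
printf '%s\n' 'Time 19 20 21' 'Time 22 23 24' 'Time 25 26 27' > "$dir/veloc.xvg"
printf '%s\n' 'Time 10 11 12' 'Time 13 14 15' 'Time 16 17 18' > "$dir/force.xvg"

# paste joins line N of each file: fields 2-4 are c, 6-8 are v, 10-12 are f.
result=$(paste "$dir/coord.xvg" "$dir/veloc.xvg" "$dir/force.xvg" |
  awk -v start=1 'NR >= start {
    out = $1
    for (i = 2; i <= 4; i++) {
      c = $i; v = $(i + 4); f = $(i + 8)
      out = out " " (2*v*v + c*f*c)    # the example operation from the question
    }
    print out
  }')
printf '%s\n' "$result"
```

For the real files, start=90307 would skip the first 90306 lines while keeping the three files aligned.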