I have a file separated by \t.
header text with many lines
V F A B
10 30 26 42
14 33 25 45
16 32 23 43
18 37 22 48
I want to swap the 3rd and 4th columns. I'm using
awk '
BEGIN {
RS = "\n";
OFS="\t";
record=0;
};
record {
a = $4;
$4 = $3;
$3 = a;
};
$1=="V" {
record=1
};
{
print $0
};
'
Instead of just swapping the columns, the new 3rd column also carries the line break that followed the original 4th column:
header text with many lines
V F A B
10 30 42
26
14 33 45
25
16 32 43
23
18 37 48
22
How can I prevent this and get the following instead?
header text with many lines
V F A B
10 30 42 26
14 33 45 25
16 32 43 23
18 37 48 22
Try the following. It uses the usual approach of saving one field's value in a variable, assigning the 4th field to the 3rd, and finally assigning the saved value to the 4th field (in other words, swapping two values through a temporary variable).
awk 'FNR==1{print;next} {val=$3;$3=$4;$4=val} 1' OFS="\t" Input_file
Or, this messy sed:
sed -E 's/([[:digit:]]+)([[:blank:]]+)([[:digit:]]+)([[:space:]]*)$/\3\2\1\4/' file
#         ^^^^^^^^^^^^^^              ^^^^^^^^^^^^^^
#         3rd column    tab           4th column    optional whitespace
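One possible cause of the stray line break, assuming the file has Windows-style (CRLF) line endings, is that the carriage return stays attached to the 4th field and moves with it during the swap. The sketch below strips a trailing carriage return first and swaps only the rows after the "V F A B" line; the sample file and the variable names are made up for illustration.

```shell
# Sample input with CRLF line endings (an assumption about the original file).
input=$(mktemp)
printf 'header text\r\nV\tF\tA\tB\r\n10\t30\t26\t42\r\n14\t33\t25\t45\r\n' > "$input"

# Strip a trailing carriage return, then swap columns 3 and 4,
# but only on the lines that follow the "V F A B" row.
out=$(awk 'BEGIN { FS = OFS = "\t" }
     { sub(/\r$/, "") }
     swap { t = $3; $3 = $4; $4 = t }
     $1 == "V" { swap = 1 }
     { print }' "$input")

printf '%s\n' "$out"
rm -f "$input"
```

Because the flag is set after the swap rule runs, the "V F A B" row itself is printed unchanged.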
I have a text file with a special format.
After the top "N" rows, the file has a 7-column row followed by "X" sub-rows (X is the value in column 6 of that 7-column row). Then comes another 7-column row followed by "Y" sub-rows (Y is the value in column 6 of that row), and so on up to some fixed number of blocks, say 40.
An example is here
(I am skipping top few rows).
2.857142857143E-01 2.857142857143E-01-2.857142857143E-01 1 1533 9 1.0
1 -3.52823873905418
2 -3.52823873905417
3 -1.77620635653680
4 -1.77620635653680
5 -1.77620570068355
6 -1.77620570068354
7 -1.77620570066112
8 -1.77620570066112
9 -1.60388273192418
1.428571428571E-01 1.428571428571E-01-1.428571428571E-01 2 1506 14 8.0
1 -3.52823678441811
2 -3.52823678441810
3 -1.77620282216865
4 -1.77620282216865
5 -1.77619365786042
6 -1.77619365786042
7 -1.77619324280126
8 -1.77619324280125
9 -1.60387130881086
10 -1.60387130881086
11 -1.60387074066972
12 -1.60387074066972
13 -1.51340357895078
14 -1.51340357895078
1.000000000000E+00 4.285714285714E-01-1.428571428571E-01 20 1524 51 24.0
1 -3.52823580096110
2 -3.52823580096109
3 -1.77624472106293
4 -1.77624472106293
5 -1.77623455229910
6 -1.77623455229909
7 -1.77620473017160
8 -1.77620473017159
9 -1.60387169115834
10 -1.60387169115834
11 -1.60386634866654
12 -1.60386634866654
13 -1.51340851656332
14 -1.51340851656332
15 -1.51340086887553
16 -1.51340086887553
17 -1.51321967923767
18 -1.51321967923766
19 -1.40212716813452
20 -1.40212716813451
21 -1.40187887062753
22 -1.40187887062753
23 -0.749391485667459
24 -0.749391485667455
25 -0.740712218931955
26 -0.740712218931954
27 -0.714030906779278
28 -0.714030906779278
29 -0.689087278411268
30 -0.689087278411265
31 -0.687054399753234
32 -0.687054399753233
33 -0.677686868127079
34 -0.677686868127075
35 -0.405343895324740
36 -0.405343895324739
37 -0.404786479693490
38 -0.404786479693488
39 -0.269454266134757
40 -0.269454266134755
41 -0.267490250650300
42 -0.267490250650296
43 -0.262198373307171
44 -0.262198373307170
45 -0.260912148881762
46 -0.260912148881761
47 -9.015623907768122E-002
48 -9.015623907767983E-002
49 0.150591609452852
50 0.150591609452856
51 0.201194203960446
I want to grep a particular number from my text file and to do so, I use
awk -v c=2 -v t=$GREP 'NR==1{d=$c-t;d=d<0?-d:d;v=$c;next}{m=$c-t;m=m<0?-m:m}m<d{d=m;v=$c}END{print v}' case.dat
Here $GREP is 0.2011942, which prints the last row (the value changes from file to file):
51 0.201194203960446
I want to print the header row also with this number, i.e., my script should print,
51 0.201194203960446
1.000000000000E+00 4.285714285714E-01-1.428571428571E-01 20 1524 51 24.0.
How can I print the header row of the matched number?
I have an idea, but I could not turn it into a script.
Simply put: find the number using my script, then print the nearest preceding row that has 7 columns.
This may be what you're looking for
awk -v t="$GREP" '
BEGIN { sub("\\.", "\\.", t) }
NF > 2 { header=$0; next }
NF == 2 && $2 ~ t { printf("%s %s\n%s\n", $1, $2, header) }
' file
You can replace NF > 2 with NF == 7 if you want only strictly seven-column headers to match (note that the header has 6 whitespace-separated columns in your sample data, not 7, because its second and third numbers run together).
Update after the comment "Can you please modify my script so that it should grep upto 13 decimal number":
awk -v t="$GREP" '
BEGIN { if (match(t, "\\.")) {
t = substr(t, 1, RSTART + 13)
sub("\\.", "\\.", t)
}
}
NF > 2 { header=$0; next }
NF == 2 && $2 ~ t { printf("%s %s\n%s\n", $1, $2, header) }
' file
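The scripts above match the target as a text pattern. If you would rather keep the nearest-value logic from the question, one way to sketch it is to remember the most recent header while scanning and to save it together with the best row so far. The data below is a shortened, made-up sample in the same layout (the sub-row counts do not match the real file), and the variable names are illustrative.

```shell
data=$(mktemp)
cat > "$data" <<'EOF'
1.428571428571E-01 1.428571428571E-01-1.428571428571E-01 2 1506 3 8.0
1 -3.52823678441811
2 -1.77620282216865
3 -1.60387130881086
1.000000000000E+00 4.285714285714E-01-1.428571428571E-01 20 1524 2 24.0
1 0.150591609452852
2 0.201194203960446
EOF

out=$(awk -v t=0.2011942 '
    NF > 2 { header = $0; next }            # block header: remember it
    NF == 2 {
        d = $2 - t; if (d < 0) d = -d       # absolute distance to the target
        if (!found || d < best) {           # closest row seen so far
            best = d; row = $0; hdr = header; found = 1
        }
    }
    END { if (found) { print row; print hdr } }
' "$data")

printf '%s\n' "$out"
rm -f "$data"
```

This prints the closest sub-row first and then the header of the block it belongs to.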
I have a data set of:
32
33
34
35
34
32
29
28
27
25
29
32
34
35
36
28
27
28
28
I would like to be able to find out how many numbers in a row are above 32. For example an output like:
5
4
where 5 is the length of the first run of values above 32, and 4 is the length of the second run. I have been trying to do this in awk, but so far all I get is the combined count, i.e. 9 for all values above 32.
Any help would be much appreciated.
awk to the rescue! I think your output is not consistent with the input, or I misunderstood the problem. This computes the length of each run of values > 31:
$ awk '$1>31{c++; next} c{print c; c=0} END{if(c) print c}' file
6
4
The END block is needed in case the last run extends to the last line of the file.
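If the cutoff really is meant to be strictly above 32, the same idea can take the threshold as a variable. This is a small sketch with the question's data fed through a pipe; the variable name limit is just an illustration.

```shell
out=$(printf '%s\n' 32 33 34 35 34 32 29 28 27 25 29 32 34 35 36 28 27 28 28 |
      awk -v limit=32 '
          $1 > limit { c++; next }        # still inside a run: count it
          c { print c; c = 0 }            # run just ended: report and reset
          END { if (c) print c }          # run that reaches the last line
      ')
printf '%s\n' "$out"
```

With a strict comparison against 32 the runs are 33 34 35 34 and 34 35 36, so this prints 4 and 3.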
I need to read a file and store columns 1 and 4, look up each column-1 value in a second file and store that file's column 4, and then subtract column 4 of file 1 from column 4 of file 2. Can you help me? Column 4 is in seconds.
The two files contain the following headers.
ID, origin, destination, time
I need to get the first ID in file 1, and look in file 2.
For example, take ID 37 from file 1 and look at file 2. When I find it, I need the ID 37 time in the first file to be subtracted from the ID 37 time in file 2
I need the sum of subtraction times.
Wondering if awk is right solution
File 01
37 33 44 602.04
39 32 13 602.20
File 02
37 44 44 602.184852493
39 13 13 602.263704529
Output
0,2
One possibility to consider is splitting the task up into two parts - joining the two files based on that common field, and then doing the math. It avoids having to store part of every line from one file in memory all at once, which is nice if they're big.
The following assumes that a) the files are sorted based on the first column, b) that tabs are used to separate the columns:
$ join -j1 -o '1.4 2.4' file1.txt file2.txt | awk '{total+=$2-$1} END {print total}'
0.208557
The join command merges the two files on common lines and prints out just the numbers you want to subtract, which are piped to awk to do the actual math.
Edit: Or all in awk:
$ awk 'NR==FNR { f1[$1]=$4; next }
$1 in f1 { total += $4 - f1[$1] }
END { print total }' file1.txt file2.txt
0.208557
This stores the IDs and times from the first file in an associative array. Then, for each line in file 2, if that line's ID exists in the array, it adds the difference of the times to the total. Finally, it prints the total after reading both files.
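The desired output in the question is rounded to one decimal and uses a decimal comma. A possible sketch, assuming that rounding and that formatting are really what is wanted, formats the total with sprintf and replaces the decimal point afterwards. The temporary file names are only for the example.

```shell
dir=$(mktemp -d)
printf '37 33 44 602.04\n39 32 13 602.20\n' > "$dir/file1.txt"
printf '37 44 44 602.184852493\n39 13 13 602.263704529\n' > "$dir/file2.txt"

out=$(awk 'NR == FNR { f1[$1] = $4; next }          # file 1: time per ID
     $1 in f1  { total += $4 - f1[$1] }             # file 2: add the difference
     END {
         s = sprintf("%.1f", total)                 # round to one decimal
         sub(/\./, ",", s)                          # decimal point -> comma
         print s
     }' "$dir/file1.txt" "$dir/file2.txt")

printf '%s\n' "$out"
rm -rf "$dir"
```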
f1.col4 - f2.col4:
awk 'NR==FNR{a[$1]=$4;next}{$4=a[$1]?a[$1]-$4:$4}7' f1 f2
The output looks like:
37 44 44 -0.144852
39 13 13 -0.0637045
41 44 44 -0.0642587
44 13 13 -0.0196296
45 44 44 -0.0145357
47 13 13 -0.014259
If you want f2.col4 - f1.col4 instead, use $4-a[$1] in the above code. You then get:
37 44 44 0.144852
39 13 13 0.0637045
41 44 44 0.0642587
44 13 13 0.0196296
45 44 44 0.0145357
47 13 13 0.0142594
I have a file first.txt that looks like this :
45
56
74
62
I want to insert this file's values into second.tsv, which looks like this (there are 17 columns):
2 a ...
3 b ...
5 c ...
6 d ...
The desired output is :
2 45 a ...
3 56 b ...
5 74 c ...
6 62 d ...
How can I append to the second column?
I've tried
awk -F, '{getline f1 <"first.txt" ;print $1,f1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17}' second.tsv
but it did not work. It added the values from first.txt after the last column of second.tsv, and the output was not tab-separated.
Thank you.
Your code works if you remove the -F, bit. This tells awk that the file is comma-separated, which it is not.
Another option would be to go for a piped version with paste, e.g.:
paste first.txt second.tsv | awk '{ t=$2; $2=$1; $1=t } 1' OFS='\t'
Output:
2 45 a ...
3 56 b ...
5 74 c ...
6 62 d ...
$ awk 'NR==FNR{a[FNR]=$0;next} {$1=$1 OFS a[FNR]} 1' file1 file2
2 45 a ...
3 56 b ...
5 74 c ...
6 62 d ...
If your files are tab-separated, add BEGIN{FS=OFS="\t"} at the front.
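Put together for tab-separated input, that could look like the sketch below. The sample files are cut down to three columns and their contents after the second column are invented for the example.

```shell
dir=$(mktemp -d)
printf '45\n56\n74\n62\n' > "$dir/first.txt"
printf '2\ta\tx\n3\tb\ty\n5\tc\tz\n6\td\tw\n' > "$dir/second.tsv"

out=$(awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { a[FNR] = $0; next }    # first file: keep each line by line number
     { $1 = $1 OFS a[FNR] }             # second file: glue the value onto field 1
     1' "$dir/first.txt" "$dir/second.tsv")

printf '%s\n' "$out"
rm -rf "$dir"
```

Assigning to $1 makes awk rebuild the line with OFS, so the output stays tab-separated however many columns follow.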
I have three files!
coord.xvg
veloc.xvg
force.xvg
Each of these files has many lines of numbers, say 10000.
I would like to write a script that
opens the three files,
reads the columns,
and performs arithmetic operations between them for every line.
For example
if every line of each file has 4 columns
coord.xvg >> Time x y z
veloc.xvg >> Time vx vy vz
force.xvg >> Time fx fy fz
and c,v,f stands for coord.xvg, veloc.xvg,force.xvg
if I write the operation 2*v*v+c*f*c the output should be
column1 Column2 Column3 Column4
Time 2*vx*vx+cx*fx*cx 2*vy*vy+cy*fy*cy 2*vz*vz+cz*fz*cz
I found the following on the internet:
awk '{
{ getline < "coord.xvg" ; if (FNR==90307) for(i=1;i<=2;i+=1) c=$i}
{ getline < "veloc.xvg" ; if (FNR==90307) for(i=1;i<=2;i+=1) v=$i}
{ getline < "force.xvg" ; if (FNR==90307) for(i=1;i<=2;i+=1) f=$i}
}
END {print c+v+f}' coord.xvg
where 90307 is the line after which I want to start reading my files.
It didn't help me much, though, as it returns only the last values of each variable.
Any thoughts?
Something to get you started if I understood you correctly
$ cat *.xvg
Time 1 2 3
Time 4 5 6
Time 7 8 9
Time 10 11 12
Time 13 14 15
Time 16 17 18
Time 19 20 21
Time 22 23 24
Time 25 26 27
The following awk-script
{ if (FNR>=1) {
{ getline < "coord.xvg" ; c1=$2;c2=$3;c3=$4}
{ getline < "veloc.xvg" ; v1=$2;v2=$3;v3=$4}
{ getline < "force.xvg" ; f1=$2;f2=$3;f3=$4}
print c1,c2,c3,v1,v2,v3,f1,f2,f3
print $1, c1+v1+f1, c2+v2+f2, c3+v3+f3
}}
reads a line from each of the files and puts the data in variables
as can be seen here
$ awk -f s.awk coord.xvg
1 2 3 19 20 21 10 11 12
Time 30 33 36
4 5 6 22 23 24 13 14 15
Time 39 42 45
7 8 9 25 26 27 16 17 18
Time 48 51 54
The if (FNR>=1) part controls which lines are displayed. Counting starts at 1, change this to match your need. The actual calculations I leave to you :-)
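As a possible way to finish the calculation, here is a sketch that joins the three files line by line with paste and evaluates 2*v*v+c*f*c for each of the three data columns. It assumes all files have the same number of lines and four whitespace-separated columns; the start variable is a made-up name for the first line to process, and the sample files mirror the test data above.

```shell
dir=$(mktemp -d)
printf 'Time 1 2 3\nTime 4 5 6\nTime 7 8 9\n'          > "$dir/coord.xvg"
printf 'Time 19 20 21\nTime 22 23 24\nTime 25 26 27\n' > "$dir/veloc.xvg"
printf 'Time 10 11 12\nTime 13 14 15\nTime 16 17 18\n' > "$dir/force.xvg"

# After paste, fields 1-4 come from coord, 5-8 from veloc, 9-12 from force.
out=$(paste "$dir/coord.xvg" "$dir/veloc.xvg" "$dir/force.xvg" |
      awk -v start=1 'NR >= start {
          printf "%s", $1                       # time column from coord.xvg
          for (i = 2; i <= 4; i++) {
              c = $i; v = $(i + 4); f = $(i + 8)
              printf " %g", 2*v*v + c*f*c       # the requested operation
          }
          print ""
      }')

printf '%s\n' "$out"
rm -rf "$dir"
```

Setting start to 90307 would skip the earlier lines, as in the question. Note that %g switches to exponent notation for large values, so a different format may suit real data better.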