How to calculate the difference between consecutive lines with awk?

I want awk to append to each line the difference between its value and the value on the previous line.
Can anyone help me with this?
cat x.txt
a 100
b 102
c 110
Desired awk output:
a 100
b 102 2
c 110 8

Try:
awk 'NR>1{$0=$0" "$2-v}{v=$2;print $0}' x.txt
Output:
a 100
b 102 2
c 110 8
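The one-liner above keeps the previous line's second column in a variable and appends the difference from the second line onward. A spelled-out sketch of the same idea follows; the function name diff_prev and the variable name prev are chosen here for illustration and are not part of the original answer.

```shell
# Hypothetical helper: prints each line and, from the second line on,
# appends (column 2) minus (the previous line's column 2).
diff_prev() {
    awk '
        NR > 1 { $0 = $0 " " ($2 - prev) }  # append difference to the record
        { prev = $2; print }                # remember value for the next line
    ' "$@"
}

# Sample data taken from the question.
result=$(printf 'a 100\nb 102\nc 110\n' | diff_prev)
printf '%s\n' "$result"
```

Called without arguments the helper reads standard input; with a file name, as in diff_prev x.txt, it reads that file.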

Related

Add a new column between the first and second columns with values that are numbers from 1 to 5, repeated as needed to reach total line number

I have a txt file with 2 columns, and would want to add a new column between the two that has values ranging from 1 to 5, and repeated as many times as needed to have the same rows as the other columns. I'm trying to use AWK but I'm open to other suggestions
Example Input
A 100
A 200
A 300
A 400
A 500
B 1000
B 2000
B 3000
B 4000
B 5000
Example output
A 1 100
A 2 200
A 3 300
A 4 400
A 5 500
B 1 1000
B 2 2000
B 3 3000
B 4 4000
B 5 5000
Right now I'm trying
awk 'BEGIN{FS=OFS="\t"}{for (i = 1; i <= 5; ++i) $2 =++i OFS $2}1' $my_data
But it is clearly not working.
A simpler awk:
awk '{print $1, ++cnt[$1], $2}' file
A 1 100
A 2 200
A 3 300
A 4 400
A 5 500
B 1 1000
B 2 2000
B 3 3000
B 4 4000
B 5 5000
With modulo-operator %:
awk '{print $1, (NR-1)%5+1, $2}' file
Output:
A 1 100
A 2 200
A 3 300
A 4 400
A 5 500
B 1 1000
B 2 2000
B 3 3000
B 4 4000
B 5 5000
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
awk '{print $1,a[$1]+=1,$2}' file
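The per-key counter and the modulo approach give the same result on the sample, but they are not equivalent. A small sketch of where they diverge, assuming a made-up input (not from the question) whose groups do not have exactly five rows:

```shell
# Made-up input: group A has 2 rows, group B has 3.
input=$(printf 'A 100\nA 200\nB 1000\nB 2000\nB 3000\n')

# Per-key counter: restarts at 1 whenever a new value appears in column 1.
by_key=$(printf '%s\n' "$input" | awk '{print $1, ++cnt[$1], $2}')

# Modulo on the line number: cycles 1..5 regardless of column 1.
by_line=$(printf '%s\n' "$input" | awk '{print $1, (NR-1)%5+1, $2}')

printf 'per key:\n%s\n\nper line:\n%s\n' "$by_key" "$by_line"
```

The counter follows the values in column 1, while the modulo version follows the line number only, so the choice depends on whether the numbering should restart per group or strictly every five lines.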

Replace nth and (n+1)th values in one file with same values from another file

i have two files:
f1.txt:
header 1
header 2
100
100
100
100
100
100
100
100
100
100
100
100
100
f2.txt:
header 1
header 2
10
1234
5678
10
10
2345
6789
10
10
3456
7890
10
10
desired output
f3.txt:
header 1
header 2
100
1234
5678
100
100
2345
6789
100
100
3456
7890
100
100
The values in f2.txt that occur in lines 4 & 5, then 8 & 9, then 12 & 13 (i.e., I thought they were spaced every 6th row) should replace the corresponding rows in f1.txt. How can I do this?
So far, I have only been able to print these values out of f2.txt as such:
exec<f2.txt
var=$(awk 'NR % 6 == 4')
echo "$var"
this produces
1234
2345
3456
Then when I change 4 to 5, it gives me the 2nd set of values. So I am trying to learn how to extract the 2 sets of values and then put them in f1.txt. Any help will be greatly appreciated. Thanks!
Try:
paste f1.txt f2.txt | awk -F'\t' '
NR < 3 || (NR-2)%4 == 1 || (NR-2)%4 == 0 {print $1; next}
{print $2}
'
Your desired output does not indicate groups of 6 lines, but instead groups of 4 lines. Perhaps the 2 header lines are throwing you off.
I'm assuming your input files do not contain tabs.
More concise awk from Ed Morton (it reads the same paste output):
paste f1.txt f2.txt | awk -F'\t' '{print ((NR-2)%4 < 2 ? $1 : $2)}'
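The same selection rule can also be applied without paste by letting awk read both files. This is only a sketch: the array name f2 and the shortened sample files (2 header lines plus 6 data lines) are assumptions made here, and the header lines are tested explicitly rather than relying on the remainder of a negative number.

```shell
tmp=$(mktemp -d)
# Shortened versions of the files from the question.
printf 'header 1\nheader 2\n100\n100\n100\n100\n100\n100\n' > "$tmp/f1.txt"
printf 'header 1\nheader 2\n10\n1234\n5678\n10\n10\n2345\n' > "$tmp/f2.txt"

merged=$(awk '
    NR == FNR { f2[FNR] = $0; next }   # first file: remember f2 lines by line number
    # second file: keep headers and 2 of every 4 data lines, else substitute f2
    { print ((FNR <= 2 || (FNR-2)%4 < 2) ? $0 : f2[FNR]) }
' "$tmp/f2.txt" "$tmp/f1.txt")
printf '%s\n' "$merged"
rm -r "$tmp"
```

Unlike the paste pipeline, this holds all of f2.txt in memory, and it does not depend on the files being free of tabs.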

Retrieving lines comparing columns from two files

I'm trying to compare 2 tables and retrieve the matches based on two columns:
file 1
0.736 5 100 T
0.723 1 15 T
0.792 6 100 T
0.634 3 100 T
0.754 7 100 T
0.708 2 100 T
0.722 9 100 T
0.542 1 6 T
File 2
0.736 5
0.634 3
0.542 1
output
0.736 5 100 T
0.634 3 100 T
0.542 1 6 T
When I try this code it tells me that awk is not found, which doesn't make sense because I use awk regularly. Could you help me spot the error here, please?
awk 'FNR==NR{a[$1,$2]=$0;next}{if(b=a[$1,$2]){print b}}' file1 file2> output
You could use grep, treating each line of file2 as a fixed string:
grep -F -f file2 file1
or awk, keyed on the first two columns:
awk 'NR==FNR{A[$1,$2];next}($1,$2) in A' file2 file1
Hope this helps :)
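To see why it matters which columns form the key, here is a sketch with one made-up decoy row added to a shortened file1. The decoy, the array name want, and the temporary file paths are assumptions for illustration only.

```shell
tmp=$(mktemp -d)
# Rows from the question plus one made-up decoy: "0.542 15 100 T".
printf '0.736 5 100 T\n0.723 1 15 T\n0.634 3 100 T\n0.542 15 100 T\n0.542 1 6 T\n' > "$tmp/file1"
printf '0.736 5\n0.634 3\n0.542 1\n' > "$tmp/file2"

# Exact match on both columns: the decoy is rejected because 15 != 1.
by_key=$(awk 'NR == FNR { want[$1, $2]; next } ($1, $2) in want' "$tmp/file2" "$tmp/file1")

# Fixed-string substring match: the decoy contains "0.542 1", so it is kept.
by_grep=$(grep -F -f "$tmp/file2" "$tmp/file1")

printf 'awk:\n%s\n\ngrep:\n%s\n' "$by_key" "$by_grep"
rm -r "$tmp"
```

The awk version compares whole fields, so it is the safer choice when values in one row can be prefixes of values in another.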

Sorting a column based on the value of another

I have a file with the following structure:
Input
1 30923 2 300 G:0.503333 T:0.496667 T
1 51476 2 300 T:0.986667 C:0.0133333 C
1 51479 2 300 T:0.966667 A:0.0333333 T
What I would like to do is to change the position of the fifth and sixth column in a way that one column gets the order identical as of the seventh column. You can see in the example. In the seventh column, we have T, C, T and after the change, the sixth column from T, C, A has changed into T, C, T in the output, that is in the third line, the position of the fifth and sixth columns have switched when compared to the seventh column.
Output
1 30923 2 300 G:0.503333 T:0.496667 T
1 51476 2 300 T:0.986667 C:0.0133333 C
1 51479 2 300 A:0.0333333 T:0.966667 T
I hope I have explained this clearly. I have not been able to find a solution; could you please give me a hint as to how to do this?
Thank you in advance.
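This splits each line on spaces and colons, swaps the two allele/frequency pairs where needed, and aligns the columns with column -t. Note that the colons are dropped from the output.
awk -F'[ :]+' '{if ($7 == $9) print $1,$2,$3,$4,$5,$6,$7,$8,$9; else print $1,$2,$3,$4,$7,$8,$5,$6,$9}' input.txt | column -t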
Output:
1 30923 2 300 G 0.503333 T 0.496667 T
1 51476 2 300 T 0.986667 C 0.0133333 C
1 51479 2 300 A 0.0333333 T 0.966667 T
If I understand correctly, maybe this will work for you?
: file a.awk
substr($6,1,1) == $7 { print }
substr($6,1,1) != $7 { print $1, $2, $3, $4, $6, $5, $7 }
: file a.txt
1 30923 2 300 G:0.503333 T:0.496667 T
1 51476 2 300 T:0.986667 C:0.0133333 C
1 51479 2 300 T:0.966667 A:0.0333333 T
bash-3.2$ awk -f a.awk a.txt
1 30923 2 300 G:0.503333 T:0.496667 T
1 51476 2 300 T:0.986667 C:0.0133333 C
1 51479 2 300 A:0.0333333 T:0.966667 T
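A variant sketch that keeps the colons and compares the whole allele name rather than only its first character. It assumes that exactly one of columns 5 and 6 carries the allele named in column 7; the array name left and the variable tmp are chosen here for illustration.

```shell
swapped=$(printf '%s\n' \
    '1 30923 2 300 G:0.503333 T:0.496667 T' \
    '1 51476 2 300 T:0.986667 C:0.0133333 C' \
    '1 51479 2 300 T:0.966667 A:0.0333333 T' |
awk '{
    split($5, left, ":")        # left[1] is the allele in column 5
    if (left[1] == $7) {        # match sits in column 5: swap columns 5 and 6
        tmp = $5; $5 = $6; $6 = tmp
    }
    print
}')
printf '%s\n' "$swapped"
```

Only the third sample line is changed, because it is the only one where the allele in column 5 equals column 7.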

Join multiple files in gawk

I have a large number of files (around 500). Each file contains two columns. The first column is the same in every file. I want to join all the files into a single file using gawk.
For example,
File 1
a 123
b 221
c 904
File 2
a 298
b 230
c 102
and so on. I want a final file like the one below:
Final file
a 123 298
b 221 230
c 904 102
I have found scripts that can join two files, but I need to join multiple files.
For given sample files:
$ head f*
==> f1 <==
a 123
b 221
c 904
==> f2 <==
a 298
b 230
c 102
==> f3 <==
a 500
b 600
c 700
Method 1:
$ awk '{a[FNR]=((a[FNR])?a[FNR]FS$2:$0)}END{for(i=1;i<=FNR;i++) print a[i]}' f*
a 123 298 500
b 221 230 600
c 904 102 700
Method 2: (Will probably be faster, as you are not loading 500 files into memory)
Using paste and awk together. (Assuming first column is same and present in all files). Doing paste f* will give you the following result:
$ paste f*
a 123 a 298 a 500
b 221 b 230 b 600
c 904 c 102 c 700
Pipe that to awk to remove extra columns.
$ paste f* | awk '{printf "%s ",$1;for(i=2;i<=NF;i+=2) printf "%s%s",$i,(i==NF?RS:FS)}'
a 123 298 500
b 221 230 600
c 904 102 700
You can re-direct the output to another file.
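Both methods above rely on every file listing the keys in the same line order. A sketch that joins on the value of the first column instead, while keeping the keys in first-seen order, could look like this; the array names row and order and the temporary file paths are assumptions made for the example.

```shell
tmp=$(mktemp -d)
printf 'a 123\nb 221\nc 904\n' > "$tmp/f1"
printf 'a 298\nb 230\nc 102\n' > "$tmp/f2"
printf 'a 500\nb 600\nc 700\n' > "$tmp/f3"

joined=$(awk '
    !($1 in row) { order[++n] = $1; row[$1] = $1 }   # first sighting: record key order
    { row[$1] = row[$1] OFS $2 }                     # append the value from this file
    END { for (i = 1; i <= n; i++) print row[order[i]] }
' "$tmp/f1" "$tmp/f2" "$tmp/f3")
printf '%s\n' "$joined"
rm -r "$tmp"
```

With a glob such as f* the shell sorts names lexically (f1, f10, f100, ...), so the value columns appear in that order unless the files are listed explicitly. A key missing from one file simply gets fewer values, which shifts its columns.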
I have encountered this problem very frequently.
I strongly encourage you to look into the getline function in gawk.
getline var < filename
is the command syntax and can be used to solve your problem.
I would suggest utilizing another language that solves this problem much more easily. Typically I invest about 5 lines of code to solve this standard problem.
BEGIN {
    # getline returns 1 for each line read, 0 at end of file, -1 on error
    while ((getline x < "filename") > 0) {
        # ... (commands involving x, such as split and print)
    }
    close("filename")
}
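As an illustration of how getline can drive a join, the following sketch reads a second file in step with the main input. It handles only two files and assumes both list the keys in the same order; the variable names other, line, and parts are chosen here and are not from the original answer.

```shell
tmp=$(mktemp -d)
printf 'a 123\nb 221\nc 904\n' > "$tmp/f1"
printf 'a 298\nb 230\nc 102\n' > "$tmp/f2"

# For every line of f1, read the next line of f2 with getline and
# append its second column.
stepped=$(awk -v other="$tmp/f2" '
    {
        if ((getline line < other) > 0) {   # 1 = got a line, 0 = EOF, -1 = error
            split(line, parts, " ")
            print $0, parts[2]
        } else {
            print                           # f2 ran out: keep the f1 line as is
        }
    }
' "$tmp/f1")
printf '%s\n' "$stepped"
rm -r "$tmp"
```

Reading into a variable with getline leaves $0 and NR untouched, which is why the current f1 line can still be printed afterwards.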
You could try something like:
$ ls
f1.txt f2.txt f3.txt
$ awk '($0 !~ /#/){a[$1]=a[$1]" "$2} END {for(i in a){print i""a[i]}}' *.txt
a 123 298 299
b 221 230 231
c 904 102 103
awk 'FNR==NR{arr[$1]=$2; next;}{printf "%s%s%s%s%s",$1,OFS,arr[$1],OFS,$2; print"";}' file1 file2
based on this