awk - writing values from file

How do I write an awk script that prints 10 specific values from the input file, please? Thank you.
I tried this:
BEGIN
$2 == 0 && $3 == 2 { print $7}
$2 == 0 && $3 == 4 { print $7}
$2 == 0 && $3 == 5 { print $7}
$2 == 2 && $3 == 2 { print $7}
$2 == 2 && $3 == 4 { print $7}
$2 == 2 && $3 == 5 { print $7}
$2 == 3 && $3 == 2 { print $7}
$2 == 3 && $3 == 4 { print $7}
$2 == 3 && $3 == 5 { print $7}
$1 == "achil" { print $3}
Should I write everything on one line?
When do I need to write BEGIN in the code, and when not?
The input file is:
achil 1 197524.72437205614 197524.72437205614 0.43066284286002637
o 0 1 0 1 1 5.732821000
o 0 2 0 1 1 54002.804084586
o 0 3 0 1 1 0.088300000
o 0 4 0 1 1 150.924210421
o 0 5 0 1 1 108.520740945
o 0 6 0 1 1 0.380000000
o 0 7 0 1 1 0.004220000
o 0 8 0 1 1 0.000000000
o 0 9 0 1 1 0.000000000
o 0 10 0 1 1 0.000000000
o 0 11 0 1 1 0.000000000
o 2 1 0 1 1 73413.000000000
o 2 2 0 1 1 36176.166543543
o 2 3 0 1 1 0.560000000
o 2 4 0 1 1 229.480202654
o 2 5 0 1 1 7.032947038
o 2 6 0 1 1 0.480000000
o 2 7 0 1 1 0.000000000
o 2 8 0 1 1 0.000000000
o 2 9 0 1 1 0.000000000
o 2 10 0 1 1 0.000000000
o 2 11 0 1 1 0.000000000
o 3 1 0 1 1 365.256360000
o 3 2 0 1 1 51550.294729034
o 3 3 0 1 1 0.016710220
o 3 4 0 1 1 299.430719769
o 3 5 0 1 1 0.001070537
o 3 6 0 1 1 0.000036626
o 3 7 0 1 1 0.000009111
o 3 8 0 1 1 0.000000000
o 3 9 0 1 1 0.000000000
o 3 10 0 1 1 0.000000000
o 3 11 0 1 1 0.000000000
I would like an output consisting of these 10 numbers:
197524.72437205614
54002.804084586
150.924210421
108.520740945
36176.166543543
229.480202654
7.032947038
51550.294729034
299.430719769
0.001070537

The script you have is nearly right, but you don't need the BEGIN clause here; in fact, a bare BEGIN with no { action } after it is a syntax error. Actions in a BEGIN block run once, before any input line is read, so BEGIN is only needed for work that must happen first, such as setting FS/OFS or printing a title. For example, to print a title for your output you could write
BEGIN { print "my-title-string-in-double quotes" }
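To see the difference in practice, here is a small sketch (the helper name with_header and the title text are just for illustration) showing that BEGIN output appears even when there is no input at all, while END runs only after every line has been read:

```shell
# Sketch: BEGIN runs once before the first input line is read,
# END runs once after the last line (NR then holds the total line count).
with_header() {
  awk '
    BEGIN { print "value" }      # title line, printed before any input is read
          { print $1 }           # one output line per input record
    END   { print NR " rows" }   # footer, printed after all input is consumed
  '
}
```

For example, printf '' | with_header prints only the title line followed by "0 rows".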
Writing the rules on one line or across several lines is purely a matter of style, and what you have is neat and readable. So all you need to do is create an awk script with this content:
#!/usr/bin/awk -f
$2 == 0 && $3 == 2 { print $7}
$2 == 0 && $3 == 4 { print $7}
$2 == 0 && $3 == 5 { print $7}
$2 == 2 && $3 == 2 { print $7}
$2 == 2 && $3 == 4 { print $7}
$2 == 2 && $3 == 5 { print $7}
$2 == 3 && $3 == 2 { print $7}
$2 == 3 && $3 == 4 { print $7}
$2 == 3 && $3 == 5 { print $7}
$1 == "achil" { print $3}
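Add execute permission to it:
chmod +x script.awk
and run it directly as
./script.awk input-file
or, without needing the execute bit, as
awk -f script.awk input-file
That said, your conditions could also be written more compactly with the regex match operator ~ (next skips the remaining rules once a line has matched):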
$2 ~ /^(0|2|3)$/ && $3 ~ /^(2|4|5)$/ { print $7; next } $1 == "achil" { print $3 }
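If the list of wanted (column 2, column 3) pairs grows, a lookup table built in a BEGIN block keeps the script short, and it is also a natural case where BEGIN is needed: setup that must happen before the first line is read. A sketch; the helper name pick_values and the "col2:col3" key encoding are illustrative choices, not part of the original answer:

```shell
# Sketch: print $3 of the "achil" line and $7 of every line whose
# (column 2, column 3) pair is in a hypothetical "wanted" list.
pick_values() {
  awk '
    BEGIN {
      # Build the set of wanted pairs once, before any input is read.
      n = split("0:2 0:4 0:5 2:2 2:4 2:5 3:2 3:4 3:5", pairs, " ")
      for (i = 1; i <= n; i++) want[pairs[i]] = 1
    }
    $1 == "achil"       { print $3; next }
    ($2 ":" $3) in want { print $7 }
  ' "$@"
}
```

Running pick_values input-file prints the ten numbers in the order their lines appear in the file.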

Related

Using awk to count number of row group

I have a data set: (file.txt)
X Y
1 a
2 b
3 c
10 d
11 e
12 f
15 g
20 h
25 i
30 j
35 k
40 l
41 m
42 n
43 o
46 p
I want to add two columns, Up10 and Down10:
Up10: the number of rows whose X lies between X-10 and X (inclusive).
Down10: the number of rows whose X lies between X and X+10 (inclusive).
For example:
X Y Up10 Down10
35 k 3 5
For Up10 (range 25 to 35): X=35, X=30, X=25, total = 3 rows
For Down10 (range 35 to 45): X=35, X=40, X=41, X=42, X=43, total = 5 rows
Desired Output:
X Y Up10 Down10
1 a 1 5
2 b 2 5
3 c 3 4
10 d 4 5
11 e 5 4
12 f 5 3
15 g 4 3
20 h 5 3
25 i 3 3
30 j 3 3
35 k 3 5
40 l 3 5
41 m 3 4
42 n 4 3
43 o 5 2
46 p 5 1
This is Pierre François' solution (thanks again, Pierre François):
awk '
BEGIN{OFS="\t"; print "X\tY\tUp10\tDown10"}
(NR == FNR) && (FNR > 1){a[$1] = $1 + 0}
(NR > FNR) && (FNR > 1){
up = 0; upl = $1 - 10
down = 0; downl = $1 + 10
for (i in a) { i += 0 # tricky: convert i to integer
if ((i >= upl) && (i <= $1)) {up++}
if ((i >= $1) && (i <= downl)) {down++}
}
print $1, $2, up, down;
}
' file.txt file.txt > file-2.txt
But when I run this command on 13 GB of data, it takes far too long.
I also tried this approach on the same 13 GB of data:
awk 'BEGIN{ FS=OFS="\t" }
NR==FNR{a[NR]=$1;next} {x=y=FNR;while(--x in a&&$1-10<a[x]){} while(++y in a&&$1+10>a[y]){} print $0,FNR-x,y-FNR}
' file.txt file.txt > file-2.txt
When file-2.txt reaches 1.1 GB, the command appears to freeze. I have waited several hours, but it never finishes and the output file is never completed.
Note: I am working on Google Cloud, machine type e2-highmem-8 (8 vCPUs, 64 GB memory).
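A single-pass awk that keeps a sliding window of the last 10 records and uses it to count the ups and downs. Because it only looks at the last 10 records, it assumes X is sorted in ascending order and that no range of width 10 (inclusive) ever contains more than 10 rows. For symmetry's sake there should also be deletes in the END block, but a few extra array elements in memory won't make a difference: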
$ awk '
BEGIN {
FS=OFS="\t"
}
NR==1 {
print $1,$2,"Up10","Down10"
}
NR>1 {
a[NR]=$1
b[NR]=$2
for(i=NR-9;i<=NR;i++) {
if(a[i]>=a[NR]-10&&i>=2)
up[NR]++
if(a[i]<=a[NR-9]+10&&i>=2)
down[NR-9]++
}
}
NR>10 {
print a[NR-9],b[NR-9],up[NR-9],down[NR-9]
delete a[NR-9]
delete b[NR-9]
delete up[NR-9]
delete down[NR-9]
}
END {
for(nr=NR+1;nr<=NR+9;nr++) {
for(i=nr-9;i<=nr;i++)
if(a[i]<=a[nr-9]+10&&i>=2&&i<=NR)
down[nr-9]++
print a[nr-9],b[nr-9],up[nr-9],down[nr-9]
}
}' file
Output:
X Y Up10 Down10
1 a 1 5
2 b 2 5
...
35 k 3 5
...
43 o 5 2
46 p 5 1
Another single-pass approach with a sliding window:
awk '
NR == 1 { print $0, "Up10", "Down10"; next } # print the header with the two new column names
NR == 2 { min = max = cur = 1; X[cur] = $1; Y[cur] = $2; next }
{ X[++max] = $1; Y[max] = $2
if (X[cur] >= $1 - 10) next
for (; X[cur] + 10 < X[max]; ++cur) {
for (; X[min] < X[cur] - 10; ++min) {
delete X[min]
delete Y[min]
}
print X[cur], Y[cur], cur - min + 1, max - cur
}
}
END {
for (; cur <= max; ++cur) {
for (; X[min] < X[cur] - 10; ++min);
for (i = max; i > cur && X[cur] + 10 < X[i]; --i);
print X[cur], Y[cur], cur - min + 1, i - cur + 1
}
}
' file
The script assumes the X column is ordered numerically.
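Since both streaming answers depend on X being sorted, it may be worth checking that before running them on a 13 GB file. A minimal sketch, assuming the same layout as the question (one header line, X in column 1); check_sorted is a hypothetical helper name:

```shell
# Sketch: exit 0 if column 1 (after the header line) never decreases,
# otherwise report the first offending line on stderr and exit 1.
check_sorted() {
  awk '
    NR == 1 { next }                          # skip the header line
    NR > 2 && $1 + 0 < prev {                 # numeric comparison with the previous X
      printf "line %d: %s < %s\n", NR, $1, prev > "/dev/stderr"
      exit 1
    }
    { prev = $1 + 0 }
  ' "$@"
}
```

If it reports a problem, the data would need sorting (for example with sort -n -k1,1, keeping the header line aside) before either streaming script gives correct counts.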

Using awk, how to average numbers in column between two strings in a text file

I have a text file with multiple tab-delimited columns, grouped into blocks that each start with a header line; an example is below.
Code 1 (3)
5 10 7 1 1
6 10 9 1 1
7 10 10 1 1
Code 2 (2)
9 11 3 1 3
10 8 5 2 1
Code 3 (1)
12 10 2 1 1
Code 4 (2)
14 8 1 1 3
15 8 7 5 1
I would like to average the numbers in the third column for each code block. The example below is what the output should look like.
8.67
4
2
4
Attempt 1
awk '$3~/^[[:digit:]]/ {i++; sum+=$3; print $3} $3!~/[[:digit:]]/ {print sum/i; sum=0;i=0}' in.txt
Returned fatal: division by zero attempted.
Attempt 2
awk -v OFS='\t' '/^Code/ { if (NR > 1) {i++; sum+=$3;} {print sum/i;}}' in.txt
Returned another division by zero error.
Attempt 3
awk -v OFS='\t' '/^Code/ { if (NR > 1) { print s/i; s=0; i=0; } else { s += $3; i += 1; }}' in.txt
Returned 1 value: 0.
Attempt 4
awk -v OFS='\t' '/^Code/ {
if (NR > 1)
i++
print sum += $3/i
}
END {
i++
print sum += $3/i
}'
Returned:
0
0
0
0.3
I am not sure where that last number is coming from, but this has been the closest solution so far. I am getting a number for each block, but not the average.
Could you please try the following?
awk '
BEGIN{ OFMT="%.2f" }   # print non-integer averages with 2 decimals, e.g. 8.67
/^Code/{
if(value!=0 && value){
print sum/value
}
sum=value=""
next
}
{
sum+=$3;
value++
}
END{
if(value!=0 && value){
print sum/value
}
}
' Input_file
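The same idea can be packaged as a reusable shell function with the column passed in via -v. A sketch; avg_blocks is a hypothetical name, and it defaults to column 3 because that is what the question asks about. Setting OFMT makes non-integer averages print as 8.67 while whole numbers still print as 4, because print only applies OFMT to non-integer values:

```shell
# Sketch: average one column (default 3) of every "Code ..." block on stdin.
avg_blocks() {
  awk -v col="${1:-3}" '
    function emit_avg() { if (n) print sum / n; sum = n = 0 }
    BEGIN   { OFMT = "%.2f" }   # non-integer results print with 2 decimals
    /^Code/ { emit_avg(); next } # a header line closes the previous block
            { sum += $col; n++ }
    END     { emit_avg() }       # the last block has no header after it
  '
}
```

For example, avg_blocks 3 < in.txt prints 8.67, 4, 2 and 4 for the sample above.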

replacing associative array indexes with their value using awk or sed

I would like to replace the values in the first column of ref using the key/value pairs from id (key in column 2, value in column 3).
cat id:
[1] a 8-23
[2] g 8-21
[3] d 8-13
cat ref:
a 1 2
b 3 4
c 5 3
d 1 2
e 3 1
f 1 2
g 2 3
desired output
8-23 1 2
b 3 4
c 5 3
8-13 1 2
e 3 1
f 1 2
8-21 2 3
I assume it would be best done using awk.
cat replace.awk
BEGIN { OFS="t" }
NR==FNR {
a[$2]=$3; next
}
$1 in !{!a[#]} {
print $0
}
I'm not sure what I need to change.
$1 in !{!a[#]} is not awk syntax. You just need $1 in a:
BEGIN { OFS = "\t" }
NR==FNR {
a[$2] = $3
next
}
{
$1 = ($1 in a) ? a[$1] : $1
print
}
This version always assigns to $1, which forces awk to rebuild the line using OFS.
print uses $0 if unspecified
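One caveat with the NR==FNR idiom: if the first file is empty, NR==FNR stays true while the second file is read, so every ref line would be swallowed into the table and nothing would be printed. A sketch that tests FILENAME against ARGV[1] instead; replace_keys is a hypothetical helper name, and it assumes the two file names differ:

```shell
# Sketch: replace column 1 of REF_FILE using a table read from ID_FILE
# (key in column 2, replacement value in column 3), as in the question.
# FILENAME == ARGV[1] is used instead of NR == FNR so that an empty ID_FILE
# does not cause REF_FILE lines to be treated as table entries.
replace_keys() {
  awk '
    BEGIN { OFS = "\t" }
    FILENAME == ARGV[1] { a[$2] = $3; next }   # first file: build the lookup table
    { $1 = ($1 in a) ? a[$1] : $1; print }     # second file: substitute known keys
  ' "$1" "$2"
}
```

Usage: replace_keys id ref.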

Remove data using AWK

I have a file of values that I wish to plot using gnuplot. The problem is that there are some values that I wish to remove.
Here is an example of my data:
1 52
2 3
3 0
4 4
5 1
6 1
7 1
8 0
9 0
I want to remove any row in which the right column is 0, so the data above would end up looking like this:
1 52
2 3
4 4
5 1
6 1
7 1
Let's just check field 2:
awk '$2' file
If the second field has a true value (that is, it is neither 0 nor empty), the condition is true. In that case awk performs its default action, print $0, which prints the current line.
More explicit versions (same result for this data):
awk '$2 == 0 { next; } { print; }'
awk '{ if ($2 == 0) { next; } else { print; } }'
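A quick check of the awk '$2' one-liner on a few rows from the question plus one extra row. The 0.0 row is an assumption added for illustration: a field that looks numeric is tested as a number, so 0.0 counts as false just like 0. keep_nonzero is a hypothetical helper name:

```shell
# Sketch: print only rows whose second field is true, i.e. not empty
# and not numerically zero ("0" and "0.0" are both dropped).
keep_nonzero() {
  awk '$2' "$@"
}
```

Usage: keep_nonzero file > filtered-file, then plot filtered-file with gnuplot.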

Awk Conditional Test Statement

I would really appreciate some help. I spent almost the whole morning on it.
I have data with 16 fields, structured as follows:
4572 1307084940 RDCSWE 2006 1 5 0.28125 0.5 0.125 0.09375 0 0 0 0 0 0
4573 1307101627 RDCSWE 2006 1 5 0.6875 0.125 0.1875 0 0 0 0 0 0 0
4574 1307101642 RDCSWE 2006 1 5 0.5625 0.25 0.03125 0.15625 0 0 0 0 0 0
4575 1307101662 RDCSWE 2006 1 5 0.53125 0.25 0.1875 0.03125 0 0 0 0 0 0
4576 1307127329 RDCSWE 2006 1 5 0.4375 0.34375 0.09375 0.125 0 0 0 0 0 0
For fields 7 to 10 (values ranging from 0 to 1), I need to test both the values and the field number.
That is, for every record, find which of fields 7-10 holds the maximum value:
if it is in field 7, print $0, $6-4
if it is in field 8, print $0, $6-3
if it is in field 9, print $0, $6-2
if it is in field 10, print $0, $6-1
I'll be so grateful for the help. Thank you in advance
Edit (by belisarius)
Just transcribing a comment from Tumi2002 (the author):
Sorry, my 6th field (i.e. $6) has values 1-5.
I am trying to reclassify records where field 6 is 5 back into classes 1-4 (in the same field).
So that instead of 5 classes I have 4.
Awk '$6==5
{for i=7; i<11; i++)
if ($i==max) && NF==7) print $0,$6-4;
if ($i==max) && NF==8) print $0,$6-3;
if ($i==max) && NF==9) print $0,$6-2;
if ($i==max) && NF==10) print $0,$6-1
I am struggling with the syntax in awk
{
max=0; maxindex=0;
for (i=7; i<=10; i++)
{
if ($i>max){
maxindex=i;
max=$i;
# print i;
}
}
if (maxindex > 0){
print $6-11+maxindex;
}
}
Running at ideone
Output for your example data:
2
1
1
1
1
Edit
Modified in response to your comment:
($6 == 5){
max=0; maxindex=0;
for (i=7; i<=10; i++)
{
if ($i>max){
maxindex=i;
max=$i;
# print i;
}
}
if (maxindex > 0){
print $0,"-->",$6-11+maxindex;
}
}
Output:
4572 1307084940 RDCSWE 2006 1 5 0.28125 0.5 0.125 0.09375 0 0 0 0 0 0 --> 2
4573 1307101627 RDCSWE 2006 1 5 0.6875 0.125 0.1875 0 0 0 0 0 0 0 --> 1
4574 1307101642 RDCSWE 2006 1 5 0.5625 0.25 0.03125 0.15625 0 0 0 0 0 0 --> 1
4575 1307101662 RDCSWE 2006 1 5 0.53125 0.25 0.1875 0.03125 0 0 0 0 0 0 --> 1
4576 1307127329 RDCSWE 2006 1 5 0.4375 0.34375 0.09375 0.125 0 0 0 0 0 0 --> 1
Running at ideone here
First of all, thanks to belisarius for pointing me to ideone.
My (updated) solution is working correctly now:
# max value in an array, copied verbatim from the gawk manual (credit)
function maxelt(vec, i, ret)
{
for (i in vec) {
if (ret == "" || vec[i] > ret)
ret = vec[i]
}
return ret
}
# Load fields 7-10 of each record into nums.
{
delete nums
for(i = 7; i <= 10; i++)
{ nums[NR, i] = $i }
m = maxelt(nums)   # compute the maximum once per record
### DEBUG print NR, m
if ( $7 == m ) { print $0, ($6-4) }
if ( $8 == m ) { print $0, ($6-3) }
if ( $9 == m ) { print $0, ($6-2) }
if ( $10 == m ) { print $0, ($6-1) }
}
HTH
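None of the snippets above writes the new class back into field 6 itself, which the author's comment asks for. A minimal sketch under that assumption; reclassify is a hypothetical helper name, ties go to the first (lowest-numbered) field, and rows whose class is not 5 pass through unchanged:

```shell
# Sketch: for rows with class 5 in field 6, overwrite field 6 with class 1-4,
# chosen by which of fields 7-10 holds the largest value (first one wins on ties).
reclassify() {
  awk '
    $6 == 5 {
      maxindex = 7
      for (i = 8; i <= 10; i++)
        if ($i > $maxindex) maxindex = i   # numeric comparison of numeric-looking fields
      $6 = maxindex - 6                    # field 7 -> class 1, ..., field 10 -> class 4
    }
    { print }
  ' "$@"
}
```

This gives the same class as $6-11+maxindex from the earlier answer when $6 is 5, but stores it in place instead of appending it.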