How to loop awk command over row values - awk

I would like to use awk to search for a particular word in the first column of a table and print the value in the 6th column. I understand how to do this searching one word at a time using something along the lines of:
awk '$1 == "<insert-word>" { print $6 }' file.txt
But I was wondering if it is possible to loop this over a list of words in a row?
For example, if I had a table like file1.txt below:
cat file1.txt
dna1 dna4 dna5
dna3 dna6 dna2
dna7 dna8 dna9
Could I loop over each value in row 1 and search for this word in column 1 of file2.txt below, each time printing the value of column 6? Then do this for row 2, 3 and so on...
cat file2
dna1 0 229 7 0 4 0 0
dna2 0 296 39 2 1 3 100
dna3 0 255 15 0 6 0 0
dna4 0 209 3 0 0 0 0
dna5 0 253 14 2 3 7 100
dna6 0 897 629 7 8 1 100
dna7 0 214 4 0 9 0 0
dna8 0 255 15 0 2 0 0
dna9 0 606 338 8 3 1 100
So looping the awk command over row 1 of file1 would return the numbers 4, 0 and 3.
Then looping the command over row 2 would return the numbers 6, 8 and 1.
And finally looping over row 3 would return the numbers 9, 2 and 3.
An example output might be
4 0 3
6 8 1
9 2 3
What I would really like to do is sum the numbers returned for each row. I just wasn't sure if this would be possible...
An example output of this would be
7
15
14
But I am not worried if this step isn't possible using awk as I could just do it separately
Hope this makes sense
Cheers
Ollie

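Yes, you can give awk multiple input files. For your example, with three columns in file1:
awk 'NR==FNR{a[$1]=a[$2]=a[$3]=1;next}a[$1]{print $6}' file1 file2
I didn't test the above one-liner, but it should work. At least you get the idea. Note that it prints the matching values one per line in file2 order, not grouped by row of file1.
If you don't know how many columns there are in your file1 then, as you said, you want a loop: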
awk 'NR==FNR{for(x=1;x<=NF;x++)a[$x]=1;next}a[$1]{print $6}' file1 file2
Update
Edit for the new requirement (summing the values for each row):
awk 'NR==FNR{a[$1]=$6;next}{for(i=1;i<=NF;i++)s+=a[$i];print s;s=0}' f2 f1
The output of the above one-liner (taking f1 and f2 to be your example file1 and file2):
7
15
14
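A commented sketch of the same two-file idea, in case the one-liner is hard to read. It prints both the per-row values and their sum. The function name row_values_and_sum, the scratch directory and the " = " separator are illustrative choices, not part of the answer above, and treating a word that is missing from file2 as 0 is an assumption.

```shell
# Recreate the two sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'dna1 dna4 dna5' 'dna3 dna6 dna2' 'dna7 dna8 dna9' > "$dir/file1"
printf '%s\n' \
  'dna1 0 229 7 0 4 0 0' \
  'dna2 0 296 39 2 1 3 100' \
  'dna3 0 255 15 0 6 0 0' \
  'dna4 0 209 3 0 0 0 0' \
  'dna5 0 253 14 2 3 7 100' \
  'dna6 0 897 629 7 8 1 100' \
  'dna7 0 214 4 0 9 0 0' \
  'dna8 0 255 15 0 2 0 0' \
  'dna9 0 606 338 8 3 1 100' > "$dir/file2"

# Hypothetical helper: $1 = lookup table (file2), $2 = rows of words (file1).
row_values_and_sum() {
  awk '
    NR == FNR { col6[$1] = $6; next }      # first file: remember column 6 per word
    {
      sum = 0; line = ""
      for (i = 1; i <= NF; i++) {          # second file: walk the words of this row
        v = ($i in col6) ? col6[$i] : 0    # assumption: unknown words count as 0
        line = line (i > 1 ? " " : "") v
        sum += v
      }
      print line " = " sum
    }
  ' "$1" "$2"
}

row_values_and_sum "$dir/file2" "$dir/file1"
```

The lookup table is read first, so file2 goes first on the command line, just as f2 precedes f1 in the one-liner.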

Related

Collapsing a column value into lines, copying values of a second column

I have a file with two columns (tab-separated):
In the first column I have the number of lines that I want to collapse, and in the second column is the number that I want to be pasted in each row (in a new file), based on the first column values.
File1:
col1 col2
365 1
6 1
142 1
99 0
223 0
11 1
So basically in the new file I want 365 lines with the number 1, followed by 6 lines of 1, 142 lines of 1, 99 lines of 0, 223 lines of 0 and 11 lines of 1...and so forth...
In total the new file should have 846 lines (which is the sum of the first column of File1).
Ideally an awk command should do the trick I guess. Any inputs on this would be really appreciated...
Thanks
I would use GNU AWK in the following way. As a contrived example to avoid overly long output, let file.txt be
col1 col2
5 1
3 0
5 1
then
awk 'NR>1{for(i=0;i<$1;i+=1)print $2}' file.txt
output
1
1
1
1
1
0
0
0
1
1
1
1
1
Explanation: I used a for statement to print the content of the 2nd column ($2) as many times as specified in the 1st column ($1), for every line after the 1st (NR>1 skips the header).
(tested in gawk 4.2.1)
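As a sanity check, the same command can be run on the File1 sample from the question and the number of output lines compared with the sum of column 1. This is only a sketch: the function name expand_counts and the scratch file names are made up for illustration.

```shell
# Tab-separated sample from the question, written to a scratch directory.
dir=$(mktemp -d)
printf 'col1\tcol2\n365\t1\n6\t1\n142\t1\n99\t0\n223\t0\n11\t1\n' > "$dir/File1"

# Hypothetical helper wrapping the one-liner from the answer.
expand_counts() {
  # Skip the header (NR > 1), then print column 2 once per unit in column 1.
  awk 'NR > 1 { for (i = 0; i < $1; i++) print $2 }' "$1"
}

expand_counts "$dir/File1" > "$dir/expanded.txt"

# These two numbers should agree: the sum of column 1 and the expanded line count.
awk 'NR > 1 { total += $1 } END { print total }' "$dir/File1"
awk 'END { print NR }' "$dir/expanded.txt"
```

For the six sample rows both commands should report 846, the total mentioned in the question.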

Select current and previous line if certain value is found

To solve my problem, I subtract the previous row's column 3 from the current row's column 3 and store the result in a new column 5. I then want to print the previous and current line whenever the value in column 5 equals 25.
Input file
1 1 35 1
2 5 50 1
2 6 75 1
4 7 85 1
5 8 100 1
6 9 125 1
4 1 200 1
I tried
awk '{$5 = $3 - prev3; prev3 = $3; print $0}' file
output
1 1 35 1 35
2 5 50 1 15
2 6 75 1 25
4 7 85 1 10
5 8 100 1 15
6 9 125 1 25
4 1 200 1 75
Desired Output
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
Thanks in advance
You're almost there: in addition to the previous $3, keep the previous $0 and print only when the condition is satisfied.
$ awk '{$5=$3-p3} $5==25{print p0; print} {p0=$0;p3=$3}' file
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
This can be further golfed to
$ awk '25==($5=$3-p3){print p0; print} {p0=$0;p3=$3}' file
It checks whether the newly computed field $5 equals 25. If so, it prints the previous line and the current line. It then saves the current line and the current $3 for the computation on the next line.
You are close to the answer: just pipe your output into another awk and print from there.
awk '{$5 = $3 - prev3; prev3 = $3; print $0}' oxxo.txt | awk ' { curr=$0; if($5==25) { print prev;print curr } prev=curr } '
with Inputs:
$ cat oxxo.txt
1 1 35 1
2 5 50 1
2 6 75 1
4 7 85 1
5 8 100 1
6 9 125 1
4 1 200 1
$ awk '{$5 = $3 - prev3; prev3 = $3; print $0}' oxxo.txt | awk ' { curr=$0; if($5==25) { print prev;print curr } prev=curr } '
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
$
Could you please try the following.
awk '$3-prev==25{print line ORS $0,$3-prev} {$(NF+1)=$3-prev;prev=$3;line=$0}' Input_file | column -t
Here's one:
$ awk '{$5=$3-q;t=p;p=$0;q=$3;$0=t ORS $0}$10==25' file
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
Explained:
$ awk '{
$5=$3-q # subtract
t=p # previous to temp
p=$0 # store previous for next round
q=$3 # store subtract value for next round
$0=t ORS $0 # prepare record for output
}
$10==25 # output if equals
' file
There is no check for duplicates, so when two consecutive rows both match, the row they share is printed twice. The easiest fix is to pipe the output to uniq.
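If piping to uniq is not wanted, the duplicate can also be avoided inside awk by remembering whether the previous row was already printed. The following is a sketch under that assumption, not one of the answers above. The function name prev_and_current and the shown flag are illustrative, and the NR > 1 guard (which avoids printing an empty "previous" line when the very first row matches) is an added choice.

```shell
# Sample input from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '1 1 35 1' '2 5 50 1' '2 6 75 1' '4 7 85 1' \
  '5 8 100 1' '6 9 125 1' '4 1 200 1' > "$dir/file"

# Hypothetical helper: print the previous and current row when column 5 is 25.
prev_and_current() {
  awk '
    {
      $5 = $3 - p3                          # difference to the previous row
      if ($5 == 25) {
        if (NR > 1 && !shown) print p0      # previous row, unless already printed
        print                               # current row
        shown = 1
      } else {
        shown = 0
      }
      p0 = $0; p3 = $3                      # remember this row for the next one
    }
  ' "$1"
}

prev_and_current "$dir/file"
```

On the sample input this should give the same four lines as the desired output. The flag only matters when several matching rows follow each other.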

Finding NR of row with specific conditions (using next line)

Guys I have a file like this
NR column
1 1
2 1
3 0
4 0
5 0
6 1
7 1
8 1
9 1
10 0
11 0
12 0
13 1
14 1
What I need is to find the NR values that tell me where the 1s are.
So my ideal output would tell me that there are 1s at NR=1-2, NR=6-9 and NR=13-14
or
1
2
6
9
13
14
Since I think it is easier not to include the first and last rows in the output, I expect the output to be
2
6
9
13
I've been trying to use getline, but without success.
I am sure there is an easy way to do this. Any help?
Thanks
Assuming your output above was incorrect (and it should really be the line number where the 0/1 or 1/0 transition happens - so the lines would be: "1, 3, 6, 10, 13"), then an awk oneliner is:
awk 'prev!=$0{print NR};{prev=$0}' file
which says:
for every line that doesn't match the previous line, print the line number, and
for every line, save the current line as prev for the next comparison
$ awk 'NR>1 && $0!=prev{print NR} {prev=$0}' file
3
6
10
13
or for your updated requirements:
$ awk '$1!=prev{print NR-prev} {prev=$1} END{if (prev) print NR}' file
1
2
6
9
13
14
awk to the rescue!
$ awk '!p&&$2==1{p=$1}
p&&!$2{print p"-"($1-1);p=0}
END{if(p) print p"-"$1}' file
1-2
6-9
13-14
{
    if (NR > 1 && last != $0) {   # value changed compared to the previous line
        print NR;
    }
    last = $0;                    # remember this line for the next comparison
}
Another way
awk '$2!=x{x=$2;print NR-!($2)}END{if(x)print NR}' file
1
2
6
9
13
14
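Several of the answers above treat the input as a single column of 0s and 1s, with NR as the line number. Under that assumption, the start and end of each run of 1s can be reported on one line. This is only a sketch; the function name one_runs and the variable start are made up for illustration.

```shell
# Single-column version of the sample data (assumption: no header line).
dir=$(mktemp -d)
printf '%s\n' 1 1 0 0 0 1 1 1 1 0 0 0 1 1 > "$dir/file"

# Hypothetical helper: print "first-last" line numbers for each run of 1s.
one_runs() {
  awk '
    $1 == 1 && !start { start = NR }                            # a run of 1s begins here
    $1 != 1 && start  { print start "-" (NR - 1); start = 0 }   # the run ended on the previous line
    END               { if (start) print start "-" NR }         # a run that reaches the last line
  ' "$1"
}

one_runs "$dir/file"
```

For the sample column this should report the ranges 1-2, 6-9 and 13-14, matching the "ideal output" described in the question.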

Replace characters with awk

I have the following file:
61 12451
61 13451
61 14451
61 15415
12 48469
12 78456
12 47845
32 45778
32 48745
32 47845
32 52448
32 87451
The output I want is the following, for example, 61 s are replaced by 1 as they are the first occurrence and they are repeated 4 times, then the second column goes from 2 to 5, as these are pairwise comparisons, 1 to 1 is ignored, but the second column should start from 2, so on for the rest.
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
3 6
3 7
3 8
Any suggestion on how to achieve this with AWK? Thanks!
It could be written in one awk command like this
awk '{a[NR]=$1;b[NR]=$2;c[NR]=$1;d[NR]=$2} END {for(i=1; i<=NR; i++){if(i==1){c[i]=1;d[i]=2}else if(a[i]==a[i-1]){c[i]=c[i-1];d[i]=1+d[i-1]}else{c[i]=1+c[i-1];d[i]=c[i]+1}print c[i],d[i]}}' pairwise.txt > output.txt
Here a and b are arrays holding the first and second columns of the file. The new values are stored in arrays c and d (the new first and second columns) and are printed to the output file.
not sure if this one-liner helps:
awk '$1!=p{++i;j=i+1}{print i,j++;p=$1}' file
at least it gives the desired output.
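The short one-liner above comes without an explanation, so here is the same logic spelled out with longer variable names. It is a sketch with two assumptions stated up front: equal values in column 1 are on adjacent lines, and the first value is not 0 or empty (otherwise the first comparison against the unset prev would not fire). The names renumber_pairs, group and partner are illustrative.

```shell
# Sample input from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '61 12451' '61 13451' '61 14451' '61 15415' \
  '12 48469' '12 78456' '12 47845' \
  '32 45778' '32 48745' '32 47845' '32 52448' '32 87451' > "$dir/file"

# Hypothetical helper: replace column 1 by a group number and column 2 by a counter.
renumber_pairs() {
  awk '
    $1 != prev {              # column 1 changed, so a new group starts
      group++                 # groups are numbered 1, 2, 3, ...
      partner = group + 1     # column 2 restarts one above the group number
    }
    {
      print group, partner++  # emit the pair, then advance column 2
      prev = $1
    }
  ' "$1"
}

renumber_pairs "$dir/file"
```

Because only the previous value is kept, nothing is stored per line, which is why no arrays or END block are needed.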

AWK: Comparing two different columns in two files

I have these two files
File1:
9 8 6 8 5 2
2 1 7 0 6 1
3 2 3 4 4 6
File2: (which has over 4 million lines)
MN 1 0
JK 2 0
AL 3 90
CA 4 83
MK 5 54
HI 6 490
I want to compare field 6 of file1 with field 2 of file2. If they match, then append field 3 of file2 to the end of the matching line of file1.
I've looked at other solutions but I can't get it to work correctly.
Desired output:
9 8 6 8 5 2 0
2 1 7 0 6 1 0
3 2 3 4 4 6 490
My attempt:
awk 'NR==FNR{a[$2]=$2;next}a[$6]{print $0,a[$6]}' file2 file1
The program just hangs after that.
To print all lines in file1 with match if available:
$ awk 'FNR==NR{a[$2]=$3;next;} {print $0,a[$6];}' file2 file1
9 8 6 8 5 2 0
2 1 7 0 6 1 0
3 2 3 4 4 6 490
To print only the lines that have a match:
$ awk 'NR==FNR{a[$2]=$3;next} $6 in a {print $0,a[$6]}' file2 file1
9 8 6 8 5 2 0
2 1 7 0 6 1 0
3 2 3 4 4 6 490
Note that I replaced a[$2]=$2 with a[$2]=$3 and changed the test a[$6] (which is false if the value is zero) to $6 in a.
Your own attempt basically has two bugs, as seen in @John1024's answer:
You use field 2 as both key and value in a, where you should be storing field 3 as the value (since you want to keep it for later), i.e., it should be a[$2] = $3.
The test a[$6] is false when the value in a is zero, even if it exists. The correct test is $6 in a.
Hence:
awk 'NR==FNR { a[$2]=$3; next } $6 in a {print $0, a[$6] }' file2 file1
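To see the second bug in isolation, the two tests can be run side by side on the sample data with the value already fixed to $3. This is a small sketch: the scratch directory and the echo labels are only there for illustration.

```shell
# Sample files from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'MN 1 0' 'JK 2 0' 'AL 3 90' 'CA 4 83' 'MK 5 54' 'HI 6 490' > "$dir/file2"
printf '%s\n' '9 8 6 8 5 2' '2 1 7 0 6 1' '3 2 3 4 4 6' > "$dir/file1"

# Truthiness test: a stored value of 0 counts as false, so those lines are skipped.
echo 'truthiness test a[$6]:'
awk 'NR==FNR { a[$2]=$3; next } a[$6] { print $0, a[$6] }' "$dir/file2" "$dir/file1"

# Membership test: only asks whether the key exists, whatever its value is.
echo 'membership test $6 in a:'
awk 'NR==FNR { a[$2]=$3; next } $6 in a { print $0, a[$6] }' "$dir/file2" "$dir/file1"
```

With the sample data the first command should print only the line ending in 490, while the second prints all three lines. A side effect worth knowing: merely referencing a[$6] creates an empty entry for a missing key, whereas $6 in a does not.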
However, there might be better approaches, but it is not clear from your specifications. For instance, you say that file2 has over 4 million lines, but it is unknown if there are also that many unique values for field 2. If yes, then a will also have that many entries in memory. And, you don't specify how long file1 is, or if its order must be preserved for output, or if every line (even without matches in file2) should be output.
If it is the case that file1 has many fewer lines than file2 has unique values for field 2, and only matching lines need to be output, and order does not need to be preserved, then you might wish to read file1 first…