Is there a way to match a string in a file and print/append the contents of another file (single line) on the next line after string match - awk

I am trying to match a string in a file and print/append the contents of another file after the string match.
I have a file "addthis" containing the single line pleaseaddme:
pleaseaddme
and I have a second file "tothis" containing two lines [one] and [two] like so:
[one]
[two]
I have tried various combinations, and the following is the closest to my desired result:
awk '/\[one\]/ { printf $0; getline < "addthis" }1' tothis
Result:
[one]pleaseaddme
[two]
I would like pleaseaddme to be added on a new line after [one], like so:
[one]
pleaseaddme
[two]
Also, this only prints to the screen and not into the file "tothis".
I have tried numerous variations; this is as close as I get.

You could try with:
>> awk '/\[one\]/ {getline s < "addthis"; $0=$0 "\n" s;} /[^\s]+/{print}' tothis
[one]
pleaseaddme
[two]
If you want to replace the contents of the file, you must add -i inplace (GNU awk 4.1 or later):
>> awk -i inplace '/\[one\]/ {getline s < "addthis"; $0=$0 "\n" s;} /[^\s]+/{print}' tothis
>> cat tothis
[one]
pleaseaddme
[two]
If you can't use -i inplace, you can achieve the same with:
>> awk '/\[one\]/ {getline s < "addthis"; $0=$0 "\n" s;} /[^\s]+/{print}' tothis > fckmack && mv fckmack tothis

awk '/\[one\]/ {getline var< "addthis"; $0 = $0 ORS var }1' tothis
#=> [one]
#=> pleaseaddme
#=> [two]
First, you should escape [ and ].
Then read the file into a variable and append it to the current line.
Another way to do it could be:
awk '/\[one\]/ {print; getline var< "addthis"; print var; next }1' tothis
As for your own attempt, simply changing printf to print will work:
awk '/\[one\]/ { print $0; getline < "addthis" }1' tothis
#=> [one]
#=> pleaseaddme
#=> [two]
Because printf doesn't print ORS, but print does.
On that note, the $0 is superfluous, as print without arguments prints $0 by default.
As for changing the file in place, you can still redirect to a temp file and replace the original afterwards.
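Putting that together with the temp-file rename, a complete run might look like this (file names as in the question, scratch directory just for the demo):

```shell
# recreate the sample files from the question in a scratch directory
dir=$(mktemp -d)
cd "$dir"
printf 'pleaseaddme\n' > addthis
printf '[one]\n[two]\n' > tothis

# print the matched line, then let getline overwrite $0 with the line
# read from "addthis"; the trailing 1 prints the (new) $0
awk '/\[one\]/ { print; getline < "addthis" }1' tothis > tothis.tmp &&
  mv tothis.tmp tothis

cat tothis
```

This prints [one], pleaseaddme, [two], each on its own line, and the change survives in the file.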

Given the awk version on a Mac (no -i inplace), this works:
awk '/\[one\]/ {getline s < "addthis"; $0=$0 "\n" s;} /[^\s]+/{print}' tothis > fckmack && mv fckmack tothis


Merge lines based on first column without delimiter

I need to merge all the lines that have the same value on the first column.
The input file is the following:
34600000031|(1|1|0|1|1|20190114180000|20191027185959)
34600000031|(2|2|0|2|2|20190114180000|20191027185959)
34600000031|(3|3|0|3|3|20190114180000|20191027185959)
34600000031|(4|4|0|4|4|20190114180000|20191027185959)
34600000015|(1|1|100|1|8|20190114180000|20191027185959)
34600000015|(2|2|100|2|9|20190114180000|20191027185959)
34600000015|(3|3|100|3|10|20190114180000|20191027185959)
34600000015|(4|4|100|4|11|20190114180000|20191027185959)
I was able to partially achieve it using the following:
awk -F'|' '$1!=p{if(p)print s; p=$1; s=$0; next}{sub(p,x); s=s $0} END{print s}' INPUT
The output is the following:
34600000031|(1|1|0|1|1|20190114180000|20191027185959)|(2|2|0|2|2|20190114180000|20191027185959)|(3|3|0|3|3|20190114180000|20191027185959)|(4|4|0|4|4|20190114180000|20191027185959)
34600000015|(1|1|100|1|8|20190114180000|20191027185959)|(2|2|100|2|9|20190114180000|20191027185959)|(3|3|100|3|10|20190114180000|20191027185959)|(4|4|100|4|11|20190114180000|20191027185959)
What I need (and I cannot find out how) is the following:
34600000031|(1|1|0|1|1|20190114180000|20191027185959)(2|2|0|2|2|20190114180000|20191027185959)(3|3|0|3|3|20190114180000|20191027185959)(4|4|0|4|4|20190114180000|20191027185959)
34600000015|(1|1|100|1|8|20190114180000|20191027185959)(2|2|100|2|9|20190114180000|20191027185959)(3|3|100|3|10|20190114180000|20191027185959)(4|4|100|4|11|20190114180000|20191027185959)
I could do a sed after the initial awk but I don't believe that this is the proper way to do it.
You need to substitute the separator in the values too. Your fixed awk would look like this:
awk -F'|' '$1!=p{if(p)print s; p=$1; s=$0; next}{sub(p "\\|",x); s=s $0} END{print s}'
but it's also good to anchor the match at the beginning of the string:
awk -F'|' '$1!=p{if(p)print s; p=$1; s=$0; next}{sub("^" p "\\|",x); s=s $0} END{print s}'
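A quick way to check the corrected command on (a shortened version of) the sample data; the scratch file names are arbitrary:

```shell
dir=$(mktemp -d)
# two groups of the sample data, trimmed to two lines each for brevity
printf '%s\n' \
  '34600000031|(1|1|0|1|1|20190114180000|20191027185959)' \
  '34600000031|(2|2|0|2|2|20190114180000|20191027185959)' \
  '34600000015|(1|1|100|1|8|20190114180000|20191027185959)' \
  '34600000015|(2|2|100|2|9|20190114180000|20191027185959)' > "$dir/input"

# strip "^<key>|" from continuation lines so the groups concatenate
# without a leading separator
awk -F'|' '$1!=p{if(p)print s; p=$1; s=$0; next}
           {sub("^" p "\\|",x); s=s $0}
           END{print s}' "$dir/input" > "$dir/out"

cat "$dir/out"
```

Each key now appears once, with the parenthesized groups concatenated directly after it.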
I would do it in a somewhat simpler way, which uses more memory (it stores everything in an array) but doesn't require the file to be sorted:
awk -F'|' '{ k=$1; sub("^" $1 "\\|", ""); a[k] = a[k] $0 } END{ for (i in a) print i "|" a[i] }'
For each line, remember the first field, replace the first field plus its | with nothing, and append the rest to an array entry indexed by that field. At the end, print each array element with its key, the separator, and the value.
$ awk -F'|' '
{
    curr = $1
    sub(/^[^|]+\|/,"")
    printf "%s%s", (curr==prev ? "" : ors curr FS), $0
    ors = ORS
    prev = curr
}
END { print "" }
' file
34600000031|(1|1|0|1|1|20190114180000|20191027185959)(2|2|0|2|2|20190114180000|20191027185959)(3|3|0|3|3|20190114180000|20191027185959)(4|4|0|4|4|20190114180000|20191027185959)
34600000015|(1|1|100|1|8|20190114180000|20191027185959)(2|2|100|2|9|20190114180000|20191027185959)(3|3|100|3|10|20190114180000|20191027185959)(4|4|100|4|11|20190114180000|20191027185959)

While Read and AWK to Change Field

I have two files - FileA and FileB. FileA has 10 fields and 100 lines. If Field1 and Field2 match, Field3 should be changed. FileB has 3 fields. I am reading in FileB with a while loop to match the two fields and to get the value that should be used for field 3.
while IFS=$'\t' read hostname interface metric; do
awk -v var1=${hostname} -v var2=${interface} -v var3=${metric} '{if ($1 ~ var1 && $2 ~ var2) $3=var3; print $0}' OFS="\t" FileA.txt
done < FileB.txt
At each iteration, this prints all of FileA.txt, with just the single line that changed. I only want it to print the line that was changed.
Please Help!
It's a smell to be calling awk once for each line of file B. You should be able to accomplish this task with a single pass through each file.
Try something like this:
awk -F'\t' -v OFS='\t' '
# first, read in data from file B
NR == FNR { values[$1 FS $2] = $3; next }
# then, output modified lines from matching lines in file A
($1 FS $2) in values { $3 = values[$1 FS $2]; print }
' fileB fileA
I'm assuming that you actually want to match with string equality instead of ~ pattern matching.
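A quick check with made-up three-field data (the host/interface/metric values here are invented for the demo):

```shell
dir=$(mktemp -d)
# fileB: hostname, interface, new metric
printf 'host1\teth0\t10\n' > "$dir/fileB"
# fileA: only the first line's key pair matches fileB
printf 'host1\teth0\t5\tup\nhost2\teth1\t7\tdown\n' > "$dir/fileA"

# NR==FNR is true only while reading the first file (fileB);
# its $1/$2 pairs index the replacement metric stored in $3
out=$(awk -F'\t' -v OFS='\t' '
  NR == FNR { values[$1 FS $2] = $3; next }
  ($1 FS $2) in values { $3 = values[$1 FS $2]; print }
' "$dir/fileB" "$dir/fileA")
printf '%s\n' "$out"
```

Only the matching line is printed, with its third field replaced.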
I only want it to print the line that was changed.
Simply move your print $0 statement into the if clause body:
'{if ($1 ~ var1 && $2 ~ var2) { $3=var3; print $0 }}'
or even shorter:
'$1~var1 && $2~var2{ $3=var3; print $0 }'

Find "complete cases" with awk

Using awk, how can I output the lines of a file that have all fields non-null without manually specifying each column?
foo.dat
A||B|X
A|A|1|
|1|2|W
A|A|A|B
Should return:
A|A|A|B
In this case we can do:
awk -F"|" -v OFS="|" '$1 != "" && $2 != "" && $3 != "" && $4 != "" { print }' foo.dat
But is there a way to do this without specifying each column?
You can loop over all fields and skip the record if any of the fields are empty:
$ awk -F'|' '{ for (i=1; i<=NF; ++i) { if (!$i) next } }1' foo.dat
A|A|A|B
if (!$i) means "if field i is empty", and the 1 is short for "print the line"; it is only reached if next was not executed for any field of the current line.
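One caveat: fields that look numeric are treated as numbers in a boolean context, so !$i is also true for a field containing a literal 0, and such a line would be dropped. If 0 should count as non-null, compare against the empty string instead; a minimal sketch:

```shell
# $i == "" only matches genuinely empty fields, so the 0 survives
out=$(printf 'A|0|B|C\nA||B|C\n' |
  awk -F'|' '{ for (i=1; i<=NF; ++i) if ($i == "") next } 1')
printf '%s\n' "$out"
```

The line with the empty second field is dropped; the line with the 0 field is kept.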
Another in awk:
$ awk -F\| 'gsub(/[^|]+(\||$)/,"&")==NF' file
A|A|A|B
This prints the record if it contains exactly NF non-empty, |-free runs of characters, each terminated by | or end of line.
Or with a pure regex check that no field is empty (no ||, no leading |, no trailing |):
awk '!/\|\|/&&!/\|$/&&!/^\|/' file
A|A|A|B

tcsh error: while loop

This is a basic program but since I'm a newbie, I'm not able to figure out the solution.
I have a file named rama.xvg in the following format:
-75.635 105.879 ASN-2
-153.704 64.7089 ARG-3
-148.238 -47.6076 GLN-4
-63.2568 -8.05441 LEU-5
-97.8149 -7.34302 GLU-6
-119.276 8.99017 ARG-7
-144.198 -103.917 SER-8
-65.4354 -10.3962 GLY-9
-60.6926 12.424 ARG-10
-159.797 -0.551989 PHE-11
65.9924 -48.8993 GLY-12
179.677 -7.93138 GLY-13
..........
...........
-70.5046 38.0408 GLY-146
-155.876 153.746 TRP-147
-132.355 151.023 GLY-148
-66.2679 167.798 ASN-2
-151.342 -33.0647 ARG-3
-146.483 41.3483 GLN-4
..........
..........
-108.566 0.0212432 SER-139
47.6854 33.6991 MET-140
47.9466 40.1073 ASP-141
46.4783 48.5301 SER-142
-139.17 172.486 LYS-143
58.9514 32.0602 SER-144
60.744 18.3059 SER-145
-94.0533 165.745 GLY-146
-161.809 177.435 TRP-147
129.172 -101.736 GLY-148
I need to extract all the lines containing "ASN-2" into one file, all_1.dat, and so on for all 147 residues.
If I run the following command in the terminal, it gives the desired output for ASN-2:
awk '{if( NR%147 == 1 ) printf $0 "\n"}' rama.xvg > all_1.dat
To avoid doing it repeatedly for all the residues, I have written the following code.
#!/bin/tcsh
set i = 1
while ( $i < 148)
echo $i
awk '{if( NR%147 == i ) printf $0 "\n"}' rama.xvg > all_"$i".dat
@ i++
end
But this code prints the lines containing GLY-148 in all the output files.
Please let me know what is the error in this code. I think it is related to nesting.
In your awk line, the variable i is an awk variable, not a shell variable! If you want to use the shell variable $i you can do:
awk -v i="$i" '{if( NR%147 == i ) printf $0 "\n"}' rama.xvg > all_"$i".dat
But I think it would be better to move the while loop into awk:
awk '{ i = (NR-1) % 147 + 1; print > ("all_" i ".dat") }' rama.xvg
Note that NR%147 is 0, not 147, on every 147th line, so a plain NR%147==i test would never match the last residue; computing the index as (NR-1)%147+1 avoids that.
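One way to sanity-check the split-into-files idea at a smaller scale, using 3 lines per block instead of 147 (the all_N.dat names follow the question's convention):

```shell
dir=$(mktemp -d)
cd "$dir"
# two blocks of three lines each
printf 'a1\nb1\nc1\na2\nb2\nc2\n' > rama.xvg

# route each line to all_1.dat .. all_3.dat by its position in the block
awk '{ i = (NR-1) % 3 + 1; print > ("all_" i ".dat") }' rama.xvg

cat all_1.dat
```

all_1.dat collects the first line of each block, all_2.dat the second, and so on.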

Append lines to a previous line

I am trying to append all lines that begin with > to the previous line that did not begin with >
cat tmp
ATAAACGGAAAAACACTACTTTAGCTTACGGGATCCGGT
>Aa_816
>Aa_817
>Aa_818
CCAAACGGAAAAACACTACTTGAGCTTACGGGATCCGGT
>Aa_940
>Aa_941
CTAAAAGGAAAAACACTACTTTAGCTTTTGGGATCCGGT
What I want is this:
ATAAACGGAAAAACACTACTTTAGCTTACGGGATCCGGT >Aa_816 >Aa_817 >Aa_818
CCAAACGGAAAAACACTACTTGAGCTTACGGGATCCGGT >Aa_940 >Aa_941
CTAAAAGGAAAAACACTACTTTAGCTTTTGGGATCCGGT
This almost gets me there:
cat tmp |awk '!/>/ {sub(/\\$/,""); getline t; print $0 t; next}; 1'
With awk:
awk '!/^>/{printf "%s%s", (NR==1)?"":RS,$0;next}{printf "%s", FS $0}END{print ""}' file
Using awk
awk '!/>/{printf (NR==1)?$0:RS $0;next}{printf FS $0}' file
If you don't mind the output starting with a generated newline before the first line, here is a shorter one:
awk '{printf (/>/?FS $0:RS $0)}' file
I think all you need is a little sed:
sed ':a; N; $!ba; s/\n>/ >/g' file
Results:
ATAAACGGAAAAACACTACTTTAGCTTACGGGATCCGGT >Aa_816 >Aa_817 >Aa_818
CCAAACGGAAAAACACTACTTGAGCTTACGGGATCCGGT >Aa_940 >Aa_941
CTAAAAGGAAAAACACTACTTTAGCTTTTGGGATCCGGT
awk '/^[^>]/ { if (length(old) > 0) print old; old = $0 }
/^>/ { old = old " " $0 }
END { if (length(old) > 0) print old }' file
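Running that last script over the sample data reproduces the desired output (scratch file name is arbitrary):

```shell
dir=$(mktemp -d)
printf '%s\n' \
  'ATAAACGGAAAAACACTACTTTAGCTTACGGGATCCGGT' \
  '>Aa_816' '>Aa_817' '>Aa_818' \
  'CCAAACGGAAAAACACTACTTGAGCTTACGGGATCCGGT' \
  '>Aa_940' '>Aa_941' \
  'CTAAAAGGAAAAACACTACTTTAGCTTTTGGGATCCGGT' > "$dir/tmp"

# hold each sequence line in "old", append the >-lines to it,
# and flush it when the next sequence line arrives (and at EOF)
out=$(awk '/^[^>]/ { if (length(old) > 0) print old; old = $0 }
           /^>/   { old = old " " $0 }
           END    { if (length(old) > 0) print old }' "$dir/tmp")
printf '%s\n' "$out"
```

Each sequence line comes out followed by its >-labels on the same line, separated by spaces.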