Sed / Awk: replace the first occurrence of a pattern with the content of another file

With sed, I'm trying to replace the first occurrence of a comment in a script, like :
#ENTRYPOINT_CONTENT
with the content of a second file ($file_content), inside a third file (/base.sh).
According to many docs, the command should be quite simple, something like:
sed "s|\#ENTRYPOINT_CONTENT|$file_content|" /base.sh
But I always end up with errors like:
sed: -e expression #1, char 23: unterminated `s' command
or similar messages. I also tried different delimiters, and even Awk instead, without success. Escaping the # in the search pattern seems to be fine, but I still can't get the file content into Sed as a variable.
Any ideas, either with Sed or Awk?
Edit : --------------------
@Sundeep @James Brown:
I don't want to mix the subjects nor to be long :) The response to your clarification request is in bold at the end of this edit, but just to elaborate the context, my case is a Docker entrypoint script in bash (for a base Docker image) called /root/test/base:
#!/usr/bin/env bash
#ENTRYPOINT_CONTENT
if [[ -e "/root/test/custom" ]]; then
printf "\n\n#ENTRYPOINT_CONTENT\n" >> /root/test/custom
# Code from whjm's reply below (actually works but appends shebangs from custom files)
sed -e '0,/^#ENTRYPOINT_CONTENT/!b; /^#ENTRYPOINT_CONTENT/{ r /root/test/custom' -e 'd; }' /root/test/base.sh >> /root/test/base2.sh
mv /root/test/base2.sh /root/test/base.sh
rm -f /root/test/custom
fi
I just want to let users drop another bash script of their own on a specific path (say /root/test/custom), for example :
#!/usr/bin/env bash
echo 'My 2nd bash code'
The first script above (base) should insert the content of the custom file at #ENTRYPOINT_CONTENT position (in the base script itself, without removing this search string), like this :
#!/usr/bin/env bash
echo 'My 2nd bash code'
#ENTRYPOINT_CONTENT
if [[ -e "/root/test/custom" ]]; then
printf "\n\n#ENTRYPOINT_CONTENT\n" >> /root/test/custom
# Code from whjm's reply below
sed -e '0,/^#ENTRYPOINT_CONTENT/!b; /^#ENTRYPOINT_CONTENT/{ r /root/test/custom' -e 'd; }' /root/test/base.sh >> /root/test/base2.sh
mv /root/test/base2.sh /root/test/base.sh
rm -f /root/test/custom
fi
If another user later drops another custom script at the same path, we should have the code of this third custom script appended like this :
#!/usr/bin/env bash
echo 'My 2nd bash code'
echo 'My 3rd bash code'
#ENTRYPOINT_CONTENT
if [[ -e "/root/test/custom" ]]; then
# ... and so on
Regarding the shebangs from custom files, it's not a real issue if they are appended to the base file. The sed code from @whjm works as expected but appends them, while (surprisingly) both awk codes from @James Brown already (and gracefully :) ignore all additional shebangs from custom files (probably because they also start with #, like the #ENTRYPOINT_CONTENT search string), but they currently prepend the code, like:
#!/usr/bin/env bash
echo 'My 3rd bash code'
echo 'My 2nd bash code'
#ENTRYPOINT_CONTENT
while I'm trying to get it appended like :
#!/usr/bin/env bash
echo 'My 2nd bash code'
echo 'My 3rd bash code'
#ENTRYPOINT_CONTENT
So, in short, @Sundeep, if you could just give me an updated version of your awk code for this, it would be perfect! :D (I couldn't find a way to invert this...) Thanks a lot.
Code from your previous post :
NR==FNR { b=b (FNR==1?"":ORS) $0; next }
{ r=r (FNR==1?"":ORS) $0 } #ENTRYPOINT_CONTENT
END { sub(/\#[^\n]+/,r,b); print b}

If you're using GNU sed:
[STEP 101] # cat file1
11
xx // replace me
44
xx // don't replace me
55
[STEP 102] # cat file2
22
33
[STEP 103] # sed -e '0,/^xx/!b; /^xx/{ r file2' -e 'd }' file1
11
22
33
44
xx // don't replace me
55
[STEP 104] #
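
For the case described in the question's edit (keep the #ENTRYPOINT_CONTENT marker, insert the custom code just above its first occurrence, and drop the custom file's shebang), a portable awk version could look like the sketch below. The function name insert_before_marker is made up for this illustration, and the sketch assumes the custom file is not empty and that the marker sits alone on its line.

```shell
# Hypothetical helper: print BASE with the lines of CUSTOM inserted
# just before the first marker line. The marker itself is kept.
# Usage: insert_before_marker BASE CUSTOM > BASE.new
insert_before_marker() {
    awk -v marker='#ENTRYPOINT_CONTENT' '
        NR == FNR {                        # first file given to awk: the custom script
            if (FNR == 1 && /^#!/) next    # skip its shebang line
            buf = buf $0 ORS               # remember every other line
            next
        }
        !done && $0 == marker {            # first marker line in the base script
            printf "%s", buf               # emit the custom code above it
            done = 1
        }
        { print }                          # always print the base line itself
    ' "$2" "$1"                            # assumption: CUSTOM ($2) is non-empty
}
```

Because the marker stays in place and new code always lands directly above it, running this again with a later custom file appends after the earlier insertions. As with the sed version, write to a temporary file and mv it over the base script.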

Related

Extracting data after a tag and CR with Busybox sed

I have a script that extracts a file from a bash script combined with a binary file. It does so using the following GNU sed syntax
sed -n '/__DATA__/{n;:1;n;p;b1}' /tmp/combined.file > /tmp/binary.file
The files are assembled by cat'ing an ISO file to the end of a bash script. The combined file is then sent over the network to an embedded device and extracted there, piping the ISO file to a temporary dir and executing the bash script to install it.
However, on executing this I get a
sed: unterminated {
Am I missing something here? Is this task possible with BusyBox sed?
I tried the "Second attempt" below with OSX/BSD awk and it failed, just printing up to the first NUL character. So you can't do this job portably with awk or sed.
Here's what should work everywhere given that the POSIX standard says
the input file to tail can be any type
so the input to tail doesn't have to be a POSIX text file (no NULs) and we're exiting from awk before the first NUL is encountered in the input so they should both be happy:
$ tail -n +"$(awk '/^__DATA__$/{print NR+2; exit}' binary.bin)" binary.bin | cat -ev
ER^H^#^#^#M-^PM-^P^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#3��M-^Nռ^#|��f1�f1�fSfQ^FWM-^N�M-^N�R�^#|�^#^F�^#^A��K^F^#^#R�A��U1�0���^Sr^VM-^A�U�u^PM-^C�^At^Kf�^F�^F�B�^U�^B1�ZQ�^H�^S[^O��#PM-^C�?Q��SRP�^#|�^D^#f��^G�D^#^OM-^BM-^#^#f#M-^#�^B��fM-^A>#|��xpu ��{�D|^#^#�M-^C^#isolinux.bin missing or corrupt.^M$
f`f1�f^C^F�{f^S^V�{fRfP^FSj^Aj^PM-^I�f�6�{��^FM-^H�M-^H�M-^R�6�{M-^H�^H�A�^A^BM-^J^V�{�^SM-^Md^Pfa��^^^#Operating system load error.^M$
^��^NM-^J>b^D�^G�^P<$
u��^X���^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#L^D^#^#^#^#^#^#�K�6^#^#M-^#^#^A^#^#?�M-^K^#^#^#^#^#`^\^#^#�������<R^#^#^#^_^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#U�EFI PART^#^#^A^#\^#^#^#]3�.^#^#^#^#^A^#^#^#^#^#^#^#�_^\^#^#^#^#^##^#^#^#^#^#^#^#�_^\^#^#^#^#^#Uc�r^Oqc#M-^Rc^F�$LZ�^L^#^#^#^#^#^#^#�^#^#^#M-^#^#^#^#�t^]F^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#$
Second attempt:
Now that I have a better idea what you're trying to do (process a file consisting of POSIX text lines up to a point and then can contain NUL characters afterwards), try this:
$ cat -ev file
echo "I: Installation finished!"$
exit 0$
$
__DATA__$
$
foo^@bar^@etc
$ cat tst.awk
/^__DATA__$/ { n=NR + 1 }
n && (NR == n) { RS="\0"; ORS="" }
n && (NR > n) { print (c++ ? RS : "") $0 }
$ awk -f tst.awk file | cat -ev
foo^@bar^@etc
The above doesn't try to store any input lines containing NUL in memory, instead it reads \n-terminated text lines until it reaches the line after the one containing __DATA__ and then switches to reading NUL-terminated records into memory and printing NULs between them on output.
It's still undefined behavior per POSIX (see my comments below) but in theory it should work since it just relies on being able to set one variable (RS) to NUL rather than trying to store input strings that contain NULs. Also, setting RS to NUL has been a (flawed) workaround for awk scripts for years to be able to read a whole file into memory at once so being able to set RS to NUL should work in any modern awk.
Using the new sample you provided with the missing blank line after the __DATA__ line added:
$ cat -ev file
#!/bin/bash$
$
echo "I: Awesome Things happened here"$
exit 0$
$
__DATA__$
$
ER^H^#^#^#M-^PM-^P^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#3M-mM-zM-^NM-UM-<^#|M-{M-|f1M-[f1M-IfSfQ^FWM-^NM-]M-^NM-ERM->^#|M-?^#^FM-9^#^AM-sM-%M-jK^F^#^#RM-4AM-;M-*U1M-I0M-vM-yM-M^Sr^VM-^AM-{UM-*u^PM-^CM-a^At^KfM-G^FM-s^FM-4BM-k^UM-k^B1M-IZQM-4^HM-M^S[^OM-6M-F#PM-^CM-a?QM-wM-aSRPM-;^#|M-9^D^#fM-!M-0^GM-hD^#^OM-^BM-^#^#f#M-^#M-G^BM-bM-rfM-^A>#|M-{M-#xpu M-zM-<M-l{M-jD|^#^#M-hM-^C^#isolinux.bin missing or corrupt.^M$
f`f1M-Rf^C^FM-x{f^S^VM-|{fRfP^FSj^Aj^PM-^IM-ffM-w6M-h{M-#M-d^FM-^HM-aM-^HM-EM-^RM-v6M-n{M-^HM-F^HM-aAM-8^A^BM-^J^VM-r{M-M^SM-^Md^PfaM-CM-h^^^#Operating system load error.^M$
^M-,M-4^NM-^J>b^DM-3^GM-M^P<$
uM-qM-M^XM-tM-kM-}^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#L^D^#^#^#^#^#^#M-/KM-66^#^#M-^#^#^A^#^#?M-`M-^K^#^#^#^#^#`^\^#^#M-~M-^?M-^?M-oM-~M-^?M-^?<R^#^#^#^_^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#UM-*EFI PART^#^#^A^#\^#^#^#]3M-%.^#^#^#^#^A^#^#^#^#^#^#^#M-^?_^\^#^#^#^#^##^#^#^#^#^#^#^#M-J_^\^#^#^#^#^#UcM-)r^Oqc#M-^Rc^FM-2$LZM-p^L^#^#^#^#^#^#^#M-P^#^#^#M-^#^#^#^#M-{t^]F^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#$
.
$ awk -f tst.awk file | cat -ev
ER^H^#^#^#M-^PM-^P^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#3M-mM-zM-^NM-UM-<^#|M-{M-|f1M-[f1M-IfSfQ^FWM-^NM-]M-^NM-ERM->^#|M-?^#^FM-9^#^AM-sM-%M-jK^F^#^#RM-4AM-;M-*U1M-I0M-vM-yM-M^Sr^VM-^AM-{UM-*u^PM-^CM-a^At^KfM-G^FM-s^FM-4BM-k^UM-k^B1M-IZQM-4^HM-M^S[^OM-6M-F#PM-^CM-a?QM-wM-aSRPM-;^#|M-9^D^#fM-!M-0^GM-hD^#^OM-^BM-^#^#f#M-^#M-G^BM-bM-rfM-^A>#|M-{M-#xpu M-zM-<M-l{M-jD|^#^#M-hM-^C^#isolinux.bin missing or corrupt.^M$
f`f1M-Rf^C^FM-x{f^S^VM-|{fRfP^FSj^Aj^PM-^IM-ffM-w6M-h{M-#M-d^FM-^HM-aM-^HM-EM-^RM-v6M-n{M-^HM-F^HM-aAM-8^A^BM-^J^VM-r{M-M^SM-^Md^PfaM-CM-h^^^#Operating system load error.^M$
^M-,M-4^NM-^J>b^DM-3^GM-M^P<$
uM-qM-M^XM-tM-kM-}^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#L^D^#^#^#^#^#^#M-/KM-66^#^#M-^#^#^A^#^#?M-`M-^K^#^#^#^#^#`^\^#^#M-~M-^?M-^?M-oM-~M-^?M-^?<R^#^#^#^_^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#UM-*EFI PART^#^#^A^#\^#^#^#]3M-%.^#^#^#^#^A^#^#^#^#^#^#^#M-^?_^\^#^#^#^#^##^#^#^#^#^#^#^#M-J_^\^#^#^#^#^#UcM-)r^Oqc#M-^Rc^FM-2$LZM-p^L^#^#^#^#^#^#^#M-P^#^#^#M-^#^#^#^#M-{t^]F^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#$
Original answer:
Assuming this question is related to your previous question, this will work using any awk in any shell on every UNIX box:
$ awk '/^__DATA__$/{n=NR+1} n && NR>n' file
3<ED>M-^PM-^PM-^PM-^PM-^
When it finds __DATA__ it sets a variable n to the line number to start printing after and then when n is set prints every line for which the line number is greater than n.
The above was run against this input file from your previous question:
$ cat -ev file
echo "I: Installation finished!"$
exit 0$
$
__DATA__$
$
3<ED>M-^PM-^PM-^PM-^PM-^$
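
The tail-based approach from the top of this answer can be wrapped in a small function, sketched below. The name extract_payload is invented for this example, and it assumes the same layout as the samples: a __DATA__ line, one blank line, then the binary payload.

```shell
# Hypothetical helper: write everything after "__DATA__" plus one blank
# line to stdout. awk exits at the marker, so it never has to handle
# the NUL bytes that follow; tail copies the rest byte for byte.
extract_payload() {
    start=$(awk '/^__DATA__$/ { print NR + 2; exit }' "$1") || return 1
    [ -n "$start" ] || return 1            # no marker found
    tail -n +"$start" "$1"
}
```

A call such as extract_payload /tmp/combined.file > /tmp/binary.file would then replace the original sed command.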

Change a string using sed or awk

I have some files which have wrong time and date, but the filename contains the correct time and date and I try to write a script to fix this with the touch command.
Example of filename:
071212_090537.jpg
I would like this to be converted to the following format:
1712120905.37
Note, the year is listed as 07 in the filename, even if it is 17 so I would like the first 0 to be changed to 1.
How can I do this using awk or sed?
I'm quite new to awk and sed, and programming in general. I have tried to search for a solution and instructions, but haven't managed to figure out how to solve this.
Can anyone help me?
Thanks. :)
Take your example:
awk -F'[_.]' '{$0=$1$2;sub(/^./,"1");sub(/..$/,".&")}1'<<<"071212_090537.jpg"
will output:
1712120905.37
If you want the files to be renamed, you can let awk generate the mv origin new commands and pipe the output to sh, like this (comments inline):
listYourFiles | # list your files as input to awk
awk -F'[_.]' '{o=$0;$0=$1$2;sub(/^./,"1");sub(/..$/,".&");
printf "mv %s %s\n",o,$0 }' | # this prints "mv ori new" for each file
sh # this executes the mv commands
It's completely unnecessary to call awk or sed for this, you can do it in your shell. e.g. with bash:
$ f='071212_090537.jpg'
$ [[ $f =~ ^.(.*)_(.*)(..)\.[^.]+$ ]]
$ echo "1${BASH_REMATCH[1]}${BASH_REMATCH[2]}.${BASH_REMATCH[3]}"
1712120905.37
This is probably what you're trying to do:
for old in *.jpg; do
[[ $old =~ ^.(.*)_(.*)(..)\.[^.]+$ ]] || { printf 'Warning, unexpected old file name format "%s"\n' "$old" >&2; continue; }
new="1${BASH_REMATCH[1]}${BASH_REMATCH[2]}.${BASH_REMATCH[3]}"
[[ -f "$new" ]] && { printf 'Warning, new file name "%s" generated from "%s" already exists, skipping.\n' "$new" "$old" >&2; continue; }
mv -- "$old" "$new"
done
You need that test for new already existing since an old of 071212_090537.jpg or 171212_090537.jpg (or various other values) would create the same new of 1712120905.37
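
Since the stated goal was to fix the timestamps with touch rather than to rename anything, the converted string can also be passed straight to touch -t, which accepts the YYMMDDhhmm.ss form. The following is only a sketch: the function names stamp_from_name and fix_mtimes are invented here, and it assumes every leading 0 in the year should become 1, as the question describes.

```shell
# Hypothetical helper: turn 071212_090537.jpg into 1712120905.37.
# Prints nothing if the name does not have the expected shape.
stamp_from_name() {
    printf '%s\n' "$1" |
        sed -n 's/^[0-9]\([0-9]\{5\}\)_\([0-9]\{4\}\)\([0-9]\{2\}\)\.[^.]*$/1\1\2.\3/p'
}

# Hypothetical helper: set the modification time of every *.jpg in
# directory $1 from its file name, leaving the names untouched.
fix_mtimes() {
    for f in "$1"/*.jpg; do
        [ -e "$f" ] || continue            # glob matched nothing
        ts=$(stamp_from_name "${f##*/}")
        if [ -n "$ts" ]; then
            touch -t "$ts" -- "$f"
        else
            printf 'Skipping unexpected name: %s\n' "$f" >&2
        fi
    done
}
```

Leaving the names alone also avoids the collision problem mentioned above, since no two files are ever mapped to the same new name.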
I think sed really is the easiest solution:
You could do this:
▶ for f in *.jpg ; do
new_f=$(sed -E 's/([0-9]{6})_([0-9]{4})([0-9]{2})\.jpg/\1\2.\3.jpg/' <<< $f)
mv $f $new_f
done
For more info:
You probably need to read an introductory tutorial on regular expressions.
Note that the -E option to sed allows use of extended regular expressions, allowing a more readable and convenient expression here.
Use of <<< is a Bashism known as a "here-string". If you are using a shell that doesn't support that, A <<< $b can be rewritten as echo $b | A.
Testing:
▶ touch 071212_090538.jpg 071212_090539.jpg
▶ ls -1 *.jpg
071212_090538.jpg
071212_090539.jpg
▶ for f in *.jpg ; do
new_f=$(sed -E 's/([0-9]{6})_([0-9]{4})([0-9]{2})\.jpg/\1\2.\3.jpg/' <<< $f)
mv $f $new_f
done
▶ ls -1
0712120905.38.jpg
0712120905.39.jpg

Rename file using fasta header

I have multiple fasta files downloaded from NCBI and want to rename them with some part of the header:
Example of the header: >KY705281.1 Streptococcus phage P7955, complete genome
Example of filename: KY705281.fasta
The idea is to get rid of 'KY705281.1' and 'complete genome' so that only Streptococcus phage P7955 remain
For example, one input file will be:
>KY705281.1 Streptococcus phage P7955, complete genome
AGAAAGAAAAGACGGCTCATTTGTGGGTTGTCTTTTTTTGATTAAGTAATGAAGGAGGTGGATGTATTGG GCTAAATCAACGACAAAAACGATTTGCAGACGAATATTTGATATCTGGTGTCGCTTACAATGCAGCTATC AAAGCTGGGTATTCTGAGAAATACGCTAGAGCAAGAAGTCATACCTTGTTGGAAAATGTCGGCAT
It will be renamed to KY705281.fasta with content:
>Streptococcus phage P7955
AGAAAGAAAAGACGGCTCATTTGTGGGTTGTCTTTTTTTGATTAAGTAATGAAGGAGGTGGATGTATTGG GCTAAATCAACGACAAAAACGATTTGCAGACGAATATTTGATATCTGGTGTCGCTTACAATGCAGCTATC AAAGCTGGGTATTCTGAGAAATACGCTAGAGCAAGAAGTCATACCTTGTTGGAAAATGTCGGCAT
I'm a newbie with Linux but somehow with some Google search, I know that this could be done easily with some awk/sed/grep commands.
Any advice would be appreciated.
One way could be:
awk -F, 'FNR==1{match($1, "^>([^.]+)[^ ]+ (.*)", oFv); $0 = ">" oFv[2]} {print > (oFv[1] ".fasta")}' somefiles*
This will keep the old files and write the corresponding new file(s).
It also assumes that each input file has only one header line, like the one you gave.
If you want to rename the old files as well as change their contents,
and assuming your system has bash, GNU awk and GNU sed,
please back up your files and try this:
#!/usr/bin/bash
for file in somefiles*; do
nn="$(awk -F[\>.] '{printf $2 ".fasta";exit}' "$file")"
sed -ri '1{s/^[^ ]* />/;s/, complete genome//;}' "$file"
if [ ! -f "$nn" ];
then
mv "$file" "$nn"
else
echo "'$nn' exists, skip '$file', its content already changed." | tee _err_.log
fi
done
Or as oneliner:
for file in somefiles*; do nn="$(awk -F[\>.] '{printf $2 ".fasta";exit}' "$file")"; sed -ri '1{s/^[^ ]* />/;s/, complete genome//;}' "$file"; if [ ! -f "$nn" ]; then mv "$file" "$nn"; else echo "'$nn' exists, skip '$file', its content already changed." | tee _err_.log; fi; done
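
Before editing files in place, it may help to preview the header rewrite on its own. The sketch below uses only basic sed features (no -r or -i); the function name clean_fasta_header is invented for this example, and it assumes the header ends in ", complete genome" as in the sample.

```shell
# Hypothetical helper: read a FASTA file on stdin and print it with the
# accession removed from the header and ", complete genome" dropped.
# Only line 1 is changed; sequence lines pass through untouched.
clean_fasta_header() {
    sed -e '1s/^>[^ ]* />/' -e '1s/, complete genome$//'
}
```

For example, clean_fasta_header < KY705281.fasta | head -n 1 shows the new header without modifying the file.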

Prepend a # to the first line not already having a #

I have a file with options for a command I run. Whenever I run the command I want it to run with the options defined in the first line which is not commented out. I do this using this bash script:
while read run opt c; do
[[ $run == \#* ]] && continue
./submit.py $opt $run -c "$c"
break
done < to_submit.txt
The file to_submit.txt has entries like this:
#167993 options/optionfile.py long description
167995 options/other_optionfile.py other long description
...
After running the submit script with the options from that first non-commented line, I want to comment the line out once the command has run successfully.
I can find the line number of the options I used adding this to the while loop:
line=$(grep -n "$run" to_submit.txt | grep "$opt" | grep "$c" | cut -f 1 -d ":")
But I'm not sure how to actually prepend a # to that line now. I could probably use head and tail to save the other lines, process that line separately and combine it all back into the file. But this sounds too complicated; there must be an easier sed or awk solution.
$ awk '!f && sub(/^[^#]/,"#&"){f=1} 1' file
#167993 options/optionfile.py long description
#167995 options/other_optionfile.py other long description
...
To overwrite the contents of the original file:
awk '!f && sub(/^[^#]/,"#&"){f=1} 1' file > tmp && mv tmp file
just like with any other UNIX command.
Using GNU sed is probably simplest here:
sed '0,/^[^#]/ s//#&/' file
Add option -i if you want to update file in place.
0,/^[^#]/ matches all lines up to and including the first one that doesn't start with #
s//#&/ then prepends # to that line.
Note that s//.../ (i.e., an empty regex) reuses the last matching regex in the range, which is /^[^#]/ in this case.
Note that the command doesn't work with BSD/OSX sed, unfortunately, because starting a range with 0 so as to allow the range endpoint to match the very first line also is not supported there. It is possible to make the command work with BSD/OSX sed, but it's more cumbersome.
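
One way to avoid the 0,/re/ range is to substitute on the first matching line and then loop over the rest of the input with n, so the script never reaches a second match. This is a sketch that sticks to basic sed commands and is expected to behave the same in GNU and BSD/OSX sed; the function name comment_first_active is invented for the example.

```shell
# Hypothetical helper: prepend "#" to the first line that does not
# already start with "#", and copy all other lines unchanged.
# Reads the named file(s), or stdin when called without arguments.
comment_first_active() {
    sed -e '/^[^#]/{' \
        -e 's/^/#/' \
        -e ':a' \
        -e 'n' \
        -e 'ba' \
        -e '}' "$@"
}
```

The n command prints the current line and fetches the next one; looping on it until the input ends means every remaining line is printed as is.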
If the input/output file is not very large, you can do it all in Bash:
optsfile=to_submit.txt
has_run_cmd=0
outputlines=()
while IFS= read -r inputline || [[ -n $inputline ]] ; do
read run opt c <<<"$inputline"
if (( has_run_cmd )) || [[ $run == \#* ]] ; then
outputlines+=( "$inputline" )
elif ./submit.py "$opt" "$run" -c "$c" ; then
has_run_cmd=1
outputlines+=( "#$inputline" )
else
exit $?
fi
done < "$optsfile"
(( has_run_cmd )) && printf '%s\n' "${outputlines[@]}" > "$optsfile"
The lines of the file are put in the outputlines array, with a hash prepended to the line that was used in the ./submit.py command. If the command runs successfully, the file is overwritten with the lines in outputlines.
After some searching around I found that
awk -v run="$run" -v opt="$opt" '{if($1 == run && $2 == opt) {print "#" $0} else print}' to_submit.txt > temp
mv -b -f temp to_submit.txt
seems to solve this (without needing to find the line number first, just comparing $run and $opt). This assumes that the combination of run and opt is enough to identify a line and that the comment is not needed (which happens to be true in my case). I'm not sure how the comment, which spans multiple awk fields, could also be taken into account.

awk use a command line variable

awk -F, -f awkfile.awk -v mysearch="search term"
I am trying to run the above command from a terminal and use mysearch as the search term in the awk program. The program works perfectly when the search term is hard-coded inside it, but how do I get it to use the variable mysearch instead?
For example, it is used in the line if($j ~ /mysearch/){, but this does not apply the search term; it searches for the literal string mysearch.
Just remove the slashes:
$j ~ mysearch
This is not ideal, but I suggest writing a bash script which takes in the search term, replaces that search term in the awk script, then runs the script. For example:
$ cat dosearch.sh
sed "s/XXX/$1/" awktemplate.awk > awkfile.awk
awk -f awkfile.awk data.txt
$ cat awktemplate.awk
{
j = 1
if ($j ~ /XXX/) {
# Do something, such as
print "Found:", $0
}
}
$ cat data.txt
foo here
bar there
xyz everywhere
$ ./dosearch.sh foo
Found: foo here
$ ./dosearch.sh bar
Found: bar there
In the above example, the awk template contains "XXX" as a search term; the bash script replaces that search term with the first parameter, then invokes awk on the modified script.
$ cat input
tinky-winky
dipsy
laa-laa
noo-noo
po
$ teletubby='po'
$ awk -v "regexp=$teletubby" '$0 ~ regexp' input
po
Note that anything could go into the shell-variable,
even a full-blown regexp, e.g ^d.*y. Just make sure to use single-quotes
to prevent the shell from doing any expansion.
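
Because the variable is used as a regexp, characters such as . or * in the search term keep their special meaning. If a plain substring match is wanted instead, awk's index() function is one option. The sketch below contrasts the two; the function names search_regex and search_literal are invented for this example, and the comma separator mirrors the -F, from the question.

```shell
# Hypothetical helper: print lines where any comma-separated field
# matches the search term interpreted as a regular expression.
search_regex() {
    awk -F, -v mysearch="$1" '
        { for (j = 1; j <= NF; j++) if ($j ~ mysearch) { print; next } }'
}

# Hypothetical helper: same, but the term is matched as a literal
# substring, so "." only matches an actual dot.
search_literal() {
    awk -F, -v mysearch="$1" '
        { for (j = 1; j <= NF; j++) if (index($j, mysearch)) { print; next } }'
}
```

With the input lines a.c,x and abc,y, the term a.c selects both lines through search_regex but only the first through search_literal. Note that awk still processes backslash escapes in values passed with -v.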