Prepend a # to the first line not already having a # - awk

I have a file with options for a command I run. Whenever I run the command I want it to run with the options defined in the first line which is not commented out. I do this using this bash script:
while read run opt c; do
[[ $run == \#* ]] && continue
./submit.py $opt $run -c "$c"
break
done < to_submit.txt
The file to_submit.txt has entries like this:
#167993 options/optionfile.py long description
167995 options/other_optionfile.py other long description
...
After having run the submit script with the options from the first non-commented line, I want to comment out that line once the command has run successfully.
I can find the line number of the options I used adding this to the while loop:
line=$(grep -n "$run" to_submit.txt | grep "$opt" | grep "$c" | cut -f 1 -d ":")
But I'm not sure how to actually prepend a # to that line now. I could probably use head and tail to save the other lines, process that line separately, and combine it all back into the file. But that sounds too complicated; there must be an easier sed or awk solution.

$ awk '!f && sub(/^[^#]/,"#&"){f=1} 1' file
#167993 options/optionfile.py long description
#167995 options/other_optionfile.py other long description
...
To overwrite the contents of the original file:
awk '!f && sub(/^[^#]/,"#&"){f=1} 1' file > tmp && mv tmp file
just like with any other UNIX command.
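If you have GNU awk 4.1 or later, its inplace extension can handle the temporary file for you (a GNU-only convenience, not a portable replacement for the idiom above):
$ gawk -i inplace '!f && sub(/^[^#]/,"#&"){f=1} 1' file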

Using GNU sed is probably simplest here:
sed '0,/^[^#]/ s//#&/' file
Add option -i if you want to update file in place.
0,/^[^#]/ matches all lines up to and including the first one that doesn't start with #
s//#&/ then prepends # to that line.
Note that s//.../ (i.e., an empty regex) reuses the last matching regex in the range, which is /^[^#]/ in this case.
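That is, the command is just shorthand for spelling the regex out twice:
sed '0,/^[^#]/ s/^[^#]/#&/' file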
Note that the command doesn't work with BSD/OSX sed, unfortunately, because starting a range with 0 so as to allow the range endpoint to match the very first line also is not supported there. It is possible to make the command work with BSD/OSX sed, but it's more cumbersome.
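One portable workaround (a sketch using only POSIX sed constructs, so it should also work with BSD/OSX sed): let lines that don't match ^[^#] print as-is, prepend # to the first one that does match, then loop over the remaining lines unchanged:
sed '
/^[^#]/!b
s/^/#/
:a
$!{
n
ba
}
' file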

If the input/output file is not very large, you can do it all in Bash:
optsfile=to_submit.txt
has_run_cmd=0
outputlines=()
while IFS= read -r inputline || [[ -n $inputline ]] ; do
read run opt c <<<"$inputline"
if (( has_run_cmd )) || [[ $run == \#* ]] ; then
outputlines+=( "$inputline" )
elif ./submit.py "$opt" "$run" -c "$c" ; then
has_run_cmd=1
outputlines+=( "#$inputline" )
else
exit $?
fi
done < "$optsfile"
(( has_run_cmd )) && printf '%s\n' "${outputlines[@]}" > "$optsfile"
The lines of the file are put in the outputlines array, with a hash prepended to the line that was used in the ./submit.py command. If the command runs successfully, the file is overwritten with the lines in outputlines.

After some searching around I found that
awk -v run="$run" -v opt="$opt" '{if($1 == run && $2 == opt) {print "#" $0} else print}' to_submit.txt > temp
mv -b -f temp to_submit.txt
seems to solve this (without needing to find the line number first, just comparing $run and $opt). This assumes that the combination of run and opt is enough to identify a line and that the comment is not needed (which happens to be true in my case). I'm not sure how the comment, which spans multiple fields in awk, would also be taken into account.
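One hedged way to also take the comment into account: strip the first two fields from a copy of the line and compare the remainder to c (an untested sketch):
awk -v run="$run" -v opt="$opt" -v c="$c" '
{ rest = $0; sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, "", rest) }   # rest = everything after field 2
!done && $1 == run && $2 == opt && rest == c { print "#" $0; done = 1; next }
{ print }
' to_submit.txt > temp && mv -b -f temp to_submit.txt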

Extracting data after a tag and CR with Busybox sed

I have a script that extracts a file from a bash script combined with a binary file. It does so using the following GNU sed syntax
sed -n '/__DATA__/{n;:1;n;p;b1}' /tmp/combined.file > /tmp/binary.file
The files are assembled by cat'ing an ISO file to the end of a bash script, which is then sent over the network to an embedded device and extracted on the device, piping the ISO file to a temporary dir and executing the bash script to install it.
However, on executing this I get a
sed: unterminated {
Am I missing something here? Is this task possible with BusyBox sed?
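Note that the unterminated { error is usually a parsing difference rather than a missing brace: BusyBox and BSD sed want labels and branch targets terminated by a newline (or the end of a -e expression), not a ;. Splitting the script across lines may at least get it past the parser (an untested sketch; the NUL issue discussed below still applies):
sed -n '/__DATA__/{
n
:1
n
p
b1
}' /tmp/combined.file > /tmp/binary.file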
I tried the "Second attempt" below with OSX/BSD awk and it failed, just printing up to the first NUL character. So you can't do this job portably with awk or sed.
Here's what should work everywhere given that the POSIX standard says
the input file to tail can be any type
so the input to tail doesn't have to be a POSIX text file (no NULs), and since we're exiting from awk before the first NUL is encountered in the input, both should be happy:
$ tail -n +"$(awk '/^__DATA__$/{print NR+2; exit}' binary.bin)" binary.bin | cat -ev
ER^H^#^#^#M-^PM-^P^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#3M-mM-zM-^NM-UM-<^#|M-{M-|f1M-[f1M-IfSfQ^FWM-^NM-]M-^NM-ERM->^#|M-?^#^FM-9^#^AM-sM-%M-jK^F^#^#RM-4AM-;M-*U1M-I0M-vM-yM-M^Sr^VM-^AM-{UM-*u^PM-^CM-a^At^KfM-G^FM-s^FM-4BM-k^UM-k^B1M-IZQM-4^HM-M^S[^OM-6M-F#PM-^CM-a?QM-wM-aSRPM-;^#|M-9^D^#fM-!M-0^GM-hD^#^OM-^BM-^#^#f#M-^#M-G^BM-bM-rfM-^A>#|M-{M-#xpu M-zM-<M-l{M-jD|^#^#M-hM-^C^#isolinux.bin missing or corrupt.^M$
f`f1M-Rf^C^FM-x{f^S^VM-|{fRfP^FSj^Aj^PM-^IM-ffM-w6M-h{M-#M-d^FM-^HM-aM-^HM-EM-^RM-v6M-n{M-^HM-F^HM-aAM-8^A^BM-^J^VM-r{M-M^SM-^Md^PfaM-CM-h^^^#Operating system load error.^M$
^M-,M-4^NM-^J>b^DM-3^GM-M^P<$
uM-qM-M^XM-tM-kM-}^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#L^D^#^#^#^#^#^#M-/KM-66^#^#M-^#^#^A^#^#?M-`M-^K^#^#^#^#^#`^\^#^#M-~M-^?M-^?M-oM-~M-^?M-^?<R^#^#^#^_^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#UM-*EFI PART^#^#^A^#\^#^#^#]3M-%.^#^#^#^#^A^#^#^#^#^#^#^#M-^?_^\^#^#^#^#^##^#^#^#^#^#^#^#M-J_^\^#^#^#^#^#UcM-)r^Oqc#M-^Rc^FM-2$LZM-p^L^#^#^#^#^#^#^#M-P^#^#^#M-^#^#^#^#M-{t^]F^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#$
Second attempt:
Now that I have a better idea what you're trying to do (process a file that consists of POSIX text lines up to a point and may contain NUL characters afterwards), try this:
$ cat -ev file
echo "I: Installation finished!"$
exit 0$
$
__DATA__$
$
foo^#bar^#etc
$ cat tst.awk
/^__DATA__$/ { n=NR + 1 }
n && (NR == n) { RS="\0"; ORS="" }
n && (NR > n) { print (c++ ? RS : "") $0 }
$ awk -f tst.awk file | cat -ev
foo^#bar^#etc
The above doesn't try to store any input lines containing NUL in memory; instead it reads \n-terminated text lines until it reaches the line after the one containing __DATA__, and then switches to reading NUL-terminated records into memory and printing NULs between them on output.
It's still undefined behavior per POSIX (see my comments below) but in theory it should work since it just relies on being able to set one variable (RS) to NUL rather than trying to store input strings that contain NULs. Also, setting RS to NUL has been a (flawed) workaround for awk scripts for years to be able to read a whole file into memory at once so being able to set RS to NUL should work in any modern awk.
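For context, that old slurp idiom looks like this (a sketch; in awks where the workaround functions, a NUL-free file comes back as a single record):
$ awk 'BEGIN{RS="\0"} END{print NR}' file   # prints 1 when the whole file was read as one record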
Using the new sample you provided with the missing blank line after the __DATA__ line added:
$ cat -ev file
#!/bin/bash$
$
echo "I: Awesome Things happened here"$
exit 0$
$
__DATA__$
$
ER^H^#^#^#M-^PM-^P^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#3M-mM-zM-^NM-UM-<^#|M-{M-|f1M-[f1M-IfSfQ^FWM-^NM-]M-^NM-ERM->^#|M-?^#^FM-9^#^AM-sM-%M-jK^F^#^#RM-4AM-;M-*U1M-I0M-vM-yM-M^Sr^VM-^AM-{UM-*u^PM-^CM-a^At^KfM-G^FM-s^FM-4BM-k^UM-k^B1M-IZQM-4^HM-M^S[^OM-6M-F#PM-^CM-a?QM-wM-aSRPM-;^#|M-9^D^#fM-!M-0^GM-hD^#^OM-^BM-^#^#f#M-^#M-G^BM-bM-rfM-^A>#|M-{M-#xpu M-zM-<M-l{M-jD|^#^#M-hM-^C^#isolinux.bin missing or corrupt.^M$
f`f1M-Rf^C^FM-x{f^S^VM-|{fRfP^FSj^Aj^PM-^IM-ffM-w6M-h{M-#M-d^FM-^HM-aM-^HM-EM-^RM-v6M-n{M-^HM-F^HM-aAM-8^A^BM-^J^VM-r{M-M^SM-^Md^PfaM-CM-h^^^#Operating system load error.^M$
^M-,M-4^NM-^J>b^DM-3^GM-M^P<$
uM-qM-M^XM-tM-kM-}^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#L^D^#^#^#^#^#^#M-/KM-66^#^#M-^#^#^A^#^#?M-`M-^K^#^#^#^#^#`^\^#^#M-~M-^?M-^?M-oM-~M-^?M-^?<R^#^#^#^_^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#UM-*EFI PART^#^#^A^#\^#^#^#]3M-%.^#^#^#^#^A^#^#^#^#^#^#^#M-^?_^\^#^#^#^#^##^#^#^#^#^#^#^#M-J_^\^#^#^#^#^#UcM-)r^Oqc#M-^Rc^FM-2$LZM-p^L^#^#^#^#^#^#^#M-P^#^#^#M-^#^#^#^#M-{t^]F^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#$
.
$ awk -f tst.awk file | cat -ev
ER^H^#^#^#M-^PM-^P^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#3M-mM-zM-^NM-UM-<^#|M-{M-|f1M-[f1M-IfSfQ^FWM-^NM-]M-^NM-ERM->^#|M-?^#^FM-9^#^AM-sM-%M-jK^F^#^#RM-4AM-;M-*U1M-I0M-vM-yM-M^Sr^VM-^AM-{UM-*u^PM-^CM-a^At^KfM-G^FM-s^FM-4BM-k^UM-k^B1M-IZQM-4^HM-M^S[^OM-6M-F#PM-^CM-a?QM-wM-aSRPM-;^#|M-9^D^#fM-!M-0^GM-hD^#^OM-^BM-^#^#f#M-^#M-G^BM-bM-rfM-^A>#|M-{M-#xpu M-zM-<M-l{M-jD|^#^#M-hM-^C^#isolinux.bin missing or corrupt.^M$
f`f1M-Rf^C^FM-x{f^S^VM-|{fRfP^FSj^Aj^PM-^IM-ffM-w6M-h{M-#M-d^FM-^HM-aM-^HM-EM-^RM-v6M-n{M-^HM-F^HM-aAM-8^A^BM-^J^VM-r{M-M^SM-^Md^PfaM-CM-h^^^#Operating system load error.^M$
^M-,M-4^NM-^J>b^DM-3^GM-M^P<$
uM-qM-M^XM-tM-kM-}^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#L^D^#^#^#^#^#^#M-/KM-66^#^#M-^#^#^A^#^#?M-`M-^K^#^#^#^#^#`^\^#^#M-~M-^?M-^?M-oM-~M-^?M-^?<R^#^#^#^_^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#UM-*EFI PART^#^#^A^#\^#^#^#]3M-%.^#^#^#^#^A^#^#^#^#^#^#^#M-^?_^\^#^#^#^#^##^#^#^#^#^#^#^#M-J_^\^#^#^#^#^#UcM-)r^Oqc#M-^Rc^FM-2$LZM-p^L^#^#^#^#^#^#^#M-P^#^#^#M-^#^#^#^#M-{t^]F^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#$
Original answer:
Assuming this question is related to your previous question, this will work using any awk in any shell on every UNIX box:
$ awk '/^__DATA__$/{n=NR+1} n && NR>n' file
3<ED>M-^PM-^PM-^PM-^PM-^
When it finds __DATA__ it sets a variable n to the line number to start printing after and then when n is set prints every line for which the line number is greater than n.
The above was run against this input file from your previous question:
$ cat -ev file
echo "I: Installation finished!"$
exit 0$
$
__DATA__$
$
3<ED>M-^PM-^PM-^PM-^PM-^$

How to parse a column from one file in multiple other columns and concatenate the output?

I have one file like this:
head allGenes.txt
ENSG00000128274
ENSG00000094914
ENSG00000081760
ENSG00000158122
ENSG00000103591
...
and I have a multiple files named like this *.v7.egenes.txt in the current directory. For example one file looks like this:
head Stomach.v7.egenes.txt
ENSG00000238009 RP11-34P13.7 1 89295 129223 - 2073 1.03557 343.245
ENSG00000237683 AL627309.1 1 134901 139379 - 2123 1.02105 359.907
ENSG00000235146 RP5-857K21.2 1 523009 530148 + 4098 1.03503 592.973
ENSG00000231709 RP5-857K21.1 1 521369 523833 - 4101 1.07053 559.642
ENSG00000223659 RP5-857K21.5 1 562757 564390 - 4236 1.05527 595.015
ENSG00000237973 hsa-mir-6723 1 566454 567996 + 4247 1.05299 592.876
I would like to get lines from all *.v7.egenes.txt files that match any entry in allGenes.txt
I tried using:
grep -w -f allGenes.txt *.v7.egenes.txt > output.txt
but this takes forever to complete. Is there any way to do this in awk or something similar?
Without knowing the size of the files, but assuming the host has enough memory to hold allGenes.txt in memory, one awk solution comes to mind:
awk 'NR==FNR { gene[$1] ; next } ( $1 in gene )' allGenes.txt *.v7.egenes.txt > output.txt
Where:
NR==FNR - this test only matches the first file to be processed (allGenes.txt)
gene[$1] - store each gene as an index in an associative array
next - stop processing and go to the next line in the file
$1 in gene - applies to all lines in all other files; if the first field is found to be an index in our associative array then we print the current line
I wouldn't expect this to run any/much faster than the grep solution the OP is currently using (especially with shelter's suggestion to use -F instead of -w), but it should be relatively quick to test and see ....
GNU Parallel has a whole section dedicated to grepping n lines for m regular expressions:
https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Grepping-n-lines-for-m-regular-expressions
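For example, one hedged way to apply it here (assuming GNU Parallel is installed; the block size is just a starting point for tuning) is to split the data into chunks and run one grep -F per chunk, keeping the output in order:
cat *.v7.egenes.txt | parallel --pipe -k --block 10M grep -F -w -f allGenes.txt > output.txt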
You could try with a while read loop :
#!/bin/bash
while read -r line; do
grep -rnw Stomach.v7.egenes.txt -e "$line" >> output.txt
done < allGenes.txt
So here we tell the while loop to read all the lines from allGenes.txt, and for each line, check whether there are matching lines in the egenes file. Would that do the trick?
EDIT :
New version :
#!/bin/bash
for name in $(cat allGenes.txt); do
grep -rnw *v7.egenes.txt* -e "$name" >> output.txt
done

Change a string using sed or awk

I have some files which have the wrong time and date, but the filename contains the correct time and date, so I'm trying to write a script to fix this with the touch command.
Example of filename:
071212_090537.jpg
I would like this to be converted to the following format:
1712120905.37
Note, the year is listed as 07 in the filename even though it is 17, so I would like the first 0 to be changed to 1.
How can I do this using awk or sed?
I'm quite new to awk and sed, and programming in general. I have tried to search for a solution and instructions, but haven't managed to figure out how to solve this.
Can anyone help me?
Thanks. :)
Take your example:
awk -F'[_.]' '{$0=$1$2;sub(/^./,"1");sub(/..$/,".&")}1'<<<"071212_090537.jpg"
will output:
1712120905.37
If you want the file to be renamed as well, you can let awk generate the mv origin new commands and pipe the output to sh, like this (comments inline):
listYourFiles |   # list your files as input to awk
awk -F'[_.]' '{o=$0; $0=$1$2; sub(/^./,"1"); sub(/..$/,".&")
               printf "mv %s %s\n", o, $0}' |   # this prints "mv orig new"
sh                # this executes the mv commands
It's completely unnecessary to call awk or sed for this, you can do it in your shell. e.g. with bash:
$ f='071212_090537.jpg'
$ [[ $f =~ ^.(.*)_(.*)(..)\.[^.]+$ ]]
$ echo "1${BASH_REMATCH[1]}${BASH_REMATCH[2]}.${BASH_REMATCH[3]}"
1712120905.37
This is probably what you're trying to do:
for old in *.jpg; do
[[ $old =~ ^.(.*)_(.*)(..)\.[^.]+$ ]] || { printf 'Warning, unexpected old file name format "%s"\n' "$old" >&2; continue; }
new="1${BASH_REMATCH[1]}${BASH_REMATCH[2]}.${BASH_REMATCH[3]}"
[[ -f "$new" ]] && { printf 'Warning, new file name "%s" generated from "%s" already exists, skipping.\n' "$new" "$old" >&2; continue; }
mv -- "$old" "$new"
done
You need that test for new already existing since an old of 071212_090537.jpg or 171212_090537.jpg (or various other values) would create the same new of 1712120905.37
I think sed really is the easiest solution:
You could do this:
▶ for f in *.jpg ; do
new_f=$(sed -E 's/([0-9]{6})_([0-9]{4})([0-9]{2})\.jpg/\1\2.\3.jpg/' <<< $f)
mv $f $new_f
done
For more info:
You probably need to read an introductory tutorial on regular expressions.
Note that the -E option to sed allows use of extended regular expressions, allowing a more readable and convenient expression here.
Use of <<< is a Bashism known as a "here-string". If you are using a shell that doesn't support that, A <<< $b can be rewritten as echo $b | A.
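For example, these two are equivalent here:
▶ sed 's/_//' <<< 071212_090537.jpg
071212090537.jpg
▶ echo 071212_090537.jpg | sed 's/_//'
071212090537.jpg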
Testing:
▶ touch 071212_090538.jpg 071212_090539.jpg
▶ ls -1 *.jpg
071212_090538.jpg
071212_090539.jpg
▶ for f in *.jpg ; do
new_f=$(sed -E 's/([0-9]{6})_([0-9]{4})([0-9]{2})\.jpg/\1\2.\3.jpg/' <<< $f)
mv $f $new_f
done
▶ ls -1
0712120905.38.jpg
0712120905.39.jpg

Sed / Awk : replace the first occurrence of a pattern with the content of another file

With sed, I'm trying to replace the first occurrence of a comment in a script, like :
#ENTRYPOINT_CONTENT
by the content of a second file ($file_content) into another third file (/base.sh).
So, according to many docs, the string should be quite simple, something like :
sed "s|\#ENTRYPOINT_CONTENT|$file_content|" /base.sh
But I always end up with errors like :
sed: -e expression #1, char 23: unterminated `s' command
or similar messages. I also tried different delimiters, and even awk instead, but without success. It seems to be fine after escaping the # in the search pattern, but I still can't get the file content into sed as a variable.
Any ideas, either with sed or awk?
Edit : --------------------
@Sundeep @James Brown:
Don't want to mix the subjects nor to be too long :) the response to your clarification request is in bold at the end of this edit, but just to elaborate the context, my case is a Docker entrypoint script in bash (for a base Docker image) called /root/test/base:
#!/usr/bin/env bash
#ENTRYPOINT_CONTENT
if [[ -e "/root/test/custom" ]]; then
printf "\n\n#ENTRYPOINT_CONTENT\n" >> /root/test/custom
# Code from whjm's reply below (actually works but appends shebangs from custom files)
sed -e '0,/^#ENTRYPOINT_CONTENT/!b; /^#ENTRYPOINT_CONTENT/{ r /root/test/custom' -e 'd; }' /root/test/base.sh >> /root/test/base2.sh
mv /root/test/base2.sh /root/test/base.sh
rm -f /root/test/custom
fi
I just want to let users drop another bash script of their own on a specific path (say /root/test/custom), for example :
#!/usr/bin/env bash
echo 'My 2nd bash code'
The first script above (base) should insert the content of the custom file at #ENTRYPOINT_CONTENT position (in the base script itself, without removing this search string), like this :
#!/usr/bin/env bash
echo 'My 2nd bash code'
#ENTRYPOINT_CONTENT
if [[ -e "/root/test/custom" ]]; then
printf "\n\n#ENTRYPOINT_CONTENT\n" >> /root/test/custom
# Code from whjm's reply below
sed -e '0,/^#ENTRYPOINT_CONTENT/!b; /^#ENTRYPOINT_CONTENT/{ r /root/test/custom' -e 'd; }' /root/test/base.sh >> /root/test/base2.sh
mv /root/test/base2.sh /root/test/base.sh
rm -f /root/test/custom
fi
If another user later drops another custom script at the same path, we should have the code of this third custom script appended like this :
#!/usr/bin/env bash
echo 'My 2nd bash code'
echo 'My 3rd bash code'
#ENTRYPOINT_CONTENT
if [[ -e "/root/test/custom" ]]; then
# ... and so on
Regarding the shebangs from custom files, it's not a real issue if they are appended to the base file. The sed code from @whjm works as expected but appends them, while (surprisingly) both awk codes from @James Brown already (and gracefully :) ignore all additional shebangs from custom files (probably because they also start with #, like the #ENTRYPOINT_CONTENT search string), but they currently prepend the code like:
#!/usr/bin/env bash
echo 'My 3rd bash code'
echo 'My 2nd bash code'
#ENTRYPOINT_CONTENT
while I'm trying to get it appended like :
#!/usr/bin/env bash
echo 'My 2nd bash code'
echo 'My 3rd bash code'
#ENTRYPOINT_CONTENT
So, in short, @Sundeep, if you could just give me an updated version of your awk code for this, it would be perfect! :D (couldn't find a way to invert this...) Thanks a lot.
Code from your previous post :
NR==FNR { b=b (FNR==1?"":ORS) $0; next }
{ r=r (FNR==1?"":ORS) $0 } #ENTRYPOINT_CONTENT
END { sub(/\#[^\n]+/,r,b); print b}
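For reference, one hedged way to get the appended order is to insert the custom file's lines just above the marker instead of replacing it (an untested sketch; it also skips a leading shebang in the custom file):
awk '
NR == FNR { if (FNR > 1 || !/^#!/) custom = custom $0 ORS; next }  # slurp custom file, minus its shebang
!done && /^#ENTRYPOINT_CONTENT/ { printf "%s", custom; done = 1 }  # insert just above the first marker
{ print }
' /root/test/custom /root/test/base.sh > /root/test/base2.sh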
If you're using GNU sed:
[STEP 101] # cat file1
11
xx // replace me
44
xx // don't replace me
55
[STEP 102] # cat file2
22
33
[STEP 103] # sed -e '0,/^xx/!b; /^xx/{ r file2' -e 'd }' file1
11
22
33
44
xx // don't replace me
55
[STEP 104] #

Trying to modify awk code

awk 'BEGIN{OFS=","} FNR == 1
{if (NR > 1) {print fn,fnr,nl}
fn=FILENAME; fnr = 1; nl = 0}
{fnr = FNR}
/ERROR/ && FILENAME ~ /\.gz$/ {nl++}
{
cmd="gunzip -cd " FILENAME
cmd; close(cmd)
}
END {print fn,fnr,nl}
' /tmp/appscraps/* > /tmp/test.txt
The above scans all files in a given directory and prints the file name, the number of lines in each file, and the number of lines containing 'ERROR'.
I'm now trying to make it so that the script executes a command if any of the files it reads isn't a regular file, i.e., if the file is a gzip file, then run a particular command.
Above is my attempt to include the gunzip command in there and to do it on my own. Unfortunately, it isn't working. Also, I cannot "gunzip" all the files in the directory beforehand, because not all files in the directory will be "gzip" type; some will be regular files.
So I need the script to treat any .gz file it finds in a different way so it can read it, count and print the number of lines in it, and the number of lines it found matching the pattern supplied (just as it would if the file had been a regular file).
any help?
This part of your script makes no sense:
{if (NR > 1) {print fn,fnr,nl}
fn=FILENAME; fnr = 1; nl = 0}
{fnr = FNR}
/ERROR/ && FILENAME ~ /\.gz$/ {nl++}
Let me restructure it a bit and comment it so it's clearer what it does:
{ # for every line of every input file, do the following:
# If this is the 2nd or subsequent line, print the values of these variables:
if (NR > 1) {
print fn,fnr,nl
}
fn = FILENAME # set fn to FILENAME. Since this will occur for the first line of
# every file, this is that value fn will have when printed above,
# so why not just get rid of fn and print FILENAME?
fnr = 1 # set fnr to 1. This is immediately over-written below by
# setting it to FNR so this is pointless.
nl = 0
}
{ # for every line of every input file, also do the following
# (note the unnecessary "}" then "{" above):
fnr = FNR # set fnr to FNR. Since this will occur for the first line of
# every file, this is that value fnr will have when printed above,
# so why not just get rid of fnr and print FNR-1?
}
/ERROR/ && FILENAME ~ /\.gz$/ {
nl++ # increment the value of nl. Since nl is always set to zero above,
# this will only ever set it to 1, so why not just set it to 1?
# I suspect the real intent is to NOT set it to zero above.
}
You also have the code above testing for a file name that ends in ".gz" but then you're running gunzip on every file in the very next block.
Beyond that, just call gunzip from shell as everyone else also suggested. awk is a tool for parsing text, it's not an environment from which to call other tools - that's what a shell is for.
For example, assuming your comment (prints the file name, number of lines in each file and number of lines found containing 'ERROR') accurately describes what you want your awk script to do, and assuming it makes sense to test for the word "ERROR" directly in a ".gz" file using awk:
for file in /tmp/appscraps/*.gz
do
awk -v OFS=',' '/ERROR/{nl++} END{print FILENAME, NR+0, nl+0}' "$file"
gunzip -cd "$file"
done > /tmp/test.txt
Much clearer and simpler, isn't it?
If it doesn't make sense to test for the word ERROR directly in a ".gz" file, then you can do this instead:
for file in /tmp/appscraps/*.gz
do
zcat "$file" | awk -v file="$file" -v OFS=',' '/ERROR/{nl++} END{print file, NR+0, nl+0}'
gunzip -cd "$file"
done > /tmp/test.txt
To handle gz and non-gz files as you've now described in your comment below:
for file in /tmp/appscraps/*
do
case $file in
*.gz ) cmd="zcat" ;;
* ) cmd="cat" ;;
esac
"$cmd" "$file" |
awk -v file="$file" -v OFS=',' '/ERROR/{nl++} END{print file, NR+0, nl+0}'
done > /tmp/test.txt
I left out the gunzip since you don't need it as far as I can tell from your stated requirements. If I'm wrong, explain what you need it for.
I think it could be simpler than that.
With shell expansion you already have the file name (hence you can print it).
So you can do a loop over all the files, and for each do the following:
print the file name
zgrep -c ERROR $file (this outputs the number of lines containing 'ERROR')
zcat $file|wc -l (this outputs the number of lines)
zgrep and zcat work on both plain text files and gzipped ones.
Assuming you don't have any spaces in the paths/filenames:
for f in /tmp/appscraps/*
do
n_lines=$(zcat "$f"|wc -l)
n_errors=$(zgrep -c ERROR "$f")
echo "$f $n_lines $n_errors"
done
This is untested but it should work.
You can execute the following command for each file:
gunzip -t FILENAME; echo $?
It will print the exit code: 0 (for gzip files) or 1 (corrupt/other files). Now you can compare the output using an if statement to execute the required processing.
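A minimal sketch of that idea, assuming the same directory as above:
for f in /tmp/appscraps/*
do
  if gunzip -t "$f" 2>/dev/null   # exit status 0 means valid gzip data
  then
    zcat "$f" | wc -l             # count lines after decompression
  else
    wc -l < "$f"                  # regular file: count lines directly
  fi
done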