My Scenario:
I am writing a bash script in which I am trying to match a pattern using the following command:
awk '/'$messageID'/' /file/path/fileName
where $messageID is a bash variable.
I am facing the following problem:
As long as $messageID contains a string without any special characters it works fine, but if the variable contains a special character such as $, the command does not give the proper output.
Expected result:
Even if the variable $messageID contains special characters, the output should be correct.
Any help would be appreciated.
Why don't you pass the variable to awk?
awk -v m="$match" '$0 ~ m' file
This way, you do not have to worry about special characters in the variable breaking the script.
Test
$ match='te$t'
$ cat a
hello this is
a te$t line
with other te$t info
$ awk -v m="$match" '$0 ~ m' a
a te$t line
with other te$t info
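Note that $0 ~ m still treats m as a regular expression. If the variable should be matched completely literally (for example when it may contain characters like ., * or $ that you do not want interpreted), a small sketch using index() on the same file would be:
$ awk -v m="$match" 'index($0, m)' a
a te$t line
with other te$t info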
Related
I have the following lines in a text file:
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:FOO.${TAG_NAME}
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:BAR.${TAG_NAME}
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:BAZ.${TAG_NAME}
I want to write a sed or awk command that finds all occurrences of FOO., BAR. and BAZ. in the above lines in a file and deletes these occurrences so that the result looks like this in the end:
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:${TAG_NAME}
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:${TAG_NAME}
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:${TAG_NAME}
Sed one:
sed 's/\b\(FOO\|BAR\|BAZ\)\.//' input_file
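A quick demo on the lines above, assuming GNU sed (both \b and the \| alternation are GNU extensions to basic regular expressions):
$ sed 's/\b\(FOO\|BAR\|BAZ\)\.//' input_file
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:${TAG_NAME}
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:${TAG_NAME}
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:${TAG_NAME}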
When you want to replace the first occurrence of :something. in every line, try
sed 's/:[^.]*[.]/:/' inputfile
When this works and you want the replacement made in the file itself without keeping a backup, you can use the option -i:
sed -i 's/:[^.]*[.]/:/' inputfile
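If you do want a backup of the original file, a suffix can be supplied with -i; note that GNU sed takes the suffix attached to the option while BSD/macOS sed takes it as a separate argument:
sed -i.bak 's/:[^.]*[.]/:/' inputfile      # GNU sed
sed -i '.bak' 's/:[^.]*[.]/:/' inputfile   # BSD/macOS sed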
When ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME} might contain a :, the pattern is matched in the wrong place. When you are sure that ${TAG_NAME} contains no :, use
sed -r 's/:[^.]*[.]([^:]*)$/:\1/' inputfile
Edit: After the comment of @potong, I replaced '.' with ':' in the replacement strings; I had kept the wrong character.
Use one of these Perl one-liners:
# Remove { FOO, BAR or BAZ } and the following '.' :
perl -pe 's/(FOO|BAR|BAZ)[.]//' in_file > out_file
# Remove *anything* between the last ':' (exclusive) and '.' (inclusive) :
perl -pe 's/\A(.+:)[^.]+[.]/$1/' in_file > out_file
The Perl one-liners use these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
The regex uses:
\A : beginning of the line,
(.+:) : capture into variable $1 everything from the beginning of the line up to the last colon (the .+ is greedy),
[^.]+[.] : 1 or more occurrences of any character other than '.', followed by '.'. The dot is placed inside brackets ([.]) so that it matches a literal dot.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
In awk, the variable FS, which may contain a regex, can help.
$ cat file
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:FOO.${TAG_NAME}
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:BAR.${TAG_NAME}
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:BAZ.${TAG_NAME}
In the input we can see three words of three uppercase characters each (FOO, BAR, BAZ), each followed by a dot. We can build a regex for them and use it as the FS separator...
FS='[A-Z]{3}\\.'
In the awk manual we can read that "the value of FS may be a string containing any regular expression. In this case, each match in the record for the regular expression separates fields."
https://www.gnu.org/software/gawk/manual/html_node/Regexp-Field-Splitting.html
So we have
awk -v FS='[A-Z]{3}\\.' '{print $1 $2}' file
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:${TAG_NAME}
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:${TAG_NAME}
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:${TAG_NAME}
...note that $1 $2 are printed without a comma, so the two fields are concatenated with nothing in between.
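For comparison, printing the fields with a comma would insert OFS (a space by default) where the removed part used to be:
$ awk -v FS='[A-Z]{3}\\.' '{print $1, $2}' file
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}: ${TAG_NAME}
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}: ${TAG_NAME}
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}: ${TAG_NAME}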
The same result is given by a regex listing the 3 specific words:
$ awk -v FS='(FOO|BAR|BAZ)\\.' '{print $1 $2}' file
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:${TAG_NAME}
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:${TAG_NAME}
docker pull ${DOCKER_REGISTRY}/{REPOSITORY}/{IMAGE_NAME}:${TAG_NAME}
--
I read a lot here about awk and variables, but could not find what I want.
I have some files ($FILES) in a directory ($DIR) and I want to search those files for all lines containing both of two strings (SEARCH1 and SEARCH2), using sh (/bin/bash). I do NOT want to use the read command, so I prefer awk/grep/sed. The wanted output is the line(s) containing the 2 strings and the corresponding file name(s).
When I use this code, everything is ok:
FILES="news_*.txt"
DIR="/news"
awk '/Corona US/&&/Infected/{print a[FILENAME]?$0:FILENAME RS $0;a[FILENAME]++}' ${DIR}/${FILES}
Now I want to replace the 2 patterns ('Corona US' and 'Infected') with variables in the awk command, so I tried:
SEARCH1="Corona US"
SEARCH2="Infected"
awk -v str1="$SEARCH1" -v str2="$SEARCH2" '/str1/&&/str2/{print a[FILENAME]?$0:FILENAME RS $0;a[FILENAME]++}' ${DIR}/${FILES}
However that did not give me the right output: it came up empty (didn't find anything).
Since you have not shown a sample of the output I couldn't test it, but based on the OP's code, here is an attempt to fix it.
awk -v str1="$SEARCH1" -v str2="$SEARCH2" 'index($0,str1) && index($0,str2){print (seen[FILENAME]++ ? "" : FILENAME ORS) $0}' ${DIR}/${FILES}
OR
awk -v str1="$SEARCH1" -v str2="$SEARCH2" '$0 ~ str1 && $0 ~ str2{print (seen[FILENAME]++ ? "" : FILENAME ORS) $0}' ${DIR}/${FILES}
The issue with the OP's code: awk does not expand variables placed inside /.../; the variable has to be used with index() or with the $0 ~ str style instead.
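A minimal illustration of the difference, using a made-up sample line:
$ echo "US Infected count rising" | awk -v str2="Infected" '/str2/'
$ echo "US Infected count rising" | awk -v str2="Infected" '$0 ~ str2'
US Infected count rising
The first command prints nothing because /str2/ looks for the literal text str2; the second matches the contents of the variable.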
It isn't 100% clear exactly what you are looking for, but it sounds like grep -H with an alternation would allow you to output the filename and the lines that contain both $SEARCH1 and $SEARCH2 (in either order) anywhere in the line. For example, you could do:
grep -H "$SEARCH1.*$SEARCH2\|$SEARCH2.*$SEARCH1" "$DIR/"$FILES
(note $FILES must NOT be quoted in order for * expansion to take place.)
If you just want a list of filenames that contain a match on any line, you can change -H to -l.
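For example:
grep -l "$SEARCH1.*$SEARCH2\|$SEARCH2.*$SEARCH1" "$DIR/"$FILES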
We have written a shell script to search for a file-name pattern.
file format:
<number>_<20180809>.txt
starting with a single number and ending with an 8-digit number
Command:
awk -v string="12_1234" -v serch="^[0-9]+_+[0-9][0-9][0-9][0-9]$" "BEGIN{ if (string ~/serch$/) print string }"
If the string matches, then the value should be printed.
You can just change your command in the following way and it will work:
awk -v string='12_1234' -v search='^[0-9]+_+[0-9][0-9][0-9][0-9]$' 'BEGIN{ if (string ~ search) print string }'
12_1234
You do not need to use the /.../ syntax for the regex if you use the ~ operator, and you also had one extra $ at the end. You were really close!
Then you need to adapt the search regex to ^[0-9]_[0-9]{8}$ to match your <number>_<20180809> pattern exactly.
Also, if you are just extracting this information from a file, you can use grep:
$ awk -v string='1_12345678' -v search='^[0-9]_[0-9]{8}$' 'BEGIN{ if (string ~ search) print string }'
1_12345678
$ (search='^[0-9]_[0-9]{8}$'; echo '1_12345678')| grep -oE "$search"
1_12345678
I have a fasta file that contains two gene sequences. What I want to do is remove the fasta headers (lines starting with ">"), concatenate the remaining lines, and output that sequence.
Here is my fasta sequence (genome.fa):
>Potrs164783
AGGAAGTGTGAGATTGAAAAAACATTACTATTGAGGAATTTTTGACCAGATCAGAATTGAACCAACATGATGAAGGGGAT
TGTTTGCCATCAGAATATGGCATGAAATTTCTCCCCTAGATCGGTTCAAGCTCCTGTAGGTTTGGAGTCCTTAGTGAGAA
CTTTCTTAAGAGAATCTAATCTGGTCTGTTCCTCGTCATAAGTTAAAGAAAAACTTGAAACAAATAACAAGCATGCATAA
>Potrs164784
TTACCCTCTACCAGCACCAATGCCTATGATCTTACAAAAATCCTTAATAAAAAGAAATCCAAAACCATTGTTACCATTCC
GGAATTACATTCTGAGATAAAAACCCTCAAATCTGAATTACAATCCCTTAAACAAGCCCAACAAAAAGACTCTGCCATAC
Desired output
AGGAAGTGTGAGATTGAAAAAACATTACTATTGAGGAATTTTTGACCAGATCAGAATTGAACCAACATGATGAAGGGGAT
TGTTTGCCATCAGAATATGGCATGAAATTTCTCCCCTAGATCGGTTCAAGCTCCTGTAGGTTTGGAGTCCTTAGTGAGAA
CTTTCTTAAGAGAATCTAATCTGGTCTGTTCCTCGTCATAAGTTAAAGAAAAACTTGAAACAAATAACAAGCATGCATAA
TTACCCTCTACCAGCACCAATGCCTATGATCTTACAAAAATCCTTAATAAAAAGAAATCCAAAACCATTGTTACCATTCC
GGAATTACATTCTGAGATAAAAACCCTCAAATCTGAATTACAATCCCTTAAACAAGCCCAACAAAAAGACTCTGCCATAC
I am using awk to do this, but I am getting this error:
awk 'BEGIN{filename="file1"}{if($1 ~ />/){filename=$1; sub(/>/,"",filename); print filename;} print $0 >filename.fa;}' ../genome.fa
awk: syntax error at source line 1
context is
BEGIN{filename="file1"}{if($1 ~ />/){filename=$1; sub(/>/,"",filename); print filename;} print $0 >>> >filename. <<< fa;}
awk: illegal statement at source line 1
I am basically a python person and I was given this script by someone. What am I doing wrong here?
I realized that I was not clear, so I am pasting the whole code that I got from someone. The input file and desired output remain the same.
mkdir split_genome;
cd split_genome;
awk 'BEGIN{filename="file1"}{if($1 ~ />/){filename=$1; sub(/>/,"",filename); print filename;} print $0 >filename.fa;}' ../genome.fa;
ls -1 `pwd`/* > ../scaffold_list.txt;
cd ..;
If all you want to do is produce the desired output shown in your question, other solutions will work.
However, the script you have is trying to print each sequence to a separate file named after its header, with the extension .fa.
The syntax error you're getting is because filename.fa is neither a variable nor a quoted string: variable names can't contain a ., so no awk will accept print $0 > filename.fa as written. On top of that, BSD awk does not allow building the output file name from a string expression right in the redirection, whereas GNU awk does.
So the seemingly obvious fix:
print $0 > filename".fa"
would produce the same error in BSD Awk, but would work in GNU Awk.
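A portable middle ground, which I believe both BSD and GNU awk accept, is to parenthesize the expression on the right-hand side of the redirection:
print $0 > (filename ".fa")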
To fix this, you can append the extension ".fa" to filename at assignment.
This will do the job:
$ awk '{if($0 ~ /^>/) filename=substr($0, 2)".fa"; else print $0 > filename}' file
$ cat Potrs164783.fa
AGGAAGTGTGAGATTGAAAAAACATTACTATTGAGGAATTTTTGACCAGATCAGAATTGAACCAACATGATGAAGGGGAT
TGTTTGCCATCAGAATATGGCATGAAATTTCTCCCCTAGATCGGTTCAAGCTCCTGTAGGTTTGGAGTCCTTAGTGAGAA
CTTTCTTAAGAGAATCTAATCTGGTCTGTTCCTCGTCATAAGTTAAAGAAAAACTTGAAACAAATAACAAGCATGCATAA
$ cat Potrs164784.fa
TTACCCTCTACCAGCACCAATGCCTATGATCTTACAAAAATCCTTAATAAAAAGAAATCCAAAACCATTGTTACCATTCC
GGAATTACATTCTGAGATAAAAACCCTCAAATCTGAATTACAATCCCTTAAACAAGCCCAACAAAAAGACTCTGCCATAC
You'll notice I left out the BEGIN{filename="file1"} declaration statement as it is unnecessary. Also, I replaced the sub(...) call with the string function substr, as it is clearer and requires fewer operations.
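To see what the substr call produces for a header line, a quick check from the shell:
$ echo ">Potrs164783" | awk '{print substr($0, 2)".fa"}'
Potrs164783.fa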
The awk code that you show attempts to do something different than produce the output that you want. Fortunately, there are much simpler ways to obtain your desired output. For example:
$ grep -v '>' ../genome.fa
AGGAAGTGTGAGATTGAAAAAACATTACTATTGAGGAATTTTTGACCAGATCAGAATTGAACCAACATGATGAAGGGGAT
TGTTTGCCATCAGAATATGGCATGAAATTTCTCCCCTAGATCGGTTCAAGCTCCTGTAGGTTTGGAGTCCTTAGTGAGAA
CTTTCTTAAGAGAATCTAATCTGGTCTGTTCCTCGTCATAAGTTAAAGAAAAACTTGAAACAAATAACAAGCATGCATAA
TTACCCTCTACCAGCACCAATGCCTATGATCTTACAAAAATCCTTAATAAAAAGAAATCCAAAACCATTGTTACCATTCC
GGAATTACATTCTGAGATAAAAACCCTCAAATCTGAATTACAATCCCTTAAACAAGCCCAACAAAAAGACTCTGCCATAC
Alternatively, if you had intended to have all non-header lines concatenated into one line:
$ sed -n '/^>/!H; $!d; x; s/\n//gp' ../genome.fa
AGGAAGTGTGAGATTGAAAAAACATTACTATTGAGGAATTTTTGACCAGATCAGAATTGAACCAACATGATGAAGGGGATTGTTTGCCATCAGAATATGGCATGAAATTTCTCCCCTAGATCGGTTCAAGCTCCTGTAGGTTTGGAGTCCTTAGTGAGAACTTTCTTAAGAGAATCTAATCTGGTCTGTTCCTCGTCATAAGTTAAAGAAAAACTTGAAACAAATAACAAGCATGCATAATTACCCTCTACCAGCACCAATGCCTATGATCTTACAAAAATCCTTAATAAAAAGAAATCCAAAACCATTGTTACCATTCCGGAATTACATTCTGAGATAAAAACCCTCAAATCTGAATTACAATCCCTTAAACAAGCCCAACAAAAAGACTCTGCCATAC
Try this to print the lines that do not start with >, joined into one line:
awk '!/^>/{printf "%s", $0}' genome.fa > filename.fa
With the line breaks kept:
awk '!/^>/' genome.fa > filename.fa
To create single files named by the headers:
awk 'split($0,a,"^>")>1{file=a[2];next}{print >file}' genome.fa
awk -F, -f awkfile.awk -v mysearch="search term"
I am trying to use the above command from the terminal and pass the value of mysearch as the search term in the awk program. My awk program runs perfectly fine when the search term is hard-coded inside the program, but I am wondering how to get the variable to be used.
An example of the line where it is used: if($j ~ /mysearch/){. This does not set the search term from the variable; it actually searches for the literal string mysearch.
Just remove the slashes:
$j ~ mysearch
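Put together, a minimal sketch (the field index 1 and the file name data.txt are placeholders here):
awk -F, -v mysearch="search term" '{ j = 1; if ($j ~ mysearch) print "Found:", $0 }' data.txt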
This is not ideal, but I suggest writing a bash script which takes in the search term, replaces it in the awk script, and then runs the script. For example:
$ cat dosearch.sh
sed "s/XXX/$1/" awktemplate.awk > awkfile.awk
awk -f awkfile.awk data.txt
$ cat awktemplate.awk
{
    j = 1
    if ($j ~ /XXX/) {
        # Do something, such as
        print "Found:", $0
    }
}
$ cat data.txt
foo here
bar there
xyz everywhere
$ ./dosearch.sh foo
Found: foo here
$ ./dosearch.sh bar
Found: bar there
In the above example, the awk template contains "XXX" as a search term; the bash script replaces that search term with the first parameter and then invokes awk on the modified script.
$ cat input
tinky-winky
dipsy
laa-laa
noo-noo
po
$ teletubby='po'
$ awk -v "regexp=$teletubby" '$0 ~ regexp' input
po
Note that anything can go into the shell variable, even a full-blown regexp, e.g. ^d.*y. Just make sure to use single quotes to prevent the shell from doing any expansion.
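For example, with a regexp in the variable:
$ teletubby='^d.*y'
$ awk -v "regexp=$teletubby" '$0 ~ regexp' input
dipsy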