Print word after string - awk

How do I print the word after a certain string?
I have a file
Hello word.
I want to get an output:
#word
Something like
awk '{ print '#' }'
Thank you

You may use
awk '{sub(/[[:punct:]]+$/, "", $2);print "#"$2}' file > newfile
See the online demo.
The sub(/[[:punct:]]+$/, "", $2) operation removes 1 or more punctuation chars ([[:punct:]]+) from the end ($) of Field 2 and print "#"$2 prepends it with # and prints it.
To make sure to get the word after Hello you may use
grep -Po 'Hello\s+\K\w+' file | awk '{print "#"$0}'
See the online demo
Or, with the help of sed:
sed 's/.*Hello \([[:alnum:]]*\).*/#\1/' file > newfile
See another demo.
Here, .*Hello \([[:alnum:]]*\).* matches any 0+ chars, then Hello, a space, then captures 0 or more alphanumeric chars into Group 1 (\1) and then matches the rest of the line. The #\1 pattern in the RHS leaves just what was captured with a # in front.
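As a minimal sketch of the sed approach, assuming a hypothetical two-line input: adding -n and the p flag (not part of the command above) makes sed print only the lines where the substitution happened, so lines without Hello are dropped instead of being copied through unchanged.

```shell
# Hypothetical sample input: one line with the keyword, one without.
input=$(mktemp)
printf 'Hello word.\nno greeting here\n' > "$input"

# -n suppresses default output; the trailing p prints only substituted lines.
tagged=$(sed -n 's/.*Hello \([[:alnum:]]*\).*/#\1/p' "$input")
printf '%s\n' "$tagged"

rm -f "$input"
```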
Another solution with awk with . and / as field delimiters using -F'[/.]':
awk -F'[/.]' '{for(i=1;i<=NF;i++){if($i=="54517334"){print "#"$(i+1)}}}' file
See an online demo
Here, for(i=1;i<=NF;i++) enumerates all the fields, and if the field is equal to 54517334, the next field with a # prepended at the start is printed.

EDIT: In case you want to search for a string and print the string next to it, try the following. I am looking for the string hello here; you can change it as per your need.
awk '{for(i=1;i<=NF;i++){if(tolower($i)=="hello"){print "#"$(i+1)}}}' Input_file
OR (create an awk variable and compare against it, so the search string can be changed in the variable itself rather than in the code)
awk -v word_to_search="hello" '{for(i=1;i<=NF;i++){if(tolower($i)==word_to_search){print "#"$(i+1)}}}' Input_file
In case you want to find a string, print the word that follows it, and remove punctuation from that word, try the following.
awk -v word_to_search="hello" '{for(i=1;i<=NF;i++){if(tolower($i)==word_to_search){sub(/[[:punct:]]+/,"",$(i+1));print "#"$(i+1)}}}' Input_file
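The loops above also fire when the keyword is the last field on a line, in which case they print a bare #. A possible variant, sketched here under the assumption that such lines should be skipped, stops the loop one field early and works on a copy of the next field (w, a name introduced only for this example). It uses an explicit punctuation list instead of [[:punct:]] purely to stay friendly to older awk builds; the sample lines are hypothetical.

```shell
# Hypothetical sample input.
tmp=$(mktemp)
printf 'Hello word.\nsay hello\nwell, HELLO there!\n' > "$tmp"

out=$(awk -v word_to_search="hello" '{
  # i < NF: only consider fields that actually have a successor.
  for (i = 1; i < NF; i++) {
    if (tolower($i) == word_to_search) {
      w = $(i + 1)
      # Strip leading and trailing punctuation from the copy.
      gsub(/^[.,;:!?]+|[.,;:!?]+$/, "", w)
      print "#" w
    }
  }
}' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```

Working on a copy leaves $0 untouched, which matters if later rules still need the original line.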
Could you please try following.
awk -F'[ .]' '{print "#"$2}' Input_file
OR
awk -F'[ .]' '{print "#"$(NF-1)}' Input_file

Related

How to extract string from a file in bash

I have a file called DB_create.sql which has this line
CREATE DATABASE testrepo;
I want to extract only testrepo from this. So I've tried
cat DB_create.sql | awk '{print $3}'
This gives me testrepo;
I need only testrepo. How do I get this ?
With your shown samples, please try following.
awk -F'[ ;]' '{print $(NF-1)}' DB_create.sql
OR
awk -F'[ ;]' '{print $3}' DB_create.sql
OR without setting any field separators try:
awk '{sub(/;$/,"");print $3}' DB_create.sql
A simple explanation: set the field separator to a space OR a semicolon, then print the second-to-last field, $(NF-1), which is what the OP needs here. Also, you don't need cat with awk, because awk can read Input_file by itself.
Using gnu awk, you can set record separator as ; + line break:
awk -v RS=';\r?\n' '{print $3}' file.sql
testrepo
Or using any POSIX awk, just do a call to sub to strip trailing ;:
awk '{sub(/;$/, "", $3); print $3}' file.sql
testrepo
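A small sketch combining the two ideas above, assuming the file may contain either LF or CRLF line endings: the optional \r in the pattern lets the POSIX sub() variant cope with both. The second database name (otherrepo) is hypothetical test data.

```shell
# Sample file: first line ends with LF, second with CRLF (hypothetical).
sql=$(mktemp)
printf 'CREATE DATABASE testrepo;\nCREATE DATABASE otherrepo;\r\n' > "$sql"

# Remove a trailing ";" plus an optional carriage return from field 3.
db_names=$(awk '{ sub(/;\r?$/, "", $3); print $3 }' "$sql")
printf '%s\n' "$db_names"

rm -f "$sql"
```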
You can use
awk -F'[;[:space:]]+' '{print $3}' DB_create.sql
where the field separator is set to a [;[:space:]]+ regex that matches one or more occurrences of ; or/and whitespace chars. Then, Field 3 will contain the string you need without the semi-colon.
More pattern details:
[ - start of a bracket expression
; - a ; char
[:space:] - any whitespace char
] - end of the bracket expression
+ - a POSIX ERE one or more occurrences quantifier.
See the online demo.
Use your own code but adding the function sub():
cat DB_create.sql | awk '{sub(/;$/, "",$3);print $3}'
Although it's better not to use cat. Here you can see why: Comparison of cat pipe awk operation to awk command on a file
So better this way:
awk '{sub(/;$/, "",$3);print $3}' file

awk command to read a key value pair from a file

I have a file input.txt which stores information in KEY:VALUE form. I'm trying to read GOOGLE_URL from this input.txt, but it prints only https because the separator is :. What is the problem with my grep command, and how should I print the entire URL?
SCRIPT
$> cat script.sh
#!/bin/bash
URL=`grep -e '\bGOOGLE_URL\b' input.txt | awk -F: '{print $2}'`
printf " $URL \n"
INPUT_FILE
$> cat input.txt
GOOGLE_URL:https://www.google.com/
OUTPUT
https
DESIRED_OUTPUT
https://www.google.com/
Since there are multiple : in your input, getting $2 will not work in awk because it will just give you 2nd field. You actually need an equivalent of cut -d: -f2- but you also need to check key name that comes before first :.
This awk should work for you:
awk -F: '$1 == "GOOGLE_URL" {sub(/^[^:]+:/, ""); print}' input.txt
https://www.google.com/
Or this non-regex awk approach that allows you to pass key name from command line:
awk -F: -v k='GOOGLE_URL' '$1==k{print substr($0, length(k FS)+1)}' input.txt
Or using gnu-grep:
grep -oP '^GOOGLE_URL:\K.+' input.txt
https://www.google.com/
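One way to package the non-regex approach for use in a script is a small shell function. This is only a sketch: get_value is a hypothetical helper name, and it assumes the first matching key should win (hence the exit).

```shell
# get_value KEY FILE: print everything after the first ":" on the line
# whose first field equals KEY. (Hypothetical helper, not from the thread.)
get_value() {
  awk -F: -v k="$1" '$1 == k { print substr($0, length(k) + 2); exit }' "$2"
}

cfg=$(mktemp)
printf 'EXAMPLE_URL:http://www.example.com/\nGOOGLE_URL:https://www.google.com/\nKEY:GOOGLE_URL:\n' > "$cfg"

URL=$(get_value GOOGLE_URL "$cfg")
printf '%s\n' "$URL"

rm -f "$cfg"
```

Because the key is compared with ==, a line such as KEY:GOOGLE_URL: is not matched, and no regex escaping of the key is needed.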
Could you please try the following, written and tested with the shown samples in GNU awk. It looks for a line starting with GOOGLE_URL: and captures the URL that follows, whether it begins with http or https; if you need only https, change http[s]? to https in the solution below.
awk '/^GOOGLE_URL:/{match($0,/http[s]?:\/\/.*/);print substr($0,RSTART,RLENGTH)}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/^GOOGLE_URL:/{ ##Checking condition if line starts from GOOGLE_URL: then do following.
match($0,/http[s]?:\/\/.*/) ##Using match function to match http[s](s optional) : till last of line here.
print substr($0,RSTART,RLENGTH) ##Printing sub string of matched value from above function.
}
' Input_file ##Mentioning Input_file name here.
2nd solution: In case you need anything coming after first : then try following.
awk '/^GOOGLE_URL:/{match($0,/:.*/);print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Take your pick:
$ sed -n 's/^GOOGLE_URL://p' file
https://www.google.com/
$ awk 'sub(/^GOOGLE_URL:/,"")' file
https://www.google.com/
The above will work using any sed or awk in any shell on every UNIX box.
I would use GNU AWK following way for that task:
Let file.txt content be:
EXAMPLE_URL:http://www.example.com/
GOOGLE_URL:https://www.google.com/
KEY:GOOGLE_URL:
Then:
awk 'BEGIN{FS="^GOOGLE_URL:"}{if(NF==2){print $2}}' file.txt
will output:
https://www.google.com/
Explanation: in GNU AWK, FS may be a regular expression, so I set it to GOOGLE_URL: anchored (^) to the beginning of the line, so that GOOGLE_URL: in the middle or at the end of a line is not treated as a separator (consider the 3rd line of input). With this FS, each line has either 1 or 2 fields; the latter is the case only if the line starts with GOOGLE_URL:, so I check the number of fields (NF) and, if it is 2, print the 2nd field ($2), as the first field in this case is empty.
(tested in gawk 4.2.1)
Yet another awk alternative:
gawk -F'(^[^:]*:)' '/^GOOGLE_URL:/{ print $2 }' infile

Issue with field separator in AWK script

I have a very large file; sample lines are shown below. Each line has two fields, name and revision, delimited by a colon. I need to print only the second column.
sam:7.[0:6]
Ram:8.[6:6]_rev[2:4] h_ack[2:6]
vincent:58
I tried this code:
#!/bin/bash
awk -F: '{print $2}'
7.[0
8.[6
58
Output should be:
7.[0:6]
8.[6:6]_rev[2:4] h_ack[2:6]
58
What went wrong in my code.
The problem in your awk expression is that you are splitting on all :.
Instead, you want to split only on the first : from the start.
$ awk -F'^[^:]+:' '{print $2}' file
The regex matches the start of the string (^), one or more characters other than : ([^:]+), and finally a :.
If you specify : as the field separator, this output is normal awk behavior: $2 holds only the text between the first and second colons, e.g. 7.[0, so the columns after $2 are lost.
cut here, better suits the requirement:
cut -d: -f2- file
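The same "everything after the first colon" result can also be obtained in awk without any field splitting. The sketch below swaps in index() and substr(), which are not used in the answers above, and assumes lines without a colon should simply be skipped.

```shell
data=$(mktemp)
printf 'sam:7.[0:6]\nRam:8.[6:6]_rev[2:4] h_ack[2:6]\nvincent:58\n' > "$data"

# index() returns the position of the first ":" (0 if there is none);
# substr() then keeps the rest of the line after it.
revisions=$(awk '{ p = index($0, ":"); if (p) print substr($0, p + 1) }' "$data")
printf '%s\n' "$revisions"

rm -f "$data"
```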
Could you please try following.
awk '
match($0,/:.*/){
print substr($0,RSTART+1,RLENGTH-1)
}
' Input_file

Change output field separator on first row

I have a file like this
name|age
Bob|30
Tom|50
Cindy|10
I want the first row to have a different separator, "^".
awk 'NR==1 { gsub("|","^")1}1' f
But I keep getting
^n^a^m^e^|^a^g^e^
Bob|30
Tom|50
Cindy|10
Desired output is
name^age
Bob|30
Tom|50
Cindy|10
Your gsub("|","^") call does not escape |, a regex metacharacter used for alternation, so the pattern matches the empty string at every position in the input.
You may use this awk without involving any regex:
awk 'BEGIN{FS=OFS="|"} FNR==1{OFS="^"; $1=$1; OFS=FS} 1' f
name^age
Bob|30
Tom|50
Cindy|10
Details:
FS="|": Sets FS as |
OFS="^": Sets OFS as ^
$1=$1: Forces awk to rebuild the record, joining the fields with the current OFS
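To illustrate that the rebuild replaces every separator on the first row and leaves the other rows alone, here is the same command run on a hypothetical three-column file (the city column and its values are made up for the example).

```shell
f=$(mktemp)
printf 'name|age|city\nBob|30|Paris\nTom|50|Oslo\n' > "$f"

# Row 1: switch OFS to "^", rebuild the record, then restore OFS.
converted=$(awk 'BEGIN { FS = OFS = "|" } FNR == 1 { OFS = "^"; $1 = $1; OFS = FS } 1' "$f")
printf '%s\n' "$converted"

rm -f "$f"
```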
You can also use sed:
sed '1 s/|/^/' ip.txt
1 address for the command, which is first line here
| is not special, because by default sed uses BRE, see this Q&A for BRE vs ERE differences
use s/|/^/g if you can have multiple matches
Like this :
awk -F'|' 'NR==1{print $1,$2;next}1' OFS='^' file
or a mix between anubhava response and mine:
awk -F'|' 'NR==1{$1=$1}1' OFS='^' file
Could you please try following.
awk 'FNR==1{sub(/\|/,"^")} 1' Input_file
Use gsub in place of sub if multiple occurrences need to be changed.
awk 'FNR==1{gsub(/\|/,"^")} 1' Input_file

How to print the length size of the following line

I would like to modify a file by including the size of following line using awk.
My file is like this:
>AAAS:1220136:1220159:-:0::NW_015494524.1:1220136-1220159(-)
ATGTCGATGCTCGATC
>AAAS::1215902:1215986:-:1::NW_015494524.1:1215902-1215986(-)
ATGCGATGCTAGCTAGCTCGAT
>AAAS:1215614:1215701:-:1::NW_015494524.1:1215614-1215701(-)
ATGCCGCGACGCAGCACCCGACGCGCAG
I am using awk to modify it to have the following format:
>Assembly_AAAS_1_16
ATGTCGATGCTCGATC
>Assembly_AAAS_2_22
ATGCGATGCTAGCTAGCTCGAT
>Assembly_AAAS_3_28
ATGCCGCGACGCAGCACCCGACGCGCAG
I have used awk to modify the first part.
awk -F":" -v i=1 '/>/{print ">Assembly_" $1 "_" val i "_";i++;next} {print length($0)} 1' infile | sed -e "s/_>/_/g" > outfile
I can use print length($0) but how to print it in the same line?
Thanks
EDIT2: Since the OP has changed the sample data again, adding this code now.
awk -v val="Assembly_AAAS_" '/>/{++i;hdr=">"val i "_";next} {sub(/ +$/,"");print hdr length($0) ORS $0}' Input_file
OR
awk -v val="Assembly_AAAS_" '/>/{++i;hdr=">"val i "_";next} {print hdr length($1) ORS $0}' Input_file
The first command strips trailing spaces from each sequence line before measuring it; if you don't need that, remove the sub(/ +$/,""); part. The header is built in a separate variable (hdr) so that val is not overwritten and the prefix does not accumulate from one record to the next.
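As a minimal sketch of the header-plus-length idea run against the question's sample, assuming each sequence sits on a single line: this variant takes the gene name from the header itself (the text between > and the first colon) rather than from a -v variable. The names gene and n are introduced only for this example.

```shell
fa=$(mktemp)
cat > "$fa" <<'EOF'
>AAAS:1220136:1220159:-:0::NW_015494524.1:1220136-1220159(-)
ATGTCGATGCTCGATC
>AAAS::1215902:1215986:-:1::NW_015494524.1:1215902-1215986(-)
ATGCGATGCTAGCTAGCTCGAT
>AAAS:1215614:1215701:-:1::NW_015494524.1:1215614-1215701(-)
ATGCCGCGACGCAGCACCCGACGCGCAG
EOF

renamed=$(awk -F: '
  # Header line: remember the gene name (field 1 without ">") and count it.
  /^>/ { gene = substr($1, 2); n++; next }
  # Sequence line: emit the new header with the length, then the sequence.
  { print ">Assembly_" gene "_" n "_" length($0); print }
' "$fa")
printf '%s\n' "$renamed"

rm -f "$fa"
```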
EDIT: As per OP changed solution now.
awk -v i=1 -v val=">Assembly_GeneName1_" -v val1="_sizeline" '/>/{value="\047" val i val1;i++;next} {print value length($0) ORS $0}' Input_file
OR
awk -v i=1 -v val=">Assembly_GeneName1_" -v val1="_sizeline" '
/>/{ value="\047" val i val1;
i++;
next}
{
print value length($0) ORS $0
}
' Input_file
Following awk may help you on same.
awk -v i="" -v j=2 '/>/{print "\047>Assembly_GeneName1_"++i"_sizeline"j;j+=2;next} 1' Input_file
Solution 2nd:
awk -v i=1 -v j=2 -v val=">Assembly_GeneName1_" -v val1="_sizeline" '/>/{print "\047" val i val1 j;j+=2;i++;next} 1' Input_file
What you are dealing with is a beautiful example of records which are not lines. awk is a record parser and by default, a record is defined to be a line. With awk you can define a record to be a block of text using the record separator RS.
RS : The first character of the string value of RS shall be the input record separator; a <newline> by default. If RS contains more
than one character, the results are unspecified. If RS is null, then
records are separated by sequences consisting of a <newline> plus one
or more blank lines, leading or trailing blank lines shall not result
in empty records at the beginning or end of the input, and a <newline>
shall always be a field separator, no matter what the value of FS is.
So the goal is to define the record to be
AAAS:1220136:1220159:-:0::NW_015494524.1:1220136-1220159(-)
ATGTCGATGCTCGATC
And this can be done by defining RS="\n>". Furthermore, we will use \n as the field separator FS. This way you can get the requested length as length($2) and the count from the record number NR. Note that a multi-character RS is an extension (the POSIX text quoted above leaves it unspecified); it works in GNU awk, for example.
A simple awk script is then:
awk 'BEGIN{RS="\n>"; FS=OFS="\n"}
{$1=">Assembly_AAAS_"NR"_"length($2)}
{print $1,$2}' file
This produces the requested output for the sample shown.
note: we use print $1,$2 and not print $0 as the last record might have 3 fields (if the last char of the file is a newline). This would imply that you would have an extra empty line at the end of your file.
If you want to pick the AAAS string out of $1 you can use substr($1,1,match($1,":")-1) to pick it up. The first record still starts with > because no newline precedes it, so that character is stripped first. This results in this:
awk 'BEGIN{RS="\n>"; FS=OFS="\n"}
{sub(/^>/,"",$1); $1=">Assembly_"substr($1,1,match($1,":")-1)"_"NR"_"length($2)}
{print $1,$2}' file
Finally, be aware that the above solution only works if there are no spaces in $2. If you want to change that, you can do this:
awk 'BEGIN{RS="\n>"; FS=OFS="\n"}
{ gsub(/[[:blank:]]/,"",$2);
sub(/^>/,"",$1);
$1=">Assembly_"substr($1,1,match($1,":")-1)"_"NR"_"length($2)
}
{ print $1,$2 }' file
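The RS-based scripts above rely on a multi-character record separator, which the quoted POSIX text leaves unspecified. A possible portable alternative, sketched here as an assumption rather than as part of the answer, keeps the default line records and accumulates each sequence until the next header. It also handles sequences wrapped over several lines, joining them into one output line. The function name emit, the variables hdr, seq and n, and the sample data are all hypothetical.

```shell
fa=$(mktemp)
# Hypothetical input: first sequence wrapped over two lines,
# second sequence followed by a trailing space.
printf '>AAAS:1:2\nATGTCG\nATGC\n>BBB:3:4\nACGT \n' > "$fa"

joined=$(awk '
  # Print the pending record, if any.
  function emit() {
    if (hdr != "") {
      print ">Assembly_" hdr "_" n "_" length(seq)
      print seq
    }
  }
  # New header: flush the previous record, keep the name before the first ":".
  /^>/ { emit(); split(substr($0, 2), parts, ":"); hdr = parts[1]; n++; seq = ""; next }
  # Sequence line: drop blanks and tabs, then append.
  { gsub(/[ \t]/, ""); seq = seq $0 }
  END { emit() }
' "$fa")
printf '%s\n' "$joined"

rm -f "$fa"
```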