Issue with field separator in AWK script - awk

I have a very large file with lines like those shown below. Each line has two fields, name and revision, separated by a colon. I need to print only the second column.
sam:7.[0:6]
Ram:8.[6:6]_rev[2:4] h_ack[2:6]
vincent:58
I tried this code:
#!/bin/bash
awk -F: '{print $2}' file
It prints:
7.[0
8.[6
58
Output should be:
7.[0:6]
8.[6:6]_rev[2:4] h_ack[2:6]
58
What went wrong in my code?

The problem in your awk expression is that you are splitting on all :.
Instead, you want to split only on the first : from the start.
$ awk -F'^[^:]+:' '{print $2}' file
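The regex pattern matches the start of the string (^), then one or more characters other than : ([^:]+), and finally a :. The whole first field plus its colon therefore acts as the separator, so $1 is empty and $2 is the rest of the line.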
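If relying on an anchored regex in FS feels fragile across awk implementations, a minimal sketch of the same idea is to find the first colon with index() and print everything after it. The function name after_first_colon is hypothetical and only used for illustration.

```shell
# Hypothetical helper: print everything after the first ":" on each line.
after_first_colon() {
  awk '{
    i = index($0, ":")              # position of the first colon, 0 if there is none
    if (i) print substr($0, i + 1)  # lines without a colon are skipped
  }' "$@"
}

printf '%s\n' 'sam:7.[0:6]' 'Ram:8.[6:6]_rev[2:4] h_ack[2:6]' 'vincent:58' | after_first_colon
```

Because the line is never split into fields, later colons inside the revision are left untouched.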

If you specify : as the field separator, this output is normal awk behavior: the first line, for example, splits into sam, 7.[0 and 6], so $2 is only 7.[0. To get the whole revision you would also need every field after $2.
cut suits this requirement better:
cut -d: -f2- file

Could you please try the following.
awk '
match($0,/:.*/){
print substr($0,RSTART+1,RLENGTH-1)
}
' Input_file
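To see why the offsets are RSTART+1 and RLENGTH-1, it can help to print the values match() sets. This is only an illustrative sketch; show_match is a hypothetical name.

```shell
# Hypothetical helper: show RSTART, RLENGTH and the extracted text for each line
# that contains a colon. The match starts AT the colon, so the wanted text
# begins one character later and is one character shorter.
show_match() {
  awk 'match($0, /:.*/) { print RSTART, RLENGTH, substr($0, RSTART + 1, RLENGTH - 1) }' "$@"
}

printf '%s\n' 'sam:7.[0:6]' | show_match
```

For the sample line this prints 4 8 7.[0:6]: the colon is the 4th character and the match covers the remaining 8 characters.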

Related

right pad regex with spaces using sed or awk

I have a file with two fields separated by :. Both fields vary in length, and the second field can contain all sorts of characters (user input). I want the first field right-padded with spaces to a fixed length of 15 characters. For the first field I have a working regex: #.[A-Z0-9]{4,12}.
sample:
#ABC123:"wild things here"
#7X3Z:"":":#":";:*:-user input:""
#99999X999:"also, imagine: unicode, yay!"
desired output:
#ABC123        :"wild things here"
#7X3Z          :"":":#":";:*:-user input:""
#99999X999     :"also, imagine: unicode, yay!"
There are plenty of examples of how to zero-pad a number, but surprisingly few about padding a regex match or a field in general. Any help using (preferably) sed or awk?
Here is another awk solution that would work with any version of awk:
awk 'BEGIN {FS=OFS=":"} {$1 = sprintf("%-15s", $1)} 1' file
#ABC123        :"wild things here"
#7X3Z          :"":":#":";:*:-user input:""
#99999X999     :"also, imagine: unicode, yay!"
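If the width should not be hard-coded, one possible variation is to build the format string from a variable. This is a sketch only; pad_first_field and the variable w are hypothetical names, not part of the answer above.

```shell
# Hypothetical helper: pad the first ":"-separated field to the given width.
# Usage: pad_first_field WIDTH [file...]
pad_first_field() {
  width=$1; shift
  awk -v w="$width" '
    BEGIN { FS = OFS = ":" }
    { $1 = sprintf("%-" w "s", $1) }   # "%-" w "s" becomes e.g. "%-15s"
    1                                  # print the rebuilt record
  ' "$@"
}

printf '%s\n' '#ABC123:"wild things here"' '#7X3Z:"":":#"' | pad_first_field 15
```

Because FS and OFS are both :, assigning to $1 rebuilds the record with the same separator, so colons in the second field survive unchanged.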
With perl:
$ perl -pe 's/^[^:]+/sprintf("%-15s",$&)/e' ip.txt
#ABC123        :"wild things here"
#7X3Z          :"":":#":";:*:-user input:""
#99999X999     :"also, imagine: unicode, yay!"
The e flag lets you use Perl code in the replacement section. $& holds the matched portion, which is then formatted by sprintf.
With awk:
# should work with any awk
awk 'match($0, /^[^:]+/){printf "%-15s%s\n", substr($0,1,RLENGTH), substr($0,RLENGTH+1)}'
# can be simplified with GNU awk
awk 'match($0, /^[^:]+/, m){printf "%-15s%s\n", m[0], substr($0,RLENGTH+1)}'
# or
awk 'match($0, /^([^:]+)(.+)/, m){printf "%-15s%s\n", m[1], m[2]}'
substr($0,1,RLENGTH) or m[0] gives the contents of the first field. I have used 1 instead of the usual RSTART here, since the match is anchored at the start of the line.
substr($0,RLENGTH+1) gives the rest of the line (i.e. from the first : onward).
See awk manual: String-Manipulation for details about match function.
Adding one more way to pad the first column. anubhava's answer with sprintf is the better one; this is just an alternative. The variable spaces holds the width the first field should be padded to.
awk -v spaces="15" 'BEGIN{FS=OFS=":"} {sub(/:/,sprintf("%"(spaces-length($1)+1)"s",":"))} 1' Input_file
Explanation: a detailed explanation of the above.
awk -v spaces="15" ' ##Start the awk program and set spaces to 15.
BEGIN{ ##Start the BEGIN section.
FS=OFS=":" ##Set FS and OFS to colon.
}
{
sub(/:/,sprintf("%"(spaces-length($1)+1)"s",":")) ##Replace the first colon with a colon right-aligned in a width of spaces-length($1)+1, so the first field plus its padding is exactly spaces characters wide.
}
1 ##Print the current line.
' Input_file ##Input file name.
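I believe anubhava's solution of
awk 'BEGIN {FS=OFS=":"} {$1 = sprintf("%-15s", $1)} 1' file
can be simplified even further to:
awk -F: 'BEGIN{OFS=FS} $1=sprintf("%-15s",$1)' file
The { } and final 1 are optional here: the assignment serves as the pattern, its value (the padded, non-empty string) is always true, and a pattern without an action prints the record. Note that it must be OFS=FS, so that OFS picks up the colon set by -F:.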
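One caveat with using an assignment as the pattern: the line is printed only when the assigned value is true. The following sketch, with the hypothetical name drop_zero_counts and made-up sample data, shows a case where the value can be 0 and the line silently disappears.

```shell
# Hypothetical helper: the assignment is the whole pattern, so a record is
# printed only when the assigned value is non-zero.
drop_zero_counts() {
  awk -F: 'BEGIN { OFS = FS } ($2 = $2 + 0)' "$@"
}

printf '%s\n' 'a:1' 'b:0' 'c:2' | drop_zero_counts
```

Here the b:0 line is not printed. With sprintf("%-15s", ...) this cannot happen, because the result is always a non-empty string.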

Change output field separator on first row

I have a file like this
name|age
Bob|30
Tom|50
Cindy|10
I want the first row to have a different separator, "^".
awk 'NR==1 { gsub("|","^")1}1' f
But I keep getting
^n^a^m^e^|^a^g^e^
Bob|30
Tom|50
Cindy|10
Desired output is
name^age
Bob|30
Tom|50
Cindy|10
Your gsub("|","^") call does not escape the regex metacharacter | (used for alternation). An unescaped | is an alternation between two empty patterns, so it matches the empty string at every position in the input.
You may use this awk without involving any regex:
awk 'BEGIN{FS=OFS="|"} FNR==1{OFS="^"; $1=$1; OFS=FS} 1' f
name^age
Bob|30
Tom|50
Cindy|10
Details:
FS="|": Sets FS as |
OFS="^": Sets OFS as ^
$1=$1: Forces awk to rebuild the record ($0), joining the fields with the current OFS
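The effect of $1=$1 can be seen by printing the record before and after the assignment. A small sketch, where show_rebuild is a hypothetical name and the sample line is made up:

```shell
# Hypothetical helper: print each record as read, then again after $1=$1
# has forced awk to rebuild it with OFS.
show_rebuild() {
  awk 'BEGIN { FS = "|"; OFS = "^" } { print; $1 = $1; print }' "$@"
}

printf '%s\n' 'name|age|city' | show_rebuild
```

The first print shows name|age|city exactly as read; the second shows name^age^city. Setting OFS alone changes nothing until a field is assigned.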
You can also use sed:
sed '1 s/|/^/' ip.txt
1 is the address for the command, which selects the first line here
| is not special, because sed uses BRE by default; see this Q&A for BRE vs ERE differences
use s/|/^/g if the line can contain multiple matches
Like this :
awk -F'|' 'NR==1{print $1,$2;next}1' OFS='^' file
or a mix of anubhava's response and mine:
awk -F'|' 'NR==1{$1=$1}1' OFS='^' file
Could you please try the following.
awk 'FNR==1{sub(/\|/,"^")} 1' Input_file
Use gsub in place of sub if multiple occurrences need to be changed.
awk 'FNR==1{gsub(/\|/,"^")} 1' Input_file
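As an alternative to backslash-escaping, the pipe can be placed inside a bracket expression, where it is always literal. A minimal sketch, with caret_header as a hypothetical name:

```shell
# Hypothetical helper: replace every "|" on the first line of each file with "^".
# Inside [...] the pipe has no special meaning, so no escaping is needed.
caret_header() {
  awk 'FNR == 1 { gsub(/[|]/, "^") } 1' "$@"
}

printf '%s\n' 'name|age|city' 'Bob|30|Oslo' | caret_header
```

This prints name^age^city followed by Bob|30|Oslo, leaving every line after the first untouched.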

Print word after string

How do I print the word after a certain string?
I have a file
Hello word.
I want to get an output:
#word
Something like
awk '{ print '#' }'
Thank you
You may use
awk '{sub(/[[:punct:]]+$/, "", $2);print "#"$2}' file > newfile
See the online demo.
The sub(/[[:punct:]]+$/, "", $2) operation removes 1 or more punctuation chars ([[:punct:]]+) from the end ($) of Field 2 and print "#"$2 prepends it with # and prints it.
To make sure to get the word after Hello you may use
grep -Po 'Hello\s+\K\w+' file | awk '{print "#"$0}'
See the online demo
Or, with the help of sed:
sed 's/.*Hello \([[:alnum:]]*\).*/#\1/' file > newfile
See another demo.
Here, .*Hello \([[:alnum:]]*\).* matches any 0+ chars, then Hello, a space, then captures 0 or more alphanumeric chars into Group 1 (\1) and then matches the rest of the line. The #\1 pattern in the RHS leaves just what was captured with a # in front.
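Where grep -P is not available, roughly the same extraction can be sketched in plain awk with index() and match(). The function name word_after is hypothetical, and the sketch assumes the key is followed by exactly one space and that a word consists of ASCII letters, digits and underscores.

```shell
# Hypothetical helper: print "#" plus the word that follows KEY and one space.
# Usage: word_after KEY [file...]
word_after() {
  key=$1; shift
  awk -v key="$key" '{
    n = index($0, key " ")                  # literal search, no regex involved
    if (!n) next                            # key not on this line
    rest = substr($0, n + length(key) + 1)  # text after "KEY "
    if (match(rest, /^[A-Za-z0-9_]+/))
      print "#" substr(rest, 1, RLENGTH)
  }' "$@"
}

printf '%s\n' 'Hello word.' | word_after Hello
```

For the sample line this prints #word; the trailing dot is dropped because it is not a word character.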
Another solution with awk with . and / as field delimiters using -F'[/.]':
awk -F'[/.]' '{for(i=1;i<=NF;i++){if($i=="54517334"){print "#"$(i+1)}}}' file
See an online demo
Here, for(i=1;i<=NF;i++) enumerates all the fields, and if the field is equal to 54517334, the next field with a # prepended at the start is printed.
EDIT: In case you want to search for a string and print the word next to it, try the following. I am looking for the string hello here; change it as needed.
awk '{for(i=1;i<=NF;i++){if(tolower($i)=="hello"){print "#"$(i+1)}}}' Input_file
OR (store the search string in an awk variable, so that only the variable needs changing to look for a different word)
awk -v word_to_search="hello" '{for(i=1;i<=NF;i++){if(tolower($i)==word_to_search){print "#"$(i+1)}}}' Input_file
In case you want to print the word following the search string and remove punctuation from it, try the following.
awk -v word_to_search="hello" '{for(i=1;i<=NF;i++){if(tolower($i)==word_to_search){sub(/[[:punct:]]+/,"",$(i+1));print "#"$(i+1)}}}' Input_file
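The loops above print a bare # when the search word is the last field on a line, because $(i+1) is then empty. A hedged variation that skips that case and strips punctuation from a copy of the field might look like this; next_word is a hypothetical name and the second sample line is made up.

```shell
# Hypothetical helper: print "#" plus the field that follows WORD
# (compared case-insensitively; pass WORD in lower case).
# Usage: next_word WORD [file...]
next_word() {
  word=$1; shift
  awk -v w="$word" '{
    for (i = 1; i < NF; i++)              # stop before the last field so $(i+1) exists
      if (tolower($i) == w) {
        nxt = $(i + 1)                    # work on a copy, leaving $0 untouched
        gsub(/[^A-Za-z0-9_]/, "", nxt)    # keep only ASCII word characters
        print "#" nxt
      }
  }' "$@"
}

printf '%s\n' 'Hello word.' 'say hello' | next_word hello
```

Only #word is printed; the second line produces nothing because hello has no following field.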
Could you please try the following.
awk -F'[ .]' '{print "#"$2}' Input_file
OR
awk -F'[ .]' '{print "#"$(NF-1)}' Input_file

Grep part of string after symbol and shuffle columns

I would like to take the number after the - sign and put it in column 2 of my matrix. I know how to grep the string, but not how to print it after the text string.
in:
1-967764 GGCTGGTCCGATGGTAGTGGGTTATCAGAACT
3-425354 GCATTGGTGGTTCAGTGGTAGAATTCTCGCC
4-376323 GGCTGGTCCGATGGTAGTGGGTTATCAGAAC
5-221398 GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT
6-180339 TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT
out:
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339
awk -F'[[:space:]-]+' '{print $3,$2}' file
Seems like a simple substitution should do the job:
sed -E 's/[0-9]+-([0-9]+)[[:space:]]*(.*)/\2 \1/' file
Capture the parts you're interested in and use them in the replacement.
Alternatively, using awk:
awk 'sub(/^[0-9]+-/, "") { print $2, $1 }' file
This removes the leading digits and - from the start of the line. sub returns the number of substitutions made (1 on success), which is true as a pattern, so the action runs and prints the second field followed by the first.
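The return value of sub can be inspected directly. In this sketch, show_sub_count is a hypothetical name and the second input line is a made-up example of a line without the numeric prefix.

```shell
# Hypothetical helper: print how many substitutions sub() made, then the
# (possibly modified) line.
show_sub_count() {
  awk '{ n = sub(/^[0-9]+-/, ""); print n, $0 }' "$@"
}

printf '%s\n' '1-967764 GGCT' 'header line' | show_sub_count
```

The first line yields 1 967764 GGCT and the second 0 header line, which is why using sub as the pattern skips lines that lack the prefix.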
Using regex ( +|-) as field separator:
$ awk -F"( +|-)" '{print $3,$2}' file
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339
here is another awk
$ awk 'split($1,a,"-") {print $2,a[2]}' file
awk '{sub(/^[^-]*-/,"");print $2,$1}' file
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339

AWK get specific pattern

I have lines like this:
Volume.Free_IBM_LUN59_28D: 2072083693568
I would like to get only IBM_LUN59_28D from this line using awk.
Thanks
You can use sub to do substitutions on each input line, as per the following transcript:
pax> echo 'Volume.Free_IBM_LUN59_28D: 2072083693568' | awk '
...> {
...> sub (".*Free_", "");
...> sub (":.*", "");
...> print
...> }'
IBM_LUN59_28D
That command crosses multiple lines for readability but, if you're operating on a file and not too concerned about readability, you can just use the compressed version:
awk '{sub(".*Free_","");sub(":.*","");print}' inputFile
If you're amenable to non-awk solutions, you could also use sed:
sed -e 's/.*Free_//' -e 's/:.*//' inputFile
Note that both those solutions rely on your (somewhat sparse) test data. If your definition of "like" includes preceding textual segments other than Free_ or subsequent characters other than :, some more work may be needed.
For example, if you wanted the string between the first _ and the first :, you could use:
awk '{sub("[^_]*_","");sub(":.*","");print}'
With sed:
sed 's/[^_]*_\(.*\):.*/\1/'
Search for sequence of non _ characters followed by _ (this will match Volume.Free_), then another sequence of characters (this will match IBM_LUN59_28D, we group this for future use), followed by : and any char sequence. Substitute with the saved pattern (\1). That's it.
Sample:
$ echo "Volume.Free_IBM_LUN59_28D: 2072083693568" | sed 's/[^_]*_\(.*\):.*/\1/'
IBM_LUN59_28D
Here is one awk
awk -F"Free_" 'NF>1{split($2,a,":");print a[1]}'
Example:
echo "Volume.Free_IBM_LUN59_28D: 2072083693568" | awk -F"Free_" 'NF>1{split($2,a,":");print a[1]}'
IBM_LUN59_28D
It splits the line on Free_.
If the line then has more than one field (NF>1):
Split the second field by : and print the first part, a[1].
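The split step can also be folded into the field separator by giving awk a regex with two alternatives. This is a sketch under the assumption that Free_ comes before the first colon on every line of interest; between_free_and_colon is a hypothetical name.

```shell
# Hypothetical helper: treat both "Free_" and ":" as field separators and
# print the field between them.
between_free_and_colon() {
  awk -F'Free_|:' 'NF > 2 { print $2 }' "$@"   # NF > 2 means both separators were seen
}

printf '%s\n' 'Volume.Free_IBM_LUN59_28D: 2072083693568' | between_free_and_colon
```

For the sample line the fields are Volume., IBM_LUN59_28D and the size, so the function prints IBM_LUN59_28D.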
With awk:
echo "$val" | awk -F: '{print $1}' | awk -F. '{print $2}' | awk '{print substr($0,6)}'
where the given string is in $val.