right pad regex with spaces using sed or awk - awk

I have a file with two fields separated with :, both fields are varying length, second field can have all sort of characters(user input). I want the first field to be right padded with spaces to fixed length of 15 characters, for first field I have a working regex #.[A-Z0-9]{4,12}.
sample:
#ABC123:"wild things here"
#7X3Z:"":":#":";:*:-user input:""
#99999X999:"also, imagine: unicode, yay!"
desired output:
#ABC123 :"wild things here"
#7X3Z :"":":#":";:*:-user input:""
#99999X999 :"also, imagine: unicode, yay!"
There is plenty of examples how to zero pad a number, but surprisingly not a lot about general padding a regex or a field, any help using (preferably) sed or awk?

Here is another awk solution that would work with any version of awk:
awk 'BEGIN {FS=OFS=":"} {$1 = sprintf("%-15s", $1)} 1' file
#ABC123 :"wild things here"
#7X3Z :"":":#":";:*:-user input:""
#99999X999 :"also, imagine: unicode, yay!"

With perl:
$ perl -pe 's/^[^:]+/sprintf("%-15s",$&)/e' ip.txt
#ABC123 :"wild things here"
#7X3Z :"":":#":";:*:-user input:""
#99999X999 :"also, imagine: unicode, yay!"
The e flag allows you to use Perl code in replacement section. $& will have the matched portion which gets formatted by sprintf.
With awk:
# should work with any awk
awk 'match($0, /^[^:]+/){printf "%-15s%s\n", substr($0,1,RLENGTH), substr($0,RLENGTH+1)}'
# can be simplified with GNU awk
awk 'match($0, /^[^:]+/, m){printf "%-15s%s\n", m[0], substr($0,RLENGTH+1)}'
# or
awk 'match($0, /^([^:]+)(.+)/, m){printf "%-15s%s\n", m[1], m[2]}'
substr($0,1,RLENGTH) or m[0] will give contents of first field. I have used 1 instead of the usual RSTART here since we are matching start of line
substr($0,RLENGTH+1) will give rest of the line contents (i.e. from the first :)
See awk manual: String-Manipulation for details about match function.

Adding one more way of adding spaces to 1st columns here, though anubhava's answer with sprintf is better answer, adding is as an option here. Here I have created a variable named spaces, where one could define number of spaces which we need to add to it.
awk -v spaces="15" 'BEGIN{FS=OFS=":"} {sub(/:/,sprintf("%"spaces-length($1)"s",":"))} 1' Input_file
Explanation: Adding detailed explanation for above.
awk -v spaces="15" ' ##Starting awk program from here, setting spaces to 15 here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS=OFS=":" ##Setting FS and OFS as colon here.
}
{
sub(/:/,sprintf("%"spaces-length($1)"s",":")) ##Substituting colon first occurrence with spaces(left padding of spaces) along with colon here.
}
1 ##Printing current line here.
' Input_file ##Mentioning Input_file name here.

i believe anbhava's solution of
awk 'BEGIN {FS=OFS=":"} {$1 = sprintf("%-15s", $1)} 1' file
can be even further simplified as :
awk -F: 'BEGIN{FS=OFS} $1=sprintf("%-15s",$1)'
the { } and final 1 are optional

Related

How to find and match an exact string in a column using AWK?

I'm having trouble on matching an exact string that I want to find in a file using awk.
I have the file called "sup_groups.txt" that contains:
(the structure is: "group_name:pw:group_id:user1<,user2>...")
adm:x:4:syslog,adm1
admins:x:1006:adm2,adm12,manuel
ssl-cert:x:122:postgres
ala2:x:1009:aceto,salvemini
conda:x:1011:giovannelli,galise,aceto,caputo,haymele,salvemini,scala,adm2,adm12
adm1Group:x:1022:adm2,adm1,adm3
docker:x:998:manuel
now, I want to extract the records that have in the user list the user "adm1" and print the first column (the group name), but you can see that there is a user called "adm12", so when i do this:
awk -F: '$4 ~ "adm1" {print $1}' sup_groups.txt
the output is:
adm
admins
conda
adm1Group
the command of course also prints those records that contain the string "adm12", but I don't want these lines because I'm interested only on the user "adm1".
So, How can I change this command so that it just prints the lines 1 and 6 (excluding 2 and 5)?
thank you so much and sorry for my bad English
EDIT: thank you for the answers, u gave me inspiration for the solution, i think this might work as well as your solutions but more simplified:
awk -F: '$4 ~ "adm,|adm1$|:adm1," {print $1}' sup_groups.txt
basically I'm using ORs covering all the cases and excluding the "adm12"
let me know if you think this is correct
1st solution: Using split function of awk. With your shown samples, please try following awk code.
awk -F':' '
{
num=split($4,arr,",")
for(i=1;i<=num;i++){
if(arr[i]=="adm1"){
print
}
}
}
' Input_file
Explanation: Adding detailed explanation for above.
awk -F':' ' ##Starting awk program from here setting field separator as : here.
{
num=split($4,arr,",") ##Using split to split 4th field into array arr with delimiter of ,
for(i=1;i<=num;i++){ ##Running for loop till value of num(total elements of array arr).
if(arr[i]=="adm1"){ ##Checking condition if arr[i] value is equal to adm1 then do following.
print ##printing current line here.
}
}
}
' Input_file ##Mentioning Input_file name here.
2nd solution: Using regex and conditions in awk.
awk -F':' '$4~/^adm1,/ || $4~/,adm1,/ || $4~/,adm1$/' Input_file
OR if 4th field doesn't have comma at all then try following:
awk -F':' '$4~/^adm1,/ || $4~/,adm1,/ || $4~/,adm1$/ || $4=="adm1"' Input_file
Explanation: Making field separator as : and checking condition if 4th field is either equal to ^adm1,(starting adm1,) OR its equal to ,adm1, OR its equal to ,adm1$(ending with ,adm1) then print that line.
This should do the trick:
$ awk -F: '"," $4 "," ~ ",adm1," { print $1 }' file
The idea behind this is the encapsulate both the group field between commas such that each group entry is encapsulated by commas. So instead of searching for adm1 you search for ,adm1,
So if your list looks like:
adm2,adm12,manuel
and, by adding commas, you convert it too:
,adm2,adm12,manuel,
you can always search for ,adm1, and find the perfect match .
once u setup FS per task requirements, then main body becomes barely just :
NF = !_ < NF
or even more straight forward :
{m,n,g}awk —- --NF
=
{m,g}awk 'NF=!_<NF' OFS= FS=':[^:]*:[^:]*:[^:]*[^[:alpha:]]?adm[0-9]+.*$'
adm
admins
conda
adm1Group

Issue with field separator in AWK script

Having a very large file where two lines shown below and having two fields name and revision having colon delimiter. I need to print only the second column.
sam:7.[0:6]
Ram:8.[6:6]_rev[2:4] h_ack[2:6]
vincent:58
I tried this code:
#!/bin/bash
awk -F: '{print $2}'
7.[0
8.[6
58
Output should be:
7.[0:6]
8.[6:6]_rev[2:4] h_ack[2:6]
58
What went wrong in my code.
The problem in your awk expression is that you are splitting on all :.
Instead, you want to split only on the first : from the start.
$ awk -F'^[^:]+:' '{print $2}' file
The regex pattern matches the start of the string ^, any character other than a :, and finally a :.
If you specify field separator as :, it's normal behavior of awk to output this, ex:
7.[0, because you need the other columns after $2.
cut here, better suits the requirement:
cut -d: -f2- file
Could you please try following.
awk '
match($0,/:.*/){
print substr($0,RSTART+1,RLENGTH-1)
}
' Input_file

Change output field seperator on first row

I have a file like this
name|age
Bob|30
Tom|50
Cindy|10
I want the first row to have a different seperator, "^".
awk 'NR==1 { gsub("|","^")1}1' f
But I keep getting
^n^a^m^e^|^a^g^e^
Bob|30
Tom|50
Cindy|10
Desired output is
name^age
Bob|30
Tom|50
Cindy|10
Your code with gsub("|","^") doesn't have special meta character | (used for alternation in regex) escaped hence it will match every position in input.
You may use this awk without involving any regex:
awk 'BEGIN{FS=OFS="|"} FNR==1{OFS="^"; $1=$1; OFS=FS} 1' f
name^age
Bob|30
Tom|50
Cindy|10
Details:
FS="|": Sets FS as |
OFS="^": Sets OFS as ^
$1=$1: Forces awk to reformat each of the fields using OFS
You can also use sed:
sed '1 s/|/^/' ip.txt
1 address for the command, which is first line here
| is not special, because by default sed uses BRE, see this Q&A for BRE vs ERE differences
use s/|/^/g if you can have multiple matches
Like this :
awk -F'|' 'NR==1{print $1,$2;next}1' OFS='^' file
or a mix between anubhava response and mine:
awk -F'|' 'NR==1{$1=$1}1' OFS='^' file
Could you please try following.
awk 'FNR==1{sub(/\|/,"^")} 1' Input_file
Use gsub in place of sub in case of multiple occurrences needs to be changed.
awk 'FNR==1{gsub(/\|/,"^")} 1' Input_file

Match regexp at the end of the string with AWK

I am trying to match two different Regexp to long strings with awk, removing the part of the string that matches in a 35 characters window.
The problem is that the same bunch of code works when I am looking for the first (which matches at the beginnng) whereas fails to match with the second one (end of string).
Input:
Regexp1(1)(2)(3)(4)(5)xxxxxxxxxxxxxxx(20)(21)(22)(23)Regexp2
Desired output
(1)(2)(3)(4)(5)xxxxxxxxxxxxxxx(20)(21)(22)(23)
So far I used this code that extracts correctly Regexp1, but, unfortunately, is not able to extract also Regexp2 since indexed of RSTART and RLENGTH for Regexp2 are incorrect.
Code for extracting Regexp1 (correct output):
awk -v F="Regexp1" '{if (match(substr($1,1,35),F)) print substr($1,RSTART,RLENGTH)}' file
Code for extracting Regexp2 (wrong output)
awk -v F="Regexp2" '{if (match(substr($1,length($1)-35,35),F)) print substr($1,RSTART,RLENGTH)}' file
Despite the indexes for Regexp1 are correct, for Regexp2 indexes are wrond (RSTART=13). I cannot figure out how to extract the second Regexp.
Considering that your actual Input_file is same as shown samples, if this is the case could you please try following then(good to have new version of awk since old versions may not support number of times logic for regex).
awk '
match($0,/\([0-9]+\){5}.*\([0-9]\){4}/){
print substr($0,RSTART,RLENGTH)
}' Input_file
In case your number of parenthesis values are not fixed then you could do like as follows:
awk '
match($0,/\([0-9]+\){1,}.*\([0-9]\){1,}/){
print substr($0,RSTART,RLENGTH)
}' Input_file
If this isn't all you need:
$ sed 's/Regexp1\(.*\)Regexp2/\1/' file
(1)(2)(3)(4)(5)xxxxxxxxxxxxxxx(20)(21)(22)(23)
or using GNU awk for gensub():
$ awk '{print gensub(/Regexp1(.*)Regexp2/,"\\1",1)}' file
(1)(2)(3)(4)(5)xxxxxxxxxxxxxxx(20)(21)(22)(23)
then edit your question to be far clearer with your requirements and example.

Print word after string

How to print word after centrain string?
I have a file
Hello word.
I want to get an output:
#word
Something like
awk '{ print '#' }'
Thank you
You may use
awk '{sub(/[[:punct:]]+$/, "", $2);print "#"$2}' file > newfile
See the online demo.
The sub(/[[:punct:]]+$/, "", $2) operation removes 1 or more punctuation chars ([[:punct:]]+) from the end ($) of Field 2 and print "#"$2 prepends it with # and prints it.
To make sure to get the word after Hello you may use
cat file | grep -Po 'Hello\s+\K\w+' | awk '{print "#"$0}'
See the online demo
Or, with the help of sed:
sed 's/.*Hello \([[:alnum:]]*\).*/#\1/' file > newfile
See another demo.
Here, .*Hello \([[:alnum:]]*\).* matches any 0+ chars, then Hello, a space, then captures 0 or more alphanumeric chars into Group 1 (\1) and then matches the rest of the line. The #\1 pattern in the RHS leaves just what was captured with a # in front.
Another solution with awk with . and / as field delimiters using -F'[/.]':
awk -F'[/.]' '{for(i=1;i<=NF;i++){if($i=="54517334"){print "#"$(i+1)}}}' file
See an online demo
Here, for(i=1;i<=NF;i++) enumerates all the fields, and if the field is equal to 54517334, the next field with a # prepended at the start is printed.
EDIT: In case you want to search for a string and print string next to it then try following. I am looking for hello string here you could change it as per your need.
awk '{for(i=1;i<=NF;i++){if(tolower($i)=="hello"){print "#"$(i+1)}}}' Input_file
OR(create an awk variable and perform checks, it will help in changing string in place of hello in variable itself)
awk -v word_to_search="hello" '{for(i=1;i<=NF;i++){if(tolower($i)==word_to_search){print "#"$(i+1)}}}' Input_file
In case you want to print next keyword by finding a string and remove punctuations then try following.
awk -v word_to_search="hello" '{for(i=1;i<=NF;i++){if(tolower($i)==word_to_search){sub(/[[:punct:]]+/,"",$(i+1));print "#"$(i+1)}}}' Input_file
Could you please try following.
awk -F'[ .]' '{print "#"$2}' Input_file
OR
awk -F'[ .]' '{print $(NF-1)}' Input_file