AIX/KSH Extract string from a comma separated line - awk

I want to extract the part "virtual_eth_adapters" from the following comma separated line:
lpar_io_pool_ids=none,max_virtual_slots=300,"virtual_serial_adapters=0/server/1/any//any/1,1/server/1/any//any/1","virtual_scsi_adapters=166/client/1/ibm/166/0,266/client/2/ibm/266/0",virtual_eth_adapters=116/0/263,proc_mode=shared,min_proc_units=0.5,desired_proc_units=2.0,max_proc_units=8.0
I'm using AIX with ksh.
I found a workaround using awk with the -F flag to split the string on a delimiter and then print the field by its position. But if the input string changes, the position may differ...

1st solution: Please try the following if you also want the string virtual_eth_adapters in the output.
awk '
match($0,/virtual_eth_adapters[^,]*/){
print substr($0,RSTART,RLENGTH)
}
' Input_file
Output will be as follows.
virtual_eth_adapters=116/0/263
2nd solution: In case you want to print only value for String virtual_eth_adapters then try following.
awk '
match($0,/virtual_eth_adapters[^,]*/){
print substr($0,RSTART+21,RLENGTH-21)
}
' Input_file
Output will be as follows.
116/0/263
Explanation: Adding explanation for code.
awk ' ##Starting awk program here.
match($0,/virtual_eth_adapters[^,]*/){ ##Using match function of awk here, to match from string virtual_eth_adapters till first occurrence of comma(,)
print substr($0,RSTART,RLENGTH) ##Printing the sub-string that starts at position RSTART and is RLENGTH characters long; match() sets both variables when the regex above matches.
}
' Input_file ##Mentioning Input_file name here.
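The 21 in the 2nd solution is simply the length of virtual_eth_adapters= counted by hand. A minimal sketch, assuming a hypothetical shell helper named get_value (not part of the answer above), that derives the offset from the key instead. The key is spliced into a regex, so this assumes it contains no regex metacharacters and is not the tail end of another key on the same line.

```shell
# get_value is a hypothetical helper name, not part of the original answer.
# Usage: get_value KEY < Input_file
get_value() {
  awk -v key="$1" '
  {
    # Match "KEY=" followed by everything up to the next comma.
    if (match($0, key "=[^,]*")) {
      skip = length(key) + 1            # number of characters in "KEY="
      print substr($0, RSTART + skip, RLENGTH - skip)
    }
  }'
}
```

Called as get_value virtual_eth_adapters < Input_file, it should print only the value, and the same helper works for another key such as proc_mode without recounting characters.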

I use this approach to get data out of the middle of lines.
awk -F'virtual_eth_adapters=' 'NF>1{split($2,a,",");print a[1]}' file
116/0/263
It's short and easy to learn (no counting or regex needed).
-F'virtual_eth_adapters=' split the line by virtual_eth_adapters=
NF>1 if there are more than one field (line contains virtual_eth_adapters=)
split($2,a,",") split the part after virtual_eth_adapters= into array a, separated by ,
print a[1] print first part of array a

And one more solution (assuming the position of the string)
awk -F\, '{print $7}'
If you need only the value try this:
awk -F\, '{print $7}'|awk -F\= '{print $2}'
It is also possible to get the value this way:
awk -F\, '{split($7,a,"=");print a[2]}'
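Since the question mentions that the position may change, here is a minimal sketch that keeps the comma field separator but looks the field up by name instead of by number. field_value is a hypothetical helper name. It assumes the wanted key is not inside one of the double-quoted values, because those contain commas and are split across several fields.

```shell
# field_value is a hypothetical helper name used only for this sketch.
# Usage: field_value KEY < Input_file
field_value() {
  awk -F, -v key="$1" '
  {
    prefix = key "="
    for (i = 1; i <= NF; i++) {
      # index() == 1 means the field starts with "KEY=" (plain string compare, no regex).
      if (index($i, prefix) == 1) {
        print substr($i, length(prefix) + 1)
        break
      }
    }
  }'
}
```

Because the comparison is a plain string test, keys containing dots or other regex characters need no escaping.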

Related

How to find and match an exact string in a column using AWK?

I'm having trouble on matching an exact string that I want to find in a file using awk.
I have the file called "sup_groups.txt" that contains:
(the structure is: "group_name:pw:group_id:user1<,user2>...")
adm:x:4:syslog,adm1
admins:x:1006:adm2,adm12,manuel
ssl-cert:x:122:postgres
ala2:x:1009:aceto,salvemini
conda:x:1011:giovannelli,galise,aceto,caputo,haymele,salvemini,scala,adm2,adm12
adm1Group:x:1022:adm2,adm1,adm3
docker:x:998:manuel
Now, I want to extract the records that have the user "adm1" in the user list and print the first column (the group name), but you can see that there is also a user called "adm12", so when I do this:
awk -F: '$4 ~ "adm1" {print $1}' sup_groups.txt
the output is:
adm
admins
conda
adm1Group
the command of course also prints those records that contain the string "adm12", but I don't want these lines because I'm interested only on the user "adm1".
So, How can I change this command so that it just prints the lines 1 and 6 (excluding 2 and 5)?
thank you so much and sorry for my bad English
EDIT: thank you for the answers, you gave me inspiration for the solution. I think this might work as well as your solutions but is simpler:
awk -F: '$4 ~ "adm,|adm1$|:adm1," {print $1}' sup_groups.txt
basically I'm using ORs covering all the cases and excluding the "adm12"
let me know if you think this is correct
1st solution: Using split function of awk. With your shown samples, please try following awk code.
awk -F':' '
{
num=split($4,arr,",")
for(i=1;i<=num;i++){
if(arr[i]=="adm1"){
print
}
}
}
' Input_file
Explanation: Adding detailed explanation for above.
awk -F':' ' ##Starting awk program from here setting field separator as : here.
{
num=split($4,arr,",") ##Using split to split 4th field into array arr with delimiter of ,
for(i=1;i<=num;i++){ ##Running for loop till value of num(total elements of array arr).
if(arr[i]=="adm1"){ ##Checking condition if arr[i] value is equal to adm1 then do following.
print ##printing current line here.
}
}
}
' Input_file ##Mentioning Input_file name here.
2nd solution: Using regex and conditions in awk.
awk -F':' '$4~/^adm1,/ || $4~/,adm1,/ || $4~/,adm1$/' Input_file
OR if 4th field doesn't have comma at all then try following:
awk -F':' '$4~/^adm1,/ || $4~/,adm1,/ || $4~/,adm1$/ || $4=="adm1"' Input_file
Explanation: Setting the field separator to : and checking whether the 4th field starts with adm1, (^adm1,), contains ,adm1, or ends with ,adm1 (,adm1$); the second command also accepts a 4th field that is exactly adm1. Lines that pass the check are printed.
This should do the trick:
$ awk -F: '"," $4 "," ~ ",adm1," { print $1 }' file
The idea is to wrap the user list in commas so that every entry has a comma on both sides. So instead of searching for adm1 you search for ,adm1,
So if your list looks like:
adm2,adm12,manuel
and, by adding commas, you convert it to:
,adm2,adm12,manuel,
you can always search for ,adm1, and find only the exact match.
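A minimal sketch of the same comma-wrapping idea with the user name passed in as a variable. groups_of is a hypothetical helper name. It swaps the regex match for index(), a plain substring search, so the user name needs no escaping.

```shell
# groups_of is a hypothetical helper name; the user is passed in with -v.
# Usage: groups_of USER < sup_groups.txt
groups_of() {
  awk -F: -v user="$1" '
    # Wrap both the user list and the user name in commas, then do a
    # plain substring search; a non-zero position means the user is listed.
    index("," $4 ",", "," user ",") { print $1 }
  '
}
```

With the sample file from the question, groups_of adm1 < sup_groups.txt should print adm and adm1Group only.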
Once you set up FS per the task requirements, the main body becomes just:
NF = !_ < NF
or even more straightforward:
{m,n,g}awk -- --NF
=
{m,g}awk 'NF=!_<NF' OFS= FS=':[^:]*:[^:]*:[^:]*[^[:alpha:]]?adm[0-9]+.*$'
adm
admins
conda
adm1Group

How to filter a field with awk follow a pattern

I have a file with this format:
Topic:test_replication PartitionCount:1 ReplicationFactor:3 Configs:retention.ms=604800000,delete.retention.ms=86400000,cleanup.policy=delete,max.message.bytes=1000012,min.insync.replicas=2,retention.bytes=-1
Topic:teste2e_funcional PartitionCount:12 ReplicationFactor:3 Configs:min.cleanable.dirty.ratio=0.00001,delete.retention.ms=86400000,cleanup.policy=delete,min.insync.replicas=2,segment.ms=604800000,retention.bytes=-1
Topic:ticket_dl.replica_cloudera PartitionCount:3 ReplicationFactor:3 Configs:message.downconversion.enable=true,file.delete.delay.ms=60000,segment.ms=604800000,min.compaction.lag.ms=0,retention.bytes=-1,segment.index.bytes=10485760,cleanup.policy=delete,message.timestamp.difference.max.ms=9223372036854775807,segment.jitter.ms=0,preallocate=false,message.timestamp.type=CreateTime,message.format.version=2.2-IV1,segment.bytes=1073741824,max.message.bytes=1000000,unclean.leader.election.enable=false,retention.ms=604800000,flush.ms=9223372036854775807,delete.retention.ms=31536000000,min.insync.replicas=2,flush.messages=9223372036854775807,compression.type=producer,index.interval.bytes=4096,min.cleanable.dirty.ratio=0.5
And I want to have only the value of Topic (e.g. test_replication) and the value of min.insync.replicas (e.g. 2)
I know that it is possible to do with a regular expression, but I don't know how to do it. The problem for me is that min.insync.replicas is not always in the same position, so if I use the awk option -F with, for example, a comma, I will get different values for min.insync.replicas.
Could you please try following.
awk '
match($0,/Topic:[^ ]*/){
topic=substr($0,RSTART+6,RLENGTH-6)
match($0,/min\.insync\.replicas[^,]*/)
print topic,substr($0,RSTART+20,RLENGTH-20)
topic=""
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
match($0,/Topic:[^ ]*/){ ##Using match function to match regex Topic: till space comes here.
topic=substr($0,RSTART+6,RLENGTH-6) ##Creating variable topic, which holds the matched text without the leading "Topic:" (6 characters).
match($0,/min\.insync\.replicas[^,]*/) ##Using match again to match from min.insync.replicas up to the next comma.
print topic,substr($0,RSTART+20,RLENGTH-20) ##Printing topic and the matched text without the leading "min.insync.replicas=" (20 characters).
topic="" ##Nullify variable topic here.
}
' Input_file ##Mentioning Input_file name here.
2nd solution: Adding a sed solution here.
sed 's/Topic:\([^ ]*\).*min\.insync\.replicas=\([^,]*\).*/\1 \2/' Input_file
Sorry for the earlier questions. It was very simple:
awk '
match($0,/Topic:[^ ]*/){
topic=substr($0,RSTART+6,RLENGTH-6)
match($0,/min\.insync\.replicas[^,]*/)
mininsync=substr($0,RSTART+20,RLENGTH-20)
match($0,/retention\.ms[^,]*/)
retention=substr($0,RSTART+13,RLENGTH-13)
print topic",",mininsync,","retention
topic=""
}
' Input_file
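One thing to watch with the regex approach: /retention\.ms[^,]*/ is not anchored, so on a line that has delete.retention.ms but no retention.ms it matches inside the longer key. A minimal sketch, assuming a hypothetical helper named topic_config, that splits the Configs list and compares whole keys instead. The "-" placeholder for a missing key is my own choice, not something from the thread.

```shell
# topic_config is a hypothetical helper name for this sketch.
# Usage: topic_config KEY < Input_file
topic_config() {
  awk -v want="$1" '
  {
    topic = ""; cfg = ""; val = "-"        # "-" marks a key that is not set
    for (i = 1; i <= NF; i++) {
      if (index($i, "Topic:") == 1)        topic = substr($i, 7)
      else if (index($i, "Configs:") == 1) cfg   = substr($i, 9)
    }
    n = split(cfg, kv, ",")
    for (j = 1; j <= n; j++) {
      eq = index(kv[j], "=")
      # Compare the whole key, so retention.ms never matches delete.retention.ms.
      if (eq > 0 && substr(kv[j], 1, eq - 1) == want) val = substr(kv[j], eq + 1)
    }
    print topic, val
  }'
}
```

Calling it once per key (for example topic_config min.insync.replicas < Input_file) keeps the helper small; printing several keys per line would need a small extension.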

Understand the code of Split file to fasta

I understand the pattern match, but the code only matches the pattern ">chr", so how does the sequence end up in the output file?
awk '/^>chr/ {OUT=substr($0,2) ".fa"}; {print >> OUT; close(OUT)}' Input_File
Could you please go through following explanation once.
awk ' ##Starting awk program here.
/^>chr/{ ##Checking condition: if a line starts with the string >chr then do following.
OUT=substr($0,2) ".fa" ##Creating variable OUT whose value is the current line from its 2nd character onward (dropping the leading >) with the string .fa appended.
} ##Closing block for condition ^>chr here.
{
print >> OUT ##Appending the current line to the file named by OUT. This block has no condition, so it runs for every line, header and sequence lines alike.
close(OUT) ##Closing the file after each write; keeping many output files open at once leads to "too many open files" errors.
}
' Input_File ##Mentioning Input_file here which we are processing through awk.
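To see why the sequence lines end up in the file even though only ">chr" is matched: the second block has no pattern, so it runs on every line and writes to whatever file name the last header set. A minimal sketch of a variant, assuming a hypothetical helper named split_fasta, that closes a file only when the next header arrives instead of after every line. It still appends (>>), so rerunning it adds to existing .fa files, and a header with spaces produces a file name with spaces.

```shell
# split_fasta is a hypothetical helper name; it writes the per-record
# .fa files into the current directory.
# Usage: split_fasta Input_File
split_fasta() {
  awk '
    /^>chr/ {
      if (out != "") close(out)     # finished with the previous record
      out = substr($0, 2) ".fa"     # ">chr1" becomes "chr1.fa"
    }
    out != "" { print >> out }      # header and sequence lines go to the current file
  ' "$1"
}
```

Lines that appear before the first header are skipped, because out is still empty at that point.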

Print word after string

How to print the word after a certain string?
I have a file
Hello word.
I want to get an output:
#word
Something like
awk '{ print '#' }'
Thank you
You may use
awk '{sub(/[[:punct:]]+$/, "", $2);print "#"$2}' file > newfile
The sub(/[[:punct:]]+$/, "", $2) operation removes 1 or more punctuation chars ([[:punct:]]+) from the end ($) of Field 2 and print "#"$2 prepends it with # and prints it.
To make sure to get the word after Hello you may use
cat file | grep -Po 'Hello\s+\K\w+' | awk '{print "#"$0}'
Or, with the help of sed:
sed 's/.*Hello \([[:alnum:]]*\).*/#\1/' file > newfile
Here, .*Hello \([[:alnum:]]*\).* matches any 0+ chars, then Hello, a space, then captures 0 or more alphanumeric chars into Group 1 (\1) and then matches the rest of the line. The #\1 pattern in the RHS leaves just what was captured with a # in front.
Another solution with awk with . and / as field delimiters using -F'[/.]':
awk -F'[/.]' '{for(i=1;i<=NF;i++){if($i=="54517334"){print "#"$(i+1)}}}' file
Here, for(i=1;i<=NF;i++) enumerates all the fields, and if the field is equal to 54517334, the next field with a # prepended at the start is printed.
EDIT: In case you want to search for a string and print the word next to it, then try the following. I am looking for the string hello here; you could change it as per your need.
awk '{for(i=1;i<=NF;i++){if(tolower($i)=="hello"){print "#"$(i+1)}}}' Input_file
OR(create an awk variable and perform checks, it will help in changing string in place of hello in variable itself)
awk -v word_to_search="hello" '{for(i=1;i<=NF;i++){if(tolower($i)==word_to_search){print "#"$(i+1)}}}' Input_file
In case you want to print next keyword by finding a string and remove punctuations then try following.
awk -v word_to_search="hello" '{for(i=1;i<=NF;i++){if(tolower($i)==word_to_search){sub(/[[:punct:]]+/,"",$(i+1));print "#"$(i+1)}}}' Input_file
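A minimal sketch that folds the variants above into one hypothetical helper named word_after. It differs from the commands above in two small ways, both my own assumptions: it strips only trailing punctuation from a short explicit list instead of using [[:punct:]], and it stops the loop at NF-1 so that a search word at the end of a line prints nothing rather than a bare #.

```shell
# word_after is a hypothetical helper name for this sketch.
# Usage: word_after WORD < Input_file
word_after() {
  awk -v w="$1" '
  BEGIN { w = tolower(w) }                 # make the comparison case-insensitive
  {
    for (i = 1; i < NF; i++) {             # stop at NF-1 so $(i+1) always exists
      if (tolower($i) == w) {
        nxt = $(i + 1)
        sub(/[.,;:!?]+$/, "", nxt)         # assumed set of trailing punctuation
        print "#" nxt
      }
    }
  }'
}
```

Working on a copy (nxt) leaves the original field untouched, so $0 is not rebuilt as a side effect.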
Could you please try following.
awk -F'[ .]' '{print "#"$2}' Input_file
OR
awk -F'[ .]' '{print $(NF-1)}' Input_file

Find string then print what comes next until another string

Here's my input.file (thousands of lines):
FN545816.1 EMBL CDS 9450 9857 . + 0 ID=cds-CBE01461.1;Parent=gene-CDR20291_3551;Dbxref=EnsemblGenomes-Gn:CDR20291_3551,EnsemblGenomes-Tr:CBE01461,GOA:C9YHF8,InterPro:IPR003594,UniProtKB/TrEMBL:C9YHF8,NCBI_GP:CBE01461.1;Name=CBE01461.1;gbkey=CDS;gene=rsbW;product=anti-sigma-B factor (serine-protein kinase);protein_id=CBE01461.1;transl_table=11
I want to extract only what comes after product= up to the next ;
So, in this case, I want to get "anti-sigma-B factor (serine-protein kinase)"
I tried this:
awk '{for(i=1; i<=NF; i++) if($i~/*product=/) print $(i+1)}' input.file > output.file
but it prints only "factor" (presumably because there's no space between "product=" and "anti-sigma-B"). It doesn't print the rest either.
I tried many previous solutions but none gave what I want.
Thank you.
Could you please try following.
awk 'match($0,/product=[^;]*/){print substr($0,RSTART+8,RLENGTH-8)}' Input_file
Explanation: Adding explanation for above code too now.
awk ' ##Starting awk program here.
match($0,/product=[^;]*/){ ##Using match function for awk here, where giving REGEX to match from string product= till first occurrence of ;
print substr($0,RSTART+8,RLENGTH-8) ##Printing the substring that starts at RSTART+8 and is RLENGTH-8 characters long, which skips the 8 characters of product=. RSTART and RLENGTH are built-in variables set by match() when the regex matches: RSTART is the position where the match starts and RLENGTH is the length of the matched text.
}' Input_file ##Mentioning Input_file name here.
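An alternative sketch that avoids the hard-coded 8 by splitting the line on ; and comparing attribute names as plain strings. gff_attr is a hypothetical helper name. It assumes that attribute values never contain ; and, for the first attribute (ID= in the sample) to be found, that the columns before the attributes are tab-separated as in standard GFF files; product= in the sample line is found either way.

```shell
# gff_attr is a hypothetical helper name for this sketch.
# Usage: gff_attr KEY < input.file
gff_attr() {
  awk -v key="$1" '
  {
    n = split($0, attr, ";")
    sub(/^.*\t/, "", attr[1])              # assumes tab-separated columns before the attributes
    prefix = key "="
    for (i = 1; i <= n; i++) {
      if (index(attr[i], prefix) == 1) {   # attribute starts with "KEY="
        print substr(attr[i], length(prefix) + 1)
        break
      }
    }
  }'
}
```

For example, gff_attr product < input.file > output.file should write one product description per matching line.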