How do I proceed with awk after an if statement?

my input:
Jun 26 06:54:33 host dovecot: imap-login: Login: user=<xxx>, method=PLAIN, rip=111.111.111.111, lip=111.111.111.111, mpid=00000, TLS, session=<LVVIgfWodFBZD+3W>
I would like to get the IP of the rip entry with one command:
awk '{ if ($6 == "imap-login:" && match($10,/rip/) ) { print $10 } }'
This gives me "rip=78.47.14.44,"
How can I get only the IP?

Could you please try following, written and tested with shown samples in GNU awk.
awk 'match($0,/rip[^,]*/){print substr($0,RSTART+4,RLENGTH-4)}' Input_file
OR as per kvantour sir's suggestion:
awk 'match($0,/[:,] rip=[^,]*/){print substr($0,RSTART+6,RLENGTH-6)}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
match($0,/rip[^,]*/){ ##Using match to find rip followed by everything up to the next comma in the current line; if the regex matches, awk sets the variables RSTART and RLENGTH for the match.
print substr($0,RSTART+4,RLENGTH-4) ##Printing the sub string that starts at RSTART+4 and is RLENGTH-4 characters long, which skips the 4 characters of rip= and leaves only the IP.
}
' Input_file ##Mentioning Input_file name here.
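To make the RSTART and RLENGTH arithmetic above concrete, here is a minimal sketch that prints both variables next to the extracted value. The sample line and the address 10.0.0.1 are made up for illustration and are not from the question.

```shell
# Made-up sample line; the addresses are illustrative only.
line='method=PLAIN, rip=10.0.0.1, lip=10.0.0.2'

# match() sets RSTART (1-based start of the match) and RLENGTH (its length).
# "rip=" is 4 characters, so the value starts at RSTART+4 and is RLENGTH-4 long.
result=$(printf '%s\n' "$line" |
  awk 'match($0, /rip[^,]*/) { print RSTART, RLENGTH, substr($0, RSTART + 4, RLENGTH - 4) }')

printf '%s\n' "$result"   # prints: 15 12 10.0.0.1
```

Here the match "rip=10.0.0.1" starts at column 15 and is 12 characters long, so substr() returns the 8 characters after "rip=".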

$ awk -F'[ =,]+' '$6=="imap-login:"{print $13}' file
111.111.111.111
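If the position of the rip entry is not guaranteed, one possible variation is to look for the field by name instead of by number. This is only a sketch using the sample line from the question; it assumes the default whitespace field separator and that the value does not itself contain a comma.

```shell
line='Jun 26 06:54:33 host dovecot: imap-login: Login: user=<xxx>, method=PLAIN, rip=111.111.111.111, lip=111.111.111.111, mpid=00000, TLS, session=<LVVIgfWodFBZD+3W>'

# Scan the fields of imap-login lines for one that starts with "rip=",
# then strip the "rip=" prefix and the trailing comma.
ip=$(printf '%s\n' "$line" | awk '$6 == "imap-login:" {
    for (i = 7; i <= NF; i++) {
        if ($i ~ /^rip=/) {
            val = $i
            sub(/^rip=/, "", val)
            sub(/,$/, "", val)
            print val
            next
        }
    }
}')

printf '%s\n' "$ip"   # prints: 111.111.111.111
```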

Related

How to find and match an exact string in a column using AWK?

I'm having trouble on matching an exact string that I want to find in a file using awk.
I have the file called "sup_groups.txt" that contains:
(the structure is: "group_name:pw:group_id:user1<,user2>...")
adm:x:4:syslog,adm1
admins:x:1006:adm2,adm12,manuel
ssl-cert:x:122:postgres
ala2:x:1009:aceto,salvemini
conda:x:1011:giovannelli,galise,aceto,caputo,haymele,salvemini,scala,adm2,adm12
adm1Group:x:1022:adm2,adm1,adm3
docker:x:998:manuel
Now, I want to extract the records whose user list contains the user "adm1" and print the first column (the group name). As you can see, there is also a user called "adm12", so when I do this:
awk -F: '$4 ~ "adm1" {print $1}' sup_groups.txt
the output is:
adm
admins
conda
adm1Group
The command of course also prints the records that contain the string "adm12", but I don't want those lines because I'm interested only in the user "adm1".
So, how can I change this command so that it prints only lines 1 and 6 (excluding 2 and 5)?
Thank you so much, and sorry for my bad English.
EDIT: Thank you for the answers, you gave me inspiration for the solution. I think this might work as well as your solutions while being simpler:
awk -F: '$4 ~ "adm,|adm1$|:adm1," {print $1}' sup_groups.txt
basically I'm using ORs covering all the cases and excluding the "adm12"
let me know if you think this is correct
1st solution: Using split function of awk. With your shown samples, please try following awk code.
awk -F':' '
{
num=split($4,arr,",")
for(i=1;i<=num;i++){
if(arr[i]=="adm1"){
print
}
}
}
' Input_file
Explanation: Adding detailed explanation for above.
awk -F':' ' ##Starting awk program from here setting field separator as : here.
{
num=split($4,arr,",") ##Using split to split 4th field into array arr with delimiter of ,
for(i=1;i<=num;i++){ ##Running for loop till value of num(total elements of array arr).
if(arr[i]=="adm1"){ ##Checking condition if arr[i] value is equal to adm1 then do following.
print ##printing current line here.
}
}
}
' Input_file ##Mentioning Input_file name here.
2nd solution: Using regex and conditions in awk.
awk -F':' '$4~/^adm1,/ || $4~/,adm1,/ || $4~/,adm1$/' Input_file
OR if 4th field doesn't have comma at all then try following:
awk -F':' '$4~/^adm1,/ || $4~/,adm1,/ || $4~/,adm1$/ || $4=="adm1"' Input_file
Explanation: Making field separator as : and checking whether the 4th field starts with adm1, (^adm1,), contains ,adm1, or ends with ,adm1 (,adm1$); if any of these conditions is true, the line is printed. The second command additionally handles a 4th field that is exactly adm1.
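The separate conditions above can probably be folded into one regex with alternation. The following is a sketch using the sample data from the question; it prints $1 because that is what the question asks for, and the variable names are only illustrative.

```shell
data='adm:x:4:syslog,adm1
admins:x:1006:adm2,adm12,manuel
ssl-cert:x:122:postgres
ala2:x:1009:aceto,salvemini
conda:x:1011:giovannelli,galise,aceto,caputo,haymele,salvemini,scala,adm2,adm12
adm1Group:x:1022:adm2,adm1,adm3
docker:x:998:manuel'

# (^|,) means "start of field or a comma"; (,|$) means "a comma or end of field",
# so adm1 must be a complete entry in the comma-separated user list.
groups_found=$(printf '%s\n' "$data" | awk -F: '$4 ~ /(^|,)adm1(,|$)/ { print $1 }')

printf '%s\n' "$groups_found"
```

With the sample data this prints adm and adm1Group, one per line.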
This should do the trick:
$ awk -F: '"," $4 "," ~ ",adm1," { print $1 }' file
The idea behind this is to wrap the user list field ($4) in commas so that every user entry is enclosed by commas. So instead of searching for adm1 you search for ,adm1,
So if your list looks like:
adm2,adm12,manuel
and, by adding commas, you convert it to:
,adm2,adm12,manuel,
you can always search for ,adm1, and find only an exact match.
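As a sketch of the same comma-wrapping idea with the user name passed in as a variable: the version below uses index() instead of a regex match, so characters in the name are compared literally. The variable name user and the shell function groups_of are assumptions made for this example.

```shell
data='adm:x:4:syslog,adm1
admins:x:1006:adm2,adm12,manuel
ssl-cert:x:122:postgres
ala2:x:1009:aceto,salvemini
conda:x:1011:giovannelli,galise,aceto,caputo,haymele,salvemini,scala,adm2,adm12
adm1Group:x:1022:adm2,adm1,adm3
docker:x:998:manuel'

# groups_of is a hypothetical helper: print the groups whose user list has exactly "$1".
groups_of() {
  printf '%s\n' "$data" |
    awk -F: -v user="$1" 'index("," $4 ",", "," user ",") > 0 { print $1 }'
}

adm1_groups=$(groups_of adm1)
adm12_groups=$(groups_of adm12)
printf '%s\n' "$adm1_groups"
```

For adm1 this gives adm and adm1Group; for adm12 it gives admins and conda.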
Once you set up FS per the task requirements, the main body becomes just:
NF = !_ < NF
or even more straight forward :
{m,n,g}awk -- --NF
=
{m,g}awk 'NF=!_<NF' OFS= FS=':[^:]*:[^:]*:[^:]*[^[:alpha:]]?adm[0-9]+.*$'
adm
admins
conda
adm1Group

awk command to read a key value pair from a file

I have a file input.txt which stores information in KEY:VALUE form. I'm trying to read GOOGLE_URL from this input.txt, but my command prints only https because the separator is :. What is the problem with my grep command, and how should I print the entire URL?
SCRIPT
$> cat script.sh
#!/bin/bash
URL=`grep -e '\bGOOGLE_URL\b' input.txt | awk -F: '{print $2}'`
printf " $URL \n"
INPUT_FILE
$> cat input.txt
GOOGLE_URL:https://www.google.com/
OUTPUT
https
DESIRED_OUTPUT
https://www.google.com/
Since there are multiple : in your input, getting $2 will not work in awk because it will just give you 2nd field. You actually need an equivalent of cut -d: -f2- but you also need to check key name that comes before first :.
This awk should work for you:
awk -F: '$1 == "GOOGLE_URL" {sub(/^[^:]+:/, ""); print}' input.txt
https://www.google.com/
Or this non-regex awk approach that allows you to pass key name from command line:
awk -F: -v k='GOOGLE_URL' '$1==k{print substr($0, length(k FS)+1)}' input.txt
Or using gnu-grep:
grep -oP '^GOOGLE_URL:\K.+' input.txt
https://www.google.com/
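The point that only the first : separates the key from the value can also be sketched with index(), without setting a field separator at all. The two input lines are the sample data from this page; the variable name key is an assumption for the example.

```shell
input='EXAMPLE_URL:http://www.example.com/
GOOGLE_URL:https://www.google.com/'

# index() returns the position of the first ":" (0 if there is none).
# Everything before it is the key, everything after it is the value.
url=$(printf '%s\n' "$input" | awk -v key='GOOGLE_URL' '{
    p = index($0, ":")
    if (p > 0 && substr($0, 1, p - 1) == key)
        print substr($0, p + 1)
}')

printf '%s\n' "$url"   # prints: https://www.google.com/
```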
Could you please try following, written and tested with shown samples in GNU awk. This will look for string GOOGLE_URL and will catch further either http or https value from url, in case you need only https then change http[s]? to https in following solution please.
awk '/^GOOGLE_URL:/{match($0,/http[s]?:\/\/.*/);print substr($0,RSTART,RLENGTH)}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/^GOOGLE_URL:/{ ##Checking condition if line starts from GOOGLE_URL: then do following.
match($0,/http[s]?:\/\/.*/) ##Using match function to match http or https (s optional) followed by :// up to the end of the line here.
print substr($0,RSTART,RLENGTH) ##Printing sub string of matched value from above function.
}
' Input_file ##Mentioning Input_file name here.
2nd solution: In case you need anything coming after first : then try following.
awk '/^GOOGLE_URL:/{match($0,/:.*/);print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Take your pick:
$ sed -n 's/^GOOGLE_URL://p' file
https://www.google.com/
$ awk 'sub(/^GOOGLE_URL:/,"")' file
https://www.google.com/
The above will work using any sed or awk in any shell on every UNIX box.
I would use GNU AWK following way for that task:
Let file.txt content be:
EXAMPLE_URL:http://www.example.com/
GOOGLE_URL:https://www.google.com/
KEY:GOOGLE_URL:
Then:
awk 'BEGIN{FS="^GOOGLE_URL:"}{if(NF==2){print $2}}' file.txt
will output:
https://www.google.com/
Explanation: In GNU AWK, FS may be a regular expression, so I set it to GOOGLE_URL: anchored (^) to the beginning of the line, so that GOOGLE_URL: in the middle or at the end is not treated as a separator (consider the 3rd line of input). With this FS there are either 1 or 2 fields in each line; the latter is the case only if the line starts with GOOGLE_URL:, so I check the number of fields (NF) and, if it is 2, print the 2nd field ($2), as the first field in this case is empty.
(tested in gawk 4.2.1)
Yet another awk alternative:
gawk -F'(^[^:]*:)' '/^GOOGLE_URL:/{ print $2 }' infile

How to filter a field with awk follow a pattern

I have a file with that format:
Topic:test_replication PartitionCount:1 ReplicationFactor:3 Configs:retention.ms=604800000,delete.retention.ms=86400000,cleanup.policy=delete,max.message.bytes=1000012,min.insync.replicas=2,retention.bytes=-1
Topic:teste2e_funcional PartitionCount:12 ReplicationFactor:3 Configs:min.cleanable.dirty.ratio=0.00001,delete.retention.ms=86400000,cleanup.policy=delete,min.insync.replicas=2,segment.ms=604800000,retention.bytes=-1
Topic:ticket_dl.replica_cloudera PartitionCount:3 ReplicationFactor:3 Configs:message.downconversion.enable=true,file.delete.delay.ms=60000,segment.ms=604800000,min.compaction.lag.ms=0,retention.bytes=-1,segment.index.bytes=10485760,cleanup.policy=delete,message.timestamp.difference.max.ms=9223372036854775807,segment.jitter.ms=0,preallocate=false,message.timestamp.type=CreateTime,message.format.version=2.2-IV1,segment.bytes=1073741824,max.message.bytes=1000000,unclean.leader.election.enable=false,retention.ms=604800000,flush.ms=9223372036854775807,delete.retention.ms=31536000000,min.insync.replicas=2,flush.messages=9223372036854775807,compression.type=producer,index.interval.bytes=4096,min.cleanable.dirty.ratio=0.5
And I want to have only the value of Topic (e.g. test_replication) and the value of min.insync.replicas (e.g. 2)
I know that it is possible to do this with a regular expression, but I don't know how. For me the problem is that min.insync.replicas is not always in the same position, so if I use the awk option -F with, for example, a comma, I get different values for min.insync.replicas.
Could you please try following.
awk '
match($0,/Topic:[^ ]*/){
topic=substr($0,RSTART+6,RLENGTH-6)
match($0,/min\.insync\.replicas[^,]*/)
print topic,substr($0,RSTART+20,RLENGTH-20)
topic=""
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
match($0,/Topic:[^ ]*/){ ##Using match function to match regex Topic: till space comes here.
topic=substr($0,RSTART+6,RLENGTH-6) ##Creating topic var which holds the sub-string of the current line starting at RSTART+6 with length RLENGTH-6 (the value after Topic:).
match($0,/min\.insync\.replicas[^,]*/) ##Using match again to match regex from min.insync.replicas up to the next comma here.
print topic,substr($0,RSTART+20,RLENGTH-20) ##Printing topic and the sub-string starting at RSTART+20 with length RLENGTH-20 (the value after min.insync.replicas=).
topic="" ##Nullify variable topic here.
}
' Input_file ##Mentioning Input_file name here.
2nd solution: Adding a sed solution here.
sed 's/Topic:\([^ ]*\).*min\.insync\.replicas=\([^,]*\).*/\1 \2/' Input_file
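Since the position of min.insync.replicas inside Configs varies, another option is to split the Configs list on commas and compare each key exactly. This is a rough sketch; the two input lines are abbreviated versions of the sample lines from the question, and the variable names are illustrative.

```shell
input='Topic:test_replication PartitionCount:1 ReplicationFactor:3 Configs:retention.ms=604800000,min.insync.replicas=2,retention.bytes=-1
Topic:teste2e_funcional PartitionCount:12 ReplicationFactor:3 Configs:cleanup.policy=delete,min.insync.replicas=2,segment.ms=604800000'

# "Topic:" is 6 characters, "Configs:" is 8, "min.insync.replicas=" is 20,
# so the values start at offsets 7, 9 and 21 respectively.
result=$(printf '%s\n' "$input" | awk '{
    topic = ""; replicas = ""
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^Topic:/) {
            topic = substr($i, 7)
        } else if ($i ~ /^Configs:/) {
            n = split(substr($i, 9), kv, ",")
            for (j = 1; j <= n; j++)
                if (index(kv[j], "min.insync.replicas=") == 1)
                    replicas = substr(kv[j], 21)
        }
    }
    print topic, replicas
}')

printf '%s\n' "$result"
```

For the two lines above this prints "test_replication 2" and "teste2e_funcional 2". Because each key is compared from the start of its entry, a key that merely ends in the same text (such as delete.retention.ms versus retention.ms) would not be confused with it.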
Sorry for the earlier questions. It was very simple:
awk '
match($0,/Topic:[^ ]*/){
topic=substr($0,RSTART+6,RLENGTH-6)
match($0,/min\.insync\.replicas[^,]*/)
mininsync=substr($0,RSTART+20,RLENGTH-20)
match($0,/[:,]retention\.ms=[^,]*/) ##Leading [:,] keeps delete.retention.ms from matching.
retention=substr($0,RSTART+14,RLENGTH-14)
print topic "," mininsync "," retention
topic=""
}
' Input_file

Grepping all strings on the same line from multiple files

I'm trying to find a way to grep all names onto one line for 100 files. All names found in each file must appear on the same line.
FILE1
"company":"COMPANY1","companyDisplayName":"CM1","company":"COMPANY2","companyDisplayName":"CM2","company":"COMPANY3","companyDisplayName":"CM3",
FILE2
"company":"COMPANY99","companyDisplayName":"CM99"
The output i actually want is, ( include file name as prefix.)
FILE1:COMPANY1,COMPANY2,COMPANY3
FILE2:COMPANY99
i tried grep -oP '(?<="company":")[^"]*' * but i get results like this :
FILE1:COMPANY1
FILE1:COMPANY2
FILE1:COMPANY3
FILE2:COMPANY99
Could you please try following.
awk -F'[,:]' '
BEGIN{
OFS=","
}
{
for(i=1;i<=NF;i++){
if($i=="\"company\""){
val=(val?val OFS:"")$(i+1)
}
}
gsub(/\"/,"",val)
print FILENAME":"val
val=""
}
' Input_file1 Input_file2
Explanation: Adding explanation for above code.
awk -F'[,:]' ' ##Starting awk program here and setting field separator as colon OR comma here for all lines of Input_file(s).
BEGIN{ ##Starting BEGIN section of awk here.
OFS="," ##Setting OFS as comma here.
} ##Closing BEGIN BLOCK here.
{ ##Starting main BLOCK here.
for(i=1;i<=NF;i++){ ##Starting a for loop which starts from i=1 to till value of NF.
if($i=="\"company\""){ ##Checking condition if field value is equal to "company" then do following.
val=(val?val OFS:"")$(i+1) ##Creating a variable named val and concatenating its own value to it each time cursor comes here.
} ##Closing BLOCK for if condition here.
} ##Closing BLOCK for, for loop here.
gsub(/\"/,"",val) ##Using gsub to globally substitute all " in variable val here.
print FILENAME":"val ##Printing filename colon and variable val here.
val="" ##Nullifying variable val here.
} ##Closing main BLOCK here.
' Input_file1 Input_file2 ##Mentioning Input_file names here.
Output will be as follows.
Input_file1:COMPANY1,COMPANY2,COMPANY3
Input_file2:COMPANY99
EDIT: Adding solution in case OP needs to use grep and want to get final output from its output(though I will recommend to use awk solution itself since we are NOT using multiple commands or sub-shells).
grep -oP '(?<="company":")[^"]*' * | awk 'BEGIN{FS=":";OFS=","} prev!=$1 && val{print prev":"val;val=""} {val=(val?val OFS:"")$2;prev=$1} END{if(val){print prev":"val}}'
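For comparison, here is a sketch that does not depend on splitting at commas and colons: it repeatedly applies match() to the rest of the line and collects every "company" value per file. It recreates the two sample files from the question in a temporary directory; the array names names and order are only illustrative.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '"company":"COMPANY1","companyDisplayName":"CM1","company":"COMPANY2","companyDisplayName":"CM2","company":"COMPANY3","companyDisplayName":"CM3",' > "$tmpdir/FILE1"
printf '%s\n' '"company":"COMPANY99","companyDisplayName":"CM99"' > "$tmpdir/FILE2"

result=$(cd "$tmpdir" && awk '
{
    line = $0
    # The prefix "company":" is 11 characters and the closing quote is 1 more,
    # so the name starts at RSTART+11 and is RLENGTH-12 characters long.
    while (match(line, /"company":"[^"]*"/)) {
        name = substr(line, RSTART + 11, RLENGTH - 12)
        if (!(FILENAME in names)) order[++n] = FILENAME   # remember first-seen order
        names[FILENAME] = (names[FILENAME] == "" ? "" : names[FILENAME] ",") name
        line = substr(line, RSTART + RLENGTH)             # continue after this match
    }
}
END { for (i = 1; i <= n; i++) print order[i] ":" names[order[i]] }
' FILE1 FILE2)

rm -rf "$tmpdir"
printf '%s\n' "$result"
```

This prints FILE1:COMPANY1,COMPANY2,COMPANY3 and FILE2:COMPANY99. A file without any match produces no output line in this sketch.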
There are two tools that can take the output of your grep command and reformat it the way you want. First tool is GNU datamash. Second is tsv-summarize from eBay's tsv-utils package (disclaimer: I'm the author). Both tools solve this in similar ways:
$ # The grep output
$ echo $'FILE1:COMPANY1\nFILE1:COMPANY2\nFILE1:COMPANY3\nFILE2:COMPANY99' > grep-output.txt
$ cat grep-output.txt
FILE1:COMPANY1
FILE1:COMPANY2
FILE1:COMPANY3
FILE2:COMPANY99
$ # Using GNU datamash
$ cat grep-output.txt | datamash --field-separator : --group 1 unique 2
FILE1:COMPANY1,COMPANY2,COMPANY3
FILE2:COMPANY99
$ # Using tsv-summarize
$ cat grep-output.txt | tsv-summarize --delimiter : --group-by 1 --unique-values 2 --values-delimiter ,
FILE1:COMPANY1,COMPANY2,COMPANY3
FILE2:COMPANY99

Use awk to limit output of transparent huge pages

How would I use awk to filter the current setting of the transparent huge pages?
Example output of transparent huge pages:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
I would like to output only the current setting:
never
Following awk may also help you on same.
awk -F"[][]" 'NF{print $2}' Input_file
Explanation: the following is not exact code; it is for explanation purposes only.
-F"[][]" ##Setting field separator as ] and [ for each line in Input_file.
'NF{ ##Checking that the line is not empty: NF (the number of fields) is 0 for an empty line, so the condition is true only when the line has at least one field.
print $2} ##If above condition is TRUE then print the 2nd field of current line of Input_file.
' Input_file ##Mentioning Input_file name here.
How about something like this?
awk -F'[\\]\\[]' '/always madvise/ && NF > 2 { print $2 }' /sys/kernel/mm/transparent_hugepage/enabled
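Note that the bracketed value is not always the last word: when the setting is always, the line looks like "[always] madvise never". A sketch that picks whatever sits inside the brackets, wherever it is, could look like this. The helper name thp_current is hypothetical, and the two input strings stand in for the contents of /sys/kernel/mm/transparent_hugepage/enabled.

```shell
# thp_current is a hypothetical helper: print the text between [ and ] on each input line.
thp_current() {
  awk 'match($0, /\[[^]]*\]/) { print substr($0, RSTART + 1, RLENGTH - 2) }'
}

a=$(printf '%s\n' 'always madvise [never]' | thp_current)
b=$(printf '%s\n' '[always] madvise never' | thp_current)

printf '%s %s\n' "$a" "$b"   # prints: never always
```

On a real system the helper would be fed the file itself, for example thp_current < /sys/kernel/mm/transparent_hugepage/enabled.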