How to combine these awk commands?


Can someone please explain to me how I can combine these piped awks to a single awk?
awk 'match($0, /(,|^)[^,]*shalvar[^,]*(,|$)/) {
print substr($0, RSTART, RLENGTH)}' file.txt |
awk 'gsub(",","")' | awk '{$1=$1};1'
I tried this, but it doesn't work:
awk 'match($0, /(,|^)[^,]*shalvar[^,]*(,|$)/) {
gsub(",","");$1=$1;print substr($0, RSTART, RLENGTH)}' file.txt
I understand that it shouldn't work because the characters are removed but the pointers don't change. How can I fix it now?

You need to wrap things the other way around. Collect the string you want to extract, then do the manipulations on the extracted value, just like your original script with multiple Awk scripts in a pipeline did.
awk 'match($0, /(,|^)[^,]*shalvar[^,]*(,|$)/) {
g=substr($0, RSTART, RLENGTH);
gsub(",","",g);
# $1=$1 is nice but we cannot use that here; here is a workaround
gsub(/^ *| *$/, "", g);
print g}' file.txt
The shortcut $1=$1 for trimming whitespace around a value works in an isolated Awk script if you are confident that there is only one field, but here, we don't necessarily have a single field (or do we?) so I use a more general solution to explicitly trim whitespace around the extracted string which also avoids relying on a well-known but still obscure side effect.
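To make the difference concrete, here is a small sketch comparing the two trimming approaches on a made-up input line (the helper names trim_rebuild and trim_explicit are hypothetical, chosen only for this illustration). Assigning $1=$1 makes Awk rebuild the record from its fields, so it also squeezes runs of whitespace inside the value, while the explicit gsub only touches the ends.

```shell
# Hypothetical helpers for illustration only.
trim_rebuild()  { awk '{ $1 = $1 } 1'; }                     # rebuilds $0 from its fields
trim_explicit() { awk '{ gsub(/^ *| *$/, ""); print }'; }    # strips leading/trailing spaces only

printf '  blue   shalvar  \n' | trim_rebuild    # inner run of spaces collapses to one
printf '  blue   shalvar  \n' | trim_explicit   # inner spacing is preserved
```

If the extracted value can contain several words, the explicit form is the safer choice.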
If shalvar is actually a variable you want to receive from the shell like $foo , try
awk -v field="$foo" 'match($0, "(^|,)[^,]*" field "[^,]*(,|$)") {
...
to interpolate the variable into a string which is then applied as a regular expression.
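As a self-contained sketch of the same idea, the function below takes the search word as an argument and reads from standard input. The function name extract_field and the sample line are made up for this illustration and are not from the question; the word is interpolated into a regular expression, so regex metacharacters in it keep their special meaning.

```shell
# Hypothetical helper: print the trimmed comma-delimited field that contains $1.
extract_field() {
  awk -v field="$1" 'match($0, "(^|,)[^,]*" field "[^,]*(,|$)") {
    g = substr($0, RSTART, RLENGTH)   # the matched field, delimiters included
    gsub(",", "", g)                  # drop the delimiting commas
    gsub(/^ *| *$/, "", g)            # trim surrounding spaces
    print g
  }'
}

# Made-up sample line
printf 'red shirt, blue shalvar ,green hat\n' | extract_field shalvar
```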

Related

right pad regex with spaces using sed or awk

I have a file with two fields separated by :. Both fields vary in length, and the second field can contain all sorts of characters (user input). I want the first field to be right padded with spaces to a fixed length of 15 characters. For the first field I have a working regex #.[A-Z0-9]{4,12}.
sample:
#ABC123:"wild things here"
#7X3Z:"":":#":";:*:-user input:""
#99999X999:"also, imagine: unicode, yay!"
desired output:
#ABC123        :"wild things here"
#7X3Z          :"":":#":";:*:-user input:""
#99999X999     :"also, imagine: unicode, yay!"
There are plenty of examples of how to zero pad a number, but surprisingly few about padding a regex match or a field in general. Any help using (preferably) sed or awk?

Here is another awk solution that would work with any version of awk:
awk 'BEGIN {FS=OFS=":"} {$1 = sprintf("%-15s", $1)} 1' file
#ABC123        :"wild things here"
#7X3Z          :"":":#":";:*:-user input:""
#99999X999     :"also, imagine: unicode, yay!"

With perl:
$ perl -pe 's/^[^:]+/sprintf("%-15s",$&)/e' ip.txt
#ABC123        :"wild things here"
#7X3Z          :"":":#":";:*:-user input:""
#99999X999     :"also, imagine: unicode, yay!"
The e flag allows you to use Perl code in replacement section. $& will have the matched portion which gets formatted by sprintf.
With awk:
# should work with any awk
awk 'match($0, /^[^:]+/){printf "%-15s%s\n", substr($0,1,RLENGTH), substr($0,RLENGTH+1)}'
# can be simplified with GNU awk
awk 'match($0, /^[^:]+/, m){printf "%-15s%s\n", m[0], substr($0,RLENGTH+1)}'
# or
awk 'match($0, /^([^:]+)(.+)/, m){printf "%-15s%s\n", m[1], m[2]}'
substr($0,1,RLENGTH) or m[0] will give contents of first field. I have used 1 instead of the usual RSTART here since we are matching start of line
substr($0,RLENGTH+1) will give rest of the line contents (i.e. from the first :)
See awk manual: String-Manipulation for details about match function.
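To see what match() leaves in RSTART and RLENGTH, and how the two substr() calls split the line, a quick sketch like the following may help. The helper name show_match is made up for this illustration; the input is the first sample line from the question.

```shell
# Hypothetical helper: show what match() sets and how the line is split.
show_match() {
  awk 'match($0, /^[^:]+/) {
    printf "RSTART=%d RLENGTH=%d first=[%s] rest=[%s]\n",
           RSTART, RLENGTH, substr($0, 1, RLENGTH), substr($0, RLENGTH + 1)
  }'
}

printf '%s\n' '#ABC123:"wild things here"' | show_match
# RSTART=1 RLENGTH=7 first=[#ABC123] rest=[:"wild things here"]
```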

Here is one more way to pad the first column. anubhava's answer with sprintf is the better one; this is offered only as an alternative. I have created a variable named spaces which holds the width that the first field should be padded to. The colon is right-justified in a field one character wider than the missing padding, so that it lands just after column 15.
awk -v spaces="15" 'BEGIN{FS=OFS=":"} {sub(/:/,sprintf("%" (spaces+1-length($1)) "s",":"))} 1' Input_file
Explanation: Adding detailed explanation for above.
awk -v spaces="15" ' ##Starting awk program from here, setting spaces to 15 here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS=OFS=":" ##Setting FS and OFS as colon here.
}
{
sub(/:/,sprintf("%" (spaces+1-length($1)) "s",":")) ##Substituting first occurrence of colon with the padding spaces followed by the colon here.
}
1 ##Printing current line here.
' Input_file ##Mentioning Input_file name here.

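I believe anubhava's solution of
awk 'BEGIN {FS=OFS=":"} {$1 = sprintf("%-15s", $1)} 1' file
can be simplified even further to:
awk -F: 'BEGIN{OFS=FS} $1=sprintf("%-15s",$1)'
The { } and the final 1 are optional here, because the assignment itself serves as the pattern: its value is the padded, non-empty string, which counts as true, so the default action prints the record.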
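A quick way to check the pattern-only form is to look at where the first colon ends up. This is only a sketch: the helper name pad_first is made up, and OFS is copied from FS so that the rebuilt record keeps its colons.

```shell
# Hypothetical helper: pad the first colon-separated field to 15 characters.
# The assignment is used as the pattern; its value (the padded string) is
# non-empty, hence true, so awk falls back to the default action (print).
pad_first() { awk -F: 'BEGIN { OFS = FS } $1 = sprintf("%-15s", $1)'; }

# The first colon should now sit in column 16.
printf '%s\n' '#7X3Z:"":":#"' | pad_first | awk '{ print index($0, ":") }'
# 16
```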

How to AWK print only specific item?

I have a log file that looks like this:
RPT_LINKS=1,T1999
RPT_NUMALINKS=1
RPT_ALINKS=1,1999TK,2135,2009,31462,29467,2560
RPT_TXKEYED=1
RPT_ETXKEYED=0
I have used grep to isolate the line I am interested in with the RPT_ALINKS. In that line I want to know how to use AWK to print only the link that ends with a TK.
I am really close running this:
grep -w 'RPT_ALINKS' stats2.log | awk -F 'TK' '{print FS }'
But I am sure those who are smarter than me already know I am getting only the TK back, how do I get the entire field so that I would get a return of 1999TK?

If there is only a single TK in that line and TK is always at the end of the link:
awk '/RPT_ALINKS/{match($0,/[^=,]*TK/); print substr($0,RSTART,RLENGTH)}'
You can also use a double grep
grep -w 'RPT_ALINKS' stats2.log | grep -wo '[^=,]*TK'
The following sed solution also works nicely:
sed '/RPT_ALINKS/s/\(^.*[,=]\)\([^=,]*TK\)\(,.*\)\?/\2/'

It doesn't get any more elegant
awk -F '=' '$1=="RPT_ALINKS" {n=split($2,array,",")
for(i=1; i<=n; i++)
if (array[i] ~ /TK$/)
{print array[i]}}
' stats2.log
n=split($2,array,","): split 1,1999TK,2135,2009,31462,29467,2560 with , to array array. n contains number of array elements, here 7.
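To see what split() produces for the line in question, something like this sketch can be run (the helper name show_split is made up for the illustration):

```shell
# Hypothetical helper: print the element count and every element of the array.
show_split() {
  awk -F '=' '{
    n = split($2, array, ",")          # n is the number of pieces
    print "n=" n
    for (i = 1; i <= n; i++) print i ": " array[i]
  }'
}

printf '%s\n' 'RPT_ALINKS=1,1999TK,2135,2009,31462,29467,2560' | show_split
# n=7
# 1: 1
# 2: 1999TK
# ... and so on up to 7: 2560
```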

Here is a simple solution
awk -F ',|=' '/^RPT_ALINKS/ { for (i=1; i<=NF; i++) if ($i ~ /TK$/) print $i }' stats2.log
It looks only at records that begin with RPT_ALINKS, and there it checks every field. If a field ends with TK, it prints that field.

Dang, I was just about to post the double-grep alternative, but got scooped. And all the good awk solutions are taken as well.
Sigh. So here we go in bash, for fun.
$ mapfile a < stats2.log
$ for i in "${a[@]}"; do [[ $i =~ ^RPT_ALINKS=(.+,)*([^,]+TK) ]] && echo "${BASH_REMATCH[2]}"; done
1999TK
This has the disadvantage of running way slower than awk and not using fields. Oh, and it won't handle multiple *TK items on a single line. And like sed, this is processing lines as patterns rather than fields, which saps elegance. And by using mapfile, we limit the size of input you can handle because your whole log is loaded into memory. Of course you don't really need to do that, but if you were going to use a pipe, you'd use a different tool anyway. :-)
Happy Thursday.

With a sed that has -E for EREs, e.g. GNU or OSX/BSD sed:
$ sed -En 's/^RPT_ALINKS=(.*,)?([^,]*TK)(,.*|$)/\2/p' file
1999TK
With GNU awk for the 3rd arg to match():
$ awk 'match($0",",/^RPT_ALINKS=(.*,)?([^,]*TK),.*/,a){print a[2]}' file
1999TK

Instead of looping through the fields, you can use another approach.
This should be fast, since it avoids the loop over every field.
awk -F"TK" '/RPT_ALINKS/ {b=split($1,a,",");print a[b]FS}' stats2.log
1999TK
Here you split the line by setting the field separator to TK, and search for a line that contains RPT_ALINKS.
That gives $1=RPT_ALINKS=1,1999 and $2=,2135,2009,31462,29467,2560
The value we want is always whatever follows the last comma in $1.
So split $1 on commas using the split function. b then contains the number of pieces.
Since we know the value is in the last piece, we print a[b] and append FS, which contains TK.
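The field split described above can be checked directly with a sketch like this (show_tk_fields is a made-up helper name):

```shell
# Hypothetical helper: show the two fields that result from using TK as FS.
show_tk_fields() {
  awk -F 'TK' '{ print "$1=[" $1 "]"; print "$2=[" $2 "]" }'
}

printf '%s\n' 'RPT_ALINKS=1,1999TK,2135,2009,31462,29467,2560' | show_tk_fields
# $1=[RPT_ALINKS=1,1999]
# $2=[,2135,2009,31462,29467,2560]
```

Note that this relies on TK appearing only once in the line; a second TK would produce a third field that the one-liner ignores.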

Join lines into one line using awk

I have a file with the following records
ABC
BCD
CDE
EFG
I would like to convert this into
'ABC','BCD','CDE','EFG'
I attempted to attack this problem using Awk in the following way:
awk '/START/{if (x)print x;x="";next}{x=(!x)?$0:x","$0;}END{print x;}'
but I obtain not what I expected:
ABC,BCD,CDE,EFG
Are there any suggestions on how we can achieve this?

Could you please try the following.
awk -v s1="'" 'BEGIN{OFS=","} {val=val?val OFS s1 $0 s1:s1 $0 s1} END{print val}' Input_file
Output will be as follows.
'ABC','BCD','CDE','EFG'

With GNU awk for multi-char RS:
$ awk -v RS='\n$' -F'\n' -v OFS="','" -v q="'" '{$1=$1; print q $0 q}' file
'ABC','BCD','CDE','EFG'

There are many ways of achieving this:
with pipes:
sed "s/.*/'&'/" <file> | paste -sd,
awk '{print "\047" $0 "\047"}' <file> | paste -sd,
remark: we do not make use of tr here as this would lead to an extra , at the end.
reading the full file into memory:
sed ':a;N;$!ba;s/\n/'"','"'/g;s/.*/'"'&'"'/g' <file> #POSIX
sed -z 's/^\|\n$/'"'"'/g;s/\n/'"','"'/g;' <file> #GNU
and the solution of @EdMorton
without reading the full file into memory:
awk '{printf (NR>1?",":"")"\047"$0"\047"}' <file>
and some random other attempts:
awk '(NR-1){s=s","}{s=s"\047"$0"\047"}END{print s}' <file>
awk 'BEGIN{printf s="\047";ORS=s","s}(NR>1){print t}{t=$0}END{ORS=s;print t}' <file>
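The remark about tr in the list above is easy to verify with a two-line sample (the input here is made up and shortened to two records):

```shell
# tr turns every newline into a comma, including the last one
printf '%s\n' ABC BCD | sed "s/.*/'&'/" | tr '\n' ','     # 'ABC','BCD',

# paste -s joins the lines and only puts the delimiter between them
printf '%s\n' ABC BCD | sed "s/.*/'&'/" | paste -sd, -    # 'ABC','BCD'
```

The explicit - operand tells paste to read standard input, which some implementations require.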
So what is going on with the OP's attempts?
Writing down the OP's awk line, we have
/START/{if (x)print x;x="";next}
{x=(!x)?$0:x","$0;}
END{print x;}
What does this do? Let us analyze step by step:
/START/{if (x)print x;x="";next}:: This reads If the current record/line contains the string START, then do
if (x) print x:: if x is not an empty string, print the value of x
x="" set x to be an empty string
next:: skip to the next record/line
In this code block, the OP probably assumed that /START/ means do this at the beginning of all things. In awk, this is however written as BEGIN and since in the beginning, all variables are empty strings or zero, the if statement is not executed by default. This block could be replaced by:
BEGIN{x=""}
But again, this is not needed and thus one can remove it:
{x=(!x)?$0:x","$0;}:: concatenate the string with the correct delimiter. This is good, especially due to the usage of the ternary operator. Sadly the delimiter is set to , and not ',' which in awk is best written as \047,\047. So the line could read:
{x=(!x)?$0:x"\047,\047"$0;}
This line can be shortened once you realize that x may be an empty string. For an empty x, x=$0 is equivalent to x=x $0, so all you need to do is add a separator, which may or may not be an empty string. So you can write this as
{x= x ((!x)?"":"\047,\047") $0}
or inverting the logic to get rid of some more characters:
{x=x(x?"\047,\047":"")$0}
one could even write
{x=x(x?"\047,\047":x)$0}
but this is not optimal, as it needs to read the value of x from memory once more. However, this form can be used to finally optimize it to (per @EdMorton's comment)
{x=(x?x"\047,\047":"")$0}
This is better as it removes an extra concatenation operator.
END{print x}:: Here the OP prints the result. This, however, will lack the single quotes at the beginning and end of the string, so they could be added:
END{print "\047" x "\047"}
So the corrected version of the OP's code would read:
awk '{x=(x?x"\047,\047":"")$0}END{print "\047" x "\047"}'
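Run against the sample from the question, the corrected program behaves as follows. This is just a usage sketch; join_quoted is a made-up wrapper name.

```shell
# Hypothetical wrapper around the corrected one-liner; reads standard input.
join_quoted() {
  awk '{ x = (x ? x "\047,\047" : "") $0 } END { print "\047" x "\047" }'
}

printf '%s\n' ABC BCD CDE EFG | join_quoted
# 'ABC','BCD','CDE','EFG'
```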

awk may be better
awk '{printf fmt,$1}' fmt="'%s'\n" file | paste -sd, -
'ABC','BCD','CDE','EFG'

Combining awk search with standard awk and awk delimiter

I'm working on a set of data for which I need specific fields as output:
The data looks like this:
/home/oracle/db.log.gz:2013-1-19T00:00:25 <user.info> 1 2013-1-19T00:00:53.911 host_name RT_FLOW [junos#26.1.1.1.2.4 source-address="10.1.2.0" source-port="616" destination-address="100.1.1.2" destination-port="23" service-name="junos-telnet" nat-source-address="20x.2x.1.2" nat-source-port="3546" nat-destination-address="9x.12x.3.0"]
From above I need three things:
(I) - 2013-1-19T00:00:53.911 which is $4
(II)- source-address="10.1.2.0" which is $8 of which I need only 10.1.2.0
(III) - destination-address="100.1.1.2" which $10 of which I need only 100.1.1.2
I cannot use simple awk like this -> awk '{ print $4 \t $8 \t $10 }' since there are some fields after "device_name" in the log file which are not always present in all log lines so I have to make use of delimiters such as
awk -F 'source-address=' '{print $2}' | awk '{print $1}' -> this gives the source-address IP, which is requirement (II)
I'm not sure how to combine these into a single awk search that gives me (I), (II) and (III).
Can someone help?

I believe sed is better for this job
sed -r 's/([^ ]+[ ]+){3}([^ ]+).*[ ]+source-address="([^"]+)".*[ ]+destination-address="([^"]+)".*/\2\t\3\t\4/' file
Output:
2013-1-19T00:00:53.911 10.1.2.0 100.1.1.2

What exactly do you want?
1) solve the problem using any (reasonably standard) tool
2) solve this challenge using one instance of awk
3) solve the problem using just awk, no matter how many instances it costs
For the first case, you could parse the line using scripting language of your choice (mine would be Perl), or do it the hard way using sed and a single big substitution. Or something between the two – use three regexes to get the parts you want.
For the second case, you could adapt any of the former solutions, preferably the sed one. Awk and sed solutions have already been posted.
For the third case, you could just run the obvious awk solutions you mentioned in your question and send the results to a single pipe like { awk …; awk …; awk …; } < file | consumer.
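For the single-awk case, one possible sketch is to stop relying on field positions for the addresses and look the fields up by name instead. This assumes, as the question states, that the timestamp is always $4 and that the values never contain spaces; extract_flow is a made-up name, and the sample line below is abbreviated from the question.

```shell
# Hypothetical helper: print timestamp, source address and destination
# address, tab-separated, finding the two addresses by their key names.
extract_flow() {
  awk '{
    src = dst = ""
    for (i = 1; i <= NF; i++) {
      # The ^ anchor keeps nat-source-address and nat-destination-address out.
      if ($i ~ /^source-address="/)      { split($i, p, "\""); src = p[2] }
      if ($i ~ /^destination-address="/) { split($i, p, "\""); dst = p[2] }
    }
    print $4 "\t" src "\t" dst
  }' "$@"
}

printf '%s\n' '/home/oracle/db.log.gz:2013-1-19T00:00:25 <user.info> 1 2013-1-19T00:00:53.911 host_name RT_FLOW [x source-address="10.1.2.0" source-port="616" destination-address="100.1.1.2" nat-source-address="20x.2x.1.2"]' |
  extract_flow
```

Because the loop searches every field, optional fields appearing before the addresses do not shift the result.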

Try doing this :
awk '{print gensub(/.*\s+([0-9]{4}-[0-9]+-[0-9]+T[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]+).*source-address="([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*destination-address="([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*/, "(I) \\1\n(II) \\2\n(III) \\3", "g"); }' file
Another solution using perl :
perl -lne 'print "(", "I" x ++$c, ") $_" for m/.*?\s+(\d{4}-\d+-\d+T\d{2}:\d{2}:\d{2}.\d+).*source-address="(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*destination-address="(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*/' file
Outputs :
(I) 2013-1-19T00:00:53.911
(II) 10.1.2.0
(III) 100.1.1.2

Unable to match regex in string using awk

I am trying to fetch the lines in which the second part of the line contains a pattern from the first part of the line.
$ cat file.txt
String1 is a big string|big
$ awk -F'|' ' { if ($2 ~ /$1/) { print $0 } } ' file.txt
But it is not working.
I am not able to find out what is the mistake here.
Can someone please help?

Two things: No slashes, and your numbers are backwards.
awk -F\| '$1~$2' file.txt

I guess what you meant is that some word from the first part should also appear in the second part. If that is what you want, then:
awk -F'|' '{n=split($1,a," ");for(i=1;i<=n;i++){if($2~a[i])print $0}}' your_file

There are surprisingly many things wrong with your command line:
1) You aren't using the awk condition/action syntax but instead needlessly embedding a condition within an action,
2) You aren't using the default awk action but instead needlessly hand-coding a print $0.
3) You have your RE operands reversed.
4) You are using RE comparison but it looks like you really want to match strings.
You can fix the first 3 of the above by modifying your command to:
awk -F'|' '$1~$2' file.txt
but I think what you really want is "4" which would mean you need to do this instead:
awk -F'|' 'index($1,$2)' file.txt
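The difference between the two only shows up when the second part contains regex metacharacters. A small sketch with made-up input (match_regex and match_literal are hypothetical names):

```shell
# Hypothetical helpers wrapping the two commands above; both read stdin.
match_regex()   { awk -F'|' '$1 ~ $2'; }        # $2 is treated as a regular expression
match_literal() { awk -F'|' 'index($1, $2)'; }  # $2 is treated as a plain string

printf '%s\n' 'abc|a.c' 'a.c|a.c' | match_regex     # prints both lines: "." matches any character
printf '%s\n' 'abc|a.c' 'a.c|a.c' | match_literal   # prints only a.c|a.c
```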
