Combining awk search with standard awk and awk delimiter - awk

I`m working on a set of data for which I need specific fields as output:
The data looks like this:
/home/oracle/db.log.gz:2013-1-19T00:00:25 <user.info> 1 2013-1-19T00:00:53.911 host_name RT_FLOW [junos#26.1.1.1.2.4 source-address="10.1.2.0" source-port="616" destination-address="100.1.1.2" destination-port="23" service-name="junos-telnet" nat-source-address="20x.2x.1.2" nat-source-port="3546" nat-destination-address="9x.12x.3.0"]
From above I need three things:
(I) - 2013-1-19T00:00:53.911 which is $4
(II)- source-address="10.1.2.0" which is $8 of which I need only 10.1.2.0
(III) - destination-address="100.1.1.2" which $10 of which I need only 100.1.1.2
I cannot use simple awk like this -> awk '{ print $4 \t $8 \t $10 }' since there are some fields after "device_name" in the log file which are not always present in all log lines so I have to make use of delimiters such as
awk -F 'source-address=' '{print $2}' | awk '{print $1} -> this gives source-addressIP which is (II) requirement
I`m not sure how do I combine using a awk search for I and II and III.
Can someone help?

I believe sed is better for this job
sed -r 's/([^ ]+[ ]+){3}([^ ]+).*[ ]+source-address="([^"]+)".*[ ]+destination-address="([^"]+)".*/\2\t\3\t\4/' file
Output:
2013-1-19T00:00:53.911 10.1.2.0 100.1.1.2

What do you exactly want?
solve the problem using any (reasonably standard) tool
solve this challenge using one instance of awk
solve the problem using just awk, no matter how many instances it costs
For the first case, you could parse the line using scripting language of your choice (mine would be Perl), or do it the hard way using sed and a single big substitution. Or something between the two – use three regexes to get the parts you want.
For the second case, you could adapt any of the former solutions, preferably the sed one. Awk and sed solutions have already been posted.
For the third case, you could just run the obvious awk solutions you mentioned in your question and send the results to a single pipe like { awk …; awk …; awk …; } < file | consumer.

Try doing this :
awk '{print gensub(/.*\s+([0-9]{4}-[0-9]+-[0-9]+T[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]+).*source-address="([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*destination-address="([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*/, "(I) \\1\n(II) \\2\n(III) \\3", "g"); }' file
Another solution using perl :
perl -lne 'print "(", "I" x ++$c, ") $_" for m/.*?\s+(\d{4}-\d+-\d+T\d{2}:\d{2}:\d{2}.\d+).*source-address="(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*destination-address="(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*/' file
Outputs :
(I) 2013-1-19T00:00:53.911
(II) 10.1.2.0
(III) 100.1.1.2

Related

Extracting and rearranging columns

I read from stdin lines which contain fields. The field delimiter is a semicolon. There are no specific quoting characters in the input (i.e. the fields can't contain themselves semicolons or newline characters). The number of the input fields is unknown, but it is at least 4.
The output is supposed to be a similar file, consisting of the fields from 2 to the end, but field 2 and 3 reversed in order.
I'm using zsh.
I came up with a solution, but find it clumsy. In particular, I could not think of anything specific to zsh which would help me here, so basically I reverted to awk. This is my approach:
awk -F ';' '{printf("%s", $3 ";" $2); for(i=4;i<=NF;i++) printf(";%s", $i); print "" }' <input_file >output_file
The first printf takes care about the two reversed fields, and then I use an explicit loop to write out the remaining fields. Is there a possibility in awk (or gawk) to print a range of fields in a single command? Or did I miss some incredibly clever feature in zsh, which could make my life simpler?
UPDATE: Example input data
a;bb;c;D;e;fff
gg;h;ii;jj;kk;l;m;n
Should produce the output
c;bb;D;e;fff
ii;h;jj;kk;l;m;n
Using any awk in any shell on every Unix box:
$ awk 'BEGIN{FS=OFS=";"} {t=$3; $3=$2; $2=t; sub(/[^;]*;/,"")} 1' file
c;bb;D;e;fff
ii;h;jj;kk;l;m;n
With GNU awk you could try following code. Using match function ogf GNU awk, where using regex ^[^;]*;([^;]*;)([^;]*;)(.*)$ to catch the values as per requirement, this is creating 3 capturing groups; whose values are getting stored into array named arr(GNU awk's functionality) and then later in program printing values as per requirement.
Here is the Online demo for used regex.
awk 'match($0,/^[^;]*;([^;]*;)([^;]*;)(.*)$/,arr){
print arr[2] arr[1] arr[3]
}
' Input_file
If perl is accepted, it provides a join() function to join elements on a delimiter. In awk though you'd have to explicitly define one (which isn't complex, just more lines of code)
perl -F';' -nlae '$t = #F[2]; #F[2] = #F[1]; $F[1] = $t; print join(";", #F[1..$#F])' file
With sed, perl, hck and rcut (my own script):
$ sed -E 's/^[^;]+;([^;]+);([^;]+)/\2;\1/' ip.txt
c;bb;D;e;fff
ii;h;jj;kk;l;m;n
# can also use: perl -F';' -lape '$_ = join ";", #F[2,1,3..$#F]' ip.txt
$ perl -F';' -lane 'print join ";", #F[2,1,3..$#F]' ip.txt
c;bb;D;e;fff
ii;h;jj;kk;l;m;n
# -d and -D specifies input/output separators
$ hck -d';' -D';' -f3,2,4- ip.txt
c;bb;D;e;fff
ii;h;jj;kk;l;m;n
# syntax similar to cut, but output field order can be different
$ rcut -d';' -f3,2,4- ip.txt
c;bb;D;e;fff
ii;h;jj;kk;l;m;n
Note that the sed version will preserve input lines with less than 3 fields.
$ cat ip.txt
1;2;3
apple;fig
abc
$ sed -E 's/^[^;]+;([^;]+);([^;]+)/\2;\1/' ip.txt
3;2
apple;fig
abc
$ perl -F';' -lane 'print join ";", #F[2,1,3..$#F]' ip.txt
3;2
;fig
;
Another awk variant:
awk 'BEGIN{FS=OFS=";"} {$1=$3; $3=""; sub(/;;/, ";")} 1' file
c;bb;D;e;fff
ii;h;jj;kk;l;m;n
With gnu awk and gensub switching the position of 2 capture groups:
awk '{print gensub(/^[^;]*;([^;]*);([^;]*)/, "\\2;\\1", 1)}' file
The pattern matches
^ Start of string
[^;]*; Negated character class, match optional chars other than ; and then match ;
([^;]*);([^;]*) 2 capture groups, both capturing chars other than ; and match ; in between
Output
c;bb;D;e;fff
ii;h;jj;kk;l;m;n
awk '{print $3, $0}' {,O}FS=\; < file | cut -d\; -f1,3,5-
This uses awk to prepend the third column, then pipes to cut to extract the desired columns.
Here is one way to do it using only zsh:
rearrange() {
local -a lines=(${(#f)$(</dev/stdin)})
for line in $lines; do
local -a flds=(${(s.;.)line})
print $flds[3]';'$flds[2]';'${(j.;.)flds[4,-1]}
done
}
The same idea in a single line. This may not be an improvement over your awk script:
for l in ${(#f)$(<&0)}; print ${${(A)i::=${(s.;.)l}}[3]}\;$i[2]\;${(j.;.)i:3}
Some of the pieces:
$(</dev/stdin) - read from stdin using pseudo-device.
$(<&0) - another way to read from stdin.
(f) - parameter expansion flag to split by newlines.
(#) - treat split as an array.
(s.;.) - split by semicolon.
$flds[3] - expands to the third array element.
$flds[4,-1] - fourth, fifth, etc. array elements.
$i:3 - ksh-style array slice for fourth, fifth ... elements.
Mixing styles like this can be confusing, even if it is slightly shorter.
(j.;.) - join array by semicolon.
i::= - assign the result of the expansion to the variable i.
This lets us use the semicolon-split fields later.
(A)i::= - the (A) flag ensures i is an array.

What does this Awk expression mean

I am working with bash script that has this command in it.
awk -F ‘‘ ‘/abc/{print $3}’|xargs
What is the meaning of this command?? Assume input is provided to awk.
The quick answer is it'll do different things depending on the version of awk you're running and how many fields of output the awk script produces.
I assume you meant to write:
awk -F '' '/abc/{print $3}'|xargs
not the syntactically invalid (due to "smart quotes"):
awk -F ‘’’/abc/{print $3}’|xargs
-F '' is undefined behavior per POSIX so what it will do depends on the version of awk you're running. In some awks it'll split the current line into 1 character per field. in others it'll be ignored and the line will be split into fields at every sequence of white space. In other awks still it could do anything else.
/abc/ looks for a string matching the regexp abc on the current line and if found invokes the subsequent action, in this case {print $3}.
However it's split into fields, print $3 will print the 3rd such field.
xargs as used will just print chunks of the multi-line input it's getting all on 1 line so you could get 1 line of all-fields output if you don't have many fields being output or several lines of multi-field output if you do.
I suspect the intent of that code was to do what this code actually will do in any awk alone:
awk '/abc/{printf "%s%s", sep, substr($0,3,1); sep=OFS} END{print ""}'
e.g.:
$ printf 'foo\nxabc\nyzabc\nbar\n' |
awk '/abc/{printf "%s%s", sep, substr($0,3,1); sep=OFS} END{print ""}'
b a

How to AWK print only specific item?

I have a log file that looks like this:
RPT_LINKS=1,T1999
RPT_NUMALINKS=1
RPT_ALINKS=1,1999TK,2135,2009,31462,29467,2560
RPT_TXKEYED=1
RPT_ETXKEYED=0
I have used grep to isolate the line I am interested in with the RPT_ALINKS. In that line I want to know how to use AWK to print only the link that ends with a TK.
I am really close running this:
grep -w 'RPT_ALINKS' stats2.log | awk -F 'TK' '{print FS }'
But I am sure those who are smarter than me already know I am getting only the TK back, how do I get the entire field so that I would get a return of 1999TK?
If there is only a single RT in that line and RT is always at the end:
awk '/RPT_ALINKS/{match($0,/[^=,]*TK/); print substr($0,RSTART,RLENGTH)}'
You can also use a double grep
grep -w 'RPT_ALINKS' stats2.log | grep -wo '[^=,]*TK'
The following sed solution also works nicely:
sed '/RPT_ALINKS/s/\(^.*[,=]\)\([^=,]*TK\)\(,.*\)\?/\2/'
It doesn't get any more elegant
awk -F '=' '$1=="RPT_ALINKS" {n=split($2,array,",")
for(i=1; i<=n; i++)
if (array[i] ~ /TK$/)
{print array[i]}}
' stats2.log
n=split($2,array,","): split 1,1999TK,2135,2009,31462,29467,2560 with , to array array. n contains number of array elements, here 7.
Here is a simple solution
awk -F ',|=' '/^RPT_ALINKS/ { for (i=1; i<=NF; i++) if ($i ~ /TK$/) print $i }' stats2.log
It looks only on the record which begins with RPT_ALINKS. And there it check every field. If field ends with TK, then it prints it.
Dang, I was just about to post the double-grep alternative, but got scooped. And all the good awk solutions are taken as well.
Sigh. So here we go in bash, for fun.
$ mapfile a < stats2.log
$ for i in "${a[#]}"; do [[ $i =~ ^RPT_ALINKS=(.+,)*([^,]+TK) ]] && echo "${BASH_REMATCH[2]}"; done
1999TK
This has the disadvantage of running way slower than awk and not using fields. Oh, and it won't handle multiple *TK items on a single line. And like sed, this is processing lines as patterns rather than fields, which saps elegance. And by using mapfile, we limit the size of input you can handle because your whole log is loaded into memory. Of course you don't really need to do that, but if you were going to use a pipe, you'd use a different tool anyway. :-)
Happy Thursday.
With a sed that has -E for EREs, e.g. GNU or OSX/BSD sed:
$ sed -En 's/^RPT_ALINKS=(.*,)?([^,]*TK)(,.*|$)/\2/p' file
1999TK
With GNU awk for the 3rd arg to match():
$ awk 'match($0",",/^RPT_ALINKS=(.*,)?([^,]*TK),.*/,a){print a[2]}' file
1999TK
Instead of looping through it, you can use an other alternative.
This will be fast, loop takes time.
awk -F"TK" '/RPT_ALINKS/ {b=split($1,a,",");print a[b]FS}' stats2.log
1999TK
Here you split the line by setting field separator to TK and search for line that contains RPT_ALINKS
That gives $1=RPT_ALINKS=1,1999 and $2=,2135,2009,31462,29467,2560
$1 will always after last comma have our value.
So split it up using split function by comma. b would then contain number of fields.
Since we know that number would be in last section we do use a[b] and add FS that contains TK

Combine grep -f and awk

I am using two commands:
awk '{ print $2 }' SomeFile.txt > Pattern.txt
grep -f Pattern.txt File.txt
With the first command I create a list of desirable patterns. With the second command I extract all lines in File.txt that match the lines in the Pattern.txt
My question is, is there a way to combine awk and grep in a pipeline so that I don't have to generate the intermediate Pattern.txt file?
Thanks!
You can do this all in one invocation of awk:
awk 'NR==FNR{a[$2];next}{for(i in a)if($0~i)print}' Somefile.txt File.txt
Populate keys in the array a from the second column of the first file. NR==FNR identifies the first file (total record number is equal to this file's record number). next skips the second block for the first file.
In the second block, loop through all the keys in the array and if the line matches any of them, print it. To avoid printing the line more than once if it matches more than one pattern, you could add a next here too, i.e. {for(i in a)if($0~i){print;next}}.
If the "patterns" are actually fixed strings, it is even simpler:
awk 'NR==FNR{a[$2];next}$0 in a' Somefile.txt File.txt
If your shell supports it, you can use process substitution:
grep -f <(awk '{ print $2 }' SomeFile.txt) File.txt
bash and zsh will support that, others will probably too, didn't tested.
Simpler as the above and supported by all shells would be to use a pipe:
awk '{ print $2 }' SomeFile.txt | grep -f - File.txt
- is used as the argument to -f. - has a special meaning here and stands for stdin. Thanks to Tom Fenech for mentioning that!

Unable to match regex in string using awk

I am trying to fetch the lines in which the second part of the line contains a pattern from the first part of the line.
$ cat file.txt
String1 is a big string|big
$ awk -F'|' ' { if ($2 ~ /$1/) { print $0 } } ' file.txt
But it is not working.
I am not able to find out what is the mistake here.
Can someone please help?
Two things: No slashes, and your numbers are backwards.
awk -F\| '$1~$2' file.txt
I guess what you meant is part of the string in the first part should be a part of the 2nd part.if this is what you want! then,
awk -F'|' '{n=split($1,a,' ');for(i=1,i<=n;i++){if($2~/a[i]/)print $0}}' your_file
There are surprisingly many things wrong with your command line:
1) You aren't using the awk condition/action syntax but instead needlessly embedding a condition within an action,
2) You aren't using the default awk action but instead needlessly hand-coding a print $0.
3) You have your RE operands reversed.
4) You are using RE comparison but it looks like you really want to match strings.
You can fix the first 3 of the above by modifying your command to:
awk -F'|' '$1~$2' file.txt
but I think what you really want is "4" which would mean you need to do this instead:
awk -F'|' 'index($1,$2)' file.txt