How to select fields in a non-uniform file using different delimiters? - awk

I'm having trouble writing a one-liner that will select the number between the parentheses, wrap it in double quotes, insert a comma, then select all the text after "USER_RULE: " up to the next double quote.
Here is a small sample of my file:
#213(1547485175) pass in quick on igb0 inet proto udp from <MGMT_HOSTS:1> to <UNRAID_IP:1> port = http keep state label "USER_RULE: Local Mgmt Services"
#174(1548683908) block return in quick on ALL_LAN inet proto tcp from <LOCAL_NETWORKS:7> to <LOCAL_BROADCAST:8> label "USER_RULE: Local Broadcast Noise"
#157(1547555119) block return in log quick on ALL_LAN inet from ! <NO_PFBLOCKER:1> to <pfB_BAD_IP_v4:55258> label "USER_RULE: pfb_Bad_IP (outbound)"
#137(1547478025) pass in quick on igb0 inet proto tcp from 192.168.1.0/24 to (self:13) port = ssh flags S/SA keep state label "USER_RULE: Anti-Lockout"
#386(1548774638) pass in quick on igb0.10 route-to (ovpnc1 10.20.48.141) inet proto udp from <MOBILE_DEVICES:5> to ! <PRIVATE_NETWORKS:3> port = https keep state label "USER_RULE: Policy Route" tag NO_WAN_EGRESS
Here's my expected output:
"1547485175",Local Mgmt Services
"1548683908",Local Broadcast Noise
"1547555119",pfb_Bad_IP (outbound)
"1547478025",Anti-Lockout
"1548774638",Policy Route
I've tried various combinations of awk, sed, and grep and I can get sort of the output I want. I just can't nail it. I'll spare you my ugly failed attempts.

$ sed 's/[^(]*(\([^)]*\).*"USER_RULE: *\([^"]*\).*/"\1",\2/' file
"1547485175",Local Mgmt Services
"1548683908",Local Broadcast Noise
"1547555119",pfb_Bad_IP (outbound)
"1547478025",Anti-Lockout
"1548774638",Policy Route

Could you please try the following? (It is always recommended to include your own attempts in your post, so kindly do so; we are all here to learn.)
awk '
BEGIN{
s1="\""
OFS=","
}
match($0,/\([^\)]*/){
val=substr($0,RSTART+1,RLENGTH-1)
}
match($0,/USER_RULE[^"]*/){
print s1 val s1,substr($0,RSTART+11,RLENGTH-11)
}' Input_file
Output will be as follows.
"1547485175",Local Mgmt Services
"1548683908",Local Broadcast Noise
"1547555119",pfb_Bad_IP (outbound)
"1547478025",Anti-Lockout
"1548774638",Policy Route

# File a.awk:
BEGIN { q = "\"" }
{ idx = index($0, "USER_RULE:")
rule = substr($0, idx + 11)
idx = index(rule, q) - 1
print q substr($0, 6, 10) q "," substr(rule, 1, idx)
}
Run:
$ awk -f a.awk file
"1547485175",Local Mgmt Services
"1548683908",Local Broadcast Noise
"1547555119",pfb_Bad_IP (outbound)
"1547478025",Anti-Lockout
"1548774638",Policy Route

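Since the question title asks about different delimiters, here is a further minimal sketch that splits each line twice: once on the double quote to isolate the label, and once on parentheses to isolate the number. It assumes every line has the number in the first pair of parentheses and exactly one quoted label starting with "USER_RULE: ", as in the sample. The function name extract_rules is only illustrative.

```shell
# Sketch only: extract_rules is a hypothetical helper name, not from the thread.
extract_rules() {
  awk -F'"' '{
    split($1, a, /[()]/)            # a[2] is the text inside the first pair of parentheses
    sub(/^USER_RULE: */, "", $2)    # $2 is the quoted label; drop its prefix
    printf "\"%s\",%s\n", a[2], $2
  }' "$@"
}
```

Run it as extract_rules file, or pipe lines into it. Later parentheses on the line, such as (self:13) or (outbound), do not matter because only a[2] is used and the label is taken from the quoted field.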

Non blocking read from GNU awk coprocess?

I would like to implement incremental execution of scripts using gawk in order to interleave script source and script output in a document.
The idea would be to read script lines into awk to print them and also pipe them into an appropriate interpreter. Then, on a cue from the input file, read any output from the coprocess and print it to standard output. But it seems that I must know how much output has been generated before looping over the coprocess output.
Is there any way to do a non-blocking read from the coprocess?
function script_checkpoint() {
while(("python3" |& getline output) > 0)
print output
}
/^# checkpoint/ { script_checkpoint(); next }
{ print; print $0 |& "python3" }
END { script_checkpoint() }
EDIT: I have tried to implement this without using a coprocess by buffering the input lines until a checkpoint and just letting the interpreter print to standard out itself but the interpreter always buffers its output until the stream closes. I don't want to close it until the program ends to preserve its internal state.
EDIT: made it more clear that my first intended use case is running python scripts. Here is a sample input/output pair.
print('first line')
# checkpoint
print('second line')
should result in
print('first line')
first line
print('second line')
second line
The general issue:
while ((interpreter |& getline output) > 0) runs until it sees an EOF but ...
interpreter does not end/terminate/exit, thus no EOF is sent so ...
awk hangs while waiting for interpreter to send more data so ...
we end up with a deadlock situation (awk waiting for input from interpreter; interpreter waiting for input from awk)
Assumptions:
need to maintain a single invocation of interpreter throughout the run (per a comment from OP); net result: awk cannot depend on interpreter sending an EOF
interpreter can be modified (to generate additional output)
the awk script has no way of knowing how many lines of output will be generated by interpreter
One idea is to set up a handshake between awk and interpreter. Within the while ((interpreter |& getline output) > 0) loop we'll test for our handshake and, when we see it, break out of the loop and return to the main awk script.
For demo purposes I'll use a simple bash script that does some handshake processing otherwise just prints to stdout whatever it reads from stdin:
$ cat interpreter
#!/usr/bin/bash
while read -r line
do
if [[ "${line}" = 'checkpoint' ]] # received 'checkpoint' handshake?
then
echo "CHECKPOINT" # send "CHECKPOINT" handshake/acknowledgement
continue
else
echo "interpreter: $line"
fi
done
Demo awk code with handshake logic:
awk '
function script_checkpoint() {
while (( cmd |& getline output) > 0) {
if ( output == "CHECKPOINT" ) # received "CHECKPOINT" handshake/acknowledgement?
break
print output
}
}
BEGIN { cmd= "./interpreter" }
/^# checkpoint/ { print "checkpoint" |& cmd # send "checkpoint" handshake
script_checkpoint()
next
}
{ print "awk: " $0
print $0 |& cmd
}
END { print "awk: last checkpoint" # in case last line of input is not "# checkpoint" we will ...
print "checkpoint" |& cmd # send one last "checkpoint" handshake
script_checkpoint()
print "awk: done"
}
' test.dat
Sample input file:
$ cat test.dat
line1
line2
# checkpoint
line3
line4
# checkpoint
line5
Output:
awk: line1
awk: line2
interpreter: line1
interpreter: line2
awk: line3
awk: line4
interpreter: line3
interpreter: line4
awk: line5
awk: last checkpoint
interpreter: line5
awk: done
NOTES:
awk will still hang in the event interpreter crashes and/or fails to send back the CHECKPOINT handshake
if the strings checkpoint and/or CHECKPOINT can show up in the 'normal' data streams then update the code to use strings that are not expected in the data streams
It sounds like you're trying to do something like this:
BEGIN { cmd="/my/python/script/path" }
function script_checkpoint( output) {
close(cmd,"to")
while ( (cmd |& getline output) > 0 ) {
print output
}
close(cmd)
}
/^# checkpoint/ {
script_checkpoint()
next
}
{
print
print |& cmd
}
END { script_checkpoint() }

sed script - How do I extract IP & MAC addresses from config sections?

How can I use sed to extract the IP address and MAC address of every lease {...} section containing a MAC address?
Example input:
lease 192.168.0.188 {
starts 0 2015/10/18 10:02:20;
ends 0 2015/10/18 10:32:20;
cltt 0 2015/10/18 10:02:20;
binding state active;
next binding state free;
hardware ethernet 2c:44:fd:25:f7:fc;
uid "\001,D\375%\367\374";
client-hostname "708-PC";
}
lease 192.168.0.71 {
starts 0 2015/10/18 10:02:16;
ends 0 2015/10/18 10:02:16;
tstp 0 2015/10/18 10:02:16;
cltt 0 2015/10/18 10:02:16;
binding state abandoned;
next binding state free;
}
Example output:
192.168.0.188
2c:44:fd:25:f7:fc
I tried using:
s/lease // ;s/hardware ethernet // ;s/^ // ;/^[^0-9]/d ;s/[^0-9a-z\:\.][\{]// ;s/\;// ;/^$/d
This doesn't work correctly, however.
Using awk:
awk 'BEGIN {RS="lease "} /hardware ethernet / { match($0,/[[:xdigit:]:]{17}/); print $1; print substr($0, RSTART, RLENGTH) }' your_file
192.168.0.188
2c:44:fd:25:f7:fc
Using sed:
sed -r -n '/lease/{h};/hardware/{H;g;s/lease (.+) \{.*ethernet (.+);/\1\n\2/;p}' your_file
192.168.0.188
2c:44:fd:25:f7:fc
If your awk supports gensub(),
$ awk -vRS='}' '/lease/ && /hardware ethernet/{r=gensub(/.*lease ([^ ]*).*hardware ethernet ([^;]*).*/,"\\1\n\\2", "g");print r}' file
192.168.0.188
2c:44:fd:25:f7:fc
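For an awk without gensub() or regex record separators, a line-by-line sketch may also work: remember the address from each "lease" line and print it only when a "hardware ethernet" line follows in that section. It assumes one statement per line, as in the example input. The function name lease_macs is only illustrative.

```shell
# Sketch only: lease_macs is a hypothetical helper name, not from the thread.
lease_macs() {
  awk '
    $1 == "lease" { ip = $2 }                 # remember the address of the current section
    $1 == "hardware" && $2 == "ethernet" {
      mac = $3
      sub(/;$/, "", mac)                      # strip the trailing semicolon
      print ip
      print mac
    }
  ' "$@"
}
```

Sections without a MAC address, like the abandoned lease in the example, produce no output because their "lease" line only updates ip.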

awk command to run a C++ code and input several times

Say, I have a C code which I compile like:
$ gcc code.c -o f.out
$ ./f.out inputfile outputfile
Then the code asks for input
$ enter mass:
Now, if I need to run this code, for example, 200 times, where the input files are named 0c.txt, 1c.txt, ..., 199c.txt, and I want to use the same value of mass every time (e.g. mass=6), how do I write an "awk" command for that? Thanks for your help.
You don't specify your outputfile name. I'll assume 0c.out, 1c.out, ...
I'm also assuming that the f.out program reads the mass from stdin instead of anything more complicated.
#!/usr/bin/gawk -f
BEGIN {
mass = 6
for (i=0; i<200; i++) {
cmd = sprintf("./f.out %dc.txt %dc.out", i, i)
print mass |& cmd
close(cmd, "to")
while ((cmd |& getline out) > 0) {
print out   # placeholder: do something with each line of output from ./f.out
}
close(cmd)
}
}
ref http://www.gnu.org/software/gawk/manual/html_node/Two_002dway-I_002fO.html
In bash, you'd write:
for i in $(seq 0 199); do
echo 6 | ./f.out ${i}c.txt ${i}c.out
done

How can I check if a GNU awk coprocess is open, or force it to open without writing to it?

I have a gawk program that uses a coprocess. However, sometimes I don't have any data to write to the coprocess, and my original script hangs while waiting for the output of the coprocess.
The code below reads from STDIN, writes each line to a "cat" program, running as a coprocess. Then it reads the coprocess output back in and writes it to STDOUT. If we change the if condition to be 1==0, nothing gets written to the coprocess, and the program hangs at the while loop.
From the manual, it seems that the coprocess and the two-way communication channels are only started the first time there is an IO operation with the |& operator. Perhaps we can start things without actually writing anything (e.g. writing an empty string)? Or is there a way to check if the coprocess ever started?
#!/usr/bin/awk -f
BEGIN {
cmd = "cat"
## print "" |& cmd
}
{
if (1 == 1) {
print |& cmd
}
}
END {
close (cmd, "to")
while ((cmd |& getline line)>0) {
print line
}
close(cmd)
}
Great question, +1 for that!
Just test the return code of the close(cmd, "to") - it will be zero if the pipe was open, -1 (or some other value) otherwise. e.g.:
if (close(cmd, "to") == 0) {
while ((cmd |& getline line)>0) {
print line
}
close(cmd)
}
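The idea behind testing the return code can be tried with an ordinary one-way pipe, which does not need gawk's |& operator. The sketch below calls close() once before anything has been opened under the name and once after a write, and reports whether each return value was zero. The function name probe_close is only illustrative, and the exact non-zero value for the unopened case is left unspecified here because it may differ between awk implementations.

```shell
# Sketch only: probe_close is a hypothetical helper name, not from the thread.
probe_close() {
  awk 'BEGIN {
    cmd = "cat"
    rc_unopened = close(cmd)     # nothing has been opened under this name yet
    print "hello" | cmd          # the first I/O operation starts the command
    rc_opened = close(cmd)       # closes the pipe and waits for cat to exit
    print (rc_unopened == 0 ? "zero" : "nonzero"), (rc_opened == 0 ? "zero" : "nonzero")
  }'
}
```

The last line of output summarizes the two return codes, so the same comparison against 0 can guard the read loop as in the answer above.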

sending "ping" output to a variable

I am trying to ping some ip addresses in my router. I use this code:
for {set n 0} {$n < 10} {incr n} {puts [exec ping "199.99.$n.1"]}
but this will show the output.
The issue is that I don't want to see the output. I would like to store that output in a variable, then search the content of the variable with "regexp", get the result, and carry on from there.
but I don't know how I can do that.
Use the set command. The puts command prints its argument.
set pingOutput [exec ping "199.99.$n.1"]
Or use append if you want the results for all IPs in one variable.
set allPingOutput ""
for {set n 0} {$n < 10} {incr n} {
append allPingOutput [exec ping "199.99.$n.1"]
}
Try calling the ping with the -c flag:
ping -c 1 10.0.1.1
Not sure how to do it in tcl but in php for example:
It is very important to use ping -c1 <IP address>, otherwise the script will never end because the ping process never exits on its own :)
My code uses an array of results of every IP
for {set i 2 } {$i < 10} {incr i} {
catch {if {[regexp {bytes from} [exec ping -c1 192.168.12.$i]]} {
set flag "reachable"
} else { set flag "not reachable"}
set result(192.168.12.$i) $flag
}
}
parray result
OUTPUT :
result(192.168.12.2) = reachable
result(192.168.12.3) = reachable
result(192.168.12.5) = reachable
result(192.168.12.6) = reachable
result(192.168.12.7) = reachable
result(192.168.12.9) = reachable
Instead of storing and manipulating the output, I used regexp on it directly.