How to write this in idiomatic awk?

The following program prints the name of the file, the number of rows, and the number of rows that begin with // (more precisely, whose first field is //), but only when at least one fifth of the rows begin that way.
awk '$1 == "//" { a+=1 } END { if (a * 5 >= NR) {print FILENAME " " NR " " a}}' MyClass.java
This works, but the nested {{}} makes me wonder whether I'm doing it right, given that the typical structure of an awk program is:
awk 'condition { actions }'
So I suspect that something like
awk '$1 == "//" { a+=1 } END && (a * 5 >= NR) {print FILENAME " " NR " " a}' MyClass.java
would be more appropriate, but every such attempt gives syntax errors. Is there a right way to do this, or is my approach as good as it gets?

There are other ways to express it, but you wrote it idiomatically the first time. Although the authors of The AWK Programming Language tend to omit braces whenever they can, you can still find code like yours throughout the book. They should know.
It seems like Aho, Weinberger, and Kernighan have several centuries of development experience in languages whose syntax derives from C. And when they write something like this
if (a * 5 >= NR)
print FILENAME " " NR " " a
it communicates perfectly that the block following the if statement is supposed to contain one and only one statement.
I have considerably fewer centuries of experience. Whenever I read something like that, it communicates perfectly that a) somebody forgot to type {}, and b) somebody else is about to introduce a bug by adding a statement to that block without adding the braces.
Over the years, I've trained myself to type this whenever I type an if.
if () {}
Then I go back and fill it in, breaking lines if I need to. In my normal editor, "if" expands automatically to "if () {}". I'm pretty sure I haven't omitted braces even once since the mid-1980s.
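For comparison, here is a sketch of the same program in the always-use-braces style, wrapped in a shell function (the name comment_ratio is hypothetical). END cannot be combined with another pattern, which is why the test has to live inside the END action. The sketch uses print with commas, which separates the values with OFS (a single space by default), instead of concatenating " " by hand; a + 0 makes the count print as 0 rather than as an empty string when nothing matched.

```shell
#!/bin/sh
# comment_ratio (hypothetical name): prints "FILE ROWS COMMENT_ROWS" when at
# least one fifth of the rows of FILE have "//" as their first field.
comment_ratio() {
    awk '$1 == "//" { a++ }
         END {
             if (a * 5 >= NR) {
                 print FILENAME, NR, a + 0   # commas insert OFS (a space by default)
             }
         }' "$1"
}
```

Usage: comment_ratio MyClass.java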

Related

Parsing and creating new arguments with getline AWK code

I am writing a pretty long AWK program (not a terminal one-liner) to parse a network trace file. In my situation, the next line in the trace file is ALWAYS a certain type of 'receive' (there are 3 possible types); however, I only want AWK to handle/print one type. In short, I want to tell AWK: if the next line contains a certain receive type, do not include it. It is my understanding that getline is the best way to go about this.
I have tried a couple of different variations of getline and getline VAR following the manual, but I still cannot seem to search through and reference fields in the next line the way I want. Updated code:
if ((event=="r") && (hopSource == hopDest)) {
    getline x
    if ((x $31 =="arp") || (x $35 =="AODV")) {
        #printf("Badline %s %s \n", $31, $35)
    }
    else {
        macLinkRec++;
        #printf("MAC Link Received from HEAD - %d to MEMBER %d \n", messageSource, messageDest)
    }
}
I am using the print(badline) as just a marker to see what is going on. I fully understand how to restructure the code once I get the search and reference correct. I am also able to print the correct 'next' lines. However, I would expect to be able to search through the next line and create new arguments based on what is contained in the next line. How do I search a 'next line' based on an argument in AWK? How do I reference fields in that line to create new arguments?
Final note: the number of fields (NF) in the 'next line' varies, but I think the $35 field reference should handle any problems there.
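One approach, sketched under assumptions: in the code above, x $31 concatenates x with field 31 of the current record rather than taking field 31 of x, because getline x stores the next line in x without splitting it. Calling split() on x gives you that line's fields. (Plain getline with no variable would instead replace $0 and NF, so $31 and $35 would refer to the next line, but the current line would be lost.) The function name count_mac_recv and the choice of field 1 as the event type are hypothetical; fields 31 and 35 follow the question.

```shell
#!/bin/sh
# count_mac_recv (hypothetical): counts "r" events whose NEXT line is neither
# an arp receive (field 31) nor an AODV receive (field 35).
count_mac_recv() {
    awk '
    $1 == "r" {
        # getline var reads the next record into nxt and bumps NR/FNR,
        # but leaves $0, the fields and NF of the current record untouched.
        if ((getline nxt) > 0) {
            split(nxt, f)              # f[1], f[2], ... are the fields of the next line
            if (f[31] != "arp" && f[35] != "AODV")
                macLinkRec++
        }
    }
    END { print macLinkRec + 0 }
    ' "$1"
}
```

Because split() clears the array before filling it, a next line with fewer than 35 fields simply leaves f[35] empty, so a varying NF is not a problem.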

awk: concat string with number in dict value

I have the following awk one-liner:
{dict[$2"#"$6]=($(NF-2)/($(NF-2)+$NF))*100 } END {for (a in dict) { printf "%s %d :" , a, int(dict[a]) }}
What I need is to add to each dictionary value the combination of
($(NF-2)/($(NF-2)+$NF))*100 " out of" $(NF-2)+$NF
So I want awk to do all the math, then compose a string and store it as the dictionary value. I have already tried various combinations of spaces and parentheses, but still no luck.
Variables are filled from the input stream:
$2 - host, not unique in input stream
$3 - partition, not unique in input stream
$NF - space available
$(NF-2) - space used
$(NF-2)+$NF - overall capacity of the partition
Output is
80% host1#/local/1
Output expected:
80% host1#/local/1 out of 112G
----------------------Solution-----------------------------------
Thanks to the good catch below, I resolved this. The issue was that I called int() in the printf part, which truncated the output. However, I later ran into other problems with my wrapper shell script, so my final code ended up different from what I had in mind when I asked the question.
'{key=($2 "#" $6 " out of " int((($(NF-2)+$NF)/1000)/1000) "GB" ) ; dict[key]=($(NF-2)/($(NF-2)+$NF))*100 } END {for (a in dict) { printf "%s , %d :" , a, int(dict[a]) }}'
I've moved the "out of" and capacity part into the dictionary key, because in my case the dictionary value cannot be a string: later I will compare it with an integer.
The concatenation is working fine. It's not the problem.
The problem is that you are applying int() to the dictionary value when you print it. Since that value is a string that starts with a number, int() keeps only the leading number and drops the rest. If you need int(), apply it when you perform the calculations rather than at print time.
By the way, if you had provided some sample data it would have been a lot easier to test your code and provide an answer. This is especially important since it's sometimes the case, as it is here, that the problem is in a place that is not where it was anticipated.
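Following that advice, a minimal sketch that applies int() while computing and only then builds the string value. The function name usage_report is hypothetical; the field positions follow the question's code ($2 and $6 for the key, $(NF-2) as space used, $NF as space available). Multiplying before dividing keeps the percentage exact for simple inputs.

```shell
#!/bin/sh
# usage_report (hypothetical): stores "PERCENT% out of CAPACITY" as the value.
usage_report() {
    awk '{
        cap = $(NF-2) + $NF                       # overall capacity of the partition
        pct = int($(NF-2) * 100 / cap)            # truncate while computing, not when printing
        dict[$2 "#" $6] = pct "% out of " cap     # concatenation: just juxtapose the pieces
    }
    END { for (k in dict) printf "%s %s\n", k, dict[k] }'
}
```

Usage: some_command | usage_report, where some_command stands for whatever produces the question's input stream.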

GAWK Script using special characters

I am having an issue using special characters. I am parsing a text file separated by tabs. I want to have the program add a "*" to the first word in the line if a certain parameter is true.
if ($Var < $3) $1 = \*$1
Now, every time I run it, I get an error saying that it is not the end of the line.
Two things, though without more context to test with, we can't help you much.
In awk, $Var does not mean "the variable Var"; it means the field whose number is the value of Var. If Var is unset, $Var is $0 (the whole line); if you set it above with something like Var=3, $Var is the third field. The other side of the expression, $3, does expand to the value of the 3rd field. If you're getting $Var from the shell environment, you need to let the gawk script 'see' that value, i.e.
awk '{ ..... if ('"$Var"' < $3) $1 = "*" $1 ..... }'
If you want the string literal '*' prepended, you're better off doing $1 = "*" $1
Without sample inputs, sample expected output, actual output and error messages, we'll be playing 20 questions here. If these comments don't solve your problem, please edit your question above to include these items.
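A sketch combining both suggestions, with one swapped-in technique: instead of splicing the shell variable into the program with quotes, it passes the value with awk's -v option, which avoids the quoting gymnastics. The function name star_if_below and the awk variable name threshold are hypothetical; the input is assumed to be tab-separated, as in the question. Adding + 0 to both sides forces a numeric comparison.

```shell
#!/bin/sh
# star_if_below (hypothetical): prepends "*" to field 1 when the shell-supplied
# threshold is less than field 3. Input and output are tab-separated.
star_if_below() {
    awk -F '\t' -v OFS='\t' -v threshold="$1" '
        threshold + 0 < $3 + 0 { $1 = "*" $1 }   # string literal, then concatenation
        { print }
    '
}
```

Usage: star_if_below "$Var" < input.txt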

Reorganizing named fields with AWK

I have to deal with various input files with a number of fields, arbitrarily arranged, but all consistently named and labelled with a header line. These files need to be reformatted such that all the desired fields are in a particular order, with irrelevant fields stripped and missing fields accounted for. I was hoping to use AWK to handle this, since it has served me so well when dealing with field-related dilemmata in the past.
After a bit of mucking around, I ended up with something much like the following (writing from memory, untested):
# imagine a perfectly-functional BEGIN {} block here
NR==1 {
    fldname[1] = "first_name"
    fldname[2] = "last_name"
    fldname[3] = "middle_name"
    maxflds = 3
    # this is just a sample -- my real script went through forty-odd fields
    for (i=1;i<=NF;i++) for (j=1;j<=maxflds;j++) if ($i == fldname[j]) fldpos[j]=i
}
NR!=1 {
    for (j=1;j<=maxflds;j++) {
        if (fldpos[j]) printf "%s",$fldpos[j]
        printf "%s","\t"
    }
    print ""
}
Now this solution works fine. I run it, I get my output exactly how I want it. No complaints there. However, for anything longer than three fields or so (such as the forty-odd fields I had to work with), it's a lot of painfully redundant code, which has always bothered me and always will. And the thought of having to insert a field somewhere else into that mess makes me shudder.
I die a little inside each time I look at it.
I'm sure there must be a more elegant solution out there. Or, if not, perhaps there is a tool better suited for this sort of task. AWK is awesome in its own domain, but I fear I may be stretching its limits with this.
Any insight?
The only suggestion that I can think of is to move the initial array setup into the BEGIN block and read the ordered field names from a separate template file in a loop. Then your awk program consists only of loops with no embedded data. Your external template file would be a simple newline-separated list.
BEGIN {while ((getline < "fieldfile") > 0) fldname[++maxflds] = $0}
You would still read the header line in the same way you are now, of course. However, it occurs to me that you could use an associative array and reduce the nested for loops to a single for loop. Something like (untested):
BEGIN {while ((getline < "fieldfile") > 0) fldname[$0] = ++maxflds}
NR==1 {
for (i=1;i<=NF;i++) fldpos[i] = fldname[$i]
}
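Putting the pieces together, a sketch of the whole template-driven reorder. It is untested against real data; reorder_fields is a hypothetical name, and the template path is passed as an argument rather than hard-coded as "fieldfile". The header maps each input column to its output slot, and each data line is then distributed into those slots, so missing fields come out empty and unwanted ones are dropped. Like the question's version, it does not print the header line.

```shell
#!/bin/sh
# reorder_fields (hypothetical): $1 is a template file listing the wanted column
# names, one per line, in output order. Input is tab-separated with a header.
reorder_fields() {
    awk -F '\t' -v OFS='\t' -v tmpl="$1" '
    BEGIN {
        # output position of each wanted name
        while ((getline name < tmpl) > 0) wanted[name] = ++maxflds
        close(tmpl)
    }
    NR == 1 {
        # map input column number to output position (empty if not wanted)
        for (i = 1; i <= NF; i++) fldpos[i] = wanted[$i]
        next
    }
    {
        for (j = 1; j <= maxflds; j++) out[j] = ""      # missing fields stay empty
        for (i = 1; i <= NF; i++) if (fldpos[i]) out[fldpos[i]] = $i
        line = out[1]
        for (j = 2; j <= maxflds; j++) line = line OFS out[j]
        print line
    }
    '
}
```

Adding or moving a field then only means editing the template file.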

Awk scripting help - Logic Issue

I'm currently writing a simple .sh script to parse an Exim log file for strings matching " o' ". Currently, output.txt contains nothing but a 0 on every line (606 lines). I'm guessing my logic is wrong, as awk does not throw any errors.
Here is my code (updated for concatenation and counter issues). Edit: In favor of simplicity, I'm now working with some new code adopted from dmckee's answer instead of the old code.
awk '/o'\''/ {
line = "> ";
for(i = 20; i <= 33; i++) {
line = line " " $i;
}
print line;
}' /var/log/exim/main.log > output.txt
Any ideas?
EDIT: For clarity's sake, I'm grepping for "o'" in email addresses, because ' is an illegal character in email addresses (and in our databases, it appears only in o'-prefixed names).
EDIT 2: As requested in the comments, here is a sanitized sample of some desired output:
[xxx.xxx.xxx.xxx] kathleen.o'toole#domain.com <kathleen.o'toole#domain.com> routing defer (-51): retry time not reached
[xxx.xxx.xxx.xxx] julie.o'brien#domain.com <julie.o'brien#domain.com> routing defer (-51): retry time not reached
[xxx.xxx.xxx.xxx] james.o'dell#domain.com <james.o'dell#domain.com> routing defer (-51): retry time not reached
[xxx.xxx.xxx.xxx] daniel_o'leary#domain.com <aniel_o'leary#domain.com> routing defer (-51): retry time not reached
I start the loop at 20 because everything before the 20th field is just standard log information that I don't need here. All I need is everything from the IP onward (the messages for each 550 error differ from one mail server to the next, and I'm compiling a list of common ones).
+ means numerical addition in awk, and non-numeric strings convert to 0 in arithmetic, which is consistent with the 0s you are seeing. If you want to concatenate, just place the constants and/or expressions next to each other, separated by spaces.
So, this
line += " " + $i
should become
line = line " " $i
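A small demonstration of the difference on a made-up input line (concat_vs_add is a hypothetical name): += with + turns every operand into a number, and since "> ", " " and "beta" are not numeric they all convert to 0.

```shell
#!/bin/sh
# concat_vs_add (hypothetical): shows numeric addition vs. string concatenation.
concat_vs_add() {
    echo 'alpha beta' | awk '{
        sum = "> "
        sum += " " + $2      # all operands convert to 0, so sum becomes 0
        cat = "> "
        cat = cat " " $2     # juxtaposition concatenates
        print sum
        print cat
    }'
}
```

Calling concat_vs_add prints 0 on the first line and ">  beta" on the second.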
EDIT: If Exim log files (I am more into Postfix :) are separated by a single space, isn't the following simpler?
grep -F o\' /var/log/exim/main.log | cut -d' ' -f20-33 >output.txt
There is no real need for the grep here. Let awk select the matching lines for you (and fixing your concatenation bug as per ΤΖΩΤΖΙΟΥ):
awk '/o'\''/ {
line = "> ";
for(i = 20; i <= 33; i++) {
line = line " " $i;
}
print line;
}' /var/log/exim/main.log > output.txt
Of course, you end up needing some weird escaping if you do it at the prompt like above. It is cleaner in a script...
Edit: On the first pass I missed the += problem...
I am also assuming that the line you gave above is partial, as it has only 13 or so fields (by default, fields are whitespace-delimited).
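If some matching lines turn out shorter than expected, the loop can also stop at NF. This variant is a sketch: print_fields is a hypothetical name, the first and last field numbers are passed as arguments, and line starts as ">" so that each field is preceded by exactly one space.

```shell
#!/bin/sh
# print_fields (hypothetical): for each line containing o followed by an
# apostrophe, prints fields FROM..TO (awk numbering), stopping early at NF.
print_fields() {
    awk -v from="$1" -v to="$2" '/o'\''/ {
        line = ">"
        for (i = from; i <= to && i <= NF; i++)
            line = line " " $i      # concatenate, do not add
        print line
    }'
}
```

Usage: print_fields 20 33 < /var/log/exim/main.log > output.txt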
"'" is not illegal in local parts. From RFC2821, section 4.1.2:
Local-part = Dot-string / Quoted-string
Dot-string = Atom *("." Atom)
Atom = 1*atext
RFC 2821 defers to RFC 2822 for elements it does not define itself, so:
atext = ALPHA / DIGIT / ; Any character except controls,
        "!" / "#" /     ;  SP, and specials.
        "$" / "%" /     ;  Used for atoms
        "&" / "'" /
        "*" / "+" /
        "-" / "/" /
        "=" / "?" /
        "^" / "_" /
        "`" / "{" /
        "|" / "}" /
        "~"
In other words, "'" is a perfectly legal unquoted character in an email local part. Now, it may not be legal at your site, but that's not what you said.
Sorry for not staying directly on topic, but I wanted to correct your assertion.
Off task, and simpler still: Python.
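import fileinput

# Print fields 20-33 (awk numbering) of every line containing an apostrophe.
for line in fileinput.input():
    if "'" in line:
        fields = line.split()  # split on runs of whitespace, like awk
        print("> ", " ".join(fields[19:33]))  # awk's $20..$33 are indices 19..32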