Parsing and creating new arguments with getline AWK code - awk

I am writing a pretty long AWK program (NOT terminal script) to parse through a network trace file. I have a situation where the next line in the trace file is ALWAYS a certain type of 'receive' (3 possible types) - however, I only want AWK to handle/print on one type. In short, I want to tell AWK if the next line contains a certain receive type, do not include it. It is my understanding that getline is the best way to go about this.
I have tried a couple of different variations of getline and getline VAR from the manual, but I still cannot search through and reference fields in the next line the way I want. Updated from edit:
if ((event=="r") && (hopSource == hopDest)) {
getline x
if ((x $31 =="arp") || (x $35 =="AODV")) {
#printf("Badline %s %s \n", $31, $35)
}
else {
macLinkRec++;
#printf("MAC Link Recieved from HEAD - %d to MEMBER %d \n", messageSource, messageDest)
}
}
I am using the print(badline) as just a marker to see what is going on. I fully understand how to restructure the code once I get the search and reference correct. I am also able to print the correct 'next' lines. However, I would expect to be able to search through the next line and create new arguments based on what is contained in the next line. How do I search a 'next line' based on an argument in AWK? How do I reference fields in that line to create new arguments?
Final note, the 'next line' number of fields (NF) varies, but I feel that the $35 field reference should handle any problems there.
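One point that may explain the behaviour: getline x only stores the next line in x as a plain string. It does not re-split it, so $31 and $35 still refer to the current line, and x $31 is just string concatenation. A minimal sketch of one way around this, assuming the event flag is in $1 and using a hypothetical wrapper name (count_link_receives) whose two arguments are the field positions to check, is to split x into an array and test its elements:

```shell
# Hypothetical helper: counts "r" lines whose NEXT line is neither arp nor AODV.
# $1, $2: field positions to inspect in the next line (default 31 and 35).
count_link_receives() {
  awk -v arpPos="${1:-31}" -v aodvPos="${2:-35}" '
  $1 == "r" {
    # getline VAR fills VAR only; $0 and NF still describe the current line
    if ((getline nextLine) > 0) {
      # split the next line into f[1..n] so its fields can be referenced
      n = split(nextLine, f)
      # a missing element compares as "", so a short line is simply not a match
      if (f[arpPos] == "arp" || f[aodvPos] == "AODV") next
      macLinkRec++
    }
  }
  END { print macLinkRec + 0 }'
}
```

Usage would be something like count_link_receives < trace.tr. Keep in mind that the line consumed by getline is not processed by the other rules in the program.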

Related

Simplest way to find text by regex and replace by lookup table

A legacy web application needs to be internationalized. Error messages are currently written inside source code in this way:
addErrorMessage("some text here");
These signs can be easily found and extracted using regex. They should be replaced with something like this:
addErrorMessage(ResourceBundle.getBundle("/Bundle", lcale).getString("key for text here"));
The correspondence between key for text here and some text here will be in a .property file.
According to some Linux gurus this can be achieved using awk, but I don't know anything about it. I could write a small application to do the task, but that might be overkill. Is there an IDE plugin or an existing application for this?
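awk -v TextOrg='some text here' -v key='key for text here' '
BEGIN {
    # gsub treats its first argument as a regex, so the literal parentheses are escaped
    re = "addErrorMessage\\(\"" TextOrg "\"\\)"
}
$0 ~ re {
    gsub( re, "addErrorMessage(ResourceBundle.getBundle(\"/Bundle\", lcale).getString(\"" key "\"))")
}
# a bare non-zero pattern prints every line, modified or not
7
' YourFile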
This is one way to handle a single, specific pair. Be careful with:
the value assignments (-v ...), which are subject to shell interpretation in this case
gsub uses a regex to find the text, so your text has to be escaped accordingly (ex: "this f***ing text" -> "this f\*\*\*ing text" )
You will certainly want to do this for several pairs.
Here is a version that uses a file containing the pairs,
assuming that Trad.txt is a file containing a series of 2-line records: first the original text, then the key (this avoids using a separator character that would need complex escape handling)
ex: Trad.txt
some text
key text
other text
other key
Sample code (simple, no exhaustive error handling). Not tested; it only illustrates the concept with awk:
awk '
# for the first file only (Trad.txt)
FNR == NR {
    # odd lines hold the text to change: keep it in memory
    if ( NR % 2 ) TextOrg = $0
    else {
        # even lines hold the key: load it in an array (index is the text to change)
        Key[ TextOrg] = $0
        # length of the full call that will be replaced
        Len[ TextOrg] = length( "addErrorMessage(\"" TextOrg "\")")
    }
    # do not go further in the script for this line
    next
}
# this point and below is reached only by the second file
# if an addErrorMessage("...") call is found (parentheses escaped, this is a regex)
/addErrorMessage\(".*"\)/ {
    # try each pair for a change (a smarter loop checking only the needed replacement would perform better, but this one does the job)
    for ( TextOrg in Key) {
        # index() is a plain substring search, which avoids regex interpretation
        # this sample assumes 1 replacement per line (a loop is normally needed)
        # index( string, target) returns the position of target in string, or 0
        Here = index( $0, "addErrorMessage(\"" TextOrg "\")")
        if ( Here > 0) {
            # got a match: rebuild the line around the replaced substring
            $0 = substr( $0, 1, Here - 1) \
                "addErrorMessage(ResourceBundle.getBundle(\"/Bundle\", lcale).getString(\"" Key[ TextOrg] "\"))" \
                substr( $0, Here + Len[ TextOrg])
        }
    }
}
# print the line in its current state (modified or not)
7
' Trad.txt YourFile
Finally, this is only a workaround, because a lot of special cases can occur: a line like "ref: function addErrorMessage(\" ...\") bla bla" will be an issue, spaces inside the () are not handled here, nor is a call that is split across lines inside the (), ...

awk: concat string with number in dict value

I have the following awk one-liner:
{dict[$2"#"$6]=($(NF-2)/($(NF-2)+$NF))*100 } END {for (a in dict) { printf "%s %d :" , a, int(dict[a]) }}
What I need is to add, to the value of each dictionary key, the combination of
($(NF-2)/($(NF-2)+$NF))*100 " out of" $(NF-2)+$NF
So I want awk to do all the math, then compose a string and store it as the dictionary value. I have already tried several combinations of spaces and brackets, but still no luck.
Vars are filled from input stream :
$2 - host , not unique in input stream
$3 - partition , not unique in input stream
$NF - space avail
$(NF-2) - space used
$(NF-2)+$NF - gives you overall capacity of partition
Output is
80% host1#/local/1
Output expected:
80% host1#/local/1 out of 112G
----------------------Solution-----------------------------------
Thanks to the good catch below, I resolved this. The issue was that I applied int() in the printf part, which truncated the output. I later ran into other problems with my wrapper shell script, so my final code ended up different from what I had in mind when asking the question.
'{key=($2 "#" $6 " out of " int((($(NF-2)+$NF)/1000)/1000) "GB" ) ; dict[key]=($(NF-2)/($(NF-2)+$NF))*100 } END {for (a in dict) { printf "%s , %d :" , a, int(dict[a]) }}'
I've moved the "out of " and capacity part into the dictionary key, because the dict value cannot be a string in my case: further on I will compare it with an INT.
The concatenation is working fine. It's not the problem.
The problem is that you are calculating the int() of the dictionary value when you print. Since the value is a string, the result is truncated. If you need to use int() do it at the time you perform the calculations rather than at print time.
By the way, if you had provided some sample data it would have been a lot easier to test your code and provide an answer. This is especially important since it's sometimes the case, as it is here, that the problem is in a place that is not where it was anticipated.
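For illustration, here is a minimal sketch of doing the int() at calculation time and storing the composed string as the value. The wrapper name summarize_usage is hypothetical, and it assumes the layout from the question (host in $2, partition in $6, used space in $(NF-2), available space in $NF) with sizes already expressed in G:

```shell
# Hypothetical helper: reads the input stream on stdin and prints one summary per host#partition.
summarize_usage() {
  awk '{
    used  = $(NF-2)
    total = used + $NF
    key   = $2 "#" $6
    # do the math and the int() first, then store the finished string
    dict[key] = sprintf("%d%% %s out of %dG", int(used / total * 100), key, total)
  }
  END { for (k in dict) print dict[k] }'
}
```

Since the value is now a string, printing it with %s (or plain print) keeps the "out of" part intact. If the percentage has to be compared with a number later, keeping it in a second array would avoid parsing it back out.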

GAWK Script using special characters

I am having an issue using special characters. I am parsing a text file separated by tabs. I want to have the program add a "*" to the first word in the line if a certain parameter is true.
if ($Var < $3) $1 = \*$1
Now every time I run it I get the error that it is not the end of the line.
2 things, but without more context to test with we really can't help you much.
Inside awk, $Var means "the field whose number is held in the awk variable Var", so it only has meaning if you have set Var earlier in the script, e.g. Var=3. gawk will not expand it to the value of a shell variable named Var. The other side of that expression, < $3, WILL expand to the value of the 3rd field. If you're getting $Var from the shell environment, you need to let the gawk script 'see' that value, i.e.
awk '{ ..... if ('"$Var"' < $3) $1= "*" $1 .....}'
If you want the string literal '*' pre-pended, you're better off doing $1 = "*" $1
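Another common way to let the script see a shell value is awk's -v option, which avoids breaking out of the single quotes. A small sketch, assuming tab-separated input and using hypothetical names (mark_rows for the wrapper, limit for the awk variable):

```shell
# Hypothetical helper: prefixes "*" to the first field when the threshold is below field 3.
# $1: numeric threshold coming from the shell
mark_rows() {
  awk -F'\t' -v OFS='\t' -v limit="$1" '{
    # limit is an ordinary awk variable here, so no $ in front of it
    if (limit < $3) $1 = "*" $1
    print
  }'
}
```

It would be called as mark_rows "$Var" < yourfile. Setting OFS to a tab matters because assigning to $1 makes awk rebuild the line using OFS.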
Without sample inputs, sample expected output, actual output and error messages, we'll be playing 20 questions here. If these comments don't solve your problem, please edit your question above to include these items.
P.S. Welcome to StackOverflow and let me remind you of three things we usually do here: 1) As you receive help, try to give it too, answering questions in your area of expertise 2) Read the FAQs, http://tinyurl.com/2vycnvr , 3) When you see good Q&A, vote them up by using the gray triangles, http://i.imgur.com/kygEP.png , as the credibility of the system is based on the reputation that users gain by sharing their knowledge. Also remember to accept the answer that better solves your problem, if any, by pressing the checkmark sign , http://i.imgur.com/uqJeW.png

How to write this in idiomatic awk?

The following program prints out the name of the file, the number of rows, and the number of rows that begin with // in the case that more than one fifth of the rows begin that way.
awk '$1 == "//" { a+=1 } END { if (a * 5 >= NR) {print FILENAME " " NR " " a}}' MyClass.java
This works, but the nested {{}} make me question if I'm doing it right, knowing that the typical structure of an awk program is:
awk 'condition { actions }'
So I suspect that something like
awk '$1 == "//" { a+=1 } END && (a * 5 >= NR) {print FILENAME " " NR " " a}' MyClass.java
would be more appropriate, but every such attempt gives syntax errors. Is there a right way to do this, or is my approach as good as it gets?
There are other ways to express it, but you wrote it idiomatically the first time. Although the authors tend to omit braces whenever they can, you can still find examples of code like that throughout The AWK Programming Language. They should know.
It seems like Aho, Weinberger, and Kernighan have several centuries of development experience in languages whose syntax derives from C. And when they write something like this
if (a * 5 >= NR)
print FILENAME " " NR " " a
it communicates perfectly that the block following the if statement is supposed to contain one and only one statement.
I have considerably fewer centuries of experience. Whenever I read something like that, it communicates perfectly that a) somebody forgot to type {}, and b) somebody else is about to introduce a bug by adding a statement to that block without adding the braces.
Over the years, I've trained myself to type this whenever I type an if.
if () {}
Then I go back and fill it in, breaking lines if I need to. In my normal editor, "if" expands automatically to "if () {}". I'm pretty sure I haven't omitted braces even once since the mid-1980s.

Reorganizing named fields with AWK

I have to deal with various input files with a number of fields, arbitrarily arranged, but all consistently named and labelled with a header line. These files need to be reformatted such that all the desired fields are in a particular order, with irrelevant fields stripped and missing fields accounted for. I was hoping to use AWK to handle this, since it has done me so well when dealing with field-related dilemmata in the past.
After a bit of mucking around, I ended up with something much like the following (writing from memory, untested):
# imagine a perfectly-functional BEGIN {} block here
NR==1 {
fldname[1] = "first_name"
fldname[2] = "last_name"
fldname[3] = "middle_name"
maxflds = 3
# this is just a sample -- my real script went through forty-odd fields
for (i=1;i<=NF;i++) for (j=1;j<=maxflds;j++) if ($i == fldname[j]) fldpos[j]=i
}
NR!=1 {
for (j=1;j<=maxflds;j++) {
if (fldpos[j]) printf "%s",$fldpos[j]
printf "%s","\t"
}
print ""
}
Now this solution works fine. I run it, I get my output exactly how I want it. No complaints there. However, for anything longer than three fields or so (such as the forty-odd fields I had to work with), it's a lot of painfully redundant code which always has and always will bother me. And the thought of having to insert a field somewhere else into that mess makes me shudder.
I die a little inside each time I look at it.
I'm sure there must be a more elegant solution out there. Or, if not, perhaps there is a tool better suited for this sort of task. AWK is awesome in its own domain, but I fear I may be stretching its limits some with this.
Any insight?
The only suggestion that I can think of is to move the initial array setup into the BEGIN block and read the ordered field names from a separate template file in a loop. Then your awk program consists only of loops with no embedded data. Your external template file would be a simple newline-separated list.
BEGIN {while ((getline < "fieldfile") > 0) fldname[++maxflds] = $0}
You would still read the header line in the same way you are now, of course. However, it occurs to me that you could use an associative array and reduce the nested for loops to a single for loop. Something like (untested):
BEGIN {while ((getline < "fieldfile") > 0) fldname[$0] = ++maxflds}
NR==1 {
for (i=1;i<=NF;i++) fldpos[i] = fldname[$i]
}
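Putting the pieces together, a variation on the same idea could look like the sketch below. It is only lightly thought through and rests on some assumptions: the input is tab-separated, the template file holds one wanted field name per line, and the names reorder_fields, want and col are made up for the example. Instead of mapping input columns to output positions, it records the input column of every header name and then walks the template in order:

```shell
# Hypothetical helper.
# $1: template file listing the wanted field names in output order, one per line
# $2: tab-separated data file whose first line is the header
reorder_fields() {
  awk -F'\t' -v OFS='\t' -v fieldfile="$1" '
    # load the ordered list of wanted names
    BEGIN { while ((getline name < fieldfile) > 0) want[++maxflds] = name }
    # header line: remember which column each name lives in
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    {
      line = ""
      for (j = 1; j <= maxflds; j++) {
        # a wanted field that is absent from this file becomes an empty column
        val = (want[j] in col) ? $(col[want[j]]) : ""
        line = line (j > 1 ? OFS : "") val
      }
      print line
    }
  ' "$2"
}
```

With this layout, adding or moving a field only means editing the template file, and the awk program itself contains no field names.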