How to create an empty array - lint

UPDATE
The original description below has many errors: gawk lint does not complain about uninitialized arrays used as the right-hand side of in. For example, the following gives no errors or warnings. I am not deleting the question, because the answer I am about to accept gives a good suggestion: using split() with an empty string to create an empty array.
BEGIN{
LINT = "fatal";
# print x  # LINT gives an error if this is uncommented
thread = 0;
if (thread in threads_start) {
print "if";
} else {
print "not if";
}
}
Original Question
A lot of my awk scripts have a construct as follows:
if (thread in threads_start) { # LINT warning here
printf("%s started at %d\n", threads[thread_start]));
} else {
printf("%s started at unknown\n");
}
With gawk --lint, this results in
warning: reference to uninitialized variable `thread_start'
So I initialize in the BEGIN block as follows. But this looks kludge-y. Is there a more elegant way to create a zero-element array?
BEGIN { LINT = 1; thread_start[0] = 0; delete thread_start[0]; }

I think you might have made a few typos in your code.
if (thread in threads_start) { # LINT warning here (you think)
Here you look for the index thread in array threads_start.
printf("%s started at %d\n", threads[thread_start])); # Actual LINT warning
But here you print the index thread_start in array threads! Also notice the different s's: thread/threads and threads_start/thread_start. Gawk is actually warning you correctly about the usage of thread_start (without s) on the second line.
There is also an error in your printf calls: the format strings contain more % conversions than there are arguments, and the first call has an extra closing parenthesis.
When you change these, the lint warning disappears:
if (thread in threads_start) {
printf("%s started at %d\n", thread, threads_start[thread]);
} else {
printf("%s started at unknown\n", thread);
}
But perhaps I've misunderstood what your code is supposed to do. In that case, could you post a minimal self-contained code sample that produces the spurious lint warning?

Summary
The idiomatic method of creating an empty array in Awk is to use split().
Details
To simplify your example above to focus on your question rather than your typos, the fatal error can be triggered with:
BEGIN{
LINT = "fatal";
if (thread in threads_start) {
print "if";
} else {
print "not if";
}
}
which produces the following error:
gawk: cmd. line:3: fatal: reference to uninitialized variable `thread'
Giving thread a value before using it to search in threads_start passes linting:
BEGIN{
LINT = "fatal";
thread = 0;
if (thread in threads_start) {
print "if";
} else {
print "not if";
}
}
produces:
not if
To create a linting error with an uninitialised array, we need to attempt to access a non-existent entry:
BEGIN{
LINT = "fatal";
thread = 0;
if (threads_start[thread]) {
print "if";
} else {
print "not if";
}
}
produces:
gawk: cmd. line:4: fatal: reference to uninitialized element `threads_start["0"]'
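Besides the lint difference, the two tests behave differently: an in test never creates an element, whereas reading threads_start[thread] creates that element (with an empty value) as a side effect. A minimal sketch of this, assuming any POSIX awk on PATH; the array t and the key "x" are illustrative names only:

```shell
#!/bin/sh
# Count the elements of an array before and after two kinds of lookup.
# Assumes a POSIX awk on PATH; t and "x" are illustrative names only.
lookup_side_effect() {
  awk 'BEGIN {
    split("", t)                        # start from an empty array
    if ("x" in t) print "unexpected"    # "in" only tests membership
    n = 0; for (k in t) n++; print n    # still 0 elements
    v = t["x"]                          # a plain read creates t["x"]
    n = 0; for (k in t) n++; print n    # now 1 element
  }'
}
lookup_side_effect
```

This is why if (thread in threads_start) is the safer membership test: it leaves the array unchanged.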
So you don't really need to create an empty array in Awk, but if you want to (and to answer your question), use split():
BEGIN{
LINT = "fatal";
thread = 0;
split("", threads_start);
if (thread in threads_start) {
print "if";
} else {
print "not if";
}
}
produces:
not if
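split() deletes all existing elements of the target array before filling it, so splitting an empty string both creates an empty array and empties an existing one. A minimal sketch, assuming a POSIX awk on PATH; the helper function size() is hypothetical, written out because length() on arrays is not available in every awk:

```shell
#!/bin/sh
# Sketch: split("", arr) creates an empty array and also clears a filled one.
# Assumes a POSIX awk on PATH; size() is a hypothetical helper.
count_elements() {
  awk 'function size(arr,    k, n) { n = 0; for (k in arr) n++; return n }
  BEGIN {
    split("", threads_start)        # create: zero elements
    print size(threads_start)
    threads_start["t1"] = 100       # populate with two entries
    threads_start["t2"] = 200
    print size(threads_start)
    split("", threads_start)        # clear again without a whole-array delete
    print size(threads_start)
  }'
}
count_elements
```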

Related

Else syntax error when nesting array formula

I am receiving a syntax error on "else" for this awk script:
{for (i=8;i<=NF;i+=3)
{if ($0~"=>") # if-else statement designed to flag file / directory transfers
print "=> flag,"$1"," $2","$3","$4 ","$5","$6","$7"," $(i)","$(i+1)","$(i+2);
{split ($(i+2), array, "/");
for (x in array)
{j++;
a[j] =j;
printf (array[x] ",");}
printf ("%s\n", "");}
else
print "no => flag,"$1"," $2","$3","$4 ","$5","$6","$7"," $(i)","$(i+1)","$(i+2)
}
}
Can't figure out why. If I delete the array block (starting with split()), all is well. But I need to scan the contents of $(i+2), so cutting it does me no good.
Also, if anyone has guidance on a good list of how to interpret error messages, that would be great.
Thanks for your advice.
EDIT: here is the above script laid out with sensible formatting:
{
for (i=8;i<=NF;i+=3) {
if ($0~"=>") # if-else statement designed to flag file / directory transfers
print "=> flag,"$1"," $2","$3","$4 ","$5","$6","$7"," $(i)","$(i+1)","$(i+2);
{
split ($(i+2), array, "/");
for (x in array) {
j++;
a[j] =j;
printf (array[x] ",");
}
printf ("%s\n", "");
}
else
print "no => flag,"$1"," $2","$3","$4 ","$5","$6","$7"," $(i)","$(i+1)","$(i+2)
}
}
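The error can be reproduced in miniature. A minimal sketch, assuming any POSIX awk on PATH; both one-line programs and the input line "a b" are illustrative, not taken from the question:

```shell
#!/bin/sh
# Sketch reproducing the "else" syntax error from the question in miniature.
# Assumes a POSIX awk on PATH; the programs and input line are illustrative.

# Unbraced: the if-body is only the first print; "{ print "block" }" is a
# separate statement, so the "else" after it has no "if" to attach to.
broken='{ if (NF > 1) print "flag"; { print "block" } else print "no flag" }'

# Braced: both statements belong to the if, so the "else" pairs with it.
fixed='{ if (NF > 1) { print "flag"; print "block" } else print "no flag" }'

if printf '%s\n' 'a b' | awk "$broken" >/dev/null 2>&1; then
  echo "broken: parsed"
else
  echo "broken: syntax error"
fi
printf '%s\n' 'a b' | awk "$fixed"
```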
First things first: since you didn't post any sample input or expected output, I haven't tested this at all. Could you please try the following? I assume you are running it as an .awk script. Also, these are mostly syntax/cosmetic changes, NOT changes to the logic, since no background on the problem was given.
BEGIN{
OFS=","
}
{
for (i=8;i<=NF;i+=3){
if ($0~/=>/){
print "=> flag,"$1,$2,$3,$4,$5,$6,$7,$(i),$(i+1),$(i+2)
split ($(i+2), array, "/");
for(x in array){
j++;
a[j] =j;
printf("%s,", array[x])
}
printf ("%s\n", "")
}
else{
print "no => flag",$1,$2,$3,$4,$5,$6,$7,$(i),$(i+1),$(i+2)
}
}
}
Problems fixed in OP's attempt:
The opening curly brace { of a for loop or an if condition with multiple statements can go at the end of the line that starts it rather than on the next line; I did this for the for loop and the if condition, for better readability.
The print and the split() block are now both inside the braces of the if. In your version the body of the if was only the print statement; the { ... } block after it was a separate statement, so the else had no if to attach to, which is what caused the syntax error.
Since you are matching a regular expression, I changed $0~"=>" to $0~/=>/.
Added a BEGIN section that sets OFS (the output field separator) to , so you do NOT need to print "," between variables; separating them with , in the print statement does the trick.
Fixed the indentation, so it is clear where each loop/condition closes.
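Two further details matter when printing the split() pieces: for (x in array) visits elements in an unspecified order, so a numeric loop up to split()'s return value keeps the path components in their original order; and passing data as printf's format string misbehaves if the data contains a %. A minimal sketch, assuming a POSIX awk; join_parts and the sample path are hypothetical:

```shell
#!/bin/sh
# Sketch: print the "/"-separated parts of a path in order, comma-terminated.
# Assumes a POSIX awk; join_parts and the sample path are hypothetical.
join_parts() {
  printf '%s\n' "$1" | awk '{
    n = split($0, parts, "/")       # n = number of pieces
    for (k = 1; k <= n; k++)        # numeric loop: keeps the original order
      printf "%s,", parts[k]        # data is an argument, never the format
    print ""                        # end the line
  }'
}
join_parts 'dir/50%/file.txt'
```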

In AWK, skip the rest of the current action?

Thanks for looking.
I have an AWK script with something like this:
/^test/{
if ($2 == "2") {
# What goes here?
}
# Do some more stuff with lines that match test, but $2 != "2".
}
NR>1 {
print $0
}
I'd like to skip the rest of the action, but process the rest of the patterns/actions on the same line.
I've tried return but this isn't a function.
I've tried next but that skips the rest of the patterns/actions for the current line.
For now I've wrapped the rest of the ^test action in the if statement's else, but I was wondering if there was a better approach.
Not sure this matters but I am using gawk on OSX, installed via brew (for better compatibility with my target OS).
Update (w/solution):
Edits: Expanded code sample based on #karakfa's answer.
BEGIN{
keepLastLine = 1;
}
/^test/ && !keepLastLine{
printLine = 1;
print $0;
next;
}
/^test/ && keepLastLine{
printLine = 0;
next;
}
/^foo/{
# This is where I have the rest of my logic (approx 100 lines),
# including updates to printLine and keepLastLine
}
NR>1 {
if (printLine) {
print $0
}
}
This will work for me; I even like it better than what I had in mind.
However, I do wonder: what if my keepLastLine condition were only accessible inside a for loop?
I gather from what #karakfa has said, there isn't a control structure for exiting only an action, and continuing with other patterns, so that would have to be implemented with a flag of some sort (not unlike #RavinderSingh13's answer).
If I understood correctly, could you please try the following? I create a flag variable named found, which is set by the if condition inside the test block when the 2nd field is 2. When it is set, the rest of the statements in the test block are NOT executed. The flag is also reset before each line is processed.
awk '
{
found=""
}
/^test/{
if ($2 == "2") {
# What goes here?
found=1
}
if(!found){
# Do some more stuff with lines that match test, but $2 != "2".
}
}
NR>1 {
print $0
}' Input_file
Testing of code here:
Let's say following is the Input_file:
cat Input_file
file
test 2 file
test
abcd
Running the code below gives the following output: the rest of the test block only runs for lines that start with test and do NOT have $2 == 2, and the NR>1 block still prints every line after the first.
awk '
{
found=""
}
/^test/{
if ($2 == "2") {
print "# What goes here?"
found=1
}
if(!found){
print "Do some more stuff with lines that match test, but $2 != 2"
}
}
NR>1 {
print $0
}' Input_file
# What goes here?
test 2 file
Do some more stuff with lines that match test, but $2 != 2
test
abcd
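The same flag technique covers the case raised in the question, where the condition is only known inside a for loop: set the flag in the loop, break out, and guard the rest of the action with it, while later patterns still see the line. A minimal sketch, assuming a POSIX awk; the STOP condition and the sample lines are hypothetical:

```shell
#!/bin/sh
# Sketch: a flag set inside a for loop guards the rest of one action, while
# later pattern/action pairs still run for the same line.
# Assumes a POSIX awk; the "STOP" condition and sample lines are hypothetical.
loop_flag() {
  printf '%s\n' 'test a b' 'test a STOP' 'other' | awk '
    /^test/ {
      skip = 0                            # reset the flag for every matching line
      for (k = 2; k <= NF; k++)
        if ($k == "STOP") { skip = 1; break }
      if (!skip)
        print "rest of action:", $0       # stands in for the rest of the ^test action
    }
    { print "next pattern sees:", $0 }    # later rules still run for every line
  '
}
loop_flag
```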
The magic keyword you're looking for is else:
/^test/{ if($2==2) { } # do something
else { } # do something else
}
NR>1 # {print $0} is implied.
If for some reason you don't want to use else, just move the condition up one level (flatten the hierarchy):
/^test/ && $2==2 { } # do something
/^test/ && $2!=2 { } # do something else
# other action{statement}s
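Running the flattened form on the same sample input as the other answer shows that every line after the first still reaches the NR>1 rule. A minimal sketch, assuming a POSIX awk; the print statements are placeholders for "do something" and "do something else":

```shell
#!/bin/sh
# Sketch of the flattened patterns on the sample Input_file lines.
# Assumes a POSIX awk; the "skip:"/"rest:" prints are placeholders.
flat() {
  printf '%s\n' file 'test 2 file' test abcd | awk '
    /^test/ && $2 == 2 { print "skip:", $0 }   # the "if" branch
    /^test/ && $2 != 2 { print "rest:", $0 }   # the "else" branch
    NR > 1                                     # default action prints $0
  '
}
flat
```

On the line "test", $2 is empty and compares numerically as 0, so it takes the $2 != 2 branch.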

Merging rows in a file | Performance Improvement

I have a file in which I have to merge 2 rows on the basis of:
- Common sessionID
- Immediate next matching pattern (GX with QG)
file1:
session=001,field01,name=GX1_TRANSACTION,field03,field04
session=001,field91,name=QG
session=001,field01,name=GX2_TRANSACTION,field03,field04
session=001,field92,name=QG
session=004,field01,name=GX1_TRANSACTION,field03,field04
session=002,field01,name=GX1_TRANSACTION,field03,field04
session=002,field01,name=GX2_TRANSACTION,field03,field04
session=002,field92,name=QG
session=003,field91,name=QG
session=003,field01,name=GX2_TRANSACTION,field03,field04
session=003,field92,name=QG
session=004,field91,name=QG
session=004,field01,name=GX2_TRANSACTION,field03,field04
session=004,field92,name=QG
I have written an awk script (I am new and learned awk only from this portal) which creates my desired output.
Output1
session=001,field01,name=GX1_TRANSACTION,field03,field04,session=001,field91,name=QG
session=001,field01,name=GX2_TRANSACTION,field03,field04,session=001,field92,name=QG
session=002,field01,name=GX1_TRANSACTION,field03,field04,NOMATCH-QG
session=002,field01,name=GX2_TRANSACTION,field03,field04,session=002,field92,name=QG
session=003,field01,name=GX2_TRANSACTION,field03,field04,session=003,field92,name=QG
session=004,field01,name=GX1_TRANSACTION,field03,field04,session=004,field91,name=QG
session=004,field01,name=GX2_TRANSACTION,field03,field04,session=004,field92,name=QG
Output2: Pending
session=003,field91,name=QG
Awk:
{
if($0~/name=GX1_TRANSACTION/ || $0~/GX2_TRANSACTION/) {
if($1 in ccr)
print ccr[$1]",NOMATCH-QG";
ccr[$1]=$0;
}
if($0~/name=QG/) {
if($1 in ccr) {
print ccr[$1]","$0;
delete ccr[$1];
}
else {
print $0",NOUSER" >> Pending
}
}
}
END {
for (i in ccr)
print ccr[i]",NOMATCH-QG"
}
Command:
awk -F"," -v Pending=t -f a.awk file1
But the issue is that my file1 is really big, so I want to improve the performance of this script. Is there any way I can do that?
There are a couple of changes that may lead to small improvements in speed, and if not may give you some ideas for future awk scripts.
Don't "manually" test every line if you don't have to - raise the name= tests to the main awk loop. Currently your script checks $0 up to three times per line for a name= match.
Since you're using , as the FS, test the corresponding field ($3) instead of $0. It only saves a few leading chars of pattern matching in your example data.
Here's a refactored a.awk:
$3~/name=GX[12]_TRANSACTION/ {
if($1 in ccr)
print ccr[$1]",NOMATCH-QG";
ccr[$1]=$0;
}
$3~/name=QG/ {
if($1 in ccr) {
print ccr[$1]","$0;
delete ccr[$1];
}
else {
print $0",NOUSER" >> Pending
}
}
END { for (i in ccr) print ccr[i]",NOMATCH-QG" }
I've also condensed the GX pattern match to one regex. I get the same output as your example.
In any program, I/O (e.g. print statements) is usually the most time-consuming operation. In awk there's an operation that's even slower, though, and that's string concatenation. Because awk doesn't require you to pre-allocate memory for strings, the memory gets allocated dynamically, so when you increase the length of a string it must be dynamically re-allocated. So you can speed up your program by removing string concatenations, e.g. all those hard-coded ","s you're printing, by setting and using OFS instead.
I haven't really thought about the logic of your overall approach, but there are a couple of other tweaks you could try:
BEGIN{ FS=OFS="," }
NF {
if ($3 ~ /name=GX[12]_TRANSACTION/) {
if($1 in ccr) {
print ccr[$1], "NOMATCH-QG"
}
ccr[$1]=$0
}
else {
if($1 in ccr) {
print ccr[$1], $0
delete ccr[$1]
}
else {
print $0, "NOUSER" >> Pending
}
}
}
END {
for (i in ccr)
print ccr[i], "NOMATCH-QG"
}
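To check that the refactored script still produces the expected merges, it can be run on a small subset of the sample data. A minimal sketch, assuming a POSIX awk and mktemp; the temporary file names are made up, and the awk program is the OFS version above with the nested else/if written as else if:

```shell
#!/bin/sh
# Sketch: run the OFS-based script on a made-up subset of the question's data.
# Assumes a POSIX awk and mktemp; file names under $workdir are hypothetical.
workdir=$(mktemp -d) || exit 1

cat > "$workdir/file1" <<'EOF'
session=001,field01,name=GX1_TRANSACTION,field03,field04
session=001,field91,name=QG
session=002,field01,name=GX1_TRANSACTION,field03,field04
session=002,field01,name=GX2_TRANSACTION,field03,field04
session=002,field92,name=QG
session=003,field91,name=QG
EOF

cat > "$workdir/a.awk" <<'EOF'
BEGIN { FS = OFS = "," }
NF {
  if ($3 ~ /name=GX[12]_TRANSACTION/) {
    if ($1 in ccr) print ccr[$1], "NOMATCH-QG"   # earlier GX row never got a QG
    ccr[$1] = $0                                 # remember the latest GX row
  } else if ($1 in ccr) {
    print ccr[$1], $0                            # merge the GX row with this QG row
    delete ccr[$1]
  } else {
    print $0, "NOUSER" >> Pending                # QG row with no waiting GX row
  }
}
END { for (i in ccr) print ccr[i], "NOMATCH-QG" }
EOF

awk -v Pending="$workdir/pending" -f "$workdir/a.awk" "$workdir/file1" > "$workdir/merged"
cat "$workdir/merged"
```

On this subset the merged lines match the corresponding lines of Output1, and the unmatched session=003 QG row is appended to the Pending file with ,NOUSER.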
Note that by setting FS in the script you no longer need to use -F"," on the command line.
Are you sure you want >> instead of > on the print to "Pending"? Those 2 constructs don't mean the same in awk as they do in shell.
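Within a single awk run, > truncates the file only when it is first opened, and later prints to the same file name keep appending; >> differs only in keeping whatever the file contained before awk started. A minimal sketch, assuming a POSIX awk and mktemp; the file contents are made up:

```shell
#!/bin/sh
# Sketch: how ">" and ">>" behave inside one awk run.
# Assumes a POSIX awk and mktemp; the file contents are made up.
redirect_demo() {
  f=$(mktemp) || return 2
  echo old > "$f"
  awk -v out="$f" 'BEGIN { print "a" > out; print "b" > out }'   # truncated once, then appended
  cat "$f"
  echo old > "$f"
  awk -v out="$f" 'BEGIN { print "a" >> out; print "b" >> out }' # existing "old" is kept
  cat "$f"
  rm -f "$f"
}
redirect_demo
```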

awk: catch `exit' in the END block

I'm using awk for formatting an input file in an output file. I have several patterns to fill variables (like "some pattern" in the example). These variables are printed in the required format in the END block. The output has to be done there because the order of appearance in the input file is not guaranteed, but the order in the output file must be always the same.
BEGIN {
FS = "=|,"
}
/some pattern/ {
if ($1 == 8) {
var = $1
} else {
# Incorrect field value
exit 1
}
}
END {
# Output the variables
print var
}
So my problem is the exit statement in the pattern. If there is some error and this command is invoked, there should be no output at all, or at most an error message. But as the gawk manual (here) says, if exit is invoked in a pattern's action, the END block is still executed. Is there any way to catch the exit, like:
if (!exit_invoked) {
print var
}
or some other way to avoid printing the output in the END block?
Stefan
edit: Used the solution from shellter.
You'll have to handle it explicitly, by setting exit_invoked before the exit line, i.e.:
BEGIN {
FS = "=|,"
}
/some pattern/ {
if ($1 == 8) {
var = $1
} else {
# Incorrect field value
exit_invoked=1
exit 1
}
}
END {
if (! exit_invoked ) {
# Output the variables
print var
}
}
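The flag approach can be checked directly: with a bad line the END block prints nothing and awk still exits with the status given to exit. A minimal sketch, assuming a POSIX awk; run_guarded and its input lines are hypothetical:

```shell
#!/bin/sh
# Sketch of the exit_invoked flag from the answer above.
# Assumes a POSIX awk; run_guarded and its input lines are hypothetical.
run_guarded() {
  printf '%s\n' "$@" | awk '
    BEGIN { FS = "=|," }
    /some pattern/ {
      if ($1 == 8) {
        var = $1
      } else {
        exit_invoked = 1    # remember that we are leaving because of an error
        exit 1              # END still runs after this
      }
    }
    END {
      if (!exit_invoked)    # skip the output if the main rules bailed out
        print var
    }'
}
run_guarded '8=some pattern'
```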
I hope this helps.
END {
# If we got here through an exit in a main rule, we are probably not at EOF yet,
# so getline can still read input: exit with the previously set status
# instead of running the rest of the END block.
if (getline) exit
......
}
Being a fan of short syntax and trying to avoid futile {}s or adding them later to pre-existing programs, instead of:
...
else {
exit_invoked=1
exit 1
}
...
END {
if (! exit_invoked ) {
print var
}
}
I use:
else
exit (e=1) # the point
...
END {
if(!e)
print var
}
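The shorthand works because an assignment is an expression: exit (e=1) sets e and uses its value, 1, as the exit status, so END can test e. A minimal sketch, assuming a POSIX awk; short_guard and the rule that any line other than 8 is an error are hypothetical:

```shell
#!/bin/sh
# Sketch of the "exit (e=1)" shorthand.
# Assumes a POSIX awk; short_guard and its error rule are hypothetical.
short_guard() {
  printf '%s\n' "$@" | awk '
    $1 == 8 { var = $1; next }
    { exit (e = 1) }            # sets e and exits with status 1
    END { if (!e) print var }'
}
short_guard 8
```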

Are there any AWK syntax checkers?

Are there any AWK syntax checkers? I'm interested in both minimal checkers that only flag syntax errors and more extensive checkers along the lines of lint.
It should be a static checker only, not dependent on running the script.
If you prefix your Awk script with BEGIN { exit(0) } END { exit(0) }, you're guaranteed that none of your code will run. Exiting during BEGIN and END prevents the other BEGIN and END blocks from running. If Awk returns 0, your script was fine; otherwise there was a syntax error.
If you put the code snippet in a separate argument, you'll get good line numbers in the error messages. This invocation...
gawk --source 'BEGIN { exit(0) } END { exit(0) }' --file syntax-test.awk
Gives error messages like this:
gawk: syntax-test.awk:3: x = f(
gawk: syntax-test.awk:3: ^ unexpected newline or end of string
GNU Awk's --lint can spot things like global variables and undefined functions:
gawk: syntax-test.awk:5: warning: function `g': parameter `x' shadows global variable
gawk: warning: function `f' called but never defined
And GNU Awk's --posix option can spot some compatibility problems:
gawk: syntax-test.awk:2: error: `delete array' is a gawk extension
Update: BEGIN and END
Although the END { exit(0) } block seems redundant, compare the subtle differences between these three invocations:
$ echo | awk '
BEGIN { print("at begin") }
/.*/ { print("found match") }
END { print("at end") }'
at begin
found match
at end
$ echo | awk '
BEGIN { exit(0) }
BEGIN { print("at begin") }
/.*/ { print("found match") }
END { print("at end") }'
at end
$ echo | awk '
BEGIN { exit(0) } END { exit(0) }
BEGIN { print("at begin") }
/.*/ { print("found match") }
END { print("at end") }'
In Awk, exiting during BEGIN will cancel all other BEGIN blocks and will prevent matching against any input. Exiting during END is the only way to stop the remaining END blocks from running; that's why the third invocation above shows that no print statements were executed. The GNU Awk User's Guide has a section on the exit statement.
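The prefix trick can be wrapped in a small shell function. Since POSIX awk allows several -f options, whose files are concatenated in order, this sketch avoids gawk's --source and should work with other awks as well; awk_syntax_ok is a hypothetical name, and the sketch assumes mktemp is available:

```shell
#!/bin/sh
# Sketch: parse-only check using the BEGIN { exit(0) } END { exit(0) } prefix.
# Two -f options are concatenated in order (POSIX), so no --source is needed.
# awk_syntax_ok is a hypothetical helper name.
awk_syntax_ok() {
  prefix=$(mktemp) || return 2
  printf '%s\n' 'BEGIN { exit(0) } END { exit(0) }' > "$prefix"
  awk -f "$prefix" -f "$1" </dev/null >/dev/null 2>&1
  status=$?
  rm -f "$prefix"
  return "$status"
}
```

Usage: awk_syntax_ok syntax-test.awk && echo "parses". Warnings such as those from --lint are not reported by this sketch.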
GNU awk appears to have a --lint option.
For a minimal syntax checker, which stops at the first error, try awk -f prog < /dev/null.
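Redirecting from /dev/null only stops awk from reading input; BEGIN and END actions still execute, so this is not a purely static check. A minimal sketch, assuming a POSIX awk and mktemp; the program is made up:

```shell
#!/bin/sh
# Sketch: "awk -f prog < /dev/null" skips the main rules but still runs
# BEGIN and END. Assumes a POSIX awk and mktemp; the program is made up.
show_dev_null_run() {
  prog=$(mktemp) || return 2
  printf '%s\n' 'BEGIN { print "BEGIN ran" }' '{ print "main rule ran" }' 'END { print "END ran" }' > "$prog"
  awk -f "$prog" </dev/null
  rm -f "$prog"
}
show_dev_null_run
```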