Correct way to write to the same file from multiple processes in awk

The title says it all.
I have 4 awk processes logging to the same file, and output seems fine, not mangled, but I'm not sure that just redirecting print output like this: print "xxx" >> file in every process is the right way to do it.
There are many similar questions around the site, but this one is particularly about awk and a pragmatic, code-correct way to approach the problem.
EDIT
Sorry folks, of course I wasn't "just redirecting" like I wrote, I was appending.

No, it is not safe.
In awk, print "foo" > "file" opens the file once and truncates it, and the file stays open until the end of the script.
That is, if your 4 awk processes start writing to the same file at different times, they will overwrite each other's output.
To reproduce it, you could start two (or more) awk like this:
awk '{while(++i<9){system("sleep 2");print "p1">"file"}}' <<<"" &
awk '{while(++i<9){system("sleep 2");print "p2">"file"}}' <<<"" &
If you monitor the content of the file at the same time, you will see that in the end there are not exactly 8 "p1" and 8 "p2" entries.
Using >> avoids losing entries, but the sequence of entries from the 4 processes may still be interleaved unpredictably.
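For example, a sketch of the same reproduction in append mode (the counts and sleeps are just illustrative):
awk 'BEGIN{while(++i<9){system("sleep 2");print "p1">>"file"}}' &
awk 'BEGIN{while(++i<9){system("sleep 2");print "p2">>"file"}}' &
wait
sort file | uniq -c   # expect exactly 8 of each, though their order in the file may interleave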
EDIT
Ok, the > was a typo.
I don't know why you really need 4 processes writing into the same file. As I said, with >> the entries won't get lost (if your awk scripts work correctly). However, personally I wouldn't do it this way: if I had to have 4 processes, I would write to different files. Well, I don't know your requirements, just speaking in general.
Outputting to different files makes testing and debugging easier; imagine when one of your processes has a problem and you want to track it down, etc.
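A sketch of that per-process layout (the id values and log file names are just placeholders):
awk -v id=1 'BEGIN { print "from process " id >> ("log." id) }' &
awk -v id=2 'BEGIN { print "from process " id >> ("log." id) }' &
wait
cat log.1 log.2 > combined.log   # merge afterwards if a single file is really needed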

I think appending through the operating system's file API is safe. In effect, each print appends its string to the file's write buffer, and the system manages the actual writing of the data to disk; when another process appends to the same file, the kernel places that write at the current end of the file, after the data the first process has already written, rather than letting the two writes land on top of each other.
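A quick shell-level sanity check of that claim (a sketch; shared.log and the counts are arbitrary):
for p in p1 p2 p3 p4; do
  ( i=0; while [ $i -lt 100 ]; do echo "$p $i" >> shared.log; i=$((i+1)); done ) &
done
wait
wc -l shared.log   # expect exactly 400 lines: entries interleave, but none are lost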

Related

Compare 2 files in awk and append Pass / Fail

Thank you all for the feedback. Apologies since I am newer to coding and new to SO. Below is the code I have currently been running.
awk 'FNR==NR{a[$4,$5]=$0}{if(b=a[$4,$5]); print b, "PASS";next}else{if(b!=a[$4,$5]){print a, b, "FAIL";next}}'
This appends PASS next to each line when the lines are the same, but it does not print FAIL when there are inconsistencies in a line.
Trying to get myself more familiar with awk. Using FNR==NR I've been able to compare 2 files (line by line) and then print PASS at the end of the file. However, I cannot actually get it to properly fail the scenario and print FAIL when they do not match. Could anybody help a newbie out?
Here is an awk script to get you started.
$ awk 'NR==FNR{a[NR]=$0;next}
{f=$0!=a[FNR]; delete a[FNR]}
f{c=FNR;exit}
END{c=c?c:(FNR+1);print f||(c in a)?"FAIL on line "c:"PASS"}' file1 file2
The additional complexity is due to the fact that the files might have different lengths. Note also that there are existing tools (diff, comm, ...) that do this in a much more compact way.
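For example, with two hypothetical sample files:
$ printf 'a\nb\nc\n' > f1; printf 'a\nx\nc\n' > f2
$ awk 'NR==FNR{a[NR]=$0;next}
{f=$0!=a[FNR]; delete a[FNR]}
f{c=FNR;exit}
END{c=c?c:(FNR+1);print f||(c in a)?"FAIL on line "c:"PASS"}' f1 f2
FAIL on line 2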

Buffering output with AWK

I have an input file which consists of three parts:
inputFirst
inputMiddle
inputLast
Currently I have an AWK script which with this input creates an output file which consists of two parts:
outputFirst
outputLast
where outputFirst and outputLast is generated (on the fly) from inputFirst and inputLast respectively. However, to calculate the outputMiddle part (which is only one line) I need to scan the entire input, so I store it in a variable. The problem is that the value of this variable should go in between outputFirst and outputLast in the output file.
Is there a way to solve this using a single portable AWK script that takes no arguments? Is there a portable way to create temporary files in an AWK script or should I store the output from outputFirst and outputLast in two variables? I suspect that using variables will be quite inefficient for large files.
All versions of AWK (since at least 1985) can do basic I/O redirection to files or pipelines, just like the shell can, as well as run external commands without I/O redirection.
So, there are any number of ways to approach your problem and solve it without having to read the entire input file into memory. The most optimal solution will depend on exactly what you're trying to do, and what constraints you must honour.
A simple approach to the more precise example problem you describe in your comment above would go something like this: first, in the BEGIN clause, form two unique filenames with rand() (and define your variables). Then read and sum the first 50 numbers from standard input while also writing them to the first temporary file, and continue reading and summing the next 50 numbers while writing them to the second file. Finally, in an END clause, read the first temporary file back with getline in a loop and write it to standard output, print the total sum, read the second temporary file back the same way and write it to standard output, and call system("rm " file1 " " file2) to remove the temporary files.
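A rough sketch of that recipe (the 50-line split and the column-1 sum are taken from the description above; everything else is illustrative):
awk 'BEGIN { srand(); file1 = "tmp1." rand(); file2 = "tmp2." rand() }
NR<=50 { s += $1; print > file1; next }            # first block: sum and buffer
{ s += $1; print > file2 }                         # second block: sum and buffer
END {
    close(file1); close(file2)
    while ((getline line < file1) > 0) print line  # replay the first block
    print s                                        # the computed middle line
    while ((getline line < file2) > 0) print line  # replay the second block
    system("rm " file1 " " file2)                  # clean up the temporary files
}'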
If the output file is not too large (whatever that is), saving outputLast in a variable is quite reasonable. The first part, outputFirst, can (as described) be generated on the fly. I tried this approach and it worked fine.
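A minimal sketch of that variable-based approach (again assuming a 50-line split and a column-1 sum for the middle line):
awk 'NR<=50 { print; s += $1; next }   # emit outputFirst on the fly
{ s += $1; buf = buf $0 ORS }          # accumulate outputLast in a variable
END { print s; printf "%s", buf }      # middle line, then the buffered tail
' input.txt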
Print the "first" output while processing the file, and write the remainder to a temporary file; once you have printed the middle, append the temporary file's contents.
Here is a self-contained shell script which processes its input files and writes to standard output.
#!/bin/sh
t=$(mktemp -t middle.XXXXXXXXX) || exit 127
trap 'rm -f "$t"' EXIT
trap 'exit 126' HUP INT TERM
awk -v temp="$t" 'NR<500000 { print n+1 }
{ s+=$1 }
NR>=500000 { print n+1 >>temp }
END { print s }' "$@"
cat "$t"
For illustration purposes, I used really big line numbers. I'm afraid your question is still too vague for a more specific answer, but perhaps this can help you find the right direction.

How do I tell Octave where to find functions without picking up other files?

I've written an Octave script, hello.m, which calls subfunc.m, and which takes a single command-line argument, an input file data.txt, which it loads with load(argv(){1}).
If I put all three files in the same directory, and call it like
./hello.m data.txt
then all is well.
But if I've got another data.txt in another directory, and I want to run my script on it, and I call
../helloscript/hello.m data.txt
this fails because hello.m can't find subfunc.m.
If I call
octave --path "../helloscript" ../helloscript/hello.m data.txt
then that seems to work fine.
The problem is that if I don't have a data.txt in the directory, then the script will pick up any data.txt that is lying around in ../helloscript.
This seems a bit fragile. Is there any way to tell Octave, preferably in the script itself, to get subfunctions from the same directory as the script, but to get everything else relative to the current directory?
The best robust solution I can think of at the moment is to inline the subfunction in the script, which is a bit nasty.
Is there a good way to do this, or is it just a thorny problem that will cause occasional hard to find problems and can't be avoided?
Is this in fact just a general problem with scripting languages that I've just never noticed before? How does e.g. python deal with it?
It seems like there should be some sort of library-load-path that can be set without altering the data-load-path.
Adding all your subfunctions to your program file is not nasty at all. Why would you think so? It is perfectly normal to have function definitions in your script. The only language I know that does not allow this is Matlab, but that's just braindead.
The other alternative you have is to check that the input file argument, data.txt, exists. Like so:
fpath = argv (){1};
[info, err, msg] = stat (fpath);
if (err)
error ("could not stat `%s' : %s", fpath, msg);
endif
## continue your script knowing the file exists
But really, I would recommend you do both: add your subfunctions to your main program (the only reason to keep them in a separate file is if you plan on sharing them with other programs), and always check input arguments.

How to redirect output of a running process to a file in Linux Shell

I am trying some experiments with the airmon-ng script in Linux. I want to redirect the output of the process "airodump-ng mon0" to a file. I can see the instantaneous output on the screen. The feature of this process is that it won't stop executing (it is actually a script that scans for APs and clients, and it keeps scanning) unless we use Ctrl+C.
Whenever I try
airodump-ng mon0 > file.txt
I don't get the output in the file.
My primary assumption is that the shell writes to the file only after the execution completes. But in the above case I stopped the execution myself (as the execution would never complete on its own).
So, to generalize: I can't redirect the output of a running process to a file. How can I do that?
Or is there an alternative way to stop the execution of the process (for example after 5 seconds) and redirect the output produced so far to a file?
A process may send output to standard output or standard error to get it to the terminal. Generally, the former is for information and the latter for errors, but in some cases, a process may mix them up.
I'm supposing that in your case, the standard error is being used. To get both of these to the output file, you can use:
airodump-ng mon0 > file.txt 2>&1
This says to send standard output to file.txt and to redirect 2 (the file descriptor for standard error) into 1 (the file descriptor for standard output) so that it also goes to the file.
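For the "stop after 5 seconds" part of the question, one option (a sketch, assuming the timeout utility from GNU coreutils is available) is to wrap the command and send it SIGINT, mimicking Ctrl+C:
timeout -s INT 5 airodump-ng mon0 > file.txt 2>&1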

Inconsistent Behavior In A Batch File's For Statement

I've done very little with batch files but I'm trying to track down a strange bug I've been encountering on a legacy system.
I have a number of .exe files in a particular folder. This script is supposed to duplicate them with a different file name.
Code From Batch File
for %%i in (*.exe) do copy \\networkpath\folder\%%i \\networkpath\folder\%%i.backup.exe
(Note: The source and destination folders are THE SAME)
Example Of Desired Behavior:
File1.exe --> Becomes --> File1.exe.backup.exe
File2.exe --> Becomes --> File2.exe.backup.exe
Now first, let me say that this is not the approach I would take. I know there are other (potentially more straightforward) ways to do this. I also know that you might wonder WHY on earth we care about creating a FileX.exe.backup.exe. But this script has been running for years, and I'm told the problem only started recently. I'm trying to pinpoint the problem, not rewrite the code (even if that would be trivial).
Example Buggy Output:
File1.exe.backup.exe
File1.exe.backup.exe.backup.exe
File1.exe.backup.exe.backup.exe.backup.exe
File1.exe.backup.exe.backup.exe.backup.exe.backup.exe
File1.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe
File1.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe
etc...
File2.exe.backup.exe
File2.exe.backup.exe.backup.exe
File2.exe.backup.exe.backup.exe.backup.exe
File2.exe.backup.exe.backup.exe.backup.exe.backup.exe
File2.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe
File2.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe
Not knowing anything about batch files, I looked at this and figured that the condition of the for statement was being re-evaluated after each iteration, creating a (near) infinite loop of copying (I can see that, eventually, the copy will fail when the names get too long).
This would explain the behaviour I'm seeing. And when I cleaned the directory in question so that it contained only the original File1.exe file and ran the script, it reproduced the buggy output. The problem is that I CANNOT replicate the behaviour anywhere else!?!
When I create a folder locally with a few .exe files and run the script, I get the expected output. And yes, if I run it again, I get one instance of 'File1.exe.backup.exe.backup.exe' (and each time I run it again, the names grow by one suffix). But I cannot get it to enter the near-infinite-loop case.
It's been driving me crazy.
The bug occurs in a networked location, so I've tried to recreate it on one, but again, no success. Because it's a shared network location, I wondered if it could have something to do with other people accessing or modifying files in the folder, and I even introduced delays and wrote a tiny program to perform actions in the same folder, but without any success.
The documentation I can find on the for statement doesn't really help, and all of the tests I've run seem to suggest that the in (*.exe) section is only evaluated once, at the beginning of execution.
Does anyone have any suggestions for what might be going on here?
I agree with Andriy M's comment - it looks to be related to Windows 7 Batch Script 'For' Command Error/Bug
The following change should fix the problem:
for /f "eol=: delims=" %%i in ('dir /b *.exe') do copy \\networkpath\folder\%%i \\networkpath\folder\%%i.backup.exe
With for /f, the dir /b command runs to completion and its output is captured before the loop body executes, so the files created by copy can never be picked up by the same loop. Any file that starts with a semicolon (highly unlikely, but it can happen) would be skipped with the default EOL character of semicolon; to be safe, you should set EOL to some character that can never start a file name (or any path). That is why I chose the colon: it cannot appear in a folder or file name, and can only appear after a drive letter, so it should always be safe.
Copy also supports wildcard characters in the target path. You can use
copy \\networkpath\folder\*.exe \\networkpath\folder\*.backup.exe