I have encountered a problem in one of my Tcl scripts. I need to run a loop until a terminating condition is met, and on every iteration I need to write some output. This is the basic code that I'm using:
proc wr {i} {
    puts -nonewline "$i"
}
proc do {roof} {
    set end 0
    while {$end < $roof} {
        after 1000
        wr $end
        incr end
    }
}
do 10
The expected behaviour is that every second there will be a new output until $end == $roof. Instead, after running this script, the console window is busy for 10 seconds, and then the entire output prints at once.
Thank you for your advice :)
The problem is that you don't flush stdout.
If you modify your script so it flushes stdout:
proc wr {i} {
    puts -nonewline "$i"
    flush stdout
}
proc do {roof} {
    set end 0
    while {$end < $roof} {
        after 1000
        wr $end
        incr end
    }
}
do 10
It will then work. Alternatively, you can change the buffering of the stdout channel to none (the default for stdout is line):
fconfigure stdout -buffering none
With line buffering, stdout is flushed whenever a newline is written. Because you use puts -nonewline, you never write one, so the output stays in the buffer until the buffer fills or the script exits.
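The same pitfall exists in other languages. Below is a minimal Python sketch, used here only as an illustration; `ticks` and `run` are hypothetical names, not part of the question. When stdout is a terminal, Python also line-buffers it. Digits written without a newline therefore only appear promptly if each write is flushed, which this sketch does with `print(..., flush=True)`.

```python
import sys
import time


def ticks(roof, delay=1.0):
    """Yield 0, 1, ..., roof-1, sleeping `delay` seconds before each value."""
    end = 0
    while end < roof:
        time.sleep(delay)
        yield end
        end += 1


def run(roof, delay=1.0, out=None):
    out = out if out is not None else sys.stdout
    for value in ticks(roof, delay):
        # No newline is written, so line buffering alone would keep the digits
        # in the buffer; flush=True pushes each one out immediately.
        print(value, end="", file=out, flush=True)
    print(file=out)  # finish with a newline
```

Calling `run(10)` should print one digit per second. If you drop `flush=True`, on a terminal the digits typically appear all at once at the final newline.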
I would like to implement incremental execution of scripts using gawk in order to interleave script source and script output in a document.
The idea would be to read script lines into awk to print them and also pipe them into an appropriate interpreter. Then, on a cue from the input file, read any output from the coprocess and print it to standard output. But it seems that I must know how much output has been generated before looping over the coprocess output.
Is there any way to do a non-blocking read from the coprocess?
function script_checkpoint() {
    while(("python3" |& getline output) > 0)
        print output
}
/^# checkpoint/ { script_checkpoint(); next }
{ print; print $0 |& "python3" }
END { script_checkpoint() }
EDIT: I have tried to implement this without a coprocess, by buffering the input lines until a checkpoint and just letting the interpreter print to standard output itself, but the interpreter always buffers its output until the stream closes. I don't want to close it before the program ends, because I need to preserve its internal state.
EDIT: clarified that my first intended use case is running Python scripts. Here is a sample input/output pair.
print('first line')
# checkpoint
print('second line')
should result in
print('first line')
first line
print('second line')
second line
The general issue:
- while ((interpreter |& getline output) > 0) runs until it sees an EOF, but ...
- interpreter does not end/terminate/exit, thus no EOF is sent, so ...
- awk hangs while waiting for interpreter to send more data, so ...
- we end up with a deadlock (awk waiting for input from interpreter; interpreter waiting for input from awk)
Assumptions:
- we need to maintain a single invocation of interpreter throughout the run (per a comment from the OP); net result: awk cannot depend on interpreter sending an EOF
- interpreter can be modified (to generate additional output)
- the awk script has no way of knowing how many lines of output interpreter will generate
One idea is to set up a handshake between awk and interpreter. Within the while ((interpreter |& getline output) > 0) loop, we test for our handshake and, when we see it, break out of the loop and return to the main awk script.
For demo purposes I'll use a simple bash script that handles the handshake and otherwise just prints to stdout whatever it reads from stdin:
$ cat interpreter
#!/usr/bin/bash
while read -r line
do
    if [[ "${line}" = 'checkpoint' ]]    # received 'checkpoint' handshake?
    then
        echo "CHECKPOINT"                # send "CHECKPOINT" handshake/acknowledgement
        continue
    else
        echo "interpreter: $line"
    fi
done
Demo awk code with handshake logic:
awk '
function script_checkpoint() {
    while (( cmd |& getline output) > 0) {
        if ( output == "CHECKPOINT" )    # received "CHECKPOINT" handshake/acknowledgement?
            break
        print output
    }
}
BEGIN           { cmd= "./interpreter" }
/^# checkpoint/ { print "checkpoint" |& cmd    # send "checkpoint" handshake
                  script_checkpoint()
                  next
                }
                { print "awk: " $0
                  print $0 |& cmd
                }
END             { print "awk: last checkpoint"    # in case last line of input is not "# checkpoint" we will ...
                  print "checkpoint" |& cmd       # send one last "checkpoint" handshake
                  script_checkpoint()
                  print "awk: done"
                }
' test.dat
Sample input file:
$ cat test.dat
line1
line2
# checkpoint
line3
line4
# checkpoint
line5
Output:
awk: line1
awk: line2
interpreter: line1
interpreter: line2
awk: line3
awk: line4
interpreter: line3
interpreter: line4
awk: line5
awk: last checkpoint
interpreter: line5
awk: done
NOTES:
- awk will still hang if interpreter crashes and/or fails to send back the CHECKPOINT handshake
- if the strings checkpoint and/or CHECKPOINT can show up in the 'normal' data streams, update the code to use strings that are not expected in the data streams
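The same handshake can be sketched in Python with `subprocess`. This is an illustration only: `CHILD` is a hypothetical stand-in for the demo `./interpreter` script, and `interleave` is not part of the answer.

Unlike the awk version, the sketch raises an error instead of hanging if the interpreter exits before acknowledging. It still hangs if the interpreter stays alive and never answers. Like the awk version, it also assumes the output produced between checkpoints fits in the pipe buffer, since nothing is read until the handshake.

```python
import subprocess
import sys

# Hypothetical stand-in for the demo ./interpreter script: it echoes each line
# with a prefix and answers a "checkpoint" line with "CHECKPOINT".
CHILD = r'''
import sys
for line in sys.stdin:
    line = line.rstrip("\n")
    if line == "checkpoint":
        print("CHECKPOINT", flush=True)
    else:
        print("interpreter: " + line, flush=True)
'''


def interleave(script_lines, marker="# checkpoint"):
    """Echo source lines and, at each marker, the interpreter's output so far."""
    proc = subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    result = []

    def checkpoint():
        proc.stdin.write("checkpoint\n")   # send the handshake
        proc.stdin.flush()
        for reply in proc.stdout:          # read until the acknowledgement
            reply = reply.rstrip("\n")
            if reply == "CHECKPOINT":
                return
            result.append(reply)
        raise RuntimeError("interpreter exited before acknowledging")

    for line in script_lines:
        if line.startswith(marker):
            checkpoint()
            continue
        result.append(line)                # echo the source line
        proc.stdin.write(line + "\n")      # and feed it to the interpreter
        proc.stdin.flush()
    checkpoint()                           # final checkpoint, like the END block
    proc.stdin.close()
    proc.wait()
    proc.stdout.close()
    return result
```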
It sounds like you're trying to do something like this:
BEGIN { cmd="/my/python/script/path" }
function script_checkpoint(    output) {
    close(cmd,"to")
    while ( (cmd |& getline output) > 0 ) {
        print output
    }
    close(cmd)
}
/^# checkpoint/ {
    script_checkpoint()
    next
}
{
    print
    print |& cmd
}
END { script_checkpoint() }
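This answer closes the write end of the coprocess so the interpreter sees EOF, then reads until EOF. A hedged Python sketch of the same pattern follows; `DEMO_CMD`, `run_chunk` and `interleave_restarting` are hypothetical names. One consequence is visible in the sketch: after close(cmd), the next write starts a new process, so interpreter state does not survive from one checkpoint to the next.

```python
import subprocess
import sys

# Hypothetical stand-in interpreter: prefixes each input line with "out: ".
DEMO_CMD = [sys.executable, "-c",
            "import sys; [print('out: ' + l.rstrip()) for l in sys.stdin]"]


def run_chunk(cmd, lines):
    """Start cmd, send it `lines`, close its stdin, and collect all its output.

    Closing stdin is what close(cmd, "to") does: the child sees EOF, finishes,
    and exits. A fresh process is started for every chunk.
    """
    done = subprocess.run(cmd, input="".join(line + "\n" for line in lines),
                          capture_output=True, text=True, check=True)
    return done.stdout.splitlines()


def interleave_restarting(script_lines, cmd, marker="# checkpoint"):
    result, pending = [], []
    for line in script_lines:
        if line.startswith(marker):
            result.extend(run_chunk(cmd, pending))
            pending = []
        else:
            result.append(line)      # echo the source line, like `print`
            pending.append(line)     # and queue it for the interpreter
    result.extend(run_chunk(cmd, pending))   # like END { script_checkpoint() }
    return result
```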
Can anyone tell me how I can update the following procedure to handle big files (size <= 10 GB), please?
proc read_text_file { file } {
    set fp [open ${file} r]
    set return_data ""
    while { [gets $fp each_line] != -1 } {
        lappend return_data ${each_line}
    }
    close $fp
    return ${return_data}
}
My objective is to read a huge file line by line with better runtime.
Thanks
When you have a very large file, you categorically want to avoid bringing it all into memory at once. (Also, Tcl 8.* has a memory chunk allocation limit that makes bringing in 50GB of data intensely exciting. That's a long-standing API bug that's fixed in 9.0 — in alpha — but you'll have to put up with it for now.)
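In Tcl terms, the first step is to process each line inside the `gets` loop instead of `lappend`ing it to a list. Here is the same idea as a short Python sketch, for illustration only; `count_matching` is a hypothetical helper. Iterating over a file object yields one line at a time, so memory use is bounded by the longest line rather than by the file size.

```python
import re


def count_matching(lines, pattern):
    """Count lines matching `pattern`, holding only one line at a time.

    `lines` can be any iterable of strings, e.g. an open file object:
        with open("huge.txt", encoding="utf-8") as f:
            n = count_matching(f, r"^proc ")
    """
    regex = re.compile(pattern)
    count = 0
    for line in lines:          # a file object yields lines lazily
        if regex.search(line):
            count += 1
    return count
```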
If you can, do a pass over the file to identify where the interesting sub-chunks of it are. For the sake of argument, let's assume that those are the lines that match a pattern; here's an example that finds where procedures are in a Tcl script (under some simple assumptions).
proc buildIndices {filename} {
    set f [open $filename]
    set indices {}
    try {
        while {![eof $f]} {
            set idx [tell $f]
            set line [gets $f]
            if {[regexp {^proc (\w+)} $line -> name]} {
                dict set indices $name $idx
            }
        }
        return $indices
    } finally {
        close $f
    }
}
Now you have the indices, you can then pull in a procedure from the file like this:
proc obtainProcedure {filename procName indices} {
    set f [open $filename]
    try {
        seek $f [dict get $indices $procName]
        set procedureDefinition ""
        while {[gets $f line] >= 0} {
            append procedureDefinition $line "\n"
            if {[info complete $procedureDefinition]} {
                # We're done; evaluate the script in the caller's context
                tailcall eval $procedureDefinition
            }
        }
    } finally {
        close $f
    }
}
You'd use that like this:
# Once (possibly even save this to its own file)
set indices [buildIndices somefile.tcl]
# Then, to use
obtainProcedure somefile.tcl foobar $indices
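The index-then-seek approach can be sketched in Python as well. This is an illustration, not a translation of the Tcl:
- `build_indices`/`obtain_procedure` return text rather than evaluating it.
- The brace count is a hypothetical, simplified substitute for `info complete`.
- The file is read in binary mode because `seek()` later needs a byte offset from `tell()`.

```python
import io
import re

# Matches "proc NAME" at the start of a line, like the Tcl regexp above.
PROC_RE = re.compile(rb"^proc (\w+)")


def build_indices(f):
    """Return {procedure name: byte offset of its first line} for binary file f."""
    indices = {}
    while True:
        idx = f.tell()               # offset of the line about to be read
        line = f.readline()
        if not line:                 # b"" means end of file
            break
        m = PROC_RE.match(line)
        if m:
            indices[m.group(1).decode()] = idx
    return indices


def obtain_procedure(f, name, indices):
    """Seek to the recorded offset and read lines until the braces balance.

    The brace count is a simplified stand-in for Tcl's [info complete]:
    it ignores quoting and backslash-escaped braces.
    """
    f.seek(indices[name])
    depth = 0
    chunks = []
    for line in iter(f.readline, b""):
        chunks.append(line)
        depth += line.count(b"{") - line.count(b"}")
        if depth <= 0:
            break
    return b"".join(chunks).decode()


if __name__ == "__main__":
    sample = io.BytesIO(b"set x 1\nproc foo {a} {\n    return $a\n}\n")
    idx = build_indices(sample)
    print(idx)                       # {'foo': 8}
    print(obtain_procedure(sample, "foo", idx), end="")
```

With a real file, open it with `open(path, "rb")` and build the index once, as in the Tcl usage above.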
If you're doing this a lot, convert your code to use a database; they're a lot more efficient in the long run. The index building is equivalent to building the database and the other procedure is equivalent to doing a DB query.
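As a rough illustration of the database route, here is a Python sketch using the standard `sqlite3` module; the `procs` table and both function names are hypothetical. Building the table corresponds to the index pass, and the lookup corresponds to the query. SQLite also has a Tcl binding if you'd rather stay in Tcl.

```python
import sqlite3


def build_db(conn, procedures):
    """Store (name, definition) pairs once: the counterpart of buildIndices."""
    conn.execute("CREATE TABLE IF NOT EXISTS procs (name TEXT PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT OR REPLACE INTO procs VALUES (?, ?)", procedures)
    conn.commit()


def fetch_procedure(conn, name):
    """Look one definition up by name: the counterpart of obtainProcedure."""
    row = conn.execute("SELECT body FROM procs WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None
```

For a persistent index, pass something like `sqlite3.connect("procs.db")` instead of an in-memory database.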
I currently have an awk script that checks whether or not the output of an expression contains more than one line. If it does, it aggregates the lines and prints the totals. For example:
someexpression=$'JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)'
might be the one-line case where it DOESN'T yield any data lines. Then,
echo "$someexpression" | awk '
    NR>1 {a[$4]++}
    END {
        for (i in a) {
            printf "%d\n", a[i]
        }
    }'
this will yield NULL or an empty return. Instead, I would like to have it return a numeric value of $0$ if empty. How can I modify the above to do this?
Nothing in UNIX "returns" anything (despite the unfortunately named keyword for setting the exit status of a function); everything (tools, functions, scripts) outputs X and exits with status Y.
Consider these 2 similar-looking functions named foo(), one in C and one in shell:
C (x=foo() means set x to the return value of foo()):
int foo(void) {
    printf("7\n");    // this is outputting 7 from the full program
    return 3;         // this is returning 3 from this function
}
x=foo();    <- 7 is output on screen and x has value '3'
shell (x=$(foo) means set x to the output of foo):
foo() {
    printf "7\n";    # this is outputting 7 from just this function
    return 3;        # this is setting this function's exit status to 3
}
x=$(foo)    <- nothing is output on screen, x has value '7', and '$?' has value '3'
Note that what the return statement does is vastly different in each. Within an awk script, printing and return values from functions behave the same as they do in C. Externally, though, a call to the awk tool behaves the same as every other UNIX tool and shell script: it produces output and sets an exit status.
So when discussing anything in UNIX, avoid the term "return": it's imprecise and ambiguous, so some people will think you mean "output" while others will think you mean "exit status".
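The output/exit-status distinction is easy to observe from a calling program. In this hedged Python sketch, `output_and_status` and `FOO` are hypothetical names, and a POSIX `sh` is assumed to be on the PATH. The captured stdout and the exit status come back as two separate values.

```python
import subprocess

# The shell foo() from above, as a reusable prefix for the examples.
FOO = 'foo() { printf "7\\n"; return 3; }; '


def output_and_status(shell_code):
    """Run shell_code with sh; return its output and its exit status separately."""
    proc = subprocess.run(["sh", "-c", shell_code], capture_output=True, text=True)
    return proc.stdout, proc.returncode

# output_and_status(FOO + "foo")                               -> ("7\n", 3)
# output_and_status(FOO + 'x=$(foo); echo "x=$x status=$?"')   -> ("x=7 status=3\n", 0)
```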
In this case I assume you mean "output", BUT you should consider instead setting a non-zero exit status when there's no match, like grep does, e.g.:
echo "$someexpression" | awk '
    NR>1 {a[$4]++}
    END {
        for (i in a) {
            print a[i]
        }
        exit (NR < 2)
    }'
and then your code that uses the above can test for the success/fail exit status rather than testing for a specific output value, just like if you were doing the equivalent with grep.
You can of course tweak the above to:
echo "$someexpression" | awk '
    NR>1 {a[$4]++}
    END {
        if ( NR > 1 ) {
            for (i in a) {
                print a[i]
            }
        }
        else {
            print "$0$"
            exit 1
        }
    }'
if necessary and then you have both a specific output value and a success/fail exit status.
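For reference, the same logic as the tweaked script might look like this in Python: skip the header line, count per 4th field, and otherwise output $0$ with exit status 1. `count_by_user` is a hypothetical name. awk's `for (i in a)` has no guaranteed order, whereas `Counter` reports users in first-seen order.

```python
from collections import Counter


def count_by_user(lines, default="$0$"):
    """Count data lines per 4th whitespace-separated field.

    Returns (output_lines, exit_status): the per-user counts and 0 when there
    are data lines, otherwise [default] and 1, like the tweaked awk script.
    """
    counts = Counter()
    for lineno, line in enumerate(lines, start=1):
        if lineno > 1:                       # skip the header, like NR>1
            fields = line.split()
            counts[fields[3] if len(fields) > 3 else ""] += 1   # like a[$4]++
    if not counts:
        return [default], 1
    return [str(n) for n in counts.values()], 0
```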
You can set a flag inside the for loop to detect whether the loop has executed or not:
echo "$someexpression" |
awk 'NR>1 {
    a[$4]++
}
END {
    for (i in a) {
        p = 1
        printf "%d\n", a[i]
    }
    if (!p)
        print "$0$"
}'
$0$
I am currently using awk scripting to censor the console output and I print one dot for each censored line.
I want to update this code so that it never prints more than one dot per minute (or something similar). Obviously, if I don't get any progress (newly streamed lines), no update should happen.
Current version of the code is at https://gist.github.com/ssbarnea/f7b72491af524fa364d9fc328cd43f2a
Note: I know that I could print a newline with "mod 10" or similar in order to limit the output, but that approach is not good because the lines are not received at a consistent rate: sometimes I get lots of them, sometimes I get only one or two. Because of this I need a timer-based approach, something like "print a newline if the last one was printed more than x seconds ago".
With GNU awk for time functions you can print dots no more frequently than once per minute by simply comparing the time in seconds since the epoch when the current input line is being processed with the time when the previous dot was printed:
awk '
function prtDot() {
    currTime = systime()
    if ( (currTime - prevTime) > 60 ) {
        printf "." | "cat>&2"
        prevTime = currTime
    }
}
{ print $0; prtDot() }
END { print "" | "cat>&2" }
'
For example, printing a . at most every 10 seconds within a stream of numbers:
$ cat tst.awk
function prtDot() {
    currTime = systime()
    if ( (currTime - prevTime) > 10 ) {
        printf "." | "cat>&2"
        prevTime = currTime
    }
}
{ printf "%s",$0%10 | "cat>&2"; prtDot() }
END { print "" | "cat>&2" }
$ i=0; while (( i < 50 )); do echo $((++i)); sleep 1; done | awk -f tst.awk
1.2345678901.23456789012.3456789012.34567890123.4567890
$ i=0; while (( i < 50 )); do echo $((++i)); sleep 3; done | awk -f tst.awk
1.2345.6789.0123.4567.8901.2345.6789.0123.4567.8901.2345.6789.0
The slight difference between the digits actually printed and those expected comes from the time the other parts of the while loop add to the interval between echos. Other small imprecisions also affect when the shell loop prints numbers, and consequently when systime() is called in awk.
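The same timer-based throttle is sketched below in Python for illustration; `DotThrottle` and `censor` are hypothetical names. It uses `time.monotonic()` rather than wall-clock time, so clock adjustments can't suppress or burst dots. The clock is injectable so the behaviour can be checked without waiting.

```python
import sys
import time


class DotThrottle:
    """Write "." to `out` at most once per `interval` seconds.

    Like prtDot() above, the check only happens when a line arrives, so if no
    lines arrive nothing is printed.
    """

    def __init__(self, interval, out=None, clock=time.monotonic):
        self.interval = interval
        self.out = out if out is not None else sys.stderr
        self.clock = clock
        self.prev = None                 # no dot printed yet

    def tick(self):
        """Call once per censored line; return True if a dot was written."""
        now = self.clock()
        if self.prev is None or now - self.prev > self.interval:
            self.out.write(".")
            self.out.flush()
            self.prev = now
            return True
        return False


def censor(lines, throttle):
    """Hypothetical censoring filter: swallow every line, emit throttled dots."""
    for _ in lines:
        throttle.tick()
    throttle.out.write("\n")
```

Usage would be along the lines of `censor(sys.stdin, DotThrottle(60))`.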
I have a gawk program that uses a coprocess. However, sometimes I don't have any data to write to the coprocess, and my original script hangs while waiting for the output of the coprocess.
The code below reads from STDIN, writes each line to a "cat" program, running as a coprocess. Then it reads the coprocess output back in and writes it to STDOUT. If we change the if condition to be 1==0, nothing gets written to the coprocess, and the program hangs at the while loop.
From the manual, it seems that the coprocess and the two-way communication channels are only started the first time there is an IO operation with the |& operator. Perhaps we can start things without actually writing anything (e.g. writing an empty string)? Or is there a way to check if the coprocess ever started?
#!/usr/bin/awk -f
BEGIN {
    cmd = "cat"
    ## print "" |& cmd
}
{
    if (1 == 1) {
        print |& cmd
    }
}
END {
    close (cmd, "to")
    while ((cmd |& getline line)>0) {
        print line
    }
    close(cmd)
}
Great question, +1 for that!
Just test the return value of close(cmd, "to"): it will be zero if the pipe was open, and -1 (or some other value) otherwise, e.g.:
if (close(cmd, "to") == 0) {
    while ((cmd |& getline line)>0) {
        print line
    }
    close(cmd)
}
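The underlying idea is to wait on the coprocess only if it was actually started. It can be sketched in Python as follows, for illustration only: `LazyCoprocess` and `CAT` are hypothetical, and `CAT` is a Python stand-in for `cat`. Here the "was it started?" check is an explicit `None` test instead of close()'s return value.

```python
import subprocess
import sys

# Hypothetical, portable stand-in for `cat`.
CAT = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]


class LazyCoprocess:
    """Start `cmd` only on the first write, like gawk's |& does."""

    def __init__(self, cmd):
        self.cmd = cmd
        self.proc = None

    def write_line(self, line):
        if self.proc is None:            # first I/O starts the process
            self.proc = subprocess.Popen(
                self.cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
        self.proc.stdin.write(line + "\n")

    def finish(self):
        """Close the write side and read what is left, if the process ever ran."""
        if self.proc is None:            # never started: nothing to wait for
            return []
        self.proc.stdin.close()          # like close(cmd, "to")
        lines = [line.rstrip("\n") for line in self.proc.stdout]
        self.proc.wait()                 # like close(cmd)
        self.proc.stdout.close()
        return lines
```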