Function doesn't have the same behavior if called from a function - awk

I have written a small function that converts a hex number to decimal using bc. When called directly, it works, however when called multiple times from a single statement in another function, it fails.
Here's a demo script to reproduce the issue:
awk 'function hd(h) {
cmd=sprintf("echo \"ibase=16; obase=A; %s\"|bc", h);
cmd|getline d;
printf("hd(%s)=%s\n", h, d);
return d;
}
function test() {
printf("A=%d, FF=%d\n", hd("A"), hd("FF"));
}
BEGIN {
printf("A=%d, FF=%d\n", hd("A"), hd("FF"));
test();
}'
Here is the output of this:
hd(A)=10
hd(FF)=255
A=10, FF=255
hd(A)=255
hd(FF)=255
A=255, FF=255
As you can see, when executed directly in BEGIN, it works; however when executed through the test() function it fails.
I am using GNU Awk 3.1.5. I also tried GNU Awk 4.1.1 on another machine, it fails in a similar fashion.

The problem is, you didn't close the pipe after cmd|getline d.
adding
close(cmd)
after the getline should fix your problem. getline should be used with caution.
P.S. printf in awk is a statement, not a function.

Related

Create timestamp with fractional seconds

awk can generate a timestamp with strftime function, e.g.
$ awk 'BEGIN {print strftime("%Y/%m/%d %H:%M:%S")}'
2019/03/26 08:50:42
But I need a timestamp with fractional seconds, ideally down to nanoseconds. gnu date can do this with the %N element:
$ date "+%Y/%m/%d %H:%M:%S.%N"
2019/03/26 08:52:32.753019800
But it is relatively inefficient to invoke date from within awk compared to calling strftime, and I need high performance as I'm processing many large files with awk and need to generate many timestamps while processing the files. Is there a way that awk can efficiently generate a timestamp that includes fractional seconds (ideally nanoseconds, but milliseconds would be acceptable)?
Adding an example of what I am trying to perform:
awk -v logFile="$logFile" -v outputFile="$outputFile" '
BEGIN {
print "[" strftime("%Y%m%d %H%M%S") "] Starting to process " FILENAME "." >> logFile
}
{
data[$1] += $2
}
END {
print "[" strftime("%Y%m%d %H%M%S") "] Processed " NR " records." >> logFile
for (id in data) {
print id ": " data[id] >> outputFile
}
}
' oneOfManyLargeFiles
If you are really in need of subsecond timing, then any call to an external command such as date or reading an external system file such as /proc/uptime or /proc/rct defeats the purpose of the subsecond accuracy. Both cases require to many resources to retrieve the requested information (i.e. the time)
Since the OP already makes use of GNU awk, you could make use of a dynamic extension. Dynamic extensions are a way of adding new functionality to awk by implementing new functions written in C or C++ and dynamically loading them with gawk. How to write these functions is extensively written down in the GNU awk manual.
Luckily, GNU awk 4.2.1 comes with a set of default dynamic libraries which can be loaded at will. One of these libraries is a time library with two simple functions:
the_time = gettimeofday()
Return the time in seconds that has elapsed since 1970-01-01 UTC as a floating-point value. If the time is unavailable on this platform, return -1 and set ERRNO. The returned time should have sub-second precision, but the actual precision may vary based on the platform. If the standard C gettimeofday() system call is available on this platform, then it simply returns the value. Otherwise, if on MS-Windows, it tries to use GetSystemTimeAsFileTime().
result = sleep(seconds)
Attempt to sleep for seconds seconds. If seconds is negative, or the attempt to sleep fails, return -1 and set ERRNO. Otherwise, return zero after sleeping for the indicated amount of time. Note that seconds may be a floating-point (nonintegral) value. Implementation details: depending on platform availability, this function tries to use nanosleep() or select() to implement the delay.
source: GNU awk manual
It is now possible to call this function in a rather straightforward way:
awk '#load "time"; BEGIN{printf "%.6f", gettimeofday()}'
1553637193.575861
In order to demonstrate that this method is faster then the more classic implementations, I timed all 3 implementations using gettimeofday():
awk '#load "time"
function get_uptime( a) {
if((getline line < "/proc/uptime") > 0)
split(line,a," ")
close("/proc/uptime")
return a[1]
}
function curtime( cmd, line, time) {
cmd = "date \047+%Y/%m/%d %H:%M:%S.%N\047"
if ( (cmd | getline line) > 0 ) {
time = line
}
else {
print "Error: " cmd " failed" | "cat>&2"
}
close(cmd)
return time
}
BEGIN{
t1=getimeofday(); curtime(); t2=gettimeofday();
print "curtime()",t2-t1
t1=getimeofday(); get_uptime(); t2=gettimeofday();
print "get_uptime()",t2-t1
t1=getimeofday(); gettimeofday(); t2=gettimeofday();
print "gettimeofday()",t2-t1
}'
which outputs:
curtime() 0.00519109
get_uptime() 7.98702e-05
gettimeofday() 9.53674e-07
While it is evident that curtime() is the slowest as it loads an external binary, it is rather startling to see that awk is blazingly fast in processing an extra external /proc/ file.
If you are on Linux, you could use /proc/uptime:
$ cat /proc/uptime
123970.49 354146.84
to get some centiseconds (the first value is the uptime) and compute the time difference between the beginning and whenever something happens:
$ while true ; do echo ping ; sleep 0.989 ; done | # yes | awk got confusing
awk '
function get_uptime( a, line) {
if((getline line < "/proc/uptime") > 0)
split(line,a," ")
close("/proc/uptime")
return a[1]
}
BEGIN {
basetime=get_uptime()
}
{
if(!wut) # define here the cause
print get_uptime()-basetime # calculate time difference
}'
Output:
0
0.99
1.98
2.97
3.97

Awk array, replace with full length matches of keys

I want to replace strings in a target file (target.txt) by strings in a lookup table (lookup.tab), which looks as follows.
Seq_1 Name_one
Seq_2 Name_two
Seq_3 Name_three
...
Seq_10 Name_ten
Seq_11 Name_eleven
Seq_12 Name_twelve
The target.txt file is a large file with a tree structure (Nexus format). It is not arranged in columns.
Therefore I use the following command:
awk 'FNR==NR { array[$1]=$2; next } { for (i in array) gsub(i, array[i]) }1' "lookup.tab" "target.txt"
Unfortunately, this command does not take the full length of the elements from the first column, so that Seq_1, Seq_10, Seq_11, Seq_12 end up as Name_one, Name_one0, Name_one1, Name_one2 etc...
How can the awk command be made more specific to correctly substitute the strings?
Try this please, see if it meets your need:
awk 'FNR==NR { le=length($1); a[le][$1]=$2; if (maxL<le) maxL=le; next } { for(le=maxL;le>0;le--) if(length(a[le])) for (i in a[le]) gsub(i, a[le][i]) }1' "lookup.tab" "target.txt"
It's based on your own trying, but instead of randomly replace using the hashes in the array, replace using those longer keys first.
By this way, and based on your examples, I think it's enough to avoid wrongly substitudes.

TCL, How to name a variable that includes another variable

In TCL, I'm writing a procedure that returns the depth of a clock.
But since I have several clocks I want to name the var: depth_$clk
proc find_depth {} {
foreach clk $clocks {
…
set depth_$clk $max_depth
echo $depth_$clk
}
}
But I get:
Error: can't read "depth_": no such variable
Use error_info for more info. (CMD-013)
Your problem is this line:
echo $depth_$clk
The issue is that the syntax for $ only parses a limited set of characters afterwards for being part of the variable name; the $ is not part of that. Instead, you can use the set command with one argument; $ is effectively syntactic sugar for that, but the command lets you use complex substitutions.
echo [set depth_$clk]
HOWEVER!
The real correct thing to do here is to switch to using an associative array. It's a bit larger change to your code, but lets you do more as you've got proper access to substitutions in array element names:
proc find_depth {} {
foreach clk $clocks {
…
set depth($clk) $max_depth
echo $depth($clk)
}
}
echo ${depth}_$cell
This can help too.
Thanks !

How to handle 3 files with awk?

Ok, so after spending 2 days, I am not able solve it and I am almost out of time now. It might be a very silly question, so please bear with me. My awk script does something like this:
BEGIN{ n=50; i=n; }
FNR==NR {
# Read file-1, which has just 1 column
ids[$1]=int(i++/n);
next
}
{
# Read file-2 which has 4 columns
# Do something
next
}
END {...}
It works fine. But now I want to extend it to read 3 files. Let's say, instead of hard-coding the value of "n", I need to read a properties file and set value of "n" from that. I found this question and have tried something like this:
BEGIN{ n=0; i=0; }
FNR==NR {
# Block A
# Try to read file-0
next
}
{
# Block B
# Read file-1, which has just 1 column
next
}
{
# Block C
# Read file-2 which has 4 columns
# Do something
next
}
END {...}
But it is not working. Block A is executed for file-0, I am able to read the property from properties files. But Block B is executed for both files file-1 and file-2. And Block C is never executed.
Can someone please help me solve this? I have never used awk before and the syntax is very confusing. Also, if someone can explain how awk reads input from different files, that will be very helpful.
Please let me know if I need to add more details to the question.
If you have gawk, just test ARGIND:
awk '
ARGIND == 1 { do file 1 stuff; next }
ARGIND == 2 { do file 2 stuff; next }
' file1 file2
If you don't have gawk, get it.
In other awks though you can just test for the file name:
awk '
FILENAME == ARGV[1] { do file 1 stuff; next }
FILENAME == ARGV[2] { do file 2 stuff; next }
' file1 file2
That only fails if you want to parse the same file twice, if that's the case you need to add a count of the number of times that file's been opened.
Update: The solution below works, as long as all input files are nonempty, but see #Ed Morton's answer for a simpler and more robust way of adding file-specific handling.
However, this answer still provides a hopefully helpful explanation of some awk basics and why the OP's approach didn't work.
Try the following (note that I've made the indices 1-based, as that's how awk does it):
awk '
# Increment the current-file index, if a new file is being processed.
FNR == 1 { ++fIndex }
# Process current line if from 1st file.
fIndex == 1 {
print "file 1: " FILENAME
next
}
# Process current line if from 2nd file.
fIndex == 2 {
print "file 2: " FILENAME
next
}
# Process current line (from all remaining files).
{
print "file " fIndex ": " FILENAME
}
' file-1 file-2 file-3
Pattern FNR==1 is true whenever a new input file is starting to get processed (FNR contains the input file-relative line number).
Every time a new file starts processing, fIndexis incremented and thus reflects the 1-based index of the current input file. Tip of the hat to #twalberg's helpful answer.
Note that an uninitialized awk variable used in a numeric context defaults to 0, so there's no need to initialize fIndex (unless you want a different start value).
Patterns such as fIndex == 1 can then be used to execute blocks for lines from a specific input file only (assuming the block ends in next).
The last block is then executed for all input files that don't have file-specific blocks (above).
As for why your approach didn't work:
Your 2nd and 3rd blocks are potentially executed unconditionally, for lines from all input files, because they are not preceded by a pattern (condition).
So your 2nd block is entered for lines from all subsequent input files, and its next statement then prevents the 3rd block from ever getting reached.
Potential misconceptions:
Perhaps you think that each block functions as a loop processing a single input file. This is NOT how awk works. Instead, the entire awk program is processed in a loop, with each iteration processing a single input line, starting with all lines from file 1, then from file 2, ...
An awk program can have any number of blocks (typically preceded by patterns), and whether they're executed for the current input line is solely governed by whether the pattern evaluates to true; if there is no pattern, the block is executed unconditionally (across input files). However, as you've already discovered, next inside a block can be used to skip subsequent blocks (pattern-block pairs).
Perhaps you need to consider adding some additional structure like this:
BEGIN { file_number=1 }
FNR==1 { ++file_number }
file_number==3 && /something_else/ { ...}

Vim: Writing function skeleton

When writing new functions with Vim I always seem to have to do a bit of "manual" work.
x = Cursor position
If I start typing a function and insert a couple of curly braces and the end
function Apples() {x}
Then hit Enter, it obviously looks like this
function Apples() {
x}
Which results in me having to ESC,O in order to shift the closing curlybrace down.
While this may seem like a trivial matter, doing this over the last 5 months is getting bothersome, and I know there are plenty like me out there who knows there should be an elegant solution to this. I am open to plugin-suggestions.
You can use a simple mapping like this one (there are dozens of variants floating around SO and the web):
inoremap {} {<CR>}<C-o>O
You can also search www.vim.org for a dedicated plugin.
But I strongly recommend you to try a snippet-expansion plugin like Snipmate or UltiSnips. Both plugins follow the same model but they have slightly different snippet syntaxes and features. It works like that:
you type a trigger:
fun
you hit <Tab> and you get the following with function_name in select mode:
function [function_name]() {
}
you type your desired name:
function Apples|() {
}
you hit <Tab> to place the cursor between the parentheses:
function Apples(|) {
}
you hit <Tab> again to place the cursor on the line below with the correct indentation:
function Apples() {
|
}
With lh-bracket (in C or C++ mode), when you hit enter whilst in between two brackets, the newlines you are expecting are inserted.
The idea is to test for: getline(".")[col(".")-2:col(".")-1]=="{}" and execute/insert "\<cr>\<esc>O" when the condition is true, or "\<cr>" otherwise.
Within lh-bracket, I have the following into a C-ftplugin:
call lh#brackets#enrich_imap('<cr>',
\ {'condition': 'getline(".")[col(".")-2:col(".")-1]=="{}"',
\ 'action': 'Cpp_Add2NewLinesBetweenBrackets()'},
\ 1,
\ '\<cr\>'
\ )
function! Cpp_Add2NewLinesBetweenBrackets()
return "\<cr>\<esc>O"
endfunction
I guess (code not tested) it would (*) translate into:
" put it into ftplugin/{yourfiltetype, javascript?}.vim
inoremap <buffer> <silent> <expr> <cr> s:SmartCR()
function s:SmartCR()
return getline(".")[col(".")-2:col(".")-1]=="{}"
\ ? "\<cr>\<esc>O"
\ : "\<cr>"
endfunction
(*) Actually, lh#brackets#enrich_imap does a few other things (the mapping is compatible with LaTeXSuite's IMAP.vim presence ; the mapping can be toggled on/off along with all the other mappings from lh-brackets)