gawk: Local variables typed differently in SYMTAB

Function parameters seem to be typed differently in the SYMTAB array than if you check them directly.
i.e. typeof(p) is not always equal to typeof(SYMTAB["p"])
I'm trying to write a generic function to initialize a one-dimensional array, with a variable number of elements. This issue is specific to gawk as it uses the SYMTAB array that doesn't exist in standard awk.
For example:
array_new(array1, 1, 2, "c", 4, "e")
array_new(array2, 99, 98, 97)
function array_new(arr, p1, p2, p3, p4, p5, p6) {
# initialization code
}
gawk doesn't support a variable number of parameters, so I have set a reasonable upper limit (6 for the example above, though in practice I've been using 20). Within the function I only want to create elements for which a value has been given.
A first "brute force" approach that I've tried, and which works, is as follows:
function array_new(arr, p1, p2, p3, p4, p5, p6) {
delete arr
if (typeof(p1) != "untyped") arr[1] = p1
if (typeof(p2) != "untyped") arr[2] = p2
if (typeof(p3) != "untyped") arr[3] = p3
if (typeof(p4) != "untyped") arr[4] = p4
if (typeof(p5) != "untyped") arr[5] = p5
if (typeof(p6) != "untyped") arr[6] = p6
return
}
But the more parameters I want to allow, the more unwieldy this becomes, so I wanted to loop through the parameters and exit as soon as an uninitialized one is encountered.
So I tried the following, but it doesn't work:
function array_new_2(arr, p1, p2, p3, p4, p5, p6, n) {
for (n=1; n<=6; n++) {
if (typeof(SYMTAB["p"n]) != "untyped") arr[n] = SYMTAB["p"n]
else break
}
return
}
Trying to understand why, I ran some tests, and it turns out that typeof(p...) is "string", "number" or "untyped" depending on what is (or isn't) passed to the function, but typeof(SYMTAB["p"n]) is always "unassigned".
This is my complete code, with some debugging "print" statements added to see what is happening:
END {
array_new_1(aaa, 3, 2, 1)
array_walk(aaa, "aaa")
print ""
array_new_2(bbb, "a", "b", "c")
array_walk(bbb, "bbb")
print ""
}
function array_walk(arr, name, i) {
for (i in arr) {
if (isarray(arr[i]))
array_walk(arr[i], (name "[" i "]"))
else
printf("%s[%s] = '%s'\n", name, i, arr[i])
}
}
function array_new_1(arr, p1, p2, p3, p4, p5, p6) {
print "p1 : " typeof(p1), p1
print "p6 : " typeof(p6), p6
print "SYMTAB[\"p1\"] : " typeof(SYMTAB["p1"]), SYMTAB["p1"]
print "SYMTAB[\"p6\"] : " typeof(SYMTAB["p6"]), SYMTAB["p6"] # this creates an "unassigned" value
if (typeof(p1) != "untyped") arr[1] = p1
if (typeof(p2) != "untyped") arr[2] = p2
if (typeof(p3) != "untyped") arr[3] = p3
if (typeof(p4) != "untyped") arr[4] = p4
if (typeof(p5) != "untyped") arr[5] = p5
if (typeof(p6) != "untyped") arr[6] = p6
return
}
function array_new_2(arr, p1, p2, p3, p4, p5, p6, n) {
print "p1 : " typeof(p1), p1
print "p6 : " typeof(p6), p6
print "SYMTAB[\"p1\"] : " typeof(SYMTAB["p1"]), SYMTAB["p1"]
print "SYMTAB[\"p6\"] : " typeof(SYMTAB["p6"]), SYMTAB["p6"]
for (n=1; n<=6; n++) {
if (typeof(SYMTAB["p"n]) == "unassigned") arr[n] = SYMTAB["p"n]
else break
}
return
}
This produces the following output:
p1 : number 3
p6 : untyped
SYMTAB["p1"] : unassigned
SYMTAB["p6"] : unassigned
aaa[1] = '3'
aaa[2] = '2'
aaa[3] = '1'
aaa[6] = ''
p1 : string a
p6 : untyped
SYMTAB["p1"] : unassigned
SYMTAB["p6"] : unassigned
bbb[1] = ''
bbb[2] = ''
bbb[3] = ''
bbb[4] = ''
bbb[5] = ''
bbb[6] = ''
So, to sum up: can anyone explain why the SYMTAB versions of the parameters are typed differently from the variables themselves? And is there a better way to write my array_new function?

It appears (to me) that the SYMTAB[]/typing question is a diversion from the main issue: how to (efficiently) provide a variable number of input parameters to an awk function. [NOTE: This is similar to the issue of wanting to feed a variable number of values on the command line via several -v var=val clauses.]
One common workaround is to concatenate the (variable) list of values into a single delimited string, then have the code split() this delimited string into a variable length array of values.
Applying this to OP's case:
array_new(array1, "1,2,c,4,e") # concatenate the list of values into a comma-delimited string
function array_new(arr, plist,    p, n, i) { # the extra parameters p, n and i act as local variables
n=split(plist,p,",") # split the comma-delimited string into an array named p[], with "n" elements in the array
for (i=1;i<=n;i++) { # loop through the indices of the p[] array (1 to n)
# do something with p[i]
}
}
If the sole purpose of OP's function is to populate an array, then we can use the input parameter (arr) as the target of the split() call, e.g.:
function array_new(arr, plist) {
split(plist,arr,",")
}
Running some tests:
array_new(array1, "1,2,c,4,e")
print "########### array1"
for (i=1;i<=length(array1);i++)
print "array1[" i "] = " array1[i]
array_new(array2, "99,98,97")
print "########### array2"
for (i=1;i<=length(array2);i++)
print "array2[" i "] = " array2[i]
This generates:
########### array1
array1[1] = 1
array1[2] = 2
array1[3] = c
array1[4] = 4
array1[5] = e
########### array2
array2[1] = 99
array2[2] = 98
array2[3] = 97
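The test loops above use length(array1); gawk and most modern awks accept length() on an array, but older POSIX awk does not guarantee it. Since split() already returns the number of elements it created, the wrapper can simply pass that count back. A minimal sketch, assuming any POSIX-style awk (gawk is not required); the shell function name array_new_demo is hypothetical:

```shell
# Hypothetical demo wrapper; only the awk program inside matters.
array_new_demo() {
  awk '
    # split() deletes any existing elements of arr, fills arr[1..n]
    # and returns n, so the caller gets the element count for free.
    function array_new(arr, plist) {
      return split(plist, arr, ",")
    }
    BEGIN {
      n = array_new(array1, "1,2,c,4,e")
      for (i = 1; i <= n; i++)
        print "array1[" i "] = " array1[i]
      n = array_new(array2, "99,98,97")
      for (i = 1; i <= n; i++)
        print "array2[" i "] = " array2[i]
    }
  '
}

array_new_demo
```

This prints the same eight lines as the test run above, without calling length() on an array.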
NOTE: for this sample code I've used a comma as the delimiter; if the data could include commas, OP will (obviously) need to switch to a different delimiter that doesn't show up in the data.
Then again, if OP's only need for the function is to populate the array, there's really no need for a user-defined function to serve as a wrapper for the awk-supplied split() function.
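One way to pick such a delimiter, assuming the values never contain awk's SUBSEP character (by default "\034", a control character that rarely appears in text data), is to join the values with SUBSEP itself. This is a sketch rather than part of the answer above; the shell function name subsep_demo is hypothetical:

```shell
# Hypothetical demo wrapper around a POSIX awk program.
subsep_demo() {
  awk '
    function array_new(arr, plist) {
      # SUBSEP is a single non-blank character, so split() uses it literally
      return split(plist, arr, SUBSEP)
    }
    BEGIN {
      # The values themselves may now contain commas
      n = array_new(vals, "1,2" SUBSEP "c" SUBSEP "4,e")
      for (i = 1; i <= n; i++)
        print "vals[" i "] = " vals[i]
    }
  '
}

subsep_demo
```

The call site is slightly noisier, but no printable character has to be reserved as a delimiter.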
In other words, all of this:
array_new(array1, "1,2,c,4,e")
array_new(array2, "99,98,97")
function array_new(arr, plist) {
split(plist,arr,",")
}
Can be replaced by this:
split("1,2,c,4,e", array1, ",")
split("99,98,97", array2, ",")

Looking solely at the SYMTAB[] issue ...
It doesn't appear that SYMTAB[] is aware of 'local' parameters as defined in the function's parameter list unless they are also referenced in the body of the function, but even then it appears that SYMTAB[<param>] doesn't contain the value of the parameter.
Consider the following:
awk '
function array_new_2(arr, p1, p2, p3, p4, p5, p6, n) {
print "p1",p1,SYMTAB["p1"],typeof(SYMTAB["p1"])
print "p6",p6,SYMTAB["p6"],typeof(SYMTAB["p6"])
print "XX",XX,SYMTAB["XX"],typeof(SYMTAB["XX"])
print "############# pX in SYMTAB ?"
for (i=1;i<=6;i++)
print "p"i ("p"i in SYMTAB ? "" : " not") " in SYMTAB[]; SYMTAB[p"i"]=" SYMTAB["p"i]
print "############# for loop:"
for (i in SYMTAB)
if (isarray(SYMTAB[i]) )
print i,"[ array ]"
else
print i,SYMTAB[i]
print "#############"
arr[1]=p1
}
BEGIN {
OFS=","
array_new_2(arrX,1,2,3) # provide values for p1/p2/p3
}
'
This generates:
p1,1,,unassigned # SYMTAB["p1"] is empty/undefined
p6,,,unassigned # no value supplied for p6, SYMTAB["p6"] is empty/undefined
XX,,,unassigned # undefined variable "XX"
############# pX in SYMTAB ?
p1 in SYMTAB[]; SYMTAB[p1]= # in SYMTAB[] because referenced in body of function, but value not in SYMTAB[] !!
p2 not in SYMTAB[]; SYMTAB[p2]= # value provided but not in SYMTAB[] !!
p3 not in SYMTAB[]; SYMTAB[p3]= # value provided but not in SYMTAB[] !!
p4 not in SYMTAB[]; SYMTAB[p4]=
p5 not in SYMTAB[]; SYMTAB[p5]=
p6 in SYMTAB[]; SYMTAB[p6]= # no value provided but in SYMTAB[] because referenced in body of function
############# for loop:
ARGV,[ array ]
i,i
ROUNDMODE,N
ORS,
OFS,,
LINT,0
FNR,0
ERRNO,
NR,0
IGNORECASE,0
p1, # referenced in body of function; SYMTAB[] is empty/undefined
TEXTDOMAIN,messages
NF,0
ARGIND,0
arrX,[ array ] # without "arr[1]=p1" the variable "arrX" is treated as a scalar and SYMTAB["arrX"] is empty/undefined
XX,
ARGC,1
PROCINFO,[ array ]
FIELDWIDTHS,
CONVFMT,%.6g
SUBSEP,
PREC,53
ENVIRON,[ array ]
RS,
FPAT,[^[:space:]]+
p6, # referenced in body of function; SYMTAB[] is empty/undefined
RT,
RLENGTH,0
OFMT,%.6g
FS,
RSTART,0
FILENAME,
BINMODE,0
#############
NOTES:
p1 (=1) and p6 (empty/undefined) show up in the for loop output because they are referenced in the body of the function.
p2 (=2) and p3 (=3) have values supplied by the function call, but they are never referenced in the body of the function, so they don't show up in the for loop output.
p4 (empty/undefined) and p5 (empty/undefined) are never referenced in the body of the function, so they also don't show up in the for loop output.
From this small test it appears:
SYMTAB[] cannot be used to determine if a function's input parameters were supplied values
SYMTAB[] cannot be used to access the actual values passed in the function call

Related

Perl: Combine duplicated keys in Hash of Array

I'm having issues with this and wondering if someone could provide some help. I'm parsing a .txt file and want to combine duplicated keys and their values. Essentially, for each identifier I want to store its height value. Each "sample" has 2 entries (A & B). I have the file stored like this:
while(...){
@data = split("\t", $line);
$curr_identifier= $data[0];
$markername= $data[1];
$position1= $data[2];
$height= $data[4];
if ($line >0){
$result[0] = $markername;
$result[1] = $position1;
$result[2] = $height;
$result[3] = $curr_identifier;
$data{$curr_identifier} = [@result];
}
}
This seems to work fine, but my issue is that when I send this data to the function below, it prints $curr_identifier twice. I only want to populate unique identifiers and check for the presence of their $height variable.
if (!defined $data{$curr_identifier}[2]){
$output1= "no height for both markers- failed";
} else {
if ($data{$curr_identifier}[2] eq " ") {
$output1 = $markername;
}
}
print $curr_identifier, $output1 . "\t" . $output1 . "\n";
Basically, if the sample height is present for both markers (A & B), then the output is both markers:
'1', 'A', 'B'
If a height is not present, then the output is empty for that marker:
'2', 'A', ' '
'3', ' ', 'B'
My current output is printing out like this:
1, A
1, B
2, A
2, ' '
3, ' '
3, B
__DATA__
Name Marker Position1 Height Time
1 A A 6246 0.9706
1 B B 3237 0.9706
2 A 0
2 B B 5495 0.9775
3 A A 11254 0.9694
3 B 0
Your desired output can essentially be boiled down to these few lines of Perl code:
while (<DATA>) {
($name,$mark,$pos,$heig,$time) = split /\t/;
print "'$name','$mark','$pos'\n";
}
__DATA__
... your tab-separated data here ...

Can one define a one dimensional image inline?

I would like to describe a very simple image (really a vector) of length 2, like (1,2) for the purpose of some linear algebra.
The following creates a two dimensional image with a y axis of length 1:
image a := [2,1]: {
{1, 2}
}
MatrixPrint(a)
This outputs
{
{1, 2}
}
How would I in a similar fashion output this instead?
{123,45}
Additionally, if I had image of arbitrary shape (a, b), how can I slice it to extract a one dimensional image at a value n, either along the x or y axes? (Extracting a line profile along one of the image axes)
In your example you do define a 2D image, so you get a 2D output. If the image really were 1D, your output would be 1D, i.e.:
image a := [2]: {123, 45}
MatrixPrint(a)
So your second question actually is the answer to your first: You need to do a 1D slice of the data, which you can do with the command slice1() as follows:
image a := [2,1]: {
{123, 45}
}
MatrixPrint( a.slice1(0,0,0,0,2,1) )
Note some peculiarities of the command:
The command always assumes the input is 3D, so the first 3 parameters are the start-index triplet x/y/z even if the data is just 2D or 1D.
The second triplet specifies the sampling of the slice: first the dimension index (0=x), then the number of sampling steps (2), and then the step size (1).
Similar slice commands exist for 2D slices, 3D slices and nD slices from nD data.
The MatrixPrint command only outputs to the Results window; there is no way to reroute its output to a string. However, you can easily write yourself a method that does (albeit not very fast for big data):
string VectorPrint( image img, string FormatStr, number maxNum )
{
if ( !img.ImageIsValid() ) return "{invalid}"
if ( 1 != img.ImageGetNumDimensions() ) return "{not 1D}"
string out = "{ "
number nx = img.ImageGetDimensionSize(0)
if (( nx <= maxNum ) || ( maxNum <= 2) )
{
for( number i=0; i<min(nx,maxNum); i++)
out += Format( sum(img[0,i]), FormatStr ) + ", "
out = out.left( out.len() - 2 )
}
else
{
for( number i=0; i<maxNum-1; i++)
out += Format( sum(img[0,i]), FormatStr ) + ", "
out = out.left( out.len() - 2 ) + ", ... , "
out += Format( sum(img[0,nx-1]), FormatStr )
}
out += " }"
return out
}
image a := [10,4]: {
{1,2,3,4,5,6,7,8,9,10},
{123, 45, 12.3, -12, 55, 1.2, 9999, 89.100, 1e-10, 0},
{0,0,0,0,0,0,0,0,0,0},
{1,2,3,4,5,6,7,8,9,10}
}
// Slice 2D image to 1D image at n'th line
number n = 1
image line := a.slice1(0,n,0,0,a.ImageGetDimensionSize(0),1)
// Printout with given number format and a maximum number of entries
string fStr = "%3.1f"
number maxN = 3
Result( "\n "+VectorPrint( line, fStr, maxN ) )

Inverse of `split` function: `join` a string using a delimiter

In Red and Rebol 3, you can use the split function to split a string into a list of items:
>> items: split {1, 2, 3, 4} {,}
== ["1" " 2" " 3" " 4"]
What is the corresponding inverse function to join a list of items into a string? It should work similar to the following:
>> join items {, }
== "1, 2, 3, 4"
There's no built-in function yet; you have to implement it yourself:
>> join: function [series delimiter][length: either char? delimiter [1][length? delimiter] out: collect/into [foreach value series [keep rejoin [value delimiter]]] copy {} remove/part skip tail out negate length length out]
== func [series delimiter /local length out value][length: either char? delimiter [1] [length? delimiter] out: collect/into [foreach value series [keep rejoin [value delimiter]]] copy "" remove/part skip tail out negate length length out]
>> join [1 2 3] #","
== "1,2,3"
>> join [1 2 3] {, }
== "1, 2, 3"
Per request, here is the function split into more lines:
join: function [
series
delimiter
][
length: either char? delimiter [1][length? delimiter]
out: collect/into [
foreach value series [keep rejoin [value delimiter]]
] copy {}
remove/part skip tail out negate length length
out
]
There is an old modification of rejoin that does this:
rejoin: func [
"Reduces and joins a block of values - allows /with refinement."
block [block!] "Values to reduce and join"
/with join-thing "Value to place in between each element"
][
block: reduce block
if with [
while [not tail? block: next block][
insert block join-thing
block: next block
]
block: head block
]
append either series? first block [
copy first block
] [
form first block
]
next block
]
Call it like this: rejoin/with [..] delimiter
But I am pretty sure there are other, even older solutions.
The following function works:
myjoin: function[blk[block!] delim [string!]][
outstr: copy "" ; copy, so repeated calls don't keep appending to the same literal string
repeat i ((length? blk) - 1)[
append outstr blk/1
append outstr delim
blk: next blk ]
append outstr blk ]
probe myjoin ["A" "B" "C" "D" "E"] ", "
Output:
"A, B, C, D, E"

How to create triangle mesh from list of points in Elm

Let's say I have a list of points
[p1,p2,p3,p4,p5,p6, ...] or [[p1,p2,p3,...],[...]...]
where p1,p2,p3 are one stripe and p4,p5,p6 the other.
p1 - p4 - p7 ...
| / | / |
p2 - p5 - p8 ...
| / | / |
p3 - p6 - p9 ...
. . .
. . .
. . .
How can I transform this into a list of
[(p1,p2,p4), (p4,p5,p2), (p2,p3,p5), (p5,p6,p3), ...]
Is there a way to do this without converting the list into an Array, using get, and handling all the Maybes?
First let's define how to split a square into two triangles:
squareToTriangles : a -> a -> a -> a -> List (a, a, a)
squareToTriangles topLeft botLeft topRight botRight =
[ (topLeft, botLeft, topRight)
, (topRight, botRight, botLeft)
]
Now, since squares are made of two lists, let's assume you can use a list of tuples as input. Now you can make triangles out of lists of left/right points:
triangles : List (a, a) -> List (a, a, a)
triangles list =
case list of
(tl, tr) :: ((bl, br) :: _ as rest) ->
List.append
(squareToTriangles tl bl tr br)
(triangles rest)
_ ->
[]
Of course, your input doesn't involve tuples, so let's define something that takes a list of lists as input:
triangleMesh : List (List a) -> List (a, a, a)
triangleMesh list =
case list of
left :: (right :: _ as rest) ->
List.append
(triangles <| List.map2 (,) left right)
(triangleMesh rest)
_ ->
[]
Now you can pass in your list of lists, such that:
triangleMesh [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
-- yields...
[(1,2,4),(4,5,2),(2,3,5),(5,6,3),(4,5,7),(7,8,5),(5,6,8),(8,9,6)]
Note that this can probably be optimized by using a better method than List.append, but the general algorithm holds.
You can simply pattern match on your list as follows:
toMesh: List Float -> List (Float, Float, Float)
toMesh list =
case list of
[ p1, p2, p3, p4, p5, p6] ->
[(p1,p2,p4), (p4,p5,p2), (p2,p3,p5), (p5,p6,p3)]
_ ->
[]

Awk - Substring comparison

Working native bash code:
while read line
do
a=${line:112:7}
b=${line:123:7}
if [[ $a != "0000000" || $b != "0000000" ]]
then
echo "$line" >> FILE_OT_YHAV
else
echo "$line" >> FILE_OT_NHAV
fi
done <$FILE_IN
I have the following file (it's a dummy). The substrings being checked are both in the 4th field, so never mind the exact numbers.
AAAAAAAAAAAAAA XXXXXX BB CCCCCCC 12312312443430000000
BBBBBBB AXXXXXX CC DDDDDDD 10101010000000000000
CCCCCCCCCC C C QWEQWEE DDD AAAAAAA A12312312312312310000
I'm trying to write an awk script that compares two specific substrings: if either one is not 0000000, it writes the line to file A; if both of them are 0000000, it writes the line to file B. This is the code I have so far:
# Before first line.
BEGIN {
print "Awk Started"
FILE_OT_YHAV="FILE_OT_YHAV.test"
FILE_OT_NHAV="FILE_OT_NHAV.test"
FS=""
}
# For each line of input.
{
fline=$0
# print "length = #" length($0) "#"
print "length = #" length(fline) "#"
print "##" substr($0,112,7) "##" substr($0,123,7) "##"
if ( (substr($0,112,7) != "0000000") || (substr($0,123,7) != "0000000") )
print $0 > FILE_OT_YHAV;
else
print $0 > FILE_OT_NHAV;
}
# After last line.
END {
print "Awk Ended"
}
The problem is that when I run it, it:
a) treats every line as having a different length;
b) therefore applies the substrings to different parts of each line (that is why I added the length printing before the if, to check on it).
This is a sample of the line lengths awk reports and the substrings it extracts:
Awk Started
length = #130#
## ## ##
length = #136#
##0000000##22016 ##
length = #133#
##0000001##16 ##
length = #129#
##0010220## ##
length = #138#
##0000000##1022016##
length = #136#
##0000000##22016 ##
length = #134#
##0000000##016 ##
length = #137#
##0000000##022016 ##
Is there a reason why awk treats lines of the same length as having a different length? Does it have something to do with the spacing of the input file?
Thanks in advance for any help.
After the comments about cleaning the file up with sed, I got this output (and yes, now the lines have different sizes):
1 0M-DM-EM-G M-A.M-E. #DEH M-SM-TM-OM-IM-WM-EM-IM-A M-DM-V/M-DM-T/M-TM-AM-P 01022016 $
2 110000080103M-CM-EM-QM-OM-MM-TM-A M-A. 6M-AM-HM-GM-MM-A 1055801001102 0000120000012001001142 19500000120 0100M-D000000000000000000000001022016 $
3 110000106302M-TM-AM-QM-EM-KM-KM-A 5M-AM-AM-HM-GM-MM-A 1043801001101 0000100000010001001361 19500000100M-IM-SM-O0100M-D000000000000000000000001022016 $
4 110000178902M-JM-AM-QM-AM-CM-IM-AM-MM-MM-G M-KM-EM-KM-AM-S 71M-AM-HM-GM-MM-A 1136101001101 0000130000013001006061 19500000130 0100M-D000000000000000000000001022016 $
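A sketch of a rewrite, under two assumptions the question does not confirm: the file is fixed-width when measured in bytes, and the varying lengths come from the locale (the M- sequences in the cat -A output show high-bit bytes, for example Greek text). Running awk with LC_ALL=C makes it count bytes instead of locale characters. Independently of that, the awk offsets are off by one: bash's ${line:112:7} uses a 0-based offset, while awk's substr() is 1-based, so the matching calls are substr($0, 113, 7) and substr($0, 124, 7). The function name route_by_flags is hypothetical:

```shell
# Hypothetical helper: route_by_flags INPUT YES_FILE NO_FILE
route_by_flags() {
  # LC_ALL=C: measure offsets in bytes, whatever the file's encoding
  LC_ALL=C awk -v yes="$2" -v no="$3" '
    {
      # bash ${line:112:7} starts at 0-based offset 112, i.e. awk position 113
      a = substr($0, 113, 7)
      b = substr($0, 124, 7)
      if (a != "0000000" || b != "0000000")
        print > yes    # at least one flag is not all zeros
      else
        print > no     # both flags are all zeros
    }
  ' "$1"
}
```

Usage would be along the lines of route_by_flags "$FILE_IN" FILE_OT_YHAV FILE_OT_NHAV. Note that awk's print > file truncates each output file when it is first opened in a run, whereas the bash loop's >> appends to existing files.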