Is there a way to merge 2 arrays in GREL - openrefine

In a GREL expression, is there a way to merge 2 arrays?
I tried ["a","b"]+["c","d"] but the result is a java error.

Short answer: not with GREL.
Here is the complete list of the "arrays" methods in GREL and their respective Java code. It should not be very difficult to add a "merge" or "append" method, but would it be worth it? It is very rare to have more than one array in a cell (I have never encountered this case).
It is precisely to handle this kind of rare but possible case that OpenRefine offers two other, more powerful scripting languages: Jython and Clojure. In Python/Jython, the operation you want is as simple as:
return [1,2,3] + [3,4,5]  # result: [1, 2, 3, 3, 4, 5]
Would it be possible, and worth the effort, to make this easier with a new GREL function?

There is a way to do it (though it might be a bad idea):
split(join(["a","b"], "|") + "|" + join(["c","d"], "|"), "|")
Join each array with a delimiter character that does not appear in the data. (I've chosen the pipe character.) Add the resulting joined-up arrays together, and add the delimiter between them. Now, they form the string a|b|c|d. This string can be split on the | delimiter into a new array.
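For comparison, here is a minimal Python sketch of the same join-then-split idea (Python being the language OpenRefine exposes as Jython). The function name merge_via_delimiter is hypothetical, and the sketch assumes every element is a string. It also shows why the trick might be a bad idea: it only round-trips when no element contains the delimiter.

```python
def merge_via_delimiter(first, second, delimiter="|"):
    """Merge two lists of strings by joining them and splitting again."""
    # The trick only works if the delimiter never occurs in the data.
    for item in first + second:
        if delimiter in item:
            raise ValueError("delimiter %r appears in %r" % (delimiter, item))
    joined = delimiter.join(first) + delimiter + delimiter.join(second)
    return joined.split(delimiter)


print(merge_via_delimiter(["a", "b"], ["c", "d"]))  # ['a', 'b', 'c', 'd']
```

In this sketch an empty input list would leave an empty string in the result, which is one more reason to prefer plain list concatenation where the language allows it.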

Related

get each number in String and Compare in TCL/tk

I have string output:
1 4 2 1 4
I want to get each character in string to compare.
I want to do this to find out whether the list is already sorted.
It's not exactly clear to me what you are trying to achieve. Going by "to know whether the list is sorted", and assuming a list of integers, you can use tcl::mathop::< or tcl::mathop::<=, depending on whether you want to allow duplicate values:
if {[tcl::mathop::<= {*}$list]} {
    puts "List is sorted"
} else {
    puts "List is mixed up"
}
This will also work for ASCII comparison of strings. For more complex comparisons, like using dictionary rules or case insensitive, it's probably easiest to combine that with lsort along with the -indices option:
tcl::mathop::< {*}[lsort -indices -dictionary $list]
The -indices option returns the original index of each list element in sorted order. By checking if those indices are in incremental order, you know if the original list was already sorted.
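The two checks above can be sketched in Python for illustration. The function names is_sorted and is_sorted_by are hypothetical, and key=str.lower is only a stand-in for a custom ordering; it is not equivalent to Tcl's -dictionary rules.

```python
def is_sorted(values, strict=False):
    """Return True if each element is <= (or <, if strict) its successor."""
    pairs = list(zip(values, values[1:]))
    if strict:
        return all(a < b for a, b in pairs)
    return all(a <= b for a, b in pairs)


def is_sorted_by(values, key):
    """Sort the positions by key, then check they are still in order."""
    # sorted() is stable, so equal elements keep their original positions.
    order = sorted(range(len(values)), key=lambda i: key(values[i]))
    return is_sorted(order, strict=True)


numbers = [int(token) for token in "1 4 2 1 4".split()]
print(is_sorted(numbers))  # False
print(is_sorted_by(["apple", "Banana", "cherry"], key=str.lower))  # True
```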
Of course, if the point of the exercise was to avoid unnecessary sorting, then this is no use. But then again, lsort uses a merge sort, which handles an already sorted list quickly. So just sorting will probably be faster than first checking for a sorted list via a scripted loop.
To get each character in the string, do split $the_string "" (yes, on the empty string). That gives you a list of all the characters in the string; you can use foreach to iterate over them. Remember, you can iterate over two (or more) lists at once:
foreach c1 [split $the_string ""] c2 $target_comparison_list {
    if {$c1 ne $c2} {
        puts "The first not equal character is “$c1” when “$c2” was expected"
        break
    }
}
Note that it's rarely useful to continue comparison after a difference is found as the most common differences are (relative to the target string) insertions and deletions; almost everything after either of those will differ.

How to yield all substrings from string using sequence?

I'm trying to learn the Sequence in Kotlin.
Assume I want to get a sequence of all substrings of a string with the yield statement. I understand how to do this with two nested loops with the right and left borders.
It seems to me that there is an efficient way to use a Sequence or a pair of nested Sequences instead of loops. But I can't figure out how to do it.
How to yield all substrings from string using sequence?
Thanks
Frankly, I don't know what is the most efficient method. And I would just use for loops. But here's my solution to this problem, maybe it will help you understand sequences and this style of writing code:
Here it is on the Playground
fun String.substrings() =
    indices.asSequence().flatMap { left ->
        (left + 1..length).asSequence().map { right -> substring(left, right) }
    }
Sequences aren't especially efficient, there's a bunch of overhead involved for each one - their main strength is being able to pass each element through the whole chain of operations one at a time.
This means you don't have to create an entire new collection of elements for each intermediate step (lower memory usage), you can terminate earlier once you find a result you're looking for, and sequences can be infinite. Even then, they might still be slower than the normal list version, depending on exactly what you're working with.
The most efficient sequence is probably what you're already doing: a couple of for loops that yield items. But if by "efficient" you mean "using the standard library instead of writing out for loops", then @Furetur's answer is one way to do it, or you could use sliding windows like this:
val stuff = "12345"
val substrings = with(stuff) {
    indices.asSequence().flatMap { i ->
        windowedSequence(length - i)
    }
}
print(substrings.toList())
>>>>[12345, 1234, 2345, 123, 234, 345, 12, 23, 34, 45, 1, 2, 3, 4, 5]
Basically, this just uses windowedSequence (with the default of partialWindows=false) for every possible substring length, from length down to 1, using the sequence versions of everything.
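The lazy, one-element-at-a-time behaviour discussed in these answers is not specific to Kotlin. As an illustration only, here is the two-nested-loops version written as a Python generator; the function name substrings is hypothetical. It shows that a consumer can stop early without the remaining substrings ever being built.

```python
from itertools import islice


def substrings(text):
    """Lazily yield every non-empty substring of text."""
    for left in range(len(text)):
        for right in range(left + 1, len(text) + 1):
            yield text[left:right]


print(list(substrings("abc")))  # ['a', 'ab', 'abc', 'b', 'bc', 'c']
# Only the first three substrings are ever computed here.
print(list(islice(substrings("12345"), 3)))  # ['1', '12', '123']
```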

Referencing nested arrays in awk

I'm creating a bunch of mappings that can be indexed into using 3 keys such as below:
mappings["foo"]["bar"]["blah"][1]=0
split( "10,13,19,49", mappings["foo"]["bar"]["blah"] )
I can then index into the nested array using for example
mappings[product][format][version][i]
But this is a bit long-winded when I need to refer to the same nested array several times, so in other languages I'd create a reference to the inner array:
map=mappings[product][format][version]
map[i]
However, I can't seem to get this to work in awk (gawk 4.1.3).
I can only find one link via Google, which suggests this is impossible in previous versions of awk, and that a loop setting the keys and values one by one is the only solution. Is this still the case, or does anyone have a suggestion for a better solution?
https://developer.apple.com/library/archive/documentation/OpenSource/Conceptual/ShellScripting/Howawk-ward/Howawk-ward.html
EDIT
In response to comments a bit more background on what I'm trying to do. If there is a better approach, I'm all for using it!
I have set of CSV files that I'm feeding into AWK. The idea is to calculate a checksum based on specific columns after applying filtering to the rows.
The columns to checksum on, and the filtering to apply, are derived from runtime parameters sent into the script.
The runtime parameters are a triple of (product,format,version), hence my use of a 3-nested associative array.
Another approach would be to use the triple as a single key, rather than nesting, but gawk doesn't seem to natively support this, so I'd end up concatenating the values into a string. This felt a bit less structured to me, but if I'm wrong, I'm happy to change my mind on this approach.
Anyway, it is these parameters that are used to index into the array structure to retrieve the column numbers, etc.
You can then build-up a tree-like structure, for example, the below shows 2 formats for product foo on version blah, and so on...:
mappings["product-foo"]["format-bar"]["version-blah"][1]=0
split( "10,13,19,49", mappings["product-foo"]["format-bar"]["version-blah"] )
mappings["product-foo"]["format-moo"]["version-blah"][1]=0
split( "55,23,14,6", mappings["product-foo"]["format-moo"]["version-blah"] )
The magic happens like this, you can see how long-winded the mappings indexing becomes without referencing:
(FNR>1 && (format!="some-format" ||
    (version=="some-version" && $1=="some-filter") ||
    (version=="some-other-version" && $8=="some-other-filter"))) {
    # Loop over each supplied field summing an absolute tally for each
    for (i=1; i <= length(mappings[product][format][version]); i++) {
        sumarr[i] += ( $mappings[product][format][version][i] < 0 ? -$mappings[product][format][version][i]:$mappings[product][format][version][i] )
    }
}
The comment from @ed-morton simplifies this as originally requested, but I'm interested to hear if there is a simpler approach.
The right answer is from @ed-morton above (thanks!).
Ed - if you write it out as an answer I'll accept it, otherwise I'll accept this quote in a few days for good housekeeping.
Right, there is no array copy functionality in awk and there are no pointers/references, so you can't create a pointer to an array. You can of course create function map(i) { return mappings[product][format][version][i] }
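For readers comparing with other languages, here is a rough Python analogue of the tally loop, using the alternative the question mentions: one key built from the whole triple. It is a sketch only. The name abs_column_sums, the small column numbers and the sample rows are all hypothetical, and it says nothing about what awk itself supports.

```python
def abs_column_sums(rows, columns):
    """Sum the absolute value of each selected 1-based column over rows."""
    totals = [0.0] * len(columns)
    for fields in rows:
        for i, col in enumerate(columns):
            totals[i] += abs(float(fields[col - 1]))
    return totals


# Hypothetical mapping keyed by the (product, format, version) triple.
mappings = {
    ("product-foo", "format-bar", "version-blah"): [1, 3],
}
# The column list is looked up once and then reused.
columns = mappings[("product-foo", "format-bar", "version-blah")]
rows = [["-2", "x", "5"], ["4", "y", "-1.5"]]
print(abs_column_sums(rows, columns))  # [6.0, 6.5]
```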

how to get the longest string in an array in Openrefine

With GREL is it possible to get the longest string of an array ?
For example, if I have an array with 3 strings ["a","aaa","aa"], I want to obtain "aaa".
You can probably do that, but at the cost of a very complicated formula. This is typically the kind of case for which OpenRefine added Python (and Clojure) as scripting languages. Even if you don't know Python, you can find the answer to the question "how to choose the longest string in a list?" in two minutes and simply copy and paste it (using "return" instead of "print").
In this case:
return max(['a','aaa','aaaa','aa'], key=len)
EDIT
Just for the sake of the challenge, here is a possible solution with GREL.
value = "a,aa,aaaa,aa"
forEach(value.split(','), e, if(length(e)==sort(forEach(value.split(','), e, e.length()))[-1], e, null)).join(',').split(',')
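For comparison with the GREL expression, the Python approach applied to the same comma-separated value can be sketched as a small function. The name longest is hypothetical; inside an OpenRefine Jython expression the body would simply be return max(value.split(','), key=len), where value is the cell content.

```python
def longest(cell_value, separator=","):
    """Return the longest piece of a delimited string (the first one on ties)."""
    return max(cell_value.split(separator), key=len)


print(longest("a,aa,aaaa,aa"))  # aaaa
```

Note that max returns only the first of several equally long strings, so ties are not reported by this sketch.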

How to append to a list in Automation Anywhere 10.5?

The list starts empty. Then I want to append a value to it on each iteration of a loop if a certain condition is met. I don't see an append option in Variable Operation.
You can use string split for this, assuming you know of a delimiter that won't ever be in your list of values. I've used a semi-colon, and $local_joinedList$ starts off empty.
If (certain condition is met)
Variable Operation: $local_joinedList$;$local_newValue$ To $local_joinedList$
End If
String Operation: Split "$local_joinedList$" with delimiter ";" and assign output to $my-list-variable$
This overwrites $my-list-variable$.
If you need to append to an existing list, you can do it the same way: use String Join first, append your values to the string, then split it again afterward.
String Operation: Join elements of "$my-list-variable$" by delimiter ";" and assign output to $local_joinedList$
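The join, append, split cycle described above can be sketched in Python to make the logic explicit. This only illustrates the idea and is not Automation Anywhere syntax; the function name append_via_string and the keep callback (standing in for "certain condition is met") are hypothetical. In Python at least, always prepending the delimiter to an empty string would leave an empty first element after the split, so the sketch skips the delimiter while the string is still empty.

```python
def append_via_string(existing, new_values, keep, delimiter=";"):
    """Rebuild a list by joining it, appending to the string, and re-splitting."""
    joined = delimiter.join(existing)
    for value in new_values:
        if keep(value):  # stands in for "certain condition is met"
            # Avoid a leading delimiter, which would split into an empty item.
            joined = value if joined == "" else joined + delimiter + value
    return joined.split(delimiter) if joined else []


print(append_via_string([], ["a", "bb", "c"], lambda v: len(v) == 1))  # ['a', 'c']
print(append_via_string(["x"], ["y"], lambda v: True))  # ['x', 'y']
```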
Lists are buggy in Automation Anywhere and have been buggy for several versions. I suggest not using them and instead use XML.
It is a much more versatile approach and allows you to do much more than with lists. You can search, filter, insert, delete, etc.
For the example you mention, you would use the "Insert Node" command.
Throwing in my 2 cents as well - my-list-variable appears to be the only list whose size is mutable. From my experience with 10.7, it only grows, though.
So if you made a list with 60 values, and you wanted to use my-list-variable again for 55, you'll need to clear out those remaining 5 values and create an if condition when looping over the list to ensure the values are not whatever you set those 5 values to be.
I used lime's answer as a reference (thanks lime!) to populate a list variable from some data in an Excel spreadsheet.
Here's my automation for it: