Does exptrk has a built-in operator to allow a search in a vector of strings - exprtk

I was reading the exptrk documentation. I could not see an operator that would work to find if a given string is a part of vector of strings. Does exptrk support this
e.g. I am looking for an operator like "A" in ["A", "B", "C"]

Related

Writing an `__array_ufunc__` for string dtypes

I'm implementing a class that mixes in NDArrayOperatorsMixin using the appraoch described here.
This works well for numbers, but doesn't work with string dtypes. For example,
x = MyNewArrayClass(np.array(["a", "b", "c"]))
x == "a"
raises the following UFuncTypeError:
numpy.core._exceptions.UFuncTypeError: ufunc 'equal' did not contain a loop with signature matching types (dtype('<U1'), dtype('<U1')) -> dtype('bool')
How can I modify the implementation suggested in the docs to support str dtypes?

Is there a way to merge 2 arrays in GREL

In a GREL expression, is there a way to merge 2 arrays?
I tried ["a","b"]+["c","d"] but the result is a java error.
Short answer: Not with Grel.
Here is the complete list of the "arrays" methods in Grel and their respective Java code. It should not be very difficult to add a "merge" or "append" method, but would it be worth it? It is very rare to have more than one array in a cell (I have never encountered this case).
It is precisely to solve this kind of rare but possible case that Open Refine offers two other more powerful scripting languages, Jython and Clojure. In Python/Jython, the operation you want to do is as simple as:
return [1,2,3] +[3,4,5] #result : [ 1, 2, 3, 3, 3, 4, 5 ]
Would it be possible/worth the effort to make it easier with some Grel new function?
There is a way to do it (though it might be a bad idea):
split(join(["a","b"], "|") + "|" + join(["c","d"], "|"), "|")
Join each array with a delimiter character that does not appear in the data. (I've chosen the pipe character.) Add the resulting joined-up arrays together, and add the delimiter between them. Now, they form the string a|b|c|d. This string can be split on the | delimiter into a new array.

Find a specific number in a string

I'm using an IF THEN statement to determine if a string contains a specific number. For example, the string = 1, 9, 13. I'm trying to isolate strings that contain the single number "3". However, when I use Like "3", or Like "3", I also get the results back that contain 13. How do I use wildcards to do this?
If your string is just a list of numbers in the form you have shown ...
"n1, n2, n3, n4"
... then you can use VBA's Like function as follows:
Debug.Print " 1, 7, 5, 13," Like "*[ ]3,*" 'matches 3 but not 13
If your string is arbitrary, then this is actually a regular expression question, and you'll need to decide if this a route you want to go down. Something like the following works at the Regex Tester Page:
/\b3\b/g
There's a useful VBA Regex Regular Expressions Guide which shows you how to use syntax like this, including how to set a reference to the additional library that you need for it to work.

StringTemplate 3: how to filter a list?

How can I remove specific elements from a list(=multi-valued-attribute) using a map? For example, let's say I want to filter out all b's in a given list:
<["a", "b", "c", "b"]: {<table.(it)>}; separator=",">
table ::= ["b":, default: key]
The desired outcome would be "a,c" but the actual result is "a,,c,"
The thing is that the map successfully turn b's into nulls, but then they're wrapped in an anonymous template {} and become non-null values. So they won't go away with strip() function, either.
So the question is, would it be possible to filter a list using a map by slightly modifying the code above?
update
I've found a workaround:
filter(it) ::= "<if(it)><it><endif>"
<["a", "b", "c", "b"]: {<table.(it)>}: filter(); separator=",">
This gives the result I wanted: a,c
Might not want to filter in your template, but nonetheless, could be a bug.
Ok, i checked it out. That gives empty not null so it thinks it's an item. ST treats false conditionals same way: empty not null. I think you need to filter in model.

MongoDB or CouchDB or something else?

I know this is another question on this topic but I am a complete beginner in the NoSQL world so I would love some advice. People at SO told me MySQL might be a bad idea for this dataset so I'm asking this. I have lots of data in the following format:
TYPE 1
ID1: String String String ...
ID2: String String String ...
ID3: String String String ...
ID4: String String String ...
which I am hoping to convert into something like this:
TYPE 2
ID1: String
ID1: String
ID1: String
ID1: String
ID2: String
ID2: String
This is the most inefficient way but I need to be able to search by both the key and the value. For instance, my queries would look like this:
I might need to know what all strings a given ID contains and then intersect the list with another list obtained for a different ID.
I might need to know what all IDs contain a given string
I would love to achieve this without transforming Type 1 into Type 2 because of the sheer space requirements but would like to know if either MongoDB or CouchDB or something else (someone suggested NoSQL so started Googling and found these two are very popular) would help me out in this situation. I can a 14 node cluster I can leverage but would love some advice on which one is the right database for this usecase. Any suggestions?
A few extra things:
The input will mostly be static. I will create new data but will not modify any of the existing data.
The ID is 40 bytes in length whereas the strings are about 20 bytes
MongoDB will let you store this data efficiently in Type 1. Depending on your use it will look like one these (data is in JSON):
Array of Strings
{ "_id" : 1, "strings" : ["a", "b", "c", "d", "e"] }
Set of KV Strings
{ "_id" : 1, "s1" : "a", "s2" : "b", "s3" : "c", "s4" : "d", "s5" : "e" }
Based on your queries, I would probably use the Array of Strings method. Here's why:
I might need to know what all strings
a given ID contains and then intersect
the list with another list obtained
for a different ID.
This is easy, you get one Key Value look-up for the ID. In code, it would look something like this:
db.my_collection.find({ "_id" : 1});
I might need to know what all IDs contain a given string
Similarly easy:
db.my_collection.find({ "strings" : "my_string" })
Yeah it's that easy. I know that "strings" is technically an array, but MongoDB will recognize the item as an array and will loop through to find the value. Docs for this are here.
As a bonus, you can index the "strings" field and you will get an index on the array. So the find above will actually perform relatively fast (with the obvious trade-off that the index will be very large).
In terms of scaling a 14-node cluster may almost be overkill. However, Mongo does support auto-sharding and replication sets. They even work together, here's a blog post from a 10gen member to get you started (10gen makes Mongo).