I'm using neo4j 2.01
Could you please tell me how to make this map literal work?
neo4j-sh (?)$ MERGE (n:person {name:'Alice', age:38, address:{city:'London', residential:true}}) RETURN n;
MatchError: Map(city -> London, residential -> true) (of class scala.collection.immutable.Map$Map2)
Map literals are only allowed in the RETURN clause. Neo4j properties on nodes are either primitives, Strings or arrays of them. If you want to store a more complex value with a node, decompose the complex value into a set of nodes and relationships and attach it to your entity node with a relationship:
MERGE (n:person {name:'Alice', age:38})-[:LIVES_IN]->(a:address {city:'London', residential:true}) RETURN n,a;
Even the residential flag is implicitly encoded into the relationship as LIVES_IN, so you might omit it and rely on verbose relationship types.
To use map literals in RETURN:
match (n:person)-[:LIVES_IN]->(a:address) return {name:n.name, age:n.age, address:a}
I'm interested in using the Visions library to automate the process of identifying certain types of security (stock) identifiers. The documentation mentions that it could be used in such a way for ISBN codes but I'm looking for a more concrete example of how to do it. I think the process would be pretty much identical for the fields I'm thinking of as they all have check digits (ISIN, SEDOL, CUSIP).
My general idea is that I would create custom types for the different identifier types and could use those types to:
Take a dataframe where the types are unknown and identify columns matching the types (even if it's not a 100% match)
Validate the types on a dataframe where the intended type is known
Great question and use-case! Unfortunately, the documentation on making new types probably needs a little love right now as there were API breaking changes with the 0.7.0 release. Both the previous link and this post from August, 2020 should cover the conceptual idea of type creation in greater detail. If any of those examples break then mea culpa and our apologies, we switched to a dispatch based implementation to support different backends (pandas, numpy, dask, spark, etc...) for each type. You shouldn't have to worry about that for now but if you're interested you can find the default type definitions here with their backends here.
Building an ISBN Type
We need to make two basic decisions when defining a type:
What defines the type
What other types are our new type related to?
For the ISBN use-case O'Reilly provides a validation regex to match ISBN-10 and ISBN-13 codes. So,
What defines a type?
We want every element in the sequence to be a string which matches a corresponding ISBN-10 or ISBN-13 regex
What other types are our new type related to?
Since ISBNs are themselves strings we can use the default String type provided by visions.
Type Definition
from typing import Sequence

import pandas as pd
from visions.relations import IdentityRelation, TypeRelation
from visions.types.string import String
from visions.types.type import VisionsBaseType

# O'Reilly prints the space character as "●"; it is written as a real space here.
isbn_regex = "^(?:ISBN(?:-1[03])?:? )?(?=[0-9X]{10}$|(?=(?:[0-9]+[- ]){3})[- 0-9X]{13}$|97[89][0-9]{10}$|(?=(?:[0-9]+[- ]){4})[- 0-9]{17}$)(?:97[89][- ]?)?[0-9]{1,5}[- ]?[0-9]+[- ]?[0-9]+[- ]?[0-9X]$"

class ISBN(VisionsBaseType):
    @staticmethod
    def get_relations() -> Sequence[TypeRelation]:
        # ISBNs are a subset of String, hence an IdentityRelation to String.
        relations = [IdentityRelation(String)]
        return relations

    @staticmethod
    def contains_op(series: pd.Series, state: dict) -> bool:
        # Every element of the series must match the ISBN-10 / ISBN-13 regex.
        return series.str.contains(isbn_regex).all()
Looking at this closely there are three things to take note of.
The new type inherits from VisionsBaseType
We had to define a get_relations method, which is how we relate a new type to others we might want to use in a typeset. In this case, I've used an IdentityRelation to String, which means ISBNs are subsets of String. We can also use InferenceRelations when we want to support relations which change the underlying data (say converting the string '4.2' to the float 4.2).
A contains_op, which is our definition of the type. In this case, we are applying a regex string to every element in the input and verifying it matches the regex provided by O'Reilly.
In theory ISBNs can be encoded in what looks like a 10 or 13 digit integer as well - to work with those you might want to create an InferenceRelation between Integer and ISBN. A simple implementation would involve coercing Integers to string and applying the above regex.
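Since the question is really about identifiers with check digits, note that the regex above only checks the shape of a code, not its check digit. Below is a minimal sketch of check-digit validation for ISBN-10 and ISBN-13 in plain Python. The function names (is_valid_isbn10, is_valid_isbn13, is_valid_isbn) are made up for illustration and are not part of visions, and the sketch assumes the code has no "ISBN" prefix.

```python
import re


def _normalize(code: str) -> str:
    """Drop hyphens and whitespace and upper-case a trailing 'x'."""
    return re.sub(r"[\s-]", "", code).upper()


def is_valid_isbn10(code: str) -> bool:
    """ISBN-10: weights 10..1, 'X' means 10 in the last position, sum % 11 == 0."""
    s = _normalize(code)
    if not re.fullmatch(r"[0-9]{9}[0-9X]", s):
        return False
    values = [10 if ch == "X" else int(ch) for ch in s]
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


def is_valid_isbn13(code: str) -> bool:
    """ISBN-13: alternating weights 1 and 3, sum % 10 == 0."""
    s = _normalize(code)
    if not re.fullmatch(r"[0-9]{13}", s):
        return False
    total = sum((1 if i % 2 == 0 else 3) * int(ch) for i, ch in enumerate(s))
    return total % 10 == 0


def is_valid_isbn(code: str) -> bool:
    """Hypothetical helper: accept either ISBN form."""
    return is_valid_isbn10(code) or is_valid_isbn13(code)
```

If you want the type to require a valid check digit, something like series.map(is_valid_isbn).all() could be used inside contains_op, alongside or instead of the regex. The same pattern should carry over to ISIN, SEDOL and CUSIP with their own check-digit rules.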
Can I turn off case sensitivity in DataWeave?
Two different requests are returning responses where the first contains a node called CDATA while the other contains a node called CData. In DataWeave is there a way to treat these as equal or do I need to have separate code statements such as payload.Data.CDATA and payload.Data.CData? If things were case insensitive I could have a single statement such as payload.data.cdata.
Thanks in advance,
It appears that I need two different statements.
payload.Data.*CDATA map $.#SeqId when payload.Data? and payload.Data.CDATA? and payload.Data.CDATA.#SeqId?
payload.Data.*CData map $.#SeqId when payload.Data? and payload.Data.CData? and payload.Data.CData.#SeqId?
No, but you can create a function like the following to select a key while ignoring case.
It filters the object by the given key (mapObject, comparing each key in lower case against keyName, so keyName must be passed in lower case) and then gets the values from the resulting object (with pluck).
%function selectIgnoreCase(obj, keyName)
obj mapObject ((v, k) -> k match {
x when (lower x) == keyName -> {(k): v},
default -> {}
}) pluck $
And you'd use it like this:
selectIgnoreCase(payload.Data, "cdata")
Note: With Mule 4 (and DW 2) the syntax for this would be a little nicer.
For an application I'm considering, there would be a large (100,000+) 'database' of trees (think expressions in a programming language, or S-expressions), and I would need to query that database for expressions that match a specific given expression.
Before giving the details of what I'd like to have, note that I'd appreciate any information related to indexing a large set of trees for optimizing lookup by a subtree.
In my specific situation (which would be for a backend to be used by Metamath proof assistants), expressions have the following structure (in Haskell-like notation):
data Expression = Placeholder Id | VarName Id | ConstName Id [Expression]
or as a BNF for an S-expression form:
Expression = '?' Id | Id | '(' Id Expression* ')'
where Id is some kind of identifier.
For example, I could have a database with expressions like
(equiv ?ph ?ps)
(not (in (appl (sqrt) (2)) (Q)))
(equiv (eq ?A ?B) (forall ?x (equiv (in ?x ?A) (in ?x ?B))))
In this context, two expressions match if they can be made equal by substitution of expressions for placeholders. So looking up (equiv (eq A (emptyset)) ?ph) in the above mini-database would result in the first and last expressions.
So again: how would I implement fast lookups in a large set of (expression) trees with placeholders? What kind of index data structure could I use?
I would implement the lookup with a trie. Each key would consist of one of the following:
ConstName Identifier
Variable w/ context info
These should be ordered in some fashion- possibly Placeholder, then all ConstNames (alphabetical), then variables (scope ordering, then argument order), then ConstValues (numerical order). As long as there's a concrete ordering for usage in the trie, you're fine.
Traverse the expression's tree, injecting the appropriate keys into the trie as they are encountered. Do this for all the expressions you want to insert into your data structure. When it comes time to query it, you can traverse the trie in a similar fashion, but with a few new rules.
Everything matches a placeholder node. If it matches some other key as well, then you'll need to explore both branches (easily done via a recursive DFS-like approach).
A placeholder matches everything. This is not equivalent to the previous point- we are talking about placeholders in the query here, the previous bullet is regarding placeholders as trie keys.
Now, this does mean that the search space can somewhat "explode" as you encounter placeholders, but there is one thing you can do to try to mitigate this in practice. Traverse the expression's tree in a breadth-first fashion (both in construction of the trie, and querying). This means if one of the arguments is a placeholder, you won't have to full-depth search every single subtree that matches that expression so far- instead you jump ahead to the next argument- which may not be a placeholder, and will thus greatly prune the search space (compared to matching "everything").
For completeness' sake, let's take one of your examples
(not (in (appl (sqrt) (2)) (Q)))
and make a trie entry from that-
not -> in -> appl -> "Q" -> sqrt -> 2
adding (not (in ?ph E)) to this would result in-
not -> in -> appl -> "Q" -> sqrt -> 2
          \-> ?ph -> "E"
Continue in this fashion injecting expressions into the trie. Also traverse in this fashion for querying until you reach the ends of your searches into the trie, and return those that matched.
Note- the uniqueness of these entries is based on the assumption you do not have to support variadic functions. If you do, attach to each key some context info (read the next paragraphs for info on how to do this) to distinguish which arguments go to which functions.
There is one detail I glossed over- variables. If you only want it to match if they are the exact same variable name, then no work is necessary. But this likely isn't what you want; you probably want it to match generic variables as long as they are "consistent" with each other. The way to do this is to assign each variable an identifier that represents the scope in which it was first defined.
The easiest way to do this is just compose an identifier from the concatenation of the argument ordering of its ancestors. That is, if a variable is first defined as the second argument to a function which is the fifth argument to the root function, then we might label it as (5, 2) or (2, 5), whichever makes more sense intuitively. Either way, this will ensure the variable is given a consistent identifier regardless of other variables / functions elsewhere. Then proceed as normal with this new variable name.
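To make the two matching rules concrete, here is a rough sketch of such an index in Python. It is not exactly the scheme described above: for brevity it flattens each expression depth-first (preorder) instead of breadth-first, and it stores the argument count in every key so that a whole subtree can be skipped when a placeholder is met. It also ignores the variable-scoping detail and does not check that a placeholder used twice is bound consistently, so treat the results as candidates. All names (ExpressionIndex, flatten, and so on) are made up for illustration.

```python
from typing import Iterator, List, Tuple, Union

# "?x" is a placeholder, "A" a variable name, ("name", *args) a constant.
Expr = Union[str, tuple]
Token = Tuple[str, str, int]  # (kind, name, number of arguments)
PLACEHOLDER: Token = ("?", "", 0)


def flatten(expr: Expr) -> List[Token]:
    """Preorder token list; each token records how many subterms follow it."""
    if isinstance(expr, str):
        return [PLACEHOLDER] if expr.startswith("?") else [("var", expr, 0)]
    name, *args = expr
    tokens: List[Token] = [("const", name, len(args))]
    for arg in args:
        tokens.extend(flatten(arg))
    return tokens


class TrieNode:
    def __init__(self) -> None:
        self.children = {}  # Token -> TrieNode
        self.items = []     # expressions that end at this node


def _after_subterms(node: TrieNode, n: int) -> Iterator[TrieNode]:
    """Yield every node reachable from `node` by consuming n complete subterms."""
    if n == 0:
        yield node
        return
    for tok, child in node.children.items():
        yield from _after_subterms(child, n - 1 + tok[2])


def _end_of_subterm(tokens: List[Token], i: int) -> int:
    """Index just past the subterm of the query that starts at position i."""
    pending = 1
    while pending:
        pending += tokens[i][2] - 1
        i += 1
    return i


class ExpressionIndex:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, expr: Expr) -> None:
        node = self.root
        for tok in flatten(expr):
            node = node.children.setdefault(tok, TrieNode())
        node.items.append(expr)

    def lookup(self, query: Expr) -> List[Expr]:
        return list(self._match(self.root, flatten(query), 0))

    def _match(self, node: TrieNode, tokens: List[Token], i: int) -> Iterator[Expr]:
        if i == len(tokens):
            yield from node.items
            return
        tok = tokens[i]
        if tok == PLACEHOLDER:
            # A placeholder in the query matches any one stored subterm.
            for nxt in _after_subterms(node, 1):
                yield from self._match(nxt, tokens, i + 1)
            return
        if tok in node.children:
            yield from self._match(node.children[tok], tokens, i + 1)
        if PLACEHOLDER in node.children:
            # A stored placeholder matches one whole subterm of the query.
            skip_to = _end_of_subterm(tokens, i)
            yield from self._match(node.children[PLACEHOLDER], tokens, skip_to)
```

With the three expressions from the question inserted, looking up ("equiv", ("eq", "A", ("emptyset",)), "?ph") should return the first and the last one, in whatever order the trie is walked.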
In my Room model, I have an attribute named available_days, which is being stored as an array.
For example:
=> ["wed", "thurs", "fri"]
What is the best way to find all Rooms where the size of the array is equal to 3?
I've tried something like
Room.where('LENGTH(available_days) = ?', 3)
with no success.
Update: the data type for available_days is a string, but in order to store an array, I am serializing the attribute from my model:
serialize :available_days
Can't think of a pure SQL way of doing it for SQLite since available_days is a string.
But here's one way of doing it without loading all records at once.
rooms = []
# Load records in batches of 10 and keep those with exactly three available days.
Room.in_batches(of: 10).each_record do |r|
  rooms << r if r.available_days.length == 3
end
p rooms
If you're using Postgres you can parse the serialized string to an array type, then query on the length of the array. I expect other databases may have similar approaches. How to do this depends on how the text is being serialized, but by default for Rails 4 it should be YAML, so I expect your data is encoded like this:
---
- first
- second
The following SQL will remove the leading ---\n- as well as the final newline, then split the remaining string on \n- into an array. It's not strictly necessary to clean up the extra characters to find the length, but if you want to do other operations you may find it useful to have a cleaned-up array (no leading characters or trailing newline). This will only work for simple YAML arrays and simple strings.
Room.where("ARRAY_LENGTH(STRING_TO_ARRAY(RTRIM(REPLACE(available_days,'---\n- ',''),'\n'), '\n- '), 1) = ?", 3)
As you can see, this approach is rather complex. If possible you may want to add a new structured column (array or jsonb) and migrate the serialized string into a typed column to make this easier and more performant. Rails supports jsonb serialization for Postgres.
I am new to redis so I apologize if this question seems naive. I want to create a hash of the following type:
item = {{"bititem":00001010000100...001010},
Where bititem is a bit array created by setbit and property is a simple integer value. Is there any way to do this in redis or do I have to create different objects?
From your example, it is not clear to me why you need the extra depth-level around bititem.
Also, it is not clear to me what you want to do with it afterwards. So I'll give you three scenarios:
1. Serialized:
You can always serialize your data if it involves multiple levels. Most efficient is MsgPack, second best is JSON. You can deserialize the data in Lua-Redis when needed.
2. Hashed:
If you don't need multiple levels, simply do:
HSET item:01 bititem 00001010000100...001010
HSET item:01 property 1
Only do this, though, if you really need to extract the different data members often. Separate members have quite some overhead. In general, I prefer to serialize the whole object (with a SET or an HSET).
3. Bitwise enabled:
If you want to make use of Redis' bitwise operations, you need to use simple strings (GET/SET). For example:
SET item:01:bititem "00001010000100...001010"
SET item:01:property 1
or even better:
SET item:01:bititem "00001010000100...001010"
SET item:01:properties [all-other-properties-serialized-as-msgpack]
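One caveat on the bitwise scenario: a SET with a quoted string of '0' and '1' characters stores those characters as bytes, so BITCOUNT or BITOP would operate on the character codes rather than on the bits you meant. If you write the value with SET instead of SETBIT, you would probably want to pack the bits first. A minimal sketch in plain Python (no Redis client involved; pack_bits is a made-up helper name), using the same layout as SETBIT, where offset 0 is the most significant bit of the first byte:

```python
def pack_bits(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes, SETBIT-style.

    Bit offset 0 becomes the most significant bit of the first byte.
    A trailing partial byte is padded with zero bits on the right.
    """
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only '0' and '1'")
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
```

The resulting bytes value is what you would pass as the string value to SET through whichever client library you use.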
Hope this helps, TW