What is the difference (if any) between setting indexNodeName=true on the node type definition and defining a virtual nodeName property with the attribute name=:nodeName? indexNodeName is defined as follows:
Default to false. If set to true then index would also be created for
node name. This would enable faster evaluation of queries involving
constraints on Node name
Indexing the node name as a property aims to be similar to indexNodeName, but "similar" doesn't imply "the same". The docs don't say much about this:
The string :nodeName - this special case indexes node name as if it’s
a virtual property of the node being indexed. Setting this along with
nodeScopeIndex=true is akin to setting indexNodeName=true on indexing
rule.
So is it required to set both of the settings, or only one, in order to query the node name? If just one of them, which one, and what is the difference?
Examples:
//element(*, app:Asset)[fn:name() = 'kite']
//*[jcr:like(fn:name(), 'kite%')]
//element(kite, app:Asset)
//element(*, dam:Asset)[jcr:like(fn:lower-case(fn:name()), 'kite%')]
indexNodeName=true is a shortcut to having a property definition with name=:nodeName AND nodeScopeIndex=true.
The name=:nodeName allows for more flexibility (at the cost of a bit of complexity) to index node names for other usages too - suggestions, spellchecks, etc.
So, if you just want to query for node names, either of the methods should work well (although, imo, indexNodeName=true is simpler and cleaner).
On the other hand, if you also want node names to show up in suggestion/spellcheck results, then you'd have to resort to a property definition with name=:nodeName AND nodeScopeIndex=true AND useInSuggest=true.
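For illustration, such a property definition inside a Lucene index definition might look roughly like this (a sketch following the structure described in the Oak documentation; the index name, node type, and async setting are just examples):
/oak:index/assetIndex
  - jcr:primaryType = "oak:QueryIndexDefinition"
  - type = "lucene"
  - async = "async"
  + indexRules
    + app:Asset
      + properties
        + nodeName
          - name = ":nodeName"
          - nodeScopeIndex = true
          - useInSuggest = true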
I'm new to Aerospike and am probably missing something fundamental, but I'm trying to see an enumeration of the Keys in a Set (I'm purposefully avoiding the word "list" because it's a datatype).
For example,
To see all the Namespaces, the docs say to use SHOW NAMESPACES
To see all the Sets, we can use SHOW SETS
If I want to see all the unique Keys in a Set ... what command can I use?
It seems like one can use client.scan() ... but that seems like a super heavy way to get just the key (since it fetches all the bin data as well).
Any recommendations are appreciated! As of right now, I'm thinking of inserting (deleting) into (from) a meta-record.
Thank you @pgupta for pointing me in the right direction.
This actually has two parts:
In order to retrieve original keys from the server, one must -- during put() calls -- set the policy to save the key value server-side (otherwise, it seems, only a digest/hash is stored).
Here's an example in Python:
aerospike_client.put(key, {'bin': 'value'}, policy={'key': aerospike.POLICY_KEY_SEND})
Then (adapting Aerospike's own documentation), you perform a scan and set the policy to not return the bin data. From this, you can extract the keys:
Example:
keys = []
scan = client.scan('namespace', 'set')

# 'nobins' tells the server to return record metadata only, no bin data.
scan_opts = {
    'concurrent': True,
    'nobins': True,
    'priority': aerospike.SCAN_PRIORITY_MEDIUM
}

# Each result is a (key, meta, bins) tuple; the key tuple itself is
# (namespace, set, primary_key, digest), so [0][2] is the primary key.
for x in scan.results(policy=scan_opts):
    keys.append(x[0][2])
The need to iterate over the result still seems a little clunky to me; I still think that using a 'master-key' record to store a list of all the other keys will be more performant in my case -- that way, I can make a single get() call to the Aerospike server to retrieve the list.
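A sketch of that meta-record idea in Python (my own illustration, not tested against a cluster; the namespace, set, and bin names are hypothetical, and the list update is not atomic with the put):
import aerospike

client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

META_KEY = ('namespace', 'set', 'all-keys')   # hypothetical registry record

def put_tracked(key_name, bins):
    # Write the record, storing the original key server-side as above.
    client.put(('namespace', 'set', key_name), bins,
               policy={'key': aerospike.POLICY_KEY_SEND})
    # Remember the key in the registry's list bin (created on first append).
    # Re-puts will append duplicates; deduplicate on read if that matters.
    client.list_append(META_KEY, 'keys', key_name)

def all_keys():
    # A single get() returns every tracked key.
    _, _, bins = client.get(META_KEY)
    return bins.get('keys', [])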
You can choose not to bring the data back by setting includeBinData in ScanPolicy to false.
I found the output of the dependency:packageCycles constraint shipped with jQAssistant hard to interpret. Specifically, I'm keen on finding an example instance of the classes that make up the cyclic dependency.
Given that I found a cycle of packages, for each pair of adjacent packages I would need to find two classes that connect them.
This is my first attempt at the Cypher query, but there are still some relevant parts missing:
MATCH nodes = (p1:Package)-[:DEPENDS_ON]->(p2: Package)-[:DEPENDS_ON*]->(p1)
WHERE p1 <> p2
WITH extract(x IN relationships(nodes) |
(:Type)<--(:Package)-[x]->(:Package)-->(:Type)) AS cs
RETURN cs
Specifically, in order to really connect the two packages, the two types should be related to each other with DEPENDS_ON as shown below:
(:Type)<--(:Package)-[x]->(:Package)-->(:Type)
   |                                      ^
   |              DEPENDS_ON              |
   +--------------------------------------+
For the above pattern I would have to return the two types (and not the packages, for instance). Preferably, the output for a single cyclic dependency consists of a list of qualified class names (otherwise one cannot possibly distinguish the class chains of more than one cyclic dependency).
For this specific purpose I find Cypher to be very limited; support for identifying and collecting new graph patterns during path traversal does not seem to be the easiest thing to do. Also, the attempt to give names to the (:Type) nodes resulted in syntax errors.
I also messed around a lot with UNWIND, but to no avail. It lets you introduce new MATCH clauses on a per-element basis (say, for the elements of relationships(nodes)), but I know of no method to undo the damaging effects of UNWIND: the surrounding list structure is removed, such that the traces of multiple cyclic dependencies merge into each other. Additionally, the results appear permuted to me. That said, the query below is conceptually very close to what I am trying to achieve, but does not work:
MATCH nodes = (p1:Package)-[:DEPENDS_ON]->(p2: Package)-[:DEPENDS_ON*]->(p1)
WHERE p1 <> p2
WITH relationships(nodes) as rel
UNWIND rel AS x
MATCH (t0:Type)<-[:CONTAINS]-(:Package)-[x]->(:Package)-[:CONTAINS]->(t1:Type),
(t0)-[:DEPENDS_ON]->(t1)
RETURN t0.fqn, t1.fqn
I do appreciate that there seems to be some scripting support within jQAssistant. However, that would really be my last resort, since it is surely more difficult to maintain than a Cypher query.
To rephrase: given a path, I'm looking for a method to identify a sub-pattern for each element, project a node out of that match, and collect the results.
Do you have any ideas on how could one accomplish that with Cypher?
Edit #1: Within one package, one also has to consider that the class that is the target of the inbound DEPENDS_ON edge may not be the same class that is the source of the outgoing edge. In other words, as a result:
two classes of the same package may be part of the trace
if one wanted to express the cyclic dependency trace as a path, one must take into account detours that navigate to classes in the same package. For instance (the first and last edges mark package entry/exit; a DEPENDS_ON edge between the two types is absent):
-[:DEPENDS_ON]->(:Type)<-[:CONTAINS]-(:Package)-[:CONTAINS]->(:Type)-[:DEPENDS_ON]->
Maybe it gets a little clearer using the following picture:
Clearly "a, b, c" is a package cycle and "TestA, TestB1, TestB2, TestC" is a type-level trace for justifying the package-level dependency.
The following query extends the package cycle constraint by drilling down on type level:
MATCH
(p1:Package)-[:DEPENDS_ON]->(p2:Package),
path=shortestPath((p2)-[:DEPENDS_ON*]->(p1))
WHERE
p1 <> p2
WITH
p1, p2
MATCH
(p1)-[:CONTAINS]->(t1:Type),
(p2)-[:CONTAINS]->(t2:Type),
(p1)-[:CONTAINS]->(t3:Type),
(t1)-[:DEPENDS_ON]->(t2),
path=shortestPath((t2)-[:DEPENDS_ON*]->(t3))
RETURN
p1.fqn, t1.fqn, EXTRACT(t IN nodes(path) | t.fqn) AS Cycle
I'm not sure how well the query will work on large projects; we'll need to give it a try.
Edit 1: Updated the query to match on any type t3 which is located in the same package as t1.
Edit 2: The answer is not correct, see discussion below.
For an application I'm considering, there would be a large (100,000+) 'database' of trees (think expressions in a programming language, or S-expressions), and I would need to query that database for expressions that match a specific given expression.
Before giving the details of what I'd like to have, note that I'd appreciate any information related to indexing a large set of trees for optimizing lookup by a subtree.
In my specific situation (which would be for a backend to be used by Metamath proof assistants), expressions have the following structure (in Haskell-like notation):
data Expression = Placeholder Id | VarName Id | ConstName Id [Expression]
or as a BNF for an S-expression form:
Expression = '?' Id | Id | '(' Id Expression* ')'
where Id is some kind of identifier.
For example, I could have a database with expressions like
(equiv ?ph ?ps)
(not (in (appl (sqrt) (2)) (Q)))
(equiv (eq ?A ?B) (forall ?x (equiv (in ?x ?A) (in ?x ?B))))
In this context, two expressions match if they can be made equal by substitution of expressions for placeholders. So looking up (equiv (eq A (emptyset)) ?ph) in the above mini-database would result in the first and last expressions.
So again: how would I implement fast lookups in a large set of (expression) trees with placeholders? What kind of index data structure could I use?
I would implement the lookup with a trie. Each key would consist of one of the following:
ConstName Identifier
Variable w/ context info
ConstValue
Placeholder
These should be ordered in some fashion- possibly Placeholder, then all ConstNames (alphabetical), then variables (scope ordering, then argument order), then ConstValues (numerical order). As long as there's a concrete ordering for usage in the trie, you're fine.
Traverse the expression's tree, injecting the appropriate keys into the trie as they are encountered. Do this for all the expressions you want to insert into your data structure. When it comes time to query it, you can traverse the trie in a similar fashion, but with a few new rules.
Everything matches a placeholder node. If it matches some other key as well, then you'll need to explore both branches (easily done via a recursive DFS-like approach).
A placeholder matches everything. This is not equivalent to the previous point- we are talking about placeholders in the query here, the previous bullet is regarding placeholders as trie keys.
Now, this does mean that the search space can somewhat "explode" as you encounter placeholders, but there is one thing you can do to try to mitigate this in practice. Traverse the expression's tree in a breadth-first fashion (both in construction of the trie, and querying). This means if one of the arguments is a placeholder, you won't have to full-depth search every single subtree that matches that expression so far- instead you jump ahead to the next argument- which may not be a placeholder, and will thus greatly prune the search space (compared to matching "everything").
For completeness' sake, let's take one of your examples
(not (in (appl (sqrt) (2)) (Q)))
and make a trie entry from that-
not -> in -> appl -> "Q" -> sqrt -> 2
adding (not (in ?ph E)) to this would result in:
not -> in -> appl -> "Q" -> sqrt -> 2
         \-> ?ph -> "E"
Continue in this fashion injecting expressions into the trie. Also traverse in this fashion for querying until you reach the ends of your searches into the trie, and return those that matched.
Note: the uniqueness of these entries is based on the assumption that you do not have to support variadic functions. If you do, attach some context info to each key (read the next paragraphs for how to do this) to distinguish which arguments go to which functions.
There is one detail I glossed over- variables. If you only want it to match if they are the exact same variable name, then no work is necessary. But this likely isn't what you want; you probably want it to match generic variables as long as they are "consistent" with each other. The way to do this is to assign each variable an identifier that represents the scope of which it was first defined.
The easiest way to do this is just compose an identifier from the concatenation of the argument ordering of its ancestors. That is, if a variable is first defined as the second argument to a function which is the fifth argument to the root function, then we might label it as (5, 2) or (2, 5), whichever makes more sense intuitively. Either way, this will ensure the variable is given a consistent identifier regardless of other variables / functions elsewhere. Then proceed as normal with this new variable name.
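To make the scheme concrete, here is a minimal sketch in Python (my own illustration, not part of the answer above: it serializes expressions preorder rather than breadth-first for brevity, uses (symbol, arity) keys so a placeholder can skip exactly one whole subexpression, and does not check that repeated placeholders bind consistently):
class Trie:
    def __init__(self):
        self.children = {}  # (symbol, arity) -> Trie
        self.matches = []   # expressions ending at this node

PLACEHOLDER = ('?', 0)

def serialize(expr):
    # Expressions are nested pairs like ('equiv', ['?ph', '?ps']); bare
    # strings are leaves, and strings starting with '?' are placeholders.
    if isinstance(expr, str):
        return [PLACEHOLDER] if expr.startswith('?') else [(expr, 0)]
    head, args = expr
    keys = [(head, len(args))]
    for a in args:
        keys += serialize(a)
    return keys

def insert(root, expr):
    node = root
    for k in serialize(expr):
        node = node.children.setdefault(k, Trie())
    node.matches.append(expr)

def skip_subtree(node, need=1):
    # All trie nodes reachable by consuming exactly one whole subexpression.
    out = []
    for (_, arity), child in node.children.items():
        n = need - 1 + arity
        out += [child] if n == 0 else skip_subtree(child, n)
    return out

def skip_keys(keys, need=1):
    # Drop exactly one whole subexpression from the front of the key list.
    i = 0
    while need:
        need += keys[i][1] - 1
        i += 1
    return keys[i:]

def lookup(node, keys, out):
    if not keys:
        out += node.matches
        return out
    k, rest = keys[0], keys[1:]
    if k == PLACEHOLDER:                  # query placeholder matches any subtree
        for nxt in skip_subtree(node):
            lookup(nxt, rest, out)
    else:
        if k in node.children:            # exact symbol/arity match
            lookup(node.children[k], rest, out)
        if PLACEHOLDER in node.children:  # stored placeholder swallows one subexpression
            lookup(node.children[PLACEHOLDER], skip_keys(keys), out)
    return out

# Usage: root = Trie(); insert(root, e) for each expression e; then
# matches = lookup(root, serialize(query), [])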
I have a data set which includes a number of nodes, all labeled claim, which can have various properties (named P1, P2, etc., through P2000). Currently, each of the claim nodes can have only one of these properties, and each property has a value, which can be of a different type (i.e. P1 may be a string, P2 a float, P3 an integer, etc.). I also need to be able to look up the nodes by any property (e.g. "find all nodes with P3 equal to 42").
I have modeled it as nodes having a property value and a label according to the P property. Then I define a schema index on the label claim and the property value. The lookup then would look something like:
MATCH (n:P569:claim) WHERE n.value = 42 RETURN n
My first question is - is it OK to have such an index? Are mixed-type indexes allowed?
The second question is that the lookup above works (though I'm not sure whether it uses the index or not), but this doesn't - note the label order is switched:
neo4j-sh (?)$ MATCH (n:claim:P569) WHERE n.value>0 RETURN n;
IncomparableValuesException: Don't know how to compare that. Left: "113" (String); Right: 0 (Long)
P569 properties are all numeric, but there are string properties from other P-values, one of which is "113". Somehow, even though I said the label should be both claim and P569, the "113" value is still included in the comparison, even though its node has no P569 label:
neo4j-sh (?)$ MATCH (n:claim) WHERE n.value ="113" RETURN LABELS(n);
+-------------------+
| LABELS(n) |
+-------------------+
| ["claim","P1036"] |
| ["claim","P902"] |
+-------------------+
What is wrong here - why does it work with one label order but not the other? Can this data model be improved?
Let me at least try to side-step your question: there's another way you could model this that would resolve at least some of your problems.
You're encoding the property name as a label. Perhaps you do that to speed up looking up the subset of nodes where that property applies; still, it seems like you're causing a lot of difficulty by shoe-horning incomparable data values all into the same property named "value".
What if, in addition to using these labels, each property was named the same as the label? I.e.:
CREATE (n:P569:claim { P569: 42});
You still get your label lookups, but by segregating the property names, you can guarantee that the query planner will never accidentally compare incomparable values in the way it builds an execution plan. Your query for this node would then be:
MATCH (n:P569:claim) WHERE n.P569 > 5 AND n.P569 < 40 RETURN n;
Note that if you know the right label to use, then you're guaranteed to know the right property name to use. By using properties of different names, if you're loading your data in such a way that P569s are always integers, you can't end up with the incomparable situation you have. (I think that's happening because of the particular way Cypher is executing that query.)
A possible downside here is that if you have to index all of those properties, it could be a lot of indexes, but it still might be something to consider.
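For instance, each P-property you query on would then get its own schema index (the property numbers here are just examples):
CREATE INDEX ON :claim(P569);
CREATE INDEX ON :claim(P1036);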
I think it makes sense to take a step back and think about what you actually want to achieve: why do you have those 2000 properties in the first place, and how could you model them differently in a graph?
Also, make sure to just leave off properties you don't need, and use coalesce() to provide a default.
I want to make a case-insensitive index in Neo4j using Py2neo.
I've read through the docs and googled a lot but didn't find anything. There seems to be this option in Java, but not in Py2neo.
Please help!
You can pass configuration options into the GraphDatabaseService.get_or_create_index function as indicated here:
http://book.py2neo.org/en/latest/graphs_nodes_relationships/#py2neo.neo4j.GraphDatabaseService.get_or_create_index
These arguments are passed directly into the REST call as described here:
http://docs.neo4j.org/chunked/milestone/rest-api-indexes.html#rest-api-create-node-index-with-configuration
Hope this helps.
When using legacy indexes you can supply a configuration upon initial creation of the index. You have to set to_lower_case=true in combination with type=fulltext.
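In py2neo that could look something like this (a sketch against the legacy API from the link in the other answer; the server URL and index name are just examples):
from py2neo import neo4j

graph = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")

# Pass the index configuration straight through to the REST call.
index = graph.get_or_create_index(neo4j.Node, "people",
        config={"type": "fulltext", "to_lower_case": "true"})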
Schema indexes on the other hand do not yet support case insensitivity. As a workaround, introduce a copy of the respective property, e.g. name -> nameLower, which gets populated by the lowercase variant of that string. You could do something like this on existing datasets:
CREATE INDEX ON :Person(nameLower);
// --- use separate transaction
MATCH (p:Person) SET p.nameLower = lower(p.name); // maybe apply LIMIT for large amounts of nodes
Your query string of course needs to use lower case:
MATCH (p:Person {nameLower:'john'}) RETURN p