Lucene indexes in Neo4j don't work as expected

The title is a bit vague, so let me explain my problem precisely.
I create some nodes in Neo4j and then index them like this:
Index<Node> myindex = graphDb.index().forNodes(
"myindex",
MapUtil.stringMap(IndexManager.PROVIDER, "lucene", "type",
"fulltext"));
Node n = graphDb.createNode(); // create the node
n.setProperty("firstname", "firstname"); // add properties
n.setProperty("familyname", "familyname");
myindex.add(n, "familyname", "familyname"); // index it
But when I update one of the properties of node n (for instance, changing "familyname" to "fname"), the node can no longer be found with an index lookup on the new value.
Before the update, this Cypher query
start n= node:myindex(familyname:"familyname") return n
returned the node. After the update, I expect
start n= node:myindex(familyname:"fname") return n
to return the same node with its new property value, but it returns nothing, while the first query still works, as if the index were bound to the old value "familyname".
Any thoughts about this?
Thanks

As tstorms suggested, the index is not updated automatically when a property changes, so the fix is to remove the node from the index after updating the property:
n.setProperty("familyname","fname");
myindex.remove(n);
and then add it back with the new value:
myindex.add(n, "familyname","fname");
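Why the remove-then-add step is needed can be illustrated with a small toy model. The sketch below is not Neo4j's implementation: `ToyIndex` and its methods are hypothetical names, and the only assumption is that a manually maintained index stores its own copy of each key/value pair, separate from the node's properties.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of a manually maintained index. Hypothetical; not Neo4j code.
final class ToyIndex {
    // "key=value" -> ids of the nodes indexed under that pair
    private final Map<String, Set<Long>> entries = new HashMap<>();

    // Mirrors index.add(node, key, value): stores a copy of the value.
    public void add(long nodeId, String key, String value) {
        entries.computeIfAbsent(key + "=" + value, k -> new HashSet<>()).add(nodeId);
    }

    // Mirrors index.remove(node): drops every entry pointing at the node.
    public void remove(long nodeId) {
        for (Set<Long> ids : entries.values()) {
            ids.remove(nodeId);
        }
    }

    // Looks up the ids stored under an exact key/value pair.
    public Set<Long> get(String key, String value) {
        Set<Long> ids = entries.get(key + "=" + value);
        return ids == null ? Collections.<Long>emptySet() : ids;
    }
}
```

In this model, changing the node's property never touches `entries`, so a lookup on the new value finds nothing until the node is removed and added again with that value.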


Jackrabbit oak : Not able to set VersionGCOptions

I am using Jackrabbit Oak (1.22.3) and deleting nodes with version garbage collection.
I set the following custom values for garbage collection:
VersionGCOptions versionGCOptions = new VersionGCOptions();
versionGCOptions.withOverflowToDiskThreshold(900000);
versionGCOptions.withCollectLimit(900000L);
versionGCOptions.withMaxIterations(10);
documentNodeStore.getVersionGarbageCollector().setOptions(versionGCOptions);
But when I read these values back, I get the defaults instead of my custom values:
System.out.println("collectLimit : "+versionGarbageCollector.getOptions().collectLimit);
System.out.println("maxIterations : "+versionGarbageCollector.getOptions().maxIterations);
Output:
collectLimit : 100000
maxIterations : 0
I don't understand why this is happening. Please help me resolve it.
The "with...()" methods return a new VersionGCOptions object (they do not modify the existing one).
Thus you need to do something like:
versionGCOptions = versionGCOptions.withOverflowToDiskThreshold(900000);
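The behaviour described above can be reproduced with a small stand-in class. This is a minimal sketch, not Oak code: `GcOptions` is a hypothetical immutable class, and its defaults (100000 and 0) are simply the values printed in the question.

```java
// Hypothetical stand-in for an immutable options class such as VersionGCOptions.
final class GcOptions {
    public final long collectLimit;
    public final int maxIterations;

    // Defaults taken from the output shown in the question.
    public GcOptions() {
        this(100000L, 0);
    }

    private GcOptions(long collectLimit, int maxIterations) {
        this.collectLimit = collectLimit;
        this.maxIterations = maxIterations;
    }

    // Each with...() call returns a NEW object; "this" is left unchanged.
    public GcOptions withCollectLimit(long limit) {
        return new GcOptions(limit, this.maxIterations);
    }

    public GcOptions withMaxIterations(int max) {
        return new GcOptions(this.collectLimit, max);
    }
}
```

Calling `options.withCollectLimit(900000L)` and discarding the result leaves `options` at its defaults. Chaining the calls and keeping the final result, as in `new GcOptions().withCollectLimit(900000L).withMaxIterations(10)`, is what carries the custom values forward; with Oak, the same would presumably apply to the object passed to setOptions.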

Lucene query with filter "without property"

I need to write a Lucene query/filter that returns objects without a specific property.
I tried ... ISNULL:"cm:param_name" but it didn't work.
Edit: I have added a new property to an aspect, but objects that haven't been updated yet don't have it among their listed properties (checked with the node browser).
With a query like "cm:*", you should only receive documents that have the field "cm" with some content. Note that you have to allow leading wildcards in the query parser with setAllowLeadingWildcard(true).
Also check out this post, which deals with a reversed version of your problem:
Find all Lucene documents having a certain field
Can you please be clearer about what "without property" means? Do you mean that you do not want to specify the field, as in "field:value", and instead want to set the filter to just "value"?
EDIT
Are you generating these field names dynamically, or is this the only field whose value can be missing? If there is only one field that may or may not appear in your documents, you could populate it with a default value when it's missing and then search for that value. Otherwise, you could try a negated range query like this: NOT foo:[* TO *]. This should match all documents without a value in the foo field. For performance, in the second case the field should be indexed as a string field (not analyzed).
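One caveat with the negated range query: in plain Lucene, a query made up only of negative clauses matches nothing, so the NOT clause is usually combined with a match-all clause (`*:*` in the classic query parser). The sketch below only builds the query string; `MissingFieldQuery` and its methods are hypothetical helpers, the escaping covers just backslash and colon, and Alfresco's own query dialect may behave differently.

```java
// Hypothetical helper that builds a "documents without this field" query string
// for the classic Lucene query parser syntax.
final class MissingFieldQuery {
    // Escape backslashes and colons so a name like cm:title is read as one field name.
    public static String escapeFieldName(String field) {
        return field.replace("\\", "\\\\").replace(":", "\\:");
    }

    // Match everything, then exclude documents that have any value in the field.
    public static String withoutField(String field) {
        return "*:* AND NOT " + escapeFieldName(field) + ":[* TO *]";
    }
}
```

For a field named cm:title this produces `*:* AND NOT cm\:title:[* TO *]`, which can then be handed to whichever query parser the application already uses.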
I managed to get this done with .. AND NOT (#namespace\:property:"")
In Java with Lucene 3.6.2, a FieldValueFilter with negation enabled can be used (although that was not the question):
import org.apache.lucene.search.FieldValueFilter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TopDocs;
final IndexSearcher indexSearcher = getIndexSearcher(); // wherever that comes from
final TopDocs topdocs = indexSearcher.search(new MatchAllDocsQuery(), new FieldValueFilter("cm", true), Integer.MAX_VALUE);
You can use ISUNSET and/or ISNULL for this scenario.
ISUNSET:"cm:title"
ISNULL:"cm:title"

Neo4j 2.0.1 Cypher performance difference between using start and match with a predicate

Started using Cypher about a week ago (really like it). In the 'browser' interface I'm running two queries:
1) start n=node:Node(name="foo") match (n)-[r*..4]-(m) return n,m
2) match (n{name:"foo"})-[r*..4]-(m) return n,m
The first query returns almost immediately, the second query more than an hour and counting. Naively I would think these would be equivalent, clearly they are not. I ran a 'smaller' (path just up to 1) version of both in the neo-shell so I could profile them.
profile start n=node:Node(name="foo") match (n)-[r*..1]-(m) return n,m;
ColumnFilter(symKeys=["m", "n", " UNNAMED51", "r"], returnItemNames=["n", "m"], _rows=4, _db_hits=0)
TraversalMatcher(start={"expr": "Literal(foo)", "identifiers": ["n"], "key": "Literal(name)",
"idxName": "Node", "producer": "NodeByIndex"}, trail="(n)-[*1..1]-(m)", _rows=4, _db_hits=5)
.
profile match (n{name:"foo"})-[r*..1]-(m) return n,m
ColumnFilter(symKeys=["n", "m", " UNNAMED33", "r"], returnItemNames=["n", "m"], _rows=4, _db_hits=0)
Filter(pred="Property(n,name(0)) == Literal(foo)", _rows=4, _db_hits=196870)
TraversalMatcher(start={"producer": "AllNodes", "identifiers": ["m"]},
trail="(m)-[*1..1]-(n)", _rows=196870, _db_hits=396980)
From other Stack Overflow questions I understand that db_hits is a good number to look at, so it looks like the second query has basically done a linear scan (my db has almost 400k nodes). This seems to be confirmed by the "producer" value of "AllNodes" instead of "NodeByIndex".
Obviously I need to specify the match predicate differently so that it hits the index. The index is called 'Node' and is on the property 'name'. Googling and searching Stack Overflow have not helped. How do I specify the condition in the match so that it hits the index?
Update:
After some poking around, it appears I'm using a 'legacy' index and then trying to hit it with a 'new style' query (one that doesn't use start); I'm extrapolating a bit here. So I can do the following:
create index ON :label(name)
and that would provide an index for a particular label on the name property, but I really want an index (I guess non-legacy index) on ALL the node names. I have use cases where that's important (user may not know the label but does know the name).
Any suggestions or guidance is much appreciated.
Right now there is no global schema index, so you would probably want to add a generic label like Entity or Node to all your nodes:
match (n) set n:Entity;
and then create an index on that label:
create index on :Entity(name);
A match that names the label, for example match (n:Entity {name:"foo"}), can then use the index.

EntityNotFoundException while firing a cypher

I am a newbie to Neo4j and I really need help.
I have created nodes with the properties NAME, EMAIL and AGE. These nodes have IS_FRIEND_OF relationships, with a SINCE property, to other nodes.
I have set the NAME values to “A”, “B”, “C”, “D” and so on.
Now when I run a query in the console like: START n=node(*) WHERE n.NAME='A' RETURN n;
it throws an exception: EntityNotFoundException: The property 'NAME' does not exist on Node[0]
If I add a property NAME = “” to Node[0] and then run the same query, it returns the correct output. That can work for a small data set, but for larger ones, setting every property on Node[0] doesn’t seem to be a good solution.
Is this the only workaround, or is there something better?
START n=node(*) WHERE n.NAME! = "A" RETURN n
The exclamation mark will do the following:
TRUE if n.prop = value, FALSE if n is NULL or n.prop does not exist
Cypher has two special operators, ? and !, that handle a missing property without raising this exception.
Using ? will evaluate to true if n.prop is missing:
START n=node(*) WHERE n.NAME? = "A" RETURN n
And using ! will evaluate to false if n.prop is missing:
START n=node(*) WHERE n.NAME! = "A" RETURN n
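The difference between the two operators can be modelled outside Cypher. The following is a minimal sketch of the semantics described above, with a node's properties represented as a plain map; `MissingPropertyOps` and its method names are hypothetical and nothing here calls Neo4j.

```java
import java.util.Map;

// Hypothetical model of the "?" and "!" property operators described above.
final class MissingPropertyOps {
    // n.prop? = value : a missing property counts as a match.
    public static boolean equalsOrMissing(Map<String, Object> props, String key, Object value) {
        return !props.containsKey(key) || value.equals(props.get(key));
    }

    // n.prop! = value : a missing property counts as no match.
    public static boolean equalsAndPresent(Map<String, Object> props, String key, Object value) {
        return props.containsKey(key) && value.equals(props.get(key));
    }
}
```

For a node such as Node[0] with no NAME property, the `?` form keeps the node in the result and the `!` form filters it out; for nodes that do have NAME, both forms reduce to an ordinary equality test.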

Neo4j python binding: Querying node index for a set of values

I have a Neo4j database graphDb where nodes have a property 'label'. I have a Lucene index 'my_nodes' with key 'label' which indexes the values of node property 'label'. Now I want to retrieve nodes which have property 'label' equal to a value from a list of possible values labellist. To accomplish this, I wrote a Cypher query the following way:
# parentheses keep the concatenation in a single expression across lines
cypherQ = ("""START n=node:my_nodes('"""
+ ' OR '.join(['label:' + str(i) for i in labellist])
+ """')
RETURN n""")
result = graphDb.query(cypherQ)
That works fine, but I wonder whether there is a way to write this as a parameterized query instead.
I tried something like:
cypherQ = """START n=node:my_nodes('label:{params}')
RETURN n"""
result = graphDb.query(cypherQ, params = labellist)
But this does not work for a list of values, although it does work when labellist contains a single value. The Neo4j tutorial does not provide much material on this issue.
Once again I am using a python binding for Neo4j.
A parameter can only stand in for the whole index query string, not for a part of it, so this would be
cypherQ = """START n=node:my_nodes({queryParam})
RETURN n"""
You then construct the Lucene query string in your client code and pass it to Cypher as that single parameter.
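Building that single parameter on the client side might look like the sketch below. It is written in Java to stay self-contained; the same string building carries over directly to Python. `IndexQueryParam` and its methods are hypothetical, the parameter name `queryParam` is the one used in the query above, and quoting each value as a Lucene phrase is an assumption that suits an exact-match index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that builds the Lucene query string passed as one parameter.
final class IndexQueryParam {
    // Produces e.g. label:"a" OR label:"b" from a key and a list of values.
    public static String orQuery(String key, List<String> values) {
        List<String> terms = new ArrayList<>();
        for (String v : values) {
            // Escape backslashes and double quotes inside the quoted value.
            String escaped = v.replace("\\", "\\\\").replace("\"", "\\\"");
            terms.add(key + ":\"" + escaped + "\"");
        }
        return String.join(" OR ", terms);
    }

    // Wraps the query string in the parameter map handed to the Cypher call.
    public static Map<String, Object> params(String key, List<String> values) {
        Map<String, Object> p = new HashMap<>();
        p.put("queryParam", orQuery(key, values));
        return p;
    }
}
```

The Cypher text itself stays constant, and only the value of `queryParam` changes from call to call.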