Why can't Neo4j indexing find nodes that I know exist? - indexing

I have created and indexed my graph database through localhost:7474 in Neo4j (visually).
The nodes have three properties: name, priority, and link.
I created an index on the name property of the nodes through the
add or remove indexes
tab of localhost:7474,
but when I try to retrieve nodes based on their names, in the data browser, the console, or my Java application, the nodes cannot be found.
In the console or data browser, when I write this query for red (there is a node with the name "red"), for example:
start n=node:name(name="red")
return n;
I get 0 rows returned.
And when I type this query:
start n=node:node(name="red")
return n;
or this one:
start n=node:Node(name="red")
return n;
I get "Index node does not exist" or "Index Node does not exist" in the console or data browser.
My database files are in the same path where Neo4j's default.graphdb folder is (I mean in "C:\Users\fereshteh\Documents\Neo4j"), and I first created the index, and then the graph database.
I don't know what I am doing wrong. Please help me; I will be very thankful.
Version of Neo4j: 1.9.4

I believe your assumption about how to set up the indexing is incorrect. You can read here for more information, but basically there are 3 things needed to create/read from an index: the index name, the entry key, and the entry value.
What you have specified above in the web console is the index name, but in your Cypher query you are specifying the entry key. You either want to use the node auto index, or to create a node in Cypher and index it there, but the latter isn't an option in 1.9.4.
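Since you mention a Java application, here is a minimal sketch against an embedded 1.9.x database that spells out all three parts of a legacy index lookup. It assumes the index was named "name" in the web console (the store path is taken from the question) and, importantly, that nodes are explicitly added to the index; creating a legacy index does not index existing nodes by itself, which is likely why your lookups return nothing:

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.index.Index;
import org.neo4j.graphdb.index.IndexHits;

public class IndexLookup {
    public static void main(String[] args) {
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabase("C:/Users/fereshteh/Documents/Neo4j/default.graphdb");
        Index<Node> byName = db.index().forNodes("name"); // index NAME, as created in the console

        // Entries must be added explicitly, inside a transaction:
        Transaction tx = db.beginTx();
        try {
            Node red = db.createNode();
            red.setProperty("name", "red");
            byName.add(red, "name", "red"); // entry KEY and entry VALUE
            tx.success();
        } finally {
            tx.finish();
        }

        // Lookup: index name "name", entry key "name", entry value "red".
        IndexHits<Node> hits = byName.get("name", "red");
        for (Node n : hits) {
            System.out.println(n.getProperty("name"));
        }
        hits.close();
        db.shutdown();
    }
}

With auto-indexing enabled instead (node_auto_index on the name property), the equivalent Cypher lookup in 1.9 would be start n=node:node_auto_index(name="red") return n;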

How to handle supernodes with DSE Graph

I have a simple vertex "url":
schema.vertexLabel('url').partitionKey('url_fingerprint', 'prop1').properties("url_complete").ifNotExists().create()
And an edge label called "links" which connects one url to another:
schema.edgeLabel('links').properties("prop1", 'prop2').connection('url', 'url').ifNotExists().create()
It's possible that one url has millions of incoming links (e.g. the front page of ebay.com from all of its subpages).
But that seems to result in really big partitions and a crash of DSE because of wide partitions (from the OpsCenter wide partitions report):
graphdbname.url_e (2284 mb)
How can I avoid that situation? How do I handle these "supernodes"? I found a "partition" command for labels (article about this [1]), but it is deprecated and will be removed in DSE 6.0; the only hint in the release notes is to model the data in another way, but I have no idea how I can do that in this case.
I'm happy about every hint. Thanks!
[1] https://www.experoinc.com/post/dse-graph-partitioning-part-2-taming-your-supernodes
The current recommendation is to use the concept of "bucketing" that drives data model design in the C* world and apply it to the graph by creating an intermediary vertex that represents groups of links.
2 vertex labels:
URL
URL_Group | partition key ((url, group)) … i.e. a composite primary key with 2 partition key components
2 edges:
URL -> URL_Group
URL_Group <-> URL_Group (replaces the existing self-referencing edge)
Store no more than ~100K url_fingerprints per group; create a new group once ~100K edges exist.
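As a rough illustration, the bucketed schema could be created through the DSE Java driver; this is a sketch under assumptions, with the graph name taken from the question and the label and edge names (url_group, in_group, group_links) invented for the example:

import com.datastax.driver.dse.DseCluster;
import com.datastax.driver.dse.DseSession;
import com.datastax.driver.dse.graph.GraphOptions;

public class BucketedSchema {
    public static void main(String[] args) {
        DseCluster cluster = DseCluster.builder()
                .addContactPoint("127.0.0.1")
                .withGraphOptions(new GraphOptions().setGraphName("graphdbname"))
                .build();
        DseSession session = cluster.connect();

        // Buckets of ~100K links each; (url_fingerprint, group) is a composite partition key.
        session.executeGraph("schema.vertexLabel('url_group')"
                + ".partitionKey('url_fingerprint', 'group').ifNotExists().create()");
        // URL -> URL_Group membership edge.
        session.executeGraph("schema.edgeLabel('in_group')"
                + ".connection('url', 'url_group').ifNotExists().create()");
        // URL_Group <-> URL_Group, replacing the old url -> url self-referencing edge.
        session.executeGraph("schema.edgeLabel('group_links')"
                + ".connection('url_group', 'url_group').ifNotExists().create()");

        cluster.close();
    }
}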
This solution requires bookkeeping to determine when a new group is needed.
This could be done through a simple C* table for fast, easy retrieval:
CREATE TABLE lookup (
    url_fingerprint text,   -- types assumed from context
    "group" int,            -- quoted: group may collide with a CQL keyword
    count counter,
    PRIMARY KEY (url_fingerprint, "group")
) WITH CLUSTERING ORDER BY ("group" DESC);
This should preserve DESC order; you may need to add an ORDER BY clause if DESC order is not preserved.
Prior to writing to the graph, one would need to read the table to find the latest group:
SELECT url_fingerprint, "group", count FROM lookup WHERE url_fingerprint = ? LIMIT 1;
If the counter is > ~100K, create a new group (increment the group by 1). During or after writing a new row to the graph, one would need to increment the counter.
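A minimal Java sketch of that read-then-increment bookkeeping, using the DataStax Java driver 4.x; the keyspace name graphks and the class shape are assumptions, while the table and column names follow the answer above:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;

public class GroupBookkeeping {
    private static final long MAX_EDGES_PER_GROUP = 100_000L; // the ~100K threshold
    private final CqlSession session;

    public GroupBookkeeping(CqlSession session) {
        this.session = session;
    }

    // Find the newest group for this url_fingerprint; start a new one if it is full.
    public int nextGroup(String urlFingerprint) {
        Row row = session.execute(
                "SELECT \"group\", count FROM graphks.lookup WHERE url_fingerprint = ? LIMIT 1",
                urlFingerprint).one();
        if (row == null) {
            return 0; // first group for this URL
        }
        int group = row.getInt("group");
        long count = row.getLong("count"); // counters are 64-bit
        return count >= MAX_EDGES_PER_GROUP ? group + 1 : group;
    }

    // During/after writing the edge to the graph, bump the counter for the chosen group.
    public void recordEdge(String urlFingerprint, int group) {
        session.execute(
                "UPDATE graphks.lookup SET count = count + 1"
                        + " WHERE url_fingerprint = ? AND \"group\" = ?",
                urlFingerprint, group);
    }
}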
Traversing would require something similar to:
g.V().has(some url).out(URL).out(URL_Group).in(URL)
Where conceptually you would traverse the relationships like URL -> URL_Group -> URL_Group <- URL.

Using an index in DSE Graph

I'm trying to get the list of persons in a DataStax graph that share an address with other persons, where the number of persons at the address is between 3 and 5.
This is the query:
g.V().hasLabel('person').match(__.as('p').out('has_address').as('a').dedup().count().as('nr'),__.as('p').out('has_address').as('a')).select('nr').is(inside(3,5)).select('p','a','nr').range(0,20)
On the first run I noticed this error message:
Could not find an index to answer query clause and graph.allow_scan is
disabled: ((label = person))
I've enabled graph.allow_scan=true and now it's working.
I'm wondering how I can create an index to be able to run this query without enabling allow_scan=true?
Thanks
You can create an index by adding it to the schema using a command like this:
schema.vertexLabel('person').index('address').materialized().by('has_address').add()
Full documentation on adding indexes is available here: https://docs.datastax.com/en/latest-dse/datastax_enterprise/graph/using/createIndexes.html
You should not enable graph.allow_scan=true, as under the covers it turns on ALLOW FILTERING on the CQL queries. This causes a lot of cluster scans and will inevitably time out with any real amount of data in the system. You should never enable this in any sort of production environment.
I am not sure that indexing is the solution to your problem.
The best way to do this would be to reify addresses as nodes and look for address nodes with an indegree between 3 and 5, as sketched below.
You can use an index on the textual fields of your address nodes.
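A minimal sketch of that indegree filter with the TinkerPop Java API, runnable against an in-memory TinkerGraph; the label and edge names follow the question, and note that P.between's upper bound is exclusive, so between(3, 6) covers 3..5:

import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.inE;

import org.apache.tinkerpop.gremlin.process.traversal.P;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

public class SharedAddresses {
    public static void main(String[] args) {
        GraphTraversalSource g = TinkerGraph.open().traversal();
        // Addresses pointed to by 3 to 5 has_address edges, then back out to those persons.
        g.V().hasLabel("address")
                .where(inE("has_address").count().is(P.between(3, 6))) // 3..5 inclusive
                .in("has_address") // the persons living at each such address
                .limit(20)
                .values("name")
                .forEachRemaining(System.out::println);
    }
}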

How to add a filter to match a value in a map bin in Aerospike

I have a requirement where I have to find a record in Aerospike based on an attributeId. The data in Aerospike is in the below format:
{
name=ABC,
id=xyz,
ts=1445879080423,
inference={2601=0.6}
}
Now I will be getting the value "2601" programmatically, and I should find this record based on that value. But the problem is that the value is a key in a map, and the size of this map may be more than 1, like:
inference={{2601=0.6},{2830=0.9},{2931=0.8}}
So how can I find this record using the attributeId in Java? Any suggestions much appreciated.
A little-known feature of Aerospike is that, in addition to a bin, you can define an index on:
List values
Map Keys
Map Values
Using an index defined on the map keys of your "inference" bin, you will be able to query (filter) based on the key's name.
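For example, with the Aerospike Java client (a minimal sketch; the namespace test and set records are assumptions, and the index creation is a one-time operation):

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.query.Filter;
import com.aerospike.client.query.IndexCollectionType;
import com.aerospike.client.query.IndexType;
import com.aerospike.client.query.RecordSet;
import com.aerospike.client.query.Statement;

public class MapKeyQuery {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);

        // One-time: secondary index on the MAP KEYS of the "inference" bin.
        // If the map keys are stored as integers rather than strings,
        // use IndexType.NUMERIC here and a numeric value in the filter below.
        client.createIndex(null, "test", "records", "inference_keys_idx",
                "inference", IndexType.STRING, IndexCollectionType.MAPKEYS);

        // Query: records whose inference map contains the key "2601".
        Statement stmt = new Statement();
        stmt.setNamespace("test");
        stmt.setSetName("records");
        stmt.setFilter(Filter.contains("inference", IndexCollectionType.MAPKEYS, "2601"));

        RecordSet rs = client.query(null, stmt);
        try {
            while (rs.next()) {
                System.out.println(rs.getRecord().bins);
            }
        } finally {
            rs.close();
        }
        client.close();
    }
}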
I hope this helps

Changing an indexed value in Neo4j affects the search in Cypher

I am using Neo4j 2.0.
I have a label PROD. All nodes having the label PROD have a property name. I have created an index on name like this:
CREATE INDEX ON :PROD(name)
After creating the index, if I change the value of the property name from "old" to "new", the following query works fine in a small dataset used for testing, but not in our production data with 700 nodes having the label PROD (where in total there are around a million nodes with other labels).
MATCH (n:PROD) WHERE n.name="new" RETURN n;
I have also created a legacy index on the same field, and after dropping and re-indexing the node on modification, it works perfectly fine on both the test and production datasets.
Is there a way to ensure the indexes are updated? What am I doing wrong? Why does the above query fail on the large dataset?
You can use the :schema command in the Neo4j browser or schema in neo4j-shell. The output should indicate which indexes have been created and which of them are already online.
Additionally there's a programmatic way to wait until index population has finished.
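From the embedded Java API that looks roughly like this (a minimal sketch; the store path is a placeholder):

import java.util.concurrent.TimeUnit;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class WaitForIndexes {
    public static void main(String[] args) {
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabase("/path/to/graph.db");
        try (Transaction tx = db.beginTx()) {
            // Block until index population has finished (or the timeout elapses).
            db.schema().awaitIndexesOnline(10, TimeUnit.MINUTES);
            tx.success();
        }
        db.shutdown();
    }
}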
Consider upgrading to 2.1.3; indexing has improved since 2.0.

neo4j count nodes performance on 200K nodes and 450K relations

We're developing an application based on Neo4j and PHP with about 200K nodes, where every node has a property like type='user' or type='company' to denote a specific entity of our application. We need to get the count of all nodes of a specific type in the graph.
We created an index for every entity, like users and companies, which holds the nodes with that property. So the users index holds 130K nodes, and the rest are in companies.
With Cypher we query like this:
START u=node:users('id:*')
RETURN count(u)
And the results are:
Returned 1 row. Query took 4080 ms
The server is configured with the defaults plus a few tweaks, but 4 seconds is too slow for our needs. Consider that the database will grow by 20K nodes a month, so we need this query to perform very well.
Is there any other way to do this, maybe with Gremlin or with some other server plugin?
I'll cache those results, but I want to know if it is possible to tweak this.
Thanks a lot, and sorry for my poor English.
Finally, using Gremlin instead of Cypher, I found the solution:
g.getRawGraph().index().forNodes('NAME_OF_USERS_INDEX').query(
new org.neo4j.index.lucene.QueryContext('*')
).size()
This method uses the Lucene index to get an "approximate" row count.
Thanks again to all.
Mmh, this is really about the performance of that Lucene index. If you just need this single query most of the time, why not store the total count as an integer property on some node, update it together with the index insertions, and, for good measure, refresh it every night by running the query above?
You could instead keep a property on a specific node up to date with the number of such nodes, where updates are done guarded by write locks:
Transaction tx = db.beginTx();
try {
    ...
    ...
    // Take a write lock on the counting node so concurrent increments serialize
    // instead of losing updates ("user_count" is assumed to start at 0).
    tx.acquireWriteLock( countingNode );
    countingNode.setProperty( "user_count",
        ((Integer) countingNode.getProperty( "user_count" )) + 1 );
    tx.success();
} finally {
    tx.finish();
}
If you want the best performance, don't model your entity categories as properties on the node. Instead, do it like this:
company1-[:IS_ENTITY]->companyentity
Or, if you are using 2.0:
company1:COMPANY
The second would also allow your index to be updated automatically in a separate background thread; by the way, imo that is one of the best new features of 2.0.
The first method should also prove more efficient, since making a "hop" generally takes less time than reading a property from a node. It does, however, require you to create a separate index for the entities.
Your queries would look like this:
v2.0
MATCH (company:COMPANY)
RETURN count(company)
v1.9
START entity=node:entityindex(value='company')
MATCH company-[:IS_ENTITY]->entity
RETURN count(company)