Creating Index Neo4j - indexing

If I do the CREATE INDEX FOR (var:TYPE) ON var.property command in Cypher, does this apply to all current and future nodes of type TYPE? My assumption is that it does, as I never specify which nodes of type TYPE it applies to. So this means I'll only need to issue this command once, when setting up my database?

Yes. Once you run the command
CREATE INDEX FOR (var:TYPE) ON var.property
for example:
CREATE INDEX FOR (var:Person) ON var.name
Neo4j creates an index on Person.name that covers both the existing nodes and any nodes added later, so you only need to run it once.

Related

How to prevent data duplication in redisgraph?

I wrote some code to store a graph in RedisGraph. The first run stores a single graph, but if I execute the same code a second time, it stores the same graph again without replacing the previous one. So now I have two identical graphs under a single key in the database. I don't want any duplicate graph or duplicate node: if I execute the same code again, it should replace the previous graph. How can I do that?
If your code consists of a series of CREATE commands (whether through Cypher or one of the RedisGraph clients), running it twice will duplicate all of your data. This is not to say that the key stores two graphs; rather, it is one graph with every entity repeated.
If you would like to replace an existing graph, you should delete the existing graph first. You can delete a graph using a Redis command:
DEL [graph key]
Or a RedisGraph command:
GRAPH.DELETE [graph key]
The two are functionally identical.
Conversely, if you want to update an existing graph without introducing duplicates, you should use the MERGE clause as described in the RedisGraph documentation.
You can use the MERGE clause to prevent inserting duplicate data.
Below is a query to remove duplicate records from existing data:
MATCH (p:LabelName)
WITH p.id as id, collect(p) AS nodes
WHERE size(nodes) > 1
UNWIND nodes[1..] AS node
DELETE node
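To make the grouping logic of the query above easier to follow, here is a small Python sketch of the same idea on plain dictionaries. It is only an in-memory illustration, not RedisGraph code; the `dedupe_by_id` helper and the sample records are hypothetical.

```python
from collections import defaultdict

def dedupe_by_id(nodes):
    """Keep the first node seen for each id; return (kept, deleted)."""
    groups = defaultdict(list)          # mirrors: WITH p.id AS id, collect(p) AS nodes
    for node in nodes:
        groups[node["id"]].append(node)

    kept, deleted = [], []
    for group in groups.values():
        kept.append(group[0])           # the first node of each group survives
        deleted.extend(group[1:])       # mirrors: UNWIND nodes[1..] AS node DELETE node
    return kept, deleted

# Hypothetical sample data: id 1 appears twice.
nodes = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
]
kept, deleted = dedupe_by_id(nodes)
print(len(kept), len(deleted))
```

Groups with a single member contribute nothing to `deleted`, which is the role `WHERE size(nodes) > 1` plays in the query.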
MERGE behaves like a find-or-create.
If your node, edge or path does not exist, MERGE creates it; if it already exists, MERGE matches it instead of adding a copy.
That's the recommended way to avoid duplicate entities if they are not permitted.
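The difference between CREATE and MERGE when a load script runs twice can be sketched with a toy in-memory model. This is not a RedisGraph client and not how RedisGraph implements MERGE; the `ToyGraph` class and its methods are hypothetical names used only to show the find-or-create behaviour.

```python
class ToyGraph:
    """In-memory stand-in for a graph; not a RedisGraph client."""

    def __init__(self):
        self.nodes = []

    def create(self, label, **props):
        # CREATE always adds a new node, even if an identical one exists.
        node = {"label": label, **props}
        self.nodes.append(node)
        return node

    def merge(self, label, **props):
        # MERGE first looks for a node matching the whole pattern...
        for node in self.nodes:
            if node["label"] == label and all(
                node.get(key) == value for key, value in props.items()
            ):
                return node
        # ...and only creates one when nothing matches.
        return self.create(label, **props)

created = ToyGraph()
merged = ToyGraph()
for _ in range(2):                      # simulate running the same load script twice
    created.create("Person", name="Alice")
    merged.merge("Person", name="Alice")

print(len(created.nodes), len(merged.nodes))
```

Running the "script" twice doubles the data in the CREATE-only graph, while the MERGE-based graph still holds a single node.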

Neo4j - Find node by ID - How to get the ID for querying?

I want to be able to find a specific node by its ID for performance reasons (IDs are more efficient than indexes).
In order to execute the following example:
MATCH (s)
WHERE ID(s) = 65110
RETURN s
I will need the ID of the node (65110 in this case).
But how do I get it? Since the ID is auto-generated, it's impossible to find the ID without querying the graph, which kind of defeats the purpose, since I will already have the node.
Am I missing something?
TL;DR: use an indexed property for lookups unless you absolutely need to optimise and can measure the difference.
Typically you use an index lookup as an entry point to the graph, that is, to obtain the node that provides the start of an edge traversal. While the pointer-like nature of Neo4j node IDs means they are theoretically faster, index lookups are also very efficient so you should not discount them on performance grounds unless you are sure it will make a measurable difference.
You should also consider that Neo4j node IDs are not stable. If you delete a node it is possible for the same ID to be re-used in future. For this reason they should really be considered an internal implementation detail and not one that should be relied on as part of your application's external interface.
That said, I have an application that stores Neo4j IDs in a Solr index for looking up nodes in bulk, but this index is considered volatile and the nodes also contain an indexed, application-generated UUID property (with a unique constraint) that serves as their main "primary key".
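As a rough sketch of the "application-generated UUID as primary key" approach, the snippet below generates the key in the application and builds a parameterised lookup on that property instead of on ID(). It only constructs query text and parameters and does not talk to a database; the `Entity` label, the `uuid` and `name` property names, and the `new_node_params` helper are assumptions for illustration.

```python
import uuid

def new_node_params(name):
    """Parameters for creating a node with an application-generated key."""
    return {"uuid": str(uuid.uuid4()), "name": name}

# Hypothetical label and property names; adjust them to your own model.
CREATE_QUERY = "CREATE (n:Entity {uuid: $uuid, name: $name}) RETURN n"
LOOKUP_QUERY = "MATCH (n:Entity {uuid: $uuid}) RETURN n"

params = new_node_params("example")

# The UUID is what the application stores and hands out, not ID(n),
# so it stays valid even if internal node IDs are re-used.
lookup_params = {"uuid": params["uuid"]}
print(LOOKUP_QUERY, lookup_params)
```

With an index or unique constraint on the `uuid` property, the lookup query would be served by an index seek rather than a scan.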
Further reading and discussion: https://github.com/neo4j/neo4j/issues/258

No nodes added to index in Neo4j

The query I used to create the nodes is as follows
load csv with headers from "file:/sample.csv" as Jobs
create (TheJobs {Job_name: Jobs.insert_job, Job_type: Jobs.job_type, Owner: Jobs.`#owner`})
return TheJobs
and the command I used to create index was
CREATE INDEX ON :TheJobs(Job_name)
Output: Added 1 index, statement executed in 32 ms.
Then I tried to create relationships using the following query
Profile load csv with headers from "file:/Jobstofiles.csv"
as rels2
match (from :Files {Filename: rels2.Filename})
where rels2.`Automatic or Manual`="Automatic"
match (to :TheJobs {Job_name: rels2.Job})
create (from)-[:Is_triggered_by {type: rels2.Is_triggered_by}]->(to)
return from, to
The execution plan shows a NodeIndexSeek, but the query returns zero rows/matches even though the data clearly matches.
When I try to search a node from the index using the following query
PROFILE MATCH (node :Jobindex {Job_name: 'Job1'}) RETURN node
Output: 0 rows
What am I doing wrong?
I think you may have misread how indexes are used in Neo4j.
Your creation of the index:
CREATE INDEX ON :Jobindex(Job_name)
does not create an index called Jobindex on the Job_name property of all nodes. That's not how it works.
Instead, what you did was create an index on the Job_name property of nodes with the :Jobindex label. This index is only used when you have a :Jobindex node with a Job_name property.
If you need to index on the Job_name property on a different kind of node, create an index with the label of that node. If you want to index on that property across nodes with differing labels, then consider if there is some more universal label you can apply to those nodes and then index on that (remember that nodes can have multiple labels).
As for using indexes, there are ways to force usage of an index, but those aren't common cases. Index usage is largely invisible to you and there is no special syntax for it: just write your match using the label and property that happen to be indexed, and the index will be used under the hood (if present) to speed up the lookup.
In other words, if you indexed on :Job(Job_name), and you wanted to look up a job by name, you'd query it with:
MATCH (job:Job{Job_name:"Software Engineer"})
...
Which is exactly how you would query it without an index.
I'd suggest rereading the Schema section of the Cypher manual.
EDIT: Adding examples.
So, let's say that in the data you are importing, you expect to create nodes with the :Job label, which will have the Job_name property for fast lookup by name.
You'll want to create your index like this:
CREATE INDEX ON :Job(Job_name)
That way any operation that explicitly uses a :Job node and its Job_name property will take advantage of that index to improve the lookup speed, if possible.
But if you have a different case, needing to do the indexing over different kinds of nodes that have that same property, you still need a common label for that index. If this is just for data import, then you might cheat a bit, and create nodes with multiple labels, where one of those labels is the one you created an index on. Then, when you're done with your import, you can delete the index if it's not of any more use to you.
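The point that an index belongs to one label and one property can be illustrated with a toy model. This is not Neo4j's implementation; `ToyLabelIndex` and the node dictionaries are hypothetical, and the example assumes one node was created without the indexed label.

```python
from collections import defaultdict

class ToyLabelIndex:
    """Toy model of an index on one (label, property) pair; not Neo4j's implementation."""

    def __init__(self, label, prop):
        self.label, self.prop = label, prop
        self.entries = defaultdict(list)   # property value -> nodes

    def on_node_created(self, node):
        # Only nodes that carry the indexed label and property are added.
        if self.label in node["labels"] and self.prop in node["props"]:
            self.entries[node["props"][self.prop]].append(node)

    def seek(self, value):
        return self.entries.get(value, [])

index = ToyLabelIndex("Jobindex", "Job_name")

# One node without the indexed label, one with it (both hypothetical).
unlabelled = {"labels": set(), "props": {"Job_name": "Job1"}}
labelled = {"labels": {"Jobindex"}, "props": {"Job_name": "Job2"}}
for node in (unlabelled, labelled):
    index.on_node_created(node)

print(len(index.seek("Job1")), len(index.seek("Job2")))
```

A seek for "Job1" finds nothing even though a node with that Job_name exists, because that node never carried the indexed label. That is consistent with an index seek appearing in the plan while returning zero rows.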

Do I need to issue CREATE INDEX every time I update the database?

I have a geo-map database with columns x, y, z, zoom and type. Initially the read speed was very slow when I used the query
SELECT image WHERE x = ... AND y=... AND zoom=... AND type =...
Thanks to the kind help from Stack Overflow, I found that indexing (x,y,z,zoom) improved the read speed impressively.
However, I have a question: does this CREATE INDEX command only need to be issued once, when the database is first initialized? And as the database gradually grows, will it still enjoy the read speed improvement brought by indexing?
Or do I need to issue the CREATE INDEX command every time before I close my application (the database grows while the application is running)?
You will only need to create an index once.
The database remembers which columns are indexed and keeps the index up to date as your table changes.
If you insert an entry to the table, it will be added to the index. If you change an entry - it will be modified in the index. Finally, if you delete an entry - it will be removed from the index.
Note that the index will speed up your search operations (SELECT on the indexed columns), but it will slow down INSERT, UPDATE and DELETE, because each of those also has to update the index.
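Here is a minimal sketch of that behaviour using Python's built-in sqlite3 module. The question does not say which database engine is in use, so SQLite, the `tiles` table name and the `idx_tiles_lookup` index name are all assumptions. The index is created once, rows are inserted afterwards, and the query plan still uses the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tiles (x INTEGER, y INTEGER, z INTEGER, zoom INTEGER, type TEXT, image BLOB)"
)

# Issued once, when the schema is set up.
conn.execute("CREATE INDEX idx_tiles_lookup ON tiles (x, y, zoom, type)")

# Rows added later are indexed automatically; no second CREATE INDEX is needed.
rows = [(x, y, 0, 3, "road", b"") for x in range(50) for y in range(50)]
conn.executemany("INSERT INTO tiles VALUES (?, ?, ?, ?, ?, ?)", rows)

# Ask SQLite how it would run the lookup from the question.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT image FROM tiles WHERE x = ? AND y = ? AND zoom = ? AND type = ?",
    (10, 20, 3, "road"),
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)   # last column holds the plan detail
print(plan_text)
```

The plan text names the index, showing that rows inserted after CREATE INDEX are served by it.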

Obtain all keys of a Neo4j index

I have a Neo4j database whose content is generated dynamically from a big dataset.
All “entry points” nodes are indexed on a named index (IndexManager.forNodes(…)). I can therefore look up a particular “entry point” node.
However, I would now like to enumerate all those specific nodes, but I can't know on which key they were indexed.
Is there any way to enumerate all keys of a Neo4j Index?
If not, what would be the best way to store those keys, a data type that is eminently non-graph-oriented?
UPDATE (thanks for asking details :) ): the list would be more than 2 million entries. The main use case would be to never update it after an initialization step, but other use cases might need it, so it has to be somewhat scalable.
Also, I would really prefer avoiding killing my current resilience abilities, so storing all keys at once, as opposed to adding them incrementally, would be a last-resort solution.
I would either use a different data store to supplement Neo4j (I like Redis) or try @MattiasPersson's suggestion and store the list on a node.
Is it just one list of keys or is it a list per node? You could store such a list on a specific node, say the reference node.
Instead of using a different storage, which increases complexity, you could try again with Lucene indices. Normally Lucene is able to handle this easily, especially now that MatchAllDocsQuery is better. One problem is that the Neo4j developers are using a very old Lucene version.
Another option is a special "reference" field in every node, specifically for this key-traversal case, linking to the next node, from which you can easily get ALL properties :)
If you want to get all Nodes, which were indexed in a particular index, you can just do:
IndexHits<Node> hits = IndexManager.forNodes(<INDEX_NAME>).query("*:*");
try {
    while (hits.hasNext()) {
        Node n = hits.next();
        // ...process the node...
    }
} finally {
    hits.close();
}