how to create geohash tree in neo4j - cypher

Based on a problem here, an expert has answered with this code:
CALL spatial.addPointLayerGeohash('my_geohash_layer_name')
CREATE (n:Node {latitude:60.1,longitude:15.2}) WITH n
CALL spatial.addNode('my_geohash_layer_name',n) YIELD node
RETURN node
to create a geohash tree that organises spatial nodes.
So I tried that with two spatial nodes, but unlike the R-tree, the spatial nodes aren't linked to the layer by any relationship. Is this code correct, or what is wrong?

If you want an in-graph tree structure as an index, you need to use the R-tree index (which is the default in Neo4j Spatial). If you want a geohash index, there will be no tree in the graph, because geohashes are stored as strings in a Lucene index for string prefix searches. String prefix searches are a common way of searching geohash-based indexes.
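To see why plain string prefix searches work as a spatial lookup, here is a small self-contained Python sketch (this illustrates the general geohash idea, not Neo4j Spatial's actual code; the coordinates, index data, and precision are arbitrary):
# Minimal geohash encoder plus a prefix "query", purely to illustrate the idea
# behind storing geohashes as strings and searching them by prefix.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def encode_geohash(lat, lon, precision=9):
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, even = [], True  # geohash interleaves bits, starting with longitude
    while len(bits) < precision * 5:
        rng = lon_range if even else lat_range
        val = lon if even else lat
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        even = not even
    # pack every 5 bits into one base32 character
    return "".join(
        BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
        for i in range(0, len(bits), 5)
    )

points = {"a": (60.1, 15.2), "b": (60.1005, 15.2003), "c": (48.85, 2.35)}
hashes = {name: encode_geohash(lat, lon) for name, (lat, lon) in points.items()}
print(hashes)

# A "query": keep everything whose geohash starts with the first 6 characters
# of point a's hash, i.e. everything in the same geohash cell at that precision.
# This is the kind of string prefix search Lucene runs on the indexed values.
prefix = hashes["a"][:6]
print([name for name, h in hashes.items() if h.startswith(prefix)])
Points that fall in the same geohash cell share the corresponding prefix, so a prefix query over the stored geohash strings narrows results to a bounding cell without needing any tree structure in the graph.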

Related

Semantic search on PostgreSQL

I know PostgreSQL has trigram similarity search and even indexing optimized for it (CREATE INDEX trgm_idx ON table USING gist (column gist_trgm_ops);), which can be used directly from Django (web framework):
Model.objects.filter(attribute__trigram_similar=query_string)
But what if, instead of surface similarity, I wanted to perform a semantic similarity query on database objects? (Which is obviously quite different from classic trigram similarity.)
A good example would be Google's universal sentence encoder, where I would convert all strings into 512-dimensional embedding vectors (using the library) and perform the query by calculating the normalized dot product (cosine similarity), yielding the object with the highest similarity (or perhaps the n objects with similarity >= 0.50).
The simplest thing to do is to iterate (at the framework level) over the database objects, but this is highly inefficient (especially if the database is large), so I would rather find a way to perform the query at the database level (and perhaps, if possible, set up optimal indexing for semantic search?).
What would be the best way to perform this custom similarity search on a database of pre-vectorized objects?
What if I compute the dot product against all objects in the pre-vectorized database manually?
Thank you!
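For concreteness, the brute-force approach the question describes (compute the normalized dot product against every stored vector in application code) looks roughly like this; the names embed, Document, and embedding are hypothetical:
import numpy as np

def cosine_similarity(a, b):
    # normalized dot product of two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_n(query_vec, stored_vectors, n=10, threshold=0.5):
    # stored_vectors: {object_id: 512-dim numpy array}, built once from the DB
    scored = [(obj_id, cosine_similarity(query_vec, vec))
              for obj_id, vec in stored_vectors.items()]
    scored = [(obj_id, s) for obj_id, s in scored if s >= threshold]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]

# Hypothetical usage with a Django model and an embedding function:
# stored = {row.id: np.array(row.embedding) for row in Document.objects.all()}
# results = top_n(embed(query_string), stored)
This is O(N) vector comparisons per query, which is exactly the inefficiency the question wants to push down into the database or an index.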

How does Neo4j indexing (using lucene) work under the hood?

A few questions relating to Lucene indexes in Neo4j and how they're used during queries and traversals. Basically, given the way relationships are stored on disk (a linked list), it seems to me that any graph traversal would require sequentially visiting all relationships of a node; I'm not sure how an index could be used in this case. More specifically:
1) When node properties are indexed, how would that be used for a query such as "all my female friends of friends" (gender is indexed)? The only way I see an index being used is by first finding all friends of friends and then submitting a query to Lucene to get all the females. Is that faster than just doing the comparison in memory, though?
2) When relationship properties are indexed: since the relationships are stored in a linked list, it's impossible to get a subset of relationships for a node without sequentially walking the list. I suppose we could always index relationships using node IDs, but that seems silly - we'd end up storing adjacency lists in both Lucene and Neo4j.
Indexes are not used for traversals.
They are only used to find your starting points in the graph.
Depending on the relationship types and directions, you only traverse a subset of the relationships of a node.
For your query 1, you don't need an index on gender, as it would match about 50% of the people in your graph. But you would use an index for the initial user lookup (me):
create index on :User(name);
MATCH (m:User {name:"Me"})-[:FRIEND]->()-[:FRIEND]->(other:User)
WHERE other.gender = "female"
RETURN DISTINCT other;
2) Yes, you are right.
You can do that, but it is only necessary if you have a lot of relationships (millions) and want to access only a tiny slice of them.
So if that's your use case, a relationship index might help.
Relationships are actually indexed with both node IDs and a relationship property.

combine lucene indexing and traversal in neo4j to give a single resultset

Is there any way to combine Lucene indexing and traversal in Neo4j to search for users indexed by their name, but return the results in minimum-depth order (i.e. breadth-first traversal order)?
I.e., say we search for all users with name "John*", but closeness to a particular user node should be given more priority than anything else.
I.e., if the particular node is X, then the output should be in the following order:
X--JohnG
X------JohnM
X------------------JohnY
and so on...
I am not sure if I should use an evaluator to filter on names, since there may be thousands of nodes, so it does not sound very efficient without indexing.
Thanks for any help!
I do not believe this is possible. I do not see anywhere in the REST traversal framework where you can define the starting node by an index lookup, only by node ID. What you'd have to do is use the REST framework to perform the index lookup to get the node ID, then perform the traversal from that.
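A rough sketch of that two-step workaround in Python against the legacy /db/data REST API (the index name "users", the relationship type, and the exact endpoint and payload shapes are assumptions from memory of that old API, so treat this as an illustration rather than a verified recipe):
import requests

BASE = "http://localhost:7474/db/data"  # legacy REST API root (assumed)

# Step 1: Lucene lookup in a (hypothetical) legacy node index called "users".
hits = requests.get(f"{BASE}/index/node/users", params={"query": "name:John*"}).json()
john_ids = {hit["self"].rsplit("/", 1)[-1] for hit in hits}

# Step 2: breadth-first traversal from the reference node X; breadth-first
# order means nodes closer to X come back first.
x_id = 42  # node id of X, looked up separately
body = {
    "order": "breadth_first",
    "relationships": [{"type": "KNOWS", "direction": "all"}],  # hypothetical type
    "max_depth": 4,
}
visited = requests.post(f"{BASE}/node/{x_id}/traverse/node", json=body).json()

# Intersect: keep traversal order, but only nodes that matched the index query,
# so the Johns come back sorted by their distance from X.
johns_by_depth = [n for n in visited if n["self"].rsplit("/", 1)[-1] in john_ids]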

Extending neo4j with new index structures

Suppose I want to implement a new index structure (e.g., BITMAT) that will improve the efficiency of some queries (Path queries for the BITMAT case). How do I extend Neo4j so that every query with a specified query pattern uses my new index instead of Neo4j's native index?
You can implement a new IndexProvider that hooks into the normal Neo4j indexing system. This is then automatically exposed to Cypher. You can see an example of this in the SpatialIndexProvider, which projects a spatial query into an index lookup so that you can run Cypher queries against it:
https://github.com/neo4j/spatial/blob/master/src/main/java/org/neo4j/gis/spatial/indexprovider/LayerNodeIndex.java
Test with Cypher:
https://github.com/neo4j/spatial/blob/master/src/test/java/org/neo4j/gis/spatial/IndexProviderTest.java#L141

Change dynamically elasticsearch synonyms

Is it possible to store the synonyms for Elasticsearch in the index? Or is it possible to get the synonym list from a database like CouchDB?
I'd like to add synonyms dynamically to Elasticsearch via the REST API.
There are two approaches when working with synonyms:
expanding them at indexing time,
expanding them at query time.
Expanding synonyms at query time is not recommended since it raises issues with:
scoring, since synonyms have different document frequencies,
multi-token synonyms, since the query parser splits on whitespace.
More details on this at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory (on the Solr wiki, but relevant for Elasticsearch too).
So the recommended approach is to expand synonyms at indexing time. In your case, if the synonym list is managed dynamically, this means you should re-index every document containing a term whose synonym list has been updated, so that scoring remains consistent between documents analyzed pre- and post-update. I'm not saying that it is not possible, but it requires some work and will probably raise performance issues for synonyms that have a high frequency in your index.
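For reference, index-time expansion just means wiring a synonym token filter into the analyzer of the indexed field, so synonyms are baked into the inverted index when documents are written. A minimal Python sketch against the Elasticsearch REST API (recent, typeless-mapping versions; the index name, field name, and inline synonym list are made up for illustration):
import requests

# Create an index whose "title" field expands synonyms at indexing time.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms": {
                    "type": "synonym",
                    "synonyms": ["tv, television", "laptop, notebook"],
                }
            },
            "analyzer": {
                "index_time_synonyms": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_synonyms"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "index_time_synonyms"}
        }
    },
}
requests.put("http://localhost:9200/products", json=index_settings)
Changing my_synonyms later only affects documents indexed afterwards, which is exactly why the answer above says every affected document has to be re-indexed to keep scoring consistent.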
There are a few newer solutions now compared to those proposed in other answers a few years ago. The two main approaches are implemented as plugins:
The file-watcher-synonym filter is a plugin that can periodically reload synonyms every given number of seconds, as defined by the user.
The refresh-token plugin allows a real-time update of the index. However, this plugin apparently has some problems, which stem from the fact that Elasticsearch is unable to distinguish analyzers used only at search time from those used at index time.
A good discussion on this subject can be found in the Elasticsearch GitHub ticket system: https://github.com/brusic/refresh-token-filters
It isn't too painful in Elasticsearch to update the synonym list. It can be done by opening and closing the index. You could have it driven from anywhere, but you need some of your own infrastructure. It'd work like this (a rough sketch follows the steps):
You want an alias pointing at your current index
Sync down a new synonym file to your servers
Create a new index with a custom analyzer that uses the new synonym file
Rebuild the content from the current index into the new index
Repoint the index alias from the current to the new index
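Assuming a reasonably recent Elasticsearch, that flow maps onto the aliases and reindex APIs roughly as below (index and alias names are invented, the new synonym file is assumed to already be on the nodes, and the field mappings that attach the analyzer are omitted for brevity):
import requests

ES = "http://localhost:9200"

# 1. The application always searches through the alias, never the raw index.
requests.post(f"{ES}/_aliases", json={
    "actions": [{"add": {"index": "products_v1", "alias": "products"}}]
})

# 2./3. Create the new index whose analyzer reads the freshly synced synonym file.
requests.put(f"{ES}/products_v2", json={
    "settings": {"analysis": {
        "filter": {"my_synonyms": {"type": "synonym",
                                   "synonyms_path": "analysis/my-synonyms.txt"}},
        "analyzer": {"synonyms_analyzer": {"tokenizer": "standard",
                                           "filter": ["lowercase", "my_synonyms"]}},
    }}
})

# 4. Copy the documents across so they get re-analyzed with the new synonyms.
requests.post(f"{ES}/_reindex", json={
    "source": {"index": "products_v1"},
    "dest": {"index": "products_v2"},
})

# 5. Atomically repoint the alias; searches now hit the rebuilt index.
requests.post(f"{ES}/_aliases", json={
    "actions": [
        {"remove": {"index": "products_v1", "alias": "products"}},
        {"add": {"index": "products_v2", "alias": "products"}},
    ]
})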
In 2021, just expand synonyms at query time using a specific search analyzer and use the reload search analyzers API:
POST /my-index/_reload_search_analyzers
The synonym graph token filter must have updateable set to true:
"my-synonyms": {
"type": "synonym_graph",
"synonyms_path": "my-synonyms.txt",
"updateable": true
}
Besides, you should probably expand synonyms at query time anyway. Why?
Chances are that you have too much data to reindex every night or so.
Elasticsearch does not allow the Synonym Graph Filter in an index analyzer, only the plain Synonym Filter, which does not handle multi-word synonyms correctly.
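Putting that answer together, here is a minimal Python sketch of such a query-time setup with an updateable synonym_graph filter plus the reload call shown above (index, field, analyzer, and file names are illustrative):
import requests

ES = "http://localhost:9200"

# Search-time-only analyzer: the index analyzer stays plain, and the synonym
# graph filter is marked updateable so it can be reloaded without a reindex.
requests.put(f"{ES}/my-index", json={
    "settings": {"analysis": {
        "filter": {"my-synonyms": {
            "type": "synonym_graph",
            "synonyms_path": "my-synonyms.txt",
            "updateable": True,
        }},
        "analyzer": {"my_search_analyzer": {
            "tokenizer": "standard",
            "filter": ["lowercase", "my-synonyms"],
        }},
    }},
    "mappings": {"properties": {"title": {
        "type": "text",
        "analyzer": "standard",                   # used at index time
        "search_analyzer": "my_search_analyzer",  # used at query time
    }}},
})

# After editing my-synonyms.txt on every node, pick up the changes in place:
requests.post(f"{ES}/my-index/_reload_search_analyzers")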