I have a graph that has only schema indexes and no legacy indexes, as the Neo4j documentation recommends. I want to search for nodes as in the example described under the legacy indexing section (exact match, START queries, etc.). I am wondering whether this is possible with schema indexes, and whether schema indexes use Lucene underneath.
As of today, schema indexes only support exact matches, e.g.
MATCH (p:Person) WHERE p.name='abc'
or the IN operator
MATCH (p:Person) WHERE p.name in ['abc','def']
Future releases might have support for wildcards as well.
You can use wildcard-style matching via regular expressions as well; in that case the query would be
MATCH (b:book) WHERE b.title=~"F.*" RETURN b;
I have an ordinary string column which is not unique and doesn't have any indexes. I've added a search function which uses Prisma's contains filter under the hood.
Currently it takes around 40ms for my queries (with the test database having around 14k records), which could be faster, as I understood from this article: https://about.gitlab.com/blog/2016/03/18/fast-search-using-postgresql-trigram-indexes/
The issue is that nothing really changes after I add the trigram index, like this
CREATE INDEX CONCURRENTLY trgm_idx_users_name
ON users USING gin (name gin_trgm_ops);
The query execution time is literally the same. I also found that I can check whether the index is actually used by disabling the full (sequential) scan, and indeed the execution time becomes several times worse after that (meaning the added index is not actually used, since its performance is worse than a full scan).
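That check looks roughly like this (a minimal sketch, reusing the users table and the test query shown further below):
SET enable_seqscan = off;   -- discourage the planner from choosing a sequential scan
EXPLAIN ANALYZE
SELECT * FROM users
WHERE name LIKE '%test%'
ORDER BY name
LIMIT 10;                   -- the plan should mention trgm_idx_users_name if the index is usable
RESET enable_seqscan;       -- restore the default planner behaviour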
I tried both B-tree and GIN indexes.
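Roughly like this for the B-tree attempt (a sketch; the index name and the text_pattern_ops operator class are illustrative, and the GIN variant above additionally needs the pg_trgm extension):
CREATE EXTENSION IF NOT EXISTS pg_trgm;    -- prerequisite for gin_trgm_ops (no-op if already installed)
CREATE INDEX CONCURRENTLY btree_idx_users_name
ON users (name text_pattern_ops);          -- B-tree: only usable for left-anchored patterns like 'test%', never '%test%'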
The query example for testing is just searching records with LIKE:
SELECT *
FROM users
WHERE name LIKE '%test%'
ORDER BY name
LIMIT 10;
I couldn't find articles describing best practices for such LIKE queries, which is why I'm asking here.
So my questions are:
Which index type is suitable for such a case, i.e. when I need to find N records in a string column using LIKE (prisma.io contains)? Some docs say that the default B-tree is fine; some articles show that GIN is better for this purpose.
As I'm using Prisma's contains with Postgres, note that some features are still not supported in Prisma and require workarounds (such as using unsupported types, experimental features, etc.). I'd appreciate it if Prisma examples were given.
Also, full-text search is not suitable, as it requires knowing the language used in the column, but my column will contain data in different languages.
I am new to Neo4j. When I came across Neo4j indexes, all I found is that there is a legacy index and a newer one (the schema index), but I want to know what the types of these indexes are and whether there's a way to specify the type. E.g. in Oracle we have clustered/non-clustered/B-tree/bitmap indexes, etc.; do we have something similar in Neo4j?
In Neo4j there are two kinds of indexes:
Schema indexes (via create index or create constraint)
Legacy indexes
All those indexes are internally implemented with Lucene, and there is no notion of index type like in Oracle.
When you use legacy indexes, you can configure them as described here: http://neo4j.com/docs/stable/indexing-create-advanced.html
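For example, schema indexes and constraints are created directly in Cypher, while legacy indexes are queried explicitly via START (a minimal sketch; the Person label and the people index name are just placeholders):
CREATE INDEX ON :Person(name)
CREATE CONSTRAINT ON (p:Person) ASSERT p.email IS UNIQUE
// legacy (manual) index lookup, e.g. with a Lucene wildcard query:
START p=node:people('name:And*') RETURN p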
You can find some additional information here:
http://jexp.de/blog/2015/04/on-neo4j-indexes-match-merge/
http://blog.armbruster-it.de/2013/12/indexing-in-neo4j-an-overview/
Cheers
Is there not an expression in Neo4j like startsWith that runs fast on an indexed property?
I currently run a query like
match (p:Page) where p.Url =~ 'http://www.([\\w.?=#/&]*)' return p
The p.Url property is indexed; however, the query above is very slow. A startsWith index search in particular should be quite fast, shouldn't it?
Currently regex (or 'startsWith') filters are not supported by schema indexes. Your Cypher statement will scan through all nodes having the Page label and filter them based on their properties.
More sophisticated query capabilities for schema indexes are to be expected in one of the next releases of Neo4j.
If you need that functionality now you basically have 2 options:
use legacy indexes as documented in the reference manual.
if your queries always filter for URLs starting with protocol://prefix, you can work around this by putting the prefix protocol://prefix into an additional property urlPrefix and declaring an index on it (create index on :Page(urlPrefix)). Your query is then match (p:Page) where p.urlPrefix='http://www.' return p and will be run via the existing index.
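A minimal sketch of that workaround (the CREATE statement is only illustrative, showing where the extra property would be written):
CREATE INDEX ON :Page(urlPrefix)
// store the prefix alongside the full URL when writing the node, e.g.:
CREATE (p:Page {Url: 'http://www.example.org/about', urlPrefix: 'http://www.'})
// the lookup then uses the index instead of scanning every :Page node:
MATCH (p:Page) WHERE p.urlPrefix = 'http://www.' RETURN p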
I'm wondering if it is possible to query OrientDB indexes using the SQL LIKE operator?
Let's say I create an index in OrientDB in the following manner:
create index packageByName on Package (name) notunique
Now I can query this index using the equals (=) operator:
select from index:packageByName where key equals 'value'
This works, but I have to know the exact name to search for. What I'd really like to do is partial matching (the LIKE operator):
select from index:packageByName where key like 'val%'
While the latter command doesn't fail, it doesn't find anything either. Is pattern matching supported at all by OrientDB when querying indexes?
The one way I've found to get wildcard pattern matching is with LUCENE FULLTEXT indexes.
CREATE INDEX packageByName ON Package (name) FULLTEXT ENGINE LUCENE
Now you can query the index using the fuzzy and wildcard patterns allowed by Lucene (http://lucene.apache.org/core/2_9_4/queryparsersyntax.html):
SELECT * FROM index:packageByName WHERE key lucene "Sappo*"
These seem to also work:
SELECT * FROM index:packageByName WHERE key = "Sappo*"
select * from index:packageByName where key like "Sappo*"
There's a lot going on that I can't quite grok from the documentation, like this gem: I've found that if there is only one index (your LUCENE FULLTEXT one) on the property you want, you can also run direct queries:
SELECT * FROM Package WHERE name LUCENE "Sappo*"
I can't seem to make it work if the property (name in this case) is in multiple indexes.
No clue why the other types of indexes don't support wildcards (even the non-Lucene FULLTEXT).
Coming from a 20+ yr old mature DB, I'm also having a tough time getting used to not having some magic query analyzer pick the 'best' index behind the scenes, but I'm sure eventually OrientDB will get more polish.
I'm not sure I understand your use case, but I do see what you mean about the behavior when querying an index using the like operator. The following query kludge produces the results I think you're looking for; would it suffice for your intended usage?
select from (select from index:packageByName) where key like 'val%'
If not, you might want to post to the OrientDB User Group.
I have a database with two primary tables:
The components table (50M rows),
The assemblies table (100K rows),
...and a many-to-many relationship between them (about 100K components per assembly), thus about 10G relationships in total.
What's the best way to index the components such that I could query the index for a given assembly? Given the amount of relationships, I don't want to import them into the Lucene index, but am looking instead for a way to "join" with my external table on-the-fly.
Solr supports multi-valued fields. Not positive if Lucene supports them natively or not. It's been a while for me. If only one of the entities is searchable, which you mentioned is components, I would index all components with a field called "assemblies" or "assemblyIds" or something similar and include whatever metadata you need to identify the assemblies.
Then you can search components with
assemblyIds:(1, 2, 3)
To find components in assembly 1, 2 or 3.
To be brief, you've got to process the data and index it before you can search. Therefore, there is no way to just "plug in" Lucene to some data or database; instead you have to plug the data itself into Lucene (process, parse, analyze, index, and query it).
rustyx: "My data is mostly static. I can even live with a read-only index."
In that case, you might use Lucene itself. You can iterate over the data source to add all the many-to-many relations to the Lucene index. How did you come up with that "100GB" size? People index millions and millions of documents using Lucene; I don't think it'd be a problem for you to index.
You can add multiple instances of a field with different values (the "components") to a document that also has an "assembly" field.
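With the plain Lucene API that looks roughly like this (a sketch; the assembly/component ids and the index directory are made up, and the API shown is the Lucene 5+ style):
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import java.nio.file.Paths;

public class IndexAssemblies {
    public static void main(String[] args) throws Exception {
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("assembly-index")),
                new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new StringField("assembly", "A-1", Field.Store.YES));
            // repeating the same field name is what makes it multi-valued
            doc.add(new StringField("component", "C-42", Field.Store.YES));
            doc.add(new StringField("component", "C-43", Field.Store.YES));
            writer.addDocument(doc);
            // a TermQuery on ("component", "C-42") would now return this document
        }
    }
}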
rustyx: "I'm looking instead into a way to "join" the Lucene search with my external data source on the fly"
If you need something seamless, you might try out the following framework, which acts as a bridge between a relational database and a Lucene index.
Hibernate Search: in that tutorial, you might search for the "@ManyToMany" keyword to find the exact section of the tutorial and get some idea.
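The mapping side of that looks roughly like this (a sketch with made-up Component/Assembly entities, using the classic annotation-based Hibernate Search mapping):
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.ManyToMany;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.IndexedEmbedded;

@Entity
@Indexed                  // each Component becomes its own Lucene document
public class Component {
    @Id
    private Long id;

    @Field                // indexed as a regular Lucene field
    private String name;

    @ManyToMany
    @IndexedEmbedded      // embeds the @Field-annotated properties of the related assemblies
    private Set<Assembly> assemblies;
}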