Schema index for words in a string - indexing

I have a large number of nodes which have a property text containing a string.
I want to find all nodes whose text contains a given string (exact substring match). This can be done using the CONTAINS operator:
MATCH (n)
WHERE n.text CONTAINS 'keyword'
RETURN n
Edit: I am looking for all nodes n where n.text contains the substring 'keyword'. E.g. n.text = 'This is a keyword'.
To speed this up I want to create an index for each word. Is this possible using the new schema indexes?
(Alternatively this could be done using a legacy index and adding each node to it, but I would prefer using a schema index.)

Absolutely. Given that you are looking for an exact match, you can use a schema index. Judging from your question you probably know this, but to create the index you will need to assign your nodes a label and then create the index on that label:
CREATE INDEX ON :MyLabel(text)
Then, at query time, the Cypher execution engine will automatically use this index with the following query:
MATCH (n:MyLabel { text : 'keyword' })
RETURN n
This will use the schema index to look up the node with label MyLabel and property text with value keyword. Note that this is an exact match of the complete value of the property.
To force Neo4j to use a particular index, you can use an index hint:
MATCH (n:MyLabel)
USING INDEX n:MyLabel(text)
WHERE n.text = 'keyword'
RETURN n
EDIT
On re-reading your question, I think you are not actually looking for an exact match of the full value, but rather an exact match of the keyword within the text field. If so, then no, you cannot yet use schema indexes for this. Quoting the "Use index with STARTS WITH" section of the Neo4j manual:
The similar operators ENDS WITH and CONTAINS cannot currently be solved using indexes.

If I understand your question correctly, a legacy index would accomplish exactly what you're looking to do. If you don't want to have to maintain the index for each node you create/delete/update, you can use auto indexing (http://jexp.de/blog/2014/03/full-text-indexing-fts-in-neo4j-2-0/).
If you're looking to only use schema indexing, another approach would be to store each keyword as a separate node. Then you could use a schema index for finding relevant keyword nodes which then map to the node they exist on. Just a thought.
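For reference, here is a rough sketch of the legacy full-text route using the embedded Java API (Neo4j 2.x assumed; the index name, store path and property values are only illustrative):
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.index.Index;
import org.neo4j.graphdb.index.IndexManager;
import org.neo4j.helpers.collection.MapUtil;

public class FullTextLookup {
    public static void main(String[] args) {
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabase("data/graph.db"); // illustrative store path
        try (Transaction tx = db.beginTx()) {
            // Create (or fetch) a legacy full-text index backed by Lucene.
            Index<Node> texts = db.index().forNodes("texts",
                    MapUtil.stringMap(IndexManager.PROVIDER, "lucene", "type", "fulltext"));

            // Each node has to be added to the index explicitly (or via auto-indexing).
            Node n = db.createNode();
            n.setProperty("text", "This is a keyword");
            texts.add(n, "text", n.getProperty("text"));

            // Lucene query: every node whose 'text' contains the word 'keyword'.
            for (Node hit : texts.query("text", "keyword")) {
                System.out.println(hit.getProperty("text"));
            }
            tx.success();
        }
        db.shutdown();
    }
}
The downside, as noted above, is that you have to add and remove nodes from the index yourself unless auto-indexing is enabled.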

Related

couchbase index for variable array in query

I am trying to run an N1QL query that finds documents where a field matches an element of a variable array, but the query is too slow.
The query looks like this:
select * from bucket where tp='type' and
tm between 1484618520 and 1484618615 and nm='name' and
checked=false and (bucket.gm in ["TEST","TEST2"])
["TEST","TEST2"] part is variable depend on condition.
I want to speedup this query.
How can I create index for this query including variable array?
Thanks.
I solved this problem by using the command below:
create index new_index on bucket(gm,tp,tm,nm,checked) using gsi;
I set the "gm" field as leading key of new index.
Then the query speed was totally improved.
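If the statement is built from application code, one way to keep it stable (so the planner can keep using the same index) while the array varies is to pass the array as a named parameter. A sketch with the Couchbase Java SDK 2.x, reusing the bucket and field names from the question (the parameter names are made up):
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.json.JsonArray;
import com.couchbase.client.java.document.json.JsonObject;
import com.couchbase.client.java.query.N1qlQuery;
import com.couchbase.client.java.query.N1qlQueryResult;
import com.couchbase.client.java.query.N1qlQueryRow;

public class VariableArrayQuery {
    public static void main(String[] args) {
        Cluster cluster = CouchbaseCluster.create("localhost");
        Bucket bucket = cluster.openBucket("bucket");

        // The statement stays constant; only the parameters change per call.
        String statement = "SELECT * FROM bucket WHERE tp = $tp AND tm BETWEEN $from AND $to "
                + "AND nm = $nm AND checked = false AND gm IN $gms";

        JsonObject params = JsonObject.create()
                .put("tp", "type")
                .put("from", 1484618520)
                .put("to", 1484618615)
                .put("nm", "name")
                .put("gms", JsonArray.from("TEST", "TEST2")); // the variable part

        N1qlQueryResult result = bucket.query(N1qlQuery.parameterized(statement, params));
        for (N1qlQueryRow row : result) {
            System.out.println(row.value());
        }
        cluster.disconnect();
    }
}
Whether the IN predicate can still use new_index with a parameterized array depends on the planner, so it is worth checking the plan with EXPLAIN.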

Lucene DocValuesField, SortedDocValuesField usage for filtering and sorting

I am going to switch to the newest (4.10.2) version of Lucene and I'd like to make some optimizations in my index and code.
I would like to use DocValuesField to get values, but also for filtering and sorting.
So here I have some questions:
If I'd like to use a range filter (FieldCacheRangeFilter) I need to store the value in an XxxDocValuesField,
but if I want to use a terms filter (FieldCacheTermsFilter) I need to store the value in a SortedDocValuesField.
So it looks like if I want to use both range and terms filters I need two different fields. Am I right? Am I using this correctly?
Another thing is sorting. I can choose between SortedNumericSortField and SortField. The first requires SortedNumericDocValues, the other a NumericDocValuesField. Is there any (big) difference in performance?
Should I use SortedNumericSortField (adding another field to the index)?
And the last one: am I right that all corresponding DocValuesFields are removed from the index when a doc is deleted? I saw an IndexWriter method for updating a doc value but no delete method for doc values.
Regards
Piotr
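For what it's worth, here is a minimal sketch of the pairing described in the question, assuming Lucene 4.10.x (field names and values are invented):
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.search.FieldCacheRangeFilter;
import org.apache.lucene.search.FieldCacheTermsFilter;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.util.BytesRef;

public class DocValuesExample {
    static Document build(long price, String category) {
        Document doc = new Document();
        // Stored/indexed fields for normal retrieval and term queries.
        doc.add(new LongField("price", price, Field.Store.YES));
        doc.add(new StringField("category", category, Field.Store.YES));
        // Doc values used by filtering and sorting.
        doc.add(new NumericDocValuesField("price", price));                    // range filter + numeric sort
        doc.add(new SortedDocValuesField("category", new BytesRef(category))); // terms filter
        return doc;
    }

    static void filtersAndSort() {
        // Range filter over the numeric values.
        FieldCacheRangeFilter<Long> range =
                FieldCacheRangeFilter.newLongRange("price", 10L, 100L, true, true);
        // Terms filter over the sorted doc values.
        FieldCacheTermsFilter terms = new FieldCacheTermsFilter("category", "books", "music");
        // Sort on the numeric doc values.
        Sort byPrice = new Sort(new SortField("price", SortField.Type.LONG));
    }
}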

Neo4j index for full text search

I am working with Neo4j database version 2.0. I have the following requirements:
Case 1. I want to fetch all records where the name contains some string; for example, if I search for Neo4j then all records with names like Neo4j Data, Neo4j Database, Neo4jDatabase etc. should be returned.
Case 2. I want to fire a field-less query: if any of a set of properties has a matching value, those records should be returned. This could also be at the global level instead of the label level.
Case sensitivity is also a concern.
I have read various things about LIKE, indexes, full text search, legacy indexes etc., so what would be the best fit for my case, or do I have to use Elasticsearch etc.?
I am using spring-data-neo4j in my application, so please provide some configuration for SDN.
Annotate your name field with the @Indexed annotation:
@Indexed(indexName = "whateverIndexName", indexType = IndexType.FULLTEXT)
private String name;
Then query for it the following way (example for a method in an SDN repository; you can use a similar query anywhere else you use Cypher):
@Query("START n=node:whateverIndexName({query}) return n")
Set<Topic> findByName(@Param("query") String query);
Neo4j uses Lucene as the backend for indexing, so the query value must be a valid Lucene query, e.g. "name:neo4j" or "name:neo4j*".
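Putting the two snippets together, the repository might look roughly like this (SDN 3.x style; class and index names are just placeholders):
import java.util.Set;

import org.springframework.data.neo4j.annotation.Query;
import org.springframework.data.neo4j.repository.GraphRepository;
import org.springframework.data.repository.query.Param;

// 'Topic' stands in for your @NodeEntity carrying the indexed 'name' property.
public interface TopicRepository extends GraphRepository<Topic> {

    // The caller passes a raw Lucene query string against the legacy index,
    // e.g. topicRepository.findByName("name:neo4j*") for a prefix match.
    @Query("START n=node:whateverIndexName({query}) return n")
    Set<Topic> findByName(@Param("query") String query);
}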
There is an article that explains the confusion around various Neo4j indexes http://nigelsmall.com/neo4j/index-confusion.
I don't think you need to use Elasticsearch -- you can use the legacy (Lucene-backed) indexes to do full text searches.
Check out Michael Hunger's blog: jexp.de/blog
This post specifically: http://jexp.de/blog/2014/03/full-text-indexing-fts-in-neo4j-2-0/

Basic Lucene Beginners q: Index and Autocomplete

I am using Lucene.NET and have a basic question:
Do I need to make an additional Index for Autocompletion?
I created an index based on two different tables from a database.
Here are two Docs:
stored,indexed,tokenized,termVector<URL:/Service/Zahlungsmethoden/Teilzahlung>
stored,indexed,tokenized,termVector<Website:Body:The Text of the first Page>
stored,indexed,tokenized,termVector<Website:ID:19>
stored,indexed,tokenized,termVector<Website:Title:Teilzahlung>
stored,indexed,tokenized,termVector<URL:/Service/Kundenservice/Kinderbetreeung>
stored,indexed,tokenized,termVector<Website:Body:The text of the second Page>
stored,indexed,tokenized,termVector<Website:ID:13>
stored,indexed,tokenized,termVector<Website:Title:Kinderbetreuung>
I need to create a dropdown for a search with suggestions:
e.g. the term "Pag" should suggest "Page"
so I assume that for every word (token) in every doc, I need a list like:
p
pa
pag
page
Is this correct?
Where do I store these?
In an additional Index?
Or how would I re-arrange the existing structure of my index to hold the autocompletion-suggestions?
Thank you!
1) As femtoRgon said above, look at the Lucene Suggest API.
2) That said, one cheap way to perform auto-suggest is to look for all words that start with the string you've typed so far, like 'pa' returning 'pa' + 'pag' + 'page'. A wildcard query would return those results -- in Lucene query syntax, a query like 'pa*'. (You might want to restrict the suggestions to only those strings of length 2+.)
Mark Leighton Fisher has the right approach for a cheap way, but performing a wildcard query only returns you the documents, not the words. It might be better to look at the implementation of WildcardQuery instead. You need to use the Terms object retrieved from the IndexReader and iterate through the terms in the index.
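A rough sketch of that term iteration (written against Java Lucene 4.x; Lucene.NET mirrors these classes closely, and the field name, index path and prefix are only placeholders):
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class PrefixSuggestions {
    // Collect indexed terms of 'field' that start with 'prefix'.
    static List<String> suggest(IndexReader reader, String field, String prefix, int max)
            throws IOException {
        List<String> out = new ArrayList<>();
        Terms terms = MultiFields.getTerms(reader, field);
        if (terms == null) {
            return out;
        }
        TermsEnum te = terms.iterator(null);
        // Position the enum at the first term >= prefix, then walk forward.
        if (te.seekCeil(new BytesRef(prefix)) == TermsEnum.SeekStatus.END) {
            return out;
        }
        do {
            String term = te.term().utf8ToString();
            if (!term.startsWith(prefix) || out.size() >= max) {
                break;
            }
            out.add(term);
        } while (te.next() != null);
        return out;
    }

    public static void main(String[] args) throws IOException {
        try (IndexReader reader = DirectoryReader.open(FSDirectory.open(new File("index")))) {
            System.out.println(suggest(reader, "Website:Body", "pag", 10));
        }
    }
}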

How to overcome ( SQL Like operator ) performance issues?

I have a Users table containing about 500,000 rows of user data.
The full name of the user is stored in 4 columns, each of type nvarchar(50).
I have a computed column called UserFullName that is the combination of the 4 columns.
I have a stored procedure that searches the Users table by name using the LIKE operator, as below:
Select *
From Users
Where UserFullName like N'%'+#FullName+'%'
I have a performance issue while executing this stored procedure: it takes a long time.
Is there any way to overcome the poor performance of the LIKE operator?
Not while still using the LIKE operator in that fashion. The % at the start means your search needs to read every row and look for a match. If you really need that kind of search you should look into using a full text index.
Make sure your computed column is indexed; that way it won't have to compute the values each time you SELECT.
Also, depending on your indexing, using PATINDEX might be quicker, but really you should use a full-text index for this kind of thing:
http://msdn.microsoft.com/en-us/library/ms187317.aspx
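As a rough illustration of the full-text route (this assumes SQL Server full-text search is set up and a full-text index already covers the name data you search; the connection details are placeholders), a prefix search with CONTAINS avoids the leading-wildcard scan:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class FullTextNameSearch {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:sqlserver://localhost;databaseName=MyDb"; // placeholder
        // CONTAINS prefix term: the * must sit inside double quotes.
        String term = "\"John*\"";
        String sql = "SELECT TOP 50 * FROM Users WHERE CONTAINS(UserFullName, ?)";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, term);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("UserFullName"));
                }
            }
        }
    }
}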
Using an index will help.
You can give the column an id or something and index it, like this:
ALTER TABLE tablename ADD UNIQUE INDEX(id)
Take a look at this article: http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/like-performance-tuning .
It clearly describes how LIKE behaves in terms of performance.
Most likely you are facing these issues because the leading % forces the whole table to be traversed.
You could try creating a list of substrings (in a separate table, for example) representing k-mers and searching for them without a leading %. An index on such a column would also help. You can read more about k-mers here: https://en.m.wikipedia.org/wiki/K-mer .
This keeps the search index-friendly and therefore more efficient; a sketch follows.
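A tiny sketch of how those k-mers could be produced before being stored in an indexed lookup table (table handling is omitted; the length 3 is arbitrary):
import java.util.LinkedHashSet;
import java.util.Set;

public class KmerSplitter {
    // All substrings of length k from the full name; each would be stored
    // in a separate, indexed lookup table pointing back to the user row.
    static Set<String> kmers(String fullName, int k) {
        Set<String> out = new LinkedHashSet<>();
        String s = fullName.toLowerCase();
        for (int i = 0; i + k <= s.length(); i++) {
            out.add(s.substring(i, i + k));
        }
        return out;
    }

    public static void main(String[] args) {
        // A search term is split the same way, so matching becomes an
        // index-friendly equality lookup instead of LIKE '%...%'.
        System.out.println(kmers("John Smith", 3));
    }
}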