Neo4j index for full text search - indexing

I am working on neo4j database version 2.0.I have following requirements :
Case 1. I want to fetch all records where name contains some string,for example if i am searching for Neo4j then all records having name Neo4j Data,Neo4j Database,Neo4jDatabase etc. should be returned.
Case 2. When i want to fire field less query,if a set of properties is having matching value then those records should be returned or it may also be global level instead of label level.
Case Sensitivity is also a point.
I have read multiple thing about like,index,full text search,legacy index etc.,so what will be the best fit for my case,or i have to use elastic search etc.
I am using spring-data-neo4j in my application,so provide some configuration for SDN

Annotate your name with #Indexed annotation:
#Indexed(indexName = "whateverIndexName", indexType = IndexType.FULLTEXT)
private String name;
Then query for it following way (example for method in SDN repository, you can use similar anywhere else you use cypher):
#Query("START n=node:whateverIndexName({query}) return n"
Set<Topic> findByName(#Param("query") String query);
Neo4j uses lucene as backend for indexing so the query value must be a valid lucene query, e.g. "name:neo4j" or "name:neo4j*".
There is an article that explains the confusion around various Neo4j indexes http://nigelsmall.com/neo4j/index-confusion.

I don't think you need to be using elastic search-- you can use the legacy indexes or the lucene indexes to do full text searches.
Check out Michael Hunger's blog: jexp.de/blog
thix post specifically: http://jexp.de/blog/2014/03/full-text-indexing-fts-in-neo4j-2-0/

Related

SOLR indexed item has extra word which is not available in query parameter - how to identify those cases?

We have a scenario where we are trying to perform accurate name matching of Items using SOLR.
Query Parameter: Apple
SOLR Indexed Word: Apple-D
In our business case, "Apple" and "Apple-D" are totally different items and therefore SOLR shouldn't return the match.
Is there an option to achieve the same?
You need to change the fieldType used for the field. Use the String fieldType for the your field.
This String fieldType will make sure that the words will be stored as it is by solr.
It won't apply any analysis on the word. Or it won't create any tokes of it.
With the String type applied to it . The Apple and Apple-D are stored/indexed different token. As there won't be any tokenizing on the same. This will help you to achieve the exact match.
Once you change the fieldType. Re-index the same.
You can use the solr analysis tool to check how it is indexing and querying .
Note : Make sure whenever you ask question on it, Share your schema.xml

Neo4j - Querying with Lucene

I am using Neo4j embedded as database. I have to store thousands of articles daily and and I need to provide a search functionality where I should return the articles whose content match to the keywords entered by the users. I indexed the content of each and every article and queried on the index like below
val articles = article_content_index.query("article_content", search string)
This works fine. But, its taking lot of time when the search string contains common words like "the", "a" and etc which will be present in each and every article.
How do I solve this problem?
Probably a lucene issue.
You can configure your own analyzer which could leave off those frequent (stop-)words:
http://docs.neo4j.org/chunked/stable/indexing-create-advanced.html
http://lucene.apache.org/core/3_6_2/api/core/org/apache/lucene/analysis/Analyzer.html
http://lucene.apache.org/core/3_6_2/api/core/org/apache/lucene/analysis/standard/StandardAnalyzer.html
You might configure article_content_index as fulltext index, see http://docs.neo4j.org/chunked/stable/indexing-create-advanced.html. To switch to using fulltext index, you first have to remove the index and the first usage of IndexManager.forNodes(String, Map) needs to configure the index on creation properly.

Gremlin + Neo4j Lucene search

Does this gremlin script (executed via REST API of Neo4j) executes the sorting on the lucene index? Or are the nodes sorted in Neo4j?
g.idx('myIndex').get('name', 'aaa').sort{it.name}
Two additional questions:
1. How to set ordering? ASC/DESC
2. How to perform a fulltext search (LIKE). I already tried *, %, nothing worked
sort is a Groovy method. To reverse the order, use reverse:
g.idx('myIndex').get('name', 'aaa').sort{it.name}.reverse()
See:
http://groovy.codehaus.org/groovy-jdk/java/util/Collection.html
http://groovy.codehaus.org/groovy-jdk/java/util/List.html
Besides doing what espeed suggested, which is using Gremlin's facilities to sort etc, you may also be interested in passing additional instructions down to Lucene itself. This can be done by prefixing the second argument into get with a magic string %query%. Like so:
... .get(null, "%query% _start_node_id_:15815486")
The key argument can be null if you don't need to use it.

Fulltext Solr statistical search

Consider I'm having a couple of documents indexed with Solr 4.0. Each has 2 fields - unique ID and text DATA field. DATA field contains few paragraphs of text. Who could advise me what kind of analyzers/parsers I should use and how to build statistical query to find out sorted list of most frequently used words in all DATA fields of all documents.
for the most frequent terms look into the terms- and statistical component
besides the answers mentioned here, you can use the "HighFreqTerms" class: its in the lucene-misc-4.0 jar (which is bundled with Solr).
This is a command line application which lets you see the top terms for any field either by document frequency or by total term frequency (the -t option)
Here is the usage:
java org.apache.lucene.misc.HighFreqTerms [-t] [number_terms] [field]
-t: include totalTermFreq
Here's the original patch, which is committed and in the 4.0 (trunk) and branch_3x codebases: https://issues.apache.org/jira/browse/LUCENE-2393
For ID field use analyzer based on keyword tokenizer - it will take all the content of the field as a single token.
For DATA field use language specific analyzer. Notice, that there's possibility to auto-detect the language of the text (patch).
I'm not sure, if it's possible to find the most frequent words with Solr, but if you can use Lucene itself, pay attention to this question. My own suggestion for it is to use HighFreqTerms class from Luke project.

Case-insensitive search using Hibernate

I'm using Hibernate for ORM of my Java app to an Oracle database (not that the database vendor matters, we may switch to another database one day), and I want to retrieve objects from the database according to user-provided strings. For example, when searching for people, if the user is looking for people who live in 'fran', I want to be able to give her people in San Francisco.
SQL is not my strong suit, and I prefer Hibernate's Criteria building code to hard-coded strings as it is. Can anyone point me in the right direction about how to do this in code, and if impossible, how the hard-coded SQL should look like?
Thanks,
Yuval =8-)
For the simple case you describe, look at Restrictions.ilike(), which does a case-insensitive search.
Criteria crit = session.createCriteria(Person.class);
crit.add(Restrictions.ilike('town', '%fran%');
List results = crit.list();
Criteria crit = session.createCriteria(Person.class);
crit.add(Restrictions.ilike('town', 'fran', MatchMode.ANYWHERE);
List results = crit.list();
If you use Spring's HibernateTemplate to interact with Hibernate, here is how you would do a case insensitive search on a user's email address:
getHibernateTemplate().find("from User where upper(email)=?", emailAddr.toUpperCase());
You also do not have to put in the '%' wildcards. You can pass MatchMode (docs for previous releases here) in to tell the search how to behave. START, ANYWHERE, EXACT, and END matches are the options.
The usual approach to ignoring case is to convert both the database values and the input value to upper or lower case - the resultant sql would have something like
select f.name from f where TO_UPPER(f.name) like '%FRAN%'
In hibernate criteria restrictions.like(...).ignoreCase()
I'm more familiar with Nhibernate so the syntax might not be 100% accurate
for some more info see pro hibernate 3 extract and hibernate docs 15.2. Narrowing the result set
This can also be done using the criterion Example, in the org.hibernate.criterion package.
public List findLike(Object entity, MatchMode matchMode) {
Example example = Example.create(entity);
example.enableLike(matchMode);
example.ignoreCase();
return getSession().createCriteria(entity.getClass()).add(
example).list();
}
Just another way that I find useful to accomplish the above.
Since Hibernate 5.2 session.createCriteria is deprecated. Below is solution using JPA 2 CriteriaBuilder. It uses like and upper:
CriteriaBuilder builder = session.getCriteriaBuilder();
CriteriaQuery<Person> criteria = builder.createQuery(Person.class);
Root<Person> root = criteria.from(Person.class);
Expression<String> upper = builder.upper(root.get("town"));
criteria.where(builder.like(upper, "%FRAN%"));
session.createQuery(criteria.select(root)).getResultList();
Most default database collations are not case-sensitive, but in the SQL Server world it can be set at the instance, the database, and the column level.
You could look at using Compass a wrapper above lucene.
http://www.compass-project.org/
By adding a few annotations to your domain objects you get achieve this kind of thing.
Compass provides a simple API for working with Lucene. If you know how to use an ORM, then you will feel right at home with Compass with simple operations for save, and delete & query.
From the site itself.
"Building on top of Lucene, Compass simplifies common usage patterns of Lucene such as google-style search, index updates as well as more advanced concepts such as caching and index sharding (sub indexes). Compass also uses built in optimizations for concurrent commits and merges."
I have used this in the past and I find it great.