Gremlin + Neo4j Lucene search - lucene

Does this Gremlin script (executed via the Neo4j REST API) perform the sorting in the Lucene index, or are the nodes sorted in Neo4j?
g.idx('myIndex').get('name', 'aaa').sort{it.name}
Two additional questions:
1. How do I set the ordering (ASC/DESC)?
2. How do I perform a full-text search (like SQL's LIKE)? I already tried * and %, but neither worked.

sort is a Groovy method, so the sorting is done in Groovy on the results of the index lookup, not by Lucene. To reverse the order, use reverse:
g.idx('myIndex').get('name', 'aaa').sort{it.name}.reverse()
See:
http://groovy.codehaus.org/groovy-jdk/java/util/Collection.html
http://groovy.codehaus.org/groovy-jdk/java/util/List.html
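To make the point about where the sorting happens concrete, here is a minimal plain-Java sketch of the same idea: the names have already been fetched, and ordering is applied to the in-memory list afterwards. The class and method names are made up for illustration, and the strings stand in for the name property of the returned nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class InMemorySort {
    // Ascending order, the equivalent of sort{it.name} on already-fetched results.
    static List<String> ascending(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Descending order, the equivalent of sort{it.name}.reverse().
    static List<String> descending(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.reverseOrder());
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical names, standing in for the index lookup result.
        List<String> names = Arrays.asList("bob", "alice", "carol");
        System.out.println(ascending(names));
        System.out.println(descending(names));
    }
}
```

Because the whole result has to be materialized before it can be sorted this way, it is only practical for result sets that fit comfortably in memory.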

Besides what espeed suggested, which is to use Gremlin's facilities for sorting and so on, you may also be interested in passing additional instructions down to Lucene itself. This can be done by prefixing the second argument to get with the magic string %query%, like so:
... .get(null, "%query% _start_node_id_:15815486")
The key argument can be null if you don't need to use it.
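For the LIKE-style question, the same mechanism could in principle carry a Lucene wildcard term. The sketch below only builds the string that would be passed as the second argument to get; the helper name is hypothetical, and whether a given Gremlin/Neo4j version honours the prefix this way is an assumption to verify against your setup.

```java
class MagicQuery {
    // The magic prefix described above, followed by a space.
    static final String PREFIX = "%query% ";

    // Hypothetical helper: builds a prefix (trailing wildcard) query on one field.
    static String wildcardLookup(String field, String termPrefix) {
        return PREFIX + field + ":" + termPrefix + "*";
    }

    public static void main(String[] args) {
        // The printed value would be used as the second argument of get(null, ...).
        System.out.println(wildcardLookup("name", "aa"));
    }
}
```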

Related

What does the Liferay documentation mean by "without using the indexer"

In the Liferay documentation, many *LocalServiceUtil classes have search methods with the following documentation:
Returns an ordered range of all the [...] matching the parameters without using the indexer, including keyword parameters for [...].
What does the "without using the indexer" part of the sentence mean?
In particular, does it mean that it does not use any database indexes? Does it mean that for instance JournalArticleLocalServiceUtil.search can be expected to run much slower than the equivalent JournalArticleLocalServiceUtil.getArticles? Or is it a different meaning?
Or does this indexer refer to the indexes in the result set in the same method's documentation, maybe?
The indexer refers to search engine indexers, such as those using Lucene, Solr, Elasticsearch (or similar) implementations.
Both search and getArticles query the database. If you do a keyword search, the database might not be able to use a (DB) index, because content and title are not part of an index by default. Therefore, with a large number of articles, a keyword query against the search engine might give a better response time.

Lucene DocValuesField, SortedDocValuesField usage for filtering and sorting

I am going to switch to the newest version of Lucene (4.10.2) and I'd like to make some optimizations in my index and code.
I would like to use DocValuesField not only to get values but also for filtering and sorting.
So here I have some questions:
If I'd like to use a range filter (FieldCacheRangeFilter), I need to store the value in an XxxDocValuesField,
but if I want to use a terms filter (FieldCacheTermsFilter), I need to store the value in a SortedDocValuesField.
So it looks like I need two different fields if I want to use both range and terms filters. Am I right? Am I using it correctly?
Another thing is sorting. I can choose between SortedNumericSortField and SortField. The first requires a SortedNumericDocValuesField, the second a NumericDocValuesField. Is there any (big) difference in performance?
Should I use SortedNumericSortField (adding another field to the index)?
And the last one: am I right that all corresponding DocValuesField entries will be removed from the index when a doc is removed? I saw an IndexWriter method for updating a doc value, but no method for deleting one.
Regards
Piotr

Neo4j index for full text search

I am working on Neo4j database version 2.0. I have the following requirements:
Case 1. I want to fetch all records where name contains some string. For example, if I am searching for Neo4j, then all records having the name Neo4j Data, Neo4j Database, Neo4jDatabase, etc. should be returned.
Case 2. I want to fire a fieldless query: if any of a set of properties has a matching value, those records should be returned. It may also be at global level instead of label level.
Case sensitivity is also a concern.
I have read multiple things about LIKE, indexes, full-text search, legacy indexes, etc. What is the best fit for my case, or do I have to use Elasticsearch or similar?
I am using spring-data-neo4j in my application, so please provide some configuration for SDN.
Annotate your name with the @Indexed annotation:
@Indexed(indexName = "whateverIndexName", indexType = IndexType.FULLTEXT)
private String name;
Then query for it the following way (example for a method in an SDN repository; you can use a similar query anywhere else you use Cypher):
@Query("START n=node:whateverIndexName({query}) return n")
Set<Topic> findByName(@Param("query") String query);
Neo4j uses Lucene as the backend for indexing, so the query value must be a valid Lucene query, e.g. "name:neo4j" or "name:neo4j*".
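Since the query value is handed to Lucene as-is, user input containing Lucene syntax characters (such as : or parentheses) can break the query. Below is a minimal stdlib-only sketch of building a value like "name:neo4j*" from raw input. The class and helper names are hypothetical, and the character list is my reading of the classic Lucene query syntax; if Lucene is on the classpath, its own QueryParser.escape is the safer choice.

```java
class LuceneQueryValue {
    // Characters treated as syntax by the classic Lucene query parser (assumed list).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Backslash-escape every special character in the user input.
    static String escape(String input) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Build a prefix query such as name:neo4j* for a single-word input.
    static String prefixQuery(String field, String userInput) {
        return field + ":" + escape(userInput) + "*";
    }

    public static void main(String[] args) {
        System.out.println(prefixQuery("name", "neo4j"));
        System.out.println(prefixQuery("name", "a:b"));
    }
}
```

The result would be passed as the query argument of findByName. Note that whitespace is not handled here, so multi-word input would need to be quoted or split into separate terms.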
There is an article that explains the confusion around the various Neo4j indexes: http://nigelsmall.com/neo4j/index-confusion.
I don't think you need to be using Elasticsearch; you can use the legacy indexes or the Lucene indexes to do full-text searches.
Check out Michael Hunger's blog: jexp.de/blog
This post specifically: http://jexp.de/blog/2014/03/full-text-indexing-fts-in-neo4j-2-0/

RavenDb. Sorting syntax in Lucene query using Index

I am trying to write a Lucene query against a RavenDB index that returns results sorted by some field,
but so far without success.
The Query looks like:
Language:EN AND Key:*car* AND sort=KEY
Question:
Is it possible at all to add a sorting statement to the query?
If yes, what does the sorting syntax look like?
No. The sort parameter is passed along with the query, but not within the query. If you're using the C# client API, you have the OrderBy operator; otherwise you're most likely using the REST API, in which the sort parameter is passed as an additional URL parameter.
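As a rough illustration of "alongside, not inside", the sketch below builds a request URL where the Lucene query and the sort field travel as two separate URL parameters. The host, the /indexes/ path layout, the index name MyIndex and the helper itself are all assumptions for illustration; check the RavenDB HTTP API documentation for your version before relying on the exact shape.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class RavenSortUrl {
    // Hypothetical helper: the URL layout is an assumption, not taken from RavenDB docs.
    static String buildQueryUrl(String baseUrl, String indexName,
                                String luceneQuery, String sortField) {
        return baseUrl + "/indexes/" + indexName
                + "?query=" + URLEncoder.encode(luceneQuery, StandardCharsets.UTF_8)
                + "&sort=" + URLEncoder.encode(sortField, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The query no longer contains "AND sort=KEY"; sorting is its own parameter.
        System.out.println(buildQueryUrl(
                "http://localhost:8080", "MyIndex",
                "Language:EN AND Key:*car*", "Key"));
    }
}
```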

Lucene Tag Searching problems with C#, escape problems?

I am using Lucene 2.9.2 (.NET doesn't have a Lucene 3).
"tag:C#" gets me the same results as "tag:c". How do I allow 'C#' to be a search word? I tried changing Field.Index.ANALYZED to Field.Index.NOT_ANALYZED, but that gave me no results.
I assume I need to escape each tag. How might I do that?
The problem isn't the query; it's the analyzer you are using, which removes the "#" from both the query and (if you are using the same analyzer for insertion, which you should be) the field.
You will need to find an analyzer that preserves special characters like that or write a custom one.
Edit: Check out KeywordAnalyzer - it might just do the trick:
"Tokenizes" the entire stream as a single token. This is useful for data like zip codes, ids, and some product names.
According to the Java documentation for Lucene 2.9.2, '#' is not a special character that needs escaping in the query. Can you check (e.g. by opening the index with Luke) how the value 'C#' is actually stored in the index?
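To see why "tag:C#" and "tag:c" can end up equivalent, here is a simplified stand-in for the two analysis strategies, written in plain Java rather than against Lucene. The first method splits on non-letters and lowercases, which roughly mimics the effect described above (Lucene's real analyzers have more elaborate rules); the second keeps the whole input as one token, as KeywordAnalyzer is described to do. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class AnalyzerSketch {
    // Simplified letter-based analysis: keep runs of letters, lowercase them.
    static List<String> letterLowercaseTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Keyword-style analysis: the entire input becomes a single, unchanged token.
    static List<String> keywordTokens(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        System.out.println(letterLowercaseTokens("C#")); // same token as for "c"
        System.out.println(keywordTokens("C#"));         // '#' survives
    }
}
```

With keyword-style analysis the match becomes exact, including case, so the same analyzer has to be applied at index time and at query time. That mismatch is one plausible reason why switching only the field to NOT_ANALYZED returned no results.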