Will Lucene ALWAYS return ALL the documents that match my query, the same way a SQL SELECT query does? - lucene

I'm using Lucene to index the values that I'm storing in an object database. I'm storing a reference (UUID) to the object along with the field names and their corresponding values (Lucene Fields) in the Lucene Document.
My question is: will Lucene ALWAYS return ALL the documents that match my query?
Thanks.

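It depends on the analyzer you are using: a document is found only if the analyzed query terms match the terms the analyzer produced at index time. You can also limit the number of results returned when searching.
For better searching, you can also use Solr, Apache's open source search platform.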
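To make the analyzer and result-limit points concrete, here is a minimal sketch in plain Java. It is a toy inverted index, not Lucene's API: the class AnalyzerAndLimitSketch, its methods, and the "split and lowercase" analyzer are all assumptions made up for illustration. It shows an un-analyzed query term missing documents that an analyzed one finds, and a limit cutting off hits even though more documents match.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: illustrates analysis and result limits, not Lucene's real classes.
class AnalyzerAndLimitSketch {
    // term -> ids of the documents containing it, in insertion order
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Hypothetical "analyzer": split on non-alphanumeric characters, then lowercase.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Indexes the analyzed terms of one document and returns its id.
    int add(String text) {
        int docId = nextDocId++;
        for (String term : analyze(text)) {
            List<Integer> ids = postings.computeIfAbsent(term, k -> new ArrayList<>());
            if (!ids.contains(docId)) {
                ids.add(docId);
            }
        }
        return docId;
    }

    // Looks the term up exactly as given (no analysis); returns at most 'limit' ids.
    List<Integer> searchRawTerm(String term, int limit) {
        List<Integer> ids = postings.getOrDefault(term, Collections.<Integer>emptyList());
        return new ArrayList<>(ids.subList(0, Math.min(limit, ids.size())));
    }

    // Analyzes the query text first, then looks up its first term.
    List<Integer> searchAnalyzed(String queryText, int limit) {
        List<String> terms = analyze(queryText);
        return terms.isEmpty() ? new ArrayList<Integer>() : searchRawTerm(terms.get(0), limit);
    }
}
```

With three documents that all contain "Lucene" in some casing, searchRawTerm("Lucene", 10) finds nothing because only the lowercased term was indexed, searchAnalyzed("Lucene", 10) finds all three, and searchAnalyzed("Lucene", 2) returns only two of them.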

Related

Lucene Field.Store.YES versus Field.Store.NO

Will someone please explain under what circumstances I may use Field.Store.NO instead of Field.Store.YES? I am extremely new to Lucene, and I am trying to create a document. Based on my basic understanding, I am doing:
doc.add(new StringField(fieldNameA,fieldValueA,Field.Store.YES));
doc.add(new TextField(fieldNameB,fieldValueB,Field.Store.YES));
There are two basic ways a field can be written into Lucene:
Indexed - The field is analyzed and indexed, and can be searched.
Stored - The field's full text is stored and will be returned with search results.
If a field is indexed but not stored, you can search on it, but its value won't be returned with search results.
One reasonably common pattern is to use Lucene for search, but only have an ID field being stored, which can be used to retrieve the full contents of the document/record from, for instance, a SQL database, a file system, or a web resource.
You might also opt not to store a field when that field is just a search tool that you wouldn't display to the user, such as a soundex/metaphone, or an alternate analysis of a content field.
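The indexed/stored distinction can be sketched in plain Java. This is a toy model, not Lucene's API: StoredVsIndexedSketch, addDocument and search are hypothetical names, and terms are simply lowercased whitespace tokens. Every field is searchable, but only fields named as stored come back with a hit, which mirrors the "store only the ID" pattern described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: every field is indexed, but only selected fields are stored.
class StoredVsIndexedSketch {
    // "field:term" -> ids of the documents containing that term in that field
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // position in the list = document id; value = the stored fields of that document
    private final List<Map<String, String>> storedFields = new ArrayList<>();

    // Indexes all fields; keeps the original value only for fields in storedFieldNames.
    int addDocument(Map<String, String> fields, Set<String> storedFieldNames) {
        int docId = storedFields.size();
        Map<String, String> stored = new HashMap<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            for (String term : field.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(field.getKey() + ":" + term, k -> new TreeSet<>())
                            .add(docId);
                }
            }
            if (storedFieldNames.contains(field.getKey())) {
                stored.put(field.getKey(), field.getValue());
            }
        }
        storedFields.add(stored);
        return docId;
    }

    // Returns the stored fields (and nothing else) of every document matching the term.
    List<Map<String, String>> search(String field, String term) {
        String key = field + ":" + term.toLowerCase(Locale.ROOT);
        List<Map<String, String>> hits = new ArrayList<>();
        for (int docId : postings.getOrDefault(key, Collections.<Integer>emptySet())) {
            hits.add(storedFields.get(docId));
        }
        return hits;
    }
}
```

A document added with fields id and body, storing only id, is still found by a search on body, but the hit carries just the id; the body text would have to be fetched from the external store using that id.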
Use Field.Store.YES when you need the field's value back from the Lucene document in search results. Use Field.Store.NO when you only need to search on the field. Here is a link that explains this with a scenario:
https://handyopinion.com/java-lucene-saving-fields-or-not/

Neo4j index for full text search

I am working with Neo4j database version 2.0. I have the following requirements:
Case 1. I want to fetch all records where name contains some string. For example, if I am searching for Neo4j, then all records having the name Neo4j Data, Neo4j Database, Neo4jDatabase, etc. should be returned.
Case 2. When I fire a field-less query, records should be returned if any of a set of properties has a matching value; this may also be at the global level instead of the label level.
Case sensitivity is also a concern.
I have read multiple things about like, index, full text search, legacy index, etc. What would be the best fit for my case, or do I have to use Elasticsearch or similar?
I am using spring-data-neo4j in my application, so please provide some configuration for SDN.
Annotate your name field with the @Indexed annotation:
@Indexed(indexName = "whateverIndexName", indexType = IndexType.FULLTEXT)
private String name;
Then query for it the following way (example for a method in an SDN repository; you can use something similar anywhere else you use Cypher):
@Query("START n=node:whateverIndexName({query}) return n")
Set<Topic> findByName(@Param("query") String query);
Neo4j uses Lucene as the backend for indexing, so the query value must be a valid Lucene query, e.g. "name:neo4j" or "name:neo4j*".
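To see why the trailing * matters for Case 1 in the question, here is a small sketch in plain Java. It is not Neo4j or Lucene code: PrefixQuerySketch is a hypothetical class, and it assumes the index lowercases values and splits them on whitespace, which depends on the analyzer actually configured for the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only: mimics exact-term vs. prefix ("term*") matching on analyzed tokens.
class PrefixQuerySketch {
    // True if any lowercased, whitespace-separated token of 'value' matches 'pattern'.
    // A trailing '*' means "token starts with"; otherwise the whole token must be equal.
    static boolean matches(String value, String pattern) {
        String p = pattern.toLowerCase(Locale.ROOT);
        boolean prefix = p.endsWith("*");
        String needle = prefix ? p.substring(0, p.length() - 1) : p;
        for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (prefix ? token.startsWith(needle) : token.equals(needle)) {
                return true;
            }
        }
        return false;
    }

    // Keeps the values that match the pattern, preserving their order.
    static List<String> filter(List<String> values, String pattern) {
        List<String> result = new ArrayList<>();
        for (String value : values) {
            if (matches(value, pattern)) {
                result.add(value);
            }
        }
        return result;
    }
}
```

Under these assumptions, "neo4j" matches Neo4j Data and Neo4j Database but not Neo4jDatabase, because the latter is a single token; "neo4j*" matches all three. Neither form finds the string in the middle of a token.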
There is an article that explains the confusion around the various Neo4j indexes: http://nigelsmall.com/neo4j/index-confusion
I don't think you need Elasticsearch; you can use the legacy indexes or the Lucene indexes to do full text searches.
Check out Michael Hunger's blog: jexp.de/blog
This post specifically: http://jexp.de/blog/2014/03/full-text-indexing-fts-in-neo4j-2-0/

Neo4j - Querying with Lucene

I am using Neo4j embedded as my database. I have to store thousands of articles daily, and I need to provide search functionality that returns the articles whose content matches the keywords entered by users. I indexed the content of each and every article and queried the index like below:
val articles = article_content_index.query("article_content", searchString)
This works fine. But it takes a lot of time when the search string contains common words like "the", "a", etc., which are present in each and every article.
How do I solve this problem?
Probably a Lucene issue.
You can configure your own analyzer which could leave off those frequent (stop-)words:
http://docs.neo4j.org/chunked/stable/indexing-create-advanced.html
http://lucene.apache.org/core/3_6_2/api/core/org/apache/lucene/analysis/Analyzer.html
http://lucene.apache.org/core/3_6_2/api/core/org/apache/lucene/analysis/standard/StandardAnalyzer.html
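As a rough illustration of what such an analyzer does, here is a sketch in plain Java. It is not Lucene's Analyzer API: StopWordFilterSketch is a hypothetical class, and its stop-word list is a made-up sample rather than Lucene's default set. The idea is that the frequent words never become terms, so they are neither indexed nor searched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: lowercases, tokenizes, and drops stop words.
class StopWordFilterSketch {
    // Hypothetical sample list; a real analyzer would be configured with its own set.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Returns the terms that would be indexed (or searched) for the given text.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }
}
```

For the text "The state of a graph database" this yields the terms state, graph and database. The same analysis has to be applied to the search string, otherwise queries would still look for terms that were never indexed.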
You might configure article_content_index as a fulltext index; see http://docs.neo4j.org/chunked/stable/indexing-create-advanced.html. To switch to a fulltext index, you first have to remove the existing index, and then the first call to IndexManager.forNodes(String, Map) needs to configure the index properly on creation.

Sitecore Search Ranking

Does Sitecore/Lucene support filtering/ranking of content?
I cannot find any related documentation.
Lucene returns ranked results, and you can structure queries to filter results using the QueryOccurance.MustNot clause, or to boost results using the QueryOccurance.Should clause.
From Sitecore's documentation of the QueryOccurance class:
Lucene uses the following operators for the search terms in complex queries:
Must – the search term must occur in the document to be included into the search results.
Should – the search term may occur in the document but is not necessary, and the document may be included in search results based on other criteria. However, the documents containing the search term are ranked higher than equivalent documents that do not contain the search term.
Must not – the search term must not occur in the document in order to be included in the search results. Documents with the search term will be excluded from the results.
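The three operators quoted above can be sketched in plain Java. This is a toy model, not Sitecore's or Lucene's API: OccurrenceSketch is a hypothetical class, documents are plain strings split on whitespace, and the "score" is just the number of matching Should terms, which is far cruder than Lucene's real scoring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only: Must filters in, Must not filters out, Should raises the rank.
class OccurrenceSketch {
    static List<String> search(List<String> docs, Set<String> must,
                               Set<String> should, Set<String> mustNot) {
        List<String> hits = new ArrayList<>();
        Map<String, Integer> scores = new HashMap<>();
        for (String doc : docs) {
            Set<String> terms =
                    new HashSet<>(Arrays.asList(doc.toLowerCase(Locale.ROOT).split("\\s+")));
            if (!terms.containsAll(must)) {
                continue; // a Must term is missing
            }
            if (!Collections.disjoint(terms, mustNot)) {
                continue; // a Must not term is present
            }
            int score = 0;
            for (String term : should) {
                if (terms.contains(term)) {
                    score++;
                }
            }
            if (must.isEmpty() && score == 0) {
                continue; // with no Must terms, at least one Should term has to match
            }
            hits.add(doc);
            scores.put(doc, score);
        }
        // Stable sort: higher Should score first, original order kept for ties.
        hits.sort((a, b) -> Integer.compare(scores.get(b), scores.get(a)));
        return hits;
    }
}
```

Given the documents "sitecore search", "sitecore search ranking" and "sitecore legacy search ranking", with Must = sitecore, Should = ranking and Must not = legacy, the third document is excluded and "sitecore search ranking" is ranked ahead of "sitecore search".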
Some additional resources for Lucene in Sitecore:
Syntax of Lucene Queries: http://sitecoregadgets.blogspot.com/2009/11/working-with-lucene-search-index-in_25.html
Lucene Walkthrough: http://learnsitecore.cmsuniverse.net/en/Developers/Articles/2009/06/LuceneQuery1.aspx
Alex Shyba's Lucene posts: http://sitecoreblog.alexshyba.com/search/label/lucene
This question may also be useful: Sitecore + Lucene + QueryOccurance.Should not returning desired results
Sitecore has built-in sitecore_master_content, sitecore_web_content, and sitecore_core_content indexes, which index all the content in Sitecore, and there is already an API for searching them. You can specify a boost value in the "Indexing" section of a Sitecore item (by default it's empty).
You can also set boosting for fields in your search query.

Search by field in Lucene

I am a total newbie, so this question may be pretty naive.
I want to search my index based on a field. So I tried creating a document with just one field, Name, and now I want to search on that particular field.
I am doing this in the process of trying to find out whether I can update the fields of a document without actually deleting the document in Lucene.
Thanks.
You can search for words within a particular field with the colon syntax, e.g. name:john.
But because a lot of indexes have just one field you are going to want to search on, there is a default field, used when you just search for john. You can set which field that is when you instantiate your QueryParser:
QueryParser parser = new QueryParser(Version.LUCENE_30, "name", anAnalyzer);
Query q = parser.parse("john");
If you want to create your queries programmatically rather than parsing a user-entered query string, then you also have to specify the field explicitly, for example:
Query q = new TermQuery(new Term("name", "john"));
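The default-field behaviour can be pictured with a tiny sketch in plain Java. This is not how QueryParser is implemented; DefaultFieldSketch is a hypothetical helper that handles only a single term with an optional field: prefix, purely to show which field a query string ends up targeting.

```java
// Toy model only: resolves a single-term query string to a {field, term} pair.
class DefaultFieldSketch {
    // "field:term" targets the named field; a bare "term" falls back to defaultField.
    static String[] parse(String query, String defaultField) {
        int colon = query.indexOf(':');
        if (colon > 0) {
            return new String[] { query.substring(0, colon), query.substring(colon + 1) };
        }
        return new String[] { defaultField, query };
    }
}
```

With "name" as the default field, parse("john", "name") resolves to the pair name/john, while parse("title:john", "name") resolves to title/john.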
Links: Using fields in Lucene queries (Lucene Query Syntax) | QueryParser Javadoc | TermQuery Javadoc
"I am doing this in the process of trying to find out whether I can update the fields of a document without actually deleting the document in Lucene."
I do not understand the first question, but you cannot update a document in place in Lucene. You have to delete it and re-insert it.
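Lucene's IndexWriter.updateDocument wraps this pattern: it first deletes the documents containing a given term and then adds the new document. The sketch below models that delete-then-add behaviour in plain Java. It is a toy, not Lucene code: DeleteThenAddSketch and its methods are hypothetical names, and documents are just maps of field names to values.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: an "update" is a delete by term followed by an add.
class DeleteThenAddSketch {
    // internal document id -> field name -> field value
    private final Map<Integer, Map<String, String>> docs = new LinkedHashMap<>();
    private int nextDocId = 0;

    // Adds a copy of the fields as a new document and returns its internal id.
    int add(Map<String, String> fields) {
        int docId = nextDocId++;
        docs.put(docId, new HashMap<>(fields));
        return docId;
    }

    // Removes every document whose 'field' equals 'value'; returns how many were removed.
    int deleteByTerm(String field, String value) {
        int before = docs.size();
        docs.values().removeIf(doc -> value.equals(doc.get(field)));
        return before - docs.size();
    }

    // "Update": the old document is dropped and the replacement gets a new internal id.
    int update(String field, String value, Map<String, String> replacement) {
        deleteByTerm(field, value);
        return add(replacement);
    }

    Map<String, String> get(int docId) {
        return docs.get(docId);
    }

    int size() {
        return docs.size();
    }
}
```

The replacement has to carry every field you still want, including the ones that did not change, because nothing from the old document survives the delete. This is also why a unique key field such as an id is useful: it gives you a term to delete by.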