Search by field in Lucene - lucene

I am a total newbie, so this question may be pretty naive.
I want to search my index by field. I created a document with just one field, Name, and now want to search on that particular field.
I am doing this in the process of trying to find out if I can update the fields of a document without actually deleting the document in Lucene.
Thanks.

You can search for words within a particular field with the colon syntax, e.g. name:john.
But because a lot of indexes have just one field you are going to want to search on, there is a default field that is used when you just search for john. You set which field that is when you instantiate your QueryParser:
QueryParser parser = new QueryParser(Version.LUCENE_30, "name", anAnalyzer);
Query q = parser.parse("john");
If you want to create your queries programmatically rather than parsing a user-entered query string, then you also have to specify the field explicitly, for example:
Query q = new TermQuery(new Term("name", "john"));
Links: Using fields in Lucene queries (Lucene Query Syntax) | QueryParser Javadoc | TermQuery Javadoc
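To make the default-field rule concrete, here is a minimal plain-Java sketch of how a single clause such as name:john or john resolves to a (field, term) pair. It is not Lucene's parser: the class FieldedTermSketch and its resolve method are hypothetical names, and the sketch ignores quoting, escaping and boolean operators.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical illustration only; the real QueryParser handles far more syntax.
final class FieldedTermSketch {

    // Returns (field, term). A clause without "field:" falls back to defaultField.
    static Map.Entry<String, String> resolve(String clause, String defaultField) {
        int colon = clause.indexOf(':');
        if (colon > 0 && colon < clause.length() - 1) {
            // Explicit field, e.g. "name:john"
            return new SimpleEntry<>(clause.substring(0, colon), clause.substring(colon + 1));
        }
        // Bare term, e.g. "john": searched in the default field
        return new SimpleEntry<>(defaultField, clause);
    }

    public static void main(String[] args) {
        System.out.println(resolve("name:john", "body"));
        System.out.println(resolve("john", "name"));
    }
}
```

With "name" as the default field, both calls in main resolve to the same field and term, which is why the parsed query and the explicit TermQuery above target the same thing.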

I am doing this in the process of trying to find out if I can update the fields of a document without actually deleting the document in Lucene.
I do not understand the first question, but you cannot update a document in place in Lucene. You have to delete it and re-insert it.
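The delete-and-re-insert pattern can be sketched with a toy in-memory store. This is plain Java, not Lucene code, and ReplaceDocumentSketch with its methods is a hypothetical name chosen for illustration. In Lucene itself, IndexWriter.updateDocument(Term, Document) packages the same two steps into one call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy stand-in for an index: each document is a map of field name to value.
final class ReplaceDocumentSketch {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    // Removes every document whose field equals value; returns how many were removed.
    int delete(String field, String value) {
        int removed = 0;
        for (Iterator<Map<String, String>> it = docs.iterator(); it.hasNext();) {
            if (value.equals(it.next().get(field))) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // "Update" = delete by key, then add the complete replacement document.
    void update(String keyField, String keyValue, Map<String, String> replacement) {
        delete(keyField, keyValue);
        add(replacement);
    }

    List<Map<String, String>> find(String field, String value) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (value.equals(doc.get(field))) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

Because the old document is removed entirely, any field left out of the replacement is gone afterwards, so the caller has to resupply every field, not just the changed one.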

Related

Sitecore: Full text search using lucene

I'm using Sitecore 8 and I'm looking for a way to run a full text search across all my Sitecore content. I have a solution in place, but I feel there's got to be a better way to do this.
My approach:
I have a computed field that merges all text fields into a single field. Before I execute a search, I tokenize my search text and build an ORed predicate to match on that field.
I do not like this approach because it gets really complicated if I need to boost items that match the title over the body, i.e. I lose field-level boosting.
FYI: my code is very similar to this SO post.
Thanks
Sitecore already maintains a full text field, _content, that contains all the text fields. You can run your search against that. You can even create computed fields that add to _content (such as the datasource content example here).
So assuming you are building a LINQ query for your full text search, and have already filtered on templates, latest version, location, etc., adding your search terms to the query would look something like this:
var terms = SearchTerm.Split();
var currentExpression = PredicateBuilder.True<SiteSearchResultItem>();
foreach (var term in terms)
{
    // Content is mapped to _content
    currentExpression = PredicateBuilder.And(currentExpression, x => x.Content.Contains(term));
}
query = query.Where(currentExpression);
Typically you would want to AND search terms rather than ORing them.
You are right that field level boosting is lost in this. In the end, Lucene is not a great solution for creating a quality full-text site search. If this is an important requirement, you may want to look at Coveo or even something like a Google Site Search.
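The difference between ANDing and ORing the tokenized search terms can be shown outside Sitecore with a small plain-Java sketch. The class TermPredicateSketch and its methods are hypothetical names, and the sketch uses case-insensitive substring matching on a string as a stand-in for matching against _content, which is a simplification of what the search provider does.

```java
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical illustration of combining one predicate per search term.
final class TermPredicateSketch {

    // Starts from "always true" and ANDs one clause per term,
    // mirroring the PredicateBuilder.True / PredicateBuilder.And loop.
    static Predicate<String> allTerms(String searchText) {
        Predicate<String> expression = content -> true;
        for (String term : searchText.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            final String needle = term.toLowerCase(Locale.ROOT);
            expression = expression.and(content -> content.toLowerCase(Locale.ROOT).contains(needle));
        }
        return expression;
    }

    // Starts from "always false" and ORs one clause per term.
    static Predicate<String> anyTerm(String searchText) {
        Predicate<String> expression = content -> false;
        for (String term : searchText.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            final String needle = term.toLowerCase(Locale.ROOT);
            expression = expression.or(content -> content.toLowerCase(Locale.ROOT).contains(needle));
        }
        return expression;
    }
}
```

With the ANDed form, each extra search term narrows the result set; with the ORed form, each extra term widens it, which is usually not what a site visitor expects.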

Examine lucene.net custom query after analyzer tokenizes

I'm using Examine in Umbraco to query a Lucene index of content nodes. I have a field "completeNodeText" that is the concatenation of all the node properties (to keep things simple and avoid searching across multiple fields).
I'm accepting user-submitted search terms. When the search term is multiple words (i.e., "firstterm secondterm"), I want the resulting query to be an OR query: bring me back results where completeNodeText contains firstterm OR secondterm.
I want:
{+completeNodeText:"firstterm ? secondterm"}
but instead, I'm getting:
{+completeNodeText:"firstterm secondterm"}
If I search for "firstterm OR secondterm" instead of "firstterm secondterm", then the generated query is correctly: {+completeNodeText:"firstterm ? secondterm"}
I'm using the following API calls:
var searcher = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];
var searchCriteria = searcher.CreateSearchCriteria();
var query = searchCriteria.Field("completeNodeText", term).Compile();
Is there an easy way to force Examine to generate this "OR" query? Or do I have to bypass the entire Examine fluent query API and construct the raw query manually, by calling the StandardAnalyzer to tokenize the user input and concatenating a query from the tokens?
I don't think that question mark means what you think it means.
It looks like you are generating a PhraseQuery, but you want two disjoint TermQueries. In Lucene query syntax, a phrase query is enclosed in quotes.
"firstterm secondterm"
A phrase query looks for precisely that phrase, with the two terms appearing consecutively and in order. Placing an OR within a phrase query does not perform any sort of boolean logic; it is treated as the word "OR". The question mark is a placeholder used in PhraseQuery.toString() to represent a removed stop word (see LUCENE-1396). You are still performing a PhraseQuery, but now it expects a three-word phrase: firstterm, followed by a removed stop word, followed by secondterm.
To simply search for two separate terms, get rid of the quotes.
firstterm secondterm
This will search for any document containing either of those terms, with a higher score given to documents containing both (assuming the query parser's default operator is OR, which is Lucene's default).
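The distinction between a phrase match and a match on either term can be sketched in plain Java. This is not Lucene code: PhraseVsTermsSketch and its methods are hypothetical names, tokenization is a simple lowercase whitespace split, and scoring is ignored.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of phrase matching versus "any term" matching.
final class PhraseVsTermsSketch {

    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Phrase semantics: the query tokens must appear consecutively and in order.
    static boolean matchesPhrase(String document, String phrase) {
        return Collections.indexOfSubList(tokens(document), tokens(phrase)) >= 0;
    }

    // Disjunction semantics: at least one query token appears anywhere.
    static boolean matchesAnyTerm(String document, String query) {
        List<String> documentTokens = tokens(document);
        for (String term : tokens(query)) {
            if (documentTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }
}
```

A document containing "secondterm then firstterm" fails the phrase check but passes the any-term check, which is the behavior the quoted and unquoted queries above differ on.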

Will Lucene ALWAYS return ALL the documents that match my query same way as the SQL select query does?

I'm using Lucene to index the values that I'm storing in an object database. I'm storing a reference (UUID) to the object along with the field names and their corresponding values (Lucene Fields) in the Lucene Document.
My question is will Lucene ALWAYS return ALL the documents that match my query?
Thanks.
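It depends on the analyzer you are using: a document only matches if the analyzed query terms match the terms that were indexed. You can also limit the number of results when searching, in which case Lucene returns only the top-scoring hits up to that limit rather than every match.
For better searching, you can also use Solr, Apache's open-source search platform.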

In Lucene, using a Standard Analyzer, I want to make fields with spaces searchable

In Lucene, using a Standard Analyzer, I want to make fields with space searchable.
I set Field.Index.NOT_ANALYZED and Field.Store.YES using the StandardAnalyzer
When I look at my index in LUKE, the fields are as I expected, a field and a value such as:
location -> 'New York'.
Here I found that I can use the KeywordAnalyzer to find this value using the query:
location:"New York".
But I want to add another term to the query. Let's say I have a body field which contains the normalized and analyzed terms created by the StandardAnalyzer. Using the KeywordAnalyzer for this field, I get different results than when I use the StandardAnalyzer.
How do I combine two Analyzers in one QueryParser, so that one Analyzer is used for some fields and another one for the other fields? I thought of creating my own Analyzer which could behave differently depending on the field, but I have no clue how to do it.
PerFieldAnalyzerWrapper lets you apply different analyzers for different fields.
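The idea behind per-field analysis can be sketched in plain Java as a lookup from field name to tokenizing function, with a default for unlisted fields. This is not the Lucene class: PerFieldAnalysisSketch, KEYWORD and SIMPLE are hypothetical names, and SIMPLE (lowercase, split on non-alphanumerics) is only a rough stand-in for what StandardAnalyzer does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of dispatching to a different analyzer per field.
final class PerFieldAnalysisSketch {

    // Keyword-style analysis: the whole value is a single token.
    static final Function<String, List<String>> KEYWORD =
            text -> Collections.singletonList(text);

    // Simplified stand-in for standard analysis: lowercase, split on non-alphanumerics.
    static final Function<String, List<String>> SIMPLE = text -> {
        List<String> result = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    };

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldAnalysisSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Uses the field-specific analyzer if one is registered, else the default.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }
}
```

Registering KEYWORD for location and leaving SIMPLE as the default keeps "New York" as one token for that field while body text is still split and lowercased, so one parser can serve both kinds of field.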

Searching for multiple terms in a field

I want to do an AND query, say 'foo AND bar', in Lucene.NET. I have a WholeIndex field which has the whole document indexed, and I want Lucene to search in the whole document.
Up to here it's quite easy, but there's a constraint.
I want both terms 'foo' and 'bar' to be in the same field.
Is there an easy way to do this without querying the index for the full list of fields and searching in every field?
Edit: What I want to know is if there is a way to tell Lucene to perform a search in every field, without having to know all the fields in my index. An automated way to search the following:
"field1:(+foo +bar) field2:(+foo +bar) ... fieldN:(+foo +bar)"
You can use GetFieldNames to get all the field names, then iterate over the list programmatically and generate a query like the one you wrote, using BooleanQuery.
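As a rough sketch of that loop, the following plain-Java helper builds the query string from the question for an arbitrary list of field names. AllFieldsQuerySketch and build are hypothetical names; the field list is assumed to come from the index reader, and the terms are assumed to need no query-syntax escaping.

```java
import java.util.List;

// Hypothetical illustration: one required-terms group per field, ORed together.
final class AllFieldsQuerySketch {

    static String build(List<String> fieldNames, List<String> terms) {
        // "+foo +bar": every term is required within a given field
        StringBuilder required = new StringBuilder();
        for (String term : terms) {
            if (required.length() > 0) {
                required.append(' ');
            }
            required.append('+').append(term);
        }

        // "field1:(+foo +bar) field2:(+foo +bar) ..."
        StringBuilder query = new StringBuilder();
        for (String field : fieldNames) {
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append(field).append(":(").append(required).append(')');
        }
        return query.toString();
    }
}
```

The same structure maps onto nested BooleanQuery objects: an inner query per field with each term as a required clause, and an outer query holding the per-field queries as optional clauses.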