Getting exact matches in Lucene using the standard analyzer? - lucene

Given 2 documents with the content as follows
"I love Lucene"
"Lucene is nice"
I want to be able to query Lucene only for those documents with Lucene at the beginning, i.e., everything that would match the regexp "^Lucene .*".
Is there a way to do it, given that I can't change the index and it was analyzed using the standard analyzer?

Sure, take a look at SpanFirstQuery. Here is a good tutorial:
http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/
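
To make the idea concrete, here is a minimal plain-Java sketch of the position check that SpanFirstQuery performs. It is only a model, not Lucene code: the class `SpanFirstModel` and its crude tokenizer are hypothetical stand-ins, assuming the index holds lowercased word tokens the way the standard analyzer produces them. On the Lucene side the query would be built roughly as `new SpanFirstQuery(new SpanTermQuery(new Term("content", "lucene")), 1)`, where the field name `content` is an assumption. Note the lowercase term: the query term has to match the lowercased token in the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical plain-Java model of SpanFirstQuery semantics; not Lucene's API.
class SpanFirstModel {

    // Crude stand-in for StandardAnalyzer: split on non-alphanumerics, lowercase.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // True if `term` occurs at a position p with p + 1 <= end,
    // i.e. the single-term span ends within the first `end` positions.
    static boolean matchesSpanFirst(String text, String term, int end) {
        List<String> tokens = tokenize(text);
        int limit = Math.min(end, tokens.size());
        for (int pos = 0; pos < limit; pos++) {
            if (tokens.get(pos).equals(term)) {
                return true;
            }
        }
        return false;
    }
}
```

With `end` set to 1, "Lucene is nice" matches and "I love Lucene" does not, which is the behaviour the question asks for.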

Related

How to extract a single document out of a Lucene 4.0 index?

This may be one of the simplest and dullest questions ever, but after indexing all the Documents in Lucene, how can I extract one Document only that has a specified id stored e.g. in a StringField? It should be an equivalent to e.g. an SQL-expression like
Select id, description
from index
where id = '1'
Where the Document has two Fields, an ID and a description.
I apologize in advance if this question has been asked too many times before, but after hours of searching the internet with probably the wrong search terms, I decided to ask it here :)
The Lucene demo shows how to use Lucene's standard QueryParser to search for documents: http://lucene.apache.org/core/4_1_0/demo/overview-summary.html#overview_description
Here is an excellent tutorial on Lucene: Lucene in 5 minutes
It will indeed take only 5 minutes. You will find the answer in the "Search" and "Display" sections, and the query formation for your requirements in the "Query" section.

How to extract the information from one resume using lucene

Hi everyone!
I am new to Lucene.
I am working on a resume filter project using Lucene. First, I want to extract some basic information, such as the birthday, from the resumes.
Suppose there is always one line that says birthday: 1989/10/19 or something like this. How could I extract this kind of info with Lucene instead of directly using a regular expression?
Currently I think SpanNearQuery might be helpful, but it seems that I cannot add a WildcardQuery to the SpanNearQuery to match the birthday info.
I am totally stuck. Any good suggestions? Really appreciated!
There is no magic bullet for extracting dates from a Lucene field that contains a bunch of text with a date somewhere inside it. The best way would be to write a custom analyzer that breaks the terms apart during the indexing process and identifies the numerical characters as a date.
I have written a couple of Analyzers for Lucene, but something like that is not really trivial, especially if you are new to Lucene.
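
As a rough illustration of the "identify the date" step only, here is a plain-Java sketch that does not use any Lucene API. The class `DateTokenSketch` is hypothetical, and the `yyyy/MM/dd` layout is an assumption taken from the example line in the question. In a real custom analyzer this logic would have to live inside the token stream (for instance a custom TokenFilter), which is the non-trivial part.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the date-recognition step a custom analyzer might perform.
class DateTokenSketch {

    // Assumed date layout, taken from the "1989/10/19" example in the question.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Splits on whitespace and returns every token that parses as a date.
    static List<LocalDate> findDates(String line) {
        List<LocalDate> dates = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            try {
                dates.add(LocalDate.parse(token, FORMAT));
            } catch (DateTimeParseException e) {
                // Not a date in the assumed format; skip this token.
            }
        }
        return dates;
    }
}
```

Once a token has been recognised as a date, it could be indexed into its own field so that it can be queried or retrieved separately from the free text.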

Standard Analyser in lucene

Is the StandardAnalyzer in Lucene equivalent to
Select * from table where name Like raaga
Will it search only for exact matches?
The short answer is: No.
You are comparing apples and oranges.
Here is the StandardAnalyzer API.
Here is a Lucene tutorial to give some context to the analyzer/query parser/search.
Analyzers tokenize the query text at search time. They are also used to tokenize the analyzed fields at index time.
What your searches match depends on the type of Query you are running.

Lucene Fuzzy Match on Phrase instead of Single Word

I'm trying to do a fuzzy match on the Phrase "Grand Prarie" (deliberately misspelled) using Apache Lucene. Part of my issue is that the ~ operator only does fuzzy matches on single word terms and behaves as a proximity match for phrases.
Is there a way to do a fuzzy match on a phrase with lucene?
Lucene 3.0 has a ComplexPhraseQueryParser that supports fuzzy phrase queries. It is in the contrib package.
I came across this through Google and felt the solutions were not what I was after.
In my case, the solution was simply to repeat the fuzzy term for each word in the query sent to the Solr API.
So, for example, if I wanted title_t to match both "dog~" and "cat~", I added some manual code to generate the query as:
((title_t:dog~) AND (title_t:cat~))
This might be just what the answers above describe; however, the links seem dead.
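
The "manual code" is not shown in the answer, so the following is only a guess at what such a helper might look like. The class and method names are hypothetical. It joins the clauses with the uppercase `AND` operator, which the standard Lucene query syntax expects, and it assumes the words contain no query-syntax special characters (nothing is escaped).

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that builds the fuzzy query string by hand.
class FuzzyAndQueryBuilder {

    // Wraps each word as (field:word~) and joins the clauses with AND.
    // Assumes the words need no escaping and the list is not empty.
    static String build(String field, List<String> words) {
        return words.stream()
                .map(word -> "(" + field + ":" + word + "~)")
                .collect(Collectors.joining(" AND ", "(", ")"));
    }
}
```

Calling `FuzzyAndQueryBuilder.build("title_t", Arrays.asList("dog", "cat"))` yields `((title_t:dog~) AND (title_t:cat~))`. Note that this requires every word to match fuzzily somewhere in the field; unlike a phrase query, it does not require the words to be adjacent or in order.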
There's no direct support for a fuzzy phrase, but you can simulate it by explicitly enumerating the fuzzy terms and then adding them to a MultiPhraseQuery. The resulting query would look like:
<MultiPhraseQuery: "grand (prarie prairie)">
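
The enumeration step can be sketched in plain Java as follows. This is a model under stated assumptions, not Lucene code: the class `FuzzyPhraseExpansion` is hypothetical, the vocabulary is a hand-written list standing in for the terms in the index, and plain Levenshtein distance stands in for whatever similarity measure the fuzzy query actually uses. In Lucene, each inner list would then be converted to a `Term[]` and added to the MultiPhraseQuery as the alternatives for one phrase position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of enumerating fuzzy terms per phrase position.
class FuzzyPhraseExpansion {

    // Classic Levenshtein edit distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // For each phrase word, collects the vocabulary terms within maxEdits of it.
    static List<List<String>> expand(List<String> phrase, List<String> vocabulary, int maxEdits) {
        List<List<String>> positions = new ArrayList<>();
        for (String word : phrase) {
            List<String> alternatives = new ArrayList<>();
            for (String term : vocabulary) {
                if (editDistance(word, term) <= maxEdits) {
                    alternatives.add(term);
                }
            }
            positions.add(alternatives);
        }
        return positions;
    }
}
```

For the phrase "grand prarie", a vocabulary of "grand", "prarie", "prairie", "ground" and "prize", and at most one edit, this yields one alternative for the first position (grand) and two for the second (prarie, prairie), which corresponds to the query shown above.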

Retrieving per keyword/field match position in Lucene Solr -- possible?

Is there any way to retrieve the match field/position for each keyword for each matching document from solr?
For example, if the document has title "Retrieving per keyword/field match position in Lucene Solr -- possible?" and the query is "solr keyword", I'd like to get, in addition to the doc-id (I normally only want the doc-id, not the full document), something that can tell me the matches are at:
solr:
title: 9
keyword:
title: 3
I'm pretty sure such info is computed during query execution (for phrase queries), but is it possible to return it to the application?
Thanks!
Debugging Relevance Issues in Search suggests using Solr analysis, which you can get to from the admin URL, using something like http://localhost:8983/solr/admin/analysis.jsp?highlight=on .
This highlights matching terms and gives their position.
AFAIK there is no way to do that directly, but you can use hit highlighting to implement it.