How to search for a word in a String field of a Lucene index

How do I search for a word in a String field of a Lucene index?
I have a Lucene index with a field TITLE that contains document titles,
e.g. "TV not working", "Mobile not working".
I want to search for a particular word in the title.
The code below gives me results from the full content, but if I change FULL_CONTENT to TITLE, I don't get any results.
Query qry = new QueryParser(FULL_CONTENT, new SimpleAnalyzer()).parse("not");
Searcher searcher = new IndexSearcher(indexDirectory);
Hits hits = searcher.search(qry);
System.out.println(hits.length());

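As NOT is a Lucene query syntax operator, that may be your problem (although the classic QueryParser only treats it as an operator when it is written in uppercase, so a lowercase "not" is parsed as an ordinary term).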

The problem is that SimpleAnalyzer applies a lower-case filter, so your query is converted to lower case:
e.g. title:mobile.
A StringField doesn't apply any analysis, so your text is indexed as is, as a single term (e.g. "Mobile not working"). If you change StringField to TextField, the text will be analyzed by SimpleAnalyzer, split into words and converted to lower case in the index.
If you replace SimpleAnalyzer with WhitespaceAnalyzer, there is no lower-case filter, so your query is no longer converted to lower case and it will match again, but only when the query term equals the whole indexed value (e.g. a title that is just "Mobile").
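To make the difference concrete, here is a toy Java model, not Lucene code: the class and method names are made up, simpleAnalyze only approximates SimpleAnalyzer (split on non-letters, then lower-case), and a term query is reduced to a membership test on the indexed terms. It shows why an untokenized StringField value never matches the lower-cased single word produced by the query analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only (hypothetical names, not Lucene's API): compares the terms
// produced for a title indexed as a StringField and as an analyzed TextField.
class FieldAnalysisDemo {

    // Rough stand-in for SimpleAnalyzer: split on non-letters, then lower-case.
    static List<String> simpleAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // StringField: the whole value becomes one term, case and spaces preserved.
    static List<String> stringFieldTerms(String text) {
        return List.of(text);
    }

    // A term query matches when the analyzed query term is one of the indexed terms.
    static boolean matches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String title = "Mobile not working";
        // The query text "Mobile" goes through the analyzer and becomes "mobile".
        String queryTerm = simpleAnalyze("Mobile").get(0);
        System.out.println(matches(stringFieldTerms(title), queryTerm)); // false
        System.out.println(matches(simpleAnalyze(title), queryTerm));    // true
    }
}
```

Running main prints false for the StringField model and true for the analyzed model, which mirrors switching the field from StringField to TextField.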

Related

Lucene - Lexical error while parsing proximity query

I wrote code for dynamic search over a database using Lucene.NET.
I started creating queries and finding the positions of the results, and it worked great,
but when I used proximity searches, I got an error:
Lexical error at line 1, column 72. Encountered: after : "\" "
My search function:
private static List<String> GeneralSearch(string txt, Table type)
{
txt= "10~" + txt;
string newQuery = "";
foreach (var field in fields[type])
{
newQuery += field + ": " + txt + " OR ";
}
newQuery = newQuery.Substring(0, newQuery.Length - 4)+" ";
parser.MultiTermRewriteMethod =
MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE;
BooleanQuery bq = new BooleanQuery();
Query query = parser.Parse(newQuery);
bq.Add(query, Occur.MUST);
bq.Add(new TermQuery(new Term("tbl", type.ToString())), Occur.MUST);
TopDocs hits = searcher.Search(bq, reader.MaxDoc);........
The txt variable contains a query like this:
txt= "I like to read"
The function creates a new query that searches all the fields of a specific table:
title: 10~"I like to read" OR content: 10~"I like to read"
I think my problem may be that the language is written right to left.
Any ideas would help!
I can't speak to the specific error; however, your query is malformed in two ways:
The slop (proximity) operator must trail the phrase, not lead it.
Literal phrase queries must be enclosed in double quotes.
It's wise to log the result of a query parse with Query.ToString(). Assuming StandardAnalyzer, your query parses to something like this:
(text:10~0.5 text:i text:like text:read) +tbl:somevalue
What you intended as your slop is parsed as a fuzzy query on the term "10", with the default minimum similarity of 0.5:
text:10~0.5
and what you thought was a phrase query is actually parsed into multiple term queries, because your phrase is not double quoted (the stop word "to" is dropped by the analyzer):
text:i text:like text:read
You want your raw query to look something like this:
text: "I like to read"~10
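A small Java sketch of building the corrected query string for several fields (the question's code is Lucene.NET, but the query syntax is the same). ProximityQueryBuilder and its methods are hypothetical helpers; only the "phrase"~N syntax and the backslash escaping inside quotes come from the classic query parser. Joining the clauses also avoids the manual Substring call that trims the trailing " OR ".

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper (not a Lucene class): builds a classic-QueryParser string
// that runs the same proximity phrase against several fields.
class ProximityQueryBuilder {

    // Inside a quoted phrase, backslash and double quote must be escaped.
    // Backslashes are escaped first so the added ones are not doubled again.
    static String escapePhrase(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // The slop goes AFTER the closing quote: field:"phrase"~slop
    static String build(List<String> fields, String phrase, int slop) {
        String quoted = "\"" + escapePhrase(phrase) + "\"~" + slop;
        StringJoiner joiner = new StringJoiner(" OR ");
        for (String field : fields) {
            joiner.add(field + ":" + quoted);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // prints: title:"I like to read"~10 OR content:"I like to read"~10
        System.out.println(build(List.of("title", "content"), "I like to read", 10));
    }
}
```

As an alternative to string building, MultiFieldQueryParser (available in both Lucene and Lucene.NET) can expand one query across several fields.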
The Lucene query parser syntax documentation is a good guide to this. Good luck!

How to get search hits when at least one character is present in a field value using Lucene search

How do I get search hits when at least one of the characters searched is present in a field's value, using Lucene search?
I only get search hits when I search with a complete word.
Example:
Hello world
In the above example, if I enter "Hello", I get a hit, but not if I enter "Hel".
Here is my code to get hits:
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT, new HashSet()); // empty stop-word set
BooleanQuery.setMaxClauseCount(32767);
QueryParser parser = new QueryParser("fieldname", analyzer);
parser.setAllowLeadingWildcard(true);
Query query = parser.parse(searchString); // searchString holds the user's input
TopDocs topResultDocs = searcher.search(query, null, 20);
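Append * to the query to get prefix matches: Hel* matches every term that starts with "Hel", such as Hello. Since you have already enabled setAllowLeadingWildcard(true), *el* would also match terms that contain "el" anywhere, although leading wildcards are slow on large indexes.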
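A minimal sketch of turning every word the user types into a prefix term, assuming the input should be treated as literal text. PrefixQueryString is a hypothetical helper; in real code QueryParser.escape can do the escaping. Note that the classic parser does not run wildcard terms through the analyzer, and depending on your version it may lower-case them (setLowercaseExpandedTerms).

```java
import java.util.StringJoiner;

// Hypothetical helper (not part of Lucene): turns raw user input into a
// classic-QueryParser string where each word becomes a prefix query (word*).
class PrefixQueryString {

    // Characters with special meaning in the classic query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Backslash-escape special characters so the user's text stays literal.
    static String escape(String word) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // "Hel wor" -> "Hel* wor*"; blank input -> "".
    static String toPrefixQuery(String userInput) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String word : userInput.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                joiner.add(escape(word) + "*");
            }
        }
        return joiner.toString();
    }
}
```

The resulting string can then be passed to parser.parse(...) in place of the raw input.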

Lucene - most relevant search and sorting the results

I am trying to make a search page based on the data we have. Here is my code.
SortField sortField = new SortField(TEXT_FIELD_RANK, SortField.Type.INT, true);
Sort sort = new Sort(sortField);
Query q = queryParser.parse(useQuery);
TopDocs topDocs = searcher.search(q, totalLimit, sort);
ScoreDoc[] hits = topDocs.scoreDocs;
log.info("totalResults="+ topDocs.totalHits);
int index = getStartIndex(start, maxReturn);
int resultsLength = start * maxReturn;
if (resultsLength > totalLimit) {
resultsLength = totalLimit;
}
log.info("index:"+ index + "==resultsLength:"+ resultsLength);
for (int i = index; i < resultsLength; ++i) {
}
Basically, here is my requirement: if there is an exact match, I need to display the exact match; if there is no exact match, I need to sort the results by the field. So I check for the exact match inside the for loop.
But it seems that the results are sorted no matter what, so even when there is an exact match, it doesn't show up on the first page.
Thanks.
You told it to sort on a field value, not on relevance, so there is no guarantee that the best matches will be on the first page. You can sort by relevance first, and then by your field value, like:
Sort sort = new Sort(SortField.FIELD_SCORE, sortField);
if that is what you are looking for.
Otherwise, if you want to ignore relevance for anything except a direct match, you could first run a more restrictive (exact-matching) query, getting your exact matches as an entirely separate result set.
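Both orderings can be sketched in plain Java on an in-memory list of hits. This is a toy model, not Lucene code: Hit, its fields, and the case-insensitive title comparison used as the "exact match" rule are all assumptions standing in for the real search results and for the separate exact-match query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Toy model (not Lucene code) of two ways to order search results.
class ResultOrdering {

    // Hypothetical hit: a title, a relevance score and the RANK field value.
    static final class Hit {
        final String title;
        final float score;
        final int rank;

        Hit(String title, float score, int rank) {
            this.title = title;
            this.score = score;
            this.rank = rank;
        }

        @Override
        public String toString() {
            return title;
        }
    }

    // Mimics new Sort(SortField.FIELD_SCORE, rankDescending):
    // highest score first, ties broken by highest rank.
    static List<Hit> byScoreThenRank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparing(Comparator.comparingInt((Hit h) -> h.rank).reversed()));
        return sorted;
    }

    // The "separate exact-match result set" alternative: exact title matches
    // first, then everything else ordered by rank alone.
    static List<Hit> exactFirstThenRank(List<Hit> hits, String query) {
        String q = query.trim().toLowerCase(Locale.ROOT);
        List<Hit> exact = new ArrayList<>();
        List<Hit> rest = new ArrayList<>();
        for (Hit h : hits) {
            if (h.title.toLowerCase(Locale.ROOT).equals(q)) {
                exact.add(h);
            } else {
                rest.add(h);
            }
        }
        rest.sort(Comparator.comparingInt((Hit h) -> h.rank).reversed());
        exact.addAll(rest);
        return exact;
    }
}
```

The first method keeps relevance in charge and uses the field only as a tie-breaker; the second ignores relevance except for the exact match, which is closer to the stated requirement.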

Searching a Lucene index on multiple fields

I have an index with 2 content fields (analyzed, indexed and stored),
for example: name and hobbies. (The hobbies field can be added multiple times with different values.)
I have another field that is only indexed (not analyzed and not stored), used for filtering,
for example: country_code.
Now I want to build a query that retrieves the documents that best match some "search" input field, but only those documents where country_code has an exact value.
What would be the most suitable combination of query syntax and query parser to build such a query?
You can use the following query (the + marks both clauses as required, so only documents from that country are returned):
+country_code:india +(name:search_value OR hobbies:search_value)
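Why the + in front of country_code matters can be shown with a toy Java model of BooleanQuery matching (not Lucene code; the class, the term helper and the single-valued map documents are assumptions). With default settings, once a BooleanQuery has a required (MUST) clause, its SHOULD clauses are optional and only affect scoring, so an unprefixed country_code clause does not filter anything.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model of BooleanQuery matching (not Lucene code): all MUST clauses have
// to match; if there is a MUST clause, SHOULD clauses are optional (scoring
// only); with no MUST clauses, at least one SHOULD clause has to match.
class BooleanMatchDemo {

    // Hypothetical helper: a term clause on a single-valued field.
    static Predicate<Map<String, String>> term(String field, String value) {
        return doc -> value.equals(doc.get(field));
    }

    static boolean matches(List<Predicate<Map<String, String>>> must,
                           List<Predicate<Map<String, String>>> should,
                           Map<String, String> doc) {
        for (Predicate<Map<String, String>> clause : must) {
            if (!clause.test(doc)) {
                return false;
            }
        }
        if (!must.isEmpty()) {
            return true;
        }
        for (Predicate<Map<String, String>> clause : should) {
            if (clause.test(doc)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of("name", "search_value", "country_code", "france");
        Predicate<Map<String, String>> text =
                term("name", "search_value").or(term("hobbies", "search_value"));

        // country_code:india +(name:... OR hobbies:...) -> country is optional
        System.out.println(matches(List.of(text), List.of(term("country_code", "india")), doc)); // true
        // +country_code:india +(name:... OR hobbies:...) -> country is required
        System.out.println(matches(List.of(term("country_code", "india"), text), List.of(), doc)); // false
    }
}
```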
Why don't you start with QueryParser? It might work for your use case, and it requires the least amount of effort.
It's not clear from your question, but let's assume you have a single input field ('search') and a combobox for the country code. You would then read those values and create a query:
// You don't have to use two parsers; you can do this using one.
// query: the text read from the 'search' input field
QueryParser nameParser = new QueryParser(Version.LUCENE_CURRENT, "name", your_analyzer);
QueryParser hobbiesParser = new QueryParser(Version.LUCENE_CURRENT, "hobbies", your_analyzer);
BooleanQuery q = new BooleanQuery();
q.add(nameParser.parse(query), BooleanClause.Occur.SHOULD);
q.add(hobbiesParser.parse(query), BooleanClause.Occur.SHOULD);
/* Filtering by country code can be done using a BooleanQuery
 * or a filter; the difference is how Lucene scores matches.
 * For example, using a filter:
 */
// countryCode (hypothetical name): the value selected in the country combobox
Filter countryCodeFilter = new QueryWrapperFilter(new TermQuery(new Term("country_code", countryCode)));
// and finally, searching:
TopDocs topDocs = searcher.search(q, countryCodeFilter, 10);

How to get reliable docid from Lucene 3.0.3?

I would like to get the int docid of a Document I just added to a Lucene index so that I can stick it into a Filter to update a standing query. My documents have a unique external id, so I thought that doing a TermDocs enumeration on the unique id would return the correct document, like this:
protected int getDocId(IndexReader reader, String idField, Document doc) throws IOException {
String id = doc.get(idField);
TermDocs termDocs = reader.termDocs(new Term(idField, id));
int docid = -1;
while (termDocs.next()) {
docid = termDocs.doc();
Document aDoc = reader.document(docid);
String docIdString = aDoc.get(idField);
System.out.println(docIdString + ": " + docid);
}
return docid;
}
Unfortunately, this loops and loops, returning the same docIdString and increasing docids.
What is the recommended way to get the docids of newly added documents so that I can use them in a Filter immediately after the documents are committed?
The doc Id of a document is not the same as the value in your id field. The doc ID is an internal Lucene identifier, which you probably shouldn't access. Your field is just a field - you can call it "ID", but Lucene won't do anything special with it.
Why are you trying to manually update the filter? When you commit, merges can happen etc. so the IDs before will not be the same as the IDs afterwards. (Which is just an example of the general point that you shouldn't rely on Lucene IDs for anything.) So you don't need to just add that one document to the filter, you need to update the whole thing.
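A toy Java model of a single segment (not Lucene code; the class and method names are hypothetical) shows why a stored internal ID goes stale: here the internal doc ID is just the position in the segment, and a merge drops deleted documents and renumbers the survivors, while your own id field value never changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model (not Lucene code) of one segment: the internal doc ID is simply the
// position in the list, and the strings are the values of your own id field.
class DocIdShiftDemo {

    // Internal doc ID of the document with this external id, or -1.
    static int internalId(List<String> segment, String externalId) {
        return segment.indexOf(externalId);
    }

    // Simulated merge: deleted documents are dropped and the survivors are
    // renumbered, so every document after a deletion gets a smaller doc ID.
    static List<String> merge(List<String> segment, Set<String> deleted) {
        List<String> merged = new ArrayList<>();
        for (String id : segment) {
            if (!deleted.contains(id)) {
                merged.add(id);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> segment = List.of("a", "b", "c");
        System.out.println(internalId(segment, "c")); // 2
        List<String> afterMerge = merge(segment, Set.of("a"));
        System.out.println(internalId(afterMerge, "c")); // 1: same document, new doc ID
    }
}
```

Looking documents up by the external id (a TermQuery on your id field) keeps working across merges, which is why the filter is better built from a query than from remembered doc IDs.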
To update a cached filter, just run a query (for example, for "foo") with your filter wrapped in a CachingWrapperFilter.
EDIT: Because your id field is just a field, you search for it like you would for anything else:
TopDocs results = searcher.search(new TermQuery(new Term("MyIdField", id)), 1);
int internalId = results.totalHits > 0 ? results.scoreDocs[0].doc : -1; // -1 if no document has this id
However, like I said, I think you want to ignore internal IDs. So I would build a filter from a query:
BooleanQuery filterQuery = new BooleanQuery(); // or get the existing query from the cache
filterQuery.add(new TermQuery(new Term("MyIdField", id)), BooleanClause.Occur.SHOULD);
// add more sub-queries here for each ID you want in the filter
Filter myFilter = new CachingWrapperFilter(new QueryWrapperFilter(filterQuery));