Lucene - most relevant search and sort the results

I am trying to make a search page based on the data we have. Here is my code.
SortField sortField = new SortField(TEXT_FIELD_RANK, SortField.Type.INT, true);
Sort sort = new Sort(sortField);
Query q = queryParser.parse(useQuery);
TopDocs topDocs = searcher.search(q, totalLimit, sort);
ScoreDoc[] hits = topDocs.scoreDocs;
log.info("totalResults="+ topDocs.totalHits);
int index = getStartIndex(start, maxReturn);
int resultsLength = start * maxReturn;
if (resultsLength > totalLimit) {
    resultsLength = totalLimit;
}
log.info("index:" + index + "==resultsLength:" + resultsLength);
for (int i = index; i < resultsLength; ++i) {
    // exact-match check goes here (see below)
}
Basically, here is my requirement: if there is an exact match, I need to display it first. If there is no exact match, I need to sort the results by the field. So I check for the exact match inside the for loop.
But it seems to me that the results get sorted no matter what, so even when there is an exact match, it doesn't show up on the first page.
Thanks.

You set it to sort on a field value, not on relevance, so there is no guarantee that the best matches will be on the first page. You can sort by relevance first, then on your field value, like:
Sort sort = new Sort(SortField.FIELD_SCORE, sortField);
If that is what you were looking for.
Otherwise, if you want to ignore relevance for anything except a direct match, you could run a more restrictive (exact-matching) query first, getting your exact matches as an entirely separate result set.
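For that second approach, a minimal sketch could look like the following; TEXT_FIELD_NAME, userInput and process() are placeholders for your exact-match field, the raw query string and your paging loop:
Query exactQuery = new TermQuery(new Term(TEXT_FIELD_NAME, userInput));
TopDocs exactHits = searcher.search(exactQuery, totalLimit);
if (exactHits.totalHits > 0) {
    // Exact matches exist: show them first, in relevance order.
    process(exactHits.scoreDocs);
} else {
    // No exact match: fall back to the original query, sorted by the rank field.
    Sort sort = new Sort(new SortField(TEXT_FIELD_RANK, SortField.Type.INT, true));
    TopDocs sortedHits = searcher.search(queryParser.parse(useQuery), totalLimit, sort);
    process(sortedHits.scoreDocs);
}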

Related

Unique values from one column using ActiveAndroid?

I'm trying to get all unique values from a single column of a table. I'm utterly failing; the docs don't seem to go into enough depth, and what I've gleaned from looking at the source seems like it should help, but doesn't.
List<Question> questions = new Select().from(Question.class).where("ZCLASSLEVEL = ? ", classLevel).execute();
works to get all the columns of all the Questions.
However,
List<Question> questions = new Select(columns).from(Question.class).where("ZCLASSLEVEL = ? ", classLevel).execute();
doesn't return any data (questions.size() = 0), where I've tried
String[] columns = { "ZHRSSECTION" };
and
Select.Column[] columns = { new Select.Column("ZHRSSECTION", "ZHRSSECTION")};
Presumably, throwing .distinct() after the Select() should return only unique values, but I can't even get just the single column I'm interested in returned.
What am I missing here?
Thanks!
randy
You should use the groupBy() method together with the distinct() method.
For example:
List<Question> questions = new Select()
        .distinct()
        .from(Question.class)
        .groupBy("ZCLASSLEVEL")
        .execute();
The above code returns a list of Questions, one for each ZCLASSLEVEL (probably the last row).
Then you can get the unique values of ZCLASSLEVEL as
for (Question question : questions) {
    int level = question.ZCLASSLEVEL;
    // do something with level.
}
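Applied to the original question (unique ZHRSSECTION values for one class level), the same pattern would look roughly like this; a sketch that simply combines the groupBy() idea with the where() clause from the post:
List<Question> questions = new Select()
        .distinct()
        .from(Question.class)
        .where("ZCLASSLEVEL = ?", classLevel)
        .groupBy("ZHRSSECTION")
        .execute();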
A little late in answering this, but the issue is that SQLiteUtils.rawQuery() assumes the Id column is always going to be in the cursor result. Set your columns String[] to {Id, ZHRSSECTION} and you'll be fine.
List<FooRecord> list = new Select(new String[]{"Id,tagName"}).from(FooRecord.class).execute();
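Applying that to the model in the question would look something like this (a sketch; it assumes ActiveAndroid's default Id column name, and can be combined with the distinct()/groupBy() approach above for unique values):
List<Question> questions = new Select(new String[]{"Id", "ZHRSSECTION"})
        .from(Question.class)
        .where("ZCLASSLEVEL = ?", classLevel)
        .execute();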

Berkeley DB equivalent of SELECT COUNT(*) All, SELECT COUNT(*) WHERE LIKE "%...%"

I'm looking for the Berkeley DB equivalent of
SELECT COUNT(*) and SELECT COUNT(*) WHERE key LIKE "%...%"
I have 100 records with keys: 1, 2, 3, ... 100.
I have the following code:
//Key = 1
i=1;
strcpy_s(buf, to_string(i).size()+1, to_string(i).c_str());
key.data = buf;
key.size = to_string(i).size()+1;
key.flags = 0;
data.data = rbuf;
data.size = sizeof(rbuf)+1;
data.flags = 0;
//Cursor
if ((ret = dbp->cursor(dbp, NULL, &dbcp, 0)) != 0) {
    dbp->err(dbp, ret, "DB->cursor");
    goto err1;
}
//Get
dbcp->get(dbcp, &key, &data_read, DB_SET_RANGE);
db_recno_t cnt;
dbcp->count(dbcp, &cnt, 0);
cout <<"count: "<<cnt<<endl;
The count cnt is always 1, but I expected it to count all the partial key matches for Key=1: 1, 10, 11, 21, ... 91.
What is wrong in my code/understanding of DB_SET_RANGE?
Is it possible to get SELECT COUNT(*) WHERE key LIKE "%...%" in BDB?
Also, is it possible to get SELECT COUNT(*) of all records in the file?
Thanks
You're expecting Berkeley DB to be way more high-level than it actually is. It doesn't contain anything like what you're asking for. If you want the equivalent of WHERE field LIKE '%1%' you have to make a cursor, read through all the values in the DB, and do the string comparison yourself to pick out the ones that match. That's what an SQL engine actually does to implement your query, and if you're using libdb instead of an SQL engine, it's up to you. If you want it done faster, you can use a secondary index (much like you can create additional indexes for a table in SQL), but you have to provide some code that links the secondary index to the main DB.
DB_SET_RANGE is useful to optimize a very specific case: you're looking for items whose key starts with a specific substring. You can DB_SET_RANGE to find the first matching key, then DB_NEXT your way through the matches, and stop when you get a key that doesn't match. This works only on DB_BTREE databases because it depends on the keys being returned in lexical order.
The count method tells you how many exact duplicate keys there are for the item at the current cursor position.
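As a rough illustration of that DB_SET_RANGE scan-and-stop pattern (and why the cursor walk is needed rather than count()), here is a sketch using the Berkeley DB Java API; the C API follows the same shape with DB_SET_RANGE and DB_NEXT. It assumes a DB_BTREE database whose keys are stored as plain strings, and it counts keys that start with a prefix, not the general "%...%" case:
long countKeysWithPrefix(Database db, String prefix) throws DatabaseException {
    long count = 0;
    DatabaseEntry key = new DatabaseEntry(prefix.getBytes());
    DatabaseEntry data = new DatabaseEntry();
    Cursor cursor = db.openCursor(null, null);
    try {
        // Position on the first key >= prefix (the DB_SET_RANGE equivalent)...
        OperationStatus status = cursor.getSearchKeyRange(key, data, LockMode.DEFAULT);
        // ...then walk forward while the returned keys still start with the prefix.
        while (status == OperationStatus.SUCCESS
                && new String(key.getData()).startsWith(prefix)) {
            count++;
            status = cursor.getNext(key, data, LockMode.DEFAULT);
        }
    } finally {
        cursor.close();
    }
    return count;
}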
You can use the DB->stat() method. For example, to get the number of unique keys in a DB_BTREE database:
bool row_amount(DB *db, size_t &amount) {
    amount = 0;
    if (db == NULL) return false;
    DB_BTREE_STAT *sp;
    int ret = db->stat(db, NULL, &sp, 0);
    if (ret != 0) return false;
    amount = (size_t)sp->bt_nkeys;  // bt_nkeys = number of unique keys in the B-tree
    free(sp);                       // DB->stat() allocates the stat structure; free it
    return true;
}

searching lucene index on multiple fields

I have an index with 2 content fields (analyzed, indexed & stored):
for example: name, hobbies. (The hobbies field can be added multiple times with different values).
I have another field that is only indexed (un_analyzed & not stored) used for filtering:
for example: country_code
Now, I want to build a query that will retrieve documents that match (as best as possible) to some "search" input field but only such documents where country_code has some exact value.
What would be the most suitable combination of query syntax / query parser to build such a query?
You can use the following query:
+country_code:india +(name:search_value OR hobbies:search_value)
Why don't you start with QueryParser, it might work for your use case and it requires the least amount of effort.
It's not clear from your question, but let's assume you have a single input field ('search') and a combobox for the country code. You would then read those values and create a query:
// you don't have to use two parsers, you can do this using one.
QueryParser nameParser = new QueryParser(Version.LUCENE_CURRENT, "name", your_analyzer);
QueryParser hobbiesParser = new QueryParser(Version.LUCENE_CURRENT, "hobbies", your_analyzer);
BooleanQuery q = new BooleanQuery();
q.add(nameParser.parse(query), BooleanClause.Occur.SHOULD);
q.add(hobbiesParser.parse(query), BooleanClause.Occur.SHOULD);
/* Filtering by country code can be done using a BooleanQuery
 * or a filter; the difference is how Lucene scores matches.
 * For example, using a filter (countryCode holds the value read from the combobox):
 */
Filter countryCodeFilter = new QueryWrapperFilter(new TermQuery(new Term("country_code", countryCode)));
// and finally searching:
TopDocs topDocs = searcher.search(q, countryCodeFilter, 10);
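A sketch of the "one parser" route mentioned in the comment above, using MultiFieldQueryParser (field names and analyzer are taken from the question; treat it as an illustration rather than the only way):
QueryParser multiParser = new MultiFieldQueryParser(
        Version.LUCENE_CURRENT, new String[]{"name", "hobbies"}, your_analyzer);
Query searchQuery = multiParser.parse(query);
TopDocs results = searcher.search(searchQuery, countryCodeFilter, 10);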

How to get reliable docid from Lucene 3.0.3?

I would like to get the int docid of a Document I just added to a Lucene index so that I can stick it into a Filter to update a standing query. My documents have a unique external id, so I thought that doing a TermDocs enumeration on the unique id would return the correct document, like this:
protected int getDocId(IndexReader reader, String idField, Document doc) throws IOException {
    String id = doc.get(idField);
    TermDocs termDocs = reader.termDocs(new Term(idField, id));
    int docid = -1;
    while (termDocs.next()) {
        docid = termDocs.doc();
        Document aDoc = reader.document(docid);
        String docIdString = aDoc.get(idField);
        System.out.println(docIdString + ": " + docid);
    }
    return docid;
}
Unfortunately, this loops and loops, returning the same docIdString and increasing docids.
What is the recommended way to get the docids for newly-added documents so that I can use them in a Filter immediately after the documents are committed?
The doc Id of a document is not the same as the value in your id field. The doc ID is an internal Lucene identifier, which you probably shouldn't access. Your field is just a field - you can call it "ID", but Lucene won't do anything special with it.
Why are you trying to manually update the filter? When you commit, merges can happen etc. so the IDs before will not be the same as the IDs afterwards. (Which is just an example of the general point that you shouldn't rely on Lucene IDs for anything.) So you don't need to just add that one document to the filter, you need to update the whole thing.
To update a cached filter, just run a query for "foo" and use your filter with a CachingWrapperFilter.
EDIT: Because your id field is just a field, you do a search for it like you would anything else:
TopDocs results = searcher.search(new TermQuery(new Term("MyIDField", Id)), 1);
int internalId = results.scoreDocs[0].doc;
However, like I said, I think you want to ignore internal IDs. So I would build a filter from a query:
BooleanQuery filterQuery = new BooleanQuery(); // or get existing query from cache
filterQuery.add(new TermQuery(new Term("MyIdField", Id)), BooleanClause.Occur.SHOULD);
// add more sub-queries for each ID you want in the filter here
Filter myFilter = new CachingWrapperFilter(new QueryWrapperFilter(filterQuery));
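The cached filter is then passed into every search alongside the standing query itself, roughly like this (standingQuery is a placeholder for whatever query the filter is meant to constrain):
TopDocs hits = searcher.search(standingQuery, myFilter, 10);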

Lucene Field Grouping

Say I have the fields stud_roll_number and date_leave, and this SQL query:
select stud_roll_number, count(*) from some_table where date_leave > some_date group by stud_roll_number;
How can I write the same query using Lucene? I tried the following after querying for date_leave > some_date:
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    Document doc = search.doc(scoreDoc.doc);
    String value = doc.get(fieldName);
    Integer key = mapGrouper.get(value);
    if (key == null) {
        key = 1;
    } else {
        key = key + 1;
    }
    mapGrouper.put(value, key);
}
But I have a huge data set and it takes a long time to compute this. Is there any other way to do it? Thanks in advance.
Your performance bottleneck is almost certainly the I/O it takes to perform the document and field value lookups. What you want to do in this situation is use a FieldCache for the field you want to group by. Once you have a field cache, you can look up the values by Lucene doc ID, which will be fast because all the values are in memory.
Also remember to give your HashMap an initial capacity to avoid array resizing.
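A minimal sketch of that idea, using the Lucene 3.x FieldCache API and the search, fieldName and topDocs variables from the question:
String[] values = FieldCache.DEFAULT.getStrings(search.getIndexReader(), fieldName);
Map<String, Integer> mapGrouper = new HashMap<String, Integer>(1024); // initial capacity
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    String value = values[scoreDoc.doc];   // in-memory lookup, no stored-field I/O per hit
    Integer count = mapGrouper.get(value);
    mapGrouper.put(value, count == null ? 1 : count + 1);
}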
There is a very new grouping module, on https://issues.apache.org/jira/browse/LUCENE-1421 as a patch, that will do this.