Filter Lucene search based on a particular field - lucene

I want to return all matched documents found after a document with a certain value. The value is unique.
I have tried to use NumericRangeFilter. This is not a good solution, as the field values may be in any order.

Using a numeric range is the correct way to accomplish what you want, if I understand your need correctly. To sort on the same field, pass a Sort argument to your search call, something like:
// Sort hits by the same numeric field the range applies to (Lucene 4+ API)
Sort sort = new Sort(new SortField("myNumericField", SortField.Type.INT));
TopDocs hits = searcher.search(query, maxDocs, sort);
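A rough way to picture what the range plus sort achieves, written in plain Java rather than against the Lucene API (so it runs without a Lucene dependency): documents are kept ordered by their unique numeric key, and everything strictly after the pivot value is returned in that order. The class, method, and field names are hypothetical; in Lucene, the tailMap step would correspond to a numeric range whose lower bound is the pivot value (excluded), and the ordering to the Sort above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class AfterPivot {
    // Hypothetical stand-in for a Lucene document: a unique numeric key plus a title.
    public static final class Doc {
        public final int key;
        public final String title;

        public Doc(int key, String title) {
            this.key = key;
            this.title = title;
        }
    }

    // Keep documents ordered by their unique key, the way sorting on
    // "myNumericField" would order the hits.
    public static NavigableMap<Integer, Doc> index(List<Doc> docs) {
        NavigableMap<Integer, Doc> byKey = new TreeMap<>();
        for (Doc d : docs) {
            byKey.put(d.key, d);
        }
        return byKey;
    }

    // All documents strictly after the pivot value, in field order: the in-memory
    // analogue of an exclusive range starting at the pivot plus a sort on that field.
    public static List<Doc> after(NavigableMap<Integer, Doc> byKey, int pivot) {
        return new ArrayList<>(byKey.tailMap(pivot, false).values());
    }

    public static void main(String[] args) {
        // Insertion order is deliberately shuffled; the output is still in key order.
        List<Doc> docs = List.of(
                new Doc(42, "c"), new Doc(7, "a"), new Doc(19, "b"), new Doc(88, "d"));
        for (Doc d : after(index(docs), 19)) {
            System.out.println(d.key + " " + d.title);
        }
    }
}
```

Because "after" is defined by the field's own order rather than by insertion order, it does not matter that the stored values arrive in any order; the pivot value itself also does not need to be present.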

Related

Restrict Google custom search by date range but keep default sorting?

I want to run a google custom search and apply a date range restriction. I can do this using the "sort" attribute with something like "sort=date:r:20160101:20170101" but this seems to not only restrict the data by date, but it also applies a sort by date (which I don't want).
The docs state that you can apply multiple sorting attributes (comma separated) so I'd like to sort by the default sorting option first then apply the date range filter as a second "sort". I'm hoping this will achieve what I want.
Does anyone know what the default sorting option is or how I can apply a date range filter without affecting the result set's ordering?
I've been looking at these docs:
https://developers.google.com/custom-search/docs/structured_search#sort_by_attribute
https://developers.google.com/custom-search/docs/structured_search#restrict_to_range
It always defaults to relevance. Go to Setup > Search features > Advanced, and check off "Sort by date".

Lucene DocValuesField, SortedDocValuesField usage for filtering and sorting

I am going to switch to the newest version of Lucene (4.10.2), and I'd like to make some optimizations in my index and code.
I would like to use DocValuesField to retrieve values, but also for filtering and sorting.
So here I have some questions:
If I'd like to use a range filter (FieldCacheRangeFilter), I need to store the value in an XxxDocValuesField,
but if I want to use a terms filter (FieldCacheTermsFilter), I need to store the value in a SortedDocValuesField.
So it looks like if I want to use range and terms filters I need to have two different fields. Am I right? Am I using it correctly?
Another thing is sorting. I can choose between SortedNumericSortField and SortField. The first one requires SortedNumericDocValuesField, the other NumericDocValuesField. Is there any (big) difference in performance?
Should I use SortedNumericSortField (adding another field to the index)?
And the last one: am I right that all corresponding DocValuesFields will be removed from the index when a doc is removed? I saw an IndexWriter method to update a doc value, but no method to delete one.
Regards
Piotr

Any way to use strings as the scores in a Redis sorted set (zset)?

Or maybe the question should be: What's the best way to represent a string as a number, such that sorting their numeric representations would give the same result as if sorted as strings? I devised a way that could sort up to 9 characters per string, but it seems like there should be a much better way.
Up front: I don't think Redis's lexicographical commands will work here (see the following example).
Example: suppose I want to presort all of the names linked to some ID so that I can use ZINTERSTORE to quickly get an ordered list of IDs based on their names (without using Redis's SORT command). Ideally the IDs would be the zset's members, and the numeric representation of each name would be the zset's scores.
Does that make sense? Or am I going about it wrong?
You're trying to use an order-preserving hash function to generate a score for each id. It appears you've written one, but you've already found out that the score's precision (Redis scores are double-precision floats) only lets you use the first 9 characters. It would be interesting to see your function, by the way.
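To make the order-preserving idea and its limit concrete, here is a minimal sketch, assuming names consist only of lowercase letters a-z (an assumption not stated in the question). Each letter becomes a base-27 digit, with 0 reserved as padding, and the digits are packed into an integer small enough for a double score to hold exactly. The class and method names are made up for illustration.

```java
public class OrderPreservingScore {
    // Assumption: names use only lowercase ASCII letters a-z.
    // Letters map to digits 1..26 and 0 is reserved as padding, so a string
    // sorts before any longer string it is a prefix of ("ann" < "anna").
    static final int BASE = 27;

    // 27^11 < 2^53 but 27^12 > 2^53: eleven base-27 digits fit exactly in a
    // double's 53-bit significand. Characters beyond this prefix are ignored,
    // so names sharing their first 11 letters get the same score.
    static final int MAX_CHARS = 11;

    public static double score(String name) {
        long value = 0;
        for (int i = 0; i < MAX_CHARS; i++) {
            int digit = 0; // padding for positions past the end of the name
            if (i < name.length()) {
                char c = name.charAt(i);
                if (c < 'a' || c > 'z') {
                    throw new IllegalArgumentException("unsupported character: " + c);
                }
                digit = c - 'a' + 1;
            }
            value = value * BASE + digit;
        }
        return (double) value; // exact, because value < 2^53
    }

    public static void main(String[] args) {
        for (String n : new String[] {"ann", "anna", "bob"}) {
            System.out.println(n + " -> " + (long) score(n));
        }
    }
}
```

With a 26-letter alphabet the exact-precision cap works out to 11 characters; a wider alphabet (mixed case, digits, punctuation) leaves room for fewer, which is the same kind of ceiling the question ran into.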
Instead, here's an approach I think is simpler: use sorted-set members of the form <name>:<id> and give every member a score of 0. Members with equal scores are ordered lexicographically, so you can use the lexicographical commands (e.g. ZRANGEBYLEX) and something like split(':') to get the id back from each member.
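A minimal sketch of the <name>:<id> idea in plain Java, using a TreeSet<String> as a stand-in for a sorted set whose members all have score 0 (no Redis client is involved; the class and method names are hypothetical). Redis orders equal-score members byte-wise, which for ASCII names matches String.compareTo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class LexMembers {
    // Build the member string "<name>:<id>"; in Redis each member would be
    // added with score 0 so the set orders them lexicographically.
    public static String member(String name, String id) {
        return name + ":" + id;
    }

    // Recover the id by splitting on the LAST ':' so a name containing ':'
    // still works, assuming ids themselves never contain ':'.
    public static String idOf(String member) {
        return member.substring(member.lastIndexOf(':') + 1);
    }

    // TreeSet<String> plays the role of the sorted set here.
    public static List<String> idsInNameOrder(List<String[]> nameIdPairs) {
        TreeSet<String> zset = new TreeSet<>();
        for (String[] pair : nameIdPairs) {
            zset.add(member(pair[0], pair[1]));
        }
        List<String> ids = new ArrayList<>();
        for (String m : zset) {
            ids.add(idOf(m));
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String[]> pairs = List.of(
                new String[] {"carol", "3"}, new String[] {"alice", "7"}, new String[] {"bob", "1"});
        System.out.println(idsInNameOrder(pairs));
    }
}
```

One caveat with ':' as the separator: characters that sort below ':' (space, digits) can reorder names that share a prefix; for example "ann2:9" sorts before "ann:5" even though "ann" < "ann2". If names can contain such characters, a separator that sorts below all of them avoids this.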

Lucene not giving results when specifying field

I have a database which I have indexed in Lucene (using PyLucene) by section (specified by markup in the document), using Lucene fields. This index seems to work fine: I can search it using the default field, which is simply the entire document, and get reasonable results.
The problem is, when I search it using a specific section (not the default), I expect to get a certain number of results back (as specified by IndexSearcher.search(query, results)), but instead it might simply return nothing. So my question is: how can I get it to return a ranked list with the number of results I specify?
The only place I specify the field is in the QueryParser, by calling:
QueryParser(Version.LUCENE_CURRENT, field, StandardAnalyzer(Version.LUCENE_CURRENT))
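I would verify the index using Luke (something I do often when modifying my indexing strategy) to check that the section fields actually exist under the names you query and contain the terms you expect.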

How to sort by Lucene.Net field and ignore common stop words such as 'a' and 'the'?

I've found how to sort query results by a given field in a Lucene.Net index instead of by score; all it takes is a field that is indexed but not tokenized. However, what I haven't been able to figure out is how to sort that field while ignoring stop words such as "a" and "the", so that the following book titles, for example, would sort in ascending order like so:
The Cat in the Hat
Horton Hears a Who
Is such a thing possible, and if yes, how?
I'm using Lucene.Net 2.3.1.2.
I wrap the results returned by Lucene in my own collection of custom objects. Then I can populate it with extra context information (and use things like the highlighter class to pull out a snippet of the matches), plus add paging. If you took a similar route, you could create a "result" class, add something like a SortBy property, grab whichever field you want to sort by, strip out any stop words, and save the result in this property. Then just sort the collection on that property instead.
When you create your index, create a field that only contains the words you wish to sort on, then when retrieving, sort on that field but display the full title.
It's been a while since I used Lucene but my guess would be to add an extra field for sorting and storing the value in there with the stop words already stripped. You can probably use the same analyzers to generate this value.
There seems to be a catch-22 in that you must tokenize a field with an analyzer in order to strip punctuation and stop words, but you can't sort on tokenized fields. How then to strip the stop words without tokenizing?
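A minimal sketch of the extra sort-field value suggested above, in Java rather than C# (the logic ports directly to Lucene.Net): lowercase the title, drop stop words, and store the result in a separate untokenized field while displaying the original title. The three-word stop list is a hypothetical placeholder; in practice you would reuse the analyzer's list. Because the key is computed in application code before the field is added, it also sidesteps the catch-22 just mentioned: nothing has to be tokenized by Lucene.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SortKeys {
    // Hypothetical, deliberately tiny stop-word list.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the");

    // Lowercase the title, drop stop words, and re-join with single spaces.
    // This is the value you would put in the untokenized sort field.
    public static String sortKey(String title) {
        List<String> kept = new ArrayList<>();
        for (String word : title.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                kept.add(word);
            }
        }
        return String.join(" ", kept);
    }

    // Sort display titles by their stripped keys (the same approach works for
    // a SortBy property on a wrapped result collection).
    public static List<String> sortTitles(List<String> titles) {
        List<String> sorted = new ArrayList<>(titles);
        sorted.sort(Comparator.comparing(SortKeys::sortKey));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortTitles(List.of("Horton Hears a Who", "The Cat in the Hat")));
    }
}
```

Sorting on the raw titles would put "Horton Hears a Who" first, because "horton" < "the"; sorting on the stripped keys ("cat in hat" vs "horton hears who") gives the order the question asks for.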
For searching, I found the "search lucene .net index with sort option" link helpful for solving your problem.