I have the Apache Solr 6.x-1.0-rc3 module installed on my site and it works fine. I would like to know how the Apache Solr search results are ordered. I have tried a few things and concluded that the order is neither alphabetical nor based on the most recently updated node.
How are the search results ordered? I mean, in what order or by what logic?
Basically, it's scored based on the strength of the match with a few boosting factors added. There's a great breakdown on the algorithm here:
http://www.supermind.org/blog/378/lucene-scoring-for-dummies
Yeah, Solr results are sorted by relevance score by default. The fields within a single result should be in the same order in which they were committed (at least within multi-valued fields; it should be true for all fields, though), so you should be able to preserve some field sorting that you might do in the processing script. If you just show all results (so that every document gets the same score), I think they come back in the order they were added to the index.
The Solr Wiki page (http://wiki.apache.org/solr/CommonQueryParameters) says the default sort order is score desc.
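To make that default visible, the sort can be spelled out in the request. Below is a minimal sketch that builds such a request URL in Python. The host, the core name `mycore` and the query text are assumptions for illustration; `q`, `sort`, `fl` and `rows` are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host and core name to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"

def build_query_url(text, sort="score desc", rows=10):
    """Return a select URL that asks for results in the given sort order."""
    params = {
        "q": text,
        "sort": sort,      # "score desc" is what Solr applies when sort is omitted
        "fl": "*,score",   # return the relevance score with each result
        "rows": rows,
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)

url = build_query_url("apache solr")
print(url)
```

Requesting `score` in `fl` is a convenient way to see why one result ranks above another.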
Related
Let's say I have Apache Solr index, with posts and comments. They are connected via post_id. How Can I query for MoreLikeThis posts with more than 1 comment?
You have to customize the Apache Solr search results, or simply write a hook / function for the search results that lowers the ranking of a result. Some ideas are presented here and here
I am implementing a search engine using Apache Solr. I want to improve results on the basis of the most frequent searches. For example, consider that my index has 5 words: Down (99), Drawn (46), Dark (86), Dull (75), Dirty (63).
The numbers show how many times users have searched for each word.
When the next user comes in and types D, the response should be in descending order of previous search frequency: Down, Dark, Dull, Dirty, Drawn.
The results will change over time, because a word's search frequency changes after every search. How can I implement this in Solr? Any help would be much appreciated. Thanks in advance.
Regards A.S.Danyal
As vinod writes, you'll have to keep track of actual searches yourself - there is nothing built-in to Solr to handle this for you. However, when you DO have the search statistics available, you can implement the feature by having a separate collection / core with searches and their popularity that you search against. Each document would be a search term and the frequency of how often that document is searched, i.e. document: search, search_count.
You can also pass search_count through a logarithmic function so that it influences the score of the search terms without dominating it, for example if you have more than just the search text as a field that influences the score (such as active category, etc.).
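As a rough illustration of the logarithmic damping, the sketch below combines a text relevance score with `log10(1 + search_count)` outside of Solr. The function name `combined_score` and the `weight` parameter are assumptions made for this example, not part of Solr; inside Solr the same idea would typically be expressed as a function query along the lines of `log(sum(search_count,1))`.

```python
import math

def combined_score(text_score, search_count, weight=1.0):
    """Boost a relevance score by a log-damped popularity count."""
    # log10(1 + n) grows slowly, so a term searched 1000 times is not
    # treated as 1000 times more relevant than a term searched once.
    return text_score * (1.0 + weight * math.log10(1 + search_count))

popular = combined_score(1.0, 99)   # 1 + log10(100)
rare = combined_score(1.0, 9)       # 1 + log10(10)
print(popular, rare)
```

With this damping, an elevenfold difference in search count (99 versus 9) changes the final score by a factor of only 1.5.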
Depending on search volume, you probably don't need to update these values after every single search; updating them once a day or every other hour will usually be good enough. Keep track of the terms whose search volume has changed since the last update, and update those documents in a batch job at regular intervals.
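The approach described above can be sketched with an in-memory stand-in for the separate collection of (search, search_count) documents. The class and method names (`SuggestionIndex`, `record_search`, `flush`, `suggest`) are hypothetical; in a real setup `flush` would send a batch update to Solr and `suggest` would be a prefix query sorted by `search_count`.

```python
from collections import Counter

class SuggestionIndex:
    """In-memory stand-in for a core of (search, search_count) documents."""

    def __init__(self, counts=None):
        self.counts = Counter(counts or {})  # counts already "in the index"
        self.pending = Counter()             # searches seen since the last batch

    def record_search(self, term):
        # Cheap bookkeeping per search; the index itself is not touched here.
        self.pending[term] += 1

    def flush(self):
        # Batch job: apply only the terms whose search volume changed.
        changed = list(self.pending)
        self.counts.update(self.pending)
        self.pending.clear()
        return changed

    def suggest(self, prefix, limit=10):
        # Prefix match, most frequently searched term first.
        prefix = prefix.lower()
        matches = [t for t in self.counts if t.lower().startswith(prefix)]
        matches.sort(key=lambda t: (-self.counts[t], t))
        return matches[:limit]

index = SuggestionIndex({"Down": 99, "Drawn": 46, "Dark": 86, "Dull": 75, "Dirty": 63})
print(index.suggest("D"))
```

Searches recorded with `record_search` only affect the ordering after the next `flush`, which mirrors the periodic batch update.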
Solr doesn't provide this kind of feature.
One way to achieve this is by using logs: you will need an index of the search terms that users have entered, which can be built by mining your search logs.
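Mining the logs can be as simple as counting terms. The sketch below assumes a made-up log format of `<timestamp> q=<term>` per line; the real format depends on how your application or servlet container logs requests, so the parsing step would need to be adapted.

```python
from collections import Counter

def count_search_terms(log_lines):
    """Count how often each search term appears in the given log lines."""
    counts = Counter()
    for line in log_lines:
        # Assumed format: "<timestamp> q=<term>"; lines without a query are skipped.
        _, sep, term = line.partition(" q=")
        term = term.strip().lower()
        if sep and term:
            counts[term] += 1
    return counts

sample_log = [
    "2015-03-01T10:00:00 q=down",
    "2015-03-01T10:00:05 q=Dark",
    "2015-03-01T10:00:09 q=down",
    "2015-03-01T10:00:12 ping",
]
term_counts = count_search_terms(sample_log)
print(term_counts.most_common())
```

The resulting counts are what you would then index as the search terms and their frequencies.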
I crawled 3 domains with Nutch (domain01, domain02 and domain03).
I want to get all posts that contain a specific keyword (e.g. "champions league"), and then show the posts from domain02 first in the results, the posts from domain01 next, and the posts from domain03 last. In short, I want to sort them by domain priority.
Is there a way to set the priority of domains?
If you always have the same order of domains, then you can use either index time document level boost or query time sort by domain (or domainorder) then by score.
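The "sort by domain order, then by score" idea can be sketched client-side as follows. The result dictionaries and the `DOMAIN_ORDER` mapping are assumptions for illustration. On the Solr side, if each document carried an indexed `domainorder` field as suggested above, the equivalent request parameter would be `sort=domainorder asc, score desc`.

```python
# Lower number = higher priority; the order follows the question (02, 01, 03).
DOMAIN_ORDER = {"domain02": 0, "domain01": 1, "domain03": 2}

def sort_by_domain_then_score(results):
    """Order results by fixed domain priority, then by descending score."""
    def key(result):
        # Unknown domains sort after all known ones.
        rank = DOMAIN_ORDER.get(result["domain"], len(DOMAIN_ORDER))
        return (rank, -result["score"])
    return sorted(results, key=key)

hits = [
    {"id": "a", "domain": "domain01", "score": 3.2},
    {"id": "b", "domain": "domain03", "score": 9.9},
    {"id": "c", "domain": "domain02", "score": 1.1},
    {"id": "d", "domain": "domain02", "score": 4.0},
]
ordered_ids = [h["id"] for h in sort_by_domain_then_score(hits)]
print(ordered_ids)
```

Note that this is a strict ordering: a weak match from domain02 still precedes a strong match from domain03, which is different from a boost that only nudges the score.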
If the domain order depends on the query, you can use QueryElevationComponent, though I think you then have to provide the full list of IDs for each elevation rule, and it may not support a specific sequence.
You could also write your own Custom Function Query or component (similar to Query Elevation one).
I plan to build something like pricegrabber.com/google product search.
Assume I already have the data available in a huge table. I plan to submit all of it to Solr. This solves the problem of search. However, I am not sure how to do the comparison. I could run a GROUP BY query (on UPC/SKU) on the DB for the products returned by Solr, but I don't want to do that. I want to somehow get the product comparison data returned to me along with the search results from Solr itself.
What do you think my schema should look like? Do you think this use case can be solved entirely by Solr/Sphinx?
You need 'result grouping' or 'field collapsing' support to properly handle it.
In Solr, the feature is not available in any release version and is still under development. If you are willing to use an unreleased version of Solr, then get the details here.
Sphinx supports result grouping and I had used it a long time ago in a similar project. You can get more details here.
An alternative strategy could be to preprocess your data so that only a single record per UPC/SKU gets inserted in the index. Each record can have a separate field containing the ids of all the items with the same UPC/SKU.
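A minimal sketch of that preprocessing step is shown below. The field names (`upc`, `id`, `name`, `price`, `item_ids`, `min_price`) are assumptions for illustration; the real schema would depend on your product table.

```python
from collections import defaultdict

def collapse_by_upc(items):
    """Build one index record per UPC that lists the ids of all its items."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["upc"]].append(item)

    records = []
    for upc, members in grouped.items():
        records.append({
            "upc": upc,
            "name": members[0]["name"],               # representative title
            "item_ids": [m["id"] for m in members],   # all offers for this product
            "min_price": min(m["price"] for m in members),
        })
    return records

rows = [
    {"id": 1, "upc": "0001", "name": "Camera X", "price": 199.0},
    {"id": 2, "upc": "0002", "name": "Lens Y", "price": 89.0},
    {"id": 3, "upc": "0001", "name": "Camera X (store B)", "price": 185.0},
]
records = collapse_by_upc(rows)
print(records)
```

Each search hit then carries the ids needed to fetch the comparison rows, so no grouping is required at query time.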
Doing a database GROUP BY on the products returned by Solr may not be enough. For example, if products A and B have the same UPC and a certain query matches A but not B, then you will not get both A and B in your result set.
I have a StackOverflow-like system where content is organised into threads, each thread having content of its own (the question body / text), and posts / replies.
I'm building the ability to search this content via Lucene, and I have decided that, if possible, I would like to index individual posts rather than entire threads (it makes the index easier to update, and gives me more control and ability to tweak the results). The problem, however, is that I want the search to display a list of threads rather than a list of posts.
How can I get Lucene to return only unique threads as results, while also searching the content of the posts?
Each document can have a "threadId" field. After running a search, you can loop through your result set and return all the unique threadId values.
The tricky part is specifying how many results you want to return. If you want to show say, 10 results on your results page, you'll probably need Lucene to return 10 + m results, since a certain percentage of the return set will be de-duped out, because they are posts belonging to the same thread. You'll need to incorporate some extra logic that will run another Lucene search if the deduped set is < 10.
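The over-fetch-and-retry logic can be sketched as follows. The `search(offset, limit)` callable is a hypothetical stand-in for a Lucene search that returns hits in rank order, each with a `threadId`; it is not a Lucene API.

```python
def unique_threads(search, wanted=10, batch=20):
    """Collect up to `wanted` distinct thread ids, fetching more hits as needed."""
    seen, ordered, offset = set(), [], 0
    while len(ordered) < wanted:
        hits = search(offset, batch)
        if not hits:
            break  # index exhausted before reaching `wanted` threads
        for hit in hits:
            thread_id = hit["threadId"]
            if thread_id not in seen:
                seen.add(thread_id)
                ordered.append(thread_id)
                if len(ordered) == wanted:
                    break
        offset += len(hits)
    return ordered

# Fake ranked hit list: 10 threads with 3 matching posts each.
fake_hits = [{"threadId": i // 3} for i in range(30)]

def fake_search(offset, limit):
    return fake_hits[offset:offset + limit]

top_threads = unique_threads(fake_search, wanted=5, batch=4)
print(top_threads)
```

Because hits arrive in rank order, each thread is placed at the position of its best-scoring post.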
This is what the Nutch project does when collapsing multiple search results that belong to the same domain.
When you index the threads, you should break each thread into posts and make each post a Document with a field containing a unique id identifying the thread to which it belongs.
When you implement the search, I would recommend using Lucene 2.9 or later, which enables you to use a Collector. Collectors let you process the retrieved documents as they are collected, so you will be able to group together posts that originate from the same thread id.
Just for completeness, the latest Lucene versions (from 3.2 onwards) support a grouping API that is very useful for this kind of use case:
http://lucene.apache.org/java/3_2_0/api/contrib-grouping/org/apache/lucene/search/grouping/package-summary.html