Apache SOLR search by category - apache

I am using apache-solr-1.4.1 and jdk1.6.0_14.
I have the following scenario.
I have 3 categories of data indexed in SOLR i.e. CITIES, STATES, COUNTRIES.
When I query data from SOLR I need the search result from SOLR based on the following criteria:
In a single query to SOLR I need data fetched from SOLR grouped by each category with a predefined results count for each category.
How can I specify this condition in SOLR?
I have tried to use SOLR Field Collapsing feature, but I am not able to get the desired output from SOLR.
Please suggest.

My solution is not exactly what you have asked but is my take on what SOLR does best, which is full text search. Instead of grouping the results by "category", I'd suggest you order the results by relevance score but also provide a facet count for the category values. In my experience users expect a "search" to behave like Google, with the best matches at the top. Deviating form this norm confuses the user in most cases.
If you want exactly as you have asked (actual results grouped by category) then you could use a relational database and do a group_by or write a custom function query with SOLR (I cannot advise on this as I've never done it).
More info: index the data with the appropriate fields, e.g. name, population, etc. But also add a field called "category", which would have a value of either CITIES, STATES or COUNTRIES. Then perform a standard SOLR search, which will return results in order of relevance - i.e. best matches at the top. As part of the request, you can specify a facet.field=category, which will return counts for the search results for each of the given categories (in the "facet" results section). In the UI you can then create links for each category facet which performs the original search plus &fq=category:CITIES, etc., thus restricting results to just that category. See the facetting overview on the SOLR wiki for more info.

Related

How to re-rank documents based on their attributes rather than just their field relevance?

I'm trying to use Solr to re-rank document results based relevance to the user searching. For example, if I search joann*this could return documents where the Name field is anything from joanna to joanne. What I'm trying to do is to return documents that match on certain attributes that I have as well-- this could be something like us both having the field Location = "NYC".
So my question is two fold- is there a way to grab and handle a users information when they are making a query and also is there a way to re-rank based on these additional field values? Would this look more like writing some code or just an expanded query?
it looks to me like you are talking about functionality that Query Reranking exactly provides. Did you check that out?

Build a Kibana Histogram with buckets dynamically created by ElasticSearch terms aggregation

I want to be able to combine the functionality of the Kibana Terms Graph (be able to create buckets based on uniqueness of values from a particular attribute) and Histogram Graph (separate data into buckets based on queries and then illustrate the date based on time).
Overall, I want to create a Histogram, but I only want to create the Histogram based on the results of one query, not multiple queries like it's being done in the Kibana demo app. Instead, I want each bucket to be dynamically created per unique value of my particular field. For example, consider the following data returned by my query:
{"myValueType": "New York"}
{"myValueType": "New York"}
{"myValueType": "New York"}
{"myValueType": "San Francisco"}
{"myValueType": "San Francisco"}
Also assume that each record has a timestamp field for separating histogram data by date. For that particular date, I want the data to be communicated as a count of 3 into the New York bucket and a count of 2 into the San Francisco bucket. However, I am only able to show a count of 5 for my one linked query. When I configure the Histogram, I am able to specify a field to use for my timestamp, but not to create buckets from. I could've sent a field to compute a total/min/max/mean, but this field would've had to be numeric, so that is not the solution either.
If I were to use a Term Graph to create a pie or bar graph, I am indeed able to separate my data into buckets based on the unique values of my specified field (in this case, "myValueType"), but this would total up the data for all-time, not split up the data by timestamp. Although this is good information to know, it is not ideal because I wouldn't be able to detect trends in my data.
I am looking for a solution that will do one of the following:
Let me dynamically create queries in my Kibana dash board to create "buckets" in a Histogram
Allow me to run an ElasticSearch Terms Aggregation to supposidly split up my data into buckets based on "myValueType" and integrate these results into my Histogram
Customize the JSON of my dashboard, but this doesn't look possible to me
Create my own custom panel, but this is not desirable
Link a Kibana "TopN" query in Kibana. Actually, this has proven to be a work-around for my problem because the TopN query dynamically created one query per unique value/term from the specified fieldName. However, the problem is that I can only link one colour to this TopN query and each unique term will be placed in a bucket that uses a different shade of the colour. Ideally, every bucket in my Histogram will have a completely different colour associated to it. Imagine how difficult it will be to distinguish unique terms as the number of buckets grows.
If all else fails, I make one query per unique value from my search field. This will allow me to have one unique colour per bucket, but as the number of unique terms in the "myValueType" field changes, I need to keep adding/removing queries from Kibana, which can get quite messy.
I'm sure there is someting that I am missing here. Please help me out. Many thanks.
A highly related SOF question: Is it Possible to Use Histogram Facet or Its Curl Response in Kibana
This would be a great feature. It looks like it will be supported in Kibana4, but there doesn't seem to be much more info out there than that.
For reference: https://github.com/elasticsearch/kibana/issues/1249
Maybe a little late but it is actually possible in the newest BETA release.
kibana 4 beta 3 installation download

Solr/Lucene result field term count

I am using solr to do a search. As result I get back a set of fields. One of the fields is "domains". The domain field is a many to many relationship in my database, so my docs contain an array of "domains" the are linked to.
What I want to do is, for each domain in the resultset, count how many times this "domain term" is found in the global result set.
How should I do this ?
You need to look at the Field collapsing feature.

Solr: Search in multiple fields BUT STOP if documents match was found

I want to search in multiple fields in Solr.
(In know the concept of the copy-fields and I know the (e)dismax search handler.)
So I have an orderd list of fields, I want the terms to be searched against.
1.) SKU
2.) Name
3.) Description
4.) Summary
and so on.
Now, when the query matches a term, let's say in the SKU field, I want this match and no further searches in the proceeding fields.
Only, if there are NO matches at all in the first field (SKU field), the second field (in this case "name") should be used and so on.
Is this possible with Solr?
Do I have to implement my own Lucene Search Handler for this?
Any advice is welcome!
Thank you,
Bernhard
I think your case requires executing 4 different searches. If you implement you very own SearchHandler you could avoid penalty of search result accumulation in 4 different request. Which means, you would send one query, and custom SearchHandler would execute 4 searches and prepare one result set.
If my guess is right you want to rank the results based on the order of the fields. If so then you can just use standard query like
q=sku:(query)^4 OR name:(query)^3 OR description:(query)^2 OR summary:(query)
this will rank the results by the order of the fields.
Hope is helps.

Show hitted documents in the same series together in Lucene

The are some articles are written in several parts,
for example, I got those articles from IBM developer works:
Distributed data processing with
Hadoop, Part 1:Getting started
Distributed data processing with
Hadoop, Part 2:Going further
Distributed data processing with
Hadoop, Part 3: Application
development
I will index those three articles separately. And some one search certain keywords, it is possible the part3 is on the top of hit whle part1 is on the 32th. Therefor, if I list results page by page, the part1 and part3 will display on different page.
How can I make sure the hitted documents in the same series displayed together?
I guess in SQL, we can use "group by".
I believe what you are asking for is Field Collapsing, which is currently a trunk feature in Solr, and will be incorporated into the next Solr version.
If you want to roll your own, One possible way to do this is:
Add a "series id" field to each document that is a member of a series. You will have to ensure that this gets incremented for every new series.
Make an initial query to Lucene, and get a hit list.
For each hit, check to see if it has a series id; If it does, make another query by the series id in order to retrieve all the members of the series.
An alternative is to store the ids of all the series members in a field inside each member's document.