Solr Range Facets: dynamically modifying the ranges based on the search query

So here is what I am trying to do.
I have a facet defined on the price field.
1) When there is a query for "wordA"
I want the facets to be from 0-1000 divided in 5 intervals since my maximum price for "wordA" will never exceed 1000
2) When there is a query for "wordB"
I want the facets to be from 0-50 divided in 5 intervals since my maximum price for the query "wordB" will never exceed 50.
So basically I want the facet range to change dynamically so that I don't end up with a range of 0-1000 for a query "wordB" where all the hits will lie in the 1st range.
If Solr does not support this, a lot of post-processing will be needed to modify the ranges based on the returned results.

Solr does not support that, but you can do it yourself with two queries: first get the stats (min, max) for the field, then submit a second query with facet ranges (intervals) that are suitable for your application.
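The two-query approach could be sketched roughly as follows. This is only an illustration in Python: the helper names are made up, the field is assumed to be called `price`, the lower bound is assumed to be 0, and an integer gap is assumed to be acceptable. The parameter names (`stats`, `stats.field`, `facet.range`, `facet.range.start`, `facet.range.end`, `facet.range.gap`) are the standard Solr request parameters; sending the requests is left out.

```python
import math

def stats_params(query, field="price"):
    """First request: ask Solr for statistics (min, max, ...) on the field."""
    return {"q": query, "rows": 0, "stats": "true", "stats.field": field}

def max_from_stats(response, field="price"):
    """Read the maximum from a JSON stats response (parsed into a dict)."""
    return response["stats"]["stats_fields"][field]["max"]

def range_facet_params(query, max_value, field="price", intervals=5):
    """Second request: a range facet from 0 up to the observed maximum."""
    # Round the gap up so that `intervals` buckets always cover max_value.
    gap = max(1, math.ceil(max_value / intervals))
    return {
        "q": query,
        "facet": "true",
        "facet.range": field,
        "facet.range.start": 0,
        "facet.range.end": gap * intervals,
        "facet.range.gap": gap,
    }
```

With a maximum of 1000 this yields a gap of 200, and with a maximum of 50 a gap of 10, which matches the two cases in the question.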

Related

Elasticsearch Lucene query to search for text and numbers

I am using Kibana version v7.13.2.
I am trying to search for messages carrying the phrase "amount":"3000","taxRate"
where the amount value ranges from 3000 to 3999.
I am using the following Lucene Query:
message:amount /3[0-9]{3}/ taxRate
The query works and fetches the following:
"amount":"3100","taxRate"
"amount":"200","taxRate"
"amount":"3983","taxRate"
"amount":"3100"
"taxRate"
"amount"
"3100","taxRate"
"amount":"3100","trxFee":"90","taxRate"
This means the query behaves as an OR condition: it fetches a message if it contains the word amount OR a number in the range 3000-3999 OR the word "taxRate".
I want to modify the query so that it matches only messages where all three patterns appear together, in that specific order: first amount, then the amount value, and finally the word taxRate.
So with the correct query, only the following results would appear:
"amount":"3100","taxRate"
"amount":"3983","taxRate"
Kindly help.
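To pin down the intended matching rule, here is a minimal sketch in Python that applies a regular expression to the raw message text. It is not a Kibana or Lucene query, and it assumes the three parts appear exactly as in the examples above (same quoting, no whitespace between them); the names `PATTERN` and `matches` are purely illustrative.

```python
import re

# Literal "amount", then a four-digit value starting with 3,
# then literal "taxRate", all adjacent and in that order.
PATTERN = re.compile(r'"amount":"3[0-9]{3}","taxRate"')

def matches(message):
    """Return True if the message contains the full phrase in order."""
    return PATTERN.search(message) is not None
```

Applied to the sample messages, only the two lines with an amount between 3000 and 3999 immediately followed by taxRate pass this check.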

BigQuery Google Analytics sessionsWithEvent metric

I'm having trouble creating a BigQuery query that will allow me to fetch the Google Analytics ga:sessionsWithEvent metric.
This is what I tried:
SELECT
EXACT_COUNT_DISTINCT(concat(fullvisitorid, string(visitid))) AS distinctVisitIds
FROM
(TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2016-11-30'), TIMESTAMP('2016-12-26')))
WHERE
hits.type='EVENT'
The logic in the query above seems sound: get all the rows that have a hits.type of 'EVENT' and take the exact count of distinct fullVisitorId/visitId combinations, i.e. the number of unique sessions with an event.
But the numbers I get from this are close to, but higher than, what I get using the Query Explorer.
Thank you.
EDIT: Addressing comment below to use wider date range with date filter
Widening the table date range by 5 days on each side and filtering on the date column gives this query:
SELECT
EXACT_COUNT_DISTINCT(concat(fullvisitorid, string(visitid))) AS distinctVisitIds
FROM
(TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2016-11-25'), TIMESTAMP('2016-12-31')))
WHERE
hits.type='EVENT'
AND ('20161130'<=date AND date<='20161226')
Unfortunately, I still get the same number.
Don't rely on the table dates; tables for later days can still contain data for previous days. Instead, use a wider date range in the FROM clause and filter on the exact date range using the date column.
AFAIK the Query Explorer also does approximations.

Dynamically filtering large query result for presentation in SSRS

We have a system that records data to an SQL Server DB captured from field equipment every minute. This data is used for a number of purposes, one of which is for charting in reports via SSRS.
The issue is that with such a high volume of data, when a report is run for a period of, for example, 3 months, the amount of data returned causes excessive report rendering times.
I've been thinking of finding a way of dynamically reducing the amount of data returned, based on the start and end time periods chosen. Something along the lines of a sliding scale where from the duration between the start and end period, I can apply different levels of filtering so that where larger periods are chosen, more filtering occurs while for smaller periods less or no filtering occurs.
There is still a need to be able to produce higher resolution (as in more data points returned) reports for troubleshooting purposes.
For example:
Scenario 1:
User is executing a report for a period of 3 months. Result set returned by the query is reduced for performance reasons without adversely affecting what information the user wants to see (the chart is still representative of the changes over time).
Scenario 2:
User executes the report for a period of 1 hour, in order to look for potential indicator(s) of problems with field devices while troubleshooting the system. For this short time period, no filtering is applied.
My first thought was to use a modulo operation on the primary key of the data (which is an identity field), whereby the divisor is chosen depending on the difference between the start and end dates.
For example, if the difference between the start and end dates of the report period is 5 weeks, choose a divisor of 5, apply the modulo to the PK, and select the rows where the result is equal to zero.
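The idea could be sketched as follows. This is a rough illustration in Python rather than T-SQL; the rule "one divisor step per whole week" is only an assumption taken from the 5-week example, and the function names and the `id` key are hypothetical.

```python
from datetime import datetime, timedelta

def choose_divisor(start, end):
    """Pick the modulo divisor from the report duration.

    One step per whole week, with a minimum of 1 so that short
    periods (such as 1 hour) are not filtered at all.
    """
    whole_weeks = (end - start) // timedelta(weeks=1)
    return max(1, whole_weeks)

def thin_rows(rows, divisor):
    """Keep only rows whose primary key is a multiple of the divisor."""
    return [row for row in rows if row["id"] % divisor == 0]
```

Note that this only thins the data evenly if the identity values are contiguous and in time order for the device being charted; gaps or interleaved devices would make the sampling uneven.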
I would love to get feedback as to whether this sounds like a valid approach or whether there is a better way to do this.
Thanks.

MongoDB infinite scroll sorted results

I am having a problem trying to achieve the following:
I'd like to have a page with 'infinite' scrolling functionality, with all the fetched results sorted by certain attributes. The way the code currently works is that it runs the query, sorts the results, and displays them. The problem is that once the user reaches the bottom of the page and a new query is run, the results from that query are sorted only within their own batch. That is, if there are 100 results in total and the first query displays only 50, those 50 are sorted; but the next query (for the next 50) sorts only those 50 results, not all 100.
So, do I have to fetch all the results at once, sort them, and then apply some pagination logic, or is there a way for MongoDB to support infinite scrolling (AJAX requests) with sorting applied across all the results?
There are a few ways to do this with MongoDB. You can use the .skip() and .limit() commands (documented here: http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-CursorMethods) to apply pagination to the query.
Alternatively, you could add a clause to your query like: {sorted_field : {$gt : <value from last record>}}. In other words, filter out matches of the query whose sorted value is less than or equal to that of the last item on the current page of results. For example, if page 1 of results returns documents A through D, then to retrieve page 2 you repeat the same query, with the same sort, plus the additional filter x > D.
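A minimal sketch of the second approach in Python, assuming a pymongo-style collection object (anything with `find()` returning a cursor that supports `sort()` and `limit()`). The function names are hypothetical, and the sketch assumes an ascending sort on a field whose values are unique; with duplicate values, documents tied with the last one on a page would be skipped.

```python
def next_page_filter(base_filter, sort_field, last_value=None):
    """Return the base filter extended with a $gt clause on the sort field.

    Any existing condition on sort_field in base_filter is replaced.
    """
    query = dict(base_filter)
    if last_value is not None:
        query[sort_field] = {"$gt": last_value}
    return query

def fetch_page(collection, base_filter, sort_field, page_size, last_value=None):
    """Fetch one page in ascending order of sort_field."""
    cursor = collection.find(next_page_filter(base_filter, sort_field, last_value))
    return list(cursor.sort(sort_field, 1).limit(page_size))
```

The client keeps the sort value of the last document it received and passes it as `last_value` when the user scrolls to the bottom of the page.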
Let me preface this by saying that I have no experience with MongoDB (though I am aware that it is a NoSQL database).
This question, however, is somewhat of a general database one (you'd probably get more responses tagging it as such). I've implemented such a feature using Cassandra (another, albeit quite different, NoSQL database); however, the same principles apply.
Use the sorted-by attribute of the last retrieved record, and conduct a range search based on it in the database. So, assuming your database consists of the following set of letters:
A
B
C
D
E
F
G
..and you were retrieving 2 letters at a time, you'd retrieve A, B first. When more records are needed, you'd use B to conduct a range search on the set of letters in the database. In plain English this would be something like:
Get the letters that appear after B, limit the results to 2
From a brief look at the MongoDB tutorial, it looks like you have conditional operators to help you implement this.
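The letters example can be simulated in memory to show the principle. This is only a sketch in Python over a sorted list, not database code; `page_after` is a hypothetical name, and a real database would do the range search using an index on the sorted attribute.

```python
from bisect import bisect_right

def page_after(sorted_items, last_seen, limit):
    """Return up to `limit` items that come strictly after `last_seen`."""
    # For the first page there is no last item, so start at the beginning.
    start = 0 if last_seen is None else bisect_right(sorted_items, last_seen)
    return sorted_items[start:start + limit]

letters = ["A", "B", "C", "D", "E", "F", "G"]
first_page = page_after(letters, None, 2)          # A, B
second_page = page_after(letters, first_page[-1], 2)  # C, D
```

Each request depends only on the last item already shown, so the overall order stays consistent across pages.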

How to get total match count in Solr/lucene

I want to get the total count of matches for a search in Solr.
But when I perform a search using Solr, I have to set the max rows parameter. Can anybody explain how I could get the total matched count using Solr efficiently?
You can get the total result count, independently from max rows defined, through the numFound attribute in the Solr response.
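As a minimal sketch, assuming the response is requested as JSON and already parsed into a Python dict, the count can be read as shown below. The helper names and the sample response are illustrative only. Setting `rows` to 0 asks Solr to return no documents at all, so only the count comes back.

```python
def count_params(query):
    """Request parameters that return the match count but no documents."""
    return {"q": query, "rows": 0, "wt": "json"}

def total_matches(solr_response):
    """Read the total number of matches from a parsed JSON response."""
    return solr_response["response"]["numFound"]

# Illustrative response shape; the numbers are made up.
sample_response = {"response": {"numFound": 1234, "start": 0, "docs": []}}
total = total_matches(sample_response)
```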