How to get the most searched words in Solr? [duplicate] - apache

I'm trying to organize a solr search engine. I've already set up the misspelling system and the suggestions.
However I can't seem to find how to retrieve the top 10 most searched words/terms/keywords in solr/lucene. How can I get this? I want to display those on my homepage.

Solr does not provide this kind of feature out of the box. There is the StatsComponent, that provides you with all kind of statistics, but all of those are numeric only.
Depending on how you access solr (directly or via your own app) you could intercept all calls an log the query string. I did this in a recent project where I logged a queries to a database. If you submit all keywords to an other core on your solr server, you can faceting queries on your search terms as described by Hyque

You could use a facet for retrieving the Top X words like this:
http://yourservergoeshere/solr/select?q=*&wt=xml&indent=true&facet=true&facet.query=*&facet.field=message&facet.limit=10&facet.minCount=1
The value of facet.field depends on the field you like to search in. With facet.limit you'll (obviously) limit the amount of results to 10. You'll find the facet results at the end of the results, starting with "facet_counts"
Edit: I really should go to bed earlier. I didn't see the "most searched" in your question. Sorry for that.

Apache Solr does not provide any such capability as of today. There is a desire for this and a JIRA ticket corresponding to it. You can vote for it if you'd like to see it in Solr some day: https://issues.apache.org/jira/browse/SOLR-10359.
The stats component provides information around statistics, but it's mostly numeric in nature. You could parse server logs and come up with a way to build a Frequently Searched Terms (e.g. pump those logs in SiLK or Kibana for visualization).
If you have the ability to change the front end and add some javascript code to the UI or can intercept the search request and make an async or batch calls to APIs for tracking, you can use SearchStax Analytics that provides Search Analytics that tracks searches, clicks, cart actions, revenue, etc.

Related

How to fetch results from an offset when the API doesn't support offset (HERE Maps API)

I have a search functionality that gets data from HERE API's Search endpoint. I maintain records of each search's results so I can add metadata that I need for my own purposes and also so I can provide results without always going back to HERE API. The problem I have is with paginating, specifically with providing a starting index when fetching results from HERE. Similar to how Algolia does it, I want to be able to search for a term and begin with the results at a certain index, the offset. HERE API apparently doesn't allow this at all. The closest it comes to such a feature is that it provides the URL for the next search, as described here. This is limited because it doesn't allow me to start the search results at a particular index that I specify. So essentially I want to know if there's a "standard" way of getting such functionality even when it's not provided by the API.
My own solution
The HERE API provides a size parameter that allows specifying the total number of results that I want, so I can specify a larger size than I need, and basically use code to start the results from my desired index. But this feels a bit hacky, and I wonder if there's a better/more established way of doing this.
Happy to listen to any ideas! Thanks. :)
Such a kind of an 'offset' for starting the paging after a specific number of results is indeed not supported by the Places API itself.
You have to set up a workaround within your application.

Kapow Robot - Extract business Operating hours from Google Search Results

Is it possible to create a Kapow Robot that can search Google for the Operating hours of the Businesses from our list/database and update the timings if changes are made?
Please share if there are any other more efficient ways than the KAPOW robot that can be implemented with minimal effort and cost-effectiveness.
That's what the Google Places API is there for. While you could in theory just open Google Maps in a Load Page action, enter the query string and then parse the results, I would advise against it. Here's why:
The API will be faster, returning results in a structured manner (JSON)
Kapow has actions for calling RESTful services and parsing/modifying JSON
Google does not like robots parsing their pages, and most likely will lock you out (i.e. present you with Captchas sooner or later)
If you decide to go for the API, here's what you should do:
Get your API key first, see this page for details: https://developers.google.com/places/web-service/get-api-key. Note that the free plan allows for 1,000 requests within a 24-hours limit (https://developers.google.com/places/web-service/usage)
Maintain the place ids for all the businesses you'd like to query regularly, and update your list.
For each place, retrieve the details as described in the API documentation. The opening hours will be within the JSON response: https://developers.google.com/places/web-service/details
Update your list. I'd recommend using a definite type in Kapow for that, and using the actions Store in Database and Query Database. In case you need the data elsewhere, you may create additional robots (e.g. for Excel files, sending data per email, et cetera).

Amazon API search results vs. Amazon.com search results

For our web app, which will use Amazon's API as a basis for some of the site's main interactions, we required the ability to do a generalized search of Amazon's products and return results based on relevancy. The expectation was that their API would work exactly like their actual site's search.
Unfortunately it does not. For instance, querying "joy of cooking" does not return a link to the famous cook book, but to some food processor. Contrarily, on the actual site, one would see the book isn't just first, but it and any derivations occupy the top 5 or so results.
Is there a way of getting this level of relevancy search from Amazon's API without specifying a node to browse through? We need to be able to search everything at once, and the API seems very limited on parameter sets.
The answer is that, if you use "All" as your sorting basis, rather than "Blended", you will get results that are inline with Amazon's own product search. Older docs don't seem to account for this discrepency, but testing both methods has shown "All" to be the preferred product sorting method.
http://docs.amazonwebservices.com/AWSECommerceService/2010-11-01/DG/
Pagesearch under "SearchIndex: All"
You don't get any item sorting options with this method, but if all you want is "most relevant" results, this is the preferred method.

Programmatic Querying of Google and Other Search Engines With Domain and Keywords

I'm trying to find out if there is a programmatic way to determine how far down in a search engine's search results my site shows up for given keywords. For example, my query would provide my domain name, and keywords, and the result would return a say 94 indicating that my site was the 94th result. I'm specifically interested in how to do this with google but also interested in Bing and Yahoo.
No.
There is no programmatic access to such data. People generally roll out their own version of such trackers. Get the Google search page and use regexes to find your position. But now different results are show in different geographies and results are personalize.
gl=us parameter will help you getting results from US, you can change geography accordingly to get the results.
Before creating this from scratch, you may want to save yourself some time (and money) by using a service that does exactly that [and more]: Ginzametrics.
They have a free plan (so you can test if it fits your requirements and check if it's really worth creating your own tool), an API and can even import data from Google Analytics.

Google Analytics retrieve custom variables statistics

Edit refurbished the question that was not clear
New to GA, I'm looking at the way to retrieve automatically custom variables data statistics
The query would have
a start and an end dates (possibly equal)
a variable name
For instance, a Page-level variable Brand takes only three possible values, that are set by the web server, and seen by the client.
The values are Apple, Google and Microsoft.
The query to Google-Analytics could be something like (pseudo-code), provided that I use an authentication token previously acquired
...getstatistics?myToken=123&variable=Brand&datefrom=20110121&dateto=20110121
And the result could be some xml like data
<variable>Brand</variable><value>Apple</value><count>3214</count>
<variable>Brand</variable><value>Google</value><count>4321</count>
<variable>Brand</variable><value>Microsoft</value><count>1345</count>
Meaning for instance that the page-level custom variable Brand was set to the value Apple by the web server (and thus seen by the client / sent to GA) 3214 times.
What is the correct way/protocol to query values/statistics from GA, in order to get statistics related to custom variables?
So, this is my understanding of what you're doing:
You're setting page-level custom variables (important technical note: these need to be called before the _trackPageview or some other call, else they won't be tracked.)
Your code might looks something like this:
_gaq.push(['_setCustomVar', 2, 'Brand', 3]);
Now, when querying the Google Analytics API, its important to note that the slot # is very important, since the slot you're accessing is explicitly named in the query.
So, to do this, you'd need to set your dimensions to ga:customVarName2 and ga:customVarValue2, and decide what metric you're interesting it getting. You mention Page views, so you'd use ga:pageviews. (You're by no means limited to pageviews. You can use any Metric besides a couple of the AdWords specific ones.)
This query would return you all of the custom variable from this slot, and the number of pageviews associated with them.
You also mentioned you'd want to be able to filter by value.
You'd do that by setting the filter value to something like ga:customVarValue2==Apple.
You can see what a query like that would look like here in the query explorer.
Here's a sample screenshot:
Finally, all Google Analytics API queries by default require you to set a date range, so you could query that on your own.
All you need to do is decide which library you want to use as interface, and you're set to go.
Google has a handy resource, called the Google Analytics Data Explorer that can help answer a lot of your questions by letting you experiment through an interface, as long as you login with your Google Analytics credentials.
As you add parameters using their tools, the system will automatically build your URL/Query.
If that's not enough, Google also has some Interactive Examples using JavaScript. Like the Data Explorer, you can also login with your Google Analytics credentials and run the examples to see what data would be returned.
These tools are awesome because they help take the guesswork out of figuring out how to target the exact data you're searching for.