Data Segregation in Amazon CloudSearch - amazon-cloudsearch

We have a SaaS service that hosts data from multiple clients. We are trying to use Amazon's CloudSearch to provide document search across all the hosted data.
One feature we are trying to make work is the "suggest" feature, where the user types a few letters and Amazon CloudSearch offers likely search terms to auto-complete.
However, I would like the suggest feature to only suggest search terms based on one particular client (or tenant, to be technically correct). I don't want the search suggestions to prompt a user from tenant A with keywords that are based on the documents from tenant Z. Is there a way to "segregate" the data in a single Search Domain, so that each client/tenant's data is tagged in such a way that the "suggest" feature only draws on the data from the matching client/tenant?
(I can use the API to filter the search results by tenant, but the suggest API doesn't have a way to apply a filter.)

Related

Kapow Robot - Extract business Operating hours from Google Search Results

Is it possible to create a Kapow Robot that can search Google for the Operating hours of the Businesses from our list/database and update the timings if changes are made?
Please share if there are any other more efficient ways than the KAPOW robot that can be implemented with minimal effort and cost-effectiveness.
That's what the Google Places API is there for. While you could in theory just open Google Maps in a Load Page action, enter the query string and then parse the results, I would advise against it. Here's why:
The API will be faster, returning results in a structured manner (JSON)
Kapow has actions for calling RESTful services and parsing/modifying JSON
Google does not like robots parsing their pages, and most likely will lock you out (i.e. present you with Captchas sooner or later)
If you decide to go for the API, here's what you should do:
Get your API key first, see this page for details: https://developers.google.com/places/web-service/get-api-key. Note that the free plan allows for 1,000 requests per 24-hour period (https://developers.google.com/places/web-service/usage)
Maintain the place ids for all the businesses you'd like to query regularly, and update your list.
For each place, retrieve the details as described in the API documentation. The opening hours will be within the JSON response: https://developers.google.com/places/web-service/details
Update your list. I'd recommend using a definite type in Kapow for that, and using the actions Store in Database and Query Database. In case you need the data elsewhere, you may create additional robots (e.g. for Excel files, sending data per email, et cetera).
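Outside of Kapow, the "retrieve the details" step can be sketched in a few lines of Python. This is only an illustration under assumptions: the endpoint URL, the `fields` parameter and the `result.opening_hours.weekday_text` path reflect the Place Details documentation linked above as I understand it, and the function names are made up for this example. The sketch builds the request URL and parses a response body; it does not perform the HTTP call itself.

```python
import json
from urllib.parse import urlencode

# Assumed Place Details endpoint (see the documentation linked above).
DETAILS_ENDPOINT = "https://maps.googleapis.com/maps/api/place/details/json"


def build_details_url(place_id, api_key):
    """Build a Place Details request URL asking only for opening hours."""
    params = {"place_id": place_id, "fields": "opening_hours", "key": api_key}
    return DETAILS_ENDPOINT + "?" + urlencode(params)


def extract_opening_hours(response_body):
    """Return the list of weekday strings, or None if unavailable."""
    data = json.loads(response_body)
    if data.get("status") != "OK":
        return None
    hours = data.get("result", {}).get("opening_hours")
    if not hours:
        return None
    return hours.get("weekday_text", [])
```

The returned list can then be compared with the stored value for that place, and the database row updated only when the two differ.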

API for searching for specific places by proximity

I would like to return specific places (stores like Target, Macy's, etc.) by location (latitude, longitude).
I have been using the google places api and entering the different stores in the name parameter. The results are inconsistent at best.
Is this the api I should be using to return specific stores by name and proximity? The google places api near by search has only a single name parameter. I would ideally like to search for several specific stores in a single request to the endpoint for performance reasons.
In conclusion
What api should I be using to return specific stores by name and proximity?
The Google Places API is the correct Google API to use; however, as you mentioned, it does not support multiple name or keyword parameters. There is an active Places API feature request for this here; please star it if you wish to see it resolved and to be notified of future changes.
Instead of using the name parameter, try using the keyword parameter. The keyword parameter is matched against all available fields, including but not limited to name, type, and address, as well as customer reviews and other third-party content. This can often yield more or better results.
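Since only one keyword is accepted per request, one workaround is to issue one Nearby Search request per store. The sketch below only builds the URLs; the endpoint and parameter names (`location`, `radius`, `keyword`, `key`) are taken from the Nearby Search documentation as I understand it, and `nearby_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed Nearby Search endpoint of the Places web service.
NEARBY_ENDPOINT = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"


def nearby_urls(store_names, lat, lng, radius_m, api_key):
    """Return one Nearby Search URL per store name, keyed by that name."""
    urls = {}
    for name in store_names:
        params = {
            "location": f"{lat},{lng}",
            "radius": radius_m,
            "keyword": name,  # matched against name, type, address, reviews
            "key": api_key,
        }
        urls[name] = NEARBY_ENDPOINT + "?" + urlencode(params)
    return urls
```

The requests can be sent concurrently to limit the latency cost of splitting the search.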
It seems as if the foursquare api does let you search for multiple specific places.
Here is an example url that does the trick:
https://api.foursquare.com/v2/venues/search?ll=34.017717,-118.159335&query=Target Victoria's Secret Macy's &intent=browse&radius=16094&oauth_token=mytoken
So I'm abandoning Google Places and going with FourSquare.
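Note that the example URL contains spaces and apostrophes, so it should be URL-encoded before being sent. A minimal sketch of building the same request with the standard library follows; the parameter names are the ones from the URL above, while `venue_search_url` is a made-up helper name.

```python
from urllib.parse import urlencode


def venue_search_url(lat, lng, store_names, radius_m, oauth_token):
    """Build an encoded venues/search URL querying several stores at once."""
    params = {
        "ll": f"{lat},{lng}",
        "query": " ".join(store_names),  # all store names in one query string
        "intent": "browse",
        "radius": radius_m,
        "oauth_token": oauth_token,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)
```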

Filtering Foursquare Venue Results

I am currently evaluating several different APIs in order to get venue information. A key component of any provider is the ability to not just return all venues nearby but tailor the list based on previously entered user preferences.
Foursquare does not allow 'munging' their venue data with other data, like Google's places to create an aggregated service. But can I take Foursquare's venues for a given area, apply some filtering based on user preferences and recommendation engine techniques, and present a modified, personalized version of their information? Do they frown on only using their venue info as a jumping off point, even if attribution on the final results is given?
This customization would be above and beyond using retailer categories, something that can be included in the facebook request. Asking because other services require results presented exactly as returned from the API, including ads.
First, check out the policies at https://developer.foursquare.com/overview/community
We welcome you to use foursquare as your location database. You can associate additional content with our venue data in your system, but you may not combine our database with another database or export it on your own.
I think that they even encourage you to manipulate the data and create creative solutions with it, as long as you do not break their ground rule of not merging it with another database (see the full text at the link).
The API even lets you filter the results according to your needs with the categoryId and intent parameters. For example, in our app we filter out places where fewer than 2 unique people have checked in, because we assume they are fake places. We do other filtering on the result set as well, but we display only data from the foursquare venues database, and we attribute.
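The unique-visitor filter described above could look roughly like this. It is a sketch, not our production code: it assumes each venue dictionary carries a `stats` object with a `usersCount` field, which is how I recall the v2 venue response being shaped, so check the field names against the actual response.

```python
def filter_venues(venues, min_unique_users=2):
    """Keep venues where at least `min_unique_users` distinct people checked in."""
    kept = []
    for venue in venues:
        # Assumed field: stats.usersCount; a missing value is treated as 0.
        users = venue.get("stats", {}).get("usersCount", 0)
        if users >= min_unique_users:
            kept.append(venue)
    return kept
```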

Amazon API search results vs. Amazon.com search results

For our web app, which will use Amazon's API as a basis for some of the site's main interactions, we required the ability to do a generalized search of Amazon's products and return results based on relevancy. The expectation was that their API would work exactly like their actual site's search.
Unfortunately it does not. For instance, querying "joy of cooking" does not return a link to the famous cook book, but to some food processor. Contrarily, on the actual site, one would see the book isn't just first, but it and any derivations occupy the top 5 or so results.
Is there a way of getting this level of relevancy search from Amazon's API without specifying a node to browse through? We need to be able to search everything at once, and the API seems very limited on parameter sets.
The answer is that, if you use "All" as your sorting basis, rather than "Blended", you will get results that are in line with Amazon's own product search. Older docs don't seem to account for this discrepancy, but testing both methods has shown "All" to be the preferred product sorting method.
http://docs.amazonwebservices.com/AWSECommerceService/2010-11-01/DG/
Search that page for "SearchIndex: All".
You don't get any item sorting options with this method, but if all you want is "most relevant" results, this is the preferred method.
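As a rough illustration, the request parameters for such a search might be assembled as below. The parameter names (`Service`, `Operation`, `SearchIndex`, `Keywords`, `Sort`) are given as I remember them from the Product Advertising API docs; a real request also needs credentials, a timestamp and a signature, which this sketch deliberately omits. `item_search_params` is a hypothetical helper.

```python
def item_search_params(keywords, search_index="All", sort=None):
    """Build unsigned ItemSearch parameters; rejects sorting with 'All'."""
    if search_index == "All" and sort is not None:
        raise ValueError('no sort options are available with SearchIndex "All"')
    params = {
        "Service": "AWSECommerceService",
        "Operation": "ItemSearch",
        "SearchIndex": search_index,
        "Keywords": keywords,
    }
    if sort is not None:
        params["Sort"] = sort
    return params
```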

Programmatic Querying of Google and Other Search Engines With Domain and Keywords

I'm trying to find out if there is a programmatic way to determine how far down in a search engine's results my site shows up for given keywords. For example, my query would provide my domain name and keywords, and the result would return, say, 94, indicating that my site was the 94th result. I'm specifically interested in how to do this with Google, but also interested in Bing and Yahoo.
No.
There is no programmatic access to such data. People generally roll their own trackers: fetch the Google search results page and use regexes to find your position. Keep in mind, though, that results now differ by geography and are personalized.
The gl=us parameter will get you results for the US; change its value to get results for another geography.
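Whichever way the ordered list of result URLs is obtained, the position lookup itself is simple. Here is a minimal sketch; `rank_of_domain` is a made-up name, and the function assumes you already have the result URLs in ranked order.

```python
from urllib.parse import urlsplit


def rank_of_domain(result_urls, domain):
    """Return the 1-based position of the first result on `domain`, or None."""
    domain = domain.lower()
    for position, url in enumerate(result_urls, start=1):
        host = (urlsplit(url).hostname or "").lower()
        # Match the domain itself or any subdomain such as www.
        if host == domain or host.endswith("." + domain):
            return position
    return None
```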
Before creating this from scratch, you may want to save yourself some time (and money) by using a service that does exactly that [and more]: Ginzametrics.
They have a free plan (so you can test if it fits your requirements and check if it's really worth creating your own tool), an API and can even import data from Google Analytics.