Cloudsearch suggester - amazon-cloudsearch

I recently started working with CloudSearch. When I came across the term "Suggester", it confused me. I don't understand the difference between a suggester and prefix search. Aren't the two the same? If not, can someone explain the difference?
Thank you in advance.

The /suggest API and prefix search are similar, in the sense that they both perform prefix queries. But there are some key differences with suggestions to be aware of:
Limited to matches in a single field
Prefix matches only
Dedicated API
Compact response body (only returns the matched field, score, and document ID)
I'm guessing that the suggest API was thrown together with a limited feature set just to make it easy to provide search-as-you-type suggestions. In my experience, the big downside to this API is that you are relying on users beginning their query with the exact word that your field begins with.
Here's an example from my company to help illustrate the issue. Let's say you have 5 documents with the word "soap" in the title, but at different positions. Only the document that begins with "soap" would be returned as a match.
luxury bath soap
foaming hand soap
soap dispenser <--- (only prefix match)
liquid hand soap
dish soap
Obviously all of those titles are relevant, because they all contain the exact search term. But only "soap dispenser" is a prefix match, which would result in a pretty lousy user experience. I think there's definitely a place for prefix queries like this, but most users aren't going to be familiar enough with the search index to know what word to begin searching for.
I ended up just using the /search API so I could provide suggestions based on matches anywhere in the field. I limited the number of fields being returned, to limit the size of the response body, and it's worked out very nicely for me.
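To make the difference concrete, here is a minimal local sketch of the two matching behaviours described above. It does not call CloudSearch at all; the function names and the in-memory title list are assumptions made up for illustration.

```python
# Titles from the example above.
TITLES = [
    "luxury bath soap",
    "foaming hand soap",
    "soap dispenser",
    "liquid hand soap",
    "dish soap",
]


def field_prefix_matches(titles, term):
    """Suggester-style: the whole field must begin with the term."""
    term = term.lower()
    return [t for t in titles if t.lower().startswith(term)]


def word_prefix_matches(titles, term):
    """Search-style: any word in the field may begin with the term."""
    term = term.lower()
    return [
        t for t in titles
        if any(word.startswith(term) for word in t.lower().split())
    ]


if __name__ == "__main__":
    print(field_prefix_matches(TITLES, "soa"))  # only "soap dispenser"
    print(word_prefix_matches(TITLES, "soa"))   # all five titles
```

The second function is roughly what matching anywhere in the field buys you for search-as-you-type, at the cost of a larger result set.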

Related

How do I filter a RESTful collection resource? Query parameters or query strings?

I need to filter a list of employees and support both simple and complex queries.
RESTful APIs have query parameters, which are key-value pairs provided after the ?
/employees?location=london
What would be used if I wanted to reduce the list to Employees with a start date between 01/01/2020 and 01/05/2020 that are also male and work at the Birmingham office?
Is this where a query string ?q=.... should be used? Is there any best practice to follow for this?
Is there any best practice to follow for this?
Anything that is consistent with the other identifiers in your API is fine.
REST doesn't care what spellings you use for your resource identifiers, so long as they are consistent with the production rules defined by RFC 3986.
A query part that is an application/x-www-form-urlencoded representation of key value pairs is a popular choice because HTML form support means those resource identifiers are easy to test with a web browser.
?q= is just another key value pair -- your values can be pretty much anything so long as they are encoded correctly. For prior art, see the text area input control in html.
Key value pairs are a way to encode information into the query part, but you aren't required to do that. http://example.org/?select%20%2A%20from%20students%3b is a perfectly satisfactory resource identifier from a REST client perspective.
(Of course, you probably wouldn't want to take an unsanitized input and run it in your production relational database using a role authorized to do arbitrary things.)
You aren't restricted to encoding useful information in the query part; if you prefer to encode information into the path segments, that's OK too. HTML doesn't support that out of the box, but a generalization of the HTML form is a URI Template, which gives you more options for communicating to the client how the URI is to be constructed.
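As a rough sketch of both options, the snippet below builds a key-value query part for the employee filter from the question, and separately percent-encodes a free-form query like the one in the example URI. The parameter names (gender, startDateFrom, startDateTo) are hypothetical, and the dates assume 01/05/2020 means 1 May 2020.

```python
from urllib.parse import quote, urlencode

# Hypothetical parameter names; any consistent spelling would do.
filters = {
    "location": "birmingham",
    "gender": "male",
    "startDateFrom": "2020-01-01",
    "startDateTo": "2020-05-01",  # assumes day/month order in the question
}

# application/x-www-form-urlencoded key-value pairs.
employees_uri = "/employees?" + urlencode(filters)

# A query part that is not key-value pairs at all, just encoded text.
free_form_uri = "http://example.org/?" + quote("select * from students;")

if __name__ == "__main__":
    print(employees_uri)
    print(free_form_uri)
```

Either form is a valid identifier; the key-value form is simply easier to generate from an HTML form.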

REST: One to Many for GET via a foreign key

Relationships
robot has many brains
brain has one robot
Background
How to form the resource URL where we provide a robotId (foreign key) to retrieve its brain?
I could come up with this resource:
GET /robots/:robotId/brain
I am not sure if using brain in singular is against REST conventions and practices.
However, using GET /robots/:robotId/brains (brain in the plural) implies that a collection will be returned, but it will always contain only 1 item.
Question
Can you advise me on a RESTful way?
Can you advise me on a RESTful way?
REST doesn't care what spelling conventions you use for your resource identifiers.
Therefore, you should use whatever spellings make sense within your local context. That might mean, for your own convenience, that the spelling conventions that you use for your path segments are similar to those that you use when naming collections/tables in your data store. Or perhaps not - you could equally decide that, because the audiences differ, so too should the spelling conventions.
GET /robots/:robotId/brain
GET /robots/:robotId/brains
GET /brains/:robotId
GET /ee4fcf74-d494-4f90-8964-9e4d65aa61ef
These are all fine.
Stefan Tilkov's 2014 talk: REST: I don't Think it Means What You Think it Does may be helpful.
In my view, you should have a single endpoint, GET /robots/:robotId/brains, which returns the whole collection; your frontend can then process the data however it needs.
If you have GET /robots/:robotId/brains this can still return a collection of size 1?
You should have a REST endpoint that returns the collection of all the brains of a robot. The URI should look like this:
GET /robots/:robotId/brains
The number of items in the collection should not matter.
If the REST endpoint is GET /robots/:robotId/brain, you are ignoring the fact that a robot might have multiple brains in the future, which is very much possible if your database supports a one-to-many relationship.
To get a single brain of a robot, you can always leave room for this URI: GET /robots/:robotId/brains/:brainId, where brainId is the unique/primary key for a brain.
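A minimal sketch of the collection-style layout suggested in this answer, using an in-memory dictionary in place of a database. The data, the get function, and the status codes returned are assumptions for illustration; as the earlier answer notes, other URI spellings are equally valid from a REST perspective.

```python
import re

# Hypothetical in-memory data: robotId -> {brainId: brain}.
BRAINS = {
    "r1": {"b1": {"id": "b1", "model": "positronic"}},
}

ROUTES = [
    (re.compile(r"^/robots/(?P<robot_id>[^/]+)/brains$"), "collection"),
    (re.compile(r"^/robots/(?P<robot_id>[^/]+)/brains/(?P<brain_id>[^/]+)$"), "item"),
]


def get(path):
    """Resolve a GET path to a (status, body) pair."""
    for pattern, kind in ROUTES:
        match = pattern.match(path)
        if match is None:
            continue
        brains = BRAINS.get(match["robot_id"])
        if brains is None:
            return 404, None
        if kind == "collection":
            # The collection may hold one item today and several later.
            return 200, list(brains.values())
        brain = brains.get(match["brain_id"])
        return (200, brain) if brain is not None else (404, None)
    return 404, None


if __name__ == "__main__":
    print(get("/robots/r1/brains"))
    print(get("/robots/r1/brains/b1"))
```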

Using Twitter's public API to find similar tweets

I am working on an application that amongst other things tries to find similar tweets based on a tweet's text as input. The similarity of the tweet would be based on the amount of matching text. I would like to use the public twitter search api to accomplish this.
The closest thing the Twitter API offers is searching with OR operators. This, however, returns a list of seemingly randomly ordered tweets that contain any of the query's words, usually matching common words like 'with' or 'we' (which is the expected behaviour of the OR operator). I am interested in results with as much matching text as possible, and in results with text that is characteristic of the input tweet (matching common words is less relevant than matching uncommon words).
Is there any way I can use the Twitter API to find results with as many matching words as possible?
Example of results from query with OR operators.
The Twitter REST API does not expose a function that does what you are describing. You will need to capture a large number of tweets (probably from the Streaming API) and then do the comparison and identification of similar tweets in your own code.
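One possible way to do that comparison in your own code is sketched below. It assumes the tweets have already been captured into a list of strings, and it scores each one by the words it shares with the input, weighting rare words more heavily with a smoothed inverse document frequency. The function names and the weighting formula are illustrative choices, not anything the Twitter API provides.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def rank_similar(query, tweets):
    """Return (score, tweet) pairs, highest score first.

    A shared word contributes more when it appears in fewer tweets,
    so common words like 'with' or 'we' add little to the score.
    """
    docs = [tokenize(t) for t in tweets]
    n = len(docs)
    doc_freq = Counter(word for doc in docs for word in doc)

    def idf(word):
        # Smoothed inverse document frequency.
        return math.log((1 + n) / (1 + doc_freq[word])) + 1.0

    query_words = tokenize(query)
    scored = [
        (sum(idf(w) for w in query_words & doc), tweet)
        for doc, tweet in zip(docs, tweets)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    captured = [
        "we went hiking with friends",
        "volcano eruption seen with drone",
        "we are with you",
    ]
    for score, tweet in rank_similar("we filmed the volcano eruption with a drone", captured):
        print(round(score, 2), tweet)
```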

Providing complex filtering REST API [duplicate]

This question already has answers here:
REST and complex search queries
(5 answers)
Closed 10 years ago.
So I am building a RESTful (as RESTful as I can) API with the Laravel 4 PHP Framework. Right now I have dozens of API calls working, and I have a process for limiting, ordering, and simple filtering. Here is an example of one of the calls:
/api/v1/users?limit=10&offset=10&firstName=John&order[]=createdTimestamp desc
This would return the 11th through 20th users that have a first name of John, ordered by createdTimestamp in descending order. The simple filtering here can only do exact matches (=). Now I also want to be able to provide a more complex filtering system through the REST API that supports the ability to specify the equality match type, so that callers could do a != or > or LIKE, etc. The issue is that I don't know if I am going to be able to provide this type of filtering through a normal query string.
What is the best way to provide this complex filtering through a REST API? Is doing it through a POST still considered the best way even though it is not "truly" RESTful (even though this would prevent issues with a user trying to run a long query that exceeds the URI character length limit that some browsers have)?
@ryanzec
Now I also want to be able to provide a more complex filtering system
through the REST API that supports the ability to specify the
equality match type that way they could do a != or > or LIKE, etc...
The issue is that I don't know if I am going to be able to provide
this type of filtering through a normal query string.
It's not practical with a simple query string (it may be possible, but it is very hard to encode such logic properly in a query string). You need to define a custom query format and use POST to submit such a query. The server may respond with:
"201 Created" status and "Location" header field indicating query resource if there was no such query before; or
"303 See Other" and "Location" header field indicating already existing query resource.
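The two responses above could be sketched as follows, with a plain dictionary standing in for the server's query storage. The query format, the /api/v1/queries/ path, and the idea of identifying a query by a hash of its canonical JSON are all assumptions for illustration; the answer does not prescribe any of them.

```python
import hashlib
import json


def submit_query(store, query):
    """Handle a POSTed query and return (status, headers).

    Returns 201 with a Location header when the query resource is new,
    or 303 with the same Location when an identical query already exists.
    """
    # Canonical JSON so that key order does not create a new resource.
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
    query_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    location = f"/api/v1/queries/{query_id}"  # hypothetical path

    if query_id in store:
        return 303, {"Location": location}
    store[query_id] = query
    return 201, {"Location": location}


if __name__ == "__main__":
    store = {}
    # Hypothetical query format supporting operators other than "=".
    query = {
        "resource": "users",
        "filters": [{"field": "firstName", "op": "!=", "value": "John"}],
        "order": ["createdTimestamp desc"],
        "limit": 10,
        "offset": 10,
    }
    print(submit_query(store, query))
    print(submit_query(store, query))
```

The client would then GET the URI from the Location header to retrieve the results.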
Is doing through a POST still considered the best way even though it
is not "truly" RESTful
I do not know who said this, but it's wrong. There is nothing wrong with using POST for such purposes.
Use forms in your collection resource responses to tell the client how to search the collections. See my answer to REST and complex search queries for examples.

Programmatic Querying of Google and Other Search Engines With Domain and Keywords

I'm trying to find out if there is a programmatic way to determine how far down in a search engine's results my site shows up for given keywords. For example, my query would provide my domain name and keywords, and the result would be, say, 94, indicating that my site was the 94th result. I'm specifically interested in how to do this with Google, but I am also interested in Bing and Yahoo.
No.
There is no programmatic access to such data. People generally roll their own version of such trackers: fetch the Google search page and use regexes to find your position. Note, however, that different results are now shown in different geographies, and results are personalized.
The gl=us parameter will help you get results for the US; you can change the geography accordingly to get other results.
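The position-finding step of such a home-grown tracker might look like the sketch below. It assumes you have already extracted the result URLs from the page, in order; how to extract them depends on the page markup and is not shown. The function name and the subdomain-matching rule are illustrative choices.

```python
from urllib.parse import urlparse


def find_rank(result_urls, domain):
    """Return the 1-based position of the first result on the domain.

    Subdomains such as www.<domain> count as a match. Returns None if
    the domain does not appear in the list.
    """
    domain = domain.lower()
    for position, url in enumerate(result_urls, start=1):
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return position
    return None


if __name__ == "__main__":
    results = [
        "https://other.example/page",
        "https://www.example.org/a",
        "https://example.org/b",
    ]
    print(find_rank(results, "example.org"))
```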
Before creating this from scratch, you may want to save yourself some time (and money) by using a service that does exactly that [and more]: Ginzametrics.
They have a free plan (so you can test if it fits your requirements and check if it's really worth creating your own tool), an API and can even import data from Google Analytics.