Bing Image Search API returns duplicate results

The Bing Image Search API returns nothing but duplicate results once the offset goes above roughly 200 or 300. This costs money, as API calls are wasted. It should stop returning results when it doesn't have any more.

It would be nice if the Bing Image Search API stopped returning results when the offset value is greater than the number of results available, but that's not how the API works. According to the Image Search API Reference, you are expected to check the totalEstimatedMatches field returned by the first request and make sure the offset value is still acceptable before making subsequent requests:
The offset should be less than (totalEstimatedMatches - count).
So if you perform this check, you can decide when to stop making new requests. If offset exceeds the number of available results, the API appears to simply return the last count results, which would explain the "duplicate results" you're getting.
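A minimal paging sketch along those lines, assuming the v7 images/search endpoint and the usual subscription-key header (adjust both to whatever your app actually calls); the point is just to carry totalEstimatedMatches forward and stop before offset runs past (totalEstimatedMatches - count):

    import requests

    # Endpoint and header are assumptions for illustration; use your own key.
    ENDPOINT = "https://api.cognitive.microsoft.com/bing/v7.0/images/search"
    HEADERS = {"Ocp-Apim-Subscription-Key": "YOUR_KEY"}

    def fetch_all(query, count=150):
        offset = 0
        total = None
        while total is None or offset < total - count:
            resp = requests.get(ENDPOINT, headers=HEADERS,
                                params={"q": query, "count": count, "offset": offset})
            resp.raise_for_status()
            data = resp.json()
            # totalEstimatedMatches comes back with every page; once offset
            # would run past (totalEstimatedMatches - count), stop requesting.
            total = data.get("totalEstimatedMatches", 0)
            results = data.get("value", [])
            if not results:
                break
            yield from results
            offset += count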

Related

Get a random search result from Amazon CloudSearch

My query is like:
/2013-01-01/search?q=(and author:'william' category:'Videos')&q.parser=structured&expr.random=_rand&return=_all_fields&size=1
and returns a video. However, I want a random videoId on every request.
Using the expression &expr.random=_rand, I'm unable to fetch a random result, and I have failed to find any solution in the documentation.
How can I get a random search result on every request?
The CloudSearch docs are quite lacking here. I needed something similar, came up short with Google, so I started guessing based on what I already knew about Solr and found a solution:
/2013-01-01/search?q=what&sort=_rand_1 desc
Note that _rand on its own will also work; however, after the first search it will always return the same results, which is actually ideal if you need to paginate through a random result set (it works the same way in Solr). So, to get different random results every time, generate a random suffix and append it to _rand (e.g. _rand_1, _rand_42, and so on).
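A small sketch of that idea in Python (the search endpoint host is a placeholder; substitute your own CloudSearch domain's search endpoint):

    import random
    import requests

    # Placeholder endpoint; use your CloudSearch domain's search endpoint.
    SEARCH_URL = "https://search-yourdomain.us-east-1.cloudsearch.amazonaws.com/2013-01-01/search"

    def random_search(query):
        # A fresh suffix on _rand forces CloudSearch to reshuffle the ordering
        # instead of reusing the cached random sort from the first request.
        sort_field = f"_rand_{random.randint(0, 1000000)}"
        params = {"q": query, "sort": f"{sort_field} desc", "size": 1}
        resp = requests.get(SEARCH_URL, params=params)
        resp.raise_for_status()
        return resp.json()["hits"]["hit"]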
You can also accomplish this through pagination by setting the start param to a random value from 0 to hits.found - 1 and requesting size=1:
search?q=matchall&q.parser=structured&size=1&start={yourRandomNumber}
If the number of documents in your index is fluctuating, you'll need to make 2 queries: one to get the max number of results (comes back as hits.found), and another to retrieve the random result.
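A sketch of the two-request version (placeholder endpoint, and the structured query from the question):

    import random
    import requests

    SEARCH_URL = "https://search-yourdomain.us-east-1.cloudsearch.amazonaws.com/2013-01-01/search"
    QUERY = "(and author:'william' category:'Videos')"

    def random_document():
        base = {"q": QUERY, "q.parser": "structured"}
        # First request: we only need hits.found, so ask for zero documents.
        found = requests.get(SEARCH_URL, params={**base, "size": 0}).json()["hits"]["found"]
        if found == 0:
            return None
        # Second request: jump to a random offset and take a single document.
        start = random.randint(0, found - 1)
        resp = requests.get(SEARCH_URL, params={**base, "size": 1, "start": start})
        return resp.json()["hits"]["hit"][0]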

Yammer API - Paging

I am trying to gather a range of messages through the REST API, and I am aware that you can only retrieve 20 results at a time. I have tried incrementing a page variable, but this has no effect; I get the same results each time no matter the page number (https://www.yammer.com/api/v1/messages.json?page=6). I have since moved to the newer_than and older_than parameters to page through the results, and it works to some extent, but it appears to be excluding records. I am using the following approach:
Since setting only a newer_than parameter returns just the 20 most recent records newer than the id passed in, I am also setting a dynamic older_than parameter:
1. Send a request with only a newer_than parameter. This returns the 20 most recent records. (e.g. www.yammer.com/api/v1/messages.json?newer_than=235560157)
2. Extract the id of the 20th message in the JSON and use it to populate the older_than parameter. The result is 20 different records. (e.g. www.yammer.com/api/v1/messages.json?newer_than=235560157&older_than=405598096)
3. Repeat step 2 until no results are returned, since the newer_than and older_than parameters will eventually overlap (a sketch of this loop is below).
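Roughly, the loop I am running looks like this (auth details omitted; the bearer-token header is just a stand-in for however you authenticate, and I'm assuming the oldest message in each batch is the last element):

    import requests

    URL = "https://www.yammer.com/api/v1/messages.json"
    HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}  # placeholder auth

    def walk_messages(newer_than):
        older_than = None
        while True:
            params = {"newer_than": newer_than}
            if older_than is not None:
                params["older_than"] = older_than
            batch = requests.get(URL, headers=HEADERS, params=params).json()["messages"]
            if not batch:
                break
            yield from batch
            # Anchor the next request below the oldest message just received.
            older_than = batch[-1]["id"]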
The problem is that this method returns fewer records than the data export API returns for the same messages. I am working under the assumption that newer message ids are always generated with a value greater than that of any older message.
Could I possibly be misunderstanding how paging through results is supposed to be implemented with the REST API?
Any help would be much appreciated!
Thanks in advance!
First of all, the page parameter works only for the search API.
Secondly, the way you are fetching messages will return either no comments on the messages or only the top 2 comments on each message, depending on the "extended" parameter. By default it returns 2 comments on every message; to get all the comments on a message you have to fetch that message individually.
That must be what is causing the difference in the number of messages between the two methods.
I agree with Farhann - the REST API endpoint returns only the top two comments for any message by default. To get all the comments for a post, you have to make a separate request.
With the Data Export API, all the comments are exported along with each message (public and private), which increases the message count, whereas the REST API call returns only the 2 most recent comments on any message by default.
The data export includes private messages. Private messages will not be returned by that API call.
Check if the messages you are not seeing are private messages.
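If you do need the full comment set, one option is a per-thread fetch along these lines (I'm assuming the in_thread endpoint here; double-check the endpoint name against the current REST API docs):

    import requests

    HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}  # placeholder auth

    def full_thread(thread_id):
        # Assumed endpoint: returns every message in a thread, rather than
        # the two most recent comments the feed endpoints include.
        url = f"https://www.yammer.com/api/v1/messages/in_thread/{thread_id}.json"
        return requests.get(url, headers=HEADERS).json()["messages"]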

What is the maximum permitted response data size?

In the API Docs section Browsing Table Data, there is a reference to the "permitted response data size"; however, that link is dead. Experimentation revealed that requests with maxResults=50000 are usually successful, but as I near maxResults=100000 I begin to get errors from the BigQuery server.
This is happening while I page through a large table (or set of query results), so after each page is received, I request the next one; it thus doesn't matter to me what the page size is, but it does affect the communication with BigQuery.
What is the optimal value for this parameter?
Here is some explanation: https://developers.google.com/bigquery/docs/reference/v2/jobs/query?hl=en
The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.
To sum up: the maximum response size is 10 MB, and there is no row count limit.
You can choose the value of the maxResults parameter based on how your app uses the data.
If you want to show the data in a report, set a low value so the first page comes back quickly.
If you need to load the data into another app, use the largest value that keeps a page under the limit (record size * row count < 10 MB).
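In practice that usually means picking a modest maxResults and letting pageToken drive the paging. A rough sketch against the v2 tabledata.list endpoint (project, dataset, table and the auth token are placeholders):

    import requests

    BASE = "https://www.googleapis.com/bigquery/v2"
    URL = f"{BASE}/projects/PROJECT/datasets/DATASET/tables/TABLE/data"
    HEADERS = {"Authorization": "Bearer YOUR_ACCESS_TOKEN"}  # placeholder auth

    def iter_rows(max_results=1000):
        page_token = None
        while True:
            params = {"maxResults": max_results}
            if page_token:
                params["pageToken"] = page_token
            page = requests.get(URL, headers=HEADERS, params=params).json()
            yield from page.get("rows", [])
            page_token = page.get("pageToken")
            if not page_token:
                break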
You say that when you manually set maxResults = 100000 to page through the result set, you get errors from the BigQuery server. What errors do you get? Could you paste the error message?

Youtube API problem - when searching for playlists, start-index does not work past 100

I have been trying to get the full list of playlists matching a certain keyword. I have discovered, however, that using a start-index past 100 returns the same set of results as start-index=1. It does not matter what the max-results parameter is; the results are still the same. The total results figure, however, is well above 100, so it cannot be that the query returned only 100 results.
What might the problem be? Is it a quota of some sort or any other authentication restriction?
As an example, these queries return the same result set, whether you use start-index=1, start-index=101, start-index=201, etc.:
http://gdata.youtube.com/feeds/api/playlists/snippets?q=%22Jan+Smit+Laura%22&max-results=50&start-index=1&v=2
Any idea will be much appreciated!
Regards
Christo
I made an interface for my site, and the way I avoided this problem was to do one query for a large number of results, then store them. Let your web page break the results up and present them however is needed.
For example, if someone wants to browse more than 100 videos, do the search once and collect the results, but only present the first group, say 10. Then, when the person wants to see the next ten, serve them from the list you stored rather than doing a new query.
Not only does this make paging faster, but it cuts down on constant queries to the YouTube database.
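A rough sketch of that fetch-once, page-locally pattern (the feed URL is the one from the question; alt=json is an assumption on my part to keep the parsing simple):

    import requests

    FEED = "http://gdata.youtube.com/feeds/api/playlists/snippets"

    def fetch_playlists(query, batch=50):
        # One up-front query for as many entries as the feed will return,
        # stored locally so later "pages" never hit YouTube again.
        params = {"q": query, "max-results": batch, "start-index": 1,
                  "v": 2, "alt": "json"}
        feed = requests.get(FEED, params=params).json()["feed"]
        return feed.get("entry", [])

    def page(results, page_number, per_page=10):
        # Client-side paging over the stored list.
        start = (page_number - 1) * per_page
        return results[start:start + per_page]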
Hope this makes sense and helps.

Flickr Geo queries not returning any data

I cannot get the Flickr API to return any data for lat/lon queries.
http://api.flickr.com/services/rest/?method=flickr.photos.search&media=photo&api_key=KEY_HERE&has_geo=1&extras=geo&bbox=0,0,180,90
This should return something, anything. It doesn't work if I use lat/lng either. I can get some photos returned if I look up a place_id first and then use that in the query, except then the photos returned are from anywhere, not from that place_id.
Eg,
http://api.flickr.com/services/rest/?method=flickr.photos.search&media=photo&api_key=KEY_HERE&placeId=8iTLPoGcB5yNDA19yw
I have deleted my key, obviously; replace it with yours to test.
Any help appreciated, I am going mad over this.
I believe that the Flickr API won't return any results if you don't put additional search terms in your query. If I recall from the documentation, this is treated as an unbounded search. Here is a quote from the documentation:
Geo queries require some sort of limiting agent in order to prevent the database from crying. This is basically like the check against "parameterless searches" for queries without a geo component.
A tag, for instance, is considered a limiting agent as are user defined min_date_taken and min_date_upload parameters — If no limiting factor is passed we return only photos added in the last 12 hours (though we may extend the limit in the future).
My app uses the same kind of geo searching, so what I do is add an additional limiting term, the minimum date taken, like so:
http://api.flickr.com/services/rest/?method=flickr.photos.search&media=photo&api_key=KEY_HERE&has_geo=1&extras=geo&bbox=0,0,180,90&min_taken_date=2005-01-01%2000:00:00
Oh, and don't forget to sign your request and fill in the api_sig field. My experience is that the geo-based searches don't behave consistently unless you attach your api_key and sign your search. For example, I would sometimes get search results and then later, with the same search, get no images when I hadn't signed my query.
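For completeness, a rough sketch of a signed geo search in Python, assuming the legacy api_sig scheme (MD5 over the shared secret followed by the sorted parameter name/value pairs); the key, secret, and chosen limiting parameters are placeholders:

    import hashlib
    import requests

    API_KEY = "YOUR_KEY"
    API_SECRET = "YOUR_SECRET"
    REST_URL = "https://api.flickr.com/services/rest/"

    def signed_search(**params):
        params.update({"method": "flickr.photos.search", "api_key": API_KEY,
                       "format": "json", "nojsoncallback": 1})
        # Classic Flickr signing: concatenate secret + sorted name/value
        # pairs and MD5 the result into api_sig.
        raw = API_SECRET + "".join(f"{k}{params[k]}" for k in sorted(params))
        params["api_sig"] = hashlib.md5(raw.encode("utf-8")).hexdigest()
        return requests.get(REST_URL, params=params).json()

    # Bounded geo search with a limiting min_taken_date term, as above.
    photos = signed_search(bbox="0,0,180,90", has_geo=1, extras="geo",
                           min_taken_date="2005-01-01 00:00:00", media="photo")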