GitHub API Conditional Requests with paging

Context: let's say we want to retrieve the whole list of repositories starred by a given user, periodically (once per day, hour, or every few minutes).
There are at least two approaches to do that:
1) Execute a GET to https://api.github.com/users/evereq/starred and use the URL with rel="next" in the 'Link' response header to get the next page's URL (repeat until the response has no "next" page, meaning we have reached the end). This seems to be the approach recommended by GitHub (see the sketch after this list).
2) Iterate the 'page' parameter (from 1 upward) using GET to https://api.github.com/users/evereq/starred?page=XXX until you get 0 results in a response. Once you get 0 results, you are finished. (Not recommended, because GitHub could, for example, replace page numbers with "hash" values; GitHub already did this for some API operations.)
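A minimal sketch of approach 1) in Python, assuming the `requests` library (which parses the Link header into `resp.links`); the user name is the one from the question:

```python
import requests

def fetch_all_starred(user):
    """Walk every page by following the rel="next" Link relation."""
    url = f"https://api.github.com/users/{user}/starred"
    repos = []
    while url:
        resp = requests.get(url)
        resp.raise_for_status()
        repos.extend(resp.json())
        # requests exposes parsed Link relations in resp.links;
        # no "next" entry means we have reached the last page.
        url = resp.links.get("next", {}).get("url")
    return repos

print(len(fetch_all_starred("evereq")))
```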
Now, let's say we want to make sure we use Conditional Requests (see https://docs.github.com/en/rest/overview/resources-in-the-rest-api#conditional-requests) to save our API rate limit (and traffic, trees in the world, etc.).
So we add, for example, an 'If-None-Match' header to our requests and check whether the response status is 304 (Not Modified). If so, nothing has changed since our last request. That works OK.
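A minimal sketch of such a conditional request, assuming `requests` and a plain in-memory dict as the ETag cache (in practice you would persist it):

```python
import requests

etag_cache = {}  # URL -> last seen ETag

def conditional_get(url):
    """Return (status, body); 304 means nothing changed since our cached ETag."""
    headers = {}
    if url in etag_cache:
        headers["If-None-Match"] = etag_cache[url]
    resp = requests.get(url, headers=headers)
    if resp.status_code == 304:
        return 304, None  # unchanged since the ETag we sent
    resp.raise_for_status()
    etag_cache[url] = resp.headers["ETag"]  # remember for the next poll
    return resp.status_code, resp.json()
```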
The issue, however, is that the stop conditions from 1) and 2) above no longer work once you use Conditional Requests!
I.e. with approach 1), you don't get a Link response header at all when you use Conditional Requests.
So you need to execute one more request, with a page number beyond the last one you already have an ETag for, and see that it returns 0 results; only then do you know you are done. That way you basically "waste" one GitHub API request (because it carries no Conditional Request headers).
Same with approach 2): you get status 304 with an empty body for every page... So again, to know you are done, you need at least one additional request that actually returns 0 results.
So the question is: given that the GitHub API does not send back a Link response header (at least for ETag queries that result in status 304), how can we know when to stop paging when doing conditional requests? Is it a bug in the GitHub API implementation, or am I missing something?
We don't know the maximum page number, so to find out when to stop we have to execute one more "wasted" request and check whether we get 0 results back!
I also can't find a way to query GitHub for the total count of starred repositories (so I could calculate in advance how many pages to iterate), and the responses don't include something like "X-Total-Count" that would let me compute the page count with simple math.
Any ideas how to save that one ('end') request and still use Conditional Requests?
If you do one request per day, it's OK to accept such waste, but what if you run it once per minute? You will quickly exhaust your API rate limit!
UPDATE
Well, after a few more tests, I now see the following "rule" (I can't find it anywhere in the docs, though, so I'm not sure whether it's a rule or just an assumption): if the user stars something new, EVERY requested page returns a different ETag value than before and no longer has status 304! That means it's enough to request just the first page and check its status: if it's 304 (Not Modified), we do NOT need to check the next pages, i.e. we are DONE, as nothing changed on any page. Is that a correct approach or just a coincidence?
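If the observed rule holds (again, it is not documented, so treat it as an assumption), the periodic sync collapses to a single conditional request in the common case; a sketch reusing the helpers above:

```python
def sync_starred(user):
    """Cheap poll: only re-walk all pages when page 1's ETag has changed."""
    status, first_page = conditional_get(
        f"https://api.github.com/users/{user}/starred?page=1"
    )
    if status == 304:
        return None  # per the observed rule: no page changed, we are done
    # Something changed, so the Link headers are present again and we
    # can re-walk all pages (see fetch_all_starred above).
    return fetch_all_starred(user)
```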

We indeed return pagination relations in the Link response header when the content has changed. Since we don't support a since parameter for that call, you'll need to sort by most recent results, maintain a client-side cursor for the last known ID or timestamp (based on your sort criteria), and stop paging when it shows up in your paginated results. Conditional requests will just let you know whether page 1 has changed.
We haven't settled on a way to return counts on our listing methods, but a really low-tech solution is to set the page size to 1, grab the rel="last" Link relation, and check its page parameter value.
Hope that helps.
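A sketch of that low-tech counting trick, assuming `requests`: with per_page=1, the page number in the rel="last" relation equals the total item count.

```python
import requests
from urllib.parse import urlparse, parse_qs

def starred_count(user):
    """Total starred repos = page number of rel="last" at page size 1."""
    resp = requests.get(
        f"https://api.github.com/users/{user}/starred",
        params={"per_page": 1},
    )
    resp.raise_for_status()
    last = resp.links.get("last")
    if last is None:
        return len(resp.json())  # 0 or 1 items: no pagination at all
    return int(parse_qs(urlparse(last["url"]).query)["page"][0])
```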

Related

REST: fetching data with GET not possible due to exceeding the header size limit

I am having a dilemma. I need to fetch data for some products by ID, and the number of selected products can vary from a couple to thousands.
I have seen and tested that GET is not possible because the request exceeds the header size limit of 8192 bytes.
After discussions with colleagues I changed to POST, with the IDs in the body. Everything works, but there is still a lot of discussion about this. Have you encountered something like this? What was your approach?
The first question for me is: do you really pass all those IDs in a single request? How is this list of IDs generated in the first place? Could the server know the list in advance?
For example, if the list of IDs is obtained by doing a search query on the same server, perhaps that search query can already emit the list of entities.
I find that in most cases this can be avoided, but there are some exceptions.
If you find that you can't avoid this, I would suggest you use the new HTTP QUERY method instead of POST, but POST should be fine too as a fallback.
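For completeness, a sketch of the POST fallback the asker settled on; the /products/search endpoint is hypothetical, and POST is shown because `requests` has no built-in QUERY verb:

```python
import requests

# Hypothetical endpoint: accepts a JSON body of IDs instead of a query
# string, so the 8192-byte header/URL limit no longer applies.
ids = list(range(1, 3000))

resp = requests.post(
    "https://api.example.com/products/search",  # assumption, not a real API
    json={"ids": ids},
    headers={"Accept": "application/json"},
)
resp.raise_for_status()
products = resp.json()
```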

List call fails about 10 pages in

I'm having some trouble with the "list" API call - it works OK for about 10 pages of results (feeding in the value of nextPageToken into subsequent calls), but then I get a 500 error (always at the same point). Is there a limit on the number of pages of results that can be listed?
Also, just confirming - the discovery file has a maxResults parameter but it seems to have no effect - is this correct?
Cheers,
Miles
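For reference, the token-feeding loop Miles describes looks roughly like this; `list_surveys` and the `resources` field are assumptions standing in for the real list call:

```python
def list_all(list_surveys):
    """Drain a token-paged list call until no nextPageToken is returned."""
    items, token = [], None
    while True:
        page = list_surveys(pageToken=token)  # hypothetical wrapper
        items.extend(page.get("resources", []))  # field name is an assumption
        token = page.get("nextPageToken")
        if not token:  # the last page carries no token
            return items
```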
I'm an engineer at Google. We had a bug related to certain Unicode characters in the titles of surveys that was causing some List requests to fail. We've fixed the decoding issue now, so list requests should be able to reach the rest of your remaining surveys.
maxResults is not implemented by the API and should not be used.
Thanks for reaching out about this issue.

Obtaining System Log using Okta API

I would like to do the following using the Okta API:
One time, I would like to pull the entire system log.
Going forward, I would like to pull only that day's log information.
The challenge I am facing is that whenever I get the logs, I only get 1000 records. How do I get the whole day's logs, which may be more than 1000 records? Could somebody help me with a piece of code that shows how to do this?
Thanks
You can use the Events API to retrieve this information. This API supports Pagination so you can retrieve all the events for a particular filter (like all events after a certain point in time).
1000 is the default limit for the Events API because this object can potentially contain a lot of data.
However, you can specify how many records for a specific time range are returned via the Events API using filters. For example, the following GET request would retrieve the first 100 successful login events since January 1, 2015:
https://{{YOUR_COMPANY}}.okta.com/api/v1/events?limit=100&filter=published gt "2015-01-01T00:00:00.000Z" and action.objectType eq "core.user_auth.login_success"
If there are more than 100 records, you can get the next set by following the URL tagged rel="next" in the Link response header. If you wanted to get only messages for today, you would change the date.
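A sketch of that filter plus Link-header paging in Python, assuming `requests`; the subdomain and token are placeholders:

```python
import requests

HEADERS = {"Authorization": "SSWS {API_TOKEN}"}  # Okta API token scheme

def events_since(company, published_after):
    """Page through all events newer than `published_after` via rel="next"."""
    url = f"https://{company}.okta.com/api/v1/events"
    params = {"limit": 100, "filter": f'published gt "{published_after}"'}
    events = []
    while url:
        resp = requests.get(url, headers=HEADERS, params=params)
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break  # empty page: nothing newer
        events.extend(batch)
        params = None  # the rel="next" URL already embeds the query
        url = resp.links.get("next", {}).get("url")
    return events
```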

Youtube data API version 3 pagination

I understand that the YouTube response contains the next and previous page tokens, and that we can use those to go to the previous and next pages.
eg :
https://www.googleapis.com/youtube/v3/search?&key={key}&part=snippet&maxResults=20&order=viewCount&q=abc&type=video&videoDuration=long&videoType=movie&pageToken=Cd323A
My question is: how can I navigate to the nth page of a particular search?
Please note that some people may say this is impossible. But I have seen some sites implement this.
The YouTube API only provides nextPageToken and prevPageToken, that's it, sorry; see the documentation here: https://developers.google.com/youtube/v3/docs/videos/list
You can still build a pagination widget, but you will have to implement it by calling the URL multiple times, passing the nextPageToken as pageToken in your request each time, and caching the results.
By the way, this question is already answered here (see the 1st point of the answer): youtube data api v3 php search pagination?
Quoting them:
In your particular case, where you show 50 results per page and a pagination bar like (1, 2, NEXT), you need to fetch results two times. Keep both results in cache, so pages 1 and 2 are served from the cache; for NEXT, make sure you query Google again, sending the nextPageToken.
Thus, to show pagination 1-n with 50 results per page, you need to make n-1 queries to the Google API. But if you show 10 results per page, you can make a single query for 50 results and use it to show the first 5 pages (1-5); after that, you again send the next page token as above.
NOTE: the Google YouTube API returns 50 results max per request.
nextPageToken and prevPageToken come back with the API response.
You save the token and then pass it as pageToken in another API call, and you will get the next page. In the same way, you will get a prevPageToken when you fetch the next page's results.
Hope it helps.
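A sketch of that token-caching idea, assuming `requests` and a valid API key; tokens are remembered so revisiting a page needs no extra calls:

```python
import requests

SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"
token_cache = [None]  # token_cache[i] = pageToken needed to fetch page i+1

def fetch_page(key, query, page_token=None):
    params = {"key": key, "part": "snippet", "maxResults": 50,
              "q": query, "type": "video"}
    if page_token:
        params["pageToken"] = page_token
    resp = requests.get(SEARCH_URL, params=params)
    resp.raise_for_status()
    return resp.json()

def goto_page(key, query, n):
    """Reach page n by walking (and caching) tokens: n-1 calls worst case."""
    while len(token_cache) < n:
        page = fetch_page(key, query, token_cache[-1])
        token_cache.append(page.get("nextPageToken"))
    return fetch_page(key, query, token_cache[n - 1])
```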

Instagram realtime get post from callback

Right, this is really getting on my nerves, but Instagram has to do something about their bloody documentation.
I have been trying for a week already to live-update my website with new Instagram posts without refreshing the page. Twitter was fairly easy, but Instagram is just one big mess. Basically I use the realtime Instagram API; the callback and all that stuff is working fine, but thanks to Instagram it does not return the ID of the new post. The callback only returns some basic data:
[{"changed_aspect": "media", "object": "tag", "object_id": "nofilter", "time": 1391091743, "subscription_id": xxxxx, "data": {}}]
With this data you have nothing, except for the tag, but I knew the tag before this callback too, so that doesn't matter. It actually only tells me that there is a new post. When this callback occurs, I have tried doing the same request as when the page loads, and getting the posts that are newer than those already on the page. Unfortunately I have not succeeded in this yet. I have picked the ID from the last posted Instagram post and checked whether it is in the callback request, and it's not.
What am I doing wrong?
I'd appreciate some help, thanks!
Edit:
I'd like to note that this is not only a problem with the realtime API, but also with the normal API. I just don't know how to compare data so that I don't get duplicates in my database (normal API) or on my website (realtime). I can't find any tutorial or documentation (yes, I might be blind) that explains how to compare data. I can only find min_id and max_id, but no explanation of what these IDs contain. I checked these IDs against IDs from the results, and they do not match. It's not an ID from a media item.
I also checked the next_url, and to my logical thinking, this should be a URL to the next page (like Twitter).
Am I looking at this all wrong?
OK, strike my old answer; I changed the way I do this. Here's how I do it now.
I still wait for 10 hits on my real-time subscription; when I reach 10, I kick off a new sync thread (if one is not already running).
The sync thread queries my DB for a value: the last min_tag_id I used. Then I query:
https://api.instagram.com/v1/tags/*/media/recent?access_token=*&min_tag_id=*
Try it out here: https://api.instagram.com/v1/tags/montreal/media/recent?access_token=*
You'll get 20 results and a min_tag_id value. Append that to your URL and you'll see you get no results. Wait a couple of seconds and refresh; eventually you'll get some media and a new min_tag_id.
(You can ignore the "next_url" value they give you, you won't be using that).
Basically you only need to store that min_tag_id and query until you have no more results; at that point you're done.
When you get a subscription push, you need to query that endpoint (tag / recent).
I normally start an asynchronous thread to perform this, so I can answer Instagram in under 2 seconds.
Then you parse that endpoint and look for a "next url" value.
Keep querying that end point, parsing the media and going to the next url until you find your stop condition.
For my stop condition, I try to match 10 consecutive records already in my DB. Basically, from the tag, I store media when they meet my business rules.
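A sketch of that sync step, assuming `requests` plus hypothetical `load_cursor` / `save_cursor` / `store_media` DB helpers:

```python
import requests

def sync_tag(tag, access_token):
    """Pull media newer than our stored min_tag_id, then advance the cursor."""
    params = {"access_token": access_token}
    cursor = load_cursor(tag)  # hypothetical: last min_tag_id we stored
    if cursor:
        params["min_tag_id"] = cursor
    body = requests.get(
        f"https://api.instagram.com/v1/tags/{tag}/media/recent",
        params=params,
    ).json()
    for media in body.get("data", []):
        store_media(media)  # hypothetical: apply your business rules here
    new_cursor = body.get("pagination", {}).get("min_tag_id")
    if new_cursor:
        save_cursor(tag, new_cursor)  # resume point for the next push
```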
The Instagram documentation is accurate and actually well written.
The realtime API is working correctly. As stated in the documentation:
The changed data is not included in the payload, so it is up to you
how you'd like to fetch the new data. For example, you may decide only
to fetch new data for specific users, or after a certain number of
photos have been posted.
http://instagram.com/developer/realtime/
You only receive a notification that an update has happened to your subscribed object. It is up to you to call the API to find out what that data is.
You can call the /tags/[tag-name]/media/recent endpoint with an access token that you have previously stored on your own server or DB. Then you should be able to compare the data returned from that endpoint with any data you retrieved before, and just pull the objects that you do not yet have.
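A sketch of that comparison, assuming a persisted set of already-seen media IDs (the helper is hypothetical; the data/id fields match the endpoint's response shape):

```python
def new_media_only(response_json, seen_ids):
    """Keep only media items whose id has not been stored before."""
    fresh = [m for m in response_json.get("data", []) if m["id"] not in seen_ids]
    seen_ids.update(m["id"] for m in fresh)  # remember them for next time
    return fresh
```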