Google Reader API not showing all results

I'm using the Google Reader API to get all available items for an RSS feed. I use it as follows:
http://www.google.com/reader/atom/feed/[RSS FEED LINK]?n=[NUMBER OF ITEMS TO SHOW]&r=o&ot=[UNIX TIME STAMP FOR START DATE]
As I understand it, this should return all items starting from the date specified by the timestamp (the start date should not be more than one month old). It works well for some feeds, but for most feeds it doesn't return all available items (although they are available when using Google Reader itself).
For example:
http://www.google.com/reader/atom/feed/http://www.360cities.net/rss/area/Greece.rss?n=1000&r=o&ot=1306959543
This link only shows items from 24-07-2011 up to the current date, although it should show items starting from 26-06-2011. If the same feed (http://www.360cities.net/rss/area/Greece.rss) is read in Google Reader, it shows many more results.
Is there a solution?

Fortunately, I found the solution to my problem after a lot of research:
A URL in this form returns the most recent N items of the RSS feed:
http://www.google.com/reader/atom/feed/[RSS]?n=[N]
[N] = Number of items to be displayed (max: 1000).
[RSS] = The URL of the RSS feed.
To get the next N older items, another parameter, the continuation string, must be used. It can be found inside the gr:continuation tag on each results page. So, to get the next N older items, use a URL in this form:
http://www.google.com/reader/atom/feed/[RSS]?n=[N]&c=[C]
[N] = Number of items to be displayed (max: 1000).
[RSS] = The URL of the RSS feed.
[C] = Continuation string
Example:
Let's say we want to get results from http://www.360cities.net/rss/area/north-america.rss
To get the newest 1000 items of this RSS feed, the URL should look like:
http://www.google.com/reader/atom/feed/http://www.360cities.net/rss/area/north-america.rss?n=1000
To get the next older 1000 items, we first search the first results page for the continuation string. In this case the continuation string is COnu-r7znpsC (it may be different when you view this post). The URL should then look like:
http://www.google.com/reader/atom/feed/http://www.360cities.net/rss/area/north-america.rss?n=1000&c=COnu-r7znpsC
To get the next older 1000 items, repeat the same process with the new continuation string, and so on.
If no continuation string is found, no more items are available.
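The loop described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `build_url`, `find_continuation`, and `fetch_all_pages` are made-up names, the network call is left to a caller-supplied `fetch` function (hypothetical), and the continuation element is matched by its local tag name so the sketch does not depend on the exact namespace URI behind the gr: prefix.

```python
import urllib.parse
import xml.etree.ElementTree as ET

BASE = "http://www.google.com/reader/atom/feed/"


def build_url(feed_url, n, continuation=None):
    """Build a request URL in the form shown above (c is added only when given)."""
    url = BASE + feed_url + "?n=" + str(n)
    if continuation:
        url += "&c=" + urllib.parse.quote(continuation)
    return url


def find_continuation(atom_xml):
    """Return the text of the continuation element, or None if there is none."""
    root = ET.fromstring(atom_xml)
    for element in root.iter():
        # Assumption: compare only the local name, ignoring the namespace part.
        if element.tag.rsplit("}", 1)[-1] == "continuation":
            return element.text
    return None


def fetch_all_pages(feed_url, n, fetch):
    """Request page after page until a page has no continuation string.

    `fetch` is a hypothetical callable that takes a URL and returns the
    Atom document as a string.
    """
    pages = []
    continuation = None
    while True:
        page = fetch(build_url(feed_url, n, continuation))
        pages.append(page)
        continuation = find_continuation(page)
        if not continuation:
            break
    return pages
```

Passing the fetch function in keeps the paging logic separate from the HTTP client, so the loop can be exercised with canned pages.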
I hope this helps someone.
Thanks

Related

Unpredictable soundcloud api tracklist

I need to get a user's whole tracklist through the SC API.
Previously I used the following link format:
https://api.soundcloud.com/users/{user_id}/tracks/?page_size=200&linked_partitioning=1&client_id={app_id}
But it recently stopped working correctly.
Changes:
the limit has been decreased by SC from 200 tracks to 50;
the returned tracks have become unpredictable (e.g. a user has 300 tracks, but the request above returns 54 tracks and a link to the next page of tracks in which the fields "offset=50&limit=50" appear;
when I change the "page_size" field from 200 to 50, SC returns only 18 tracks).
I've also tried using the "offset" and "limit" fields instead of "page_size", but that worked incorrectly too.
How can I get a user's whole tracklist?
You need to parse the response, read next_href, and then request the URL given in next_href.
It's best to do this in a loop until there is no more next_href.
Some recent updates on pagination in the API can be found below. You need to rely on the cursor rather than on offset:
https://developers.soundcloud.com/blog/pagination-updates-on-our-api
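A minimal Python sketch of that loop, under stated assumptions: `fetch_all_tracks` is an illustrative name, `fetch_json` is a hypothetical caller-supplied function that requests a URL and returns the decoded JSON, and each page is assumed to hold its tracks under a "collection" key next to next_href.

```python
def fetch_all_tracks(first_url, fetch_json):
    """Follow next_href from page to page and collect every track.

    `fetch_json` is a hypothetical callable: URL in, decoded JSON dict out.
    """
    tracks = []
    url = first_url
    while url:
        page = fetch_json(url)
        # Assumption: the tracks of a page are listed under "collection".
        tracks.extend(page.get("collection", []))
        # A missing or empty next_href means this was the last page.
        url = page.get("next_href")
    return tracks
```

The loop never builds offsets itself; it only follows whatever URL the previous response supplied, which is what relying on the cursor amounts to.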

Extract portion of HTML from website?

I'm trying to use VBA in Excel to navigate a site with Internet Explorer and download an Excel file for each day.
After looking through the HTML code of the site, it looks like each day's page has a similar structure, but one portion of the website link seems completely random. However, this random-looking part stays constant for a given page and does not change each time you load it.
The following portion of the HTML code contains the unique string:
<a href="#" onClick="showZoomIn('222698519','b1a9134c02c5db3c79e649b7adf8982d', event);return false;
The part starting with "b1a" is what is used in the website link. Is there any way to extract this part of the page and assign it as a variable that I then can use to build my website link?
Since you don't show your code, I will also speak in general terms:
1) Get all the link elements (<a>) with Set allLinks = ie.document.getElementsByTagName("a"). The result is a collection of length n containing all the links in the document.
2) Find the specific link containing the information you want. Let's imagine it's the 4th one (you can inspect the properties to check which one it is, in case its position varies):
Set myLink = allLinks(3) '<- 4th : index = 3 (starts from zero)
3) You get your token with a simple split function:
myToken = Split(myLink.onClick, "'")(3)
Of course, you can be more concise if the position of the link containing the token is always the same, e.g. always the 4th link:
myToken = Split(ie.document.getElementsByTagName("a")(3).onClick,"'")(3)
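To make it easier to see why index 3 is the right piece, here is the same split applied to the onClick string from the question, written in Python purely for illustration (the answer itself uses VBA). The regular-expression variant is an added alternative, not part of the original answer.

```python
import re

# The onClick value shown in the question.
onclick = "showZoomIn('222698519','b1a9134c02c5db3c79e649b7adf8982d', event);return false;"

# Splitting on the single quote alternates between text outside and inside
# the quotes, so the second quoted argument ends up at index 3.
parts = onclick.split("'")
token = parts[3]

# Alternative: capture the second quoted argument of showZoomIn directly.
match = re.search(r"showZoomIn\('[^']*','([^']*)'", onclick)
token_from_regex = match.group(1) if match else None

print(parts)
print(token)
```

The regular expression does not depend on how many quotes precede the call, which may help if the attribute ever contains other quoted text.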

Token to the next page in pagination API

I have a list of records on the server, sorted by a key, and I use a pagination API to return the list in segments, one page at a time. Since items can be inserted in the middle of the list, I return the first key of the next page as a pagination token that has to be passed to get the next page.
However, I've found that DynamoDB's query API instead uses the last key of the current page, which is null if the next page does not exist.
Question:
What are the pros and cons of using the last item of the current page versus the first item of the next page as a pagination token?
N.B.:
To me, returning the first item of the next page is more intuitive, since it's null only if the next page does not exist.
Using the "last item of the current page" (LICP) is better than using the "first item of the next page" (FINP) because it deals better with the possibility that, in the meantime, some item is inserted between these two items.
For example, suppose the first page contains 3 alphabetically ordered names: Adam/Basil/Claude. And suppose the next page is Elon/Francis/Gilbert.
Then with LICP the token is Claude, while with FINP the token is Elon. If no new names are inserted, the result is the same when we get the next page.
However, suppose we insert the name Daniel after getting the first page but before getting the second page. In this case, when we get the second page with LICP we get Daniel/Elon/Francis, while with FINP we get Elon/Francis/Gilbert. That is to say, FINP will miss Daniel, while LICP will not.
Also, FINP may consume more computing resources than LICP, since you must retrieve one extra item (4 items, in the above example, instead of only 3).
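The Daniel example can be reproduced with a small Python simulation. This is a sketch over an in-memory sorted list, not a model of any particular database; `page_after` and `page_from` are illustrative names.

```python
import bisect

PAGE_SIZE = 3


def page_after(names, last_key, size=PAGE_SIZE):
    """LICP: return the items strictly greater than the last key already seen."""
    start = bisect.bisect_right(names, last_key)
    return names[start:start + size]


def page_from(names, first_key, size=PAGE_SIZE):
    """FINP: return the items starting at (and including) the stored first key."""
    start = bisect.bisect_left(names, first_key)
    return names[start:start + size]


names = ["Adam", "Basil", "Claude", "Elon", "Francis", "Gilbert"]
first_page = names[:PAGE_SIZE]

licp_token = first_page[-1]    # last item of the current page: "Claude"
finp_token = names[PAGE_SIZE]  # first item of the next page: "Elon"

# Daniel is inserted after page 1 was served but before page 2 is requested.
bisect.insort(names, "Daniel")

licp_page = page_after(names, licp_token)
finp_page = page_from(names, finp_token)

print(licp_page)  # ['Daniel', 'Elon', 'Francis']
print(finp_page)  # ['Elon', 'Francis', 'Gilbert']
```

Note that computing `finp_token` required looking one item past the end of the first page, which is the extra retrieval mentioned above.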

YouTube API Search v3 - Start index?

I'm using the YouTube Search API to fetch 5 videos at a time for a specific keyword. I've tried repeatedly but couldn't find the parameter for the start index. Does anyone know how to add it, so that it gets the next 5 videos, and so on?
The URL I currently have:
https://www.googleapis.com/youtube/v3/search?part=snippet&q=wiz+khalifa&type=video&key=AIzaSyB9UW36sMDA9rja_J0ynSYVcNY4G25
In the results of your first query, you should get back a nextPageToken field.
When you make the request for the next page, you must send this value as the pageToken.
So you need to add pageToken=XXXXX to your query, where XXXXX is the value you received in nextPageToken.
Hope this helps
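A minimal Python sketch of that token hand-off, under stated assumptions: `search_pages` is an illustrative name, `fetch_json` is a hypothetical caller-supplied function that performs the request and returns the decoded JSON, and the results of each page are assumed to sit under an "items" key.

```python
SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"


def search_pages(params, fetch_json, max_pages=3):
    """Collect results page by page, passing nextPageToken back as pageToken.

    `fetch_json` is a hypothetical callable taking (url, params) and
    returning the decoded JSON response as a dict.
    """
    items = []
    page_token = None
    for _ in range(max_pages):
        query = dict(params)  # copy so the caller's dict is not modified
        if page_token:
            query["pageToken"] = page_token
        response = fetch_json(SEARCH_URL, query)
        items.extend(response.get("items", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            break  # no token means there is no further page
    return items
```

A call might pass `{"part": "snippet", "q": "wiz khalifa", "type": "video", "maxResults": 5, "key": "YOUR_API_KEY"}` as `params`; the `max_pages` cap is only there to keep the sketch from looping over a very long result set.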

Google custom search REST number of results (num field)

I'm trying to figure out how to force Google Custom Search to return 20 results per page.
I've tried sending this REST request with my new Custom Search Engine configured as:
Standard edition: Free, ads are required on results pages.
https://www.googleapis.com/customsearch/v1?key=AIzaSyCgGuZie_Xo-hOECNXOTKp5Yk7deryqro8&cx=015864032944730029962:5ipe0q27hgy&q=test&alt=json&num=20
IT DOES NOT WORK!
but
https://www.googleapis.com/customsearch/v1?key=AIzaSyCgGuZie_Xo-hOECNXOTKp5Yk7deryqro8&cx=015864032944730029962:5ipe0q27hgy&q=test&alt=json&num=10
IT WORKS!
But reading documentation at
https://developers.google.com/custom-search/docs/xml_results#numsp
it says that:
Optional. The num parameter identifies the number of search results to return.
The default num value is 10, and the maximum value is 20. If you request more than 20 results, only 20 results will be returned.
Note: If the total number of search results is less than the requested number of results, all available search results will be returned.
Has anyone experienced this problem?
PS: I've also tried sending that REST request with my new Custom Search Engine configured as:
Site Search: Starts at $100 per year, ads are optional on results pages.
But nothing changed: there is still no way to obtain 20 results in a single request/page.
This documentation URL describes each parameter. It also says that num is restricted to integers between 1 and 10, inclusive:
https://developers.google.com/custom-search/v1/using_rest#query-params
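Given that limit, one way to end up with 20 results is to send two requests of 10 each. The Python sketch below only computes the query parameters for those requests. It assumes the `start` parameter (the 1-based index of the first result to return) that the same parameter list describes; `page_requests` is an illustrative name, and `start` is not mentioned in the answer above.

```python
def page_requests(total, page_size=10):
    """Return the num/start pairs needed to cover `total` results.

    Assumption: `start` is the 1-based index of the first result to return.
    """
    if not 1 <= page_size <= 10:
        raise ValueError("num must be between 1 and 10, inclusive")
    requests = []
    start = 1
    remaining = total
    while remaining > 0:
        num = min(page_size, remaining)
        requests.append({"num": num, "start": start})
        start += num
        remaining -= num
    return requests


print(page_requests(20))
# [{'num': 10, 'start': 1}, {'num': 10, 'start': 11}]
```

Each dictionary would be merged into the other query parameters (key, cx, q, alt) of one request.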