Is it possible to retrieve an arbitrary number of items from Reddit using the API?

I am writing a small assistant app to read (well, filter/rank) /r/programming/ for me, because it has so many damn posts, and because a certain area of my coding skills was getting rusty and this sounded like a good exercise.
I am getting items from the "new" page of the subreddit using the JSON API; however, it only returns 25 items per request (the page size), so to retrieve items for, say, the last week, I need to make dozens of requests. As the mandated request interval is 2s, this is painful.
Is there some way to retrieve more items per request? Query string parameters for the standard HTML pages also work for the JSON endpoints, but I cannot find one for page size.
EDIT: for posterity, the parameter name is "limit", although that too is capped at 100.
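For anyone who finds this later, here is a minimal sketch of paging with limit and the after cursor (the User-Agent string is just a placeholder, and the 500-item cutoff is only for the example):

import time
import requests

headers = {"User-Agent": "my-filter-bot/0.1"}           # placeholder user agent
url = "https://www.reddit.com/r/programming/new.json"
after = None
posts = []

while len(posts) < 500:                                 # arbitrary cutoff for the example
    params = {"limit": 100}                             # the API caps this at 100
    if after:
        params["after"] = after                         # cursor returned by the previous page
    data = requests.get(url, headers=headers, params=params).json()["data"]
    posts.extend(child["data"] for child in data["children"])
    after = data["after"]
    if after is None:                                   # no more pages
        break
    time.sleep(2)                                       # respect the mandated request interval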


Related

REST: Fetching data with GET not possible due to exceeding the header size limit

I am having a dilemma. I need to fetch data for some products by ID; the number of selected products can vary from a couple to thousands.
I have seen and tested that GET is not possible because the request exceeds the HeaderSizeLimit of 8192.
After discussions with colleagues I changed to POST, with the IDs in the body. Everything works, but there is still a lot of debate about it. Have you encountered something like this? What was your approach?
The first question for me is: do you really need to pass all those IDs in a single request? How is this list of IDs generated in the first place? Could the server know the list in advance?
For example, if the list of IDs is obtained by doing a search query on the same server, perhaps that search query can already emit the list of entities.
I find that in most cases this can be avoided, but there are some exceptions.
If you find that you can't avoid it, I would suggest using the new HTTP QUERY method instead of POST, but POST should be fine as a fallback too.
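For what it's worth, a minimal sketch of the POST fallback, assuming a hypothetical /products/search endpoint that accepts a JSON body of IDs (the URL and field names are placeholders, not a real API):

import requests

ids = list(range(1, 5001))                          # could be a couple of IDs or thousands
resp = requests.post(
    "https://api.example.com/products/search",      # placeholder endpoint
    json={"ids": ids},                              # IDs travel in the body, so no URL/header length limit
)
resp.raise_for_status()
products = resp.json()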

Does it make sense to define a GET /users/{id}/photos route or should I just send multiple GET /photos/{id} requests to return a user's photos?

This is a dilemma I find myself facing very often when dealing with nested resources.
So suppose the target user has n photos. Does it make sense to define a GET /users/{id}/photos route or should I just send n GET /photos/{id} requests by first requesting the User object and then looping through that User's photo_ids attribute?
From a best-practices perspective, I would not send several requests for such similar resources. In that scenario you end up creating more work for yourself by having to rate-limit on the backend, in order to keep users with a lot of pictures from blowing up your server. That would hurt your UX as well.
I recommend you format your route as such:
GET /users/photos?id={{id}}
And return all associated photos with that user ID all at once. You can always limit that to X number of photos per call too, and paginate:
GET /users/photos?id=658&page={{1,2,3, etc.}}
My personal preference is always to try to keep variable data in the URL parameters. Having spent the afternoon with several unrelated APIs, I can tell you a whole slew of developers agree.
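A minimal server-side sketch of that route, assuming Flask and a hypothetical photos_for_user() data-access helper (both are placeholders for whatever stack you actually use):

from flask import Flask, request, jsonify

app = Flask(__name__)
PAGE_SIZE = 25                                      # photos returned per call

def photos_for_user(user_id, offset, limit):
    # Hypothetical data-access helper; replace with a real query against your datastore.
    return []

@app.get("/users/photos")
def user_photos():
    user_id = request.args.get("id", type=int)
    page = request.args.get("page", default=1, type=int)
    # Only one page of photos is loaded per request.
    photos = photos_for_user(user_id, offset=(page - 1) * PAGE_SIZE, limit=PAGE_SIZE)
    return jsonify({"user_id": user_id, "page": page, "photos": photos})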

How do I search this? Is it possible to access more than 100 JSON API search results if I pay for it?

How do I search this?
I want to be able to:
1. create a search engine
2. programmatically search it through an API (Python, or another language)
3. paginate through the results (all of them, if I choose)
4. store the URLs or results that I want.
Is this even possible with Google Custom Search Engine?
I have enabled billing, my credit card is up to date with Google, and I can do steps 1..3 above.
On a search I will get back 4,000 results, for example, but I can only access 10 at a time with the API, no more, and when I reach 100 results I am cut off.
I want to be able to process 1000 results if I wish.
Before you reply, do you personally have working code that goes beyond the 100 limit?
If so, I would be very interested in speaking with you and learning how you did it.
I am using Python at the moment, but it could be any language.
--
I tried using &start=100, 200, and so on to paginate through, but this does not work.
I tried getting 100 results in a Python script, ending the program, then calling it again with start=100 (after the first set returned), and nothing happened.
I want to be able to use the Google Custom Search API and pay Google for a monthly subscription, but I have not found that this is possible.
For any given search, I want to decide how many results to process (could be 1K, could be 20K); I simply want access to the full result set, but I have not found a way to do this.
The API allows only a max result depth of 100. See https://developers.google.com/custom-search/v1/cse/list
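For reference, a minimal sketch of paging through the first (and only reachable) 100 results with the Custom Search JSON API, assuming you already have an API key and engine ID (cx); start is 1-based and the API rejects values past 100:

import requests

API_KEY = "YOUR_API_KEY"                    # from the Google Cloud console
CX = "YOUR_SEARCH_ENGINE_ID"                # the custom search engine's cx value
QUERY = "example query"

results = []
for start in range(1, 100, 10):             # 1, 11, 21, ..., 91 -> at most 100 results
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": API_KEY, "cx": CX, "q": QUERY, "num": 10, "start": start},
    )
    resp.raise_for_status()
    items = resp.json().get("items", [])
    if not items:                           # fewer results available than requested
        break
    results.extend(item["link"] for item in items)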

What is the difference between parsing a betting website for live scores vs. using the official website API?

I want to monitor live scores for some soccer matches. I have two ways to do this:
the official API from the website (free)
parse the website's source code myself and get the data from it (I need to do it every second)
What is the difference? Is calling the API faster?
This can depend on a lot of things external to this specific scenario, but given the context, yes, the API would be much faster. The difference is in what data is being sent, received, and parsed.
In either scenario you'd need some timer to tick and parse the results (website or API), so there's no performance difference in the "wait code", but the big difference will be in the data itself that is parsed. When you call the API, chances are you will send a specific parameter or call a specific function that indicates what you're looking for, pseudo-code example:
SoccerSiteApi.GetValue(SCORE, team1, team2);
Or
SoccerSiteApi.GetCurrentScores(team1, team2);
By calling the API, you are only sending and receiving a few hundred bytes (or more depending on data) and getting back exactly what you want, that is, you don't need to parse the scores out of the values sent back since they are the scores, so no processing time is spent doing anything additional with the data itself.
If, however, you were to parse the entire web site, you would need to make an HTTP GET request (and all that entails) to get the entire page (which could be a couple hundred KB or MB depending on content) and then spend processing time extracting the exact data you were looking for, and then doing this every second.
So the biggest difference is amount of data and time spent processing it.
Hope that helps.
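To make the contrast concrete, here is a rough Python sketch of the two polling approaches; the API URL, JSON fields, and HTML element class are all hypothetical placeholders:

import time
import requests
from bs4 import BeautifulSoup                         # only needed for the scraping variant

def poll_api(team1, team2):
    # Hypothetical scores API: a small JSON payload containing exactly the data we want.
    resp = requests.get("https://api.example.com/scores",
                        params={"home": team1, "away": team2})
    data = resp.json()
    return data["home_score"], data["away_score"]

def poll_scrape(match_url):
    # Scraping variant: download the whole page (possibly hundreds of KB),
    # then spend CPU time extracting the score out of the markup.
    html = requests.get(match_url).text
    soup = BeautifulSoup(html, "html.parser")
    score = soup.find("span", class_="live-score")    # hypothetical element
    return score.text if score else None

while True:
    print(poll_api("team1", "team2"))                 # or poll_scrape(...)
    time.sleep(1)                                     # the "wait code" is the same either way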

Product catalogue storage in MongoDB from an RDBMS perspective

I have a product page with a URL of the form http://host/products/{id}/{seo-friendly-url}, where the "seo-friendly-url" part may be something like category/subcategory/product.
The products controller gets the product with the specified ID and then ensures that the URL that follows is correct for the product - if it isn't, the user is redirected to the appropriate URL (all URLs in the shop are generated correctly; the redirect just maintains a canonical URL in case of mistyping by the user or the URL changing since Google crawled it, etc.). The ID ensures fast product look-up, and the part on the end ensures keywords make it into the URL.
To check the URL, I have a SQL view which utilises a recursive common table expression to concatenate the product URL chunk with the URLs of its parent category URLs all the way up the hierarchy (generally just 3 deep).
I've recently come across document-oriented storage and I can see it being very useful in a variety of situations (for example, my product entities currently have tags and multibuy prices and attributes etc. all in different tables).
So on to my question - how can I achieve the above functionality in mongoDB, or is there a better way to think about it? The naive way would be to retrieve each category in the hierarchy individually, but I'm assuming that would be slow.
Related: I've read in the docs that skip/limit is slow for large result sets - would this be noticeable for, say, a maximum of 10 pages of 25 products each, as is likely in a retail website category?
I think your best option is to just store the full slug with the product. Then when you get the product just check to see if the slug matches and if not, redirect. Now, the trade-off is that if you want to rename a category you will need to do a batch job to find all products in the category and change their slugs. The good news is that category renames will be much less common than views (hopefully) so your total load will be reduced.
Not sure how skip and limit are related to this question, except that they both involve MongoDB. Anyway, for 25 results it's really no problem. Limit isn't slow and in fact can speed things up if it's less than 100 (the default first batch size). Skip can hurt performance, but only by making the query as slow as if you had fetched all the skipped documents, without the extra network traffic. So I wouldn't skip 1 million docs, but skipping 100 would be fine.
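A minimal PyMongo sketch of the slug-check-and-redirect idea (the collection name, field names, and return convention are assumptions of this sketch, not anything from the original post):

from pymongo import MongoClient

db = MongoClient()["shop"]                            # assumed database name

def resolve_product_url(product_id, requested_slug):
    # Each product document stores its full pre-computed slug,
    # e.g. "category/subcategory/product".
    product = db.products.find_one({"_id": product_id})
    if product is None:
        return ("not_found", None)
    if product["slug"] != requested_slug:
        # Mistyped or outdated URL: the controller should issue a 301 to this path.
        return ("redirect", f"/products/{product_id}/{product['slug']}")
    return ("ok", product)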
You can model a collection called products, with documents like:
{ id: someId, category: someCategory, subcategory: someSubCategory, productSlug: someNameSlug }
The query to get the product given the id, category and subcategory would be something like:
db.products.find({ id: 123, category: cat, subcategory: subcat })
This sounds pretty simplistic, but given my understanding of your question, IMO it should be a good start.
For your other question, there are skip and limit modifiers to help with pagination.
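A quick PyMongo sketch of that pagination for a category page (field names follow the document shape above; 25 products per page is just the example size):

from pymongo import MongoClient

db = MongoClient()["shop"]
PAGE_SIZE = 25

def category_page(category, subcategory, page=1):
    # skip/limit is fine at this scale: skipping a few hundred documents is cheap;
    # it only becomes a problem when skipping very large counts.
    cursor = (db.products
              .find({"category": category, "subcategory": subcategory})
              .skip((page - 1) * PAGE_SIZE)
              .limit(PAGE_SIZE))
    return list(cursor)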