What is the difference between parsing a betting website for live scores vs the official website API?

I want to monitor live scores for some soccer matches. I have two ways to do this:
the official API from the website (free)
parse the website's source code myself and extract the data from it (I'd need to do this every second)
What is the difference? Is calling the API faster?

This can depend on quite a lot outside this specific scenario, but given the context, yes, the API would be much faster. The difference is in what data is being sent, received, and parsed.
In either scenario you'd need some timer to tick and parse the results (website or API), so there's no performance difference in the "wait code"; the big difference will be in the data itself that is parsed. When you call the API, chances are you will send a specific parameter or call a specific function that indicates what you're looking for. Pseudo-code example:
SoccerSiteApi.GetValue(SCORE, team1, team2);
Or
SoccerSiteApi.GetCurrentScores(team1, team2);
By calling the API, you are only sending and receiving a few hundred bytes (or more, depending on the data) and getting back exactly what you want. That is, you don't need to parse the scores out of the values sent back, since they already are the scores, so no processing time is spent doing anything additional with the data itself.
If, however, you were to parse the entire website, you would need to make an HTTP GET request (and all that entails) to fetch the entire page, which could be a couple of hundred KB or even MB depending on content, and then spend processing time extracting the exact data you are looking for, and then do all of this every second.
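To make the contrast concrete, here is a rough Python sketch of both approaches. Everything in it is illustrative: the URLs, parameters, and response shapes are made up, since a real provider documents its own.

import requests
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# API approach: a tiny request whose JSON response already is the data.
api_response = requests.get(
    "https://api.example-scores.com/v1/score",   # hypothetical endpoint
    params={"home": "team1", "away": "team2"},
)
score = api_response.json()  # e.g. {"home": 2, "away": 1}, nothing to extract

# Scraping approach: fetch the whole page, then dig the score out of the HTML.
page = requests.get("https://www.example-scores.com/live/team1-vs-team2").text
soup = BeautifulSoup(page, "html.parser")        # parse hundreds of KB of markup
score_node = soup.select_one("span.live-score")  # selector is site-specific

The scraping version downloads and parses the full page markup every second just to read a few bytes of score.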
So the biggest difference is amount of data and time spent processing it.
Hope that can help

Related

Best HTTP method for a stateless REST API service call

I want to make a REST API that does spellchecking on text that is passed in, without storing any of the text on the server.
The call would probably look something like `example.com/api/v1/spelling/mistakes`, with an optional query param for the locale and a list of the mistakes as the return value.
What would be the best HTTP method to use, given that the text passed in would be too large for a GET? Neither POST, PUT, nor PATCH seems to map reasonably to the intended purpose, and there don't seem to be any other suitable matches among the less commonly used methods either.
What is the best HTTP method to use for a "translation"-like REST API service, taking and returning large amounts of data?
I would say this is a POST. It could have been a GET if the data had previously been posted: for example, if the data had been 'posted' somewhere else first, then a GET could be used, where the address (URI) or ID of that 'posted' data is passed to the API as a param. But because we are both 'posting' the data and retrieving information about it in the same call, this is a POST. Granted, the data being posted has a short life span, but it is still being posted. If the data being posted were instead a customer order, it would still be a POST, but the data would be persisted somewhere; the difference here is only the short period of time the data exists for. And in future iterations of your API you might actually want to keep that data and refer back to it with some ID, so by using POST you allow for future enhancements as well.
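As a concrete illustration, a client call might look like the following Python sketch. The endpoint comes from the question, while the request and response shapes are assumptions about how such an API could be designed.

import requests

response = requests.post(
    "https://example.com/api/v1/spelling/mistakes",
    params={"locale": "en-US"},                      # optional locale query param
    json={"text": "Thiss sentence has a mistaek."},  # the text rides in the body
)
mistakes = response.json()  # assumed shape: [{"word": "Thiss", "offset": 0}, ...]

Carrying the text in the request body is exactly what sidesteps the GET size limits mentioned in the question.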
By the way, as a precaution, be careful with the memory footprint of these calls. I can see this becoming very memory intensive if the data being passed grows large and the API becomes very popular. Not a show stopper, but something to consider when designing it.
Hope that helps alleviate what I call REST anxiety when designing an API.

Many requests in an API vs Many separate requests

I am making an application based on the Google Maps API. This requires requesting the distance between two cities. Now I want the distances between many cities.
So should I use a "for loop" and make many requests separately, or should I send all the city names in one request? Which one will work faster? And which one is better?
For sure you should avoid sending multiple requests, because each roundtrip to a server takes time.
However when you are grouping many requests this can also take a long time (both to send, and to process on the server), and affect the user experience (long waiting time).
In your case I suspect that the "for loop" will not lead to a lot of data, and server-side processing will also not be too heavy, so sending a single grouped request should be the way to go.
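For example, the Distance Matrix web service accepts several origins and destinations in one call, so the whole set of city pairs can be covered by a single request. A minimal Python sketch (parameter names follow Google's documentation; YOUR_API_KEY is a placeholder):

import requests

cities = ["Berlin", "Paris", "Madrid", "Rome"]
response = requests.get(
    "https://maps.googleapis.com/maps/api/distancematrix/json",
    params={
        "origins": "|".join(cities),       # many origins in one request...
        "destinations": "|".join(cities),  # ...against many destinations
        "key": "YOUR_API_KEY",
    },
)
matrix = response.json()  # rows[i].elements[j].distance covers every city pair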
You can use the DirectionsService provided by Google in the Maps JavaScript API v3. You can find the distance between many cities: each request takes one origin point, one destination point, and up to 8 waypoints (10 places in total), and it returns a JSON response containing all the information (distance in km, value in meters, city names, and a lot more). Please check this link: https://developers.google.com/maps/documentation/javascript/directions. I hope this answer meets your requirement.
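A sketch of that request against the Directions web service, in Python for consistency with the example above (parameter names per Google's documentation; YOUR_API_KEY is again a placeholder):

import requests

response = requests.get(
    "https://maps.googleapis.com/maps/api/directions/json",
    params={
        "origin": "Delhi",
        "destination": "Mumbai",
        "waypoints": "|".join(["Jaipur", "Ahmedabad"]),  # up to 8 waypoints
        "key": "YOUR_API_KEY",
    },
)
legs = response.json()["routes"][0]["legs"]              # one leg per hop
total_metres = sum(leg["distance"]["value"] for leg in legs)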

WCF Services, what's better, one big request or a lot of little ones?

I'm reviewing some code where we've had some issues with return data from a WCF web service. Currently the service makes a list of objects, serializes them (as JSON, for the record) and returns the entire serialized list down the wire. Obviously, when there's a lot of data, users run into quota limit problems.
I'm considering changing it so the service returns one item at a time; the client would then make a bunch of requests in a loop, adding one object at a time to the list until it's done.
Obviously in scenario one we're making one request to the service that has the potential to return a massive amount of data and run up against the quota. In the other scenario we never hit the quota but the requesting app will be requesting data item after data item in a stream of separate requests.
To illustrate: we have a list of items which come in a variety of item types, and those types come at a variety of price points. The app might want to aggregate a number of items, the customers who want each item, and the types of item and prices requested by each customer. There could be maybe seventy items, with between five and eighty customers each requesting on average 2 types of product at 1 price each.
Taking averages at the extreme end, this could mean 7000 separate (very small) data requests in a single complete job. Is that a problem? It is possible to package things up a bit, so that customer types and prices requested are bundled, but that still leaves potentially a couple of thousand requests at a time.
Am I better off with a single huge data stream? Or a couple of thousand smaller ones?
You're better off with the optimally sized return for your scenario :) It kinda depends on the overhead of each request. Generally, the less chatter back and forth to a web service, the better.
Facetious answer, so here's the rub:
You're probably best off with some sort of paging system, wherein your request asks for a specific number of items, and your response returns a "n of m" in the results. That way you can tune the number of requests and size of the response to perform best in your situation.
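A minimal client-side sketch of that paging idea, assuming a hypothetical endpoint that accepts page/size parameters and reports the total count:

import requests

def fetch_all_items(base_url, page_size=100):
    # page_size is the tuning knob: larger pages mean fewer round trips,
    # smaller pages keep each response safely under the quota.
    items, page = [], 0
    while True:
        resp = requests.get(base_url, params={"page": page, "size": page_size})
        payload = resp.json()  # assumed shape: {"total": 7000, "items": [...]}
        items.extend(payload["items"])
        if len(items) >= payload["total"]:
            return items
        page += 1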

Facebook Graph API

When I use the Facebook API to retrieve post information, I find that the returned information changes all the time.
For example, when I retrieved the information twice with a one-minute interval, one record appeared the first time and had disappeared the second time.
https://graph.facebook.com/search?q=baby&type=post&limit=100&since=2010-05-19&until=2010-05-21
Does anyone know what is happening?
Cheers,
LingChen
Searches of large datasets are (in general) nondeterministic.

How to skip known entries when syncing with Google Reader?

For writing an offline client for the Google Reader service, I would like to know how best to sync with the service.
There doesn't seem to be official documentation yet and the best source I found so far is this: http://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI
Now consider this: With the information from above I can download all unread items, I can specify how many items to download and using the atom-id I can detect duplicate entries that I already downloaded.
What's missing for me is a way to specify that I just want the updates since my last sync.
I can ask for the 10 latest entries (parameter n=10, parameter r=d). If I specify the parameter r=o (date ascending) then I can also specify the parameter ot=[last time of sync], but only then, and the ascending order doesn't make any sense when I just want to read some items rather than all items.
Any idea how to solve this without downloading all items again and just rejecting duplicates? That's not a very economical way of polling.
Someone proposed that I could specify that I only want the unread entries. But to make that solution work in such a way that Google Reader will not offer these entries again, I would need to mark them as read. In turn, that would mean that I need to keep my own read/unread state on the client and that the entries are already marked as read when the user logs on to the online version of Google Reader. That doesn't work for me.
Cheers,
Mariano
To get the latest entries, use the standard from-newest-date-descending download, which will start from the latest entries. You will receive a "continuation" token in the XML result, looking something like this:
<gr:continuation>CArhxxjRmNsC</gr:continuation>
Scan through the results, pulling out anything new to you. You should find that either all results are new, or everything up to a point is new, and all after that are already known to you.
In the latter case, you're done, but in the former you need to find the new stuff older than what you've already retrieved. Do this by using the continuation to get the results starting from just after the last result in the set you just retrieved by passing it in the GET request as the c parameter, e.g.:
http://www.google.com/reader/atom/user/-/state/com.google/reading-list?c=CArhxxjRmNsC
Continue this way until you have everything.
The n parameter, which is a count of the number of items to retrieve, works well with this, and you can change it as you go. If the frequency of checking is user-set, and thus could be very frequent or very rare, you can use an adaptive algorithm to reduce network traffic and your processing load. Initially request a small number of the latest entries, say five (add n=5 to the URL of your GET request). If all are new, in the next request, where you use the continuation, ask for a larger number, say 20. If those are still all new, either the feed has a lot of updates or it's been a while, so continue on in groups of 100 or whatever.
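Here is a rough Python sketch of that loop, assuming an authenticated requests session and an is_known() helper backed by your local database. The n and c parameters and the continuation element come from the unofficial API notes linked in the question, so treat the exact names as assumptions.

import xml.etree.ElementTree as ET

FEED_URL = "http://www.google.com/reader/atom/user/-/state/com.google/reading-list"
ATOM = "{http://www.w3.org/2005/Atom}"
GR = "{http://www.google.com/schemas/reader/atom/}"  # assumed namespace

def fetch_new_entries(session, is_known):
    new_entries, n, continuation = [], 5, None    # start small, grow adaptively
    while True:
        params = {"n": n}
        if continuation:
            params["c"] = continuation            # resume just past the last batch
        feed = ET.fromstring(session.get(FEED_URL, params=params).content)
        for entry in feed.findall(ATOM + "entry"):
            if is_known(entry.findtext(ATOM + "id")):
                return new_entries                # reached known territory: done
            new_entries.append(entry)
        continuation = feed.findtext(GR + "continuation")
        if continuation is None:
            return new_entries                    # feed exhausted
        n = min(n * 4, 100)                       # everything was new: ask for more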
However, and correct me if I'm wrong here, you also want to know, after you've downloaded an item, whether its state changes from "unread" to "read" due to the person reading it using the Google Reader interface.
One approach to this would be (there is a code sketch after the steps below):
Update the status on google of any items that have been read locally.
Check and save the unread count for the feed. (You want to do this before the next step, so that you guarantee that new items have not arrived between your download of the newest items and the time you check the read count.)
Download the latest items.
Calculate your read count, and compare that to google's. If the feed has a higher read count than you calculated, you know that something's been read on google.
If something has been read on google, start downloading read items and comparing them with your database of unread items. You'll find some items that google says are read that your database claims are unread; update these. Continue doing so until you've found a number of these items equal to the difference between your read count and google's, or until the downloads get unreasonable.
If you didn't find all of the read items, c'est la vie; record the number remaining as an "unfound unread" total which you also need to include in your next calculation of the local number you think are unread.
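In code form, the whole reconciliation might look like this Python sketch. Every helper on the hypothetical db and api objects stands in for one local-database query or one Reader API call and would need to be implemented.

def sync_read_state(db, api, scan_limit=500):
    api.mark_read(db.locally_read_ids())               # step 1: push local reads first
    google_unread = api.unread_count()                 # step 2: snapshot before fetching
    db.store_new_items(api.latest_items())             # step 3: download the newest items
    missing = db.local_unread_count() - google_unread  # step 4: reads that happened on the web
    scanned = 0
    for item in api.read_items_newest_first():         # step 5: walk google's read items
        if missing <= 0 or scanned >= scan_limit:
            break                                      # found them all, or gave up
        scanned += 1
        if db.is_unread(item.id):
            db.mark_read(item.id)
            missing -= 1
    db.record_unfound_unread(missing)                  # step 6: remember the leftovers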
If the user subscribes to a lot of different blogs, it's also likely he labels them extensively, so you can do this whole thing on a per-label basis rather than for the entire feed, which should help keep the amount of data down, since you won't need to do any transfers for labels where the user didn't read anything new on google reader.
This whole scheme can be applied to other statuses, such as starred or unstarred, as well.
Now, as you say, this
...would mean that I need to keep my own read/unread state on the client and that the entries are already marked as read when the user logs on to the online version of Google Reader. That doesn't work for me.
True enough. Neither keeping a local read/unread state (since you're keeping a database of all of the items anyway) nor marking items read in google (which the API supports) seems very difficult, so why doesn't this work for you?
There is one further hitch, however: the user may mark something read as unread on google. This throws a bit of a wrench into the system. My suggestion there, if you really want to try to take care of this, is to assume that the user in general will be touching only more recent stuff, and download the latest couple hundred or so items every time, checking the status on all of them. (This isn't all that bad; downloading 100 items took me anywhere from 0.3s for 300KB, to 2.5s for 2.5MB, albeit on a very fast broadband connection.)
Again, if the user has a large number of subscriptions, he's also probably got a reasonably large number of labels, so doing this on a per-label basis will speed things up. I'd suggest, actually, that not only do you check on a per-label basis, but you also spread out the checks, checking a single label each minute rather than everything once every twenty minutes. You can also do this "big check" for status changes on older items less often than you do a "new stuff" check, perhaps once every few hours, if you want to keep bandwidth down.
This is a bit of a bandwidth hog, mainly because you need to download the full article from Google merely to check the status. Unfortunately, I can't see any way around that in the API docs that we have available to us. My only real advice is to minimize the checking of status on non-new items.
The Google Reader API hasn't been officially released yet; when it is, this answer may change.
Currently, you would have to call the API and disregard items already downloaded, which, as you said, isn't terribly efficient, as you will be re-downloading items every time even if you already have them.