Suggestions to log the number of requests when building an API? [closed] - sql

We have a pretty large website, which handles a couple million visitors a month, and we're currently developing an API for some of our partners. In the near future, we hope to expose the API to all of our visitors.
As we are trying to limit the number of requests to, say, 100,000/day and 1,000/minute, I need to log the total number of requests per API key and verify that the API user doesn't exceed this total. We won't check in real time whether the limit is exceeded at this point, but afterwards in the site's control panel. We also need to display a timeline per user in the control panel, so we need to be able to get a quick per-day or per-hour overview.
My first idea was to build the following app:
API user => webserver => posts a message with the API key to a message queue => a service picks up the message => posts to the database, where there is one row for each user-hour combination (key | hour | count). That would be quite fast, but we'd throw away quite a bit of useful information (individual queries, requests per minute, etc.). On the other hand, saving each and every request as a separate record in the database will likely generate millions of records a day, and will (I guess, I'm not much of a DBA) be quite slow when generating charts, even with the correct indices.
Our platform consists of around ten web servers, ten front-end SQL servers, a stats server, and some other servers for processing large tasks. All Windows (except our EMC), and MS SQL. The dev platform is ASP.NET WCF.

My advice would be to log everything - a simple append-only text file is simplest - and have a background task periodically read and summarize log segments into the database. This has several advantages over more 'sophisticated' approaches:
It's simpler.
It's really easy to debug.
You can keep individual log segments around until you need to delete them from disk, so you can get information on individual requests for debugging and accounting purposes.
You can easily extend it to collect more information, or improve and change your summarizer, because the components are loosely coupled.
It's easy to shard - just have each server keep its own logs.
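
Roughly, the two halves might look like the following Python/SQLite sketch, standing in for whatever you'd actually run on your web servers and MS SQL (the file name, table name, and hourly bucket format are just placeholders):

```python
import sqlite3
from collections import Counter
from datetime import datetime

LOG_FILE = "api_log.txt"  # one append-only file per web server (placeholder name)

def log_request(api_key: str, path: str) -> None:
    """Append one line per request: timestamp, API key, request path."""
    line = f"{datetime.utcnow().isoformat()}\t{api_key}\t{path}\n"
    with open(LOG_FILE, "a", encoding="utf-8") as f:
        f.write(line)

def summarize(log_path: str, db_path: str = "usage.db") -> None:
    """Background task: roll raw log lines up into per key/hour counts."""
    counts = Counter()
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            ts, api_key, _path = line.rstrip("\n").split("\t")
            hour = ts[:13]  # e.g. '2011-05-09T14' -> one bucket per key per hour
            counts[(api_key, hour)] += 1

    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS api_usage (
                       api_key TEXT, hour TEXT, count INTEGER,
                       PRIMARY KEY (api_key, hour))""")
    for (api_key, hour), n in counts.items():
        # Upsert syntax needs SQLite >= 3.24; your MS SQL equivalent would be MERGE.
        con.execute("""INSERT INTO api_usage (api_key, hour, count) VALUES (?, ?, ?)
                       ON CONFLICT(api_key, hour) DO UPDATE
                       SET count = count + excluded.count""",
                    (api_key, hour, n))
    con.commit()
    con.close()
```

Each web server just appends lines; the summarizer can run wherever and whenever you like, and the raw segments stay on disk for debugging until you decide to delete them.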

I'd start with logging at first and leave off enforcement. Your logging may show you that you don't need enforcement, or it may show you that you need a different kind of enforcement.
I'd just start off by creating a simple logging API: ApiLogger.Log(apiKey). I'd have the logger take authentication information etc. from HttpContext. I'd start by just dumping it into a database table, and only get fancier if performance requires it.
Later analysis could determine who is making how many calls, whether you want multiple tiers, charging different amounts per tier, etc. But for the moment, just store the data that your Business people will need.
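
For what it's worth, the "just dump it into a table" starting point is only a few lines; here's a Python/SQLite sketch of the idea (the real thing would be your ApiLogger in C# writing to SQL Server, and the table and column names here are made up):

```python
import sqlite3
from datetime import datetime

def log_api_call(db_path: str, api_key: str, caller: str, endpoint: str) -> None:
    """Dump one raw row per request; summarize or enforce later if ever needed."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS api_calls (
                       called_at TEXT, api_key TEXT, caller TEXT, endpoint TEXT)""")
    con.execute("INSERT INTO api_calls VALUES (?, ?, ?, ?)",
                (datetime.utcnow().isoformat(), api_key, caller, endpoint))
    con.commit()
    con.close()
```

The per-key totals mentioned above then fall out of a plain GROUP BY over this table.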

As we are trying to limit the number of requests to, say, 100,000/day and 1,000/minute, I need to log the total number of requests per API key, and verify that the API user doesn't exceed this total.
A feature like this will be part of WCF (if not already) in the very near future. I am currently racking my brain on where I heard it so I can point you in the right direction.
EDIT: FOUND IT!
This week on a podcast called "The Thirsty Developer", this very topic came up. Download the podcast here; at 39:40 into the podcast the topic comes up. For those who do not want to listen, there is a REST toolkit that has this feature built in. I think the toolkit can be found here.

Related

YouTube API Services Compliance Review

I have a project where I need to have the API quota increased significantly from the 10,000 daily hits, and I think this is being processed by Google as part of a YouTube API Services Compliance Review.
However, I have not had any response in over a week and the delay is putting the project at risk of a delayed launch and additional costs.
Does anyone know if this is normal and if there is a way to expedite the review, or speak to someone? Even pay for a higher tier of support?
Thanks in advance.
If you've filled in the audit form https://support.google.com/youtube/contact/yt_api_form?hl=en properly, you should get a response within two weeks. Google reviews thousands of these and, partly to prevent abuse, this is one of the processes that isn't fully automated.
If you're in a rush, since you're paying for credits you might as well open a second account and load-balance between two or even three accounts; in your code you can keep counters and swap keys before hitting the 24-hour cap. I'm not sure what data you're looking to extract, but depending on what it is, you may even be able to use other services to supplement it.
They will get back to you about your application; it just requires massive patience.

What is a good strategy for polling Microsoft Graph API so that I don't get throttled?

My goal is to maintain "real-time" (or as close as possible) information about the email messages sent by a group of users.
My ideal solution would be to periodically query the API for messages by all users in the group. This feature is not (yet?) implemented.
My second choice would be to create subscriptions (https://graph.microsoft.io/en-us/docs/api-reference/v1.0/api/subscription_post_subscriptions) for every member in the group and then request message information after I become aware of an event. The problem is, in practice, I am only allowed to create 20-30 simultaneous subscriptions (Issues to use Webhook for Microsoft Graph API), which might not be enough.
So I'm stuck with polling all the users in a cycle. The main problem with this approach is that I can't find any information on how many requests are "too many", i.e. when I get throttled. I want to maximize the number of requests to minimize the time it takes to get through one cycle.
A solution that comes to mind is to develop an adaptive program that slowly decreases the time between requests until throttled, then abruptly adds some time to it until a nice balance is found and maintained. This seems like a lot of work though. Right now I'm working on the assumption that 1/second is about the highest I can safely go (0.5 seconds on average per round trip, then a cool down of another 0.5 seconds).
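To make that concrete, here is a rough sketch (Python with the requests library) of the adaptive loop I have in mind: back off on HTTP 429, honouring a Retry-After header if the server sends one, and creep back up while calls succeed. The endpoint, token handling, and tuning constants are just placeholders:

```python
import time
import requests

GRAPH = "https://graph.microsoft.com/v1.0"

def poll_messages(user_ids, access_token, min_delay=0.1, max_delay=10.0):
    """Cycle over users forever, adapting the delay between requests.

    Speed up gently while calls succeed; back off hard on HTTP 429.
    """
    delay = 1.0  # start at the 1 request/second guess from above
    headers = {"Authorization": f"Bearer {access_token}"}
    while True:
        for uid in user_ids:
            resp = requests.get(
                f"{GRAPH}/users/{uid}/messages",
                headers=headers,
                params={"$top": 10, "$orderby": "receivedDateTime desc"},
            )
            if resp.status_code == 429:
                # Throttled: honour Retry-After if present, then slow the loop down.
                retry_after = float(resp.headers.get("Retry-After", delay * 2))
                time.sleep(retry_after)
                delay = min(max_delay, delay * 2)
            else:
                resp.raise_for_status()
                handle_messages(uid, resp.json().get("value", []))
                delay = max(min_delay, delay * 0.95)  # creep back up while healthy
            time.sleep(delay)

def handle_messages(uid, messages):
    # Placeholder: compare against what has already been seen for this user.
    pass
```

It's still guesswork about where the ceiling is, but at least it adapts instead of hard-coding one request per second.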
What is the best way to deal with an unknown throttling limit in general, and Microsoft Graph in particular?
Edit:
While I think the accepted answer is a good response, it might not be suitable for all cases. For instance, if you don't want to use the 365 API, and you don't mind using beta features, perhaps you might check out this (delta tokens), which seem to be designed for real-time syncing with the data.
The only potential downside with the accepted answer is that you still need a subscription for each user you want to track (I think...), and there are limits on those. Very curious as to how other people tackled this problem.
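In case it helps anyone later, a minimal sketch of the delta-token flow I'm referring to (assuming the inbox messages delta endpoint, which was in beta when I looked, and an already-acquired access token; the exact path may differ between beta and v1.0):

```python
import requests

def sync_inbox(access_token, delta_link=None):
    """One round of delta sync: returns (changed_messages, next_delta_link)."""
    headers = {"Authorization": f"Bearer {access_token}"}
    url = delta_link or (
        "https://graph.microsoft.com/v1.0/me/mailFolders/inbox/messages/delta"
    )
    changed = []
    while True:
        resp = requests.get(url, headers=headers)
        resp.raise_for_status()
        data = resp.json()
        changed.extend(data.get("value", []))
        if "@odata.nextLink" in data:       # more pages in this round
            url = data["@odata.nextLink"]
        else:                               # done; keep the deltaLink for next time
            return changed, data["@odata.deltaLink"]
```

On the next poll you pass the stored deltaLink back in and only receive what changed since, which avoids guessing at a safe rate for full re-reads.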
While still in Preview, you may want to take a look at Outlook Streaming Notifications. These APIs provide a more robust notification model than simple web hooks. You would still need to establish multiple subscriptions but I expect you'll see far less latency.

API call request limit

I have been looking into various APIs which can provide me the weather data I need in JSON format. A lot of these APIs have limits such as: in order to get more requests per minute, you need to pay more money per month.
However, a lot of these APIs also have a free account which gives you limited access to them.
So what I was thinking is, wouldn't it be possible for a developer to just make lots of different developer accounts with an API provider and then just make lots of different API keys?
That way, they wouldn't have to pay anything as they could stick with the free accounts. Whenever one of the API keys has reached the maximum daily request calls, the developer could just put a switch statement in their code which gets their software to use a different API key.
I see no reason why this wouldn't work from a technical point of view... but, is such a thing allowed?
Thanks, Dan.
This would technically be possible, and it happens.
It is also probably against the service's terms, a good reason for the service to ban all your sock puppet accounts, and perhaps even illegal.
If the service that offers the API has spent time and money implementing a per-developer limit for their API, they have almost certainly enforced that in their terms of service, and you would be wise to respect those.
(relevant xkcd)

"reasonable" use of web APIs to sync data

My goal is to synchronize a web-application with an internal database. The web-application has a public API, but in order to fully synchronize the two sources I would need to make around 2000 separate API calls every time. My instinct tells me that this is excessive and possibly irresponsible, but I lack the experience to know for sure.
In this particular case the web-application is Asana, but I've encountered similar situations before with other services. Is there any way to know if you're abusing a service through excessive API calls? I know I'm not going to DOS a company like Asana, but I can't shake the feeling that there must be a better way than making ~150k requests per day.
The only other option I can think of is to update the web-service only when I know there's been a change in the database, but I'll lose a lot of capability that way.
I apologize for the subjectivity of this question, but I'm really hoping that someone can explain if there's any kind of etiquette that's expected when using public APIs.
(I work at Asana)
This is an excellent question, or rather set of questions.
You are designing a system that will repeatedly make requests for every object. What will happen as the number of objects grows? Even if your initial request rate were reasonable, this would suffer problems with scalability. A more scalable solution is one that scales with the number of changes in the system. This will also grow over time, but much more slowly - the number of changes a single user can make per day is relatively constant, but the total number of objects they've created over time grows and grows. So my first piece of advice would be to avoid doing things this way, and instead find a way to detect changes and just act on those. It would be interesting to know why you feel you'll lose capability by taking this approach.
Now, I happen to know that the Asana API does not currently provide you with any friendly mechanism to just detect changes in the system. This is a commonly requested feature and we are looking into it, though I unfortunately cannot promise a delivery date. So you might be left with no choice but to poll our system for now.
As for being polite to the API, many service providers set limits on their API usage to prevent accidental or malicious use of the API from impacting the service to their other customers -- Asana is no exception. Sometimes these limits are published, other times not, and there is no standard limit: it all depends on the service. But it is very thoughtful of you to be curious about service limitations.
That said, 150k requests per day is, for the Asana API, kind of a lot. If all of our API users gave us that much traffic, we might be serving more requests per day than Google Web Search, and we're not quite that scalable yet. :) Technically, though, we can sometimes handle requests at that volume from a single user.
If you must poll, try to poll on intervals like 15 minutes. But please do not poll your entire workspace on this time period; it's likely to be too much traffic/data. We're working on trying to provide you with a better solution.
If you do happen to make too many requests of the Asana API, you will get back HTTP status code 429 instead of your desired response; you can read more about that here (https://asana.com/developers/documentation/getting-started/errors).
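To make the polling advice above concrete, here is a rough Python sketch of a polite per-project poller that respects 429 responses (the project id, token, and exact endpoint/auth details are illustrative; check the current API docs for specifics):

```python
import time
import requests

ASANA = "https://app.asana.com/api/1.0"
POLL_INTERVAL = 15 * 60  # the ~15-minute cadence suggested above

def poll_project(project_id, token):
    """Poll one project (not the whole workspace) and back off on 429."""
    headers = {"Authorization": f"Bearer {token}"}
    while True:
        resp = requests.get(f"{ASANA}/projects/{project_id}/tasks", headers=headers)
        if resp.status_code == 429:
            # Rate limited: wait whatever the server asks for (fallback: 60s), then retry.
            time.sleep(int(resp.headers.get("Retry-After", 60)))
            continue
        resp.raise_for_status()
        process_tasks(resp.json().get("data", []))
        time.sleep(POLL_INTERVAL)

def process_tasks(tasks):
    # Placeholder: diff against your internal database here.
    pass
```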

Getting historical data from Twitter [closed]

For a research project I would like to get the last 3 months' worth of Twitter messages. Technical challenges aside, is this possible, perhaps by using some sort of slow polling mechanism to keep the rate limiter at bay?
The Twitter API states "Clients may request up to 3,200 statuses via the page and count parameters for timeline REST API". Is that per hour? Per day? Or ever?
Any suggestions? Would it even be theoretically possible? Has anyone done something similar before?
Thanks!
Marco
Twitter notoriously does not make tweets older than three weeks "available". In some cases you can only get one week. You're better off storing tweets for the next three months. Many rightly doubt whether they're even persisted by Twitter.
Are you looking for just any tweets? If so, check out the Streaming API's status/sample method. The streaming API uses persistent HTTP sockets that can be a pain to program, but it's quite graceful when you get it working. I'd recommend setting up a little script to dump tweets from status/sample into a DB. You should have a TON of data after just a few days.
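A bare-bones version of that dump-tweets-into-a-DB script might look like the following Python sketch. It assumes the statuses/sample endpoint and basic auth as they worked at the time, both of which have since changed, so treat the URL and auth as historical placeholders:

```python
import json
import sqlite3
import requests

# statuses/sample as it worked at the time (v1, basic auth); long since retired.
STREAM_URL = "https://stream.twitter.com/1/statuses/sample.json"

def dump_sample(username, password, db_path="tweets.db"):
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS tweets (
                       id INTEGER PRIMARY KEY, created_at TEXT,
                       screen_name TEXT, text TEXT)""")
    with requests.get(STREAM_URL, auth=(username, password), stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue  # keep-alive newlines
            tweet = json.loads(line)
            if "text" not in tweet:
                continue  # skip deletes and other non-status messages
            con.execute("INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
                        (tweet["id"], tweet["created_at"],
                         tweet["user"]["screen_name"], tweet["text"]))
            con.commit()
```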
You could use the Search API: don't give it a query, return the maximum of 100 results per page, then go through each page twice a minute (120 times an hour - 30 times less than the rate limit). However, if my math is correct, that could possibly give you 720,000 tweets an hour... The problem is that Twitter has added approximately 1.75 billion tweets over the past 3 months. So if my math is correct, it would take you 2,361 days, or about 6 years, to complete this.
You could ask this question over on the Twitter Development talk on Google Groups, or contact Twitter to get white-listed so you could make up to 20,000 requests an hour.
Personally, I don't think it's possible.
DataSift claims to have a Twitter historical data API coming soon; you can sign up to be notified when it's available here.
This may not have existed when you first asked the question but the "PeopleBrowsr" API is perfect for this and you can go back 1400 days with a single API call: https://developer.peoplebrowsr.com/pb
Hope that helps!
Keyhole can get you historical tweets in XLS format or present them in a visual dashboard. The preview samples only a few of the most recent tweets; however, you can request historical data if you email them.
See: http://keyhole.co/conversation_tracking
You can read historic Twitter data using Gnip's Historic PowerTrack tool. It will give you access to all Twitter data since the first tweet, and it is a fairly simple tool to use.
You can get free estimates for the data scope and cost using a service built by my company called Sifter. If you decide to purchase access to the data it will be available via our text analytics platform DiscoverText, where you can search, filter, de-duplicate, cluster, human code, and machine-classify the data.