What is a good strategy for polling Microsoft Graph API so that I don't get throttled? - api

My goal is to maintain "real-time" (or as close as possible) information about the email messages sent by a group of users.
My ideal solution would be to periodically query the API for messages by all users in the group. This feature is not (yet?) implemented.
My second choice would be to create subscriptions (https://graph.microsoft.io/en-us/docs/api-reference/v1.0/api/subscription_post_subscriptions) for every member in the group and then request message information after I become aware of an event. The problem is, in practice, I am only allowed to create 20-30 simultaneous subscriptions (Issues to use Webhook for Microsoft Graph API), which might not be enough.
So I'm stuck with polling all the users in a cycle. The main problem with this approach is I can't find any information how many request are "too many", ie I get throttled. I want to maximize the number of requests to minimize the time it takes to get through one cycle.
A solution that comes to mind is to develop an adaptive program that slowly decreases the time between requests until throttled, then abruptly adds some time to it until a nice balance is found and maintained. This seems like a lot of work though. Right now I'm working on the assumption that 1/second is about the highest I can safely go (0.5 seconds on average per round trip, then a cool down of another 0.5 seconds).
What is the best way to deal with an unknown throttling limit in general, and Microsoft Graph in particular?
Edit:
While I think the accepted answer is a good response, it might not be suitable for all cases. For instance, if you don't want to use the 365 API, and you don't mind using beta features, perhaps you might check out this (delta tokens), which seem to be designed for real-time syncing with the data.
The only potential downside with the accepted answer is that you still need a subscription for each user you want to track (I think...), and there are limits on those. Very curious as to how other people tackled this problem.

While still in Preview, you may want to take a look at Outlook Streaming Notifications. These APIs provide a more robust notification model than simple web hooks. You would still need to establish multiple subscriptions but I expect you'll see far less latency.

Related

YouTube API Services Compliance Review

I have a project where I need to have the API quota increased significantly from the 10,000 daily hits, and I think this is being processed by Google as part of a YouTube API Services Compliance Review.
However, I have not had any response in over a week and the delay is putting the project at risk of a delayed launch and additional costs.
Does anyone know if this is normal and if there is a way to expedite the review, or speak to someone? Even pay for a higher tier of support?
Thanks in advance.
If you’ve filled the audit form https://support.google.com/youtube/contact/yt_api_form?hl=en properly, you should get a response within two weeks (Google reviews thousands of these, among other things to prevent abuse this is one of the processes that isn’t fully automated).
I recommend if your in a rush since your paying for credits you might as well open a second account and load balance between two or even three accounts; in your code you can create counters and swap before capping out the 24 hour term; not sure what data you’re looking to extract but depends on what data you may be able to even use other services to supplement.
They will get back to you about your application; just requires massive patience.

API call request limit

I have been looking into various different APIs which can provide my the weather data I need in JSON format. A lot of these API's have certain limits such as: in order to get more requests per minute, you need to pay more money per month so that your app can make more API requests.
However, a lot of these API's also have free account which five you limited access to them.
So what I was thinking is, wouldn't it be possible for a developer to just make lots of different developer accounts with an API provider and then just make lots of different API keys?
That way, they wouldn't have to pay anything as they could stick with the free accounts. Whenever one of the API keys has reached the maximum daily request calls, the developer could just put a switch statement in their code which gets their software to use a different API key.
I see no reason why this wouldn't work from a technical point of view... but, is such a thing allowed?
Thanks, Dan.
This would technically be possible, and it happens.
It is also probably against the service's terms, a good reason for the service to ban all your sock puppet accounts, and perhaps even illegal.
If the service that offers the API has spent time and money implementing a per-developer limit for their API, they have almost certainly enforced that in their terms of service, and you would be wise to respect those.
(relevant xkcd)

"reasonable" use of web APIs to sync data

My goal is to synchronize a web-application with an internal database. The web-application has a public API, but in order to fully synchronize the two sources I would need to make around 2000 separate API calls every time. My instinct tells me that this is excessive and possibly irresponsible, but I lack the experience to know for sure.
In this particular case the web-application is Asana, but I've encountered similar situations before with other services. Is there any way to know if you're abusing a service through excessive API calls? I know I'm not going to DOS a company like Asana, but I can't shake the feeling that there must be a better way than making ~150k requests per day.
The only other option I can think of is to update the web-service only when I know there's been a change in the database, but I'll lose a lot of capability that way.
I apologize for the subjectivity of this question, but I'm really hoping that someone can explain if there's any kind of etiquette that's expected when using public APIs.
(I work at Asana)
This is an excellent question, or rather set of questions.
You are designing a system that will repeatedly make requests for every object. What will happen as the number of objects grows? Even if your initial request rate were reasonable, this would suffer problems with scalability. A more scalable solution is one that scales with the number of changes in the system. This will also grow over time, but much more slowly - the number of changes a single user can make per day is relatively constant, but the total number of objects they've created over time grows and grows. So my first piece of advice would be to avoid doing things this way, and instead find a way to detect changes and just act on those. It would be interesting to know why you feel you'll lose capability by taking this approach.
Now, I happen to know that the Asana API does not currently provide you with any friendly mechanism to just detect changes in the system. This is a commonly requested feature and we are looking into it, though I unfortunately cannot promise a delivery date. So you might be left with no choice but to poll our system for now.
As for being polite to the API, many service providers set limits on their API usage to prevent accidental or malicious use of the API from impacting the service to their other customers -- Asana is no exception. Sometimes these limits are published, other times not, and there is no standard limit: it all depends on the service. But it is very thoughtful of you to be curious about service limitations.
That said, 150k requests per day is, for the Asana API, kind of a lot. If all of our API users gave us that much traffic, we might be serving more requests per day than Google Web Search, and we're not quite that scalable yet. :) Technically, sometimes, we might handle requests at that volume from a single user.
If you must poll, try to poll on intervals like 15 minutes. But please do not poll your entire workspace on this time period; it's likely to be too much traffic/data. We're working on trying to provide you with a better solution.
If you do happen to make too many requests of the Asana API, you will get back HTTP status code 429 instead of your desired response; you can read more about that here (https://asana.com/developers/documentation/getting-started/errors).

Rate Limiting to prevent against DOS attack (Heroku)

I have 2 POST, 1 PUT and 1 DELETE API in my application. My application is deployed on Heroku. I want to rate limit these APIs but only to prevent against a DOS attack or in case someone by mistake calls API in infinite loop. What's the ideal rate limit for this scenario. E.g. x per minute, y per hour. What are the idea numbers for x and y?
One way to do it is to drop in the 3scale API plugin (https://github.com/3scale/3scale_ws_api_for_ruby) and this enforces rate limits, does analytics etc. using external infrastructure.
That way you can rate limit individual users and have different quotas for each one (plus all the signup etc.).
It won't strictly prevent true DOS because even unauthed request will still reach you - but it'll stop them going down into your stack + cut off the people who are doing the damage first.
This really depends on your app.
1) It depends on how "heavy" your app is:
If each of your requests are processor-heavy and/or hits your database a lot, then you'll want to set a low limit, because each request is "expensive". If your app is fast and efficient, then you have more room to maneuver and you can get set a higher API limit.
2) It depends on the use cases of your app.
Does your app require a lot of API hits to be useable? How does an API limit affect the features your app provides?
3) It depends on how many processors you have.
Heroku allows you to scale your app by setting web dynos and worker dynos. The more you have, the better you'll be able offer high API limits.
In any case, if you want to prevent someone from DOS-ing you or calling your API in an infinite loop, this is the wrong approach to take because an API limit affects the good guys as well as the bad guys. A better way is to detect the bad behaviour and respond appropriately (i.e. deny the offending IP address for a limited time).

How do "professional" IM bots avoid being kicked off line or locked out?

I'm looking to develop a scalable IM bot (aka Automated Service Agent). It's been done before and I'm wondering what methods are used to maintain reliability. I see two immediate problems with scaling:
1) On AIM, you can be kicked off if too many users warn you. My bot does not spam or do anything malicious but the vulnerability is still there.
2) If there are network problems, and the bot signs on/off too many times in a row, AOL will lock it out for an unknown period of time.
Here are some preventative measures for detection:
A bot can use several user accounts, so that its activity is less likely to be detected.
A bot can use proxy servers to hide detection even further by obscuring its real IP address.
A bot can be programmed with the network's rules in mind, and is simply prevented from breaking those rules in its logic.
Also, in response to your first problem, fewer people than you might expect will actually report a problem.
Additionally, and this is purely speculative, depending on the network's rules, it could be possible to simulate enough legitimate activity between two or more bots (and several user accounts), so as to offset the actual reports that are made.
In response to problem number two, with multiple accounts, the bot will just move to the next account when a failure occurs.
Just some thoughts.
Regarding #1, you're dealing with human interaction. If your bot doesn't annoy or piss people off, then I doubt that most people will care. The #1 rule with chat bots (IMHO) is to test it with a number of people from different backgrounds. Record their responses, and how they feel about interacting with the bot. You can also collect good data to improve your bots comprehension skills this way.
Regarding #2, you need to code an effective rate limiter. If there's a small number of flake outs in a short period of time its probably ok to reconnect right away, but if they become more frequent, then you'll need to back off more. This is actually good for the service in general, because if they're experiencing server side problems, and there's a horde of bots pouncing on them when they try to bring things up that's a pain.