Performance testing Twitter Streaming API consumer - api

I have a service that consumes Twitter posts in real time using the Twitter Streaming API.
I have built a background process that connects to the stream and pushes tweets into Redis. It is built with node.js.
What I need to do is figure out the maximum number of tweets this process can consume, so I need to performance test this setup.
What is the best way to test this?
I need to know:
how many tweets it can handle before it falls over
what happens when the process can't handle any more tweets
Another reason I want to do this is to work out whether it's worth using node.js at all.
I would prefer to write it with EventMachine instead.

Since you're inherently limited by the frequency and volume of tweets coming from the Twitter Streaming API, what you're actually interested in benchmarking is the I/O performance of your background process with respect to Redis.
Either generate pseudo-tweets or collect a significant sample of actual tweets, and use that data set in your benchmarking. With the data set in hand, you can write your benchmark precisely against it: for example, push the entire data set at once into your tweet-handling logic, or simulate peaks and valleys of activity.
The point being, when benchmarking, identify and isolate the desired variable (number of tweets), use a standardized sample, and mock away inconsistent and outside behavior (API limits, variable tweet/sec rate).
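For example, a minimal sketch of the "push the entire data set at once" benchmark, assuming the node-redis v4 client and a pre-collected tweets.json sample (both of which are assumptions here, not part of your existing setup):

```js
// benchmark.js -- minimal sketch: push a fixed sample of tweets into Redis
// as fast as possible and report throughput. Assumes node-redis v4 and a
// local tweets.json sample; both are placeholders for your own setup.
const { createClient } = require('redis');
const tweets = require('./tweets.json'); // hypothetical sample data set

async function run() {
  const client = createClient();
  await client.connect();

  const start = process.hrtime.bigint();

  // Push the whole sample at once to find the ceiling; vary the pacing
  // here if you want to simulate peaks and valleys instead.
  for (const tweet of tweets) {
    await client.rPush('tweets', JSON.stringify(tweet));
  }

  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${tweets.length} tweets in ${elapsedMs.toFixed(0)} ms ` +
              `(${(tweets.length / (elapsedMs / 1000)).toFixed(0)} tweets/sec)`);

  await client.quit();
}

run().catch(console.error);
```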

I would suggest creating a custom client that simulates the Twitter Streaming API. The client can generate tweets for your application to consume. You can use a load-testing tool that supports custom scripts to run this Twitter script from distributed machines and generate the desired load. While the tweets are being generated, you can monitor the health of the system to measure the impact of tweet throughput on your application.
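A rough sketch of such a client is below: a tiny Node.js server that emits newline-delimited JSON the way the streaming endpoint does, at a configurable rate. The tweet shape and the TWEETS_PER_SEC knob are illustrative assumptions; point your consumer at it in place of the real endpoint and raise the rate until something breaks.

```js
// mock-stream.js -- a fake streaming endpoint for load testing.
// The tweet fields and the rate knob are illustrative assumptions.
const http = require('http');

const TWEETS_PER_SEC = Number(process.env.TWEETS_PER_SEC || 100);

http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'application/json' });

  let id = 0;
  // Emit newline-delimited JSON, like the real streaming endpoint.
  const timer = setInterval(() => {
    res.write(JSON.stringify({
      id: id++,
      text: `fake tweet ${id}`,
      created_at: new Date().toISOString(),
    }) + '\r\n');
  }, 1000 / TWEETS_PER_SEC);

  req.on('close', () => clearInterval(timer));
}).listen(8080, () => console.log('mock stream listening on :8080'));
```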

Related

Callback or real-time communication for social table api

I am evaluating the Social Tables API to see if there's any way to get notified when data on the Social Tables side changes so that we can sync the data in real time. I can't find anything on callbacks or long-running operations. Does that mean polling is the only option?
There is no real time API for Social Tables. This means that you're correct in that polling is the only option for keeping data in sync.
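If you do end up polling, a simple loop on a generous interval keeps the request count reasonable. A minimal sketch, assuming Node 18+'s built-in fetch; the endpoint URL, token, and updated field are placeholders rather than actual Social Tables API details:

```js
// A generic polling loop; the URL, token and `updated` field are placeholders,
// not actual Social Tables API details.
const POLL_INTERVAL_MS = 5 * 60 * 1000; // every 5 minutes, to stay well under limits
let lastSync = new Date(0).toISOString();

async function poll() {
  const res = await fetch('https://example.com/api/tables', { // placeholder endpoint
    headers: { Authorization: `Bearer ${process.env.API_TOKEN}` },
  });
  const tables = await res.json();

  const changed = tables.filter(t => t.updated > lastSync);
  for (const table of changed) {
    // ...push the change into your own store here...
  }
  lastSync = new Date().toISOString();
}

setInterval(() => poll().catch(console.error), POLL_INTERVAL_MS);
```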

Podio API limit

I am working on a product that fetches all of a customer's organization, workspace and app details. The customer can refresh them at any time.
So let's say I have one customer who has 100 applications across multiple workspaces; it makes around 110 calls to get each application's details plus the workspace and organization details.
Now if that customer refreshes the applications multiple times, say 10 times in an hour, that alone is about 1,000 API calls. If I have 50 such active users doing this, it will be something like 50,000.
AFAIK I cannot make that many API calls in an hour, so how do I handle this scenario? I know a lot of applications do this kind of thing, so I want to understand how everyone else is handling it.
If you need a higher rate limit, I would encourage you to contact Podio support and ask specifically for what you need. We have internal guidelines for evaluating these kinds of requests and may increase the limit for your user and client ID if appropriate.
In general, though, I would expect your app to implement some kind of batching, transient storage, and/or caching layers, especially if your customers are interacting with Podio exclusively or primarily through your system.
Please see our official statement here: https://developers.podio.com/index/limits
Summary:
The general limit is 5,000 API calls per hour, but if the API call is marked as "Rate limited" in the API reference the call is deemed resource intensive and a lower rate of 1000 calls per hour is enforced. If you hit the rate limits the API will begin returning 420 HTTP error codes for all API calls. Rate limits are per user per API key.
Contacting support:
If you have a project that requires a higher rate limit, contact support@podio.com with a brief description of your project, your estimated usage, and the client_id of the API key you are using.
Usage tips:
Tips for reducing API usage
Avoid making API requests inside loops. Instead of fetching individual objects inside a loop, fetch a collection of objects in one API operation. E.g. filter items
Cache results whenever possible. This is especially true when you are displaying data to the public (i.e. everyone sees the same output); see the caching sketch after this list.
Don't poll for changes. Instead of polling Podio to see if your content has changed use webhooks or push to receive a notification. This might save you thousands of requests: https://developers.podio.com/doc/hooks
Use logging to see how many requests you're making
Bundle responses with "fields" parameter
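As a rough illustration of the caching tip above, here is a minimal in-memory TTL cache wrapper in Node.js. The fetchFromPodio callback, the key format and the TTL value are all placeholders, not part of any Podio SDK:

```js
// cache.js -- a minimal TTL cache around API calls; fetchFromPodio() stands in
// for whatever Podio client call you already make.
const cache = new Map();
const TTL_MS = 10 * 60 * 1000; // tune to how stale your data is allowed to be

async function cachedCall(key, fetchFromPodio) {
  const hit = cache.get(key);
  if (hit && Date.now() - hit.at < TTL_MS) return hit.value;

  const value = await fetchFromPodio();
  cache.set(key, { value, at: Date.now() });
  return value;
}

// Usage: every refresh within the TTL is served from the cache, so 10
// refreshes an hour cost one API call instead of ten.
// const apps = await cachedCall(`apps:${workspaceId}`, () => podio.getApps(workspaceId));
```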
You might want to build an API proxy app; you would need a message queue and a rate limiter. This would let you keep track of API call consumption across apps and users.
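A per-user token bucket is one simple way to build the rate-limiter half of such a proxy. The sketch below is only an illustration: the 5,000-calls-per-hour budget mirrors Podio's published general limit, while the names and the in-memory Map are assumptions.

```js
// limiter.js -- a per-user token bucket for an API proxy (illustrative only).
const BUDGET = 5000;                   // Podio's general hourly limit
const WINDOW_MS = 60 * 60 * 1000;      // one hour
const buckets = new Map();             // userId -> { remaining, resetAt }

function tryConsume(userId) {
  const now = Date.now();
  let bucket = buckets.get(userId);
  if (!bucket || now >= bucket.resetAt) {
    bucket = { remaining: BUDGET, resetAt: now + WINDOW_MS };
    buckets.set(userId, bucket);
  }
  if (bucket.remaining <= 0) return false; // queue or reject the request
  bucket.remaining -= 1;
  return true;
}

module.exports = { tryConsume };
```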
Also worth noting: some API routes are more expensive than others because they are more resource-intensive on the Podio side. The term in use is "rate limited": rate-limited API routes are bound to 1,000 calls an hour, so in effect they cost five times as much as regular routes.
Hope this helps!

Google plus determine changes in network

I am trying to determine changes in the Google+ network in an efficient manner (profile changes). My first idea was to use the eTags of the People.List and People.Get. My assumption was that the eTag in the List (person) would be the same as the one in the Get. This is not the case!
I would rather not get the details of all the people in the network and check the eTag for each of them. I would run out of daily API calls very quickly with that approach.
Are there any other ways of determining the changes in the network?
Thanks!
I'm not aware of a way to notify your service when changes occur on a user's profile. I don't think that etags will work for what you are trying to do and the client libraries should already be using the etags to manage any query caching. You can perform a few tricks to make queries lighter on your backend though:
Batch API calls
Use a fields filter to get only the data that matters for your application
If you are running out of quota, you can also request to have your limits raised from the Google APIs console by clicking the Quotas link on the left. The developer relations team from Google+ checks the request regularly and will raise your quota limits if your usage justifies it.
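As a small illustration of the fields filter, here is what a trimmed-down request might look like. The plus/v1 People.get route shown is the historical endpoint and the field list is purely illustrative, so verify both against the current documentation before relying on them.

```js
// A sketch of a fields-filtered request; the plus/v1 route is the historical
// People.get endpoint and the field list is illustrative only.
(async () => {
  const fields = 'id,displayName,etag'; // request only what your app needs
  const url = 'https://www.googleapis.com/plus/v1/people/me' +
              `?fields=${encodeURIComponent(fields)}`;

  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${process.env.ACCESS_TOKEN}` }, // placeholder token
  });
  const person = await res.json();
  console.log(person.displayName, person.etag);
})();
```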

"reasonable" use of web APIs to sync data

My goal is to synchronize a web-application with an internal database. The web-application has a public API, but in order to fully synchronize the two sources I would need to make around 2000 separate API calls every time. My instinct tells me that this is excessive and possibly irresponsible, but I lack the experience to know for sure.
In this particular case the web-application is Asana, but I've encountered similar situations before with other services. Is there any way to know if you're abusing a service through excessive API calls? I know I'm not going to DOS a company like Asana, but I can't shake the feeling that there must be a better way than making ~150k requests per day.
The only other option I can think of is to update the web-service only when I know there's been a change in the database, but I'll lose a lot of capability that way.
I apologize for the subjectivity of this question, but I'm really hoping that someone can explain if there's any kind of etiquette that's expected when using public APIs.
(I work at Asana)
This is an excellent question, or rather set of questions.
You are designing a system that will repeatedly make requests for every object. What will happen as the number of objects grows? Even if your initial request rate were reasonable, this would suffer problems with scalability. A more scalable solution is one that scales with the number of changes in the system. This will also grow over time, but much more slowly - the number of changes a single user can make per day is relatively constant, but the total number of objects they've created over time grows and grows. So my first piece of advice would be to avoid doing things this way, and instead find a way to detect changes and just act on those. It would be interesting to know why you feel you'll lose capability by taking this approach.
Now, I happen to know that the Asana API does not currently provide you with any friendly mechanism to just detect changes in the system. This is a commonly requested feature and we are looking into it, though I unfortunately cannot promise a delivery date. So you might be left with no choice but to poll our system for now.
As for being polite to the API, many service providers set limits on their API usage to prevent accidental or malicious use of the API from impacting the service to their other customers -- Asana is no exception. Sometimes these limits are published, other times not, and there is no standard limit: it all depends on the service. But it is very thoughtful of you to be curious about service limitations.
That said, 150k requests per day is, for the Asana API, kind of a lot. If all of our API users gave us that much traffic, we might be serving more requests per day than Google Web Search, and we're not quite that scalable yet. :) Technically, we might sometimes be able to handle that volume from a single user.
If you must poll, try to poll at intervals of around 15 minutes. But please do not poll your entire workspace that often; it's likely to be too much traffic/data. We're working on trying to provide you with a better solution.
If you do happen to make too many requests of the Asana API, you will get back HTTP status code 429 instead of your desired response; you can read more about that here (https://asana.com/developers/documentation/getting-started/errors).
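If you poll, it's worth wrapping your requests in a small retry helper so a 429 backs off instead of hammering the API. A minimal sketch, assuming Node 18+'s built-in fetch; whether a Retry-After header is sent is something to confirm in Asana's error documentation.

```js
// Retry helper for 429 responses: honor Retry-After when present, otherwise
// fall back to exponential backoff capped at one minute. Illustrative sketch.
async function requestWithBackoff(url, options = {}, attempt = 0) {
  const res = await fetch(url, options);
  if (res.status !== 429) return res;

  const retryAfter = Number(res.headers.get('retry-after'));
  const waitMs = retryAfter ? retryAfter * 1000 : Math.min(60000, 1000 * 2 ** attempt);
  await new Promise(resolve => setTimeout(resolve, waitMs));
  return requestWithBackoff(url, options, attempt + 1);
}
```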

API, building an API but giving priority access for certain requests

Not sure how others have addressed this, but generally speaking what is the best practice for giving your own apps priority treatment when it comes to using one of your own public APIs?
Use Cache Priority
Caching responses or interim calculations in RAM is typically the first optimization point because caching is easier than micro-optimizing all your code. Controlling what goes into the cache and how long it stays presents a top-level place to apply "priority treatment".
I prefer the cache-management approach over thread priority because, if you are under load, delaying the execution of a request often creates complex thread-pool problems and decreases overall server throughput.
Caching Based on Load (rather than on app ownership) will Expand the Resource Pie
We take the RAM cache priority approach with the MapLarge Tile Server and Geocoding API. However, we don't actually give our own apps priority; instead, we base priority on request frequency and the time required to render a response. Unless you have large numbers of low-value API users, I would recommend doing something similar, because this approach should reduce overall load and enable the server to handle more API requests.
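To make the idea concrete, here is a toy eviction policy that scores cache entries by request frequency multiplied by render cost. It is only an illustration of the approach, not MapLarge's actual implementation.

```js
// Toy load-based cache: keep the entries that are requested most often and
// are most expensive to rebuild. Illustrative only.
const MAX_ENTRIES = 10000;
const cache = new Map(); // key -> { value, hits, renderMs }

function put(key, value, renderMs) {
  cache.set(key, { value, hits: 1, renderMs });
  if (cache.size > MAX_ENTRIES) evict();
}

function get(key) {
  const entry = cache.get(key);
  if (entry) entry.hits += 1;
  return entry && entry.value;
}

function evict() {
  // Score = how often it is asked for times how long it takes to regenerate.
  let worstKey, worstScore = Infinity;
  for (const [key, e] of cache) {
    const score = e.hits * e.renderMs;
    if (score < worstScore) { worstScore = score; worstKey = key; }
  }
  cache.delete(worstKey);
}
```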
I recently wrote a white paper that highlights the different load profiles of cached and non-cached responses in a multi-tenant API environment. You can see it here:
http://maplarge.com/Tile-Server-Performance
API Policies can drive revenue
If you have free or low-paying users who are generating massive load, you might want to review your business plan and consider instituting account-based rate limits that match user revenue to server costs in a scalable way. If you do limit API users, I would recommend having explicit and predictable policies so they can project their usage and know when to purchase an API account upgrade.