What percentage of tweets does the Twitter sample API return?

Does anyone know the ratio between the number of tweets we get from the Twitter sample API and the total number of tweets the Twitter servers receive? I am doing some analysis based on data read from the sample API and would like to estimate the actual workload handled by Twitter's servers. I observed that the number of tweets we get from the API varies over time, so I presume it is some kind of percentage sample. Any clue is highly appreciated.
Thanks

The sample stream /statuses/sample does return roughly 1% of all tweets. Twitter samples the tweets by delivering only tweets created within a 10-millisecond window out of the 1,000 milliseconds in every second. If you want more details, you can read my blog post: http://blog.falcondai.com/2013/06/666-and-how-twitter-samples-tweets-in.html
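The window check can be sketched in Python. This assumes tweet IDs are Twitter Snowflake IDs, whose bits above the lowest 22 hold a millisecond timestamp counted from Twitter's epoch (1288834974657 in Unix milliseconds). `snowflake_timestamp_ms`, `in_sample_window` and the `window_start_ms` parameter are hypothetical names for illustration; which 10 ms window Twitter actually used is whatever the blog post observed, not something this code asserts.

```python
# Snowflake IDs carry a millisecond timestamp above their lowest 22 bits,
# counted from Twitter's custom epoch (2010-11-04, in Unix milliseconds).
TWITTER_EPOCH_MS = 1288834974657


def snowflake_timestamp_ms(tweet_id: int) -> int:
    """Return the Unix time in milliseconds encoded in a Snowflake tweet ID."""
    return (tweet_id >> 22) + TWITTER_EPOCH_MS


def in_sample_window(tweet_id: int, window_start_ms: int, window_len_ms: int = 10) -> bool:
    """True if the tweet was created in [window_start_ms, window_start_ms + window_len_ms)
    of its second. window_start_ms is a hypothetical parameter; see the blog post
    for the window that was actually observed."""
    millis_in_second = snowflake_timestamp_ms(tweet_id) % 1000
    return window_start_ms <= millis_in_second < window_start_ms + window_len_ms


# A 10 ms window out of every 1000 ms keeps 10 / 1000 = 1% of tweets,
# assuming creation times are spread evenly across each second.
SAMPLED_FRACTION = 10 / 1000
```

The roughly 1% figure falls out of that arithmetic as long as tweet creation times are evenly spread within each second.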

When Twitter Spritzer (basically the old-fashioned Streaming API) was launched, it was supposedly about 1-2% of all tweets. Based on my use of the current Streaming API, I'd be surprised if it were any more than 1% right now, and it is possibly less. According to the docs, "Twitter streaming volume is not constant," but they neglect to mention whether the volume output by the API is proportional to the rate of actual tweets.

On 2 February 2015, Twitter announced its intent to reset the Streaming API sample rate to 1% (it had crept higher unintentionally):
The public Streaming API sample endpoints (aka POST statuses/filter and GET statuses/sample) are intended to be levelled at approximately 1% of the public Tweet volumes at any time.
Due to some past inconsistencies in configuration, there have been periods of time where the volumes of Tweets delivered via the Streaming API may have exceeded these parameters.
This notice is to indicate that over the next couple of weeks, we will be making changes to the public Streaming API to rebalance the volume of Tweets at the 1% capacity that was intended.
This plot shows the effect of the reset on a typical tweet stream.
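For the original goal of estimating Twitter's total workload, one rough approach is to bucket the sampled tweets by minute and divide by the sampling fraction. This is a hypothetical sketch that assumes the stream really is a uniform ~1% sample (which, per the notice above, was not always true before the rebalancing) and that you have each tweet's creation time in Unix seconds, for example derived from its `created_at` field.

```python
from collections import Counter
from typing import Dict, Iterable


def estimated_total_per_minute(
    sample_timestamps_s: Iterable[float], sample_fraction: float = 0.01
) -> Dict[int, float]:
    """Scale per-minute counts from the sample stream up to an estimate of
    total public tweet volume. Keys are minute start times in Unix seconds."""
    # Count sampled tweets in each whole minute.
    counts = Counter(int(t // 60) for t in sample_timestamps_s)
    # Divide by the assumed sampling fraction to estimate the full volume.
    return {minute * 60: n / sample_fraction for minute, n in sorted(counts.items())}
```

A sample of 300 tweets in one minute would then suggest roughly 30,000 tweets in that minute overall; the estimate is only as good as the 1% assumption.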

This is something I found at
https://brightplanet.com/2013/06/25/twitter-firehose-vs-twitter-api-whats-the-difference-and-why-should-you-care/. I hope you find this useful.
Studies have estimated that using Twitter’s Streaming API users can expect to receive anywhere from 1% of the tweets to over 40% of tweets in near real-time.
There are references to the studies they have cited at the bottom of the webpage.

Related

Get the most recent tweet on Twitter overall

I'm creating an application using the Twitter API and want to get the most recent Tweet posted. Not the latest Tweet from people I follow or something like that, but the actual latest Tweet from anywhere in the world at that moment. Is there a way to do that?
Not really possible. The best you can do is get the latest tweet from a sample of approximately one percent of all tweets.
I think the only way you'd be able to do that accurately would be to consume the Twitter firehose. As an alternative, you could use the statuses/sample realtime API to get a sample of about 1% of the current firehose volume. Another option would be to use statuses/filter with a geofence over the whole planet, but again this would only be a 1% dataset.
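A sketch of "the latest tweet you can see", assuming you are already reading tweet objects (as dicts) from statuses/sample or statuses/filter. It relies on Snowflake IDs being roughly time-ordered, so the largest `id_str` is approximately the newest tweet. `latest_tweet` and `WHOLE_PLANET_LOCATIONS` are illustrative names; the bounding-box string is the value you would pass as the `locations` parameter for a whole-planet geofence.

```python
from typing import Iterable, Optional

# statuses/filter "locations" takes bounding boxes as "sw_lng,sw_lat,ne_lng,ne_lat";
# this one covers the whole planet. It only matches tweets that carry location data.
WHOLE_PLANET_LOCATIONS = "-180,-90,180,90"


def latest_tweet(tweets: Iterable[dict]) -> Optional[dict]:
    """Return the tweet with the largest ID seen so far, or None if there are none.

    Snowflake IDs are roughly time-ordered, so the largest ID is
    (approximately) the most recently created tweet."""
    return max(tweets, key=lambda t: int(t["id_str"]), default=None)
```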

Measure how hot a topic is on Twitter

What kind of service should I use to measure how hot a topic is on Twitter, and how hot it has been in the past?
I thought about:
The Twitter API (https://dev.twitter.com/rest/reference/get/search/tweets), which returns up to 100 tweets per search. So in this case I have to make multiple calls to determine how many tweets there are. Is that correct?
TweetReach, which gives reports like this: https://tweetreach.com/reports/16000571, but the cheapest plan is $300/month.
With the Twitter API, you have a few options, but none of them may be exactly what you want, and none of them can go back very far into the past. You would have to either compile that information yourself, or use an external service like the one you mentioned.
Using the search API, you can only get results from the past 7 days, and are limited to 100 tweets per request. You can also set result_type to popular to get the most popular tweets about that search term. Twitter does have rate limits, but the ones for search are relatively high. You can use 180 requests every 15 minutes for any user you have authenticated, plus 450 requests every 15 minutes for the app itself (completely separate from the user requests). So if you only use app requests, you can get 45,000 tweets every 15 minutes.
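To count more than 100 results, you page backwards through the search results by passing `max_id` as one less than the smallest ID you have already seen. The sketch below assumes a hypothetical `fetch_page(max_id)` callable that wraps GET search/tweets with `count=100` and returns a list of tweet dicts; the 450-request default mirrors the app-only budget per 15-minute window mentioned above.

```python
from typing import Callable, List, Optional

FetchPage = Callable[[Optional[int]], List[dict]]


def count_search_results(fetch_page: FetchPage, max_requests: int = 450) -> int:
    """Count tweets matching a search by paging backwards with max_id.

    fetch_page(max_id) is a hypothetical wrapper around GET search/tweets
    (count=100); max_id=None means "start from the newest results"."""
    total = 0
    max_id: Optional[int] = None
    for _ in range(max_requests):
        page = fetch_page(max_id)
        if not page:  # no older results left within the search window
            break
        total += len(page)
        # Ask for tweets strictly older than the oldest one on this page.
        max_id = min(tweet["id"] for tweet in page) - 1
    return total
```

If the count hits `max_requests` pages, the term has more results than one window's budget covers, and you would continue in the next 15-minute window.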
If you don't need to search for specific terms, you can get trending topics in different areas using trends. The available areas can be retrieved using trends/available. Searching for trends also gives you the tweet_volume of each trend over the past 24 hours. If you check the trends every 24 hours and save the volumes, you can build up histories of trending topics.
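Building those histories amounts to appending each day's volumes to a store keyed by trend name. This sketch assumes the trends/place response shape as I understand it (a one-element list whose `trends` entry holds objects with `name` and a possibly null `tweet_volume`); `record_trend_snapshot` is a hypothetical helper, and the in-memory dict stands in for whatever storage you use.

```python
from typing import Dict, List, Tuple

History = Dict[str, List[Tuple[str, int]]]


def record_trend_snapshot(history: History, day: str, trends_response: list) -> None:
    """Append (day, tweet_volume) for each trend in one trends/place response.

    Trends whose tweet_volume is null (None) are skipped."""
    for trend in trends_response[0]["trends"]:
        volume = trend.get("tweet_volume")
        if volume is not None:
            history.setdefault(trend["name"], []).append((day, volume))
```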
Another option is the Streaming API. This only gives you current tweets, but you can use the track parameter to get only results for a set of terms, which you can then analyze.
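Once tweets arrive from a `track` stream, measuring how hot each term is can start with a simple tally. This is an approximation of Twitter's matching, not a reimplementation: it assumes a phrase matches when all of its space-separated words appear in the tweet text, and it ignores punctuation handling, URLs, and other details of the real `track` rules. `count_track_hits` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List


def count_track_hits(texts: Iterable[str], track_terms: List[str]) -> Counter:
    """Count how many tweets match each tracked phrase.

    Approximates track semantics: a phrase matches when every space-separated
    word in it appears in the tweet (case-insensitive)."""
    hits: Counter = Counter()
    for text in texts:
        words = set(text.lower().split())
        for term in track_terms:
            if all(word in words for word in term.lower().split()):
                hits[term] += 1
    return hits
```

Saving these counts per minute or per hour gives you the "how hot has it been" history that the streaming API itself does not keep.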
Any external service, like TweetReach, will probably either cost you money or strictly limit the amount you can do with it unless you pay.
I'm the Social Media Manager for Union Metrics (we make TweetReach and lots of other things), and I just wanted to let you know that our free snapshots are built on the Search API, which gives them the restrictions you've already discussed above, while our full snapshot reports can grab up to 1500 tweets for $20.
We do have more comprehensive Twitter analytics, which I think you've already looked at, and those backfill 30 days of data before tracking going forward. However, you might have missed our new product Echo, which allows a full, interactive search of the entire Twitter archive (you can see it in action here https://unionmetrics.com/product/echo-twitter-archive-search/) and is available through our Social Suite.
I understand if you don't have a large budget, and I completely understand the dilemma of weighing the cost of your time to build what you need against budget restrictions. I hope this at least lets you know what else we offer!
Sarah A. Parker
Social Media Manager | Union Metrics
Fine Makers of TweetReach, The Union Metrics Social Suite, and more

Twitter Streaming API limits?

I understand the Twitter REST API has strict request limits (a few hundred requests per 15 minutes), and that the streaming API is sometimes better for retrieving live data.
My question is, what exactly are the streaming API limits? Twitter references a percentage on their docs, but not a specific amount. Any insight is greatly appreciated.
What I'm trying to do:
A simple page for me to view the latest tweet (and the date/time it was posted) from ~1000 Twitter users. It seems I would rapidly hit the limit using the REST API, so would the streaming API be required for this application?
You should be fine using the Streaming API, unless those ~1000 users combined are tweeting more than (very) roughly 60 tweets per second at any moment.
Using the Streaming API endpoint statuses/filter with the follow parameter, you are allowed up to 5000 users. There is no rate limit except when the stream returns more than about 1% of all tweets being tweeted at that moment. (60 tweets per second is 1% of the average rate of tweets, which is always fluctuating, so don't rely on that number.)
If your stream does go above the 1% threshold, you can detect this. (See the LIMIT notice.) Then you would use the REST API to find missed tweets.
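Here is a sketch of the two pieces described above: building the comma-separated `follow` value and separating tweets from LIMIT notices in the raw stream. It assumes one JSON message per line and a notice shaped like `{"limit": {"track": N}}`, where N is the cumulative number of undelivered tweets since the connection opened; the function names are hypothetical.

```python
import json
from typing import Iterable, List, Tuple

MAX_FOLLOW_IDS = 5000  # statuses/filter accepts up to 5000 user IDs in "follow"


def build_follow_param(user_ids: Iterable[int]) -> str:
    """Join user IDs into the comma-separated value for the follow parameter."""
    ids = [str(u) for u in user_ids]
    if len(ids) > MAX_FOLLOW_IDS:
        raise ValueError(f"follow accepts at most {MAX_FOLLOW_IDS} IDs, got {len(ids)}")
    return ",".join(ids)


def split_stream(lines: Iterable[str]) -> Tuple[List[dict], int]:
    """Separate tweets from LIMIT notices in raw stream lines.

    Returns (tweets, undelivered), where undelivered is the highest "track"
    count seen in limit notices. The count is cumulative since the connection
    opened, so the maximum is the total number of tweets missed so far."""
    tweets: List[dict] = []
    undelivered = 0
    for line in lines:
        line = line.strip()
        if not line:  # blank keep-alive lines
            continue
        msg = json.loads(line)
        if "limit" in msg:
            undelivered = max(undelivered, msg["limit"].get("track", 0))
        elif "id_str" in msg:
            tweets.append(msg)
    return tweets, undelivered
```

A non-zero `undelivered` value is the signal to fall back to the REST API for the tweets you missed.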
Twitter simply will not allow multiple streams from one registered app/account. Doing so will result in the older one being closed.
Also, too many connection attempts are not allowed and will result in the user being blocked.
Reference docs: Public Streaming API (outdated)
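To avoid being blocked for reconnecting too often, reconnect with backoff instead of retrying immediately. The schedules below follow the reconnection guidance from Twitter's old Streaming API documentation as I recall it (linear 250 ms steps up to 16 s for network errors, doubling from 5 s up to 320 s for HTTP errors, doubling from 1 minute for HTTP 420 rate limiting); treat the exact numbers and the `reconnect_delays` helper as assumptions.

```python
from typing import List


def reconnect_delays(error_kind: str, attempts: int) -> List[float]:
    """Seconds to wait before each successive reconnect attempt.

    error_kind: "network" (TCP/IP errors), "http" (HTTP error status),
    or "rate_limited" (HTTP 420). The numbers are assumptions."""
    if error_kind == "network":
        # Linear backoff: +250 ms per attempt, capped at 16 s.
        return [min(0.25 * (i + 1), 16.0) for i in range(attempts)]
    if error_kind == "http":
        # Exponential backoff: start at 5 s, double each time, cap at 320 s.
        return [min(5.0 * 2 ** i, 320.0) for i in range(attempts)]
    if error_kind == "rate_limited":
        # Exponential backoff starting at 1 minute.
        return [60.0 * 2 ** i for i in range(attempts)]
    raise ValueError(f"unknown error kind: {error_kind!r}")
```

In a reconnect loop you would sleep for each delay in turn until a connection succeeds, then reset the schedule once you are connected again.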

Tumblr API call or request limits

Does anybody know whether there are API call limits per second, hour, or day for the Tumblr API? It seems to me that limits do exist when I make a lot of API calls in a short period via OAuth. However, I couldn't find any documentation on the Tumblr API website or on Google. Many thanks.
I have been using the Tumblr API for about 2 years now, and I must admit that the "Rate Limit Exceeded" issue has no deterministic and, more importantly, no officially confirmed answer.
In Tumblr's API Agreement you may find some reference to limitations under the section "Respect for Limitations", which says
In addition, you will comply with any limitations imposed by Tumblr on the frequency of access, calls and use of the Tumblr API and Tumblr Firehose
We ask that you respect these limitations, as well as any rate limits that we may place on actions, which are designed to protect our systems
Notes:
There is a Tumblr tag, "rate-limit-exceeded", dedicated to this. However, the posts there say little about how many requests per period of time people were making when they hit the problem.
For example, here you can find an average of 1000 requests per minute reported as the limit.
As for my application, the request rate is approximately 1 request per second, and it has been running 24/7 for about a year. There were several times when this issue occurred even at this relatively low rate, but I consider the failure rate insignificant.
From: https://www.tumblr.com/oauth/apps
Newly registered consumers are rate limited to 1,000 requests per hour, and 5,000 requests per day.
If you go to that link, it looks like you can get the rate limit removed if you ask nicely! :)
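Since the only concrete numbers here are the 1,000-per-hour and 5,000-per-day caps for newly registered consumers, a client-side sliding-window limiter is one way to stay under both. This is a hypothetical sketch (`SlidingWindowLimiter` and `TUMBLR_NEW_CONSUMER_LIMITS` are illustrative names, not part of any Tumblr library); the clock is injectable so it can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Tuple


class SlidingWindowLimiter:
    """Client-side limiter enforcing several (max_requests, window_seconds) caps."""

    def __init__(self, limits: List[Tuple[int, float]], clock: Callable[[], float] = time.monotonic):
        self.limits = limits
        self.clock = clock
        self.sent: Deque[float] = deque()  # timestamps of allowed requests
        self.longest = max(window for _, window in limits)

    def try_acquire(self) -> bool:
        """Record and allow a request if every cap has room; otherwise return False."""
        now = self.clock()
        # Forget requests older than the longest window; no cap can count them.
        while self.sent and now - self.sent[0] >= self.longest:
            self.sent.popleft()
        for max_requests, window in self.limits:
            recent = sum(1 for t in self.sent if now - t < window)
            if recent >= max_requests:
                return False
        self.sent.append(now)
        return True


# Caps quoted above for newly registered Tumblr consumers.
TUMBLR_NEW_CONSUMER_LIMITS = [(1000, 3600.0), (5000, 86400.0)]
```

Because the earlier answer reports occasional errors even at about 1 request per second, a limiter like this reduces but does not eliminate "Rate Limit Exceeded" responses, so requests should still handle that error.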

How often should I run the cron job to mine the Twitter public timeline?

How often do web apps that depend on Twitter's public timeline collect the data? There must be hundreds of thousands of messages every minute, correct? How do they manage to collect all the tweets without missing any of them?
Some services (FriendFeed is a good example) are granted access to the Twitter Streaming API, aka the 'firehose'. It requires approval and a written agreement.
The public_timeline is not a great place to mine data anymore. Twitter now uses its Streaming APIs to output tweets like crazy. The closest comparison to the public_timeline would be the spritzer method, but that only includes a small sample. If you need all tweets (or more than the spritzer method provides), you'll need to sign a written agreement to get access to other Streaming API (HTTP push) feeds, such as the firehose feed, which returns all public tweets.
The Twitter API is rate limited, as has been said. The public timeline (twitter.com/public_timeline) is not rate limited in the same sense, but it is only updated every 5 seconds, so most tweets never appear there.
There are, I think, three or four companies that have access to the firehose, as Twitter's full feed is called. FriendFeed is one of these. Another is Gnip, which resells the feed to other companies. This is probably the only feasible way to get a full Twitter feed.
Go here:
http://twitter.com/help/request_whitelisting
and get your account whitelisted (which allows 20,000 requests per hour) if 100 requests per hour isn't enough.
@ceejayoz It's not 100 GET requests; it's 100 requests in general, excluding a few calls such as verify_credentials and rate_limit_status.
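For the cron question itself, the hourly limits quoted above translate directly into a minimum polling interval. A hypothetical sketch: `requests_per_poll` covers jobs that make several calls per run, and the result is rounded up to whole minutes because standard cron cannot schedule a job more often than once a minute.

```python
import math


def min_poll_interval_s(requests_per_hour: int, requests_per_poll: int = 1) -> float:
    """Shortest spacing between polls that stays within an hourly request cap."""
    return 3600.0 * requests_per_poll / requests_per_hour


def cron_every_n_minutes(requests_per_hour: int, requests_per_poll: int = 1) -> int:
    """Round the interval up to whole minutes (cron's finest granularity)."""
    return max(1, math.ceil(min_poll_interval_s(requests_per_hour, requests_per_poll) / 60))
```

At 100 requests per hour, a once-a-minute job making a single call uses 60 of them; at the whitelisted 20,000 per hour the minimum spacing drops to 0.18 seconds, which is better served by a long-running process than by cron.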