We use the Twitter user_timeline API to get the last 200 tweets for a set of Twitter accounts.
I noticed a couple of weird issues:
A few tweets arrived in our system hours after their actual creation time. For example, a person tweets; an hour later we call the user_timeline API for that user and don't see the tweet; 8 hours later we call it again and receive the tweet. Does this mean Twitter sometimes takes hours to index a tweet and make it available to the timeline API?
Sometimes the user's statuses_count decreases with every new tweet for a specific account. For example, the first tweet has statuses_count = 100, and the next tweet, which was posted after the first, has statuses_count = 99. Is this because the user deleted some tweets? Is statuses_count reliable?
Thanks
The Twitter API is eventually consistent, so my theory for the timelines call is that some data center synchronisation is going on behind the scenes and you might be hitting an older copy of the data at the time of the call. It could also be caused by local caching, but the question doesn't make clear how you've built your system. In most cases where I've seen an issue like this, that would be my guess as to what is going on. If you want to get Tweets closer to real time, that's what the streaming API is optimized for; the REST API works differently.
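To see how often this happens and how large the delays are, you could record when each tweet id is first seen by your poller and compare that with its created_at. The sketch below assumes tweets are the v1.1 JSON dicts with id_str and created_at fields; merge_poll and the seen dictionary are hypothetical names, not part of any Twitter library.

```python
from datetime import datetime, timedelta, timezone

# Format of the created_at field in Twitter's v1.1 JSON payloads.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def merge_poll(seen, tweets, polled_at):
    """Store tweets from one user_timeline poll, keyed by id_str.

    `seen` maps id_str -> (created_at, first_seen). Returns (id_str, lag)
    for each tweet that was new in this poll, where lag is how long after
    its creation time the tweet first showed up in our results.
    """
    new_arrivals = []
    for tweet in tweets:
        if tweet["id_str"] in seen:
            continue  # already stored by an earlier poll
        created = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
        seen[tweet["id_str"]] = (created, polled_at)
        new_arrivals.append((tweet["id_str"], polled_at - created))
    return new_arrivals
```

Logging the lags over a few days should show whether late arrivals are rare outliers (consistent with the synchronisation theory) or something systematic in your own caching.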
On the second question, there's again a small chance that this is a consistency issue, or it could indeed be due to Tweet deletion. The different elements of the Tweet object (user object, media info, links, etc.) are hydrated from different systems, so they may just be momentarily out of sync; alternatively, Tweets may have been deleted.
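Because statuses_count is a snapshot of the embedded user object at the time each tweet was hydrated, a single drop cannot tell you which explanation applies. A minimal sketch for spotting drops, assuming tweet ids are roughly time-ordered (as Twitter's ids are) and each tweet dict carries user.statuses_count; find_count_drops is a hypothetical helper:

```python
def find_count_drops(tweets):
    """Return (earlier_id, later_id, drop) wherever the author's
    statuses_count goes down between two consecutive tweets."""
    ordered = sorted(tweets, key=lambda t: t["id"])  # ids increase roughly over time
    drops = []
    for prev, cur in zip(ordered, ordered[1:]):
        before = prev["user"]["statuses_count"]
        after = cur["user"]["statuses_count"]
        if after < before:
            drops.append((prev["id"], cur["id"], before - after))
    return drops
```

If a drop persists when you fetch the same account again later, deletion is the more likely cause; if it disappears, it was probably a momentary sync issue.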
Is it possible to create multiple API keys for the YouTube Data API?
Most live YouTube subscriber counters use many different API keys for their counters (as can be seen in their JavaScript code).
The aim is to avoid exceeding the daily quota limit of 1,000,000: sending requests every few seconds for every page visit would reach the limit very quickly.
How are they able to get away with this?
Here is a Stack Overflow post that answers your question:
Technically you can run your application using different API keys, and it should work fine. Technically there is nothing wrong with creating additional projects in the Google Developer Console. You don't need to go as far as creating another Google account.
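The counters' approach amounts to spreading requests across several keys. A minimal round-robin sketch, assuming each key belongs to a separate Google Developer Console project with its own quota; KeyRotator and the placeholder key strings are hypothetical:

```python
from itertools import cycle


class KeyRotator:
    """Hand out API keys in round-robin order so requests are spread
    evenly across several projects. The keys used here are placeholders."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("need at least one API key")
        self._keys = cycle(keys)

    def next_key(self):
        return next(self._keys)


rotator = KeyRotator(["KEY_PROJECT_A", "KEY_PROJECT_B", "KEY_PROJECT_C"])
# Each request would pass rotator.next_key() as the `key` query parameter.
```

Round-robin keeps usage roughly even; if the keys had different quotas you would instead track remaining quota per key and pick the one with the most headroom.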
I understand the Twitter REST API has strict request limits (a few hundred requests per 15 minutes) and that the streaming API is sometimes better for retrieving live data.
My question is: what exactly are the streaming API limits? Twitter references a percentage in their docs, but not a specific amount. Any insight is greatly appreciated.
What I'm trying to do:
A simple page for me to view the latest tweet (and the date/time it was posted) from ~1000 Twitter users. It seems I would quickly hit the limit using the REST API, so would the streaming API be required for this application?
You should be fine using the Streaming API, unless those ~1000 users combined are tweeting more than (very) roughly 60 tweets per second at any moment.
Using the Streaming API endpoint statuses/filter with the follow parameter, you can follow up to 5000 users. There is no rate limit except when the stream returns more than about 1% of all tweets being tweeted at that moment. (60 tweets per second is 1% of the average rate of tweets, which is always fluctuating, so don't rely on that number.)
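A small sketch of both checks: building the comma-separated follow value (capped at the 5000 users mentioned in this answer) and a back-of-the-envelope comparison of your users' combined rate against the ~60 tweets per second figure. The function names are hypothetical, and since the answer warns that the threshold fluctuates, the rate check is only a rule of thumb:

```python
MAX_FOLLOW_IDS = 5000  # statuses/filter follow limit stated in the answer


def build_follow_param(user_ids):
    """Join numeric user ids into the comma-separated `follow` value."""
    ids = list(dict.fromkeys(str(u) for u in user_ids))  # drop duplicates, keep order
    if len(ids) > MAX_FOLLOW_IDS:
        raise ValueError(f"{len(ids)} ids exceeds the {MAX_FOLLOW_IDS} limit")
    return ",".join(ids)


def likely_under_limit(tweets_per_user_per_day, n_users, threshold_tps=60.0):
    """Compare the *average* combined rate with the threshold.

    Averages hide bursts, so a True result does not rule out brief spikes
    above the limit."""
    combined_tps = tweets_per_user_per_day * n_users / 86_400
    return combined_tps < threshold_tps
```

For ~1000 users averaging 50 tweets a day each, the combined rate is well under one tweet per second, far below the threshold.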
If your stream does go above the 1% threshold, you can detect this. (See the LIMIT notice.) Then you would use the REST API to find missed tweets.
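A sketch of separating tweets from limit notices in the raw stream, assuming the v1.1 message shape {"limit": {"track": N}}, where N is, as I understand it, the number of undelivered tweets since the connection opened. split_stream is a hypothetical helper that works on already-read lines:

```python
import json


def split_stream(lines):
    """Separate tweets from LIMIT notices in raw stream lines.

    Returns (tweets, undelivered), where `undelivered` is the largest
    count reported by any limit notice (0 if none arrived).
    """
    tweets, undelivered = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive lines carry no data
        msg = json.loads(line)
        if "limit" in msg:
            # Treated as cumulative since connecting (assumption), so keep the max.
            undelivered = max(undelivered, msg["limit"].get("track", 0))
        elif "id_str" in msg:
            tweets.append(msg)
    return tweets, undelivered
```

When undelivered is above zero, that is the signal to backfill the gap with REST API calls.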
Twitter simply will not allow multiple streams from one registered app/account; opening a second one will result in the older one being closed.
Too many connection attempts are not allowed either, and will result in the user being blocked.
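To avoid reconnecting too aggressively, space out retries. The schedule below follows what I recall of Twitter's streaming guidance (linear 250 ms steps capped at 16 s for network errors; exponential from 5 s, doubling, capped at 320 s for HTTP errors); treat the exact numbers as assumptions to verify against the current docs. backoff_delays is a hypothetical helper:

```python
def backoff_delays(kind, attempts):
    """Return the wait in seconds before each of `attempts` reconnects.

    kind="network": linear 0.25 s steps, capped at 16 s.
    kind="http":    exponential from 5 s, doubling, capped at 320 s.
    """
    delays = []
    for n in range(attempts):
        if kind == "network":
            delays.append(min(0.25 * (n + 1), 16.0))
        elif kind == "http":
            delays.append(min(5.0 * 2 ** n, 320.0))
        else:
            raise ValueError(f"unknown error kind: {kind}")
    return delays
```

Reset the attempt counter after a connection succeeds, and keep a single stream open per app so a new connection doesn't close the old one.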
Reference docs: Public Streaming API (outdated)
I've been working with the Twitter Search API, retrieving tweets with a PHP script run by a cron job 3 or 4 times per hour.
Everything works fine: I can save some fields from the resulting tweets into MySQL for research, contests, and accounting.
I began experiencing some "trouble" a few days ago when a hashtag hit the global Trending Topics, and the saved tweets weren't reflecting the real quantity of tweets we could see through search, etc.
So:
1- Should I use the Twitter Streaming API instead?
2- Should I contact api AT twitter.com and request special permissions for my app or username?
3- Finally, is there a working way to achieve this "realtime" monitoring script that gives more accurate results?
Thanks a lot in advance
I got a reply from the Twitter API staff.
It seems I should use the Streaming API, and they pointed me to this URL:
https://dev.twitter.com/docs/streaming-api/methods#track
Hope this is useful for others.
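The track parameter takes a comma-separated list of phrases: commas act as OR, and spaces inside a phrase act as AND. A minimal sketch of building that value plus a rough local matcher for sanity-checking saved tweets; both helpers are hypothetical, and Twitter's real matching also considers URLs, mentions, and its own tokenization:

```python
def build_track_param(phrases):
    """Comma-separated phrases: commas act as OR, spaces inside a phrase as AND."""
    return ",".join(p.strip() for p in phrases if p.strip())


def roughly_matches(track, text):
    """Approximate local check of whether `text` would match `track`.

    Splits on whitespace only, so it is a sanity check rather than a
    faithful copy of Twitter's matching."""
    words = set(text.lower().split())
    for phrase in track.split(","):
        terms = phrase.lower().split()
        if terms and all(term in words for term in terms):
            return True
    return False
```

For example, build_track_param(["world cup", "#football"]) yields "world cup,#football", which matches tweets containing both "world" and "cup", or containing "#football".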
I'd like to pull all of a user's tweets. I could do this the hard way (manually scraping Twitter) or the easy way: using their API. The problem with the easy (API) way is that I seem to be limited to the 200 most recent tweets. What's a simple way to get all tweets?
Thanks
Yes, you can get up to 3,200 historical tweets by requesting as follows.
Make a request to:
http://api.twitter.com/1/statuses/user_timeline.format
Set the count parameter to 200 and iterate the page parameter from page 1 to 16, or until there are no more tweets.
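The paging loop described here can be sketched as follows (16 pages × 200 tweets = 3,200). To keep it self-contained, fetch_page is a hypothetical stand-in for the HTTP call to statuses/user_timeline with the page and count parameters; it should return a list of tweets, empty once there are no more:

```python
MAX_PAGES = 16   # 16 pages x 200 tweets = the 3,200-tweet ceiling
PAGE_SIZE = 200


def fetch_all_tweets(fetch_page):
    """Collect a user's timeline page by page, stopping at the cap
    or at the first empty page."""
    tweets = []
    for page in range(1, MAX_PAGES + 1):
        batch = fetch_page(page=page, count=PAGE_SIZE)
        if not batch:
            break  # ran out of tweets before hitting the cap
        tweets.extend(batch)
    return tweets
```

Passing the fetcher in as a function also makes it easy to test the loop with fake pages before pointing it at the real API.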
That's the only thing you can currently do, because Twitter's API docs specifically say they prevent retrieving more than that:
https://apiwiki.twitter.com/Things-Every-Developer-Should-Know#6Therearepaginationlimits
I would add: please don't screen-scrape, because it will cause undue load on Twitter, and bulk requests would probably get your server blocked from accessing Twitter.
You can make sure you get all future tweets by subscribing to your Twitter RSS feed with Google Reader. Then you can use its infinite scrolling feature to look back to the first tweet tracked.
How often do web apps that depend on Twitter's public timeline collect the data? There must be hundreds of thousands of messages every minute, correct? How do they manage to collect all the tweets without missing any?
Some services (Friendfeed is a good example) are granted access to the Twitter Streaming API, aka the 'firehose'. It requires approval and a written agreement.
The public_timeline is not a great place to mine data anymore. Twitter now uses its Streaming APIs to output tweets at high volume. The closest comparison to the public_timeline would be the spritzer method, but that only includes a small sample. If you need all tweets (or more than the spritzer method provides), you'll need to sign a written agreement to get access to other Streaming API (HTTP push) feeds, such as the firehose feed, which returns all public tweets.
The Twitter API is rate limited, as has been said. The public timeline (twitter.com/public_timeline) is not rate limited in the same sense, but it is only updated every 5 seconds, so most tweets never appear there.
There are, I think, three or four companies that have access to the firehose, as Twitter's full feed is called. FriendFeed is one of these; another is Gnip, which resells the feed to other companies. This is probably the only feasible way to get a full Twitter feed.
Go here:
http://twitter.com/help/request_whitelisting
and get your account whitelisted (which allows 20,000 requests per hour) if 100 requests per hour isn't enough.
@ceejayoz: it's not 100 GET requests; it's 100 requests in general, excluding a few requests like verify_credentials and rate_limit_status.
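In concrete terms, 100 requests per hour is one request every 36 seconds, while 20,000 per hour is one every 0.18 seconds. A minimal budget tracker reflecting the comment above, where every call counts except verify_credentials and rate_limit_status. It assumes a fixed one-hour window starting at the first request (Twitter's real window may be aligned differently), and the class and endpoint labels are hypothetical:

```python
import time

# Calls that, per this thread, don't count against the hourly limit.
EXEMPT = {"account/verify_credentials", "account/rate_limit_status"}


class HourlyBudget:
    """Track requests against an hourly limit (e.g. 100, or 20,000 if whitelisted)."""

    def __init__(self, limit_per_hour, clock=time.monotonic):
        self.limit = limit_per_hour
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def try_request(self, endpoint):
        """Return True if the request may be sent now, counting it if so."""
        now = self.clock()
        if now - self.window_start >= 3600:
            self.window_start, self.used = now, 0  # new hour: reset the count
        if endpoint in EXEMPT:
            return True
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

In practice, calling rate_limit_status (which is exempt) lets you resynchronise the local count with what Twitter actually reports.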