scrape a user's entire tweets - api

I'd like to pull all of a user's tweets. I could do this the hard way (manually scraping Twitter) or the easy way: using their API. The problem with the easy (API) way is that I seem to be limited to the 200 most recent tweets. What's a simple way to get all tweets?
Thanks

Yes, you can get up to 3,200 historical tweets by making requests as follows.
Make a request to:
http://api.twitter.com/1/statuses/user_timeline.format
Set the count parameter to 200 and iterate the page parameter from 1 to 16, or until no more tweets are returned (16 pages of 200 tweets is where the 3,200 cap comes from).
That's the most you can currently do, because Twitter specifically states in its API documentation that it limits pagination:
https://apiwiki.twitter.com/Things-Every-Developer-Should-Know#6Therearepaginationlimits
I would add: please don't screen-scrape. It puts undue load on Twitter, and bulk requests would probably get your server blocked from accessing Twitter.
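The paging loop described above can be sketched roughly like this. It is only a sketch under a few assumptions: `.json` is picked for the `format` placeholder, `screen_name` is used to select the user, and the actual HTTP call is left to a caller-supplied `fetch_page` function (a hypothetical helper that takes a URL and returns a list of tweets) so the loop can be tried without network access.

```python
from urllib.parse import urlencode

# Endpoint from the answer above; ".json" is an assumed choice for "format".
BASE_URL = "http://api.twitter.com/1/statuses/user_timeline.json"


def page_url(screen_name, page, count=200):
    """Build the request URL for one page of a user's timeline."""
    query = urlencode({"screen_name": screen_name, "count": count, "page": page})
    return f"{BASE_URL}?{query}"


def collect_timeline(fetch_page, screen_name, count=200, max_pages=16):
    """Walk pages 1..max_pages and stop early on the first empty page.

    fetch_page is a hypothetical callable: it receives a URL and returns
    the decoded list of tweets for that page (an empty list when done).
    """
    tweets = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page_url(screen_name, page, count))
        if not batch:
            break  # no more tweets available
        tweets.extend(batch)
    return tweets
```

With the defaults this makes at most 16 requests, which matches the 3,200-tweet ceiling mentioned above.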

You can make sure you get all future tweets by subscribing to your Twitter RSS feed with Google Reader. Then you can use its infinite scrolling feature to look back as far as the first tweet it tracked.

Related

instagram retrieve hashtag images - update June 1

I am aware of the update to the Instagram APIs. I have read through the documentation on fetching hashtag images. I'm confused about two points:
They have a section "Endpoints", which gives the URL for fetching images using tags - https://api.instagram.com/v1/tags/{tag-name}?access_token=ACCESS-TOKEN
At the same time, when I try to submit for review (under the Permissions Review section) in order to get an access token, I get this message -
"This use case is not supported. We do not approve the public_content permission for one-off projects such as displaying hashtag based content on your website. As alternative solution, you can show your own Instagram content, or find a company that offers this type of service (content discover, moderation, and display)."
The 2nd point makes me believe that Instagram has stopped sharing hashtag images through its APIs, yet I can find a lot of widgets still fetching hashtag images. How do they do that? Can anyone point me in the right direction?
"The 2nd point makes me believe that Instagram has stopped sharing hashtag images to apis,"
Correct. Instagram has made a business decision to block most developers from accessing this content.
"at the same time i can find a lot of widgets still fetching hashtag images."
This doesn't tell you much. They might have gotten their app approved for other purposes. Also, it appears that Instagram has made some exceptions for big apps (like Tinder). Life is not fair.
"How do they do that? Can anyone point me in the right direction?"
You probably cannot. 99% of the use cases are not allowed, so they will reject your app if you try to submit it. Read this short article about what you can and cannot do with the new Instagram API.
The other widgets you are talking about have probably presented Instagram with one of the valid use cases for fetching the data. They are able to get only the public content. This new restriction is probably a business decision. If you still want the data you are looking for, you should consider going to a third-party data provider that sells such data.

Twitter API most tweeted

I understand the Twitter API is only getting more restricted, but is it possible to use the search API to find the most tweeted links within a given time period?
That is currently not possible with the Twitter API. It is possible with Gnip (disclosure: I work for Gnip), but figuring that out would be very costly. In essence, you would need to get every Tweet with a link in it and then determine the most Tweeted links from that dataset.
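Once you do have a dataset of Tweets, the counting step itself is small. Here is a minimal sketch, assuming the Tweets are available as plain text strings and that a simple regular expression is good enough to spot links; `most_tweeted_links` and `URL_PATTERN` are names made up for this example, and shortened links are counted as-is rather than expanded.

```python
import re
from collections import Counter

# Rough pattern for http/https links; an assumption, not a full URL parser.
URL_PATTERN = re.compile(r"https?://\S+")


def most_tweeted_links(tweet_texts, top_n=10):
    """Return the top_n (url, tweet_count) pairs across the given tweet texts."""
    counts = Counter()
    for text in tweet_texts:
        # Strip trailing punctuation, then de-duplicate so a link repeated
        # inside one tweet is only counted once for that tweet.
        links = {url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(text)}
        counts.update(links)
    return counts.most_common(top_n)
```

The expensive part remains what the answer describes: getting every Tweet with a link for the time period in the first place.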

Retrieve Steam activity feed?

Has anyone found a way to retrieve the activity feed on Steam for a specific user, to post on a website the way tweets are? I'm adding an activity feed to my website, but really the only thing I'm most active in is Steam, so it will get stale pretty quickly without Steam in there. I've looked at the web API, but it doesn't specify whether I can grab my full feed and post it, or only certain stats for specific games. I've tried to find an RSS feed for my activity but have had no luck so far; that would definitely be the preferable format.
I've just looked at this, and you can't grab the web feed directly from the site as RSS or JSON. The Web API is meant for developers of Steam applications so that they can get at the player information. For that, you need an API key, which is provided by Steam. It is not a casual web interface like you might find on Google.
The nasty solution is to HTML scrape the page. I used Yahoo! Pipes to scrape the page (and automatically update) but ultimately decided that was entirely too dirty as it assumes that the Steam pages won't change.
A bit too late, but I'm also searching for this kind of RSS feed. I think I will end up creating an RSS bot to parse the AJAX response used to fetch the activity feed:
http://steamcommunity.com/id/[your username]/ajaxgetusernews
This URL doesn't work out of the box. I think we have to pass some cookies with the request so that Steam thinks the bot is logged in as a normal user. It returns the HTML markup used to render the activity feed, and a URL to fetch the next batch of activities.
Be advised that this HTML markup is hard to parse because it is inconsistent.
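The "markup plus a URL for the next batch" shape suggests a simple follow-the-link loop. Below is a rough sketch of just that loop; it does not talk to Steam. The request itself (cookies included) and the decoding of the response are left to a caller-supplied `fetch_batch` function, which is hypothetical and assumed to return an `(html, next_url)` pair, with `next_url` being `None` when there is nothing further.

```python
def collect_activity(fetch_batch, start_url, max_batches=10):
    """Follow 'next batch' URLs and gather the HTML chunks that come back.

    fetch_batch is a hypothetical callable: it takes a URL and returns
    (html, next_url). How it authenticates and how it pulls those two
    values out of the real response is outside this sketch.
    """
    chunks = []
    seen = set()
    url = start_url
    # Stop on a missing next URL, a repeated URL, or the batch limit.
    while url and url not in seen and len(chunks) < max_batches:
        seen.add(url)
        html, url = fetch_batch(url)
        chunks.append(html)
    return chunks
```

The `max_batches` cap and the `seen` set are there so a bot built on this does not loop forever if the feed keeps handing back the same URL.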

Monitor twitter hashtags in realtime with common search api?

I've been working with the Twitter Search API, retrieving tweets with a PHP script run by a cron job 3 or 4 times per hour.
All works fine: I can save some fields from the resulting tweets into MySQL for research, contests, and accounting.
I began experiencing some "trouble" a few days ago when a hashtag hit Global Trending Topic, and the saved tweets weren't reflecting the real quantity of tweets we could see through search, etc.
So:
1- Should I use the Twitter Streaming API instead?
2- Should I contact api AT twitter.com and request special permissions for my app or username?
3- Finally, is there a working way to achieve this "realtime" monitoring with a script that gives more accurate and real results?
Thanks a lot in advance
I got a reply from the Twitter API staff.
It seems I should use the Streaming API, and they pointed me to this URL:
https://dev.twitter.com/docs/streaming-api/methods#track
Hope this is useful for others.
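For anyone moving from a cron job to a streaming connection, the consuming side might look something like this. It is a sketch under assumptions: the stream is taken to deliver one JSON object per line with blank keep-alive lines in between, tweets are taken to carry their body in a `text` field, and opening the authenticated connection is not shown (any iterable of lines will do). `matching_tweets` is a name invented for the example.

```python
import json


def matching_tweets(lines, hashtag):
    """Yield decoded messages whose text mentions the given hashtag.

    lines is any iterable of strings, for example the lines read from an
    open streaming connection (connection setup is not shown here).
    """
    needle = hashtag.lower()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumed keep-alive: blank line, nothing to decode
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or malformed lines rather than crash
        text = message.get("text") if isinstance(message, dict) else None
        if text and needle in text.lower():
            yield message
```

Because it is a generator, each tweet can be saved to the database as it arrives instead of waiting for the next cron run.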

How often to run the cron, to mine twitter public timeline?

For the web apps that depend on the public timeline of Twitter, how often do they collect the data? There must be hundreds of thousands of messages every minute, correct? How do they manage to collect all the tweets without missing any of them?
Some services (FriendFeed is a good example) are granted access to the full feed of the Twitter Streaming API, aka the 'firehose'. It requires approval and a written agreement.
The public timeline is not a great place to mine data anymore. Twitter now uses its Streaming APIs to output tweets like crazy. The closest comparison to the public timeline would be the spritzer method, but that only includes a small sample. If you need more tweets than the spritzer method provides (or all of them), you'll need to sign a written agreement to get access to other Streaming API (HTTP push) feeds, such as the firehose feed, which returns all public tweets.
The Twitter API is rate limited, as has been said. The public timeline (twitter.com/public_timeline) is not rate limited in the same sense, but it is only updated every 5 seconds, so most tweets never appear there.
There are, I think, three or four companies that have access to the firehose, as Twitter's full feed is called. FriendFeed is one of these. Another is Gnip. Gnip resells the feed to other companies. This is probably the only feasible way to get a full Twitter feed.
Go here:
http://twitter.com/help/request_whitelisting
and get your account white-listed (which allows 20,000 requests per hour) if 100 requests per hour isn't enough.
@ceejayoz: it's not 100 GET requests; it's 100 requests in general, excluding a few such as verify_credentials and rate_limit_status.
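To turn those hourly limits into a polling interval for a cron job or loop, the arithmetic is just the hour divided by the budget. A small sketch using the two figures quoted above (100 and 20,000 requests per hour); the function name is made up for the example, and it assumes requests are spread evenly and that nothing else is spending the same quota.

```python
def min_seconds_between_requests(requests_per_hour):
    """Smallest even spacing, in seconds, that stays within an hourly limit."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return 3600 / requests_per_hour


print(min_seconds_between_requests(100))     # 36.0
print(min_seconds_between_requests(20000))   # 0.18
```

So with the default limit a script can make one request roughly every 36 seconds, while a white-listed account could go as often as about five times per second.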