Getting historical data from Twitter [closed]

For a research project I would like to get the last 3 months' worth of Twitter messages. Technical challenges aside, is this possible, perhaps by using some sort of slow polling mechanism to keep the rate limiter at bay?
The Twitter API states: "Clients may request up to 3,200 statuses via the page and count parameters for timeline REST API". Is that limit per hour? Per day? Or ever?
Any suggestions? Would it even be theoretically possible? Has anyone done something similar before?
Thanks!
Marco

Twitter notoriously does not make tweets older than about three weeks available; in some cases you can only get one week. You're better off collecting and storing tweets yourself for the next three months. Many rightly doubt whether older tweets are even persisted by Twitter.
Are you looking for just any tweets? If so, check out the Streaming API's statuses/sample method. The Streaming API uses persistent HTTP sockets that can be a pain to program against, but it's quite graceful once you get it working. I'd recommend setting up a little script that dumps tweets from statuses/sample into a database. You should have a TON of data after just a few days.
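A minimal sketch of such a dump script, assuming the v1.1 statuses/sample endpoint (since retired), OAuth 1.0a credentials (placeholders below), and the requests and requests-oauthlib libraries:

    import json
    import sqlite3
    import requests
    from requests_oauthlib import OAuth1

    # placeholder credentials from your registered Twitter application
    auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET",
                  "ACCESS_TOKEN", "ACCESS_SECRET")

    db = sqlite3.connect("tweets.db")
    db.execute("""CREATE TABLE IF NOT EXISTS tweets
                  (id INTEGER PRIMARY KEY, user TEXT, text TEXT, created_at TEXT)""")

    # statuses/sample streams a small random sample of all public tweets
    resp = requests.get("https://stream.twitter.com/1.1/statuses/sample.json",
                        auth=auth, stream=True)
    for line in resp.iter_lines():
        if not line:
            continue  # keep-alive newline
        tweet = json.loads(line)
        if "text" not in tweet:
            continue  # skip delete/limit notices
        db.execute("INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
                   (tweet["id"], tweet["user"]["screen_name"],
                    tweet["text"], tweet["created_at"]))
        db.commit()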

You could use the Search API: give it an unrestricted query, request the maximum of 100 results per page, and walk through the pages twice a minute (120 requests an hour, well under the rate limit). If my math is correct, that gives you at most 12,000 tweets an hour. The problem is that Twitter has added approximately 1.75 billion tweets over the past 3 months, so at that rate it would take roughly 145,000 hours, or about 16 years, to pull them all.
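The back-of-the-envelope arithmetic:

    requests_per_hour = 120        # two pages a minute
    tweets_per_request = 100       # Search API maximum page size
    backlog = 1.75e9               # tweets added over the past 3 months

    rate = requests_per_hour * tweets_per_request  # 12,000 tweets/hour
    hours = backlog / rate                         # ~145,833 hours
    print(hours / 24, "days")                      # ~6,076 days, ~16.6 years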
You could ask this question over on the Twitter Development talk group on Google Groups, or contact Twitter about getting white-listed so you could make up to 20,000 requests an hour.
Personally, I don't think it's possible.

DataSift claims to have a Twitter historical data API coming soon; you can sign up to be notified when it's available here.

This may not have existed when you first asked the question, but the PeopleBrowsr API is perfect for this, and you can go back 1,400 days with a single API call: https://developer.peoplebrowsr.com/pb
Hope that helps!

Keyhole can get you historical tweets as an .xls export or present them in a visual dashboard. The preview samples only a few of the most recent tweets, but you can request historical data if you email them.
See: http://keyhole.co/conversation_tracking

You can read historic Twitter data using Gnip's Historical PowerTrack tool. It gives you access to all Twitter data since the first tweet, and it is a fairly simple tool to use.

You can get free estimates of the data scope and cost using a service built by my company called Sifter. If you decide to purchase access to the data, it will be available via our text analytics platform DiscoverText, where you can search, filter, de-duplicate, cluster, human-code, and machine-classify the data.

Related

Requirements for a beginner using the Twitter API to retweet specific tweets

I am new to the Twitter API.
I want to search for tweets that contain 2 specific terms and 1 specific hashtag, and then retweet them from my account in order to consolidate all the tweets.
Do I need to have a developer account?
Should I look at an already existing app (I'd prefer one that is free or open source), or can I do this with the Twitter API as a regular user?
Any tutorials or instructions are greatly appreciated. TIA.
I have applied for a developer account, but I don't know how long approval will take, or what the criteria are for being granted one.
I found different kinds of "retweet" applets on ifttt.com. I implemented one of them, and it accomplished what I wanted to achieve, though not perfectly, and there was no documentation for customizing its functionality.
I couldn't find information anywhere about using the Twitter API without a developer account, so I applied for one. They emailed approximately 3 times to get more information about my use case, purposes, and what I intend to develop. My application was approved within approximately 48 hours.
I will update this answer if there is more information I think might be valuable to share.
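Once the developer account is approved, the search-and-retweet flow itself is short. Here is a minimal sketch, assuming the Tweepy library (v4) and standard v1.1 API access; the query and credentials are placeholders:

    import tweepy

    # placeholder credentials from your approved developer app
    auth = tweepy.OAuth1UserHandler("CONSUMER_KEY", "CONSUMER_SECRET",
                                    "ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth, wait_on_rate_limit=True)

    # two terms AND one hashtag; -filter:retweets avoids retweeting retweets
    query = "term1 term2 #hashtag -filter:retweets"
    for tweet in api.search_tweets(q=query, count=100):
        try:
            api.retweet(tweet.id)
        except tweepy.TweepyException:
            pass  # already retweeted, protected account, etc.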

Measure how hot a topic is on Twitter

What kind of service should I use to measure how hot a topic is on Twitter, and how hot it has been in the past?
I thought about:
The Twitter Search API (https://dev.twitter.com/rest/reference/get/search/tweets), which returns up to 100 tweets per request. In this case I would have to make multiple calls to determine how many tweets there are. Is that correct?
TweetReach, which gives reports like this: https://tweetreach.com/reports/16000571, but the cheapest plan is $300/month.
With the Twitter API, you have a few options, but none of them may be exactly what you want, and none of them can go back very far into the past. You would have to either compile that information yourself, or use an external service like the one you mentioned.
Using the Search API, you can only get results from the past 7 days, and you are limited to 100 tweets per request. You can also set result_type to popular to get the most popular tweets about that search term. Twitter does have rate limits, but the ones for search are relatively high: 180 requests every 15 minutes for any user you have authenticated, plus 450 requests every 15 minutes for the app itself (counted separately from the user requests). So if you only use app requests, you can get 45,000 tweets every 15 minutes.
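A sketch of counting recent tweets for a term using app-only (bearer token) authentication against the v1.1 search endpoint; the token is a placeholder:

    import requests

    BEARER = "YOUR_APP_BEARER_TOKEN"  # obtained for your app via POST /oauth2/token
    HEADERS = {"Authorization": "Bearer " + BEARER}
    BASE = "https://api.twitter.com/1.1/search/tweets.json"

    params = {"q": "some topic", "count": 100, "result_type": "recent"}
    total = 0
    url = BASE
    while url:
        page = requests.get(url, headers=HEADERS, params=params).json()
        total += len(page["statuses"])
        # search_metadata.next_results holds the query string for the next page
        next_results = page["search_metadata"].get("next_results")
        if not next_results:
            break
        url = BASE + next_results
        params = None  # next_results already encodes all parameters
    print(total, "matching tweets from the past 7 days")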
If you don't need to search for specific terms, you can get trending topics in different areas using trends/place. The available areas can be retrieved using trends/available. Fetching trends also gives you the tweet_volume of each trend over the past 24 hours. If you check the trends every 24 hours and save the volumes, you can build up histories of trending topics.
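A sketch of that daily collection job (run it from cron once a day), assuming the v1.1 trends/place endpoint and the same bearer-token setup as above; WOEID 1 is the worldwide area:

    import sqlite3
    import time
    import requests

    BEARER = "YOUR_APP_BEARER_TOKEN"  # placeholder
    db = sqlite3.connect("trends.db")
    db.execute("CREATE TABLE IF NOT EXISTS volumes (day TEXT, name TEXT, volume INTEGER)")

    resp = requests.get("https://api.twitter.com/1.1/trends/place.json",
                        headers={"Authorization": "Bearer " + BEARER},
                        params={"id": 1})  # WOEID 1 = worldwide
    day = time.strftime("%Y-%m-%d")
    for trend in resp.json()[0]["trends"]:
        # tweet_volume covers the past 24 hours; it can be null for small trends
        db.execute("INSERT INTO volumes VALUES (?, ?, ?)",
                   (day, trend["name"], trend["tweet_volume"]))
    db.commit()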
Another option is using the Streaming API. This only gives you current tweets, but you can use the track parameter to receive only tweets matching a set of terms, which you can then analyze.
Any external service, like TweetReach, will probably either cost you money or strictly limit the amount you can do with it unless you pay.
I'm the Social Media Manager for Union Metrics (we make TweetReach and lots of other things), and I just wanted to let you know that our free snapshots are built on the Search API, which gives them the restrictions you've already discussed above, while our full snapshot reports can grab up to 1,500 tweets for $20.
We do have more comprehensive Twitter analytics, which I think you've already looked at; those backfill 30 days and then track going forward. However, you might have missed our new product Echo, which allows a full, interactive search of the entire Twitter archive (you can see it in action here: https://unionmetrics.com/product/echo-twitter-archive-search/) and is available through our Social Suite.
I understand if you don't have a large budget, and I completely understand the dilemma of weighing the cost of your time to build what you need against budget restrictions. Hope this helps, or at least lets you know what else we offer!
Sarah A. Parker
Social Media Manager | Union Metrics
Fine Makers of TweetReach, The Union Metrics Social Suite, and more

Monitor Twitter hashtags in real time with the common Search API?

I've been working with the Twitter Search API, retrieving tweets with a PHP script run by a cron job 3 or 4 times per hour.
Everything works fine; I can save some fields from the resulting tweets into MySQL for research, contests, and accounting.
I began experiencing some trouble a few days ago, when a hashtag hit the global trending topics and the saved tweets weren't reflecting the real quantity of tweets we could see through search.
So:
1- Should I use the Twitter Streaming API instead?
2- Should I contact api AT twitter.com and request special permissions for my app or username?
3- Finally, is there a working way to achieve this "realtime" monitoring script that can give more accurate and complete results?
Thanks a lot in advance
I got a reply from the Twitter API staff.
It seems I should use the Streaming API, and they pointed me to this URL:
https://dev.twitter.com/docs/streaming-api/methods#track
Hope it is useful for others
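For reference, the track method they pointed to filters the live stream by keyword, so nothing is missed between cron runs. A minimal sketch, assuming the v1.1 statuses/filter endpoint, OAuth 1.0a credentials (placeholders), and the requests/requests-oauthlib libraries:

    import json
    import requests
    from requests_oauthlib import OAuth1

    auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET",
                  "ACCESS_TOKEN", "ACCESS_SECRET")

    # statuses/filter pushes every tweet matching the tracked terms as it happens
    resp = requests.post("https://stream.twitter.com/1.1/statuses/filter.json",
                         auth=auth, data={"track": "#myhashtag"}, stream=True)
    for line in resp.iter_lines():
        if not line:
            continue  # keep-alive newline
        tweet = json.loads(line)
        if "text" in tweet:
            print(tweet["user"]["screen_name"], tweet["text"])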

Public Hotel API [closed]

I need to programmatically pull a list of hotel names and addresses based on a city and state, or a zip code. I am looking for a public API that can accommodate real-time searching. I have evaluated the Yahoo Local Search, Google Local, and Kayak APIs but have found them unusable for the following reasons:
Yahoo Local - commercial use not allowed
Google Local - must attribute to Google (OK), cannot intersperse results with other data, cannot save any of the data
Kayak - limited to 1,000 queries a day
Any ideas would be appreciated. Thanks!
I recommend the Booking.com and Expedia APIs.
In my search for hotel APIs I have found only one that gives unrestricted open access to its hotel database and allows you to book its hotels:
Expedia's EAN http://developer.ean.com/
You need to sign up for their affiliate program, which is very easy. You get immediate access to their hotel database, plus you can make availability/booking requests with several response options, including JSON, which is more convenient and lightweight than the (unfortunately) more widespread XML.
Since you immediately get access to their API, you can start developing and testing right away, but you still need their approval to launch the site, basically to make sure it provides the needed quality and security, which is reasonable.
They also offer "deep linking", i.e. you may customize your requests by adding parameters. If that is sufficient for your purpose (for mine it is not), you don't even need to store their content on your server.
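The hotel-list request looks roughly like the sketch below. The endpoint path and parameter names here are illustrative rather than verbatim from the EAN documentation, so check http://developer.ean.com/ for the exact request format and required credentials:

    import requests

    # Hypothetical parameter names; consult the EAN v3 hotel list docs for the
    # exact endpoint, required affiliate credentials, and response fields.
    params = {
        "apiKey": "YOUR_EAN_API_KEY",   # placeholder affiliate credentials
        "city": "Chicago",
        "stateProvinceCode": "IL",
        "countryCode": "US",
        "_type": "json",                # ask for JSON instead of XML
    }
    resp = requests.get("http://api.ean.com/ean-services/rs/hotel/v3/list",
                        params=params)
    summaries = resp.json()["HotelListResponse"]["HotelList"]["HotelSummary"]
    for hotel in summaries:
        print(hotel["name"], hotel["address1"])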
I have also signed up for the HotelsCombined program: (link removed as this site doesn't seem to let me post more links)
However, they do not immediately allow you to use their API, even for testing. From their answer:
"Apologies for the inconvenience caused, but it's simply a business decision to limit access to our rich hotel content. Please kindly check back within the next 2-3 months, where we will be able to judge your traffic, and in turn judge your status on standard data feeds."
I have also signed up for the Booking.com affiliate program: (link removed as this site doesn't seem to let me post more links)
Unfortunately, again, they limit access. From their answer: "Please do note that, since there's a high amount of time and cost involved in the XML integration, we are only able to offer the XML integration to a small amount of partners with a high potential."
I did not explore TripAdvisor, as they seem to offer only the top 10 hotels, and only as widgets; most importantly for me, they wouldn't allow booking through them.
I've checked the hotelbase.org API mentioned in another answer; it has a very extensive list, but not as rich as Expedia's, and it doesn't seem to have images or allow booking either.
Try contacting Orbitz.com's affiliate team and checking out Booking.com's API (http://xml.booking.com/); both have good APIs.
Expedia has a Mashery-built API through EAN (Expedia Affiliate Network) that exposes hotel data and is incredibly easy to sign up for:
http://developer.ean.com/docs/hotels/version_3/request_hotel_list/Examples/
Check out api.hotelsbase.org - it's a free XML hotel API.

Suggestions to log the number of requests when building an API? [closed]

We have a pretty large website, which handles a couple million visitors a month, and we're currently developing an API for some of our partners. In the near future, we hope to expose the API to all of our visitors.
As we are trying to limit the number of requests to, say, 100,000/day and 1,000/minute, I need to log the total number of requests per API key and verify that the API user doesn't exceed this total. We won't check in real time whether the limit is exceeded at this point, but afterwards in the site's control panel. We also need to display a timeline per user in the control panel, so we need a quick per-day or per-hour overview.
My first idea was to build the following app:
API user => web server => posts a message with the API key to a message queue => a service picks up the message => posts to the database, where there is one row for each user-hour combination (key|hour|count). That would be quite fast, yet it discards quite a bit of useful information (queries, requests/minute, etc.). Saving each and every request as a separate record in the database would likely generate millions of records a day, and would (I guess; I'm not much of a DBA) be quite slow when generating charts, even with the correct indices.
Our platform consists of around ten web servers, ten front-end SQL servers, a stats server, and some other servers for processing large tasks. Everything runs Windows (except our EMC) with MS SQL; the dev platform is ASP.NET WCF.
My advice would be to log everything - a simple append-only text file is simplest - and have a background task periodically read and summarize log segments into the database (see the sketch after this list). This has several advantages over more 'sophisticated' approaches:
It's simpler.
It's really easy to debug.
You can keep individual log segments around until you need to delete them from disk, so you can get information on individual requests for debugging and accounting purposes.
You can easily extend it to collect more information, or improve and change your summarizer, because the components are loosely coupled.
It's easy to shard - just have each server keep its own logs.
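A minimal Python/SQLite sketch of that summarizer, assuming one log line per request in the form "timestamp<TAB>api_key" and a table keyed by (api_key, hour); the daily-limit check is done afterwards, matching the question's control-panel requirement (the real system would target MS SQL, of course):

    import sqlite3
    from collections import Counter

    DAILY_LIMIT = 100_000

    db = sqlite3.connect("apilog.db")
    db.execute("""CREATE TABLE IF NOT EXISTS usage (
                      api_key TEXT, hour TEXT, count INTEGER,
                      PRIMARY KEY (api_key, hour))""")

    def summarize(segment_path):
        # each log line looks like: 2010-05-01T13:37:02<TAB>some-api-key
        counts = Counter()
        with open(segment_path) as f:
            for line in f:
                timestamp, api_key = line.rstrip("\n").split("\t")
                counts[(api_key, timestamp[:13])] += 1  # truncate to the hour
        for (api_key, hour), n in counts.items():
            db.execute("""INSERT INTO usage VALUES (?, ?, ?)
                          ON CONFLICT(api_key, hour)
                          DO UPDATE SET count = count + excluded.count""",
                       (api_key, hour, n))
        db.commit()

    def over_limit(day):  # e.g. "2010-05-01"
        # afterwards-style check for the control panel, not real-time enforcement
        return db.execute("""SELECT api_key, SUM(count) FROM usage
                             WHERE hour LIKE ? GROUP BY api_key
                             HAVING SUM(count) > ?""",
                          (day + "%", DAILY_LIMIT)).fetchall()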
I'd start with logging at first and leave off enforcement. Your logging may show you that you don't need enforcement, or it may show you that you need a different kind of enforcement.
I'd just start off by creating a simple logging API: ApiLogger.Log(apiKey). I'd have the logger take authentication information, etc., from the HttpContext. I'd start by just dumping it into a database table, and only get fancier if performance requires it.
Later analysis can determine who is making how many calls, whether you want multiple tiers, different charges per tier, etc. For the moment, just store the data your business people will need.
As we are trying to limit the number of requests to, say, 100,000/day and 1,000/minute, I need to log the total number of requests per API key, and verify that the API user doesn't exceed this total.
A feature like this will be part of WCF (if it isn't already) in the very near future. I am currently racking my brain over where I heard it so I can point you in the right direction.
EDIT: FOUND IT!
This week on a podcast called "The Thirsty Developer", this very topic came up. Download the podcast here; at 39:40 into the podcast the topic comes up. For those who do not want to listen, there is a REST toolkit that has this feature in it. I think the toolkit can be found here.