After many tests, I've been unable to get the Twitter Search API to return more than 80% of the tweets containing a specific keyword or hashtag. This is not related to the maximum number of results: one test involved a hashtag that had been tweeted 50 times, and only 15 of those tweets were returned by the Twitter Search API. Twitter's own search tool returned the same results.
Is the Twitter Search API simply a tool for getting estimates and trends, rather than accurate data?
Has anyone found a way to capture 100% of tweets containing a specific keyword or hashtag?
Twitter filters Search API results to improve their quality. Here is a quote from the developer site:
Both the Streaming API and the Search API filter, and on some end-points, discard, statuses created by a small proportion of accounts based upon status quality metrics. For example, frequent and repetitious status updates may, in some instances, and in combination with other metrics, result in a different status quality score for a given account.
So the Search API simply returns a subset of the matching tweets.
The pricing regarding CSE is a little bit vague:
For CSE users, the API provides 100 search queries per day for free. If you need more, you may sign up for billing in the API Console. Additional requests cost $5 per 1000 queries, up to 10k queries per day
Does one query equal one keyword regardless of the pagination used, or one request? (In this sense XML is more efficient than JSON, as it allows 20 in the num parameter, as opposed to JSON's 10.)
Are the queries counted per API key, or per cx key?
It is vague and you are not the first to be puzzled. When I did my research I found this blog post helpful.
I assume you are talking about Custom Search Engine (the term you used in your question) and NOT Google Site Search (which is paid from the start). I ask because the XML API is only available to Google Site Search customers; for CSE, the JSON/Atom Custom Search API is available.
For Q1, one query = one request. You can use as many keywords or other parameters in your request as you like (see the comments in the blog post I referenced), but you will always be limited to 100 results.
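To make "one query = one request" concrete, here is a minimal sketch of the request URLs needed to read every available result for a single search. It assumes the Custom Search JSON API endpoint and its key, cx, q, start (1-based) and num (at most 10) parameters, and that the last usable page starts at result 91; API_KEY and CX are placeholders, not real credentials.

```javascript
// Sketch: request URLs needed to page through the Custom Search JSON API.
// Assumptions: API_KEY and CX are placeholders; parameter names follow the
// Custom Search JSON API (key, cx, q, start, num).
const ENDPOINT = "https://www.googleapis.com/customsearch/v1";
const MAX_RESULTS = 100; // the API never returns more than 100 results per query
const PAGE_SIZE = 10;    // num accepts at most 10 results per request

function pageUrls(apiKey, cx, query) {
  const urls = [];
  // start is 1-based; keep the last result index (start + num - 1) within the cap.
  for (let start = 1; start + PAGE_SIZE - 1 <= MAX_RESULTS; start += PAGE_SIZE) {
    const params = new URLSearchParams({
      key: apiKey,
      cx: cx,
      q: query,
      start: String(start),
      num: String(PAGE_SIZE),
    });
    urls.push(`${ENDPOINT}?${params.toString()}`);
  }
  return urls;
}

// Each URL is a separate request, so each one counts as one query
// against the daily quota: reading all 100 results costs 10 queries.
const urls = pageUrls("API_KEY", "CX", "house music");
console.log(urls.length); // 10
```

In other words, a single search that you page through to the end uses 10 of the 100 free daily queries.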
For Q2, billing is enabled through the API Console. Once it is enabled (and in order to allow the 101st query), your code must include both your cx and your API key. So in theory you could set up multiple search engines within your API project and keep each one under the 100-request limit, but I have not seen a way to let one API key support multiple cx keys.
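The pricing quoted in the question reduces to simple daily arithmetic. This sketch assumes (the pricing text does not say so) that the 10,000-per-day cap includes the 100 free queries and that paid queries are billed pro rata at $0.005 each; the function name is hypothetical.

```javascript
// Sketch of the daily cost arithmetic from the quoted CSE pricing:
// 100 free queries/day, then $5 per 1,000 queries, capped at 10,000/day.
// Assumptions (not confirmed by the pricing text): the 10,000 cap includes
// the free 100, and paid queries are billed pro rata ($0.005 each).
const FREE_PER_DAY = 100;
const MAX_PER_DAY = 10000;
const USD_PER_1000 = 5;

function dailyCostUsd(queries) {
  if (queries > MAX_PER_DAY) {
    throw new RangeError(`at most ${MAX_PER_DAY} queries per day`);
  }
  const billable = Math.max(0, queries - FREE_PER_DAY);
  return (billable * USD_PER_1000) / 1000;
}

console.log(dailyCostUsd(100));   // 0
console.log(dailyCostUsd(1100));  // 5
console.log(dailyCostUsd(10000)); // 49.5
```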
Context
I am providing some consultancy on making HTTP GET requests with the YouTube Data API v3, in order to develop a Windows-based application that GETs a list of results from YouTube for, say, a specific CATEGORY or a specific TAG.
We are open to using any programming language (I'm from a C++ background and am hoping YouTube supports direct HTTP connections without the Google client SDK and so on) to connect to YouTube and GET data over HTTP. We would only do this once a month or so, so YouTube API quotas should not be a problem.
The Issue
Some of my client's web developers are telling us that YouTube API v3 will only return a maximum of 500 results, even for a query that returns JUST the total view count, the video's link, and basic metadata like that.
So, say I wish to find 5,000 results for the category "House music" or "basketball", and I have the developer key etc. all set up: would that be possible?
If so, which GET fields would I need to populate (such as "max_results_per_page")?
Thank you.
The API won't provide more than ~500 search results for any arbitrary query. This is by design. Technically, it means that the nextPageToken field won't be returned once you hit ~500 results, and no additional parameter can change that.
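A minimal sketch of the paging loop this describes: keep requesting pages until the response no longer carries a nextPageToken. The fetchPage callback is a hypothetical function you would supply (in a real client it would GET https://www.googleapis.com/youtube/v3/search with part=snippet, q, maxResults=50, pageToken and key); the offline stand-in below only imitates the ~500-result cap so the loop can be run without a key.

```javascript
// Sketch: follow nextPageToken until the API stops returning one.
// fetchPage is a hypothetical async function: it receives a pageToken
// (undefined for the first page) and resolves to a parsed search.list
// response of the form { items, nextPageToken }.
async function collectAllResults(fetchPage) {
  const items = [];
  let pageToken;
  do {
    const response = await fetchPage(pageToken);
    items.push(...(response.items || []));
    pageToken = response.nextPageToken; // absent once ~500 results are reached
  } while (pageToken);
  return items;
}

// Offline stand-in that mimics the cap: 10 pages of 50 items, then no token.
function fakeFetchPage(pageToken) {
  const page = pageToken ? Number(pageToken) : 0;
  const items = Array.from({ length: 50 }, (_, i) => ({ id: page * 50 + i }));
  const nextPageToken = page < 9 ? String(page + 1) : undefined;
  return Promise.resolve({ items, nextPageToken });
}

collectAllResults(fakeFetchPage).then((all) => console.log(all.length)); // 500
```

No parameter changes where the loop ends; the only lever is the query itself.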
If you want more than ~500 results for a query, you have to split it into more specific sub-queries. I'd suggest using the publishedAfter and publishedBefore parameters to achieve that, but feel free to experiment with the other ones here.
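One way to do that split, sketched under assumptions: divide a date range into equal windows and run one search per window, passing each pair as publishedAfter/publishedBefore (RFC 3339 timestamps). The window count of 12 is an arbitrary guess you would tune to the topic's volume, and dateWindows is a hypothetical helper. Whether the boundaries are inclusive is not something I have checked, so deduplicating by video ID across windows is a sensible precaution.

```javascript
// Sketch: split one broad search into date windows so each sub-query can
// stay under the ~500-result ceiling. Each window supplies the
// publishedAfter/publishedBefore values for its own search.list query.
function dateWindows(start, end, windowCount) {
  const startMs = start.getTime();
  const spanMs = end.getTime() - startMs;
  const windows = [];
  for (let i = 0; i < windowCount; i++) {
    // Both bounds use the same formula, so consecutive windows share an edge.
    const from = new Date(startMs + Math.floor((spanMs * i) / windowCount));
    const to = new Date(startMs + Math.floor((spanMs * (i + 1)) / windowCount));
    windows.push({
      publishedAfter: from.toISOString(),
      publishedBefore: to.toISOString(),
    });
  }
  return windows;
}

const windows = dateWindows(
  new Date("2020-01-01T00:00:00Z"),
  new Date("2021-01-01T00:00:00Z"),
  12 // a guess; busier topics need narrower windows
);
console.log(windows[0]);
```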
This only holds for search queries. Other methods, such as playlistItems.list, deliver more results: I have tested retrieving 100,000 items to get the videos of a playlist.
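For a playlist, the same pageToken loop keeps going past 500. This sketch builds the playlistItems.list request URL and counts the requests a playlist of a given size needs at the maximum page size of 50. The parameter names come from the YouTube Data API v3 reference; PLAYLIST_ID, API_KEY and the helper names are placeholders.

```javascript
// Sketch: playlistItems.list paging. Parameter names (part, playlistId,
// maxResults, pageToken, key) follow the YouTube Data API v3 reference;
// PLAYLIST_ID and API_KEY are placeholders.
const PLAYLIST_ITEMS_URL = "https://www.googleapis.com/youtube/v3/playlistItems";
const MAX_PER_PAGE = 50; // maxResults accepts 0-50

function playlistPageUrl(playlistId, apiKey, pageToken) {
  const params = new URLSearchParams({
    part: "snippet",
    playlistId,
    maxResults: String(MAX_PER_PAGE),
    key: apiKey,
  });
  // Omit pageToken on the first request; pass the previous nextPageToken after that.
  if (pageToken) params.set("pageToken", pageToken);
  return `${PLAYLIST_ITEMS_URL}?${params}`;
}

// Requests needed to read a playlist of a given size at 50 items per page.
function requestsNeeded(itemCount) {
  return Math.ceil(itemCount / MAX_PER_PAGE);
}

console.log(requestsNeeded(100000)); // 2000
```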
Is this API only for searching your own website, or can any standard Google search (even with advanced search features) be submitted to it? I understand there is a limit of 100 queries per day; I am just curious whether it can be invoked from, say, your own machine, since the code samples and introduction indicate that its intended use is displaying results on your website. I want to search outside of a given domain and scrape standard Google results for any given search. This will not be an AJAX call.
My current understanding:
You're limited to 100/day only if you don't pay.
You do have to specify domains, but some TLDs are fine (e.g. .uk).
There's a limit of 100 search results for any given search query (ten pages of up to ten results each).
It can be invoked from your own machine.
With https://dev.twitter.com/docs/api/1/get/statuses/user_timeline I can get the 3,200 most recent tweets. However, certain sites like http://www.mytweet16.com/ seem to bypass the limit, and browsing through the API documentation, I could not find anything that explains how.
How do they do it, or is there another API that doesn't have the limit?
You can use the Twitter search page to bypass the 3,200 limit. However, you have to scroll down many times on the search results page. For example, I searched for tweets from @beyinsiz_adam. This is the link to the search results:
https://twitter.com/search?q=from%3Abeyinsiz_adam&src=typd&f=realtime
To scroll down repeatedly, you can use the following JavaScript code:
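// Scroll to the bottom once per second so the page keeps loading older tweets.
var scrollTimer = setInterval(function () {
  window.scrollTo(0, document.body.scrollHeight);
}, 1000);
// When no more tweets load, stop scrolling with: clearInterval(scrollTimer);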
Run it in the Firebug console (or your browser's developer console) and wait a while for all the tweets to load.
The only way to see more is to start saving them before the user's tweet count hits 3,200. Services that show more than 3,200 tweets have saved them in their own databases. There's currently no way to get more than that through any Twitter API.
http://www.quora.com/Is-there-a-way-to-get-more-than-3200-tweets-from-a-twitter-user-using-Twitters-API-or-scraping
https://dev.twitter.com/discussions/276
Note from that second link: "…the 3,200 limit is for browsing the timeline only. Tweets can always be requested by their ID using the GET statuses/show/:id method."
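"Start saving them early" amounts to polling the timeline and merging each batch into your own store. A minimal sketch, assuming you pass the highest stored ID as user_timeline's since_id parameter on the next poll; the in-memory Map stands in for a real database, and mergeBatch/nextSinceId are hypothetical helpers. Tweets are keyed by id_str and compared with BigInt because tweet IDs exceed JavaScript's safe integer range.

```javascript
// Sketch: archive a user's tweets as they arrive so nothing is lost once the
// timeline passes 3,200. The Map is a stand-in for a real database.
function mergeBatch(store, batch) {
  let added = 0;
  for (const tweet of batch) {
    // id_str is used because numeric tweet IDs lose precision as JS numbers.
    if (!store.has(tweet.id_str)) {
      store.set(tweet.id_str, tweet);
      added++;
    }
  }
  return added;
}

// since_id for the next poll: the highest ID already stored, so only newer
// tweets are requested. Returns undefined when the store is empty.
function nextSinceId(store) {
  let max;
  for (const id of store.keys()) {
    if (max === undefined || BigInt(id) > BigInt(max)) max = id;
  }
  return max;
}

const archive = new Map();
mergeBatch(archive, [{ id_str: "1000000000000000001" }, { id_str: "1000000000000000002" }]);
mergeBatch(archive, [{ id_str: "1000000000000000002" }, { id_str: "1000000000000000003" }]);
console.log(archive.size, nextSinceId(archive)); // 3 1000000000000000003
```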
I've worked in this (Twitter) industry for a long time and have witnessed many changes in the Twitter API and its documentation. I would like to clarify one thing for you: there is no way to exceed the 3,200-tweet limit. Twitter doesn't provide this data even in its new premium API.
The only way someone can exceed this limit is by saving an individual Twitter user's tweets as they are posted.
There are tools available that claim to have a large database and can provide more than 3,200 tweets. A few of them that I know of are followersanalysis.com and keyhole.co.
You can use a tool I wrote that bypasses the limit.
It saves the tweets in JSON format.
https://github.com/pauldotknopf/twitter-dump
You can use the Python library snscrape to do it, or you can use the ExportData tool to get all tweets for the user, which returns preprocessed CSV and spreadsheet files. The first option is free, but it provides less information and requires more manual work.
I'm building a Twitter application to show specific tweets (those matching predefined criteria). I use a good library to grab the tweets, and before they are shown to the user, the tweets must be stored in a local database, so that I have more data and can calculate amazing statistics (ego? huh) to show the user.
The problem is that tweets don't stay searchable under the hashtag: if I search for the hashtag one week later, I will not be able to find them, so I need a way to show the tweets from the database instead of the Twitter API. I decided to show data from the database when the last stored tweet for a hashtag is three or more days old; when it is less than three days old, I will ask Twitter for the tweets.
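The three-day rule above fits in one small function. This is a sketch: chooseSource and its return strings are hypothetical names, and treating "no tweets stored yet" as "ask Twitter" is my assumption, since the rule as stated does not cover that case.

```javascript
// Sketch of the three-day rule. lastStoredAt is the timestamp of the newest
// locally stored tweet for the hashtag, or null if none is stored (assumption:
// an empty cache means asking Twitter).
const THREE_DAYS_MS = 3 * 24 * 60 * 60 * 1000;

function chooseSource(lastStoredAt, now = new Date()) {
  if (lastStoredAt === null) return "twitter"; // nothing cached yet
  const ageMs = now.getTime() - lastStoredAt.getTime();
  // Three days or older: serve from the database; newer: ask Twitter.
  return ageMs >= THREE_DAYS_MS ? "database" : "twitter";
}

const now = new Date("2024-01-10T00:00:00Z");
console.log(chooseSource(new Date("2024-01-06T00:00:00Z"), now)); // database
console.log(chooseSource(new Date("2024-01-09T00:00:00Z"), now)); // twitter
```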
So I'm asking whether you have an idea of how to show tweets from the database, given that my library works with JSON (or XML). Any ideas?
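Store the tweets in CouchDB. If you use the Twitter Streaming API or Search API, which return tweets as JSON, that should be the most straightforward way of "saving" tweets, since CouchDB stores JSON documents natively and returns them as JSON.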
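A minimal sketch of that round trip, built as plain request descriptions so it runs without a server. It assumes a local CouchDB at http://localhost:5984 with a hypothetical database named "tweets", and tweets shaped like the v1.1 JSON (id_str, entities.hashtags[].text). Documents are stored with PUT /{db}/{docid} and read back through the Mango /{db}/_find endpoint, whose results are JSON again, which a JSON-based library can consume.

```javascript
// Sketch: store tweets in CouchDB and query them back by hashtag.
// Assumptions: local CouchDB on the default port 5984, a hypothetical
// database named "tweets", and v1.1-style tweet JSON.
const COUCH_DB_URL = "http://localhost:5984/tweets";

// Using id_str as _id means saving the same tweet twice gives a 409 conflict
// instead of a duplicate document.
function toCouchDoc(tweet) {
  return { _id: tweet.id_str, ...tweet };
}

// Request that stores one document: PUT /tweets/{_id} with the JSON body.
function putRequest(doc) {
  return {
    method: "PUT",
    url: `${COUCH_DB_URL}/${encodeURIComponent(doc._id)}`,
    body: JSON.stringify(doc),
  };
}

// Body to POST to /tweets/_find: tweets whose hashtags include `tag`.
function hashtagQuery(tag, limit = 100) {
  return {
    selector: {
      "entities.hashtags": { $elemMatch: { text: tag } },
    },
    limit,
  };
}

console.log(JSON.stringify(hashtagQuery("nodejs")));
```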