Twitter API: How to search only for geotagged tweets - api

How can I use Twitter Search API (or other) to get a list of tweets which have the "geo" param?
--EDIT--
For example: I want to get a list of geotagged tweets with the #apple tag, worldwide, without any location filter.

Looks like the latest API supports that; simply use a large enough geo region for your query:
-180,-90,180,90
See the API documentation on filtering by location for more details.

The streaming API allows you to filter by a location and the search API allows you to search by geocode. You can find more information on these services on our developer resources site.
Streaming API: http://dev.twitter.com/pages/streaming_api
Example: Create a file called "locations" that contains, excluding the quotation marks, the phrase:
"locations=-122.75,36.8,-121.75,37.8,-74,40,-73,41"
then execute:
curl -d @locations http://stream.twitter.com/1/statuses/filter.json -uAnyTwitterUser:Password
You will receive all geotagged tweets from the San Francisco and New York City areas.
Search API: http://dev.twitter.com/doc/get/search
Example: http://search.twitter.com/search.json?geocode=37.781157,-122.398720,1mi
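The locations value in the streaming example is just a flat list of bounding boxes. Reading the San Francisco and New York numbers, each box appears to be given as longitude,latitude of the south-west corner followed by longitude,latitude of the north-east corner. A minimal sketch of building that value follows; the helper name locations_param and the constants are made up for illustration and are not part of any Twitter library.

```python
from typing import Iterable, Tuple

# A box is (west, south, east, north): the order assumed from the example above.
Box = Tuple[float, float, float, float]


def locations_param(boxes: Iterable[Box]) -> str:
    """Join bounding boxes into one comma-separated `locations` value."""
    parts = []
    for west, south, east, north in boxes:
        if not (-180 <= west <= 180 and -180 <= east <= 180):
            raise ValueError("longitude must be within [-180, 180]")
        if not (-90 <= south <= north <= 90):
            raise ValueError("latitude must be within [-90, 90] with south <= north")
        parts.extend(str(value) for value in (west, south, east, north))
    return ",".join(parts)


SAN_FRANCISCO = (-122.75, 36.8, -121.75, 37.8)
NEW_YORK = (-74, 40, -73, 41)
WHOLE_WORLD = (-180, -90, 180, 90)  # the "large enough region" suggested in this thread

sf_and_nyc = locations_param([SAN_FRANCISCO, NEW_YORK])
everywhere = locations_param([WHOLE_WORLD])
```

The resulting string is what goes after "locations=" in the file passed to curl.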

From the Twitter API Documentation, this should be the format of your search query:
http://search.twitter.com/search.json?geocode=37.781157,-122.398720,1mi
Where 37.781157 is the latitude, -122.398720 is the longitude and 1mi is the radius to search within.
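As a rough sketch, the geocode value can be assembled from its three parts like this. The function name geocode_search_url is hypothetical, the endpoint is the one quoted above, and the optional q parameter and the "km" unit are assumptions based on my reading of the search API docs, so check them before relying on this.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://search.twitter.com/search.json"  # endpoint quoted in this thread


def geocode_search_url(lat, lon, radius, unit="mi", query=""):
    """Build a search URL restricted to a circle around (lat, lon)."""
    if unit not in ("mi", "km"):  # assumption: only these two units are accepted
        raise ValueError("unit must be 'mi' or 'km'")
    params = {"geocode": f"{lat},{lon},{radius}{unit}"}
    if query:
        params["q"] = query  # e.g. a hashtag; it is percent-encoded below
    # Keep the commas readable instead of encoding them as %2C.
    return SEARCH_URL + "?" + urlencode(params, safe=",")


url = geocode_search_url(37.781157, -122.39872, 1)
apple_url = geocode_search_url(37.781157, -122.39872, 1, query="#apple")
```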

You can fetch every tweet but save only the geotagged ones.
I know it doesn't make a lot of sense, but it works quite well.
If your search results are in a variable called results, you can write:
for result in results:
    if result.geo is not None:
        print(result.text)  # or do anything you want with the tweets

Use -180,-90,180,90 to get any geotagged tweet.

Related

how to get table info and summary of page using Wikipedia api?

I want to get minimal information of a Wikipedia page using MediaWiki API like DuckDuckGo. For example for Steve Carell: https://duckduckgo.com/?q=steve+carell&t=hp&ia=news&iax=about
How can I get this information with a Wikipedia url (eg https://en.wikipedia.org/wiki/Steve_Carell) in HTML format?
You can use the MediaWiki API for that. There's an extension, TextExtracts, which is exactly for that (and it is installed on Wikipedia).
In your case, e.g.:
https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exsentences=1&titles=Steve%20Carell
will return something like:
<p class=\"mw-empty-elt\">\n</p>\n\n<p class=\"mw-empty-elt\">\n \n</p>\n<p><b>Steven John Carell</b> (<span></span>; born August 16, 1962) is an American actor, comedian, producer, writer and director.</p>
You can also customize how many sentences (or characters) the API returns; please consult the API documentation for that.
There's also a way to retrieve the short description, which is saved at Wikidata (and visible in the mobile view of Wikipedia). This call would be:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&titles=Steve_Carell
This returns the following property in the pageprops of the page:
"wikibase-shortdesc": "American actor"
This may fit better depending on your use case.
You can even get both of the results with a single, combined, request:
https://en.wikipedia.org/w/api.php?action=query&prop=extracts|pageprops&exsentences=1&titles=Steve_Carell
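Here is a small sketch of building the combined request and picking both values out of the JSON reply. It assumes format=json is added to the request and that the reply has the usual query → pages → page-id layout; the helper names and the sample reply (including its page id) are made up for illustration.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def summary_url(title, sentences=1):
    """Build the combined extracts + pageprops request for one title."""
    params = {
        "action": "query",
        "prop": "extracts|pageprops",
        "exsentences": sentences,
        "titles": title,
        "format": "json",  # assumption: ask for JSON rather than the default output
    }
    return API_URL + "?" + urlencode(params)


def parse_summary(response):
    """Return title, HTML extract and short description for each page."""
    results = []
    for page in response.get("query", {}).get("pages", {}).values():
        results.append({
            "title": page.get("title"),
            "extract_html": page.get("extract"),
            "short_description": page.get("pageprops", {}).get("wikibase-shortdesc"),
        })
    return results


# Hand-written sample reply; the page id "1" is a placeholder, not the real id.
sample = {"query": {"pages": {"1": {
    "title": "Steve Carell",
    "extract": "<p><b>Steven John Carell</b> is an American actor.</p>",
    "pageprops": {"wikibase-shortdesc": "American actor"},
}}}}

summaries = parse_summary(sample)
```

To use it against the live API, fetch summary_url("Steve Carell") with any HTTP client, decode the JSON body, and pass the resulting dict to parse_summary.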

Exclude retweets from twitter streaming api using tweepy

When using the python tweepy library to pull tweets from twitter's streaming API is it possible to exclude retweets?
For instance, if I want only the tweets posted by a particular user ex: twitterStream.filter(follow = ["20264932"]) but this returns retweets and I would like to exclude them. How can I do this?
Thank you in advance.
Just checking a tweet's text to see if it starts with 'RT' is not really a robust solution. You need to make a decision about what you will consider a retweet, since it isn't exactly clear-cut. The Twitter API docs explain that tweets with 'RT' in the tweet text aren't officially retweets.
Sometimes people type RT at the beginning of a Tweet to indicate that they are re-posting someone else's content. This isn't an official Twitter command or feature, but signifies that they are quoting another user's Tweet.
If you're going by the 'official' definition, then you want to filter tweets out if they have a True value for their retweeted attribute, like this:
if not tweet['retweeted']:
    pass  # do something with standard tweets
And if you want to be more inclusive, including 'unofficial' retweets, you should check the string for the substring 'RT @' rather than merely checking whether it starts with 'RT', because the former is cleaner, faster and eliminates more edge cases where a tweet starts with 'RT' but isn't a retweet (with so much data out there, I'm sure this happens). Here's some code for that:
if not tweet['retweeted'] and 'RT @' not in tweet['text']:
    pass  # do something with standard tweets
The latter conditional takes the subset of tweets in your collection that are regular tweets and intersects it with the subset of tweets that do not have 'RT @' in the tweet text, leaving you with tweets that are supposedly regular tweets.
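One caveat worth checking against the current docs: as far as I understand the v1.1 tweet payload, the retweeted flag reflects whether the authenticating user has retweeted the tweet, while a native retweet carries a retweeted_status object. Under that assumption, a filter could be sketched as follows; the function name is_retweet is made up for illustration.

```python
def is_retweet(tweet, include_unofficial=True):
    """Guess whether a decoded tweet dict is a retweet.

    Assumption: native retweets include a 'retweeted_status' key.
    """
    if "retweeted_status" in tweet:
        return True
    # Optionally also treat manual "RT @user ..." re-posts as retweets.
    if include_unofficial and "RT @" in tweet.get("text", ""):
        return True
    return False


# Hand-written sample tweets for illustration only.
tweets = [
    {"text": "Just a normal tweet"},
    {"text": "RT @someone: quoted by hand"},
    {"text": "RT @someone: native retweet", "retweeted_status": {"text": "native retweet"}},
]
originals = [t for t in tweets if not is_retweet(t)]
```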
Yes, there are ways of doing this. One of them is to check whether the text of the tweet starts with RT, using the .startswith() method on strings. To do this, change the on_data() method in your streaming class, for example:
import json
import tweepy

class TwitterStreamListener(tweepy.StreamListener):
    def on_data(self, data):
        # Twitter returns data in JSON format - we need to decode it first
        decoded = json.loads(data)
        # Not every stream message is a tweet, so 'text' may be missing
        text = decoded.get('text', '')
        if text and not text.startswith('RT'):
            # Do processing here
            print(text)
        return True

REST API Explore: How to get same list ordered like the search on Foursquare Website?

I'm using the REST API (venues platform) to get a list of the top 5 venues per destination and category, like they are listed in the search results on the foursquare website.
For example, I make the following request using the explore API endpoint:
https://api.foursquare.com/v2/venues/explore?near=zurich,CH&query=College %26 University
I'm using the API unauthenticated. I do not pass a radius, so that the default radius is used.
The results from the API are as follows:
1. ETH Hönggerberg HIL
2. Technopark
3. Kantonsschule Stadelhofen
4. Zürcher Hochschule der Künste, Departement Musik
5. Klubschule Migros
When I search on the Foursquare website as follows (not logged in!):
https://foursquare.com/explore?near=Zurich%2C%20CH&q=College%20%26%20University
I get the following results listed on the website:
1. SBB Digital
2. Technopark
3. ETH Hönggerberg HIL
4. EF Education First
5. Klubschule Migros
Is it possible to get the same list as shown on the website (in the same order) from the API as well? If so, how? How do I have to call the API, or how do I have to sort its results, to get the same list?
First I tried sorting the API results by the rating field, but that doesn't do the trick at all, because in this example none of the first 30 results has a rating.
Thank you in advance for your help!
Greets
Tom
Make sure you are making the explore request with the same user account as on the web. Also, it looks like your queries do not match up exactly. One is "College & University" and the other is "college". Try matching up the queries and see whether the results change.
Well, meanwhile I got the answer from Foursquare.
The trick is very simple: just add the following parameters to the API call: time=any and day=any.
Then you get the results ordered exactly the same way as on the Foursquare web page (when not authenticated).
Thank you david from foursquare! :-)
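For reference, a minimal sketch of the request with those two parameters added. The helper name explore_url is made up, and the credential parameters shown in the usage line (client_id, client_secret, v) are placeholders for whatever your own setup requires.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"  # endpoint from the question


def explore_url(near, query, **extra):
    """Build an explore request that ignores time of day and day of week."""
    params = {
        "near": near,
        "query": query,
        "time": "any",  # the fix described above
        "day": "any",   # the fix described above
    }
    params.update(extra)  # e.g. credentials; these names are not checked here
    return EXPLORE_URL + "?" + urlencode(params)


url = explore_url("zurich,CH", "College & University",
                  client_id="YOUR_ID", client_secret="YOUR_SECRET", v="YYYYMMDD")
```

Using urlencode also takes care of escaping the space and the ampersand in the query.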

How to get the result of "all pages with prefix" using Wikipedia api?

I wish to use Wikipedia api to extract the result of this page:
http://en.wikipedia.org/wiki/Special:PrefixIndex
When searching "something" on it, for example this:
http://en.wikipedia.org/w/index.php?title=Special%3APrefixIndex&prefix=tal&namespace=4
Then, I would like to access each of the resulting pages and extract their information.
What api call might I use?
You can use list=allpages and specify apprefix. For example:
http://en.wikipedia.org/w/api.php?format=xml&action=query&list=allpages&apprefix=tal&aplimit=max
This query will give you the id and title of each article that starts with tal. If you want to get more information about each page, you can use this list as a generator:
http://en.wikipedia.org/w/api.php?format=xml&action=query&generator=allpages&gapprefix=tal&gaplimit=max&prop=info
You can give different values to the prop parameter to get different information about the page.
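A short sketch of the generator query in Python follows. It requests JSON instead of the XML used in the URLs above, and it passes gapnamespace so that the namespace=4 from the question can be reproduced; the helper names and the sample reply are made up for illustration. Long result sets are split across several replies, and following the continuation values is left out here.

```python
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"


def prefix_pages_url(prefix, namespace=0, prop="info"):
    """Build a generator=allpages query for titles starting with `prefix`."""
    params = {
        "format": "json",
        "action": "query",
        "generator": "allpages",
        "gapprefix": prefix,
        "gapnamespace": namespace,
        "gaplimit": "max",
        "prop": prop,
    }
    return API_URL + "?" + urlencode(params)


def page_titles(response):
    """Collect page titles from a decoded JSON reply, sorted alphabetically."""
    pages = response.get("query", {}).get("pages", {})
    return sorted(page["title"] for page in pages.values())


url = prefix_pages_url("tal", namespace=4)

# Hand-written sample reply; ids and titles are placeholders.
sample = {"query": {"pages": {
    "2": {"pageid": 2, "title": "Wikipedia:Talk B"},
    "1": {"pageid": 1, "title": "Wikipedia:Talk A"},
}}}
titles = page_titles(sample)
```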

Accessing Flixster data

Is there any way of accessing the data on Flixster? Specifically, I'd like to retrieve a list of all of my movie ratings. I know you can get an rss feed of these, but it only appears to return a subset of all of the ratings.
View your Rotten Tomatoes / Flixster movie ratings by accessing this URL, first replacing USERIDHERE in the URL.
http://community.flixster.com/api/v1/users/USERIDHERE/ratings.rss
You can find out your user id by clicking to view your Profile, which will contain your unique id in the URL, such as:
http://www.flixster.com/user/USERIDHERE/
Yes, there is a new-ish Rotten-Tomatoes API at http://developer.rottentomatoes.com/docs/read/json/v10/Movie_Reviews
Here's a simple program, using the API, that I've found:
https://github.com/mmihaljevic/flixter
...and you can read her blog post for more information.
Just tested both fetching and parsing, and it still appears to work.