Use API to gather statistics on my followers

I am very new to this and would like to know how to start gathering statistics on my followers, as I am currently growing my follower base. I am subscribed to several statistics-tracking apps, but none of them are really good.
I wish to track things such as:
Follower count by location
Frequency distribution of followers and tags
Follower growth rate by hour, day, week, etc.
Follower loss
Is this at all possible using APIs? Can anyone tell me how to get started?

There is no direct API call to get follower growth by hour or week. You have to fetch the full follower list every hour, store it in a database, and compare each snapshot with the previous one to compute growth or loss.
You cannot get the location of followers from the API either. At best you can estimate it by checking each follower's bio for a location, or by analyzing their posts for the most frequently tagged place, but this is expensive on the API side and requires a lot of calls.
So yes, all of this is possible using the API, but it is a lot of backend work. Any service doing it properly has real costs to cover, so it cannot be free; my guess is that the free or cheap services you have checked simply cannot afford this kind of analysis.
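As a rough sketch of that polling approach, here is a minimal Python example. The fetch_follower_ids() helper is hypothetical, standing in for whatever API client your platform provides (real clients page through followers with cursors and are rate-limited); the snapshot-and-diff logic is the part that matters:
import sqlite3
import time

def fetch_follower_ids():
    # Hypothetical placeholder: swap in real API calls for your platform.
    raise NotImplementedError

def record_snapshot(db):
    now = int(time.time())
    current = set(fetch_follower_ids())
    db.execute("CREATE TABLE IF NOT EXISTS snapshots (ts INTEGER, follower_id TEXT)")
    # Compare against the most recent stored snapshot.
    rows = db.execute("SELECT follower_id FROM snapshots "
                      "WHERE ts = (SELECT MAX(ts) FROM snapshots)")
    previous = {row[0] for row in rows}
    gained, lost = current - previous, previous - current
    db.executemany("INSERT INTO snapshots VALUES (?, ?)", [(now, f) for f in current])
    db.commit()
    return len(gained), len(lost)

# Run this once an hour (cron, scheduler, etc.):
# gained, lost = record_snapshot(sqlite3.connect("followers.db"))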

You can get a broad breakdown of follower count in Google Sheets. This doesn't require API access, so you won't get all of the data you are looking for (such as geolocation), but if you would like to see your follower count increase by the hour, do this:
Open up a new Google Sheet
Go to Tools > Script Editor
Name your script 'IGFollowers'
In the code box, copy and paste the code below, but make sure to replace 'AccountName' with your username:
var sheetName = "IGFollowers";
var instagramAccountName = "AccountName";

function insertFollowerCount() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheetByName(sheetName);
  // Append one row per run: date, time, and the current follower count.
  sheet.appendRow([Utilities.formatDate(new Date(), "PST", "yyyy-MM-dd"),
                   Utilities.formatDate(new Date(), "PST", "hh:mm"),
                   getInstagramFollowerCount(instagramAccountName)]);
}

function getInstagramFollowerCount(username) {
  // Instagram's public profile page returns JSON when ?__a=1 is appended.
  var url = "https://www.instagram.com/" + username + "/?__a=1";
  var response = UrlFetchApp.fetch(url).getContentText();
  return JSON.parse(response).user.followed_by.count;
}
Go to Run > insertFollowerCount. To log the count hourly, add a time-driven trigger for insertFollowerCount (Edit > Current project's triggers) that runs every hour.
NOTE: You may need to do a bit of formatting in the main Google Sheet, but this will give you some very long columns showing the increase in followers by the hour.

How to get the contributors you've coincided with the most while editing Wikipedia

I'm building a gamification web app to help Wikimedia's community health.
I want to find which editors have edited the same pages as 'Jake' the most in the last week, or over the last 100 edits, or something like that.
I know what I want to query, but I can't figure out which tables I need, because the Wikimedia DB layout is a mess.
So, I want to obtain something like:
Username | Occurrences | Pages
Mikey    | 13          | Obama, ...
So the query would be something like (I'm accepting suggestions):
1. Get the pages that the user 'Jake' has edited in the last week.
2. Get the contributors to those pages in the last week.
3. For each of these contributors, get the pages they have edited in the last week, check which ones match the pages 'Jake' has edited, and count them.
I've tried doing something simpler than that in Pywikibot, but it's very, very slow (20 seconds for the last 500 contributions of Jake): I only fetch the edited pages, get the contributors of each page, and count them.
My pywikibot code is:
site = Site(langcode, 'wikipedia')
user = User(site, username)
contributed_pages = set()
# Collect the distinct main-namespace pages among the user's last 100 edits.
for page, oldid, ts, comment in user.contributions(total=100, namespaces=[0]):
    contributed_pages.add(page)
return get_contributor_ocurrences(contributed_pages, site, username)
And the function
def get_contributor_ocurrences(contributed_pages, site, username):
    contributors = []
    for page in contributed_pages:
        for editor in page.contributors():
            # Skip bots and the user themselves.
            if APISite.isBot(self=site, username=editor) or editor == username:
                continue
            contributors.append(editor)
    return Counter(contributors)
PS: I have access to the DB replicas, which I guess are way faster than the Wikimedia API or Pywikibot.
You can filter the data to be retrieved with timestamp parameters, which decreases the time needed a lot. Refer to the documentation for their usage. Here is a code snippet that gets the data with Pywikibot using timestamps:
from collections import Counter
from datetime import timedelta
import pywikibot
from pywikibot.tools import filter_unique
site = pywikibot.Site()
user = pywikibot.User(site, username) # username must be a string
# Setup the Generator for the last 7 days.
# Do not care about the timestamp format if using pywikibot.Timestamp
stamp = pywikibot.Timestamp.now() - timedelta(days=7)
contribs = user.contributions(end=stamp)
contributors = []
# filter_unique is used to remove duplicates.
# The key uses the page title
for page, *_ in filter_unique(contribs, key=lambda x: str(x[0])):
# note: editors is a Counter
editors = page.contributors(endtime=stamp)
print('{:<35}: {}'.format(page.title(), editors))
contributors.extend(editors.elements())
total = Counter(contributors)
This prints the list of pages and, for each page, the editors and their contribution counts within the given time range. Finally, total should have the same content as your get_contributor_ocurrences function above.
It requires some additional work to get to the table you mentioned.
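If you do use the DB replicas mentioned in the question, the whole computation can be pushed into a single SQL query, which avoids per-page API calls entirely. The sketch below assumes the modern MediaWiki schema (revision and actor tables, with rev_actor populated) and the Wiki Replicas connection conventions, so verify both against your wiki before relying on it:
import pymysql
from datetime import datetime, timedelta

# MediaWiki stores timestamps as 14-character strings: YYYYMMDDHHMMSS.
since = (datetime.utcnow() - timedelta(days=7)).strftime('%Y%m%d%H%M%S')

# For every co-editor, count the distinct pages shared with 'Jake' in the window.
sql = """
    SELECT a2.actor_name, COUNT(DISTINCT r2.rev_page) AS occurrences
    FROM revision r1
    JOIN actor a1 ON a1.actor_id = r1.rev_actor
    JOIN revision r2 ON r2.rev_page = r1.rev_page
    JOIN actor a2 ON a2.actor_id = r2.rev_actor
    WHERE a1.actor_name = %s AND a2.actor_name <> %s
      AND r1.rev_timestamp >= %s AND r2.rev_timestamp >= %s
    GROUP BY a2.actor_name
    ORDER BY occurrences DESC
    LIMIT 50
"""

# Assumed connection details; adjust host/db/credentials to your replica setup.
conn = pymysql.connect(host='enwiki.analytics.db.svc.wikimedia.cloud',
                       db='enwiki_p', read_default_file='~/replica.my.cnf')
with conn.cursor() as cur:
    cur.execute(sql, ('Jake', 'Jake', since, since))
    for actor_name, occurrences in cur.fetchall():
        print(actor_name, occurrences)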

Twitter Premium API Profile location operators profile_country: and profile_region: not working

I am using a premium account (not sandbox) for data collection.
I want to collect:
All tweets in English that contain ‘china’ or ‘chinese’ that are user geolocated to US and not geolocated at tweet level, excluding all retweets
All tweets in English that contain ‘china’ or ‘chinese’ that are user geolocated to ‘Minnesota’ and not geolocated at tweet level, excluding all retweets
The code is as follows:
from searchtweets import load_credentials, gen_rule_payload, ResultStream

premium_search_args = load_credentials('twitter_API.yaml',
                                       yaml_key='search_tweets_premium_api',
                                       env_overwrite=False)

# Keywords for the search.
# Rule 1: profile geo in the US, excluding tweets place-tagged to the US and retweets.
keywords = '(China OR Chinese) lang:en profile_country:US -place_country:US -is:retweet'
# Rule 2: profile geo in Minnesota, excluding tweets place-tagged to the US and retweets.
keywords = '(China OR Chinese) lang:en -place_country:US profile_region:"Minnesota" -is:retweet'

# Define the search rule.
rule = gen_rule_payload(keywords, from_date='2019-12-01',
                        to_date='2019-12-10', results_per_call=500)

# Create the result stream.
rs = ResultStream(rule_payload=rule, max_results=1250000,
                  **premium_search_args)
My problems are that:
For the first one, a large portion of the results I get don't satisfy the query. First, some don't have the Profile Geo enrichment, i.e. the user.derived.locations attribute is missing from the user object. Second, even when it is present, many don't have country code US, i.e. they are identified as being in other countries.
For the second one, the result I get from this method is a smaller subset of the results I can get from 1). That is, when I filter all tweets user-geolocated to Minnesota (by user.derived.locations.region) out of the profile_country:US results, I get a larger sample than by using profile_region:"Minnesota" directly. A considerable amount of data goes missing with this method.
I have tried several times, but it seems that the user geolocation operators don't do exactly what I want. Does anyone have any idea why this is the case? I would very much appreciate any answers/suggestions/comments.
Thank you!

Yahoo Weather API 3 days Forecast with humidity value

I am building a web app which uses the Yahoo Weather API to provide weather information based on a ZIP code provided by the user.
I know that in order to obtain this data for a certain number of days, I have to add it as a parameter to my request, like so:
http://weather.yahooapis.com/forecastrss?p=33035&u=c&d=3
Which gives this result:
<channel>
....
<yweather:location city="Homestead" region="FL" country="US"/>
<yweather:units temperature="C" distance="km" pressure="mb" speed="km/h"/>
<yweather:wind chill="19" direction="90" speed="11.27"/>
<yweather:atmosphere humidity="78" visibility="16.09" pressure="1021" rising="1"/>
<yweather:astronomy sunrise="7:12 am" sunset="7:36 pm"/>
...
<item>
...
<yweather:forecast day="Wed" date="2 Apr 2014" low="19" high="28" text="Mostly Sunny" code="34"/>
<yweather:forecast day="Thu" date="3 Apr 2014" low="21" high="29" text="Partly Cloudy" code="30"/>
<yweather:forecast day="Fri" date="4 Apr 2014" low="20" high="28" text="Partly Cloudy" code="30"/>
<guid isPermaLink="false">USFL0208_2014_04_04_7_00_EDT</guid>
</item>
</channel>
However, I need to be able to get the humidity level for EVERY day in the forecast, not just the current one. I've tried to find a solution here and also read the Yahoo API documentation, but it's really short.
I've also tried http://www.myweather2.com/ and http://developer.worldweatheronline.com/, but they have the same issue with humidity: it's shown only for the current day and stripped from the rest of the forecast.
I'll keep trying other free weather APIs, but if you can help here I'd be very grateful.
I've spent some time researching better APIs than the Yahoo! Weather API and found something that works for me, so I've decided to share the info in case anyone else bumps into this some day.
Version 2 of the Forecast API looks like a nice solution, because it provides detailed information, specifically the humidity value for each day in the forecast, which is what I was looking for in the first place.
There are also a number of libraries for this API (languages include PHP, Node.js, Python, Perl and others).
It works with longitude & latitude, as in this example:
https://api.forecast.io/forecast/APIKEY/LATITUDE,LONGITUDE,TIME
and of course one needs to register for an API key.
The TIME parameter is optional. Temperatures are in Fahrenheit by default, but this can be changed with an additional ?units=ca parameter.
One disadvantage is that you only get 1,000 free calls per day, which probably won't be suitable for big applications.
If you know a better answer to the original question, I'll be very happy to hear it. Until then I'll use this solution, without going into much detail:
// The default result from a Forecast API call is JSON,
// so decode it to an object for easier access.
// $jsonResult holds the raw response body, e.g. from file_get_contents().
$fcObj = json_decode($jsonResult);
if (!empty($fcObj->daily->data[0]->humidity)) {
    // Humidity is returned as a fraction; convert it to a percentage.
    $humidity = $fcObj->daily->data[0]->humidity * 100;
}
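For comparison, here is the same lookup as a minimal Python sketch, looping over every day of the forecast rather than just the first. APIKEY, LAT and LON are placeholders you must fill in, and the daily.data[].humidity field (a 0-1 fraction) is as described above:
import requests

# Placeholders: substitute your own API key and coordinates.
url = "https://api.forecast.io/forecast/APIKEY/LAT,LON?units=ca"
forecast = requests.get(url).json()

# Each entry in daily.data is one forecast day.
for i, day in enumerate(forecast["daily"]["data"]):
    print("day", i, round(day["humidity"] * 100), "%")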

Get ALL tweets, not just recent ones via twitter API (Using twitter4j - Java)

I've built an app using twitter4j which pulls in a bunch of tweets when I enter a keyword, takes the geolocation out of the tweet (or falls back to the profile location), then maps them using ammaps. The problem is I'm only getting a small portion of tweets. Is there some kind of limit here? I've got a DB collecting the tweet data, so soon enough it will hold a decent amount, but I'm curious as to why I'm only getting tweets from within the last 12 hours or so.
For example, if I search by my username I only get one tweet, the one I sent today.
Thanks for any info!
EDIT: I understand Twitter doesn't allow public access to the firehose; my question is more about why I'm limited to finding only recent tweets.
You need to keep redoing the query, resetting the maxId every time, until you get nothing back. You can also use setSince and setUntil.
An example:
Query query = new Query();
query.setCount(DEFAULT_QUERY_COUNT);
query.setLang("en");
// Set the bounding dates (sdf is a date formatter producing the
// "yyyy-MM-dd" strings that setSince/setUntil expect).
query.setSince(sdf.format(startDate));
query.setUntil(sdf.format(endDate));
QueryResult result = searchWithRetry(twitter, query); // searchWithRetry is my function that deals with rate limits
while (result.getTweets().size() != 0) {
    List<Status> tweets = result.getTweets();
    System.out.print("# Tweets:\t" + tweets.size());
    // Track the smallest ID seen, so the next query only returns older tweets.
    Long minId = Long.MAX_VALUE;
    for (Status tweet : tweets) {
        // do stuff here
        if (tweet.getId() < minId)
            minId = tweet.getId();
    }
    query.setMaxId(minId - 1);
    result = searchWithRetry(twitter, query);
}
Really it depends on which API you are using, i.e. the Streaming API or the Search API. In the Search API there is an optional parameter, result_type, which takes the following values:
* mixed: include both popular and real-time results in the response.
* recent: return only the most recent results in the response.
* popular: return only the most popular results in the response.
The default is mixed (in Twitter4J this is set via Query#setResultType).
As far as I understand, you are effectively getting recent results, which is why you only see the latest tweets. The other issue is the low volume of tweets carrying geolocation: very few users add location information, so you get very few geotagged tweets.

Rails show different object every day

I want to match my user to a different user in his/her community every day. Currently, I use code like this:
@matched_user = User.near(@user).order("RANDOM()").first
But I want to have a different @matched_user on a daily basis. I haven't been able to find anything on Stack Overflow or in the APIs that has given me insight into how to do it. I feel it should be simpler than resorting to a rake task with cron. (I'm on Postgres.)
Whenever I find myself hankering for shared 'memory' or transient state, I think to myself "this is what (distributed) caches were invented for".
@matched_user = Rails.cache.fetch(@user.cache_key + '/daily_match', expires_in: 1.day) {
  User.near(@user).order("RANDOM()").first
}
NOTE: While specifying a TTL for a cache entry tells Rails/the cache system to try to keep the value for the given timeframe, there is NO guarantee that it will. In particular, a cache that aggressively reclaims memory may expire an entry well before its desired expires_in time.
For this particular use case that shouldn't be a big deal, but in cases where the business/domain logic demands durable, periodically generated values, you really have to persist them in your database.
How about using PostgreSQL's SETSEED function? I use the date as the seed, so the seed changes every day but stays consistent within a day:
User.connection.execute "SELECT SETSEED(#{Date.today.strftime("%y%d%m").to_i/1000000.0})"
@matched_user = User.near(@user).order("RANDOM()").first
You may want to re-seed with a random value afterwards so that future calls to RANDOM() aren't biased:
random = User.connection.execute("SELECT RANDOM()").to_a.first["random"]
# Same code as above:
User.connection.execute "SELECT SETSEED(#{Date.today.strftime("%y%d%m").to_i/1000000.0})"
@matched_user = User.near(@user).order("RANDOM()").first
# Use the random value saved before seeding to set a fresh seed:
User.connection.execute "SELECT SETSEED(#{random})"
I have split these steps into separate sections just for readability; you can optimise the query later.
1) Find all user records created before this morning, so that the count stays frozen for the day:
users_till_today_morning = User.where("created_at < ?", DateTime.now.in_time_zone(Time.zone).beginning_of_day)
2) Pluck all IDs:
user_ids = users_till_today_morning.pluck(:id)
3) Today's day of the month is a value in (1..31) that remains constant throughout the day:
day_today = Time.now.day
4) Select the same ID for the whole day:
todays_user_id = user_ids[day_today % user_ids.count]
@matched_user = User.find(todays_user_id)
So it will give you a random user record while keeping the same record throughout the day!
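The seeding and modulo answers share the same trick: derive the day's pick deterministically from the date, so it is stable within a day and changes across days. A minimal, purely illustrative Python sketch of that idea (ids stands in for whatever candidate list you have):
import random
from datetime import date

ids = [101, 202, 303, 404]  # hypothetical candidate user IDs

# Seed a private PRNG with the day number: the pick is stable all day,
# changes tomorrow, and the global RNG state is left untouched.
rng = random.Random(date.today().toordinal())
todays_pick = rng.choice(ids)
print(todays_pick)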