How to get crawl stats from Webmaster Tools API - google-api-webmasters

I want to get the data behind this graph:
(I can't add an image here, graph.png, because I don't have 10 reputation.)
So I want to get, for each day, the three values shown there (pages crawled per day, kilobytes downloaded per day, and time spent downloading a page).
The idea is to get an array like this:
$datas['2015-11-20']['pages_crawled'] = 125;
$datas['2015-11-20']['kilobytes'] = 1452;
$datas['2015-11-20']['time_spent'] = 1023;
$datas['2015-11-21']['pages_crawled'] = 146;
$datas['2015-11-21']['kilobytes'] = 2410;
$datas['2015-11-21']['time_spent'] = 1563;
$datas['2015-11-22']['pages_crawled'] = 102;
$datas['2015-11-22']['kilobytes'] = 1560;
$datas['2015-11-22']['time_spent'] = 1400;
Something like this.
Thanks especially to @alex for his great help.

Unfortunately, you can't get these crawl stats via the API.
The only supported methods are webmasters.urlcrawlerrorscounts.query, webmasters.urlcrawlerrorssamples.list, webmasters.urlcrawlerrorssamples.get, and webmasters.urlcrawlerrorssamples.markAsFixed (see https://developers.google.com/apis-explorer/#p/webmasters/v3/).
So you can get information about crawl errors, but not general crawl stats.

You can retrieve the crawl error counts with this API call:
https://www.googleapis.com/webmasters/v3/sites/https%3A%2F%2Fwww.mywebsite.com/urlCrawlErrorsCounts/query?latestCountsOnly=true&fields=countPerTypes&key={YOUR_API_KEY}
But the crawl statistics themselves are not exposed through the API.
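For completeness, here is a minimal sketch of that error-counts call in Python with the requests library. The API key is a placeholder, the response field names are my best understanding of the countPerTypes layout, and in practice this endpoint also expects OAuth authorization for a site you have verified:

import requests

SITE = "https://www.mywebsite.com"  # must be a verified property
API_KEY = "YOUR_API_KEY"            # placeholder

url = ("https://www.googleapis.com/webmasters/v3/sites/"
       + requests.utils.quote(SITE, safe="")
       + "/urlCrawlErrorsCounts/query")
params = {"latestCountsOnly": "true", "fields": "countPerTypes", "key": API_KEY}

resp = requests.get(url, params=params)
resp.raise_for_status()
# countPerTypes groups the latest error counts by platform and category
for entry in resp.json().get("countPerTypes", []):
    print(entry["platform"], entry["category"], entry["entries"])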

How to get the contributors you've coincided with the most when editing Wikipedia

I'm building a gamification web app to help Wikimedia's community health.
I want to find which editors have edited the same pages as 'Jake' the most, over the last week, the last 100 edits, or something like that.
I roughly know my query, but I can't figure out which tables I need, because the Wikimedia DB layout is a mess.
So, I want to obtain something like:
Username | Occurrences | Pages
Mikey    | 13          | Obama, ..
So the query would be something like this (I'm accepting suggestions):
1. Get the pages that the user 'Jake' has edited in the last week.
2. Get the contributors of those pages in the last week.
3. For each of those contributors, get the pages they have edited in the last week, match them against the pages 'Jake' has edited, and count the matches.
I've tried doing something simpler with Pywikibot, but it's very, very slow (20 seconds for the last 500 contributions of Jake).
There I only get the edited pages, get the contributors of each of those pages, and count them, and even that is very slow.
My Pywikibot code is:
from pywikibot import Site, User

site = Site(langcode, 'wikipedia')
user = User(site, username)
contributed_pages = set()
for page, oldid, ts, comment in user.contributions(total=100, namespaces=[0]):
    contributed_pages.add(page)
# (this runs inside a helper function, hence the return)
return get_contributor_ocurrences(contributed_pages, site, username)
And the function
from collections import Counter
from pywikibot.site import APISite

def get_contributor_ocurrences(contributed_pages, site, username):
    contributors = []
    for page in contributed_pages:
        for editor in page.contributors():
            # skip bots and the user himself
            if APISite.isBot(self=site, username=editor) or editor == username:
                continue
            contributors.append(editor)
    return Counter(contributors)
PS: I have access to the DB replicas, which I guess are way faster than the Wikimedia API or Pywikibot.
You can filter the data to be retrieved with timestamp parameters; this decreases the time needed a lot. Refer to the documentation for their usage. Here is a code snippet that gets the data with Pywikibot using timestamps:
from collections import Counter
from datetime import timedelta

import pywikibot
from pywikibot.tools import filter_unique

site = pywikibot.Site()
user = pywikibot.User(site, username)  # username must be a string

# Set up the generator for the last 7 days.
# Do not care about the timestamp format if using pywikibot.Timestamp.
stamp = pywikibot.Timestamp.now() - timedelta(days=7)
contribs = user.contributions(end=stamp)

contributors = []
# filter_unique is used to remove duplicates; the key uses the page title.
for page, *_ in filter_unique(contribs, key=lambda x: str(x[0])):
    editors = page.contributors(endtime=stamp)  # note: editors is a Counter
    print('{:<35}: {}'.format(page.title(), editors))
    contributors.extend(editors.elements())

total = Counter(contributors)
This prints a list of pages, and for each page it shows the editors and their contribution counts within the given time range. Finally, total should have the same content as your get_contributor_ocurrences function above.
It requires some additional work to get the table you mentioned above; a sketch of that step follows.
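For example, a minimal sketch along these lines (reusing user and stamp from the snippet above, and interpreting 'Occurrences' as the number of shared pages) could collect the pages per co-editor and print the table:

from collections import defaultdict
from pywikibot.tools import filter_unique

# Pages (among those 'Jake' touched) that each co-editor also edited.
pages_by_editor = defaultdict(set)
for page, *_ in filter_unique(user.contributions(end=stamp),
                              key=lambda x: str(x[0])):
    for editor in page.contributors(endtime=stamp):
        if editor != user.username:
            pages_by_editor[editor].add(page.title())

# Username | Occurrences | Pages, sorted by the number of shared pages.
for editor, pages in sorted(pages_by_editor.items(),
                            key=lambda kv: -len(kv[1]))[:10]:
    print('{:<20} {:>3}  {}'.format(editor, len(pages), ', '.join(sorted(pages))))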

Use API to gather statistics on my followers

I am very new to this and would like to know how to start gathering statistics on my followers, as I am currently growing my follower base. I am subscribed to several statistics-tracking apps, but none of them are really good.
I wish to track things such as:
Follower count by Location
Frequency distribution of followers and tags
Follower growth rate by hour, day, week, etc.
Follower Loss
Is this at all possible using APIs? Can anyone tell me how to get started?
There is no direct API call to get follower growth by hour or week; you have to fetch all followers every hour, store them in a database, and compare each hour with the previous one to compute growth or loss.
You cannot get the location of followers from the API either. You could perhaps estimate it by checking for a location in the bio, or by analyzing all of a user's posts and finding the most-posted location (this is expensive on the API side and takes a lot of API calls).
So yes, all of this is possible to do using the API, but it is a lot of backend work. If some service does this, it will cost you money, because they cannot do it for free; my guess is that you have already checked all the free or cheap services and found that they cannot do this analysis cheaply. A sketch of the store-and-compare idea follows.
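As an illustration only, here is a minimal Python sketch of that hourly store-and-compare loop. fetch_followers is a hypothetical callable returning the current follower usernames; how you implement it depends on the API you use:

import sqlite3
import time

def snapshot_and_diff(db_path, fetch_followers):
    """Store the current follower set and diff it against the previous one."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS snapshots (ts INTEGER, username TEXT)")
    current = set(fetch_followers())  # hypothetical fetch function
    previous = {row[0] for row in conn.execute(
        "SELECT username FROM snapshots"
        " WHERE ts = (SELECT MAX(ts) FROM snapshots)")}
    now = int(time.time())
    conn.executemany("INSERT INTO snapshots VALUES (?, ?)",
                     [(now, name) for name in current])
    conn.commit()
    # followers gained and lost relative to the previous snapshot
    return current - previous, previous - current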
You can get a broad breakdown of follower count in Google Sheets. This doesn't require API access, so you won't get all of the data you are looking for (such as geo data), but if you would like to see your follower count increase by the hour, do this:
Open up a new Google Sheet
Go to Tools > Script Editor
Name your script 'IGFollowers'
In the code box, copy and paste the code below, but make sure to replace 'AccountName' with your username:
var sheetName = "IGFollowers";
var instagramAccountName = "AccountName";

function insertFollowerCount() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheetByName(sheetName);
  sheet.appendRow([Utilities.formatDate(new Date(), "PST", "yyyy-MM-dd"),
                   Utilities.formatDate(new Date(), "PST", "hh:mm"),
                   getInstagramFollowerCount(instagramAccountName)]);
}

function getInstagramFollowerCount(username) {
  var url = "https://www.instagram.com/" + username + "/?__a=1";
  var response = UrlFetchApp.fetch(url).getContentText();
  return JSON.parse(response).user.followed_by.count;
}
Go to Run > insertFollowerCount.
NOTE: You may need to do a bit of formatting in the main Google Sheet, and to record a value every hour you can attach a time-driven trigger to insertFollowerCount, but this will get you some very long columns showing the increase in followers by the hour.

What is the best way to call channel9 api for getting video?

Let's say that I have a URL to a Channel 9 video,
e.g.: https://channel9.msdn.com/Series/Office-365-Tips--Tricks/01-Wprowadzenie
and I want to display this video on my site, along with some information about it, e.g. its duration.
All I know already is that I can get a list of videos by calling
https://channel9.msdn.com/odata/Entries and skipping by +25 each time to show the next 25 results.
My implementation right now is something like:
1. Get the first 25 elements from the API.
2. Iterate through them.
3. Compare my URL with elementFromApi[i].url.
It's working, but I don't like this solution; it is inelegant and slow as hell. I have no knowledge of the API, so I don't know how to refactor this.
Maybe one of you can help me.
PS: I need the information from the API; embedding an iframe with the given URL is not a solution here :)
PS2: Sorry for my English.
I got this requirement from my client, and I ended up doing it using RSS! In your use case the URL is https://channel9.msdn.com/Series/Office-365-Tips--Tricks/01-Wprowadzenie, so we can simply read the RSS feed at https://s.ch9.ms/Series/Office-365-Tips--Tricks/rss/mp4. In PowerShell we can explore the content with the piece of code below:
$Sessions = Invoke-RestMethod -Uri 'https://s.ch9.ms/Series/Office-365-Tips--Tricks/rss/mp4' -UseDefaultCredentials
foreach ($Session in $Sessions) {
    $Duration = [timespan]::FromSeconds($Session.duration)
    [pscustomobject]@{
        Title          = $Session.title
        Duration       = ("{0:0}:{1:00}:{2:00}" -f $Duration.Hours, $Duration.Minutes, $Duration.Seconds)
        Creator        = $Session.creator
        "URl(MP3)"     = $Session.group.content.url[0]
        "URl(MP4)"     = $Session.group.content.url[1]
        "URl(webm)"    = $Session.group.content.url[2]
        "URl(MP4High)" = $Session.group.content.url[3]
    }
}
Note: the code needs to be improved!
The same can be achieved using C#, but I am not a certified developer, so I used PowerShell to meet the client's requirement. For reference, a rough Python equivalent is sketched below.
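As a hedged illustration, here is roughly the same RSS read in Python with the standard library. The namespaces and field names (itunes:duration, dc:creator, media:group/media:content) are assumptions inferred from the PowerShell output above and from common podcast feeds, so verify them against the actual feed:

import urllib.request
import xml.etree.ElementTree as ET

URL = "https://s.ch9.ms/Series/Office-365-Tips--Tricks/rss/mp4"
# Assumed namespaces; check the feed's root element to confirm them.
NS = {
    "itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd",
    "media": "http://search.yahoo.com/mrss/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

tree = ET.parse(urllib.request.urlopen(URL))
for item in tree.iterfind(".//item"):
    title = item.findtext("title")
    seconds = int(item.findtext("itunes:duration", default="0", namespaces=NS))
    creator = item.findtext("dc:creator", namespaces=NS)
    urls = [c.get("url") for c in item.iterfind("media:group/media:content", NS)]
    print("{} ({}:{:02d}:{:02d}) by {}".format(
        title, seconds // 3600, seconds % 3600 // 60, seconds % 60, creator))
    print("  " + "\n  ".join(urls))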

Google custom search api - Limit results to 1 per domain

Is there any way to limit the results returned by the Google Custom Search API to 1 per domain?
Yes, if you're only after the first result. You can try the following if you are familiar with Python:
res = service.cse().list(q=query, cx='YOURSEARCHENGINEIDHERE').execute()
name = res['items'][0]['title']  # title of the first search result
url = res['items'][0]['link']    # url of the first search result
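To make the snippet self-contained, and to get closer to the one-result-per-domain requirement, here is a hedged sketch that builds the service with google-api-python-client and then deduplicates results by domain on the client side (the dedup step is my own suggestion, not a documented server-side parameter):

from urllib.parse import urlparse
from googleapiclient.discovery import build

# Placeholders: supply your own API key and search engine ID.
service = build('customsearch', 'v1', developerKey='YOUR_API_KEY')
res = service.cse().list(q='some query', cx='YOURSEARCHENGINEIDHERE').execute()

# Keep only the first result per domain.
seen = set()
for item in res.get('items', []):
    domain = urlparse(item['link']).netloc
    if domain in seen:
        continue
    seen.add(domain)
    print(domain, '->', item['title'], item['link'])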

Get ALL tweets, not just recent ones via twitter API (Using twitter4j - Java)

I've built an app using twitter4j which pulls in a bunch of tweets when I enter a keyword, takes the geolocation out of each tweet (or falls back to the profile location), and then maps them using ammaps. The problem is that I'm only getting a small portion of tweets. Is there some kind of limit here? I've got a DB collecting the tweet data, so soon enough it will have a decent amount, but I'm curious as to why I'm only getting tweets from within the last 12 hours or so.
For example, if I search by my username, I only get one tweet, one that I sent today.
Thanks for any info!
EDIT: I understand Twitter doesn't allow public access to the firehose; my question is more about why I am limited to finding only recent tweets.
You need to keep redoing the query, resetting the maxId every time, until you get nothing back. You can also use setSince and setUntil.
An example:
Query query = new Query();
query.setCount(DEFAULT_QUERY_COUNT);
query.setLang("en");

// set the bounding dates (sdf is a date formatter producing "yyyy-MM-dd")
query.setSince(sdf.format(startDate));
query.setUntil(sdf.format(endDate));

QueryResult result = searchWithRetry(twitter, query); // searchWithRetry is my function that deals with rate limits
while (result.getTweets().size() != 0) {
    List<Status> tweets = result.getTweets();
    System.out.print("# Tweets:\t" + tweets.size());

    Long minId = Long.MAX_VALUE;
    for (Status tweet : tweets) {
        // do stuff here
        if (tweet.getId() < minId)
            minId = tweet.getId();
    }

    query.setMaxId(minId - 1);
    result = searchWithRetry(twitter, query);
}
Really it depends on which API you are using, i.e. the Streaming API or the Search API. In the Search API there is an optional parameter, result_type, which can take the following values:
* mixed: include both popular and real-time results in the response.
* recent: return only the most recent results in the response.
* popular: return only the most popular results in the response.
The default is mixed.
As far as I understand, you are using the recent one; that is why you are getting only the recent set of tweets. Another issue is the low volume of tweets that carry geolocation information: because very few users add geographical information to their tweets or profiles, you end up with very few usable tweets. A Python sketch combining result_type with the maxId pagination from the answer above follows.
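For comparison, here is a hedged Python sketch of the same max_id pagination against the classic v1.1 search endpoint, with result_type set explicitly. It assumes the requests and requests_oauthlib packages and placeholder credentials:

import requests
from requests_oauthlib import OAuth1

auth = OAuth1('CONSUMER_KEY', 'CONSUMER_SECRET',
              'ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')  # placeholders
url = 'https://api.twitter.com/1.1/search/tweets.json'
params = {'q': 'keyword', 'count': 100, 'result_type': 'recent'}

while True:
    statuses = requests.get(url, params=params, auth=auth).json().get('statuses', [])
    if not statuses:
        break
    for tweet in statuses:
        pass  # do stuff here, e.g. read tweet['coordinates']
    # page backwards: ask for everything older than the oldest tweet seen
    params['max_id'] = min(t['id'] for t in statuses) - 1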