How to use Wikipedia API to get the page view statistics of a particular page in wikipedia? - wikipedia-api

The stats.grok.se tool provides the pageview statistics of a particular page in Wikipedia. Is there a way to use the Wikipedia API to get the same information? What does the page views counter property actually mean?

The Pageview API was released a few days ago: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}
https://wikimedia.org/api/rest_v1/?doc#/
https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageview_API
For example https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Foo/daily/20151010/20151012 will give you
{
"items": [
{
"project": "en.wikipedia",
"article": "Foo",
"granularity": "daily",
"timestamp": "2015101000",
"access": "all-access",
"agent": "all-agents",
"views": 79
},
{
"project": "en.wikipedia",
"article": "Foo",
"granularity": "daily",
"timestamp": "2015101100",
"access": "all-access",
"agent": "all-agents",
"views": 81
}
]
}
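As a rough illustration of the request above, here is a minimal Python sketch that fills in the URL template and sums the views in a response shaped like the sample. The helper names (pageviews_url, total_views) are made up for this example, the sample string is a trimmed copy of the response shown above, and the actual HTTP fetch is left out.

```python
import json
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

def pageviews_url(project, article, start, end,
                  access="all-access", agent="all-agents", granularity="daily"):
    """Fill in the per-article URL template (helper name is hypothetical)."""
    # Titles use underscores for spaces; percent-encode everything else.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, title, granularity, start, end])

def total_views(response_text):
    """Sum the "views" field over all items in a JSON response."""
    data = json.loads(response_text)
    return sum(item["views"] for item in data["items"])

# Trimmed copy of the sample response above.
sample = ('{"items": [{"timestamp": "2015101000", "views": 79}, '
          '{"timestamp": "2015101100", "views": 81}]}')

print(pageviews_url("en.wikipedia", "Foo", "20151010", "20151012"))
print(total_views(sample))  # 160
```

To fetch live data, the URL can be passed to any HTTP client and the response body handed to total_views.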

No, there is not.
The counter property returned from prop=info would tell you how many times the page was viewed from the server. It is disabled on Wikipedia and other Wikimedia wikis because the aggressive squid/varnish caching means only a tiny fraction of page views would make it to the actual server in order to affect that counter, and even then the increased database write load for updating that counter would probably be prohibitive.
The stats.grok.se tool uses anonymized logs from the cache servers to calculate page views; the raw log files are available from http://dammit.lt/wikistats. If you need an API to access the data from stats.grok.se, you should contact the operator of stats.grok.se to request one be created.
Note this was written 4 years ago, and an API has since been created (see this answer). There's not yet a way to access that via api.php, though.

get the daily JSON for the last 30 days like this
http://stats.grok.se/json/en/latest30/Britney_Spears

You can look into the stats here.
Has anyone used an API to get the pageview stats?
Furthermore, I have also looked into the available raw data but could not find a way to extract the pageview count.

There doesn't seem to be any API; however, you can make HTTP requests to stats.grok.se and parse the HTML or JSON result to extract the page view counts.
I created a website http://wikipediaviews.org that does exactly that in order to facilitate easier comparison for multiple pages across multiple months and years. To speed things up, and minimize the number of requests to stats.grok.se, I keep all past query results stored locally.
The code I used is available at http://github.com/vipulnaik/wikipediaviews.
The file with the actual retrieval code is in https://github.com/vipulnaik/wikipediaviews/blob/master/backend/pageviewqueries.inc
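function getpageviewsonline($page, $month, $language)
{
    // Fetch the stats.grok.se page for this article, month and language.
    $url = getpageviewsurl($page, $month, $language);
    $html = file_get_contents($url);
    if ($html === false) {
        return null; // request failed
    }
    // The count is the first token after the phrase "has been viewed".
    if (!preg_match('/(?<=\bhas been viewed)\s+\K[^\s]+/', $html, $numberofpageviews)) {
        return null; // phrase not found in the page
    }
    return $numberofpageviews[0];
}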
The code for getpageviewsurl is in https://github.com/vipulnaik/wikipediaviews/blob/master/backend/stringfunctions.inc:
function getpageviewsurl($page, $month, $language)
{
    // Article titles use underscores instead of spaces.
    $page = str_replace(" ", "_", $page);
    // Percent-encode apostrophes.
    $page = str_replace("'", "%27", $page);
    return "http://stats.grok.se/" . $language . "/" . $month . "/" . $page;
}
PS: In case the link to wikipediaviews.org doesn't work, it's because I registered the domain quite recently. Try http://wikipediaviews.subwiki.org instead in the interim.

This question was asked 6 years ago, and at that time the official site offered no such API.
That has changed.
A simple example:
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=pageviews&titles=Buckingham+Palace%7CBank+of+England%7CBritish+Museum
See the documentation:
prop=pageviews
Shows per-page pageview data (the number of daily pageviews for each of the last pvipdays days). The result format is page title (with underscores) => date (Ymd) => count.
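For illustration, the example URL above can be assembled in Python as follows. This is only a sketch: the function name is made up, and it builds the request without sending it.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def pageviews_query_url(titles):
    """Build an api.php URL asking for prop=pageviews (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "pageviews",
        "titles": "|".join(titles),  # several titles are separated by "|"
    }
    # urlencode turns spaces into "+" and "|" into "%7C".
    return API + "?" + urlencode(params)

url = pageviews_query_url(["Buckingham Palace", "Bank of England", "British Museum"])
print(url)
```

The printed URL is the same as the one in the example above.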

Related

Azure Search API does not find indexed document despite correct query

Using Azure Search REST API v2016-09-01, the following query finds the expected document:
?queryType=full&search=id:3119443 AND name:du*
{
"value": [
{
"@search.score": 4.425995,
"id": "3119443",
"name": "dupond"
}
]
}
Whereas the following broader query (searching d* instead of du*) does not find the same document:
?queryType=full&search=id:3119443 AND name:d*
{
"value": []
}
The name field uses a custom analyzer with the Whitespace tokenizer and the WordDelimiterTokenFilter, AsciiFoldingTokenFilter and Lowercase token filters.
Most of the indexed documents are correctly found when searching their first name letter.
The issue is 100% reproducible on those specific documents, for which I don't find anything special.
The Search Service is a "Standard" tier (1 replica, 1 partition, 1 search unit), with the index containing 3,000,000+ documents.
Thank you.
Thanks for reporting the issue. As commented, this is a regression introduced in a recent change. The bug has been fixed. Thanks.

How to construct intersection in REST Hypermedia API?

This question is language independent. Let's not worry about frameworks or implementation, let's just say everything can be implemented and let's look at REST API in an abstract way. In other words: I'm building a framework right now and I didn't see any solution to this problem anywhere.
Question
How can one construct a REST URL endpoint for the intersection of two independent REST paths that return collections? Short example: how to intersect /users/1/comments and /companies/6/comments?
Constraint
All endpoints should return single data model entity or collection of entities.
Imho this is a very reasonable constraint and all examples of Hypermedia APIs look like this, even in draft-kelly-json-hal-07.
If you think this is an invalid constraint or you know a better way please let me know.
Example
So let's say we have an application which has three data types: products, categories and companies. Each company can add some products to their profile page. While adding the product they must attach a category to the product. For example we can access this kind of data like this:
GET /categories will return collection of all categories
GET /categories/9 will return category of id 9
GET /categories/9/products will return all products inside category of id 9
GET /companies/7/products will return all products added to profile page of company of id 7
I've omitted _links hypermedia part on purpose because it is straightforward, for example / gives _links to /categories and /companies etc. We just need to remember that by using hypermedia we are traversing relations graph.
How to write a URL that will return all products that are from company(7) and are of category(9)? In other words, how to intersect /categories/9/products and /companies/7/products?
Assuming that all endpoints should represent data model resource or collection of them I believe this is a fundamental problem of REST Hypermedia API, because in traversing hypermedia api we are traversing relational graph going down one path so it is impossible to describe such intersection because it is a cross-section of two independent graph paths.
In other words I think we cannot represent two independent paths with only one path. Normally we traverse one path like A->B->C, but if we have X->Y and Z->Y and we want all Ys that come from X and Z then we have a problem.
So far my proposition is to use query strings: /categories/9/products?intersect=/companies/9 but can we do better?
Why do I want this?
Because I'm building a framework which will auto-generate a REST Hypermedia API based on SQL database relations. You could think of it as a transcompiler of URLs to SELECT ... JOIN ... WHERE queries, but the client of the API only sees hypermedia and would like to have a nice way of doing intersections, like in the example.
I don't think you should always look at REST as database representation, this case looks more of a kind of specific functionality to me. I think I'd go with something like this:
/intersection/comments?company=9&product=5
I've been digging after I wrote it and this is what I've found (http://www.vinaysahni.com/best-practices-for-a-pragmatic-restful-api):
Sometimes you really have no way to map the action to a sensible RESTful structure. For example, a multi-resource search doesn't really make sense to be applied to a specific resource's endpoint. In this case, /search would make the most sense even though it isn't a resource. This is OK - just do what's right from the perspective of the API consumer and make sure it's documented clearly to avoid confusion.
What you want to do is filter the products in one of the categories, so following your example, if we have:
GET /categories/9/products
The above will return all products in category 9, so to keep only the products of company 7 I would use something like this:
GET /categories/9/products?company=7
You should treat the URI as a link to fetch all data (just like a simple SELECT query in SQL) and the query parameters as WHERE, LIMIT, DESC, etc.
Using this approach you can build complex and readable queries, e.g.
GET /categories/9/products?company=7&order=name,asc&offset=10&limit=20
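To make the "URI as SELECT, query parameters as WHERE/ORDER/LIMIT" idea concrete, here is a minimal Python sketch that turns such a URL into a parameterized SQL string. Everything specific in it is an assumption made for illustration: the table name products, the columns category_id and company_id, the whitelist of sortable columns, and the function name.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed whitelist: column names cannot be bound as SQL parameters,
# so only known-safe names may be interpolated into ORDER BY.
ALLOWED_ORDER_COLUMNS = {"id", "name"}

def url_to_query(url):
    """Translate /categories/<id>/products?... into (sql, params)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 3 or segments[0] != "categories" or segments[2] != "products":
        raise ValueError("unsupported path: " + parts.path)
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}

    # The path selects the base collection.
    sql = "SELECT * FROM products WHERE category_id = ?"
    params = [int(segments[1])]

    # Query parameters narrow, sort and page that collection.
    if "company" in query:
        sql += " AND company_id = ?"
        params.append(int(query["company"]))
    if "order" in query:
        column, _, direction = query["order"].partition(",")
        if column not in ALLOWED_ORDER_COLUMNS:
            raise ValueError("cannot order by " + column)
        sql += " ORDER BY " + column + (" DESC" if direction.lower() == "desc" else " ASC")
    if "limit" in query:
        sql += " LIMIT ?"
        params.append(int(query["limit"]))
        if "offset" in query:
            sql += " OFFSET ?"
            params.append(int(query["offset"]))
    return sql, params

sql, params = url_to_query(
    "/categories/9/products?company=7&order=name,asc&offset=10&limit=20")
print(sql)
print(params)  # [9, 7, 20, 10]
```

The values travel as bound parameters rather than being pasted into the SQL text, which is why only the sort column needs a whitelist.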
"All endpoints should return single data model entity or collection of entities."
This is NOT a REST constraint. If you want to read about REST constraints, then read the Fielding dissertation.
"Because I'm building a framework which will auto-generate REST Hypermedia API based on SQL database relations."
This is a wrong approach and has nothing to do with REST.
With REST you describe possible resource state transitions (or operation call templates) by sending hyperlinks in the response. These hyperlinks consist of an HTTP method and a URI (and other data which is not relevant now) if you build the uniform interface using the HTTP and URI standards, and we usually do so. The URIs are not (necessarily) database entity and collection identifiers, and if you apply such a constraint you will end up with a CRUD API, not with a REST API.
If you cannot describe an operation with the combination of HTTP methods and already existing resources, then you need a new resource.
In your case you want to aggregate the GET /users/1/comments and GET /companies/6/comments responses, so you need to define a link with GET and a third resource:
GET /comments/?users=1&companies=6
GET /intersection/users:1/companies:6/comments
GET /intersection/users/1/companies/6/comments
etc...
RESTful architecture is about returning resources that contain hypermedia controls that offer state transitions. What I see here is a multistep process of state transitions. Let's assume you have a root resource and somehow navigate over to /categories/9/products using the available hypermedia controls. I'd bet the results would look something like this in HAL:
{
_links : {
self : { href : "/categories/9/products"}
},
_embedded : {
item : [
{json of prod 1},
{json of prod 2}
]
}
}
If you want your client to be able to intersect this with another collection, you need to provide them with the mechanism to do so. You have to give them a hypermedia control. HAL only has links, templated links, and embedded as control types. Let's go with links. Change the response to:
{
_links : {
self : { href : "/categories/9/products"},
x:intersect-with : [
{
href : "URL IS ABSOLUTELY IRRELEVANT!!! but unique 1",
title : "Company 6 products"
},
{
href : "URL IS ABSOLUTELY IRRELEVANT!!! but unique 2",
title : "Company 5 products"
},
{
href : "URL IS ABSOLUTELY IRRELEVANT!!! but unique 3",
title : "Company 7 products"
}
]
},
_embedded : {
item : [
{json of prod 1},
{json of prod 2}
]
}
}
Now the client just picks the right hypermedia control (aka link) based on the title field of the link.
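A minimal Python sketch of that client step, assuming the HAL response has already been parsed into a dict. The function name and the opaque href strings are placeholders invented for this example.

```python
def pick_link(resource, rel, title):
    """Return the href of the link under `rel` whose title matches, else None."""
    links = resource.get("_links", {}).get(rel, [])
    if isinstance(links, dict):  # HAL allows one link object or a list of them
        links = [links]
    for link in links:
        if link.get("title") == title:
            return link["href"]
    return None

# Parsed form of the response above; the hrefs are opaque placeholders.
products = {
    "_links": {
        "self": {"href": "/categories/9/products"},
        "x:intersect-with": [
            {"href": "opaque-url-1", "title": "Company 6 products"},
            {"href": "opaque-url-2", "title": "Company 5 products"},
            {"href": "opaque-url-3", "title": "Company 7 products"},
        ],
    },
    "_embedded": {"item": []},
}

print(pick_link(products, "x:intersect-with", "Company 7 products"))  # opaque-url-3
```

The client never builds a URL itself; it only follows whatever href the server put behind the chosen title.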
That's the simplest solution. But you'll probably say there are thousands of companies and you don't want thousands of links. Well, OK, if that's REALLY the case, you just offer a state transition in the middle of the two we have:
{
_links : {
self : { href : "/categories/9/products"},
x:intersect-options : { href : "URL to a Paged collection of all intersect options"},
x:intersect-with : [
{
href : "URL IS ABSOLUTELY IRRELEVANT!!! but unique 1",
title : "Company 6 products"
},
{
href : "URL IS ABSOLUTELY IRRELEVANT!!! but unique 2",
title : "Company 5 products"
},
{
href : "URL IS ABSOLUTELY IRRELEVANT!!! but unique 3",
title : "Company 7 products"
}
]
},
_embedded : {
item : [
{json of prod 1},
{json of prod 2}
]
}
}
See what I did there? An extra control for an extra state transition. JUST LIKE YOU WOULD DO IF YOU HAD A WEBPAGE. You'd probably put it in a pop-up; well, that's what the client of your app can do too with the result of that control.
It's really that simple: just think how you'd do it in HTML and do the same.
The big benefit here is that the client NEVER EVER needed to know a company or category id or plug one into some template. The ids are implementation details; the client never knows they exist, it just executes hypermedia controls, and that is RESTful.

Yodlee Rest APIs and all possible responses

I am looking for a more detailed list of possible API responses when using Yodlee's REST API. Think of it as an XSD, but for a JSON response. I want to know if there are possible data elements that are not listed in Yodlee's JSON response examples.
The only info I can really find so far is here.
When I review these examples, it appears that the example JSON responses do not fully describe every field.
Here is part of the getItemSummaryForItem1 JSON example, for the maturityDate element:
"maturityDate":{
},
It looks like there is an array, but the possible data elements for that maturityDate array are undeclared. Then later on maturityDate is shown to be:
"maturityDate":{
"date":"0014-02-01T00:00:00-0800",
"localFormat":"dd/MM/yyyy"
},
And then in another example from getUserTransactionCategories
{
"categoryId":31,
"categoryName":"Retirement Income",
"transactionCategoryTypeId":2,
"isBudgetable":1,
"localizedCategoryName":"Retirement Income",
"isHidden":false,
"categoryLevelId":3
},
Based on that I would think all possible data elements are there.
But then there is another one which introduces the childCategory data element
{
"categoryId":2,
"categoryName":"Automotive Expenses",
"isDeleted":0,
"transactionCategoryTypeId":4,
"isBudgetable":1,
"localizedCategoryName":"Automotive Expenses",
"isHidden":false,
"categoryLevelId":3,
"childCategory":[
{
"categoryId":5641,
"categoryName":"1_SubCategory1",
"categoryDescription":"Subcategory desc1",
"isDeleted":0,
"isBudgetable":0,
"localizedCategoryName":"1_SubCategory1",
"isHidden":false,
"parentCategoryId":2,
"categoryLevelId":4
}
]
}
Thanks!
The Yodlee team is working on getting these details documented. This takes time, but the documentation will soon be available on their portal. Meanwhile, is there any specific field or API response for which you need the full list of child elements, so that your integration is not blocked?

Instagram API error

I am using the Instagram API to get user info:
api = InstagramAPI(access_token=access_token)
profile = api.user(user_id="kallaucyahoocojp") # I try to put output data to profile variable here
And I get the below error:
DownloadError: Unable to fetch URL: https://api.instagram.com/v1/users/kallaucyahoocojp.json?access_token=(u'1191812153.f78cd79.d2d99595c79d4c23a7994d85ea0d412c', {u'username': u'kallaucyahoocojp', u'bio': u'\u30c4\u30a4\u30c3\u30bf\u30d5\u30a9\u30ed\u30ef\u30fc\u5897\u52a0\u30b5\u30fc\u30d3\u30b9', u'website': u'http://twitter\u30d5\u30a9\u30ed\u30ef\u30fc.jp', u'profile_picture': u'http://images.ak.instagram.com/profiles/anonymousUser.jpg', u'full_name': u'Kallauc', u'id': u'1191812153'})
Can anybody help me to fix it?
You need to pass the numeric-based user id, rather than the username. For example, instead of passing kallaucyahoocojp, you might pass 1234 if t
Here's how to get the ID if you don't have it:
1. Search for the Instagram user id using this endpoint. In the Python API:
api.user_search(q="kallaucyahoocojp", count=100)
2. Check the results for an exact string match on each user name while iterating through the results (calling .lower() to be sure to ignore potential case issues).
3. If you don't find the user in the first page of results, request the next page using the max id returned.
4. Get the user id from the matching entry in the search results, then call your original function again with the numeric id.
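The matching step can be sketched in Python roughly as follows. This is an illustration, not the library's documented behavior: it assumes each search result exposes username and id attributes, it uses a stand-in client so it runs without network access, and it does not follow additional pages.

```python
from collections import namedtuple

def find_user_id(api, username, count=100):
    """Return the id of the user whose name matches exactly, else None."""
    wanted = username.lower()
    for user in api.user_search(q=username, count=count):
        # Search also returns partial matches, so compare the whole name.
        if user.username.lower() == wanted:
            return user.id
    return None

# Stand-in for the real client; the result shape is an assumption.
User = namedtuple("User", ["username", "id"])

class FakeAPI:
    def user_search(self, q, count):
        # The first hit is a partial match, not the user we want.
        return [User("kallauc", "42"), User("KallaucYahooCoJp", "1191812153")]

print(find_user_id(FakeAPI(), "kallaucyahoocojp"))  # 1191812153
```

With the real client, the returned id would then be passed to api.user(user_id=...).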
A couple of very important notes:
Notice that I called the search function for users with a count of 100. You can pick any number, but contrary to other SO posts, the first user is not always the user you want in a search. The search can and will match partials, and not always according to an exact match first. How do I know? I have production instagram apps. I will qualify and say that usually the results are in the first 2-3 matches. Decide what is cheaper; repeated API calls that bring you closer to the limit, or 1 large bulk call where you are certain to get all the results.
The Python Instagram API, last I checked, does a terrible job of returning paging information. You actually get the paging URL, which defeats the purpose of using the Python API to get additional pages. Your options are to extract the next id parameter from the URL using urlparse or something similar, or to fix the API to return the paging data as an object per the JSON (I've done both). What happens is that the API itself discards part of the JSON and only gives you the URL, which normally you don't want or need.
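The first option, pulling a parameter out of the paging URL, might look like this in Python 3, where the urlparse module lives in urllib.parse. The example URL and the parameter name max_id are made up for illustration; use whatever name actually appears in the paging URL you receive.

```python
from urllib.parse import urlsplit, parse_qs

def query_param(url, name):
    """Return the first value of query parameter `name`, or None if absent."""
    values = parse_qs(urlsplit(url).query).get(name)
    return values[0] if values else None

# Made-up paging URL; the parameter name "max_id" is an assumption.
next_url = "https://api.instagram.com/v1/users/search?q=kallauc&count=100&max_id=12345"

print(query_param(next_url, "max_id"))  # 12345
```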
In your example, here's the search response:
{
"meta": {
"code": 200
},
"data": [
{
"username": "kallaucyahoocojp",
"bio": "ツイッタフォロワー増加サービス",
"website": "http://twitterフォロワー.jp",
"profile_picture": "http://images.ak.instagram.com/profiles/anonymousUser.jpg",
"full_name": "Kallauc",
"id": "1191812153"
}
]
}
Revising your call:
api = InstagramAPI(access_token=access_token)
profile = api.user(user_id="1191812153")
I should note that you may not need to call the user call if you did a search because you may simply have all the info you need. It will depend on what you are doing of course, so I am giving you the general method to use the rest of the user api.
For extracting profile info using Instagram API, userid is required.
The endpoint for extracting userID:
https://api.instagram.com/v1/users/search?q=[username]&access_token=[HERE]
The endpoint for extracting profile info:
https://api.instagram.com/v1/users/[userid]/?access_token=[HERE]
Note that before extracting information, check the login permissions for your access token.

How to add content and moreDetailsUrl for Google Search suggest?

I'm using GSA (version 6.14) and we would like to get an auto-suggest function on our website. It works fine for basic requests, but it seems the GSA offers more functionality when you use user-added results. However, I cannot find a reference anywhere on how to add user-added results.
This is what the documentation tells me today:
/suggest?q=<query>&max=<num>&site=<collection>&client=<frontend>&access=p&format=rich
should return a response as below :
{
"query": "<query>",
"results": [
{ "name": "<term 1>", "type": "suggest"},
{ "name": "<term 2>", "type": "suggest"},
{ "name": "<term 3>", "type": "uar", "content": "Title of UAR",
"moreDetailsUrl": "URL of UAR"}
]
}
I am able to get results like the first 2 lines, but would also like to get results like the last line, with content and a moreDetailsUrl. So, maybe a very stupid question, but I am not able to find the answer anywhere: how and where do I add this UAR?
I actually want to understand if it's feasible to get metadata into the content part of the JSON, so if for instance an icon meta is available I'd like to have it included in the JSON so I can enrich my search results.
User Added Results are a OneBox that can be added to multiple frontends. See this: https://developers.google.com/search-appliance/documentation/614/admin_searchexp/ce_improving_search#uar
When done with Suggest, the data is fed from user entering 'keymatches' directly. What's different about them is that they are a direct link versus a suggested query. If you use the out of the box experience, you'll click a link to the url instead of running another query.
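On the consuming side, the difference between the two result types could be handled along these lines. This is a Python sketch that only assumes the rich response format quoted in the question; the function name and the sample values are invented.

```python
import json

def split_results(response_text):
    """Separate plain suggestions from user-added results (type "uar")."""
    data = json.loads(response_text)
    suggestions, uars = [], []
    for result in data.get("results", []):
        if result.get("type") == "uar":
            # A UAR carries a title and a direct link instead of a query.
            uars.append((result["name"], result.get("content"),
                         result.get("moreDetailsUrl")))
        else:
            suggestions.append(result["name"])
    return suggestions, uars

# Sample in the format from the question; the values are placeholders.
sample = json.dumps({
    "query": "te",
    "results": [
        {"name": "term 1", "type": "suggest"},
        {"name": "term 2", "type": "suggest"},
        {"name": "term 3", "type": "uar", "content": "Title of UAR",
         "moreDetailsUrl": "URL of UAR"},
    ],
})

suggestions, uars = split_results(sample)
print(suggestions)  # ['term 1', 'term 2']
print(uars)         # [('term 3', 'Title of UAR', 'URL of UAR')]
```

A front end would render the suggestions as queries to run and the UAR entries as links to follow directly.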