Wikipedia API - grab 'Background Information' Table? - api

Does MediaWiki provide a way to return the information in the 'Background information' table (usually on the right-hand side of the article page)? For example, I would like to grab the Origin from the Radiohead article:
http://en.wikipedia.org/wiki/Radiohead
Or do I need to parse the HTML page?

You can use the revisions property along with the rvgeneratexml parameter to generate a parse tree for the article. Then you can apply XPath or traverse it and look for the desired information.
Here's some example code:
$page = 'Radiohead';
$api_call_url = 'http://en.wikipedia.org/w/api.php?action=query&titles=' .
urlencode( $page ) . '&prop=revisions&rvprop=content&rvgeneratexml=1&format=json';
// You have to identify yourself to the API; see more on Meta Wiki.
$user_agent = 'Your name <your email>';
$curl = curl_init();
curl_setopt_array( $curl, array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_USERAGENT => $user_agent,
CURLOPT_URL => $api_call_url,
) );
$response = json_decode( curl_exec( $curl ), true );
curl_close( $curl );
foreach( $response['query']['pages'] as $page ) {
$parsetree = simplexml_load_string( $page['revisions'][0]['parsetree'] );
// Use XPath to find the Origin parameter of the "Infobox musical artist" template and its value.
// See the XPath specification for the syntax. You could also traverse the tree and look for
// the nodes manually; inspect the parse tree to get a better feel for its structure.
$infobox_origin = $parsetree->xpath( '//template[contains(string(title),' .
'"Infobox musical artist")]/part[contains(string(name),"Origin")]/value' );
echo trim( strval( $infobox_origin[0] ) );
}
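For anyone who prefers to walk the tree rather than write XPath, here is a minimal sketch in Python (not PHP) of the manual traversal mentioned above. The SAMPLE_PARSETREE string is hand-written for illustration and only imitates the template/title/part/name/value layout that the XPath in the answer relies on; the function name template_param is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample that imitates the parse tree layout (an assumption,
# not real API output).
SAMPLE_PARSETREE = (
    "<root><template><title>Infobox musical artist\n</title>"
    "<part><name>Name </name>=<value> Radiohead\n</value></part>"
    "<part><name>Origin </name>=<value> Abingdon, Oxfordshire, England\n</value></part>"
    "</template>Radiohead are an English rock band.</root>"
)


def template_param(parsetree_xml, template_name, param_name):
    """Return the value of a named template parameter, or None if absent."""
    root = ET.fromstring(parsetree_xml)
    for template in root.iter("template"):
        title_node = template.find("title")
        title = "".join(title_node.itertext()) if title_node is not None else ""
        if template_name.lower() not in title.lower():
            continue
        for part in template.findall("part"):
            name_node = part.find("name")
            value_node = part.find("value")
            if name_node is None or value_node is None:
                continue
            name = "".join(name_node.itertext()).strip()
            # Compare case-insensitively, since parameter casing varies.
            if name.lower() == param_name.lower():
                return "".join(value_node.itertext()).strip()
    return None


print(template_param(SAMPLE_PARSETREE, "Infobox musical artist", "Origin"))
```

The case-insensitive comparison is a design choice of this sketch; the XPath version above matches the exact string "Origin".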

MediaWiki as installed on Wikipedia provides no way to get this information (there are extensions such as Semantic MediaWiki that are designed for this sort of thing, but they are not installed on Wikipedia). You can either parse the output HTML or parse the page's wikitext, or in certain cases (e.g. birth/death year) you might be able to look at the page's categories via the API.

It's a steep learning curve but DBpedia does what you want.
The "Background information table" you mention is called an "Infobox" in Wikipedia parlance and DBpedia allows very powerful queries on them. Unfortunately because it's powerful it's not easy to learn and I've mostly forgotten what I learned about it a year or two ago. I'll paste a query here though if I manage to learn it again (-:
In the meantime, here is DBpedia's idea of an introduction to how to use it.
This previous SO question will help: Getting DBPedia Infobox categories
UPDATE
OK here is the SPARQL query:
SELECT ?org
WHERE {
<http://dbpedia.org/resource/Radiohead> dbpprop:origin ?org
}
Here is a URL where you can see it working and play with it.
And here is the output on that page: (you can get output in various formats too)
SPARQL results: org = "Abingdon, Oxfordshire, England"@en
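If you want to run that query from code, something along these lines might work. This is a sketch in Python under several assumptions: that the public endpoint lives at http://dbpedia.org/sparql, that it accepts query and format parameters, that dbpprop: expands to http://dbpedia.org/property/, and that the reply follows the standard SPARQL JSON results layout. SAMPLE_RESULTS is hand-written, not real output, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # assumed endpoint address


def build_query_url(resource, prop):
    """Build a GET URL asking for one property of one DBpedia resource."""
    query = (
        "SELECT ?org WHERE { <http://dbpedia.org/resource/" + resource + "> "
        "<http://dbpedia.org/property/" + prop + "> ?org }"
    )
    # The 'format' parameter name is an assumption about the endpoint.
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)


def extract_values(results_json, var):
    """Pull the plain values for one variable out of SPARQL JSON results."""
    data = json.loads(results_json)
    return [b[var]["value"] for b in data["results"]["bindings"] if var in b]


# Hand-written sample in the SPARQL JSON results layout.
SAMPLE_RESULTS = json.dumps({
    "head": {"vars": ["org"]},
    "results": {"bindings": [
        {"org": {"type": "literal", "xml:lang": "en",
                 "value": "Abingdon, Oxfordshire, England"}}
    ]},
})

print(build_query_url("Radiohead", "origin"))
print(extract_values(SAMPLE_RESULTS, "org"))
```

Fetching the URL (for example with urllib.request.urlopen) is left out so the sketch runs without network access.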

Related

Symfony2 API test POST using YAML/Faker

I'm building a REST API using Symfony2. I am already using the Liip bundle for my functional tests, together with Alice and Faker to generate all the fixtures. However, I have a little trouble when I want to test the POST calls themselves directly: long JSON payloads are included in the POST data, which makes my test functions long, ugly and unreadable.
I decided to move the fake JSON out of the class, converting them to YAML files and then loading them using Symfony's parser:
private function loadYaml($resource){
$data = Yaml::parse(file_get_contents('src/AppBundle/DataFixtures/YAML/' . $resource . '.yml'));
return $data;
}
This seems to work quite well, since I can easily convert the data back to JSON and then use it in the call:
$postData = json_encode($this->loadYaml('newapplication'));
$this->client->request(
'POST',
'/api/application/save/',
array('data' => $postData), // The Request parameters
array(), // Files
array(),
'mybody', // Raw Body Data
true
);
My first question is: is this the right approach? Is there a bundle I have missed that would make my life much easier?
My second question is whether it is possible to use Faker within these YAML files. In my fixtures, I call Faker functions (e.g. <firstName()>) which, when the fixtures are loaded, automatically fill my entities with random but meaningful values. Would it be possible to use them in these YAML files?
Thanks a lot! ;)
Regarding your question about bundles: WebTestCase from Symfony\Bundle\FrameworkBundle\Test\WebTestCase works really well for testing a REST API in a Symfony project.
In a POST request, the data goes in the body, not in the request parameters. (How are parameters sent in an HTTP POST request?)
Try:
$this->client->request(
'POST',
'/api/application/save/',
array(), // The Request parameters
array(), // Files
array(),
$postData, // Raw Body Data
true
);
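To make the difference between "parameters" and "raw body" concrete, here is a small sketch in Python (not Symfony) that builds the same POST two ways with the standard library. The URL and the payload are made up for illustration, and nothing is actually sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

URL = "http://localhost/api/application/save/"  # hypothetical test URL
PAYLOAD = {"name": "Example application"}       # hypothetical payload


def form_request(url, payload):
    """JSON wrapped in a form field: the body is 'data=<url-encoded JSON>'."""
    body = urlencode({"data": json.dumps(payload)}).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")


def json_request(url, payload):
    """JSON as the raw body: the body is the JSON document itself."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


print(form_request(URL, PAYLOAD).data)
print(json_request(URL, PAYLOAD).data)
```

The second form corresponds to passing $postData as the raw body argument in the call above; the first corresponds to putting it in the parameters array.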

Structure of a RESTful API

I want to build a RESTful API for my small project. There are three simple resources that I have:
- Categories (id, title)
- Posts (id, text, category_id)
- Comments (id, text, post_id)
These are the end points that I need:
GET /categories/ => list of all categories
GET /categories/:id/posts => list of posts in specified category
GET /posts/:id => get single post
GET /posts/:id/comments => list of comments for specified post
GET /comments/:id => get single comment
POST /posts/:id/comments => create a comment (text comes from POST params)
Is this a good structure for an API in this case?
Is this considered to be a RESTful API?
REST doesn't have anything to say about URI structure, so it's not really meaningful to ask if your endpoints are RESTful.
As far as the design goes, I would consider this instead:
GET /categories
GET /posts?categoryId=<categoryId> -- or you could use category name, if the name is not the same as the id
GET /posts/<postId>
GET /comments?postId=<postId>
GET /comments/<commentId>
POST /comments
{ "postId" : 123, ... }
According to REST, a URL should uniquely identify a resource, which is what happens in your case. As long as your URLs are cacheable and you are using the correct verbs and status codes, do not spend too much time quibbling about URL structure. Additionally, you might want to look into 'Hypermedia' if you want your APIs to be truly RESTful.
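As a rough illustration of the query-parameter style suggested above (GET /posts?categoryId=...), here is a sketch in Python of how a handler might filter a collection. The in-memory POSTS list and the get_posts function are hypothetical and stand in for whatever storage and framework you actually use.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical in-memory data standing in for a database table.
POSTS = [
    {"id": 1, "text": "first", "categoryId": 10},
    {"id": 2, "text": "second", "categoryId": 20},
    {"id": 3, "text": "third", "categoryId": 10},
]


def get_posts(url):
    """Return all posts, or only those matching ?categoryId=<id>."""
    query = parse_qs(urlsplit(url).query)
    category_ids = query.get("categoryId")
    if not category_ids:
        return list(POSTS)
    wanted = int(category_ids[0])
    return [post for post in POSTS if post["categoryId"] == wanted]


print(get_posts("/posts?categoryId=10"))
print(len(get_posts("/posts")))
```

One collection endpoint with optional filters keeps the number of routes small; the nested /categories/:id/posts form from the question is equally valid as far as REST is concerned.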

MailChimp API Get Subscribers amount

I have been looking at the MailChimp API and am wondering how to display the live number of subscribers to a list. Is this possible? And is it possible to have this counter update live, i.e. as users join, the number increases in real time?
EDIT:
I have been getting used to the API slightly...
After using Drewm's MailChimp PHP wrapper it's starting to make more sense...
So far I have:
// This is to tell WordPress our file requires Drewm/MailChimp.php.
require_once( 'src/Drewm/MailChimp.php' );
// This is for namespacing since Drew used that.
use \Drewm;
// Your Mailchimp API Key
$api = 'APIKEY';
$id = 'LISTID';
// Initializing the $MailChimp object
$MailChimp = new \Drewm\MailChimp($api);
$member_info = $MailChimp->call('lists/members', array(
'apikey' => $api,
'id' => $id // your mailchimp list id here
)
);
But I'm not sure how to display these values; it currently just says 'Array' when I echo $member_info. This may be entirely down to my ignorance of PHP. Any advice to s
I know this may be old, but maybe this will help someone else looking for this. This uses the latest versions of the API and the PHP wrapper.
use \DrewM\MailChimp\MailChimp;
$MailChimp = new MailChimp($api_key);
$data = $MailChimp->get('lists');
print_r($data);// view output
$total_members = $data['lists'][0]['stats']['member_count'];
$list_id = $data['lists'][0]['id'];
$data['lists'][0] is the first list. If you have more, the next one would be $data['lists'][1], etc.
And to get a list of members from a list:
$data = $MailChimp->get("lists/$list_id/members");
print_r($data['members']);// view output
foreach($data['members'] as $member){
$email = $member['email_address'];
$added = date('Y/m/d',strtotime($member['timestamp_opt']));
// I use reverse dates for sorting in a *datatable* so it properly sorts by date
}
You can inspect the print_r output to find the fields you want.
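The reason echo prints 'Array' is that the response is a nested structure, so you have to index into it. Here is a small sketch in Python of walking the same shape as the $data['lists'][0]['stats']['member_count'] access above. SAMPLE_RESPONSE is hand-written with made-up ids and counts, and the function names are hypothetical.

```python
# Hand-written stand-in for a decoded 'lists' response (made-up values).
SAMPLE_RESPONSE = {
    "lists": [
        {"id": "abc123", "stats": {"member_count": 120}},
        {"id": "def456", "stats": {"member_count": 35}},
    ]
}


def member_counts(data):
    """Map each list id to its subscriber count."""
    return {
        lst["id"]: lst["stats"]["member_count"]
        for lst in data.get("lists", [])
    }


def total_members(data):
    """Sum the subscriber counts across all lists."""
    return sum(member_counts(data).values())


print(member_counts(SAMPLE_RESPONSE))
print(total_members(SAMPLE_RESPONSE))
```

For a "live" counter you would still need to re-request the data periodically and refresh the displayed number; this sketch only covers reading the count out of one response.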

Drupal: Display a list of contribution (posted content) for each user

I’m searching for a way to display, on a member profile page, the number of contributions the member has made in certain content types. Basically it has to display something like this:
Blog(10)
Articles(10)
Questions(19)
Comments(30)
Tips(3)
I’ve installed a few different modules (like “user stats”) that I thought could help me, but haven’t been successful.
I’m wondering if it would be easiest just to hard-code it into my template file by taking the uid and running some queries for the content types I want to display, but I’m not sure how to do that either.
Any help or suggestions would be very much appreciated.
Sincere
- Mestika
Edit:
I found a solution to do it manually with a query for each content type but I'm still very interested in a solution that's more elegant and smoother.
I use this code:
global $user;
$userid = $user->uid;
$blog_count = db_result(db_query("SELECT COUNT(*) AS num FROM {node} n WHERE n.type = 'blog' AND n.status = 1 AND n.uid = %d", $userid));
If you are using the core Profile module, you could use something like below. It will show the nodes created by the user whose profile is being viewed. As an added benefit, it only needs to execute one custom database query.
Insert this snippet into template.php in your theme's folder and change "THEMENAME" to the name of your theme:
function THEMENAME_preprocess_user_profile(&$variables) {
// Information about user profile being viewed
$account = $variables['account'];
// Get info on all content types
$content_types = node_get_types('names');
// Get node counts for all content types for current user
$stats = array();
$node_counts = db_query('SELECT type, COUNT(type) AS num FROM {node} WHERE status = 1 AND uid = %d GROUP BY type', $account->uid);
while ($row = db_fetch_array($node_counts)) {
$stats[] = array(
'name' => $content_types[$row['type']],
'type' => $row['type'],
'num' => $row['num'],
);
}
$variables['node_stats'] = $stats;
}
Now, in user-profile.tpl.php you can add something similar to:
<?php // If the user has created content, display stats ?>
<?php if (count($node_stats) > 0): ?>
<?php // For each content type, display a DIV with name and number of nodes ?>
<?php foreach ($node_stats as $value): ?>
<div><?php print $value['name']; ?> (<?php print $value['num']; ?>)</div>
<?php endforeach; ?>
<?php else: ?>
<?php // Message to show for a user who hasn't created any content ?>
<?php print $account->name; ?> has not created any content.
<?php endif; ?>
This is just a general idea of what you can do. You can also add restrictions to the content types you look for/display, check permissions for users to see these stats, use CSS to change the look of the stats, etc.
If you are using Content Profile, you could use THEMENAME_preprocess_node() and check that the node is a profile node before executing this code.
Given your simple requirement and the fact that you have the SQL statement in hand, I'd say just use that. There's no reason to add yet another module to your site and impact its performance for the sake of a single query.
That said, from a "separation of concerns" standpoint, you shouldn't just drop this SQL in your template. Instead, you should add its result to the list of available variables using a preprocess function in your template.php file, limiting its scope to where you need it so you're not running this database query on any pages but the appropriate profile page.
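If you want to see what the single GROUP BY query from the preprocess answer returns before wiring it into a theme, here is a sketch in Python using an in-memory SQLite database. The table is a cut-down, hypothetical imitation of Drupal's node table (only nid, type, status and uid), and the rows are made up.

```python
import sqlite3


def node_counts_by_type(conn, uid):
    """Count published nodes per content type for one user, in one query."""
    cursor = conn.execute(
        "SELECT type, COUNT(*) AS num FROM node "
        "WHERE status = 1 AND uid = ? GROUP BY type ORDER BY type",
        (uid,),
    )
    return cursor.fetchall()


# Cut-down imitation of the node table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, "
    "status INTEGER, uid INTEGER)"
)
conn.executemany(
    "INSERT INTO node (type, status, uid) VALUES (?, ?, ?)",
    [
        ("blog", 1, 7),
        ("blog", 1, 7),
        ("article", 1, 7),
        ("blog", 0, 7),   # unpublished, so not counted
        ("blog", 1, 8),   # belongs to another user
    ],
)

for node_type, num in node_counts_by_type(conn, 7):
    print(f"{node_type} ({num})")
```

One grouped query replaces the per-content-type queries from the question, and the bound parameter keeps the uid out of the SQL string.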

How to use Wikipedia API to get the page view statistics of a particular page in wikipedia?

The stats.grok.se tool provides the pageview statistics of a particular page in wikipedia. Is there a method to use the wikipedia api to get the same information? What does the page views counter property actually mean?
The Pageview API was released a few days ago: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}
https://wikimedia.org/api/rest_v1/?doc#/
https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageview_API
For example https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Foo/daily/20151010/20151012 will give you
{
"items": [
{
"project": "en.wikipedia",
"article": "Foo",
"granularity": "daily",
"timestamp": "2015101000",
"access": "all-access",
"agent": "all-agents",
"views": 79
},
{
"project": "en.wikipedia",
"article": "Foo",
"granularity": "daily",
"timestamp": "2015101100",
"access": "all-access",
"agent": "all-agents",
"views": 81
}
]
}
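Here is a short sketch in Python of building that URL from its parts and totalling the views in a response. The URL template and the two sample items come from the example above (trimmed to the fields used); the function names and default arguments are my own choices for illustration.

```python
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(project, article, start, end, access="all-access",
                  agent="all-agents", granularity="daily"):
    """Fill in the per-article URL template; dates are YYYYMMDD strings."""
    # Titles use underscores for spaces; percent-encode everything else.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join(
        [BASE, project, access, agent, title, granularity, start, end]
    )


def total_views(response):
    """Add up the 'views' field over all returned items."""
    return sum(item["views"] for item in response.get("items", []))


# The two items from the example response, trimmed to the fields used here.
SAMPLE_RESPONSE = {
    "items": [
        {"timestamp": "2015101000", "views": 79},
        {"timestamp": "2015101100", "views": 81},
    ]
}

print(pageviews_url("en.wikipedia", "Foo", "20151010", "20151012"))
print(total_views(SAMPLE_RESPONSE))
```

Actually fetching the URL is left out so the sketch runs offline; remember to send a descriptive User-Agent when you do.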
No, there is not.
The counter property returned from prop=info would tell you how many times the page was viewed from the server. It is disabled on Wikipedia and other Wikimedia wikis because the aggressive squid/varnish caching means only a tiny fraction of page views would make it to the actual server in order to affect that counter, and even then the increased database write load for updating that counter would probably be prohibitive.
The stats.grok.se tool uses anonymized logs from the cache servers to calculate page views; the raw log files are available from http://dammit.lt/wikistats. If you need an API to access the data from stats.grok.se, you should contact the operator of stats.grok.se to request one be created.
Note this was written 4 years ago, and an API has since been created (see this answer). There's not yet a way to access that via api.php, though.
get the daily JSON for the last 30 days like this
http://stats.grok.se/json/en/latest30/Britney_Spears
You can look into the stats here.
Has anyone used an API to get the pageview stats?
I have also looked into the available raw data but could not work out how to extract the pageview count.
There doesn't seem to be any API; however, you can make HTTP requests to stats.grok.se and parse the HTML or JSON result to extract the page view counts.
I created a website http://wikipediaviews.org that does exactly that in order to facilitate easier comparison for multiple pages across multiple months and years. To speed things up, and minimize the number of requests to stats.grok.se, I keep all past query results stored locally.
The code I used is available at http://github.com/vipulnaik/wikipediaviews.
The file with the actual retrieval code is in https://github.com/vipulnaik/wikipediaviews/blob/master/backend/pageviewqueries.inc
function getpageviewsonline($page, $month, $language)
{
$url = getpageviewsurl($page,$month,$language);
$html = file_get_contents($url);
preg_match('/(?<=\bhas been viewed)\s+\K[^\s]+/',$html,$numberofpageviews);
return $numberofpageviews[0];
}
The code for getpageviewsurl is in https://github.com/vipulnaik/wikipediaviews/blob/master/backend/stringfunctions.inc:
function getpageviewsurl($page,$month,$language)
{
$page = str_replace(" ","_",$page);
$page = str_replace("'","%27",$page);
return "http://stats.grok.se/" . $language . "/" . $month . "/" . $page;
}
PS: In case the link to wikipediaviews.org doesn't work, it's because I registered the domain quite recently. Try http://wikipediaviews.subwiki.org instead in the interim.
This question was asked 6 years ago, and at the time there was no such API on the official site.
That has changed.
A simple example:
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=pageviews&titles=Buckingham+Palace%7CBank+of+England%7CBritish+Museum
See the documentation:
prop=pageviews
Shows per-page pageview data (the number of daily pageviews for each of the last pvipdays days). The result format is page title (with underscores) => date (Ymd) => count.
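To build that request for an arbitrary set of pages, a sketch in Python could look like this. The parameter names are taken from the example URL above; the function name pageviews_query_url is hypothetical. Multiple titles are joined with "|", which urlencode turns into %7C.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def pageviews_query_url(titles):
    """Build an action=query URL asking for prop=pageviews on several pages."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "pageviews",
        "titles": "|".join(titles),
    }
    return API + "?" + urlencode(params)


print(pageviews_query_url(
    ["Buckingham Palace", "Bank of England", "British Museum"]
))
```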