Get wikitext from the Wikipedia API?

I'm looking at the API documentation here:
https://www.mediawiki.org/wiki/API:Query
Getting the wikitext for a page is mentioned at the beginning of the documentation:
The action=query module allows you to get information about a wiki and the data stored in it, such as the wikitext of a particular page, the links and categories of a set of pages, or the token you need to change wiki content.
but I can't seem to figure out what parameters to pass in the API request to return the wikitext for a given page. Does anyone know how to do this?
I've tried parameters like:
{'action':'query', 'titles':'Anarchism', 'prop':'wikitext', 'format':'json'}

You must use this query:
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Anarchism&rvslots=main
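In Python, a minimal sketch of the same query using the requests library might look like this (formatversion=2 is an optional extra on top of the query above; it flattens the JSON so pages comes back as a list):

import requests

# Fetch the raw wikitext of a page via action=query / prop=revisions.
params = {
    "action": "query",
    "prop": "revisions",
    "rvprop": "content",
    "rvslots": "main",
    "titles": "Anarchism",
    "format": "json",
    "formatversion": "2",  # optional: simpler JSON structure
}
resp = requests.get("https://en.wikipedia.org/w/api.php", params=params)
resp.raise_for_status()

# With formatversion=2, "pages" is a list instead of an id-keyed object.
page = resp.json()["query"]["pages"][0]
wikitext = page["revisions"][0]["slots"]["main"]["content"]
print(wikitext[:500])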

Related

Wikipedia API - Searching by section

I am trying to search by section with the Wikipedia API.
What I already know:
For the below:
https://en.wikipedia.org/w/api.php?format=xml&action=query&prop=revisions&titles=Game_of_Thrones_(season_1)&rvprop=content&rvsection=0
I know that rvsection=0 will give me section 0 of the Wikipedia page, and I can change this to get different sections of the page, e.g. 1, 2, 3.
What I am wondering is how/whether I can search by section name. E.g., the Wikipedia page in the link above has a section named "Episodes"; how can I search for this so that I get all the content from that section?
If this is not possible, is there a workaround? What I want to do is get episode information from different Wikipedia pages.
I have done some more research into this and have found a workable solution.
If we want to get a certain section, we first need to query the section list with the API below:
https://en.wikipedia.org/w/api.php?action=parse&format=json&page=(NAME_OF_WIKI_PAGE)&prop=sections&disabletoc=1
This will give us JSON info with the name and index of each section.
Once we have the section info, use the parse API to get the wikitext; if we want HTML instead, we can change prop to text:
https://en.wikipedia.org/w/api.php?action=parse&format=json&page=(NAME_OF_WIKI)&prop=wikitext&section=(SECTION #)&disabletoc=1
As a result, we get the specific section we want, formatted as JSON. The next step for me is parsing this and converting the HTML/wikitext into plain text.
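For anyone else trying this, here is a rough Python sketch chaining the two calls above with the requests library (the page and section name are the ones from this question; adjust as needed):

import requests

API = "https://en.wikipedia.org/w/api.php"
page = "Game_of_Thrones_(season_1)"

# Step 1: list the page's sections and find the index of "Episodes".
sections = requests.get(API, params={
    "action": "parse",
    "page": page,
    "prop": "sections",
    "format": "json",
}).json()["parse"]["sections"]
episodes_index = next(s["index"] for s in sections if s["line"] == "Episodes")

# Step 2: fetch the wikitext of just that section.
result = requests.get(API, params={
    "action": "parse",
    "page": page,
    "prop": "wikitext",   # or "text" for HTML
    "section": episodes_index,
    "format": "json",
}).json()
print(result["parse"]["wikitext"]["*"])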

Missing enrollment terms in Canvas LMS API

I am currently doing some work with the Canvas LMS REST API and have run into an issue when trying to retrieve a list of all enrollment terms defined in the system. When viewing the terms in the online system, I can see all the terms that have been created, from the first ones up to the furthest defined semester. However, when I try to get a list of terms using
GET /api/v1/accounts/:account_id/terms
I only receive a list of 10 terms, while the rest are missing. Does anyone know what could be causing this?
Additionally, is there a difference between a Term and an EnrollmentTerm object? I only see API calls for EnrollmentTerm objects, while a Term seems to be a subset of the data contained in an EnrollmentTerm that only gets passed within a Course. Could someone explain if there is an important difference here, and what I may be missing?
Lastly, could anyone point me towards some information about error codes that are returned from an API call? For example, when I use
POST /api/v1/accounts/:account_id/terms
with some associated parameters, I get a 400 bad request response. When the parameters are incorrectly named, I get a 500 response instead. Any guidance on this matter would be very helpful.
Let me know if there is anything I can do to help clarify things. Thanks for your help!
I got in contact with the Canvas developers and found out that this is caused by how they paginate their API responses. The default cap appears to be 10 results per response, but it can be raised to up to 100 by adding ?per_page=100 to the end of the query, like so:
GET /api/v1/accounts/:account_id/terms?per_page=100
Additional pages can be retrieved using the URLs returned in the Link header of the response. More info on that can be found in the Canvas API pagination documentation.
An example Link header would be:
<https://<canvas>/api/v1/accounts/:account_id/terms?page=1&per_page=10>; rel="current",
<https://<canvas>/api/v1/accounts/:account_id/terms?page=2&per_page=10>; rel="next",
<https://<canvas>/api/v1/accounts/:account_id/terms?page=1&per_page=10>; rel="first",
<https://<canvas>/api/v1/accounts/:account_id/terms?page=10&per_page=10>; rel="last"
The URLs in the Link header are only included when they are relevant, so the first page will not return a "prev" link and the last page will not return a "next" link, for example.
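A minimal Python sketch of the whole loop, assuming the requests library and placeholder values for the base URL and access token (requests parses the Link header into resp.links for you, and the terms endpoint wraps its results in an enrollment_terms key):

import requests

BASE = "https://<canvas>/api/v1"                      # placeholder, as above
HEADERS = {"Authorization": "Bearer <ACCESS_TOKEN>"}  # placeholder token

def get_all_terms(account_id):
    # Collect every enrollment term by following rel="next" Link headers.
    url = f"{BASE}/accounts/{account_id}/terms?per_page=100"
    terms = []
    while url:
        resp = requests.get(url, headers=HEADERS)
        resp.raise_for_status()
        terms.extend(resp.json()["enrollment_terms"])
        url = resp.links.get("next", {}).get("url")  # None after the last page
    return terms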

Accessing the full URLs of all page images with the Wikipedia API

I'm experimenting with the Wikipedia API and was trying to get the full URLs for all images on a particular page, in this example Google's main page (http://en.wikipedia.org/wiki/Google).
I found the page ID using another API call and then attempted to use this information in the following API call to get the full URLs of all images on that page:
http://en.wikipedia.org/w/api.php?action=query&pageids=1092923&generator=images&prop=imageinfo&iiprop=url|dimensions|mime&format=xml
I get some of the page images from this but cannot understand why I am not getting all of them, specifically the logo, which is what I was most interested in. Apologies: I am aware that similar questions have been asked, but I was not able to find one that would assist me here.
The API does not give you all results at once; it defaults to 10 results. At the beginning of the response you will see a value for the parameter gimcontinue. If you use it like this, you get the next batch of images: http://en.wikipedia.org/w/api.php?action=query&pageids=1092923&generator=images&prop=imageinfo&iiprop=url|dimensions|mime&format=xml&gimcontinue=1092923|Google_bike.jpg
Alternatively, you can ask for more images at once using gimlimit like this: http://en.wikipedia.org/w/api.php?action=query&pageids=1092923&generator=images&prop=imageinfo&iiprop=url|dimensions|mime&format=xml&gimlimit=500
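A small Python sketch that pages through every image using the API's current continuation mechanism (the continue block in the JSON response carries the gimcontinue value mentioned above, so you don't have to splice it into the URL by hand):

import requests

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "pageids": "1092923",
    "generator": "images",
    "prop": "imageinfo",
    "iiprop": "url|dimensions|mime",
    "format": "json",
}
while True:
    data = requests.get(API, params=params).json()
    # "pages" is a dict keyed by page id; print each file's full URL.
    for page in data.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            print(info["url"])
    if "continue" not in data:
        break
    params.update(data["continue"])  # includes gimcontinue for the next batch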

How does the archive.org API work?

As you surely know, web.archive.org lets you inspect the history of a domain, e.g. http://web.archive.org/web/*/besttatoo.com
It also has an API: http://archive.org/help/json.php
I need to get data from the API, but I can't find much info on how to use it. Has anyone used it who could paste some examples of use?
This link provides details about the item LovingU on archive.org:
http://archive.org/details/LovingU&output=json
To create an API query to your liking, use this page:
https://archive.org/advancedsearch.php#raw
That page allows you to choose your output format (JSON, XML, HTML, CSV or RSS) and also the parameters you want to see. You can limit the number of results, too.
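For example, a minimal Python sketch against the advancedsearch endpoint (the query string and field list here are just illustrative):

import requests

params = {
    "q": "LovingU",                    # any search query
    "fl[]": ["identifier", "title"],   # fields to include in the results
    "rows": 10,                        # limit the number of results
    "output": "json",
}
resp = requests.get("https://archive.org/advancedsearch.php", params=params)
resp.raise_for_status()
for doc in resp.json()["response"]["docs"]:
    print(doc.get("identifier"), "-", doc.get("title"))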

How to get a wiki template's content?

Does anybody know how to get access to the template's body inside the page?
I'm familiar with the API call that returns the list of ALL templates that exist on the page, but how can I get access to a template's body? Is there any API for this? For now I see only one possible way... parse it manually. Am I wrong?
You can use the expandtemplates API call, or the rvexpandtemplates parameter for the revisions API call.
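For example, a quick Python sketch of the expandtemplates call (the template invocation passed in text= is just a hypothetical example):

import requests

resp = requests.get("https://en.wikipedia.org/w/api.php", params={
    "action": "expandtemplates",
    "text": "{{Infobox person|name=Example}}",  # hypothetical template call
    "prop": "wikitext",
    "format": "json",
})
print(resp.json()["expandtemplates"]["wikitext"])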
This is an old question, but it helped me figure out how to fetch a MediaWiki page with template macros expanded. Very useful if you are doing a conversion.
<MW_BASEURL>/api.php?action=query&prop=revisions&titles=<url_encoded_page_title>&format=xml&rvprop=content&rvexpandtemplates
I am parsing the XML returned by this query to get the expanded page.
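In case it helps, a rough Python version of that fetch-and-parse step ("Some_Page" and the base URL are placeholders; the expanded content sits in the text of the <rev> element):

import requests
import xml.etree.ElementTree as ET

MW_BASEURL = "https://en.wikipedia.org/w"   # substitute your wiki's base URL
resp = requests.get(f"{MW_BASEURL}/api.php", params={
    "action": "query",
    "prop": "revisions",
    "titles": "Some_Page",      # placeholder page title
    "rvprop": "content",
    "rvexpandtemplates": "1",
    "format": "xml",
})
root = ET.fromstring(resp.text)
rev = root.find(".//rev")       # the expanded wikitext is the element's text
print(rev.text)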