Wikipedia API - Searching by section

I am trying to search by section with the Wikipedia API.
What I already know:
For the below:
https://en.wikipedia.org/w/api.php?format=xml&action=query&prop=revisions&titles=Game_of_Thrones_(season_1)&rvprop=content&rvsection=0
I know that rvsection=0 gives me section 0 of the Wikipedia page, and that I can change the value to get other sections, e.g. 1, 2, 3.
What I am wondering is whether, and how, I can search by section name. E.g. the Wikipedia page linked above has a section named "Episodes"; how can I search for it so that I get all the content from this section?
If this is not possible, is there a workaround? What I want to do is get episode information from different Wikipedia pages.

I have done some more research into this and have found a partial solution.
To get a certain section by name, we first need to ask the API for the page's list of sections:
https://en.wikipedia.org/w/api.php?action=parse&format=json&page=(NAME_OF_WIKI_PAGE)&prop=sections&disabletoc=1
This returns JSON listing each section's name together with its index.
Once we have the section info, use the parse API with that section's index to get the wikitext. If we want the HTML instead, we can change prop to text:
https://en.wikipedia.org/w/api.php?action=parse&format=json&page=(NAME_OF_WIKI)&prop=wikitext&section=(SECTION #)&disabletoc=1
As a result, we get the specific section we want formatted in JSON. The next step for me is sorting this and trying to get this HTML/wiki text into plain text.
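The two steps above can be sketched in Python roughly as follows. This is only a sketch under some assumptions: the helper names (find_section_index, section_wikitext_url, fetch_json) and the User-Agent string are made up for illustration, and the sample dictionary is a hand-written, trimmed imitation of a prop=sections response rather than real output.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def find_section_index(sections_response, section_name):
    """Return the 'index' of the first section whose heading matches, or None."""
    wanted = section_name.strip().lower()
    for section in sections_response["parse"]["sections"]:
        # 'line' holds the heading text, 'index' the value to pass as section=
        if section["line"].strip().lower() == wanted:
            return section["index"]
    return None


def section_wikitext_url(page, section_index):
    """Build the second request: the wikitext of one section of the page."""
    params = {
        "action": "parse",
        "format": "json",
        "page": page,
        "prop": "wikitext",
        "section": section_index,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Perform the HTTP request (hypothetical helper, not called below)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "section-lookup-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hand-written sample shaped like a prop=sections response (illustrative only)
sample = {"parse": {"title": "Game of Thrones (season 1)", "sections": [
    {"toclevel": 1, "level": "2", "line": "Episodes", "number": "1",
     "index": "1", "anchor": "Episodes"},
    {"toclevel": 1, "level": "2", "line": "Cast", "number": "2",
     "index": "2", "anchor": "Cast"},
]}}

index = find_section_index(sample, "episodes")
url = section_wikitext_url("Game_of_Thrones_(season_1)", index)
print(index, url)
```

In real use, the sample would be replaced by fetch_json() on the prop=sections URL, and the resulting url passed to fetch_json() again.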

Related

Get wikitext from wikipedia API?

I'm looking at the API documentation here,
https://www.mediawiki.org/wiki/API:Query
Getting the wikitext for a page is mentioned in the beginning of the documentation,
The action=query module allows you to get information about a wiki and the data stored in it, such as the wikitext of a particular page, the links and categories of a set of pages, or the token you need to change wiki content.
but I can't seem to figure out what parameters to pass in the API request to return the wikitext for a given page. Does anyone know how to do this?
I've tried parameters like,
{'action':'query', 'titles':'Anarchism', 'prop':'wikitext', 'format':'json'}
There is no prop=wikitext in action=query. Request the content of the page's latest revision instead, using this query:
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Anarchism&rvslots=main
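For what it's worth, here is a small sketch of how the wikitext could be pulled out of the JSON that this query returns. The function name extract_wikitext is made up, and the sample dictionary is a hand-written, shortened imitation of a response. The sketch assumes the default JSON layout (pages keyed by page id, content under "*") and also tolerates the formatversion=2 layout (pages as a list, content under "content").

```python
def extract_wikitext(response):
    """Return the wikitext of the first page that has a revision, or None."""
    pages = response["query"]["pages"]
    # Default output: dict keyed by page id. formatversion=2: a plain list.
    page_list = pages.values() if isinstance(pages, dict) else pages
    for page in page_list:
        revisions = page.get("revisions")
        if not revisions:
            continue  # e.g. the title does not exist
        main_slot = revisions[0]["slots"]["main"]
        # "*" in the default format, "content" with formatversion=2
        return main_slot.get("content", main_slot.get("*"))
    return None


# Hand-written sample in the default format (illustrative only)
sample_response = {"query": {"pages": {"12": {
    "pageid": 12, "ns": 0, "title": "Anarchism",
    "revisions": [{"slots": {"main": {
        "contentmodel": "wikitext",
        "contentformat": "text/x-wiki",
        "*": "'''Anarchism''' is a political philosophy ...",
    }}}],
}}}}

print(extract_wikitext(sample_response))
```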

How can I get a random article in a specific category from the Wikipedia API?

This is my link for getting one random article using Wiki API:
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exsentences=2&exintro=&explaintext=&generator=random&grnnamespace=0
I need to get from it the first two sentences of the first section, and it works pretty well.
I want to use this kind of link and search this random article in a specific category. This is what I have tried after searching online:
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exsentences=2&exintro=&explaintext=&generator=random&grnnamespace=0&cmtitle=Category:Music
(I have added this part to the original link: cmtitle=Category:Music )
It doesn't work for me.
It returns a random article just like the first link does, not one from the wanted category (Music in this link).
There is no API to get a random category member (and using a parameter from some unrelated API module is certainly not going to help). You could screen scrape Special:RandomInCategory (or turn it into an API module - patches welcome :)).
Try using list=categorymembers with cmlimit to get all of the category's members (ordinary users can request up to 500 per call, so larger categories need cmcontinue). Then, in a programming language such as Python, store every category member in an array and use the random module to pick a random member from it. You can then use that title in a link to get the specific page for the category member, or anything else that you need.
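A rough sketch of the approach suggested above, assuming JSON output. The function names are made up, the member titles in the sample are invented for illustration, and the actual HTTP request is left out; only the parameters and the random pick are shown.

```python
import random


def category_members_params(category, cmcontinue=None):
    """Parameters for one list=categorymembers request (articles only)."""
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": 0,   # main namespace, i.e. articles
        "cmlimit": 500,     # per-request maximum for ordinary users
    }
    if cmcontinue:
        # Value taken from the "continue" block of the previous response
        params["cmcontinue"] = cmcontinue
    return params


def member_titles(response):
    """Collect the page titles from one categorymembers response."""
    return [member["title"] for member in response["query"]["categorymembers"]]


def pick_random_title(titles, rng=random):
    """Pick one title at random from the collected members."""
    return rng.choice(titles)


# Hand-written sample response (titles invented for illustration)
sample_response = {"query": {"categorymembers": [
    {"pageid": 1, "ns": 0, "title": "Music"},
    {"pageid": 2, "ns": 0, "title": "Music theory"},
    {"pageid": 3, "ns": 0, "title": "Musical notation"},
]}}

titles = member_titles(sample_response)
chosen = pick_random_title(titles)
print(chosen)
```

The chosen title can then go into the titles= parameter of the prop=extracts query from the question, in place of generator=random.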

get all sections separately with wikimedia api

I am trying to get all the separate sections of a Wikipedia article through the API.
What I already know:
How to retrieve the complete text:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvlimit=1&titles=house&rvprop=content
How to retrieve a specific section of the text:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvlimit=1&titles=house&rvprop=content&rvsection=0
How can I retrieve all sections separately with one request? (As a JSON array, for example.)
What you are asking for is called parsing, because it requires interpreting the wikitext source to split the page into sections etc. The solution is described at https://www.mediawiki.org/wiki/API:Parsing_wikitext
1) Get the list of sections: https://www.mediawiki.org/w/api.php?action=parse&page=API:Parsing_wikitext&prop=sections
2) Ask for the parsed text of that section: https://www.mediawiki.org/w/api.php?action=parse&page=API:Parsing_wikitext&section=1&prop=text
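Put together, steps 1) and 2) might look something like the sketch below. It still needs one request per section, not a single request. The names section_requests and fetch_all_sections are made up, and fetch stands for any function you supply that sends the parameters to api.php with format=json and returns the decoded JSON. Note that the lead (section 0) is not part of the section list, and that this sketch skips sections whose index is not a plain number.

```python
def section_requests(page, sections_response):
    """Yield (heading, params) for every numbered section in the list."""
    for section in sections_response["parse"]["sections"]:
        if not section["index"].isdigit():
            continue  # skipped in this sketch: index is not a plain number
        yield section["line"], {
            "action": "parse",
            "format": "json",
            "page": page,
            "section": section["index"],
            "prop": "text",
        }


def fetch_all_sections(page, sections_response, fetch):
    """Return a list of (heading, html) pairs, one request per section."""
    result = []
    for heading, params in section_requests(page, sections_response):
        response = fetch(params)
        # In the default JSON format the HTML sits under parse -> text -> "*"
        result.append((heading, response["parse"]["text"]["*"]))
    return result


# Hand-written sample section list (illustrative only)
sample_sections = {"parse": {"sections": [
    {"line": "Etymology", "index": "1"},
    {"line": "History", "index": "2"},
]}}

for heading, params in section_requests("House", sample_sections):
    print(heading, params["section"])
```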
I realize this question was asked four years ago, so possibly the following was not available then:
You can use the REST API described here: https://www.mediawiki.org/wiki/REST_API
The REST endpoints are described/documented here: https://en.wikipedia.org/api/rest_v1/#/
The mobile-sections endpoint (intended for consuming info for a mobile device) gives you a nice breakdown with headings, which sounds like what you are asking for.
Alternatively, the metadata endpoint returns a toc (table of contents) section which contains the same breakdown of headings.
Here is an example URL, fetching the mobile sections for the "Egyptian pyramids" page:
https://en.wikipedia.org/api/rest_v1/page/mobile-sections/Egyptian_pyramids
The advantage is that the response is in JSON format (which is what you were asking for).

Accessing full url of all page images Wikipedia API

I'm experimenting with the Wikipedia API and was trying to get the full urls for all images on a particular page, in this example Google's main page (http://en.wikipedia.org/wiki/Google).
I found the page id through the use of another API and then attempted to use this information in the following API to get the full urls of all images on that page:
http://en.wikipedia.org/w/api.php?action=query&pageids=1092923&generator=images&prop=imageinfo&iiprop=url|dimensions|mime&format=xml
I get some of the page images from this but cannot understand why I am not getting all of them, specifically the logo, which is what I was most interested in. Apologies: I am aware that similar questions have been asked, but I was not able to find one that would help me here.
The API does not give you all results at once; by default it returns 10 results. Near the beginning of the response you will see a value for the gimcontinue parameter. If you pass it back like this, you get more images: http://en.wikipedia.org/w/api.php?action=query&pageids=1092923&generator=images&prop=imageinfo&iiprop=url|dimensions|mime&format=xml&gimcontinue=1092923|Google_bike.jpg
Alternatively, you can ask for more images at once using gimlimit like this: http://en.wikipedia.org/w/api.php?action=query&pageids=1092923&generator=images&prop=imageinfo&iiprop=url|dimensions|mime&format=xml&gimlimit=500
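If you want to follow the continuation automatically, a loop along these lines should work. It is only a sketch: it assumes format=json instead of format=xml and the current continuation style, where each response carries a "continue" block whose values are merged into the next request. The function names are made up, and fetch stands for whatever function you use to send the parameters and decode the JSON.

```python
def iter_query_batches(fetch, params):
    """Yield the "query" part of each response until nothing is left."""
    request = dict(params)
    while True:
        response = fetch(request)
        if "query" in response:
            yield response["query"]
        if "continue" not in response:
            break
        # Start again from the original parameters and add the
        # continuation values (e.g. gimcontinue) the API handed back
        request = dict(params)
        request.update(response["continue"])


def image_urls(batches):
    """Map each file title to its full URL across all batches."""
    urls = {}
    for query in batches:
        for page in query.get("pages", {}).values():
            for info in page.get("imageinfo", []):
                urls[page["title"]] = info["url"]
    return urls


# Parameters matching the query above, but asking for JSON
base_params = {
    "action": "query",
    "format": "json",
    "pageids": 1092923,
    "generator": "images",
    "prop": "imageinfo",
    "iiprop": "url|dimensions|mime",
}
```

Usage would be image_urls(iter_query_batches(fetch, base_params)), with fetch supplied by you.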

Extracting information from specific template on wikimedia api

I'm wondering what the easiest way would be to extract only the information contained within a certain template using the wikimedia api.
I'd like to extract the information contained in the template "Template:Mycomorphbox" for this page: http://en.wikipedia.org/wiki/Amanita_phalloides
I'm a bit frustrated that it seems like I have to pull the entire content of the page to get the information that I need. Surely there has to be a better way.
Indeed there is a better way. You must not extract information from templates (or from wikitext in general). That's not your job nor your application's, it's MediaWiki's.
Use Wikidata, which is where the structured information from and for Wikipedia is stored. See the Wikibase API documentation and see some of the properties used for biology stuff or ask if something is unclear.
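As a rough idea of what that can look like, the sketch below builds a wbgetentities request (sent to https://www.wikidata.org/w/api.php) that looks up the item by its English Wikipedia title, and reads the values of one property from the response. The function names are made up, the sample response is hand-written and heavily trimmed, and P225 (taxon name) is used only as an example property. Whether a given field of the template has a counterpart on Wikidata has to be checked per property.

```python
def entity_by_title_params(title, site="enwiki"):
    """Parameters for wbgetentities, looking the item up via its sitelink."""
    return {
        "action": "wbgetentities",
        "format": "json",
        "sites": site,
        "titles": title,
        "props": "claims",
    }


def claim_values(response, property_id):
    """Collect the main values of all statements for one property."""
    values = []
    for entity in response.get("entities", {}).values():
        for claim in entity.get("claims", {}).get(property_id, []):
            snak = claim["mainsnak"]
            # Statements can also be "somevalue"/"novalue", which carry no data
            if snak.get("snaktype") == "value":
                values.append(snak["datavalue"]["value"])
    return values


# Hand-written, trimmed sample (the item id "Q0" is a placeholder)
sample_response = {"entities": {"Q0": {"claims": {"P225": [
    {"mainsnak": {
        "snaktype": "value",
        "property": "P225",
        "datavalue": {"value": "Amanita phalloides", "type": "string"},
    }},
]}}}}

print(claim_values(sample_response, "P225"))
```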